
University of Manitoba

Introduction to Artificial Intelligence Research Paper

Autonomous Agent Behaviour


Augmented By

Natural Principles of Evolution

Author: Dale Hamel
Student number: 7615293

December 4, 2010

Contents

1 Abstract
2 Introduction
  2.1 Overview
  2.2 An overview of GA theory
3 Elements of a Genetic Algorithm
  3.1 Overview
  3.2 The Genes and Alleles
  3.3 The Chromosomes
  3.4 Reproduction
    3.4.1 Crossover
    3.4.2 Mutation
  3.5 Fitness and The Fitness Function
  3.6 Summary of the GA process
4 Genetic Algorithm Design
  4.1 Overview
  4.2 GAs applied to the Travelling Salesperson Problem
    4.2.1 Representation
    4.2.2 Fitness and Reproduction
  4.3 Summary of Design Criteria for GAs
5 Role of GAs in Augmenting Adaptive Agents
  5.1 Overview
  5.2 Reinforcement Learning
  5.3 Learning Classifier Systems
  5.4 Summary: The role of a GA in Augmenting Adapting Agents
6 Genetic Algorithms Applied to Agent Behaviour: A case study - Genetic Pacman
  6.1 Overview
  6.2 The Agents and the Environment
  6.3 GA design for GP
  6.4 Reproduction
  6.5 Implementation details
    6.5.1 Gene and Chromosome Representation
    6.5.2 GeneticAgent
    6.5.3 The fitness function, reproduction, and the GA process
  6.6 Results of simulation
  6.7 Possible Extensions to Simulation
  6.8 Conclusion
7 Limitations of GAs
  7.1 Overview
  7.2 The Right Tool for the Right Problem
  7.3 Summary of GA limitations
8 Conclusion

List of Figures

1 GA Illustrated [Grupe & Jooste, 2004]
2 The Structure of a chromosome [Rothlauf, 2006]
3 Crossover Illustrated [Grupe & Jooste, 2004]
4 Mutation Illustrated [Grupe & Jooste, 2004]
5 Calculating the fitness of a tour
6 Mapping Environment Vector (EV) to an Environment State (ES)
7 http://www.youtube.com/watch?v=5w Ks2patU

1 Abstract
A common problem faced when creating intelligent autonomous agents is that it is difficult or impossible to anticipate all environments and situations in which they will need to perform. Through the use of evolutionary principles from John Holland's work on Genetic Algorithms and Learning Classifier Systems, methods have been developed that allow agents to adapt to their environment. This paper will discuss these methods, with a heavy focus on Genetic Algorithms (GAs). The structure and framework of a GA will be explained, and design considerations will be discussed by way of example. A case study, based on a predator/prey scenario, will also be provided to demonstrate how GAs can play a role in producing adaptive agents. Although GAs are a powerful problem-solving tool, they are not a perfect technique, and so the limitations of GAs will also be discussed.

2 Introduction

2.1 Overview

Designing an autonomous agent to efficiently perform an intelligent function within a specific environment is a daunting task. The designer must impart his or her own knowledge of the task and environment into the agent, but has the added difficulty of trying to view the environment from the agent's perspective. While this may be practical in some roles, such as the creation of an expert system [Grupe & Jooste, 2004], it is a poor model for agents that must adapt to changing environments, as a new environment may necessitate a fundamental redesign of the agent.

The field of Artificial Intelligence (AI) traditionally approaches cognition from an abstract, formal perspective that is detached from the system that displays it; many now consider this a dead paradigm [Moreno & Ibáñez, 1997]. This is essentially a top-down [Melanie, 1999], labour-intensive process, analogous to a controlling parent (the designer) that prevents its child (the agent) from developing on its own. Out of necessity, machine learning techniques have evolved that attempt to solve this problem, as it may be very difficult, if not impossible, for a human designer to incorporate enough "...knowledge into an agent from the very beginning" [Dorigo & Colombetti, 1998]. These emerging techniques are bottom-up approaches that allow the human designer to write simple rules, while still allowing complex, intelligent behaviours to emerge [Melanie, 1999].

The focus of this paper will be on the use of Evolutionary Computing and Evolutionary Algorithms (EAs). EAs began with Genetic Algorithms (GAs) [Reeves & Rowe, 2003], first pioneered in the 1970s by John Holland [Holland, 1975], which are the most prominent example [Melanie, 1999] of an EA technique. As such, GAs will be the primary focus of this paper, but allusions will be made to other EA techniques such as evolution strategies (ES) and evolutionary programming (EP) [Reeves & Rowe, 2003]. GAs mimic the process of biological evolution in developing a solution [Carnahan & Simha, 2001], and in doing so reap the benefits of a technique that is already known to produce agents capable of performing complex tasks, without sacrificing the ability to adapt to new environments [Melanie, 1999].

In the endeavour to create strong AI agents, GAs have a special appeal [Melanie, 1999]. Strong AI seeks to create an intelligence equal to that of a human. The only known process capable of producing agents with human-level intelligence is the one that is known to have created humans: the process of evolution. As GAs seek to mimic this process [Carnahan & Simha, 2001], they have the inherent advantage of modelling the only technique that is already known to work.

2.2 An overview of GA theory

From a technical perspective, GAs are programs that attempt to find optimal solutions to problems when one can "... evaluate the optimal solution" [Grupe & Jooste, 2004]. It should be noted that GAs weren't designed to solve any particular problem, but rather to be adaptable to a large set of problems [Melanie, 1999], making them a powerful general problem-solving technique [Carnahan & Simha, 2001]. GAs work by employing analogues to biological structures and processes, as will be discussed in further detail in the subsequent section. To provide an abstract overview, though: GAs function analogously to natural selection within an ecosystem, as they were designed to. Grupe provides a good summary of GAs, which follows [Grupe & Jooste, 2004]; see figure 1 below for a visual representation. A population is randomly generated with a set of randomly generated genes. The GA then evaluates the fitness of each individual member of the population, and through selection, mutation, and crossover, it generates a new population. This population contains most of the solutions of the previous generation, but with a disproportionate number of the more successful, or fit, solutions. This process then repeats. After many generations, a very good, or perhaps even optimal, solution emerges.

Figure 1: GA Illustrated [Grupe & Jooste, 2004]

3 Elements of a Genetic Algorithm

3.1 Overview

This section provides a detailed description of the purpose of each component of a GA, as well as a general overview of how GAs function. It is important to note, however, that there is a great deal of room for variation in the structure of a GA, as there is no rigorous definition of one [Melanie, 1999]. What is provided is a general, but not definitive, description.

3.2 The Genes and Alleles

The most fundamental parts of a GA are the represented genes [Grupe & Jooste, 2004], each composed of one or more alleles. In a well-known experiment involving pea plants, Mendel discovered in 1866 that the genetic information of an organism is stored in a set of pairwise alleles. These alleles are stored in a number of strings, and determine the physical properties, appearance, and shape of an individual [Rothlauf, 2006]. More recently, it was discovered that this information is stored in the form of DNA, which takes the form of two long, parallel strings of four different types of nucleotides [Rothlauf, 2006]. It is appropriate, then, that genes are often modelled as bitstrings for the purposes of GAs, though this is not always the case [Reeves & Rowe, 2003]. In the limited scope of this paper, an allele will be represented by a bit, and a gene will be modelled as a bitstring of length greater than or equal to 1 [Rothlauf, 2006]. The encoding of the gene corresponds to its genotype, and the actual physical properties produced by the gene correspond to its associated phenotype [Rothlauf, 2006]. A gene, then, is a collection of alleles that is interpreted together to map to a particular phenotypic property [Rothlauf, 2006].
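To make this concrete, the following minimal sketch in Python (the language of the case study later in this paper) models an allele as a bit and a gene as a bitstring, with a decode step standing in for the genotype-to-phenotype mapping. The gene length and the colour phenotypes are invented purely for illustration.

import random

GENE_LENGTH = 3  # a gene is a bitstring of length >= 1

def random_gene():
    # An allele is represented by a single bit.
    return [random.randint(0, 1) for _ in range(GENE_LENGTH)]

def decode(gene):
    # Genotype-to-phenotype mapping: the alleles are interpreted
    # together to select one of 2**GENE_LENGTH phenotypic properties.
    value = int("".join(map(str, gene)), 2)
    phenotypes = ["red", "orange", "yellow", "green",
                  "blue", "indigo", "violet", "white"]  # hypothetical
    return phenotypes[value]

gene = random_gene()
print(gene, "->", decode(gene))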

3.3 The Chromosomes

The chromosome refers to a specific collection or configuration of genes [Grupe & Jooste, 2004]. In real genetic systems, most individuals have several chromosomes; for reasons of simplicity, however, most GAs employ only one chromosome [Reeves & Rowe, 2003]. Each chromosome consists of many alleles, represented in figure 2 below as either a 0 or a 1. A gene is a grouping of these alleles, and a chromosome is a grouping of these genes [Rothlauf, 2006].

Figure 2: The Structure of a chromosome [Rothlauf, 2006]

Each chromosome corresponds to one candidate solution, and thus a group of chromosomes corresponds to the population of candidate solutions [Melanie, 1999]. The key role of the chromosome is to serve as the unit of sexual reproduction, thus allowing the different members of the population to augment the next generation [Grupe & Jooste, 2004].

3.4 Reproduction

Reproduction is the aspect of the GA that produces variation from generation to generation. By selectively mating chromosomes in proportion to how well they perform, each generation is ensured to contain chromosomes that perform better than those of the previous one [Grupe & Jooste, 2004]. As with real genetics, it is through reproduction that the genes adapt to their environment [Rothlauf, 2006]; this will be discussed in further detail in the subsection on crossover below. Reproduction produces new variations of the genotypes in the current population, to be expressed as new phenotypes in the next generation [Rothlauf, 2006]. The reproductive process introduces variation into the chromosome population in two ways: through crossover of the parent chromosomes, and through random mutation [Melanie, 1999].

3.4.1 Crossover

The crossover process in GAs is a direct analogue to crossover in single-chromosome haploid cells [Melanie, 1999]. As shown below in figure 3, through crossover the two parent chromosomes are segmented at specific parallel loci, such that both chromosomes are divided according to the same schema [Grupe & Jooste, 2004]. The two chromosomes then exchange corresponding segments, either at random or according to a specific function [Grupe & Jooste, 2004].

Figure 3: Crossover Illustrated [Grupe & Jooste, 2004]

The result of crossover is a chromosome that contains recombinant properties of both parents [Grupe & Jooste, 2004]. The net result of this process is a chromosome that may be better adapted to its environment than either of its parents [Melanie, 1999].
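As an illustration, the sketch below implements the single-point variant of this operator on two bitstring chromosomes; multi-point crossover generalizes it by exchanging every other segment. The parent values and random locus are illustrative only.

import random

def crossover(parent_a, parent_b):
    # Segment both parents at the same locus (the same schema),
    # then exchange the corresponding tail segments.
    locus = random.randint(1, len(parent_a) - 1)
    child_a = parent_a[:locus] + parent_b[locus:]
    child_b = parent_b[:locus] + parent_a[locus:]
    return child_a, child_b

a = [1, 1, 1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 0, 0, 0]
print(crossover(a, b))  # e.g. ([1, 1, 1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1])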

3.4.2 Mutation

Mutation typically occurs after crossover, and simulates the fact that in the biological analogy the replication of DNA isn't perfect, so changes are randomly introduced in the replication process [Melanie, 1999]. As illustrated in figure 4, GAs may use mutation to introduce additional variation by randomly modifying alleles, though this is typically done with a relatively low probability (on the order of 0.001) [Melanie, 1999].

Figure 4: Mutation Illustrated [Grupe & Jooste, 2004]
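A minimal sketch of this operator follows, assuming bitstring chromosomes and the illustrative per-allele rate of 0.001 mentioned above.

import random

MUTATION_RATE = 0.001  # low per-allele probability, as noted above

def mutate(chromosome, rate=MUTATION_RATE):
    # Flip each allele independently with small probability,
    # mimicking imperfect replication of DNA.
    return [(1 - bit) if random.random() < rate else bit
            for bit in chromosome]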

3.5 Fitness and The Fitness Function

Fitness is the measure of a genotype's influence upon the future [Holland, 1975]. Genes act in many ways that affect the characteristics relevant to survival; that is to say, fitness is a composite value, determined by the combination of genes, and not by any one gene [Levins, 1968]. It is through fitness that adaptation becomes possible in any natural or artificial adaptive system [Holland, 1975]. In order to assess fitness in a GA, a fitness function must be specifically tailored to each GA. Its role is to assess the fitness value of a chromosome in the population [Holland, 1975]. This value serves as a measure of the optimality of the solution [Grupe & Jooste, 2004]. The fitness function is the GA's attempt to represent natural selection [Holland, 1975], which is arguably the most important mechanism of evolution [Campbell et al., 2007]. Natural selection is the process of the environment selecting those phenotypes, expressed by genes, that produce traits favourable to survival [Campbell et al., 2007]. Just as there is a genotype-to-phenotype mapping, there is a phenotype-to-fitness mapping that is produced by the fitness function [Rothlauf, 2006].
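These two mappings can be pictured as composed functions: decode the genotype into a phenotype, then score the phenotype. The sketch below uses an invented toy objective purely to show the shape of a fitness function; any real GA substitutes its own problem-specific version.

def phenotype(chromosome):
    # Genotype-to-phenotype: decode the bitstring into the candidate
    # solution it expresses (here, simply its integer value).
    return int("".join(map(str, chromosome)), 2)

def fitness(chromosome):
    # Phenotype-to-fitness: a single composite value for the whole
    # chromosome, not a per-gene score.
    x = phenotype(chromosome)
    return -(x - 42) ** 2  # toy objective: prefer phenotypes near 42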

3.6 Summary of the GA process

All of the aforementioned aspects of a GA play an important role in its overall functioning. The genes and chromosomes provide the representation [Rothlauf, 2006] for the problem space, while reproduction introduces variation through crossover and mutation [Melanie, 1999]. Reproduction occurs according to the fitness function, which evaluates each chromosome in the system to determine its likelihood of reproduction [Rothlauf, 2006]. Mitchell Melanie [Melanie, 1999] provides a good description of a general, simple genetic algorithm, paraphrased below.

1. Begin by randomly generating a population of chromosomes by randomly combining genes in the system, with each chromosome representing a candidate solution to the problem.

2. Calculate the fitness of each chromosome in the population, simulating the act of the environment naturally selecting an organism's genes based on the organism's phenotype.

3. On the basis of this selection, reproduction occurs, with those individuals selected as being well-adapted having a greater chance of reproducing than those selected as being ill-adapted. Crossover and mutation add variation to the offspring chromosomes produced, and a new generation of solutions, encoded in chromosomes, is formed.

4. The current population is replaced with the population generated.

5. The entire process starts over from step 2.

The process continues, usually for about 50-500 generations or until an optimal solution is reached, with the entire process being called a run [Melanie, 1999]. The elements of a GA are codependent on each other. While fitness provides the basis for adaptation, simply duplicating a highly fit parent in the production of the offspring is not sufficient [Holland, 1975]. The GA process depends on crossover and mutation to provide the essential variation required to adapt to new environments [Holland, 1975]. To summarize: crossover and mutation, acting on the chromosomes composed of genes and alleles, produce new variations that are assessed for fitness and selected by the fitness function. Using this process, the fitness of the population gradually increases with each generation [Melanie, 1999].
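These five steps translate almost directly into code. The following is a minimal, illustrative sketch of one run of a simple GA, assuming bitstring chromosomes, fitness-proportionate selection, and the crossover and mutation operators sketched earlier; the toy fitness function and all parameter values are invented for demonstration.

import random

POP_SIZE, CHROM_LEN, GENERATIONS = 100, 16, 200
MUTATION_RATE = 0.001

def fitness(chrom):
    return sum(chrom)  # toy objective: maximize the number of 1-bits

def select_parents(population):
    # Fitness-proportionate ("roulette wheel") selection; the +1
    # guards against an all-zero generation.
    weights = [fitness(c) + 1 for c in population]
    return random.choices(population, weights=weights, k=2)

def crossover(a, b):
    locus = random.randint(1, CHROM_LEN - 1)
    return a[:locus] + b[locus:]

def mutate(chrom):
    return [(1 - bit) if random.random() < MUTATION_RATE else bit
            for bit in chrom]

# Step 1: a random initial population of candidate solutions.
population = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):          # one "run"
    next_population = []
    for _ in range(POP_SIZE):                  # steps 2-3
        mother, father = select_parents(population)
        next_population.append(mutate(crossover(mother, father)))
    population = next_population               # step 4; step 5 loops

print(max(fitness(c) for c in population))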

4 Genetic Algorithm Design

4.1 Overview

Now that the concept of a GA has been clarified and the terminology established, it is appropriate to discuss the specific design considerations of a GA. This section will employ a toy-problem case study, adapted from the work of Grupe and Jooste [Grupe & Jooste, 2004] as well as Carnahan and Simha [Carnahan & Simha, 2001], to facilitate the explanation.

4.2 GAs applied to the Travelling Salesperson Problem

A well-known intractable graph theory problem is the Travelling Salesperson Problem (TSP) [Carnahan & Simha, 2001], which is perhaps the most well-known combinatorial optimization problem. While this problem is well known, it is still prudent to provide a brief summary as a basis for discussion. To paraphrase from Carnahan and Simha's summary [Carnahan & Simha, 2001]: in the TSP there is a set of cities on a map, often represented by nodes on a graph. The objective of the salesperson is to find the shortest possible path (the optimal tour) that visits every city (node) exactly once and ends up in the salesperson's home city (the last node is the first node). A conventional AI approach to this problem might use a heuristic, like iteratively picking the closest city to the current one, to eventually produce a tour. This tour, however, is not necessarily optimal [Carnahan & Simha, 2001].

Even worse than this is a brute-force approach, or exhaustive search, that attempts to determine all possible tours. Such an approach must calculate all (n - 1)!/2 possible tours in an n-city problem, which is computationally impractical [Carnahan & Simha, 2001]. For the purpose of this representation, an instance of the TSP with 26 cities will be considered. Note that under an exhaustive search, there are 25!/2, or approximately 7.76 × 10^24 (nearly eight septillion), possible tours.

4.2.1 Representation

A key consideration in GA design is mapping elements of the problem to elements of a GA, which is not always easy or practical [Grupe & Jooste, 2004]; the limitations of GAs will be discussed in a later section. The process of representation begins by mapping each state in the search space to a chromosome [Melanie, 1999]. From there, the elements of a state in the search space are dissected to determine what genes make up a chromosome. This is done by asking the question "what factors determine the state in the search space?", and mapping the results to genes [Rothlauf, 2006]. The GA approach to the TSP begins with representation. The chromosome corresponds to a particular tour, with each of the genes that compose it corresponding to a particular city in the tour. A population of chromosomes can be generated (randomly) to represent the solution set. The order of the genes within the chromosome corresponds to the sequence of cities in the tour [Grupe & Jooste, 2004], and thus the optimal tour in the solution set is the fittest chromosome in the population. Consider now how this applies to the example instance of the TSP, paraphrased from the work of Grupe and Jooste [Grupe & Jooste, 2004]. There are 26 cities, so the gene that represents a city must have at least ⌈log(26)/log(2)⌉ = 5 bits to uniquely identify it. A particular chromosome, encoding a possible tour, would then contain a sequence of 26 of these genes, each encoding a different stop in the tour.

Figure 5: Calculating the fitness of a tour.
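A minimal sketch of this encoding follows; the helper names are invented for illustration. Each city is a five-bit gene, and a chromosome concatenates 26 genes in tour order.

import math
import random

NUM_CITIES = 26
BITS_PER_CITY = math.ceil(math.log2(NUM_CITIES))  # = 5

def encode_tour(tour):
    # A chromosome is a sequence of genes; gene order is tour order.
    return "".join(format(city, f"0{BITS_PER_CITY}b") for city in tour)

def decode_tour(chromosome):
    return [int(chromosome[i:i + BITS_PER_CITY], 2)
            for i in range(0, len(chromosome), BITS_PER_CITY)]

tour = random.sample(range(NUM_CITIES), NUM_CITIES)  # a random tour
chromosome = encode_tour(tour)
assert decode_tour(chromosome) == tour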

4.2.2 Fitness and Reproduction

Now that it has been established how a tour can be represented in a GA by a chromosome, and that a population of chromosomes represents the solution set, the question of finding the optimal solution, or fittest chromosome, can be addressed. The role of a fitness function is the same in any GA, though the implementation can vary substantially; this example is used pedagogically. The fitness function for the TSP operates by examining the sequence of cities (genes in the chromosome) and looking up the distance between adjacent cities [Grupe & Jooste, 2004]. From this, it calculates the total length of each tour by adding the distances between all cities on the tour from start to finish. Shorter tours are more optimal, and are thus more fit [Grupe & Jooste, 2004]. Each chromosome, encoding a tour, is then assigned a fitness value, corresponding to its likelihood of reproduction. The higher the fitness value, the more likely the chromosome is to reproduce [Melanie, 1999]. This is essential for ensuring that the good solutions are propagated to the next generation [Holland, 1975]. Since it is not desirable to have exact duplicates of good solutions in subsequent generations, as this does not allow for further adaptation [Holland, 1975], crossover and mutation are applied to each chromosome that participates in reproduction. Note that in this model it is possible, but not certain, for a chromosome to reproduce with itself, negating the effects of crossover. This allows for the possibility of a chromosome propagating unchanged to the next generation.
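A sketch of such a fitness function follows, assuming a distance lookup table dist indexed by city number and the tour encoding above; inverting the tour length is one simple way to make shorter tours fitter.

def tour_length(tour, dist):
    # Sum the distances between adjacent cities, closing the loop
    # back to the salesperson's home city.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def tour_fitness(tour, dist):
    # Shorter tours are more optimal, so higher fitness = shorter tour.
    return 1.0 / tour_length(tour, dist)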

4.3 Summary of Design Criteria for GAs

There are specific design criteria that a problem must satisfy in order to be solved by a GA, as Grupe and Jooste have outlined [Grupe & Jooste, 2004], paraphrased below:

1. The problem must be able to be defined in terms of parameters that can be encoded into chromosomes.

2. There must be a mathematical criterion for evaluating solutions in terms of their optimality. In the case of the TSP, this was the length of a specific tour.

3. GAs are especially effective when there are multiple optimal solutions. Consider, in the TSP example, if there were several cities that were equidistant from each other; there would be several optimal tours.

4. GAs are especially advantageous when the only known solutions involve a large problem space with many variables, where brute force is computationally impractical. GAs become more and more appealing as the data inputs become more numerous, complex, and not well understood.

While GAs themselves are not guaranteed to produce optimal solutions [Grupe & Jooste, 2004], they have been demonstrated to perform very well on a wide variety of problems [Carnahan & Simha, 2001]. In the TSP example above, the GA would complete when it finds a tour that satisfies a certain optimal value, though as the GA has no means of determining this value, it must be specified externally [Grupe & Jooste, 2004]. Thus, while GAs are known to outperform heuristic searches on the TSP [Carnahan & Simha, 2001], they perform so well in this situation only because the TSP lends itself to GA representation [Grupe & Jooste, 2004]. Even so, for a general-purpose problem-solving technique [Grupe & Jooste, 2004], GAs are surprisingly versatile [Carnahan & Simha, 2001].

5 Role of GAs in Augmenting Adaptive Agents

5.1 Overview

Now that the concept of a GA has been sufficiently introduced and explained, it is appropriate to examine how GAs may be applied to the design of intelligent agents, which is the subject of this paper. This section will discuss the role of GAs in the context of producing adaptive agents by discussing the role of a GA in a Learning Classifier System (LCS), as a method of Reinforcement Learning (RL).

5.2 Reinforcement Learning

RL is a class of problems where agents learn which rules are good by trial and error, and are provided an associated reinforcement value [Dorigo & Colombetti, 1998]. This value may be obtained directly, by the agent's interaction with the environment, or indirectly, through an external observer that assesses the agent's performance in the environment. This observer is typically called a trainer, as it is analogous to how a psychologist might use positive and negative reinforcement to train animals [Dorigo & Colombetti, 1998]. The preferred of these two approaches is of course the former, as an approach to creating agents that are able to evaluate and learn based on their own performance is both elegant and desirable [Dorigo & Colombetti, 1998]. Unfortunately, there are limitations to a strictly internal model of reinforcement learning. An agent may reward itself when it detects that a goal state has been achieved (positive reinforcement), and punish itself when an obstacle is reached (negative reinforcement) [Dorigo & Colombetti, 1998]. Delayed reinforcement of this nature is theoretically possible, but in practice unacceptably inefficient. The external trainer model, however, provides immediate reinforcement that bypasses this problem [Dorigo & Colombetti, 1998]. Thus, it would seem that RL has a dilemma: strictly internal training (learning) is desirable, but only external training is practical.

5.3 Learning Classifier Systems

Suppose that an artificial trainer could be used to solve the previously mentioned problem. This exists in the form of a reinforcement program (RP), which helps create a set of rules (mappings of states to actions that maximize the agent's reward) [Dorigo & Colombetti, 1998]. One approach is to learn using a value function: to associate a value with each state or state-action pair in order to implement a rule [Dorigo & Colombetti, 1998]. A contrasting technique is to search through rules directly instead of value functions [Dorigo & Colombetti, 1998]. The above techniques are attempts to solve the problem of adaptive optimal control [Sutton et al., 1992]. Learning Classifier Systems (LCSs) are a technique that uses RL to address this problem [Dorigo & Colombetti, 1998]. First, positive reinforcement is attributed to the rules that contribute to a desired action. A GA is then used to try to come up with new, and possibly useful, rules, and the fitness function assigns each rule an associated strength [Holland & Reitman, 1978]. LCSs thus may learn from external reinforcement provided by an RP, and internally by using GAs that use this provided reinforcement value [Holland & Reitman, 1978].
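The following is a drastically simplified sketch of that interplay: classifiers carry a strength adjusted by the RP's reward, and a GA-style discovery step breeds new rules from strong ones. Real LCS implementations involve much more (bidding, credit assignment, rule covering), and every name and structure below is illustrative only.

import random

class Rule:
    # A classifier: a condition-action mapping with a strength value.
    def __init__(self, condition, action, strength=1.0):
        self.condition = condition
        self.action = action
        self.strength = strength

def reinforce(rule, reward, rate=0.1):
    # The RP's reinforcement value nudges the strength of a rule that fired.
    rule.strength += rate * (reward - rule.strength)

def discover(rules):
    # GA-style rule discovery: recombine two rules chosen in
    # proportion to their strength (the fitness analogue).
    a, b = random.choices(rules, weights=[r.strength for r in rules], k=2)
    return Rule(a.condition, b.action)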

5.4 Summary: The role of a GA in Augmenting Adapting Agents

GAs are the rule discovery element of an LCS, providing the variation that allows new rules to be generated intelligently from rules previously known to be good [Holland & Reitman, 1978]. In the case of an LCS, the reinforcement value may be provided by an artificial RP [Dorigo & Colombetti, 1998]. GAs thus play a crucial role in producing adaptive agents, by allowing internal adaptation to be used in concert with external reinforcement.

6 Genetic Algorithms Applied to Agent Behaviour: A case study - Genetic Pacman

6.1 Overview

This section will describe a specific application of GAs to produce adaptive autonomous agents. The code for this project is based on The Pacman Projects [Klein & DeNero, n.d.]; it is freely available, and can be obtained using the SVN protocol at the address svn://hake.cs.umanitoba.ca/pacman. Genetic Pacman (GP) is a toy problem, and demonstrates only how GAs may be used to generate new rules based on the reinforcement value provided by the fitness function; it is not nearly as sophisticated as the LCS model described above. GP is an example of an RL system employing what has been described previously as internal reinforcement, as there is no trainer: the agent learns through many successive generations, and receives its reinforcement value directly from its environment via the fitness function.

6.2 The Agents and the Environment

The universe in the GP simulation is a two-dimensional maze, with every traversable Cartesian co-ordinate in the maze containing a pellet. This constitutes the environment that the agents interact with. In the GP simulation there are two types of agents: predators and prey. The predators are represented by ghosts, and the prey by pacman agents. The goal of the predators is to catch the prey (occupy the same Cartesian co-ordinates). The goal of the prey is to evade the predators, and either survive for as long as possible or consume all of the pellets in the universe. The prey is motivated by a survival instinct: it has knowledge of the environment and the predator's location, which it uses to try to stay alive for as long as possible. The prey is a reactive autonomous agent, and its behaviour depends entirely on the layout of its environment and the behaviour of the predator. If all pellets are consumed, the algorithm finishes, having reached the goal state: an optimal chromosome. For the sake of simplicity, the simulation instance discussed here contains only one ghost and one pacman. In the summary of this section, alternate configurations will be discussed.

6.3 GA design for GP

The pacman agent's behaviour is governed by a very simple set of genetic rules, mapping the environment state to an action state. In order to do this, however, there must be mechanisms in place to calculate the environment state, and some way of representing the associated action states. In order to map the environment state to an action state, a predator vector value is used. This value is computed as the difference between the ghost's position and pacman's position, i.e., it is the vector pointing to the ghost's position relative to pacman's position. The predator vector is then mapped to one of 8 states, depending on where it is pointing, as shown in figure 6 below, thus providing a mechanism to determine the environment state.


ES   sign(EVx)   sign(EVy)
 0       +           0
 1       +           +
 2       0           +
 3       -           +
 4       -           0
 5       -           -
 6       0           -
 7       +           -

Figure 6: Mapping Environment Vector (EV) to an Environment State (ES)

Now that a mechanism has been provided to determine an environment state, the associated action state must be represented. The agent has four possible action states:

1. Move up (Directions.NORTH)
2. Move down (Directions.SOUTH)
3. Move right (Directions.EAST)
4. Move left (Directions.WEST)

Each of these actions forms an allele, requiring ⌈log(4)/log(2)⌉ = 2 bits to represent it. A gene is then made up of three alleles, containing three action states in order of preference. The entire chromosome is then eight genes, corresponding to the eight environment states mapped to action states. When mapping an environment state to an action state, the first legal action in the gene is taken, such that the preferred legal action is the one executed. The initial population is generated such that there is genetic completeness: there must be an instance of each gene in the system. This is done by generating all possible genes for each of the eight EVs, and placing them in eight separate EV pools. The chromosomes of the initial population are generated by randomly removing one gene from each of these eight gene pools. A code sketch of this design follows.
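The sketch below reproduces this design in code; the names are illustrative and need not match the actual genetic.py implementation. The sign pattern of the predator vector selects one of the eight environment states in figure 6, and each six-bit gene lists three two-bit actions in order of preference (the bit-to-action assignment here is an assumption).

def sign(x):
    return (x > 0) - (x < 0)  # -1, 0, or +1

# (sign(EVx), sign(EVy)) -> environment state, as in figure 6.
# (0, 0) would mean the ghost has caught pacman and is not mapped.
ENV_STATE = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
             (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

ACTIONS = ["North", "South", "East", "West"]  # one 2-bit allele each

def environment_state(pacman_pos, ghost_pos):
    # The predator vector points from pacman to the ghost.
    ev_x = ghost_pos[0] - pacman_pos[0]
    ev_y = ghost_pos[1] - pacman_pos[1]
    return ENV_STATE[(sign(ev_x), sign(ev_y))]

def preferred_actions(gene):
    # A gene is three 2-bit alleles: three actions in preference order.
    return [ACTIONS[int(gene[i:i + 2], 2)] for i in (0, 2, 4)]

def choose_action(chromosome, state, legal_actions):
    gene = chromosome[state * 6:(state + 1) * 6]  # 6 bits per gene
    for action in preferred_actions(gene):
        if action in legal_actions:
            return action  # the first legal action wins
    return "Stop"          # no encoded action is legal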

6.4 Reproduction

In the pacman simulation, reproduction is governed by the fitness value associated with each chromosome. During each generation, the spawned chromosomes are supplied to a pacman agent to govern its behaviour in the simulator. The number of pellets consumed by a pacman agent is recorded in the form of the agent's score. The average score is calculated after each generation, and each agent's fitness value is determined by the ratio between its score and the average score.

During reproduction, a mating pool is generated, containing each chromosome in the population with a frequency proportionate to the score ratio calculated above. The chromosome that produced the highest score is mated with a random chromosome from this pool. These chromosomes participate in crossover, generating an offspring chromosome that is added to the next generation. For purposes of simplicity, mutation is not modelled in this simulation. During crossover, two random loci (crossover points) are selected such that they are always at the start of a gene (crossover does not modify genes, only their higher-level arrangement). After the loci are selected, the crossover process occurs as described previously. All chromosomes that do not reproduce are allowed to remain in the system, to preserve genetic completeness; this ensures that no genes are eliminated from the simulation. Over many generations, however, the genes in the chromosomes that reproduce frequently begin to overwhelm the genes in the chromosomes that reproduce with lower frequency. A sketch of this scheme follows.
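This is a minimal sketch under the stated rules: fitness is the ratio of an agent's score to the generation average, the mating pool repeats chromosomes in proportion to that ratio, and the two crossover loci fall on gene boundaries so genes are rearranged but never altered. The scaling factor and all names are illustrative, not taken from the actual implementation.

import random

GENE_BITS, NUM_GENES = 6, 8  # a 48-bit chromosome, as described above

def mating_pool(population, scores):
    # Each chromosome appears with a frequency proportionate to the
    # ratio of its score to the generation's average score.
    average = sum(scores) / len(scores) or 1  # guard against all zeros
    pool = []
    for chrom, score in zip(population, scores):
        copies = max(1, round(10 * score / average))  # 10: arbitrary scale
        pool.extend([chrom] * copies)
    return pool

def gene_boundary_crossover(a, b):
    # Two random loci, always at the start of a gene, so crossover
    # changes only the arrangement of genes, never their contents.
    i, j = sorted(random.sample(range(1, NUM_GENES), 2))
    i, j = i * GENE_BITS, j * GENE_BITS
    return a[:i] + b[i:j] + a[j:]

def next_generation(population, scores):
    pool = mating_pool(population, scores)
    best = population[scores.index(max(scores))]
    child = gene_boundary_crossover(best, random.choice(pool))
    # Non-reproducing chromosomes remain, preserving genetic completeness.
    return population + [child]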

6.5 Implementation details

Some significant modifications were made to the pacman simulator in order to implement Genetic Pacman. The subsections below give more details.

6.5.1 Gene and Chromosome Representation

Gene and Chromosome classes were created to represent those objects within the simulation, along with the methods necessary to generate an initial population that is genetically complete, and a method to generate a chromosome composed entirely of random genes. The latter method was created for the purpose of testing how many runs would be required (without using the genetic process) to generate a goal chromosome. Most of this code is contained within the file genetic.py.

6.5.2 GeneticAgent

A new agent class called GeneticAgent was created that interprets chromosomes and maps the information stored in each gene to the associated environment vector. This class is passed a chromosome object at each simulation, which the agent uses to govern its behaviour. When the simulation terminates, information from this class is passed into the file pacman.py, to an instance of the class ResultSet, which is read by the higher-level GA methods.

6.5.3 The fitness function, reproduction, and the GA process

The higher-level methods that implement the rest of the GA are located in the main file rungp.py. This file accepts a few command line arguments, and includes the majority of the control logic for the GA. Within it, there are methods that calculate the fitness values of each chromosome in the population, perform reproduction, perform crossover, and a main method that runs each generation through the simulator and passes the results of the simulation to the above methods to generate the next generation.

6.6 Results of simulation

Through the use of the genetic algorithm, a goal chromosome was typically obtained within 3-6 generations, though this depends on which computer the simulation is run on, as the random seed used is machine-dependent. Some random seeds cause the population to converge on a local maximum, because a flaw in the design can allow the gene pool to become saturated. With a favourable random seed, as few as 44 chromosomes needed to be generated in order to find a goal chromosome. Instructions to run the simulation in this manner follow the bibliography. As a control, a further set of 365,373 random chromosomes was generated, of which 395 survived: a ratio of approximately 1.08 × 10^-3. Together, the genetic and random results show that the GA produces the goal solution far faster than simply randomly generating chromosomes, and that the genetic process is capable of rapidly producing adaptive agent behaviour. The agents that survived all exhibited a similar behaviour pattern: fear of the predator. Those agents that were not afraid of the predator died sooner than those that ran from it. The longer an agent survived, the more pellets it was likely to consume, and thus natural selection rewarded those agents with fear of the predator by assigning them a higher fitness value. Those agents' genes were thus more likely to propagate to subsequent generations. Over just a few generations, a chromosome is likely to be produced that encodes the behaviour necessary to survive long enough to consume all of the pellets in the universe, and achieve the goal state. Thus, the GA was shown to successfully cause the agents to adapt to their environment after only a few generations.

6.7 Possible Extensions to Simulation

Currently, the simulation maps only two elements of the environment state to agent action states. As a result, the pacman agents are aware only of the predator's location and the legal moves available. This severely limits the agent in several ways; in particular, it allows only one predator agent to be present in the system. This can be improved. Currently, the pacman agent is aware of only eight environment states, and for each of those states it has three possible actions. This means that the pacman agent essentially has only 24 possible state-action mappings encoded into it. Consider also that when an agent has no possible moves, its only alternative is to stop, which brings the total number of states to 32. By increasing the environment states to account for multiple predator vector states, through an increased gene size, it is likely that a chromosome could be developed that could survive against four ghost agents. Additionally, there is no environment mapping for food layout. Another chromosome could be used that maps predator and food proximity, to give the pacman agent incentive to eat when in low-danger scenarios (i.e., rather than pace back and forth when the predators are far away, go on quick food skirmishes). As the framework is already in place for a single predator, with some modifications it should theoretically be possible to extend and enhance the simulator to account for more and more complex environments:

1. With each new environment, fitness functions are defined to act as environmental reinforcement, a form of RP, through natural selection.

2. A set of goal chromosomes that pass these fitness functions is produced, and thus the agents learn to adapt to the new environment.

3. A new aspect of the environment is modelled, and the process repeats, adapting and expanding and becoming more intelligent with each iteration.

6.8 Conclusion

The purpose of this case study was to demonstrate how, on a very simple level, GAs may be used to provide variation and adaptation to artificial agents. While GAs are known to perform well within this context using an LCS [Dorigo & Colombetti, 1998], such a demonstration is beyond the scope of this paper, and the expertise of this author.

The most significant implication of using a GA to produce the chromosomes that govern the pacman agent's behaviour is that the same chromosome that worked well against one predator's behaviour can also perform well against another predator's behaviour. Where this is not the case, the same framework can simply be re-run against the new predator, to produce a new optimal chromosome. Thus, the GA provides a simple framework for producing adaptive agents that perform well against any predator's behaviour. To obtain an optimal agent, the designer needs only to encode the atomic attributes of the problem into the GA, and allow the natural selection process to occur.

In this simulation, GAs were used as a form of RL for the prey agent, only this was done at a population level, as opposed to an agent level. The environment, in the form of the predator, provided reinforcement values (the survival ratio) that allowed the GA to reinforce good behaviour and suppress bad behaviour in subsequent generations. While GAs can be a very powerful form of RL, they work well only in situations they can model effectively. In this case study, GAs were used to model the agent's reaction to its knowledge of the environment, by generating a mapping of environment state (ES) to action state. This is a very simple model that can be expanded in numerous ways. It accounted for only a single predator and a single prey, but by using a more sophisticated chromosome it is possible to extend the behaviour mapping to account for multiple predators on multiple vectors. Additionally, the pacman agents currently have no motive to consume the pellets; their only goal is to stay alive. A good extension would be to add a pellet vector to the model, so that the pacman agents would have a more sophisticated goal than simply survival.


7 Limitations of GAs

7.1 Overview

While GAs and other evolutionary strategies are known to be good general problem-solving tools [Grupe & Jooste, 2004], they have their limitations, as with any other technique. Foremost among these is that GAs are not guaranteed to find an optimal solution to a problem, but rather find the best solution possible given their particular structure and design [Grupe & Jooste, 2004]. This section will discuss the downfalls and limitations of GAs.

7.2 The Right Tool for the Right Problem

The criteria for a problem that is well suited to a GA have already been discussed; however, the pitfalls of GAs were only alluded to. Such a one-sided discussion might give the false impression that GAs are the perfect problem-solving tool. GAs are a framework for solving problems, not a specific algorithm or process [Holland, 1975]. As such, there is no definitive way to deploy a GA, and a great deal of design and careful consideration is required before one may be used [Dorigo & Colombetti, 1998]. This means that while it may be possible to use a GA to solve a wide range of problems, there may often be alternative solutions that are simpler and more precisely modelled, without requiring as much design [Grupe & Jooste, 2004]. In such cases, GAs are probably the wrong tool to use. In particular, Grupe and Jooste [Grupe & Jooste, 2004] provide a set of limitations of GAs:

1. It may be impossible to frame a particular problem in the mathematical manner that is required by a GA; such problems may be difficult or impossible to represent in the GA framework.

2. The design of a GA requires a developer with expert knowledge of both the system in question and the GA principles and framework.

3. If structured improperly, GAs will produce chromosomes that converge on a local maximum, reducing the GA's effectiveness to that of a hill-climbing search. In such a situation, the chromosome population lacks sufficient diversity, and crossover produces nearly identical offspring. This leaves mutation to produce the goal state, which essentially leaves the search process to random chance.

4. GAs typically use random number generators, which may make it impossible to reproduce the results of previous runs.

5. The coding of the genes and chromosomes in a GA must be done appropriately, which may be very difficult.

7.3 Summary of GA limitations

While GAs are effective at solving problems that can be represented well within the GA framework [Rothlauf, 2006], they should not be used in cases where one struggles to force the problem to meet GA criteria. The key issues arising from the use of GAs stem from the fact that GAs are a general problem-solving framework, and not a specific design [Holland, 1975]. In other cases, a GA may be usable to represent a problem, but may still be computationally impractical, as GAs may run for a very long time without producing an optimal solution [Campbell et al., 2007]. Thus, for certain problems GAs may be impractical to implement, or may never produce an optimal solution.

8 Conclusion

GAs are used to solve a wide variety of problems [Melanie, 1999], and are currently used as a key component of the LCSs [Holland & Reitman, 1978] that autonomous agents use to learn and adapt to their environment [Dorigo & Colombetti, 1998]. This provides the crucial advantage of not requiring the designer to consider and implement all possible situations that the agent might encounter, as this is not feasible [Dorigo & Colombetti, 1998]. The case study in section 6 was used to illustrate how a GA can introduce variation and adaptation to autonomous agents, and exemplified the process of breaking down a problem and reconstructing it in the framework of a GA.

The key issue associated with GAs is perhaps also one of their greatest strengths. While a GA may be adapted to many different problems [Carnahan & Simha, 2001], this loose framework can also make it difficult to apply a GA to a specific problem. While GAs are not a perfect problem-solving tool, they provide a model for adaptation based on the same process that created the only known intelligence at the level of strong AI: human beings. When used correctly, they can produce complex, intelligent behaviour through natural selection, rather than through brute-force coding. In the endeavour to produce strong AI, it would appear that GAs will likely play a role, perhaps in the form of the rule discovery element of a more sophisticated LCS. While this is just speculation, it is the opinion of this author that the same process which created human intelligence can be utilized to create a sophisticated Artificial Intelligence.


References

[Campbell et al., 2007] Campbell, Neil A., Reece, Jane B., Urry, Lisa A., Cain, Michael L., Wasserman, Steven A., Minorsky, Peter V., & Jackson, Robert B. 2007. Biology, 8th edition.

[Carnahan & Simha, 2001] Carnahan, Joseph, & Simha, Rahul. 2001. Nature's Algorithms. IEEE Potentials, 21-24.

[Dorigo & Colombetti, 1998] Dorigo, Marco, & Colombetti, Marco. 1998. Robot Shaping: An Experiment in Behavior Engineering. Intelligent Robotics and Autonomous Agents series, vol. 2. The MIT Press.

[Grupe & Jooste, 2004] Grupe, Fritz H., & Jooste, Simon. 2004. Genetic algorithms: A business perspective. Information Management & Computer Security, 12(3), 288-297.

[Holland, 1975] Holland, J. H. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press.

[Holland & Reitman, 1978] Holland, John H., & Reitman, Judith S. 1978. Cognitive systems based on adaptive algorithms. Cognitive Systems, 49.

[Klein & DeNero, n.d.] Klein, Dan, & DeNero, John. n.d. The Pacman Projects (Multi Agent).

[Levins, 1968] Levins, Richard. 1968. Evolution in Changing Environments. Princeton University Press.

[Melanie, 1999] Melanie, Mitchell. 1999. An Introduction to Genetic Algorithms. 5th edn. A Bradford Book, The MIT Press.

[Moreno & Ibáñez, 1997] Moreno, A., & Ibáñez, J. 1997. Artificial life: a bridge toward a new artificial intelligence. Brain and Cognition, 34(1), 1-4.

[Reeves & Rowe, 2003] Reeves, C., & Rowe, J. 2003. Genetic Algorithms: Principles and Perspectives: A Guide to GA Theory. Kluwer Academic Publishers.

[Rothlauf, 2006] Rothlauf, Franz. 2006. Representations for Genetic and Evolutionary Algorithms. Springer Science+Business Media.

[Sutton et al., 1992] Sutton, R. S., Barto, A. G., & Williams, R. J. 1992. Reinforcement Learning is Direct Adaptive Optimal Control. IEEE Control Systems Magazine, 12(2), 19-22.


Obtaining the files

1. SVN repository: svn co svn://hake.cs.umanitoba.ca/pacman
2. Genetic test: http://dl.dropbox.com/u/12363834/Genetic
3. Random test: http://dl.dropbox.com/u/12363834/Fixed

Running the simulation


To run the genetic simulation:

./rungp.py -q -o outfilename

The result for the goal chromosome 011000110100100111111000101100110001110001001001, the 44th chromosome generated in the third generation of the run, can be viewed at http://www.youtube.com/watch?v=5w Ks2patU (figure 7).

When purely random chromosomes were generated, approximately one out of every thousand produced a goal state. This was done to act as a control, to demonstrate that the GA does indeed produce better results than random chance alone. To run the simulation in this manner, execute:

./rungp.py -i 100000 -q -o outfilename
cat outfilename | grep -c True

where the number following -i is the number of iterations to be performed. Set this to be high, and then examine the result file for a tuple containing True, as indicated above. Once a goal chromosome is obtained, the results can be viewed at any time using:

./rungp.py -i 1 -a chrome=1101011...

