
Using a Genetic Algorithm to Evolve Rule-Based Agents to Play a Simple Strategy Game

William L. Johnson

CS 4633, Assignment #4

A genetic algorithm (GA) is used to evolve script-like agents to play a real-time strategy (RTS) game. The GA is found to be
successful at finding simple, effective strategies, but does not easily evolve more complex behavior.


This project explored the viability of evolving rule-based agents to play a simple game. Each agent
consisted of n commands, each of which was evaluated on every turn of game play. Each command
was given a probability and an enable/disable flag. When a command was evaluated, it checked
whether it was enabled and then tested its probability against a uniformly distributed random number. The
command was executed only if both conditions were met.

These agents were matched against each other in a simple real-time strategy (RTS) game created by the
author, called Melete's Game. The game board consisted of a small number of regions arranged in a
four-by-four grid, and each region contained two zones. Two types of units could be built and used to play the
game: a gatherer unit that collects resources, and a combat unit that can attack and defend against other
units. Each agent could issue commands to build units, move units between regions and zones, target
individual regions, select groups of regions, and modify the flags and variables that controlled the
agent's operation.

To evolve better agents, an agent was randomly generated and then used to create a pool of child
agents. These children were generated using a standard crossover and mutation scheme. Each child
played a game against the parent, and the score of each match was recorded. The top x percent of
tested agents were selected to create a new generation via normal genetic reproduction. This process
was repeated until the population stopped improving. The agent that had achieved the
highest score was then selected as the new opponent, and the cycle was repeated. These high-scoring agents
were saved to a file as they were identified. After a sufficient number of agents had been created this way, the
selection algorithm was modified so that each candidate played games against the latest winner and a random selection of
previous winners. This encouraged the development of more robust strategies.
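The opponent-selection scheme described above can be sketched as follows. This is a minimal sketch, not the project's code: the class and method names (`WinnerArchive`, `opponentsFor`, `fitness`) are hypothetical, and scoring a candidate by summing its score differences across opponents is an assumption about how the per-match results were combined.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

// Hypothetical in-memory archive of winners (the project saved them to a file).
class WinnerArchive<A> {
    private final List<A> winners = new ArrayList<>();
    private final Random rng;

    WinnerArchive(Random rng) { this.rng = rng; }

    // Record a new high-scoring agent; it becomes the latest opponent.
    void addWinner(A agent) { winners.add(agent); }

    // The latest winner first, followed by up to `extra` distinct earlier winners chosen at random.
    List<A> opponentsFor(int extra) {
        if (winners.isEmpty()) throw new IllegalStateException("no winners yet");
        List<A> opponents = new ArrayList<>();
        opponents.add(winners.get(winners.size() - 1));
        List<A> earlier = new ArrayList<>(winners.subList(0, winners.size() - 1));
        Collections.shuffle(earlier, rng);
        opponents.addAll(earlier.subList(0, Math.min(extra, earlier.size())));
        return opponents;
    }

    // Assumed fitness: the sum of recorded score differences (candidate minus opponent).
    double fitness(A candidate, int extra, ToDoubleBiFunction<A, A> scoreDifference) {
        double total = 0;
        for (A opponent : opponentsFor(extra)) {
            total += scoreDifference.applyAsDouble(candidate, opponent);
        }
        return total;
    }
}
```

Before enough winners exist, `opponentsFor` simply returns the latest winner, which matches the original parent-versus-child setup.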

This project was implemented in Java. The project required work in three main areas: simulating the
game, running the agents, and evolving the agent pool.

The Game

The game was played on a four-by-four grid of regions. Each region was distinguished from the other
regions by a number of features, and each region had two zones with their own feature values.

| Feature | Use |
|---|---|
| Resource Quantity | Gatherer Units harvest resources, which can be spent to build more units and contribute directly to the agent's score. The agent with the highest score wins the match, and the agents with the highest overall scores are selected for reproduction. |
| Resource Collection Rate | This value affects how quickly a unit can collect resources. |
| Resource Accessibility | Resource collection increases linearly with the number of Gatherer Units present until a threshold is reached. After the threshold is reached, resource collection suffers from diminishing returns. This feature determines the threshold number. |
| Combat Bonus | This feature affects how easy it is to attack or defend this region. |

Figure 1 – Region Features for Melete's Game
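The Resource Accessibility threshold can be illustrated with a small harvest function. The source states only that collection grows linearly up to the threshold and then suffers diminishing returns, so halving each additional gatherer's contribution past the threshold, along with the name `ZoneHarvest.harvestPerTurn`, is a hypothetical choice.

```java
// Hypothetical per-turn harvest for one zone.
class ZoneHarvest {
    // Up to `accessibility` gatherers each add `collectionRate`; each gatherer beyond that
    // adds half as much as the one before (assumed form of the diminishing returns).
    // The result is capped by the resources remaining in the zone.
    static double harvestPerTurn(int gatherers, double collectionRate, int accessibility, double remaining) {
        double total = 0;
        double marginal = collectionRate;
        for (int i = 0; i < gatherers; i++) {
            if (i >= accessibility) marginal *= 0.5;  // past the threshold
            total += marginal;
        }
        return Math.min(total, remaining);
    }
}
```

For example, with a collection rate of 2 and a threshold of 4, six gatherers yield 8 + 1 + 0.5 = 9.5 per turn rather than 12.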

Each region has two zones, called offensive and defensive. Each zone has its own values for each of the
features above. In addition, units in the defensive zone receive a bonus when defending against attacks, while
units in the offensive zone receive a bonus when attacking. Since each zone has separate resources, units
must be sent to both zones to fully collect the resources of a region.

Units automatically perform their functions while deployed to a region. Gatherer Units collect
resources from the zone they are in until the resources there run out. Combat Units attack a
random enemy unit each round. Combat Units attack only into the offensive zone until the enemy units there are
destroyed, and then they attack into the defensive zone. A Combat Unit attacks an enemy
combat unit, if one is present, before it attacks an enemy gatherer unit. An attack consists of a
random number being generated for each unit involved. If the attacker's number is higher, the target is
damaged by an amount based on the difference between the rolls. Once a unit has been sufficiently damaged, it is removed
from the game.
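A sketch of the target-priority and roll-difference rules might look like this. The hit points (`MAX_DAMAGE`), roll range (`DIE`), additive zone bonuses, and all class names are assumptions; the source specifies only the targeting order, the random choice of target, one roll per unit, and damage based on the roll difference. The sketch also reads the combat-before-gatherer rule as applying within whichever zone is currently being attacked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of one attack in Melete's Game.
class Combat {
    enum Zone { OFFENSIVE, DEFENSIVE }   // declaration order = targeting order
    enum Kind { GATHERER, COMBAT }

    static final class Unit {
        final Kind kind;
        final Zone zone;
        int damage;
        Unit(Kind kind, Zone zone) { this.kind = kind; this.zone = zone; }
    }

    static final int MAX_DAMAGE = 10;  // assumed damage at which a unit is removed
    static final int DIE = 20;         // assumed roll range: 0..DIE-1

    // Offensive-zone units are targeted before defensive ones; combat units before gatherers.
    static List<Unit> candidates(List<Unit> enemies) {
        for (Zone zone : Zone.values()) {
            for (Kind kind : new Kind[] {Kind.COMBAT, Kind.GATHERER}) {
                List<Unit> matches = new ArrayList<>();
                for (Unit u : enemies) if (u.zone == zone && u.kind == kind) matches.add(u);
                if (!matches.isEmpty()) return matches;
            }
        }
        return new ArrayList<>();
    }

    // Both sides roll; if the attacker's total is higher, the target takes the difference as damage.
    static void attack(List<Unit> enemies, int attackBonus, int defenseBonus, Random rng) {
        List<Unit> targets = candidates(enemies);
        if (targets.isEmpty()) return;
        Unit target = targets.get(rng.nextInt(targets.size()));  // random target among candidates
        int attackRoll = rng.nextInt(DIE) + attackBonus;
        int defenseRoll = rng.nextInt(DIE) + defenseBonus;
        if (attackRoll > defenseRoll) {
            target.damage += attackRoll - defenseRoll;
            if (target.damage >= MAX_DAMAGE) enemies.remove(target);
        }
    }
}
```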

Units are built in a reserve area and may be moved between regions and the reserves with a command.
Units are always moved into and out of a region through its defensive zone. Units may be moved between the defensive
and offensive zones using another command. Damaged units are repaired while in the reserves.
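The movement rules amount to a small state machine. In this hypothetical encoding (`Movement`, `Location`, and `Move` are illustrative names), an illegal move leaves the unit where it is; the source does not say how the game handles such commands.

```java
// Hypothetical encoding of unit movement between the reserves and a region's two zones.
class Movement {
    enum Location { RESERVES, DEFENSIVE, OFFENSIVE }
    enum Move { DEPLOY, RECALL, ADVANCE, RETREAT }

    // Returns the new location, or the old one if the move is not legal from there.
    static Location apply(Location from, Move move) {
        switch (move) {
            case DEPLOY:  return from == Location.RESERVES  ? Location.DEFENSIVE : from; // enter via defensive zone
            case RECALL:  return from == Location.DEFENSIVE ? Location.RESERVES  : from; // leave via defensive zone
            case ADVANCE: return from == Location.DEFENSIVE ? Location.OFFENSIVE : from;
            case RETREAT: return from == Location.OFFENSIVE ? Location.DEFENSIVE : from;
            default: throw new IllegalArgumentException("unknown move " + move);
        }
    }
}
```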

The game ends after 1000 turns. Each agent is awarded points based on how many resources it collected,
how many units it built, and how many enemy units it destroyed. The agent with the highest
score wins the match. For selection purposes, the score of each match is recorded as the difference
between the two players' scores.
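The recorded match score can be sketched as below. The source does not give the point values for resources collected, units built, or enemy units destroyed, so the weights in the hypothetical `MatchScore` class are placeholders.

```java
// Hypothetical end-of-game tally; the weights are placeholder assumptions.
class MatchScore {
    static final double RESOURCE_WEIGHT = 1.0;
    static final double BUILT_WEIGHT = 5.0;
    static final double KILL_WEIGHT = 10.0;

    static double points(double resourcesCollected, int unitsBuilt, int enemiesDestroyed) {
        return RESOURCE_WEIGHT * resourcesCollected
             + BUILT_WEIGHT * unitsBuilt
             + KILL_WEIGHT * enemiesDestroyed;
    }

    // The value recorded for selection: this player's points minus the opponent's.
    static double recordedScore(double myPoints, double opponentPoints) {
        return myPoints - opponentPoints;
    }
}
```

Recording the difference rather than the raw total rewards an agent both for its own gains and for denying points to its opponent.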

The Agents

Each agent has sixteen commands. Each command is evaluated every turn based on its probability and an
enable/disable flag. When a command is executed, some event takes place that modifies either the
game state or the state of the agent itself.
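Per-turn evaluation of an agent's commands can be sketched as follows, assuming each command stores the index of the flag it watches, its firing probability, and an action. The `AgentTurn` and `Command` names are hypothetical, and the `Runnable` stands in for whatever effect the real command has.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of one turn of command evaluation.
class AgentTurn {
    static final class Command {
        final int flagIndex;       // which agent flag this command watches
        final double probability;  // chance of firing on a turn, in [0, 1]
        final Runnable action;     // effect on the game or agent state
        Command(int flagIndex, double probability, Runnable action) {
            this.flagIndex = flagIndex;
            this.probability = probability;
            this.action = action;
        }
    }

    // Evaluates every command once, in order; returns how many fired.
    static int runTurn(List<Command> commands, boolean[] flags, Random rng) {
        int fired = 0;
        for (Command c : commands) {
            // Fire only if the watched flag is enabled and a uniform draw in [0, 1) falls below the probability.
            if (flags[c.flagIndex] && rng.nextDouble() < c.probability) {
                c.action.run();
                fired++;
            }
        }
        return fired;
    }
}
```

In this sketch commands run in list order, so a Set Flag command earlier in the list affects later commands in the same turn; whether the original game behaved this way is not stated.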

| Command | Effect |
|---|---|
| Build Unit | Create a new unit in the reserves if the player can pay the cost of the unit. |
| Deploy Unit | Send a unit from the reserves to a target region. |
| Recall Unit | Return a unit to the reserves from a target region. |
| Advance Unit | Move a unit from the Defensive zone to the Offensive zone. |
| Retreat Unit | Move a unit from the Offensive zone to the Defensive zone. |
| Set Target | Select a target from a group. Targets are used with movement commands. |
| Make Group | Select a set of regions from the game board. This process takes into account a number of different features of the region. |
| Modify Group (Agent State) | Create a new group based on an existing group. |
| Set Flag | Set a flag to be enabled or disabled based on game-defined and agent-defined variables. |
| Set Variable | Store a value in an agent-defined variable. |

Figure 2 – Command Descriptions for Agents

The Genetic Algorithm

Each agent can be represented as a fixed-length binary string. This string is created by concatenating the
binary representations of the agent's commands. Each command's substring is formatted as follows:

| Name | Purpose |
|---|---|
| Flag | This number determines which flag to watch for this command. |
| Probability | This determines how likely this command is to be executed each turn. |
| Command | Determines what the command does; see the commands table in Figure 2. |
| Command Code | The more complex commands can be completed in many different ways. These commands use an extra field to encode that. |
| Parameters | These parameters determine what is affected or used while evaluating a command. |

Figure 3 – Field Descriptions for Chromosomal Representation of a Command
Figure 4 – Example of a Complete Agent Chromosome
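Decoding a chromosome into its sixteen commands can be sketched as below. The field widths are hypothetical, since the source does not give them, and so are the names `Chromosome`, `Gene`, and `decode`. Mapping the 4-bit command field onto the ten command types with a modulo is also an assumption; it means some types are encoded twice as often as others.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical chromosome layout; all field widths are assumptions.
class Chromosome {
    static final int FLAG_BITS = 3, PROB_BITS = 7, COMMAND_BITS = 4, CODE_BITS = 3, PARAM_BITS = 8;
    static final int GENE_BITS = FLAG_BITS + PROB_BITS + COMMAND_BITS + CODE_BITS + PARAM_BITS; // 25
    static final int COMMANDS_PER_AGENT = 16;
    static final int COMMAND_TYPES = 10;  // Build Unit ... Set Variable

    static final class Gene {
        final int flag, commandType, commandCode, parameters;
        final double probability;
        Gene(int flag, double probability, int commandType, int commandCode, int parameters) {
            this.flag = flag; this.probability = probability; this.commandType = commandType;
            this.commandCode = commandCode; this.parameters = parameters;
        }
    }

    // Reads `width` bits starting at `offset` as an unsigned, most-significant-bit-first integer.
    static int readBits(boolean[] bits, int offset, int width) {
        int value = 0;
        for (int i = 0; i < width; i++) value = (value << 1) | (bits[offset + i] ? 1 : 0);
        return value;
    }

    // Splits the fixed-length string into one gene per command and decodes each field in order.
    static List<Gene> decode(boolean[] bits) {
        if (bits.length != GENE_BITS * COMMANDS_PER_AGENT) throw new IllegalArgumentException("wrong length");
        List<Gene> genes = new ArrayList<>();
        for (int g = 0; g < COMMANDS_PER_AGENT; g++) {
            int pos = g * GENE_BITS;
            int flag = readBits(bits, pos, FLAG_BITS);                        pos += FLAG_BITS;
            int prob = readBits(bits, pos, PROB_BITS);                        pos += PROB_BITS;
            int command = readBits(bits, pos, COMMAND_BITS) % COMMAND_TYPES;  pos += COMMAND_BITS;
            int code = readBits(bits, pos, CODE_BITS);                        pos += CODE_BITS;
            int params = readBits(bits, pos, PARAM_BITS);
            // Scale the probability field to [0, 1].
            genes.add(new Gene(flag, prob / (double) ((1 << PROB_BITS) - 1), command, code, params));
        }
        return genes;
    }
}
```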

New agents were created from older agents via a single crossover point and a small mutation
probability. In this implementation, a child had a 10% chance to be copied entirely from one parent and
a 90% chance to be composed from two parents: a random crossover point was
selected, and each parent contributed one side of the crossover. After this operation was completed,
each bit had a 3% chance to be set to a random value. During each reproduction cycle, only the top x
percent of the pool was allowed to reproduce. At first the reproducing percentage was 20%, but this was later
changed to 35%.
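The reproduction step can be sketched as follows. The clone chance, single-point crossover, and 3% per-bit reset follow the description above; the names `Reproduction`, `makeChild`, and `selectTop`, the rule of always keeping at least two parents, and rounding the selected fraction to the nearest whole agent are assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of reproduction for fixed-length bit-string chromosomes.
class Reproduction {
    static final double CLONE_CHANCE = 0.10;     // child copies a single parent
    static final double MUTATION_CHANCE = 0.03;  // per bit: reset to a random value

    // Parents must have equal length of at least 2 bits.
    static boolean[] makeChild(boolean[] a, boolean[] b, Random rng) {
        boolean[] child;
        if (rng.nextDouble() < CLONE_CHANCE) {
            child = (rng.nextBoolean() ? a : b).clone();
        } else {
            int cut = 1 + rng.nextInt(a.length - 1);                // crossover point in [1, length-1]
            child = new boolean[a.length];
            System.arraycopy(a, 0, child, 0, cut);                  // left side from parent a
            System.arraycopy(b, cut, child, cut, a.length - cut);   // right side from parent b
        }
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < MUTATION_CHANCE) child[i] = rng.nextBoolean();
        }
        return child;
    }

    // Indices of the top `fraction` of the pool by fitness (e.g. 0.20, later 0.35).
    static List<Integer> selectTop(double[] fitness, double fraction) {
        int keep = Math.max(2, (int) Math.round(fitness.length * fraction));
        return IntStream.range(0, fitness.length).boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> fitness[i]).reversed())
                .limit(keep)
                .collect(Collectors.toList());
    }
}
```

Setting a bit to a random value leaves it unchanged about half the time, so the effective flip rate is roughly 1.5% per bit. With a pool of 80, a 35% cutoff keeps 28 parents.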

A pool of 80 agents was tested each cycle. The best agents were allowed to reproduce to make the next
generation of agents. This process continued until the population met a victory condition:
here, the agents were required to win 85% of all games they played against the current
opponent. Additionally, the agents were allowed to continue evolving until the fitness score of the pool
stopped increasing. To guard against negative changes in the population, the population saved its state
whenever it beat its previous best score, and had a chance of reverting to that state each time it failed to
improve. Here the chance to revert per failure was 8%. If the population reverted while it met the
victory condition, this was interpreted as a local maximum, and the population's best member was
selected as the new opponent to play against.
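The save-and-revert rule and the local-maximum test can be sketched as a small controller. This is a hypothetical structure (`CycleController`, `Outcome`): it assumes the caller passes in a copy of the population to snapshot, and that "met the victory condition" means the population's win rate against the current opponent is at least 85%.

```java
import java.util.Random;

// Hypothetical per-generation controller for one evolution cycle.
class CycleController<P> {
    static final double WIN_RATE_TARGET = 0.85;
    static final double REVERT_CHANCE = 0.08;

    enum Outcome { CONTINUE, REVERTED, LOCAL_MAXIMUM }

    private final Random rng;
    private double bestScore = Double.NEGATIVE_INFINITY;
    private P snapshot;

    CycleController(Random rng) { this.rng = rng; }

    // Call once per generation with a copy of the population, its fitness score, and its win rate.
    Outcome afterGeneration(P population, double score, double winRate) {
        if (score > bestScore) {                 // beat the previous best: save state
            bestScore = score;
            snapshot = population;
            return Outcome.CONTINUE;
        }
        if (rng.nextDouble() >= REVERT_CHANCE) return Outcome.CONTINUE;  // 8% chance to revert
        // Reverting while the victory condition holds is treated as a local maximum.
        return winRate >= WIN_RATE_TARGET ? Outcome.LOCAL_MAXIMUM : Outcome.REVERTED;
    }

    // The population to restore on REVERTED, or to take the new opponent from on LOCAL_MAXIMUM.
    P savedPopulation() { return snapshot; }
}
```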


The author had two objectives for this project. The first was to see whether the agent representation and
evolution scheme could produce scripts that would perform well in the designed game. The second
was to see whether a competitive, complex game could force the evolution of state-based behavior in the
agents. The project was completely successful on the first goal and a complete failure on the second.
The population of agents was able to rapidly evolve to capture nearly all available resources. The
behavior of the agents was determined solely by the reward scheme that decided the victor. The
evolution scheme determined the best way to win very quickly and then attempted to
maximize that strategy. While this produced very successful agents, it did not produce interesting or
varied behavior. Since there was no extra reward for interesting behavior, this makes perfect sense. To
develop different strategies, it seems likely that the learning environment would have to be varied. This
method would mimic the situations that encourage diverse strategies in the real world.


Since the project was successful at finding a good strategy for a given environment but failed at
producing interesting state transitions, the author would suggest dividing up the task explicitly: a
GA would evolve strategies without state transitions to fit a particular environment (the behavior of the
opponent can be included in the environment), and another agent would then be evolved to select the best strategy
based on features it detects in its current environment.