Johnson for CS 4633, Assignment #4
Abstract

A genetic algorithm (GA) is used to evolve script-like agents to play a real time strategy game (RTS). The GA is found to be successful at finding simple, effective strategies, but does not easily evolve more complex behavior.

Introduction

This project explored the viability of evolving rule-based agents to play a simple game. These agents were matched against each other in a simple RTS created by the author, called Melete's Game. This game involved a small number of game regions arranged in a four by four grid. Each region contained two zones. Two types of units could be built and used to play the game: a gatherer unit to collect resources, and a combat unit that could attack and defend against other units.

Each agent consisted of n commands, where each command was evaluated each turn of game play. Each command was given a probability and an enable/disable flag. When a command was evaluated, it would check to see if it was enabled and then test the probability against a uniformly distributed random number. The command was executed if both conditions were met. Each agent could issue commands to build units, select groups of regions, target individual regions, move units between regions and zones, and modify the flags and variables that controlled the agent's operation.

To evolve better agents, an agent was randomly generated, and then used to create a pool of child agents. These children were generated using a standard crossover and mutation scheme. Each child played a game against the parent and the scores of each match were recorded. The top x percent of tested agents were selected to create a new generation via normal genetic reproduction. This behavior was repeated until the population stopped improving. The agent which had achieved the highest score was selected as the new opponent, and the cycle was repeated. These high scoring agents were saved to a file when identified. After a sufficient number of agents were created this way, the selection algorithm was modified to play games against the latest winner and a random selection of previous winners. This encouraged the development of more robust strategies.
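The per-turn command check described in the Introduction can be sketched in Java as follows. This is a minimal illustration, not the author's code: the class name CommandGate and its fields are hypothetical, and java.util.Random is assumed as the source of uniformly distributed random numbers.

```java
import java.util.Random;

/** Hypothetical sketch of the per-turn gate on a command; names are illustrative. */
class CommandGate {
    final boolean enabled;     // the command's enable/disable flag
    final double probability;  // chance of executing on a given turn, in [0, 1]

    CommandGate(boolean enabled, double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.enabled = enabled;
        this.probability = probability;
    }

    /**
     * Returns true only if both conditions hold: the command is enabled,
     * and a uniform draw from [0, 1) falls below the command's probability.
     */
    boolean shouldExecute(Random rng) {
        return enabled && rng.nextDouble() < probability;
    }
}
```

Under this sketch a disabled command never runs, a command with probability 1.0 runs every turn while it is enabled, and a command with probability 0.0 never runs.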
Methods

This project was implemented in Java. The project required work in three main areas: simulating the game, running the agents, and evolving the agent pool.

The Game

The game was played on a four by four grid of regions. Each region was differentiated from the other regions by a number of features.

Feature | Use
Resource Quantity | Gatherer Units harvest resources, which can be used to build more units and contribute directly to the agent's score.
Resource Collection Rate | This value affects how quickly a unit can collect resources.
Resource Accessibility | Resource collection increases linearly with the number of Gatherer Units present until a threshold is reached. After the threshold is reached, resource collection suffers from diminishing returns. This feature determines the threshold number.
Combat Bonus | This feature affects how easy it is to attack or defend this region.

Figure 1 – Region Features for Melete's Game

Each region has two zones, called offensive and defensive. Each zone has unique values for each feature above. Since each zone has separate resources, it is necessary to send units to both zones to fully collect the resources of a region. Units in the defensive zone receive a bonus to defend against attacks, while the units in the offensive zone receive a bonus to attack. Units will only attack into the offensive zone until those units are destroyed, and then they will attack into the defensive zone.

Units are built in a reserve area and may be moved between regions and the reserves with a command. Units are always moved into and from the defensive zone. Units may be moved between the defensive and offensive zones using another command. Units automatically perform their functions while deployed to a region. Gatherer Units will collect resources from the zone they are in until the resources there run out. Combat Units will attack a random enemy unit each round. However, Combat Units will attack an enemy combat unit if one is present before they will attack an enemy gatherer unit. An attack consists of a random number being generated for each unit involved. If the attacker's number is higher, the target is damaged based on the difference in the rolls. Once a unit has been sufficiently damaged, it is removed from the game. Damaged units are repaired while in the reserves.

The game ends after 1000 turns. Each agent is given points based on how many resources it collected, how many units it built, and how many enemy units it destroyed. The agent with the highest score wins the match. For selection purposes, the score of each match is recorded as the difference between the scores of the players, and the agents with the highest overall score are selected for reproduction.

The Agents

Each agent has sixteen commands. Each command is evaluated each turn based on its probability and an enable/disable flag. When a command is executed, some event takes place that modifies either the game state or the state of the agent itself.

Command | Effect
(Construction)
Build Unit | Create a new unit in the reserves if the player can pay the cost of the unit.
(Movement)
Deploy Unit | Send a unit from the reserves to a target region.
Recall Unit | Return a unit to the reserves from a target region.
Advance Unit | Move a unit from the Defensive to the Offensive Zone.
Retreat Unit | Move a unit from the Offensive to the Defensive Zone.
(Targeting)
Set Target | Select a Target from a group. Targets are used with Movement commands.
Make Group | Select a set of Regions from the game board. This process takes into account a number of different features of the region.
Modify Group | Create a new group based on an existing group.
(Agent State)
Set Flag | Set a flag to be enabled or disabled based on game defined and agent defined variables.
Set Variable | Store a value in an agent defined variable.

Figure 2 – Command Descriptions for Agents

The Genetic Algorithm

Each agent can be represented as a fixed length binary string. This string is created by concatenating a string representation of each command together. The strings are formatted as follows:

Name | Purpose
Flag | This number determines which flag to watch for this command.
Probability | This determines how likely this command is to be executed each turn.
Command Code | Determines what the command does, see table above.
Command Type | The more complex commands can be completed many different ways. These commands use an extra field to encode that information.
Parameters | These parameters determine what is affected or used while evaluating a command.

Figure 3 – Field Descriptions for Chromosomal Representation of a Command
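The concatenated representation can be illustrated with a short Java sketch. It assumes that every command occupies the same number of bits, which is consistent with a fixed length string but is not stated outright in the report. The class Chromosome and its methods are hypothetical, and the report does not give field widths, so the field helper takes them as arguments rather than guessing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for a chromosome built by concatenating command bit strings. */
class Chromosome {
    /** Builds the agent string by concatenating the per-command bit strings in order. */
    static String join(List<String> commands) {
        return String.join("", commands);
    }

    /** Splits a chromosome back into commandCount equal-length command strings (an assumption). */
    static List<String> split(String chromosome, int commandCount) {
        if (commandCount <= 0 || chromosome.length() % commandCount != 0) {
            throw new IllegalArgumentException("length must be a multiple of commandCount");
        }
        int width = chromosome.length() / commandCount;
        List<String> commands = new ArrayList<>();
        for (int i = 0; i < commandCount; i++) {
            commands.add(chromosome.substring(i * width, (i + 1) * width));
        }
        return commands;
    }

    /** Reads the bits in [start, start + width) as an unsigned integer field. */
    static int field(String commandBits, int start, int width) {
        return Integer.parseInt(commandBits.substring(start, start + width), 2);
    }
}
```

With sixteen commands per agent, split(chromosome, 16) would recover one bit string per command, and field could then read the flag, probability, command code, command type, and parameter values once their widths are known.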
110100001110010010111000101101110100100110100101110
101011101010010011010101011000001100001100100001110
110011111110001001011011111101101011110010001011011
110110111000111111001111100001110111011000111010101
011101011100000100010011100100001101101100100010000
001111101110001111011001001010110011000011111001000
000010011000011000001110000111011111110011001000100
100001110011011101001011010010101000101001000000001
100100001001000010001101011111111101110001010001111
111010001101011110001111100110000100100010101111110
100101001010101010001111101000111110001001010001011
101001001111100110010011101111001110001100001101010
001110101110011010001000011100110110001000101101010
111001110111110110010111110011011001101011010100001
010011110100010010000000011000010111111011000111011
101011101011110101010100010100100100100001001110000

Figure 4 – Example of a Complete Agent Chromosome

New agents were created from older agents via a single crossover point and a small mutation probability. A pool of 80 agents was tested each cycle. The best agents were allowed to reproduce to make the next generation of agents. In this implementation, only the top x percent was allowed to reproduce. At first the reproducing percentage was 20%, but this was later changed to 35%. During each reproduction cycle, a child had a 10% chance to come exclusively from one parent and a 90% chance to be composed of a combination of two parents, where a random crossover point was selected, and each parent contributed to one side of the crossover. After this operation was completed, each bit had a 3% chance to be set to a random value. This process continued until the population stabilized to a victory condition.
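The reproduction step can be sketched in Java using the percentages given above. This is an illustration under stated assumptions rather than the author's implementation: the class Reproduction and its method names are hypothetical, the choice of which parent is copied in the 10% case is assumed to be random, and the crossover point is restricted so that both parents always contribute at least one bit.

```java
import java.util.Random;

/** Hypothetical sketch of single-point crossover followed by per-bit mutation. */
class Reproduction {
    static final double CLONE_CHANCE = 0.10;    // child comes exclusively from one parent
    static final double MUTATION_CHANCE = 0.03; // per-bit chance of being re-randomized

    /** Returns a copy of one parent (10%) or a single-point combination of both (90%). */
    static String crossover(String a, String b, Random rng) {
        if (a.length() != b.length() || a.length() < 2) {
            throw new IllegalArgumentException("parents must share a length of at least 2");
        }
        if (rng.nextDouble() < CLONE_CHANCE) {
            return rng.nextBoolean() ? a : b; // assumption: the copied parent is picked at random
        }
        int point = 1 + rng.nextInt(a.length() - 1); // 1..length-1, so both sides are non-empty
        return a.substring(0, point) + b.substring(point);
    }

    /** Sets each bit to a random value with 3% probability; the new value may equal the old one. */
    static String mutate(String bits, Random rng) {
        StringBuilder out = new StringBuilder(bits);
        for (int i = 0; i < out.length(); i++) {
            if (rng.nextDouble() < MUTATION_CHANCE) {
                out.setCharAt(i, rng.nextBoolean() ? '1' : '0');
            }
        }
        return out.toString();
    }

    /** Crossover first, then mutation, in the order the text describes. */
    static String makeChild(String a, String b, Random rng) {
        return mutate(crossover(a, b, rng), rng);
    }
}
```

Because a mutated bit is set to a random value rather than flipped, roughly half of the selected bits keep their old value, so the effective change rate in this sketch is about 1.5% per bit.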
Here we required that the agents be able to win 85% of all games they played against the current opponent. Additionally, the agents were allowed to continue evolving until the fitness score of the pool stopped increasing. To prevent negative changes in the population, the population saved its state whenever it beat its previous best score, and had a chance of reverting to that state each time it failed to improve. Here the chance to revert per failure was 8%. If the population reverted while it met the victory condition, this was interpreted as a local maximum, and the population's best member was selected as the new opponent to play against.

Results

The author had two objectives for this project. The first was to see if the agent representation and evolution scheme could produce scripts that would perform well in the designed game. The second was to see if a competitive, complex game could force the evolution of state-based behavior in the agents. The project was completely successful on the first goal, and a complete failure on the second.

The population of agents was able to rapidly evolve to capture nearly all available resources. The evolution scheme was able to determine the best way to win very quickly, and then attempted to maximize that strategy. While this produced very successful agents, it did not produce interesting or varied behavior. The behavior of the agents was determined solely by the reward scheme which determined the victor. Since there was no extra reward for interesting behavior, this makes perfect sense.

Improvements

Since the project was successful at finding a good strategy for a given environment, but failed at producing interesting state transitions, the author would suggest dividing up the task explicitly, so that a GA evolves strategies without state transitions to fit a particular environment (the behavior of the opponent can be included in the environment) and then evolves another agent to select the best strategy based on features it detects in its current environment. To develop different strategies, it seems likely that the learning environment would have to be varied. This method would mimic the situations that encouraged diverse strategies in the real world.