
Quality of State Improvisation Through Evaluation Function Optimization in Genetic Application Learning

Dharm Singh
College of Technology and Engineering
Maharana Pratap University of Agriculture and Technology
Udaipur (India)
dharm@mpuat.ac.in

Chirag S Thaker, Sanjay M Shah
Department of Computer Science & Engineering
Suresh Gyan Vihar University
Jaipur (India)
chiragthaker@yahoo.com, sanjay_shah_r@yahoo.com

Abstract— Artificial intelligence algorithms have been applied to computer board games since the 1950s. Board games provide competitive, cognitive-learning and dynamic environments that make them an ideal area for computational intelligence theories, architectures, and algorithms. Natural evolution can also be considered a game, in which the rewards for an organism that plays a good game of life are the propagation of its genetic material to its successors and its continued survival. In natural evolution, the fitness of an individual is defined with respect to its competitors and collaborators, as well as to the environment. Evolutionary algorithms follow the same path to evolve game-playing programs. Among all computer board games, Reversi (the Game of Othello), because of its low branching property, allows a playing program to easily defeat humans when designed with strategy-based game moves. From here on, the goal of a computer Reversi game is no longer to challenge or defeat human players but to compete and evolve against other computer programs. This paper mainly highlights the optimization of Reversi program fitness values by applying genetic operators through a linear evaluation function.

Keywords- Game Playing, Board Games, Genetic Algorithm, Reversi (Game of Othello)

I. INTRODUCTION

Games have long dominated a popular area of artificial intelligence (AI) research, and for good reason: they are very challenging on the one hand, and yet easy to formalize and use to develop new AI solution exploration methods on the other. Through games, the efficiency of AI techniques can be measured in terms of the capability to acquire intelligence without putting human lives or property at risk. Most research-based work so far has focused on games that can be described, and solved or won, in a compact form using symbolic representations, such as board and card games. The so-called "good old-fashioned artificial intelligence" techniques work well with them, and to a large extent such techniques were developed, tested and improvised for such games [1][2].

Since the 1990s and the early dawn of the 21st century, the field of game programming has changed radically. The availability of inexpensive yet powerful computer hardware has made it possible to simulate complex physical learning environments, resulting in an exploration of artificially improved soft cognitive moves by computer programs in all sorts of board games. Game-playing programs have become a facet of many people's day-to-day lives [3].

Almost all traditional board games involve a great amount of AI research, as they all have a very high space complexity to traverse before making good moves. These games pose two main challenges: guiding the evolution using human knowledge, and achieving game-playing behaviour that is not only successful, but visibly intelligent to a human observer [4][5].

II. HISTORY OF REVERSI (GAME OF OTHELLO)

The exact origins of Othello are not known, although it is now believed that it originated from the Chinese game 'Fan Mian', which translates to 'reverse' (hence the alternative name Reversi has come up). Until recently the game was primarily played in Japan, where it is the second most popular game next to the Game of Go. In the 1890's a game called Reversi was being marketed in England by Jacques and Sons of London. This game was very similar to the modern game except for two differences:

• Each player was given a set number of discs at the start of the game. If a player ran out of discs, his opponent completed the remainder of the game.

• The game began with an empty board.

In 1975 Goro Hasegawa introduced the modern version of Othello, which has been adopted across the world. In the modern version, players take discs from a central pool and an initial layout of four discs is provided. Othello is fast growing in popularity, with major international tournaments held every year. It has also been a popular test-bed for AI research.

Formally speaking, Othello is a two-person, zero-sum, deterministic, finite board game with perfect information. Two-person zero-sum games are characterized by the fact that either one player wins and the other loses, or neither player wins, resulting in a draw or tie. It is not possible for both players to win.

978-1-4577-0240-2/11/$26.00 ©2011 IEEE 93


Deterministic games are games in which there are no random events based on chance or luck, such as the roll of a dice. In addition, Othello is a finite game, because there is a finite number of moves: at most 60. Othello is a game of perfect information, because all game information, such as the positions of the pieces, is available to both players [6].

III. RULES OF THE GAME

There is a saying that Othello is one of those games that take minutes to learn, but years to master. Great expertise is required to play well, since board positions can change drastically in just a few moves, especially towards the end of the game. The game is played on a chess-like 8 x 8 board with a set of dual-coloured black and white discs. The initial setup of the board contains four discs placed in the centre: 2 white and 2 black. The game begins with black making the first move. A valid move involves placing a disc on an empty square of the board. Legal moves are those that result in the capture of opponent's pieces.

Figure 1. Othello Board – Initial Configuration

Opponent pieces are captured when they are surrounded, in a straight line, between the disc being played and another disc belonging to the current player. Once a disc is captured, it is flipped over to the opponent's colour. If a player cannot make a legal move, he has to pass and the move is transferred to his opponent. The game finishes when neither player has a legal move, usually when all 64 discs have been placed. At the end of the game all discs are counted, and the player with the most discs is declared the winner; unless both players have the same number of discs, in which case the game is a tie.

A. Reversi Playing Strategies

Like players of any other cerebral game, Othello players use a number of strategies to maximize their chances of winning. Game-playing programs are built on the below-mentioned strategies to maximize the power of the evaluation function, which is the product of the square weights and the holders of the squares.

B. Disc Difference (Greedy)

The most basic strategy is to maximize the number of one's pieces at every move. This strategy is often used by novice Othello players. Although it is very important in the later stages of the game, it can become disastrous when used earlier, as the opponent can then maximize his own position with a couple of good moves in the middle or end part of the game, which may dramatically cost the game.

C. Positional Strategy

A player using the positional strategy recognizes that some locations on the Othello board are more valuable than others. For example, the corners are the most valuable, because a disc placed in a corner cannot be captured. Such discs are called stable.

Stable discs are important because they are guaranteed to remain to the end of the game and hence add to the total score of their owner. Furthermore, they serve as a means for capturing the opponent's non-stable discs. Thus the stability of a player's discs refers to the number of stable discs that the player possesses and their locations. It is often the case that good stability leads to a winning situation; however, this does not necessarily imply that controlling all four corners leads to a win. In Reversi there exists a hierarchy of stable discs, and placing such discs helps the player maximize his winning chances [7].

D. Mobility Strategy

Mobility, or move-ability, is one of the best strategies in Othello, but also one of the hardest to master. The mobility of a player refers to the number of legal moves available to that player. Lack of mobility can lead to severe difficulties: a player with poor mobility may be forced to choose 'bad' moves that lead to a detrimental situation.

On the other hand, a player with good mobility has a large choice of moves, allowing him to direct the game towards an advantageous situation. Mobility is important during the middle part of the game, where both players are fighting for a good position that will provide easy capture of corners. It is believed that the mobility strategy was discovered in Japan in the mid 1970s. The strategy slowly spread to America and Europe through personal contact between American and Japanese players. It has been shown that games played with the mobility strategy were more difficult to recall, suggesting that the mobility strategy is much harder to learn.

Othello was chosen for the following reasons:

• It is a well-studied game, researched by many authors using a variety of techniques.

• The rules of the game are simple and easy to implement.

• It is easy to implement basic AI opponents, such as a random player and a greedy player.

• Othello has a reasonable complexity. It is not as simple as Checkers and not as complex as Chess or Go.

• Since every game is guaranteed to finish in 60 moves or less, Othello is a perfect game for optimization algorithms like Genetic Algorithms.
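The capture rule from Section III translates directly into code. The sketch below is a minimal illustration (the helper names are ours, not the paper's): a move is legal only if it brackets at least one straight line of opponent discs between the new disc and an existing friendly disc, and playing the move flips every bracketed disc. It uses the 1/-1/0 disc encoding the paper adopts later for its board vector.

```python
# Minimal Othello capture logic on an 8x8 board.
# 1 = black disc, -1 = white disc, 0 = empty square.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board():
    """Standard opening position: four discs in the centre."""
    b = [[0] * 8 for _ in range(8)]
    b[3][3], b[4][4] = -1, -1   # two white discs
    b[3][4], b[4][3] = 1, 1     # two black discs
    return b

def flips(board, row, col, player):
    """Opponent squares captured by playing at (row, col); empty if illegal."""
    if board[row][col] != 0:
        return []
    captured = []
    for dr, dc in DIRECTIONS:
        line, r, c = [], row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == -player:
            line.append((r, c))
            r, c = r + dr, c + dc
        # the line counts only if it ends on one of the player's own discs
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            captured.extend(line)
    return captured

def legal_moves(board, player):
    """All squares where the player captures at least one disc."""
    return [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, player)]

def play(board, row, col, player):
    """Place a disc and flip every captured opponent disc."""
    for r, c in flips(board, row, col, player):
        board[r][c] = player
    board[row][col] = player
```

From the initial position, black has exactly four legal moves, which matches the opening branching factor of 4 that the paper reports in its complexity measurements.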

E. Game Complexity

The state-space complexity of a board game is the number of possible board states in the game. For example, in Othello there are 64 board locations where each location can take one of three values, giving approximately 3^64 ≈ 10^30 total states. Game tree complexity represents the total number of nodes in a fully expanded game tree. Othello has a game tree complexity of about 10^58. Based on that figure, the average branching factor is about 9.26. To check this, we made two random players play against each other, recording the number of available moves at each stage of the game. The branching factor rises slowly from 4 and peaks at 12 around the 30th move [8].
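The complexity figures above can be checked with a few lines of arithmetic: the state-space bound is simply 3 raised to the number of squares, and the average branching factor b satisfies b^60 = 10^58 when a 10^58-node tree is spread over at most 60 plies.

```python
# Sanity-check of the Othello complexity figures quoted in the text.

# Upper bound on the state space: 64 squares, 3 states per square.
state_space = 3 ** 64            # on the order of 10^30

# If the fully expanded game tree holds about 10^58 nodes over at most
# 60 plies, the average branching factor b satisfies b^60 = 10^58.
avg_branching = 10 ** (58 / 60)  # close to 9.26
```

This reproduces the average branching factor of about 9.26 stated above.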
IV. GENETIC ALGORITHMS

The genetic algorithm, an effective and efficient subset of evolutionary algorithms, is a natural contestant for learning in games. A genetic algorithm provides an algorithmic and logical framework for exploring all possible board game scenarios through natural-evolution processes such as selection, crossover and mutation. It evolves intermediate candidate solutions for problem domains that have large solution search spaces and are not open to exhaustive search or conventional optimization techniques. Genetic algorithms have long been applied to a broad range of learning and optimization problems [9][10].

When the genetic evolution process is run for many generations through a series of experiments, the program acquires a novel set of evaluation function parameters that is carried into the subsequent population.

Normally, a genetic evolution process starts with a random set of candidate solutions, called chromosomes. Through crossover and mutation operators, it evolves the population towards an optimal solution. A GA does not guarantee an optimal solution, hence the challenge is to design a "genetic" process that maximizes the likelihood of generating such an optimized solution [11].

The first step is typically to evaluate the fitness of each candidate solution in the current population and to select the fittest candidate solutions to act as parents of the next generation of candidate solutions. After being selected for reproduction, parents are recombined through a crossover operator and then mutated using a mutation operator to generate offspring. The fittest parents and their new offspring form a new population, from which the process is repeated to create new populations.

The operations of evaluation, selection, recombination and mutation are usually performed many times in a genetic algorithm. Selection, recombination and mutation are generic operations in any genetic algorithm; evaluation is problem specific and relates directly to the structure of the solutions. In a genetic algorithm, a major issue is therefore the choice of the structure of the solutions and of the method of evaluation (the fitness function). Other important parameters include the size of the population, the portion of the population taking part in recombination, and the mutation rate. The mutation rate defines the probability with which a bit is flipped in a chromosome that is produced by a crossover [12].

A. Fitness Function

An important parameter of a genetic algorithm is the fitness function, which defines the fitness of each chromosome; the values of the genetic parameters are adapted as the genetic evolution progresses. At every generation the best and worst chromosome fitness values are tracked. If both fitness values become equal, the mutation rate is increased in order to help the genetic evolution escape issues such as local maxima. Once there is an improvement in the overall fitness, the original mutation rate is restored and evolution continues as normal. If evolution stagnates, that is, the fitness does not seem to improve for several generations and the search makes no further progress, the genetic algorithm is restarted with the initial default parameter values and a new randomly generated seed used to generate a new random initial population [13].

This paper uses genetic algorithms in game-playing learning programs by constructing a static evaluation function based on the features and strategies of an individual game (the Game of Reversi). To attain better results, we modify the genetic algorithm to suit the complexity of the evaluation function parameters. Constructing an evaluation function and updating its associated weightages is always a challenging task.

V. REVERSI-GENETIC COALESCE

The program was implemented using a classical genetic algorithm in which the first population is randomly generated and a random evaluation function serves as a "fitness finder" for the entire population; the populations are non-overlapping in nature. When a simple genetic algorithm is created, it specifies either an individual or a population of individuals. The genetic algorithm then clones the specified individual(s) to make its own population.

The simple genetic algorithm creates an initial population by cloning the individual or population you pass when you create it. In each generation, the algorithm creates an entirely new population of individuals by selecting from the previous population and then mating to produce the new offspring for the new population. This process continues until the stopping criteria are met (as determined by the terminator) [14].

This genetic string will play every string from the population, and the system will assign a fitness to this string based on its performance in the match. The next generation is then built according to this fitness, with a certain crossover probability and some mutation probability; the previously random string (the "fitness function") is now one of the population's strings. This can be seen as a cooperation scheme, with a population of 1 on one side and another population on the other side. The "fitness giver" can also be a random string for each generation.
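The generational loop described in Section IV, together with the adaptive mutation rate from the fitness-function subsection (raise the rate when best and worst fitness collapse to the same value, restore it after an improvement), can be sketched as follows. This is an illustration under our own assumptions: the toy one-max fitness and tournament selection stand in for the paper's game-based fitness and its selection scheme.

```python
import random

def evolve(bits=20, pop_size=30, generations=60, base_mutation=0.01, seed=0):
    """Toy GA: evaluate, select, recombine, mutate, adapt the mutation rate."""
    rng = random.Random(seed)
    fitness = lambda chrom: sum(chrom)          # toy objective: count of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(bits)] for _ in range(pop_size)]
    mutation = base_mutation
    best_so_far = max(fitness(c) for c in pop)
    for _ in range(generations):
        scores = [fitness(c) for c in pop]
        if max(scores) == min(scores):
            # best == worst fitness: raise the rate to escape local maxima
            mutation = min(10 * base_mutation, 0.5)
        new_pop = []
        for _ in range(pop_size):
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, bits)        # 1-point crossover
            child = p1[:cut] + p2[cut:]
            # per-bit mutation at the current (possibly raised) rate
            child = [b ^ 1 if rng.random() < mutation else b for b in child]
            new_pop.append(child)
        pop = new_pop
        gen_best = max(fitness(c) for c in pop)
        if gen_best > best_so_far:              # improvement: restore the rate
            best_so_far, mutation = gen_best, base_mutation
    return best_so_far
```

With selection pressure applied over 60 generations, the best fitness found climbs well above the expectation of a random 20-bit string.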

So the most important question here is the "weight" of the evaluation function. If the weights are set correctly, our computer program will play well. If they are not set correctly, the program may play very badly. Genetic algorithms learn good weights. The way to do this is to start with a random set of weights in the program and use them to test the program. If the program does well, we keep the weights and use those (with small changes) in the next version of the program. If they do badly, we change them a lot or throw them out and start again.

But, just as with animals, evolution happens slowly over a large population of individuals. Likewise, with different weights for the board squares, evolution takes place over successive generations. Instead of having just one weightage value per generation at a time, many values exist in each generation. All of them are tested, and only a few of the best ones are kept. In this way we hope to get better and better individual weight sets each time [15].

In this process, different versions of the evaluation function play against each other. The weight set which gives the best results is declared the winner.

During the training process, hopefully, the genetic algorithm will find the most suitable weights for each inner node. That means the program will know how much influence a node's value should have on the final decision of where to place a new move on the board.

This evaluation usually works by calculating simple numerical features of the game position (for example, in Othello, whether one player has more pieces or is controlling the corners of the board). The final evaluation of the position is a number, obtained by computing a linear function of the position features. The evaluation function here uses 6 features of the position; it calculates a number for each one and then multiplies each number by its own "weight" value. This is how it has been implemented.
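A linear evaluation of the kind just described is a dot product of feature values and weights. The sketch below is illustrative: the paper does not enumerate its 6 features, so the feature names and numbers here are hypothetical stand-ins.

```python
def evaluate(features, weights):
    """Static evaluation: f = W1*F1 + W2*F2 + ... + Wn*Fn."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical 6-feature summary of a position: disc difference, corners
# held, mobility, stable discs, frontier discs, parity (values invented
# for illustration).
features = [4, 1, 7, 3, -2, 1]
weights = [0.1, 0.8, 0.3, 0.5, -0.2, 0.05]
score = evaluate(features, weights)
```

Tuning such a function then amounts to searching the weight vector, which is exactly the role the paper assigns to the genetic algorithm.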

VI. GAME PROGRAM IMPLEMENTATION

The board is represented as a vector of length 64. A black disc is represented as 1 and a white disc as -1; an empty square is 0. The relevance of the board is calculated using a weighted piece counter. The weighted piece counter is a vector of length 64; each element of the vector corresponds to one square of the Othello board and holds the weight of that board position. The cumulative fitness value of the board squares is calculated as the dot product of the two vectors (weights and functional features) as follows:

Fitness Weight Value f = (W1 x F1) + (W2 x F2) + . . . + (Wn x Fn)

where the Fitness Weight Value f is called the static evaluation of a game board configuration,

• the Wi's are weights that indicate the relative importance of the features, and

• the Fi's are functional features that play important roles in game-playing strategies.

The weight of each position is initialized to a value between -1 and 1. Until out-of-opening, the weightings in the population use opening knowledge. After out-of-opening, the weightings are evaluated in the context of the board and the next move is decided. Among the possible moves, the one with the highest fitness weight value (f) is selected.

After the optimization, the weight values for each parameter are first set manually. Each parameter is then given a weighted coefficient, and these coefficients are optimized by the genetic algorithm again before manually resetting the weight for each of them. Selection from one generation to the next is performed using roulette-wheel selection with a probability of 20%, and 1-point crossover with a probability of 80% is applied to the 1-dimensional array converted from the 8x8 board. The mutation operator, with a rate of 0.01, changes an element of the vector to a new value ranging from -1 to 1.

The Othello program provides three initial values (1, -1, and 0) assigned to each parameter. When the fitness weight value (f) reaches the best value in the generation, the best weights are added to the adapted function to make a new adapted function. After many rounds of evolution, there is a good chance that the parameter assignment in the system arrives at an excellent weight assignment. The advantage of this strategy is that it can proceed with the evolution without any standard information, and the parameters obtained by evolution perform well. The time needed to evolve the game-playing genetic parameters is the disadvantage of this strategy.

VII. RESULT

The program is not looking for a global optimum; the aim of the genetic evolution of an evaluation function is to adjust its genetic parameters so that the overall performance of the program is enhanced.

Figure 2. Othello Program – Fitness Value Comparison (Generation 1 to 97)
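The operators used in the implementation (roulette-wheel selection, 1-point crossover with probability 0.8, and a 0.01 per-element mutation that replaces a weight with a new value in [-1, 1]) can be sketched as below. The chromosome is the 64-element board-weight vector; the toy fitness is an assumption of ours, since in the paper fitness comes from playing games.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(population, fitnesses):
        acc += fit
        if acc >= pick:
            return chrom
    return population[-1]

def crossover(p1, p2, rng, p_cross=0.8):
    """1-point crossover with probability p_cross, else copy a parent."""
    if rng.random() < p_cross:
        cut = rng.randrange(1, len(p1))
        return p1[:cut] + p2[cut:]
    return p1[:]

def mutate(chrom, rng, p_mut=0.01):
    """Replace each element, with probability p_mut, by a value in [-1, 1]."""
    return [rng.uniform(-1, 1) if rng.random() < p_mut else w for w in chrom]

rng = random.Random(42)
# 10 chromosomes of 64 board-square weights, each initialized in [-1, 1].
population = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(10)]
# Toy positive fitness (roulette selection requires non-negative values).
fitnesses = [max(0.0, sum(c)) + 1.0 for c in population]
child = mutate(crossover(roulette_select(population, fitnesses, rng),
                         roulette_select(population, fitnesses, rng), rng), rng)
```

One generation of the paper's scheme would repeat this child construction until a full new population is produced, then re-evaluate fitness by play.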

At first glance, automatic tuning of the evaluation function appears to be an optimization task well suited to a GA. The many parameters and sub-parameters of the evaluation function can be encoded as a bit-string. They were randomly initialized as board square "chromosomes", each representing one weightage value for the evaluation function. The population was then evolved until highly tuned "fit" evaluation functions emerged. During the experiments, one major challenge was the fitness function for the application of the GA. For every set of input parameters, a set of evaluation parameters is encoded as a chromosome to calculate its fitness value. Here the solution was derived from the individuals' fitness values in each generation. The main drawback of the genetic evolutionary approach is the unacceptably large amount of time needed to evolve each generation. As a result, severe limitations are imposed on the length of the games played after each generation, and also on the size of the population involved.

The Othello program was run on a Pentium IV machine with 512 MB of RAM. The results were collected for two sets of generations. The first set has 97 generations with a population size of 100, and the second set has 100 generations with 200 members per population. For both sets, the minimum fitness weight value (f) and the average fitness value for each generation were collected, and graphs were plotted accordingly, as shown in the figures. As per Fig. 2, the minimum fitness values increase as the program proceeds from generation 00 to generation 97, but the average fitness value tends to remain steady as the program evolves, with major low values in generations 11, 48, 61, 86 and 96.

Fig. 3 is clearly indicative of the positive evolution of the Othello program, as the function fitness values increase from generation 01 to generation 100. In a nutshell, both graphs serve the purpose of genetic optimization in evolutionary Reversi game learning.

Figure 3. Othello Program – Fitness Analysis (Generation 1 to 100)

VIII. CONCLUSION

Machine learning has been applied to many computer board games; the application of genetic operators is fit to enhance an Othello game-playing program. This paper used genetic parameters to speed up the efficiency of machine learning using self-learning genetic features. The Othello program was used to enhance the game-playing strategy using a fitness function and evaluation based on Othello game-playing strategies. The purpose of making a simple evaluation function is served as the program evolves.

Practical tests have shown that the self-learning approach proposed in this paper can greatly enhance the efficiency of learning. The optimization of the genetic algorithm can improvise fitness functions in order to evaluate the board state precisely and improve the computer Othello player.

REFERENCES

[1] Hong, J.-H. and Cho, S.-B. (2004). Evolution of emergent behaviors for shooting game characters in robocode. In Evolutionary Computation, 2004. CEC2004. Congress on Evolutionary Computation, volume 1, pages 634–638, Piscataway, NJ. IEEE.
[2] J. Clune. Heuristic evaluation functions for general game playing. In Proc. of AAAI, pages 1134–1139, 2007.
[3] Jörg Denzinger, Kevin Loose, Darryl Gates, and John Buchanan. Dealing with parameterized actions in behavior testing of commercial computer games. In Proceedings of the IEEE 2005 Symposium on Computational Intelligence and Games (CIG), pages 37–43, 2005.
[4] Matt Gilgenbach. Fun game AI design for beginners. In Steve Rabin, editor, AI Game Programming Wisdom 3, 2006.
[5] S. Schiffel and M. Thielscher. A multiagent semantics for the game description language. In Proc. of the Int.'l Conf. on Agents and Artificial Intelligence, Porto, 2009. Springer LNCS.
[6] T. Srinivasan, P.J.S. Srikanth, K. Praveen and L. Harish Subramaniam, "AI Game Playing Approach for Fast Processor Allocation in Hypercube Systems using Veitch diagram (AIPA)", IADIS International Conference on Applied Computing 2005, vol. 1, Feb. 2005, pp. 65–72.
[7] Thomas P. Runarsson and Simon M. Lucas. Co-evolution versus self-play temporal difference learning for acquiring position evaluation in small-board Go. IEEE Transactions on Evolutionary Computation, 9:628–640, 2005.
[8] Yannakakis, G., Levine, J., and Hallam, J. (2004). An evolutionary approach for interactive computer games. In Evolutionary Computation, 2004. CEC2004. Congress on Evolutionary Computation, volume 1, pages 986–993, Piscataway, NJ. IEEE.
[9] A. Hauptman and M. Sipper. Evolution of an efficient search algorithm for the Mate-in-N problem in chess. In Proceedings of the 2007 European Conference on Genetic Programming, pages 78–89. Springer, Valencia, Spain, 2007.
[10] P. Aksenov. Genetic algorithms for optimising chess position scoring. Master's Thesis, University of Joensuu, Finland, 2004; Y. Bjornsson and T.A. Marsland. Multi-cut alpha-beta-pruning in game-tree search. Theoretical Computer Science, 252(1-2):177–196, 2001.
[11] O. David-Tabibi, A. Felner, and N.S. Netanyahu. Blockage detection in pawn endings. Computers and Games CG 2004, eds. H.J. van den Herik, Y. Bjornsson, and N.S. Netanyahu, pages 187–201. Springer-Verlag, 2006.
[12] A. Hauptman and M. Sipper. Using genetic programming to evolve chess endgame players. In Proceedings of the 2005 European Conference on Genetic Programming, pages 120–131. Springer, Lausanne, Switzerland, 2005.
[13] G. Kendall and G. Whitwell. An evolutionary approach for the tuning of a chess evaluation function using population dynamics. In Proceedings of the 2001 Congress on Evolutionary Computation, pages 995–1002. IEEE Press, World Trade Center, Seoul, Korea, 2001.
[14] Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. Ann Arbor, MI: University of Michigan Press.
[15] Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
