PhD Thesis: Lim Yew Jin
A THESIS SUBMITTED
SCHOOL OF COMPUTING
2007
Dedicated to my family, especially Mum and Dad
Acknowledgements
I am very fortunate to have a loving wife, Xin Yu, who constantly reminds me that there
is a life unrelated to research.
I have learnt a lot from Lee Wee Sun. I am grateful for his guidance in the course of
my research, as well as his patience and willingness to share his opinions on my ideas in
our weekly discussions. I am also indebted to Jürg Nievergelt for his wisdom and guid-
ance, and for suggesting to me to try out the game of Tigers and Goats first. Portions of
the text on Tigers and Goats had been co-authored with him. Elwyn Berlekamp pointed
out Tigers and Goats and got us interested in trying to solve this game - an exhaustive
search problem whose solution stretched out over three years.
As a collaborator, fellow student and friend, Oon Wee Chong has always been avail-
able for excellent help and suggestions in my research. I would like to acknowledge
friends like Weiyang, Yaoqiang and Yee Whye who kept me sane and grounded during
times of insanity.
And lastly, to those who I have not named, but have helped me in one way or another.
Thank you.
Contents
Acknowledgements iii
Summary ix
List of Tables xi
1 Introduction 1
1.1 Tigers and Goats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 RankCut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Properties of Forward Pruning in Game-Tree Search . . . . . . . . . . . 4
1.4 List of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Game-Tree Search 8
2.1 Game-Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
6 RankCut 75
6.1 Existing Forward Pruning Techniques . . . . . . . . . . . . . . . . . . 76
6.1.1 Razoring and Futility Pruning . . . . . . . . . . . . . . . . . . 77
6.1.2 Null-Move Pruning . . . . . . . . . . . . . . . . . . . . . . . . 78
6.1.3 ProbCut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.1.4 N-Best Selective Search . . . . . . . . . . . . . . . . . . . . . 81
This thesis presents the results of our research aimed at the theoretical understanding and
practical application of forward pruning in game-tree search, also known as selective
search. The standard technique used by modern game-playing programs is a depth-first
search that relies on refinements of the Alpha-Beta paradigm. However, despite search
enhancements such as transposition tables, move ordering and search extensions, the
game-tree complexity of many games is still beyond the computational limits of today's
computers. To further improve game-playing performance, programs typically perform
forward pruning. Our work on forward pruning focuses on three main areas:
1. Solving Tigers and Goats - using forward pruning techniques in addition to other
advanced search techniques to reduce the game-tree complexity of the game of
Tigers and Goats to a reasonable size. We are then able to prove that Tigers and
Goats is a draw using modern desktop computers.
3.1 Number of distinct board images and positions for corresponding sub-
spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 Tigers and Goats Endgame Database Statistics . . . . . . . . . . . . . . 49
3.3 Estimated Tree Complexity for various winning criteria . . . . . . . . 50
3.4 Estimated State-Space Complexities and Game-Tree Complexities of
various games [van den Herik et al., 2002] and Tigers and Goats (sorted
by Game-tree complexity) . . . . . . . . . . . . . . . . . . . . . . . . 51
5.1 Halfway database statistics: the number of positions computed and their
value from Tiger’s point of view: win-or-draw vs. loss . . . . . . . . . 69
5.2 Number of positions created by different move generators . . . . . . . . 71
8.1 Statistics for pruning schemes with time limit of 5 seconds per position 129
8.2 One-way ANOVA to test for differences in search depth gain among the
three pruning schemes with time limit of 5 seconds per position . . . . . 129
8.3 Statistics for pruning schemes with time limit of 10 seconds per position 130
8.4 One-way ANOVA to test for differences in search depth gain among the
three pruning schemes with time limit of 10 seconds per position . . . . 130
8.5 Top 3 FPV-l values for various search depths [Kocsis, 2003] . . . . . . 135
8.6 Top 3 FPV-d values for various search depths [Kocsis, 2003] . . . . . . 135
2.1 Final Position of Deep Blue (White) versus Kasparov (Black) game in
1997 where Kasparov loses in 19 moves . . . . . . . . . . . . . . . . . 10
2.2 The White Doctor Opening which has been shown to be a draw [Scha-
effer et al., 2005] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Initial Position for Othello . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Go board, or “goban” . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Game-Tree of initial Tic Tac Toe board . . . . . . . . . . . . . . . . . . 17
2.6 How alpha and beta values propagate in Alpha-Beta Search . . . . . . . 20
2.7 Minimal Alpha-Beta Search Tree . . . . . . . . . . . . . . . . . . . . . 21
3.5 Left: the position after the only (modulo symmetry) first Goat move that
avoids an early capture of a goat. Right: puzzle with Goat to win in 5
plies if Tiger captures a goat . . . . . . . . . . . . . . . . . . . . . . . 45
3.6 Two of the five initial Goat moves that lead to the capture of a goat . . . 45
7.1 Log plot of the number of times ranked moves are chosen where either
Max or Min nodes are forward pruned . . . . . . . . . . . . . . . . . . 108
7.2 Log plot of the number of times ranked moves are chosen where ei-
ther Max or Min nodes are forward pruned in game-trees with branch-
dependent leaf values . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.3 Log plot of the number of times ranked moves are chosen with unequal
forward pruning on both Max and Min nodes in game-trees with branch-
dependent leaf values . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.4 Log plot of the number of times ranked moves are chosen with unequal
forward pruning on both Max and Min nodes in real Chess game-trees . 111
Search is the basis of many Artificial Intelligence (AI) techniques. Most AI applica-
tions need to search for the best solution, given resource constraints, from many alterna-
tives. In principle, such problems are trivial, since we could simply try every possibility
until a solution is found. However, for practical game-playing programs, this strategy is
not feasible. Expert humans can easily outplay computers in games with large branching
factors such as Shogi, Bridge and Go due to the exponential growth in computational
effort with increasing search depths.
Humans naturally perform selective search in game-tree searches. And we do it so
well that the best human Chess players are still competitive with modern Chess pro-
grams that search in excess of 200 million Chess positions per second [Björnsson and
Newborn, 1997]. This approach keeps the exponential explosion in computational effort
with increasing search depth manageable, as selective search considers only reasonable
moves, thereby reducing the branching factor. The fact that humans can perform
selective search so effectively led experts to believe that full-width searchers in
Chess would eventually be dominated by selective searchers [Abramson, 1989]. However, selective
searchers are difficult to implement correctly: in an early four-game Chess experiment
between a selective search program and a full-width search program, the selective searcher
lost handily [Abramson, 1989].
This experiment illustrates the difficulty of implementing a selective search correctly
compared to a full-width search. While the premise of considering only "reasonable"
moves is simple to state, it is much harder to construct algorithms that identify
"good" and "bad" moves accurately. Nevertheless, selective search techniques
such as search extensions (Section 2.2.5), Razoring, Futility Pruning, Null-Move Pruning,
and ProbCut (Section 6.1) have been shown to be effective in game-tree search.
Even with these techniques, however, the exponential explosion of computational effort
needed to search game-trees remains beyond the computational limits of modern
computers. Hence the need for effective forward pruning techniques has never diminished.
The goal of our research on forward pruning is to improve upon the state-of-the-
art of both the practical application and theoretical understanding of forward pruning
techniques in game-tree search. Our research comprises work in several areas:
1.1 Tigers and Goats
The game of Tigers and Goats is the national game of Nepal. Tigers and Goats is a two-
player perfect-information zero-sum game to which the Minimax paradigm is easily
applicable. As it is played on a 5×5 board, it looks deceptively easy to solve. However,
the game has an estimated game-tree complexity of 10^41. To give an idea of the size of
this game-tree, we assume that a search program can process 10^9 positions per second.
At this rate of searching, it will take approximately 10^24 years to complete the search.
It is therefore clear that advanced search techniques, domain-specific optimizations and
selective search are needed to reduce the game-tree complexity to a reasonable size. Our
work on Tigers and Goats resulted in a program that proved that Tigers and Goats is a
draw using less than three days of computational time.
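As a sanity check on this estimate, the arithmetic can be reproduced directly. The figures 10^41 and 10^9 are the round numbers assumed above, not measured values:

```python
# Back-of-the-envelope check of the search-time estimate in the text.
positions = 10 ** 41     # assumed game-tree complexity
speed = 10 ** 9          # assumed positions searched per second
seconds = positions / speed
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1e} years")   # roughly 3.2e+24 years
```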
1.2 RankCut
RankCut is a novel, domain-independent forward pruning technique. Game-playing
programs already perform move ordering to improve the performance of Alpha-Beta
search, so the relative ordering of moves is available at no extra cost. As RankCut
uses this information, which is untapped by current forward pruning techniques, it
complements existing methods and achieves improvements even when conventional
pruning techniques are simultaneously employed. We implemented RankCut in modern
open-source Chess programs to demonstrate its effectiveness.
1.3 Properties of Forward Pruning in Game-Tree Search
We also explore forward pruning using theoretical analyses and Monte Carlo simulations,
and identify two factors that govern the propagation of forward pruning errors in
game-tree search. Firstly, we find that pruning errors propagate differently depending
on the player to move, and show that pruning errors on the opponent's moves are
potentially more serious than pruning errors on the player's own moves. While this
suggests that pruning on the player's own moves should be performed more aggressively
than pruning on the opponent's moves, empirical experiments with Chess programs
suggest that this effect might not be significant in practical settings. Secondly, we
examine the ability of Minimax search to filter away pruning errors and give bounds
on the rate of error propagation to the root. We find that if the rate of pruning error
is kept constant, the growth of errors with the depth of the tree dominates the filtering
effect, which suggests that pruning should be done more aggressively near the root and
less aggressively near the leaves.
1.4 List of Contributions
• RankCut
RankCut is a novel domain-independent forward pruning technique in game-tree
search. It is designed to be simple to implement and has been shown to be highly
effective in Chess, even when existing forward pruning techniques are used to-
gether with RankCut.
• We show that the depth of a node in the search affects the propagation of forward
pruning errors in game-tree search. We derive a theoretical analysis showing
that the rate of error propagation increases with increasing search depth, and provide
evidence that this effect is present in both simulated and real Chess game-trees.
1.5 Thesis Outline
This thesis is organized as follows. Chapter 1 gives a summary of the thesis and outlines
the contributions of our research.
Chapter 2 introduces the basic game-tree search techniques and the more advanced
search enhancements used in current game-playing programs.
Chapters 3, 4 and 5 explain how the game of Tigers and Goats was solved. Chapter
3 describes the game of Tigers and Goats, and provides an analysis of the state-space
complexity and game-tree complexity of the game. An introduction to how other games
have been solved is also given. Chapter 4 presents the evolutionary computation method
used to create heuristic players employed during the forward search to show that Goat
has at least a draw. Chapter 5 outlines the techniques used to show that Tiger has at
least a draw, thus weakly solving the game of Tigers and Goats. This solution involved
intensive computation on numerous machines over a time period of approximately three
years.
Chapter 6 describes RankCut, which is a forward pruning technique in game-tree
search. It is designed to be simple to implement and has been shown to be highly ef-
fective in Chess, even when existing forward pruning techniques are used together with
RankCut.
Chapters 7 and 8 show how the player to move and the depth of a node affect
the propagation of forward pruning errors during game-tree search. To the best of our
knowledge, the player-to-move effect has not been reported in the literature. The depth-of-a-node
effect is a novel analysis of forward pruning, although it builds on prior
work on Minimax pathology, the property that Minimaxing amplifies errors as search
depth increases.
Chapter 9 concludes this thesis with a summary and a look at areas for future re-
search.
Chapter 2
Game-Tree Search
This thesis studies the theory and practice of forward pruning in game-tree search. It
presents research on applications of forward pruning in game-tree search to solve and
play games, and a theoretical analysis of forward pruning in game-tree search. In this
chapter we introduce game-tree search and search enhancements, and outline how they
are employed in game-playing programs.
AI techniques have been applied to board games for the past 40 years. Chess, for example,
has been a popular testbed for AI techniques, and one of the most memorable
results of such research is the defeat of reigning world champion Garry Kasparov by a
computer system named Deep Blue under regular time controls in 1997.
The underlying algorithm typically used for AI in board games is based on the Min-
imax paradigm. The Minimax paradigm can be implemented by game-tree search al-
gorithms. In this thesis, we will focus on two-player zero-sum games with perfect in-
formation. The term two-player simply refers to a game that involves two players. The
term perfect information means that the states of the game are completely visible to all
players. In contrast, the term imperfect information means that states of the game are
only partially observable, and therefore some relevant information is hidden from the
players. Zero-sum means that the gain of one player is the loss of his or her opponent. Let
score_A(p) and score_B(p) represent the scores of players A and B in position p respectively. In
a zero-sum game, it is necessary that score_A(p) + score_B(p) = 0 for all positions p. This is equivalent
to saying that there is no move that benefits both players simultaneously.
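As a minimal illustration of the zero-sum condition, the sketch below (with a hypothetical position encoding; any representation with a game-result field would do) checks that the two players' scores always cancel:

```python
# Toy illustration of the zero-sum condition: for every position p,
# score_A(p) + score_B(p) = 0.  The dict encoding here is hypothetical.
def score_A(p):
    return p["result"]        # +1 if A wins, -1 if A loses, 0 for a draw

def score_B(p):
    return -p["result"]       # B's score is the negation of A's

positions = [{"result": r} for r in (+1, 0, -1)]
assert all(score_A(p) + score_B(p) == 0 for p in positions)
```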
Computers are able to play board games such as Chess [Baxter et al., 1998, Björnsson
and Newborn, 1997], Checkers [Chellapilla and Fogel, 2001a,Schaeffer, 1997], Go [Müller,
2002, Dayan et al., 2001] and Othello [Buro, 1997a, Chong et al., 2003], which are all
two-player, perfect information, and zero-sum games. Advances in the playing strength
of computers can be largely attributed to the increased computing power available and
sophisticated game-tree search techniques, such as Alpha-Beta searching [Knuth and
Moore, 1975] and proof number searching [Allis et al., 1994]. In this section, we will
outline the research results of each game, and briefly see how selective search is em-
ployed to play them effectively.
Chess
Since the early development of computer games research, Chess has been considered the
pinnacle of AI research. Intensive research has been done since then and the dominant
paradigm used to tackle computer Chess is game-tree search with Alpha-Beta searching.
In 1988, IBM built Deep Thought, the first Chess machine to beat a Chess grandmaster
in tournament play. Deep Thought used game-tree search with Alpha-Beta
searching, and had a single-chip Chess move generator that could search in the neighborhood
of 500,000 to 700,000 positions per second.
In May 1997, Deep Blue [Björnsson and Newborn, 1997], the descendant of Deep
Thought, beat world champion Garry Kasparov with a score of 3.5-2.5. Deep Blue
was based on a redesigned evaluation function with over 8,000 features and a new chip
that added hardware repetition detection, a number of specialized move generation modes
and efficiency improvements. Deep Blue was a massively parallel system with over 200
Chess chips, each searching about 2-2.5 million positions per second. By using
over 200 of these chips, the overall speed of the program was 200 million positions per
second.
[Chess diagram omitted]
Figure 2.1: Final Position of Deep Blue (White) versus Kasparov (Black) game in
1997 where Kasparov loses in 19 moves
Computer Chess has advanced rapidly since then, and modern Chess programs play
at grandmaster level even on personal desktop computers. For example, in October
2002, a man-machine match held in Bahrain between the human world champion
Vladimir Kramnik and Deep Fritz, a commercial Chess program on a standard computer
configuration, finished in a 4-4 draw, with 2 wins each and 4 draws. In January
2003, a six-game match between Garry Kasparov and another computer program named
Deep Junior resulted in a 3-3 draw, with one win each and 4 draws. Most recently, in
a match from 25 November to 5 December 2006, Deep Fritz beat World Champion
Vladimir Kramnik 4-2, with two wins for the computer and four draws.
While Deep Blue was a sophisticated brute-force searcher, modern computer Chess
programs for desktop computers are also able to play at grandmaster level partially due
to successful forward pruning techniques such as Futility Pruning/Razoring (Section 6.1.1)
and Null-Move Pruning (Section 6.1.2). Nearly all world-class Chess programs apply
various forward pruning techniques throughout the search [Heinz, 1999].
Checkers
Checkers, also known as American Checkers, is played on an 8×8 board. Two players, on
opposite sides of the board, alternately move pieces diagonally, and pieces of the opponent
are captured by jumping over them. The player who has no pieces left or cannot move
loses the game.
Figure 2.2: The White Doctor Opening which has been shown to be a draw [Schaeffer
et al., 2005]
A Checkers program had already won a game against a strong human in 1959. Interestingly, the
win has since been noted to be dubious, as analyses of game records showed that the
human had made several huge blunders uncharacteristic of a strong player. The perception
of Checkers as a 'solved' game resulted in a lack of research being done on the
game.
In 1988, Jonathan Schaeffer and a team at the University of Alberta started developing
Chinook [Schaeffer, 1997], which went on to defeat the human world champion
in match play. The highlight of Chinook was its matches against the previous human
world champion, Marion Tinsley, who had been world champion since 1954 and was
perceived by many to be invincible in match play. In the 1992 series, Marion Tinsley
won 4, lost 2 and drew 33 games against Chinook. The 1994 series
was interrupted when Marion Tinsley fell seriously ill, and Chinook was rescheduled
to play the second-best human player, Don Lafferty. Chinook competed against Don
Lafferty in 1994 and won 1, lost 1 and drew 18 games; in 1995, it won 1 and drew
32 games.
Chinook uses traditional AI techniques such as endgame databases, Alpha-Beta
searching and opening books. Chinook could not use Null-Move Pruning to perform
forward pruning, as many positions in Checkers are zugzwang (positions
where the player to move would benefit more by not moving), for which Null-Move
Pruning is known to be ineffective (see Section 6.1.2 for details). Schaeffer therefore
had to spend considerable time implementing hand-crafted heuristics to extend and
prune the search tree [Schaeffer et al., 1992].
In 2007, the Chinook team computed the game-theoretic values of all Checkers
positions with up to 10 pieces [Schaeffer et al., 2005].
Othello
Othello, also known as Reversi, is a strategic two-player board game on an 8×8 board
with Black and White pieces. The starting position is shown in Figure 2.3, and by
convention, Black makes the first move. Players must place a new piece in a position
such that there exists at least one straight line (horizontal, vertical or diagonal) between
the new piece and a piece of the player already on the board, with one or more opponent
pieces between them. After placing a piece on the board, the player flips all opponent
pieces lying on a straight line between the new piece and any other piece of the player
already on the board. If a player cannot make a valid move, play passes to the other
player. If neither player can move, the game ends. The player with more pieces on the
board at the end wins.
In 1997, Logistello [Buro, 1997a] defeated Takeshi Murakami, the world Othello
champion, by winning all 6 games of the match. Logistello is able to learn
its opening books [Buro, 1997c] and uses a table-based evaluation function, which can
capture more non-linear dependencies than small neural networks based on sigmoid
functions [Buro, 1998]. While the move selection process is a commonly-used Alpha-Beta
search, Logistello also incorporates a sophisticated forward pruning technique
called ProbCut. ProbCut is based on the idea that the result of a shallow search is a
rough estimate of the result of a deeper search, and that it is therefore possible to eliminate
certain moves during normal search based on a shallow search. ProbCut has been shown
to be effective in Othello, Chess, and Shogi [Jiang and Buro, 2003].
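The cut test behind this idea can be sketched as follows, treating the shallow result as a linear predictor of the deep result with roughly normal error. The constants a, b, sigma and the threshold t below are illustrative stand-ins for parameters that would be fitted offline from pairs of shallow and deep search results:

```python
# Hedged sketch of the ProbCut idea: v_deep ≈ a * v_shallow + b,
# with prediction error of standard deviation sigma.  All constants
# here are hypothetical, not fitted values from any real program.
A, B, SIGMA, T = 1.0, 0.0, 0.5, 1.5

def probcut_fails_high(v_shallow, beta):
    """Prune (fail high) if the deep search would likely exceed beta."""
    return v_shallow >= ((beta + T * SIGMA) - B) / A

def probcut_fails_low(v_shallow, alpha):
    """Prune (fail low) if the deep search would likely fall below alpha."""
    return v_shallow <= ((alpha - T * SIGMA) - B) / A

# A shallow score well above beta predicts a deep fail-high:
assert probcut_fails_high(2.0, beta=1.0)
assert not probcut_fails_high(1.0, beta=1.0)
```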
Go
Go is a two-player board game that originated in East Asia between 2500 and 4000 years
ago. It is one of the oldest games in the world that is still widely played in Asian
countries, and it is gaining popularity in Western countries. It is also known as Weiqi in
China and Baduk in Korea.
Like Chess, Go is a deterministic, perfect information, zero-sum game of strategy
between two players. Go is played on a board, which consists of a grid made by the
intersection of horizontal and vertical lines. The number of intersections determines the
size of the board. Go is normally played on a 19×19 board, as shown in Figure
2.4. However, smaller boards, such as 9×9 and 13×13, are also used
for playing quicker games. Two players alternate in placing black and white stones on
the intersection points of the board (including the edges and corners of the board), with
the black player moving first.
The aim of Go is to surround more territory and capture more prisoners than your
opponent. Two players alternate placing stones on the intersection points on the board,
but unlike Chess, the stones do not move on the board unless they are captured.
The traditional approach of Minimax game-tree search has proven difficult to apply
in Go due to its high branching factor. As a result, programs which use search trees
extensively can only play on smaller boards such as 9×9. Many programs
such as GNU Go [1] therefore resort to knowledge-based systems, encoding
Go knowledge in patterns and using pattern-matching algorithms to choose and evaluate
potential moves.
One alternative to using game-tree search is the use of Monte Carlo search tech-
niques [Bouzy, 2003, Bouzy, 2005, Coulom, 2006, Kocsis and Szepesvári, 2006]. These
methods generate a list of potential moves, and for each move, many random games
are simulated to the endgame where evaluation can be done. The move which gives
the best average score for the current player is chosen as the move to play. However,
since the moves used for evaluation are generated at random, a weak
move can appear strong if it is refuted by only a few specific enemy counter-moves. This
problem is usually handled by incorporating a shallow ply search before invoking the Monte
Carlo simulations. So while the game-tree is not searched in a Minimax manner, forward
pruning remains important even in Monte Carlo tree search, as not considering bad
moves improves the accuracy (and efficiency) of the search. One example of a strong
[1] Available at http://www.gnu.org/software/gnugo/gnugo.html
Go-playing program using UCT, a Monte Carlo search method, is MoGo [Gelly et al.,
2006, Gelly and Silver, 2007]. In this thesis, however, we do not consider the application
of forward pruning in Monte Carlo search.
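The flat Monte Carlo evaluation described above can be sketched as follows. A random playout is modelled here as a Bernoulli draw from a hidden per-move win probability, a stand-in for a real playout routine; the move names and probabilities are purely illustrative:

```python
import random

# Sketch of flat Monte-Carlo move evaluation: for each candidate move,
# simulate many random games to the end and keep the best average result.
def evaluate_moves(win_probs, n_playouts=2000, seed=0):
    rng = random.Random(seed)        # seeded for reproducibility
    averages = {}
    for move, p in win_probs.items():
        # Each "playout" is a Bernoulli trial with the move's hidden
        # win probability -- a placeholder for a real random game.
        wins = sum(rng.random() < p for _ in range(n_playouts))
        averages[move] = wins / n_playouts
    best = max(averages, key=averages.get)
    return best, averages

best, avgs = evaluate_moves({"a": 0.4, "b": 0.7, "c": 0.5})
print(best)   # "b" -- the move with the best average playout result
```

With 2000 playouts per move, the sampling noise (about ±0.01) is far smaller than the gaps between the hidden win probabilities, so the averages reliably rank the moves.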
The playing level of even the best Go programs [Fotland, 2004] remains modest
[Müller, 2002], compared to the successes achieved in other game domains such as
Chess and Checkers. Since the playing style of computers is different from humans, it
is difficult to make an accurate assessment of the strength of current Go programs. This
is especially true as humans can learn the weaknesses of computers after a few games
and are able to defeat the programs in subsequent games. As a rough estimate, the best
Go programs are ranked about 15 kyu using conventional search techniques [Müller,
2002], and Dan level (equivalent to expert player) on 9 × 9 boards using Monte Carlo
methods [Gelly and Silver, 2007].
2.1.2 Game-Tree
A turn-based game can be represented as a game-tree, where each node in the tree represents
a board position. A game-tree consists of a root node representing the current
board position, terminal nodes that represent the end of a game, and interior nodes
whose value is a function of the values of their child nodes. Each edge represents one possible
move, and moves change the board position from one to another.
The number of branches from each node is defined as the branching factor. The
depth of a game-tree is the maximum length of a path from the root node to a terminal
node. If we assume a game-tree of uniform branching factor b and depth d, the number of
nodes in the game-tree is O(b^d). Figure 2.5 shows the game-tree of the starting position
of a Tic Tac Toe game up to depth 2.
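A minimal sketch of this representation, verifying the b^d leaf count for a small uniform tree (the Node class and parameters are illustrative):

```python
# A minimal game-tree node and a check that a uniform tree of branching
# factor b and depth d has b**d leaves (and O(b**d) nodes overall).
class Node:
    def __init__(self, children=()):
        self.children = list(children)

def uniform_tree(b, d):
    if d == 0:
        return Node()                 # terminal node
    return Node(uniform_tree(b, d - 1) for _ in range(b))

def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)

assert count_leaves(uniform_tree(b=3, d=4)) == 3 ** 4   # 81 leaves
```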
In mathematical game theory, the zero-sum condition leads rational players to act in a
Minimax fashion. This means that both players will try to maximize their own gains,
as this will simultaneously minimize those of their opponents. From the viewpoint of the
score of a single player, this is achieved by the player maximizing the score, and his
opponent minimizing the score. Minimax search can therefore be implemented by alter-
nating between maximizing and minimizing the score. In a two-player setting, the player
maximizing the score is typically called the MAX player, and the player minimizing the
score is called the MIN player. The Minimax value of a node u is defined mathematically
as

    score*(u) = utility(u)                          if u is a leaf node,
    score*(u) = max_{v ∈ children(u)} score*(v)     if u is a Max node,
    score*(u) = min_{v ∈ children(u)} score*(v)     if u is a Min node.
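The recursive definition above can be transcribed directly into code. The tree representation here is purely illustrative: a leaf is an integer utility, and an interior node is a list of subtrees:

```python
# A direct transcription of the Minimax definition: leaves return their
# utility; Max nodes take the maximum, Min nodes the minimum, of their
# children's Minimax values.
def minimax(node, maximizing=True):
    if isinstance(node, int):          # leaf node: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Max picks the branch whose Min reply is largest:
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert minimax(tree) == 3   # Min answers 3, 2 and 2; Max takes 3
```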
Negamax search reformulates Minimax search so that both players evaluate every
position as a maximizing player, by negating the scores of positions resulting from moves
in the current position. To see this, we note that by definition, the score of a player is the
negation of the score of his or her opponent in any position of a zero-sum game. After
making a move in the current position, the opponent is the player to move in the resulting
position. This means that rational players will try to maximize the scores obtained by
negating the score returned by each move. Negamax search simplifies the implementation
of Minimax search as it does not have to discriminate between MAX and MIN nodes,
since all nodes are MAX nodes within the search.
Minimax and Negamax search are exhaustive searches that visit all nodes of a game-tree
to find its Minimax score. This is suboptimal in the many cases where the search visits
nodes that do not affect the final Minimax score.
After finding the score of the first move, say x, in a MAX node, a MAX player
should only need to be concerned with moves that result in scores greater than x, as
he is trying to maximize his score. Consider the situation where MAX makes a second
move, and the first child of that MIN node, which we denote m2 , returns a score y, such
that y ≤ x. Since the MIN node is trying to minimize the score, the eventual value of
the MIN node is at most y. The MAX parent will therefore never pick m2 since MAX
already has a move that leads to a score of x > y. In other words, the MAX node has
imposed a lower bound on its MIN children in the above example. Conversely, a MIN
node would impose an upper bound on its MAX children. The lower and upper bounds
are equivalent to the values of alpha (α) and beta (β), respectively, in Alpha-Beta search.
In other words, the alpha bound is used by MAX nodes to represent the minimum value
that MAX is guaranteed to have, while the beta bound is used by MIN to represent the
maximum value that MIN is guaranteed to have. The propagation of alpha and beta
values [Knuth and Moore, 1975] can be demonstrated using Figure 2.6.
Figure 2.6: How alpha and beta values propagate in Alpha-Beta Search
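The bound propagation just described can be sketched as follows, again over the illustrative representation of integer leaves and list interiors:

```python
# Sketch of Alpha-Beta search: alpha is the score MAX is already
# guaranteed, beta the score MIN is already guaranteed; a node stops
# searching its remaining children once alpha >= beta.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, int):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # beta cutoff: MIN will avoid this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break                  # alpha cutoff: MAX will avoid this line
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree) == 3   # same value as Minimax, fewer nodes visited
```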
Principal Variation The principal variation is the move sequence obtained by following, at
each ply, the move that returns the best score (breaking ties at random).
Minimal Tree If Alpha-Beta search is shown the principal variation first, then we
know that Alpha-Beta search is able to eliminate many branches of remaining moves as
their scores are guaranteed not to affect the Minimax value at the root. It is possible to
consider a subset of a game-tree called the minimal, also known as optimal or critical,
game-tree that a search algorithm requires to determine the Minimax value of the root.
The Minimax value of the root depends only on the node values present in the minimal
tree, and the values of other nodes in the game-tree do not affect the Minimax value at
the root.
We are able to obtain a Minimal tree of any given game-tree by the following pro-
cedure, shown graphically in Figure 2.7 [Marsland and Popowich, 1985, Reinefeld and
Marsland, 1987]:
1. Define the root node of the game-tree to be a PV node.
2. At a PV node, at least one child has the Minimax value of the root. Define one
such child to be a PV node, and the remaining child nodes to be CUT nodes.
3. At a CUT node, at least one child has a Minimax value less than the Minimax
value of the principal variation. Define one such child to be an ALL node. All
remaining child nodes do not affect the Minimax value of the root.
The nodes searched in Alpha-Beta search can therefore be categorized into several
types; in [Knuth and Moore, 1975], the minimal game-tree is made up of type 1, type
2 and type 3 nodes, but it is more common (and clearer) to refer to these nodes as PV,
CUT and ALL nodes [Marsland and Popowich, 1985, Reinefeld and Marsland, 1987].
The best-case time complexity of Alpha-Beta search in the minimal game-tree is
b^⌈d/2⌉ + b^⌊d/2⌋ − 1 [Slagle and Dixon, 1969], where b is the branching factor and d is the depth
of the game-tree. This is the minimum number of nodes that must be examined by any
search algorithm to determine the Minimax value [Knuth and Moore, 1975]. However,
the worst-case time complexity of Alpha-Beta search is the same as that of Minimax
search, or bd .
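This best-case behaviour can be checked empirically. The sketch below (illustrative Python; the tree representation and function names are ours, not from the thesis) builds a uniform tree with distinct leaf values, orders the children of every node best-first, and counts the leaves that a fail-hard Alpha-Beta search evaluates.

```python
import math
import random

def build_tree(b, d, values):
    """Uniform tree: a leaf is an int, an internal node is a list of children."""
    if d == 0:
        return values.pop()
    return [build_tree(b, d - 1, values) for _ in range(b)]

def negamax(node):
    """Reference Minimax value in negamax form."""
    if isinstance(node, int):
        return node
    return max(-negamax(c) for c in node)

def order_best_first(node):
    """Sort children so the best move for the side to move comes first."""
    if isinstance(node, int):
        return node
    kids = [order_best_first(c) for c in node]
    kids.sort(key=negamax)  # smallest child value = best for the mover
    return kids

def alphabeta_leaves(node, alpha, beta, counter):
    """Fail-hard Alpha-Beta that counts evaluated leaves in counter[0]."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    for c in node:
        alpha = max(alpha, -alphabeta_leaves(c, -beta, -alpha, counter))
        if alpha >= beta:
            break  # cutoff: remaining children cannot affect the result
    return alpha

def best_case_leaves(b, d, seed=0):
    rng = random.Random(seed)
    values = list(range(b ** d))  # distinct leaf values avoid tie artefacts
    rng.shuffle(values)
    tree = order_best_first(build_tree(b, d, values))
    counter = [0]
    alphabeta_leaves(tree, -math.inf, math.inf, counter)
    return counter[0]
```

With perfect move ordering the leaf count matches b^⌈d/2⌉ + b^⌊d/2⌋ − 1, e.g. 7 leaves for b = 2, d = 4.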
2.2 Search Enhancements
While Alpha-Beta search is able to prune off branches of the game-tree that provably
cannot affect the Minimax value of the root, the time complexity, even in the best case,
is still exponential. There has therefore been much research done on improving search
performance using other techniques, which we will outline in this section.
2.2.1 Transpositions
In board games, it is possible for different sequences of moves to lead to the same board
position. Since the board positions are exactly the same, they should have the same
Minimax value. To avoid repeating the search effort needed to re-evaluate the board
position, game-playing programs should make use of transposition tables that store the
Minimax values of board positions.
The transposition table also stores key features such as the search depth, best move,
the score of the search and the search window used. Since the transposition table is typ-
ically used within an Alpha-Beta search, it needs to keep track of the bound information
of the score, which can be an exact value, a lower bound, or an upper bound. When
there is a transposition table hit, it is possible for the cached score to cause an immediate Alpha-Beta cutoff.
Enhanced transposition cutoff [Plaat et al., 1996] is a simple but effective method of
improving transposition table use during search - before actually searching any move,
check all successor positions and see if they are in the transposition table and can cause
a cutoff. If such a position is found, then Alpha-Beta search can immediately return a
value and no further search needs to be done.
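The bound bookkeeping described above can be sketched as follows (a minimal illustration; the entry layout and helper names are ours, not a real engine's API):

```python
EXACT, LOWER, UPPER = 0, 1, 2  # how a stored score relates to its search window

def tt_store(table, key, depth, score, alpha, beta, best_move):
    """Store a search result with a bound flag derived from its window."""
    if score <= alpha:
        flag = UPPER          # true value is at most score (fail low)
    elif score >= beta:
        flag = LOWER          # true value is at least score (fail high)
    else:
        flag = EXACT
    table[key] = (depth, score, flag, best_move)

def tt_probe(table, key, depth, alpha, beta):
    """Return a usable cached score, or None if absent or too shallow."""
    entry = table.get(key)
    if entry is None:
        return None
    e_depth, score, flag, _ = entry
    if e_depth < depth:
        return None           # cached search was shallower than required
    if flag == EXACT:
        return score
    if flag == LOWER and score >= beta:
        return score          # cached lower bound already causes a cutoff
    if flag == UPPER and score <= alpha:
        return score          # cached upper bound already fails low
    return None
```

In this picture, enhanced transposition cutoff simply calls `tt_probe` on every successor position before searching any of them.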
2.2.2 Search Windows
The search window of an Alpha-Beta search algorithm is defined as the interval between
the alpha and beta values. During the search, only moves that result in scores within this
window are considered, and all other moves are pruned. If the actual Minimax value of
the root is not within the initial search window, Alpha-Beta search will not return the
actual Minimax value but will instead fail high or fail low appropriately. It is possible
to guarantee that Alpha-Beta search always returns a correct value by setting the initial
search window to (−∞, ∞).
Using a small search window may seem like a bad idea, since the result might not
be exact and it is necessary to re-search with larger search windows to get the correct
value. However, searches with small search windows are sped up massively as they are
able to prune off more nodes. There are several search enhancements that make use of
small search windows to improve search performance.
Aspiration Search
Aspiration search [Slate and Atkin, 1977, Baudet, 1978] works by searching with an
initial search window (v − ∆, v + ∆), where v is an estimated evaluation of the board
position and ∆ is a pre-determined range. If the search fails high, it is possible to re-
search with search window (v + ∆, ∞) to find the exact score. Similarly, if the search
fails low, a re-search with search window (−∞, v − ∆) will return the exact score.
The estimated evaluation v can be obtained via several means, such as using the
evaluation of the previous board position, or when using Iterative-Deepening Search
(Section 2.2.3), to use the evaluation of the board position at depth d − 1 when at depth
d.
The constant ∆ should compromise between the time saved from having a smaller
search window, and the time it takes to re-search if the true score is not within (v −
∆, v + ∆).
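The fail-high/fail-low re-search logic can be written down directly (an illustrative sketch; `search` stands in for any window-based Alpha-Beta routine):

```python
INF = float('inf')

def aspiration_search(search, v, delta):
    """Aspiration search around the estimate v with half-width delta.
    `search(alpha, beta)` is a stand-in for a real Alpha-Beta search."""
    score = search(v - delta, v + delta)
    if score >= v + delta:                 # fail high: true score is above the window
        score = search(v + delta, INF)
    elif score <= v - delta:               # fail low: true score is below the window
        score = search(-INF, v - delta)
    return score

def make_search(true_value):
    """Toy fail-hard search that clamps the known true value to the window."""
    return lambda alpha, beta: max(alpha, min(beta, true_value))
```

For example, with a true value of 73, an estimate v = 50 and ∆ = 10, the first search fails high at 60 and the re-search over (60, ∞) recovers 73.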
A minimal window is one where beta = alpha + 1. Searches
with a minimal window do not return exact scores, but instead return a bound on the
score. There are only two possible cases: (1) the search fails high and score ≥ beta =
alpha + 1 > alpha, and (2) the search fails low and score ≤ alpha.
This might seem like futile work, as a search with a minimal window will never return
an exact score. However, once at least one child node has been evaluated, an Alpha-Beta
search algorithm ideally only needs to verify that every subsequent child node scores
≤ alpha; this is exactly the case when the best move was the first move searched.
Negascout/Principal Variation Search (PVS) [Marsland, 1983, Reinefeld, 1983,
Reinefeld, 1989] works on this principle and assumes that good move ordering is performed
on the game-tree. For the first move and at PV nodes, the search window is the usual (α, β);
for all other moves, the search window is the minimal window (α, α + 1). If the game-
tree is a minimal Alpha-Beta game-tree, all searches with the minimal window will fail
low, and search effort is saved because the minimal window prunes more nodes. If a
search with the minimal window fails high, a re-search with the usual
(α, β) search window is required to obtain an exact score.
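A minimal PVS sketch in negamax form follows (illustrative Python on int-leaf list trees; the representation is ours, not the thesis's):

```python
import random

def negamax(node, alpha=float('-inf'), beta=float('inf')):
    """Plain fail-hard Alpha-Beta used as a reference; leaves are ints."""
    if isinstance(node, int):
        return node
    for child in node:
        alpha = max(alpha, -negamax(child, -beta, -alpha))
        if alpha >= beta:
            break
    return alpha

def pvs(node, alpha, beta):
    """PVS: full window for the first move, minimal window for the rest."""
    if isinstance(node, int):
        return node
    first = True
    for child in node:
        if first:
            score = -pvs(child, -beta, -alpha)
            first = False
        else:
            score = -pvs(child, -alpha - 1, -alpha)  # minimal-window probe
            if alpha < score < beta:                  # probe failed high:
                score = -pvs(child, -beta, -score)    # re-search, full window
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return alpha

def random_tree(b, d, rng):
    if d == 0:
        return rng.randint(-50, 50)
    return [random_tree(b, d - 1, rng) for _ in range(b)]
```

On any tree, PVS returns the same root value as plain Alpha-Beta; it only differs in how many nodes the intermediate searches visit.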
MTD
MTD(f ) [Plaat et al., 1996] computes the Minimax value by repeatedly calling Alpha-Beta
search with a minimal window, using each result to tighten an upper or lower bound on the
value until the two bounds meet. MTD(f ) is called within an iterative deepening framework,
and the value of the previous iteration can be used as the initial estimate to MTD(f ).
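The driver loop can be sketched as follows (a minimal illustration assuming a fail-soft Alpha-Beta helper; the function names are ours):

```python
def mtd_f(search, first_guess):
    """MTD(f): converge on the Minimax value using minimal-window searches.
    `search(alpha, beta)` must be fail-soft, returning a bound on the value."""
    g = first_guess
    lower, upper = float('-inf'), float('inf')
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = search(beta - 1, beta)        # minimal-window search at beta
        if g < beta:
            upper = g                     # failed low: g is an upper bound
        else:
            lower = g                     # failed high: g is a lower bound
    return g

def negamax_soft(node, alpha, beta):
    """Fail-soft Alpha-Beta on int-leaf list trees, used as the helper search."""
    if isinstance(node, int):
        return node
    best = float('-inf')
    for child in node:
        best = max(best, -negamax_soft(child, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break
    return best
```

Each iteration shrinks the (lower, upper) interval, so with integer-valued evaluations the loop terminates at the exact Minimax value.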
2.2.3 Iterative-Deepening Search
The basic Minimax search previously explained is a depth-limited depth-first search, that
is, it searches the state-space in a depth-first manner until at most a fixed depth. This
presents a problem with game-playing programs, as they require real-time decision-
making during game-play; if a search with a high depth limit does not return a result
within a time limit, the program is unable to make a move. Iterative-Deepening Search
(IDS) is a search strategy that repeatedly calls a depth-limited search with increasing
depth limit d.
By applying IDS to the root node, a game-playing program is able to respond as
soon as a search to the base depth returns a result, and yet can be allowed to search as
deeply as possible until a time limit is reached; this is desirable as research [Junghanns
et al., 1997, Thompson, 1982] empirically demonstrates a positive correlation between
increasing search depth and game-playing performance.
There is another advantage to using IDS in game-tree search, as information from
searches at lower depths can be used to improve search performance at higher search
depths; examples include the History Heuristic and Killer Heuristic that improve move
ordering, and using search evaluations of moves from lower search depths as estimates
in aspiration search.
The time complexity of an IDS that calls a depth-first search from depth 0, 1, . . . , d is,
surprisingly, the same as that of a single depth-first search to depth d – O(b^d), for a game-tree of uniform
branching factor b and depth d. This is due to the dominating exponential growth
of leaf nodes. Perhaps even more surprising, it is typically the case that the number of
nodes searched by all iterations of an IDS is smaller than the number of nodes searched
by a single depth-first search to the full depth in modern Chess programs [Heinz, 2000]
due to the performance gains that Alpha-Beta search window algorithms such as aspira-
tion search and PVS give.
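The node-count arithmetic behind this claim can be made concrete with a short calculation (illustrative, assuming a uniform tree and ignoring pruning):

```python
def nodes_dfs(b, d):
    """Nodes visited by one depth-first search of a uniform b-ary tree to depth d."""
    return sum(b ** i for i in range(d + 1))

def nodes_ids(b, d):
    """Total nodes over all IDS iterations at depths 0, 1, ..., d."""
    return sum(nodes_dfs(b, i) for i in range(d + 1))
```

For b = 4 and d = 6, IDS visits 7,279 nodes against 5,461 for the single full-depth search: the last iteration dominates, so the overhead is only about a factor of b/(b − 1).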
2.2.4 Move Ordering
Good move ordering is essential to achieving more cutoffs when using Alpha-Beta
search [Knuth and Moore, 1975]. Variants of Alpha-Beta search like PVS and MTD(f )
also depend on good move ordering for their efficiency. A slightly better move ordering
can improve search performance by 50% − 100% and above [Heinz, 2000] in a game-
playing program. This high dependency on move ordering for good search performance
has resulted in the development of dynamic and static move ordering heuristics that
allow modern Chess programs to search game-trees that have only 20% − 30% more
nodes [Heinz, 2000] than the minimal Alpha-Beta game-tree. Move ordering heuristics
are therefore usually used in practical implementations.
Dynamic Move-Ordering
Dynamic move ordering heuristics collect information about moves during search and
use that information to better re-order subsequent moves. Examples of dynamic move
ordering heuristics are the History Heuristic [Schaeffer, 1989] and the Killer Heuris-
tic [Akl and Newborn, 1977]. Both the History Heuristic and the Killer Heuristic are
domain-independent techniques that work by storing information when a move has
worked well in previous board positions.
In Chess, the History Heuristic maintains a table for all possible from squares and to
squares for each piece. Whenever a move is considered to have worked well at depth
d, such as causing a beta cutoff, the value in the table referenced by that move is incremented
by the value of a function that grows quickly with respect to d (e.g., d^2 or
2^d). The function is designed to reward moves that worked well in deep searches more
than moves that worked well in shallow searches. Moves are then ordered by sorting
them in descending order of history values. The Killer Heuristic also rewards moves
that worked well in previous instances by maintaining a list of killer moves for each ply
of the search. Killer moves are determined by counters that track the number of times
a move has worked well in that ply, and are sorted and replaced using the values of the
counters.
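The update-and-sort cycle of the History Heuristic can be sketched as follows (illustrative Python; the move names and the d^2 weighting are one choice among those mentioned above):

```python
from collections import defaultdict

class HistoryTable:
    """History Heuristic sketch: reward moves that caused a cutoff,
    weighting deep searches more heavily (here with depth squared)."""

    def __init__(self):
        self.score = defaultdict(int)

    def reward(self, move, depth):
        self.score[move] += depth * depth   # 2 ** depth is another common choice

    def order(self, moves):
        """Sort candidate moves by descending history value."""
        return sorted(moves, key=lambda m: self.score[m], reverse=True)
```

A move rewarded once at depth 5 (value 25) then outranks a move rewarded at depths 2 and 3 (value 13), so deeper successes dominate the ordering.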
Static Move-Ordering
One example is the Static Exchange Evaluation (SEE) [Levy and Newborn, 1991], which analyzes capture sequences. The SOMA algorithm is more sophisticated
than SEE and takes other static tactical features such as pins, forks and mate
threats into account. While SOMA considers each attacked square at a time, SUPER-
SOMA extends the capture analysis to the whole board. SUPER-SOMA is considerably
slower than the other techniques and seems to have been successful in Shogi [Rollason,
2000].
2.2.5 Search Extensions
Instead of searching the game-tree to a fixed depth limit, it is possible to explore some
parts of the game-tree more deeply, while other parts of the tree are terminated early.
Humans do this intuitively by only searching moves that are deemed interesting and
ignoring moves that are deemed useless. We will outline the search extension techniques
used in programs.
Fractional-Ply Extensions
Fractional-ply extensions [Levy et al., 1989, Rijswijck, 2000, Björnsson and Marsland,
2000] are a search-extension scheme that allocates different weights (in terms of ply) to
different classes of moves. These weights can be fractional. For example, in Chess,
move classes can be checking moves and recaptures. During the search, each move is
categorized under one of the move classes, and the depth of the current move path is the
sum of the values of plies along the path. As the value of a ply can be less than (or more
than) 1.0, this allows certain move sequences to be searched deeper (or shallower).
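The depth accounting can be sketched directly (illustrative Python; the move classes and weights are hypothetical examples, not values from the thesis):

```python
# Hypothetical move classes and ply weights; a weight below 1.0 lets lines of
# that class be searched deeper, a weight above 1.0 shallower.
PLY_WEIGHTS = {'check': 0.5, 'recapture': 0.75, 'quiet': 1.0}

def depth_used(path, weights=PLY_WEIGHTS):
    """Depth consumed by a move path = sum of the ply weights along it."""
    return sum(weights.get(move_class, 1.0) for move_class in path)

def within_budget(path, depth_limit, weights=PLY_WEIGHTS):
    """A path may be searched further while its weighted depth is below the limit."""
    return depth_used(path, weights) < depth_limit
```

Under these weights a 4-ply budget admits a sequence of seven checking moves, whereas four quiet moves already exhaust it.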
Singular Extensions
Quiescence Search
Game-tree search algorithms are subject to the horizon effect - this occurs when a losing
board position lies just beyond the search horizon, and the evaluation function is
not sophisticated enough to recognize the losing move sequence at the leaf nodes. The
game-tree search proceeds by making a losing move, but returns a good evaluation. A
few moves later, the game-tree search is able to search to the losing board positions and
the evaluation returns the correct assessment that the current board position is losing.
Unfortunately, at this point, the game-playing program is unable to avoid a loss. This
is akin to the game-playing program walking down a plank towards the sharks while
blindfolded and unaware of the immediate danger it is in until it is too late.
Quiescence search mitigates the horizon effect by continuing the search even when
the given depth limit is reached until a “quiet” or more stable position is encountered.
In Chess, this would commonly refer to positions where there are no possible captures
and no player is in check. Quiescence search is therefore able to give a better estimate
of the board position, especially when using a poor evaluation function that is unable to account for pending tactical threats.
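The stand-pat logic can be sketched as follows (illustrative Python; `evaluate`, `noisy_moves` and `apply_move` are stand-ins for a real engine's static evaluation, generator of unquiet moves such as captures, and move application):

```python
def quiescence(pos, alpha, beta, evaluate, noisy_moves, apply_move):
    """Quiescence search sketch in negamax form."""
    stand_pat = evaluate(pos)      # the mover can usually do at least this well
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for move in noisy_moves(pos):  # only continue along unquiet moves
        score = -quiescence(apply_move(pos, move), -beta, -alpha,
                            evaluate, noisy_moves, apply_move)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha
```

On a toy position encoded as `(static_eval, noisy_successors)`, a favourable capture raises the score above the static evaluation, while an unfavourable one is ignored because the mover can simply stand pat.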
2.3 Chapter Conclusions
This chapter introduced the various search techniques of game-tree search within the
Minimax framework. We also described several search enhancement techniques such
as move-reordering and search extensions. In the next few chapters, we will expand on
the methods presented here to solve games (Chapter 5) and to perform forward pruning
(Chapter 6).
Chapter 3
Solving Tigers and Goats
This chapter has portions of text extracted from our article “Computing Tigers and Goats”
published in the International Computer Games Association Journal 27(3), pages 131-141,
2004.
In this chapter, we first introduce how games are classified and solved. We then
introduce the game of Tigers and Goats before computing the state-space complexity
and game-tree complexity of the game. We also solved the endgame of Tigers and
Goats by retrograde analysis and present statistics on the endgame database. Finally, we
discuss the complexity of solving Tigers and Goats.
3.1 Solving Games
Compared to research in real-time game-playing programs, there has been relatively less
work on solving games. We will outline some of the main results of solving games in
this section, but interested readers should refer to the excellent overview of solved games
by Van den Herik et al. [van den Herik et al., 2002].
1. Ultra-weakly Solved. The game-theoretic value of the initial position is known,
but no strategy that achieves this value is known.
2. Weakly Solved. Weakly solved games mean that a strategy is known to achieve the
game-theoretic value of the game from the initial position. Most of the research
done has been at this level, such as solving Nine Men’s Morris [Gasser, 1996],
Connect-Four [Allen, 1989], Go-moku [Allis, 1994] and Awari [Romein and Bal,
2003].
3. Strongly solved. Games are strongly solved if for all possible board positions,
a strategy is known to achieve the game-theoretic value of that board position.
This is the most computationally expensive level of solving a game, but it is also
the most useful for research and recreational purposes, as it allows a computer
to show the exact moves needed to achieve a specific result. For example, Awari
is strongly solved [Romein and Bal, 2003] as Romein and Bal computed several
databases that can be used together to select the best move from any legal board
position.
Classification by Complexity
In this section, we describe the complexity of games by using two different measures [Al-
lis, 1994], state-space complexity and game-tree complexity. It is useful to first define
the solution depth and solution search tree of a node.
The solution depth of a node J is the minimal depth (in plies) of a full-width search
that can determine the game-theoretic value of J. The solution search tree of a node J
is the full-width search tree with the same depth as the solution depth of J.
The state-space complexity of a game is the number of different legal positions in
the solution search tree of the initial position(s) of the game. Note that the same board
positions encountered when searching different branches of the game-tree are counted
once. For example, assume that a Chess position J with White to move has 10 legal
moves and for each legal move, Black has 5 legal moves, of which 1 mates White.
Furthermore, all moves lead to distinct positions. The solution search tree of J is the
full-width tree consisting of J, the 10 children of J, and the 50 grandchildren of J.
Consequently, the state-space complexity of J is 1 + 10 + 50 = 61.
The game-tree complexity of a game is the number of leaf nodes in the solution
search tree of the initial position(s) of the game. Note that two same board positions
encountered when searching different branches of the game-tree are counted twice. For
example, assume that a Chess position J with White to move has 10 legal moves and
for each legal move, Black has 5 legal moves, of which 1 mates White. The solution
search tree of J is the full-width tree consisting of J, the 10 children of J, and the 50
grandchildren of J. Consequently, the game-tree complexity of J is 50. The game-tree
complexity can be estimated using Monte Carlo simulations [Knuth, 1975].
Classification by game-type
Van den Herik et al. [van den Herik et al., 2002] introduced the concept of convergence
to describe games that have a state-space that decreases in size as the game progresses.
If the number of states increases as the game progresses, the game is said to be divergent.
Examples of convergent games are Nine Men’s Morris, Mancala games, Checkers, and
Tigers and Goats; examples of divergent games are Connect-Four, Go-Moku, Othello
and Go.
Search Techniques
The most basic method to solve games is to use an exhaustive search method like
Alpha-Beta search and its variants. Unfortunately, for all but the simplest games like
Tic Tac Toe, this method is infeasible due to the exponential increase in computational
effort needed to search through the state-space. To reduce the search complexity, it
is beneficial to use domain-independent techniques that exploit common features of
games. These are termed knowledge-based methods [van den Herik et al., 2002].
Examples of knowledge-based methods are threat-space search [Allis, 1994, Thomsen,
2000, Cazenave, 2001], proof-number search [Allis et al., 1994, Allis, 1994, Seo et al.,
2001], lambda search [Thomsen, 2000] and pattern search [Rijswijck, 2000].
Retrograde Analysis
Retrograde analysis enumerates positions backwards from the terminal positions of a game, and is therefore particularly suited to convergent games.
Retrograde analysis was pioneered in Chess, and has been successfully used in
Checkers, Nine Men’s Morris and Awari. Whenever a forward search reaches a database
position, an exact game-theoretic value of the position can be returned immediately.
The first step in retrograde analysis is to initialize all easily recognizable terminal
positions as wins or losses, and all remaining positions as draws. An iterative process
can then determine the correct value of all remaining positions by the following
logic - if a position is lost for the player to move, all predecessors of the position are wins
for the opponent. Similarly, if a position is won for the player to move, all predecessors
are potential losses for the opponent. They are only true losses if all of their successors
are also won for the player to move. See Appendix C for more details on retrograde
analysis.
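This backward induction can be sketched on an abstract game graph (illustrative Python; the position names and the successor-dict encoding are hypothetical, and Appendix C gives the full treatment):

```python
from collections import deque

def retrograde(successors):
    """Label each position 'W' (win), 'L' (loss) or 'D' (draw) for the player
    to move, given a dict mapping positions to their successor positions.
    A position with no moves is a loss for the player to move."""
    preds = {p: [] for p in successors}
    for p, succs in successors.items():
        for s in succs:
            preds[s].append(p)
    unresolved = {p: len(s) for p, s in successors.items()}
    label = {}
    queue = deque(p for p, s in successors.items() if not s)
    for p in queue:
        label[p] = 'L'                    # terminal: player to move has lost
    while queue:
        p = queue.popleft()
        for q in preds[p]:
            if q in label:
                continue
            if label[p] == 'L':
                label[q] = 'W'            # q can move to a position lost for the opponent
                queue.append(q)
            else:
                unresolved[q] -= 1
                if unresolved[q] == 0:    # every successor of q is won for the opponent
                    label[q] = 'L'
                    queue.append(q)
    return {p: label.get(p, 'D') for p in successors}  # cycles stay draws
```

Positions that are never resolved lie on cycles where neither side can force a result, which is exactly the draw case.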
Transposition Tables
The use of transposition tables is part of the trade-off between memory usage and com-
putational effort needed in game solving. By storing results in a transposition table, it is
possible to avoid recomputing board positions that have already been searched before.
However, a forward search proof has to address the so-called Graph History Interaction
(GHI) problem, which introduces errors in games that contain repeated positions. GHI
problems occur when a search algorithm uses a cached search result that depends on
the move history. For example, in Chess and Checkers, repeated positions are scored
as draws. If the cached result is reached using a different move sequence, it might be
possible that the result is no longer correct.
Because no cycles occur in the placement phase of Tigers and Goats, the GHI prob-
lem does not affect the forward search proofs presented in our solution for Tigers and
Goats. However, if the GHI problem can indeed occur in a game, forward search proofs
can use the techniques described in [Kishimoto and Müller, 2004] to avoid the GHI prob-
lem, or simply not store any position that has a value influenced by the move sequence
of the search in the transposition table.
Connection Games
In this section, we will discuss connection games. The aim of players in connection
games is to form a straight line of k pieces on the board. A simple example of a connec-
tion game is Tic Tac Toe, which can be shown to be a draw by exhaustively searching
the game-tree. Connection games typically have very high state-space complexity and
are divergent games, and therefore retrograde analysis is infeasible. Instead, knowledge-
based search techniques [Allis, 1994,Thomsen, 2000] have proven to be highly success-
ful in these types of games.
starting in any other column allows the second player to draw or even force a win.
2. Standard Go-Moku requires a row of exactly five stones to win; rows of six or
more, also known as overlines, do not count as a win.
3. Renju is played on a 15×15 board, and Black is not allowed to (1) form overlines,
(2) simultaneously form two rows of three stones that are not blocked by White’s
stone at either end, and (3) simultaneously form two rows of four stones. There
are also special rules for the opening.¹
¹ Interested users can refer to Renju International Federation: The International Rules of Renju (http://www.renju.nu/rifrules.htm) for more details.
Figure 3.2: Example Free-style Go-moku Game where Black wins by a sequence of
forced threats
The game has been shown to be a first-player win by Allis using proof number
search [Allis, 1994], which is a best-first AND/OR search. Free-style Go-Moku can
be won by the first player in 18 moves against the optimal defense.
Renju, which is played professionally in Japan, has been shown to be a first-player
win without opening rules [Wágner and Virág, 2001] using dependency-based search [Al-
lis, 1994], transposition tables, and expert knowledge for moves which are non-threats.
By using iterative-deepening search, the program was able to find threat sequences up
to 17 plies, which is sufficient to find the game-theoretic value of Renju.
k-in-a-row Games It is interesting to note that Tic Tac Toe, Connect-Four, Go-Moku
and Renju can be generalized into k-in-a-row games, where the goal is to obtain a
straight line of k pieces on the board. To be more precise, in Connect(m, n, k, p, q) games,
two players alternately place p stones on an m × n board, except that the first player
places q stones for the first move [Wu and Huang, 2005]. For example, Tic Tac Toe
is Connect(3,3,3,1,1), and free-style Go-Moku is Connect(15,15,5,1,1). Threat-based
searches perform very well in Connect(m,n,k,p,q) games, and many game-theoretic val-
ues of such games have been published [Uiterwijk and van den Herik, 2000, van den
Herik et al., 2002, Wu and Huang, 2005].
Mancala
Mancala is a family of board games that are generally played on a board with holes that
have pieces known as seeds in them. Players move by picking up all the seeds in a hole,
then sowing the seeds by placing one seed at each subsequent hole from the initial hole
until all seeds have been sowed. Capturing of seeds occurs based on the state of the board
and the rules vary widely between games.
Kalah Kalah is a game in the Mancala family and most variants of this game have been
solved and shown to be a first-player win in most cases [Irving et al., 2000]. Endgame
databases for Kalah were first built using retrograde analysis. By using iterative-deepening
MTD(f ), the program is able to solve several starting configurations of Kalah up to six
holes and five counters per hole. Search enhancement techniques such as move order-
ing, transposition tables, Futility Pruning, and enhanced transposition cut-off were also
used.
Awari Awari is also a variant of the Mancala family of games that allows ‘grand slams’
to end games. This allowed Bal and Romein [Romein and Bal, 2003] to use retrograde
analysis to create endgame databases. The scores for 889,063,398,406 positions were
determined by parallel retrograde analysis. The endgame databases store the scores
rather than best moves, and therefore some forward search is required [Lincke, 2002]
to find the correct move to make. By combining these huge endgame databases with
minimal forward search, they showed that the game-theoretic value of Awari is a draw.
Furthermore, as the endgame databases contain all positions that can occur in a game,
it is possible to obtain the game-theoretical score of every position given reasonable
resource constraints. This means that the game of Awari is strongly solved.
Nine Men’s Morris
Nine Men’s Morris is a two-player game that is played on a board with 24 intersections
consisting of interlocking squares. Each player has nine pieces and the objective of the
game is to reduce the opponent to fewer than three pieces, or to leave him without a legal move.
The game starts with an empty board and players alternate placing pieces on any
empty intersection. After all pieces have been placed by both players, players take turns
moving the pieces along one of the board lines to an adjacent intersection. If a move
by a player results in a line of three, also known as a mill, on the board, the player can
remove one enemy piece that is not part of a mill. An excellent position for a player is
to be able to move one piece between two mills, thereby removing an opponent piece at
every turn, as shown in Figure 3.4.
Figure 3.4: Example Nine Men’s Morris Game - White to move; Black wins
The game has been shown by Gasser [Gasser, 1996] to be a draw for both players
under best play. This was done by performing a forward search from the starting
board position and terminating at the endgame databases. Gasser first built the endgame
databases by using retrograde analysis. After removing unreachable positions and sym-
metrical positions, the total number of distinct positions in the endgame database is
7,673,759,269. An 18-ply forward search was then started from the starting position using a subset of the databases, and Gasser was able to show that both players have at least a
draw, therefore establishing that the game is a draw under optimal play.
Checkers Endgame
A team headed by Schaeffer has been working on Checkers endgame databases, and
has computed the game-theoretic values of all Checkers positions with up to 10
pieces [Schaeffer et al., 2005]. These endgame databases play a vital role in finding
the game-theoretic value of Checkers as the forward searches are able to terminate and
return an evaluation when an endgame position in the database is seen. For example,
the opening White Doctor has been shown to be a draw [Schaeffer et al., 2005] using a
hybrid combination of best-first and depth-first searches with the endgame databases. In
solving the traditional game, the researchers have also solved 21 of the 156 three-move
openings. By using these endgame databases and forward search, Checkers is weakly
solved [Schaeffer et al., 2007] (although it has elements of being strongly-solved due to
storing the proof-tree of the solution), and is shown to be a draw from the initial starting
position.
Chess Endgame
A Chess endgame database contains all possible endgame positions with small groups
of material, and their game-theoretic value of win, lose or draw. This allows a forward
search in a computer Chess program to return a game-theoretical result without search-
ing further. In addition, Chess endgame databases have been used to verify or change
historical endgame analysis by human Chess experts.
The first Chess databases were the four and five piece Chess endgame databases
[Thompson, 1986] computed by Thompson using retrograde analysis in 1986. As of
2006, all endgame positions with 6 or fewer pieces (including the two kings) have been
completely solved [Bourzutschky et al., 2005].
3.2 Introduction to Tigers and Goats
Bagha Chal, or “Moving Tiger”, is an ancient Nepali board game, which has recently attracted attention among game fans under the name Tigers and Goats. This game between
two opponents, whom we call “Tiger” and “Goat”, is similar in concept to a number of
other asymmetric games played around the world - asymmetric in the sense that the op-
ponents fight with weapons of different characteristics, a feature whose entertainment
value has been known since the days of Roman gladiator combat.
On the small, crowded board of 5 × 5 grid points shown in Figure 3.5, four tigers
face up to 20 goats. A goat that strays away from the safety of the herd and ventures
next to a tiger gets eaten, and the goats lose if too many of them get swallowed up. A
tiger that gets trapped by a herd of goats is immobilized, and the tigers lose if none of
them can move. Various games share the characteristic that a multitude of weak pieces
tries to corner a few stronger pieces, such as “Fox and Geese” in various versions, as
described in “Winning Ways” [Berlekamp et al., 2001] and other sources.
The rules of Tigers and Goats are simple. The game starts with the four tigers placed
on the four corner spots (grid points), followed by alternating moves with Goat to play
first. In a placement phase, which lasts 39 plies, Goat drops his 20 goats, one on each
move, on any empty spot. Tiger moves one of his tigers according to either of the
following two rules:
• a tiger can slide from his current spot to any empty spot that is adjacent and con-
nected by a line, or
• a tiger may jump in a straight line over any single adjacent goat, thereby killing
the goat (removing it from the board), provided the landing spot beyond the goat
is empty.
If Tiger has no legal move, he loses the game; if a certain number of goats have been
killed (typically five), Goat loses.
These rules are illustrated in Figure 3.6, which also shows that Goat loses a goat
within 10 plies unless his first move is on the center spot of a border. Goat has five
distinct first moves (ignoring symmetric variants). All but the first move shown in Figure
3.5 lead to the capture of a goat within at most 10 plies, as these two forcing sequences
show. At the right of Figure 3.6, Tiger’s last move eight sets up a double attack against two goats.
Figure 3.5: Left: the position after the only (modulo symmetry) first Goat move that
avoids an early capture of a goat. Right: puzzle with Goat to win in 5 plies if Tiger
captures a goat
Figure 3.6: Two of the five initial Goat moves that lead to the capture of a goat
The 39-ply placement phase is followed by the sliding phase that can last forever.
Whereas the legal Tiger moves remain the same, the Goat rule changes: on his turn to
play, Goat must slide any of his surviving goats to an adjacent empty spot connected by
a line. If there are 17 or fewer goats on the board, four tigers cannot block all of them
and such a move always exists. In some exceptional cases (which arise only if Goat
cooperates with Tiger) with 18 or more goats, the four tigers can surround and block off
a corner and prevent any goat moves. Since Goat has no legal moves, he loses the game.
Although various web pages that describe Tigers and Goats offer some advice on
how to play the game, we have found no expert know-how about strategy and tactics.
Plausible rules of thumb about good and bad play include the following. First, it is
obvious that the goats have to hug the border during the placement phase - any goat
that strays into the center will either get eaten or cause the demise of some other goat.
Goat’s strategy sounds simple: first populate the borders, and when at full strength, try
to advance in unbroken formation, in the hope of suffocating the tigers. Unfortunately,
this recipe is simpler to state than to execute. In contrast, we have found no active Tiger
strategy. It appears that the tigers cannot do much better than to wait, “doing nothing”,
i.e. moving back and forth, until near the end of the placement phase. Their goal is to
stay far apart from each other, for two reasons: one, in order to probe the full length of
the goats’ front line for gaps; two, so as to make it hard for the goats to immobilize all the
four tigers at the same time. Tiger’s big chance comes during the sliding phase, when
the compulsion to move causes some goat to step forward and offers Tiger a forcing
sequence that leads to capture. Thus, it seems that Tiger’s play is all tactics, illustrating
Chess grandmaster Tartakover’s famous pronouncement: “Tactics is what you do when
there is something to do. Strategy is what you do when there is nothing to do”.
3.3 Analysis of Tigers and Goats
This section discusses the size and structure of the two relevant state-spaces, corresponding
to the placement phase and the sliding phase. We compute their sizes using Polya’s
theory of counting (see Appendix B). We also describe the sliding phase databases and
the placement phase forward searches respectively, with statistical summaries.
The first objective when attacking any search problem is to learn as much as possible
about the size and structure of the state-space in which the search takes place. For Tigers
and Goats it is convenient to partition this space into 6 subspaces:
S0 : all the positions that can occur during the placement phase,
including 4 tigers and 1 to 20 goats.
Sk : for k = 1 . . . 5, all the positions that can occur during the sliding phase,
with 4 tigers, 21 − k goats, and k empty spots on the board.
Notice that any position in any S1 to S5 visually looks exactly like some position in S0 ,
yet the two are different positions: in S0 , the legal Goat moves are to drop a goat onto
an empty spot, whereas in S1 to S5 , the legal moves are to slide one of the goats already
on the board. For each subspace S1 to S5 , a position is determined by the placement of
pieces on the board, which we call the board image, and by the player whose turn it is
to move. Thus, the number of positions in S1 to S5 is twice the number of distinct board
images.
For S0 , however, counting positions is more difficult, since the same board image can
arise from several different positions, depending on how many goats have been captured.
As an example consider an arbitrary board image in S5 , i.e. with 16 goats and 5 empty
spots. This same board image could have arisen, as an element of S0 , from 10 different
positions, in which 0, 1, 2, 3, or 4 goats have been captured, and in each case, it is
either Tiger’s or Goat’s turn to move. Although for board images in S1 through S4 the
multiplier is less than 10, these small subspaces do not diminish the average multiplier
by much. Thus, we estimate that the number of positions in S0 is close to 10 times the
number of board images in S0 , which amounts to about 33 billion.
Since the game board has all the symmetries of a square that can be rotated and
flipped, many board positions have symmetric “siblings” that behave identically for all
game purposes. Thus, all the spaces S0 to S5 can be reduced in size by roughly a factor of
8, so as to contain only positions that are pairwise inequivalent. Using Polya’s counting
theory [Polya, 1937] we computed the exact size of the symmetry-reduced state-spaces
S1 to S5 , and of the board images of S0 , as shown in Table 3.1.
Table 3.1: Number of distinct board images and positions for corresponding subspaces
S0 is very much larger than all of S1 to S5 together, and has a more complex structure.
Due to captures during the placement phase, play in S0 can proceed back and forth
between more or fewer goats on the board, whereas play in the sliding phase proceeds
monotonically from Sk to Sk+1 . These two facts suggest that the subspaces be analyzed
differently: S1 to S5 are analyzed exhaustively using retrograde analysis, whereas S0 is
probed selectively using forward search [Gasser, 1996].
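In outline, retrograde analysis is a backward induction over the game graph, iterated to a fixpoint. The following is a minimal sketch on an abstract graph, not the actual implementation; position encoding, captures and the monotone Sk-to-Sk+1 structure are abstracted into a successor function:

```python
def retrograde_analysis(positions, successors):
    """Label every position WIN/LOSS/DRAW for the player to move.

    Positions with no legal moves are immediate losses (as in Tigers and
    Goats, where a player who cannot move loses).  Iterate to a fixpoint:
    a position is a WIN if some successor is a LOSS for the opponent, and
    a LOSS if every successor is a WIN for the opponent.  Positions never
    resolved by the fixpoint can be held forever and are DRAWs.
    """
    value = {p: "LOSS" for p in positions if not successors(p)}
    changed = True
    while changed:
        changed = False
        for p in positions:
            if p in value:
                continue
            succ = [value.get(s) for s in successors(p)]
            if "LOSS" in succ:
                value[p] = "WIN"
                changed = True
            elif all(v == "WIN" for v in succ):
                value[p] = "LOSS"
                changed = True
    for p in positions:
        value.setdefault(p, "DRAW")
    return value
```

On a four-position example with one dead end and one cycle, the dead end is a loss, its predecessor a win, and the cycle a draw.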
More detailed statistics of the distribution of won, drawn and lost positions can be found
in Appendix A.
Table 3.2: Distribution of won, drawn and lost sliding-phase positions by number of
goats captured (percentages are of the row total)

Goats                    Goat to move                                     Tiger to move
captured     Wins        Draws        Losses       Total        Wins        Draws       Losses       Total
4         913,153    8,045,787   23,229,230   32,188,170   30,469,634    1,569,409      149,127   32,188,170
           (2.8%)      (25.0%)      (72.2%)       (100%)      (94.7%)       (4.9%)       (0.5%)       (100%)
3       1,315,111    6,226,358    1,928,496    9,469,965    6,260,219    2,918,104      291,642    9,469,965
          (13.9%)      (65.7%)      (20.4%)       (100%)      (66.1%)      (30.8%)       (3.1%)       (100%)
2         882,523    1,199,231       23,941    2,105,695      465,721    1,353,969      286,005    2,105,695
          (41.9%)      (57.0%)       (1.1%)       (100%)      (22.1%)      (64.3%)      (13.6%)       (100%)
1         252,381       80,706           88      333,175        6,452      197,537      129,186      333,175
          (75.8%)      (24.2%)      (0.03%)       (100%)       (1.9%)      (59.3%)      (38.8%)       (100%)
0          30,609        2,812           60       33,481          146        9,468       23,867       33,481
          (91.4%)       (8.4%)       (0.2%)       (100%)       (0.4%)      (28.3%)      (71.3%)       (100%)
The search space S0 , with approximately 33 billion positions, is too large for a static
data structure that stores each position exactly once. Hence it is generated on the fly,
with portions of it stored in hash tables. As a consequence, the same position may be
generated and analyzed repeatedly. A worst case measure of the work thus generated is
called game-tree complexity. The size of the full search tree can be estimated by a Monte
Carlo technique as described by [Knuth, 1975]. For each of a number of random paths
from the root to a leaf, we evaluate the quantity F = 1 + f1 + f1 × f2 + f1 × f2 × f3 + . . .,
where fj is the fan-out, i.e. the number of children, of the node at level j encountered
along this path. The average of these values of F , taken over the random paths sampled, is
the expected number of nodes in the full search tree. Table 3.3 lists the estimated game-
tree complexity (after the removal of symmetric positions) of five different “games”,
where the game ends by capturing one to five goats during the placement phase. These
estimates are based on 100,000 path samples.
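This path-sampling estimator can be sketched as follows; the `children` callback abstracts the move generator and is an assumed interface:

```python
import random

def knuth_tree_size(root, children, samples=100_000):
    """Estimate the number of nodes in a tree (Knuth, 1975).

    Along one random root-to-leaf path, accumulate
    F = 1 + f1 + f1*f2 + f1*f2*f3 + ..., where fj is the fan-out of the
    node met at level j.  The mean of F over many sampled paths is an
    unbiased estimate of the total node count.
    """
    total = 0.0
    for _ in range(samples):
        node, product, f_value = root, 1, 1
        while children(node):
            kids = children(node)
            product *= len(kids)      # f1 * f2 * ... * fj so far
            f_value += product        # add the level-j term of F
            node = random.choice(kids)
        total += f_value
    return total / samples
```

On a tree with uniform fan-out the estimate is exact for every sampled path; for irregular trees the variance can be large, which is why many samples (100,000 in the text) are taken.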
Table 3.4 is reproduced from [van den Herik et al., 2002] and compares the state-space
complexity and game-tree complexity of several games together with the placement and
sliding phases of Tigers and Goats (discussed in Chapter 3). One conclusion reached by
[van den Herik et al., 2002] is that the determining factor in solving a game is either low
state-space complexity or low game-tree complexity. In general, brute-force methods are
used to solve games like Nine Men’s Morris, Kalah and Awari with relatively low state-
space complexity. Knowledge-based methods are used to solve games like Go-Moku
and Renju with relatively low game-tree complexity. Finally, games like Connect-Four
and Qubic with a low state-space complexity and a low game-tree complexity are solved
by both methods.
3.4 Complexity of Solving Tigers and Goats
As we have solved the sliding phase of Tigers and Goats by retrograde analysis, we
are primarily concerned with the complexity of solving the placement phase of Tigers
and Goats. While the placement phase of Tigers and Goats has low state-space com-
plexity, its game-tree complexity is larger than that of Awari and Checkers. In addition,
unlike the solving of Nine Men’s Morris [Gasser, 1996], we do not have the benefit of
an existing strong program that can play the game well, or the availability of experts to
guide us in constructing heuristics for the game. The moderately high game-tree com-
plexity of Tigers and Goats therefore presents a challenge to finding the game-theoretic
value of the game.
3.5 Chapter Conclusions
In this chapter, we introduced the game of Tigers and Goats and analyzed its state-
space complexity and game-tree complexity. In addition, we solved the sliding phase
(endgame phase) of Tigers and Goats by computing the game-theoretical values of all
88,260,972 positions. The complexity of solving Tigers and Goats was compared with
other games and we found that solving the placement phase of the game is computationally
hard but within the current state of the art.
Chapter 4
Goat has at least a Draw
Consider a computer system built to the specifications of DEEP BLUE so that it can
search 200 million positions per second in the game of Tigers and Goats. It would still
take approximately 1.5 × 10^25 years to completely traverse the game-tree of Tigers and
Goats, whose size is estimated at 10^41 . The computationally hard problem of finding the
game-theoretic value of Tigers and Goats clearly requires a more directed search.
However, the naïve idea of performing selective Minimax search for both players
accomplishes little - if both players’ moves are forward pruned during the search, then
the result is neither a lower nor upper bound on the true game-theoretic value. If, on the
other hand, we forward prune only Goat moves during search, then the result returned
is a lower bound on the true game-theoretic value for Goat. This is simple to see once
we note that the moves forward pruned during search might lead to better results for
Goat. Similarly, if we forward prune only Tiger moves, then the result returned is a
lower bound on the game-theoretic value for Tiger.
We attempt to show that the game is at least a draw for Goat as the higher branching
factor of Goat means that the reduction in computational effort if Goat is selectively
searched is potentially much larger than if Tiger is selectively searched. This requires
good heuristics to differentiate between “good” and “bad” Goat moves during the search.
The idea of incorporating domain-specific knowledge to speed up brute-force meth-
ods to solve games is not new - games like Connect-Four [Allis, 1994], Nine Men’s
Morris [Gasser, 1996], Go-Moku [Allis, 1994] and Kalah [Irving et al., 2000] have been
solved using a combination of brute-force and knowledge-based methods. However,
given our lack of access to human expertise, we used evolutionary computing to learn
heuristic players.
In this chapter, we show how we developed these heuristic players and how we used
them to perform forward pruning in order to reduce the game-tree complexity sufficiently
to be able to show that Goat has at least a draw.
4.1 Cutting Search Trees in Half
A 39-ply search with a branching factor that often exceeds a dozen legal moves is a big
challenge. Therefore, the key to successful forward searches through the state-space S0
of the placement phase is to replace a 39-ply search with a number of carefully designed
searches that are effectively only 20 plies deep. This is achieved by 1) formulating
the hypotheses of the type “player X can achieve result Y”, such as “Goat has at least
a draw”, 2) programming a competent and efficient heuristic player X that generates
only one or a few candidate moves in each position, and 3) confronting the selective
player X with his exhaustive opponent who tries all his legal moves. If this search
that alternates selective and exhaustive move generation succeeds, the hypothesis Y is
proven. In the ideal scenario, the search tree is “cut in half” to a 20-ply search if the first
move suggested by the selective player X succeeds against all his exhaustive opponent’s
legal moves.
If the search fails, one may try to develop a stronger heuristic player X, or weaken
the hypothesis, e.g. from “X wins” to “X can get at least a draw”. Using such searches
designed to verify a specific hypothesis we were able to prove several results including
the following: 1) Tiger can force the capture of a single goat within 39 plies, i.e. within
the placement phase, and 2) Tiger can force the capture of two goats within 40 plies, i.e.
by the end of the placement phase, but not earlier.
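In outline, such a hypothesis test is an AND/OR search: player X's nodes succeed if any heuristic candidate works, while the exhaustive opponent's nodes succeed only if X survives every legal reply. A sketch, with the move-generation callbacks as assumed interfaces:

```python
def proves(pos, depth, x_to_move, candidates, all_moves, holds_at_leaf):
    """True iff the hypothesis "player X achieves result Y" is proven from pos.

    candidates(pos) yields the one or few moves suggested for X by the
    heuristic player (OR node: any success suffices); all_moves(pos) yields
    every legal reply of the exhaustive opponent (AND node: X must survive
    them all); holds_at_leaf decides Y at the search horizon.
    """
    if depth == 0:
        return holds_at_leaf(pos)
    if x_to_move:
        return any(proves(s, depth - 1, False, candidates, all_moves, holds_at_leaf)
                   for s in candidates(pos))
    return all(proves(s, depth - 1, True, candidates, all_moves, holds_at_leaf)
               for s in all_moves(pos))
```

In the ideal case candidates() returns a single move at every X node, so only the opponent branches and the effective depth is halved, as described above.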
In order to make these searches feasible we had to develop strong heuristic players.
Given our lack of access to human expertise, we developed player programs that learn
from experience by being pitted against each other. For example, the proof that Tiger
can kill a certain number of goats requires a strong Tiger that tries to overcome an
exhaustive Goat. Conversely, the proof that Goat has a drawing strategy after the most
plausible opening requires a strong heuristic Goat that defies an exhaustive Tiger.
Since Goat usually has more legal moves than Tiger, it seemed prudent to prove that
Goat has at least a draw based on the winning criterion of capturing five goats by focus-
ing on developing good heuristic Goat players. Recently, Chellapilla and Fogel imple-
mented a co-evolutionary system [Chellapilla and Fogel, 2001b] using neural networks
as evaluation functions that taught itself to play Checkers at expert level. Co-evolution is
a flexible method for learning strategies for complex games where there is little available
information about the domain. In the absence of expert knowledge for Tigers and Goats,
we used co-evolutionary computing to learn a heuristic Goat player.
Co-evolutionary techniques have been applied to several games in the past, including
Chess [Kendall and Whitwell, 2001], Go [Lubberts and Miikkulainen, 2001], and Oth-
ello [Moriarty and Miikkulainen, 1995]. However, possibly the most successful demon-
stration of the machine learning capability of evolutionary computation in the creation
of a game-playing program was performed by Kumar Chellapilla and David Fogel in the
creation of their Checkers program Anaconda [Chellapilla and Fogel, 1999, Chellapilla
and Fogel, 2001b, Fogel, 2002]. This program, created using co-evolution of artificial
neural networks without expert knowledge, is able to play Checkers at expert level, and
is a success story for evolutionary computing.
4.2 Neural Network Architecture
The setup of our neural networks follows the one used in [Chellapilla and Fogel, 1999],
but instead of restricting the evaluation function inputs to the raw board, we selected
26 features (shown in Figure 4.1) of the board position as inputs. Examples of the
selected features include whose turn to move, the number of goats captured so far, the
number of tigers which are currently trapped (i.e., unable to move), the number of legal
tiger moves, and the number of goat moves that avoid immediate capture. An initial
population of neural networks is generated with random weights. Each neural network
competes against a fixed number of randomly chosen opponents and its fitness score
is determined by the results of these games. After every neural network has played its
games, the “fittest” or highest-scoring networks are retained, while the rest are removed.
Each of the retained neural networks creates an offspring by mutating its weights. The
process is then repeated until the desired number of iterations is reached.
The architecture of each neural network is illustrated in Figure 4.1: (1) the inputs
have 24 nodes representing the current board position and there are 2 inputs directly
linked to the output node, (2) there are two hidden layers and they consist of 12 and
five nodes, respectively, and (3) the single output node gives the evaluation of the board.
The transfer function of each node in the neural network is the hyperbolic tangent (tanh
bounded by ±1). Hence an output value closer to 1.0 denotes a better position for the
player to move, and a value closer to -1.0 denotes a worse position.
Formally, we define the output o of a neuron with weights {w1 , w2 , . . . , wN } and
inputs {x1 , x2 , . . . , xN } by

    o = tanh( Σ_{i=1}^{N} wi xi + σ )                                    (4.1)
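Equation (4.1) corresponds to the following computation for a single node (a sketch with weights and inputs as plain lists):

```python
import math

def neuron_output(weights, inputs, sigma):
    """o = tanh(sum_i w_i * x_i + sigma).  tanh bounds the output by
    ±1, so values near +1 denote better positions for the player to
    move and values near -1 denote worse ones."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)) + sigma)
```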
Figure 4.1: Neural Network Architecture of Tigers and Goats evaluation function
When initializing the neural networks, the connection weights are generated ran-
domly from a uniform distribution over [-0.2, 0.2]. Each connection also has an associ-
ated threshold σi (j), for j = 1, . . . , Nw , all initially set to 0.05. Reproduction
is achieved solely via mutation (no crossover operation is used). Specifically, for each
parent Pi , an offspring Pi′ is created by mutating each of the parent’s weights.
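The mutation equations themselves are not reproduced here; the sketch below shows a standard self-adaptive Gaussian mutation in the style of [Chellapilla and Fogel, 1999], and its details (the tau schedule in particular) are assumptions rather than the exact scheme used:

```python
import math
import random

def mutate(weights, sigmas):
    """Create one offspring by self-adaptive Gaussian mutation, with no
    crossover: each step size sigma is first perturbed log-normally, then
    the corresponding weight is jittered by sigma' * N(0, 1).  The
    1/sqrt(2*sqrt(n)) schedule for tau is the common choice and is an
    assumption here."""
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(len(weights)))
    offspring_w, offspring_s = [], []
    for w, s in zip(weights, sigmas):
        s_new = s * math.exp(tau * random.gauss(0.0, 1.0))
        offspring_w.append(w + s_new * random.gauss(0.0, 1.0))
        offspring_s.append(s_new)
    return offspring_w, offspring_s
```

Because the step sizes evolve along with the weights, mutation strength adapts per connection without any hand-tuned annealing schedule.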
For our experiments, the resultant neural networks are used to perform move ordering
and forward pruning during a search to prove the hypothesis “Goat can at least draw the
game”. Pitting a heuristic Goat player against all attacks by Tiger breaks into six cases
as shown in Figure 4.2. However, as mentioned in Chapter 3, only the move at the middle
of the edge avoids an early capture. We therefore started the searches in our experiments
by first placing a goat in the middle of the edge.
Figure 4.2: Goat has six symmetrically distinct initial moves (highlighted)
The top neural network obtained after 1,000 generations is used as a move ordering
mechanism in a search to prove that Goat can at least draw the game. At each Goat-to-
move node, moves are re-ordered according to a two-ply shallow search using the neural
network as the evaluation function. Only the top three Goat moves are then tried, and
the rest of the legal moves are pruned.
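This pruning step can be sketched as follows, where `shallow_value` stands in for the two-ply search with the neural network as evaluation function (an assumed interface):

```python
def prune_goat_moves(children, shallow_value, keep=3):
    """Forward pruning at a Goat-to-move node: rank the
    (move, successor_position) pairs by the value of a shallow search
    from each successor, keep the top `keep` moves and prune the rest."""
    ranked = sorted(children, key=lambda mc: shallow_value(mc[1]), reverse=True)
    return [move for move, _ in ranked[:keep]]
```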
4.3 Variants of Learning Heuristic Players
We experimented with four co-evolutionary setups as shown in Table 4.2. These
setups are combinations of two features - (1) the number of populations (OnePop or
TwoPop) and (2) whether the search depths are the same for both players or biased
against Goat (Normal or Biased).
Table 4.2: The four co-evolutionary setups

Name           Description
OnePopNormal   A single population of 30 neural networks is created. When playing
               as Tiger, the search is limited to depth 4; when playing as Goat,
               the search is limited to depth 4.
OnePopBiased   A single population of 30 neural networks is created. When playing
               as Tiger, the search is limited to depth 4; when playing as Goat,
               the search is limited to depth 2.
TwoPopNormal   Two populations of 20 neural networks each are created; one is
               designated as the evaluation functions for Goat and the other for
               Tiger. When playing as Tiger, the search is limited to depth 4;
               when playing as Goat, the search is limited to depth 4.
TwoPopBiased   Two populations of 20 neural networks each are created; one is
               designated as the evaluation functions for Goat and the other for
               Tiger. When playing as Tiger, the search is limited to depth 4;
               when playing as Goat, the search is limited to depth 2.
Each neural network plays against four randomly chosen neural networks of the
opposite player. Similar to [Chellapilla and Fogel, 1999], we employ an asymmetric
scoring system where we reward a win more than we penalize a loss by awarding +2
to a win, 0 to a draw and -1 to a loss. The fitness of each neural network is the sum of
the scores achieved after playing the four opponents. For single population setups, the
top 15 neural networks are retained for the next generation. For two population setups,
the top 10 neural networks of each population are retained for the next generation. Each
retained neural network generates one offspring by mutating its weights. This is the
end of one generation and the process is repeated until 1,000 generations have been
produced.
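One generation of the two-population setup can be sketched as follows; `result_for(player, opponent)`, which plays one game and returns 'win', 'draw' or 'loss', is an assumed interface standing in for a full game of Tigers and Goats:

```python
import random

SCORE = {"win": 2, "draw": 0, "loss": -1}  # asymmetric: a win is rewarded more

def fitness(player, rivals, result_for, games=4):
    """Sum of scores from games against `games` randomly chosen members
    of the opposing population."""
    return sum(SCORE[result_for(player, random.choice(rivals))]
               for _ in range(games))

def next_generation(population, rivals, result_for, mutate, keep=10):
    """Retain the `keep` fittest networks; each survivor produces one
    mutated offspring, so the population size stays at 2 * keep."""
    ranked = sorted(population,
                    key=lambda p: fitness(p, rivals, result_for),
                    reverse=True)
    survivors = ranked[:keep]
    return survivors + [mutate(p) for p in survivors]
```

Repeating this step 1,000 times, with the two populations taking turns as `population` and `rivals`, reproduces the overall loop described above.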
4.4 Performance of Heuristic Players
Of the four co-evolutionary setups shown in Table 4.2, only TwoPopBiased succeeded
in proving that the initial position shown on the left of Figure 3.5 is at least a draw
for Goat. In this experiment, we used a simple synchronous Alpha-Beta search [Hopp
and Sanders, 1995] to perform parallel tree search. Using 10 Pentium 4 Xeons, each
process managing its own hash table of 8,000,000 entries, the search took 6 months to
prove this position to be at least a draw for Goat. The other three setups failed during
the search, i.e. Tiger manages to find a winning sequence against the defensive moves
suggested by the neural network.
Figure 4.3: Average scores of all neural networks of each generation in TwoPopNormal
It is useful to see how the biased search depths affect the co-evolution of the neural
networks. In Figure 4.3, we see that Tiger and Goat are comparable in relative play-
ing ability across the generations during co-evolution in TwoPopNormal. Normally,
this is desirable as the dynamics of competitive co-evolution [Rosin and Belew, 1997]
would result in an “arms-race” where both players try to outmaneuver each other by
learning better strategies. However, an aggressive Goat player that tries to win by mak-
ing risky moves would not likely be a good heuristic to use in proving that the game is
at least a draw for Goat. Notice that the average scores of all neural networks are usually
above 0. This is due to the unbalanced scoring system that assigns +2 to wins, 0 to draws
and -1 to losses.
The plot of average scores for TwoPopBiased in Figure 4.4 shows that the higher
search depth of the heuristic Tiger player resulted in Tiger initially winning more often
during the evolutionary process. However, Goat soon learnt to draw the game despite
having the handicap of a lower search depth. The handicap during the evolutionary
process prepares the neural network for Goat to handle opponents that search deeper
than it does, and therefore the neural network is still able to achieve a draw. This is
vital, as the top neural network for Goat after 1,000 generations is used to perform move
ordering and forward pruning during the search.
The plots for OnePopNormal and OnePopBiased are not shown as they provide little
insight into the individual playing ability of either player. From the results, we see that
while the single population mechanics of [Chellapilla and Fogel, 1999] is well suited to
learning evaluation functions for both players, it is difficult to use the same mechanism
to create an effective heuristic.
Figure 4.4: Average scores of all neural networks of each generation in TwoPopBiased
4.5 Chapter Conclusion
Portions of this chapter will appear in the forthcoming book Games of No Chance 3,
edited by Michael Albert and Richard Nowakowski.
We were unable to discover easily formulated advice to players beyond plausible rules-
of-thumb such as “goats cautiously hug the border, tigers patiently wait to spring a sur-
prise attack”. On the other hand, our database explains the seemingly arbitrary number
“five” in the usual winning criterion “Tiger wins when five goats have been killed”. This
magic number “five” must have been observed as the best way to balance the chances.
5.2 Tiger has at least a Draw
We know that Tiger can kill some goats, so Tiger’s challenge must be more ambitious
than “kill any one goat”. On the other hand, we see from Table 3.2 that there is a sig-
nificant jump in the number of lost positions for Goat from three goats captured to four
goats captured. It is therefore fairly safe to conjecture that once half a dozen goats are
gone, they are all gone - Goat lacks the critical mass to put up resistance. But as long as
there are at least 16 goats on the board, i.e. at most four goats are captured, the herd is
still large enough to have a chance to trap the tigers.
Table 3.2 also shows that unless Tiger succeeds in capturing at least two goats during
the placement phase, he has practically no chance of winning. If he enters the sliding
phase facing 19 goats, less than 2% of all positions are won for Tiger, regardless of
whether it is his turn to move or not. The fact that Tiger can indeed force the capture of
two goats within 40 plies, i.e. by the end of the placement phase, as stated in Section
4.1, is another example of how well-balanced the opponents’ chances are.
The result that Goat has at least a draw had brought us tantalizingly close to determining
the game-theoretic value of Tigers and Goats. Computing the endgame database had
been relatively straightforward, but the 39-ply forward search had not yielded to the ju-
dicious application of established techniques. Experience had shown that by “cutting the
tree in half” as described in Section 4.1, i.e. approximating a 39-ply search by various
19-ply and 20-ply searches, we were able to answer a variety of questions. It appeared
plausible that by formulating sufficiently many well-chosen hypotheses this approach
would eventually yield a complete analysis of the game. We conjectured that Tiger also
has a drawing strategy, and set out to try to prove this using the same techniques that had
yielded Goat’s drawing strategy.
The asymmetric role of the two opponents, however, made itself felt at this point:
the searches pitting a heuristic Tiger player against an exhaustive Goat progressed no-
ticeably more slowly than those involving a heuristic Goat versus an exhaustive Tiger.
In retrospect we interpret this different behavior as due to the phenomenon “Tiger’s play
is all tactics”. Positional considerations - keep the goats huddled together - make it easy
to generate one or a few “probably safe” Goat’s moves, even without any look-ahead at
the immediate consequences. For Tiger, on the other hand, neither we nor, apparently,
the trained neural network player succeeded in recognizing “good moves” without a
local search. An attempt to make Tiger a stronger hunter (by considering the top 3 moves
suggested by the neural network followed by a few plies of full-width search) is incon-
sistent with the approach of “cutting the tree in half” and made the search unacceptably
slow.
Thus, a new approach had to be devised. The experience that 20-ply forward searches
proved feasible suggests a more direct approach: compute a database of positions of
known value halfway down the search tree. Specifically, we define a halfway position as
one arising after 19 plies, i.e. after the placement of 10 goats, with Tiger to move next.
The value of any such position can be computed with a search that ends in the endgame
database after at most 20 plies. If sufficiently many such “halfway positions” are known
and stored, searches from the root of the tree (the starting position of the game) will run
into them and terminate the search after at most 19 plies.
The problem with this approach is that the number of halfway positions is large,
even after symmetric variants have been eliminated. Because of captures not all 10 goats
placed may still be on the board, hence a halfway position has anywhere between 6 and
10 goats, and correspondingly, 15 to 11 empty spots. Using the terminology of Section
3.3, the set of halfway positions is (perhaps a subset of) the union of S11 , S12 , S13 ,
S14 and S15 , where Sk is the set of all symmetrically inequivalent positions containing 4
tigers, 21 − k goats, and k empty spaces. S11 , with about as many goats as empty
spots, is particularly large. On the assumption that in any subspace Sk the number of
symmetrically inequivalent positions is close to 1/8 of the total, S11 contains about 550
million inequivalent positions. The union of S11 through S15 contains about 1.6 × 109
positions. This number is about 25 times larger than the largest endgame database we
had computed before, namely S5 .
The approach to overcome the problem of constructing a large halfway database
exploits two ideas. First, the database of halfway positions of known value need not
necessarily include all halfway positions. In order to prove that Tiger has a drawing
strategy, the database need only include a sufficient number of positions known to be
drawn or a win for Tiger so that any forward search is trapped by the filter of these
positions. Second, the database of halfway positions is built on the fly: whenever a
halfway position is encountered whose value is unknown, this position is entered into
the database and a full-width search continues until its value has been computed.
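The on-the-fly construction amounts to memoizing halfway positions; a sketch, where `full_width_search` stands in for the at-most-20-ply search down to the endgame database:

```python
def halfway_value(pos, halfway_db, full_width_search):
    """Value of a 19-ply halfway position.  On first encounter the value
    is computed by a full-width search into the endgame database and
    stored; later forward searches from the root terminate here, so the
    database grows only with positions that are actually reached."""
    if pos not in halfway_db:
        halfway_db[pos] = full_width_search(pos)
    return halfway_db[pos]
```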
Although there was no a priori certainty that this approach would terminate within a
reasonable time, trial and error and repeated program optimization over a period of five
months led to success. Table 5.1 contains the statistics of the halfway database actually
constructed. For each of S15 through S11 , it shows the number of positions whose value
was actually computed, broken down into the two categories relevant from Tiger’s point
of view, win-or-draw vs. loss.
Although the construction of the halfway database is intertwined with the forward
searches, i.e. a position is added and evaluated only as needed, logically it is clearest to
separate the two. We discuss details of the forward searches in the next section.
Table 5.1: Halfway database statistics: the number of positions computed and their value
from Tiger’s point of view: win-or-draw vs. loss
5.3 Implementation, Optimization, and Verification
Our investigation of Tigers and Goats has been active, on and off, for the past three
years. The resources used have varied from a Pentium 4 personal computer to a cluster
of Linux PC workstations. Hundreds of computer runs were used to explore the state-
space, test and confirm hypotheses, and verify results. The longest continuous run lasted
for five months as a background process on an Apple PowerMac G5 used mainly for web
surfing.
The algorithmic search techniques used are standard, but three main challenges must
be overcome in order to succeed with an extensive search problem such as Tigers and
Goats. First, efficiency must be pushed to the limit by adapting general techniques to
the specific problem at hand, such as the decision described above on how to combine
different search techniques. Second, programs must be optimized for each of the com-
puter systems used. Third, the results obtained must be verified to ensure they are indeed
correct. We address these three issues as follows.
The two databases constructed, of endgame positions and halfway positions, limit all
forward searches to at most 20 plies. Still, performing a large number of 20-ply searches
in a tree with an average branching factor of 10 remains a challenge that calls for opti-
mization wherever possible.
The most profitable source of optimizations is the high degree of symmetry of the
game board. Whereas the construction of the two databases of endgame and halfway
positions is designed to avoid symmetric variants, this same desirable goal proved not
to be feasible during forward searches - it would have meant constructing a database
consisting of all positions. Instead, the goal is to avoid generating some, though not
necessarily all, symmetrically equivalent positions in the first place when this can be
done fast, namely during move generation.
During the placement phase, any empty intersection is a legal goat move. Thus for
each empty point on the board, we pre-compute the result of performing each symmetry
operation and store the operations that generate the lowest board index. During the
forward search, we check which symmetries are active on the board. An empty point is
redundant if some active symmetry maps it to an empty point of lower index, since placing
a goat there produces the same board position up to symmetry. For the Goat player, this
check is sufficient to generate each canonical position once.
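This redundancy check can be sketched with a precomputed symmetry table; `sym_map[s][p]`, giving the image of point p under symmetry s, is an assumed name, and a toy one-dimensional "board" with a mirror symmetry stands in for the real one:

```python
def canonical_goat_placements(empty_points, active_syms, sym_map):
    """Keep one goat placement per symmetry orbit.

    Because an active symmetry maps the current position to itself, it
    maps every empty point to another empty point, so comparing board
    indices alone suffices: a point is kept only if no active symmetry
    maps it to a lower-index point.  active_syms must include the
    identity symmetry."""
    empties = set(empty_points)
    return sorted(p for p in empties
                  if p <= min(sym_map[s][p] for s in active_syms))
```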
For Tiger moves, the general idea is straightforward - any position that arises dur-
ing the search is analyzed to determine all active symmetries. Thereafter, among all the
moves that generate symmetric outcomes, only the one that generates the resulting posi-
tion of lowest index is retained. This analysis guarantees that all immediate successors
to any given position are inequivalent. Because of transpositions, of course, symmetric
variants will appear among successor positions further down in the tree. In other words,
this check ensures that only “canonical” Tigers can move. However, an additional check
is needed to ensure that the Tiger makes only canonical moves.
Table 5.2 shows the effect of this symmetry-avoiding move generation for the start-
ing position. Although there is a considerable reduction in the number of positions
generated, the relative savings diminish with an expanding horizon.
Table 5.2: Effect of symmetry-avoiding move generation from the starting position

Ply   Naïve move gen.   Symmetry-avoiding move gen.   # of distinct positions
1                  21                             5                         5
2                 252                            36                        33
3               5,052                           695                       354
4              68,204                         9,245                     2,709
5           1,304,788                       173,356                    18,906
6          18,592,000                     2,441,126                    93,812
The proof that Goat has at least a draw in Tigers and Goats used a cluster of eight
Linux PC workstations with a simple synchronous distributed game-tree search algorithm. However, there are fundamental problems with synchronous algorithms, discussed in [Brockington, 1997], that limit their efficiency. Furthermore, the cluster was becoming increasingly heavily used and was constantly overloaded. We therefore decided against implementing a more sophisticated asynchronous game-tree search and instead relied on a sequential program running on a single dedicated processor.
We focused our attention on improving the sequential program to run on an Apple
PowerMac G5 1.8 GHz machine running Mac OS-X. Firstly, the neural network code
was optimized using the Single Instruction Multiple Data (SIMD) unit in the PowerPC
architecture called AltiVec. AltiVec consists of highly parallel operations which allow
simultaneous execution of up to 16 operations in a single clock cycle. This provided a modest improvement of about 15% in the speed of neural network evaluations of the board, but improved the overall efficiency of the search by considerably more, as the neural network is used repeatedly within the search to evaluate and re-order the moves.
Next, we moved many of the computations off-line. For example, the moves available to Tiger at each point on the board, for every combination of surrounding pieces, are pre-computed into a table, so that the program simply retrieves the appropriate entry and appends it to the move list during search. Operations like board indexing and symmetry transformation are also pre-computed, so that the program only needs to retrieve the result from memory. Finally, we recompiled the software with G5-specific optimizations.
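The table-driven move generation can be illustrated with a sketch on a hypothetical 3×3 grid; the thesis's board is the Tigers and Goats board, and the names and board here are assumptions for illustration only:

```python
# Hedged sketch of table-driven move generation: for every point and every
# occupancy pattern of its neighbours, the legal step destinations are
# computed once, so the search only performs a table lookup.

from itertools import product

N = 3  # side length of the toy board

def neighbours(p):
    """Orthogonal neighbours of point p on the toy grid."""
    r, c = divmod(p, N)
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < N and 0 <= c + dc < N:
            out.append((r + dr) * N + (c + dc))
    return out

# MOVE_TABLE[(point, occupancy_mask)] -> empty neighbour points (1 = occupied)
MOVE_TABLE = {}
for p in range(N * N):
    nbs = neighbours(p)
    for mask in product((0, 1), repeat=len(nbs)):
        MOVE_TABLE[(p, mask)] = tuple(n for n, b in zip(nbs, mask) if b == 0)

def step_moves(p, occupied):
    """Retrieve the precomputed destinations for the piece on point p."""
    mask = tuple(1 if n in occupied else 0 for n in neighbours(p))
    return MOVE_TABLE[(p, mask)]
```

The trade-off is the classic one: a modest amount of memory for the table in exchange for removing per-node computation from the innermost search loop.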
5.3.3 Verification
Two independent re-searches confirmed different components of the result. They used separately coded programs written in C, and took two months to complete.
The first verification search used the database of halfway positions to confirm the
result at the root, namely, “Tiger has a drawing strategy”. Notice that this verification
used only the positions marked as win-or-draw in the database.
The second verification search confirmed the halfway positions marked as win-or-
draw by searching to the endgame database generated by the retrograde analysis de-
scribed in [Lim and Nievergelt, 2004]. All other positions can be ignored, as they have
no effect on the first search.
Another program was written in C to ‘re-prove’ the results. This program had the benefit
of a posteriori knowledge that the game is a draw, and this fact allowed us to concentrate
on using aggressive forward pruning techniques to verify the result. The program used
the same domain-specific optimizations such as symmetry reduction and the halfway
databases.
The halfway database was optimized for size by storing the boolean evaluation of each position in a single bit. Depending on the type of search, this boolean evaluation could mean “Goat can at least draw” or “Tiger can at least draw”. Due to this space optimization, the halfway-position and endgame databases could be stored in memory, thereby avoiding disk accesses and speeding up the search by orders of magnitude.
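A minimal sketch of the one-bit-per-position storage idea (the class name and index scheme are illustrative assumptions, not the thesis's implementation):

```python
# Hedged sketch of a bit-packed position database: each position index maps
# to a single bit in a bytearray, so the whole table can be held in memory.

class BitDatabase:
    def __init__(self, n_positions):
        self.bits = bytearray((n_positions + 7) // 8)

    def set(self, index):
        """Mark a position, e.g. as "Goat can at least draw"."""
        self.bits[index >> 3] |= 1 << (index & 7)

    def get(self, index):
        """Retrieve the stored boolean evaluation for a position."""
        return bool(self.bits[index >> 3] & (1 << (index & 7)))
```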
As Tiger is able to force the capture of two goats only by the end of the placement phase (i.e. at ply 40), any position encountered during the forward search with two goats already captured can be deemed sub-optimal. In conjunction with move ordering and forward pruning with the co-evolved neural network (see Chapter 4), the search for “Goat can at least draw” therefore used an aggressive forward pruning strategy of pruning positions in which two (or more) goats had already been captured. The halfway database was set at ply 23, when 12 goats have already been placed and it is Tiger’s turn to move. The search confirmed that “Goat can at least draw” in approximately 7 hours while visiting 7,735,443,119 nodes.
The program was also able to confirm that “Tiger can at least draw”. Due to the
large game-tree complexity of this search, two “halfway” databases were placed at ply
21 and ply 31. These databases contribute towards efficiency in two ways: first, they ter-
minate some searches early, and second, they generate narrower search trees. The latter
phenomenon is due to the fact that these databases are free of symmetrically equivalent
positions. In exchange for a large memory footprint of approximately 2 GB, search per-
formance was dramatically improved. Additional forward pruning heuristics eliminated
nodes where Tiger had few legal moves. The search confirmed that “Tiger can at least draw” in approximately 48 hours while visiting 40,521,418,103 nodes.
Additional searches to prove “Goat can at least draw” and “Tiger can at least draw” without forward pruning ran for more than a week without results before they were terminated. This shows that while the halfway databases helped to exploit the relatively small state-space complexity of Tigers and Goats, forward pruning played a major role in the efficiency of the search.
5.4 Chapter Conclusion
The theory of computation has developed powerful techniques for estimating the asymp-
totic complexity of problem classes. By contrast, there is little or no theory to help in
estimating the concrete complexity of computationally hard problem instances, such as
determining the game-theoretic value of Tigers and Goats. Although the general tech-
niques for attacking such problems have been well-known for decades, there are only
rules of thumb to guide us in adapting them to the specific problem at hand in an attempt
to optimize their efficiency [Nievergelt, 2000].
The principal rule of thumb we have followed in our approach to solving Tigers and Goats is to pre-compute the solutions of as many subproblems as can be handled efficiently with the storage available, both in main memory (hash tables) and on disk (position databases). If the net of these known subproblems is dense enough, forward pruning techniques serve to truncate the depth of many forward searches, an effect that plays a decisive role since the computation time tends to grow exponentially with search depth.
This suggests that developing better forward pruning techniques can enable us to solve
ever more computationally hard problems in the future.
Chapter 6
RankCut
This chapter is an extended version of our article “RankCut – A Domain Independent Forward Pruning Method for Games”, presented at the Twenty-First National Conference on Artificial Intelligence (AAAI-06).
Alpha-Beta pruning [Knuth and Moore, 1975] and its variants like NegaScout/Principal
Variation Search [Reinefeld, 1989] and MTD(f) [Plaat, 1996] have become the standard
methods used to search game-trees as they greatly reduce the search effort needed. Apart
from theoretical exceptions where decision quality decreases with search depth [Beal,
1980, Nau, 1979], it is generally accepted that searching deeper will result in higher
move decision quality [Junghanns et al., 1997]. However, search effort increases exponentially with increasing search depth. To further reduce the number of nodes searched, game-playing programs perform forward pruning [Marsland, 1986], where a node is discarded without being searched if it is believed unlikely to affect the final Minimax value.
In this chapter, we describe RankCut [Lim and Lee, 2006b], which estimates the
probability of discovering a better move later in the search by using the relative fre-
quency of such cases for various states during search. These probabilities are pre-
computed off-line using several self-play games. RankCut can then reduce search ef-
fort by performing a shallow search when the probability of a better move appearing
is below a certain threshold. RankCut implicitly requires good move ordering to work
well. However, game-playing programs already perform move ordering as it improves
the efficiency of Alpha-Beta search, and thus good move ordering is available at no extra
cost. We implement RankCut in the open-source Chess programs Crafty and Toga II, and show its effectiveness with test suites and matches.
6.1 Existing Forward Pruning Techniques
Even with the search enhancements described in Chapter 2, search effort increases exponentially with increasing search depth. This creates a conundrum, as empirical data show that move decision quality improves with increasing search depth [Condon and Thompson, 1983, Thompson, 1982, Heinz, 1998a, Heinz, 2001b, Junghanns et al., 1997]. To further reduce the number of nodes searched, game-playing programs perform forward pruning [Marsland, 1986, Björnsson and Marsland, 2000b], where a node is discarded without being searched if it is believed unlikely to affect the final Minimax value. This means that some good move sequences might not be explored under forward pruning, which can affect the Minimax evaluation and hence the move chosen by the search. Using forward pruning methods thus entails a trade-off: pruning more allows deeper searches, but at the risk of making more errors.
Search extensions can be viewed as a type of forward pruning, as the search is typically not full-width, and thus certain nodes are not visited even though there is no theoretical assurance that they will not affect the Minimax value. However, it is useful to understand the
differences in the viewpoints of search extensions and forward pruning techniques. As suggested by their names, search extension techniques strive to explore potentially good moves more deeply in order to evaluate them better; forward pruning techniques strive to prune potentially bad moves to avoid searching unnecessary parts of the game-tree.
Despite the stigma attached to forward pruning techniques during the early development of Chess programs [Abramson, 1989], modern game-playing programs for Chess [Donninger, 1993, Björnsson and Newborn, 1997], Checkers [Schaeffer, 1997], Othello [Buro, 1997a], Shogi [Iida et al., 2002], and Abalone [Aichholzer et al., 2002] employ forward pruning along with search extensions such as quiescence search, i.e., the search is not full-width in some non-quiescent positions in the game-tree. This is largely due to the development of several highly effective domain-independent forward pruning techniques.
6.1.1 Razoring and Futility Pruning
Both Razoring and Futility Pruning have been successfully used in Chess programs. Razoring was first introduced in [Birmingham and Kent, 1977], while Futility Pruning was first described in Schaeffer’s PhD thesis [Schaeffer, 1986]. Both techniques observe that static evaluations of board positions one ply from the leaf nodes, also known as frontier nodes, tend to be accurate enough to identify bad positions that need not be considered further.
Razoring techniques [Birmingham and Kent, 1977, Heinz, 1998b] prune nodes just above the horizon if their static evaluations fail low by a predetermined margin. Futility Pruning further observes that evaluation functions can usually be separated into major
and minor components. For example, in Chess, the major component would include
material count and the minor component would include positional evaluations. As the
minor component of the evaluation is composed solely of smaller adjustments to the
score, it can typically be bounded. At frontier nodes, if the evaluation of the major components falls far outside the Alpha-Beta bounds, and the bounds on the minor component cannot possibly bring the score back to within the Alpha-Beta window, the frontier node can be safely pruned.
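The futility test at a frontier node can be sketched as follows; the bound on the minor component is an assumed value for illustration, not an actual engine parameter:

```python
# Hedged sketch of the Futility Pruning test at a frontier node: if even the
# most optimistic minor (positional) bonus cannot lift the major (material)
# score above alpha, the node is pruned.

MINOR_BOUND = 150  # assumed maximum swing of the minor evaluation component

def futile(major_score, alpha):
    """True if the best-case minor bonus cannot raise the score above alpha."""
    return major_score + MINOR_BOUND <= alpha
```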
Heinz improved on Futility Pruning by applying it to pre-frontier and pre-pre-frontier nodes, calling the result “Extended Futility Pruning” [Heinz, 1998b]. Experiments with well-known test suites and the Chess program DarkThought [Heinz, 1998a] showed improvements of 10%–30% at fixed search depths of 8 to 12 plies.
6.1.2 Null-Move Pruning
Null-Move Pruning [Beal, 1989, Goetsch and Campbell, 1990, Donninger, 1993], which was introduced in the 1990s, enables programs to search deeper with little tactical risk.
Null-Move Pruning assumes that not making any move and simply passing (also known as making a null move), even if passing is illegal in the game, is always bad for the player
to move. So when searching to depth d, the heuristic makes a null move by simply
changing the turn to move to the opponent and performs a reduced-depth search. If the
reduced-depth search returns a score greater than Beta, then the node is likely to be a strong position for the player to move, and would also score greater than Beta when searched to depth d without the null move. The search therefore fails high and returns Beta.
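A runnable sketch of Null-Move Pruning inside a negamax Alpha-Beta search, shown here on a toy token-taking game rather than Chess; the game, the reduction R = 2, and all function names are illustrative assumptions:

```python
# Toy game: players alternately take positive-valued tokens; the evaluation
# is the mover's score minus the opponent's. The null move simply passes the
# turn; if a shallow search after passing still reaches beta, we assume the
# full-depth search would also fail high.

R = 2  # conventional null-move depth reduction

def evaluate(pos):
    return pos['me'] - pos['opp']

def generate_moves(pos):
    return list(pos['tokens'])

def make(pos, m):
    pos['tokens'].remove(m)
    pos['me'], pos['opp'] = pos['opp'], pos['me'] + m  # perspective flips

def unmake(pos, m):
    pos['me'], pos['opp'] = pos['opp'] - m, pos['me']
    pos['tokens'].append(m)

def make_null(pos):
    # a pass: only the side to move changes
    pos['me'], pos['opp'] = pos['opp'], pos['me']

def search(pos, depth, alpha, beta, allow_null=True):
    if depth == 0 or not pos['tokens']:
        return evaluate(pos)
    if allow_null and depth > R:
        make_null(pos)
        score = -search(pos, depth - 1 - R, -beta, -beta + 1, allow_null=False)
        make_null(pos)  # undo the pass
        if score >= beta:
            return beta  # probable fail-high: prune without real moves
    for m in generate_moves(pos):
        make(pos, m)
        score = -search(pos, depth - 1, -beta, -alpha, allow_null=True)
        unmake(pos, m)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha
```

With all-positive tokens, passing is never advantageous, so the null-move assumption holds in this toy game; the zugzwang exception discussed below is exactly the case where this assumption breaks.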
However, there is a class of positions known as zugzwang positions where making the null move is actually good for the player to move. This violates the assumption
that Null-Move Pruning makes and causes search errors. In Chess, zugzwang positions
normally occur during endgame positions. As a result, Null-Move Pruning is disabled
when the game is likely to be in the endgame phase (e.g., when the number of pieces on
the board is small).
6.1.3 ProbCut
The ProbCut heuristic [Buro, 1995a] prunes nodes that are likely to fail outside the
Alpha-Beta bounds during search by using simple statistical tools. ProbCut first re-
quires off-line computations to record results of both shallow and deep searches of the
same positions. The results of both searches are then correlated using linear regression.
Formally, the result v′ of a shallow search is used to estimate the result v of a deeper search by a linear relationship:

v = a · v′ + b + e
where e is a normally distributed error variable with mean 0 and standard deviation σ. The parameters a, b and σ can be estimated by linear regression over training search results. ProbCut is therefore able to form a confidence interval from the result of a shallow search to obtain an estimated range in which a deep search result will fall. By algebraic manipulation, given v′, v ≥ β with probability at least p is equivalent to v′ ≥ (Φ−1(p) · σ + β − b)/a, where Φ is the standard Gaussian distribution function. Similarly, v ≤ α with probability at least p is equivalent to v′ ≤ (−Φ−1(p) · σ + α − b)/a.
If this confidence interval falls outside the Alpha-Beta bounds, the program should fail high or low appropriately. ProbCut was successfully applied in Logistello, an Othello program that beat the then World Human Othello Champion 6-0 under standard time controls [Buro, 1997b].
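The ProbCut cut test follows directly from the inequalities above; the regression parameters (a, b, σ) would be fitted offline, and the values used in the test below are illustrative, not Logistello's:

```python
# Hedged sketch of the ProbCut decision from a shallow-search result v'.

from statistics import NormalDist

def probcut_bounds(p, a, b, sigma, alpha, beta):
    """Shallow-search thresholds implied by the confidence level p."""
    z = NormalDist().inv_cdf(p)              # Phi^{-1}(p)
    fail_high = (z * sigma + beta - b) / a   # v' >= this  =>  v >= beta w.p. >= p
    fail_low = (-z * sigma + alpha - b) / a  # v' <= this  =>  v <= alpha w.p. >= p
    return fail_high, fail_low

def probcut(v_shallow, p, a, b, sigma, alpha, beta):
    """Return beta/alpha on a probable cutoff, or None to search at full depth."""
    hi, lo = probcut_bounds(p, a, b, sigma, alpha, beta)
    if v_shallow >= hi:
        return beta
    if v_shallow <= lo:
        return alpha
    return None
```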
Multi-ProbCut [Buro, 1999] is an enhancement of ProbCut which uses different re-
gression parameters and pruning thresholds for different stages of the game and multiple
depth pairings. Preliminary experiments show that Multi-ProbCut can be successfully
applied to Chess [Jiang and Buro, 2003], but requires a fair amount of work to get good
results. Jiang and Buro [Jiang and Buro, 2003] suggest that there are at least two reasons
for the poor performance:
1. Null-Move Pruning forward prunes the same type of positions as ProbCut. Since Null-Move Pruning is commonly applied in Chess programs, ProbCut is unable to significantly improve game performance when implemented alongside it. On the other hand, ProbCut has been successfully applied to Othello because Othello is a zugzwang game, in which Null-Move Pruning is ineffective.
The N-Best Selective Search is a simple form of forward pruning that bears a close resemblance to RankCut. The moves are ranked according to an evaluation function prior to considering moves for pruning, and during the search only the N best moves are considered. This heuristic requires the move ordering to consistently rank the best move within the top N moves. So while this heuristic has the advantage of ensuring that the branching factor of the search is at most N, it normally introduces far too many pruning errors to be a viable forward pruning technique. RankCut can be seen as a mechanism that dynamically adapts N, with the additional advantage of being able to adjust its risk appetite on a case-by-case basis.
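A minimal sketch of N-best selective search under an assumed move-ordering heuristic:

```python
# Hedged sketch: rank moves with an ordering heuristic and keep only the
# top N for expansion; all other moves are pruned outright.

def n_best(moves, heuristic, n):
    """Return the N most promising moves under the given heuristic."""
    return sorted(moves, key=heuristic, reverse=True)[:n]
```

The static cap N is exactly what RankCut replaces with a probability-driven, per-node decision.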
History Pruning, also known as Late Move Reduction, is a forward pruning technique that has been described and discussed extensively in internet computer Chess forums.1 History Pruning/Late Move Reduction assumes that good move ordering is done during the search, so that a Beta cutoff either occurs within the first few moves searched, or not at all. The first few moves are then searched to the full depth, and any remaining move is searched with reduced depth unless the score returned is deemed interesting, such as the score being greater than Alpha.
There are different implementations of identifying moves to search with reduced depth. For example, the strong open-source Chess program Fruit, which implements History Pruning/Late Move Reduction, counts the number of times a move has failed high and failed low in previous searches; moves with a high fail-high to fail-low ratio are not reduced. In addition, Fruit does not reduce the search depth if the board position is in check or the node is a PV node. As mentioned before, there is no standard implementation of History Pruning/Late Move Reduction, and several Chess programs use other forms of threat detection, such as positional threats and evaluation data, to decide whether a move should be searched with reduced depth.

1. Two popular forums are the Winboard Forum - http://wbforum.vpittlik.org and TalkChess - http://www.talkChess.com/forum
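One possible form of the reduction decision described above can be sketched as follows; the thresholds and the fail-high/fail-low statistics are illustrative assumptions, not Fruit's actual parameters:

```python
# Hedged sketch of a Late Move Reduction decision.

FULL_DEPTH_MOVES = 3     # the first few moves are always searched to full depth
HISTORY_THRESHOLD = 0.6  # assumed fail-high ratio above which we do not reduce

def lmr_reduction(move_index, is_pv, in_check, fail_high, fail_low):
    """Return the depth reduction (0 or 1 ply) to apply to this move."""
    if move_index < FULL_DEPTH_MOVES or is_pv or in_check:
        return 0
    total = fail_high + fail_low
    # moves with a high fail-high ratio in earlier searches are not reduced
    if total > 0 and fail_high / total >= HISTORY_THRESHOLD:
        return 0
    return 1
```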
There has been much discussion of History Pruning/Late Move Reduction, but surprisingly no scientific papers have demonstrated its effectiveness in games. RankCut shares some similarities with History Pruning/Late Move Reduction, and later in this chapter we discuss the similarities and differences between the two techniques.
6.2 Preliminaries
We recall some properties of Alpha-Beta search algorithms and several definitions that will be relevant to this chapter. Alpha-Beta search reduces the effective branching factor to roughly the square root of that of the Minimax algorithm when the nodes are evaluated in an optimal or near-optimal order [Knuth and Moore, 1975]. A reduced-depth search is a search with depth d − r, where d is the original search depth and r is the amount of depth reduction. The higher the value of r, the more errors the reduced-depth search makes due to the horizon effect [Berliner, 1974]. However, a reduced-depth search results in a smaller search tree, and therefore a compromise has to be made between reduced search effort and the risk of making more search errors.
Although it is technically more correct to refer to the value of a node (and its corresponding position) in the Minimax framework, we will use the term “value of a move” to refer to the value of the node resulting from that move, as it is conceptually easier to describe how to use move-order information during search in this manner. In later sections, we will therefore use v(mi) to refer to the value of the board after making move mi, instead of the more correct notation vi.
6.3 Is There Anything Better?
In Section 6.1, we described various forward pruning techniques currently used successfully in game-playing programs, such as Null-Move Pruning, ProbCut, Futility Pruning, Razoring, History Pruning/Late Move Reduction and N-best selective search. Most of these techniques can be seen as applying the same principle of locally testing the probability of a node scoring outside the Alpha or Beta bound, and pruning the nodes that have a high probability of failing high or low. In other words, for a node n with k moves,
increase N to a high value where such occurrences are rare. However, if we now con-
sider a node where the best move has not changed since the first move, and the scores of
the subsequent moves are decreasing in a monotonic fashion, then it is likely that a high
value of N is too conservative for this particular case, and forward pruning opportunities
have been lost.
A better way of factoring in move ordering when pruning is by considering the probability Πx(f~i) = p(v(mi) > x | f~i), where f~i = (f(m1), . . . , f(mi−1)) are the salient features of the previous moves considered, and x is an adaptive bound. In our experiments, x is set to vbest, the score of the current best move. However, x can also be set to a value like Alpha, or a variable bound, so as to minimize the risk of error. For brevity, we use Π(f~i) to represent Πvbest(f~i). Good features allow Π(f~i) to identify when moves are unlikely to affect the final score; examples include the current move number and the scores of prior moves. So when performing forward pruning, the probability Π(f~i) gives the likelihood of mi returning a score better than the current best move, and if it is below a certain threshold, we can prune this move as it is unlikely to affect the final score. This approach is more adaptive than the static N-best selective search.
Since good move ordering is essential to achieving more cutoffs when using Alpha-Beta search [Knuth and Moore, 1975], heuristics like the History Heuristic [Schaeffer, 1989] and the Killer Heuristic [Akl and Newborn, 1977], together with domain-dependent knowledge, are used to improve move ordering. Good move ordering is therefore usually available in practical implementations, which is another good reason to consider move order in any forward pruning decision. This observation is not new: Björnsson and Marsland [Björnsson and Marsland, 2000a] mention this insight, but restrict its application to the possibility of failing high, i.e., p(v(mi) > β | v(m1) < β, . . . , v(mi−1) < β). Moriarty and Miikkulainen [Moriarty and Miikkulainen, 1994]
and Kocsis [Kocsis, 2003, Chapter 4] also considered pruning nodes while taking move ordering into consideration, using machine learning techniques like neural networks to estimate the probability. However, the results of these experiments have not been conclusive, and more research into their effectiveness is needed.
6.4 RankCut
In this section, we introduce a new domain-independent forward pruning method called RankCut. We later show its effectiveness by implementing it in the open-source Chess programs Crafty2 and Toga II3.
6.4.1 Concept
In the previous section, we suggested considering the probability Π(f~i) when forward pruning. However, if the moves are ordered and Π(f~i) is low, then the remaining moves mj, where j > i, should also have low probabilities Π(f~j). Testing each probability Π(f~i) individually is thus often redundant, and RankCut considers instead the value of Π′(f~i) = p(max{v(mi), v(mi+1), . . . , v(mk)} > vbest | f~i), where k is the total number of legal moves of the current node. RankCut can be thought of as asking the question “Is it likely that any remaining move is better than the current best move?”. These probabilities are estimated off-line using the relative frequency of a better move appearing, and can be represented by x/y, where x is the number of times a move mj, where j ≥ i, returns a score better than the current best when in the state f~i, and y is the total number of
2. Available at ftp://ftp.cis.uab.edu/pub/hyatt
3. Available at http://www.uciengines.de/UCI-Engines/TogaII/togaii.html
instances of the state f~i, regardless of whether or not the best move changes. This off-line procedure of collecting the statistics requires modifying the game-playing program to store the counts, and then playing a number of games under the same (or longer) time controls as those expected in actual play.
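The offline counting of x and y described above might be sketched as follows, with a deliberately simplified stand-in feature vector (the move number only) rather than the thesis's actual features:

```python
# Hedged sketch of collecting RankCut's relative-frequency statistics.

from collections import defaultdict

seen = defaultdict(int)     # y: occurrences of each discretised state f_i
better = defaultdict(int)   # x: occurrences where a remaining move beat the best

def record_node(move_scores):
    """move_scores: scores v(m_1..m_k) in the order the moves were searched."""
    best = float('-inf')
    for i, s in enumerate(move_scores):
        state = (i,)  # stand-in feature vector f_i: move number only
        seen[state] += 1
        # did any move from position i onwards beat the best so far?
        # (at i = 0 there is no previous best, so this counts trivially)
        if max(move_scores[i:]) > best:
            better[state] += 1
        best = max(best, s)

def pi_prime(state, min_instances=1000):
    """Estimated probability that a better move still appears in this state."""
    if seen[state] < min_instances:
        return 1.0  # too little data: never prune
    return better[state] / seen[state]
```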
One potential problem is that RankCut assumes that the statistics of Π(f~i) collected without forward pruning remain the same when forward pruning is applied. While our experiments indicate that this assumption is reasonable for practical purposes, more research is needed to understand why RankCut still works despite this simplifying assumption. We refer the reader to Chapter 9 for further discussion of this issue.
RankCut tests Π′(f~i) < t for each move, where t ∈ (0, 1) is user-defined. If true, RankCut does not prune the move outright, but instead performs a reduced-depth search and returns the score of that shallow search. The full-width nature of the reduced-depth search helps to retain tactical reliability while reducing search effort. RankCut is domain-independent as it does not require any game logic, and it is easily added to an Alpha-Beta search, as shown in Pseudocode 9.
The main modifications to the Alpha-Beta search are the computation of f~i (Line 7) and the test of Π′(f~i) (Line 8). Prior to making a move, RankCut tests whether pruneRest is set or Π′(f~i) < t. If either is true, RankCut makes a reduced-depth search. Otherwise, the usual Alpha-Beta algorithm is executed.
Note that RankCut typically uses the results of a depth-reduced search to perform Alpha-Beta cutoffs; although this is not theoretically sound, it has worked relatively well in our experiments with Chess programs. Alternatively, an optional boolean variable RankCutReSearch (Line 12) can be set to require a re-search without depth reduction whenever the reduced-depth search returns a score greater than Alpha. As this forces a re-search without depth reduction, this option is a compromise between more accurate search results and faster search times. The experiments in this chapter did not enable this option.
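The per-move logic just described can be sketched as follows (a hedged reconstruction, not the thesis's Pseudocode 9; names and signatures are assumptions):

```python
# Hedged sketch of RankCut's per-move depth decision inside the move loop.

def rankcut_depth(depth, reduction, features, pi_prime, t, prune_rest):
    """Return (child_search_depth, prune_rest) for the next move.

    pi_prime maps a feature state to the estimated probability that a better
    move still appears; t is the user-defined pruning threshold.
    """
    if prune_rest or pi_prime(features) < t:
        # not a hard prune: the move is still searched, only shallower
        return depth - 1 - reduction, True
    return depth - 1, prune_rest
```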
Crafty is a very strong open-source Chess engine; its rating is about 2600 on a 1.2 GHz Athlon with 256 MB of RAM when tested independently by the Swedish Chess Computer Association, or SSDF.4 Crafty uses modern computer Chess techniques such as bitboards, 64-bit data structures, NegaScout search, the Killer Move heuristic, Static Exchange Evaluation, quiescence search, and selective extensions. We incorporated RankCut into Crafty Version 19.19, which features Null-Move Pruning and Futility Pruning, and ran all experiments on a PowerMac 1.8 GHz. We will differentiate between the two versions of Crafty when needed in our discussions by calling them Original Crafty and RankCut Crafty.

4. Details at http://web.telia.com/~u85924109/ssdf/
Crafty has five sequential phases of move generation: (1) the principal variation from the previous search depth during iterative deepening, (2) capture moves sorted by the expected gain of material, (3) Killer moves, (4) at most 3 History moves, and (5) the rest of the moves. We modified Crafty so that in phase 4, the History move phase, it would continue sorting the remaining moves according to the History Heuristic until no more suitable candidates were found.
During testing, we discovered that the probability of a better move appearing for moves generated in phases before the History move phase is always too high to trigger a pruning decision. RankCut Crafty therefore saves computational effort by starting to forward prune only when move generation reaches the History move phase.
The probabilities Π′(f~i) were calculated by collecting statistics from 50 self-play games, each with a randomly chosen opening, in which Crafty played against itself at a time control of 80 minutes per 40 moves. The pruning threshold t and the amount of depth reduction in the shallow search were conservatively set at 0.75% and 1 ply, respectively. As we calculated the relative frequencies from a small set of 50 games, we used the probabilities Π′(f~i) only if 1,000 or more instances of f~i had been seen, to ensure that the statistics were reliable. The following features f~i were used:
5. Difference between the score of the current best move and the given Alpha bound (discretised to 7 intervals)

6. Difference between the score of the last move and the current best move (discretised to 7 intervals)
Crafty uses search extensions to explore promising positions more deeply. RankCut Crafty therefore does not reduce the search depth, even when Π′(f~i) < t, if Crafty extends the search depth, as Crafty has then signaled that the current node needs more examination. RankCut Crafty is also set to forward prune only nodes whose search depth, defined as the length of a path from the current node to the leaf nodes, is greater than or equal to 7. This is because move ordering tends to be less reliable while iterative deepening is still searching the first few depths. Furthermore, at higher search depths, move ordering also becomes less reliable the further the search is from the root.
Test Suites
We tested RankCut Crafty with all 2,180 positions from the tactical Chess test suites ECM, WAC and WCS (see Appendix D) by searching to fixed depths of 8, 10, and 12 plies respectively. Table 6.1 shows the results of these experiments and compares them with the results of Original Crafty. The last three rows of Table 6.1 also show the combined results of the three test suites.

Table 6.1: Results of Original Crafty (#Nodes, Avg Time, #Solved) on the ECM, WAC and WCS test suites, and the differences (∆ columns) for RankCut Crafty.

Test Suite   #Nodes            Avg Time (s)   #Solved   ∆Nodes            ∆%       ∆Time    ∆%        ∆Solved
ECM-08       1,170,566,012     1.68           549       -104,561,591      -8.9%    -0.03    -1.98%    -2
ECM-10       11,641,894,779    15.43          620       -3,636,256,257    -31.2%   -3.98    -25.80%   -7
ECM-12       88,924,232,803    120.92         678       -36,761,238,388   -41.3%   -47.34   -39.15%   -13
WAC-08       160,052,139       0.63           289       -11,554,559       -7.2%    -0.02    -2.87%    -2
WAC-10       1,961,241,110     6.85           294       -667,242,604      -34.0%   -1.26    -18.32%   0
WAC-12       13,361,000,868    66.92          297       -4,763,207,469    -35.7%   -35.95   -53.72%   -1
WCS-08       914,031,896       1.11           840       -96,470,205       -10.6%   -0.06    -4.96%    1
WCS-10       9,534,443,036     10.75          863       -2,234,703,349    -23.4%   -2.32    -21.60%   1
WCS-12       77,536,489,398    87.36          873       -212,586,355      -36.1%   -27.36   -31.31%   -2
Sum-08       2,244,650,047     1.27           1678      -212,586,355      -9.5%    -0.04    -3.23%    -3
Sum-10       23,137,578,925    12.10          1777      -6,538,202,210    -28.3%   -2.84    -23.50%   -6
Sum-12       179,821,723,069   98.08          1848      -69,528,669,092   -38.7%   -36.60   -37.31%   -16
The absolute standard error [Heinz, 1999] of n test positions with k correct solutions is SE = n × √(p × (1 − p)/n), where p = k/n. The standard error allows us to ascertain whether the errors introduced by the pruning method are within ‘statistical’ error bounds. The combined results of all three suites for Original Crafty in Table 6.1 show that the standard errors for “Sum-08”, “Sum-10” and “Sum-12” are SE8 = 19.6, SE10 = 18.1 and SE12 = 16.7, respectively. RankCut Crafty, however, solves only 3, 6 and 16 fewer test positions while searching fewer nodes (with the differences from Original Crafty denoted by the ∆ columns). Hence all the results of RankCut Crafty are within one standard error of the results of Original Crafty.
We also tested using the LCT II test (see Appendix D), a set of 35 positions divided into positional, tactical and end-game positions. The LCT II estimates an ELO rating for the program based on the solution times. Both Original Crafty and RankCut Crafty solved the same 28 out of 35 problems, but due to faster solutions, RankCut Crafty obtained a rating of 2635 ELO, whereas Original Crafty was estimated at 2575 ELO. During the test, Original Crafty searched to an average of 15.7 plies, whereas RankCut Crafty was able to search to an average of 16.5 plies, almost 1 ply deeper on average.
FRUIT is one of the strongest Chess programs in the world. FRUIT 2.2.1 finished in
second place at the 2005 World Computer Chess Championships [Björnsson and van den
Herik, 2005] and obtained a rating of more than 2800 when tested by the SSDF. FRUIT
was an open-source Chess engine until Version 2.1. We tested ORIGINAL CRAFTY and
RANKCUT CRAFTY against FRUIT 2.1 with the 32 openings used in [Jiang and Buro,
2003] under blitz time controls of 2 min + 10 sec/move. ORIGINAL CRAFTY lost to
FRUIT by +11 -39 =14 or 28.1%, and RANKCUT CRAFTY lost to FRUIT by +15 -40
=9 or 30.5%. In addition, we played 20 Nunn-II opening positions under 40 moves/40
minutes. ORIGINAL CRAFTY lost to FRUIT by +2 -26 =12 or 20.0%, and RANKCUT
CRAFTY lost to FRUIT by +5 -28 =7 or 21.25%.
These results suggest that the performance gains of RANKCUT CRAFTY extend to
games against other Chess engines.
TOGA II. Both RANKCUT TOGA II and ORIGINAL TOGA II had History Pruning/Late
Move Reduction enabled.
We collected statistics for RankCut in a similar manner. The probabilities Π0 (f~i )
were calculated by collecting statistics from 50 self-play games, each with a randomly
chosen opening, in which TOGA II played against itself at a time control of 80 minutes per
40 moves. All options of ORIGINAL TOGA II, such as History Pruning/Late Move Reduction,
were unchanged during this training phase. The pruning threshold t and the amount
of depth reduction in the shallow search were again conservatively set at 0.75% and 1
ply, respectively. We used the probabilities Π0 (f~i ) only if 1,000 or more instances of f~i
had been seen, to ensure that the statistics are reliable. Owing to the different implementations
of TOGA II and CRAFTY, the features used in RANKCUT CRAFTY could
not be entirely replicated. Instead, the following features f~i were used:
3. Difference between the score of the current best move and the given Alpha bound
(discretised to bins of 100 points)
4. Difference between the score of the last move and the current best move (discretised to bins of 100 points)
Note that unlike our implementation in CRAFTY, we do not use the depth feature.
If RankCut is able to prune accurately without the depth feature, we consider this to be
beneficial as it removes the need for RankCut to have seen a move at a particular depth
during training to be able to assess whether or not to prune.
TOGA II has different phases of move generation, depending on whether the position
is in check, whether the search is a quiescence search, or whether it is any other type of
search, which we term a “normal” search. As RANKCUT TOGA II does not prune when
the position is in check or when the search is in quiescence, we are only concerned with
the phases of move generation in “normal” searches. In “normal” searches, TOGA II has
5 phases of move generation: (1) Killer moves in the transposition table, (2) “good”
captures, (3) “bad” captures, (4) Killer moves not in the transposition table and lastly,
(5) “quiet” moves. Within these move generation phases, moves are sorted using heuristic
values such as history values and other heuristic evaluation functions.
TOGA II performs search extensions to explore promising or tactical positions more
deeply. The amount of search extension is captured by the reduction in depth based on
tactical heuristics. RANKCUT TOGA II decides whether or not to prune regardless of
the search extensions, as long as Π0 (f~i ) < t. RANKCUT TOGA II is also set to forward
prune only nodes that have search depth greater than or equal to 7.
Results
As noted in Section 6.3, most of the current forward pruning techniques, such as Null-Move
Pruning, ProbCut, Futility Pruning, Razoring and N-best selective search, assess
whether or not to prune based on different criteria.
However, History Pruning/Late Move Reduction (Section 6.1.6) also assumes that
good move ordering is done during search, and it searches moves with reduced depth if a
move is deemed unlikely to affect the final score. While both algorithms share this basic
idea, there are two major differences between History Pruning/Late Move Reduction
and RankCut.
First, RankCut considers the rank of a move: since moves are assumed to be sorted
in descending order of quality, RankCut is able to prune all remaining moves once a
move is deemed unlikely to affect the final score. In contrast, History Pruning/Late
Move Reduction considers each move in isolation after a fixed number of initial moves
has been searched to full depth.
In addition, RankCut decides whether or not to prune a move based on off-line
computations, whereas History Pruning/Late Move Reduction uses on-line computations. By
(Footnote: rating as of 10 Sep 2006.)
using off-line computations, RankCut is able to see how often a move is unlikely to affect
the final score across multiple games. In contrast, History Pruning/Late Move Reduction
makes its decision to prune based on information from the start of the game up to
the current board position. Furthermore, the more features that are used to estimate the
probability of a better move appearing, the more training data is required. Using on-line
computations therefore limits the number of features that can be used to predict whether
or not a better move will appear. For example, FRUIT uses only the type of piece, the
from square and the to square as features for History Pruning.
Despite the differences, both algorithms have been shown to be effective in practical
Chess programs. Furthermore, RankCut seems to improve game-playing performance
when implemented alongside History Pruning/Late Move Reduction in TOGA II. This
suggests that each algorithm is able to make effective pruning decisions that the other
algorithm is unable to make.
In our implementations, data representations for f~i and Π0 (f~i ) are straightforward. While
we are not limited to just the following methods, we briefly describe how we represented
them in our implementations of RANKCUT CRAFTY and RANKCUT TOGA II.
As f~i is a vector of features of past moves, it can be implemented using an array or
a set of variables. Compute(f~i ) can then be performed incrementally for each move.
For example, if the current move number is a chosen feature, then MoveNumber++ in
C pseudocode suffices to incrementally update the feature for the next move.
Π0 (f~i ) can also be represented using arrays or a hash table. For example, if the
current move number is the only feature used, Π0 (f~i ) can be represented using an array
declared as double P[MAX_MOVES] in C pseudocode, or using a hash table that maps f~i to
Π0 (f~i ).
Figure 6.1: Histogram of the number of features collected for RANKCUT CRAFTY with
t < 0.75%. The x-axis consists of frequency bins of features: each bin counts the
features seen a number of times between the previous bin boundary and the current one.
The y-axis is the number of features within the frequency bin.
As RankCut prunes only if Π0 (f~i ) < t, we need only store those f~i for which
Π0 (f~i ) < t, provided we do not update the frequencies of moves on-line. By deciding on
a threshold t beforehand, our implementations of RankCut were able to extract the f~i with
the appropriate probabilities. As the histograms in Figures 6.1 and 6.2 show, the number
of features seen more than 1,000 times with t < 0.75% is relatively small: approximately
10,000 for RANKCUT CRAFTY and 15,000 for RANKCUT TOGA II. As a result, our
implementations of RankCut required less than 1 MB of memory to store the f~i.
Figure 6.2: Histogram of the number of features collected for RANKCUT TOGA II with
t < 0.75%. The x-axis consists of frequency bins of features: each bin counts the
features seen a number of times between the previous bin boundary and the current one.
The y-axis is the number of features within the frequency bin.
6.5 Chapter Conclusions
The simplicity of RankCut makes implementation in various games easy, and it can
even be implemented on top of existing forward pruning techniques.
Chapter 7
Player to Move Effect
This chapter is based partially on our article “Properties of Forward Pruning in Game-Tree
Search” (AAAI-06).
The Alpha-Beta algorithm [Knuth and Moore, 1975] is the standard approach used
in game-tree search to explore all combinations of moves to some fixed depth. However,
even with Alpha-Beta pruning, search complexity still grows exponentially with
increasing search depth. To further reduce the number of nodes searched, practical
game-playing programs perform forward pruning [Marsland, 1986, Buro, 1995a, Heinz,
1999], where a node is discarded without searching beyond it if it is believed to be
unlikely to affect the final Minimax value. As all forward pruning techniques inevitably
have a non-zero probability of making pruning errors, employing any forward pruning
technique requires a compromise between accepting some risk of error and pruning
more in order to search deeper.
However, the effects of different pruning errors vary. We can observe this by
considering a Chess game where White is to move and winning. During Minimax search,
most of White’s moves will be good enough to maintain the advantage, so
it is acceptable to prune away most of White’s moves. However, it would be foolish for
White to lose the advantage by making a move that allows a tactical counter-attack by
Black, and White should therefore ensure that no such counter-tactics by Black
exist for any of the moves he considers. This example highlights the asymmetric effects
of pruning errors: pruning errors in White-to-move positions, assuming that most of the
better moves have been considered, are tolerated more readily than pruning errors in
Black-to-move positions during search.
Forward pruning techniques should therefore consider how pruning errors propagate
in game-tree search. In this chapter we show that the severity of forward pruning errors
is asymmetric with respect to the player to move; the error is likely to be more severe
when pruning on the opponent’s moves rather than the player’s own moves [Lim and
Lee, 2006a]. This effect arises because pruning errors, when pruning exclusively on
the children of Max nodes, cannot cause a poor move to be deemed better than a good
move whose subtree does not contain errors; however, this is not the case when pruning
exclusively on the children of Min nodes. This suggests that to do well, pruning
should be done more aggressively on the player’s own moves and less aggressively on
the opponent’s moves.
7.1 Theoretical Analysis
In this section, we show that forward pruning errors are propagated differently depending
on the player to move. We first state a lemma showing how the different pruning
errors affect Minimax results:
Lemma 1. Assume that we are performing forward pruning only on the children of Max
nodes throughout the tree. Then, for any unpruned node u, scoreMax(u) ≤ score*(u),
where scoreMax(u) is the score of the algorithm that only forward prunes children
of Max nodes. Conversely, if we are performing forward pruning only on the children
of Min nodes, then for any unpruned node u, scoreMin(u) ≥ score*(u), where
scoreMin(u) is the score of the algorithm that only forward prunes the children of Min
nodes.
Proof. By induction on the depth d of node u in the tree, where d is defined as the
number of nodes in a path from u to a leaf node.
Base case: scoreMax(u) = utility(u) = score*(u) if d = 1.
Assume that the inductive statement is true for d = n, and consider the case when
d = n + 1. If it is Max’s turn to move at node u, then scoreMax(u) is the maximum of
scoreMax(c) over the unpruned children c of u. By the inductive hypothesis, scoreMax(c) ≤
score*(c) for every unpruned child, and the unpruned children form a subset of all children,
so scoreMax(u) ≤ max over all children c of score*(c) = score*(u). If it is Min’s turn to
move at u, no children are pruned, so scoreMax(u) = min over all children c of scoreMax(c) ≤
min over all children c of score*(c) = score*(u).
Note that the inequalities are strict if and only if forward pruning discards a move
erroneously, e.g. the child with the highest Minimax score is discarded at a Max node.
7.2 Observing the Effect using Simulations
The standard approach for game-playing is to use the result of the game-tree search
with the root node representing the current board position. We assume without loss of
generality that the root node is a Max node.
Theorem 1. Assume there are b moves at the root node and that there is a strict ordering
based on the utility of the moves such that score*(ui) > score*(uj) if 1 ≤ i < j ≤ b,
where ui is the ith child.
If forward pruning is applied only to the children of Max nodes, then ∀i such that no
pruning error occurs in the subtree of ui, i.e. scoreMax(ui) = score*(ui), the new rank
i′ based on scoreMax has the property i′ ≤ i.
The converse holds if forward pruning is applied only to the children of Min nodes,
i.e., ∀i such that no pruning error occurs in the subtree of ui, i.e. scoreMin(ui) =
score*(ui), the new rank i′ based on scoreMin has the property i′ ≥ i.
Proof. We will only prove the first statement, as the proof of the converse statement
is similar. Assume, on the contrary, that i′ > i. This implies that ∃j > i such that
score*(uj) < score*(ui) but whose new rank j′ based on scoreMax satisfies j′ < i′, i.e.
scoreMax(uj) > scoreMax(ui). But scoreMax(uj) ≤ score*(uj) by Lemma 1, and
score*(uj) < score*(ui) = scoreMax(ui), which together imply that scoreMax(uj) < scoreMax(ui).
Contradiction.
node. We fixed the number of nodes to be forward pruned: in each instance, three randomly
chosen children at either Max or Min nodes were forward pruned. Unfortunately,
error filtering effects that reduce error propagation to the root make comparison more
difficult; which type of pruning error is filtered more depends on the height of the
tree. To ensure that the observed effects are not due to this filtering effect, we
experimented with trees of heights ranging from four to six so that there would be instances
of both cases: having fewer Max nodes and having fewer Min nodes.
We recorded the rank of the move at the root (as ranked by a search without forward
pruning) that was eventually chosen by the search. For each experimental setup, we ran
a simulation of 10^6 randomly generated game-trees.
Figure 7.1 shows the number of times the search with forward pruning chooses a
move of a particular rank. We see that when children of Max nodes are pruned erroneously,
the probability of choosing a move ranked 4th or 5th decreases sharply towards
zero; when children of Min nodes are pruned wrongly, the probability of choosing a
move ranked 4th or 5th only tapers gradually. In other words, if we have to choose
between forward pruning only the children of Max or of Min nodes, and the eventual rank
of the move chosen is important, we should choose to forward prune the children of Max
nodes.
We also simulated more realistic game-trees, using the approach of Newborn [Newborn,
1977]. In his approach, every node in the tree receives a random number, and the
value of a leaf node is the average of all random numbers in the nodes on the path from
the root of the tree to the leaf node. This ensures some level of correlation between the
Minimax values of sibling nodes, which is known to be non-pathological [Nau, 1982].
The random number in each node is chosen uniformly from [0, 1). The results with
such branch-dependent leaf-valued game-trees are shown in Figure 7.2.
[Figure 7.1: log-scale plot; x-axis “Rank of Move Choice at Root” (1-5), y-axis “Number of Occurrences” (1 to 10^6); series: Min Prunes in trees of height 4 and 5, Max Prunes in trees of height 5 and 6.]
Figure 7.1: Log plot of the number of times ranked moves are chosen where either Max
or Min nodes are forward pruned
[Figure 7.2: log-scale plot; x-axis “Move Choice at Root” (1-5), y-axis “Number of Occurrences” (1 to 10^6).]
Figure 7.2: Log plot of the number of times ranked moves are chosen where either Max
or Min nodes are forward pruned in game-trees with branch-dependent leaf values
7.3 Observing the Effect using Real Chess Game-trees
[Figure 7.3: log-scale plot; x-axis “Rank of Move Choice at Root” (1-5), y-axis “Number of Occurrences” (1 to 10^6); series: More Min Prunes in trees of height 4 and 5, More Max Prunes in trees of height 5 and 6.]
Figure 7.3: Log plot of the number of times ranked moves are chosen with unequal
forward pruning on both Max and Min nodes in game-trees with branch-dependent leaf
values
We may want to prune on both types of nodes in practice, if pruning on only the
children of Max nodes does not provide enough computational savings. To simulate
this, we ran experiments with branch-dependent leaf-valued trees, where three randomly
chosen children at either Max or Min nodes and one randomly chosen child of the other
node type were forward pruned. Figure 7.3 shows the results of these experiments for
various search depths. We see that the asymmetric effects on the severity of errors are
still present. Most of the errors that resulted in a move ranked 4th or 5th being chosen
are likely to have come from pruning children of Min nodes.
The simulation results in Section 7.2 are consistent with Theorem 1. However, “real”
game-trees typically have variable branching factors and heuristic leaf values, whereas
the simulated game-trees have uniform branching factors and randomly generated leaf
values. It therefore remains to be seen whether the same effects can be observed in
“real” game-trees.
To show that the player to move effect applies in “real” game-trees, we use the
implementation of RankCut in the strong open-source Chess program TOGA II (Section
6.4.3). Where needed in our discussions, we distinguish the two versions of TOGA II
by calling the version that does not implement RankCut ORIGINAL TOGA II and the
version that implements RankCut RANKCUT TOGA II.
We first generated 56,167 Chess positions using ORIGINAL TOGA II, with a fixed
time limit of 10 seconds per move, by starting from positions obtained from computer
Chess test suites and continuing play until the game ended.
Next, for each generated Chess position, we used RANKCUT TOGA II to find its
best move within a time limit of 10 seconds. We recorded the move made by RANKCUT
TOGA II, but used ORIGINAL TOGA II to find the rank of that move among all legal
moves of the position. To ensure that the rank of the move is not affected by the pruning
done by RANKCUT TOGA II, ORIGINAL TOGA II was made to search to the same
depth that RANKCUT TOGA II reached within the time limit. Two pruning schemes
were tested: t = 0.1 on one type of node (Max or Min), and t = 0.0 on the other type
of node. Note that even when t = 0.0, RANKCUT TOGA II forward prunes moves with
features for which RankCut did not encounter any better moves during training.
Figure 7.4 shows the number of times RANKCUT TOGA II chooses a move of a
particular rank. We see that even in real Chess game-trees, when the children of Max
nodes are pruned erroneously, the probability of choosing a low-ranked move decreases
7.4 Effect on Actual Game Performance
[Figure 7.4: log-scale plot; x-axis “Rank of Move Chosen” (1 to 21+), y-axis “Number of Occurrences”; series: Prune more on Max moves, Prune more on Min moves.]
Figure 7.4: Log plot of the number of times ranked moves are chosen with unequal
forward pruning on both Max and Min nodes in real Chess game-trees
more than when the children of Min nodes are pruned wrongly. In other words, if the
eventual rank of the move chosen by a search is important, but we are able to tolerate a
certain level of forward pruning errors, we should choose to forward prune more on the
children of Max nodes than on the children of Min nodes.
The experiments on both simulated and Chess game-trees have shown that the player to
move of a node affects the propagation of forward pruning errors in terms of the rank of
the move chosen by the root. We therefore expect that actual game performance will also
be affected by the player to move. In other words, since the rank of the move chosen by
the root is lower when forward pruning more aggressively on the children of Min nodes,
the game performance of a search that forward prunes more aggressively on the children
of Min nodes should also be poorer.
We tested the effect of the player to move on actual game performance by running a
set of matches between RANKCUT TOGA II and ORIGINAL TOGA II on 52 openings,
consisting of the 32 openings used in [Jiang and Buro, 2003] and the 20 Nunn-II positions,
under the blitz time control of 40 moves/10 minutes. RANKCUT TOGA II was modified
to accept two thresholds t_MAX and t_MIN, corresponding to the threshold used to
determine whether or not to forward prune at a Max or Min node, respectively.
We tested the asymmetric Max-Min threshold pairs {0.5% - 0.0%, 0.0% - 0.5%},
{0.75% - 0.25%, 0.25% - 0.75%} and {1.0% - 0.5%, 0.5% - 1.0%}. Note that the
differences between the Max and Min thresholds are all 0.5%, a value we chose to be high
enough for the effect of the player to move to be evident.
In our preliminary experiments with RANKCUT TOGA II, the scores achieved by the
asymmetric Max-Min thresholds were statistically similar, and we were therefore unable
to observe the effect of the player to move on game performance. We hypothesize that
the amount of forward pruning that RankCut introduced was insufficient for the effect to
manifest. Subsequently, RANKCUT TOGA II was tweaked to incorporate more forward
pruning by (1) starting the forward pruning from depth 3 (instead of depth 7), and (2)
using all features f~i seen at least once during training to decide whether or not to
forward prune (instead of only those seen at least 1,000 times).
While this made RANKCUT TOGA II forward prune more often, it also weakened
RANKCUT TOGA II, which now loses to ORIGINAL TOGA II, as seen in Table 7.1:
RANKCUT TOGA II with thresholds 0.00%, 0.25%, 0.5%, 0.75% and 1.00% won only
50.48%, 46.63%, 42.79%, 47.60% and 38.94% of games, respectively. Note that at
t = 0.00%, RANKCUT TOGA II still prunes moves that have features for which no better
move ever appeared during training.
We are now able to observe the effect of the player to move of a node during forward
pruning on game performance. The differences between the asymmetric pruning thresholds
of the Max and Min players are evident: (1) {0.25% - 0.75%} won 45.19% of games
7.5 Chapter Conclusion
Table 7.1: Scores achieved by various Max-Min threshold combinations against ORIGINAL TOGA II
In two-player perfect information games, the player can typically make only one move,
so if the best move has been pruned, the game-tree search should then preferably return
the second best move. Theorem 1 and our experimental results in simulated and Chess
game-trees suggest that different pruning errors relative to the player at the root node
have different effects on the move quality chosen by the game-tree search. Pruning
errors in children of Max nodes will not decrease the rank of moves that are correctly
evaluated. This means that if the second best move is correctly evaluated but the best
move is incorrectly evaluated due to pruning errors, then the game-tree search will return
the second best move as the move to play. On the other hand, pruning errors in children
of Min nodes can incorrectly increase the rank of moves and the game-tree search could
possibly return the worst move as the move to play, even if the best move is correctly
evaluated.
Experiments involving matches played between RANKCUT TOGA II and ORIGINAL
TOGA II suggest that this effect might extend to actual game-playing performance.
However, we were able to observe this effect only when we increased the amount of forward
pruning that RANKCUT TOGA II does. In addition, since the effects on the rank of the
move chosen in simulations are clearly evident only on a log scale, the effect appears to
be relatively small; this would explain why it has not been observed in applications and
empirical experiments until now.
Nevertheless, the player to move effect on pruning error propagation appears to be
present in game-tree search, and this suggests that all forward pruning techniques in
practical settings should consider and experiment with the risk management strategy of
forward pruning more aggressively on the children of Max nodes and more conserva-
tively on the children of Min nodes.
Chapter 8
Depth of Node Effect
This chapter is modified from parts of our article “Properties of Forward Pruning in Game-Tree
Search” (AAAI-06).
The Minimax algorithm propagates scores from the leaf nodes via a process of alternating
between maximizing and minimizing. This process confers some measure of filtering
for pruning errors. However, such pruning errors propagate differently in game-tree
search based on several factors. For example, we showed in Chapter 7 that the player
to move affects the propagation of forward pruning errors during game-tree search. This
effect is a property of forward pruning in game-tree search, and it suggests that forward
pruning techniques should modify their behaviour depending on the player to move of a
node.
We extend this work in this chapter to show that the depth of a node also affects how
pruning errors are filtered [Lim and Lee, 2006a]. We build on Pearl’s error propagation
model [Pearl, 1984] for imperfect evaluation functions in game-tree search. This
theoretical framework suggests a risk management strategy of pruning more near the root
and less near the leaf nodes to maximize game performance. We present experimental
results on simulated and Chess game-trees that support this risk management strategy.
8.1 Intuition
For ease of explanation we use the height of a node u, defined as one less than the
number of nodes in the longest path from u to a leaf node. Depth and height of a node
are closely related: a node of depth d is at height h − d − 1 for a game-tree of height h.
To obtain some insights, we first consider the case of a single pruning error.
Proposition 1. Assume a complete b-ary tree of height at least 3. Then the probability
of a change in value of a random node at height k, selected with uniform probability
from all nodes at that height, affecting the Minimax evaluation of its grandparent node
at height k + 2 is no more than 1/b.
Proof. Consider the case where height k consists of Max nodes, and a node at height
k is chosen with uniform probability from all nodes at that height to have a change in
value. We assume that the grandparent node g at height k + 2 has the true value v.
Consider first the case where the error decreases the node value. If more than one child
of g has value v, a single value reduction at height k will not change g’s value, so we
consider the case where only one child, say m, has value v. Note that the value of g can
only change if the value reduction occurs in m’s children and not anywhere else. The
probability of this occurring is no more than 1/b, since m has b children. Now, consider
the case where the error increases the node value. Consider any one of g’s children, say
m, and let m have value v. The value of m can change only in the case where exactly
one of its children has value v and that child is corrupted by the error. Hence the number
of locations at height k that can change g’s value is no more than b out of the b² nodes at
that height.
The cases for Min nodes are the same when we interchange the error type.
8.2 Theoretical model for the propagation of error
Theoretical models have found that the Minimax algorithm can amplify the evaluation
errors that occur at the leaf nodes. This is known as the Minimax pathology and was
independently discovered by Nau [Nau, 1979] and Beal [Beal, 1980]. Pearl [Pearl, 1984]
used a probabilistic game model, which we reproduce here, to examine this distortion
and quantify the amplification of errors. Let pk be the probability of a WIN for a node at
height k and consider a uniform binary game-tree where the leaf nodes are either WIN or
LOSS with probability p0 and 1−p0 , respectively. The leaf nodes also have an imperfect
evaluation function that estimates the values with a bi-valued variable e, where e = 1
or e = 0 represent a winning and losing position, respectively. We denote the Minimax
evaluation at position i as ei and the WIN-LOSS status at position i as Si . Positions
where i = 0 represent leaf nodes. ei and Si are represented in negamax notation and
refer to the player to move at position i. If node 3 has two children, 1 and 2, as in Figure
8.1, we write
S3 = L if S1 = W and S2 = W, and S3 = W otherwise.   (8.1)

e3 = 0 if e1 = 1 and e2 = 1, and e3 = 1 otherwise.   (8.2)
α_0 = P(e = 1 | S = L)   (8.3)

β_0 = P(e = 0 | S = W)   (8.4)

α_{k+1} = 1 − (1 − β_k)²   (8.5)

β_{k+1} = [α_k/(1 + p_k)] [(1 − p_k)α_k + 2p_k(1 − β_k)]   (8.6)

The probability of a WIN at successive heights satisfies

p_{k+1} = 1 − p_k²   (8.7)
Pearl also considered uniform b-ary game-trees where each non-leaf node has b successors,
and similarly we obtain:

α_{k+1} = 1 − (1 − β_k)^b   (8.8)

β_{k+1} = {[p_k(1 − β_k) + (1 − p_k)α_k]^b − [p_k(1 − β_k)]^b} / (1 − p_k^b)   (8.9)

The recurrence p_{k+1} = 1 − p_k^b has the three limit points 0, 1 and ξ, where ξ is the
solution to x^b + x − 1 = 0. The limit points show that when the height of the tree is
large enough, under this model, the outcome at the root is uncertain only for p_0 = ξ.
Hence, we are mostly interested in the behaviour at these three limit points.
While Pearl’s model assumed an imperfect evaluation function at the leaf nodes,
we assume that the evaluation of leaf nodes are reliable, since we are considering only
pruning errors. We adapt Pearl’s model to understand the effects of forward pruning by
considering α0 and β0 as the probabilities of pruning errors made at the frontier nodes
(nodes that have leaf nodes as children). We can now demonstrate the effects of the
depth of the node in forward pruning:
Theorem 2. Assume that errors exist only at the leaves of uniform b-ary game trees.
Then β_{k+2}/β_k ≤ b, with β_{k+2}/β_k = b for some cases. Similarly, α_{k+2}/α_k ≤ b, with
α_{k+2}/α_k = b for some cases.

Applying the recurrence equations (8.8) and (8.9) over two levels gives

α_{k+2} = 1 − (1 − (1/(1 − p_k^b)) {[p_k(1 − β_k) + (1 − p_k)α_k]^b − [p_k(1 − β_k)]^b})^b   (8.12)

β_{k+2} = (1/(1 − p_{k+1}^b)) {[(1 − p_k^b)(1 − β_{k+1}) + p_k^b α_{k+1}]^b − [(1 − p_k^b)(1 − β_{k+1})]^b}   (8.13)

The value of α_{k+2} in equation (8.12) reaches its maximum value when β_k = 0. We
also see that β_{k+1} = 0 when α_k = 0, and therefore β_{k+2} in equation (8.13) reaches its
maximum value when α_k = 0.
We denote α_k when β_0 = 0 by α′_k. When β_0 = 0, we have β_k = 0 when k is even.
The ratio α′_{k+2}/α′_k gives us the rate of increase in error propagation based on the value
of α′_k:

α′_{k+2}/α′_k = [1 − (1 − (α′_k)^b)^b] / α′_k,  if p_k → 0;
            = [1 − (1 − α′_k)^b] / α′_k,  if p_k → 1.   (8.14)

lim_{α′_k → 0} α′_{k+2}/α′_k = 0 if p_k → 0, and = b if p_k → 1.   (8.15)

Similarly, denoting β_k when α_0 = 0 by β′_k, we obtain:

β′_{k+2}/β′_k = [1 − (1 − β′_k)^b] / β′_k,  if p_k → 0;
            = [1 − (1 − β′_k)^b]^b / β′_k,  if p_k → 1.   (8.16)

lim_{β′_k → 0} β′_{k+2}/β′_k = b if p_k → 0, and = 0 if p_k → 1.   (8.17)

Lastly, the proof in Proposition 1 can be modified to show that β_{k+2}/β_k ≤ b and
α_{k+2}/α_k ≤ b by considering one type of error instead of a single error.
To help us gain more insight into the rate of error propagation, we simplify the
analysis by considering uniform binary game trees. Setting β0 = 0 and reapplying the
recurrence equations for b = 2, we get
α′_{k+2} = [α′_k/(1 + p_k)] [(1 − p_k)α′_k + 2p_k] × {2 − [α′_k/(1 + p_k)] [(1 − p_k)α′_k + 2p_k]}   (8.18)

β′_{k+2} = [β′_k(2 − β′_k)/(2 − p_k²)] {p_k² [1 − (1 − β′_k)²] + 2(1 − p_k²)}   (8.19)
Several interesting observations can be made from Figure 8.2, which shows the plot
of α′_{k+2}/α′_k and β′_{k+2}/β′_k for the limit points of p_k. If p_k → 1, errors are being filtered
out when β′_k < 1 − ξ, giving lim_{k→∞} β′_{2k} = 0. However, when β′_k > 1 − ξ, the error rate
will increase with the height, giving lim_{k→∞} β′_{2k} = 1. If β′_k = 1 − ξ, the rate of error
propagation is constant, lim_{k→∞} β′_{2k} = 1 − ξ. Similarly, when p_k → 0, lim_{k→∞} α′_{2k} is 0
when α′_k < ξ, is 1 when α′_k > ξ, and is ξ when α′_k = ξ. For p_k = ξ, both types of errors
8.3 Theoretical Optimal Forward Pruning Scheme 122
α'k+2 β'k+2
or
α'k β'k
2 pk→ 0
1.75
1.5
{ pk→ ξ
pk→ 1
β'k+2
β'k
1.25
1
1-ξ ξ
0.75
pk→ 1
0.25
0.5
{ pk→ ξ
pk→ 0
α'k+2
α'k
α'k or β'k
0.2 0.4 0.6 0.8 1
grow with the height. These results are also given in [Pearl, 1984] for errors caused by
imperfect evaluation functions.
The upper bound of b on the rate of change in error propagation to the root, given by Theorem 2, is clearly too conservative, as shown in Figure 8.3. For example, when b = 2, the maximum of β^0_{k+2}/β^0_k for p_0 = ξ occurs at β^0_k ≈ 0.099, where β^0_{k+2}/β^0_k ≈ 1.537, which is less than the bound of 2 that the theorem suggests.
The theoretical model presented provides insight into the rate of forward pruning error propagation with respect to the depth of a node in game-tree search. Since α^0_{k+2}/α^0_k and β^0_{k+2}/β^0_k are greater than one for p_0 = ξ, the rate of pruning error propagation increases as the distance from the leaf nodes increases. The intuitive scheme is therefore to perform the most forward pruning near the root and the least forward pruning near the leaf nodes. The theory, unfortunately, does not provide us with the actual risk management strategy to maximize game-playing performance.

[Figure 8.3: Plot of β^0_{k+2}/β^0_k against β^0_k when p_0 = ξ, for b = 2, 3, 4, 5, 10, 20, 30, 40 and 50.]

8.3 Theoretical Optimal Forward Pruning Scheme
In this section, we compute the theoretical optimal forward pruning scheme for a
given error threshold with respect to the depth of the node. We first simplify the prob-
lem by assuming that the game-tree is of uniform breadth and depth. Furthermore, the
probability of a forward pruning error is the same as the probability of that error propagating to the root. Formally, the simplifying assumptions are:

1. the game-tree has a uniform branching factor b

2. the game-tree has a uniform height h
3. w(n) is defined as the number of nodes pruned if node n is forward pruned and
is equal to the size of a full-width subtree of height i where i is the height of the
node n
4. p(n) is the probability of node n being forward pruned wrongly and is equal to
Pi ∈ [0, 1], where i is the height of the node n
5. f (n) is the probability of the pruning error propagating to the root and is equal to
Fi ∈ [0, 1], where i is the height of the node n
6. Pi = Fi , ∀i
Under these assumptions, the optimization problem is to maximize the expected number of nodes pruned,

Σ_{n ∈ Pruned Nodes} w(n)    (8.20)

subject to Σ_{n ∈ Pruned Nodes} p(n)f(n) = e, which we can rewrite as

Σ_{i=0}^{h−1} P_i b^{h−i} b^i = Σ_{i=0}^{h−1} P_i b^h    (8.21)
subject to the constraint Σ_{i=0}^{h−1} F_i b^{h−i}/b^{⌈(h−i)/2⌉} = Σ_{i=0}^{h−1} P_i b^{h−i}/b^{⌈(h−i)/2⌉} = e. Note that the summation is over i = 0, 1, . . . , h − 1 as we do not forward prune the root node.
We now use Lagrange multipliers to find the optimal solution by setting, for all i,

∂/∂P_i [Σ_{i=0}^{h−1} P_i b^h + λ(Σ_{i=0}^{h−1} P_i b^{h−i}/b^{⌈(h−i)/2⌉} − e)] = b^h + λ b^{⌊(h−i)/2⌋} = 0    (8.22)

which implies

λ = −b^{h − ⌊(h−i)/2⌋}    (8.23)
Since λ is equal for all P_i, this forces i ≥ h − 1, and therefore the constraints show that the optimal forward pruning scheme is to prune only when i = h − 1, that is, only at the root.
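The corner solution can also be checked mechanically: every P_i contributes b^h to the objective, while its coefficient in the error constraint is b^⌊(h−i)/2⌋, so a fixed error budget buys the most pruning where that coefficient is smallest. A sketch with illustrative parameter values:

```python
b, h = 2, 6
# cost coefficient of P_i in the error constraint: b^floor((h - i)/2)
cost = [b ** ((h - i) // 2) for i in range(h)]
# every P_i contributes b^h to the objective, so the best scheme spends the
# whole error budget at the height with the smallest cost coefficient
best_height = min(range(h), key=lambda i: cost[i])
assert best_height == h - 1  # prune only just below the root
```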
8.4 Observing the Effect using Simulations

To illustrate the implications of our results, we once again perform a number of Monte
Carlo simulations. In our experiments, we use game-trees with uniform branching factor
5 and branch-dependent leaf values to simulate actual game-trees. Each node is assigned
a pruning probability qi : during search, a Bernoulli trial with probability qi of pruning
each child is performed, where i is the depth of the node. We test two different pruning
reduction schemes – Multiplicative and Linear pruning reduction. The multiplicative
pruning reduction schemes multiply the pruning probability by a constant factor for
every additional 2 depths, or qi+2 = qi × c, and q1 = q0 × c, where c is the multiplicative
factor. A multiplicative factor of 1.0 is equivalent to a Constant Pruning scheme. Linear
pruning reduction schemes reduce qi for each depth by subtracting a constant c from the
previous depth, or qi+1 = qi − c. Figure 8.4 shows the proportion of correct Minimax
evaluations for various pruning reduction schemes with starting pruning probability q0 =
0.1. We see that the linear pruning reduction schemes are clearly inadequate to prevent amplification of pruning errors propagating to the root, even though the linear pruning scheme with c = 0.02 reduces q_i to zero at search depth 6.
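The simulation can be sketched as follows. The leaf-value model, tree sizes and trial counts below are illustrative stand-ins rather than the thesis's actual parameters (which used branching factor 5 and far more trees), and this multiplicative scheme reduces q per depth rather than per two depths:

```python
import random

def make_tree(depth, b=5, val=0.0):
    # Branch-dependent leaf values: each edge adds a random increment, so
    # sibling subtrees have correlated leaves (one plausible reading of the
    # model in the text; the exact distribution here is an assumption).
    if depth == 0:
        return val
    return [make_tree(depth - 1, b, val + random.gauss(0, 1)) for _ in range(b)]

def minimax(node, maximize=True):
    if not isinstance(node, list):
        return node
    vals = [minimax(c, not maximize) for c in node]
    return max(vals) if maximize else min(vals)

def minimax_pruned(node, q, depth=0, maximize=True):
    # q[i] is the probability of forward-pruning each child of a depth-i node
    if not isinstance(node, list):
        return node
    kept = [c for c in node if random.random() >= q[min(depth, len(q) - 1)]]
    if not kept:
        kept = [node[0]]  # never prune every child
    vals = [minimax_pruned(c, q, depth + 1, not maximize) for c in kept]
    return max(vals) if maximize else min(vals)

def proportion_correct(q, depth=4, b=3, trials=200):
    hits = 0
    for _ in range(trials):
        t = make_tree(depth, b)
        hits += minimax(t) == minimax_pruned(t, q)
    return hits / trials

random.seed(0)
q0 = 0.1
multiplicative = [q0 * 0.5 ** i for i in range(5)]       # q reduced per depth
linear = [max(q0 - 0.02 * i, 0.0) for i in range(5)]
print(proportion_correct(multiplicative), proportion_correct(linear))
```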
While the experiments have shown that multiplicative pruning reduction schemes
can prevent the amplification of pruning errors propagating to the root, it might be pos-
sible that multiplicative pruning reduction schemes are not pruning enough to justify
forward pruning at all. It is more interesting to consider the question “Given a fixed
time limit, what is the best pruning reduction scheme that allows the deepest search
while making less than a pre-defined threshold of errors?”.
[Figure 8.4: Proportion of correct answers against search depth for the pruning reduction schemes Multiplicative c = 0.2, 1/√5, 0.5, 0.75, 1.0 and Linear c = 0.01, 0.02.]
We set the error threshold at 0.25, which means that the search should return a back-up evaluation equal to the true Minimax value of the tree at least 75% of the time¹. To simulate a fixed time limit, we used αβ search and iterative deepening to search until the maximum number of nodes, which we set at 1,000, was reached. We tested two additional pruning reduction schemes – Root Pruning and Leaf Pruning. In the Root pruning scheme, only the root node forward prunes, or q0 = c > 0 and qi = 0 for i > 0. The Leaf pruning scheme only forward prunes leaf nodes, or qi = 0 for i < d and
qd = c > 0, where d is the search depth. We first used 8 iterations of binary search with 10^5 simulated game-trees each time to find pruning probabilities p0 for the various pruning schemes that return correct answers at least 75% of the time. Next, we ran a simulation of 10^6 game-trees and, for each generated game-tree, we performed every
¹ In a real game, we would be able to use domain-dependent information to decide when to prune. This is likely to result in a smaller pruning error rate for the same aggressiveness in pruning.
pruning scheme with the pruning probabilities found using binary search.
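The binary search for the pruning probability can be sketched as follows; the accuracy function stands in for running the simulated game-tree searches, and is assumed to be monotone decreasing in the pruning probability:

```python
def find_pruning_probability(accuracy, target=0.75, iterations=8):
    # Binary search for the largest pruning probability whose proportion of
    # correct answers still meets the target accuracy.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if accuracy(mid) >= target:
            lo = mid   # still accurate enough: we can afford to prune more
        else:
            hi = mid
    return lo

# with a synthetic accuracy curve 1 - q, the 75% threshold sits at q = 0.25
print(find_pruning_probability(lambda q: 1.0 - q))
```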
[Figure 8.5: Box plot showing the search depths reached with correct answers by each pruning scheme.]
The Root pruning scheme is the best pruning scheme, as it achieves, on average, the deepest search depths among all pruning schemes, as shown in Figure 8.5. A box plot gives a five-number summary: minimum data point, first quartile, median, third quartile, and maximum data point. The mean and standard deviation of the search depths reached for each pruning scheme are also given. These results suggest that the pruning rate should decrease with the depth of the nodes.
8.5 Observing the Effect using Chess Game-Trees

To see the effects of the depth of a node on pruning error propagation in Chess game-trees, experiments were done using RankCut Toga II on 1,364 middle-game positions obtained from test suites.
We tested three pruning schemes: (1) pruning only at the root (RootOnly), (2) pruning constantly throughout the game-tree (Constant), and (3) pruning only at the leaf nodes (LeafOnly). To observe the effect more clearly, we modified RankCut Toga II to forward prune more by starting forward pruning from depth 3 (instead of depth 7).

As RankCut Toga II does not forward prune at the root node, and only starts forward pruning from depth 3 onwards, the RootOnly scheme prunes only in nodes at the level below the root, and LeafOnly prunes only in nodes at depth 3. The Constant scheme therefore prunes at nodes between the level below the root and depth 3, inclusive. This does not affect the validity of the experiments, as the various pruning schemes still prune differently depending on the depth of the node, even though the root and leaf nodes do not perform forward pruning.
Toga II and RankCut Toga II use iterative deepening (Section 2.2.3) for real-time decisions. This means that the search depth reached within a time limit is variable. The 'true' Minimax value of the tree is assumed to be the value returned by Toga II searched to the same depth achieved by RankCut Toga II (with forward pruning). In addition, Toga II and RankCut Toga II use principal variation search (PVS) (Section 2.2.2) and aspiration search (Section 2.2.2) to accelerate Alpha-Beta search. As previous search results are used as estimates for future searches, this results in complex interactions between pruning errors in previous iterations and search performance in future iterations. We therefore disabled PVS and aspiration search by setting Alpha and Beta to −∞ and ∞, respectively, for each function call to the search.
We set the error threshold at 0.25, which means that the search should return a back-up evaluation equal to the true Minimax value of the tree at least 75% of the time. We used 10 iterations of binary search between [0.0, 1.0] to find the optimal value of the pruning threshold for each of the three schemes on a set of 1,000 different middle-game positions, also obtained from test suites. We then performed every pruning scheme with the pruning probabilities found using binary search on the 1,364 middle-game positions. We tested two fixed time limits of 10 seconds and 5 seconds per position.
Experimental Results

For each pruning scheme, we record the number of correct solutions (as deemed by a search using Toga II to the same search depth), and the depth achieved by RankCut Toga II. To observe the gain in search depth achieved by forward pruning, we also record the depth achieved by Toga II in the same time limit, and store the differences between the search depths reached by RankCut Toga II and Toga II.
Pruning Scheme   Pruning Threshold   # Correct   % Correct   µ Search Depth Gain   σ Search Depth Gain
RootOnly         13.18%              1038        76.10%       0.08285              0.1050
LeafOnly         11.13%              1029        75.44%      -0.04762              0.1193
Constant          1.86%              1023        75.00%      -0.07136              0.1466
Table 8.1: Statistics for pruning schemes with time limit of 5 seconds per position
Table 8.2: One-way ANOVA to test for differences in search depth gain among the three
pruning schemes with time limit of 5 seconds per position
Table 8.1 shows the statistics for pruning schemes with the time limit of 5 seconds per position. The RootOnly scheme is the most successful of the three, and is the only scheme that achieved higher search depths than Toga II within the same time limit while achieving 75% accuracy of Minimax results. One-way analysis of variance (ANOVA) is used in Table 8.2 to show that this result is statistically significant. Tukey's Honestly Significant Difference (HSD) comparison test showed that the RootOnly scheme was significantly different from the other two schemes.
Similarly, Table 8.3 shows the statistics for pruning schemes with the time limit of 10 seconds per position.

Pruning Scheme   Pruning Threshold   # Correct   % Correct   µ Search Depth Gain   σ Search Depth Gain
RootOnly         12.99%              1019        74.71%       0.08145              0.1378
LeafOnly         10.45%              1023        75.00%      -0.07625              0.1468
Constant          1.66%              1014        74.34%      -0.09862              0.1818

Table 8.3: Statistics for pruning schemes with time limit of 10 seconds per position

Source of Variation    SS       df     MS       F       P-value     F crit
Between Groups          19.62      2   9.808    63.12   1.381e-27   2.999
Within Groups          474.43   3053   0.1554
Total                  494.05   3055

Table 8.4: One-way ANOVA to test for differences in search depth gain among the three pruning schemes with time limit of 10 seconds per position

The RootOnly scheme is again the most successful of the three,
and is the only scheme that achieved higher search depths than Toga II within the same time limit while achieving 75% accuracy of Minimax results. Table 8.4 shows that this result is statistically significant. In addition, Tukey's HSD test showed that the RootOnly scheme was significantly different from the other two schemes.
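The one-way ANOVA statistic used here can be sketched as a textbook F computation (this is not the thesis's analysis code, and real use would also compare F against the critical value for the chosen significance level):

```python
def one_way_anova_f(groups):
    # F = (between-group mean square) / (within-group mean square)
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# a large F indicates the group means differ by more than chance would allow
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [10, 11, 12]]))
```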
We report results from Kocsis' PhD thesis [Kocsis, 2003], which introduces learning algorithms for forward pruning. As some of the experiments involved learning depth parameters that maximize the performance of forward pruning, this presents additional evidence for the effect of the depth of the node on pruning error propagation during game-tree search. We recall the necessary background needed to formulate the relevant learning algorithm and describe experimental results from [Kocsis, 2003]. We then explain these results using the effect of the depth of the node in forward pruning error propagation.
The learning task can be stated as the optimization problem

maximize performance(~v) subject to ncount(~v) ≤ N,
~v = (v_0, v_1, . . . , v_D), v_d ∈ W_d, d ∈ {0, 1, . . . , D},
where ncount() is the size of the search tree, performance(~v ) is the quality of the move
(which can be defined in several ways and is left undefined), D is the maximum search
depth, Wd is the set of possible forward pruning parameters for depth d, and N is the
maximum number of nodes allowed to be explored.
TS-FPV Algorithm
It is difficult to solve the optimization problem given in the last section by testing each
FPV because it is a multivariate optimization problem. Kocsis introduces an algorithm
called TS-FPV that is based on tabu search [Glover, 1989]. Like tabu search, TS-FPV
is a local search and it selects FPVs from the neighborhood of the current FPV being
considered. TS-FPV uses a list called a tabu list (TL) to record some of the recent FPVs
previously investigated. These recent solutions are avoided (as they are “tabu”).
There are two phases in the algorithm: intensification and diversification. Intensification tries to find neighboring solutions that improve the current FPV, while diversification explores untested neighborhoods of solutions. In TS-FPV, the frequency of each possible width at each depth is stored. During the diversification phase, candidate solutions are explored in descending order of the frequencies of each value of the current FPV. This is seen as a form of long-term memory.
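The intensification phase can be sketched as a best-neighbor tabu loop; everything below (the neighborhood, the toy objective, the parameters) is an illustrative reduction, and the diversification phase with frequency-based long-term memory is omitted:

```python
import random

def ts_fpv(evaluate, widths, depth, steps=50, tabu_len=7):
    # Local search over forward pruning vectors: move to the best neighbor
    # that is not on the tabu list, remembering the best vector seen so far.
    cur = tuple(random.choice(widths) for _ in range(depth))
    best, best_val = cur, evaluate(cur)
    tabu = [cur]
    for _ in range(steps):
        neighbors = [cur[:d] + (w,) + cur[d + 1:]
                     for d in range(depth) for w in widths
                     if cur[:d] + (w,) + cur[d + 1:] not in tabu]
        if not neighbors:
            break
        cur = max(neighbors, key=evaluate)
        tabu = (tabu + [cur])[-tabu_len:]
        if evaluate(cur) > best_val:
            best, best_val = cur, evaluate(cur)
    return best

# toy objective: prefer a width of 3 at every depth
random.seed(1)
result = ts_fpv(lambda v: -sum(abs(x - 3) for x in v), [1, 2, 3, 4, 5], 3)
print(result)  # -> (3, 3, 3)
```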
We described a simple variant of FPV that uses one value to represent the amount of forward pruning to perform at each search depth in Section 8.5.2. [Kocsis, 2003] extends FPVs by mapping the values along an additional dimension - the iteration number. The iteration number is the current search iteration of an iterative deepening search (Section 2.2.3), which is commonly performed by real-time game-playing programs. The two relevant FPV variants are: (1) FPV-d, which forward prunes the same amount at each search depth, and (2) FPV-l, which forward prunes the same amount at a certain distance to the leaf nodes.
To illustrate this, we interpret the forward pruning that occurs under the variants with
the FPV {v1 , v2 , v3 , v4 , v5 } while searching to depth 5.
1. At iteration 1, FPV-d forward prunes v1 of nodes at the root. FPV-l forward prunes v5 of nodes at the root.

2. At iteration 2, FPV-d forward prunes v1 of nodes at the root and v2 of nodes at depth 1; FPV-l forward prunes v4 of nodes at the root and v5 of nodes at depth 1.

3. At iteration 3, FPV-d forward prunes v1, v2 and v3 of nodes at depths 0, 1 and 2, respectively; FPV-l forward prunes v3, v4 and v5 of nodes at those depths.

4. And so on.
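The two indexing conventions can be captured in a small sketch (0-based depths; the vector layout is inferred from the interpretation above):

```python
def width_fpv_d(v, depth, iteration):
    # FPV-d: the pruning width depends only on the search depth
    return v[depth]

def width_fpv_l(v, depth, iteration):
    # FPV-l: the width depends on the distance to the leaves, so the same
    # entry shifts toward the root as the iteration (search depth) grows
    distance_to_leaves = iteration - depth
    return v[len(v) - distance_to_leaves]

v = ["v1", "v2", "v3", "v4", "v5"]
print(width_fpv_d(v, 0, 1), width_fpv_l(v, 0, 1))  # at iteration 1: v1 v5
```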
Experimental Setup

TS-FPV is used to maximize the performance of FPV-d and FPV-l. The experiments for TS-FPV are done using Crafty, which has 5 phases of move generation: (1) moves from the transposition table, (2) capture moves, (3) killer moves, (4) 3 of the remaining moves sorted by the History Heuristic, and (5) the remaining moves (see Section 6.4.2 for more details). In the experiments, moves are considered for forward pruning only from phase 4 onwards, and are also re-ordered using the Neural MoveMap heuristic [Kocsis, 2003, Chapter 3].
3,000 randomly selected middle-game positions are used to measure the performance of the FPV variants. The performance is the average difference between the score returned by the search with forward pruning and the score returned by a 12-ply search. In other words, the scores returned by the 12-ply search are considered the "true" Minimax values of the positions. The result is statistically significant with 3,000 test positions. The limit on the number of nodes to expand, ncount(), is set to the number of nodes expanded by the search without forward pruning searched to the reference depth.

The possible FPV values in the experiments are {1, 2, 3, 4, 5, 10, 20, 30, 40, 50, all}, and the reference search depths are 4, 5, 6, 7 and 8. For each reference search depth and FPV variant, TS-FPV is used to learn the best FPV, where the search with forward pruning searches to one more than the reference search depth.
Experimental Results

The top 3 FPVs for the various search depths for FPV-l and FPV-d are presented in Tables 8.5 and 8.6, respectively. In our discussion, we focus on the values of the FPVs with respect to the distance to the root or leaves. Interested readers should refer to [Kocsis, 2003] for a detailed analysis of the experimental results.
Due to the use of iterative deepening and the Alpha-Beta search window enhancements of Crafty, it is not possible to predict the exact forward pruning scheme that TS-FPV will find using our theoretical analysis of forward pruning error propagation in uniform game-trees with Minimax search. There are several reasons: (1) the Minimax values returned at lower iterations are used as estimates for the Alpha-Beta search windows of later iterations, and (2) positions evaluated incorrectly due to forward pruning errors might be stored and later reused via the transposition tables. There is therefore a complex interaction between the (correct or incorrect) scores returned by the search with forward pruning at lower iterations and the effect of these earlier scores in subsequent iterations.
Table 8.5: Top 3 FPV-l values for various search depths [Kocsis, 2003]

Table 8.6: Top 3 FPV-d values for various search depths [Kocsis, 2003]

There is, however, a fairly straightforward conjecture that the theory suggests - the best pruning scheme is to prune more near the root and less near the leaves. The experimental
results reported in Tables 8.5 and 8.6 provide further evidence for this conjecture. We consider the vector {5 10 all 10 10 all 20}, where FPV-l achieved its best performance of 1.08 with reference search depth 6. The vector is of length 7, as FPV-l searches to depth 7 (one more than the reference depth). Note that the root at iteration 7 considers at most only 5 moves before forward pruning the remaining moves, and this is the most aggressive action of all iterations and search depths for this vector. Similarly, we consider the vector {3 2 all 20 20}, where FPV-d achieved its best performance of 0.32 with reference search depth 4. Using this vector, FPV-d forward prunes most aggressively, considering only 2 moves, at depth 2, and slightly less aggressively at the root. The remaining top-performing vectors exhibit a similar trend - forward pruning is done more aggressively near the root and less aggressively near the leaves.
8.6 Discussion
There is anecdotal evidence that supports our analysis of the effect of the depth of nodes
in forward pruning. Adaptive Null-Move Pruning is a variant of Null-Move Pruning that
essentially prunes less when near the leaf nodes. In developing Adaptive Null-Move
Pruning [Heinz, 1999], it was initially expected that the best performance in an adaptive
form of null-move pruning would come from pruning more when near the leaf nodes and
less when closer to the root. This belief was consistent with the expectations of other
researchers [Greenblatt et al., 1988, Goetsch and Campbell, 1990, Donninger, 1993] that
the amount of forward pruning should increase with increasing distance of the node from
the root of the tree. However, Heinz found that the opposite is true: Adaptive Null-Move
Pruning works better by pruning less near the leaf nodes, which agrees with our results.
Heinz notes that this scheme is contrary to static forward pruning methods, which prune more near the leaf nodes. We conjecture that static forward pruning methods can prune more near the leaf nodes simply because the static evaluations become more accurate near the leaf nodes, and should prune less far from the leaf nodes, where the evaluations become inaccurate.
Other forward pruning techniques appear to avoid having to deal with the effect of the depth of a node on pruning error propagation by not forward pruning in the parts of the search tree near the leaf nodes. For example, (1) RankCut, discussed in Chapter 6, starts forward pruning only when the search depth is 7 or more, (2) Multi-Cut αβ-Pruning does not prune close to the horizon so as to reduce "the time overhead involved" [Björnsson and Marsland, 2001], (3) ProbCut [Buro, 1995b, Jiang and Buro, 2003] works only on nodes at higher depths, as it uses depth-reduced search to estimate search results for deep searches, and (4) History Pruning/Late Move Reduction as implemented in Toga II does not forward prune when the depth is 4 or less.
8.7 Chapter Conclusion

Existing literature had painted the pessimistic picture that the Minimax algorithm corrupts the back-up values in the presence of errors. While there are alternative explanations for Minimax pathology, including but not limited to [Smith and Nau, 1994, Sadikov et al., 2005, Lustrek et al., 2005], our analysis shows that the Minimax algorithm does filter pruning errors, but at a slower rate than the rate of growth of leaf nodes.
Since the rate at which the Minimax algorithm can filter out errors is smaller than
the rate at which leaf nodes are introduced for each additional search depth, this sug-
gests that forward pruning techniques should prune less as search depth increases. We
simulated game-trees with correlated leaf values, which have been shown to be non-
pathological, and demonstrated that it is better to prune more aggressively near the root
and less aggressively near the leaf nodes. Experimental data with Chess game-trees in
RankCut Toga II and Crafty also support this risk-management strategy.
The theoretical analysis therefore offers an explanation for the experimental results
and the anecdotal optimal forward pruning scheme observed by Chess programmers.
The risk management strategy of pruning more near the root and less near the leaf
nodes should help to maximize performance of forward pruning techniques in game-
tree search.
Chapter 9
Conclusion and Future Research
In this chapter, we begin by summarizing the work contained in this thesis, indicating the
primary results. Finally, we adopt a broader perspective and discuss possible directions
of future research.
9.1 Conclusion
The goal of our work has been to improve the state-of-the-art for forward pruning
techniques in game-tree search. This thesis has therefore focused on novel applications
of forward pruning techniques and better understanding of the theoretical properties
of forward pruning in game-tree search. In this thesis, we have (1) solved the game
of Tigers and Goats, a high game-tree complexity problem, by using forward pruning
techniques in forward searches, (2) introduced an effective forward pruning technique
called RankCut that can be applied alongside existing forward pruning techniques, and
(3) shown that two factors, namely the player to move and the depth of a node, affect
forward pruning error propagation in game-tree search and suggested risk management
strategies for forward pruning techniques to maximize game-playing performance.
9.1.2 RankCut
In this thesis, we presented work that successfully applied novel forward pruning techniques to game-tree search in computationally hard search problems. We predict that forward pruning, or selective search, will play a bigger role in search performance as search problems become even more computationally demanding. The long-term goal
of selective search research is to match and even surpass the intuitive ability of humans
to selectively search the state-space and yet be able to make accurate decisions; for the
short to medium term, we can suggest the following areas of future research.
9.2 Future Research

9.2.1 Tigers and Goats

While we have weakly solved the game of Tigers and Goats, it is always desirable to be able to strongly solve a game and to play it perfectly. Strongly solving Tigers and Goats is possible with current state-of-the-art machines by emulating [Romein and Bal, 2003] and completely enumerating the state-space of Tigers and Goats.

There are other variants of Tigers and Goats that have different winning criteria. It would be interesting to investigate and find the game-theoretic values of these variants.
9.2.2 RankCut
One potential problem is that RankCut assumes the statistics of Π(f~i ) collected without
forward pruning remain the same when forward pruning. While our experiments indi-
cate that the assumption is reasonable for practical purposes, one possible solution is to
recollect the statistics with forward pruning until the probabilities stabilize. However,
this approach needs experiments to verify its effectiveness.
Π(f~i ) was estimated using the relative frequency of a better move appearing. The
search up to an average of 8 plies in 1 minute with RankCut. Under fixed time limits of
1 minute per move, Alpha-Beta search with RankCut was able to beat the commercial
version of ABA-PRO [Aichholzer et al., 2002], arguably the strongest Abalone-playing
entity, at a fixed playing level of 9 by +12 -5 =3 in a 20-game series, whereas Alpha-Beta
search without RankCut lost handily by +2 -15 =5 to ABA-PRO.
More recently, joint work with Cheng Wei Chang and Wee Sun Lee has resulted in an Abalone program called ABA-CUT [Chang, 2007] that plays the game of Abalone using hard-coded heuristics and RankCut. A finely-tuned version of ABA-CUT was able to defeat ABA-PRO, at a fixed playing level of 8, +5 -0 =1. In addition, experiments using ABA-CUT with and without RankCut show a performance difference in self-play games. However, as these games were not under tournament conditions, these results at best suggest that RankCut is effective in other games. More research needs to be done to show the effectiveness of RankCut in games with a high branching factor.
We have shown that two factors, the depth and the player to move of a node, affect the
rate of forward pruning error propagation in game-tree search. For each factor, we also
determined the risk management strategy to use that minimizes the amount of pruning
error propagation. Some possible areas of research are to (1) theoretically derive and
empirically find the best pruning schemes for each factor, (2) analyze the interaction of
both factors in actual game-tree search, and (3) discover other factors that affect pruning
error propagation.
The game of Go can be considered the grand challenge of game AI at this point in time. One interesting development in computer Go has been the introduction of Monte Carlo methods that combine game-tree search and randomly generated moves for evaluation [Coulom, 2006, Kocsis and Szepesvári, 2006]. The random nature of Monte Carlo methods corresponds well with the theoretical analysis of the properties of forward pruning presented in this thesis, which should extend to Monte Carlo tree search. More research on how to incorporate risk management strategies in forward pruning can be done to further improve the state of the art for Monte Carlo tree search.
Appendix A
Additional Tigers and Goats Endgame
Database Statistics
[Endgame database statistics tables: Tiger to Move / Goat to Move.]
Consider the 5 × 5 Tigers and Goats board, with grid points numbered 1 to 25, and the
8 symmetry transformations that permute the set D = {1, . . . , 25} as shown in Figure
B.1. We identify a board position with a function f : D → {T iger, Goat, Empty} =
{T, G, E} that assigns to each grid point its status, subject to the constraint that there
are exactly 4 Tigers and a number of Goats that varies from 16 to 20. We show how to
compute the number of distinct board positions, modulo symmetry, for the case of 20 Goats, which we abbreviate as 4T 20G 1E, i.e. 4 Tigers, 20 Goats and 1 empty spot. Any symmetry permutation of the square board can be obtained by some sequence of flips around the axes shown in Figure B.1. The shaded areas at right are used in the analysis of rotational symmetries.
Let S = {f : D → {T, G, E} | f assumes 4 values T, 20 values G, 1 value E}. Observation: the group G of permutations of the domain D induces permutations on the set S of functions. Definition: two functions f and g are equivalent iff there is a permutation P in G such that f = g ◦ P.

Burnside's Lemma: The number of equivalence classes of S under G is (1/|G|) · Σ_P I(P), where the sum is taken over all permutations P in G and I(P) is the number of functions in S left invariant by P.
• Identity permutation, '1'. Any board position remains invariant. There are 25 possibilities to place the empty spot, multiplied by C(24, 4) possibilities to place the 4 tigers, resulting in 265,650 board positions invariant under the identity permutation.
• Rotation by 90° or by 270°. The empty spot must be on the center point 13, and each of the 4 Tigers must be placed in "his own" 2 × 3 area (shaded in Figure B.1, top right), such that the 4 chosen spots are symmetric under rotation (e.g. on spots 2, 10, 24, 16). Since the location of 1 Tiger determines the places of the other 3 as well, there are 6 board positions invariant under rotations by ±90°.
• Rotation by 180°. The empty spot must be on the center point 13. 2 Tigers can be placed anywhere in the upper angular area consisting of 12 spots, shaded in Figure B.1, bottom right. The location of the other 2 Tigers is then determined by symmetry, resulting in C(12, 2) = 66 invariant board positions.
• Flipping around the vertical or horizontal axis through the center. The empty spot can be anywhere along the axis, i.e. in 5 different places. After the empty spot has been placed, three cases of Tiger placements must be distinguished:

(a) 4 Tigers located on the flipping axis: there is only 1 way to place them;

(b) 2 Tigers located on the flipping axis (C(4, 2) ways to place them), 1 Tiger in each half of the remaining board (10 ways to place the first Tiger; the other is determined by symmetry);

(c) 2 Tigers in each half of the remaining board (C(10, 2) ways to place the first 2; the other 2 are determined by symmetry).

This results in 5 × (1 + C(4, 2) × 10 + C(10, 2)) = 530 invariant board positions.
• Flipping around either diagonal axis through the center. This turns out to be the
same as flipping around the vertical or horizontal axis, resulting in 530 invariant
board positions.
# permutations   Type             Invariant positions
1                Identity         25 × C(24, 4) = 265,650
2                Rotations ±90°   6 each
1                Rotation 180°    C(12, 2) = 66
4                Flips            530 each
8                Total            267,848
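Burnside's count can be checked arithmetically (a direct transcription of the case analysis above):

```python
from math import comb

# invariant positions under each symmetry of the 5x5 board (4T 20G 1E case)
identity = 25 * comb(24, 4)                      # 265,650
rot90 = rot270 = 6
rot180 = comb(12, 2)                             # 66
flip = 5 * (1 + comb(4, 2) * 10 + comb(10, 2))   # 530, from cases (a)-(c)

total = identity + rot90 + rot270 + rot180 + 4 * flip
print(total, total // 8)  # 267848 and, by Burnside's Lemma, 33481 classes
```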
The calculations for 16 to 19 Goats are similar but more complicated, resulting in the following numbers of inequivalent board positions:

4T 20G 1E      33,481
4T 19G 2E     333,175
4T 18G 3E   2,105,695
4T 17G 4E   9,469,965
4T 16G 5E  32,188,170

In each case, the number of inequivalent board positions is roughly 1/8 of the number of distinct positions if symmetry is ignored.
Appendix C
Implementing Retrograde Analysis for
Tigers and Goats
The first step in retrograde analysis is to initialize all easily recognizable terminal posi-
tions. It is simple to identify terminal nodes as they can only be of three types:
• If there are 16 goats with at least 1 goat in immediate danger of being captured,
then Tiger to Move, Tiger wins.
After the initialization has set these terminal positions, the values of all remaining
positions are set to a draw. An iterative process can then determine the correct
value of every remaining position. Note that if a position is lost for the player to move,
all of its predecessors can be marked as wins for the opponent. Similarly, if a
position is won for the player to move, all preceding positions are "potential" losses for
the opponent. However, they are only true losses if all of their successors are also won
for the opponent.
C.1 Indexing Scheme
• Initialize the database by determining the number of successors of each position.
If the node is terminal, its value is computed and stored, and its status is set to
known. Otherwise, the number of successors is stored and its status is set to
unknown.
• Perform multiple passes through the database. During each pass, every node with
known status notifies all of its preceding positions of its value, and each
predecessor updates its score if the child's value improves it. The passes repeat
until no position's score changes during a complete pass.
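The two steps above can be sketched in Python over an explicit game graph. This is an illustrative reconstruction: the function name and the dictionary-based game representation are ours, and the actual implementation operates over the packed index-based database rather than dictionaries.

```python
def retrograde(successors, terminal):
    """Multi-pass retrograde analysis over an explicit game graph.

    successors: dict mapping each position to the positions reachable in one move
    terminal:   dict mapping terminal positions to 'WIN' or 'LOSS'
                (always from the point of view of the side to move)
    """
    # Initialization: terminal positions get their true value,
    # everything else starts as a draw.
    value = {n: terminal.get(n, 'DRAW') for n in successors}

    changed = True
    while changed:  # repeat passes until a complete pass changes no value
        changed = False
        for n, succs in successors.items():
            if n in terminal:
                continue
            # Successor values are from the opponent's point of view.
            if any(value[s] == 'LOSS' for s in succs):
                new = 'WIN'    # some move reaches a position lost for the opponent
            elif succs and all(value[s] == 'WIN' for s in succs):
                new = 'LOSS'   # every move reaches a position won for the opponent
            else:
                new = 'DRAW'
            if new != value[n]:
                value[n] = new
                changed = True
    return value
```

Note how the "potential loss" rule appears here: a position is only marked lost once every one of its successors is known to be won for the opponent; until then it keeps the default draw value.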
Note that with k similar pieces which are unlabeled, the k! arrangements of k similar,
labeled pieces on the board are equivalent. The index range can therefore be reduced
by a factor of k!. There are C(q, k) = q!/(k!(q − k)!) placements of k similar pieces
on q squares. Let the positions of the k similar pieces be {p1, p2, . . . , pk} where
p1 < p2 < . . . < pk and pi ∈ [0, q − 1]. A space-efficient indexing scheme is then
given by the algorithm of [Nalimov et al., 2000], shown in Pseudocode 10.
The tigers were selected as the pieces to be constrained to select a “canonical” board.
We define a canonical board by choosing the board with the lowest index of tigers af-
ter performing the 8 symmetric operations (identity, rotation by 90, 180 and 270 de-
grees, reflection on the x-axis, y-axis and the two diagonals) on the board. There are
C.1 Indexing Scheme 154
Pseudocode 10 index({p1, p2, . . . , pk})
 1: index ← 0
 2: while k > 0 do
 3:     while p1 ≠ 0 do
 4:         index ← index + C(q − 1, k − 1)
 5:         q ← q − 1
 6:         for i ← 1, 2, . . . , k do
 7:             pi ← pi − 1
 8:     k ← k − 1, q ← q − 1
 9:     for i ← 1, 2, . . . , k do
10:         pi ← pi+1 − 1
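Pseudocode 10 translates almost line for line into Python. The sketch below is ours (including the function name `index_of`) and uses `math.comb` for the binomial coefficients:

```python
from math import comb

def index_of(p, q):
    """Index of a sorted placement p[0] < ... < p[k-1] of k similar
    pieces on squares 0..q-1, following Pseudocode 10."""
    p = list(p)
    k = len(p)
    index = 0
    while k > 0:
        while p[0] != 0:
            # count past every placement that occupies the current square 0
            index += comb(q - 1, k - 1)
            q -= 1
            p = [x - 1 for x in p]
        # the smallest piece sits on square 0: drop it and renumber the rest
        p = [x - 1 for x in p[1:]]
        k -= 1
        q -= 1
    return index
```

The indices of all C(q, k) placements form exactly the range 0 .. C(q, k) − 1, which is what makes the scheme space-efficient.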
1,666 unique canonical forms for all possible tiger configurations. Note that there are
C(25, 4) = 12,650 possible tiger configurations. To reduce running time, the indices of
the canonical boards for each tiger configuration can be pre-computed and stored in an
array of size 12,650. Table C.1 shows the total index size for different numbers
of goats on the board. There are still some board positions which are symmetric to each
other within the indexing scheme. This occurs when the tigers are symmetric under a
certain operation (e.g. reflection): since all combinations of goats are generated,
there will be combinations of goats which are also symmetric under the same operation
but are assigned different indices.
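The figure of 1,666 canonical forms can be reproduced in a few lines of Python. The symmetry tables and helper below are our own sketch of the procedure described above, not the thesis's code:

```python
from itertools import combinations

# 5x5 board, spots numbered 0..24 row-major, so spot s is at
# row s // 5, column s % 5. The 8 symmetry operations of the square:
OPS = [
    lambda r, c: (r, c),          # identity
    lambda r, c: (c, 4 - r),      # rotation by 90 degrees
    lambda r, c: (4 - r, 4 - c),  # rotation by 180 degrees
    lambda r, c: (4 - c, r),      # rotation by 270 degrees
    lambda r, c: (r, 4 - c),      # reflection in the vertical axis
    lambda r, c: (4 - r, c),      # reflection in the horizontal axis
    lambda r, c: (c, r),          # reflection in the main diagonal
    lambda r, c: (4 - c, 4 - r),  # reflection in the anti-diagonal
]

# Pre-compute each operation as a permutation of the 25 spot numbers.
MAPS = []
for op in OPS:
    m = [0] * 25
    for s in range(25):
        nr, nc = op(s // 5, s % 5)
        m[s] = nr * 5 + nc
    MAPS.append(m)

def canonical(tigers):
    """Smallest image of a tiger placement under the 8 symmetry operations."""
    return min(tuple(sorted(m[t] for t in tigers)) for m in MAPS)

forms = {canonical(t) for t in combinations(range(25), 4)}
print(len(forms))  # 1666 canonical tiger configurations
```

Counting the distinct canonical forms over all C(25, 4) = 12,650 tiger placements gives 1,666, matching the text.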
During the construction of the endgame databases, the board needs to be accessed and
evaluated. However, if we store the board separately in the database, each board will
take approximately 50 bits, since the naive method uses 2 bits for each of the 25
board positions. This separate board representation would greatly increase the space
requirements of the endgame databases. The storage requirement can be eliminated by
introducing an inverse operator for the indexing scheme, which recovers the positions
of the pieces given the index, the number of pieces and the total number of squares.
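A possible inverse, sketched in Python. This is an illustrative reconstruction; the thesis's own inverse routine is not reproduced in this excerpt, and the function name is ours:

```python
from math import comb

def placement_of(index, k, q):
    """Inverse of the Pseudocode 10 indexing scheme: recover the k sorted
    piece positions on squares 0..q-1 from the index."""
    positions, square = [], 0   # 'square' tracks the current absolute square number
    while k > 0:
        # Undo the forward scheme's inner loop: while the index lies past the
        # block of placements that occupy the current square, skip that block.
        while index >= comb(q - 1, k - 1):
            index -= comb(q - 1, k - 1)
            q -= 1
            square += 1
        positions.append(square)  # the smallest remaining piece sits here
        square += 1
        k -= 1
        q -= 1
    return positions
```

Because the forward index is a bijection onto 0 .. C(q, k) − 1, this inverse lets the board be reconstructed on demand from the index alone, so no separate board representation needs to be stored.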
• The hardware used was an Apple PowerMac with dual 1.8 GHz PowerPC G5 processors
and 2.25 GB of RAM.
• All versions of CRAFTY and TOGA II used the default settings. No endgame
database was used.
• Pondering was turned off. CPU time was used during test suites and elapsed time
was used during matches.
Appendix E
Chess Openings
E.1 32 Openings from [Jiang and Buro, 2003]
[Two chess diagrams, rendered in a chess diagram font in the original, show the
positions reached after the following two opening lines.]

1 e4 c5 2 Nf3 Nc6 3 d4 cXd4 4 NXd4 Nf6 5 Nc3 d6 6 Bg5 e6 7 Qd2 Be7
8 O-O-O O-O 9 f4 NXd4 10 QXd4 Qa5

1 d4 Nf6 2 c4 c5 3 d5 e6 4 Nc3 eXd5 5 cXd5 d6 6 e4 g6 7 Nf3 Bg7 8 Be2
O-O 9 O-O Re8 10 Nd2 Na6
Bibliography
[Aichholzer et al., 2002] Aichholzer, O., Aurenhammer, F., and Werner, T. (2002). Al-
gorithmic fun - Abalone. Special Issue on Foundations of Information Processing of
TELEMATIK, 1:4–6.
[Akl and Newborn, 1977] Akl, S. G. and Newborn, M. M. (1977). The Principle Con-
tinuation and the Killer Heuristic. In ACM Annual Conference, pages 466–473.
[Allis, 1994] Allis, L. V. (1994). Searching for Solutions in Games and Artificial Intel-
ligence. PhD thesis, University of Limburg, Maastricht, The Netherlands.
[Allis et al., 1994] Allis, L. V., van der Meulen, M., and van den Herik, H. J. (1994).
Proof-Number Search. Artificial Intelligence, 66:91–124.
[Anantharaman et al., 1990] Anantharaman, T., Campbell, M. S., and Hsu, F. (1990).
Singular extensions: Adding selectivity to brute-force searching. Artificial Intelli-
gence, 43(1):99–109.
[Baudet, 1978] Baudet, G. M. (1978). The Design and Analysis of Algorithms for Asyn-
chronous Multiprocessors. PhD thesis, Carnegie Mellon University, Pittsburgh, PA.
[Baxter et al., 1998] Baxter, J., Tridgell, A., and Weaver, L. (1998). KnightCap: A
Chess Program That Learns by Combining TD(λ) with Game-Tree Search. In Fifteenth
International Conference on Machine Learning, pages 28–36.
[Beal, 1989] Beal, D. F. (1989). Experiments with the null move. In Advances in
Computer Chess 5, pages 65–79. Elsevier Science.
[Berlekamp et al., 2001] Berlekamp, E., Conway, J., and Guy, R. (2001). Winning Ways
for Your Mathematical Plays (4 volumes). A.K. Peters, 2nd edition.
[Birmingham and Kent, 1977] Birmingham, J. A. and Kent, P. (1977). Tree searching
and tree pruning techniques. Advances in Computer Chess 1, pages 89–107.
[Björnsson and Newborn, 1997] Björnsson, Y. and Newborn, M. (1997). Kasparov ver-
sus Deep Blue: Computer Chess Comes of Age. International Computer Chess As-
sociation Journal, 20(2):92–92.
[Björnsson and van den Herik, 2005] Björnsson, Y. and van den Herik, H. J. (2005).
The 13th World Computer-Chess Championship. International Computer Games As-
sociation Journal, 28(3).
[Bourzutschky et al., 2005] Bourzutschky, M., Tamplin, J., and Haworth, G. (2005).
Chess endgames: 6-man data and strategy. Theoretical Computer Science,
349(2):140–157.
[Buro, 1997a] Buro, M. (1997a). The Othello Match of the Year: Takeshi Murakami
vs. Logistello. International Computer Chess Association Journal, 20(3):189–193.
[Buro, 1997b] Buro, M. (1997b). The Othello Match of the Year: Takeshi Murakami
vs. Logistello. International Computer Chess Association Journal, 20(3):189–193.
[Buro, 1997c] Buro, M. (1997c). Toward opening book learning. Technical Report 2,
NECI.
[Buro, 1998] Buro, M. (1998). From simple features to sophisticated evaluation func-
tions. Technical Report 60, NECI.
[Buro, 1999] Buro, M. (1999). Experiments with Multi-ProbCut and a new high-quality
evaluation function for Othello. In van den Herik, H. J. and Iida, H., editors, Games
in AI Research. Institute for Knowledge and Agent Technology IKAT, Universiteit
Maastricht, Maastricht, The Netherlands.
[Chang, 2007] Chang, C. W. (2007). Searching game trees with high branching fac-
tor: Abalone. Undergraduate Research Opportunity Program (UROP) Project Report
(NUS).
[Chellapilla and Fogel, 1999] Chellapilla, K. and Fogel, D. (1999). Evolving Neural
Networks to Play Checkers without Expert Knowledge. IEEE Transactions on Neural
Networks, 10(6):1382–1391.
[Chellapilla and Fogel, 2001a] Chellapilla, K. and Fogel, D. (2001a). Evolving an Ex-
pert Checkers Playing Program without using Human Expertise. IEEE Transactions
on Evolutionary Computation, 5(5):422–428.
[Chellapilla and Fogel, 2001b] Chellapilla, K. and Fogel, D. (2001b). Evolving an Ex-
pert Checkers Playing Program without Using Human Expertise. IEEE Transactions
on Evolutionary Computation, 5(4):422–428.
[Chong et al., 2003] Chong, S. Y., Ku, D. C., Lim, H. S., Tan, M. K., and White, J. D.
(2003). Evolved Neural Networks Learning Othello Strategies. In Congress on Evo-
lutionary Computation (CEC’03), pages 2222–2229.
[Dayan et al., 2001] Dayan, P., Schraudolph, N. N., and Sejnowski, T. J. (2001). Learn-
ing to evaluate Go positions via temporal difference methods, pages 74–96. Springer
Verlag.
[Donninger, 1993] Donninger, C. (1993). Null move and deep search: Selective search
heuristics for obtuse Chess programs. International Computer Chess Association
Journal, 16(3):137–143.
[Fogel, 2002] Fogel, D. (2002). Blondie24: Playing at the Edge of AI. Academic Press,
London, UK.
[Gelly and Silver, 2007] Gelly, S. and Silver, D. (2007). Combining online and offline
knowledge in UCT. In Proceedings of ICML 2007.
[Gelly et al., 2006] Gelly, S., Wang, Y., Munos, R., and Teytaud, O. (2006). Modification
of UCT with patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA.
[Glover, 1989] Glover, F. (1989). Tabu Search - Part I. ORSA Journal of Computing,
1(3):190–206.
[Greenblatt et al., 1988] Greenblatt, R. D., Eastlake, D. E., and Crocker, S. D. (1988).
The Greenblatt Chess Program. In Computer Chess Compendium, pages 56–66.
Springer-Verlag.
[Heinz, 2000] Heinz, E. A. (2000). Scalable Search in Computer Chess. Vieweg Verlag.
[Hopp and Sanders, 1995] Hopp, H. and Sanders, P. (1995). Parallel game tree search
on SIMD machines. In Workshop on Parallel Algorithms for Irregularly Structured
Problems, pages 349–361.
[Iida et al., 2002] Iida, H., Sakuta, M., and Rollason, J. (2002). Computer Shogi. Arti-
ficial Intelligence, 134:121–144.
[Irving et al., 2000] Irving, G., Donkers, H. H. L. M., and Uiterwijk, J. W. H. M. (2000).
Solving Kalah. International Computer Games Association Journal, 23(3):139–148.
[Jiang and Buro, 2003] Jiang, A. and Buro, M. (2003). First Experimental Results of
ProbCut Applied to Chess. In van den Herik, H. J., Iida, H., and Heinz, E. A., editors,
Advances in Computer Games Conference 10, pages 19–31.
[Junghanns et al., 1997] Junghanns, A., Schaeffer, J., Brockington, M., Bjornsson, Y.,
and Marsland, T. (1997). Diminishing returns for additional search in Chess. In
van den Herik, J. and Uiterwijk, J., editors, Advances in Computer Chess 8, pages
53–67. University of Limburg.
[Kishimoto and Müller, 2004] Kishimoto, A. and Müller, M. (2004). A general solution
to the graph history interaction problem. In Proceedings of the Nineteenth National
Conference on Artificial Intelligence (AAAI-04), pages 644–649.
[Knuth and Moore, 1975] Knuth, D. E. and Moore, R. W. (1975). An analysis of Alpha-
Beta pruning. Artificial Intelligence, 6:293–326.
[Kocsis, 2003] Kocsis, L. (2003). Learning Search Decisions. PhD thesis, Universiteit
Maastricht, IKAT/Computer Science Department.
[Kocsis and Szepesvári, 2006] Kocsis, L. and Szepesvári, C. (2006). Bandit Based
Monte-Carlo Planning. In ECML 2006, pages 282–293.
[Levy and Newborn, 1991] Levy, D. and Newborn, M. (1991). How Computers Play
Chess. Computer Science Press.
[Levy et al., 1989] Levy, D. N. L., Broughton, D. C., and Taylor, M. (1989). The SEX
algorithm in computer Chess. International Computer Chess Association Journal,
12(1):10–21.
[Lim and Nievergelt, 2004] Lim, Y. and Nievergelt, J. (2004). Computing Tigers and
Goats. International Computer Games Association Journal, 27(3):131–141.
[Lim and Lee, 2006a] Lim, Y. J. and Lee, W. S. (2006a). Properties of Forward Prun-
ing in Game-Tree Search. In Proceedings of Twenty-First National Conference on
Artificial Intelligence (AAAI-06).
[Lim and Lee, 2006b] Lim, Y. J. and Lee, W. S. (2006b). RankCut – A Domain Inde-
pendent Forward Pruning Method for Games. In Proceedings of Twenty-First Na-
tional Conference on Artificial Intelligence (AAAI-06).
[Lincke, 2002] Lincke, T. R. (2002). Exploring the Computational Limits of Large Ex-
haustive Search Problems. PhD thesis, ETH Zurich, Switzerland.
[Lustrek et al., 2005] Lustrek, M., Gams, M., and Bratko, I. (2005). Why Minimax
Works: An Alternative Explanation. In IJCAI, pages 212–217.
[Nalimov et al., 2000] Nalimov, E. V., Haworth, G. M., and Heinz, E. A. (2000). Space-
efficient indexing of Chess endgame tables. International Computer Games Associ-
ation Journal, 23(3):148–162.
[Nau, 1979] Nau, D. S. (1979). Quality of decision versus depth of search on game-
trees. PhD thesis, Duke University.
[Pearl, 1984] Pearl, J. (1984). Heuristics – Intelligent Search Strategies for Computer
Problem Solving. Addison-Wesley Publishing Co., Reading, MA.
[Plaat, 1996] Plaat, A. (1996). Research Re: search & Re-search. PhD thesis, Erasmus
University Rotterdam, Rotterdam, Netherlands.
[Plaat et al., 1996] Plaat, A., Schaeffer, J., Pijls, W., and de Bruin, A. (1996). Exploiting
Graph Properties of Game Trees. In 13th National Conference on Artificial Intelli-
gence, volume 1, pages 234–239.
[Rijswijck, 2000] Rijswijck, J. V. (2000). Are Bees better than Fruitflies? (Experiments
with a Hex playing program). AI’00: Advances in Artificial Intelligence, 13th bien-
nial Canadian Society for Computational Studies of Intelligence (CSCI) Conference,
pages 13–25.
[Romein and Bal, 2003] Romein, J. and Bal, H. (2003). Solving the Game of Awari
using Parallel Retrograde Analysis. IEEE Computer, 36(10):26–33.
[Rosin and Belew, 1997] Rosin, C. D. and Belew, R. K. (1997). New methods for com-
petitive coevolution. Evolutionary Computation, 5(1):1–29.
[Sadikov et al., 2005] Sadikov, A., Bratko, I., and Kononenko, I. (2005). Bias and
Pathology in Minimax Search. Theoretical Computer Science, 349.
[Schaeffer, 1989] Schaeffer, J. (1989). The History Heuristic and Alpha-Beta Search
Enhancements in Practice. IEEE Transactions on Pattern Analysis and Machine In-
telligence, PAMI-11(11):1203–1212.
[Schaeffer et al., 2005] Schaeffer, J., Björnsson, Y., Burch, N., Kishimoto, A., Müller,
M., Lake, R., Lu, P., and Sutphen, S. (2005). Solving Checkers. IJCAI-05, pages
292–297.
[Schaeffer et al., 2007] Schaeffer, J., Burch, N., Björnsson, Y., Kishimoto, A., Müller,
M., Lake, R., Lu, P., and Sutphen, S. (2007). Checkers is solved. Science. To appear.
[Schaeffer et al., 1992] Schaeffer, J., Culberson, J., Treloar, N., Knight, B., Lu, P., and
Szafron, D. (1992). A World Championship Caliber Checkers Program. Artificial
Intelligence, 53(2–3):273–290.
[Seo et al., 2001] Seo, M., Iida, H., and Uiterwijk, J. W. H. M. (2001). The PN*-search
algorithm: Application to Tsume-Shogi. Artificial Intelligence, 129(1–2):253–277.
[Slagle and Dixon, 1969] Slagle, J. H. and Dixon, J. K. (1969). Experiments with some
programs that search game trees. Journal of the ACM, 16(2):189–207.
[Slate and Atkin, 1977] Slate, D. J. and Atkin, L. R. (1977). Chess 4.5 - the Northwest-
ern University Chess Program. In Frey, P., editor, Chess Skill in Man and Machine,
pages 82–118. Springer-Verlag.
[Smith and Nau, 1994] Smith, S. J. J. and Nau, D. S. (1994). An analysis of forward
pruning. In Proceedings of 12th National Conference on Artificial Intelligence (AAAI-
94), pages 1386–1391.
[Uiterwijk and van den Herik, 2000] Uiterwijk, J. W. H. M. and van den Herik, H. J.
(2000). The advantage of the initiative. Information Sciences, 122(1):43–58.
[van den Herik et al., 2002] van den Herik, H. J., Uiterwijk, J. W. H. M., and van Ri-
jswijck, J. (2002). Games Solved: Now and in the Future. Artificial Intelligence,
134(1–2):277–311.
[Wágner and Virág, 2001] Wágner, J. and Virág, I. (2001). Solving Renju. Interna-
tional Computer Games Association Journal, 24(1):30–34.
[Wu and Huang, 2005] Wu, I.-C. and Huang, D.-Y. (2005). A New Family of k-in-a-
row Games. In The 11th Advances in Computer Games Conference (ACG’11), Taipei,
Taiwan.
[Wu and Beal, 2002] Wu, R. and Beal, D. (2002). A Memory Efficient Retrograde Al-
gorithm and Its Application To Chinese Chess Endgames. In More Games of No
Chance, volume 42, pages 213–227. MSRI Publications.
[Zobrist, 1969] Zobrist, A. L. (1969). A Hashing Method with Applications for Game
Playing. Technical Report Tech. Rep. 88, Computer Sciences Department, University
of Wisconsin.