Reimagining Chess with AlphaZero

contributed articles
DOI:10.1145/3460349
COMMUNICATIONS OF THE ACM | FEBRUARY 2022 | VOL. 65 | NO. 2 | pp. 60–66

AI is driving the next evolution of chess, giving players a glimpse into the game’s future.

BY NENAD TOMAŠEV, ULRICH PAQUET, DEMIS HASSABIS, AND VLADIMIR KRAMNIK

key insights
- Using neural networks together with advanced reinforcement learning algorithms can surpass human knowledge of chess and allow us to reimagine the game as we know it.
- We use AlphaZero, a system that learns to play chess from scratch, to explore what variations of chess would look like at superhuman level.
- Atomic changes to the rules of classical chess alter game balance, with the material value of the pieces being altered accordingly, resulting in either more draws or more decisive outcomes. The findings of our quantitative and qualitative analysis demonstrate the rich possibilities that lie beyond the rules of modern chess.

MODERN CHESS IS the culmination of centuries of experience, as well as an evolutionary sequence of rule adjustments from its inception in the 6th century to the modern rules we know today.17 While classical chess still captivates the minds of millions of players worldwide, the game is anything but static. Many variants have been proposed and played over the years by enthusiasts and theorists.8,20 They continue the evolutionary cycle by altering the board, piece placement, or the rules, offering players “something subtle, sparkling, or amusing which cannot be done in ordinary chess.”1

Technological progress is the new driver of the evolutionary cycle. Chess engines increase in strength, and players have access to millions of computer games and volumes of opening theory. Consequently, the number of decisive games in super-tournaments has declined, and it takes longer for players to move from home preparation to playing original moves on the board.14 While classical chess remains a fascinating game and is unlikely to ever fall out of fashion, alternative variants provide an avenue for more creative play. In Fischer random chess, the brainchild of former world champion Bobby Fischer, the initial position is randomized to counter the dominance of opening preparation in a game.7 One could consider not only entirely new ideas, but also reassess some of the newer additions to the game. For example, the “castling” move was only introduced in its current form in the 17th century. What would chess have been like had castling not been incorporated into the rules? Without recourse to repeating history, we reimagine chess and address such questions in silico with AlphaZero.25

AlphaZero is a system that can learn superhuman chess strategies from scratch without any human supervision.19,22 It represents a milestone in artificial intelligence (AI), a field that has ventured down the corridors of chess more than once in search of challenges and inspiration. Throughout the history of computer chess, the focus was on creating systems that could spar with top human players over the board.3 Computer chess has progressed steadily since the 1950s, with better-tuned evaluation functions and enhanced search algorithms deployed on increasingly more computational
resources.2,3,9,13,18,24 Alan Turing already envisioned more in 1953 by asking, “Could one make a machine to play chess, and to improve its play, game by game, profiting from its experience?”27 Unlike its predecessors, AlphaZero learns its policy from scratch from repeated self-play games, answering the second part of Turing’s question. The result is a unique approach to playing classical chess22 and a new era in the development of chess engines, as spearheaded by Leela Chess Zero.15

AlphaZero’s ability to continually improve its understanding of the game, and reach superhuman playing strength in classical chess and Go,25 lends itself to the question of assessing chess variants and potential variants of other board games in the future. Provided only with the implementation of the rules, it is possible to effectively simulate decades of human experience in a day, opening a window into top-level play of each variant. In doing so, computer chess completes the circle, from the early days of pitting man vs. machine to a collaborative present of man with machine, where AI can empower players to explore what chess is and what it could become.11

All illustrations by Peter Crowther Associates.

Rule Alterations
There are many ways in which the rules of chess could be altered. In this work, we limit ourselves to atomic changes that do not involve changes to the starting position and keep the game as close as possible to classical chess. Some of the alterations that we consider are novel, while others have been previously discussed within the chess community but have yet to be widely adopted. The nine changes considered in this study are listed in Table 1. No-castling and No-castling (10) involve a full and partial restriction on the castling rule. Pawn-one-square, Semi-torpedo, Torpedo, Pawn-back, and Pawn-sideways involve changes to pawn mobility. Self-capture chess allows players to also capture their own pieces. Finally, Stalemate=win recasts stalemate as a win for the attacking side, rather than a draw. As such, it is particularly aimed at increasing the decisiveness of the game, by removing certain defensive patterns. Self-capture is sometimes referred to as “Reform Chess” or “Free Capture Chess,” while Pawn-back is called “Wren’s Game” by Pritchard.20 Figure 1 illustrates positions from AlphaZero games for three of these variants.

AlphaZero
AlphaZero is an adaptive learning system that improves through many rounds of self-play.25 It consists of a deep neural network $f_\theta$ with weights $\theta$ that computes $(p, v) = f_\theta(s)$ for a given position or state $s$. The network outputs a vector of move probabilities $p$ with elements $p(s'|s)$ as prior probabilities for considering each move and, hence, each next state $s'$.ᵃ If we denote a game’s outcome numerically by +1 for a win, 0 for a draw, and −1 for a loss, the network additionally outputs a scalar estimate $v \in (-1, 1)$ of the expected outcome of the game from position $s$.

ᵃ We have suppressed notation somewhat; the probabilities are technically over actions or moves $a$ in state $s$, but as each action $a$ deterministically leads to a separate next position $s'$, we use the concise $p(s'|s)$ in this paper.

Move selection is done by Monte Carlo tree search (MCTS), which runs repeated search simulations of how the game might unfold up to a preset maximum ply depth. In one MCTS simulation, $f_\theta$ is recursively applied to a sequence of positions until a maximum-depth leaf node is reached. The sequence of moves in
a simulation depends on an action selection criterion that is applied at each node along the path; this criterion is a version of the PUCT algorithm21 that trades off exploration against revisiting more promising moves more frequently over consecutive simulations. The action selection criterion is such that before a state is first encountered in a simulation, its resulting prior vector $p$ assigns weights to candidate moves at a “first glance” of the board. When a leaf node is reached, its position’s evaluation $v$ is “backed up” to the root, with each node along the path incrementing its visit count and including the leaf’s $v$ in its action-value estimate. After a number of such MCTS simulations, the root move that was visited most is played.

Training $f_\theta(s)$ is done via gradient descent steps to let $p$ and $v$ predict the next move and final game outcome from $s$ as closely as possible for a streaming sample of game positions. The sample is continually refreshed with self-play games that are generated using $f_\theta$ in MCTS as $f_\theta$ is updated. The result is a feedback loop that produces a streaming sample of games of increasing quality.

The starting point for our experimental setup is the input state representation, neural network architecture, MCTS depth and action selection criterion, AlphaZero training configuration, and hardware in Silver et al.25 As the rule changes in Table 1 are atomic, we assume that a quality of play comparable to AlphaZero’s classical chess games22 would be reached for each of the variants under the same conditions. If the experimental setup is kept constant except for altering the legal moves list, we furthermore assume that differences in game outcomes would be informative of their relative decisiveness. For each variant in Table 1, we trained a neural network from a random initialization for 1 million gradient descent steps using the configuration in Silver et al.25

Self-Play Games for Each Chess Variant
For each chess variant, we used the resulting AlphaZero model to generate a diverse set of 10,000 self-play games at 1 sec per move and 1,000 self-play games at 1 min per move. In the absence of external stochasticity, each variant’s self-play games would be identical under the same time controls. To generate self-play games for analysis, we promoted diversity by sampling the first 20 plies in each game proportional to the MCTS visit counts for each move. Game outcomes are presented in Figure 2. A selection of the self-play games is annotated and presented in our technical report, together with the qualitative assessments by Vladimir Kramnik.26

Across all variants, the percentage of drawn games increases with longer calculation times, suggesting that the variants may be theoretically drawn, as is believed to be the case for classical chess. However, when the results are compared at either 1 sec or 1 min per move, four variants consistently stand out with games that are more decisive than classical chess: Torpedo, Semi-torpedo, No-castling, and Stalemate = win. Conditioned on the games in Figure 2, the posterior probability that each of these four variants would yield a lower draw rate than classical chess is at least 99.9% at 1 sec calculation and 87% at 1 min calculation (see Tomašev et al.26 for full analysis). Simply put, some variants may be harder to play, involving more calculation and richer patterns. Figure 2 adds data to a long-standing debate about whether letting stalemate count as a win would make top-level chess substantially more decisive.16 At 1 sec per move, there are 2.4% fewer Stalemate=win draws than under classical rules; at 1 min per move, this reduces to 0.8% fewer draws.

Utilization of Special Moves
The chess variants’ special moves play an important role in the arising dynamics of play, as evidenced by their frequency of use in the 1-min-per-move self-play games discussed in the previous section. The variants’ median game lengths ranged from 62 to 76 moves, while the median game under classical rules was 68 moves long. This suggests that none of the special moves would radically impact the time that a player spends at the board.

At least one torpedo move appeared in 83% of all Semi-torpedo games and 94% of Torpedo games. A pawn promotion with a torpedo move occurred in 21% of Torpedo games, highlighting the speed at which a passed pawn can be promoted to a queen. Backwards

Table 1. A list of considered alterations to the rules of chess.

Variant            Rule change
No-castling        Castling is disallowed.
No-castling (10)   Castling is disallowed for the first 10 moves (20 plies).
Pawn one square    Pawns can only move by one square.
Stalemate = win    Forcing stalemate is a win rather than a draw.
Torpedo            Pawns can move by one or two squares anywhere on the board and can be captured en passant after any two-square advance.
Semi-torpedo       Pawns can move by two squares both from the second and the third ranks, and they can be captured en passant after any two-square advance.
Pawn-back          Pawns can move backwards by one square but only back to the second/seventh rank for White/Black. Pawn moves do not count toward the 50-move rule.
Pawn-sideways      Pawns can also move sideways by one square. Sideways pawn moves do not count toward the 50-move rule.
Self-capture       It is possible to capture one’s own pieces.
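The pawn-push rows of Table 1 can be read as a small rule table. The sketch below is only an illustration of that reading, not the rules implementation used in the study: the function name `forward_push_distances` is hypothetical, ranks are counted from the moving side's own perspective, blocking pieces and captures are ignored, and every variant not named explicitly is assumed to keep the classical forward-push rule.

```python
def forward_push_distances(variant, rank):
    """Return the forward push distances (in squares) a pawn may attempt.

    `rank` is counted from the moving side's point of view, so pawns start
    on rank 2 and promote on rank 8. Blocking pieces are ignored.
    """
    if not 2 <= rank <= 7:
        raise ValueError("a pawn can only stand on ranks 2 to 7")

    if variant == "Pawn one square":
        max_push = 1                           # never two squares
    elif variant == "Torpedo":
        max_push = 2                           # two squares from anywhere
    elif variant == "Semi-torpedo":
        max_push = 2 if rank in (2, 3) else 1  # second and third ranks
    else:
        # Assumption: all other variants keep the classical rule.
        max_push = 2 if rank == 2 else 1

    # A push may not leave the board.
    return [d for d in (1, 2) if d <= max_push and rank + d <= 8]
```

Under this reading, a Torpedo pawn on its sixth rank may advance two squares and promote, which is the b6-b8=Q pattern mentioned in the Figure 1 caption.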
Figure 1. Examples by AlphaZero of three of the nine chess variants analyzed in this article. In Torpedo chess (left), White generates rapid
counterplay with a torpedo move (b4-b6). Rh1 is followed by yet another torpedo move, b6-b8=Q. In Pawn-sideways chess (center), Black
plays a tactical sideways pawn move (f7-e7) after sacrificing a knight on f2 in the previous move, opening the f-file toward the White king.
In Self-capture chess (right), White’s self-capture move (Rxh4) generates threats against the Black king.
Figure 2. AlphaZero self-play game outcomes: for 10,000 games played at 1 sec per move (left) and for 1,000 games played at 1 min per move (right).

                   Game outcomes at 1 second per move    Game outcomes at 1 minute per move
Variant            White wins    Draw    Black wins      White wins    Draw    Black wins
Classical                 772    8820           409              18     979             3
No-castling              1110    8441           449              29     968             3
No-castling (10)          604    9002           394              11     985             4
Pawn one square           709    8891           400               9     988             3
Stalemate=win            1000    8606           394              25     971             4
Torpedo                  2086    7191           723              93     894            13
Semi-torpedo             1306    8103           591              27     964             6
Pawn-back                 532    9160           308               6     990             4
Pawn-sideways             872    8815           313              15     980             5
Self-capture              871    8783           346              17     981             2
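The article reports posterior probabilities that a variant has a lower draw rate than classical chess, conditioned on the games in Figure 2. The sketch below shows one way such a number could be computed; it is not the authors' analysis (see their technical report for that). It assumes, as labeled in the comments, independent games, a uniform Beta(1, 1) prior on each draw rate, and a simple Monte Carlo comparison of the two posteriors. The function names are hypothetical, and the counts are transcribed from Figure 2.

```python
import random

# (White wins, draws, Black wins), transcribed from Figure 2.
OUTCOMES_1_SEC = {
    "Classical": (772, 8820, 409),
    "No-castling": (1110, 8441, 449),
    "Stalemate=win": (1000, 8606, 394),
    "Torpedo": (2086, 7191, 723),
    "Semi-torpedo": (1306, 8103, 591),
}
OUTCOMES_1_MIN = {
    "Classical": (18, 979, 3),
    "No-castling": (29, 968, 3),
    "Stalemate=win": (25, 971, 4),
    "Torpedo": (93, 894, 13),
    "Semi-torpedo": (27, 964, 6),
}


def draw_rate(outcome):
    """Empirical fraction of drawn games."""
    white, draws, black = outcome
    return draws / (white + draws + black)


def prob_lower_draw_rate(variant, baseline, n_samples=20_000, seed=0):
    """Monte Carlo estimate of P(variant draw rate < baseline draw rate).

    Assumption: each draw rate has a uniform prior, so its posterior is
    Beta(1 + draws, 1 + decisive games).
    """
    rng = random.Random(seed)

    def sample(outcome):
        white, draws, black = outcome
        return rng.betavariate(1 + draws, 1 + white + black)

    hits = sum(sample(variant) < sample(baseline) for _ in range(n_samples))
    return hits / n_samples
```

For example, `prob_lower_draw_rate(OUTCOMES_1_MIN["Stalemate=win"], OUTCOMES_1_MIN["Classical"])` compares the two 1-min-per-move draw rates; under these assumptions the result should be broadly in line with the figures quoted in the text, though the exact value depends on the prior and on sampling noise.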
pawn moves occurred in 97% of Pawn-back games, while all Pawn-sideways games involved a sideways pawn move. A startling 12% of all moves in Pawn-sideways chess were sideways moves, demonstrating a high utilization of the newly introduced move. In Stalemate=win chess, the percentage of all decisive games that were won by stalemate rather than mate was 31%, although this number includes conclusions from endings such as K+Q vs. K, where either conclusion is a valid win. In Self-capture chess, 42% of games featured self-capture moves, most commonly involving pawns (95%), while bishop (3%), knight (1%), and rook self-captures (1%) were rarer.

Diversity of Choices in Top-Level Play
Rule perturbations should not leave top-level players with only a few forcing lines of play and so diminish the richness of classical chess. Generally, if there are $M(s_t)$ legal moves in position or state $s_t$ at ply $t$, then the number of candidate moves $m(s_t)$, the number that a top player would realistically consider, is much smaller than $M(s_t)$. de Groot6 called $M(s_t)$ a player’s legal freedom of choice and $m(s_t)$ an objective freedom of choice. Iida et al.10 hypothesize that $m(s_t) \approx \sqrt{M(s_t)}$ on average, and we show that similar relationships hold across variants.

Choices in a single position. We estimate the diversity of choice in a single position $s_t$ via the entropy of the AlphaZero prior. The prior is a weighted list of possible moves to $s_{t+1}$ from $s_t$ that are used in AlphaZero’s MCTS search; it specifies candidates for consideration before MCTS calculation. The average information content, or entropy, $H(s_t) = -\sum_{s_{t+1}} p(s_{t+1}|s_t) \log p(s_{t+1}|s_t)$, represents the information content in the reasonable choices available in $s_t$. A more naturally interpretable number is the average number of candidate moves in the position, which we define as $m(s_t) = \exp(H(s_t))$. It is the number of uniformly weighted moves that could be encoded in the same number of nats as $p(s_{t+1}|s_t)$.ᵇ These two quantities will be used to construct footprints for opening tree diversity.

ᵇ As an illustrative example, if the number of candidate moves is $m(s_t) = 3$ for some $p(s_{t+1}|s_t)$ that might put nonzero mass on all of its moves, then $m(s_t)$ is also equal to the number of candidate moves of a probability vector that puts equal nonzero mass on only three moves.

Opening tree diversity. We measure the diversity of choice with the entropy of the first $T$ plies of play from each variant’s prior $p$ from $f_\theta(s)$. If $s = [s_1, s_2, \ldots, s_T]$ represents the sequence of states after $T$ plies, the prior probability of $s$ is $p(s) = \prod_{t=1}^{T} p(s_t|s_{t-1})$, with $s_0$ the position from which the sequence starts. The entropy in a sequence of $T$ moves is hence $H(T) = -\sum_{s} p(s) \log p(s) = \mathbb{E}_{s \sim p(s)}[-\log p(s)]$. An entropy $H(T) = 0$ implies that, according to the prior, one and only one reasonable opening line could be considered by White and Black up to depth $T$, with all deviations from that line leading to substantially worse positions for the deviating side. A higher $H(T)$ implies that we would a priori expect a wider opening tree of variations, and consequently a more diverse set of middlegame positions. $H(T)$ contains an average over an exponential number of move sequences, which we approximate with a Monte Carlo estimate. Table 2 shows the estimated entropy of 20-ply opening trees for each variant. Pawn-one-square chess, one of the most drawish variants, gives its players substantially more playable candidate moves, despite having fewer legal moves than any other variant.

Table 2. The entropy (in nats) of the first 20 plies of the AlphaZero prior, with an estimate of the number of lines in the equivalent opening book.

Variant            Entropy    20-ply games
No-castling          27.65    1.02 × 10^12
Torpedo              27.89    1.30 × 10^12
Self-capture         27.94    1.36 × 10^12
No-castling (10)     27.97    1.40 × 10^12
Classical            28.58    2.58 × 10^12
Stalemate = win      29.01    3.97 × 10^12
Semi-torpedo         31.63    5.45 × 10^13
Pawn-back            32.30    1.07 × 10^14
Pawn-sideways        34.16    6.85 × 10^14
Pawn one square      38.95    8.24 × 10^16
Uniform random       64.96    1.63 × 10^28

Figure 3. Statistical footprints of the diversity of responses to 1. e4 and 1. Nf3 in Classical and No-castling chess, as well as the average number of candidate moves available for White and Black at each ply.
[Top row: histograms (density) of $-\log p(s)$ when $s \sim p(s)$ at ply depth 20, for Classical (e4), Classical (Nf3), No-castling (e4), and No-castling (Nf3); annotated with the line 1... e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5 Nxe5 7. Bf1 Be7 8. Rxe5 O-O 9. d4 Bf6 10. Re1 Re8 11. c3. Bottom row: average number of candidate moves against ply $t$, for White (w) and Black (b) after 1. e4 and 1. Nf3 in Classical and No-castling chess.]
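The entropy-based quantities used in this section can be made concrete with a short sketch. It computes the entropy of a single prior in nats, the corresponding number of candidate moves $m = \exp(H)$, and a Monte Carlo estimate of $H(T) = \mathbb{E}_{s \sim p(s)}[-\log p(s)]$ obtained by sampling move sequences. The callable `prior_fn` is a hypothetical stand-in for the network prior (it maps a state to a dictionary of next states and their probabilities), and `toy_prior` is an invented example, not AlphaZero output.

```python
import math
import random


def entropy_nats(prior):
    """Shannon entropy H = -sum p log p of a probability vector, in nats."""
    return -sum(p * math.log(p) for p in prior if p > 0.0)


def candidate_moves(prior):
    """Effective number of candidate moves, m = exp(H)."""
    return math.exp(entropy_nats(prior))


def estimate_opening_entropy(prior_fn, start, plies, n_samples, seed=0):
    """Monte Carlo estimate of H(T): the average of -log p(s) over sampled lines.

    prior_fn(state) must return {next_state: probability}; it is a
    hypothetical placeholder for the prior produced by the network.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        state, neg_log_p = start, 0.0
        for _ in range(plies):
            moves = prior_fn(state)
            next_states = list(moves)
            weights = [moves[s] for s in next_states]
            state = rng.choices(next_states, weights=weights, k=1)[0]
            neg_log_p -= math.log(moves[state])
        total += neg_log_p
    return total / n_samples


def toy_prior(state):
    """Invented prior: the same three weighted moves in every position."""
    return {state + (move,): p for move, p in (("a", 0.5), ("b", 0.25), ("c", 0.25))}
```

A uniform prior over three moves gives `candidate_moves([1/3, 1/3, 1/3])` equal to 3 up to rounding, matching the footnote's illustrative example. Exponentiating a sequence entropy gives an opening-book size in the sense of Table 2: `math.exp(28.58)` is roughly 2.58 × 10^12, in agreement with the classical row.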

Shannon’s source coding theorem states that we can compress a game sampled from $p(s)$ into just over $H(T)$ nats. This is equivalent to assigning a unique number to each of $\exp(H(T))$ games, and we take this as the size of a variant’s opening book: the number of plausible $T$-ply games in a variant. The “uniform random” policy in Table 2 plays all legal moves in classical chess with equal probability. Its entropy is more than double that of classical chess; conversely, like the hypothesis by Iida et al.,10 the classical opening book is a little smaller than the square root of the number of all legal openings.

Classical vs. no-castling chess. Both White and Black castle in most classical chess openings, and removing castling as an option profoundly changes the characteristics of the game.14 In this section, we explore the changes using one opening, the Berlin Defence, as an example. One useful tool is the decomposition of the statistical expectation that defines the entropy $H(T)$, which can help identify the existence (or not) of defensive lines that equalize the game in an almost forced way. The top row in Figure 3 shows histograms of $-\log p(s)$ when $s$ is generated from the AlphaZero prior after 1. e4 and 1. Nf3. The histograms provide a footprint of opening diversity.

Berlin Defence. In classical chess, one equalizing defensive resource for Black in the Ruy Lopez (1. e4 e5 2. Nf3 Nc6 3. Bb5) is the Berlin Defence, starting with 3… Nf6. Vladimir Kramnik successfully used it as a defensive resource for Black during his world championship match against Garry Kasparov in 2000. Prior to the match, chess engines of the time evaluated the Berlin endgame at around +1 in White’s favor, but today it is considered to be very solid, with modern engines assessing most arising positions as being equal.26 In Figure 3 (top left), samples of $s$ that contribute to the high probability spike after 1. e4 correspond to AlphaZero’s strong preference for the Berlin Defence in classical chess. In the footprint of lines after 1. Nf3, the spike disappears, indicating a wider range of possibilities for both sides. White’s main response to 3… Nf6 is 4. O-O. Without the option to castle, the Berlin Defence and many other lines either disappear or become less prominent, so that 1. e4 yields a similar variety of preferred lines of play as 1. Nf3.

Average number of candidate moves. The entropy of a chess variant’s prior opening tree is an unwieldy number that does not immediately inform us how many move options are in each variant. Instead, Figure 3 (bottom row) also shows the average number of candidate moves at ply $T$, $M(T) = \sum_{s} p(s)\, m(s_T) = \mathbb{E}_{s \sim p(s)}[m(s_T)]$, for a progression of plies $T = 2, 3, \ldots$ after 1. e4 and 1. Nf3. Notably, Black in particular is afforded far fewer options on average after 1. e4 in classical chess than in No-castling chess. When castling is removed as a legal move, we can expect both White and Black to have at least one more plausible move at their disposal for each of the first 20 plies.

Piece Value
Material plays an important role in chess and is often used to assess whether a particular sequence of piece exchanges and captures is favorable. Material sacrifices in chess are made either for concrete tactical reasons (for example, mating attacks) or as a trade-off for long-term positional strength. Understanding the material value of pieces in chess helps players master the game and is one of the very first pieces of chess knowledge taught to beginners.

Table 3. Estimated piece values from AlphaZero self-play games for each variant.

Variant            P    N      B      R      Q
Classical          1    3.05   3.33   5.63   9.5
No-castling        1    2.97   3.13   5.02   9.49
No-castling (10)   1    3.14   3.40   5.37   9.85
Pawn one square    1    2.95   3.14   5.36   9.62
Stalemate = win    1    2.95   3.13   4.76   8.96
Self-capture       1    3.10   3.22   5.34   9.42
Pawn-back          1    2.65   2.85   4.67   9.39
Semi-torpedo       1    2.72   2.95   4.69   8.3
Torpedo            1    2.25   2.46   3.58   7.12
Pawn-sideways      1    1.8    1.98   2.99   5.92

We approximate piece values via the weights of a linear model trained to predict game outcome from differences in piece counts for any given position, given as $d = [1, d_p, d_N, d_B, d_R, d_Q]$. We fit the model on fast-play AlphaZero games for each of the chess variants. We define $g_w$ with weights $w \in \mathbb{R}^6$ as $g_w(s) = \tanh(w^\top d)$. In the linear model, the weights $w$ indicate relative
piece importance. If $(s, z) \sim \text{games}$ represents a sample of a position and final game outcome from a variant’s self-play games, we minimize $\ell(w) = \mathbb{E}_{(s,z) \sim \text{games}}[(z - g_w(s))^2]$ empirically over $w$ and normalize the weights $w$ by $w_p$ to yield the relative piece values. The recovered piece values for each of the chess variants are given in Table 3.

Looking at the piece value estimates for classical chess, the method approximately recovers known material values4,12 and identifies bishops as more valuable than knights. Estimates of piece values in No-castling, No-castling (10), Pawn-one-square, Self-capture, and Stalemate = win variants look fairly similar, which is not surprising given the minor differences in piece mobility compared to the other variants. Variants involving an increase in pawn mobility result in lower relative values for other pieces, as can be seen in Pawn-back, Semi-torpedo, Torpedo, and Pawn-sideways. In Pawn-sideways chess, AlphaZero often sees the trade of a knight or bishop for two pawns as favorable, in accordance with this approximation. Such an exchange would normally be considered bad in classical chess. Material values may vary across different game stages and position types, and hence, the values in Table 3 are merely meant to help new players make sense of tactical exchanges in these chess variants.

From Reimagining to Reality
The combination of human curiosity and a powerful reinforcement learning system allowed us to reimagine what chess would have looked like if history had taken a slightly different course. When the statistical properties of top-level AlphaZero games are compared to classical chess, a number of more decisive variants appear, without impacting the diversity of plausible options available to a player. Aside from a mathematical evaluation, the actual games could be viewed through an aesthetic lens; this was done in Tomašev et al.26 and in many online discussion forums.14

Taken together, the statistical properties and aesthetics provide evidence that some variants would lead to games that are at least as engaging as classical chess. Variants such as Torpedo, Pawn-sideways, No-castling, and Self-capture are now a reality, playable on a major chess portal such as chess.com.5 On the back of the initial evidence, the first No-castling tournament was held in Chennai in January 2020.23 Chess’s role in artificial intelligence research is far from over. This article is itself the result of a collaboration of man with machine, showing how AI can provide the evidence to take reimagining to reality.

Looking beyond chess, this article’s contribution hinged on being able to learn a policy for an agent in an environment with known dynamics and then exploring changes in the environment to measure different emergent properties of agent behavior.28 We believe that a similar approach could be used for auto-adjusting game mechanics in other types of games, including computer games, in cases where a sufficiently strong reinforcement learning system is available.

References
1. Beasly, J. What can we expect from a new chess variant? Variant Chess 4, 29 (1998), 2.
2. Berliner, H. 1st U.S. Computer Chess Championship. Chess Live (November 1970), 638.
3. Campbell, M., Hoane Jr., J., and Hsu, F-h. Deep blue. Artif. Intell. 134, (2002), 57–83.
4. Capablanca, J.R. and de Firmian, N. Chess Fundamentals: Completely Revised and Updated for the 21st Century. Random House Puzzles & Games, (2006).
5. Chess.com variants. Chess.com. http://chess.com/variants.
6. de Groot, A.D. Het Denken van den Schaker. (Thought and Choice in Chess). Amsterdam University Press, 1946.
7. Gligorić, S. Shall We Play Fischerandom Chess? Batsford, 2002.
8. Gollon, J. Chess Variations: Ancient, Regional and Modern. Charles E. Tuttle Company, 1968.
9. Heinz, E.A. Scalable Search in Computer Chess. Vieweg+Teubner Verlag (2000).
10. Iida, H., Takeshita, N., and Yoshimura, J. A metric for entertainment of boardgames: Its implication for evolution of chess variants. In Entertainment Computing: Technologies and Application, R. Nakatsu and J. Hoshino, eds. (2003). 65–72.
11. Kasparov, G. Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins. John Murray, 2017.
12. Kaufman, L. The evaluation of material imbalances. Chess Life (1999).
13. Knuth, D.E. and Moore, R.W. An analysis of alpha-beta pruning. Artif. Intell. 6, 4 (1975), 293–326.
14. Kramnik, V. Kramnik and AlphaZero: How to rethink chess. Chess.com (December 2, 2019), https://chess.com/article/view/no-castling-chess-kramnik-alphazero.
15. LCZero Development Community. Leela Chess Zero. https://lczero.org.
16. Lillebo, P. Stalemate: The long and the short of it. ChessBase (Aug. 2, 2014), https://en.chessbase.com/post/stalemate-the-long-and-the-short-of-it.
17. Murray, H.J.R. The History of Chess. Oxford University Press, 1913.
18. Newborn, M. Computer Chess. Academic Press,
19. Nielsen, P.H. When Magnus met AlphaZero. New Chess (December 2019), 14–23.
20. Pritchard, D.B. The Classified Encyclopedia of Chess Variants. Games and Puzzles Publications (1994).
21. Rosin, C.D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 3 (2011), 203–230.
22. Sadler, M. and Regan, N. Game Changer: AlphaZero’s Groundbreaking Chess Strategies and the Promise of AI. New In Chess (February 2019).
23. Shah, S. First ever “no-castling” tournament results in 89% decisive games! ChessBase (January 19, 2020), https://en.chessbase.com/post/the-first-ever-no-castling-chess-tournament-results-in-89-decisive-games.
24. Shannon, C.E. Programming a computer for playing chess. Philos. Mag. 41, 314 (1950), 256–275.
25. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., and Hassabis, D. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 6419 (2018), 1140–1144.
26. Tomašev, N., Paquet, U., Hassabis, D., and Kramnik, V. Assessing game balance with AlphaZero: Exploring alternative rule sets in chess (Sept. 2020). arXiv:cs.AI/2009.04374
27. Turing, A. Digital computers applied to games. In Faster Than Thought: A Symposium on Digital Computing Machines, B.V. Bowden ed. Pitman Publishing, London, 1953, 286–310.
28. Zhang, H., Wang, J., Zhou, Z., Zhang, W., Wen, Y., Yu, Y., and Li, W. Learning to design games: Strategic environments in reinforcement learning. In Proceedings of the 27th Intern. Joint Conf. on Artificial Intelligence (2018), J. Lang ed, 3068–3074.

Nenad Tomašev is a research scientist at DeepMind Technologies Ltd.
Ulrich Paquet is a research scientist at DeepMind Technologies Ltd.
Demis Hassabis is the founder and CEO of DeepMind Technologies Ltd.
Vladimir Kramnik is a former World Chess Champion (2000–2007).

Copyright held by authors/owners.

Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/reimagining-chess-alphazero