

AI-powered mechanisms as judges: Breaking ties in
chess and beyond*
Nejat Anbarci†  Mehmet S. Ismail‡

Revised: 1st November, 2022; First version: 12th August, 2022

Abstract
Recently, Artificial Intelligence (AI) technology use has been rising in sports.
For example, to reduce staff during the COVID-19 pandemic, major tennis tour-
naments replaced human line judges with Hawk-Eye Live technology. AI is now
ready to move beyond such mundane tasks, however. A case in point and a perfect
application ground is chess. To reduce the growing incidence of draws, many elite
tournaments have resorted to fast chess tiebreakers. However, these tiebreakers
are vulnerable to strategic manipulation, e.g., in the last game of the 2018 World
Chess Championship, Carlsen—in a significantly advantageous position—offered a
draw to Caruana (who accepted the offer) to proceed to fast chess tiebreaks in
which Carlsen had even better odds of winning the championship. By contrast, we
prove that our AI-based method can serve as a judge to break ties without being
vulnerable to such manipulation. It relies on measuring the difference between the
evaluations of a player’s actual move and the best move as deemed by a powerful
chess engine. If there is a tie, the player with the higher quality measure wins the
tiebreak. We generalize our method to all competitive sports and games in which
AI’s superiority is—or can be—established.

Keywords: AI, chess, tiebreaks, strategic manipulation, strategyproofness


* We would like to thank FIDE World Chess Champion (2005-2006) and former world no. 1 Grand
Master Veselin Topalov for his comments, encouragement, and referring our proposal to Norway Chess.
We are also thankful to Norway Chess and its founder Kjell Madland for the opportunity to implement
our research in the 2023 edition of the tournament, which is expected to host some of the best players in
the world. We are also grateful to Steven Brams, Michael Naef and Shahanah Schmid for their valuable
comments and suggestions.
† Department of Economics, Durham University, Durham DH1 3LB, UK. nejat.anbarci@durham.ac.uk
‡ Department of Political Economy, King's College London, London, WC2R 2LS, UK.
mehmet.ismail@kcl.ac.uk

[Figure 1: Percentage of draws in World Chess Championship Matches 1886–2021;
y-axis: draw percentage (0–100), x-axis: year (1880–2020). Source: chessgames.com]

1 Introduction
The use of Artificial Intelligence (AI) technology in sports has been on the rise recently.
During the COVID-19 pandemic, for instance, major tennis tournaments replaced human
line judges with Hawk-Eye Live technology in an effort to reduce staff. Also, more than
a decade ago, football began using Goal-line technology to assess when the ball has
completely crossed the goal line. These are examples of mechanical AI systems requiring
the assistance of electronic devices to determine the precise location of balls impartially
and fairly, thus minimizing, if not eliminating, any controversy.
A major question now is whether AI could move beyond such rudimentary tasks in
sports. A case in point and a perfect application ground is chess for two complementary
reasons. On the one hand, advanced AI systems, including Stockfish, AlphaZero, and
MuZero, have already been implemented in chess [1, 27, 26]; further, the superiority of
top chess engines has been widely acknowledged ever since IBM’s Deep Blue defeated
former world chess champion Garry Kasparov in 1997 [9]. On the other hand, despite
its current popularity all around the world, chess is very much hampered by the grow-
ing incidence of draws, especially in the world championships, as Fig. 1 illustrates. In
response to this draw problem, elite chess tournaments—like other sports competitions
[2, 3, 6, 13]—have resorted to tiebreakers. The most common final tiebreaker is the so-
called Armageddon game, where White has more time (e.g., five minutes) to think on the

Time-control Time per player
Classical 90 min.
Rapid 15 min.
Blitz 5 min.
Armageddon White: 5 min., Black: 4 min. (and draw odds)

Table 1: An example of classical vs. fast chess time-controls

clock than Black (e.g., four minutes), but Black wins in the event of a draw. However, it
sparks controversy among elite players and chess aficionados alike:

“Armageddon is a chess penalty shoot-out, a controversial format intended to


prevent draws and to stimulate interesting play. It can also lead to chaotic
scrambles where pieces fall off the board, players bang down their moves and
hammer the clocks, and fractions of a second decide the result” (Leonard
Barden, The Guardian [5]).

“Logic could hardly ever be found in the Armageddon games. But this, in
turn, has its own logic” (Grand Master (GM) Ian Nepomniachtchi on the
final tiebreaker in the 2022 World Fischer Random Chess Championship [23]).

In this paper, we propose that AI systems serve as a judge in the event of a tie in
games such as chess. In chess, in particular, we introduce a novel and practicable AI-
based method and show that it essentially eliminates the draw problem. In a nutshell, for
each position in the game, our method measures the difference between the evaluations
of a player’s “actual move” and the “best move” as deemed by a powerful chess engine.
In case of a tie, the player with the higher “quality” measure wins the tiebreak.
Most importantly, we prove that our method is immune to strategic manipulation,
whereas the current fast chess tiebreakers, as illustrated in Table 1, are not. To give an
example, in the last game of the 2018 World Chess Championship, Magnus Carlsen—in a
significantly advantageous position—offered a draw to Fabiano Caruana, which Caruana
accepted. The reason behind Carlsen’s offer was to proceed to fast chess tiebreaks in
which he had even better odds of winning the championship. In contrast, Carlsen could
not possibly have benefited from such an offer under our method, so he most likely would not
offer a draw in the same situation (see section 2.1 for details).

We generalize our method to all competitive sports and games in which AI’s superiority
is—or can be—established. More specifically, we introduce a family of AI-based scoring
mechanisms and the concept of “tiebreak strategyproofness” in n-person zero-sum games.
A mechanism is called tiebreak strategyproof (TSP) if a player cannot improve their
tiebreak score by playing a sub-optimal action according to a given AI system. Moreover,
we show that our chess tiebreak method is TSP. We anticipate that our method will be
the first of many applications of AI-based TSP mechanisms to break ties in sports and
games.
TSP is related to the notion of strategyproofness that is often used in social choice,
voting systems, mechanism design, and sports and competitive games, though the formal
definition of strategyproofness varies depending on the context. (For a selective literature,
see, e.g., [17, 19, 25, 21, 4, 7, 16, 15].) Informally, a social choice rule or a mechanism is said
to be strategyproof if being truthful is a weakly dominant strategy for every player. In
sports and competitive games, a mechanism is said to be strategyproof if the mechanism
is immune to strategic manipulation, i.e., no agent can benefit from strategizing.

1.1 The draw problem in chess


The “draw problem” in chess has a long history. Neither chess aficionados nor elite players
appear to enjoy the increasing number of draws in chess tournaments. The current world
champion, Magnus Carlsen, who recently announced that he will not defend his title in
the 2023 cycle, appears to be dissatisfied as well. “Personally, I’m hoping that this time
there will be fewer draws than there have been in the last few times, because basically I
have not led a world championship match in classical chess since 2014” [11].
The 2018 world championship match, for instance, ended with 12 consecutive
draws. The world champion was then determined by a series of “rapid” games, whereby
players compete under significantly shorter time-control than the classical games (see
Table 1). If the games in the tiebreaks had not determined a winner, then a final game
called Armageddon would have been played. Compared to classical games, there is no doubt
that the fast-paced rapid, blitz, and Armageddon formats lower the quality of chess played;
the latter also raises questions of fairness because it treats the players asymmetrically.

1.2 An AI-based scoring mechanism
In the event of a win, it is straightforward to deduce that the winner played higher quality
chess than the loser. In the event of a tie, however, it is more difficult to assert that the
two players’ performances were comparable, let alone identical. With the advancements
in chess AIs, their differences in quality can now be quantified. Average centipawn loss
is a known metric for evaluating the quality of a player’s moves in a game where the unit
of measurement is 1/100th of a pawn.
We instead propose a more intuitive modification of that metric, which we term the
“total pawn loss,” because (i) even chess enthusiasts do not seem to find the average
centipawn loss straightforward, and (ii) it can be manipulated by intentionally extending
the game in e.g. a theoretically drawn position. We define total pawn loss as follows.
First, at each position in the game, the difference between the evaluations of a player’s
actual move and the “best move” as deemed by a chess engine is calculated. Then, the
total pawn loss value (TPLV) for each player is simply the equivalent of the total number
of “pawn-units” the player has lost during a chess game as a result of errors. If the TPLV
is equal to zero, then it indicates that every move was perfect according to a chess engine.
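
To make the computation concrete, the following is a minimal sketch in Python, assuming the third-party python-chess library and a local Stockfish binary on the PATH; the search depth, the mate-score convention, and the function name are our illustrative choices, not settings prescribed here.

```python
import chess
import chess.engine

def total_pawn_loss(uci_moves, engine_path="stockfish", depth=18):
    """Return {chess.WHITE: tplv, chess.BLACK: tplv} for a game given as UCI moves."""
    board = chess.Board()
    tplv = {chess.WHITE: 0.0, chess.BLACK: 0.0}
    with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
        limit = chess.engine.Limit(depth=depth)
        for uci in uci_moves:
            mover = board.turn
            # Evaluation before moving, from the mover's point of view:
            # this is the value of the engine's best move in the position.
            best = engine.analyse(board, limit)["score"].pov(mover).score(mate_score=10000)
            board.push(chess.Move.from_uci(uci))
            # Evaluation after the actual move, from the same point of view.
            actual = engine.analyse(board, limit)["score"].pov(mover).score(mate_score=10000)
            # Pawn loss of this move, converted from centipawns to pawn-units.
            # (Search noise can make individual terms slightly negative.)
            tplv[mover] += (best - actual) / 100.0
    return tplv
```

A player's cumulative TPLV over several games is then simply the sum of the per-game values.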
Along the above lines, we propose the following AI scoring rule. In the event of a
win, the winner receives 2 points and the loser 0 points, and the player with the lower
TPLV receives an additional 1 point. If the chess engine is “strong,” then the winner
should not have a higher TPLV unless the opponent runs out of time. If the player who
lost on time has a lower TPLV, then they receive 1 point instead of 0 points. This is to
disincentivize players from playing quick moves to “flag” their opponent’s clock. In the event of
a draw, the player with the lower TPLV receives 2 points and the other receives 1 point.
Each player receives 1.5 points when both players have the same TPLV, or their TPLVs
are within a threshold determined by tournament organizers or FIDE, the International
Chess Federation. For uniformity against slight inaccuracies in chess engine evaluations,
we suggest using a certain threshold, e.g. 5%, within which the TPLVs can be considered
equivalent. We next give examples of TPLV calculations in several real-world situations.
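
Before turning to those examples, here is a minimal sketch of the scoring rule just described; the function name and signature are ours, and the threshold defaults to an exact comparison (the 5% figure above is one possible organizer choice).

```python
def ai_score(result, tplv_w, tplv_b, threshold=0.0):
    """result: 'white', 'black', or 'draw'; TPLVs in pawn-units.
    Returns (white_score, black_score)."""
    base_w, base_b = {"white": (2.0, 0.0), "black": (0.0, 2.0), "draw": (1.0, 1.0)}[result]
    # TPLVs within the (organizer-chosen) threshold count as equivalent.
    if abs(tplv_w - tplv_b) <= threshold * max(tplv_w, tplv_b):
        bonus_w = bonus_b = 0.5
    elif tplv_w < tplv_b:
        bonus_w, bonus_b = 1.0, 0.0
    else:
        bonus_w, bonus_b = 0.0, 1.0
    # Note: a player who lost on time but has the lower TPLV ends up with
    # 0 + 1 = 1 point, matching the rule described above.
    return base_w + bonus_w, base_b + bonus_b

# Game 12 of the 2018 world championship (see Table 3 below): a draw with
# TPLVs 5.9 (Caruana, White) and 6.2 (Carlsen, Black).
print(ai_score("draw", 5.9, 6.2))  # -> (2.0, 1.0)
```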

2 Real-world examples
“Everybody could see that I wasn’t really necessarily going for the maximum.
I just wanted a position that was completely safe and where I could put some
pressure. If a draw hadn’t been a satisfactory result, obviously I would have
approached it differently” (Magnus Carlsen [12] on game 12 in the 2018 World
Chess Championship).

In this section, as suggested by the above quote, we illustrate with a counterexample
from an actual game that fast time-control (i.e., rapid, blitz, and Armageddon) tiebreaks
are not TSP. If the tiebreaks are decided with faster time-control games, then the
better fast chess player—measured by having a greater Elo rating [18] under rapid/blitz
time-control—might be incentivized to make a weaker move under classical time-control
to draw the game.

2.1 World chess championship 2018: Carlsen vs. Caruana


As highlighted before, Magnus Carlsen offered a draw in a better position against Fabiano
Caruana in the last classical game in their world championship match in 2018. This was
because Carlsen was a much better player in rapid/blitz time-control than his opponent.
Indeed, he won the rapid tiebreaks convincingly with a score of 3-0. Note that Carlsen
made the best decision for the championship match, but due to the tiebreak system his
decision was not the best (i.e., not manipulation-proof) in the particular game. As we will
show later, our AI-based tiebreak format better aligns these incentives. Note also that the
situation would have been very different if a draw in the last game had guaranteed Carlsen
the championship title. In that case, the incentive compatibility issue is not created
by the tiebreak mechanism but by (i) the scoring system that gives a strictly positive
point to a draw, and (ii) the fact that the value of world championship is much greater
than the value of winning a game. We do not think that it is desirable or practicable
to avoid such scenarios in part because the value of winning the world championship
title is huge. That being said, offering extra cash prizes to the winner of each game
can help incentivize players to win the games as well. For these reasons, our tiebreaking
mechanisms intentionally do not rule out the aforementioned scenarios.
Carlsen, of course, knew what he was doing when he offered a draw. During the
post-game interview, he said “My approach was not to unbalance the position at that
point” [12]. Indeed, in our opinion Carlsen would not have offered a draw in their last
game under TPLV-based scoring system because he was already doing better in terms of
having a lower TPLV than Caruana.
Carlsen’s TPLV was 5.2 in the position before his draw offer, whereas Caruana’s was
5.9. When Carlsen offered a draw, the evaluation of the position was about −1.0—i.e.,
Black is a pawn-unit better—according to Sesse, which is a strong computer running
Stockfish. If Carlsen played the best move, then his evaluation would be about −1.0,

GM Magnus Carlsen (B)
TPLV before draw offer 5.2
Evaluation of best move −1.0
Evaluation of draw offer 0.0
Pawn loss of draw offer 1 = −(−1 − 0)
TPLV after draw offer 6.2 = 5.2 + 1

Table 2: Calculation of TPLV after Carlsen’s draw offer to Caruana. Negative values
imply that the chess engine deems Carlsen’s position better as he has the black pieces.

Player                    TPLV   Score   AI Score
GM Fabiano Caruana (W)    5.9    0.5     2
GM Magnus Carlsen (B)     6.2    0.5     1

Table 3: Game 12 in the 2018 world chess championship. TPLVs include the draw offer
and its acceptance.

which means he would be ahead about a pawn-unit. After the draw offer was accepted,
the game ended in a draw and the evaluation of the position is obviously 0. As a result,
Carlsen lost
1 = −(−1 − 0)

pawn-unit with his offer, as calculated in Table 2. Thus, his final TPLV is 6.2 = 5.2 + 1
as Table 3 shows. A draw offer in that position would make Caruana the winner of the
tiebreak under our method.

2.2 Armageddon as a game tiebreaker: An innovation of Norway Chess

Player                   TPLV   Score   Armageddon Score   AI Score
GM Veselin Topalov (W)   3.15   0.5     1                  2
GM Magnus Carlsen (B)    3.4    0.5     1.5                1

Table 4: TPLV scoring system vs Armageddon scoring

In another example, Table 4 illustrates the outcome of the game played by Veselin
Topalov and Magnus Carlsen in the 2022 Norway Chess Tournament. Their classical

[Figure 2: TPLVs of Irina Krush and Jennifer Yu in the 2022 US Women's Chess
Championship games, by round (1–13); y-axis: TPLV. Lower TPLV implies better play.]

game ended in a draw. Then, they played an Armageddon tiebreak game, which Topalov
drew with white pieces against Magnus Carlsen, hence losing the tiebreak. However,
notice that Topalov’s TPLV is lower, so according to our TPLV scoring system he would
have won the tiebreak.

2.3 Armageddon as a tournament tiebreaker


Many elite tournaments, including the world chess championship as mentioned earlier,
use Armageddon as a final tiebreaker. Most recently, an Armageddon tiebreaker was used
in the 2022 US Women's Chess Championship when Jennifer Yu and Irina Krush tied
for first place, each scoring 9 points out of 13. Both players made big blunders in the
Armageddon game; Irina Krush made an illegal move under time pressure and eventually
lost the game and the championship.
Fig. 2 and Table 5 illustrate the TPLVs of Irina Krush and Jennifer Yu in the 2022
US Women's Chess Championship games and the cumulative TPLVs of each player,
respectively. According to our TPLV-based tiebreak method, Irina Krush would have been
the US champion because she played significantly better chess in the tournament according
to Stockfish: Irina Krush's games were about two pawn-units better on average than
Jennifer Yu's.

Player              Cumulative TPLV   Average TPLV
GM Irina Krush      159.21            12.24
GM Jennifer Yu      188.62            14.50

Table 5: TPLV vs Armageddon tiebreakers in the 2022 US Women's Chess Championship.
Irina Krush would have been the champion because she had a significantly lower cumulative
TPLV in the tournament.

3 Tiebreak strategyproof mechanisms in chess


This section employs basic notation and focuses solely on chess for the sake of clarity. For
a formal definition of extensive-form games and a generalization of the notions mentioned,
see Appendix A.

Game theory        Notation                                           Chess
a game             G                                                  the game of chess
a player           i ∈ {1, 2}                                         White or Black
an action          a_i ∈ A_i                                          a move or a draw offer/acceptance
a play             ā ∈ Ā                                              a single chess game
a node             x^j ∈ X                                            a position
a tournament       T(G)                                               a tournament
AI                 v_i : X → R                                        a chess engine
AI best-response   a_i^*(x) ∈ arg max_{a_i(x) ∈ A_i(x)} v_i(a_i(x))   best move

Table 6: The terminology in game theory and chess

Let G denote the extensive-form game of chess under the standard International Chess
Federation (FIDE) rules. Table 6 summarizes the relationship between the terminologies
used in game theory and chess. In the chess terminology, a chess game is an alternating
sequence of actions taken by White and Black from the beginning to the end of the game.
In game theory, we call a chess game a play and denote it by ā ∈ Ā, which is the finite set
of all plays. A chess position describes in detail the history of a chess game up to a certain

point in the game. Formally, in game theory, a position is a node, denoted by x ∈ X, in
the extensive-form game G. A chess move is a choice of a player in a given position. A
draw offer is a proposal of a player that leads to a draw if agreed by the opponent. If
the opponent does not accept the offer, then the game continues as usual. Formally, an
action of a player i ∈ {1, 2} at a node x ∈ X is denoted by ai (x) ∈ Ai (x), which is the set
of all available actions of player i at node x. An action can be a move, a draw offer, or
the acceptance or rejection of the offer. A chess tournament, denoted by T (G), specifies
the rules of a chess competition, e.g., the world chess championship where two players
play a series of chess games, and Swiss tournament where a player plays against a subset
of other competitors.
We define an AI as a profile v of functions where, for each player i, v_i : X → R. In words,
an AI yields an evaluation for every player and every position in a chess game. A chess
engine is an AI which inputs a position and outputs the evaluation of the position for
each player.
Let a_i^* ∈ A_i be an action at a node x. The action is called an AI best-response if
a_i^*(x) ∈ arg max_{a_i(x) ∈ A_i(x)} v_i(a_i(x)). In words, an AI best-response action at a position is
the best move according to the chess engine v.
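
A minimal sketch of this selection, reusing the python-chess setup from the earlier TPLV sketch; evaluating every successor position one by one is an illustrative brute-force choice, not how engines report best moves in practice.

```python
import chess
import chess.engine

def ai_best_response(board, engine, depth=18):
    """Return arg max over a_i(x) in A_i(x) of v_i(a_i(x)) for the side to move."""
    mover = board.turn
    limit = chess.engine.Limit(depth=depth)

    def value(move):
        board.push(move)
        info = engine.analyse(board, limit)
        board.pop()
        return info["score"].pov(mover).score(mate_score=10000)

    return max(list(board.legal_moves), key=value)
```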
We now introduce our metric of “total pawn loss.”

Definition 1 (Pawn loss). Let v_i(a_i^*(x^j)) be a chess engine's evaluation of the best move
for player i at position x^j and v_i(a_i^j(x^j)) be the chess engine's evaluation of i's actual move.
Then, the pawn loss of move a_i^j(x^j) is defined as v_i(a_i^*(x^j)) − v_i(a_i^j(x^j)).

Definition 2 (Total pawn loss value). Let ā ∈ Ā be a chess game (i.e., a play) and a_i^j
be player i's action at position x^j in chess game ā, where ā_i = (a_i^1, a_i^2, ..., a_i^{l_i}) for some l_i.
Then, player i's total pawn loss value (TPLV) is defined as

$$TPLV_i(\bar{a}) = \sum_{j=1}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right].$$

Let ā^1, ā^2, ..., ā^K, where ā^k ∈ Ā, be a sequence of chess games in each of which i is a
player. Player i's cumulative TPLV is defined as

$$\sum_{k=1}^{K} TPLV_i(\bar{a}^k).$$

In words, at every position the difference between the evaluations of a player's actual
move and the best move is calculated. A player's TPLV is simply the total number of
pawn-units the player loses during a chess game.
Let V be the set of all AIs. An AI chess scoring mechanism in a game is a function
f : V × Ā → R^2, which inputs an AI, v, and a chess game, ā, and outputs a score for each
player. We next introduce a family of TPLV-based scoring mechanisms.

Definition 3 (TPLV-based AI chess scoring mechanisms). We define a family of AI


scoring mechanisms based on the type of competition.

1. Games: The player with the lowest TPLV receives an additional point or points,
on top of their score based on the outcome of the game (i.e., a win, a draw, or a loss).

2. Tournament: In case of ties in a chess tournament, the ties are broken in favor of the
player(s) with the lowest cumulative TPLV, who are ranked first; the player(s)
with the second lowest cumulative TPLV rank second, and so on (see the sketch below).
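
A minimal sketch of the tournament rule in item 2, with hypothetical names and the cumulative figures of Table 5 as a usage example:

```python
def rank_tied_players(game_tplvs):
    """game_tplvs maps each tied player to a list of their per-game TPLVs;
    ranks players by cumulative TPLV, lowest (best quality) first."""
    return sorted(game_tplvs, key=lambda player: sum(game_tplvs[player]))

# Cumulative TPLVs from the 2022 US Women's Chess Championship (Table 5):
print(rank_tied_players({"Krush": [159.21], "Yu": [188.62]}))  # ['Krush', 'Yu']
```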

We next define a specific and practicable AI scoring mechanism for chess games, which
we call the AI scoring rule.

Definition 4 (TPLV-based AI scoring rule for chess). Let ā be a chess game, s_i the
score of player i, and TPLV_i be player i's TPLV in ā. If player i wins the
chess game ā, then i receives 2 points and player j ≠ i receives 0 points: s_i = 2 and
s_j = 0. If chess game ā is drawn, then each player receives 1 point. In either case, if
TPLV_i < TPLV_j, then player i receives an additional 1 point and player j receives no
additional point; e.g., in a drawn game, s_i = 2 and s_j = 1. If TPLV_i = TPLV_j, then
each player receives an additional 0.5 points.

In simple words, we propose that the winner of a chess game receives 3 points (if they
have a lower TPLV) and the loser 0 points, and in the event of a draw, the player with
the lower TPLV receives 2 points and the other receives 1 point. This (3, 2, 1) scoring
system is akin to the scoring system used in volleyball when the match proceeds to the
tiebreak, which is a shorter fifth set. Norway Chess also experimented with the (3, 2, 1)
scoring system, but now uses the (3, 1.5, 1) system, perhaps to further incentivize winning a
game. To our knowledge, Norway Chess was the first to use Armageddon to break ties at
the game level rather than at the tournament level.
There are several ways one could use TPLV to break ties. Definition 4 provides a
specific scoring rule in case of a tie in a chess game. For example, the AI scoring mechanism
can also be used with the (3, 1.5, 1) scoring system: The winner of a game receives 3

points regardless of the TPLVs, the winner of the tiebreak receives 1.5 points and the
loser of the tiebreaker receives 1 point. In short, based on the needs and specific aims of
tournaments, the organizers could use different TPLV-based scoring systems. Regardless
of which scoring rule is used to break ties in specific games, Definition 3 provides a
tiebreaking rule based on cumulative TPLV in chess tournaments. In the unlikely event
that the cumulative TPLVs of two players are equal in a chess tournament, then the average
centipawn loss of the players could be used as a second tiebreaker; if these are also equal,
then there is a strong indication that the tie should not be broken. But if the tie has to
be broken in a tournament such as the world championship, then we suggest that players
play two games—one as White and one as Black—until the tie is broken by the AI scoring
rule. In the extremely unlikely event that the tie is not broken after a series of two-game
matches, one could, e.g., argue that the reigning world champion should keep their title.
We next define tiebreak strategyproofness in chess. We refer the interested reader to
Appendix A for the definition of tiebreak strategyproofness in more general games.

Definition 5 (TSP). A play ā ∈ Ā is called tiebreak strategyproof (TSP) if for every
player i and every action a_i^k in sequence ā,

$$\sum_{j=1,\, j \neq k}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right] \le \sum_{j=1}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right].$$

An AI scoring mechanism f is called TSP if every play ā ∈ Ā under the mechanism
is TSP.

In words, given a play ā ∈ Ā, fix the total pawn-losses excluding a node x on the path
of ā. If the play is TSP, then it is in the best interest of the active player at x to choose
an AI best-response action. A straightforward extension of TSP mechanisms could be
to define tiebreak strategyproofness with respect to a more general function of the errors
made in a game rather than with respect to the total errors as in TPLV. We keep the
current definition for its simplicity.
We next show that our tiebreaking rule based on TPLV is indeed TSP.

Theorem 1 (TSP mechanisms). AI scoring mechanisms given by Definition 3 are TSP.

The proof of the theorem is in Appendix A. To explain TSP in games in plain
words, suppose, to reach a contradiction, that the AI scoring rule given in Definition 4 is
not TSP in a chess game. This implies that there are some player i, a position in a chess
game, and two moves (move 1 and move 2) such that move 1 is the best move
according to the engine, its evaluation is strictly greater than the engine's evaluation of
move 2, and yet playing move 2 improves i's tiebreak score. But choosing move 1 would
decrease player i's TPLV, which implies that player i would be better off choosing move 1
instead of move 2—a contradiction. Thus, the AI scoring rule for chess is indeed TSP. The
fast chess tiebreakers are not TSP, as we illustrated in section 2.1.
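
To see the argument numerically, here is a toy check with made-up per-move pawn losses: since each term is nonnegative, dropping the loss at any single node (i.e., playing the engine's best move there) can never increase the total.

```python
losses = [0.0, 0.3, 1.0, 0.0, 0.2]  # hypothetical per-move pawn losses of one player
tplv = sum(losses)
for k in range(len(losses)):
    modified = sum(loss for j, loss in enumerate(losses) if j != k)
    assert modified <= tplv  # the TSP inequality of Definition 5 holds
```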

4 Concluding remarks: Potential concerns/benefits, and future directions
4.1 Logistics
Both the AI system (software) and the hardware play a role in calculating TPLVs in a
game. Thus, both of these should be made public knowledge in advance of a tournament.
The engine settings should be kept fixed across all games, unless the tournament director
has a reasonable doubt that the AI’s assessment of a particular position in a game was
flawed in a way that might affect the result. In that case, the tournament director may
seek a re-evaluation of the position/game. Today, several of the best chess engines, in-
cluding Stockfish and AlphaZero, are widely acknowledged to be clearly much better than
humans. Thus, either of these chess engines could be employed for the AI scoring rule. (In
tournaments with a large number of participants, however, one could use computationally
less expensive engine settings to calculate the TPLVs.)
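
As a sketch of what fixing and publishing the settings might look like through the python-chess interface (Threads and Hash are standard Stockfish UCI options; the particular values and depth are illustrative, not recommendations):

```python
import chess.engine

# Announced in advance of the tournament, together with the hardware spec.
ENGINE_PATH = "stockfish"
ENGINE_SETTINGS = {"Threads": 16, "Hash": 4096}
ANALYSIS_DEPTH = 30

engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
engine.configure(ENGINE_SETTINGS)
# ... evaluate every game of the event with these same, fixed settings ...
engine.quit()
```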

4.2 Computer-like play


A reasonable concern could be that our proposal will make players play more “computer-
like.” Nevertheless, we believe that top chess players already play more like engines
than they did in the past. Expert players try to learn as much as they can from engines,
including openings and end-game strategies, in order to gain a competitive edge. As an
example, Carlsen [10] recently explained how he gained a huge amount of knowledge and
benefited from neural network-based engines such as AlphaZero. He also said that some
players have not used these AIs in the correct way, and hence have not benefited from
them. (For a further discussion, see González-Díaz and Palacios-Huerta [20].) In sum-
mary, there is little, if anything, that the players can do to play more computer-like and
take advantage of the AI scoring mechanism on top of what they normally do. To put it
slightly differently, if there is any “computer-like” chess concept that a player can learn

and improve their AI score, then they would learn this concept to gain a competitive edge
anyway—even if the AI scoring mechanism is not used to break ties. That being said, it is
up to the tournament organizers to decide which chess engine to use for tiebreaking, and
some engines are more “human-like” than the others [22].

4.3 Playing strength


It is simpler to play (and win) a game against a weaker opponent than a stronger opponent,
and a player is less likely to make mistakes when playing against a weaker opponent. Is it
then unfair to compare the quality of the moves of different players? We do not think so.
First, in most strong tournaments, including the world championship and the candidates
tournament, every player plays against everyone else. Second, in Swiss tournaments,
players who face each other at any round are in general of comparable strength due to the
format of this tournament. While it is impossible to guarantee that each tied player plays
against the same opponents in a Swiss tournament, we believe that AI scoring mechanisms
are preferable to other mechanisms because they are impartial, tiebreak strategyproof,
and based on the quality of the moves played by the player themself as opposed to other
tiebreak mechanisms that are based on, e.g., the performance of the player's opponents.
(For a review of ranking systems used in Swiss tournaments, see Csató [14].)

4.4 Playing style


The playing style—positional vs tactical, or conservative vs aggressive—of a player may
make them more (or less) susceptible to making mistakes against a player with a different
playing style. A valid concern is whether our AI mechanisms favor one style over another.
The answer depends on the chess engine (software) and the hardware that are used to
break ties. A top player may have better “tactical awareness” than a relatively weak
chess engine or a strong engine that runs on weak hardware. Using such an engine to
break ties would then obviously be unfair to the player. However, there is little doubt
that the latest version of Stockfish running on strong hardware is a better tactical
and/or positional player than a human player. As an analogy, suppose that a world
chess champion evaluates a move in a game played by amateur players. While the world
champion may be biased, like any other player, there is little doubt that their evaluation
would be more reliable than the evaluation of an amateur.
In addition, in our opinion, the scoring system, as opposed to the tiebreak mechanism,
is the primary determinant of which playing style (aggressive vs conservative) would be

preferred by the players. For example, the standard (1,0.5) chess scoring system—where
the winner of a game receives 1 point and in case of a draw players each receive 0.5
points—does not discourage conservative playing style. By contrast, the (3,1.5,1) scoring
system, most prominently used by Norway Chess, discourages conservative play because
drawing two games does not give the same number of points as winning one unless one
wins the tiebreak in both drawn games (for details, see section 3). In summary, players
adjust their playing style according to the scoring system.

4.5 How will the incentives of the players change?


Apart from boosting the quality of matches by naturally giving more incentives to players
to find the best moves, our quality-based tiebreaking rule provides two additional benefits.
First, observe that it is very likely to discourage “prematurely” agreed draws, as there is
no assurance that each player will have the same TPLV when a draw is agreed upon during
a game; thus, at least the player who senses having the worse (i.e., higher) TPLV up to
that point will be less likely to offer or agree to a draw. Second, this new mechanism is also
likely to reduce the incentive for players to play quick moves to “flag” their opponent’s
clock—so that the opponent loses on time—because in case of a draw by insufficient
material, for instance, the player with the lower TPLV would gain an extra point.

4.6 What is the “best strategy” under our tiebreak mechanism?


Another valid question is whether playing solid moves, e.g., the top engine moves from
beginning to the end, is the “best strategy” under any of our tiebreak mechanisms. The
answer is that the best strategy in a human vs human competition is not to always pick
the top engine moves! This is because the opponent might memorize the line that is the
best response to the top engine moves, in which case the outcome of the game would most
likely be a draw. Our TSP mechanism says that one cannot improve their tiebreak score
by playing a sub-optimal move, and hence the only time a player should deviate from the
optimal move must be when one does not want the game to go to the tiebreak—i.e., when
one wants to win the game. And, winning the game is more valuable than winning
a tiebreak. Therefore, playing a sub-optimal computer move might be the “human-
optimal” move to win the game. Notice that this seemingly paradoxical conclusion does
not contradict the tiebreak strategyproofness of our mechanism, in part because we
intentionally apply our mechanism in case of a tie (unless a player runs out of time).

4.7 Conclusion
In contrast to the current tiebreak system of rapid, blitz, and Armageddon games, the
winner of the tiebreak under a quality-based tiebreak strategyproof AI mechanism is
determined by an objective, state-of-the-art chess engine with an Elo rating of about
3600. Under the new mechanism, players’ TPLV is highly likely to be different in the
event of a draw by mutual agreement, draw by insufficient material, or any other ‘regular’
draw. Thus, nearly every game will result in a winner, making games more exciting to
watch and thereby increasing fan viewership.
A valid question for future research direction is whether and to what extent our pro-
posal could be applied to other games and sports. Note that we have defined AI scor-
ing mechanisms and tiebreak strategyproofness for a general class of n-person zero-sum
games. Thus, our TSP scoring mechanisms are applicable to all games in this class, in-
cluding chess, Go, poker, backgammon, football (soccer), and tennis. However, one must
be cautious when using an AI scoring mechanism in a game/sport where AI’s superiority
is not commonly recognised, particularly by the best players in that game. Only after it is
established that AI is capable of judging the quality of the game—which is currently
the case only in a handful of games including Go, backgammon, and poker [8]—do we
recommend using our TSP scoring mechanisms.

A Appendix
The setup
Let G = (N, X, I, π, S) be an extensive-form zero-sum game with perfect information and
perfect recall. N = {1, 2, ..., n} denotes the set of players, X a finite game tree, x^0 the
root of the game tree, Z ⊂ X the set of terminal nodes, I the player function, π_i : Z → R
the payoff function of player i ∈ N, and π the profile of payoff functions. Note that G is
called a zero-sum game if for every z ∈ Z, $\sum_{i \in N} \pi_i(z) = c$, where c ∈ R is a constant. (For
a standard textbook on game theory, see, e.g., [24].)
Let X_i = {x ∈ X : I(x) = i}, i.e., the set of nodes at which i is active. Let A_i(x)
denote the finite set of pure actions of player i at node x and $A_i = \bigcup_{x : I(x) = i} A_i(x)$ player
i's set of all pure actions. With a slight abuse of notation, each action a_i of a player i at
node x can be naturally associated with a function a_i : X_i → X where a_i(x) ↦ x′ denotes
player i's action a_i ∈ A_i(x) at node x that leads to node x′ ∈ X. Thus, A_i(x) can be
viewed as the set of all successor nodes of x. A path is a sequence p = (x^0, x^1, ..., x^m) in
which x^m ∈ Z and for every k ∈ {0, 1, ..., m − 1}, x^{k+1} is a successor of x^k.
Let ā_i = (a_i^1, a_i^2, ..., a_i^{l_i}) be a sequence of actions of player i. Fix n′ ∈ N. A sequence
ā = (ā_1, ā_2, ..., ā_{n′}) is called a play if there exists a path p = (x^0, x^1, ..., x^m) such that
for every action a_i^j in sequence ā there exists a node x^k in path p with a_i^j(x^k) = x^{k+1}.
Let Ā be the set of all plays. Let ā ∈ Ā be a play and a_i^k(x^k) be an action in sequence ā
of player i at node x^k. A modified play, denoted by ā \ a_i′(x^k), is the sequence such that,
holding everything else in sequence ā fixed, player i plays action a_i′(x^k) instead of action
a_i^k(x^k) at x^k. A modified play need not itself be a play, but it differs from the play in at
most one action. Let Ā′ be the set of all modified plays. The notion of a modified play will
be useful when defining tiebreak strategyproofness.

Basic concepts
We first define formally what we mean by “AI.”

Definition 6 (AI). Let G be a game. An AI is a profile v of functions where, for each
player i, v_i : X → R.

As defined earlier, an AI mechanism yields a value (i.e., evaluation) for every player
and every node in a game G.

Definition 7 (AI best-response). An action a_i^* at a node x is called an AI best-response
if

$$a_i^*(x) \in \arg\max_{a_i(x) \in A_i(x)} v_i(a_i(x)).$$

In words, an AI best-response is the best action chosen by the AI mechanism at a


given position (i.e., node in the game tree).
We next introduce AI scoring mechanisms.

Definition 8 (AI scoring mechanisms). Let G be a game and ā′ ∈ Ā′ be a modified
play. Let V be the set of all AIs of G. An AI scoring mechanism in G is a function
f : V × Ā′ → R^2, where f(v, ā′) ↦ (s_1, s_2) ∈ R^2.

An AI scoring mechanism is a function that inputs an AI and a (modified) play and
outputs a score for each player. Our interpretation of f is that its i'th component,
f_i : V × Ā → R, measures the “quality” of the play of player i—the lower the value of f_i,
the higher the quality of i's play. Given an AI v and a play ā, if i ∈ arg min_{i′ ∈ N} f_{i′}(v, ā),
then player i is ranked the highest in play ā. (There could be ties.)
Next, we define a simple AI scoring mechanism, which we call AI scoring rule.

Definition 9 (AI scoring rule). Let G be a game and ā ∈ Ā be a play. Let a_i^j be player
i's action at node x^j in play ā. An AI scoring rule is an AI scoring mechanism, f, such
that for every player i and every play ā,

$$f_i(v, \bar{a}) := \sum_{j=1}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right].$$

An AI scoring mechanism gives a score for every play in a game. An AI scoring rule
is a specific scoring mechanism that aggregates the “errors” in a given play based on the
evaluation function of the AI. The player with the lowest total errors “wins” the game.
We next define tiebreak strategyproofness in games.

Definition 10 (TSP mechanisms). Let G be a game and ā′ ∈ Ā′ be a modified play. A
play ā ∈ Ā is called tiebreak strategyproof (TSP) if for every player i and every action a_i^k
in sequence ā,

$$f_i(v, \bar{a} \setminus a_i^*(x^k)) \le f_i(v, \bar{a}).$$

An AI scoring mechanism f is called TSP if every play ā ∈ Ā under the mechanism is
TSP.

In plain words, fix a play ā ∈ Ā and the “quality” of play f_i(v, ā) excluding node x
on the path of ā. If the play is TSP, then it is in the best interest of the active player at
node x to choose an AI best-response action, holding everything else fixed.
A simple extension of TSP mechanisms could be to define tiebreak strategyproofness
based on strategy profiles as opposed to plays. This would not be difficult to define; however,
we believe that it is impractical, if not impossible, to know the complete plan of actions of
players in every game. At the end of a chess game, for example, it is common for players to
discuss some alternative counterfactual lines that they considered throughout the game,
but these are very limited plans which are nowhere near a complete plan of action (i.e.,
strategy) for the full game tree of chess.

Proof of Theorem 1
Proof. We show that AI chess scoring mechanisms are TSP in games and tournaments.
Case 1: Chess games. To reach a contradiction, suppose that the AI scoring rule for chess is
not TSP. This implies that there exist a chess game ā ∈ Ā and some action a_i^k in ā such
that

$$\sum_{j=1,\, j \neq k}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right] > \sum_{j=1}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right] = TPLV_i(\bar{a}). \quad (1)$$

Note that TPLV_i(ā) can be written as

$$\sum_{j=1,\, j \neq k}^{l_i} \left[ v_i(a_i^*(x^j)) - v_i(a_i^j(x^j)) \right] + \left[ v_i(a_i^*(x^k)) - v_i(a_i^k(x^k)) \right].$$

Thus, inequality (1) can be rearranged as

$$0 > v_i(a_i^*(x^k)) - v_i(a_i^k(x^k)),$$

if and only if

$$v_i(a_i^k(x^k)) > v_i(a_i^*(x^k)).$$

However, this contradicts the fact that at node x^k, a_i^*(x^k) is an AI best-response, i.e.,

$$a_i^*(x^k) \in \arg\max_{a_i(x^k) \in A_i(x^k)} v_i(a_i(x^k)).$$

Thus, the AI scoring rule is TSP in games.


Case 2: Tournaments. To reach a contradiction, suppose that the AI scoring mechanism is
not TSP in a tournament. This implies that there are some player i, a position x^k in a
chess game ā, and two moves, a_i^k(x^k) and a_i^*(x^k), with engine evaluations satisfying
v_i(a_i^*(x^k)) > v_i(a_i^k(x^k)), such that choosing the sub-optimal move a_i^k(x^k) improves
player i's cumulative TPLV in the tournament (rather than in the chess game, as in the
previous case). But, by the argument of Case 1, this move would increase player i's TPLV
in the game and hence the cumulative TPLV in the tournament, a contradiction. As a
result, the AI scoring mechanism is TSP in tournaments.
As desired, the AI scoring mechanism for chess is TSP.

References
[1] Stockfish. https://stockfishchess.org/. Accessed: 2022-10-29.

[2] Nejat Anbarci, Ching-Jen Sun, and M Utku Ünver. Designing practical and fair
sequential team contests: The case of penalty shootouts. Games and Economic
Behavior, 130:25–43, 2021.

[3] Jose Apesteguia and Ignacio Palacios-Huerta. Psychological pressure in competitive
environments: Evidence from a randomized natural experiment. American Economic
Review, 100(5):2548–64, 2010.

[4] Haris Aziz, Florian Brandl, Felix Brandt, and Markus Brill. On the tradeoff between
efficiency and strategyproofness. Games and Economic Behavior, 110:1–18, 2018.

[5] Leonard Barden. Chess: Armageddon divides fans while Magnus Carlsen leads again
in Norway. The Guardian.

[6] Steven J Brams and Mehmet S Ismail. Making the Rules of Sports Fairer. SIAM
Review, 60(1):181–202, 2018.

[7] Steven J Brams, Mehmet S Ismail, D Marc Kilgour, and Walter Stromquist. Catch-
Up: A Rule That Makes Service Sports More Competitive. The American Mathe-
matical Monthly, 125(9):771–796, 2018.

[8] Noam Brown and Tuomas Sandholm. Superhuman AI for multiplayer poker. Science,
365(6456):885–890, 2019.

[9] Murray Campbell, A Joseph Hoane Jr, and Feng-hsiung Hsu. Deep Blue. Artificial
Intelligence, 134(1-2):57–83, 2002.

[10] Magnus Carlsen. Magnus Carlsen: Greatest Chess Player of All Time | Lex Fridman
Podcast #315. https://www.youtube.com/watch?v=0ZO28NtkwwQ&t=1422s.
Interview by Lex Fridman. 2022-08-27.

[11] Magnus Carlsen. Magnus Carlsen: “I’m hoping this time there will be fewer draws”.
https://chess24.com/en/read/news/magnus-carlsen-i-m-hoping-this-time-there-will-be-fewer-draws.
Interview by Colin McGourty. 2021-10-04.

[12] Magnus Carlsen. World Chess Championship 2018 day 12 press conference.
https://www.youtube.com/watch?v=dzO7aFh8AMU&t=315s. 2018-11-26.

[13] Danny Cohen-Zada, Alex Krumer, and Offer Moshe Shapir. Testing the effect of serve
order in tennis tiebreak. Journal of Economic Behavior & Organization, 146:106–115,
2018.

[14] László Csató. On the ranking of a Swiss system chess team tournament. Annals of
Operations Research, 254(1):17–36, 2017.

[15] László Csató. UEFA Champions League entry has not satisfied strategyproofness in
three seasons. Journal of Sports Economics, 20(7):975–981, 2019.

[16] Dmitry Dagaev and Konstantin Sonin. Winning by Losing: Incentive Incompatibility
in Multiple Qualifiers. Journal of Sports Economics, 19(8):1122–1146, 2018.

[17] Edith Elkind and Helger Lipmaa. Hybrid Voting Protocols and Hardness of Ma-
nipulation. In Proceedings of the 16th International Conference on Algorithms and
Computation, ISAAC’05, page 206–215, Berlin, Heidelberg, 2005. Springer-Verlag.

[18] Arpad E Elo. The Rating of Chess Players, Past and Present. Arco Publishing, New
York, 1978.

[19] Piotr Faliszewski and Ariel D. Procaccia. AI’s War on Manipulation: Are We Win-
ning? AI Magazine, 31(4):53–64, Sep. 2010.

[20] Julio González-Díaz and Ignacio Palacios-Huerta. AlphaZero Ideas. Preprint at
SSRN 4140916, 2022.

[21] Shengwu Li. Obviously Strategy-Proof Mechanisms. American Economic Review,
107(11):3257–87, November 2017.

[22] Reid McIlroy-Young, Siddhartha Sen, Jon Kleinberg, and Ashton Anderson. Aligning
Superhuman AI with Human Behavior: Chess as a Model System. In Proceedings of
the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data
Mining, page 1677–1687, New York, NY, USA, 2020. Association for Computing
Machinery.

[23] Ian Nepomniachtchi. Twitter post. https://twitter.com/lachesisq/status/1586835781229969408.
Accessed: 2022-10-30.

[24] Martin J Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press,
Cambridge, MA, 1994.

[25] Marc Pauly. Can strategizing in round-robin subtournaments be avoided? Social
Choice and Welfare, 43(1):29–46, 2013.

[26] Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Lau-
rent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore
Graepel, et al. Mastering Atari, Go, chess and shogi by planning with a learned
model. Nature, 588(7839):604–609, 2020.

[27] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew
Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel,
et al. A general reinforcement learning algorithm that masters chess, shogi, and Go
through self-play. Science, 362(6419):1140–1144, 2018.

