Project 2-1
Analysis of AI approaches to playing Dots and Boxes
Lindalee Conradie
Husam Abdelqader
Jonathon Bird
Haoran Luan
Kiril Tikhonov

Abstract—At first glance, applying AI techniques to games may seem frivolous, but only to the uninformed. Games are an important subject of AI research, and the field benefits greatly from the development of games research. It is expected that the application of artificial intelligence in games will provide new approaches that can be transferred to real-world applications. This paper presents an investigation into AI techniques in the context of the classic board game Dots and Boxes. A simple heuristic approach, Monte Carlo Tree Search, Q-learning, and MiniMax with alpha-beta pruning were developed, and their performance compared.

I. INTRODUCTION

The use of technology is becoming more predominant with every discovery and piece of research that is made. Financial institutions, legal institutions, media companies, and insurance companies are all figuring out ways to use artificial intelligence to their advantage. From fraud detection to writing news stories with natural language processing and reviewing law briefs, artificial intelligence's reach is extensive. The application of AI techniques has led to major improvements and developments in computer science, mechanical engineering, medical diagnostics, portable technology, and marketing.

But what does all this have to do with applying AI techniques to games? Just as in games, real-world problems consist of a goal state and many options for reaching it, along with a number of variables that may influence our decisions. Using AI techniques in games is a way for us to evaluate how a technique makes decisions and acts in different circumstances, and it provides measurable results for the further development of these methods.

While some research has been done on solving the Dots and Boxes game [1][2], we decided to take a different perspective and test how well different approaches can handle a game like Dots and Boxes. We can furthermore assess which technique performs better with which games and under which conditions.

In this report, we will look at how the AI techniques we have studied and implemented perform when playing the game of Dots and Boxes. Dots and Boxes is a combinatorial game popular among children and adults around the world. Despite the apparent simplicity of the game's rules, the game admits many possible strategies. Even though it is a finite game, the scalable size of the board and the number of possible moves make it complex to evaluate all the possible moves. Due to this, we will evaluate each implemented technique and its average winning rate against the other techniques, and answer the following questions: Which approach is most effective overall? Which algorithm performs best against a basic strategy in a reasonable processing time (less than 100 milliseconds)? How much better is the advanced strategy compared to the basic strategy when considering only the next step? How does the node limit on the MiniMax algorithm affect its processing time and win rates on different board sizes? How does the number of simulations in MCTS affect processing time and win rate on different boards?

First, we will briefly discuss some terminology, rules, and common strategies for playing the game of Dots and Boxes. This is followed by a brief overview of how the game was implemented. Further on, the different AI techniques are discussed: a brief overview of each technique, a description of its implementation, followed by a complexity analysis. The experiments performed are then presented, followed by a discussion of the results of these experiments. Finally, a conclusion is drawn from these discussions.

II. DOTS AND BOXES GAME

Dots and Boxes [3] is a simple pen and pencil game for two players (sometimes more), first published in the 19th century by the French mathematician Edouard Lucas. The game starts with an empty grid of dots. Players then take turns adding either a single horizontal or vertical line between two unjoined adjacent dots. Once a player completes the fourth side of a 1 × 1 box, he earns a point and is then obliged to take another turn. The game is finished when it is not possible to add any more lines to the grid. The player with the most points is then the winner of the game. Furthermore, the board may be any size grid.
The game Dots and Boxes is similar to other board games in the sense that it is impartial: the current score and which player made which move do not affect the further possible moves. Furthermore, Dots and Boxes is a zero-sum game, which means that since there is a finite number of points available, each point that one player gets is a point that the other player will not be able to gain. In addition to being impartial, the game is also fully observable.

Fig. 1: Player one turn. Fig. 2: Player two turn.
Fig. 3: Box filled by player two. Fig. 4: Game results.

Black edges are ones that have already been placed by players. Furthermore, when hovering over a possible edge placement, it is highlighted with the color of the player. Once a player completes a box, his initials or personal score is shown inside the box with the color of the player who completed it. At the end of the game, the winning player is shown on the display according to how many games he won out of the total number of games played. The two players are then given the opportunity to play again, and finally the score of the game is displayed.

A. Terminology

The following is a list wherein the most important terms and definitions used are explained:
• Dot: a single point on the board to which a line can connect;
• Line: an edge between two adjacent dots;
• Box: the area on the board that is enclosed by four lines, which adds a point to the player that placed the final line;
• N x M board: a board that is N dots tall and M dots wide;
• Valence: the number of empty lines a box has; this is between 0 and 4 inclusive;
• Chain: a sequence of boxes with valence 2, where every empty line is part of two adjacent boxes, except for the two outer-most empty lines. A chain containing three or more boxes is called a long chain;
• Cycle: a chain whose ends meet up, forming a closed circle.
B. Strategy

The common strategies for playing Dots and Boxes include taking boxes where possible whilst also avoiding drawing the third edge on a box, which would let the opponent take the box. Because most players avoid drawing the third edge, most games proceed by drawing two lines per box until it is absolutely necessary to draw the third. Once this happens, a chain is created in which the opponent can then complete nearly all the boxes.

The following is a description of some fundamental strategies involved in Dots and Boxes [4].

Double-dealing move: leave the opponent with two boxes of valence 3, which he takes by making a single move, called a double-crossed move. The point of declining the last two boxes in the chain is that the opponent is forced to open up the next chain regardless of whether he takes the two offered boxes. The double-dealing move for cycles is to split the last four boxes in half, forming two chains.

Whoever can force their opponent to be the first one to play in a long chain is said to have control. If you have control, you can maintain it by declining the last two boxes of every long chain except the last (you should take all of the last chain). If there are long enough chains around, then you win by getting and maintaining control up to the end.

Getting Control: The long chain rule

The chain rule tells you how many chains you should make to force your opponent to open the first long chain or cycle:
• If there is an odd total number of dots, then the first player should make an odd number of chains and the second player an even number of chains.
• If there is an even total number of dots, then the first player should make an even number of chains and the second player an odd number of chains.

For the purposes of the long chain rule, cycles do not count as long chains. (A minimal sketch of this parity rule is given below.)
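In the sketch, totalDots and longChainCount are illustrative parameter names, not taken from our implementation, and cycles are assumed to be excluded from the chain count, as stated above.

    // Illustrative sketch of the long chain rule; not our game code.
    public final class LongChainRule {

        /** Returns true if the current long-chain parity favors the given
         *  player (1 = first player, 2 = second player). Cycles must not
         *  be counted in longChainCount, per the rule above. */
        public static boolean parityFavors(int player, int totalDots, int longChainCount) {
            boolean dotsOdd = (totalDots % 2 == 1);
            boolean chainsOdd = (longChainCount % 2 == 1);
            // Odd dots: the first player wants an odd number of long chains;
            // even dots: the first player wants an even number. The second
            // player wants the opposite parity in both cases.
            return (player == 1) ? (dotsOdd == chainsOdd) : (dotsOdd != chainsOdd);
        }
    }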
Control and Chain/Cycle Length

When you have control, you need chains that are long enough to overcome the cost of maintaining control. Hence, the player who is going to get control should try to make the chains as long as possible and try to avoid cycles (especially quads). If the cost of maintaining control is more than the number of boxes you are going to get, then at some point it pays to relinquish control by taking all of a chain or cycle. Conversely, if you are going to lose control, then you should try to keep the chains as short as possible and try to create a cycle, particularly a quad [4].

III. GAME IMPLEMENTATION

The game is represented by a graph data structure: the dots are vertices and the lines are edges. The state of the game is stored as an adjacency matrix.
Value in a matrix | Meaning
0 | No edge is possible
1 | An edge is possible but not placed
2 | An edge has been placed

This encoding means one can find out information such as how many edges are placed around a box by taking the sum of the corresponding values in the matrix. For example, a full box sums to 8, while a box with valence 1 sums to 7. (A sketch of this computation follows.)
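The sketch below assumes the adjacency matrix is an int[][] indexed by dot indices; the corner parameter names are illustrative, not from our implementation.

    // Sketch: reading a box's valence from the adjacency matrix m, where
    // m[a][b] is the 0/1/2 code for the dot pair (a, b). The four dot
    // indices are the corners of one box.
    static int boxEdgeSum(int[][] m, int tl, int tr, int bl, int br) {
        return m[tl][tr]    // top line
             + m[bl][br]    // bottom line
             + m[tl][bl]    // left line
             + m[tr][br];   // right line
    }

    // A full box sums to 8 (four placed edges, value 2 each); every edge
    // that is possible but not placed contributes 1 instead of 2, so the
    // number of empty lines is valence = 8 - boxEdgeSum(...).
    static int valence(int[][] m, int tl, int tr, int bl, int br) {
        return 8 - boxEdgeSum(m, tl, tr, bl, br);
    }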
Other notable data structures are the Vertex class and the Edge class. The Vertex class stores its x and y position and a reference to each vertex it can have an edge with. The Edge class holds two vertices and an "ELine", which is the graphical representation of the edge. The state is also represented by a list of possible edges.
IV. AI TECHNIQUES

A. BaseBot

The first AI that we implemented is a heuristic-based AI. BaseBot is a simple implementation of the basic strategy that the vast majority of casual players will take: it places edges randomly, except that it will never set up a box for the other player (i.e. it does not create a box with a valence of 1) unless forced to, and it will always complete a box if it can. We further use the performance of this bot to compare the implemented algorithms against casual play. (A minimal sketch of this rule follows.)
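In the sketch, completesBox and givesAwayBox are assumed predicates supplied by the game state, not our exact method names.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.function.Predicate;

    // Minimal sketch of BaseBot's move rule under the assumptions above.
    final class BaseBotSketch {
        static <E> E chooseMove(List<E> available, Predicate<E> completesBox,
                                Predicate<E> givesAwayBox, Random rng) {
            for (E e : available)                    // 1. take any box on offer
                if (completesBox.test(e)) return e;
            List<E> safe = new ArrayList<>();        // 2. avoid valence-3 gifts
            for (E e : available)
                if (!givesAwayBox.test(e)) safe.add(e);
            List<E> pool = safe.isEmpty() ? available : safe; // forced otherwise
            return pool.get(rng.nextInt(pool.size()));
        }
    }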
B. MiniMax with alpha-beta pruning

1) Overview: Dots and Boxes is a deterministic, turn-taking, two-player, zero-sum game of perfect information. This means that the utility values at the end of the game are always equal and opposite; thus the agents' goals are in conflict, giving rise to the adversarial search problem [5].

MiniMax is a backtracking algorithm that is commonly used to solve zero-sum games. The goal of the algorithm is to recursively investigate the game tree until it reaches the terminal nodes on the bottom level (the last possible states of the game). Each game state has a value associated with it, given by an evaluation function. In the case of Dots and Boxes, the evaluation assigns higher scores to moves that will potentially gain points for the player and lower scores to moves that could lead to the opponent gaining points (e.g. by completing the box(es) later in the game). (A generic sketch of the search is shown below.)
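The following is a generic sketch of depth-limited MiniMax with alpha-beta pruning, not our exact implementation; the GameState interface stands in for the game state, and the bonus-turn handling anticipates the implementation notes below.

    // Assumed stand-in for the game state described in Section III.
    interface GameState {
        boolean isTerminal();
        boolean completedBox();        // did the move leading here complete a box?
        Iterable<GameState> children(); // ordered: box-completing moves first
        double evaluate();              // evaluation function, see d) below
    }

    final class MiniMaxSketch {
        static double alphaBeta(GameState s, int depth, double alpha, double beta,
                                boolean maximizing) {
            if (depth == 0 || s.isTerminal()) return s.evaluate();
            double best = maximizing ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
            for (GameState child : s.children()) {
                // Bonus turns: if the move completed a box, the same player
                // moves again, so the max/min role does not flip (see
                // "Handling bonus turns" below).
                boolean childMax = child.completedBox() ? maximizing : !maximizing;
                double v = alphaBeta(child, depth - 1, alpha, beta, childMax);
                if (maximizing) { best = Math.max(best, v); alpha = Math.max(alpha, best); }
                else            { best = Math.min(best, v); beta  = Math.min(beta, best); }
                if (beta <= alpha) break; // prune: subtree cannot affect the result
            }
            return best;
        }
    }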
2) Implementation: forms the basis for the next run of the algorithm. Each bot
a) Adaptive depth limit system: Without limiting the that made it through the selection process plays another set
search and pruning states, the problem quickly becomes un- of games and their scores are averaged, then 10 more mutated
solvable: while the rules are simple, the state space of games bots are created from the selected ones. After a certain number
on even small boards is very large (the 4 × 4 game has of iterations, the parameters with the max score are selected.
24 edges and thus has a search space of 24!). The most 3) Complexity Analysis: Applying alpha-beta pruning and
straightforward approach to controlling the amount of search domain-specific move ordering to MiniMax algorithm allows
is to set a fixed depth limit. However, the issue is that in us to significantly reduce the size of the search tree to be
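A sketch of this computation under the t = x!/(x − z)! model from the text; the method name is illustrative.

    final class AdaptiveDepth {
        // Largest z such that t = x!/(x - z)! stays strictly below the
        // node budget n. t is grown one level at a time, so it never
        // overflows a long before the budget check fails.
        static int maxDepth(int availableLines, long maxNodes) {
            long t = 1;       // t = x! / (x - depth)!
            int depth = 0;
            while (depth < availableLines) {
                t *= (availableLines - depth);  // tree size at depth + 1
                if (t >= maxNodes) break;       // condition t < n would fail
                depth++;
            }
            return depth;
        }
    }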
b) Handling bonus turns: According to the rules of the game, when a box is completed the player gets another turn, which initially seemed to be a problem because in the MiniMax algorithm the turns alternate. However, it was later realized that this is not an issue: the boolean parameter that sets the algorithm to max or min refers only to the children of that node in the tree. Thus, at no point are minimum and maximum nodes mixed at the same level, whether or not some children create bonus turns.
c) Move ordering: The effectiveness of alpha-beta pruning is highly dependent on the order in which the states are examined. The implemented ordering is based on the BaseBot heuristic: lines that complete a box are checked first, and lines that set up a box for the opponent are considered last.
d) Evaluation function: As mentioned earlier, the two main advanced strategies in Dots and Boxes are the chain rule and the double-dealing move. The implemented evaluation is based on both of those and has 3 parameters: the difference between the bot's score and the opponent's score, the long chain rule, and the double-dealing move. The coefficients for the parameters are 1, 0.5, and -0.25, respectively. It is set up that way so that the bot builds the desired chain structures according to the chain rule but does not unnecessarily sacrifice boxes. (A sketch is given below.)
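A sketch of this weighted sum; the two strategy features are assumed to be computed elsewhere, and only the coefficients come from the text.

    final class EvaluationSketch {
        // Coefficients 1, 0.5 and -0.25 follow the text; the feature
        // values themselves are assumed helpers of the game state.
        static double evaluate(double scoreDifference,   // bot score minus opponent score
                               double chainRuleTerm,     // long chain rule feature
                               double doubleDealingTerm) // double-dealing feature
        {
            return 1.0 * scoreDifference + 0.5 * chainRuleTerm - 0.25 * doubleDealingTerm;
        }
    }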
e) Genetic algorithm: A genetic algorithm was employed to find the best coefficients for the evaluation function parameters. Starting from initial values, 10 bots are created, with each coefficient having a chance to mutate. The bots then each play a set amount of games, and their wins + 0.25 · draws is recorded as the score. The maximum score out of all bots is found, and each bot with a score within a selected range of it forms the basis for the next run of the algorithm. Each bot that made it through the selection process plays another set of games and their scores are averaged; then 10 more mutated bots are created from the selected ones. After a certain number of iterations, the parameters with the maximum score are selected. (A condensed sketch of this loop follows.)
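The sketch assumes that mutate, score, and the selection range are supplied externally, since the text does not fix their exact values; all names are illustrative.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.function.ToDoubleFunction;
    import java.util.function.UnaryOperator;

    // Condensed sketch of the coefficient search described above.
    final class CoefficientSearch {
        static double[] tune(double[] initial, int iterations, double selectionRange,
                             UnaryOperator<double[]> mutate,   // perturbs coefficients with some chance
                             ToDoubleFunction<double[]> score, // plays the games: wins + 0.25 * draws
                             Random rng) {
            List<double[]> parents = new ArrayList<>(List.of(initial));
            double[] best = initial;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int it = 0; it < iterations; it++) {
                List<double[]> bots = new ArrayList<>();
                for (int i = 0; i < 10; i++)  // 10 mutated bots per run
                    bots.add(mutate.apply(parents.get(rng.nextInt(parents.size()))));
                double[] scores = new double[bots.size()];
                double max = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < bots.size(); i++) {
                    scores[i] = score.applyAsDouble(bots.get(i));
                    max = Math.max(max, scores[i]);
                    if (scores[i] > bestScore) { bestScore = scores[i]; best = bots.get(i); }
                }
                parents.clear();  // survivors: every bot within the range of the max
                for (int i = 0; i < bots.size(); i++)
                    if (scores[i] >= max - selectionRange) parents.add(bots.get(i));
            }
            return best;
        }
    }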
3) Complexity Analysis: Applying alpha-beta pruning and domain-specific move ordering to the MiniMax algorithm allows us to significantly reduce the size of the search tree to be examined. We also control the amount of search by setting a fixed depth limit and use the evaluation function to approximate the true utility of a state without doing a complete search.

Time complexity: with "perfect ordering", O(b^(d/2)), where b is the number of legal moves at each node and d is the depth of the search.

Space complexity: O(bd) (based on depth-first traversal), where b is the number of legal moves at each node and d is the depth of the search.
C. Monte Carlo tree search

1) Overview: The basic idea of MCTS [5][6] is to approximate solutions that are difficult to calculate by collecting data from simulated samples and then averaging the result over the number of simulations. The advantage of MCTS is that the only information it requires is how the state changes when a move is made. Because the simulations are played until the end of the game, the result of the game is used to update the values in the previous states; therefore the search does not need to evaluate the intermediate states of the game.

MCTS builds a portion of the game tree using the simulated games, where each node contains the number of times that node was reached, the number of times an action was chosen, and the total reward for each action from all the simulations. In the game, at each move that it needs to make, the MCTS player performs simulations which start at the current state of the game.

Each simulation consists of selection, expansion, playout and then backpropagation. In the selection stage, the tree is traversed using a decision rule at each node. The expansion stage happens when an action is selected that does not have a coinciding node in the tree; this is where new nodes are added to the tree. Then, during the playout stage, the simulation continues until a terminal state is reached. After the game is finished, the values for the nodes visited in the selection stage are updated based on the result of the game.
2) Implementation: We implemented the traditional version of MCTS (selection, expansion, simulation and back propagation), with each node of the tree being a representation of the state of the game at that step, with the following variables stored at each node (a minimal sketch of this record follows the list):
• A variable that stores the number of times this node has been visited;
• A variable that stores the number of times this node has led to a winning leaf;
• A State object that stores the current status of the board at that node;
• A reference to the parent of the node;
• A list containing all the children of this node.
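In the sketch, S stands in for the board-state class described above; this is an illustration of the record, not our exact class.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of the node record described in the list above.
    final class MctsNode<S> {
        int visits;                       // times this node has been visited
        int wins;                         // times it led to a winning leaf
        final S state;                    // status of the board at this node
        final MctsNode<S> parent;         // null for the root
        final List<MctsNode<S>> children = new ArrayList<>();

        MctsNode(S state, MctsNode<S> parent) {
            this.state = state;
            this.parent = parent;
        }
    }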
When the tree is created it is given the initial state of the board as the root node, and it expands to create the root's children. Afterwards, on each MCTS turn the tree is fed the current state of the board; it locates it in the tree (or creates the node if it has not yet been created) and sets it as the root in order to start the MCTS steps:

1) Selection: This is the first step in the process of evaluating our next best possible move. The tree starts by selecting a random child of the root node and references it as the current node. It keeps choosing a random child of the node currently being looked at until it hits a leaf node or a node whose children have not yet been generated; in the latter case, it calls on the expansion until it reaches a leaf node, after which it calls on the simulation.
2) Expansion: If the tree chooses a node whose children have not yet been generated, it generates the children of this node and adds them to the tree.
3) Simulation: Once the tree has reached a leaf node which is a final game state, it evaluates who the winner of this game is and feeds the result to the back propagation.
4) Back Propagation: According to whether the leaf node the tree encountered is a winning state or not, it starts back-tracking through the tree, going through every node the tree visited to reach this leaf node until it reaches the root of the tree. For each of the nodes it visits along the way, it updates the inner variables which represent the number of times the node has been visited and the number of times it led to a win.

The process explained above simulates one possible way a game could proceed and end. To give a better evaluation, the tree simulates a number of games given at initialization; the higher this number, the better the evaluation, however more simulations lead to more processing time.

Lastly, when the tree has simulated the number of games it has been asked to, it evaluates which of the current root's children is the best option for us to win the game. It does this by using the following evaluation function [7]:

    w_i / n_i + 2 * sqrt( ln(N_i) / n_i )

• w_i: the number of times this node led to a win;
• n_i: the total number of simulations run by the tree so far; this is given by MCTS;
• N_i: the number of times this node has been visited.
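For concreteness, a one-line helper mirroring the formula exactly as given, with the symbols defined above:

    final class UctValue {
        // Selection value from [7], as in the formula above.
        static double nodeValue(double wi, double ni, double Ni) {
            return wi / ni + 2.0 * Math.sqrt(Math.log(Ni) / ni);
        }
    }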
Furthermore, Dots and Boxes has a game tree of size n! (n = number of edges); therefore, while MCTS is exploring and simulating different playouts of the game, it can consume a huge chunk of the available memory. In order to minimize the occurrence of this, and to make the algorithm more efficient as the game moves forward, once the MCTS decides on a best move all of the other branches are deleted from the game tree. This reduces the tree size, thus making the algorithm more time and memory efficient.

3) Complexity Analysis:
Time complexity: the runtime of our algorithm can be computed as O(mkI/C), where m is the number of random children that are considered per search, k is the number of parallel searches, I is the amount of iterations that have been performed, and C is the number of cores.
Space complexity: O(mk).

D. Deep Q learning

Overview: QLearning is a reinforcement learning algorithm that learns the 'quality' of each action at each particular state. For any finite Markov Decision Process, QLearning will find an optimal policy by maximizing the expected value of the total reward; given infinite time, it will find the optimal policy for any finite Markov Decision Process.
QLearning is applicable to Dots and Boxes as the game can be represented as a Markov chain. It is discrete-time, as each turn is a separate point in time; stochastic, as player inputs mean it is not fully deterministic; and only the current state matters for choosing actions. With the addition of actions and rewards this becomes a Markov Decision Process. The actions are the edges.
1) QLearning Neural Network: The Q function can be approximated by a neural network. The deeplearning4j (dl4j) plugin was used for the neural network. The implementation involved setting up the Markov Decision Process environment and adjusting the hyper-parameters of the network.

The neural network has a state and an observation space. The observation space is the part of the state that the neural network can see. In this implementation the observation space was the matrix, while the state included all the information needed for the methods to run, like playerScore or whose turn it is. This is because only the matrix matters, not the score or other information, in terms of the best move to play.

It could not be set up to train against itself, so it trains against BaseBot. BaseBot makes sense as an opponent, as it will always punish the bot for setting up boxes, and the bot cannot rely on it making mistakes and setting up boxes for it. This is the reward system implemented:

Event | Reward
Win | 50
Draw | 0
Loss | -10

The neural network has been trained on a 3x3 board size, as player 1, since one can simulate the largest number of games on the smallest board sizes, and a large amount of games is needed to train a reinforcement-learning neural network.
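For reference, the tabular update rule that the network approximates is the standard Q-learning update (a textbook formula, not specific to our implementation):

    Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

where \alpha is the learning rate, \gamma the discount factor, r the reward from the table above, and s' the state reached after playing edge a in state s.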
E. BaseBot+: Rule-based

We also decided to improve our BaseBot with the domain-specific strategies used for the MiniMax algorithm, and thus create a rule-based bot based on it. Instead of searching through all possible game states and finding the best set of moves, it uses (strategic) rules to choose its next move. In this way it may represent a more expert player who is familiar with these rules.
V. EXPERIMENTS

We conducted a number of experiments on Dots and Boxes to quantify the strength of the implemented AI techniques. With our experiments we want to examine how well the algorithms can play against purely random moves (RandomBot), casual players (BaseBot), and short-sighted advanced players (BaseBot+). We also compare the performance of the algorithms in matches against each other.

A. MCTS Experiments

Settings: Bots play 500 games, then switch places and play another 500 games. The percentages of games won or tied are shown in the tables. We also recorded the processing time for each move. Experiments were performed on the following board sizes: 3x3, 3x4, 4x4.
• Experiments with different numbers of simulations are run in order to determine how the number of simulations affects the win rate and whether the running-time trade-off is feasible. This is done using the following board sizes: 3 × 3, 3 × 4, 4 × 4, 5 × 5, and the following amounts of simulations: 100, 1000, 10000, 100000. Each experiment was run for 100 games.

B. MiniMax vs BaseBot

MiniMax is tested on how it fares against the basic strategy on board sizes ranging from 3x3 to 6x6, and on how limiting the maximum number of nodes (see the adaptive depth system) affects MiniMax's performance.
Settings: For the 3x3 to 4x4 board sizes, 1000 games are played testing the different maximum node values from 10,000 to 100,000,000. After that, 100 games are played testing node expansions from 10,000 to 10,000,000, due to processing time constraints. The results and processing time are recorded, and the average processing time for MiniMax is shown. For more detailed statistics on processing time, see the appendix. BaseBot's processing time is always 0 ms.

C. BaseBot+ vs BaseBot

These experiments test how a short-sighted (having only the next move in mind) player using an advanced strategy fares against a player using the basic strategy.
Settings: 100 games are played on the 3x3, 4x4, and 5x5 board sizes; the results are shown, as is the average processing time for BaseBot+. For more detailed statistics on processing time, see the appendix.

D. MiniMax vs MCTS

Experiments testing how MCTS fares against MiniMax, with varying maximum node limits for MiniMax and numbers of simulations for MCTS.
Settings: The experiments are played on the 3x3, 4x4, and 5x5 board sizes, and the results and average processing time for both bots are recorded. For more detailed statistics on processing time, see the appendix.

E. MiniMax vs BaseBot+

MiniMax is tested against a short-sighted opponent who also takes into account chain parity and double-dealing, in addition to utilizing randomness, with varying maximum node limits for MiniMax.
Settings: 100 games are played for each experiment. The 100,000, 1,000,000 and 10,000,000 node limits are tested.

F. Neural Network Experiments

Experiments test how the neural network performs against RandomBot, BaseBot, and MiniMax on a 3x3 board as player 1, as that was the only setting it was trained on. The results are shown as well as the average processing time. For more detailed statistics on processing time, see the appendix.
VI. RESULTS

A. MCTS results

1) Number of simulations: Below are the tables with the results of running the MCTS algorithm against BaseBot.

3x3
Number of Simulations | MCTS | DRAW | BASEBOT
100 | 1 | 0 | 99
1000 | 0 | 0 | 100
10 000 | 0 | 0 | 100
TABLE I: Results of running different numbers of simulations on a 3x3 board for MCTS vs BaseBot

3x4
Number of Simulations | MCTS | DRAW | BASEBOT
100 | 4 | 12 | 84
1000 | 3 | 16 | 81
10 000 | 0 | 14 | 86
TABLE II: Results of running different numbers of simulations on a 3x4 board for MCTS vs BaseBot

4x4
Number of Simulations | MCTS | DRAW | BASEBOT
100 | 2 | 0 | 98
1000 | 2 | 0 | 98
10 000 | 0 | 0 | 100
TABLE III: Results of running different numbers of simulations on a 4x4 board for MCTS vs BaseBot

5x5
Number of Simulations | MCTS | DRAW | BASEBOT
100 | 0 | 0 | 100
1000 | 0 | 0 | 100
10 000 | - | - | -
TABLE IV: Results of running different numbers of simulations on a 5x5 board for MCTS vs BaseBot

Below are the results of MCTS vs RandomBot.

3x3
Number of Simulations | MCTS | DRAW | RANDOMBOT
100 | 40 | 40 | 20
1000 | 51 | 27 | 22
10 000 | 45 | 18 | 37
TABLE V: Results of running different numbers of simulations on a 3x3 board for MCTS vs RandomBot

3x4
Number of Simulations | MCTS | DRAW | RANDOMBOT
100 | 89 | 5 | 6
1000 | 77 | 11 | 12
10 000 | 93 | 3 | 4
TABLE VI: Results of running different numbers of simulations on a 3x4 board for MCTS vs RandomBot

4x4
Number of Simulations | MCTS | DRAW | RANDOMBOT
100 | 93 | 7 | 0
1000 | 93 | 7 | 0
10 000 | 99 | 0 | 1
TABLE VII: Results of running different numbers of simulations on a 4x4 board for MCTS vs RandomBot

5x5
Number of Simulations | MCTS | DRAW | RANDOMBOT
100 | 100 | 0 | 0
1000 | 100 | 0 | 0
10 000 | - | - | -
TABLE VIII: Results of running different numbers of simulations on a 5x5 board for MCTS vs RandomBot
B. MiniMax vs BaseBot

3x3 | MiniMax WIN : DRAW : BASEBOT WIN
Maximum number of nodes | 100000000 | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Results | 1000:0:0 | 1000:0:0 | 1000:0:0 | 1000:0:0 | 907:56:37
Average time (ms) | 193.55 | 45.88 | 26.29 | 2.44 | 0.43
P1: BASEBOT, P2: MiniMax
Results | 390:469:141 | 379:480:141 | 415:460:125 | 382:408:210 | 363:411:226
Average time (ms) | 63.11 | 29.36 | 4.25 | 1.73 | 1.02
TABLE IX: Results of running different node limits on a 3x3 board for MiniMax vs BaseBot

3x4 | MiniMax WIN : DRAW : BASEBOT WIN
Maximum number of nodes | 100000000 | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Results | 791:51:158 | 739:58:203 | 741:24:235 | 781:44:175 | 667:20:313
Average time (ms) | 464.76 | 136.63 | 22.01 | 4.14 | 1.60
P1: BASEBOT, P2: MiniMax
Results | 957:27:16 | 950:11:39 | 862:32:106 | 823:51:126 | 850:13:137
Average time (ms) | 290.34 | 81.08 | 23.69 | 5.03 | 0.83
TABLE X: Results of running different node limits on a 3x4 board for MiniMax vs BaseBot

4x4 | MiniMax WIN : DRAW : BASEBOT WIN
Maximum number of nodes | 100000000 | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Results | 912:0:88 | 885:0:115 | 883:0:117 | 828:0:172 | 681:0:319
Average time (ms) | 786.10 | 175.51 | 37.15 | 9.34 | 1.51
P1: BASEBOT, P2: MiniMax
Results | 948:0:52 | 951:0:49 | 911:0:89 | 889:0:111 | 834:0:166
Average time (ms) | 964.47 | 237.20 | 36.97 | 9.12 | 1.93
TABLE XI: Results of running different node limits on a 4x4 board for MiniMax vs BaseBot

5x5 | MiniMax WIN : DRAW : BASEBOT WIN
Maximum number of nodes | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Results | 93:1:6 | 79:8:13 | 72:7:21 | 56:15:29
Average time (ms) | 325.91 | 84.13 | 12.76 | 2.47
P1: BASEBOT, P2: MiniMax
Results | 93:3:4 | 84:0:16 | 64:7:29 | 58:12:30
Average time (ms) | 366.33 | 76.84 | 11.15 | 2.18
TABLE XII: Results of running different node limits on a 5x5 board for MiniMax vs BaseBot

6x6 | MiniMax WIN : BASEBOT WIN
Maximum number of nodes | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Results | 89:11 | 81:19 | 70:30 | 31:69
Average time (ms) | 597.82 | 77.49 | 21.26 | 2.64
P1: BASEBOT, P2: MiniMax
Results | 82:18 | 76:24 | 71:29 | 60:40
Average time (ms) | 603.66 | 96.24 | 28.63 | 2.46
TABLE XIII: Results of running different node limits on a 6x6 board for MiniMax vs BaseBot
C. BaseBot+ vs BaseBot

BASEBOT+ WIN : DRAW : BASEBOT WIN
Board Size | 3x3 | 4x4 | 5x5
P1: BASEBOT+, P2: BASEBOT
Results | 75:20:5 | 44:56 | 53:14:33
Average time (ms) | 0.07 | 0.03 | 0.02
P1: BASEBOT, P2: BASEBOT+
Results | 41:34:25 | 69:31 | 47:11:42
Average time (ms) | 0.06 | 0.02 | 0.07
TABLE XIV: Results of BaseBot+ vs BaseBot (the 4x4 entries have no draw column, as a 4x4 board has an odd number of boxes)

D. MiniMax vs MCTS

3x3 | MCTS WIN : MiniMax WIN
MiniMax maximum nodes | 10000000 | 1000000 | 100000
P1: MiniMax, P2: MCTS 100
Results | 0:100 | 0:100 | 0:100
MCTS 100 average time (ms) | 18.05 | 17.54 | 17.83
MiniMax average time (ms) | 29.28 | 3.96 | 2.79
P1: MCTS 100, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 100 average time (ms) | 17.45 | 17.47 | 18.11
MiniMax average time (ms) | 30.29 | 4.97 | 1.96
P1: MiniMax, P2: MCTS 1000
Results | 0:100 | 0:100 | 0:100
MCTS 1000 average time (ms) | 34.98 | 34.05 | 35.98
MiniMax average time (ms) | 26.79 | 4.55 | 4.05
P1: MCTS 1000, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 1000 average time (ms) | 35.51 | 42.13 | 38.33
MiniMax average time (ms) | 33.04 | 7.41 | 1.39
P1: MiniMax, P2: MCTS 10000
Results | 0:100 | 0:100 | 0:100
MCTS 10000 average time (ms) | 235.22 | 268.97 | 248.95
MiniMax average time (ms) | 37.40 | 8.50 | 4.62
P1: MCTS 10000, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 10000 average time (ms) | 269.01 | 282.85 | 324.24
MiniMax average time (ms) | 51.64 | 9.25 | 2.71
TABLE XV: Results of running different node limits and different numbers of simulations on a 3x3 board for MiniMax vs MCTS

4x4 | MCTS WIN : MiniMax WIN
MiniMax maximum nodes | 10000000 | 1000000 | 100000
P1: MiniMax, P2: MCTS 100
Results | 0:100 | 0:100 | 0:100
MCTS 100 average time (ms) | 27.97 | 30.06 | 30.47
MiniMax average time (ms) | 178.32 | 27.48 | 4.61
P1: MCTS 100, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 100 average time (ms) | 33.06 | 34.13 | 36.69
MiniMax average time (ms) | 139.61 | 20.98 | 6.34
P1: MiniMax, P2: MCTS 1000
Results | 0:100 | 0:100 | 0:100
MCTS 1000 average time (ms) | 243.44 | 288.17 | 265.20
MiniMax average time (ms) | 221.14 | 39.94 | 6.85
P1: MCTS 1000, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 1000 average time (ms) | 334.82 | 335.13 | 351.78
MiniMax average time (ms) | 179.31 | 28.38 | 8.93
P1: MiniMax, P2: MCTS 10000
Results | 0:100 | 0:100 | 0:100
MCTS 10000 average time (ms) | 3103.19 | 3262.64 | 3289.44
MiniMax average time (ms) | 299.08 | 42.50 | 8.30
P1: MCTS 10000, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 10000 average time (ms) | 3982.18 | 4071.98 | 4171.94
MiniMax average time (ms) | 203.89 | 46.63 | 15.64
TABLE XVI: Results of running different node limits and different numbers of simulations on a 4x4 board for MiniMax vs MCTS

5x5 | MCTS WIN : MiniMax WIN
MiniMax maximum nodes | 10000000 | 1000000 | 100000
P1: MiniMax, P2: MCTS 100
Results | 0:100 | 0:100 | 0:100
MCTS 100 average time (ms) | 130.44 | 130.30 | 135.15
MiniMax average time (ms) | 406.10 | 76.07 | 16.16
P1: MCTS 100, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 100 average time (ms) | 103.69 | 142.30 | 110.50
MiniMax average time (ms) | 388.85 | 94.96 | 12.79
P1: MiniMax, P2: MCTS 1000
Results | 0:100 | 0:100 | 0:100
MCTS 1000 average time (ms) | 1433.06 | 1527.10 | 1678.83
MiniMax average time (ms) | 446.53 | 76.55 | 19.59
P1: MCTS 1000, P2: MiniMax
Results | 0:100 | 0:100 | 0:100
MCTS 1000 average time (ms) | 1120.62 | 1549.89 | 1471.15
MiniMax average time (ms) | 70.09 | 91.96 | 14.93
TABLE XVII: Results of running different node limits and different numbers of simulations on a 5x5 board for MiniMax vs MCTS

E. MiniMax vs BaseBot+

MiniMax WIN : DRAW : BASEBOT+ WIN
Maximum number of nodes | 10000000 | 1000000 | 100000
3x3
P1: MiniMax, P2: BASEBOT+
Results | 100:0:0 | 100:0:0 | 100:0:0
Average time (ms) | 24.71 | 6.23 | 3.12
P1: BASEBOT+, P2: MiniMax
Results | 41:50:9 | 48:43:9 | 38:44:18
Average time (ms) | 27.69 | 6.37 | 1.51
4x4
P1: MiniMax, P2: BASEBOT+
Results | 87:13 | 85:15 | 74:26
Average time (ms) | 73.43 | 33.78 | 19.42
P1: BASEBOT+, P2: MiniMax
Results | 99:1 | 94:3 | 85:15
Average time (ms) | 94.45 | 40.10 | 10.36
5x5
P1: MiniMax, P2: BASEBOT+
Results | 83:10:7 | 77:9:14 | 78:8:14
Average time (ms) | 329.86 | 73.16 | 9.76
P1: BASEBOT+, P2: MiniMax
Results | 64:12:24 | 66:4:30 | 58:10:32
Average time (ms) | 77.18 | 88.66 | 12.65
TABLE XVIII: Results of running different node limits for MiniMax vs BaseBot+
F. Neural Network results

3x3 | NEURAL WIN : DRAW : NEURAL LOSS
P1: NEURAL NETWORK
Bots | P2: RANDOMBOT | P2: BASEBOT | P2: MiniMax
Results | 985:15:0 | 509:260:231 | 154:749:97
Average time (ms) | 7.01 | 11.55 | 16.69
TABLE XIX: Results of the Neural Network

VII. DISCUSSION

A. MCTS

1) MCTS vs BaseBot with different numbers of simulations: As we can see from the results in Tables I, II, III and IV, looking at the games played on a 3x4 board we notice that with 1000 simulations MCTS was able to draw the most games and win 3; in this scenario BaseBot won the least amount of times. When looking at Table III, we see that MCTS does not perform well, and we see no difference between running 100 and 1000 simulations. Furthermore, looking at Table IV, we notice that the same pattern is followed as in Table III.
2) MCTS vs RandomBot with different numbers of simulations: As we can see from the results in Tables V, VI, VII and VIII, MCTS wins the vast majority of games against RandomBot. The 3x3 board in Table V is where RandomBot performed the best against MCTS; this is due to the small board size, where strategy is not the dominant factor. Moving on to the larger board sizes, we can see that MCTS's win rate starts increasing, firstly on the 3x4 board in Table VI, where MCTS has a win rate between 77% and 93%. It is important to note that since both bots depend on randomness, some games may be played perfectly while others may not. On the 4x4 board in Table VII we see identical performance at 100 and 1000 simulations, where MCTS's win rate is 93%, and a 99% win rate at 10 000 simulations. Lastly, on the 5x5 board in Table VIII it is clear that MCTS wins all games against RandomBot regardless of the number of simulations; this is the opposite of the 3x3 case, as this is a larger board and strategy is more important.

In addition, when attempting to run the algorithm with 10 000 simulations on 5x5 and larger board sizes, we encounter an outOfMemory error. As such, these board sizes and simulation values are infeasible, as the assumption is made that a public user would encounter the same error.

B. MiniMax vs BaseBot

MiniMax plays optimally on combinations of board size and maximum node expansion where it can generate the tree for the entire game. As shown by the 3x3 board results, if player 1 plays optimally, he will win. On a 3x3 board there cannot be more than one long chain, so player 2's only option is to block any 3-chain from being created in the first place, which can be counter-played, giving an optimal player 1 the advantage. On the 3x3 board, past a maximum node expansion of 100,000 there is limited gain in results, yet a persistent growth in processing time.

On the 3x4 board size the reverse dynamic plays out: since there is an even number of dots, player 2 wants an odd number of long chains while player 1 requires an even number of them. On such a small board there is only one set-up with two long chains (when every edge placed down is horizontal), yet the board is still big enough that blocking every long chain is difficult, so player 2 has the advantage. There is a clear increase in win rate with higher depths allowed for MiniMax, as well as a consistent increase in processing time.

On the 4x4 board, player 2 still has a slight advantage, as setting up a single long chain and blocking the others is easier, but as the boards get bigger this advantage deteriorates.

Throughout the results, we see a consistent increase in win rate and processing time as the maximum node limit rises. The only exception to the increase in results is between the 100,000,000 and 10,000,000 limits on the smaller board sizes (3x4 and 4x4), as there is not a substantial difference in depth limit on those boards: the full game tree quickly falls under 100,000,000 nodes.

MiniMax beats BaseBot under all conditions except a node limit of 10,000 on a 6x6 grid; at that point many of the depth limits will be 2 or 3, so it is hard for it to build a successful chain structure or to open up the shortest chain at the end, when it cannot see how its move will turn out.

C. BaseBot+ vs BaseBot

On boards where the long chain rule works to BaseBot+'s advantage, it has an edge over BaseBot, as shown by the 3x3 and 4x4 board results, where player 1 and player 2, respectively, have the advantage. On the 3x3 board, BaseBot+ manages a 75% win rate as player 1, while BaseBot only manages a 25% win rate as player 1. On the 4x4 boards, BaseBot+ accomplishes a 69% win rate as player 2, while BaseBot achieves a 56% win rate. On the 5x5 board BaseBot+'s edge shrinks: as its heuristics are based on the chain rule, it is difficult to build an effective chain structure when you can only look at the next move. However, it still maintains a clear advantage, winning 20% more games as player 1 and 5% more games as player 2.

D. MiniMax vs MCTS

MiniMax beats MCTS every time, no matter the board size or who is assigned to which player. Processing time increases for MCTS as the number of simulations increases, and likewise for MiniMax as the maximum node limit increases.

E. MiniMax vs BaseBot+

When BaseBot+ has the chain rule advantage on the 3x3 board as player 1, it gets only a 9% win rate on both the 10,000,000 and 1,000,000 node limits, and an 18% win rate on the 100,000 limit. This is compared to BaseBot, which gets a 14.1%/12.5%/21% win rate against MiniMax in the same settings. BaseBot+'s deficit in win rate compared to BaseBot increases as the node limit increases, with a deficit of -5.1%/-3.5%/-3%. On the 4x4 board size as player 2, it has a win rate of 13%/15%/26%, whereas BaseBot had a win rate of 11.5%/11.7%/17.2%; so the difference between them, 1.5%/3.3%/8.8%, increases as the node limit decreases. BaseBot+ performs comparatively worse than BaseBot the higher the depth limit for MiniMax. This may be because BaseBot+ uses the same strategies as MiniMax, so BaseBot+ will likely play the move that MiniMax considers optimal for the opponent, whereas BaseBot might choose a completely different move, which might work out better for it on smaller board sizes. This is likely the reason why MiniMax beats BaseBot+ by a larger margin when BaseBot+ plays first on 4x4.

On the 5x5 board size, BaseBot+'s advantage over BaseBot in games versus MiniMax becomes more apparent: when playing as player 1, BaseBot+ has a win rate of 24%/30%/32%, whereas BaseBot has a win rate of 4%/16%/29%. This shows that BaseBot+ has a clear advantage on larger board sizes, and that the advanced strategy is better for larger board sizes even when only the next move is considered.

F. Neural Network results

The neural network beats RandomBot in 98.5% of played games. It has a 50.9% win rate against BaseBot, which is lower than both BaseBot+ (by 24.1%) and MiniMax (by 49.1%) on that board and player position. It has a 15.4% win rate against MiniMax, which is worse than both BaseBot (by 23.6%) and BaseBot+ (by 25%).
VIII. CONCLUSION

The number of simulations for Monte Carlo tree search has little effect on the results, but increasing the simulations increases the processing time. Increasing the maximum number of nodes for MiniMax increases win rates and processing time; however, the increase in win rates becomes less pronounced the smaller the board. BaseBot+'s advantage over BaseBot proves the advanced strategy is better even when only considering the next move. MiniMax with a node limit of 1,000,000 consistently performs in a reasonable time, with no average move time exceeding 100 milliseconds even on the larger board sizes, and of the configurations within this budget it beats BaseBot by the largest margin. MiniMax also proves to be the best algorithm overall, consistently getting a higher win rate than its opponent by a large margin.

LIST OF FIGURES

1 Player one turn.
2 Player two turn.
3 Box filled by player two.
4 Game results.

LIST OF TABLES

I Results of running different numbers of simulations on a 3x3 board for MCTS vs BaseBot
II Results of running different numbers of simulations on a 3x4 board for MCTS vs BaseBot
III Results of running different numbers of simulations on a 4x4 board for MCTS vs BaseBot
IV Results of running different numbers of simulations on a 5x5 board for MCTS vs BaseBot
V Results of running different numbers of simulations on a 3x3 board for MCTS vs RandomBot
VI Results of running different numbers of simulations on a 3x4 board for MCTS vs RandomBot
VII Results of running different numbers of simulations on a 4x4 board for MCTS vs RandomBot
VIII Results of running different numbers of simulations on a 5x5 board for MCTS vs RandomBot
IX Results of running different node limits on a 3x3 board for MiniMax vs BaseBot
X Results of running different node limits on a 3x4 board for MiniMax vs BaseBot
XI Results of running different node limits on a 4x4 board for MiniMax vs BaseBot
XII Results of running different node limits on a 5x5 board for MiniMax vs BaseBot
XIII Results of running different node limits on a 6x6 board for MiniMax vs BaseBot
XIV Results of BaseBot+ vs BaseBot
XV Results of running different node limits and different numbers of simulations on a 3x3 board for MiniMax vs MCTS
XVI Results of running different node limits and different numbers of simulations on a 4x4 board for MiniMax vs MCTS
XVII Results of running different node limits and different numbers of simulations on a 5x5 board for MiniMax vs MCTS
XVIII Results of running different node limits for MiniMax vs BaseBot+
XIX Results of the Neural Network

REFERENCES

[1] Joseph K. Barker and Richard E. Korf. Solving dots-and-boxes. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI'12, pages 414–419. AAAI Press, 2012.
[2] Daniel Allcock. Best play in dots and boxes endgames, 2019.
[3] S. Li, Y. Zhang, M. Ding, and P. Dai. Research on integrated computer game algorithm for dots and boxes. The Journal of Engineering, 2020(13):601–606, 2020.
[4] E. R. Berlekamp. The Dots and Boxes Game: Sophisticated Child's Play. CRC Press, 2000.
[5] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, USA, 3rd edition, 2009.
[6] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1–43, 2012.
[7] Karol Walędzik and Jacek Mańdziuk. Multigame playing by means of UCT enhanced with automatically generated evaluation functions. In Jürgen Schmidhuber, Kristinn R. Thórisson, and Moshe Looks, editors, Artificial General Intelligence, pages 327–332, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
IX. APPENDIX
A. MiniMax vs MCTS results
3x3 | MCTS WIN : MiniMax WIN
MiniMax maximum nodes 10000000 1000000 100000
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 17 17 17
Q1 (ms) 16 16 16
MCTS 100 Q3 (ms) 19 18 18
Average time (ms) 18.05 17.54 17.83
SD 3.74 4.14 5.47
Median time (ms) 0 0 0
Q1 (ms) 0 0 0
MinMax Q3 (ms) 16 6 2
Average time (ms) 29.28 3.96 2.79
SD 70.24 11.40 9.97
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 17 17 17
Q1 (ms) 16 16 16
MCTS 100 Q3 (ms) 18.00 18 19
Average time (ms) 17.45 17.47 18.11
SD 3.68 4.12 5.89
Median time (ms) 0.00 0 0.00
Q1 (ms) 0.00 0 0.00
MinMax Q3 (ms) 67.00 2 2.00
Average time (ms) 30.29 4.97 1.96
SD 61.71 14.95 11.44
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 29 29 31.5
Q1 (ms) 19.75 18.75 20
MCTS 1000 Q3 (ms) 46.00 42 45.25
Average time (ms) 34.98 34.05 35.98
SD 27.44 39.94 31.30
Median time (ms) 0.00 0 0
Q1 (ms) 0.00 0 0
MinMax Q3 (ms) 13.00 6 2
Average time (ms) 26.79 4.55 4.05
SD 70.53 15.05 16.02
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 26 30 28
Q1 (ms) 17 17 17
MCTS 1000 Q3 (ms) 46 53 50
Average time (ms) 35.51 42.13 38.33
SD 31.53 35.59 31.96
Median time (ms) 0 0 0
Q1 (ms) 0 0 0
MinMax Q3 (ms) 68 4 2
Average time (ms) 33.04 7.41 1.39
SD 76.39 19.82 4.28
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 109.5 129 150
Q1 (ms) 30.75 30.25 33
MCTS 10000 Q3 (ms) 394.5 456.25 443
Average time (ms) 235.22 268.97 248.95
SD 253.80 295.10 254.19
Median time (ms) 0 0 0
Q1 (ms) 0 0 0
MinMax Q3 (ms) 15 8 2
Average time (ms) 37.40 8.50 4.62
SD 92.26 24.40 16.53
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 119.5 139.5 164
Q1 (ms) 21 21 22
MCTS 10000 Q3 (ms) 430 488.25 565.25
Average time (ms) 269.01 282.85 324.24
SD 301.62 315.33 349.70
Median time (ms) 0 0 0.00
Q1 (ms) 0 0 0.00
MinMax Q3 (ms) 92.00 3 3.00
Average time (ms) 51.64 9.25 2.71
SD 115.35 27.15 12.39
4x4 | MCTS WIN : MiniMax WIN

MiniMax maximum nodes 10000000 1000000 100000
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 26 29 29
Q1 (ms) 17 20 21
MCTS 100 Q3 (ms) 35 37 36
Average time (ms) 27.97 30.06 30.47
SD 14.60 17.50 17.82
Median time (ms) 40 5 4
Q1 (ms) 0 0 1
MinMax Q3 (ms) 240.25 40 6
Average time (ms) 178.32 27.48 4.61
SD 277.19 44.40 6.56
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 31 31 32.5
Q1 (ms) 22 22.75 23
MCTS 100 Q3 (ms) 40.00 41 43
Average time (ms) 33.06 34.13 36.69
SD 18.46 18.75 23.08
Median time (ms) 22.00 4 2.00
Q1 (ms) 0.00 0 0.00
MinMax Q3 (ms) 172.00 28 8.00
Average time (ms) 139.61 20.98 6.34
SD 261.64 35.26 10.59
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 168.5 270 254
Q1 (ms) 32 69.75 74.75
MCTS 1000 Q3 (ms) 406.25 463.25 428
Average time (ms) 243.44 288.17 265.20
SD 225.06 226.25 198.46
Median time (ms) 53.00 7 4
Q1 (ms) 0.00 0 1
MinMax Q3 (ms) 286.00 50 8
Average time (ms) 221.14 39.94 6.85
SD 334.56 66.03 14.76
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 256 295.5 336
Q1 (ms) 90 93.75 100
MCTS 1000 Q3 (ms) 517.25 518.25 545
Average time (ms) 334.82 335.13 351.78
SD 283.18 267.74 265.36
Median time (ms) 24 7 3
Q1 (ms) 0 0 0
MinMax Q3 (ms) 220.25 35 10
Average time (ms) 179.31 28.38 8.93
SD 330.29 49.83 18.01
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 2124.5 2925 2955.5
Q1 (ms) 141.5 766 861.25
MCTS 10000 Q3 (ms) 5934.75 5965 5796.5
Average time (ms) 3103.19 3262.64 3289.44
SD 3028.64 2698.37 2614.49
Median time (ms) 88.5 10 5
Q1 (ms) 0 0 1
MinMax Q3 (ms) 392.5 65 11
Average time (ms) 299.08 42.50 8.30
SD 452.27 67.28 14.62
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 3617.5 3838 3740
Q1 (ms) 973 1384.5 1215.5
MCTS 10000 Q3 (ms) 6738.5 6905.75 6952.75
Average time (ms) 3982.18 4071.98 4171.94
SD 3163.68 3078.93 3076.08
Median time (ms) 28 13 6.00
Q1 (ms) 0 0 0.00
MinMax Q3 (ms) 305.00 71 19.00
Average time (ms) 203.89 46.63 15.64
SD 361.08 72.09 25.53
5x5 | MCTS WIN : MiniMax WIN

MiniMax maximum nodes 10000000 1000000 100000
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 83 101 107.5
Q1 (ms) 39.25 41 48
MCTS 100 Q3 (ms) 192 185.5 184.75
Average time (ms) 130.44 130.30 135.15
SD 120.46 111.97 110.76
Median time (ms) 82.5 18 7
Q1 (ms) 14 6 0.75
MinMax Q3 (ms) 449 72.5 14.25
Average time (ms) 406.10 76.07 16.16
SD 831.12 133.26 30.99
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 66.5 109.5 90
Q1 (ms) 24.75 48 37
MCTS 100 Q3 (ms) 155.25 192 154.25
Average time (ms) 103.69 142.30 110.50
SD 102.17 125.85 96.72
Median time (ms) 159.50 35.5 5.00
Q1 (ms) 22.00 8 1.00
MinMax Q3 (ms) 473.00 131.25 16.00
Average time (ms) 388.85 94.96 12.79
SD 564.61 146.60 23.14
P1: MiniMax P2: MCTS
Results 0:100 0:100 0:100
Median time (ms) 928 1100.5 1498
Q1 (ms) 340.5 398.75 505.75
MCTS 1000 Q3 (ms) 2426.25 2712.25 2693.5
Average time (ms) 1433.06 1527.10 1678.83
SD 1290.33 1250.04 1242.17
Median time (ms) 94.50 17 8
Q1 (ms) 11.00 5.75 1
MinMax Q3 (ms) 490.25 77.25 22
Average time (ms) 446.53 76.55 19.59
SD 873.70 132.58 35.01
P1: MCTS P2: MiniMax
Results 0:100 0:100 0:100
Median time (ms) 970 1276 1277.5
Q1 (ms) 352.25 455.5 399.25
MCTS 1000 Q3 (ms) 1793 2572.25 2458.75
Average time (ms) 1120.62 1549.89 1471.15
SD 859.48 1232.06 1202.92
Median time (ms) 22.5 33 6
Q1 (ms) 5 8 1
MinMax Q3 (ms) 97.25 124 19
Average time (ms) 70.09 91.96 14.93
SD 104.30 125.61 23.35

B. MiniMax vs BaseBot results

3x3 | MiniMax WIN: DRAW: BASEBOT WIN


Maximum number of nodes 100000000 10000000 1000000 100000 10000
P1: MiniMax P2: BASEBOT
Result 1000:0:0 1000:0:0 1000:0:0 1000:0:0 907:56:37
Median time (ms) 0 0 0 0 0
Q1 (ms) 0 0 0 0 0
Q3 (ms) 145 69 12 3 0
Average time (ms) 193.55 45.88 26.29 2.44 0.43
SD 420.67 86.92 62.36 5.91 1.70
P1: BASEBOT P2: MiniMax
Result 390:469:141 379:480:141 415:460:125 382:408:210 363:411:226
Median time (ms) 0 0 0 0 0
Q1 (ms) 0 0 0 0 0
Q3 (ms) 53 54 3 3 1
Average time (ms) 63.11 29.36 4.25 1.73 1.02
SD 128.75 53.20 9.48 3.30 2.84
3x4 | MiniMax WIN: DRAW: BASEBOT WIN


Maximum number of nodes 100000000 10000000 1000000 100000 10000
P1: MiniMax P2: BASEBOT
Result 791:51:158 739:58:203 741:24:235 781:44:175 667:20:313
Median time (ms) 42 9 2 1 0
Q1 (ms) 0 0 0 0 0
Q3 (ms) 380 105 18 5 2
Average time (ms) 464.76 136.63 22.01 4.14 1.60
SD 854.47 261.58 43.38 7.36 4.13
P1: BASEBOT P2: MiniMax
Result 957:27:16 950:11:39 862:32:106 823:51:126 850:13:137
Median time (ms) 2 2 2 0 0
Q1 (ms) 0 0 0 0 0
Q3 (ms) 221 72 25 8 1
Average time (ms) 290.34 81.08 23.69 5.03 0.83
SD 704.81 157.14 43.78 9.80 1.91

4x4 | MiniMax WIN: BASEBOT WIN


Maximum number of nodes 100000000 10000000 1000000 100000 10000
P1: MiniMax P2: BASEBOT
Result 912:88 885:115 883:117 828:172 681:319
Median time (ms) 46 39 8 3 1
Q1 (ms) 0 0 0 1 0
Q3 (ms) 1027.25 248.25 60 15 2
Average time (ms) 786.10 175.51 37.15 9.34 1.51
SD 1394.89 293.86 56.79 13.09 2.21
P1: BASEBOT P2: MiniMax
Result 948:52 951:49 911:89 889:111 834:166
Median time (ms) 50 20 5 4 1
Q1 (ms) 0 0 0 0 0
Q3 (ms) 1344.25 376 54 14 2
Average time (ms) 964.47 237.20 36.97 9.12 1.93
SD 1801.12 389.26 60.73 13.68 3.70
5x5 | MiniMax WIN : DRAW : BASEBOT WIN

Maximum number of nodes | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Result | 93:1:6 | 79:8:13 | 72:7:21 | 56:15:29
Median time (ms) | 148 | 41 | 9 | 1
Q1 (ms) | 8 | 5 | 1 | 0
Q3 (ms) | 418 | 130 | 16 | 3
Average time (ms) | 325.91 | 84.13 | 12.76 | 2.47
SD | 490.14 | 109.32 | 17.48 | 4.87
P1: BASEBOT, P2: MiniMax
Result | 93:3:4 | 84:0:16 | 64:7:29 | 58:12:30
Median time (ms) | 170 | 41 | 7 | 11
Q1 (ms) | 7 | 5 | 1 | 0
Q3 (ms) | 463.25 | 118 | 13 | 2
Average time (ms) | 366.33 | 76.84 | 11.15 | 2.18
SD | 562.59 | 96.80 | 15.75 | 5.25

6x6 | MiniMax WIN : BASEBOT WIN

Maximum number of nodes | 10000000 | 1000000 | 100000 | 10000
P1: MiniMax, P2: BASEBOT
Result | 89:11 | 81:19 | 70:30 | 31:69
Median time (ms) | 393 | 39 | 11 | 1
Q1 (ms) | 102.5 | 18 | 1 | 1
Q3 (ms) | 878 | 87 | 28 | 3
Average time (ms) | 597.82 | 77.49 | 21.26 | 2.64
SD | 648.25 | 112.59 | 29.89 | 3.92
P1: BASEBOT, P2: MiniMax
Result | 82:18 | 76:24 | 71:29 | 60:40
Median time (ms) | 389 | 56 | 13 | 1
Q1 (ms) | 81 | 22 | 1 | 1
Q3 (ms) | 883 | 103 | 41 | 3
Average time (ms) | 603.66 | 96.24 | 28.63 | 2.46
SD | 662.43 | 135.80 | 40.78 | 5.23

C. Neural Network results

3x3 | NETWORK WIN : DRAW : RANDOMBOT WIN
P1: NEURAL NETWORK, P2: RANDOMBOT
Result | 985:15:0
Median time (ms) | 6
Q1 (ms) | 5
Q3 (ms) | 7
Average time (ms) | 7.01
SD | 20.24

3x3 | NETWORK WIN : DRAW : BASEBOT WIN
P1: NEURAL NETWORK, P2: BASEBOT
Result | 509:260:231
Median time (ms) | 8
Q1 (ms) | 6
Q3 (ms) | 12.75
Average time (ms) | 11.55
SD | 34.26

D. BaseBot+ results

3x3 | BASEBOT+ WIN : DRAW : BASEBOT WIN
P1: BASEBOT+, P2: BASEBOT
Result | 75:20:5
Median time (ms) | 0
Q1 (ms) | 0
Q3 (ms) | 0
Average time (ms) | 0.07
SD | 0.43
P1: BASEBOT, P2: BASEBOT+
Result | 41:34:25
Median time (ms) | 0
Q1 (ms) | 0
Q3 (ms) | 0
Average time (ms) | 0.06
SD | 0.83

4x4 | BASEBOT+ WIN : BASEBOT WIN
P1: BASEBOT+, P2: BASEBOT
Result | 44:56
Median time (ms) | 0
Q1 (ms) | 0
Q3 (ms) | 0
Average time (ms) | 0.03
SD | 0.30
P1: BASEBOT, P2: BASEBOT+
Result | 69:31
Median time (ms) | 0
Q1 (ms) | 0
Q3 (ms) | 0
Average time (ms) | 0.02
SD | 0.26

5x5 | BASEBOT+ WIN : DRAW : BASEBOT WIN
P1: BASEBOT+, P2: BASEBOT
Result | 53:14:33
Median time (ms) | 0
Q1 (ms) | 0
Q3 (ms) | 0
Average time (ms) | 0.06
SD | 0.36
P1: BASEBOT, P2: BASEBOT+
Result | 47:11:42
Median time (ms) | 0
Q1 (ms) | 0
Q3 (ms) | 0
Average time (ms) | 0.07
SD | 0.59