
Playing Backgammon with Artificial Intelligence

Ali Sharafat & Ahmadreza Momeni


Stanford University, Computer Science Department
{sharafat,amomenis}@stanford.edu

Abstract

The purpose of our work is to create a smart agent that can play the game of backgammon better than humans. We have implemented the game of backgammon based on the Pac-Man implementation in the homework (without the GUI). Since the game tree is very wide, methods such as minimax and expectiminimax are practically impossible. Alpha-beta pruning works, but the main objective becomes having an accurate evaluation function. Our main contribution is extending the TD-Gammon implementation, which is essentially rudimentary TD-learning applied to a minimal representation of backgammon. We have experimented with different architectures of neural nets as well as different representations of the game. We have also run other experiments, such as changing the depth of the tree that the algorithm explores.


Problem Definition

• Backgammon is a board game played with dice and checkers in which players try to be the first to gather their pieces into one corner and then systematically remove them from the board.

Figure 1: The initial board setup

The essential rules of the game are the following:

– Checkers move in a clockwise direction.
– Players may move 1, 2, 3, or 4 checkers in a turn, depending on the dice values thrown and the player's choice of available moves.
– For a move to be allowed, the destination point must have no more than one of the opponent's checkers on it.
– If a point has only one checker on it, the other player may move their checker(s) to it and send it to the bar, which runs down the middle of the board.
– A player may not make any further moves until all their checkers on the bar have been brought back into play.
– Once a player has moved all their checkers around the board and into their home board, they may begin bearing them off.

Results

Figure 2: Win rate vs. random player after training for n games

The vanilla game has a one-layer NN with 50 hidden units and 294 features. Feature1 has 390 features and mostly expands on the state of the columns; feature2 has 326 features and mostly expands on the state of checkers off the board.

Analysis

We observe that the neural net trains rather quickly to comprehensively beat a random player. Increasing the number of features makes the model train faster. Increasing the depth of the NN decreases the win rate, but this might be due to the learning parameters that we are using. We will need to compare the loss plots to see the true difference between the variations of the model.
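As a concrete illustration of the movement rules listed above, here is a minimal sketch of a point-to-point legality check and blot hitting. The board representation (a list of 24 signed checker counts) is our own illustrative assumption, not the project's actual data structure.

```python
# Hypothetical board representation: board[i] is the number of checkers on
# point i, positive for the agent's checkers, negative for the opponent's.

def is_legal_destination(board, point):
    """A point is open if the opponent has at most one checker on it."""
    return board[point] >= -1  # 0 empty, >0 ours, -1 a lone blot we may hit

def apply_move(board, src, dst, bar_opponent):
    """Move one agent checker from src to dst, hitting a blot if present."""
    new_board = list(board)
    new_board[src] -= 1
    if new_board[dst] == -1:      # lone opposing checker: send it to the bar
        new_board[dst] = 0
        bar_opponent += 1
    new_board[dst] += 1
    return new_board, bar_opponent
```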

Challenges

• The game has a large state space (at least 10^20 states) [1]
• Inherent randomness in the game: a pair of dice dictates the set of possible next moves
• Large branching factor in the game tree (each roll of the dice can create 154 possible moves)
• The game tree is deep: each game takes about 50–60 turns
• It is difficult to handcraft good evaluation functions, since human-created heuristics for the game are not perfect

Ongoing Work

• Obtain loss plots for different models
• Add 2-ply and 3-ply models for smarter players
• Play games against the oracle to compare our best model against human-level players
• Increase the number of training games up to 1M
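The 2-ply and 3-ply players above amount to running a deeper depth-limited minimax search. The sketch below shows that recursion; the game-state interface (is_end, utility, actions, successor, agent_to_move) and the evaluate fallback are hypothetical stand-ins for the project's actual code.

```python
# Sketch of depth-limited minimax: recurse through agent and opponent
# moves, falling back to a learned evaluation function once the depth
# budget runs out. The state interface here is a hypothetical stand-in.

def minimax_value(state, depth, evaluate):
    if state.is_end():
        return state.utility()
    if depth == 0:
        return evaluate(state)  # learned value estimate at the horizon
    succs = [state.successor(a) for a in state.actions()]
    if state.agent_to_move():
        # one ply is an agent move plus the opponent's reply, so the
        # depth budget is only decremented on the opponent's turn
        return max(minimax_value(s, depth, evaluate) for s in succs)
    return min(minimax_value(s, depth - 1, evaluate) for s in succs)
```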

Approaches

1. Implement backgammon in Python (starting from the Pac-Man implementation)
2. Define backgammon as a game problem
3. Design a neural network (v_{1:k}) to estimate the value function:

   V̂(s, w, v_{1:k}) = Σ_{j=1}^{k} w_j σ(v_j · φ(s))

4. Train the neural network using TD-learning:

   w ← w − η [V̂(s, w) − (r + γ V̂(s′, w))] ∇_w V̂(s, w)

5. Play the game using the estimated value function together with known strategies such as the minimax policy with limited-depth search:

   V_{max,min}(s, d) =
     Utility(s)                                           if IsEnd(s)
     V̂(s, w)                                              if d = 0
     max_{a ∈ Actions(s)} V_{max,min}(Succ(s, a), d)      if Player(s) = agent
     min_{a ∈ Actions(s)} V_{max,min}(Succ(s, a), d − 1)  if Player(s) = opp
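Steps 3–4 can be sketched together as follows. The feature size, hidden size, and hyperparameters below are illustrative assumptions, not the settings actually used in our experiments.

```python
import numpy as np

# Sketch of a one-hidden-layer value network
#   V̂(s, w, v) = sum_j w_j * sigmoid(v_j · φ(s))
# and one TD(0) update step for its parameters.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def value(phi, w, v):
    """V̂(s) for a feature vector phi; v has shape (k, d), w has shape (k,)."""
    return float(w @ sigmoid(v @ phi))

def td_update(phi, phi_next, reward, w, v, eta=0.1, gamma=1.0):
    """One TD(0) step: w ← w − η (V̂(s) − (r + γ V̂(s'))) ∇w V̂(s)."""
    delta = value(phi, w, v) - (reward + gamma * value(phi_next, w, v))
    h = sigmoid(v @ phi)              # hidden activations
    w_new = w - eta * delta * h       # ∇w V̂(s) = σ(v · φ(s))
    # the same TD error also flows back to the hidden-layer weights
    grad_v = (w * h * (1 - h))[:, None] * phi[None, :]
    v_new = v - eta * delta * grad_v
    return w_new, v_new
```

With γ = 0 and a fixed reward, repeated updates simply regress V̂(s) toward that reward, which is a quick sanity check on the gradient signs.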
References

[1] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.

Acknowledgements

We would like to thank Prof. Liang, Prof. Efron, and our mentor Zhi Bie for the patient guidance, encouragement, and advice they have provided throughout this project. We have been extremely lucky to have a mentor who cared so much about our work and who responded to our questions and queries so promptly.
