A Pebble Game Chip Featuring AI Driven by Min-Max Algorithm

Yitong Dai

The chip was designed specifically to play a pebble game. Figure 1 demonstrates how the game is played. The board has two rows of two squares: the top row belongs to the PC player and the bottom row belongs to the human player. At the beginning of a match, each square holds two pebbles, so each player has four pebbles in his own row. On his turn, a player must select a square in his own row that contains at least one pebble. The pebbles in this square are then distributed to the squares that follow it in clockwise order. Figure 1 shows what the game board looks like after the PC player chooses the top left square. A player loses once the sum of pebbles in his row is zero. I chose this topic because I am interested in using digital circuit design to solve practical problems, and I was curious about the performance difference between a software solution and a hardware solution designed to solve the same problem.
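
The rules above fit in a few lines of C++. This is a minimal sketch, not the author's software solution: it assumes the four squares are indexed clockwise as 0 = top left, 1 = top right, 2 = bottom right, 3 = bottom left, and the names Board, sow, pcRowEmpty and humanRowEmpty are hypothetical.

```cpp
#include <array>
#include <cassert>

// Assumed layout: squares indexed clockwise as in Figure 1.
// 0 = top left, 1 = top right (PC row); 2 = bottom right, 3 = bottom left (human row).
using Board = std::array<int, 4>;

constexpr Board kInitialBoard = {2, 2, 2, 2};  // two pebbles per square

// Empties the selected square and sows its pebbles one at a time into the
// following squares, moving clockwise. With more than four pebbles the sowing
// wraps around, so the emptied square can receive pebbles again.
Board sow(Board board, int square) {
    assert(square >= 0 && square < 4 && board[square] > 0);  // legal move only
    const int pebbles = board[square];
    board[square] = 0;
    for (int i = 1; i <= pebbles; ++i) {
        board[(square + i) % 4] += 1;
    }
    return board;
}

// A player loses once the sum of pebbles in his row is zero.
bool pcRowEmpty(const Board& b) { return b[0] + b[1] == 0; }
bool humanRowEmpty(const Board& b) { return b[2] + b[3] == 0; }
```

For instance, sow(kInitialBoard, 3), the human player selecting the bottom left square, returns {3, 3, 2, 0}, the same first-move board that is read from Figures 3 and 4 later in this report.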

The AI part of the chip is driven by the Min-Max algorithm. Figure 2 shows how this algorithm works. The basic idea is that, given a certain game state, the AI examines all possible moves, each of which generates a new game state. For each of these new game states, the AI again examines all possible moves. Repeating this process forms a search tree in which each layer represents either the AI's turn or the human player's turn. In Figure 2, the AI looks four steps ahead. At the leaf nodes of the tree, a heuristic function returns a value that estimates how likely the AI is to win the match. The heuristic values generated at the leaf nodes are then propagated back up the tree. For example, the second layer from the bottom is the human player's turn, so minimum values are chosen: a lower value means less chance for the AI to win and therefore a better chance for the human player. The third layer from the bottom is the AI's turn, so the AI chooses maximum values in its own interest. At the very top, the AI chooses the move that leads to the largest heuristic value returned from the bottom. In Figure 2, the AI chooses the middle move at the top because it has the largest value, “6”.
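
The back-propagation just described fits in a short recursive function. This is a generic sketch on an explicit tree, assuming the root is the AI's (maximizing) turn; Node, minimax and bestRootMove are hypothetical names, and the leaf values in the usage example are invented for illustration rather than taken from Figure 2.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// A node of an explicit game tree. Leaves carry a heuristic value;
// inner nodes carry their children (the value field is then unused).
struct Node {
    int value = 0;               // heuristic value, meaningful at leaves
    std::vector<Node> children;  // empty for a leaf
};

// Plain Min-Max: a maximizing layer takes the largest child value,
// a minimizing layer takes the smallest, and the layers alternate.
int minimax(const Node& node, bool maximizing) {
    if (node.children.empty()) {
        return node.value;
    }
    int best = maximizing ? std::numeric_limits<int>::min()
                          : std::numeric_limits<int>::max();
    for (const Node& child : node.children) {
        const int v = minimax(child, !maximizing);
        best = maximizing ? std::max(best, v) : std::min(best, v);
    }
    return best;
}

// Index of the root move the AI (maximizing player) should choose.
// Each child of the root is a human (minimizing) turn.
int bestRootMove(const Node& root) {
    assert(!root.children.empty());
    int bestIndex = 0;
    int bestValue = std::numeric_limits<int>::min();
    for (int i = 0; i < static_cast<int>(root.children.size()); ++i) {
        const int v = minimax(root.children[i], /*maximizing=*/false);
        if (v > bestValue) {
            bestValue = v;
            bestIndex = i;
        }
    }
    return bestIndex;
}
```

For a two-layer tree whose three human-turn nodes have leaves {3, 5}, {6, 9} and {1, 2}, the minimizing layer returns 3, 6 and 1, so bestRootMove picks index 1 (the middle move) with value 6.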


Figure 3 shows the simulation result of the chip, and Figure 4 shows the result generated by the software version of the Min-Max algorithm, which is used to verify the simulation result. Besides the clock, the chip has three inputs. “Reset” resets the game board to its initial state. “Play” and “player_position” form a bundle, and both signals come from user input: when a player presses a button to select a square, “play” becomes “1” to indicate that a button was pressed, and “player_position” indicates which button was pressed. The chip also has four board outputs (Fig. 3), “pos1”, “pos2”, “pos3” and “pos4”, which give the number of pebbles in each square, listed in clockwise order as in Figure 1. A further output, “winner”, tells which player has won or that a match is still being played. To verify that the result generated by ModelSim matches Figure 4, look at “player_position” first: a “player_position” of “1” in Figure 3 means that square “2” is selected in Figure 4, and a “player_position” of “0” in Figure 3 means that square “3” is selected in Figure 4. In both Figure 3 and Figure 4, a player selects the same square when his turn comes. For example, the bottom left square is selected in the human player's first turn; according to Figure 4, the new game board should be “3-3-2-0” when the values in the squares are read clockwise. Figure 3 also shows “3-3-2-0”, which means that the chip generates the new game board correctly. After the human player's first turn, the AI player in Figure 4 selects the top left square, which leads to “0-4-1-3”, and the same result “0-4-1-3” appears in Figure 3. Comparing the game states in Figure 3 and Figure 4 shows that the results match, which indicates that the AI module of the chip returns the same result as the software solution. Figure 3 also contains a second match, played to test that the transition between two matches works as expected and to verify the AI module further. To keep the report simple, I did not include the software's result for the second match, but I included the executable (Mac version) and the C++ source code of my software solution in my submission, so the second match can be verified with that program.
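
Comparing the hardware and software traces state by state, as done by hand above, is easy to automate. The helper below is a hypothetical sketch (formatBoard and firstMismatch are not names from the author's code): it prints states in the “3-3-2-0” style used in this report and reports the first state at which a trace transcribed from ModelSim and a trace printed by the software disagree.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pebble counts for pos1..pos4, in the clockwise order of the chip outputs.
using Board = std::array<int, 4>;

// Formats a board the way states are quoted in this report, e.g. "3-3-2-0".
std::string formatBoard(const Board& b) {
    std::string s;
    for (std::size_t i = 0; i < b.size(); ++i) {
        assert(b[i] >= 0);  // pebble counts are never negative
        if (i > 0) s += '-';
        s += std::to_string(b[i]);
    }
    return s;
}

// Returns the index of the first differing state, or -1 if both traces
// contain exactly the same states in the same order.
int firstMismatch(const std::vector<Board>& hardware,
                  const std::vector<Board>& software) {
    const std::size_t n = std::min(hardware.size(), software.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (hardware[i] != software[i]) return static_cast<int>(i);
    }
    // A shorter trace counts as a mismatch at its first missing state.
    return hardware.size() == software.size() ? -1 : static_cast<int>(n);
}
```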


In Figure 5, the “position updater unit” is implemented as follows. The selected square is emptied and the number of pebbles it held is divided by 4. The quotient is the base number of pebbles added to each square, and the remainder is the number of squares, counted clockwise starting from the one after the selected square, that receive one more pebble on top of that base number. In other words, the quotient is a number of pebbles, while the remainder is a number of squares.

The purpose of the “input factory unit” is to make the “controller unit” sensitive only to a positive edge of the “play” signal, so that the game board is not updated multiple times while a player keeps a button pressed. The “AI unit” and the “position decoder unit” work in parallel, and both tell the “position updater unit” which square is selected. The difference is that the “position decoder unit” generates its output from the human player's input, while the “AI unit” generates its output with the Min-Max algorithm. The “controller unit” ensures that the AI and the human player take turns. If the human player selects a square with zero pebbles on his turn, the next turn is still the human player's rather than the AI's, until the human player provides a valid input. The “controller unit” also ends a match when there is a winner, so that the game board is no longer updated by either the AI or user input.
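
A behavioural sketch of the units described above follows. It is a hypothetical C++ model, not the author's RTL or software: the clockwise indexing, the names updatePosition, checkWinner and Controller, and the choice to let the AI reply within the same step are all assumptions. The AI unit is replaced by a function pointer so that the controller logic can be shown on its own.

```cpp
#include <array>
#include <cassert>

// Assumed layout: squares indexed clockwise, 0 = top left, 1 = top right
// (PC row), 2 = bottom right, 3 = bottom left (human row).
using Board = std::array<int, 4>;

// Position updater in quotient/remainder form: the selected square is emptied,
// every square receives n / 4 pebbles, and the n % 4 squares after the
// selected one (clockwise) receive one more pebble each.
Board updatePosition(Board b, int square) {
    assert(square >= 0 && square < 4);
    const int n = b[square];
    b[square] = 0;
    for (int i = 0; i < 4; ++i) b[i] += n / 4;
    for (int k = 1; k <= n % 4; ++k) b[(square + k) % 4] += 1;
    return b;
}

enum class Winner { None, Pc, Human };

// A player loses once the sum of pebbles in his row is zero.
Winner checkWinner(const Board& b) {
    if (b[2] + b[3] == 0) return Winner::Pc;
    if (b[0] + b[1] == 0) return Winner::Human;
    return Winner::None;
}

// Stand-in for the AI unit: must return a non-empty PC square (0 or 1).
using AiPolicy = int (*)(const Board&);

// Hypothetical model of the input factory and controller units.
struct Controller {
    Board board{2, 2, 2, 2};
    bool prevPlay = false;  // previous "play" sample, for edge detection
    Winner winner = Winner::None;

    // Models one clock edge; returns true if a human move was accepted.
    bool clock(bool reset, bool play, int humanSquare, AiPolicy ai) {
        const bool risingEdge = play && !prevPlay;  // input factory unit
        prevPlay = play;
        if (reset) {
            board = {2, 2, 2, 2};
            winner = Winner::None;
            return false;
        }
        // Holding the button, or pressing it after the match has ended,
        // leaves the board untouched.
        if (!risingEdge || winner != Winner::None) return false;
        // Only squares 2 and 3 belong to the human; an empty or foreign square
        // is rejected and the turn stays with the human.
        if (humanSquare < 2 || humanSquare > 3 || board[humanSquare] == 0) return false;
        board = updatePosition(board, humanSquare);
        winner = checkWinner(board);
        if (winner == Winner::None) {
            // Assumption: the AI replies in the same step; the chip's real
            // timing may differ.
            board = updatePosition(board, ai(board));
            winner = checkWinner(board);
        }
        return true;
    }
};
```

The quotient/remainder form suits hardware well: for a binary pebble count, n / 4 is n shifted right by two bits and n % 4 is its two low bits, so no general divider is needed. It also gives the same board as sowing the pebbles one at a time, because every full lap of four pebbles adds exactly one pebble to each square.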

The AI unit (Fig. 6) is the largest module of the chip. It consists of 62 “position updater units” to achieve a prediction depth of five layers. I chose five as the prediction depth because, based on my software simulation, five is the minimum depth at which the AI is competitive. The circuit itself mimics the structure of the Min-Max algorithm. One major problem to tackle is that the circuit needs the complete structure of a search tree, while illegal branches still need to be pruned. That is why extra wires connect the “position updater units” to the “min comparator units”: they make each “min comparator unit” aware of illegal moves. For example, if the top input of a “min comparator unit” is the result of an illegal move, the top input is not selected even if its value is less than that of the bottom input.
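
With two squares per row, every layer of the tree doubles, so five layers contain 2 + 4 + 8 + 16 + 32 = 62 moves; this is consistent with the 62 position updater units if each unit evaluates one tree edge. The C++ sketch below models that complete tree together with the “legal” flag that travels with each value. It is an illustration under assumptions, not the chip's design: the clockwise indexing, the names search, pick and Scored, and especially the heuristic (AI pebbles minus human pebbles, with ±100 for a decided game) are hypothetical, since the report does not specify the chip's heuristic.

```cpp
#include <array>
#include <cassert>

// Assumed layout: squares indexed clockwise, 0 = top left, 1 = top right
// (AI row), 2 = bottom right, 3 = bottom left (human row).
using Board = std::array<int, 4>;

// Position updater (quotient/remainder form). Selecting an empty square
// leaves the board unchanged; such a move is flagged illegal in search().
Board updatePosition(Board b, int square) {
    const int n = b[square];
    b[square] = 0;
    for (int i = 0; i < 4; ++i) b[i] += n / 4;
    for (int k = 1; k <= n % 4; ++k) b[(square + k) % 4] += 1;
    return b;
}

bool gameOver(const Board& b) { return b[0] + b[1] == 0 || b[2] + b[3] == 0; }

// Hypothetical heuristic: decided games score +/-100,
// otherwise AI pebbles minus human pebbles.
int heuristic(const Board& b) {
    const int ai = b[0] + b[1];
    const int human = b[2] + b[3];
    if (human == 0) return 100;
    if (ai == 0) return -100;
    return ai - human;
}

// A value together with the "legal" wire that accompanies it.
struct Scored {
    int value;
    bool legal;
    int move;  // square chosen at this level
};

// Comparator that never selects an illegal input, like the min comparator
// units; here the same rule is also applied to maximizing layers.
Scored pick(const Scored& a, const Scored& b, bool maximizing) {
    if (!a.legal) return b;
    if (!b.legal) return a;
    const bool keepA = maximizing ? a.value >= b.value : a.value <= b.value;
    return keepA ? a : b;
}

// Depth-limited Min-Max over the complete binary tree. Every edge is
// evaluated, as in hardware; `updaters` counts the evaluations.
Scored search(const Board& b, int depth, bool aiToMove, int& updaters) {
    const int first = aiToMove ? 0 : 2;
    Scored best{0, false, -1};
    for (int s = first; s < first + 2; ++s) {
        const Board next = updatePosition(b, s);
        ++updaters;
        Scored child{heuristic(next), b[s] > 0, s};
        if (depth > 1) {
            // The subtree is always built; its value is used only if the
            // game is still running after this move.
            const Scored sub = search(next, depth - 1, !aiToMove, updaters);
            if (!gameOver(next) && sub.legal) child.value = sub.value;
        }
        best = pick(best, child, aiToMove);
    }
    return best;
}
```

Calling search(board, 5, true, updaters) returns the AI's chosen square in the move field, and on any board it leaves updaters at 62, one count per position update in the five-layer tree.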


With my current design, I chose a 70 ns clock cycle, which leaves a positive slack of 45 ns (Fig. 7). I did not chase the smallest possible clock cycle, because even a 30 ns clock cycle would not change the order of magnitude of the comparison between the hardware and the software solution. I ran the software version of the Min-Max algorithm on an Intel Core i5-4260U processor, which is built in 22 nm technology, whereas my hardware solution uses 180 nm technology. Measured as the fastest run rather than the average, the software solution takes 17000 ns to predict five steps ahead, which is about 243 times (17000 / 70 ≈ 243) as long as my hardware solution takes. This comparison clearly demonstrates the advantage of dedicating a hardware component to a specific job. The trade-off is that dedicated hardware cannot perform any other type of task.
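
The numbers in the paragraph above can be checked with a little arithmetic. This assumes, as the 17000 / 70 comparison implies, that the chip completes a five-step prediction within one clock cycle; the critical-path estimate also ignores setup time and clock uncertainty, so it is only approximate.

$$
\frac{t_{\mathrm{software}}}{T_{\mathrm{clk}}} = \frac{17000\ \mathrm{ns}}{70\ \mathrm{ns}} \approx 242.9 \approx 243,
\qquad
\frac{17000\ \mathrm{ns}}{30\ \mathrm{ns}} \approx 566.7
$$

$$
t_{\mathrm{path}} \approx T_{\mathrm{clk}} - t_{\mathrm{slack}} = 70\ \mathrm{ns} - 45\ \mathrm{ns} = 25\ \mathrm{ns}
$$

Both ratios are of the order of 10^2, which is the sense in which a faster clock would not change the magnitude of the comparison, and a path delay of roughly 25 ns is consistent with a clock of around 30 ns being attainable.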

Acknowledgements:

I thank Edouard Giacomin for helping me resolve many issues during the placement and routing stage and for answering my questions in the early stage of my design.

Figure 1: Example of how the pebble game is played.

Figure 2: Example of how the Min-Max algorithm works.

Figure 3: Simulation results generated by ModelSim.

Figure 4: Simulation results generated by the software.

Figure 5: The internal structure of the chip. The direction of the arrows shows the inputs and outputs of each module.

Figure 6: The internal structure of the AI unit, using a prediction depth of two as an example.

Figure 7: Timing report.

Figure 8: Area report.

Figure 9: Power report.

Figure 10: Floorplan.
