
AI player for Risk Test Plan

Dirk Brand, 16077229
July 25, 2013

Contents
1 Introduction
  1.1 Testing Hierarchy
2 Unit Testing and Code Coverage
  2.1 Unit Testing Tools
  2.2 Code Coverage Tools
3 Integration Testing
4 User Interface Testing
5 Testing of AI players
  5.1 Testing
  5.2 Evaluation
A Table of Unit Tests
B Evaluation Plan

1 Introduction

For the implementation of the project from the project design, I will primarily follow the test-driven development (TDD) [1] software development process. The following components of the framework will be tested:

- The facilitator.
- The controller.
- The graphical user interface.
- Both the GUI backend engine and the engine for computer players.

In addition to the testing of the above components, the critical algorithms used in the computer players will be tested. The quality of the computer players will be evaluated using the evaluation plan. The following methods will be used to test the components:
- Unit testing of the individual methods within the facilitator, controller and engines.
- Integration testing of the facilitator, controller and engines.
- Code coverage of all code that is implemented.
- Automated testing of the user interface.
- Acceptance testing of the entire project.

1.1 Testing Hierarchy

The order in which the development and testing of the framework will take place is shown in Figure 1.

[Figure 1: Planned order of development and testing of the framework. Stages: development and testing of shared objects (unit testing); development and testing of the facilitator and controller; development and testing of the engines; integration testing of the facilitator, controller and engines (testing of the protocol); development and testing of the user interface; testing of component interaction across a network; development and testing of the computer players; evaluation of the computer players.]

2 Unit Testing and Code Coverage

The facilitator, controller and engines will be tested with unit tests. According to the test-driven development approach, the unit tests for a class are written before the class itself is developed. Thus, for each class in the framework, a suite of unit tests will be written before the class is implemented. After the implementation, additional unit tests will be written to obtain sufficient code coverage.

The unit tests will each test a single function of a class and will be isolated from other functions (the test results will not depend on the correctness of other functions). The number and nature of the unit tests will be linked to the code coverage: to obtain adequate coverage, the tests have to reach a certain number of statements in the code. Thus, the number and nature of the tests depend on the level of code coverage needed, as well as on how thoroughly the test-driven approach was followed. The tests will also be written to cover as many corner cases of methods as possible. Certain methods will not be tested with unit tests, as they interact with the network; their correctness will instead be verified with integration testing.

2.1 Unit Testing Tools

JUnit 4.8 will be used as the testing library. Each unit test has some sample input and an expected output (or oracle). The expected result of a unit test might be that the method throws a specific exception (if invalid input is supplied). Table 2 shows examples of unit tests that may be written for some of the core methods in the framework. For each test, there is a list of possible inputs with the expected output for each input. Any methods not listed cannot be adequately tested with unit tests and will be tested with integration testing. The unit tests for all relevant getters and setters are implicitly defined and will be implemented.
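To make the structure of these tests concrete, the snippet below is a minimal sketch of two JUnit 4 tests in the style of Table 2: one checks a normal troop placement and one expects an exception for an unowned territory. The GameState API used here (assignTerritory, placeTroop, troopsOn, TerritoryNotOwnedException) is an assumption made for illustration and may differ from the eventual framework classes.

import static org.junit.Assert.assertEquals;

import org.junit.Before;
import org.junit.Test;

public class GameStateTest {

    private GameState state;

    @Before
    public void setUp() {
        state = new GameState();     // fresh game state before every test
        state.assignTerritory(1, 3); // player 1 owns territory 3 (assumed setup method)
    }

    @Test
    public void testPlaceTroop() {
        state.placeTroop(1, 3);             // player 1 places a troop on territory 3
        assertEquals(1, state.troopsOn(3)); // troop count incremented, no exception expected
    }

    @Test(expected = TerritoryNotOwnedException.class)
    public void testPlaceTroopOnUnownedTerritory() {
        state.placeTroop(1, 0); // player 1 does not own territory 0, so an exception is expected
    }
}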

2.2 Code Coverage Tools

EclEmma [5] will be used to measure the adequacy of the unit tests by means of code coverage. EclEmma is a free code coverage plugin for the Eclipse IDE. It allows the developer to analyse their unit tests for coverage directly and provides coverage data for each method as the percentage of statements covered. At least 70% code coverage is generally accepted as adequate for system testing [2]; I aim to write enough tests to reach 70% code coverage.

3 Integration Testing

The integration tests will involve testing the interaction between the different components of the framework. This includes testing that the communication protocol is handled correctly by the facilitator, controller and engines. To test the protocol, messages will be sent between components across the network and the replies will be verified against the requirements listed in the Requirements document. The test scenario is also given in the Requirements document and should be followed to test the behaviour of the components. Since most of the communication and interaction between components will take place across a network, the integration testing will involve launching the facilitator or controller on a physical remote machine and then attempting to connect, communicate and disconnect from a local machine.
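As an illustration, the sketch below shows what one such protocol-level integration test could look like, assuming the facilitator has already been started on a remote machine. The host name, port and message strings (JOIN/ACCEPT) are placeholders; the actual protocol messages are those defined in the Requirements document.

import static org.junit.Assert.assertEquals;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

import org.junit.Test;

public class FacilitatorConnectionTest {

    private static final String HOST = "facilitator.example.local"; // placeholder host
    private static final int PORT = 4444;                           // placeholder port

    @Test
    public void testConnectAndJoin() throws Exception {
        try (Socket socket = new Socket(HOST, PORT)) {
            PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

            out.println("JOIN player1");           // send a join request over the network
            assertEquals("ACCEPT", in.readLine()); // verify the reply against the protocol
        }
    }
}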

4 User Interface Testing

The user interfaces in the framework will be tested with the FEST-Swing library [4]. It exists as a stand-alone library or as a plugin for the Eclipse IDE. The library provides an easy-to-use API for creating and maintaining GUI tests. FEST simulates actual user gestures at the level of the operating system (so a simulated mouse click looks exactly like a real mouse click to the GUI). It also creates screenshots of the components at the moment of test failure, so the behaviour can be analysed by the developer. The user interfaces will be tested both for robustness and for correct behaviour. An example of how the FEST-Swing package tests the behaviour of components is shown below (the code tests whether an error message is displayed when the user neglects to enter a password):

dialog.comboBox("domain").select("Users");
dialog.textBox("username").enterText("alex.ruiz");
dialog.button("ok").click();
dialog.optionPane().requireErrorMessage()
      .requireMessage("Please enter your password");

The various panels and components in the user interface will be tested in a similar fashion.
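For completeness, the sketch below shows one way the dialog fixture used in the example above could be created and cleaned up in a JUnit test. LoginFrame and the component names are hypothetical; only the FEST-Swing fixture classes themselves come from the library.

import org.fest.swing.fixture.DialogFixture;
import org.fest.swing.fixture.FrameFixture;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class LoginDialogTest {

    private FrameFixture window;

    @Before
    public void setUp() {
        window = new FrameFixture(new LoginFrame()); // wrap the frame under test in a fixture
        window.show();                               // display it and attach the FEST robot
    }

    @Test
    public void testMissingPasswordShowsError() {
        DialogFixture dialog = window.dialog();      // fixture for the login dialog
        dialog.textBox("username").enterText("alex.ruiz");
        dialog.button("ok").click();
        dialog.optionPane().requireErrorMessage()
              .requireMessage("Please enter your password");
    }

    @After
    public void tearDown() {
        window.cleanUp();                            // release keyboard and mouse, close windows
    }
}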

5 Testing of AI players

The playing of games between AI players will be automated with a driver class that launches the controller directly with two AI players (thus bypassing the facilitator).
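A minimal sketch of such a driver is shown below. The Controller and AI player classes and their methods are assumptions about the eventual framework API, included only to illustrate the idea of running games back-to-back without the facilitator.

public class AIGameDriver {

    public static void main(String[] args) {
        int games = 100; // number of automated games to play

        for (int i = 0; i < games; i++) {
            Controller controller = new Controller();         // launch the controller directly
            controller.addPlayer(new ExpectiminimaxPlayer()); // AI player under test
            controller.addPlayer(new SubmissivePlayer());     // opponent used for verification
            controller.playGame();                            // run one game to completion
            // the controller writes the game log, which is inspected afterwards
        }
    }
}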

5.1 Testing

As described in the requirements, the AI players will play against each other and their correctness will be verified by inspecting the log after a game has been played. Each player will play against the submissive player; if the log shows that the game ran to completion without crashing and that the player won, the AI player is verified as correct. If the player did not win the game against the submissive player, it is not verified as correct. The correctness of the submissive player itself will be verified by eye.

5.2 Evaluation

The focus of the entire project is to investigate, develop and evaluate computer players for our game of Risk. This involves evaluating the quality of the players relative to each other and other existing AI players (players created by Yura [6]). The algorithms implemented in the AI players have to be evaluated. This has been discussed in the evaluation plan (shown in Appendix B). Variants of each AI player will be created based on the following factors:
- Different evaluation functions.
- The time allowed per turn.
- Various heuristics (like ordering the search of the game tree).

The players will play sets of games against each other. The first player will play a single game against all the other players. The players can then be rated and ranked according to the Glicko rating system [7]. The remaining games to be played will be chosen based on the resulting rankings of the players: players with similar ratings (ratings that lie within a certain range of each other) will play against each other to make the ratings more accurate.
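As an illustration of this pairing scheme, the sketch below selects all pairs of players whose ratings lie within a chosen range of each other. The Player class and the range parameter are assumptions made purely for the example.

import java.util.ArrayList;
import java.util.List;

public class MatchScheduler {

    static class Player {
        final String id;
        final double rating;    // Glicko rating
        final double deviation; // ratings deviation (RD)

        Player(String id, double rating, double deviation) {
            this.id = id;
            this.rating = rating;
            this.deviation = deviation;
        }
    }

    /** Returns all pairs of players whose ratings lie within the given range of each other. */
    static List<Player[]> similarPairs(List<Player> players, double range) {
        List<Player[]> pairs = new ArrayList<Player[]>();
        for (int i = 0; i < players.size(); i++) {
            for (int j = i + 1; j < players.size(); j++) {
                if (Math.abs(players.get(i).rating - players.get(j).rating) <= range) {
                    pairs.add(new Player[] { players.get(i), players.get(j) });
                }
            }
        }
        return pairs;
    }
}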

References
[1] Kent Beck. Test-Driven Development: By Example. Addison-Wesley Professional, 2003.
[2] Kevin Burr and William Young. Combinatorial test techniques: Table-based automation, test generation and code coverage. In Proc. of the Intl. Conf. on Software Testing Analysis & Review. Citeseer, 1998.
[3] Arpad E. Elo. The Rating of Chessplayers, Past and Present, volume 3. Batsford, London, 1978.
[4] FEST: Fixtures for Easy Software Testing. http://fest.easytesting.org/. Accessed: 24 May 2013.
[5] EclEmma: Java Code Coverage for Eclipse. http://www.eclemma.org/. Accessed: 29 May 2013.
[6] Domination (Risk Board Game). http://sourceforge.net/p/domination/wiki/Home/. Accessed: 27 Mar 2013.
[7] Mark E. Glickman. The Glicko System. Boston University, 1995.

A Table of Unit Tests


GameState
  testTransferTerritory
    Sample input: playerID = 1, TerritoryID = 0. Expected output: no exceptions and territory transferred.
    Sample input: playerID = 1, TerritoryID = 3. Expected output: exception; player does not own territory 3.
  testPlaceTroop
    Sample input: playerID = 1, TerritoryID = 3. Expected output: no exceptions and troop placed.
    Sample input: playerID = 1, TerritoryID = 0. Expected output: exception; player does not own territory 0.
  testRemoveTroop
    Sample input: playerID = 1, TerritoryID = 3. Expected output: no exceptions and troop removed.
    Sample input: playerID = 1, TerritoryID = 0. Expected output: exception; player does not own territory 0.

ConnectedPlayer
  testCloseConnections
    Sample input: a ConnectedPlayer created with a Socket. Expected output: no exceptions and Socket.isClosed() evaluates to true.

Logger
  testLog
    Sample input: "Hello, World!" is written to the log. Expected output: "Hello, World!" when log.toString() is called.

ControllerLogic
  testLoadMap
    Sample input: the name of an existing map file. Expected output: no exceptions and the map file is processed correctly, meaning that all the territories in the game are created correctly.
    Sample input: the name of a non-existing map file. Expected output: exception; the map does not exist.
  testResolveAttack
    Sample input: AttackD1 = 4, AttackD2 = 5, AttackD3 = 6, DefendD1 = 1, DefendD2 = 2. Expected output: defender loses two troops.
    Sample input: AttackD1 = 4, AttackD2 = 5, AttackD3 = 1, DefendD1 = 3, DefendD2 = 6. Expected output: defender loses one troop; attacker loses one troop.
    Sample input: AttackD1 = 4, AttackD2 = 5, AttackD3 = 6, DefendD1 = 1, DefendD2 = 6. Expected output: defender loses one troop; attacker loses one troop.
    Sample input: AttackD1 = 1, AttackD2 = 2, AttackD3 = 3, DefendD1 = 4, DefendD2 = 5. Expected output: attacker loses two troops.
  testGenRoll
    Sample input: test for 1000 iterations. Expected output: any integer between 1 and 6 inclusive (for each iteration).

FacilitatorLogic
  testReadAIOpponents
    Sample input: create mock AI files in the directory. Expected output: the AI files are read correctly and the names are stored in the FacilitatorLogic class.

EngineLogic
  testLoadMap
    Same as testLoadMap under ControllerLogic.
  testTroopsPlaced
    Sample input: a list of existing territoryIDs and a list of integers for troop numbers. Expected output: no exceptions; territory troop allocations are incremented by the correct amounts.
    Sample input: a list of non-existing territoryIDs and a list of integers for troop numbers. Expected output: exception; the territory list contains nonexistent territories.
  testResolveAttack
    Same as testResolveAttack under ControllerLogic.
  testTransferTerritoryControl and testSetOppManoeuvre
    Sample input: TerritoryID1 = 3, TerritoryID2 = 4, NumberOfTroops = 3. Expected output: no exceptions; the territory at TerritoryID1 is decremented by 3 and the territory at TerritoryID2 is incremented by 3.
    Sample input: TerritoryID1 = 3, TerritoryID2 = 7, NumberOfTroops = 3. Expected output: exception; the territories are not connected.

Figure 2: Descriptions of unit tests.

B Evaluation Plan

Computer Player Testing


Four computer players (AIs) will be implemented. These will be:
- A submissive player that may only defend and will place all recruited troops on a random territory that it owns (a sketch of this rule follows the list).
- A baseline player that plays legal moves according to a simple scheme.
- A player based on expectiminimax game tree searching with Alpha-Beta pruning.
- A player based on a Monte Carlo Tree Search (MCTS) approach.
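The sketch below illustrates the recruitment rule of the submissive player described in the first item: it never attacks and places all recruited troops on one randomly chosen territory that it owns. GameState, territoriesOwnedBy and placeTroops are hypothetical names used only for illustration.

import java.util.List;
import java.util.Random;

public class SubmissivePlayer {

    private final Random random = new Random();

    /** Places all recruited troops on one randomly chosen territory owned by this player. */
    public void recruit(GameState state, int troopsToPlace) {
        List<Integer> owned = state.territoriesOwnedBy(this); // territories this player owns
        int target = owned.get(random.nextInt(owned.size())); // pick one at random
        state.placeTroops(this, target, troopsToPlace);       // place every recruited troop there
    }

    /** The submissive player never initiates an attack; it only defends. */
    public boolean wantsToAttack() {
        return false;
    }
}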

The submissive player will be used for testing purposes. A user testing the framework (with the test scenario) can play against the submissive player to verify that all the basic features of the GUI and the communication protocol work correctly. The baseline player can then be used to test the mechanics of battles, since it is able to attack as well. Variants of the AI players will be made. Variants will differ based on the following factors:

- Different evaluation functions or heuristics employed.
- Time allowed per turn.

The different variants of the AIs will play sets of matches against each other in order to determine ratings for the players. The ratings will be based on the Glicko rating system [7], an extension of the Elo rating system [3]. The system takes into account the confidence one has in the accuracy of a player's rating (called the ratings deviation, or RD). Thus, the more matches are played, the more accurate a rating becomes (or the more confident one can be about a player's rating). The Glicko system also holds that the accuracy of a player's rating decreases with time, but since AIs perform the same over time, the accuracy of the AIs' ratings will not decrease in this project. The baseline player will be given a Glicko rating of 1500 with a ratings deviation of 350. Variants will be given a unique identifier and ranked according to their Glicko ratings. The rankings could look something like this:
1. AI1 3secs eval1.
2. AI2 5secs eval1.
3. AI1 3secs eval3.
After each variant has played only a few matches against the others, preliminary Glicko ratings for the variants will be available, and these will help determine which matchings should be played further. Variants should only need to play matches against opponents in their own Glicko rating range.
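For reference, the sketch below implements a single Glicko-1 rating update for one rating period, following Glickman's description [7] and the assumption stated above that the ratings deviation of an AI does not grow over time. The class and method names are illustrative only.

public class Glicko {

    private static final double Q = Math.log(10) / 400.0;

    // Weighting factor that discounts the results against opponents with uncertain ratings.
    private static double g(double rd) {
        return 1.0 / Math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / (Math.PI * Math.PI));
    }

    // Expected score of a player rated r against an opponent rated rj with deviation rdj.
    private static double expectedScore(double r, double rj, double rdj) {
        return 1.0 / (1.0 + Math.pow(10.0, -g(rdj) * (r - rj) / 400.0));
    }

    /**
     * Returns the new {rating, RD} of a player after one rating period.
     * scores[j] is 1 for a win, 0.5 for a draw and 0 for a loss against opponent j.
     */
    public static double[] update(double r, double rd,
                                  double[] opponentR, double[] opponentRd, double[] scores) {
        double dSquaredInv = 0.0; // 1 / d^2 in Glickman's notation
        double sum = 0.0;
        for (int j = 0; j < opponentR.length; j++) {
            double gj = g(opponentRd[j]);
            double e = expectedScore(r, opponentR[j], opponentRd[j]);
            dSquaredInv += Q * Q * gj * gj * e * (1.0 - e);
            sum += gj * (scores[j] - e);
        }
        double denominator = 1.0 / (rd * rd) + dSquaredInv;
        double newRating = r + (Q / denominator) * sum;
        double newRd = Math.sqrt(1.0 / denominator);
        return new double[] { newRating, newRd };
    }
}

With the baseline player starting at a rating of 1500 and an RD of 350, each set of matches feeds into such an update to produce the ratings used for ranking the variants and choosing further pairings.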


The Glicko rating system does not take the speed or time taken by players into account. Thus, variants that have similar ratings will be distinguished based on the time they take to play games (or, since time per move could be constrained, how deep the game tree built by the AI is).

Challenges
Playing a set of games between AI players is very computationally intensive. Players are therefore given limited time to play a turn, which limits the depth to which they may build a game tree (in the case of expectiminimax and Monte Carlo Tree Search). The computational cost can also be reduced by using the Glicko ratings to determine which matchings are worth exploring further to obtain more accurate ratings.
