Professional Documents
Culture Documents
1
official cite: http://kde.org
engine, a GUI, or a player. The GUIs and the players are Figure 2 also shows the relations among the components. We
connected to the game engine via UDP sockets. The align these components vertically according to the catalogs. A
components can be run in one or several computers. lower algorithm uses the outputs of the upper one as its inputs.
KBlocks can be configured with parameters defined in a The learners computes the models which are used in the
text file. The game engine (and the GUI) supports game algorithms.
GUI The middle column with the dotted arrow lines shows the
socketconnection sequence of the computation in the games. With the current
DataPreprocessor
Game
P layer2 P layer1
Engine Counting
Filtered
Filters
Data
Hand
Pattern Activated
playeror Info.Gain Coded
Calculator Patterns Features
GUI...
stop
competitions among up to 8 players, in which one player could Fig. 2. System Components
be a human. A hand-coded artificial player is integrated [16]. It
can clear on average 2000−3000 lines in single games. The
competitions can be done in a synchronized mode, in which
each player gets the new piece after the slowest player finishes
the current placement.
A new artificial player can be integrated into the platform
with ease. We provide a source code package in Internet 1 , in Fig. 3. An example of the Pattern
which the class KBlocksDummyAI is a clean and simple
interface for the further development. Graduate students can
simply change the source code for their internship or thesis. board state and the piece (s, p), the data preprocessor can
Researchers can play around with some ideas or organize generate up to 34 candidate placements by enumerating all the
competitions. rotation and the position of the p. The candidates are filtered
because of the heavy computational power required by training
C. LEARNING BY IMITATION
the SVMs. The rests of the candidates are passed to the pattern
In Section -B, we addressed the functionality of the Tetris calculator and the hand-coded features. Each candidate is
platform. In this section, the learning by imitation is discussed transferred into a vector of the values of the patterns and the
in details. First, we give a brief introduction to the system features. The vectors are used as the input of the SVM for the
components. Then, the patterns, which are used in the filters and prediction. The output of the SVM can be described as how
support vector machines (SVMs), are explained. Last, we similar a candidate is to the choice of the imitated player.
address how the SVMs are used for data classification in our Consequently, the most similar one is labeled as the final
imitation learning. choice.
The training data of the imitation learning are obtained from
the imitated system. In this paper, they are the Tetris games The Learning of the Patterns
played by the imitated player. We created several models to Training the SVMs is time consuming. There are 7 different
obtain the skills of the imitated player. The training process pieces in Tetris: L, J, O, I, T, Z, and S. To place one of L, J, or
receives positive feedback if the models make the same T, there will be 34 candidates by combining all the possible
decision as the imitated system. Otherwise, it receives the rotations and positions; O has 9 combinations; I, Z, or S have
negative feedback. The imitation learning is successful if the 17. The candidate chosen by the imitated player is regarded as
trained models keep the similarity even if the data never appear the positive case. The others are the negative cases. If the size
in the training set. of the training set is 10000, there are about 220000 tuples
The learning system consists of several components, as (cases) in the set. If each tuple is a vector of 39 values, training
shown in Figure 2. We created three catalogs for these a SVM from these data would take more than a week using a
components: the data representation; the algorithms; and the 2.3GHz PC.
learners. They are illustrated as the gray rectangles, the regular In order to reduce the data set, the types of pieces are used in
rectangles, and the round-cornered rectangles in the figure. the data preprocessor to separate the data into 7 subsets. Each
1
http://www.informatik.uni-freiburg.de/∼kiro/KBlocksDummyPlayer.tgz
subset is used to train its own filter, patterns, and SVM. In other clear 4 rows. This would be important for the game. The
words, seven SVMs work together in the artificial player. patterns, however, cannot express this occurrence.
When placing the current piece, human players can first We designed hand-coded features to acquire “global”
reduce the candidates to a limited number by observing the information. If the patterns can define the tactics of the games,
surface of the accumulated blocks. Then, they choose one from the features can be used to describe the strategies. In order to
the filtered candidates as their final decision. This idea was used define these features, we use Figure 4 to illustrate some phrases:
to develop the filters for reducing the amount of data in the hole, flat, column, and well. A well or a hole is buried if it is no
learning. deeper than three cells from the surface.
A filter consists of a set of patterns. Figure 3 shows the The features are listed in Table I. Items 2 and 3 are for the
column column. Items 4 − 6 are about the flat. 9 − 11 are for the hole.
14 − 18 are about the well. Our features are compared
x
x x T ABLE I
x x List of Hand-Coded Features
x x c c c
flat x x x x x x x c x 1* How many attacks are possible after the current
hole x x x x x x x x
x x x x x x x x x placement.
x x x x x x x x x
x x x x x x x x x 2 T he number of the columns.
3 T he increased height of the column.
buriedwell
4 T he increased number of the flat.
Fig. 4. T he Illustrations of the Features 5 T he decreased number of the flat.
6 T he maximum length of the flat
7 T he increased height of accumulated blocks.
concept of the patterns. The current piece is denoted by ’c’, it is
8 T he height difference between the current placement and
an ’L’ in the figure. We use ’x’ to denote the already occupied
the highest position of the accumulated blocks.
cells. Around the placement, a small field, which is marked in
9 How many holes will be created after the current
gray, is chosen as the activated area for the patterns. The placement
patterns are smaller than the small field. For example, the 10 How many holes will be removed after the current
deeper gray area in the figure shows a pattern. It contains 5×2 placement.
cells. The cells with a ’c’ or ’x’ inside are occupied. 11* How may occupied cells are added over a hole.
A pattern can be activated by a placement. As mentioned 12 T he number of removed lines of the current placement.
above, the small field is activated by the placement. All the 5 × 13* How well will the next piece be accommodated.
2 patterns can be enumerated. We move a pattern around the
14 If a well is closed by the current placement, how deep is the
small field. It is activated by the placement, if the occupied cells well.
in the pattern match the occupied cells in the background (the 15 If a well is open by the current placement, how deep is the
small field). well.
Filters can thus be learned by counting. If a pattern is never 16* How may occupied cells are added over a buried well.
activated by the placements of the imitated player, it can be used 17 T he number of the open wells.
to reduce the candidates. Each filter is a set of such patterns. It
18 How deep is the well, if it is created by the current
can be learned by running the activation tests over all th e placement.
training data. 19 Whether a well is removed by current placement.
Support Vector Machines with the features listed in [1]; the items with * were not
mentioned. There are differences in the descriptions of the
The patterns are useful not only in the filters but also for
features because we use them as the inputs of the SVMs. The
modeling the skills of the imitated players. For instance, a
pattern was activated 1000 times over the training set, among other researchers developed the evaluation function with the
which 900 were activated by the positive cases. This pattern linear combinations of the weighted features.
A large number of patterns can be created by enumeration.
cannot be used in a filter because there are mixed negative and
positive cases. However, activating it apparently indicates that For example, an enumeration of 5×2 will create 1024 patterns.
the placement tends to be positive because of the positive to It is difficult to consider all these patterns as the inputs of the
negative rate in the training data. Therefore, the patterns are also SVMs because of the required computational power. To our
used in this section to compute the inputs of the SVMs. knowledge, there is no trivial way to compute a subset of the
However, the patterns can only get the “local” information. patterns which yield to the optimal performance of the SVMs.
They are checked within the small field around the placement. Therefore, we employ the information gain in decision tree for
From another aspect, it is important to cons ider some “global” computing a subset of 20 patterns for each SVM.
parameters in Tetris. For example, a candidate placement can
SVMs are a popular method in data classification, in which player. The results are shown in the upper plot of Figure 5. The
the whole data set are globally classified with a set of the labels. data were averaged over 10 slices.
Nevertheless, the data in Tetris are grouped by the current piece. The solid lines show the performance of the player that
Among the candidate placements of the current piece, the imitates the human player. The dotted lines are the player that
algorithm needs to choose the one which is closest to the choice was imitating Fehey’s player. Both imitations achieved a
of the imitated player. LIBSVM [13] provides an API to similarity of about 0.7. The curves resemble a typical learning
compute this probability, which is used in our implementation. curve because the similarity is regarded as the evaluation in the
The values of the inputs should be within the same range in learning. The similarity cannot be higher because of the
the SVMs. The patterns always have a value of 0 or 1, which differences in the data representation and the models between
denotes whether or not it is activated by the current placement. the imitating system and the imitated systems.
The value of features, however, can be much bigger. For The trained models compete against Fehey’s player in the
example, the maximu m length of the flat can be up to 9 in a synchronized two-player games. 200 random piece sequences
standard Tetris game. In order to avoid this situation, the values were generated for the 200 games, so that each model was
of the features were mapped to 0, 0.5, or 1 in our evaluated in the same set of the games. The middle plo t in
Imitating Human
0.8 Imitating AI Figure 5 shows the winning rates of the imitating players. The
0.7
0.6 player imitating human finally achieved 0.25 as its rate of wins
0.5
0 20 40 60 80 100
35 Imitating Human
Imitating AI
25
15
5
0 20 40 60 80 100
References
[1] S. B. T . Christophe, “Building Controllers for T etris,” International
Computer Games Association Journal, vol. 32, pp. 3–11, 2009.
[2] C. Fehey, “T etris AI,”
http://www.colinfahey.com/tetris/tetrisen.html, 2003, www accessed
on 02-August-2010.
[3] G. K. S. M. N. Bo¨hm, “An evolutionary approach to tetris,” 2005, in
Proceedings of the sixth metaheuristics international conference
(MIC2005).
[4] I. A. Lo˝rincz, “Learning tetris using the noisy cross-entropy method,”
Neural Computation, vol. 18, pp. 2936–2941, 2006.
[5] L. Reyzin, “ 2 player tetris is pspace hard,” 2006, in Proceedings of 16th
Fall Workshop on Computational and Combinatorial Geometry.
[6] A. Bandura, “ Social learning theory,” New York: General Learning
Press, 1971.