
Chapter 17

Reinforcement Learning in Games

István Szita

Abstract. Reinforcement learning and games have a long and mutually beneficial
common history. On one side, games are rich and challenging domains for test-
ing reinforcement learning algorithms; on the other, in several games the best
computer players use reinforcement learning. The chapter begins with a selection
of games and notable reinforcement learning implementations. Without modifica-
tions, the basic reinforcement learning algorithms are rarely sufficient for high-level
gameplay, so it is essential to discuss the additional ideas, the ways of inserting
domain knowledge, and the implementation decisions that are necessary for scaling up.
These are reviewed in sufficient detail to understand their potential and their limi-
tations. The second part of the chapter lists challenges for reinforcement learning in
games, together with a review of proposed solution methods. While this listing has
a game-centric viewpoint, and some of the items are specific to games (like oppo-
nent modelling), a large portion of this overview can provide insight for other kinds
of applications, too. In the third part we review how reinforcement learning can be
useful in game development and find its way into commercial computer games. Fi-
nally, we provide pointers for more in-depth reviews of specific games and solution
approaches.

17.1 Introduction

Reinforcement learning (RL) and games have a long and fruitful common his-
tory. Samuel’s Checkers player, one of the first learning programs ever, already
embodied the principles of temporal difference (TD) learning, decades before TD
learning was formally described and analyzed. And it was another game, Backgam-
mon, where reinforcement learning reached its first big success, when Tesauro’s

István Szita
University of Alberta, Canada
e-mail: szityu@gmail.com

M. Wiering and M. van Otterlo (Eds.): Reinforcement Learning, ALO 12, pp. 539–577.
springerlink.com © Springer-Verlag Berlin Heidelberg 2012

TD-Gammon reached and exceeded the level of top human players – and did so en-
tirely by learning on its own. Since then, RL has been applied to many other games,
and while it could not repeat the success of TD-Gammon in every one of them, there
are many promising results and many lessons to be learned. We hope to present these
in this chapter, both for classical games and for computer games: real-time strat-
egy games, first-person shooters, and role-playing games. Most notably, reinforcement
learning approaches seem to have the upper hand in one of the flagship applications
of artificial intelligence research, Go.
From a different point of view, games are an excellent testbed for RL research.
Games are designed to entertain, amuse and challenge humans, so, by studying
games, we can (hopefully) learn about human intelligence, and the challenges that
human intelligence needs to solve. At the same time, games are challenging domains
for RL algorithms as well, probably for the same reason they are so for humans:
they are designed to involve interesting decisions. These challenges span a wide
range, and we aim to present a representative selection of them, together with RL
approaches to tackle them.

17.1.1 Aims and Structure

One goal of this chapter is to collect notable RL applications to games. But there is
another, far more important goal: to get an idea of how (and why) RL algorithms work
(or fail) in practice. Most of the algorithms mentioned in the chapter are described
in detail in other parts of the book. Their theoretical analysis (where it exists) gives
us guarantees that they work under ideal conditions, but these guarantees are impractical
for most games: the conditions are too restrictive, the statements are too loose, or both.
For example, we know that TD-learning1 converges to an optimal policy if the environment
is a finite Markov decision process (MDP), the values of each state are stored individually,
the learning rates decrease in a proper manner, and exploration is sufficient. Most of
these conditions are violated in a typical game application. Yet, TD-learning works
phenomenally well for Backgammon and not at all for other games (for example,
Tetris). There is a rich literature of attempts to identify the game attributes that make
TD-learning and other RL algorithms perform well. We think that an overview of
these attempts can be helpful for future developments.
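To make the convergence conditions concrete, the following sketch shows tabular TD(0) policy evaluation in a setting where all of them hold: a finite MDP, one stored value per state, and per-state learning rates of 1/n (a valid Robbins–Monro schedule). The toy chain MDP and the fixed policy are our own illustrative assumptions, not an example from the chapter; in a real game application the state space would be far too large for a table, which is exactly why these conditions break down.

```python
import random

# A toy episodic chain MDP: states 0..3, where state 3 is terminal.
# A fixed policy moves right with prob. 0.9, left with prob. 0.1.
# Reward +1 on reaching the terminal state, 0 otherwise.
N_STATES = 4
TERMINAL = 3

def step(state):
    """Sample one transition under the fixed policy."""
    move = 1 if random.random() < 0.9 else -1
    next_state = min(max(state + move, 0), TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward

def td0_evaluate(episodes=5000, gamma=0.9):
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s)).
    Step sizes decrease as 1/(visit count), so they satisfy the usual
    stochastic-approximation conditions (sum alpha = inf, sum alpha^2 < inf)."""
    V = [0.0] * N_STATES       # one table entry per state
    visits = [0] * N_STATES
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            s2, r = step(s)
            visits[s] += 1
            alpha = 1.0 / visits[s]          # decreasing learning rate
            V[s] += alpha * (r + gamma * V[s2] - V[s])
            s = s2
    return V

random.seed(0)
values = td0_evaluate()
print([round(v, 2) for v in values])  # values grow toward the terminal state
```

Every assumption in this sketch fails at scale in games such as Backgammon: values must be approximated rather than tabulated, exploration is governed by self-play, and learning rates are tuned rather than annealed, so the theoretical guarantee no longer applies.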
In any application of RL, the choice of algorithm is just one among many factors
that determine success or failure. Oftentimes the choice of algorithm is not even the
most significant factor: the choice of representation, formalization, the encoding of
domain knowledge, additional heuristics and variations, proper setting of parameters
can all have great influence. For each of these issues, we can find exciting ideas that
have been developed for conquering specific games. Sadly (but not surprisingly),
there is no “magic bullet”: all approaches are more-or-less game- or genre-specific.
1 In theoretical works, TD-learning refers to a policy evaluation method, while in the
games-related literature it is used in the sense of “actor-critic learning with a critic
using TD-learning”.
