2.1
satisfactory performance in the face of an uncertain environment (see the robust control literature
on this topic, e.g., as described in Reference 2). From a conflict perspective, the controller is one
Access provided by University of Reading on 01/25/18. For personal use only.
For each player, p ∈ P, there is a set of strategies, S_p. The joint strategy set is

S = S_1 × · · · × S_P.

A joint strategy is denoted by

s = (s_1, s_2, . . . , s_P).

It is sometimes convenient to single out player p and write

s = (s_1, . . . , s_p, . . . , s_P) = (s_p, s_−p).

Here, s_−p denotes the set of strategies of players in P \ {p}, i.e., players other than player p. Finally,
for each player, p ∈ P, there is a utility function
Annu. Rev. Control Robot. Auton. Syst. 2018.1. Downloaded from www.annualreviews.org
u_p : S → R
that captures the player’s preferences over joint strategies. That is, for any two joint strategies, s′, s″ ∈ S, player p strictly prefers s′ to s″ if and only if

u_p(s′) > u_p(s″);

i.e., larger is better. If u_p(s′) = u_p(s″), then player p is indifferent between joint strategies s′ and s″.
The vector of utility functions is denoted by u, i.e.,

u = (u_1, u_2, . . . , u_P) : S → R^P.
It is sometimes more convenient to express a game in terms of cost functions rather than utility functions. In this case, for each player, p ∈ P, there is a cost function

c_p : S → R,

and player p strictly prefers s′ to s″ if and only if

c_p(s′) < c_p(s″);

i.e., smaller is better.
2.2. Examples
A game is fully described by the triplet of (a) the player set, P; (b) the joint strategy set, S; and
(c) the vector of utility functions, u (or cost functions, c ). The following examples illustrate the
versatility of this framework.
2.2.1. Two-player matrix games. In a two-player matrix game, there are two players: a row
player and a column player. Each player has a finite set of actions. Accordingly, the utility functions
can be represented by two matrices, Mrow and Mcol , for the row and column players, respectively.
The elements Mrow (i, j ) and Mcol (i, j ) indicate the utility to each player when the row player
selects its ith strategy and the column player selects its j th strategy.
Such games admit the convenient visualization shown in Figure 1a. In this illustration, the
row player’s strategy set is labeled {T, B}, and the column player’s strategy set is {L, R}. In this
case,

M_row = ( a  b ; c  d )   and   M_col = ( e  f ; g  h ).

If the players have more than two strategies, then the matrix representations simply have more
rows and columns. Figure 1b shows representative two-player matrix games that have received
considerable attention for their illustration of various phenomena.

Figure 1

(a)              Column
                 L      R
     Row   T   a, e   b, f
           B   c, g   d, h

(b)    Stag hunt            Prisoner’s dilemma       Shapley fashion game
         S      H               C      D                R      G      B
   S   3, 3   0, 2         C   3, 3   0, 4        R   0, 1   1, 0   0, 0
   H   2, 0   2, 2         D   4, 0   1, 1        G   0, 0   0, 1   1, 0
                                                  B   1, 0   0, 0   0, 1

Two-player matrix games. (a) A basic visualization. The players here are labeled “row” and “column”; the
row player’s strategies are denoted as top (T) and bottom (B), and the column player’s strategies are denoted
as left (L) and right (R). In each cell, the first variable is the payoff for the row player, and the second is the
payoff for the column player. (b) Three examples of two-player matrix games. In a stag hunt, the players can
choose to hunt either a stag (S) or a hare (H); a stag is worth more than a hare, but successfully hunting a stag
requires the cooperation of the other player. In the prisoner’s dilemma, the players can either cooperate (C)
or defect (D); choosing to defect when the other player chooses to cooperate provides the largest payoff for
the defecting player, but a defection by both players results in a smaller payoff than mutual cooperation. In
the Shapley fashion game, the players can choose to wear red (R), green (G), or blue (B); the fashion leader
(row player) wants to wear a color that contrasts with what the follower (column player) is wearing, whereas
the follower wants to wear the same color as the leader.
2.2.2. Randomized (mixed) strategies. Continuing with the discussion of two-player matrix
games, let us redefine the strategy sets as follows. For any positive integer n, define Δ(n) to be the
n-dimensional probability simplex, i.e.,

Δ(n) = { π ∈ R^n : π_i ≥ 0 and Σ_{i=1}^{n} π_i = 1 }.

Now redefine the strategy sets as follows. If the matrices M_row and M_col are of dimension n × m,
define

S_row = Δ(n) and S_col = Δ(m).
The strategies of each player can be interpreted as the probabilities of selecting a row or column.
The utility functions over these randomized strategies are now defined as

u_row(π_row, π_col) = π_row^T M_row π_col   and   u_col(π_row, π_col) = π_row^T M_col π_col.

These redefined utility functions can be identified with the expected outcome of the previous
deterministic utility functions over independent randomized strategies.
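As a concrete illustration, here is a minimal sketch (the helper function name is ours; the payoff entries are the stag hunt values from Figure 1b) of the bilinear expected-utility computation π_row^T M π_col:

```python
# Sketch (helper name is ours): expected utility of randomized strategies
# in a two-player matrix game, using the stag hunt payoffs of Figure 1b.
M_row = [[3, 0],   # row player's payoffs (rows: S, H; columns: S, H)
         [2, 2]]
M_col = [[3, 2],   # column player's payoffs
         [0, 2]]

def expected_utility(M, pi_row, pi_col):
    """Bilinear form pi_row^T M pi_col under independent randomization."""
    return sum(pi_row[i] * M[i][j] * pi_col[j]
               for i in range(len(M)) for j in range(len(M[0])))

# Both players hunt the stag with probability 1/2.
u_row = expected_utility(M_row, [0.5, 0.5], [0.5, 0.5])  # 0.25*(3+0+2+2)
```

Putting all probability mass on a single action recovers the deterministic payoffs.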
2.2.3. Parallel congestion games. In a parallel congestion game, there is a collection of parallel
roads from A to B,
R = {1, 2, . . . , R} .
Although there is a finite set of players, the number of players is typically much larger than the
number of roads, i.e., P ≫ R. Each player must select a single road, and so the strategy set of
each player is R. For each road, there is a function that expresses the congestion on that road as a
function of the number of players using the road. That is, for each r ∈ R, there is a road-specific
congestion function
κ_r : Z_+ → R.
A natural assumption is that the congestion functions are increasing. For the joint strategy s, let
ν_r(s) denote the number of players p such that s_p = r, i.e., the number of users of road r.
Players seek to avoid congestion. Accordingly, the cost function of player p depends on the
selected road, r, and the road selection of other players, s_−p. In terms of the congestion functions,
the cost function of player p evaluated at the joint strategy s = (r, s_−p) is

c_p(r, s_−p) = κ_r(ν_r(s)).
2.2.4. Repeated prisoner’s dilemma. In the presentation of the prisoner’s dilemma shown in
Figure 1b, the strategy set of each player is {C, D}, whose elements stand for cooperate and defect,
respectively. We now consider a scenario in which the matrix game is played repeatedly over an
infinite series of stages, t = 0, 1, 2, . . . . At stage t, both players simultaneously select an action in
{C, D}. The actions are observed by both players, and the process repeats at stage t + 1 and so on.
Let a(t) denote the joint actions of both players at stage t, so that a(t) can take one of four values:
(C, C), (C, D), (D, C), or (D, D). At stage t, the observed history of play is

H(t) = (a(0), a(1), . . . , a(t − 1)).
Let H∗ denote the set of such finite histories. In the repeated prisoner’s dilemma, a strategy is a
mapping
s : H∗ → {C, D}.
In words, a strategy is a reaction rule that dictates the action a player will take as a function of the
observed history of play. Two example strategies that have played an important role in the analysis
of the prisoner’s dilemma are grim trigger and tit-for-tat. In grim trigger, the player cooperates
(i.e., selects C) as long as the other player has always cooperated:
a_p(t) =  C,  a_−p(τ) ≠ D, ∀τ < t;
          D,  otherwise.
In tit-for-tat, the player selects the action that the other player selected in the previous round:
a_p(t) =  C,             t = 0;
          a_−p(t − 1),   t ≥ 1.
Here, p stands for either the row or column player, and, as usual, −p stands for the other
player.
To complete the description of the game, we still need to define the utility functions. Let
s = (srow , scol ) denote a pair of strategies for the row and column players. Any pair of strategies,
s, induces an action stream, a(0; s ), a(1; s ), . . . . Let Up (a(t; s )) denote the single-stage utility to
player p at stage t under the (strategy-dependent) joint action a(t; s ). Now define the repeated
game utility function:
u_p(s) = (1 − δ) Σ_{t=0}^{∞} δ^t U_p(a(t; s)),
where the discount factor, δ, satisfies 0 < δ < 1. In words, the utility function is the future
discounted sum of the single-stage payoffs.
The repeated prisoner’s dilemma is an example of a dynamic game, which is a model of an
evolving scenario. This example illustrates that the basic framework of Section 2.1 is versatile
enough to accommodate (a) infinite strategy sets, since there are an infinite number of reac-
tion rules, and (b) utility functions that reflect payoff streams that are realized over an infinite
time.
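To make this concrete, here is a minimal sketch (helper names are ours; the single-stage payoffs for the row player are taken from the prisoner's dilemma of Figure 1b) of the two reaction rules and the discounted utility of the play they induce:

```python
# Sketch (helper names are ours): grim trigger and tit-for-tat as reaction
# rules from the opponent's observed history to an action, together with
# the discounted utility of the induced action stream.
U_ROW = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 4, ('D', 'D'): 1}

def grim_trigger(opponent_history):
    # Cooperate as long as the opponent has never defected.
    return 'C' if 'D' not in opponent_history else 'D'

def tit_for_tat(opponent_history):
    # Cooperate at t = 0, then mirror the opponent's previous action.
    return 'C' if not opponent_history else opponent_history[-1]

def discounted_utility(s_row, s_col, delta, horizon=200):
    """(1 - delta) * sum_t delta^t U_row(a(t)); truncated at `horizon`."""
    hist_row, hist_col, total = [], [], 0.0
    for t in range(horizon):
        a_row = s_row(hist_col)   # each rule reacts to the *other* history
        a_col = s_col(hist_row)
        total += (1 - delta) * delta ** t * U_ROW[(a_row, a_col)]
        hist_row.append(a_row)
        hist_col.append(a_col)
    return total

# Grim trigger versus tit-for-tat cooperates forever, so the discounted
# utility approaches the mutual-cooperation payoff of 3.
u = discounted_utility(grim_trigger, tit_for_tat, delta=0.9)
```

The truncation at a finite horizon introduces an error on the order of delta**horizon, which is negligible here.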
The best-response set of player p to the strategies, s_−p, of the other players is

BR_p(s_−p) = {s_p ∈ S_p : u_p(s_p, s_−p) ≥ u_p(s_p′, s_−p), ∀s_p′ ∈ S_p}.

In words, BR_p(s_−p) is the set of strategies that maximize the utility of player p in response to the
strategies s_−p of other players. Note that there need not be a unique maximizer, and so BR_p(s_−p)
is, in general, set valued. A joint strategy s∗ is a Nash equilibrium if s_p∗ ∈ BR_p(s_−p∗) for every
player p ∈ P.
2.3.1. Nash equilibria in two-player matrix games. For the stag hunt, there are two Nash
equilibria, (S, S) and (H, H); for the prisoner’s dilemma, the sole Nash equilibrium is (D, D); and
for the Shapley fashion game, there is no Nash equilibrium. These examples illustrate that games
can exhibit multiple, unique, or no Nash equilibria.
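These counts are easy to check mechanically. The following sketch (our own enumeration routine; payoffs transcribed from Figure 1b, with index 0 standing for S, C, and R in the three games) tests every joint strategy for mutual best responses:

```python
from itertools import product

# Sketch (our own routine): enumerate pure-strategy Nash equilibria of a
# two-player matrix game given the two payoff matrices.
def pure_nash_equilibria(M_row, M_col):
    """(i, j) pairs where each strategy is a best response to the other."""
    n, m = len(M_row), len(M_row[0])
    eqs = []
    for i, j in product(range(n), range(m)):
        row_best = all(M_row[i][j] >= M_row[k][j] for k in range(n))
        col_best = all(M_col[i][j] >= M_col[i][l] for l in range(m))
        if row_best and col_best:
            eqs.append((i, j))
    return eqs

stag_row, stag_col = [[3, 0], [2, 2]], [[3, 2], [0, 2]]
pd_row, pd_col = [[3, 0], [4, 1]], [[3, 4], [0, 1]]
shapley_row = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
shapley_col = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

stag_eqs = pure_nash_equilibria(stag_row, stag_col)           # (S,S) and (H,H)
pd_eqs = pure_nash_equilibria(pd_row, pd_col)                 # (D,D) only
shapley_eqs = pure_nash_equilibria(shapley_row, shapley_col)  # none
```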
In the case of randomized strategies, the stag hunt inherits the same two Nash equilibria, now
expressed as probability vectors: both players play (Pr[S], Pr[H]) = (1, 0), or both players play
(Pr[S], Pr[H]) = (0, 1).
2.3.2. Nash equilibria in parallel congestion games. At a Nash equilibrium, no player has an
incentive to use a different road. Accordingly, all of the roads that are being used have (approxi-
mately) the same level of congestion.
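A small simulation illustrates this equalization. The sketch below is our own construction (affine congestion functions κ_r(k) = a_r·k and round-robin best responses; congestion games of this form always admit such an equilibrium, and best-response updates reach it):

```python
# Sketch (our own construction): round-robin best-response dynamics in a
# parallel congestion game with affine congestion functions
# kappa_r(k) = a_r * k; a player's cost is the congestion of its own road.
A = (1.0, 2.0, 3.0)          # congestion coefficients a_r per road

def congestion(road, load):
    return A[road] * load

def best_response(s, p):
    """Road minimizing player p's cost, holding the other players fixed."""
    def cost(r):
        load = sum(1 for q, rq in enumerate(s) if q != p and rq == r) + 1
        return congestion(r, load)
    return min(range(len(A)), key=cost)

# Start with all 12 players on road 0 and iterate best responses.
s = [0] * 12
for _ in range(50):
    for p in range(len(s)):
        s[p] = best_response(s, p)

loads = [s.count(r) for r in range(len(A))]
costs = [congestion(r, loads[r]) for r in range(len(A)) if loads[r] > 0]
# All used roads end up with (approximately) the same congestion level.
```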
2.3.3. Nash equilibria in repeated prisoner’s dilemma games. One can show that grim trigger
versus grim trigger constitutes a Nash equilibrium in the case that the discount factor, δ, is
sufficiently close to 1. The standard interpretation is that cooperation can be incentivized provided
that the players have a long-term outlook. By contrast, the single-stage (or even finitely repeated)
prisoner’s dilemma has a unique Nash equilibrium of always defect versus always defect.
In a two-player zero-sum game, the utility functions satisfy

u_1(s_1, s_2) + u_2(s_1, s_2) = 0, ∀s_1 ∈ S_1, s_2 ∈ S_2.
Accordingly, an increase in utility for one player results in a decrease in utility for the other player
(by the same amount). Given this special structure, zero-sum games are usually expressed in terms
of a single objective function, φ(s 1 , s 2 ), that is the cost function of player 1 (the minimizer) and
the utility function of player 2 (the maximizer). In terms of the original formulation,
φ(s 1 , s 2 ) = −u 1 (s 1 , s 2 ) = u 2 (s 1 , s 2 ).
Define the upper and lower values

val̄[φ] = inf_{s_1 ∈ S_1} sup_{s_2 ∈ S_2} φ(s_1, s_2),
val̲[φ] = sup_{s_2 ∈ S_2} inf_{s_1 ∈ S_1} φ(s_1, s_2).

The quantity val̄[φ] represents the best guaranteed cost for the minimizer in the worst-case sce-
nario of its strategy, s_1, being known to the maximizer. The quantity val̲[φ] has an analogous
interpretation for the maximizing player. In general,

val̲[φ] ≤ val̄[φ].
For zero-sum games, a Nash equilibrium is characterized by the following saddle-point condition.
The pair of strategies s_1∗ and s_2∗ is a Nash equilibrium if

φ(s_1∗, s_2) ≤ φ(s_1∗, s_2∗) ≤ φ(s_1, s_2∗), ∀s_1 ∈ S_1, s_2 ∈ S_2.

This inequality indicates that s_1∗ is the best response (for the minimizer) to s_2∗ and vice versa. If a
zero-sum game has a Nash equilibrium, then it has a value, with

val̲[φ] = val̄[φ] = φ(s_1∗, s_2∗).
The converse need not be true, depending on whether the associated infimum or supremum
operations are achieved.
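For finite strategy sets, both values are directly computable. A minimal sketch (helper names are ours), with player 1 choosing a row of the cost matrix to minimize and player 2 a column to maximize:

```python
# Sketch (helper names are ours): upper and lower values of a finite
# zero-sum game in pure strategies, for a cost matrix phi[i][j].
def upper_value(phi):
    # Minimizer commits first, maximizer reacts: min_i max_j phi[i][j].
    return min(max(row) for row in phi)

def lower_value(phi):
    # Maximizer commits first: max_j min_i phi[i][j].
    return max(min(col) for col in zip(*phi))

pennies = [[1, -1], [-1, 1]]   # matching pennies: no pure saddle point
saddle = [[2, 1], [3, 0]]      # saddle point at (0, 0) with value 2
```

Matching pennies has a strict gap between the two values, reflecting the absence of a pure-strategy saddle point; in the second matrix the values coincide at the saddle point.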
In control problems, one is often interested in a worst-case analysis against an environmental
opponent. Taking the viewpoint that the control design is the minimizer, define

φ_wc(s_1) = sup_{s_2 ∈ S_2} φ(s_1, s_2),

i.e., the worst-case outcome as a function of the strategy s_1. Let us call a strategy, s_1, that achieves

φ_wc(s_1) ≤ val̄[φ] + ε

an ε-security strategy for the minimizer.
The pursuer and evader have control inputs u(t) ∈ U and v(t) ∈ V, respectively, which may be subject
to various constraints that model effects such as bounded velocity, acceleration, or turning radius,
as captured by the sets U and V. Let x = (x^p, x^e) denote the concatenated state vector. The
evader is considered to be captured by the pursuer at time t if

ℓ(x(t), t) = 0, 1.

for a specified capture function, ℓ.
An example is the so-called homicidal chauffeur problem (17), where the pursuer is faster than
the evader but the evader is more agile. These constraints are modeled in two dimensions as
ẋ_1^e = v_1,
ẋ_2^e = v_2.

The constraints are v_1^2(t) + v_2^2(t) ≤ (V^e)^2, where V^e is the maximum evader speed and V^e < V^p.
We are interested in finding closed-loop strategies for the pursuer and evader of the form

u(t) = s^p(t, x(t)),   v(t) = s^e(t, x(t)),

which are time- and state-dependent feedback laws. One can also formulate alternative variations,
such as partial observations of the opponent’s state, in which case a closed-loop strategy may be a
function of the history of observations.
Finally, we define an objective function of the form

φ(s^p, s^e) = ∫_0^T g(t, x(t), u(t), v(t)) dt + q(T, x(T)), 2.
where the final time, T , is defined as the shortest time that satisfies the termination condition of
Equation 1. An example setup from Ho et al. (16) is
φ(s^p, s^e) = x^T(T) Q x(T) + ∫_0^T [ u^T(t) R^p u(t) − v^T(t) R^e v(t) ] dt,
ℓ(x, t) = T_max − t.
In this formulation, the game ends at a specified terminal time, T_max. The terminal objective is
some quadratic function of the pursuer and evader states, such as the norm of the distance between
the pursuer and evader locations. The integrated objective penalizes the weighted control energy
of both the (minimizing) pursuer and (maximizing) evader.
Of course, the remaining challenge is to characterize strategies that constitute a Nash equilib-
rium. Note that for a fixed strategy of the opponent, the remaining player is left with a one-sided
optimal control problem (see 18). This insight leads to the Isaacs equation that defines a partial
differential equation for an unknown value function, J(t, x). Let
f(x, u, v) = ( f^p(x^p, u), f^e(x^e, v) )

denote the concatenated dynamics. The Isaacs equation is then

−∂J(t, x)/∂t = min_{u∈U} max_{v∈V} [ g(t, x, u, v) + (∂J(t, x)/∂x) f(x, u, v) ], 3.

with s^p(t, x) and s^e(t, x) denoting the associated minimizing and maximizing arguments.
Theorem 1 (from Reference 10, theorem 8.1). Assume that there exists a contin-
uously differentiable function, J(t, x), that satisfies (a) the Isaacs equation shown as
Equation 3 and (b) the boundary condition J(T, x) = q(T, x) on the set ℓ(T, x) = 0. Assume
further that the associated trajectories induced by u(t) = s^p(t, x) and v(t) = s^e(t, x)
generate trajectories that terminate in finite time. Then the pair (s p , s e ) constitutes a
Nash equilibrium.
Theorem 1 leaves open the issue of constructing a solution to the Isaacs equation, which can
be computationally challenging even for low-dimensional problems. This issue has motivated a
variety of constructive approaches.
3.1.1. Backwards reachability. Suppose that the capture condition of Equation 1 does not de-
pend on time but is only a function of the pursuer and evader states. Now define the goal set

G_0 = {x : ℓ(x) = 0}.
The backwards reachability problem (for the pursuer) is to find all states that can be driven to
G0 regardless of the control inputs of the evader, i.e., a forced capture. Backwards reachability
approaches are closely related to methods of set invariance (e.g., 19) and viability theory (e.g., 20)
and are connected to the discussion of optimal disturbance rejection in Section 3.3.
It is convenient to describe the main idea using the discrete-time pursuit–evasion dynamics

x(t + 1) = f(x(t), u(t), v(t)),

subject to constraints u(t) ∈ U and v(t) ∈ V. First, define the following set:
G_1 = {x : ∃u ∈ U s.t. f(x, u, v) ∈ G_0, ∀v ∈ V}.
In words, G1 is the set of all states that can be forced (i.e., regardless of the evader’s action, v) to
the target set in one time step. Proceeding in a similar manner, recursively define
G_{k+1} = {x : ∃u ∈ U s.t. f(x, u, v) ∈ G_k, ∀v ∈ V}.
By construction, if ever x0 ∈ Gk for some k, then the state trajectory starting from an initial
condition of x0 can be forced to the target set in k time steps, again regardless of the evader’s
future actions. Alternatively, if an initial condition x_0 does not lie in ∪_{k=1}^{∞} G_k, then the evader can
perpetually avoid the target set.
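The recursion is directly computable for finite state and action sets. The following sketch is our own toy instance (not from the text): pursuit–evasion on a line of 10 cells, with the pursuer moving up to 2 cells per step, the evader up to 1, simultaneous moves, and capture defined as ending within one cell of the evader:

```python
from itertools import product

# Toy instance (our own construction) of the backwards reachability
# recursion G_{k+1} = {x : exists u in U s.t. f(x, u, v) in G_k for all v}.
# State x = (pursuer cell, evader cell) on a line of N cells.
N = 10
U = [-2, -1, 0, 1, 2]   # pursuer is faster
V = [-1, 0, 1]

def clamp(p):
    return max(0, min(N - 1, p))

def f(x, u, v):
    return (clamp(x[0] + u), clamp(x[1] + v))

reachable = {(p, q) for p, q in product(range(N), range(N))
             if abs(p - q) <= 1}             # G0: the capture set
while True:
    grown = reachable | {
        x for x in product(range(N), range(N))
        if any(all(f(x, u, v) in reachable for v in V) for u in U)}
    if grown == reachable:
        break
    reachable = grown

# In this toy game, the faster pursuer can force capture from every
# initial condition, so the fixed point covers the whole state space.
```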
Mitchell et al. (21) developed these ideas for continuous-time dynamics. In particular, they as-
sociated backwards reachability sets with a terminal-penalty pursuit–evasion problem [the integral
cost g(·) = 0 in Equation 2] and represented the set Gτ (for nondiscrete values of τ ) as the level
set of a numerically constructed function.
3.1.2. Approximate methods. The backwards reachability approach seeks to exactly characterize
all states that lead to capture. An alternative is to exploit special simplified settings to derive a
specific strategy that guarantees capture (not necessarily optimizing any specific criterion). One
approach, taken by Zhou et al. (22), is to compute the Voronoi cell of the evader, i.e., the set
of points that the evader can reach before the pursuer. The control law derived by Zhou et al.
(22) minimizes the instantaneous time rate of change of the area of the evader’s Voronoi cell.
The authors went on to derive the minimizing control law and establish that this control law
guarantees eventual capture, where capture is defined as the proximity condition ‖x^p − x^e‖ ≤ δ
for some capture radius δ.
Karaman & Frazzoli (23) considered a probabilistic approach to compute an escape path for an
evader, if one exists. The algorithm is based on random trajectory generation for both the evader
and pursuer. The outcome is an open-loop trajectory (i.e., not a feedback law) that, with high
probability, guarantees escape for all trajectories of the pursuer. Here, the term high probability
refers to a high probability of the algorithm producing a guaranteed escape trajectory—if one
exists—but the movements of the evader and pursuer are not randomized. Randomized pursuit
trajectories were considered by Isler et al. (24). The game is played in a nonconvex polygonal
region, where the pursuer’s objective is to have the evader in its line of sight. The authors showed
that randomized open-loop pursuit trajectories can lead to capture in settings where deterministic
trajectories can be perpetually evaded.
Note that backwards reachability and approximate methods are more aligned with the above-
mentioned worst-case or minimax formulation in that they do not attempt to address the construc-
tion of a pair of strategies that constitutes a Nash equilibrium. Rather, the objective is to derive a
strategy for one of the players that guarantees a desired outcome regardless of the behavior of the
other player.
a problem of induced-norm minimization that makes this connection explicit. For an extensive
technical presentation and a historical overview, see Reference 26.
The induced-norm minimization problem is to design a controller that minimizes the effect
of exogenous disturbances. This objective can be stated as follows. Let T [K ] denote an operator
that represents a closed-loop system as a function of the controller, K . For example (see 2),
T[K] = ( W_1 (I + P K)^{−1} ; W_2 K (I + P K)^{−1} ),
where P is the plant to be controlled and W 1 and W 2 are dynamic weighting operators (e.g.,
frequency-shaping filters).
For any stabilizing controller, define the induced norm

‖T[K]‖_{ind,2} = sup_{v ∈ L_2, v ≠ 0} ‖T[K]v‖_2 / ‖v‖_2.

The induced-norm minimization problem is then

min_{K stabilizing} ‖T[K]‖_{ind,2}.
In addition to the face-value norm minimization, this objective function is also relevant to the
problem of robust stabilization, i.e., stabilization in the presence of dynamic modeling uncertainty
(see 27). One can also formulate optimal estimation problems in a similar manner (e.g., 28).
An example setup is the case of state feedback controllers. For the linear system

ẋ = Ax + Bu + Lv, x(0) = x_0,
z = ( Cx ; u ),
u is the control signal, v is an exogenous disturbance, and z gathers signals to be made small in
the presence of disturbances (tracking error, control authority, etc.). A state feedback controller,
K ∈ R^{m×n}, is stabilizing if A − BK is a stability matrix (i.e., its eigenvalues have strictly negative
real parts). Given a stabilizing state feedback, the closed-loop system is
ẋ = (A − BK)x + Lv, x(0) = x_0,
z = ( Cx ; −Kx ), 4.
The closed-loop system achieves disturbance attenuation level γ > 0 if

sup_{v ∈ L_2, v ≠ 0} ‖z‖_2 / ‖v‖_2 ≤ γ

or, equivalently,

sup_{v ∈ L_2} ( ‖z‖_2^2 − γ^2 ‖v‖_2^2 ) ≤ 0.
By iterating on γ , one can then seek to derive controllers that make γ as small as possible.
The associated objective function for the zero-sum game formulation, with u as the minimizer
and v the maximizer, is
φ(u, v) = ∫_0^∞ [ x^T(t) C^T C x(t) + u^T(t) u(t) ] dt − γ^2 ∫_0^∞ v^T(t) v(t) dt. 5.
Theorem 2. Assume that the pair (A, B) is stabilizable and (A, C) is observable. The
The above is just a sample of the extensive results concerning linear–quadratic games; for a
dedicated discussion, see Reference 30.
but otherwise the disturbance could have an arbitrarily large magnitude. Another model of an
adversarial disturbance is that it takes values in a bounded domain. This unknown-but-bounded
approach has its origins in References 31 and 32 and is related to set-invariance methods (19),
viability theory (20), and the backwards reachability method discussed in Section 3.1.
The discussion here follows Bertsekas (33, his sections 1.6 and 4.6.2). We consider discrete-time
dynamics of the form

x(t + 1) = f(x(t), u(t), v(t)),

subject to constraints u(t) ∈ U and v(t) ∈ V. (For added generality, one can also have the constraint
sets be state dependent.) We are interested in finding the (possibly time-dependent) state feedback
u(t) = μt (x(t)) ∈ U
to minimize
max_{v(0), v(1), . . . , v(T−1) ∈ V} [ q(x(T)) + Σ_{t=0}^{T−1} g(x(t), μ_t(x(t)), v(t)) ]
over some time horizon, [0, 1, . . . , T ]. Let μ denote a collection of state feedback laws
(μ0 (·), μ1 (·), . . . , μT −1 (·)). The order of operations here is that the controller (or minimizer) com-
mits to μ and then the constrained disturbance reacts to maximize the cost.
The dynamic programming solution follows the recursive procedure of value iteration. Define
J_0(x) = q(x),
J_k(x) = min_{u∈U} max_{v∈V} [ g(x, u, v) + J_{k−1}(f(x, u, v)) ].
The connection to zero-sum games is that minimax dynamic programming provides a procedure
to construct an optimal security strategy from the perspective of the minimizer.
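A direct implementation on a toy problem (our own instance: scalar integer state x(t+1) = x(t) + u(t) + v(t) saturated to a grid, with stage and terminal cost |x|) shows the recursion at work:

```python
# Toy minimax dynamic program (our own instance). The controller picks u,
# the disturbance picks v, and the state is saturated to the grid [-10, 10].
U = [-1, 0, 1]
V = [-1, 0, 1]
STATES = range(-10, 11)

def f(x, u, v):
    return max(-10, min(10, x + u + v))

def g(x, u, v):
    return abs(x)            # stage cost

def q(x):
    return abs(x)            # terminal cost

J = {x: q(x) for x in STATES}                       # J_0
for _ in range(5):                                  # J_1 through J_5
    J = {x: min(max(g(x, u, v) + J[f(x, u, v)] for v in V) for u in U)
         for x in STATES}
```

The resulting cost-to-go inherits the symmetry of the problem and grows with |x|; the minimizing u at each state is the associated security strategy for the controller.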
3.3.1. Controlled invariance. The problem of controlled invariance is to maintain the state in
a specified region in the presence of disturbances. In the case of linear systems, consider

x(t + 1) = Ax(t) + Bu(t) + Lv(t),

subject to |v|_∞ ≤ 1 and |u|_∞ ≤ u_max. Our objective is to maintain the state in the bounded region
|x|∞ ≤ 1. One can map this problem to the minimax dynamic programming approach by defining
the stage and terminal costs as
q(x) = g(x, u, v) =  0,  |x|_∞ ≤ 1;
                     1,  otherwise.
Then an initial condition, x_0, satisfies J_T(x_0) = 0 if and only if there exists a policy, μ, that assures
that the state will satisfy |x(t)|_∞ ≤ 1 for all t = 0, 1, . . . , T.
One can mimic the backwards reachability algorithm to construct a representation of the cost-
to-go functions, J_k(·), as follows. Define

G_0 = {x : |x|_∞ ≤ 1},
G_1 = {x : ∃u with |u|_∞ ≤ u_max s.t. Ax + Bu + Lv ∈ G_0, ∀ |v|_∞ ≤ 1} ∩ G_0.

Then x_0 ∈ G_1 if and only if J_1(x_0) = 0 or, equivalently, there exists a control action that keeps
|x(t)|_∞ ≤ 1 for t = 0 and t = 1. Proceeding recursively, define

G_k = {x : ∃u with |u|_∞ ≤ u_max s.t. Ax + Bu + Lv ∈ G_{k−1}, ∀ |v|_∞ ≤ 1} ∩ G_{k−1}.
Note that the only difference between this algorithm and that of backwards reachability is the
recursive intersection. Although computationally costly, one can use polyhedral representations
of these sets (e.g., 19). That is, one can recursively construct matrices, Mk , such that
G_k = {x : |M_k x|_∞ ≤ 1}.
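For a scalar system the sets G_k are symmetric intervals, and the recursion reduces to one number per step. A sketch with our own example values (a = 2, b = 1, l = 0.25, u_max = 1):

```python
# Sketch (our own scalar example): the controlled-invariance recursion
#   G_k = {x : exists |u| <= u_max s.t. a*x + b*u + l*v in G_{k-1}
#              for all |v| <= 1}  intersected with  G_{k-1}
# for x(t+1) = a*x(t) + b*u(t) + l*v(t). Each G_k is an interval
# [-beta_k, beta_k], so the recursion is a scalar update on beta_k.
a, b, l, u_max = 2.0, 1.0, 0.25, 1.0

def next_beta(beta):
    # The worst-case disturbance consumes |l| of the margin, while the
    # control can cancel up to |b|*u_max of |a*x|.
    return min(beta, (beta - abs(l) + abs(b) * u_max) / abs(a))

beta = 1.0                    # G0 = {x : |x| <= 1}
for _ in range(60):
    beta = next_beta(beta)

# beta converges to the maximal controlled-invariant interval [-0.75, 0.75].
```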
It turns out that the controlled invariance problem is also related to minimizing an induced
norm of the form
‖T[K]‖_{ind,∞} = sup_{v ∈ L_∞, v ≠ 0} ‖T[K]v‖_∞ / ‖v‖_∞,

where, for a function f : Z_+ → R^n,

‖f‖_∞ = sup_{t=0,1,...} |f(t)|_∞.
For this discussion, we assume that the parameter takes on finitely many values,

θ(t) ∈ Q, ∀t = 0, 1, 2, . . . ,

for some finite set, Q. One can also add constraints like bounds on the rate of change, such as

|θ(t + 1) − θ(t)| ≤ δ_max.
We are interested in deriving a collection of state feedback laws (μ0 (·), μ1 (·), . . . , μT −1 (·)) to
optimize
sup_{θ(0), θ(1), . . . , θ(T−1) ∈ Q} Σ_{t=0}^{T−1} g(x(t), μ_t(x(t)))
along solutions of Equation 6. This structure falls within the above-mentioned minimax dynamic
programming, where the game is between the control, u, and parameter, θ.
The following results, along with explicit computational methods, were presented in
Reference 38.
Theorem 3. Assume that the stage cost has the polyhedral representation
g(x, u) = |M_g x + N_g u|_∞.
Then there exists a sequence of matrices, Mk , so that the cost-to-go functions of minimax
dynamic programming satisfy
J_k(x) = |M_k x|_∞.
Furthermore, the stationary (receding horizon) feedback policy
μ_T(x(t)) = arg min_u max_{θ∈Q} [ g(x(t), u) + J_T( A(θ)x(t) + Bu ) ]
asymptotically approximates the optimal infinite horizon policy for sufficiently large
horizon lengths, T .
according to some known probability distribution, π ∈ Δ(K), where π_k is the probability that the
matrix M_k is selected.
The informed player knows which matrix was selected, whereas the uninformed player does not.
Furthermore, a common assumption is that the uninformed player cannot measure the realized
payoff at each stage. Accordingly, the strategy of the informed player can depend on the selected
characteristic matrix, whereas the strategy of the uninformed player can depend only on past
actions. Following the notation of Section 2.2.4, let H∗ denote the set of finite histories of play.
Then a (behavioral) strategy for the informed row player is a mapping

s_row : H∗ × {M_1, . . . , M_K} → Δ(P),

whereas a strategy for the uninformed column player depends only on the history of play, as in

s_col : H∗ → Δ(Q).
This setup, as with more complicated variations, has received considerable attention, from the
seminal work of Aumann et al. (39) through the more recent monograph by Mertens et al.
(40).
A motivating setting is network interdiction (e.g., 41–43). In network interdiction prob-
lems, the activities of a network owner (defender) are being observed by an attacker. The
network owner is the informed player in that the owner knows the details of the network,
e.g., which parts of the network are protected or susceptible. Based on these observations, the
attacker launches a limited attack to disable a portion of the network. The network owner
faces a trade-off of exploitation versus revelation. If the network owner exploits its superior
information, it may reveal sensitive information to the attacker and thereby increase the like-
lihood of a critical attack. (For further discussion of game theory for network security, see
44.)
Returning to the repeated zero-sum game problem, it turns out that the optimal strategy of
the informed player is to randomize its actions based on the following model of the uninformed
player. First, assume that the uninformed player is executing Bayesian updates to compute the
posterior probability
π_k^post[H(t)] = Pr[M_k | H(t)],

which is the posterior probability that the matrix M_k was selected based on the observed sequence
of past actions, H(t). The action of the uninformed player is then modeled as a myopically optimal
randomized strategy with respect to these posterior probabilities:
σ_col(H(t)) = arg max_{π_col ∈ Δ(Q)} min_{π_row ∈ Δ(P)} π_row^T ( Σ_{k=1}^{K} π_k^post[H(t)] M_k ) π_col.
A classic result (see 39) is that the informed row player’s optimal security strategy is a best
response to the above column player strategy. The resulting strategy will randomize according to
probabilities that depend on both the stage of play and the online outcome of prior stages. One can
think of this strategy as a deception attempt to influence the posterior beliefs of the uninformed
attacker. Recent work has presented efficient algorithms for the computation of these probabilities
based on recursive linear programs (42, 43, 45, 46).
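The Bayesian update itself is elementary. A sketch in our own notation, where sigma[k][a] is the probability that the informed player selects action a when the chosen matrix is M_k:

```python
# Sketch (our own notation): the uninformed player's Bayesian update of
# the belief pi over the K characteristic matrices after one observation.
def bayes_update(pi, sigma, action):
    """Posterior Pr[M_k | observed action] via Bayes' rule."""
    joint = [p * s[action] for p, s in zip(pi, sigma)]
    total = sum(joint)
    return [j / total for j in joint]

# Two possible matrices with prior 1/2 each. Under M_0 the informed player
# plays action 0 with probability 0.9; under M_1, with probability 0.5.
pi0 = [0.5, 0.5]
sigma = [[0.9, 0.1], [0.5, 0.5]]
post = bayes_update(pi0, sigma, 0)   # observing action 0 favors M_0
```

A fully nonrevealing strategy (identical rows of sigma) leaves the posterior equal to the prior, which is precisely the informed player's means of withholding information.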
4. TEAM GAMES
We now discuss another class of games that has had a long-standing relationship with control
theory: team games. In team games, there can be multiple players, 1, 2, . . . , P . The defining
characteristic is that all players have the same preferences. Recall that S = S1 × · · · × S P is the
joint strategy set for players 1, 2, . . . , P . In team games,1 for some cost function, C : S → R,
c1 (s ) = c2 (s ) = · · · = c P (s ) = C(s ).
While team games have been of interest for several decades (e.g., 47), there has been a recent
surge of interest related to distributed control applications, where there is no single centralized
controller that has authority over all control inputs or access to all measurements (e.g., 48–51)
(for an alternative approach to distributed control, see Section 5).
The main issue in team games is that each player has access to different information. If all
players had the same information, then one could approach the game as a conventional central-
ized control design problem. Players with different information can significantly complicate the
characterization and, more importantly, the explicit and efficient computational construction of
optimal strategies.
1. In accordance with much of the literature, we assume that players are minimizers.
Another issue in team games is the distinction between a team-optimal strategy and a Nash
equilibrium. The joint strategy, s∗, is team optimal if

C(s∗) ≤ C(s), ∀s ∈ S,

whereas s∗ is a Nash equilibrium (i.e., person-by-person optimal) if no single player can reduce
the cost by unilaterally changing its strategy:

C(s_p∗, s_−p∗) ≤ C(s_p, s_−p∗), ∀p ∈ P, s_p ∈ S_p.
A classic result that dates back to work by Radner (52) is as follows [the discussion here follows Ho
(47)]. There are P players, and player p can measure the scalar (for simplicity) random variable

z_p = H_p ξ,

where ξ is a random vector. A strategy is a mapping from a player’s measurement, z_p, to its
actions, u_p, i.e., u_p = s_p(z_p). The common cost function is
C(s_1, . . . , s_P) = E[ (1/2) u^T Q u + u^T S ξ ],

where matrices Q = Q^T > 0 and S are cost function parameters and

u = (u_1, . . . , u_P) = (s_1(z_1), . . . , s_P(z_P)).
Theorem 4 (from Reference 52). The unique Nash equilibrium strategies are team
optimal and linear, i.e.,
s_p∗(z_p) = F_p z_p

for matrices F_p.
For this specific case, person-by-person optimality also implies team optimality.
In a more general formulation, the measurements can also depend on the actions of the other
players, as in

z_p = H_p ξ + Σ_q D_pq u_q, 7.

for matrices D_pq. A specific case that has received considerable attention is the Witsenhausen
counterexample (53; see also 54). Here, there are two players, and the random variable ξ = (ξ1 , ξ2 )
is in R2 . The measurements are
z_1 = ξ_1,
z_2 = ξ_2 + u_1,
for positive weights α, β > 0. The standard interpretation of this example is that player 1, through
its action u 1 , faces the dual objective to either cancel the disturbance effect of ξ1 or signal its own
value to u 2 to overcome the noise effect of ξ2 .
Despite the linear–quadratic look and feel of this example, Witsenhausen (53) showed that
there exist nonlinear team-optimal strategies that strictly outperform the best linear strategies.
The team optimality of nonlinear strategies leads to the question of when linear strategies are team
optimal, which ultimately relates to the desire to explicitly construct team-optimal strategies. For
example, one well-known case is that of partially nested information. Continuing with the general
formulation of Equation 7, suppose that the measurements can be decomposed as follows:
z_p = ({z_q, ∀q ∈ O_p}, H_p ξ),
where the sets O p are defined as
O_p = {q | D_pq ≠ 0}.
In words, the measurement of player p includes the information of player q whenever the action
of player q affects the measurement of player p (as indicated by D_pq ≠ 0).
Now suppose that there are two players with common information z_0 and private information z_1 and z_2, respectively, so that strategies take the form
u_1 = s_1(z_0, z_1),
u_2 = s_2(z_0, z_2).
Define the set of mappings from private information to actions
S_1 = {φ_1 | φ_1 : Z_1 → U_1}
and, likewise,
S_2 = {φ_2 | φ_2 : Z_2 → U_2}.
Now imagine a hypothetical coordinator that has access to the common information, z0 . Upon
obtaining a measurement, the coordinator recommends a pair of mappings r1 (·; z0 ) and r2 (·; z0 )
to the two players to apply to their private information, z1 and z2 , respectively. The optimization
problem for the coordinator is then
min_{(r_1, r_2) ∈ R} E[ℓ(ξ, r_1(z_1; z_0), r_2(z_2; z_0))],
i.e., to derive the optimal recommended mappings. The main results of Nayyar et al. (56, 57) establish conditions under which the optimal recommendations, (r_1*(·; z_0), r_2*(·; z_0)), constitute team-optimal strategies, with the associations
s_1*(z_0, z_1) = r_1*(z_1; z_0),
s_2*(z_0, z_2) = r_2*(z_2; z_0).
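The coordinator's optimization can be sketched for a toy finite problem: enumerate every recommendation mapping from private information to actions and pick the pair minimizing the expected common cost. The alphabets and the cost function below are invented purely for illustration, and the common information z_0 is taken to be trivial for brevity.

```python
from itertools import product

Z = [0, 1]                      # private-information alphabet (hypothetical)
U = [0, 1]                      # action alphabet (hypothetical)

def cost(z1, z2, u1, u2):
    # invented team cost: the actions should jointly track z1 XOR z2
    return (u1 + u2 - (z1 ^ z2)) ** 2

# every mapping r: Z -> U, encoded as the tuple (r(0), r(1))
maps = list(product(U, repeat=len(Z)))

def total_cost(r1, r2):
    # expected cost under uniform (z1, z2), up to the constant 1/|Z|^2
    return sum(cost(z1, z2, r1[z1], r2[z2]) for z1 in Z for z2 in Z)

# the coordinator searches over recommendation pairs, not over raw actions
r1_star, r2_star = min(product(maps, maps),
                       key=lambda pair: total_cost(*pair))
```

The key point the sketch mirrors is that the coordinator's decision variables are the mappings (r_1, r_2), turning a decentralized problem into a centralized one over a larger decision space.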
Wind farms: Wind farm control seeks to maximize the total power production of a wind farm. Control strategies in which the individual turbines independently optimize their own power production are suboptimal for the farm as a whole because of the aerodynamic interactions among the turbines (67–70).
Area coverage: In coverage problems, a collection of mobile sensors in an unknown en-
vironment seeks to maximize the area under surveillance. Furthermore, some portions of
the environment may carry more weight than others. Applications range from following a
drifting spill to intruder detection (71–73).
The overarching goal of such distributed control problems is to characterize admissible decision-
making policies for the individual subsystems that ensure the emergent collective behavior is
desirable.
The design of distributed control policies can be approached from a game-theoretic perspective, where the subsystems are modeled as players in a game with designed utility functions. Here, we no longer view equilibrium concepts such as the Nash equilibrium merely as plausible outcomes of a game.
Rather, we view these equilibrium concepts as stable outcomes associated with distributed learn-
ing where the individual subsystems adjust their behavior over time in response to information
regarding their designed utility function and the behavior of the other subsystems.
The following sections present highlights of a game-theoretic approach. Two recent overview
articles are References 3 and 4.
Consider learning dynamics of the form
s_p(t) = F_p(s(0), . . . , s(t − 1); u_p), 8.
meaning that each player adjusts its strategy at time t using knowledge of the previous decisions
of the other players as well as information regarding the structural form of the player’s utility
function. Any learning algorithm that can be expressed in this form is termed uncoupled, as
players are not allowed to condition their choice on information regarding the utility functions of
other players. An example of a decision-making rule of this form is the well-studied best-response
dynamics, in which
s_p(t) ∈ BR_p(s_−p(t − 1)), 9.
where BR p (·) is the best-response set defined in Definition 2. In the best-response dynamics, each
player selects a best response to the behavior of the other players at the previous play of the game.
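As a concrete sketch, the best-response dynamics of Equation 9 can be run on a small two-player matrix game. The payoff matrices below are an illustrative prisoner's-dilemma-style assumption, chosen because each player has a strictly dominant action, so the simultaneous updates settle in one step (in general, simultaneous best responses may cycle).

```python
import numpy as np

# Hypothetical payoff matrices (prisoner's-dilemma style); action 1 is
# strictly dominant for both the row and the column player.
M_row = np.array([[3, 0],
                  [5, 1]])
M_col = np.array([[3, 5],
                  [0, 1]])

i, j = 0, 0                                   # initial joint strategy
for _ in range(10):
    # Equation 9: each player best-responds to the other's previous action;
    # the tuple assignment makes the update simultaneous.
    i, j = int(np.argmax(M_row[:, j])), int(np.argmax(M_col[i, :]))

# the dynamics reach the unique pure Nash equilibrium (1, 1)
```

The final profile satisfies the Nash condition by construction here: neither M_row[:, j] nor M_col[i, :] admits a profitable unilateral deviation.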
An alternative class of learning algorithms to Equation 8 that imposes less informational de-
mands on the players is termed completely uncoupled dynamics or payoff-based dynamics (see
75–77) and is of the form
s_p(t) = F_p(s_p(0), . . . , s_p(t − 1); u_p(s(0)), . . . , u_p(s(t − 1))), 10.
In completely uncoupled dynamics, each agent is given only the payoff that it received
at each play of the game. A player is no longer able to observe the behavior of the other agents
or access the utility that the agent would have received for any alternative choices at any stage
of the game. In problems such as network routing, learning algorithms of the form shown in
Equation 10 may be far more reasonable than those in Equation 8.
Ignoring the informational demands for the moment, the theory of learning in games has sought
to establish whether there exist distributed learning algorithms of the form shown in Equation 8
that will always ensure that the resulting collective behavior reaches a Nash equilibrium (or its
generalizations). Representative papers include References 74, 78, and 79 in the economic game
theory literature and References 80–82 in the control literature.
The following theorem highlights an inherent challenge associated with distributed learning
for Nash equilibrium in general.
Theorem 6 (from Reference 75). There are no natural dynamics of the form shown
in Equation 8 that converge to a pure Nash equilibrium for all games.
The term natural used in Theorem 6, which we do not explicitly define here, seeks to eliminate
learning dynamics that exhibit phenomena such as exhaustive search or centralized coordination.
A key point regarding this negative result is the phrase “for all games,” which effectively means that
there are complex games for which no natural dynamics exist that converge to a Nash equilibrium.
A central question that we seek to address here is whether a system designer can exploit the
freedom to design the players’ utility functions to avoid this negative result.
In a potential game, each player's utility is directly aligned with a common potential function, φ : S → R, in the sense that the change in the utility that a player would receive by unilaterally deviating from a strategy s_p to a new strategy s_p′ when all other players are selecting s_−p is equal to the difference of the potential function over those two strategy profiles, i.e.,
u_p(s_p′, s_−p) − u_p(s_p, s_−p) = φ(s_p′, s_−p) − φ(s_p, s_−p). 11.
Note that there is no mention of what happens to the utility of the other players q ≠ p across these two strategy profiles.
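This alignment between unilateral utility changes and potential changes can be verified exhaustively on a small example. Below is a minimal two-player congestion game with Rosenthal's potential; the setup is an illustrative assumption, not tied to a specific application.

```python
from itertools import product

resources = ("a", "b")

def load(s, r):                     # number of players using resource r
    return sum(1 for choice in s if choice == r)

def utility(s, p):                  # each player pays the load on its resource
    return -load(s, s[p])

def potential(s):                   # Rosenthal potential: -sum_r (1 + ... + load_r)
    return -sum(load(s, r) * (load(s, r) + 1) // 2 for r in resources)

# every unilateral deviation changes the deviator's utility and the
# potential function by exactly the same amount
aligned = True
for s in product(resources, repeat=2):
    for p in range(2):
        for dev in resources:
            s2 = s[:p] + (dev,) + s[p + 1:]
            aligned = aligned and (utility(s2, p) - utility(s, p)
                                   == potential(s2) - potential(s))
```

The check passes for every profile and every deviation, which is exactly the defining property of a potential game.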
There are several desirable properties regarding potential games that are of interest to dis-
tributed control. First, a pure Nash equilibrium is guaranteed to exist in any potential game
because any strategy s ∈ arg max_{s′∈S} φ(s′) is a pure Nash equilibrium. Second, there are natural
dynamics that do in fact lead to a pure Nash equilibrium for any potential game. In fact, the following mild modification of the best-response dynamics given in Equation 9, termed best-response dynamics with inertia, accomplishes this goal:
s_p(t) ∈ BR_p(s_−p(t − 1)) with probability 1 − ε, and s_p(t) = s_p(t − 1) with probability ε, 12.
where ε > 0 is referred to as the player's inertia (79). More formally, the learning algorithm in
Equation 12 converges almost surely to a pure Nash equilibrium in any potential game. Similar
positive results also hold for relaxed versions of potential games, e.g., generalized ordinal potential
games or weakly acyclic games (79), which seek to relax Equation 11 through either the equality
or the condition for all players.
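A minimal simulation of Equation 12 on a three-player, two-resource congestion game (an assumed toy instance) illustrates the convergence result. Ties in the best-response set are broken in favor of the current action, so Nash equilibria are absorbing states of the dynamics.

```python
import random

random.seed(0)
resources = (0, 1)
eps = 0.15                                   # inertia parameter

def load(s, r):
    return sum(1 for c in s if c == r)

def br(s, p):
    # best response to s_{-p}; keep the current action when it is a tie
    def u(r):
        s2 = list(s); s2[p] = r
        return -load(s2, r)
    best = max(u(r) for r in resources)
    candidates = [r for r in resources if u(r) == best]
    return s[p] if s[p] in candidates else random.choice(candidates)

s = [0, 0, 0]                                # all players start on resource 0
for _ in range(2000):
    # Equation 12: stay put with probability eps, else best-respond
    s = [s[p] if random.random() < eps else br(s, p) for p in range(3)]

# the inertia breaks the symmetry, and the state settles on a pure Nash
# equilibrium: a 2/1 split of the players across the two resources
```

Without inertia, simultaneous best responses would cycle between [0,0,0] and [1,1,1] forever; the occasional "stay" is what lets the process escape into an equilibrium.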
One of the other interesting facets of potential games is the availability of distributed learning
algorithms that exhibit equilibrium selection properties, i.e., learning algorithms that favor one
Nash equilibrium over other Nash equilibria. For example, consider the algorithm log-linear
learning (see 84–87), where at each time t ≥ 1 a single player p is given the opportunity to revise
its strategy, i.e., s−p (t) = s−p (t − 1), and this updating player adjusts its choice according to the
following mixed strategy:
s_p(t) = s_p with probability e^{β·u_p(s_p, s_−p(t−1))} / Σ_{s_p′ ∈ S_p} e^{β·u_p(s_p′, s_−p(t−1))}, 13.
where β > 0. For any β ≥ 0, this process induces an aperiodic and irreducible Markov process over the finite strategy set S with a unique stationary distribution q = {q_s}_{s∈S} ∈ Δ(S). When the
game is a potential game, the stationary distribution satisfies
q_s = e^{β·φ(s)} / Σ_{s′∈S} e^{β·φ(s′)}, 14.
which means that as β → ∞, the support of the stationary distribution becomes contained in the strategy profiles with the highest potential value.
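Equation 14 can be checked numerically: build the exact transition matrix of log-linear learning on a small potential game (a two-player congestion game, assumed here for illustration) and compare its long-run distribution with the Gibbs distribution induced by the potential.

```python
import numpy as np
from itertools import product

beta = 2.0
states = list(product((0, 1), repeat=2))      # joint strategies of 2 players

def load(s, r):
    return sum(1 for c in s if c == r)

def u(s, p):                                  # congestion-game utility
    return -load(s, s[p])

def phi(s):                                   # Rosenthal potential
    return -sum(load(s, r) * (load(s, r) + 1) / 2 for r in (0, 1))

# transition matrix: pick the revising player uniformly, then Equation 13
P = np.zeros((4, 4))
for i, s in enumerate(states):
    for p in (0, 1):
        weights = np.array([np.exp(beta * u(s[:p] + (a,) + s[p + 1:], p))
                            for a in (0, 1)])
        weights /= weights.sum()
        for a in (0, 1):
            j = states.index(s[:p] + (a,) + s[p + 1:])
            P[i, j] += 0.5 * weights[a]

longrun = np.linalg.matrix_power(P, 500)[0]   # distribution after many steps
gibbs = np.exp([beta * phi(s) for s in states])
gibbs = gibbs / gibbs.sum()
# longrun matches the Gibbs distribution e^{beta*phi} / Z of Equation 14
```

The agreement is exact (up to numerical precision) because log-linear learning in a potential game is a reversible Markov chain whose detailed-balance equations are solved by the Gibbs distribution.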
For a game G with welfare function W : S → R, define
PoA(G) = min_{s^ne ∈ G} W(s^ne) / max_{s ∈ S} W(s), PoS(G) = max_{s^ne ∈ G} W(s^ne) / max_{s ∈ S} W(s),
where we use the notation s^ne ∈ G to mean a pure Nash equilibrium in the game G. The price of anarchy seeks to bound the performance of any pure Nash equilibrium relative to the optimal strategy profile, while the price of stability focuses purely on the best Nash equilibrium. Note that
1 ≥ PoS(G) ≥ PoA(G) ≥ 0.
Now consider a family of games G where a pure Nash equilibrium is known to exist for each game G ∈ G. The price of anarchy and price of stability extend to this family of games G in the following manner:
PoA(G) = min_{G ∈ G} PoA(G), PoS(G) = min_{G ∈ G} PoS(G).
In essence, the price of anarchy and price of stability provide worst-case performance guarantees
when restricting attention to a specific type of equilibrium behavior.
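For intuition, both ratios can be computed by brute force on a small game. The stag-hunt-style payoffs below are an illustrative assumption, with welfare taken as the sum of the players' utilities; the game has two pure Nash equilibria of different welfare, so the two measures separate.

```python
from itertools import product

actions = ("S", "H")
U = {  # hypothetical stag-hunt payoffs: (row utility, column utility)
    ("S", "S"): (4, 4), ("S", "H"): (0, 3),
    ("H", "S"): (3, 0), ("H", "H"): (3, 3),
}
W = {s: sum(U[s]) for s in U}                 # welfare = sum of utilities

def deviate(s, p, a):
    return tuple(a if q == p else s[q] for q in (0, 1))

def is_pure_ne(s):
    return all(U[s][p] >= U[deviate(s, p, a)][p]
               for p in (0, 1) for a in actions)

ne = [s for s in U if is_pure_ne(s)]
opt = max(W.values())
poa = min(W[s] for s in ne) / opt             # worst equilibrium / optimum
pos = max(W[s] for s in ne) / opt             # best equilibrium / optimum
```

Here the equilibria are (S, S) with welfare 8 and (H, H) with welfare 6, so the price of anarchy is 0.75 while the price of stability is 1.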
We now turn to the question of how to design agent utility functions. To that end, suppose that a system operator has knowledge of the player set P, an overestimate of the strategy sets S̄ = S̄_1 × · · · × S̄_P, and a welfare function W : S̄ → R. By overestimate, we mean that the true strategy sets satisfy S_p ⊆ S̄_p for all p, but the system designer does not know this information a priori. Accordingly, the question that we seek to address is whether a system designer can commit to a specific design of agent utility functions that ensures desirable properties irrespective of the realized strategy sets S.
The following theorem from References 89–92 provides one such mechanism.
Theorem 7. Consider any player set P, an overestimate of the strategy set S̄, and a welfare function W : S̄ → R. Define the utility functions such that for any s ∈ S̄ and player p ∈ P,
u_p(s_p, s_−p) = W(s_p, s_−p) − W(s_p^BL, s_−p), 19.
where s_p^BL ∈ S̄_p is a fixed baseline strategy. Then for any game G = (P, S, u, W) where
S p ⊆ S̄ p for all p, the game G is a potential game with potential function W restricted
to the domain S.
Consider the family of games G induced by the utility design mechanism given in Equation 19 for any baseline strategy s^BL. Theorem 7 implies that the price of stability satisfies
PoS(G) = 1
or, alternatively, that the optimal strategy profile is a Nash equilibrium for any game G ∈ G,
i.e., any realization with strategy sets S p ⊆ S̄ p . Not only is the optimal strategy profile a pure
Nash equilibrium, but this optimal strategy profile is also the potential function maximizer. Con-
sequently, by appealing to the algorithm log-linear learning defined above, one can ensure that
the resulting collective behavior is nearly optimal when β is sufficiently high.
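The mechanism of Theorem 7 is easy to verify computationally: for a randomly generated welfare function (a stand-in for an arbitrary W), the utilities of Equation 19 make W an exact potential function, so the welfare maximizer is a pure Nash equilibrium. The player set, action alphabet, and baseline strategy below are all illustrative assumptions.

```python
import random
from itertools import product

random.seed(1)
players, actions = (0, 1, 2), ("a", "b")
Wtab = {s: random.random() for s in product(actions, repeat=3)}  # arbitrary W
BL = "a"                                     # fixed baseline strategy s_p^BL

def W(s):
    return Wtab[s]

def u(s, p):                                 # the utility of Equation 19
    baseline = tuple(BL if q == p else s[q] for q in players)
    return W(s) - W(baseline)

def deviate(s, p, a):
    return tuple(a if q == p else s[q] for q in players)

# W acts as a potential: unilateral utility changes equal welfare changes,
# since the baseline term W(s_p^BL, s_-p) cancels in the difference
is_potential = all(
    abs((u(deviate(s, p, a), p) - u(s, p)) - (W(deviate(s, p, a)) - W(s))) < 1e-12
    for s in Wtab for p in players for a in actions
)

opt = max(Wtab, key=W)                       # welfare-maximizing profile
opt_is_ne = all(u(opt, p) >= u(deviate(opt, p, a), p) - 1e-12
                for p in players for a in actions)
```

The cancellation of the baseline term is the whole argument: u_p(s′_p, s_−p) − u_p(s_p, s_−p) = W(s′_p, s_−p) − W(s_p, s_−p), so any welfare maximizer also maximizes every player's utility against the others' strategies.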
5.4.1. Optimizing the price of anarchy. Note that Theorem 7 makes no claim regarding the
price of anarchy. In fact, what utility design mechanism optimizes the price of anarchy over the set
of induced games G remains an open question. Recent work has sought to address this question in
several different domains, including concave cost-sharing problems (93, 94), submodular resource
allocation problems (95, 96), network coding problems (97), set-covering problems (98), routing
problems (64, 99), and more general cost-sharing problems with a restriction to utility design mechanisms that enforce Σ_p u_p(s) = W(s) for all s ∈ S (100–102).
5.4.2. Local utility design. The utility design given in Theorem 7 also makes no reference to the
structure or the informational dependence or locality of the derived utility functions. This question
has been looked at extensively in the cost-sharing literature, where agents’ utility functions are
required to depend only on information regarding the selected resources; for example, in network
routing problems, the utility of a player for selecting a route can depend only on the edges within
that route and the other players that selected these edges. Confined to such local agent utility
functions, Gopalakrishnan et al. (103) derived the complete set of local agent utility functions that
ensure the existence of a pure Nash equilibrium; however, optimizing the price of anarchy over such local utility functions remains very much an open question, as highlighted above. Other recent
work (104–106) has introduced a state in the game environment as a design parameter to design
player objective functions of a desired locality.
6. CONCLUDING REMARKS
This review has presented an overview of selected topics in game theory of particular relevance
to control systems, namely zero-sum games, team games, and game-theoretic distributed control.
Of course, with limited space, there are bound to be some major omissions. Topics not discussed
here include the following:
General sum dynamic games: These are dynamic games that need not be zero-sum or team
games. Of particular interest is the existence of Nash equilibria (for a summary, see 107) as
well as reduced-complexity solution concepts (e.g., 108, 109) and learning (e.g., 110, 111).
Large-population games: Here, a very large number of players can be approximated as a
continuum. Sandholm (112) provides a general discussion, and Quijano et al. (113) provide
a recent overview tailored to a control audience. Also related to large populations is the topic
of mean-field games (114).
Cooperative game theory: All of the models discussed above fall under the framework of
noncooperative game theory. Cooperative game theory is a complementary formalism that
relates to problems such as bargaining, matching, and coalition formation (e.g., for various
engineering applications, see 115–117). Fele et al. (118) provide a recent overview tailored
to a control audience.
DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.
ACKNOWLEDGMENTS
This work was supported by Office of Naval Research grant N00014-15-1-2762, National Science
Foundation grant ECCS-1351866, and funding from King Abdullah University of Science and
Technology (KAUST).
LITERATURE CITED
1. Myerson R. 1997. Game Theory. Cambridge, MA: Harvard Univ. Press
2. Zhou K, Doyle J. 1998. Essentials of Robust Control. Upper Saddle River, NJ: Prentice Hall
3. Marden JR, Shamma JS. 2015. Game theory and distributed control. In Handbook of Game Theory with
Economic Applications, Vol. 4, ed. H Young, S Zamir, pp. 861–99. Amsterdam, Neth.: Elsevier
4. Marden JR, Shamma JS. 2018. Game theoretic learning in distributed control. In Handbook of Dynamic
Game Theory, ed. T Başar, G Zaccour. Basel, Switz.: Birkhäuser. In press
5. Fudenberg D, Tirole J. 1991. Game Theory. Cambridge, MA: MIT Press
6. Osborne M, Rubinstein A. 1994. A Course in Game Theory. Cambridge, MA: MIT Press
7. Cesa-Bianchi N, Lugosi G. 2006. Prediction, Learning, and Games. Cambridge, UK: Cambridge Univ.
Press
8. Leyton-Brown K, Shoham Y. 2008. Essentials of Game Theory: A Concise, Multidisciplinary Introduction.
San Rafael, CA: Morgan & Claypool
9. Nisan N, Roughgarden T, Tardos E, Vazirani VV. 2007. Algorithmic Game Theory. Cambridge, UK:
Cambridge Univ. Press
10. Başar T, Olsder G. 1999. Dynamic Noncooperative Game Theory. Philadelphia, PA: Soc. Ind. Appl. Math.
11. Bauso D. 2016. Game Theory with Engineering Applications. Philadelphia, PA: Soc. Ind. Appl. Math.
12. Hespanha J. 2017. Noncooperative Game Theory: An Introduction for Engineers and Computer Scientists. Princeton, NJ: Princeton Univ. Press
14. Camerer C. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton, NJ: Princeton
Univ. Press
15. Rubinstein A. 1998. Modeling Bounded Rationality. Cambridge, MA: MIT Press
16. Ho Y, Bryson A, Baron S. 1965. Differential games and optimal pursuit-evasion strategies. IEEE Trans.
Automat. Control 10:385–89
17. Isaacs R. 1965. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control
and Optimization. New York: Wiley
18. Liberzon D. 2012. Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton,
NJ: Princeton Univ. Press
19. Blanchini F, Miani S. 2007. Set-Theoretic Methods in Control. Basel, Switz.: Birkhäuser
20. Aubin J. 1991. Viability Theory. Basel, Switz.: Birkhäuser
21. Mitchell IM, Bayen AM, Tomlin CJ. 2005. A time-dependent Hamilton-Jacobi formulation of reachable
sets for continuous dynamic games. IEEE Trans. Automat. Control 50:947–57
22. Zhou Z, Zhang W, Ding J, Huang H, Stipanović DM, Tomlin CJ. 2016. Cooperative pursuit with
Voronoi partitions. Automatica 72:64–72
23. Karaman S, Frazzoli E. 2011. Incremental sampling-based algorithms for a class of pursuit-evasion
games. In Algorithmic Foundations of Robotics IX: Selected Contributions of the Ninth International Workshop
on the Algorithmic Foundations of Robotics, ed. D Hsu, V Isler, JC Latombe, MC Lin, pp. 71–87. Berlin:
Springer
24. Isler V, Kannan S, Khanna S. 2005. Randomized pursuit-evasion in a polygonal environment. IEEE
Trans. Robot. 21:875–84
25. Zames G. 1981. Feedback and optimal sensitivity: model reference transformations, multiplicative semi-
norms, and approximate inverses. IEEE Trans. Automat. Control 26:301–20
26. Başar T, Bernhard P. 1991. H-Infinity-Optimal Control and Related Minimax Design Problems. Basel, Switz.:
Birkhäuser
27. Doyle J, Francis B, Tannenbaum A. 2009. Feedback Control Theory. Mineola, NY: Dover
28. Banavar RN, Speyer JL. 1991. A linear-quadratic game approach to estimation and smoothing. In 1991
American Control Conference (ACC), pp. 2818–22. New York: IEEE
29. Limebeer DJN, Anderson BDO, Khargonekar PP, Green M. 1992. A game theoretic approach to H∞
control for time-varying systems. SIAM J. Control Optim. 30:262–83
30. Engwerda J. 2005. LQ Dynamic Optimization and Differential Games. Hoboken, NJ: Wiley
31. Bertsekas D, Rhodes I. 1971. On the minimax reachability of target sets and target tubes. Automatica
7:233–47
32. Bertsekas D. 1972. Infinite time reachability of state-space regions by using feedback control. IEEE
Trans. Automat. Control 17:604–13
33. Bertsekas D. 2005. Dynamic Programming and Optimal Control. Belmont, MA: Athena Sci.
34. Dahleh M, Diaz-Bobillo I. 1995. Control of Uncertain Systems: A Linear Programming Approach. Upper
Saddle River, NJ: Prentice Hall
35. Shamma JS. 1993. Nonlinear state feedback for ℓ1 optimal control. Syst. Control Lett. 21:265–70
36. Shamma JS. 1996. Optimization of the ℓ∞-induced norm under full state feedback. IEEE Trans. Automat. Control 41:533–44
37. Shamma JS. 2012. An overview of LPV systems. In Control of Linear Parameter Varying Systems with
Applications, ed. J Mohammadpour, CW Scherer, pp. 3–26. Boston: Springer
38. Shamma JS, Xiong D. 1999. Set-valued methods for linear parameter varying systems. Automatica
35:1081–89
39. Aumann RJ, Maschler M, Stearns RE. 1995. Repeated Games with Incomplete Information. Cambridge,
MA: MIT Press
40. Mertens J, Sorin S, Zamir S. 2015. Repeated Games. Cambridge, UK: Cambridge Univ. Press
41. Smith JC, Prince M, Geunes J. 2013. Modern network interdiction problems and algorithms. In Handbook
of Combinatorial Optimization, ed. PM Pardalos, DZ Du, RL Graham, pp. 1949–87. New York: Springer
42. Zheng J, Castañón DA. 2012. Stochastic dynamic network interdiction games. In 2012 American Control
Conference (ACC), pp. 1838–44. New York: IEEE
43. Zheng J, Castañón DA. 2012. Dynamic network interdiction games with imperfect information and
deception. In 2012 51st Annual IEEE Conference on Decision and Control (CDC), pp. 7758–63. New York:
IEEE
44. Alpcan T, Başar T. 2010. Network Security: A Decision and Game-Theoretic Approach. Cambridge, UK:
Cambridge Univ. Press
45. Li L, Shamma JS. 2015. Efficient computation of discounted asymmetric information zero-sum stochastic
games. In 2015 54th Annual IEEE Conference on Decision and Control (CDC), pp. 4531–36. New York:
IEEE
46. Li L, Langbort C, Shamma J. 2017. Computing security strategies in finite horizon repeated bayesian
games. In 2017 American Control Conference (ACC), pp. 3664–69. New York: IEEE
47. Ho YC. 1980. Team decision theory and information structures. Proc. IEEE 68:644–54
48. Lewis FL, Zhang H, Hengster-Movric K, Das A. 2013. Cooperative Control of Multi-Agent Systems: Optimal
and Adaptive Design Approaches. New York: Springer
49. Shamma JS. 2008. Cooperative Control of Distributed Multi-Agent Systems. Hoboken, NJ: Wiley
50. van Schuppen J, Villa T. 2014. Coordination Control of Distributed Systems. Cham, Switz.: Springer
51. Wang Y, Zhang F. 2017. Cooperative Control of Multi-Agent Systems: Theory and Applications. Hoboken,
NJ: Wiley
52. Radner R. 1962. Team decision problems. Ann. Math. Stat. 33:857–81
53. Witsenhausen HS. 1968. A counterexample in stochastic optimum control. SIAM J. Control 6:131–47
54. Başar T. 2008. Variations on the theme of the Witsenhausen counterexample. In 2008 47th IEEE
Conference on Decision and Control, pp. 1614–19. New York: IEEE
55. Ho YC, Chu K. 1974. Information structure in dynamic multi-person control problems. Automatica
10:341–51
56. Nayyar A, Mahajan A, Teneketzis D. 2013. Decentralized stochastic control with partial history sharing:
a common information approach. IEEE Trans. Autom. Control 58:1644–58
57. Nayyar A, Mahajan A, Teneketzis D. 2014. The common-information approach to decentralized stochas-
tic control. In Information and Control in Networks, ed. G Como, B Bernhardsson, A Rantzer, pp. 123–56.
Cham, Switz.: Springer
58. Kurtaran BZ, Sivan R. 1974. Linear-quadratic-Gaussian control with one-step-delay sharing pattern.
IEEE Trans. Autom. Control 19:571–74
59. Sandell N, Athans M. 1974. Solution of some nonclassical LQG stochastic decision problems. IEEE
Trans. Autom. Control 19:108–16
60. Nayyar N, Kalathil D, Jain R. 2017. Optimal decentralized control with asymmetric one-step
delayed information sharing. IEEE Trans. Control Netw. Syst. In press. https://doi.org/10.1109/
TCNS.2016.2641802
61. Bamieh B, Voulgaris PG. 2005. A convex characterization of distributed control problems in spatially
invariant systems with communication constraints. Syst. Control Lett. 54:575–83
62. Rotkowitz M, Lall S. 2006. A characterization of convex problems in decentralized control. IEEE Trans.
Autom. Control 51:274–86
63. Roughgarden T. 2005. Selfish Routing and the Price of Anarchy. Cambridge, MA: MIT Press
64. Brown PN, Marden JR. 2017. Studies on robust social influence mechanisms, incentives for efficient
network routing in uncertain settings. IEEE Control Syst. Mag. 37:98–115
65. Lasaulce S, Jimenez T, Solan E, eds. 2017. Network Games, Control, and Optimization: Proceedings of NETGCOOP 2016, Avignon, France. Basel, Switz.: Birkhäuser
66. Ozdaglar A, Menache I. 2011. Network Games: Theory, Models, and Dynamics. San Rafael, CA: Morgan &
Claypool
67. Marden JR, Ruben S, Pao L. 2013. A model-free approach to wind farm control using game theoretic
methods. IEEE Trans. Control Syst. Technol. 21:1207–14
68. Johnson KE, Thomas N. 2009. Wind farm control: addressing the aerodynamic interaction among wind
turbines. In 2009 American Control Conference, pp. 2104–9. New York: IEEE
69. Steinbuch M, de Boer WW, Bosgra OH, Peters S, Ploeg J. 1998. Optimal control of wind power plants.
J. Wind Eng. Ind. Aerodyn. 27:237–46
70. Pao L, Johnson KE. 2009. A tutorial on the dynamics and control of wind turbines and wind farms. In
2009 American Control Conference, pp. 35–36. New York: IEEE
71. Cortes J, Martinez S, Karatas T, Bullo F. 2004. Coverage control for mobile sensing networks. IEEE
Trans. Robot. Autom. 20:243–55
72. Marden JR, Arslan G, Shamma JS. 2009. Cooperative control and potential games. IEEE Trans. Syst.
Man Cybernet. B 39:1393–407
73. Yazicioglu AY, Egerstedt M, Shamma JS. 2016. Communication-free distributed coverage for networked
systems. IEEE Trans. Control Netw. Syst. 4:499–510
74. Fudenberg D, Levine D. 1998. The Theory of Learning in Games. Cambridge, MA: MIT Press
75. Hart S, Mas-Colell A. 2003. Uncoupled dynamics do not lead to Nash equilibrium. Am. Econ. Rev.
93:1830–36
76. Marden JR, Young HP, Arslan G, Shamma JS. 2009. Payoff based dynamics for multi-player weakly
acyclic games. SIAM J. Control Optim. 48:373–96
77. Babichenko Y. 2010. Completely uncoupled dynamics and Nash equilibria. Discuss. Pap., Cent. Study Ration.,
Hebrew Univ., Jerusalem
78. Hart S. 2005. Adaptive heuristics. Econometrica 73:1401–30
79. Young HP. 2005. Strategic Learning and Its Limits. Oxford, UK: Oxford Univ. Press
80. Shamma JS, Arslan G. 2005. Dynamic fictitious play, dynamic gradient play, and distributed convergence
to Nash equilibria. IEEE Trans. Autom. Control 50:312–27
81. Fox MJ, Shamma JS. 2013. Population games, stable games, and passivity. Games 4:561–83
82. Frihauf P, Krstic M, Başar T. 2012. Nash equilibrium seeking in noncooperative games. IEEE Trans.
Autom. Control 57:1192–207
83. Monderer D, Shapley L. 1996. Potential games. Games Econ. Behav. 14:124–43
84. Blume L. 1993. The statistical mechanics of strategic interaction. Games Econ. Behav. 5:387–424
85. Blume L. 1997. Population games. In The Economy as an Evolving Complex System II, ed. B Arthur,
S Durlauf, D Lane, pp. 425–60. Reading, MA: Addison-Wesley
86. Marden JR, Shamma JS. 2012. Revisiting log-linear learning: asynchrony, completeness and a payoff-
based implementation. Games Econ. Behav. 75:788–808
87. Alos-Ferrer C, Netzer N. 2010. The logit-response dynamics. Games Econ. Behav. 68:413–27
88. Koutsoupias E, Papadimitriou C. 1999. Worst-case equilibria. In STACS 99: 16th Annual Symposium
on Theoretical Aspects of Computer Science, Trier, Germany, March 4–6, 1999: Proceedings, ed. C Meinel, S
Tison, pp. 404–13. Berlin: Springer
89. Wolpert D. 2004. Theory of collective intelligence. In Collectives and the Design of Complex Systems, ed.
K Tumer, D Wolpert, pp. 43–106. New York: Springer
90. Arslan G, Marden JR, Shamma JS. 2007. Autonomous vehicle-target assignment: a game theoretical
formulation. ASME J. Dyn. Syst. Meas. Control 129:584–96
91. Marden JR, Wierman A. 2013. Distributed welfare games. Oper. Res. 61:155–68
92. Gopalakrishnan R, Marden JR, Wierman A. 2011. An architectural view of game theoretic control. ACM
SIGMETRICS Perf. Eval. Rev. 38:31–36
93. Phillips M, Shalaby Y, Marden JR. 2016. The importance of budget in efficient utility design. In 2016
55th Annual IEEE Conference on Decision and Control (CDC), pp. 6117–22. New York: IEEE
94. Marden J, Phillips M. 2016. Optimizing the price of anarchy in concave cost sharing games. In 2017
American Control Conference (ACC), pp. 5237–42. New York: IEEE
95. Marden J, Roughgarden T. 2014. Generalized efficiency bounds in distributed resource allocation. IEEE
Trans. Autom. Control 59:571–84
96. Vetta A. 2002. Nash equilibria in competitive societies with applications to facility location, traffic routing,
and auctions. In 43rd Annual IEEE Symposium on Foundations of Computer Science, pp. 416–25. New York:
IEEE
97. Marden JR, Effros M. 2012. The price of selfishness in network coding. IEEE Trans. Inform. Theory
58:2349–61
98. Gairing M. 2009. Covering games: approximation through noncooperation. In Internet and Network
Economics: 5th International Workshop, WINE 2009, Rome, Italy, December 14–18, 2009: Proceedings, ed.
116. Saad W, Han Z, Debbah M, Hjorungnes A, Basar T. 2009. Coalitional game theory for communication
networks. IEEE Signal Process. Mag. 26:77–97
117. Saad W, Han Z, Poor HV. 2011. Coalitional game theory for cooperative micro-grid distribution net-
works. In 2011 IEEE International Conference on Communications Workshops (ICC). New York: IEEE.
https://doi.org/10.1109/iccw.2011.5963577
118. Fele F, Maestre JM, Camacho EF. 2017. Coalitional control: cooperative game theory and control. IEEE
Control Syst. 37:53–69