2.1
satisfactory performance in the face of an uncertain environment (see the robust control literature
on this topic, e.g., as described in Reference 2). From a conflict perspective, the controller is one
Access provided by University of Reading on 01/25/18. For personal use only.
For each player, p ∈ P, there is a set of strategies, S_p. The joint strategy set is

S = S_1 × · · · × S_P.

A joint strategy is denoted by

s = (s_1, s_2, . . . , s_P).

It is sometimes convenient to single out player p and write

s = (s_1, . . . , s_p, . . . , s_P) = (s_p, s_−p).

Here, s_−p denotes the set of strategies of players in P \ {p}, i.e., players other than player p. Finally,
for each player, p ∈ P, there is a utility function
Annu. Rev. Control Robot. Auton. Syst. 2018.1. Downloaded from www.annualreviews.org
u_p : S → R
that captures the player’s preferences over joint strategies. That is, for any two joint strategies, s′, s″ ∈ S, player p strictly prefers s′ to s″ if and only if

u_p(s′) > u_p(s″);

i.e., larger is better. If u_p(s′) = u_p(s″), then player p is indifferent between joint strategies s′ and s″.
The vector of utility functions is denoted by u, i.e.,

u = (u_1, u_2, . . . , u_P) : S → R^P.
It is sometimes more convenient to express a game in terms of cost functions rather than utility functions. In this case, for each player, p ∈ P, there is a cost function

c_p : S → R,

and player p strictly prefers s′ to s″ if and only if

c_p(s′) < c_p(s″);

i.e., smaller is better.
2.2. Examples
A game is fully described by the triplet of (a) the player set, P; (b) the joint strategy set, S; and
(c) the vector of utility functions, u (or cost functions, c ). The following examples illustrate the
versatility of this framework.
2.2.1. Two-player matrix games. In a two-player matrix game, there are two players: a row
player and a column player. Each player has a finite set of actions. Accordingly, the utility functions
can be represented by two matrices, Mrow and Mcol , for the row and column players, respectively.
The elements Mrow (i, j ) and Mcol (i, j ) indicate the utility to each player when the row player
selects its ith strategy and the column player selects its j th strategy.
Such games admit the convenient visualization shown in Figure 1a. In this illustration, the
row player’s strategy set is labeled {T, B}, and the column player’s strategy set is {L, R}. In this
case,

M_row = ( a  b ; c  d )   and   M_col = ( e  f ; g  h ).

If the players have more than two strategies, then the matrix representations simply have more
rows and columns. Figure 1b shows representative two-player matrix games that have received
considerable attention for their illustration of various phenomena.

Figure 1

(a)              Column
                 L      R
     Row   T   a, e   b, f
           B   c, g   d, h

(b)    Stag hunt            Prisoner’s dilemma       Shapley fashion game
         S      H               C      D                R      G      B
   S   3, 3   0, 2         C   3, 3   0, 4        R   0, 1   1, 0   0, 0
   H   2, 0   2, 2         D   4, 0   1, 1        G   0, 0   0, 1   1, 0
                                                  B   1, 0   0, 0   0, 1

Two-player matrix games. (a) A basic visualization. The players here are labeled “row” and “column”; the
row player’s strategies are denoted as top (T) and bottom (B), and the column player’s strategies are denoted
as left (L) and right (R). In each cell, the first variable is the payoff for the row player, and the second is the
payoff for the column player. (b) Three examples of two-player matrix games. In a stag hunt, the players can
choose to hunt either a stag (S) or a hare (H); a stag is worth more than a hare, but successfully hunting a stag
requires the cooperation of the other player. In the prisoner’s dilemma, the players can either cooperate (C)
or defect (D); choosing to defect when the other player chooses to cooperate provides the largest payoff for
the defecting player, but a defection by both players results in a smaller payoff than mutual cooperation. In
the Shapley fashion game, the players can choose to wear red (R), green (G), or blue (B); the fashion leader
(row player) wants to wear a color that contrasts with what the follower (column player) is wearing, whereas
the follower wants to wear the same color as the leader.
2.2.2. Randomized (mixed) strategies. Continuing with the discussion of two-player matrix
games, let us redefine the strategy sets as follows. For any positive integer n, define Δ(n) to be the
n-dimensional probability simplex, i.e.,

Δ(n) = { π ∈ R^n : π_i ≥ 0 and Σ_{i=1}^{n} π_i = 1 }.

Now redefine the strategy sets as follows. If the matrices M_row and M_col are of dimension n × m,
define

S_row = Δ(n) and S_col = Δ(m).
The strategies of each player can be interpreted as the probabilities of selecting a row or column.
The utility functions over these randomized strategies are now defined as

u_row(π_row, π_col) = π_row^T M_row π_col   and   u_col(π_row, π_col) = π_row^T M_col π_col.

These redefined utility functions can be identified with the expected outcome of the previous
deterministic utility functions over independent randomized strategies.
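As a concrete illustration, here is a minimal sketch (the helper function name is ours; the payoff entries are the stag hunt values from Figure 1b) of the bilinear expected-utility computation π_row^T M π_col:

```python
# Sketch (helper name is ours): expected utility of randomized strategies
# in a two-player matrix game, using the stag hunt payoffs of Figure 1b.
M_row = [[3, 0],   # row player's payoffs (rows: S, H; columns: S, H)
         [2, 2]]
M_col = [[3, 2],   # column player's payoffs
         [0, 2]]

def expected_utility(M, pi_row, pi_col):
    """Bilinear form pi_row^T M pi_col under independent randomization."""
    return sum(pi_row[i] * M[i][j] * pi_col[j]
               for i in range(len(M)) for j in range(len(M[0])))

# Both players hunt the stag with probability 1/2.
u_row = expected_utility(M_row, [0.5, 0.5], [0.5, 0.5])  # 0.25*(3+0+2+2)
```

Putting all probability mass on a single action recovers the deterministic payoffs.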
2.2.3. Parallel congestion games. In a parallel congestion game, there is a collection of parallel
roads from A to B,
R = {1, 2, . . . , R} .
Although there is a finite set of players, the number of players is typically much larger than the
number of roads, i.e., P ≫ R. Each player must select a single road, and so the strategy set of
each player is R. For each road, there is a function that expresses the congestion on that road as a
function of the number of players using the road. That is, for each r ∈ R, there is a road-specific
congestion function
κ_r : Z_+ → R.
A natural assumption is that the congestion functions are increasing. For the joint strategy s, let
ν_r(s) denote the number of players p such that s_p = r, i.e., the number of users of road r.
Players seek to avoid congestion. Accordingly, the cost function of player p depends on the
selected road, r, and the road selection of other players, s_−p. In terms of the congestion functions,
the cost function of player p evaluated at the joint strategy s = (r, s_−p) is

c_p(r, s_−p) = κ_r(ν_r(s)).
2.2.4. Repeated prisoner’s dilemma. In the presentation of the prisoner’s dilemma shown in
Figure 1b, the strategy set of each player is {C, D}, whose elements stand for cooperate and defect,
respectively. We now consider a scenario in which the matrix game is played repeatedly over an
infinite series of stages, t = 0, 1, 2, . . . . At stage t, both players simultaneously select an action in
{C, D}. The actions are observed by both players, and the process repeats at stage t + 1 and so on.
Let a(t) denote the joint actions of both players at stage t, so that a(t) can take one of four values:
(C, C), (C, D), (D, C), or (D, D). At stage t, the observed history of play is

H(t) = (a(0), a(1), . . . , a(t − 1)).
Let H∗ denote the set of such finite histories. In the repeated prisoner’s dilemma, a strategy is a
mapping
s : H∗ → {C, D}.
In words, a strategy is a reaction rule that dictates the action a player will take as a function of the
observed history of play. Two example strategies that have played an important role in the analysis
of the prisoner’s dilemma are grim trigger and tit-for-tat. In grim trigger, the player cooperates
(i.e., selects C) as long as the other player has always cooperated:
a_p(t) =  C,  a_−p(τ) ≠ D, ∀τ < t;
          D,  otherwise.
In tit-for-tat, the player selects the action that the other player selected in the previous round:
a_p(t) =  C,             t = 0;
          a_−p(t − 1),   t ≥ 1.
Here, p stands for either the row or column player, and, as usual, −p stands for the other
player.
To complete the description of the game, we still need to define the utility functions. Let
s = (srow , scol ) denote a pair of strategies for the row and column players. Any pair of strategies,
s, induces an action stream, a(0; s ), a(1; s ), . . . . Let Up (a(t; s )) denote the single-stage utility to
player p at stage t under the (strategy-dependent) joint action a(t; s ). Now define the repeated
game utility function:
u_p(s) = (1 − δ) Σ_{t=0}^{∞} δ^t U_p(a(t; s)),
where the discount factor, δ, satisfies 0 < δ < 1. In words, the utility function is the future
discounted sum of the single-stage payoffs.
The repeated prisoner’s dilemma is an example of a dynamic game, which is a model of an
evolving scenario. This example illustrates that the basic framework of Section 2.1 is versatile
enough to accommodate (a) infinite strategy sets, since there are an infinite number of reac-
tion rules, and (b) utility functions that reflect payoff streams that are realized over an infinite
time.
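To make this concrete, here is a minimal sketch (helper names are ours; the single-stage payoffs for the row player are taken from the prisoner's dilemma of Figure 1b) of the two reaction rules and the discounted utility of the play they induce:

```python
# Sketch (helper names are ours): grim trigger and tit-for-tat as reaction
# rules from the opponent's observed history to an action, together with
# the discounted utility of the induced action stream.
U_ROW = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 4, ('D', 'D'): 1}

def grim_trigger(opponent_history):
    # Cooperate as long as the opponent has never defected.
    return 'C' if 'D' not in opponent_history else 'D'

def tit_for_tat(opponent_history):
    # Cooperate at t = 0, then mirror the opponent's previous action.
    return 'C' if not opponent_history else opponent_history[-1]

def discounted_utility(s_row, s_col, delta, horizon=200):
    """(1 - delta) * sum_t delta^t U_row(a(t)); truncated at `horizon`."""
    hist_row, hist_col, total = [], [], 0.0
    for t in range(horizon):
        a_row = s_row(hist_col)   # each rule reacts to the *other* history
        a_col = s_col(hist_row)
        total += (1 - delta) * delta ** t * U_ROW[(a_row, a_col)]
        hist_row.append(a_row)
        hist_col.append(a_col)
    return total

# Grim trigger versus tit-for-tat cooperates forever, so the discounted
# utility approaches the mutual-cooperation payoff of 3.
u = discounted_utility(grim_trigger, tit_for_tat, delta=0.9)
```

The truncation at a finite horizon introduces an error on the order of delta**horizon, which is negligible here.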
The best-response set of player p to the strategies, s_−p, of the other players is

BR_p(s_−p) = {s_p ∈ S_p : u_p(s_p, s_−p) ≥ u_p(s_p′, s_−p), ∀s_p′ ∈ S_p}.

In words, BR_p(s_−p) is the set of strategies that maximize the utility of player p in response to the
strategies s_−p of other players. Note that there need not be a unique maximizer, and so BR_p(s_−p)
is, in general, set valued. A joint strategy s∗ is a Nash equilibrium if s_p∗ ∈ BR_p(s_−p∗) for every
player p ∈ P.
2.3.1. Nash equilibria in two-player matrix games. For the stag hunt, there are two Nash
equilibria, (S, S) and (H, H); for the prisoner’s dilemma, the sole Nash equilibrium is (D, D); and
for the Shapley fashion game, there is no Nash equilibrium. These examples illustrate that games
can exhibit multiple, unique, or no Nash equilibria.
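These counts are easy to check mechanically. The following sketch (our own enumeration routine; payoffs transcribed from Figure 1b, with index 0 standing for S, C, and R in the three games) tests every joint strategy for mutual best responses:

```python
from itertools import product

# Sketch (our own routine): enumerate pure-strategy Nash equilibria of a
# two-player matrix game given the two payoff matrices.
def pure_nash_equilibria(M_row, M_col):
    """(i, j) pairs where each strategy is a best response to the other."""
    n, m = len(M_row), len(M_row[0])
    eqs = []
    for i, j in product(range(n), range(m)):
        row_best = all(M_row[i][j] >= M_row[k][j] for k in range(n))
        col_best = all(M_col[i][j] >= M_col[i][l] for l in range(m))
        if row_best and col_best:
            eqs.append((i, j))
    return eqs

stag_row, stag_col = [[3, 0], [2, 2]], [[3, 2], [0, 2]]
pd_row, pd_col = [[3, 0], [4, 1]], [[3, 4], [0, 1]]
shapley_row = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
shapley_col = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

stag_eqs = pure_nash_equilibria(stag_row, stag_col)           # (S,S) and (H,H)
pd_eqs = pure_nash_equilibria(pd_row, pd_col)                 # (D,D) only
shapley_eqs = pure_nash_equilibria(shapley_row, shapley_col)  # none
```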
In the case of randomized strategies, the stag hunt inherits the same two Nash equilibria, now
expressed as probability vectors: both players play (Pr[S], Pr[H]) = (1, 0), or both players play
(Pr[S], Pr[H]) = (0, 1).
2.3.2. Nash equilibria in parallel congestion games. At a Nash equilibrium, no player has an
incentive to use a different road. Accordingly, all of the roads that are being used have (approxi-
mately) the same level of congestion.
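A small simulation illustrates this equalization. The sketch below is our own construction (affine congestion functions κ_r(k) = a_r·k and round-robin best responses; congestion games of this form always admit such an equilibrium, and best-response updates reach it):

```python
# Sketch (our own construction): round-robin best-response dynamics in a
# parallel congestion game with affine congestion functions
# kappa_r(k) = a_r * k; a player's cost is the congestion of its own road.
A = (1.0, 2.0, 3.0)          # congestion coefficients a_r per road

def congestion(road, load):
    return A[road] * load

def best_response(s, p):
    """Road minimizing player p's cost, holding the other players fixed."""
    def cost(r):
        load = sum(1 for q, rq in enumerate(s) if q != p and rq == r) + 1
        return congestion(r, load)
    return min(range(len(A)), key=cost)

# Start with all 12 players on road 0 and iterate best responses.
s = [0] * 12
for _ in range(50):
    for p in range(len(s)):
        s[p] = best_response(s, p)

loads = [s.count(r) for r in range(len(A))]
costs = [congestion(r, loads[r]) for r in range(len(A)) if loads[r] > 0]
# All used roads end up with (approximately) the same congestion level.
```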
2.3.3. Nash equilibria in repeated prisoner’s dilemma games. One can show that grim trigger
versus grim trigger constitutes a Nash equilibrium in the case that the discount factor, δ, is
sufficiently close to 1. The standard interpretation is that cooperation can be incentivized provided
that the players have a long-term outlook. By contrast, the single-stage (or even finitely repeated)
prisoner’s dilemma has a unique Nash equilibrium of always defect versus always defect.
In a two-player zero-sum game, the utility functions satisfy

u_1(s_1, s_2) + u_2(s_1, s_2) = 0, ∀s_1 ∈ S_1, s_2 ∈ S_2.
Accordingly, an increase in utility for one player results in a decrease in utility for the other player
(by the same amount). Given this special structure, zero-sum games are usually expressed in terms
of a single objective function, φ(s 1 , s 2 ), that is the cost function of player 1 (the minimizer) and
the utility function of player 2 (the maximizer). In terms of the original formulation,
φ(s 1 , s 2 ) = −u 1 (s 1 , s 2 ) = u 2 (s 1 , s 2 ).
Define the upper and lower values

val̄[φ] = inf_{s_1 ∈ S_1} sup_{s_2 ∈ S_2} φ(s_1, s_2),
val̲[φ] = sup_{s_2 ∈ S_2} inf_{s_1 ∈ S_1} φ(s_1, s_2).

The quantity val̄[φ] represents the best guaranteed cost for the minimizer in the worst-case sce-
nario of its strategy, s_1, being known to the maximizer. The quantity val̲[φ] has an analogous
interpretation for the maximizing player. In general,

val̲[φ] ≤ val̄[φ].
For zero-sum games, a Nash equilibrium is characterized by the following saddle-point condition.
The pair of strategies s_1∗ and s_2∗ is a Nash equilibrium if

φ(s_1∗, s_2) ≤ φ(s_1∗, s_2∗) ≤ φ(s_1, s_2∗), ∀s_1 ∈ S_1, s_2 ∈ S_2.

This inequality indicates that s_1∗ is the best response (for the minimizer) to s_2∗ and vice versa. If a
zero-sum game has a Nash equilibrium, then it has a value, with

val̲[φ] = val̄[φ] = φ(s_1∗, s_2∗).
The converse need not be true, depending on whether the associated infimum or supremum
operations are achieved.
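For finite strategy sets, both values are directly computable. A minimal sketch (helper names are ours), with player 1 choosing a row of the cost matrix to minimize and player 2 a column to maximize:

```python
# Sketch (helper names are ours): upper and lower values of a finite
# zero-sum game in pure strategies, for a cost matrix phi[i][j].
def upper_value(phi):
    # Minimizer commits first, maximizer reacts: min_i max_j phi[i][j].
    return min(max(row) for row in phi)

def lower_value(phi):
    # Maximizer commits first: max_j min_i phi[i][j].
    return max(min(col) for col in zip(*phi))

pennies = [[1, -1], [-1, 1]]   # matching pennies: no pure saddle point
saddle = [[2, 1], [3, 0]]      # saddle point at (0, 0) with value 2
```

Matching pennies has a strict gap between the two values, reflecting the absence of a pure-strategy saddle point; in the second matrix the values coincide at the saddle point.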
In control problems, one is often interested in a worst-case analysis against an environmental
opponent. Taking the viewpoint that the control design is the minimizer, define

φ_wc(s_1) = sup_{s_2 ∈ S_2} φ(s_1, s_2),

i.e., the worst-case outcome as a function of the strategy s_1. Let us call a strategy, s_1, that achieves

φ_wc(s_1) ≤ val̄[φ] + ε

an ε-security strategy for the minimizer.
The pursuer and evader have control inputs u(t) ∈ U and v(t) ∈ V, respectively, which may be subject
to various constraints that model effects such as bounded velocity, acceleration, or turning radius,
as captured by the sets U and V. Let x = (x^p, x^e) denote the concatenated state vector. The
evader is considered to be captured by the pursuer at time t if

ℓ(x(t), t) = 0, 1.

for a specified capture function, ℓ.
An example is the so-called homicidal chauffeur problem (17), where the pursuer is faster than
the evader but the evader is more agile. These constraints are modeled in two dimensions as
ẋ_1^e = v_1,
ẋ_2^e = v_2.

The constraints are v_1^2(t) + v_2^2(t) ≤ (V^e)^2, where V^e is the maximum evader speed and V^e < V^p.
We are interested in finding closed-loop strategies for the pursuer and evader of the form

u(t) = s^p(t, x(t)),   v(t) = s^e(t, x(t)),

which are time- and state-dependent feedback laws. One can also formulate alternative variations,
such as partial observations of the opponent’s state, in which case a closed-loop strategy may be a
function of the history of observations.
Finally, we define an objective function of the form

φ(s^p, s^e) = ∫_0^T g(t, x(t), u(t), v(t)) dt + q(T, x(T)), 2.
where the final time, T , is defined as the shortest time that satisfies the termination condition of
Equation 1. An example setup from Ho et al. (16) is
φ(s^p, s^e) = x^T(T) Q x(T) + ∫_0^T [ u^T(t) R^p u(t) − v^T(t) R^e v(t) ] dt,
ℓ(x, t) = T_max − t.
In this formulation, the game ends at a specified terminal time, T_max. The terminal objective is
some quadratic function of the pursuer and evader states, such as the norm of the distance between
the pursuer and evader locations. The integrated objective penalizes the weighted control energy
of both the (minimizing) pursuer and (maximizing) evader.
Of course, the remaining challenge is to characterize strategies that constitute a Nash equilib-
rium. Note that for a fixed strategy of the opponent, the remaining player is left with a one-sided
optimal control problem (see 18). This insight leads to the Isaacs equation that defines a partial
differential equation for an unknown value function, J(t, x). Let
f(x, u, v) = ( f^p(x^p, u), f^e(x^e, v) )

denote the concatenated dynamics. The Isaacs equation is then

−∂J(t, x)/∂t = min_{u∈U} max_{v∈V} [ g(t, x, u, v) + (∂J(t, x)/∂x) f(x, u, v) ], 3.

with s^p(t, x) and s^e(t, x) denoting the associated minimizing and maximizing arguments.
Theorem 1 (from Reference 10, theorem 8.1). Assume that there exists a contin-
uously differentiable function, J(t, x), that satisfies (a) the Isaacs equation shown as
Equation 3 and (b) the boundary condition J(T, x) = q(T, x) on the set ℓ(T, x) = 0. Assume
further that the associated trajectories induced by u(t) = s^p(t, x) and v(t) = s^e(t, x)
generate trajectories that terminate in finite time. Then the pair (s p , s e ) constitutes a
Nash equilibrium.
Theorem 1 leaves open the issue of constructing a solution to the Isaacs equation, which can
be computationally challenging even for low-dimensional problems. This issue has motivated a
variety of constructive approaches.
3.1.1. Backwards reachability. Suppose that the capture condition of Equation 1 does not de-
pend on time but is only a function of the pursuer and evader states. Now define the goal set

G_0 = {x : ℓ(x) = 0}.
The backwards reachability problem (for the pursuer) is to find all states that can be driven to
G0 regardless of the control inputs of the evader, i.e., a forced capture. Backwards reachability
approaches are closely related to methods of set invariance (e.g., 19) and viability theory (e.g., 20)
and are connected to the discussion of optimal disturbance rejection in Section 3.3.
It is convenient to describe the main idea using the discrete-time pursuit–evasion dynamics

x(t + 1) = f(x(t), u(t), v(t)),

subject to constraints u(t) ∈ U and v(t) ∈ V. First, define the following set:
G_1 = {x : ∃u ∈ U s.t. f(x, u, v) ∈ G_0, ∀v ∈ V}.
In words, G1 is the set of all states that can be forced (i.e., regardless of the evader’s action, v) to
the target set in one time step. Proceeding in a similar manner, recursively define
G_{k+1} = {x : ∃u ∈ U s.t. f(x, u, v) ∈ G_k, ∀v ∈ V}.
By construction, if ever x0 ∈ Gk for some k, then the state trajectory starting from an initial
condition of x0 can be forced to the target set in k time steps, again regardless of the evader’s
future actions. Alternatively, if an initial condition x_0 does not lie in ∪_{k=1}^{∞} G_k, then the evader can
perpetually avoid the target set.
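The recursion is directly computable for finite state and action sets. The following sketch is our own toy instance (not from the text): pursuit–evasion on a line of 10 cells, with the pursuer moving up to 2 cells per step, the evader up to 1, simultaneous moves, and capture defined as ending within one cell of the evader:

```python
from itertools import product

# Toy instance (our own construction) of the backwards reachability
# recursion G_{k+1} = {x : exists u in U s.t. f(x, u, v) in G_k for all v}.
# State x = (pursuer cell, evader cell) on a line of N cells.
N = 10
U = [-2, -1, 0, 1, 2]   # pursuer is faster
V = [-1, 0, 1]

def clamp(p):
    return max(0, min(N - 1, p))

def f(x, u, v):
    return (clamp(x[0] + u), clamp(x[1] + v))

reachable = {(p, q) for p, q in product(range(N), range(N))
             if abs(p - q) <= 1}             # G0: the capture set
while True:
    grown = reachable | {
        x for x in product(range(N), range(N))
        if any(all(f(x, u, v) in reachable for v in V) for u in U)}
    if grown == reachable:
        break
    reachable = grown

# In this toy game, the faster pursuer can force capture from every
# initial condition, so the fixed point covers the whole state space.
```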
Mitchell et al. (21) developed these ideas for continuous-time dynamics. In particular, they as-
sociated backwards reachability sets with a terminal-penalty pursuit–evasion problem [the integral
cost g(·) = 0 in Equation 2] and represented the set Gτ (for nondiscrete values of τ ) as the level
set of a numerically constructed function.
3.1.2. Approximate methods. The backwards reachability approach seeks to exactly characterize
all states that lead to capture. An alternative is to exploit special simplified settings to derive a
specific strategy that guarantees capture (not necessarily optimizing any specific criterion). One
approach, taken by Zhou et al. (22), is to compute the Voronoi cell of the evader, i.e., the set
of points that the evader can reach before the pursuer. The control law derived by Zhou et al.
(22) minimizes the instantaneous time rate of change of the area of the evader’s Voronoi cell.
The authors went on to derive the minimizing control law and establish that this control law
guarantees eventual capture, where capture is defined as the proximity condition ‖x^p − x^e‖ ≤ δ
for some capture radius δ.
Karaman & Frazzoli (23) considered a probabilistic approach to compute an escape path for an
evader, if one exists. The algorithm is based on random trajectory generation for both the evader
and pursuer. The outcome is an open-loop trajectory (i.e., not a feedback law) that, with high
probability, guarantees escape for all trajectories of the pursuer. Here, the term high probability
refers to a high probability of the algorithm producing a guaranteed escape trajectory—if one
exists—but the movements of the evader and pursuer are not randomized. Randomized pursuit
trajectories were considered by Isler et al. (24). The game is played in a nonconvex polygonal
region, where the pursuer’s objective is to have the evader in its line of sight. The authors showed
that randomized open-loop pursuit trajectories can lead to capture in settings where deterministic
trajectories can be perpetually evaded.
Note that backwards reachability and approximate methods are more aligned with the above-
mentioned worst-case or minimax formulation in that they do not attempt to address the construc-
tion of a pair of strategies that constitutes a Nash equilibrium. Rather, the objective is to derive a
strategy for one of the players that guarantees a desired outcome regardless of the behavior of the
other player.
a problem of induced-norm minimization that makes this connection explicit. For an extensive
technical presentation and a historical overview, see Reference 26.
The induced-norm minimization problem is to design a controller that minimizes the effect
of exogenous disturbances. This objective can be stated as follows. Let T [K ] denote an operator
that represents a closed-loop system as a function of the controller, K . For example (see 2),
T[K] = ( W_1 (I + P K)^{−1} ; W_2 K (I + P K)^{−1} ),
where P is the plant to be controlled and W 1 and W 2 are dynamic weighting operators (e.g.,
frequency-shaping filters).
For any stabilizing controller, define the induced norm

‖T[K]‖_{ind,2} = sup_{v ∈ L_2, v ≠ 0} ‖T[K]v‖_2 / ‖v‖_2.

The induced-norm minimization problem is then

min_{K stabilizing} ‖T[K]‖_{ind,2}.
In addition to the face-value norm minimization, this objective function is also relevant to the
problem of robust stabilization, i.e., stabilization in the presence of dynamic modeling uncertainty
(see 27). One can also formulate optimal estimation problems in a similar manner (e.g., 28).
An example setup is the case of state feedback controllers. For the linear system

ẋ = Ax + Bu + Lv, x(0) = x_0,
z = ( Cx ; u ),
u is the control signal, v is an exogenous disturbance, and z gathers signals to be made small in
the presence of disturbances (tracking error, control authority, etc.). A state feedback controller,
K ∈ R^{m×n}, is stabilizing if A − BK is a stability matrix (i.e., its eigenvalues have strictly negative
real parts). Given a stabilizing state feedback, the closed-loop system is
ẋ = (A − BK)x + Lv, x(0) = x_0,
z = ( Cx ; −Kx ), 4.
The closed-loop system achieves disturbance attenuation level γ > 0 if

sup_{v ∈ L_2, v ≠ 0} ‖z‖_2 / ‖v‖_2 ≤ γ

or, equivalently,

sup_{v ∈ L_2} ( ‖z‖_2^2 − γ^2 ‖v‖_2^2 ) ≤ 0.
By iterating on γ , one can then seek to derive controllers that make γ as small as possible.
The associated objective function for the zero-sum game formulation, with u as the minimizer
and v the maximizer, is
φ(u, v) = ∫_0^∞ [ x^T(t) C^T C x(t) + u^T(t) u(t) ] dt − γ^2 ∫_0^∞ v^T(t) v(t) dt. 5.
Theorem 2. Assume that the pair (A, B) is stabilizable and (A, C) is observable. The
The above is just a sample of the extensive results concerning linear–quadratic games; for a
dedicated discussion, see Reference 30.
but otherwise the disturbance could have an arbitrarily large magnitude. Another model of an
adversarial disturbance is that it takes values in a bounded domain. This unknown-but-bounded
approach has its origins in References 31 and 32 and is related to set-invariance methods (19),
viability theory (20), and the backwards reachability method discussed in Section 3.1.
The discussion here follows Bertsekas (33, his sections 1.6 and 4.6.2). We consider discrete-time
dynamics of the form

x(t + 1) = f(x(t), u(t), v(t)),

subject to constraints u(t) ∈ U and v(t) ∈ V. (For added generality, one can also have the constraint
sets be state dependent.) We are interested in finding the (possibly time-dependent) state feedback
u(t) = μt (x(t)) ∈ U
to minimize
max_{v(0), v(1), . . . , v(T−1) ∈ V} [ q(x(T)) + Σ_{t=0}^{T−1} g(x(t), μ_t(x(t)), v(t)) ]
over some time horizon, [0, 1, . . . , T ]. Let μ denote a collection of state feedback laws
(μ0 (·), μ1 (·), . . . , μT −1 (·)). The order of operations here is that the controller (or minimizer) com-
mits to μ and then the constrained disturbance reacts to maximize the cost.
The dynamic programming solution follows the recursive procedure of value iteration. Define
J_0(x) = q(x),
J_k(x) = min_{u∈U} max_{v∈V} [ g(x, u, v) + J_{k−1}(f(x, u, v)) ].
The connection to zero-sum games is that minimax dynamic programming provides a procedure
to construct an optimal security strategy from the perspective of the minimizer.
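A direct implementation on a toy problem (our own instance: scalar integer state x(t+1) = x(t) + u(t) + v(t) saturated to a grid, with stage and terminal cost |x|) shows the recursion at work:

```python
# Toy minimax dynamic program (our own instance). The controller picks u,
# the disturbance picks v, and the state is saturated to the grid [-10, 10].
U = [-1, 0, 1]
V = [-1, 0, 1]
STATES = range(-10, 11)

def f(x, u, v):
    return max(-10, min(10, x + u + v))

def g(x, u, v):
    return abs(x)            # stage cost

def q(x):
    return abs(x)            # terminal cost

J = {x: q(x) for x in STATES}                       # J_0
for _ in range(5):                                  # J_1 through J_5
    J = {x: min(max(g(x, u, v) + J[f(x, u, v)] for v in V) for u in U)
         for x in STATES}
```

The resulting cost-to-go inherits the symmetry of the problem and grows with |x|; the minimizing u at each state is the associated security strategy for the controller.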
3.3.1. Controlled invariance. The problem of controlled invariance is to maintain the state in
a specified region in the presence of disturbances. In the case of linear systems, consider

x(t + 1) = Ax(t) + Bu(t) + Lv(t),

subject to |v|_∞ ≤ 1 and |u|_∞ ≤ u_max. Our objective is to maintain the state in the bounded region
|x|∞ ≤ 1. One can map this problem to the minimax dynamic programming approach by defining
the stage and terminal costs as
q(x) = g(x, u, v) =  0,  |x|_∞ ≤ 1;
                     1,  otherwise.
Then an initial condition, x_0, satisfies J_T(x_0) = 0 if and only if there exists a policy, μ, that assures
that the state will satisfy |x(t)|_∞ ≤ 1 for all t = 0, 1, . . . , T.
One can mimic the backwards reachability algorithm to construct a representation of the cost-
to-go functions, J_k(·), as follows. Define

G_0 = {x : |x|_∞ ≤ 1},
G_1 = {x : ∃u with |u|_∞ ≤ u_max s.t. Ax + Bu + Lv ∈ G_0, ∀ |v|_∞ ≤ 1} ∩ G_0.

Then x_0 ∈ G_1 if and only if J_1(x_0) = 0 or, equivalently, there exists a control action that keeps
|x(t)|_∞ ≤ 1 for t = 0 and t = 1. Proceeding recursively, define

G_k = {x : ∃u with |u|_∞ ≤ u_max s.t. Ax + Bu + Lv ∈ G_{k−1}, ∀ |v|_∞ ≤ 1} ∩ G_{k−1}.
Note that the only difference between this algorithm and that of backwards reachability is the
recursive intersection. Although computationally costly, one can use polyhedral representations
of these sets (e.g., 19). That is, one can recursively construct matrices, Mk , such that
G_k = {x : |M_k x|_∞ ≤ 1}.
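For a scalar system the sets G_k are symmetric intervals, and the recursion reduces to one number per step. A sketch with our own example values (a = 2, b = 1, l = 0.25, u_max = 1):

```python
# Sketch (our own scalar example): the controlled-invariance recursion
#   G_k = {x : exists |u| <= u_max s.t. a*x + b*u + l*v in G_{k-1}
#              for all |v| <= 1}  intersected with  G_{k-1}
# for x(t+1) = a*x(t) + b*u(t) + l*v(t). Each G_k is an interval
# [-beta_k, beta_k], so the recursion is a scalar update on beta_k.
a, b, l, u_max = 2.0, 1.0, 0.25, 1.0

def next_beta(beta):
    # The worst-case disturbance consumes |l| of the margin, while the
    # control can cancel up to |b|*u_max of |a*x|.
    return min(beta, (beta - abs(l) + abs(b) * u_max) / abs(a))

beta = 1.0                    # G0 = {x : |x| <= 1}
for _ in range(60):
    beta = next_beta(beta)

# beta converges to the maximal controlled-invariant interval [-0.75, 0.75].
```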
It turns out that the controlled invariance problem is also related to minimizing an induced
norm of the form
‖T[K]‖_{ind,∞} = sup_{v ∈ L_∞, v ≠ 0} ‖T[K]v‖_∞ / ‖v‖_∞,

where, for a function f : Z_+ → R^n,

‖f‖_∞ = sup_{t=0,1,...} |f(t)|_∞.
For this discussion, we assume that the parameter takes on finitely many values,

θ(t) ∈ Q, ∀t = 0, 1, 2, . . . ,

for some finite set, Q. One can also add constraints like bounds on the rate of change, such as

|θ(t + 1) − θ(t)| ≤ δ_max.
We are interested in deriving a collection of state feedback laws (μ0 (·), μ1 (·), . . . , μT −1 (·)) to
optimize
sup_{θ(0), θ(1), . . . , θ(T−1) ∈ Q} Σ_{t=0}^{T−1} g(x(t), μ_t(x(t)))
along solutions of Equation 6. This structure falls within the above-mentioned minimax dynamic
programming, where the game is between the control, u, and parameter, θ.
The following results, along with explicit computational methods, were presented in
Reference 38.
Theorem 3. Assume that the stage cost has the polyhedral representation
g(x, u) = |M_g x + N_g u|_∞.
Then there exists a sequence of matrices, Mk , so that the cost-to-go functions of minimax
dynamic programming satisfy
J_k(x) = |M_k x|_∞.
Furthermore, the stationary (receding horizon) feedback policy
μ_T(x(t)) = arg min_u max_{θ∈Q} [ g(x(t), u) + J_T( A(θ)x(t) + Bu ) ]
asymptotically approximates the optimal infinite horizon policy for sufficiently large
horizon lengths, T .
according to some known probability distribution, π ∈ Δ(K), where π_k is the probability that the
matrix M_k is selected.
The informed player knows which matrix was selected, whereas the uninformed player does not.
Furthermore, a common assumption is that the uninformed player cannot measure the realized
payoff at each stage. Accordingly, the strategy of the informed player can depend on the selected
characteristic matrix, whereas the strategy of the uninformed player can depend only on past
actions. Following the notation of Section 2.2.4, let H∗ denote the set of finite histories of play.
Then a (behavioral) strategy for the informed row player is a mapping

s_row : H∗ × {M_1, . . . , M_K} → Δ(P),

whereas a strategy for the uninformed column player depends only on the history of play, as in

s_col : H∗ → Δ(Q).
This setup, as with more complicated variations, has received considerable attention, from the
seminal work of Aumann et al. (39) through the more recent monograph by Mertens et al.
(40).
A motivating setting is network interdiction (e.g., 41–43). In network interdiction prob-
lems, the activities of a network owner (defender) are being observed by an attacker. The
network owner is the informed player in that the owner knows the details of the network,
e.g., which parts of the network are protected or susceptible. Based on these observations, the
attacker launches a limited attack to disable a portion of the network. The network owner
faces a trade-off of exploitation versus revelation. If the network owner exploits its superior
information, it may reveal sensitive information to the attacker and thereby increase the like-
lihood of a critical attack. (For further discussion of game theory for network security, see
44.)
Returning to the repeated zero-sum game problem, it turns out that the optimal strategy of
the informed player is to randomize its actions based on the following model of the uninformed
player. First, assume that the uninformed player is executing Bayesian updates to compute the
posterior probability
π_k^post[H(t)] = Pr[M_k | H(t)],

which is the posterior probability that the matrix M_k was selected based on the observed sequence
of past actions, H(t). The action of the uninformed player is then modeled as a myopically optimal
randomized strategy with respect to these posterior probabilities:
σ_col(H(t)) = arg max_{π_col ∈ Δ(Q)} min_{π_row ∈ Δ(P)} π_row^T ( Σ_{k=1}^{K} π_k^post[H(t)] M_k ) π_col.
A classic result (see 39) is that the informed row player’s optimal security strategy is a best
response to the above column player strategy. The resulting strategy will randomize according to
probabilities that depend on both the stage of play and the online outcome of prior stages. One can
think of this strategy as a deception attempt to influence the posterior beliefs of the uninformed
attacker. Recent work has presented efficient algorithms for the computation of these probabilities
based on recursive linear programs (42, 43, 45, 46).
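The Bayesian update itself is elementary. A sketch in our own notation, where sigma[k][a] is the probability that the informed player selects action a when the chosen matrix is M_k:

```python
# Sketch (our own notation): the uninformed player's Bayesian update of
# the belief pi over the K characteristic matrices after one observation.
def bayes_update(pi, sigma, action):
    """Posterior Pr[M_k | observed action] via Bayes' rule."""
    joint = [p * s[action] for p, s in zip(pi, sigma)]
    total = sum(joint)
    return [j / total for j in joint]

# Two possible matrices with prior 1/2 each. Under M_0 the informed player
# plays action 0 with probability 0.9; under M_1, with probability 0.5.
pi0 = [0.5, 0.5]
sigma = [[0.9, 0.1], [0.5, 0.5]]
post = bayes_update(pi0, sigma, 0)   # observing action 0 favors M_0
```

A fully nonrevealing strategy (identical rows of sigma) leaves the posterior equal to the prior, which is precisely the informed player's means of withholding information.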
4. TEAM GAMES
We now discuss another class of games that has had a long-standing relationship with control
theory: team games. In team games, there can be multiple players, 1, 2, . . . , P . The defining
characteristic is that all players have the same preferences. Recall that S = S1 × · · · × S P is the
joint strategy set for players 1, 2, . . . , P . In team games,1 for some cost function, C : S → R,
c1 (s ) = c2 (s ) = · · · = c P (s ) = C(s ).
While team games have been of interest for several decades (e.g., 47), there has been a recent
surge of interest related to distributed control applications, where there is no single centralized
controller that has authority over all control inputs or access to all measurements (e.g., 48–51)
(for an alternative approach to distributed control, see Section 5).
The main issue in team games is that each player has access to different information. If all
players had the same information, then one could approach the game as a conventional central-
ized control design problem. Players with different information can significantly complicate the
characterization and, more importantly, the explicit and efficient computational construction of
optimal strategies.
1. In accordance with much of the literature, we assume that players are minimizers.
Another issue in team games is the distinction between a team-optimal strategy and a Nash
equilibrium. The joint strategy, s∗, is team optimal if

C(s∗) ≤ C(s), ∀s ∈ S,

whereas s∗ is a Nash equilibrium (i.e., person-by-person optimal) if no single player can reduce
the cost by unilaterally changing its strategy:

C(s_p∗, s_−p∗) ≤ C(s_p, s_−p∗), ∀p ∈ P, s_p ∈ S_p.
A classic result that dates back to work by Radner (52) is as follows [the discussion here follows Ho
(47)]. There are P players, and player p can measure the scalar (for simplicity) random variable

z_p = H_p ξ,

where ξ is a random vector. A strategy is a mapping from a player’s measurement, z_p, to its
actions, u_p, i.e., u_p = s_p(z_p). The common cost function is
C(s_1, . . . , s_P) = E[ (1/2) u^T Q u + u^T S ξ ],

where matrices Q = Q^T > 0 and S are cost function parameters and

u = (u_1, . . . , u_P) = (s_1(z_1), . . . , s_P(z_P)).
Theorem 4 (from Reference 52). The unique Nash equilibrium strategies are team
optimal and linear, i.e.,
s_p∗(z_p) = F_p z_p

for matrices F_p.
For this specific case, person-by-person optimality also implies team optimality.
In a more general formulation, the measurements can also depend on the actions of the other
players, as in

z_p = H_p ξ + Σ_q D_pq u_q, 7.

for matrices D_pq. A specific case that has received considerable attention is the Witsenhausen
counterexample (53; see also 54). Here, there are two players, and the random variable ξ = (ξ1 , ξ2 )
is in R2 . The measurements are
z_1 = ξ_1,
z_2 = ξ_2 + u_1,
for positive weights α, β > 0. The standard interpretation of this example is that player 1, through
its action u 1 , faces the dual objective to either cancel the disturbance effect of ξ1 or signal its own
value to u 2 to overcome the noise effect of ξ2 .
Despite the linear–quadratic look and feel of this example, Witsenhausen (53) showed that
there exist nonlinear team-optimal strategies that strictly outperform the best linear strategies.
The team optimality of nonlinear strategies leads to the question of when linear strategies are team
optimal, which ultimately relates to the desire to explicitly construct team-optimal strategies. For
example, one well-known case is that of partially nested information. Continuing with the general
formulation of Equation 7, suppose that the measurements can be decomposed as follows:
z_p = ({z_q, ∀q ∈ O_p}, H_p ξ),
where the sets O p are defined as
O_p = {q | D_pq ≠ 0}.
In words, the measurement of player p includes the information of player q whenever the action
of player q affects the measurement of player p (as indicated by D_pq ≠ 0).
Now suppose that there are two players with common information z_0 and private information z_1 and z_2, respectively, so that strategies take the form
u_1 = s_1(z_0, z_1),
u_2 = s_2(z_0, z_2).
Define the set of mappings from private information to actions
S_1 = {φ_1 | φ_1 : Z_1 → U_1}
and, likewise,
S_2 = {φ_2 | φ_2 : Z_2 → U_2}.
Now imagine a hypothetical coordinator that has access to the common information, z0 . Upon
obtaining a measurement, the coordinator recommends a pair of mappings r1 (·; z0 ) and r2 (·; z0 )
to the two players to apply to their private information, z1 and z2 , respectively. The optimization
problem for the coordinator is then
min_{(r_1, r_2) ∈ R} E[ℓ(ξ, r_1(z_1; z_0), r_2(z_2; z_0))],
i.e., to derive the optimal recommended mappings. The main results of Nayyar et al. (56, 57) establish conditions under which the optimal recommendations, (r_1*(·; z_0), r_2*(·; z_0)), constitute team-optimal strategies, with the associations
s_1*(z_0, z_1) = r_1*(z_1; z_0),
s_2*(z_0, z_2) = r_2*(z_2; z_0).
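The coordinator's optimization can be sketched for a toy finite problem: enumerate every recommendation mapping from private information to actions and pick the pair minimizing the expected common cost. The alphabets and the cost function below are invented purely for illustration, and the common information z_0 is taken to be trivial for brevity.

```python
from itertools import product

Z = [0, 1]                      # private-information alphabet (hypothetical)
U = [0, 1]                      # action alphabet (hypothetical)

def cost(z1, z2, u1, u2):
    # invented team cost: the actions should jointly track z1 XOR z2
    return (u1 + u2 - (z1 ^ z2)) ** 2

# every mapping r: Z -> U, encoded as the tuple (r(0), r(1))
maps = list(product(U, repeat=len(Z)))

def total_cost(r1, r2):
    # expected cost under uniform (z1, z2), up to the constant 1/|Z|^2
    return sum(cost(z1, z2, r1[z1], r2[z2]) for z1 in Z for z2 in Z)

# the coordinator searches over recommendation pairs, not over raw actions
r1_star, r2_star = min(product(maps, maps),
                       key=lambda pair: total_cost(*pair))
```

The key point the sketch mirrors is that the coordinator's decision variables are the mappings (r_1, r_2), turning a decentralized problem into a centralized one over a larger decision space.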
Wind farms: Wind farm control seeks to maximize the total power production of a wind farm. Control strategies in which the individual turbines independently optimize their own power production are suboptimal for the farm as a whole because of the aerodynamic interactions among the turbines (67–70).
Area coverage: In coverage problems, a collection of mobile sensors in an unknown en-
vironment seeks to maximize the area under surveillance. Furthermore, some portions of
the environment may carry more weight than others. Applications range from following a
drifting spill to intruder detection (71–73).
The overarching goal of such distributed control problems is to characterize admissible decision-
making policies for the individual subsystems that ensure the emergent collective behavior is
desirable.
The design of distributed control policies can be approached from a game-theoretic perspective, where the subsystems are modeled as players in a game with designed utility functions. Here, we no longer view equilibrium concepts such as the Nash equilibrium merely as plausible outcomes of a game.
Rather, we view these equilibrium concepts as stable outcomes associated with distributed learn-
ing where the individual subsystems adjust their behavior over time in response to information
regarding their designed utility function and the behavior of the other subsystems.
The following sections present highlights of a game-theoretic approach. Two recent overview
articles are References 3 and 4.
Consider learning dynamics of the form
s_p(t) = F_p(s(0), . . . , s(t − 1); u_p), 8.
meaning that each player adjusts its strategy at time t using knowledge of the previous decisions
of the other players as well as information regarding the structural form of the player’s utility
function. Any learning algorithm that can be expressed in this form is termed uncoupled, as
players are not allowed to condition their choice on information regarding the utility functions of
other players. An example of a decision-making rule of this form is the well-studied best-response
dynamics, in which
s_p(t) ∈ BR_p(s_−p(t − 1)), 9.
where BR p (·) is the best-response set defined in Definition 2. In the best-response dynamics, each
player selects a best response to the behavior of the other players at the previous play of the game.
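As a concrete sketch, the best-response dynamics of Equation 9 can be run on a small two-player matrix game. The payoff matrices below are an illustrative prisoner's-dilemma-style assumption, chosen because each player has a strictly dominant action, so the simultaneous updates settle in one step (in general, simultaneous best responses may cycle).

```python
import numpy as np

# Hypothetical payoff matrices (prisoner's-dilemma style); action 1 is
# strictly dominant for both the row and the column player.
M_row = np.array([[3, 0],
                  [5, 1]])
M_col = np.array([[3, 5],
                  [0, 1]])

i, j = 0, 0                                   # initial joint strategy
for _ in range(10):
    # Equation 9: each player best-responds to the other's previous action;
    # the tuple assignment makes the update simultaneous.
    i, j = int(np.argmax(M_row[:, j])), int(np.argmax(M_col[i, :]))

# the dynamics reach the unique pure Nash equilibrium (1, 1)
```

The final profile satisfies the Nash condition by construction here: neither M_row[:, j] nor M_col[i, :] admits a profitable unilateral deviation.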
An alternative class of learning algorithms to Equation 8 that imposes less informational de-
mands on the players is termed completely uncoupled dynamics or payoff-based dynamics (see
75–77) and is of the form
s_p(t) = F_p(s_p(0), . . . , s_p(t − 1); u_p(s(0)), . . . , u_p(s(t − 1))), 10.
In completely uncoupled dynamics, each agent is given only the payoff that it received
at each play of the game. A player is no longer able to observe the behavior of the other agents
or access the utility that the agent would have received for any alternative choices at any stage
of the game. In problems such as network routing, learning algorithms of the form shown in
Equation 10 may be far more reasonable than those in Equation 8.
Ignoring the informational demands for the moment, the theory of learning in games has sought
to establish whether there exist distributed learning algorithms of the form shown in Equation 8
that will always ensure that the resulting collective behavior reaches a Nash equilibrium (or its
generalizations). Representative papers include References 74, 78, and 79 in the economic game
theory literature and References 80–82 in the control literature.
The following theorem highlights an inherent challenge associated with distributed learning
for Nash equilibrium in general.
Theorem 6 (from Reference 75). There are no natural dynamics of the form shown
in Equation 8 that converge to a pure Nash equilibrium for all games.
The term natural used in Theorem 6, which we do not explicitly define here, seeks to eliminate
learning dynamics that exhibit phenomena such as exhaustive search or centralized coordination.
A key point regarding this negative result is the phrase “for all games,” which effectively means that
there are complex games for which no natural dynamics exist that converge to a Nash equilibrium.
A central question that we seek to address here is whether a system designer can exploit the
freedom to design the players’ utility functions to avoid this negative result.
In a potential game, each player's utility is directly aligned with a common potential function, φ : S → R, in the sense that the change in the utility that a player would receive by unilaterally deviating from a strategy s_p to a new strategy s_p′ when all other players are selecting s_−p is equal to the difference of the potential function over those two strategy profiles, i.e.,
u_p(s_p′, s_−p) − u_p(s_p, s_−p) = φ(s_p′, s_−p) − φ(s_p, s_−p). 11.
Note that there is no mention of what happens to the utility of the other players q ≠ p across these two strategy profiles.
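This alignment between unilateral utility changes and potential changes can be verified exhaustively on a small example. Below is a minimal two-player congestion game with Rosenthal's potential; the setup is an illustrative assumption, not tied to a specific application.

```python
from itertools import product

resources = ("a", "b")

def load(s, r):                     # number of players using resource r
    return sum(1 for choice in s if choice == r)

def utility(s, p):                  # each player pays the load on its resource
    return -load(s, s[p])

def potential(s):                   # Rosenthal potential: -sum_r (1 + ... + load_r)
    return -sum(load(s, r) * (load(s, r) + 1) // 2 for r in resources)

# every unilateral deviation changes the deviator's utility and the
# potential function by exactly the same amount
aligned = True
for s in product(resources, repeat=2):
    for p in range(2):
        for dev in resources:
            s2 = s[:p] + (dev,) + s[p + 1:]
            aligned = aligned and (utility(s2, p) - utility(s, p)
                                   == potential(s2) - potential(s))
```

The check passes for every profile and every deviation, which is exactly the defining property of a potential game.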
There are several desirable properties regarding potential games that are of interest to dis-
tributed control. First, a pure Nash equilibrium is guaranteed to exist in any potential game
because any strategy s ∈ arg max_{s′∈S} φ(s′) is a pure Nash equilibrium. Second, there are natural
dynamics that do in fact lead to a pure Nash equilibrium for any potential game. In fact, the following mild modification of the best-response dynamics given in Equation 9, termed best-response dynamics with inertia, accomplishes this goal:
s_p(t) ∈ BR_p(s_−p(t − 1)) with probability 1 − ε, and s_p(t) = s_p(t − 1) with probability ε, 12.
where ε > 0 is referred to as the player's inertia (79). More formally, the learning algorithm in
Equation 12 converges almost surely to a pure Nash equilibrium in any potential game. Similar
positive results also hold for relaxed versions of potential games, e.g., generalized ordinal potential
games or weakly acyclic games (79), which seek to relax Equation 11 through either the equality
or the condition for all players.
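A minimal simulation of Equation 12 on a three-player, two-resource congestion game (an assumed toy instance) illustrates the convergence result. Ties in the best-response set are broken in favor of the current action, so Nash equilibria are absorbing states of the dynamics.

```python
import random

random.seed(0)
resources = (0, 1)
eps = 0.15                                   # inertia parameter

def load(s, r):
    return sum(1 for c in s if c == r)

def br(s, p):
    # best response to s_{-p}; keep the current action when it is a tie
    def u(r):
        s2 = list(s); s2[p] = r
        return -load(s2, r)
    best = max(u(r) for r in resources)
    candidates = [r for r in resources if u(r) == best]
    return s[p] if s[p] in candidates else random.choice(candidates)

s = [0, 0, 0]                                # all players start on resource 0
for _ in range(2000):
    # Equation 12: stay put with probability eps, else best-respond
    s = [s[p] if random.random() < eps else br(s, p) for p in range(3)]

# the inertia breaks the symmetry, and the state settles on a pure Nash
# equilibrium: a 2/1 split of the players across the two resources
```

Without inertia, simultaneous best responses would cycle between [0,0,0] and [1,1,1] forever; the occasional "stay" is what lets the process escape into an equilibrium.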
One of the other interesting facets of potential games is the availability of distributed learning
algorithms that exhibit equilibrium selection properties, i.e., learning algorithms that favor one
Nash equilibrium over other Nash equilibria. For example, consider the algorithm log-linear
learning (see 84–87), where at each time t ≥ 1 a single player p is given the opportunity to revise
its strategy, i.e., s−p (t) = s−p (t − 1), and this updating player adjusts its choice according to the
following mixed strategy:
s_p(t) = s_p with probability e^{β·u_p(s_p, s_−p(t−1))} / Σ_{s_p′ ∈ S_p} e^{β·u_p(s_p′, s_−p(t−1))}, 13.
where β > 0. For any β ≥ 0, this process induces an aperiodic and irreducible Markov process over the finite strategy set S with a unique stationary distribution q = {q_s}_{s∈S} ∈ Δ(S). When the
game is a potential game, the stationary distribution satisfies
q_s = e^{β·φ(s)} / Σ_{s′∈S} e^{β·φ(s′)}, 14.
which means that as β → ∞, the support of the stationary distribution becomes contained in the strategy profiles with the highest potential value.
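Equation 14 can be checked numerically: build the exact transition matrix of log-linear learning on a small potential game (a two-player congestion game, assumed here for illustration) and compare its long-run distribution with the Gibbs distribution induced by the potential.

```python
import numpy as np
from itertools import product

beta = 2.0
states = list(product((0, 1), repeat=2))      # joint strategies of 2 players

def load(s, r):
    return sum(1 for c in s if c == r)

def u(s, p):                                  # congestion-game utility
    return -load(s, s[p])

def phi(s):                                   # Rosenthal potential
    return -sum(load(s, r) * (load(s, r) + 1) / 2 for r in (0, 1))

# transition matrix: pick the revising player uniformly, then Equation 13
P = np.zeros((4, 4))
for i, s in enumerate(states):
    for p in (0, 1):
        weights = np.array([np.exp(beta * u(s[:p] + (a,) + s[p + 1:], p))
                            for a in (0, 1)])
        weights /= weights.sum()
        for a in (0, 1):
            j = states.index(s[:p] + (a,) + s[p + 1:])
            P[i, j] += 0.5 * weights[a]

longrun = np.linalg.matrix_power(P, 500)[0]   # distribution after many steps
gibbs = np.exp([beta * phi(s) for s in states])
gibbs = gibbs / gibbs.sum()
# longrun matches the Gibbs distribution e^{beta*phi} / Z of Equation 14
```

The agreement is exact (up to numerical precision) because log-linear learning in a potential game is a reversible Markov chain whose detailed-balance equations are solved by the Gibbs distribution.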
For a game G with welfare function W : S → R, define
PoA(G) = min_{s^ne ∈ G} W(s^ne) / max_{s ∈ S} W(s), PoS(G) = max_{s^ne ∈ G} W(s^ne) / max_{s ∈ S} W(s),
where we use the notation s^ne ∈ G to mean a pure Nash equilibrium in the game G. The price of anarchy seeks to bound the performance of any pure Nash equilibrium relative to the optimal strategy profile, while the price of stability focuses purely on the best Nash equilibrium. Note that
1 ≥ PoS(G) ≥ PoA(G) ≥ 0.
Now consider a family of games G where a pure Nash equilibrium is known to exist for each game G ∈ G. The price of anarchy and price of stability extend to this family of games G in the following manner:
PoA(G) = min_{G ∈ G} PoA(G), PoS(G) = min_{G ∈ G} PoS(G).
In essence, the price of anarchy and price of stability provide worst-case performance guarantees
when restricting attention to a specific type of equilibrium behavior.
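For intuition, both ratios can be computed by brute force on a small game. The stag-hunt-style payoffs below are an illustrative assumption, with welfare taken as the sum of the players' utilities; the game has two pure Nash equilibria of different welfare, so the two measures separate.

```python
from itertools import product

actions = ("S", "H")
U = {  # hypothetical stag-hunt payoffs: (row utility, column utility)
    ("S", "S"): (4, 4), ("S", "H"): (0, 3),
    ("H", "S"): (3, 0), ("H", "H"): (3, 3),
}
W = {s: sum(U[s]) for s in U}                 # welfare = sum of utilities

def deviate(s, p, a):
    return tuple(a if q == p else s[q] for q in (0, 1))

def is_pure_ne(s):
    return all(U[s][p] >= U[deviate(s, p, a)][p]
               for p in (0, 1) for a in actions)

ne = [s for s in U if is_pure_ne(s)]
opt = max(W.values())
poa = min(W[s] for s in ne) / opt             # worst equilibrium / optimum
pos = max(W[s] for s in ne) / opt             # best equilibrium / optimum
```

Here the equilibria are (S, S) with welfare 8 and (H, H) with welfare 6, so the price of anarchy is 0.75 while the price of stability is 1.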
We now turn to the question of how to design agent utility functions. To that end, suppose that a system operator has knowledge of the player set P, an overestimate of the strategy sets S̄ = S̄_1 × · · · × S̄_P, and a welfare function W : S̄ → R. By overestimate, we mean that the true strategy sets satisfy S_p ⊆ S̄_p for all p, but the system designer does not know this information a priori. Accordingly, the question that we seek to address is whether a system designer can commit to a specific design of agent utility functions that ensures desirable properties irrespective of the realized strategy sets S.
The following theorem from References 89–92 provides one such mechanism.
Theorem 7. Consider any player set P, an overestimate of the strategy set S̄, and a welfare function W : S̄ → R. Define the utility functions such that for any s ∈ S̄ and player p ∈ P,
u_p(s_p, s_−p) = W(s_p, s_−p) − W(s_p^BL, s_−p), 19.
where s_p^BL ∈ S̄_p is a fixed baseline strategy. Then for any game G = (P, S, u, W) where
S p ⊆ S̄ p for all p, the game G is a potential game with potential function W restricted
to the domain S.
Consider the family of games G induced by the utility design mechanism given in Equation 19 for any baseline strategy s^BL. Theorem 7 implies that the price of stability satisfies
PoS(G) = 1
or, alternatively, that the optimal strategy profile is a Nash equilibrium for any game G ∈ G,
i.e., any realization with strategy sets S p ⊆ S̄ p . Not only is the optimal strategy profile a pure
Nash equilibrium, but this optimal strategy profile is also the potential function maximizer. Con-
sequently, by appealing to the algorithm log-linear learning defined above, one can ensure that
the resulting collective behavior is nearly optimal when β is sufficiently high.
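The mechanism of Theorem 7 is easy to verify computationally: for a randomly generated welfare function (a stand-in for an arbitrary W), the utilities of Equation 19 make W an exact potential function, so the welfare maximizer is a pure Nash equilibrium. The player set, action alphabet, and baseline strategy below are all illustrative assumptions.

```python
import random
from itertools import product

random.seed(1)
players, actions = (0, 1, 2), ("a", "b")
Wtab = {s: random.random() for s in product(actions, repeat=3)}  # arbitrary W
BL = "a"                                     # fixed baseline strategy s_p^BL

def W(s):
    return Wtab[s]

def u(s, p):                                 # the utility of Equation 19
    baseline = tuple(BL if q == p else s[q] for q in players)
    return W(s) - W(baseline)

def deviate(s, p, a):
    return tuple(a if q == p else s[q] for q in players)

# W acts as a potential: unilateral utility changes equal welfare changes,
# since the baseline term W(s_p^BL, s_-p) cancels in the difference
is_potential = all(
    abs((u(deviate(s, p, a), p) - u(s, p)) - (W(deviate(s, p, a)) - W(s))) < 1e-12
    for s in Wtab for p in players for a in actions
)

opt = max(Wtab, key=W)                       # welfare-maximizing profile
opt_is_ne = all(u(opt, p) >= u(deviate(opt, p, a), p) - 1e-12
                for p in players for a in actions)
```

The cancellation of the baseline term is the whole argument: u_p(s′_p, s_−p) − u_p(s_p, s_−p) = W(s′_p, s_−p) − W(s_p, s_−p), so any welfare maximizer also maximizes every player's utility against the others' strategies.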
5.4.1. Optimizing the price of anarchy. Note that Theorem 7 makes no claim regarding the
price of anarchy. In fact, what utility design mechanism optimizes the price of anarchy over the set
of induced games G remains an open question. Recent work has sought to address this question in
several different domains, including concave cost-sharing problems (93, 94), submodular resource
allocation problems (95, 96), network coding problems (97), set-covering problems (98), routing
problems (64, 99), and more general cost-sharing problems with a restriction to utility design mechanisms that enforce Σ_p u_p(s) = W(s) for all s ∈ S (100–102).
5.4.2. Local utility design. The utility design given in Theorem 7 also makes no reference to the
structure or the informational dependence or locality of the derived utility functions. This question
has been looked at extensively in the cost-sharing literature, where agents’ utility functions are
required to depend only on information regarding the selected resources; for example, in network
routing problems, the utility of a player for selecting a route can depend only on the edges within
that route and the other players that selected these edges. Confined to such local agent utility
functions, Gopalakrishnan et al. (103) derived the complete set of local agent utility functions that
ensure the existence of a pure Nash equilibrium; however, optimizing the price of anarchy over such local utility functions remains very much an open question, as highlighted above. Other recent
work (104–106) has introduced a state in the game environment as a design parameter to design
player objective functions of a desired locality.
6. CONCLUDING REMARKS
This review has presented an overview of selected topics in game theory of particular relevance
to control systems, namely zero-sum games, team games, and game-theoretic distributed control.
Of course, with limited space, there are bound to be some major omissions. Topics not discussed
here include the following:
General sum dynamic games: These are dynamic games that need not be zero-sum or team
games. Of particular interest is the existence of Nash equilibria (for a summary, see 107) as
well as reduced-complexity solution concepts (e.g., 108, 109) and learning (e.g., 110, 111).
Large-population games: Here, a very large number of players can be approximated as a
continuum. Sandholm (112) provides a general discussion, and Quijano et al. (113) provide
a recent overview tailored to a control audience. Also related to large populations is the topic
of mean-field games (114).
Cooperative game theory: All of the models discussed above fall under the framework of
noncooperative game theory. Cooperative game theory is a complementary formalism that
relates to problems such as bargaining, matching, and coalition formation (e.g., for various
engineering applications, see 115–117). Fele et al. (118) provide a recent overview tailored
to a control audience.
DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.
ACKNOWLEDGMENTS
This work was supported by Office of Naval Research grant N00014-15-1-2762, National Science
Foundation grant ECCS-1351866, and funding from King Abdullah University of Science and
Technology (KAUST).
LITERATURE CITED
1. Myerson R. 1997. Game Theory. Cambridge, MA: Harvard Univ. Press
2. Zhou K, Doyle J. 1998. Essentials of Robust Control. Upper Saddle River, NJ: Prentice Hall
3. Marden JR, Shamma JS. 2015. Game theory and distributed control. In Handbook of Game Theory with
Economic Applications, Vol. 4, ed. H Young, S Zamir, pp. 861–99. Amsterdam, Neth.: Elsevier
4. Marden JR, Shamma JS. 2018. Game theoretic learning in distributed control. In Handbook of Dynamic
Game Theory, ed. T Başar, G Zaccour. Basel, Switz.: Birkhäuser. In press
5. Fudenberg D, Tirole J. 1991. Game Theory. Cambridge, MA: MIT Press
6. Osborne M, Rubinstein A. 1994. A Course in Game Theory. Cambridge, MA: MIT Press
7. Cesa-Bianchi N, Lugosi G. 2006. Prediction, Learning, and Games. Cambridge, UK: Cambridge Univ.
Press
8. Leyton-Brown K, Shoham Y. 2008. Essentials of Game Theory: A Concise, Multidisciplinary Introduction.
San Rafael, CA: Morgan & Claypool
9. Nisan N, Roughgarden T, Tardos E, Vazirani VV. 2007. Algorithmic Game Theory. Cambridge, UK:
Cambridge Univ. Press
10. Başar T, Olsder G. 1999. Dynamic Noncooperative Game Theory. Philadelphia, PA: Soc. Ind. Appl. Math.
11. Bauso D. 2016. Game Theory with Engineering Applications. Philadelphia, PA: Soc. Ind. Appl. Math.
12. Hespanha J. 2017. Noncooperative Game Theory: An Introduction for Engineers and Computer Scientists. Princeton, NJ: Princeton Univ. Press
14. Camerer C. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton, NJ: Princeton
Univ. Press
15. Rubinstein A. 1998. Modeling Bounded Rationality. Cambridge, MA: MIT Press
16. Ho Y, Bryson A, Baron S. 1965. Differential games and optimal pursuit-evasion strategies. IEEE Trans.
Automat. Control 10:385–89
17. Isaacs R. 1965. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control
and Optimization. New York: Wiley
18. Liberzon D. 2012. Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton,
NJ: Princeton Univ. Press
19. Blanchini F, Miani S. 2007. Set-Theoretic Methods in Control. Basel, Switz.: Birkhäuser
20. Aubin J. 1991. Viability Theory. Basel, Switz.: Birkhäuser
21. Mitchell IM, Bayen AM, Tomlin CJ. 2005. A time-dependent Hamilton-Jacobi formulation of reachable
sets for continuous dynamic games. IEEE Trans. Automat. Control 50:947–57
22. Zhou Z, Zhang W, Ding J, Huang H, Stipanović DM, Tomlin CJ. 2016. Cooperative pursuit with
Voronoi partitions. Automatica 72:64–72
23. Karaman S, Frazzoli E. 2011. Incremental sampling-based algorithms for a class of pursuit-evasion
games. In Algorithmic Foundations of Robotics IX: Selected Contributions of the Ninth International Workshop
on the Algorithmic Foundations of Robotics, ed. D Hsu, V Isler, JC Latombe, MC Lin, pp. 71–87. Berlin:
Springer
24. Isler V, Kannan S, Khanna S. 2005. Randomized pursuit-evasion in a polygonal environment. IEEE
Trans. Robot. 21:875–84
25. Zames G. 1981. Feedback and optimal sensitivity: model reference transformations, multiplicative semi-
norms, and approximate inverses. IEEE Trans. Automat. Control 26:301–20
26. Başar T, Bernhard P. 1991. H-Infinity-Optimal Control and Related Minimax Design Problems. Basel, Switz.:
Birkhäuser
27. Doyle J, Francis B, Tannenbaum A. 2009. Feedback Control Theory. Mineola, NY: Dover
28. Banavar RN, Speyer JL. 1991. A linear-quadratic game approach to estimation and smoothing. In 1991
American Control Conference (ACC), pp. 2818–22. New York: IEEE
29. Limebeer DJN, Anderson BDO, Khargonekar PP, Green M. 1992. A game theoretic approach to H∞
control for time-varying systems. SIAM J. Control Optim. 30:262–83
30. Engwerda J. 2005. LQ Dynamic Optimization and Differential Games. Hoboken, NJ: Wiley
31. Bertsekas D, Rhodes I. 1971. On the minimax reachability of target sets and target tubes. Automatica
7:233–47
32. Bertsekas D. 1972. Infinite time reachability of state-space regions by using feedback control. IEEE
Trans. Automat. Control 17:604–13
33. Bertsekas D. 2005. Dynamic Programming and Optimal Control. Belmont, MA: Athena Sci.
34. Dahleh M, Diaz-Bobillo I. 1995. Control of Uncertain Systems: A Linear Programming Approach. Upper
Saddle River, NJ: Prentice Hall
35. Shamma JS. 1993. Nonlinear state feedback for ℓ1 optimal control. Syst. Control Lett. 21:265–70
36. Shamma JS. 1996. Optimization of the ℓ∞-induced norm under full state feedback. IEEE Trans. Automat. Control 41:533–44
37. Shamma JS. 2012. An overview of LPV systems. In Control of Linear Parameter Varying Systems with
Applications, ed. J Mohammadpour, CW Scherer, pp. 3–26. Boston: Springer
38. Shamma JS, Xiong D. 1999. Set-valued methods for linear parameter varying systems. Automatica
35:1081–89
39. Aumann RJ, Maschler M, Stearns RE. 1995. Repeated Games with Incomplete Information. Cambridge,
MA: MIT Press
40. Mertens J, Sorin S, Zamir S. 2015. Repeated Games. Cambridge, UK: Cambridge Univ. Press
41. Smith JC, Prince M, Geunes J. 2013. Modern network interdiction problems and algorithms. In Handbook
of Combinatorial Optimization, ed. PM Pardalos, DZ Du, RL Graham, pp. 1949–87. New York: Springer
42. Zheng J, Castañón DA. 2012. Stochastic dynamic network interdiction games. In 2012 American Control
Conference (ACC), pp. 1838–44. New York: IEEE
43. Zheng J, Castañón DA. 2012. Dynamic network interdiction games with imperfect information and
deception. In 2012 51st Annual IEEE Conference on Decision and Control (CDC), pp. 7758–63. New York:
IEEE
44. Alpcan T, Başar T. 2010. Network Security: A Decision and Game-Theoretic Approach. Cambridge, UK:
Cambridge Univ. Press
45. Li L, Shamma JS. 2015. Efficient computation of discounted asymmetric information zero-sum stochastic
games. In 2015 54th Annual IEEE Conference on Decision and Control (CDC), pp. 4531–36. New York:
IEEE
46. Li L, Langbort C, Shamma J. 2017. Computing security strategies in finite horizon repeated bayesian
games. In 2017 American Control Conference (ACC), pp. 3664–69. New York: IEEE
47. Ho YC. 1980. Team decision theory and information structures. Proc. IEEE 68:644–54
48. Lewis FL, Zhang H, Hengster-Movric K, Das A. 2013. Cooperative Control of Multi-Agent Systems: Optimal
and Adaptive Design Approaches. New York: Springer
49. Shamma JS. 2008. Cooperative Control of Distributed Multi-Agent Systems. Hoboken, NJ: Wiley
50. van Schuppen J, Villa T. 2014. Coordination Control of Distributed Systems. Cham, Switz.: Springer
51. Wang Y, Zhang F. 2017. Cooperative Control of Multi-Agent Systems: Theory and Applications. Hoboken,
NJ: Wiley
52. Radner R. 1962. Team decision problems. Ann. Math. Stat. 33:857–81
53. Witsenhausen HS. 1968. A counterexample in stochastic optimum control. SIAM J. Control 6:131–47
54. Başar T. 2008. Variations on the theme of the Witsenhausen counterexample. In 2008 47th IEEE
Conference on Decision and Control, pp. 1614–19. New York: IEEE
55. Ho YC, Chu K. 1974. Information structure in dynamic multi-person control problems. Automatica
10:341–51
56. Nayyar A, Mahajan A, Teneketzis D. 2013. Decentralized stochastic control with partial history sharing:
a common information approach. IEEE Trans. Autom. Control 58:1644–58
57. Nayyar A, Mahajan A, Teneketzis D. 2014. The common-information approach to decentralized stochas-
tic control. In Information and Control in Networks, ed. G Como, B Bernhardsson, A Rantzer, pp. 123–56.
Cham, Switz.: Springer
58. Kurtaran BZ, Sivan R. 1974. Linear-quadratic-Gaussian control with one-step-delay sharing pattern.
IEEE Trans. Autom. Control 19:571–74
59. Sandell N, Athans M. 1974. Solution of some nonclassical LQG stochastic decision problems. IEEE
Trans. Autom. Control 19:108–16
60. Nayyar N, Kalathil D, Jain R. 2017. Optimal decentralized control with asymmetric one-step
delayed information sharing. IEEE Trans. Control Netw. Syst. In press. https://doi.org/10.1109/
TCNS.2016.2641802
61. Bamieh B, Voulgaris PG. 2005. A convex characterization of distributed control problems in spatially
invariant systems with communication constraints. Syst. Control Lett. 54:575–83
62. Rotkowitz M, Lall S. 2006. A characterization of convex problems in decentralized control. IEEE Trans.
Autom. Control 51:274–86
63. Roughgarden T. 2005. Selfish Routing and the Price of Anarchy. Cambridge, MA: MIT Press
64. Brown PN, Marden JR. 2017. Studies on robust social influence mechanisms, incentives for efficient
network routing in uncertain settings. IEEE Control Syst. Mag. 37:98–115
65. Lasaulce S, Jimenez T, Solan E, eds. 2017. Network Games, Control, and Optimization: Proceedings of NETGCOOP 2016, Avignon, France. Basel, Switz.: Birkhäuser
66. Ozdaglar A, Menache I. 2011. Network Games: Theory, Models, and Dynamics. San Rafael, CA: Morgan &
Claypool
67. Marden JR, Ruben S, Pao L. 2013. A model-free approach to wind farm control using game theoretic
methods. IEEE Trans. Control Syst. Technol. 21:1207–14
68. Johnson KE, Thomas N. 2009. Wind farm control: addressing the aerodynamic interaction among wind
turbines. In 2009 American Control Conference, pp. 2104–9. New York: IEEE
69. Steinbuch M, de Boer WW, Bosgra OH, Peters S, Ploeg J. 1998. Optimal control of wind power plants.
J. Wind Eng. Ind. Aerodyn. 27:237–46
70. Pao L, Johnson KE. 2009. A tutorial on the dynamics and control of wind turbines and wind farms. In
2009 American Control Conference, pp. 35–36. New York: IEEE
71. Cortes J, Martinez S, Karatas T, Bullo F. 2004. Coverage control for mobile sensing networks. IEEE
Trans. Robot. Autom. 20:243–55
72. Marden JR, Arslan G, Shamma JS. 2009. Cooperative control and potential games. IEEE Trans. Syst.
Man Cybernet. B 39:1393–407
73. Yazicioglu AY, Egerstedt M, Shamma JS. 2016. Communication-free distributed coverage for networked
systems. IEEE Trans. Control Netw. Syst. 4:499–510
74. Fudenberg D, Levine D. 1998. The Theory of Learning in Games. Cambridge, MA: MIT Press
75. Hart S, Mas-Colell A. 2003. Uncoupled dynamics do not lead to Nash equilibrium. Am. Econ. Rev.
93:1830–36
76. Marden JR, Young HP, Arslan G, Shamma JS. 2009. Payoff based dynamics for multi-player weakly
acyclic games. SIAM J. Control Optim. 48:373–96
77. Babichenko Y. 2010. Completely uncoupled dynamics and Nash equilibria. Discuss. Pap., Cent. Study Ration.,
Hebrew Univ., Jerusalem
78. Hart S. 2005. Adaptive heuristics. Econometrica 73:1401–30
79. Young HP. 2005. Strategic Learning and Its Limits. Oxford, UK: Oxford Univ. Press
80. Shamma JS, Arslan G. 2005. Dynamic fictitious play, dynamic gradient play, and distributed convergence
to Nash equilibria. IEEE Trans. Autom. Control 50:312–27
81. Fox MJ, Shamma JS. 2013. Population games, stable games, and passivity. Games 4:561–83
82. Frihauf P, Krstic M, Başar T. 2012. Nash equilibrium seeking in noncooperative games. IEEE Trans.
Autom. Control 57:1192–207
83. Monderer D, Shapley L. 1996. Potential games. Games Econ. Behav. 14:124–43
84. Blume L. 1993. The statistical mechanics of strategic interaction. Games Econ. Behav. 5:387–424
85. Blume L. 1997. Population games. In The Economy as an Evolving Complex System II, ed. B Arthur,
S Durlauf, D Lane, pp. 425–60. Reading, MA: Addison-Wesley
86. Marden JR, Shamma JS. 2012. Revisiting log-linear learning: asynchrony, completeness and a payoff-
based implementation. Games Econ. Behav. 75:788–808
87. Alos-Ferrer C, Netzer N. 2010. The logit-response dynamics. Games Econ. Behav. 68:413–27
88. Koutsoupias E, Papadimitriou C. 1999. Worst-case equilibria. In STACS 99: 16th Annual Symposium
on Theoretical Aspects of Computer Science, Trier, Germany, March 4–6, 1999: Proceedings, ed. C Meinel, S
Tison, pp. 404–13. Berlin: Springer
89. Wolpert D. 2004. Theory of collective intelligence. In Collectives and the Design of Complex Systems, ed.
K Tumer, D Wolpert, pp. 43–106. New York: Springer
90. Arslan G, Marden JR, Shamma JS. 2007. Autonomous vehicle-target assignment: a game theoretical
formulation. ASME J. Dyn. Syst. Meas. Control 129:584–96
91. Marden JR, Wierman A. 2013. Distributed welfare games. Oper. Res. 61:155–68
92. Gopalakrishnan R, Marden JR, Wierman A. 2011. An architectural view of game theoretic control. ACM
SIGMETRICS Perf. Eval. Rev. 38:31–36
93. Phillips M, Shalaby Y, Marden JR. 2016. The importance of budget in efficient utility design. In 2016
55th Annual IEEE Conference on Decision and Control (CDC), pp. 6117–22. New York: IEEE
94. Marden J, Phillips M. 2016. Optimizing the price of anarchy in concave cost sharing games. In 2017
American Control Conference (ACC), pp. 5237–42. New York: IEEE
95. Marden J, Roughgarden T. 2014. Generalized efficiency bounds in distributed resource allocation. IEEE
Trans. Autom. Control 59:571–84
96. Vetta A. 2002. Nash equilibria in competitive societies with applications to facility location, traffic routing,
and auctions. In 43rd Annual IEEE Symposium on Foundations of Computer Science, pp. 416–25. New York:
IEEE
97. Marden JR, Effros M. 2012. The price of selfishness in network coding. IEEE Trans. Inform. Theory
58:2349–61
98. Gairing M. 2009. Covering games: approximation through noncooperation. In Internet and Network
Economics: 5th International Workshop, WINE 2009, Rome, Italy, December 14–18, 2009: Proceedings, ed.
116. Saad W, Han Z, Debbah M, Hjorungnes A, Basar T. 2009. Coalitional game theory for communication
networks. IEEE Signal Process. Mag. 26:77–97
117. Saad W, Han Z, Poor HV. 2011. Coalitional game theory for cooperative micro-grid distribution net-
works. In 2011 IEEE International Conference on Communications Workshops (ICC). New York: IEEE.
https://doi.org/10.1109/iccw.2011.5963577
118. Fele F, Maestre JM, Camacho EF. 2017. Coalitional control: cooperative game theory and control. IEEE
Control Syst. 37:53–69