Coco Games: Graphical Game-Theoretic Swarm Control For Communication-Aware Coverage
2377-3766 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Corporacion Universitaria de la Costa. Downloaded on July 18,2022 at 21:32:41 UTC from IEEE Xplore. Restrictions apply.
FERNANDO et al.: COCO GAMES: GRAPHICAL GAME-THEORETIC SWARM CONTROL FOR COMMUNICATION-AWARE COVERAGE 5967
II. RELATED WORKS

A. Multi-Robot Coverage

The coverage problem typically involves deploying a set of mobile nodes over a field to maximize some objective: wireless signal coverage in the case of wireless routers, or information gain in the case of sensors [7], [8]. A large body of literature discusses coordinating wireless nodes over the spectrum of communication and control methods, ranging from disk-based channel models with centralized control to stochastic models with decentralized control. A widely known line of work uses disk graph-based methods to coordinate multiple robots while maintaining the overall network connectivity as the nodes move [9]–[11]. Further, disk-based channel models have also been used for coverage control in [10]–[13]. However, most of these works overlook the volatility of the network topology caused by the stochasticity of wireless signals. Additionally, the "disk" assumption enforces excessively restrictive control on the robots to maintain local connectivity, sacrificing coverage gain.

In [8], [14], the authors present optimization-based decentralized coverage control for networked robot teams. The former approach mainly relies on a static coverage function and fails to adjust to dynamic ROIs. In the latter, Kantaros et al. propose optimizing auxiliary objectives similar to ours, considering message routing and fixed UEs. However, our work differs in its ability to cater to both fixed and moving UEs while eliminating the need to incorporate UE positions into the optimization problem explicitly. By using the team abstraction proposed in [15], we make our approach highly scalable in the number of UEs. In [16], [17], the authors combine stochastic channel models that account for the wireless fading effect in point-to-point communication to find the optimal router configurations under different routing algorithms.

B. Graphical Games

Graphical games reflect the notion that a multi-player game can be succinctly represented by a graph, where a player's payoff only depends on its neighbors' actions [4]. In [18]–[20], the authors established the interplay between games' solutions and probabilistic graphical models. Although the notion of graphical games resembles that of collective dynamics-based approaches, only a few attempts have been made to employ the framework in swarm coordination, despite the success gained by the latter. In our previous work [21], we presented an MRF-based approach to steer a robot swarm to a flocking consensus, yet the theoretical guarantees and the connection to graphical games were missing. In [12], [22], the authors proposed graphical game-theoretic methods for distributed mobile sensor coverage that conserve energy; however, these works assume stationary neighborhoods while overlooking the robot dynamics and network limitations.

III. PRELIMINARIES

A. Graphical Game Theory

Consider a game involving n players, and let A_i define the set of actions available to any player i ∈ {1, . . . , n}. Let x = (x_1, . . . , x_n) denote the joint action profile of the players. We allow the players to play mixed strategies, and the probability that i plays the action x_i is denoted by the mixed strategy Q_i(x_i). Further, −i denotes the set of all players but i, and (x'_i, x_{−i}) denotes an alternative action profile where i plays x'_i instead of x_i while x_{−i} remains the same. We define Q(x) as an arbitrary joint probability distribution over the action profile x with the mixed strategies Q_i(x_i) as its marginals.

A graphical game Γ consists of the tuple ⟨G, M⟩, where G defines a graph whose vertices correspond to the players, and M represents the set of payoff functions. Graphical game theory significantly reduces the number of agents a player interacts with, from n to the size of its local neighborhood [4]. For some player i, M_i ∈ M with M_i : A_i → R. Following the definition of expectation, we define the expected utility as follows.

Definition 1: The expected utility of player i under a joint probability Q is

E_Q[M_i(x)] = \sum_{x} Q(x_i, x_{-i}) M_i(x_i, x_{-i}).    (1)

Definition 2: The correlated equilibrium (CE) of a graphical game is a joint distribution Q over the associated undirected graphical model under which no player has a unilateral incentive to deviate. Thus, for x_i, x'_i ∈ A_i, ∀i, and x_i ≠ x'_i,

E_Q[M_i(x_{-i}, x_i)] \ge E_Q[M_i(x_{-i}, x'_i)].

Definition 3: A mixed strategy Nash equilibrium (MSNE) is a special case of CE, where the joint probability is a product distribution of the marginals. Thus, Q(X) = \prod_i Q_i(X_i) [20].

B. Variational Energy Functional

VI casts the inference problem over an MRF as a convex optimization problem and approximates the posterior distribution much more efficiently than exact inference methods [23]. Let G = (V, E) be an MRF, where V and E are the sets of vertices and edges, and V consists of a set of discrete RVs {X_1, . . . , X_n}.

Definition 4: The joint probability distribution over an MRF is often represented as a Gibbs distribution

p(X = x) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c),    (2)

where φ_c is a factor potential function associated with some clique c ∈ C of G, φ_c : X^{|c|} → R^+, and Z is the partition function that normalizes the distribution, Z = \sum_x \prod_{c \in C} \phi_c(x_c).

Remark 1: For any φ_c = exp{ε(x_c)}, p(x) defines an exponential family distribution, where ε : X^{|c|} → R is some function that maps the clique c to a real number.

Let Q and P denote the approximating and the true posterior distributions in VI. We consider the I-projection of the Kullback-Leibler (KL) divergence between the two distributions,

D(Q \| P_C) = E_Q\left[\ln \frac{Q(X)}{P_C(X)}\right],

where P_C is the probability distribution over the set of cliques C. From (2) and H_Q(X) = −E_Q[ln Q(X)],

D(Q \| P_C) = -H_Q(X) - \sum_{c \in C} E_Q[\ln \phi_c(x_c)] + E_Q[\ln Z],
5968 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 7, NO. 3, JULY 2022
node. Next, we use this coverage model to introduce the sufficient statistics for the MRF and the payoff functions.

B. Payoff Function for the Stage Graphical Game

Following the communication topology and the neighborhood parameter k, we define the graph G_t = (V_t, E_t) for the stage game Γ_t. The set of vertices V_t = {X_1, . . . , X_n} comprises the random variables for each robot node. The set of edges E_t contains an element (i, j) if and only if i and j satisfy the neighborhood condition under k. Thus, E_t = {(i, j) | j ∈ N_i^t(k), ∀i}. Hereafter, we drop the script t, as we are interested in finding the equilibrium for a single-stage game. Also, for a fixed k, let N_i^t(k) = N_i^t.

We consider that the payoff of a robot i relies on the coverage provided by the neighborhood and its expected RSS with the neighbors. Formally, we define

M_i(x) = \alpha_a \psi_C(x_i) + \alpha_b \sum_{j \in N_{-i}} \psi_R(x_i, x_j),    (7)

where α_a, α_b > 0 are two predefined weight parameters. In this work, they scale the contributions of the two factor potentials ψ_C, ψ_R in the payoff function proportionately. The first term forces the robot to move farther to maximize the coverage, and the second term penalizes the robot for selecting actions that weaken the signal strength. We assume that the expected RSS remains roughly unchanged in close proximity over R. Thus, considering the ROI size and the inter-robot distances, we argue that the contradicting effects of the auxiliary objectives can be ignored for sufficiently small intervals of t. Therefore, a robot's best response action maximizes the coverage as well as the expected RSS with its neighbors. In this work, we propose solving the game Γ by performing posterior inference over an appropriately tailored MRF. By using an exponential family distribution, we establish an analogy between the equilibrium in the game and the resulting joint distribution.

C. Exponential Family Posterior Distribution for MRF

We now discuss integrating the payoff function with a probabilistic graphical model defined on the game Γ = (G, M). We

\phi_c(x_c) =
\begin{cases}
\phi_i(x_i) = \exp\{\alpha_i \, \psi_C\}, & c \in \{N_i \mid \forall i\} \\
\phi_{ij}(x_i, x_j) = \exp\{\alpha_{ij} \, \psi_R\}, & c \in E,
\end{cases}    (8)

where φ_i, φ_{ij} simply redefine the factor potentials given the clique c's type, and α_i, α_{ij} ≥ 0 are some weights associated with the cliques. With this definition, we consider the following joint probability for the MRF G′:

p(x) = \frac{1}{Z} \exp\left\{ \sum_{i \in V} \alpha_a \psi_C(x_i) + \sum_{(ij) \in E} \alpha_b \psi_R(x_i, x_j) \right\}.    (9)

Theorem 1: The probability over the MRF G′, p(x), defines a linear exponential family with canonical parameters α and sufficient statistics ψ.

Proof: Define two vectors ψ and α that contain the factor potentials and the associated weights for the cliques C of G′. Set each α_c according to the clique c: α_c = α_a for neighborhood cliques, α_c = α_b for pairwise cliques, and α_c = 0 for auxiliary pairwise cliques. Thus, the summations in (9) become p(x) = \exp\{\langle \alpha, \psi \rangle - \Lambda\}, where ⟨α, ψ⟩ is the inner product of α and ψ, and Λ = ln Z. Following Remark 1, this defines a linear exponential family.

With Remark 1 and Theorem 1, we observe that the posterior p(x) takes the standard form of a joint distribution over an MRF. Given p(x) and the payoff M_i, we further notice that the two functions share a mutual form. Specifically, the summation terms that pertain to a single robot i inside the exponential of the former resemble the payoff function. In the following section, we perform posterior inference over the MRF G′ and establish that the resulting probability distribution Q(x) yields a consensus formation of the stage game Γ.

D. Stage Game Optimization With MFVI

In Theorem 1 we established that the joint posterior p(x) adheres to the form in (2), with ε(x_c) = ψ_c(x_c) for all c ∈ C. Thus, we start by considering the variational energy functional F associated with (9). It is clear from (3) that maximizing F results in the minimum KL divergence between the true and the approximating posteriors. Here we use MFVI to optimize F and to obtain the approximating posterior. Specifically, the
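As a numerical sanity check on Theorem 1, the following Python sketch verifies on a two-robot toy example that normalizing the factor-potential product of (9) gives exactly the canonical form exp{⟨α, ψ(x)⟩ − Λ}. The statistics ψ_C, ψ_R and the weights are hypothetical placeholders, not values from the paper.

```python
import itertools
import math

# Hypothetical sufficient statistics for a 2-robot, binary-action example:
# psi_C(xi) is a per-robot coverage statistic, psi_R(xi, xj) an RSS statistic.
psi_C = [0.2, 1.0]                    # indexed by xi in {0, 1}
psi_R = [[1.0, 0.3], [0.3, 1.0]]      # indexed by (xi, xj)
alpha_a, alpha_b = 1.5, 0.8           # clique weights as in (9)
edges = [(0, 1)]
states = list(itertools.product([0, 1], repeat=2))

def features(x):
    # stack the clique statistics into one vector psi(x); alpha is its weight vector
    return [psi_C[xi] for xi in x] + [psi_R[x[i]][x[j]] for i, j in edges]

alpha = [alpha_a] * 2 + [alpha_b] * len(edges)

def inner(a, b):
    return sum(u * v for u, v in zip(a, b))

# log-partition function Lambda = ln Z of the exponential family
Lam = math.log(sum(math.exp(inner(alpha, features(x))) for x in states))

# p(x) in the canonical form exp{<alpha, psi(x)> - Lambda} ...
p_expfam = {x: math.exp(inner(alpha, features(x)) - Lam) for x in states}

# ... must match the factor-potential form (9) after normalization.
def unnorm(x):
    f = math.exp(sum(alpha_a * psi_C[xi] for xi in x))
    for i, j in edges:
        f *= math.exp(alpha_b * psi_R[x[i]][x[j]])
    return f

Z = sum(unnorm(x) for x in states)
for x in states:
    assert abs(p_expfam[x] - unnorm(x) / Z) < 1e-12
```

The vector stacking in `features` mirrors the proof: per-robot and pairwise statistics concatenated into one ψ, with the matching α entries as canonical parameters.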
where Q_i(x_i) denotes the marginal probability distribution of i in the posterior. The goal of MFVI is to find an update rule for each marginal distribution while keeping the neighboring RVs fixed. From (3), the variational energy functional takes the form

F[\tilde{P}_C, Q] = H_Q(X) + \sum_{c \in C} E_Q[\ln \phi_c(x_c)].

We consider the variational energy imparted on i by its neighborhood N_i under the joint Q,

F_i = H_Q(X_i) + E_Q\left[ \sum_{j \in N_{-i}} \ln \phi_{ij}(x_i, x_j) + \ln \phi_i(x_i) \right].    (11)

In order to maximize F_i, we write the Lagrangian L_i with the Lagrange multiplier λ as

L_i[Q] = H_Q(X_i) + \sum_{c \in C} E_Q[\ln \phi_c] + \lambda \left( \sum_{x_i} Q_i(x_i) - 1 \right).

Next, we differentiate the Lagrangian L_i w.r.t. the marginal Q_i(x_i) and obtain the fixed points corresponding to the maximum:

\frac{d}{dQ_i(x_i)} L_i[Q] = -\ln Q_i(x_i) - 1 + \sum_{c \in C} E_Q[\ln \phi_c \mid x_i] + \lambda.    (12)

During the differentiation step, we used the facts that the derivatives of the entropy H_Q(X_i) and expectation E_Q[ln φ_c] terms w.r.t. Q_i(x_i) are −ln Q_i(x_i) − 1 and E_Q[ln φ_c | x_i], respectively. Here, E_Q[ln φ_c | x_i] denotes the conditional expectation of the factor potential φ_c given the value x_i. Notice that the factor function φ_c changes according to (8) given the clique type. Setting the derivative to 0 and taking exponentials on both sides yields the update rule

Q_i(x_i) = \frac{1}{Z_i} \exp\left\{ E_Q\left[ \sum_{j \in N_{-i}} \ln \phi_{ij} \,\middle|\, x_i \right] + \ln \phi_i \right\},    (13)

where Z_i is a typical exponential family normalization constant, as introduced in Definition 4, and the Lagrange multiplier λ is dropped in the normalization. The resulting update rule is also known as the coordinate ascent mean-field approximation. Further, the redefinition of φ_c in (8) ensures that ln φ_c(x_c) exists and reflects the coverage model.

The stage game optimization algorithm summarizes the key steps of MFVI into an iterative procedure for solving the graphical game Γ = ⟨G, M⟩ by performing posterior inference on the MRF G′. Specifically, we optimize each RV of the MRF to calculate the marginal probabilities Q_i(X_i) while fixing the neighboring RVs. In the graphical game-theoretic paradigm, this is equivalent to calculating the mixed strategy profile of each player given the neighborhood. In the next section, we show that the resulting posterior probability distribution Q induces an equilibrium in the graphical game Γ.

E. Equilibrium in the Stage Game

Consider the modified stage game Γ′ = ⟨G′, M′⟩ where M′_i ∈ M′. In this section, we discuss the interplay between the equilibrium solutions of Γ′ and the posterior probability of the factorized MRF induced by G′.

Theorem 2: The joint posterior distribution Q∗(x) over the MRF induced by G′ results in a CE for the stage game.

Proof: According to (11) and (12), each marginal Q_i(X_i) that comprises Q∗(X) defines a local maximum of the energy functional F_i. Therefore, for some actions x_i, x'_i ∈ A_i, under Q∗(X) the marginals satisfy Q_i(x_i) ≥ Q_i(x'_i), and thus,

\ln Q_i(x_i) \ge \ln Q_i(x'_i).    (14)

Now consider an action profile x = (x_i, x_{-i}), where x_{-i} represents the actions of N_{-i} and x_j ∈ x_{-i} for some j ∈ N_{-i}. From (13),

\ln Q_i(x_i) \propto E_{Q^*}\left[ \sum_{j \in N_{-i}} \ln \phi_{ij}(x_i, x_j) + \ln \phi_i(x_i) \right].
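The coordinate-ascent update (13) can be sketched in a few lines of Python. The two-robot unary and pairwise potentials below are hypothetical, and the conditional expectation is taken under the current neighbor marginals Q_j, as is standard in mean-field practice; this is a sketch of the update rule, not the paper's implementation.

```python
import math

# Hypothetical two-robot game with binary actions; phi_i and phi_ij as in (8).
phi_i = [[1.0, 2.0], [1.5, 1.0]]     # phi_i[i][xi]: unary (neighborhood-clique) potentials
phi_ij = [[2.0, 0.5], [0.5, 2.0]]    # shared pairwise potential phi_ij[xi][xj]
neighbors = {0: [1], 1: [0]}

Q = [[0.5, 0.5], [0.5, 0.5]]         # initial marginals Qi (uniform mixed strategies)

def update(i):
    # mean-field update (13): Qi(xi) ∝ exp{ E_Q[sum_j ln phi_ij | xi] + ln phi_i(xi) }
    logits = []
    for xi in (0, 1):
        e = math.log(phi_i[i][xi])
        for j in neighbors[i]:
            # expectation over the neighbor's current marginal Q_j
            e += sum(Q[j][xj] * math.log(phi_ij[xi][xj]) for xj in (0, 1))
        logits.append(e)
    m = max(logits)                   # shift for numerical stability
    w = [math.exp(v - m) for v in logits]
    s = sum(w)                        # Zi: the normalization constant of (13)
    return [v / s for v in w]

# coordinate ascent: sweep the players, fixing the neighboring RVs at each step
for _ in range(50):
    for i in (0, 1):
        Q[i] = update(i)
```

Each sweep fixes the neighbors' marginals and renormalizes one player's distribution, which is exactly the "optimize each RV while fixing the neighboring RVs" procedure described above; the λ multiplier never appears explicitly because it is absorbed into Z_i.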