
5966 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 7, NO. 3, JULY 2022

CoCo Games: Graphical Game-Theoretic Swarm Control for Communication-Aware Coverage
Malintha Fernando, Ransalu Senanayake, and Martin Swany

Abstract—We propose a novel framework for real-time communication-aware coverage control in networked robot swarms. Our framework unifies the robot dynamics with network-level message-routing to reach consensus on swarm formations in the presence of communication uncertainties by leveraging local information. Specifically, we formulate the communication-aware coverage as a cooperative graphical game, and use variational inference to reach mixed strategy Nash equilibria of the stage games. We experimentally validate the proposed approach in a mobile ad-hoc wireless network scenario using teams of aerial vehicles and terrestrial user equipment (UE) operating over a large geographic region of interest. We show that our approach can provide wireless coverage to stationary and mobile UEs under realistic network conditions.

Fig. 1. A swarm of 3 UAV robots providing wireless network coverage to a team of 3 travelling UEs. The ROI, UAV-UAV and UAV-UE communication links are represented by the ellipsoid, red and blue dashed lines, respectively. CoCo games can coordinate the UAV swarm in real time to maximize the coverage for the ROI independent of the UE movements.

Index Terms—Distributed robot systems, networked robots, cooperating robots.

I. INTRODUCTION

MULTI-ROBOT systems have been gaining significant attention in interdisciplinary research areas such as wireless networks and environmental monitoring thanks to the recent advancements in the robotics and telecommunication sectors [1]. Specifically, mobile ad-hoc wireless networks are emerging as a disruptive technology to accommodate on-demand coverage and capacity enhancement for networked robot systems [2], [3]. However, deploying robots in such applications requires overcoming numerous challenges: maintaining the connectivity, maximizing the network coverage, and performing real-time motion planning with limited global information, to name a few.

We propose a novel game-theoretical approach to maximize the coverage over a geographical region of interest (ROI) under practical network constraints in real-time. Specifically, we formulate the communication-aware coverage as a general-sum graphical game where the robots aim to maximize two auxiliary objectives: 1) coverage, and 2) connectivity, by leveraging local information. General-sum games permit the robots' payoffs to relate arbitrarily, which makes them an ideal framework for coordinating swarms with interconnected objectives like ours. Additionally, the graphical game-theoretic foundation underpins using the robots' local information to reach swarm consensus. Compared to the more conventional stochastic game frameworks, which require one's payoff to depend on the joint action profile of all the others, this greatly simplifies the game structure [4].

To account for the often changing communication topology caused by network uncertainties, we routinely update the game structure with network-level information from message-routing tables. In contrast to disk-based coverage schemes, this renders our approach highly robust to the volatility of the communication topology caused by the robots' movements and signal attenuation. While many swarm control methods require aggregating the global information [5], [6], the local neighborhood property of graphical games allows us to selectively integrate partial observations into decision-making, reducing the communication overhead. Thus, we believe graphical games allow designing more effective control paradigms for large-scale robot swarms, where the global state aggregation is intractable due to communication limitations.

By using variational inference (VI), we substantiate the interplay between the game and an adjoined Markov random field (MRF), whose posterior distribution resembles the solution of the former. We experimentally validate our approach in a mobile wireless network scenario with an Unmanned Aerial Vehicle (UAV) team to provide coverage for a set of User Equipment (UE) over an ROI (Fig. 1).

The main contributions of our work are: 1) Formulation of the communication-aware coverage as a graphical game, 2) Theoretical guarantees for the stage-game's equilibrium by leveraging variational inference, 3) A scalable algorithm to reach the equilibrium consensus, and 4) Experimental results for the proposed approach under realistic network conditions.

Manuscript received October 30, 2021; accepted March 1, 2022. Date of publication March 22, 2022; date of current version April 12, 2022. This letter was recommended for publication by Associate Editor G. Notomista and Editor M. A. Hsieh upon evaluation of the reviewers' comments. (Corresponding author: Malintha Fernando.)
Malintha Fernando and Martin Swany are with the Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47401 USA (e-mail: ccfernan@iu.edu; swany@iu.edu).
Ransalu Senanayake is with the Stanford University, Stanford, CA 94305 USA (e-mail: ransalu@stanford.edu).
This letter has supplementary downloadable material available at https://doi.org/10.1109/LRA.2022.3160968, provided by the authors.
Digital Object Identifier 10.1109/LRA.2022.3160968

2377-3766 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Corporacion Universitaria de la Costa. Downloaded on July 18,2022 at 21:32:41 UTC from IEEE Xplore. Restrictions apply.
FERNANDO et al.: COCO GAMES: GRAPHICAL GAME-THEORETIC SWARM CONTROL FOR COMMUNICATION-AWARE COVERAGE 5967

II. RELATED WORKS

A. Multi-Robot Coverage

The coverage problem typically involves deploying a set of mobile nodes over a field to maximize some objective: wireless signal coverage in the case of wireless routers, or information gain in the case of sensors [7], [8]. A large body of literature discusses coordinating wireless nodes over the spectrum of communication and control methods, ranging from disk-based channel models with centralized control to stochastic models with decentralized control. A widely known line of work uses disk graph-based methods to coordinate multiple robots while maintaining the overall network connectivity as the nodes move [9]–[11]. Further, disk-based channel models have also been used for coverage control in [10]–[13]. However, most of them overlook the volatility of the network topology caused by the stochasticity of wireless signals. Additionally, the "disk" assumption enforces excessively restrictive control on the robots to maintain the local connectivity, sacrificing the coverage gain.

In [8], [14] the authors present optimization-based decentralized coverage control for networked robot teams. The former approach mainly relies on a static coverage function and fails to adjust to dynamic ROIs. In the latter, Kantaros et al. propose optimizing auxiliary objectives similar to ours, considering message-routing and fixed UEs. However, our work differs in its ability to cater to both fixed and moving UEs while eliminating the need to incorporate UE positions into the optimization problem explicitly. By using the team abstraction proposed in [15], we make our approach highly scalable in the number of UEs. In [16], [17] the authors combine the stochastic channel models to account for the wireless fading effect in point-to-point communication to find the optimum router configurations under different routing algorithms.

B. Graphical Games

Graphical games reflect the notion that a multi-player game can be succinctly represented by a graph, and a player's payoff only depends on its neighbors' actions [4]. In [18]–[20] the authors established the interplay between games' solutions and probabilistic graphical models. Although the notion of graphical games resembles that of collective dynamics-based approaches, only a few attempts have been made to employ the framework in swarm coordination, despite the success gained by the latter. In our previous work [21], we presented an MRF-based approach to steer a robot swarm to a flocking consensus, yet the theoretical guarantees and the connection to graphical games were missing. In [12], [22] the authors proposed graphical game-theoretic methods for distributed mobile sensor coverage by conserving energy; however, the works assume stationary neighborhoods while overlooking the robot dynamics and network limitations.

III. PRELIMINARIES

A. Graphical Game Theory

Consider a game involving $n$ players, and let $A_i$ define the set of actions available to any player $i \in \{1, \ldots, n\}$. Let $x = (x_1, \ldots, x_n)$ denote the joint action profile of the players. We allow the players to play mixed strategies, and the probability that $i$ is playing the action $x_i$ is denoted by the mixed strategy $Q_i(x_i)$. Further, $-i$ denotes the set of all players but $i$, and $(x_i', x_{-i})$ denotes an alternative action profile where $i$ plays $x_i'$ instead of $x_i$ while $x_{-i}$ remains the same. We define $Q(x)$ as an arbitrary joint probability distribution over the action profile $x$ with mixed strategies $Q_i(x_i)$ as the marginals.

A graphical game $\Gamma$ consists of the tuple $\langle G, M \rangle$, where $G$ defines a graph whose vertices correspond to the players, and $M$ represents the set of payoff functions. The graphical game theory significantly reduces one's interacting agent count from $n$ to its local neighborhood size [4]. For some player $i$, $M_i \in M$, $M_i : A_i \to \mathbb{R}$. Following the definition of expectation, we define the expected utility as follows.

Definition 1: The expected utility of player $i$ under a joint probability $Q$ is

$$E_Q[M_i(x)] = \sum_{x} Q(x_i, x_{-i})\, M_i(x_i, x_{-i}). \tag{1}$$

Definition 2: The correlated equilibrium (CE) of a graphical game is a joint distribution $Q$ over the associated undirected graphical model, under which no player has a unilateral incentive to deviate. Thus, for $x_i, x_i' \in A_i$, $\forall i$, and $x_i \neq x_i'$,

$$E_Q[M_i(x_{-i}, x_i)] \geq E_Q[M_i(x_{-i}, x_i')].$$

Definition 3: A mixed strategy Nash equilibrium (MSNE) is a special case of CE, where the joint probability is a product distribution of the marginals. Thus, $Q(X) = \prod_i Q_i(X_i)$ [20].

B. Variational Energy Functional

The VI casts the inference problem over an MRF as a convex optimization problem and approximates the posterior distribution much more efficiently than exact inference methods [23]. Let $G = (V, E)$ be an MRF, where $V$ and $E$ are the sets of vertices and edges, and $V$ consists of a set of discrete RVs $\{X_1, \ldots, X_n\}$.

Definition 4: The joint probability distribution over an MRF is often represented as a Gibbs distribution

$$p(X = x) = \frac{1}{Z} \prod_{c \in C} \phi_c(x_c), \tag{2}$$

where $\phi_c$ is a factor potential function associated with some clique $c \in C$ of $G$, $\phi_c : X^{|c|} \to \mathbb{R}^+$, and $Z$ is the partition function to normalize the distribution, where $Z = \sum_{x_c} \prod_{c \in C} \phi_c(x_c)$.

Remark 1: For any $\phi_c = \exp\{\varepsilon(x_c)\}$, $p(x)$ defines an exponential family distribution, where $\varepsilon : X^{|c|} \to \mathbb{R}$ is some function that maps the clique $c$ to a real number.

Let $Q$ and $P$ denote the approximating and the true posterior distributions in VI. We consider the I-projection of the Kullback-Leibler (KL) divergence between the two distributions,

$$D(Q \,\|\, P_C) = E_Q\left[\ln \frac{Q(X)}{P_C(X)}\right],$$

where $P_C$ is the probability distribution over the set of cliques $C$. From (2) and $H_Q(X) = -E_Q[\ln Q(X)]$,

$$D(Q \,\|\, P_C) = -H_Q(X) - E_Q\left[\sum_{c \in C} \ln \phi_c(x_c)\right] + E_Q[\ln Z],$$


$$D(Q \,\|\, P_C) = -F[\tilde{P}_C, Q] + \ln Z, \tag{3}$$

for $\tilde{P}_C(x) = \prod_{c \in C} \phi_c(x_c)$. We identify $F[\tilde{P}_C, Q]$ as the variational energy functional. Since $\ln Z$ does not depend on $Q$, (3) implies that maximizing $F$ gives the minimum KL divergence between the approximating and the true posteriors. In this work, we subsume the two auxiliary objectives into factor potentials associated with cliques of the underlying graphical model.
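The identity in (3) can be checked numerically on a small example. The sketch below is illustrative only: the three-variable chain, the potential tables, and the product-form $Q$ are made-up values, and none of the names come from the authors' implementation. It evaluates the KL divergence directly and compares it with $-F + \ln Z$, where $F$ is the entropy of $Q$ plus the expected log-potentials.

```python
import itertools
import math

# Toy MRF (an assumption for illustration): three binary variables on a chain
# X1 - X2 - X3, with one strictly positive potential table per edge clique.
cliques = {
    (0, 1): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
    (1, 2): {(0, 0): 1.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.5},
}
states = list(itertools.product([0, 1], repeat=3))


def unnormalized(x):
    """Product of clique potentials, i.e. the unnormalized measure in (3)."""
    value = 1.0
    for (i, j), table in cliques.items():
        value *= table[(x[i], x[j])]
    return value


# Partition function and the normalized Gibbs distribution of (2).
Z = sum(unnormalized(x) for x in states)
p = {x: unnormalized(x) / Z for x in states}

# An arbitrary product-form approximating distribution Q (hypothetical numbers).
marginals = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]
q = {x: math.prod(marginals[i][x[i]] for i in range(3)) for x in states}

# Variational energy functional: entropy of Q plus expected log-potentials.
entropy = -sum(q[x] * math.log(q[x]) for x in states)
expected_log_potential = sum(q[x] * math.log(unnormalized(x)) for x in states)
energy_functional = entropy + expected_log_potential

# KL divergence computed directly from its definition.
kl = sum(q[x] * math.log(q[x] / p[x]) for x in states)

print(kl, -energy_functional + math.log(Z))
```

Because the KL divergence is non-negative, the same computation also shows that $F$ never exceeds $\ln Z$, which is why maximizing $F$ over $Q$ is a well-posed surrogate for minimizing the divergence.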

IV. PROBLEM FORMULATION

To accommodate a wide range of applications, we consider providing wireless network coverage to a team of mobile UEs using a robot (UAV) team. Note that we only control the robots while the UEs move arbitrarily over the ROI. Our formulation can be further generalized to other coverage control problems such as surveillance, where the robot teams need to maintain the local connectivity while patrolling over an ROI. To accommodate the cooperative nature of the game, we assume that all the robots are interested in maximizing the coverage and thus share the same payoff function.

First, we define a concentration ellipsoid $\mathcal{R}$ to represent the ROI to capture the UE distribution, following [15]. Such an abstraction allows us to easily scale, and to quantify the total wireless coverage independent of the UE team size. Let $\Sigma$ define the covariance matrix associated with $\mathcal{R}$.

Considering the homogeneity among the robots, we populate the action space $A_i$ by discretizing the dynamically feasible control action space of a robot. The state equation of a robot $i$ can be written as $\dot{y}_i = A y_i + B x_i$, where $A, B$ are positive semi-definite matrices and $x_i \in A_i$. Let $\dot{y}_i^{pos} \in \mathbb{R}^3$ be the position of $i$ as extracted from the state vector $\dot{y}_i$.

We consider that each stage of the communication-aware coverage game corresponds to a timestep $t$ and is a graphical game $\Gamma_t$ defined on the local neighborhoods $N_i^t$, $\forall i$. The aim of our work is to find a consensus UAV formation for $\mathcal{R}$ in the form of an MSNE for $\Gamma_t$. In the following sections, we show that this can be achieved by leveraging VI and iterative stage game optimization.

V. APPROACH

A. Communication-Aware Cooperative Coverage

In practice, the received signal strength (RSS) between any two wireless nodes can attenuate for multiple reasons such as path loss, shadowing, and fading. Thus, in wireless networks it is common to model the channels in a stochastic fashion [24]. In this work, we employ a channel model for the IEEE 802.11a protocol to obtain the expected RSS in point-to-point communication. For a joint action set we define $f_{RSS} : A_i \times A_j \to \mathbb{R}$ as

$$f_{RSS}(x_i, x_j) = T_0 - \{L_0 + 10 n \log(d_{ij}) + F\}, \tag{4}$$

where $d_{ij} = \lVert \dot{y}_i^{pos} - \dot{y}_j^{pos} \rVert$, $f_{RSS}(d_{ij})$ is the RSS measured in dBm (decibels relative to one milliwatt), $T_0$ is the transmission power, $L_0$ is the reference power loss for free space, $n$ is a path loss exponent and $F$ is a zero mean Gaussian distribution to account for the fading effect. In this work, we used $L_0 = 46.67$ dBm calculated using the Friis model for open spaces. Similar real-world experiments have also been conducted in [24], which helped us to model the fading in the channel.

Fig. 2. (a) The change of RSS, (b) the number of communicating nodes (blue) and the average hop-count (red) against the physical distance. The inf hops have been ignored when the link disappears for the purpose of plotting. The parameters used are $n = 3$, $F = 32\ (\mathrm{dBm})^2$, $T_0 = 16.02$ dBm.

Fig. 2(a) shows the change in RSS against the inter-robot distance for the channel model. As in Fig. 2(b), we observed that the network is densely connected at the beginning, but separated drastically as the nodes moved further away. Specifically, the hop-count between the robots increased with distance, making it much harder to maintain fixed neighborhoods. Here the hop-count refers to the number of interim connections between two communicating nodes in the network. In practice, as the ROI expands, the robots need to travel farther in search of better coverage; however, these experimental results signify the need to adjust the neighborhoods as the network topology changes. Thus, we define the local neighborhood of any robot $i$ at time $t$, $N_i^t(k)$, as the list of $k$-hop nodes in the instantaneous network topology. Therefore, for any $j \neq i$, $N_i^t(k) = \{j \mid \mathrm{Hops}(i, j) \leq k\}$, where $\mathrm{Hops}(i, j)$ is the number of hops to node $j$ according to $i$'s routing table. With the definition of expectation, we obtain the expected RSS $E[f_{RSS}] = \psi_R(x_i, x_j)$ resulting from selecting the actions $x_i, x_j$ as

$$\psi_R(x_i, x_j) = T_0 - \{L_0 + 10 n \log(d_{ij})\}. \tag{5}$$

We define the function $\psi_C : A_i \to \mathbb{R}$ to quantify the coverage provided by the neighborhood $N_i$. We define the communication-aware coverage as the expected "cooperative RSS field" for the ellipsoidal $\mathcal{R}$. Therefore,

$$\psi_C(x_N, \mathcal{R}) = \sum_{r \in \mathcal{R}} \max\{\psi_R(x_i, r), \psi_R(x_{-i}, r)\}\, p(r), \tag{6}$$

where $p(r)$ is the probability of a point $r \in \mathcal{R}$ in the ROI. In our work, $p(r)$ denotes the probability of having a UE at $r$ under the distribution characterized by $\Sigma$. Here, $\psi_R(x_{-i}, r)$ denotes the coverage imparted on $r$ by any other neighbor of $i$. When the values assigned to the local neighborhood of $i$, $x_{-i}$, and the ROI are fixed, we observe that the coverage function only varies with $x_i$ within the time interval. Therefore, we denote the coverage function associated with $N_i$ as $\psi_C(x_i)$. This formulation allows us to construct a more realistic cooperative coverage field for the neighborhood, as UEs often select the wireless node with the highest signal strength to connect to in practice.

Note that for $r$ over large distances, $\psi_C(x_i)$ is stationary for some robot $j \in N_{-i}$ when $\psi_R(x_i, r) \leq \psi_R(x_j, r)$, as any UE at $r$ would connect to $j$ despite the actions of $i$ due to the higher signal strength. Thus, this introduces a partitioning in the local coverage functions.
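The three ingredients of this subsection, the expected RSS in (5), the $k$-hop neighborhood, and the cooperative coverage in (6), can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions: the logarithm is taken as base 10 (the usual log-distance path-loss convention), distances are clamped at 1 m to avoid $\log 0$, the routing table is replaced by a breadth-first search over a hypothetical adjacency dictionary `links`, and robot positions are passed in directly instead of being derived from actions through the dynamics.

```python
import math
from collections import deque

# Transmission power (dBm), reference loss, and path-loss exponent quoted in the text.
T0, L0, N_EXP = 16.02, 46.67, 3.0


def expected_rss(p, q, min_dist=1.0):
    """Expected RSS between points p and q, following the form of (5).

    Base-10 logarithm and the 1 m distance clamp are assumptions of this sketch.
    """
    d = max(math.dist(p, q), min_dist)
    return T0 - (L0 + 10.0 * N_EXP * math.log10(d))


def k_hop_neighbors(links, i, k):
    """Nodes j != i with Hops(i, j) <= k, found by BFS over an adjacency dict.

    `links` is a hypothetical stand-in for the information in i's routing table.
    """
    hops = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if hops[u] == k:
            continue  # do not expand beyond k hops
        for v in links.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return {j for j in hops if j != i}


def cooperative_coverage(positions, cells, cell_prob):
    """Sum over ROI cells of the strongest expected RSS times p(r), as in (6)."""
    return sum(
        max(expected_rss(pos, r) for pos in positions) * pr
        for r, pr in zip(cells, cell_prob)
    )


# Usage on a made-up line topology 0 - 1 - 2 - 3 and a two-cell ROI.
links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cells = [(0.0, 0.0), (100.0, 0.0)]
cell_prob = [0.5, 0.5]
print(k_hop_neighbors(links, 0, 2))
print(cooperative_coverage([(0.0, 0.0), (100.0, 0.0)], cells, cell_prob))
```

Taking the maximum per cell mirrors the remark that a UE connects to the node with the highest signal strength, so adding a neighbor can only keep or raise the coverage value of a cell.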


Fig. 3. (a). A visual representation of the local cooperative coverage function of $i$ for a single hop neighborhood. The partitions covered by each neighbor are colored differently. (b). A communication topology with the newly added auxiliary edges (red) to a neighborhood. The dashed polygon denotes the resulting neighborhood clique. The neighborhood robots and the communication links are shown in blue.

Fig. 3(a) shows a simple network topology for 4 robot nodes and a $k = 1$-hop neighborhood for the $i$-th node. Next, we use this coverage model to introduce the sufficient statistics for the MRF and the payoff functions.

B. Payoff Function for the Stage Graphical Game

Following the communication topology and the neighborhood parameter $k$, we define the graph $G_t = (V_t, E_t)$ for the stage game $\Gamma_t$. The set of vertices $V_t = \{X_1, \ldots, X_n\}$ comprises the random variables for each robot node. The set of edges $E_t$ contains an element $(i, j)$ if and only if $i$ and $j$ satisfy the neighborhood condition under $k$. Thus, $E_t = \{(i, j) \mid j \in N_i^t(k), \forall i\}$. Hereafter, we ignore the script $t$, as we are interested in finding the equilibrium for a single-stage game. Also, for a fixed $k$, let $N_i^t(k) = N_i^t$.

We consider that the payoff of a robot $i$ relies on the coverage provided by the neighborhood and its expected RSS with the neighbors. Formally, we define

$$M_i(x) = \alpha_a \psi_C(x_i) + \alpha_b \sum_{j \in N_{-i}} \psi_R(x_i, x_j), \tag{7}$$

where $\alpha_a, \alpha_b > 0$ are two predefined weight parameters. In this work, they scale the contributions of the two terms $\psi_C$ and $\psi_R$ in the payoff function proportionately. The first term forces the robot to move farther to maximize the coverage, and the second term penalizes the robot for selecting the actions that weaken the signal strength. We assume that the expected RSS remains roughly unchanged in close proximities over $\mathcal{R}$. Thus, considering the ROI size and inter-robot distances, we argue that the contradicting effects of the auxiliary objectives can be ignored for sufficiently small intervals of $t$. Therefore, a robot's best response action maximizes the coverage as well as the expected RSS with its neighbors. In this work, we propose solving the game $\Gamma$ by performing posterior inference over an appropriately tailored MRF. By using an exponential family distribution, we establish an analogy between the equilibrium in the game and the resulting joint distribution.

C. Exponential Family Posterior Distribution for MRF

We now discuss integrating the payoff function with a probabilistic graphical model defined on the game $\Gamma = (G, M)$. We start by modifying and factorizing $G$ into pairwise and neighborhood subgraphs to complement the communication-aware coverage model defined in (5) and (6). Let us first introduce an auxiliary edge $(j, h)$ between the nodes $j$ and $h$, if $j, h \in N_i$ and $(j, h) \notin E$ for all $i$. In other words, we convert every local neighborhood of $G$ into a complete subgraph by adding a set of auxiliary edges, $(j, h) \in E_{Aux}$. We denote the derived graph as $G' = (V, E')$, where $E' = E + E_{Aux}$. Fig. 3(b) shows a clique transformed neighborhood subgraph for an initial communication topology. We define a neighborhood clique as a clique that is comprised of the nodes of some neighborhood. Next, we factorize the derived graph into a set of cliques $C = E' \cup \{N_i \mid \forall i\}$, such that each clique $c \in C$ either represents an edge in $E'$ or a neighborhood clique. Finally, we associate each clique $c \in C$ with a factor potential function $\phi_c : X^{|c|} \to \mathbb{R}^+$. This formulation allows us to define a joint distribution over the induced MRF $G'$ using the exponential family similar to (2). Consider

$$\phi_c(x_c) = \begin{cases} \phi_i(x_i) = \exp\{\alpha_i \psi_C\}, & c \in \{N_i \mid \forall i\}, \\ \phi_{ij}(x_i, x_j) = \exp\{\alpha_{ij} \psi_R\}, & c \in E, \end{cases} \tag{8}$$

where $\phi_i, \phi_{ij}$ simply redefine the factor potentials given the clique $c$'s type and $\alpha_i, \alpha_{ij} \geq 0$ are some weights associated with the cliques. With this definition, we consider the following joint probability for the MRF $G'$.

$$p(x) = \frac{1}{Z} \exp\Big\{ \sum_{i \in V} \alpha_a \psi_C(x_i) + \sum_{(ij) \in E} \alpha_b \psi_R(x_i, x_j) \Big\}. \tag{9}$$

Theorem 1: The probability over the MRF $G'$, $p(x)$, defines a linear exponential family with canonical parameters $\alpha$ and sufficient statistics $\psi$.

Proof: Define two vectors $\psi$ and $\alpha$ that contain the factor potentials and associated weights for the cliques $C$ of $G'$. Set each $\alpha_c$ associated with the clique $c$: $\alpha_c = \alpha_a$ for neighborhood cliques, $\alpha_c = \alpha_b$ for pairwise cliques, and $\alpha_c = 0$ for auxiliary pairwise cliques. Thus, the summations in (9) become $p(x) = \exp\{\langle \alpha, \psi \rangle - \Lambda\}$, where $\langle \alpha, \psi \rangle$ is the inner product of $\alpha$ and $\psi$, and $\Lambda = \ln Z$. Following Remark 1, this defines a linear exponential family. $\blacksquare$

With Remark 1 and Theorem 1, we observe that the posterior $p(x)$ takes the standard form of a joint distribution over an MRF. Given $p(x)$ and the payoff $M_i$, we further notice that the two functions share a mutual form. Specifically, the summation terms that pertain to a single robot $i$ inside the exponential of the former resemble the payoff function. In the following section, we perform posterior inference over the MRF $G'$ and establish that the resulting probability distribution $Q(x)$ yields a consensus formation of the stage game $\Gamma$.

D. Stage Game Optimization With MFVI

In Theorem 1 we established that the joint posterior $p(x)$ adheres to the form in (2), and in general, $\varepsilon(x_c) = \psi_c(x_c)$ for all $c \in C$. Thus, we start by considering the variational energy functional $F$ associated with (9). It is clear from (3) that maximizing $F$ results in the minimum KL divergence between the true and the approximating posteriors. Here we use mean-field VI (MFVI) to optimize $F$ and to obtain the approximating posterior.


Specifically, the mean-field assumption considers the posterior distribution as a product of marginals, each corresponding to a single RV. Therefore, the posterior consists of a set of independently and identically distributed (iid) exponential marginals characterized by their means [25]. This assumption improves the tractability of the inference procedure by restricting the search space to the mean-field family instead of the entire space of distributions. Using the mean-field assumption, we formally define the optimization problem as

$$\begin{aligned} \text{Find} \quad & \{Q_i(X_i)\}, \\ \text{maximizing} \quad & F[\tilde{P}_C, Q], \\ \text{subject to} \quad & Q(X) = \prod_i Q_i(X_i), \\ & \sum_{x_i} Q_i(x_i) = 1, \quad \forall i = 1, \ldots, n, \end{aligned}$$

where $Q_i(x_i)$ denotes the marginal probability distribution of $i$ in the posterior. The goal of the MFVI is to find an update rule for each marginal distribution while keeping the neighboring RVs fixed. From (3), the variational energy functional takes the form

$$F[\tilde{P}_C, Q] = H_Q(X) + E_Q\Big[\sum_{c \in C} \ln \phi_c(x_c)\Big].$$

We consider the variational energy imparted on $i$ from its neighborhood $N_i$ under the joint $Q$ explicitly,

$$F_i = H_Q(X_i) + E_Q\Big[\sum_{j \in N_{-i}} \ln \phi_{ij}(x_i, x_j) + \ln \phi_i(x_i)\Big]. \tag{11}$$

In order to maximize $F_i$, we write the Lagrangian $L_i$ with the Lagrange multiplier $\lambda$ as

$$L_i[Q] = H_Q(X_i) + \sum_{c \in C} E_Q[\ln \phi_c] + \lambda \Big(\sum_{x_i} Q_i(x_i) - 1\Big).$$

Next, we differentiate the Lagrangian $L_i$ w.r.t. the marginal $Q_i(x_i)$ and obtain the fixed points corresponding to the maximum.

$$\frac{d}{dQ(x_i)} L_i[Q] = -\ln Q(x_i) - 1 + \sum_{c \in C} E_Q[\ln \phi_c \mid x_i] + \lambda. \tag{12}$$

During the differentiation step we used the facts that the derivatives of the entropy $H_Q(X_i)$ and expectation $E_Q[\ln \phi_c]$ terms w.r.t. $Q(x_i)$ are $-\ln Q(x_i) - 1$ and $E_Q[\ln \phi_c \mid x_i]$ respectively. Here $E_Q[\ln \phi_c \mid x_i]$ denotes the conditional expectation of the factor potential $\phi_c$ given the value $x_i$. Notice that the factor function $\phi_c$ changes according to (8) given the clique type. Setting the derivative to 0 and taking the exponentials of both sides yields the update rule

$$Q_i(x_i) = \frac{1}{Z_i} \exp\Big\{ E_Q\Big[\sum_{j \in N_{-i}} \ln \phi_{ij} \mid x_i + \ln \phi_i\Big] \Big\}, \tag{13}$$

where $Z_i$ is a typical exponential family normalization constant, as introduced in Definition 4, and the Lagrange multiplier $\lambda$ gets dropped in the normalization. The resulting update rule is also known as the coordinate ascent mean-field approximation. Further, the redefinition of $\phi_c$ in (8) ensures that $\ln \phi_c(x_c)$ exists and reflects the coverage model.

The stage game optimization algorithm summarizes the key steps of MFVI into an iterative procedure to solve the graphical game $\Gamma = \langle G, M \rangle$ by performing posterior inference on the MRF $G'$. Specifically, we optimize each RV of the MRF to calculate the marginal probabilities $Q_i(X_i)$ while fixing the neighboring RVs. In the graphical game theoretical paradigm, this is equivalent to calculating the mixed strategy profile of each player given the neighborhood. In the next section, we show that a resulting posterior probability distribution $Q$ induces an equilibrium in the graphical game $\Gamma$.

E. Equilibrium in the Stage Game

Consider the modified stage game $\Gamma = \langle G', M \rangle$ where $M_i \in M$. In this section, we discuss the interplay between the equilibrium solutions of $\Gamma$ and the posterior probability of the factorized MRF induced by $G'$.

Theorem 2: The joint posterior distribution $Q^*(x)$ over the MRF induced by $G'$ results in a CE for the stage game.

Proof: According to (11) and (12), each marginal $Q_i(X_i)$ that comprises $Q^*(X)$ defines a local maximum of the energy functional $F_i$. Therefore, for some actions $x_i, x_i' \in A_i$, under $Q^*(X)$ the marginals satisfy $Q_i(x_i) \geq Q_i(x_i')$, and thus,

$$\ln Q_i(x_i) \geq \ln Q_i(x_i'). \tag{14}$$

Now consider an action profile $x = (x_i, x_{-i})$ where $x_{-i}$ represents the actions of $N_{-i}$ and $x_j \in x_{-i}$ for some $j \in N_{-i}$. From (13),

$$\ln Q_i(x_i) \propto E_{Q^*}\Big[\sum_{j \in N_{-i}} \ln \phi_{ij}(x_i, x_j) + \ln \phi_i(x_i)\Big].$$


Substituting from the definition (8) for $\phi_i, \phi_{ij}$ gives

$$\ln Q_i(x_i) \propto E_{Q^*}\Big[\sum_{j \in N_{-i}} \alpha_b \psi_R(x_i, x_j) + \alpha_a \psi_C(x_i)\Big].$$

Notice that the summations inside the expectation match the definition of a player's payoff function (7). Therefore, substituting from the payoff function,

$$\ln Q_i(x_i) \propto E_{Q^*}[M_i(x_i, x_{-i})]. \tag{15}$$

From (14) and (15),

$$E_{Q^*}[M_i(x_i, x_{-i})] \geq E_{Q^*}[M_i(x_i', x_{-i})].$$

Thus, according to the definition, $Q^*(X)$ yields a CE of the stage game $\Gamma$. $\blacksquare$

According to Definition 3, we observe that the CE resulting from MFVI is indeed an MSNE, due to the product form of $Q^*(X)$. Therefore, we argue that the posterior inference over the MRF $G'$ results in a consensus formation for the stage game $\Gamma$. However, recall that the stage game is sub-terminal as the calculated mixed-strategy actions are the controls of the robots' dynamical model. Therefore, we solve the stage game iteratively to obtain continuous trajectories and, subsequently, a consensus formation for the communication-aware coverage game. The distributed configuration of our approach reduces the stage game graph into the neighborhood clique of the derived graph $G'$ for any robot, rendering the distributed setting much more desirable for controlling large-scale swarms.

VI. EXPERIMENTS AND RESULTS

A. Experiments Setup

We evaluated the proposed approach using NS-3 and Robot Operating System (ROS) environments in a mobile wireless network scenario that consists of UAVs and a mobile UE team.1 Briefly, NS-3 is a widely employed event simulator to design and implement network models, and we delegate the task of network packet routing to NS-3 by accounting for wireless signal attenuation as the nodes move. Each UAV was equipped with two wireless interfaces that complied with the channel model to establish the 1) inter-UAV, and 2) UAV-UE links. The action set of a UAV consists of acceleration-based discretized control inputs, and we select the actions by optimizing the stage games iteratively to reach a consensus formation. The UAVs communicate with their neighbors over the inter-UAV network for optimizing the game. The UAV-UE links were only used to measure the coverage RSS of the UEs for evaluation purposes.

Further, to calculate the routing paths as the UAV and UE nodes move, we used NS-3's inbuilt Optimized Link State Routing (OLSR) algorithm [27]. The OLSR algorithm's ability to populate the routing tables by accounting for the communication uncertainties helps us update the local neighborhoods in real-time. The ROS and NS-3 platforms communicate through Remote Procedure Calls (RPC) to update the game accordingly. For all the UAVs, we ran the stage game optimization algorithm at a modest frequency $1/t = 1$ Hz in order to account for communication delays. We further assume that the UAVs can observe the ROI abstraction, characterized by its mean and the covariance matrix $\Sigma$, by communicating with a base station. We considered a differentially flat dynamical model for fixed altitude navigation for the UAVs. Thus, we uniformly sampled the acceleration space of a UAV within the bounds $[-3\ \mathrm{m/s^2}, 3\ \mathrm{m/s^2}]$ to populate $A_i \subset \mathbb{R}^2$, $\forall i$, along each of the X and Y dimensions, similar to our previous work [21]. For computational feasibility, we discretized the ROI $\mathcal{R}$ into 10 m × 10 m cells. All the implementations and experiments are conducted using the C++ programming language.

1 Find the Mavswarm simulator we used for this work at: https://github.com/malintha/multi_uav_simulator.

Fig. 4. (a). Change of payoff according to (7) as the UAV swarm navigates over a stationary ROI. The values are plotted by changing the swarm size and the communication hop-count. (b). Computational time to optimize a single stage of the game against the number of neighbors $|N_i|$ and the size of the ROI. Each cell represents an area of $100\ \mathrm{m}^2$.

B. Experiments

We evaluated the coverage and computational performance of the proposed approach against the number of UAVs and the size of the ROI. We accommodated the latter scenario by allowing the UE team to travel between arbitrarily chosen start and goal locations, causing the ROI's shape and size to change over time (Fig. 5). We observed that for small ROIs, the effect of $\alpha_b$ is minimal, so the cooperative coverage governs the payoff. This reduces the tuning effort of the proposed method to a great extent. Throughout the experiments we used the parameters $\alpha_a = 1$ and $\alpha_b = 0.001$ to cater to varying ROI sizes.

We first computed the average payoff of a UAV for the stationary ROI case by varying the maximum hop-count and the number of UAV nodes in the swarm as in Fig. 4(a). Even though the single-hop neighborhoods' payoffs overlapped with those of higher order at close proximities, as the UAVs travel farther, the stability of the formations and the payoffs deteriorated. This is mainly caused by the densely connected network topology at close proximities, which permits the UAVs to observe the global swarm state with fewer hops. Therefore, it is understandable that as the network separates, the cooperative coverage deviates from the global value due to the increased locality. By increasing the allowed maximum hops, we show that our approach can result in better payoffs and stable UAV formations without explicitly aggregating the global swarm state. Thanks to the adaptive neighborhood property that complements the network dynamics, our approach scales well with the size of the swarm and the ROI.

For the performance evaluations, we used $k = 3$ as the allowed communication hops throughout the experiments. We evaluated computational time for two crucial steps in the game:


1) calculating the cooperative coverage and 2) mean-field approximation stages. Specifically, we performed the calculations against the number of the neighbors in a game |Ni|, and the size of the discretized ROI. In the implementation, we calculated the fixed values for any r ∈ R in the cooperative coverage function ψC beforehand to eliminate redundant calculations and max(.) comparisons within a single stage of the game. We observed that this improved the computational efficiency substantially, with a single stage of the game optimized in under 50 ms in most cases. Fig. 4(b) shows the computational time for the two corresponding steps.

Fig. 5 shows the robot trajectories for the n = 3, 5, and 8 UAV scenarios. We defined the ROI using the concentration ellipsoid for 5 UEs distributed around the origin. In each scenario, we simulated the system with stationary and moving ROIs for 40 s and 80 s intervals, respectively. We initialized the UAVs from the origin for all the experiments with a fixed altitude, keeping the UEs stationary at first. Fig. 5(a)–(c) show the equilibrium UAV formations and the communication topologies. Similarly, Fig. 5(d)–(i) show the UAV and UE trajectories for the moving ROI experiments. The UEs travel between the start and arbitrarily chosen goal locations, morphing the ROI. The blue squares and dots represent the start and current locations of the UEs. With the stage game optimization, the UAV swarm repeatedly moves to maximize the coverage as the ROI changes. As the UE team members reach their destinations, the UAV swarm converges to the equilibrium formations as shown in Fig. 5(g), (h), and (i). For clarity, only the most recent trajectory trails of the UAVs are shown.

Fig. 5. Trajectories of robot nodes with time. The top row shows the equilibrium communication topology for a stationary ROI and the initial UE positions (blue). (d)–(i) represent the UAV (red) and UE (blue) trajectories as the UE nodes move to their designated goal positions. The green ellipsoidal fence represents the concentration ellipsoid for the ROI R. Note that the robot models are not to scale, for visualization purposes.

We repeated each experiment for 20 trials using NS-3 to perform network events such as attenuating wireless signals, routing, and calculating RSS for each communication link. In Fig. 6(a) and (b), we present the average RSS as observed by a UE for the experiments. Throughout the convergence process, we observed that the UAVs constantly increased their cooperative payoff, leading to higher RSS in the UEs. We observed significant improvements in the average RSS as the UAVs reached the equilibrium formations under both scenarios. Also, it is worth noting that the decibel is a logarithmic unit, and an increase of about 3 dB represents a doubling of the observed signal power. The global connectivity of the network was maintained throughout the experiments, as the OLSR algorithm was able to find routes between the network nodes successfully. The final communication topologies for the moving ROI experiments were observed to be more densely connected than the stationary scenarios due to the relatively small ROI size.

Fig. 6. Change of the average RSS of UE nodes against time in stationary (a) and moving (b) scenarios for different UAV swarm sizes, using the CoCo model. For clarity, only the error for the n = 8 scenario is plotted throughout the experiments.

We compare the network and optimization aspects of our framework to those of the widely used disk-based model and the decentralized, adaptive coverage method presented in [8]. Following our initial findings presented in Fig. 4(b), we selected the average single-hop distance ≈ 60 m as the radius for the disk model under similar environmental constraints. We report the average observed RSS of a UE in Fig. 7(a) and (b) for the same configurations. The disk-based models resulted in RSS gains initially. However, as the UAVs moved beyond each other's disk radii, the algorithm failed to reach a consensus formation. Even though increasing the communication radii seems like a trivial solution, we emphasize that it can risk losing the global connectivity altogether due to the highly stochastic nature of the wireless signals at large distances.

The decentralized, adaptive coverage control [8] follows a similar approach to partition the environment based on the observed sensor gain for each robot. For the comparison, we implemented the coverage as a bivariate Gaussian function defined at the center of R with covariance Σ. The work, however, limits itself to static coverage functions and fails to handle the scenarios with dynamic ROIs and stochastic communication links. In contrast to our unified framework, it also overlooks the robots' dynamics, so the calculated paths had to be further smoothed with trajectory optimization before deploying on the UAVs. Fig. 7(c) shows the resulting average RSS at a UE using the decentralized, adaptive coverage control. Even though the approach converged to the consensus faster, it failed to yield


higher RSS values in practice due to rigid connectivity and the overlooked dynamics.

Fig. 7. Comparisons: Average RSS of UE nodes in stationary (a) and moving ROI (b) scenarios for different UAV swarm sizes using a fixed-radii model. (c) Average RSS plots for the adaptive coverage method in the stationary scenario.

VII. CONCLUSION

We have proposed a novel game-theoretic swarm coordination framework to achieve communication-aware coverage for robot swarms by only utilizing the local information. Our work complements the underlying network and robot dynamic models in neighborhood selection and control, resulting in a robust coverage scheme for large-scale ROIs. We have evaluated our approach in an ad-hoc mobile wireless network scenario to show that it can result in significant coverage gains while maintaining the local connectivity. We have further established that our approach achieves real-time control and consensus for networked robot teams.

REFERENCES

[1] M. Mozaffari, W. Saad, M. Bennis, Y.-H. Nam, and M. Debbah, “A tutorial on UAVs for wireless networks: Applications, challenges, and open problems,” IEEE Commun. Surv. Tut., vol. 21, no. 3, pp. 2334–2360, Jul./Sep. 2019.
[2] V. Sharma, M. Bennis, and R. Kumar, “UAV-assisted heterogeneous networks for capacity enhancement,” IEEE Commun. Lett., vol. 20, no. 6, pp. 1207–1210, Jun. 2016.
[3] G. Skorobogatov, C. Barrado, and E. Salamí, “Multiple UAV systems: A survey,” Unmanned Syst., vol. 8, no. 02, pp. 149–169, 2020.
[4] M. J. Kearns, M. L. Littman, and S. Singh, “Graphical models for game theory,” in Proc. 17th Conf. Uncertainty Artif. Intell., San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001, pp. 253–260.
[5] H. Zhu, J. Juhl, L. Ferranti, and J. Alonso-Mora, “Distributed multi-robot formation splitting and merging in dynamic environments,” in Proc. Int. Conf. Robot. Automat., 2019, pp. 9080–9086.
[6] B. Şenbaşlar, W. Hönig, and N. Ayanian, “Robust trajectory execution for multi-robot teams using distributed real-time replanning,” in Distributed Autonomous Robotic Systems. Berlin, Germany: Springer, 2019, pp. 167–181.
[7] W. Li and C. G. Cassandras, “Distributed cooperative coverage control of sensor networks,” in Proc. 44th IEEE Conf. Decis. Control, 2005, pp. 2542–2547.
[8] M. Schwager, D. Rus, and J.-J. Slotine, “Decentralized, adaptive coverage control for networked robots,” Int. J. Robot. Res., vol. 28, no. 3, pp. 357–375, 2009.
[9] M. Ji and M. Egerstedt, “Distributed coordination control of multiagent systems while preserving connectedness,” IEEE Trans. Robot., vol. 23, no. 4, pp. 693–703, Aug. 2007.
[10] G. Notarstefano, K. Savla, F. Bullo, and A. Jadbabaie, “Maintaining limited-range connectivity among second-order agents,” in Proc. Amer. Control Conf., 2006, p. 6.
[11] P. Yang, R. A. Freeman, G. J. Gordon, K. M. Lynch, S. S. Srinivasa, and R. Sukthankar, “Decentralized estimation and control of graph connectivity for mobile sensor networks,” Automatica, vol. 46, no. 2, pp. 390–396, 2010.
[12] E. Paraskevas, D. Maity, and J. S. Baras, “Distributed energy-aware mobile sensor coverage: A game theoretic approach,” in Proc. Amer. Control Conf., 2016, pp. 6259–6264.
[13] J.-M. Etancelin, A. Fabbri, F. Guinand, and M. Rosalie, “DACYCLEM: A decentralized algorithm for maximizing coverage and lifetime in a mobile wireless sensor network,” Ad Hoc Netw., vol. 87, pp. 174–187, 2019.
[14] Y. Kantaros and M. M. Zavlanos, “Distributed communication-aware coverage control by mobile sensor networks,” Automatica, vol. 63, pp. 209–220, 2016.
[15] C. Belta and V. Kumar, “Abstraction and control for groups of robots,” IEEE Trans. Robot., vol. 20, no. 5, pp. 865–875, Oct. 2004.
[16] D. Mox, M. Calvo-Fullana, M. Gerasimenko, J. Fink, V. Kumar, and A. Ribeiro, “Mobile wireless network infrastructure on demand,” in Proc. IEEE Int. Conf. Robot. Automat., 2020, pp. 7726–7732.
[17] Y. Yan and Y. Mostofi, “Robotic router formation in realistic communication environments,” IEEE Trans. Robot., vol. 28, no. 4, pp. 810–827, Aug. 2012.
[18] C. Daskalakis and C. H. Papadimitriou, “Computing pure Nash equilibria in graphical games via Markov random fields,” in Proc. 7th ACM Conf. Electron. Commerce, 2006, pp. 91–99.
[19] S. Kakade, M. Kearns, J. Langford, and L. Ortiz, “Correlated equilibria in graphical games,” in Proc. 4th ACM Conf. Electron. Commerce, 2003, pp. 42–47.
[20] L. E. Ortiz, B. Wang, and Z. Gong, “Correlated equilibria for approximate variational inference in MRFs,” in Proc. Int. Conf. Probabilistic Graphical Models, 2020, pp. 329–340.
[21] M. Fernando, “Online flocking control of UAVs with mean-field approximation,” in Proc. IEEE Int. Conf. Robot. Automat., 2021, pp. 8977–8983.
[22] X. Ai, V. Srinivasan, and C.-K. Tham, “Optimality and complexity of pure Nash equilibria in the coverage game,” IEEE J. Sel. Areas Commun., vol. 26, no. 7, pp. 1170–1182, Sep. 2008.
[23] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, USA: MIT Press, 2009.
[24] J. Fink, A. Ribeiro, and V. Kumar, “Robust control for mobility and wireless communication in cyber-physical systems with application to robot teams,” Proc. IEEE, vol. 100, no. 1, pp. 164–178, Jan. 2012.
[25] M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference. Boston, MA, USA: Now, 2008.
[26] T. Clausen and P. Jacquet, “RFC3626: Optimized link state routing protocol (OLSR),” USA: RFC Editor, 2003.
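
Illustrative note on k-hop neighborhoods. The experiments limit each UAV's game to neighbors within an allowed number of communication hops (k = 3 in the performance evaluations). The Python sketch below shows one way such a neighborhood could be computed with a breadth-first search. It is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency dictionary, the function name `k_hop_neighbors`, and the 5-node chain topology are all hypothetical, and in the paper the neighborhoods come from OLSR routing tables maintained by NS-3 rather than from a hand-built graph.

```python
from collections import deque


def k_hop_neighbors(adjacency, source, k):
    """Return the set of nodes reachable from `source` in at most `k` hops.

    `adjacency` is a hypothetical dict mapping a node id to an iterable of
    directly connected node ids. The source itself is excluded.
    """
    hops = {source: 0}          # hop count at which each node was first reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue            # do not expand past the hop limit
        for peer in adjacency.get(node, ()):
            if peer not in hops:
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return {n for n in hops if n != source}


# Hypothetical 5-UAV chain topology: 0 - 1 - 2 - 3 - 4
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

print(sorted(k_hop_neighbors(chain, 0, 1)))  # single-hop neighborhood of node 0
print(sorted(k_hop_neighbors(chain, 0, 3)))  # three-hop neighborhood of node 0
```

In this toy chain, node 0 sees only node 1 with k = 1 and also reaches nodes 2 and 3 with k = 3, which mirrors the observation that a larger hop limit lets a UAV observe more of the swarm state once the network spreads out.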
