
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 33, NO. 11, NOVEMBER 2015

Distributed Estimation With Information-Seeking Control in Agent Networks

Florian Meyer, Student Member, IEEE, Henk Wymeersch, Member, IEEE, Markus Fröhle, Student Member, IEEE, and Franz Hlawatsch, Fellow, IEEE

Abstract—We introduce a distributed cooperative framework and method for Bayesian estimation and control in decentralized agent networks. Our framework combines joint estimation of time-varying global and local states with information-seeking control optimizing the behavior of the agents. It is suited to nonlinear and non-Gaussian problems and, in particular, to location-aware networks. For cooperative estimation, a combination of belief propagation message passing and consensus is used. For cooperative control, the negative posterior joint entropy of all states is maximized via a gradient ascent. The estimation layer provides the control layer with probabilistic information in the form of sample representations of probability distributions. Simulation results demonstrate intelligent behavior of the agents and excellent estimation performance for a simultaneous self-localization and target tracking problem. In a cooperative localization scenario with only one anchor, mobile agents can localize themselves after a short time with an accuracy that is higher than the accuracy of the performed distance measurements.

Index Terms—Agent networks, distributed estimation, distributed control, information-seeking control, distributed target tracking, cooperative localization, belief propagation, message passing, consensus, sensor networks, sequential estimation.

Manuscript received August 15, 2014; revised November 27, 2014; accepted March 18, 2015. Date of publication May 6, 2015; date of current version October 19, 2015. This paper was presented in part at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, April 2015. This work was supported in part by the Austrian Science Fund (FWF) under Grants S10603 and P27370, and by the European Commission under ERC Grant 258418 (COOPNET), the EU FP7 Marie Curie Initial Training Network MULTI-POS (Grant 316528), and the Newcom# Network of Excellence in Wireless Communications. F. Meyer and F. Hlawatsch are with the Institute of Telecommunications, Vienna University of Technology, Vienna 1040, Austria (e-mail: florian.meyer@nt.tuwien.ac.at; franz.hlawatsch@nt.tuwien.ac.at). H. Wymeersch and M. Fröhle are with the Department of Signals and Systems, Chalmers University of Technology, Gothenburg 412 96, Sweden (e-mail: henkw@chalmers.se; frohle@chalmers.se). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSAC.2015.2430519. 0733-8716 © 2015 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

A. Motivation and State of the Art

Recent research on distributed estimation and control in mobile agent networks [1]–[4] has frequently been motivated by location-aware scenarios and problems including environmental and agricultural monitoring [5], healthcare monitoring [6], target tracking [7], pollution source localization [8], chemical plume tracking [3], and surveillance [9]. The agents in a mobile agent network are generally equipped with sensors, wireless communication interfaces, a processing unit, and actuators, all together forming a cyber-physical system [10], [11] with a tight coupling between sensing, computing, communication, and control. A common task in mobile agent networks is seeking information, either about external phenomena or about the network itself. This task relies on estimation (quantifying, fusing, and disseminating information) and control (configuring the network to increase information). A common theme in previous works is the reliance on position information for estimation and/or control.

Estimation methods for mobile agent networks (our focus will be on distributed Bayesian estimation) address estimation of common global states [7], [12]–[16], estimation of local states [17]–[22], or combined estimation of local and global states [23]–[25]. In the first case, the agents obtain local measurements with respect to external objects or the surrounding environment, which are fused across the network. Global fusion methods that require only local communication include consensus [26] and gossip [27]. Example applications are distributed target tracking [7], cooperative exploration of the environment [15], and chemical plume tracking [3]. In the second case (estimation of local states), the agents cooperate such that each agent is better able to estimate its own local state. Here, the dimensionality of the total state grows with the network size, which leads to more complex factorizations of the joint posterior probability density function (pdf). When the factor graph [28] of this factorization matches the network topology, efficient message passing methods for distributed inference can be used, such as the belief propagation (BP) [28] and mean field [29] methods. Example applications are cooperative localization [17], synchronization [19], [22], and simultaneous localization and synchronization [20], [21]. In the third case (estimation of both global states and local states), a message passing algorithm can be combined with a networkwide information dissemination technique. An example application is cooperative simultaneous self-localization and target tracking [23], [25].

In many cooperative estimation scenarios, it is advantageous to control certain properties of the agent network, such as the agent positions or the measurement characteristics ("controlled sensing") [1]–[9]. In particular, here we will address the problem of combining distributed estimation and distributed control in mobile agent networks. We will limit our discussion to information-seeking control, which seeks to maximize the information carried by the measurements of all agents about the global and/or local states to be estimated. The use of information measures for the control of a single agent or a network of agents was introduced in [30] and [31], respectively.


Suitable measures of information include negative posterior entropy [32], mutual information [32], and scalar-valued functions of the Fisher information matrix [33]. In particular, the determinant, trace, and spectral norm of the Fisher information matrix were considered in [34], where the control objective is to maximize the information related to the positions of the agents and of a target. The maximization of negative posterior entropy was considered in [12]–[16]. In [14], a central controller steers agents with known positions along the gradient of negative posterior entropy to optimally sense a global state. A distributed solution for global state estimation was proposed in [12], [13] based on a pairwise neighboring-agents approximation of mutual information and in [15], [16] by using a consensus algorithm. However, the methods proposed in [12]–[16] did not use BP, did not allow for multiple time-varying states, and did not include estimation of local (controlled) states.

Fig. 1. Agent network with CAs and targets. The neighborhood sets A_l^(n), C_l^(n), T_l^(n), and C_m^(n) for one CA l ∈ C and one target m ∈ T are also shown.

B. Contribution and Paper Organization

Here, we present a unified Bayesian framework and method for (i) distributed, cooperative joint estimation of time-varying global and local states and (ii) distributed, cooperative information-seeking control. Our framework and method are suited to nonlinear and non-Gaussian problems, they require only communication with neighboring agents, and they are able to cope with a changing network topology. Thereby, they are particularly suited to localization and tracking tasks in location-aware scenarios involving mobile networks and nonlinear models.

For distributed estimation, following [24], [25], we combine BP message passing, consensus, and sample-based representations of the involved probability distributions. For distributed control, we define a global (holistic) objective function as the negative joint posterior entropy of all states in the network at the next time step conditioned on all measurements at the next time step. This objective function is optimized jointly by all agents via a gradient ascent. This reduces to the evaluation of local gradients at each agent, which is performed by using Monte Carlo integration based on the sample representations provided by the estimation stage and a distributed evaluation of the joint (networkwide) likelihood function. Our method advances beyond [12]–[16] in the following respects:

• It constitutes a more general information-maximizing control framework based on BP for estimation problems involving multiple time-varying states.
• It includes estimation of the local (controlled) states of the agents, thus enabling its use in a wider range of applications.

Contrary to [24] and [25], which introduce the distributed joint estimation of time-varying local and global states in agent networks, here we focus on the information-maximizing control of the agents. Our main contribution is a derivation and sample-based formulation of a new information-seeking controller that maximizes the negative joint posterior entropy of time-varying local and global states. Compared to the information-seeking controller proposed in [12]–[16], where maximization of the negative posterior entropy reduces to maximization of the mutual information between observations and states, our controller includes an additional term that arises because the posterior entropy involves also the local (controlled) states of the agents. Due to this more general formulation, our controller is suited to decentralized estimation tasks where the agents cooperatively infer also their own states.

This paper is organized as follows. In Section II, the system model is described and the joint estimation and control problem is formulated. Section III reviews joint local and global state estimation [24], [25]. In Section IV, we introduce the proposed gradient-based controller. The distributed computation of the gradient is discussed in Sections V–VII. Section VIII considers two special cases of the joint estimation and control framework. Finally, in Section IX, we present simulation results demonstrating the performance of our method for a simultaneous self-localization and target tracking problem.

II. SYSTEM MODEL AND PROBLEM FORMULATION

We consider a network of mobile agents k ∈ A as shown in Fig. 1. The set of all agents, A, consists of the set of cooperative agents (CAs), C ⊆ A, and the set of "targets," T = A \ C. Here, a target may be anything that does not cooperate and cannot be controlled, such as a noncooperative agent or a relevant feature of the environment. We will typically use the indices k ∈ A, l ∈ C, and m ∈ T to denote a generic agent, a CA, and a target, respectively. A block diagram of the overall "signal processing system" is shown in Fig. 2 for a CA (including the estimation and control layers of the proposed method) and for a target. This system is described below.

A. Agent States and Sensor Measurements

The state of agent k ∈ A at discrete time n ∈ {0, 1, . . .} is denoted by the vector x_k^(n). For example, in a localization scenario, x_k^(n) may consist of the current position and motion-related quantities such as velocity, acceleration, and angular velocity [35]. The states evolve according to

  x_l^(n) = g_l(x_l^(n−1), u_l^(n), q_l^(n)), l ∈ C  (1)

or

  x_m^(n) = g_m(x_m^(n−1), q_m^(n)), m ∈ T,  (2)

where g_k(·) is a possibly nonlinear function, q_k^(n) is process (driving) noise, and u_l^(n) ∈ U_l is a deterministic control vector that controls the lth CA. Since u_l^(n) is deterministic [36, Chap. 5], it is either completely unknown (before it is determined) or perfectly known (after being determined by the control layer). Note that also targets may have control variables. However, as these are hidden from the CAs, we will subsume any control for target m in the noise q_m^(n). For the derivation of the controller, we assume that for l ∈ C, g_l(x_l^(n−1), u_l^(n), q_l^(n)) is bijective with respect to x_l^(n−1) and differentiable with respect to u_l^(n). The statistical relation between x_k^(n−1) and x_k^(n) as defined by (1) or (2) can also be described by the state-transition pdf f(x_l^(n) | x_l^(n−1); u_l^(n)) for l ∈ C and f(x_m^(n) | x_m^(n−1)) for m ∈ T.
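As a concrete instance of the state-evolution models (1) and (2), the following sketch uses a discretized constant-velocity model in which the control u_l^(n) acts as a commanded acceleration. This specific model is an illustrative assumption, not necessarily the one used in the paper's simulations.

```python
import numpy as np

def g_ca(x_prev, u, q, dt=1.0):
    """State evolution (1) for a CA with state x = [px, py, vx, vy].

    The control u (commanded acceleration) and the process noise q
    enter through the usual double-integrator input matrix.
    Illustrative model only.
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    B = np.array([[0.5 * dt**2, 0],
                  [0, 0.5 * dt**2],
                  [dt, 0],
                  [0, dt]], dtype=float)
    return F @ x_prev + B @ (u + q)

def g_target(x_prev, q, dt=1.0):
    """State evolution (2) for a target: same dynamics, no control."""
    return g_ca(x_prev, np.zeros(2), q, dt)

x = np.array([0.0, 0.0, 1.0, 0.0])        # at the origin, moving in +x
x_next = g_ca(x, u=np.array([0.0, 1.0]), q=np.zeros(2))
```

Note that this g_l is affine and invertible in x_prev and differentiable in u, matching the bijectivity and differentiability assumptions stated above.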
The measurement and communication topology of the network is described by the neighborhood sets C_l^(n), T_l^(n), and A_l^(n) as follows. CA l acquires a measurement y_{l,l′}^(n) relative to CA l′ if l′ ∈ C_l^(n). This relation is symmetric, i.e., l′ ∈ C_l^(n) implies l ∈ C_{l′}^(n). It is assumed that CAs that acquire measurements relative to each other are able to communicate, i.e., to transmit data via a communication link. Furthermore, CA l ∈ C acquires a measurement y_{l,m}^(n) relative to target m if m ∈ T_l^(n) ⊆ T. The targets are noncooperative in that they do not communicate and do not acquire any measurements. We also define A_l^(n) ≜ C_l^(n) ∪ T_l^(n). Finally, the set C_m^(n) contains all CAs measuring target m, i.e., all l ∈ C with m ∈ T_l^(n). The sets C_l^(n), T_l^(n), A_l^(n), and C_m^(n) are generally time-dependent. An example of a measurement and communication topology is given in Fig. 1. We assume that the graph defined by {C_l^(n)}_{l∈C} is connected.

We consider "pairwise" measurements¹ y_{l,k}^(n) that depend on the state x_l^(n) of a measuring CA l ∈ C and the state x_k^(n) of a measured agent (CA or target) k ∈ A_l^(n) according to

  y_{l,k}^(n) = d_l(x_l^(n), x_k^(n), v_{l,k}^(n)), l ∈ C, k ∈ A_l^(n),  (3)

where d_l(·) is a possibly nonlinear function and v_{l,k}^(n) is measurement noise. An example is the scalar measurement

  y_{l,k}^(n) = ‖x_l^(n) − x_k^(n)‖ + v_{l,k}^(n),  (4)

where x_k^(n) equals the position of agent k and, hence, ‖x_l^(n) − x_k^(n)‖ is the spatial distance between agents l and k. The statistical relation between the measurement y_{l,k}^(n) and the involved states x_l^(n) and x_k^(n) is also described by the local likelihood function f(y_{l,k}^(n) | x_l^(n), x_k^(n)). For the derivation of the controller, f(y_{l,k}^(n) | x_l^(n), x_k^(n)) is assumed differentiable with respect to x_l^(n).

We also make the following assumptions. The number of targets is known, and the targets can be identified by the CAs, i.e., target-to-measurement assignments are known. Furthermore, CA l knows the state evolution models g_k(·) and process noise pdfs f(q_k^(n)) for k ∈ {l} ∪ C_l^(n) ∪ T; the initial prior pdfs of the agent states, f(x_k^(0)), for k ∈ {l} ∪ C_l^(0) ∪ T; the measurement models d_{l′}(·) for l′ ∈ {l} ∪ C_l^(n); and the measurement noise pdfs f(v_{l,k}^(n)), k ∈ A_l^(n) and f(v_{l′,l}^(n)), l′ ∈ C_l^(n).

¹The proposed framework can be easily extended to self-measurements (measurements that involve only the own state) and cluster measurements (measurements that involve the states of several other agents).
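For the distance measurement (4), the local likelihood function can be evaluated as follows, assuming zero-mean Gaussian measurement noise (an illustrative assumption; the framework only requires a known noise pdf that yields a likelihood differentiable in x_l):

```python
import numpy as np

def range_likelihood(y, pos_l, pos_k, sigma=1.0):
    """Local likelihood f(y_{l,k} | x_l, x_k) for the distance
    measurement (4) with v_{l,k} ~ N(0, sigma^2).

    pos_l, pos_k: position components of the states x_l and x_k.
    """
    d = np.linalg.norm(pos_l - pos_k)        # ||x_l - x_k||
    return np.exp(-0.5 * ((y - d) / sigma)**2) / (np.sqrt(2 * np.pi) * sigma)
```

Away from pos_l == pos_k, this likelihood is differentiable with respect to the position of CA l, as required for the controller derivation.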
Fig. 2. Block diagram of the overall "signal processing system" for (a) a CA l ∈ C and (b) a target m ∈ T.

B. Problem Formulation

The following tasks are to be performed at each time n:

1) Each CA l ∈ C estimates the states x_k^(n), k ∈ {l} ∪ T (i.e., its own local state and the states of all targets) from prior information and all past and present measurements in the network.
2) The state of each CA is controlled such that the negative joint posterior entropy of all states in the network at the next time, conditioned on all measurements in the network at the next time, is maximized.

We solve these two problems in a distributed and recursive manner. Our method consists of an estimation layer and a control layer, as shown in Fig. 2(a). In the estimation layer, CA l computes an approximation of the marginal posterior pdfs of the states x_k^(n), k ∈ {l} ∪ T given all the past and present measurements and control vectors in the entire network. In the control layer, CA l uses these marginal posteriors and the statistical model to determine a quasi-optimal control variable u_l^(n+1). In both layers, the CAs communicate with neighbor CAs.

III. ESTIMATION LAYER

The estimation layer performs distributed estimation of the local and global states by using the BP- and consensus-based method introduced in [24], [25]. We will review this method in our present context. Let us denote by x^(n) ≜ [x_k^(n)]_{k∈A}, u^(n) ≜ [u_l^(n)]_{l∈C}, and y^(n) ≜ [y_{l,k}^(n)]_{l∈C, k∈A_l^(n)} the stacked vectors of, respectively, all states, control variables, and measurements at time n.

Furthermore, let x^(1:n) ≜ [x^(1)T, . . . , x^(n)T]^T, u^(1:n) ≜ [u^(1)T, . . . , u^(n)T]^T, and y^(1:n) ≜ [y^(1)T, . . . , y^(n)T]^T. Each CA l ∈ C estimates its local state x_l^(n) and all the target states x_m^(n), m ∈ T from the measurements of all CAs up to time n, y^(1:n). This estimation is based on the posteriors f(x_k^(n) | y^(1:n); u^(1:n)), k ∈ {l} ∪ T, which are marginals of the joint posterior f(x^(1:n) | y^(1:n); u^(1:n)).

Using Bayes' rule and common assumptions [17], the joint posterior can be factorized as

  f(x^(1:n) | y^(1:n); u^(1:n)) ∝ ∏_{k∈A} f(x_k^(0)) ∏_{n′=1}^{n} [ ∏_{m∈T} f(x_m^(n′) | x_m^(n′−1)) ] ∏_{l∈C} f(x_l^(n′) | x_l^(n′−1); u_l^(n′)) ∏_{k′∈A_l^(n′)} f(y_{l,k′}^(n′) | x_l^(n′), x_{k′}^(n′)).  (5)

The marginal posterior of state x_k^(n) is then given by

  f(x_k^(n) | y^(1:n); u^(1:n)) = ∫ f(x^(1:n) | y^(1:n); u^(1:n)) dx_{∼k,n}^(1:n),  (6)

where x_{∼k,n}^(1:n) denotes x^(1:n) with x_k^(n) removed. Using f(x_k^(n) | y^(1:n); u^(1:n)), the minimum mean-square error (MMSE) estimator [33] of x_k^(n) is obtained as

  x̂_{k,MMSE}^(n) ≜ ∫ x_k^(n) f(x_k^(n) | y^(1:n); u^(1:n)) dx_k^(n), k ∈ A.  (7)

A. Sequential Calculation

For a review of the sequential calculation of (6) proposed in [24], [25], we now switch to the following simplified notation for the sake of readability. In the conditions of the various conditional pdfs, we omit the measurements up to time n−1, i.e., y^(1:n−1), and the control vectors up to time n, i.e., u^(1:n), because y^(1:n−1) has already been observed and u^(1:n) has already been determined; hence both are considered fixed. Furthermore, we suppress the time index n, and we write the current and previous states of CA l as x_l = x_l^(n) and x_l^− = x_l^(n−1), respectively. Similarly, we write u_l^+ = u_l^(n+1). For sequential calculation of (6), CA l ∈ C employs a basic Bayesian recursive filtering method [17] consisting of a prediction step and a correction step. In the prediction step, CA l ∈ C computes a predictive posterior of its current state,

  f(x_l) = ∫ f(x_l | x_l^−) f(x_l^−) dx_l^−.  (8)

Here, f(x_l) and f(x_l^−) are short for f(x_l^(n) | y^(1:n−1); u^(1:n)) and f(x_l^(n−1) | y^(1:n−1); u^(1:n−1)), respectively. Furthermore, CA l computes predictive posteriors of the target states

  f(x_m) = ∫ f(x_m | x_m^−) f(x_m^−) dx_m^−, m ∈ T.  (9)
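In a particle-based implementation such as that of [24], [25], the prediction steps (8) and (9) reduce to propagating each sample of the previous belief through the state-transition model with freshly drawn process noise. A minimal sketch (the generic transition interface and the Gaussian driving noise are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_particles(particles, transition, q_std):
    """Sample-based prediction step, cf. (8) and (9): draw
    x^(j) ~ f(x | x^-(j)) by pushing each previous-state sample
    through the state-transition function with fresh process noise.
    """
    J, dim_q = particles.shape[0], len(q_std)
    q = rng.normal(0.0, q_std, size=(J, dim_q))
    return np.array([transition(p, qj) for p, qj in zip(particles, q)])

# usage: random-walk transition for a 2D target state
particles = rng.normal(0.0, 1.0, size=(1000, 2))   # samples of f(x^-)
pred = predict_particles(particles, lambda x, q: x + q, q_std=[0.5, 0.5])
```

The predicted sample set represents the predictive posterior and is used as the prior factor f(x_l) or f(x_m) in the subsequent correction step.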
In the correction step, CA l determines the marginal posteriors f(x_l | y) and f(x_m | y), m ∈ T, which are given by

  f(x_k | y) ∝ ∫ [ ∏_{k′∈A} f(x_{k′}) ] ∏_{l′∈C} ∏_{k₁∈A_{l′}} f(y_{l′,k₁} | x_{l′}, x_{k₁}) dx_{∼k}, k ∈ {l} ∪ T.  (10)

Here, x_{∼k} denotes x with x_k removed and f(x_k | y) is short for f(x_k^(n) | y^(1:n); u^(1:n)). As shown in Fig. 2(a), these marginal posteriors are handed over to the control layer, which determines the control input for the next time step, u_l^+.

B. BP Message Passing and Consensus

Calculation of f(x_k | y) according to (10) is generally infeasible, due to the reliance on nonlocal information and the inherent complexity of the marginalization process. Based on the factorization of the joint posterior in (5), a computationally feasible approximation of (10) is provided by a distributed, cooperative algorithm that combines BP message passing and the average consensus scheme [24], [25]. This algorithm computes an approximation ("belief") b(x_k) ≈ f(x_k | y) in an iterative manner, using only communication with neighbor CAs l′ ∈ C_l. Its complexity scales only linearly with the number of states in the network, |A| (for a fixed number of iterations).

According to [24], [25], the belief of local state x_l at message passing iteration p ∈ {1, . . . , P} is given by

  b^(p)(x_l) ∝ f(x_l) ∏_{l′∈C_l} ∫ f(y_{l,l′} | x_l, x_{l′}) b^(p−1)(x_{l′}) dx_{l′} ∏_{m∈T_l} ∫ f(y_{l,m} | x_l, x_m) ψ_{m→l}^(p−1)(x_m) dx_m,  (11)

which is initialized as b^(0)(x_l) = f(x_l). Similarly, the belief of target state x_m at message passing iteration p is given by

  b^(p)(x_m) ∝ f(x_m) ∏_{l∈C_m} ∫ f(y_{l,m} | x_l, x_m) ψ_{l→m}^(p−1)(x_l) dx_l,  (12)

with initialization b^(0)(x_m) = f(x_m). Here, ψ_{m→l}^(p−1)(x_m) and ψ_{l→m}^(p−1)(x_l) are the "extrinsic informations" from target m to CA l and from CA l to target m, respectively, at the previous message passing iteration p − 1. These extrinsic informations are calculated recursively as

  ψ_{m→l}^(p)(x_m) = b^(p)(x_m) / ∫ f(y_{l,m} | x_l, x_m) ψ_{l→m}^(p−1)(x_l) dx_l,  (13)

  ψ_{l→m}^(p)(x_l) = b^(p)(x_l) / ∫ f(y_{l,m} | x_l, x_m) ψ_{m→l}^(p−1)(x_m) dx_m,  (14)

with initialization ψ_{m→l}^(0)(x_m) = f(x_m) and ψ_{l→m}^(0)(x_l) = f(x_l), respectively. In (11), the beliefs b^(p−1)(x_{l′}) of the neighbor CAs l′ ∈ C_l are used instead of the extrinsic informations. This is part of the sum–product algorithm over a wireless network (SPAWN) scheme for cooperative localization [17], which results from a specific choice of the message schedule in loopy BP and has been observed to provide highly accurate estimates [17], [24], [25], [37].
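At the particle level, the CA belief update (11) can be sketched as follows; the helper names and data layout are hypothetical, and each message integral is replaced by a Monte Carlo sum over the received samples:

```python
import numpy as np

def update_ca_belief(xl_samples, prior_w, nbr_beliefs, tgt_msgs, lik):
    """One BP update of the CA belief, cf. (11): for each sample of
    x_l, multiply the prior weight by Monte Carlo estimates of the
    messages from neighbor CAs and targets.

    nbr_beliefs / tgt_msgs: lists of tuples (measurement y,
    samples of x', weights of x'). lik(y, xl, xk) evaluates the
    local likelihood f(y | x_l, x_k). Names are illustrative.
    """
    w = prior_w.copy()
    for y, xs, ws in list(nbr_beliefs) + list(tgt_msgs):
        for j, xl in enumerate(xl_samples):
            # message estimate: sum_i ws[i] * f(y | xl, xs[i])
            w[j] *= np.sum(ws * np.array([lik(y, xl, xk) for xk in xs]))
    return w / np.sum(w)   # normalized belief weights b(x_l^(j))
```

Samples of x_l with high weight are those consistent with all received measurements, which is exactly the effect of the product of messages in (11).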

A sample-based distributed implementation of (11)–(14) has been proposed in [24], [25]. A problem for a distributed implementation is that the products ∏_{l∈C_m} ∫ f(y_{l,m} | x_l, x_m) ψ_{l→m}^(p−1)(x_l) dx_l, m ∈ T involved in (12) are not available at the CAs. However, an approximation of these products can be provided to each CA in a distributed manner (i.e., using only local communications) either by a consensus algorithm performed in parallel for each sample weight [25], [38], [39] or by the likelihood consensus scheme [7], [24], [40]. For the calculations in the control layer (to be described in Sections IV–VII), all CAs require a common set of samples representing the target beliefs b^(p)(x_m). This can be ensured by additionally using, e.g., a max-consensus and providing all CAs with the same seed for random number generation [39], [41].

The output of the estimation layer is the set of beliefs b^(P)(x_k), k ∈ {l} ∪ T at the final message passing iteration p = P. These beliefs are handed over to the control layer, which calculates the control variables u_l^+ (see Fig. 2(a)). Each belief b^(P)(x_k) is represented by J samples {x_k^(j)}_{j=1}^{J}, which is briefly denoted by {x_k^(j)}_{j=1}^{J} ∼ b^(P)(x_k). From these samples, an approximation of the MMSE estimate (7) is calculated as

  x̂_k = (1/J) Σ_{j=1}^{J} x_k^(j).

IV. CONTROL LAYER

Next, we present the information-seeking controller. We temporarily revert to the full notation.

A. Objective Function and Controller

According to our definition at the beginning of Section III, the vector comprising all the measurements in the network at the next time n + 1 is y^(n+1) = [y_{l,k}^(n+1)]_{l∈C, k∈A_l^(n+1)}. However, in this definition of y^(n+1), we now formally replace A_l^(n+1) by A_l^(n) since at the current time n, the sets A_l^(n+1) are not yet known. Thus, with an abuse of notation, y^(n+1) is redefined as

  y^(n+1) ≜ [y_{l,k}^(n+1)]_{l∈C, k∈A_l^(n)}.  (15)

In the proposed control approach, each CA l ∈ C determines its next control variable u_l^(n+1) such that the information about the next joint state x^(n+1) given y^(1:n+1) is maximized. We quantify this information by the negative conditional differential entropy [32, Chap. 8] of x^(n+1) given y^(n+1), with y^(1:n) being an additional condition that has been observed previously and is thus fixed:

  −h(x^(n+1) | y^(n+1); y^(1:n), u^(1:n+1)) = ∫∫ f(x^(n+1), y^(n+1) | y^(1:n); u^(1:n+1)) log f(x^(n+1) | y^(n+1), y^(1:n); u^(1:n+1)) dx^(n+1) dy^(n+1),  (16)

where log denotes the natural logarithm. Note that h(x^(n+1) | y^(n+1); y^(1:n), u^(1:n+1)) depends on the random vectors x^(n+1) and y^(n+1), i.e., on their joint distribution but not on their values. Our notation indicates this fact by using a sans serif font for x^(n+1) and y^(n+1) in h(x^(n+1) | y^(n+1); y^(1:n), u^(1:n+1)).

According to expression (16), −h(x^(n+1) | y^(n+1); y^(1:n), u^(1:n+1)) is a function of the control vector u^(n+1). This function will be denoted by D_h(u^(n+1)), i.e.,

  D_h(u^(n+1)) ≜ −h(x^(n+1) | y^(n+1); y^(1:n), u^(1:n+1)),  (17)

and it will be used as the objective function for control at each CA. This objective function is holistic in that it involves all the next states (of both the CAs and the targets), x^(n+1), and all the next measurements, y^(n+1). Instead of a full-blown maximization of D_h(u^(n+1)), we perform one step of a gradient ascent [42] at each time n. Thus, u^(n+1) is determined as

  û^(n+1) = u_r^(n+1) + c^(n+1) ∇D_h(u^(n+1)) |_{u^(n+1) = u_r^(n+1)},  (18)

where u_r^(n+1) is a reference vector and c^(n+1) > 0 is a step size. The choice of u_r^(n+1) depends on the manner in which the local control vectors u_l^(n) (which are subvectors of u^(n)) appear in the state evolution functions g_l(x_l^(n−1), u_l^(n), q_l^(n)) in (1); two common choices are u_r^(n+1) = u^(n) and u_r^(n+1) = 0 (cf. Section IX-A).

Since u^(n+1) = [u_l^(n+1)]_{l∈C}, we have

  ∇D_h(u^(n+1)) = [ ∂D_h(u^(n+1))/∂u_l^(n+1) ]_{l∈C},

and thus the gradient ascent (18) with respect to u^(n+1) is equivalent to local gradient ascents at the individual CAs l with respect to the local control vectors u_l^(n+1). At CA l, the local gradient ascent is performed as

  û_l^(n+1) = u_{r,l}^(n+1) + c_l^(n+1) ∂D_h(u^(n+1))/∂u_l^(n+1) |_{u^(n+1) = u_r^(n+1)},  (19)

where u_{r,l}^(n+1) is the part of u_r^(n+1) that corresponds to CA l (we have u_r^(n+1) = [u_{r,l}^(n+1)]_{l∈C}). The local step sizes c_l^(n+1) are constrained by the condition u_l^(n+1) ∈ U_l for given sets U_l. In practice, this condition can be easily satisfied by an appropriate scaling of the c_l^(n+1).
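A minimal sketch of the local gradient-ascent step (19), with the step size scaled so that the updated control remains in U_l; the ball-shaped admissible set used here is an illustrative assumption:

```python
import numpy as np

def local_control_update(u_ref_l, grad_l, c_l, u_max):
    """Local gradient-ascent step (19) at CA l.

    The step is shrunk, if necessary, so that the new control stays
    in the admissible set U_l = {u : ||u|| <= u_max} (an illustrative
    choice of U_l).
    """
    u_new = u_ref_l + c_l * grad_l
    norm = np.linalg.norm(u_new)
    if norm > u_max:                 # rescale to satisfy u ∈ U_l
        u_new *= u_max / norm
    return u_new

u_hat = local_control_update(np.array([0.2, 0.0]),
                             np.array([1.0, 0.5]), c_l=0.4, u_max=0.5)
```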

Note that, as in [15], we use different local step sizes c_l^(n+1) at the individual CAs l. This heuristic modification is made to account for the possibly different sets U_l and to avoid the necessity of reaching a consensus on a common step size across all the CAs; it was observed to yield good results. Because the objective function D_h(·) changes over time n, the local ascent described by (19) generally is not guaranteed to converge; this is similar to existing information-seeking control algorithms [13], [15]. Indeed, the goal of the proposed control algorithm is to make available informative measurements to the estimation layer; because of the dynamic scenario, this is generally not compatible with convergence.

B. Expansion of the Objective Function

A central contribution of this paper is a distributed sample-based technique for calculating the gradients ∂D_h(u^(n+1))/∂u_l^(n+1) |_{u^(n+1) = u_r^(n+1)} in (19). As a starting point for developing this technique, we next derive an expansion of the objective function D_h(u^(n+1)). We will use the following simplified notation. We do not indicate the conditioning on y^(1:n) and the dependence on u^(1:n) because at time n + 1, y^(1:n) has already been observed and u^(1:n) has already been determined; hence both are fixed. (Note that in Section III, we did not indicate the conditioning on y^(1:n−1), rather than y^(1:n).) Also, we suppress the time index n and designate variables at time n + 1 by the superscript "+". For example, we write h(x^+ | y^+; u^+) instead of h(x^(n+1) | y^(n+1); y^(1:n), u^(1:n+1)).

For calculating the gradient, following [13] and [15], we disregard the unknown driving noise q_l in (1) by formally replacing it with its expectation, q̄_l ≜ ∫ q_l f(q_l) dq_l. We can then rewrite (1) (with n replaced by n + 1) as

  x_l^+ = g_l(x_l, u_l^+, q̄_l) = g̃_l(x_l, u_l^+), l ∈ C.  (20)

As shown in Appendix A, the conditional differential entropy in (17) can be expressed as

  h(x^+ | y^+; u^+) = h(x_C, x_T^+ | y^+; u^+) + Σ_{l∈C} G_l(u_l^+),  (21)

where x_C ≜ [x_l]_{l∈C}, x_T^+ ≜ [x_m^+]_{m∈T}, and

  G_l(u_l^+) ≜ ∫ f(x_l) log |J_{g̃_l}(x_l; u_l^+)| dx_l, with J_{g̃_l}(x_l; u_l^+) ≜ det( ∂g̃_l(x_l, u_l^+)/∂x_l ).  (22)

The first term on the right-hand side of (21) can be decomposed as [32, Chap. 8]

  h(x_C, x_T^+ | y^+; u^+) = h(x_C, x_T^+) − I(x_C, x_T^+; y^+; u^+).  (23)

Here, I(x_C, x_T^+; y^+; u^+) denotes the two-variable mutual information between (x_C, x_T^+) and y^+ (with u^+ being a deterministic parameter, i.e., not a third random variable), which is given by [32, Chap. 8]

  I(x_C, x_T^+; y^+; u^+) = ∫∫ f(x_C, x_T^+, y^+; u^+) log [ f(x_C, x_T^+, y^+; u^+) / ( f(x_C, x_T^+) f(y^+; u^+) ) ] dx_C dx_T^+ dy^+.  (24)

Note that h(x_C, x_T^+) in (23) does not depend on u^+, since neither the CA states x_C nor the future target states x_T^+ are controlled by the future control variable u^+. We explicitly express the dependence of I(x_C, x_T^+; y^+; u^+) on u^+ by defining the function

  D_I(u^+) ≜ I(x_C, x_T^+; y^+; u^+).  (25)

Combining (17), (21), (23), and (25) then yields the following expansion of the objective function:

  D_h(u^+) = −h(x_C, x_T^+) + D_I(u^+) − Σ_{l∈C} G_l(u_l^+).  (26)

This entails the following expansion of the gradient in (19):

  ∂D_h(u^+)/∂u_l^+ = ∂D_I(u^+)/∂u_l^+ − ∂G_l(u_l^+)/∂u_l^+.  (27)

In Sections V–VII, we will develop sample-based techniques for calculating ∂D_I(u^+)/∂u_l^+ |_{u^+ = u_r^+} and ∂G_l(u_l^+)/∂u_l^+ |_{u_l^+ = u_{r,l}^+}. The calculation of ∂D_I(u^+)/∂u_l^+ |_{u^+ = u_r^+} is cooperative and distributed; it requires communication with neighbor CAs l′ ∈ C_l. The calculation of ∂G_l(u_l^+)/∂u_l^+ |_{u_l^+ = u_{r,l}^+} is performed locally at each CA l. Both calculations use the samples of relevant marginal posteriors that were computed by the estimation layer. The operations performed in the control layer as described in this section and in Sections V–VII are summarized in Figs. 3 and 4 for two alternative distributed implementations (discussed in Section VI).

V. CALCULATION OF THE GRADIENT OF D_I(u^+)

In this section, we develop a Monte Carlo approximation of ∂D_I(u^+)/∂u_l^+ |_{u^+ = u_r^+} that uses importance sampling. The distributed computation of this approximation will be addressed in Section VI.

The mutual information in (24) can be rewritten as

  D_I(u^+) = ∫∫ f(y^+ | x_C, x_T^+; u^+) f(x_C, x_T^+) log [ f(y^+ | x_C, x_T^+; u^+) / f(y^+; u^+) ] dx_C dx_T^+ dy^+.
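Before specializing to the distributed setting, the rewritten integral suggests a plain (centralized) Monte Carlo estimate of D_I: sample states from f(x_C, x_T^+), sample measurements from the likelihood, and average the log-ratio, with the evidence f(y^+; u^+) itself estimated by a sample average. A sketch under these assumptions (the sampling interface `sample_y`, `lik` is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def mi_monte_carlo(x_samples, sample_y, lik, n_y=10):
    """Monte Carlo estimate of the mutual information D_I, cf. (24):
    D_I ≈ (1/(J·J')) Σ_j Σ_j' log[ f(y^(j,j')|x^(j)) / f̂(y^(j,j')) ],
    where f̂(y) = (1/J) Σ_i f(y|x^(i)) estimates the evidence f(y).

    sample_y(x) draws a measurement given the state; lik(y, x)
    evaluates f(y|x) (any common normalization constant cancels).
    """
    total = 0.0
    for xj in x_samples:
        for _ in range(n_y):
            y = sample_y(xj)
            f_cond = lik(y, xj)
            f_marg = np.mean([lik(y, xi) for xi in x_samples])
            total += np.log(f_cond / f_marg)
    return total / (len(x_samples) * n_y)
```

For a scalar Gaussian example x ~ N(0, 1), y = x + v with v ~ N(0, 1), the estimate approaches the closed-form value ½ log 2 ≈ 0.347 as the sample sizes grow.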

Fig. 3. Flowchart of the flooding-based implementation of the control layer at CA l (see Section VI-A).

Fig. 4. Flowchart of the consensus-based implementation of the control layer at CA l (see Section VI-B).

Invoking [15, Th. 1], we obtain

  ∂D_I(u^+)/∂u_l^+ = ∫∫ f(x_C, x_T^+) [ ∂f(y^+ | x_C, x_T^+; u^+)/∂u_l^+ ] log [ f(y^+ | x_C, x_T^+; u^+) / f(y^+; u^+) ] dx_C dx_T^+ dy^+.  (28)

The likelihood function f(y^+ | x_C, x_T^+; u^+) involved in (28) can be written as

  f(y^+ | x_C, x_T^+; u^+) = ∏_{l∈C} [ ∏_{l′∈C_l} f(y_{l,l′}^+ | x_l, x_{l′}; u_l^+, u_{l′}^+) ∏_{m∈T_l} f(y_{l,m}^+ | x_l, x_m^+; u_l^+) ].  (29)
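The factorization (29) means that the networkwide log-likelihood is a sum of local pairwise terms, evaluated at the predicted next states x_k^+ = g̃_k(x_k, u_k^+); a sketch (the dictionary-based data structures are illustrative):

```python
import numpy as np

def joint_log_likelihood(y, x_next, C_nbrs, T_nbrs, lik):
    """Log of the networkwide likelihood (29): sum over CAs l of the
    log pairwise likelihoods to neighbor CAs and neighbor targets.

    y: dict mapping (l, k) to the measurement y_{l,k};
    x_next: dict of predicted next states; C_nbrs / T_nbrs: dicts of
    CA and target neighbor lists; lik(y, xl, xk) = f(y | x_l, x_k).
    """
    ll = 0.0
    for l, nbrs in C_nbrs.items():
        for lp in nbrs:                       # CA–CA measurements
            ll += np.log(lik(y[(l, lp)], x_next[l], x_next[lp]))
        for m in T_nbrs.get(l, []):           # CA–target measurements
            ll += np.log(lik(y[(l, m)], x_next[l], x_next[m]))
    return ll
```

Because the sum splits across CAs, each CA can compute its own terms locally; only the remaining terms need to be disseminated, which is what the distributed schemes of Section VI exploit.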

The pairwise factors in (29) are given by

  f(y_{l,l′}^+ | x_l, x_{l′}; u_l^+, u_{l′}^+) = f(y_{l,l′}^+ | x_l^+, x_{l′}^+) |_{x_l^+ = g̃_l(x_l, u_l^+), x_{l′}^+ = g̃_{l′}(x_{l′}, u_{l′}^+)}, l ∈ C, l′ ∈ C_l  (30)

  f(y_{l,m}^+ | x_l, x_m^+; u_l^+) = f(y_{l,m}^+ | x_l^+, x_m^+) |_{x_l^+ = g̃_l(x_l, u_l^+)}, l ∈ C, m ∈ T_l.

(The latter expressions are obtained using (3) and (20).)

Let ỹ_l^+ denote the subvector of y^+ = [y_{l,k}^+]_{l∈C, k∈A_l} (cf. (15)) whose likelihood function includes all those factors of f(y^+ | x_C, x_T^+; u^+) in (29) that depend on the local control vector u_l^+. This subvector is given by

  ỹ_l^+ ≜ [ [y_{l,k}^+]^T_{k∈A_l}, [y_{l′,l}^+]^T_{l′∈C_l} ]^T,  (31)

and its likelihood function is obtained as

  f(ỹ_l^+ | x_l, x_{C_l}, x_{T_l}^+; u_{C_l}^+) = ∏_{l′∈C_l} f(y_{l,l′}^+ | x_l, x_{l′}; u_l^+, u_{l′}^+) f(y_{l′,l}^+ | x_{l′}, x_l; u_{l′}^+, u_l^+) ∏_{m∈T_l} f(y_{l,m}^+ | x_l, x_m^+; u_l^+),  (32)

with u_{C_l}^+ ≜ [u_{l′}^+]_{l′∈{l}∪C_l}. By comparing (32) with (29), it is seen that ỹ_l^+ is also the subvector of y^+ whose likelihood function includes all those factors of f(y^+ | x_C, x_T^+; u^+) that involve the local state x_l.

The samples used in the Monte Carlo approximation (cf. (33) and (34)) are drawn from q(y^+, x_C, x_T^+) ≜ f(x_C, x_T^+) f(y^+ | x_C, x_T^+; u_r^+) via the following two-stage procedure. First, samples {(x_C^(j), x_T^{+(j)})}_{j=1}^{J} are drawn from

  f(x_C, x_T^+) = ∏_{l∈C} f(x_l) ∏_{m∈T} f(x_m^+).  (35)

(This factorization expresses the conditional statistical independence of the x_l, l ∈ C and the x_m^+, m ∈ T given y^(1:n). This is a common approximation, which was introduced in [17] and is also used in the estimation layer [24], [25].) Then, for each sample (x_C^(j), x_T^{+(j)}), samples {y^{+(j,j′)}}_{j′=1}^{J′} are drawn from the conditional pdf f(y^+ | x_C^(j), x_T^{+(j)}; u_r^+). The distributed calculation of these samples will be discussed in Section VI and in Appendix C. Finally, we note that using (32), one easily obtains a (rather unwieldy) expression of the derivative ∂f(ỹ_l^+ | x_l, x_{C_l}, x_{T_l}^+; u_{C_l}^+)/∂u_l^+ occurring in (33). This expression involves the factors in (32) and the derivatives ∂g̃_l(x_l, u_l^+)/∂u_l^+, ∂f(y_{l,l′}^+ | x_l^+, x_{l′}^+)/∂x_l^+ for l′ ∈ C_l, and ∂f(y_{l,m}^+ | x_l^+, x_m^+)/∂x_l^+ for m ∈ T_l.

VI. DISTRIBUTED PROCESSING

In this section, we present two alternative schemes for a distributed, sample-based computation of ∂D_I(u^+)/∂u_l^+ |_{u^+ = u_r^+} according to (33) and (34). Both schemes are distributed in that they
+ +
T ; u ) in (29) require only local communication, i.e., each CA l ∈ C transmits
+
that involve the state xl . data only to its neighbors l ∈ Cl .
Using (29) and (32), it is shown in Appendix B that a Monte
Carlo (i.e., sample-based) approximation of (28) evaluated at
u+ = u+ r is given by
2 A. Flooding-Based Processing

∂DI (u+ )  We first discuss a distributed scheme where each CA l ∈ C
∂u+  + + performs an exact (“quasi-centralized”) calculation of (33) and
l u =ur  (j)J
(34). As a result of the estimation layer, samples xk j=1 ∼

1 
J J
1 f (xk ), k ∈ {l} ∪ T are available at CA l (see (10), noting that
≈   +(j,j )  (j) (j) +(j) +  f (xk ) was denoted f (xk |y) there). A flooding protocol [44]
JJ  f ỹ  x ,x ,x ;u
j=1 j =1 l l Cl Tl r,Cl is now used to make available  (j)to
J each CA l the reference
 +(j,j )  (j) (j) +(j) +   vectors u+
x , x , x ; u  and the samples x l j=1 ∼ f (xl ) of all the other

∂f ỹl r,l
l Cl Tl Cl  
×  CAs l ∈ C\{l}. The flooding protocol requires each CA l to
∂u+ l u+ +
C =ur,C communicate with its neighbor CAs l ∈ Cl . In addition, CA l
l l
 (j) +(j) + 
 +(j,j )
locally calculates predictive marginal posteriors for all target
f y x , x ; u
× log  C  T + r , (33) states via the following prediction step (which is (9) with n
f y+(j,j ) ; ur replaced by n + 1):

with f (x+ ) = f (x+
m m |xm )f (xm ) dxm , m∈T. (36)
   1 J
   (j ) +(j ) 
f y+(j,j ) ; u+ ≈ f y+(j,j ) xC , xT ; u+  (j)J
r r . (34)
An implementation of (36) using the samples xm j=1 , m ∈
J 
j =1
T produced by the estimation layer (which are available at
 +(j) J
 (j) +(j)
Here, y+(j,j ) , xC , and xT are samples of y+, xC , and x+ CA l) and yielding samples xm j=1 ∼ f (x+ m ), m ∈ T is de-
T,  (j)J
respectively that are drawn from the importance density [43] scribed in [24], [25]. At this point, samples xl j=1 ∼ f (xl )
 +(j) J
for l ∈ C and xm j=1 ∼ f (x+ m ) for m ∈ T are available at
2 With an abuse of notation, the superscript (j) now indicates the jth sample,
whereas previously, in our full-blown notation, the superscript (n) indicated the CA l. Because all states xl , l ∈ C and x+ m , m ∈ T are con-
nth time step. ditionally independent given y(1:n) (see (35)), CA l can now
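The two-stage importance-sampling procedure — first draw states from the factorized density (35), then draw measurements given the states — can be sketched as follows. The toy sample sizes, the Gaussian particle clouds standing in for the beliefs, and the Gaussian distance-measurement model are all illustrative assumptions, not the paper's models.

```python
import math, random

random.seed(1)
J, Jp = 4, 3          # J state samples, J' measurement samples each (toy sizes)

# Stage 1: draw (x_C^(j), x_T^+(j)) from a factorized density as in (35) —
# hypothetical Gaussian clouds stand in for the beliefs f(x_l) and f(x_m^+).
belief_means = {2: (0.0, 0.0), 3: (8.0, 0.0), 4: (4.0, 6.0)}   # CAs 2, 3; target 4
def draw_state():
    return {k: (random.gauss(m[0], 1.0), random.gauss(m[1], 1.0))
            for k, m in belief_means.items()}
state_samples = [draw_state() for _ in range(J)]

# Stage 2: for each state sample, draw J' measurement samples y^+(j,j')
# from an assumed Gaussian distance-measurement model.
def draw_measurements(x, sigma=1.0):
    pairs = [(2, 3), (3, 2), (2, 4), (3, 4)]
    return {p: math.hypot(x[p[0]][0] - x[p[1]][0], x[p[0]][1] - x[p[1]][1])
               + random.gauss(0.0, sigma) for p in pairs}
meas_samples = [[draw_measurements(x) for _ in range(Jp)] for x in state_samples]
print(len(state_samples), len(meas_samples[0]))
```

Each of the J state samples carries J' measurement samples, matching the double index (j, j') used in (33) and (34).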
Because all states x_l, l ∈ C and x_m^+, m ∈ T are conditionally independent given y^{(1:n)} (see (35)), CA l can now obtain samples (x_C^{(j)}, x_T^{+(j)})_{j=1}^J ∼ f(x_C, x_T^+) by the simple stacking operations x_C^{(j)} = (x_{l'}^{(j)})_{l'∈C} and x_T^{+(j)} = (x_m^{+(j)})_{m∈T}. Finally, for each (x_C^{(j)}, x_T^{+(j)}), CA l computes samples (y^{+(j,j')})_{j'=1}^{J'} ∼ f(y^+ | x_C^{(j)}, x_T^{+(j)}; u_r^+) as described in Appendix C.

Using the samples (x_C^{(j)}, x_T^{+(j)}) and y^{+(j,j')}, j = 1, …, J, j' = 1, …, J', as well as the reference vectors u_{r,l'}^+, l' ∈ C, CA l can compute the gradient \partial D_I(u^+)/\partial u_l^+ |_{u^+ = u_r^+} locally according to (33) and (34). Note, however, that this flooding-based scheme presupposes that each CA l knows the state evolution models (1) and the measurement models (3) of all the other CAs l' ∈ C \ \{l\}.

The communication cost of the flooding-based scheme, in terms of the number of real values transmitted by each CA, is (JM + M_u)W ≈ JMW. Here, M and M_u are the dimensions of the vectors x_l and u_l, respectively, and W depends on the network size and topology and is bounded as 1 ≤ W ≤ |C|. Thus, the number of transmissions scales linearly with J and does not depend on J'. In large networks, flooding protocols tend to require a large memory and book-keeping overhead and introduce a significant delay [45]. If the network formed by the CAs is fully connected, i.e., C = \{l\} ∪ C_l ∀ l ∈ C, then all the samples (x_C^{(j)}, x_T^{+(j)})_{j=1}^J can be obtained without flooding: CA l simply broadcasts its reference vector u_{r,l}^+ and its samples (x_l^{(j)})_{j=1}^J ∼ f(x_l) to all the other CAs in the network and receives their reference vectors and samples. Here, the number of real values transmitted by each CA is only JM + M_u.

Finally, the computational complexity per CA of the flooding-based scheme—i.e., evaluation of (33) and (34), with J and J' fixed—scales linearly with the number of agents in the network. Because the computational complexity and the communication cost increase with the network size, the flooding-based distributed processing scheme is primarily suited to small networks.

B. Consensus-Based Processing

Next, we present an alternative distributed computation of (33) and (34) that avoids the use of a flooding protocol and does not require each CA to know the state evolution and measurement models of all the other CAs. As a first step, CA l broadcasts its own samples (x_l^{(j)})_{j=1}^J ∼ f(x_l) calculated in the estimation layer to all neighbor CAs l' ∈ C_l, and it receives from them their own samples (x_{l'}^{(j)})_{j=1}^J ∼ f(x_{l'}), l' ∈ C_l. In addition, CA l locally calculates samples (x_m^{+(j)})_{j=1}^J ∼ f(x_m^+) for all m ∈ T_l via the prediction step (36) (with T replaced by T_l), using the sample-based implementation described in [24], [25]. Thus, after the stacking operations x_{C_l}^{(j)} = (x_{l'}^{(j)})_{l'∈C_l} and x_{T_l}^{+(j)} = (x_m^{+(j)})_{m∈T_l}, samples (x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)})_{j=1}^J ∼ f(x_l, x_{C_l}, x_{T_l}^+) are available at CA l. Then, for each sample (x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}), J' samples (ỹ_l^{+(j,j')})_{j'=1}^{J'} ∼ f(ỹ_l^+ | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{r,C_l}^+) are computed as described in Appendix D. This only involves communication between neighboring CAs.

The key question at this point is whether the quantities f(ỹ_l^{+(j,j')} | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{r,C_l}^+), \partial f(ỹ_l^{+(j,j')} | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{C_l}^+)/\partial u_l^+ |_{u_{C_l}^+ = u_{r,C_l}^+}, and f(y^{+(j,j')} | x_C^{(j'')}, x_T^{+(j'')}; u_r^+) (and, in particular, f(y^{+(j,j')} | x_C^{(j)}, x_T^{+(j)}; u_r^+)) involved in (33) and (34) are locally available at CA l. The factors of f(ỹ_l^+ | x_l, x_{C_l}, x_{T_l}^+; u_{r,C_l}^+) (see (32)) correspond to measurements to be acquired by CA l or by its neighbor CAs l' ∈ C_l; they are known to CA l since its own state evolution and measurement models and those of its neighbors are known to CA l (cf. (30)). Thus, we conclude that the f(ỹ_l^{+(j,j')} | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{r,C_l}^+) are available at CA l. On the other hand, many of the factors of f(y^+ | x_C, x_T^+; u_r^+) (see (29)) correspond to measurements to be acquired by CAs that are not in the neighborhood of CA l; they are not known to CA l since, typically, the respective state evolution and measurement models are unknown to CA l. Therefore, the f(y^{+(j,j')} | x_C^{(j'')}, x_T^{+(j'')}; u_r^+) are not available at CA l.

We will now present a distributed computation of f(y^{+(j,j')} | x_C^{(j'')}, x_T^{+(j'')}; u_r^+). Let y_l^+ denote the subvector of y^+ = (y_{l,k}^+)_{l∈C, k∈A_l} in (15) that comprises the measurements acquired by CA l at the next time, i.e.,

y_l^+ ≜ (y_{l,k}^+)_{k∈A_l}.    (37)

The likelihood function of y_l^+ combines all the factors in (29) that involve the entries of y_l^+, i.e.,

f(y_l^+ | x_l, x_{C_l}, x_{T_l}^+; u_{C_l}^+) = \prod_{l'∈C_l} f(y_{l,l'}^+ | x_l, x_{l'}; u_l^+, u_{l'}^+) \prod_{m∈T_l} f(y_{l,m}^+ | x_l, x_m^+; u_l^+).    (38)

Using (29) and (38), one can show that

f(y^{+(j,j')} | x_C^{(j'')}, x_T^{+(j'')}; u_r^+) = \exp\big( |C| \, \bar{F}_{j,j',j''} \big),    (39)

where

\bar{F}_{j,j',j''} ≜ \frac{1}{|C|} \sum_{l∈C} F_{j,j',j''}^{(l)}    (40)

with

F_{j,j',j''}^{(l)} ≜ \log f\big( y_l^{+(j,j')} \big| x_l^{(j'')}, x_{C_l}^{(j'')}, x_{T_l}^{+(j'')}; u_{r,C_l}^+ \big),    (41)

for j = 1, …, J, j' = 1, …, J', and j'' = 1, …, J. To compute F_{j,j',j''}^{(l)} in (41), CA l needs samples (y_l^{+(j,j')})_{j'=1}^{J'} ∼ f(y_l^+ | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{r,C_l}^+) and the reference vectors u_{r,l'}^+ for l' ∈ \{l\} ∪ C_l. The samples (y_l^{+(j,j')})_{j'=1}^{J'} are already available at CA l since y_l^+ is a subvector of ỹ_l^+ (see (31) and (37)) and samples (ỹ_l^{+(j,j')})_{j'=1}^{J'} ∼ f(ỹ_l^+ | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{r,C_l}^+) have already been computed as described above. The u_{r,l'}^+ can be obtained at CA l through communication with the neighbor CAs l' ∈ C_l.

Once the F_{j,j',j''}^{(l)} have been calculated at CA l, their averages \bar{F}_{j,j',j''} in (40) can be computed in a distributed way by using J²J' parallel instances of an average consensus or gossip scheme [26], [27].
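The identity (39)–(40) reduces the globally coupled likelihood to an average of local log-likelihood terms, which is exactly the quantity an average consensus computes. The following numeric sketch uses hypothetical per-CA values F^(l) and a line-graph topology (both illustrative assumptions, not from the paper) to check the identity and the consensus iteration.

```python
import math

# Hypothetical per-CA local log-likelihood values F^(l) (cf. (41)).
F_local = {1: -2.0, 2: -0.5, 3: -1.5, 4: -3.0}
C = list(F_local)

# Direct product of the local likelihood factors ...
direct = 1.0
for l in C:
    direct *= math.exp(F_local[l])

# ... equals exp(|C| * average of the local log terms), as in (39)-(40).
F_bar = sum(F_local.values()) / len(C)
via_avg = math.exp(len(C) * F_bar)

# The average itself can be computed distributedly: synchronous average
# consensus on the line graph 1-2-3-4 with step size 0.3 (< 1/max degree).
nbrs = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
z = dict(F_local)
for _ in range(200):
    z = {l: z[l] + 0.3 * sum(z[n] - z[l] for n in nbrs[l]) for l in C}
print(direct, via_avg, z[1])   # all nodes' z converge to F_bar = -1.75
```

After enough iterations every node holds the network-wide average, so each CA can reconstruct the global likelihood value from purely local exchanges.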
These schemes are iterative; they are initialized at each CA l with F_{j,j',j''}^{(l)}. They are robust to communication link failures [26], [27] and use only communication between neighbor CAs (i.e., each CA l ∈ C transmits data to each neighbor l' ∈ C_l). After convergence of the consensus or gossip scheme, \bar{F}_{j,j',j''} and, hence, f(y^{+(j,j')} | x_C^{(j'')}, x_T^{+(j'')}; u_r^+) for all j, j', j'' is available at each CA l.

At this point, CA l has available f(ỹ_l^{+(j,j')} | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{r,C_l}^+) and \partial f(ỹ_l^{+(j,j')} | x_l^{(j)}, x_{C_l}^{(j)}, x_{T_l}^{+(j)}; u_{C_l}^+)/\partial u_l^+ |_{u_{C_l}^+ = u_{r,C_l}^+}, and an approximation of f(y^{+(j,j')} | x_C^{(j'')}, x_T^{+(j'')}; u_r^+) has been provided by the consensus or gossip scheme, for j = 1, …, J, j' = 1, …, J', and j'' = 1, …, J. Therefore, CA l is now able to evaluate (33) and (34).

In the course of the overall distributed computation, CA l transmits J²J'|C_l|R + JJ'|C_l|M_y + JM + M_u ≈ J²J'|C_l|R real values, where R is the number of iterations used for one instance of the consensus or gossip scheme and M_y is the dimension of the vectors y_{l,k}. Asymptotically, for R → ∞, this distributed computation of \partial D_I(u^+)/\partial u^+ |_{u^+ = u_r^+} converges to the exact centralized result given by (33) and (34). The speed of convergence depends on the topology and size of the network [26], [27]. As R increases, the information available at each agent converges, which means that local data is disseminated over large distances in the network. However, because the control vector of a given CA might not be strongly affected by information from far away CAs, a small R might be sufficient for good performance. Because the communication requirements are proportional to J²J', they are typically higher than those of the flooding-based scheme discussed in Section VI-A unless the network is large and R is small.

Finally, the computational complexity per CA of the distributed processing—i.e., evaluation of (33) and (34), with J, J', and R fixed—is constant in the number of agents in the network.

VII. CALCULATION OF THE GRADIENT OF G_l(u_l^+)

Next, we consider the second gradient in the expansion (27), i.e., \partial G_l(u_l^+)/\partial u_l^+ |_{u_l^+ = u_{r,l}^+}. Using (22), we obtain

\frac{\partial G_l(u_l^+)}{\partial u_l^+}\Big|_{u_l^+ = u_{r,l}^+} = \int f(x_l) \, \frac{\partial \log |J_{g̃_l}(x_l; u_l^+)|}{\partial u_l^+}\Big|_{u_l^+ = u_{r,l}^+} dx_l = \int f(x_l) \, \frac{1}{|J_{g̃_l}(x_l; u_{r,l}^+)|} \, \frac{\partial |J_{g̃_l}(x_l; u_l^+)|}{\partial u_l^+}\Big|_{u_l^+ = u_{r,l}^+} dx_l.    (42)

Here, we assumed that |J_{g̃_l}(x_l; u_l^+)| is continuous and satisfies |f(x_l) \, \partial \log|J_{g̃_l}(x_l; u_l^+)|/\partial u_l^+| ≤ α(x_l, u_l^+) for all (x_l, u_l^+), for some function α(x_l, u_l^+) ≥ 0 that is integrable with respect to x_l for each u_l^+ [46, Cor. 5.9]. Furthermore, we assumed that for each value of x_l, |J_{g̃_l}(x_l; u_l^+)| is differentiable with respect to u_l^+ at u_{r,l}^+. A sufficient condition is that J_{g̃_l}(x_l; u_l^+) is differentiable with respect to u_l^+ at u_{r,l}^+ and nonzero for all u_l^+ in some (arbitrarily small) neighborhood of u_{r,l}^+.

Based on the samples (x_l^{(j)})_{j=1}^J ∼ f(x_l) that were calculated in the estimation layer, a Monte Carlo approximation of (42) is obtained as

\frac{\partial G_l(u_l^+)}{\partial u_l^+}\Big|_{u_l^+ = u_{r,l}^+} ≈ \frac{1}{J} \sum_{j=1}^{J} \frac{1}{|J_{g̃_l}(x_l^{(j)}; u_{r,l}^+)|} \, \frac{\partial |J_{g̃_l}(x_l^{(j)}; u_l^+)|}{\partial u_l^+}\Big|_{u_l^+ = u_{r,l}^+}.    (43)

For many practically relevant state evolution models (20), the computation of \partial G_l(u_l^+)/\partial u_l^+ |_{u_l^+ = u_{r,l}^+} can be avoided altogether or can be carried out in closed form, without a sample-based approximation. Some examples are considered in the following.

1) J_{g̃_l}(x_l; u_l^+) does not depend on u_l^+: In this case, \partial G_l(u_l^+)/\partial u_l^+ = 0. An important example is the "linear additive" state evolution model g̃_l(x_l, u_l^+) = A x_l + ζ(u_l^+) with some matrix A and function ζ(·) of suitable dimensions. Here, we obtain J_{g̃_l}(x_l; u_l^+) = \det A and thus \partial G_l(u_l^+)/\partial u_l^+ = 0. A second important example is the odometry motion model [47, Sec. 5.3]. Here, the local state x_l is the pose of a robot, which consists of the 2D position (x_{l,1}, x_{l,2}) and the orientation θ_l, and the control vector u_l consists of the translational velocity ν_l and the rotational velocity ω_l. The state evolution model is given by

g̃_l(x_l, u_l^+) = \begin{bmatrix} x_{l,1} + ν_l^+ \cos(θ_l + ω_l^+) \\ x_{l,2} + ν_l^+ \sin(θ_l + ω_l^+) \\ θ_l + ω_l^+ \end{bmatrix}.

Here, J_{g̃_l}(x_l; u_l^+) = 1 and thus \partial G_l(u_l^+)/\partial u_l^+ = 0.

2) J_{g̃_l}(x_l; u_l^+) does not depend on x_l: If J_{g̃_l}(x_l; u_l^+) = J_{g̃_l}(u_l^+), then (22) simplifies to G_l(u_l^+) = \log |J_{g̃_l}(u_l^+)|. Thus, we have

\frac{\partial G_l(u_l^+)}{\partial u_l^+} = \frac{1}{|J_{g̃_l}(u_l^+)|} \, \frac{\partial |J_{g̃_l}(u_l^+)|}{\partial u_l^+},

which can be calculated in closed form.

VIII. TWO SPECIAL CASES

A. Cooperative Estimation of Local States

Here, we assume that there are no targets, and thus the task considered is only the distributed, cooperative estimation of the local states.

1) Estimation Layer: The marginal posteriors corresponding to the targets are no longer calculated.
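The unit-Jacobian property of the odometry motion model stated above can be checked numerically. The sketch below (illustrative pose and control values) evaluates the state Jacobian of the odometry model by finite differences; its determinant is 1 regardless of the pose and control, so G_l is constant and its gradient vanishes.

```python
import math

def odometry(x, u):
    """Odometry motion model: x = (x1, x2, theta), u = (nu, omega)."""
    x1, x2, th = x
    nu, om = u
    return (x1 + nu * math.cos(th + om), x2 + nu * math.sin(th + om), th + om)

def jac_det_3d(g, x, u, eps=1e-6):
    """Numerical Jacobian d g / d x (3x3) via central differences, then det."""
    J = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += eps; xm[i] -= eps
        gp, gm = g(xp, u), g(xm, u)
        J.append([(gp[k] - gm[k]) / (2 * eps) for k in range(3)])
    # J[i][k] = d g_k / d x_i; determinant is invariant under transpose
    a, b, c = J[0]
    d, e, f = J[1]
    g3, h, i3 = J[2]
    return a * (e * i3 - f * h) - b * (d * i3 - f * g3) + c * (d * h - e * g3)

det = jac_det_3d(odometry, (1.0, 2.0, 0.7), (0.5, 0.3))
print(det)   # ≈ 1 for any pose and control
```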
In the calculation of the marginal posterior of CA l, the correction step (10) simplifies to

f(x_l | y) ∝ f(x_l) \int \prod_{l'∈C} \prod_{l_1∈C_{l'}} f(y_{l_1,l'} | x_{l_1}, x_{l'}) \, dx_{∼l},    (44)

while the prediction step (8) remains unchanged. A feasible and, typically, accurate approximation of f(x_l | y) in (44) can be obtained by evaluating

b^{(p)}(x_l) ∝ f(x_l) \prod_{l'∈C_l} \int f(y_{l,l'} | x_l, x_{l'}) \, b^{(p-1)}(x_{l'}) \, dx_{l'}    (45)

for iteration index p = 1, …, P, where b^{(0)}(x_{l'}) = f(x_{l'}), l' ∈ C_l. This amounts to the BP-based SPAWN scheme presented in [17]. All quantities involved in (45) are locally available at CA l or can be made available by communicating only with the neighbor CAs l' ∈ C_l. A sample-based implementation of (45) is discussed in [37] and [48].

2) Control Layer: Since there are no targets, the component D_I(u^+) = I(x_C, x_T^+; y^+; u^+) of the objective function in (26) simplifies to D_I(u^+) = I(x_C; y^+; u^+). The expression of the gradient of D_I(u^+) in (33) and (34) simplifies as well because f(ỹ_l^+ | x_l, x_{C_l}, x_{T_l}^+; u_{C_l}^+) = f(ỹ_l^+ | x_l, x_{C_l}; u_{C_l}^+) and f(y^+ | x_C, x_T^+; u^+) = f(y^+ | x_C; u^+) (according to (32) and (29), since T = ∅); furthermore, sampling from f(x_C, x_T^+) (see Section VI) reduces to sampling from f(x_C).

B. Cooperative Estimation of Global States

Next, we discuss the case where the local states of the CAs are known, and thus our task is only the distributed, cooperative estimation of the target states.

1) Estimation Layer: The marginal posteriors corresponding to the CAs are no longer calculated, and the correction step (10) in the calculation of the marginal posterior of the mth target simplifies to

f(x_m | y) ∝ f(x_m) \prod_{l∈C_m} f(y_{l,m} | x_l, x_m),    (46)

where f(x_m) is calculated according to (9). A computationally feasible sample-based approximation of sequential state estimation as given by (46) and (9) is provided by the particle filter [7], [40], [49].

The product of local likelihood functions \prod_{l∈C_m} f(y_{l,m} | x_l, x_m) is not available at the CAs. However, as in Section III-B, an approximation of these products can be provided to each CA in a distributed manner by a consensus (or gossip) algorithm performed in parallel for each sample weight [25], [38], [39] or by the likelihood consensus scheme [7], [24], [40].

For the calculations in the control layer (described presently), a common set of samples is required at each CA. This can be ensured by additionally using, e.g., a max-consensus and providing all CAs with the same seed for random number generation [39], [41].

2) Control Layer: Since there are no unknown CA states, the objective function in (26) simplifies in that D_I(u^+) = I(x_T^+; y^+; u^+) and G_l(u_l^+) = 0 for all l ∈ C. The expression of the gradient of D_I(u^+) in (33) and (34) simplifies because f(ỹ_l^+ | x_l, x_{C_l}, x_{T_l}^+; u_{C_l}^+) = f(ỹ_l^+ | x_l, x_{T_l}^+; u_{C_l}^+) and f(y^+ | x_C, x_T^+; u^+) = f(y^+ | x_T^+; u^+); furthermore, sampling from f(x_C, x_T^+) reduces to sampling from f(x_T^+).

This special case was previously considered in [15]. More specifically, [15] studied estimation of one static global state and proposed a distributed, gradient-based, information-seeking controller and a sample-based implementation.

IX. SIMULATION RESULTS

We demonstrate the performance of the proposed method for three different localization scenarios. In Section IX-B, we study the behavior of the controller by considering noncooperative self-localization of four mobile CAs based on distance measurements relative to an anchor. In Section IX-C, we consider cooperative self-localization of three mobile CAs. Finally, in Section IX-D, two mobile CAs perform cooperative simultaneous self-localization and tracking of a target. Further simulation results demonstrating the performance of the estimation layer in larger networks are reported in [25]. Simulation source files and animated plots are available at http://www.nt.tuwien.ac.at/about-us/staff/florian-meyer/.

A. Simulation Setup

The following aspects of the simulation setup are common to all three scenarios. The states of the CAs consist of their 2D position, i.e., x_l^{(n)} ≜ [x_{l,1}^{(n)}, x_{l,2}^{(n)}]^T in a global reference frame. In addition to the mobile CAs, there is one anchor CA (indexed by l = 1), i.e., a static CA that broadcasts its own (true) position to the mobile CAs but does not perform any measurements. The CA network is fully connected. The states of the mobile CAs evolve independently according to [35]

x_l^{(n)} = x_l^{(n-1)} + u_l^{(n)} + q_l^{(n)},   n = 1, 2, … .    (47)

Here, q_l^{(n)} ∈ R² is zero-mean Gaussian with independent and identically distributed entries, i.e., q_l^{(n)} ∼ N(0, σ_q² I) with σ_q² = 10^{-3}, and q_l^{(n)} and q_{l'}^{(n')} are independent unless (l, n) = (l', n'). The admissible set U_l of the control vector u_l^{(n)} is defined by the norm constraint ‖u_l^{(n)}‖ ≤ u_l^max. For the interpretation of u_l^{(n)} within (47), it is assumed that the CAs know the orientation of the global reference frame. In the initialization of the algorithms, at time n = 0, we use a state prior that is uniform on [−200, 200] × [−200, 200].
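A controller respecting the norm constraint ‖u‖ ≤ u_max typically takes a gradient-ascent step from the reference point u_r = 0 scaled to maximal norm (cf. (19)). The minimal sketch below is illustrative; the gradient value is an assumed placeholder, not an output of the paper's gradient computation.

```python
import math

def control_step(grad, u_max):
    """Scale a local 2-D gradient so that the resulting control has norm u_max:
    with reference point u_r = 0, u = c * grad with c chosen so ||u|| = u_max."""
    norm = math.hypot(grad[0], grad[1])
    if norm == 0.0:
        return (0.0, 0.0)          # no preferred direction; stay put
    return (u_max * grad[0] / norm, u_max * grad[1] / norm)

u = control_step((3.0, 4.0), u_max=1.0)
print(u)   # unit-norm step in the gradient direction: (0.6, 0.8)
```

The agent thus always moves at its maximum nominal speed, with only the direction determined by the information-seeking objective.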
The mobile CAs acquire distance measurements according to (4), i.e., y_{l,k}^{(n)} = ‖x_l^{(n)} − x_k^{(n)}‖ + v_{l,k}^{(n)}, where the measurement noise v_{l,k}^{(n)} is independent across l, k, and n and Gaussian with variance

σ_{l,k}^{(n)2} = \begin{cases} σ_0^2, & \|x_l^{(n)} - x_k^{(n)}\| \le d_0 \\ σ_0^2 \big[ \big( \|x_l^{(n)} - x_k^{(n)}\| / d_0 - 1 \big)^{κ} + 1 \big], & \|x_l^{(n)} - x_k^{(n)}\| > d_0. \end{cases}    (48)

That is, σ_{l,k}^{(n)2} is a function of ‖x_l^{(n)} − x_k^{(n)}‖ that stays constant up to some distance d_0 and then increases polynomially with some exponent κ. This is a simple model for time-of-arrival distance measurements [50]. We set σ_0² = 50 and κ = 2 and, if not stated otherwise, d_0 = 50.

In the estimation layer, we use J = 3,600 samples. (Choosing J below 3,000 was observed in some rare cases to lead to convergence to a wrong estimate.) A resampling step is performed to avoid weight degeneracy [51]. Resampling transforms weighted samples (x̃_k^{(n)(j)}, w_k^{(n)(j)})_{j=1}^J representing the belief b(x_k^{(n)}) into nonweighted samples (x_k^{(n)(j)})_{j=1}^J. (We note that weighted samples arise in the estimation layer, as discussed in [24], [25].) We use a somewhat nonorthodox type of resampling that helps move samples to positions with high probability mass, thereby reducing the number of samples needed. More specifically, at every Lth time step n, we sample from a kernel approximation of the belief; at all other time steps, we perform standard systematic resampling [51]. The kernel approximation of the belief b(x_k^{(n)}) is obtained as [52]

b̃(x_k^{(n)}) = \sum_{j=1}^{J} w_k^{(n)(j)} K\big( x_k^{(n)} - x̃_k^{(n)(j)} \big),

with the Gaussian kernel K(x) = (2π σ_K²)^{-1} \exp(−‖x‖² / (2σ_K²)). Here, the variance σ_K² is chosen as σ_K² = J^{-1/3} T_k^{(n)}/2 if T_k^{(n)} < 2σ_0² and σ_K² = σ_0² otherwise, where T_k^{(n)} denotes the trace of the weighted sample covariance matrix defined as

C_k^{(n)} = \sum_{j=1}^{J} w_k^{(n)(j)} x̃_k^{(n)(j)} x̃_k^{(n)(j)T} - μ_k^{(n)} μ_k^{(n)T},

with μ_k^{(n)} = \sum_{j=1}^{J} w_k^{(n)(j)} x̃_k^{(n)(j)}. This case distinction in choosing σ_K² is used since σ_K² = J^{-1/3} T_k^{(n)}/2 is only accurate for a unimodal distribution [52] whereas σ_K² = σ_0² is suitable for annularly shaped distributions [48]. We choose L = 40 if T_k^{(n)} < 80, L = 20 if 80 ≤ T_k^{(n)} < 1,000, and L = 10 if T_k^{(n)} ≥ 1,000; this led to good results in our simulation setting.

We employ a censoring scheme [37] to reduce the number of samples and avoid numerical problems during the first time steps, where the mobile CAs still have uninformative beliefs. More specifically, only CAs l with T_l^{(n)} < 10 are used as localization partners by neighbor CAs and (in our third scenario) are involved in localizing the target. In the control layer, this censoring scheme corresponds to the following strategy: as long as CA l is not localized (i.e., T_l^{(n)} ≥ 10), its objective function is D̃_h(u_l^{(n+1)}) ≜ −h(x_l^{(n+1)} | y_{l,1}^{(n+1)}; y_{l,1}^{(1:n)}, u_l^{(1:n+1)}), i.e., the negative differential entropy of only the own state conditioned on only the own measurement relative to the anchor CA, y_{l,1}^{(n+1)}.

The local gradient ascents in the controller (see (19)) use the reference points u_{r,l}^{(n)} = 0, which are consistent with the state evolution model (47), and step sizes c_l^{(n)} chosen such that ‖u_l^{(n)}‖ = u_l^max. Thus, each CA l ∈ C moves with maximum nominal speed (determined by u_l^max) in the direction of maximum local increase of the objective function. If not stated otherwise, the number of samples used in the control layer is JJ' = 60,000, with J = 1,200 and J' = 50. (The J = 1,200 samples are obtained by random selection from the 3,600 samples produced by the estimation layer.) We note that a reduction of J' to J' = 1 was observed to result in more jagged CA trajectories and a slightly slower reduction of the estimation error over time.

B. Noncooperative Self-Localization

To study the behavior of the controller, we consider four mobile CAs l = 2, 3, 4, 5 that perform self-localization without any cooperation during 300 time steps n. The mobile CAs measure their distance to the static anchor CA (l = 1), which is located at position [0, 0]^T, but they do not measure any distances between themselves. Their measurement models use different values of d_0, namely d_0 = 20, 50, 100, and 100 for l = 2, 3, 4, and 5, respectively. The mobile CAs start at position [100, 0]^T and move with identical nominal speed determined by u_l^max = 1. The objective function for the control of CAs 2, 3, and 4 is D̃_h(u_l^{(n+1)}) ≜ −h(x_l^{(n+1)} | y_{l,1}^{(n+1)}; y_{l,1}^{(1:n)}, u_l^{(1:n+1)}). CA 5 is not controlled; it randomly chooses a direction at time n = 1 and then moves in that direction with constant nominal speed determined by u_l^max = 1.

Fig. 5. Example trajectories for noncooperative self-localization with information-seeking control (except CA 5). The initial CA position and the anchor position are indicated by a bullet and a star, respectively.

Fig. 5 shows an example of the trajectories of the four mobile CAs. These trajectories are quite different because of the different values of d_0 and the fact that CA 5 is not controlled. CA 4, after an initial turn, is roughly localized in the sense that the shape of its marginal posterior has changed from an annulus to only a segment of an annulus.
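The kernel-based resampling step described above can be sketched as follows. The toy weights and particle positions are illustrative assumptions; the bandwidth rule follows the case distinction given in the text (J^{-1/3} T/2 versus σ_0²).

```python
import math, random

random.seed(7)

def kernel_resample(samples, weights, J_out, sigma0_sq=50.0):
    """Draw unweighted samples from the Gaussian-kernel mixture approximation
    of a weighted 2-D particle belief, using the trace-dependent bandwidth rule."""
    J = len(samples)
    mu = [sum(w * s[d] for w, s in zip(weights, samples)) for d in range(2)]
    # trace of the weighted sample covariance matrix
    T = sum(w * ((s[0] - mu[0])**2 + (s[1] - mu[1])**2)
            for w, s in zip(weights, samples))
    var_K = J**(-1/3) * T / 2 if T < 2 * sigma0_sq else sigma0_sq
    std_K = math.sqrt(var_K)
    out = []
    for _ in range(J_out):
        s = random.choices(samples, weights=weights)[0]  # pick a kernel center
        out.append((random.gauss(s[0], std_K), random.gauss(s[1], std_K)))
    return out

w = [0.1, 0.2, 0.3, 0.4]
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new = kernel_resample(pts, w, J_out=500)
print(len(new))
```

Sampling from the kernel mixture rather than duplicating particles jitters the new samples toward regions of high probability mass, which is the effect the text attributes to this nonorthodox resampling.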
Thereafter, CA 4 turns around the anchor, which is reasonable in view of the single distance measurement available at each time n and the fact that, since d_0 = 100, the measurement noise cannot be decreased by approaching the anchor. CA 3 (with d_0 = 50) initially approaches the anchor. At a distance of 50 to the anchor, the measurement noise cannot be decreased any more, and thus CA 3 turns around the anchor without approaching it further. A similar behavior is exhibited by CA 2 (with d_0 = 20).

Fig. 6. Self-localization RMSE for noncooperative self-localization with information-seeking control (except CA 5).

Fig. 6 shows the self-localization root-mean-square errors (RMSEs) of the four mobile CAs. These RMSEs were determined at each time n by averaging over 300 simulation runs. As can be seen, the three CAs performing information-seeking control (l = 2, 3, 4) are fairly well localized after about 100 time steps. CA 2 (with d_0 = 20) takes longer to localize itself than CAs 3 and 4 since, prior to reaching a distance of 20 to the anchor, it has a larger noise variance (see (48)). The performance of CA 3 and CA 4 is almost identical; the larger noise variance of CA 3 during the initial time steps is compensated by a smaller turning radius once a distance of 50 to the anchor has been reached. CA 5 is unable to localize itself, due to the lack of intelligent control.

C. Cooperative Self-Localization

Next, we study the proposed method for cooperative self-localization with information-seeking control (abbreviated as C-C). There are three mobile CAs l = 2, 3, and 4 with different start points ([−50, 0]^T, [0, −50]^T, and [0, 70]^T, respectively) and different nominal speeds (u_l^max = 1, 0.3, and 0.1, respectively). The mobile CAs measure their distances to a static anchor l = 1 located at [−60, 0]^T and to each other, using d_0 = 50.

Fig. 7. Example trajectories for cooperative self-localization with information-seeking control (C-C scheme). The initial CA positions and the anchor position are indicated by bullets and a star, respectively.

Example trajectories are shown in Fig. 7. For comparison, we also consider noncooperative self-localization with information-seeking control as studied in Section IX-B (abbreviated as N-C). Finally, we consider another scheme (abbreviated as C-N) where the CAs cooperate in the estimation layer but no intelligent control is performed. Here, each CA randomly chooses a direction and then moves in that direction with constant nominal speed determined by u_l^max.

Fig. 8. Self-localization RMSE of the proposed estimation/control method and of two reference methods.

Fig. 8 shows the self-localization RMSEs of the three schemes, which were determined by averaging over the three mobile CAs and over 300 simulation runs. It is seen that the RMSEs of the two reference schemes N-C and C-N decrease only very slowly whereas, after about 100 time steps, the RMSE of the proposed C-C scheme has decreased to a low value. This behavior can be explained as follows. Without cooperation (N-C) or without intelligent control (C-N), CAs 3 and 4 need a long time to localize themselves because they are slow and initially far away from the anchor. On the other hand, CA 2 localizes itself very quickly because it is fast and initially close to the anchor. With cooperation and control (C-C), CA 2 moves in such a way that it supports the self-localization of the two other CAs. In fact, as shown by Fig. 7, CA 2 first localizes itself by starting to turn around the anchor and then makes a sharp turn to approach CAs 3 and 4, which helps them localize themselves. This demonstrates the function and benefits of cooperative estimation and control.

D. Cooperative Self-Localization and Target Tracking

Finally, we consider cooperative simultaneous self-localization and target tracking. Two mobile CAs l = 2, 3 starting at position [20, 20]^T and [−10, −10]^T, respectively,
and with nominal speed determined by u_l^max = 1 cooperatively localize and track themselves and a mobile target. There is also a static anchor l = 1 at position [−50, 0]^T. The target state x_m^{(n)} = x_4^{(n)} consists of position and velocity, i.e., x_4^{(n)} ≜ [x_{4,1}^{(n)}, x_{4,2}^{(n)}, ẋ_{4,1}^{(n)}, ẋ_{4,2}^{(n)}]^T. The target state evolves according to [35]

x_4^{(n)} = G x_4^{(n-1)} + W q_4^{(n)},   n = 1, 2, … ,

where

G = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},   W = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \\ 1 & 0 \\ 0 & 1 \end{bmatrix},

and q_4^{(n)} ∈ R² is zero-mean Gaussian with independent and identically distributed entries, i.e., q_4^{(n)} ∼ N(0, σ̃_q² I) with σ̃_q² = 10^{-5}, and with q_4^{(n)} and q_4^{(n')} independent unless n = n'. The target trajectory is initialized with position [x_{4,1}^{(0)}, x_{4,2}^{(0)}]^T = [50, 0]^T and velocity [ẋ_{4,1}^{(0)}, ẋ_{4,2}^{(0)}]^T = [0.05, 0.05]^T. In the initialization of the algorithms, we use a target position prior that is uniform on [−200, 200] × [−200, 200] and a target velocity prior that is Gaussian with mean [0, 0]^T and covariance matrix diag{10^{-1}, 10^{-1}}. The number of samples used in the estimation layer is J = 120,000; the number of samples used in the control layer is JJ' = 6,000, with J = 1,200 and J' = 5.

Fig. 9. Example trajectories for cooperative simultaneous self-localization and target tracking with information-seeking control (C-C scheme). The initial CA positions are indicated by bullets, the initial target position by a cross, and the anchor position by a star.

Fig. 9 shows an example of CA and target trajectories obtained with the proposed method for cooperative localization with information-seeking control (C-C). One can observe that the two CAs first start turning around the anchor to localize themselves. For comparison, we also consider noncooperative self-localization and target tracking with information-seeking control (N-C) and cooperative localization with fixed, randomly chosen directions of movement (C-N). Fig. 10 shows the self-localization RMSEs and target localization RMSEs of the three schemes, which were determined by averaging over the two CAs and over 100 simulation runs. The following observations can be made:

• The self-localization performance of C-N is very poor: after an initial decrease, the RMSE slowly increases. In fact, typically, no cooperation actually takes place, since the CAs are unable to localize themselves and thus each CA is censored by the respective other CA. The self-localization RMSEs of C-C and N-C decrease rather quickly to a low value. They are very similar, which can be explained as follows. Because both CAs move with the same nominal speed, they localize themselves approximately in the same manner. Therefore, as long as the CAs are not localized, no cooperation takes place due to censoring, and after they are localized, no further gain can be achieved by cooperation.

• The target localization RMSEs of the three methods are initially equal to 50 and slowly increase during the first 40 time steps. Indeed, due to the censoring scheme, the CAs start localizing the target only when they are localized themselves. Therefore, during the first 40 time steps, no measurements of the distance to the target are used by the CAs, and thus the CAs' target position estimation is solely based on the prior distribution, which is uniform. This leads to a target position estimate of [0, 0]^T and in turn (since the target is initially located at [50, 0]^T) to an initial target localization RMSE of 50 at time n = 1. During the first 40 time steps, the RMSE slowly increases since the target slowly moves away from [0, 0]^T. The RMSE of C-N continues to increase in this manner even after n = 40 since with C-N, the CAs are never localized and therefore never start localizing the target. For C-C and N-C (both employing information-seeking control), after n = 40, the RMSE first increases and then decreases. The RMSE of C-C decreases sooner and more quickly than that of N-C, which again shows the benefits of cooperative estimation.

The initial increase and subsequent decrease of the target localization RMSE observed with C-C and N-C after n = 40 can be explained as follows. After the CAs localized themselves and start localizing the target, the target position posterior at a given CA is roughly annularly shaped, with the center of the annulus being the CA position. (This position is equal to the turning point of the respective CA trajectory in Fig. 9.) The resulting target position estimate is located at that center.
themselves and then approach the target. Finally, at a distance Thus, it is more distant from the true target position
of 50 to the target, where further approaching the target would than the estimate [0, 0]T that was obtained when the CA
no longer decrease the measurement noise, the CAs spread was not yet localized and the target position posterior
out to achieve a geometric formation that is favorable for was still uniform. As the CAs approach the target, the
cooperatively localizing and tracking the target. target position posterior becomes unimodal and the target
As before, we compare our C-C method with two ref- can be localized, resulting in a decrease of the target
erence methods, namely, noncooperative localization with localization RMSE.
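For concreteness, the constant-velocity target motion model of this simulation scenario can be sketched in a few lines; the following is an illustrative NumPy reimplementation (not the authors' simulation code), and the horizon of 100 time steps is an arbitrary choice for the example.

```python
import numpy as np

# Illustrative sketch (not the authors' code) of the target motion model:
# x4^(n) = G x4^(n-1) + W q4^(n), with q4^(n) ~ N(0, sigma_q^2 I).
G = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
W = np.array([[0.5, 0.0],
              [0.0, 0.5],
              [1.0, 0.0],
              [0.0, 1.0]])
sigma2_q = 1e-5                        # driving-noise variance from the setup

rng = np.random.default_rng(0)
x = np.array([50.0, 0.0, 0.05, 0.05])  # initial [position; velocity]
trajectory = [x.copy()]
for n in range(100):                   # 100-step horizon (arbitrary here)
    q = rng.normal(0.0, np.sqrt(sigma2_q), size=2)  # driving noise q4^(n)
    x = G @ x + W @ q
    trajectory.append(x.copy())
trajectory = np.array(trajectory)      # shape (101, 4)
```

With the small noise variance $\tilde{\sigma}_q^2 = 10^{-5}$, the simulated target drifts nearly deterministically at about 0.05 per step in each coordinate, matching the slow motion away from the initial position described in the text.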

[Fig. 10. Performance of three different methods for simultaneous self-localization and target localization. (a) Self-localization RMSE. (b) Target-localization RMSE.]

X. CONCLUSION

We proposed a Bayesian framework and method for distributed estimation with information-seeking control in agent networks. Distributed, cooperative estimation is performed for time-varying global states (related to noncooperative targets or features of the environment) and/or time-varying local states (related to individual cooperative agents), using a combination of belief propagation message passing and consensus. The distributed, cooperative control seeks to optimize the behavior of the cooperative agents by maximizing the negative joint posterior entropy of the agent states via a gradient ascent. A probabilistic information transfer from the estimation layer to the control layer enables effective control strategies and thus leads to excellent estimation performance.

A major advantage of the proposed approach is its generality. Our method relies on general state evolution and measurement models, an information-theoretic objective function for control, and sample-based representations of probability distributions. These characteristics make it suitable for nonlinear and non-Gaussian systems, such as those arising in location-aware networks. Numerical simulations for a simultaneous agent self-localization and target tracking problem demonstrated intelligent behavior of the cooperative agents and a resulting improvement of estimation performance.

Possible directions for future research include an extension of the myopic controller (i.e., optimizing only one time step ahead) to a receding horizon [53]; this can be expected to improve the performance in scenarios with multiple time-varying global states. Furthermore, the complexity and communication cost of the proposed method can be reduced by introducing Gaussian or Gaussian mixture approximations [54] and using cubature points [55] instead of random samples.

APPENDIX A
PROOF OF EQUATION (21)

We will use the following transformation rule for differential entropy [56, Eq. 18]: For a continuous random vector $a$ and a transformed random vector of identical dimension $b = g(a)$, where $g(\cdot)$ is a bijective differentiable function with Jacobian determinant $J_g(a) = \det\!\big(\frac{\partial g(a)}{\partial a}\big)$,

$h(b) = h(a) + e(a), \quad \text{with } e(a) \triangleq \int f(a) \log |J_g(a)| \, da.$   (49)

The conditional differential entropy $h(x^+|y^+; u^+)$ can be expanded as [32, Chap. 8]

$h(x^+|y^+; u^+) = h(x^+, y^+; u^+) - h(y^+; u^+).$   (50)

The vector $x^+$ consists of $x_l^+$ and $x_{\mathcal{A}\setminus\{l\}}^+ \triangleq [x_k^+]_{k\in\mathcal{A}\setminus\{l\}}$, and there is $x_l^+ = \tilde{g}_l(x_l, u_l^+)$ (see (20)). Thus, the first term on the right-hand side of (50) can be expressed as $h(x^+, y^+; u^+) = h\big(\tilde{g}_l(x_l, u_l^+), x_{\mathcal{A}\setminus\{l\}}^+, y^+; u^+\big)$. Applying the transformation rule (49) to the "extended state evolution mapping" $\tilde{g}_l^* : [x_l^T, x_{\mathcal{A}\setminus\{l\}}^{+T}, y^{+T}]^T \mapsto [\tilde{g}_l(x_l, u_l^+)^T, x_{\mathcal{A}\setminus\{l\}}^{+T}, y^{+T}]^T$, we then obtain

$h(x^+, y^+; u^+) = h\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u^+\big) + e\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big),$   (51)

where

$e\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big) \triangleq \int f\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+\big) \log \big|J_{\tilde{g}_l^*}\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big)\big| \, dx_l \, dx_{\mathcal{A}\setminus\{l\}}^+ \, dy^+.$

Here, $J_{\tilde{g}_l^*}\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big)$ is the Jacobian determinant of $\tilde{g}_l^*\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big)$. It is easily seen that $J_{\tilde{g}_l^*}\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big) = J_{\tilde{g}_l}(x_l; u_l^+)$, and thus we obtain further

$e\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u_l^+\big) = \int \bigg(\int f\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+\big) \, dx_{\mathcal{A}\setminus\{l\}}^+ \, dy^+\bigg) \log |J_{\tilde{g}_l}(x_l; u_l^+)| \, dx_l = \int f(x_l) \log |J_{\tilde{g}_l}(x_l; u_l^+)| \, dx_l = G_l(u_l^+).$   (52)
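The transformation rule (49) underlying this proof can be checked numerically for a simple case. The sketch below is an illustration, not part of the paper: it uses the elementwise bijection $g(a) = \exp(a)$ applied to a Gaussian $a \sim \mathcal{N}(m, I_2)$, for which both differential entropies are known in closed form, and estimates the correction term $e(a) = \mathrm{E}[\log|J_g(a)|]$ by Monte Carlo; the mean vector $m$ is an arbitrary choice.

```python
import numpy as np

# Numerical check (illustration only) of rule (49): for b = g(a) with
# Jacobian determinant J_g(a),  h(b) = h(a) + E[log|J_g(a)|].
# Example: a ~ N(m, I_2), g(a) = exp(a) elementwise, so b has independent
# lognormal entries, J_g(a) = exp(a_1) exp(a_2), and log|J_g(a)| = a_1 + a_2.
rng = np.random.default_rng(1)
m = np.array([0.7, -0.2])             # arbitrary mean vector (assumption)

# Closed-form differential entropies (in nats):
h_a = 2 * 0.5 * np.log(2 * np.pi * np.e)            # Gaussian, identity covariance
h_b = m.sum() + 2 * 0.5 * np.log(2 * np.pi * np.e)  # product of lognormals

# Monte Carlo estimate of the correction term e(a) = E[a_1 + a_2] = m_1 + m_2.
a = m + rng.standard_normal((100_000, 2))
e_a = np.mean(a.sum(axis=1))
```

Here `h_b - h_a` equals $m_1 + m_2$ exactly, and the Monte Carlo estimate `e_a` matches it up to sampling error, which is the content of (49) for this example.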
Inserting (52) into (51) and the resulting expression of $h(x^+, y^+; u^+)$ into (50) gives

$h(x^+|y^+; u^+) = h\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u^+\big) + G_l(u_l^+) - h(y^+; u^+).$   (53)

Next, we repeat this transformation procedure but apply it to the term $h\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u^+\big)$ in (53) instead of $h(x^+, y^+; u^+)$. Consider an arbitrary $l' \in \mathcal{C}\setminus\{l\}$, and note that $x_{\mathcal{A}\setminus\{l\}}^+$ consists of $x_{l'}^+$ and $x_{\mathcal{A}\setminus\{l,l'\}}^+ \triangleq [x_k^+]_{k\in\mathcal{A}\setminus\{l,l'\}}$, where $x_{l'}^+ = \tilde{g}_{l'}(x_{l'}, u_{l'}^+)$ according to (20). Proceeding as above and inserting the resulting expression of $h\big(x_l, x_{\mathcal{A}\setminus\{l\}}^+, y^+; u^+\big)$ into (53) yields

$h(x^+|y^+; u^+) = h\big(x_l, x_{l'}, x_{\mathcal{A}\setminus\{l,l'\}}^+, y^+; u^+\big) + G_l(u_l^+) + G_{l'}(u_{l'}^+) - h(y^+; u^+).$

We continue this procedure in a recursive fashion, splitting off CA state vectors from $x_{\mathcal{A}\setminus\{l,l'\}}^+$ until only the target states (contained in $x_\mathcal{T}^+$) are left, and applying the transformation rule at each recursion. In the end, we obtain

$h(x^+|y^+; u^+) = h(x_\mathcal{C}, x_\mathcal{T}^+, y^+; u^+) + \sum_{l\in\mathcal{C}} G_l(u_l^+) - h(y^+; u^+).$

Finally, Equation (21) is obtained by noting that $h(x_\mathcal{C}, x_\mathcal{T}^+, y^+; u^+) = h(x_\mathcal{C}, x_\mathcal{T}^+|y^+; u^+) + h(y^+; u^+)$.

APPENDIX B
DERIVATION OF (33) AND (34)

1) Derivation of (33): Let us first define the vector $\breve{y}_l^+ \triangleq [y_{l',k}^+]_{l'\in\mathcal{C}\setminus\{l\},\, k\in\mathcal{A}_{l'}}$, which contains all those measurements $y_{l',k}^+$ that are not contained in $\tilde{y}_l^+$ (cf. (31)). The corresponding likelihood function is given by

$f(\breve{y}_l^+|x_\mathcal{C}, x_\mathcal{T}^+; u^+) = \dfrac{f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u^+)}{f(\tilde{y}_l^+|x_l, x_{\mathcal{C}_l}, x_{\mathcal{T}_l}^+; u_{\mathcal{C}_l}^+)},$   (54)

which, according to (29) and (32), involves all factors of $f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u^+)$ that do not depend on the local control vector $u_l^+$. Using (54) in (28) yields

$\dfrac{\partial D_I(u^+)}{\partial u_l^+} = \displaystyle\int f(x_\mathcal{C}, x_\mathcal{T}^+) \, f(\breve{y}_l^+|x_\mathcal{C}, x_\mathcal{T}^+; u^+) \, \dfrac{\partial f(\tilde{y}_l^+|x_l, x_{\mathcal{C}_l}, x_{\mathcal{T}_l}^+; u_{\mathcal{C}_l}^+)}{\partial u_l^+} \, \log \dfrac{f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u^+)}{f(y^+; u^+)} \, dx_\mathcal{C} \, dx_\mathcal{T}^+ \, dy^+.$   (55)

Setting $u^+ = u_r^+$, and multiplying and dividing the integrand in (55) by $f(\tilde{y}_l^+|x_l, x_{\mathcal{C}_l}, x_{\mathcal{T}_l}^+; u_{r,\mathcal{C}_l}^+)$, we obtain further

$\dfrac{\partial D_I(u^+)}{\partial u_l^+}\bigg|_{u^+ = u_r^+} = \displaystyle\int q(y^+, x_\mathcal{C}, x_\mathcal{T}^+) \, \dfrac{1}{f(\tilde{y}_l^+|x_l, x_{\mathcal{C}_l}, x_{\mathcal{T}_l}^+; u_{r,\mathcal{C}_l}^+)} \, \dfrac{\partial f(\tilde{y}_l^+|x_l, x_{\mathcal{C}_l}, x_{\mathcal{T}_l}^+; u_{\mathcal{C}_l}^+)}{\partial u_l^+}\bigg|_{u_{\mathcal{C}_l}^+ = u_{r,\mathcal{C}_l}^+} \log \dfrac{f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u_r^+)}{f(y^+; u_r^+)} \, dx_\mathcal{C} \, dx_\mathcal{T}^+ \, dy^+,$   (56)

where

$q(y^+, x_\mathcal{C}, x_\mathcal{T}^+) \triangleq f(x_\mathcal{C}, x_\mathcal{T}^+) \, f(\tilde{y}_l^+|x_l, x_{\mathcal{C}_l}, x_{\mathcal{T}_l}^+; u_{r,\mathcal{C}_l}^+) \, f(\breve{y}_l^+|x_\mathcal{C}, x_\mathcal{T}^+; u_r^+).$

Then, (33) is recognized to be a Monte Carlo approximation of (56) that is obtained by performing importance sampling [43] using $q(y^+, x_\mathcal{C}, x_\mathcal{T}^+)$ as importance density, i.e., the samples $y^{+(j,j')}$, $x_\mathcal{C}^{(j)}$, and $x_\mathcal{T}^{+(j)}$ occurring in (33) are drawn from $q(y^+, x_\mathcal{C}, x_\mathcal{T}^+)$. Using (54), this importance density can be expressed as

$q(y^+, x_\mathcal{C}, x_\mathcal{T}^+) = f(x_\mathcal{C}, x_\mathcal{T}^+) \, f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u_r^+) = f(x_\mathcal{C}, x_\mathcal{T}^+, y^+; u_r^+).$

The first expression, $f(x_\mathcal{C}, x_\mathcal{T}^+) f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u_r^+)$, underlies the two-stage sampling procedure described in Section V.

2) Derivation of (34): We have

$f(y^+; u_r^+) = \displaystyle\int f(y^+|x_\mathcal{C}, x_\mathcal{T}^+; u_r^+) \, f(x_\mathcal{C}, x_\mathcal{T}^+) \, dx_\mathcal{C} \, dx_\mathcal{T}^+.$   (57)

Using samples $\big\{\big(x_\mathcal{C}^{(j)}, x_\mathcal{T}^{+(j)}\big)\big\}_{j=1}^{J} \sim f(x_\mathcal{C}, x_\mathcal{T}^+)$ (see Section V), a Monte Carlo approximation of (57) is obtained as

$f(y^+; u_r^+) \approx \dfrac{1}{J} \displaystyle\sum_{j'=1}^{J} f\big(y^+\big|x_\mathcal{C}^{(j')}, x_\mathcal{T}^{+(j')}; u_r^+\big).$

Evaluating this for $y^+ = y^{+(j,j')}$ (again see Section V) yields (34).

APPENDIX C
DRAWING SAMPLES FROM $f\big(y^+\big|x_\mathcal{C}^{(j)}, x_\mathcal{T}^{+(j)}; u_r^+\big)$

We consider the setting of Section VI-A. As discussed there, samples $\{x_{l'}^{(j)}\}_{j=1}^{J} \sim f(x_{l'})$, $l' \in \mathcal{C}$ and $\{x_m^{+(j)}\}_{j=1}^{J} \sim f(x_m^+)$, $m \in \mathcal{T}$ are available at CA $l$, and it is assumed that the state evolution and measurement models of all CAs $l' \in \mathcal{C}$ are known to CA $l$. We start by noting that by combining (15) and (3), the composite measurement vector $y^+$ can be written as

$y^+ = \big[d_{l'}\big(x_{l'}^+, x_k^+, v_{l',k}^+\big)\big]_{l'\in\mathcal{C},\, k\in\mathcal{A}_{l'}}.$   (58)
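As an aside, the two-stage sampling that (58) enables can be sketched in code. The sketch below uses invented toy models (a noisy-distance measurement function $d$ and an additive control mapping $\tilde{g}$, neither taken from the paper): state samples are first propagated through the deterministic control mapping, and then measurement-noise samples are drawn and the measurement function is evaluated.

```python
import numpy as np

# Sketch of two-stage measurement sampling in the spirit of (58), with
# invented toy models (noisy-distance measurement d, additive control
# mapping g~); neither is the paper's actual model.
rng = np.random.default_rng(2)
J, Jp = 1200, 5                # sample numbers as in the simulation (J, J')
sigma_v = 0.1                  # measurement-noise std (assumption)

x_ca = rng.uniform(-10, 10, size=(J, 2))        # CA position samples
u_r = np.array([1.0, 0.0])                      # reference control vector
x_ca_plus = x_ca + u_r                          # stage 1: propagate through g~
x_tgt_plus = rng.uniform(-10, 10, size=(J, 2))  # target position samples

# Stage 2: for each state sample j, draw J' noise samples and evaluate d.
v = sigma_v * rng.standard_normal((J, Jp))
dist = np.linalg.norm(x_ca_plus - x_tgt_plus, axis=1)
y_plus = dist[:, None] + v                      # measurement samples y^+(j,j')
```

Each row $j$ of `y_plus` then holds $J'$ measurement samples conditioned on the $j$th state sample, mirroring the doubly indexed samples $y^{+(j,j')}$ used in the derivations above.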
First, CA $l$ obtains samples $\{x_{l'}^{+(j)}\}_{j=1}^{J} \sim \tilde{f}(x_{l'}^+) \triangleq f(x_{l'}^+)\big|_{x_{l'}^+ = \tilde{g}_{l'}(x_{l'}, u_{r,l'}^+)}$ (see (20)) for all $l' \in \mathcal{C}$ by evaluating $\tilde{g}_{l'}(x_{l'}, u_{l'}^+)$ at $x_{l'} = x_{l'}^{(j)}$ and $u_{l'}^+ = u_{r,l'}^+$, i.e.,

$x_{l'}^{+(j)} = \tilde{g}_{l'}\big(x_{l'}^{(j)}, u_{r,l'}^+\big), \quad j = 1, \ldots, J.$   (59)

Thus, at this point, samples $\{x_k^{+(j)}\}_{j=1}^{J}$ for all $k \in \mathcal{A}$ are available at CA $l$. Next, for each $j \in \{1, \ldots, J\}$, CA $l$ draws samples $\{v_{l',k}^{+(j,j')}\}_{j'=1}^{J'} \sim f(v_{l',k}^+)$ for $l' \in \mathcal{C}$ and $k \in \mathcal{A}_{l'}$. Finally, CA $l$ obtains samples $\{y^{+(j,j')}\}_{j'=1}^{J'} \sim f\big(y^+\big|x_\mathcal{C}^{(j)}, x_\mathcal{T}^{+(j)}; u_r^+\big)$ by evaluating (58) using the appropriate samples, i.e.,

$y^{+(j,j')} = \big[d_{l'}\big(x_{l'}^{+(j)}, x_k^{+(j)}, v_{l',k}^{+(j,j')}\big)\big]_{l'\in\mathcal{C},\, k\in\mathcal{A}_{l'}}, \quad j' = 1, \ldots, J'.$

APPENDIX D
DRAWING SAMPLES FROM $f\big(\tilde{y}_l^+\big|x_l^{(j)}, x_{\mathcal{C}_l}^{(j)}, x_{\mathcal{T}_l}^{+(j)}; u_{r,\mathcal{C}_l}^+\big)$

In the setting of Section VI-B, samples $\{x_{l'}^{(j)}\}_{j=1}^{J} \sim f(x_{l'})$, $l' \in \{l\} \cup \mathcal{C}_l$ and $\{x_m^{+(j)}\}_{j=1}^{J} \sim f(x_m^+)$, $m \in \mathcal{T}_l$ are available at CA $l$. We start by noting that combining (37) and (3) yields

$y_{l'}^+ = \big[d_{l'}\big(x_{l'}^+, x_k^+, v_{l',k}^+\big)\big]_{k\in\mathcal{A}_{l'}}.$   (60)

Based on the analogy of this expression to (58), CA $l$ first obtains samples $\{y_{l'}^{+(j,j')}\}_{j'=1}^{J'} \sim f\big(y_{l'}^+\big|x_l^{(j)}, x_{\mathcal{C}_l}^{(j)}, x_{\mathcal{T}_l}^{+(j)}; u_{r,\mathcal{C}_l}^+\big)$ by carrying out the steps of Appendix C with obvious modifications; in particular, $y^+$ is replaced by $y_{l'}^+$, $\mathcal{C}$ by $\{l\} \cup \mathcal{C}_l$, and $\mathcal{T}$ by $\mathcal{T}_l$. More specifically, CA $l$ obtains samples $\{x_{l'}^{+(j)}\}_{j=1}^{J}$ for $l' \in \{l\} \cup \mathcal{C}_l$ according to (59). Then, for each $j \in \{1, \ldots, J\}$, CA $l$ draws samples $\{v_{l',k}^{+(j,j')}\}_{j'=1}^{J'} \sim f(v_{l',k}^+)$ for $k \in \mathcal{A}_{l'}$ and, in turn, obtains samples $\{y_{l'}^{+(j,j')}\}_{j'=1}^{J'}$ by evaluating (60) using the appropriate samples, i.e.,

$y_{l'}^{+(j,j')} = \big[d_{l'}\big(x_{l'}^{+(j)}, x_k^{+(j)}, v_{l',k}^{+(j,j')}\big)\big]_{k\in\mathcal{A}_{l'}}, \quad j' = 1, \ldots, J'.$

It remains to obtain samples of those entries of $\tilde{y}_l^+$ that are not contained in $y_l^+$ (cf. (31) and (37)). More specifically, for each sample $y_l^{+(j,j')}$, CA $l$ needs to obtain samples $y_{l',l}^{+(j,j')}$, $l' \in \mathcal{C}_l$. This is done through communication with neighbor CAs: CA $l$ transmits to each neighbor CA $l' \in \mathcal{C}_l$ the samples $\{y_{l,l'}^{+(j,j')}\}_{j'=1}^{J'}$, $j = 1, \ldots, J$, and it receives from CA $l' \in \mathcal{C}_l$ the samples $\{y_{l',l}^{+(j,j')}\}_{j'=1}^{J'}$, $j = 1, \ldots, J$. Thus, finally, samples $\{\tilde{y}_l^{+(j,j')}\}_{j'=1}^{J'} \sim f\big(\tilde{y}_l^+\big|x_l^{(j)}, x_{\mathcal{C}_l}^{(j)}, x_{\mathcal{T}_l}^{+(j)}; u_{r,\mathcal{C}_l}^+\big)$ are locally available at CA $l$.

ACKNOWLEDGMENT

The authors would like to thank Dr. Günther Koliander for illuminating discussions.

REFERENCES

[1] T. Shima and S. Rasmussen, UAV Cooperative Decision and Control: Challenges and Practical Approaches. Philadelphia, PA, USA: SIAM, 2009.
[2] F. Bullo, J. Cortés, and S. Martínez, Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms. Princeton, NJ, USA: Princeton Univ. Press, 2009.
[3] A. Nayak and I. Stojmenović, Wireless Sensor and Actuator Networks: Algorithms and Protocols for Scalable Coordination and Data Communication. Hoboken, NJ, USA: Wiley, 2010.
[4] F. Zhao and L. J. Guibas, Wireless Sensor Networks: An Information Processing Approach. Amsterdam, The Netherlands: Morgan Kaufmann, 2004.
[5] P. Corke et al., "Environmental wireless sensor networks," Proc. IEEE, vol. 98, no. 11, pp. 1903–1917, Nov. 2010.
[6] J. Ko et al., "Wireless sensor networks for healthcare," Proc. IEEE, vol. 98, no. 11, pp. 1947–1960, Nov. 2010.
[7] O. Hlinka, F. Hlawatsch, and P. M. Djuric, "Distributed particle filtering in agent networks: A survey, classification, and comparison," IEEE Signal Process. Mag., vol. 30, no. 1, pp. 61–81, Jan. 2013.
[8] T. Zhao and A. Nehorai, "Distributed sequential Bayesian estimation of a diffusive source in wireless sensor networks," IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1511–1524, Apr. 2007.
[9] H. Aghajan and A. Cavallaro, Multi-Camera Networks: Principles and Applications. Burlington, MA, USA: Academic, 2009.
[10] K.-D. Kim and P. R. Kumar, "Cyber physical systems: A perspective at the centennial," Proc. IEEE, vol. 100, pp. 1287–1308, May 2012.
[11] S. Haykin, Cognitive Dynamic Systems: Perception-Action Cycle, Radar and Radio. New York, NY, USA: Cambridge Univ. Press, 2012.
[12] A. D. Ryan, H. Durrant-Whyte, and J. K. Hedrick, "Information-theoretic sensor motion control for distributed estimation," in Proc. IMECE, Seattle, WA, USA, Nov. 2007, pp. 725–734.
[13] G. M. Hoffmann and C. J. Tomlin, "Mobile sensor network control using mutual information methods and particle filters," IEEE Trans. Autom. Control, vol. 55, no. 1, pp. 32–47, Jan. 2010.
[14] M. Schwager, P. Dames, D. Rus, and V. Kumar, "A multi-robot control policy for information gathering in the presence of unknown hazards," in Proc. ISRR, Flagstaff, AZ, USA, Aug. 2011, pp. 1–16.
[15] B. J. Julian, M. Angermann, M. Schwager, and D. Rus, "Distributed robotic sensor networks: An information-theoretic approach," Int. J. Robot. Res., vol. 31, no. 10, pp. 1134–1154, Sep. 2012.
[16] N. A. Atanasov, J. L. Ny, and G. J. Pappas, "Distributed algorithms for stochastic source seeking with mobile robot networks," Trans. ASME, J. Dyn. Syst. Meas. Control, vol. 137, no. 3, Mar. 2015, Art. ID. 031004.
[17] H. Wymeersch, J. Lien, and M. Z. Win, "Cooperative localization in wireless networks," Proc. IEEE, vol. 97, no. 2, pp. 427–450, Feb. 2009.
[18] T. Sathyan and M. Hedley, "Fast and accurate cooperative tracking in wireless networks," IEEE Trans. Mobile Comput., vol. 12, no. 9, pp. 1801–1813, Sep. 2013.
[19] Y.-C. Wu, Q. M. Chaudhari, and E. Serpedin, "Clock synchronization of wireless sensor networks," IEEE Signal Process. Mag., vol. 28, no. 1, pp. 124–138, Jan. 2011.
[20] F. Meyer, B. Etzlinger, F. Hlawatsch, and A. Springer, "A distributed particle-based belief propagation algorithm for cooperative simultaneous localization and synchronization," in Proc. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2013, pp. 527–531.
[21] B. Etzlinger, F. Meyer, A. Springer, F. Hlawatsch, and H. Wymeersch, "Cooperative simultaneous localization and synchronization: A distributed hybrid message passing algorithm," in Proc. Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2013, pp. 1978–1982.
[22] B. Etzlinger, H. Wymeersch, and A. Springer, "Cooperative synchronization in wireless networks," IEEE Trans. Signal Process., vol. 62, no. 11, pp. 2837–2849, Jun. 2014.
[23] J. Teng, H. Snoussi, C. Richard, and R. Zhou, "Distributed variational filtering for simultaneous sensor localization and target tracking in wireless sensor networks," IEEE Trans. Veh. Technol., vol. 61, no. 5, pp. 2305–2318, Jun. 2012.
[24] F. Meyer, E. Riegler, O. Hlinka, and F. Hlawatsch, "Simultaneous distributed sensor self-localization and target tracking using belief propagation and likelihood consensus," in Proc. 46th Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, Nov. 2012, pp. 1212–1216.
[25] F. Meyer, O. Hlinka, H. Wymeersch, E. Riegler, and F. Hlawatsch, "Distributed localization and tracking of mobile networks including noncooperative objects," submitted. [Online]. Available: http://arxiv.org/abs/1403.1824
[26] R. Olfati-Saber, J. A. Fax, and R. M. Murray, "Consensus and cooperation in networked multi-agent systems," Proc. IEEE, vol. 95, no. 1, pp. 215–233, Jan. 2007.
[27] A. G. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione, "Gossip algorithms for distributed signal processing," Proc. IEEE, vol. 98, no. 11, pp. 1847–1864, Nov. 2010.
[28] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum–product algorithm," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[29] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006.
[30] W. Burgard, D. Fox, and S. Thrun, "Active mobile robot localization by entropy minimization," in Proc. EUROBOT, Brescia, Italy, Oct. 1997, pp. 155–162.
[31] B. Grocholsky, "Information-theoretic control of multiple sensor platforms," Ph.D. dissertation, Dept. Aerosp., Mechatron. Mech. Eng., Univ. Sydney, Sydney, Australia, 2002.
[32] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley, 2006.
[33] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ, USA: Prentice-Hall, 1993.
[34] F. Morbidi and G. L. Mariottini, "Active target tracking and cooperative localization for teams of aerial vehicles," IEEE Trans. Control Syst. Technol., vol. 21, no. 5, pp. 1694–1707, Sep. 2013.
[35] X. R. Li and V. P. Jilkov, "Survey of maneuvering target tracking—Part I: Dynamic models," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1333–1364, Oct. 2003.
[36] Y. Bar-Shalom, X.-R. Li, and T. Kirubarajan, Estimation With Applications to Tracking and Navigation. New York, NY, USA: Wiley, 2001.
[37] J. Lien, J. Ferner, W. Srichavengsup, H. Wymeersch, and M. Z. Win, "A comparison of parametric and sample-based message representation in cooperative localization," Int. J. Navig. Observ., vol. 2012, 2012, Art. ID. 281592.
[38] S. Farahmand, S. I. Roumeliotis, and G. B. Giannakis, "Set-membership constrained particle filter: Distributed adaptation for sensor networks," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4122–4138, Sep. 2011.
[39] V. Savic, H. Wymeersch, and S. Zazo, "Belief consensus algorithms for fast distributed target tracking in wireless sensor networks," Signal Process., vol. 95, pp. 149–160, Feb. 2014.
[40] O. Hlinka, F. Hlawatsch, and P. M. Djuric, "Consensus-based distributed particle filtering with distributed proposal adaptation," IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3029–3041, Jun. 2014.
[41] C. Lindberg, L. S. Muppirisetty, K.-M. Dahlen, V. Savic, and H. Wymeersch, "MAC delay in belief consensus for distributed tracking," in Proc. IEEE WPNC, Dresden, Germany, Mar. 2013, pp. 1–6.
[42] R. Fletcher, Practical Methods of Optimization. New York, NY, USA: Wiley, 1987.
[43] A. Doucet, N. de Freitas, and N. Gordon, Sequential Monte Carlo Methods in Practice. New York, NY, USA: Springer-Verlag, 2001.
[44] H. Lim and C. Kim, "Multicast tree construction and flooding in wireless ad hoc networks," in Proc. MSWIM, New York, NY, USA, Aug. 2000, pp. 61–68.
[45] L. Xiao, S. Boyd, and S. Lall, "A scheme for robust distributed sensor fusion based on average consensus," in Proc. IPSN, Los Angeles, CA, USA, Apr. 2005, pp. 63–70.
[46] R. G. Bartle, The Elements of Integration and Lebesgue Measure. New York, NY, USA: Wiley, 1995.
[47] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. Cambridge, MA, USA: MIT Press, 2005.
[48] A. T. Ihler, J. W. Fisher, R. L. Moses, and A. S. Willsky, "Nonparametric belief propagation for self-localization of sensor networks," IEEE J. Sel. Areas Commun., vol. 23, no. 4, pp. 809–819, Apr. 2005.
[49] B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications. Norwood, MA, USA: Artech House, 2004.
[50] G. E. Garcia, L. S. Muppirisetty, E. M. Schiller, and H. Wymeersch, "On the trade-off between accuracy and delay in cooperative UWB localization: Performance bounds and scaling laws," IEEE Trans. Wireless Commun., vol. 13, no. 8, pp. 4574–4585, Aug. 2014.
[51] R. Douc and O. Cappe, "Comparison of resampling schemes for particle filtering," in Proc. ISPA, Zagreb, Croatia, Sep. 2005, pp. 64–69.
[52] B. W. Silverman and P. J. Green, Density Estimation for Statistics and Data Analysis. London, U.K.: Chapman & Hall, 1986.
[53] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, "Constrained model predictive control: Stability and optimality," Automatica, vol. 36, no. 6, pp. 789–814, Jun. 2000.
[54] D. L. Alspach and H. W. Sorenson, "Nonlinear Bayesian estimation using Gaussian sum approximations," IEEE Trans. Autom. Control, vol. AC-17, no. 4, pp. 439–448, Aug. 1972.
[55] I. Arasaratnam and S. Haykin, "Cubature Kalman filters," IEEE Trans. Autom. Control, vol. 54, no. 6, pp. 1254–1269, Jun. 2009.
[56] Y. Deville and A. Deville, "Exact and approximate quantum independent component analysis for qubit uncoupling," in Proc. LVA/ICA, Tel Aviv, Israel, Mar. 2012, pp. 58–65.

Florian Meyer (S'12) received the Dipl.-Ing. (M.Sc.) and Ph.D. degrees in electrical engineering from Vienna University of Technology, Vienna, Austria, in 2011 and 2015, respectively. Since 2011, he has been a Research and Teaching Assistant with the Institute of Telecommunications, Vienna University of Technology. He was a Visiting Scholar at the Department of Signals and Systems, Chalmers University of Technology, Gothenburg, Sweden, in 2013 and at the Centre of Maritime Research and Experimentation (CMRE), La Spezia, Italy, in 2014. His research interests include signal processing for wireless sensor networks, localization and tracking, information-seeking control, message passing algorithms, and finite set statistics.

Henk Wymeersch (S'99–M'05) received the Ph.D. degree in electrical engineering/applied sciences from Ghent University, Ghent, Belgium, in 2005. He is currently an Associate Professor with the Department of Signals and Systems, Chalmers University of Technology, Gothenburg, Sweden. Prior to joining Chalmers, he was a Postdoctoral Associate with the Laboratory for Information and Decision Systems, Massachusetts Institute of Technology. Dr. Wymeersch served as an Associate Editor of the IEEE COMMUNICATION LETTERS (2009–2013), IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS (2013–present), and Transactions on Emerging Telecommunications Technologies (2011–present).

Markus Fröhle received the B.Sc. and M.Sc. degrees in telematics from Graz University of Technology, Graz, Austria, in 2009 and 2012, respectively. He is currently working toward the Ph.D. degree in electrical engineering with the Department of Signals and Systems, Chalmers University of Technology, Gothenburg, Sweden. From 2012 to 2013, he was a Research Assistant with the Signal Processing and Speech Communication Laboratory, Graz University of Technology. His research interests include indoor positioning and navigation, UWB, and the interplay between estimation theory, communications, and control in multiagent systems.

Franz Hlawatsch (S'85–M'88–SM'00–F'12) received the Diplom-Ingenieur, Dr. techn., and Univ.-Dozent (Habilitation) degrees in electrical engineering/signal processing from Vienna University of Technology, Vienna, Austria, in 1983, 1988, and 1996, respectively. Since 1983, he has been with the Institute of Telecommunications, Vienna University of Technology, where he is currently an Associate Professor. During 1991–1992, as a recipient of an Erwin Schrödinger Fellowship, he spent a sabbatical year with the Department of Electrical Engineering, University of Rhode Island, Kingston, RI, USA. In 1999, 2000, and 2001, he held one-month Visiting Professor positions with INP/ENSEEIHT, Toulouse, France and IRCCyN, Nantes, France. He (co)authored a book, three review papers that appeared in the IEEE Signal Processing Magazine, about 200 refereed scientific papers and book chapters, and three patents. He coedited three books. His research interests include statistical and compressive signal processing methods and their application to sensor networks and wireless communications. Prof. Hlawatsch was a Technical Program Co-Chair of EUSIPCO 2004 and served on the technical committees of numerous IEEE conferences. He was an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 2003 to 2007 and the IEEE TRANSACTIONS ON INFORMATION THEORY from 2008 to 2011. From 2004 to 2009, he was a member of the IEEE SPCOM Technical Committee. He currently serves as an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS. He coauthored papers that won an IEEE Signal Processing Society Young Author Best Paper Award and a Best Student Paper Award at IEEE ICASSP 2011.