Are you sure?
This action might not be possible to undo. Are you sure you want to continue?
by
Jason L. Williams
B.E.(Electronics)(Hons.) B.Inf.Tech., Queensland University of Technology, 1999
M.S.E.E., Air Force Institute of Technology, 2003
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulﬁllment of the requirements for the degree of
Doctor of Philosophy
in Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology
February, 2007
c _ 2007 Massachusetts Institute of Technology
All Rights Reserved.
Signature of Author:
Department of Electrical Engineering and Computer Science
January 12, 2007
Certiﬁed by:
John W. Fisher III
Principal Research Scientist, CSAIL
Thesis Supervisor
Certiﬁed by:
Alan S. Willsky
Edwin Sibley Webster Professor of Electrical Engineering
Thesis Supervisor
Accepted by:
Arthur C. Smith
Professor of Electrical Engineering
Chair, Committee for Graduate Students
2
Information Theoretic Sensor Management
by Jason L. Williams
Submitted to the Department of Electrical Engineering
and Computer Science on January 12, 2007
in Partial Fulﬁllment of the Requirements for the Degree of
Doctor of Philosophy in Electrical Engineering and Computer Science
Abstract
Sensor management may be deﬁned as those stochastic control problems in which con
trol values are selected to inﬂuence sensing parameters in order to maximize the utility
of the resulting measurements for an underlying detection or estimation problem. While
problems of this type can be formulated as a dynamic program, the state space of the
program is in general inﬁnite, and traditional solution techniques are inapplicable. De
spite this fact, many authors have applied simple heuristics such as greedy or myopic
controllers with great success.
This thesis studies sensor management problems in which information theoretic
quantities such as entropy are utilized to measure detection or estimation performance.
The work has two emphases: ﬁrstly, we seek performance bounds which guarantee per
formance of the greedy heuristic and derivatives thereof in certain classes of problems.
Secondly, we seek to extend these basic heuristic controllers to ﬁnd algorithms that pro
vide improved performance and are applicable in larger classes of problems for which
the performance bounds do not apply. The primary problem of interest is multiple ob
ject tracking and identiﬁcation; application areas include sensor network management
and multifunction radar control.
Utilizing the property of submodularity, as proposed for related problems by diﬀer
ent authors, we show that the greedy heuristic applied to sequential selection problems
with information theoretic objectives is guaranteed to achieve at least half of the optimal
reward. Tighter guarantees are obtained for diﬀusive problems and for problems involv
ing discounted rewards. Online computable guarantees also provide tighter bounds in
speciﬁc problems. The basic result applies to open loop selections, where all decisions
are made before any observation values are received; we also show that the closed loop
4
greedy heuristic, which utilizes observations received in the interim in its subsequent
decisions, possesses the same guarantee relative to the open loop optimal, and that no
such guarantee exists relative to the optimal closed loop performance.
The same mathematical property is utilized to obtain an algorithm that exploits
the structure of selection problems involving multiple independent objects. The algo
rithm involves a sequence of integer programs which provide progressively tighter upper
bounds to the true optimal reward. An auxiliary problem provides progressively tighter
lower bounds, which can be used to terminate when a nearoptimal solution has been
found. The formulation involves an abstract resource consumption model, which allows
observations that expend diﬀerent amounts of available time.
Finally, we present a heuristic approximation for an object tracking problem in a
sensor network, which permits a direct tradeoﬀ between estimation performance and
energy consumption. We approach the tradeoﬀ through a constrained optimization
framework, seeking to either optimize estimation performance over a rolling horizon
subject to a constraint on energy consumption, or to optimize energy consumption sub
ject to a constraint on estimation performance. Lagrangian relaxation is used alongside
a series of heuristic approximations to ﬁnd a tractable solution that captures the essen
tial structure in the problem.
Thesis Supervisors: John W. Fisher III
†
and Alan S. Willsky
‡
Title: † Principal Research Scientist,
Computer Science and Artiﬁcial Intelligence Laboratory
‡ Edwin Sibley Webster Professor of Electrical Engineering
Acknowledgements
We ought to give thanks for all fortune: if it is good, because it is good,
if bad, because it works in us patience, humility and the contempt of this world
and the hope of our eternal country.
C.S. Lewis
It has been a wonderful privilege to have been able to study under and alongside such a
tremendous group of people in this institution over the past few years. There are many
people whom I must thank for making this opportunity the great experience that it has
been. Firstly, I oﬀer my sincerest thanks to my advisors, Dr John Fisher III and Prof
Alan Willsky, whose support, counsel and encouragement has guided me through these
years. The Army Research Oﬃce, the MIT Lincoln Laboratory Advanced Concepts
Committee and the Air Force Oﬃce of Scientiﬁc Research all supported this research
at various stages of development.
Thanks go to my committee members, Prof David Casta˜ n´on (BU) and Prof Dimitri
Bertsekas for oﬀering their time and advice. Prof Casta˜ n´on suggested applying column
generation techniques to Section 4.1.2, which resulted in the development in Section 4.3.
Various conversations with David Choi, Dan Rudoy and John Weatherwax (MIT Lin
coln Laboratory) as well as Michael Schneider (BAE Systems Advanced Information
Technologies) provided valuable input in the development of many of the formulations
studied. Vikram Krishnamurthy (UBC) and David Choi ﬁrst pointed me to the recent
work applying submodularity to sensor management problems, which led to the results
in Chapter 3.
My oﬃce mates, Pat Kreidl, Emily Fox and Kush Varshney, have been a constant
sounding board for halfbaked ideas over the years—I will certainly miss them on a
professional level and on a personal level, not to mention my other lab mates in the
Stochastic Systems Group. The members of the Eastgate Bible study, the Graduate
Christian Fellowship and the Westgate community have been an invaluable source of
friendship and support for both Jeanette and I; we will surely miss them as we leave
Boston.
5
6 ACKNOWLEDGEMENTS
I owe my deepest gratitude to my wife, Jeanette, who has followed me around the
world on this crazy journey, being an everpresent source of companionship, friendship
and humour. I am extremely privileged to beneﬁt from her unwavering love, support
and encouragement. Thanks also go to my parents and extended family for their support
as they have patiently awaited our return home.
Finally, to the God who creates and sustains, I humbly refer recognition for all
success, growth and health with which I have been blessed over these years. The Lord
gives and the Lord takes away; may the name of the Lord be praised.
Contents
Abstract 3
Acknowledgements 5
List of Figures 13
1 Introduction 19
1.1 Canonical problem structures . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 Waveform selection and beam steering . . . . . . . . . . . . . . . . . . . 20
1.3 Sensor networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.4 Contributions and thesis outline . . . . . . . . . . . . . . . . . . . . . . 22
1.4.1 Performance guarantees for greedy heuristics . . . . . . . . . . . 23
1.4.2 Eﬃcient solution for beam steering problems . . . . . . . . . . . 23
1.4.3 Sensor network management . . . . . . . . . . . . . . . . . . . . 23
2 Background 25
2.1 Dynamical models and estimation . . . . . . . . . . . . . . . . . . . . . 25
2.1.1 Dynamical models . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.1.2 Kalman ﬁlter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.3 Linearized and extended Kalman ﬁlter . . . . . . . . . . . . . . . 29
2.1.4 Particle ﬁlters and importance sampling . . . . . . . . . . . . . . 30
2.1.5 Graphical models . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.1.6 Cram´erRao bound . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 Markov decision processes . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2.1 Partially observed Markov decision processes . . . . . . . . . . . 37
2.2.2 Open loop, closed loop and open loop feedback . . . . . . . . . . 38
2.2.3 Constrained dynamic programming . . . . . . . . . . . . . . . . . 38
7
8 CONTENTS
2.3 Information theoretic objectives . . . . . . . . . . . . . . . . . . . . . . . 40
2.3.1 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.3.2 Mutual information . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.3.3 KullbackLeibler distance . . . . . . . . . . . . . . . . . . . . . . 43
2.3.4 Linear Gaussian models . . . . . . . . . . . . . . . . . . . . . . . 44
2.3.5 Axioms resulting in entropy . . . . . . . . . . . . . . . . . . . . . 45
2.3.6 Formulations and geometry . . . . . . . . . . . . . . . . . . . . . 46
2.4 Set functions, submodularity and greedy heuristics . . . . . . . . . . . . 48
2.4.1 Set functions and increments . . . . . . . . . . . . . . . . . . . . 48
2.4.2 Submodularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.3 Independence systems and matroids . . . . . . . . . . . . . . . . 51
2.4.4 Greedy heuristic for matroids . . . . . . . . . . . . . . . . . . . . 54
2.4.5 Greedy heuristic for arbitrary subsets . . . . . . . . . . . . . . . 55
2.5 Linear and integer programming . . . . . . . . . . . . . . . . . . . . . . 57
2.5.1 Linear programming . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.5.2 Column generation and constraint generation . . . . . . . . . . . 58
2.5.3 Integer programming . . . . . . . . . . . . . . . . . . . . . . . . . 59
Relaxations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Cutting plane methods . . . . . . . . . . . . . . . . . . . . . . . . 60
Branch and bound . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.6 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.6.1 POMDP and POMDPlike models . . . . . . . . . . . . . . . . . 61
2.6.2 Model simpliﬁcations . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.6.3 Suboptimal control . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.6.4 Greedy heuristics and extensions . . . . . . . . . . . . . . . . . . 62
2.6.5 Existing work on performance guarantees . . . . . . . . . . . . . 65
2.6.6 Other relevant work . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.6.7 Contrast to our contributions . . . . . . . . . . . . . . . . . . . . 66
3 Greedy heuristics and performance guarantees 69
3.1 A simple performance guarantee . . . . . . . . . . . . . . . . . . . . . . 70
3.1.1 Comparison to matroid guarantee . . . . . . . . . . . . . . . . . 72
3.1.2 Tightness of bound . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.1.3 Online version of guarantee . . . . . . . . . . . . . . . . . . . . . 73
3.1.4 Example: beam steering . . . . . . . . . . . . . . . . . . . . . . . 74
3.1.5 Example: waveform selection . . . . . . . . . . . . . . . . . . . . 76
CONTENTS 9
3.2 Exploiting diﬀusiveness . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.2.1 Online guarantee . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2.2 Specialization to trees and chains . . . . . . . . . . . . . . . . . . 82
3.2.3 Establishing the diﬀusive property . . . . . . . . . . . . . . . . . 83
3.2.4 Example: beam steering revisited . . . . . . . . . . . . . . . . . . 84
3.2.5 Example: bearings only measurements . . . . . . . . . . . . . . . 84
3.3 Discounted rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.4 Time invariant rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.5 Closed loop control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5.1 Counterexample: closed loop greedy versus closed loop optimal . 97
3.5.2 Counterexample: closed loop greedy versus open loop greedy . . 98
3.5.3 Closed loop subset selection . . . . . . . . . . . . . . . . . . . . . 99
3.6 Guarantees on the Cram´erRao bound . . . . . . . . . . . . . . . . . . . 101
3.7 Estimation of rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.8 Extension: general matroid problems . . . . . . . . . . . . . . . . . . . . 105
3.8.1 Example: beam steering . . . . . . . . . . . . . . . . . . . . . . . 106
3.9 Extension: platform steering . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4 Independent objects and integer programming 111
4.1 Basic formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.1.1 Independent objects, additive rewards . . . . . . . . . . . . . . . 112
4.1.2 Formulation as an assignment problem . . . . . . . . . . . . . . . 113
4.1.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2 Integer programming generalization . . . . . . . . . . . . . . . . . . . . . 119
4.2.1 Observation sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.2.2 Integer programming formulation . . . . . . . . . . . . . . . . . . 120
4.3 Constraint generation approach . . . . . . . . . . . . . . . . . . . . . . . 122
4.3.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3.2 Formulation of the integer program in each iteration . . . . . . . 126
4.3.3 Iterative algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.3.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.3.5 Theoretical characteristics . . . . . . . . . . . . . . . . . . . . . . 135
4.3.6 Early termination . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.4 Computational experiments . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.4.1 Implementation notes . . . . . . . . . . . . . . . . . . . . . . . . 141
10 CONTENTS
4.4.2 Waveform selection . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.4.3 State dependent observation noise . . . . . . . . . . . . . . . . . 146
4.4.4 Example of potential beneﬁt: single time slot observations . . . . 150
4.4.5 Example of potential beneﬁt: multiple time slot observations . . 151
4.5 Time invariant rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.5.1 Avoiding redundant observation subsets . . . . . . . . . . . . . . 155
4.5.2 Computational experiment: waveform selection . . . . . . . . . . 156
4.5.3 Example of potential beneﬁt . . . . . . . . . . . . . . . . . . . . 158
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5 Sensor management in sensor networks 163
5.1 Constrained Dynamic Programming Formulation . . . . . . . . . . . . . 164
5.1.1 Estimation objective . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.1.2 Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.1.3 Constrained communication formulation . . . . . . . . . . . . . . 167
5.1.4 Constrained entropy formulation . . . . . . . . . . . . . . . . . . 168
5.1.5 Evaluation through Monte Carlo simulation . . . . . . . . . . . . 169
5.1.6 Linearized Gaussian approximation . . . . . . . . . . . . . . . . . 169
5.1.7 Greedy sensor subset selection . . . . . . . . . . . . . . . . . . . 171
5.1.8 nScan pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.1.9 Sequential subgradient update . . . . . . . . . . . . . . . . . . . 177
5.1.10 Rollout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.1.11 Surrogate constraints . . . . . . . . . . . . . . . . . . . . . . . . . 179
5.2 Decoupled Leader Node Selection . . . . . . . . . . . . . . . . . . . . . . 180
5.2.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.3 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
5.4 Conclusion and future work . . . . . . . . . . . . . . . . . . . . . . . . . 184
6 Contributions and future directions 189
6.1 Summary of contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.1.1 Performance guarantees for greedy heuristics . . . . . . . . . . . 189
6.1.2 Eﬃcient solution for beam steering problems . . . . . . . . . . . 190
6.1.3 Sensor network management . . . . . . . . . . . . . . . . . . . . 190
6.2 Recommendations for future work . . . . . . . . . . . . . . . . . . . . . 191
6.2.1 Performance guarantees . . . . . . . . . . . . . . . . . . . . . . . 191
Guarantees for longer lookahead lengths . . . . . . . . . . . . . 191
CONTENTS 11
Observations consuming diﬀerent resources . . . . . . . . . . . . 191
Closed loop guarantees . . . . . . . . . . . . . . . . . . . . . . . . 192
Stronger guarantees exploiting additional structure . . . . . . . . 192
6.2.2 Integer programming formulation of beam steering . . . . . . . . 192
Alternative update algorithms . . . . . . . . . . . . . . . . . . . 192
Deferred reward calculation . . . . . . . . . . . . . . . . . . . . . 192
Accelerated search for lower bounds . . . . . . . . . . . . . . . . 193
Integration into branch and bound procedure . . . . . . . . . . . 193
6.2.3 Sensor network management . . . . . . . . . . . . . . . . . . . . 193
Problems involving multiple objects . . . . . . . . . . . . . . . . 193
Performance guarantees . . . . . . . . . . . . . . . . . . . . . . . 194
Bibliography 195
12 CONTENTS
List of Figures
2.1 Contour plots of the optimal reward to go function for a single time step
and for four time steps. Smaller values are shown in blue while larger
values are shown in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2 Reward in single stage continuous relaxation as a function of the param
eter α. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1 (a) shows total reward accrued by the greedy heuristic in the 200 time
steps for diﬀerent diﬀusion strength values (q), and the bound on optimal
obtained through Theorem 3.2. (b) shows the ratio of these curves,
providing the factor of optimality guaranteed by the bound. . . . . . . . 75
3.2 (a) shows region boundary and vehicle path (counterclockwise, starting
from the left end of the lower straight segment). When the vehicle is
located at a ‘´’ mark, any one grid element with center inside the sur
rounding dotted ellipse may be measured. (b) graphs reward accrued by
the greedy heuristic after diﬀerent periods of time, and the bound on the
optimal sequence for the same time period. (c) shows the ratio of these
two curves, providing the factor of optimality guaranteed by the bound. 77
3.3 Marginal entropy of each grid cell after 75, 225 and 525 steps. Blue
indicates the lowest uncertainty, while red indicates the highest. Vehicle
path is clockwise, commencing from topleft. Each revolution takes 300
steps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.4 Strongest diﬀusive coeﬃcient versus covariance upper limit for various
values of ˜ q, with ˜ r = 1. Note that lower values of α
∗
correspond to
stronger diﬀusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
13
14 LIST OF FIGURES
3.5 (a) shows total reward accrued by the greedy heuristic in the 200 time
steps for diﬀerent diﬀusion strength values (q), and the bound on optimal
obtained through Theorem 3.5. (b) shows the ratio of these curves,
providing the factor of optimality guaranteed by the bound. . . . . . . . 86
3.6 (a) shows average total reward accrued by the greedy heuristic in the
200 time steps for diﬀerent diﬀusion strength values (q), and the bound
on optimal obtained through Theorem 3.5. (b) shows the ratio of these
curves, providing the factor of optimality guaranteed by the bound. . . . 88
3.7 (a) shows the observations chosen in the example in Sections 3.1.4 and
3.2.4 when q = 1. (b) shows the smaller set of observations chosen in the
constrained problem using the matroid selection algorithm. . . . . . . . 107
4.1 Example of operation of assignment formulation. Each “strip” in the
diagram corresponds to the reward for observing a particular object at
diﬀerent times over the 10step planning horizon (assuming that it is only
observed once within the horizon). The role of the auction algorithm is to
pick one unique object to observe at each time in the planning horizon in
order to maximize the sum of the rewards gained. The optimal solution
is shown as black dots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.2 Example of randomly generated detection map. The color intensity in
dicates the probability of detection at each x and y position in the region.117
4.3 Performance tracking M = 20 objects. Performance is measured as the
average (over the 200 simulations) total change in entropy due to in
corporating chosen measurements over all time. The point with a plan
ning horizon of zero corresponds to observing objects sequentially; with a
planning horizon of one the auctionbased method is equivalent to greedy
selection. Error bars indicate 1σ conﬁdence bounds for the estimate of
average total reward. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
LIST OF FIGURES 15
4.4 Subsets available in iteration l of example scenario. The integer program
may select for each object any candidate subset in T
i
l
, illustrated by
the circles, augmented by any subset of elements from the corresponding
exploration subset, illustrated by the rectangle connected to the circle.
The sets are constructed such that there is a unique way of selecting
any subset of observations in S
i
. The subsets selected for each object
must collectively satisfy the resource constraints in order to be feasible.
The shaded candidate subsets and exploration subset elements denote
the solution of the integer program at this iteration. . . . . . . . . . . . 125
4.5 Subsets available in iteration (l+1) of example scenario. The subsets that
were modiﬁed in the update between iterations l and (l +1) are shaded.
There remains a unique way of selecting each subset of observations; e.g.,
the only way to select elements g and e together (for object 2) is to select
the new candidate subset ¦e, g¦, since element e was removed from the
exploration subset for candidate subset ¦g¦ (i.e., B
2
l+1,{g}
). . . . . . . . . 127
4.6 Four iterations of operations performed by Algorithm 4.1 on object 1 (ar
ranged in counterclockwise order, from the topleft). The circles in each
iteration show the candidate subsets, while the attached rectangles show
the corresponding exploration subsets. The shaded circles and rectangles
in iterations 1, 2 and 3 denote the sets that were updated prior to that
iteration. The solution to the integer program in each iteration is shown
along with the reward in the integer program objective (“IP reward”),
which is an upper bound to the exact reward, and the exact reward of
the integer program solution (“reward”). . . . . . . . . . . . . . . . . . . 133
4.7 The two radar sensor platforms move along the racetrack patterns shown
by the solid lines; the position of the two platforms in the tenth time slot
is shown by the ‘*’ marks. The sensor platforms complete 1.7 revolutions
of the pattern in the 200 time slots in the simulation. M objects are
positioned randomly within the [10, 100][10, 100] according to a uniform
distribution, as illustrated by the ‘_’ marks. . . . . . . . . . . . . . . . 143
16 LIST OF FIGURES
4.8 Results of Monte Carlo simulations for planning horizons between one
and 30 time slots (in each sensor). Top diagram shows results for 50
objects, while middle diagram shows results for 80 objects. Each trace
in the plots shows the total reward (i.e., the sum of the MI reductions in
each time step) of a single Monte Carlo simulation for diﬀerent planning
horizon lengths divided by the total reward with the planning horizon
set to a single time step, giving an indication of the improvement due to
additional planning. Bottom diagram shows the computation complexity
(measured through the average number of seconds to produce a plan for
the planning horizon) versus the planning horizon length. . . . . . . . . 145
4.9 Computational complexity (measured as the average number of seconds
to produce a plan for the 10step planning horizon) for diﬀerent numbers
of objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4.10 Top diagram shows the total reward for each planning horizon length
divided by the total reward for a single step planning horizon, averaged
over 20 Monte Carlo simulations. Error bars show the standard deviation
of the mean performance estimate. Lower diagram shows the average
time required to produce plan for the diﬀerent length planning horizon
lengths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.11 Upper diagram shows the total reward obtained in the simulation using
diﬀerent planning horizon lengths, divided by the total reward when the
planning horizon is one. Lower diagram shows the average computation
time to produce a plan for the following N steps. . . . . . . . . . . . . . 152
4.12 Upper diagram shows the total reward obtained in the simulation using
diﬀerent planning horizon lengths, divided by the total reward when the
planning horizon is one. Lower diagram shows the average computation
time to produce a plan for the following N steps. . . . . . . . . . . . . . 154
4.13 Diagram illustrates the variation of rewards over the 50 time step plan
ning horizon commencing from time step k = 101. The line plots the
ratio between the reward of each observation at time step in the plan
ning horizon and the reward of the same observation at the ﬁrst time slot
in the planning horizon, averaged over 50 objects. The error bars show
the standard deviation of the ratio, i.e., the variation between objects. 157
LIST OF FIGURES 17
4.14 Top diagram shows the total reward for each planning horizon length
divided by the total reward for a single step planning horizon, averaged
over 17 Monte Carlo simulations. Error bars show the standard deviation
of the mean performance estimate. Lower diagram shows the average
time required to produce plan for the diﬀerent length planning horizon
lengths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.15 Upper diagram shows the total reward obtained in the simulation using
diﬀerent planning horizon lengths, divided by the total reward when the
planning horizon is one. Lower diagram shows the average computation
time to produce a plan for the following N steps. . . . . . . . . . . . . . 161
5.1 Tree structure for evaluation of the dynamic program through simulation.
At each stage, a tail subproblem is required to be evaluated each new
control, and a set of simulated values of the resulting observations. . . 172
5.2 Computation tree after applying the linearized Gaussian approximation
of Section 5.1.6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.3 Computation tree equivalent to Fig. 5.2, resulting from decomposition of
control choices into distinct stages, selecting leader node for each stage
and then selecting the subset of sensors to activate. . . . . . . . . . . . . 173
5.4 Computation tree equivalent to Fig. 5.2 and Fig. 5.3, resulting from
further decomposing sensor subset selection problem into a generalized
stopping problem, in which each substage allows one to terminate and
move onto the next time slot with the current set of selected sensors, or
to add an additional sensor. . . . . . . . . . . . . . . . . . . . . . . . . . 174
5.5 Tree structure for nscan pruning algorithm with n = 1. At each stage
new leaves are generated extending each remaining sequence with using
each new leader node. Subsequently, all but the best sequence ending
with each leader node is discarded (marked with ‘’), and the remaining
sequences are extended using greedy sensor subset selection (marked with
‘G’). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
18 LIST OF FIGURES
5.6 Position entropy and communication cost for dynamic programming
method with communication constraint (DP CC) and information
constraint (DP IC) with diﬀerent planning horizon lengths (N),
compared to the methods selecting as leader node and activating the
sensor with the largest mutual information (greedy MI), and the sensor
with the smallest expected square distance to the object (min expect
dist). Ellipse centers show the mean in each axis over 100 Monte Carlo
runs; ellipses illustrate covariance, providing an indication of the
variability across simulations. Upper ﬁgure compares average position
entropy to communication cost, while lower ﬁgure compares average of
the minimum entropy over blocks of the same length as the planning
horizon (i.e., the quantity to which the constraint is applied) to
communication cost. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5.7 Adaptation of communication constraint dual variable λ
k
for diﬀerent
horizon lengths for a single Monte Carlo run, and corresponding cumu
lative communication costs. . . . . . . . . . . . . . . . . . . . . . . . . . 185
5.8 Position entropy and communication cost for dynamic programming
method with communication constraint (DP CC) and information
constraint (DP IC), compared to the method which dynamically
selects the leader node to minimize the expected communication cost
consumed in implementing a ﬁxed sensor management scheme. The
ﬁxed sensor management scheme activates the sensor (‘greedy’) or two
sensors (‘greedy 2’) with the observation or observations producing the
largest expected reduction in entropy. Ellipse centers show the mean in
each axis over 100 Monte Carlo runs; ellipses illustrate covariance,
providing an indication of the variability across simulations. . . . . . . . 186
Chapter 1
Introduction
D
ETECTION and estimation theory considers the problem of utilizing
noisecorrupted observations to infer the state of some underlying process or
phenomenon. Examples include detecting the presence of a heart disease using
measurements from MRI, estimating ocean currents using image data from satellite,
detecting and tracking people using video cameras, and tracking and identifying
aircraft in the vicinity of an airport using radar.
Many modern sensors are able to rapidly change mode of operation and steer be
tween physically separated objects. In many problem contexts, substantial performance
gains can be obtained by exploiting this ability, adaptively controlling sensors to maxi
mize the utility of the information received. Sensor management deals with such situa
tions: where the objective is to maximize the utility of measurements for an underlying
detection or estimation task.
Sensor management problems involving multiple time steps (in which decisions at a
particular stage may utilize information received in all prior stages) can be formulated
and, conceptually, solved using dynamic programming. However, in general the optimal
solution of these problems requires computation and storage of continuous functions
with no ﬁnite parameterization, hence it is intractable even problems involving small
numbers of objects, sensors, control choices and time steps.
This thesis examines several types of sensor resource management problems. We fol
low three diﬀerent approaches: ﬁrstly, we examine performance guarantees that can be
obtained for simple heuristic algorithms applied to certain classes of problems; secondly,
we exploit structure that arises in problems involving multiple independent objects to
eﬃciently ﬁnd optimal or guaranteed nearoptimal solutions; and ﬁnally, we ﬁnd a
heuristic solution to a speciﬁc problem structure that arises in problems involving sen
sor networks.
19
20 CHAPTER 1. INTRODUCTION
1.1 Canonical problem structures
Sensor resource management has received considerable attention from the research com
munity over the past two decades. The following three canonical problem structures,
which have been discussed by several authors, provide a rough classiﬁcation of existing
work, and of the problems examined in this thesis:
Waveform selection. The ﬁrst problem structure involves a single object, which can
be observed using diﬀerent modes of a sensor, but only one mode can be used at
a time. The role of the controller is to select the best mode of operation for the
sensor in each time step. An example of this problem is in object tracking using
radar, in which diﬀerent signals can be transmitted in order to obtain information
about diﬀerent aspects of the object state (such as position, velocity or identity).
Beam steering. A related problem involves multiple objects observed by a sensor.
Each object evolves according to an independent stochastic process. At each time
step, the controller may choose which object to observe; the observation models
corresponding to diﬀerent objects are also independent. The role of the controller
is to select which object to observe in each time step. An example of this problem
is in optical tracking and identiﬁcation using steerable cameras.
Platform steering. A third problem structure arises when the sensor possesses an
internal state that aﬀects which observations are available or the costs of obtain
ing those observations. The internal state evolves according to a fully observed
Markov random process. The controller must choose actions to inﬂuence the sen
sor state such that the usefulness of the observations is optimized. Examples of
this structure include control of UAV sensing platforms, and dynamic routing of
measurements and models in sensor networks.
These three structures can be combined and extended to scenarios involving wave
form selection, beam steering and platform steering with multiple objects and multiple
sensors. An additional complication that commonly arises is when observations require
diﬀerent or random time durations (or, more abstractly, costs) to complete.
1.2 Waveform selection and beam steering
The waveform selection problem naturally arises in many diﬀerent application areas. Its
name is derived from active radar and sonar, where the time/frequency characteristics
Sec. 1.2. Waveform selection and beam steering 21
of the transmitted waveform aﬀect the type of information obtained in the return.
Many other problems share the same structure, i.e., one in which diﬀerent control
decisions obtain diﬀerent types of information about the same underlying process. Other
examples include:
• Passive sensors (e.g., radar warning receivers) often have limited bandwidth, but
can choose which interval of the frequency spectrum to observe at each time. Dif
ferent choices will return information about diﬀerent aspects of the phenomenon
of interest. A similar example is the use of cameras with controllable pan and
zoom to detect, track and identify people or cars.
• In ecological and geological applications, the phenomenon of interest is often com
prised of the state of a large interconnected system. The dependencies within the
system prevent the type of decomposition that is used in beam steering, and sen
sor resource management must be approached as a waveform selection problem
involving diﬀerent observations of the full system. Examples of problems of this
type include monitoring of air quality, ocean temperature and depth mapping,
and weather observation.
• Medical diagnosis concerns the determination of the true physiological state of a
patient, which is evolving in time according to an underlying dynamical system.
The practitioner has at their disposal a range of tests, each of which provides
observations of diﬀerent aspects of the phenomenon of interest. Associated with
each test is a notion of cost, which encompasses time, patient discomfort, and
economical considerations. The essential structure of this problem ﬁts within the
waveform selection category.
Beam steering may be seen to be a special case of the waveform selection problem.
For example, consider the hyperobject that encompasses all objects being tracked.
Choosing to observe diﬀerent constituent objects will result in information relevant to
diﬀerent aspects of the hyperobject. Of course it is desirable to exploit the speciﬁc
structure that exists in the case of beam steering.
Many authors have approached the waveform selection and beam steering problems
by proposing an estimation performance measure, and optimizing the measure over the
next time step. This approach is commonly referred to as greedy or myopic, since it
does not consider future observation opportunities. Most of the nonmyopic extensions
of these methods are either tailored to very speciﬁc problem structure (observation
22 CHAPTER 1. INTRODUCTION
models, dynamics models, etc), or are limited to considering two or three time intervals
(longer planning horizons are typically computationally prohibitive). Furthermore, it
is unclear when additional planning can be beneﬁcial.
1.3 Sensor networks
Networks of wireless sensors have the potential to provide unique capabilities for mon
itoring and surveillance due to the close range at which phenomena of interest can be
observed. Application areas that have been investigated range from agriculture to eco
logical and geological monitoring to object tracking and identiﬁcation. Sensor networks
pose a particular challenge for resource management: not only are there short term
resource constraints due to limited communication bandwidth, but there are also long
term energy constraints due to battery limitations. This necessitates long term plan
ning: for example, excessive energy should not be consumed in obtaining information
that can be obtained a little later on at a much lower cost. Failure to do so will result
in a reduced operational lifetime for the network.
It is commonly the case that the observations provided by sensors are highly infor
mative if the sensor is in the close vicinity of the phenomenon of interest, and compara
tively uninformative otherwise. In the context of object tracking, this has motivated the
use of a dynamically assigned leader node, which determines which sensors should take
and communicate observations, and stores and updates the knowledge of the object as
new observations are obtained. The choice of leader node should naturally vary as the
object moves through the network. The resulting structure falls within the framework
of platform steering, where the sensor state is the currently activated leader node.
1.4 Contributions and thesis outline
This thesis makes contributions in three areas. Firstly, we obtain performance guaran
tees that delineate problems in which additional planning is and is not beneﬁcial. We
then examine two problems in which longterm planning can be beneﬁcial, ﬁnding an
eﬃcient integer programming solution that exploits the structure of beam steering, and
ﬁnally, ﬁnding an eﬃcient heuristic sensor management method for object tracking in
sensor networks.
Sec. 1.4. Contributions and thesis outline 23
1.4.1 Performance guarantees for greedy heuristics
Recent work has resulted in performance guarantees for greedy heuristics in some ap
plications, but there remains no guarantee that is applicable to sequential problems
without very special structure in the dynamics and observation models. The analysis
in Chapter 3 obtains guarantees similar to the recent work in [46] for the sequential
problem structures that commonly arise in waveform selection and beam steering. The
result is quite general in that it applies to arbitrary, time varying dynamics and obser
vation models. Several extensions are obtained, including tighter bounds that exploit
either process diﬀusiveness or objectives involving discount factors, and applicability
to closed loop problems. The results apply to objectives including mutual information,
and the posterior Cram´erRao bound. Examples demonstrate that the bounds are tight,
and counterexamples illuminate larger classes of problems to which they do not apply.
1.4.2 Eﬃcient solution for beam steering problems
The analysis in Chapter 4 exploits the special structure in problems involving large
numbers of independent objects to ﬁnd an eﬃcient solution of the beam steering prob
lem. The analysis from Chapter 3 is utilized to obtain an upper bound on the objective
function. Solutions with guaranteed nearoptimality are found by simultaneously re
ducing the upper bound and raising a matching lower bound.
The algorithm has quite general applicability, admitting time varying observation
and dynamical models, and observations requiring diﬀerent time durations to complete.
Computational experiments demonstrate application to problems involving 50–80 ob
jects planning over horizons up to 60 time slots. An alternative formulation, which is
able to address time invariant rewards with a further computational saving, is also dis
cussed. The methods apply to the same wide range of objectives as Chapter 3, including
mutual information and the posterior Cram´erRao bound.
1.4.3 Sensor network management
In Chapter 5, we seek to trade oﬀ estimation performance and energy consumed in an
object tracking problem. We approach the trade oﬀ between these two quantities by
maximizing estimation performance subject to a constraint on energy cost, or the dual
of this, i.e., minimizing energy cost subject to a constraint on estimation performance.
We assign to each operation (sensing, communication, etc) an energy cost, and then
we seek to develop a mechanism that allows us to choose only those actions for which
the resulting estimation gain received outweighs the energy cost incurred. Our analysis
24 CHAPTER 1. INTRODUCTION
proposes a planning method that is both computable and scalable, yet still captures
the essential structure of the underlying trade oﬀ. Simulation results demonstrate a
dramatic reduction in the communication cost required to achieve a given estimation
performance level as compared to previously proposed algorithms.
Chapter 2
Background
T
HIS section provides an outline of the background theory which we utilize to de
velop our results. The primary problem of interest is that of detecting, tracking
and identifying multiple objects, although many of the methods we discuss could be
applied to any other dynamical process.
Sensor management requires an understanding of several related topics: ﬁrst of all,
one must develop a statistical model for the phenomenon of interest; then one must
construct an estimator for conducting inference on that phenomenon. One must select
an objective that measures how successful the sensor manager decisions have been, and,
ﬁnally, one must design a controller to make decisions using the available inputs.
In Section 2.1, we brieﬂy outline the development of statistical models for object
tracking before describing some of the estimation schemes we utilize in our experiments.
Section 2.2 outlines the theory of stochastic control, the category of problems in which
sensor management naturally belongs. In Section 2.3, we describe the information
theoretic objective functions that we utilize, and certain properties of the objectives
that we utilize throughout the thesis. Section 2.4 details some existing results that
have been applied to related problems to guarantee performance of simple heuristic
algorithms; the focus of Chapter 3 is extending these results to sequential problems.
Section 2.5 brieﬂy outlines the theory of linear and integer programming that we utilize
in Chapter 4. Finally, Section 2.6 surveys the existing work in the ﬁeld, and contrasts
the approaches presented in the later chapters to the existing methods.
2.1 Dynamical models and estimation
This thesis will be concerned exclusively with sensor management systems that are
based upon statistical models. The starting point of such algorithms is a dynamical
model which captures mathematically how the physical process evolves, and how the
observations taken by the sensor relate to the model variables. Having designed this
25
26 CHAPTER 2. BACKGROUND
model, one can then construct an estimator which uses sensor observations to reﬁne
one’s knowledge of the state of the underlying physical process.
Using a Bayesian formulation, the estimator maintains a representation of the condi
tional probability density function (PDF) of the process state conditioned on the obser
vations incorporated. This representation is central to the design of sensor management
algorithms, which seek to choose sensor actions in order to minimize the uncertainty in
the resulting estimate.
In this section, we outline the construction of dynamical models for object tracking,
and then brieﬂy examine several estimation methods that one may apply.
2.1.1 Dynamical models
Traditionally, dynamical models for object tracking are based upon simple observations
regarding behavior of targets and the laws of physics. For example, if we are tracking
an aircraft and it moving at an essentially constant velocity at one instant in time, it
will probably still be moving at a constant velocity shortly afterward. Accordingly, we
may construct a mathematical model based upon Newtonian dynamics. One common
model for nonmaneuvering objects hypothesizes that velocity is a random walk, and
position is the integral of velocity:
˙ v(t) = w(t) (2.1)
˙ p(t) = v(t) (2.2)
The process w(t) is formally deﬁned as a continuous time white noise with strength
Q(t). This strength may be chosen in order to model the expected deviation from the
nominal trajectory.
In tracking, the underlying continuous time model is commonly chosen to be a
stationary linear Gaussian system. Given any such model,
˙ x(t) = F
c
x(t) +w(t) (2.3)
where w(t) is a zeromean Gaussian white noise process with strength Q
c
, we can
construct a discretetime model which has the equivalent eﬀect at discrete sample points.
This model is given by: [67]
x
k+1
= Fx
k
+w
k
(2.4)
where
1
F = exp[F
c
∆t] (2.5)
1
exp[·] denotes the matrix exponential.
Sec. 2.1. Dynamical models and estimation 27
∆t is the time diﬀerence between subsequent samples and w
k
is a zeromean discrete
time Gaussian white noise process with covariance
Q =
_
∆t
0
exp[F
c
τ]Q
c
exp[F
c
τ]
T
dτ (2.6)
As an example, we consider tracking in two dimensions using the nominally constant
velocity model described above:
˙ x(t) =
_
¸
¸
¸
¸
_
0 1 0 0
0 0 0 0
0 0 0 1
0 0 0 0
_
¸
¸
¸
¸
_
x(t) +w(t) (2.7)
where x(t) = [p
x
(t) v
x
(t) p
y
(t) v
y
(t)]
T
and w(t) = [w
x
(t) w
y
(t)]
T
is a continuous time
zeromean Gaussian white noise process with strength Q
c
= qI
2×2
. The equivalent
discretetime model becomes:
x
k+1
=
_
¸
¸
¸
¸
_
1 ∆t 0 0
0 1 0 0
0 0 1 ∆t
0 0 0 1
_
¸
¸
¸
¸
_
x
k
+w
k
(2.8)
where w
k
is a discrete time zeromean Gaussian white noise process with covariance
Q = q
_
¸
¸
¸
¸
_
∆t
3
3
∆t
2
2
0 0
∆t
2
2
∆t 0 0
0 0
∆t
3
3
∆t
2
2
0 0
∆t
2
2
∆t
_
¸
¸
¸
¸
_
(2.9)
Objects undergoing frequent maneuvers are commonly modelled using jump Markov
linear systems. In this case, the dynamical model at any time is a linear system, but
the parameters of the linear system change at discrete time instants; these changes
are modelled through a ﬁnite state Markov chain. While not explicitly explored in this
thesis, the jump Markov linear system can be addressed by the methods and guarantees
we develop. We refer the reader to [6] for further details of estimation using jump
Markov linear systems.
2.1.2 Kalman ﬁlter
The Kalman ﬁlter is the optimal estimator according to most sensible criteria, including
mean square error, mean absolute error and uniform cost, for a linear dynamical system
28 CHAPTER 2. BACKGROUND
with additive white Gaussian noise and linear observations with additive white Gaussian
noise. If we relax the Gaussianity requirement on the noise processes, the Kalman ﬁlter
remains the optimal linear estimator according to the mean square error criterion. We
brieﬂy outline the Kalman ﬁlter below; the reader is referred to [3, 6, 27, 67] for more
indepth treatments.
We consider the discrete time linear dynamical system:
x
k+1
= F
k
x
k
+w
k
(2.10)
commencing from x
0
∼ ^¦x
0
; ˆ x
00
, P
00
¦. The dynamics noise w
k
is a the white noise
process, w
k
∼ ^¦w
k
; 0, Q
k
¦ which is uncorrelated with x
0
. We assume a linear obser
vation model:
z
k
= H
k
x
k
+v
k
(2.11)
where v
k
∼ ^¦v
k
; 0, R
k
¦ is a white noise process that is uncorrelated with x
0
and with
the process v
k
. The Kalman ﬁlter equations include a propagation step:
ˆ x
kk−1
= F
k
ˆ x
k−1k−1
(2.12)
P
kk−1
= F
k
P
k−1k−1
F
T
k
+Q
k
(2.13)
and an update step:
ˆ x
kk
= ˆ x
kk−1
+K
k
[z
k
−H
k
ˆ x
kk−1
] (2.14)
P
kk
= P
kk−1
−K
k
H
k
P
kk−1
(2.15)
K
k
= P
kk−1
H
T
k
[H
k
P
kk−1
H
T
k
+R
k
]
−1
(2.16)
ˆ x
kk−1
is the estimate of x
k
conditioned on observations up to and including time
(k − 1), while P
kk−1
is the covariance of this error. In the Gaussian case, these
two parameters completely describe the posterior distribution, i.e., p(x
k
[z
0:k−1
) =
^¦x
k
; ˆ x
kk−1
, P
kk−1
¦. Similar comments apply to ˆ x
kk
and P
kk
.
Finally, we note that the recursive equations for the covariance P
kk−1
and P
kk
, and
the gain K
k
are both invariant to the value of the observations received z
k
. Accordingly,
both the ﬁlter gain and covariance may be computed oﬄine in advance and stored. As
we will see in Section 2.3, in the linear Gaussian case, the uncertainty in an estimate
as measured through entropy is dependent only upon the covariance matrix, and hence
this too can be calculated oﬄine.
Sec. 2.1. Dynamical models and estimation 29
2.1.3 Linearized and extended Kalman ﬁlter
While the optimality guarantees for the Kalman ﬁlter apply only to linear systems,
the basic concept is regularly applied to nonlinear systems through two algorithms
known as the linearized and extended Kalman ﬁlters. The basic concept is that a mild
nonlinearity may be approximated as being linear about a nominal point though a
Taylor series expansion. In the case of the linearized Kalman ﬁlter, the linearization
point is chosen in advance; the extended Kalman ﬁlter relinearizes online about the
current estimate value. Consequently, the linearized Kalman ﬁlter retains the ability to
calculate ﬁlter gains and covariance matrices in advance, whereas the extended Kalman
ﬁlter must compute both of these online.
In this document, we assume that the dynamical model is linear, and we present
the equations for the linearized and extended Kalman ﬁlters for the case in which the
only nonlinearity present is in the observation equation. This is most commonly the
case in tracking applications. The reader is directed to [68], the primary source for
this material, for information on the nonlinear dynamical model case. The model we
consider is:
x
k+1
= F
k
x
k
+w
k
(2.17)
z
k
= h(x
k
, k) +v
k
(2.18)
where, as in Section 2.1.2, w
k
and v
k
are uncorrelated white Gaussian noise processes
with known covariance, both of which are uncorrelated with x
0
. The linearized Kalman
ﬁlter calculates a linearized measurement model about a prespeciﬁed nominal state
trajectory ¦¯ x
k
¦
k=1,2,...
:
z
k
≈ h(¯ x
k
, k) +H(¯ x
k
, k)[x
k
− ¯ x
k
] +v
k
(2.19)
where
H(¯ x
k
, k) = [∇
x
h(x, k)
T
]
T
[
x=¯ x
k
(2.20)
and ∇
x
[
∂
∂x
1
,
∂
∂x
2
, . . . ,
∂
∂xnx
]
T
where n
x
is the number of elements in the vector x.
The linearized Kalman ﬁlter update equation is therefore:
ˆ x
kk
= ˆ x
kk−1
+K
k
¦z
k
−h(¯ x
k
, k) −H(¯ x
k
, k)[x
k
− ¯ x
k
]¦ (2.21)
P
kk
= P
kk−1
−K
k
H(¯ x
k
, k)P
kk−1
(2.22)
K
k
= P
kk−1
H(¯ x
k
, k)
T
[H(¯ x
k
, k)P
kk−1
H(¯ x
k
, k)
T
+R
k
]
−1
(2.23)
30 CHAPTER 2. BACKGROUND
Again we note that the ﬁlter gain and covariance are both invariant to the observation
values, and hence they can be precomputed.
The extended Kalman ﬁlter diﬀers only in the in the point about which the model
is linearized. In this case, we linearize about the current state estimate:
z
k
≈ h(ˆ x
kk−1
, k) +H(ˆ x
kk−1
, k)[x
k
− ˆ x
kk−1
] +v
k
(2.24)
The extended Kalman ﬁlter update equation becomes:
ˆ x
kk
= ˆ x
kk−1
+K
k
[z
k
−h(ˆ x
kk−1
, k)] (2.25)
P
kk
= P
kk−1
−K
k
H(ˆ x
kk−1
, k)P
kk−1
(2.26)
K
k
= P
kk−1
H(ˆ x
kk−1
, k)
T
[H(ˆ x
kk−1
, k)P
kk−1
H(ˆ x
kk−1
, k)
T
+R
k
]
−1
(2.27)
Since the ﬁlter gain and covariance are dependent on the state estimate and hence the
previous observation values, the extended Kalman ﬁlter must be computed online.
2.1.4 Particle ﬁlters and importance sampling
In many applications, substantial nonlinearity is encountered in observation models,
and the coarse approximation performed by the extended Kalman ﬁlter is inadequate.
This is particularly true in sensor networks, since the local focus of observations yields
much greater nonlinearity in range or bearing observations than arises when sensors are
distant from the objects under surveillance. This nonlinearity can result in substantial
multimodality in posterior distributions (such as results when one receives two range
observations from sensors in diﬀerent locations) which cannot be eﬃciently modelled
using a Gaussian distribution. We again assume a linear dynamical model (although
this is by no means required) and a nonlinear observation model:
x
k+1
= F
k
x
k
+w
k
(2.28)
z
k
= h(x
k
, k) +v
k
(2.29)
We apply the same assumptions on w
k
and v
k
as in previous sections.
The particle ﬁlter [4, 28, 83] is an approximation which is commonly used in problems
involving a high degree of nonlinearity and/or nonGaussianity. The method is based
on importance sampling, which enables one to approximate an expectation under one
distribution using samples drawn from another distribution. Using a particle ﬁlter,
the conditional PDF of object state x
k
conditioned on observations received up to
Sec. 2.1. Dynamical models and estimation 31
and including time k, z
0:k
, p(x
k
[z
0:k
), is approximated through a set of N
p
weighted
samples:
p(x
k
[z
0:k
) ≈
Np
i=1
w
i
k
δ(x
k
−x
i
k
) (2.30)
Several variants of the particle ﬁlter diﬀer in the way in which this approximated is
propagated and updated from step to step. Perhaps the most common (and the easiest
to implement) is the Sampling Importance Resampling (SIR) ﬁlter. This algorithm ap
proximates the propagation step by using the dynamics model as a proposal distribution,
drawing a random sample for each particle from the distribution x
i
k+1
∼ p(x
k+1
[x
i
k
),
to yield an approximation of the prior density at the next time step of:
p(x
k+1
[z
0:k
) ≈
Np
i=1
w
i
k
δ(x
k+1
−x
i
k+1
) (2.31)
The algorithm then uses importance sampling to reweight these samples to implement
the Bayes update rule for incorporating observations:
p(x
k+1
[z
0:k+1
) =
p(z
k+1
[x
k+1
)p(x
k+1
[z
0:k
)
p(z
k+1
[z
0:k
)
(2.32)
=
Np
i=1
w
i
k
p(z
k+1
[x
i
k+1
)δ(x
k+1
−x
i
k+1
)
p(z
k+1
[z
0:k
)
(2.33)
=
Np
i=1
w
i
k+1
δ(x
k+1
−x
i
k+1
) (2.34)
where
w
i
k+1
=
w
i
k
p(z
k+1
[x
i
k+1
)
Np
j=1
w
j
k
p(z
k+1
[x
j
k+1
)
(2.35)
The ﬁnal step of the SIR ﬁlter is to draw a new set of N
p
samples from the updated dis
tribution to reduce the number of samples allocated to unlikely regions and reinitialize
the weights to be uniform.
A more sophisticated variant of the particle ﬁlter is the Sequential Importance Sam
pling (SIS) algorithm. Under this algorithm, for each previous sample x
i
k
, we draw a new
sample at the next time step, x
k+1
, from the proposal distribution q(x
k+1
[x
i
k
, z
k+1
).
This is commonly approximated using a linearization of the measurement model for
z
k+1
(Eq. (2.29)) about the point F
k
x
i
k
, as described in Eq. (2.19). This distribution
can be obtained using the extended Kalman ﬁlter equations: the Dirac delta function
δ(x
k
−x
i
k
) at time k will diﬀuse to give:
p(x
k+1
[x
i
k
) = ^(x
k+1
; F
k
x
i
k
; Q
k
) (2.36)
32 CHAPTER 2. BACKGROUND
at time (k +1). This distribution can be updated using the EKF update equation (Eq.
(2.25)–(2.27)) to obtain:
q(x
k+1
[x
i
k
, z
k+1
) = ^(x
k+1
; ˆ x
i
k+1
, P
i
k+1
) (2.37)
where
ˆ x
i
k+1
= F
k
x
i
k
+K
i
k+1
[z
k+1
−h(F
k
x
i
k
, k)] (2.38)
P
i
k+1
= Q
k
−K
i
k+1
H(F
k
x
i
k
, k)Q
k
(2.39)
K
i
k+1
= Q
k
H(F
k
x
i
k
, k)
T
[H(Fx
i
k
, k)Q
k
H(Fx
i
k
, k)
T
+R
k
]
−1
(2.40)
Because the linearization is operating in a localized region, one can obtain greater
accuracy than is possible using the EKF (which uses a single linearization point). A
new particle x
i
k+1
is drawn from the distribution in Eq. (2.37), and the importance
sampling weight w
i
k+1
is calculated by
w
i
k+1
= cw
i
k
p(z
k+1
[x
i
k+1
)p(x
i
k+1
[x
i
k
)
q(x
i
k+1
[x
i
k
, z
k+1
)
(2.41)
where c is the normalization constant necessary to ensure that
Np
i=1
w
i
k+1
= 1, and
p(z
k+1
[x
i
k+1
) = ^¦z
k+1
; h(x
i
k+1
, k), R
k
¦. The resulting approximation for the distri
bution of x
k+1
conditioned on the measurements z
0:k+1
is:
p(x
k+1
[z
0:k+1
) ≈
Np
i=1
w
i
k+1
δ(x
k+1
−x
i
k+1
) (2.42)
At any point in time, a Gaussian representation can be momentmatched to the
particle distribution by calculating the mean and covariance:
ˆ x
k
=
Np
i=1
w
i
k
x
i
k
(2.43)
P
k
=
Np
i=1
w
i
k
(x
i
k
− ˆ x
k
)(x
i
k
− ˆ x
k
)
T
(2.44)
2.1.5 Graphical models
In general, the complexity of an estimation problem increases exponentially as the
number of variables increases. Probabilistic graphical models provide a framework for
Sec. 2.1. Dynamical models and estimation 33
recognizing and exploiting structure which allows for eﬃcient solution. Here we brieﬂy
describe Markov random ﬁelds, a variety of undirected graphical model. Further details
can be found in [38, 73, 92].
We assume that our model is represented as as graph ( consisting of vertices 1 and
edges c ⊆ 1 1. Corresponding to each vertex v ∈ 1 is a random variable x
v
and
several possible observations (from which we may choose some subset) of that variable,
¦z
1
v
, . . . , z
nv
v
¦. Edges represent dependences between the local random variables, i.e.,
(v, w) ∈ c denotes that variables x
v
and x
w
have direct dependence on each other. All
observations are assumed to depend only on the corresponding local random variable.
In this case, the joint distribution function can be shown to factorize over the max
imal cliques of the graph. A clique is deﬁned as a set of vertices ( ⊆ 1 which are fully
connected, i.e., (v, w) ∈ c ∀ v, w ∈ (. A maximal clique is a clique which is not a subset
of any other clique in the graph (i.e., a clique for which no other vertex can be added
while still retaining full connectivity). Denoting the collection of all maximal cliques as
M, the joint distribution of variables and observations can be written as:
p(¦x
v
, ¦z
1
v
, . . . , z
nv
v
¦¦
v∈V
) ∝
C∈M
ψ(¦x
v
¦
v∈C
)
v∈V
nv
i=1
ψ(x
v
, z
i
v
) (2.45)
Graphical models are useful in recognizing independence structures which exist. For
example, two random variables x
v
and x
w
(v, w ∈ 1) are independent conditioned on a
given set of vertices T if there is no path connecting vertices v and w which does not
pass through any vertex in T. Obviously, if we denote by ^(v) the neighbors of vertex
v, then x
v
will be independent of all other variables in the graph conditioned on ^(v).
Estimation problems involving undirected graphical models with a tree as the graph
structure can be solved eﬃciently using the belief propagation algorithm. Some prob
lems involving sparse cyclic graphs can be addressed eﬃciently by combining small
numbers of nodes to obtain tree structure (referred to as a junction tree), but in gen
eral approximate methods, such as loopy belief propagation, are necessary. Estimation
in time series is a classical example of a treebased model: the Kalman ﬁlter (or, more
precisely, the Kalman smoother) may be seen to be equivalent to belief propagation
specialized to linear Gaussian Markov chains, while [35, 88, 89] extends particle ﬁltering
from Markov chains to general graphical models using belief propagation.
2.1.6 Cram´erRao bound
The Cram´erRao bound (CRB) [84, 91] provides a lower limit on the mean square error
performance achievable by any estimator of an underlying quantity. The simplest and
34 CHAPTER 2. BACKGROUND
most common form of the bound, presented below in Theorem 2.1, deals with unbiased
estimates of nonrandom parameters (i.e., parameters which are not endowed with a
prior probability distribution). We omit the various regularity conditions; see [84, 91]
for details. The notation A _ B implies that the matrix A−B is positive semideﬁnite
(PSD). We adopt convention from [90] that ∆
z
x
= ∇
x
∇
T
z
.
Theorem 2.1. Let x be a nonrandom vector parameter, and z be an observation with
distribution p(z[x) parameterized by x. Then any unbiased estimator of x based on z,
ˆ x(z), must satisfy the following bound on covariance:
E
zx
_
[ˆ x(z) −x][ˆ x(z) −x]
T
_
_ C
z
x
[J
z
x
]
−1
where J
z
x
is the Fisher information matrix, which can be calculated equivalently through
either of the following two forms:
J
z
x
E
zx
_
[∇
x
log p(z[x)][∇
x
log p(z[x)]
T
_
= E
zx
¦−∆
x
x
log p(z[x)¦
From the ﬁrst form above we see that the Fisher information matrix is positive semi
deﬁnite.
The posterior Cram´erRao bound (PCRB) [91] provides a similar performance limit
for dealing with random parameters. While the bound takes on the same form, the
Fisher information matrix now decomposes into two terms: one involving prior in
formation about the parameter, and another involving information gained from the
observation. Because we take an expectation over the possible values of x as well as z,
the bound applies to any estimator, biased or unbiased.
Theorem 2.2. Let x be a random vector parameter with probability distribution p(x),
and z be an observation with model p(z[x). Then any estimator of x based on z, ˆ x(z),
must satisfy the following bound on covariance:
E
x,z
_
[ˆ x(z) −x][ˆ x(z) −x]
T
_
_ C
z
x
[J
z
x
]
−1
where J
z
x
is the Fisher information matrix, which can be calculated equivalently through
Sec. 2.2. Markov decision processes 35
any of the following forms:
J
z
x
E
x,z
_
[∇
x
log p(x, z)][∇
x
log p(x, z)]
T
_
= E
x,z
¦−∆
x
x
log p(x, z)¦
= E
x
_
[∇
x
log p(x)][∇
x
log p(x)]
T
_
+ E
x,z
_
[∇
x
log p(z[x)][∇
x
log p(z[x)]
T
_
= E
x
¦−∆
x
x
log p(x)¦ + E
x,z
¦−∆
x
x
log p(z[x)¦
= E
x,z
_
[∇
x
log p(x[z)][∇
x
log p(x[z)]
T
_
= E
x,z
¦−∆
x
x
log p(x[z)¦
The individual terms:
J
∅
x
E
x
¦−∆
x
x
log p(x)¦
¯
J
z
x
E
x,z
¦−∆
x
x
log p(z[x)¦
are both positive semideﬁnite. We also deﬁne C
∅
x
[J
∅
x
]
−1
.
Convenient expressions for calculation of the PCRB in nonlinear ﬁltering problems
can be found in [90]. The recursive expressions are similar in form to the Kalman ﬁlter
equations.
2.2 Markov decision processes
Markov Decision Processes (MDPs) provide a natural way of formulating problems
involving sequential structure, in which decisions are made incrementally as additional
information is received. We will concern ourselves primarily with problems involving
planning over a ﬁnite number of steps (socalled ﬁnite horizon problems); in practice
we will design our controller by selecting an action for the current time considering the
following N time steps (referred to as rolling horizon or receding horizon control). The
basic problem formulation includes:
State. We denote by X
k
∈ A the decision state of the system at time k. The decision
state is a suﬃcient statistic for all past and present information upon which the
controller can make its decisions. The suﬃcient statistic must be chosen such
that future values are independent of past values conditioned on the present value
(i.e., it must form a Markov process).
36 CHAPTER 2. BACKGROUND
Control. We denote by u
k
∈ 
X
k
k
the control to be applied to the system at time k.

X
k
k
⊆  is the set of controls available at time k if the system is in state X
k
. In
some problem formulations this set will vary with time and state; in others it will
remain constant.
Transition. If the state at time k is X
k
and control u
k
is applied, then the state at time
(k+1), X
k+1
, will be distributed according to the probability measure P([X
k
; u
k
).
Reward. The objective of the system is speciﬁed as a reward (or cost) to be maximized
(or minimized). This consists of two components: the perstage reward g
k
(X
k
, u
k
),
which is the immediate reward if control u
k
is applied at time k from state X
k
,
and the terminal reward g
N
(X
N
), which is the reward associated with arriving in
state X
N
on completion of the problem.
The solution of problems with this structure comes in the form of a policy, i.e., a
rule that speciﬁes which control one should apply if one arrives in a particular state
at a particular time. We denote by µ
k
: A →  the policy for time k, and by π =
¦µ
1
, . . . , µ
N
¦ the timevarying policy for the ﬁnite horizon problem. The expected
reward to go of a given policy can be found through the following backward recursion:
J
π
k
(X
k
) = g
k
(X
k
, µ
k
(X
k
)) + E
X
k+1
∼P(·X
k
,µ
k
(X
k
))
J
π
k+1
(X
k+1
) (2.46)
commencing from the terminal condition J
π
N
(X
N
) = g
N
(X
N
). The expected reward to
go of the optimal policy can be formulated similarly as a backward recursion:
J
∗
k
(X
k
) = max
u
k
∈U
X
k
k
_
g
k
(X
k
, u
k
) + E
X
k+1
∼P(·X
k
,u
k
)
J
∗
k+1
(X
k+1
)
_
(2.47)
commencing from the same terminal condition, J
∗
N
(X
N
) = g
N
(X
N
). The optimal policy
is implicity speciﬁed by the optimal reward to go through the expression
µ
∗
k
(X
k
) = arg max
u
k
∈U
X
k
k
_
g
k
(X
k
, u
k
) + E
X
k+1
∼P(·X
k
,u
k
)
J
∗
k+1
(X
k+1
)
_
(2.48)
The expression in Eq. (2.46) can be used to evaluate the expected reward of a policy
when the cardinality of the state space A is small enough to allow computation and
storage for each element. Furthermore, assuming that the optimization is solvable,
Eq. (2.47) and Eq. (2.48) may be used to determine the optimal policy and its expected
reward. When the cardinality of the state space A is inﬁnite, this process in general
requires inﬁnite computation and storage, although there are special cases (such as
LQG) in which the reward functions admit a ﬁnite parameterization.
Sec. 2.2. Markov decision processes 37
2.2.1 Partially observed Markov decision processes
Partially observed MDPs (POMDPs) are a special case of MDPs in which one seeks to
control a dynamical system for which one never obtains the exact value of the state of
the system (denoted x
k
), but rather only noisecorrupted observations (denoted z
k
) of
some portion of the system state at each time step. In this case, one can reformulate
the problem as a fully observed MDP, in which the decision state is either the infor
mation vector (i.e., the history of all controls applied to the system and the resulting
observations), or the conditional probability distribution of system state conditioned on
previously received observations (which forms a suﬃcient statistic for the information
vector [9]).
The fundamental assumption of POMDPs is that the reward per stage and termi
nal reward, g
k
(x
k
, u
k
) and g
N
(x
N
), can be expressed as functions of the state of the
underlying system. Given the conditional probability distribution of system state, one
can then calculate an induced reward as the expected value of the given quantity, i.e.,
g
k
(X
k
, u
k
) =
x
k
g
k
(x
k
, u
k
)p(x
k
[z
0:k−1
; u
0:k−1
) (2.49)
g
N
(X
N
) =
x
N
g
N
(x
N
)p(x
N
[z
0:N−1
; u
0:N−1
) (2.50)
where X
k
= p(x
k
[z
0:k−1
; u
0:k−1
) is the conditional probability distribution which forms
the decision state of the system. There is a unique structure which results: the induced
reward per stage will be a linear function of the decision state, X
k
. In this case, one
can show [87] that the reward to go function at all time steps will subsequently be a
piecewise linear convex
2
function of the conditional probability distribution X
k
, i.e., for
some 1
k
and some v
i
k
(), i ∈ 1
k
,
J
∗
k
(X
k
) = max
i∈I
k
x
k
v
i
k
(x
k
)p(x
k
[z
0:k−1
; u
0:k−1
) (2.51)
Solution strategies for POMDPs exploit this structure extensively, solving for an optimal
(or nearoptimal) choice of these parameters. The limitation of these methods is that
they are restricted to small state spaces, as the size of the set 1
k
which is needed
in practice grows rapidly with the number of states, observation values and control
values. Theoretically, POMDPs are PSPACEcomplete (i.e., as hard as any problem
which is solvable using an amount memory that is polynomial in the problem size, and
unlimited computation time) [15, 78]; empirically, a 1995 study [63] found solution times
2
or piecewise linear concave in the case of minimizing cost rather than maximizing reward.
38 CHAPTER 2. BACKGROUND
in the order of hours for problems involving ﬁfteen underlying system states, ﬁfteen
observation values and four actions. Clearly, such strategies are intractable in cases
where there is an inﬁnite state space, e.g., when the underlying system state involves
continuous elements such as position and velocity. Furthermore, attempts to discretize
this type of state space are also unlikely to arrive at a suﬃciently small number of states
for this class of algorithm to be applicable.
2.2.2 Open loop, closed loop and open loop feedback
The assumption inherent in MDPs is that the decision state is revealed to the controller
at each decision stage. The optimal policy uses this new information as it becomes avail
able, and anticipates the arrival of future information through the Markov transition
model. This is referred to as closed loop control (CLC). Open loop control (OLC)
represents the opposite situation: where one constructs a plan for the ﬁnite horizon
(i.e., single choice of which control to apply at each time, as opposed to a policy), and
neither anticipates the availability of future information, nor utilizes that information
as it arrives.
Open Loop Feedback Control (OLFC) is a compromise between these two extremes:
like an open loop controller, a plan (rather than a policy) is constructed for the ﬁnite
horizon at each step of the problem. This plan does not anticipate the availability
future information; only a policy can do so. However, unlike an open loop controller,
when new information is received it is utilized in constructing an updated plan. The
controller operates by constructing a plan for the ﬁnite horizon, executing one or more
steps of that plan, and then constructing a new plan which incorporates the information
received in the interim.
There are many problems in which solving the MDP is intractable, yet open loop
plans can be found within computational limitations. The OLFC is a commonly used
suboptimal method in these situations. One can prove that the performance of the
optimal OLFC is no worse than the optimal OLC [9]; the diﬀerence in performance
between OLFC and CLC can be arbitrarily large.
2.2.3 Constrained dynamic programming
In Chapter 5 we will consider sensor resource management in sensor networks. In this
application, there is a fundamental tradeoﬀ which arises between estimation perfor
mance and energy cost. A natural way of approaching such a tradeoﬀ is as a constrained
optimization, optimizing one quantity subject to a constraint on the other.
Sec. 2.2. Markov decision processes 39
Constrained dynamic programming has been explored by previous authors in [2, 13,
18, 94]. We describe a method based on Lagrangian relaxation, similar to that in [18],
which yields a convenient method of approximate evaluation for the problem examined
in Chapter 5.
We seek to minimize the cost over an Nstep rolling horizon, i.e., at time k, we
minimize the cost incurred in the planning horizon involving steps ¦k, . . . , k + N −1¦.
Denoting by µ
k
(X
k
) the control policy for time k, and by π
k
= ¦µ
k
, . . . , µ
k+N−1
¦ the set
of policies for the next N time steps, we seek the policy corresponding to the optimal
solution to the constrained minimization problem:
min
π
E
_
k+N−1
i=k
g(X
i
, µ
i
(X
i
))
_
s.t. E
_
k+N−1
i=k
G(X
i
, µ
i
(X
i
))
_
≤ M (2.52)
where g(X
k
, u
k
) is the perstage cost and G(X
k
, u
k
) is the perstage contribution to the
additive constraint function. We address the constraint through a Lagrangian relax
ation, a common approximation method for discrete optimization problems, by deﬁning
the dual function:
J
D
k
(X
k
, λ) = min
π
E
_
k+N−1
i=k
g(X
i
, µ
i
(X
i
)) + λ
_
k+N−1
i=k
G(X
i
, µ
i
(X
i
)) −M
__
(2.53)
and solving the dual optimization problem involving this function:
J
L
k
(X
k
) = max
λ≥0
J
D
k
(X
k
, λ) (2.54)
We note that the dual function J
D
k
(X
k
, λ) takes the form of an unconstrained dynamic
program with a modiﬁed perstage cost:
¯ g(X
k
, u
k
, λ) = g(X
k
, u
k
) + λG(X
k
, u
k
) (2.55)
The optimization of the dual problem provides a lower bound to the minimum value
of the original constrained problem; the presence of a duality gap is possible since the
optimization space is discrete. The size of the duality gap is given by the expression
λE[
i
G(X
i
, µ
i
(X
i
)) − M], where π
k
= ¦µ
i
(, )¦
i=k:k+N−1
is the policy attaining the
minimum in Eq. (2.53) for the value of λ attaining the maximum in Eq. (2.54). If
it happens that the optimal solution produced by the dual problem has no duality
40 CHAPTER 2. BACKGROUND
gap, then the resulting solution is also the optimal solution of the original constrained
problem. This can occur in one of two ways: either the Lagrange multiplier λ is zero,
such that the solution of the unconstrained problem satisﬁes the constraint, or the
solution yields a result for which the constraint is tight. If a duality gap exists, a better
solution may exist satisfying the constraint; however, the solution returned would have
been optimal if the constraint level had been lower such that the constraint was tight.
The method described in [18] avoids a duality gap utilizing randomized policies.
Conceptually, the dual problem in Eq. (2.54) can be solved using a subgradient
method [10]. The following expression can be seen to be a supergradient
3
of the dual
objective:
S(X
k
, π
k
, λ) = E
_
i
G(X
i
, µ
i
(X
i
)) −M
_
(2.56)
In other words, S(X
k
, π
k
, λ) ∈ ∂J
D
k
(X
k
, λ), where ∂ denotes the superdiﬀerential, i.e.,
the set of all supergradients. The subgradient method operates according to the same
principle as a gradient search, iteratively stepping in the direction of a subgradient
with a decreasing step size [10]. For a single constraint, one may also employ methods
such a line search; for multiple constraints the linear programming column generation
procedure described in [16, 94] can be more eﬃcient.
2.3 Information theoretic objectives
In some circumstances, the most appropriate choice of reward function might be obvious
from the system speciﬁcation. For example, if a sensing system is being used to estimate
the location of a stranded yachtsman in order to minimize the distance from the survivor
to where airdropped supplies land, then a natural objective would be to minimize the
expected landing distance, or to maximize the probability that the distance is less than a
critical threshold. Each of these relates directly to a speciﬁc quantity at a speciﬁc time.
As the highlevel system objective becomes further removed from the performance of the
sensing system, the most appropriate choice of reward function becomes less apparent.
When an application demands continual tracking of multiple objects without a direct
terminal objective, it is unclear what reward function should be selected.
Entropy is a commonlyused measure of uncertainty in many applications including
sensor resource management, e.g., [32, 41, 61, 95]. This section explores the deﬁnitions
of and basic inequalities involving entropy and mutual information. All of the results
3
Since we are maximizing a nondiﬀerentiable concave function rather than minimizing a non
diﬀerentiable convex function, subgradients are replaced by supergradients.
Sec. 2.3. Information theoretic objectives 41
presented are wellknown, and can be found in classical texts such as [23]. Throughout
this document we use Shannon’s entropy, as opposed to the generalization referred to as
Renyi entropy. We will exploit various properties that are unique to Shannon entropy.
2.3.1 Entropy
Entropy, joint entropy and conditional entropy are deﬁned as:
H(x) = −
_
p(x) log p(x)dx (2.57)
H(x, z) = −
__
p(x, z) log p(x, z)dxdz (2.58)
H(x[z) = −
_
p(z)
_
p(x[z) log p(x[z)dxdz (2.59)
= H(x, z) −H(z) (2.60)
The above deﬁnitions relate to diﬀerential entropy, which concerns continuous variables.
If the underlying sets are discrete, then a counting measure is used, eﬀectively replacing
the integral by a summation. In the discrete case, we have H(x) ≥ 0. It is traditional
to use a base2 logarithm when dealing with discrete variables, and a natural logarithm
when dealing with continuous quantities. We will also use a natural logarithm in cases
involving a mixture of continuous and discrete quantities.
The conditioning in H(x[z) in Eq. (2.59) is on the random variable z, hence an
expectation is performed over the possible values that the variable may ultimately
assume. We can also condition on a particular value of a random variable:
H(x[z = ζ) = −
_
p(x[z = ζ) log p(x[z = ζ)dx (2.61)
We will sometimes use the notation H(x[ˇ z) to denote conditioning on a particular value,
i.e., H(x[ˇ z) H(x[z = ˇ z). Comparing Eq. (2.59) and Eq. (2.61), we observe that:
H(x[z) =
_
p
z
(ζ)H(x[z = ζ)dζ (2.62)
42 CHAPTER 2. BACKGROUND
2.3.2 Mutual information
Mutual information (MI) is deﬁned as the expected reduction in entropy in one random
variable due to observation of another variable:
I(x; z) =
__
p(x, z) log
p(x, z)
p(x)p(z)
dxdz (2.63)
= H(x) −H(x[z) (2.64)
= H(z) −H(z[x) (2.65)
= H(x) + H(z) −H(x, z) (2.66)
Like conditional entropy, conditional MI can be deﬁned with conditioning on either a
random variable, or a particular value. In either case, the conditioning appears in all
terms of the deﬁnition, i.e., in the case of conditioning on a random variable y,
I(x; z[y) =
_
p
y
(ψ)
__
p(x, z[y = ψ) log
p(x, z[y = ψ)
p(x[y = ψ)p(z[y = ψ)
dxdzdψ (2.67)
= H(x[y) −H(x[z, y) (2.68)
= H(z[y) −H(z[x, y) (2.69)
= H(x[y) + H(z[y) −H(x, z[y) (2.70)
and in the case of conditioning on a particular value, ψ:
I(x; z[y = ψ) =
__
p(x, z[y = ψ) log
p(x, z[y = ψ)
p(x[y = ψ)p(z[y = ψ)
dxdz (2.71)
= H(x[y = ψ) −H(x[z, y = ψ) (2.72)
= H(z[y = ψ) −H(z[x, y = ψ) (2.73)
= H(x[y = ψ) + H(z[y = ψ) −H(x, z[y = ψ) (2.74)
Again, we will sometimes use the notation I(x; z[ˇ y) to indicate conditioning on a par
ticular value, i.e. I(x; z[ˇ y) I(x; z[y = ˇ y). Also note that, like conditional entropy, we
can write:
I(x; z[y) =
_
p
y
(ψ)I(x; z[y = ψ)dψ (2.75)
The chain rule of mutual information allows us to expand a mutual information
expression into the sum of terms:
I(x; z
1
, . . . , z
n
) =
n
i=1
I(x; z
i
[z
1
, . . . , z
i−1
) (2.76)
Sec. 2.3. Information theoretic objectives 43
The ith term in the sum, I(x; z
i
[z
1
, . . . , z
i−1
) represents the incremental gain we obtain
in our knowledge of x due to the new observation z
i
, conditioned on the previous
observation random variables (z
1
, . . . , z
i−1
).
Suppose we have observations (z
1
, . . . , z
n
) of an underlying state (x
1
, . . . , x
n
), where
observation z
i
is independent of all other observations and underlying states when
conditioned on x
i
. In this case, we ﬁnd:
I(x
1
, . . . , x
n
; z
i
) = H(z
i
) −H(z
i
[x
1
, . . . , x
n
)
= H(z
i
) −H(z
i
[x
i
)
= I(x
i
; z
i
) (2.77)
In this case, the chain rule may be written as:
I(x
1
, . . . , x
n
; z
1
, . . . , z
n
) =
n
i=1
I(x
i
; z
i
[z
1
, . . . , z
i−1
) (2.78)
It can be shown (through Jensen’s inequality) that mutual information is nonneg
ative, i.e., I(x; z) ≥ 0, with equality if and only if x and z are independent.
4
Since
I(x; z) = H(x) −H(x[z), this implies that H(x) ≥ H(x[z), i.e., that conditioning on a
random variable reduces entropy. The following example illustrates that conditioning
on a particular value of a random variable may not reduce entropy.
Example 2.1. Suppose we want to infer a state x ∈ ¦0, . . . , N¦ where p(x = 0) =
(N − 1)/N and p(x = i) = 1/N
2
, i ,= 0 (assume N > 1). Calculating the entropy, we
obtain H(x) = log N −
1
N
log
(N−1)
N−1
N
=
1
N
log N +
1
N
log
N
N
(N−1)
N−1
> 0. We have an
observation z ∈ ¦0, 1¦, with p(z = 0[x = 0) = 1 and p(z = 0[x ,= 0) = 0. If we receive
the observation value z = 0, then we know that x = 0, hence H(x[z = 0) = 0 < H(x). If
we receive the observation value z = 1 then H(x[z = 1) = log N > H(x). Conditioning
on the random variable, we ﬁnd H(x[z) =
1
N
log N < H(x).
2.3.3 KullbackLeibler distance
The relative entropy or KullbackLeibler (KL) distance is a measure of the diﬀerence
between two probability distributions, deﬁned as:
D(p(x)[[q(x)) =
_
p(x) log
p(x)
q(x)
dx (2.79)
4
This is also true for conditional MI, with conditioning on either an observation random variable or
an observation value, with equality iﬀ x and z are independent under the respective conditioning.
44 CHAPTER 2. BACKGROUND
Comparing Eq. (2.79) with Eq. (2.63), we obtain:
I(x; z) = D(p(x, z)[[p(x)p(z)) (2.80)
Manipulating Eq. (2.63) we also obtain:
I(x; z) =
_
p(z)
_
p(x[z) log
p(x[z)
p(x)
dxdz = E
z
D(p(x[z)[[p(x)) (2.81)
Therefore the MI between x and z is equivalent to the expected KL distance between
the posterior distribution p(x[z) and the prior distribution p(x).
Another interesting relationship can be obtained by considering the expected KL
distance between the posterior given a set of observations ¦z
1
, . . . , z
n
¦ from which we
may choose a single observation, and the posterior given the single observation z
u
(u ∈ ¦1, . . . , n¦):
ED(p(x[z
1
, . . . , z
n
)[[p(x[z
u
))
=
_
p(z
1
, . . . , z
n
)
_
p(x[z
1
, . . . , z
n
) log
p(x[z
1
, . . . , z
n
)
p(x[z
u
)
dxdz
1
, . . . , dz
n
=
_
p(x, z
1
, . . . , z
n
) log
p(x, z
1
, . . . , z
n
)p(z
u
)
p(x, z
u
)p(z
1
, . . . , z
n
)
dxdz
1
, . . . , dz
n
=
_
p(x, z
1
, . . . , z
n
)
_
log
p(x)p(z
u
)
p(x, z
u
)
+ log
p(z
1
, . . . , z
n
, x)
p(x)p(z
1
, . . . , z
n
)
_
dxdz
1
, . . . , dz
n
= −I(x; z
u
) + I(x; z
1
, . . . , z
n
)
Since the second term is invariant to the choice of u, we obtain the result that choosing
u to maximize the MI between the state x and the observation z
u
is equivalent to
minimizing the expected KL distance between the posterior distribution of x given all
observations and the posterior given only the chosen observation z
u
.
2.3.4 Linear Gaussian models
Entropy is closely related to variance for Gaussian distributions. The entropy of
an ndimensional multivariate Gaussian distribution with covariance P is equal to
1
2
log [2πeP[ =
n
2
log 2πe +
1
2
log [P[. Thus, under linearGaussian assumptions, mini
mizing conditional entropy is equivalent to minimizing the determinant of the posterior
covariance, or the volume of the uncertainty hyperellipsoid.
Suppose x and z are jointly Gaussian random variables with covariance:
P =
_
P
x
P
xz
P
T
xz
P
z
_
Sec. 2.3. Information theoretic objectives 45
Then the mutual information between x and z is given by:
I(x; z) =
1
2
log
[P
x
[
[P
x
−P
xz
P
−1
z
P
T
xz
[
(2.82)
=
1
2
log
[P
z
[
[P
z
−P
T
xz
P
−1
x
P
xz
[
(2.83)
In the classical linear Gaussian case (to which the Kalman ﬁlter applies), where z =
Hx +v and v ∼ ^¦v; 0, R¦ is independent of x,
I(x; z) =
1
2
log
[P
x
[
[P
x
−P
x
H
T
(HP
x
H
T
+R)
−1
HP
x
[
(2.84)
=
1
2
log
[P
−1
x
+H
T
R
−1
H[
[P
−1
x
[
(2.85)
=
1
2
log
[HP
x
H
T
+R[
[R[
(2.86)
Furthermore, if x, y and z are jointly Gaussian then:
I(x; z[y) = I(x; z[y = ψ) ∀ ψ (2.87)
The result of Eq. (2.87) is due to the fact that the posterior covariance in a Kalman
ﬁlter is not aﬀected by the observation value (as discussed in Section 2.1.2), and that
entropy and MI are uniquely determined by the covariance of a Gaussian distribution.
2.3.5 Axioms resulting in entropy
One may show that Shannon’s entropy is the unique (up to a multiplicative constant)
realvalued measure of the uncertainty in a discrete probability distribution which sat
isﬁes the following three axioms [7]. We assume x ∈ ¦x
1
, . . . , x
n
¦ where n < ∞, and
use the notation p
x
i
= P[x = x
i
], and H(p
x
1
, . . . , p
xn
) = H(x).
1. H(p
x
1
, . . . , p
xn
) is a continuous
5
function of the probability distribution
(p
x
1
, . . . , p
xn
) (deﬁned for all n).
2. H(p
x
1
, . . . , p
xn
) is permutation symmetric, i.e., if π(x) is a permutation of x then
H(p
x
1
, . . . , p
xn
) = H(p
π(x
1
)
, . . . , p
π(xn)
).
3. If x
n,1
and x
n,2
partition the event x
n
(so that p
x
n,1
+ p
x
n,2
= p
xn
> 0) then:
H(p
x
1
, . . . , p
x
n−1
, p
x
n,1
, p
x
n,2
) = H(p
x
1
, . . . , p
xn
) + p(x
n
)H
_
p
x
n,1
p
xn
,
p
x
n,2
p
xn
_
5
This condition can be relaxed to H(x) being a Lebesgue integrable function of p(x) (see [7]).
46 CHAPTER 2. BACKGROUND
The ﬁnal axiom relates to additivity of the measure: in eﬀect it requires that, if we
receive (in y) part of the information in a random variable x, then the uncertainty in
x must be equal to the uncertainty in y plus the expected uncertainty remaining in x
after y has been revealed.
2.3.6 Formulations and geometry
A common formulation for using entropy as an objective for sensor resource management
is to seek to minimize the joint entropy of the state to be estimated over a rolling horizon.
In this case, the canonical problem that we seek to solve is to ﬁnd at time k the non
stationary policy π
k
= ¦µ
k
, . . . , µ
k+N−1
¦ which minimizes the expected entropy over
the next N time steps conditioned on values of the observations already received:
π
k
= arg min
µ
k
,...,µ
k+N−1
H(x
k
, . . . , x
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
, z
µ
k
(X
k
)
k
, . . . , z
µ
k+N−1
(X
k+N−1
)
k+N−1
) (2.88)
where X
k
an appropriate choice of the decision state (discussed below). If we have a
problem involving estimation of the state of multiple objects, we simply deﬁne x
k
to be
the joint state of the objects. Applying Eq. (2.72), we obtain the equivalent formulation:
[25]
π
k
= arg min
µ
k
,...,µ
k+N−1
_
H(x
k
, . . . , x
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
)
−I(x
k
, . . . , x
k+N−1
; z
µ
k
(X
k
)
k
, . . . , z
µ
k+N−1
(X
k+N−1
)
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
)
_
(2.89)
= arg max
µ
k
,...,µ
k+N−1
I(x
k
, . . . , x
k+N−1
; z
µ
k
(X
k
)
k
, . . . , z
µ
k+N−1
(X
k+N−1
)
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
) (2.90)
= arg max
µ
k
,...,µ
k+N−1
k+N−1
l=k
I(x
l
; z
µ
l
(X
l
)
l
[ˇ z
0
, . . . , ˇ z
k−1
, z
µ
k
(X
k
)
k
, . . . , z
µ
l−1
(X
l−1
)
l−1
) (2.91)
Eq. (2.91) results from applying Eq. (2.78), assuming that the observations at time i
are independent of each other and the remainder of the state conditioned on the state
at time i.
This problem can be formulated as a MDP in which the reward per stage is chosen
to be g
k
(X
k
, u
k
) = I(x
k
; z
u
k
k
[ˇ z
0
, . . . , ˇ z
k−1
). We choose the decision state to be the
conditional PDF X
k
= p(x
k
[ˇ z
0:k−1
). Although conditioning is denoted on the history
of observations ¦ˇ z
0
, . . . , ˇ z
k−1
¦, the current conditional PDF is a suﬃcient statistic for
all observations in calculation of the reward and the decision state at the next time.
The structure of this MDP is similar to a POMDP in that the decision state is the
conditional PDF of the underlying state. However, the reward per stage cannot be
Sec. 2.3. Information theoretic objectives 47
expressed as a linear function of the conditional PDF, and the reward to go is not
piecewise linear convex.
From [23], the mutual information I(x; z) is a concave function of p(x) for a given
p(z[x), and a convex function of p(z[x) for a given p(x). Accordingly, by taking the
maximum of the reward per stage over several diﬀerent candidate observations, we are
taking the pointwise maximum of several diﬀerent concave functions of the PDF p(x),
which will in general result in a function which is nonconcave and nonconvex. This is
illustrated in the following example.
Example 2.2. Consider the problem in which the underlying state x
k
∈ ¦−1, 0, 1¦ has
a uniform prior distribution, and transitions according to the rule p(x
k
= i[x
k−1
= i) =
1 −2, and p(x
k
= i[x
k−1
= j) = , i ,= j. Assume we have two observations available
to us, z
1
k
, z
2
k
∈ ¦0, 1¦, with the following models:
p(z
1
k
= 1[x
k
) =
_
¸
¸
¸
_
¸
¸
¸
_
1 −δ, x
k
= −1
δ, x
k
= 0
0.5, x
k
= 1
p(z
2
k
= 1[x
k
) =
_
¸
¸
¸
_
¸
¸
¸
_
0.5, x
k
= −1
δ, x
k
= 0
1 −δ, x
k
= 1
Contour plots of the optimal reward to go function for a single time step and for four
time steps are shown in Fig. 2.1, with = 0.075 and δ = 0.1. The diagrams illustrate
the nonconcave structure which results.
The structure underlying the maximization problem within a single stage can also
be revealed through these basic geometric observations. For example, suppose we are
choosing between diﬀerent observations z
1
and z
2
which share the same cardinality and
have models p(z
1
[x) and p(z
2
[x). Consider a continuous relaxation of the problem in
which we deﬁne the “quasiobservation” z
α
with p(z
α
[x) = αp(z
1
[x) + (1 − α)p(z
2
[x)
for α ∈ [0, 1]:
u = arg max
α∈[0,1]
I(x; z
α
)
In this case, the observation model p(z
α
[x) is a linear function of α and, as discussed
above, the mutual information I(x; z
α
) is a convex function of the observation model
p(z
α
[x) for a given prior distribution p(x). Thus this is a convex maximization, which
only conﬁrms that the optimal solution lies at an integer point, again exposing the
combinatorial complexity of the problem. This is illustrated in the following example.
Example 2.3. Consider a single stage of the example from Example 2.2. Assume a
prior distribution p(x
k
= −1) = 0.5 and p(x
k
= 0) = p(x
k
= 1) = 0.25. The mutual
48 CHAPTER 2. BACKGROUND
information of the state and the quasiobservation z
α
is shown as a function of α in
Fig. 2.2. The convexity of the function implies that gradientbased methods will only
converge to an extreme point, not necessarily to a solution which is good in any sense.
It should be noted that these observations relate to a particular choice of parameter
ization and continuous relaxation. Given the choice of objective, however, there are no
known parameterizations which avoid these diﬃculties. Furthermore, there are known
complexity results: for example, selection of the best nelement subset of observations
to maximize MI is NPcomplete [46].
2.4 Set functions, submodularity and greedy heuristics
Of the methods discussed in the previous section, the greedy heuristic and extensions
thereof provide the only generally applicable solution to the sensor management problem
which is able to handle problems involving a large state space. The remarkable charac
teristic of this algorithm is that, in certain circumstances, one can establish bounds on
the loss of performance for using the greedy method rather than the optimal method
(which is intractable). Sections 2.4.1, 2.4.2 and 2.4.3 provide the theoretical background
required to derive these bounds, mostly from [75] and [26], after which Sections 2.4.4
and 2.4.5 present proofs of the bounds from existing literature. These bounds will be
adapted to alternative problem structures in Chapter 3.
Throughout this section (and Chapter 3) we assume open loop control, i.e., we
make all of our observation selections before any observation values are received. In
practice, the methods described could be employed in an OLFC manner, as described
in Section 2.2.2.
2.4.1 Set functions and increments
A set function is a realvalued function which takes as its input subsets of a given set.
For example, consider the function f : 2
U
→R (where 2
U
denotes the set of subsets of
the ﬁnite set ) deﬁned as:
f(/) = I(x; z
A
)
where z
A
denotes the observations corresponding to the set / ⊆ . Thus f(/) would
denote the information learned about the state x by obtaining the set of observations
z
A
.
Deﬁnition 2.1 (Nonnegative). A set function f is nonnegative if f(/) ≥ 0 ∀ /.
p
(
x
k
=
−
1
)
p(x
k
=0)
Optimal reward for single time step
0 0.2 0.4 0.6 0.8
0
0.2
0.4
0.6
0.8
p
(
x
k
=
−
1
)
p(x
k
=0)
Optimal reward for four time steps
0 0.2 0.4 0.6 0.8
0
0.2
0.4
0.6
0.8
Figure 2.1. Contour plots of the optimal reward to go function for a single time step and
for four time steps. Smaller values are shown in blue while larger values are shown in red.
0 0.2 0.4 0.6 0.8 1
0
0.1
0.2
0.3
0.4
α
I(x;z
α
)
Figure 2.2. Reward in single stage continuous relaxation as a function of the parameter
α.
50 CHAPTER 2. BACKGROUND
Deﬁnition 2.2 (Nondecreasing). A set function f is nondecreasing if
f(B) ≥ f(/) ∀ B ⊇ /.
Obviously, a nondecreasing function will be nonnegative iﬀ f(∅) ≥ 0. MI is an
example of a nonnegative, nondecreasing function, since I(x; ∅) = 0 and I(x; z
B
) −
I(x; z
A
) = I(x; z
B\A
[z
A
) ≥ 0. Intuitively, this corresponds to the notion that includ
ing more observations must increase the information (i.e., on average, the entropy is
reduced).
Deﬁnition 2.3 (Increment function). We denote the single element increment by
ρ
j
(/) f(/∪ ¦j¦) −f(/), and the set increment by ρ
B
(/) f(/∪ B) −f(/).
Applying the chain rule in reverse, the increment function for MI is equivalent to
ρ
B
(/) = I(x; z
B
[z
A
).
2.4.2 Submodularity
Submodularity captures the notion that as we select more observations, the value of the
remaining unselected observations decreases, i.e., the notion of diminishing returns.
Deﬁnition 2.4 (Submodular). A set function f is submodular if f(( ∪ /) − f(/) ≥
f(( ∪ B) −f(B) ∀ B ⊇ /.
From Deﬁnition 2.3, we note that ρ
C
(/) ≥ ρ
C
(B) ∀ B ⊇ / for any increment function
ρ arising from a submodular function f. The following lemma due to Krause and
Guestrin [46] establishes conditions under which mutual information is a submodular
set function.
Lemma 2.1. If the observations are conditionally independent conditioned on the state,
then the mutual information between the state and the subset of observations selected is
submodular.
Sec. 2.4. Set functions, submodularity and greedy heuristics 51
Proof. Consider B ⊇ /:
I(x; z
C∪A
) −I(x; z
A
)
(a)
= I(x; z
C\A
[z
A
)
(b)
= I(x; z
C\B
[z
A
) + I(x; z
C∩(B\A)
[z
A∪(C\B)
)
(c)
≥ I(x; z
C\B
[z
A
)
(d)
= H(z
C\B
[z
A
) −H(z
C\B
[x, z
A
)
(e)
= H(z
C\B
[z
A
) −H(z
C\B
[x)
(f)
≥ H(z
C\B
[z
B
) −H(z
C\B
[x)
(g)
= I(x; z
C\B
[z
B
)
(h)
= I(x; z
C∪B
) −I(x; z
B
)
(a), (b) and (h) result from the chain rule, (c) from nonnegativity, (d) and (g) from
the deﬁnition of mutual information, (e) from the assumption that observations are
independent conditioned on x, and (f) from the fact that conditioning reduces entropy.
The simple result that we will utilize from submodularity is that I(x; z
C
[z
A
) ≥
I(x; z
C
[z
B
) ∀ B ⊇ /. As discussed above, this may be intuitively understood as the
notion of diminishing returns: that the new observations z
C
are less valuable if the set
of observations already obtained is larger.
The proof of Lemma 2.1 relies on the fact that conditioning reduces entropy. While
this is true on average, it is necessarily not true for every value of the conditioning vari
able. Consequently, our proofs exploiting submodularity will apply to open loop control
(where the value of future actions is averaged over all values of current observations)
but not closed loop control (where the choice of future actions may change depending
on the values of current observations).
Throughout this document, we will assume that the reward function f is nonnega
tive, nondecreasing and submodular, properties which mutual information satisﬁes.
2.4.3 Independence systems and matroids
In many problems of interest, the set of observation subsets that we may select possesses
particular structure. Independence systems provide a basic structure for which we can
construct any valid set iteratively by commencing with an empty set and adding one
52 CHAPTER 2. BACKGROUND
element at a time in any order, maintaining a valid set at all times. The essential
characteristic is therefore that any subset of a valid set is valid.
Deﬁnition 2.5 (Independence system). (, F) is an independence system if F is a
collection of subsets of  such that if / ∈ F then B ∈ F ∀ B ⊆ /. The members of
F are termed independent sets, while subsets of  which are not members of F are
termed dependent sets.
The following example illustrates collections of sets which do and do not form inde
pendence systems.
Example 2.4. Let  = ¦a, b, c, d¦, F
1
= ¦∅, ¦a¦, ¦b¦, ¦c¦, ¦d¦, ¦a, b¦, ¦a, c¦, ¦a, d¦,
¦b, c¦, ¦b, d¦, ¦c, d¦¦, F
2
= ¦∅, ¦a¦, ¦b¦, ¦c¦, ¦d¦, ¦a, b¦, ¦a, c¦, ¦a, d¦, ¦b, c¦, ¦a, b, c¦¦
and F
3
= ¦∅, ¦a¦, ¦b¦, ¦c¦, ¦a, b¦, ¦c, d¦¦. Then (, F
1
) and (, F
2
) are independence
systems, while (, F
3
) is not since ¦d¦ ⊆ ¦c, d¦ and ¦c, d¦ ∈ F
3
, but ¦d¦ / ∈ F
3
.
We construct a subset by commencing with an empty set (/
0
= ∅) and adding a new
element at each iteration (/
i
= /
i−1
∪ ¦u
i
¦), ensuring that /
i
∈ F ∀ i. When (, F)
is an independence system we are guaranteed that, if for some i, /
i
∪ ¦u¦ / ∈ F then
/
j
∪ ¦u¦ / ∈ F ∀ j > i, i.e., if we cannot add an element at a particular iteration, then
we cannot add it at any later iteration either. The collection F
3
in the above example
violates this: we cannot extend /
0
= ∅ with the element d in the ﬁrst iteration, yet
if we choose element c in the ﬁrst iteration, then we can extend /
1
= ¦c¦ with the
element d in the second iteration.
The iterative process for constructing a set terminates when we reach a point where
adding any more elements yields a dependent (i.e., invalid) set. Such a set is referred
to as being maximal.
Deﬁnition 2.6 (Maximal). A set / ∈ F is maximal if /∪ ¦b¦ / ∈ F ∀ b ∈ ¸/.
Matroids are a particular type of independence system for which eﬃcient optimiza
tion algorithms exist for certain problems. The structure of a matroid is analogous
with the that structure results from associating each element of  with a column of a
matrix. Independent sets correspond to subsets of columns which are linearly indepen
dent, while dependent sets correspond to columns which are linearly dependent. We
illustrate this below Example 2.5.
Deﬁnition 2.7 (Matroid). A matroid (, F) is an independence system in which, for
all ^ ⊆ , all maximal sets of the collection F
N
¦/ ∈ F[/ ⊆ ^¦ have the same
cardinality.
Sec. 2.4. Set functions, submodularity and greedy heuristics 53
The collection F
N
represents the collection of sets in F whose elements are all
contained in ^. Note that the deﬁnition does not require ^ ∈ F. The following
lemma establishes an equivalent deﬁnition of a matroid.
Lemma 2.2. An independence system (, F) is a matroid if and only if ∀ /, B ∈ F
such that [/[ < [B[, ∃ u ∈ B¸/ such that /∪ ¦u¦ ∈ F.
Proof. Only if: Consider F
A∪B
for any /, B ∈ F with [/[ < [B[. Since B ∈ F
A∪B
, /
cannot be maximal in F
A∪B
, hence ∃ u ∈ B¸/ such that /∪ ¦u¦ ∈ F.
If: Consider F
N
for any ^ ⊆ . Let / and B be two maximal sets in F
N
;
note that /, B ⊆ . If [/[ < [B[ then ∃ u ∈ B¸/ such that / ∪ ¦u¦ ∈ F, hence
/∪ ¦u¦ ∈ F
N
(noting the deﬁnition of F
N
), contradicting the maximality of /.
The following example illustrates independence systems which are and are not ma
troids.
Example 2.5. Let  = ¦a, b, c, d¦, F
1
= ¦∅, ¦a¦, ¦b¦, ¦c¦, ¦d¦, ¦a, c¦, ¦a, d¦, ¦b, c¦,
¦b, d¦¦, F
2
= ¦∅, ¦a¦, ¦b¦, ¦c¦, ¦d¦, ¦a, b¦, ¦a, c¦, ¦a, d¦, ¦b, c¦, ¦b, d¦, ¦c, d¦¦ and F
3
=
¦∅, ¦a¦, ¦b¦, ¦c¦, ¦d¦, ¦a, b¦, ¦c, d¦¦. Then (, F
1
) and (, F
2
) are matroids, while
(, F
3
) is not (applying Lemma 2.2 with / = ¦a¦ and B = ¦c, d¦).
As discussed earlier, the structure of a matroid is analogous with the structure of
linearly independent columns in a matroid. As an illustration of this, consider the
matrices M
i
= [c
a
i
c
b
i
c
c
i
c
d
i
], where c
u
i
is the matrix column associated with element u in
the matrix corresponding to F
i
. In the case of F
1
, the columns are such that c
a
1
and
c
b
1
are linearly dependent, and c
c
1
and c
d
1
are linearly dependent, e.g.,
c
a
1
=
_
1
0
_
c
b
1
=
_
1
0
_
c
c
1
=
_
0
1
_
c
d
1
=
_
0
1
_
This prohibits elements a and b from being selected together, and similarly for c and d.
In the case of F
2
, the columns are such that any two are linearly independent, e.g.,
c
a
2
=
_
1
0
_
c
b
2
=
_
0
1
_
c
c
2
=
_
1
1
_
c
d
2
=
_
1
−1
_
The independence systems corresponding to each of the following problems may be
seen to be a matroid:
• Selecting the best observation out of a set of at each time step over an Nstep
planning horizon (e.g., F
1
above)
54 CHAPTER 2. BACKGROUND
• Selecting the best kelement subset of observations out of a set of n (e.g., F
2
above)
• The combination of these two: selecting the best k
i
observations out of a set of
n
i
at each time step i over an Nstep planning horizon
• Selecting up to k
i
observations out of a set of n
i
at each time step i over an Nstep
planning horizon, such that no more than K observations selected in total
2.4.4 Greedy heuristic for matroids
The following is a simpliﬁed version of the theorem in [77] which establishes the more
general result that the greedy algorithm applied to an independence system resulting
from the intersection of P matroids achieves a reward no less than
1
P+1
the reward
achieved by the optimal set. The proof has been simpliﬁed extensively in order to
specialize to the single matroid case; the simpliﬁcations illuminate the possibility of
a diﬀerent alternative theorem which will be possible for a diﬀerent style of selection
structure as discussed in Chapter 3. To our knowledge this bound has not previously
been applied in the context of maximizing information.
Deﬁnition 2.8. The greedy algorithm for selecting a set of observations in a matroid
(, F) commences by setting (
0
= ∅. At each stage i = ¦1, 2, . . . ¦, the new element
selected is:
g
i
= arg max
u∈U\G
i−1
G
i−1
∪{u}∈F
ρ
u
((
i−1
)
where (
i
= (
i−1
∪ ¦g
i
¦. The algorithm terminates when the set (
i
is maximal.
Theorem 2.3. Applied to a submodular, nondecreasing function f() on a matroid,
the greedy algorithm in Deﬁnition 2.8 achieves a reward no less than 0.5 the reward
achieved by the optimal set.
Proof. Denote by O the optimal set, and by ( the set chosen by the algorithm in
Deﬁnition 2.8. Without loss of generality, assume that [O[ = [([ (since all maximal sets
have the same cardinality, and since the objective is nondecreasing). Deﬁne N [O[,
O
N
= O, and for each i = N − 1, N − 2, . . . , 1, take o
i
∈ O
i
such that o
i
/ ∈ (
i−1
, and
(
i−1
∪ ¦o
i
¦ ∈ F (such an element exists by Lemma 2.2). Deﬁne O
i−1
O
i
¸¦o
i
¦. The
Sec. 2.4. Set functions, submodularity and greedy heuristics 55
bound can be obtained through the following steps:
f(O) −f(∅)
(a)
≤ f(( ∪ O) −f(∅)
(b)
≤ f(() +
j∈O\G
ρ
j
(() −f(∅)
(c)
≤ f(() +
j∈O
ρ
j
(() −f(∅)
(d)
= f(() +
N
i=1
ρ
o
i
(() −f(∅)
(e)
≤ f(() +
N
i=1
ρ
o
i
((
i−1
) −f(∅)
(f)
≤ f(() +
N
i=1
ρ
g
i
((
i−1
) −f(∅)
(g)
= 2(f(() −f(∅))
where (a) and (c) result from the nondecreasing property, (b) and (e) result
from submodularity, (d) is a simple rearrangement using the above construction
O = ¦o
1
, . . . , o
N
¦, (f) is a consequence of the structure of the greedy algorithm in
Deﬁnition 2.8, and (g) is a simple rearrangement of (f).
2.4.5 Greedy heuristic for arbitrary subsets
The following theorem specializes the previous theorem to the case in which F consists
of all Kelement subsets of observations, as opposed to an arbitrary matroid. In this
case, the bound obtainable is tighter. The theorem comes from [76], which addresses
the more general case of a possibly decreasing reward function. Krause and Guestrin
[46] ﬁrst recognized that the bound can be applied to sensor selection problems with
information theoretic objectives.
Theorem 2.4. Suppose that F = ¦^ ⊆  s.t. [^[ ≤ K¦. Then the greedy algorithm
applied to the nondecreasing submodular function f() on the independence system
(, F) achieves a reward no less than (1 − 1/e) ≈ 0.632 the reward achieved by the
optimal set.
56 CHAPTER 2. BACKGROUND
Proof. Denote by O the optimal Kelement subset, and by ( the set chosen by the
algorithm in Deﬁnition 2.8. For each stage of the greedy algorithm, i ∈ ¦1, . . . , K¦, we
can write from line (b) of the proof of Theorem 2.3:
f(O) ≤ f((
i
) +
j∈O\G
i
ρ
j
((
i
)
By deﬁnition of g
i+1
in Deﬁnition 2.8, ρ
g
i+1
((
i
) ≥ ρ
j
((
i
) ∀ j, hence
f(O) ≤ f((
i
) +
j∈O\G
i
ρ
g
i+1
((
i
)
= f((
i
) +[O¸(
i
[ρ
g
i+1
((
i
)
≤ f((
i
) + Kρ
g
i+1
((
i
)
By deﬁnition of the increment function ρ, we can write
f((
i
) = f(∅) +
i
j=1
ρ
g
j
((
j−1
)
where we use the convention that (
0
= ∅. Thus we can write ∀ i ∈ ¦1, . . . , K¦:
f(O) −f(∅) ≤
i
j=1
ρ
g
j
((
j−1
) + Kρ
g
i+1
((
i
) (2.92)
Now consider the linear program in variables ρ
1
, . . . , ρ
K+1
, parameterized by Z:
P(Z) =min
K
j=1
ρ
j
(2.93)
s.t. Z ≤
i
j=1
ρ
j
+ Kρ
i+1
, i ∈ ¦1, . . . , K¦
Taking the dual of the linear program:
D(Z) =max
K
j=1
Zx
j
s.t. Kx
i
+
K
j=i+1
x
j
= 1, i ∈ ¦1, . . . , K¦
The system of constraints has a single solution which can found through a backward
recursion commencing with x
K
:
x
K
=
1
K
, x
K−1
=
K −1
K
2
, . . . , x
1
=
(K −1)
K−1
K
K
Sec. 2.5. Linear and integer programming 57
which yields
D(Z) = Z
_
1 −
_
K −1
K
_
K
_
which, by strong duality, is also the solution to P(Z). Thus, any set O which respects
the series of inequalities in Eq. (2.92) must have
K
j=1
ρ
g
j
((
j−1
) = f(() −f(∅) ≥
_
1 −
_
K −1
K
_
K
_
[f(O) −f(∅)]
Finally, note that
_
1 −
_
K −1
K
_
K
_
> 1 −1/e ∀ K > 1;
_
1 −
_
K −1
K
_
K
_
→ 1 −1/e, K → ∞
Thus
f(() −f(∅) ≥ [1 −1/e][f(O) −f(∅)] ≥ 0.632[f(O) −f(∅)]
2.5 Linear and integer programming
Chapter 4 will utilize an integer programming formulation to solve certain sensor re
source management problems with manageable complexity. This section brieﬂy outlines
the idea behind linear and integer programming; the primary source for the material is
[12].
2.5.1 Linear programming
Linear programming is concerned with solving problems of the form:
min
x
c
T
x
s.t. Ax ≥ b
(2.94)
Any problem of the form Eq. (2.94) can be converted to the standard form:
min
x
c
T
x
s.t. Ax = b
x ≥ 0
(2.95)
58 CHAPTER 2. BACKGROUND
The two primary mechanisms for solving problems of this type are simplex methods
and interior point methods. Primaldual methods simultaneously manipulate both the
original problem (referred to as the primal problem), and a dual problem which provides
a lower bound on the optimal objective value achievable. The diﬀerence between the
current primal solution and the dual solution is referred to as the duality gap; this
provides an upper bound on the improvement possible through further optimization.
Interior point methods have been observed to be able to reduce the duality gap by a
factor γ in a problem of size n in an average number of iterations of O(log nlog γ); the
worstcase behavior is O(
√
nlog γ).
2.5.2 Column generation and constraint generation
In many large scale problems, it is desirable to be able to ﬁnd a solution without
explicitly considering all of the optimization variables. Column generation is a method
which is used alongside the revised simplex method to achieve this goal. The method
involves the iterative solution of a problem involving a small subset of the variables in
the full problem. We assume availability of an eﬃcient algorithm that tests whether
incorporation of additional variables would be able to improve on the present solution.
Occasionally this algorithm is executed, producing additional variables to be added to
the subset. The process terminates when the problem involving the current subset of
variables reaches an optimal solution and the algorithm producing new variables ﬁnds
that there are no more variables able to produce further improvement.
Constraint generation is commonly used to solve large scale problems without ex
plicitly considering all of the constraints. The method involves iterative solution of a
problem involving the a small subset of the constraints in the full problem. We assume
availability of an eﬃcient algorithm that tests whether the current solution violates
any of the constraints in the full problem, and returns one or more violated constraints
if any exist. The method proceeds by optimizing the problem involving the subset of
constraints, occasionally executing the algorithm to produce additional constraints that
were previously violated. The process terminates when we ﬁnd an optimal solution to
the subproblem which does not violate any constraints in the full problem. Constraint
generation may be interpreted as column generation applied to the dual problem.
Sec. 2.5. Linear and integer programming 59
2.5.3 Integer programming
Integer programming deals with problems similar to Eqs. (2.94) and (2.95), but in which
the optimization variables are constrained to take on integer values:
min
x
c
T
x
s.t. Ax = b
x ≥ 0
x integer
(2.96)
Some problem structures possess a property in which, when the integer constraint is
relaxed, there remains an integer point that attains the optimal objective. A common
example is network ﬂow problems in which the problem data takes on integer values.
In this case, solution of the linear program (with the integrality constraint relaxed)
can provide the optimal solution to the integer program. In general the addition of
the integer constraint dramatically increases the computational complexity of ﬁnding a
solution.
Relaxations
Two diﬀerent relaxations are commonly used in integer programming: the linear pro
gramming relaxation, and the Lagrangian relaxation. The linear programming relax
ation is exactly that described in the previous section: solving the linear program which
results from relaxing the integrality constraint. If the problem possesses the necessary
structure that there is an integer point that attains the optimal objective in the relaxed
problem, then this point is also optimal in the original integer programming problem.
In general this will not be the case, however the solution of the linear programming
relaxation provides a lower bound to the solution of the integer program (since a wider
range of solutions is considered).
As described in Section 2.2.3, Lagrangian relaxation involves solution of the La
grangian dual problem. By weak duality, the dual problem also provides a lower bound
on the optimal cost attainable in the integer program. However, since the primal prob
lem involves a discrete optimization strong duality does not hold, and there may be
a duality gap (i.e., in general there will not be an integer programming solution that
obtains the same cost as the solution of the Lagrangian dual). It can be shown that
the Lagrangian relaxation provides a tighter lower bound than the linear programming
relaxation.
60 CHAPTER 2. BACKGROUND
Cutting plane methods
Let A be the set of feasible solutions to the integer program in Eq. (2.96) (i.e., the
integer points which satisfy the various constraints), and let CH(A) be the convex hull
of these points. Then the optimal solution of Eq. (2.96) is also an optimal solution of
the following linear program:
min
x
c
T
x
s.t. x ∈ CH(A)
(2.97)
Cutting plane methods, one of the two most common methods for solving integer pro
grams, exploit this fact. We solve a series of linear programs, commencing with the
linear programming relaxation. If, at any stage, we ﬁnd an integer solution that is op
timal, this is the optimal solution to the original problem. At each iteration, we add a
constraint that is violated by the solution of the linear program in the current iteration,
but is satisﬁed by every integer solution in the the original problem (i.e., every point in
A). Thus the feasible region is slowly reduced, and approaches CH(A). There are two
diﬃculties associated with this method: ﬁrstly, it can be diﬃcult to ﬁnd constraints
with the necessary characteristics; and secondly, it may be necessary to generate a very
large number of constraints in order to obtain the integer solution.
Branch and bound
Branch and bound is the other commonly used method for solving integer programming
problems. The basic concept is to divide the feasible region into subregions, and
simultaneously search for “good” feasible solutions and tight lower bounds within in
each subregion. Any time we ﬁnd that a subregion R has a lower bound that is greater
than a feasible solution found in another subregion, the region R can be discarded. For
example, suppose that we are dealing with a binary problem, where the feasible set is
A = ¦0, 1¦
N
. Suppose we branch on variable x
1
, i.e., we divide the feasible region up
into two subregions, where in the ﬁrst we ﬁx x
1
= 0, and in the second we ﬁx x
1
= 1.
Suppose we ﬁnd a feasible solution within the subregion in which x
1
= 1 with objective
3.5, and, furthermore we ﬁnd that the objective of the subproblem in which we ﬁx the
value x
1
= 0 is bounded below by 4.5. Then the optimal solution cannot have x
1
= 0,
so this subregion may be discarded without further investigation. Linear programming
relaxations and cutting plane methods are commonly used in concert with a branch
and bound approach to provide the lower bounds.
Sec. 2.6. Related work 61
2.6 Related work
The attention received by sensor resource management has steadily increased over the
past two decades. The sections below summarize a number of strategies proposed by
diﬀerent authors. We coarsely categorize the material, although many of the methods
do not ﬁt precisely into any one category. We conclude in Section 2.6.7 by contrasting
our approach to the work described.
2.6.1 POMDP and POMDPlike models
In [55, 56, 57], Krishnamurthy and Evans present two sensor management methods
based upon POMDP methods. In [56, 57], the beam scheduling problem is cast as
a multiarm bandit, assuming that the conditional PDF of unobserved objects remains
unchanged between decision intervals (this is slightly less restrictive than requiring the
state itself to remain unchanged). Under these assumptions, it is proven that there
exists an optimal policy in the form of index rule, and that the index function is piece
wise linear concave. In [55], Krishnamurthy proposes a similar method for waveform
selection problems in which the per stage reward is approximated by a piecewise linear
concave function of the conditional PDF. In this regime, the reward to go function
remains piecewise linear concave at each time in the backward recursion and POMDP
methods can be applied. In [56], it is suggested that the continuous kinematic quantities
could be discretized into coarse quantities such as “near” and “far”. A similar method
is proposed for adaptive target detection in [60]. Computational examples in [55, 59]
utilize underlying state space alphabets of three, six and 25. This reveals the primary
limitation of this category of work: its inability to address problems with large state
spaces. Many problems of practical interest cannot be represented with this restriction.
Casta˜ n´on [18] formulates the problem of beam scheduling and waveform selection
for identiﬁcation of a large number of objects as a constrained dynamic program. By
relaxing the sample path constraints to being constraints in expectation, a dual solution,
which decouples the problem into a series of single object problems coupled only through
the search for the correct values of the Lagrange multipliers, can be found using a
method similar to that discussed in Section 2.2.3. By requiring observations at diﬀerent
times in the planning horizon to have identical characteristics, observations needing
diﬀerent time durations to complete are naturally addressed. The method is extended
in [16] to produce a lower bound on the classiﬁcation error performance in a sensor
network. Again, the primary limitation of this method is the requirement for the state
space to be small enough that traditional POMDP solution methods should be able to
62 CHAPTER 2. BACKGROUND
address the decoupled single object problem. In the context of object identiﬁcation,
this state space alphabet size restriction precludes addition of latent states such as
object features (observations will often be dependent conditioned on the object class,
but the object state can be expanded to incorporate continuous object features in order
to regain the required conditional independence).
2.6.2 Model simpliﬁcations
Washburn, et al [93] observe that, after a minor transformation of the cost function,
the solution method from multiarm bandit problems method may be applied to beam
steering, assuming that the state of unobserved objects remains unchanged. The policy
based on this assumption is then used as a base policy in a rollout with a one or two
step lookahead. The authors also suggest methods for practical application such as
selecting as the decision state an estimate of the covariance matrix rather than condi
tional PDF, and simplifying stochastic disturbance to simple models such as detection
and no detection. The ideas are explored further in [85].
2.6.3 Suboptimal control
Common suboptimal control methods such as rollout [9] have also been applied to
sensor management problems. Nedich, et al [74] consider tracking movestop targets,
and utilize an approximation of the cost to go of a heuristic base policy which captures
the structure of the future reward for the given scenario. He and Chong [29] describe
how a simulationbased rollout method with an unspeciﬁed base policy could be used
in combination with particle ﬁltering.
2.6.4 Greedy heuristics and extensions
Many authors have approached the problem of waveform selection and beam steering
using greedy heuristics which choose at each time the action which maximizes some
instantaneous reward function. Information theoretic objectives are commonly used
with this method; this may be expressed equivalently as
• Minimizing the conditional entropy of the state at the current time conditioned
on the new observation (and on the values of the previous observations)
• Maximizing the mutual information between the state at the current time and
the new observation
Sec. 2.6. Related work 63
• Maximizing the KullbackLeibler distance between the prior and posterior distri
bution.
• Minimizing the KullbackLeibler distance between the posterior distribution and
the posterior distribution that would be obtained if all possible observations were
incorporated.
Use of these objectives in classiﬁcation problems can be traced back as early as [61].
Hintz [32] and Kastella [41] appear to be two of the earliest instances in sensor fusion
literature. Formally, if g
k
(X
k
, u
k
) is the reward for applying control u
k
at time k, then
the greedy heuristic operates according to the following rule:
µ
g
k
(X
k
) = arg max
u
k
∈U
X
k
k
g
k
(X
k
, u
k
) (2.98)
McIntyre and Hintz [69] apply the greedy heuristic, choosing mutual information as
the objective in each stage to trade oﬀ the competing tasks of searching for new objects
and maintaining existing tracks. The framework is extended to consider higher level
goals in [31]. Kershaw and Evans [42] propose a method of adaptive waveform selection
for radar/sonar applications. An analysis of the signal ambiguity function provides
a prediction of the of the posterior covariance matrix. The use of a greedy heuristic
is justiﬁed by a restriction to constant energy pulses. Optimization criteria include
posterior mean square error and validation gate volume. The method is extended to
consider the impact of clutter through Probabilistic Data Association (PDA) ﬁlter in
[43].
Mahler [66] discusses the use of a generalization of KullbackLeibler distance, global
Csisz´ar cdiscrimination, for sensor resource management within the ﬁnite set statistics
framework, i.e., where the number of objects is unknown and the PDF is invariant
to any permutation of the objects. Kreucher, et al [49, 50, 51, 52, 54] apply a greedy
heuristic to the sensor management problem in which the joint PDF of a varying number
of objects is maintained using a particle ﬁlter. The objective function used is Renyi
entropy; motivations for using this criterion are discussed in [50, 51]. Extensions to
a twostep lookahead using additional simulation, and a rollout approach using a
heuristic reward to go capturing the structure of longterm reward due to expected
visibility and obscuration of objects are proposed in [53].
Kolba, et al [44] apply the greedy heuristic with an information objective to land
mine detection, addressing the additional complexity which occurs when sensor motion
is constrained. Singh, et al [86] show how the control variates method can be used to
64 CHAPTER 2. BACKGROUND
reduce the variance of estimates of KullbackLeibler divergence (equivalent to mutual
information) for use in sensor management; their experiments use a greedy heuristic.
Kalandros and Pao [40] propose a method which allows for control of process co
variance matrices, e.g., ensuring that the covariance matrix of a process P meets a
speciﬁcation S in the sense that P−S _ 0. This objective allows the system designer
to dictate a more speciﬁc performance requirement. The solution method uses a greedy
heuristic.
Zhao, et al [95] discuss object tracking in sensor networks, proposing methods based
on greedy heuristics where the estimation objectives include nearest neighbor, Maha
lanobis distance, entropy and KullbackLeibler distance; inconsistencies with the mea
sures proposed in this paper are discussed in [25].
Chhetri, et al [21] examine scheduling of radar and IR sensors for object tracking
to minimize the mean square error over the next N time steps. The method proposed
utilizes a linearized Kalman ﬁlter for evaluation of the error predictions, and performs a
bruteforce enumeration of all sequences within the planning horizon. Experiments are
performed using planning horizons of one, two and three. In [20], the sensor network
object tracking problem is approached by minimizing energy consumption subject to
a constraint on estimation performance (measured using Cram´erRao bounds). The
method constructs an open loop plan by considering each candidate solution in ascend
ing order of cost, and evaluating the estimation performance until a feasible solution
is found (i.e., one which meets the estimation criterion). Computational examples use
planning horizons of one, two and three.
Logothetis and Isaksson [65] provide an algorithm for pruning the search tree in the
problems involving control of linear GaussMarkov systems with information theoretic
criteria. If two candidate sequences obtain covariance matrices P
1
and P
2
, and P
1
_
P
2
, then the total reward of any extension of the ﬁrst sequence will be less than or
equal to the total reward of the same extension of the second sequence, thus the ﬁrst
sequence can be pruned from the search tree. Computational examples demonstrate a
reduction of the tree width by a factor of around ﬁve.
Zwaga and Driessen examine the problem of selecting revisit rate and dwell time for a
multifunction radar to minimize the total duty cycle consumed subject to a constraint on
the postupdate covariance [97] and prediction covariance [96]. Both methods consider
only the current observation interval.
Sec. 2.6. Related work 65
2.6.5 Existing work on performance guarantees
Despite the popularity of the greedy heuristic, little work has been done to ﬁnd guar
antees of performance. In [17], Casta˜ n´on shows that a greedy heuristic is optimal for
the problem of dynamic hypothesis testing (e.g., searching for an object among a ﬁnite
set of positions) with symmetric measurement distributions (i.e., P[missed detection] =
P[false alarm]) according to the minimum probability of error criterion. In [33], Howard,
et al prove optimality of greedy methods for the problem of beam scheduling of inde
pendent onedimensional GaussMarkov processes when the cost per stage is set to the
sum of the error variances. The method does not extend to multidimensional processes.
Krause and Guestrin [46] apply results from submodular optimization theory to
establish the surprising and elegant result that the greedy heuristic applied to the sensor
subset selection algorithm (choosing the best nelement subset) is guaranteed to achieve
performance within a multiple of (1 − 1/e) of optimality (with mutual information as
the objective), as discussed in Section 2.4.5. A similar performance guarantee is also
established in [46] for the budgeted case, in which each observation incurs a cost, and
there is a maximum total cost that can be expended. For this latter bound to apply, it
is necessary to perform a greedy selection commencing from every threeelement subset.
The paper also establishes a theoretical guarantee that no polynomial time algorithm
can improve on the performance bound unless P = NP, and discusses issues which arise
when the reward to go values are computed approximately. This analysis is applied to
the selection of sensor placements in [47], and the sensor placement model is extended
to incorporate communication cost in [48].
2.6.6 Other relevant work
Berry and Fogg [8] discuss the merits of entropy as a criterion for radar control, and
demonstrate its application to sample problems. The suggested solutions include min
imizing the resources necessary to satisfy a constraint on entropy, and selecting which
targets to observe in a planning horizon in order to minimize entropy. No clear guid
ance is given on eﬃcient implementation of the optimization problem resulting from
either case. Moran, et al [70] discuss sensor management for radar, incorporating both
selection of which waveform from a library to transmit at any given time, and how to
design the waveform library a priori .
Hernandez, et al [30] utilize the posterior Cram´erRao bound as a criterion for
iterative deployment and utilization of sonar sensor resources for submarine tracking.
66 CHAPTER 2. BACKGROUND
Sensor positions are determined using a greedy
6
simulated annealing search, where the
objective is the estimation error at a particular time instant in the future. Selection
of the subset of sensors to activate during tracking is performed either using brute
force enumeration or a genetic algorithm; the subset remains ﬁxed in between sensor
deployments.
The problem of selecting the subset of sensors to activate in a single time slot to
minimize energy consumption subject to a mean square error estimation performance
constraint is considered in [22]. An integer programming formulation using branch and
bound techniques enables optimal solution of problems involving 50–70 sensors in tens
of milliseconds. The branch and bound method used exploits quadratic structure that
results when the performance criterion is based on two states (i.e., position in two
dimensions).
2.6.7 Contrast to our contributions
As outlined in Section 2.6.4, many authors have applied greedy heuristics and short
time extensions (e.g., using open loop plans over two or three time steps) to sensor
management problems using criteria such as mutual information, mean square error
and the posterior Cram´erRao bound. Thus it is surprising that little work has been
done toward obtaining performance guarantees for these methods. As discussed in Sec
tion 2.6.5, [46] is the ﬁrst generally applicable performance guarantee to a problem
with structure resembling the type which results to sensor management. However, this
result is not directly applicable to sensor management problems involving sequential
estimation (e.g., object tracking), where there is typically a set of observations corre
sponding to each time slot, the elements of which correspond to the modes in which we
may operate the sensor in that time slot, rather than a single set of observations. The
typical constraint structure is one which permits selection of one element (or a small
number of elements) from each of these sets, rather than a total of n elements from a
larger subset.
The analysis in Chapter 3 extends the performance guarantees of [46] to problems
involving this structure, providing the surprising result that a similar guarantee applies
to the greedy heuristic applied in a sequential fashion, even though future observa
tion opportunities are ignored. The result is applicable to a large class of timevarying
models. Several extensions are obtained, including tighter bounds that exploit either
6
The search is greedy in that proposed locations are accepted with probability 1 if the objective is
improved and probability 0 otherwise.
Sec. 2.6. Related work 67
process diﬀusiveness or objectives involving discount factors, and applicability to closed
loop problems. We also show that several of the results may be applied to the posterior
Cram´erRao bound. Examples demonstrate that the bounds are tight, and counterex
amples illuminate larger classes of problems to which they do not apply.
Many of the existing works that provide nonmyopic sensor management either rely
on very speciﬁc problem structure, and hence they are not generally applicable, or they
do not allow extensions to planning horizons past two or three time slots due to the
computational complexity. The development in Chapter 4 provides an integer pro
gramming approach that exploits submodularity to ﬁnd optimal or nearoptimal open
loop plans for problems involving multiple objects over much longer planning horizons;
experiments utilize up to 60 time slots. The method can be applied to any submod
ular, nondecreasing objective function, and does not require any speciﬁc structure in
dynamics or observation models.
Finally, Chapter 5 approaches the problem of sensor management in sensor networks
using a constrained dynamic programming formulation. The trade oﬀ between esti
mation performance and communication cost is formulated by maximizing estimation
performance subject to a constraint on energy cost, or the dual of this, i.e., minimizing
energy cost subject to a constraint on estimation performance. Heuristic approxima
tions that exploit the problem structure of tracking a single object using a network
of sensors again enable planning over dramatically increased horizons. The method is
both computable and scalable, yet still captures the essential structure of the underlying
trade oﬀ. Simulation results demonstrate a signiﬁcant reduction in the communication
cost required to achieve a given estimation performance level as compared to previously
proposed algorithms.
68 CHAPTER 2. BACKGROUND
Chapter 3
Greedy heuristics and
performance guarantees
T
HE performance guarantees presented in Section 2.4 apply to algorithms with a
particular selection structure: ﬁrstly, where one can select any set within a ma
troid, and secondly where one can select any arbitrary subset of a given cardinality.
This section develops a bound which is closely related to the matroid selection case,
except that we apply additional structure, in which we have N subsets of observations
and from each we can select a subset of a given cardinality. This structure is natu
rally applicable to dynamic models, where each subset of observations corresponds to
a diﬀerent time stage of the problem, e.g., we can select one observation at each time
stage. Our selection algorithm allows us to select at each time the observation which
maximizes the reward at that time, ignorant of the remainder of the time horizon. The
analysis establishes that the same performance guarantee that applies to the matroid
selection problem also applies to this problem.
We commence the chapter by deriving the simplest form of the performance guar
antee, with both online and oﬄine variants, in Section 3.1. Section 3.2 examines a
potentially tighter guarantee which exists for processes exhibiting diﬀusive characteris
tics, while Section 3.3 presents a similar guarantee for problems involving a discounted
objective. Section 3.5 then extends the results of Sections 3.1–3.3 to closed loop policies.
While the results in Sections 3.1–3.5 are presented in terms of mutual information,
they are applicable to a wider class of objectives. Sections 3.1 and 3.3 apply to any
submodular, nondecreasing objective for which the reward of an empty set is zero
(similar to the requirements for the results in Sections 2.4.4 and 2.4.5). The additional
requirements in Sections 3.2 and 3.5 are discussed as the results are presented. In
Section 3.6, we demonstrate that the log of the determinant of the Fisher information
matrix is also submodular and nondecreasing, and thus that the various guarantees of
69
70 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Sections 2.4.4, 2.4.5, 3.1 and 3.3 can also be applied to the determinant of the covariance
through the Cram´erRao bound.
Section 3.8 presents examples of some slightly diﬀerent problem structures which do
not ﬁt into the structure examined in Section 3.1 and Section 3.2, but are still able to
be addressed by the prior work discussed in Section 2.4.4. Finally, Section 3.9 presents
a negative result regarding a particular extension of the greedy heuristic to platform
steering problems.
3.1 A simple performance guarantee
To commence, consider a simple sequential estimation problem involving two time steps,
where at each step we must choose a single observation (e.g., in which mode to operate
a sensor) from a diﬀerent set of observations. The goal is to maximize the information
obtained about an underlying quantity X. Let ¦o
1
, o
2
¦ denote the optimal choice for
the two stages, which maximizes I(X; z
o
1
1
, z
o
2
2
). Let ¦g
1
, g
2
¦ denote the choice made
by the greedy heuristic, where g
1
is chosen to maximize I(X; z
g
1
1
) and g
2
is chosen
to maximize I(X; z
g
2
2
[z
g
1
1
) (where conditioning is on the random variable z
g
1
1
, not on
the resulting observation value). Then the following analysis establishes a performance
guarantee for the greedy algorithm:
I(X; z
o
1
1
, z
o
2
2
)
(a)
≤ I(X; z
g
1
1
, z
g
2
2
, z
o
1
1
, z
o
2
2
)
(b)
= I(X; z
g
1
1
) + I(X; z
g
2
2
[z
g
1
1
) + I(X; z
o
1
1
[z
g
1
1
, z
g
2
2
) + I(x; z
o
2
2
[z
g
1
1
, z
g
2
2
, z
o
1
1
)
(c)
≤ I(X; z
g
1
1
) + I(X; z
g
2
2
[z
g
1
1
) + I(X; z
o
1
1
) + I(x; z
o
2
2
[z
g
1
1
)
(d)
≤ 2I(X; z
g
1
1
) + 2I(X; z
g
2
2
[z
g
1
1
)
(e)
= 2I(X; z
g
1
1
, z
g
2
2
)
where (a) results from the nondecreasing property of MI, (b) is an application of the
MI chain rule, (c) results from submodularity (assuming that all observations are inde
pendent conditioned on X), (d) from the deﬁnition of the greedy heuristic, and (e) from
a reverse application of the chain rule. Thus the optimal performance can be no more
than twice that of the greedy heuristic, or, conversely, the performance of the greedy
heuristic is at least half that of the optimal.
1
Theorem 3.1 presents this result in its most general form; the proof directly follows
the above steps. The following assumption establishes the basic structure: we have N
1
Note that this is considering only open loop control; we will discuss closed loop control in Section 3.5.
Sec. 3.1. A simple performance guarantee 71
sets of observations, and we can select a speciﬁed number of observations from each set
in an arbitrary order.
Assumption 3.1. There are N sets of observations, ¦¦z
1
1
, . . . , z
n
1
1
¦, ¦z
1
2
, . . . , z
n
2
2
¦, . . . ,
¦z
1
N
, . . . , z
n
N
N
¦¦, which are mutually independent conditioned on the quantity to be esti
mated (X). Any k
i
observations can be chosen out of the ith set (¦z
1
i
, . . . , z
n
i
i
¦). The
sequence (w
1
, . . . , w
M
) (where w
i
∈ ¦1, . . . , N¦ ∀ i) speciﬁes the order in which we visit
observation sets using the greedy heuristic (i.e., in the ith stage we select a previously
unselected observation out of the w
i
th set).
Obviously we require [¦j ∈ ¦1, . . . , M¦[w
j
= i¦[ = k
i
∀ i (i.e., we visit the ith set
of observations k
i
times, selecting a single additional observation at each time), thus
N
i=1
k
i
= M. The abstraction of the observation set sequence (w
1
, . . . , w
M
) allows us
to visit observation sets more than once (allowing us to select multiple observations
from each set) and in any order. The greedy heuristic operating on this structure is
deﬁned below.
Deﬁnition 3.1. The greedy heuristic operates according to the following rule:
g
j
= arg max
u∈{1,...,nw
j
}
I(X; z
u
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
We assume without loss of generality that the same observation is not selected twice
since the reward for selecting an observation that was already selected is zero. We are
now ready to state the general form of the performance guarantee.
Theorem 3.1. Under Assumption 3.1, the greedy heuristic in Deﬁnition 3.1 has per
formance guaranteed by the following expression:
I(X; z
o
1
w
1
, . . . , z
o
M
w
M
) ≤ 2I(X; z
g
1
w
1
, . . . , z
g
M
w
M
)
where ¦z
o
1
w
1
, . . . , z
o
M
w
M
¦ is the optimal set of observations, i.e., the one which maximizes
I(X; z
o
1
w
1
, . . . , z
o
M
w
M
).
72 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Proof. The performance guarantee is obtained through the following steps:
I(X;z
o
1
w
1
, . . . , z
o
M
w
M
)
(a)
≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
, z
o
1
w
1
, . . . , z
o
M
w
M
)
(b)
=
M
j=1
_
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
M
w
M
, z
o
1
w
1
, . . . , z
o
j−1
w
j−1
)
_
(c)
≤
M
j=1
_
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j
)
_
(d)
≤ 2
M
j=1
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
(e)
= 2I(X; z
g
1
w
1
, . . . , z
g
M
w
M
)
where (a) is due to the nondecreasing property of MI, (b) is an application of the
MI chain rule, (c) results from submodularity, (d) from Deﬁnition 3.1, and (e) from a
reverse application of the chain rule.
3.1.1 Comparison to matroid guarantee
The prior work using matroids (discussed in Section 2.4.3) provides another algorithm
with the same guarantee for problems of this structure. However, to achieve the guar
antee on matroids it is necessary to consider every observation at every stage of the
problem. Computationally, it is far more desirable to be able to proceed in a dynamic
system by selecting observations at time k considering only the observations available
at that time, disregarding future time steps (indeed, all of the previous works described
in Section 2.6.4 do just that). The freedom of choice of the order in which we visit
observation sets in Theorem 3.1 extends the performance guarantee to this commonly
used selection structure.
3.1.2 Tightness of bound
The bound derived in Theorem 3.1 can be arbitrarily close to tight, as the following
example shows.
Example 3.1. Consider a problem with X = [a, b]
T
where a and b are independent
binary random variables with P(a = 0) = P(a = 1) = 0.5 and P(b = 0) = 0.5 −
; P(b = 1) = 0.5 + for some > 0. We have two sets of observations with n
1
= 2,
Sec. 3.1. A simple performance guarantee 73
n
2
= 1 and k
1
= k
2
= 1. In the ﬁrst set of observations we may measure z
1
1
= a for
reward I(X; z
1
1
) = H(a) = 1, or z
2
1
= b for reward I(X; z
2
1
) = H(b) = 1 − δ(), where
δ() > 0 ∀ > 0, and δ() → 0 as → 0. At the second stage we have one choice,
z
1
2
= a. Our walk is w = (1, 2), i.e., we visit the ﬁrst set of observations once, followed
by the second set.
The greedy algorithm selects at the ﬁrst stage to observe z
1
1
= a, as it yields a higher
reward (1) than z
2
1
= b (1 − δ()). At the second stage, the algorithm already has the
exact value for a, hence the observation at the second stage yields zero reward. The
total reward is 1.
Compare this result to the optimal sequence, which selects observation z
2
1
= b for
reward 1 −δ(), and then gains a reward of 1 from the second observation z
1
2
. The total
reward is 2 − δ(). By choosing arbitrarily close to zero, we may make the ratio of
optimal reward to greedy reward, 2 −δ(), arbitrarily close to 2.
3.1.3 Online version of guarantee
Modifying step (c) of Theorem 3.1, we can also obtain an online performance guarantee,
which will often be substantially tighter in practice (as demonstrated in Sections 3.1.4
and 3.1.5).
Theorem 3.2. Under the same assumptions as Theorem 3.1, for each i ∈ ¦1, . . . , N¦
deﬁne
¯
k
i
= min¦k
i
, n
i
−k
i
¦, and for each j ∈ ¦1, . . . ,
¯
k
i
¦ deﬁne
¯ g
j
i
= arg max
u∈{1,...,n
i
}−{¯ g
l
i
l<j}
I(X; z
u
i
[z
g
1
w
1
, . . . , z
g
M
w
M
) (3.1)
Then the following two performance guarantees, which are computable online, apply:
I(X; z
o
1
w
1
, . . . , z
o
M
w
M
) ≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) +
N
i=1
¯
k
i
j=1
I(X; z
¯ g
j
i
i
[z
g
1
w
1
, . . . , z
g
M
w
M
) (3.2)
≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) +
N
i=1
¯
k
i
I(X; z
¯ g
1
i
i
[z
g
1
w
1
, . . . , z
g
M
w
M
) (3.3)
Proof. The expression in Eq. (3.2) is obtained directly from step (b) of Theorem 3.1
through submodularity and the deﬁnition of ¯ g
j
i
in Eq. (3.1). Eq. (3.3) uses the fact that
I(X; z
¯ g
j
1
i
[z
g
1
w
1
, . . . , z
g
M
w
M
) ≥ I(X; z
¯ g
j
2
i
[z
g
1
w
1
, . . . , z
g
M
w
M
) for any j
1
≤ j
2
.
74 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
The online bound in Theorem 3.2 can be used to calculate an upper bound for the
optimal reward starting from any sequence of observation choices, not just the choice
made by the greedy heuristic in Deﬁnition 3.1, (g
1
, . . . , g
M
). The online bound will
tend to be tight in cases where the amount of information remaining after choosing the
set of observations is small.
3.1.4 Example: beam steering
Consider the beam steering problem in which two objects are being tracked. Each
object evolves according to a linear Gaussian process:
x
i
k+1
= Fx
i
k
+w
i
k
where w
i
k
∼ ^¦w
k
; 0, Q¦ are independent white Gaussian noise processes. The state
x
i
k
is assumed to be position and velocity in two dimensions (x
i
k
= [p
i
x
v
i
x
p
i
y
v
i
y
]
T
),
where velocity is modelled as a continuoustime random walk with constant diﬀusion
strength q (independently in each dimension), and position is the integral of velocity.
Denoting the sampling interval as T = 1, the corresponding discretetime model is:
F =
_
¸
¸
¸
¸
_
1 T 0 0
0 1 0 0
0 0 1 T
0 0 0 1
_
¸
¸
¸
¸
_
; Q = q
_
¸
¸
¸
¸
_
T
3
3
T
2
2
0 0
T
2
2
T 0 0
0 0
T
3
3
T
2
2
0 0
T
2
2
T
_
¸
¸
¸
¸
_
At each time instant we may choose between linear Gaussian measurements of the
position of either object:
z
i
k
=
_
1 0 0 0
0 0 1 0
_
x
i
k
+v
i
k
where v
i
k
∼ ^¦v
i
k
; 0, I¦ are independent white Gaussian noise processes, independent
of w
j
k
∀ j, k. The objects are tracked over a period of 200 time steps, commencing from
an initial distribution x
0
∼ ^¦x
0
; 0, P
0
¦, where
P
0
=
_
¸
¸
¸
¸
_
0.5 0.1 0 0
0.1 0.05 0 0
0 0 0.5 0.1
0 0 0.1 0.05
_
¸
¸
¸
¸
_
Fig. 3.1 shows the bound on the fraction of optimality according to the guarantee of
0
200
400
600
800
1000
1200
1400
1600
1800
2000
2200
10
−6
10
−5
10
−4
10
−3
10
−2
10
−1
10
0
10
1
10
2
T
o
t
a
l
M
I
Diﬀusion strength q
(a) Reward obtained by greedy heuristic and bound on optimal
Greedy reward
Bound on optimal
0.6
0.62
0.64
0.66
0.68
0.7
0.72
0.74
0.76
10
−6
10
−5
10
−4
10
−3
10
−2
10
−1
10
0
10
1
10
2
F
r
a
c
t
i
o
n
o
f
o
p
t
i
m
a
l
i
t
y
Diﬀusion strength q
(b) Factor of optimality from online guarantee
Figure 3.1. (a) shows total reward accrued by the greedy heuristic in the 200 time steps
for diﬀerent diﬀusion strength values (q), and the bound on optimal obtained through
Theorem 3.2. (b) shows the ratio of these curves, providing the factor of optimality
guaranteed by the bound.
76 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Theorem 3.2 as a function of the diﬀusion strength q. In this example, the quantity
being estimated, X, is the combination of the states of both objects over all time steps
in the problem:
X = [x
1
1
; x
2
1
; x
1
2
; x
2
2
; . . . ; x
1
200
; x
2
200
]
Examining Fig. 3.1, the greedy controller obtains close to the optimal amount of infor
mation as diﬀusion decreases since the measurements that were declined become highly
correlated with the measurements that were chosen.
3.1.5 Example: waveform selection
Suppose that we are using a surface vehicle travelling at a constant velocity along a ﬁxed
path (as illustrated in Fig. 3.2(a)) to map the depth of the ocean ﬂoor in a particular
region. Assume that, at any position on the path (such as the points denoted by ‘´’),
we may steer our sensor to measure the depth of any point within a given region around
the current position (as depicted by the dotted ellipses), and that we receive a linear
measurement of the depth corrupted by Gaussian noise with variance R. Suppose that
we model the depth of the ocean ﬂoor as a GaussMarkov random ﬁeld with a 500100
thin membrane grid model where neighboring node attractions are uniformly equal to
q. One cycle of the vehicle path takes 300 time steps to complete.
Deﬁning the state X to be the vector containing one element for each cell in the
500100 grid, the structure of the problem can be seen to be waveform selection: at
each time we choose between observations which convey information about diﬀerent
aspects of the same underlying phenomenon.
Fig. 3.2(b) shows the accrual of reward over time as well as the bound on the optimal
sequence obtained using Theorem 3.2 for each time step when q = 100 and R = 1/40,
while Fig. 3.2(c) shows the ratio between the achieved performance and the optimal
sequence bound over time. The graph indicates that the greedy heuristic achieves at
least 0.8 the optimal reward. The tightness of the online bound depends on particular
model characteristics: if q = R = 1, then the guarantee ratio is much closer to the value
of the oﬄine bound (i.e., 0.5). Fig. 3.3 shows snapshots of how the uncertainty in the
depth estimate progresses over time. The images display the marginal entropy of each
cell in the grid.
0
50
100
150
200
250
0 100 200 300 400 500 600
A
c
c
r
u
e
d
r
e
w
a
r
d
(
M
I
)
Time step
(b) Reward obtained by greedy heuristic and bound on optimal
Greedy reward
Bound on optimal
0.7
0.72
0.74
0.76
0.78
0.8
0.82
0.84
0 100 200 300 400 500 600 F
r
a
c
t
i
o
n
o
f
o
p
t
i
m
a
l
i
t
y
Time step
(c) Factor of optimality from online guarantee
(a) Region boundary and vehicle path
Figure 3.2. (a) shows region boundary and vehicle path (counterclockwise, starting from
the left end of the lower straight segment). When the vehicle is located at a ‘’ mark,
any one grid element with center inside the surrounding dotted ellipse may be measured.
(b) graphs reward accrued by the greedy heuristic after diﬀerent periods of time, and the
bound on the optimal sequence for the same time period. (c) shows the ratio of these two
curves, providing the factor of optimality guaranteed by the bound.
(a) 75 steps
(b) 225 steps
(c) 525 steps
Figure 3.3. Marginal entropy of each grid cell after 75, 225 and 525 steps. Blue indi
cates the lowest uncertainty, while red indicates the highest. Vehicle path is clockwise,
commencing from topleft. Each revolution takes 300 steps.
Sec. 3.2. Exploiting diﬀusiveness 79
3.2 Exploiting diﬀusiveness
In problems such as object tracking, the kinematic quantities of interest evolve according
to a diﬀusive process, in which correlation between states at diﬀerent time instants
reduces as the time diﬀerence increases. Intuitively, one would expect that a greedy
algorithm would be closer to optimal in situations in which the diﬀusion strength is
high. This section develops a performance guarantee which exploits the diﬀusiveness of
the underlying process to obtain a tighter bound on performance.
The general form of the result, stated in Theorem 3.3, deals with an arbitrary graph
(in the sense of Section 2.1.5) in the latent structure. The simpler cases involving trees
and chains are discussed in the sequel. The theorem is limited to only choosing a single
observation from each set; the proof of Theorem 3.3 exploits this fact. The basic model
structure is set up in Assumption 3.2.
Assumption 3.2. Let the latent structure which we seek to infer consist of an undi
rected graph ( with nodes X = ¦x
1
, . . . , x
L
¦, with an arbitrary interconnection structure.
Assume that each node has a set of observations ¦z
1
i
, . . . , z
n
i
i
¦, which are independent
of each other and all other nodes and observations in the graph conditioned on x
i
. We
may select a single observation from each set. Let (w
1
, . . . , w
L
) be a sequence which
determines the order in which nodes are visited (w
i
∈ ¦1, . . . , L¦ ∀ i); we assume that
each node is visited exactly once.
The results of Section 3.1 were applicable to any submodular, nondecreasing ob
jective for which the reward of an empty set was zero. In this section, we exploit an
additional property of mutual information which holds under Assumption 3.2, that for
any set of conditioning observations z
A
:
I(X; z
j
i
[z
A
) = H(z
j
i
[z
A
) −H(z
j
i
[X, z
A
)
= H(z
j
i
[z
A
) −H(z
j
i
[x
j
)
= I(x
i
; z
j
i
[z
A
) (3.4)
We then utilize this property in order to exploit process diﬀusiveness. The general form
of the diﬀusive characteristic is stated in Assumption 3.3. This is a strong assumption
that is diﬃcult to establish globally for any given model; we examine it for one simple
model in Section 3.2.3. In Section 3.2.1 we present an online computable guarantee
which exploits the characteristic to whatever extent it exists in a particular selection
problem. In Section 3.2.2 we then specialize the assumption to cases where the latent
graph structure is a tree or a chain.
80 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Assumption 3.3. Under the structure in Assumption 3.2, let the graph ( have the
diﬀusive property in which there exists α < 1 such that for each i ∈ ¦1, . . . , L¦ and each
observation z
j
w
i
at node x
w
i
,
I(x
N(w
i
)
; z
j
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) ≤ αI(x
w
i
; z
j
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
)
where x
N(w
i
)
denotes the neighbors of node x
w
i
in the latent structure graph (.
Assumption 3.3 states that the information which the observation z
j
w
i
contains about
x
w
i
is discounted by a factor of at least α when compared to the information it contains
about the remainder of the graph. Theorem 3.3 uses this property to bound the loss of
optimality associated with the greedy choice to be a factor of (1 + α) rather than 2.
Theorem 3.3. Under Assumptions 3.2 and 3.3, the performance of the greedy heuristic
in Deﬁnition 3.1 satisﬁes the following guarantee:
I(X; z
o
1
w
1
, . . . , z
o
L
w
L
) ≤ (1 + α)I(X; z
g
1
w
1
, . . . , z
g
L
w
L
)
Proof. To establish an induction step, assume that (as trivially holds for j = 1),
I(X; z
o
1
w
1
, . . . , z
o
L
w
L
) ≤ (1 + α)I(X; z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(X; z
o
j
w
j
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
(3.5)
Manipulating the second term in Eq. (3.5),
I(X; z
o
j
w
j
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
(a)
= I(x
w
j
; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(x
w
j+1
, . . . , x
w
L
; z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
, z
o
j
w
j
)
(b)
≤ I(x
w
j
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(x
w
j+1
, . . . , x
w
L
; z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
(c)
≤ I(x
w
j
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(x
w
j+1
, . . . , x
w
L
; z
g
j
w
j
, z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
(d)
= I(x
w
j
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(x
N(w
j
)
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
+ I(x
w
j+1
, . . . , x
w
L
; z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j
w
j
)
(e)
≤ (1 + α)I(x
w
j
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(x
w
j+1
, . . . , x
w
L
; z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j
w
j
)
(f)
= (1 + α)I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) + I(X; z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j
w
j
)
where (a) and (f) result from the chain rule, from independence of z
o
j
w
j
and z
g
j
w
j
on the re
maining latent structure conditioned on x
w
j
, and from independence of ¦z
o
j+1
w
j+1
, . . . , z
o
L
w
L
¦
Sec. 3.2. Exploiting diﬀusiveness 81
on the remaining latent structure conditioned on ¦x
w
j+1
, . . . , x
w
L
¦; (b) from submodu
larity and the deﬁnition of the greedy heuristic; (c) from the nondecreasing property;
(d) from the chain rule (noting that z
g
j
w
j
is independent of all nodes remaining to be vis
ited conditioned on the neighbors of x
w
j
); and (e) from the assumed diﬀusive property
of Assumption 3.3. Replacing the second term in Eq. (3.5) with the ﬁnal result in (f),
we obtain a strengthened bound:
I(X; z
o
1
w
1
, . . . , z
o
L
w
L
) ≤ (1 + α)I(X; z
g
1
w
1
, . . . , z
g
j
w
j
) + I(X; z
o
j+1
w
j+1
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j
w
j
)
Applying this induction step L times, we obtain the desired result.
3.2.1 Online guarantee
For many models the diﬀusive property is diﬃcult to establish globally. Following from
step (d) of Theorem 3.3, one may obtain an online computable bound which does not
require the property of Assumption 3.3 to hold globally, but exploits it to whatever
extent it exists in a particular selection problem.
Theorem 3.4. Under the model of Assumption 3.2, but not requiring the diﬀusive prop
erty of Assumption 3.3, the following performance guarantee, which can be computed
online, applies to the greedy heuristic of Deﬁnition 3.1:
I(X; z
o
1
w
1
, . . . , z
o
L
w
L
) ≤ I(X; z
g
1
w
1
, . . . , z
g
L
w
L
) +
L
j=1
I(x
N(w
j
)
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
Proof. The proof directly follows Theorem 3.3. Commence by assuming (for induction)
that:
I(X; z
o
1
w
1
, . . . , z
o
L
w
L
) ≤ I(X; z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) +
j−1
i=1
I(x
N(w
i
)
; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
)
+ I(X; z
o
j
w
j
, . . . , z
o
L
w
L
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) (3.6)
Eq. (3.6) trivially holds for j = 1. If we assume that it holds for j then step (d) of
Theorem 3.3 obtains an upper bound to the the ﬁnal term in Eq. (3.6) which establishes
that Eq. (3.6) also holds for (j + 1). Applying the induction step L times, we obtain
the desired result.
82 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
3.2.2 Specialization to trees and chains
In the common case where the latent structure X = ¦x
1
, . . . , x
L
¦ forms a tree, we may
avoid including all neighbors a node in the condition of Assumption 3.3 and in the
result of Theorem 3.4. The modiﬁed assumptions are presented below. An additional
requirement on the sequence (w
1
, . . . , w
L
) is necessary to exploit the tree structure.
Assumption 3.4. Let the latent structure which we seek to infer consist of an undi
rected graph ( with nodes X = ¦x
1
, . . . , x
L
¦, which form a tree. Assume that each
node has a set of observations ¦z
1
i
, . . . , z
n
i
i
¦, which are independent of each other and
all other nodes and observations in the graph conditioned on x
i
. We may select a single
observation from each set. Let (w
1
, . . . , w
L
) be a sequence which determines the order
in which nodes are visited (w
i
∈ ¦1, . . . , L¦ ∀ i); we assume that each node is visited
exactly once. We assume that the sequence is “bottomup”, i.e., that no node is visited
before all of its children have been visited.
Assumption 3.5. Under the structure in Assumption 3.4, let the graph ( have the
diﬀusive property in which there exists α < 1 such that for each i ∈ ¦1, . . . , L¦ and each
observation z
j
w
i
at node x
w
i
,
I(x
π(w
i
)
; z
j
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) ≤ αI(x
w
i
; z
j
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
)
where x
π(w
i
)
denotes the parent of node x
w
i
in the latent structure graph (.
Theorem 3.3 holds under Assumptions 3.4 and 3.5; the proof passes directly once
x
N(w
j
)
is replaced by x
π(w
j
)
in step (d). The modiﬁed statement of Theorem 3.4 is
included below. Again, the proof passes directly once x
N(w
i
)
is replaced by x
π(w
i
)
.
Theorem 3.5. Under the model of Assumption 3.4, but not requiring the diﬀusive prop
erty of Assumption 3.5, the following performance guarantee, which can be computed
online, applies to the greedy heuristic of Deﬁnition 3.1:
I(X; z
o
1
w
1
, . . . , z
o
L
w
L
) ≤ I(X; z
g
1
w
1
, . . . , z
g
L
w
L
) +
L
j=1
I(x
π(w
j
)
; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
The most common application of the diﬀusive model is in Markov chains (a special
case of a tree), where the ith node corresponds to time i. In this case, the sequence
is simply w
i
= i, i.e., we visit the nodes in time order. Choosing the ﬁnal node in the
Sec. 3.2. Exploiting diﬀusiveness 83
chain to be the tree root, this sequence respects the bottomup requirement, and the
diﬀusive requirement becomes:
I(x
k+1
; z
j
k
[z
g
1
1
, . . . , z
g
k−1
k−1
) ≤ αI(x
k
; z
j
k
[z
g
1
1
, . . . , z
g
k−1
k−1
) (3.7)
3.2.3 Establishing the diﬀusive property
As an example, we establish the diﬀusive property for a simple onedimensional sta
tionary linear GaussMarkov chain model. The performance guarantee is uninteresting
in this case since the greedy heuristic may be easily shown to be optimal. Nevertheless,
some intuition may be gained from the structure of the condition which results. The
dynamics model and observation models are given by:
x
k+1
= fx
k
+ w
k
(3.8)
z
j
k
= h
j
x
k
+ v
j
k
, j ∈ ¦1, . . . , n¦ (3.9)
where w
k
∼ ^¦w
k
; 0, q¦ and v
j
k
∼ ^¦v
j
k
; 0, r
j
¦. We let ˜ q = q/f
2
and ˜ r
j
= r
j
/(h
j
)
2
.
The greedy heuristic in this model corresponds to choosing the observation z
j
k
with the
smallest normalized variance ˜ r
j
. Denoting the covariance of x
k
conditioned on the prior
observations as P
kk−1
, the terms involved in Eq. (3.7) can be evaluated as:
I(x
k
; z
j
k
[z
g
1
1
, . . . , z
g
k−1
k−1
) =
1
2
log
_
1 +
P
kk−1
˜ r
j
_
(3.10)
I(x
k+1
; z
j
k
[z
g
1
1
, . . . , z
g
k−1
k−1
) =
1
2
log
_
1 +
P
2
kk−1
(˜ r
j
+ ˜ q)P
kk−1
+ ˜ r
j
˜ q
_
(3.11)
If P
kk−1
can take on any value on the positive real line then no α < 1 exists, since:
lim
P→∞
_
_
_
1
2
log
_
1 +
P
2
(˜ r
j
+˜ q)P+˜ r
j
˜ q
_
1
2
log
_
1 +
P
˜ r
j
¸
_
_
_
= 1
Thus we seek a range for P
kk−1
such that there does exist an α < 1 for which Eq. (3.7)
is satisﬁed. If such a result is obtained, then the diﬀusive property is established as
long as the covariance remains within this range during operation.
Substituting Eq. (3.10) and Eq. (3.11) into Eq. (3.7) and exponentiating each side,
we need to ﬁnd the range of P for which
b
α
(P) =
1 +
P
2
(˜ r
j
+˜ q)P+˜ r
j
˜ q
_
1 +
P
˜ r
j
¸
α
≤ 1 (3.12)
84 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Note that b
α
(0) = 1 for any α. Furthermore,
d
dP
b
α
(P) may be shown to be negative for
P ∈ [0, a) and positive for P ∈ (a, ∞) (for some positive a). Hence b
α
(P) reduces from
a value of unity initially before increasing monotonically and eventually crossing unity.
For any given α, there will be a unique nonzero value of P for which b
α
(P) = 1. For a
given value of P, we can easily solve Eq. (3.12) to ﬁnd the smallest value of α for which
the expression is satisﬁed:
α
∗
(P) =
log
_
(P+˜ r
j
)(P+˜ q)
(˜ r
j
+˜ q)P+˜ r
j
˜ q
_
log(P + ˜ r
j
)
(3.13)
Hence, for any P ∈ [0, P
0
], Eq. (3.12) is satisﬁed for any α ∈ [α
∗
(P
0
), 1]. The strongest
diﬀusion coeﬃcient is shown in Fig. 3.4 as a function of the covariance upper limit P
0
for various values of ˜ r and ˜ q. Diﬀerent values of the dynamics model parameter f will
yield diﬀerent steady state covariances, and hence select diﬀerent operating points on
the curve.
While closedform analysis is very diﬃcult for multidimensional linear Gaussian
systems, nor for nonlinear and/or nonGaussian systems, the general intuition of the
single dimensional linear Gaussian case may be applied. For example, many systems will
satisfy some degree of diﬀusiveness as long as all states remain within some certainly
level. The examples in Sections 3.2.4 and 3.2.5 demonstrate the use of the online
performance guarantee in cases where the diﬀusive condition has not been established
globally.
3.2.4 Example: beam steering revisited
Consider the beam steering scenario presented in Section 3.1.4. The performance bound
obtained using the online analysis form Theorem 3.5 is shown in Fig. 3.5. As expected,
the bound tightens as the diﬀusion strength increases. In this example, position states
are directly observable, while velocity states are only observable through the induced
impact on position. The guarantee is substantially tighter when all states are directly
observable, as shown in the following example.
3.2.5 Example: bearings only measurements
Consider an object which moves in two dimensions according to a Gaussian random
walk:
x
k+1
= x
k
+w
k
0
0.1
0.2
0.3
0.4
0.5
0 5 10 15 20 25
D
i
ﬀ
u
s
i
v
e
c
o
e
ﬃ
c
i
e
n
t
α
∗
Covariance limit P
0
Lowest diﬀusive coeﬃcient vs covariance limit
˜ q = 20
˜ q = 10
˜ q = 5
Figure 3.4. Strongest diﬀusive coeﬃcient versus covariance upper limit for various values
of ˜ q, with ˜ r = 1. Note that lower values of α
∗
correspond to stronger diﬀusion.
400
600
800
1000
1200
1400
1600
1800
2000
10
0
10
1
10
2
T
o
t
a
l
M
I
Diﬀusion strength q
(a) Reward obtained by greedy heuristic and bound on optimal
Greedy reward
Bound on optimal
0.5
0.52
0.54
0.56
0.58
0.6
0.62
0.64
10
0
10
1
10
2
F
r
a
c
t
i
o
n
o
f
o
p
t
i
m
a
l
i
t
y
Diﬀusion strength q
(b) Factor of optimality from online diﬀusive guarantee
Figure 3.5. (a) shows total reward accrued by the greedy heuristic in the 200 time steps
for diﬀerent diﬀusion strength values (q), and the bound on optimal obtained through
Theorem 3.5. (b) shows the ratio of these curves, providing the factor of optimality
guaranteed by the bound.
Sec. 3.3. Discounted rewards 87
where w
k
∼ ^¦w
k
; 0, I¦. The initial position of the object is distributed according
to x
0
∼ ^¦x
0
; 0, I¦. Assume that bearing observations are available from four sen
sors positioned at (±100, ±100), but that only one observation may be utilized at any
instant. Simulations were run for 200 time steps. The total reward and the bound
obtained from Theorem 3.5 are shown in Fig. 3.6(a) as a function of the measurement
noise standard deviation (in degrees). The results demonstrate that the performance
guarantee becomes stronger as the measurement noise decreases; the same eﬀect occurs
if the observation noise is held constant and the dynamics noise increased. Fig. 3.6(b)
shows the ratio of the greedy performance to the upper bound on optimal, demonstrat
ing that the greedy heuristic is guaranteed to be within a factor of 0.77 of optimal with
a measurement standard deviation of 0.1 degrees.
In this example, we utilized the closed loop greedy heuristic examined in Section 3.5,
hence it was necessary to use multiple Monte Carlo simulations to compute the online
guarantee. Tracking was performed using an extended Kalman ﬁlter, hence the bounds
are approximate (the EKF variances were used to calculate the rewards). In this sce
nario, the low degree of nonlinearity in the observation model provides conﬁdence that
the inaccuracy in the rewards is insigniﬁcant.
3.3 Discounted rewards
In Sections 3.1 and 3.2 the objective we were seeking to optimize was the mutual
information between the state and observations through the planning horizon, and the
optimal open loop reward to which we compared was I(X; z
o
1
w
1
, . . . , z
o
M
w
M
). In some
sequential estimation problems, it is not only desirable to maximize the information
obtained within a particular period of time, but also how quickly, within the planning
horizon, the information is obtained. One way
2
of capturing this notion is to incorporate
a discount factor in the objective, reducing the value of information obtained later in
the problem. Subsequently, our optimization problem is changed from:
min
u
1
,...,u
M
I(X; z
u
1
1
, . . . , z
u
M
N
) = min
u
1
,...,u
M
M
k=1
I(X; z
u
k
k
[z
u
1
1
, . . . , z
u
k−1
k−1
)
2
Perhaps the most natural way of capturing this would be to reformulate the problem, choosing as
the objective (to be minimized) the expected time required to reduce the uncertainty below a desired
criterion. The resulting problem is intractable, hence the approximate method using MI is an appealing
substitute.
0
50
100
150
200
250
300
350
400
450
500
10
−1
10
0
10
1
T
o
t
a
l
M
I
Measurement noise standard deviation σ
(a) Reward obtained by greedy heuristic and bound on optimal
Greedy reward
Bound on optimal
0.5
0.55
0.6
0.65
0.7
0.75
0.8
10
−1
10
0
10
1
F
r
a
c
t
i
o
n
o
f
o
p
t
i
m
a
l
i
t
y
Measurement noise standard deviation σ
(b) Factor of optimality from online guarantee
Figure 3.6. (a) shows average total reward accrued by the greedy heuristic in the 200
time steps for diﬀerent diﬀusion strength values (q), and the bound on optimal obtained
through Theorem 3.5. (b) shows the ratio of these curves, providing the factor of optimality
guaranteed by the bound.
Sec. 3.3. Discounted rewards 89
to:
min
u
1
,...,u
M
M
k=1
α
k−1
I(X; z
u
k
k
[z
u
1
1
, . . . , z
u
k−1
k−1
)
where α < 1. Not surprising, the performance guarantee for the greedy heuristic be
comes tighter as the discount factor is decreased, as Theorem 3.6 establishes. We deﬁne
the abbreviated notation for the optimal reward to go from stage i to the end of the
problem conditioned on the previous observation choices (u
1
, . . . , u
k
):
J
o
i
[(u
1
, . . . , u
k
)]
M
j=i
α
j−1
I(X; z
o
j
w
j
[z
u
1
w
1
, . . . , z
u
k
w
k
, z
o
i
w
i
, . . . , z
o
j−1
w
j−1
)
and the reward so far for the greedy heuristic in i stages:
J
g
→i
i
j=1
α
j−1
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
We will use the following lemma in the proof of Theorem 3.6.
Lemma 3.1. The optimal reward to go for the discounted sequence satisﬁes the rela
tionship:
J
o
i+1
[(g
1
, . . . , g
i−1
)] ≤ α
i
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + J
o
i+1
[(g
1
, . . . , g
i
)]
90 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Proof. Expanding the lefthand side and manipulating:
J
o
i+1
[(g
1
, . . . , g
i−1
)] =
M
j=i+1
α
j−1
I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
, z
o
i+1
w
i+1
, . . . , z
o
j−1
w
j−1
)
(a)
= α
i
_
I(X; z
o
i+1
w
i+1
, . . . , z
o
M
w
M
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
)
+
M
j=i+1
(α
j−i−1
−1)I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
, z
o
i+1
w
i+1
, . . . , z
o
j−1
w
j−1
)
_
(b)
≤ α
i
_
I(X; z
g
i
w
i
, z
o
i+1
w
i+1
, . . . , z
o
M
w
M
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
)
+
M
j=i+1
(α
j−i−1
−1)I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
, z
o
i+1
w
i+1
, . . . , z
o
j−1
w
j−1
)
_
(c)
= α
i
_
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + I(X; z
o
i+1
w
i+1
, . . . , z
o
M
w
M
[z
g
1
w
1
, . . . , z
g
i
w
i
)
+
M
j=i+1
(α
j−i−1
−1)I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
, z
o
i+1
w
i+1
, . . . , z
o
j−1
w
j−1
)
_
(d)
≤ α
i
_
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + I(X; z
o
i+1
w
i+1
, . . . , z
o
M
w
M
[z
g
1
w
1
, . . . , z
g
i
w
i
)
+
M
j=i+1
(α
j−i−1
−1)I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
i
w
i
, z
o
i+1
w
i+1
, . . . , z
o
j−1
w
j−1
)
_
(e)
= α
i
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + J
o
i+1
[(g
1
, . . . , g
i
)]
where (a) results from adding and subtracting α
i
I(X; z
o
i+1
w
i+1
, . . . , z
o
M
w
M
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
),
(b) from the nondecreasing property of MI (introducing the additional observation z
g
i
w
i
into the ﬁrst term), (c) from the MI chain rule, (d) from submodularity (adding the
conditioning z
g
i
w
i
into the second term, noting that the coeﬃcient is negative), and (e)
from cancelling similar terms and applying the deﬁnition of J
o
i+1
.
Theorem 3.6. Under Assumption 3.1, the greedy heuristic in Deﬁnition 3.1 has per
formance guaranteed by the following expression:
M
j=1
α
j−1
I(X; z
o
j
w
j
[z
o
1
w
1
, . . . , z
o
j−1
w
j−1
) ≤ (1 + α)
M
j=1
α
j−1
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
Sec. 3.4. Time invariant rewards 91
where ¦z
o
1
w
1
, . . . , z
o
M
w
M
¦ is the optimal set of observations, i.e., the one which maximizes
M
j=1
α
j−1
I(X; z
o
j
w
j
[z
o
1
w
1
, . . . , z
o
j−1
w
j−1
)
Proof. In the abbreviated notation, we seek to prove:
J
o
1
[∅] ≤ (1 + α)J
g
→M
The proof follows an induction on the expression
J
o
1
[∅] ≤ (1 + α)J
g
→i−1
+ J
o
i
[(g
1
, . . . , g
i−1
)]
which is trivially true for i = 1. Suppose it is true for i; manipulating the expression
we obtain:
J
o
1
[∅] ≤ (1 + α)J
g
→i−1
+ J
o
i
[(g
1
, . . . , g
i−1
)]
(a)
= (1 + α)J
g
→i−1
+ α
i−1
I(X; z
o
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + J
o
i+1
[(g
1
, . . . , g
i−1
, o
i
)]
(b)
≤ (1 + α)J
g
→i−1
+ α
i−1
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + J
o
i+1
[(g
1
, . . . , g
i−1
, o
i
)]
(c)
≤ (1 + α)J
g
→i−1
+ α
i−1
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + J
o
i+1
[(g
1
, . . . , g
i−1
)]
(d)
≤ (1 + α)J
g
→i−1
+ (1 + α)α
i−1
I(X; z
g
i
w
i
[z
g
1
w
1
, . . . , z
g
i−1
w
i−1
) + J
o
i+1
[(g
1
, . . . , g
i
)]
= (1 + α)J
g
→i
+ J
o
i+1
[(g
1
, . . . , g
i
)]
where (a) results from the deﬁnition of J
o
i
, (b) results from the deﬁnition of the greedy
heuristic, (c) results from submodularity (allowing us to remove the conditioning on o
i
from each term in J
o
i+1
) and (d) from Lemma 3.1.
Applying the induction step M times we get the desired result.
3.4 Time invariant rewards
The reward function in some problems is wellapproximated as being time invariant.
In this case, a tighter bound, (1 −1/e), may be obtained. The structure necessary is
given in Assumption 3.6.
92 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
Assumption 3.6. At each stage k ∈ ¦1, . . . , L¦ there are n observations, ¦z
1
k
, . . . , z
n
k
¦,
which are mutually independent conditioned on the quantity to be estimated (X). At
each stage k we may select any one of the n observations; for each observation i, the
observation model p(z
i
k
[X) does not change with stage. Selecting the same observation
in m diﬀerent stages results in m conditionally independent observations based on the
same model.
Note that the only diﬀerence between this problem and the kelement subset selec
tion problem in Section 2.4.5 is that the same observation can be chosen multiple times
in diﬀerent stages to yield additional information through the same observation model
(e.g., eﬀectively providing a reduced variance observation). The following proof obtains
the guarantee by transforming the problem into a kelement subset selection problem.
Theorem 3.7. Under Assumption 3.6, the greedy heuristic in Deﬁnition 3.1 has per
formance guaranteed by the following expression:
(1 −1/e)I(X; z
o
1
1
, . . . , z
o
L
L
) ≤ I(X; z
g
1
1
, . . . , z
g
L
L
)
where ¦z
o
1
1
, . . . , z
o
L
L
¦ is the optimal set of observations, i.e., the one which maximizes
I(X; z
o
1
1
, . . . , z
o
L
L
).
Proof. Consider the Lelement subset selection problem involving the set of observations
¦z
1,1
, . . . , z
1,L
, . . . , z
n,1
, . . . , z
n,L
¦
For all (i, j), the observation model for z
i,j
is p
z
i
k
(z
i,j
[X), i.e., the same as that of
z
i
k
. Note that any choice of observations maps directly in to a choice in the original
problem. For example, if three observations are chosen out of the set ¦z
i,1
, . . . , z
i,n
¦,
then we select the observation z
i
k
for three time slots k out of the set ¦1, . . . , L¦. The
exact choice of time slots is not important since rewards are invariant to time.
Thus the solution of the Lelement subset selection problem, which is within a
factor of (1 −1/e) the optimal solution of the Lelement subset selection problem by
Theorem 2.4, provides a solution that is within (1 −1/e) the optimal solution of the
problem structure in Assumption 3.6.
Example 3.1 relied on nonstationary observation models to demonstrate that the
possible diﬀerence between the greedy and optimal selections. The following example
illustrates the diﬀerence that is possible using stationary models and a static state.
Sec. 3.5. Closed loop control 93
Example 3.2. Suppose that our state consists of four independent binary random vari
ables, X = [a b c d]
T
, where H(a) = H(b) = 1, and H(c) = H(d) = 1 − for some
small > 0. In each stage k ∈ ¦1, 2¦ there are three observations available, z
1
k
= [a b]
T
,
z
2
k
= [a c]
T
and z
3
k
= [b d]
T
.
In stage 1, the greedy heuristic selects observation z
1
1
since I(X; z
1
1
) = 2 whereas
I(X; z
2
1
) = I(X; z
3
1
) = 2 −. In stage 2, the algorithm has already learned the values of
a and b, hence I(X; z
1
2
[z
1
1
) = 0, and I(X; z
2
2
[z
1
1
) = I(X; z
3
2
[z
1
1
) = 1−. The total reward
is 3 −.
An optimal choice is z
2
1
and z
3
2
, achieving reward 4 − 2. The ratio of the greedy
reward to optimal reward is
3 −
4 −2
which approaches 0.75 as → 0. Examining Theorem 2.4, we see that the performance
of the greedy heuristic over K = 2 stages is guaranteed to be within a factor of [1−(1−
1/K)
K
] = 0.75 of the optimal, hence this factor is the worst possible over two stages.
The intuition behind the scenario in this example is that information about dif
ferent portions of the state can be obtained in diﬀerent combinations, therefore it is
necessary to use additional planning to ensure that the observations we obtain provide
complementary information.
3.5 Closed loop control
The analysis in Sections 3.13.3 concentrates on an open loop control structure, i.e., it
assumes that all observation choices are made before any observation values are received.
Greedy heuristics are often applied in a closed loop setting, in which an observation is
chosen, and then its value is received before the next choice is made.
The performance guarantees of Theorems 3.1 and 3.3 both apply to the expected
performance of the greedy heuristic operating in a closed loop fashion, i.e., in expec
tation the closed loop greedy heuristic achieves at least half the reward of the optimal
open loop selection. The expectation operation is necessary in the closed loop case since
control choices are random variables that depend on the values of previous observations.
Theorem 3.8 establishes the result of Theorem 3.1 for the closed loop heuristic. The
same process can be used to establish a closed loop version of Theorem 3.3. To obtain
the closed loop guarantee, we need to exploit an additional characteristic of mutual
information:
I(X; z
A
[z
B
) =
_
I(X; z
A
[z
B
= ζ)p
z
B(ζ)dζ (3.14)
94 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
While the results are presented in terms of mutual information, they apply to any other
objective which meets the previous requirements as well as Eq. (3.14).
We deﬁne h
j
= (u
1
, z
u
1
w
1
, u
2
, z
u
2
w
2
, . . . , u
j−1
, z
u
j−1
w
j−1
) to be the history of all observation
actions chosen, and the resulting observation values, prior to stage j (this constitutes
all the information which we can utilize in choosing our action at time j). Accordingly,
h
1
= ∅, and h
j
= (h
j−1
, u
j
, z
u
j
w
j
). The greedy heuristic operating in closed loop is deﬁned
in Deﬁnition 3.2.
Deﬁnition 3.2. Under the same assumptions as Theorem 3.1, deﬁne the closed loop
greedy heuristic policy µ
g
:
µ
g
j
(h
j
) = arg max
u∈{1,...,nw
j
}
I(X; z
u
w
j
[h
j
) (3.15)
We use the convention that conditioning on h
i
in an MI expression is always on
the value, and hence if h
i
contains elements which are random variables we will always
include an explicit expectation operator. The expected reward to go from stage j to
the end of the planning horizon for the greedy heuristic µ
g
j
(h
j
) commencing from the
history h
j
is denoted as:
J
µ
g
j
(h
j
) = I(X; z
µ
g
j
(h
j
)
w
j
, . . . , z
µ
g
N
(h
N
)
w
N
[h
j
) (3.16)
= E
_
_
N
i=j
I(X; z
µ
g
i
(h
i
)
w
i
[h
i
)
¸
¸
¸
¸
¸
¸
h
j
_
_
(3.17)
The expectation in Eq. (3.17) is over the random variables corresponding to the ac
tions ¦µ
g
j+1
(h
j+1
), . . . , µ
g
N
(h
N
)¦,
3
along with the observations resulting from the actions,
¦z
µ
g
j
(h
j
)
w
j
, . . . , z
µ
g
N
(h
N
)
w
N
¦, where h
i
is the concatenation of the previous history sequence
h
i−1
with the new observation action µ
g
i
(h
i
) and the new observation value z
µ
g
i
(h
i
)
w
i
. The
expected reward of the greedy heuristic over the full planning horizon is J
µ
g
1
(∅). We
also deﬁne the expected reward accrued by the greedy heuristic up to and including
stage j, commencing from an empty history sequence (i.e., h
1
= ∅), as:
J
µ
g
→j
= E
_
j
i=1
I(X; z
µ
g
i
(h
i
)
w
i
[h
i
)
_
(3.18)
This gives rise to the recursive relationship:
J
µ
g
→j
= E[I(X; z
µ
g
j
(h
j
)
w
j
[h
j
)] + J
µ
g
→j−1
(3.19)
3
We assume a deterministic policy, hence the action at stage j is ﬁxed given knowledge of hj.
Sec. 3.5. Closed loop control 95
Comparing Eq. (3.17) with Eq. (3.18), we have J
µ
g
→N
= J
µ
g
1
(∅). We deﬁne J
µ
g
→0
= 0.
The reward of the tail of the optimal open loop observation sequence (o
j
, . . . , o
N
)
commencing from the history h
j
is denoted by:
J
o
j
(h
j
) = I(X; z
o
j
w
j
, . . . , z
o
N
w
N
[h
j
) (3.20)
Using the MI chain rule and Eq. (3.14), this can be written recursively as:
J
o
j
(h
j
) = I(X; z
o
j
w
j
[h
j
) + E
z
o
j
w
j
h
j
J
o
j+1
[(h
j
, o
j
, z
o
j
w
j
)] (3.21)
where J
o
N+1
(h
N+1
) = 0. The reward of the optimal open loop observation sequence over
the full planning horizon is:
J
o
1
(∅) = I(X; z
o
1
w
1
, . . . , z
o
N
w
N
) (3.22)
We seek to obtain a guarantee on the performance ratio between the optimal open
loop observation sequence and the closed loop greedy heuristic. Before we prove the
theorem, we establish a simple result in terms of our new notation.
Lemma 3.2. Given the above deﬁnitions:
E
z
o
j
w
j
h
j
J
o
j+1
[(h
j
, o
j
, z
o
j
w
j
)] ≤ J
o
j+1
(h
j
) ≤ I(X; z
µ
g
j
(h
j
)
w
j
[h
j
) + E
z
µ
g
j
(h
j
)
w
j
h
j
J
o
j+1
[(h
j
, µ
g
j
(h
j
), z
µ
g
j
(h
j
)
w
j
)]
Proof. Using Eq. (3.20), the ﬁrst inequality corresponds to:
I(X; z
o
j+1
w
j+1
, . . . , z
o
N
w
N
[h
j
, z
o
j
w
j
) ≤ I(X; z
o
j+1
w
j+1
, . . . , z
o
N
w
N
[h
j
)
where conditioning is on the value h
j
throughout (as per the convention introduced
below Eq. (3.15)), and on the random variable z
o
j
w
j
. Therefore, the ﬁrst inequality
results directly from submodularity.
The second inequality results from the nondecreasing property of MI:
I(X; z
o
j+1
w
j+1
, . . . , z
o
N
w
N
[h
j
)
(a)
≤ I(X; z
µ
g
j
(h
j
)
w
j
, z
o
j+1
w
j+1
, . . . , z
o
N
w
N
[h
j
)
(b)
= I(X; z
µ
g
j
(h
j
)
w
j
[h
j
) + I(z
o
j+1
w
j+1
, . . . , z
o
N
w
N
[h
j
, z
µ
g
j
(h
j
)
w
j
)
(c)
= I(X; z
µ
g
j
(h
j
)
w
j
[h
j
) + E
z
µ
g
j
(h
j
)
w
j
h
j
J
o
j+1
[(h
j
, µ
g
j
(h
j
), z
µ
g
j
(h
j
)
w
j
)]
(a) results from the nondecreasing property, (b) from the chain rule, and (c) from the
deﬁnition in Eq. (3.20).
96 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
We are now ready to prove our result, that the reward of the optimal open loop
sequence is no greater than twice the expected reward of the greedy closed loop heuristic.
Theorem 3.8. Under the same assumptions as Theorem 3.1,
J
o
1
(∅) ≤ 2J
µ
g
1
(∅)
i.e., the expected reward of the closed loop greedy heuristic is at least half the reward of
the optimal open loop policy.
Proof. To establish an induction, assume that
J
o
1
(∅) ≤ 2J
µ
g
→j−1
+ EJ
o
j
(h
j
) (3.23)
Noting that h
1
= ∅, this trivially holds for j = 1 since J
µ
g
→0
= 0. Now, assuming that it
holds for j, we show that it also holds for (j + 1). Applying Eq. (3.21),
J
o
1
(∅) ≤ 2J
µ
g
→j−1
+ E
_
_
_
I(X; z
o
j
w
j
[h
j
) + E
z
o
j
w
j
h
j
J
o
j+1
[(h
j
, o
j
, z
o
j
w
j
)]
_
_
_
By the deﬁnition of the closed loop greedy heuristic (Deﬁnition 3.2),
I(X; z
o
j
w
j
[h
j
) ≤ I(X; z
µ
g
j
(h
j
)
w
j
[h
j
)
hence:
J
o
1
(∅) ≤ 2J
µ
g
→j−1
+ E
_
_
_
I(X; z
µ
g
j
(h
j
)
w
j
[h
j
) + E
z
o
j
w
j
h
j
J
o
j+1
[(h
j
, o
j
, z
o
j
w
j
)]
_
_
_
Applying Lemma 3.2, followed by Eq. (3.19):
J
o
1
(∅) ≤ 2J
µ
g
→j−1
+ E
_
¸
_
¸
_
2I(X; z
µ
g
j
(h
j
)
w
j
[h
j
) + E
z
µ
g
j
(h
j
)
w
j
h
j
J
o
j+1
[(h
j
, µ
g
j
(h
j
), z
µ
g
j
(h
j
)
w
j
)]
_
¸
_
¸
_
= 2J
µ
g
→j
+ EJ
o
j+1
(h
j+1
)
where h
j+1
= (h
j
, µ
g
j
(h
j
), z
µ
g
j
(h
j
)
w
j
). This establishes the induction step.
Applying the induction step N times, we obtain:
J
o
1
(∅) ≤ 2J
µ
g
→N
+ EJ
o
N+1
(h
N+1
) = 2J
µ
g
1
(∅)
since J
o
N+1
(h
N+1
) = 0 and J
µ
g
→N
= J
µ
g
1
(∅).
Sec. 3.5. Closed loop control 97
We emphasize that this performance guarantee is for expected performance: it does
not provide a guarantee for the change in entropy of every sample path. An online
bound cannot be obtained on the basis of a single realization, although online bounds
similar to Theorems 3.2 and 3.4 could be calculated through Monte Carlo simulation
(to approximate the expectation).
3.5.1 Counterexample: closed loop greedy versus closed loop optimal
While Theorem 3.8 provides a performance guarantee with respect to the optimal open
loop sequence, there is no guarantee relating the performance of the closed loop greedy
heuristic to the optimal closed loop controller, as the following example illustrates. One
exception to this is linear Gaussian models, where closed loop policies can perform no
better than open loop sequences, so that the open loop guarantee extends to closed
loop performance.
Example 3.3. Consider the following twostage problem, where X = [a, b, c]
T
, with
a ∈ ¦1, . . . , N¦, b ∈ ¦1, . . . , N + 1¦, and c ∈ ¦1, . . . , M¦. The prior distribution of each
of these is uniform and independent. In the ﬁrst stage, we may measure z
1
1
= a for
reward log N, or z
2
1
= b for reward log(N +1). In the second stage, we may choose z
i
2
,
i ∈ ¦1, . . . , N¦, where
z
i
2
=
_
_
_
c, i = a
d, otherwise
where d is independent of X, and is uniformly distributed on ¦1, . . . , M¦. The greedy
algorithm in the ﬁrst stage selects the observation z
2
1
= b, as it yields a higher reward
(log(N+1)) than z
1
1
= a (log N). At the second stage, all options have the same reward,
1
N
log M, so we choose one arbitrarily for a total reward of log(N +1) +
1
N
log M. The
optimal algorithm in the ﬁrst stage selects the observation z
1
1
= a for reward log N,
followed by the observation z
a
2
for reward log M, for total reward log N + log M. The
ratio of the greedy reward to the optimal reward is
log(N + 1) +
1
N
log M
log N + log M
→
1
N
, M → ∞
Hence, by choosing N and M to be large, we can obtain an arbitrarily small ratio
between the greedy closedloop reward and the optimal closedloop reward.
The intuition of this example is that the value of the observation at the ﬁrst time
provides guidance to the controller on which observation it should take at the second
98 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
time. The greedy heuristic is unable to anticipate the later beneﬁt of this guidance.
Notice that the reward of any observation z
i
2
without conditioning on the ﬁrst observa
tion z
1
1
is I(X; z
i
2
) =
1
N
log M. In contrast, the reward of z
a
2
conditioned on the value
of the observation z
1
1
= a is I(X; z
a
2
[ˇ z
1
1
) = log M. This highlights that the property of
diminishing returns (i.e., submodularity) is lost when later choices (and rewards) are
conditioned earlier observation values.
We conjecture that it may be possible to establish a closed loop performance guar
antee for diﬀusive processes, but it is likely to be dramatically weaker than the bounds
presented in this chapter.
3.5.2 Counterexample: closed loop greedy versus open loop greedy
An interesting side note is that the closed loop greedy heuristic can actually result in
lower performance than the open loop greedy heuristic, as the following example shows.
Example 3.4. Consider the following threestage problem, where X = [a, b, c]
T
, with
a ∈ ¦1, 2¦, b ∈ ¦1, . . . , N +1¦, and c ∈ ¦1, . . . , N¦, where N ≥ 2. The prior distribution
of each of these is uniform and independent. In the ﬁrst stage, a single observation is
available, z
1
= a. In the second stage, we may choose z
1
2
, z
2
2
or z
3
2
= c, where z
1
2
and
z
2
2
are given by:
z
i
2
=
_
_
_
b, i = a
d, otherwise
where d is independent of X, and is uniformly distributed on ¦1, . . . , N + 1¦. In the
third stage, a single observation is available, z
3
= b. The closed loop greedy algorithm
gains reward log 2 for the ﬁrst observation. At the second observation, the value of a
is known, hence it selects z
a
2
= b for reward log (N + 1). The ﬁnal observation z
3
= b
then yields no further reward; the total reward is log 2(N + 1). The open loop greedy
heuristic gains the same reward (log 2) for the ﬁrst observation. Since there is no prior
knowledge of a, z
1
2
and z
2
2
yield the same reward (
1
2
log (N + 1)) which is less than the
reward of z
3
2
(log N), hence z
3
2
is chosen. The ﬁnal observation yields reward log N
for a total reward of log 2N
√
N + 1. For any N ≥ 2, the open loop greedy heuristic
achieves higher reward than the closed loop greedy heuristic.
Since the open loop greedy heuristic has performance no better than the optimal
open loop sequence, and the closed loop greedy heuristic has performance no worse than
half that of the optimal open loop sequence, the ratio of open loop greedy performance
to closed loop greedy performance can be no greater than two. The converse is not
Sec. 3.5. Closed loop control 99
true since the performance of the closed loop greedy heuristic is not bounded by the
performance of the optimal open loop sequence. This can be demonstrated with a slight
modiﬁcation of Example 3.3 in which the observation z
2
1
is made unavailable.
3.5.3 Closed loop subset selection
A simple modiﬁcation of the proof of Theorem 3.8 can also be used to extend the
result of Theorem 2.4 to closed loop selections. In this structure, there is a single set
of observations from which we may choose any subset of ≤ K elements out of the
ﬁnite set . Again, we obtain the value of each observation before making subsequent
selections. We may simplify our notation slightly in this case since we have a single pool
of observations. We denote the history of observations chosen and the resulting values
to be h
j
= (u
1
, z
u
1
, . . . , u
j−1
, z
u
j−1
). The optimal choice of observations is denoted as
(o
1
, . . . , o
K
); the ordering within this choice is arbitrary.
The following deﬁnitions are consistent with the previous deﬁnitions within the new
notation:
µ
g
(h
j
) = arg max
u∈U\{u
1
,...,u
j−1
}
I(X; z
u
[h
j
)
J
µ
g
→j
= E
_
j
i=1
I(X; z
µ
g
(h
i
)
[h
i
)
_
= E[I(X; z
µ
g
(h
j
)
[h
j
)] + J
µ
g
→j−1
J
o
j
(h
j
) = I(X; z
o
j
, . . . , z
o
N
[h
j
)
where h
1
= ∅ and h
j+1
= (h
j
, µ
g
(h
j
), z
µ
g
(h
j
)
). Lemmas 3.3 and 3.4 establish two results
which we use to prove the theorem.
Lemma 3.3. For all i ∈ ¦1, . . . , K¦,
J
o
1
(∅) ≤ J
µ
g
→i
+ EJ
o
1
(h
i+1
)
Proof. The proof follows an induction on the desired result. Note that the expression
trivially holds for i = 0 since J
µ
g
→0
= 0 and h
1
= ∅. Now suppose that the expression
100 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
holds for (i −1):
J
o
1
(∅) ≤ J
µ
g
→i−1
+ EJ
o
1
(h
i
)
(a)
= J
µ
g
→i−1
+ E[I(X; z
o
1
, . . . , z
o
K
[h
i
)]
(b)
≤ J
µ
g
→i−1
+ E[I(X; z
µ
g
(h
i
)
, z
o
1
, . . . , z
o
K
[h
i
)]
(c)
= J
µ
g
→i−1
+ E[I(X; z
µ
g
(h
i
)
[h
i
)] + E[I(X; z
o
1
, . . . , z
o
K
[h
i
, z
µ
g
(h
i
)
)]
(d)
= J
µ
g
→i
+ E
_
E
z
µ
g
(h
i
)
h
i
J
o
1
[(h
i
, µ
g
(h
i
), z
µ
g
(h
i
)
)]
_
(e)
= J
µ
g
→i
+ EJ
o
1
[(h
i+1
)]
where (a) uses the deﬁnition of J
o
1
, (b) results from the nondecreasing property of MI,
(c) results from the MI chain rule, (d) uses the deﬁnition of J
µ
g
→i
and J
o
1
, and (e) uses
the deﬁnition of h
i+1
.
Lemma 3.4. For all i ∈ ¦1, . . . , K¦,
J
o
1
(h
i
) ≤ KI(X; z
µ
g
(h
i
)
[h
i
)
Proof. The following steps establish the result:
J
o
1
(h
i
)
(a)
= I(X; z
o
1
, . . . , z
o
K
[h
i
)
(b)
=
K
j=1
I(X; z
o
j
[h
i
, z
o
1
, . . . , z
o
j−1
)
(c)
≤
K
j=1
I(X; z
o
j
[h
i
)
(d)
≤
K
j=1
I(X; z
µ
g
(h
i
)
[h
i
)
= KI(X; z
µ
g
(h
i
)
[h
i
)
where (a) results from the deﬁnition of J
o
1
, (b) from the MI chain rule, (c) from sub
modularity, and (d) from the deﬁnition of µ
g
.
Sec. 3.6. Guarantees on the Cram´erRao bound 101
Theorem 3.9. The expected reward of the closed loop greedy heuristic in the Kelement
subset selection problem is at least (1 − 1/e) the reward of the optimal open loop
sequence, i.e.,
J
µ
g
→K
≥ (1 −1/e)J
o
1
(∅)
Proof. To commence, note from Lemma 3.4 that, for all i ∈ ¦1, . . . , K¦:
EJ
o
1
(h
i
) ≤ K E[I(X; z
µ
g
(h
i
)
[h
i
)]
= K(J
µ
g
→i
−J
µ
g
→i−1
)
Combining this with Lemma 3.3, we obtain:
J
o
1
(∅) ≤ J
µ
g
→i−1
+ K(J
µ
g
→i
−J
µ
g
→i−1
) ∀ i ∈ ¦1, . . . , K¦ (3.24)
Letting ρ
j
= J
µ
g
→j
−J
µ
g
→j−1
and Z = J
o
1
(∅) and comparing Eq. (3.24) with Eq. (2.93) in
Theorem 2.4, we obtain the desired result.
3.6 Guarantees on the Cram´erRao bound
While the preceding discussion has focused exclusively on mutual information, the re
sults are applicable to a larger class of objectives. The following analysis shows that the
guarantees in Sections 2.4.4, 2.4.5, 3.1 and 3.3 can also yield a guarantee on the pos
terior Cram´erRao bound. We continue to assume that observations are independent
conditioned on X.
To commence, assume that the objective we seek to maximize is the log of the
determinant of the Fisher information: (we will later show that a guarantee on this
quantity yields a guarantee on the determinant of the Cram´erRao bound matrix)
D(X; z
A
) log
¸
¸
¸J
∅
X
+
a∈A
¯
J
z
a
X
¸
¸
¸
[J
∅
X
[
(3.25)
where J
∅
X
and
¯
J
z
a
X
are as deﬁned in Section 2.1.6. We can also deﬁne an increment
function similar to conditional MI:
D(X; z
A
[z
B
) D(X; z
A∪B
) −D(X; z
B
) (3.26)
= log
¸
¸
¸J
∅
X
+
a∈A∪B
¯
J
z
a
X
¸
¸
¸
¸
¸
¸J
∅
X
+
b∈B
¯
J
z
b
X
¸
¸
¸
(3.27)
102 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
It is easy to see that the reward function D(X; z
A
) is a nondecreasing, submodular
function of the set / by comparing Eq. (3.25) to the MI of a linear Gaussian process
(see Section 2.3.4; also note that I(X; z
A
) =
1
2
D(X; z
A
) in the linear Gaussian case).
The following development derives these properties without using this link. We will
require the following three results from linear algebra; for proofs see [1].
Lemma 3.5. Suppose that A _ B ~ 0. Then B
−1
_ A
−1
~ 0.
Lemma 3.6. Suppose that A _ B ~ 0. Then [A[ ≥ [B[ > 0.
Lemma 3.7. Suppose A ∈ R
n×m
and B ∈ R
m×n
. Then [I +AB[ = [I +BA[.
Theorem 3.10. D(X; z
A
) is a nondecreasing set function of the set /, with
D(X; z
∅
) = 0. Assuming that all observations are independent conditioned on X,
D(X; z
A
) is a submodular function of /.
Proof. To show that D(X; z
A
) is nondecreasing, consider the increment:
D(X; z
A
[z
B
) = log
¸
¸
¸J
∅
X
+
a∈A∪B
¯
J
z
a
X
¸
¸
¸
¸
¸
¸J
∅
X
+
b∈B
¯
J
z
b
X
¸
¸
¸
Since J
∅
X
+
a∈A∪B
¯
J
z
a
X
_ J
∅
X
+
b∈B
¯
J
z
b
X
, we have by Lemma 3.6 that
[J
∅
X
+
a∈A∪B
¯
J
z
a
X
[ ≥ [J
∅
X
+
b∈B
¯
J
z
b
X
[, hence D(X; z
A
[z
B
) ≥ 0. If / = ∅, we trivially
ﬁnd D(X; z
A
) = 0.
For submodularity, we need to prove that ∀ B ⊇ /,
D(X; z
C∪A
) −D(X; z
A
) ≥ D(X; z
C∪B
) −D(X; z
B
) (3.28)
For convenience, deﬁne the shorthand notation:
¯
J
C
X
¯
J
z
C
X
c∈C
¯
J
z
c
X
J
A
X
J
z
A
X
J
∅
X
+
a∈A
¯
J
z
a
X
Sec. 3.6. Guarantees on the Cram´erRao bound 103
We proceed by forming the diﬀerence of the two sides of the expression in Eq. (3.28):
[D(X;z
C∪A
) −D(X; z
A
)] −[D(X; z
C∪B
) −D(X; z
B
)]
=
_
log
[J
A
X
+
¯
J
C\A
X
[
[J
∅
X
[
−log
[J
A
X
[
[J
∅
X
[
_
−
_
log
[J
B
X
+
¯
J
C\B
X
[
[J
∅
X
[
−log
[J
B
X
[
[J
∅
X
[
_
= log
[J
A
X
+
¯
J
C\A
X
[
[J
A
X
[
−log
[J
B
X
+
¯
J
C\B
X
[
[J
B
X
[
= log
¸
¸
I + J
A
X
−
1
2 ¯
J
C\A
X
J
A
X
−
1
2
¸
¸
−log
¸
¸
I + J
B
X
−
1
2 ¯
J
C\B
X
J
B
X
−
1
2
¸
¸
where J
B
X
−
1
2
is the inverse of the symmetric square root matrix of J
B
X
. Since
¯
J
C\A
X
_
¯
J
C\B
X
, we can write through Lemma 3.6:
≥ log
¸
¸
I + J
A
X
−
1
2 ¯
J
C\B
X
J
A
X
−
1
2
¸
¸
−log
¸
¸
I + J
B
X
−
1
2 ¯
J
C\B
X
J
B
X
−
1
2
¸
¸
Factoring
¯
J
C\B
X
and applying Lemma 3.7, we obtain:
= log
¸
¸
I +
¯
J
C\B
X
1
2
J
A
X
−1
¯
J
C\B
X
1
2
¸
¸
−log
¸
¸
I +
¯
J
C\B
X
1
2
J
B
X
−1
¯
J
C\B
X
1
2
¸
¸
Finally, since J
A
X
−1
_ J
B
X
−1
(by Lemma 3.5), we obtain through Lemma 3.6:
≥ 0
This establishes submodularity of D(X; z
A
).
Following Theorem 3.10, if we use the greedy selection algorithm on D(X; z
A
), then
we obtain the guarantee from Theorem 3.1 that D(X; z
G
) ≥ 0.5D(X; z
O
) where z
G
is
the set of observations chosen by the greedy heuristic and z
O
is the optimal set. The
following theorem maps this into a guarantee on the posterior Cram´erRao bound.
Theorem 3.11. Let z
G
be the set of observations chosen by the greedy heuristic op
erating on D(X; z
A
), and let z
O
be the optimal set of observations for this objective.
Assume that, through one of guarantees (online or oﬄine) in Section 3.1 or 3.3, we
have for some β:
D(X; z
G
) ≥ βD(X; z
O
)
Then the determinants of the matrices in the posterior Cram´erRao bound in Sec
tion 2.1.6 satisfy the following inequality:
[C
G
X
[ ≤ [C
∅
X
[
_
[C
O
X
[
[C
∅
X
[
_
β
104 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
The ratio [C
G
X
[/[C
∅
X
[ is the fractional reduction of uncertainty (measured through
covariance determinant) which is gained through using the selected observations rather
than the prior information alone. Thus Theorem 3.11 provides a guarantee on how much
of the optimal reduction you lose by using the greedy heuristic. From Section 2.1.6 and
Lemma 3.6, the determinant of the error covariance of any estimator of X using the
data z
G
is lower bounded by [C
G
X
[.
Proof. By the deﬁnition of D(X; z
A
) (Eq. (3.25)) and from the assumed condition we
have:
[log [J
z
G
X
[ −log [J
∅
X
[] ≥ β[log [J
z
O
X
[ −log [J
∅
X
[]
Substituting in C
z
A
X
= [J
z
A
X
]
−1
and using the identity [A
−1
[ = [A[
−1
we obtain:
[log [C
z
G
X
[ −log [C
∅
X
[] ≤ β[log [C
z
O
X
[ −log [C
∅
X
[]
Exponentiating both sides this becomes:
[C
z
G
X
[
[C
∅
X
[
≤
_
[C
z
O
X
[
[C
∅
X
[
_
β
which is the desired result.
The results of Sections 3.2 and 3.5 do not apply to Fisher information since the
required properties (Eq. (3.4) and Eq. (3.14) respectively) do not generally apply to
D(X; z
A
). Obviously linear Gaussian processes are an exception to this rule, since
I(X; z
A
) =
1
2
D(X; z
A
) in this case.
3.7 Estimation of rewards
The analysis in Sections 3.1–3.6 assumes that all reward values can be calculated exactly.
While this is possible for some common classes of problems (such as linear Gaussian
models), approximations are often necessary. The analysis in [46] can be easily extended
to the algorithms described in this chapter. As an example, consider the proof of
Theorem 3.1, where the greedy heuristic is used with estimated MI rewards,
g
j
= arg max
g∈{1,...,nw
j
}
ˆ
I(X; z
g
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)
Sec. 3.8. Extension: general matroid problems 105
and the error in the MI estimate is bounded by , i.e.,
[I(X; z
g
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
) −
ˆ
I(X; z
g
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j−1
)[ ≤
From step (c) of Theorem 3.1,
I(X; z
o
1
w
1
, . . . , z
o
M
w
M
) ≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) +
M
j=1
I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j
)
≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) +
M
j=1
_
ˆ
I(X; z
o
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j
) +
_
≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) +
M
j=1
_
ˆ
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j
) +
_
≤ I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) +
M
j=1
_
I(X; z
g
j
w
j
[z
g
1
w
1
, . . . , z
g
j−1
w
j
) + 2
_
= 2I(X; z
g
1
w
1
, . . . , z
g
M
w
M
) + 2M
Hence the deterioration in the performance guarantee is at most 2M.
3.8 Extension: general matroid problems
The guarantees described in this chapter have concentrated on problem structures in
volving several sets of observations, in which we select a ﬁxed number of observations
from each set. In this section, we brieﬂy demonstrate wider a class of problems that
may be addressed using the previous work in [77] (described in Section 2.4.4), which,
to our knowledge, has not been previously applied in this context.
As described in Section 2.4.3, (, F) is a matroid if ∀ /, B ∈ F such that [/[ < [B[,
∃ u ∈ B¸/ such that / ∪ ¦u¦ ∈ F. Consider the class of problems described in
Assumption 3.1, in which we are choosing observations from N sets, and we may choose
k
i
elements from the ith set. It is easy to see that this class of selection problems ﬁts in
to the matroid class: given any two valid
4
observation selection sets /, B with [/[ < [B[,
pick any set i such that the number of elements in / from this set is fewer than the
number of elements in B from this set (such an i must exist since / and B have diﬀerent
cardinality). Then we can ﬁnd an element in B from the ith set which is not in /, but
can be added to / while maintaining a valid set.
4
By valid, we mean that no more than ki elements are chosen from the ith set.
106 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
A commonly occurring structure which cannot be addressed within Assumption 3.1
is detailed in Assumption 3.7. The diﬀerence is that there is an upper limit on the total
number of observations able to be taken, as well as on the number of observations able
to be taken from each set. Under this generalization, the greedy heuristic must consider
all remaining observations at each stage of the selection problem: we cannot visit one
set at a time in an arbitrary order as in Assumption 3.1. This is the key advantage of
Theorem 3.1 over the prior work described in Section 2.4.4 when dealing with problems
that have the structure of Assumption 3.1.
Assumption 3.7. There are N sets of observations, ¦¦z
1
1
, . . . , z
n
1
1
¦, . . . ,
¦z
1
N
, . . . , z
n
N
N
¦¦, which are mutually independent conditioned on the quantity to be
estimated (X). Any k
i
observations can be chosen out of the ith set (¦z
1
i
, . . . , z
n
i
i
¦),
but the total number of observations chosen cannot exceed K.
The structure in Assumption 3.7 clearly remains a matroid: as per the previous
discussion, take any two observation selection sets /, B with [/[ < [B[ and pick any
set i such that the number of elements in / from this set is fewer than the number of
elements in B from this set. Then we can ﬁnd an element in B from the ith set which
is not in /, but can be added to / while maintaining a valid set (since [/[ < [B[ ≤ K).
3.8.1 Example: beam steering
As an example, consider a beam steering problem similar to the one described in Sec
tion 3.1.4, but where the total number of observations chosen (of either object) in the
200 steps should not exceed 50 (we assume an open loop control structure, in which we
choose all observation actions before obtaining any of the resulting observation values).
This may be seen to ﬁt within the structure of Assumption 3.7, so the selection algo
rithm and guarantee of Section 2.4.4 applies. Fig. 3.7 shows the observations chosen
at each time in the previous case (where an observation was chosen in every time step)
and in the constrained case in which a total of 50 observations is chosen.
3.9 Extension: platform steering
A problem structure which generalizes the openloop observation selection problem in
volves control of sensor state. In this case, the controller simultaneously selects obser
vations in order to control an information state, and controls a ﬁnite state, completely
observed Markov chain (with a deterministic transition law) which determines the sub
set of measurements available at each time. We now describe a greedy algorithm which
1
2
0 50 100 150 200
O
b
j
e
c
t
o
b
s
e
r
v
e
d
Time step
(a) Choosing one observation in every time step
1
2
0 50 100 150 200
O
b
j
e
c
t
o
b
s
e
r
v
e
d
Time step
(b) Choosing up to 50 total observations using matroid algorithm
Figure 3.7. (a) shows the observations chosen in the example in Sections 3.1.4 and 3.2.4
when q = 1. (b) shows the smaller set of observations chosen in the constrained problem
using the matroid selection algorithm.
108 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
one may use to control such a problem. We assume that in each sensor state there is a
single measurement available.
Deﬁnition 3.3. The greedy algorithm for jointly selecting observations and controlling
sensor state commences in the following manner: (s
0
is the initial state of the algorithm
which is ﬁxed upon execution)
Stage 0: Calculate the reward of the initial observation
J
0
= I(x; s
0
)
Stage 1: Consider each possible sensor state which can follow s
0
. Calculate the reward
J
1
(s
1
) =
_
_
_
J
0
+ I(x; s
1
[s
0
), s
1
∈ o
s
0
−∞, otherwise
Stage i: For each s
i
, calculate the highest reward sequence which can precede that state:
s
g
i−1
(s
i
) = arg max
s
i−1
s
i
∈S
s
i−1
J
i−1
(s
i−1
)
Then, for each s
i
, add the reward of the new observation obtained in that state:
J
i
(s
i
) = J
i−1
(s
g
i−1
(s
i
)) + I(x; s
i
[s
g
i−1
(s
i
), s
g
i−2
(s
g
i−1
(s
i
)), . . . , s
0
)
After stage N, calculate ˜ s
g
N
= arg max
s
J
N
(s). The ﬁnal sequence is found through
the backward recursion ˜ s
g
i
= s
g
i
(˜ s
g
i+1
).
The following example demonstrates that the ratio between the greedy algorithm
for open loop joint observation and sensor state control and the optimal open loop
algorithm can be arbitrarily close to zero.
Example 3.5. We seek to control the sensor state s
k
∈ ¦0, . . . , 2N +1¦. In stage 0 we
commence from sensor state s
0
= 0. From sensor state 0 we can transition to any other
sensor state. From sensor state s ,= 0, we can stay in the same state, or transition to
sensor state (s −1) (provided that s > 1) or (s + 1) (provided that s < 2N + 1).
The unobserved state, x, about which we seek to gather information is
static (x
1
= x
2
= = x), and consists of N(2N + 1) binary elements
Sec. 3.9. Extension: platform steering 109
¦x
1,1
, . . . , x
1,2N+1
, . . . , x
N,1
, . . . , x
N,2N+1
¦. The prior distribution of these elements is
uniform.
The observation in sensor state 2j −2, j ∈ ¦1, . . . , N¦ is uninformative. The obser
vation in sensor state 2j − 1, j ∈ ¦1, . . . , N¦ at stage i provides a direct measurement
of the state elements ¦x
j,1
, . . . , x
j,i
¦.
The greedy algorithm commences stage 0 with J
0
= 0. In stage 1, we obtain:
J
1
(s
1
) =
_
¸
¸
¸
_
¸
¸
¸
_
−∞, s
1
= 0
0, s
1
positive and even
1, s
1
odd
Suppose that we commence stage i with
J
i−1
(s
i−1
) =
_
¸
¸
¸
_
¸
¸
¸
_
−∞, s
i−1
= 0
i −2, s
i−1
positive and even
i −1, s
i−1
odd
Settling ties by choosing the state with lower index, we obtain:
s
g
i−1
(s
i
) =
_
¸
¸
¸
_
¸
¸
¸
_
undeﬁned, s
i
= 0
s
i
−1, s
i
positive and even
s
i
, s
i
odd
Incorporating the new observation, we obtain:
J
i
(s
i
) =
_
¸
¸
¸
_
¸
¸
¸
_
−∞, s
i
= 0
i −1, s
i
positive and even
i, s
i
odd
At the end of stage 2N +1, we ﬁnd that the best sequence remains in any oddnumbered
state for all stages, and obtains a reward of 2N + 1.
Compare this result to the optimal sequence, which visits state i at stage i. The
reward gained in each stage is:
I(x; s
i
[s
0
, . . . , s
i−1
) =
_
_
_
0, i even
i, i odd
110 CHAPTER 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES
The total reward is thus
N
j=0
(2j + 1) = N
2
+ 2N + 1
The ratio of greedy reward to optimal reward for the problem involving 2N+1 stages
is:
2N + 1
N
2
+ 2N + 1
→ 0, N → ∞
3.10 Conclusion
The performance guarantees presented in this chapter provide theoretical basis for sim
ple heuristic algorithms that are widely used in practice. The guarantees apply to both
open loop and closed loop operation, and are naturally tighter for diﬀusive processes, or
discounted objectives. The examples presented throughout the chapter demonstrate the
applicability of the guarantees to a wide range of waveform selection and beam steering
problems, and the substantially stronger online guarantees that can be obtained for spe
ciﬁc problems through computation of additional quantities after the greedy selection
has been completed.
Chapter 4
Independent objects and
integer programming
I
N this chapter, we use integer programming methods to construct an open loop plan
of which sensor actions to perform within a given planning horizon. We may either
execute this entire plan, or execute some portion of the plan before constructing an
updated plan (socalled open loop feedback control, as discussed in Section 2.2.2).
The emphasis of our formulation is to exploit the structure which results in sensor
management problems involving observation of multiple independent objects. In addi
tion to the previous assumption that observations should be independent conditioned
on the state, three new assumptions must be met for this structure to arise:
1. The prior distribution of the objects must be independent
2. The objects must evolve according to independent dynamical processes
3. The objects must be observed through independent observation processes
When these three assumptions are met, the mutual information reward of observations
of diﬀerent objects becomes the sum of the individual observation rewards—i.e., sub
modularity becomes additivity. Accordingly, one may apply a variety of techniques that
exploit the structure of integer programming with linear objectives and constraints.
We commence this chapter by developing in Section 4.1 a simple formulation that
allows us to select up to one observation for each object. In Section 4.2, we generalize
this to our proposed formulation, which ﬁnds the optimal plan, permits multiple ob
servations of each object, and can address observations that require diﬀerent durations
to complete. In Section 4.4, we perform experiments which explore the computational
eﬃciency of this formulation on a range of problems. The formulation is generalized
to consider resources with arbitrary capacities in Section 4.5; this structure is useful in
111
112 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
problems involving time invariant rewards.
4.1 Basic formulation
We commence by presenting a special case of the abstraction discussed in Section 4.2.
As per the previous chapter, we assume that the planning horizon is broken into discrete
time slots, numbered ¦1, . . . , N¦, that a sensor can perform at most one task in any
time slot, and that each observation action occupies exactly one time slot. We have a
number of objects, numbered ¦1, . . . , M¦, each of which can be observed in any time
slot. Our task is to determine which object to observe in each time slot. Initially, we
assume that we have only a single mode for the sensor to observe each object in each
time slot, such that the only choice to be made in each time slot is which object to
observe; this represents the purest form of the beam steering structure discussed in
Section 1.1.
To motivate this structure, consider a problem in which we use an airborne sensor to
track objects moving on the ground beneath foliage. In some positions, objects will be
in clear view and observation will yield accurate position information; in other positions,
objects will be obscured by foliage and observations will be essentially uninformative.
Within the time scale of a planning horizon, objects will move in and out of obscuration,
and it will be preferable to observe objects during the portion of time in which they are
expected to be in clear view.
4.1.1 Independent objects, additive rewards
The basis for our formulation is the fact that rewards for observations of independent
objects are additive. Denoting by X
i
= ¦x
i
1
, . . . , x
i
N
¦ the joint state (over the planning
horizon) of object i, we deﬁne the reward of observation set /
i
⊆ ¦1, . . . , N¦ of object
i (i.e., /
i
represents the subset of time slots in which we observe object i) to be:
r
i
A
i
= I(X
i
; z
i
A
i
) (4.1)
where z
i
A
i
are the random variables corresponding to the observations of object i in the
time slots in /
i
. As discussed in the introduction of this chapter, if we assume that the
initial states of the objects are independent:
p(x
1
1
, . . . , x
M
1
) =
M
i=1
p(x
i
1
) (4.2)
Sec. 4.1. Basic formulation 113
and that the dynamical processes are independent:
p(x
1
k
, . . . , x
M
k
[x
1
k−1
, . . . , x
M
k−1
) =
M
i=1
p(x
i
k
[x
i
k−1
) (4.3)
and, ﬁnally, that observations are independent conditioned on the state, and that each
observation relates to a single object:
p(z
1
A
1
, . . . , z
M
A
M
[X
1
, . . . , X
M
) =
M
i=1
k∈A
i
p(z
i
k
[X
i
) (4.4)
then the conditional distributions of the states of the objects will be independent con
ditioned on any set of observations:
p(X
1
, . . . , X
M
[z
1
A
1
, . . . , z
M
A
M
) =
M
i=1
p(X
i
[z
i
A
i
) (4.5)
In this case, we can write the reward of choosing observation set /
i
for object i ∈
¦1, . . . , M¦ as:
I(X
1
, . . . , X
M
; z
1
A
1
, . . . , z
M
A
M
) = H(X
1
, . . . , X
M
) −H(X
1
, . . . , X
M
[z
1
A
1
, . . . , z
M
A
M
)
=
M
i=1
H(X
i
) −
M
i=1
H(X
i
[z
i
A
i
)
=
M
i=1
I(X
i
; z
i
A
i
)
=
M
i=1
r
i
A
i
(4.6)
4.1.2 Formulation as an assignment problem
While Eq. (4.6) shows that rewards are additive across objects, the reward for taking
several observations of the same object is not additive. In fact, it can be easily shown
through submodularity that (see Lemma 4.3)
I(X
i
; z
i
A
i
) ≤
k∈A
i
I(X
i
; z
i
k
)
As an initial approach, consider the problem structure which results if we restrict our
selves to observing each object at most once. For this restriction to be sensible, we must
114 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
assume that the number of time slots is no greater than the number of objects, so that
an observation will be taken in each time slot. In this case, the overall reward is simply
the sum of single observation rewards, since each set /
i
has cardinality at most one.
Accordingly, the problem of determining which object to observe at each time reduces
to an asymmetric assignment problem, assigning time slots to objects. The assignment
problem may be written as a linear program in the following form:
max
ω
i
k
M
i=1
N
k=1
r
i
{k}
ω
i
k
(4.7a)
s.t.
M
i=1
ω
i
k
≤ 1 ∀ k ∈ ¦1, . . . , N¦ (4.7b)
N
k=1
ω
i
k
≤ 1 ∀ i ∈ ¦1, . . . , M¦ (4.7c)
ω
i
k
∈ ¦0, 1¦ ∀ i, k (4.7d)
The binary indicator variable ω
i
k
assumes the value of one if object i is observed in time
slot k and zero otherwise. The integer program may be interpreted as follows:
• The objective in Eq. (4.7a) is the sum of the rewards corresponding to each ω
i
k
that assumes a nonzero value, i.e., the sum of the rewards of each choice of a
particular object to observe in a particular time slot.
• The constraint in Eq. (4.7b) requires that at most one object can be observed in
any time slot. This ensures that physical sensing constraints are not exceeded.
• The constraint in Eq. (4.7c) requires that each object can be observed at most
once. This ensures that the additive reward objective provides the exact reward
value (the reward for selecting two observations of the same object is not the sum
of the rewards of each individual observation).
• The integrality constraint in Eq. (4.7d) requires solutions to take on binary values.
Because of the structure of the assignment problem, this can be relaxed to allow
any ω
i
k
∈ [0, 1], and there will still be an integer point which attains the optimal
solution.
The assignment problem can be solved eﬃcient using algorithms such as Munkres
[72], JonkerVolgenantCasta˜ n´on [24], or Bertsekas’ auction [11]. We use the auction
algorithm in our experiments.
0
2
4
6
8
10
0
20
40
0
0.5
1
1.5
Time step
Illustration of reward trajectories and resulting assignment
Object number
R
e
w
a
r
d
Figure 4.1. Example of operation of assignment formulation. Each “strip” in the diagram
corresponds to the reward for observing a particular object at diﬀerent times over the 10
step planning horizon (assuming that it is only observed once within the horizon). The
role of the auction algorithm is to pick one unique object to observe at each time in the
planning horizon in order to maximize the sum of the rewards gained. The optimal solution
is shown as black dots.
116 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
One can gain an intuition for this formulation from the diagram in Fig. 4.1. The
rewards correspond to a snapshot of the scenario discussed in Section 4.1.3. The sce
nario considers the problem of object tracking when probability of detection varies with
position. The diagram illustrates the tradeoﬀ which the auction algorithm performs:
rather than taking the highest reward observation at the ﬁrst time (as the greedy heuris
tic would do), the controller defers measurement of that object until a later time when
a more valuable observation is available. Instead, it measures at the ﬁrst time an ob
ject with comparatively lower reward value, but one for which the observations at later
times are still less valuable.
4.1.3 Example
The approach was tested on a tracking scenario in which a single sensor is used to
simultaneously track 20 objects. The state of object i at time k, x
i
k
, consists of position
and velocity in two dimensions. The state evolves according to a linear Gaussian model:
x
i
k+1
= Fx
i
k
+w
i
k
(4.8)
where w
i
k
∼ ^¦w
i
k
; 0, Q¦ is a white Gaussian noise process. F and Q are set as:
F =
_
¸
¸
¸
¸
_
1 T 0 0
0 1 0 0
0 0 1 T
0 0 0 1
_
¸
¸
¸
¸
_
; Q = q
_
¸
¸
¸
¸
_
T
3
3
T
2
2
0 0
T
2
2
T 0 0
0 0
T
3
3
T
2
2
0 0
T
2
2
T
_
¸
¸
¸
¸
_
(4.9)
The diﬀusion strength q is set to 0.01. The sensor can be used to observe any one of the
M objects in each time step. The measurement obtained from observing object u
k
with
the sensor consists of a detection ﬂag d
u
k
k
∈ ¦0, 1¦ and, if d
u
k
k
= 1, a linear Gaussian
measurement of the position, z
u
k
k
:
z
u
k
k
= Hx
u
k
k
+v
u
k
k
(4.10)
where v
u
k
k
∼ ^¦v
u
k
k
; 0, R¦ is a white Gaussian noise process, independent of w
u
k
k
. H
and R are set as:
H =
_
1 0 0 0
0 0 1 0
_
; R =
_
5 0
0 5
_
(4.11)
The probability of detection P
d
u
k
k
x
u
k
k
(1[x
u
k
k
) is a function of object position. The
function is randomly generated for each Monte Carlo simulation; an example of the
x position
y
p
o
s
i
t
i
o
n
Probability of detection
200 400 600
100
200
300
400
500
600
700 0
0.2
0.4
0.6
0.8
1
Figure 4.2. Example of randomly generated detection map. The color intensity indicates
the probability of detection at each x and y position in the region.
0 5 10 15 20
0
20
40
60
80
100
Planning horizon
T
o
t
a
l
r
e
w
a
r
d
Performance for 20 objects over 200 simulations
Figure 4.3. Performance tracking M = 20 objects. Performance is measured as the
average (over the 200 simulations) total change in entropy due to incorporating chosen
measurements over all time. The point with a planning horizon of zero corresponds to
observing objects sequentially; with a planning horizon of one the auctionbased method
is equivalent to greedy selection. Error bars indicate 1σ conﬁdence bounds for the estimate
of average total reward.
Sec. 4.2. Integer programming generalization 119
function is illustrated in Fig. 4.2. The function may be viewed as an obscuration map,
e.g. due to foliage. Estimation is performed using the Gaussian particle ﬁlter [45].
The performance over 200 Monte Carlo runs is illustrated in Fig. 4.3. The point
with a planning horizon of zero corresponds to a raster, in which objects are observed
sequentially. With a planning horizon of one, the auctionbased algorithm corresponds
to greedy selection. The performance is measured as the average (over the 200 sim
ulations) total change in entropy due to incorporating chosen measurements over all
time. The diagram demonstrates that, with the right choice of planning horizon, the
assignment formulation is able to improve performance over the greedy method. The
reduction in performance for longer planning horizons is a consequence of the restriction
to observe each object at most once in the horizon. If the planning horizon is on the
order of the number of objects, we are then, in eﬀect, enforcing that each object must
be observed once. As illustrated in Fig. 4.1, in this scenario there will often be objects
receiving low reward values throughout the planning interval, hence by forcing the con
troller to observe each object, we are forcing it to (at some stage) take observations
of little value. It is not surprising that the increase in performance above the greedy
heuristic is small, since the performance guarantees discussed in the previous chapter
apply to this scenario.
These limitations motivate the generalization explored in the following section,
which allows us to admit multiple observations of each object, as well as observations
that require diﬀerent durations to complete (note that the performance guarantees of
Chapter 3 require all observations to consume a single time slot, hence they are not
applicable to this wider class of problems).
4.2 Integer programming generalization
An abstraction of the previous analysis replaces the discrete time slots ¦1, . . . , N¦ with
a set of available resources, ¹ (assumed ﬁnite), the elements of which may correspond
to the use of a particular sensor over a particular interval of time. As in the previous
section, each element of ¹ can be assigned to at most one task. Unlike the previous
section and the previous chapter, the formulation in this section allows us to accom
modate observations which consume multiple resources (e.g., multiple time slots on the
same sensor, or the same time slot on multiple sensors). We also relax the constraint
that each object may be observed at most once, and utilize a more advanced integer
programming formulation to ﬁnd an eﬃcient solution.
120 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
4.2.1 Observation sets
Let 
i
= ¦u
i
1
, . . . , u
i
L
i
¦ be the set of elemental observation actions (assumed ﬁnite) that
may be used for object i, where each elemental observation u
i
j
corresponds to observing
object i using a particular mode of a particular sensor within a particular period of
time. An elemental action may occupy multiple resources; let t(u
i
j
) ⊆ ¹ be the subset
of resource indices consumed by the elemental observation action u
i
j
. Let S
i
⊆ 2
U
i
be
the collection of observation subsets which we allow for object i. This is assumed to
take the form of Eq. (4.12), (4.13) or (4.14); note that in each case it is an independence
system, though not necessarily a matroid (as deﬁned in Section 2.4), since observations
may consume diﬀerent resource quantities. If we do not limit the sensing resources that
are allowed to be used for object i, the collection will consist of all subsets of 
i
for
which no two elements consume the same resource:
S
i
= ¦/ ⊆ 
i
[t(u
1
)∩t(u
2
) = ∅ ∀ u
1
, u
2
∈ /¦ (4.12)
Alternatively we may limit the total number of elemental observations allowed to be
taken for object i to k
i
:
S
i
= ¦/ ⊆ 
i
[t(u
1
)∩t(u
2
) = ∅ ∀ u
1
, u
2
∈ /, [/[ ≤ k
i
¦ (4.13)
or limit the total quantity of resources allowed to be consumed for object i to R
i
:
S
i
=
_
/ ⊆ 
i
¸
¸
¸
¸
¸
t(u
1
)∩t(u
2
) = ∅ ∀ u
1
, u
2
∈ /,
u∈A
[t(u)[ ≤ R
i
_
(4.14)
We denote by t(/) ⊆ ¹ the set of resources consumed by the actions in set /, i.e.,
t(/) =
_
u∈A
t(u)
The problem that we seek to solve is that of selecting the set of observation actions
for each object such that the total reward is maximized subject to the constraint that
each resource can be used at most once.
4.2.2 Integer programming formulation
The optimization problem that we seek to solve is reminiscent of the assignment problem
in Eq. (4.7), except that now we are assigning to each object a set of observations, rather
Sec. 4.2. Integer programming generalization 121
than a single observation:
max
ω
i
A
i
M
i=1
A
i
∈S
i
r
i
A
i
ω
i
A
i
(4.15a)
s.t.
M
i=1
A
i
∈S
i
t∈t(A
i
)
ω
i
A
i
≤ 1 ∀ t ∈ ¹ (4.15b)
A
i
∈S
i
ω
i
A
i
= 1 ∀ i ∈ ¦1, . . . , M¦ (4.15c)
ω
i
A
i
∈ ¦0, 1¦ ∀ i, /
i
∈ S
i
(4.15d)
Again, the binary indicator variables ω
i
A
i
are 1 if the observation set /
i
is chosen and
0 otherwise. The interpretation of each line of the integer program follows.
• The objective in Eq. (4.15a) is the sum of the rewards of the subset selected for
each object i (i.e., the subsets for which ω
i
A
i
= 1).
• The constraints in Eq. (4.15b) ensure that each resource (e.g., sensor time slot)
is used at most once.
• The constraints in Eq. (4.15c) ensure that exactly one observation set is chosen
for any given object. This is necessary to ensure that the additive objective is the
exact reward of corresponding selection (since, in general, r
i
A∪B
,= r
i
A
+r
i
B
). Note
that the constraint does not force us to take an observation of any object, since
the empty observation set is allowed (∅ ∈ S
i
) for each object i.
• The integrality constraints in Eq. (4.15d) ensure that the selection variables take
on the values zero (not selected) or one (selected).
Unlike the formulation in Eq. (4.7), the integrality constraints in Eq. (4.15d) can
not be relaxed. The problem is not a pure assignment problem, as the observation
subsets /
i
∈ S
i
consume multiple resources and hence appear in more than one of
the constraints deﬁned by Eq. (4.15b). The problem is a bundle assignment problem,
and conceptually could be addressed using combinatorial auction methods (e.g., [79]).
However, generally this would require computation of r
i
A
i
for every subset /
i
∈ S
i
. If
the collections of observation sets S
i
, i ∈ ¦1, . . . , M¦ allow for several observations to
be taken of the same object, the number of subsets may be combinatorially large.
122 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
Our approach exploits submodularity to solve a sequence of integer programs, each
of which represents the subsets available through a compact representation. The solu
tion of the integer program in each iteration provides an upper bound to the optimal
reward, which becomes increasingly tight as iterations progress. The case in which
all observations are equivalent, such that the only decision is to determine how many
observations to select for each object, could be addressed using [5]. In this thesis, we
address the general case which arises when observations are heterogeneous and require
diﬀerent subsets of resources (e.g., diﬀerent time durations). The complexity associated
with evaluating the reward of each of an exponentially large collection of observation
sets can be ameliorated using a constraint generation approach, as described in the
following section.
4.3 Constraint generation approach
The previous section described a formulation which conceptually could be used to ﬁnd
the optimal observation selection, but the computational complexity of the formulation
precludes its utility. This section details an algorithm that can be used to eﬃciently
solve the integer program in many practical situations. The method proceeds by se
quentially solving a series of integer programs with progressively greater complexity. In
the limit, we arrive at the full complexity of the integer program in Eq. (4.15), but in
many practical situations it is possible to terminate much sooner with an optimal (or
nearoptimal) solution.
The formulation may be conceptually understood as dividing the collection of sub
sets for each object (S
i
) at iteration l into two collections: T
i
l
⊆ S
i
and the remainder
S
i
¸T
i
l
. The subsets in T
i
l
are those for which the exact reward has been evaluated;
we will refer to these as candidate subsets.
Deﬁnition 4.1 (candidate subset). The collection of candidate subsets, T
i
l
⊆ S
i
,
is the collection subsets of observations for object i for which the exact reward has
been evaluated prior to iteration l. New subsets are added to the collection at each
iteration (and their rewards calculated), so that T
i
l
⊆ T
i
l+1
for all l. We commence
with T
i
l
= ¦∅¦.
The reward of each of the remaining subsets (i.e., those in S
i
¸T
i
l
) has not been
evaluated, but an upper bound to each reward is available. In practice, we will not
explicitly enumerate the elements in S
i
¸T
i
l
; rather we use a compact representation
which obtains upper bounds through submodularity (details of this will be given later
Sec. 4.3. Constraint generation approach 123
in Lemma 4.3).
In each iteration of the algorithm we solve an integer program, the solution of which
selects a subset for each object, ensuring that the resource constraints (e.g., Eq. (4.15b))
are satisﬁed. If the subset that the integer program selects for each object i is in T
i
l
—
i.e., it is a subset which had been generated and for which the exact reward had been
evaluated in a previous iteration—then we have found an optimal solution to the original
problem, i.e., Eq. (4.15). Conversely, if the integer program selects a subset in S
i
¸T
i
l
for one or more objects, then we need to tighten the upper bounds on the rewards of
those subsets; one way of doing this is to add the newly selected subsets to T
i
l
and
evaluate their exact rewards. Each iteration of the optimization reconsiders all decision
variables, allowing the solution from the previous iteration to be augmented or reversed
in any way.
The compact representation of S
i
¸T
i
l
associates with each candidate subset, /
i
∈
T
i
l
, a subset of observation actions, B
i
l,A
i
; /
i
may be augmented with any subset of
B
i
l,A
i
to generate new subsets that are not in T
i
l
(but that are in S
i
). We refer to B
i
l,A
i
as an exploration subset, since it provides a mechanism for discovering promising new
subsets that should be incorporated into T
i
l+1
.
Deﬁnition 4.2 (exploration subset). With each candidate subset /
i
∈ T
i
l
we associate
an exploration subset B
i
l,A
i
⊆ 
i
. The candidate subset /
i
may be augmented with any
subset of elemental observations from B
i
l,A
i
(subject to resource constraints) to generate
subsets in S
i
¸T
i
l
.
The solution of the integer program at each iteration l is a choice of one candidate
subset, /
i
∈ T
i
l
, for each object i, and a subset of elements of the corresponding
exploration subset, (
i
⊆ B
i
l,A
i
. The subset of observations selected by the integer
program for object i is the union of these, /
i
∪ (
i
.
Deﬁnition 4.3 (selection). The integer program at each iteration l selects a subset
of observations, T
i
∈ S
i
, for each object i. The selected subset, T
i
, is indicated
indirectly through a choice of one candidate subset, /
i
, and a subset of elements from
the corresponding exploration subset, (
i
⊆ B
i
l,A
i
(possibly empty), such that T
i
= /
i
∪(
i
.
The update algorithm (Algorithm 4.1) speciﬁes the way in which the collection of
candidate subsets T
i
l
and the exploration subsets B
i
l,A
i
are updated between iterations
using the selection results of the integer program. We will prove in Lemma 4.1 that
this update procedure ensures that there is exactly one way of selecting each subset
124 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
T
i
∈ S
i
, augmenting a choice of /
i
∈ T
i
l
with a subset of elements of the exploration
subset (
i
⊆ B
i
l,A
i
.
Deﬁnition 4.4 (update algorithm). The update algorithm takes the result of the integer
program at iteration l and determines the changes to make to T
i
l+1
(i.e., which new
candidate subsets to add), and B
i
l+1,A
i
for each /
i
∈ T
i
l+1
for iteration (l + 1).
As well as evaluating the reward of each candidate subset /
i
∈ T
i
l
, we also evaluate
the incremental reward of each element in the corresponding exploration subset, B
i
l,A
i
.
The incremental rewards are used to obtain upper bounds on the reward of observation
sets in S
i
¸T
i
l
generated using candidate subsets and exploration subsets.
Deﬁnition 4.5 (incremental reward). The incremental reward r
uA
i of an elemental
observation u ∈ B
i
l,A
i
given a candidate subset /
i
is the increase in reward for choosing
the single new observation u when the candidate subset /
i
is already chosen:
r
i
uA
i
= r
i
A
i
∪{u}
−r
i
A
i
4.3.1 Example
Consider a scenario involving two objects. There are three observations available for
object 1 (
1
= ¦a, b, c¦), and four observations for object 2 (
2
= ¦d, e, f, g¦). There
are four resources (¹ = ¦α, β, γ, δ¦); observations a, b, and c consume resources α, β
and γ respectively, while observations d, e, f and g consume resources α, β, γ and δ
respectively. The collection of possible subsets for object i is S
i
= 2
U
i
.
The subsets involved in iteration l of the algorithm are illustrated in Fig. 4.4. The
candidate subsets shown in the circles in the diagram and the corresponding exploration
subsets shown in the rectangles attached to the circles are the result of previous itera
tions of the algorithm. The exact reward of each candidate subset has been evaluated,
as has the incremental reward of each element of the corresponding exploration subset.
The sets are constructed such that there is a unique way of selecting any subset of
observations in S
i
.
The integer program at iteration l can choose any candidate subset /
i
∈ T
i
l
, aug
mented by any subset of the corresponding exploration subset. For example, for object
1, we could select the subset ¦c¦ by selecting the candidate subset ∅ with the exploration
subset element ¦c¦, and for object 2, we could select the subset ¦d, e, g¦ by choosing
the candidate subset ¦g¦ with the exploration subset elements ¦d, e¦. The candidate
B
2
l,{g}
= {d, e, f} B
2
l,∅
= {d, e, f}
∅ {g}
Object 2
Candidate subsets (T
2
l
)
B
1
l,{a,b}
= {c} B
1
l,{a}
= {c} B
1
l,∅
= {b, c}
∅ {a} {a, b}
Object 1
Candidate subsets (T
1
l
)
Exploration subsets
Exploration subsets
Figure 4.4. Subsets available in iteration l of example scenario. The integer program may
select for each object any candidate subset in T
i
l
, illustrated by the circles, augmented
by any subset of elements from the corresponding exploration subset, illustrated by the
rectangle connected to the circle. The sets are constructed such that there is a unique way
of selecting any subset of observations in S
i
. The subsets selected for each object must
collectively satisfy the resource constraints in order to be feasible. The shaded candidate
subsets and exploration subset elements denote the solution of the integer program at this
iteration.
126 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
subsets and exploration subsets are generated using an update algorithm which ensures
that there is exactly one way of selecting each subset; e.g., to select the set ¦a, c¦ for
object 1, we must choose candidate subset ¦a¦ and exploration subset element c; we
cannot select candidate subset ∅ with exploration subset elements ¦a, c¦ since a / ∈ B
1
l,∅
.
We now demonstrate the operation of the update algorithm that is described in
detail in Section 4.3.3. Suppose that the solution of the integer program at iteration
l selects subset ¦a¦ for object 1, and subset ¦e, f, g¦ for object 2 (i.e., the candidate
subsets and exploration subset elements that are shaded in Fig. 4.4). Since ¦a¦ ∈ T
1
l
the exact reward of the subset selected for object 1 has already been evaluated and we
do not need to modify the candidate subsets for object 1, so we simply set T
1
l+1
= T
1
l
.
For object 2, we ﬁnd that ¦e, f, g¦ / ∈ T
2
l
, so an update is required. There are many
ways that this update could be performed. Our method (Algorithm 4.1) creates a new
candidate subset
˜
/ consisting of the candidate subset selected for the object (¦g¦),
augmented by the single element (out of the selected exploration subset elements) with
the highest incremental reward. Suppose in our case that the reward of the subset ¦e, g¦
is greater than the reward of ¦f, g¦; then
˜
/ = ¦e, g¦.
The subsets in iteration (l + 1) are illustrated in Fig. 4.5. The sets that were
modiﬁed in the update are shaded in the diagram. There remains a unique way of
selecting each subset of observations; e.g., the only way to select elements g and e
together (for object 2) is to select the new candidate subset ¦e, g¦, since element e
was removed from the exploration subset for candidate subset ¦g¦ (i.e., B
2
l+1,{g}
). The
procedure that assuring that this is always the case is part of the algorithm which we
describe in Section 4.3.3.
4.3.2 Formulation of the integer program in each iteration
The collection of candidate subsets at stage l of the solution is denoted by T
i
l
⊆ S
i
,
while the exploration subset corresponding to candidate subset /
i
∈ T
i
l
is denoted by
B
i
l,A
i
⊆ 
i
. To initialize the problem, we select T
i
0
= ¦∅¦, and B
i
0,∅
= 
i
for all i. The
B
2
l+1,{e,g}
= {d, f} B
2
l+1,{g}
= {d, f} B
2
l+1,∅
= {d, e, f}
∅ {g} {e, g}
Object 2
Candidate subsets (T
2
l+1
)
B
1
l+1,{a,b}
= {c} B
1
l+1,{a}
= {c} B
1
l+1,∅
= {b, c}
∅ {a} {a, b}
Object 1
Candidate subsets (T
1
l+1
)
Exploration subsets
Exploration subsets
Figure 4.5. Subsets available in iteration (l + 1) of example scenario. The subsets that
were modiﬁed in the update between iterations l and (l + 1) are shaded. There remains a
unique way of selecting each subset of observations; e.g., the only way to select elements
g and e together (for object 2) is to select the new candidate subset {e, g}, since element
e was removed from the exploration subset for candidate subset {g} (i.e., B
2
l+1,{g}
).
128 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
integer program that we solve at each stage is:
max
ω
i
A
i
, ω
i
uA
i
M
i=1
A
i
∈T
i
l
_
¸
_r
i
A
i
ω
i
A
i
+
u∈B
i
l,A
i
r
i
uA
i
ω
i
uA
i
_
¸
_ (4.16a)
s.t.
M
i=1
A
i
∈T
i
l
t∈t(A
i
)
ω
i
A
i
+
M
i=1
A
i
∈T
i
l
u∈B
i
l,A
i
t∈t(u)
ω
i
uA
i
≤ 1 ∀ t ∈ ¹ (4.16b)
A
i
∈T
i
ω
i
A
i
= 1 ∀ i ∈ ¦1, . . . , M¦ (4.16c)
u∈B
i
l,A
i
ω
i
uA
i
−[B
i
l,A
i
[ω
i
A
i
≤ 0 ∀ i, /
i
∈ T
i
l
(4.16d)
ω
i
A
i
∈ ¦0, 1¦ ∀ i, /
i
∈ T
i
(4.16e)
ω
i
uA
i
∈ ¦0, 1¦ ∀ i, /
i
∈ T
i
, u ∈ B
i
l,A
i
(4.16f)
If there is a cardinality constraint of the form of Eq. (4.13) on the maximum number of
elemental observations allowed to be used on any given object, we add the constraints:
A
i
∈T
i
l
_
¸
_[/
i
[ω
i
A
i
+
u∈B
i
l,A
i
ω
i
uA
i
_
¸
_ ≤ k
i
∀ i ∈ ¦1, . . . , M¦ (4.16g)
Alternatively, if there is a constraint of the form of Eq. (4.14) on the maximum number
of elements of the resource set ¹ allowed to be utilized on any given object, we add the
constraints:
A
i
∈T
i
l
_
¸
_t(/
i
)ω
i
A
i
+
u∈B
i
l,A
i
t(u)ω
i
uA
i
_
¸
_ ≤ R
i
∀ i ∈ ¦1, . . . , M¦ (4.16h)
The selection variable ω
i
A
i
indicates whether candidate subset /
i
∈ T
i
l
is chosen; the
constraint in Eq. (4.16c) guarantees that exactly one candidate subset is chosen for each
object. The selection variables ω
i
uA
i
indicate the elements of the exploration subset
corresponding to /
i
that are being used to augment the candidate subset. All selection
variables are either zero (not selected) or one (selected) due to the integrality constraints
in Eq. (4.16e) and Eq. (4.16f). In accordance with Deﬁnition 4.3, the solution of the
integer program selects for each object i a subset of observations:
Sec. 4.3. Constraint generation approach 129
Deﬁnition 4.6 (selection variables). The values of the selection variables, ω
i
A
i
, ω
i
uA
i
determine the subsets selected for each object i. The subset selected for object i is
T
i
= /
i
∪ (
i
where /
i
is the candidate subset for object i such that ω
i
A
i
= 1 and
(
i
= ¦u ∈ B
i
l,A
i
[ω
i
uA
i
= 1¦.
The objective and constraints in Eq. (4.16) may be interpreted as follows:
• The objective in Eq. (4.16a) is the sum of the reward for the candidate subset
selected for each object, plus the incremental rewards of any exploration subset
elements that are selected. As per Deﬁnition 4.5, the reward increment r
i
uA
i
=
(r
i
A
i
∪{u}
−r
i
A
i
) represents the additional reward for selecting the elemental action
u given that the candidate subset /
i
has been selected. We will see in Lemma 4.3
that, due to submodularity, the sum of the reward increments
u
r
i
uA
i
is an
upper bound for the additional reward obtained for selecting those elements given
that the candidate set /
i
has been selected.
1
• The constraint in Eq. (4.16b) dictates that each resource can only be used once,
either by a candidate subset or an exploration subset element; this is analogous
with Eq. (4.15b) in the original formulation. The ﬁrst term includes all candidate
subsets that consume resource t, while the second includes all exploration subset
elements that consume resource t.
• The constraint in Eq. (4.16c) speciﬁes that exactly one candidate subset should
be selected per object. At each solution stage, there will be a candidate subset
corresponding to taking no observations (/
i
= ∅), hence this does not force the
system to take an observation of any given object.
• The constraint in Eq. (4.16d) speciﬁes that exploration subset elements which
correspond to a given candidate subset can only be chosen if that candidate subset
is chosen.
• The integrality constraints in Eq. (4.16e) and Eq. (4.16f) require each variable to
be either zero (not selected) or one (selected).
Again, the integrality constraints cannot be relaxed—the problem is not an assign
ment problem since candidate subsets may consume multiple resources, and there are
side constraints (Eq. (4.16d)).
1
Actually, the reward for selecting a candidate subset and one exploration subset variable is a exact
reward value; the reward for selecting a candidate subset and two or more exploration subset variables
is an upper bound.
130 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
4.3.3 Iterative algorithm
Algorithm 4.1 describes the iterative manner in which the integer program in Eq. (4.16)
is applied:
T
i
0
= ∅ ∀ i; B
i
0,∅
= 
i
∀ i; l = 0 1
evaluate r
i
u∅
∀ i, u ∈ B
i
0,∅
2
solve problem in Eq. (4.16) 3
while ∃ i /
i
such that
u∈B
i
l,A
i
ω
i
uA
i
> 1 do
4
for i ∈ ¦1, . . . , M¦ do 5
let
˚
/
i
l
be the unique subset such that ω
i
˚
A
i
l
= 1
6
if
u∈B
i
l,
˚
A
i
l
ω
i
u
˚
A
i
l
≤ 1 then
7
T
i
l+1
= T
i
l
8
B
i
l+1,A
i
= B
i
l,A
i
∀ /
i
∈ T
i
l
9
else 10
let ˚u
i
l
= arg max
u∈B
i
l,
˚
A
i
l
r
i
u
˚
A
i
l
11
let
˜
/
i
l
=
˚
/
i
l
∪ ¦˚u
i
l
¦ 12
T
i
l+1
= T
i
l
∪ ¦
˜
/
i
l
¦ 13
B
i
l+1,
˜
A
i
l
= B
i
l,
˚
A
i
l
¸¦˚u
i
l
¦¸¦u ∈ B
i
l,
˚
A
i
l
[
˜
/
i
l
∪ ¦u¦ / ∈ S
i
¦
14
evaluate r
i
˜
A
i
l
= r
i
˚
A
i
l
+ r
i
˚u
˚
A
i
l
15
evaluate r
i
u
˜
A
i
l
∀ u ∈ B
i
l+1,
˜
A
i
l
16
B
i
l+1,
˚
A
i
l
= B
i
l,
˚
A
i
l
¸¦˚u
i
l
¦
17
B
i
l+1,A
i
= B
i
l,A
i
∀ /
i
∈ T
i
l
, /
i
,=
˚
/
i
l
18
end 19
end 20
l = l + 1 21
resolve problem in Eq. (4.16) 22
end 23
Algorithm 4.1: Constraint generation algorithm which iteratively utilizes
Eq. (4.16) to solve Eq. (4.15).
• In each iteration (l) of the algorithm, the integer program in Eq. (4.16) is resolved.
Sec. 4.3. Constraint generation approach 131
If no more than one exploration subset element is chosen for each object, then the
rewards of all subsets selected correspond to the exact values (as opposed to upper
bounds), the optimal solution has been found (as we will show in Theorem 4.1),
and the algorithm terminates; otherwise another iteration of the “while” loop
(line 4 of Algorithm 4.1) is executed.
• Each iteration of the “while” loop considers decisions corresponding to each object
(i) in turn: (the for loop in line 4)
– If no more than one exploration subset element is chosen for object i then
the reward for that object is an exact value rather than an upper bound, so
the collection of candidate subsets (T
i
l
) and the exploration subsets (B
i
l,A
i
)
remain unchanged for that object in the following iteration (lines 8–9).
– If more than one exploration subset element has been chosen for a given
object, then the reward obtained by the integer program for that object is
an upper bound to the exact reward (as we show in Lemma 4.3). Thus we
generate an additional candidate subset (
˜
/
i
l
) which augments the previously
chosen candidate subset (
˚
/
i
l
) with the exploration subset element with the
highest reward increment (˚u
i
l
) (lines 11–13). This greedy exploration is anal
ogous to the greedy heuristic discussed in Chapter 3. Here, rather than using
it to make greedy action choices (which would result in loss of optimality),
we use it to decide which portion of the action space to explore ﬁrst. As we
will see in Theorem 4.1, this scheme maintains a guarantee of optimality if
allowed to run to termination.
– Exploration subsets allow us to select any subset of observations for which the
exact reward has not yet been calculated, using an upper bound to the exact
reward. Obviously we want to preclude selection of a candidate subset along
with additional exploration subset elements to construct a subset for which
the exact reward has already been calculated; the updates of the exploration
subsets in lines 14 and 17 achieve this, as shown in Lemma 4.1.
4.3.4 Example
Suppose that there are three objects (numbered ¦1, 2, 3¦) and three resources (¹ =
¦α, β, γ¦), and that the reward of the various observation subsets are as shown in
Table 4.1. We commence with a single empty candidate subset for each object (T
i
0
=
Object Subset Resources consumed Reward
1 ∅ ∅ 0
1 ¦a¦ ¦α¦ 2
1 ¦b¦ ¦β¦ 2
1 ¦c¦ ¦γ¦ 2
1 ¦a, b¦ ¦α, β¦ 3
1 ¦a, c¦ ¦α, γ¦ 3
1 ¦b, c¦ ¦β, γ¦ 3
1 ¦a, b, c¦ ¦α, β, γ¦ 3.5
2 ∅ ∅ 0
2 ¦d¦ ¦β¦ 0.6
3 ∅ ∅ 0
3 ¦e¦ ¦γ¦ 0.8
Table 4.1. Observation subsets, resources consumed and rewards for each object in the
example shown in Fig. 4.6.
{a, b, c}
∅
Iteration 0
Candidate subsets (T
1
0
)
Exploration subsets
{b, c}
∅
Candidate subsets (T
1
1
)
{b, c}
{a}
{b, c}
∅
Candidate subsets (T
1
2
)
Exploration subsets
{c}
{a}
{c}
{a, b}
Solution:
ω
1
∅
= ω
1
a∅
= ω
1
b∅
= ω
1
c∅
= 1
ω
2
∅
= ω
3
∅
= 1
IP reward: 6; reward: 3.5
Exploration subsets
{c}
∅
Candidate subsets (T
1
3
)
Exploration subsets
{c}
{a}
{c}
{a, b}
{c}
{b}
Iteration 1
Iteration 2 Iteration 3
Solution:
ω
1
{a}
= ω
1
b{a}
= ω
1
c{a}
= 1
ω
2
∅
= ω
3
∅
= 1
IP reward: 4; reward: 3.5
Solution:
ω
1
∅
= ω
1
b∅
= ω
1
c∅
= 1
ω
2
∅
= ω
3
∅
= 1
IP reward: 4; reward: 3
Solution:
ω
1
{a,b}
= 1
ω
2
∅
= ω
3
∅
= ω
3
e∅
= 1
IP reward: 3.8; reward: 3.8
Figure 4.6. Four iterations of operations performed by Algorithm 4.1 on object 1 (ar
ranged in counterclockwise order, from the topleft). The circles in each iteration show
the candidate subsets, while the attached rectangles show the corresponding exploration
subsets. The shaded circles and rectangles in iterations 1, 2 and 3 denote the sets that
were updated prior to that iteration. The solution to the integer program in each iteration
is shown along with the reward in the integer program objective (“IP reward”), which is
an upper bound to the exact reward, and the exact reward of the integer program solution
(“reward”).
134 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
¦∅¦). The diagram in Fig. 4.6 illustrates the operation of the algorithm in this scenario:
• In iteration l = 0, the integer program selects the three observations for object 1
(i.e., choosing the candidate subset ∅, and the three exploration subset elements
¦a, b, c¦), and no observations for objects 2 and 3, yielding an upper bound for
the reward of 6 (the incremental reward of each exploration subset element from
the empty set is 2). The exact reward of this conﬁguration is 3.5. Selection of
these exploration subset elements indicates that subsets involving them should be
explored further, hence we create a new candidate subset
˜
/
1
0
= ¦a¦ (the element
with the highest reward increment, breaking ties arbitrarily).
• Now, in iteration l = 1, the incremental reward of observation b or c conditioned
on the candidate subset ¦a¦ is 1, and the optimal solution to the integer program
in iteration is still to select these three observations for object 1, but to do so it is
now necessary to select candidate subset ¦a¦ ∈ T
1
1
together with the exploration
subset elements ¦b, c¦. No observations are chosen for objects 2 and 3. The
upper bound to the reward provided by the integer program is now 4, which
is substantially closer to the exact reward of the conﬁguration (3.5). Again we
have two exploration subset elements selected (which is why the reward in the
integer program is not the exact reward), so we introduce a new candidate subset
˜
/
1
1
= ¦a, b¦.
• The incremental reward of observation c conditioned on the new candidate subset
¦a, b¦ is 0.5. The optimal solution to the integer program at iteration l = 2 is
then to select object 1 candidate subset ∅ and exploration subset elements ¦b, c¦,
and no observations for objects 2 and 3. The upper bound to the reward provided
by the integer program remains 4, but the exact reward of the conﬁguration is
reduced to 3 (note that the exact reward of the solution to the integer program
at each iteration is not monotonic). Once again there are two exploration subset
elements selected, so we introduce a new candidate subset
˜
/
1
2
= ¦b¦.
• The optimal solution to the integer program in iteration l = 3 is then the true
optimal conﬁguration, selecting observations a and b for object 1 (i.e., candidate
subset ¦a, b¦ and no exploration subset elements), no observations for object 2,
and observation e for object 3 (i.e., candidate subset ∅ and exploration subset
element ¦e¦). Since no more than one exploration subset element is chosen for
Sec. 4.3. Constraint generation approach 135
each object, the algorithm knows that it has found the optimal solution and thus
terminates.
4.3.5 Theoretical characteristics
We are now ready to prove a number of theoretical characteristics of our algorithm. Our
goal is to prove that the algorithm terminates in ﬁnite time with an optimal solution;
Theorem 4.1 establishes this result. Several intermediate results are obtained along the
way. Lemma 4.1 proves that there is a unique way of selecting each subset in S
i
in
each iteration, while Lemma 4.2 proves that no subset that is not in S
i
can be selected;
together, these two results establish that the collection of subsets from which we may
select remains the same (i.e., o
i
) in every iteration. Lemma 4.3 establishes that the
reward in the integer program of any selection is an upper bound to the exact reward of
that selection, and that the upper bound is monotonically nonincreasing with iteration.
Lemma 4.4 establishes that the reward in the integer program of the solution obtained
upon termination of Algorithm 4.1 is an exact value (rather than an upper bound).
This leads directly to the ﬁnal result in Theorem 4.1, that the algorithm terminates
with an optimal solution.
Before we commence, we prove a simple proposition that allows us to represent
selection of a feasible observation subset in two diﬀerent ways.
Proposition 4.1. If ( ∈ S
i
, /
i
∈ T
i
l
and /
i
⊆ ( ⊆ /
i
∪B
i
l,A
i
then the conﬁguration:
ω
i
A
i
= 1
ω
uA
i =
_
_
_
1, u ∈ (¸/
i
0, otherwise
is feasible (provided that the required resources are not consumed on other objects),
selects subset ( for object i, and is the unique conﬁguration doing so that has ω
i
A
i
=
1. Conversely, if a feasible selection variable conﬁguration for object i selects (, and
ω
i
A
i
= 1, then /
i
⊆ ( ⊆ /
i
∪ B
i
A
i
.
Proof. Since ( ∈ S
i
, no two resources in ( utilize the same resource. Hence the
conﬁguration is feasible, assuming that the required resources are not consumed on
other objects. That the conﬁguration selects ( is immediate from Deﬁnition 4.6. From
Algorithm 4.1, it is clear that /
i
∩ B
i
l,A
i
= ∅ ∀ l, /
i
∈ T
i
l
(it is true for l = 0, and
136 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
every time a new candidate subset /
i
is created by augmenting an existing candidate
subset with a new element, the corresponding element is removed from B
i
l,A
i
). Hence
the conﬁguration given is the unique conﬁguration selecting for which ω
i
A
i
= 1. The
converse results directly from Deﬁnition 4.6. In each case, note that all other selection
variables for object i must be zero due to the constraints in Eq. (4.16c) and Eq. (4.16d).
The following lemma establishes that the updates in Algorithm 4.1 maintain a
unique way of selecting observation subset in S
i
in each iteration.
Lemma 4.1. In every iteration l, for any object i and any subset ( ∈ S
i
there is
exactly one conﬁguration of selection variables for the ith object (ω
i
A
i
, ω
i
uA
i
) that selects
observation subset (.
Proof. In the ﬁrst iteration (l = 0), T
i
0
= ¦∅¦, hence the only conﬁguration selecting
set ( is that obtained from Proposition 4.1 with /
i
= ∅.
Now suppose that the lemma holds for some l; we will show that it also holds for
(l +1). Given any ( ∈ S
i
, there exists a unique /
†
∈ T
i
l
such that /
†
⊆ ( ⊆ /
†
∪B
i
l,A
†
by the induction hypothesis and Proposition 4.1. If /
†
=
˚
/
i
l
and ˚u
i
l
∈ ( then
˜
/
i
l
⊆ ( ⊆
˜
/
i
l
∪ B
i
l+1,
˜
A
i
l
, and it is not the case that /
i
⊆ ( ⊆ /
i
∪ B
i
l+1,A
i
for any other /
i
∈ T
i
l+1
(it is not the case for
˚
/
i
l
since ˚u
i
l
/ ∈
˚
/
i
l
∪ B
i
l+1,
˚
A
i
l
, and it is not the case for any other
/
i
∈ T
i
l
since B
i
l+1,A
i
= B
i
l,A
i
and the condition did not hold for iteration l by the
induction hypothesis). Hence exactly one selection variable conﬁguration selects (.
If /
†
=
˚
/
i
l
and ˚u
i
l
/ ∈ ( then
˚
/
i
l
⊆ ( ⊆
˚
/
i
l
∪ B
i
l+1,
˚
A
i
l
, and it is not the case that
/
i
⊆ ( ⊆ /
i
∪ B
i
l+1,A
i
for any other /
i
∈ T
i
l+1
(it is not the case for
˜
/
i
l
since
˜
/
i
l
(,
and it is not the case for any other /
i
∈ T
i
l
since B
i
l+1,A
i
= B
i
l,A
i
and the condition
did not hold for iteration l by the induction hypothesis). Hence exactly one selection
variable conﬁguration selects (.
Finally, if /
†
,=
˚
/
i
l
then /
†
∈ T
i
l+1
and /
†
⊆ ( ⊆ /
†
∪ B
i
l+1,A
†
. Since it is not the
case that
˚
/
i
l
⊆ ( ⊆
˚
/
i
l
∪ B
i
l,
˚
A
i
l
, it will also not be the case that
˚
/
i
l
⊆ ( ⊆
˚
/
i
l
∪ B
i
l+1,
˚
A
i
l
or
˜
/
i
l
⊆ ( ⊆
˜
/
i
l
∪ B
i
l+1,
˜
A
i
l
, since each of these conditions represents a tightening of the
original condition, which was already false. It will also not be the case that /
i
⊆ ( ⊆
/
i
∪ B
i
l+1,A
i
for any other /
i
∈ T
i
l
, /
i
,= /
†
since B
i
l+1,A
i
= B
i
l,A
i
.
This shows that the lemma holds for (l + 1). By induction, it must then hold for
every iteration.
Sec. 4.3. Constraint generation approach 137
The following result establishes that only subsets in S
i
can be generated by the
integer program. Combined with the previous result, this establishes that the subsets
in S
i
and the conﬁgurations of the selection variables in the integer program for object
i are in a bijective relationship at every iteration.
Lemma 4.2. In every iteration l, any feasible selection of a subset for object i, T
i
, is
in S
i
.
Proof. Suppose, for contradiction, that there is a subset T
i
that is feasible for object
i (i.e., there exists a feasible conﬁguration of selection variables with ω
i
A
i
= 1 and
(
i
= ¦u ∈ B
i
l,A
i
[ω
i
l,A
i
= 1¦, such that T
i
= /
i
∪ (
i
), but that T
i
/ ∈ S
i
. Assume that
S
i
is of the form of Eq. (4.12). Since T
i
/ ∈ S
i
, there must be at least one resource in
T
i
that is used twice. This selection must then be infeasible due to the constraint in
Eq. (4.16b), yielding a contradiction. If S
i
is of the form of Eq. (4.13) or (4.14), then
a contradiction will be obtained from Eq. (4.16b), (4.16g) or (4.16h).
The following lemma uses submodularity to show that the reward in the integer
program for selecting any subset at any iteration is an upper bound to the exact reward
and, furthermore, that the bound tightens as more iterations are performed. This is a
key result for proving the optimality of the ﬁnal result when the algorithm terminates.
As with the analysis in Chapter 3, the result is derived in terms of mutual information,
but applies to any nondecreasing, submodular reward function.
Lemma 4.3. The reward associated with selecting observation subset ( for object i in
the integer program in Eq. (4.16) is an upper bound to the exact reward of selecting that
subset in every iteration. Furthermore, the reward for selecting subset ( for object i in
the integer program in iteration l
2
is less than or equal to the reward for selecting ( in
l
1
for any l
1
< l
2
.
Proof. Suppose that subset ( is selected for object i in iteration l. Let
˚
/
i
l
be the subset
such that ω
i
˚
A
i
l
= 1. Introducing an arbitrary ordering ¦u
1
, . . . , u
n
¦ of the elements of
B
i
l,
˚
A
i
l
for which ω
i
u
i

˚
A
i
l
= 1 (i.e., ( = /
i
l
∪ ¦u
1
, . . . , u
n
¦), the exact reward for selecting
138 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
observation subset ( is:
r
i
C
I(X
i
; z
i
C
)
(a)
= I(X
i
; z
i
˚
A
i
l
) +
n
j=1
I(X
i
; z
i
u
j
[
˚
/
i
l
, z
i
u
1
, . . . , z
i
u
j−1
)
(b)
≤ I(X
i
; z
i
˚
A
i
l
) +
n
j=1
I(X
i
; z
i
u
j
[z
i
˚
A
i
l
)
(c)
= r
i
˚
A
i
l
+
n
j=1
r
i
u
j

˚
A
i
l
where (a) is an application of the chain rule, (b) results from submodularity, and (c)
results from the deﬁnition of r
i
iA
i
and conditional mutual information. This establishes
the ﬁrst result, that the reward for selecting any observation set in any iteration of the
integer program is an upper bound to the exact reward of that set.
Now consider the change which occurs between iteration l and iteration (l + 1).
Suppose that the set updated for object i in iteration l,
˚
/
i
l
, is the unique set (by
Lemma 4.1) for which
˚
/
i
l
⊆ ( ⊆
˚
/
i
l
∪ B
i
l,
˚
A
i
l
, and that ˚u
i
l
∈ (. Assume without loss of
generality that the ordering ¦u
1
, . . . , u
n
¦ from the previous stage was chosen such that
u
n
= ˚u
i
l
. Then the reward for selecting subset ( at stage (l + 1) will be:
r
i
˜
A
i
l
+
n−1
j=1
(r
i
˜
A
i
l
∪{u
j
}
−r
i
˜
A
i
l
)
(a)
= I(X
i
; z
i
˜
A
i
l
) +
n−1
j=1
I(X
i
; z
i
u
j
[z
i
˜
A
i
l
)
(b)
= I(X
i
; z
i
˚
A
i
l
) + I(X
i
; z
i
˚u
i
l
[z
i
˚
A
i
l
)
+
n−1
j=1
I(X
i
; z
i
u
j
[z
i
˚
A
i
l
, z
i
un
)
(c)
≤ I(X
i
; z
i
˚
A
i
l
) +
n
j=1
I(X
i
; z
i
u
j
[z
i
˚
A
i
l
)
(d)
= r
i
˚
A
i
l
+
n
j=1
r
i
u
j

˚
A
i
l
where (a) and (d) result from the deﬁnition or r
i
A
and conditional mutual information,
(b) from the chain rule, and (c) from submodularity. The form in (d) is the reward for
selecting ( at iteration l.
If it is not true that
˚
/
i
l
⊆ ( ⊆
˚
/
i
l
∪ B
i
l,
˚
A
i
l
, or if ˚u
i
l
/ ∈ (, then the conﬁguration
Sec. 4.3. Constraint generation approach 139
selecting set ( in iteration (l + 1) will be identical to the conﬁguration in iteration l,
and the reward will be unchanged.
Hence we have found that the reward for selecting subset ( at iteration (l + 1) is
less than or equal than the reward for selecting ( at iteration l. By induction we then
obtain the second result.
At this point we have established that, at each iteration, we have available to us
a unique way of selecting each subset in same collection of subsets, o
i
; that in each
iteration the reward in the integer program for selecting any subset is an upper bound
to the exact reward of that subset; and that the upper bound becomes increasingly
tight as iterations proceed. These results directly provide the following corollary.
Corollary 4.1. The reward achieved in the integer program in Eq. (4.16) at each
iteration is an upper bound to the optimal reward, and is monotonically nonincreasing
with iteration.
The last result that we need before we prove the ﬁnal outcome is that the reward
in the integer program for each object in the terminal iteration of Algorithm 4.1 is the
exact reward of the selection chosen for that object.
Lemma 4.4. At termination, the reward in the integer program for the subset T
i
selected for object i is the exact reward of that subset.
Proof. In accordance with Deﬁnition 4.6, let /
i
be the subset such that ω
i
A
i
= 1, and
let (
i
= ¦u ∈ B
i
l,A
i
[ω
uA
i = 1¦, so that T
i
= /
i
∪(
i
. Since the algorithm terminated at
iteration l, we know that (from line 4 of Algorithm 4.1)
[(
i
[ =
u∈B
i
l,A
i
ω
i
uA
i
≤ 1
Hence either no exploration subset element is chosen, or one exploration subset element
is chosen. If no exploration subset element is chosen, then /
i
= T
i
, and the reward
obtained by the integer program is simply r
i
A
i
, the exact value. If an exploration subset
element u is chosen (i.e., (
i
= ¦u¦), then the reward obtained by the integer program
is
r
i
A
i
+ r
i
uA
i
= r
i
A
i
+ (r
i
A
i
∪{u}
−r
i
A
i
) = r
i
A
i
∪{u}
= r
i
D
i
which, again, is the exact value. Since the objects are independent, the exact overall
reward is the sum of the rewards of each object, which is the objective of the integer
program.
140 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
We now utilize these outcomes to prove our main result.
Theorem 4.1. Algorithm 4.1 terminates in ﬁnite time with an optimal solution.
Proof. To establish that the algorithm terminates, note that the number of observation
subsets in the collection S
i
is ﬁnite for each object i. In every iteration, we add a new
subset /
i
∈ S
i
into the collection of candidate subsets T
i
l
for some object i. If no such
subset exists, the algorithm must terminate, hence ﬁnite termination is guaranteed.
Since the reward in the integer program for the subset selected for each object is the
exact reward for that set (by Lemma 4.4), and the rewards for all other other subsets
are upper bounds to the exact rewards (by Lemma 4.3), the solution upon termination
is optimal.
4.3.6 Early termination
Note that, while the algorithm is guaranteed to terminate ﬁnitely, the complexity may
be combinatorially large. It may be necessary in some situations to evaluate the reward
of every observation subset in S
i
for some objects. At each iteration, the reward
obtained by the integer program is an upper bound to the optimal reward, hence if we
evaluate the exact reward of the solution obtained at each iteration, we can obtain a
guarantee on how far our existing solutions are from optimality, and decide whether
to continue processing or terminate with a nearoptimal solution. However, while the
reward in the integer program is a monotonically nonincreasing function of iteration
number, the exact reward of the subset selected by the integer program in each iteration
may increase or decrease as iterations progress. This was observed in the the example
in Section 4.3.4: the exact reward of the optimal solution in iteration l = 1 was 3.5,
while the exact reward of the optimal solution in iteration l = 2 was 3.
It may also be desirable at some stage to terminate the iterative algorithm and
select the best observation subset amongst the subsets for which the reward has been
evaluated. This can be achieved through a ﬁnal execution of the integer program in
Eq. (4.16), adding the following constraint:
A
i
∈T
i
l
u∈B
i
l,A
i
ω
i
uA
i
≤ 1 ∀ i ∈ ¦1, . . . , M¦ (4.17)
Since we can select no more than one exploration subset element for each object, the
reward given for any feasible selection will be the exact reward (as shown in Lemma 4.4).
Sec. 4.4. Computational experiments 141
The reward that can be obtained by this augmented integer program is a nondecreasing
function of iteration number, since the addition of new candidate subsets yields a wider
range of subsets that can be selected without using multiple exploration actions.
4.4 Computational experiments
The experiments in the following sections illustrate the utility of our method. We com
mence in Section 4.4.1 by describing our implementation of the algorithm. Section 4.4.2
examines a scenario involving surveillance of multiple objects using two moving radar
platforms. Section 4.4.3 extends this example to incorporate additional nonstationarity
(through observation noise which increases when objects are closely spaced), as well as
observations consuming diﬀerent numbers of time slots. The scenario discussed in Sec
tion 4.4.4 was constructed to demonstrate the increase in performance that is possible
due to additional planning when observations all consume a single time slot (so that
the guarantees of Chapter 3 apply). Finally, the scenario discussed in Section 4.4.5
was constructed to demonstrate the increase in performance which is possible due to
additional planning when observations consume diﬀerent numbers of time slots.
4.4.1 Implementation notes
The implementation utilized in the experiments in this section was written in C++,
solving the integer programs using ILOG
CPLEX
10.1 through the callable library
interface. In each iteration of Algorithm 4.1, we solve the integer program, terminating
when we ﬁnd a solution that is guaranteed to be within 98% of optimality, or after
15 seconds of computation time have elapsed. Before commencing the next iteration,
we solve the augmented integer program described in Section 4.3.6, again terminating
when we ﬁnd a solution that is guaranteed to be within 98% of optimality or after 15
seconds. Termination occurs when the solution of the augmented integer program is
guaranteed to be within 95% of optimality,
2
or after 300 seconds have passed. The
present implementation of the planning algorithm is limited to addressing linear Gaus
sian models. We emphasize that this is not a limitation of the planning algorithm; the
assumption merely simpliﬁes the computations involved in reward function evaluations.
Unless otherwise stated, all experiments utilized an open loop feedback control strat
egy (as discussed in Section 2.2.2). Under this scheme, at each time step a plan was
constructed for the next Nstep planning horizon, the ﬁrst step was executed, the re
2
The guarantee is obtained by accessing the upper bound found by CPLEX
to the optimal reward
of the integer program in Algorithm 4.1.
142 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
sulting observations were incorporated, and then a new plan was constructed for the
following Nstep horizon.
The computation times presented in the following sections include only the com
putation time involved in the solution of the integer program in Algorithm 4.1. The
computation time expended in the augmented integer program of Section 4.3.6 are
excluded as the results of these computations are not used as inputs to the following it
erations. As such, the augmented integer program could easily be executed on a second
processor core in a modern parallel processing architecture. All times were measured
using a 3.06 GHz Intel
Xeon
processor.
4.4.2 Waveform selection
Our ﬁrst example models surveillance of multiple objects by two moving radar plat
forms. The platforms move along ﬁxed racetrack patterns, as shown in Fig. 4.7. We
denote by y
i
k
the state (i.e., position and velocity) of platform i at time k. There are
M objects under track, the states of which evolve according to the nominally constant
velocity model described in Eq. (2.8), with ∆t = 0.03 sec and q = 1. The simulation
length is 200 steps; the sensor platforms complete 1.7 revolutions of the racetrack pat
tern in Fig. 4.7 in this time.
3
The initial positions of objects are distributed uniformly
in the region [10, 100] [10, 100]; the initial velocities in each direction are drawn from
a Gaussian distribution with mean zero and standard deviation 0.25. The initial esti
mates are set to the true state, corrupted by additive Gaussian noise with zero mean
and standard deviation 1 (in position states) and 0.1 (in velocity states).
In each time slot, each sensor may observe one of the M objects, obtaining either
an azimuth and range observation, or an azimuth and range rate observation, each of
which occupies a single time slot:
z
i,j,r
k
=
_
¸
_
tan
−1
_
[x
i
k
−y
j
k
]
3
[x
i
k
−y
j
k
]
1
_
_
([x
i
k
−y
j
k
]
1
)
2
+ ([x
i
k
−y
j
k
]
3
)
2
_
¸
_ +
_
b(x
i
k
, y
j
k
) 0
0 1
_
v
i,j,r
k
(4.18)
z
i,j,d
k
=
_
¸
¸
_
tan
−1
_
[x
i
k
−y
j
k
]
3
[x
i
k
−y
j
k
]
1
_
[x
i
k
−y
j
k
]
1
[x
i
k
−y
j
k
]
2
+[x
i
k
−y
j
k
]
3
[x
i
k
−y
j
k
]
4
q
([x
i
k
−y
j
k
]
1
)
2
+([x
i
k
−y
j
k
]
3
)
2
_
¸
¸
_
+
_
b(x
i
k
, y
j
k
) 0
0 1
_
v
i,j,d
k
(4.19)
where z
i,j,r
k
denotes the azimuth/range observation for object i using sensor j at time k,
3
The movement of the sensor is accentuated in order to create some degree of nonstationarity in
the sensing model.
0 20 40 60 80 100
0
20
40
60
80
100
Sensor paths and example object positions
Figure 4.7. The two radar sensor platforms move along the racetrack patterns shown
by the solid lines; the position of the two platforms in the tenth time slot is shown by the
‘*’ marks. The sensor platforms complete 1.7 revolutions of the pattern in the 200 time
slots in the simulation. M objects are positioned randomly within the [10, 100] ×[10, 100]
according to a uniform distribution, as illustrated by the ‘’ marks.
144 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
and z
i,j,d
k
denotes the azimuth/range rate (i.e., Doppler) observation. The notation [a]
l
denotes the lth element of the vector a; the ﬁrst and third elements of the object state
x
i
k
and the sensor state y
j
k
contain the position in the xaxis and yaxis respectively,
while the second and fourth elements contain the velocity in the xaxis and yaxis
respectively.
The observation noise in each observation is independent of all other observations,
and distributed according to a Gaussian PDF:
p(v
i,j,r
k
) = ^
_
v
i,j,r
k
; 0,
_
σ
2
φ
0
0 σ
2
r
__
p(v
i,j,d
k
) = ^
_
v
i,j,d
k
; 0,
_
σ
2
φ
0
0 σ
2
d
__
The standard deviation of the noise on the azimuth observations (σ
φ
) is 3
◦
; the mul
tiplier function b(x
i
k
, y
j
k
) varies from unity on the broadside (i.e., when the sensor
platform heading is perpendicular to the vector from the sensor to the object) to 3
1
3
endon. The standard deviation of the range observation (σ
r
) is 0.05 units, while the
standard deviation of the range rate observation (σ
d
) is 0.03 units/sec. In practice
the tracking accuracy is high enough that the variation in azimuth noise due to the
uncertainty in object position is small, so we simply use the object position estimate
to calculate an estimate of the noise variance. A linearized Kalman ﬁlter is utilized in
order to evaluate rewards for the planning algorithm, while an extended Kalman ﬁlter
is used for estimation.
Results for 16 Monte Carlo simulations with planning horizons of between one and
30 time slots per sensor are shown in Fig. 4.8. The top diagram shows the results for
50 objects, while the middle diagram shows the results for 80 objects. The diagrams
show that there is a small increase in performance as the planning horizon increases,
although the increase is limited to around 1–3% over that obtained when the planning
horizon is limited to a single time slot. With a planning horizon of one, planning is
conducted jointly over the two sensors; the computation cost of doing so through brute
force enumeration is equivalent to that of planning for a single sensor over two time
slots.
The small increase in performance is not unexpected given that there is little non
stationarity in the problem. Intuitively one would expect a relatively short planning
horizon to be adequate; Fig. 4.8 provides a conﬁrmation of this intuition using plan
ning horizons that are intractable using previous methods. The example also demon
strates the computational complexity of the method, as shown in the bottom diagram in
0 5 10 15 20 25 30
1
1.01
1.02
1.03
Horizon length (time slots per sensor)
R
e
l
a
t
i
v
e
g
a
i
n
Performance in 16 Monte Carlo simulations of 50 objects
0 5 10 15 20 25 30
1
1.01
1.02
1.03
Horizon length (time slots per sensor)
R
e
l
a
t
i
v
e
g
a
i
n
Performance in 16 Monte Carlo simulations of 80 objects
0 5 10 15 20 25 30
10
−3
10
−2
10
−1
10
0
10
1
10
2
Horizon length (time slots per sensor)
Average computation time to produce plan
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
50 objects
80 objects
Figure 4.8. Results of Monte Carlo simulations for planning horizons between one and
30 time slots (in each sensor). Top diagram shows results for 50 objects, while middle
diagram shows results for 80 objects. Each trace in the plots shows the total reward (i.e.,
the sum of the MI reductions in each time step) of a single Monte Carlo simulation for
diﬀerent planning horizon lengths divided by the total reward with the planning horizon
set to a single time step, giving an indication of the improvement due to additional plan
ning. Bottom diagram shows the computation complexity (measured through the average
number of seconds to produce a plan for the planning horizon) versus the planning horizon
length.
146 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
Fig. 4.8. The diagram shows that, as expected, the complexity increases exponentially
with the planning horizon length. However, using the algorithm it is possible to produce
a plan for 50 objects over 20 time slots each in two sensors using little over one second
in computation time. Performing the same planning through full enumeration would
involve evaluation of the reward of 10
80
diﬀerent candidate sequences, a computation
which is intractable on any foreseeable computational hardware.
The computational complexity for problems involving diﬀerent numbers of objects
for a ﬁxed planning horizon length (10 time slots per sensor) is shown in Fig. 4.9.
When the planning horizon is signiﬁcantly longer than the number of objects, it be
comes necessary to construct plans involving several observations of each object. This
will generally involve enumeration of an exponentially increasing number of candidate
subsets in S
i
for each object, resulting in an increase in computational complexity.
As the number of objects increases, it quickly becomes clear that it is better to spread
the available resources evenly across objects rather than taking many observations of
a small number of objects, so the number of candidate subsets in S
i
requiring consid
eration is vastly lower. Eventually, as the number of objects increases, the overhead
induced by the additional objects again increases the computational complexity.
4.4.3 State dependent observation noise
The second scenario involves a modiﬁcation of the ﬁrst in which observation noise
increases when objects become close to each other. This is a surrogate for the impact
of data association, although we do not model the dependency between objects which
generally results. The dynamical model has ∆t = 0.01 sec, and q = 0.25; the simulation
runs for 100 time slots. As per the previous scenario, the initial positions of the objects
are distributed uniformly on the region [10, 100] [10, 100]; velocity magnitudes are
drawn from a Gaussian distribution with mean 30 and standard deviation 0.5, while
the velocity directions are distributed uniformly on [0, 2π]. The initial estimates are set
to the true state, corrupted by additive Gaussian noise with zero mean and standard
deviation 0.02 (in position states) and 0.1 (in velocity states). The scenario involves
a single sensor rather than two sensors; the observation model is essentially the same
as the previous case, except that there is a statedependent scalar multiplier on the
10 20 30 40 50 60 70 80 90
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
Number of objects
Computation time vs number of objects (two sensors, horizon length = 10)
Figure 4.9. Computational complexity (measured as the average number of seconds to
produce a plan for the 10step planning horizon) for diﬀerent numbers of objects.
148 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
observation noise:
z
i,j,r
k
=
_
¸
_
tan
−1
_
[x
i
k
−y
j
k
]
3
[x
i
k
−y
j
k
]
1
_
_
([x
i
k
−y
j
k
]
1
)
2
+ ([x
i
k
−y
j
k
]
3
)
2
_
¸
_ + d
i
(x
1
k
, . . . , x
M
k
)
_
b(x
i
k
, y
j
k
) 0
0 1
_
v
i,j,r
k
(4.20)
z
i,j,d
k
=
_
¸
¸
_
tan
−1
_
[x
i
k
−y
j
k
]
3
[x
i
k
−y
j
k
]
1
_
[x
i
k
−y
j
k
]
1
[x
i
k
−y
j
k
]
2
+[x
i
k
−y
j
k
]
3
[x
i
k
−y
j
k
]
4
q
([x
i
k
−y
j
k
]
1
)
2
+([x
i
k
−y
j
k
]
3
)
2
_
¸
¸
_
+ d
i
(x
1
k
, . . . , x
M
k
)
_
b(x
i
k
, y
j
k
) 0
0 1
_
v
i,j,d
k
(4.21)
The azimuth noise multiplier b(x
i
k
, y
j
k
) is the same as in Section 4.4.2, as is the azimuth
noise standard deviation (σ
φ
= 3
◦
). The standard deviation of the noise on the range
observation is σ
r
= 0.1, and on the range rate observation is σ
d
= 0.075. The function
d(x
1
k
, . . . , x
M
k
) captures the increase in observation noise when objects are close together:
d
i
(x
1
k
, . . . , x
M
k
) =
j=i
δ
_
_
([x
i
k
−x
j
k
]
1
)
2
+ ([x
i
k
−x
j
k
]
3
)
2
_
where δ(x) is the piecewise linear function:
δ(x) =
_
_
_
10 −x, 0 ≤ x < 10
0, x ≥ 10
The state dependent noise is handled in a manner similar to the optimal linear estimator
for bilinear systems, in which we estimate the variance of the observation noise, and then
use this in a conventional linearized Kalman ﬁlter (for reward evaluations for planning)
and extended Kalman ﬁlter (for estimation). We draw a number of samples of the
joint state of all objects, and evaluate the function d(x
1
k
, . . . , x
M
k
) for each. We then
estimate the noise multiplier as being the 90% percentile point of these evaluations,
i.e., the smallest value such that 90% of the samples evaluate to a lower value. This
procedure provides a pessimistic estimate of the noise amplitude.
In addition to the option of these two observations, the sensor can also choose a more
accurate observation that takes three time slots to complete, and is not subject increased
noise when objects become closely spaced. The azimuth noise for these observations in
the broadside aspect has σ
φ
= 0.6
◦
, while the range noise has σ
r
= 0.02 units, and the
range rate noise has σ
d
= 0.015 units/sec.
0 5 10 15 20
1
1.05
1.1
1.15
Horizon length (time slots)
R
e
l
a
t
i
v
e
g
a
i
n
Performance in 20 Monte Carlo simulations of 50 objects
0 5 10 15 20
10
−3
10
−2
10
−1
10
0
10
1
10
2
Horizon length (time slots)
Average computation time to produce plan
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
Figure 4.10. Top diagram shows the total reward for each planning horizon length divided
by the total reward for a single step planning horizon, averaged over 20 Monte Carlo
simulations. Error bars show the standard deviation of the mean performance estimate.
Lower diagram shows the average time required to produce plan for the diﬀerent length
planning horizon lengths.
150 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
The results of the simulation are shown in Fig. 4.10. When the planning hori
zon is less than three time steps, the controller does not have the option of the three
time step observation available to it. A moderate gain in performance is obtained by
extending the planning horizon from one time step to three time steps to enable use
of the longer observation. The increase is roughly doubled as the planning horizon is
increased, allowing the controller to anticipate periods when objects are unobservable.
The computational complexity is comparable with that of the previous case; the addi
tion of observations consuming multiple time slots does not increase computation time
signiﬁcantly.
4.4.4 Example of potential beneﬁt: single time slot observations
The third scenario demonstrates the gain in performance which is possible by planning
over long horizon length on problems to which the guarantees of Chapter 3 apply (i.e.,
when all observations occupy a single time slot). The scenario involves M = 50 objects
being tracked using a single sensor over 50 time slots. The object states evolve according
to the nominally constant velocity model described in Eq. (2.8), with ∆t = 10
−4
sec
and q = 1. The initial position and velocity of the objects is identical to the scenario
in Section 4.4.3. The initial state estimates of the ﬁrst 25 objects are corrupted by
Gaussian noise with covariance I (i.e., the 4 4 identity matrix), while the estimates
of the remaining objects is corrupted by Gaussian noise with covariance 1.1I.
In each time slot, any one of the M objects can be observed; each observation has
a linear Gaussian model (i.e., Eq. (2.11)) with H
i
k
= I. The observation noise for the
ﬁrst 25 objects has covariance R
i
k
= 10
−6
I in the ﬁrst 25 time slots, and R
i
k
= I in the
remaining time slots. The observation noise of the remaining objects has covariance
R
i
k
= 10
−6
I for all k. While this example represents an extreme case, one can see
that similar events can commonly occur on a smaller scale in realistic scenarios; e.g.,
in the problem examined in the previous section, observation noise variances frequently
increased as objects became close together.
The controller constructs a plan for each Nstep planning horizon, and then executes
a single step before replanning for the following N steps. When the end of the current
Nstep planning horizon is the end of the scenario (e.g., in the eleventh time step when
N = 40, and in the ﬁrst time step when N = 50), the entire plan is executed without
replanning.
Intuitively, it is obvious that planning should be helpful in this scenario: half of the
objects have signiﬁcantly degraded observability in the second half of the simulation,
Sec. 4.4. Computational experiments 151
and failure to anticipate this will result in a signiﬁcant performance loss. This intuition
is conﬁrmed in the results presented in Fig. 4.11. The top plot shows the increase in
relative performance as the planning horizon increases from one time slot to 50 (i.e., the
total reward for each planning horizon, divided by the total reward when the planning
horizon is unity). Each additional time slot in the planning horizon allows the controller
to anticipate the change in observation models sooner, and observe more of the ﬁrst
25 objects before the increase in observation noise covariance occurs. The performance
increases monotonically with planning horizon apart from minor variations. These
are due to the stopping criteria in our algorithm: rather than waiting for an optimal
solution, we terminate when we obtain a solution that is within 95% of the optimal
reward.
With a planning horizon of 50 steps (spanning the entire simulation length), the total
reward is 74% greater than the total reward with a single step planning horizon. Since
all observations occupy a single time slot, the performance guarantees of Chapter 3,
and the maximum gain possible in any scenario of this type is 100% (i.e., the optimal
performance can be no more than twice that of the onestep greedy heuristic). Once
again, while this example represents an extreme case, one can see that similar events
can commonly occur on a smaller scale in realistic scenarios. The smaller change in
observation model characteristics and comparative infrequency of these events results
the comparatively modest gains found in Sections 4.4.2 and 4.4.3.
4.4.5 Example of potential beneﬁt: multiple time slot observations
The ﬁnal scenario demonstrates the increase in performance which is possible through
long planning horizons when observations occupy diﬀerent numbers of time slots. In
such circumstances, algorithms utilizing shortterm planning may make choices that
preclude selection of later observations that may be arbitrarily more valuable.
The scenario involves M = 50 objects observed using a single sensor. The initial
positions and velocities of the objects are the same as in the previous scenario; the initial
estimates are corrupted by additive Gaussian noise with zero mean and covariance I.
In each time slot, a single object may be observed through either of two linear
Gaussian observations (i.e., of the form in Eq. (2.11)). The ﬁrst, which occupies a
single time slot, has H
i,1
k
= I, and R
i,1
k
= 2I. The second, which occupies ﬁve time
slots, has H
i,2
k
= I, and R
i,2
k
= r
k
I. The noise variance of the longer observation, r
k
,
0 10 20 30 40 50
1
1.2
1.4
1.6
1.8
Horizon length (time slots)
R
e
l
a
t
i
v
e
g
a
i
n
Reward relative to one−step planning horizon
0 10 20 30 40 50
10
−3
10
−2
10
−1
10
0
10
1
10
2
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
Horizon length (time slots)
Average computation time to produce plan
Figure 4.11. Upper diagram shows the total reward obtained in the simulation using
diﬀerent planning horizon lengths, divided by the total reward when the planning horizon
is one. Lower diagram shows the average computation time to produce a plan for the
following N steps.
Sec. 4.5. Time invariant rewards 153
varies periodically with time, according to the following values:
r
k
=
_
¸
¸
¸
¸
¸
¸
¸
¸
¸
_
¸
¸
¸
¸
¸
¸
¸
¸
¸
_
10
−1
mod (k, 5) = 1
10
−2
mod (k, 5) = 2
10
−3
mod (k, 5) = 3
10
−4
mod (k, 5) = 4
10
−5
mod (k, 5) = 0
The time index k commences at k = 1. Unless the planning horizon is suﬃciently
long to anticipate the availability of the observation with variance 10
−5
several time
steps later, the algorithm will select an observation with lower reward, which precludes
selection of this later more accurate observation.
The performance of the algorithm in the scenario is shown in Fig. 4.12. The
maximum increase in performance over the greedy heuristic is a factor of 4.7. While
this is an extreme example, it illustrates another occasion when additional planning
is highly beneﬁcial: when there are observations that occupy several time slots with
time varying rewards. In this circumstance, an algorithm utilizing shortterm planning
may make choices that preclude selection of later observations which may be arbitrarily
more valuable.
4.5 Time invariant rewards
In many selection problems where the time duration corresponding to the planning
horizon is short, the reward associated with observing the same object using the same
sensing mode at diﬀerent times within the planning horizon is wellapproximated as
being time invariant. In this case, the complexity of the selection problem can be
reduced dramatically by replacing the individual resources associated with each time
slot with a single resource, with capacity corresponding to the total number of time
units available. In this generalization of Sections 4.2 and 4.3, we associate with each
resource r ∈ ¹ a capacity C
r
∈ R, and deﬁne t(u
i
j
, r) ∈ R to be the capacity of resource
r consumed by elemental observation u
i
j
∈ 
i
. The collection of subsets from which we
may select for object i is changed from Eq. (4.12) to:
S
i
= ¦/ ⊆ 
i
[t(/, r) ≤ C
r
∀ r ∈ ¹¦ (4.22)
where
t(/, r) =
u∈A
t(u, r)
0 10 20 30 40 50
1
2
3
4
5
Horizon length (time slots)
R
e
l
a
t
i
v
e
g
a
i
n
Reward relative to one−step planning horizon
0 10 20 30 40 50
10
−3
10
−2
10
−1
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
Horizon length (time slots)
Average computation time to produce plan
Figure 4.12. Upper diagram shows the total reward obtained in the simulation using
diﬀerent planning horizon lengths, divided by the total reward when the planning horizon
is one. Lower diagram shows the average computation time to produce a plan for the
following N steps.
Sec. 4.5. Time invariant rewards 155
The full integer programming formulation of Eq. (4.15) becomes:
max
ω
i
A
i
M
i=1
A
i
∈S
i
r
i
A
i
ω
i
A
i
(4.23a)
s.t.
M
i=1
A
i
∈S
i
t(/
i
, r)ω
i
A
i
≤ C
r
∀ r ∈ ¹ (4.23b)
A
i
∈S
i
ω
i
A
i
= 1 ∀ i ∈ ¦1, . . . , M¦ (4.23c)
ω
i
A
i
∈ ¦0, 1¦ ∀ i, /
i
∈ S
i
(4.23d)
The iterative solution methodology in Section 4.3 may be generalized to this case
by replacing Eq. (4.16) with:
max
ω
i
A
i
, ω
i
uA
i
M
i=1
A
i
∈T
i
l
_
¸
_r
i
A
i
ω
i
A
i
+
u∈B
i
l,A
i
r
i
uA
i
ω
i
uA
i
_
¸
_ (4.24a)
s.t.
M
i=1
A
i
∈T
i
l
t(/
i
, r)ω
i
A
i
+
M
i=1
A
i
∈T
i
l
u∈B
i
l,A
i
t(u, r)ω
i
uA
i
≤ C
r
∀ r ∈ ¹ (4.24b)
A
i
∈T
i
ω
i
A
i
= 1 ∀ i ∈ ¦1, . . . , M¦ (4.24c)
u∈B
i
l,A
i
ω
i
uA
i
−[B
i
l,A
i
[ω
i
A
i
≤ 0 ∀ i, /
i
∈ T
i
l
(4.24d)
ω
i
A
i
∈ ¦0, 1¦ ∀ i, /
i
∈ T
i
(4.24e)
ω
i
uA
i
∈ ¦0, 1¦ ∀ i, /
i
∈ T
i
, u ∈ B
i
l,A
i
(4.24f)
The only change from Eq. (4.16) is in the form of the resource constraint, Eq. (4.24b).
Algorithm 4.1 may be applied without modiﬁcation using this generalized integer pro
gram.
4.5.1 Avoiding redundant observation subsets
Straightforward implementation of the formulation just described will yield a substan
tial ineﬃciency due to redundancy in the observation subset S
i
, as illustrated in the
following scenario. Suppose we seek to generate a plan for the next N time slots,
where there are η total combinations of sensor and mode within each time slot. For
each sensor mode
4
u
i
j
, j ∈ ¦1, . . . , η¦, we generate for each duration d ∈ ¦1, . . . , N¦
4
We assume that the index j enumerates all possible combinations of sensor and mode.
156 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
(i.e., the duration for which the sensor mode is applied) an elemental observation u
i
j,d
,
which corresponds to applying sensor mode u
i
j
for d time slots, so that 
i
= ¦u
i
j,d
[j ∈
¦1, . . . , η¦, d ∈ ¦1, . . . , N¦¦. With each sensor s ∈ o (where o is the set of sensors) we
associate a resource r
s
with capacity C
r
s
= N, i.e., each sensor has up to N time slots
available for use. Denoting by s(u
i
j
) the sensor associated with u
i
j
, we set
t(u
i
j,d
, s) =
_
_
_
d, s = s(u
i
j
)
0, otherwise
We could generate the collection of observation subsets S
i
from Eq. (4.22) using
this structure, with 
i
and t(, ) as described above, and ¹ = ¦r
s
[s ∈ o¦. However,
this would introduce a substantial ineﬃciency which is quite avoidable. Namely, in
addition to containing the single element observation subset ¦u
i
j,d
¦, the collection of
subsets S
i
may also contain several subsets of the form ¦u
i
j,d
l
[
l
d
l
= d¦.
5
We would
prefer to avoid this redundancy, ensuring instead that the only way observe object i
using sensor mode j for a total duration of d time slots is to choose the elemental
observation u
i
j,d
alone. This can be achieved by introducing one additional resource for
each combination of object and sensor mode, ¦r
i,j
[i ∈ ¦1, . . . , M¦, j ∈ ¦1, . . . , η¦¦, each
with capacity C
r
i,j
= 1, setting:
t(u
i
j,d
, r
i
,j
) =
_
_
_
1, i = i
, j = j
0, otherwise
The additional constraints generated by these resources ensure that (at most) a single
element from the set ¦u
i
j,d
[d ∈ ¦1, . . . , N¦¦ can be chosen for each (i, j). As the following
experiment demonstrates, this results in a dramatic decrease in the action space for each
object, enabling solution of larger problems.
4.5.2 Computational experiment: waveform selection
To demonstrate the reduction in computational complexity that results from this for
mulation, we apply it to a modiﬁcation of the example presented in Section 4.4.2. We
set ∆t = 0.01 and reduce the sensor platform motion to 0.01 units/step. Fig. 4.13
shows variation in rewards for 50 objects being observed with a single mode of a single
sensor over 50 time slots in one realization of the example, conﬁrming that the rewards
are reasonably well approximated as being timeinvariant.
5
For example, as well as providing an observation subset for observing object i with sensor mode j
100 110 120 130 140 150 160
1
1.05
1.1
1.15
Time step (k)
Reward at time k / reward at time 101
R
a
t
i
o
o
f
r
e
w
a
r
d
s
Azimuth/range rate observation
Azimuth/range observation
Figure 4.13. Diagram illustrates the variation of rewards over the 50 time step planning
horizon commencing from time step k = 101. The line plots the ratio between the reward
of each observation at time step in the planning horizon and the reward of the same
observation at the ﬁrst time slot in the planning horizon, averaged over 50 objects. The
error bars show the standard deviation of the ratio, i.e., the variation between objects.
158 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
The simulations run for 200 time slots. The initial position and velocity of the
objects is identical to that in Section 4.4.2; the initial state estimates are corrupted by
additive Gaussian noise with covariance (1 + ρ
i
)I, where ρ
i
is randomly drawn, inde
pendently for each object, from a uniform distribution on [0, 1] (the values are known
to the controller). The observation model is identical to that described in Section 4.4.2.
If the controller chooses to observe an object for d time slots, the variance is reduced
by a factor of d from that of the same single time slot observation; in the absence of
the dynamics process this is equivalent to d single time slot observations.
At each time step, the controller constructs an Nstep plan. Since rewards are
assumed time invariant, the elements of this plan are not directly attached to time
slots. We assign time slots to each task in the plan by processing objects in random
order, assigning the ﬁrst time slots to the observations assigned to the ﬁrst object in
the random order, etc. We then execute the action assigned to the ﬁrst time slot in
the plan, before constructing a new plan; this is consistent with the open loop feedback
control methodology used in the examples in Section 4.4.
The results of the simulations are shown in Fig. 4.14. There is a very small gain in
performance as the planning horizon increases from one time slot to around ﬁve time
slots. Beyond this limit, the performance drops to be lower than the performance of
the greedy heuristic (i.e., using a single step planning horizon). This is due to the
mismatch between the assumed model (that rewards are time invariant) and the true
model (that rewards are indeed time varying), which worsens as the planning horizon
increases. The computational cost in the lower diagram demonstrates the eﬃciency
of this formulation. With a planning horizon of 40, we are taking an average of four
observations per object. This would attract a very large computational burden in the
original formulation discussed in the experiments of Section 4.4.
4.5.3 Example of potential beneﬁt
To demonstrate the beneﬁt that additional planning can provide in problems involving
time invariant rewards, we construct an experiment that is an extension of the example
presented in Section 3.4. The scenario involves M = 50 objects being observed using a
single sensor over 100 time slots. The fourdimensional state of each object is static in
for d = 10 steps, there are also subsets for observing object i with sensor mode j multiple times for the
same total duration—e.g., for 4 and 6 steps, for 3 and 7 steps, and for 1, 4 and 5 steps.
0 5 10 15 20 25 30 35 40
0.96
0.97
0.98
0.99
1
1.01
1.02
Horizon length (time slots)
R
e
l
a
t
i
v
e
l
o
s
s
/
g
a
i
n
Performance in 17 Monte Carlo simulations of 10 objects
0 5 10 15 20 25 30 35 40
10
−3
10
−2
10
−1
10
0
Horizon length (time slots)
Average computation time to produce plan
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
Figure 4.14. Top diagram shows the total reward for each planning horizon length divided
by the total reward for a single step planning horizon, averaged over 17 Monte Carlo
simulations. Error bars show the standard deviation of the mean performance estimate.
Lower diagram shows the average time required to produce plan for the diﬀerent length
planning horizon lengths.
160 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
time, and the initial distribution is Gaussian with covariance:
P
i
0
=
_
¸
¸
¸
¸
_
1.1 0 0 0
0 1.1 0 0
0 0 1 0
0 0 0 1
_
¸
¸
¸
¸
_
In each time slot, there are three diﬀerent linear Gaussian observations available for
each object. The measurement noise covariance of each is R
i,1
k
= R
i,2
k
= R
i,3
k
= 10
−6
I,
while the forward models are:
H
i,1
k
=
_
1 0 0 0
0 1 0 0
_
; H
i,2
k
=
_
1 0 0 0
0 0 1 0
_
; H
i,3
k
=
_
0 1 0 0
0 0 0 1
_
The performance of the algorithm in the scenario is summarized in Fig. 4.15. The
results demonstrate that performance increases by 29% as planning increases from a
single step to the full simulation length (100 steps). Additional planning allows the
controller to anticipate that it will be possible to take a second observation of each
object, and hence, rather than utilizing the ﬁrst observation (which has the highest
reward) it should utilize either the second or third observations, which are completely
complementary and together provide all of the information found in the ﬁrst. For
longer planning horizons, the computational complexity appears to be roughly linear
with planning horizon; the algorithm is able to construct a plan for the entire 100 time
slots in ﬁve seconds.
4.6 Conclusion
The development in this chapter provides an eﬃcient method of optimal and near
optimal solution for a wide range of beam steering problems involving multiple inde
pendent objects. Each iteration of the algorithm in Section 4.3.3 provides an successive
reduction of the upper bound to the reward attainable. This may be combined with
the augmented problem presented in Section 4.3.6, which provides a series of solutions
for which the rewards are successively improved, to yield an algorithm that terminates
when a solution has been found that is within the desired tolerance of optimality. The
experiments in Section 4.4 demonstrate the computational eﬃciency of the approach,
and the gain in performance that can be obtained through use of longer planning hori
zons.
In practical scenarios it is commonly the case that, when objects become closely
spaced, the only measurements available are joint observations of the two (or more)
0 20 40 60 80 100
1
1.05
1.1
1.15
1.2
1.25
1.3
1.35
Horizon length (time slots)
R
e
l
a
t
i
v
e
g
a
i
n
Reward relative to one−step planning horizon
0 20 40 60 80 100
10
−3
10
−2
10
−1
10
0
10
1
10
2
A
v
e
r
a
g
e
t
i
m
e
(
s
e
c
o
n
d
s
)
Horizon length (time slots)
Average computation time to produce plan
Figure 4.15. Upper diagram shows the total reward obtained in the simulation using
diﬀerent planning horizon lengths, divided by the total reward when the planning horizon
is one. Lower diagram shows the average computation time to produce a plan for the
following N steps.
162 CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
objects, rather than observations of the individual objects. This inevitably results
in statistical dependency between object states in the conditional distribution; the
dependency is commonly represented through association hypotheses (e.g., [14, 19, 58,
71, 82]).
If the objects can be decomposed into many small independent “groups”, as in the
notion of independent clusters in Multiple Hypothesis Tracking (MHT), then Algo
rithm 4.1 may be applied to the transformed problem in which each independent group
of objects is treated as a single object with state and action space corresponding to the
cartesian product of the objects forming the group. This approach may be tractable if
the number of objects in each group remains small.
Chapter 5
Sensor management in
sensor networks
N
ETWORKS of intelligent sensors have the potential to provide unique capabilities
for monitoring wide geographic areas through the intelligent exploitation of local
computation (so called innetwork computing) and the judicious use of intersensor
communication. In many sensor networks energy is a dear resource to be conserved so
as to prolong the network’s operational lifetime. Additionally, it is typically the case
that the energy cost of communications is orders of magnitude greater than the energy
cost of local computation [80, 81].
Tracking moving objects is a common application in which the quantities of inter
est (i.e., kinematic state) are inferred largely from sensor observations which are in
proximity to the object (e.g., [62]). Consequently, local fusion of sensor data is suﬃ
cient for computing an accurate estimate of object state, and the knowledge used to
compute this estimate is summarized by the conditional probability density function
(PDF). This property, combined with the need to conserve energy, has led to a variety
of approaches (e.g., [37, 64]) which eﬀectively designate the responsibility of computing
the conditional PDF to one sensor node (referred to as the leader node) in the network.
Over time the leader node changes dynamically as function of the kinematic state of the
object. This leads to an inevitable tradeoﬀ between the uncertainty in the conditional
PDF, the cost of acquiring observations, and the cost of propagating the conditional
PDF through the network. In this chapter we examine this tradeoﬀ in the context of
object tracking in distributed sensor networks.
We consider a sensor network consisting a set of sensors (denoted o, where [o[ = N
s
),
in which the sensing model is assumed to be such that the observation provided by
the sensor is highly informative in the region close to the node, and uninformative in
regions far from the node. For the purpose of addressing the primary issue, trading oﬀ
163
164 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
energy consumption for accuracy, we restrict ourselves to sensor resource planning issues
associated with tracking a single object. While additional complexities certainly arise
in the multiobject case (e.g., data association) they do not change the basic problem
formulation or conclusions.
If the energy consumed by sensing and communication were unconstrained, then the
optimal solution would be to collect and fuse the observations provided by all sensors
in the network. We consider a scheme in which, at each time step, a subset of sensors is
selected to take an observation and transmit to a sensor referred to as the leader node,
which fuses the observations with the prior conditional PDF and tasks sensors at the
next time step. The questions which must be answered by the controller are how to
select the subset of sensors at each point in time, and how to select the leader node at
each point in time.
The approach developed in Section 5.1 allows for optimization of estimation per
formance subject to a constraint on expected communication cost, or minimization of
communication cost subject to a constraint on expected estimation performance. The
controller uses a dual problem formulation to adaptively utilize multiple sensors at
each time step, incorporating a subgradient update step to adapt the dual variable
(Section 5.1.9), and introducing a heuristic cost to go in the terminal cost to avoid
anomalous behavior (Section 5.1.10). Our dual problem formulation is closely related
to [18], and provides an approximation which extends the Lagrangian relaxation ap
proach to problems involving sequential replanning. Other related work includes [29],
which suggests incorporation of sensing costs and estimation performance into a uniﬁed
objective without adopting the constrained optimization framework that we utilize, and
[20], which adopts a constrained optimization framework without incorporating estima
tion performance and sensing cost into a uniﬁed objective, a structure which results in
a major computational saving for our approach.
5.1 Constrained Dynamic Programming Formulation
The sensor network object tracking problem involves an inherent tradeoﬀ between
performance and energy expenditure. One way of incorporating both estimation per
formance and communication cost into an optimization procedure is to optimize one of
the quantities subject to a constraint on the other. In the development which follows,
we provide a framework which can be used to either maximize the information obtained
from the selected observations subject to a constraint on the expected communication
cost, or to minimize the communication cost subject to a constraint on the estimation
Sec. 5.1. Constrained Dynamic Programming Formulation 165
quality. This can be formulated as a constrained Markov Decision Process (MDP), as
discussed in Section 2.2.3.
As discussed in the introduction to this chapter, the tracking problem naturally ﬁts
into the Bayesian state estimation formulation, such that the role of the sensor network
is to maintain a representation of the conditional PDF of the object state (i.e., position,
velocity, etc) conditioned on the observations. In our experiments, we utilize a parti
cle ﬁlter to maintain this representation (as described in Section 2.1.4), although the
planning method that we develop is equally applicable to any state estimation method
including the Kalman ﬁlter (Section 2.1.2), extended Kalman ﬁlter (Section 2.1.3), or
unscented Kalman ﬁlter [39]. An eﬃcient method of compressing particle representa
tions of PDFs for transmission in sensor networks is studied in [36]; we envisage that
any practical implementation of particle ﬁlters in sensor networks would use such a
scheme.
The estimation objective that we employ in our formulation is discussed in Sec
tion 5.1.1, while our communication cost is discussed in Section 5.1.2. These two
quantities are utilized diﬀerently in dual formulations, the ﬁrst of which optimizes
estimation performance subject to a constraint on communication cost (Section 5.1.3),
and the second of which optimizes communication cost subject to a constraint on esti
mation performance (Section 5.1.4). In either case, the control choice available at each
time is u
k
= (l
k
, o
k
), where l
k
∈ o is the leader node at time k and o
k
⊆ o is the
subset of sensors activated at time k. The decision state of the dynamic program is the
combination of conditional PDF of object state, denoted X
k
p(x
k
[ˇ z
0
, . . . , ˇ z
k−1
), and
the previous choice of leader node, l
k−1
∈ o.
5.1.1 Estimation objective
The estimation objective that we utilize in our formulation is the joint entropy of the
object state over the N steps commencing from the current time k:
H(x
k
, . . . , x
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
k
k
, . . . , z
S
k+N−1
k+N−1
)
where z
S
l
l
denotes the random variables corresponding to the observations of the sensors
in the set o
l
⊆ o at time l. As discussed in Section 2.3.6, minimizing this quantity
with respect to the observation selections o
l
is equivalent to maximizing the following
mutual information expression:
k+N−1
l=k
I(x
l
; z
S
l
l
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
k
k
, . . . , z
S
l−1
l−1
)
166 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
Since we are selecting a subset of sensor observations at each time step, this expression
can be further decomposed using with an additional application of the chain rule. To do
so, we introduce an arbitrary ordering of the elements of o
l
, denoting the jth element
by s
j
l
, and the ﬁrst (j −1) elements by by o
j
l
¦s
1
l
, . . . , s
j−1
l
¦ (i.e., the selection prior
to introduction of the jth element):
k+N−1
l=k
S
l

j=1
I(x
l
; z
s
j
l
l
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
k
k
, . . . , z
S
l−1
l−1
, z
S
j
l
l
)
Our formulation requires this additivity of estimation objective. The algorithm we
develop could be applied to other measures of estimation performance, although the
objectives which result may not be as natural.
5.1.2 Communications
We assume that any sensor node can communicate with any other sensor node in the
network, and that the cost of these communications is known at every sensor node;
in practice this will only be required within a small region around each node. In our
simulations, the cost (per bit) of direct communication between two nodes is modelled
as being proportional to the square distance between the two sensors:
˜
C
ij
∝ [[y
i
−y
j
[[
2
2
(5.1)
where y
s
is the location of the sth sensor (which is assumed to be known, e.g., through
the calibration procedure as described in [34]). Communications between distant nodes
can be performed more eﬃciently using a multihop scheme, in which several sensors
relay the message from source to destination. Hence we model the cost of communi
cating between nodes i and j, C
ij
, as the length of the shortest path between i and j,
using the distances from Eq. (5.1) as arc lengths:
C
ij
=
n
ij
k=1
˜
C
i
k−1
i
k
(5.2)
where ¦i
0
, . . . , i
n
ij
¦ is the shortest path from node i = i
0
to node j = i
n
ij
. The shortest
path distances can be calculated using any shortest path algorithm, such as deter
ministic dynamic programming or label correcting methods [9]. We assume that the
complexity of the probabilistic model (i.e., the number of bits required for transmis
sion) is ﬁxed at B
p
bits, such that the energy required to communicate the model from
node i to node j is B
p
C
ij
. This value will depend on the estimation scheme used in
Sec. 5.1. Constrained Dynamic Programming Formulation 167
a given implementation; if a particle ﬁlter is utilized then the quantity may adapt as
the form of the probabilistic model varies, as detailed in [36]. The number of bits in an
observation is denoted as B
m
, such that the energy required to transmit an observation
from node i to node j is B
m
C
ij
. These costs may be amended to incorporate the cost of
activating the sensor, taking the measurement, the expected number of retransmissions
required, etc, without changing the structure of the solution.
5.1.3 Constrained communication formulation
Our ﬁrst formulation optimizes estimation performance subject to a constraint on com
munication cost. As discussed in Section 5.1.1, we utilize mutual information as our
estimation objective, and deﬁne the perstage cost:
g(X
k
, l
k−1
, u
k
) = −I(x
k
; z
S
k
k
[ˇ z
0
, . . . , ˇ z
k−1
) (5.3)
= −
S
k

j=1
I(x
k
; z
s
j
k
k
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
j
k
k
) (5.4)
We choose the perstage constraint contribution G(X
k
, l
k−1
, u
k
) to be such that the the
expected communication cost over the next N time steps is constrained:
G(X
k
, l
k−1
, u
k
) =
_
_
B
p
C
l
k−1
l
k
+
j∈S
k
B
m
C
l
k
j
_
_
(5.5)
The ﬁrst term of this communication cost captures the cost of transmitting the repre
sentation of the conditional PDF from the previous leader node to the new leader node,
while the second captures the cost of transmitting observations from each active sensor
to the new leader node.
Substituting the perstage cost and constraint function into Eq. (2.53), the uncon
strained optimization in the Lagrangian (for a particular value of the Lagrange multi
plier λ) can be solved conceptually using the recursive dynamic programming equation:
J
D
i
(X
i
, l
i−1
, λ) = min
u
i
_
¯ g(X
i
, l
i−1
, u
i
, λ) + E
X
i+1
X
i
,u
i
J
D
i+1
(X
i+1
, l
i
, λ)
_
(5.6)
for time indices i ∈ ¦k, . . . , k +N −1¦, terminated by J
D
k+N
(X
k+N
, l
k+N−1
, λ) = −λM.
The conditional PDF at the next time X
i+1
is calculated using the recursive Bayes
update described in Section 2.1. The augmented perstage cost combines the informa
tion gain and communication cost in a single quantity, adaptively adjusting the weight
168 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
applied to each using a Lagrange multiplier:
¯ g(X
k
, l
k−1
, u
k
, λ) = −
S
k

j=1
I(x
k
; z
s
j
k
k
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
j
k
k
) + λ
_
_
B
p
C
l
k−1
l
k
+
j∈S
k
B
m
C
l
k
j
_
_
(5.7)
This incorporation of the constraint terms into the perstage cost is a key step, which
allows the greedy approximation described in Sections 5.1.7 and 5.1.8 to capture the
tradeoﬀ between estimation quality and communication cost.
5.1.4 Constrained entropy formulation
The formulation above provides a means of optimizing the information obtained subject
to a constraint on the communication energy expended; there is also a closelyrelated
formulation which optimizes the communication energy subject to a constraint on the
entropy of probabilistic model of object state. The cost per stage is set to the commu
nication cost expended by the control decision:
g(X
k
, l
k−1
, u
k
) = B
p
C
l
k−1
l
k
+
j∈S
k
B
m
C
l
k
j
(5.8)
We commence by formulating a constraint function on the joint entropy of the state
of the object over each time in the planning horizon:
E¦H(x
k
, . . . , x
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
k
k
, . . . , z
S
k+N−1
k+N−1
)¦ ≤ H
max
(5.9)
Manipulating this expression using Eq. (2.72), we obtain
−E
_
_
_
k+N−1
i=k
S
i

j=1
I(x
i
; z
s
j
l
l
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
k
k
, . . . , z
S
l−1
l−1
, z
S
j
l
l
)
_
_
_
≤ H
max
−H(x
k
, . . . , x
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
) (5.10)
from which we set M = H
max
−H(x
k
, . . . , x
k+N−1
[ˇ z
0
, . . . , ˇ z
k−1
),
1
and
G(X
k
, l
k−1
, u
k
) = −
S
k

j=1
I(x
k
; z
s
j
k
k
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
j
k
k
) (5.11)
1
In our implementation, we construct a new control policy at each time step by applying the approx
imate dynamic programming method described in the following section commencing from the current
probabilistic model, X
k
. At time step k, H(x
k
, . . . , x
k+N−1
ˇ z0, . . . , ˇ z
k−1
) is a known constant (repre
senting the uncertainty prior to receiving any observations in the present planning horizon), hence the
dependence on X
k
is immaterial.
Sec. 5.1. Constrained Dynamic Programming Formulation 169
Following the same procedure as described previously, the elements of the information
constraint in Eq. (5.10) can be integrated into the perstage cost, resulting in a for
mulation which is identical to Eq. (5.7), except that the Lagrange multiplier is on the
mutual information term, rather than the communication cost terms:
¯ g(X
k
, l
k−1
, u
k
, λ) = B
p
C
l
k−1
l
k
+
j∈S
k
B
m
C
l
k
j
−λ
_
_
S
k

j=1
I(x
k
; z
S
j
k
k
[ˇ z
0
, . . . , ˇ z
k−1
, z
S
1:j−1
k
k
)
_
_
(5.12)
5.1.5 Evaluation through Monte Carlo simulation
The constrained dynamic program described above has an inﬁnite state space (the space
of probability distributions over object state), hence it cannot be evaluated exactly.
The following sections describe a series of approximations which are applied to obtain
a practical implementation.
Conceptually, the dynamic program of Eq. (5.6) could be approximated by sim
ulating sequences of observations for each possible sequence of controls. There are
N
s
2
Ns
possible controls at each time step, corresponding all possible selections of leader
node and subsets of sensors to activate. The complexity of the simulation process is
formidable: to evaluate
¯
J
D
k
(X
k
, l
k−1
, λ) for a given decision state and control, we draw a
set of N
p
samples of the set of observations z
S
k
k
from the distribution p(z
S
k
k
[ˇ z
0
, . . . , ˇ z
k−1
)
derived from X
k
, and evaluate the cost to go one step later
¯
J
D
k+1
(X
k+1
, l
k
, λ) correspond
ing to the decision state resulting from each set of observations. The evaluation of each
cost to go one step later will yield the same branching. A tree structure develops, where
for each previous leaf of the tree, N
s
2
Ns
N
p
new leaves (samples) are drawn, such that
the computational complexity increases as O(N
s
N
2
NsN
N
p
N
) as the tree depth N (i.e.,
the planning horizon) increases, as illustrated in Fig. 5.1. Such an approach quickly
becomes intractable even for a small number of sensors (N
s
) and simulated observation
samples (N
p
), hence we seek to exploit additional structure in the problem to ﬁnd a
computable approximate solution.
5.1.6 Linearized Gaussian approximation
In Section 2.3.4, we showed that the mutual information of a linearGaussian observation
of a quantity whose prior distribution is also Gaussian is a function only of the prior
covariance and observation model, not of the state estimate. Since the covariance of
a Kalman ﬁlter is independent of observation values (as seen in Section 2.1.2), this
170 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
result implies that, in the recursion of Eq. (5.6), future rewards depend only on the
control values: they are invariant to the observation values that result. It is wellknown
that this result implies that open loop policies are optimal: we just need to search
for control values for each time step, rather than control policies. Accordingly, in the
linear Gaussian case, the growth of the tree discussed in Section 5.1.5 is reduced to
O(N
s
N
2
NsN
) with the horizon length N, rather than O(N
s
N
2
NsN
N
p
N
).
While this is a useful result, its applicability to this problem is not immediately clear,
as the observation model of interest generally nonlinear (such as the model discussed in
Section 5.3). However, let us suppose that the observation model can be approximated
by linearizing about a nominal state trajectory. If the initial uncertainty is relatively
low, the strength of the dynamics noise is relatively low, and the planning horizon
length is relatively short (such that deviation from the nominal trajectory is small), then
such a linearization approximation may provide adequate ﬁdelity for planning of future
actions (this approximation is not utilized for inference: in our experiments, the SIS
algorithm of Section 2.1.4 is used with the nonlinear observation function to maintain
the probabilistic model). To obtain the linearization, we ﬁt a Gaussian distribution
to the a priori PDF (e.g., using Eq. (2.44)); suppose that the resulting distribution
is ^(x
k
; µ
k
, P
k
). We then calculate the nominal trajectory by calculating the mean
at each of the following N steps. In the case of the stationary linear dynamics model
discussed in Section 2.1.1:
x
0
k
= µ
k
(5.13)
x
0
i
= Fx
0
i−1
, i ∈ ¦k + 1, . . . , k + N −1¦ (5.14)
Subsequently, the observation model is approximated using Eq. (2.19) where the lin
earization point at time i is x
0
i
. This is a wellknown approximation, referred to as
the linearized Kalman ﬁlter; it is discussed further in Section 2.1.3; it was previously
applied to a sensor scheduling problem in [21]. The controller which results has a struc
ture similar to the open loop feedback controller (Section 2.2.2): at each stage a plan
for the next N time steps is generated, the ﬁrst step of the plan executed, and then a
new plan for the following N steps is generated, having relinearized after incorporating
the newly received observations.
A signiﬁcant horizon length is required in order to provide an eﬀective tradeoﬀ
between communication cost and inference quality, since many time steps are required
for the longterm communication cost saved and information gained from a leader node
change to outweigh the immediate communication cost incurred. While the linear
Sec. 5.1. Constrained Dynamic Programming Formulation 171
Gaussian approximation eliminates the O(N
p
N
) factor in the growth of computational
complexity with planning horizon length, the complexity is still exponential in both time
and the number of sensors, growing as O(N
s
N
2
NsN
). The following two sections describe
two tree pruning approximations we introduce to obtain a tractable implementation.
5.1.7 Greedy sensor subset selection
To avoid the combinatorial complexity associated with optimization over subsets of sen
sors, we decompose each decision stage into a number of substages and apply heuris
tic approximations in a carefully chosen way. Following the application of the lin
earized Gaussian approximation (Section 5.1.6), the branching of the computation tree
of Fig. 5.1 will be reduced to the structure shown in Fig. 5.2. Each stage of control
branching involves selection of a leader node, and a subset of sensors to activate; we
can break these two phases apart, as illustrated in Fig. 5.3. Finally, one can decom
pose the choice of which subset of sensors to activate (given a choice of leader node)
into a generalized stopping problem [9] in which, at each substage (indexed by i
), the
control choices are to terminate (i.e., move on to portion of the tree corresponding to
the following time slot) with the current set of selections, or to select an additional
sensor. This is illustrated in Fig. 5.4; the branches labelled ‘T’ represent the decision
to terminate with the currently selected subset.
For the communication constrained formulation, the DP recursion becomes:
¯
J
i
(X
i
, l
i−1
, λ) = min
l
i
∈S
¦λB
p
C
l
i−1
l
i
+
¯
J
0
i
(X
i
, l
i
, ¦∅¦, λ)¦ (5.15)
for i ∈ ¦k, . . . , k + N −1¦, terminated by setting
¯
J
N
(X
N
, l
N−1
, λ) = −λM, where
¯
J
i
i
(X
i
, l
i
, o
i
i
, λ) = min
_
E
X
i+1
X
i
,S
i
i
¯
J
i+1
(X
i+1
, l
i
, λ),
min
s
i
i
∈S\S
i
i
¦¯ g(X
i
, l
i
, o
i
i
, s
i
i
, λ) +
¯
J
i
+1
i
(X
i
, l
i
, o
i
i
∪ ¦s
i
i
¦, λ)¦
_
(5.16)
o
i
i
is the set of sensors chosen in stage i prior to substage i
, and the substage cost
¯ g(X
i
, l
i
, o
i
i
, s
i
i
, λ) is
¯ g(X
i
, l
i
, o
i
i
, s
i
i
, λ) = λB
m
C
l
i
s
i
i
−I(x
i
; z
s
i
i
i
[ˇ z
0
, . . . , ˇ z
i−1
, z
S
i
i
i
) (5.17)
The cost to go
¯
J
i
(X
i
, l
i−1
, λ) represents the expected cost to the end of the problem
(i.e., the bottom of the computation tree) commencing from the beginning of time slot
z
u
1
k
,1
k
z
u
1
k
,2
k
z
u
1
k
,3
k
u
1
k+1
u
2
k+1
u
3
k+1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
u
1
k
u
2
k
u
3
k
Figure 5.1. Tree structure for evaluation of the dynamic program through simulation.
At each stage, a tail subproblem is required to be evaluated each new control, and a set
of simulated values of the resulting observations.
choices for
leader node
& active subset
at time k
choices for
leader node
& active subset
at time k+1
Figure 5.2. Computation tree after applying the linearized Gaussian approximation of
Section 5.1.6.
leader node
at time k
leader node
at time k+1
active subset
at time k
Figure 5.3. Computation tree equivalent to Fig. 5.2, resulting from decomposition of
control choices into distinct stages, selecting leader node for each stage and then selecting
the subset of sensors to activate.
leader node
at time k
leader node
at time k+1
second active sensor
at time k
first active sensor
at time k
T
T
Figure 5.4. Computation tree equivalent to Fig. 5.2 and Fig. 5.3, resulting from further
decomposing sensor subset selection problem into a generalized stopping problem, in which
each substage allows one to terminate and move onto the next time slot with the current
set of selected sensors, or to add an additional sensor.
Sec. 5.1. Constrained Dynamic Programming Formulation 175
i (i.e., the position of the tree in Fig. 5.4 where branching occurs over choices of leader
node for that time slot). The function
¯
J
i
i
(X
i
, l
i
, o
i
i
, λ), represents the cost to go from
substage i
of stage i to the end of the problem, i.e., the expected cost to go to the
bottom of the tree commencing from a partial selection of which sensors to activate at
time i. The ﬁrst choice in the outer minimization in Eq. (5.16) represents the choice
to terminate (i.e., move on to the next time slot) with the currently selected subset of
sensors, while the second represents the choices of additional sensors to select.
While this formulation is algebraically equivalent to the original problem, it is in
a form which is more suited to approximation. Namely, the substages which form a
generalized stopping problem may be performed using a greedy method, in which, at
each stage, if there is no sensor s
i
i
for which the substage cost ¯ g(X
i
, l
i
, o
i
i
, s
i
i
, λ) ≤
0 (i.e., for which the cost of transmitting the observation is not outweighed by the
expected information it will provide), then we progress to the next stage; otherwise the
sensor s
i
i
with the lowest substage cost is added. The fact that the constraint terms
of the Lagrangian were distributed into the perstage and persubstage cost allows the
greedy approximation to be used in a way which trades oﬀ estimation quality and
communication cost.
While worstcase complexity of this algorithm is O(N
2
s
), careful analysis of the sensor
model can yield substantial practical reductions. One quite general simpliﬁcation can
be made: assuming that sensor measurements are independent conditioned on the state,
one can show that, for the substage cost in Eq. (5.17) since the ﬁrst term is constant
with respect to o
i
i
and the second is submodular: (assuming that observations are
independent conditioned on the state)
¯ g(X
i
, l
i
, o
i
i
, s, λ) ≤ ¯ g(X
i
, l
i
, o
i
i
, s, λ) ∀ i
< i
(5.18)
Using this result, if at any substage of stage i we ﬁnd that the substage cost of adding
a particular sensor is greater then zero (so that the augmented cost of activating the
sensor is higher than the augmented cost of terminating), then that sensor will not be
selected in any later substages of stage i (as the cost cannot decrease as we add more
sensors), hence it can be excluded from consideration. In practice this will limit the
sensors requiring consideration to those in a small neighborhood around the current
leader node and object, reducing computational complexity when dealing with large
networks.
.
.
.
.
.
.
.
.
.
l
1
k
l
2
k
l
3
k
l
1
k+1
l
2
k+1
l
3
k+1
l
1
k+2
l
2
k+2
l
3
k+2
G
G G G
G G
Figure 5.5. Tree structure for nscan pruning algorithm with n = 1. At each stage new
leaves are generated extending each remaining sequence with using each new leader node.
Subsequently, all but the best sequence ending with each leader node is discarded (marked
with ‘×’), and the remaining sequences are extended using greedy sensor subset selection
(marked with ‘G’).
Sec. 5.1. Constrained Dynamic Programming Formulation 177
5.1.8 nScan pruning
The algorithm described above is embedded within a slightly less coarse approxima
tion for leader node selection, which incorporates costs over multiple time stages. This
approximation operates similarly to the nscan pruning algorithm, which is commonly
used to control computational complexity in the Multiple Hypothesis Tracker [58]. Set
ting n = 1, the algorithm is illustrated in Fig. 5.5. We commence by considering each
possible choice of leader node
2
for the next time step and calculating the greedy sensor
subset selection from Section 5.1.7 for each leader node choice (the decisions made in
each of these branches will diﬀer since the sensors must transmit their observations to
a diﬀerent leader node, incurring a diﬀerent communication cost). Then, for each leaf
node, we consider the candidate leader nodes at the following time step. All sequences
ending with the same candidate leader node are compared, the one with the lowest
cost value is kept, and the other sequences are discarded. Thus, at each stage, we keep
some approximation of the best control trajectory which ends with each sensor as leader
node.
Using such an algorithm, the tree width is constrained to the number of sensors,
and the overall worst case computational complexity is O(NN
s
3
) (in practice, at each
stage we only consider candidate sensors in some neighborhood of the estimated ob
ject location, and the complexity will be substantially lower). This compares to the
simulationbased evaluation of the full dynamic programming recursion which, as dis
cussed in Section 5.1.5, has a computation complexity of the order O(N
s
N
2
NsN
N
p
N
).
The diﬀerence in complexity is striking: even for a problem with N
s
= 20 sensors, a
planning horizon of N = 10 and simulating N
p
= 50 values of observations at each
stage, the complexity is reduced from 1.6 10
90
to (at worst case) 8 10
5
.
Because the communication cost structure is Markovian with respect to the leader
node (i.e., the communication cost of a particular future control trajectory is unaﬀected
by the control history given the current leader node), it is captured perfectly by this
model. The information reward structure, which is not Markovian with respect to the
leader node, is approximated using the greedy method.
5.1.9 Sequential subgradient update
The previous two sections provide an eﬃcient algorithm for generating a plan for the
next N steps given a particular value of the dual variable λ. Substituting the resulting
2
The set of candidate leader nodes would, in practice, be limited to sensors close to the object,
similar to the sensor subset selection.
178 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
plan into Eq. (2.56) yields a subgradient which can be used to update the dual variables
(under the linear Gaussian approximation, feedback policies correspond to open loop
plans, hence the argument of the expectation of E[
i
G(X
i
, l
i−1
, µ
i
(X
i
, l
i−1
)) − M] is
deterministic). A full subgradient implementation would require evaluation for many
diﬀerent values of the dual variable each time replanning is performed, which is un
desirable since each evaluation incurs a substantial computational cost.
3
Since the
planning is over many time steps, in practice the level of the constraint (i.e., the value
of E[
i
G(X
i
, l
i−1
, µ
i
(X
i
, l
i−1
))−M]) will vary little between time steps, hence the slow
adaptation of the dual variable provided by a single subgradient step in each iteration
may provide an adequate approximation.
In the experiments which follow, at each time step we plan using a single value of
the dual variable, and then update it for the next time step utilizing either an additive
update:
λ
k+1
=
_
_
_
min¦λ
k
+ γ
+
, λ
max
¦, E[
i
G(X
i
, l
i−1
, µ
i
(X
i
, l
i−1
))] > M
max¦λ
k
−γ
−
, λ
min
¦, E[
i
G(X
i
, l
i−1
, µ
i
(X
i
, l
i−1
))] ≤ M
(5.19)
or a multiplicative update:
λ
k+1
=
_
_
_
min¦λ
k
β
+
, λ
max
¦, E[
i
G(X
i
, l
i−1
, µ
i
(X
i
, l
i−1
))] > M
max¦λ
k
/β
−
, λ
min
¦, E[
i
G(X
i
, l
i−1
, µ
i
(X
i
, l
i−1
))] ≤ M
(5.20)
where γ
+
and γ
−
are the increment and decrement sizes, β
+
and β
−
are the increment
and decrement factors, and λ
max
and λ
min
are the maximum and minimum values of
the dual variable. It is necessary to limit the values of the dual variable since the
constrained problem may not be feasible. If the variable is not constrained, undesirable
behavior can result such as utilizing every sensor in a sensor network in order to meet
an information constraint which cannot be met in any case, or because the dual variable
in the communication constraint was adapted such that it became too low, eﬀectively
implying that communications are costfree.
The dual variables may be initialized using several subgradient iterations or some
form of line search when the algorithm is ﬁrst executed in order to commence with a
value in the right range.
3
The rolling horizon formulation necessitates reoptimization of the dual variable at every time step,
as opposed to [18].
Sec. 5.1. Constrained Dynamic Programming Formulation 179
5.1.10 Rollout
If the horizon length is set to be too small in the communications constrained for
mulation, then the resulting solution will be to hold the leader node ﬁxed, and take
progressively fewer observations. To prevent this degenerate behavior, we use a rollout
approach (a commonly used suboptimal control methodology), in which we add to the
terminal cost in the DP recursion (Eq. (5.6)) the cost of transmitting the probabilistic
model to the sensor with the smallest expected distance to the object at the ﬁnal stage.
Denoting by ˜ µ(X
k
) ∈ o the policy which selects as leader node the sensor with the
smallest expected distance to the object, the terminal cost is:
J
k+N
(X
k+N
, l
k+N−1
) = λB
p
C
l
k+N−1
˜ µ(X
k+N
)
(5.21)
where the Lagrange multiplier λ is included only in the communicationconstrained
case. This eﬀectively acts as the cost of the base policy in a rollout [9]. The resulting
algorithm constructs a plan which assumes that, at the ﬁnal stage, the leader node will
have to be transferred to the closest sensor, hence there is no beneﬁt in holding it at its
existing location indeﬁnitely. In the communicationconstrained case, this modiﬁcation
can make the problem infeasible for short planning horizons, but the limiting of the
dual variables discussed in Section 5.1.9 can avoid anomalous behavior.
5.1.11 Surrogate constraints
A form of information constraint which is often more desirable is one which captures
the notion that it is acceptable for the uncertainty in object state to increase for short
periods of time if informative observations are likely to become available later. The
minimum entropy constraint is such an example:
E
_
min
i∈{k,...,k+N−1}
H(x
i
[ˇ z
0
, . . . , ˇ z
i−1
) −H
max
_
≤ 0 (5.22)
The constraint in Eq. (5.22) does not have an additive decomposition (cf Eq. (5.10)),
as required by the approximations in Sections 5.1.7 and 5.1.8. However, we can use
the constraint in Eq. (5.10) to generate plans for a given value of the dual variable λ
using the approximations, and then perform the dual variable update of Section 5.1.9
using the desired constraint, Eq. (5.22). This simple approximation eﬀectively uses the
additive constraint in Eq. (5.10) as a surrogate for the desired constraint in Eq. (5.22),
allowing us to use the computationally convenient method described above with a more
meaningful criterion.
180 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
5.2 Decoupled Leader Node Selection
Most of the sensor management strategies proposed for object localization in existing
literature seek to optimize the estimation performance of the system, incorporating
communication cost indirectly, such as by limiting the maximum number of sensors
utilized. These methods typically do not consider the leader node selection problem
directly, although the communication cost consumed in implementing them will vary
depending on the leader node since communications costs are dependent on the trans
mission distance. In order to compare the performance of the algorithm developed in
Section 5.1 with these methods, we develop an approach which, conditioned on a par
ticular sensor management strategy (that is insensitive to the choice of leader node),
seeks to dynamically select the leader node to minimize the communications energy
consumed due to activation, deactivation and querying of sensors by the leader node,
and transmission of observations from sensors to the leader node. This involves a trade
oﬀ between two diﬀerent forms of communication: the large, infrequent step increments
produced when the probability distribution is transferred from sensor to sensor during
leader node handoﬀ, and the small, frequent increments produced by activating, de
activating and querying sensors. The approach is fundamentally diﬀerent from that in
Section 5.1 as we are optimizing the leader node selection conditioned on a ﬁxed sensor
management strategy, rather than jointly optimizing sensor management and leader
node selection.
5.2.1 Formulation
The objective which we seek to minimize is the expected communications cost over
an Nstep rolling horizon. We require the sensor management algorithm to provide
predictions of the communications performed by each sensor at each time in the fu
ture. As in Section 5.1, the problem corresponds to a dynamic program in which the
decision state at time k is the combination of the conditional PDF of object state,
X
k
p(x
k
[ˇ z
0
, . . . , ˇ z
k−1
), and the previous leader node, l
k−1
. The control which we
may choose is the leader node at each time, u
k
= l
k
∈ o. Denoting the expected cost of
communications expended by the sensor management algorithm (due to sensor activa
tion and deactivation, querying and transmission of obervations) at time k if the leader
node is l
k
as g
c
(X
k
, l
k
), the dynamic program for selecting the leader node at time k
can be written as the following recursive equation:
J
i
(X
i
, l
i−1
) = min
l
i
∈S
_
g
c
(X
i
, l
i
) + B
p
C
l
i−1
l
i
+ E
X
i+1
X
i
,l
i
J
i+1
(X
i+1
, l
i
)
_
(5.23)
Sec. 5.3. Simulation results 181
for i ∈ ¦k, . . . , k + N − 1¦. In the same way as discussed in Section 5.1.10, we set the
terminal cost to the cost of transmitting the probabilistic model from the current leader
node to the node with the smallest expected distance to the object, ˜ µ(X
k+N
):
J
k+N
(X
k+N
, l
k+N−1
) = B
p
C
l
k+N−1
˜ µ(X
k+N
)
(5.24)
In Section 5.3 we apply this method using a single lookahead step (N = 1) with a
greedy sensor management strategy selecting ﬁrstly the most informative observation,
and then secondly the two most informative observations.
5.3 Simulation results
As an example of the employment of our algorithm, we simulate a scenario involving
an object moving through a network of sensors. The state of the object is position and
velocity in two dimensions (x
k
= [p
x
v
x
p
y
v
y
]
T
); the state evolves according to the
nominally constant velocity model described in Eq. (2.8), with ∆t = 0.25 and q = 10
−2
.
The simulation involves N
s
= 20 sensors positioned randomly according to a uniform
distribution inside a 100100 unit region; each trial used a diﬀerent sensor layout and
object trajectory. Denoting the measurement taken by sensor s ∈ o = ¦1 : N
s
¦ (where
N
s
is the number of sensors) at time k as z
s
k
, a nonlinear observation model is assumed:
z
s
k
= h(x
k
, s) + v
s
k
(5.25)
where v
s
k
∼ ^¦v
s
k
; 0, 1¦ is a white Gaussian noise process, independent of w
k
∀ k and
of v
j
k
, j ,= s ∀ k. The observation function h(, s) is a quasirange measurement, e.g.,
resulting from measuring the intensity of an acoustic emission of known amplitude:
h(x
k
, s) =
a
[[Lx
k
−y
s
[[
2
2
+ b
(5.26)
where L is the matrix which extracts the position of the object from the object state
(such that Lx
k
is the location of the object), and y
s
is the location of the sth sensor.
The constants a and b can be tuned to model the signaltonoise ratio of the sensor,
and the rate at which the signaltonoise ratio decreases as distance increases; we use
a = 2000 and b = 100. The information provided by the observation reduces as the
range increases due to the nonlinearity.
As described in Section 2.1.3, the measurement function h(, s) can be approximated
as a ﬁrstorder Taylor series truncation in a small vicinity around a nominal point x
0
:
z
s
k
≈ h(x
0
, s) +H
s
(x
0
)(x
k
−x
0
) +v
s
k
H
s
(x
0
) = ∇
x
h(x, s)[
x=x
0
182 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
where:
H
s
(x
0
) =
−2a
([[Lx
0
−y
s
[[
2
2
+ b)
2
(Lx
0
−y
s
)
T
L (5.27)
This approximation is used for planning as discussed in Section 5.1.6; the particle ﬁlter
described in Section 2.1.4 is used for inference.
The model was simulated for 100 Monte Carlo trials. The initial position of the
object is in one corner of the region, and the initial velocity is 2 units per second in
each dimension, moving into the region. The simulation ends when the object leaves
the 100100 region or after 200 time steps, which ever occurs sooner (the average
length is around 180 steps). The communication costs were B
p
= 64 and B
m
= 1, so
that the cost of transmitting the probabilistic model is 64 the cost of transmitting
an observation. For the communicationconstrained problem, a multiplicative update
was used for the subgradient method, with β
+
= β
−
= 1.2, λ
min
= 10
−5
, λ
max
=
510
−3
, and C
max
= 10N where N is the planning horizon length. For the information
constrained problem, an additive update was used for the subgradient method, with
γ
+
= 50, γ
−
= 250, λ
min
= 10
−8
, λ
max
= 500 and H
max
= 2 (these parameters were
determined experimentally).
The simulation results are summarized in Fig. 5.6. The top diagram demonstrates
that the communicationconstrained formulation provides a way of controlling sensor
selection and leader node which reduces the communication cost and improves estima
tion performance substantially over the myopic singlesensor methods, which at each
time activate and select as leader node the sensor with the observation producing the
largest expected reduction in entropy. The informationconstrained formulation allows
for an additional saving in communication cost while meeting an estimation criterion
wherever possible.
The top diagram in Fig. 5.6 also illustrates the improvement which results from uti
lizing a longer planning horizon. The constraint level in the communicationconstrained
case is 10 cost units per time step; since the average simulation length is 180 steps, the
average communication cost if the constraint were always met with equality would be
1800. However, because this cost tends to occur in bursts (due to the irregular handoﬀ
of leader node from sensor to sensor as the object moves), the practical behavior of the
system is to reduce the dual variable when there is no handoﬀ in the planning horizon
(allowing more sensor observations to be utilized), and increase it when there is a hand
oﬀ in the planning horizon (to come closer to meeting the constraint). A longer planning
horizon reduces this undesirable behavior by anticipating upcoming leader node hand
oﬀ events earlier, and tempering spending of communication resources sooner. This is
0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
x 10
4
0.5
1
1.5
2
2.5
3
Accrued communication cost
A
v
e
r
a
g
e
p
o
s
i
t
i
o
n
e
n
t
r
o
p
y
Position entropy vs communication cost
0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
x 10
4
0.5
1
1.5
2
2.5
3
Accrued communication cost
A
v
e
r
a
g
e
m
i
n
i
m
u
m
e
n
t
r
o
p
y
Position entropy vs communication cost
DP CC N=5
DP CC N=10
DP CC N=25
DP IC N=5
DP IC N=10
DP IC N=25
Greedy MI
Min Expect Dist
Figure 5.6. Position entropy and communication cost for dynamic programming method
with communication constraint (DP CC) and information constraint (DP IC) with diﬀer
ent planning horizon lengths (N), compared to the methods selecting as leader node and
activating the sensor with the largest mutual information (greedy MI), and the sensor with
the smallest expected square distance to the object (min expect dist). Ellipse centers show
the mean in each axis over 100 Monte Carlo runs; ellipses illustrate covariance, providing
an indication of the variability across simulations. Upper ﬁgure compares average position
entropy to communication cost, while lower ﬁgure compares average of the minimum en
tropy over blocks of the same length as the planning horizon (i.e., the quantity to which
the constraint is applied) to communication cost.
184 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
demonstrated in Fig. 5.7, which shows the adaptation of the dual variable for a single
Monte Carlo run.
In the informationconstrained case, increasing the planning horizon relaxes the
constraint, since it requires the minimum entropy within the planning horizon to be
less than a given value. Accordingly, using a longer planning horizon, the average
minimum entropy is reduced, and additional communication energy is saved. The lower
diagram in Fig. 5.6 shows the average minimum entropy in blocks of the same length as
the planning horizon, demonstrating that the information constraint is met more often
with a longer planning horizon (as well as resulting in a larger communication saving).
Fig. 5.8 compares the adaptive Lagrangian relaxation method discussed in Sec
tion 5.1 with the decoupled scheme discussed in Section 5.2, which adaptively selects
the leader node to minimize the expected communication cost expended in implement
ing the decision of the ﬁxed sensor management method. The ﬁxed sensor management
scheme activates the sensor or two sensors with the observation or observations produc
ing the largest expected reduction in entropy. The results demonstrate that for this case
the decoupled method using a single sensor at each time step results in similar estima
tion performance and communication cost to the Lagrangian relaxation method using
an information constraint with the given level. Similarly, the decoupled method using
two sensors at each time step results in similar estimation performance and commu
nication cost to the Lagrangian relaxation method using a communication constraint
with the given level. The additional ﬂexibility of the Lagrangian relaxation method
allows one to select the constraint level to achieve various points on the estimation
performance/communication cost tradeoﬀ, rather than being restricted to particular
points corresponding to diﬀerent numbers of sensors.
5.4 Conclusion and future work
This paper has demonstrated how an adaptive Lagrangian relaxation can be utilized
for sensor management in an energyconstrained sensor network. The introduction
of secondary objectives as constraints provides a natural methodology to address the
tradeoﬀ between estimation performance and communication cost.
The planning algorithm may be applied alongside a wide range of estimation meth
ods, ranging from the Kalman ﬁlter to the particle ﬁlter. The algorithm is also ap
plicable to a wide range of sensor models. The linearized Gaussian approximation
in Section 5.1.6 results in a structure identical to the OLFC. The remainder of our
algorithm (removing the linearized Gaussian approximation) may be applied to ﬁnd
0 20 40 60 80 100 120 140 160
0
1
2
3
4
5
x 10
−3
Time step (k)
Adaptation of dual variable λ
k
0 20 40 60 80 100 120 140 160
0
1000
2000
3000
Time step (k)
Communication cost accrual
N=5
N=10
N=25
Figure 5.7. Adaptation of communication constraint dual variable λ
k
for diﬀerent horizon
lengths for a single Monte Carlo run, and corresponding cumulative communication costs.
0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
x 10
4
0.5
1
1.5
2
2.5
3
Accrued communication cost
A
v
e
r
a
g
e
p
o
s
i
t
i
o
n
e
n
t
r
o
p
y
Position entropy vs communication cost
Greedy/Decoupled
Greedy 2/Decoupled
DP CC N=25
DP IC N=25
Figure 5.8. Position entropy and communication cost for dynamic programming method
with communication constraint (DP CC) and information constraint (DP IC), compared
to the method which dynamically selects the leader node to minimize the expected com
munication cost consumed in implementing a ﬁxed sensor management scheme. The ﬁxed
sensor management scheme activates the sensor (‘greedy’) or two sensors (‘greedy 2’) with
the observation or observations producing the largest expected reduction in entropy. El
lipse centers show the mean in each axis over 100 Monte Carlo runs; ellipses illustrate
covariance, providing an indication of the variability across simulations.
Sec. 5.4. Conclusion and future work 187
an eﬃcient approximation of the OLFC as long as an eﬃcient estimate of the reward
function (mutual information in our case) is available.
The simulation results in Section 5.3 demonstrate that approximations based on
dynamic programming are able to provide similar estimation performance (as measured
by entropy), for a fraction of the communication cost in comparison to simple heuristics
which consider estimation performance alone and utilize a single sensor. The discussion
in Section 5.1.7 provides a guide for eﬃcient implementation strategies that can enable
implementation on the latest generation wireless sensor networks. Future work includes
incorporation of the impact on planning caused by the interaction between objects when
multiple objects are observed by a single sensor, and developing approximations which
are less coarse than the linearized Gaussian model.
188 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS
Chapter 6
Contributions and future directions
T
HE preceding chapters have extended existing sensor management methods in
three ways: ﬁrstly, obtaining performance guarantees for sequential sensor man
agement problems; secondly, ﬁnding an eﬃcient integer programming solution that
exploits the structure of beam steering; and ﬁnally, ﬁnding an eﬃcient heuristic sen
sor management method for object tracking in sensor networks. This chapter brieﬂy
summarizes these contributions, before outlining suggestions of areas for further inves
tigation.
6.1 Summary of contributions
The following sections outline the contributions made in this thesis.
6.1.1 Performance guarantees for greedy heuristics
The analysis in Chapter 3 extends the recent work in [46] to the sequential problem
structures that commonly arise in waveform selection and beam steering. The extension
is quite general in that it applies to arbitrary, time varying observation and dynamical
models. Extensions include tighter bounds that exploit either process diﬀusiveness or
objectives involving discount factors, and applicability to closed loop problems. The
results apply to objectives including mutual information; the logdeterminant of the
Fisher information matrix was also shown to be submodular, yielding a guarantee on
the posterior Cram´erRao bound. Examples demonstrate that the bounds are tight,
and counterexamples illuminate larger classes of problems to which they do not apply.
The results are the ﬁrst of their type for sequential problems, and eﬀectively justify
the use of the greedy heuristic in certain contexts, delineating problems in which addi
tional open loop planning can be beneﬁcial from those in which it cannot. For example,
if a factor of 0.5 of the optimal performance is adequate, then additional planning is un
189
190 CHAPTER 6. CONTRIBUTIONS AND FUTURE DIRECTIONS
necessary for any problem that ﬁts within the structure. The online guarantees conﬁrm
cases in which the greedy heuristic is even closer to optimality.
6.1.2 Eﬃcient solution for beam steering problems
The analysis in Chapter 4 exploits the special structure in problems involving large
numbers of independent objects to ﬁnd an eﬃcient solution of the beam steering prob
lem. The analysis from Chapter 3 was utilized to obtain an upper bound on the objec
tive function. Solutions with guaranteed nearoptimality were found by simultaneously
reducing the upper bound and raising a matching lower bound.
The algorithm has quite general applicability, admitting time varying observation
and dynamical models, and observations requiring diﬀerent time durations to complete.
An alternative formulation that was specialized to time invariant rewards provided a
further computational saving. The methods are applicable to a wide range of objectives,
including mutual information and the posterior Cram´erRao bound.
Computational experiments demonstrated application to problems involving 50–80
objects planning over horizons up to 60 time slots. Performing planning of this type
through full enumeration would require evaluation of the reward of more than 10
100
diﬀerent observation sequences. As well as demonstrating that the algorithm is suitable
for online application, these experiments also illustrate a new capability for exploring
the beneﬁt that is possible through utilizing longer planning horizons. For example, we
have quantiﬁed the small beneﬁt of additional open loop planning in problems where
models exhibit low degrees of nonstationarity.
6.1.3 Sensor network management
In Chapter 5, we presented a method trading oﬀ estimation performance and energy
consumed in an object tracking problem. The trade oﬀ between these two quanti
ties was formulated by maximizing estimation performance subject to a constraint on
energy cost, or the dual of this, i.e., minimizing energy cost subject to a constraint
on estimation performance. Our analysis has proposed a planning method that is both
computable and scalable, yet still captures the essential structure of the underlying trade
oﬀ. The simpliﬁcations enable computation over much longer planning horizons: e.g.,
in a problem involving N
s
= 20 sensors, closed loop planning over a horizon of N = 20
time steps using N
p
= 50 simulated values of observations at each stage would involve
complexity of the order O([N
s
2
Ns
]
N
N
N
p
) ≈ 10
180
; the simpliﬁcations yield worstcase
computation of the order O(NN
3
s
) = 1.6 10
5
. Simulation results demonstrate the
Sec. 6.2. Recommendations for future work 191
dramatic reduction in the communication cost required to achieve a given estimation
performance level as compared to previously proposed algorithms. The approximations
are applicable to a wide range of problems; e.g., even if the linearized Gaussian as
sumption is relaxed, the remaining approximations may be applied to ﬁnd an eﬃcient
approximation of the open loop feedback controller as long as an eﬃcient estimate of
the reward function (mutual information in our case) is available.
6.2 Recommendations for future work
The following sections describe some promising areas for future investigation.
6.2.1 Performance guarantees
Chapter 3 has explored several performance guarantees that are possible through ex
ploitation of submodular objectives, as well as some of the boundaries preventing wider
application. Directions in which this analysis may be extended include those outlined
in the following paragraphs.
Guarantees for longer lookahead lengths
It is easy to show that no stronger guarantees exist for heuristics using longer look
ahead lengths for general models; e.g., if we introduce additional time slots, in which
all observations are uninformative, in between the two original time slots in Exam
ple 3.1, we can obtain the same factor of 0.5 for any lookahead length. However, under
diﬀusive assumptions, one would expect that additional lookahead steps would yield
an algorithm that is closer to optimal.
Observations consuming diﬀerent resources
Our analysis inherently assumes that all observations utilize the same resources: the
same options available to us in later stages regardless of the choice we make in the
current stage. In [46], a guarantee is obtained for the subset selection problem in which
each observation j has a resource consumption c
j
, and we seek the most informative
subset / of observations for which
j∈A
c
j
≤ C. Expanding this analysis to sequential
selection problems involving either a single resource constraint encompassing all time
slots, or separate resource constraints for each time slot, would be an interesting exten
sion. A guarantee with a factor of (e −1)/(2e −1) ≈ 0.387 can be obtained quite easily
in the latter case, but it may be possible to obtain tighter guarantees (i.e., 0.5).
192 CHAPTER 6. CONTRIBUTIONS AND FUTURE DIRECTIONS
Closed loop guarantees
Example 3.3 establishes that there is no guarantee on the ratio of the performance of
the greedy heuristic operating in closed loop to the performance of the optimal closed
loop controller. However, it may be possible to introduce additional structure (e.g.,
diﬀusiveness and/or limited bandwidth observations) to obtain some form of weakened
guarantee.
Stronger guarantees exploiting additional structure
Finally, while the guarantees in Chapter 3 have been shown to be tight within the level
of generality to which they apply, it may be possible to obtain stronger guarantees
for problems with speciﬁc structure, e.g., linear Gaussian problems with dynamics and
observation models satisfying particular properties. An example of this is the result
that greedy heuristics are optimal for beam steering of onedimensional linear Gaussian
systems [33].
6.2.2 Integer programming formulation of beam steering
Chapter 4 proposed a new method for eﬃcient solution of beam steering problems, and
explored its performance and computation complexity. There are several areas in which
this development may be extended, as outlined in the following sections.
Alternative update algorithms
The algorithm presented in Section 4.3.3 represents one of many ways in which the
update between iterations could be performed. It remains to explore the relative beneﬁts
of other update algorithms; e.g., generating candidate subsets for each of the chosen
exploration subset elements, rather than just the one with the highest reward increment.
Deferred reward calculation
It may be beneﬁcial to defer calculation of some of the incremental rewards, e.g., the
incremental reward of an exploration subset element conditioned on a given candidate
subset is low enough that it is unlikely to be chosen, it would seem unnecessary to
recalculate the incremental reward when the candidate subset is extended.
Sec. 6.2. Recommendations for future work 193
Accelerated search for lower bounds
The lower bound in Section 4.3.6 utilizes the results of the Algorithm 4.1 to ﬁnd the
best solution amongst those explored so far (i.e., those for which the exact reward
has been evaluated). However, the decisions made by Algorithm 4.1 tend to focus on
reducing the upper bound to the reward rather than on ﬁnding solutions for which the
reward is high. It may be beneﬁcial to incorporate heuristic searches that introduce
additional candidate subsets that appear promising in order to raise the lower bound
quickly as well as decreasing the upper bound. One example of this would be to include
candidate subsets corresponding to the decisions made by the greedy heuristic—this
would ensure that a solution at least as good as that of the greedy heuristic will be
obtained regardless of when the algorithm is terminated.
Integration into branch and bound procedure
In the existing implementation, changes made between iterations of the integer program
force the solver to restart the optimization. A major computational saving may result
if a method is found that allows the solution of the previous iteration to be applied to
the new solution. This is easily performed in linear programming problems, but is more
diﬃcult in integer programming problems since the bounds previously evaluated in the
branch and bound process are (in general) invalidated. A further extension along the
same line would be to integrate the algorithm for generating new candidate sets with
the branch and bound procedure for the integer program.
6.2.3 Sensor network management
Chapter 5 provides a computable method for tracking an object using a sensor network,
with demonstrated empirical performance. Possible extensions include multiple objects
and performance guarantees.
Problems involving multiple objects
While the discussion in Chapter 5 focused on the case of a single object, the concept
may be easily extended to multiple objects. When objects are wellseparated in space,
one may utilize a parallel instance of the algorithm from Chapter 5 for each object.
When objects become close together and observations induce conditional dependency
in their states, one may either store the joint conditional distribution of the object
group at one sensor, or utilize a distributed representation across two or more sensors.
In the former case, there will be a control choice corresponding to breaking the joint
194 CHAPTER 6. CONTRIBUTIONS AND FUTURE DIRECTIONS
distribution into its marginals after the objects separate again. This will result in a loss
of information and a saving in communication cost, both of which could be incorporated
into the tradeoﬀ performed by the constrained optimization. In the case of a distributed
representation of the joint conditional distribution, it will be necessary to quantify both
the beneﬁt (in terms of estimation performance) and cost of each of the communications
involved in manipulating the distributed representation.
Performance guarantees
The algorithm presented in Chapter 5 does not possess any performance guarantee;
Section 3.9 provides an example of a situation in which one element of the approxi
mate algorithm performs poorly. An extension of both Chapters 3 and 5 is to exploit
additional problem structure and amend the algorithm to guarantee performance.
Bibliography
[1] Karim M. Abadir and Jan R. Magnus. Matrix Algebra. Cambridge University
Press, 2005.
[2] Eitan Altman. Constrained Markov decision processes. Chapman and Hall, Lon
don, UK, 1999.
[3] Brian D. O. Anderson and John B. Moore. Optimal ﬁltering. PrenticeHall, En
glewood Cliﬀs, NJ, 1979.
[4] M. Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp. A tuto
rial on particle ﬁlters for online nonlinear/nonGaussian Bayesian tracking. IEEE
Transactions on Signal Processing, 50(2):174–188, February 2002.
[5] Lawrence M. Ausubel. An eﬃcient ascendingbid auction for multiple objects.
American Economic Review, 94(5):1452–1475, December 2004.
[6] Yaakov BarShalom and XiaoRong Li. Estimation and Tracking: Principles, Tech
niques and Software. Artech House, Norwood, MA, 1993.
[7] M Behara. Additive and Nonadditive Measures of Entropy. Wiley Eastern Ltd,
New Delhi, India, 1990.
[8] P.E. Berry and D.A.B. Fogg. On the use of entropy for optimal radar resource man
agement and control. In Radar Conference, 2003. Proceedings of the International,
pages 572–577, 2003.
[9] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scien
tiﬁc, Belmont, MA, second edition, 2000.
[10] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientiﬁc, Belmont, MA,
second edition, 1999.
195
196 BIBLIOGRAPHY
[11] D.P. Bertsekas. Auction algorithms for network ﬂow problems: A tutorial intro
duction. Computational Optimization and Applications, 1:7–66, 1992.
[12] Dimitris Bertsimas and John N. Tsitsiklis. Introduction to Linear Optimization.
Athena Scientiﬁc, Belmont, MA, 1997.
[13] Frederick J. Beutler and Keith W. Ross. Optimal policies for controlled Markov
chains with a constraint. Journal of Mathematical Analysis and Applications, 112
(1):236–252, November 1985.
[14] Samuel S. Blackman and Robert Popoli. Design and Analysis of Modern Tracking
Systems. Artech House, Norwood, MA, 1999.
[15] V.D. Blondel and John N. Tsitsiklis. A survey of computational complexity results
in systems and control. Automatica, 36(9):1249–1274, September 2000.
[16] David A. Casta˜ n´on. Stochastic control bounds on sensor network performance. In
IEEE Conference on Decision and Control, pages 4939–4944, 2005.
[17] David A. Casta˜ n´on. Optimal search strategies in dynamic hypothesis testing. Sys
tems, Man and Cybernetics, IEEE Transactions on, 25(7):1130–1138, July 1995.
[18] David A. Casta˜ n´on. Approximate dynamic programming for sensor management.
In Proc 36th Conference on Decision and Control, pages 1202–1207. IEEE, Decem
ber 1997.
[19] Lei Chen, Martin J. Wainwright, M¨ ujdat C¸ etin, and Alan S. Willsky. Data as
sociation based on optimization in graphical models with application to sensor
networks. Mathematical and Computer Modelling, 43(910):1114–1135, May 2006.
[20] Amit S. Chhetri, Darryl Morrell, and Antonia PapandreouSuppappola. Energy
eﬃcient target tracking in a sensor network using nonmyopic sensor scheduling.
In Proc. Eighth International Conference of Information Fusion, July 2005.
[21] A.S. Chhetri, D. Morrell, and A. PapandreouSuppappola. Scheduling multiple
sensors using particle ﬁlters in target tracking. In IEEE Workshop on Statistical
Signal Processing, pages 549–552, September/October 2003.
[22] A.S. Chhetri, D. Morrell, and A. PapandreouSuppappola. Sensor scheduling using
a 01 mixed integer programming framework. In Fourth IEEE Workshop on Sensor
Array and Multichannel Processing, 2006.
BIBLIOGRAPHY 197
[23] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John
Wiley and Sons, New York, NY, 1991.
[24] O.E. Drummond, David A. Casta˜ n´on, and M.S. Bellovin. Comparison of 2D as
signment algorithms for sparse, rectangular, ﬂoating point, cost matrices. Journal
of the SDI Panels on Tracking, (4):81–97, December 1990.
[25] Emre Ertin, John W. Fisher, and Lee C. Potter. Maximum mutual information
principle for dynamic sensor query problems. In Proc IPSN 2003, pages 405–416.
SpringerVerlag, April 2003.
[26] Satoru Fujishige. Submodular functions and optimization, volume 58 of Annals of
discrete mathematics. Elsevier, Boston, MA, second edition, 2005.
[27] Arthur Gelb. Applied optimal estimation. MIT Press, Cambridge, MA, 1974.
[28] Neil Gordon, David J. Salmond, and A.F.M. Smith. Novel approach to nonlinear
and nonGaussian Bayesian state estimation. IEE Proceedings F: Radar and Signal
Processing, 140:107–113, 1993.
[29] Ying He and Edwin K. P. Chong. Sensor scheduling for target tracking: A Monte
Carlo sampling approach. Digital Signal Processing. to appear.
[30] M.L. Hernandez, T. Kirubarajan, and Y. BarShalom. Multisensor resource de
ployment using posterior cramerrao bounds. IEEE Transactions on Aerospace and
Electronic Systems, 40(2):399–416, 2004.
[31] Kenneth J. Hintz and Gregory A. McIntyre. Goal lattices for sensor management.
In Signal Processing, Sensor Fusion, and Target Recognition VIII, volume 3720,
pages 249–255. SPIE, 1999.
[32] K.J. Hintz and E.S. McVey. Multiprocess constrained estimation. Systems, Man
and Cybernetics, IEEE Transactions on, 21(1):237–244, 1991.
[33] Stephen Howard, Soﬁa Suvorova, and Bill Moran. Optimal policy for scheduling
of GaussMarkov systems. In Proceedings of the Seventh International Conference
on Information Fusion, 2004.
[34] A. T. Ihler, J. W. Fisher III, R. L. Moses, and A. S. Willsky. Nonparametric belief
propagation for selfcalibration in sensor networks. IEEE Journal of Selected Areas
in Communication, 2005.
198 BIBLIOGRAPHY
[35] A.T. Ihler, E.B. Sudderth, W.T. Freeman, and A.S. Willsky. Eﬃcient multiscale
sampling from products of Gaussian mixtures. In Neural Information Processing
Systems 17, 2003.
[36] A.T. Ihler, J.W. Fisher III, and A.S. Willsky. Communicationsconstrained infer
ence. Technical Report 2601, Massachusetts Institute of Technology Laboratory
for Information and Decision Systems, 2004.
[37] Mark Jones, Shashank Mehrotra, and Jae Hong Park. Tasking distributed sensor
networks. International Journal of High Performance Computing Applications, 16
(3):243–257, 2002.
[38] Michael I. Jordan. Graphical models. Statistical Science, 19(1):140–155, 2004.
[39] S.J. Julier and J.K. Uhlmann. Unscented ﬁltering and nonlinear estimation. Pro
ceedings of the IEEE, 92(3):401–422, March 2004.
[40] M. Kalandros and L.Y. Pao. Covariance control for multisensor systems. IEEE
Transactions on Aerospace and Electronic Systems, 38(4):1138–1157, 2002.
[41] Keith D. Kastella. Discrimination gain to optimize detection and classiﬁcation.
SPIE Signal and Data Processing of Small Targets, 2561(1):66–70, 1995.
[42] D.J. Kershaw and R.J. Evans. Optimal waveform selection for tracking systems.
IEEE Transactions on Information Theory, 40(5):1536–1550, September 1994.
[43] D.J. Kershaw and R.J. Evans. Waveform selective probabilistic data association.
IEEE Transactions on Aerospace and Electronic Systems, 33(4):1180–1188, Octo
ber 1997.
[44] Mark P. Kolba, Peter A. Torrione, and Leslie M. Collins. Informationbased sensor
management for landmine detection using multimodal sensors. In Detection and
Remediation Technologies for Mines and Minelike Targets X, volume 5794, pages
1098–1107. SPIE, 2005.
[45] J.H. Kotecha and P.M. Djuric. Gaussian particle ﬁltering. Signal Processing,
IEEE Transactions on [see also Acoustics, Speech, and Signal Processing, IEEE
Transactions on], 51(10):2592–2601, October 2003.
[46] Andreas Krause and Carlos Guestrin. Nearoptimal nonmyopic value of information
in graphical models. In UAI 2005, July 2005.
BIBLIOGRAPHY 199
[47] Andreas Krause, Carlos Guestrin, and Ajit Paul Singh. Nearoptimal sensor place
ments in Gaussian processes. In International Conference on Machine Learning,
August 2005.
[48] Andreas Krause, Carlos Guestrin, Anupam Gupta, and John Kleinberg. Near
optimal sensor placements: Maximizing information while minimizing communica
tion cost. In Fifth International Conference on Information Processing in Sensor
Networks, April 2006.
[49] Chris Kreucher, Keith Kastella, and Alfred O. Hero III. Informationbased sensor
management for multitarget tracking. In SPIE Signal and Data Processing of
Small Targets, volume 5204, pages 480–489. The International Society for Optical
Engineering, 2003.
[50] Chris Kreucher, Alfred O. Hero III, and Keith Kastella. A comparison of task
driven and information driven sensor management for target tracking. In IEEE
Conference on Decision and Control, December 2005.
[51] Chris Kreucher, Keith Kastella, and Alfred O. Hero III. Sensor management using
an active sensing approach. Signal Processing, 85(3):607–624, March 2005.
[52] Chris M. Kreucher, Keith Kastella, and Alfred O. Hero III. A bayesian method for
integrated multitarget tracking and sensor management. In International Confer
ence on Information Fusion, volume 1, pages 704–711, 2003.
[53] Chris M. Kreucher, Alfred O. Hero III, Keith Kastella, and Daniel Chang. Eﬃcient
methods of nonmyopic sensor management for multitarget tracking. In 43rd IEEE
Conference on Decision and Control, December 2004.
[54] Christopher M. Kreucher, Alfred O. Hero III, Keith D. Kastella, and Ben Shapo.
Informationbased sensor management for simultaneous multitarget tracking and
identiﬁcation. In Proceedings of The Thirteenth Annual Conference on Adaptive
Sensor Array Processing (ASAP), June 2005.
[55] V. Krishnamurthy. Algorithms for optimal scheduling and management of hidden
Markov model sensors. Signal Processing, IEEE Transactions on, 50(6):1382–1397,
2002.
200 BIBLIOGRAPHY
[56] V. Krishnamurthy and R.J. Evans. Hidden Markov model multiarm bandits: a
methodology for beam scheduling in multitarget tracking. Signal Processing, IEEE
Transactions on, 49(12):2893–2908, December 2001.
[57] V. Krishnamurthy and R.J. Evans. Correction to ‘Hidden Markov model multi
arm bandits: a methodology for beam scheduling in multitarget tracking’. Signal
Processing, IEEE Transactions on, 51(6):1662–1663, June 2003.
[58] Thomas Kurien. Issues in the design of practical multitarget tracking algorithms. In
MultitargetMultisensor Tracking: Advanced Applications, pages 43–83, Norwood,
MA, 1990. ArtechHouse.
[59] B. La Scala, M. Rezaeian, and B. Moran. Optimal adaptive waveform selection for
target tracking. In Proceedings of the Eighth International Conference on Infor
mation Fusion, volume 1, pages 552–557, 2005.
[60] B.F. La Scala, W. Moran, and R.J. Evans. Optimal adaptive waveform selection
for target detection. In Proceedings of the International Radar Conference, pages
492–496, September 2003.
[61] II Lewis, P. The characteristic selection problem in recognition systems. IEEE
Transactions on Information Theory, 8(2):171–178, February 1962. ISSN 0018
9448.
[62] Dan Li, Kerry D. Wong, Yu Hen Hu, and Akbar M. Sayeed. Detection, classi
ﬁcation, and tracking of targets. IEEE Signal Processing Magazine, 19(2):17–29,
March 2002.
[63] Michael L. Littman, Anthony R. Cassandra, and Leslie Pack Kaelbling. Eﬃcient
dynamicprogramming updates in partially observable Markov decision processes.
Technical Report CS9519, Brown University, 1995.
[64] Juan Liu, James Reich, and Feng Zhao. Collaborative innetwork processing for
target tracking. EURASIP Journal on Applied Signal Processing, (4):378–391,
2003.
[65] A. Logothetis and A. Isaksson. On sensor scheduling via information theoretic
criteria. In Proceedings of the American Control Conference, volume 4, pages
2402–2406, San Diego, CA, June 1999.
BIBLIOGRAPHY 201
[66] Ronald P. S. Mahler. Global posterior densities for sensor management. In Acqui
sition, Tracking, and Pointing XII, volume 3365, pages 252–263. SPIE, 1998.
[67] Peter S. Maybeck. Stochastic Models, Estimation, and Control, volume 1. Navtech,
Arlington, VA, 1994.
[68] Peter S. Maybeck. Stochastic Models, Estimation, and Control, volume 2. Navtech,
Arlington, VA, 1994.
[69] Gregory A. McIntyre and Kenneth J. Hintz. Information theoretic approach to
sensor scheduling. In Signal Processing, Sensor Fusion, and Target Recognition V,
volume 2755, pages 304–312. SPIE, 1996.
[70] Bill Moran, Soﬁa Suvorova, and Stephen Howard. Advances in Sensing with Se
curity Applications, chapter Sensor management for radar: a tutorial. Springer
Verlag, 2006.
[71] S. Mori, CheeYee Chong, E. Tse, and R. Wishner. Tracking and classifying mul
tiple targets without a priori identiﬁcation. IEEE Transactions on Automatic
Control, 31(5):401–409, May 1986. ISSN 00189286.
[72] James Munkres. Algorithms for the assignment and transportation problems. Jour
nal of the Society for Industrial and Applied Mathematics, 5(1):32–38, March 1957.
[73] Kevin P. Murphy. Dynamic Bayesian networks: representation, inference and
learning. PhD thesis, University of California, Berkeley, 2002.
[74] A. Nedich, M.K. Schneider, and R.B. Washburn. Farsighted sensor management
strategies for move/stop tracking. In Proceedings of the Eighth International Con
ference on Information Fusion, volume 1, pages 566–573, 2005.
[75] George L. Nemhauser and Laurence A. Wolsey. Integer and combinatorial opti
mization. Wiley, 1988.
[76] G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher. An analysis of approximations
for maximizing submodular set functions–I. Mathematical Programming, 14(1):
265–294, December 1978.
[77] G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher. An analysis of approximations
for maximizing submodular set functions–II. In M.L. Balinski and A.J. Hoﬀman,
202 BIBLIOGRAPHY
editors, Polyhedral combinatorics, volume 8 of Mathematical programming study,
pages 73–87. Elsevier, 1978.
[78] C.H. Papadimitrou and John N. Tsitsiklis. The complexity of Markov decision
processes. Mathematics of Operations Research, 12(3):441–450, August 1987.
[79] David C. Parkes and Lyle H. Ungar. Iterative combinatorial auctions: Theory
and practice. In Proc 17th National Conference on Artiﬁcial Intelligence (AAAI),
pages 74–81, 2000.
[80] Kris Pister. Smart dust (keynote address). In IPSN ’03, April 2003.
[81] G.J. Pottie and W.J. Kaiser. Wireless integrated network sensors. Communications
of the ACM, 43(5):51–58, May 2000.
[82] Donald B. Reid. An algorithm for tracking multiple targets. IEEE Transactions
on Automatic Control, AC24(6):843–854, December 1979.
[83] Branko Ristic, Sanjeev Arulampalam, and Neil Gordon. Beyond the Kalman Filter:
Particle Filters for Tracking Applications. Artech House, 2004.
[84] Louis L. Scharf. Statistical Signal Processing: Detection, Estimation and Time
Series Analysis. AddisonWesley, Reading, MA, 1991.
[85] M.K. Schneider, G.L. Mealy, and F.M. Pait. Closing the loop in sensor fusion sys
tems: stochastic dynamic programming approaches. In Proceedings of the Ameri
can Control Conference, volume 5, pages 4752–4757, 2004.
[86] Sumeetpal Singh, BaNgu Vo, Robin J. Evans, and Arnaud Doucet. Variance
reduction for monte carlo implementation of adaptive sensor management. In Proc.
Seventh International Conference of Information Fusion, pages 901–908, 2004.
[87] Richard D. Smallwood and Edward J. Sondik. The optimal control of partially
observable Markov decision processes over a ﬁnite horizon. Operations Research,
21(5):1071–1088, 1973.
[88] E. Sudderth, A.T. Ihler, W. Freeman, and A.S. Willsky. Nonparametric belief prop
agation. Technical Report LIDSTR2551, Massachusetts Institute of Technology,
2002.
[89] E.B. Sudderth, A.T. Ihler, W.T. Freeman, and A.S. Willsky. Nonparametric belief
propagation. In Computer Vision and Pattern Recognition, 2003.
BIBLIOGRAPHY 203
[90] P. Tichavsky, C.H. Muravchik, and A. Nehorai. Posterior Cram´erRao bounds for
discretetime nonlinear ﬁltering. IEEE Transactions on Signal Processing, 46(5):
1386–1396, 1998.
[91] Harry L. Van Trees. Detection, Estimation, and Modulation Theory. Wiley
Interscience, 2001.
[92] Martin J. Wainwright. Stochastic Processes on Graphs: Geometric and Variational
Approaches. PhD thesis, Massachusetts Institute of Technology, 2002.
[93] R.B. Washburn, M.K. Schneider, and J.J. Fox. Stochastic dynamic programming
based approaches to sensor resource management. In Proceedings of the Fifth
International Conference on Information Fusion, volume 1, pages 608–615, 2002.
[94] Kirk A. Yost and Alan R. Washburn. The LP/POMDP marriage: Optimiza
tion with imperfect information. Naval Research Logistics, 47(8):607–619, October
2000.
[95] Feng Zhao, Jaewon Shin, and James Reich. Informationdriven dynamic sensor
collaboration. IEEE Signal Processing Magazine, 19(2):61–72, March 2002.
[96] J.H. Zwaga and H. Driessen. Tracking performance constrained MFR parameter
control: applying constraints on prediction accuracy. In International Conference
on Information Fusion, volume 1, July 2005.
[97] J.H. Zwaga, Y. Boers, and H. Driessen. On tracking performance constrained MFR
parameter control. In International Conference on Information Fusion, volume 1,
pages 712–718, 2003.
2
Information Theoretic Sensor Management
by Jason L. Williams Submitted to the Department of Electrical Engineering and Computer Science on January 12, 2007 in Partial Fulﬁllment of the Requirements for the Degree of Doctor of Philosophy in Electrical Engineering and Computer Science Abstract Sensor management may be deﬁned as those stochastic control problems in which control values are selected to inﬂuence sensing parameters in order to maximize the utility of the resulting measurements for an underlying detection or estimation problem. While problems of this type can be formulated as a dynamic program, the state space of the program is in general inﬁnite, and traditional solution techniques are inapplicable. Despite this fact, many authors have applied simple heuristics such as greedy or myopic controllers with great success. This thesis studies sensor management problems in which information theoretic quantities such as entropy are utilized to measure detection or estimation performance. The work has two emphases: ﬁrstly, we seek performance bounds which guarantee performance of the greedy heuristic and derivatives thereof in certain classes of problems. Secondly, we seek to extend these basic heuristic controllers to ﬁnd algorithms that provide improved performance and are applicable in larger classes of problems for which the performance bounds do not apply. The primary problem of interest is multiple object tracking and identiﬁcation; application areas include sensor network management and multifunction radar control. Utilizing the property of submodularity, as proposed for related problems by diﬀerent authors, we show that the greedy heuristic applied to sequential selection problems with information theoretic objectives is guaranteed to achieve at least half of the optimal reward. Tighter guarantees are obtained for diﬀusive problems and for problems involving discounted rewards. Online computable guarantees also provide tighter bounds in speciﬁc problems. The basic result applies to open loop selections, where all decisions are made before any observation values are received; we also show that the closed loop
4 greedy heuristic, which utilizes observations received in the interim in its subsequent decisions, possesses the same guarantee relative to the open loop optimal, and that no such guarantee exists relative to the optimal closed loop performance. The same mathematical property is utilized to obtain an algorithm that exploits the structure of selection problems involving multiple independent objects. The algorithm involves a sequence of integer programs which provide progressively tighter upper bounds to the true optimal reward. An auxiliary problem provides progressively tighter lower bounds, which can be used to terminate when a nearoptimal solution has been found. The formulation involves an abstract resource consumption model, which allows observations that expend diﬀerent amounts of available time. Finally, we present a heuristic approximation for an object tracking problem in a sensor network, which permits a direct tradeoﬀ between estimation performance and energy consumption. We approach the tradeoﬀ through a constrained optimization framework, seeking to either optimize estimation performance over a rolling horizon subject to a constraint on energy consumption, or to optimize energy consumption subject to a constraint on estimation performance. Lagrangian relaxation is used alongside a series of heuristic approximations to ﬁnd a tractable solution that captures the essential structure in the problem.
Thesis Supervisors: John W. Fisher III† and Alan S. Willsky‡ Title: † Principal Research Scientist, Computer Science and Artiﬁcial Intelligence Laboratory ‡ Edwin Sibley Webster Professor of Electrical Engineering
Prof David Casta˜´n (BU) and Prof Dimitri no Bertsekas for oﬀering their time and advice. the MIT Lincoln Laboratory Advanced Concepts Committee and the Air Force Oﬃce of Scientiﬁc Research all supported this research at various stages of development. Pat Kreidl.2. if bad. Emily Fox and Kush Varshney. There are many people whom I must thank for making this opportunity the great experience that it has been. Dr John Fisher III and Prof Alan Willsky. Vikram Krishnamurthy (UBC) and David Choi ﬁrst pointed me to the recent work applying submodularity to sensor management problems. The Army Research Oﬃce. have been a constant sounding board for halfbaked ideas over the years—I will certainly miss them on a professional level and on a personal level. The members of the Eastgate Bible study.Acknowledgements We ought to give thanks for all fortune: if it is good.1. we will surely miss them as we leave Boston. C. humility and the contempt of this world and the hope of our eternal country. because it is good. whose support. because it works in us patience. Prof Casta˜´n suggested applying column no generation techniques to Section 4. the Graduate Christian Fellowship and the Westgate community have been an invaluable source of friendship and support for both Jeanette and I. Various conversations with David Choi. 5 . not to mention my other lab mates in the Stochastic Systems Group. which resulted in the development in Section 4.S.3. Dan Rudoy and John Weatherwax (MIT Lincoln Laboratory) as well as Michael Schneider (BAE Systems Advanced Information Technologies) provided valuable input in the development of many of the formulations studied. Lewis It has been a wonderful privilege to have been able to study under and alongside such a tremendous group of people in this institution over the past few years. Firstly. I oﬀer my sincerest thanks to my advisors. counsel and encouragement has guided me through these years. which led to the results in Chapter 3. My oﬃce mates. Thanks go to my committee members.
. to the God who creates and sustains. support and encouragement. growth and health with which I have been blessed over these years.6 ACKNOWLEDGEMENTS I owe my deepest gratitude to my wife. Finally. Thanks also go to my parents and extended family for their support as they have patiently awaited our return home. Jeanette. I humbly refer recognition for all success. The Lord gives and the Lord takes away. being an everpresent source of companionship. may the name of the Lord be praised. friendship and humour. who has followed me around the world on this crazy journey. I am extremely privileged to beneﬁt from her unwavering love.
. . . . 2. .4. . . . . . . . . . . . .2. . .2 Eﬃcient solution for beam steering problems 1.1 Dynamical models and estimation . . .1. . . . . . .1. .1 Dynamical models . . . . .4 Particle ﬁlters and importance sampling . . . . .3 Linearized and extended Kalman ﬁlter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1. . . . . . . . .1. . . . . . . .4. 2. . . . . . . . . . . 2. . . . . . . e 2. . .3 Sensor networks . . . . . . . 2. . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1. .4 Contributions and thesis outline . . . .5 Graphical models . . closed loop and open loop feedback 2. . . . . . .Contents Abstract Acknowledgements List of Figures 1 Introduction 1.1 Canonical problem structures . . . . . . . .1 Performance guarantees for greedy heuristics 1. . . . . . 2. . . . . . .1. . . . . .2. . . . .2 Open loop. 1. . . . . . . . . . . . . .2. . . . . . . 2. . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Cram´rRao bound . . . . . . . . . . . . . 2. .3 Constrained dynamic programming . . . . .2 Waveform selection and beam steering . . . . . . . . . . . . . . . . . . 1. . . . . . . . . . . . . . . . .2 Markov decision processes . . . .2 Kalman ﬁlter . . . . . . . . . . . . . . . . . . .1. . . .4. . 2 Background 2. . . . . . . .1 Partially observed Markov decision processes . . . . . . .3 Sensor network management . . . . . . . . . . . 3 5 13 19 20 20 22 22 23 23 23 25 25 26 27 29 30 32 33 35 37 38 38 7 . . . . . . . . . . . 2. . . .
. . . . . . . . . 2. . . . . . . . . . . . .6. . . . . . . . . . .6 Formulations and geometry . . . . . . . .1. 2.1. 2. . . . . . 2. . . . . 2. . . . . . . . . . . . . . . . . . . .4 Greedy heuristics and extensions . . . . . . .1 POMDP and POMDPlike models . . . . . . Linear and integer programming . . . . . . . Relaxations . . . . . . . . . . .1 Set functions and increments . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . .6. . . . . . . . 2. . . . . . . . . . . . . . . . . . . . . . . submodularity and greedy heuristics . . . . . . . . . . . Related work . . .6 . . . . . . . . . . . . . . . . . . . . . . . . .3 Independence systems and matroids . . . . 3. . 2. . . . . . . . . . . . . . . . . . . 2. . . . . . . . . . . . . . Cutting plane methods . .4 Example: beam steering . . . . . . . . . . . . .3 KullbackLeibler distance .4. .5 2. . . . .3 Information theoretic objectives . . . . . . . . . . . 2. . . . .1 A simple performance guarantee .3. . . . . . . . .3 Suboptimal control . . . . . . . . .4.1 Entropy . . . . . . . . . . . . . . . . . . . . . .4 2. . . . . . 2. . . . 2. . . . . . . . . . .6 Other relevant work . . . . . . . . . . . . . . . .4 Greedy heuristic for matroids . . . .3. . . . . . . .1 Comparison to matroid guarantee .2 Submodularity . . . . . . . . . . . . . . . . . . . .6. . . .3. . . .8 2. . . . . . . . 2. .3. . . . . . . . . . . . .5.4. . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Integer programming . . . . . 2. . . . . . . . . . . . . . . . . . . . . . . . . . . 2. . . . . . . . . . . .2 Column generation and constraint generation 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6. . . . 3. . . . . . . . . . . . . . 2.3 Online version of guarantee . .4 Linear Gaussian models . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . .5 Example: waveform selection . . . . . . 3. . Branch and bound . . . . . . . . . 2. . .1 Linear programming . . . . . . . . . . . . . . . . . . 2. . . .4.5 Axioms resulting in entropy . . . . . . . . . .2 Tightness of bound .2 Model simpliﬁcations . . . . . . .5 Existing work on performance guarantees .6. . . . 2. . . . . . 2.2 Mutual information . . . 3. . . . .7 Contrast to our contributions . .4. . . . . . . . . . . . . . . . . . . . .1. . . . 2. . . . . . . . . . . . 3. . . . . . . . . . . . . . . . . . . . .5 Greedy heuristic for arbitrary subsets . . . . . . . . . . . 40 41 42 43 44 45 46 48 48 50 51 54 55 57 57 58 59 59 60 60 61 61 62 62 62 65 65 66 69 70 72 72 73 74 76 3 Greedy heuristics and performance guarantees 3. . . .6. . . . .3. . . . . . . . . . Set functions. . . . CONTENTS 2.5. . . .6. . . . . . . . . . . . . . . . . . .1. . . . . . .5. . . . . . . . . .
. . . .5 Example: bearings only measurements . . . 4. . . . . . . . . . . . . . . . . . . . . . .4 Time invariant rewards . . . . . . . . . . . . . . .6 Early termination . . . 4. . . . . . .4. . . . . .9 Extension: platform steering . . 3. . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . 3. . . . 3. . . . . .7 Estimation of rewards . . . . 3. . . . . . . . . . . . . 4 Independent objects and integer programming 4. . . . . . . 4. .2 Formulation as an assignment problem .2. . 4. . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . .8.1 Counterexample: closed loop greedy versus closed loop optimal 3. . . .2 Formulation of the integer program in each iteration 4. 3. . . . . . . . . . .3 Closed loop subset selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Integer programming formulation . . . . . . . . . .2. .3. . . .2. . . . . . . . . . . . .1 Basic formulation .8 Extension: general matroid problems . . . 4. . . . 4. 4. . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Independent objects. . .1 Online guarantee . . . . . . .4 Computational experiments . . . . . .5 Theoretical characteristics . . . .2 Counterexample: closed loop greedy versus open loop greedy . .2. . 3. . .3 Example . . .1. . . .5. . . . . .1. . . .4 Example . . . . . . . . . . . . 3. . . 3. . . . . . . .1 Example: beam steering . . . . . . .1 Observation sets . . . . . . . . . .3.3 Constraint generation approach . . . . .2 Specialization to trees and chains . . . . . .3 Discounted rewards . . . . . . . . . . . . . . .3. . . . . . . . . . 4. 3. . . . . . . . . 4. . . . . . . . . . . . . . . .2. . . . . . . . .10 Conclusion . . . . . . . 4. . 3. . . . . . 3. . . . . . . .1 Example . . . . . . . .6 Guarantees on the Cram´rRao bound . . . .2. . . . . . . . . . . . . . . . . . .2. additive rewards . . . . 3. . .3 Establishing the diﬀusive property . .3. . . . . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . .2 Integer programming generalization . . . . . . . . e 3. . . . . . . . . . . . . . . 3. . . . . . . . . .3 Iterative algorithm . . . . 3. . .5. . . .4 Example: beam steering revisited . . . . . . . . . . . . . . . . . . . . . .5 Closed loop control . . .1 Implementation notes . . . .2 Exploiting diﬀusiveness . . . . . . . . . . . . . . . . . . . .CONTENTS 9 . .5. . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . 79 81 82 83 84 84 87 91 93 97 98 99 101 104 105 106 106 110 111 112 112 113 116 119 120 120 122 124 126 130 131 135 140 141 141 3. . 3. . . . . . . . . . .
. . . . . . . . . 5. . . . . . . . . . . . .1 Constrained Dynamic Programming Formulation . . . . . . . . . . . . . .2 Computational experiment: waveform selection . . . . . . .1 Summary of contributions . . . . . . . .2 Eﬃcient solution for beam steering problems 6.9 Sequential subgradient update . . . . . . .1 Avoiding redundant observation subsets . . . . . . . . . . . . . .4. . . . . . . . . . . . . . . . . . . . . .3 Constrained communication formulation . .1. . . . . 5. .1. .5 Example of potential beneﬁt: multiple time slot observations Time invariant rewards . 6. . . . . . . . . . . . . . . . . . . . . . . .4. . . . . .8 nScan pruning . . . . . . . . . . .4 Conclusion and future work .5 4.2 Decoupled Leader Node Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . . .2. .1. . . . . . . . .5. . .4. . . . . . . . . . .2 Communications . . . . . . . .5 Evaluation through Monte Carlo simulation 5. .10 Rollout . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . 6 Contributions and future directions 6.1. . . . . . . . . . .4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11 Surrogate constraints . 4. . . . . . . . . .5. . . . . . . . . .1 Formulation . . . . . . . . . . . . . . . . . . .1. . . . . . 5. .2. . . . . . 5. 5. . 5. . . . . . . . . . .4 Constrained entropy formulation . . 5. . . . . . . . . . . . . 5. . . . . . . .7 Greedy sensor subset selection . . . .3 Example of potential beneﬁt . . . . . . . . . . . . . . . . .4 Example of potential beneﬁt: single time slot observations . . . . . . . . 5. . . . . . . . . . . .1 Performance guarantees .1. . 142 146 150 151 153 155 156 158 160 163 164 165 166 167 168 169 169 171 177 177 179 179 180 180 181 184 189 189 189 190 190 191 191 191 5 Sensor management in sensor networks 5. .1. . .3 State dependent observation noise . . . . .2 Recommendations for future work . . . .10 CONTENTS 4. . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . 6. . 5. . . . 4.1.3 Simulation results . . . .1. . . . . . . . . . . . . . . . .1. . . . . . . . . . Conclusion . . . . . . . . . . . . . . . .6 4. . . 4. . . . . . . . . 5. .2 Waveform selection . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . .6 Linearized Gaussian approximation . . . .3 Sensor network management .1. . . . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . .1 Estimation objective . .1. . . . . 4. .5. . . . . . . . . . . . . . . . . . . . . . . 5. 5. . . .1 Performance guarantees for greedy heuristics 6. . . . . . Guarantees for longer lookahead lengths . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sensor network management . . . . . . . . . .2. . . . . . . . . . . . . . . . . . .2. .3 Bibliography . . . . . . . . . . . . . . .CONTENTS 11 Observations consuming diﬀerent resources . Closed loop guarantees . . . . . . . . . . . . . Deferred reward calculation . . . . . . . 191 192 192 192 192 192 193 193 193 193 194 195 6. . . .2 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Accelerated search for lower bounds . . . . . . . . . . . . Problems involving multiple objects . Integration into branch and bound procedure . . . . . . Performance guarantees . . Stronger guarantees exploiting additional structure Integer programming formulation of beam steering Alternative update algorithms . . . . . . . . . . . . . . . . . . . .
12 CONTENTS .
List of Figures
2.1
2.2
Contour plots of the optimal reward to go function for a single time step and for four time steps. Smaller values are shown in blue while larger values are shown in red. . . . . . . . . . . . . . . . . . . . . . . . . . . . Reward in single stage continuous relaxation as a function of the parameter α. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . (a) shows total reward accrued by the greedy heuristic in the 200 time steps for diﬀerent diﬀusion strength values (q), and the bound on optimal obtained through Theorem 3.2. (b) shows the ratio of these curves, providing the factor of optimality guaranteed by the bound. . . . . . . . (a) shows region boundary and vehicle path (counterclockwise, starting from the left end of the lower straight segment). When the vehicle is located at a ‘ ’ mark, any one grid element with center inside the surrounding dotted ellipse may be measured. (b) graphs reward accrued by the greedy heuristic after diﬀerent periods of time, and the bound on the optimal sequence for the same time period. (c) shows the ratio of these two curves, providing the factor of optimality guaranteed by the bound. Marginal entropy of each grid cell after 75, 225 and 525 steps. Blue indicates the lowest uncertainty, while red indicates the highest. Vehicle path is clockwise, commencing from topleft. Each revolution takes 300 steps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Strongest diﬀusive coeﬃcient versus covariance upper limit for various values of q , with r = 1. Note that lower values of α∗ correspond to ˜ ˜ stronger diﬀusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
49 49
3.1
75
3.2
77
3.3
78
3.4
85
13
14 3.5
LIST OF FIGURES
3.6
3.7
(a) shows total reward accrued by the greedy heuristic in the 200 time steps for diﬀerent diﬀusion strength values (q), and the bound on optimal obtained through Theorem 3.5. (b) shows the ratio of these curves, providing the factor of optimality guaranteed by the bound. . . . . . . . 86 (a) shows average total reward accrued by the greedy heuristic in the 200 time steps for diﬀerent diﬀusion strength values (q), and the bound on optimal obtained through Theorem 3.5. (b) shows the ratio of these curves, providing the factor of optimality guaranteed by the bound. . . . 88 (a) shows the observations chosen in the example in Sections 3.1.4 and 3.2.4 when q = 1. (b) shows the smaller set of observations chosen in the constrained problem using the matroid selection algorithm. . . . . . . . 107 Example of operation of assignment formulation. Each “strip” in the diagram corresponds to the reward for observing a particular object at diﬀerent times over the 10step planning horizon (assuming that it is only observed once within the horizon). The role of the auction algorithm is to pick one unique object to observe at each time in the planning horizon in order to maximize the sum of the rewards gained. The optimal solution is shown as black dots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Example of randomly generated detection map. The color intensity indicates the probability of detection at each x and y position in the region.117 Performance tracking M = 20 objects. Performance is measured as the average (over the 200 simulations) total change in entropy due to incorporating chosen measurements over all time. The point with a planning horizon of zero corresponds to observing objects sequentially; with a planning horizon of one the auctionbased method is equivalent to greedy selection. Error bars indicate 1σ conﬁdence bounds for the estimate of average total reward. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.1
4.2 4.3
LIST OF FIGURES
15
4.4
4.5
4.6
4.7
Subsets available in iteration l of example scenario. The integer program may select for each object any candidate subset in Tl i , illustrated by the circles, augmented by any subset of elements from the corresponding exploration subset, illustrated by the rectangle connected to the circle. The sets are constructed such that there is a unique way of selecting any subset of observations in S i . The subsets selected for each object must collectively satisfy the resource constraints in order to be feasible. The shaded candidate subsets and exploration subset elements denote the solution of the integer program at this iteration. . . . . . . . . . . . Subsets available in iteration (l+1) of example scenario. The subsets that were modiﬁed in the update between iterations l and (l + 1) are shaded. There remains a unique way of selecting each subset of observations; e.g., the only way to select elements g and e together (for object 2) is to select the new candidate subset {e, g}, since element e was removed from the 2 exploration subset for candidate subset {g} (i.e., Bl+1,{g} ). . . . . . . . . Four iterations of operations performed by Algorithm 4.1 on object 1 (arranged in counterclockwise order, from the topleft). The circles in each iteration show the candidate subsets, while the attached rectangles show the corresponding exploration subsets. The shaded circles and rectangles in iterations 1, 2 and 3 denote the sets that were updated prior to that iteration. The solution to the integer program in each iteration is shown along with the reward in the integer program objective (“IP reward”), which is an upper bound to the exact reward, and the exact reward of the integer program solution (“reward”). . . . . . . . . . . . . . . . . . . The two radar sensor platforms move along the racetrack patterns shown by the solid lines; the position of the two platforms in the tenth time slot is shown by the ‘*’ marks. The sensor platforms complete 1.7 revolutions of the pattern in the 200 time slots in the simulation. M objects are positioned randomly within the [10, 100]×[10, 100] according to a uniform distribution, as illustrated by the ‘ ’ marks. . . . . . . . . . . . . . . .
125
127
133
143
16 4.8
LIST OF FIGURES
4.9
4.10
4.11
4.12
4.13
Results of Monte Carlo simulations for planning horizons between one and 30 time slots (in each sensor). Top diagram shows results for 50 objects, while middle diagram shows results for 80 objects. Each trace in the plots shows the total reward (i.e., the sum of the MI reductions in each time step) of a single Monte Carlo simulation for diﬀerent planning horizon lengths divided by the total reward with the planning horizon set to a single time step, giving an indication of the improvement due to additional planning. Bottom diagram shows the computation complexity (measured through the average number of seconds to produce a plan for the planning horizon) versus the planning horizon length. . . . . . . . . Computational complexity (measured as the average number of seconds to produce a plan for the 10step planning horizon) for diﬀerent numbers of objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Top diagram shows the total reward for each planning horizon length divided by the total reward for a single step planning horizon, averaged over 20 Monte Carlo simulations. Error bars show the standard deviation of the mean performance estimate. Lower diagram shows the average time required to produce plan for the diﬀerent length planning horizon lengths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Upper diagram shows the total reward obtained in the simulation using diﬀerent planning horizon lengths, divided by the total reward when the planning horizon is one. Lower diagram shows the average computation time to produce a plan for the following N steps. . . . . . . . . . . . . . Upper diagram shows the total reward obtained in the simulation using diﬀerent planning horizon lengths, divided by the total reward when the planning horizon is one. Lower diagram shows the average computation time to produce a plan for the following N steps. . . . . . . . . . . . . . Diagram illustrates the variation of rewards over the 50 time step planning horizon commencing from time step k = 101. The line plots the ratio between the reward of each observation at time step in the planning horizon and the reward of the same observation at the ﬁrst time slot in the planning horizon, averaged over 50 objects. The error bars show the standard deviation of the ratio, i.e., the variation between objects.
145
147
149
152
154
157
. .1 Tree structure for evaluation of the dynamic program through simulation. .4 174 5. . 5. . . Computation tree equivalent to Fig.14 Top diagram shows the total reward for each planning horizon length divided by the total reward for a single step planning horizon. . . . . . . . . . 161 5. . Computation tree after applying the linearized Gaussian approximation of Section 5. . . 5. averaged over 17 Monte Carlo simulations. . . . . . resulting from further decomposing sensor subset selection problem into a generalized stopping problem. . . . . . . . . . in which each substage allows one to terminate and move onto the next time slot with the current set of selected sensors. . . . . . .2 5. At each stage. . . . . At each stage new leaves are generated extending each remaining sequence with using each new leader node. . . . . . . . . . . . . . . . . . . . . . . . . . selecting leader node for each stage and then selecting the subset of sensors to activate. . . . . divided by the total reward when the planning horizon is one. .6. . . . . . .3 173 5. . . . . . . . . . . . . . . . . . . .2 and Fig. . . .1. . . . . 172 172 5. . . . . resulting from decomposition of control choices into distinct stages. and a set of simulated values of the resulting observations.5 176 . Tree structure for nscan pruning algorithm with n = 1. Lower diagram shows the average time required to produce plan for the diﬀerent length planning horizon lengths. . . . . . . . . . . . . . . . . . . . . all but the best sequence ending with each leader node is discarded (marked with ‘×’). . .15 Upper diagram shows the total reward obtained in the simulation using diﬀerent planning horizon lengths. . . . .LIST OF FIGURES 17 4. . . . . . . 5. . . . a tail subproblem is required to be evaluated each new control. . . 159 4.3. Subsequently. . Error bars show the standard deviation of the mean performance estimate. .2. . or to add an additional sensor. Computation tree equivalent to Fig. . . . . . . . . Lower diagram shows the average computation time to produce a plan for the following N steps. . and the remaining sequences are extended using greedy sensor subset selection (marked with ‘G’). .
. . . . . . . . . . . . Ellipse centers show the mean in each axis over 100 Monte Carlo runs. ellipses illustrate covariance. . . . and corresponding cumulative communication costs. . . . . . . . . . . . 186 . . Upper ﬁgure compares average position entropy to communication cost. . . . . . . .7 5. Ellipse centers show the mean in each axis over 100 Monte Carlo runs. compared to the method which dynamically selects the leader node to minimize the expected communication cost consumed in implementing a ﬁxed sensor management scheme. . . The ﬁxed sensor management scheme activates the sensor (‘greedy’) or two sensors (‘greedy 2’) with the observation or observations producing the largest expected reduction in entropy. . . . . . ellipses illustrate covariance. . .. . . . . . .18 5. 185 Position entropy and communication cost for dynamic programming method with communication constraint (DP CC) and information constraint (DP IC). . while lower ﬁgure compares average of the minimum entropy over blocks of the same length as the planning horizon (i. . providing an indication of the variability across simulations. . .6 LIST OF FIGURES 5. . providing an indication of the variability across simulations. .8 Position entropy and communication cost for dynamic programming method with communication constraint (DP CC) and information constraint (DP IC) with diﬀerent planning horizon lengths (N ).e. 183 Adaptation of communication constraint dual variable λk for diﬀerent horizon lengths for a single Monte Carlo run. . . the quantity to which the constraint is applied) to communication cost. . . and the sensor with the smallest expected square distance to the object (min expect dist). . compared to the methods selecting as leader node and activating the sensor with the largest mutual information (greedy MI). .
conceptually. in general the optimal solution of these problems requires computation and storage of continuous functions with no ﬁnite parameterization. 19 . Sensor management deals with such situations: where the objective is to maximize the utility of measurements for an underlying detection or estimation task. substantial performance gains can be obtained by exploiting this ability. adaptively controlling sensors to maximize the utility of the information received. we exploit structure that arises in problems involving multiple independent objects to eﬃciently ﬁnd optimal or guaranteed nearoptimal solutions. control choices and time steps. solved using dynamic programming. and tracking and identifying aircraft in the vicinity of an airport using radar. sensors. secondly. hence it is intractable even problems involving small numbers of objects. and ﬁnally. Examples include detecting the presence of a heart disease using measurements from MRI.Chapter 1 Introduction D ETECTION and estimation theory considers the problem of utilizing noisecorrupted observations to infer the state of some underlying process or phenomenon. estimating ocean currents using image data from satellite. In many problem contexts. We follow three diﬀerent approaches: ﬁrstly. This thesis examines several types of sensor resource management problems. we ﬁnd a heuristic solution to a speciﬁc problem structure that arises in problems involving sensor networks. we examine performance guarantees that can be obtained for simple heuristic algorithms applied to certain classes of problems. Many modern sensors are able to rapidly change mode of operation and steer between physically separated objects. However. Sensor management problems involving multiple time steps (in which decisions at a particular stage may utilize information received in all prior stages) can be formulated and. detecting and tracking people using video cameras.
An additional complication that commonly arises is when observations require diﬀerent or random time durations (or. A related problem involves multiple objects observed by a sensor. At each time step. An example of this problem is in optical tracking and identiﬁcation using steerable cameras.1 Canonical problem structures Sensor resource management has received considerable attention from the research community over the past two decades. Beam steering. beam steering and platform steering with multiple objects and multiple sensors. Each object evolves according to an independent stochastic process. 1. Platform steering. costs) to complete. These three structures can be combined and extended to scenarios involving waveform selection. in which diﬀerent signals can be transmitted in order to obtain information about diﬀerent aspects of the object state (such as position. and dynamic routing of measurements and models in sensor networks. An example of this problem is in object tracking using radar. the observation models corresponding to diﬀerent objects are also independent. which can be observed using diﬀerent modes of a sensor. where the time/frequency characteristics . which have been discussed by several authors. the controller may choose which object to observe. velocity or identity). The role of the controller is to select which object to observe in each time step. The internal state evolves according to a fully observed Markov random process.20 CHAPTER 1. Its name is derived from active radar and sonar.2 Waveform selection and beam steering The waveform selection problem naturally arises in many diﬀerent application areas. provide a rough classiﬁcation of existing work. The ﬁrst problem structure involves a single object. The role of the controller is to select the best mode of operation for the sensor in each time step. A third problem structure arises when the sensor possesses an internal state that aﬀects which observations are available or the costs of obtaining those observations. more abstractly. The following three canonical problem structures. but only one mode can be used at a time. The controller must choose actions to inﬂuence the sensor state such that the usefulness of the observations is optimized. Examples of this structure include control of UAV sensing platforms. and of the problems examined in this thesis: Waveform selection. INTRODUCTION 1.
and optimizing the measure over the next time step. patient discomfort. radar warning receivers) often have limited bandwidth. one in which diﬀerent control decisions obtain diﬀerent types of information about the same underlying process..g. which encompasses time. Most of the nonmyopic extensions of these methods are either tailored to very speciﬁc problem structure (observation . Many other problems share the same structure. The practitioner has at their disposal a range of tests.2. each of which provides observations of diﬀerent aspects of the phenomenon of interest. and sensor resource management must be approached as a waveform selection problem involving diﬀerent observations of the full system. Beam steering may be seen to be a special case of the waveform selection problem. Other examples include: • Passive sensors (e. since it does not consider future observation opportunities. ocean temperature and depth mapping. Examples of problems of this type include monitoring of air quality. track and identify people or cars. Of course it is desirable to exploit the speciﬁc structure that exists in the case of beam steering. A similar example is the use of cameras with controllable pan and zoom to detect. • In ecological and geological applications. 1.Sec.e. consider the hyperobject that encompasses all objects being tracked. the phenomenon of interest is often comprised of the state of a large interconnected system. i. and economical considerations. The dependencies within the system prevent the type of decomposition that is used in beam steering. and weather observation. Waveform selection and beam steering 21 of the transmitted waveform aﬀect the type of information obtained in the return. Choosing to observe diﬀerent constituent objects will result in information relevant to diﬀerent aspects of the hyperobject. Different choices will return information about diﬀerent aspects of the phenomenon of interest. Many authors have approached the waveform selection and beam steering problems by proposing an estimation performance measure. For example. but can choose which interval of the frequency spectrum to observe at each time. • Medical diagnosis concerns the determination of the true physiological state of a patient. Associated with each test is a notion of cost. which is evolving in time according to an underlying dynamical system. This approach is commonly referred to as greedy or myopic. The essential structure of this problem ﬁts within the waveform selection category..
1. . 1. The resulting structure falls within the framework of platform steering. Sensor networks pose a particular challenge for resource management: not only are there short term resource constraints due to limited communication bandwidth. It is commonly the case that the observations provided by sensors are highly informative if the sensor is in the close vicinity of the phenomenon of interest. etc). In the context of object tracking. Furthermore. this has motivated the use of a dynamically assigned leader node. INTRODUCTION models. ﬁnding an eﬃcient integer programming solution that exploits the structure of beam steering. ﬁnding an eﬃcient heuristic sensor management method for object tracking in sensor networks. but there are also long term energy constraints due to battery limitations.3 Sensor networks Networks of wireless sensors have the potential to provide unique capabilities for monitoring and surveillance due to the close range at which phenomena of interest can be observed. or are limited to considering two or three time intervals (longer planning horizons are typically computationally prohibitive). and stores and updates the knowledge of the object as new observations are obtained. we obtain performance guarantees that delineate problems in which additional planning is and is not beneﬁcial. it is unclear when additional planning can be beneﬁcial. We then examine two problems in which longterm planning can be beneﬁcial. The choice of leader node should naturally vary as the object moves through the network. This necessitates long term planning: for example. dynamics models.4 Contributions and thesis outline This thesis makes contributions in three areas. Application areas that have been investigated range from agriculture to ecological and geological monitoring to object tracking and identiﬁcation. excessive energy should not be consumed in obtaining information that can be obtained a little later on at a much lower cost. where the sensor state is the currently activated leader node. and ﬁnally. Firstly. which determines which sensors should take and communicate observations.22 CHAPTER 1. Failure to do so will result in a reduced operational lifetime for the network. and comparatively uninformative otherwise.
4. is also discussed. etc) an energy cost. Solutions with guaranteed nearoptimality are found by simultaneously reducing the upper bound and raising a matching lower bound.Sec. and the posterior Cram´rRao bound. we seek to trade oﬀ estimation performance and energy consumed in an object tracking problem. Contributions and thesis outline 23 1. The analysis in Chapter 3 obtains guarantees similar to the recent work in [46] for the sequential problem structures that commonly arise in waveform selection and beam steering. Our analysis . communication. We approach the trade oﬀ between these two quantities by maximizing estimation performance subject to a constraint on energy cost. time varying dynamics and observation models. which is able to address time invariant rewards with a further computational saving. including tighter bounds that exploit either process diﬀusiveness or objectives involving discount factors. e 1. and observations requiring diﬀerent time durations to complete. 1. Several extensions are obtained. 1. Examples demonstrate that the bounds are tight. and applicability to closed loop problems.4.1 Performance guarantees for greedy heuristics Recent work has resulted in performance guarantees for greedy heuristics in some applications.2 Eﬃcient solution for beam steering problems The analysis in Chapter 4 exploits the special structure in problems involving large numbers of independent objects to ﬁnd an eﬃcient solution of the beam steering problem. but there remains no guarantee that is applicable to sequential problems without very special structure in the dynamics and observation models. including mutual information and the posterior Cram´rRao bound. The analysis from Chapter 3 is utilized to obtain an upper bound on the objective function. Computational experiments demonstrate application to problems involving 50–80 objects planning over horizons up to 60 time slots. i. minimizing energy cost subject to a constraint on estimation performance. admitting time varying observation and dynamical models.e. An alternative formulation.4. The result is quite general in that it applies to arbitrary. We assign to each operation (sensing.3 Sensor network management In Chapter 5.4. The results apply to objectives including mutual information. and then we seek to develop a mechanism that allows us to choose only those actions for which the resulting estimation gain received outweighs the energy cost incurred.. The algorithm has quite general applicability. e and counterexamples illuminate larger classes of problems to which they do not apply. or the dual of this. The methods apply to the same wide range of objectives as Chapter 3.
. yet still captures the essential structure of the underlying trade oﬀ.24 CHAPTER 1. INTRODUCTION proposes a planning method that is both computable and scalable. Simulation results demonstrate a dramatic reduction in the communication cost required to achieve a given estimation performance level as compared to previously proposed algorithms.
although many of the methods we discuss could be applied to any other dynamical process. we describe the information theoretic objective functions that we utilize. tracking and identifying multiple objects. then one must construct an estimator for conducting inference on that phenomenon. we brieﬂy outline the development of statistical models for object tracking before describing some of the estimation schemes we utilize in our experiments. the focus of Chapter 3 is extending these results to sequential problems. The starting point of such algorithms is a dynamical model which captures mathematically how the physical process evolves. In Section 2. and contrasts the approaches presented in the later chapters to the existing methods. Section 2. In Section 2.4 details some existing results that have been applied to related problems to guarantee performance of simple heuristic algorithms. one must design a controller to make decisions using the available inputs.1. one must develop a statistical model for the phenomenon of interest.3.6 surveys the existing work in the ﬁeld.Chapter 2 Background HIS section provides an outline of the background theory which we utilize to develop our results. Section 2. and how the observations taken by the sensor relate to the model variables. Section 2. One must select an objective that measures how successful the sensor manager decisions have been. ﬁnally. T 2. Sensor management requires an understanding of several related topics: ﬁrst of all.5 brieﬂy outlines the theory of linear and integer programming that we utilize in Chapter 4. the category of problems in which sensor management naturally belongs.2 outlines the theory of stochastic control. Finally. Section 2. and certain properties of the objectives that we utilize throughout the thesis. Having designed this 25 .1 Dynamical models and estimation This thesis will be concerned exclusively with sensor management systems that are based upon statistical models. and. The primary problem of interest is that of detecting.
.2) The process w(t) is formally deﬁned as a continuous time white noise with strength Q(t). one can then construct an estimator which uses sensor observations to reﬁne one’s knowledge of the state of the underlying physical process. Given any such model. 2. Accordingly. we outline the construction of dynamical models for object tracking. For example. and position is the integral of velocity: v(t) = w(t) ˙ p(t) = v(t) ˙ (2.26 CHAPTER 2. One common model for nonmaneuvering objects hypothesizes that velocity is a random walk. and then brieﬂy examine several estimation methods that one may apply.3) where w(t) is a zeromean Gaussian white noise process with strength Qc . ˙ x(t) = Fc x(t) + w(t) (2.1) (2. BACKGROUND model.1. which seek to choose sensor actions in order to minimize the uncertainty in the resulting estimate. dynamical models for object tracking are based upon simple observations regarding behavior of targets and the laws of physics. the underlying continuous time model is commonly chosen to be a stationary linear Gaussian system. This representation is central to the design of sensor management algorithms.1 Dynamical models Traditionally. the estimator maintains a representation of the conditional probability density function (PDF) of the process state conditioned on the observations incorporated. This model is given by: [67] xk+1 = Fxk + wk (2. This strength may be chosen in order to model the expected deviation from the nominal trajectory. In this section. it will probably still be moving at a constant velocity shortly afterward. we can construct a discretetime model which has the equivalent eﬀect at discrete sample points.4) where1 F = exp[Fc ∆t] 1 (2. In tracking.5) exp[·] denotes the matrix exponential. we may construct a mathematical model based upon Newtonian dynamics. if we are tracking an aircraft and it moving at an essentially constant velocity at one instant in time. Using a Bayesian formulation.
We refer the reader to [6] for further details of estimation using jump Markov linear systems. the jump Markov linear system can be addressed by the methods and guarantees we develop. The equivalent discretetime model becomes: 1 ∆t 0 0 0 1 0 0 xk+1 = (2.8) 0 0 1 ∆t xk + wk 0 0 0 1 where wk is a discrete time zeromean Gaussian white noise process with covariance 3 ∆t ∆t2 0 0 3 2 ∆t2 2 ∆t 0 0 Q = q (2. 2. the dynamical model at any time is a linear system. Dynamical models and estimation 27 ∆t is the time diﬀerence between subsequent samples and wk is a zeromean discrete time Gaussian white noise process with covariance ∆t Q= 0 exp[Fc τ ]Qc exp[Fc τ ]T dτ (2. 2.Sec. While not explicitly explored in this thesis. for a linear dynamical system . including mean square error.9) ∆t3 ∆t2 0 0 3 2 ∆t2 ∆t 0 0 2 Objects undergoing frequent maneuvers are commonly modelled using jump Markov linear systems. but the parameters of the linear system change at discrete time instants.6) As an example.1. mean absolute error and uniform cost. In this case. we consider tracking in two dimensions using the nominally constant velocity model described above: 0 1 0 0 0 0 0 0 x(t) + w(t) ˙ x(t) = (2.1.7) 0 0 0 1 0 0 0 0 where x(t) = [px (t) vx (t) py (t) vy (t)]T and w(t) = [wx (t) wy (t)]T is a continuous time zeromean Gaussian white noise process with strength Qc = qI2×2 . these changes are modelled through a ﬁnite state Markov chain.2 Kalman ﬁlter The Kalman ﬁlter is the optimal estimator according to most sensible criteria.
13) ˆ xkk−1 is the estimate of xk conditioned on observations up to and including time (k − 1). 6. We brieﬂy outline the Kalman ﬁlter below. If we relax the Gaussianity requirement on the noise processes. Finally. and hence this too can be calculated oﬄine.14) (2. In the Gaussian case. in the linear Gaussian case. BACKGROUND with additive white Gaussian noise and linear observations with additive white Gaussian noise. Similar comments apply to xkk and Pkk .. Qk } which is uncorrelated with x0 . 67] for more indepth treatments. As we will see in Section 2. both the ﬁlter gain and covariance may be computed oﬄine in advance and stored. we note that the recursive equations for the covariance Pkk−1 and Pkk . 27. 0.12) (2.15) (2.28 CHAPTER 2. We consider the discrete time linear dynamical system: xk+1 = Fk xk + wk (2. the uncertainty in an estimate as measured through entropy is dependent only upon the covariance matrix.e. 0. the reader is referred to [3. Accordingly. Pkk−1 }. and the gain Kk are both invariant to the value of the observations received z k . while Pkk−1 is the covariance of this error. the Kalman ﬁlter remains the optimal linear estimator according to the mean square error criterion.16) (2. p(xk z 0:k−1 ) = ˆ ˆ N {xk .10) ˆ commencing from x0 ∼ N {x0 . The dynamics noise wk is a the white noise process. P00 }. xkk−1 .3. Rk } is a white noise process that is uncorrelated with x0 and with the process v k .11) where v k ∼ N {v k . i. x00 . The Kalman ﬁlter equations include a propagation step: ˆ ˆ xkk−1 = Fk xk−1k−1 Pkk−1 = Fk Pk−1k−1 FT + Qk k and an update step: ˆ ˆ ˆ xkk = xkk−1 + Kk [z k − Hk xkk−1 ] Pkk = Pkk−1 − Kk Hk Pkk−1 Kk = Pkk−1 HT [Hk Pkk−1 HT + Rk ]−1 k k (2. We assume a linear observation model: z k = Hk xk + v k (2. wk ∼ N {wk . these two parameters completely describe the posterior distribution. .
k)Pkk−1 H(¯ k . the primary source for this material. : x ¯ z k ≈ h(¯ k . ∂x2 . ∂xn ]T where nx is the number of elements in the vector x. the basic concept is regularly applied to nonlinear systems through two algorithms known as the linearized and extended Kalman ﬁlters.17) (2.2.23) . The basic concept is that a mild nonlinearity may be approximated as being linear about a nominal point though a Taylor series expansion. k) + v k (2.1. and we present the equations for the linearized and extended Kalman ﬁlters for the case in which the only nonlinearity present is in the observation equation. whereas the extended Kalman ﬁlter must compute both of these online. we assume that the dynamical model is linear. . k) + H(¯ k .1. k) = [ x T T x h(x. The reader is directed to [68]. k)Pkk−1 x Kk = Pkk−1 H(¯ k . as in Section 2. In the case of the linearized Kalman ﬁlter. the linearized Kalman ﬁlter retains the ability to calculate ﬁlter gains and covariance matrices in advance.20) ∂ ∂ ∂ and x [ ∂x1 . k) ] x=¯ k x (2. k)T [H(¯ k . k)[xk − xk ] + v k x x where H(¯ k .2.. the linearization point is chosen in advance. 2.. The linearized Kalman ﬁlter calculates a linearized measurement model about a prespeciﬁed nominal state trajectory {¯ k }k=1. for information on the nonlinear dynamical model case. . k) − H(¯ k .18) where. . Dynamical models and estimation 29 2. This is most commonly the case in tracking applications..3 Linearized and extended Kalman ﬁlter While the optimality guarantees for the Kalman ﬁlter apply only to linear systems. k)[xk − xk ]} x x Pkk = Pkk−1 − Kk H(¯ k . The model we consider is: xk+1 = Fk xk + wk z k = h(xk . both of which are uncorrelated with x0 .Sec. k)T + Rk ]−1 x x x (2. Consequently. In this document.1. wk and v k are uncorrelated white Gaussian noise processes with known covariance.22) (2.19) (2. x The linearized Kalman ﬁlter update equation is therefore: ˆ ˆ ¯ xkk = xkk−1 + Kk {z k − h(¯ k .21) (2. . the extended Kalman ﬁlter relinearizes online about the current estimate value.
28. The extended Kalman ﬁlter diﬀers only in the in the point about which the model is linearized.1.30 CHAPTER 2. k)] x Pkk = Pkk−1 − Kk H(ˆ kk−1 . substantial nonlinearity is encountered in observation models. This nonlinearity can result in substantial multimodality in posterior distributions (such as results when one receives two range observations from sensors in diﬀerent locations) which cannot be eﬃciently modelled using a Gaussian distribution. the extended Kalman ﬁlter must be computed online. 83] is an approximation which is commonly used in problems involving a high degree of nonlinearity and/or nonGaussianity. and the coarse approximation performed by the extended Kalman ﬁlter is inadequate. k)[xk − xkk−1 ] + v k x x The extended Kalman ﬁlter update equation becomes: ˆ ˆ xkk = xkk−1 + Kk [z k − h(ˆ kk−1 . which enables one to approximate an expectation under one distribution using samples drawn from another distribution. In this case. This is particularly true in sensor networks.25) (2. 2. the conditional PDF of object state xk conditioned on observations received up to . The particle ﬁlter [4. we linearize about the current state estimate: ˆ z k ≈ h(ˆ kk−1 . and hence they can be precomputed.4 Particle ﬁlters and importance sampling In many applications. The method is based on importance sampling. k) + v k (2.29) We apply the same assumptions on wk and v k as in previous sections.27) (2. k)Pkk−1 x Kk = Pkk−1 H(ˆ kk−1 . Using a particle ﬁlter. since the local focus of observations yields much greater nonlinearity in range or bearing observations than arises when sensors are distant from the objects under surveillance. k)T + Rk ]−1 x x x (2.28) (2. k)Pkk−1 H(ˆ kk−1 . k) + H(ˆ kk−1 .26) (2. We again assume a linear dynamical model (although this is by no means required) and a nonlinear observation model: xk+1 = Fk xk + wk z k = h(xk .24) Since the ﬁlter gain and covariance are dependent on the state estimate and hence the previous observation values. BACKGROUND Again we note that the ﬁlter gain and covariance are both invariant to the observation values. k)T [H(ˆ kk−1 .
k This is commonly approximated using a linearization of the measurement model for z k+1 (Eq. we draw a new k sample at the next time step.35) The ﬁnal step of the SIR ﬁlter is to draw a new set of Np samples from the updated distribution to reduce the number of samples allocated to unlikely regions and reinitialize the weights to be uniform. for each previous sample xi .31) The algorithm then uses importance sampling to reweight these samples to implement the Bayes update rule for incorporating observations: p(xk+1 z 0:k+1 ) = = = i=1 p(z k+1 xk+1 )p(xk+1 z 0:k ) p(z k+1 z 0:k ) Np i i i=1 wk p(z k+1 xk+1 )δ(xk+1 (2. z 0:k .32) − xi ) k+1 (2. as described in Eq.Sec.29)) about the point Fk xi . from the proposal distribution q(xk+1 xi . xk+1 . Perhaps the most common (and the easiest to implement) is the Sampling Importance Resampling (SIR) ﬁlter. z k+1 ). This algorithm approximates the propagation step by using the dynamics model as a proposal distribution. Qk ) k k (2.33) (2. Fk xi . i drawing a random sample for each particle from the distribution xi k+1 ∼ p(xk+1 xk ).1. Under this algorithm.36) . 2. (2.30) Several variants of the particle ﬁlter diﬀer in the way in which this approximated is propagated and updated from step to step. Dynamical models and estimation 31 and including time k. is approximated through a set of Np weighted samples: Np p(xk z 0:k ) ≈ i=1 i wk δ(xk − xi ) k (2.34) p(z k+1 z 0:k ) Np i wk+1 δ(xk+1 − xi ) k+1 where i wk+1 = i wk p(z k+1 xi ) k+1 Np j j j=1 wk p(z k+1 xk+1 ) (2. p(xk z 0:k ). to yield an approximation of the prior density at the next time step of: Np p(xk+1 z 0:k ) ≈ i=1 i wk δ(xk+1 − xi ) k+1 (2. A more sophisticated variant of the particle ﬁlter is the Sequential Importance Sampling (SIS) algorithm. This distribution k can be obtained using the extended Kalman ﬁlter equations: the Dirac delta function δ(xk − xi ) at time k will diﬀuse to give: k p(xk+1 xi ) = N (xk+1 .19). (2.
Probabilistic graphical models provide a framework for . The resulting approximation for the distrik+1 k+1 bution of xk+1 conditioned on the measurements z 0:k+1 is: Np p(xk+1 z 0:k+1 ) ≈ i=1 i wk+1 δ(xk+1 − xi ) k+1 (2. z k+1 ) = N (xk+1 .41) p i where c is the normalization constant necessary to ensure that i=1 wk+1 = 1. BACKGROUND at time (k + 1).38) (2.44) 2. This distribution can be updated using the EKF update equation (Eq. k). k)Qk k+1 k+1 k Ki k+1 = Qk H(Fk xi .1. k)Qk H(Fxi . and p(z k+1 xi ) = N {z k+1 . and the importance i sampling weight wk+1 is calculated by i i wk+1 = cwk p(z k+1 xi )p(xi xi ) k+1 k k+1 q(xi xi . a Gaussian representation can be momentmatched to the particle distribution by calculating the mean and covariance: Np ˆ xk = i=1 Np i wk xi k (2. k)] k k+1 k Pi = Qk − Ki H(Fk xi . h(xi . xi . k)T k k k + Rk ] −1 (2.43) Pk = i=1 i ˆ ˆ wk (xi − xk )(xi − xk )T k k (2.39) (2.42) At any point in time.25)–(2. A new particle xi k+1 is drawn from the distribution in Eq. z k+1 ) k+1 k N (2. Pi ) k where ˆ k+1 xi = Fk xi + Ki [z k+1 − h(Fk xi . one can obtain greater accuracy than is possible using the EKF (which uses a single linearization point). k)T [H(Fxi . (2. Rk }. the complexity of an estimation problem increases exponentially as the number of variables increases.5 Graphical models In general.37).27)) to obtain: ˆ k+1 k+1 q(xk+1 xi .32 CHAPTER 2.37) (2.40) Because the linearization is operating in a localized region. (2.
two random variables xv and xw (v. if we denote by N (v) the neighbors of vertex v. A maximal clique is a clique which is not a subset of any other clique in the graph (i. 92]..e. 88.. then xv will be independent of all other variables in the graph conditioned on N (v).45) Graphical models are useful in recognizing independence structures which exist. . w ∈ C. i. We assume that our model is represented as as graph G consisting of vertices V and edges E ⊆ V × V.1. . A clique is deﬁned as a set of vertices C ⊆ V which are fully connected.e. Obviously. Some problems involving sparse cyclic graphs can be addressed eﬃciently by combining small numbers of nodes to obtain tree structure (referred to as a junction tree). i. . a clique for which no other vertex can be added while still retaining full connectivity). Corresponding to each vertex v ∈ V is a random variable xv and several possible observations (from which we may choose some subset) of that variable. {zv . The simplest and . 2. 1 n {zv . w) ∈ E ∀ v. (v. zv v }. Dynamical models and estimation 33 recognizing and exploiting structure which allows for eﬃcient solution. .1. zv ) (2. Further details can be found in [38. w ∈ V) are independent conditioned on a given set of vertices D if there is no path connecting vertices v and w which does not pass through any vertex in D. w) ∈ E denotes that variables xv and xw have direct dependence on each other. while [35. such as loopy belief propagation. Estimation in time series is a classical example of a treebased model: the Kalman ﬁlter (or. a variety of undirected graphical model.e. the joint distribution function can be shown to factorize over the maximal cliques of the graph. but in general approximate methods.. . For example. In this case. the Kalman smoother) may be seen to be equivalent to belief propagation specialized to linear Gaussian Markov chains. 89] extends particle ﬁltering from Markov chains to general graphical models using belief propagation. Denoting the collection of all maximal cliques as M . Edges represent dependences between the local random variables. 2. the joint distribution of variables and observations can be written as: nv 1 n p({xv . Here we brieﬂy describe Markov random ﬁelds. 73. zv v }}v∈V ) ∝ C∈M ψ({xv }v∈C ) v∈V i=1 i ψ(xv . Estimation problems involving undirected graphical models with a tree as the graph structure can be solved eﬃciently using the belief propagation algorithm. more precisely. All observations are assumed to depend only on the corresponding local random variable.Sec. 91] provides a lower limit on the mean square error e performance achievable by any estimator of an underlying quantity. (v. are necessary. .6 Cram´rRao bound e The Cram´rRao bound (CRB) [84. . .
ˆ x(z). BACKGROUND most common form of the bound. Theorem 2.2. deals with unbiased estimates of nonrandom parameters (i. While the bound takes on the same form. must satisfy the following bound on covariance: E zx [ˆ (z) − x][ˆ (z) − x]T x x Cz x [Jz ]−1 x where Jz is the Fisher information matrix. Let x be a nonrandom vector parameter. must satisfy the following bound on covariance: x.1. which can be calculated equivalently through x either of the following two forms: Jz x E zx zx [ x log p(zx)][ T x log p(zx)] = E {−∆x log p(zx)} x From the ﬁrst form above we see that the Fisher information matrix is positive semideﬁnite. the bound applies to any estimator. which can be calculated equivalently through x . Let x be a random vector parameter with probability distribution p(x). the Fisher information matrix now decomposes into two terms: one involving prior information about the parameter.. x(z). x z Theorem 2. ˆ and z be an observation with model p(zx).34 CHAPTER 2. see [84. Because we take an expectation over the possible values of x as well as z. presented below in Theorem 2.z E [ˆ (z) − x][ˆ (z) − x]T x x Cz x [Jz ]−1 x where Jz is the Fisher information matrix. biased or unbiased. parameters which are not endowed with a prior probability distribution). Then any estimator of x based on z. Then any unbiased estimator of x based on z. We adopt convention from [90] that ∆z = x T . The posterior Cram´rRao bound (PCRB) [91] provides a similar performance limit e for dealing with random parameters. 91] for details. The notation A B implies that the matrix A − B is positive semideﬁnite (PSD). and another involving information gained from the observation.1. and z be an observation with distribution p(zx) parameterized by x. We omit the various regularity conditions.e.
Sec.2 Markov decision processes Markov Decision Processes (MDPs) provide a natural way of formulating problems involving sequential structure. z)] = E {−∆x log p(x. The suﬃcient statistic must be chosen such that future values are independent of past values conditioned on the present value (i. in which decisions are made incrementally as additional information is received. it must form a Markov process).e.z x. in practice we will design our controller by selecting an action for the current time considering the following N time steps (referred to as rolling horizon or receding horizon control). .2.z [ x log p(xz)][ T x log p(xz)] = E {−∆x log p(xz)} x The individual terms: J∅ x ¯ Jz x E {−∆x log p(x)} x x x. x x Convenient expressions for calculation of the PCRB in nonlinear ﬁltering problems can be found in [90]. z)][ T x log p(x.z [ x log p(zx)][ T x log p(zx)] = E {−∆x log p(x)} + E {−∆x log p(zx)} x x x. We will concern ourselves primarily with problems involving planning over a ﬁnite number of steps (socalled ﬁnite horizon problems).z E {−∆x log p(zx)} x are both positive semideﬁnite. Markov decision processes 35 any of the following forms: Jz x x.z = E x. We also deﬁne C∅ [J∅ ]−1 . 2. The decision state is a suﬃcient statistic for all past and present information upon which the controller can make its decisions. We denote by Xk ∈ X the decision state of the system at time k. z)} x =E [ x x log p(x)][ T x log p(x)] + E x.. The basic problem formulation includes: State. 2.z x E [ x log p(x. The recursive expressions are similar in form to the Kalman ﬁlter equations.z x.
uk ) E ∗ Jk+1 (Xk+1 ) (2.. The expected reward to go of a given policy can be found through the following backward recursion: π Jk (Xk ) = gk (Xk .46) π commencing from the terminal condition JN (XN ) = gN (XN ).36 CHAPTER 2. Eq. X Uk k ⊆ U is the set of controls available at time k if the system is in state Xk . this process in general requires inﬁnite computation and storage.46) can be used to evaluate the expected reward of a policy when the cardinality of the state space X is small enough to allow computation and storage for each element. When the cardinality of the state space X is inﬁnite. We denote by uk ∈ Uk k the control to be applied to the system at time k.47) ∗ commencing from the same terminal condition. will be distributed according to the probability measure P (·Xk . This consists of two components: the perstage reward gk (Xk . uk ) + k X uk ∈Uk k Xk+1 ∼P (·Xk . and by π = {µ1 . (2. . µk (Xk )) + Xk+1 ∼P (·Xk .e. The objective of the system is speciﬁed as a reward (or cost) to be maximized (or minimized).47) and Eq. then the state at time (k+1). The optimal policy is implicity speciﬁed by the optimal reward to go through the expression µ∗ (Xk ) = arg max gk (Xk . If the state at time k is Xk and control uk is applied. The expected reward to go of the optimal policy can be formulated similarly as a backward recursion: ∗ Jk (Xk ) = max uk ∈Uk k X gk (Xk . a rule that speciﬁes which control one should apply if one arrives in a particular state at a particular time. . Transition. Xk+1 . although there are special cases (such as LQG) in which the reward functions admit a ﬁnite parameterization. JN (XN ) = gN (XN ). (2. .µk (Xk )) E π Jk+1 (Xk+1 ) (2. (2. assuming that the optimization is solvable. . which is the reward associated with arriving in state XN on completion of the problem. in others it will remain constant. uk ) + Xk+1 ∼P (·Xk . In some problem formulations this set will vary with time and state.uk ) E ∗ Jk+1 (Xk+1 ) (2.48) The expression in Eq. . which is the immediate reward if control uk is applied at time k from state Xk . We denote by µk : X → U the policy for time k. and the terminal reward gN (XN ). uk ). The solution of problems with this structure comes in the form of a policy. µN } the timevarying policy for the ﬁnite horizon problem.48) may be used to determine the optimal policy and its expected reward. BACKGROUND X Control. uk ). Reward. Furthermore. i.
as the size of the set Ik which is needed in practice grows rapidly with the number of states.e. uk ) and gN (xN ). uk ) = xk gk (xk .1 Partially observed Markov decision processes Partially observed MDPs (POMDPs) are a special case of MDPs in which one seeks to control a dynamical system for which one never obtains the exact value of the state of the system (denoted xk ). a 1995 study [63] found solution times 2 or piecewise linear concave in the case of minimizing cost rather than maximizing reward.. Theoretically. but rather only noisecorrupted observations (denoted z k ) of some portion of the system state at each time step. observation values and control values. POMDPs are PSPACEcomplete (i.51) Solution strategies for POMDPs exploit this structure extensively. In this case. . The limitation of these methods is that they are restricted to small state spaces. Markov decision processes 37 2. for i some Ik and some vk (·).49) (2. 78]. solving for an optimal (or nearoptimal) choice of these parameters. the history of all controls applied to the system and the resulting observations).. The fundamental assumption of POMDPs is that the reward per stage and terminal reward.Sec.50) gN (XN ) = where Xk = p(xk z 0:k−1 . i. gk (Xk . and unlimited computation time) [15. i. one can show [87] that the reward to go function at all time steps will subsequently be a piecewise linear convex2 function of the conditional probability distribution Xk . u0:k−1 ) is the conditional probability distribution which forms the decision state of the system.e. There is a unique structure which results: the induced reward per stage will be a linear function of the decision state. empirically. or the conditional probability distribution of system state conditioned on previously received observations (which forms a suﬃcient statistic for the information vector [9]).e. u0:k−1 ) gN (xN )p(xN z 0:N −1 . one can then calculate an induced reward as the expected value of the given quantity.2. u0:k−1 ) (2.e. Given the conditional probability distribution of system state. In this case. gk (xk . i ∈ Ik . Xk .. as hard as any problem which is solvable using an amount memory that is polynomial in the problem size. uk )p(xk z 0:k−1 . u0:N −1 ) xN (2.. 2. ∗ Jk (Xk ) = max i∈Ik xk i vk (xk )p(xk z 0:k−1 . can be expressed as functions of the state of the underlying system.2. one can reformulate the problem as a fully observed MDP. in which the decision state is either the information vector (i.
In this application. There are many problems in which solving the MDP is intractable..2. and anticipates the arrival of future information through the Markov transition model.2 Open loop.e. . ﬁfteen observation values and four actions. Furthermore. single choice of which control to apply at each time. and then constructing a new plan which incorporates the information received in the interim. closed loop and open loop feedback The assumption inherent in MDPs is that the decision state is revealed to the controller at each decision stage. attempts to discretize this type of state space are also unlikely to arrive at a suﬃciently small number of states for this class of algorithm to be applicable. there is a fundamental tradeoﬀ which arises between estimation performance and energy cost.3 Constrained dynamic programming In Chapter 5 we will consider sensor resource management in sensor networks. The controller operates by constructing a plan for the ﬁnite horizon. The OLFC is a commonly used suboptimal method in these situations. Open Loop Feedback Control (OLFC) is a compromise between these two extremes: like an open loop controller. This plan does not anticipate the availability future information. the diﬀerence in performance between OLFC and CLC can be arbitrarily large. unlike an open loop controller. such strategies are intractable in cases where there is an inﬁnite state space. as opposed to a policy). optimizing one quantity subject to a constraint on the other. when new information is received it is utilized in constructing an updated plan. Open loop control (OLC) represents the opposite situation: where one constructs a plan for the ﬁnite horizon (i. a plan (rather than a policy) is constructed for the ﬁnite horizon at each step of the problem. yet open loop plans can be found within computational limitations.2. 2. nor utilizes that information as it arrives. A natural way of approaching such a tradeoﬀ is as a constrained optimization. Clearly. The optimal policy uses this new information as it becomes available. This is referred to as closed loop control (CLC). One can prove that the performance of the optimal OLFC is no worse than the optimal OLC [9]. BACKGROUND in the order of hours for problems involving ﬁfteen underlying system states. e..38 CHAPTER 2. and neither anticipates the availability of future information. when the underlying system state involves continuous elements such as position and velocity. executing one or more steps of that plan. However. 2. only a policy can do so.g.
the presence of a duality gap is possible since the optimization space is discrete. λ) = g(Xk . we seek the policy corresponding to the optimal solution to the constrained minimization problem: k+N −1 min E π i=k k+N −1 g(Xi . µk+N −1 } the set of policies for the next N time steps. 18. similar to that in [18].53) for the value of λ attaining the maximum in Eq. (2. . uk ) is the perstage cost and G(Xk . µi (Xi )) − M ].54). E i=k G(Xi .. uk ) + λG(Xk .54) D We note that the dual function Jk (Xk .53) and solving the dual optimization problem involving this function: D L Jk (Xk ) = max Jk (Xk .Sec. 13. If it happens that the optimal solution produced by the dual problem has no duality . µi (Xi )) + λ i=k G(Xi . which yields a convenient method of approximate evaluation for the problem examined in Chapter 5. uk ) ¯ (2. k + N − 1}.e.52) where g(Xk . a common approximation method for discrete optimization problems. . and by πk = {µk . λ) takes the form of an unconstrained dynamic program with a modiﬁed perstage cost: g (Xk . µi (Xi )) − M (2. µi (Xi )) s. ·)}i=k:k+N −1 is the policy attaining the minimum in Eq. Markov decision processes 39 Constrained dynamic programming has been explored by previous authors in [2. . We address the constraint through a Lagrangian relaxation. µi (Xi )) ≤ M (2. we minimize the cost incurred in the planning horizon involving steps {k. The size of the duality gap is given by the expression λ E[ i G(Xi . uk ) is the perstage contribution to the additive constraint function. where πk = {µi (·. λ) = min E π i=k k+N −1 g(Xi . . We seek to minimize the cost over an N step rolling horizon. λ) λ≥0 (2. Denoting by µk (Xk ) the control policy for time k. . by deﬁning the dual function: k+N −1 D Jk (Xk . i. 94]. 2. . .55) The optimization of the dual problem provides a lower bound to the minimum value of the original constrained problem. . uk .t. at time k. (2.2. We describe a method based on Lagrangian relaxation.
πk . µi (Xi )) − M (2. As the highlevel system objective becomes further removed from the performance of the sensing system. πk . the solution returned would have been optimal if the constraint level had been lower such that the constraint was tight. G(Xi . if a sensing system is being used to estimate the location of a stranded yachtsman in order to minimize the distance from the survivor to where airdropped supplies land. This can occur in one of two ways: either the Lagrange multiplier λ is zero. then the resulting solution is also the optimal solution of the original constrained problem. i. such that the solution of the unconstrained problem satisﬁes the constraint. iteratively stepping in the direction of a subgradient with a decreasing step size [10]. or the solution yields a result for which the constraint is tight. If a duality gap exists. or to maximize the probability that the distance is less than a critical threshold.56) 2. The method described in [18] avoids a duality gap utilizing randomized policies.. λ) = E i D In other words. 95].. (2. When an application demands continual tracking of multiple objects without a direct terminal objective. subgradients are replaced by supergradients. for multiple constraints the linear programming column generation procedure described in [16. For a single constraint. S(Xk . Conceptually. the set of all supergradients. 94] can be more eﬃcient. [32. 3 . the most appropriate choice of reward function becomes less apparent. This section explores the deﬁnitions of and basic inequalities involving entropy and mutual information. BACKGROUND gap. λ) ∈ ∂Jk (Xk . the dual problem in Eq.54) can be solved using a subgradient method [10]. a better solution may exist satisfying the constraint. All of the results Since we are maximizing a nondiﬀerentiable concave function rather than minimizing a nondiﬀerentiable convex function. however. the most appropriate choice of reward function might be obvious from the system speciﬁcation. The subgradient method operates according to the same principle as a gradient search.40 CHAPTER 2. For example. The following expression can be seen to be a supergradient3 of the dual objective: S(Xk . Entropy is a commonlyused measure of uncertainty in many applications including sensor resource management.e. λ). where ∂ denotes the superdiﬀerential. 61. Each of these relates directly to a speciﬁc quantity at a speciﬁc time. one may also employ methods such a line search.g.3 Information theoretic objectives In some circumstances. e. 41. it is unclear what reward function should be selected. then a natural objective would be to minimize the expected landing distance.
59) is on the random variable z.Sec. joint entropy and conditional entropy are deﬁned as: H(x) = − H(x.59) (2. as opposed to the generalization referred to as Renyi entropy.61) We will sometimes use the notation H(xˇ) to denote conditioning on a particular value. The conditioning in H(xz) in Eq. It is traditional to use a base2 logarithm when dealing with discrete variables. hence an expectation is performed over the possible values that the variable may ultimately assume. (2. z) − H(z) The above deﬁnitions relate to diﬀerential entropy. we have H(x) ≥ 0.62) . 2. If the underlying sets are discrete. z) log p(x. and can be found in classical texts such as [23]. z i.3. We can also condition on a particular value of a random variable: H(xz = ζ) = − p(xz = ζ) log p(xz = ζ)dx (2. Throughout this document we use Shannon’s entropy.59) and Eq. 2.60) = H(x.3. we observe that: z ˇ H(xz) = pz (ζ)H(xz = ζ)dζ (2. z) = − H(xz) = − p(x) log p(x)dx p(x. (2. We will also use a natural logarithm in cases involving a mixture of continuous and discrete quantities..1 Entropy Entropy. z)dxdz p(z) p(xz) log p(xz)dxdz (2. In the discrete case. then a counting measure is used. H(xˇ) H(xz = z ).e. which concerns continuous variables. Information theoretic objectives 41 presented are wellknown.57) (2. and a natural logarithm when dealing with continuous quantities. We will exploit various properties that are unique to Shannon entropy.61).58) (2. eﬀectively replacing the integral by a summation. Comparing Eq. (2.
like conditional entropy.. we y ˇ can write: I(x. zy) = py (ψ) p(x. zy) = py (ψ)I(x. .67) (2. zˇ) to indicate conditioning on a pary ticular value.71) (2. zi z1 .70) = H(xy) − H(xz. z1 .73) (2.74) = H(xy = ψ) − H(xz. .3. zy = ψ) = p(x. y) = H(zy) − H(zx. . zn ) = i=1 I(x. zy = ψ) dxdzdψ p(xy = ψ)p(zy = ψ) (2. zy = y ).2 Mutual information Mutual information (MI) is deﬁned as the expected reduction in entropy in one random variable due to observation of another variable: I(x. BACKGROUND 2. zy) and in the case of conditioning on a particular value.72) (2. I(x.64) (2. zˇ) I(x. y) = H(xy) + H(zy) − H(x. I(x. Also note that.63) (2. .e. conditional MI can be deﬁned with conditioning on either a random variable. zy = ψ) Again. z) = p(x.42 CHAPTER 2. In either case. ψ: I(x.69) (2. y = ψ) = H(zy = ψ) − H(zx. . z) dxdz p(x)p(z) (2. we will sometimes use the notation I(x.75) The chain rule of mutual information allows us to expand a mutual information expression into the sum of terms: n I(x.66) = H(x) − H(xz) = H(z) − H(zx) = H(x) + H(z) − H(x. i.68) (2. .65) (2. the conditioning appears in all terms of the deﬁnition. zy = ψ) log p(x. in the case of conditioning on a random variable y. z) Like conditional entropy. zy = ψ) dxdz p(xy = ψ)p(zy = ψ) (2. y = ψ) = H(xy = ψ) + H(zy = ψ) − H(x. zy = ψ) log p(x.76) . zi−1 ) (2. z) log p(x. . or a particular value. i. zy = ψ)dψ (2.e. .
. i. 1}. . . xn . . In this case. .Sec. Information theoretic objectives 43 The ith term in the sum. .3 KullbackLeibler distance The relative entropy or KullbackLeibler (KL) distance is a measure of the diﬀerence between two probability distributions. . zn ) = i=1 I(xi . . deﬁned as: D(p(x)q(x)) = 4 p(x) log p(x) dx q(x) (2. we N −1 1 1 1 NN obtain H(x) = log N − N log (N −1) = N log N + N log (N −1)N −1 > 0. . I(x. i. we ﬁnd: I(x1 .3. xn . . . . Suppose we have observations (z1 . zi−1 ). . . If we receive the observation value z = 0.4 Since I(x. I(x. . . 2. Conditioning 1 on the random variable. zi z1 . xn ) = H(zi ) − H(zi xi ) = I(xi . . zi−1 ) represents the incremental gain we obtain in our knowledge of x due to the new observation zi . with conditioning on either an observation random variable or an observation value. 2. . .77) I(x1 . z) = H(x) − H(xz).79) This is also true for conditional MI. zi z1 . . . the chain rule may be written as: n (2. . z) ≥ 0. Suppose we want to infer a state x ∈ {0. . with equality if and only if x and z are independent. that conditioning on a random variable reduces entropy.. hence H(xz = 0) = 0 < H(x). with p(z = 0x = 0) = 1 and p(z = 0x = 0) = 0. . zn ) of an underlying state (x1 . . . The following example illustrates that conditioning on a particular value of a random variable may not reduce entropy. . this implies that H(x) ≥ H(xz). . . conditioned on the previous observation random variables (z1 .e.e. z1 . Calculating the entropy. . zi−1 ) (2. . . . We have an N observation z ∈ {0. zi ) = H(zi ) − H(zi x1 . . . . zi ) In this case. If we receive the observation value z = 1 then H(xz = 1) = log N > H(x). .. Example 2. . .1.3. N } where p(x = 0) = (N − 1)/N and p(x = i) = 1/N 2 . then we know that x = 0. . i = 0 (assume N > 1). with equality iﬀ x and z are independent under the respective conditioning. .78) It can be shown (through Jensen’s inequality) that mutual information is nonnegative. xn ). we ﬁnd H(xz) = N log N < H(x). where observation zi is independent of all other observations and underlying states when conditioned on xi .
.3. Another interesting relationship can be obtained by considering the expected KL distance between the posterior given a set of observations {z1 . and the posterior given the single observation zu (u ∈ {1. . . . .63). zn ) log p(x. . (2. .4 Linear Gaussian models Entropy is closely related to variance for Gaussian distributions. under linearGaussian assumptions. zn . . . dzn p(x. . . . . . . Thus. . zn )p(zu ) dxdz1 . . . z1 . . . z)p(x)p(z)) Manipulating Eq. . . zn ) p(xz1 . . . . (2. . . . BACKGROUND (2. . z1 . we obtain: I(x. (2. z1 . zn ) Since the second term is invariant to the choice of u. z) = D(p(x. The entropy of an ndimensional multivariate Gaussian distribution with covariance P is equal to 1 log 2πeP = n log 2πe + 1 log P. zn ) dxdz1 . . 2. . . dzn p(xzu ) p(x. . x) p(x)p(zu ) + log dxdz1 .44 Comparing Eq. zn ) log p(x. zn ) p(xz1 . . . . . . . .63) we also obtain: I(x. zn } from which we may choose a single observation. . . . n}): ED(p(xz1 . . z) = p(z) p(xz) log CHAPTER 2. zu ) + I(x. . . we obtain the result that choosing u to maximize the MI between the state x and the observation zu is equivalent to minimizing the expected KL distance between the posterior distribution of x given all observations and the posterior given only the chosen observation zu . . .81) Therefore the MI between x and z is equivalent to the expected KL distance between the posterior distribution p(xz) and the prior distribution p(x). . dzn p(x. zn )p(xzu )) = = = p(z1 . zn ) p(z1 . . Suppose x and z are jointly Gaussian random variables with covariance: P= Px Pxz PT Pz xz . . zu )p(z1 . . . . . zn ) log = −I(x. .79) with Eq. . or the volume of the uncertainty hyperellipsoid.80) p(xz) dxdz = E D(p(xz)p(x)) z p(x) (2. . zu ) p(x)p(z1 . mini2 2 2 mizing conditional entropy is equivalent to minimizing the determinant of the posterior covariance. z1 . . . . . . .
0. .86) Furthermore.83) In the classical linear Gaussian case (to which the Kalman ﬁlter applies). . i. . pxn. and H(px1 . . .2 = pxn > 0) then: H(px1 . H(px1 .2 . pπ(xn ) ).1.84) (2.2 ) = H(px1 . z) = 1 Px  log 2 Px − Pxz P−1 PT  z xz Pz  1 = log 2 Pz − PT P−1 Pxz  xz x (2. . .1 + pxn. pxn. . Information theoretic objectives 45 Then the mutual information between x and z is given by: I(x. . . pxn ) = H(pπ(x1 ) . pxn p xn This condition can be relaxed to H(x) being a Lebesgue integrable function of p(x) (see [7]). and that entropy and MI are uniquely determined by the covariance of a Gaussian distribution. (2. .1 and xn. where z = Hx + v and v ∼ N {v. R} is independent of x. zy) = I(x.2). z) = 1 Px  log 2 Px − Px HT (HPx HT + R)−1 HPx  1 P−1 + HT R−1 H = log x 2 P−1  x HPx HT + R 1 = log 2 R (2. pxn ) (deﬁned for all n). . . pxn ) + p(xn )H 5 pxn. 3. We assume x ∈ {x1 . and use the notation pxi = P [x = xi ]. xn } where n < ∞. . zy = ψ) ∀ ψ (2.3. 2.. I(x. . . . 1.1 pxn. If xn. 2. . 2. .3. . . pxn−1 . . . .82) (2.e. .85) (2. . if π(x) is a permutation of x then H(px1 . . . . . H(px1 . . . .2 partition the event xn (so that pxn. .87) is due to the fact that the posterior covariance in a Kalman ﬁlter is not aﬀected by the observation value (as discussed in Section 2.Sec. . if x.1 . y and z are jointly Gaussian then: I(x.5 Axioms resulting in entropy One may show that Shannon’s entropy is the unique (up to a multiplicative constant) realvalued measure of the uncertainty in a discrete probability distribution which satisﬁes the following three axioms [7].87) The result of Eq. . . pxn ) is permutation symmetric. pxn ) is a continuous5 function of the probability distribution (px1 . pxn ) = H(x).
. .. . .. xk+N −1 ˇ 0 .. In this case. .90) z (Xl−1 ) = arg max µk . . z k−1 ).µk+N −1 µ (Xk+N −1 ) ˇ ˇ 0 . z k+N −1 µ (Xk+N −1 ) ˇ ˇ 0 . We choose the decision state to be the k z conditional PDF Xk = p(xk ˇ 0:k−1 ). . . we obtain the equivalent formulation: [25] πk = arg min µk . ..6 Formulations and geometry A common formulation for using entropy as an objective for sensor resource management is to seek to minimize the joint entropy of the state to be estimated over a rolling horizon. .91) results from applying Eq. .88) where Xk an appropriate choice of the decision state (discussed below). ... µk+N −1 } which minimizes the expected entropy over the next N time steps conditioned on values of the observations already received: ˇ πk = arg min H(xk . If we have a problem involving estimation of the state of multiple objects. . . xk+N −1 ˇ 0 . z k−1 . z k k z µ (Xk ) l−1 . . 2. . . . BACKGROUND The ﬁnal axiom relates to additivity of the measure: in eﬀect it requires that. . . the canonical problem that we seek to solve is to ﬁnd at time k the nonstationary policy πk = {µk . . Although conditioning is denoted on the history z ˇ of observations {ˇ 0 . . . This problem can be formulated as a MDP in which the reward per stage is chosen ˇ to be gk (Xk . . assuming that the observations at time i are independent of each other and the remainder of the state conditioned on the state at time i. z k+N −1 − I(xk . . z k k k+N −1 µ (Xk ) k+N −1 . xk+N −1 . . However.µk+N −1 µ (Xk ) k+N −1 .46 CHAPTER 2..µk+N −1 ˇ H(xk . . z k−1 ) z (2. . . . . .89) = arg max I(xk . z k−1 . uk ) = I(xk . z k k µk .78). z k k z µk . . . z uk ˇ 0 . (2.µk+N −1 l=k I(xl .. . . . . z k−1 }..3. . z k−1 ) (2. . . . xk+N −1 . . . . . . . . . the current conditional PDF is a suﬃcient statistic for z all observations in calculation of the reward and the decision state at the next time. . The structure of this MDP is similar to a POMDP in that the decision state is the conditional PDF of the underlying state. . . . .. . z k+N −1 µ (Xk+N −1 ) ) (2. then the uncertainty in x must be equal to the uncertainty in y plus the expected uncertainty remaining in x after y has been revealed. (2.. . . . ...91) Eq. if we receive (in y) part of the information in a random variable x. . we simply deﬁne xk to be the joint state of the objects. the reward per stage cannot be ... z l−1 µ ) (2. . z k−1 ) z µ (Xk ) k+N −1 . Applying Eq.72). . . z l l µ (Xl ) ˇ ˇ 0 . . (2.
which will in general result in a function which is nonconcave and nonconvex. i = j. suppose we are choosing between diﬀerent observations z 1 and z 2 which share the same cardinality and have models p(z 1 x) and p(z 2 x). Consider a continuous relaxation of the problem in which we deﬁne the “quasiobservation” z α with p(z α x) = αp(z 1 x) + (1 − α)p(z 2 x) for α ∈ [0. as discussed above.1. the mutual information I(x.5.25. 1]: u = arg max I(x.Sec. For example. again exposing the combinatorial complexity of the problem. zk . Example 2. This is illustrated in the following example. with = 0. xk = −1 k 1 2 p(zk = 1xk ) = δ. which only conﬁrms that the optimal solution lies at an integer point. xk = 0 xk = 0 0.5.075 and δ = 0.5 and p(xk = 0) = p(xk = 1) = 0. by taking the maximum of the reward per stage over several diﬀerent candidate observations. we are taking the pointwise maximum of several diﬀerent concave functions of the PDF p(x). The diagrams illustrate the nonconcave structure which results.1] In this case. 1}. 2. This is illustrated in the following example.1. and the reward to go is not piecewise linear convex. p(zk = 1xk ) = δ. 2.2. z α ) α∈[0. 1} has a uniform prior distribution. z) is a concave function of p(x) for a given p(zx). 0. and a convex function of p(zx) for a given p(x). the observation model p(z α x) is a linear function of α and. The structure underlying the maximization problem within a single stage can also be revealed through these basic geometric observations.3. Accordingly. Assume a prior distribution p(xk = −1) = 0.3. Assume we have two observations available 2 1 to us. The mutual . with the following models: 1 − δ. From [23]. x = 1 xk = 1 k Contour plots of the optimal reward to go function for a single time step and for four time steps are shown in Fig. Consider a single stage of the example from Example 2.2. Consider the problem in which the underlying state xk ∈ {−1. and transitions according to the rule p(xk = ixk−1 = i) = 1 − 2 . Example 2. 1 − δ. x = −1 0. z α ) is a convex function of the observation model p(z α x) for a given prior distribution p(x). zk ∈ {0. Thus this is a convex maximization. Information theoretic objectives 47 expressed as a linear function of the conditional PDF. and p(xk = ixk−1 = j) = . the mutual information I(x.
5 present proofs of the bounds from existing literature. after which Sections 2. It should be noted that these observations relate to a particular choice of parameterization and continuous relaxation. the greedy heuristic and extensions thereof provide the only generally applicable solution to the sensor management problem which is able to handle problems involving a large state space.4. as described in Section 2. one can establish bounds on the loss of performance for using the greedy method rather than the optimal method (which is intractable).4 Set functions.3 provide the theoretical background required to derive these bounds. mostly from [75] and [26]. These bounds will be adapted to alternative problem structures in Chapter 3. Sections 2.. the methods described could be employed in an OLFC manner. in certain circumstances. selection of the best nelement subset of observations to maximize MI is N P complete [46]. . Deﬁnition 2. Furthermore.2. 2. Given the choice of objective. there are no known parameterizations which avoid these diﬃculties. submodularity and greedy heuristics Of the methods discussed in the previous section.48 CHAPTER 2.2.4.4. A set function f is nonnegative if f (A) ≥ 0 ∀ A. BACKGROUND information of the state and the quasiobservation z α is shown as a function of α in Fig. consider the function f : 2U → R (where 2U denotes the set of subsets of the ﬁnite set U) deﬁned as: f (A) = I(x.4.2. not necessarily to a solution which is good in any sense. i. z A ) where z A denotes the observations corresponding to the set A ⊆ U. Thus f (A) would denote the information learned about the state x by obtaining the set of observations zA. The convexity of the function implies that gradientbased methods will only converge to an extreme point. however. we make all of our observation selections before any observation values are received. 2.4. 2. For example.4 and 2.1 Set functions and increments A set function is a realvalued function which takes as its input subsets of a given set. The remarkable characteristic of this algorithm is that.2 and 2. In practice. there are known complexity results: for example.4.e. Throughout this section (and Chapter 3) we assume open loop control. 2.1 (Nonnegative).1.
8 p(xk=−1) Optimal reward for four time steps 0.1.3 0.8 Figure 2.8 1 Figure 2.Optimal reward for single time step 0.8 0.8 p(xk=−1) 0.2 0.2 0.2 0 0 0.4 α 0. Reward in single stage continuous relaxation as a function of the parameter α.4 0.2 0.2 0 0 0.6 0.2.4 0.4 0.4 0.4 0.6 p(xk=0) 0. Contour plots of the optimal reward to go function for a single time step and for four time steps. Smaller values are shown in blue while larger values are shown in red.6 0.6 p(xk=0) 0.1 0 0 0.2 0.6 0.zα) 0. . I(x.
this corresponds to the notion that including more observations must increase the information (i.e. the notion of diminishing returns. we note that ρC (A) ≥ ρC (B) ∀ B ⊇ A for any increment function ρ arising from a submodular function f . 2. A set function f is submodular if f (C ∪ A) − f (A) ≥ f (C ∪ B) − f (B) ∀ B ⊇ A. Intuitively. on average. nondecreasing function. and the set increment by ρB (A) f (A ∪ B) − f (A). Deﬁnition 2. a nondecreasing function will be nonnegative iﬀ f (∅) ≥ 0..e.. MI is an example of a nonnegative. If the observations are conditionally independent conditioned on the state. z A ) = I(x.1. z B\A z A ) ≥ 0.3 (Increment function). the entropy is reduced).2 (Nondecreasing). i. We denote the single element increment by ρj (A) f (A ∪ {j}) − f (A). A f (B) ≥ f (A) ∀ B ⊇ A. Applying the chain rule in reverse. From Deﬁnition 2.3. Deﬁnition 2. z B z A ).2 Submodularity Submodularity captures the notion that as we select more observations.4 (Submodular). set function f CHAPTER 2. then the mutual information between the state and the subset of observations selected is submodular.50 Deﬁnition 2. since I(x. Lemma 2. The following lemma due to Krause and Guestrin [46] establishes conditions under which mutual information is a submodular set function. z B ) − I(x. the increment function for MI is equivalent to ρB (A) = I(x. BACKGROUND is nondecreasing if Obviously.4. ∅) = 0 and I(x. . the value of the remaining unselected observations decreases.
Consequently. z C∪A ) − I(x. this may be intuitively understood as the notion of diminishing returns: that the new observations z C are less valuable if the set of observations already obtained is larger. Independence systems provide a basic structure for which we can construct any valid set iteratively by commencing with an empty set and adding one . While this is true on average. The simple result that we will utilize from submodularity is that I(x. Set functions.1 relies on the fact that conditioning reduces entropy. 2. z C∪B ) − I(x. submodularity and greedy heuristics 51 Proof. it is necessarily not true for every value of the conditioning variable.4. z A ) = I(x. z B ) (h) (a). z C z A ) ≥ I(x. Consider B ⊇ A: I(x. As discussed above.4. z C\A z A ) (b) (a) = I(x. z C∩(B\A) z A∪(C\B) ) ≥ I(x. nondecreasing and submodular. 2. z C\B z A ) + I(x. Throughout this document. z C\B z A ) = H(z C\B z A ) − H(z C\B x.3 Independence systems and matroids In many problems of interest. z A ) (c) (d) (e) = H(z C\B z A ) − H(z C\B x) ≥ H(z C\B z B ) − H(z C\B x) (f ) (g) = I(x. and (f ) from the fact that conditioning reduces entropy. properties which mutual information satisﬁes. the set of observation subsets that we may select possesses particular structure. (c) from nonnegativity.Sec. our proofs exploiting submodularity will apply to open loop control (where the value of future actions is averaged over all values of current observations) but not closed loop control (where the choice of future actions may change depending on the values of current observations). (e) from the assumption that observations are independent conditioned on x. (d) and (g) from the deﬁnition of mutual information. z C z B ) ∀ B ⊇ A. z C\B z B ) = I(x. we will assume that the reward function f is nonnegative. The proof of Lemma 2. (b) and (h) result from the chain rule.
Such a set is referred to as being maximal. {b. d}. then we can extend A1 = {c} with the element d in the second iteration. b}. if for some i. {a. {d}. b. {c}. {a}. F2 = {∅. {b}. c}. d}}. {a. maintaining a valid set at all times. F2 ) are independence systems. F ) is an independence system we are guaranteed that. When (U. {c}. {b. then / we cannot add it at any later iteration either. d} ∈ F3 . {a. The following example illustrates collections of sets which do and do not form independence systems. The collection F3 in the above example violates this: we cannot extend A0 = ∅ with the element d in the ﬁrst iteration. Then (U. {b}. {b. A set A ∈ F is maximal if A ∪ {b} ∈ F ∀ b ∈ U\A. d}}. d} and {c. F1 ) and (U. . {a.5.52 CHAPTER 2. Example 2. while dependent sets correspond to columns which are linearly dependent.e. c}. b. Independent sets correspond to subsets of columns which are linearly independent. for all N ⊆ U. The essential characteristic is therefore that any subset of a valid set is valid. ensuring that Ai ∈ F ∀ i. invalid) set.. F ) is an independence system in which.. F1 = {∅. all maximal sets of the collection F N {A ∈ F A ⊆ N } have the same cardinality. d}. The iterative process for constructing a set terminates when we reach a point where adding any more elements yields a dependent (i. d}. c}. F3 ) is not since {d} ⊆ {c. {a. {a}.e. d}. Deﬁnition 2. F ) is an independence system if F is a collection of subsets of U such that if A ∈ F then B ∈ F ∀ B ⊆ A. b}. but {d} ∈ F3 . yet if we choose element c in the ﬁrst iteration. while (U. {d}.6 (Maximal). {a}. A matroid (U. BACKGROUND element at a time in any order.5 (Independence system). The structure of a matroid is analogous with the that structure results from associating each element of U with a column of a matrix. i. The members of F are termed independent sets. while subsets of U which are not members of F are termed dependent sets. {a. Ai ∪ {u} ∈ F then / Aj ∪ {u} ∈ F ∀ j > i. Deﬁnition 2. if we cannot add an element at a particular iteration. (U. {c. {a. b}. {b}. c}. Deﬁnition 2.7 (Matroid). / We construct a subset by commencing with an empty set (A0 = ∅) and adding a new element at each iteration (Ai = Ai−1 ∪ {ui }). We illustrate this below Example 2. {c}. c. {c. / Matroids are a particular type of independence system for which eﬃcient optimization algorithms exist for certain problems. Let U = {a.4. {a. c}} and F3 = {∅.
Note that the deﬁnition does not require N ∈ F . Since B ∈ F A∪B . {c}. b}. d}}. If A < B then ∃ u ∈ B\A such that A ∪ {u} ∈ F . F1 ) and (U. {a}.2 with A = {a} and B = {c. {a}. c}. Lemma 2. {b. e. contradicting the maximality of A. {b}. the structure of a matroid is analogous with the structure of linearly independent columns in a matroid. c}. consider the matrices Mi = [ca cb cc cd ].2. hence A ∪ {u} ∈ F N (noting the deﬁnition of F N ). {a. Then (U. and similarly for c and d. {b. {a. {b}. e. d}}. Proof. Only if: Consider F A∪B for any A. The following example illustrates independence systems which are and are not matroids. As an illustration of this. Let U = {a. c1 1 1 ca = 1 1 0 cb = 1 1 0 cc = 1 0 1 cd = 1 0 1 This prohibits elements a and b from being selected together.g.4. b}.. B ∈ F such that A < B. {b. A cannot be maximal in F A∪B .. {d}.Sec. {a. {c}. {d}. where cu is the matrix column associated with element u in i i i i i the matrix corresponding to Fi . d}. 2. {a. B ∈ F with A < B. c}. {a. c}. d}. the columns are such that ca and 1 b are linearly dependent. F3 ) is not (applying Lemma 2. The following lemma establishes an equivalent deﬁnition of a matroid. F ) is a matroid if and only if ∀ A. ∃ u ∈ B\A such that A ∪ {u} ∈ F . the columns are such that any two are linearly independent. note that A. Let A and B be two maximal sets in F N . and cc and cd are linearly dependent.. F2 ) are matroids. B ⊆ U. {c. Set functions. {b}. In the case of F2 . b. {b. F1 above) .5. F1 = {∅. d}. An independence system (U. d}} and F3 = {∅. ca = 2 1 0 cb = 2 0 1 cc = 2 1 1 cd = 2 1 −1 The independence systems corresponding to each of the following problems may be seen to be a matroid: • Selecting the best observation out of a set of at each time step over an N step planning horizon (e.g. d}. Example 2. {d}. {a}. As discussed earlier. {c.g. {c}. submodularity and greedy heuristics 53 The collection F N represents the collection of sets in F whose elements are all contained in N . while (U. If: Consider F N for any N ⊆ U. c. {a. F2 = {∅. In the case of F1 . hence ∃ u ∈ B\A such that A ∪ {u} ∈ F . d}).
. The . such that no more than K observations selected in total 2. Denote by O the optimal set.4 Greedy heuristic for matroids The following is a simpliﬁed version of the theorem in [77] which establishes the more general result that the greedy algorithm applied to an independence system resulting 1 from the intersection of P matroids achieves a reward no less than P +1 × the reward achieved by the optimal set.8 achieves a reward no less than 0. .3. 2.5× the reward achieved by the optimal set.2).g. ON = O. Deﬁne Oi−1 Oi \{oi }. The greedy algorithm for selecting a set of observations in a matroid (U. 1.. Deﬁne N O. .8. the new element selected is: gi = arg max ρu (Gi−1 ) u∈U \Gi−1 Gi−1 ∪{u}∈F where Gi = Gi−1 ∪ {gi }. The proof has been simpliﬁed extensively in order to specialize to the single matroid case. Applied to a submodular. nondecreasing function f (·) on a matroid. and / Gi−1 ∪ {oi } ∈ F (such an element exists by Lemma 2. . the simpliﬁcations illuminate the possibility of a diﬀerent alternative theorem which will be possible for a diﬀerent style of selection structure as discussed in Chapter 3. F ) commences by setting G0 = ∅. }.8. Proof. The algorithm terminates when the set Gi is maximal.4. Without loss of generality. assume that O = G (since all maximal sets have the same cardinality. BACKGROUND • Selecting the best kelement subset of observations out of a set of n (e. and by G the set chosen by the algorithm in Deﬁnition 2. . . N − 2. . To our knowledge this bound has not previously been applied in the context of maximizing information. Deﬁnition 2. and since the objective is nondecreasing). and for each i = N − 1.54 CHAPTER 2. Theorem 2. the greedy algorithm in Deﬁnition 2. take oi ∈ Oi such that oi ∈ Gi−1 . F2 above) • The combination of these two: selecting the best ki observations out of a set of ni at each time step i over an N step planning horizon • Selecting up to ki observations out of a set of ni at each time step i over an N step planning horizon. At each stage i = {1.
the bound obtainable is tighter. 2. Then the greedy algorithm applied to the nondecreasing submodular function f (·) on the independence system (U. (d) is a simple rearrangement using the above construction O = {o1 .632× the reward achieved by the optimal set. submodularity and greedy heuristics 55 bound can be obtained through the following steps: (a) f (O) − f (∅) ≤ f (G ∪ O) − f (∅) (b) ≤ f (G) + j∈O\G (c) ρj (G) − f (∅) ρj (G) − f (∅) j∈O N ≤ f (G) + (d) = f (G) + i=1 N ρoi (G) − f (∅) ρoi (Gi−1 ) − f (∅) i=1 N (e) ≤ f (G) + (f ) ≤ f (G) + i=1 (g) ρgi (Gi−1 ) − f (∅) = 2(f (G) − f (∅)) where (a) and (c) result from the nondecreasing property. oN }. (b) and (e) result from submodularity. . . . In this case. N  ≤ K}. and (g) is a simple rearrangement of (f ).8. Krause and Guestrin [46] ﬁrst recognized that the bound can be applied to sensor selection problems with information theoretic objectives. F ) achieves a reward no less than (1 − 1/e) ≈ 0. Set functions. as opposed to an arbitrary matroid.4. The theorem comes from [76]. 2. .t. Suppose that F = {N ⊆ U s.4.5 Greedy heuristic for arbitrary subsets The following theorem specializes the previous theorem to the case in which F consists of all Kelement subsets of observations. Theorem 2. which addresses the more general case of a possibly decreasing reward function.Sec. .4. (f ) is a consequence of the structure of the greedy algorithm in Deﬁnition 2.
56 CHAPTER 2. . i ∈ {1. . For each stage of the greedy algorithm. and by G the set chosen by the algorithm in Deﬁnition 2. .t. K}: i f (O) − f (∅) ≤ j=1 ρgj (Gj−1 ) + Kρgi+1 (Gi ) (2. K}.3: f (O) ≤ f (Gi ) + j∈O\Gi ρj (Gi ) By deﬁnition of gi+1 in Deﬁnition 2. i ∈ {1. Denote by O the optimal Kelement subset. parameterized by Z: K P (Z) = min j=1 ρj i (2.8.92) Now consider the linear program in variables ρ1 . Kxi + j=i+1 xj = 1. . i ∈ {1. we can write i f (Gi ) = f (∅) + j=1 ρgj (Gj−1 ) where we use the convention that G0 = ∅. . hence f (O) ≤ f (Gi ) + j∈O\Gi ρgi+1 (Gi ) = f (Gi ) + O\Gi ρgi+1 (Gi ) ≤ f (Gi ) + Kρgi+1 (Gi ) By deﬁnition of the increment function ρ. K} Taking the dual of the linear program: K D(Z) = max j=1 Zxj K s. . . . . . . ρgi+1 (Gi ) ≥ ρj (Gi ) ∀ j. BACKGROUND Proof. . . . K} The system of constraints has a single solution which can found through a backward recursion commencing with xK : xK = K −1 (K − 1)K−1 1 . . . . .t. xK−1 = .8. we can write from line (b) of the proof of Theorem 2. Z ≤ j=1 ρj + Kρi+1 . . x1 = K K2 KK . . . Thus we can write ∀ i ∈ {1. . . ρK+1 . .93) s.
(2. any set O which respects the series of inequalities in Eq. 1− K −1 K K → 1 − 1/e.t. Ax = b x≥0 (2. Ax ≥ b Any problem of the form Eq.632[f (O) − f (∅)] K −1 K K > 1 − 1/e ∀ K > 1. 2.5.Sec.t.92) must have K ρgj (Gj−1 ) = f (G) − f (∅) ≥ 1 − j=1 K −1 K K [f (O) − f (∅)] Finally. is also the solution to P (Z). Thus. (2. 2.5. the primary source for the material is [12]. This section brieﬂy outlines the idea behind linear and integer programming. Linear and integer programming 57 which yields D(Z) = Z 1 − K −1 K K which.95) .1 Linear programming Linear programming is concerned with solving problems of the form: min cT x x (2.94) can be converted to the standard form: min cT x x s.94) s. K → ∞ 2. by strong duality.5 Linear and integer programming Chapter 4 will utilize an integer programming formulation to solve certain sensor resource management problems with manageable complexity. note that 1− Thus f (G) − f (∅) ≥ [1 − 1/e][f (O) − f (∅)] ≥ 0.
Interior point methods have been observed to be able to reduce the duality gap by a factor γ in a problem of size n in an average number of iterations of O(log n log γ). occasionally executing the algorithm to produce additional constraints that were previously violated. and returns one or more violated constraints if any exist. 2. it is desirable to be able to ﬁnd a solution without explicitly considering all of the optimization variables. We assume availability of an eﬃcient algorithm that tests whether incorporation of additional variables would be able to improve on the present solution. this provides an upper bound on the improvement possible through further optimization. Constraint generation is commonly used to solve large scale problems without explicitly considering all of the constraints. The method involves iterative solution of a problem involving the a small subset of the constraints in the full problem. The process terminates when we ﬁnd an optimal solution to the subproblem which does not violate any constraints in the full problem.2 Column generation and constraint generation In many large scale problems.5. producing additional variables to be added to the subset. and a dual problem which provides a lower bound on the optimal objective value achievable. the √ worstcase behavior is O( n log γ). The diﬀerence between the current primal solution and the dual solution is referred to as the duality gap. Constraint generation may be interpreted as column generation applied to the dual problem. . The method involves the iterative solution of a problem involving a small subset of the variables in the full problem.58 CHAPTER 2. Column generation is a method which is used alongside the revised simplex method to achieve this goal. The process terminates when the problem involving the current subset of variables reaches an optimal solution and the algorithm producing new variables ﬁnds that there are no more variables able to produce further improvement. Primaldual methods simultaneously manipulate both the original problem (referred to as the primal problem). We assume availability of an eﬃcient algorithm that tests whether the current solution violates any of the constraints in the full problem. Occasionally this algorithm is executed. The method proceeds by optimizing the problem involving the subset of constraints. BACKGROUND The two primary mechanisms for solving problems of this type are simplex methods and interior point methods.
2.95).3 Integer programming Integer programming deals with problems similar to Eqs.3. Relaxations Two diﬀerent relaxations are commonly used in integer programming: the linear programming relaxation. In this case.96) Some problem structures possess a property in which. The linear programming relaxation is exactly that described in the previous section: solving the linear program which results from relaxing the integrality constraint. then this point is also optimal in the original integer programming problem. It can be shown that the Lagrangian relaxation provides a tighter lower bound than the linear programming relaxation. Linear and integer programming 59 2.. the dual problem also provides a lower bound on the optimal cost attainable in the integer program.t. By weak duality. However. Lagrangian relaxation involves solution of the Lagrangian dual problem. since the primal problem involves a discrete optimization strong duality does not hold.94) and (2. and the Lagrangian relaxation. A common example is network ﬂow problems in which the problem data takes on integer values.e. As described in Section 2. and there may be a duality gap (i. . If the problem possesses the necessary structure that there is an integer point that attains the optimal objective in the relaxed problem.2. however the solution of the linear programming relaxation provides a lower bound to the solution of the integer program (since a wider range of solutions is considered). solution of the linear program (with the integrality constraint relaxed) can provide the optimal solution to the integer program.5. in general there will not be an integer programming solution that obtains the same cost as the solution of the Lagrangian dual). In general the addition of the integer constraint dramatically increases the computational complexity of ﬁnding a solution. when the integer constraint is relaxed. there remains an integer point that attains the optimal objective.Sec.5. In general this will not be the case. Ax = b x≥0 x integer (2. (2. but in which the optimization variables are constrained to take on integer values: min cT x x s.
i. where in the ﬁrst we ﬁx x1 = 0. Then the optimal solution cannot have x1 = 0. 1}N . The basic concept is to divide the feasible region into subregions. we add a constraint that is violated by the solution of the linear program in the current iteration. exploit this fact. Any time we ﬁnd that a subregion R has a lower bound that is greater than a feasible solution found in another subregion.. the integer points which satisfy the various constraints).5. so this subregion may be discarded without further investigation. one of the two most common methods for solving integer programs.t. and in the second we ﬁx x1 = 1..e. it can be diﬃcult to ﬁnd constraints with the necessary characteristics.96) (i. furthermore we ﬁnd that the objective of the subproblem in which we ﬁx the value x1 = 0 is bounded below by 4. For example. BACKGROUND Let X be the set of feasible solutions to the integer program in Eq. and simultaneously search for “good” feasible solutions and tight lower bounds within in each subregion. At each iteration. There are two diﬃculties associated with this method: ﬁrstly. the region R can be discarded.97) s. commencing with the linear programming relaxation. but is satisﬁed by every integer solution in the the original problem (i. If. every point in X ). Suppose we branch on variable x1 . suppose that we are dealing with a binary problem.5. We solve a series of linear programs. . Linear programming relaxations and cutting plane methods are commonly used in concert with a branch and bound approach to provide the lower bounds. Branch and bound Branch and bound is the other commonly used method for solving integer programming problems. and let CH(X ) be the convex hull of these points. we ﬁnd an integer solution that is optimal. (2. and secondly. this is the optimal solution to the original problem.60 Cutting plane methods CHAPTER 2. (2. Then the optimal solution of Eq. and approaches CH(X ).. and. Thus the feasible region is slowly reduced. x ∈ CH(X ) Cutting plane methods.e. it may be necessary to generate a very large number of constraints in order to obtain the integer solution.e. where the feasible set is X = {0. at any stage.96) is also an optimal solution of the following linear program: min cT x x (2. we divide the feasible region up into two subregions. Suppose we ﬁnd a feasible solution within the subregion in which x1 = 1 with objective 3.
the beam scheduling problem is cast as a multiarm bandit. In [55]. The sections below summarize a number of strategies proposed by diﬀerent authors. 57]. Krishnamurthy and Evans present two sensor management methods based upon POMDP methods. A similar method is proposed for adaptive target detection in [60].2. the reward to go function remains piecewise linear concave at each time in the backward recursion and POMDP methods can be applied. Many problems of practical interest cannot be represented with this restriction. which decouples the problem into a series of single object problems coupled only through the search for the correct values of the Lagrange multipliers. 56. can be found using a method similar to that discussed in Section 2. six and 25. In [56.1 POMDP and POMDPlike models In [55. a dual solution. This reveals the primary limitation of this category of work: its inability to address problems with large state spaces. Under these assumptions. although many of the methods do not ﬁt precisely into any one category. Krishnamurthy proposes a similar method for waveform selection problems in which the per stage reward is approximated by a piecewise linear concave function of the conditional PDF. it is proven that there exists an optimal policy in the form of index rule.3. the primary limitation of this method is the requirement for the state space to be small enough that traditional POMDP solution methods should be able to . In [56].6. We coarsely categorize the material. 2.6. In this regime. Related work 61 2. observations needing diﬀerent time durations to complete are naturally addressed.Sec. Computational examples in [55. it is suggested that the continuous kinematic quantities could be discretized into coarse quantities such as “near” and “far”. assuming that the conditional PDF of unobserved objects remains unchanged between decision intervals (this is slightly less restrictive than requiring the state itself to remain unchanged).7 by contrasting our approach to the work described. 59] utilize underlying state space alphabets of three. 2. Again.6. By relaxing the sample path constraints to being constraints in expectation. We conclude in Section 2. and that the index function is piecewise linear concave. 57].6 Related work The attention received by sensor resource management has steadily increased over the past two decades. Casta˜´n [18] formulates the problem of beam scheduling and waveform selection no for identiﬁcation of a large number of objects as a constrained dynamic program. By requiring observations at diﬀerent times in the planning horizon to have identical characteristics. The method is extended in [16] to produce a lower bound on the classiﬁcation error performance in a sensor network.
but the object state can be expanded to incorporate continuous object features in order to regain the required conditional independence). the solution method from multiarm bandit problems method may be applied to beam steering. Information theoretic objectives are commonly used with this method.6. 2.3 Suboptimal control Common suboptimal control methods such as rollout [9] have also been applied to sensor management problems. In the context of object identiﬁcation. after a minor transformation of the cost function.4 Greedy heuristics and extensions Many authors have approached the problem of waveform selection and beam steering using greedy heuristics which choose at each time the action which maximizes some instantaneous reward function. et al [93] observe that.2 Model simpliﬁcations Washburn. The ideas are explored further in [85]. The policy based on this assumption is then used as a base policy in a rollout with a one or two step lookahead. 2. this state space alphabet size restriction precludes addition of latent states such as object features (observations will often be dependent conditioned on the object class. He and Chong [29] describe how a simulationbased rollout method with an unspeciﬁed base policy could be used in combination with particle ﬁltering. this may be expressed equivalently as • Minimizing the conditional entropy of the state at the current time conditioned on the new observation (and on the values of the previous observations) • Maximizing the mutual information between the state at the current time and the new observation . Nedich.6. and simplifying stochastic disturbance to simple models such as detection and no detection. assuming that the state of unobserved objects remains unchanged. The authors also suggest methods for practical application such as selecting as the decision state an estimate of the covariance matrix rather than conditional PDF. 2. and utilize an approximation of the cost to go of a heuristic base policy which captures the structure of the future reward for the given scenario. BACKGROUND address the decoupled single object problem.6. et al [74] consider tracking movestop targets.62 CHAPTER 2.
98) McIntyre and Hintz [69] apply the greedy heuristic. Mahler [66] discusses the use of a generalization of KullbackLeibler distance. Optimization criteria include posterior mean square error and validation gate volume. then the greedy heuristic operates according to the following rule: µg (Xk ) = arg max gk (Xk . An analysis of the signal ambiguity function provides a prediction of the of the posterior covariance matrix. choosing mutual information as the objective in each stage to trade oﬀ the competing tasks of searching for new objects and maintaining existing tracks. i. if gk (Xk . 54] apply a greedy heuristic to the sensor management problem in which the joint PDF of a varying number of objects is maintained using a particle ﬁlter. Related work 63 • Maximizing the KullbackLeibler distance between the prior and posterior distribution. Extensions to a twostep lookahead using additional simulation.e. • Minimizing the KullbackLeibler distance between the posterior distribution and the posterior distribution that would be obtained if all possible observations were incorporated. for sensor resource management within the ﬁnite set statistics a framework. The use of a greedy heuristic is justiﬁed by a restriction to constant energy pulses. The framework is extended to consider higher level goals in [31]. 52. where the number of objects is unknown and the PDF is invariant to any permutation of the objects. et al [44] apply the greedy heuristic with an information objective to landmine detection. The objective function used is Renyi entropy. 2. and a rollout approach using a heuristic reward to go capturing the structure of longterm reward due to expected visibility and obscuration of objects are proposed in [53]. addressing the additional complexity which occurs when sensor motion is constrained. Kreucher. motivations for using this criterion are discussed in [50.6. uk ) k uk ∈Uk k X (2. uk ) is the reward for applying control uk at time k. et al [86] show how the control variates method can be used to . Kershaw and Evans [42] propose a method of adaptive waveform selection for radar/sonar applications. 50. The method is extended to consider the impact of clutter through Probabilistic Data Association (PDA) ﬁlter in [43].. Singh. 51]. Formally. Use of these objectives in classiﬁcation problems can be traced back as early as [61]. 51. Kolba. et al [49. global Csisz´r cdiscrimination. Hintz [32] and Kastella [41] appear to be two of the earliest instances in sensor fusion literature.Sec.
Computational examples demonstrate a reduction of the tree width by a factor of around ﬁve. Zwaga and Driessen examine the problem of selecting revisit rate and dwell time for a multifunction radar to minimize the total duty cycle consumed subject to a constraint on the postupdate covariance [97] and prediction covariance [96].64 CHAPTER 2. Both methods consider only the current observation interval. e. the sensor network object tracking problem is approached by minimizing energy consumption subject to a constraint on estimation performance (measured using Cram´rRao bounds).g. entropy and KullbackLeibler distance. thus the ﬁrst sequence can be pruned from the search tree.. If two candidate sequences obtain covariance matrices P1 and P2 . BACKGROUND reduce the variance of estimates of KullbackLeibler divergence (equivalent to mutual information) for use in sensor management. and evaluating the estimation performance until a feasible solution is found (i. The solution method uses a greedy heuristic. two and three.. their experiments use a greedy heuristic. Logothetis and Isaksson [65] provide an algorithm for pruning the search tree in the problems involving control of linear GaussMarkov systems with information theoretic criteria. Mahalanobis distance. and performs a bruteforce enumeration of all sequences within the planning horizon. Experiments are performed using planning horizons of one. The e method constructs an open loop plan by considering each candidate solution in ascending order of cost. et al [21] examine scheduling of radar and IR sensors for object tracking to minimize the mean square error over the next N time steps. one which meets the estimation criterion). Chhetri. et al [95] discuss object tracking in sensor networks. ensuring that the covariance matrix of a process P meets a speciﬁcation S in the sense that P − S 0. This objective allows the system designer to dictate a more speciﬁc performance requirement. In [20].e. Computational examples use planning horizons of one. and P1 P2 . two and three. . Kalandros and Pao [40] propose a method which allows for control of process covariance matrices. inconsistencies with the measures proposed in this paper are discussed in [25]. The method proposed utilizes a linearized Kalman ﬁlter for evaluation of the error predictions. then the total reward of any extension of the ﬁrst sequence will be less than or equal to the total reward of the same extension of the second sequence. Zhao. proposing methods based on greedy heuristics where the estimation objectives include nearest neighbor.
P [missed detection] = P [false alarm]) according to the minimum probability of error criterion. in which each observation incurs a cost. The suggested solutions include minimizing the resources necessary to satisfy a constraint on entropy. For this latter bound to apply. and there is a maximum total cost that can be expended. In [33]. it is necessary to perform a greedy selection commencing from every threeelement subset. et al prove optimality of greedy methods for the problem of beam scheduling of independent onedimensional GaussMarkov processes when the cost per stage is set to the sum of the error variances. Casta˜´n shows that a greedy heuristic is optimal for no the problem of dynamic hypothesis testing (e.. No clear guidance is given on eﬃcient implementation of the optimization problem resulting from either case.6.4. searching for an object among a ﬁnite set of positions) with symmetric measurement distributions (i.5.g. and how to design the waveform library a priori . Krause and Guestrin [46] apply results from submodular optimization theory to establish the surprising and elegant result that the greedy heuristic applied to the sensor subset selection algorithm (choosing the best nelement subset) is guaranteed to achieve performance within a multiple of (1 − 1/e) of optimality (with mutual information as the objective). and discusses issues which arise when the reward to go values are computed approximately.Sec.6 Other relevant work Berry and Fogg [8] discuss the merits of entropy as a criterion for radar control. 2. et al [70] discuss sensor management for radar. incorporating both selection of which waveform from a library to transmit at any given time. The method does not extend to multidimensional processes. little work has been done to ﬁnd guarantees of performance. Related work 65 2. and demonstrate its application to sample problems. This analysis is applied to the selection of sensor placements in [47].e.5 Existing work on performance guarantees Despite the popularity of the greedy heuristic. Moran. A similar performance guarantee is also established in [46] for the budgeted case. Howard.6.6. as discussed in Section 2. In [17]. . and the sensor placement model is extended to incorporate communication cost in [48]. and selecting which targets to observe in a planning horizon in order to minimize entropy.. Hernandez. et al [30] utilize the posterior Cram´rRao bound as a criterion for e iterative deployment and utilization of sonar sensor resources for submarine tracking. The paper also establishes a theoretical guarantee that no polynomial time algorithm can improve on the performance bound unless P = N P . 2.
many authors have applied greedy heuristics and shorttime extensions (e. using open loop plans over two or three time steps) to sensor management problems using criteria such as mutual information. the elements of which correspond to the modes in which we may operate the sensor in that time slot.6. [46] is the ﬁrst generally applicable performance guarantee to a problem with structure resembling the type which results to sensor management. The branch and bound method used exploits quadratic structure that results when the performance criterion is based on two states (i. providing the surprising result that a similar guarantee applies to the greedy heuristic applied in a sequential fashion. including tighter bounds that exploit either The search is greedy in that proposed locations are accepted with probability 1 if the objective is improved and probability 0 otherwise.. the subset remains ﬁxed in between sensor deployments. Thus it is surprising that little work has been e done toward obtaining performance guarantees for these methods.g. An integer programming formulation using branch and bound techniques enables optimal solution of problems involving 50–70 sensors in tens of milliseconds. 2. 6 .6.6. The typical constraint structure is one which permits selection of one element (or a small number of elements) from each of these sets..7 Contrast to our contributions As outlined in Section 2. BACKGROUND Sensor positions are determined using a greedy6 simulated annealing search. this result is not directly applicable to sensor management problems involving sequential estimation (e. rather than a single set of observations. where the objective is the estimation error at a particular time instant in the future.5. Selection of the subset of sensors to activate during tracking is performed either using bruteforce enumeration or a genetic algorithm. rather than a total of n elements from a larger subset. position in two dimensions).g. where there is typically a set of observations corresponding to each time slot. However. Several extensions are obtained. The analysis in Chapter 3 extends the performance guarantees of [46] to problems involving this structure. even though future observation opportunities are ignored. As discussed in Section 2. The result is applicable to a large class of timevarying models..4. mean square error and the posterior Cram´rRao bound. object tracking).e.66 CHAPTER 2. The problem of selecting the subset of sensors to activate in a single time slot to minimize energy consumption subject to a mean square error estimation performance constraint is considered in [22].
and applicability to closed loop problems. 2. experiments utilize up to 60 time slots. .6. The trade oﬀ between estimation performance and communication cost is formulated by maximizing estimation performance subject to a constraint on energy cost.e. nondecreasing objective function. Examples demonstrate that the bounds are tight. Chapter 5 approaches the problem of sensor management in sensor networks using a constrained dynamic programming formulation. or they do not allow extensions to planning horizons past two or three time slots due to the computational complexity. or the dual of this. Heuristic approximations that exploit the problem structure of tracking a single object using a network of sensors again enable planning over dramatically increased horizons. The method can be applied to any submodular. Related work 67 process diﬀusiveness or objectives involving discount factors. Many of the existing works that provide nonmyopic sensor management either rely on very speciﬁc problem structure. We also show that several of the results may be applied to the posterior Cram´rRao bound. The development in Chapter 4 provides an integer programming approach that exploits submodularity to ﬁnd optimal or nearoptimal open loop plans for problems involving multiple objects over much longer planning horizons. and counterexe amples illuminate larger classes of problems to which they do not apply. minimizing energy cost subject to a constraint on estimation performance.Sec. Finally. yet still captures the essential structure of the underlying trade oﬀ. and hence they are not generally applicable. Simulation results demonstrate a signiﬁcant reduction in the communication cost required to achieve a given estimation performance level as compared to previously proposed algorithms. i. The method is both computable and scalable.. and does not require any speciﬁc structure in dynamics or observation models.
68 CHAPTER 2. BACKGROUND .
3 presents a similar guarantee for problems involving a discounted objective.2 examines a potentially tighter guarantee which exists for processes exhibiting diﬀusive characteristics. where each subset of observations corresponds to a diﬀerent time stage of the problem.2 and 3. This structure is naturally applicable to dynamic models.4 and 2. with both online and oﬄine variants. The additional requirements in Sections 3.4 apply to algorithms with a particular selection structure: ﬁrstly. ignorant of the remainder of the time horizon.. This section develops a bound which is closely related to the matroid selection case. Sections 3. in which we have N subsets of observations and from each we can select a subset of a given cardinality. We commence the chapter by deriving the simplest form of the performance guarantee.1 and 3. Section 3. Section 3. The analysis establishes that the same performance guarantee that applies to the matroid selection problem also applies to this problem. while Section 3. While the results in Sections 3. they are applicable to a wider class of objectives. e. except that we apply additional structure. In Section 3.5 then extends the results of Sections 3.Chapter 3 Greedy heuristics and performance guarantees HE performance guarantees presented in Section 2. Our selection algorithm allows us to select at each time the observation which maximizes the reward at that time.6.1. we demonstrate that the log of the determinant of the Fisher information matrix is also submodular and nondecreasing. where one can select any set within a matroid. in Section 3. and thus that the various guarantees of T 69 .1–3. nondecreasing objective for which the reward of an empty set is zero (similar to the requirements for the results in Sections 2. we can select one observation at each time stage.1–3.5).3 to closed loop policies.4.3 apply to any submodular.5 are discussed as the results are presented.5 are presented in terms of mutual information.g.4. and secondly where one can select any arbitrary subset of a given cardinality.
(d) from the deﬁnition of the greedy heuristic. z1 1 . e Section 3. z2 2 ) ≤ I(X. and (e) from a reverse application of the chain rule. Finally. z22 ) + I(x. z22 .5. not on the resulting observation value). z2 2 z11 ) g g g ≤ 2I(X. z22 z11 ) (where conditioning is on the random variable z11 . z22 ) where (a) results from the nondecreasing property of MI.5. Then the following analysis establishes a performance guarantee for the greedy algorithm: g g o o o o I(X. z11 ) + 2I(X.. z11 . (b) is an application of the MI chain rule. z22 . Let {o1 . Let {g1 . g2 } denote the choice made g by the greedy heuristic. z11 ) and g2 is chosen g g g to maximize I(X. z2 2 z11 . The following assumption establishes the basic structure: we have N 1 Note that this is considering only open loop control.4. where at each step we must choose a single observation (e. z22 z11 ) + I(X.4. z1 1 ) g g g g o o ≤ I(X. consider a simple sequential estimation problem involving two time steps.70 CHAPTER 3. where g1 is chosen to maximize I(X. z2 2 ). the proof directly follows the above steps.g.1 Theorem 3. Thus the optimal performance can be no more than twice that of the greedy heuristic. z1 1 . we will discuss closed loop control in Section 3. the performance of the greedy heuristic is at least half that of the optimal. z11 ) + I(X. which maximizes I(X. (c) results from submodularity (assuming that all observations are independent conditioned on X). or. z1 1 .4.4.9 presents a negative result regarding a particular extension of the greedy heuristic to platform steering problems. 3.1 and Section 3. conversely. z1 1 z11 . . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES Sections 2.4. in which mode to operate a sensor) from a diﬀerent set of observations.1 presents this result in its most general form. 3. z22 z11 ) + I(X. z22 z11 ) (a) (c) (d) (e) g g = 2I(X.3 can also be applied to the determinant of the covariance through the Cram´rRao bound.8 presents examples of some slightly diﬀerent problem structures which do not ﬁt into the structure examined in Section 3.1 A simple performance guarantee To commence. z11 . z11 ) + I(X. 2.1 and 3. z1 1 ) + I(x.2. but are still able to be addressed by the prior work discussed in Section 2. z2 2 ) (b) g g g g g g g o o o = I(X. o2 } denote the optimal choice for o o the two stages. Section 3. The goal is to maximize the information obtained about an underlying quantity X.
zw11 . zNN }}. . i. . A simple performance guarantee 71 sets of observations.. zwM ) ≤ 2I(X. zw11 . . . zw11 . . wM ) (where wi ∈ {1. selecting a single additional observation at each time). . .. . . . zwM ). .1 has performance guaranteed by the following expression: o o g g I(X.1. . . wM ) allows us to visit observation sets more than once (allowing us to select multiple observations from each set) and in any order. {z2 . . .. . . . . we visit the ith set of observations ki times.1. . . . z2 2 }. the one which maximizes M o o I(X.. . . We are now ready to state the general form of the performance guarantee.Sec. . . zwM ) M M o o where {zw11 . . zi i }). There are N sets of observations. . thus N i=1 ki = M . The greedy heuristic operating on this structure is deﬁned below.. . {{z1 . the greedy heuristic in Deﬁnition 3. . n n 1 1 Assumption 3. . zwj zw11 .. . Obviously we require {j ∈ {1. .. . . The sequence (w1 . in the ith stage we select a previously unselected observation out of the wi th set). Theorem 3. M . The abstraction of the observation set sequence (w1 . . .e. Deﬁnition 3. . zwM } is the optimal set of observations. . . . . and we can select a speciﬁed number of observations from each set in an arbitrary order. . . . .1.1. .nwj } g We assume without loss of generality that the same observation is not selected twice since the reward for selecting an observation that was already selected is zero. . . . which are mutually independent conditioned on the quantity to be estin 1 mated (X). .e. N } ∀ i) speciﬁes the order in which we visit observation sets using the greedy heuristic (i. . Any ki observations can be chosen out of the ith set ({zi . n 1 {zN . zwj−1 ) j−1 u∈{1. 3. M }wj = i} = ki ∀ i (i.e. The greedy heuristic operates according to the following rule: u g gj = arg max I(X. z1 1 }. . . .1. . Under Assumption 3.
2 Tightness of bound The bound derived in Theorem 3. to achieve the guarantee on matroids it is necessary to consider every observation at every stage of the problem. . zw11 . (c) results from submodularity. . all of the previous works described in Section 2. it is far more desirable to be able to proceed in a dynamic system by selecting observations at time k considering only the observations available at that time. . . as the following example shows. 3. . . zwjj zw11 . zwj−1 ) j−1 g g g g = 2I(X. . . . However. . . . zwj−1 ) + I(X. . zwj−1 ) + I(X. The performance guarantee is obtained through the following steps: o o I(X.5 − . zwjj zw11 . . . zwjj zw11 . b]T where a and b are independent binary random variables with P (a = 0) = P (a = 1) = 0. . . zwjj zw11 . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES Proof. Computationally. . zwM .6. . .3) provides another algorithm with the same guarantee for problems of this structure. .1.4 do just that).5 + for some > 0. Example 3.5 and P (b = 0) = 0. The freedom of choice of the order in which we visit observation sets in Theorem 3. . zwM ) M (a) g g o o ≤ I(X. disregarding future time steps (indeed.4. . . . (d) from Deﬁnition 3. . . zw11 .1. zwjj zw11 . .1.1 Comparison to matroid guarantee The prior work using matroids (discussed in Section 2.1. zwj−1 ) j−1 j−1 M j=1 (c) M g g o o (b) = ≤ j=1 (d) M g g I(X. . zwM ) M M M g g g o I(X. . .1 extends the performance guarantee to this commonly used selection structure. P (b = 1) = 0.1 can be arbitrarily close to tight. . . zw11 . zwM ) M where (a) is due to the nondecreasing property of MI. . . . . and (e) from a reverse application of the chain rule. 3. . . (b) is an application of the MI chain rule. zw11 . zwM . .72 CHAPTER 3. We have two sets of observations with n1 = 2. . zwj−1 ) j−1 j g g o g ≤2 j=1 (e) g I(X.zw11 . Consider a problem with X = [a.
3. At the second stage. zwM ) M + i=1 j=1 N g g I(X. .e. At the second stage we have one choice. or z1 = b for reward I(X. Under the same assumptions as Theorem 3. In the ﬁrst set of observations we may measure z1 = a for 1 2 2 reward I(X. as it yields a higher 2 reward (1) than z1 = b (1 − δ( )). 1 The greedy algorithm selects at the ﬁrst stage to observe z1 = a. . The total reward is 2 − δ( ). . N } ¯ ¯ deﬁne ki = min{ki . zi i zw11 . 3. and for each j ∈ {1. . Eq. ni − ki }. . . ki } deﬁne gi = ¯j arg max u∈{1. . where δ( ) > 0 ∀ > 0.3 Online version of guarantee Modifying step (c) of Theorem 3. .3) uses the fact that ¯j gj ¯ gj ¯ g g g g I(X. .ni }−{¯i l<j} gl u g g I(X.2) g g ≤ I(X. . apply: N o o I(X. for each i ∈ {1. (3. . followed by the second set. By choosing arbitrarily close to zero... zi 1 zw11 .1) Then the following two performance guarantees. we can also obtain an online performance guarantee. .1 through submodularity and the deﬁnition of gi in Eq. . . zi i zw11 .. .4 and 3. z1 ) = H(b) = 1 − δ( ). which selects observation z1 = b for 1 reward 1 − δ( ). . (3.1.2) is obtained directly from step (b) of Theorem 3.. zwM ) M (3. . we may make the ratio of optimal reward to greedy reward.5). zw11 . which are computable online. . The total reward is 1. . zwM ) + M i=1 g ¯ g g ¯ ki I(X. . zwM ) M 1 (3. hence the observation at the second stage yields zero reward. Theorem 3. A simple performance guarantee 73 1 n2 = 1 and k1 = k2 = 1. . zi zw11 . zwM ) M ¯ ki ≤ g g I(X. . which will often be substantially tighter in practice (as demonstrated in Sections 3. zw11 . . i. we visit the ﬁrst set of observations once. .1). . .3) Proof.1. arbitrarily close to 2. zwM ) M gj ¯ (3. 2).. . 1 z2 = a.2.1. 2 − δ( ). . . . . . .Sec.1. . .1. zi 2 zw11 . (3. 2 Compare this result to the optimal sequence. . zwM ) ≥ I(X. the algorithm already has the exact value for a. zwM ) for any j1 ≤ j2 . M M . . z1 ) = H(a) = 1. Our walk is w = (1. .1. and δ( ) → 0 as → 0. and then gains a reward of 1 from the second observation z2 . . . zw11 . The expression in Eq.
.05 Fig. commencing from k an initial distribution x0 ∼ N {x0 . Q} are independent white Gaussian noise processes.1. y x k k where velocity is modelled as a continuoustime random walk with constant diﬀusion strength q (independently in each dimension). not just the choice made by the greedy heuristic in Deﬁnition 3. Denoting the sampling interval as T = 1.1 shows the bound on the fraction of optimality according to the guarantee of .05 0 0 P0 = 0 0 0. The state k i i xi is assumed to be position and velocity in two dimensions (xi = [pi vx pi vy ]T ). The online bound will tend to be tight in cases where the amount of information remaining after choosing the set of observations is small. P0 }. k. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES The online bound in Theorem 3.4 Example: beam steering Consider the beam steering problem in which two objects are being tracked. Q = q 2 F= 3 2 0 0 T3 T2 0 0 1 T T2 0 0 0 1 T 0 0 2 At each time instant we may choose between linear Gaussian measurements of the position of either object: zi = k 1 0 0 0 0 0 1 0 xi + v i k k where v i ∼ N {v i .1 0.1 0 0 0. 3. the corresponding discretetime model is: 3 T T2 1 T 0 0 0 0 T32 2 0 1 0 0 T 0 0 . and position is the integral of velocity. gM ). 3. . . 0. 0. independent k k of wj ∀ j.74 CHAPTER 3. (g1 .1 0 0 0. . where 0.2 can be used to calculate an upper bound for the optimal reward starting from any sequence of observation choices. I} are independent white Gaussian noise processes.5 0.5 0. 0.1 0. Each object evolves according to a linear Gaussian process: xi = Fxi + wi k+1 k k where wi ∼ N {wk . The objects are tracked over a period of 200 time steps.1.
6 10−6 10−5 10−4 10−3 10−2 10−1 Diﬀusion strength q 100 101 102 Figure 3.66 0.2.1. (a) shows total reward accrued by the greedy heuristic in the 200 time steps for diﬀerent diﬀusion strength values (q).72 0.76 Fraction of optimality 0.(a) Reward obtained by greedy heuristic and bound on optimal 2200 Greedy reward 2000 Bound on optimal 1800 1600 1400 1200 1000 800 600 400 200 0 10−6 10−5 10−4 Total MI 10−3 10−2 10−1 Diﬀusion strength q 100 101 102 (b) Factor of optimality from online guarantee 0. (b) shows the ratio of these curves.62 0. .64 0.74 0. providing the factor of optimality guaranteed by the bound. and the bound on optimal obtained through Theorem 3.7 0.68 0.
we may steer our sensor to measure the depth of any point within a given region around the current position (as depicted by the dotted ellipses). The images display the marginal entropy of each cell in the grid.2(a)) to map the depth of the ocean ﬂoor in a particular region. 3. the quantity being estimated. the greedy controller obtains close to the optimal amount of information as diﬀusion decreases since the measurements that were declined become highly correlated with the measurements that were chosen. 0. . while Fig. the structure of the problem can be seen to be waveform selection: at each time we choose between observations which convey information about diﬀerent aspects of the same underlying phenomenon. 3..2 as a function of the diﬀusion strength q. x1 .5 Example: waveform selection Suppose that we are using a surface vehicle travelling at a constant velocity along a ﬁxed path (as illustrated in Fig. 3. Suppose that we model the depth of the ocean ﬂoor as a GaussMarkov random ﬁeld with a 500×100 thin membrane grid model where neighboring node attractions are uniformly equal to q.2 for each time step when q = 100 and R = 1/40. One cycle of the vehicle path takes 300 time steps to complete. The tightness of the online bound depends on particular model characteristics: if q = R = 1.1.e. x2 .5). Fig. 3. x1 . at any position on the path (such as the points denoted by ‘ ’). x2 ] 1 1 2 2 200 200 Examining Fig.8× the optimal reward. Assume that. In this example. Deﬁning the state X to be the vector containing one element for each cell in the 500×100 grid.76 CHAPTER 3. . . .2(b) shows the accrual of reward over time as well as the bound on the optimal sequence obtained using Theorem 3. X.1. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES Theorem 3. and that we receive a linear measurement of the depth corrupted by Gaussian noise with variance R. 3. 3.2(c) shows the ratio between the achieved performance and the optimal sequence bound over time. then the guarantee ratio is much closer to the value of the oﬄine bound (i. Fig.3 shows snapshots of how the uncertainty in the depth estimate progresses over time. is the combination of the states of both objects over all time steps in the problem: X = [x1 . The graph indicates that the greedy heuristic achieves at least 0. . x2 .
(a) Region boundary and vehicle path Accrued reward (MI) (b) Reward obtained by greedy heuristic and bound on optimal 250 200 150 100 50 0 0 Greedy reward Bound on optimal 100 200 300 Time step 400 500 600 Fraction of optimality (c) Factor of optimality from online guarantee 0. When the vehicle is located at a ‘ ’ mark.82 0.72 0. any one grid element with center inside the surrounding dotted ellipse may be measured.8 0. (c) shows the ratio of these two curves. and the bound on the optimal sequence for the same time period. (b) graphs reward accrued by the greedy heuristic after diﬀerent periods of time.74 0. .78 0.84 0. starting from the left end of the lower straight segment). (a) shows region boundary and vehicle path (counterclockwise. providing the factor of optimality guaranteed by the bound.7 0 100 200 300 Time step 400 500 600 Figure 3.76 0.2.
Vehicle path is clockwise. while red indicates the highest. commencing from topleft. 225 and 525 steps. Blue indicates the lowest uncertainty.(a) 75 steps (b) 225 steps (c) 525 steps Figure 3. Marginal entropy of each grid cell after 75. .3. Each revolution takes 300 steps.
the kinematic quantities of interest evolve according to a diﬀusive process. . which are independent of each other and all other nodes and observations in the graph conditioned on xi . Intuitively. In Section 3. . Let (w1 . . .3.1 we present an online computable guarantee which exploits the characteristic to whatever extent it exists in a particular selection problem. we exploit an additional property of mutual information which holds under Assumption 3. . nondecreasing objective for which the reward of an empty set was zero. In Section 3. the proof of Theorem 3. The results of Section 3. . . 3.3. Exploiting diﬀusiveness 79 3.1 were applicable to any submodular. we examine it for one simple model in Section 3. .2. The basic model structure is set up in Assumption 3. zi i }. one would expect that a greedy algorithm would be closer to optimal in situations in which the diﬀusion strength is high. . L} ∀ i).3. .2. zi z A ) (3. . The simpler cases involving trees and chains are discussed in the sequel.3 exploits this fact. . we assume that each node is visited exactly once. n 1 Assume that each node has a set of observations {zi .5) in the latent structure.Sec.2.2. This is a strong assumption that is diﬃcult to establish globally for any given model. Let the latent structure which we seek to infer consist of an undirected graph G with nodes X = {x1 .1. The theorem is limited to only choosing a single observation from each set. stated in Theorem 3. xL }. . The general form of the result. wL ) be a sequence which determines the order in which nodes are visited (wi ∈ {1. in which correlation between states at diﬀerent time instants reduces as the time diﬀerence increases.2 we then specialize the assumption to cases where the latent graph structure is a tree or a chain. The general form of the diﬀusive characteristic is stated in Assumption 3. zi z A ) = H(zi z A ) − H(zi X. This section develops a performance guarantee which exploits the diﬀusiveness of the underlying process to obtain a tighter bound on performance. . z A ) j j = H(zi z A ) − H(zi xj ) j = I(xi . .2.2 Exploiting diﬀusiveness In problems such as object tracking.2. . Assumption 3.2. We may select a single observation from each set.4) We then utilize this property in order to exploit process diﬀusiveness. In this section. deals with an arbitrary graph (in the sense of Section 2. with an arbitrary interconnection structure. that for any set of conditioning observations z A : j j j I(X. .
zwj+1 . zwj−1 . . . . zwj−1 ) + I(xwj+1 . zwj−1 ) + I(xwj+1 . . . . . . assume that (as trivially holds for j = 1). . . . . .80 CHAPTER 3. zwL ) L L Proof. . zwj−1 ) j−1 j−1 o g + I(xwj+1 . zwjj ) j−1 j+1 L o g g g o g g g o g (f ) where (a) and (f ) result from the chain rule. zwj−1 ) + I(xwj+1 . . . . . Theorem 3. . . (3. zwL zw11 . . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES Assumption 3. zwj+1 . . . . zwL ) ≤ (1 + α)I(X. . . . . . . . . . . zwjj zw11 .3. . from independence of zwjj and zwjj on the reo o maining latent structure conditioned on xwj . zwj−1 ) j+1 j−1 j−1 L g o g ≤ I(xwj . . . . Under Assumptions 3. . . . . . . zwjj . . zwL zw11 . zwj−1 ) + I(X. . . . zwj−1 ) + I(xN (wj ) . zw11 . . . . . . let the graph G have the diﬀusive property in which there exists α < 1 such that for each i ∈ {1. zw11 . . . . . . . . zwjj zw11 . . . . zwj−1 ) j−1 j+1 j−1 L g g = I(xwj . Theorem 3. . xwL . . j g j g I(xN (wi ) . . . and from independence of {zwj+1 .1 satisﬁes the following guarantee: o o g g I(X. . . .2 and 3. zwL zw11 . . . . . To establish an induction step.3. . .3 uses this property to bound the loss of optimality associated with the greedy choice to be a factor of (1 + α) rather than 2. zwL zw11 . . . . . . . . . . . . zwj−1 ) + I(X. . L} and each j observation zwi at node xwi . xwL . . zwL zw11 . . . . . . zwjj ) j+1 j−1 j−1 L g g o g g o g o g o o g g o g g o g ≤ I(xwj . zw11 . . . xwL . . .2. . zwj−1 ) + I(xwj+1 . . . . . the performance of the greedy heuristic in Deﬁnition 3. . . zwj+1 . . . zwjj ) j+1 L o g g g g g g g o g (c) (d) (e) g o g ≤ (1 + α)I(xwj . zwL ) ≤ (1 + α)I(X. zwL } j+1 L . zw11 . zwL zw11 . . . zwL zw11 . . . . . . . . . . zwj+1 . . . .5) Manipulating the second term in Eq. zwj−1 ) j−1 j−1 L L (3. Under the structure in Assumption 3. zwL zw11 . . j Assumption 3. zwjj . . . .3 states that the information which the observation zwi contains about xwi is discounted by a factor of at least α when compared to the information it contains about the remainder of the graph. . . . zwjj zw11 . . . .5). . . . xwL .3. . zwj+1 . zwjj ) j−1 j+1 L g o g = (1 + α)I(X. . zwj+1 . zwjj zw11 . zwjj zw11 . xwL . zwi−1 ) ≤ αI(xwi . . o g I(X. . . zwi zw11 . zwj−1 ) j−1 L (a) (b) o g g = I(xwj . o o g o g I(X. . zwjj . . zwi−1 ) i−1 i−1 g g where xN (wi ) denotes the neighbors of node xwi in the latent structure graph G. . . . zwjj zw11 . zwjj zw11 . zwi zw11 .
zwj−1 ) j−1 g g Proof. zwii zw11 . . (3. . zwj−1 ) (3. zwL ) ≤ (1 + α)I(X. . . .6) trivially holds for j = 1. zwj+1 .3. zw11 . . we obtain the desired result. . zw11 .1 Online guarantee For many models the diﬀusive property is diﬃcult to establish globally. . .5) with the ﬁnal result in (f ). 3. one may obtain an online computable bound which does not require the property of Assumption 3. .1: L o o I(X. . (b) from submodularity and the deﬁnition of the greedy heuristic.6) j−1 L o g g g g I(xN (wi ) . . zwi−1 ) i−1 g Eq. Theorem 3.4. Exploiting diﬀusiveness 81 on the remaining latent structure conditioned on {xwj+1 . and (e) from the assumed diﬀusive property of Assumption 3.2. .3. . . but exploits it to whatever extent it exists in a particular selection problem. . .6) which establishes that Eq. . . . Applying the induction step L times. (c) from the nondecreasing property. we obtain a strengthened bound: o o g o g I(X. applies to the greedy heuristic of Deﬁnition 3.3. . .2. Commence by assuming (for induction) that: j−1 o o g I(X. . . g (d) from the chain rule (noting that zwjj is independent of all nodes remaining to be visited conditioned on the neighbors of xwj ). . . . . zw11 . . xwL }. .Sec. zw11 . zwjj zw11 . (3. zwL zw11 . . . . .3 obtains an upper bound to the the ﬁnal term in Eq. . . zwL ) L + j=1 g I(xN (wj ) . . zwjj ) + I(X. zwL zw11 . zwjj ) j+1 L L g o g Applying this induction step L times. If we assume that it holds for j then step (d) of Theorem 3. zwL ) ≤ I(X. Following from step (d) of Theorem 3. zw11 . the following performance guarantee. . .6) also holds for (j + 1). . The proof directly follows Theorem 3. . which can be computed online. . (3. . zwjj . . . . . but not requiring the diﬀusive property of Assumption 3. zwj−1 ) + j−1 L i=1 o g + I(X. .3 to hold globally. Under the model of Assumption 3.2. zwL ) L ≤ g g I(X. . . . . . 3. we obtain the desired result. zw11 . (3.3. Replacing the second term in Eq.
.e. Under the model of Assumption 3.5. . The modiﬁed statement of Theorem 3. . the proof passes directly once xN (wi ) is replaced by xπ(wi ) . the proof passes directly once xN (wj ) is replaced by xπ(wj ) in step (d). we assume that each node is visited exactly once.4.5.4 is included below. Theorem 3. we may avoid including all neighbors a node in the condition of Assumption 3. wL ) be a sequence which determines the order in which nodes are visited (wi ∈ {1. . . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES 3.. Let (w1 . where the ith node corresponds to time i.3 and in the result of Theorem 3. i. . . .5. . . zwL ) L ≤ g g I(X. the following performance guarantee. . Let the latent structure which we seek to infer consist of an undirected graph G with nodes X = {x1 . . . L} and each j observation zwi at node xwi . . Assumption 3. . zi i }. . . . . .4. .3 holds under Assumptions 3. the sequence is simply wi = i. . . i. applies to the greedy heuristic of Deﬁnition 3.4. but not requiring the diﬀusive property of Assumption 3. . . that no node is visited before all of its children have been visited.2 Specialization to trees and chains In the common case where the latent structure X = {x1 .5. The modiﬁed assumptions are presented below. . . which can be computed online. j g j g I(xπ(wi ) . . . .e. zw11 . zwL ) L + j=1 g I(xπ(wj ) . xL }. zwjj zw11 . zwi zw11 . . . zwi−1 ) ≤ αI(xwi . Theorem 3. . xL } forms a tree. we visit the nodes in time order. . zw11 . which form a tree. In this case. which are independent of each other and all other nodes and observations in the graph conditioned on xi . An additional requirement on the sequence (w1 . zwi−1 ) i−1 i−1 g g where xπ(wi ) denotes the parent of node xwi in the latent structure graph G. Again.. . . Assumption 3. . .82 CHAPTER 3. . wL ) is necessary to exploit the tree structure. . . L} ∀ i).2. . .4 and 3. let the graph G have the diﬀusive property in which there exists α < 1 such that for each i ∈ {1. . zwi zw11 . Under the structure in Assumption 3. . . We assume that the sequence is “bottomup”. Choosing the ﬁnal node in the . Assume that each n 1 node has a set of observations {zi . zwj−1 ) j−1 g g The most common application of the diﬀusive model is in Markov chains (a special case of a tree).1: L o o I(X. . .4. We may select a single observation from each set.
Substituting Eq. some intuition may be gained from the structure of the condition which results. . this sequence respects the bottomup requirement. 3. . . 0. (3. zk z11 . zk−1 ) = j g k−1 I(xk+1 .9) = hj xk + j vk . .Sec. j The greedy heuristic in this model corresponds to choosing the observation zk with the smallest normalized variance rj . j j ˜ ˜ where wk ∼ N {wk . The dynamics model and observation models are given by: xk+1 = f xk + wk j zk (3.2. since: P2 1 log 1 + (˜j +˜)P +˜j q 2 r q r ˜ lim =1 1 P P →∞ 2 log 1 + rj ˜ Thus we seek a range for Pkk−1 such that there does exist an α < 1 for which Eq. then the diﬀusive property is established as long as the covariance remains within this range during operation.7) can be evaluated as: j g k−1 I(xk .10) (3. . .7) is satisﬁed.12) . . . and the diﬀusive requirement becomes: j g j g k−1 k−1 I(xk+1 . zk−1 ) = g g Pkk−1 1 log 1 + 2 rj ˜ 2 Pkk−1 1 log 1 + j 2 (˜ + q )Pkk−1 + rj q r ˜ ˜ ˜ (3. Exploiting diﬀusiveness 83 chain to be the tree root. we need to ﬁnd the range of P for which 1+ bα (P ) = P2 (˜j +˜)P +˜j q r q r ˜ P α 1 + rj ˜ ≤1 (3. (3. (3. rj }. zk−1 ) ≤ αI(xk . We let q = q/f 2 and rj = rj /(hj )2 .2. zk z11 . . 0. . .3 Establishing the diﬀusive property As an example. Nevertheless.11) If Pkk−1 can take on any value on the positive real line then no α < 1 exists. n} (3. . . The performance guarantee is uninteresting in this case since the greedy heuristic may be easily shown to be optimal. Denoting the covariance of xk conditioned on the prior ˜ observations as Pkk−1 . . q} and vk ∼ N {vk .7) and exponentiating each side. . we establish the diﬀusive property for a simple onedimensional stationary linear GaussMarkov chain model. zk z11 .7) 3. . (3. If such a result is obtained. .8) j ∈ {1. (3. zk−1 ) g g (3.11) into Eq. . . . zk z11 . the terms involved in Eq.10) and Eq.
For a given value of P .4 and 3. many systems will satisfy some degree of diﬀusiveness as long as all states remain within some certainly level.5 Example: bearings only measurements Consider an object which moves in two dimensions according to a Gaussian random walk: xk+1 = xk + wk . (3. Furthermore. Eq.4. (3.1. there will be a unique nonzero value of P for which bα (P ) = 1. position states are directly observable. and hence select diﬀerent operating points on the curve. for any P ∈ [0. Hence bα (P ) reduces from a value of unity initially before increasing monotonically and eventually crossing unity. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES d Note that bα (0) = 1 for any α. P0 ]. 3. we can easily solve Eq. while velocity states are only observable through the induced impact on position. 3.4 as a function of the covariance upper limit P0 for various values of r and q .2.12) is satisﬁed for any α ∈ [α∗ (P0 ).13) Hence. nor for nonlinear and/or nonGaussian systems. For any given α. Diﬀerent values of the dynamics model parameter f will ˜ ˜ yield diﬀerent steady state covariances. dP bα (P ) may be shown to be negative for P ∈ [0. the bound tightens as the diﬀusion strength increases.2.2. the general intuition of the single dimensional linear Gaussian case may be applied.5.12) to ﬁnd the smallest value of α for which the expression is satisﬁed: α∗ (P ) = log (P +˜j )(P +˜) r q (˜j +˜)P +˜j q r q r ˜ log(P + rj ) ˜ (3. As expected. ∞) (for some positive a). The guarantee is substantially tighter when all states are directly observable. In this example. While closedform analysis is very diﬃcult for multidimensional linear Gaussian systems.5 demonstrate the use of the online performance guarantee in cases where the diﬀusive condition has not been established globally. The examples in Sections 3. 1]. The strongest diﬀusion coeﬃcient is shown in Fig.2. The performance bound obtained using the online analysis form Theorem 3. 3. a) and positive for P ∈ (a.4 Example: beam steering revisited Consider the beam steering scenario presented in Section 3.5 is shown in Fig.84 CHAPTER 3. For example. as shown in the following example. 3.
2 0.3 0. Note that lower values of α∗ correspond to stronger diﬀusion.5 0.4 Diﬀusive coeﬃcient α∗ 0. with r = 1.Lowest diﬀusive coeﬃcient vs covariance limit 0.4. ˜ ˜ .1 q = 20 ˜ q = 10 ˜ q=5 ˜ 0 0 5 10 15 20 25 Covariance limit P0 Figure 3. Strongest diﬀusive coeﬃcient versus covariance upper limit for various values of q .
(a) Reward obtained by greedy heuristic and bound on optimal 2000 1800 1600 Total MI 1400 1200 1000 800 600 400 100 101 Diﬀusion strength q (b) Factor of optimality from online diﬀusive guarantee 0.5. (a) shows total reward accrued by the greedy heuristic in the 200 time steps for diﬀerent diﬀusion strength values (q).5.62 0.56 0.64 Fraction of optimality 0.5 100 101 Diﬀusion strength q 102 102 Greedy reward Bound on optimal Figure 3.54 0. (b) shows the ratio of these curves. and the bound on optimal obtained through Theorem 3.58 0.6 0.52 0. providing the factor of optimality guaranteed by the bound. .
±100).Sec. The initial position of the object is distributed according to x0 ∼ N {x0 .77 of optimal with a measurement standard deviation of 0. but that only one observation may be utilized at any instant. Discounted rewards 87 where wk ∼ N {wk .uM 2 u u min I(X. 0. zw11 . z1 1 . Assume that bearing observations are available from four sensors positioned at (±100. . 3. The resulting problem is intractable. hence the bounds are approximate (the EKF variances were used to calculate the rewards). 3. within the planning horizon. hence the approximate method using MI is an appealing substitute.6(b) shows the ratio of the greedy performance to the upper bound on optimal.5. it is not only desirable to maximize the information obtained within a particular period of time. . .uM u u k−1 I(X.3 Discounted rewards In Sections 3.3.. Simulations were run for 200 time steps. our optimization problem is changed from: M u1 ... hence it was necessary to use multiple Monte Carlo simulations to compute the online guarantee. and the o o optimal open loop reward to which we compared was I(X. 0. the low degree of nonlinearity in the observation model provides conﬁdence that the inaccuracy in the rewards is insigniﬁcant. reducing the value of information obtained later in the problem. Fig. In this example.1 degrees.. .. Tracking was performed using an extended Kalman ﬁlter. . the information is obtained. Subsequently. In some M sequential estimation problems.. zk k z1 1 . I}. we utilized the closed loop greedy heuristic examined in Section 3. . . . but also how quickly.6(a) as a function of the measurement noise standard deviation (in degrees). One way2 of capturing this notion is to incorporate a discount factor in the objective. . I}.. The total reward and the bound obtained from Theorem 3.5 are shown in Fig. the same eﬀect occurs if the observation noise is held constant and the dynamics noise increased. The results demonstrate that the performance guarantee becomes stronger as the measurement noise decreases..1 and 3. choosing as the objective (to be minimized) the expected time required to reduce the uncertainty below a desired criterion. .2 the objective we were seeking to optimize was the mutual information between the state and observations through the planning horizon. . zwM ). zNM ) = min u1 . zk−1 ) k=1 u Perhaps the most natural way of capturing this would be to reformulate the problem. . demonstrating that the greedy heuristic is guaranteed to be within a factor of 0. In this scenario. . 3. 3.
5 10−1 100 Measurement noise standard deviation σ 101 Figure 3.8 Fraction of optimality 0.6 0.55 0.75 0. and the bound on optimal obtained through Theorem 3. . (a) shows average total reward accrued by the greedy heuristic in the 200 time steps for diﬀerent diﬀusion strength values (q).5.65 0. (b) shows the ratio of these curves. providing the factor of optimality guaranteed by the bound.6.7 0.(a) Reward obtained by greedy heuristic and bound on optimal 500 450 400 350 300 250 200 150 100 50 0 10−1 Greedy reward Bound on optimal Total MI 100 Measurement noise standard deviation σ (b) Factor of optimality from online guarantee 101 0.
3. The optimal reward to go for the discounted sequence satisﬁes the relationship: o g g o Ji+1 [(g1 . . . zk k z1 1 .. . zwk . gi−1 )] ≤ αi I(X. .uM M min u u k−1 αk−1 I(X. . . . . Discounted rewards 89 to: u1 . zwjj zw11 . . . zwii zw11 . . . . . .6 establishes. uk ): M Jio [(u1 . We deﬁne the abbreviated notation for the optimal reward to go from stage i to the end of the problem conditioned on the previous observation choices (u1 . . zwj−1 ) j−1 g g We will use the following lemma in the proof of Theorem 3. . . zwj−1 ) j−1 1 k o o and the reward so far for the greedy heuristic in i stages: i g J→i j=1 g αj−1 I(X. . . Lemma 3.6. gi )] i−1 g . . . . . . zwii . uk )] j=i o u u αj−1 I(X. . . as Theorem 3. . Not surprising. zwi−1 ) + Ji+1 [(g1 .. . .Sec.1. . .. the performance guarantee for the greedy heuristic becomes tighter as the discount factor is decreased. . . . zwjj zw1 . zk−1 ) k=1 u where α < 1. 3. ..
zwjj zw11 . .90 CHAPTER 3. gi )] i−1 o g g o where (a) results from adding and subtracting αi I(X. . zwi+1 . . . zwj−1 ) j−1 i+1 i−1 o g o o o g g g g ≤ αi I(X. . . zwi−1 ) + I(X. . . . .1 has performance guaranteed by the following expression: M o αj−1 I(X. . . . . . . . . . . . . zwjj zw11 . . . zwi−1 . . . . zwi−1 . . i−1 i+1 M g (b) from the nondecreasing property of MI (introducing the additional observation zwii into the ﬁrst term). zwii zw11 . zwii . the greedy heuristic in Deﬁnition 3. . . zwi+1 . (d) from submodularity (adding the g conditioning zwii into the second term. zwjj zw11 . . . . . . zwi−1 . . gi−1 )] = j=i+1 (a) o g = αi I(X. zwj−1 ) j−1 g g . . . (c) from the MI chain rule. zwi−1 ). zwjj zw11 . . . . Expanding the lefthand side and manipulating: M o Ji+1 [(g1 . . . . . . . .1. . Under Assumption 3. zwi−1 ) + Ji+1 [(g1 . and (e) o from cancelling similar terms and applying the deﬁnition of Ji+1 . . . . .6. . . zwM zw11 . . zwjj zw11 . . . . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES Proof. . zwM zw11 . . zwi+1 . zwM zw11 . . zwi+1 . zwii ) i+1 i−1 M M g o + j=i+1 (e) g g (αj−i−1 − 1)I(X. . . . noting that the coeﬃcient is negative). . zwM zw11 . . zwi+1 . . . . . zwi−1 . zwi+1 . zwii . zwj−1 ) i−1 i+1 j−1 o g o o g o g ≤ αi I(X. . . zwj−1 ) i−1 i+1 j−1 o g o o + j=i+1 (b) g (αj−i−1 − 1)I(X. . . . . zwj−1 ) i−1 i+1 j−1 o g o o o g g g g = αi I(X. . . zwii zw11 . . zwii zw11 . . . . . zwj−1 ) ≤ (1 + α) j−1 j=1 j=1 o o M g αj−1 I(X. . . . . . . . . zwjj zw11 . . . . zwi+1 . zwi−1 ) i+1 i−1 M M o g g αj−1 I(X. . . . zwi+1 . Theorem 3. zwii ) i+1 i−1 M M g o + j=i+1 (d) g (αj−i−1 − 1)I(X. zwi+1 . . zwi−1 ) + I(X. zwi−1 ) i+1 i−1 M M o g + j=i+1 (c) g (αj−i−1 − 1)I(X. . . zwi+1 . . . zwj−1 ) j−1 i+1 g o o o g g o = αi I(X. . zwM zw11 . zwjj zw11 . . . . .
zwi−1 ) + Ji+1 [(g1 . gi )] where (a) results from the deﬁnition of Jio . . . . . The structure necessary is given in Assumption 3. . . zwM } is the optimal set of observations. . . .4. Suppose it is true for i. (c) results from submodularity (allowing us to remove the conditioning on oi o from each term in Ji+1 ) and (d) from Lemma 3.. 3. . zwjj zw11 . we seek to prove: g o J1 [∅] ≤ (1 + α)J→M The proof follows an induction on the expression g o J1 [∅] ≤ (1 + α)J→i−1 + Jio [(g1 . . . Applying the induction step M times we get the desired result. . . . . 3. . zwii zw11 . . . . In the abbreviated notation. . gi−1 )] i−1 g g g o ≤ (1 + α)J→i−1 + (1 + α)αi−1 I(X. .4 Time invariant rewards The reward function in some problems is wellapproximated as being time invariant. gi−1 . zwii zw11 . . . zwii zw11 . zwi−1 ) + Ji+1 [(g1 . a tighter bound.Sec. . . zwj−1 ) j−1 j=1 o o Proof. In this case.6. . . gi )] i−1 g g (c) (d) g o = (1 + α)J→i + Ji+1 [(g1 . . gi−1 )] which is trivially true for i = 1. .1. Time invariant rewards 91 o o where {zw11 . . . . . . gi−1 . the one which maximizes M M o αj−1 I(X. gi−1 )] (a) (b) g g o o = (1 + α)J→i−1 + αi−1 I(X. i. . . . . . . zwi−1 ) + Ji+1 [(g1 . may be obtained. . . .e. . . . . zwi−1 ) + Ji+1 [(g1 . . oi )] i−1 g g g g o g ≤ (1 + α)J→i−1 + αi−1 I(X. . . zwii zw11 . . manipulating the expression we obtain: g o J1 [∅] ≤ (1 + α)J→i−1 + Jio [(g1 . (b) results from the deﬁnition of the greedy heuristic. oi )] i−1 g g g o ≤ (1 + α)J→i−1 + αi−1 I(X. (1 − 1/e)×. .
. . the observation model for z i. z n. . For example.1 . .1 relied on nonstationary observation models to demonstrate that the possible diﬀerence between the greedy and optimal selections. . . eﬀectively providing a reduced variance observation). .. which are mutually independent conditioned on the quantity to be estimated (X). . At each stage k ∈ {1. {zk . Note that any choice of observations maps directly in to a choice in the original problem. the same as that of k i zk . i then we select the observation zk for three time slots k out of the set {1. the greedy heuristic in Deﬁnition 3. z1 1 .e.j is pz i (z i.j X). .L . ..g. L}. the i observation model p(zk X) does not change with stage. Note that the only diﬀerence between this problem and the kelement subset selection problem in Section 2. . The exact choice of time slots is not important since rewards are invariant to time. provides a solution that is within (1 − 1/e)× the optimal solution of the problem structure in Assumption 3. L} there are n observations. . z11 . Consider the Lelement subset selection problem involving the set of observations {z 1. . The following example illustrates the diﬀerence that is possible using stationary models and a static state. Under Assumption 3. At each stage k we may select any one of the n observations.6.6. . for each observation i. . . . z n. z1 1 .1 .L } For all (i. . which is within a factor of (1 − 1/e)× the optimal solution of the Lelement subset selection problem by Theorem 2. .n }..92 CHAPTER 3.1 has performance guaranteed by the following expression: g g o o (1 − 1/e)I(X. zLL ). . .6. . Thus the solution of the Lelement subset selection problem. . zk }. . . i. . . Theorem 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES 1 n Assumption 3. . z 1. . . . . .e. . Proof. . zLL ) o o where {z1 1 . zLL } is the optimal set of observations. . Example 3.4.5 is that the same observation can be chosen multiple times in diﬀerent stages to yield additional information through the same observation model (e. . i. . zLL ) ≤ I(X. . .4. . . . . if three observations are chosen out of the set {z i.1 . The following proof obtains the guarantee by transforming the problem into a kelement subset selection problem. j). the one which maximizes o o I(X. Selecting the same observation in m diﬀerent stages results in m conditionally independent observations based on the same model. z i. .7.
it assumes that all observation choices are made before any observation values are received. we see that the performance of the greedy heuristic over K = 2 stages is guaranteed to be within a factor of [1 − (1 − 1/K)K ] = 0. the greedy heuristic selects observation z1 since I(X. Greedy heuristics are often applied in a closed loop setting. we need to exploit an additional characteristic of mutual information: I(X. Suppose that our state consists of four independent binary random variables.. i. The same process can be used to establish a closed loop version of Theorem 3. 2 3 zk = [a c]T and zk = [b d]T . 2} there are three observations available. Closed loop control 93 Example 3. the algorithm has already learned the values of 1 1 2 1 3 1 a and b.3 both apply to the expected performance of the greedy heuristic operating in a closed loop fashion. zk = [a b]T . z2 z1 ) = I(X.e. The expectation operation is necessary in the closed loop case since control choices are random variables that depend on the values of previous observations. To obtain the closed loop guarantee. hence I(X. 2 3 An optimal choice is z1 and z2 . Examining Theorem 2.e. z2 z1 ) = 0. therefore it is necessary to use additional planning to ensure that the observations we obtain provide complementary information. 3.8 establishes the result of Theorem 3.4. and then its value is received before the next choice is made.2. z A z B = ζ)pz B (ζ)dζ (3. achieving reward 4 − 2 . The intuition behind the scenario in this example is that information about different portions of the state can be obtained in diﬀerent combinations. 3. z1 ) = I(X.Sec. The ratio of the greedy reward to optimal reward is 3− 4−2 which approaches 0. and H(c) = H(d) = 1 − for some 1 small > 0.3 concentrates on an open loop control structure. Theorem 3. z2 z1 ) = 1 − ..1 and 3. 1 1 In stage 1. X = [a b c d]T .5. where H(a) = H(b) = 1. The performance guarantees of Theorems 3. and I(X. hence this factor is the worst possible over two stages. z A z B ) = I(X.14) .75 of the optimal.5 Closed loop control The analysis in Sections 3. z1 ) = 2 − . in expectation the closed loop greedy heuristic achieves at least half the reward of the optimal open loop selection. i. In stage 2. z1 ) = 2 whereas 2 3 I(X. The total reward is 3 − . in which an observation is chosen.13. In each stage k ∈ {1.1 for the closed loop heuristic.75 as → 0.3.
The greedy heuristic operating in closed loop is deﬁned j in Deﬁnition 3. . . u h1 = ∅.. and the resulting observation values. . . uj−1 . µg (hN )}.3 along with the observations resulting from the actions. . .94 CHAPTER 3.2. zwN N µg (hN ) }. u2 .17) is over the random variables corresponding to the actions {µg (hj+1 ).2. .15) We use the convention that conditioning on hi in an MI expression is always on the value.. as: j µg J→j =E i=1 I(X. and hj = (hj−1 .16) (3. zwN N µg (hi ) µg (hN ) hj ) (3. deﬁne the closed loop greedy heuristic policy µg : u µg (hj ) = arg max I(X.. The expected reward to go from stage j to the end of the planning horizon for the greedy heuristic µg (hj ) commencing from the j history hj is denoted as: µ Jj (hj ) = I(X. . zwii µg (hi ) hi ) (3. The i µg expected reward of the greedy heuristic over the full planning horizon is J1 (∅). (3. . zwj ). prior to stage j (this constitutes all the information which we can utilize in choosing our action at time j). .nwj } (3. hence the action at stage j is ﬁxed given knowledge of hj .17) = E i=j I(X. . . zw2 .19) We assume a deterministic policy. Under the same assumptions as Theorem 3. commencing from an empty history sequence (i. uj . . . u u u We deﬁne hj = (u1 . zwii hi ) hj The expectation in Eq. Accordingly. zwj j N g µg (hj ) .18) This gives rise to the recursive relationship: µ J→j = E[I(X.. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES While the results are presented in terms of mutual information. zwj j 3 g µg (hj ) µ hj )] + J→j−1 g (3.e. j+1 N {zwj j µg (hj ) .1. where hi is the concatenation of the previous history sequence µg (h ) hi−1 with the new observation action µg (hi ) and the new observation value zwii i . zw1 . Deﬁnition 3. they apply to any other objective which meets the previous requirements as well as Eq. . (3. and hence if hi contains elements which are random variables we will always include an explicit expectation operator. . zwj hj ) j u∈{1. zwj−1 ) to be the history of all observation j−1 1 2 actions chosen.. . We also deﬁne the expected reward accrued by the greedy heuristic up to and including stage j. h1 = ∅).14).
(3. . zwN hj ) ≤ I(X.17) with Eq. (3. µg (hj ). . (3. . . zwj+1 . zwj j j µg (hj ) (a) results from the nondecreasing property. Given the above deﬁnitions: zwj hj j o o E Jj+1 [(hj . . zwN hj ) j+1 j+1 N N o o o where conditioning is on the value hj throughout (as per the convention introduced o below Eq. .20). zwj j o o µg (hj ) hj ) + zwj j µ (hj ) E g o Jj+1 [(hj . (3. . .2. and on the random variable zwjj . . µg (hj ). . . zwjj )] zwj hj j o o (3. (3. zwj+1 . zwN hj . zwj+1 . the ﬁrst inequality results directly from submodularity. . Lemma 3. we have J→N = J1 (∅). .22) We seek to obtain a guarantee on the performance ratio between the optimal open loop observation sequence and the closed loop greedy heuristic. . . zwj j+1 j N ) )] (c) µg (hj ) j wj hj ) + µ (hj ) zwj hj j E g o Jj+1 [(hj . zw11 . The second inequality results from the nondecreasing property of MI: o I(X.Sec. . We deﬁne J→0 = 0. . . .5. (3. . Therefore. . .14). zwN hj . zwN hj ) j+1 N o µg (hj ) o = I(X. Using Eq. zwj j j hj µg (hj ) )] Proof. the ﬁrst inequality corresponds to: o o I(X. Closed loop control 95 g g g µ µ µ Comparing Eq. zwjj . we establish a simple result in terms of our new notation. Before we prove the theorem. zwjj hj ) + oE Jj+1 [(hj . . The reward of the tail of the optimal open loop observation sequence (oj .20) Using the MI chain rule and Eq. . (b) from the chain rule.20). . zwN hj ) N o (3. z µg (hj ) o hj ) + I(zwj+1 . . . zwj j = I(X. zwj+1 . zwjj )] ≤ Jj+1 (hj ) ≤ I(X. oj . oj . zwN ) N (3.18). . . zwj j+1 j N (b) o (a) µg (hj ) o . and (c) from the deﬁnition in Eq. The reward of the optimal open loop observation sequence over the full planning horizon is: o o o J1 (∅) = I(X. 3. oN ) commencing from the history hj is denoted by: o o Jj (hj ) = I(X. . this can be written recursively as: o o Jj (hj ) = I(X.15)).21) o where JN +1 (hN +1 ) = 0. zwjj ) ≤ I(X. . . .
Now. zwj hj ) + g E Jj+1 [(hj . µj (hj ).23) µ Noting that h1 = ∅.19): g g µj (hj ) µj (hj ) µg g o o J1 (∅) ≤ 2J→j−1 + E 2I(X. µg (hj ). (3. we obtain: µ µ o o J1 (∅) ≤ 2J→N + E JN +1 (hN +1 ) = 2J1 (∅) µ µ o since JN +1 (hN +1 ) = 0 and J→N = J1 (∅). g g g g µg (hj ) . oj .2. this trivially holds for j = 1 since J→0 = 0. zwj hj ) + oE Jj+1 [(hj .8. g o o µ o o J1 (∅) ≤ 2J→j−1 + E I(X. µ o J1 (∅) ≤ 2J1 (∅) g i. Under the same assumptions as Theorem 3. This establishes the induction step. (3. j j Applying the induction step N times. zwjj hj ) ≤ I(X. that the reward of the optimal open loop sequence is no greater than twice the expected reward of the greedy closed loop heuristic. followed by Eq.e. zwjj )] j zwj hj µg j Applying Lemma 3. zwj ). zwjj )] zwj hj j By the deﬁnition of the closed loop greedy heuristic (Deﬁnition 3. Theorem 3. Proof. To establish an induction. the expected reward of the closed loop greedy heuristic is at least half the reward of the optimal open loop policy. assume that µ o o J1 (∅) ≤ 2J→j−1 + E Jj (hj ) g g (3.2). GREEDY HEURISTICS AND PERFORMANCE GUARANTEES We are now ready to prove our result.21). I(X. oj .. zwj j hence: o µg (hj ) hj ) g µ (hj ) o o o J1 (∅) ≤ 2J→j−1 + E I(X.1. zwj )] µ (hj ) z j hj wj o = 2J→j + E Jj+1 (hj+1 ) µg where hj+1 = (hj .96 CHAPTER 3. assuming that it holds for j. we show that it also holds for (j + 1). zwjj hj ) + oE Jj+1 [(hj . Applying Eq.
1 Counterexample: closed loop greedy versus closed loop optimal While Theorem 3. we may choose z2 . . An online bound cannot be obtained on the basis of a single realization. as the following example illustrates. a followed by the observation z2 for reward log M . . . where X = [a. In the ﬁrst stage. . . so that the open loop guarantee extends to closed loop performance. . . . . . . M }. by choosing N and M to be large. . . . . . M →∞ log N + log M N Hence. The ratio of the greedy reward to the optimal reward is 1 log(N + 1) + N log M 1 → . Consider the following twostage problem. we can obtain an arbitrarily small ratio between the greedy closedloop reward and the optimal closedloop reward. . all options have the same reward. N }. N + 1}. The prior distribution of each 1 of these is uniform and independent. 1 1 N log M . In the second stage. . with a ∈ {1. M }.3. b ∈ {1. b. Example 3.8 provides a performance guarantee with respect to the optimal open loop sequence. The greedy 2 algorithm in the ﬁrst stage selects the observation z1 = b. . . we may measure z1 = a for 2 i reward log N . as it yields a higher reward 1 (log(N +1)) than z1 = a (log N ). N }. there is no guarantee relating the performance of the closed loop greedy heuristic to the optimal closed loop controller. for total reward log N + log M . 3. At the second stage. although online bounds similar to Theorems 3. c]T . or z1 = b for reward log(N + 1). and is uniformly distributed on {1. One exception to this is linear Gaussian models. The intuition of this example is that the value of the observation at the ﬁrst time provides guidance to the controller on which observation it should take at the second . where closed loop policies can perform no better than open loop sequences.5. where c. 3. so we choose one arbitrarily for a total reward of log(N + 1) + N log M . Closed loop control 97 We emphasize that this performance guarantee is for expected performance: it does not provide a guarantee for the change in entropy of every sample path. and c ∈ {1.Sec.4 could be calculated through Monte Carlo simulation (to approximate the expectation).2 and 3. The 1 optimal algorithm in the ﬁrst stage selects the observation z1 = a for reward log N . i = a i z2 = d.5. i ∈ {1. otherwise where d is independent of X.
where N ≥ 2. z2 or z2 = c. . c]T . b. We conjecture that it may be possible to establish a closed loop performance guarantee for diﬀusive processes. i Notice that the reward of any observation z2 without conditioning on the ﬁrst observa1 1 i a tion z1 is I(X. z2 ) = N log M . The greedy heuristic is unable to anticipate the later beneﬁt of this guidance. i = a i z2 = d. In the third stage. hence z2 is chosen. b ∈ {1. the reward of z2 conditioned on the value 1 a z1 of the observation z1 = a is I(X. z2 ˇ1 ) = log M . In the ﬁrst stage. where z2 and 2 z2 are given by: b. and c ∈ {1. hence it selects z2 = b for reward log (N + 1). For any N ≥ 2. The open loop greedy heuristic gains the same reward (log 2) for the ﬁrst observation. . the open loop greedy heuristic achieves higher reward than the closed loop greedy heuristic. the value of a a is known. The prior distribution of each of these is uniform and independent.. . submodularity) is lost when later choices (and rewards) are conditioned earlier observation values. the total reward is log 2(N + 1). In the second stage.2 Counterexample: closed loop greedy versus open loop greedy An interesting side note is that the closed loop greedy heuristic can actually result in lower performance than the open loop greedy heuristic. z3 = b. N }. . . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES time. where X = [a. a single observation is 2 3 1 1 available. as the following example shows. . . . Since there is no prior 1 2 knowledge of a. . Example 3.98 CHAPTER 3. . N + 1}.4. In contrast.e. and is uniformly distributed on {1. The converse is not . At the second observation. This highlights that the property of diminishing returns (i. 3. The ﬁnal observation z3 = b then yields no further reward. we may choose z2 . .5. otherwise where d is independent of X. Consider the following threestage problem. z2 and z2 yield the same reward ( 1 log (N + 1)) which is less than the 2 3 3 reward of z2 (log N ). a single observation is available. and the closed loop greedy heuristic has performance no worse than half that of the optimal open loop sequence. with a ∈ {1. N + 1}. The closed loop greedy algorithm gains reward log 2 for the ﬁrst observation. the ratio of open loop greedy performance to closed loop greedy performance can be no greater than two. . The ﬁnal observation yields reward log N √ for a total reward of log 2N N + 1. z1 = a. Since the open loop greedy heuristic has performance no better than the optimal open loop sequence. but it is likely to be dramatically weaker than the bounds presented in this chapter. 2}.
3.8 can also be used to extend the result of Theorem 2. We denote the history of observations chosen and the resulting values to be hj = (u1 .. . . . . . Now suppose that the expression .4 to closed loop selections. oK ). g (h ) j ). Again.3 Closed loop subset selection A simple modiﬁcation of the proof of Theorem 3. . Note that the expression µg trivially holds for i = 0 since J→0 = 0 and h1 = ∅. z oj . z u hj ) g (h i) I(X. z u1 . we obtain the value of each observation before making subsequent selections. . 3. z µ g (h j) hi ) g = E[I(X. the ordering within this choice is arbitrary. . z oN hj ) where h1 = ∅ and hj+1 = (hj . In this structure. We may simplify our notation slightly in this case since we have a single pool of observations. K}.uj−1 } j I(X.4 establish two results µ o o J1 (∅) ≤ J→i + E J1 (hi+1 ) g Proof. The following deﬁnitions are consistent with the previous deﬁnitions within the new notation: µg (hj ) = µ J→j = E i=1 g arg max u∈U \{u1 .3 and 3.. . z uj−1 ). Lemmas 3. . Closed loop control 99 true since the performance of the closed loop greedy heuristic is not bounded by the performance of the optimal open loop sequence.. 3.Sec. . . . µg (hj ). z µ which we use to prove the theorem.5. The optimal choice of observations is denoted as (o1 .. uj−1 . . For all i ∈ {1.5. .3 in which the observation z1 is made unavailable. Lemma 3. z µ µ hj )] + J→j−1 o Jj (hj ) = I(X. . there is a single set of observations from which we may choose any subset of ≤ K elements out of the ﬁnite set U. The proof follows an induction on the desired result. This can be demonstrated with a slight 2 modiﬁcation of Example 3.
. . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES µ o o J1 (∅) ≤ J→i−1 + E J1 (hi ) (a) (b) µ = J→i−1 + E[I(X. .4. . z oK hi )] hi )] + E[I(X. and (e) uses the deﬁnition of hi+1 . . . z oK hi )] g g (h i) g g µ ≤ J→i−1 + E[I(X. z µ g . . z µ g (h i) hi ) Proof. z µ g (h ) i g (h i) (c) µ = J→i−1 + E[I(X. z o1 . z oj hi . K}. (c) from submodularity. µg (hi ). . z oK hi ) (b) K (a) = I(X. o J1 (hi ) ≤ KI(X. . . z oj hi ) I(X. . . z µ g (h i) hi ) o where (a) results from the deﬁnition of J1 . z µ i )] (e) µ o = J→i + E J1 [(hi+1 )] o where (a) uses the deﬁnition of J1 . (b) from the MI chain rule. z oK hi . z µ µ = J→i + E g g g (h i) )] (d) g z µ (hi ) h E o J1 [(hi . z o1 . z o1 . z µ g (h i) ≤ j=1 hi ) = KI(X. The following steps establish the result: o J1 (hi ) = I(X. . . . . . . . z o1 . For all i ∈ {1. µg o (c) results from the MI chain rule. Lemma 3. and (d) from the deﬁnition of µg . z oj−1 ) j=1 (c) K ≤ j=1 (d) K I(X. (d) uses the deﬁnition of J→i and J1 . . . .100 holds for (i − 1): CHAPTER 3. . (b) results from the nondecreasing property of MI. z o1 . . .
. (3. z A∪B ) − D(X. the results are applicable to a larger class of objectives.3 can also yield a guarantee on the posterior Cram´rRao bound.Sec. (2. . µg o J→K ≥ (1 − 1/e)J1 (∅) Proof. Guarantees on the Cram´rRao bound e 101 Theorem 3. The expected reward of the closed loop greedy heuristic in the Kelement subset selection problem is at least (1 − 1/e)× the reward of the optimal open loop sequence.4. The following analysis shows that the guarantees in Sections 2. . z ) A log J∅ + X ¯ za a∈A JX J∅  X (3. i.. we obtain the desired result.26) (3. . To commence. we obtain: µ µ µ o J1 (∅) ≤ J→i−1 + K(J→i − J→i−1 ) ∀ i ∈ {1.93) in Theorem 2. z A z B ) D(X.5. We continue to assume that observations are independent e conditioned on X. 2.4.4 that. note from Lemma 3. . K}: o E J1 (hi ) ≤ K E[I(X.6. z B ) = log J∅ + X J∅ + X a∈A∪B (3. 3.24) µ µ o Letting ρj = J→j − J→j−1 and Z = J1 (∅) and comparing Eq.e. .4.1.24) with Eq.3.1 and 3. 3.27) ¯ a Jz X ¯ zb b∈B JX .25) ¯ a where J∅ and Jz are as deﬁned in Section 2. . assume that the objective we seek to maximize is the log of the determinant of the Fisher information: (we will later show that a guarantee on this quantity yields a guarantee on the determinant of the Cram´rRao bound matrix) e D(X. 3.4. K} g g g g g µg (3.6. . We can also deﬁne an increment X X function similar to conditional MI: D(X.9. for all i ∈ {1. To commence. z µ µg g (h i) hi )] = K(J→i − J→i−1 ) Combining this with Lemma 3.6 Guarantees on the Cram´rRao bound e While the preceding discussion has focused exclusively on mutual information.
z A z B ) ≥ 0. Suppose A ∈ Rn×m and B ∈ Rm×n . we need to prove that ∀ B ⊇ A. Lemma 3.7. (3.6.10. Then A ≥ B > 0. The following development derives these properties without using this link. For submodularity. If A = ∅. z B ) For convenience. we trivially X X X X ﬁnd D(X. Theorem 3. Suppose that A B B 0. z A ) is a submodular function of A. z A ) is a nondecreasing set function of the set A. Proof. We will require the following three results from linear algebra.102 CHAPTER 3.6 that a∈A∪B JX X X ¯ a ¯ b J∅ + a∈A∪B Jz  ≥ J∅ + b∈B Jz . GREEDY HEURISTICS AND PERFORMANCE GUARANTEES It is easy to see that the reward function D(X. z A ) = 0.4. 0.5. z z ) = log J∅ + X J∅ + X a∈A∪B b∈B ¯ a Jz X ¯ b Jz X ¯ za ¯ zb Since J∅ + J∅ + b∈B JX . also note that I(X. deﬁne the shorthand notation: ¯ JC X JA X ¯ C Jz X c∈C A Jz X (3. Assuming that all observations are independent conditioned on X. z C∪B ) − D(X. z A ) is nondecreasing. Suppose that A Lemma 3. z ∅ ) = 0. we have by Lemma 3. To show that D(X. Lemma 3. z A ) ≥ D(X. z A ) = 2 D(X.28) ¯ c Jz X J∅ + X a∈A ¯ a Jz X . z A ) in the linear Gaussian case). D(X. consider the increment: A B D(X. Then I + AB = I + BA. D(X.3. z C∪A ) − D(X. Then B−1 A−1 0. for proofs see [1]. D(X. hence D(X. submodular function of the set A by comparing Eq.25) to the MI of a linear Gaussian process 1 (see Section 2. z A ) is a nondecreasing. with D(X.
we obtain through Lemma 3.1.11. e Theorem 3. since JA X −1 JB X −1 (by Lemma 3. Guarantees on the Cram´rRao bound e 103 We proceed by forming the diﬀerence of the two sides of the expression in Eq.28): [D(X. then we obtain the guarantee from Theorem 3. z A ). z O ) Then the determinants of the matrices in the posterior Cram´rRao bound in Sece tion 2. (3.1 that D(X. through one of guarantees (online or oﬄine) in Section 3. z A ). Let z G be the set of observations chosen by the greedy heuristic operating on D(X.6 satisfy the following inequality: CG  X ≤ C∅  X CO  X C∅  X β . z G ) ≥ 0. we have for some β: D(X.Sec. The following theorem maps this into a guarantee on the posterior Cram´rRao bound.z C∪A ) − D(X. Since JX X X ¯ C\B JX . z O ) where z G is the set of observations chosen by the greedy heuristic and z O is the optimal set. if we use the greedy selection algorithm on D(X.10. we obtain: −1 C\B −1 C\B ¯ C\B ¯ ¯ ¯ C\B = log I + JX 2 JA JX 2 − log I + JX 2 JB JX 2 X X 1 1 1 1 Finally.6. z A ). 3.5D(X. z A )] − [D(X.6: ≥0 This establishes submodularity of D(X. Following Theorem 3.5). Assume that. z C∪B ) − D(X. and let z O be the optimal set of observations for this objective. z G ) ≥ βD(X.3. we can write through Lemma 3.6: ≥ log I + JA X −1 2 − − − ¯ C\B ¯ C\B JX JA 2 − log I + JB 2 JX JB 2 X X X 1 1 1 ¯ C\B Factoring JX and applying Lemma 3. z B )] = log ¯ JA + JX  X J∅  X C\A − log JA  X J∅  X − log ¯ JB + JX  X J∅  X C\B − log JB  X J∅  X = log ¯ C\A ¯ C\B JA + JX  JB + J  X − log X B X JA  JX  X −1 2 − − − ¯ C\A ¯ C\B JX JA 2 − log I + JB 2 JX JB 2 X X X 1 1 1 = log I + JA X 1 − ¯ C\A where JB 2 is the inverse of the symmetric square root matrix of JB .1 or 3.7.
7 Estimation of rewards The analysis in Sections 3. .nwj } .5 do not apply to Fisher information since the required properties (Eq. Obviously linear Gaussian processes are an exception to this rule. consider the proof of Theorem 3.6 assumes that all reward values can be calculated exactly. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES The ratio CG /C∅  is the fractional reduction of uncertainty (measured through X X covariance determinant) which is gained through using the selected observations rather than the prior information alone. The results of Sections 3. approximations are often necessary.1..4) and Eq.6. Thus Theorem 3. . z A ) in this case. zwj−1 ) j−1 g∈{1.. By the deﬁnition of D(X. where the greedy heuristic is used with estimated MI rewards. z A ) (Eq.104 CHAPTER 3.14) respectively) do not generally apply to D(X.. (3. The analysis in [46] can be easily extended to the algorithms described in this chapter. While this is possible for some common classes of problems (such as linear Gaussian models).11 provides a guarantee on how much of the optimal reduction you lose by using the greedy heuristic.25)) and from the assumed condition we have: [log Jz  − log J∅ ] ≥ β[log Jz  − log J∅ ] X X X X Substituting in Cz = [Jz ]−1 and using the identity A−1  = A−1 we obtain: X X [log Cz  − log C∅ ] ≤ β[log Cz  − log C∅ ] X X X X Exponentiating both sides this becomes: Cz  X C∅  X which is the desired result. X Proof.1–3. As an example. zwj zw11 . the determinant of the error covariance of any estimator of X using the data z G is lower bounded by CG . .6 and Lemma 3. (3. z A ) = 2 D(X.. z A ). From Section 2. . (3. G G O A A G O ≤ Cz  X C∅  X O β 3. g g g ˆ gj = arg max I(X.2 and 3. since 1 I(X.1.
.1. which. . we brieﬂy demonstrate wider a class of problems that may be addressed using the previous work in [77] (described in Section 2. . . It is easy to see that this class of selection problems ﬁts in to the matroid class: given any two valid4 observation selection sets A.3. zwjj zw11 . . .e. Then we can ﬁnd an element in B from the ith set which is not in A. zwjj zw11 . has not been previously applied in this context. . . . . As described in Section 2. 3. . . zwj−1 ) ≤ j−1 j−1 g g From step (c) of Theorem 3. .8. ∃ u ∈ B\A such that A ∪ {u} ∈ F . . . (U. . . . . in which we are choosing observations from N sets. . 4 By valid. zwj−1 ) + j ≤ = g g I(X. . . pick any set i such that the number of elements in A from this set is fewer than the number of elements in B from this set (such an i must exist since A and B have diﬀerent cardinality). . zw11 . . . . zwj−1 ) + j ≤ g g I(X. . Extension: general matroid problems 105 and the error in the MI estimate is bounded by . .Sec.4.1. zwM ) M + j=1 M o g g ˆ I(X. zwM ) M + 2M Hence the deterioration in the performance guarantee is at most 2M . to our knowledge.4. 3. zwM ) M ≤ g g I(X. zwM ) M + j=1 M g g g ˆ I(X. zwM ) M + j=1 g I(X. .4). . zwj zw11 . . . . . . . . and we may choose ki elements from the ith set. i. we mean that no more than ki elements are chosen from the ith set. . g g g g ˆ I(X. in which we select a ﬁxed number of observations from each set. zwM ) M + j=1 M g I(X. . B ∈ F such that A < B. . zw11 . . . . B with A < B. F ) is a matroid if ∀ A. . zwjj zw11 . zwj zw11 . . zw11 .. . zwjj zw11 . zwj−1 ) − I(X. . but can be added to A while maintaining a valid set. zwj−1 ) + 2 j g g g g 2I(X. M o o I(X. zw11 . .8 Extension: general matroid problems The guarantees described in this chapter have concentrated on problem structures involving several sets of observations. . zwj−1 ) j o g ≤ g g I(X. Consider the class of problems described in Assumption 3. zw11 . zw11 . In this section.
3.7. and controls a ﬁnite state.1. . .1 is detailed in Assumption 3. zNN }}. 3. This is the key advantage of Theorem 3. . Fig.4 when dealing with problems that have the structure of Assumption 3. 3. . but the total number of observations chosen cannot exceed K. Then we can ﬁnd an element in B from the ith set which is not in A.7 shows the observations chosen at each time in the previous case (where an observation was chosen in every time step) and in the constrained case in which a total of 50 observations is chosen.8. In this case. z1 1 }. . but where the total number of observations chosen (of either object) in the 200 steps should not exceed 50 (we assume an open loop control structure. n 1 Assumption 3.4. completely observed Markov chain (with a deterministic transition law) which determines the subset of measurements available at each time. . This may be seen to ﬁt within the structure of Assumption 3. . .7. consider a beam steering problem similar to the one described in Section 3. There are N sets of observations. the controller simultaneously selects observations in order to control an information state. n 1 {zN . so the selection algorithm and guarantee of Section 2.1 Example: beam steering As an example. .1 over the prior work described in Section 2. We now describe a greedy algorithm which . . .1. Under this generalization. {{z1 .4 applies. . . which are mutually independent conditioned on the quantity to be n 1 estimated (X).4. The diﬀerence is that there is an upper limit on the total number of observations able to be taken. . take any two observation selection sets A.106 CHAPTER 3. .7 clearly remains a matroid: as per the previous discussion. as well as on the number of observations able to be taken from each set. . The structure in Assumption 3.1. in which we choose all observation actions before obtaining any of the resulting observation values). GREEDY HEURISTICS AND PERFORMANCE GUARANTEES A commonly occurring structure which cannot be addressed within Assumption 3. but can be added to A while maintaining a valid set (since A < B ≤ K). the greedy heuristic must consider all remaining observations at each stage of the selection problem: we cannot visit one set at a time in an arbitrary order as in Assumption 3. B with A < B and pick any set i such that the number of elements in A from this set is fewer than the number of elements in B from this set.9 Extension: platform steering A problem structure which generalizes the openloop observation selection problem involves control of sensor state. Any ki observations can be chosen out of the ith set ({zi .7. zi i }).4.
4 and 3. .2.7. (b) shows the smaller set of observations chosen in the constrained problem using the matroid selection algorithm. (a) shows the observations chosen in the example in Sections 3.1.4 when q = 1.(a) Choosing one observation in every time step Object observed 2 1 0 50 100 Time step 150 200 (b) Choosing up to 50 total observations using matroid algorithm Object observed 2 1 0 50 100 Time step 150 200 Figure 3.
for each si . . . Deﬁnition 3. Calculate the reward J + I(x. .108 CHAPTER 3. . 2N + 1}. s0 ) Stage 1: Consider each possible sensor state which can follow s0 . s ∈ S s0 0 1 0 1 J1 (s1 ) = −∞. From sensor state 0 we can transition to any other sensor state. ˜ i si+1 The following example demonstrates that the ratio between the greedy algorithm for open loop joint observation and sensor state control and the optimal open loop algorithm can be arbitrarily close to zero. about which we seek to gather information is static (x1 = x2 = · · · = x). and consists of N (2N + 1) binary elements . . The greedy algorithm for jointly selecting observations and controlling sensor state commences in the following manner: (s0 is the initial state of the algorithm which is ﬁxed upon execution) Stage 0: Calculate the reward of the initial observation J0 = I(x. or transition to sensor state (s − 1) (provided that s > 1) or (s + 1) (provided that s < 2N + 1). s0 ) i−1 i−1 i−2 i−1 After stage N . we can stay in the same state. We seek to control the sensor state sk ∈ {0. otherwise Stage i: For each si . sg (sg (si )). calculate sg = arg maxs JN (s).3. The unobserved state. In stage 0 we commence from sensor state s0 = 0. . Example 3. x. .5. calculate the highest reward sequence which can precede that state: sg (si ) = arg max Ji−1 (si−1 ) i−1 si−1 si ∈S si−1 Then. We assume that in each sensor state there is a single measurement available. The ﬁnal sequence is found through ˜N g the backward recursion si = sg (˜g ). . si sg (si ). s s ). From sensor state s = 0. add the reward of the new observation obtained in that state: Ji (si ) = Ji−1 (sg (si )) + I(x. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES one may use to control such a problem.
. The observation in sensor state 2j − 1.9. . . Extension: platform steering 109 {x1. si−1 = 0 si−1 positive and even si−1 odd index.2N +1 }. . . The reward gained in each stage is: 0. si positive and even i. . .1 . x1. . xj.2N +1 . Compare this result to the optimal sequence. The prior distribution of these elements is uniform. xN. we obtain: −∞. . . The greedy algorithm commences stage 0 with J0 = 0. . . si . i odd . i − 1. . In stage 1. Incorporating the new observation. s = 0 i Ji (si ) = i − 1. . we ﬁnd that the best sequence remains in any oddnumbered state for all stages. . s1 odd Suppose that we commence stage i with −∞. s = 0 1 J1 (s1 ) = 0. i even I(x. . which visits state i at stage i. and obtains a reward of 2N + 1. N } is uninformative. s1 positive and even 1. j ∈ {1.1 . si−1 ) = i. . 3. .Sec. . j ∈ {1. . .i }. si odd At the end of stage 2N + 1. g si−1 (si ) = si − 1. we obtain: −∞. The observation in sensor state 2j − 2. we obtain: si = 0 si positive and even si odd Settling ties by choosing the state with lower undeﬁned. si s0 . . . xN. . Ji−1 (si−1 ) = i − 2. . . N } at stage i provides a direct measurement of the state elements {xj.1 . . .
. N → ∞ N 2 + 2N + 1 3. GREEDY HEURISTICS AND PERFORMANCE GUARANTEES N (2j + 1) = N 2 + 2N + 1 j=0 The ratio of greedy reward to optimal reward for the problem involving 2N +1 stages is: 2N + 1 → 0.10 Conclusion The performance guarantees presented in this chapter provide theoretical basis for simple heuristic algorithms that are widely used in practice. The examples presented throughout the chapter demonstrate the applicability of the guarantees to a wide range of waveform selection and beam steering problems.110 The total reward is thus CHAPTER 3. and the substantially stronger online guarantees that can be obtained for speciﬁc problems through computation of additional quantities after the greedy selection has been completed. and are naturally tighter for diﬀusive processes. or discounted objectives. The guarantees apply to both open loop and closed loop operation.
permits multiple observations of each object. as discussed in Section 2. In Section 4.1 a simple formulation that allows us to select up to one observation for each object.Chapter 4 Independent objects and integer programming N this chapter.2). which ﬁnds the optimal plan. The formulation is generalized to consider resources with arbitrary capacities in Section 4. we use integer programming methods to construct an open loop plan of which sensor actions to perform within a given planning horizon. this structure is useful in 111 . In addition to the previous assumption that observations should be independent conditioned on the state. three new assumptions must be met for this structure to arise: I 1. Accordingly. the mutual information reward of observations of diﬀerent objects becomes the sum of the individual observation rewards—i.. The objects must be observed through independent observation processes When these three assumptions are met. The prior distribution of the objects must be independent 2. The objects must evolve according to independent dynamical processes 3. and can address observations that require diﬀerent durations to complete.e. The emphasis of our formulation is to exploit the structure which results in sensor management problems involving observation of multiple independent objects. In Section 4.2. We may either execute this entire plan. or execute some portion of the plan before constructing an updated plan (socalled open loop feedback control. we generalize this to our proposed formulation. we perform experiments which explore the computational eﬃciency of this formulation on a range of problems.4. We commence this chapter by developing in Section 4. submodularity becomes additivity.5.2. one may apply a variety of techniques that exploit the structure of integer programming with linear objectives and constraints.
Ai represents the subset of time slots in which we observe object i) to be: i i rAi = I(X i . objects will be in clear view and observation will yield accurate position information. . Our task is to determine which object to observe in each time slot. We have a number of objects. . this represents the purest form of the beam steering structure discussed in Section 1.2) . 4. . . Within the time scale of a planning horizon. In some positions.2. we assume that the planning horizon is broken into discrete time slots. . . . . .1) i where zAi are the random variables corresponding to the observations of object i in the time slots in Ai . numbered {1.e. . we assume that we have only a single mode for the sensor to observe each object in each time slot. .1 Basic formulation We commence by presenting a special case of the abstraction discussed in Section 4. and that each observation action occupies exactly one time slot. .1 Independent objects. . we deﬁne the reward of observation set Ai ⊆ {1. .. xi } the joint state (over the planning 1 N horizon) of object i. in other positions. 4. . . As per the previous chapter. each of which can be observed in any time slot. Initially. additive rewards The basis for our formulation is the fact that rewards for observations of independent objects are additive. if we assume that the initial states of the objects are independent: M p(x1 . xM ) = 1 1 i=1 p(xi ) 1 (4. . As discussed in the introduction of this chapter. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING problems involving time invariant rewards.112 CHAPTER 4. . To motivate this structure. consider a problem in which we use an airborne sensor to track objects moving on the ground beneath foliage. objects will be obscured by foliage and observations will be essentially uninformative. and it will be preferable to observe objects during the portion of time in which they are expected to be in clear view. M }. zAi ) (4. N } of object i (i.1. . objects will move in and out of obscuration.1. . N }. numbered {1. such that the only choice to be made in each time slot is which object to observe. that a sensor can perform at most one task in any time slot. Denoting by X i = {xi .
Sec. 4.1. Basic formulation
113
and that the dynamical processes are independent:
M
p(x1 , . . . , xM x1 , . . . , xM ) = k k k−1 k−1
i=1
p(xi xi ) k k−1
(4.3)
and, ﬁnally, that observations are independent conditioned on the state, and that each observation relates to a single object:
M M 1 p(zA1 , . . . , zAM X 1 , . . . , X M )
=
i=1 k∈Ai
i p(zk X i )
(4.4)
then the conditional distributions of the states of the objects will be independent conditioned on any set of observations:
M 1 M p(X 1 , . . . , X M zA1 , . . . , zAM ) = i=1 i p(X i zAi )
(4.5)
In this case, we can write the reward of choosing observation set Ai for object i ∈ {1, . . . , M } as:
1 M 1 M I(X 1 , . . . , X M ; zA1 , . . . , zAM ) = H(X 1 , . . . , X M ) − H(X 1 , . . . , X M zA1 , . . . , zAM ) M M
=
i=1 M
H(X ) −
i=1 i I(X i ; zAi ) i=1 M
i
i H(X i zAi )
=
=
i=1
i rA i
(4.6)
4.1.2 Formulation as an assignment problem
While Eq. (4.6) shows that rewards are additive across objects, the reward for taking several observations of the same object is not additive. In fact, it can be easily shown through submodularity that (see Lemma 4.3)
i I(X i ; zAi ) ≤ k∈Ai i I(X i ; zk )
As an initial approach, consider the problem structure which results if we restrict ourselves to observing each object at most once. For this restriction to be sensible, we must
114
CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
assume that the number of time slots is no greater than the number of objects, so that an observation will be taken in each time slot. In this case, the overall reward is simply the sum of single observation rewards, since each set Ai has cardinality at most one. Accordingly, the problem of determining which object to observe at each time reduces to an asymmetric assignment problem, assigning time slots to objects. The assignment problem may be written as a linear program in the following form:
M N i i r{k} ωk
max
i ωk
(4.7a)
i=1 k=1 M i s.t. ωk ≤ 1 i=1 N i ωk ≤ 1 k=1 i ωk ∈ {0, 1}
∀ k ∈ {1, . . . , N }
(4.7b)
∀ i ∈ {1, . . . , M } ∀ i, k
(4.7c) (4.7d)
i The binary indicator variable ωk assumes the value of one if object i is observed in time slot k and zero otherwise. The integer program may be interpreted as follows: i • The objective in Eq. (4.7a) is the sum of the rewards corresponding to each ωk that assumes a nonzero value, i.e., the sum of the rewards of each choice of a particular object to observe in a particular time slot.
• The constraint in Eq. (4.7b) requires that at most one object can be observed in any time slot. This ensures that physical sensing constraints are not exceeded. • The constraint in Eq. (4.7c) requires that each object can be observed at most once. This ensures that the additive reward objective provides the exact reward value (the reward for selecting two observations of the same object is not the sum of the rewards of each individual observation). • The integrality constraint in Eq. (4.7d) requires solutions to take on binary values. Because of the structure of the assignment problem, this can be relaxed to allow i any ωk ∈ [0, 1], and there will still be an integer point which attains the optimal solution. The assignment problem can be solved eﬃcient using algorithms such as Munkres [72], JonkerVolgenantCasta˜´n [24], or Bertsekas’ auction [11]. We use the auction no algorithm in our experiments.
Illustration of reward trajectories and resulting assignment
1.5 1 0.5 0 40 20 Object number 0 0 2 4 6 8 10
Reward
Time step
Figure 4.1. Example of operation of assignment formulation. Each “strip” in the diagram corresponds to the reward for observing a particular object at diﬀerent times over the 10step planning horizon (assuming that it is only observed once within the horizon). The role of the auction algorithm is to pick one unique object to observe at each time in the planning horizon in order to maximize the sum of the rewards gained. The optimal solution is shown as black dots.
116
CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
One can gain an intuition for this formulation from the diagram in Fig. 4.1. The rewards correspond to a snapshot of the scenario discussed in Section 4.1.3. The scenario considers the problem of object tracking when probability of detection varies with position. The diagram illustrates the tradeoﬀ which the auction algorithm performs: rather than taking the highest reward observation at the ﬁrst time (as the greedy heuristic would do), the controller defers measurement of that object until a later time when a more valuable observation is available. Instead, it measures at the ﬁrst time an object with comparatively lower reward value, but one for which the observations at later times are still less valuable.
4.1.3 Example
The approach was tested on a tracking scenario in which a single sensor is used to simultaneously track 20 objects. The state of object i at time k, xi , consists of position k and velocity in two dimensions. The state evolves according to a linear Gaussian model: xi = Fxi + wi k k k+1 where wi ∼ N {wi ; 0, Q} is a k k 1 T 0 1 F= 0 0 0 0 white Gaussian noise process. F 3 T T2 0 0 0 T32 2 0 0 T 0 ; Q = q 2 0 T3 1 T 0 3 T2 0 1 0 0 2 (4.8) and Q are set as: 0 0 (4.9) T2 2 T
The diﬀusion strength q is set to 0.01. The sensor can be used to observe any one of the M objects in each time step. The measurement obtained from observing object uk with the sensor consists of a detection ﬂag duk ∈ {0, 1} and, if duk = 1, a linear Gaussian k k measurement of the position, z uk : k z uk = Hxuk + v uk k k k (4.10)
where v uk ∼ N {v uk ; 0, R} is a white Gaussian noise process, independent of wuk . H k k k and R are set as: 1 0 0 0 5 0 H= ; R= (4.11) 0 0 1 0 0 5 The probability of detection Pduk xuk (1xuk ) is a function of object position. The k k k function is randomly generated for each Monte Carlo simulation; an example of the
Probability of detection 1 100 0.6 0.8 200 y position 300 400 500 0.2 600 700 200 400 x position 600 0 0.2. The color intensity indicates the probability of detection at each x and y position in the region. Example of randomly generated detection map.4 Figure 4. .
.Performance for 20 objects over 200 simulations 100 80 Total reward 60 40 20 0 0 5 10 Planning horizon 15 20 Figure 4. with a planning horizon of one the auctionbased method is equivalent to greedy selection. Error bars indicate 1σ conﬁdence bounds for the estimate of average total reward.3. Performance is measured as the average (over the 200 simulations) total change in entropy due to incorporating chosen measurements over all time. Performance tracking M = 20 objects. The point with a planning horizon of zero corresponds to observing objects sequentially.
. As illustrated in Fig. The reduction in performance for longer planning horizons is a consequence of the restriction to observe each object at most once in the horizon. we are then. in eﬀect. 4. N } with a set of available resources. Integer programming generalization 119 function is illustrated in Fig.3. It is not surprising that the increase in performance above the greedy heuristic is small. 4.2. and utilize a more advanced integer programming formulation to ﬁnd an eﬃcient solution. the elements of which may correspond to the use of a particular sensor over a particular interval of time. the assignment formulation is able to improve performance over the greedy method. As in the previous section. . The performance over 200 Monte Carlo runs is illustrated in Fig.1.Sec. . each element of R can be assigned to at most one task. These limitations motivate the generalization explored in the following section. enforcing that each object must be observed once. in which objects are observed sequentially. the formulation in this section allows us to accommodate observations which consume multiple resources (e. We also relax the constraint that each object may be observed at most once. which allows us to admit multiple observations of each object. . With a planning horizon of one. . . the auctionbased algorithm corresponds to greedy selection. hence they are not applicable to this wider class of problems).2 Integer programming generalization An abstraction of the previous analysis replaces the discrete time slots {1. e. multiple time slots on the same sensor. Estimation is performed using the Gaussian particle ﬁlter [45]. hence by forcing the controller to observe each object. R (assumed ﬁnite). with the right choice of planning horizon. we are forcing it to (at some stage) take observations of little value.g.g. The point with a planning horizon of zero corresponds to a raster. as well as observations that require diﬀerent durations to complete (note that the performance guarantees of Chapter 3 require all observations to consume a single time slot. Unlike the previous section and the previous chapter. 4. 4. or the same time slot on multiple sensors). The diagram demonstrates that. The function may be viewed as an obscuration map. in this scenario there will often be objects receiving low reward values throughout the planning interval.2. since the performance guarantees discussed in the previous chapter apply to this scenario. 4. If the planning horizon is on the order of the number of objects. due to foliage. The performance is measured as the average (over the 200 simulations) total change in entropy due to incorporating chosen measurements over all time.
If we do not limit the sensing resources that are allowed to be used for object i.. t(A) = u∈A t(u) The problem that we seek to solve is that of selecting the set of observation actions for each object such that the total reward is maximized subject to the constraint that each resource can be used at most once.1 Observation sets Let U i = {ui .2 Integer programming formulation The optimization problem that we seek to solve is reminiscent of the assignment problem in Eq.12) Alternatively we may limit the total number of elemental observations allowed to be taken for object i to k i : S i = {A ⊆ U i t(u1 )∩t(u2 ) = ∅ ∀ u1 . since observations may consume diﬀerent resource quantities. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING 4. i. Let S i ⊆ 2U be j the collection of observation subsets which we allow for object i.13) or (4. A ≤ k i } (4. u2 ∈ A. u2 ∈ A.120 CHAPTER 4. (4.12). . .14). . (4. where each elemental observation ui corresponds to observing j object i using a particular mode of a particular sensor within a particular period of time. u∈A t(u) ≤ Ri (4.2. This is assumed to take the form of Eq. rather .7). . note that in each case it is an independence system.14) We denote by t(A) ⊆ R the set of resources consumed by the actions in set A. An elemental action may occupy multiple resources. u2 ∈ A} (4. ui i } be the set of elemental observation actions (assumed ﬁnite) that 1 L may be used for object i. though not necessarily a matroid (as deﬁned in Section 2. let t(ui ) ⊆ R be the subset j i of resource indices consumed by the elemental observation action ui .e.4). (4. except that now we are assigning to each object a set of observations. the collection will consist of all subsets of U i for which no two elements consume the same resource: S i = {A ⊆ U i t(u1 )∩t(u2 ) = ∅ ∀ u1 . 4.13) or limit the total quantity of resources allowed to be consumed for object i to Ri : Si = A ⊆ U i t(u1 )∩t(u2 ) = ∅ ∀ u1 .2.
g. and conceptually could be addressed using combinatorial auction methods (e. . . M } allow for several observations to be taken of the same object.15a) s. the number of subsets may be combinatorially large.15d) cannot be relaxed. ∀t∈R (4.15c) ensure that exactly one observation set is chosen for any given object. . (4. 4.15c) (4.t. the subsets for which ωAi = 1). Integer programming generalization 121 than a single observation: M max ωi Ai i i rA i ω A i i=1 Ai ∈S i M i ω Ai ≤ 1 i=1 Ai ∈S i t∈t(Ai ) i ωAi = 1 ∀ i ∈ {1. This is necessary to ensure that the additive objective is the i i i exact reward of corresponding selection (since.Sec. (4. [79]). 1} ∀ i. • The integrality constraints in Eq.g. (4. .15b) ensure that each resource (e. The interpretation of each line of the integer program follows. .. i ∈ {1. The problem is a bundle assignment problem.2. .15d) i Again. . rA∪B = rA + rB ). the binary indicator variables ωAi are 1 if the observation set Ai is chosen and 0 otherwise. sensor time slot) is used at most once.. Unlike the formulation in Eq. • The objective in Eq.7)..15a) is the sum of the rewards of the subset selected for i each object i (i. If the collections of observation sets S i .15d) ensure that the selection variables take on the values zero (not selected) or one (selected). generally this would require computation of rAi for every subset Ai ∈ S i . M } Ai ∈S i i ωAi ∈ {0. . Ai ∈ S i (4. The problem is not a pure assignment problem. (4. (4. • The constraints in Eq. (4. i However.e.15b) (4. the integrality constraints in Eq. • The constraints in Eq. Note that the constraint does not force us to take an observation of any object. as the observation subsets Ai ∈ S i consume multiple resources and hence appear in more than one of the constraints deﬁned by Eq. since the empty observation set is allowed (∅ ∈ S i ) for each object i. . (4.15b). in general.
The collection of candidate subsets. In the limit. could be addressed using [5]. but an upper bound to each reward is available. Deﬁnition 4.e. such that the only decision is to determine how many observations to select for each object. rather we use a compact representation which obtains upper bounds through submodularity (details of this will be given later . which becomes increasingly tight as iterations progress. The reward of each of the remaining subsets (i. The solution of the integer program in each iteration provides an upper bound to the optimal reward. as described in the following section. we arrive at the full complexity of the integer program in Eq. This section details an algorithm that can be used to eﬃciently solve the integer program in many practical situations. but in many practical situations it is possible to terminate much sooner with an optimal (or nearoptimal) solution.122 CHAPTER 4. we will not explicitly enumerate the elements in S i \Tl i . (4.3 Constraint generation approach The previous section described a formulation which conceptually could be used to ﬁnd the optimal observation selection. We commence with Tl i = {∅}. but the computational complexity of the formulation precludes its utility. we address the general case which arises when observations are heterogeneous and require diﬀerent subsets of resources (e. 4. The subsets in Tl i are those for which the exact reward has been evaluated.. those in S i \Tl i ) has not been evaluated. Tl i ⊆ S i .15). INDEPENDENT OBJECTS AND INTEGER PROGRAMMING Our approach exploits submodularity to solve a sequence of integer programs. The formulation may be conceptually understood as dividing the collection of subsets for each object (S i ) at iteration l into two collections: Tl i ⊆ S i and the remainder S i \Tl i . diﬀerent time durations). New subsets are added to the collection at each i iteration (and their rewards calculated). each of which represents the subsets available through a compact representation. In this thesis.g.1 (candidate subset). The case in which all observations are equivalent. In practice. so that Tl i ⊆ Tl+1 for all l. The complexity associated with evaluating the reward of each of an exponentially large collection of observation sets can be ameliorated using a constraint generation approach. is the collection subsets of observations for object i for which the exact reward has been evaluated prior to iteration l. we will refer to these as candidate subsets.. The method proceeds by sequentially solving a series of integer programs with progressively greater complexity.
Di ∈ S i . if the integer program selects a subset in S i \Tl i for one or more objects. Ai ∪ C i . Eq. Eq.1) speciﬁes the way in which the collection of i candidate subsets Tl i and the exploration subsets Bl. Ai .Ai (possibly empty). We refer to Bl.. If the subset that the integer program selects for each object i is in Tl i — i.15b)) are satisﬁed. Bl. Di . We will prove in Lemma 4.Ai . The candidate subset Ai may be augmented with any i subset of elemental observations from Bl. ensuring that the resource constraints (e. since it provides a mechanism for discovering promising new i subsets that should be incorporated into Tl+1 . The subset of observations selected by the integer program for object i is the union of these.1 that this update procedure ensures that there is exactly one way of selecting each subset . The update algorithm (Algorithm 4.e.Ai (subject to resource constraints) to generate subsets in S i \Tl i . C i ⊆ Bl.3). the solution of which selects a subset for each object. is indicated indirectly through a choice of one candidate subset. Each iteration of the optimization reconsiders all decision variables.Ai ⊆ U i .15). for each object i. it is a subset which had been generated and for which the exact reward had been evaluated in a previous iteration—then we have found an optimal solution to the original problem.Ai as an exploration subset.3 (selection). for each object i.2 (exploration subset). 4. C i ⊆ Bl.g. a subset of observation actions.Ai . The compact representation of S i \Tl i associates with each candidate subset. and a subset of elements from i the corresponding exploration subset. Ai may be augmented with any subset of i i Bl. Constraint generation approach 123 in Lemma 4. (4. one way of doing this is to add the newly selected subsets to Tl i and evaluate their exact rewards. then we need to tighten the upper bounds on the rewards of those subsets. Conversely. i. Ai ∈ i Tl i .Sec. In each iteration of the algorithm we solve an integer program. The integer program at each iteration l selects a subset of observations. and a subset of elements of the corresponding i exploration subset..Ai are updated between iterations using the selection results of the integer program. (4. Deﬁnition 4.Ai to generate new subsets that are not in Tl i (but that are in S i ). The solution of the integer program at each iteration l is a choice of one candidate subset.e. Ai ∈ Tl i . The selected subset. With each candidate subset Ai ∈ Tl i we associate i an exploration subset Bl. allowing the solution from the previous iteration to be augmented or reversed in any way. such that Di = Ai ∪C i . Deﬁnition 4..3.
124 CHAPTER 4. and four observations for object 2 (U 2 = {d. The exact reward of each candidate subset has been evaluated. The integer program at iteration l can choose any candidate subset Ai ∈ Tl i . INDEPENDENT OBJECTS AND INTEGER PROGRAMMING Di ∈ S i . as has the incremental reward of each element of the corresponding exploration subset.e.3. and for object 2. augmented by any subset of the corresponding exploration subset. we also evaluate i the incremental reward of each element in the corresponding exploration subset. which new i i candidate subsets to add). c}). γ. augmenting a choice of Ai ∈ Tl i with a subset of elements of the exploration i subset C i ⊆ Bl. δ}). g}). g} by choosing the candidate subset {g} with the exploration subset elements {d. e. f. The collection of possible subsets for object i is S i = 2U . f and g consume resources α. There are four resources (R = {α. and Bl+1. The update algorithm takes the result of the integer i program at iteration l and determines the changes to make to Tl+1 (i.Ai for each Ai ∈ Tl+1 for iteration (l + 1). while observations d. The incremental reward ruAi of an elemental i observation u ∈ Bl. e}. observations a. The incremental rewards are used to obtain upper bounds on the reward of observation sets in S i \Tl i generated using candidate subsets and exploration subsets. we could select the subset {c} by selecting the candidate subset ∅ with the exploration subset element {c}.4 (update algorithm). β and γ respectively. for object 1. As well as evaluating the reward of each candidate subset Ai ∈ Tl i . e.5 (incremental reward). β.Ai .Ai given a candidate subset Ai is the increase in reward for choosing the single new observation u when the candidate subset Ai is already chosen: i i i ruAi = rAi ∪{u} − rAi 4.Ai . β. The candidate . For example. and c consume resources α. 4. b. γ and δ i respectively. Deﬁnition 4. The subsets involved in iteration l of the algorithm are illustrated in Fig. Deﬁnition 4.. There are three observations available for object 1 (U 1 = {a. The sets are constructed such that there is a unique way of selecting any subset of observations in S i . The candidate subsets shown in the circles in the diagram and the corresponding exploration subsets shown in the rectangles attached to the circles are the result of previous iterations of the algorithm. b. we could select the subset {d.1 Example Consider a scenario involving two objects. e. Bl.4.
The shaded candidate subsets and exploration subset elements denote the solution of the integer program at this iteration. c} 1 Bl. e.{a} = {c} 1 Bl. The integer program may select for each object any candidate subset in Tli . f } Exploration subsets Figure 4. f } 2 Bl.{a.b} = {c} Exploration subsets Object 2 Candidate subsets (Tl2 ) ∅ {g} 2 Bl.Object 1 Candidate subsets (Tl1 ) ∅ {a} {a.4. augmented by any subset of elements from the corresponding exploration subset. illustrated by the rectangle connected to the circle. Subsets available in iteration l of example scenario.∅ = {b.∅ = {d. The sets are constructed such that there is a unique way of selecting any subset of observations in S i . e. illustrated by the circles. The subsets selected for each object must collectively satisfy the resource constraints in order to be feasible. b} 1 Bl. .{g} = {d.
4. g}.3.e. so we simply set Tl+1 = Tl 1 .4). while the exploration subset corresponding to candidate subset Ai ∈ Tl i is denoted by i i Bl.126 CHAPTER 4.3. e. / 1 We now demonstrate the operation of the update algorithm that is described in detail in Section 4.5. we cannot select candidate subset ∅ with exploration subset elements {a.1) creates a new ˜ candidate subset A consisting of the candidate subset selected for the object ({g}).g. Bl+1. g}. since element e 2 was removed from the exploration subset for candidate subset {g} (i. e.. The sets that were modiﬁed in the update are shaded in the diagram.. The procedure that assuring that this is always the case is part of the algorithm which we describe in Section 4. to select the set {a. we must choose candidate subset {a} and exploration subset element c. and subset {e. 4. we select T0i = {∅}. There are many / ways that this update could be performed. the only way to select elements g and e together (for object 2) is to select the new candidate subset {e..3. and B0.{g} ).Ai ⊆ U i . The subsets in iteration (l + 1) are illustrated in Fig. g} ∈ Tl 2 .3. For object 2. c} for object 1. f. augmented by the single element (out of the selected exploration subset elements) with the highest incremental reward.e. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING subsets and exploration subsets are generated using an update algorithm which ensures that there is exactly one way of selecting each subset. To initialize the problem.∅ = U i for all i.. 4. Suppose that the solution of the integer program at iteration l selects subset {a} for object 1. Suppose in our case that the reward of the subset {e. Since {a} ∈ Tl 1 the exact reward of the subset selected for object 1 has already been evaluated and we 1 do not need to modify the candidate subsets for object 1. g} ˜ is greater than the reward of {f. Our method (Algorithm 4. the candidate subsets and exploration subset elements that are shaded in Fig. There remains a unique way of selecting each subset of observations. The . g} for object 2 (i. g}. then A = {e.3.2 Formulation of the integer program in each iteration The collection of candidate subsets at stage l of the solution is denoted by Tl i ⊆ S i . so an update is required.g.∅ . c} since a ∈ Bl. f. we ﬁnd that {e.
There remains a unique way of selecting each subset of observations.{a.. e.{g} ). f } 2 Bl+1.{a} = {c} 1 Bl+1. c} 1 Bl+1.Object 1 1 Candidate subsets (Tl+1 ) ∅ {a} {a.e.g} = {d.{g} = {d.{e. f } 2 Bl+1. e. Subsets available in iteration (l + 1) of example scenario. Bl+1. since element 2 e was removed from the exploration subset for candidate subset {g} (i.∅ = {b. the only way to select elements g and e together (for object 2) is to select the new candidate subset {e. The subsets that were modiﬁed in the update between iterations l and (l + 1) are shaded. f } Exploration subsets Figure 4.∅ = {d..g. g} 2 Bl+1. b} 1 Bl+1. g}. .b} = {c} Exploration subsets Object 2 2 Candidate subsets (Tl+1 ) ∅ {g} {e.5.
1} ∀ i.16f) ∀ i.Ai i ωuAi ≤ k i ∀ i ∈ {1. the constraint in Eq. .14) on the maximum number of elements of the resource set R allowed to be utilized on any given object.16c) (4. . .16a) max i i rAi ωAi + i=1 M Ai ∈Tli M i ω Ai i=1 Ai ∈T i l t∈t(Ai ) u∈Bi i l.A i i ruAi ωuAi s. the solution of the integer program selects for each object i a subset of observations: . 1} ∀ i.t. In accordance with Deﬁnition 4. (4. + i=1 Ai ∈T i u∈Bi l l. M } (4.Ai ωAi ≤ 0 u∈Bi i l.16f).Ai i ωuAi ≤ 1 t∈t(u) ∀t∈R (4.16e) and Eq. Ai ∈ Tl i If there is a cardinality constraint of the form of Eq. All selection variables are either zero (not selected) or one (selected) due to the integrality constraints in Eq.13) on the maximum number of elemental observations allowed to be used on any given object.128 CHAPTER 4. . Ai ∈ T i . we add the constraints: i i A ωAi + Ai ∈Tli u∈Bi l.16b) i ωAi = 1 ∀ i ∈ {1.16e) (4. ωi i A uA (4.16g) Alternatively. (4. .16d) (4. M } (4. we add the constraints: i i t(A )ωAi + Ai ∈Tli u∈Bi i l.16h) i The selection variable ωAi indicates whether candidate subset Ai ∈ Tl i is chosen. Ai ∈ T i i i ωuAi ∈ {0. . .16c) guarantees that exactly one candidate subset is chosen for each i object.A i ωAi ∈ {0. u ∈ Bl. . . M } Ai ∈T i i i i ωuAi − Bl.Ai (4. . .3. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING integer program that we solve at each stage is: M ωi i .A i t(u)ωuAi ≤ Ri ∀ i ∈ {1. The selection variables ωuAi indicate the elements of the exploration subset corresponding to Ai that are being used to augment the candidate subset. (4. (4. (4. if there is a constraint of the form of Eq. .
Sec. (4. (4. the reward for selecting a candidate subset and one exploration subset variable is a exact reward value. while the second includes all exploration subset elements that consume resource t.6 (selection variables). (4.3 i that.5. due to submodularity.16b) dictates that each resource can only be used once.16e) and Eq. the integrality constraints cannot be relaxed—the problem is not an assignment problem since candidate subsets may consume multiple resources. Again. (4. hence this does not force the system to take an observation of any given object. As per Deﬁnition 4. there will be a candidate subset corresponding to taking no observations (Ai = ∅). The values of the selection variables.16f) require each variable to be either zero (not selected) or one (selected).16d)). The ﬁrst term includes all candidate subsets that consume resource t. the reward for selecting a candidate subset and two or more exploration subset variables is an upper bound. • The constraint in Eq. either by a candidate subset or an exploration subset element.16a) is the sum of the reward for the candidate subset selected for each object.16d) speciﬁes that exploration subset elements which correspond to a given candidate subset can only be chosen if that candidate subset is chosen.3. • The integrality constraints in Eq.16c) speciﬁes that exactly one candidate subset should be selected per object. The subset selected for object i is i Di = Ai ∪ C i where Ai is the candidate subset for object i such that ωAi = 1 and i i C i = {u ∈ Bl. (4. Actually. (4. the sum of the reward increments u ruAi is an upper bound for the additional reward obtained for selecting those elements given that the candidate set Ai has been selected. this is analogous with Eq. (4. the reward increment ruAi = i i (rAi ∪{u} − rAi ) represents the additional reward for selecting the elemental action u given that the candidate subset Ai has been selected. plus the incremental rewards of any exploration subset i elements that are selected. (4. Constraint generation approach 129 i i Deﬁnition 4.16) may be interpreted as follows: • The objective in Eq. The objective and constraints in Eq. ωAi .1 • The constraint in Eq. 1 . ωuAi determine the subsets selected for each object i.Ai ωuAi = 1}. (4. and there are side constraints (Eq.15b) in the original formulation. • The constraint in Eq. At each solution stage. 4. We will see in Lemma 4.
130 CHAPTER 4.1 describes the iterative manner in which the integer program in Eq.Ai \{˚i } ul ˚ ˚ l l i i ˚ Bl+1.16) to solve Eq. .Ai l ˚ ˜ ul let Ai = Ai ∪ {˚i } l l i = T i ∪ {Ai } ˜ Tl+1 l l i i i ˜ / Bl+1.16) while ∃ i Ai such that u∈Bi l. M } do i ˚ let Ai be the unique subset such that ωAi = 1 l ˚ l if i Tl+1 = i Bl+1.Ai i ωuAi > 1 do for i ∈ {1. B0.16) is applied: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 i T0i = ∅ ∀ i.16) is resolved.∅ = U i ∀ i.1: Constraint generation algorithm which iteratively utilizes Eq. (4. l = 0 i i evaluate ru∅ ∀ i. (4.Ai \{˚i }\{u ∈ Bl. (4.3. u ∈ B0.Ai Ai ∪ {u} ∈ S i } ul ˜ l ˚ ˚ l l l i i i evaluate rAi = rAi + r˚Ai ˜ ˚ u ˚l l l i i evaluate ruAi ∀ u ∈ Bl+1. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING 4. .Ai ∀ Ai ∈ Tl i i i ruAi ˚ l else let ˚i = arg maxu∈Bi ul ˚ l.15). (4.A l i ωuAi ≤ 1 then ˚ l Tl i = Bl.3 Iterative algorithm Algorithm 4. Ai = Ai l end end l =l+1 resolve problem in Eq.Ai u∈Bi ˚i l.Ai ∀ Ai ∈ Tl i .∅ solve problem in Eq. • In each iteration (l) of the algorithm.Ai = Bl.Ai = Bl. the integer program in Eq.Ai ˜ ˜ l l i i Bl+1. . (4. . (4.Ai = Bl. .16) end Algorithm 4.
1). We commence with a single empty candidate subset for each object (T0i = .1.Ai ) remain unchanged for that object in the following iteration (lines 8–9).3).3. This greedy exploration is analul ogous to the greedy heuristic discussed in Chapter 3. γ}). Thus we ˜ generate an additional candidate subset (Ai ) which augments the previously l i ) with the exploration subset element with the ˚ chosen candidate subset (Al highest reward increment (˚i ) (lines 11–13). Obviously we want to preclude selection of a candidate subset along with additional exploration subset elements to construct a subset for which the exact reward has already been calculated.1. – If more than one exploration subset element has been chosen for a given object. otherwise another iteration of the “while” loop (line 4 of Algorithm 4. then the rewards of all subsets selected correspond to the exact values (as opposed to upper bounds). the optimal solution has been found (as we will show in Theorem 4. then the reward obtained by the integer program for that object is an upper bound to the exact reward (as we show in Lemma 4.3. 4. and that the reward of the various observation subsets are as shown in Table 4.1) is executed. • Each iteration of the “while” loop considers decisions corresponding to each object (i) in turn: (the for loop in line 4) – If no more than one exploration subset element is chosen for object i then the reward for that object is an exact value rather than an upper bound. so i the collection of candidate subsets (Tl i ) and the exploration subsets (Bl. the updates of the exploration subsets in lines 14 and 17 achieve this.4 Example Suppose that there are three objects (numbered {1. 4. we use it to decide which portion of the action space to explore ﬁrst. rather than using it to make greedy action choices (which would result in loss of optimality). 2. using an upper bound to the exact reward. as shown in Lemma 4. and the algorithm terminates. As we will see in Theorem 4. this scheme maintains a guarantee of optimality if allowed to run to termination. Here. β.1.Sec. – Exploration subsets allow us to select any subset of observations for which the exact reward has not yet been calculated. Constraint generation approach 131 If no more than one exploration subset element is chosen for each object. 3}) and three resources (R = {α.
1. β} {α. resources consumed and rewards for each object in the example shown in Fig. b. c} {b.8 Table 4.5 0 0. γ} {β. c} {a. β.Object 1 1 1 1 1 1 1 1 2 2 3 3 Subset ∅ {a} {b} {c} {a.6 0 0. . γ} ∅ {β} ∅ {γ} Reward 0 2 2 2 3 3 3 3. 4. b} {a. Observation subsets.6. γ} {α. c} ∅ {d} ∅ {e} Resources consumed ∅ {α} {β} {γ} {α.
Iteration 0 Candidate subsets (T01 ) Iteration 1 Candidate subsets (T11 ) ∅ ∅ {a} {a. The solution to the integer program in each iteration is shown along with the reward in the integer program objective (“IP reward”). reward: 3. reward: 3. Four iterations of operations performed by Algorithm 4. .1 on object 1 (arranged in counterclockwise order. c} {b. from the topleft). c} Exploration subsets Solution: 1 1 1 ω{a} = ωb{a} = ωc{a} = 1 2 3 ω ∅ = ω∅ = 1 IP reward: 4. which is an upper bound to the exact reward.8. b. reward: 3.6. and the exact reward of the integer program solution (“reward”). b} ∅ {a} {a. while the attached rectangles show the corresponding exploration subsets. c} {c} {c} {c} {c} {c} {c} Exploration subsets Solution: 1 1 1 ω∅ = ωb∅ = ωc∅ = 1 2 3 ω ∅ = ω∅ = 1 IP reward: 4. The circles in each iteration show the candidate subsets. reward: 3 Exploration subsets Solution: 1 ω{a. b} {b} {b. The shaded circles and rectangles in iterations 1.8 Figure 4.5 Iteration 2 Candidate subsets (T21 ) Iteration 3 Candidate subsets (T31 ) ∅ {a} {a.5 {b. c} Exploration subsets Solution: 1 1 1 1 ω∅ = ωa∅ = ωb∅ = ωc∅ = 1 2 3 ω ∅ = ω∅ = 1 IP reward: 6. 2 and 3 denote the sets that were updated prior to that iteration.b} = 1 2 3 3 ω∅ = ω∅ = ωe∅ = 1 IP reward: 3.
candidate subset {a. the incremental reward of observation b or c conditioned on the candidate subset {a} is 1. b} and no exploration subset elements). choosing the candidate subset ∅. the integer program selects the three observations for object 1 (i. The exact reward of this conﬁguration is 3.6 illustrates the operation of the algorithm in this scenario: • In iteration l = 0. No observations are chosen for objects 2 and 3. which is substantially closer to the exact reward of the conﬁguration (3. c}. 1 • The incremental reward of observation c conditioned on the new candidate subset {a. c}). but the exact reward of the conﬁguration is reduced to 3 (note that the exact reward of the solution to the integer program at each iteration is not monotonic).. b}.e. candidate subset ∅ and exploration subset element {e}). and the optimal solution to the integer program in iteration is still to select these three observations for object 1. breaking ties arbitrarily). The upper bound to the reward provided by the integer program remains 4.e. in iteration l = 1.. but to do so it is now necessary to select candidate subset {a} ∈ T11 together with the exploration subset elements {b. so we introduce a new candidate subset ˜ A1 = {a. The upper bound to the reward provided by the integer program is now 4. b} is 0..5). c}.134 CHAPTER 4. and no observations for objects 2 and 3. • Now. and no observations for objects 2 and 3. 2 • The optimal solution to the integer program in iteration l = 3 is then the true optimal conﬁguration. 4. and observation e for object 3 (i.5. The optimal solution to the integer program at iteration l = 2 is then to select object 1 candidate subset ∅ and exploration subset elements {b. Once again there are two exploration subset ˜ elements selected. yielding an upper bound for the reward of 6 (the incremental reward of each exploration subset element from the empty set is 2). b. Selection of these exploration subset elements indicates that subsets involving them should be ˜ explored further. and the three exploration subset elements {a. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING {∅}). so we introduce a new candidate subset A1 = {b}. selecting observations a and b for object 1 (i. no observations for object 2.e. hence we create a new candidate subset A1 = {a} (the element 0 with the highest reward increment.5. Again we have two exploration subset elements selected (which is why the reward in the integer program is not the exact reward). Since no more than one exploration subset element is chosen for . The diagram in Fig.
S i ) in every iteration. 4.6.1. 4.3. and that the upper bound is monotonically nonincreasing with iteration. we prove a simple proposition that allows us to represent selection of a feasible observation subset in two diﬀerent ways.1 is an exact value (rather than an upper bound). Theorem 4. That the conﬁguration selects C is immediate from Deﬁnition 4.1 establishes this result. the algorithm knows that it has found the optimal solution and thus terminates. together.3. then Ai ⊆ C ⊆ Ai ∪ BAi ..3 establishes that the reward in the integer program of any selection is an upper bound to the exact reward of that selection. Our goal is to prove that the algorithm terminates in ﬁnite time with an optimal solution. Hence the conﬁguration is feasible. i selects subset C for object i. these two results establish that the collection of subsets from which we may select remains the same (i. and is the unique conﬁguration doing so that has ωAi = 1.1. If C ∈ S i . Several intermediate results are obtained along the way. Proof. Before we commence.Sec. no two resources in C utilize the same resource. assuming that the required resources are not consumed on other objects. Ai ∈ Tl i (it is true for l = 0.4 establishes that the reward in the integer program of the solution obtained upon termination of Algorithm 4.5 Theoretical characteristics We are now ready to prove a number of theoretical characteristics of our algorithm. if a feasible selection variable conﬁguration for object i selects C. Constraint generation approach 135 each object. while Lemma 4. Conversely.1 proves that there is a unique way of selecting each subset in S i in each iteration. u ∈ C\Ai otherwise is feasible (provided that the required resources are not consumed on other objects).Ai = ∅ ∀ l. Lemma 4. From i Algorithm 4.2 proves that no subset that is not in S i can be selected. and .Ai then the conﬁguration: i ω Ai = 1 1. Lemma 4. it is clear that Ai ∩ Bl. Ai ∈ Tl i and Ai ⊆ C ⊆ Ai ∪ Bl.1. This leads directly to the ﬁnal result in Theorem 4.e. and i i ωAi = 1. Since C ∈ S i . i Proposition 4. Lemma 4. that the algorithm terminates with an optimal solution. ωuAi = 0.
the corresponding element is removed from Bl. and it is not the case that l ˚ l l i i ˜ ˜ Ai ⊆ C ⊆ Ai ∪ Bl+1. Hence i the conﬁguration given is the unique conﬁguration selecting for which ωAi = 1. In every iteration l. It will also not be the case that Ai ⊆ C ⊆ i i i Ai ∪ Bl+1.A† . Now suppose that the lemma holds for some l.Ai . and it is not the case for any other ul / ˚l l ˚ i i Ai ∈ Tl i since Bl+1.Ai . for any object i and any subset C ∈ S i there is i i exactly one conﬁguration of selection variables for the ith object (ωAi . Given any C ∈ S i . The following lemma establishes that the updates in Algorithm 4. and it is not the case that A for any other Ai ∈ Tl+1 Al ˜ l+1.Ai i ˚ (it is not the case for Ai since ˚i ∈ Ai ∪ Bl+1. T0i = {∅}. Hence exactly one selection variable conﬁguration selects C. i i ˚ Finally. we will show that it also holds for i (l + 1). l l i i and it is not the case for any other Ai ∈ Tl i since Bl+1.Ai = Bl.Ai = Bl. Lemma 4. since each of these conditions represents a tightening of the ˜ l l l l l l l l original condition. In each case. ωuAi ) that selects observation subset C.Ai for any other Ai ∈ Tl i . If A† = Ai and ˚i ∈ C then Ai ⊆ C ⊆ l l i ⊆ C ⊆ Ai ∪ B i i i ∪ Bi ˜ . it must then hold for every iteration.Ai .Ai and the condition did not hold for iteration l by the induction hypothesis). there exists a unique A† ∈ Tl i such that A† ⊆ C ⊆ A† ∪Bl. (4.Ai for any other Ai ∈ Tl+1 (it is not the case for Ai since Ai C.Ai ).Ai and the condition did not hold for iteration l by the induction hypothesis). (4. By induction. This shows that the lemma holds for (l + 1). Since it is not the l i i ˚ ˚ ˚ ˚ case that Ai ⊆ C ⊆ Ai ∪ Bl. Proof.Ai l+1. Hence exactly one selection variable conﬁguration selects C. hence the only conﬁguration selecting set C is that obtained from Proposition 4. The converse results directly from Deﬁnition 4.136 CHAPTER 4. In the ﬁrst iteration (l = 0). if A† = Ai then A† ∈ Tl+1 and A† ⊆ C ⊆ A† ∪ Bl+1. Ai = A† since Bl+1. it will also not be the case that Ai ⊆ C ⊆ Ai ∪ Bl+1.1 with Ai = ∅.1.A† ˜ ˚ ul by the induction hypothesis and Proposition 4.Ai .16c) and Eq. .1.6.16d).1 maintain a unique way of selecting observation subset in S i in each iteration.Ai . which was already false. i ˚ ˚ ˚ ul / If A† = Ai and ˚i ∈ C then Ai ⊆ C ⊆ Ai ∪ Bl+1.Ai l l ˚ l l ˚ i ˜ ˜ or Ai ⊆ C ⊆ Ai ∪ Bl+1.Ai = Bl. note that all other selection variables for object i must be zero due to the constraints in Eq. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING every time a new candidate subset Ai is created by augmenting an existing candidate i subset with a new element.
e. un }). submodular reward function. 4. (4. there must be at least one resource in S / i that is used twice. . Since D i ∈ S i . Let Ai be the subset l i such that ωAi = 1.16b). .16g) or (4. un } of the elements of ˚ i i Bl.Sec. Assume that / i is of the form of Eq.16h). the reward for selecting subset C for object i in the integer program in iteration l2 is less than or equal to the reward for selecting C in l1 for any l1 < l2 . .16) is an upper bound to the exact reward of selecting that subset in every iteration. (4. The following lemma uses submodularity to show that the reward in the integer program for selecting any subset at any iteration is an upper bound to the exact reward and. This is a key result for proving the optimality of the ﬁnal result when the algorithm terminates. that there is a subset Di that is feasible for object i i (i. In every iteration l.3. The reward associated with selecting observation subset C for object i in the integer program in Eq. .. Proof.2.12). is in S i . Combined with the previous result. Lemma 4. Di . . this establishes that the subsets in S i and the conﬁgurations of the selection variables in the integer program for object i are in a bijective relationship at every iteration.14). C = Ai ∪ {u1 . Furthermore. that the bound tightens as more iterations are performed. there exists a feasible conﬁguration of selection variables with ωAi = 1 and i i C i = {u ∈ Bl. As with the analysis in Chapter 3. Introducing an arbitrary ordering {u1 . . any feasible selection of a subset for object i. ˚ Proof. (4. Suppose.16b). for contradiction. the exact reward for selecting ˚ ˚ l l i l l . yielding a contradiction. but applies to any nondecreasing. then a contradiction will be obtained from Eq. (4. Suppose that subset C is selected for object i in iteration l.Ai for which ωu Ai = 1 (i.Ai = 1}. (4. Lemma 4. . furthermore. Constraint generation approach 137 The following result establishes that only subsets in S i can be generated by the integer program. but that Di ∈ S i . .13) or (4.3. This selection must then be infeasible due to the constraint in D Eq. If S i is of the form of Eq. (4.Ai ωl.. such that Di = Ai ∪ C i ).e. the result is derived in terms of mutual information.
138 observation subset C is:
i rC (a)
CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
i I(X i ; zC ) n
l
i = I(X i ; zAi ) + ˚ j=1 n
i ˚ i i I(X i ; zuj Ai , zu1 , . . . , zuj−1 ) l
(b)
i ≤ I(X i ; zAi ) + ˚
l
i i I(X i ; zuj zAi ) ˚ j=1
l
(c)
n
l
i = rA i + ˚ j=1
i ru
˚i j A l
where (a) is an application of the chain rule, (b) results from submodularity, and (c) i results from the deﬁnition of riAi and conditional mutual information. This establishes the ﬁrst result, that the reward for selecting any observation set in any iteration of the integer program is an upper bound to the exact reward of that set. Now consider the change which occurs between iteration l and iteration (l + 1). ˚ Suppose that the set updated for object i in iteration l, Ai , is the unique set (by l i ∈ C. Assume without loss of i ⊆ C ⊆ Ai ∪ B i ˚ ˚ , and that ˚l u Lemma 4.1) for which Al l ˚ l,Ai generality that the ordering {u1 , . . . , un } from the previous stage was chosen such that un = ˚i . Then the reward for selecting subset C at stage (l + 1) will be: ul
n−1 i rA i + ˜
l l
i (rAi ∪{u ˜ j=1
l
j
i i − rAi ) = I(X i ; zAi ) + ˜ ˜ }
l l
(a)
n−1 i i I(X i ; zuj zAi ) ˜ j=1
l
(b)
i i i = I(X i ; zAi ) + I(X i ; z˚i zAi ) ˚ ˚ u
l l l
n−1
+
j=1 (c)
i i i I(X i ; zuj zAi , zun ) ˚
l
n
l
i ≤ I(X i ; zAi ) + ˚ j=1 i ru j=1 n i rA i ˚ l
i i I(X i ; zuj zAi ) ˚
l
(d)
=
+
˚i j A l
i where (a) and (d) result from the deﬁnition or rA and conditional mutual information, (b) from the chain rule, and (c) from submodularity. The form in (d) is the reward for selecting C at iteration l. i ˚ ˚ If it is not true that Ai ⊆ C ⊆ Ai ∪ Bl,Ai , or if ˚i ∈ C, then the conﬁguration ul / l l ˚
l
Sec. 4.3. Constraint generation approach
139
selecting set C in iteration (l + 1) will be identical to the conﬁguration in iteration l, and the reward will be unchanged. Hence we have found that the reward for selecting subset C at iteration (l + 1) is less than or equal than the reward for selecting C at iteration l. By induction we then obtain the second result. At this point we have established that, at each iteration, we have available to us a unique way of selecting each subset in same collection of subsets, S i ; that in each iteration the reward in the integer program for selecting any subset is an upper bound to the exact reward of that subset; and that the upper bound becomes increasingly tight as iterations proceed. These results directly provide the following corollary. Corollary 4.1. The reward achieved in the integer program in Eq. (4.16) at each iteration is an upper bound to the optimal reward, and is monotonically nonincreasing with iteration. The last result that we need before we prove the ﬁnal outcome is that the reward in the integer program for each object in the terminal iteration of Algorithm 4.1 is the exact reward of the selection chosen for that object. Lemma 4.4. At termination, the reward in the integer program for the subset Di selected for object i is the exact reward of that subset.
i Proof. In accordance with Deﬁnition 4.6, let Ai be the subset such that ωAi = 1, and i let C i = {u ∈ Bl,Ai ωuAi = 1}, so that Di = Ai ∪ C i . Since the algorithm terminated at iteration l, we know that (from line 4 of Algorithm 4.1)
C i  =
u∈Bi
l,Ai
i ωuAi ≤ 1
Hence either no exploration subset element is chosen, or one exploration subset element is chosen. If no exploration subset element is chosen, then Ai = Di , and the reward i obtained by the integer program is simply rAi , the exact value. If an exploration subset element u is chosen (i.e., C i = {u}), then the reward obtained by the integer program is i i i i i i i rAi + ruAi = rAi + (rAi ∪{u} − rAi ) = rAi ∪{u} = rDi which, again, is the exact value. Since the objects are independent, the exact overall reward is the sum of the rewards of each object, which is the objective of the integer program.
140
CHAPTER 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING
We now utilize these outcomes to prove our main result. Theorem 4.1. Algorithm 4.1 terminates in ﬁnite time with an optimal solution. Proof. To establish that the algorithm terminates, note that the number of observation subsets in the collection S i is ﬁnite for each object i. In every iteration, we add a new subset Ai ∈ S i into the collection of candidate subsets Tl i for some object i. If no such subset exists, the algorithm must terminate, hence ﬁnite termination is guaranteed. Since the reward in the integer program for the subset selected for each object is the exact reward for that set (by Lemma 4.4), and the rewards for all other other subsets are upper bounds to the exact rewards (by Lemma 4.3), the solution upon termination is optimal.
4.3.6 Early termination
Note that, while the algorithm is guaranteed to terminate ﬁnitely, the complexity may be combinatorially large. It may be necessary in some situations to evaluate the reward of every observation subset in S i for some objects. At each iteration, the reward obtained by the integer program is an upper bound to the optimal reward, hence if we evaluate the exact reward of the solution obtained at each iteration, we can obtain a guarantee on how far our existing solutions are from optimality, and decide whether to continue processing or terminate with a nearoptimal solution. However, while the reward in the integer program is a monotonically nonincreasing function of iteration number, the exact reward of the subset selected by the integer program in each iteration may increase or decrease as iterations progress. This was observed in the the example in Section 4.3.4: the exact reward of the optimal solution in iteration l = 1 was 3.5, while the exact reward of the optimal solution in iteration l = 2 was 3. It may also be desirable at some stage to terminate the iterative algorithm and select the best observation subset amongst the subsets for which the reward has been evaluated. This can be achieved through a ﬁnal execution of the integer program in Eq. (4.16), adding the following constraint:
i ωuAi ≤ 1 ∀ i ∈ {1, . . . , M } Ai ∈Tli u∈Bi
l,Ai
(4.17)
Since we can select no more than one exploration subset element for each object, the reward given for any feasible selection will be the exact reward (as shown in Lemma 4.4).
Sec. 4.4. Computational experiments
141
The reward that can be obtained by this augmented integer program is a nondecreasing function of iteration number, since the addition of new candidate subsets yields a wider range of subsets that can be selected without using multiple exploration actions.
4.4 Computational experiments
The experiments in the following sections illustrate the utility of our method. We commence in Section 4.4.1 by describing our implementation of the algorithm. Section 4.4.2 examines a scenario involving surveillance of multiple objects using two moving radar platforms. Section 4.4.3 extends this example to incorporate additional nonstationarity (through observation noise which increases when objects are closely spaced), as well as observations consuming diﬀerent numbers of time slots. The scenario discussed in Section 4.4.4 was constructed to demonstrate the increase in performance that is possible due to additional planning when observations all consume a single time slot (so that the guarantees of Chapter 3 apply). Finally, the scenario discussed in Section 4.4.5 was constructed to demonstrate the increase in performance which is possible due to additional planning when observations consume diﬀerent numbers of time slots.
4.4.1 Implementation notes
The implementation utilized in the experiments in this section was written in C++, solving the integer programs using ILOG CPLEX 10.1 through the callable library interface. In each iteration of Algorithm 4.1, we solve the integer program, terminating when we ﬁnd a solution that is guaranteed to be within 98% of optimality, or after 15 seconds of computation time have elapsed. Before commencing the next iteration, we solve the augmented integer program described in Section 4.3.6, again terminating when we ﬁnd a solution that is guaranteed to be within 98% of optimality or after 15 seconds. Termination occurs when the solution of the augmented integer program is guaranteed to be within 95% of optimality,2 or after 300 seconds have passed. The present implementation of the planning algorithm is limited to addressing linear Gaussian models. We emphasize that this is not a limitation of the planning algorithm; the assumption merely simpliﬁes the computations involved in reward function evaluations. Unless otherwise stated, all experiments utilized an open loop feedback control strategy (as discussed in Section 2.2.2). Under this scheme, at each time step a plan was constructed for the next N step planning horizon, the ﬁrst step was executed, the reThe guarantee is obtained by accessing the upper bound found by CPLEX of the integer program in Algorithm 4.1.
2
to the optimal reward
4.7 revolutions of the racetrack pattern in Fig.3. as shown in Fig.j. All times were measured using a 3. 100] × [10. There are k M objects under track.d (4.8).e. or an azimuth and range rate observation. The platforms move along ﬁxed racetrack patterns.7 in this time. The computation times presented in the following sections include only the computation time involved in the solution of the integer program in Algorithm 4.7. y j ) 0 k [xi −y j ]1 k k k + v i.r (4.142 CHAPTER 4.. (2.j. In each time slot. The simulation length is 200 steps.r = k k 0 1 i − y j ] )2 + ([xi − y j ] )2 ([xk k 1 k k 3 i −y j ] [xk k 3 tan−1 b(xi . y j ) 0 [xi −y j ]1 k k k k v i.3 The initial positions of objects are distributed uniformly in the region [10.1. the states of which evolve according to the nominally constant velocity model described in Eq.j.25.03 sec and q = 1. We denote by y i the state (i. and then a new plan was constructed for the following N step horizon. corrupted by additive Gaussian noise with zero mean and standard deviation 1 (in position states) and 0.j.d = [xi −yj ] [xi −yj ] +[xi −yj ] [xi −yj ] + k k k k1 k k2 k k3 k k4 0 1 q ([xi −y j ]1 )2 +([xi −y j ]3 )2 k k k k where z i. 4.j. position and velocity) of platform i at time k. As such.18) z i. 3 . 4.06 GHz Intel Xeon processor.r denotes the azimuth/range observation for object i using sensor j at time k. 4. with ∆t = 0.1 (in velocity states). the initial velocities in each direction are drawn from a Gaussian distribution with mean zero and standard deviation 0. 100].19) z i. The computation time expended in the augmented integer program of Section 4.2 Waveform selection Our ﬁrst example models surveillance of multiple objects by two moving radar platforms. The initial estimates are set to the true state.6 are excluded as the results of these computations are not used as inputs to the following iterations. the augmented integer program could easily be executed on a second processor core in a modern parallel processing architecture. obtaining either an azimuth and range observation. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING sulting observations were incorporated. each sensor may observe one of the M objects. each of which occupies a single time slot: j i −1 [xk −y k ]3 tan b(xi . k The movement of the sensor is accentuated in order to create some degree of nonstationarity in the sensing model. the sensor platforms complete 1.
Sensor paths and example object positions 100 80 60 40 20 0 0 20 40 60 80 100 Figure 4. M objects are positioned randomly within the [10. 100] according to a uniform distribution. as illustrated by the ‘ ’ marks.7. .7 revolutions of the pattern in the 200 time slots in the simulation. The sensor platforms complete 1. 100] × [10. The two radar sensor platforms move along the racetrack patterns shown by the solid lines. the position of the two platforms in the tenth time slot is shown by the ‘*’ marks.
k v i. while an extended Kalman ﬁlter is used for estimation.03 units/sec.. Fig.e. A linearized Kalman ﬁlter is utilized in order to evaluate rewards for the planning algorithm. k k while the second and fourth elements contain the velocity in the xaxis and yaxis respectively. 4. the ﬁrst and third elements of the object state xi and the sensor state y j contain the position in the xaxis and yaxis respectively. The notation [a]l k denotes the lth element of the vector a.8.d denotes the azimuth/range rate (i. although the increase is limited to around 1–3% over that obtained when the planning horizon is limited to a single time slot.j. The diagrams show that there is a small increase in performance as the planning horizon increases. The example also demonstrates the computational complexity of the method. The standard deviation of the range observation (σr ) is 0. 0. when the sensor k k 1 platform heading is perpendicular to the vector from the sensor to the object) to 3 3 endon. Intuitively one would expect a relatively short planning horizon to be adequate. and distributed according to a Gaussian PDF: p(v i.05 units. 0. while the standard deviation of the range rate observation (σd ) is 0. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING and z i. The small increase in performance is not unexpected given that there is little nonstationarity in the problem.e. so we simply use the object position estimate to calculate an estimate of the noise variance. planning is conducted jointly over the two sensors.144 CHAPTER 4.j..8 provides a conﬁrmation of this intuition using planning horizons that are intractable using previous methods. Results for 16 Monte Carlo simulations with planning horizons of between one and 30 time slots per sensor are shown in Fig. The top diagram shows the results for 50 objects.j.j. the computation cost of doing so through brute force enumeration is equivalent to that of planning for a single sensor over two time slots. as shown in the bottom diagram in . y j ) varies from unity on the broadside (i. The observation noise in each observation is independent of all other observations. In practice the tracking accuracy is high enough that the variation in azimuth noise due to the uncertainty in object position is small.d ) = N k v i. the multiplier function b(xi . With a planning horizon of one.r . 4.d . k 2 σφ 0 2 0 σr 2 σφ 0 2 0 σd The standard deviation of the noise on the azimuth observations (σφ ) is 3◦ .j. while the middle diagram shows the results for 80 objects. Doppler) observation.r ) = N k p(v i.
.02 1. giving an indication of the improvement due to additional planning. the sum of the MI reductions in each time step) of a single Monte Carlo simulation for diﬀerent planning horizon lengths divided by the total reward with the planning horizon set to a single time step..01 1 0 10 15 20 25 Horizon length (time slots per sensor) Performance in 16 Monte Carlo simulations of 80 objects 5 30 1. Top diagram shows results for 50 objects. Each trace in the plots shows the total reward (i.Performance in 16 Monte Carlo simulations of 50 objects 1.8. Results of Monte Carlo simulations for planning horizons between one and 30 time slots (in each sensor).03 Relative gain 1.02 1.01 1 0 5 Average time (seconds) 10 10 10 10 10 10 2 1 0 −1 −2 −3 10 15 20 25 Horizon length (time slots per sensor) Average computation time to produce plan 30 50 objects 80 objects 0 5 10 15 20 Horizon length (time slots per sensor) 25 30 Figure 4. Bottom diagram shows the computation complexity (measured through the average number of seconds to produce a plan for the planning horizon) versus the planning horizon length.03 Relative gain 1. while middle diagram shows results for 80 objects.e.
3 State dependent observation noise The second scenario involves a modiﬁcation of the ﬁrst in which observation noise increases when objects become close to each other. velocity magnitudes are drawn from a Gaussian distribution with mean 30 and standard deviation 0. The scenario involves a single sensor rather than two sensors. although we do not model the dependency between objects which generally results. while the velocity directions are distributed uniformly on [0. 4. 2π]. the complexity increases exponentially with the planning horizon length. 4. However. This is a surrogate for the impact of data association. and q = 0. except that there is a statedependent scalar multiplier on the . The diagram shows that. The dynamical model has ∆t = 0. When the planning horizon is signiﬁcantly longer than the number of objects.02 (in position states) and 0. As per the previous scenario. so the number of candidate subsets in S i requiring consideration is vastly lower. using the algorithm it is possible to produce a plan for 50 objects over 20 time slots each in two sensors using little over one second in computation time. The computational complexity for problems involving diﬀerent numbers of objects for a ﬁxed planning horizon length (10 time slots per sensor) is shown in Fig. as the number of objects increases. corrupted by additive Gaussian noise with zero mean and standard deviation 0. Eventually.1 (in velocity states). it becomes necessary to construct plans involving several observations of each object.9. the observation model is essentially the same as the previous case.25. This will generally involve enumeration of an exponentially increasing number of candidate subsets in S i for each object.5. the initial positions of the objects are distributed uniformly on the region [10. Performing the same planning through full enumeration would involve evaluation of the reward of 1080 diﬀerent candidate sequences. a computation which is intractable on any foreseeable computational hardware. 100].146 CHAPTER 4.01 sec. it quickly becomes clear that it is better to spread the available resources evenly across objects rather than taking many observations of a small number of objects. As the number of objects increases. the overhead induced by the additional objects again increases the computational complexity. as expected. 100] × [10. resulting in an increase in computational complexity.8. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING Fig. The initial estimates are set to the true state.4. the simulation runs for 100 time slots. 4.
Computation time vs number of objects (two sensors.06 0.2 0.1 0.04 0. horizon length = 10) 0.14 Average time (seconds) 0.08 0. Computational complexity (measured as the average number of seconds to produce a plan for the 10step planning horizon) for diﬀerent numbers of objects.16 0.9.12 0. .18 0.02 0 10 20 30 40 50 60 Number of objects 70 80 90 Figure 4.
4.1. . . The function d(x1 .j. . and is not subject increased noise when objects become closely spaced. the smallest value such that 90% of the samples evaluate to a lower value.j.02 units.6◦ .e. and on the range rate observation is σd = 0. . In addition to the option of these two observations. 0 ≤ x < 10 δ(x) = 0. i. . x ≥ 10 The state dependent noise is handled in a manner similar to the optimal linear estimator for bilinear systems.015 units/sec. .r k (4. and then use this in a conventional linearized Kalman ﬁlter (for reward evaluations for planning) and extended Kalman ﬁlter (for estimation). . . in which we estimate the variance of the observation noise. This procedure provides a pessimistic estimate of the noise amplitude.21) [xi −y j ]1 k k [xi −y j ]1 [xi −y j ]2 +[xi −y j ]3 [xi −y j ]4 k q k k k k k k k ([xi −y j ]1 )2 +([xi −y j ]3 )2 k k k k The azimuth noise multiplier b(xi . . . The azimuth noise for these observations in the broadside aspect has σφ = 0. and evaluate the function d(x1 ..2. . xM ) captures the increase in observation noise when objects are close together: k k di (x1 .d k (4. y j ) is the same as in Section 4. . as is the azimuth k k noise standard deviation (σφ = 3◦ ).j. xM ) = k k j=i δ ([xi − xj ]1 )2 + ([xi − xj ]3 )2 k k k k where δ(x) is the piecewise linear function: 10 − x. . while the range noise has σr = 0. .j.r = k CHAPTER 4.d k = tan−1 + di (x1 . INDEPENDENT OBJECTS AND INTEGER PROGRAMMING tan−1 [xi −y j ]3 k k [xi −y j ]1 k k y j ]3 )2 k i 1 M + d (xk . We then k k estimate the noise multiplier as being the 90% percentile point of these evaluations. . y j ) 0 k k 0 1 v i. .148 observation noise: z i. . and the range rate noise has σd = 0.075.20) ([xi − k y j ]1 )2 k + ([xi − k [xi −y j ]3 k k z i. The standard deviation of the noise on the range observation is σr = 0. xk ) b(xi . the sensor can also choose a more accurate observation that takes three time slots to complete. We draw a number of samples of the joint state of all objects. . xM ) k k b(xi . xM ) for each. . y j ) 0 k k 0 1 v i. . .
averaged over 20 Monte Carlo simulations.05 1 0 5 10 Horizon length (time slots) 15 20 10 Average time (seconds) 10 10 10 10 10 2 Average computation time to produce plan 1 0 −1 −2 −3 0 5 10 Horizon length (time slots) 15 20 Figure 4.Performance in 20 Monte Carlo simulations of 50 objects 1.15 Relative gain 1. . Error bars show the standard deviation of the mean performance estimate.1 1.10. Lower diagram shows the average time required to produce plan for the diﬀerent length planning horizon lengths. Top diagram shows the total reward for each planning horizon length divided by the total reward for a single step planning horizon.
.e. the controller does not have the option of the three time step observation available to it.4 Example of potential beneﬁt: single time slot observations The third scenario demonstrates the gain in performance which is possible by planning over long horizon length on problems to which the guarantees of Chapter 3 apply (i.11)) with Hi = I. and then executes a single step before replanning for the following N steps. in the problem examined in the previous section.g.e.. When the end of the current N step planning horizon is the end of the scenario (e. The increase is roughly doubled as the planning horizon is increased. one can see k that similar events can commonly occur on a smaller scale in realistic scenarios..4. While this example represents an extreme case. and Ri = I in the ﬁrst 25 objects has covariance Rk k remaining time slots. it is obvious that planning should be helpful in this scenario: half of the objects have signiﬁcantly degraded observability in the second half of the simulation. The observation noise for the k i = 10−6 I in the ﬁrst 25 time slots. The initial position and velocity of the objects is identical to the scenario in Section 4. The object states evolve according to the nominally constant velocity model described in Eq. each observation has a linear Gaussian model (i.10. the 4 × 4 identity matrix). In each time slot.150 CHAPTER 4. when all observations occupy a single time slot). e. Intuitively. 4.4. The computational complexity is comparable with that of the previous case..g.e.. any one of the M objects can be observed. (2. with ∆t = 10−4 sec and q = 1.3.8).1I. the addition of observations consuming multiple time slots does not increase computation time signiﬁcantly. The controller constructs a plan for each N step planning horizon. . The scenario involves M = 50 objects being tracked using a single sensor over 50 time slots. When the planning horizon is less than three time steps. The initial state estimates of the ﬁrst 25 objects are corrupted by Gaussian noise with covariance I (i. Eq. in the eleventh time step when N = 40. observation noise variances frequently increased as objects became close together. allowing the controller to anticipate periods when objects are unobservable. A moderate gain in performance is obtained by extending the planning horizon from one time step to three time steps to enable use of the longer observation. while the estimates of the remaining objects is corrupted by Gaussian noise with covariance 1. (2. The observation noise of the remaining objects has covariance Ri = 10−6 I for all k. 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING The results of the simulation are shown in Fig. and in the ﬁrst time step when N = 50). the entire plan is executed without replanning.
Sec. 4.4. Computational experiments
151
and failure to anticipate this will result in a signiﬁcant performance loss. This intuition is conﬁrmed in the results presented in Fig. 4.11. The top plot shows the increase in relative performance as the planning horizon increases from one time slot to 50 (i.e., the total reward for each planning horizon, divided by the total reward when the planning horizon is unity). Each additional time slot in the planning horizon allows the controller to anticipate the change in observation models sooner, and observe more of the ﬁrst 25 objects before the increase in observation noise covariance occurs. The performance increases monotonically with planning horizon apart from minor variations. These are due to the stopping criteria in our algorithm: rather than waiting for an optimal solution, we terminate when we obtain a solution that is within 95% of the optimal reward. With a planning horizon of 50 steps (spanning the entire simulation length), the total reward is 74% greater than the total reward with a single step planning horizon. Since all observations occupy a single time slot, the performance guarantees of Chapter 3, and the maximum gain possible in any scenario of this type is 100% (i.e., the optimal performance can be no more than twice that of the onestep greedy heuristic). Once again, while this example represents an extreme case, one can see that similar events can commonly occur on a smaller scale in realistic scenarios. The smaller change in observation model characteristics and comparative infrequency of these events results the comparatively modest gains found in Sections 4.4.2 and 4.4.3.
4.4.5 Example of potential beneﬁt: multiple time slot observations
The ﬁnal scenario demonstrates the increase in performance which is possible through long planning horizons when observations occupy diﬀerent numbers of time slots. In such circumstances, algorithms utilizing shortterm planning may make choices that preclude selection of later observations that may be arbitrarily more valuable. The scenario involves M = 50 objects observed using a single sensor. The initial positions and velocities of the objects are the same as in the previous scenario; the initial estimates are corrupted by additive Gaussian noise with zero mean and covariance I. In each time slot, a single object may be observed through either of two linear Gaussian observations (i.e., of the form in Eq. (2.11)). The ﬁrst, which occupies a single time slot, has Hi,1 = I, and Ri,1 = 2I. The second, which occupies ﬁve time k k slots, has Hi,2 = I, and Ri,2 = rk I. The noise variance of the longer observation, rk , k k
Reward relative to one−step planning horizon 1.8
1.6 Relative gain
1.4
1.2
1
0
10
20 30 Horizon length (time slots)
40
50
10 Average time (seconds) 10 10 10 10 10
2
Average computation time to produce plan
1
0
−1
−2
−3
0
10
20 30 Horizon length (time slots)
40
50
Figure 4.11. Upper diagram shows the total reward obtained in the simulation using diﬀerent planning horizon lengths, divided by the total reward when the planning horizon is one. Lower diagram shows the average computation time to produce a plan for the following N steps.
Sec. 4.5. Time invariant rewards
153
varies periodically with time, according to the following values: 10−1 mod (k, 5) = 1 −2 10 mod (k, 5) = 2 −3 rk = 10 mod (k, 5) = 3 −4 10 mod (k, 5) = 4 −5 10 mod (k, 5) = 0 The time index k commences at k = 1. Unless the planning horizon is suﬃciently long to anticipate the availability of the observation with variance 10−5 several time steps later, the algorithm will select an observation with lower reward, which precludes selection of this later more accurate observation. The performance of the algorithm in the scenario is shown in Fig. 4.12. The maximum increase in performance over the greedy heuristic is a factor of 4.7×. While this is an extreme example, it illustrates another occasion when additional planning is highly beneﬁcial: when there are observations that occupy several time slots with time varying rewards. In this circumstance, an algorithm utilizing shortterm planning may make choices that preclude selection of later observations which may be arbitrarily more valuable.
4.5 Time invariant rewards
In many selection problems where the time duration corresponding to the planning horizon is short, the reward associated with observing the same object using the same sensing mode at diﬀerent times within the planning horizon is wellapproximated as being time invariant. In this case, the complexity of the selection problem can be reduced dramatically by replacing the individual resources associated with each time slot with a single resource, with capacity corresponding to the total number of time units available. In this generalization of Sections 4.2 and 4.3, we associate with each resource r ∈ R a capacity C r ∈ R, and deﬁne t(ui , r) ∈ R to be the capacity of resource j r consumed by elemental observation ui ∈ U i . The collection of subsets from which we j may select for object i is changed from Eq. (4.12) to: S i = {A ⊆ U i t(A, r) ≤ C r ∀ r ∈ R} where t(A, r) =
u∈A
(4.22)
t(u, r)
Reward relative to one−step planning horizon 5
4 Relative gain
3
2
1
0
10
20 30 Horizon length (time slots)
40
50
10 Average time (seconds)
−1
Average computation time to produce plan
10
−2
10
−3
0
10
20 30 Horizon length (time slots)
40
50
Figure 4.12. Upper diagram shows the total reward obtained in the simulation using diﬀerent planning horizon lengths, divided by the total reward when the planning horizon is one. Lower diagram shows the average computation time to produce a plan for the following N steps.
as illustrated in the following scenario.24c) (4. ∀ r ∈ R (4.1 Avoiding redundant observation subsets Straightforward implementation of the formulation just described will yield a substantial ineﬃciency due to redundancy in the observation subset S i . η}. we generate for each duration d ∈ {1. . . Suppose we seek to generate a plan for the next N time slots.23a) s.5. M } Ai ∈T i i i i ωuAi − Bl. where there are η total combinations of sensor and mode within each time slot.Ai i i ruAi ωuAi M i t(u. r)ωAi ≤ C r i=1 Ai ∈S i i ωAi = 1 ∀ i ∈ {1. 4.24e) (4.16) is in the form of the resource constraint. Ai ∈ T i i i ωuAi ∈ {0. . For each sensor mode4 ui . . . . . 4. .24b) (4. (4.5.Ai ωAi ≤ 0 u∈Bi i l. .t. 1} ∀ i. N } j 4 We assume that the index j enumerates all possible combinations of sensor and mode. . 1} ∀ i.24f) i ωAi = 1 ∀ i ∈ {1.15) becomes: M max ωi Ai i i rA i ω A i i=1 Ai ∈S i M i t(Ai . (4. Ai ∈ T i . j ∈ {1. . (4.24d) (4. Eq. . r)ωuAi ≤ C r i=1 Ai ∈T i u∈Bi l l. .Ai ∀ i. Algorithm 4.23d) The iterative solution methodology in Section 4. . 1} ∀ i.23b) (4. ∀r∈R (4.24b). (4. .Ai (4. .A i ωAi ∈ {0. ωi i A uA max i i rAi ωAi + i=1 Ai ∈T i l M u∈Bi i t(Ai .Sec.3 may be generalized to this case by replacing Eq. .16) with: M ωi i . r)ωAi + i=1 Ai ∈T i l l. Ai ∈ Tl i The only change from Eq. u ∈ Bl.t.24a) s.1 may be applied without modiﬁcation using this generalized integer program. M } Ai ∈S i i ωAi ∈ {0. Time invariant rewards 155 The full integer programming formulation of Eq. Ai ∈ S i (4.23c) (4.
j ∈ {1. this results in a dramatic decrease in the action space for each object. with U i and t(·. s = s(ui ) j t(ui . j.j ) = j. conﬁrming that the rewards are reasonably well approximated as being timeinvariant. . so that U i = {ui j ∈ which corresponds to applying sensor mode uj j. j). .d 0. Namely. i.d prefer to avoid this redundancy. Fig. d ∈ {1.j with capacity C r = 1. . as well as providing an observation subset for observing object i with sensor mode j . otherwise The additional constraints generated by these resources ensure that (at most) a single element from the set {ui d ∈ {1.01 and reduce the sensor platform motion to 0.e. .. the collection of j. {ri. ..d 0.5. . . .13 shows variation in rewards for 50 objects being observed with a single mode of a single sensor over 50 time slots in one realization of the example.d subsets S i may also contain several subsets of the form {ui l  l dl = d}. M }. Denoting by s(ui ) the sensor associated with ui . INDEPENDENT OBJECTS AND INTEGER PROGRAMMING (i. in addition to containing the single element observation subset {ui }.156 CHAPTER 4. 4.5 We would j. .d experiment demonstrates.2 Computational experiment: waveform selection To demonstrate the reduction in computational complexity that results from this formulation.d {1. .2.e. otherwise We could generate the collection of observation subsets S i from Eq. N }} can be chosen for each (i. and R = {rs s ∈ S}. we set j j d. . each i. .d i for d time slots. ri .22) using this structure. With each sensor s ∈ S (where S is the set of sensors) we s associate a resource rs with capacity C r = N . η}. . the duration for which the sensor mode is applied) an elemental observation ui . N }}. . .4. η}}. 5 For example.j i ∈ {1. We set ∆t = 0. this would introduce a substantial ineﬃciency which is quite avoidable. However. i = i . s) = j. setting: 1. ·) as described above. This can be achieved by introducing one additional resource for j.d each combination of object and sensor mode. . . we apply it to a modiﬁcation of the example presented in Section 4. enabling solution of larger problems. j = j t(ui .01 units/step. (4. each sensor has up to N time slots available for use. . As the following j. . . 4. ensuring instead that the only way observe object i using sensor mode j for a total duration of d time slots is to choose the elemental observation ui alone.
Diagram illustrates the variation of rewards over the 50 time step planning horizon commencing from time step k = 101. the variation between objects. The error bars show the standard deviation of the ratio.1 Ratio of rewards 1. .e. averaged over 50 objects. i.05 1 100 110 120 130 Time step (k) 140 150 160 Figure 4.. The line plots the ratio between the reward of each observation at time step in the planning horizon and the reward of the same observation at the ﬁrst time slot in the planning horizon.Reward at time k / reward at time 101 1.15 Azimuth/range rate observation Azimuth/range observation 1.13.
for 4 and 6 steps.4. The initial position and velocity of the objects is identical to that in Section 4. The scenario involves M = 50 objects being observed using a single sensor over 100 time slots. . 4. The fourdimensional state of each object is static in for d = 10 steps. If the controller chooses to observe an object for d time slots... 1] (the values are known to the controller). We assign time slots to each task in the plan by processing objects in random order.e. This would attract a very large computational burden in the original formulation discussed in the experiments of Section 4. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING The simulations run for 200 time slots. the controller constructs an N step plan. where ρi is randomly drawn. the performance drops to be lower than the performance of the greedy heuristic (i. The observation model is identical to that described in Section 4. Since rewards are assumed time invariant. and for 1. which worsens as the planning horizon increases.4. This is due to the mismatch between the assumed model (that rewards are time invariant) and the true model (that rewards are indeed time varying).14. etc. before constructing a new plan. independently for each object. this is consistent with the open loop feedback control methodology used in the examples in Section 4. using a single step planning horizon). there are also subsets for observing object i with sensor mode j multiple times for the same total duration—e. We then execute the action assigned to the ﬁrst time slot in the plan.g. Beyond this limit. 4.158 CHAPTER 4. the elements of this plan are not directly attached to time slots. we are taking an average of four observations per object.3 Example of potential beneﬁt To demonstrate the beneﬁt that additional planning can provide in problems involving time invariant rewards.4. we construct an experiment that is an extension of the example presented in Section 3. the variance is reduced by a factor of d from that of the same single time slot observation. 4 and 5 steps.5. There is a very small gain in performance as the planning horizon increases from one time slot to around ﬁve time slots.2. With a planning horizon of 40. in the absence of the dynamics process this is equivalent to d single time slot observations. the initial state estimates are corrupted by additive Gaussian noise with covariance (1 + ρi )I. assigning the ﬁrst time slots to the observations assigned to the ﬁrst object in the random order. for 3 and 7 steps.4. from a uniform distribution on [0.4. The computational cost in the lower diagram demonstrates the eﬃciency of this formulation. At each time step.2. The results of the simulations are shown in Fig.
Performance in 17 Monte Carlo simulations of 10 objects 1.99 0. Lower diagram shows the average time required to produce plan for the diﬀerent length planning horizon lengths. Top diagram shows the total reward for each planning horizon length divided by the total reward for a single step planning horizon.98 0. .96 0 5 10 15 20 25 Horizon length (time slots) 30 35 40 10 Average time (seconds) 0 Average computation time to produce plan 10 −1 10 −2 10 −3 0 5 10 15 20 25 Horizon length (time slots) 30 35 40 Figure 4.02 1. averaged over 17 Monte Carlo simulations.97 0.14. Error bars show the standard deviation of the mean performance estimate.01 Relative loss/gain 1 0.
to yield an algorithm that terminates when a solution has been found that is within the desired tolerance of optimality. 4. 4. The results demonstrate that performance increases by 29% as planning increases from a single step to the full simulation length (100 steps).1 0 0 0 0 1.6 Conclusion The development in this chapter provides an eﬃcient method of optimal and nearoptimal solution for a wide range of beam steering problems involving multiple independent objects. there are three diﬀerent linear Gaussian observations available for each object.1 0 0 i P0 = 0 0 1 0 0 0 0 1 In each time slot.3 provides an successive reduction of the upper bound to the reward attainable. The measurement noise covariance of each is Ri. rather than utilizing the ﬁrst observation (which has the highest reward) it should utilize either the second or third observations. which provides a series of solutions for which the rewards are successively improved. which are completely complementary and together provide all of the information found in the ﬁrst. the only measurements available are joint observations of the two (or more) .3.3 = k 0 1 0 0 0 0 0 1 The performance of the algorithm in the scenario is summarized in Fig.1 = k 1 0 0 0 0 1 0 0 . INDEPENDENT OBJECTS AND INTEGER PROGRAMMING time. Each iteration of the algorithm in Section 4. i. Additional planning allows the controller to anticipate that it will be possible to take a second observation of each object.6. the computational complexity appears to be roughly linear with planning horizon. and hence. the algorithm is able to construct a plan for the entire 100 time slots in ﬁve seconds. and the initial distribution is Gaussian with covariance: 1.3.1 = Ri. when objects become closely spaced.15. This may be combined with the augmented problem presented in Section 4.4 demonstrate the computational eﬃciency of the approach. and the gain in performance that can be obtained through use of longer planning horizons.3 = 10−6 I. For longer planning horizons.2 = Ri. k k k while the forward models are: Hi.2 Hk = 1 0 0 0 0 0 1 0 . In practical scenarios it is commonly the case that. The experiments in Section 4. Hi.160 CHAPTER 4.
15 1.1 1. Upper diagram shows the total reward obtained in the simulation using diﬀerent planning horizon lengths.3 Relative gain 1.2 1.25 1.05 1 0 20 40 60 Horizon length (time slots) 80 100 10 Average time (seconds) 10 10 10 10 10 2 Average computation time to produce plan 1 0 −1 −2 −3 0 20 40 60 Horizon length (time slots) 80 100 Figure 4. Lower diagram shows the average computation time to produce a plan for the following N steps.35 1.15. . divided by the total reward when the planning horizon is one.Reward relative to one−step planning horizon 1.
the dependency is commonly represented through association hypotheses (e. 19..1 may be applied to the transformed problem in which each independent group of objects is treated as a single object with state and action space corresponding to the cartesian product of the objects forming the group. rather than observations of the individual objects. [14. INDEPENDENT OBJECTS AND INTEGER PROGRAMMING objects. as in the notion of independent clusters in Multiple Hypothesis Tracking (MHT). This approach may be tractable if the number of objects in each group remains small.162 CHAPTER 4. This inevitably results in statistical dependency between object states in the conditional distribution. . then Algorithm 4. 82]). 58. If the objects can be decomposed into many small independent “groups”. 71.g.
it is typically the case that the energy cost of communications is orders of magnitude greater than the energy cost of local computation [80. Over time the leader node changes dynamically as function of the kinematic state of the object. [37. Consequently. in which the sensing model is assumed to be such that the observation provided by the sensor is highly informative in the region close to the node. and uninformative in regions far from the node. We consider a sensor network consisting a set of sensors (denoted S. and the knowledge used to compute this estimate is summarized by the conditional probability density function (PDF). In this chapter we examine this tradeoﬀ in the context of object tracking in distributed sensor networks. For the purpose of addressing the primary issue.g.e. where S = Ns ). This property.Chapter 5 Sensor management in sensor networks ETWORKS of intelligent sensors have the potential to provide unique capabilities for monitoring wide geographic areas through the intelligent exploitation of local computation (so called innetwork computing) and the judicious use of intersensor communication. has led to a variety of approaches (e. In many sensor networks energy is a dear resource to be conserved so as to prolong the network’s operational lifetime.. the cost of acquiring observations. 64]) which eﬀectively designate the responsibility of computing the conditional PDF to one sensor node (referred to as the leader node) in the network. kinematic state) are inferred largely from sensor observations which are in proximity to the object (e.. and the cost of propagating the conditional PDF through the network. [62]).g. combined with the need to conserve energy.. local fusion of sensor data is suﬃcient for computing an accurate estimate of object state. This leads to an inevitable tradeoﬀ between the uncertainty in the conditional PDF. Tracking moving objects is a common application in which the quantities of interest (i. 81]. trading oﬀ N 163 . Additionally.
The approach developed in Section 5. or minimization of communication cost subject to a constraint on expected estimation performance. which fuses the observations with the prior conditional PDF and tasks sensors at the next time step. which suggests incorporation of sensing costs and estimation performance into a uniﬁed objective without adopting the constrained optimization framework that we utilize. The controller uses a dual problem formulation to adaptively utilize multiple sensors at each time step.1. then the optimal solution would be to collect and fuse the observations provided by all sensors in the network. or to minimize the communication cost subject to a constraint on the estimation . and provides an approximation which extends the Lagrangian relaxation approach to problems involving sequential replanning.10). 5.1 Constrained Dynamic Programming Formulation The sensor network object tracking problem involves an inherent tradeoﬀ between performance and energy expenditure. incorporating a subgradient update step to adapt the dual variable (Section 5. Other related work includes [29].9). SENSOR MANAGEMENT IN SENSOR NETWORKS energy consumption for accuracy. data association) they do not change the basic problem formulation or conclusions. In the development which follows. If the energy consumed by sensing and communication were unconstrained. We consider a scheme in which. which adopts a constrained optimization framework without incorporating estimation performance and sensing cost into a uniﬁed objective. The questions which must be answered by the controller are how to select the subset of sensors at each point in time.164 CHAPTER 5. and [20]. we restrict ourselves to sensor resource planning issues associated with tracking a single object.1. and introducing a heuristic cost to go in the terminal cost to avoid anomalous behavior (Section 5. we provide a framework which can be used to either maximize the information obtained from the selected observations subject to a constraint on the expected communication cost. a structure which results in a major computational saving for our approach.. Our dual problem formulation is closely related to [18]. at each time step.g. While additional complexities certainly arise in the multiobject case (e. a subset of sensors is selected to take an observation and transmit to a sensor referred to as the leader node. One way of incorporating both estimation performance and communication cost into an optimization procedure is to optimize one of the quantities subject to a constraint on the other.1 allows for optimization of estimation performance subject to a constraint on expected communication cost. and how to select the leader node at each point in time.
etc) conditioned on the observations.1. This can be formulated as a constrained Markov Decision Process (MDP). z k−1 . we utilize a particle ﬁlter to maintain this representation (as described in Section 2. . the control choice available at each time is uk = (lk .1.1. The estimation objective that we employ in our formulation is discussed in Section 5. An eﬃcient method of compressing particle representations of PDFs for transmission in sensor networks is studied in [36].1 Estimation objective The estimation objective that we utilize in our formulation is the joint entropy of the object state over the N steps commencing from the current time k: k+N −1 ˇ H(xk .3). In either case.Sec. the tracking problem naturally ﬁts into the Bayesian state estimation formulation.3. z Sk . while our communication cost is discussed in Section 5. In our experiments. and z the previous choice of leader node.3). . xk+N −1 ˇ 0 .6. As discussed in the introduction to this chapter.1. lk−1 ∈ S.4). As discussed in Section 2.1.2. .. . as discussed in Section 2. . or unscented Kalman ﬁlter [39]. These two quantities are utilized diﬀerently in dual formulations. .4). z Sl ˇ 0 . position. Sk ). .1. .3.2). 5.1.e. . the ﬁrst of which optimizes estimation performance subject to a constraint on communication cost (Section 5. . . velocity. and the second of which optimizes communication cost subject to a constraint on estimation performance (Section 5. z k+N −1 ) z k S where z Sl denotes the random variables corresponding to the observations of the sensors l in the set Sl ⊆ S at time l. . . .1. 5. .1. . z l−1 ) l z k S l=k . . . . where lk ∈ S is the leader node at time k and Sk ⊆ S is the subset of sensors activated at time k. z Sk . z k−1 ). Constrained Dynamic Programming Formulation 165 quality. denoted Xk p(xk ˇ 0 .1. extended Kalman ﬁlter (Section 2. minimizing this quantity with respect to the observation selections Sl is equivalent to maximizing the following mutual information expression: k+N −1 l−1 ˇ I(xl . . . although the planning method that we develop is equally applicable to any state estimation method including the Kalman ﬁlter (Section 2. such that the role of the sensor network is to maintain a representation of the conditional PDF of the object state (i. . . we envisage that any practical implementation of particle ﬁlters in sensor networks would use such a scheme. The decision state of the dynamic program is the ˇ combination of conditional PDF of object state. z k−1 . .2.
z Sk . . through the calibration procedure as described in [34]). and that the cost of these communications is known at every sensor node. . Hence we model the cost of communicating between nodes i and j. z l−1 . the selection prior l l l to introduction of the jth element): k+N −1 Sl  l=k j=1 l−1 ˇ I(xl . . . . sj−1 } (i. such that the energy required to communicate the model from node i to node j is Bp Cij . the cost (per bit) of direct communication between two nodes is modelled as being proportional to the square distance between the two sensors: ˜ Cij ∝ y i − y j 2 2 (5. We assume that the complexity of the probabilistic model (i. To do so. .1) as arc lengths: nij Cij = k=1 ˜ Cik−1 ik (5. denoting the jth element by sj ... . Cij . . . . inij } is the shortest path from node i = i0 to node j = inij . . as the length of the shortest path between i and j.2) where {i0 . . 5. the number of bits required for transmission) is ﬁxed at Bp bits. and the ﬁrst (j − 1) elements by by Slj {s1 .166 CHAPTER 5. e.2 Communications We assume that any sensor node can communicate with any other sensor node in the network. The shortest path distances can be calculated using any shortest path algorithm.e.. In our simulations.1) where y s is the location of the sth sensor (which is assumed to be known. in which several sensors relay the message from source to destination. . this expression can be further decomposed using with an additional application of the chain rule. The algorithm we develop could be applied to other measures of estimation performance. Communications between distant nodes can be performed more eﬃciently using a multihop scheme. . (5. in practice this will only be required within a small region around each node. although the objectives which result may not be as natural. . we introduce an arbitrary ordering of the elements of Sl .e. z k−1 . using the distances from Eq. z l l ˇ 0 .1.g. SENSOR MANAGEMENT IN SENSOR NETWORKS Since we are selecting a subset of sensor observations at each time step. . such as deterministic dynamic programming or label correcting methods [9]. z l l ) z k sj S Sj Our formulation requires this additivity of estimation objective. This value will depend on the estimation scheme used in .
lk−1 . uk ) = −I(xk . without changing the structure of the solution.1. .Sec. .1. adaptively adjusting the weight .1. terminated by Jk+N (Xk+N . the expected number of retransmissions required. z k−1 ) k z Sk  (5. 5.5) The ﬁrst term of this communication cost captures the cost of transmitting the representation of the conditional PDF from the previous leader node to the new leader node. li−1 . λ) = −λM . As discussed in Section 5. lk+N −1 . k + N − 1}. The number of bits in an observation is denoted as Bm . and deﬁne the perstage cost: ˇ g(Xk . . .4) We choose the perstage constraint contribution G(Xk . λ) = min g (Xi . z kk ˇ 0 .3) Sj =− j=1 ˇ z I(xk . taking the measurement. z k k ) sj (5. z k−1 .6) D for time indices i ∈ {k. . (2. Substituting the perstage cost and constraint function into Eq. etc.1. . .ui E D Ji+1 (Xi+1 . we utilize mutual information as our estimation objective. lk−1 .53). . z Sk ˇ 0 . . λ) + ¯ ui Xi+1 Xi . uk ) to be such that the the expected communication cost over the next N time steps is constrained: G(Xk . li . ui . These costs may be amended to incorporate the cost of activating the sensor. li−1 . if a particle ﬁlter is utilized then the quantity may adapt as the form of the probabilistic model varies. . Constrained Dynamic Programming Formulation 167 a given implementation. the unconstrained optimization in the Lagrangian (for a particular value of the Lagrange multiplier λ) can be solved conceptually using the recursive dynamic programming equation: JiD (Xi . . 5. . uk ) = Bp Clk−1 lk + j∈Sk Bm Clk j (5. λ) (5. as detailed in [36]. while the second captures the cost of transmitting observations from each active sensor to the new leader node.1.3 Constrained communication formulation Our ﬁrst formulation optimizes estimation performance subject to a constraint on communication cost. The conditional PDF at the next time Xi+1 is calculated using the recursive Bayes update described in Section 2. such that the energy required to transmit an observation from node i to node j is Bm Cij . The augmented perstage cost combines the information gain and communication cost in a single quantity. lk−1 .
10) z ˇ from which we set M = Hmax − H(xk .1. At time step k. z kk ˇ 0 . . xk+N −1 ˇ 0 . . . there is also a closelyrelated formulation which optimizes the communication energy subject to a constraint on the entropy of probabilistic model of object state. .7) This incorporation of the constraint terms into the perstage cost is a key step. . . . . .1. . z Sk . z k−1 . .11) In our implementation. . The cost per stage is set to the communication cost expended by the control decision: g(Xk . . . (2. lk−1 . . z l−1 . . . xk+N −1 ˇ 0 .7 and 5. SENSOR MANAGEMENT IN SENSOR NETWORKS applied to each using a Lagrange multiplier: Sk  sj Sj ˇ I(xk . . . . lk−1 . . z l l ) k i=k j=1 ˇ ≤ Hmax − H(xk .72). 5. xk+N −1 ˇ 0 . . . z Sk .1 and z Sk  G(Xk . lk−1 . . . λ) = − ¯ j=1 + λ Bp Clk−1 lk + (5. uk ) = − j=1 1 ˇ I(xk . .168 CHAPTER 5. z k k ) z Bm Clk j j∈Sk g (Xk . . . z k−1 . . uk ) = Bp Clk−1 lk + j∈Sk Bm Clk j (5. . z kk ˇ 0 . we obtain k+N −1 Si  j j S s Sl−1 ˇ −E z I(xi .4 Constrained entropy formulation The formulation above provides a means of optimizing the information obtained subject to a constraint on the communication energy expended. z k+N −1 )} ≤ Hmax z k S (5.1. z k−1 . uk . . . . . . which allows the greedy approximation described in Sections 5. . Xk . . z k−1 ). . .9) Manipulating this expression using Eq. . . z k−1 . . . . z k−1 ) (5. H(xk . . we construct a new control policy at each time step by applying the approximate dynamic programming method described in the following section commencing from the current ˇ probabilistic model. . . . z k k ) z sj Sj (5. hence the dependence on Xk is immaterial.8 to capture the tradeoﬀ between estimation quality and communication cost. . . . . .8) We commence by formulating a constraint function on the joint entropy of the state of the object over each time in the planning horizon: k+N −1 ˇ E{H(xk . z l l ˇ 0 . z k−1 ) is a known constant (reprez senting the uncertainty prior to receiving any observations in the present planning horizon). xk+N −1 ˇ 0 .
4. Conceptually. we draw a ˇ set of Np samples of the set of observations z Sk from the distribution p(z Sk ˇ 0 . (5. except that the Lagrange multiplier is on the mutual information term. such that the computational complexity increases as O(Ns N 2Ns N Np N ) as the tree depth N (i. corresponding all possible selections of leader node and subsets of sensors to activate. and evaluate the cost to go one step later Jk+1 ing to the decision state resulting from each set of observations. The following sections describe a series of approximations which are applied to obtain a practical implementation. λ) = Bp Clk−1 lk + ¯ j∈Sk Bm Clk j − λ j=1 ˇ I(xk .7). Ns 2Ns Np new leaves (samples) are drawn. Constrained Dynamic Programming Formulation 169 Following the same procedure as described previously. There are Ns 2Ns possible controls at each time step. the elements of the information constraint in Eq.1. the dynamic program of Eq. z k k z Sj S 1:j−1 ) (5.6 Linearized Gaussian approximation In Section 2.10) can be integrated into the perstage cost. .3. z k−1 . The evaluation of each cost to go one step later will yield the same branching. 5.. Since the covariance of a Kalman ﬁlter is independent of observation values (as seen in Section 2. The complexity of the simulation process is ¯D formidable: to evaluate Jk (Xk . z k−1 ) k z k ¯D (Xk+1 . . not of the state estimate. 5. A tree structure develops. . lk−1 .5 Evaluation through Monte Carlo simulation The constrained dynamic program described above has an inﬁnite state space (the space of probability distributions over object state). . (5. λ) correspondderived from Xk . . as illustrated in Fig. hence we seek to exploit additional structure in the problem to ﬁnd a computable approximate solution.1.1. 5.6) could be approximated by simulating sequences of observations for each possible sequence of controls. Such an approach quickly becomes intractable even for a small number of sensors (Ns ) and simulated observation samples (Np ). λ) for a given decision state and control. this . lk−1 . z k k ˇ 0 .Sec. . lk . we showed that the mutual information of a linearGaussian observation of a quantity whose prior distribution is also Gaussian is a function only of the prior covariance and observation model.2).12) 5.e. . hence it cannot be evaluated exactly. . uk . resulting in a formulation which is identical to Eq.1. where for each previous leaf of the tree.1. rather than the communication cost terms: Sk  g (Xk . (5. the planning horizon) increases.
its applicability to this problem is not immediately clear. i i−1 i ∈ {k + 1.2. (5.1: x0 = µk k x0 = Fx0 . .44)). µk . While the linear . Pk ). we ﬁt a Gaussian distribution to the a priori PDF (e.5 is reduced to O(Ns N 2Ns N ) with the horizon length N . We then calculate the nominal trajectory by calculating the mean at each of the following N steps. rather than control policies.. as the observation model of interest generally nonlinear (such as the model discussed in Section 5.14) Subsequently.1.170 CHAPTER 5. it was previously applied to a sensor scheduling problem in [21]. A signiﬁcant horizon length is required in order to provide an eﬀective tradeoﬀ between communication cost and inference quality.1. rather than O(Ns N 2Ns N Np N ). (2. However.6). then such a linearization approximation may provide adequate ﬁdelity for planning of future actions (this approximation is not utilized for inference: in our experiments. and then a new plan for the following N steps is generated.g. It is wellknown that this result implies that open loop policies are optimal: we just need to search for control values for each time step. having relinearized after incorporating the newly received observations.1. To obtain the linearization. the strength of the dynamics noise is relatively low. in the linear Gaussian case. the growth of the tree discussed in Section 5. The controller which results has a structure similar to the open loop feedback controller (Section 2.3). since many time steps are required for the longterm communication cost saved and information gained from a leader node change to outweigh the immediate communication cost incurred. the SIS algorithm of Section 2. and the planning horizon length is relatively short (such that deviation from the nominal trajectory is small). If the initial uncertainty is relatively low. referred to as i the linearized Kalman ﬁlter. In the case of the stationary linear dynamics model discussed in Section 2.3. suppose that the resulting distribution is N (xk .19) where the linearization point at time i is x0 . While this is a useful result. . (2. the ﬁrst step of the plan executed. . .4 is used with the nonlinear observation function to maintain the probabilistic model). future rewards depend only on the control values: they are invariant to the observation values that result. the observation model is approximated using Eq. in the recursion of Eq. let us suppose that the observation model can be approximated by linearizing about a nominal state trajectory. k + N − 1} (5. using Eq. This is a wellknown approximation. SENSOR MANAGEMENT IN SENSOR NETWORKS result implies that.13) (5.1.2): at each stage a plan for the next N time steps is generated. it is discussed further in Section 2. Accordingly.
z i−1 . li . and the substage cost g (Xi .15) ¯ for i ∈ {k. 5. . 5. λ) is ¯ i ˇ g (Xi .6). si . λ) = min{λBp Cli−1 li + Ji0 (Xi . Following the application of the linearized Gaussian approximation (Section 5. {∅}. we decompose each decision stage into a number of substages and apply heuristic approximations in a carefully chosen way. λ).. li .2. the control choices are to terminate (i. λ) = λBm Cli si − I(xi .1. at each substage (indexed by i ). Sii . . The following two sections describe two tree pruning approximations we introduce to obtain a tractable implementation.Si ¯ Ji+1 (Xi+1 . z i i ) ¯ z i i si Si (5. 5.4. λ) = −λM .1. λ) + Jii +1 (Xi . This is illustrated in Fig. . Sii . we can break these two phases apart. .. (5. the branches labelled ‘T ’ represent the decision to terminate with the currently selected subset. li . k + N − 1}.1. λ)} g i i i ∈S\Si Sii is the set of sensors chosen in stage i prior to substage i . 5.1 will be reduced to the structure shown in Fig. . . li . Each stage of control branching involves selection of a leader node. growing as O(Ns N 2Ns N ).3.e. the bottom of the computation tree) commencing from the beginning of time slot . where ¯ Jii (Xi .16) si i ¯ min {¯(Xi . li . For the communication constrained formulation. z i i ˇ 0 . terminated by setting JN (XN .17) ¯ The cost to go Ji (Xi . li−1 . and a subset of sensors to activate. λ) represents the expected cost to the end of the problem (i. lN −1 . the DP recursion becomes: ¯ ¯ Ji (Xi . . as illustrated in Fig. λ) = min E i Xi+1 Xi . Finally. λ)} li ∈S (5. the complexity is still exponential in both time and the number of sensors. move on to portion of the tree corresponding to the following time slot) with the current set of selections. Sii . li . si . or to select an additional sensor. the branching of the computation tree of Fig.e. 5. Sii ∪ {si }. li . 5.Sec. one can decompose the choice of which subset of sensors to activate (given a choice of leader node) into a generalized stopping problem [9] in which. . li−1 .7 Greedy sensor subset selection To avoid the combinatorial complexity associated with optimization over subsets of sensors. Constrained Dynamic Programming Formulation 171 Gaussian approximation eliminates the O(Np N ) factor in the growth of computational complexity with planning horizon length. Sii . si .
Figure 5. a tail subproblem is required to be evaluated each new control. . zkk u1 . . choices for leader node & active subset at time k choices for leader node & active subset at time k+1 Figure 5. 3 u2 k+1 uk+1 . .1 u2 u3 k k . .1.2 zkk u1 . . . . .2.1. . . Computation tree after applying the linearized Gaussian approximation of Section 5. . At each stage. . .3 u1 k+1 . .u1 k zkk u1 . . and a set of simulated values of the resulting observations. . . . . . .6. . Tree structure for evaluation of the dynamic program through simulation.
leader node at time k active subset at time k leader node at time k+1 Figure 5.2. Computation tree equivalent to Fig. resulting from decomposition of control choices into distinct stages. 5. . selecting leader node for each stage and then selecting the subset of sensors to activate.3.
Computation tree equivalent to Fig.leader node at time k first active sensor at time k T T second active sensor at time k leader node at time k+1 Figure 5. 5. 5.4.2 and Fig. . resulting from further decomposing sensor subset selection problem into a generalized stopping problem. or to add an additional sensor. in which each substage allows one to terminate and move onto the next time slot with the current set of selected sensors.3.
one can show that. 5. The function Jii (Xi .e. λ) ≤ ¯ i i 0 (i. Sii . represents the cost to go from substage i of stage i to the end of the problem.e. Sii . move on to the next time slot) with the currently selected subset of sensors. λ).. li . hence it can be excluded from consideration. the expected cost to go to the bottom of the tree commencing from a partial selection of which sensors to activate at time i. While this formulation is algebraically equivalent to the original problem. (5.4 where branching occurs over choices of leader ¯ node for that time slot). reducing computational complexity when dealing with large networks. The ﬁrst choice in the outer minimization in Eq. then that sensor will not be selected in any later substages of stage i (as the cost cannot decrease as we add more sensors).. li . li .1. while the second represents the choices of additional sensors to select. s. i. λ) ∀ i < i ¯ ¯ (5. λ) ≤ g (Xi .. Sii . in which. The fact that the constraint terms i of the Lagrangian were distributed into the perstage and persubstage cost allows the greedy approximation to be used in a way which trades oﬀ estimation quality and communication cost. .Sec. if there is no sensor si for which the substage cost g (Xi . s.18) Using this result. if at any substage of stage i we ﬁnd that the substage cost of adding a particular sensor is greater then zero (so that the augmented cost of activating the sensor is higher than the augmented cost of terminating). it is in a form which is more suited to approximation. for which the cost of transmitting the observation is not outweighed by the expected information it will provide). Constrained Dynamic Programming Formulation 175 i (i. the position of the tree in Fig. at each stage.e.16) represents the choice to terminate (i. the substages which form a generalized stopping problem may be performed using a greedy method. Namely. (5.17) since the ﬁrst term is constant with respect to Sii and the second is submodular: (assuming that observations are independent conditioned on the state) g (Xi .e. In practice this will limit the sensors requiring consideration to those in a small neighborhood around the current leader node and object. si . otherwise the sensor si with the lowest substage cost is added. for the substage cost in Eq. then we progress to the next stage. One quite general simpliﬁcation can be made: assuming that sensor measurements are independent conditioned on the state. 5. li .. careful analysis of the sensor model can yield substantial practical reductions. Sii . 2 While worstcase complexity of this algorithm is O(Ns ).
. . . . . Tree structure for nscan pruning algorithm with n = 1. .1 lk 2 lk 3 lk G 1 lk+1 2 3 lk+1 lk+1 G G G 1 lk+2 2 lk+2 3 lk+2 G G . .5. . and the remaining sequences are extended using greedy sensor subset selection (marked with ‘G’). all but the best sequence ending with each leader node is discarded (marked with ‘×’). Figure 5. Subsequently. . At each stage new leaves are generated extending each remaining sequence with using each new leader node.
and the overall worst case computational complexity is O(N Ns 3 ) (in practice.1.1.6 × 1090 to (at worst case) 8 × 105 . Setting n = 1.1.Sec. the complexity is reduced from 1. 5. a planning horizon of N = 10 and simulating Np = 50 values of observations at each stage. incurring a diﬀerent communication cost). which is not Markovian with respect to the leader node. has a computation complexity of the order O(Ns N 2Ns N Np N ). Because the communication cost structure is Markovian with respect to the leader node (i. as discussed in Section 5. at each stage. is approximated using the greedy method. Thus. the one with the lowest cost value is kept. 2 . The diﬀerence in complexity is striking: even for a problem with Ns = 20 sensors. The information reward structure. 5. be limited to sensors close to the object. the communication cost of a particular future control trajectory is unaﬀected by the control history given the current leader node). the algorithm is illustrated in Fig. We commence by considering each possible choice of leader node2 for the next time step and calculating the greedy sensor subset selection from Section 5. Then. we keep some approximation of the best control trajectory which ends with each sensor as leader node. for each leaf node. in practice. similar to the sensor subset selection. it is captured perfectly by this model. we consider the candidate leader nodes at the following time step. All sequences ending with the same candidate leader node are compared. and the complexity will be substantially lower). Constrained Dynamic Programming Formulation 177 5.. which incorporates costs over multiple time stages. the tree width is constrained to the number of sensors.1. 5.1. Substituting the resulting The set of candidate leader nodes would.e.5. at each stage we only consider candidate sensors in some neighborhood of the estimated object location. which is commonly used to control computational complexity in the Multiple Hypothesis Tracker [58]. This compares to the simulationbased evaluation of the full dynamic programming recursion which. Using such an algorithm.7 for each leader node choice (the decisions made in each of these branches will diﬀer since the sensors must transmit their observations to a diﬀerent leader node.9 Sequential subgradient update The previous two sections provide an eﬃcient algorithm for generating a plan for the next N steps given a particular value of the dual variable λ.5. and the other sequences are discarded.8 nScan pruning The algorithm described above is embedded within a slightly less coarse approximation for leader node selection. This approximation operates similarly to the nscan pruning algorithm.
E[ k i G(Xi . l . li−1 . µi (Xi .178 CHAPTER 5. which is undesirable since each evaluation incurs a substantial computational cost. hence the slow adaptation of the dual variable provided by a single subgradient step in each iteration may provide an adequate approximation. µi (Xi . as opposed to [18]. E[ k i G(Xi . li−1 ))] > M λk+1 = (5. µi (Xi .e. hence the argument of the expectation of E[ i G(Xi . in practice the level of the constraint (i. li−1 ))] >M ≤M (5. λmin }. The dual variables may be initialized using several subgradient iterations or some form of line search when the algorithm is ﬁrst executed in order to commence with a value in the right range. (2.19) max{λ − γ − . and λmax and λmin are the maximum and minimum values of the dual variable. λmin }. li−1 )) − M ] is deterministic). λmax }. li−1 ))] i G(Xi . SENSOR MANAGEMENT IN SENSOR NETWORKS plan into Eq. µi (Xi . In the experiments which follow. µ (X . at each time step we plan using a single value of the dual variable. The rolling horizon formulation necessitates reoptimization of the dual variable at every time step.20) where γ + and γ − are the increment and decrement sizes. µi (Xi . It is necessary to limit the values of the dual variable since the constrained problem may not be feasible. 3 . eﬀectively implying that communications are costfree. and then update it for the next time step utilizing either an additive update: min{λ + γ + . li−1 . li−1 . the value of E[ i G(Xi . l ))] ≤ M k i i i−1 i i i−1 or a multiplicative update: min{λ β + .56) yields a subgradient which can be used to update the dual variables (under the linear Gaussian approximation. undesirable behavior can result such as utilizing every sensor in a sensor network in order to meet an information constraint which cannot be met in any case. li−1 . λmax }. A full subgradient implementation would require evaluation for many diﬀerent values of the dual variable each time replanning is performed. li−1 )) − M ]) will vary little between time steps.. β + and β − are the increment and decrement factors. E[ k λk+1 = max{λ /β − . If the variable is not constrained. E[ G(X . feedback policies correspond to open loop plans. or because the dual variable in the communication constraint was adapted such that it became too low. li−1 .3 Since the planning is over many time steps.
21) where the Lagrange multiplier λ is included only in the communicationconstrained case. However.10) to generate plans for a given value of the dual variable λ using the approximations. Denoting by µ(Xk ) ∈ S the policy which selects as leader node the sensor with the ˜ smallest expected distance to the object.1. (5.9 using the desired constraint. we can use the constraint in Eq. This eﬀectively acts as the cost of the base policy in a rollout [9]. and then perform the dual variable update of Section 5. The minimum entropy constraint is such an example: E min ˇ H(xi ˇ 0 . as required by the approximations in Sections 5. and take progressively fewer observations. the terminal cost is: Jk+N (Xk+N .22) does not have an additive decomposition (cf Eq.6)) the cost of transmitting the probabilistic model to the sensor with the smallest expected distance to the object at the ﬁnal stage.1.1. this modiﬁcation can make the problem infeasible for short planning horizons.10 Rollout If the horizon length is set to be too small in the communications constrained formulation. (5. . lk+N −1 ) = λBp Clk+N −1 µ(Xk+N ) ˜ (5. allowing us to use the computationally convenient method described above with a more meaningful criterion. but the limiting of the dual variables discussed in Section 5.22) i∈{k. at the ﬁnal stage.7 and 5. we use a rollout approach (a commonly used suboptimal control methodology). in which we add to the terminal cost in the DP recursion (Eq. the leader node will have to be transferred to the closest sensor. z i−1 ) − Hmax z ≤0 (5. ...1. .. . . Eq. (5.1. To prevent this degenerate behavior. (5. (5. The resulting algorithm constructs a plan which assumes that. (5. In the communicationconstrained case. then the resulting solution will be to hold the leader node ﬁxed.10) as a surrogate for the desired constraint in Eq.9 can avoid anomalous behavior. 5.Sec.8. 5. hence there is no beneﬁt in holding it at its existing location indeﬁnitely.1..11 Surrogate constraints A form of information constraint which is often more desirable is one which captures the notion that it is acceptable for the uncertainty in object state to increase for short periods of time if informative observations are likely to become available later. (5. Constrained Dynamic Programming Formulation 179 5.k+N −1} The constraint in Eq.10)).1. This simple approximation eﬀectively uses the additive constraint in Eq.22).22).
We require the sensor management algorithm to provide predictions of the communications performed by each sensor at each time in the future. In order to compare the performance of the algorithm developed in Section 5. This involves a tradeoﬀ between two diﬀerent forms of communication: the large. rather than jointly optimizing sensor management and leader node selection.23) . querying and transmission of obervations) at time k if the leader node is lk as gc (Xk .180 CHAPTER 5. lk−1 . ˇ Xk p(xk ˇ 0 . such as by limiting the maximum number of sensors utilized.1 with these methods. . and the previous leader node. The approach is fundamentally diﬀerent from that in Section 5. li−1 ) = min gc (Xi . As in Section 5. uk = lk ∈ S.li E Ji+1 (Xi+1 . seeks to dynamically select the leader node to minimize the communications energy consumed due to activation. conditioned on a particular sensor management strategy (that is insensitive to the choice of leader node). we develop an approach which. deactivating and querying sensors.2 Decoupled Leader Node Selection Most of the sensor management strategies proposed for object localization in existing literature seek to optimize the estimation performance of the system. li ) + Bp Cli−1 li + li ∈S Xi+1 Xi .2. deactivation and querying of sensors by the leader node. li ) (5. 5. These methods typically do not consider the leader node selection problem directly.1. infrequent step increments produced when the probability distribution is transferred from sensor to sensor during leader node handoﬀ. SENSOR MANAGEMENT IN SENSOR NETWORKS 5. the dynamic program for selecting the leader node at time k can be written as the following recursive equation: Ji (Xi . the problem corresponds to a dynamic program in which the decision state at time k is the combination of the conditional PDF of object state. .1 Formulation The objective which we seek to minimize is the expected communications cost over an N step rolling horizon. The control which we z may choose is the leader node at each time.1 as we are optimizing the leader node selection conditioned on a ﬁxed sensor management strategy. Denoting the expected cost of communications expended by the sensor management algorithm (due to sensor activation and deactivation. . incorporating communication cost indirectly. z k−1 ). although the communication cost consumed in implementing them will vary depending on the leader node since communications costs are dependent on the transmission distance. and the small. and transmission of observations from sensors to the leader node. frequent increments produced by activating. . lk ).
. As described in Section 2. the state evolves according to the nominally constant velocity model described in Eq. and y s is the location of the sth sensor.3 Simulation results As an example of the employment of our algorithm. Simulation results 181 for i ∈ {k. s) = (5.g. each trial used a diﬀerent sensor layout and object trajectory. s) + vk (5. we use a = 2000 and b = 100. independent of wk ∀ k and j of vk . j = s ∀ k. . with ∆t = 0.25) s s where vk ∼ N {vk . In the same way as discussed in Section 5. The observation function h(·.1. we set the terminal cost to the cost of transmitting the probabilistic model from the current leader node to the node with the smallest expected distance to the object. µ(Xk+N ): ˜ Jk+N (Xk+N .26) Lxk − y s 2 + b 2 where L is the matrix which extracts the position of the object from the object state (such that Lxk is the location of the object). (2..24) In Section 5. The constants a and b can be tuned to model the signaltonoise ratio of the sensor. s) + Hs (x0 )(xk − x0 ) + v s k Hs (x0 ) = x h(x. The state of the object is position and velocity in two dimensions (xk = [px vx py vy ]T ). 1} is a white Gaussian noise process. 0. we simulate a scenario involving an object moving through a network of sensors. The simulation involves Ns = 20 sensors positioned randomly according to a uniform distribution inside a 100×100 unit region.1. e. s)x=x0 .10. resulting from measuring the intensity of an acoustic emission of known amplitude: a h(xk . the measurement function h(·. . and the rate at which the signaltonoise ratio decreases as distance increases. Denoting the measurement taken by sensor s ∈ S = {1 : Ns } (where s Ns is the number of sensors) at time k as zk .Sec. s) can be approximated as a ﬁrstorder Taylor series truncation in a small vicinity around a nominal point x0 : s zk ≈ h(x0 . 5. and then secondly the two most informative observations. . lk+N −1 ) = Bp Clk+N −1 µ(Xk+N ) ˜ (5. s) is a quasirange measurement.8). a nonlinear observation model is assumed: s s zk = h(xk .3 we apply this method using a single lookahead step (N = 1) with a greedy sensor management strategy selecting ﬁrstly the most informative observation.3.3. The information provided by the observation reduces as the range increases due to the nonlinearity. k + N − 1}. 5.25 and q = 10−2 .
2. so that the cost of transmitting the probabilistic model is 64× the cost of transmitting an observation. since the average simulation length is 180 steps. SENSOR MANAGEMENT IN SENSOR NETWORKS −2a (Lx0 − y s )T L (Lx0 − y s 2 + b)2 2 (5.6.182 where: Hs (x0 ) = CHAPTER 5.1.1. with γ + = 50. For the informationconstrained problem. For the communicationconstrained problem. and tempering spending of communication resources sooner. with β + = β − = 1. the particle ﬁlter described in Section 2. an additive update was used for the subgradient method.6 also illustrates the improvement which results from utilizing a longer planning horizon. which at each time activate and select as leader node the sensor with the observation producing the largest expected reduction in entropy. moving into the region. 5. A longer planning horizon reduces this undesirable behavior by anticipating upcoming leader node handoﬀ events earlier.27) This approximation is used for planning as discussed in Section 5. The model was simulated for 100 Monte Carlo trials. However. λmax = 5×10−3 . λmin = 10−5 . and the initial velocity is 2 units per second in each dimension. 5. λmin = 10−8 .4 is used for inference. the average communication cost if the constraint were always met with equality would be 1800. γ − = 250. the practical behavior of the system is to reduce the dual variable when there is no handoﬀ in the planning horizon (allowing more sensor observations to be utilized). and Cmax = 10N where N is the planning horizon length.6. The top diagram in Fig. The top diagram demonstrates that the communicationconstrained formulation provides a way of controlling sensor selection and leader node which reduces the communication cost and improves estimation performance substantially over the myopic singlesensor methods. The simulation ends when the object leaves the 100×100 region or after 200 time steps. because this cost tends to occur in bursts (due to the irregular handoﬀ of leader node from sensor to sensor as the object moves). This is . The initial position of the object is in one corner of the region. The simulation results are summarized in Fig. The informationconstrained formulation allows for an additional saving in communication cost while meeting an estimation criterion wherever possible. The constraint level in the communicationconstrained case is 10 cost units per time step. The communication costs were Bp = 64 and Bm = 1. and increase it when there is a handoﬀ in the planning horizon (to come closer to meeting the constraint). a multiplicative update was used for the subgradient method. which ever occurs sooner (the average length is around 180 steps). λmax = 500 and Hmax = 2 (these parameters were determined experimentally).
e. Position entropy and communication cost for dynamic programming method with communication constraint (DP CC) and information constraint (DP IC) with diﬀerent planning horizon lengths (N ).8 x 10 2 4 Accrued communication cost Position entropy vs communication cost Average minimum entropy 3 2. . Upper ﬁgure compares average position entropy to communication cost. while lower ﬁgure compares average of the minimum entropy over blocks of the same length as the planning horizon (i.5 0.6 0.6 1.5 1 0.5 1 0. providing an indication of the variability across simulations. the quantity to which the constraint is applied) to communication cost.4 1.5 2 1..4 0.8 1 1.4 0.6 1.8 x 10 2 4 Accrued communication cost Figure 5. compared to the methods selecting as leader node and activating the sensor with the largest mutual information (greedy MI). and the sensor with the smallest expected square distance to the object (min expect dist).2 1.Position entropy vs communication cost 3 Average position entropy 2.5 2 1. ellipses illustrate covariance.8 1 1.6.2 0.5 0.6 0.2 1.2 DP CC N=5 DP CC N=10 DP CC N=25 DP IC N=5 DP IC N=10 DP IC N=25 Greedy MI Min Expect Dist 0. Ellipse centers show the mean in each axis over 100 Monte Carlo runs.4 1.
The results demonstrate that for this case the decoupled method using a single sensor at each time step results in similar estimation performance and communication cost to the Lagrangian relaxation method using an information constraint with the given level. The planning algorithm may be applied alongside a wide range of estimation methods. The introduction of secondary objectives as constraints provides a natural methodology to address the tradeoﬀ between estimation performance and communication cost.7. The additional ﬂexibility of the Lagrangian relaxation method allows one to select the constraint level to achieve various points on the estimation performance/communication cost tradeoﬀ. The linearized Gaussian approximation in Section 5. 5. since it requires the minimum entropy within the planning horizon to be less than a given value.4 Conclusion and future work This paper has demonstrated how an adaptive Lagrangian relaxation can be utilized for sensor management in an energyconstrained sensor network. SENSOR MANAGEMENT IN SENSOR NETWORKS demonstrated in Fig. which adaptively selects the leader node to minimize the expected communication cost expended in implementing the decision of the ﬁxed sensor management method.6 results in a structure identical to the OLFC. 5. 5.8 compares the adaptive Lagrangian relaxation method discussed in Section 5. The remainder of our algorithm (removing the linearized Gaussian approximation) may be applied to ﬁnd .1.2. Accordingly. using a longer planning horizon. which shows the adaptation of the dual variable for a single Monte Carlo run. the average minimum entropy is reduced. demonstrating that the information constraint is met more often with a longer planning horizon (as well as resulting in a larger communication saving). The algorithm is also applicable to a wide range of sensor models. the decoupled method using two sensors at each time step results in similar estimation performance and communication cost to the Lagrangian relaxation method using a communication constraint with the given level. 5. Similarly. and additional communication energy is saved. Fig. In the informationconstrained case.184 CHAPTER 5.1 with the decoupled scheme discussed in Section 5. The lower diagram in Fig. rather than being restricted to particular points corresponding to diﬀerent numbers of sensors. ranging from the Kalman ﬁlter to the particle ﬁlter.6 shows the average minimum entropy in blocks of the same length as the planning horizon. The ﬁxed sensor management scheme activates the sensor or two sensors with the observation or observations producing the largest expected reduction in entropy. increasing the planning horizon relaxes the constraint.
x 10 5 4 3 2 1 0 0 −3 Adaptation of dual variable λk 20 40 60 80 100 120 140 160 Time step (k) Communication cost accrual 3000 2000 1000 0 N=5 N=10 N=25 0 20 40 60 80 100 120 140 160 Time step (k) Figure 5. . and corresponding cumulative communication costs. Adaptation of communication constraint dual variable λk for diﬀerent horizon lengths for a single Monte Carlo run.7.
4 1. Position entropy and communication cost for dynamic programming method with communication constraint (DP CC) and information constraint (DP IC). providing an indication of the variability across simulations. ellipses illustrate covariance.5 1 0.2 1.6 0.6 1. .5 0. compared to the method which dynamically selects the leader node to minimize the expected communication cost consumed in implementing a ﬁxed sensor management scheme.8 1 1. The ﬁxed sensor management scheme activates the sensor (‘greedy’) or two sensors (‘greedy 2’) with the observation or observations producing the largest expected reduction in entropy.8 x 10 2 4 Accrued communication cost Figure 5.5 2 1.Position entropy vs communication cost 3 Average position entropy 2.2 Greedy/Decoupled Greedy 2/Decoupled DP CC N=25 DP IC N=25 0.8.4 0. Ellipse centers show the mean in each axis over 100 Monte Carlo runs.
and developing approximations which are less coarse than the linearized Gaussian model. 5.3 demonstrate that approximations based on dynamic programming are able to provide similar estimation performance (as measured by entropy).4. for a fraction of the communication cost in comparison to simple heuristics which consider estimation performance alone and utilize a single sensor. The discussion in Section 5. Conclusion and future work 187 an eﬃcient approximation of the OLFC as long as an eﬃcient estimate of the reward function (mutual information in our case) is available.1. . The simulation results in Section 5.7 provides a guide for eﬃcient implementation strategies that can enable implementation on the latest generation wireless sensor networks.Sec. Future work includes incorporation of the impact on planning caused by the interaction between objects when multiple objects are observed by a single sensor.
188 CHAPTER 5. SENSOR MANAGEMENT IN SENSOR NETWORKS .
then additional planning is un 189 . ﬁnding an eﬃcient integer programming solution that exploits the structure of beam steering. delineating problems in which additional open loop planning can be beneﬁcial from those in which it cannot.1. before outlining suggestions of areas for further investigation. 6. ﬁnding an eﬃcient heuristic sensor management method for object tracking in sensor networks. and ﬁnally. The extension is quite general in that it applies to arbitrary. e and counterexamples illuminate larger classes of problems to which they do not apply. For example.5 of the optimal performance is adequate.1 Summary of contributions The following sections outline the contributions made in this thesis. T 6. Extensions include tighter bounds that exploit either process diﬀusiveness or objectives involving discount factors. yielding a guarantee on the posterior Cram´rRao bound. The results are the ﬁrst of their type for sequential problems. if a factor of 0. and eﬀectively justify the use of the greedy heuristic in certain contexts. secondly. the logdeterminant of the Fisher information matrix was also shown to be submodular.Chapter 6 Contributions and future directions HE preceding chapters have extended existing sensor management methods in three ways: ﬁrstly.1 Performance guarantees for greedy heuristics The analysis in Chapter 3 extends the recent work in [46] to the sequential problem structures that commonly arise in waveform selection and beam steering. and applicability to closed loop problems. Examples demonstrate that the bounds are tight. obtaining performance guarantees for sequential sensor management problems. time varying observation and dynamical models. The results apply to objectives including mutual information. This chapter brieﬂy summarizes these contributions.
. i. closed loop planning over a horizon of N = 20 time steps using Np = 50 simulated values of observations at each stage would involve N complexity of the order O([Ns 2Ns ]N Np ) ≈ 10180 . minimizing energy cost subject to a constraint on estimation performance. in a problem involving Ns = 20 sensors. and observations requiring diﬀerent time durations to complete.3 Sensor network management In Chapter 5.1. Performing planning of this type through full enumeration would require evaluation of the reward of more than 10100 diﬀerent observation sequences.1. these experiments also illustrate a new capability for exploring the beneﬁt that is possible through utilizing longer planning horizons. As well as demonstrating that the algorithm is suitable for online application. CONTRIBUTIONS AND FUTURE DIRECTIONS necessary for any problem that ﬁts within the structure. The trade oﬀ between these two quantities was formulated by maximizing estimation performance subject to a constraint on energy cost. Simulation results demonstrate the . Solutions with guaranteed nearoptimality were found by simultaneously reducing the upper bound and raising a matching lower bound.6 × 105 . The simpliﬁcations enable computation over much longer planning horizons: e. The algorithm has quite general applicability. For example. the simpliﬁcations yield worstcase 3 computation of the order O(N Ns ) = 1. admitting time varying observation and dynamical models. we have quantiﬁed the small beneﬁt of additional open loop planning in problems where models exhibit low degrees of nonstationarity. The analysis from Chapter 3 was utilized to obtain an upper bound on the objective function. The online guarantees conﬁrm cases in which the greedy heuristic is even closer to optimality. 6.2 Eﬃcient solution for beam steering problems The analysis in Chapter 4 exploits the special structure in problems involving large numbers of independent objects to ﬁnd an eﬃcient solution of the beam steering problem. yet still captures the essential structure of the underlying trade oﬀ. including mutual information and the posterior Cram´rRao bound. or the dual of this..190 CHAPTER 6. The methods are applicable to a wide range of objectives. e Computational experiments demonstrated application to problems involving 50–80 objects planning over horizons up to 60 time slots.g. An alternative formulation that was specialized to time invariant rewards provided a further computational saving. we presented a method trading oﬀ estimation performance and energy consumed in an object tracking problem. Our analysis has proposed a planning method that is both computable and scalable.e. 6.
. The approximations are applicable to a wide range of problems. Directions in which this analysis may be extended include those outlined in the following paragraphs.2.5 for any lookahead length. under diﬀusive assumptions.2.1.1 Performance guarantees Chapter 3 has explored several performance guarantees that are possible through exploitation of submodular objectives. A guarantee with a factor of (e − 1)/(2e − 1) ≈ 0.g. even if the linearized Gaussian assumption is relaxed. 6. in between the two original time slots in Example 3. 6. Observations consuming diﬀerent resources Our analysis inherently assumes that all observations utilize the same resources: the same options available to us in later stages regardless of the choice we make in the current stage.5). in which all observations are uninformative. we can obtain the same factor of 0. e. or separate resource constraints for each time slot. In [46]. Expanding this analysis to sequential selection problems involving either a single resource constraint encompassing all time slots. a guarantee is obtained for the subset selection problem in which each observation j has a resource consumption cj . Recommendations for future work 191 dramatic reduction in the communication cost required to achieve a given estimation performance level as compared to previously proposed algorithms. However.e. would be an interesting extension. if we introduce additional time slots. 6. the remaining approximations may be applied to ﬁnd an eﬃcient approximation of the open loop feedback controller as long as an eﬃcient estimate of the reward function (mutual information in our case) is available.. one would expect that additional lookahead steps would yield an algorithm that is closer to optimal. but it may be possible to obtain tighter guarantees (i.Sec. as well as some of the boundaries preventing wider application..g.2 Recommendations for future work The following sections describe some promising areas for future investigation. and we seek the most informative subset A of observations for which j∈A cj ≤ C. 0. Guarantees for longer lookahead lengths It is easy to show that no stronger guarantees exist for heuristics using longer lookahead lengths for general models.387 can be obtained quite easily in the latter case. e. .
linear Gaussian problems with dynamics and observation models satisfying particular properties.. e. it may be possible to introduce additional structure (e. as outlined in the following sections. the incremental reward of an exploration subset element conditioned on a given candidate subset is low enough that it is unlikely to be chosen. e.. and explored its performance and computation complexity.192 Closed loop guarantees CHAPTER 6.2 Integer programming formulation of beam steering Chapter 4 proposed a new method for eﬃcient solution of beam steering problems. Stronger guarantees exploiting additional structure Finally.g.2.3 represents one of many ways in which the update between iterations could be performed. generating candidate subsets for each of the chosen exploration subset elements. 6. while the guarantees in Chapter 3 have been shown to be tight within the level of generality to which they apply. CONTRIBUTIONS AND FUTURE DIRECTIONS Example 3. However.3 establishes that there is no guarantee on the ratio of the performance of the greedy heuristic operating in closed loop to the performance of the optimal closed loop controller.g. diﬀusiveness and/or limited bandwidth observations) to obtain some form of weakened guarantee. Deferred reward calculation It may be beneﬁcial to defer calculation of some of the incremental rewards.g. e. rather than just the one with the highest reward increment. it would seem unnecessary to recalculate the incremental reward when the candidate subset is extended. . Alternative update algorithms The algorithm presented in Section 4.g.3. An example of this is the result that greedy heuristics are optimal for beam steering of onedimensional linear Gaussian systems [33]. it may be possible to obtain stronger guarantees for problems with speciﬁc structure... There are several areas in which this development may be extended. It remains to explore the relative beneﬁts of other update algorithms.
Sec. but is more diﬃcult in integer programming problems since the bounds previously evaluated in the branch and bound process are (in general) invalidated. one may utilize a parallel instance of the algorithm from Chapter 5 for each object. A further extension along the same line would be to integrate the algorithm for generating new candidate sets with the branch and bound procedure for the integer program. A major computational saving may result if a method is found that allows the solution of the previous iteration to be applied to the new solution.1 to ﬁnd the best solution amongst those explored so far (i. the concept may be easily extended to multiple objects. with demonstrated empirical performance. It may be beneﬁcial to incorporate heuristic searches that introduce additional candidate subsets that appear promising in order to raise the lower bound quickly as well as decreasing the upper bound. Recommendations for future work 193 Accelerated search for lower bounds The lower bound in Section 4. changes made between iterations of the integer program force the solver to restart the optimization. one may either store the joint conditional distribution of the object group at one sensor. This is easily performed in linear programming problems. In the former case..1 tend to focus on reducing the upper bound to the reward rather than on ﬁnding solutions for which the reward is high. However. Integration into branch and bound procedure In the existing implementation.3.2. 6.3 Sensor network management Chapter 5 provides a computable method for tracking an object using a sensor network.e. When objects are wellseparated in space.6 utilizes the results of the Algorithm 4.2. Problems involving multiple objects While the discussion in Chapter 5 focused on the case of a single object. Possible extensions include multiple objects and performance guarantees. the decisions made by Algorithm 4. there will be a control choice corresponding to breaking the joint . 6. those for which the exact reward has been evaluated). When objects become close together and observations induce conditional dependency in their states. or utilize a distributed representation across two or more sensors. One example of this would be to include candidate subsets corresponding to the decisions made by the greedy heuristic—this would ensure that a solution at least as good as that of the greedy heuristic will be obtained regardless of when the algorithm is terminated.
both of which could be incorporated into the tradeoﬀ performed by the constrained optimization. Performance guarantees The algorithm presented in Chapter 5 does not possess any performance guarantee.194 CHAPTER 6. In the case of a distributed representation of the joint conditional distribution.9 provides an example of a situation in which one element of the approximate algorithm performs poorly. This will result in a loss of information and a saving in communication cost. An extension of both Chapters 3 and 5 is to exploit additional problem structure and amend the algorithm to guarantee performance. . CONTRIBUTIONS AND FUTURE DIRECTIONS distribution into its marginals after the objects separate again. it will be necessary to quantify both the beneﬁt (in terms of estimation performance) and cost of each of the communications involved in manipulating the distributed representation. Section 3.
[10] Dimitri P. Abadir and Jan R. 50(2):174–188. Constrained Markov decision processes. Belmont. Anderson and John B. [7] M Behara. Artech House. 2005. Neil Gordon. Bertsekas. MA. 195 . February 2002. and Tim Clapp. Wiley Eastern Ltd. Belmont. Chapman and Hall.Bibliography [1] Karim M. 94(5):1452–1475. 2003.E. Matrix Algebra. American Economic Review. second edition. [8] P. Bertsekas. An eﬃcient ascendingbid auction for multiple objects. Ausubel. Sanjeev Arulampalam.A. MA. 1999. [3] Brian D. 2003. UK. Berry and D. Cambridge University Press. Athena Scientiﬁc. Magnus. 1993. second edition. [4] M. Proceedings of the International. Techniques and Software. 2000. Norwood. 1990. 1999. Moore. A tutorial on particle ﬁlters for online nonlinear/nonGaussian Bayesian tracking. Fogg. Englewood Cliﬀs. [6] Yaakov BarShalom and XiaoRong Li. On the use of entropy for optimal radar resource management and control. Optimal ﬁltering.B. MA. [2] Eitan Altman. Simon Maskell. India. Additive and Nonadditive Measures of Entropy. London. Nonlinear Programming. In Radar Conference. NJ. pages 572–577. 1979. Estimation and Tracking: Principles. Dynamic Programming and Optimal Control. New Delhi. PrenticeHall. Athena Scientiﬁc. IEEE Transactions on Signal Processing. O. [5] Lawrence M. [9] Dimitri P. December 2004.
May 2006. Journal of Mathematical Analysis and Applications. Martin J. Auction algorithms for network ﬂow problems: A tutorial introduction. IEEE. M¨jdat Cetin. Energy eﬃcient target tracking in a sensor network using nonmyopic sensor scheduling. 43(910):1114–1135. Chhetri. Ross. pages 4939–4944. and A. In IEEE Workshop on Statistical Signal Processing. July 1995. [14] Samuel S. Automatica. MA. Athena Scientiﬁc. Optimal search strategies in dynamic hypothesis testing. and Alan S. Computational Optimization and Applications. Casta˜´n. 36(9):1249–1274. Morrell. Blackman and Robert Popoli. Sensor scheduling using a 01 mixed integer programming framework. Data asu ¸ sociation based on optimization in graphical models with application to sensor networks.S. 25(7):1130–1138. Man and Cybernetics. pages 1202–1207. Darryl Morrell. 1997. PapandreouSuppappola. Approximate dynamic programming for sensor management. Tsitsiklis. [15] V. [20] Amit S. Wainwright. and A. Belmont. Mathematical and Computer Modelling. 1999. no In Proc 36th Conference on Decision and Control. A survey of computational complexity results in systems and control. Introduction to Linear Optimization. [13] Frederick J. 1992. September 2000. [18] David A. D. December 1997. [22] A. D. IEEE Transactions on. Chhetri. Design and Analysis of Modern Tracking Systems. Bertsekas. Beutler and Keith W. [21] A. 112 (1):236–252.196 BIBLIOGRAPHY [11] D. [19] Lei Chen. July 2005. In Proc. [17] David A. Willsky.S. Morrell. 2006. MA. Casta˜´n. Blondel and John N. . pages 549–552. Artech House.P. [16] David A. PapandreouSuppappola. Chhetri. In no IEEE Conference on Decision and Control.D. [12] Dimitris Bertsimas and John N. 1:7–66. November 1985. Eighth International Conference of Information Fusion. Stochastic control bounds on sensor network performance. September/October 2003. Norwood. Casta˜´n. In Fourth IEEE Workshop on Sensor Array and Multichannel Processing. Sysno tems. 2005. Scheduling multiple sensors using particle ﬁlters in target tracking. Optimal policies for controlled Markov chains with a constraint. and Antonia PapandreouSuppappola. Tsitsiklis.
David A. pages 405–416. [27] Arthur Gelb. and A. [26] Satoru Fujishige. Drummond. David J. R. 1991. S. Digital Signal Processing. [30] M. 2004. Man and Cybernetics.BIBLIOGRAPHY 197 [23] Thomas M. second edition. and Target Recognition VIII. . Willsky. and Bill Moran. Thomas. SpringerVerlag. Cambridge.F. 1974. and A. MA.L. IEEE Transactions on Aerospace and Electronic Systems. Systems. John W. Kirubarajan. [33] Stephen Howard. Sensor scheduling for target tracking: A Monte Carlo sampling approach. Smith. New York. 1993. 140:107–113. pages 249–255. Novel approach to nonlinear and nonGaussian Bayesian state estimation. J.J. Hintz and E. SPIE.M. Fisher. Multisensor resource deployment using posterior cramerrao bounds. rectangular. Ihler. IEEE Transactions on.S. MIT Press. and M. Sensor Fusion. 40(2):399–416. 1991. Elsevier. cost matrices. Optimal policy for scheduling of GaussMarkov systems. Applied optimal estimation. volume 3720. Soﬁa Suvorova. [31] Kenneth J. BarShalom. [25] Emre Ertin. Chong. to appear. 2005. [32] K. [28] Neil Gordon. and Y. 2004. Cover and Joy A. 21(1):237–244. McVey. Journal of the SDI Panels on Tracking. [29] Ying He and Edwin K. Maximum mutual information principle for dynamic sensor query problems. Hernandez. [34] A. 2005. ﬂoating point. Comparison of 2D asno signment algorithms for sparse. T. L. Elements of Information Theory. MA. T. Boston. In Proceedings of the Seventh International Conference on Information Fusion.E. (4):81–97. Fisher III. Nonparametric belief propagation for selfcalibration in sensor networks. John Wiley and Sons. Multiprocess constrained estimation. McIntyre. W. Submodular functions and optimization. Hintz and Gregory A. Salmond. and Lee C. In Signal Processing. December 1990. Bellovin. volume 58 of Annals of discrete mathematics. Goal lattices for sensor management. Casta˜´n. 1999. Moses. IEEE Journal of Selected Areas in Communication.S. IEE Proceedings F: Radar and Signal Processing. In Proc IPSN 2003. April 2003. P. [24] O. Potter. NY.
Covariance control for multisensor systems. Statistical Science. 2003. Jordan. E. Kalandros and L. [36] A. IEEE Transactions on [see also Acoustics. 38(4):1138–1157. and A. Nearoptimal nonmyopic value of information in graphical models.198 BIBLIOGRAPHY [35] A.J. Speech. IEEE Transactions on]. 2004. Informationbased sensor management for landmine detection using multimodal sensors. International Journal of High Performance Computing Applications. 2002. 2005. 40(5):1536–1550.J. Signal Processing. Communicationsconstrained inference.Y. Uhlmann. and Signal Processing.W. Proceedings of the IEEE. IEEE Transactions on Aerospace and Electronic Systems. Julier and J. [41] Keith D. Technical Report 2601. Massachusetts Institute of Technology Laboratory for Information and Decision Systems. Eﬃcient multiscale sampling from products of Gaussian mixtures. SPIE. . Peter A. [39] S. In Neural Information Processing Systems 17. 51(10):2592–2601.K. Graphical models.J. volume 5794. 33(4):1180–1188. Tasking distributed sensor networks. Willsky. 2561(1):66–70. Kastella. Ihler. Shashank Mehrotra. 1995. Ihler. [46] Andreas Krause and Carlos Guestrin. 19(1):140–155.T. Gaussian particle ﬁltering. Unscented ﬁltering and nonlinear estimation. Djuric.T. Collins. [38] Michael I. Pao. [37] Mark Jones.M. Kotecha and P. J. 2002. pages 1098–1107.S. Waveform selective probabilistic data association. and A. Fisher III. Evans. IEEE Transactions on Information Theory. and Jae Hong Park. [40] M. W.T. Kershaw and R. 2004. [44] Mark P.B. Kershaw and R. Kolba. In Detection and Remediation Technologies for Mines and Minelike Targets X.J. July 2005. Discrimination gain to optimize detection and classiﬁcation. Optimal waveform selection for tracking systems. Freeman. In UAI 2005. SPIE Signal and Data Processing of Small Targets. and Leslie M. Torrione. October 2003. IEEE Transactions on Aerospace and Electronic Systems.H. [43] D. March 2004.J. Sudderth. October 1997. September 1994. 92(3):401–422. Willsky. Evans. [45] J.S. [42] D. 16 (3):243–257.
Hero III. 50(6):1382–1397. volume 1. IEEE Transactions on. and Alfred O. December 2005. [51] Chris Kreucher. Kreucher. and Alfred O. and Alfred O. volume 5204. Alfred O. [49] Chris Kreucher.BIBLIOGRAPHY 199 [47] Andreas Krause. Keith Kastella. Algorithms for optimal scheduling and management of hidden Markov model sensors. Hero III. Hero III. pages 704–711. Anupam Gupta. A bayesian method for integrated multitarget tracking and sensor management. and Ajit Paul Singh. December 2004. Sensor management using an active sensing approach. Nearoptimal sensor placements in Gaussian processes. Keith Kastella. [55] V. The International Society for Optical Engineering. [50] Chris Kreucher. Eﬃcient methods of nonmyopic sensor management for multitarget tracking. In International Conference on Information Fusion. Keith Kastella. pages 480–489. Alfred O. 85(3):607–624. Kreucher. In Fifth International Conference on Information Processing in Sensor Networks. Kreucher. [48] Andreas Krause. Carlos Guestrin. and Daniel Chang. 2003. [52] Chris M. April 2006. Alfred O. and Keith Kastella. Informationbased sensor management for multitarget tracking. In 43rd IEEE Conference on Decision and Control. Keith Kastella. In Proceedings of The Thirteenth Annual Conference on Adaptive Sensor Array Processing (ASAP). Signal Processing. [54] Christopher M. Hero III. In International Conference on Machine Learning. In SPIE Signal and Data Processing of Small Targets. Keith D. . Hero III. Informationbased sensor management for simultaneous multitarget tracking and identiﬁcation. A comparison of task driven and information driven sensor management for target tracking. Signal Processing. 2002. [53] Chris M. Kastella. August 2005. Carlos Guestrin. June 2005. Hero III. Nearoptimal sensor placements: Maximizing information while minimizing communication cost. March 2005. Krishnamurthy. In IEEE Conference on Decision and Control. and John Kleinberg. 2003. and Ben Shapo.
[61] II Lewis. June 1999. Logothetis and A. pages 552–557. [57] V. James Reich. Correction to ‘Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking’. [59] B. IEEE Transactions on Information Theory. pages 2402–2406. San Diego. Krishnamurthy and R. Yu Hen Hu. La Scala. The characteristic selection problem in recognition systems. Sayeed. 51(6):1662–1663. [58] Thomas Kurien. 49(12):2893–2908. volume 1. (4):378–391. and Leslie Pack Kaelbling. December 2001. 8(2):171–178. Evans. ArtechHouse. IEEE Transactions on. September 2003. La Scala. Brown University. 1995. Optimal adaptive waveform selection for target detection. Evans. and Feng Zhao. pages 43–83. Krishnamurthy and R. In Proceedings of the International Radar Conference. Rezaeian. CA. Hidden Markov model multiarm bandits: a methodology for beam scheduling in multitarget tracking. Evans. On sensor scheduling via information theoretic criteria.J. MA. Issues in the design of practical multitarget tracking algorithms. Cassandra. Norwood.J. volume 4. Moran.200 BIBLIOGRAPHY [56] V. classiﬁcation. Detection. Wong. March 2002. [63] Michael L. P. and R. [62] Dan Li. In MultitargetMultisensor Tracking: Advanced Applications. pages 492–496. June 2003. Eﬃcient dynamicprogramming updates in partially observable Markov decision processes. EURASIP Journal on Applied Signal Processing. Littman.F. . 19(2):17–29. Isaksson. and Akbar M. Optimal adaptive waveform selection for target tracking.J. [60] B. IEEE Transactions on. W. Technical Report CS9519. 2005. [64] Juan Liu. Anthony R. Kerry D. M. In Proceedings of the American Control Conference. February 1962. [65] A. 1990. and tracking of targets. 2003. IEEE Signal Processing Magazine. In Proceedings of the Eighth International Conference on Information Fusion. Signal Processing. and B. ISSN 00189448. Moran. Signal Processing. Collaborative innetwork processing for target tracking.
Tse. Stochastic Models. Hoﬀman. E. inference and learning.B. 1998.A. and R. [72] James Munkres. Mathematical Programming.K. In Acquisition. 2005. 1994. [71] S.L. L. An analysis of approximations for maximizing submodular set functions–I. 5(1):32–38. Tracking. Integer and combinatorial optimization. Estimation. volume 3365. and R.L. Farsighted sensor management strategies for move/stop tracking. Mahler. Mori. Wolsey. Washburn. chapter Sensor management for radar: a tutorial. Maybeck. Wishner. McIntyre and Kenneth J. Tracking and classifying multiple targets without a priori identiﬁcation. Stochastic Models. VA.L. Algorithms for the assignment and transportation problems. Navtech. Wiley.BIBLIOGRAPHY 201 [66] Ronald P. volume 1. [74] A. 1988. VA. 2002. S. SPIE. L. and Stephen Howard. SPIE. [67] Peter S. volume 2. pages 252–263. [73] Kevin P. and M. An analysis of approximations for maximizing submodular set functions–II. Nemhauser and Laurence A. Maybeck. Estimation. [76] G. In Proceedings of the Eighth International Conference on Information Fusion. In M. Fisher. and M. December 1978. volume 2755. May 1986. University of California. ISSN 00189286. Murphy. In Signal Processing. Global posterior densities for sensor management.L. 1994.A. M.L. Nemhauser. pages 304–312. Berkeley. 2006. PhD thesis. Soﬁa Suvorova. [70] Bill Moran. IEEE Transactions on Automatic Control. 1996. SpringerVerlag. and Pointing XII. Advances in Sensing with Security Applications. Hintz. [77] G. [68] Peter S. Arlington.J. [69] Gregory A. Dynamic Bayesian networks: representation. and Control. and Target Recognition V. Fisher. Nemhauser. 31(5):401–409. Wolsey. Sensor Fusion. Information theoretic approach to sensor scheduling. Nedich. Balinski and A. Wolsey. Schneider. and Control. [75] George L. Journal of the Society for Industrial and Applied Mathematics. Navtech. pages 566–573. . volume 1. March 1957. CheeYee Chong. 14(1): 265–294. Arlington.
Communications of the ACM. [83] Branko Ristic. In IPSN ’03. Freeman. Papadimitrou and John N. 21(5):1071–1088. December 1979.S. [84] Louis L. AddisonWesley.B. Pottie and W. W. [79] David C. Schneider. 43(5):51–58. 2004.202 BIBLIOGRAPHY editors. Evans. volume 5. [87] Richard D. [86] Sumeetpal Singh. A. Artech House. 2004. Variance reduction for monte carlo implementation of adaptive sensor management. Willsky. May 2000.L. Reid. Massachusetts Institute of Technology. Nonparametric belief propagation. The complexity of Markov decision processes. and A. August 1987. Beyond the Kalman Filter: Particle Filters for Tracking Applications. Sondik. and F. Polyhedral combinatorics. [78] C. IEEE Transactions on Automatic Control.J.K. 2004. Ihler. AC24(6):843–854. Operations Research. [88] E. Estimation and Time Series Analysis. An algorithm for tracking multiple targets. A. Iterative combinatorial auctions: Theory and practice. Parkes and Lyle H. W. [81] G. Ihler.S. volume 8 of Mathematical programming study. Sanjeev Arulampalam. BaNgu Vo. Technical Report LIDSTR2551. Mathematics of Operations Research.J. Tsitsiklis. G. In Proceedings of the American Control Conference. pages 73–87. 1978. Wireless integrated network sensors. pages 74–81. Reading. Freeman. 12(3):441–450. and Arnaud Doucet. Sudderth. Scharf. April 2003. 2000. pages 901–908. [89] E. Pait. Kaiser.H. pages 4752–4757. Smart dust (keynote address). In Computer Vision and Pattern Recognition. [80] Kris Pister. 2002. The optimal control of partially observable Markov decision processes over a ﬁnite horizon. In Proc 17th National Conference on Artiﬁcial Intelligence (AAAI). Sudderth. 1991. Statistical Signal Processing: Detection. Closing the loop in sensor fusion systems: stochastic dynamic programming approaches. . Seventh International Conference of Information Fusion. [85] M. 2003. In Proc. MA. Elsevier. 1973. Nonparametric belief propagation. Smallwood and Edward J.T.T. and Neil Gordon. [82] Donald B. and A.T.M. Ungar. Robin J. Willsky. Mealy.
volume 1. Schneider. In Proceedings of the Fifth International Conference on Information Fusion. [97] J. Jaewon Shin. [93] R.BIBLIOGRAPHY 203 [90] P. [95] Feng Zhao. Stochastic Processes on Graphs: Geometric and Variational Approaches. Driessen. pages 712–718. [94] Kirk A. Fox. C. Interscience. IEEE Signal Processing Magazine. Stochastic dynamic programming based approaches to sensor resource management. PhD thesis. 2001. Muravchik. pages 608–615. Wainwright. and J. Massachusetts Institute of Technology.B. Y. Driessen. Tracking performance constrained MFR parameter control: applying constraints on prediction accuracy. and James Reich. July 2005. and A. Informationdriven dynamic sensor collaboration. On tracking performance constrained MFR parameter control. and Modulation Theory. 19(2):61–72. Detection.J. 2003. Estimation. volume 1. Naval Research Logistics. Nehorai. and H. .H. Boers. October 2000. IEEE Transactions on Signal Processing. 1998.H.H. Tichavsky. Washburn. 2002. In International Conference on Information Fusion. [91] Harry L. In International Conference on Information Fusion. [96] J. Washburn.K. Zwaga and H. M. Wiley [92] Martin J. Posterior Cram´rRao bounds for e discretetime nonlinear ﬁltering. March 2002. The LP/POMDP marriage: Optimization with imperfect information. volume 1. 46(5): 1386–1396. Van Trees. 47(8):607–619. 2002. Zwaga. Yost and Alan R.
This action might not be possible to undo. Are you sure you want to continue?