
WORKSHOP ON

MULTIROBOTIC SYSTEMS FOR SOCIETAL


APPLICATIONS

Editors
K. Madhava Krishna
Assistant Professor
Robotics Research Lab
IIIT Hyderabad, India.

Bipin Indurkhya
Professor
IIIT Hyderabad, India.
MAIN MENU

• FOREWORD

• ORGANIZING COMMITTEE

• PROGRAM COMMITTEE

• WORKSHOP PROGRAM

• TABLE OF CONTENTS

I
FOREWORD

Multirobotic applications have become increasingly popular due to their obvious merits
of:
• Sharing computational burden.
• Increased fault tolerance.
• Study of emergent behaviors.
• Augmented capabilities.

The workshop has focused on multirobotic systems and applications that have
potential utility for society. It has attracted papers describing new ideas,
formulations, and proof-of-concept demonstrations. In particular, the area of
multi-robot localization, including SLAM, has attracted a great deal of attention;
nearly 50% of the workshop papers are in this area. A paper on localizing multiple
spills in a factory-like setting, using biologically inspired behaviors implemented
on a swarm of robots, has been particularly motivating.

Other papers have targeted the use of swarms for communicating emergencies in
hazardous situations, automated coverage, learning by imitation, and a new
theoretical formulation for multi-agent state representation that actuates
coordination in a team only when necessary.

We would like to express our most sincere appreciation and thanks to all the
Organizing Committee members, the Program Committee members, the local
arrangements chairs Dr. Rajeev Sangal and Gen. Bagga, and our students DVK
Viswanath and Amit Kumar Pandey for working hard to make this workshop a success.

K. Madhava Krishna
Bipin Indurkhya

Organizing Committee

II
ORGANIZING COMMITTEE

K Madhava Krishna
Assistant Professor
Robotics Research Lab
IIIT Hyderabad, India.
Email: mkrishna@iiit.ac.in

Bipin Indurkhya
Professor
IIIT Hyderabad, India.
Email: bipin@iiit.ac.in

III
PROGRAM COMMITTEE

K Madhava Krishna
Asst Professor
IIIT Hyderabad
Gachibowli
Hyderabad - 500 032
AP, India
Phone: 91-40-2300 1967 (Extn: 269)
Email: mkrishna@iiit.ac.in
URL: http://mail.iiit.ac.in/~mkrishna

Bipin Indurkhya
Professor
IIIT Hyderabad
Gachibowli
Hyderabad - 500 032
AP, India
Phone: 91-40-2300 1967 (Extn: 182)
Email: bipin@iiit.ac.in
URL: http://iiit.net/faculty/bipin.php

Rachid Alami
Directeur de Recherche
LAAS-CNRS
7 Avenue du Colonel Roche
31077 Toulouse Cedex
France
Phone: 33-561-33 63 46
URL: http://www.laas.fr/~rachid/

Raja Chatila
Directeur de Recherche
LAAS-CNRS
7 Avenue du Colonel Roche
31077 Toulouse Cedex
France
Phone: 33-561-33 63 44
URL: http://www.laas.fr/~raja/

IV
Thierry Simeon
Directeur de Recherche
LAAS-CNRS
7 Avenue du Colonel Roche
31077 Toulouse Cedex
France
Phone: 33-561-33 63 49
URL: http://www.laas.fr/~nic/

Elena Messina
Intelligent Systems Division
National Institute of Standards & Technology
100 Bureau Drive, Mail Stop 8230
Gaithersburg, MD 20899-8230
USA
Email: elena.messina@nist.gov
Telephone Number: (301) 975-3510
FAX Number: (301) 990-9688
Secretary: Karen Hollingsworth

Dr. Raj Madhavan


Research Staff Member
Computational Sciences Division
Oak Ridge National Laboratory
Oak Ridge, TN 37831.
Guest Researcher at:
Intelligent Systems Division
National Institute of Standards & Technology
100 Bureau Drive, Mail Stop 8230
Gaithersburg, MD 20899-8230
E-mail: raj.madhavan@nist.gov
Telephone Number: (301) 975-2865
FAX Number: (301) 990-9688
Secretary: Karen Hollingsworth

V
WORKSHOP PROGRAM

9:00am Inauguration of Workshop


9:10am Paper Presentation: Exploiting Factored Representations for
Decentralized Execution in Multi-agent Teams
Maayan Roth, Reid Simmons, Manuela Veloso.
9:40am Paper Presentation: GUARDIANS: a Swarm of Autonomous Robots for
Emergencies
Jacques Penders et al.
10:10am Paper Presentation: Learning by Imitation: A Scheme for Offline
Learning of Robot Control through Demonstration by Other Robot
Umar Suleman, Rauf Baig, Mian M. Awais
10:40am Paper Presentation: Controlled Deployment of Sensor Networks for
Coverage Optimization
Sameera Poduri, Gaurav S. Sukhatme
11:10am Snacks & Tea
11:30am Paper Presentation: Glowworms-inspired multi-robot system for multiple
source localization tasks
P. Amruth, K.N. Krishnanand, D. Ghose
12:00pm Paper Presentation: Efficient Communication Strategy for Cooperative
Multirobot Localization
Valguima Odakura, Anna Helena Reali Costa
12:30pm Paper Presentation: Active Localization of Multiple Robots by Selection
of Best Frontiers
Rakesh Goyal, K Madhava Krishna, Shivudu Bhuvanagiri
1:00pm Paper Presentation: Multi-Robot Marginal-SLAM
Ruben Martinez-Cantin, José A. Castellanos & Nando de Freitas

1:30pm Lunch
2:30pm Panel Discussion: Have multi-robotic systems reached society?
What needs to be done?
4:00pm Tea Break

4:30pm Visit to IIIT Robotics Research Lab.

VI
TABLE OF CONTENTS

1. Exploiting Factored Representations for Decentralized Execution
in Multi-agent Teams
Maayan Roth, Reid Simmons & Manuela Veloso………………………1
2. GUARDIANS: a Swarm of Autonomous Robots for Emergencies
Jacques Penders et al. ………………………………8
3. Learning by Imitation: A Scheme for Offline Learning of Robot
Control through Demonstration by Other Robot
Umar Suleman, Rauf Baig & Mian M. Awais…………………………..17
4. Controlled Deployment of Sensor Networks for Coverage
Optimization
Sameera Poduri & Gaurav S. Sukhatme ………………………………24
5. Glowworms-inspired Multi-Robot System for Multiple Source
Localization Tasks
P. Amruth, K.N. Krishnanand & D. Ghose …………………………….32
6. Efficient Communication Strategy for Cooperative Multirobot
Localization
Valguima Odakura & Anna Helena Reali Costa ………………………38
7. Active Localization of Multiple Robots by Selection of Best
Frontiers
Rakesh Goyal, K Madhava Krishna & Shivudu Bhuvanagiri…………..45
8. Multi-Robot Marginal-SLAM
Ruben Martinez-Cantin, José A. Castellanos & Nando de Freitas……52

VII
Exploiting Factored Representations for Decentralized
Execution in Multi-agent Teams


Maayan Roth, Reid Simmons, Manuela Veloso
Robotics Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA, USA
mroth@andrew.cmu.edu, reids@cs.cmu.edu, veloso@cs.cmu.edu

ABSTRACT
In many cooperative multi-agent domains, there exist some states in which the agents can act independently and others in which they need to coordinate with their teammates. In this paper, we explore how factored representations of state can be used to generate factored policies that can, with minimal communication, be executed distributedly by a multi-agent team. The factored policies indicate those portions of the state where no coordination is necessary, automatically alert the agents when they reach a state in which they do need to coordinate, and determine what the agents should communicate in order to achieve this coordination. We evaluate the success of our approach experimentally by comparing the amount of communication needed by a team executing a factored policy to a team that needs to communicate in every timestep.

Categories and Subject Descriptors
I.2.11 [Artificial Intelligence]: Distributed Artificial Intelligence—Multiagent systems

General Terms
Algorithms, Design, Performance

Keywords
decentralized execution, multi-agent MDP, communication

1. INTRODUCTION
The problem of coordinating teams of cooperative agents operating under uncertainty is a difficult one that has received a great deal of recent attention. This problem is challenging because, while each agent observes only its own local point of view, the agents must reason about both the states and actions of their teammates in order to act in coordination. Several extensions of the Markov decision process have been proposed to enable decision-theoretic modeling of multi-agent teams. Decentralized MDPs (Dec-MDPs) can be used to model teams that, like the ones addressed in this paper, operate under collective observability [1]. In a collectively observable domain, while each agent is unable to independently identify the world state, the union of all teammate observations at a given timestep renders the state fully observable.

∗The first author is a student.

Although Dec-MDPs provide comprehensive semantics for modeling multi-agent teams in collectively observable domains, their use in planning for such teams is limited by the intractability of generating their policies. Even for teams of as few as two agents, the problem of generating an optimal policy for the Dec-MDP representing that team is NEXP-complete [1]. One approach to overcoming the high complexity of planning for Dec-MDPs has been the identification of classes of domains that are easier to solve. One such class, transition-independent Dec-MDPs, covers domains in which the global state is partitionable into local states for each agent. This local state depends only on the individual actions of a single agent, rather than on the joint actions of the team. However, because the team shares a joint reward that depends on the joint state and action, a transition-independent Dec-MDP cannot be solved as a set of independent MDPs. Policy generation for a transition-independent Dec-MDP is NP-complete [6], a significant reduction in complexity from the problem of solving a general Dec-MDP. Unfortunately, not all multi-agent domains are transition-independent.

Factoring is another approach that has been applied to the problem of planning for multi-agent teams. Factoring has been applied successfully to the problem of speeding up policy computation for large MDPs (e.g. [2, 7]). Unlike standard MDP representations, which enumerate all possible states, factored MDPs take advantage of conditional independences among state variables to allow more compact problem representations and, in some cases, more compact policies. The presence of an additional type of independence, called context-specific independence, can lead to even greater savings in the size of the problem representation and the efficiency of planning [3]. Work has been done in applying factoring to decentralized planning for multi-agent MDPs [8]. In that approach, it is assumed that agents have full observability of all relevant state variables and the chal-

1
lenge is to enable those agents to plan independently without reasoning over the full set of joint actions.

When agents must coordinate without communication, generating a policy for the team is NEXP-complete. However, if it is assumed that agents will be allowed to communicate with their teammates at every timestep during execution, then a centralized policy, which we refer to as the free-communication policy, can be generated in polynomial time, as if the entire team were being modeled by a single-agent MDP [12]. In general, though, communication is not free, and it is necessary to minimize the use of communication resources during execution. Like Roth et al. [13], we assume that each agent in our decentralized team can independently compute the free-communication policy, and does so at plan-time. In this paper, we explore the challenge of decentralized execution of this centralized policy while minimizing the use of communication resources.

In our work, we employ techniques for solving factored MDPs to generate the centralized, free-communication plan for our team. We believe that there is a natural synergy in applying factored representations to cooperative multi-agent teams. When represented for centralized planning, even assuming the presence of free communication, multi-agent teams can quickly form large domains with many relevant state variables and joint actions. Factored representations and solution techniques can speed up policy generation in large domains, but only if those domains contain significant amounts of conditional and context-specific independence. It is our claim that many multi-agent domains exhibit a large amount of context-specific independence, allowing them to be represented compactly and solved efficiently using factored representations. Although there will be times when agents need to coordinate their actions or gain knowledge of their teammates' local state variables, multi-agent domains also contain situations in which agents can act independently for long periods of time. The challenge is to identify the independences present in any given domain, and thus enable agents to coordinate only when necessary. Through addressing this challenge, factoring can enable the efficient solution of and execution in some Dec-MDP domains.

In the remainder of this paper, we show how factored policies can enable a team of cooperative agents to execute independently, with each agent choosing when and what to communicate if it is necessary for team coordination. We introduce an algorithm for transforming a centralized factored policy into a distributed policy, and evaluate the success of our approach experimentally.

2. FACTORED DEC-MDP MODEL
The Dec-MDP model is a Markov decision process extended to handle joint actions and observations by a cooperative multi-agent team [1]. A Dec-MDP is composed of the tuple ⟨α, S, A, P, R⟩ where: α is the number of agents on the team, S is a finite set of world states, A is the set of possible joint actions, P is a transition function that specifies the likelihood of the team transitioning from one state to the next given a particular joint action, and R specifies the joint reward function of the team, given the state and a joint action.

The state, S, is comprised of n state features or variables X = ⟨X1 . . . Xn⟩, where each state si is an assignment of values to each feature in X. The features may be binary or multi-valued, or both. While in the most general case a Dec-MDP model also contains a set of joint observations, Ω, we limit ourselves to domains in which the state features can be partitioned into α sets of local state features, one for each agent, with each agent observing its local features perfectly, and a set of joint features that are fully observable by all of the agents.

Each agent i has a set Ai of possible individual actions. Every joint action in A is comprised of one individual action for each agent on the team. Because our model is factorable, the relationship between state features and joint actions can be represented as a Dynamic Decision Network [4]. For each joint action, there is a Bayes network (such as the example network shown in Figure 1) that specifies the relationship between state variables at time t and time t + 1. State features at time t + 1 depend on the values of their parent features at time t and are conditionally independent of all other features. Associated with each variable is a conditional probability table describing the probability of that variable given the different possible values of its parents. In many cases, a variable will depend on certain parents in some contexts but not others. When this is the case, the conditional probability table can be represented more compactly as a tree [3]. For example, in the conditional probability tree for variable X2 at time t + 1, shown in Figure 1, X2 depends on the value of X3 at time t when X1 has the value 1, but is independent of X3 otherwise. This type of independence is referred to as context-specific independence [3].

Figure 1: Example action network for a joint action ak ∈ A, with tree-structured CPT for X2.

Our use of a factored state representation in modeling our multi-agent teams enables us to generate tree-structured policies like the ones described in Boutilier et al. [2]. These factored policies take the form of a decision tree, where the internal nodes are state variables and the leaves are joint actions. Structured Policy Iteration (SPI) is an algorithm that can be used to compute decision-tree-structured policies for factored MDPs. We refer you to [2] for the details of the algorithm, but point out that it requires the MDP to be represented as a Dynamic Decision Network, with conditional probability tables for the variables represented as trees, and that the reward function, also represented as a tree, depend only on state features and not on the selected action. This constraint is easily met by adding additional state variables to the domain representation. We use SPI to generate a factored free-communication policy for our team.

2
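The tree-structured CPTs described above can be sketched concretely. The following is a minimal illustration, not taken from the paper, of how a context-specific CPT like the X2 tree of Figure 1 might be stored and queried; the class names and the probability values are illustrative assumptions.

```python
# Sketch of a tree-structured CPT exhibiting context-specific independence,
# mirroring the X2 example of Figure 1: X2 at time t+1 depends on X3 only
# in the context X1 = 1.  Probabilities here are illustrative placeholders.

class CPTLeaf:
    def __init__(self, probs):
        self.probs = probs  # distribution over the variable's next values

class CPTNode:
    def __init__(self, variable, children):
        self.variable = variable  # variable tested at this node
        self.children = children  # tested value -> child subtree

def lookup(node, state):
    """Walk the CPT tree; only context-relevant parents are ever read."""
    while isinstance(node, CPTNode):
        node = node.children[state[node.variable]]
    return node.probs

# X2' depends on X3 when X1 == 1, but is independent of X3 when X1 == 0.
cpt_x2 = CPTNode("X1", {
    0: CPTLeaf({0: 0.9, 1: 0.1}),          # X3 is irrelevant in this context
    1: CPTNode("X3", {
        0: CPTLeaf({0: 0.2, 1: 0.8}),
        1: CPTLeaf({0: 0.6, 1: 0.4}),
    }),
})

print(lookup(cpt_x2, {"X1": 0, "X3": 1}))  # X3 ignored -> {0: 0.9, 1: 0.1}
```

Because the tree branches only on parents that matter in the current context, a full table over all parent combinations is never materialized, which is exactly the compactness the paper attributes to context-specific independence.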
Figure 2 shows an example of one such factored policy. The actions at the leaves are joint actions, and V1 and V2 are state variables.

Figure 2: A factored policy over joint actions.

3. DECENTRALIZED EXECUTION OF AN INDIVIDUAL FACTORED POLICY
In order to execute distributedly, each agent on our team must have a policy that maps assignments of state features to individual actions. In Section 4 we detail an algorithm for converting a factored joint policy generated by SPI into individual factored policies. For now, we note that an agent's individual policy may depend on state features that it cannot observe directly. For example, in the policy for agent i shown in Figure 3, the feature labeled Xj is a local feature of agent j. Leaves in the policy that depend only on agent i's features indicate portions of the state space where that agent has context-specific independence from its teammates. Communication is needed to facilitate the execution of those portions of the policy without this context-specific independence.

Figure 3: A factored policy over individual actions.

Previous approaches to communication for Dec-MDPs and Dec-POMDPs have utilized either the tell paradigm of communication, in which each agent reasons about its local knowledge and decides whether to send information to its teammates (e.g. [13]), or the synchronize paradigm, in which, if at least one agent initiates communication, all the teammates broadcast their full observation histories (e.g. [11, 5]). In this paper, we consider a different option, query communication. Instead of trying to predict whether its information will be useful to its teammates, each agent asks its teammates for information when needed. In order to query, agents must reason both about when communication is necessary and about what information is needed. Unlike other approaches, which use broadcast communication, query communication is peer-to-peer, with each agent asking for information only from the teammate who possesses it.

Table 1 shows how communication enables the distributed execution of a factored policy. Each agent observes its local state features, as well as any global state variables that are observable to the team, and traverses the policy tree according to the values of those features until it either reaches a leaf, whereupon it executes the individual action at that leaf, or reaches a decision point in the policy that depends on a feature to which it does not have local access. When this happens, the agent knows it must ask its teammate to communicate the feature value. The amount of communication needed during execution measures the degree of context-specific independence among agents in a domain. Agents executing in domains where they are often independent of their teammates will need to communicate less than agents in domains where they must coordinate frequently.

Table 1: Recursively execute a factored policy, communicating when necessary.

function Execute(policy, features, teammate features) returns an action
input: policy, a tree-structured policy for agent i
       features, the current values of i's state features
       teammate features, a set of teammate features for which values
       are known (from previous communication)

1. if policy is a leaf, return policy.action
2. if policy.variable is in features
   a. x ← features[policy.variable]
   b. return Execute(policy.child_x, features, teammate features)
3. if teammate features is empty
   a. Find all the variables in policy.
   b. Build teammate features by asking teammates for the values of
      those variables.
4. x ← teammate features[policy.variable]
5. return Execute(policy.child_x, features, teammate features)

To avoid the possibility that an agent will have to communicate more than once per timestep, when an agent reaches a decision point where it discovers that it needs a piece of information local to one of its teammates, the agent attempts to predict what other information it may need to know. The agent searches for all of the variables contained in the subtree of its policy rooted at this decision point and asks its teammates for all of those feature values. This means that, in some cases, the agent will be requesting information that it later discovers is irrelevant. However, we believe that in most domains where it is necessary to conserve communication resources, it is desirable to minimize the total number of messages sent, not the size of those messages. Only a minor change is required in the execution algorithm to accommodate domains in which it is desirable to minimize the total number of features communicated.

4. GENERATING INDIVIDUAL FACTORED POLICIES FOR MULTI-AGENT TEAMS

3
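The Execute procedure of Table 1 can be sketched as follows. This is an illustrative Python rendering, not the authors' implementation; the Leaf/Node classes and the query callback, which stands in for asking a teammate over the network, are assumptions.

```python
# Sketch of Table 1: traverse a factored policy, querying teammates at most
# once per timestep for all variables in the subtree below a blocked node.

class Leaf:
    def __init__(self, action):
        self.action = action

class Node:
    def __init__(self, variable, children):
        self.variable = variable
        self.children = children  # feature value -> subtree

def subtree_variables(policy):
    """All variables tested anywhere in a policy subtree (step 3a)."""
    if isinstance(policy, Leaf):
        return set()
    vs = {policy.variable}
    for child in policy.children.values():
        vs |= subtree_variables(child)
    return vs

def execute(policy, features, teammate_features, query):
    if isinstance(policy, Leaf):
        return policy.action                              # step 1
    if policy.variable in features:                       # step 2
        return execute(policy.children[features[policy.variable]],
                       features, teammate_features, query)
    if not teammate_features:                             # step 3
        # One bundled request covers every variable we might still need,
        # so the agent never communicates twice in a single timestep.
        needed = subtree_variables(policy) - set(features)
        teammate_features = query(needed)
    return execute(policy.children[teammate_features[policy.variable]],
                   features, teammate_features, query)    # steps 4-5

# A Figure 3-style policy: local feature Xi, teammate feature Xj.
policy = Node("Xi", {0: Leaf("a"),
                     1: Node("Xj", {0: Leaf("b"), 1: Leaf("c")})})
print(execute(policy, {"Xi": 0}, {}, lambda needed: {}))  # no query -> a
```

When Xi = 0 the agent reaches a leaf using only local features, so the query callback is never invoked; when Xi = 1 it issues a single query for Xj before descending further.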
Structured Policy Iteration relies on several tree operations defined by Boutilier et al., namely the ability to Merge multiple policy trees such that the resulting tree contains the maximum policy attainable among the trees, and to Simplify trees to reduce redundant branches and subtrees. In order to convert a factored policy for a centralized team into factored policies for individual agents, we extend the policy tree representation to allow for multiple actions at any given leaf, indicating the presence of a tie, and introduce two additional tree operations: Intersect, which simplifies a tree by combining branches whose leaves have non-empty action-set intersections, and Independent, which computes, for a single leaf, the set of individual actions that can be performed without considering the action choices of a teammate.

The ability to detect and propagate ties between joint actions in a policy improves the discovery of context-specific independences among agents. If there is a portion of the state where an agent i can act independently, without needing to coordinate with its teammates' actions, the values of the joint actions composed of i's optimal action and all possible actions that i's teammates could choose will be the same. To this end, we modify Merge, which in its original form picks one joint action at random when it encounters a tie, to instead build trees where leaves may contain a set of optimal joint actions. There is now an additional simplification that can be performed on a policy tree, Intersect, detailed in Table 2 and in the example in Figure 4. At each non-leaf node in the policy tree, Intersect is recursively called on the node's children. After this, if every child of the node is a leaf, and the action sets of the node's children have a non-empty intersection, the node is redundant and can be replaced with a new leaf containing the intersection. If all children but one are leaves, a node may still be redundant. This redundancy is detected by examining the subtree that is not a leaf. If all of the actions at the leaves of that subtree are present in the intersection of the node's leaf-children's action sets, the node may be replaced by its non-leaf child.

Table 2: Simplify a policy tree through intersection of subtrees that contain the same actions.

function Intersect(policy) returns a policy
input: policy, a tree-structured policy

1. if policy is a leaf, return policy
2. Recursively call Intersect on each child in policy.
3. if every child of policy is a leaf
   return ∩_i child_i.actions, the intersection over all children
   child_i of policy
4. if exactly one child, child_NL, of policy is a non-leaf
   a. get A_L, the intersection of the leaf-children's action sets
   b. get A_NL, the set of all actions at the leaves of child_NL's subtree
   c. if A_NL is a subset of A_L, return child_NL
5. return policy

An individual action is considered Independent for an agent i in a particular leaf of a policy tree if the action is optimal when paired with any of the other individual actions that its teammates may choose at that leaf. Figure 5 shows an example of a leaf in which an Independent action can be discovered for agent 1, and one in which no Independent action exists.

Figure 5: Independent action examples. (a) In this leaf, action a can be performed independently by agent 1, since it forms an optimal joint action with x and y, all the possible actions that agent 2 might perform at this leaf. (b) There is no Independent action for agent 1 in this leaf.

The full process of transforming a factored joint policy into factored individual policies for each agent can be found in Table 3. First, the joint policy is re-written for each agent so as to move the state features visible to that agent to the upper-most nodes of the tree. This is done to increase the likelihood that, during execution, an agent will be able to traverse portions of its policy without requiring communication. This rewritten tree is processed to discover the Independent actions for the agent at each leaf. Leaves containing independent actions are labeled with those actions, while those where the set of independent actions is empty retain their joint action sets. If any ties remain among the joint actions, they are now broken according to a predetermined canonical action ordering. This is necessary to avoid equilibrium-selection mis-coordinations like the one shown in Figure 6. At this point, the joint actions in the policy tree are converted to individual actions. Finally, Intersect is performed and the tree is simplified one final time.

Table 3: Transform a factored joint policy into factored individual policies for each agent.

function generateIndividualPolicies
for each agent:
1. Make a joint policy for that agent, with its state variables
   at the root.
2. For each leaf in the policy, find Independent actions.
3. Break ties among remaining joint actions using canonical
   action ordering.
4. Convert joint actions to individual actions.
5. Intersect and Simplify.

5. EXPERIMENTAL RESULTS
We demonstrate the effectiveness of our approach on an example problem, a Meeting-Under-Uncertainty domain. In this problem, two agents must meet at a predetermined location in a grid world and, when both are at the goal location, simultaneously send up a Signal. The other individual actions available to the agents are North, South, East, West, and Stop, with movement actions succeeding with 0.9 probability. The agents receive a reward of +20 for signaling together, and receive penalties for either

4
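The Intersect operation of Table 2 and the Independent check illustrated in Figure 5 can be sketched as below for a two-agent team. This is an assumed rendering: leaves hold sets of tied optimal joint actions as (agent-1 action, agent-2 action) pairs, and all class and function names are illustrative.

```python
# Sketch of Table 2's Intersect and the Independent check of Section 4,
# for a two-agent team.  Leaves carry sets of tied joint actions.

class TieLeaf:
    def __init__(self, actions):
        self.actions = set(actions)  # set of (a1, a2) joint-action pairs

class TieNode:
    def __init__(self, variable, children):
        self.variable = variable
        self.children = children     # feature value -> subtree

def leaf_actions(tree):
    """All joint actions appearing at the leaves of a subtree."""
    if isinstance(tree, TieLeaf):
        return set(tree.actions)
    out = set()
    for child in tree.children.values():
        out |= leaf_actions(child)
    return out

def intersect(tree):
    """Collapse nodes whose children share common optimal actions."""
    if isinstance(tree, TieLeaf):
        return tree
    tree.children = {v: intersect(c) for v, c in tree.children.items()}
    kids = list(tree.children.values())
    leaves = [c for c in kids if isinstance(c, TieLeaf)]
    non_leaves = [c for c in kids if not isinstance(c, TieLeaf)]
    if leaves:
        common = set.intersection(*(l.actions for l in leaves))
        if not non_leaves and common:
            return TieLeaf(common)                           # step 3
        if len(non_leaves) == 1 and leaf_actions(non_leaves[0]) <= common:
            return non_leaves[0]                             # step 4
    return tree

def independent_actions(joint_actions, agent):
    """Independent: agent's actions optimal with every teammate choice."""
    other = 1 - agent
    joint = lambda a, o: (a, o) if agent == 0 else (o, a)
    mine = {ja[agent] for ja in joint_actions}
    others = {ja[other] for ja in joint_actions}
    return {a for a in mine
            if all(joint(a, o) in joint_actions for o in others)}

# Figure 5(a): leaf {ax, ay, bx} -> agent 1 can independently choose 'a'.
print(independent_actions({("a", "x"), ("a", "y"), ("b", "x")}, 0))
```

On the Figure 5(b) leaf {ax, by, cz}, the same check returns the empty set, matching the paper's statement that no Independent action exists there.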
Figure 4: Intersect example. Agent 1 can take individual actions {a, b, c} and agent 2 can take actions {x, y, z}. (a) The node labeled V4 is simplified by Intersect because all of its children are leaves. The intersection of the leaves' action sets is {ax, ay}. (b) V2 is simplified because it has only one child that is not a leaf (the subtree rooted at V3). Its leaf-children's action sets have the intersection {ax, ay}, which includes all of the actions at the leaves of the non-leaf subtree. (c) No further intersection is possible.

mis-coordinating their Signal actions or signaling from the wrong location. The team incurs a cost of -1 for each timestep that it takes them to reach the goal and Signal. Each agent observes its own location (X1 and Y1 for agent 1, and X2 and Y2 for agent 2) but does not know the position of its teammate. This is a domain that admits a high degree of factorability and allows for considerable context-specific independence between the two agents. While the agents are moving toward the goal location, they can act independently. They only need to coordinate once they have reached the goal.

We applied our algorithm to grid worlds of various sizes. Figure 8 shows a factored individual policy for agent 1 operating in the 3-by-3 grid world shown in Figure 7, with the goal at location (2,3). A visual inspection of the policy shows that, as expected, for most of the state space, agents can act independently. We verified this by measuring the amount of communication needed in an average run of the problem.

Figure 7: Agents in a 3-by-3 grid world. The goal is at (2,3).

Figure 8: A factored policy for Agent 1 in a 3-by-3 Meeting-Under-Uncertainty problem.

We compared our execution method to a team operating with free communication and to Dec-Comm, an execution-time communication algorithm that chooses joint actions by requiring agents to maintain estimates of the possible joint beliefs of the team [13]. We ran 1000 trials of each method, starting the agents in random locations. The complete results for the 3-by-3 grid world are shown in Table 4.

Our factored algorithm achieves the same average reward as a team communicating at every timestep, but requires significantly fewer instances of communication. This confirms our intuition that the Meeting-Under-Uncertainty domain admits a great deal of context-specific independence between agents. The Dec-Comm algorithm, an approach specifically designed to minimize communication usage, is able to communicate much less frequently than a team executing a factored policy, but does so by sacrificing some amount of expected reward.

Figure 9 shows the amount of communication used by a team executing a factored policy, as a percentage of the amount of communication used by a team with full communication, for grid worlds of increasing size, from 3-by-3 (82 states) up to 7-by-7 (2402 states). As the problem size increases in this domain, the number of states in which agents can act without coordinating increases, leading to even greater communication savings over a fully-communicating team.

5
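The Meeting-Under-Uncertainty dynamics described above can be sketched as a single joint transition function. The paper specifies the 0.9 movement success probability, the +20 joint-Signal reward, and the -1 per-timestep cost, but not the magnitude of the mis-coordination penalty, so MISCOORD_PENALTY below is a placeholder assumption, as are the coordinate conventions and function names.

```python
import random

# Sketch of the Meeting-Under-Uncertainty joint transition.  The
# mis-coordination penalty magnitude is not given in the paper, so
# MISCOORD_PENALTY is an illustrative placeholder (any negative value).

MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0),
         "West": (-1, 0), "Stop": (0, 0)}
MISCOORD_PENALTY = -10  # assumption, not from the paper

def step(positions, actions, goal, size, rng=random.random):
    """One joint transition: movement actions succeed with probability 0.9."""
    new_positions = []
    for (x, y), act in zip(positions, actions):
        if act in MOVES and rng() < 0.9:
            dx, dy = MOVES[act]
            x = min(max(x + dx, 1), size)  # clamp to the size-by-size grid
            y = min(max(y + dy, 1), size)
        new_positions.append((x, y))
    signals = [a == "Signal" for a in actions]
    if all(signals) and all(p == goal for p in positions):
        reward = 20                  # coordinated Signal at the goal
    elif any(signals):
        reward = MISCOORD_PENALTY    # mis-coordinated or misplaced Signal
    else:
        reward = -1                  # per-timestep cost while moving
    return new_positions, reward

# Deterministic rng for illustration: both agents signal at the goal (2,3).
print(step([(2, 3), (2, 3)], ["Signal", "Signal"], (2, 3), 3,
           rng=lambda: 0.0))  # -> ([(2, 3), (2, 3)], 20)
```

Because each agent's position changes only with its own movement action, the transition factors per agent; only the reward couples the team, which is why the agents can move independently and need coordinate only for the final Signal.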
Figure 6: An example of mis-coordination that can occur if ties are not broken using a canonical action ordering. V1 is a state feature local to agent 1, and V2 is local to agent 2. (a) An individual policy is generated for each agent, with its local variable at the root. (b) The policies are converted into policies over individual actions, without an intermediate tie-breaking step. (c) The resulting simplified policies indicate that agent 1 should always perform action a and agent 2 should always perform x, making the joint action ax. This is an uncoordinated action whenever V1 = 0.

Table 4: Complete results for the 3-by-3 Meeting-Under-Uncertainty domain.

                     Mean Reward (σ²)   Mean Messages Sent (σ²)   Mean Variables Sent (σ²)
Free Communication   17.484 (1.069)     7.032 (1.069)             14.064 (1.069)
Factored Execution   17.484 (1.069)     3.323 (1.050)              6.646 (4.199)
Dec-Comm             16.378 (1.234)     1.153 (1.153)              2.306 (0.498)

Figure 9: Communication usage as a function of problem size.

One major challenge posed by multi-agent problems is the exponential growth in the number of states and joint actions as the number of agents increases. However, as the agents may be independent of their teammates for large portions of the state in some multi-agent domains, the single-agent factored policies needed to control the agents may be very compact. Table 5 shows that the number of leaves in each agent's factored individual policy grows linearly as the number of agents increases from two to five in a 3-by-3 Meeting-Under-Uncertainty domain.

6. CONCLUSION
In this work, we show how factored policy representations can be used for coordinated decentralized execution of a multi-agent team, providing an efficient answer to the questions of when and what a team should communicate and allowing agents to act independently of their teammates until they reach a state in which potential mis-coordination could take place. We provide a post-processing technique for transforming a centralized factored policy into individual factored policies for each member of a team, and show how this technique discovers context-specific independences between agents. Our preliminary experimental results are promising, showing that our work enables agents to act independently and without communication under favorable state conditions, and discovers automatically those conditions in which agents must coordinate with their teammates.

Although the Structured Policy Iteration algorithm that we discuss in this paper generates optimal policies for MDP domains, it is important to note that our approach does not require optimal joint policies as input, nor does it de-
6
[5] C. V. Goldman and S. Zilberstein. Optimizing
Table 5: Policy-tree size grows linearly with the information exchange in cooperative multi-agent
number of agents in a 3-by-3 Meeting-Under- systems. In International Joint Conferences on
Uncertainty domain, even as the number of states Autonomous Agents and Multi-agent Systems, 2003.
and joint actions grow exponentially.
[6] C. V. Goldman and S. Zilberstein. Decentralized
Policy control of cooperative systems: Categorization and
Joint Tree
complexity analysis. Journal of AI Research, 2004.
States Actions Leaves
2 agents 82 36 9 [7] C. Guestrin and G. Gordon. Distributed planning in
3 agents 730 216 13 hierarchical factored MDPs. In Uncertainty in
4 agents 6562 1296 17 Artificial Intelligence, 2002.
5 agents 59050 7776 21
[8] C. Guestrin, S. Venkataraman, and D. Koller. Context
specific multiagent coordination and planning with
factored MDPs. In AAAI Spring Symposium, 2002.
pend on the use of one particular MDP solution technique.
The algorithms that we present in this paper can be applied [9] E. Hansen and Z. Feng. Dynamic programming for
to transform a joint policy of any quality into single-agent POMDPs using a factored state representation. In
policies with equal expected reward. All that is required is International Conference on AI Planning Systems,
that the joint policies be transformable into decision trees 2000.
over state variables. For example, the SPUDD algorithm
[10] J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier.
efficiently solves Markov decision processes in which rela-
SPUDD: Stochastic planning using decision diagrams.
tionships among state variables are represented as algebraic
In Uncertainty in Artificial Intelligence, 1999.
decision diagrams (ADDs) [10], which can easily be trans-
formed into trees. Thus, our work can build on any advances [11] R. Nair, M. Roth, M. Yokoo, and M. Tambe.
that arise in the area of generating tree-structured policies Communication for improving policy computation in
for MDPs to solve increasingly large problems. distributed POMDPs. In International Joint
Conferences on Autonomous Agents and Multi-agent
There are several directions in which our work can be ex- Systems, 2004.
tended. Currently, our execution algorithm does not take
into account the cost of communication. An analysis of [12] D. V. Pynadath and M. Tambe. The communicative
the value of information could be integrated into the rea- Multiagent Team Decision Problem: Analyzing
soning to allow agents to decide if the cost of a poten- teamwork theories and models. Journal of AI
tial mis-coordination is high enough to justify communi- Research, 2002.
cation. Additionally, our work focuses on collectively ob-
servable multi-agent domains modeled by Dec-MDPs. It is [13] M. Roth, R. Simmons, and M. Veloso. Reasoning
possible that work on factored POMDPs (e.g. [9] could be about joint beliefs for execution-time communication
applied in a similar fashion to the decentralized execution decisions. In International Joint Conferences on
of Dec-POMDPs, modeling teams with collective partial ob- Autonomous Agents and Multi-agent Systems, 2005.
servability. There is also significant work to be done in char-
acterizing multi-agent domains that exhibit a high degree of
context-specific independence between teammates, as those
are the domains that would benefit most from the applica-
tion of our techniques.

7. REFERENCES
[1] D. S. Bernstein, R. Givan, N. Immerman, and
S. Zilberstein. The complexity of centralized control of
Markov decision processes. Mathematics of Operations
Research, 2002.

[2] C. Boutilier, R. Dearden, and M. Goldszmidt.


Stochastic dynamic programming with factored
representations. Artificial Intelligence, 2000.

[3] C. Boutilier, N. Friedman, M. Goldszmidt, and


D. Koller. Context-specific independence in Bayesian
networks. In Uncertainty in Artificial Intelligence,
1996.

[4] T. Dean and K. Kanazawa. A model for reasoning


about persistence and causation. Computational
Intelligence Journal, 1989.

GUARDIANS: a Swarm of Autonomous Robots for Emergencies

Jacques Penders
Materials and Engineering Research Institute
Sheffield Hallam University, UK
J.Penders@shu.ac.uk

Enric Cervera
Robotic Intelligence Lab
Jaume-I University, Spain
enric.cervera@icc.uji.es

Ulf Witkowski
Heinz Nixdorf Institute
University of Paderborn, Germany
witkowski@hni.uni-paderborn.de

Lino Marques
Institute of Systems and Robotics
University of Coimbra, Portugal
lino@isr.uc.pt

Jeremi Gancet
Space Application Services, Belgium
leif.steinicke@spaceapplications.com

Pierre Bureau
K-Team, Switzerland
pierre.bureau@k-team.com

Veysel Gazi
Dept. of Electrical and Electronics Engineering
TOBB University of Economics and Technology, Turkey
vgazi@etu.edu.tr

Roberto Guzman
Robotnik Automation, Spain
rguzman@robotnik.es

Abstract

The GUARDIANS¹ are a swarm of autonomous robots applied to navigate and search, in particular, an industrial warehouse in smoke. This is a time-consuming and dangerous job: toxics may be released and, as their senses may be impaired, humans may get lost. The robots warn for toxic chemicals, provide and maintain mobile communication links, infer localization information and assist in searching. They enhance operational safety and speed and thus indirectly save lives. This paper presents the state-of-the-art and the aims of the project in the areas of swarm robotics, chemical sensors, and communication strategies. Though the project is still at its beginning, a real fire-fighting scenario is envisioned, and previous results by the project partners are expected to be merged and enhanced, in order to bring together new potential impact in the use of teams of robots in emergency setups.

¹Group of Unmanned Assistant Robots Deployed In Aggregative Navigation by Scent; EU-project contract IST-045269; GUARDIANS is part of the EC Cognitive Systems initiative.

1 Introduction
The GUARDIANS (Group of Unmanned Assistant Robots Deployed In Aggregative Navigation by Scent) are a swarm of autonomous robots applied to navigate and search an urban ground. Several robot platforms are used: off-the-shelf mini-robots as well as special purpose middle-sized robots. The robots are provided with an artificial nose.

The project's central use-case scenario is an industrial warehouse in smoke, as proposed by the South Yorkshire Fire and Rescue Service. Searching the warehouse is time consuming and dangerous; toxics may be released and human senses can be severely impaired. Humans become disoriented and may get lost. The robots navigate autonomously, individually but also as a coherent swarm. Apart from the swarming technology, a novelty of the project is that the swarm is able to adequately assist and safeguard a human squad leader. The robots therefore:
1. provide and maintain mobile communication links,
2. infer localization information,
3. detect and warn for toxic chemicals, and
4. assist in searching.
Thus, they enhance operational safety and speed up the search job, and so indirectly may save lives.

The robots connect via a wireless network to the squad-leader and the control station. The autonomous swarm operates in communicative and non-communicative mode. The communication network is self-organizing, and automatic service discovery is applied: the robots find peers to help them. The aim is for flexible and seamless switching between these modes to compensate for loss of communication signals. An artificial nose enables the robots, as well as the swarm as a collective, to apply olfactory-based navigation and toxic plume detection.

The wireless network also enables the robots to support a human squad-leader operating within close range: they maintain communication connectivity, provide localization information, and warn for and try to avoid chemicals. Toxic plume detection by the robots allows safe trespassing for the human squad-leader.

The rest of the paper is organized as follows. Section 2 describes the major use case of the project and the overall components of the proposed solution. Sections 3, 4, and 5 respectively present the state-of-the-art and expertise of the project partners in the key areas addressed by the project, namely: robot swarms, ad-hoc mobile networking, and olfactory-based mobile robots. The ultimate integration of such components is provided by a specific human-robot swarm interface, whose main functionalities are outlined in Section 6. To conclude, Section 7 analyses the potential impact and outcomes of the project in the addressed key areas.

2 Fire ground search technology
The major use-case scenario and test case is proposed by the Fire and Rescue Services (Fig. 1) and consists of searching an industrial warehouse or basement. As part of the economic developments, industrial warehouses are dramatically increasing in size. However, the standard procedures to search a warehouse date back to the Second World War. In recent years, five fire fighters died in the UK as they got disorientated in a warehouse search [London Fire Brigades Union, 1991]. The firefighters have a maximum of 20 minutes to enter the fire ground, search and exit the building safely. Any review of the procedures, and even small improvements, are desperately needed. The consortium believes that the application of a swarm of ad-hoc networking robots will lead to very significant improvements.

Figure 1: South Yorkshire Fire and Rescue Service (project partner). Up: incident support unit. Down: training setup.

Before human beings are to enter a fire ground, the human commander has to decide whether the ground is (relatively) safe and what equipment and gear to use. To enhance and speed up decision making, the Fire and Rescue Services urgently require sensors or other devices to assess the fire ground for the presence of flammables, explosives, leaked chemicals, high radiation levels, etc. The GUARDIANS robots are to access and assess scenes for dangers to human beings; they search for toxic chemicals under extreme conditions: poor visibility, radiation, etc. The robots are equipped with a range of sensors for the registration of temperature, atmospheric pressure, and the presence of different chemicals.

The robot swarm can be applied with or without a human squad leader. Depending on the configuration, the robot swarm has the following roles.

Without a human squad leader:
• search and navigate through the warehouse
• maintain communication connections
• designate one or more robots as positioning beacons
• exchange position data with the base station
• detect and possibly locate toxic agents
• possibly detect casualties

With a human squad leader the additional roles are:
• navigate the squad leader through the warehouse
• exchange the squad leader's position data with the base station
• warn for toxic agents
• call the squad leader's attention to objects of interest
• maintain the communication link between the fire fighter and the control base

The abundance of robots in the work space enables them to switch between jobs without loss of overall functionality. For instance, at first all robots might be involved in searching, but when the swarm is advancing several robots might stay behind as positioning beacons and to maintain communication connectivity. Thus the project delivers:
• a mobile ad-hoc communication network
• a mobile positioning system, carrying its own beacons
• a mobile toxic alarming system
• robot search assistants

3 Robot swarming
The GUARDIANS swarm will consist of many simple off-the-shelf mini robots; they are 13 cm in diameter, able to move on floor surfaces with minor obstacles such as doorsteps. The
swarm will also comprise a few larger middle-size robots: USAR (Urban Search and Rescue) robots, which will be assembled in the project (see Fig. 2). The project will equip the mini robots with special task sensors (electronic nose). The middle-size robots in addition carry more navigation sensors and a large array of gas and vapor sensors. The robots are to communicate wirelessly with peers and a base station; they may use Wireless LAN, Bluetooth and ZigBee and may switch between these technologies.

Figure 2: Small, middle-size, and large robot platforms to be used in GUARDIANS.

During the operation the robots have to keep track of their positioning and maintain their communication link. However, while advancing, communication failures might be caused by loss of connectivity, or even the loss of a robot. When the wireless signal is lost, the swarm and robots will autonomously search to restore the wireless connection, data transmission being deferred until contact is restored. Moreover, while the swarm is advancing, some robots may remain behind as position beacons and/or as communication nodes. The swarm will also apply a service discovery architecture where robots assist each other in executing a task or limit themselves to a particular job. The swarm can be guided from the base station and monitored by a central control unit (with optional human supervision).

The autonomous swarm operates in communicative and non-communicative mode. In non-communicative mode, when a communication link is failing, the individual robots make navigation decisions based only on sensor feedback. Swarm behavior results from the robots reacting to each other whilst applying artificial potential functions [Gazi, 2005a; Gazi, 2005b].

Swarm research and swarm robotics is a relatively new field that has gained momentum in recent years. Since the pioneering work by [Reynolds, 1987] on simulation of a flock of birds in flight (using a behavioral model based on a few simple rules and only local interactions), the field has witnessed many developments. Several different approaches have been considered for swarm aggregation, navigation, coordination and control. While initial research focused on centralized [Latombe, 1991; Liu et al., 1989; Barraquand et al., 1992] and leader-based [Desai et al., 2001] approaches, most of the recent research has concentrated on decentralized approaches, which offer reduced computational complexity and robustness to failures. Such approaches include behavioral [Balch and Arkin, 1998], artificial potential functions [Reif and Wang, 1999; Egerstedt and Hu, 2001; Gazi and Passino, 2003; Gazi and Passino, 2004b; Gazi and Passino, 2004a; Gazi, 2005a; Gazi, 2005b], virtual agents or virtual structures [Bachmayer and Leonard, 2002; P. Ögren et al., 2002], probabilistic [Stilwell et al., 2005], and others [Tanner et al., 2003a; Tanner et al., 2003b]. The open-loop approaches use optimization criteria from game theory for navigation control [Wangermann and Stengel, 1999], or distribution and area coverage criteria [Cortes et al., 2004]. There are also works dealing with improving system performance through adaptation and learning [Patnaik et al., 2005; Uchibe et al., 1999; Asada and Hosoda, 1999]. Some of these works use global information while others are based on local interactions and rules. Moreover, besides bio-inspired models, control-theoretic approaches are currently researched as well. Some surveys/collections on recent advances and the state of the art in swarms can be found in [Dorigo and Sahin, 2004; Sahin and Spears, 2005; Kumar et al., 2005], and a web database on swarm robotics related literature is currently being compiled at http://swarm-robotics.org/.

Swarms are being investigated in many research projects, most of which with a focus on robot design and implementation: the Swarm-bots project (http://swarm-bots.org/); the I-Swarm (Intelligent Small World Autonomous Robots for Micro-manipulation) project (http://www.i-swarm.org/); the Open-Source Micro-Robotic project (http://www.swarmrobot.org/); the SWARMS (Scalable sWarms of Autonomous Robots and Mobile Sensors) project (http://www.swarms.org/); and the CentiBOTS project at SRI.
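The artificial potential function approach cited above ([Gazi, 2005a; Gazi, 2005b]) can be sketched as a per-robot velocity update that combines goal attraction, obstacle repulsion, and pairwise inter-robot attraction/repulsion. The gains and the linear-attraction/inverse-repulsion force laws below are illustrative assumptions, not the project's actual controllers:

```python
import math

# Toy sketch of an artificial-potential swarm update: each robot's velocity
# combines goal attraction, obstacle repulsion, and pairwise inter-robot
# attraction/repulsion. Gains and force laws are illustrative assumptions.

def step(robots, goal, obstacles, dt=0.1,
         k_goal=1.0, k_obs=0.5, k_att=0.2, k_rep=0.2):
    new = []
    for i, (x, y) in enumerate(robots):
        fx = k_goal * (goal[0] - x)          # external attraction to the goal
        fy = k_goal * (goal[1] - y)
        for ox, oy in obstacles:             # external repulsion from obstacles
            dx, dy = x - ox, y - oy
            d2 = dx * dx + dy * dy + 1e-9
            fx += k_obs * dx / d2
            fy += k_obs * dy / d2
        for j, (rx, ry) in enumerate(robots):  # internal attraction/repulsion
            if j == i:
                continue
            dx, dy = rx - x, ry - y
            d = math.hypot(dx, dy) + 1e-9
            # linear attraction minus inverse repulsion: they balance at an
            # equilibrium distance d* where k_att * d* = k_rep / d*
            mag = k_att * d - k_rep / d
            fx += mag * dx / d
            fy += mag * dy / d
        new.append((x + dt * fx, y + dt * fy))
    return new
```

With these choices the pairwise forces balance at d* = sqrt(k_rep / k_att): two robots started far apart (and no goal force) drift toward a spacing of 1.0, while robots started too close push apart — the coherence-with-separation behavior the potential-function literature describes.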
Despite significant advances, the basic mechanisms of swarm behavior have not been fully elucidated [Penders et al., 2004]. Many of the above projects are hardware based, with the objective to develop hardware platforms for swarm systems. However, a problem of equal importance is to develop mechanisms for coordination and control of swarms. The GUARDIANS project aims to contribute to a better understanding of swarming behavior. Moreover, the objective is not only to contribute to the swarm robotics literature but also to apply the developments in a real-world application.

The GUARDIANS project will build upon the previous research on swarm robotics and develop it in two new directions:
1. the swarm has to be able to navigate the area and build up and maintain the communication channel with the outside world;
2. the swarm has to support a human squad leader in searching a hazardous environment in which the human senses partly fail.

3.1 Non-communicative swarming
The basic operations of the GUARDIANS swarm are navigating and searching. The robots will be able to successfully carry out these operations without direct communication between each other. Each robot agent will process information locally, but due to the nature of the algorithms that can be developed, the robots will seem to perform their tasks collectively. The searching operations are implemented as very basic behaviors using decentralized swarming methods. The initial GUARDIANS approach is based on artificial potential functions, though various other methods for searching and effective coverage of the area, including heuristic/ad-hoc, gradient-based, or probabilistic approaches, will be developed and implemented as well.

The control model. The control model of the robots consists of the mathematical equations governing the robots and the swarm. We relate our model to those of [Reif and Wang, 1999; Yamaguchi, 1999] and the more recent models of [Gazi, 2005a] and [Gazi, 2005b]. The model is given by equation (1); F_a is the basis for the new velocity vector of the robot a and depends basically on four terms. The first two terms represent the external influences: EA(G,a) is the attraction of the goal G on robot a, and ER(o,a) is the repulsion caused by the unidentified obstacle o ∈ O on robot a. The second pair of terms in (1) consists of the internal forces, which originate amongst the robots in the swarm Sw. They are the attraction IA(r,a) and repulsion IR(r,a) exerted by the swarm members r on a. The attraction points directly towards the source object and the repulsion points in the opposite direction, away from the source object.

\vec{F}_a = \vec{EA}_{(G,a)} + \sum_{o \in O} \vec{ER}_{(o,a)} + \sum_{r \in Sw,\, r \neq a} \vec{IA}_{(r,a)} + \sum_{r \in Sw,\, r \neq a} \vec{IR}_{(r,a)}    (1)

The internal attraction IA(r,a) and internal repulsion IR(r,a) are sometimes called the artificial social potential functions [Reif and Wang, 1999], as their combination induces coherence in the swarm. The functions are chosen such that at large distances the attraction IA dominates, at short distances the repulsion IR dominates, and at a particular distance attraction and repulsion balance; this is called the equilibrium distance [Reif and Wang, 1999]. Note that in the case when only internal attraction applies but no repulsion, the robots will chase each other; if only repulsion applies, the robots will disperse. In the last two sums, only the neighbors of robot a could be used (i.e., those robots which can be sensed without any communication).

Special applications. Non-communicative swarming is considered a very basic ability of the swarm, which the swarm must be able to apply when communication fails. Also, if tasks are achievable by non-communicative swarming, the communication can be dedicated to other purposes. The control model (1) is intended for general navigation; however, it can be adjusted to special purposes, such as recovery from a disrupted communication network and olfactory-based navigation.

In case all communication links are failing, the robots have to search individually for a communication signal. This behavior is achieved by manipulating the first two terms of the control model (1): EA(G,a), the external attraction, and ER(o,a), the external repulsion.

The robots are individually able to apply olfactory-based navigation. For the swarm of robots, non-communicative olfactory-based navigation is also obtained by manipulating the first two terms of the control model (1): EA(G,a), the external attraction, and ER(o,a), the external repulsion.

3.2 Communicative swarming
Basically, in what we are calling communication-based swarming, the robots use a communication channel to exchange symbols, as opposed to non-communicative swarming, which is based on physical actions (usually movements) that change the environment, and reactions to these changes. In the context of robot swarms this distinction is also referred to as between explicit and implicit communication [Parker, 2000]. In communication-based swarming several approaches can be found in the literature, with an abundance of multi-agent based approaches; refer to [Farinelli et al., 2004] for a recent overview.

Whereas searching and navigation can be performed without the individuals communicating, cooperation with communication can dramatically improve the performance of the system. Moreover, several other tasks essentially require communication; examples are building and maintaining an overview of the terrain, localization and position detection. More importantly, the swarm can make use of communication channels to effectively build up a division of labour, with some robots only acting as a positioning beacon and communication node, while others continue pursuing multiple tasks (refer to service discovery below).

For the GUARDIANS project we distinguish two distinct swarm configurations: a locally communicating swarm and a coordinator-based configuration. In the locally communicating swarm the robots are considered to be peers that exchange
data amongst themselves, but each robot makes its own decisions. In the coordinator-based configuration the robots exchange data with the base station. The base station thus acquires an overview from the data and draws conclusions at the swarm level. The GUARDIANS project will have both configurations operational and will develop seamless switching between them.

3.3 Combining communicative and non-communicative swarming
Basic swarming behavior such as aggregation and coordinated motion, navigation, or even specialization through learning [Li et al., 2004] can be achieved without communication. However, as mentioned above, communication can drastically improve the performance of the system. In GUARDIANS each agent will be capable of performing both communicative and non-communicative swarming and, based on the availability of the communication channels, will operate in either mode. This will not only help to enhance the performance, but will also improve the robustness of the system and potentially decrease robot casualties. While operating in non-communicative mode each robot will continue collecting data about the environment, which, as soon as communication is established, it will share with the other swarm members, the squad leader, and the base station; it will also request help or other tasks. A novel aspect for swarming research (to the best of our knowledge not done before) is that in order to maintain communication connections, a swarming algorithm has to be built to make the robots keep track of the strength of the communication signals, and when necessary search for a signal or a stronger signal.

4 Ad-hoc mobile networking and service discovery
The mobile robots will form a wireless ad hoc network for communication. Various wireless communication standards have been established in recent years, including Wireless LAN, Bluetooth and ZigBee. They differ in network size, radio range, data rate and power consumption. Wireless LAN is suited to high data rate and high range communication at the cost of high power consumption. Bluetooth has a lower data rate and transmission range but in turn significantly lower power consumption. ZigBee is highly scalable with even lower power consumption, but with a trade-off of lower data rates.

Mobile ad hoc networks are wireless networks consisting of mobile nodes with a typically local (decentralized) organization and potentially high dynamics of the network structure; they have a dynamic or adjustable topology. In a swarm, in contrast to a small group, communication by simply broadcasting becomes infeasible in large-scale networks due to scalability problems and the limited range of communication devices. The GUARDIANS project utilizes the flexible topology to deal with communication barriers that are very likely present in the warehouse; to our knowledge the project is the first to apply ad-hoc networking techniques for this purpose.

One major challenge in wireless ad hoc networks, particularly in mobile ad-hoc networks, is the design of efficient routing algorithms. The communication protocol should be resilient to node failures, scalable, secure, and efficient, particularly regarding the transmission bandwidth. Two kinds of communication take place: first, short range communication within a group of robots, and second, middle range communication between networks of robots, static activated sensors, humans and the base station.

Generally, topology control can be regarded as the task of constructing a (connected) subgraph with certain required properties from a given graph representing the communication network. Topology control can be used to reduce the energy consumption and/or interference in large networks by adapting the transmission ranges of the network nodes or creating a connected dominating subset of nodes to which regular nodes connect. The risk is, however, the loss of connectivity in the network through the removal of connection links. In the scenario we consider here, the availability and robustness of the network connection is the most important property; energy consumption and scalability are only secondary concerns.

In communicative mode the robots in the swarm can coordinate their actions, for instance by negotiating as a multi-agent system. Important when applying the swarm in rescue operations is that in this mode the swarm applies automatic service discovery, that is: the robots find peers to help them to solve a local problem [Du et al., 2005].

The problem of efficient service discovery in mobile ad-hoc networks is still unsolved. Known solutions for service discovery (like Jini, JXTA, SLP, UPnP, Bluetooth SDP or Salutation) were not designed with such highly dynamic networks in mind.

Robots need the ability to mutually offer each other services in an open, heterogeneous network, and they must be able to efficiently discover and use those services (Fig. 3). Only this feature will allow them to cooperate flexibly and dynamically to solve tasks. Thus, service discovery allows a constant, task-oriented self-configuration of the overall system to achieve the required goals.

Figure 3: Basic idea of service discovery.

Besides the autonomous operation modes, the swarm can also guide the human squad-leader, meaning that a
semi-autonomous mode is applied, where the human being makes the overall decisions, while supported by mostly autonomously operating robots. The aim is for flexible and seamless switching between the various modes in order to compensate for loss of network signals and to maintain support of the squad-leader.

5 Olfactory-based robot swarm navigation
The swarm will navigate autonomously through the terrain using 'conventional' SLAM sensors. An additional navigation mode is based on data from the olfactory sensors.

The possibility to measure gas concentrations with a mobile robot enables a broad range of applications, ranging from surveillance of environmental pollutants [Marques et al., 2003] and the detection of hazardous gases, to self-produced odors to aid navigation [Larionova et al., 2006]. So far, the main problem studied in olfactory navigation has been the tracking and localization of static odor sources [Almeida et al., 2004]. In real situations, this problem can be broken into three sub-problems: finding traces of the chemical of interest; tracking the respective odor plume; and identifying the odor source.

Figure 4: Electronic nose, and search path of a swarm.

Our focus is on swarming using olfactory-based navigation. Efficient swarming algorithms for terrain surveying based on the spatial concentration of odor fields are developed. Particularly the effects of airflow and diffusion need to be addressed in the swarming algorithms. Both olfactory-based communicative swarming (enabling efficient coverage of the area) and non-communicative olfactory-based swarming (a robust fall-back option) are developed.

6 Human robot swarm cooperation
The GUARDIANS are a swarm of autonomous robots, a fact with far-stretching consequences for the human-robot interface. The human-swarm interface is essentially very different from the human-robot interfaces applied in telerobotics. In telerobotics (refer to the PeLoTe project, IST-2001-38873, or View-Finder, FP6-045541) several humans may operate one robot; in GUARDIANS, however, it is basically one human being cooperating with several robots.

6.1 Human squad leader
The objective of providing human control over the swarm of robots is to add some qualities which are not inherently available in the robotic swarm. Human control in swarm robotics allows for dynamic authoritative control of specific swarm activities based upon local circumstances and human expertise. In addition, feedback from local robots can enhance the task performance of the human being.

The competences and capabilities of individual robots and humans vary. In an emergency situation a robot's sensors, position awareness and access will be better than those of the human; by contrast, the human's problem-solving ability, interpretative powers and authority will exceed those of a robot. Having a human adds some qualities which are not inherently available in the robotic swarm, e.g. opportunism, inductive reasoning, improvisation on unexpected events, and learning from experience. However, no matter how intelligent the robotic system is, as demonstrated by many remarkable results in autonomous robotics, human supervision is still necessary, especially for critical and sophisticated tasks like search and rescue (SAR).

Swarm guidance algorithms are built based on the autonomous operations of the robots (distributive), to which (human-originating) tactical planning instructions might be added. The swarm will be monitored by human supervisors. At this level there are several new problems to be solved. One is to interweave human control with the (mostly) autonomously operating swarm and to enable the operator to single out individual sensor-carriers and take control over them.

6.2 Base station
The swarm of mobile robots can also be accessed via the base station. The base station is the central point for data collection and incorporates a decisive controller function for the entire operation. It operates in automatic mode, in semi-automatic mode or in operator mode while accessed via the user interface. The base station provides global instruction to the robots. The base station allows point-to-point connections between robots and the operator, as well as point-to-multipoint and broadcast connections, to communicate to the group of robots. The overall operations of the robot swarm will be monitored via the base station.

The base station incorporates:
1. A location mapping unit
2. A human-supervised operation command post.

The location mapping unit updates maps with position information received from the robots and, where applicable, from the human squad leader. At the base station a mixture of local, robot-centered information and global information (risk maps, obstacle maps, etc.) is available. From this mixture an overview of the data is compiled, mainly presented as maps. By overviewing the swarm at group level, conclusions may be deduced, for instance about areas considered cleared or about possible locations of the chemical plume source. These conclusions might be translated into sets of instructions to the swarm.

Whenever possible, the robots act autonomously and are only monitored by the supervisor. However, for complex group operations, and even though the robots have as high a level of autonomy as possible, it is still necessary to empower one or more operators to control them, especially in critical tasks like the cooperative search for explosives or chemical agents, and rescue missions. This is due to the fact that the robots have limited resources, meaning that there are conditions in which the robots do not know what to do. Intervention by human operators solves this problem and thus supports the robots in accomplishing the mission. Via the user interface the supervisor can adjust the degree of autonomy of the robots.

The base station assists the supervisor in controlling the swarm; it displays maps and visualizes the group behavior and automatically derived conclusions. The human operator is supported in the overall functions of strategic and continuous tactical planning for the swarm of robots. The operator can also acquire detailed information about the actions of the individual robots. The base station's human interface provides functions for group instructions as well as for direct (per-robot) task allocation.

7 Potential impact

Many current swarm-based projects seek to investigate the effects of swarming and learning in theoretical and controlled environments. The GUARDIANS project intends to take this to the next level by offering a practical, real-world application for such research. The research in the GUARDIANS project impacts on the following innovations:
1. Advancing fire ground search technology
2. New algorithms and methods for robot swarming technology
3. Multi-sensor data fusion in a swarm of robots
4. Ad-hoc communication and networking methods and service discovery for mobile robots
5. Olfactory-based navigation
6. Human-swarm cooperation

7.1 Fire ground search technology

South Yorkshire Fire and Rescue Service (SyFire) proposes the use-case scenario of searching an industrial warehouse covered in smoke. Performing research in this real-world scenario will deliver several important innovations and results:
• A review of the currently applied search strategy and procedures, which date back to the Second World War.
• From the robotics viewpoint, completely new strategies may arise that may be useful even without robots being applied.
• Applying robots, even when they are only partially successful, may enhance this time-consuming search task.
• A continuously evolving ad-hoc wireless network will be developed, which can be applied to robots as well as to human fire fighters.
• Several solutions for localization will be developed, some based on sensors, others on communication; each of these solutions might aid the current orientation problem of the fire fighters.
• A robot nose and an array of chemical sensors: devices to speed up the identification of possible toxic gases, enhancing decision making by the commander on whether and how the fire ground can be entered by a human fire fighter.
• Enhanced safety and efficiency of search and rescue operations.

7.2 Communicative and non-communicative robot swarming

Methods for robot swarming are currently an important topic of research. The GUARDIANS project is one of the first to apply swarm technology in a real-life scenario, which calls for a mixture of jobs. The importance of the jobs varies with the situation the robots find themselves in. The strength of applying a swarm is that each of the individual robots can dynamically be assigned to any of these tasks.

Autonomous swarming: two approaches are explored in the project, one without explicit communication and one with communication; the latter includes service discovery as well. The approaches are based on studying how multiple individual behaviors merge into group behavior, through which the group may accomplish higher-level tasks. This approach has deep implications for how cognitive systems can be designed and implemented.

Decentralized control, which the project aims at (usually the swarm operates unsupervised), is a relatively new and largely unexplored concept, although a lot of research is ongoing. The algorithms developed in the GUARDIANS project will be general enough to render them applicable in other applications and contexts as well. These applications include everyday surveillance tasks as well as other search and rescue operations. The project will thereby contribute to advancing the state of the art in the field. The proposed architecture might also be used in other applications such as Intelligent Vehicle and Highway Systems (IVHS).

7.3 Multi-sensor data fusion in a swarm

Research into the fusion of data from multiple sensors is a topic of current research in single-robot systems. The problem of data fusion in a swarm environment is at least an order of magnitude more complex. The research in GUARDIANS will address the
fusion of data from multiple robot sources. This in turn will govern the individual behaviors of the agents in the swarm.

In a way, the GUARDIANS robots will together act as one sensor network that can be used to identify safe passages and the location of human targets in environments made unsafe by chemical spillage or fire.

Techniques for target tracking, data association and identification have to be addressed and developed. In this context, multi-modal statistical techniques will be used to develop the framework for fusion processing. Delivery of this objective will offer new and novel methods that can combine sensor and classifier information in a distributed manner.

7.4 Ad-hoc mobile networking and service discovery

In built-up terrain, and in an industrial warehouse in particular, many metal obstacles may be present that pose several communication and connection problems. The wireless communication network is therefore typically self-organizing and aims at ad-hoc modifications, ranging from single point-to-point connections between local peers, via local group connections, to centralized connections with the base station.

The Fire and Rescue services urgently need wireless mobile communication systems that do not require network nodes to be positioned before the network becomes operational. The ad-hoc network, whose nodes are in this project carried by the robots, would be an ideal solution to the communication problem, implemented with or without robots.

The wireless network also provides one of the (rough) means to locate the robots and the human fire fighters on the map available at the base station. In applications, this functionality allows the operational command to keep track of a fire fighter even when the person is not, or is no longer, responding.

7.5 Olfactory-based robot swarm navigation

The GUARDIANS project also addresses olfactory-based navigation and chemical sensors. Their use in fire fighting is explained above. The application of these sensors on mobile robot swarms opens up new directions for various applications beyond the rescue services.

Robot swarms are particularly apt for source detection. First, as the safety requirements for a robot are much lower than for a human being, the robots can be deployed in high-risk environments; second, because the swarm spreads out, the robots quickly provide geographical data from which the position of the source might be inferred.

7.6 Human-robot swarm cooperation

The research on the human squad interface has to tackle several very new problems. The major requirement is that the human squad leader interface creates a feeling of confidence in the human. The robot swarm is to help the human navigate where the human senses are failing. Feedback from the robots enhances the task performance of the human. The interface has to be effective but simple, as any additional complexity of working with a robot swarm could distract the human from the potential utility of the swarm. The project has to develop a language to maintain control and coordination between the robots and the human. Moreover, given the conditions on the fire ground, the smoke and the noisy breathing equipment, the project cannot fully rely on the commonly used audio-visual communication means and is forced to look at other means of establishing communication between the human and the robots.

A solution that meets all or most of these requirements will be extremely novel, and the results might easily be transferable to many areas where a bond between a human and one or more robots is desirable. Applications of this innovation may vary from supporting humans operating in hazardous or dangerous environments (regularly experienced by rescue workers) up to supporting audio-visually handicapped people in their everyday lives.

Supporting human control: complex tasks are still in the realm of human reasoning and will be so for quite some time. It is important that, while the proposed robotic system is able to operate as independently as possible in automatic and semi-automatic modes, it allows and supports full human operator control. The project investigates ways in which the system can effectively interact with humans so as to support rapid decision making by providing full information in easily accessible formats; in particular, the outputs of the map-building activity and what they might mean, with uncertainties highlighted. Such communication strategies are relevant to all applications where geographical data are deployed, from simple home tasks to space exploration.

Acknowledgments

Support from the EU VI Framework Programme (IST-045269) is acknowledged. GUARDIANS is part of the EC Cognitive Systems initiative. Jaume-I University is also funded by the Spanish Ministry of Education-CICYT (DPI2005-08203-C02-01) and Generalitat Valenciana-IMPIVA (IMCOIC/2006/48).

References

N. Almeida, L. Marques, and A. T. de Almeida. Directional electronic nose setup results on the detection of static odour sources. In Proc. IEEE Int. Conf. on Sensors, Wien, Austria, 2004.
M. Asada, E. Uchibe, and K. Hosoda. Cooperative behavior acquisition for mobile robots in dynamically changing real worlds via vision-based reinforcement learning and development. Artificial Intelligence, 110(2):275–292, June 1999.
R. Bachmayer and N. E. Leonard. Vehicle networks for gradient descent in a sampled environment. In Proc. Conf. Decision Contr., pages 112–117, Las Vegas, Nevada, December 2002.
T. Balch and R. C. Arkin. Behavior-based formation control for multirobot teams. IEEE Trans. on Robotics and Automation, 14(6):926–939, December 1998.
J. Barraquand, B. Langlois, and J. C. Latombe. Numerical potential field techniques for robot path planning. IEEE Transactions on Systems, Man, and Cybernetics, 22(2):224–241, 1992.
J. Cortes, S. Martinez, T. Karatas, and F. Bullo. Coverage control for mobile sensing networks. IEEE Trans. on Robotics and Automation, 20(2):243–255, 2004.
J. P. Desai, J. Ostrowski, and V. Kumar. Modeling and control of formations of nonholonomic mobile robots. IEEE Trans. on Robotics and Automation, 17(6):905–908, December 2001.
M. Dorigo and E. Sahin. Special issue on swarm robotics. Autonomous Robots, 17(2–3), September 2004.
Jia Lei Du, Stefan Rührup, Ulf Witkowski, and Ulrich Rückert. Resource and service discovery for large-scale robot networks in disaster scenarios. In Proceedings of the IEEE International Workshop on Safety, Security and Rescue Robotics (SSRR 2005), Kobe, Japan, June 2005.
M. Egerstedt and X. Hu. Formation constrained multi-agent control. IEEE Trans. on Robotics and Automation, 17(6):947–951, December 2001.
Alessandro Farinelli, Luca Iocchi, and Daniele Nardi. Multirobot systems: A classification focused on coordination. IEEE Transactions on Systems, Man and Cybernetics, Part B, pages 2015–2028, 2004.
V. Gazi and K. M. Passino. Stability analysis of swarms. IEEE Trans. on Automatic Control, 48(4):692–697, April 2003.
V. Gazi and K. M. Passino. A class of attraction/repulsion functions for stable swarm aggregations. Int. J. Control, 77(18):1567–1579, December 2004.
V. Gazi and K. M. Passino. Stability analysis of social foraging swarms. IEEE Trans. on Systems, Man, and Cybernetics: Part B, 34(1):539–557, February 2004.
V. Gazi. Formation control of a multi-agent system using nonlinear servomechanism. International Journal of Control, 78(8):554–565, 2005.
V. Gazi. Swarm aggregations using artificial potentials and sliding mode control. IEEE Transactions on Robotics, 21(6):1208–1214, 2005.
V. J. Kumar, N. E. Leonard, and A. S. Morse, editors. Cooperative Control: 2003 Block Island Workshop on Cooperative Control, volume 309 of Lecture Notes in Control and Information Sciences. Springer-Verlag, 2005.
S. Larionova, L. Marques, and A. T. de Almeida. Olfactory coordinated mobile robot area coverage. Autonomous Robots, Special Issue on Mobile Robot Olfaction, 2006.
J. C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, 1991.
L. Li, A. Martinoli, and Y. S. Abu-Mostafa. Learning and measuring specialization in collaborative swarm systems. Adaptive Behavior, 12(3–4):199–212, 2004.
Y.-H. Liu, S. Kuroda, T. Naniwa, H. Noborio, and S. Arimoto. A practical algorithm for planning collision-free coordinated motion of multiple mobile robots. In Proc. IEEE International Conference on Robotics and Automation, volume 3, pages 1427–1432, May 1989.
London Fire Brigades Union. Fire Brigades Union investigation into fatal fire at Hays Business Services, 1991.
L. Marques, N. Almeida, and A. T. de Almeida. Olfactory sensory system for odour-plume tracking and localization. In Proc. IEEE Int. Conf. on Sensors, Toronto, Canada, 2003.
P. Ögren, E. Fiorelli, and N. E. Leonard. Formations with a mission: Stable coordination of vehicle group maneuvers. In Symposium on Mathematical Theory of Networks and Systems, August 2002.
L. E. Parker. Current state of the art in multi-robot teams. In Distributed Autonomous Robotic Systems 4, pages 3–12, 2000.
S. Patnaik, A. Konar, and A. K. Mandal. Improving the multi-agent coordination through learning. IETE Journal of Research, 51(5):395–406, September–October 2005.
J. Penders, L. Alboul, and M. Rodrigues. Modelling interaction patterns and group behaviour in a three-robot team. In Proceedings of TAROS 04, Colchester; Technical Report Series CSM-415, University of Essex, ISSN 1744-8050. http://cswww.essex.ac.uk/technical-reports/2004/csm415/, 2004.
J. H. Reif and H. Wang. Social potential fields: A distributed behavioral control for autonomous robots. Robotics and Autonomous Systems, 27:171–194, 1999.
C. W. Reynolds. Flocks, herds, and schools: A distributed behavioral model. Comp. Graph., 21(4):25–34, 1987.
E. Sahin and W. M. Spears, editors. Swarm Robotics, A State of the Art Survey. Lecture Notes in Computer Science 3342. Springer-Verlag, Berlin Heidelberg, 2005.
D. J. Stilwell, B. E. Bishop, and C. A. Sylvester. Redundant manipulator techniques for partially decentralized path planning and control of a platoon of autonomous vehicles. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 35(4):842–848, 2005.
H. G. Tanner, A. Jadbabaie, and G. J. Pappas. Stable flocking of mobile agents, part I: Fixed topology. In Proc. Conf. Decision Contr., pages 2010–2015, Maui, Hawaii, December 2003.
H. G. Tanner, A. Jadbabaie, and G. J. Pappas. Stable flocking of mobile agents, part II: Dynamic topology. In Proc. Conf. Decision Contr., pages 2016–2021, Maui, Hawaii, December 2003.
E. Uchibe, M. Nakamura, and M. Asada. Cooperative behavior acquisition in a multiple mobile robot environment by co-evolution. In Lecture Notes in Artificial Intelligence 1604, pages 273–285, 1999.
J. P. Wangermann and R. F. Stengel. Optimization and coordination of multiagent systems using principled negotiation. Journal of Guidance, Control, and Dynamics, 22(1):43–50, 1999.
H. Yamaguchi. A cooperative hunting behavior by mobile-robot troops. The International Journal of Robotics Research, 18(8):931–940, September 1999.
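The source-inference idea of Section 7.5 above, in which the spread-out swarm provides spatially distributed concentration samples from which the position of the odor source can be estimated, can be made concrete with a minimal sketch. This is an editorial illustration, not part of the GUARDIANS system: it assumes an idealized diffusion model in which measured concentration simply increases towards the source, and it uses one of the simplest possible estimators, a concentration-weighted centroid; the function and variable names are ours.

```python
def estimate_source(samples):
    """Estimate an odor source position as the concentration-weighted
    centroid of the swarm's sample positions.

    `samples` is a list of (x, y, concentration) tuples, one reading per
    robot.  Under an idealized diffusion model the concentration peaks at
    the source, so weighting each robot's position by its reading biases
    the centroid towards the source.
    """
    total = sum(c for _, _, c in samples)
    if total == 0:
        # No robot has detected the chemical yet; no estimate is possible.
        raise ValueError("no chemical detected by any robot")
    x = sum(px * c for px, _, c in samples) / total
    y = sum(py * c for _, py, c in samples) / total
    return x, y

# Example: four robots spread out around a source located near (2, 1).
readings = [(0.0, 0.0, 0.1), (4.0, 0.0, 0.3), (2.0, 2.0, 0.4), (2.0, -1.0, 0.2)]
print(estimate_source(readings))
```

For the sample data above the estimate comes out near (2.4, 0.6), i.e. biased towards the higher readings. A real deployment would additionally have to account for airflow, plume intermittency and sensor noise, as the paper itself notes.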
Learning by Imitation: A Scheme for Offline Learning of Robot Control through
Demonstration by Another Robot

Umar Suleman
Lahore University of Management Sciences
Computer Science Department
Lahore, Pakistan
SULEMAN@LUMS.EDU.PK
Rauf Baig
FAST - National University of Computer and Emerging Sciences
Computer Science Department
Islamabad, Pakistan
DR.RAUF.BAIG@NU.EDU.PK

Mian M. Awais
Lahore University of Management Sciences
Computer Science Department
Lahore, Pakistan
AWAIS@LUMS.EDU.PK

Abstract

This contribution proposes a machine learning scheme, inspired by learning by imitation and demonstration in nature, where demonstrations performed upon one robot are used to program another robot's control program. The proposal is kept generic and open, to deal with a variety of robotics problems. An evolutionary learning algorithm as in [Surmann, 2000] has been used to implement the core fuzzy system learning phase. Offline learning of a robot controller program is thus demonstrated on an AIBO robot in a foot-landing task. The data used for learning in the given experiment is generated by a human expert controlling another robot.

1 Introduction

Interaction and imitation are the fundamental mechanisms for shaping both individuals and societies in terms of information, skills and behavior. Learning by imitation has been regarded as a key source of 'social' learning in biological forms [Noble and Todd, 1999]. Neuroscience experiments on monkeys have revealed a very tight coupling between the perception of motion and motor neurons, strengthening the belief in the existence of imitation in nature as a basic form of learning. Some research works have tried to model such coupling [Mataric, 2002], and there is an ongoing quest by researchers for a clearer understanding of this phenomenon and its reproduction in robots and other artificial and virtual forms [Alissandrakis et al., 2001].

1.1 Learning by Imitation in Robots

Learning is essential in robotics, especially for modern autonomous robots with a social and/or dynamic orientation. Learning by imitation is an instance of social and continuous interaction and has been successfully employed for a wide range of autonomous robotics applications. Learning by imitation in robots has shown tremendous potential and productivity [Herrera, 1998]. Many novel representations of learning systems have been proposed to deal with imitation of varying degrees of complexity. Initially, social learning in robots was employed on groups of similar robots, where robots learned navigation and other movements from one leader robot. In later works such learning evolved into interpersonal communication between robots, learning from robots with slightly different specifications, and then imitation of humans and other intelligent embodiments [Billard, 2000].

Robot learning by imitation ranges over various levels of complexity. Primitive-level learning would cover observations of motor tasks alone. In contrast, a higher-level
instance would focus beyond mere motor actions, on identification of the underlying motivation and goals of the observed actions and then on the accomplishment of these goals [Maistros and Hayes, 2001]. Consequently, high-level environment modeling and perception, cognitive modeling, complex goal identification and planning, and interactive social imitation are all goals of research in learning by imitation. Some contributions in learning by imitation in robotics have treated imitation as more of a goal-oriented exercise than action mimicry, while others have focused on imitation as the reproduction of actions alone, ignoring outcome and success [Mataric, 2002]. The selection and design of primitives is naturally a very difficult task in implementing learning by imitation in robots. Several contributions have suggested the collaborative use of imitation learning constructs and other machine learning techniques to improve results [Hayes and Demiris, 1994].

Learning by imitation in robots has been extended to dissimilar robot, and human, demonstrators. It has also been shown that high-level task learning is poor with a robot demonstrator compared to learning the same tasks with a human demonstrator; however, the results are the inverse when dealing with low-level task learning or with a robot demonstrator of homogeneous actions [Nicolescu and Mataric, 2001].

The employment of vision as the medium of observation has been the choice of the majority of work in the field. Besides vision, there are other mediums of observation that can potentially produce notable enhancements in the technology. In the scenario where robots act as demonstrators for other robots to learn from, the communication mediums and observation horizon are broadened to new artificial means like electric signals, digital communication, etc.

1.2 Fuzzy Systems Learning and Robotics

Robots are essentially control systems. The internal logic of any contemporary generic control system, such as a robot, is very often provided in the form of a fuzzy system or fuzzy logic controller. Fuzzy systems are rule-based systems capable of handling uncertain situations. Their robustness is dependent on the appropriateness of the rule base [Takagi and Sugeno, 1985]. The rules are often provided by experts and are mostly incomplete, especially when the system is dynamic and complex. Sometimes a thorough, costly and complex analysis is required for the design of the system at hand. In practice, inappropriate rule bases are made appropriate by means of trial and error.

The alternative to conventional fuzzy controller development is empirical automated modeling, or simply data-based learning, which, if successful, diminishes the disadvantages of the former methodology. The learning of fuzzy systems has two crucial ingredients: the structure, i.e. the data base, and the parameters, i.e. the rule base. While the majority of the work has concentrated on parametric and partial tuning [Castellano and Fanelli, 2000], a few have worked on complete knowledge base system learning, all with an incremental procedure involving little or no simultaneous knowledge base learning [Linkens and Chen, 1999]. Over time, four categories have emerged, signifying different generic approaches to the generation or learning of fuzzy systems.

Most techniques actually generate only the rule base, given a data base in advance, as in [Jeong and Oh, 1999]. However, some studies have shown an optimal data base to have a much greater impact on the robustness and simplicity of the final fuzzy system than an optimal rule base. For this reason, there are in fact schemes that also optimize the data base after a rule base has been generated [Karr, 1997]. This kind of optimization poses constraints on the structure of the data base, so that only the membership functions can be optimized, which does not completely exploit the potential benefits. Another category is the simultaneous generation of the rule base and the data base, which has shown comparable or better results but has undesirably long computational requirements. A natural adjustment to the former approach is a more informed scheme with a focused optimization of the data base, followed by rule base derivation. Contributions in this last approach are beginning to emerge and have shown improved results compared to the other three [Surmann, 2000; Surmann and Selenschtschikow, 2002].

Many learning technologies have evolved for the generation of fuzzy systems from data, some even allowing the integration of available expert model knowledge into the generated system. The major methods amongst them include Fuzzy Inductive Reasoning (FIR) [Mugica and Nebot, 1996], the fuzzy-rule-oriented statistical analysis (fuzzy-ROSA) method [Krone and Kiendl, 1996] and other statistical methods, neuro-fuzzy identification [Figueiredo and Gomide, 1999] and genetic-fuzzy identification [Surmann, 2000]. There is a variety of other approaches as well that do not fall under any particular category [Wang and Mendel, 1992].

The application of genetic algorithms specifically to fuzzy systems is an emerging area at present. One interesting application is the supervised learning of fuzzy systems using genetic algorithms.

Learning by imitation has yet to play a vital role in robotics and control systems in general. This work focuses on imitation of a trivial-to-moderate degree, where neither human nor robot demonstrators have shown convincing superiority over the other. The scope of this work spans learning in a homogeneous environment (exactly the same anatomy of the demonstrator and learner robots).

2 Learning Scheme

The learning scheme proposed here requires a demonstrator robot to perform the action of interest and a learner robot to learn the demonstrated action as its control program. In a homogeneous environment, where the demonstrator robot and the learner robot are exactly the same, this learning effort seems useless in practice, because the control program of the demonstrator could be copied verbatim to the learner robot to achieve perfect results. The utility of this scheme comes
into perspective when the demonstration is actually performed by a human expert, or Expert Human Controller (EHC), controlling the demonstrator robot in some intuitive manner. Thus a control program can be produced by this scheme even when there is no real control program in the demonstrator robot at all; that is in fact the real strength of this scheme. Therefore, from an architectural viewpoint, the demonstrator robot in the proposed scheme is void of a high-level control program. There is only the very basic inventory of get and set functions that expose the minimal interface required by the EHC to use all the sensors and actuators of the demonstrator robot, respectively. The EHC interacts with the get and set functions through the controller interface, as shown in Figure 1.

Figure 1. Schematic view of the learning environment set up on the demonstrator robot

The Observer module listens to all the communication of the get and set functions during demonstration performances through the Controller Interface and stores it in the Demonstration Log. When the required demonstrations are stored, the log is passed on to the learner robot to learn a control logic equivalent to that of the EHC. Arguably, the medium of communication and observation appropriate for inter-robot learning should ideally be one that is most informative and has the fewest problems of its own; hence we chose raw digital data, i.e. the log. We believe that vision is not an ideal medium, as the mere interpretation of a visual observation is a hard and open problem to date, and a full-fledged field in its own right. Simply stated, the approach we have identified is to use the demonstration data stored in the said log to learn an EHC-equivalent autonomous control system, which is finally installed in the learner robot to deal with real situations.

The whole scheme is broken into a sequence of generic phases, as shown in Figure 2, together with the output of each phase. The remainder of this section covers each of the phases briefly.

Figure 2. Phases of the proposed scheme (Environment Setup → EHC Ready → Demonstration Performance → Demonstration Log → Observations Pre-processing → Training Data → Fuzzy System Learning → Fuzzy System → Post-processing → Refined Fuzzy System → Deployment)

2.1 Environment Setup

The prerequisite environment for the EHC to perform a demonstration on the demonstration robot cannot be set up with any fixed universal formula. However, we have extracted guidelines to carry out the setup phase. The objective is to engineer an intrinsic computable medium to carry the demonstration forward to the learner robot. First of all, the system to be modeled is analyzed to enumerate all the inputs to the system actuators, i.e. commands, and the outputs of the system, i.e. sensor readings, etc. In addition, the maximum range of values of each of the inputs and outputs is calculated.

The second task is to get the demonstrator robot ready to perform the target action. The demonstration system is set up according to Figure 1. The basic tasks here are to implement the set and get functions that control the actuators and read the sensors respectively, to put in place the Controller Interface and Observer modules that record the communication between the EHC and the underlying hardware of the demonstration robot, and to implement tools for the EHC that are intuitive for a human to use for control. The latter could be implemented in a variety of ways, ranging from a simple joystick hardware setup to a sophisticated installation of software and wearable sensors.

2.2 Demonstration Performance

The EHC performs the demonstration of interest multiple times on the demonstration robot, thus generating the demonstration log. All principles of generating good supervised learning data apply. The number of performances should be sufficient so that each important situation to be handled
is encountered. Consequently, it should be observed that each performance deals with a new situation.

The Observer module is responsible for sniffing all the data passing through the Controller Interface and logging them. The scheme proposes a simple relational format for the demonstration log, consisting of name-value pairs prescribing the actuator commands and sensor readings at each instant of time during the demonstrations.

2.3 Observations Pre-processing

The raw demonstration log is processed in the pre-processing phase to prepare it for use as supervisory training data in the learning phase. In pre-processing, the demonstration log is processed depending upon the actual learning algorithm to follow. As an example, our current implementation uses simple data normalization of the sensor readings and actuator values.

The option to cleanse the recorded data of noise and outliers is consciously rejected, on the grounds that the noisy records are bound to re-occur on the learner robot platform for the same reason they occurred at the time of demonstration. Keeping the learning system ignorant of this potential noise could contribute greatly to the production of a non-robust control system.

2.4 Fuzzy System Learning

The crux of this scheme is the fuzzy system learning phase. Given the pre-processed data, the goal is to produce a fuzzy system which can mimic the control logic of the EHC, i.e. a fuzzy system that can continually produce the right set of actuator commands, given the sensor readings, to achieve the goal.

The criteria for the selection of a particular method for this phase are generality, robustness and completeness. Genetic learning of the fuzzy system is one good option for this phase (see Section 1.2), which our current implementation employs, based on the works of Herrera and Surmann [Herrera, 1998; Surmann, 2000]. It is implemented as an offline system which takes all the observation records as training samples in order to produce a reasonably fitted fuzzy system. The logical flow of the genetic algorithm implementation is given in Figure 3.

To understand the real implications of the algorithm used, one thing should be kept in mind: the individuals of the population optimized by this genetic algorithm are complete fuzzy systems, with membership functions and

with acceptable solution(s) in the population, as a best-effort approach, tries to improve the structural elegance of acceptably fit individuals, i.e. subsuming similar membership functions and removing unused rules and membership functions; this is called the closing step. Note that when the closing step cannot reduce an acceptable fuzzy system at all, it is assumed that the most compact system has been found, and hence the algorithm exits with success. Third, it falls into the most effective and generic class of learning techniques, where both the membership functions and the rules are learned simultaneously.

Figure 3. Genetic Algorithm for Fuzzy System Learning

When the genetic algorithm returns an acceptable fuzzy system, it is further optimized by a simple numerical technique, i.e. the membership function parameters are fine-tuned with increments/decrements of reducing magnitude up to the point where no more improvement is possible on the training data. This technique provides the fine-granular steps to get to the real maxima, where the gross-granular genetic
rules and the algorithm tries to find an optimum fuzzy sys- steps are unable to reach.
tem with an acceptable MSE over the given training data.
The complete fuzzy system is encoded into two strings i.e.
one for each of rules and membership functions. The mem- 2.5 Post-processing
bership functions used are asymmetric Gaussian functions. The output fuzzy system of the learning phase is not de-
This technique has been selected by virtue of some very ployed as such. The fuzzy system is de-normalized back in
useful features. First it provides recovery from persisting exactly the reverse manner to the initial normalization in the
stagnation in improvement of individuals in the population pre-processing phase. All the membership functions, for
by mutating portions of unfit individuals denoted as the example, are de-normalized to deal with actual sensor val-
open population step. Second when it reaches any maxima

20
ues. The fuzzy system thus constructed is depicting the real proposed scheme. An account of each phase of experiments
logic i.e. it can process real inputs and render real outputs. has been laid down as below.

2.6 Deployment For Environment Setup, the inputs of the target system
The learner robot is now ready to switch control to the are analyzed and thus six inputs are selected for the test,
learned fuzzy system controller which is done in a straight- namely, right front leg joints i.e. RFLEG_J1, RFLEG_J2,
forward manner. The demonstration setup is reused in the RFLEG_J3, right front leg paw switch i.e. RFLEG_SW,
robot with some precise alterations as depicted in Figure 4. PSD, and frame i.e. system timer. Note that frame is not a
literal sensor, but as time can possibly be a factor of the
Fuzzy target system, frame is included in the inputs list for the
Inference Fuzzy sake of completeness. The outputs of the system are
Driver System
RFLEG_J1, RFLEG_J2, and RFLEG_J3 i.e. the three joints
of the front leg.
The mechanism developed for EHC to perform the
Controller Interface
demonstration is very intuitive. The EHC can simply hold
the leg of the robot, and maneuver it manually to land it on
the surface below as if the demonstrator robot was landing
Set Get
Functions Functions the foot. The controller interface is developed such that it
can reverse engineer the commands necessary to perform
the forced movements by the EHC, and pass it to the Ob-
server for logging. The get functions module is implemented
Actuators Sensors by a single Open-R object to simply pass on the read values
from the sensors to the controller interface. The Observer
Figure 4. Schematic view of learner robot system module is implemented in the same Open-R object, to store
the received observations, remotely on a desktop computer
First the complete system is copied on to the learner ro- host.
bot and then few alterations follow. The only significant The readings from all the identified inputs, namely
alteration is the replacement of EHC with fuzzy inference RFLEG_J1, RFLEG_J2, RFLEG_J3, RFLEG_SW, PSD,
driver module. The fuzzy inference driver, coupled with the and frame are taken and passed on the Observer module.
learned fuzzy system, replaces the EHC. The fuzzy infer- The Demonstration is performed by the EHC multiple times
ence driver is a very lean piece of logic that receives the in a row at different initial orientation of the right leg.
inputs of the sensors via the controller interface, passes it to For Observations Preprocessing the complete observa-
the fuzzy system for inference, and returns the fuzzy sys- tional data is passed on to normalization routine to convert
tem’s outputs to the controller interface, destined towards all the field values onto the normal scale between 0 and 1.
the set functions to control actuators. Note that the observer This data now enters the Fuzzy System Learning phase
is no longer required and is thus removed from the system. where the genetic algorithm searches for a fuzzy system
with MSE less than the configured acceptable MSE (e.g.
3 Experimentation 0.02). Once such a system is learned its membership func-
The proposed scheme is being implemented on a simple tions are denormalized to depict real situation and matching
problem using Sony’s AIBO® robot dog as the learner and response as explained in 2.5.
demonstrator robot. In order to test the genetic fuzzy im- The learned system is manually deployed in the learner
plementation of the learning scheme AIBO ERS-220 is pro- robot. The controller interface is slightly modified to create
grammed using Open-R SDK version 1.1.4. a command object from the output of the inference driver
The target action is simple yet non-trivial right fore-leg and pass it to the Set functions module. The learner robot is
movement to land it on the ground properly starting from an then rebooted such that the new fuzzy system controller
arbitrary position. Note that the learner robot and the takes over the control and performs in real environment.
scheme will be said successful if it can properly land it’s
arbitrarily positioned right front leg (only) on the ground 3.7 Results Analysis
below at a particular distance. Proper foot landing on
ground means to make foot contact with the flat surface All experiments eventually generated a fuzzy system to
below such that the leg is not applying any undue force on mimic the control logic of the EHC by running for a few
the surface. It is assumed that robot main body is firmly hours. The fuzzy system generated each time was slightly
suspended i.e. managing balance and the three remaining different from the rest. However the common observation in
limbs are out of scope of this experiment. The complete every learned fuzzy system is that it is compact with only
experiment was repeated multiple times according to the two rules and often with a total of eleven membership func-
tions for the 3 input and 3 output variables. The remaining

21
three input variables, PSD, PAWSW and FRAME, have no membership function, which means that each time the learning algorithm has discarded these three variables as non-contributing inputs. Almost every time the system is put to real action on the learner robot, it is able to land the foot starting from any arbitrary initial position with realistic elegance, as judged by the naked eye. Thus a fuzzy system with acceptable behavior and a very compact structure has been generated to depict the target action logic.
Take one of the experiments' results as an example, where a population size of 20, an acceptable MSE value of 2%, and an initial rule count of 10 were used. An acceptable fuzzy system was returned after 43125 generations and 316 stagnations, as given in Figure 5.

Figure 5. Learned Fuzzy System

Thus the implicit expert control rules have been elucidated in a human-readable format by the learning algorithm using the proposed scheme. The dependency on a real expert for developing the fuzzy membership functions and rules that control a robot has been eliminated. A repeatable process is given that can successfully produce a new fuzzy-system-based controller in reasonable time.

4 Conclusion

The scheme presented here is notable in a number of ways. First, genetic-algorithm-based fuzzy system learning, a soft-computing technique, has been applied to robot control learning through robot demonstration. Second, a complete implementation of the genetic fuzzy learning algorithm proposed in [Surmann, 2000] has been built and tested. Third, a simple generic process and control architecture is proposed, used in two steps: first, implementing raw interfaces to link the hardware with a human/program controller through a controller interface, and second, replacing that controller with the learned fuzzy controller. The architecture allows any form of human or program controller for carrying out the demonstration, e.g. human control through a joystick setup, an ageing robot controller, etc.
The scheme presented and implemented has successfully combined good human control capabilities with a soft-computing-based learning algorithm to program intelligent robotic systems through demonstration on a robot. It covers the end-to-end details of the process and is intentionally kept generic.
The experimentation conducted so far is focused on proof of concept only. There is room for further enhancements, the most exciting of which is to mature and develop soft-computing-centric techniques for heterogeneous imitation, i.e., learning robot control programs from human and dissimilar-robot demonstrations.

References

[Noble and Todd, 1999] J. Noble and P. M. Todd. "Is it really imitation? A review of simple mechanisms in social information gathering." The Symp. on Imitation in Animals and Artifacts, Artificial Intelligence and the Simulation of Behaviour Convention, Edinburgh, April 1999.
[Mataric, 2002] M. J. Mataric. "Sensory-motor primitives as a basis for imitation - linking perception to action and biology to robotics." Imitation in Animals and Artifacts, pp. 391-422, MIT Press, 2002. ISBN 0-262-04203-7.
[Alissandrakis et al., 2001] A. Alissandrakis, C. L. Nehaniv, and K. Dautenhahn. "Through the looking-glass with ALICE - Trying to imitate using correspondences." Proc. First Intl. Workshop on Epigenetic Robotics, Modeling Cognitive Development in Robotic Systems, pp. 115-122, Sweden, 2001.
[Herrera, 1998] F. Herrera. "A Learning Process for Fuzzy Control Rules using Genetic Algorithms." Fuzzy Sets and Systems, vol. 100, issues 1-3, pp. 143-158, November 1998. ISSN 0165-0114.
[Billard, 2000] A. Billard. "Imitation - a means to enhance the synthetic proto-language in an autonomous robot." In C. Nehaniv and K. Dautenhahn, editors, Imitation in Animals and Artifacts, MIT Press, 2000.
[Maistros and Hayes, 2001] G. Maistros and G. Hayes. "An imitation mechanism for goal-directed actions." Proc. TIMR, 2001.
[Hayes and Demiris, 1994] G. Hayes and J. Demiris. "A robot controller using learning by imitation." Proc. Intl. Symp. on Intelligent Robotic Systems, pp. 198-204, France, 1994.
[Nicolescu and Mataric, 2001] M. N. Nicolescu and M. J. Mataric. "Experience-based representation construction - learning from human and robot teachers." Proc. IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, IROS, USA, 2001.
[Takagi and Sugeno, 1985] T. Takagi and M. Sugeno. "Fuzzy identification of systems and its application to
modeling and control." IEEE Trans. Syst., Man, and Cybernetics, vol. 15, 1985.
[Castellano and Fanelli, 2000] G. Castellano and A. M. Fanelli. "Fuzzy inference and rule extraction using neural networks." Neural Network World Journal, no. 3, pp. 361-371, 2000.
[Linkens and Chen, 1999] D. A. Linkens and M. Y. Chen. "Input selection and partition validation for fuzzy modeling using neural network." Fuzzy Sets and Systems, vol. 107, 1999.
[Jeong and Oh, 1999] J. Jeong and S. Oh. "Automatic rule generation for fuzzy logic controllers using rule-level co-evolution of subpopulations." Proc. Congress on Evolutionary Comp., 1999.
[Karr, 1997] C. L. Karr. "Fuzzy-evolutionary systems." In T. Back, D. B. Fogel, and Z. Michalewicz, editors, Handbook of Evolutionary Comp., Oxford University Press, 1997.
[Surmann, 2000] H. Surmann. "Learning a fuzzy rule based knowledge representation." Proc. 2nd ICSC Symp. on Neural Comp., NC'2000, 23-26, pp. 349-355, 2000. ISBN 3-906454-22-3.
[Surmann and Selenschtschikow, 2002] H. Surmann and A. Selenschtschikow. "Automatic generation of fuzzy logic rule bases: examples 1." Proc. First Intl. ICSC Conf. on Neuro-Fuzzy Technologies, NF'2002, Cuba, 2002. ISBN 3-906454-29-0.
[Mugica and Nebot, 1996] F. Mugica and A. Nebot. "A specialization of the k-nearest neighbor classification rule for the prediction of dynamical systems using FIR." Advances in Artificial Intelligence and Engineering Cybernetics, vol. III, Baden-Baden, Germany, August 1996.
[Krone and Kiendl, 1996] A. Krone and H. Kiendl. "Rule-based decision analysis with the fuzzy-ROSA method." EFDAN'96, Dortmund, Germany, 1996.
[Figueiredo and Gomide, 1999] M. Figueiredo and F. Gomide. "Design of fuzzy systems using neurofuzzy networks." IEEE Trans. on Neural Networks, vol. 10, no. 4, 1999.
[Lin and Lu, 1996] C. Lin and Y. Lu. "A neural fuzzy system with supervised learning." IEEE Trans. Syst., Man, and Cyb., part B, vol. 26, 1996.
[Wang and Mendel, 1992] L. X. Wang and J. Mendel. "Generating fuzzy rules by learning from examples." IEEE Trans. Syst., Man, and Cyb., vol. 22, 1992.
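To make the learning pipeline above concrete, the following minimal sketch reproduces its two numerical stages on a toy one-input problem: an elitist genetic search over complete fuzzy systems with MSE fitness (Section 2.4), followed by the shrinking-increment fine tuning of membership parameters. This is an illustrative sketch under stated assumptions, not the authors' Open-R implementation; every function name, the symmetric Gaussian memberships, and the tiny three-rule system are invented for the example.

```python
import math
import random

# Illustrative sketch (not the paper's code): individuals are complete
# one-input fuzzy systems, fitness is MSE on the demonstration log, and
# the GA result is polished by shrinking-increment fine tuning.

def gauss(x, center, width):
    # Gaussian membership function (symmetric here for brevity;
    # the paper uses asymmetric Gaussians).
    return math.exp(-((x - center) / width) ** 2)

def infer(system, x):
    # Weighted average of rule outputs over all (center, width, output) rules.
    num = den = 0.0
    for center, width, out in system:
        w = gauss(x, center, width)
        num, den = num + w * out, den + w
    return num / den if den else 0.0

def mse(system, data):
    return sum((infer(system, x) - y) ** 2 for x, y in data) / len(data)

def learn(data, pop_size=20, gens=150, seed=1):
    # Elitist GA: keep the fitter half, refill with mutated copies.
    rnd = random.Random(seed)
    def rand_sys():
        return [(rnd.random(), 0.2 + rnd.random(), rnd.random())
                for _ in range(3)]
    pop = [rand_sys() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda s: mse(s, data))
        elite = pop[: pop_size // 2]
        pop = elite + [[(c + rnd.gauss(0, 0.05),
                         max(0.05, w + rnd.gauss(0, 0.05)),
                         o + rnd.gauss(0, 0.05)) for c, w, o in s]
                       for s in elite]
    return min(pop, key=lambda s: mse(s, data))

def fine_tune(system, data, step=0.1, min_step=1e-3):
    # Increments/decrements of reducing magnitude (Section 2.4, final step):
    # accept a parameter nudge only if it lowers the training MSE.
    best = mse(system, data)
    while step > min_step:
        improved = False
        for i in range(len(system)):
            for j in range(3):
                for delta in (step, -step):
                    cand = list(system[i])
                    cand[j] += delta
                    cand[1] = max(0.05, cand[1])   # keep widths positive
                    trial = system[:i] + [tuple(cand)] + system[i + 1:]
                    err = mse(trial, data)
                    if err < best:
                        best, system, improved = err, trial, True
        if not improved:
            step /= 2
    return system
```

As in the paper, the fine-tuning stage can only keep or improve the fit found by the genetic stage, since a candidate step is accepted only when it strictly lowers the training MSE.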
Controlled Deployment of Sensor Networks for Coverage Optimization∗

Sameera Poduri, Gaurav S. Sukhatme


University of Southern California
Computer Science Department
spoduri, gaurav@pollux.usc.edu

Abstract

Controlled deployment of sensor networks is feasible when the positions of individual nodes can be controlled. In contrast to random, high-density deployment of nodes, which is a common assumption in the sensor network literature, controlled deployment is better suited to a large number of applications and is significantly more efficient than random deployment. The problem of controlling the positions of nodes to optimize sensing coverage has been studied in classic geometric optimization problems such as the art-gallery problem, disk covering, Voronoi tessellations, etc. For sensor network applications, there are additional challenges because of communication constraints and the lack of global knowledge of node locations and obstacles. This paper surveys recent research addressing these challenges for sensing models of varying complexity. We summarize our results on three aspects of the problem: 1) local geometric conditions for deployment that can guarantee global coverage and connectivity, 2) self-deployment for mobile sensor networks with guaranteed connectivity, and 3) analysis of coalescence for mobile networks.

∗This work is supported in part by the National Science Foundation (NSF) under grants 0347621, 0325875, IIS-0133947, EIA-0121141 and grants CCR-0120778, ANI-00331481 (via subcontract). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

1 Introduction

Hundreds of temperature sensors distributed over a forest to detect fires and help control them, seismic sensor-actuator systems embedded in buildings to sense and counteract earthquakes, camera units manipulating low-resolution images to check for intruders in a military location, and chemical sensors mounted on buoys and boats monitoring water quality are a few examples from the wide spectrum of applications of Wireless Sensor Networks (WSNs). A typical WSN consists of a large number of short-range sensors, equipped with modest resources for computation, power, and wireless communication, that cooperate to infer global information about the physical environment that they are embedded in. The density and locations of the sensors control the accuracy with which a phenomenon is captured. An important question to address is how to deploy these networks. The common deployment techniques are nodes being dropped by flying vehicles, planted by robots or humans, or mobile nodes self-deploying.

Figure 1: (Blanket) coverage configuration with minimum density of nodes

Several WSN deployments will involve control over the positions of nodes. The following are some example scenarios:
1. Urban WSNs of sensors such as cameras, seismic sensors, and temperature sensors that must be carefully implanted on the walls or ceilings of buildings and other structures, so as to be unobtrusive.
2. Sensors in water bodies that are either anchored to stay in place or mounted on boats that can control their location.
3. Sensors that are not inexpensive enough to afford high redundancy.
4. Small and medium scale networks that are maintained by a robot or human. For example, in the NIMS project [Kaiser et al., 2004], mobile nodes can control the deployment locations of static nodes.
5. Mobile sensor networks.

Position-controlled deployment can significantly improve the performance and lifetime of a sensor network. The advantage over random deployment can be illustrated by the following example. Consider sensors that can sense a circular disk of area S around them. If deployed randomly, the
density, λ, required to guarantee that each point in the domain is sensed (with high probability) by at least one sensor is λ ≥ −log(0.05)/S ≈ 3/S, which is very high compared to a hexagonal placement (fig. 1), where the density is ≈ 1.2/S. In several applications, the expected lifetime of the network is very long compared to the deployment phase, and the effort put in during deployment can lead to significant gains. Therefore, whenever possible, position control must be exploited. This work considers the cases where position control is available and addresses the problem of finding sensor locations that optimize sensing coverage.

Outline
Section 3 surveys recent research to address these challenges for sensing models of varying complexity. The coverage problem is closely related to well-studied problems in geometry like facility location, the art-gallery problem, and packing and filling of space. Sections 4-6 present our results on three aspects of the problem: 1) self-deployment for mobile sensor networks with constrained node degree [Poduri and Sukhatme, 2004], 2) local geometric conditions for deployment that can guarantee global coverage and connectivity [Poduri et al., 2005], and 3) analysis of coalescence in mobile networks [Poduri and Sukhatme, 2007]. The next section defines the coverage problem in the context of sensor networks. Since maintaining communication connectivity is a necessary requirement for WSNs, and communication links depend on node locations, we call this the problem of constrained coverage.

2 Constrained Coverage
Given a bounded domain, where should n sensors be located such that coverage is maximized subject to some connectivity constraints?

Coverage
Depending on the application, the desired sensing coverage can be characterized in one of 3 ways [Gage, 1992]:
1. Blanket Coverage:
(a) Maximize the area sensed by a fixed number of sensors, i.e., maximize the number of detections per time.
(b) Find the minimum number of sensors such that each point in the domain is covered, i.e., minimize the number of missed detections per area.
2. Barrier Coverage: Find an arrangement of a barrier of static nodes that minimizes the probability of the target penetrating an area undetected.
3. Sweep Coverage: Find a motion strategy for a group of mobile sensors such that the number of event detections per time is maximized or the number of event misses per area is minimized. This can be treated as a moving barrier.

Sensing Model
Let Q be a bounded convex region and p = {p1, p2, ..., pn} the locations of n sensors. The sensing performance of each sensor i at point q is given by f(i, q). Depending on the nature of f(i, q), we can classify sensors as follows:
• Binary Disk Model: f(i, q) = 1 if ||pi − q|| ≤ Rs(i), and 0 otherwise. Rs(i) is called the sensing range of sensor i.
• Binary Cone Model: The sensors are not isotropic. If the sensors are facing directions {θ1, θ2, ..., θn}, then f(i, q) = 1 if ||pi − q|| ≤ Rs(i) and angle(i, q) ≤ φi, and 0 otherwise.
• Continuous Model: f(i, q) = g(||pi − q||), where g is a monotonically decreasing continuous function.
A point q in the domain is said to be covered if it is within the sensing region of at least one node, i.e., if f(i, q) ≥ α for at least one i, where α is a minimum threshold. Similarly, a point q in the domain is k-covered if it is simultaneously within the sensing regions of k nodes. For sensors (like temperature sensors) that measure at a point, the sensing range is a measure of sampling granularity.

Communication Model
The nodes communicate over wireless links. The probability that a message from node i reaches node j can be represented as PRR(i, j). In general, PRR(i, j) ≠ PRR(j, i).
• Binary Disk Model: PRR(i, j) = 1 if ||pi − pj|| ≤ Rc(i), and 0 otherwise. Rc(i) is called the communication range of sensor i.
• Quasi Disc: PRR(i, j) = 1 if ||pi − pj|| ≤ rc(i), and PRR(i, j) = 0 if ||pi − pj|| > Rc(i), where rc(i) ≤ Rc(i) are the minimum and maximum transmission ranges of node i. For j such that rc(i) < ||pi − pj|| ≤ Rc(i), PRR(i, j) is not specified. This model is close to realistic links [Zuniga and Krishnamachari, 2004].
• Probabilistic Model: In complex domains, PRR(i, j) does not have a parametric form and can be estimated as a distribution [Cerpa et al., 2005].

3 Survey of Related Work
The coverage problem is closely related to several optimization problems in geometry. We will first review key results from these areas, discuss their applicability to sensor networks, and then focus on recent research on the coverage problem in the context of sensor networks.

3.1 Locational Optimization
Locational optimization deals with finding optimal locations for resources (like mailboxes) that provide local services. A general problem can be formulated as follows: find the locations of n facilities in a bounded domain such that the average distance to the nearest facility is minimized [Okabe and Suzuki, 1997]. The facilities could be point-like, line-like, or area-like, and the distance measure can also vary. The case with point-like facilities and the Euclidean distance measure is similar to the sensor network coverage problem without connectivity constraints.
Given a fixed arrangement of sensors, their Voronoi diagram represents the partition of the domain such that the
Voronoi cell of a sensor contains all points that it 'services' or covers. We can compute the maximal breach path [Meguerdichian et al., 2001], which is the path in the domain that maximizes the distance between a moving target and its nearest sensor, and thus captures the worst-case coverage of the network. The optimal locations of sensors can be found using a gradient-descent approach called Lloyd's descent, with the average sensor distance as the objective function [Cortés et al., 2004]. This can be used to find a distributed control law for a group of mobile nodes. The resulting configuration is the centroidal Voronoi partition. A fundamental drawback of this method is that it does not consider obstacles in the environment.

3.2 Art-Gallery Problem
The art-gallery problem is closely related to centralized deployment of line-of-sight sensors. It deals with finding locations for line-of-sight sensors such that every point inside a polygon (possibly non-convex and with holes) is visible from at least one sensor [O'Rourke, 1987]. The sensors are assumed to have infinite range and a 360° field of view. For simple polygons, ⌊n/3⌋ sensors are sufficient and sometimes necessary. However, finding the minimum number of sensors is NP-hard. For non-isotropic sensors the problem is more complex. For the case when the cameras have a limited field of view and/or limited sensing range, a number of algorithms have been proposed to find approximate solutions [Kazazakis and Argyros, 2002].

3.3 Packing and Filling
Packing is the problem of placing solid shapes in a bounded domain such that no overlap is permitted; the goal is to find the placement that minimizes the uncovered area. In 2D, only regular hexagons, squares and triangles can cover a domain completely.
Filling is the problem of placing solid shapes such that a given bounded domain is completely covered. In sensor networks, we are interested in finding the placement of the minimum number of sensors that can cover the domain. The optimal configuration of binary disk sensors is shown in Fig. 1.

3.4 Challenges in Sensor Network Placements
Most solutions for the above-mentioned problems are centralized and do not deal with communication constraints. In the context of sensor networks these problems become challenging because of:
1. Connectivity constraints: there are constraints on the connectivity of the network.
2. Environmental complexity: the environment can have complex-shaped obstacles.
3. Centralized vs distributed approaches: the deployment algorithm does not have global knowledge of the environment or the positions of other nodes.

3.5 Coverage in Sensor Networks
Topology Control: Topology control is considered a canonical problem in static and mobile ad-hoc networks (see [Santi, 2003] and [Li, 2003] for detailed surveys). The primary approaches used, power control and sleep scheduling, operate on a given well-connected network and provide ways of pruning the network's communication graph to save power. They do not leverage possible control over node positions. While sensing coverage cannot be altered by power control, it is often the objective of sleep-scheduling algorithms.
Sleep scheduling assumes an initial high-density deployment and seeks to activate only a fraction of the nodes. Several researchers have studied the problem of finding the smallest set of active nodes that can simultaneously achieve coverage and connectivity, where complete coverage is defined as every point in the network domain being within the sensing range of at least one active node (Blanket Coverage (b)). Under this definition, coverage and connectivity are not opposing goals; in fact, if the communication range is at least twice the sensing range, then complete coverage implies connectivity [Zhang and Hou, 2005; Wang et al., 2003a]. In contrast, we focus on the problem of maximizing sensing coverage given a fixed number of nodes, so that higher coverage, in most cases, implies poorer connectivity. OGDC [Zhang and Hou, 2005] and CCP [Wang et al., 2003b] are sleep-scheduling protocols based on geometric conditions. While OGDC seeks to minimize the number of active nodes, CCP can provide different degrees of coverage and connectivity. Interestingly, the geometric optimality conditions presented in [Zhang and Hou, 2005], as the authors point out, can be attained exactly if the positions of the nodes can be controlled.

Controlled Deployment of a Static Network: Deployment and repair of static networks by mobile robots has been presented in [Batalin and Sukhatme, 2004; Corke et al., 2004]. Typically, such algorithms have focused on the control strategies for the robot rather than on properties of the network. By providing conditions that are distributed and can easily be combined with robot controllers, our work provides a means of integrating the two objectives. For a similar setting, Bredin et al. [Brendin et al., 2005] have recently proposed a network-repair strategy that finds the additional node locations required to ensure k-connectivity in any given network. The algorithm is proven to be within a constant factor of the absolute minimum number of additional nodes. In the context of repairing a network, the NET condition provides a solution with a very low computational cost but without guarantees on minimizing the number of additional nodes.

Self Deployment of Mobile Robots: Potential-field-based deployment algorithms also maximize coverage [Howard et al., 2002b]. An incremental deployment algorithm that ensures line-of-sight connectivity has been presented in [Howard et al., 2002a]. The above algorithms require information about the angle and distance to neighbors. They maximize coverage but do not guarantee network connectivity.

Rendezvous: Rendezvous is the problem of multiple mobile robots meeting at a point in an unknown environment. Several distributed algorithms have been proposed. Some of these assume connectivity [Ando et al., 1999; Lin et al., 2003]. If the communication graph is an RNG, then rendezvous algorithms have been proposed that can deal with partial communication loss over some of the links [Cortes et al., in submission]. In the absence of connectivity, the robots can explore the environment, building a map of the landmarks [Roy and Dudek, 2001] that other robots are likely to visit. This is similar in spirit to the coalescence problem, where the base station is a landmark whose location is unknown. However, our focus is on forming a connected component while allowing the robots to remain "spread out" so that the time to coalescence is reduced. Several algorithms have been proposed to maintain connectivity of a wireless network [Zavlanos and Pappas, 2005; Spanos and Murray, 2004], given that the initial network is connected.

4 Coverage with degree constraint
Evidence from the theory of random networks indicates that global network connectivity is strongly dependent on node degree [Gupta and Kumar, 2003], a local constraint. Inspired by this result, we developed a simple self-deployment algorithm for a group of mobile nodes that maximizes sensing coverage with the constraint that each node has at least K neighbors, where K is a parameter of the algorithm.

Figure 2: Force profiles. Left panel (non-critical connection): Fcover alone. Right panel (critical connection): Fcover, Fdegree and their sum, plotted as virtual force against the distance between nodes, with marks at ηRc and Rc.

Our approach to the problem is based on virtual potential fields. We construct virtual forces between nodes so that each node can attract or repel its neighbors. The forces are of two kinds (fig. 2). The first, Fcover, causes the nodes to repel each other to increase their coverage; the second, Fdegree, constrains the degree of nodes by making them attract each other when they are on the verge of being disconnected. By using a combination of these forces each node maximizes its

In our experiments, each node begins with more than K neighbors and repels all of them using Fcover till it has only K left. The resulting neighbors are called the node's critical neighbors, and the connections between them and the node are called critical connections. The node then continues to repel all its neighbors using Fcover, but as the distance between the node and a critical neighbor increases, ||Fcover|| decreases and ||Fdegree|| increases. As a result, at some distance ηRc, where 0 < η < 1, the net force ||Fcover + Fdegree|| between the node and its neighbor is zero. At this distance, the node and its neighbor are in equilibrium with respect to each other. Fig. 2 shows a node with more than K and with exactly K neighbors, and the corresponding force profiles.
Mathematically, the forces can be expressed as follows. Consider a network of n nodes 1, 2, 3, ..., n at positions x1, x2, ..., xn respectively. Let ∆xij represent the Euclidean distance between nodes i and j, i.e., ∆xij = ||xi − xj||. Fcover and Fdegree are defined as follows:

Fcover(i, j) = −(Kcover / ∆xij²) · (xi − xj)/∆xij

Fdegree(i, j) = (Kdegree / (∆xij − Rc)²) · (xi − xj)/∆xij if the connection is critical, and 0 otherwise,

where Kcover and Kdegree are the force constants. The resultant force between nodes i and j is

F(i, j) = Fcover(i, j) + Fdegree(i, j)

and node i experiences a net force of

Fi = Σj F(i, j), summed over all neighbors j of node i.

The equation of motion for node i is formulated as

ẍi(t) = (Fi − ν ẋi)/m,

where ν is a damping factor and m is the virtual mass of the node, which is assumed to be 1.

4.1 Simulation Results
Simulations were conducted in Player/Stage [Gerkey et al., 2002]. Fig. 3 shows the initial and final configurations for a network of 64 nodes and the variation in coverage and node degree for different values of K. The algorithm converges within a minute and we did not observe any oscillations. Fig. 4 shows that the coverage obtained is significantly better than that of a random network.

5 Local Geometric Conditions to Guarantee Global Coverage and Connectivity
In this section, we present a position-control-based topology construction mechanism that has two key features:
• the required conditions are simple, local and geometric, and
• varying a single parameter (θ) provides tunable network
coverage while maintaining a degree of at least K. coverage and connectivity.
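The virtual-force deployment of Section 4 can be sketched in a few lines. This is our illustration, not the authors' code: the force constants, Rc, and the rule that marks every connection of a node with at most K neighbors as critical are assumptions chosen for demonstration, and the signs are chosen so that Fcover repels and Fdegree attracts, as the text describes.

```python
import numpy as np

# Assumed demonstration parameters (not from the paper).
K_COVER, K_DEGREE = 1.0, 1.0
R_C = 1.0                    # communication radius
K_MIN = 3                    # required minimum degree K
NU, M, DT = 1.0, 1.0, 0.05   # damping, virtual mass, time step

def critical_pairs(pos):
    """Mark every connection of a node with at most K_MIN neighbors
    as critical (a simplification of the rule in the text)."""
    crit = set()
    for i in range(len(pos)):
        nbrs = [j for j in range(len(pos))
                if j != i and 0 < np.linalg.norm(pos[i] - pos[j]) < R_C]
        if len(nbrs) <= K_MIN:
            crit.update((i, j) for j in nbrs)
    return crit

def forces(i, pos, critical):
    """Net virtual force on node i: repulsive F_cover from each neighbor,
    plus attractive F_degree on critical connections (the attraction
    diverges as the pair approaches the communication radius R_C)."""
    f = np.zeros(2)
    for j in range(len(pos)):
        d = np.linalg.norm(pos[i] - pos[j])
        if j == i or d == 0 or d >= R_C:
            continue                              # j is not a neighbor
        unit = (pos[i] - pos[j]) / d
        f += (K_COVER / d ** 2) * unit            # repel: increase coverage
        if (i, j) in critical or (j, i) in critical:
            f -= (K_DEGREE / (d - R_C) ** 2) * unit   # attract: keep degree
    return f

def step(pos, vel, critical):
    """One Euler step of the damped dynamics x'' = (F - nu * x') / m."""
    acc = np.array([forces(i, pos, critical) for i in range(len(pos))])
    acc = (acc - NU * vel) / M
    return pos + DT * vel, vel + DT * acc
```

With equal force constants, a critical pair reaches equilibrium at distance R_C/2, i.e. η = 0.5 in the notation of the text; other constants shift this equilibrium.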

Figure 3: Snapshots of simulation in Player/Stage with 64 nodes. Blue circles indicate sensing ranges. (a) Initial configuration and (b) final configuration. Variation of (c) coverage and (d) node degree with time for different values of K.
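The %Coverage curves in these plots measure the fraction of the workspace that lies within some node's sensing disk. The paper does not specify how this quantity is computed; a simple Monte Carlo estimate such as the following sketch (our construction) suffices for experimentation.

```python
import random

def coverage_fraction(nodes, r_s, side, samples=20000, seed=1):
    """Monte Carlo estimate of the fraction of a side x side workspace
    covered by sensing disks of radius r_s centred on the given nodes.
    (Illustrative only; sample count trades accuracy for speed.)"""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.uniform(0, side), rng.uniform(0, side)
        if any((x - nx) ** 2 + (y - ny) ** 2 <= r_s ** 2 for nx, ny in nodes):
            hits += 1
    return hits / samples
```

For a single node at the centre of a 2 x 2 workspace with r_s = 1, the estimate approaches the exact value π/4 ≈ 0.785 as the sample count grows.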

This mechanism is based on the Neighbor-Every-Theta (NET) condition. We define the NET condition for a node as requiring at least one neighbor in every θ sector of its communication range (fig. 5). For finite networks, boundary nodes or nodes on the network edge cannot satisfy such a condition. For a given θ, we define boundary nodes as those which have no neighbor in at least one θ sector of infinite radius. A NET graph is one in which every node except the boundary nodes satisfies the NET constraint for a given θ.

Figure 4: Coverage and node degree comparison with optimal (tiled) and random networks.

Figure 5: Illustration of the NET condition. In (a), X does not satisfy the NET condition: it has one sector greater than θ with no neighbor. In (b), X satisfies the NET condition: the largest sector with no neighbors is smaller than θ.

Figure 6: Coverage with k nodes placed on the communication perimeter of node X. The shaded area is the total coverage.

5.1 Connectivity Analysis of NET Graphs

Proximity graphs, such as the Relative Neighborhood Graph (RNG), the Gabriel Graph (GG) and the Delaunay Graph (DelG), have been well studied and are known to be desirable global topologies for wireless communication networks, including guaranteed connectivity. Given a binary disk communication model, we can prove the following three properties:

1. If each node has at least one neighbor in every 2π/3 sector of its communication disk, the communication graph is a supergraph of the RNG. Moreover, 2π/3 is the largest angle that satisfies this property.

2. If each node has at least one neighbor in every θ = 2 arccos(r/Rc) sector of its communication disk, the communication graph is a supergraph of the GG.

3. If each node has at least one neighbor in every θ = 2 arccos(r/Rc) sector of its communication disk, the communication graph is a supergraph of the DelG.

It is our conjecture that for θ < π the edge connectivity of the NET graph is at least ⌊2π/θ⌋.
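The NET condition itself is easy to test locally: a node has a neighbor in every θ sector exactly when the largest angular gap between the bearings of consecutive neighbors is smaller than θ. A minimal sketch of such a check (ours, not the authors' code):

```python
import math

def satisfies_net(node, neighbors, theta):
    """Return True iff every sector of angular width theta around `node`
    contains at least one neighbor. Equivalent test: the largest angular
    gap between consecutive neighbor bearings is smaller than theta."""
    if not neighbors:
        return False
    bearings = sorted(math.atan2(y - node[1], x - node[0])
                      for x, y in neighbors)
    gaps = [b2 - b1 for b1, b2 in zip(bearings, bearings[1:])]
    gaps.append(2 * math.pi - (bearings[-1] - bearings[0]))  # wrap-around
    return max(gaps) < theta
```

For example, three neighbors spaced 120 degrees apart satisfy the condition for any θ slightly above 2π/3 but fail for θ below it, matching Property 1's threshold.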

Tiling type   Degree   K-Conn   No. of edges
Line          2        1        Θ(n) (≈ n)
Hexagonal     3        3        Θ(n) (≈ 3n/2)
Square        4        4        Θ(n) (≈ 2n)
Triangle      6        6        Θ(n) (≈ 3n)

Table 1: Properties of Tiling Configurations

5.2 Coverage Analysis of NET Graphs

Suppose that in order to satisfy the NET sector conditions, a node must have k neighbors. Assuming a circular sensing range of radius Rs, we can show that for Rs = Rc, the area coverage is maximized when the k ≥ 3 nodes are placed at the edges of k disjoint 2π/k sectors of the sensing disk (fig. 6).

Based on this theorem, we can compute an upper bound on the maximum achievable coverage of a NET graph as a function of k. This gives a guideline to locally maximize coverage while satisfying the NET condition. For k = 3, 4, 6, it is possible to place nodes such that each node has its neighbors placed symmetrically on its communication range. The resulting communication graph will be a tiling of the space: hexagonal, square and triangular for k = 3, 4 and 6 respectively. In these cases, since each node maximizes coverage locally, the total coverage of the network will also be maximized. Due to their symmetry, tiling graphs possess some other desirable properties (Table 1).

6 Deployment of NET graphs

NET conditions can easily be incorporated into the deployment of a static network by mobile agents. The agent can incrementally add nodes by picking the best location for the next node based on the geometry of the existing network. We can also incorporate these rules into self-deployment algorithms for mobile nodes based on potential fields.

In a typical mobile deployment scenario, all nodes start in positions close to each other, so that the initial network is highly connected and the NET condition is satisfied. Every node applies a force of Frepel + Fattract on each of its neighbors. As a result of these forces, each pair of nodes moves as far away from each other as possible without losing connectivity. After every time step, each node evaluates the contribution of each of its edges towards satisfying the NET condition. An edge that is not necessary to satisfy the condition of either of the nodes it connects is broken by turning off Fattract. As a result of these forces, the nodes spread out as far as possible until no more edges can be broken.

Fig. 7(a) shows that the distributed algorithm has lower coverage but greater connectivity compared to the incremental approach. The edge connectivity is greater than ⌊2π/θ⌋, which is in agreement with our conjecture. For θ = 2π/3, the communication graph contains the RNG (fig. 8). In contrast to the incremental approach, we observe that there are no edges at the boundary that are present in the RNG and are not contained in the communication graph. This is because more boundary nodes satisfy the NET condition compared to the incremental approach. Even in the presence of boundary nodes in real networks, these results validate the assertion in Theorem 1 that the communication graph will contain the RNG for θ ≤ 2π/3.

Figure 9: Illustration of coalescence for N = 4. The blue cone represents the base station, red circles represent robots, and the grey discs are the communication areas. The dotted circles show robot positions in the previous time step. As robots join the base station's network, the area of the connected component grows and the robots that are still searching have a better chance of hitting it.

7 Coalescence

For self-deployment of mobile nodes, maintaining connectivity (and other network constraints) is the key challenge. In the above algorithms, this was achieved by starting out with a network that satisfied the constraint and only allowing constraint-preserving steps. If at some point during the deployment the constraint is broken (for reasons such as node failure), it is difficult for the network to recover. One repair mechanism could be that the nodes stop the deployment algorithm as soon as connectivity is broken and start searching for disconnected nodes; once everyone is connected, the deployment restarts. Coalescence is the problem of isolated robots independently searching for each other and forming larger connected components.

We are interested in understanding the time taken for nodes to regain connectivity. As an extreme case, we consider N disconnected nodes searching for other nodes and forming a single connected component. We assume that there exists a single "special" node called the base station that is known to all nodes. Nodes perform independent random walks and stop their search as soon as they are connected to the base station (fig. 9).

Let HT1 ≤ HT2 ≤ · · · ≤ HTN be the 1st, 2nd, . . . , Nth hitting times, i.e., the times at which the 1st, 2nd, . . . , Nth nodes "hit" or join the connected component of the base station. As nodes join the base station, the communication spread of the connected component increases and, as a result, the probability of a searching node finding the component increases. It can be shown that the time for coalescence decreases as 1/N with the number of nodes N.
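The coalescence process can be mimicked in a toy simulation: robots random-walk until they come within communication range of the connected component containing the base station, which grows as robots join. The sketch below is our illustrative construction (workspace size, step length, and binary-disk model are assumptions), not the authors' analytical model; it can be used to observe empirically how the coalescence time varies with N.

```python
import math, random

def coalescence_time(n_robots, side=20.0, r_c=1.0, step=0.5,
                     seed=None, max_steps=100000):
    """Toy coalescence simulation: n_robots random-walkers in a
    side x side box stop once within communication range r_c of the
    connected component containing a base station at the centre.
    Returns the step count at which the last robot joins."""
    rng = random.Random(seed)
    component = [(side / 2, side / 2)]     # positions in the component
    walkers = [[rng.uniform(0, side), rng.uniform(0, side)]
               for _ in range(n_robots)]
    for t in range(1, max_steps + 1):
        still_searching = []
        for w in walkers:
            if any(math.dist(w, c) <= r_c for c in component):
                component.append(tuple(w))  # joins; the spread grows
            else:
                ang = rng.uniform(0, 2 * math.pi)
                w[0] = min(side, max(0.0, w[0] + step * math.cos(ang)))
                w[1] = min(side, max(0.0, w[1] + step * math.sin(ang)))
                still_searching.append(w)
        walkers = still_searching
        if not walkers:
            return t
    return max_steps
```

Averaging `coalescence_time(n)` over many seeds for increasing n is one way to probe the 1/N trend claimed above, though this sketch makes no guarantee of reproducing the exact scaling.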

Figure 7: Performance of the incremental and distributed deployment algorithms for a deployment of 100 nodes in terms of (a) coverage and (b) edge connectivity.

Figure 8: Comparison of the communication graph and the corresponding RNG [distributed deployment, 100 nodes, θ = 2π/3]: (a) communication graph, (b) RNG, (c) dashed lines indicate edges in the RNG that are not in the communication graph; dark lines indicate edges in the communication graph that are not in the RNG.

8 Conclusions

This paper presents an extensive survey of research on coverage optimization in sensor networks. The problem is related to several problems in computational geometry, such as facility location, the art-gallery problem, and packing and filling. We argue that many sensor network deployments will involve careful placement of nodes, as against random high-density deployments. For such networks, we show that efficient networks can be constructed by exploiting placement control. We present our recent results on three fronts: 1) self-deployment of mobile sensor networks, 2) local conditions to guarantee global coverage and connectivity, and 3) analysis of coalescence for mobile networks.

References

[Ando et al., 1999] H. Ando, T. Oasa, J. Suzuki, and M. Yamashita. Distributed memoryless point convergence algorithm for mobile robots with limited visibility. IEEE Transactions on Robotics and Automation, pages 818–828, Oct. 1999.

[Batalin and Sukhatme, 2004] M. Batalin and G. Sukhatme. Coverage, exploration and deployment by a mobile robot and communication network. Telecommunication Systems Journal, Special Issue on Wireless Sensor Networks, 26(2):181–196, 2004.

[Brendin et al., 2005] J. Brendin, E. Demaine, M. T. Hajiaghayi, and D. Rus. Deploying sensor nets with guaranteed capacity and fault tolerance. In ACM Intl. Symposium on Mobile Ad Hoc Networking and Computing, MobiHoc'05, Urbana-Champaign, IL, May 2005. To appear.

[Cerpa et al., 2005] Alberto Cerpa, Jennifer L. Wong, Louane Kuang, Miodrag Potkonjak, and Deborah Estrin.
Statistical model of lossy links in wireless sensor networks. In IPSN, 2005.

[Corke et al., 2004] P. I. Corke, S. E. Hrabar, R. Peterson, D. Rus, S. Saripalli, and G. S. Sukhatme. Autonomous deployment and repair of a sensor network using an unmanned aerial vehicle. In IEEE Intl. Conf. on Robotics and Automation, pages 3602–3609, Apr 2004.

[Cortés et al., 2004] J. Cortés, S. Martínez, T. Karatas, and F. Bullo. Coverage control for mobile sensing networks. IEEE Transactions on Robotics and Automation, 20(2):243–255, April 2004.

[Cortes et al., in submission] J. Cortes, S. Martinez, and F. Bullo. Robust rendezvous for mobile autonomous agents via proximity graphs in d dimensions. IEEE Transactions on Automatic Control, in submission.

[Gage, 1992] D. W. Gage. Command control of many-robot systems. In AUVS, pages 28–34, Huntsville, AL, 1992.

[Gerkey et al., 2002] B. Gerkey, R. T. Vaughan, A. Howard, G. Sukhatme, and M. J. Mataric. Most valuable player: A robot device server for distributed control. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, IROS'02, pages 299–308, 2002.

[Gupta and Kumar, 2003] P. Gupta and P. R. Kumar. Towards an information theory of large networks: An achievable rate region. IEEE Transactions on Information Theory, 49(8):1877–1894, Aug. 2003.

[Howard et al., 2002a] A. Howard, M. J. Mataric, and G. S. Sukhatme. An incremental self-deployment algorithm for mobile sensor networks. Autonomous Robots, Special Issue on Intelligent Embedded Systems, 13(2):113–126, 2002.

[Howard et al., 2002b] A. Howard, M. J. Mataric, and G. S. Sukhatme. Mobile sensor network deployment using potential fields: A distributed, scalable solution to the area coverage problem. In Distributed Autonomous Robotic Systems, pages 299–308, 2002.

[Kaiser et al., 2004] W. Kaiser, G. Pottie, M. Srivastava, G. S. Sukhatme, J. Villasenor, and D. Estrin. Networked infomechanical systems (NIMS) for ambient intelligence. Ambient Intelligence, Springer-Verlag, 2004.

[Kazazakis and Argyros, 2002] G. D. Kazazakis and A. A. Argyros. Fast positioning of limited-visibility guards for the inspection of 2D workspaces. In IEEE International Conference on Intelligent Robots and Systems, 2002.

[Li, 2003] X. Li. Topology control in wireless ad hoc networks. Book chapter in "Ad Hoc Wireless Networking", Kluwer, edited by XiuZhen Cheng, Xiao Huang and Ding-Zhu Du, 2003.

[Lin et al., 2003] J. Lin, A. S. Morse, and B. D. O. Anderson. The multi-agent rendezvous problem. In International Conference on Decision and Control, pages 1508–1513, 2003.

[Meguerdichian et al., 2001] S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava. Coverage problems in wireless ad-hoc sensor networks. In IEEE Infocom, 2001.

[Okabe and Suzuki, 1997] A. Okabe and A. Suzuki. Locational optimization problems solved through Voronoi diagrams. Eur. Journal of Operational Research, 98(3):445–456, 1997.

[O'Rourke, 1987] J. O'Rourke. Art Gallery Theorems and Algorithms. Oxford University Press, 1987.

[Poduri and Sukhatme, 2004] S. Poduri and G. Sukhatme. Constrained coverage for mobile sensor networks. In IEEE Intl. Conf. on Robotics and Automation, pages 165–172, May 2004.

[Poduri and Sukhatme, 2007] S. Poduri and G. S. Sukhatme. Latency analysis of coalescence in robot groups. 2007.

[Poduri et al., 2005] Sameera Poduri, Sundeep Pattem, Bhaskar Krishnamachari, and Gaurav S. Sukhatme. A unifying framework for tunable topology control in sensor networks. Technical Report CRES-05-004, University of Southern California, Center for Robotics and Embedded Systems, 2005.

[Roy and Dudek, 2001] N. Roy and G. Dudek. Collaborative exploration and rendezvous: Algorithms, performance bounds and observations. Autonomous Robots, 11(2):117–136, Sept. 2001.

[Santi, 2003] P. Santi. Topology control in wireless ad hoc and sensor networks. Technical Report IIT-TR-04/2003, Institute for Informatics and Telematics, Pisa, Italy, 2003.

[Spanos and Murray, 2004] D. Spanos and R. M. Murray. Robust connectivity of networked vehicles. In International Conference on Decision and Control, pages 2893–2898, 2004.

[Wang et al., 2003a] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill. Integrated coverage and connectivity configuration in wireless sensor networks. In First ACM Conf. on Embedded Networked Sensor Systems, Sensys'03, Nov 2003.

[Wang et al., 2003b] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill. Integrated coverage and connectivity configuration in wireless sensor networks. In First ACM Conf. on Embedded Networked Sensor Systems, Sensys'03, Nov 2003.

[Zavlanos and Pappas, 2005] M. M. Zavlanos and G. J. Pappas. Controlling connectivity of dynamic graphs. In International Conference on Decision and Control, pages 6388–6393, 2005.

[Zhang and Hou, 2005] H. Zhang and J. C. Hou. Maintaining coverage and connectivity in large sensor networks. The Wireless Ad Hoc and Sensor Networks: An Intl. Journal, 2005.

[Zuniga and Krishnamachari, 2004] M. Zuniga and B. Krishnamachari. Analyzing the transitional region in low power wireless links. In First IEEE Intl. Conf. on Sensor and Ad Hoc Communications and Networks (SECON), Santa Clara, CA, Oct 2004.

Glowworms-inspired multi-robot system for multiple source localization tasks

P. Amruth, K.N. Krishnanand, and D. Ghose


Guidance, Control, and Decision Systems Laboratory,
Department of Aerospace Engineering,
Indian Institute of Science, Bangalore.

Abstract

We address the problem of multiple source localization, where a collection of mobile robots is used to localize multiple radiation sources such as nuclear/hazardous aerosol spills, origins of a fire calamity, and hostile transmitters/sensors scattered on a spatial landscape. Our approach consists of a distributed algorithm based on a glowworm swarm optimization (GSO) technique that finds solutions to multimodal function optimization problems. We present a multi-robot system - a set of four wheeled robots called glowworms - that can be used to perform multiple source localization tasks. Implementation of the glowworm algorithm requires each robot, during each cycle, to perform tasks such as computation of the local-decision domain, identification and localization of neighbors, and selection of a leader among the current neighbors. Preliminary experiments validate each robot's capability to perform these basic algorithmic primitives. Finally, we describe the results of a sound localization experiment with a set of four glowworms.

1 Introduction

Localization of multiple emission sources using networked mobile robots has received some attention recently [Cui et al., 2004] in the collective robotics community. In particular, the goal of multiple source localization is to use a collection of mobile robots to localize multiple sources of a general nutrient profile that is distributed spatially on a two-dimensional landscape. The problem is representative of a wide variety of applications that include detection of multiple radiating sources such as nuclear/hazardous aerosol spills, origins of a fire calamity, and hostile sensors/transmitters that are scattered over a region. We assume that the nature of the nutrient profile is such that the concentration of the nutrient peaks at the source location and monotonically decreases with distance from the source. We also assume that the concentration levels at different source locations need not necessarily be equal. The above problem is non-trivial for several reasons, which can be summarized as follows:

• Non-availability of a priori information.
• Interference between sources: a source may subsume the influence of another source whose concentration level is relatively small.
• Environment infested with forbidden regions/obstacles.
• Lack of gradient information.
• Complex source placements.
• Discontinuities in the nutrient profile.
• Noisy measurements.

Different aspects of the same problem belong to topics such as multi-agent rendezvous, multimodal function optimization, and bio-inspired algorithms.

1.1 Multi-agent Rendezvous

The theoretical aspects of the problem are related to the problem of consensus (agreement) in multi-agent networks [Olfati-Saber and Murray, 2004], where agents transit from an initially random state to a final steady state in which they agree upon their individual state values. The state could be position, heading angle, frequency of oscillation, etc. The agreed-upon quantity is usually called the group decision value. Multi-agent rendezvous [Ando et al., 1999] is a specific instance of the consensus problem where the goal is to design control laws that enable a group of randomly deployed mobile agents to steer towards, and meet at, a common destination. Multiple source localization is a variation of the above problem, as it involves co-location of agents at multiple positions instead of convergence to a single location. Kaipa and Ghose [2006b] report the theoretical foundations of the above problem, where they address the rendezvous of visibility-limited mobile agents at multiple locations.

1.2 Multimodal Function Optimization

Multimodal function optimization has been addressed extensively in the recent literature [Parsopoulos and Vrahatis, 2004], [Kaipa and Ghose, 2006a]. Most prior work on this topic focussed on developing algorithms to find either the global optimum or all the global optima of the given multimodal function, while avoiding local optima. However, there is another class of optimization problems which is different from the problem of finding only the global optimum of a multimodal function. The objective of this class of

multimodal optimization problems is to find multiple optima having either equal or unequal function values [Kaipa and Ghose, 2006a]. In particular, we are interested in developing an algorithm which, while serving for numerical multimodal function optimization on one hand, could be directly used in a realistic collective robotics task of simultaneously localizing multiple sources of interest like nuclear spills, aerosol/hazardous chemical leaks, and fire-origins in a fire calamity. For instance, several forest fires at different locations give rise to a temperature profile that peaks at the locations of the fire. A similar phenomenon can be observed in nuclear radiation and electromagnetic radiation from signal sources. In all these situations, there is an imperative need to simultaneously identify and neutralize all the multiple sources using a swarm of robots before they cause great loss to the environment and the people in the vicinity.

1.3 Bio-inspired Algorithms

Population-based algorithms can be broadly divided into two categories: evolutionary computation (EC) techniques [Goldberg and Richardson, 1987], which are based on evolutionary mechanisms encountered in natural selection, and swarm intelligence (SI) methods [Dorigo and Stutzle, 2004], which are inspired by flocking mechanisms found in biological swarms like ants, termites, bees, wasps, and bacteria. The intelligent group behavior exhibited by these social species is a result of actions performed by relatively simple individuals, based solely on neighbor interactions and local information from the environment inhabited by the agent-collective. Usually, the emergent behavior is exploited in an appropriate manner in order to perform certain complex tasks. This behavioral metaphor offers a basis to devise distributed algorithms that solve complex problems in diverse fields such as optimization, multi-agent decision making, and collective robotics. Recent literature abounds with examples of such biomimetic algorithms, including ant colony optimization (ACO) techniques, particle swarm optimization (PSO) algorithms, and several swarm-based collective robotic algorithms. While the application of evolutionary approaches is largely limited to numerical optimization problems, the particle nature of individuals in swarm-based optimization algorithms enables their application to realistic collective robotics tasks.

In this paper, we present a multi-robot system that implements a distributed algorithm to achieve multiple source localization tasks. The algorithm is based on a novel glowworm swarm optimization (GSO) technique, originally developed by Kaipa and Ghose [2006a] to find solutions to multimodal function optimization problems. The GSO algorithm [Kaipa and Ghose, 2006a] is in the same spirit as ACO [Dorigo and Stutzle, 2004] techniques but differs in many aspects that help in achieving simultaneous detection of multiple local optima of multimodal functions. Generally, ACO techniques are used for locating global optima; the objective of GSO, however, is to locate as many of the peaks as possible. In the GSO algorithm, the agents are initially deployed randomly in the objective function space. The agents carry a luminescence quantity called luciferin along with them and are thought of as glowworms that emit a light whose intensity is proportional to the associated luciferin. Each glowworm uses the luciferin to (indirectly) communicate the function-profile information at its current location to its neighbors. The glowworms depend on an adaptive local-decision domain, which is bounded above by a circular sensor range, to compute their movements. Each glowworm selects a neighbor that has a luciferin value higher than its own, using a probabilistic mechanism, and moves towards it; that is, glowworms are attracted to neighbors that glow brighter. These movements, based only on local information, enable the glowworms to split into subgroups and exhibit a simultaneous taxis-behavior towards the optimum locations, leading to the capture of multiple optima of the given objective function. Concepts underlying the basic GSO algorithm and preliminary real-robot implementation results can be found in [Kaipa and Ghose, 2005] and [Kaipa et al., 2006], respectively. This paper presents crucial modifications:

1. The neighbor-selection probability is made a function of relative luciferin (the luciferin difference), instead of absolute luciferin values.

2. The adaptive local-decision range update rule is modified, with improvements in performance by a factor of approximately two.

3. The robots used in this paper use a circular ring of 16 infrared transmitters, as opposed to a combination of four vertical transmitters and a conical reflector [Kaipa et al., 2006], to achieve a near-circular pattern of illumination around the robot.

We present results of preliminary experiments conducted to test each robot's capability to implement the various subtasks of the algorithm. Later, we demonstrate a real-robot implementation of the glowworm algorithm in the context of a sound localization task.

2 Glowworm Algorithm

The GSO algorithm starts by placing the glowworms randomly in the workspace so that they are well dispersed. Initially, all the glowworms contain an equal quantity of luciferin (ℓj(0) = ℓ0, ∀j). Each iteration consists of a luciferin-update phase followed by a movement phase based on a transition rule [Kaipa and Ghose, 2006a].

Luciferin-update phase: The luciferin update rule is given by

ℓj(t + 1) = (1 − ρ)ℓj(t) + γJj(t + 1)    (1)

where ρ is the luciferin decay constant (0 < ρ < 1), γ is the luciferin enhancement constant, and Jj(t) represents the value of the objective function at agent j's location at time t. While the first term in (1) represents the decaying nature of the luciferin, which indirectly allows the agents to escape inferior regions and move towards promising regions of the objective function space, the second term represents the fitness of agent j's location at time t + 1.

Movement phase: During the movement phase, each glowworm decides, using a probabilistic mechanism, to move towards a neighbor that has a luciferin value higher than its own.
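One sensing-decision-action cycle of GSO can be sketched in Python as follows. This is an illustrative rendering under our own assumptions (a fixed decision range and arbitrary constants ρ, γ and step size), not the authors' implementation; the paper makes the decision range adaptive, and, per the modification listed above, the neighbor-selection probability is proportional to the luciferin difference.

```python
import math, random

# Assumed demonstration constants: decay rho, enhancement gamma,
# step size, and a fixed decision range (the paper adapts this range).
RHO, GAMMA, STEP, R_D = 0.4, 0.6, 0.03, 2.0

def gso_cycle(pos, luc, J, rng=random):
    """One GSO iteration: luciferin update (Eq. 1), then each agent
    probabilistically follows a brighter neighbor within R_D, with
    selection probability proportional to the luciferin difference."""
    luc = [(1 - RHO) * l + GAMMA * J(x, y) for l, (x, y) in zip(luc, pos)]
    new_pos = []
    for i, (x, y) in enumerate(pos):
        nbrs = [j for j, (xj, yj) in enumerate(pos)
                if j != i and math.hypot(xj - x, yj - y) < R_D
                and luc[j] > luc[i]]
        if not nbrs:
            new_pos.append((x, y))        # no brighter neighbor: stay put
            continue
        weights = [luc[j] - luc[i] for j in nbrs]
        j = rng.choices(nbrs, weights=weights)[0]
        d = math.hypot(pos[j][0] - x, pos[j][1] - y)
        new_pos.append((x + STEP * (pos[j][0] - x) / d,
                        y + STEP * (pos[j][1] - y) / d))
    return new_pos, luc
```

Iterating `gso_cycle` on a multimodal objective J moves subgroups of agents toward distinct peaks, which is the behavior the algorithm is designed to exhibit.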
Figure 1: Emergence of the solution after 200 iterations for the following cases: (a) the decision range is kept constant with r_d^i = 2 (only one peak is captured); (b) the decision range is made adaptive according to (4), with r_d^i(0) = 3 (all three peaks are captured).

For each glowworm i, the probability of moving towards a neighbor j is given by:

p_j(t) = (ℓj(t) − ℓi(t)) / Σ (over k ∈ Ni(t)) (ℓk(t) − ℓi(t))    (2)

where j ∈ Ni(t), Ni(t) = {j : di,j(t) < r_d^i(t); ℓi(t) < ℓj(t)}, t is the time (or step) index, di,j(t) is the Euclidean distance between glowworms i and j at time t, ℓj(t) is the luciferin level associated with glowworm j at time t, r_d^i(t) is the variable local-decision range associated with glowworm i at time t, and r_s is the radial range of the luciferin sensor. Let glowworm i select a glowworm j ∈ Ni(t) with p_j(t) given by (2). Then the discrete-time model of the glowworm movements can be stated as:

xi(t + 1) = xi(t) + s · (xj(t) − xi(t)) / ‖xj(t) − xi(t)‖    (3)

where s is the step size.

Local-decision range update rule: Since we assume that a priori information about the objective function is not available, the sensor range must be made a varying parameter in order to detect multiple peaks. For this purpose, we associate each agent i with a local-decision domain whose radial range r_d^i is dynamic in nature (0 < r_d^i ≤ r_s^i). A suitable function is chosen to adaptively update the local-decision domain range of each glowworm:

r_d^i(t + 1) = min{r_s, max{0, r_d^i(t) + β(n_t − |Ni(t)|)}}    (4)

where β is a constant parameter and n_t is a threshold parameter used to control the number of neighbors. The advantage of using an adaptive local-decision domain, instead of a constant decision domain, for the capture of multiple peaks is illustrated in the simulation example shown in Figure 1. The following multimodal function, which has three peaks at locations (−0.0093, 1.5814), (1.2857, −0.0048), and (−0.46, −0.6292), is chosen for the purpose:

J(x, y) = 3(1 − x)² exp(−x² − (y + 1)²) − 10(x/5 − x³ − y⁵) exp(−x² − y²) − (1/3) exp(−(x + 1)² − y²)    (5)

A set of 50 glowworms is randomly deployed in a two-dimensional workspace of size 6 × 6 square units. When the decision range is kept constant (r_d^i = 2), only the global peak is captured (Figure 1(a)). Although not shown, for r_d^i = 1.8 and 1.5, two and three peaks are captured, respectively. However, when the decision range is made adaptive, all the peaks are captured (Figure 1(b)) irrespective of the chosen value of r_d^i(0).

Figure 2: Wheeled-mobile robot platform used for the multiple source localization experiments. The labeled hardware modules are the luciferin receiver, sweep platform, luciferin transmitter, intensity sensor, infrared transmitters (emission in a near-circular region), sound pick-up sensor, and differential mobile platform.

3 Glowworms

Four small wheeled robots christened Glowworms (named after the glowworm algorithm) were built to conduct our experiments. Implementation of each sensing-decision-action cycle of the GSO algorithm requires the glowworms to perform subtasks such as relative localization of neighbors, selection of a leader among current neighbors, updating of the associated luciferin and local-decision range, and making a step-movement towards the selected leader. The various hardware modules of each glowworm that are used to achieve the above tasks are shown in Figure 2.

3.1 Relative Localization of Neighbors

A circular array of sixteen infrared transmitters placed radially outward is used as the glowworm's beacon to obtain a near-circular emission pattern around the robot. A photodiode pair mounted on a rotary platform performs a 180° sweep and records the intensity of the received infrared light as a function of the angle made by the direction of the source with the heading direction of a Glowworm. Local peaks in the intensity pattern indicate the presence and location of other glowworms. The received intensity of infrared light follows an inverse square law with respect to distance, which is used to compute range information to the others. The angular position of a peak is approximated as the relative angle to a neighbor. Even though the Glowworm locates all others within the per-
34
3.5
90
Case 1 50
120 60

Standard deviation of intensity ( σ (θ) )


40
3

I
30
150 30
Case 2 20

10 2.5

180 0

210 330
1.5

240 300
270
1
0 50 100 150 200 250 300 350
θ (degrees)

Figure 3: Polar plot of the intensity pattern for the two cases Figure 5: Plot of standard deviation of received intensity as a
of ambient light conditions function of angle for the second case

3.5 90
200
120 60
Standard deviation of intensity ( σI (θ) )

150
3
150 100 30

50
2.5

180 0

210 330
1.5

240 300
270
1
0 50 100 150 200 250 300 350
θ (degrees)

Figure 4: Plot of standard deviation of received intensity as a Figure 6: Intensity pattern of a glowworm when a single
function of angle for the first case neighbor is placed in its vicinity. Ambient light conditions
of case 2 are considered.
ception range of the distance-sensor (excepting those that are
eclipsed by other glowworms), it identifies them as neighbors bor. While the peak intensity is used to approximate distance
only when they are located within its current variable local- to the neighbor, the angular position of the peak gives the
decision domain. angle to the neighbor. Figures 7 and 8 show the intensity pat-
Ambient light is a major source that alters the received in- terns when a glowworm consists of two, three, four, and five
tensity pattern of a glowworm from the infrared beacon rings neighbors, respectively. We observe that detection of neigh-
of its neighbors. Therefore, we initially characterize the in- bors becomes difficult when the number of neighbors is more
tensity pattern of a glowworm in the absence of neighbors and than six as the situation leads to merging of neighboring lobes
presence of only ambient daylight. A set of 30 experimental in the intensity pattern. However, the above limitation doesn’t
trials is carried out and the average behavior of the intensity affect the performance of the algorithm since the parameter
pattern is shown as a polar plot in Figure 3 (case 1). Figure nt (number of threshold neighbors) is always chosen as less
4 shows the standard deviation of intensity as a function of than five in number.
angle. The average intensity and standard deviation averaged
over all angles are 32.94 and 1.97, respectively. Later, we 3.2 Luciferin glow/Reception
minimize the amount of ambient light by using black walls The glow consists of an infrared light modulated by an 8-
around the boundary of the workspace and by conducting the bit serial binary signal that is proportional to the glowworms
same experiment during night conditions (Figure 3, case 2). luciferin value at the current sensing-decision-action cycle.
Note that there is a substantial reduction in the intensities in Four emitters that are mounted symmetrically about the glow-
all directions (Iav (case2) = 3.56 ≈ 0.11 × Iav (case1)). worms central axis are use to broadcast luciferin data in the
Therefore, all further experiments are conducted under ambi- Glowworms neighborhood. Two infrared receivers mounted
ent light conditions of case 2. Figure 5 shows the standard de- on a sweep-platform are used as luciferin receptors. In order
viation of intensity as a function of angle for case 2. Note that to avoid problems due to interference between data signals
the average standard deviation is 1.97, which is the same as coming from different neighbors, the receiver sweeps and
that of case 1. Figure 6 shows the intensity pattern of a glow- aligns along the line-of-sight with a neighbor before reading
worm when a single neighbor is placed in its vicinity. The the luciferin data transmitted by it. Using the above scheme,
presence of a single peak in the intensity pattern enables the a minimum threshold separation of 10 cm between neighbors
glowworm to detect the condition that it has a single neigh- was observed to be sufficient in order to distinguish data com-
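To make the GSO update rules concrete, the movement step of Equation (3) and the local-decision range update of Equation (4) can be sketched in a few lines of Python. This is a minimal illustrative sketch, not code from the paper; the function names and the parameter values (s, r_s, β, n_t) are our own assumptions.

```python
import numpy as np

def step_towards(x_i, x_j, s):
    """Movement update, Eq. (3): take a step of size s towards leader j."""
    d = x_j - x_i
    return x_i + s * d / np.linalg.norm(d)

def update_decision_range(r_d, n_neighbors, r_s=3.0, beta=0.08, n_t=5):
    """Local-decision range update, Eq. (4): shrink the range when the
    neighborhood is crowded (|N_i| > n_t), grow it when it is sparse."""
    return min(r_s, max(0.0, r_d + beta * (n_t - n_neighbors)))

# One cycle: a glowworm at the origin follows a brighter neighbor at (3, 4).
x_new = step_towards(np.array([0.0, 0.0]), np.array([3.0, 4.0]), s=0.03)
r_new = update_decision_range(r_d=2.0, n_neighbors=2)
```

With only two neighbors (below the threshold n_t = 5), the decision range grows towards the hardware limit r_s, which is what lets the swarm split across multiple peaks.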
Figure 7: Intensity pattern of a glowworm when two and three neighbors are placed in its vicinity. Ambient light conditions of case 2 are considered.

Figure 8: Intensity pattern of a glowworm when four and five neighbors are placed in its vicinity. Ambient light conditions of case 2 are considered.

4 Robot Experiments

4.1 Sensing-Decision-Action Cycle

We describe a preliminary experiment which demonstrates the basic behavioral primitives used by a glowworm to taxis towards source locations. A set of three static beacons (labelled A, B, and C in Figure 9) and two glowworms (labelled G1 and G2) are used for this experiment. The purpose of a static beacon is to emulate the presence of a neighbor that emits a luciferin value associated with its location in a near-circular region. The static beacons A, B, and C emit luciferin values of 120, 48, and 80, respectively. The glowworms G1 and G2 are associated with constant luciferin values of 90 and 32, respectively. For simplicity, we use a maximum-neighbor selection rule in which a glowworm selects to move towards the neighbor that emits the maximum luciferin. The hierarchy of agents (beacons/glowworms) with respect to their luciferin values is given by ℓ_G2 < ℓ_B < ℓ_C < ℓ_G1 < ℓ_A. Since beacon A emits the maximum luciferin and G1 is next in the luciferin hierarchy, G1 makes deterministic movements towards beacon A. At initial placement, the glowworm G2 has G1, B, and C as neighbors within its decision range. The luciferin value associated with G2 is less than that of the static beacons B and C. However, it selects to move towards G1, which is its neighbor with maximum luciferin. Therefore, both glowworms reach the static beacon A in a finite number of step-movements.

Figure 9: A preliminary experiment to demonstrate the sensing-decision-action cycle of a glowworm.

4.2 Sound Source Localization

In this experiment, the glowworms localize a sound source: a loudspeaker activated by a square-wave signal of frequency 28 Hz. A microphone-based sound sensor enables each glowworm to measure the intensity of sound at its current location. We place a dummy glowworm (A) near the source, which is kept stationary but made to emit luciferin proportional to the intensity measurement at its location, and glowworms B, C, and D away from the source. Since A is already located at the source, it doesn't get a direction to move and therefore remains stationary at the source location. Initially, since B is in the vicinity of C (while A is not), C moves towards B. However, as C gets closer to B, it senses A and hence changes direction in order to move towards A. Since D is closer to A, it makes deterministic movements towards A at every step. In this manner, the glowworms eventually localize the sound source. Snapshots from a video of the above experiment are shown in Figure 10.

A dummy glowworm was placed at the source location only for the purpose of demonstration and is not a necessary requirement for the algorithm to work. According to the glowworm algorithm, a glowworm with the maximum luciferin at a particular iteration remains stationary during that iteration. Ideally, this property leads to a deadlock situation when the glowworms are located such that the peak location lies outside the convex hull formed by the glowworm positions. Since the agent movements are restricted to the interior region of the convex hull, all the glowworms converge to the glowworm that attains the maximum luciferin value during its movements within the convex hull. As a result, all the glowworms get co-located away from the peak location. However, the discrete nature of the movement-update rule automatically takes care of this problem, which can be described in the following way. During the movement phase, each glowworm moves a distance of finite step size s towards a neighbor. Hence, when a glowworm i approaches a glowworm j that is located nearest to a peak, such that the inter-agent distance becomes less than s, i crosses the position of j and becomes a leader to j.
In the next iteration, i remains stationary and j crosses the position of i, thus regaining its leadership. This process of interchanging roles between i and j repeats until they reach the peak. This phenomenon was supported by simulations in [Kaipa and Ghose, 2006a]. However, since real robots cannot move over each other, the robot's behavior should be modified such that it emulates the above phenomenon when it encounters another robot. Real-robot experiments are in progress and we will report more results in the final version of the paper.

Figure 10: Demonstration of a sound-source localization task (snapshots at t = 0, 7, 10, 21, 28, 32, 36, and 41).

5 Concluding Remarks

We describe a multi-robot system consisting of four wheeled robots, called glowworms, that have the necessary hardware potential to conduct multiple-source localization experiments. As inexpensive sensors are used for the neighbor localization and communication tasks, the experiments need to be conducted with sufficient care in order to obtain good-quality results. For instance, we have shown through experiments that, in spite of using cheap infrared-based sensors, we are able to achieve good neighbor localization by properly conditioning the ambient light. The preliminary sound localization experiment validates the suitability of the glowworm algorithm and the multi-robot system for source localization tasks. Ongoing work includes the construction of twenty-six miniature-sized glowworms and the implementation of localization tasks with multiple sound sources.

References

[Ando et al., 1999] H. Ando, Y. Oasa, I. Suzuki, and M. Yamashita. Distributed memoryless point convergence algorithm for mobile robots with limited visibility. IEEE Transactions on Robotics and Automation, 15:818–828, 1999.

[Cui et al., 2004] X. Cui, C. T. Hardin, R. K. Ragade, and A. S. Elmaghraby. A swarm approach for emission sources localization. In Proceedings of the 16th International Conference on Tools with Artificial Intelligence, pages 424–430, Boca Raton, Florida, November 2004.

[Dorigo and Stützle, 2004] Marco Dorigo and Thomas Stützle. Ant Colony Optimization. MIT Press, Cambridge, Massachusetts, 2004.

[Goldberg and Richardson, 1987] D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the International Conference on Genetic Algorithms and their Applications (ICGA '87), pages 41–49, 1987.

[Kaipa and Ghose, 2005] Krishnanand N. Kaipa and Debasish Ghose. Detection of multiple source locations using a glowworm metaphor with applications to collective robotics. In Proceedings of the IEEE Swarm Intelligence Symposium, pages 84–91, Pasadena, CA, June 2005.

[Kaipa and Ghose, 2006a] Krishnanand N. Kaipa and Debasish Ghose. Glowworm swarm based optimization algorithm for multimodal functions with collective robotics applications. Multi-agent and Grid Systems, 2, 2006 (to appear).

[Kaipa and Ghose, 2006b] Krishnanand N. Kaipa and Debasish Ghose. Theoretical foundations for multiple rendezvous of glowworm-inspired mobile agents with variable local-decision domains. In Proceedings of the American Control Conference, pages 3588–3593, Minneapolis, Minnesota, June 2006.

[Kaipa et al., 2006] Krishnanand N. Kaipa, Amruth Puttappa, Guruprasad M. Hegde, Sharschchandra V. Bidargaddi, and Debasish Ghose. Glowworm-inspired robot swarm for simultaneous taxis towards multiple radiation sources. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 958–963, Orlando, Florida, May 2006.

[Olfati-Saber and Murray, 2004] Reza Olfati-Saber and Richard M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49:1520–1533, 2004.

[Parsopoulos and Vrahatis, 2004] K. E. Parsopoulos and M. N. Vrahatis. On the computation of all global minimizers through particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8:211–224, 2004.
Efficient Communication Strategy for Cooperative Multirobot Localization∗
Valguima Odakura
Anna Helena Reali Costa
Laboratório de Técnicas Inteligentes - LTI
Depart. de Eng. de Computação e Sistemas Digitais - PCS
Escola Politécnica da Universidade de São Paulo - EPUSP
{valguima.odakura, anna.reali}@poli.usp.br

Abstract

The cooperative multirobot localization problem consists in localizing each robot in a group within the same environment, when robots share information in order to improve localization accuracy. This paper proposes a communication strategy to be used in a cooperative multirobot localization approach, aiming at reducing the amount of information shared among robots without degrading the accuracy of their pose estimation. The key idea is that a robot should only communicate when it acquires useful information to provide to the other robots in the group, or when it asks for extra information to be able to localize itself. Experimental results show that it is possible to reduce the amount of data exchanged among the robots and still obtain accurate localization.

∗This work was supported by Fapesp grant 01/14246-4, Capes grant BEX 0803/03-0, CNPq and Capes/Grices 099/03.

1 Introduction

Research in multirobot systems aims at providing robots with capabilities to perform tasks that could be dangerous, need very accurate results, or are time-consuming for human beings. The possibility of having a group of robots perform tasks such as search and rescue or surveillance has motivated developments in cooperative multirobot localization, since localization is one of the main abilities an autonomous mobile robot should have.

Cooperation in multirobot systems can be achieved by transferring information across multiple robots, which raises some issues: what kind of information is important to send to other robots, how a robot should incorporate the received information into its own belief about its state, and when to send this information. In the multirobot localization research area there are some works about the first two issues, but not much about the third. The information shared among robots in a group is their pose beliefs and the location of another robot relative to its own, and this information is integrated into their pose beliefs by using probabilistic approaches [Fox et al., 2000], [Roumeliotis and Bekey, 2002], [Madhavan et al., 2002] and [Howard et al., 2003]. The best time for a robot to send information is related to the time new information becomes available to each robot, which means that a great amount of information can be shared among them.

In this paper we propose a communication strategy that aims at reducing the amount of information transferred across robots without reducing the accuracy of their pose estimation. The key idea is that a robot should only communicate when it acquires useful information to provide to the other robots in the group, or when it asks for extra information to be able to localize itself.

This paper is organized as follows. Section 2 presents the basic statistical mechanisms for cooperative multirobot localization. Section 3 shows some useful information that can be shared among robots in order to improve the localization accuracy. Our strategy for sharing less data among multiple robots is presented in Section 4. Experimental results are reported in Section 5. Finally, our conclusions are drawn in Section 6.

2 Cooperative Multirobot Localization

The ability to cooperate by exchanging information during localization is particularly attractive for two main reasons: it may be easier and less expensive to use several simple cooperative robots than a single powerful one to accomplish a difficult task, and it may be more efficient and faster for a group of robots to cover vast areas than a single one [Cao et al., 1997]. In order to accomplish tasks more quickly and robustly than a single robot, multiple robots can be used, which poses the multirobot localization problem.

The cooperative multirobot localization problem consists in localizing each robot in a group within the same environment, when robots can detect each other and exchange information in order to improve their localization accuracy.

Representative recent works in cooperative multirobot localization are by Roumeliotis and Bekey (2002) and Fox et al. (2000), who use a Kalman filter and a particle filter as localization algorithms, respectively. Probabilistic approaches have been used with success in the localization problem. Markov localization (ML) was initially designed for a single robot [Fox et al., 1999], and an extension of ML that aims at solving the multirobot problem is presented by Fox et al. (2000).

In ML, p(x_t = x) denotes the robot's belief over its pose x at time t, where x_t is a random variable representing the robot's pose at time t, and x = (x, y, θ) is the pose composed of the robot's (x, y) position and its heading θ.
robot’s pose at time t, and x = (x, y, θ) is the pose composed has been detected, it blocks the ability to see the same robot
of the robot’s (x, y) position and its heading θ. This robot’s again until the detecting robot has traveled a pre-defined dis-
pose belief represents a probability distribution over all the tance. This communication significantly improves the local-
space of xt . ization accuracy, if compared to a less communicative local-
ML uses two models to localize a robot: a motion model ization approach.
and an observation model. The motion model is specified as
a probability distribution p(xt = x|xt−1 = x0 , at−1 ), where 3 Refining Localization Estimates
xt is a random variable representing the robot’s pose at time t,
at is the action or movement executed by the robot at time t. Cooperative multirobot localization takes advantage of robot
The movement can be estimated, for example, by odometers detection measurements by exploiting relative information
on the wheels. In ML the update equation is described as: between different robots’ poses [Roumeliotis and Bekey,
X 2002], [Fox et al., 2000]. Every time one robot detects each
p̄(xt = x) = p(xt = x|xt−1 = x0 , at−1 )p(xt−1 = x0 ), other, they communicate and share their pose beliefs. Fox
x0 et al. (2000) propose the use of the detection model to per-
(1) form pose update only for the two meeting robots. However,
where p̄(xt = x) is the probability density function before another way to improve the localization accuracy would be
incorporating sensor measurements derived from the obser- by broadcasting detection information provided by the two
vation of the environment at time t. meeting robots to other robots in the group; and by using
The observation model is used to incorporate data captured negative sights, i.e., events where a robot does not see an-
by exteroceptive sensors, such as proximity sensors and cam- other robot. Another situation occurs when a robot see more
eras, and it is expressed as p(xt = x|ot ), where ot is an ob- than one robot at the same time, the multidetection situation.
servation sensed at time t. The update equation is described In the next section these techniques to refine localization es-
as: timates are detailed.
p(ot |xt = x)p̄(xt = x)
p(xt = x) = P , (2)
x0 p(ot |xt = x )p̄(xt = x )
0 0 3.1 Broadcasting Detection Information
where p(xt = x) is the probability density function after in- The key idea in broadcasting detection information to other
corporating observations of the environment at time t. robots in a group is that all of them can benefit from the
At the beginning, p(x0 = x) is the prior belief over the shared information derived from a single detection (when
initial pose of the robot. If the initial pose is unknown, robot n meets robot m) [Odakura and Costa, 2005].
p(x0 = x) is initialized by a uniform distribution. Suppose robots m, n and k in a group. When robot n meets
In order to accommodate multirobot cooperation in ML it robot m, if they transfer their pose estimates to robot k, the
is necessary to add a detection model to the previous obser- latter can use this information to constrain its pose estimate,
vation and motion models. The detection model [Fox et al., by excluding the belief of being at m and n position, since
2000] is based on the assumption that each robot is able to de- only one robot can occupy a location in the environment at
tect and identify each other and furthermore, the robots can a time. Robot k can also conclude that its pose is not in a
communicate their probabilities distributions to other robots. location between the two meeting robots, otherwise, robot n
Let’s suppose that robot n detects robot m, m 6= n, and that would have detected robot k instead of robot m. It is sup-
it measures the relative distance between them, so: posed that the detection sensor can sense robots in front of
the detecting robot. For example, the detection sensor could
p(xnt = x) = (3) be a camera (pointing forward) to identify the robot and a
X proximity sensor to measure the relative distance.
p̄(xnt = x) p(xnt = x|xm 0 m 0
t = x , rn )p̄(xt = x ),
The broadcasting of the probabilities distributions of the
x0
two meeting robots to other robots in the group can be per-
where xnt represents the pose of robot n, xm t represents formed by the robot n, the one that detected robot m. When
the pose of robot m and rn denotes Pthe measured distance robot m updates its pose based on the information commu-
n m
between robots. The calculation x0 p(xt = x|xt = nicated by robot n, it communicates back its updated prob-
0 m 0
x , rn )p̄(xt = x ) describes the belief of robot m over the ability distribution, pm . Robot n then calculates a probabil-
pose of robot n. Similarly, the same detection is used to up- ity distribution that will be communicated to the non-meeting
date the pose of robot m. robots:
To summarize, in the multirobot localization algorithm
each robot updates its pose belief whenever a new informa- pd (xkt = x00 ) =1 − (pn (xnt = x) + pm (xm 0
t = x )+
tion becomes available: Equation 1 is used when a motion pnm (xnt = x, xm 0
t = x )), (4)
information is available; Equation 2 is used when an envi-
ronment observation occurs; and Equation 3 is used when a where pd (xkt = x00 ) is the detection information communi-
detection happens. cated to the non-meeting robots, pn (xnt = x) is the probabil-
Once a detection occurs, the two robots involved in the de- ity distribution of robot n, pm (xm 0
t = x ) is the probability
tection share their probabilities distributions and relative dis- distribution of robot m and pnm (xt = x, xm
n 0
t = x ) is the
tance, and update their pose beliefs according to the detection probability distribution of a robot being in a location between
model. Fox et al. (2000) included a counter that, once a robot the two meeting robots.
Whenever a detection between robot n and robot m occurs, the probability distribution function of each robot k, with k ≠ n and k ≠ m, can be updated in the following way:

p(x_t^k = x) = p_d(x_t^k = x) p̄(x_t^k = x) / Σ_{x′} p_d(x_t^k = x′) p̄(x_t^k = x′),   (5)

where x^k, for k = 0, ..., n, represents all robots which do not take part in the detection.

Figure 1 illustrates an example of this situation. In the example there are three robots, R1, R2 and R3, in their actual poses. The robots have different knowledge about their poses: R1 is sure of its pose, R2 is completely unsure, and R3 is in doubt about two poses, represented as dashed cells. If R1 meets R2, R2 becomes certain about its pose and R3 keeps its previous pose knowledge. However, if the detection information is transferred to R3, it becomes certain about its pose, because the possibility of being between R1 and R2 is eliminated. The example shows the improvement in group localization obtained if detection information is propagated to all robots in the group.

Figure 1: Example of a situation with three robots: R1 is sure of its pose, R2 is completely uncertain, and R3 is in doubt about its pose (the two dashed cells).

3.2 Negative Information

Negative evidence is gathered when a robot fails to detect data that it expects to see. Most techniques for state estimation update the state belief by using a sensor model every time a sensor reports a measurement. However, it is possible to get useful state information from the absence of sensor reports relative to expected landmark measurements.

Negative information has been applied to target tracking by using the event of not detecting a target as evidence to update the probability density function [Koch, 2004]. In that work, negative information means that the target is not located in the visibility area of the sensor, and since the target is known to exist, it is certainly outside this area.

In the robot localization domain, Hoffmann et al. (2005) consider as negative information the absence of landmark sensor measurements. Occlusions are identified by using a visual sensor that scans the colors of the ground to determine whether there is free area or an obstacle. The environment is a soccer field in green with white lines; so, if a different color is identified, it means that an obstacle could be occluding the visibility of a landmark.

In the cooperative multirobot localization problem, negative information can also mean the absence of detections (in the case that a robot does not detect another one), which constitutes a lack of group information [Odakura and Costa, 2006b]. In this case, the negative detection measurement can provide the useful information that a robot is not located in the visibility area of another robot. In some cases, this can be essential information for improving the robot's belief about its pose when it is located in a part of the environment with few discriminant landmarks.

Consider two robots within a known environment. If robot 1 does not detect robot 2 at a given time, a negative information is reported, which states that robot 2 is not in the visibility area of robot 1, as depicted in Figure 2(a).

Figure 2: (a) Negative information. (b) Occlusion in the field of view.

The information gathered from Figure 2(a) is true if we consider that there are no occlusions. In order to account for occlusions, it is necessary to sense the environment to identify free and occupied areas. If there is free space in the visibility area of the detection sensor, then there is no occlusion. Otherwise, if an occupied area is identified, the other robot could be occluded by another robot or by an obstacle. In this case it is possible to use geometric inference to determine which part of the visibility area can be used as negative detection information (gray area in Figure 2(b)).

Let's suppose that robot m reports a negative detection. The negative detection model, considering the visibility area of the robot and the occlusion area, becomes:

p(x_t^{m−} = x) = p(d_t^− | x_t^m = x, v, obs) p(x_t^m = x) / Σ_{x′} p(d_t^− | x_t^m = x′, v, obs) p(x_t^m = x′),   (6)

where d_t^− is the event of not detecting any robot and x^m corresponds to the pose of robot m, the robot that reports the negative detection information. The variables v and obs represent the visibility area and the identified obstacles, respectively.

Whenever a robot m does not detect another robot k, the probability distribution function of each robot k, with k ≠ m, can be updated in the following way:

p(x_t^k = x) = p(x_t^{m−} = x) p̄(x_t^k = x) / Σ_{x′} p(x_t^{m−} = x′) p̄(x_t^k = x′),   (7)

where x^k, for k = 0, ..., n, represents all robots which were not detected.

The consequence of using the negative detection model is that it can solve certain localization problems that are unsolvable for a group of robots relying only on positive detection information. A typical situation is the case where the robots are in different rooms, such that one robot cannot detect another. In this situation, each robot can constrain its pose belief every time a negative detection is reported.
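Equations (5) and (7) share the same shape: multiply robot k's belief by an information term and renormalize. A toy sketch on a four-cell world, loosely following the Figure 1 example; the cell assignments and function name are illustrative assumptions, not from the paper.

```python
import numpy as np

def fold_information(belief_k, info):
    """Eqs. (5)/(7): multiply robot k's belief by the received detection
    (or negative-detection) information and renormalize."""
    posterior = info * belief_k
    return posterior / posterior.sum()

# Four cells; suppose R1 is known to be in cell 0 and R2 in cell 1, while
# R3 is undecided between cells 1 and 2 (like R3 in Figure 1).
belief_r3 = np.array([0.0, 0.5, 0.5, 0.0])
# The broadcast p_d from the meeting of R1 and R2 rules out the occupied
# cells, so R3 becomes certain of its pose.
p_d = np.array([0.0, 0.0, 1.0, 1.0])
belief_r3 = fold_information(belief_r3, p_d)
```

A negative-detection term from Equation (6) would be folded in exactly the same way, zeroing the cells inside the non-detecting robot's visibility area.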
3.3 Multidetection

A multidetection situation occurs when one robot detects two or more robots at once, or when one robot sees another robot that in turn sees a third one [Odakura and Costa, 2006a], as shown in Figure 3.

Figure 3: Multidetection: (a) Robot R0 detects R1, R2 and R3 (the lines surrounding robots R1, R2 and R3 represent R0's field of view); (b) R0 detects R1, R1 detects R2, and R2 detects R3 at the same moment.

The key idea in solving the multidetection situation is that robots more confident about their poses have more useful information to share, so they should communicate first, propagating their knowledge to the rest of the group.

A measure that can be used to estimate the uncertainty of a robot's pose belief is the entropy H. The entropy measures the amount of information provided by the probability density function, and is defined as [Russell and Norvig, 1995]:

H = − Σ_x p(x_t = x) log(p(x_t = x)),   (8)

where p(x_t = x) denotes the robot's belief that it is at pose x. The entropy H is zero if the robot is perfectly localized in one pose, and it achieves its maximal value if the robot is completely uncertain about its pose, which means that the probability density function is uniformly distributed.

The information gained from the meeting of two robots is given by the maximum value of the entropy between the robots, H_ij = max(H_i, H_j), where H_ij represents the entropy value of a meeting of robots Ri and Rj.

Let's consider a group of robots taking part in a multidetection configuration. This configuration can be represented as a graph G = (V, E), where V is the set of nodes and E is the set of directed edges. Let each robot be represented as a node, and each detection be represented as a directed edge linking the two meeting robots and pointing to the detected robot. In this way, the configuration in which each robot detects all the others is represented as a complete directed graph. Each edge (Ri, Rj) in E is labeled by a weight that corresponds to the entropy H_ij.

If the graph presents bidirectional edges, it means that two robots are detecting and being detected by each other at the same time. In this case, two types of robots should be considered, (1) homogeneous or (2) heterogeneous, regarding their exteroceptive sensor capabilities, i.e., the error introduced in their relative distance measurements. In the first case, we simply eliminate one direction of the bidirectional edge; in the latter, we keep only the direction pointing to the less accurate robot, i.e., the reading from the most accurate sensor will be used to update the pose beliefs of the robots.

Given a weighted graph G, the multidetection problem can be formulated as finding a subset of edges E′ ⊆ E such that all nodes of V are incident on edges in E′ and G′ = (V, E′) has the minimum total weight.

At any moment it is possible to represent a detection configuration of the robots by a graph G = (V, E). Given G, the following steps should be performed in order to update the robots' poses:

1. Reduce all bidirectional edges to unidirectional ones.
2. Weight each edge (Ri, Rj) by H_ij.
3. Choose a subset of edges E′ ⊆ E, so that G′ = (V, E′) has minimum weight.
4. Sort the edges of E′ by their weight, smallest first.
5. Update the robots' beliefs following the ranked edges of E′.

Following this procedure it is possible to find a detection update order that deals with multidetection. As an example, consider the robot configuration depicted in Figure 3(b). What is the update order that obtains the best information gain? If we suppose that the minimum entropy values for the meetings are, in order, (R1, R2), (R0, R1) and (R2, R3), then the update order should be (R1, R2), (R0, R1) and (R2, R3). So each robot pair updates their pose beliefs (Equation 3) following the order presented, as shown in Figure 4(a). Now suppose that the minimum entropy values for the meetings are, in order, (R0, R1), (R2, R3) and (R1, R2); then the update order is (R0, R1) and (R2, R3), and each robot updates its pose belief using Equation 3, as can be seen in Figure 4(b). It is worth noting that in this case it is not necessary to update (R1, R2), since R1 and R2 are already updated in (R0, R1) and (R2, R3).

Figure 4: Pose belief update order: (a) 1st: (R1, R2), 2nd: (R0, R1), and 3rd: (R2, R3). (b) 1st: (R0, R1) and 2nd: (R2, R3).

4 Communication Strategy

Communication among robots in a localization task is a way to increase the robots' perception of their states in the world. It is achieved by using observations from other robots to better estimate their poses. Since the aim of communication in robot localization is to contribute to a more accurate pose estimation, it is possible to use a communication strategy to increase the localization accuracy and at the same time diminish the amount of information communicated and the computation time.

41
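The entropy measure of Equation (8) and the entropy-ranked update order can be sketched in Python as follows (an illustrative sketch, not the authors' implementation; `entropy`, `update_order`, and the edge and dictionary representations are our own choices):

```python
import math

def entropy(belief):
    # Equation (8): H = -sum_x p(x) log p(x); zero when the robot is
    # perfectly localized, maximal when the belief is uniform.
    return -sum(p * math.log(p) for p in belief if p > 0.0)

def update_order(detections, H):
    # Rank detection edges by the meeting entropy Hij = max(Hi, Hj),
    # smallest first, so the most confident pairs propagate first,
    # and skip a meeting whose two robots were both updated earlier.
    ranked = sorted(detections, key=lambda e: max(H[e[0]], H[e[1]]))
    order, updated = [], set()
    for i, j in ranked:
        if not {i, j} <= updated:
            order.append((i, j))
            updated |= {i, j}
    return order
```

For example, with detections [(0, 1), (1, 2), (0, 2)] and entropies {0: 0.9, 1: 0.1, 2: 0.2}, the pair (1, 2) is updated first and (0, 2) is skipped, since both of its robots were already updated by earlier meetings.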
4 Communication Strategy

Communication among robots in the localization task is a way to increase the robots' perception of their states in the world. It is achieved by using observations from other robots to better estimate their poses. Since the aim of communication in robot localization is to contribute to a more accurate pose estimation, it is possible to use a communication strategy that increases the localization accuracy and at the same time diminishes the amount of information communicated and the computation time. The computational cost is related to the task of updating the robot's pose using the new information received.

The communication strategy should address questions such as what to communicate, which robots should participate in the communication, and when to do it. The question of what to communicate refers to the content of the message, which in the localization task is the robot's pose belief. The answer to which robots should communicate is determined by the occurrence of a detection, so all the robots involved in a detection should communicate. Finally, the answer to when to communicate can be guided by new evidence: when robots detect each other they can start communicating.

However, in some situations a communication among the robots does not contribute to improving their pose estimates. For example, when two robots are certain about their poses, communication does not improve their pose beliefs; and when two robots are completely lost and exchange messages, the communication does not help them to localize themselves.

We contribute a communication strategy based on the utility of a message exchange among the robots. A communication between two robots is useful if it improves the localization accuracy of at least one of the robots. Thus, the criteria for reducing communication are related to how certain the robots are about their poses.

To reduce the amount of information communicated among robots, and as a consequence the processing time, it is possible to avoid some of the communication situations. Each robot can decide, on the basis of its own knowledge, whether it will take part in a communication or not.

The utility of a piece of information to share is given by H, the entropy value of a robot. If H <= ST, the robot concludes that it could send useful information to the group, where ST is the sender threshold. The robot that receives a message will accept the information received if H >= RT, where RT is the receiver threshold.

The proposed communication strategy is fully decentralized, since each robot can decide, based on its own pose belief, whether it takes part in a communication or not. If a robot receives a message and is certain about its pose, it will not update its pose belief with the received information, reducing computation time, and it will not proceed with the communication process, avoiding unnecessary messages.

The communication strategy can be used in different situations in cooperative multirobot localization (positive or negative detection, broadcasting, and multidetection), defining conditions for the information exchanges in order to reduce the number of messages transmitted and, as a consequence, the amount of computation performed by robots when updating their pose beliefs.

In multirobot cooperative localization, when a multidetection situation occurs it is represented by a graph whose edges guide the message exchange among the robots, so all robots represented in the graph need to communicate with each other in order to find the update order in the graph. The graph can be composed in a distributed manner: if each robot communicates short messages containing its entropy value to the others, each robot can independently compose the graph and sort the edges.

Once the edges of the graph are sorted, the robots assume the role of sender (or receiver) and can start the communication process to exchange their pose beliefs, applying the communication strategy to decide whether to accept a sender or a receiver role and take part in the communication.

    Communication      Robots      Conditions for
    Situation          Involved    Communication
    ------------------------------------------------------------
    Single Detection   n and m     if (Hn <= ST) and (Hm >= RT)
    Detection          all         if (Hn <= ST);
    Broadcasting                   for all k, (k != n) and (k != m),
                                   if (Hk >= RT)
    Negative           all         if (Hm <= ST);
    Detection                      for all k, (k != m),
                                   if (Hk >= RT)
    Multidetection     M robots    according to the robot rank,
                                   if (Hs <= ST) and (Hr >= RT)

    Table 1: Communication strategy.

The proposed communication strategy is summarized in Table 1. The first column refers to the communication situations identified, where the single detection situation corresponds to the detection situation described by Fox et al. (2000). Every time one of these situations occurs, the communication strategy is used to minimize the communication requirements while obtaining as much information as possible. In the multidetection situation, M represents a group of robots that detect each other simultaneously.
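The sender and receiver tests of Table 1 can be sketched as follows (a minimal illustration under our own function names; `H_n` and `H_m` stand for the entropies of the detecting and detected robots):

```python
def would_send(H, ST):
    # A robot offers its pose belief only if it is confident
    # enough about it (H <= ST, the sender threshold).
    return H <= ST

def would_receive(H, RT):
    # A robot accepts a received belief only if it is uncertain
    # enough to benefit (H >= RT, the receiver threshold).
    return H >= RT

def single_detection_exchange(H_n, H_m, ST, RT):
    # "Single Detection" row of Table 1: robot n sends and robot m
    # receives only if both threshold tests pass.
    return would_send(H_n, ST) and would_receive(H_m, RT)
```

With EV = ST = RT = 0.03, for instance, a robot with entropy 0.02 would send, but an exchange with an almost-localized receiver is suppressed, which is how the strategy cuts the number of communication events.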
5 Experiments

We performed four experiments using simulated robots in order to evaluate the communication strategy proposed in this paper. Each robot is equipped with a proximity sensor that measures the distance to the walls in the environment, and a detection sensor that can identify other robots and measure their relative distance. All robot sensors are assumed to be corrupted by Gaussian noise. The robots know the environment model but do not know their initial poses in the environment. They move in the environment and collect information in order to improve their localization beliefs. For each experiment several runs are performed, and in each run the robots start at different initial poses. A run finishes when all robots achieve 0.9 certainty about their poses. The environment model is a grid with 0.2 x 0.2 meter cells and an angular resolution of 90 degrees. In the experiments the thresholds RT and ST have the same value, denoted EV. We tested two values for the threshold EV: the maximum entropy value, and an entropy value near zero, which means that the robot is almost certain about its pose.

The first experiment uses eight robots in the environment of Figure 5(a). The test environment has dimensions of 6 x 6 meters and is a symmetric environment with four corridors, A, B, C, and D. All robots move clockwise at the same pace, following the corridors. The graph in Figure 6(a) plots entropy values averaged over all robots as a function of steps, averaged over eight runs, for the communication situation of detection broadcasting. Entropy values are zero if the robot is perfectly localized at one pose, and maximal if the robot is uncertain about its pose. At first the robots do not know their poses and the expected entropy has the maximal value. The entropy value drops as soon as the robot updates its pose with the information reported from the environment. Each step is an update step through the incorporation of some measured information (observation of the environment, movement, detection or propagation). The solid line represents a more communicative approach, where EV = 1.00 is the normalized maximum entropy value, corresponding to complete uncertainty about the robot's pose; in this case the robots communicate every time new information is available. The dashed line represents a less communicative approach, where EV = 0.03. NC is the number of communication events that occur during the experiment. The reduction in NC is from 135 for EV = 1.00 to 46 for EV = 0.03, without a significant loss in localization accuracy. The maximum difference between the entropy values of the two approaches is less than 0.1.

Figure 5: Test environment (white represents free area): (a) Hallway with four corridors denoted by A, B, C and D. (b) Open environment. (c) Rooms and corridor.

The second experiment uses the environment depicted in Figure 5(b), which has dimensions of 4 x 4 meters. Figure 6(b) shows entropy values as a function of steps, averaged over six runs over all robots, for the communication situation of negative detection. The solid line represents the more communicative approach, where EV = 1.00 is the maximum entropy value; the dashed line represents the less communicative approach, where EV = 0.03. The reduction in the number of communications NC is from 88 to 40, without a significant loss in localization accuracy. The maximum difference between the entropy values of the two approaches is less than 0.1.

The third experiment uses the environment depicted in Figure 5(c), which has dimensions of 8 x 3.6 meters. Results for single-detection multirobot localization with ten robots, averaged over four runs over all robots, are shown in Figure 7(a). The normalized maximum entropy value EV = 1.00 is compared to EV = 0.03. The reduction in NC is from 60 to 34, without a significant loss in localization accuracy. The maximum difference between the entropy values of the two approaches is less than 0.1.

The last experiment uses the same environment shown in Figure 5(c) and ten robots. The result depicted in Figure 7(b) shows multidetection localization with ten robots, averaged over four runs over all robots. The reduction in NC is from 43 to 36, without a significant loss in localization accuracy. The maximum difference between the entropy values of the two approaches is less than 0.2.

Figure 6: Comparison between less and more communicative approaches for: (a) Detection broadcasting. (b) Negative detection.
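The belief representation and the run-termination criterion of these experiments can be sketched as follows (illustrative only; the 30 x 30 x 4 grid size is derived from the 6 x 6 m environment with 0.2 m cells and 90-degree angular resolution, and the flat-list belief is our own simplification):

```python
def uniform_belief(cells=30, headings=4):
    # Initial belief over a cells x cells x headings pose grid:
    # the robot starts with no idea of its pose.
    n = cells * cells * headings
    return [1.0 / n] * n

def localized(belief, certainty=0.9):
    # Termination test for one robot: it assigns at least
    # `certainty` probability to its single most likely pose.
    return max(belief) >= certainty

def run_finished(beliefs):
    # A run finishes when every robot reaches 0.9 certainty.
    return all(localized(b) for b in beliefs)
```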
Figure 7: Comparison between less and more communicative approaches for: (a) Single detection. (b) Multidetection.

6 Conclusions

In this paper we propose a unified approach defining different communication situations, which incorporates single detection [Fox et al., 2000], broadcasting detection [Odakura and Costa, 2005] and negative detection [Odakura and Costa, 2006b], and adds the multidetection situation. In this unified approach we define the order in which the robots should communicate so that the amount of information communicated is reduced. In this case, the multidetection approach outperforms the approach proposed by [Fox et al., 2000], which uses a set of single-detection procedures when a detection involves more than two robots. The use of the multidetection approach reduced NC from 60, for the approach of [Fox et al., 2000], to 36.

We also propose a communication strategy that defines conditions the information exchanges should satisfy, aiming at reducing the number of messages transmitted without loss in localization accuracy. As a result, the number of communication events is reduced in all situations.

The results are very encouraging, since they illustrate the efficiency and effectiveness of the proposal.

References

[Cao et al., 1997] Yu Uny Cao, Alex S. Fukunaga, Andrew B. Kahng, and Frank Meng. Cooperative mobile robotics: antecedents and directions. Autonomous Robots, 4:1-23, 1997.

[Fox et al., 1999] Dieter Fox, Wolfram Burgard, and Sebastian Thrun. Markov localization for mobile robots in dynamic environments. Journal of Artificial Intelligence Research, 11:391-427, 1999.

[Fox et al., 2000] Dieter Fox, Wolfram Burgard, H. Kruppa, and Sebastian Thrun. A probabilistic approach to collaborative multi-robot localization. Autonomous Robots, 8(3):325-344, 2000.

[Hoffmann et al., 2005] Jan Hoffmann, Michael Spranger, Daniel Gohring, and Matthias Jungel. Making use of what you don't see: Negative information in Markov localization. In IEEE/RSJ International Conference on Intelligent Robots and Systems - IROS'05, pages 854-859. IEEE, 2005.

[Howard et al., 2003] Andrew Howard, Maja Mataric, and Gaurav Sukhatme. Putting the 'i' in 'team': an egocentric approach to cooperative localization. In IEEE International Conference on Robotics and Automation - ICRA'03, pages 14-19. Kluwer Academic Publishers, 2003.

[Koch, 2004] Wolfgang Koch. On negative information in tracking and sensor data fusion: discussion of selected examples. In Seventh International Conference on Information Fusion, pages 91-98, 2004.

[Madhavan et al., 2002] Raj Madhavan, Kingsley Fregene, and Lynne E. Parker. Distributed heterogeneous outdoor multi-robot localization. In IEEE International Conference on Robotics and Automation - ICRA'02. IEEE Press, 2002.

[Odakura and Costa, 2005] Valguima Odakura and Anna H. R. Costa. Cooperative multi-robot localization: using communication to reduce localization error. In International Conference on Informatics in Control, Automation and Robotics - ICINCO'05, pages 88-93, 2005.

[Odakura and Costa, 2006a] Valguima Odakura and Anna H. R. Costa. Multidetection in multirobot cooperative localization. In First IFAC Workshop on Multivehicle Systems - MVS'06, pages 19-24, October 2006.

[Odakura and Costa, 2006b] Valguima Odakura and Anna H. R. Costa. Negative information in cooperative multirobot localization. In Advances in Artificial Intelligence - IBERAMIA-SBIA 2006, volume LNAI 4140, pages 552-561. Springer, 2006.

[Roumeliotis and Bekey, 2002] Stergios I. Roumeliotis and George A. Bekey. Distributed multirobot localization. IEEE Transactions on Robotics and Automation, 18(2):781-795, 2002.

[Russell and Norvig, 1995] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, 1995.
Active Localization of Multiple Robots by Selection of Best Frontiers

Rakesh Goyal, K Madhava Krishna and Shivudu Bhuvanagiri

IIIT Hyderabad, Gachibowli, Hyderabad – 500 032, INDIA (mkrishna@iiit.ac.in)

Abstract

We envisage a multirobotic scenario where several robots are in ambiguity about their states and require the help of other robots to overcome this ambiguity. In such a scenario the method presented here moves the robots to locations where the probability of eliminating several ambiguous states among multiple robots is maximal. There are two essential aspects to this problem. In the first, all robots move to their best frontiers from their current hypothesis states. The best frontiers are those that have the highest probability of achieving a unique hypothesis if the robots were to arrive there. In the second, robots that do not have any ambiguity about their state are dispatched to locations where they can assist the maximum number of robots in overcoming their ambiguity. Ambiguity is used here in the sense of more than one hypothesis of a robot's state. The method presented has been tested both in simulation and in real time on robots, and its efficacy verified.

1 Introduction

Localization has proven to be the most critical aspect of mobile robot navigation. Generally, indoor localization is achieved with the aid of maps over and above dead reckoning. However, in instances where the initial state of the robot is unknown, maps can lead to conflicting hypotheses of a robot's state. This is a pathological problem in global localization and is well documented [Fox et al., 1998]. In such a case the robot moves to locations where the configuration of obstacles enables zeroing in on a single hypothesis from the earlier multi-hypothesis situation, often called active localization. This paper explores the idea of using robots, apart from obstacles, as hypothesis-resolving agents in such situations. Robots that are in ambiguity about their states can help in resolving each other's ambiguity if they arrive at the right positions. These positions are frontiers in the sense used in [Yamauchi, 1997]. Since the robots are already in ambiguity about their states, they can only reach a frontier area with respect to their current visibility in a probabilistic sense. In the first phase, the algorithm computes the probability of robots reaching or occupying those frontier positions that would lead to a unique hypothesis for a set of robots. Once shepherded to these positions in a probabilistic sense, some of the robots could end up attaining a unique hypothesis. In the second phase such localized robots are then sent to locations where they can help in localizing as many robots as possible. In principle a localized robot can localize all others if it is able to make a measurement on them. The algorithm alternates between the two phases till the ambiguity in the states of all robots is removed. Due to strict page limits, this paper presents the theory and formulation for only the first phase of the method, evidently the more interesting phase. The second phase is only cursorily mentioned for the sake of completeness.

The authors argue that the novelty of this work lies in it being the first reported work where multiple hypothesis states in more than one robot are resolved in an active localization sense. The method presented here would find utility in several multirobotic applications where localization issues dominate. It is useful when robots brought alive need to estimate their initial state. It is useful when several robots tend to get confused about their states due to large localization errors and need to globally localize once in a while in the map provided or built so far. It is once again useful if one or more robots in a multirobotic endeavor get kidnapped by malicious agents. While the context mentioned here is resolving multiple hypotheses among multiple robots, this effort can also be seen in the larger context of cooperative localization, where robots also serve the function of mobile landmarks and come together to estimate their state whenever required.
2 Related Work

In general, work on active localization has tended to be limited when compared with passive localization. The pioneering work in this area has been from [Burgard et al., 1998] and [Dudek et al., 1998]. In [Burgard et al., 1998] a method of active localization based on maximum information gain was presented. Dudek and others [Dudek et al., 1998] presented a method for minimum distance traversal for localization that works in polygonal environments without holes, a problem they show to be NP-hard. A randomized version of the same method was presented in [Dudek et al., 2005]. In [Kaelbling et al., 1995] an approach is proposed for guiding the robot to a target location when its current position is not known accurately.

As far as the authors' knowledge goes, there has been no reported work dealing with active localization of several robots. In the context of multirobotic localization, Fox and others extended their earlier global localization methods to a multirobotic setting in [Fox et al., 2000]. A cooperative localization approach based on Monte Carlo methods was also presented in [Howard et al., 2003]. In [Rekleitis et al., 1997] a method is presented in which team members carefully coordinate their activities in order to reduce cumulative odometric errors. However, these methods did not consider active localization in the rigorous sense of guiding robots to positions.

3 Methodology

Consider a workspace W populated by NR robots, each of them having multiple hypotheses of its state. The problem is to move these NR robots to frontier locations such that each of the robots localizes to a unique state with a minimal number of frontiers visited. One such robot r possesses distinct and possibly virtual frontiers. The distinct frontiers are those mentioned in [Yamauchi, 1997], computed from a hypothesis position of r. The distinct frontiers are denoted as f1^d, f2^d, ..., fm^d, the superscript d signifying distinctness. If r has n_f hypothesis locations of its state, each of its distinct frontiers has n_f copies, called virtual frontiers. A set of robots R with cardinality n_R, all sharing the same n_f virtual frontiers, is defined as a base pair and denoted by (n_f, n_R). Henceforth the term frontier refers to a virtual frontier unless explicitly mentioned as distinct. Figure 1 depicts the distinct and virtual frontiers for a robot that has two hypothesis states H1 and H2.

The NR robots are partitioned into mutually exclusive sets R1, R2, ..., Rn, where the cardinality of set Ri is n_Ri and sum_i n_Ri = NR. The partition is done in such a manner that the robots in a partition all share the same virtual frontiers. Frontiers of different robots are shared if they are close in a Euclidean sense, their distance less than a threshold. The associated frontier sets are denoted by F1, F2, ..., Fn with cardinalities n_f1, ..., n_fn. The base pairs are then (n_f1, n_R1), (n_f2, n_R2), ..., (n_fn, n_Rn). The partitioning reduces the number of combinations to be considered while moving to best frontier locations. Figure 1 shows a robot with two hypothesis states and two distinct frontiers (one on top and one at bottom) with two virtual frontiers each.

Figure 1: A robot with two hypothesis states H1 and H2. The distinct frontiers for each state are two, one on top and one at bottom, denoted as ft1 and fb1 for state H1, at the boundary of the visibility polygon. There are two virtual frontiers (ft1, ft2) for every distinct frontier due to the two hypothesis states.

3.1 Some preliminaries

Let f_ij(r_ij) denote the event that a robot r_i belonging to set R_j reaches the frontier f_i of set F_j for the base pair (n_fj, n_Rj). Then P(f_ij(r_ij)) is the probability associated with that event. Let f_ij(r_ij), f_mn(r_mn) denote the intersection event that r_i reaches f_i and r_m reaches f_m. P(f_ij(r_ij), f_mn(r_mn)) denotes the probability of that event, or the occupancy probability of the frontiers f_i and f_m. We compute

    P(f_ij(r_ij)) = Pm(n_Rj, 1) Pm(n_Fj - 1, n_Rj - 1) / Pm(n_Fj, n_Rj)    (1)

Pm(x, y) denotes the permutation of x things taken y at a time. P(f_ij(r_ij), f_mn(r_mn)) is the probability of two independent events, since the robots belong to different sets, and is hence given by P(f_ij(r_ij)) P(f_mn(r_mn)). However, P(f_ij(r_ij), f_kj(r_kj)) is the probability of events that are not independent and is given by

    P(f_ij(r_ij), f_kj(r_kj)) = Pm(n_Rj, 2) Pm(n_Fj - 2, n_Rj - 2) / Pm(n_Fj, n_Rj)    (2)
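Equations (1) and (2) can be sketched directly with the permutation function Pm, here Python's math.perm (the function names are ours):

```python
from math import perm  # perm(x, y): permutations of x things taken y at a time

def p_occupied(n_F, n_R):
    # Equation (1): probability that a given frontier of a base pair
    # with n_F frontiers and n_R robots ends up occupied.
    return perm(n_R, 1) * perm(n_F - 1, n_R - 1) / perm(n_F, n_R)

def p_both_occupied(n_F, n_R):
    # Equation (2): joint probability that two given frontiers of the
    # same base pair are both occupied (the events are dependent).
    return perm(n_R, 2) * perm(n_F - 2, n_R - 2) / perm(n_F, n_R)
```

For instance, with a base pair of 3 frontiers and 2 robots, p_occupied(3, 2) gives 2/3 and p_both_occupied(3, 2) gives 1/3, matching a direct count over the six ordered robot-to-frontier assignments.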
Hence P̄(f_ij(r_ij), f_kj(r_kj)) is the probability that at least one of the frontiers f_i or f_k is left unoccupied. Then P̄(f_ij(r_ij), f_mn(r_mn)) ∩ P̄(f_kj(r_kj), f_pn(r_pn)) evaluates to

    P̄(f_kj^-1(r_kj), f_pn(r_pn)) + P̄(f_kj(r_kj), f_pn^-1(r_pn)) - P̄(f_kj^-1(r_kj), f_pn^-1(r_pn))

Here the superscript -1 indicates that the cardinality of the frontier set is reduced by 1 while distributing robots among the frontiers of that set. In other words,

    P(f_kj^-1(r_kj)) = Pm(r_kj, 1) Pm(n_Fj - 2, r_kj - 1) / Pm(n_Fj - 1, r_kj)    (3)

Notations
1. P(f, r, 2; l): The probability of l intersection terms, where each intersection term is the probability of two robots reaching two frontiers, as in (2).
2. P(f, r, q; l): Same as notation 1, except that each term computes the probability of q robots reaching frontiers.
3. P̄(f, r, 2; l): Complement of notation 1. Each term signifies the probability that at least one of the two frontiers is left unfilled.
4. P̄(f, r, q; l): Complement of notation 2. Each term signifies the probability that at least one of the q frontiers is left unfilled.
5. P(n_f, n_R, -l) = Pm(n_f - l, n_R) / Pm(n_f, n_R). This is used in the computations below.
6. A measurement between two frontiers is denoted by (d, theta) and measures the distance and angle between the two. A unique measurement is one that is unique in terms of either d or theta between a pair among all pairs of frontiers considered.

Difficult computations

D1. P̄(f, r, 2; l) ∩ P̄(f, r, 2; 1): There are l + 1 terms in the overall intersection. Let the number of associated frontier sets be t, and let the number of frontiers required to be unfilled in each of these sets be l_1, l_2, ..., l_t. Let the frontier sets associated with the P̄(f, r, 2; 1) computation be F_m, F_n. If

    T_1 = ( prod_{i=1, i != m,n}^{t} P(n_fi, n_Ri, -l_i) ) P̄(f_im^(-l_m)(r_im), f_jn^(-l_n)(r_jn))    (4a)

then

    P̄(f, r, 2; l) ∩ P̄(f, r, 2; 1) = ∪_{k=1}^{p} T_k    (4b)

where each T_k in (4b) is of the form taken by T_1 in (4a), corresponding to every alternate combination of the numbers of frontiers that are to be unfilled for each frontier set. The intersection terms that result from the computation of (4b) are also of the form (4a). The only point to note is how unfilled frontiers are allotted to frontier sets in these intersection terms. A second-order intersection term with two combinations of unfilled frontiers l_1, l_2, ..., l_t and m_1, m_2, ..., m_t allocates max(l_i, m_i) to F_i. Similarly, a p-order intersection term allocates the maximum of those p values to each frontier set. Intersection terms of order greater than 3 are not considered in (4b) due to their negligible contributions.

D2. ( ∩_{i=1}^{l} P̄(f, r, q_i; 1) ) ∩ P̄(f, r, s; l) is then given by a similar union of terms as in (4b), where each term is

    T_i = ( prod_{i=1}^{t} P(n_f1i, n_R1i, -l_1i) ) P̄(f_i21^(-l_21)(r_i21), ..., f_j2s^(-l_2s)(r_j2s))    (5)

Here the frontier sets associated with the second term are F_21, F_22, ..., F_2s, and the numbers of frontiers to be left unfilled in those sets are l_21, l_22, ..., l_2s. The frontier sets associated with the first complemented term are F_11, F_12, ..., F_1t, and the frontiers to be left unfilled in those sets are given by l_11, l_12, ..., l_1t.

D3. Counting frontiers that need to be left unfilled: Figure 2 shows three frontier sets F_1, F_2, F_3. The frontiers in each set are abstracted as circles and labeled f_11, f_21, etc. An arrow between two frontiers indicates that at least one of them is left unfilled. A dashed arrow indicates that both of them need to be occupied. We proceed with the frontier having the maximum connections in a particular set and make it unoccupied. In F_1, f_31 is made unoccupied and marked as 1. Then f_11, f_41, f_31 can be occupied. Since f_11 can be occupied, f_21 needs to be unoccupied and is labeled 2. Similarly, f_23 needs to be unoccupied and is marked 1 in set F_3. Similarly, f_51, f_43 need to be occupied, entailing the unoccupancy of f_12, f_53, which are labeled 1, 2 in F_2, F_3. Finally, f_32 needs to be unoccupied and is labeled 2 in F_2. In the above, the only candidate for an alternate combination is 2, 2, 3 unfilled frontiers in F_1, F_2, F_3, obtained by making f_33 unoccupied at the start. This is however subsumed in the existing 2, 2, 2 combination, which turns out to be the unique combination.

Figure 2: Counting frontiers that need to be unoccupied. Three frontier sets F1, F2 and F3 with their associated frontiers. Frontiers that are counted have their counts shown inside the circle.
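The per-set factor of notation 5, which appears in the product terms of (4a) and (5), can be sketched as follows (our reading of Pm as the ordinary permutation count; interpreting the ratio as the fraction of robot-to-frontier assignments that leave l designated frontiers of a set unfilled is our gloss, not stated verbatim in the text):

```python
from math import perm  # perm(x, y): permutations of x things taken y at a time

def p_minus_l(n_f, n_R, l):
    # Notation 5: P(n_f, n_R, -l) = Pm(n_f - l, n_R) / Pm(n_f, n_R),
    # the fraction of injective placements of n_R robots into n_f
    # frontiers that avoid l designated frontiers entirely.
    return perm(n_f - l, n_R) / perm(n_f, n_R)
```

For example, with 4 frontiers, 2 robots and l = 1, the value is Pm(3, 2) / Pm(4, 2) = 6/12 = 0.5, i.e., half of the assignments leave the designated frontier unfilled.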
Uniqueness of hypothesis and belief updates

For obtaining a unique hypothesis for a certain n_R robots capable of moving to and occupying n_f frontiers, it is enough to ensure that a pair of frontier locations results in a unique measurement. If this pair is visible then the states of the robots occupying the frontiers can be uniquely hypothesized as a unimodal distribution through the update of each other's beliefs. The belief equations are as mentioned in [Fox et al., 2000]. If the robots are not visible directly but their presence is known through other robots, their beliefs can still be updated to result in a unique hypothesis. This is done by maintaining a visibility graph where the nodes are robots and a link between two nodes signifies their mutual visibility. By updating beliefs appropriately along the links of the graph it can be ensured that the robots at the unique pair of frontiers have unimodal beliefs (a unique hypothesis). The belief update along a graph is not always trivial and involves certain care, but we do not elaborate further due to space constraints. The frontier sets are always exclusive. When robots from different base pairs share the same frontier, the frontier is separated by some distance to avoid robots reaching the same place. This also keeps the frontier sets mutually exclusive.

Cases: Cases depicting the computation of the probability of robots occupying frontiers that lead to a unique hypothesis are discussed here. We proceed from specific to general situations.

Frontiers lie along a straight line equidistant from one another: This case typically occurs while moving in corridors with doors or rooms equally spaced along the corridor. If the frontiers are from a single base pair, there is only one pair of frontiers that has a unique measurement: the frontiers at the endpoints of the line. The probability of a unique hypothesis is

    P(UH) = P(f, r, 2; 1) = P(f_i1(r_i1), f_nf1(r_j1))

This is calculated as in (2). Here the second index of every subscript is 1, since there is only one robot set and one frontier set associated with it. P(UH) denotes the probability of a unique hypothesis.

Frontiers scattered about everywhere: The pairs of frontier locations that have unique measurements are counted and enumerated in a set as F_un = {(f_i1, f_j1), (f_i2, f_j3), ..., (f_pm, f_qn)}. Then

    P(UH) = P(f_i1(r_i1), f_j1(r_j1)) ∪ ... ∪ P(f_pm(r_xm), f_qn(r_yn))    (6)

Each term of the union is of the form P(f, r, 2; 1). Write

    (A_1 ∪ ... ∪ A_n) = A_1 + Ā_1 ∩ A_2 + ... + ( ∩_{i=1}^{n-1} Ā_i ) ∩ A_n    (6a)

By associating probabilities with all the terms we find that every summation term on the right-hand side of (6a) can be correlated to the difficult computation elaborated in D1, and is hence computed as specified there. Ā_1 in (6a) denotes the

Visibility restriction between frontiers: If in the set F_un a pair of frontiers {f_i1, f_j1} is not directly visible, then its occupancy probability P(f_i1(r_i1), f_j1(r_j1)) ought to be multiplied by the visibility probability P(V_s) while computing (6). Here P(V_s) is the probability that two frontiers that are not directly visible become visible indirectly through the presence of other robots. A visibility graph is constructed between {f_i1, f_j1}. Let the paths leading from f_i1 to f_j1 be denoted by the set PATH = {P_1, P_2, ..., P_p}, and the nodes of each path P_i in PATH be V_i = {V_1i, V_2i, ..., V_vi}. Note that V_i is essentially a set of frontier locations. The probability that f_i1 and f_j1 are visible is given by

    P(V_s) = ∪_{i=1}^{p} P(V_i)    (7)

and

    P(V_i) = ∩_{j=1}^{v} P(V_ji)    (8)

By writing (7) as in (6a) we find that each summation term corresponds to the computation described in D2. Since occupancy of frontiers and their visibility are independent events, their probabilities get multiplied while computing (6). If a pair of unique frontiers is directly visible, then their visibility is not hindered by the presence of other robots in between, because the update of beliefs would anyway happen through the intermediaries; in such cases P(V_s) = 1 always. P(V_s) is computed only when robots at two frontiers are not directly visible, due to farness or the presence of an obstacle, which mandates the presence of other robots at other locations. Then we compute the probability of those robots occupying those locations.

3.2 Choosing the best combination

A base pair (n_f, n_R) is henceforth denoted as fr. Given the set of all base pairs {fr_1, fr_2, ..., fr_m}, its power set is obtained. From the power set a set VAL of valid strings is generated as

    {{fr_1, fr_2, ..., fr_m}, {fr_1, (fr_2, ..., fr_m)}, ..., {(fr_1, fr_2, ..., fr_m)}}

A valid string is one that has all the base pairs occurring exactly once, albeit in various combinations. For example, {fr_1, (fr_2, ..., fr_m)} is a combination of the base pair fr_1 considered separately and the remaining base pairs considered together. From VAL, a set of strings that can actually occur in W is extracted and denoted GVAL, the strings with ground truth value. For example, the string {(fr_1, fr_2, ..., fr_m)}, which signifies the combination of all base pairs considered together, belongs to GVAL only if at least one frontier of every base pair is visible to at least one frontier of every other base pair. For each of the strings in GVAL an objective function
( )
event f i1 (ri1 ), f j1 r j1 , or the occupancy of f i1 , f j1 in (6). is evaluated. The string with the highest evaluation is
chosen and the robots dispatched to those frontiers.
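The sequential expansion (6a) turns a union probability into a sum of disjoint terms. A minimal numerical sketch, assuming for illustration that the events are mutually independent (the paper's D1 computation handles the general, dependent case):

```python
def union_probability(p_events):
    """P(A1 ∪ ... ∪ An) expanded as in (6a):
    P(A1) + P(Ā1 ∩ A2) + ... + P(Ā1 ∩ ... ∩ Ā_{n-1} ∩ An),
    simplified here by assuming the events are independent."""
    total = 0.0
    p_none_so_far = 1.0  # P(Ā1 ∩ ... ∩ Ā_{i-1}) under independence
    for p in p_events:
        total += p_none_so_far * p  # i-th summation term of (6a)
        p_none_so_far *= 1.0 - p
    return total

# Three frontier-pair events with assumed occupancy probabilities:
p_uh = union_probability([0.5, 0.4, 0.2])  # = 1 - 0.5*0.6*0.8 = 0.76
```

Under independence the result equals 1 minus the probability that no event occurs, which provides a quick sanity check on the expansion.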

Objective function: Given a string from GVAL such as {(fr_1, fr_2), (fr_3, ..., fr_m)}, the objective function is of the form

(n_R1/n_f1 + n_R2/n_f2) P(UH(fr_1, fr_2)) + (Σ_{i=3}^{m} n_Ri/n_fi) P(UH(fr_3, ..., fr_m))

P(UH(fr_i, fr_j)) is the probability of obtaining a unique hypothesis through a combination of frontiers in base pairs fr_i and fr_j. The multiplicative term (n_Ri/n_fi) biases the evaluation toward locations where the number of robots is high, to enable localizing a large number of robots at the same time.

4 Simulation Results

Figure 3 shows 4 suites of 3 neighboring rooms, labeled S1, S2, S3 and S4 from top to bottom. The initial positions of the robots are at the centers of the rooms. All the robots shown have three hypotheses of their states initially, and all of them possess two distinct frontiers, one at the top and one at the bottom, at each of their hypothesis positions. Each of the two distinct frontiers has 3 copies of virtual frontiers corresponding to the three hypothesis states of each robot. When robot hypotheses of states overlap over certain grids, they are represented as r1/r2, r4/r5, etc., indicating that both r1's and r2's hypotheses share some of the grids.

There are eight base pairs, with the superscripts t and b indicating the top and bottom distinct frontiers in the set {fr_1^t, fr_1^b, fr_2^t, fr_2^b, ..., fr_4^t, fr_4^b}. Robots r1 and r2 constitute the base pairs fr_1^t, fr_1^b; r3 forms fr_2^t, fr_2^b; r4 and r5 form fr_3^t, fr_3^b; and r6 and r7 the last two.

Based on the above enumeration, the power set, the set VAL and the set GVAL are formed as described in 3.2. The combinations {(fr_1^b, fr_2^t, fr_3^t, fr_4^t)} and {(fr_1^b, fr_2^b, fr_3^t, fr_4^t)} have the highest objective functions. Ties are broken randomly, and the latter combination is chosen. The arrows show the direction in which each robot moves to reach its frontier. After reaching their respective frontiers, all but robots r6 and r7 have localized to a unimodal hypothesis of their states in figure 3b. In the second phase, robot r3 is sent to the frontier set shared by r6 and r7. When r3 nears the center of gravity of that frontier set, r6 and r7 localize to a unique hypothesis distribution. This is not shown in a figure due to space constraints.

Figure 4 tabulates some of the combinations in the set VAL for a different scenario consisting of 6 base pairs of the form {fr_1^t, fr_1^b, ..., fr_3^t, fr_3^b}. The corresponding numbers of unique measurements between frontier pairs, the probabilities of unique hypotheses and the objective function evaluations are also shown. From this, the top-most combination that can actually occur in the workspace is selected. This figure clarifies much of the detail in section 3, especially 3.2.

Analysis: If robot-robot detections were not considered, in figure 3b only robots r3 and r5 would have localized after moving to the frontiers. The number of frontiers to be visited by a robot on average before localizing is nearly 2 in such a case, whereas by considering robot-robot detections, localization is achieved by moving to the first selected frontier for all the robots. A similar decrease in the number of frontiers to be covered before localizing to a unique hypothesis is obtained in various simulations that are not reported here for conciseness.

Figure 5 shows the graph of the probability of obtaining a unique hypothesis versus the number of unique measurement pairs for a fixed number of frontiers (here 12), for various numbers of robots occupying those frontiers. The probability increases as the number of unique measurement pairs increases for a given number of robots. Also, the probability increases as more robots occupy the same space. Hence the rationale behind the objective function (section 3.2), which tends to select combinations where more robots come together. This graph sums up the essence of this effort: it gives a bird's eye view of where the robots should move to maximize the probability of obtaining a unique hypothesis.

Figure 3: 4 suites of 3 rooms labeled S1, S2, S3 and S4 with robots at their centers initially. Arrows show the movement of robots from all their hypothesis states to corresponding frontiers.
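The selection rule of section 3.2 — evaluate the objective over every string in GVAL and dispatch robots according to the argmax — can be sketched as follows. The base-pair sizes and the P(UH) values below are hypothetical stand-ins for the quantities the paper computes via equation (6):

```python
def objective(grouping, p_uh):
    """Section-3.2 objective for one GVAL string.

    `grouping` is a tuple of groups; each group is a tuple of base
    pairs (n_R, n_f). `p_uh[group]` is the unique-hypothesis
    probability P(UH) for that group (assumed values here; the paper
    obtains them from equation (6))."""
    return sum(sum(n_r / n_f for n_r, n_f in group) * p_uh[group]
               for group in grouping)

# Two base pairs: fr1 = (2 robots, 2 frontiers), fr2 = (1 robot, 2 frontiers).
fr1, fr2 = (2, 2), (1, 2)
p_uh = {(fr1, fr2): 0.9, (fr1,): 0.5, (fr2,): 0.4}  # assumed probabilities
together = objective(((fr1, fr2),), p_uh)        # (1 + 0.5) * 0.9 = 1.35
separate = objective(((fr1,), (fr2,)), p_uh)     # 1*0.5 + 0.5*0.4 = 0.70
best = max([((fr1, fr2),), ((fr1,), (fr2,))], key=lambda g: objective(g, p_uh))
```

With these assumed numbers the combined string wins, consistent with the observation above that the objective favors combinations where more robots come together.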

Figure 4: Tabulation of only some of the combinations in the set VAL for a scenario consisting of 3 base pairs. Other combinations are not shown due to space.

5 Implementation

The method was verified on a pack of Amigobots equipped with 8 sonar transducers. External hardware in the form of an IR transceiver circuit was interfaced to the serial buffer of the Amigobot's controller board to facilitate easy detection of one robot by another. Each transceiver transmitted a unique pulse code corresponding to a particular robot. It also received similar codes corresponding to other robots. Apart from detecting a robot, the hardware was capable of measuring the bearing between the detected and detecting robots. Sonar sensors are then fired along the detected direction to measure the distance between the two. Figure 6a shows a corridor studded with cardboards, somewhat akin to situations in a warehouse, with two Amigos. Figure 6b shows the hypotheses returned by the global localization procedure, resulting in two distinct frontiers (one at the top and one at the bottom) for each robot. Each frontier has two copies of virtual frontiers corresponding to the two hypothesis states of a robot. The number of base pairs is thus four, namely {fr_t1, fr_b1, fr_t2, fr_b2}, where the suffix 1 is for the robot on the top half and 2 for the one at the bottom. The combination {fr_b1, fr_t2} yielded the maximum objective function. The robots are dispatched to the corresponding frontiers and, upon reaching them, detect one another in figure 6c. The unique hypothesis states corresponding to figure 6c are shown in 6d for the two robots.

Figure 5: Graph showing the probability of obtaining a unique hypothesis versus the number of unique measurements for a fixed number of frontiers, for different numbers of robots.

Figures 6a-6d. 6a: Two robots (indicated by white arrows) in an environment studded with cartons. 6b: The hypothesis states corresponding to 6a. 6c: The robots detect one another through IR transceivers upon nearing the frontiers. 6d: The hypothesis states corresponding to 6c, with each robot having now singularized to a unique state.

6 Conclusions

This paper has presented a novel method of shepherding robots to locations for rapid elimination of multiple hypotheses among them. A set of robots R_i with cardinality n_Ri and sharing the same frontier set F_i is represented by the base pair (n_Ri, n_fi), also denoted fr_i in the paper. A combination of several such base pairs that is both syntactically and semantically valid is stored in the set GVAL. The combination with the highest evaluation of the objective function is chosen, and robots are shepherded to those locations to eliminate the maximum number of conflicting hypotheses. The strategy has been tested both in simulation and on real robots and its efficacy verified. This method finds utility in several multi-robotic scenarios where robots are not clear about their state, have multiple hypotheses, and

require the assistance of other robots to resolve conflicts as
well as to refine their states to precise coordinates.

Multi-Robot Marginal-SLAM

Ruben Martinez-Cantin and José A. Castellanos
Dept. Informática e Ingeniería de Sistemas
Inst. de Investigación en Ingeniería de Aragón (I3A)
University of Zaragoza
rmcantin@unizar.es

Nando de Freitas
Dept. of Computer Science
University of British Columbia
nando@cs.ubc.ca

Abstract

This paper has two goals. First, it expands the presentation of the marginal particle filter for SLAM proposed recently in [Martinez-Cantin et al., 2006]. In particular, it presents detailed pseudo-code to enable practitioners to implement the algorithm easily. Second, it proposes an extension to the multi-robot setting. In the marginal representation, the robots share a common map and their locations are independent given this map. The robots' relative locations with respect to each other are assumed to be unknown. The multi-robot Marginal-SLAM algorithm estimates these transformations of coordinates between the robots to produce a common global map.

1 Introduction

For many years, Simultaneous Localization and Mapping (SLAM) has occupied the center stage in robotics research [Durrant-Whyte and Bailey, 2006; Thrun, 2002]. The introduction of particle filters (PFs) gave researchers the power and flexibility to handle nonlinearity and non-Gaussian distributions routinely [Fox et al., 2001]. Moreover, it enabled researchers to exploit conditional independence, using the Rao-Blackwellized particle filtering (RBPF) variance reduction technique, to obtain more efficient Monte Carlo schemes. RBPFs were applied to dynamic maps [Doucet et al., 2000] and subsequently to static maps with the celebrated FastSLAM algorithm [Montemerlo et al., 2003].

The application of RBPF to dynamic maps is only sensible inasmuch as one has a good model to describe the evolution of the dynamic map. On the other hand, the application of RBPF to static maps has come into question. It has become popular knowledge that the approach can diverge [Bailey et al., 2006]. In loose terms, learning static variables (the map) by conditioning on increasing histories of the state variables results in an accumulation of Monte Carlo errors and an explosion of variance. We will elaborate on this later on and present further arguments. Heuristic approaches to ameliorate the situation [Stachniss et al., 2005; Elinas et al., 2006] have been proposed, but these do not solve the fundamental problem at hand. We note that the problem of PF divergence resulting from learning fixed variables by conditioning on increasing paths was already described as early as 1999 [Andrieu et al., 1999].

The same concerns apply to the multi-robot setting. The problem is further exacerbated since we typically do not know the relative position of one robot with respect to the others. It is important to estimate this quantity in order to produce a global map. Some authors simplify the problem by only carrying out the map merging step when two robots make "eye-contact" [Howard, 2005]. In general, however, the only information available is the data association between common features in both maps. In this setting, some authors have adopted Monte-Carlo localization using joint path sampling [Fox et al., 2006]. However, as we have argued earlier, this approach could easily result in degeneracy of the filter. We refer the interested reader to [Fox et al., 2006] for background on the multi-robot SLAM problem.

This paper presents a marginal filtering approach, where the Monte Carlo integration with respect to the robot state (pose) happens in the marginal space. The approach treats static maps as parameters, which by necessity are learned using maximum likelihood (ML) or maximum a posteriori inference. The idea of treating maps as parameters is not new. It has been central to the incremental ML method [Thrun, 1993]. However, this method resorts to an ML estimate of the state and hence fails to manage the uncertainty in the robot state properly, as elaborated in [Thrun, 2002]. A variation based on learning the distribution of the robot states and using an ML estimator of the map as a function of the existing and growing state trajectories was proposed in [Thrun, 2001]. This approach unfortunately suffers from the same consistency problems as FastSLAM. Various EM approaches using forward-backward state smoothing and map estimation in the M step have also been proposed; see [Thrun, 2002] for a survey. However, this approach only applies when learning the map off-line. Moreover, it is only very recently that particle smoothing has become feasible [Klaas et al., 2006]. (As an aside, we note that two-filter smoothers do better than forward-backward smoothers in the PF context.)

In the marginal approach to SLAM, we are able to compute the filter derivative on-line. This is extremely crucial for on-line static parameter estimation, but has only become possible very recently following new advances in particle simulation [Klaas et al., 2005; Poyadjis et al., 2005a]. Another advantage of working in the marginal space is that we can

assume independence of the robot locations given the shared map. Therefore, an independent sample set can be used for each robot, thereby reducing the dimensionality of the sampling space. That is, the filter derivatives and mapping updates are independent despite the fact that the map parameters are shared.

Sections 2 and 3 of this paper are a review of the method proposed in [Martinez-Cantin et al., 2006]. The new extension to multiple robots and new experiments are presented in Sections 4 and 5.

2 Problem Formulation

We present a general formulation of the problem that is applicable to both feature-based maps and grid-based maps. A more specific and detailed model appears in the experimental section. The unknown robot pose (location and heading), x_t ∈ X, is modelled as a Markov process with initial distribution p(x_1) and transition prior p(x_t | x_{t−1}). The observations, y_t ∈ Y, are assumed to be conditionally independent given the process {x_t} and of marginal distribution p_θ(y_t | x_t), where θ is a vector describing the elements of the map. Hence, the model consists of the following two distributions:

p(x_t | x_{t−1})
p_θ(y_t | x_t),   t ≥ 1.

We denote by x_{1:t} ≜ {x_1, ..., x_t} and y_{1:t} ≜ {y_1, ..., y_t}, respectively, the robot states and the observations up to time t, and define p(x_1 | x_0) ≜ p(x_1) for notational convenience.

In this formulation, we have used θ to describe the map only. However, θ could be used to include other parameters in the transition and measurement models, as well as to include data association variables [Thrun, 2002]. For simplicity of presentation, we will assume that the associations are given.

Our objective is to compute sequentially in time the filtering distribution p(x_t | y_{1:t}) and point estimates of the map θ.

3 Particle Methods

3.1 The Joint Path Space Approach

In classical Rao-Blackwellized particle filtering, one first notes the following decomposition of the joint posterior:

p(θ, x_{1:t} | y_{1:t}) = p(θ | x_{1:t}, y_{1:t}) p(x_{1:t} | y_{1:t}).

Consequently, given the state path x_{1:t}, we can solve for the map θ analytically. This leaves us with only having to carry out particle filtering to compute the posterior distribution of the robot state p(x_{1:t} | y_{1:t}).

If we had a set of samples (or particles) {x_t^(i)}_{i=1}^N from p(x_t | y_{1:t}), we could approximate the distribution with the Monte Carlo estimate

p̂(dx_t | y_{1:t}) = (1/N) Σ_{i=1}^N δ_{x_t^(i)}(dx_t),

where δ_{x_t^(i)}(dx_t) denotes the Dirac delta function. This estimate converges almost surely to the true expectation as N goes to infinity.

Unfortunately, one cannot easily sample from the marginal distribution p(x_t | y_{1:t}) directly. Instead, we draw particles from p(x_{1:t} | y_{1:t}) and the samples x_{1:t−1} are ignored. This is a valid way to draw samples from a marginal distribution and is at the core of most Monte Carlo statistical methods. The unknown normalizing constant precludes us from sampling directly from the posterior. Instead, we draw samples from a proposal distribution q and weight the particles according to the following importance ratio:

w_t(x_{1:t}) = p(x_{1:t} | y_{1:t}) / q(x_{1:t} | y_{1:t})

The proposal distribution is constructed sequentially,

q(x_{1:t} | y_{1:t}) = q(x_{1:t−1} | y_{1:t−1}) q(x_t | y_t, x_{t−1}),

and, hence, the importance weights can be updated recursively in time:

w_t(x_{1:t}) = [p(x_{1:t} | y_{1:t}) / (p(x_{1:t−1} | y_{1:t−1}) q(x_t | y_t, x_{t−1}))] w_{t−1}(x_{1:t−1}).   (1)

Given a set of N particles {x_{1:t−1}^(i)}_{i=1}^N, we obtain a set of particles {x_{1:t}^(i)}_{i=1}^N by sampling from q(x_t | y_t, x_{t−1}^(i)) and applying the weights of equation (1). The familiar particle filtering equations for this model are obtained by remarking that

p(x_{1:t} | y_{1:t}) ∝ p(x_{1:t}, y_{1:t}) = Π_{k=1}^t p(y_k | x_k) p(x_k | x_{k−1}),

given which, equation (1) becomes

w̃_t^(i) ∝ [p(y_t | x_t^(i)) p(x_t^(i) | x_{t−1}^(i)) / q(x_t^(i) | y_t, x_{t−1}^(i))] w̃_{t−1}^(i).

This iterative scheme produces a weighted measure {x_{1:t}^(i), w_t^(i)}_{i=1}^N, where w_t^(i) = w̃_t^(i) / Σ_j w̃_t^(j), and is known as Sequential Importance Sampling (SIS).

It has been proved [Doucet, 1998] that the variance of the importance weights in SIS increases over time. This causes most particles to have very small probability. A common strategy to solve this degeneracy consists of using a resampling step (SIR) after updating the weights, to replicate samples with high probability and prune those with negligible weight [Doucet et al., 2001].

This is the procedure in common use by practitioners. It can be deceptive: although only the state x_t is being updated every round, the algorithm is nonetheless importance sampling in the growing joint path space X^t.

Formally, the resampling step should be done along the full path {x_{1:t}^(i), w_{1:t}^(i)}_{i=1}^N. Since dynamic systems forget the past exponentially fast, several authors carry out resampling over the marginal space {x_t^(i), w_t^(i)}_{i=1}^N. This would be fine if, for example, we were interested in tracking dynamic maps. Static parameter estimation and model selection do not necessarily exhibit the exponential forgetting behavior. For example, static maps depend on the whole state trajectory.

Resampling over the joint path space is guaranteed to deplete the past in finite time. Alternatively, resampling from the marginal space still leaves us with an accumulation of Monte Carlo errors over time. Some implementations have introduced artificial dynamics or Markov chain Monte Carlo (MCMC) rejuvenation steps to reduce the severity of the problem, but these approaches do not overcome it.

In conclusion, whether we resample or not, learning the map as a function of a growing path in Monte Carlo simulation is a bad idea.

The same degeneracy problem arises if we try to obtain estimates of the filter derivative ∇_θ p_θ(x_t | y_{1:t}) for recursive (online) map estimation. To see this, let ∇_θ p_θ(x_{1:t} | y_{1:t}) denote the gradient vector of the path posterior with respect to the map. Then, we have

∇_θ p_θ(x_{1:t} | y_{1:t}) = p_θ(x_{1:t} | y_{1:t}) [∇_θ p_θ(x_{1:t} | y_{1:t}) / p_θ(x_{1:t} | y_{1:t})]

and, consequently, the filter derivative, necessary for online map learning, is given by:

∇_θ p_θ(x_t | y_{1:t}) = ∫_{X^{t−1}} [∇_θ p_θ(x_{1:t} | y_{1:t}) / p_θ(x_{1:t} | y_{1:t})] p_θ(x_{1:t} | y_{1:t}) dx_{1:t−1}.   (2)

This equation says that, when using standard particle filters to approximate the filter derivative, we are implicitly carrying out importance sampling on a vast growing space with proposal p_θ(x_{1:t} | y_{1:t}) and weight ∇_θ p_θ(x_{1:t} | y_{1:t}) / p_θ(x_{1:t} | y_{1:t}). This should be enough reason to call for a new approach. Yet, the problem is even worse.

The filter derivative is a signed measure, not a standard probability measure. It consists of positive and negative functions over disjoint parts of the state space, and it sums to zero over the entire state space. A serious problem when carrying out classical particle filtering to estimate this signed measure is that particles with positive and negative weights will cancel each other, say, in parts of the space where the derivative is close to zero. This is wasteful and statistically harmful. See Figure 1 of [Poyadjis et al., 2005a] for a beautiful depiction of this problem.

The technique presented in the following section overcomes these deficiencies.

3.2 The Marginal Space Approach

Marginal Particle Filtering

To eliminate the problems discussed in the previous section, we will perform particle filtering directly on the marginal distribution p(x_t | y_{1:t}) instead of on the joint space [Klaas et al., 2005; Poyadjis et al., 2005b; 2005a]. To do so, we begin by noting that the predictive density can be obtained by marginalization:

p_θ(x_t | y_{1:t−1}) = ∫ p(x_t | x_{t−1}) p_θ(x_{t−1} | y_{1:t−1}) dx_{t−1}.   (3)

To simplify the exposition later on, we introduce the following notation [Poyadjis et al., 2005a]:

p_θ(x_t | y_{1:t}) ≜ ξ_θ(x_t, y_{1:t}) / ∫ ξ_θ(x_t, y_{1:t}) dx_t.   (4)

Using equation (3) and Bayes' rule, the unnormalized filtering distribution can be expanded as follows:

ξ_θ(x_t, y_{1:t}) = p_θ(y_t | x_t) ∫ p(x_t | x_{t−1}) p_θ(x_{t−1} | y_{1:t−1}) dx_{t−1}.

The integral in equation (3) is generally not solvable analytically, but since we have a particle approximation of p(x_{t−1} | y_{1:t−1}) (namely, {x_{t−1}^(i), w_{t−1}^(i)}), we can approximate (3) as the weighted kernel density estimate

p̂_θ(x_t | y_{1:t−1}) = Σ_{j=1}^N w_{t−1}^(j) p(x_t | x_{t−1}^(j)).

While we are free to choose any proposal distribution that has appropriate support, it is convenient to assume that the marginal proposal takes a similar form, namely

q_θ(x_t | y_{1:t}) = Σ_{j=1}^N w_{t−1}^(j) q_θ(x_t | y_t, x_{t−1}^(j)).   (5)

We can easily draw particles from this proposal using multinomial or stratified sampling. Note that the importance weights are now defined on the marginal space:

w_t = p_θ(x_t | y_{1:t}) / q_θ(x_t | y_{1:t}).

The Filter Derivative

In order to obtain the gradient vector with respect to the map variables, we apply standard differentiation rules to equation (4), yielding:

∇_θ p_θ(x_t | y_{1:t}) = ∇_θ ξ_θ(x_t, y_{1:t}) / ∫ ξ_θ(x_t, y_{1:t}) dx_t − p_θ(x_t | y_{1:t}) [∫ ∇_θ ξ_θ(x_t, y_{1:t}) dx_t / ∫ ξ_θ(x_t, y_{1:t}) dx_t].   (6)

Similarly, using the expansion for the derivative of a logarithm, the gradient of ξ_θ(·) can be written as follows:

∇_θ ξ_θ(x_t, y_{1:t}) = p_θ(y_t | x_t) ∇_θ log p_θ(y_t | x_t) ∫ p(x_t | x_{t−1}) p_θ(x_{t−1} | y_{1:t−1}) dx_{t−1} + p_θ(y_t | x_t) ∫ p(x_t | x_{t−1}) ∇_θ p_θ(x_{t−1} | y_{1:t−1}) dx_{t−1}.   (7)

Note that the importance sampling process now happens in the marginal space. The last integral in equation (7) can be expanded using the score identity:

∫ p(x_t | x_{t−1}) ∇_θ log[p_θ(x_{t−1} | y_{1:t−1})] p_θ(x_{t−1} | y_{1:t−1}) dx_{t−1}.

That is, we sample from the marginal filtering distribution and weight with β ≜ ∇_θ log[p_θ(x_{t−1} | y_{1:t−1})]. Contrast this with equation (2).

The other thing to note, as pointed out in [Poyadjis et al., 2005a], is that the marginal filter derivative allows us to obtain a particle approximation of the Hahn-Jordan decomposition. This implies that we can surmount the problem of particles of opposite signs cancelling each other out in infinitesimal neighborhoods of the state space. Once again we refer the reader to Figure 1 of [Poyadjis et al., 2005a] for an illustration of this phenomenon.
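A sketch of the marginal importance weight built from the mixture proposal of equation (5): each new particle is weighted by likelihood × predictive mixture ÷ proposal mixture. The one-dimensional Gaussian densities are toy stand-ins for the model's distributions; note the O(N²) double loop that the fast methods of [Klaas et al., 2005] are designed to accelerate:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Toy 1-D Gaussian density (stand-in for the model's densities)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def marginal_weights(prev_x, prev_w, new_x, y,
                     sigma_trans=0.1, sigma_prop=0.3, sigma_obs=0.5):
    """Marginal-space weights w ∝ p(y|x) · Σ_j w_j p(x|x_j) / q(x),
    with q itself a mixture as in equation (5). Every new particle
    sums over all previous particles, hence the O(N^2) cost."""
    w = []
    for x in new_x:
        pred = sum(wj * gauss_pdf(x, xj, sigma_trans)
                   for xj, wj in zip(prev_x, prev_w))
        q = sum(wj * gauss_pdf(x, xj, sigma_prop)
                for xj, wj in zip(prev_x, prev_w))
        w.append(gauss_pdf(y, x, sigma_obs) * pred / q)
    z = sum(w)
    return [wi / z for wi in w]

# Two previous particles; the new particle nearer the observation y = 0
# should receive the larger normalized weight.
w = marginal_weights([0.0, 1.0], [0.5, 0.5], [0.1, 0.9], y=0.0)
```

With the transition prior itself as proposal, pred and q would cancel, recovering the equivalence with the classical particle filter noted below [Khan et al., 2004]; the broader proposal here keeps the ratio non-trivial.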

Monte Carlo Implementation

We are now ready to present the particle algorithm of [Poyadjis et al., 2005a] for approximating the filter derivative efficiently. Assume that at time t−1, we have the following approximations of the filter and its gradient:

p̂_θ(x_{t−1} | y_{1:t−1}) = Σ_{i=1}^N w_{t−1}^(i) δ_{x_{t−1}^(i)}(x_{t−1})

∇̂_θ p_θ(x_{t−1} | y_{1:t−1}) = Σ_{i=1}^N w_{t−1}^(i) β_{t−1}^(i) δ_{x_{t−1}^(i)}(x_{t−1}).   (8)

Then, we can sample from the proposal in equation (5) and compute the new unnormalized importance weights:

w̃_t^(i) = p_θ(y_t | x_t^(i)) Σ_{j=1}^N w_{t−1}^(j) p(x_t^(i) | x_{t−1}^(j)) / q_θ(x_t^(i) | y_{1:t})

ρ̃_t^(i) = p_θ(y_t | x_t^(i)) Σ_j w_{t−1}^(j) p(x_t^(i) | x_{t−1}^(j)) [∇_θ log p_θ(y_t | x_t^(i)) + β_{t−1}^(j)] / q_θ(x_t^(i) | y_{1:t})

These weights lead to the following approximations:

ξ̂_θ(x_t, y_{1:t}) = (1/N) Σ_{i=1}^N w̃_t^(i) δ_{x_t^(i)}(x_t)

∇̂_θ ξ_θ(x_t, y_{1:t}) = (1/N) Σ_{i=1}^N ρ̃_t^(i) δ_{x_t^(i)}(x_t)

Finally, normalizing the weights and substituting the above Monte Carlo estimates into the expression for the derivative of p_θ in terms of ξ_θ, we obtain:

∇̂_θ p_θ(x_t | y_{1:t}) = Σ_{i=1}^N w_t^(i) β_t^(i) δ_{x_t^(i)}(x_t),

where

w_t^(i) β_t^(i) = ρ̃_t^(i) / Σ_j w̃_t^(j) − w_t^(i) [Σ_j ρ̃_t^(j) / Σ_j w̃_t^(j)].

On-Line Map Learning

Armed with Monte Carlo estimates of the filter derivative, we can now attack the problem of developing recursive map estimators. Here, we choose to maximize the predictive likelihood (also known as the innovations or evidence):

p_θ(y_t | y_{1:t−1}) = ∫ p_θ(y_t | x_t) p(x_t | x_{t−1}) p_θ(x_{t−1} | y_{1:t−1}) dx_{t−1:t}.

To accomplish this, we adopt the following stochastic approximation algorithm:

θ_t = θ_{t−1} + γ_t ∇_θ log p̂_θ(y_t | y_{1:t−1}).   (9)

Provided that the step size γ_t satisfies standard stochastic approximation criteria (see for example [Bertsekas and Tsitsiklis, 1996; Spall, 2005]), it can be shown that θ_t converges to the set of global or local maxima of l(θ), where [Poyadjis et al., 2005a]

l(θ) = lim_{k→∞} [1/(k+1)] Σ_{t=1}^k log p_θ(y_t | y_{1:t−1}).

A detailed analysis is presented in [Tadic and Doucet, 2005]. The only remaining detail is to describe the Monte Carlo approximation of the gradient of the predictive distribution. It can be derived as follows:

∇_θ log p̂_θ(y_t | y_{1:t−1}) = ∇_θ p̂_θ(y_t | y_{1:t−1}) / p̂_θ(y_t | y_{1:t−1})
  = ∫ ∇_θ ξ̂_θ(x_t, y_{1:t}) dx_t / ∫ ξ̂_θ(x_t, y_{1:t}) dx_t
  = Σ_{j=1}^N ρ̃_t^(j) / Σ_{j=1}^N w̃_t^(j)

Pseudo-Code for Marginal-SLAM

The Marginal-SLAM algorithm is depicted in Figure 1. Note that it is linear in the number of features. It has an O(N²) complexity in terms of the number of samples, but this can be reduced to O(N log N) using the fast multipole expansions and metric tree recursions proposed in [Klaas et al., 2005]. The marginal particle filter is an old idea [Fearnhead, 1998; Pitt and Shephard, 1999]. Yet, because of its large computational cost, it was not fully explored until the introduction of fast methods [Klaas et al., 2005]. When using the transition prior as proposal, the marginal filter and the classical particle filter are equivalent [Khan et al., 2004], but this is no longer true when computing the derivative of the filter, as outlined in [Poyadjis et al., 2005b] and this paper.

4 Multi-robot Implementation

Hereinafter, we focus on feature-based representations because our goal is to apply the method to visually guided mobile robots [Lowe, 2004; Bay et al., 2006; Newman et al., 2006]. The robot motion model is based on a simple differential-drive vehicle:

X_t = X_{t−1} + d_t cos(ψ_{t−1})
Y_t = Y_{t−1} + d_t sin(ψ_{t−1})
ψ_t = ψ_{t−1} + α_t,

where x_t = [X_t, Y_t, ψ_t] denotes the robot pose and heading, and u_t = [d_t, α_t] denotes the translation and rotation motion commands (odometry) at time t, with corresponding Gaussian noise v_t ∼ N(0, σ_R I), where σ_R = [σ_d, σ_α]. The transition model is Gaussian in terms of u_t, but it involves a nonlinear transformation between u_t and x_t. Specifically, it is given by the following expression:

p(x_t | x_{t−1}, u_t) = [1 / (2π σ_d σ_α)] exp( −(d_t − d̂_t)² / (2σ_d²) − (α_t − α̂_t)² / (2σ_α²) ),

where

d̂_t = √(d_x² + d_y²)
α̂_t = arctan( [−d_x sin(ψ_{t−1}) + d_y cos(ψ_{t−1})] / [d_x cos(ψ_{t−1}) + d_y sin(ψ_{t−1})] ),

with d_x = X_t − X_{t−1} and d_y = Y_t − Y_{t−1}. In this work, we will use this transition prior as the proposal distribution.
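Sampling from this transition prior (the proposal used here) amounts to perturbing the odometry command with Gaussian noise and applying the kinematics. A minimal sketch; the noise levels are assumed, not taken from the paper:

```python
import math
import random

def sample_motion(pose, u, sigma_d=0.05, sigma_a=0.02):
    """Sample x_t ~ p(x_t | x_{t-1}, u_t) for the differential-drive
    model: perturb u_t = [d_t, alpha_t] with Gaussian noise, then
    apply the kinematic equations above. Noise levels are assumed."""
    X, Y, psi = pose
    d = u[0] + random.gauss(0.0, sigma_d)
    a = u[1] + random.gauss(0.0, sigma_a)
    return (X + d * math.cos(psi), Y + d * math.sin(psi), psi + a)

# With zero noise the update is the deterministic kinematics:
pose = sample_motion((0.0, 0.0, 0.0), (1.0, math.pi / 2), sigma_d=0.0, sigma_a=0.0)
# pose == (1.0, 0.0, pi/2)
```

Drawing one such sample per particle implements the proposal step of the Marginal-SLAM loop when the transition prior is used as proposal.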

Marginal-SLAM

• For i = 1, ..., N, sample the robot state from the proposal

  x_t^(i) ∼ Σ_{j=1}^N w_{t−1}^(j) q_θ(x_t | y_t, x_{t−1}^(j))

• For i = 1, ..., N, evaluate the unnormalized importance weights

  w̃_t^(i) = p_θ(y_t | x_t^(i)) Σ_{j=1}^N w_{t−1}^(j) p(x_t^(i) | x_{t−1}^(j)) / q_θ(x_t^(i) | y_{1:t})

  ρ̃_t^(i) = [p_θ(y_t | x_t^(i)) / q_θ(x_t^(i) | y_{1:t})] · Σ_{j=1}^N w_{t−1}^(j) p(x_t^(i) | x_{t−1}^(j)) [∇_θ log p_θ(y_t | x_t^(i)) + β_{t−1}^(j)]

• Normalise the importance weights

  w_t^(i) = w̃_t^(i) / Σ_j w̃_t^(j)

  w_t^(i) β_t^(i) = ρ̃_t^(i) / Σ_j w̃_t^(j) − w_t^(i) [Σ_j ρ̃_t^(j) / Σ_j w̃_t^(j)]

• Update the map vector

  θ_t = θ_{t−1} + γ_t [Σ_j ρ̃_t^(j) / Σ_j w̃_t^(j)]

• Update the learning rate γ_t.

Figure 1: The Marginal-SLAM algorithm at time t.

We do this for both Marginal-SLAM and FastSLAM so as to provide a fair comparison. One can do better, however, by adopting approximations of the optimal proposal distribution that account for the latest observations. Typical approximations of this type include the extended and unscented Kalman filters.

We assume a simple range and bearing sensor y_t = [ρ_t, φ_t] with a point feature detector:

ρ̂_t = √(Δ_x² + Δ_y²)
φ̂_t = arctan( [−Δ_x sin(ψ_t) + Δ_y cos(ψ_t)] / [Δ_x cos(ψ_t) + Δ_y sin(ψ_t)] ),

where Δ_x = θ_x − X_t and Δ_y = θ_y − Y_t, with [θ_x, θ_y] denoting the feature location. The sensor has white Gaussian noise, and the likelihood of each feature is:

p(y_t | x_t) = [1 / (2π σ_ρ σ_φ)] exp( −(ρ_t − ρ̂_t)² / (2σ_ρ²) − (φ_t − φ̂_t)² / (2σ_φ²) ).   (10)

The overall observation likelihood is given by a product of the individual likelihood terms. The gradient of the log-likelihood can be obtained by simple differentiation:

∇_θ log p(y_t | x_t) = (1/ρ̂_t) [ Δ_x(ρ_t − ρ̂_t)/σ_ρ² − Δ_y(φ_t − φ̂_t)/(ρ̂_t σ_φ²) ;  Δ_y(ρ_t − ρ̂_t)/σ_ρ² + Δ_x(φ_t − φ̂_t)/(ρ̂_t σ_φ²) ].   (11)

One of the main SLAM difficulties is partial observability of the parameters. The choice of learning rates γ_t is affected by this partial observability. In our case, we chose to implement separate learning rates for each landmark. Each individual rate depends on the number of times its corresponding feature is observed. In addition, when there is no observation, the likelihood function of the unseen features is assumed to be uniform over the whole space, so ∇_θ log p(y_t | x_t) = 0.

4.1 Non-Stationarity and Estimate Recovery

Although our goal has been to develop a method for static maps using decreasing learning rates, it is possible to adopt small constant learning rates to estimate time-varying parameters (maps). This enables us to attack maps that contain some moving parameters, such as chairs and doors [Martinez-Cantin et al., 2006].

The same strategy for dealing with non-stationarity can also be adopted to correct for modelling errors. For example, the ability to track moving features enables Marginal-SLAM to recover from errors in data association, provided that the correct correspondences are obtained in subsequent steps. This is crucial because in the early steps of the algorithm there aren't enough observations to guarantee global map consistency. Figure 6 in the experimental section illustrates how Marginal-SLAM is robust with respect to reasonable errors in matching.

4.2 Map Merging

Our goal is to merge the map estimates of each robot into a single global map. When the relative robot locations are known, the problem is simple and reduces to standard monolithic SLAM. In practice, however, we do not know the transformations of coordinates that would allow all the robots to have the same reference frame for the global map. We must therefore estimate the parameters of these linear transformations (rotation and translation) between the poses of the various robots.

Some authors simplify this problem by only carrying out the map merging step when two robots make "eye-contact" [Howard, 2005]. In general, however, the only information available is the data association between common features in both maps. In this setting, some authors have adopted Monte-Carlo localization using joint path sampling [Fox et al., 2006]. However, as we have argued in earlier sections,
noise vt ∼ N (0, σI) in the measurement space, where σ = this approach could easily result in degeneracy of the filter.
[σρ , σφ ]. Hence, the observation likelihood for a single fea- We point out in passing that map merging is only one of the

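As a concrete illustration of the sensor model, equation (10) and the log-likelihood gradient above can be sketched in NumPy as follows. This is a minimal re-derivation for one feature, not the authors' implementation:

```python
import numpy as np

def predict_measurement(pose, feature):
    """Predicted range rho_hat and bearing phi_hat of a feature
    theta = (theta_x, theta_y) seen from pose = (X, Y, psi)."""
    X, Y, psi = pose
    dx, dy = feature[0] - X, feature[1] - Y
    rho_hat = np.hypot(dx, dy)
    phi_hat = np.arctan2(-dx * np.sin(psi) + dy * np.cos(psi),
                          dx * np.cos(psi) + dy * np.sin(psi))
    return rho_hat, phi_hat

def likelihood_and_grad(z, pose, feature, sigma_rho, sigma_phi):
    """Single-feature Gaussian likelihood p(y_t|x_t) of z = (rho, phi),
    as in equation (10), and its gradient w.r.t. the feature location."""
    rho, phi = z
    X, Y, _ = pose
    dx, dy = feature[0] - X, feature[1] - Y
    rho_hat, phi_hat = predict_measurement(pose, feature)
    p = (np.exp(-(rho - rho_hat) ** 2 / (2 * sigma_rho ** 2)
                - (phi - phi_hat) ** 2 / (2 * sigma_phi ** 2))
         / (2 * np.pi * sigma_rho * sigma_phi))
    # d(rho_hat)/d(theta) = (dx, dy)/rho_hat;
    # d(phi_hat)/d(theta) = (-dy, dx)/rho_hat^2
    grad = np.array([
        dx * (rho - rho_hat) / (sigma_rho ** 2 * rho_hat)
        - dy * (phi - phi_hat) / (sigma_phi ** 2 * rho_hat ** 2),
        dy * (rho - rho_hat) / (sigma_rho ** 2 * rho_hat)
        + dx * (phi - phi_hat) / (sigma_phi ** 2 * rho_hat ** 2),
    ])
    return p, grad
```

At an exact measurement the residuals vanish, so the likelihood peaks at 1/(2π σ_ρ σ_φ) and the gradient is zero, which is a quick sanity check on the algebra.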
We point out in passing that map merging is only one of the components of the more general approach presented in [Fox et al., 2006]. Here, we adopt standard vision algorithms to compute the correspondences between features [Hartley and Zisserman, 2000; Brown and Lowe, 2005]. In order to simplify the presentation, let us for the time being consider only two robots: R1 and R2. When the two robots share a common set of map features for which we have computed the correspondences, we apply standard algorithms to estimate the transformation of coordinates between the poses of R1 and R2. (We refer the reader either to [Castellanos and Tardós, 1999, Appendices A and B] for a detailed explanation, or to [Brown and Lowe, 2005] for a brief yet comprehensive explanation of how to carry out feature matching and estimation of transformations between different reference frames.)

Once we know the transformation of coordinates, we map R2's pose to the reference frame of R1. We now have a common reference frame for the poses of both robots. We also transform the estimates of the locations of map features to this common reference frame. Finally, we merge the estimates of the map feature locations of both robots in order to produce the common global map.

As mentioned previously in Section 4.1, we noticed that Marginal-SLAM was able to converge empirically to the true map, even in the presence of errors in the estimates of the relative transformations.

The multi-robot Marginal-SLAM algorithm is detailed in Figure 2. For clarity, the pseudo-code is based on the robot and sensor models presented in this paper, but it can easily be modified to treat other models.

Multi-robot Marginal-SLAM at step t

1. For each robot, compute its pose and map:

   • For i = 1, ..., N particles:

     1. Sample new particles from the prior mixture proposal:

        (a) Select mixture components using stratified resampling:

            { x̃_{t−1}^(i), 1/N }_{i=1}^{N} ← { x_{t−1}^(i), w_{t−1}^(i) }_{i=1}^{N}

        (b) Simulate the stochastic dynamics p(x_t | u_t, x̃_{t−1}) for each selected component:

            [d_t^(i), α_t^(i)] = [d_t, α_t] + v_t^(i) ;    v_t^(i) ∼ N(0, σI)
            X_t^(i) = X_{t−1}^(i) + d_t^(i) cos(ψ_{t−1}^(i))
            Y_t^(i) = Y_{t−1}^(i) + d_t^(i) sin(ψ_{t−1}^(i))
            ψ_t^(i) = ψ_{t−1}^(i) + α_t^(i)

     2. Evaluate the importance weights using the expressions in equations (9), (10) and (11) for the transition prior (odometry), likelihood and gradient:

            w_t^(i) = p_θ(y_t | x_t^(i))

            ρ_t^(i) = w_t^(i) ∇_θ log p_θ(y_t | x_t^(i)) + p_θ(y_t | x_t^(i)) · [ Σ_{j=1}^{N} w_{t−1}^(j) β_{t−1}^(j) p(x_t^(i) | u_t, x_{t−1}^(j)) ] / [ Σ_{j=1}^{N} w_{t−1}^(j) p(x_t^(i) | u_t, x_{t−1}^(j)) ]

   • For i = 1, ..., N, normalise the importance weights:

            w_t^(i) ← w_t^(i) / Σ_j w_t^(j) ;    w_t^(i) β_t^(i) = ρ_t^(i) / Σ_j w_t^(j) − w_t^(i) · [ Σ_j ρ_t^(j) / Σ_j w_t^(j) ]

   • Update the individual map vector:

            θ_t = θ_{t−1} + γ_t · Σ_j ρ_t^(j) / Σ_j w_t^(j)

   • Update the learning rates γ_t.

2. If maps share features:

   • Compute the relative linear transformation between the coordinates of both robots.
   • Transform all maps to a common reference frame using the estimated linear transformations.
   • Place all transformed features in a common global map. All robots use this map at step t + 1.

Figure 2: The multi-robot Marginal-SLAM algorithm at time t. This version uses stratified resampling to sample from the mixture proposal. The proposal in this case is simply a mixture of transition priors.

5 Experiments

In this preliminary work, experiments are based on simulations in a controlled environment, where all the error sources and data associations are known. This simulated environment allows us to carry out experiments for one and multiple robots using different sensor and motion noise models.

We first present a brief comparison of Marginal-SLAM and FastSLAM. For a more detailed comparison, see [Martinez-Cantin et al., 2006]. Figure 3 shows a comparison between the number of effective particles N_eff = 1/Σ_{i=1}^{N} w_i² using Marginal-SLAM and FastSLAM. This quantity measures the degeneracy of sequential Monte Carlo algorithms. To obtain a fair comparison, we use the same transition prior p(x_t | u_t, x_{t−1}) as the proposal distribution in both algorithms. Clearly, the marginal particle filter reaches a steady state, but FastSLAM loses particle diversity at a very fast rate. These results are based on the following parameter settings: L = 200, σ_d = 0.1 m, σ_α = 0.5 deg, σ_ρ = 0.025 m, σ_φ = 3 deg, where L denotes the number of landmarks.

In the multi-robot case, we used only 50 landmarks to ensure that we have a demanding testing scenario; that is, the robots share very few features in common. For simplicity, we use only two robots, but the algorithm extends naturally to more robots.

We assume that the initial relative robot location is unknown. As shown in Figure 4, we have to learn the linear transformation (rotation and translation) of coordinates between the two robots using standard geometry.
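The "standard geometry" step above can be sketched as a least-squares (Procrustes) alignment of matched feature pairs. The SVD-based version below is one common choice and an assumption on our part, not necessarily the exact procedure of [Castellanos and Tardós, 1999]:

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Least-squares 2D rotation R and translation t such that
    Q ≈ R @ P + t, given matched feature sets P and Q of shape (2, n)."""
    mp = P.mean(axis=1, keepdims=True)
    mq = Q.mean(axis=1, keepdims=True)
    H = (Q - mq) @ (P - mp).T                    # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.linalg.det(U @ Vt)])    # guard against reflections
    R = U @ D @ Vt
    t = mq - R @ mp
    return R, t
```

With noiseless correspondences the true rotation and translation are recovered exactly; with noisy matches the result is the least-squares optimum, which is then refined as more common features are observed.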
Figure 3: Average number of effective particles N_eff, with corresponding confidence intervals, for 25 simulations. The number
of particles is 200 in both cases. The left plot corresponds to Marginal-SLAM, where the filter achieves a steady state with
a 25% effective number of particles. The middle plot shows the same experiment for FastSLAM. The effective sample size
quickly drops and FastSLAM fails to maintain particle diversity after 200 steps. The right plot is a zoomed view of the middle
plot.
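The degeneracy measure plotted in Figure 3 is a one-line computation on the normalised importance weights; a minimal sketch:

```python
import numpy as np

def effective_particles(w):
    """N_eff = 1 / sum_i w_i^2 for importance weights w
    (normalised here for safety)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)
```

Uniform weights over N particles give N_eff = N, while a single dominant particle gives N_eff ≈ 1, which is the degenerate regime FastSLAM falls into in the middle plot.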

[Figure 6: map plots at steps 200, 250 and 4536]
Figure 6: Loop closing in multi-robot Marginal-SLAM with two robots. Despite the error introduced during map merging, the robot is able to close the loop (left plot). Although the map estimate after loop closing is still inaccurate (middle plot), Marginal-SLAM is able to minimize the average error and converge to the true solution. The right plot displays the estimated and true maps after 10 loops. The blue stars represent the true map and the black circles correspond to the most recent map estimate. The red diamonds are the current observations.
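The recovery behaviour shown in Figure 6 relies on the per-landmark learning rates of Section 4.1: decreasing rates for static landmarks, and small constant rates when parameters may drift. A sketch of one such schedule, where the 1/n decay and the constants are illustrative assumptions rather than values from the paper:

```python
def learning_rate(n_obs, moving=False, gamma0=1.0, gamma_const=0.05):
    """Per-landmark learning rate. A gamma0/n schedule satisfies the usual
    stochastic-approximation conditions (sum gamma = inf, sum gamma^2 < inf)
    for static landmarks; a small constant rate keeps tracking landmarks
    whose positions may change over time."""
    if moving:
        return gamma_const
    # Decrease with the number of times this landmark has been observed.
    return gamma0 / max(n_obs, 1)
```

Under this kind of schedule a landmark that was updated with a wrong correspondence can still be pulled back toward its true position by later, correctly matched observations.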

When the robots detect common features, the estimated linear transformation is used to represent both maps with respect to a common reference frame (robot R1's position in our case). The two maps are then merged into a single global map. This is illustrated in Figure 5. In this figure, the estimated transformation is biased, because the estimates of the maps have not converged and the observations are very noisy. Therefore, the map merging step introduces some error in the global map. However, as discussed in Section 4.1, Marginal-SLAM is able to recover from these errors. Figure 6 reveals how R1 and R2 close the loop and keep updating the shared map, while reducing the error introduced in the map merging step. The plots are with respect to R1's location. There is no need for absolute location or reference information.

Finally, Figure 7 shows the mean Euclidean distance of the landmark estimates from their true locations, averaged over ten repetitions of the experiment. We use a robocentric representation based on the expectation of the first robot's location. The peaks represent points with high heading error in the loop closing area for this robot. The convergence of the filter is empirically verified in this last experiment.

We close this section by stating that Marginal-SLAM is able to build accurate maps with large range and bearing noise. However, very large sensor noise (for example, sonar) demands a high number of particles to ensure convergence. In addition, many update steps are required to obtain reasonable errors. We believe that data-driven proposals and adaptive stochastic approximation could be applied in the future to confront this problem and speed up the convergence rate.

6 Conclusions

The experiments and arguments in this paper and [Martinez-Cantin et al., 2006] indicate that Marginal-SLAM is an important new direction in the design of particle methods for SLAM. It is clear that algorithms designed to work on the marginal space behave better than the ones designed to work on the path space.
[Figure 4 and Figure 5: robot location and map plots at steps 1, 45 and 46]

Figure 4: In the first iteration of the multi-robot experiment, the unknown relative position of the robots is modelled in terms of an unknown linear transformation. The upper plot represents the true robot locations. The lower plot shows the robot locations, while emphasizing that we do not know the linear transformation of coordinates at this early stage. The blue stars represent the true map and the black circles correspond to the most recent map estimate. The red diamonds are the current observations.

Figure 5: Map merging in multi-robot Marginal-SLAM. The unknown relative robot location is recovered using the estimated transformation between common features in both maps. In the lower plot, the maps have been merged. The resulting map is biased because of the poor relative location estimate at this early stage. Again, the blue stars represent the true map and the black circles correspond to the most recent map estimate. The red diamonds are the current observations.

They also lend themselves naturally to global map estimation in the multi-robot setting.

Marginal-SLAM seems to be robust with respect to errors introduced during the map joining step. This is important, as computer vision matching algorithms are not perfect. We are planning to deploy Marginal-SLAM using the visual features and matching procedure presented in [Brown and Lowe, 2005]. This will enable us to obtain 3D maps of the environment.

In future work, we plan to adopt better proposal distributions using extended and unscented Kalman filters, and to implement Hessian updates so as to potentially maximize the predictive likelihood more efficiently.

7 Acknowledgments

We are very thankful to Arnaud Doucet for many thoughtful discussions. We are also indebted to Frank Dellaert and Tim Bailey for valuable feedback on an earlier version of this paper.

This work was supported in part by the Dirección General de Investigación of Spain under project DPI2003-07986 and by NSERC.

References

[Andrieu et al., 1999] C. Andrieu, N. de Freitas, and A. Doucet. Sequential MCMC for Bayesian model selection. In IEEE Higher Order Statistics Workshop, pages 130–134, Caesarea, Israel, 1999.
[Figure 7: two plots of average map error versus steps]

Figure 7: Evolution of the average map error in 10 runs. The error is measured in terms of the Euclidean distance between the robocentric map and the ground-truth map. Therefore, the robot location error is implicitly included in this plot. The top plot shows that the algorithm converges asymptotically. The bottom plot is a zoomed version of the top plot with 95% confidence intervals, which indicate the robustness of the algorithm.

[Bailey et al., 2006] Tim Bailey, Juan Nieto, and Eduardo Nebot. Consistency of the FastSLAM algorithm. In IEEE International Conference on Robotics and Automation, 2006.

[Bay et al., 2006] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded up robust features. In Proceedings of the Ninth European Conference on Computer Vision, May 2006.

[Bertsekas and Tsitsiklis, 1996] D.P. Bertsekas and J.N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[Brown and Lowe, 2005] M. Brown and D. Lowe. Unsupervised 3D object recognition and reconstruction in unordered datasets. In 5th International Conference on 3D Imaging and Modelling, Ottawa, Canada, 2005.

[Castellanos and Tardós, 1999] J.A. Castellanos and J.D. Tardós. Mobile Robot Localization and Map Building: A Multisensor Fusion Approach. Kluwer Academic Publishers, 1999.

[Doucet et al., 2000] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In C. Boutilier and M. Godszmidt, editors, Uncertainty in Artificial Intelligence, pages 176–183. Morgan Kaufmann Publishers, 2000.

[Doucet et al., 2001] A. Doucet, N. de Freitas, and N.J. Gordon, editors. Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.

[Doucet, 1998] A. Doucet. On sequential simulation-based methods for Bayesian filtering. Technical Report CUED/F-INFENG/TR 310, Department of Engineering, Cambridge University, 1998.

[Durrant-Whyte and Bailey, 2006] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localisation and mapping (SLAM): Part I, the essential algorithms. Robotics and Automation Magazine, June 2006.

[Elinas et al., 2006] Pantelis Elinas, Robert Sim, and James J. Little. σSLAM: Stereo vision SLAM using the Rao-Blackwellised particle filter and a novel mixture proposal distribution. In IEEE International Conference on Robotics and Automation, 2006.

[Fearnhead, 1998] P. Fearnhead. Sequential Monte Carlo Methods in Filter Theory. PhD thesis, Department of Statistics, Oxford University, England, 1998.

[Fox et al., 2001] D. Fox, S. Thrun, W. Burgard, and F. Dellaert. Particle filters for mobile robot localization. In A. Doucet, N. de Freitas, and N.J. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.

[Fox et al., 2006] D. Fox, J. Ko, K. Konolige, B. Limketkai, D. Schulz, and B. Stewart. Distributed multi-robot exploration and mapping. Proceedings of the IEEE, 2006.

[Hartley and Zisserman, 2000] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.

[Howard, 2005] A. Howard. Multi-robot simultaneous localization and mapping using particle filters. In Proc. Robotics: Science and Systems Conference, 2005.

[Khan et al., 2004] Z. Khan, T. Balch, and F. Dellaert. A Rao-Blackwellized particle filter for EigenTracking. In IEEE Conference on Computer Vision and Pattern Recognition, 2004.

[Klaas et al., 2005] Mike Klaas, Nando de Freitas, and Arnaud Doucet. Toward practical N² Monte Carlo: The marginal particle filter. In Uncertainty in Artificial Intelligence, 2005.

[Klaas et al., 2006] Mike Klaas, Mark Briers, Nando de Freitas, Arnaud Doucet, Simon Maskell, and Dustin Lang. Fast particle smoothing: If I had a million particles. In International Conference on Machine Learning, 2006.

[Lowe, 2004] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[Martinez-Cantin et al., 2006] R. Martinez-Cantin, N. de Freitas, and J.A. Castellanos. Marginal-SLAM: A convergent particle method for simultaneous robot localization and mapping, 2006.

[Montemerlo et al., 2003] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003.
[Newman et al., 2006] Paul Newman, David Cole, and Kin Ho. Outdoor SLAM using visual appearance and laser ranging. In International Conference on Robotics and Automation (ICRA), Florida, 2006.

[Pitt and Shephard, 1999] M.K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, 94(446):590–599, 1999.

[Poyadjis et al., 2005a] G. Poyadjis, A. Doucet, and S.S. Singh. Maximum likelihood parameter estimation using particle methods. In Joint Statistical Meeting, 2005.

[Poyadjis et al., 2005b] G. Poyadjis, A. Doucet, and S.S. Singh. Particle methods for optimal filter derivative: Application to parameter estimation. In IEEE ICASSP, 2005.

[Spall, 2005] J.C. Spall. Introduction to Stochastic Search and Optimization. Wiley, 2005.

[Stachniss et al., 2005] C. Stachniss, G. Grisetti, and W. Burgard. Recovering particle diversity in a Rao-Blackwellized particle filter for SLAM after actively closing loops. In Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), pages 667–672, Barcelona, Spain, 2005.

[Tadic and Doucet, 2005] V.B. Tadic and A. Doucet. Exponential forgetting and geometric ergodicity for optimal filtering in general state-space models. Stochastic Processes and Their Applications, 115:1408–1436, 2005.

[Thrun, 1993] Sebastian Thrun. Exploration and model building in mobile robot domains. In E. Ruspini, editor, Proceedings of the ICNN, pages 175–180, 1993.

[Thrun, 2001] S. Thrun. A probabilistic online mapping algorithm for teams of mobile robots. International Journal of Robotics Research, 20(5):335–363, 2001.

[Thrun, 2002] S. Thrun. Robotic mapping: A survey. In G. Lakemeyer and B. Nebel, editors, Exploring Artificial Intelligence in the New Millennium. Morgan Kaufmann, 2002.