Probabilistic Plan Management
Laura M. Hiatt
CMU-CS-09-170
November 17, 2009
School of Computer Science
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
Thesis Committee:
Reid Simmons, chair
Stephen F. Smith
Manuela Veloso
David J. Musliner, SIFT
Submitted in partial fulﬁllment of the requirements
for the degree of Doctor of Philosophy.
Copyright © 2009 Laura M. Hiatt
This research was sponsored by the National Aeronautics and Space Administration under grant numbers NNX06AD23G, NNA04CK90A, and NNA05CP97A; the Naval Research Laboratory under grant number N00173-09-1-G001; the Defense Advanced Research Projects Agency under grant number FA8750-05-C-0033; and the National Science Foundation under grant numbers DGE-0750271, DGE-0234630, and DGE-0750271.
The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government, or any other entity.
Keywords: planning under uncertainty, robust planning, meta-level control, cooperative multi-agent systems, plan management, artificial intelligence
Abstract

The general problem of planning for uncertain domains remains a difficult challenge. Research that focuses on constructing plans by reasoning with explicit models of uncertainty has produced some promising mechanisms for coping with specific types of domain uncertainties; however, these approaches generally have difficulty scaling. Research in robust planning has alternatively emphasized the use of deterministic planning techniques, with the goal of constructing a flexible plan (or set of plans) that can absorb deviations during execution. Such approaches are scalable, but either result in overly conservative plans or ignore the potential leverage that can be provided by explicit uncertainty models.

The main contribution of this work is a composite approach to planning that couples the strengths of both of the above approaches while minimizing their weaknesses. Our approach, called Probabilistic Plan Management (PPM), takes advantage of the known uncertainty model while avoiding the overhead of non-deterministic planning. PPM takes as its starting point a deterministic plan that is built with deterministic modeling assumptions. PPM begins by layering an uncertainty analysis on top of the plan. The analysis calculates the overall expected outcome of execution and can be used to identify expected weak areas of the schedule.

PPM uses the analysis in two main ways to maximize the utility of execution and to manage it. First, it makes deterministic plans more robust by minimizing the negative impact that unexpected or undesirable contingencies can have on plan utility. PPM strengthens the current schedule by fortifying the areas of the plan identified as weak by the probabilistic analysis, increasing the likelihood that they will succeed. In experiments, probabilistic schedule strengthening significantly increases the utility of execution while introducing only a modest overhead.

Second, PPM reduces the amount of replanning that occurs during execution via a probabilistic meta-level control algorithm. It uses the probability analysis as a basis for identifying cases where replanning probably is (or is not) necessary, and acts accordingly. This addresses the trade-off between too much replanning, which can lead to the overuse of computational resources and a lack of responsiveness, and too little, which can lead to undesirable errors or missed opportunities during execution. Experiments show that probabilistic meta-level control considerably decreases the amount of time spent managing plan execution without affecting how much utility is earned. In these ways, our approach effectively manages the execution of deterministic plans for uncertain domains, both by producing effective plans in a scalable way and by intelligently controlling the resources that are used to maintain these high-utility plans during execution.
Acknowledgments
I would first like to thank my advisor, Reid Simmons, for being such a great mentor through this process. This would not have been possible without all of his support, questions, and encouragement, and I cannot thank him enough.
Thanks also to Steve Smith, for letting me join his project for part of this thesis. I really enjoyed the opportunity to apply this work to a real domain. Along those lines, thanks to the rest of the ICLL group for the many hours discussing the intricacies of C_TAEMS and the Coordinators project.
Terry Zimmerman gets a special thanks for suggesting the “PADS” acronym, saving everyone
from days of repeating “probabilistic analysis of deterministic schedules” over and over again; but
also, of course, thanks to Zack Rubinstein, Laura Barbulescu, Anthony Gallagher, Matt Danish,
Charlie Collins and Dave Crimm.
Thanks to Dave Musliner and Manuela Veloso for being part of my committee, and for all of
their good suggestions and guidance.
Thanks to Brennan Sellner for lending me his code for and helping me with ASPEN, and for
sharing his ASPEN contacts with me to provide additional support. Thanks also to him and the
rest of the Trestle/IDSR group – Fred Heger, Nik Melchior, Seth Koterba, Brad Hamner and Greg Armstrong – for all the fun times in the high bay, even if sometimes the “fun” was in a painful way.
I would like to thank my mother and my sister for not giving me too much trouble about never
having a “real job,” and for consistently expressing (or, at least, pretending to express) interest in
what I’ve been up to these many years. I would also like to ﬁrmly point out to them, this one last
time, that my robots are not taking over the world.
Finally, I want to thank my husband, Stephen, for, well, everything.
Contents
1 Introduction 13
1.1 Probabilistic Analysis of Deterministic Schedules . . . . . . . . . . . . . . . . . . 15
1.2 Probabilistic Schedule Strengthening . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Probabilistic Meta-Level Control . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2 Related Work 21
2.1 Non-Deterministic Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.1 Policy-Based Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1.2 Search-Based Contingency Planning . . . . . . . . . . . . . . . . . . . . . 23
2.1.3 Conformant Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Deterministic Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 Plan Flexibility and Robustness . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.2 Plan Management and Repair . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Meta-Level Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Background 29
3.1 Plan Versus Schedule Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Domain and Schedule Representation . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.1 Activity Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.2 Flexible Times Schedules . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3.1 Single-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3.2 Multi-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Decision Trees with Boosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4 Approach 49
4.1 Domain Suitability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Probabilistic Analysis of Deterministic Schedules . . . . . . . . . . . . . . . . . . 51
4.3 Probabilistic Schedule Strengthening . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4 Probabilistic Meta-Level Control . . . . . . . . . . . . . . . . . . . . . . . . . 55
5 Probabilistic Analysis of Deterministic Schedules 59
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.1 Single-Agent PADS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.2 Multi-Agent PADS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2 Single-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.2.1 Activity Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2.2 Probability Proﬁles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Multi-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.1 Activity Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.2 Probability Proﬁles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.3 Global Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3.4 Distributed Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.3.5 Practical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.4 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.4.1 Single-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.4.2 Multi-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6 Probabilistic Schedule Strengthening 105
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 Tactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.2.1 Backﬁlling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.2.2 Swapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.2.3 Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3 Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.4.1 Effects of Targeting Strengthening Actions . . . . . . . . . . . . . . . . . 121
6.4.2 Comparison of Strengthening Strategies . . . . . . . . . . . . . . . . . . . 124
6.4.3 Effects of Global Strengthening . . . . . . . . . . . . . . . . . . . . . . . 125
6.4.4 Effects of Run-Time Distributed Schedule Strengthening . . . . . . . . . . 127
7 Probabilistic Meta-Level Control 131
7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.1.1 Single-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.1.2 Multi-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.1.3 Probabilistic Plan Management . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 Single-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.2.1 Baseline Plan Management . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.2.2 Uncertainty Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.2.3 Likely Replanning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.2.4 Probabilistic Plan Management . . . . . . . . . . . . . . . . . . . . . . . 138
7.3 Multi-Agent Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3.1 Baseline Plan Management . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3.2 Learning When to Replan . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.3.3 Probabilistic Plan Management . . . . . . . . . . . . . . . . . . . . . . . 142
7.4 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.4.1 Thresholding Meta-Level Control . . . . . . . . . . . . . . . . . . . . . . 144
7.4.2 Learning Probabilistic Meta-Level Control . . . . . . . . . . . . . . . . . 150
8 Future Work and Conclusions 165
8.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.1.1 Meta-Level Control Extensions . . . . . . . . . . . . . . . . . . . . . . . 165
8.1.2 Further Integration with Planning . . . . . . . . . . . . . . . . . . . . . . 166
8.1.3 Demonstration on Other Domains / Planners . . . . . . . . . . . . . . . . 167
8.1.4 Human-Robot Teams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.1.5 Non-Cooperative Multi-Agent Scenarios . . . . . . . . . . . . . . . . . . 168
8.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9 Bibliography 171
A Sample Problems 181
A.1 Sample MRCPSP/max problem in ASPEN syntax . . . . . . . . . . . . . . . . . . 182
A.2 Sample C_TAEMS Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
List of Figures
3.1 A sample STN for a single scheduled method m with duration 5 and a deadline of 10. Notice that the method is constrained to start no earlier than time 0 (via the constraint between time 0 and m_start) and finish by time 10 (via the constraint between time 0 and m_finish). . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 A sample MRCPSP/max problem. . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 A graph of a normal distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4 A simple, deterministic 3-agent problem. . . . . . . . . . . . . . . . . . . . . . 37
3.5 A small 3-agent problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6 The limited local view for agent Buster of Figure 3.5. . . . . . . . . . . . . . . . . 43
3.7 An example decision tree built from a data set. The tree classiﬁes 11 examples
giving state information about the time of day and how much rain there has been in
the past 10 minutes as having or not having trafﬁc at that time. Each node of the tree
shows the attribute and threshold for that branch as well as how many instances of
each class are present at that node. Instances that are not underlined refer to trafﬁc
being present; underlined instances refer to trafﬁc not being present. . . . . . . . . 46
4.1 Flow diagram for PPM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1 The activity dependencies (denoted by solid arrows) and propagation order (denoted by circled numbers) that result from the schedule shown for the problem in Figure 3.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 The same propagation as Figure 5.1, but done distributively. Jeremy ﬁrst calculates
the proﬁles for its activities, m1 and T1, and passes the proﬁle of T1 to Buster;
Buster can then ﬁnish the proﬁle calculation for the rest of the activity network. . . 65
5.3 A simple 3-agent problem and a corresponding schedule. . . . . . . . . . . . . . 86
5.4 Time spent calculating 20 schedules’ initial expected utilities. . . . . . . . . . . . . 100
5.5 The accuracy of global PADS: the calculated expected utility compared to the average utility, when executed. . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.6 Time spent distributively calculating 20 schedules’ initial expected utilities; shown as both the average time per agent and total time across all agents. . . . . . . . . 103
6.1 The baseline strengthening strategy explores the full search space of the different
orderings of backﬁlls, swapping and pruning steps that can be taken to strengthen
a schedule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.2 The round-robin strengthening strategy performs one backfill, one swap and one
prune, repeating until no more strengthening actions are possible. . . . . . . . . . . 109
6.3 The greedy strengthening strategy repeatedly performs one trial backﬁll, one trial
swap and one trial prune, and commits at each point to the action that raises the
expected utility of the joint schedule the most. . . . . . . . . . . . . . . . . . . . . 109
6.4 Utility earned by different backfilling conditions, expressed as the average proportion of the originally scheduled utility earned. . . . . . . . . . . . . . . . . . . . 122
6.5 The percent increase in the number of methods in the initial schedule after backfilling for each mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.6 Utility for each problem earned by different strengthening strategies, expressed as
the average proportion of the originally scheduled utility earned. . . . . . . . . . . 126
6.7 Utility for each problem for the different online replanning conditions, expressed
as the average proportion of omniscient utility earned. . . . . . . . . . . . . . . . . 128
6.8 Utility earned across various conditions, with standard deviation shown. . . . . . . 129
7.1 Uncertainty horizon results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.2 Likely replanning results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.3 Average execution time across all three problems using PPM with both the uncertainty horizon and likely replanning at various thresholds. . . . . . . . . . . . . 150
7.4 Average utility across all three problems using PPM with both the uncertainty horizon and likely replanning at various thresholds. . . . . . . . . . . . . . . . . 151
7.5 Average proportion of omniscient utility earned for various probabilistic meta-level
control strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.6 Average proportion of omniscient utility earned for various probabilistic meta-level
control strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.7 Average proportion of omniscient utility earned for various plan management approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.8 Average utility earned for various plan management approaches. . . . . . . . . . . 162
List of Tables
3.1 A summary of utility accumulation function (uaf) types. . . . . . . . . . . . . . . . 40
3.2 A summary of non-local effect (NLE) types. . . . . . . . . . . . . . . . . . . . 42
5.1 Average time spent by PADS calculating and updating probability profiles while executing schedules in the single-agent domain. . . . . . . . . . . . . . . . . . . 98
5.2 Characteristics of the 20 DARPA Coordinators test suite problems. . . . . . . . . . 99
6.1 A comparison of strengthening strategies. . . . . . . . . . . . . . . . . . . . . . . 124
7.1 Characteristics of the 20 generated test suite problems. . . . . . . . . . . . . . . . 153
7.2 Average amount of effort spent managing plans under various probabilistic meta-level control strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7.3 Average amount of effort spent managing plans under various probabilistic meta-level control strategies using strengthening. . . . . . . . . . . . . . . . . . . . . 157
7.4 Average amount of effort spent managing plans under various probabilistic plan
management approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.5 Average size of communication log ﬁle. . . . . . . . . . . . . . . . . . . . . . . . 163
List of Algorithms
5.1 Method probability proﬁle calculation. . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Task probability proﬁle calculation. . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 Taskgroup expected utility calculation for the single-agent domain with normal duration distributions and temporal constraints. . . . . . . . . . . . . . . . . . . 68
5.4 Method start time distribution calculation with discrete distributions, temporal constraints and hard NLEs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.5 Method soft NLE duration and utility effect distribution calculation with discrete
distributions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.6 Method finish time distribution calculation with discrete distributions, temporal constraints and soft NLEs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.7 Method probability of earning utility calculation with utility formulas. . . . . . . . . 78
5.8 Activity utility formula evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.9 Method expected utility calculation with discrete distributions and soft NLEs. . . . . 80
5.10 OR task ﬁnish time distribution calculation with discrete distributions and uafs. . . . 81
5.11 OR task expected utility calculation with discrete distributions and uafs. . . . . . . . 82
5.12 AND task ﬁnish time distribution calculation with discrete distributions and uafs. . . 83
5.13 AND task expected utility calculation with discrete distributions and uafs. . . . . . . 84
5.14 Distributed taskgroup expected utility approximation with enablers and uafs. . . . . 95
6.1 Global backfilling step utilizing max-leaf tasks. . . . . . . . . . . . . . . . . . . 111
6.2 Distributed backfilling step utilizing max-leaf tasks. . . . . . . . . . . . . . . . 114
6.3 Global swapping tactic for a schedule with failure likelihoods and NLEs. . . . . . . 116
6.4 Global pruning step. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.5 Global baseline strengthening strategy utilizing backﬁlls, swaps and prunes. . . . . . 119
6.6 Global roundrobin strengthening strategy utilizing backﬁlls, swaps and prunes. . . . 119
6.7 Global greedy strengthening strategy utilizing backﬁlls, swaps and prunes. . . . . . 120
7.1 Probabilistic plan management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.2 Uncertainty horizon calculation for a schedule with normal duration distributions. . . 136
7.3 Taskgroup expected utility calculation, within an uncertainty horizon, for the single-agent domain with normal duration distributions and temporal constraints. . . . . 137
Chapter 1
Introduction
We are motivated by the high-level goal of building intelligent robotic systems in which robots, or teams of robots and humans, operate autonomously in real-world situations. The rich sources of uncertainty common in these scenarios pose challenges for the various aspects of execution, from sending commands to the controller to navigate over unpredictable terrain, to deciding which activities to execute given unforeseen circumstances. Many domains of this type are further complicated by the presence of durative actions and concurrent, multi-agent execution. This thesis focuses on the high-level aspect of the problem: generating, executing and managing robust plans for agents operating in such situations.
Not surprisingly, the general problem of planning and scheduling for uncertain domains remains a difficult challenge. Research that focuses on constructing plans (or policies) by reasoning with explicit models of uncertainty has produced some promising results (e.g., [McKay et al., 2000, Beck and Wilson, 2007]). Such approaches typically produce optimal or expected optimal solutions that encode a set of plans that cover many or all of the contingencies of execution. However, in complex domains these approaches can quickly become prohibitively expensive, making them undesirable or impractical for real-world use [Bryce et al., 2008].
Research in robust planning, in contrast, has emphasized the use of expected models and deterministic planning techniques, with the goal of constructing a flexible plan (or set of plans) that can absorb deviations at execution time [Policella et al., 2007]. These approaches are much more scalable than probabilistic planning or scheduling, but either result in overly conservative schedules at the expense of performance (e.g., [Morris et al., 2001]) or ignore the potential leverage that can be provided by an explicit uncertainty model. For example, even though such systems typically replan during execution, the deterministic assumptions can unavoidably create problems to which replanning cannot adequately respond, such as a missed deadline that cannot be repaired. Replanning very frequently can help to reduce the number of these problems that occur, but a large amount of online replanning has the potential to lead to a lack of responsiveness and an overuse of computational resources during execution, ultimately resulting in a loss of utility.
The main contribution of this work is a composite approach to planning and scheduling that couples the strengths of both deterministic and non-deterministic planning while minimizing their weaknesses. Our approach, called Probabilistic Plan Management (PPM), is a natural synthesis of both of the above research streams that takes advantage of the known uncertainty model while avoiding the large computational overhead of non-deterministic planning. As a starting point, we assume a deterministic plan, or schedule, that is built to maximize utility under deterministic modeling assumptions, such as by assuming deterministic durations for activities or by ignoring unlikely outcomes or effects of activity execution. Our approach first layers an uncertainty analysis, called a probabilistic analysis of deterministic schedules (PADS), on top of the deterministic schedule. The analysis calculates the overall expected outcome of execution and can be used to identify expected weak areas of the schedule. We use this analysis in two main ways to manage and bolster the utility of execution. Probabilistic schedule strengthening strengthens the current schedule by targeting local improvements at identified likely weak points. Such improvements have inherent trade-offs associated with them; these trade-offs are explicitly addressed by PADS, which ensures that schedule strengthening efforts increase the schedule’s expected utility. Probabilistic meta-level control uses the expected outcome of the plan to intelligently control the amount of replanning that occurs, explicitly addressing the trade-off between too much replanning, which can lead to the overuse of computational resources and a lack of responsiveness, and too little, which can lead to undesirable errors or missed opportunities during execution.
Probabilistic plan management combines these components to effectively manage the execution of plans for uncertain domains. During execution, PADS tracks and updates the probabilistic state of the schedule. Based on this, PPM decides when replanning and probabilistic schedule strengthening are necessary, and replans and strengthens at the identified times, maintaining high-utility schedules while also limiting unnecessary replanning.
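As an illustrative sketch of this management loop (every component below is a simplified stand-in for PADS, the planner, and the execution system, and the numbers are invented, not taken from the thesis), the control flow might look like:

```python
def manage_execution(events, apply_event, analyze, replan, state, ratio=0.9):
    """Keep the probabilistic analysis current after every execution event,
    but invoke the (costlier) replan/strengthen step only when expected
    utility slips below `ratio` of its last post-replan value."""
    target = analyze(state)
    replans = 0
    for event in events:
        state = apply_event(state, event)    # an execution outcome arrives
        if analyze(state) < ratio * target:  # schedule has likely weakened
            state = replan(state)            # replan and strengthen
            target = analyze(state)
            replans += 1
    return state, replans

# Toy instantiation: "state" is just the schedule's expected utility,
# and events are utility shocks observed during execution.
final, n = manage_execution(
    events=[-2, -3, -20, -1],
    apply_event=lambda s, e: s + e,
    analyze=lambda s: s,
    replan=lambda s: s + 15,                 # repair recovers some utility
    state=100.0)
print(final, n)
```

In this toy run only one of the four disruptions triggers replanning; the small shocks are absorbed without invoking the planner, which is precisely the saving that limiting unnecessary replanning aims for.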
In the following sections, we introduce the concepts of PADS, probabilistic schedule strengthening, and probabilistic meta-level control, and our basic approaches to them.
1.1 Probabilistic Analysis of Deterministic Schedules
Central to our approach is the idea of leveraging domain uncertainty models to provide information that can help with more effective execution-time management of deterministic schedules. We have developed an algorithm that performs a Probabilistic Analysis of Deterministic Schedules (PADS), given their associated uncertainty model. At the highest level of abstraction, the algorithm explores all known contingencies within the context of the current schedule in order to find the probability that execution “succeeds” by meeting its objective. It calculates, for each activity, the probability of the activity meeting its objective, as well as its expected contribution to the schedule. By explicitly calculating these values, PADS is able to summarize the expected outcome of execution, as well as identify portions of the schedule where something is likely to go wrong.
PADS can be used as the basis for many planning decisions. In this thesis, we discuss how it can be used to strengthen plans to increase the expected utility of execution, and how it can be used as the basis for different meta-level control strategies that control the computational resources used to replan during execution. Further, we show how its low computational overhead makes it a useful analysis even for complex, multi-agent domains.
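To make the flavor of this analysis concrete, here is a small hypothetical sketch: it propagates per-activity success probabilities up a toy AND/OR activity network, computes an expected utility for the joint schedule, and flags likely weak points. The network structure, probabilities, and utilities are invented for illustration and assume independent outcomes; the actual PADS algorithm handles much richer profiles, distributions, and inter-activity dependencies.

```python
def profile(node):
    """Return (p_success, utility_if_success) for an activity node,
    assuming activity outcomes are independent."""
    if node["type"] == "method":                 # leaf activity
        return node["p"], node["util"]
    stats = [profile(c) for c in node["children"]]
    if node["type"] == "and":                    # all children must succeed
        p = 1.0
        for cp, _ in stats:
            p *= cp
        return p, sum(u for _, u in stats)       # utilities accumulate
    # "or": credit the single child with the best expected utility
    return max(stats, key=lambda s: s[0] * s[1])

def weak_points(node, threshold=0.8):
    """Methods whose individual success probability falls below threshold."""
    if node["type"] == "method":
        return [node["name"]] if node["p"] < threshold else []
    return [w for c in node["children"] for w in weak_points(c, threshold)]

# Invented example: an AND task over one reliable method and an OR choice.
tree = {"type": "and", "children": [
    {"type": "method", "name": "m1", "p": 0.95, "util": 10},
    {"type": "or", "children": [
        {"type": "method", "name": "m2", "p": 0.6, "util": 8},
        {"type": "method", "name": "m3", "p": 0.9, "util": 5}]}]}

p, u = profile(tree)
print(round(p * u, 2))    # expected utility of the joint schedule
print(weak_points(tree))  # the risky method(s) worth strengthening
```

Here the OR task is credited with its best option by expected utility (m2), and m2 is also flagged as a weak point: exactly the kind of activity a strengthening step would target.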
1.2 Probabilistic Schedule Strengthening
While deterministic planners have the benefit of being more scalable than probabilistic planners, they make strong, and often inaccurate, assumptions about how execution will unfold. In domains with temporal uncertainty (e.g., uncertain activity durations), such as the ones we consider, deterministic modeling assumptions have the potential to lead to frequent constraint violations when activities end at unanticipated times and cause inconsistencies in the schedule. Replanning during execution can effectively respond to such unexpected dynamics, but only as long as dependencies allow sufficient time for one or more agent planners to respond to problems that occur; further, even if the agent is able to respond adequately to a problem, utility loss may occur due to the last-minute nature of the schedule repair. One way to prevent this situation is to layer a step on top of the planning process that proactively tries to prevent these problems from occurring, or to minimize their effects, thereby strengthening schedules against the uncertainty of execution [Gallagher, 2009].
Schedules are strengthened via strengthening actions, which make small modifications to the schedule to make it more robust. The use of such improvements, however, typically has an associated trade-off. Example strengthening actions, and their trade-offs, are:
1. Adding redundant backup activities – This provides protection against activity failure but
can clog schedules and reduce slack. In a search and rescue scenario, for example, trying
to reach survivors via two rescue points, instead of one, may increase the likelihood of a
successful rescue; but, as a result, the agents at the added rescue point may not have enough
time to execute the rest of their schedule.
2. Swapping activities for shorter-duration alternatives – This provides protection against an activity missing a deadline, but shorter-duration alternatives tend to be less effective or have a lower utility. For example, restoring power to only critical buildings instead of the entire area may allow power to be restored more quickly, but it is less desirable as other buildings go without power.
3. Reordering scheduled activities – This may or may not help, depending on which activities have tight deadlines and/or constraints with other activities. Deciding to take a patient to the hospital before refueling, for example, may help the patient, but the driver then has a higher risk of running out of fuel and failing to perform other activities.
The set of strengthening actions used can be adapted to suit a variety of domains, depending on
what is most appropriate for the domain’s structure. Ideally, schedule strengthening should be used
in conjunction with replanning during execution, allowing agents to take advantage of opportunities
that may arise while also ensuring that they always have schedules that are robust to unexpected or
undesirable outcomes.
In deterministic systems, schedule strengthening can be performed by using simple determin
istic heuristics, or limited stochastic analyses of the current schedule (e.g., considering the uncer
tainty of the duration of an activity without any propagation) [Gallagher, 2009]. In this thesis,
we argue that explicitly reasoning about the uncertainty associated with scheduled activities, via
our PADS algorithm, provides a more effective basis for strengthening schedules. Probabilistic
schedule strengthening ﬁrst relies on PADS to identify the most proﬁtable activities to strengthen.
Then, improvements are targeted at activities that have a high likelihood of not doing their part
to contribute to the schedule’s success, and whose failure would have the largest negative impact.
As described, all such improvements have an associated tradeoff; PADS, however, ensures that
all strengthening actions have a positive expected beneﬁt and improve the expected outcome of
execution.
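At a high level, schedule strengthening can be viewed as a greedy improvement loop that applies only modifications with a positive expected benefit. The following is an illustrative sketch of that idea, not the thesis's actual probabilistic schedule strengthening algorithm; the function names and the abstract `expected_utility` callable (which in the thesis's approach would be derived from the PADS analysis) are ours:

```python
def strengthen_schedule(schedule, candidate_actions, expected_utility):
    """Greedily apply strengthening actions with positive expected benefit.

    candidate_actions is a list of functions, each returning a modified
    copy of the schedule (e.g., adding a back-up activity, swapping in a
    shorter alternative, or reordering).  expected_utility scores a
    schedule; here it is an abstract callable standing in for a
    probabilistic analysis.  Illustrative sketch only.
    """
    improved = True
    while improved:
        improved = False
        baseline = expected_utility(schedule)
        # Try each candidate action and keep the single best positive gain.
        best_schedule, best_gain = None, 0.0
        for apply_action in candidate_actions:
            candidate = apply_action(schedule)
            gain = expected_utility(candidate) - baseline
            if gain > best_gain:
                best_schedule, best_gain = candidate, gain
        if best_schedule is not None:
            schedule, improved = best_schedule, True
    return schedule
```

Because every accepted change must strictly increase expected utility, the loop terminates once no candidate action yields a positive expected benefit, mirroring the guarantee described above.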
We have evaluated the effect of probabilistic schedule strengthening in a multi-agent simulation environment, where the objective is to maximize utility earned during execution and there is temporal and activity outcome uncertainty. Probabilistic schedule strengthening produced schedules that are significantly more robust and earn 32% more utility than their deterministic counterparts when executed. When it is used in conjunction with replanning during execution, it results in 14%
more utility earned as compared to replanning alone during execution.
1.3 Probabilistic Meta-Level Control
Replanning provides an effective way of responding to the dynamics of execution: it helps agents avoid failure or utility loss resulting from unpredicted and undesirable outcomes, and takes advantage of opportunities that may arise from unpredicted favorable outcomes. Replanning too much during execution, however, has the potential to become counterproductive by leading to a lack of responsiveness and a waste of computational resources. Meta-level control is one technique commonly used to address this tradeoff. Meta-level control allows agents to optimize their performance by selectively choosing deliberative actions to perform during execution.
In deterministic systems, meta-level control can be done by learning a model for deliberative action selection based on deterministic, abstract features of the current schedule [Raja and Lesser, 2007]. Deterministic approaches such as this one, however, ignore the benefit that explicitly considering the uncertainty model can provide. For example, whether replanning would result in a better schedule can depend, in part, on the probability of the current schedule meeting its objective; similarly, the benefit of planning actions such as probabilistic schedule strengthening can be greatly affected by the robustness of the current schedule. In this thesis, we show that a probabilistic meta-level control algorithm, based on probability profile information obtained by the PADS algorithm, is an effective way of selecting when to replan (or probabilistically strengthen) during execution.
Ideally, agents would replan if it would be useful either in avoiding failure or in taking advantage of new opportunities. Similarly, they ideally would not replan if it would not be useful, either because: (1) there is no potential failure to avoid; (2) there is a possible failure but replanning would not fix it; (3) there is no new opportunity to leverage; or (4) replanning would not be able to exploit the opportunities that were there. Many of these situations are difficult to explicitly predict, in large part because it is difficult to predict whether a replan is beneficial without actually doing it. As an alternative, probabilistic meta-level control uses the probabilistic analysis to identify times when there is potential for replanning to be useful, either because there is likely a new opportunity to leverage or a possible failure to avoid. When no such times are identified, no replanning is done; otherwise, replanning is possibly useful and so is performed.
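One way such a replan decision might be operationalized is as a threshold test on the schedule's probability profile. The profile fields and threshold values below are hypothetical, chosen for illustration; they are not the actual PADS outputs:

```python
from dataclasses import dataclass

@dataclass
class ScheduleProfile:
    """Hypothetical summary of a PADS-style probability analysis."""
    p_failure: float          # probability the schedule misses its objective
    p_new_opportunity: float  # probability an unscheduled activity could now add utility

def should_replan(profile, failure_threshold=0.1, opportunity_threshold=0.05):
    """Replan only when the analysis flags a potential failure to avoid
    or a likely new opportunity to leverage."""
    if profile.p_failure > failure_threshold:
        return True   # a replan might repair a likely failure
    if profile.p_new_opportunity > opportunity_threshold:
        return True   # a replan might capture newly available utility
    return False      # otherwise, skip the replan and save the computation
```

The point of the sketch is the shape of the decision: replanning is triggered by evidence of potential usefulness, and suppressed when neither failure nor opportunity is likely.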
The control strategy can also be used to select the amount of the current schedule that is replanned. In situations with high uncertainty, always replanning the entire schedule means that activities farther down the schedule are potentially replanned many times before they are actually executed, possibly wasting computational effort. To avoid this, we introduce an uncertainty horizon, which distinguishes between portions of the schedule that are likely to need to be managed and replanned at the current time, and portions that are not. This distinction is made based on the probability that activities beyond the uncertainty horizon will need to be replanned before execution reaches them.
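As an illustration of the idea, the horizon can be pictured as the first point in the schedule where the probability of a later re-replan becomes too high for replanning now to be worthwhile. This sketch is ours; in the thesis these probabilities would come from the PADS analysis:

```python
def uncertainty_horizon(replan_probs, threshold=0.75):
    """Return the index of the first scheduled activity that is likely
    to need replanning again before execution reaches it; activities at
    or beyond this index are left outside the horizon.

    replan_probs[i] is the (assumed, externally computed) probability
    that activity i will need to be replanned again before it executes.
    """
    for i, p in enumerate(replan_probs):
        if p > threshold:
            return i  # replanning effort is likely wasted from here on
    return len(replan_probs)  # the whole schedule is worth replanning now
```

For example, `uncertainty_horizon([0.1, 0.3, 0.8, 0.9])` returns 2, so only the first two activities would be replanned at the current time.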
We have shown, through experimentation, the effectiveness of probabilistic meta-level control. Its use in a single-agent scenario with temporal uncertainty has shown a 23% reduction in schedule execution time (where execution time is defined as the schedule makespan plus the time spent replanning during execution) versus replanning whenever a schedule conflict arises. Using probabilistic meta-level control in a multi-agent scenario with temporal and activity outcome uncertainty results in a 49% decrease in plan management time as compared to a plan management strategy that replans according to deterministic heuristics, with no significant change in utility earned.
1.4 Thesis Statement
Explicitly analyzing the probabilistic state of deterministic plans built for uncertain domains provides important information that can be leveraged in a scalable way to effectively manage plan execution.
We support this statement by presenting algorithms for probabilistic plan management that
probabilistically analyze schedules, strengthen schedules based on this analysis, and limit the
amount of computational effort spent replanning and strengthening schedules during execution.
Our approach was validated through a series of experiments in simulation for two different domains. For one domain, using probabilistic plan management was able to reduce schedule execution time by 23% (where execution time is defined as the schedule makespan plus the time spent replanning during execution). For another, probabilistic plan management resulted in increasing utility earned by 14% with only a small added overhead. Overall, PPM provides some of the benefits of non-deterministic planning while avoiding its computational overhead.
1.5 Thesis Outline
The rest of this document is organized as follows. In Chapter 2, we discuss related work, describing various approaches to the problem of planning and scheduling for uncertain domains, as well as different methodologies for meta-level control. In Chapter 3, we discuss background concepts related to our work, including details on the two domains we consider. Next, Chapter 4 provides an overview of our approach to probabilistic plan management. The following three chapters go into
more detail about the three main components of our approach: probabilistic analyses of deterministic schedules, probabilistic schedule strengthening, and probabilistic meta-level control. Finally, in Chapter 8, we discuss future work and conclusions.
Chapter 2
Related Work
There are many different approaches that handle planning and execution for uncertain domains.
Approaches can be distinguished by what restrictions they place on the domains they can handle,
such as whether they consider durative actions and temporal uncertainty, and whether they account
for concurrency and multiple agents. We present some of these approaches, categorized by whether
uncertainty is taken into account in the planning process or planning is done deterministically.
We also discuss various approaches to the meta-level control of computation during execution, including work utilizing planning horizons.
2.1 Non-Deterministic Planning
The main distinction between planners that deal with uncertainty is whether they assume agents can observe the world during execution. Of those that make this assumption, there are policy-based approaches based on the basic Markov Decision Process (MDP) model [Howard, 1960] and more search-based approaches such as contingency planners. Some MDP models also assume that the world is partially observable. Planning approaches that assume the world is not observable are commonly referred to as conformant planners.
2.1.1 Policy-Based Planning
One common policy-based planner is based on an MDP model [Howard, 1960]. The classic MDP model has four components: a collection of states, a collection of actions, a transition model (i.e., which states actions transition between, and the probability of the transition happening), and the
rewards associated with each transition (or state). The MDP can be solved to ﬁnd the optimal
policy that maximizes reward based on the problem speciﬁcation.
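As a concrete illustration of the classic model, a minimal value-iteration solver over those four components might look as follows; the tiny two-state MDP at the end is invented purely for illustration:

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Solve a discrete MDP by value iteration.

    T[s][a] is a list of (next_state, probability) pairs, and R[s][a] is
    the immediate reward of taking action a in state s.  Returns the
    optimal state values and the greedy policy extracted from them.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step reward plus discounted future value.
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {s: max(actions, key=lambda a: R[s][a] +
                     gamma * sum(p * V[s2] for s2, p in T[s][a]))
              for s in states}
    return V, policy

# Invented toy MDP: stay "ok" safely for reward 1, or risk falling
# into an absorbing zero-reward "fail" state for reward 2.
states = ["ok", "fail"]
actions = ["safe", "risky"]
T = {"ok": {"safe": [("ok", 1.0)], "risky": [("ok", 0.5), ("fail", 0.5)]},
     "fail": {"safe": [("fail", 1.0)], "risky": [("fail", 1.0)]}}
R = {"ok": {"safe": 1.0, "risky": 2.0}, "fail": {"safe": 0.0, "risky": 0.0}}
V, policy = value_iteration(states, actions, T, R)
```

In the toy problem the long-term value of staying safe (1/(1 − 0.9) = 10) beats the short-term reward of the risky action, so the extracted policy chooses "safe" in the "ok" state.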
This classic MDP model is unable to handle explicit durative actions, which can be considered by our approach. There is a growing body of work, however, that has been done to remove this limitation. Semi-MDPs, for example, handle domains with uncertain action durations and no deadlines [Howard, 1971]. Time-dependent MDPs [Boyan and Littman, 2000] go one step further and can model deadlines by undiscounting time in the MDP model. Hybrid MDPs can also handle
deadlines, and can represent both discrete and continuous variables. One of the main difﬁculties
of this model is convolving the probability density functions and value functions while solving the
model; this has been approached in a variety of ways [Feng et al., 2004, Li and Littman, 2005,
Marecki et al., 2007].
None of the above MDP models, however, can handle concurrency, and so cannot be used for multi-agent scenarios. Generalized Semi-MDPs, an extension of Semi-MDPs [Younes et al., 2003, Younes and Simmons, 2004a], can handle concurrency and actions with uncertain, continuous durations, at the cost of very high complexity. The DUR family of MDP planners can also handle both concurrency and durative actions [Mausam and Weld, 2005], again at a very high computational cost. The work shows, in fact, that the ∆DUR planner that plans with expected durations outperforms the ∆DUR planner that explicitly models temporal uncertainty [Mausam and Weld, 2006].
In addition to the MDP systems above, MDPs have been used to solve a distributed, multi-agent domain with activity duration and outcome uncertainty, the same domain that we use as part
of this thesis work [Musliner et al., 2006, 2007]. This domain is especially challenging for MDP
approaches, as plans must be constructed in real time. Each agent in this approach has its own
MDP, which it partially and approximately solves in order to generate a solution in acceptable
time. However, despite this success, scalability ultimately prevented it from performing better than
competing deterministic approaches.
MDPs that assume the world is partially observable, or POMDPs, assume that agents can make some local observations during execution but cannot directly observe the true state; instead, agents must probabilistically infer it [Kaelbling et al., 1998]. This assumption makes POMDPs very difficult to solve, and the additional complexity is unnecessary for the domains considered in this document, where the world is fully observable.
A non-MDP policy-based approach is FPG, a Factored Policy Gradient planner [Buffet and Aberdeen, 2009]. This planner uses gradient ascent to optimize a parameterized policy, such as if users prefer a safer, or more optimal but riskier, plan. It still, however, suffers from the high complexity of policy-based probabilistic planners, and the authors suggest that it may be improved
by combining deterministic reasoning with their probabilistic approach in order to allow it to handle larger and more difficult policies.
2.1.2 Search-Based Contingency Planning
Conditional, or contingency, planners create branching plans, and assume observations during execution to choose which branch of the plan to follow. Many contingency planners such as SGP [Weld et al., 1998], MBP [Bertoli et al., 2001], PGraphPlan [Blum and Langford, 1999] and Paragraph [Little and Thiébaux, 2006], however, are unable to support durative actions. Partial-order contingency approaches include C-Buridan [Draper et al., 1994], B-Prodigy [Blythe and Veloso, 1997], DTPOP [Peot, 1998] and Mahinur [Onder and Pollack, 1999]. They do not support concurrency or durative actions; it is possible to add them to these architectures, but at an even higher computational cost.
[Xuan and Lesser, 1999, Wagner et al., 2006] consider a domain similar to ours, and use contingency planning in their scheduling process and to manage the uncertainty between agents, but at the cost of high complexity. Prottle uses forward search planning to solve a concurrent, durative problem over a finite horizon [Little et al., 2005]. Although it uses a probabilistic planning graph-based heuristic to reduce the search space, memory usage prevents this algorithm from being applied to anything but small problems.
Some work tries to lessen the computational burden of contingency planners by planning incrementally. These approaches share some similarities with our approach, as they try to identify weak parts of the plan to improve at plan time. One common incremental approach is Just-In-Case (JIC) scheduling [Drummond et al., 1994], which creates branch points in a schedule by finding the place where it is most likely to fail based on normal task distributions overrunning their scheduled duration. For simple distribution types, this involves performing probability calculations similar to how we calculate our uncertainty horizon for single-agent schedules.
ICP [Dearden et al., 2003] and Tempastic [Younes and Simmons, 2004b] also build contingency plans incrementally, by branching on utility gain and fixing plan failure points, respectively. Both of these approaches use stochastic simulation to find plan outcome probabilities. [Blythe, 1998] calculates plan certainty by translating the plan into a Bayesian network, and adding contingent branches to increase the probability of success. [Meuleau and Smith, 2003]'s OKP generates the optimal plan having a limited number of contingency branches. These incremental approaches make contingency planning more manageable or tractable; however, the approach in general is still unable to model large numbers of outcomes. Due to its difficulty, contingency planning is best suited to avoiding unrecoverable or critical errors in domains with small numbers of outcomes
as in [Foss et al., 2007].
2.1.3 Conformant Planning
Conformant planners such as [Goldman and Boddy, 1996], CGP [Smith and Weld, 1998], CMBP [Cimatti and Roveri, 2000], BURIDAN [Kushmerick et al., 1995] and PROBAPOP [Onder et al., 2006] handle uncertainty by developing plans that will lead to the goal without assuming that the results of actions are observable. A common way to think about conformant planners is that they try to force the world into a single, known state by generating "failsafe" plans that are robust to anything that may happen during execution. Conformant planning is very difficult, and also very limiting; typically, effective conformant planning requires domains with limited uncertainty and very strong actions. Additionally, none of the above conformant planners can currently handle durative actions.
Planning to add “robustness” is similar to conformant planning, and has the goal of building
plans that are robust to various types of uncertainty. [Fox et al., 2006] “probes” a temporal plan for
robustness by testing the effectiveness of different slight variations of the plan, where each variation
is generated by stochastically altering the timings of execution points in the plan. [Blackmore et al.,
2007] build plans robust to temporal and other types of uncertainty by utilizing sampling techniques
such as Monte Carlo simulation to approximately predict the outcome of execution. In [Younes and
Musliner, 2002], a probabilistic extension to the CIRCA planner [Musliner et al., 1993] is described
that uses Monte Carlo simulation to verify whether a potential deterministic plan meets speciﬁed
safety constraints. [Beck and Wilson, 2007] also uses Monte Carlo simulation to generate effective
deterministic schedules for the job-shop scheduling problem. Using Monte Carlo simulation in this
way, however, is not practical for our runtime analysis, as the number of particles needed increases
quickly with the amount of domain uncertainty; in contrast, our approach analytically calculates
and stores probability information for each activity. [Fritz and McIlraith, 2009] also considers
probabilities analytically, in part, while calculating the robustness of plans, but still partially relies
on sampling and cannot handle complex plan objective functions.
2.2 Deterministic Planning
The basic case for deterministic planning in uncertain domains is to generate a plan based on a deterministic reduction of the domain, and then replan as necessary during execution. A well-known example of this strategy is FF-Replan [Yoon et al., 2007], which determinizes probabilistic input domains, plans with the deterministic planner FF [Hoffmann and Nebel, 2001], and replans when unexpected states occur. This very simple strategy for dealing with domain uncertainty won the
Probabilistic Track of the 2004 International Planning Competition, and was also the top performer in the 2006 competition. Planning in this way, however, suffers from problems such as unrecoverable errors and spending large amounts of time replanning at execution time (even though the time spent planning may be smaller overall). Below we discuss two main bodies of work that have been developed to improve deterministic planning of uncertain domains: increasing plan flexibility and robustness to lessen the need to replan, and making each replan that does happen as fast as possible.
2.2.1 Plan Flexibility and Robustness
Whereas fixed times schedules specify only one valid start time for each task, flexible times schedules allow execution intervals for tasks to drift within a range of feasible start and end times [Smith et al., 2006]. This allows the system to adjust to unforeseen temporal events as far as constraints will allow, replanning only if one or more constraints in the current deterministic schedule cannot be satisfied. The schedules are modeled as simple temporal networks (STNs) [Dechter et al., 1991], enabling fast constraint propagation through the network as start and end times are updated after execution to generate new execution intervals for later tasks. Flexible times schedules may be total- or partial-order.
IxTeT [Laborie and Ghallab, 1995], VHPOP [Younes and Simmons, 2003] and [Vidal and Geffner, 2006] are temporal planners that generate partial-order plans. Partial-order planners repair plans at run time by searching through the partial plan space. These planners in general, however, have a much larger state space than total-order planners, and so replanning in general is more expensive. This work is orthogonal to our approach, as either total- or partial-order plans could be used as part of our analysis.
In a robustness-focused approach to partial-order planning, [Policella et al., 2007] generates robust partial-order plans by expanding a fixed times schedule to be partial-order, showing improvements with respect to fewer precedence orderings and more temporal slack over a more top-down planning approach. [Lau et al., 2007] also generates partial-order schedules that are guaranteed to have a minimum utility with probability (1 − ε) for scenarios where the only uncertainty is in the processing time of each task.
[Jiménez et al., 2006] look at how to create more robust deterministic plans for probabilistic planning problems by directly considering the underlying probabilistic model. They incorporate probabilistic information as an action "cost model," and by solving the resulting deterministic problems they see a decrease in the frequency of replanning. This approach has been demonstrated to outperform FF-Replan on single agent domains without durative actions, but it is so far unable to handle the types of domains we are considering. [Gallagher, 2009] makes plans more robust
by using simple schedule strengthening that relies on a limited stochastic analysis of the current
schedule; in general, however, it is not as effective as our probabilistic schedule strengthening
approach which relies on a probability analysis that propagates probability information throughout
the entire schedule.
An approach that helps to promote a different kind of ﬂexibility during execution allows agents
to switch tasks during execution in order to expedite the execution of bottleneck tasks, minimizing
schedule makespan [Sellner and Simmons, 2006]. In order to anticipate problems, the agents
predict how much longer executing tasks will take to complete; in theory, this prediction could
also be used as part of our probability calculations.
Alternately, a plan can be configured to be dynamically controllable [Morris et al., 2001]. A plan with this property is guaranteed to have a successful solution strategy for an uncertain temporal planning problem. This approach, however, like the above approaches in general, is either overly conservative or ignores the extra leverage that explicit consideration of uncertainty can provide.
2.2.2 Plan Management and Repair
A second way to optimize the deterministic planning approach is to manage plans during execution, identifying problems and, ideally, making replanning as fast as possible. [Pollack and Horty, 1999] describes an approach to plan management which includes commitment management, alternate plan assessment, and selective plan elaboration to manage plans during execution and provide the means to readily change plans if better alternatives are found.
Another common way to manage plans is to perform fast replanning during execution when
errors are identiﬁed. This typically involves plan repair, where the planner starts with the current,
conﬂicted plan and makes changes to it, such as by adding or removing tasks, until a new plan is
found. ASPEN [Chien et al., 2000] is one such planner that makes heuristic, randomized changes
to a conﬂicted plan to ﬁnd a new, valid plan. Because of the randomness, there are no guarantees
on how long it will take before a solution is found. [Gallagher et al., 2006] and [Smith et al., 2007]
rely on an incremental scheduler to manage schedules during execution, and use heuristics to both
maximize schedule utility and minimize disruption. Other plan repair systems are O-Plan [Tate et al., 1994], GPG [Gerevini and Serina, 2000] and [van der Krogt and de Weerdt, 2005]; however, none of these three approaches has yet been demonstrated to work with durative tasks.
HOTRiDE (Hierarchical Ordered Task Replanning in Dynamic Environments) [Ayan et al.,
2007] is a Hierarchical Task Network (HTN) planning system that replans the lowest task in the
task network that has a conﬂict whose parent does not. This helps promote plan stability, but no
data is available about the time spent replanning during execution. RepairSHOP [Warﬁeld et al.,
2007] is similar to HOTRiDE, but minimizes replanning based on “explanations” of plan conﬂicts.
RepairSHOP results indicate that it also helps to lessen the time spent replanning during execution.
The only uncertainty these two approaches address, however, is whether an executed task succeeds.
[Maheswaran and Szekely, 2008] considers a distributed, multi-agent domain with activity duration and outcome uncertainty that we use as part of this thesis work. Probability-based criticality metrics are used to repair the plan during execution according to predefined policies, in the spirit of our schedule strengthening. Their approach, however, is not as methodical in its probabilistic analysis nor its use of it, and is highly tailored to the domain in question.
Overall, deterministic planning systems are at a disadvantage because they ignore the potential
leverage that can be provided by an explicit uncertainty model. Frequent replanning can help to
overcome this downside, but the deterministic assumptions unavoidably can create problems to
which replanning cannot adequately respond. Frequent replanning should also be used with care,
as it has the potential to grow out of hand in extremely uncertain environments.
2.3 Meta-Level Control
Meta-level control is "the ability of complex agents operating in open environments to sequence domain and deliberative actions to optimize expected performance" [Raja and Lesser, 2007]. There is a large body of work in the area of meta-level control that addresses a variety of aspects of the problem [Cox, 2005]. Work has been performed to control the computational effort of anytime algorithms [Russell and Wefald, 1991, Hansen and Zilberstein, 2001b, Alexander et al., 2008, Boddy and Dean, 1994], to ensure that any extra time spent finding a better solution is worth the expected increase in utility of that solution. [Alexander et al., 2010] describe an anytime approach that unrolls an MDP incrementally at run time based on the tradeoff of improving the policy versus executing it. [Raja and Lesser, 2007] develop a meta-level control framework for a distributed multi-agent domain that learns a meta-level MDP to decide what control actions to take at any given time. The MDP is based on a state space of abstract features of the current schedule; it would be interesting to see how the MDP would perform if the state space also incorporated the probability profiles generated by our PADS algorithm. [Musliner et al., 2003] perform meta-level control of a real-time agent, and adaptively learn performance profiles of deliberation and the meta-level strategy.
Using a planning horizon is another way of performing meta-level control [Léauté and Williams, 2005, Chien et al., 2000, Estlin et al., 2001]. Typically, planning horizons specify how far into the
future a planner should plan. Although planning with a horizon can threaten the effectiveness of
the schedule, as portions of the solution are being ignored, horizons can increase the tractability of
planning (or replanning). In the above approaches, however, the horizon is set at a ﬁxed amount of
time in the future. Anytime algorithms such as [Hansen and Zilberstein, 2001a, Alexander et al.,
2010] also inherently include a planning horizon as they expand an MDP’s horizon while time is
available; both of these approaches consider the cost of computation in their choice of a horizon.
In general, many of these approaches explicitly reason about the cost of computation, which we
do not for several reasons. First, execution and replanning can occur concurrently in our domains,
removing the need to explicitly choose between deliberation and action at any given time. Second,
deterministic replanning tends to be fast, lessening the beneﬁt of explicitly modeling it to use in a
meta-level control algorithm. Finally, performance profiles can be difficult to model for heuristic
or randomized planners like the ones we use in this document. Instead, our approach analyzes the
schedule to see if replanning is likely necessary based on probabilistic information such as how
likely it is that execution will succeed if a replan does not occur.
Chapter 3
Background
In this chapter, we describe core concepts and technologies upon which we build our work. We
ﬁrst discuss our use of plan versus schedule terminology, followed by the representation of domains
and schedules that we use. Next, we discuss in detail the two domains we speciﬁcally analyze as
part of this thesis. Finally, we introduce decision trees with boosting, a technique we use in our probabilistic meta-level control strategy.
3.1 Plan Versus Schedule Terminology
In this document, we use both the terms planning and scheduling freely. For the purposes of this
document, we deﬁne planning as the problem of deciding what to execute, and scheduling as the
problem of deciding when to execute it. By this deﬁnition, all temporal planning domains include a
scheduling component; similarly, many domains typically considered scheduling domains include
at least a limited planning component.
This leads us to the choice of planning/scheduling in our title and in the name of our algorithms.
We name our algorithms for the general case, where there is an overall plan with a schedule that
is executed to realize it. Our title, “Probabilistic Plan Management,” reﬂects the idea that we are
managing plans to satisfy goals at the highest level. Two of our algorithms, “Probabilistic Analysis
of Deterministic Schedules” and “Probabilistic Schedule Strengthening” work to analyze plans at
the schedule level, so we use the word “schedule” to describe them.
Our terminology, however, should not be thought to limit the scope of our approach. The ideas
in this document could very well be tailored to probabilistic schedule management, a probabilistic
analysis of deterministic plans, or plan strengthening, depending on the needs of the domain.
3.2 Domain and Schedule Representation
3.2.1 Activity Networks
We present much of the description of our work in terms of activity networks, of which Hierarchical Task Networks (HTNs) are a well-known example [Erol et al., 1994]. Activity networks
represent and organize domain knowledge by expressing dependencies among executable actions
as a hierarchical network (or tree) of activities. At the lowest level of the tree are the primitive,
executable actions, which are called methods. The methods are collected under aggregate nodes,
or tasks. Tasks can be children of other tasks, and ultimately all activities (a term that refers to
either methods or tasks) are collected under the root node of the network, called the taskgroup. The
successful satisfaction of the objective of the root node corresponds to the goal of the domain.
Constraints are used to describe dependency relationships between any two activities in the
network (although typically there are no explicit dependency constraints between activities with
direct ancestry relationships). The most common constraint type is a temporal constraint between
activity execution time points, which we refer to as a prerequisite relationship; for example, a
prerequisite relationship from activity A to activity B means that the source A must come before the
target B. As part of our analysis of one of our domains we also consider other types of constraints
such as facilitates, an optional prerequisite that, if observed, magniﬁes the utility of an action.
Activities can also have constraints such as release (earliest possible start) times and deadlines. These types of constraints are passed down such that child activities inherit constraints from their parents, and an activity's effective execution window is defined by the tightest of these boundaries. Sibling activities can be executed concurrently in the absence of any constraints that require otherwise.
In our activity networks, each task has a type: AND, OR, or ONE. An AND task must have
all children underneath it succeed by meeting their objective in order for its own objective to be
satisﬁed. An OR task must have at least one child underneath it execute successfully, and a ONE
task can have only one child underneath it successfully execute. Typically, a good schedule would
include all children under an AND task, one or more under an OR task, and only one child under a
ONE task; otherwise, the schedule is at risk of losing utility by not observing the task types.
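To make the AND/OR/ONE semantics concrete, here is a minimal sketch of how satisfaction might be checked over such a network. The class names and structure are ours, chosen for illustration, not the thesis's implementation:

```python
class Activity:
    """A node in an activity network: a method (leaf) or a task (aggregate)."""

    def __init__(self, name, task_type=None, children=None):
        self.name = name
        self.task_type = task_type      # "AND", "OR", "ONE"; None for methods
        self.children = children or []
        self.succeeded = False          # outcome of a method's execution

    def satisfied(self):
        """Did this activity meet its objective, given method outcomes?"""
        if not self.children:            # a method: success is its own outcome
            return self.succeeded
        outcomes = [c.satisfied() for c in self.children]
        if self.task_type == "AND":      # every child must succeed
            return all(outcomes)
        if self.task_type == "OR":       # at least one child must succeed
            return any(outcomes)
        if self.task_type == "ONE":      # exactly one child may succeed
            return sum(outcomes) == 1
        raise ValueError(f"unknown task type: {self.task_type}")
```

Under this reading, scheduling two children of a ONE task risks both succeeding and the task therefore failing, which is exactly why a good schedule observes the task types as described above.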
In this work, we assume all domains can be represented by activity networks. The assumption that such domain knowledge is available is common for many real-world domains [Wilkins and
!"#$%$ "&'()*($ "&+,'.$
/%012$
/3032$
/%04%2$
Figure 3.1: A sample STN for a single scheduled method m with duration 5 and a deadline of 10.
Notice that the method is constrained to start no earlier than time 0 (via the constraint between
time 0 and m start) and ﬁnish by time 10 (via the constraint between time 0 and m finish).
desJardins, 2000]. A simple example of an activity network representation for a domain where a
set of activities must be executed is a network where all methods are children of an AND taskgroup.
3.2.2 Flexible Times Schedules
We assume that schedules are represented as deterministic, ﬂexible times schedules as in [Smith
et al., 2007]. This type of schedule representation allows the execution intervals of activities to
ﬂoat within a range of feasible start and ﬁnish times, so that slack can be explicitly used as a
hedge against temporal uncertainty. As the underlying representation we use Simple Temporal
Networks (STNs) [Dechter et al., 1991]. An STN is a graph where nodes represent event time
points, such as method start and ﬁnish times, and edges specify lower and upper bounds (i.e.,
[lb, ub]) on temporal distances between pairs of time points. A special time point represents time
0, grounding the network. All constraints and relationships between activities, such as deadlines,
durations, ancestry relationships and prerequisites, are represented as edges in the graph. For
example, a deadline dl is represented as an edge with bounds [0, dl] between time 0 and the ﬁnish
time of the constrained method, and prerequisites may be represented as edges of [0, ∞] between
the ﬁnish time of the ﬁrst method and the start of the second. Method durations are represented
as an edge of [d, d] between a method's start and finish time. In the deterministic systems we
consider, method durations are [d, d], where d is the method’s deterministic, scheduled duration.
An agent’s schedule is represented by posting precedence constraints between the start and end
points of each neighboring pair of methods, termed predecessors and successors. The STN for a
single scheduled method with duration 5 whose execution must ﬁnish before its deadline at time
10 is shown in Figure 3.1.
Constraints are propagated in order to maintain lower and upper bounds on all time points in
the STN. Specifically, the constraint network provides an earliest start time est, latest start time lst, earliest finish time eft and latest finish time lft for each activity. Conflicts occur in the
STN whenever negative cycles are present, which indicates that some time bound is violated.
Even if a schedule is not generated by a ﬂexible times planner, and is ﬁxed times, it can easily
be translated into a ﬂexible times schedule, similar to [Policella et al., 2009]. The process begins by
representing the schedule as described above, preserving the sequencing of the original schedule.
To generate initial time bounds, each node can be initialized with all possible times (e.g., times in
the range [0, h] where h is a time point known to be after all deadlines and when execution will have
halted), and constraint propagation used to generate the valid execution intervals of each activity.
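To make the STN machinery above concrete, the following sketch (illustrative only; the function and variable names are not from the thesis) propagates an STN's constraints by running all-pairs shortest paths over the standard distance-graph encoding, in which a constraint lb ≤ t_v − t_u ≤ ub becomes an edge of weight ub from u to v and an edge of weight −lb from v to u. Negative cycles signal an inconsistent network:

```python
import itertools

def propagate(num_points, edges):
    """All-pairs constraint propagation over an STN's distance graph.

    edges: list of (u, v, lb, ub) meaning lb <= t_v - t_u <= ub.
    Returns the distance matrix d, where the tightest bounds on
    t_v - t_u are [-d[v][u], d[u][v]], or None if a negative cycle
    (an inconsistency) is present.
    """
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(num_points)] for i in range(num_points)]
    for u, v, lb, ub in edges:
        d[u][v] = min(d[u][v], ub)    # t_v - t_u <= ub
        d[v][u] = min(d[v][u], -lb)   # t_u - t_v <= -lb
    # Floyd-Warshall shortest paths tighten all pairwise bounds.
    for k, i, j in itertools.product(range(num_points), repeat=3):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(num_points)):
        return None  # negative cycle: some time bound is violated
    return d

# Figure 3.1's STN: point 0 is "time zero", 1 is m's start, 2 is m's finish.
Z, START, FINISH = 0, 1, 2
d = propagate(3, [
    (Z, START, 0, float("inf")),  # start no earlier than time 0
    (START, FINISH, 5, 5),        # duration of exactly 5
    (Z, FINISH, 0, 10),           # deadline of 10
])
est, lst = -d[START][Z], d[Z][START]  # execution window of m's start
```

Running this on the STN of Figure 3.1 yields a start window of [0, 5] and a finish window of [5, 10] for the method.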
3.3 Domains
We have investigated our approach in two domains: a single-agent domain and a multi-agent one. The objective in the single-agent domain is to execute a set of tasks by a final deadline, where there is uncertainty in method durations. The domain is based on a well-studied problem with temporal constraints and little schedule slack, and its success is greatly affected by the temporal uncertainty. These characteristics make it an ideal domain for which to demonstrate the fundamental principles of our probabilistic analysis.
The second domain is a multi-agent domain that has a plan objective of maximizing utility. Utility is explicitly modeled as part of the domain description. There is uncertainty in method duration and outcome, as well as a rich set of constraints between activities. Using this large, complex domain demonstrates the generality and scalability of our algorithms. The next subsections describe these domains in detail.
3.3.1 Single-Agent Domain
We first demonstrate our work on a single-agent planning problem, which is concerned with executing a set of tasks by a final deadline. Minimal and maximal time lags specify temporal constraints between activity start and finish time points. All tasks must be executed, observing all temporal domain constraints, in order for the plan to succeed, and tasks do not have explicit utility. Each task has a variable number of child methods, only one of which can be executed to fulfill that task's role in the plan. The agent can execute at most one method at a time.
Our domain is based on the Multi-Mode Resource-Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max) [Schwindt, 1998]. This NP-complete makespan minimization problem is well-studied in the scheduling community.

[Figure content: an AND taskgroup decomposing into ONE tasks T1 and T2; T1's methods are m1_1 (duration: 10) and m1_2 (duration: 4), and T2's methods are m2_1 (duration: 7) and m2_2 (duration: 3), with minimal time lags such as [1, ∞], [2, ∞], [3, ∞] and [4, ∞] between the children of T1 and the children of T2.]

Figure 3.2: A sample MRCPSP/max problem.

Although the domain description allows a variable number of tasks and methods, each problem we use in this thesis has 20 tasks, and tasks have 2, 3 or 5 methods underneath them, each with potentially different durations. Activity dependency constraints are in the form of time lags between the finish times of the
children of a source task and the start times of the children of a target task. Although all children
of a source task are constrained with all children of a target task, the speciﬁcs of the constraints
can vary. These constraints act as prerequisite constraints that have an associated minimal and/or
maximal time delay between the execution time points of the two methods involved. The time lags
speciﬁed in the problem description can also be negative; intuitively, this reverses the prerequisite
constraint. The domain can be represented as an activity network with the AND taskgroup decomposing into each of the 20 tasks of the problem; those tasks, represented as ONE nodes, further decompose into their child methods.
A sample problem (with only 2 tasks) is shown in Figure 3.2. Underneath the parent taskgroup
are the two tasks, each with two children. Constraints are in the form of prerequisite constraints
between the end time of the source task and the start time of the target task. Note how all of T1’s
children are constrained with each of T2’s children, but with different time lags.
The problems specify a duration dur_m for each method m, which is the method's deterministic, scheduled duration. As the domain itself does not include uncertainty, we modified the problems by assigning each method a normal duration distribution DUR(m) with a standard deviation σ(DUR(m)) that is uniformly sampled from the range [1, 2·dur_m/5], and a mean µ(DUR(m)) of (dur_m − σ(DUR(m))/2). These values were chosen because, after experimenting with several values, they seemed to present the best compromise between schedule robustness (i.e., more slack built into the schedule) and a minimal schedule makespan (i.e., no slack built into the schedule). When sampling from duration distributions to find methods' actual durations, durations are rounded to the nearest integer in order to simplify the simulation of execution. Finally, we assign each problem a final deadline, since the original problem formulations do not have explicit deadlines represented in them. The addition of deadlines adds to the temporal complexity of the domain.
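As a concrete sketch of this modification (the function names are illustrative, not from the thesis), each method's distribution can be generated and sampled as follows:

```python
import random

def make_duration_distribution(dur_m, rng=random):
    """Assign a method a normal duration distribution as described above:
    sigma drawn uniformly from [1, 2*dur_m/5], mean = dur_m - sigma/2."""
    sigma = rng.uniform(1.0, 2.0 * dur_m / 5.0)
    mu = dur_m - sigma / 2.0
    return mu, sigma

def sample_actual_duration(mu, sigma, rng=random):
    """Sample a method's actual duration, rounded to the nearest integer."""
    return round(rng.gauss(mu, sigma))
```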
Core Planner
We use ASPEN (Automated Scheduling/Planning ENvironment) [Chien et al., 2000] as the planner for this domain. ASPEN is an iterative-repair planner that produces fixed times plans and is
designed to support a wide range of planning and scheduling domains. To generate a valid plan,
ASPEN ﬁrst generates an initial, ﬂawed plan, then iteratively and randomly repairs plan conﬂicts
until the plan is valid. It will continue to attempt to generate a plan until a solution is found or
the number of allowed repair iterations is reached, in which case it returns with no solution. When
replanning during execution, the planner begins with the current, potentially conflicted plan and performs the same repair process until the plan is valid or until it times out. The schedule is generated using dur_m for each method m's deterministic duration. We used ASPEN instead of a flexible times planner because it was the planner we had available at the time. A sample problem for this domain in ASPEN syntax is shown in Appendix A.1. In order to fit the problem on a reasonable number of pages, the problem was simplified and some constraints between methods were removed.
After planning (or replanning), we expand the plan output by ASPEN into a totalorder, ﬂexible
times representation, as described in Section 3.2.2. As our constraint propagation algorithm for the
STN, we implemented a version of the standard constraint propagation algorithm AC-3 [Mackworth, 1977] that propagates constraints both forward and backward. The algorithm begins with all edges
marked as pending, meaning they are yet to be checked. It examines each edge and removes
possible values from the two connected nodes if they are not consistent with the constraint asserted
by the edge. If any node has a value removed, all edges except for the current one are added back
onto the pending list. The algorithm terminates when the pending list is empty.
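The pending-list loop just described can be sketched as AC-3 over interval domains (a simplification of the actual implementation; the helper names are illustrative). Each node's domain is an interval of possible times, and each edge lb ≤ t_v − t_u ≤ ub prunes the intervals of both endpoints:

```python
from collections import deque

def ac3_propagate(domains, edges):
    """AC-3-style propagation over interval domains.

    domains: dict node -> [lo, hi], mutated in place.
    edges: list of (u, v, lb, ub) meaning lb <= t_v - t_u <= ub.
    Returns False if some domain becomes empty (an inconsistency).
    """
    def revise(u, v, lb, ub):
        """Tighten both endpoints' intervals; return the set of changed nodes."""
        changed = set()
        ulo, uhi = domains[u]
        vlo, vhi = domains[v]
        # t_v must lie in [ulo + lb, uhi + ub]; t_u in [vlo - ub, vhi - lb].
        new_v = (max(vlo, ulo + lb), min(vhi, uhi + ub))
        new_u = (max(ulo, vlo - ub), min(uhi, vhi - lb))
        if new_v != (vlo, vhi):
            domains[v] = list(new_v)
            changed.add(v)
        if new_u != (ulo, uhi):
            domains[u] = list(new_u)
            changed.add(u)
        return changed

    pending = deque(edges)
    while pending:
        u, v, lb, ub = pending.popleft()
        changed = revise(u, v, lb, ub)
        if any(domains[n][0] > domains[n][1] for n in changed):
            return False  # empty execution window: the network is inconsistent
        if changed:
            # Re-examine every other edge that touches a changed node.
            pending.extend(e for e in edges
                           if e != (u, v, lb, ub) and (e[0] in changed or e[1] in changed))
    return True
```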
During execution, the executive monitors the prerequisite conditions of any pending activities,
and executes the next activity as soon as its prerequisites are satisﬁed. The agent replans whenever
the STN becomes inconsistent. We will present the runtime plan management approach for this
domain later, in Section 7.2.1.
Figure 3.3: A graph of a normal distribution.
Normal Distributions
Task duration distributions are modeled in this domain as normal distributions. Normal, or Gaussian, distributions are continuous probability distributions typically used to describe data that is clustered around a mean, or average, µ (Figure 3.3). A distribution's standard deviation (σ) describes how tightly the distribution is clustered around the mean; its variance σ² is the square of the standard deviation. To indicate that a real-valued random variable X is normally distributed with mean µ and variance σ², we write X ∼ N(µ, σ²). Normal distributions have the nice property that the mean of the sum of two normal distributions equals the sum of the means of each of the distributions; the same is true for variance. Thus, the normal distribution exactly representing the distribution of the sum of two random variables drawn from ∼N(µ1, σ1²) and ∼N(µ2, σ2²) is ∼N(µ1 + µ2, σ1² + σ2²). This property makes normal distributions very easy to work with in our analysis of this domain, which requires summing distributions over groups of tasks, as the sum of n distributions can be computed in O(n) time. For other functions that some domains may require (e.g., maximum, minimum, etc.), however, normals do not combine as nicely and an approximation or discretization may be appropriate. Of course, if a domain's distribution type is not represented as a normal, these calculations can be changed as appropriate.
We use two functions as part of our use of normal distributions. The first is normcdf(c, µ, σ), the normal cumulative distribution function. This function calculates the probability that a variable drawn from the distribution will have a value less than or equal to c. As an example, the value of normcdf(µ, µ, σ) is 0.5, as 50% of the distribution's probability is less than the mean.
As part of its analysis, PADS needs to calculate the distribution of a method's finish time, given that it meets a deadline. This requires truncating distributions and combining them, such as by addition, with other normal distributions. There is no known way to represent exactly the combination of a truncated normal distribution and a normal distribution; the best approximation we are aware of finds the mean and standard deviation of the truncated distribution and treats it as the mean and standard deviation of a normal distribution. We define this function as (µ_trunc, σ_trunc) = truncate(lb, ub, µ, σ), meaning that the normal distribution ∼N(µ, σ²) is being truncated below and above at [lb, ub]. To find the mean and standard deviation of a truncated distribution, we base our calculations on [Barr and Sherrill, 1999], which specifies that the mean of a standard normal random variable ∼N(0, 1) truncated below at a fixed point c is

    µ_trunc = e^(−c²/2) / (√(2π) · (1 − normcdf(c, 0, 1)))

and the variance¹ is

    σ²_trunc = (1 − chisquarecdf(c², 3)) / (2 · (1 − normcdf(c, 0, 1))) − µ²_trunc

From this we derive equations that truncate both above and below at [lb, ub]:

    µ_trunc = (e^(−lb²/2) − e^(−ub²/2)) / (√(2π) · (normcdf(ub, 0, 1) − normcdf(lb, 0, 1)))

    σ²_trunc = (chisquarecdf(ub², 3) − chisquarecdf(lb², 3)) / (2 · (normcdf(ub, 0, 1) − normcdf(lb, 0, 1))) − µ²_trunc

These equations are the basis of the function truncate, which we use in our probabilistic analysis of this domain.
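A stdlib-only sketch of truncate is below (the helper names are illustrative). It uses the closed form of the chi-squared CDF for three degrees of freedom; and since the equations above assume nonnegative truncation points, the second-moment helper extends them to signed standardized bounds by the odd symmetry of the integral of t²·φ(t) from 0 to x:

```python
import math

def normcdf(c, mu=0.0, sigma=1.0):
    """P(X <= c) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((c - mu) / (sigma * math.sqrt(2.0))))

def chisquarecdf(x, n=3):
    """Chi-squared CDF, closed form for n = 3 degrees of freedom."""
    assert n == 3, "closed form implemented for 3 degrees of freedom only"
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def _x2phi_integral(x):
    """Integral of t^2 * phi(t) from 0 to x, extended to x < 0 by odd symmetry."""
    return math.copysign(0.5 * chisquarecdf(x * x), x)

def truncate(lb, ub, mu, sigma):
    """Approximate mean and std. dev. of N(mu, sigma^2) truncated to [lb, ub]."""
    a, b = (lb - mu) / sigma, (ub - mu) / sigma   # standardize the bounds
    z = normcdf(b) - normcdf(a)                   # probability mass inside [lb, ub]
    m = (math.exp(-a * a / 2.0) - math.exp(-b * b / 2.0)) / (math.sqrt(2.0 * math.pi) * z)
    var = (_x2phi_integral(b) - _x2phi_integral(a)) / z - m * m
    return mu + sigma * m, sigma * math.sqrt(var)
```

For instance, truncating a standard normal below at 0 recovers the half-normal mean √(2/π).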
3.3.2 Multi-Agent Domain
Our second domain is an oversubscribed, multi-agent planning problem that is concerned with the collaborative execution of a joint mission by a team of agents in a highly dynamic environment. Missions are formulated as a hierarchical network of activities in a version of the TAEMS language
¹ The formula used in this calculation, chisquarecdf(x, N), is the chi-squared cumulative distribution function and returns the probability that a single observation from the chi-squared distribution with N degrees of freedom will fall in the interval [0, x].
[Figure content: an OR taskgroup with child tasks T1 (OR) and T2 (AND). T1's methods are m1 (util: 12, dur: 14, agent: Jeremy) and m2 (util: 5, dur: 8, agent: Sarah); T2's methods are m3 (util: 6, dur: 5, agent: Buster) and m4 (util: 9, dur: 9, agent: Buster). An enables relationship points from T1 to T2.]

Figure 3.4: A simple, deterministic 3-agent problem.
(Task Analysis, Environment Modeling and Simulation) [Decker, 1996] called C TAEMS [Boddy et al., 2005]. The plan objective is to maximize earned utility, which is explicitly modeled in this domain². The taskgroup node of the network, which is an OR node, is the overall mission task, and
agents cooperate to accrue as much utility at this level as possible. On successive levels, interior
nodes constitute aggregate activities that can be decomposed into sets of tasks and/or methods (leaf
nodes), which are directly executable in the world. Other than this, there is no predetermined
structure to the problems of this domain. The domain’s activity network also includes special
types of constraints and relationships between activities that warrant special consideration while
probabilistically analyzing plans of this domain, as described further below.
To illustrate the basic principles of this domain, consider the simple activity network shown
in Figure 3.4. At the root is the OR taskgroup, with two child tasks, one OR and one AND; each
child task has two methods underneath it. Each method can be executed only by a speciﬁed agent
(agent: Name) and each agent can execute at most one method at a time. The enables relationship
shown is a prerequisite relationship.
Given this activity network, one could imagine a number of schedules being generated. We list
below all possible valid schedules, as well as situations in which they would be appropriate.
• m1 – Jeremy executes method m1, but there is not enough room in Buster's schedule for m3 and m4.

² In the terminology of the domain and our previous publications of PPM for this domain [Hiatt et al., 2008, 2009], utility is called "quality"; for consistency in this document, we will continue to refer to it as "utility."
• m2 – Jeremy does not have enough room in its schedule to execute method m1, so Sarah executes the lower-utility option of m2.
• m1 → {m3, m4} – Jeremy executes method m1, which fulfills the prerequisite constraint and allows Buster to execute m3 and m4, which can be executed in any order.
• m2 → {m3, m4} – Jeremy does not have enough room in its schedule to execute method m1, so Sarah executes the lower-utility option of m2; this fulfills the prerequisite constraint and allows Buster to execute m3 and m4, which can be executed in any order.
• {m1 or m2} → {m3, m4} – Jeremy executes method m1, and Sarah concurrently executes method m2. Whichever finishes first fulfills the prerequisite constraint and allows Buster to execute m3 and m4, which can be executed in any order. An example of when this schedule is appropriate is if Buster's schedule is tight and it is useful for m3 and m4 to begin execution before m1 is able to finish, calling for the scheduling of m2; since m1 earns higher utility, however, it is also scheduled.
Note that the reason why methods m3 and m4 are not executed concurrently in this example is that
they are owned by the same agent.
Now that we have illustrated the basic principles of the domain, we show the full model of
this example problem in Figure 3.5, and refer to it as we describe the further characteristics of the
domain.
Method durations are speciﬁed as discrete probability distributions over times (DUR: X), with
the exact duration of a method known only after it has been executed. Method utilities (UTIL: X) are also specified as discrete probability distributions over utilities. Methods do not earn utility
until their execution has completed. Methods may also have an a priori likelihood of ending with
a failure outcome (fail: X%), which results in zero utility. In subsequent text, we refer to DUR,
UTIL and fail as functions, which when given a method as input return the duration distribution,
utility distribution and failure likelihood, respectively, of that method. We also deﬁne the functions
averageDur and averageUtil, which return the deterministic expectation of an activity's duration and utility, respectively. We use "average" here to distinguish these values from the expected
utility which is calculated as part of the probabilistic analysis. Methods can fail for reasons other
than a failure outcome, such as by missing a deadline; however, any type of failure results in the
method earning zero utility. Even if a method fails, execution of the schedule still continues, since
the complicated activity structure and plan objective means that there is very likely still potential
for earning further utility at the taskgroup level.
Each task in the activity network has a speciﬁed utility accumulation function (uaf) deﬁning
[Figure content: taskgroup (uaf: sum (OR)) with child tasks T1 (uaf: max (OR); rel: 100, dl: 120) and T2 (uaf: sum_and (AND); rel: 110, dl: 130), and an enables NLE from T1 to T2. T1's methods: m1 (agent: Jeremy; fail: 20%; UTIL: 10 (25%), 12 (50%), 14 (25%); DUR: 12 (25%), 14 (50%), 16 (25%)) and m2 (agent: Sarah; UTIL: 4 (25%), 5 (50%), 6 (25%); DUR: 7 (25%), 8 (50%), 9 (25%)). T2's methods: m3 (agent: Buster; UTIL: 5 (25%), 6 (50%), 7 (25%); DUR: 4 (25%), 5 (50%), 6 (25%)) and m4 (agent: Buster; UTIL: 8 (25%), 9 (50%), 10 (25%); DUR: 7 (25%), 9 (50%), 11 (25%)).]

Figure 3.5: A small 3-agent problem.
when, and how, a task accumulates utility. The uafs are grouped by node type: OR, AND and
ONE. Tasks with OR uafs earn positive utility as soon as their ﬁrst child executes with positive
utility. Tasks with AND uafs earn positive utility once all children execute with positive utility.
Trivially, ONE tasks earn utility when their sole child executes with positive utility. Uafs also
specify how much ﬁnal utility the tasks earn. For example, a sum task’s utility at any point equals
the total earned utility of all children. Other uafs are max, whose utility equals the maximum utility
of its executed children, and sum and, which earns the sum of the utilities of its children once they
all execute with positive utility. A less common uaf is min, which speciﬁes that tasks accumulate
the minimum of the utilities of their children once they all execute with positive utility. A summary
of the uaf types is shown in Table 3.1.
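As an illustration of the uaf semantics, the final utility a task earns can be sketched as follows (a hypothetical helper, not part of the domain's codebase; it assumes each scheduled child's earned utility is known, with 0 for a child that failed or never ran):

```python
def task_utility(uaf, child_utils):
    """Final utility of a task given its scheduled children's earned utilities."""
    positive = [u for u in child_utils if u > 0]
    if uaf == "sum":          # OR: sum of children's utility
        return sum(positive)
    if uaf == "max":          # OR: max of children's utility
        return max(positive, default=0)
    if uaf == "sum_and":      # AND: sum, but only once every child succeeds
        return sum(positive) if len(positive) == len(child_utils) else 0
    if uaf == "min":          # AND: min, but only once every child succeeds
        return min(positive) if len(positive) == len(child_utils) else 0
    if uaf == "exactly_one":  # ONE: zero unless exactly one child succeeds
        return positive[0] if len(positive) == 1 else 0
    raise ValueError(f"unknown uaf: {uaf}")
```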
Additional constraints in the network include a release (earliest possible start) time and a deadline (denoted by rel: X and dl: X, respectively, in Figure 3.5) that can be specified for any activity.
Each descendant of a task inherits these constraints from its ancestors, and its effective execution
window is deﬁned by the tightest of these constraints. An executed activity fails if any of these
constraints are violated. In the rest of this document, we refer to rel and dl as functions, which
when given an activity as input return its release and deadline, respectively.
Interdependencies between activities are modeled via constraints called nonlocal effects (NLEs).
NLEs express causal effects between a source activity and a target activity. NLEs can be between
any two activities in the network that do not have a direct ancestry relationship. If the target activity is a task, the NLE is interpreted as applying individually to each of the methods in the activity network under the task. Therefore, in much of our future discussion, we treat NLEs where the target is a task as the analogous situation where there are enabling NLEs from the source to each of the task's method descendants.

Table 3.1: A summary of utility accumulation function (uaf) types.

sum (OR): utility is sum of children's utility (task earns utility as soon as the first child earns positive utility)
max (OR): utility is max of children's utility (task earns utility as soon as the first child earns positive utility)
sum_and (AND): utility is sum of children's utility (task earns utility once all children earn positive utility)
min (AND): utility is min of children's utility (task earns utility once all children earn positive utility)
exactly_one (ONE): utility is only executed child's utility (task earns utility as soon as the first child earns positive utility; task has zero utility if any other children earn positive utility)
NLEs are activated by the source activity accruing positive utility, and must be activated before
a target method begins execution in order to have an effect on it. All NLEs have an associated delay
parameter specifying how much time passes from when the source activity accrues positive utility
until the NLE is activated; although our approach can handle any arbitrary delay, for the sake of
simplicity we present our discussion as if delays do not exist.
There are four types of NLEs. The most common is the enables relationship. The enables NLE
stipulates that unless the target method begins execution after the source activity earns positive
utility, it will fail. For example, in Figure 3.5, the execution of targets m3 and m4 will not result in positive utility unless the methods are executed after the source T1 earns positive utility. The
enables relationship is unique among the NLEs because an effective agent will wait for all of a
method’s enablers to earn positive utility before it attempts to begin executing it; otherwise, its
execution would be a waste as the method would not earn utility.
The second NLE is the disables NLE, which has the opposite effect of enabling. The disables
relationship stipulates that the target method fails (earns zero utility) if it begins executing after its
source has accrued positive utility. Typically, an effective planner will schedule sources of disables
to ﬁnish after the targets start. If a source activity does ﬁnish, however, before the target has started,
executing the target serves no purpose and so the planner should remove it from the schedule. The
40
3.3 Domains
disables NLE and enables NLE are referred to as “hard” NLEs, as they must be observed to earn
utility.
The third NLE is the facilitates NLE. This NLE, when activated, increases the target method’s
utility and decreases its executed duration. The amount of the effect depends on the proportional
utility (proputil) of the source s at the time e when the target starts execution:
proputil(s, e) = min(1, utilearned(s, e) / MaxU(s))
In this formula, utilearned refers to the utility so far accumulated by s, and MaxU (s) refers to
an approximation of the maximum possible utility of the activity s. As it is not important to our
approach, we omit describing how MaxU is calculated.
This formula is used to specify the effect of the NLE in terms of a utility coefficient uc such that 0 ≤ uc ≤ 1, and a duration coefficient dc such that 0 ≤ dc < 1. Putting it all together, the utility distribution of the target t if it starts at time e becomes:

    UTIL(t) = UTIL(t) · (1 + uc · proputil(s, e))

The duration distribution, similarly, becomes:

    DUR(t) = ⌈DUR(t) · (1 − dc · proputil(s, e))⌉

Note that the presence of ⌈·⌉ in the duration calculation ensures that durations are integral; utilities may not be after a facilitating effect. Also note that according to this feature of the domain description, a target's utility can be, at most, doubled by the effect of a single facilitator.
The hinders NLE is exactly the same as the facilitates, except that the signs of the effect are changed:

    UTIL(t) = UTIL(t) · (1 − uc · proputil(s, e))

    DUR(t) = ⌈DUR(t) · (1 + dc · proputil(s, e))⌉

Again, note that durations are always integral and, according to this feature of the domain description, a target's duration can, at most, be doubled by the effect of a single hinderer.
If a method is the target of multiple facilitates or hinders, the effects can be applied in any order,
as multiplication is commutative; for the duration calculation, the ceiling function is applied after
the individual NLE effects have been combined. The facilitates and hinders NLEs are referred to as
“soft” NLEs as, unlike the hard NLEs, utility can be earned without observing them. A summary
of the NLE types is shown in Table 3.2.
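The combined effect of a method's soft NLEs on its discrete distributions can be sketched as follows (an illustrative helper; the effect-tuple format is an assumption, not the thesis's representation). The multipliers are accumulated first, and the ceiling that keeps durations integral is applied once at the end, as described above:

```python
import math

def apply_soft_nles(util_dist, dur_dist, effects):
    """Apply facilitates/hinders effects to a target method's distributions.

    util_dist, dur_dist: dicts mapping value -> probability.
    effects: iterable of (kind, uc, dc, proputil), kind in {"facilitates",
    "hinders"}. Order does not matter, as multiplication is commutative.
    """
    util_mult, dur_mult = 1.0, 1.0
    for kind, uc, dc, proputil in effects:
        sign = 1.0 if kind == "facilitates" else -1.0
        util_mult *= 1.0 + sign * uc * proputil   # utility up / down
        dur_mult *= 1.0 - sign * dc * proputil    # duration down / up
    new_util = {v * util_mult: p for v, p in util_dist.items()}
    new_dur = {}
    for v, p in dur_dist.items():
        d = math.ceil(v * dur_mult)               # ceiling applied once, at the end
        new_dur[d] = new_dur.get(d, 0.0) + p      # merge durations that collide
    return new_util, new_dur
```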
In the rest of this document, we refer to en, dis, fac and hind as functions that, when
given an activity as input, return the scheduled enablers, disablers, facilitators and hinderers of
Table 3.2: A summary of nonlocal effect (NLE) types.

enables (hard): target will fail if it begins executing before the source earns positive utility
disables (hard): target will fail if the source earns positive utility before the target begins executing
facilitates (soft): target's duration/utility are decreased/increased if the source earns positive utility before the target begins executing
hinders (soft): target's duration/utility are increased/decreased if the source earns positive utility before the target begins executing
the activity, as appropriate. The functions enabled and disabled return the activities directly enabled (or disabled) by the input activity. NLEs returns the scheduled source NLEs of all types of an activity. We will also refer later to the function directlyDisabledAncestor which, when given a method and (one of) its disablers, returns the ancestor of the method that is directly targeted by the disables NLE. A sample problem for this domain in C TAEMS syntax is shown in Appendix A.2. In order to fit the problem on a reasonable number of pages, the problem was simplified and some activities were deleted.
We assume that a planner generates a complete ﬂexible times schedule for all the agents. When
execution begins, agents receive a copy of their portion of the initial schedule and limited local
views of the activity network. Agents’ local views consist of activities that are designated as local
(the agent is responsible for executing and/or managing them) or remote (the agent tracks current
information for these activities via status updates from the agent responsible for maintaining them).
Agents receive copies of all of their local activities, as well as remote copies of all activities that are
constrained via NLEs with their local activities. Additionally, all activities are ultimately connected
to the root node (i.e., the local view consists of a single tree rooted at the taskgroup node). The
limited local view for agent Buster of Figure 3.5 is shown in Figure 3.6. After execution begins,
agents have a short, ﬁxed period of time to perform initial computations (such as calculating initial
probability information) before they actually begin to execute methods.
We have developed two versions of our probabilistic plan management approach for this domain. One analysis applies to the initial, global schedule and is used to strengthen it before it is distributed to agents. The second analysis is used by each agent during execution to distributively manage their portion of the schedule. During our discussion of PADS, we distinguish between the functions jointSchedule, which returns the scheduled methods across all agents, and localSchedule, which returns the methods scheduled locally on a single agent. Similarly,
[Figure content: taskgroup (uaf: sum (OR)) with child tasks T1 (uaf: max (OR); rel: 100, dl: 120) and T2 (uaf: sum_and (AND); rel: 110, dl: 130), with a remote enables NLE from T1 to T2. Only T2's methods are local: m3 (agent: Buster; UTIL: 5 (25%), 6 (50%), 7 (25%); DUR: 4 (25%), 5 (50%), 6 (25%)) and m4 (agent: Buster; UTIL: 8 (25%), 9 (50%), 10 (25%); DUR: 7 (25%), 9 (50%), 11 (25%)).]

Figure 3.6: The limited local view for agent Buster of Figure 3.5.
the local function, when given an activity, returns whether that activity is local to the agent.
Core Planner
The planner for this domain is the incremental flexible times scheduling framework described in [Smith et al., 2007]. It adopts a partial-order schedule (POS) representation, with an underlying implementation of an STN, as described in Section 3.2.2. The constraint propagation algorithm used is an incremental, arc-consistency based shortest path algorithm, based on [Cesta and Oddi, 1996]. The schedule for any individual agent is total-order; however, the joint schedule as a whole may be partial-order, as inter-agent activities are constrained with each other only by NLEs, and not by explicit sequencing constraints.
The planner uses expected duration and utility values for methods, ignores a priori failure
likelihoods, and focuses on producing a POS for the set of agents that maximizes overall problem
utility. Given the complexity of solving this problem optimally and an interest in scalability, the
planner uses a heuristic approach. Specifically, the initial, global schedule is developed via an iterative two-phase process. In the first phase, a relaxed version of the problem is solved optimally
to determine the set of “contributor” methods, or methods that would maximize overall utility if
there were no capacity constraints. In the second phase, an attempt is made to sequentially allocate
these methods to agent schedules. If a contributor being allocated depends on one or more enabling
activities that are not already scheduled, these are scheduled ﬁrst. Should a method fail to be
inserted at any point (due to capacity constraints), the ﬁrst phase is reinvoked with the problematic
method excluded. A similar process takes place within each agent when it replans locally. More
detail on the planner can be found in [Smith et al., 2007].
During distributed execution, the executor monitors the NLEs of any pending activities and executes the next pending activity as soon as all the necessary constraints are satisfied. Agents replan
selectively in response to execution, such as if the STN becomes inconsistent, or if a method ends
in a failure outcome. We describe the runtime plan management system further in Section 7.3.1.
Discrete Probability Distributions
In this domain, uncertainty is represented as discrete distributions. We use summation, multiplication, maximization and minimization in our algorithms in later chapters; if a domain's distribution type is different than discrete, the below equations can be changed as appropriate.
The sum of two discrete probability distributions, Z = X + Y, is [Grinstead and Snell, 1997]:

    P(Z = z) = Σ_{k=−∞}^{∞} P(X = k) · P(Y = z − k)
The max of two discrete probability distributions, Z = max(X, Y), is:

    P(Z = z) = P(X = z) · P(Y ≤ z) + P(X < z) · P(Y = z)
For the sake of brevity, we omit the versions of these formulas for the multiplication and the minimum of two distributions. These equations can be implemented by iterating through the distributions in question. Unlike with normal distributions, therefore, these calculations do not have a simple closed form: they take O(|X| · |Y|) time, and the size of the resulting distribution is bounded by |X| · |Y| as well (although typically such a distribution has duplicate values and so its representation is not that large). Discrete distributions do, however, have the beneﬁt of being able to be combined or modiﬁed exactly by any arbitrary operator.
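These operations can be sketched directly in code. The following is a minimal illustration, not taken from the thesis: the `combine` helper and the example distributions are ours, with distributions represented as dicts from value to probability.

```python
from collections import defaultdict

def combine(X, Y, op):
    """Combine two discrete distributions (dicts value -> probability)
    under an arbitrary binary operator, merging duplicate results."""
    Z = defaultdict(float)
    for x, px in X.items():
        for y, py in Y.items():
            Z[op(x, y)] += px * py  # iterates over all O(|X| * |Y|) pairs
    return dict(Z)

# Sum and max both follow from the general operator form:
X = {1: 0.5, 2: 0.5}
Y = {1: 0.4, 3: 0.6}
Z_sum = combine(X, Y, lambda a, b: a + b)  # P(Z = 2) = 0.5 * 0.4 = 0.2
Z_max = combine(X, Y, max)                 # P(Z = 3) = 0.6
```

Note how duplicate results merge in `Z_max`: both (1, 3) and (2, 3) map to 3, which is why the resulting distribution is typically smaller than the |X| · |Y| bound.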
3.4 Decision Trees with Boosting
We use decision trees with boosting [Quinlan, 1986] as part of our work on probabilistic meta-level control. Decision trees, and classiﬁers in general, are a way of assigning one of a number of classes to a given state. For our purposes, this means determining what high-level planning action to take based on the current probabilistic analysis of the schedule. Decision trees, however, are easily susceptible to overﬁtting: they can become unnecessarily complex and describe random
error or noise in the dataset instead of the true underlying model. Boosting is one technique that is
commonly used to overcome this problem. Boosting is a meta-algorithm that creates many small, weak classiﬁers (such as decision trees restricted in size), weights them according to their goodness, and lets them “vote” to determine the overall classiﬁcation. We choose this type of classiﬁcation because it is easy to implement and very effective; [Breiman, 1996] even called AdaBoost with decision trees the “best off-the-shelf classiﬁer in the world.”
To construct a decision tree, one needs a training set of example states, x_1 … x_j, each with a known class Y ∈ {y_1 … y_k}. Each example consists of a number of attributes; in our case, these attributes are variables that have continuous values. Because we are using a boosting algorithm, the examples also have associated weights, w_1 … w_j. A decision tree is built recursively. It starts at the root node and inspects the data. If all examples have the same class, the node becomes a leaf node specifying that class. If all examples have identical states, it becomes a leaf node specifying the class that the weighted majority of the states have. Otherwise, the node branches based on an attribute and a threshold for that attribute: all examples with a value for that attribute less than the threshold are passed to the right child, all others are passed to the left. This process recurses for each child, continuing until the algorithm terminates. An example decision tree where all examples are weighted equally is shown in Figure 3.7.
To choose which attribute to branch on, we use the algorithm C4.5 [Quinlan, 1993]. At each node of the tree, C4.5 chooses an attribute A and, if A is a continuous attribute, a threshold t, which maximize Information Gain (IG), or the reduction in entropy that occurs by splitting the data by A and t. Formally, IG is deﬁned as

IG(A, t) = H(Y) − H(Y | A, t)
where H(Y) represents the entropy of the class variable Y before the split:

H(Y) = − Σ_{i=1}^{k} P(Y = y_i) log_2 P(Y = y_i)
and H(Y | A, t) represents the entropy of the class variable Y after the split, which divides the data according to whether the value for attribute A is less than t:

H(Y | A, t) = − P(A < t) Σ_{i=1}^{k} P(Y = y_i | A < t) log_2 P(Y = y_i | A < t)
              − P(A ≥ t) Σ_{i=1}^{k} P(Y = y_i | A ≥ t) log_2 P(Y = y_i | A ≥ t)
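The split-selection criterion above can be made concrete with a short sketch of weighted entropy and information gain. The representation of an example as an (attributes, class, weight) triple and all function names are our own illustrative choices, not C4.5's internals.

```python
import math

def entropy(weights_by_class):
    """Entropy of a (weighted) class distribution, in bits."""
    total = sum(weights_by_class.values())
    h = 0.0
    for w in weights_by_class.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h

def info_gain(examples, attr, t):
    """IG(A, t) = H(Y) - H(Y | A, t) over weighted examples.
    Each example is an (attributes-dict, class, weight) triple."""
    def tally(subset):
        d = {}
        for attrs, cls, w in subset:
            d[cls] = d.get(cls, 0.0) + w
        return d
    left = [e for e in examples if e[0][attr] < t]
    right = [e for e in examples if e[0][attr] >= t]
    total = sum(e[2] for e in examples)
    h_cond = 0.0
    for side in (left, right):
        w_side = sum(e[2] for e in side)
        if w_side > 0:
            h_cond += (w_side / total) * entropy(tally(side))
    return entropy(tally(examples)) - h_cond

# A perfectly separating split on balanced classes recovers one full bit:
data = [({'rain': 0.0}, 'no', 1.0), ({'rain': 0.0}, 'no', 1.0),
        ({'rain': 0.03}, 'yes', 1.0), ({'rain': 0.05}, 'yes', 1.0)]
ig = info_gain(data, 'rain', 0.015)
```

C4.5 would evaluate `info_gain` over candidate (attribute, threshold) pairs at each node and branch on the maximizer.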
Figure 3.7: An example decision tree built from a data set. The tree classiﬁes 11 examples giving
state information about the time of day and how much rain there has been in the past 10 minutes as
having or not having trafﬁc at that time. Each node of the tree shows the attribute and threshold for
that branch as well as how many instances of each class are present at that node. Instances that are
not underlined refer to trafﬁc being present; underlined instances refer to trafﬁc not being present.
As the data is weighted, P(Y = y_i) is:

P(Y = y_i) = [ Σ_{n=1}^{j} w_n · 1(Y(x_n) = y_i) ] / Σ_{n=1}^{j} w_n

where 1 is the indicator function, which returns 1 if its argument is true and 0 otherwise.
Because we are solving a multi-class classiﬁcation problem, we use the boosting algorithm SAMME [Zhu et al., 2005], a multi-class variant of AdaBoost [Freund and Schapire, 1997]. The algorithm begins by weighting all training examples equally. It builds a simple decision tree (e.g., a decision tree with two levels), and calculates a “goodness” term α for the classiﬁer based on the (weighted) percent of examples that were misclassiﬁed. Next, it increases the weight of the
misclassiﬁed examples and decreases the weight of the correctly classiﬁed examples, and builds another decision tree based on the newly weighted examples. It repeats the process of building decision trees and reweighting the examples until the desired number M of classiﬁers has been built. The
overall classiﬁcation C(x) of an example x is then:

C(x) = argmax_y Σ_{m=1}^{M} α(m) · 1(C_m(x) = y)
This equation, intuitively, means that the classiﬁer returns the classiﬁcation that receives the highest
total of votes from each of the weighted decision trees.
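The SAMME loop and weighted vote described above can be sketched as follows. The `train_weak` argument is a hypothetical weak-learner interface (not part of SAMME itself), and the handling of a zero-error weak learner is our own simplification.

```python
import math

def samme(train_weak, X, Y, classes, M):
    """SAMME boosting sketch. train_weak(X, Y, w) is assumed to return a
    weak classifier (a function x -> class) trained on weighted data."""
    n, K = len(X), len(classes)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, classifier) pairs
    for _ in range(M):
        clf = train_weak(X, Y, w)
        miss = [clf(x) != y for x, y in zip(X, Y)]
        err = sum(wi for wi, m in zip(w, miss) if m) / sum(w)
        if err == 0:
            ensemble.append((1e6, clf))  # perfect weak learner dominates
            break
        if err >= 1 - 1.0 / K:
            break  # no better than guessing; stop boosting
        # "goodness" alpha; log(K - 1) is SAMME's multi-class correction
        alpha = math.log((1 - err) / err) + math.log(K - 1)
        # raise weights of misclassified examples, then renormalize
        w = [wi * (math.exp(alpha) if m else 1.0) for wi, m in zip(w, miss)]
        s = sum(w)
        w = [wi / s for wi in w]
        ensemble.append((alpha, clf))

    def classify(x):
        # each weak classifier casts a vote weighted by its alpha
        votes = {c: 0.0 for c in classes}
        for alpha, clf in ensemble:
            votes[clf(x)] += alpha
        return max(votes, key=votes.get)
    return classify
```

With K = 2 the correction term log(K − 1) vanishes and the sketch reduces to standard AdaBoost.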
Chapter 4
Approach
Probabilistic plan management (PPM) is a composite approach to planning and scheduling under uncertainty that couples the strengths of deterministic and non-deterministic planning, while avoiding their weaknesses. The goal is to manage and execute effective deterministic plans in a scalable
way. As its starting point, PPM assumes a deterministic planner that generates an initial schedule
to execute and replans as necessary during execution. It then layers a probabilistic analysis on top
of the planner to explicitly analyze the uncertainty of the schedules the planner produces. PPM
maintains this probabilistic analysis of the current deterministic schedule as it changes due to both
the dynamics of execution and replanning. Both before and during execution, it uses the analysis
to make the schedule more robust to execution uncertainty, via probabilistic schedule strengthening. It also uses the analysis to manage the execution of the schedule via probabilistic meta-level control, limiting the use of resources to replan and strengthen the schedule during execution while maintaining the effectiveness of the schedule.
In this chapter, we ﬁrst discuss the types of domains that our approach can analyze. We then detail each of the three main components of probabilistic plan management, and discuss how they ﬁt together to probabilistically manage plans.
4.1 Domain Suitability
In the problems we consider, plan execution has an overall, explicit objective. Objectives typically
fall under one of two general categories: all activities must be executed within their constraints;
or reward (or another metric such as cost) is maximized or minimized. The objective can also be
a combination of these two categories. In all cases, the objective must be speciﬁed in terms of a
numeric utility, which quantiﬁes how well the objective was achieved. For example, if all activities must be executed within their constraints, then one possible utility function assigns a utility of 1 to a schedule that does so and a utility of 0 to one that does not. Similarly, the objective of maximizing
reward leads to a utility that simply equals the reward earned during execution. For multi-agent scenarios, we assume agents cooperate to jointly achieve the plan objective. Additionally,
the objective must be affected in some way by the presence of uncertainty in the problem. If it is
not, then our analysis will not provide any information helpful for probabilistically managing plan
execution.
Any domain that is to be considered by our approach must have an explicit model of its uncertainty. No speciﬁc representation of uncertainty is assumed, as long as the distributions representing the uncertainty can be combined, or approximately combined, according to various functions (e.g., sum, max). Two common distribution types (and the two that we consider explicitly in this
document) are normal and discrete distributions. Normal distributions can be very readily summed
to calculate activities’ ﬁnish time distributions; however, they are unable to exactly capture more
complicated functions, such as the ﬁnish time of an activity given that it meets a deadline [Barr
and Sherrill, 1999]. Discrete distributions, on the other hand, allow for the exact calculation of
combinations of distributions; however, the size of the combination of two distributions is bounded
by the product of the size of the two individual distributions.
In this work, we make the assumption that domain representations can be translated into hierarchical activity networks, which are then considered by our analysis (Section 3.2.1). Discussion
of our analysis does presuppose that we are considering planning problems with durative tasks and
temporal uncertainty; such problems are in general difﬁcult to solve, and we show our approach
is effective in such domains. We assume that schedules are represented as STNs. However, the
temporal aspect of the analysis can easily be left out to perform PPM for domains without durative
actions. Our analysis also assumes that plans do not branch; again, however, this assumption could
be broken and the approach extended to include branching or conditional plans. Our approach does
not place any restrictions on what planner is used or how it determinizes the domain, but ideally it
generates good deterministic plans for the problem of interest. This is because we do not integrate
our approach with the planner, but instead layer an uncertainty analysis on top of the schedules it
produces. In particular, we do not assume the planner plans with activity networks; the planner can
work with whatever domain representation is most convenient.
Although a variety of executive systems can be used in our approach, the most suitable would
provide execution results as they become known throughout execution, allowing for more effective
runtime probabilistic plan management. Also, for the purposes of this analysis, we assume that the
executive is perfect (e.g., tasks always start when requested, no latency exists, etc.). If the executive
system broke this assumption often enough, it should be explicitly modeled and incorporated into
the uncertainty analysis to increase PPM’s effectiveness.
For multi-agent domains, we also assume that communication is reliable and has sufﬁcient bandwidth for agent communications; this assumption is not broken in the multi-agent domain we speciﬁcally consider, as all communication is done via message passing over a wired intranet. Ways to limit communication in domains with limited bandwidth are discussed in Sections 5.3.5 and 8.1.1. If communication is unreliable and readily goes down, it would adversely affect our approach, as agents would only intermittently be able to share probability information. However, the effect of communication accidentally going down can be alleviated somewhat by probabilistic schedule strengthening, because agents’ last schedules would have the beneﬁt of having been strengthened.
We discuss this further in Section 6.4.4.
In this thesis, we use PPM in two different domains. One is a single-agent domain where the objective is to execute all tasks within their temporal constraints. There is uncertainty in method duration, which is represented as normal distributions. The second is a distributed, oversubscribed multi-agent domain where the objective is to maximize reward. There are a variety of sources of uncertainty, such as in method durations and outcomes, which are represented as discrete distributions.
4.2 Probabilistic Analysis of Deterministic Schedules
At the core of our approach is a Probabilistic Analysis of Deterministic Schedules (PADS). At
the highest level of abstraction, the algorithm explores all known contingencies within the context
of the current schedule in order to ﬁnd its expected utility. It calculates, for each activity, the
probability of the activity meeting its objective, as well as its expected utility; these values, in
part, make up a probability proﬁle for each activity. PADS culminates in ﬁnding the probability
proﬁle of the taskgroup, which summarizes the expected outcome of execution and provides the
overall expected utility. For domains where the plan objective is to execute all activities, the overall
expected utility is between 0 and 1; trivially, it equals the probability that the plan objective will
be met by executing the schedule. For domains with an explicit notion of reward, the taskgroup
expected utility equals the expected earned reward.
By explicitly calculating a schedule’s expected utility, PADS allows us to distinguish between
the deterministic and probabilistic states of a schedule. We consider the deterministic state to be the
status of the deterministic schedule, such as the deterministically scheduled start and ﬁnish times
of activities. The probabilistic state, on the other hand, tracks activities’ ﬁnish time distributions,
probabilities of meeting deadlines, and expected utility. The deterministic state knows about one (or
a small set of) possible execution paths, whereas the probabilistic state expands that to summarize
all execution paths within the scope of the current schedule. Thus, the deterministic state and the
probabilistic state have very different notions of what it means for something to be “wrong” with
the schedule. In the deterministic state, something is wrong if execution does not follow that one
path (or small set of paths) and an event happens at a non-scheduled time or in a non-planned way; in the probabilistic state, something is wrong if the average utility is below the targeted amount or the likelihood of not meeting the plan objective is unacceptably high.
In general, domains can have a wide variety of sources of uncertainty, which can affect the accomplishment of their objectives. Common examples are uncertainty associated with an activity’s
duration, utility, and outcome (e.g., was the activity successful in achieving its goal or not). These
sources of uncertainty propagate through the schedule, affecting the execution of other scheduled
activities and, ultimately, determining how much utility is earned during execution. The overall
structure of PADS analyzes this, starting with the earliest and least constrained activities and propagating probability values throughout the schedule to ﬁnd the overall expected utility.
PADS can also provide useful information about the relative importance of activities via the
expected utility loss metric. This metric measures the relative importance of an activity by quantifying how much the expected utility of the schedule would suffer if the activity failed, scaled by the activity’s likelihood of failing. This metric is most useful for plan objectives that involve maximizing or minimizing some characteristic of execution; for plan objectives that involve executing all activities, the schedule’s expected utility suffers equally from any activity failing, and so expected utility loss could simply equal the probability of the activity failing.
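As a rough illustration of the metric for reward-maximizing objectives (the function name, arguments, and example numbers here are hypothetical; the thesis computes these quantities per domain via PADS):

```python
def expected_utility_loss(eu_schedule, eu_given_failure, p_failure):
    """Expected utility loss of an activity: the drop in schedule
    expected utility if the activity fails, scaled by how likely
    that failure is."""
    return p_failure * (eu_schedule - eu_given_failure)

# An activity whose failure would cost 40 utility and that fails
# 10% of the time has an expected utility loss of 4:
loss = expected_utility_loss(eu_schedule=100.0,
                             eu_given_failure=60.0,
                             p_failure=0.1)
```

A rarely failing but catastrophic activity and a frequently failing but cheap one can thus be ranked on a single scale.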
In distributed, multi-agent scenarios, PADS cannot be calculated all at once since no agent has
complete knowledge or control over the entire activity hierarchy. In such situations, therefore, the
PADS algorithm is also distributed, with each agent calculating as much as it can for its portion
of the activity network, and communicating the information with others. Calculations such as
expected utility loss, however, are approximated by agents as needed during execution, because
agents cannot calculate changes in expected utility for the entire schedule with only partial views
of the activity network.
PADS can be used as the basis of many potential planning actions or decisions. In this document, we discuss how it can be used to strengthen plans to increase expected utility, and how it can be used as the basis of different meta-level control algorithms that intelligently control the computational resources used to replan during execution. Other potential uses are described in Chapter 8, which discusses future work.
A novelty of our algorithm is how quickly PADS can be calculated and its results utilized to
make plan improvements, due to analytically calculating the uncertainty of a problem in the context of a given schedule. On large reference multi-agent problems with 10–25 agents, the expected utility is globally calculated in an average of 0.58 seconds, which is a very reasonable time for large, complicated multi-agent problems such as these.
4.3 Probabilistic Schedule Strengthening
While deterministic planners have the beneﬁt of being more scalable than probabilistic planners, they make strong, and often inaccurate, assumptions on how execution will unfold. Replanning during execution can effectively counter such unexpected dynamics, but only as long as planners have sufﬁcient time to respond to problems that occur; moreover, there are domains where replanning is always too late to respond to activity failure, as backup tasks would have needed to be set up ahead of time. Additionally, even if the agent is able to sufﬁciently respond to a problem, utility loss may occur due to the last-minute nature of the schedule repair. Schedule strengthening
is one way to proactively protect schedules against such contingencies [Gallagher, 2009]. In this
thesis, we utilize the information provided by PADS to layer an additional probabilistic schedule strengthening step on top of the planning process, anticipating problems that may arise and making targeted local improvements that protect the schedule against them.
The philosophy behind schedule strengthening is that it preserves the existing plan structure
while making it more robust. Probabilistic schedule strengthening improvements are targeted at
activities that have a (high) likelihood of failing to contribute to the plan’s objective and whose
failure negatively impacts schedule utility the most. By minimizing the likelihood of these failures
occurring, probabilistic schedule strengthening increases schedule robustness, thereby raising the
expected utility of the schedule.
Because schedule strengthening works within the existing plan structure, its effectiveness can
be limited by the original goodness of the schedule. If a schedule has low utility to begin with, for
example, probabilistic schedule strengthening will not be that useful; if a schedule has high utility but a high chance of not meeting its objective or earning that utility, probabilistic schedule
strengthening can help to decrease the risk associated with the schedule and raise the schedule’s
expected utility. Probabilistic schedule strengthening improves the expected outcome of the current
schedule, not the schedule itself.
The effectiveness of schedule strengthening also depends on the structure of the domain, and
on knowledge of that structure, as the speciﬁc actions that can be used to strengthen these activities
depend on the domain’s activity network. Domains well-suited to schedule strengthening ideally
have a network structure that allows for ﬂexibility in terms of activity redundancy, substitution,
or constraints; without this ﬂexibility, it would be difﬁcult to ﬁnd ways to make schedules more
robust. We list here example strengthening actions that can be applied to a variety of domains:
1. Adding a redundant activity – Scheduling a redundant, backup activity under a parent OR
task. This protects against a number of contingencies, such as an activity having a failure
outcome or not meeting a deadline. However, it can also decrease schedule slack, potentially
increasing the risk that other activities will violate their temporal constraints.
2. Switching scheduled activities – Replacing an activity with a sibling under a parent OR or
ONE task, with the goal of shortening activity duration or lowering a priori failure likelihood.
This may result, however, in a lower utility activity being scheduled.
3. Decreasing effective activity duration – Adding an extra agent to assist with the execution
of an activity to shorten its effective duration [Sellner and Simmons, 2006] or to facilitate
its execution in some other way. This decreases the chance that the activity will violate a
temporal constraint such as a deadline; however, adding agents to activities or facilitating
them in another way may decrease schedule slack and increase the likelihood other activities
miss their deadline.
4. Removing an activity – Unscheduling an activity. Although an unexecuted activity does not
contribute to a schedule’s utility, removing activities increases schedule slack, potentially
increasing the likelihood that remaining activities meet their deadlines.
5. Reordering activities – Changing the execution order of activities. This has the potential to
order activities so that constraints and deadlines are more likely to be met; however, it might
also increase the likelihood that activities miss their deadlines or violate their constraints.
PADS allows us to explicitly address the tradeoffs inherent in each of these actions to ensure
that all modiﬁcations make improvements with respect to the domain’s objective. After any given
strengthening action, the expected utility of the plan is recalculated to reﬂect the modiﬁcations
made. If expected utility increases, the strengthening action is helpful; otherwise it is not helpful
and the action can be rejected and undone.
During overall probabilistic schedule strengthening, activities are prioritized for strengthening
by expected utility loss so that the most critical activities are strengthened ﬁrst. Strengthening
actions can be prioritized by their effectiveness: in general, the most effective actions preserve the
existing schedule structure (such as (1) and (3)); the least effective do not (such as (4)).
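The accept-or-undo loop described above might be sketched as follows, assuming hypothetical `pads_eu` and `candidate_actions` interfaces; the real system's bookkeeping is richer than this.

```python
def strengthen(schedule, pads_eu, candidate_actions):
    """Greedy probabilistic schedule strengthening sketch.
    pads_eu(schedule) returns the schedule's expected utility (PADS);
    candidate_actions(activity) yields (apply, undo) closures for the
    domain's strengthening actions. All names here are illustrative."""
    # Prioritize the activities whose failure hurts expected utility most.
    for act in sorted(schedule.activities,
                      key=lambda a: a.expected_utility_loss,
                      reverse=True):
        for apply_action, undo_action in candidate_actions(act):
            before = pads_eu(schedule)
            apply_action()
            if pads_eu(schedule) <= before:
                undo_action()  # no improvement: reject and undo
    return schedule
```

Because every modification is re-evaluated through PADS, the trade-offs of each action (e.g., a redundant activity consuming slack) are weighed against the objective automatically.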
During execution, probabilistic schedule strengthening can be performed at any time to robustify the schedule as execution proceeds. Ultimately, however, it is best when coupled with replanning to build effective, robust schedules. Replanning during execution allows agents to generate schedules that take advantage of opportunities that may arise; schedule strengthening in turn makes those schedules more robust to unexpected or undesirable outcomes, increasing schedule expected utility.
On reference problems in the multi-agent domain, we have shown that probabilistically strengthened schedules are signiﬁcantly more robust, and have an average of 32% more expected utility than their initial, deterministic counterparts. Using probabilistic schedule strengthening in conjunction with replanning during execution results, on average, in 14% more utility earned than replanning alone during execution.
4.4 Probabilistic Meta-Level Control
Replanning during execution provides an effective way of responding to the unexpected dynamics
of execution. There are clear tradeoffs, however, in replanning too much or too little during
execution. On one hand, an agent could replan in response to every update; however, this could
lead to lack of responsiveness and a waste of computational resources. On the other hand, an agent
could limit replanning, but run the risk of not meeting its goals because errors were not corrected
or opportunities were not taken advantage of.
Ideally, an agent would always and only replan (or take some other planning action, like strengthening) when it was useful: either to take advantage of opportunities that arise or to minimize problems and/or conﬂicts during execution. At other times, the agent would know that the schedule had enough ﬂexibility – temporal or otherwise – to absorb unexpected execution outcomes as they arise without replanning. The question of whether replanning would be beneﬁcial, however, is difﬁcult to answer, in large part because one cannot predict the beneﬁt of a replan without actually performing it.
As an alternative to explicitly answering the question, we use our probabilistic analysis to
identify times where there is potential for replanning to be useful, either because there is a potential
new opportunity to leverage or a potential failure to avoid. The probabilistic control algorithm thus
distinguishes between times when it is likely necessary (or beneﬁcial) to replan, and times when it
is not. This eliminates potentially unnecessary and expensive replanning, while also ensuring that
replans are done when they are probably necessary or may be useful.
There are several different ways to use PADS to drive the meta-level decision of when to replan.
For example, strategies can differ in which activity probability proﬁles they consider when deciding
whether to replan. Which component of a probability proﬁle is utilized (e.g., expected utility,
probability of meeting objective, etc.) can also vary between strategies. Additionally, different
strategies can either consider activities’ proﬁles on their own, or look at how the proﬁles change
over time.
When designing meta-level control strategies, the objective of the plan can help guide the choice of the best strategy. For planning problems with a clear upper bound on utility, such as ones in which the objective is to execute all or a set of activities within their temporal constraints, a good strategy is to replan whenever the expected utility of the entire schedule falls below a threshold. For oversubscribed problems with the goal of maximizing reward, the maximum feasible utility is typically not known, and a better strategy might be a learning-based approach that learns when to replan based on how activities’ expected utilities change over time.
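The threshold strategy for bounded-utility problems is simple enough to state directly in code; the names and numbers below are illustrative.

```python
def should_replan(schedule_eu, threshold):
    """Threshold-based meta-level control sketch for problems with a
    known upper bound on utility: replan only when PADS reports that
    the schedule's expected utility has fallen below the threshold."""
    return schedule_eu < threshold

# With the "execute all activities" objective, expected utility equals
# the probability of meeting the objective, so the threshold is in [0, 1]:
risky = should_replan(0.72, threshold=0.9)   # too risky: replan now
safe = should_replan(0.97, threshold=0.9)    # enough slack: keep executing
```

The threshold trades computation for robustness: a higher threshold triggers more replans but tolerates less risk.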
Whether the approach is centralized or distributed can also affect the choice of a control strategy. For centralized applications, looking at the plan as a whole may sufﬁce for deciding when to replan. For distributed applications, however, when agents decide to replan independently of one another, the plan as a whole may be a poor basis for a control algorithm because the state of the joint plan does not necessarily directly correspond to the state of an agent’s local schedule and activities. Instead, in distributed cases, it may be more appropriate to consider the local probabilistic state, such as the probability proﬁles of methods in an agent’s local schedule, when deciding whether to replan.
The control strategy can also be used to limit not only the number of replans, but also the amount of the current schedule that is replanned. In situations with high uncertainty, always replanning the entire schedule means that tasks far removed from what is currently being executed are replanned again and again as the near term causes conﬂicts, potentially wasting computational effort. To this end, we introduce an uncertainty horizon, which uses PADS to distinguish between portions of the schedule that are likely to need to be replanned at the current time, and portions that are not. This distinction is made based on the probability that activities beyond the uncertainty horizon will need to be replanned before executing them.
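One way such a horizon could be computed, assuming PADS supplies a per-activity probability of needing to be replanned before execution; the list-of-pairs representation here is our own illustration, not the thesis's data structure.

```python
def uncertainty_horizon(scheduled, p_threshold):
    """Sketch of an uncertainty horizon: walk the schedule in execution
    order and stop at the first activity whose probability of being
    replanned before it executes exceeds the threshold. Only activities
    inside the horizon are included in the current replan."""
    horizon = []
    for act, p_replan in scheduled:
        if p_replan > p_threshold:
            break
        horizon.append(act)
    return horizon

# Far-future activities with high replan probability are left alone:
plan = [('m1', 0.05), ('m2', 0.1), ('m3', 0.6), ('m4', 0.8)]
near_term = uncertainty_horizon(plan, p_threshold=0.5)
```

Activities beyond the cutoff are likely to be replanned again anyway, so excluding them now avoids repeated wasted effort.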
Experiments in simulation for the single-agent domain have shown a 23% reduction in schedule execution time (where execution time is deﬁned as the schedule makespan plus the time spent replanning during execution) when probabilistic meta-level control is used to limit the amount of replanning and to impose an uncertainty horizon, versus when the entire plan is replanned whenever a schedule conﬂict arises. Experiments in the multi-agent domain show a 49% decrease in plan management time when our probabilistic meta-level control strategy is used, as compared to a plan management strategy that replans according to deterministic heuristics, with no signiﬁcant change
in utility earned.
In PPM, all these components come together to probabilistically and effectively manage plan execution. A ﬂow diagram of PPM is shown in Figure 4.1. At all times, PADS calculates and updates probabilistic information about the current schedule as execution progresses. The default state of an agent is a “wait” state, where it waits to hear of changed information, either from the simulator specifying the results of execution or, in multi-agent scenarios, messages from other agents specifying changes to their probabilistic information. Any time such an update is received, the agent updates its probability proﬁles via PADS (described in Chapter 5). Then, in order to control the amount of replanning, probabilistic meta-level control (PMLC in Figure 4.1, described in Chapter 7) decides whether to replan (or probabilistically strengthen the schedule) in response to the changes in the agent’s probability proﬁles. Probabilistic meta-level control makes this decision with the goal of maintaining high utility schedules while also limiting unnecessary replanning. Each time that a replan is performed, PADS ﬁrst updates the probability proﬁles of the new schedule, and then a probabilistic schedule strengthening pass is performed (PSS in Figure 4.1, described in Chapter 6) in order to improve the expected utility of the new schedule. Finally, in multi-agent scenarios, agents communicate these changes to other agents. Note that this diagram illustrates only the ﬂow of PPM in order to demonstrate its principles. It does not, for example, show the ﬂow of information to or from agents’ executive systems.
Overall, the beneﬁt of PPM is clear. When all components of PPM are used, the average utility
earned in the multi-agent domain increases by 14% with an average overhead of only 18 seconds
per problem, which corresponds to only 2% of the total time of execution.
In the next chapter, we discuss the PADS algorithm for both the single-agent and multi-agent domains. Chapter 6 details our probabilistic schedule strengthening approach, while Chapter 7 explains in depth the probabilistic meta-level control strategy. Finally, Chapter 8 presents our
conclusions and future work.
Figure 4.1: Flow diagram for PPM.
Chapter 5
Probabilistic Analysis of Deterministic
Schedules
This chapter discusses in detail how to perform a Probabilistic Analysis of Deterministic Schedules (PADS), which is the basis of our probabilistic plan management approach. We developed versions
of this algorithm for both of our domains of interest. Section 5.1 provides an overview of how
the probability analysis is performed in the two domains we consider, while Sections 5.2 and 5.3
provide details about the algorithms. The casual reader, after reading the overview section, can
safely pass over the two more detailed sections that follow. Finally, Section 5.4 presents PADS
experimental results.
5.1 Overview
The goal of the PADS algorithm is to calculate, for each activity in a problem's activity network, a probability profile, which includes the activity's probability of meeting its objective (po) and its expected utility (eu). The algorithm propagates probability profiles through the network, culminating in finding the probability profile of the taskgroup, which summarizes the expected outcome of execution and provides the overall expected utility.

For domains with temporal constraints like the two we consider, a method's finish time distribution (FT) is typically necessary for calculating the probability that the method meets the plan objective; in turn, its start time distribution (ST) and duration distribution are necessary for calculating its finish time distribution. Along with the method's probability of meeting its objective
and expected utility, we call these values the method's probability profile. In general, in this and following chapters we use uppercase variables to refer to distributions or sets and lowercase variables to refer to scalar values. We also use lowercase variables to refer to methods or activities, but uppercase variables to refer specifically to tasks.

Algorithm 5.1 provides the general, high-level structure for calculating a method's probability profile. It can be adjusted or instantiated differently for different domain characteristics, as we will see in Sections 5.2 and 5.3, when we discuss PADS in detail for the two domains.
Algorithm 5.1: Method probability profile calculation.
Function: calcMethodProbabilityProfile(m)
// calculate m's start time distribution
ST(m) = calcStartTimes(m)                        1
// calculate m's finish time distribution
FT(m) = calcFinishTimes(m)                       2
// calculate m's probability of meeting its objective
po(m) = calcProbabilityOfObjective(m)            3
// calculate m's expected utility
eu(m) = calcExpectedUtility(m)                   4
The order in which activities' profiles are calculated is based on the structure of activity dependencies in the activity network. For example, a method is dependent on its predecessor and its prerequisites; correspondingly, its profile depends on the profiles of those activities and cannot be calculated until they are. Once these profiles are known, a method's start time distribution can be calculated (Line 1). Typically, calculating a method's start time distribution involves finding the maximum of its earliest start time, est, and its predecessor's and prerequisite activities' finish time distributions, as it cannot start until those activities finish. A method's finish time distribution is found next (Line 2), and generally involves summing its start time distribution and duration distribution.
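These two steps (start time as a max, finish time as a sum) can be sketched with discrete distributions; the dict representation and helper names below are illustrative, not the thesis implementation:

```python
# Sketch of start/finish time propagation for one method, with discrete
# distributions represented as {value: probability} dicts (illustrative).

def max_with_constant(dist, est):
    """Start time: max of the method's est and its predecessor's finish time."""
    out = {}
    for t, p in dist.items():
        t2 = max(t, est)
        out[t2] = out.get(t2, 0.0) + p
    return out

def add_dists(a, b):
    """Finish time: start time plus duration (convolution of independent dists)."""
    out = {}
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[ta + tb] = out.get(ta + tb, 0.0) + pa * pb
    return out

# A method with est = 10, a predecessor finishing at 8 or 12, duration 3 or 5:
pred_ft = {8: 0.5, 12: 0.5}
st = max_with_constant(pred_ft, 10)    # {10: 0.5, 12: 0.5}
ft = add_dists(st, {3: 0.5, 5: 0.5})   # {13: 0.25, 15: 0.5, 17: 0.25}
```

The same pattern generalizes to any discrete ST/DUR representation; with normal distributions, as in the single-agent domain, the convolution reduces to summing means and variances.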
At any point during these calculations, a method may be found to have a nonzero probability
of violating a constraint or otherwise not meeting its objective, leading to a failure. These cases
are collected and combined to ﬁnd m’s probability of meeting its objective (Line 3). Similarly, if
a method is found to have a nonzero a priori probability of failing, such information should be
incorporated into the po (m) calculation.
Finally, a method’s expected utility is calculated (Line 4). Its expected utility depends on
po (m) and its utility, as well as any other dependencies (such as soft NLEs) that may affect it.
Its utility depends on the plan objective. If the plan objective is to execute all scheduled methods,
individual methods should all have the same utility; if the plan objective includes an explicit model
of reward or utility, a method’s utility is deﬁned by that.
Probability profiles are found differently for tasks. The general, high-level algorithm is shown in Algorithm 5.2. For tasks, start time distributions are ignored, as they are not necessary for calculating a task's finish time distribution; instead, a task's finish time distribution depends on the finish time distributions of its children. A task's finish time distribution needs to be explicitly calculated only if there are direct relationships between it (or its ancestors) and other activities; however, we do not show that check here. The algorithm calculates a task's probability profile based on whether the task is an AND, OR or ONE task.
Algorithm 5.2: Task probability profile calculation.
Function: calcTaskProbabilityProfile(T)
switch type of task T do                               1
    case AND                                           2
        FT(T) = calcANDTaskFinishTimes(T)              3
        po(T) = calcANDTaskProbabilityOfObjective(T)   4
        eu(T) = calcANDTaskExpectedUtility(T)          5
    case OR                                            6
        FT(T) = calcORTaskFinishTimes(T)               7
        po(T) = calcORTaskProbabilityOfObjective(T)    8
        eu(T) = calcORTaskExpectedUtility(T)           9
    case ONE                                           10
        if |scheduledChildren(T)| == 1 then            11
            c = soleelement(scheduledChildren(T))      12
            FT(T) = FT(c)                              13
            po(T) = po(c)                              14
            eu(T) = eu(c)                              15
        else                                           16
            po(T) = eu(T) = 0                          17
        end                                            18
end                                                    19
For AND and OR tasks, the algorithm does not explicitly check the number of children that are
scheduled. That is because we assume that unscheduled tasks have a po and eu of 0. This property
means that the equations in the algorithm evaluate to the desired values without such a check. In
this algorithm, the function scheduledChildren returns the scheduled children of a task.
An AND task’s ﬁnish time distribution includes calculating the max of its children’s, as it does
not meet its plan objective until all children do (Line 3). Similarly, its probability of meeting the
plan objective should equal the probability that all children do (Line 4). An AND task’s expected
utility is generally a function of the utility of its children, multiplied by the probability that it meets
its objective; this function depends on the domain speciﬁcs (such as what the task’s uaf is) (Line 5).
An OR task’s ﬁnish time distribution includes calculating the min of its scheduled children’s,
as it meets its plan objective when its ﬁrst child does (Line 7). Its probability of meeting the plan
objective is the probability that at least one child does (which is equivalent to the probability that
not all scheduled children do not) (Line 8). An OR task’s expected utility is generally a function
of the utility of its children; this function depends on the domain speciﬁcs (such as what the task’s
uaf is) (Line 9).
Trivially, ONE tasks' profiles equal those of their sole scheduled child. Typically, planners do not schedule more than one child under a ONE task; therefore, we omit considering, for cases where more than one child is scheduled, the probability that exactly one succeeds, and instead say that in such a case the task has an expected utility of 0.

Algorithms 5.1 and 5.2, put together, enable the expected utility of the entire schedule to be found. In subsequent equations, we refer to this as eu(TG), where TG is the taskgroup.
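As a sketch, the three task types combine their children's po values as follows when the children are assumed independent (the multi-agent version instead tracks shared events, as Section 5.3 describes; the function here is hypothetical):

```python
# Illustrative combination of children's po values by task type, assuming
# the children's outcomes are independent.

def task_po(task_type, child_pos):
    if task_type == "AND":       # all children must meet the objective
        prob = 1.0
        for p in child_pos:
            prob *= p
        return prob
    if task_type == "OR":        # at least one child must (complement rule)
        prob_none = 1.0
        for p in child_pos:
            prob_none *= (1.0 - p)
        return 1.0 - prob_none
    if task_type == "ONE":       # exactly one scheduled child, else 0
        return child_pos[0] if len(child_pos) == 1 else 0.0
    raise ValueError(task_type)

task_po("AND", [0.9, 0.8])   # ~0.72: both children must succeed
task_po("OR", [0.9, 0.8])    # ~0.98: 1 - (0.1 * 0.2)
task_po("ONE", [0.9])        # 0.9: sole scheduled child's po
```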
5.1.1 Single-Agent PADS
Recall the representation of the single-agent domain: at the root is the AND taskgroup, and below it are the ONE tasks, each with their possible child methods underneath. That is, the plan objective is to execute one child of each task within its imposed temporal constraints. Trivially, therefore, po(a) equals the probability that an activity a meets this objective. We define plan utility for this domain to be 1 if all tasks have one child method that executes within its constraints, and 0 otherwise. Due to this simple plan structure, PADS for this domain is fairly intuitive. It moves linearly through the schedule, calculating the probability profile of each method and passing it up to its parent, until all tasks' profiles have been found and the expected utility of the schedule is known.
5.1.2 Multi-Agent PADS
The overall objective of our multi-agent domain is to maximize utility earned. When calculating probability profiles for this domain, we define po(a) to equal the probability that an activity a earns positive utility; its utility is trivially its domain-defined utility.
This domain is inherently more complex than our single-agent domain, leading to a number of extensions to the simple algorithms and equations presented above. First, the calculations of tasks' expected utilities depend on the tasks' uaf types, not just their node types (i.e., AND, OR or ONE). Second, the disables, facilitates and hinders relationships need to be factored in at various stages of the calculations as appropriate: disablers affect whether a method starts, while facilitators and hinderers affect its finish time and expected utility calculations. The analysis begins at the lowest method level, and propagates profiles both forward and upward through agents' schedules and the activity hierarchy until all activities' (including the taskgroup's) probability profiles have been calculated and the schedule's expected utility is known.
We have developed two versions of PADS for this domain. The first is intended to be performed offline and centrally, in order to perform plan management before the initial global schedule is distributed to agents. The second version is used for probabilistic plan management during distributed execution. When PADS is performed globally, calculations begin with the first method(s) scheduled to execute, and later activities' profiles can be calculated as soon as the profiles of all activities they depend on have been determined. In the distributed case, the same basic propagation structure takes place; however, agents can update only portions of the activity network at a time, and must communicate (via message passing) relevant activity probability profiles to other agents as the propagation progresses. Figure 5.1 shows a schedule, and the resulting activity dependencies and profile propagation order, for the small example shown in Figure 3.5, on page 39. Figure 5.2 shows how the profile propagation is done in a distributed way. Although it is not illustrated in the diagram for visual clarity, after agent Buster finishes its calculations, it would pass the profiles of T2 and the taskgroup to agent Jeremy, as Jeremy has remote copies of them.
5.2 Single-Agent Domain
We have implemented PADS for a single-agent domain [Hiatt and Simmons, 2007]. Recall the characteristics of this domain: the plan's objective is that all tasks, each with exactly one method underneath them, execute without violating any temporal constraints. The only source of uncertainty is in method duration, and temporal constraints are in the form of minimal and maximal time lags between tasks, as well as a final deadline. We use a fixed-times planner to generate schedules, and we expand solutions to be flexible-times by translating them into an STN.

Recall from our discussion of STNs (Section 3.2.2, on page 31) that if a method does not begin by its latest start time, lst, the STN will become inconsistent. In the STN representing the deterministic schedules of this domain, time bounds are based only on methods' scheduled durations, i.e., the edge representing the duration constraint specifies lower and upper bounds of
Figure 5.1: The activity dependencies (denoted by solid arrows) and propagation order (denoted by circled numbers) that result from the schedule shown for the problem in Figure 3.5.
Figure 5.2: The same propagation as Figure 5.1, but done in a distributed fashion. Jeremy first calculates the profiles for its activities, m1 and T1, and passes the profile of T1 to Buster; Buster can then finish the profile calculation for the rest of the activity network.
[dur_m, dur_m] (see Section 3.2.2, on page 31, for this description). One implication of this is that, if a method misses its lst, the STN will become inconsistent; however, the schedule may still execute successfully, such as if the method begins after its lst but executes with a duration less than dur_m. Ideally, this contingency would be captured by the probabilistic analysis.

To do so, before performing its analysis, PADS expands the duration edges in the STN to include the range of possible integer durations for methods, bounded by

[max(1, µ(DUR(m)) − 3σ(DUR(m))), µ(DUR(m)) + 3σ(DUR(m))]

This covers approximately 99.7% of methods' duration distributions. We use integer durations because methods' executed durations are rounded to the nearest integer during simulation (Section 3.3.1). By expanding method durations in the STN, we can interpret the resulting time bounds differently. Now, if a method does not start by its revised lst, it is almost certain that the schedule cannot be successfully executed as-is. This allows PADS to more accurately calculate the schedule's probability of succeeding. We use a "∆" symbol to denote that we are using this modified STN: we refer to it as STN∆, and time bounds generated from it are est∆, lst∆, etc.
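A minimal sketch of this expansion, assuming durations are rounded to the nearest integer as in the simulator (the helper name and rounding convention are ours):

```python
# Integer duration bounds covering roughly 99.7% of a normal duration
# distribution, with the lower bound clamped to 1 (illustrative helper;
# the nearest-integer rounding convention is an assumption).

def expanded_duration_bounds(mu, sigma):
    lo = max(1, round(mu - 3 * sigma))
    hi = round(mu + 3 * sigma)
    return lo, hi

expanded_duration_bounds(10, 2)    # (4, 16)
expanded_duration_bounds(2, 1.5)   # (1, 6): lower bound clamped to 1
```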
5.2.1 Activity Dependencies
In this domain, since all tasks need to be executed within their constraints, the probability that the schedule meets its objective for a problem with T tasks is po(TG) = po(m_1, ..., m_T), where po(m_i) is the probability that method m_i executes without violating any constraints.

The probabilities po(m_1), ..., po(m_T) are not independent. The successful execution of any method depends on the successful execution of its predecessors, because if a predecessor does not execute successfully (for example, by violating its lft∆), then execution of the current schedule no longer has any possibility of succeeding. Conditional probability allows us to rewrite the equation as:

po(m_1, ..., m_T) = ∏_{i=1}^{T} po(m_i | m_1, ..., m_{i−1})

That is, a method's po value should equal the probability that it executes within all of its constraints given that all previous methods do as well. The schedule's po value is then calculated by multiplying the po values of the individual ONE tasks, which equal the po values of their sole scheduled children, as shown in Section 5.1. Note that we can avoid explicit consideration of prerequisite relationships: they are implicitly covered by the requirement that all of a method's predecessors successfully execute before it does, since the agent can execute only one thing at a time.
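The chain-rule decomposition can be sketched directly; the function below is illustrative and simply takes the conditional po values as inputs:

```python
# The schedule's po as the product of each method's conditional probability
# of success given that all prior methods succeed (illustrative helper).

def schedule_po(conditional_pos):
    """conditional_pos[i] = po(m_i | m_1, ..., m_{i-1})."""
    prob = 1.0
    for p in conditional_pos:
        prob *= p
    return prob

schedule_po([0.95, 0.9, 0.99])   # ~0.846: even high per-method values compound
```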
5.2.2 Probability Proﬁles
As the structure of the entire activity network is standard for all problems in this domain, we are
able to present a closed-form PADS algorithm. Algorithm 5.3 shows the pseudocode for this. Note
how this algorithm provides domainspeciﬁc implementations of the functions in Algorithm 5.1,
and reuses its calculation of ONE tasks’ probability proﬁles. We do not need to calculate the
ﬁnish time distributions of the ONE tasks, because neither they nor their ancestors have any direct
dependencies with other activities. Since the distributions in this domain are represented as normal
distributions, we present them in terms of their mean and variance. Distributions are combined
according to the equations presented in Section 3.3.1 (on page 35).
The function schedule in this and later algorithms returns the agent’s current schedule, with
methods sorted by their scheduled start time, and children and scheduledChildren return
the children and scheduled children of a task, respectively. The algorithm iterates through each
scheduled method and ﬁnds its probability of violating temporal constraints given that all previous
methods successfully complete. As with Algorithm 5.1, it ﬁrst calculates the method’s start time
distribution. We use the truncate function to approximate the max function (see Section 3.3.1 for specifics). It follows that its approximate start time distribution is its predecessor's finish time distribution, given that it succeeds, truncated at its lower bound by est∆(m_i) (Line 2); as stated above, we can avoid explicit consideration of prerequisite relationships in our calculations.
The method's finish time distribution is the sum of its start time and duration distributions (Line 3). The probability of the method ending at a valid time equals the probability of it ending in the interval defined by its earliest finish time eft∆ and its latest finish time lft∆ (Line 4), as reported by the STN∆. This value is m_i's po value, assuming all prior methods have finished successfully.

Assigning each method an arbitrary constant utility of 1, m_i's expected utility equals po(m_i) (Line 5). Next, the method's finish time distribution is truncated to approximate its finish time distribution given that it succeeds, FT_s, in order to pass that forward to future methods (Line 6). It is not necessary to find m_i's failure finish time distribution, as if any method fails, the schedule cannot execute successfully as-is.

After the profiles of all methods have been found, they are passed up to the task level. All tasks are ONE tasks, so their profiles equal that of their sole scheduled child (Lines 9-11); if no child (or more than one child) is scheduled, the task's po and eu values are 0 (Lines 12-14). Finally, after this has been done for all tasks, the individual task po values can be multiplied to find the overall probability of the schedule meeting its objective (Line 17). This, trivially, equals the schedule's expected utility (Line 18).
Algorithm 5.3: Taskgroup expected utility calculation for the single-agent domain with normal duration distributions and temporal constraints.
Function: calcExpectedUtility
foreach method m_i ∈ schedule do                                             1
    // calculate m_i's start time distribution
    ST(m_i) = truncate(est∆(m_i), ∞, µ(FT_s(m_{i−1})), σ(FT_s(m_{i−1})))     2
    // calculate m_i's finish time distribution
    FT(m_i) = ST(m_i) + DUR(m_i)                                             3
    // calculate m_i's probability of meeting its objective
    po(m_i) = normcdf(lft∆(m_i), µ(FT(m_i)), σ(FT(m_i))) −
              normcdf(eft∆(m_i), µ(FT(m_i)), σ(FT(m_i)))                     4
    // calculate m_i's expected utility
    eu(m_i) = po(m_i)                                                        5
    // adjust the FT distribution to be m_i's FT distribution given it succeeds
    FT_s(m_i) = truncate(eft∆(m_i), lft∆(m_i), µ(FT(m_i)), σ(FT(m_i)))       6
end                                                                          7
// calculate ONE tasks' profiles
foreach T ∈ children(TG) do                                                  8
    if |scheduledChildren(T)| == 1 then                                      9
        po(T) = po(soleelement(T))                                           10
        eu(T) = eu(soleelement(T))                                           11
    else                                                                     12
        po(T) = 0                                                            13
        eu(T) = 0                                                            14
    end                                                                      15
end                                                                          16
po(TG) = ∏_{T ∈ children(TG)} po(T)                                          17
eu(TG) = po(TG)                                                              18
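A runnable sketch of this calculation under the definitions above. The truncate approximation is realized here by moment-matching a normal distribution to the truncated one; this is one plausible reading of the truncate function, and the method dictionaries (with STN∆ bounds est, eft, lft and duration parameters) are an illustrative data layout:

```python
import math

def normcdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def truncate(lo, hi, mu, sigma):
    """Fit a normal to N(mu, sigma^2) truncated to [lo, hi] by moment
    matching; hi=None means unbounded above. Returns (mean, stddev)."""
    phi = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    a = (lo - mu) / sigma
    phi_a, Phi_a = phi(a), Phi(a)
    if hi is None:
        phi_b, Phi_b, b_phi_b = 0.0, 1.0, 0.0
    else:
        b = (hi - mu) / sigma
        phi_b, Phi_b = phi(b), Phi(b)
        b_phi_b = b * phi_b
    Z = Phi_b - Phi_a
    mean = mu + sigma * (phi_a - phi_b) / Z
    var = sigma ** 2 * (1.0 + (a * phi_a - b_phi_b) / Z - ((phi_a - phi_b) / Z) ** 2)
    return mean, math.sqrt(max(var, 1e-12))

def calc_expected_utility(schedule):
    """schedule: methods in execution order; each is a dict with STN-derived
    bounds 'est', 'eft', 'lft' and duration parameters 'dur_mu', 'dur_sigma'."""
    po_tg = 1.0
    ft_s = None  # predecessor's finish time given success, as (mean, stddev)
    for m in schedule:
        # Line 2: start time = max(est, predecessor finish), via truncation;
        # the first method simply starts at its est
        st = (float(m["est"]), 0.0) if ft_s is None else truncate(m["est"], None, *ft_s)
        # Line 3: finish time = start + duration (sum of independent normals)
        ft = (st[0] + m["dur_mu"], math.hypot(st[1], m["dur_sigma"]))
        # Line 4: probability of finishing within [eft, lft]
        po = normcdf(m["lft"], *ft) - normcdf(m["eft"], *ft)
        po_tg *= po  # Lines 5, 17, 18: with unit utilities, eu(TG) = po(TG)
        # Line 6: finish time given success, passed to the next method
        ft_s = truncate(m["eft"], m["lft"], *ft)
    return po_tg

# Hypothetical two-method schedule:
po = calc_expected_utility([
    {"est": 0, "eft": 5, "lft": 12, "dur_mu": 10, "dur_sigma": 1.0},
    {"est": 10, "eft": 12, "lft": 25, "dur_mu": 8, "dur_sigma": 2.0},
])
```

Each method's po is computed against bounds that already condition on all prior methods succeeding, so the running product is exactly the chain-rule decomposition of Section 5.2.1.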
Before execution begins, this algorithm is used to find the probability profiles of the entire schedule. During execution, feedback of execution results, i.e., methods' actual, executed durations, is asserted into the STN as it becomes known. As PADS continues to update its probability values, it skips over activities that have already executed and finds profiles only for activities that have not yet completed. These profiles are used as part of a probabilistic meta-level control algorithm, which will be described in Section 7.2.
5.3 Multi-Agent Domain
This section discusses the implementation of PADS for the multi-agent domain we consider [Hiatt et al., 2008, 2009]. Recall that the goal of this domain is to maximize utility earned. We say, therefore, that the probability of meeting the plan objective equals the probability of earning positive utility. The multi-agent domain has many sources of uncertainty and a complicated activity structure. We assume a partial-order deterministic planner exists, which outputs schedules represented as STNs. Unlike the single-agent domain, however, when performing PADS we do not expand the STN to an STN∆ that includes all possible method durations; this design decision was made to facilitate integration with the existing architecture [Smith et al., 2007].

The distributions we work with in this domain are discrete. In its algorithms, PADS combines the distributions according to the equations defined in Section 3.3.2 (on page 44). As part of the algorithms it also iterates through distributions. We use the foreach loop construct to designate this; e.g., "foreach ft ∈ FT(m)" iterates through each item ft in the distribution FT(m). Inside the loop, PADS will also refer to p(ft); this is the probability of ft occurring according to FT.
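With distributions as value-to-probability maps, the foreach construct corresponds directly to iterating over such a map; the dict representation here is illustrative, shown computing an expected finish time:

```python
# "foreach ft in FT(m)", with p(ft) given by the dict value (illustrative).

FT_m = {12: 0.25, 14: 0.5, 16: 0.25}   # a hypothetical finish time distribution

expected_finish = 0.0
for ft, p_ft in FT_m.items():          # ft is an item; p_ft is p(ft)
    expected_finish += ft * p_ft

expected_finish                        # 14.0
```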
5.3.1 Activity Dependencies
Activity dependencies and po calculations need to be handled differently for this domain than for the single-agent domain. First, there is a more complicated dependency structure, due to more types of NLEs and, more specifically, NLEs between activities on different agents' schedules. There are also more ways for activities to fail than violating their temporal constraints, including other activities explicitly disabling them. Finally, because of the varying activity network structure and dependencies between activities, we do not present a closed-form formula representing po(TG).

When calculating activity finish time profiles for this domain, we make the simplifying assumption that the finish times of a method's non-local NLE sources are independent of one another (e.g., a method's enabler is different from its disabler or facilitator); this assumption is rarely broken
in practice (e.g., an enabler rarely also serves as a facilitator). For tasks, we also assume that all children's finish time distributions are independent of one another. This assumption fails if, for example, both children are enabled by the same enabler. This approximation was found to be accurate enough for our purposes, however, and the extra computational effort required to sort out these dependencies was deemed unnecessary (see Section 5.4); we also show an example supporting this decision at the end of this section.
We do not, however, make this type of independence assumption when calculating po values, as it is too costly with respect to accuracy. To see why, consider the activity network and associated schedule shown in Figure 5.1 (on page 64). It is clear from this example that po(T1) = 0.8, because its only scheduled child is m1, and m1 fails with only a 20% likelihood. Both m3 and m4 are enabled by T1. Therefore, if neither has a probability of missing its deadline, po(m3) = 0.8 and po(m4) = 0.8, as both can earn utility only if T1 does. Assuming that these values are independent means that po(T2) = 0.8 · 0.8 = 0.64, when in fact it should equal 0.8; essentially, we are double-counting T1's probability of failing in po(T2). We deem this error unacceptably high, and so put forth the computational effort to keep track of these dependencies.
To track the dependencies, our representation of po for all activities is based on a utility formula
UF (a), which indicates what events must occur during execution for activity a to earn utility and
how to combine the events (e.g., at least one needs to happen, or all must happen, etc.) into an
overall formula of earning utility. We deﬁne events as possible contingencies of execution, such as
a method ending in a failure outcome, a method’s disabler ending before the method begins, or a
method missing its deadline. The formula assigns probabilities to each individual event, and can
be evaluated to ﬁnd an overall probability of earning utility.
As an example of how the utility formula is handled, consider again the example in Figure 5.1. We define UF(T1) = (util_m1, (util_m1 ← 0.8)). This formula says, intuitively, that "m1 must execute with positive utility for T1 to earn utility; this happens with a probability of 0.8." Formally, UF formulas have two parts: the event formula component, UF_ev, and the value assignment component, UF_val. In the example above, UF_ev(T1) = util_m1 and UF_val(T1) = (util_m1 ← 0.8). The event formula component specifies which events must come true in order for T1 to earn utility; here, m1 alone must earn utility. The event variable util_m1 designates "the event of m1 earning utility" and is used in place of m1 to avoid confusion. The value assignment component specifies probability assignments for the events; here, it says that the event of m1 earning utility will occur with probability 0.8 (due to m1's possible failure outcome). UF(T1) can be evaluated to find po(T1), which here, trivially, is 0.8.

Since neither m3 nor m4 will fail on its own, but each depends on T1 earning utility, their formulas mimic that of T1: UF(m3) = UF(m4) = (util_m1, (util_m1 ← 0.8)). Now we can
define UF(T2). For T2 to earn utility, both m3 and m4 need to. Therefore, UF_ev(T2) is the and of UF_ev(m3) and UF_ev(m4), as both m3 and m4 must earn utility for T2 to earn utility. UF_val(T2) is the union of UF_val(m3) and UF_val(m4), ensuring all events have probability value assignments. So, UF(T2) = ((and util_m1 util_m1), (util_m1 ← 0.8, util_m1 ← 0.8)). This can be simplified to UF(T2) = (util_m1, (util_m1 ← 0.8)), which evaluates to po(T2) = 0.8, the correct value.
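The role of shared event variables can be sketched as follows; the tuple-based formula representation (restricted here to and/or, without negation) and function names are illustrative, and the events themselves are assumed independent of one another:

```python
# A UF-style event formula: shared event variables are assigned one value per
# joint outcome, so correlated activities do not double-count a common
# failure source. Formulas are ("and", ...), ("or", ...), or an event name.

from itertools import product

def eval_formula(formula, assignment):
    if isinstance(formula, str):
        return assignment[formula]
    op, *args = formula
    vals = [eval_formula(a, assignment) for a in args]
    return all(vals) if op == "and" else any(vals)

def prob_of_formula(formula, probs):
    """Exact evaluation by enumerating joint outcomes of independent events."""
    events = sorted(probs)
    total = 0.0
    for outcome in product([True, False], repeat=len(events)):
        assignment = dict(zip(events, outcome))
        p = 1.0
        for e, v in assignment.items():
            p *= probs[e] if v else 1.0 - probs[e]
        if eval_formula(formula, assignment):
            total += p
    return total

# UF_ev(T2) = (and util_m1 util_m1): the shared event util_m1 appears twice
# but takes a single value, so po(T2) = 0.8, not 0.8 * 0.8 = 0.64.
prob_of_formula(("and", "util_m1", "util_m1"), {"util_m1": 0.8})
```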
More generally, there are ﬁve events that affect whether a method earns utility: (1) whether
it is disabled before it begins to be executed; (2) whether its enablers must earn positive utility;
(3) whether it begins executing by its lst; (4) whether it meets its deadline; and (5) whether its
execution yields a successful outcome. These events have event variables associated with them
(e.g., the suc_m variable represents a method m executing with a successful outcome), which
can be combined together as an event formula that represents what must happen for a method to
earn utility; as we saw above, that event formula can then be evaluated in conjunction with value
assignments to ﬁnd the method’s probability of earning utility.
In this domain, if a method ends in a failure outcome, violates a domain constraint, or cannot be executed, executing the remainder of the schedule as-is could still earn utility. Therefore, in contrast to the single-agent domain, it is necessary to explicitly calculate methods' failure finish
time distributions to propagate forward. In this domain, a method’s ﬁnish time is either when it
actually ﬁnishes (whether it earns utility or not) or, if it is not able to be executed, when the next
activity could begin executing. For example, if while a method’s predecessor is executing, the agent
learns that the method has been disabled by an activity d, the “ﬁnish time” of the disabled method is
max(FT (pred(m)) , ft (d)), as that is the earliest time that the subsequently scheduled method
(on the same agent’s schedule) could start. The function pred returns the predecessor of a method
(the method scheduled before it on the same agent’s schedule).
We represent FT(a) in terms of two components: FT_u(a), a's finish time distribution when it earns utility, and FT_nu(a), a's finish time distribution when it does not earn utility. Together, the probabilities of FT_u(a) and FT_nu(a) sum to 1; when we refer to FT(a) alone, it is the concatenation of these two distributions.
We now return to our assumption that a task's children's finish time distributions are independent of one another. When calculating the finish time distribution of task T2 from the above example, we would double-count the probability that T1 fails and enables neither m3 nor m4. Say, hypothetically, that we have already calculated that FT_u(m3) = {{118, 0.3}, {120, 0.5}} and FT_u(m4) = {{127, 0.6}, {129, 0.2}}. Because T2 is an AND task, FT_u(T2) equals the maximum of its children's FT_u distributions, since it does not earn utility until both children have. When we find the maximum of these distributions as if they were independent, we get FT_u(T2) = {{127, 0.48}, {129, 0.16}}. Intuitively, however, the finish time distribution of T2 should be {{127, 0.6}, {129, 0.2}}, because m3 and m4 always either both earn utility or both do not. If, however, we scale FT_u(T2) by our previously calculated po(T2), so that the distribution of T2's finish time when it earns utility sums to po(T2), we get the desired value of FT_u(T2) = {{127, 0.6}, {129, 0.2}}. Although this approach does not always result in exactly correct finish time distributions, this example illustrates why we make the independence assumption for tasks' finish time distributions, and introduces the extra step we take of scaling the distributions by activities' po values.
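The max-then-scale step above can be sketched with dict-based distributions (illustrative helpers; the probabilities need not sum to 1, since FT_u carries only the mass with which the activity earns utility):

```python
# Max of children's FT_u distributions as if independent, then rescaled so
# the total mass equals the task's separately computed po value.

def max_dists(a, b):
    out = {}
    for ta, pa in a.items():
        for tb, pb in b.items():
            t = max(ta, tb)
            out[t] = out.get(t, 0.0) + pa * pb
    return out

def scale_to(dist, target_mass):
    mass = sum(dist.values())
    return {t: p * target_mass / mass for t, p in dist.items()}

ft_m3 = {118: 0.3, 120: 0.5}
ft_m4 = {127: 0.6, 129: 0.2}
raw = max_dists(ft_m3, ft_m4)    # ~{127: 0.48, 129: 0.16}, double-counted
ft_t2 = scale_to(raw, 0.8)       # ~{127: 0.6, 129: 0.2}, scaled to po(T2)
```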
5.3.2 Probability Proﬁles
We define the algorithms of PADS recursively, with each activity calculating its profile in terms of the profiles of the other activities it is connected to. The complex activity dependency structure of this domain also means that the high-level algorithms presented in Algorithms 5.1 and 5.2 (on pages 60 and 61) must be elaborated upon in order to calculate probability profiles for activities in this domain. We present much of the probability profile calculation in this domain in terms of providing appropriate individual functions to instantiate these algorithms.

The following algorithms are used in both the global and distributed versions of PADS for this domain. Sections 5.3.3 and 5.3.4 then address the global and distributed versions, specifically.
Method Probability Proﬁle Calculations
The calculation of methods' profiles for this domain is the most complicated, as all activity dependencies (e.g., predecessors, enablers, disablers, facilitators and hinderers) are considered at this level.
Recall again the ﬁve events determining whether a method earns utility: (1) it is not disabled;
(2) it is enabled; (3) it meets its lst and begins executing; (4) it meets its deadline; and (5) it
does not end in a failure outcome. Of these events, (5) is independent of the others; we also, as
previously stated, assume that (1) and (2) are independent. Event (3), however, depends on whether
it is able to begin execution and is contingent on (1) and (2); similarly, (4) is dependent on (3). To
ﬁnd the overall event formula of m earning utility, then, we ﬁnd the probability value assignments
of these individual events given these dependencies. We list them explicitly below along with the
associated event variable:
1. dis_m: The event of m being disabled before it starts.
2. en_m: The event of m being enabled; we assume this is independent of dis_m.
3. st_m: The event of m beginning execution, given that it is not disabled and is enabled.
4. ft_m: The event of m meeting its deadline, given that it begins execution.
5. suc_m: The event of m not ending in a failure outcome; due to its a priori likelihood, it is independent of the other events.

Overall, UF_ev(m) = (and (not dis_m) en_m st_m ft_m suc_m).
Two of these events, dis_m and en_m, are actually event formulas, in that they are determined by the outcomes of other events. For example, en_m = (and UF_ev(en1) UF_ev(en2) ...), where en1, en2, etc. are the enablers of m. The variable dis_m equals a similar formula. These formulas appear in place of dis_m and en_m in m's event formula.
Of the five events determining whether a method earns utility, the probability values (and event formulas, if applicable) of two (all enablers earning utility, and the method having a successful outcome) can be found initially. The values of the other three, however, depend on the method's start and finish times. Therefore, as we calculate the method's start time and finish time distributions, we also calculate the value assignments (and event formulas, for the case of disablers) for these three cases in order to avoid duplicate work.

In the algorithms presented below, we assume that variables are global (e.g., ST(m), FT(m), etc.); therefore, these functions in general do not have return values.
Start Time Distributions We assume in our calculations that methods begin at their est, as executives have incentive to start methods as early as possible. This assumption, in general, holds; we discuss one example of a method not being able to start at its est in our PADS experiments in Section 5.4.2.
When calculating a method's starting distribution, as execution continues in this domain past a method failure, we must include in our implementation of Line 1 of Algorithm 5.1 consideration for what happens when a method's enablers fail. We must also check for disablers. And, as execution continues even if a constraint is violated, we must explicitly check to see whether methods will be able to start by their lst.
The algorithm for calcStartTimes is shown in Algorithm 5.4. In this domain, a method's start time distribution is a function of three things: (1) the finish time distribution of the method's predecessor, FT(pred(m)); (2) the finish time distributions of its enablers given that they earn utility, FTu(e) (recall that m will not execute until all enablers earn utility); and (3) the method's
5 Probabilistic Analysis of Deterministic Schedules
release (earliest possible start) time. Along the way, several types of failures can occur: (1) the method might be disabled; (2) the method might not be enabled; and (3) the method might miss its lst. Before a method's start time distribution can be calculated, the profiles of m's predecessor and all enablers must have been calculated. If not all enablers are scheduled, then po(m) and eu(m) both equal 0.
The algorithm begins by generating a distribution of the maximum of its enablers' finish times (Line 1). The first component shown, {max_{e∈en(m)} FTu(e)}, is the distribution representing when all enablers will successfully finish. The second component, {{∞, 1 − ∏_{e∈en(m)} po(e)}}, represents the group's finish time if not all enablers succeed; in this case, the group as a whole never finishes and so has an end time of ∞.
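As an illustration, the construction of E in Line 1 can be sketched in Python. The dict-based distribution representation and the function names here are our own for exposition, not part of PADS:

```python
import itertools
from math import inf, prod

def max_of_distributions(dists):
    """Distribution of the max of independent discrete (sub-)distributions,
    each represented as a dict {value: probability}."""
    result = {}
    for combo in itertools.product(*(d.items() for d in dists)):
        value = max(v for v, _ in combo)
        p = prod(pi for _, pi in combo)
        result[value] = result.get(value, 0.0) + p
    return result

def enabler_finish_distribution(ft_u, po):
    """Line 1 of Algorithm 5.4: when all enablers finish with utility,
    plus an entry at infinity for the case where not every enabler
    succeeds.  ft_u is a list of FTu(e) sub-distributions (each summing
    to po(e)); po is the list of po(e) values."""
    E = max_of_distributions(ft_u)   # total mass is the product of the po(e)
    E[inf] = 1.0 - prod(po)          # not all enablers earn utility
    return E
```

Because each FTu(e) already sums to po(e), the max over the unnormalized sub-distributions carries exactly the probability mass of the all-enablers-succeed case.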
The algorithm iterates through this distribution, as well as the distribution of m's predecessor's finish time (Line 4), to consider individual possible start times st. It first checks to see if the method might be disabled by the time it starts; if it is, the method would immediately be removed from the schedule without being executed. To calculate the probability that m is disabled, the algorithm considers the distribution of the first of m's disablers to successfully finish with utility (Lines 8-9). If there is a chance a disabler will finish with utility by st, it adds this probability (scaled by the probability of st) to FTnu(m), along with m's finish time in that case (Lines 10-11).
If m might be disabled it also makes note of it for UF(m). For each potential disabler the algorithm creates a unique event variable representing the disabling (Line 12). The variable is of the form ddisa, where a is either m or an ancestor of m, whichever is directly disabled by d. The algorithm increments the probability of this event and adds it to m's disable formula (Lines 13-14). It also increments the probability that m will be disabled at this start time for later use (Line 15).
Next, the algorithm checks fte. If fte equals ∞, then m is not enabled in this iteration. In such cases, m is not removed from the schedule until its lst has passed; the planner will wait to remove it just in case the failed enabler(s) can be rescheduled. Therefore we add lst(m) + 1 to FTnu(m) along with the probability pst − pdisbl, which is the probability of this iteration minus the probability that m has already been disabled (Line 20).
Finally, if m is not disabled and is enabled, but st > lst(m), then this is an invalid start time (Line 23). In this case m will miss its latest start time with probability pst − pdisbl. The associated finish time is the maximum of the predecessor's finish time and lst(m) + 1. We add these to FTnu(m) (Line 24). Otherwise, m can start at time st and so we add pst − pdisbl and st to ST(m) (Line 26). This process is repeated for all possible st's. Note that ST(m) does not sum to 1; instead, it sums to the probability that m successfully begins execution.
At the end of the algorithm we collect information for UF (m). The event formula representing
Algorithm 5.4: Method start time distribution calculation with discrete distributions, temporal constraints and hard NLEs.
Function: calcStartTimes(m)
1   E = {max_{e∈en(m)} FTu(e)} ∪ {{∞, 1 − ∏_{e∈en(m)} po(e)}}
2   pnorm = 0
3   foreach fte ∈ E do
4     foreach ftpred ∈ FT(pred(m)) do
5       st = max(rel(m), ftpred, fte)
6       pst = p(fte) · p(ftpred)
        // might the method be disabled in this combination?
7       pdisbl = 0
8       foreach D ∈ ℘(dis(m)) do
9         foreach ftd ∈ min_{d∈dis(m)} (d ∈ D ? FTu(d) : FTnu(d) + ∞) do
10          if ftd ≤ st and ftd ≤ lst(m) then
              // the method is disabled
11            push({max(ftpred, ftd), pst · p(ftd)}, FTnu(m))
12            a = directlyDisabledAncestor(m, d)
13            p(ddisa) = p(ddisa) + pst · p(ftd)
14            disev = disev ∪ {ddisa}
15            pdisbl = pdisbl + pst · p(ftd)
16          end
17        end
18      end
        // is the method enabled for this start time?
19      if fte == ∞ then
20        push({max(lst(m) + 1, ftpred), pst − pdisbl}, FTnu(m))
21      else
22        pnorm = pnorm + pst − pdisbl
23        if st > lst(m) then
            // invalid start time
24          push({max(lst(m) + 1, ftpred), pst − pdisbl}, FTnu(m))
25        else
26          push({st, pst − pdisbl}, ST(m))
27        end
28      end
29    end
30  end
    // collect the probability formulas for UF(m)
31  dism = (or disev)
32  p(stm) = Σ_{st∈ST(m)} p(st) / pnorm
m not being disabled is (not (or disev)), where disev and the corresponding value assignments of it were found earlier (Line 31). The event variable representing m meeting its lst and starting is stm, which happens with probability Σ_{st∈ST(m)} p(st) / pnorm; it is normalized in this way so that it equals the probability that it starts given that it is not disabled and is enabled (Line 32).
Soft NLE Distributions In order to calculate m's finish time and expected utility, we first need to know m's facilitating and hindering effects. Algorithm 5.5 presents the algorithm for this. The algorithm returns a set of distributions representing the effects on both m's duration, NLEd, and utility, NLEu. Further, it indexes the distribution NLEd by m's possible start times; each entry is the distribution of possible NLE coefficients for that start time.
Algorithm 5.5: Method soft NLE duration and utility effect distribution calculation with discrete distributions.
Function: calcNLEs(m)
1   foreach st ∈ ST(m) do
2     DC = {1, 1}
3     UC = {1, 1}
4     foreach f ∈ fac(m) do
5       pnle = Σ_{ft∈FTu(f)} (ft ≤ st ? p(ft) : 0)
6       push({{(1 − df proputil(f, st)), pnle}, {1, 1 − pnle}}, DC)
7       push({{(1 + uf proputil(f, st)), pnle}, {1, 1 − pnle}}, UC)
8     end
9     foreach h ∈ hind(m) do
10      pnle = Σ_{ft∈FTu(h)} (ft ≤ st ? p(ft) : 0)
11      push({{(1 + df proputil(h, st)), pnle}, {1, 1 − pnle}}, DC)
12      push({{(1 − uf proputil(h, st)), pnle}, {1, 1 − pnle}}, UC)
13    end
14    foreach dc ∈ ∏_{ci∈DC} ci do
15      push({dc, p(dc)}, NLEd(m, st))
16    end
17    foreach uc ∈ ∏_{ci∈UC} ci do
18      push({uc, p(uc)}, NLEu(m))
19    end
20  end
At a high level, the algorithm iterates through m's start times, facilitators and hinderers. For each start time st in ST(m) and facilitator (hinderer), it finds the probability that the facilitator (hinderer) will earn utility by time st (Line 5). It then adds a small distribution to the lists DC and UC representing the possible coefficients of what the facilitator's (hinderer's) effect on m will be: with probability pnle it will have an effect, with probability 1 − pnle it will not (Lines 6-7). Then, the algorithm combines these small distributions via multiplication to find an overall distribution of what the possible NLE coefficients for m are at time st (Line 14), and the result is added to m's NLE distributions (Line 15). This is repeated for each possible start time until the full distributions are found.
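A minimal sketch of this combination step, for the duration coefficients only (the utility coefficients are analogous); representing each facilitator or hinderer as a (finish-time distribution, proportional effect) pair is an assumption made for exposition:

```python
import itertools
from math import prod

def nle_coefficients(st, facilitators, hinderers):
    """Duration-coefficient distribution of m at start time st.  Each
    facilitator/hinderer is a pair (ft_u, effect): its FTu
    sub-distribution and its proportional effect on m's duration."""
    factors = [{1.0: 1.0}]                 # DC starts as {1, 1}
    for ft_u, effect in facilitators:
        # probability the facilitator earns utility by time st (Line 5)
        pnle = sum(p for ft, p in ft_u.items() if ft <= st)
        factors.append({1.0 - effect: pnle, 1.0: 1.0 - pnle})
    for ft_u, effect in hinderers:
        pnle = sum(p for ft, p in ft_u.items() if ft <= st)
        factors.append({1.0 + effect: pnle, 1.0: 1.0 - pnle})
    # multiply the small distributions together (Lines 14-15)
    result = {}
    for combo in itertools.product(*(f.items() for f in factors)):
        coeff = prod(c for c, _ in combo)
        p = prod(pi for _, pi in combo)
        result[coeff] = result.get(coeff, 0.0) + p
    return result
```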
Finish Time Distributions Next, FTu(m) is calculated. The implementation of Line 2 of Algorithm 5.1 (on page 60) is complicated due to the presence of NLEs and the separation of FTu and FTnu. The algorithm is illustrated in Algorithm 5.6. It iterates through st ∈ ST(m), and for each start time multiplies m's duration distribution with NLEd(m, st) to find m's effective duration distribution when it starts at time st. Iterating through the effective duration distribution, the algorithm considers each finish time ft that results from adding each possible start time and duration together (Lines 2-3). If ft > dl(m), the method earns zero utility and the algorithm adds the probability of this finish time p(ft) and ft to FTnu(m) (Line 7). Otherwise, it is a valid finish time. At this point, the only other thing to consider is that m might end with a failure outcome. Therefore p(ft) · fail(m) and the time ft are added to FTnu(m) (Line 10), and p(ft) · (1 − fail(m)) and the time ft are added to FTu(m) (Line 11).
The last line of the algorithm collects the probability value assignment for ftm (Line 15). Its value is the probability that it does not miss a deadline given that it begins execution.
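The core of this calculation can be sketched as follows, with the NLE coefficient fixed at 1 for simplicity; the function name and dict-based distributions are illustrative only:

```python
def finish_time_distributions(ST, DUR, deadline, p_fail):
    """Combine start times and durations, send deadline misses to FTnu,
    and split valid finish times between the failure-outcome part of
    FTnu and FTu.  ST and DUR map values to probabilities."""
    FTu, FTnu, p_dl = {}, {}, 0.0
    for st, p_st in ST.items():
        for dur, p_dur in DUR.items():
            ft = st + dur
            p = p_st * p_dur
            if ft > deadline:                  # missed the deadline (Line 7)
                FTnu[ft] = FTnu.get(ft, 0.0) + p
                p_dl += p
            else:                              # valid finish time (Lines 10-11)
                FTnu[ft] = FTnu.get(ft, 0.0) + p * p_fail
                FTu[ft] = FTu.get(ft, 0.0) + p * (1.0 - p_fail)
    p_start = sum(ST.values())
    # P(meets deadline | begins execution), Line 15
    p_ftm = (p_start - p_dl) / p_start
    return FTu, FTnu, p_ftm
```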
Probability of Earning Utility To calculate m's probability of meeting the plan objective (which equals its probability of earning positive utility), all of the event probabilities that have been calculated in preceding algorithms are combined. This is shown in Algorithm 5.7. In the preceding algorithms the event formula for dism and the corresponding value assignments were found, as well as p(stm) and p(ftm); additionally, the event formulas and value assignments for enm and sucm are already known. With these formulas and value assignments in hand, the algorithm can put it all together to get m's final utility formula. As all of the events must happen for m to earn utility, the event formula is the and of these five individual event formulas (Line 1). Trivially, the value assignments are the union of the individual value assignments (Line 2).
To calculate po(m) from UF(m), there are two options: reduce the event formula UFev(m) such that each event appears only once and evaluate it using the variable probability values in UFval(m); or consider all possible assignments of truth values to those events and, for each assignment, calculate the probability of that assignment occurring as well as whether it leads to an eventual success or failure. The former, however, is not guaranteed to be possible, so we chose the
Algorithm 5.6: Method finish time distribution calculation with discrete distributions, temporal constraints and soft NLEs.
Function: calcFinishTimes(m)
1   pdl = 0
2   foreach st ∈ ST(m) do
3     foreach dur ∈ DUR(m) · NLEd(m, st) do
4       ft = st + dur
5       p(ft) = p(st) · p(dur)
6       if ft > dl(m) then
          // the method did not meet its deadline
7         push({ft, p(ft)}, FTnu(m))
8         pdl = pdl + p(ft)
9       else
          // the method might finish with a failure outcome
10        push({ft, p(ft) · fail(m)}, FTnu(m))
          // otherwise it ends with positive utility
11        push({ft, p(ft) · (1 − fail(m))}, FTu(m))
12      end
13    end
14  end
    // collect the probability for m meeting its deadline
15  p(ftm) = (Σ_{st∈ST(m)} p(st) − pdl) / Σ_{st∈ST(m)} p(st)
Algorithm 5.7: Method probability of earning utility calculation with utility formulas.
Function: calcProbabilityOfObjective(m)
    // first put it all together to get UF(m)
1   UFev(m) = (and (not dism) ⋀_{e∈en(m)} UFev(e) stm ftm sucm)
2   UFval(m) = {ddisa ← p(ddisa) : ddisa ∈ dism} ∪ (⋃_{e∈en(m)} UFval(e)) ∪ {stm ← p(stm), ftm ← p(ftm), sucm ← (1 − fail(m))}
    // evaluate UF(m)
3   po(m) = evaluateUtilityFormula(UF(m))
latter to use in our approach, shown as Algorithm 5.8.
The algorithm creates different truth value assignments by looking at all possible subsets of the events and assigning "true" to the events in the set and "false" to those not in the set. We use the powerset function ℘ to generate the set containing all subsets of the events (Line 1). For each subset S, the algorithm evaluates whether UFev(a) is true when all variables in S are true and all that are not are false (Line 3). It does so by substituting all variables in UFev(a) that are in S with true and all that are not with false, and evaluating the formula. If it is true, it calculates the probability of that subset and adds it to po(a). Once all subsets are considered we have the final value of po(a).
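A sketch of this enumeration in Python; representing the event formula as a predicate over the set of true variables is our simplification:

```python
from itertools import chain, combinations

def evaluate_utility_formula(values, formula):
    """Enumerate all truth assignments to the event variables and sum
    the probability of those under which the event formula is true.
    `values` maps each event variable to its probability; `formula` is
    a predicate taking the set of variables assigned true."""
    events = list(values)
    subsets = chain.from_iterable(
        combinations(events, r) for r in range(len(events) + 1))
    po = 0.0
    for S in subsets:                  # powerset of the event variables
        S = set(S)
        if formula(S):                 # does the formula hold under S?
            p = 1.0
            for v in events:           # probability of this assignment
                p *= values[v] if v in S else 1.0 - values[v]
            po += p
    return po
```

For a formula that is an and of all its events, only the full subset contributes, so po is simply the product of the individual probabilities.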
Finally, PADS normalizes FTu(a) and FTnu(a) so that they sum to the values po(a) and 1 − po(a), respectively, as discussed in Section 5.3.1. This algorithm is the same for all activities in the hierarchy.
Algorithm 5.8: Activity utility formula evaluation.
Function: evaluateUtilityFormula(a)
1   foreach S ∈ ℘(UFvar(a)) do
2     L = UFvar(a) − S
3     if evaluate(UFev(a)[∀s ∈ S ← true][∀l ∈ L ← false]) then
4       p(S) = ∏_{s∈S} p(s) · ∏_{l∈L} (1 − p(l))
5       po(a) = po(a) + p(S)
6     end
7   end
8   normalize(FTu(a), po(a))
9   normalize(FTnu(a), 1 − po(a))
Expected Utility Algorithm 5.9 shows how eu(m) is calculated for methods, providing an implementation for Line 4 of Algorithm 5.1 (on page 60). The algorithm iterates through UTIL(m) and NLEu(m) to find the method's effective average utility, after soft NLEs, and then multiplies that by m's probability of earning utility.
OR Task Probability Profile Calculations
An OR task's utility at any time is a direct function of its children's: if any child has positive utility, the task does as well; similarly, if no child has positive utility, nor will the task. Therefore its FTu distribution represents when the first child earns utility, and its FTnu distribution represents when
Algorithm 5.9: Method expected utility calculation with discrete distributions and soft NLEs.
Function: calcExpectedUtility(m)
1   eu(m) = 0
2   foreach util ∈ UTIL(m) do
3     enle = 0
4     foreach nle ∈ NLEu(m) do
5       enle = enle + nle · p(nle)
6     end
7     eu(m) = eu(m) + (enle · util · p(util))
8   end
9   eu(m) = eu(m) · po(m)
the last child fails to earn utility. To find these distributions, it is necessary to consider all possible combinations of children succeeding and failing. As a simple example of this, if a task T has two scheduled children a and b, to calculate the finish time distribution of T we must consider:
• The min of FTu(a) and FTu(b), to find when T earns utility if both a and b earn utility.
• FTu(a), scaled by 1 − po(b), to find when T earns utility if only a earns utility.
• FTu(b), scaled by 1 − po(a), to find when T earns utility if only b earns utility.
• The max of FTnu(a) and FTnu(b), to find when T fails to earn utility if neither a nor b earns utility.
We do this algorithmically by calculating the powerset of T's children, where each subset represents a hypothetical set of children earning utility. A task's probability profile cannot be calculated until the profiles of all scheduled children have been found; if there are no scheduled children, its eu and po values equal 0.
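The powerset construction above can be sketched as follows, under the assumption that each child's FTu sums to po and its FTnu to 1 − po; the helper names are ours:

```python
import itertools
from itertools import chain, combinations
from math import prod

def _min_of(dists):
    """Distribution of the min over independent discrete sub-distributions."""
    out = {}
    for combo in itertools.product(*(d.items() for d in dists)):
        v = min(x for x, _ in combo)
        out[v] = out.get(v, 0.0) + prod(p for _, p in combo)
    return out

def _max_of(dists):
    """Distribution of the max over independent discrete sub-distributions."""
    out = {}
    for combo in itertools.product(*(d.items() for d in dists)):
        v = max(x for x, _ in combo)
        out[v] = out.get(v, 0.0) + prod(p for _, p in combo)
    return out

def or_task_finish_times(children):
    """Sketch of the OR-task finish time calculation.  Each child is a
    tuple (FTu, FTnu, po), with FTu summing to po and FTnu to 1 - po."""
    n = len(children)
    FTu, FTnu = {}, {}
    for S in chain.from_iterable(combinations(range(n), r)
                                 for r in range(n + 1)):
        L = [i for i in range(n) if i not in S]
        if S:       # at least one child earns utility: min of their FTu
            scale = prod(1.0 - children[i][2] for i in L)
            for ft, p in _min_of([children[i][0] for i in S]).items():
                FTu[ft] = FTu.get(ft, 0.0) + p * scale
        else:       # no child earns utility: max of all children's FTnu
            for ft, p in _max_of([children[i][1] for i in L]).items():
                FTnu[ft] = FTnu.get(ft, 0.0) + p
    return FTu, FTnu
```

With two children this reproduces exactly the four bulleted cases; the resulting FTu sums to the task's probability of earning utility.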
Algorithm 5.10 shows how to calculate these distributions, providing implementation for Line 7 of Algorithm 5.2 (on page 61). As described in the above paragraph, the algorithm considers each possible set of task T's children earning utility by taking the powerset of T's scheduled children (Line 1). Each set S in the powerset represents a set of children that hypothetically earn utility; the set difference between T's scheduled children and S is L, a set of children that hypothetically do not earn utility. If the size of set S is at least one, meaning at least one child earns utility in this hypothetical case, the algorithm finds the distribution representing the minimum finish time of the children in S and iterates through it, considering each member ft in turn (Line 4). The probability of this finish time under these circumstances is the probability of ft multiplied by the probability that all children in L do not earn utility (i.e., the product of 1 − po(l) for each l ∈ L) (Line 5). The algorithm adds each possible finish time and its probability to FTu(T) (Line 6). Otherwise, if no child earns utility in the current hypothetical case (i.e., S is the empty set), the algorithm finds the latest finish time distribution of the hypothetically failed children and adds it to FTnu(T) (Line 10).
Algorithm 5.10: OR task finish time distribution calculation with discrete distributions and uafs.
Function: calcORTaskFinishTimes(T)
1   foreach S ∈ ℘(scheduledChildren(T)) do
2     L = scheduledChildren(T) − S
3     if |S| ≥ 1 then
        // at least one child earned utility
4       foreach ft ∈ min_{s∈S} FTu(s) do
5         pft = p(ft) · ∏_{l∈L} (1 − po(l))
6         push({ft, pft}, FTu(T))
7       end
8     else
        // no child earned utility
9       foreach ft ∈ max_{l∈L} FTnu(l) do
10        push({ft, p(ft)}, FTnu(T))
11      end
12    end
13  end
An OR task's utility formula is simple to calculate. The event formula is the or of its children's formulas, and the value assignments are the union of its children's assignments. It can be evaluated by using the function evaluateUtilityFormula from Algorithm 5.8.
The expected utility of an OR task depends on its uaf; its calculation is shown in Algorithm 5.11. For a sum task, it is the sum of the expected utilities of its children, as the expectation of a sum is the sum of the expectations (Line 3). For a max task, the algorithm must consider all possible combinations of children succeeding or failing and find what the maximum utility is for each combination. As with an OR task's finish time distribution, to do so the algorithm considers the powerset of all scheduled children, where each set in the powerset represents a hypothetical combination of children earning utility. The probability that any set S in that powerset occurs is the probability that the children in S earn utility and the children in L do not (Line 10). As the task's expected utility is the max of only the children that earn utility, its expected utility is a function of the expected utilities of its children provided they earn positive utility. This number can be easily calculated for activities as part of their probability profiles. We call this value epu (Line 11).
Algorithm 5.11: OR task expected utility calculation with discrete distributions and uafs.
Function: calcORTaskExpectedUtility(T)
1   switch type of task T do
2     case sum
3       eu(T) = Σ_{s∈scheduledChildren(T)} eu(s)
4     case max
5       foreach S ∈ ℘(scheduledChildren(T)) do
6         if |S| == 0 then
7           eu(T) = 0
8         else
9           L = scheduledChildren(T) − S
10          p(S) = ∏_{s∈S} po(s) · ∏_{l∈L} (1 − po(l))
11          eu(T) = eu(T) + p(S) · max_{s∈S} epu(s)
12        end
13      end
14    end
15  end
AND Task Probability Profile Calculations
An AND task's utility at any time depends on whether all children have earned utility. If any child has zero utility, then the task does as well. Therefore its FTu distribution represents when the last child earns utility, and its FTnu distribution represents when the first child fails to earn utility. As with an OR task's finish time distribution, to do so it is necessary to consider all possible combinations of children succeeding and failing, which can be generated algorithmically by the powerset operator. A task's probability profile cannot be calculated until the profiles of all scheduled children have been found; if not all children are scheduled, its eu and po values equal 0 and no further calculation is necessary.
The finish time distribution calculation is shown in Algorithm 5.12. The algorithm considers each possible set of task T's children not earning utility by taking the powerset of T's scheduled children (Line 1). Each set L in the powerset represents a hypothetical set of children that do not earn utility; the set difference between T's scheduled children and L is S, the children that do earn utility in this hypothetical case. If at least one child does not earn utility in the current set, the algorithm finds the distribution representing the minimum finish time of the children who do not earn utility and considers each possible value ft in the distribution (Line 4). The probability of this finish time under these circumstances is the probability of ft times the probability that all children in S do earn utility (Line 5). We add each possible finish time and its probability to FTnu(T) (Line 6). Otherwise, if all children earn utility, then we find the latest finish time of the children and add it to FTu(T) (Line 10).
Algorithm 5.12: AND task finish time distribution calculation with discrete distributions and uafs.
Function: calcANDTaskFinishTimes(T)
1   foreach L ∈ ℘(scheduledChildren(T)) do
2     S = scheduledChildren(T) − L
3     if |L| ≥ 1 then
        // at least one child did not earn utility
4       foreach ft ∈ min_{l∈L} FTnu(l) do
5         pft = p(ft) · ∏_{s∈S} po(s)
6         push({ft, pft}, FTnu(T))
7       end
8     else
        // all children earned utility
9       foreach ft ∈ max_{s∈S} FTu(s) do
10        push({ft, p(ft)}, FTu(T))
11      end
12    end
13  end
An AND task's event formula is the and of its children's formulas, and the value assignments are the union of its children's assignments. It can be evaluated by using the function evaluateUtilityFormula from Algorithm 5.8.
The algorithm for calculating the expected utility of an AND task is shown in Algorithm 5.13. Because an AND task earns utility only when all children do, its expected utility is a function of the expected utilities of its children provided they earn positive utility. As mentioned in the discussion of OR tasks' expected utility calculations, this value, epu, can easily be calculated as part of an activity's probability profile. If the activity is a sum and task, it is the sum over all children of this estimate, multiplied by po(T) (Line 3); if the activity is a min task, it is the minimum of these estimates, multiplied by po(T) (Line 5).
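The two cases reduce to a few lines; the uaf labels and function name below are illustrative, not PADS identifiers:

```python
def and_task_expected_utility(uaf, po_T, children_epu):
    """Sketch of the AND-task expected utility calculation.
    children_epu holds epu(c) for each child: its expected utility
    given that it earns positive utility."""
    if uaf == "sum_and":
        return po_T * sum(children_epu)   # Line 3
    if uaf == "min":
        return po_T * min(children_epu)   # Line 5
    raise ValueError("unknown uaf: " + uaf)
```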
ONE Task Probability Profile Calculation
A ONE task's probability profile is that of its sole scheduled child; if the number of scheduled children is not one, the task will not earn utility and its profile is trivial. Its implementation is the same as in Algorithm 5.2 (on page 61).
Algorithm 5.13: AND task expected utility calculation with discrete distributions and uafs.
Function: calcANDTaskExpectedUtility(T)
1   switch type of task T do
2     case sum and
3       eu(T) = po(T) · Σ_{c∈children(T)} epu(c)
4     case min
5       eu(T) = po(T) · min_{c∈children(T)} epu(c)
6   end
7   end
Expected Utility Loss / Gain
As part of its analysis, PADS also calculates two heuristics which are used by probabilistic schedule strengthening (Chapter 6) to rank activities by their importance. It measures this importance in terms of expected utility loss (eul) and expected utility gain (eug). Both of these metrics are based on the expected utility of the taskgroup assuming that an activity a earns positive utility, minus the expected utility of the taskgroup assuming that a earns zero utility:
eu(TG | po(a) = 1) − eu(TG | po(a) = 0)
In this and subsequent chapters, we rely on the | operator, which signifies a change to the structure or profiles of the taskgroup. For example, eu(TG | po(a) = 1) denotes the expected utility of the taskgroup if po(a) were to equal 1. The above equation, then, specifies by how much the expected utility of the taskgroup would differ if the activity were to definitely succeed versus if it were to definitely fail; we call this the utility difference (ud). The utility difference is, in general, positive; however, in the presence of disablers, it may be negative. The utility difference can be interpreted as representing two different measures: how much utility the taskgroup would lose if the activity were to fail; or, how much utility the taskgroup would gain if the activity were to earn positive utility. These two interpretations are the bases for the ideas behind expected utility loss and expected utility gain.
To find expected utility loss, we scale ud by 1 − po(a); this turns the measure into the utility loss expected to happen based on a's probability of failing. For example, when considering eul we do not care if an activity has a huge utility difference if it will never fail; instead, we would want to prioritize looking at activities with medium utility differences but large probabilities of failing. The overall equation for expected utility loss is:
eul(a) = (1 − po(a)) · (eu(TG | po(a) = 1) − eu(TG | po(a) = 0))
Expected utility gain also scales ud, but by a's probability of earning utility, so that it measures the gain in utility expected to happen based on a's probability of earning utility. This ensures that eug prioritizes activities with, for example, a medium utility difference but high probability of earning utility over an activity with a high utility difference but a very low probability of earning utility. The equation for expected utility gain is:
eug(a) = po(a) · (eu(TG | po(a) = 1) − eu(TG | po(a) = 0))
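Both metrics can be computed from the same two hypothetical taskgroup evaluations; the argument names here are stand-ins for eu(TG | po(a) = 1) and eu(TG | po(a) = 0):

```python
def utility_difference_metrics(po_a, eu_if_success, eu_if_failure):
    """eul and eug for an activity a, given its probability of earning
    utility and the taskgroup's expected utility in the two hypothetical
    cases where a definitely succeeds or definitely fails."""
    ud = eu_if_success - eu_if_failure    # utility difference
    eul = (1.0 - po_a) * ud               # loss expected if a fails
    eug = po_a * ud                       # gain expected if a succeeds
    return eul, eug
```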
Although we do not do so in this work, the utility difference can also be used as a metric without considering po values, depending on the goals of the programmer. Using ud alone is suitable for systems that want to prioritize, for example, eliminating the possibility of a catastrophic failure, no matter how unlikely; (1 − po) · ud and po · ud are suitable for systems with the goal of maximizing expected utility, as ours is. These heuristics are not specific to this domain, as the utility difference can be calculated for any activity in any activity network.
Because these calculations ideally require knowledge of the entire activity hierarchy, they are implemented differently depending on whether the analysis is global or distributed. We discuss the details of this in the following two sections.
5.3.3 Global Algorithm
When initially invoked, global PADS goes through the activity network and calculates probability profiles for each activity. PADS begins with the method(s) with the earliest est(s); it then propagates profiles through the network until profiles for the entire tree have been found. Profiles are stored with each activity, and are updated by PADS if the information associated with upstream (or child) activities changes.
To calculate eul(a) and eug(a), PADS temporarily sets po(a) to be 1 and 0, in turn, and propagates that information throughout the activity network until eu(TG) has been updated and provides the values of eu(TG | po(a) = 1) and eu(TG | po(a) = 0). PADS can then find the utility difference, ud = eu(TG | po(a) = 1) − eu(TG | po(a) = 0), and multiply it by 1 − po(a) or po(a) to find eul(a) or eug(a), respectively.
Probability Profile Propagation Example
In this section, we show global probability profile calculations for a simple example (Figure 5.3) to give further intuition for how PADS works. The problem that we show is similar to Figures 3.5 and 5.1, but is simplified here to promote understanding; it has deterministic utilities and smaller
[Figure 5.3 is not fully recoverable from the extracted text. It depicts a taskgroup containing tasks T1 (rel: 100, dl: 120) and T2 (rel: 110, dl: 130) and methods m1 (util: 12, fail: 20%, dur: 14 (75%) or 16 (25%), agent: Jeremy), m2 (util: 3, dur: 8, agent: Sarah), m3 (util: 6, dur: 4 (50%) or 6 (50%), agent: Buster) and m4 (util: 9, dur: 7 (50%) or 11 (50%), agent: Buster), with an enables link from T1 to m3, along with a timeline of the corresponding schedule.]
Figure 5.3: A simple 3-agent problem and a corresponding schedule.
duration distributions. We also show the schedule for this problem. The time bounds for this schedule generated by the deterministic STN (and thus based on the methods' average, scheduled durations) are:
• est(m1) = 100, lst(m1) = 101, eft(m1) = 115, lft(m1) = 116
• est(m3) = 115, lst(m3) = 116, eft(m3) = 120, lft(m3) = 121
• est(m4) = 120, lst(m4) = 121, eft(m4) = 129, lft(m4) = 130
Our calculation begins with method m1, as it is the earliest scheduled method. It has no enablers and no predecessor, so its start time distribution depends on nothing but its release time (rel). Therefore, since methods are always started as early as possible, ST(m1) = {{100, 1.0}}; this implies that p(stm1) = 1 because it never misses its lst of 101. It has no disablers, so p(dism1) = 0. It also has no soft NLEs, so NLEu(m1) = {{1, 1}} and NLEd(m1) = {{1, 1}} for all start times.
The finish time of m1 is a combination of ST(m1) = {{100, 1.0}} and its duration distribution, {{14, 0.75}, {16, 0.25}}. Iterating through these, PADS first gets ft = 100 + 14 = 114, which occurs with p(ft) = 1.0 · 0.75 = 0.75. This meets m1's deadline of 120. So, PADS pushes {114, 0.75 · 0.2} onto FTnu(m1) to account for its failure outcome likelihood, and {114, 0.75 · 0.8} onto FTu(m1). The method's other possible finish time is ft = 100 + 16 = 116, which occurs with p(ft) = 1.0 · 0.25 = 0.25, which also meets m1's deadline. Again, however, m1 might fail, so PADS pushes {116, 0.25 · 0.2} onto FTnu(m1) and {116, 0.25 · 0.8} onto FTu(m1). Trivially, p(ftm1) = 1.
The utility event formula for m1 is UFev(m1) = (and stm1 ftm1 sucm1); we omit dism1 and enm1 because m1 does not have enablers or disablers. The utility value assignment is UFval(m1) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8}. To evaluate this, PADS creates different truth value assignments to the variables in UFev(m1) by looking at all possible subsets of the events and assigning "true" to the events in the set and "false" to those not in the set. We list below the different sets, generated by the powerset operator, and how the utility formula evaluates for those sets, in the context of Algorithm 5.8.
1. S = {}:
T = {stm1, ftm1, sucm1}
p(S) = (1 − p(stm1)) · (1 − p(ftm1)) · (1 − p(sucm1)) = 0
UFev[∀s ∈ S ← true][∀t ∈ T ← false] = (and false false false) = false
2. S = {stm1}:
T = {ftm1, sucm1}
p(S) = p(stm1) · (1 − p(ftm1)) · (1 − p(sucm1)) = 0
UFev[∀s ∈ S ← true][∀t ∈ T ← false] = (and true false false) = false
3. S = {ftm1}: not shown, evaluates to false
4. S = {sucm1}: not shown, evaluates to false
5. S = {stm1, ftm1}: not shown, evaluates to false
6. S = {stm1, sucm1}: not shown, evaluates to false
7. S = {ftm1, sucm1}: not shown, evaluates to false
8. S = {stm1, ftm1, sucm1}:
T = {}
p(S) = p(stm1) · p(ftm1) · p(sucm1) = 0.8
UFev[∀s ∈ S ← true][∀t ∈ T ← false] = (and true true true) = true
po(m1) = po(m1) + 0.8
Based on these iterations, po(m1) = 0.8. As util(m1) = 12 and NLEu(m1) = {{1, 1}}, eu(m1) = 0.8 · 12 = 9.6. To put together m1's probability profile:
ST(m1) = {{100, 1.0}}
FTu(m1) = {{114, 0.6}, {116, 0.2}}
FTnu(m1) = {{114, 0.15}, {116, 0.05}}
UFev(m1) = (and stm1 ftm1 sucm1)
UFval(m1) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8}
po(m1) = 0.8
eu(m1) = 9.6
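As a check, m1's profile can be reproduced with a direct transcription of the example's arithmetic (plain Python written for this illustration, not PADS code):

```python
# m1's inputs from Figure 5.3
ST = {100: 1.0}                      # m1 starts at its est
DUR = {14: 0.75, 16: 0.25}           # duration distribution
deadline, p_fail, util = 120, 0.2, 12

FTu, FTnu = {}, {}
for st, p_st in ST.items():
    for dur, p_dur in DUR.items():
        ft, p = st + dur, p_st * p_dur
        assert ft <= deadline        # both finish times meet the deadline
        FTnu[ft] = FTnu.get(ft, 0.0) + p * p_fail         # failure outcome
        FTu[ft] = FTu.get(ft, 0.0) + p * (1.0 - p_fail)   # positive utility

po_m1 = sum(FTu.values())            # 0.8
eu_m1 = po_m1 * util                 # 9.6
```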
Next we can calculate T1's probability profile. Because it only has one scheduled child, its profile is the same as m1's:
FTu(T1) = {{114, 0.6}, {116, 0.2}}
FTnu(T1) = {{114, 0.15}, {116, 0.05}}
UFev(T1) = (and stm1 ftm1 sucm1)
UFval(T1) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8}
po(T1) = 0.8
eu(T1) = 9.6
Because m3 is enabled by T1, its start time depends on T1's finish time. The algorithm begins by iterating through the distribution E (Line 1 of Algorithm 5.4), which here equals FTu(T1) appended by {∞, 1 − po(T1)}, so E = {{114, 0.6}, {116, 0.2}, {∞, 0.2}}. It can skip Line 4 as m3 has no predecessors. First, then, st = 114 with probability 0.6. Since this is a valid start time, PADS pushes {114, 0.6} onto ST(m3). Next, st = 116 with probability 0.2, and PADS pushes {116, 0.2} onto ST(m3).
Next, st = ∞ with probability 0.2. This represents the case when T1 does not earn utility.
Because the method is not enabled in this case, the method fails at max(lst(m3) + 1, ft
pred
) =
117; recall that in cases where a method's enabler fails, the method is not removed from the schedule until its lst has passed, in case the planner is able to reschedule the enabler. So, PADS pushes {117, 0.2} onto FT_nu(m3). Finally, p(st_m3) = 0.8/0.8 = 1 and p(dis_m3) = 0. As m3 has no soft NLEs, NLE_u(m3) = {{1, 1}} and NLE_d(m3) = {{1, 1}} for all possible start times.
To find m3's finish time distribution, ST(m3) is combined with DUR(m3). The first combination is st = 114 and dur = 4, which happens with probability 0.6 · 0.5 = 0.3; the finish time 118 is less than m3's deadline, and so PADS pushes {118, 0.3} onto FT_u(m3). The next combination is st = 114 and dur = 6, which happens with probability 0.6 · 0.5 = 0.3; the finish time 120 is less than m3's deadline, and so we push {120, 0.3} onto FT_u(m3). The next combination is st = 116 and dur = 4, which happens with probability 0.2 · 0.5 = 0.1; the finish time 120 is less than m3's deadline, and so PADS pushes {120, 0.1} onto FT_u(m3). The final combination is st = 116 and dur = 6, which happens with probability 0.2 · 0.5 = 0.1; the finish time 122 is less than m3's deadline, and so PADS pushes {122, 0.1} onto FT_u(m3). Trivially, p(ft_m3) = 0.8/0.8 = 1.
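The combination of ST(m3) and DUR(m3) can be sketched as follows. This is an illustrative helper, not the PADS implementation, and the deadline value is assumed (chosen so that all outcomes meet it, as in the example):

```python
def finish_time_profile(st_dist, dur_dist, deadline):
    """Combine a start-time distribution with a duration distribution,
    splitting outcomes into utility-earning (<= deadline) and not."""
    ft_u, ft_nu = {}, {}
    for st, p_st in st_dist:
        for dur, p_dur in dur_dist:
            ft, p = st + dur, p_st * p_dur
            bucket = ft_u if ft <= deadline else ft_nu
            bucket[ft] = bucket.get(ft, 0.0) + p  # collapse duplicate finish times
    return sorted(ft_u.items()), sorted(ft_nu.items())

st_m3 = [(114, 0.6), (116, 0.2)]
dur_m3 = [(4, 0.5), (6, 0.5)]
ft_u, ft_nu = finish_time_profile(st_m3, dur_m3, deadline=125)
# ft_u matches FT_u(m3) = {{118, 0.3}, {120, 0.4}, {122, 0.1}} up to rounding
```

Note how the two combinations that finish at 120 (probability 0.3 and 0.1) are collapsed into a single {120, 0.4} entry.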
The utility event formula for m3 is (omitting dis_m3, as m3 has no disablers):

UF_ev(m3) = (and (and st_m1 ft_m1 suc_m1) st_m3 ft_m3 suc_m3)

and the utility value assignment is

UF_val(m3) = {st_m1 ← 1, ft_m1 ← 1, suc_m1 ← 0.8, st_m3 ← 1, ft_m3 ← 1, suc_m3 ← 1}

UF(m3) is evaluated the same way as m1's utility formula. The calculation is not shown, but it evaluates to 0.8, and m3's expected utility is 0.8 · 6 = 4.8. Thus m3's full profile is:

ST(m3) = {{114, 0.6}, {116, 0.2}}
FT_u(m3) = {{118, 0.3}, {120, 0.4}, {122, 0.1}}
FT_nu(m3) = {{117, 0.2}}
UF_ev(m3) = (and (and st_m1 ft_m1 suc_m1) st_m3 ft_m3 suc_m3)
UF_val(m3) = {st_m1 ← 1, ft_m1 ← 1, suc_m1 ← 0.8, st_m3 ← 1, ft_m3 ← 1, suc_m3 ← 1}
po(m3) = 0.8
eu(m3) = 4.8
Method m4's profile is calculated next as per Algorithm 5.4. Its start time depends on T1's and m3's finish times. As with method m3, the distribution E = {{114, 0.6}, {116, 0.2}, {∞, 0.2}}.
PADS iterates through this distribution and m4's predecessor's (m3's) finish time distribution, calculating m4's start time distribution along the way. Recall that lst(m4) = 121. We collapse the cases where ft_e equals ∞ and do not bother showing the iteration through m4's predecessor's distribution for that case.
1. ft_e = 114, ft_pred = 117, st = 117, p_st = 0.6 · 0.2 = 0.12
   Method starts, push {117, 0.12} onto ST(m4)
2. ft_e = 114, ft_pred = 118, st = 118, p_st = 0.6 · 0.3 = 0.18
   Method starts, push {118, 0.18} onto ST(m4)
3. ft_e = 114, ft_pred = 120, st = 120, p_st = 0.6 · 0.4 = 0.24
   Method starts, push {120, 0.24} onto ST(m4)
4. ft_e = 114, ft_pred = 122, st = 122, p_st = 0.6 · 0.1 = 0.06
   Method misses lst, push {122, 0.06} onto FT_nu(m4)
5. ft_e = 116, ft_pred = 117, st = 117, p_st = 0.2 · 0.2 = 0.04
   Method starts, push {117, 0.04} onto ST(m4)
6. ft_e = 116, ft_pred = 118, st = 118, p_st = 0.2 · 0.3 = 0.06
   Method starts, push {118, 0.06} onto ST(m4)
7. ft_e = 116, ft_pred = 120, st = 120, p_st = 0.2 · 0.4 = 0.08
   Method starts, push {120, 0.08} onto ST(m4)
8. ft_e = 116, ft_pred = 122, st = 122, p_st = 0.2 · 0.1 = 0.02
   Method misses lst, push {122, 0.02} onto FT_nu(m4)
9. ft_e = ∞, st = ∞, p_st = 0.2
   Method is not enabled, push {122, 0.2} onto FT_nu(m4)
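The nine cases above all follow one rule: m4 starts at the later of its enabler's and predecessor's finish times, unless that time passes its lst. A sketch of this propagation (helper names are ours; the predecessor distribution merges m3's utility-earning and failure finish times, as in the example):

```python
INF = float("inf")

def start_time_profile(enabler_ft, pred_ft, lst, fail_time):
    """Start = max(enabler finish, predecessor finish); outcomes past the
    lst, or with a failed enabler, go to the no-utility finish distribution."""
    st, ft_nu = {}, {}
    for ft_e, p_e in enabler_ft:
        if ft_e == INF:  # enabler earned no utility: method is never enabled
            ft_nu[fail_time] = ft_nu.get(fail_time, 0.0) + p_e
            continue
        for ft_p, p_p in pred_ft:
            t, p = max(ft_e, ft_p), p_e * p_p
            bucket = st if t <= lst else ft_nu
            bucket[t] = bucket.get(t, 0.0) + p
    return sorted(st.items()), sorted(ft_nu.items())

enabler = [(114, 0.6), (116, 0.2), (INF, 0.2)]           # E, from T1
pred = [(117, 0.2), (118, 0.3), (120, 0.4), (122, 0.1)]  # m3's finish times
st_m4, ft_nu_m4 = start_time_profile(enabler, pred, lst=121, fail_time=122)
# st_m4 matches ST(m4) = {{117, 0.16}, {118, 0.24}, {120, 0.32}} up to rounding
```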
PADS ultimately ends up with ST(m4) = {{117, 0.16}, {118, 0.24}, {120, 0.32}}. As it is enabled with probability 0.8 but starts with probability 0.72, p(st_m4) = 0.72/0.8 = 0.9. As m4 has no soft NLEs, NLE_u(m4) = {{1, 1}} and NLE_d(m4) = {{1, 1}}.

To find m4's finish time distribution, ST(m4) is combined with DUR(m4). Since DUR(m4) has only two possible values, m4 has six possible finish time combinations; we do not show the calculation. The latest one, when m4 starts at time 120 and has duration 11, is the only one where m4 misses its deadline; so p(ft_m4) = (0.72 − 0.16) / 0.72 = 0.77778.

The utility event formula for m4 is

UF_ev(m4) = (and (and st_m1 ft_m1 suc_m1) st_m4 ft_m4 suc_m4)

The utility value formula is UF_val(m4) = {st_m1 ← 1, ft_m1 ← 1, suc_m1 ← 0.8, st_m4 ← 0.9, ft_m4 ← 0.77778, suc_m4 ← 1}. We evaluate this as we evaluated m1's utility formula.
The calculation is not shown, but it evaluates to 0.56, and m4's expected utility is 0.56 · 9 = 5.04. Thus m4's full profile is:

ST(m4) = {{117, 0.16}, {118, 0.24}, {120, 0.32}}
FT_u(m4) = {{124, 0.08}, {125, 0.12}, {127, 0.16}, {128, 0.08}, {129, 0.12}}
FT_nu(m4) = {{122, 0.28}, {131, 0.16}}
UF_ev(m4) = (and (and st_m1 ft_m1 suc_m1) st_m4 ft_m4 suc_m4)
UF_val(m4) = {st_m1 ← 1, ft_m1 ← 1, suc_m1 ← 0.8, st_m4 ← 0.9, ft_m4 ← 0.77778, suc_m4 ← 1}
po(m4) = 0.56
eu(m4) = 5.04
T2 is calculated next. PADS begins by iterating through the power set of T2's children to consider the possible combinations of T2's children not earning utility, as per Algorithm 5.12. When L = {}, all children earn utility; PADS finds the distribution representing the max of their FT_u distributions and adds it to FT_u(T2). Because m4 always finishes later, FT_u(T2) = {{124, 0.064}, {125, 0.096}, {127, 0.128}, {128, 0.064}, {129, 0.096}} (note that this is the same as FT_u(m4) times 0.8, the probability that m3 earns utility). When L = {m3}, T2 will fail at the same time m3 does and so PADS adds {117, 0.2 · 0.56 = 0.112} to FT_nu(T2). When L = {m4}, T2 will fail at the same time m4 does and so PADS adds {122, 0.28 · 0.8 = 0.224} and {131, 0.16 · 0.8 = 0.128} to FT_nu(T2). Finally, when L = {m3, m4}, T2 will fail at the same time m3 does, as it always fails earlier, and so PADS adds {117, 0.2 · 0.44 = 0.088} to FT_nu(T2).
UF_ev(T2) equals the and of its children's event formulas, and UF_val(T2) equals the union of its children's value assignments. UF(T2) evaluates to 0.56. Note here the discrepancy between the sum of the members of FT_u and po: FT_u(T2) sums to 0.448, but po(T2) = 0.56. This highlights the difference between assuming that the enablers of m3 and m4 are independent, as in the FT_u calculation, and not making that assumption, as for po. T2's expected utility, as described in Algorithm 5.13, is 0.56 · (6 + 9) = 8.4, where 6 and 9 are the utilities of m3 and m4, respectively. The last part of the calculation is to normalize T2's FT distributions so that they sum to po(T2) and 1 − po(T2).
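The normalization step can be sketched as follows (a hypothetical helper; the raw FT_nu(T2) masses are those accumulated above):

```python
def normalize(dist, target):
    """Rescale a finish-time distribution so its probabilities total `target`."""
    total = sum(p for _, p in dist)
    return [(t, p * target / total) for t, p in dist]

raw_ft_nu_t2 = [(117, 0.2), (122, 0.224), (131, 0.128)]  # sums to 0.552
ft_nu_t2 = normalize(raw_ft_nu_t2, target=1 - 0.56)
# approximately {{117, 0.16}, {122, 0.18}, {131, 0.1}}, matching FT_nu(T2)
```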
T2's final probability profile is:

FT_u(T2) = {{124, 0.08}, {125, 0.12}, {127, 0.16}, {128, 0.08}, {129, 0.12}}
FT_nu(T2) = {{117, 0.16}, {122, 0.18}, {131, 0.1}}
UF_ev(T2) = (and (and (and st_m1 ft_m1 suc_m1) st_m3 ft_m3 suc_m3)
                 (and (and st_m1 ft_m1 suc_m1) st_m4 ft_m4 suc_m4))
UF_val(T2) = {st_m1 ← 1, ft_m1 ← 1, suc_m1 ← 0.8,
              st_m3 ← 1, ft_m3 ← 1, suc_m3 ← 1,
              st_m4 ← 0.9, ft_m4 ← 0.77778, suc_m4 ← 1}
po(T2) = 0.56
eu(T2) = 8.4
Finally we can find the taskgroup's probability profile. We first find its finish time distribution as per Algorithm 5.10. In the first iteration of the algorithm, S = {}, which means that in this hypothetical case no child earns utility. As T2's failure finish time distribution is strictly later than T1's, we can add FT_nu(T2) · (1 − po(T1)), or {{117, 0.032}, {122, 0.036}, {131, 0.02}}, to FT_nu(TG).

Next, S = {T1}, meaning that in this hypothetical case T1 earns utility and T2 does not. Therefore, we add FT_u(T1) · (1 − po(T2)), or {{114, 0.264}, {116, 0.088}}, to FT_u(TG). S next equals {T2}. For the same reasoning, we add FT_u(T2) · (1 − po(T1)), which equals {{124, 0.016}, {125, 0.024}, {127, 0.032}, {128, 0.016}, {129, 0.024}}, to FT_u(TG). Finally, S = {T1, T2}. As FT_u(T1) is strictly less than FT_u(T2), we add {{114, 0.336}, {116, 0.112}} to FT_u(TG).
UF_ev(TG) equals the or of its children's event formulas, and UF_val(TG) equals the union of its children's value assignments. Together, they evaluate to 0.8. Given this value, we can normalize
FT_u(TG) and FT_nu(TG) so they sum to 0.8 and 0.2, respectively. Overall:

FT_u(TG) = {{114, 0.53}, {116, 0.175}, {124, 0.014}, {125, 0.021}, {127, 0.028},
            {128, 0.014}, {129, 0.021}}
FT_nu(TG) = {{117, 0.073}, {122, 0.082}, {131, 0.045}}
UF_ev(TG) = (or (and st_m1 ft_m1 suc_m1)
                (and (and (and st_m1 ft_m1 suc_m1) st_m3 ft_m3 suc_m3)
                     (and (and st_m1 ft_m1 suc_m1) st_m4 ft_m4 suc_m4)))
UF_val(TG) = {st_m1 ← 1, ft_m1 ← 1, suc_m1 ← 0.8,
              st_m3 ← 1, ft_m3 ← 1, suc_m3 ← 1,
              st_m4 ← 0.9, ft_m4 ← 0.77778, suc_m4 ← 1}
po(TG) = 0.8
eu(TG) = 18
Therefore the probability that the schedule will earn positive utility during execution is 0.8, and its expected utility is 18.
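The value of evaluating the full event formula, rather than combining child probabilities as if they were independent, can be seen with one line of arithmetic (illustrative only):

```python
# If T1 (po = 0.8) and T2 (po = 0.56) were independent, an or-combination
# would give a taskgroup probability of:
naive_or = 1 - (1 - 0.8) * (1 - 0.56)  # about 0.912
# But both children share the events st_m1, ft_m1, suc_m1: whenever m1 fails
# (probability 0.2), both disjuncts are false. Evaluating UF_ev(TG) captures
# this correlation and yields po(TG) = 0.8 rather than 0.912.
```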
5.3.4 Distributed Algorithm
The distributed version of PADS (DPADS) is used by agents locally during execution. It is intuitively very similar to global PADS. In fact, the algorithms for calculating activities' start times, NLE information, finish times and expected utility are for the most part the same. However, not all activity profiles can be found in a single pass, as no agent has a complete view of the activity network. Recall from Section 3.3.2 (on page 36) that when execution begins, agents have a fixed period of time to perform initial computations, such as calculating initial probability profiles, before they actually begin to execute methods. During this time, the probability profiles of the activity network are computed incrementally, with each agent calculating as many profiles as it can at a time. Agents then pass on the newly-found profiles to other agents in order to allow them to calculate further portions of the network. This process repeats until the profiles of the entire network have been found.
Once execution begins and methods begin executing, probability proﬁles are continuously and
asynchronously being updated to reﬂect the stochasticity of the domain and the evolving schedule.
Whenever an agent receives updates concerning an activity’s execution, its enablers, disablers,
facilitators, hinderers, or (for tasks) children, it updates that activity’s proﬁle; if the activity’s proﬁle
does in fact change, DPADS passes that information on to other agents interested in that activity.
Agents do not explicitly wait for or keep track of the effects these changes have on the rest of the
hierarchy, even though changes resulting from their update may eventually propagate back to them;
in a domain with high executional uncertainty such as this one, doing so would be very difﬁcult
because typically many probability proﬁles are changing concurrently.
The runtime aspect of the approach also necessitates an additional notion of the current time, which is used as a lower bound on when a method will start (assuming its execution has not already begun); similarly, after an activity has been executed, its profile is changed to the actual outcome of its execution. Since handling these issues involves very small changes to the PADS algorithms presented in Section 5.3.2, we will not explicitly show the changes needed to those algorithms.
The distributed version, however, does make several approximations of expected utility at the taskgroup level. For example, in the global version, while calculating ud, which is necessary for eul and eug, eu(TG | po(a) = 1) and eu(TG | po(a) = 0) can be calculated exactly by setting po(a) to the appropriate value and updating the entire network. In the distributed case, a different approach must be taken since no agent has a complete view of the network.
There are three options for calculating ud in a distributed way. The first is to exactly calculate such values on demand. This involves agents querying other agents for information to use in their calculations, which would introduce a large delay in the calculation. The second option, which can also calculate the values exactly, avoids this delay by always having such values cached. This requires every agent to calculate, for each of its activities, how the success or failure of every other activity in the network would affect it, and communicate that to other agents. Such computations would take a long time due to the large number of contingencies to explore, and would also require a large increase in the amount of communication; overall, therefore, we consider this option intractable given the real-time constraints of the domain.
Instead, we utilize a third option in which these values are estimated locally by agents as needed. During DPADS, agents estimate ud based on an estimate of the expected utility of the taskgroup if an activity a were to have a hypothetical probability value h of earning utility, êu_TG(a, h); we use the ˆ symbol to designate a number as an estimate.
The algorithm for calculating êu_TG is shown in Algorithm 5.14. The algorithm begins at some local activity a and ascends the activity hierarchy one parent at a time until it reaches the taskgroup (Line 5). At each level it estimates the po and eu that the current task T would have if po(a) = h, notated as p̂o(T | po(a) = h) and êu(T | po(a) = h). To begin, p̂o(a | po(a) = h) = h and êu(a | po(a) = h) = h · averageUtil(a) (where averageUtil is defined in Section 3.3.2) (Lines 1-2). At higher levels the algorithm estimates these numbers according to tasks' uafs. As a simple example, if a's parent is a max task T, its estimated expected utility is the maximum of a's new estimated expected utility and the expected utility of T's other children (Lines 20-21).
Algorithm 5.14: Distributed taskgroup expected utility approximation with enablers and uafs.

Function: approximateEU(a, h)
 1:  p̂o(a | po(a) = h) = h
 2:  êu(a | po(a) = h) = h · averageUtil(a)
 3:  c = a
 4:  n̂le_∆ = 0
 5:  while T = parent(c) do
 6:      foreach c_e ∈ enabled(c) do
 7:          p̂o(c_e | p̂o(c)) = po(c_e) · p̂o(c | po(a) = h) / po(c)
 8:          n̂le_∆ = n̂le_∆ + approximateEU(c_e, p̂o(c_e | p̂o(c)))
 9:      end
10:      foreach c_d ∈ disabled(c) do
11:          p̂o(c_d | p̂o(c)) = po(c_d) · (1 − p̂o(c | po(a) = h)) / (1 − po(c))
12:          n̂le_∆ = n̂le_∆ + approximateEU(c_d, p̂o(c_d | p̂o(c)))
13:      end
14:      children′ = scheduledChildren(T) − {c}
15:      switch type of task T do
16:          case sum
17:              p̂o(T | po(a) = h) = 1 − (1 − po(T)) · (1 − p̂o(c | po(a) = h)) / (1 − po(c))
18:              êu(T | po(a) = h) = êu(c | po(a) = h) + Σ_{c′ ∈ children′} eu(c′)
19:          case max
20:              p̂o(T | po(a) = h) = 1 − (1 − po(T)) · (1 − p̂o(c | po(a) = h)) / (1 − po(c))
21:              êu(T | po(a) = h) = max(êu(c | po(a) = h), max_{c′ ∈ children′} eu(c′))
22:          case exactly one
23:              p̂o(T | po(a) = h) = p̂o(c | po(a) = h)
24:              êu(T | po(a) = h) = êu(c | po(a) = h)
25:          case sum and
26:              p̂o(T | po(a) = h) = po(T) · p̂o(c | po(a) = h) / po(c)
27:              êu(T | po(a) = h) = p̂o(T | po(a) = h) · (eu(c) / p̂o(c | po(a) = h) + Σ_{c′ ∈ children′} eu(c′) / po(c′))
28:          case min
29:              p̂o(T | po(a) = h) = po(T) · p̂o(c | po(a) = h) / po(c)
30:              êu(T | po(a) = h) = p̂o(T | po(a) = h) · min(eu(c) / p̂o(c | po(a) = h), min_{c′ ∈ children′} eu(c′) / po(c′))
31:          end
32:      end
33:      c = T
34:  end
35:  return êu(TG | po(a) = h) + n̂le_∆
Enabling and disabling relationships are also included in the estimation. If an activity c is reached that enables another activity c_e, c_e's probability of earning utility given p̂o(c | po(a) = h) is estimated. We abbreviate this for clarity as p̂o(c_e | p̂o(c)). How this hypothetical po value for c_e would impact the overall taskgroup utility, êu_TG(c_e, p̂o(c_e | p̂o(c))), is then estimated by a recursive call (Line 8). The same process happens for disablers. We do not include facilitating or hindering relationships as they do not affect whether a method earns utility, and we expected little benefit in including them in the estimation. Our experimental results, showing the accuracy of this approximation, support this decision (Section 5.4.2). Overall, the algorithm returns the sum of the estimated taskgroup utilities (Line 35).
This algorithm, if it recurses, clearly returns a value not directly comparable to eu(TG) as it includes potentially multiple summands of it (Lines 8, 35). Instead, it is meant to be compared to another call to the estimation algorithm. So, to estimate ud, agents use this algorithm twice, with h = 1 and h = 0:

ud = êu_TG(a, 1) − êu_TG(a, 0)

This value can then be used to provide estimates of eul and eug.
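A stripped-down sketch of this bottom-up estimate for sum and max tasks follows. The tree, numbers, and helper names are invented for illustration, and the enabler/disabler recursion and remaining uaf cases are omitted:

```python
def approximate_eu_tg(tree, a, h):
    """Estimate taskgroup eu if activity a had probability h of earning
    utility, by substituting h at a and propagating up through the uafs."""
    eu_hat = h * tree[a]["avg_util"]        # Lines 1-2 of Algorithm 5.14
    c = a
    while tree[c]["parent"] is not None:    # ascend one parent at a time
        T = tree[c]["parent"]
        siblings_eu = [tree[s]["eu"] for s in tree[T]["children"] if s != c]
        if tree[T]["type"] == "max":
            eu_hat = max([eu_hat] + siblings_eu)
        elif tree[T]["type"] == "sum":
            eu_hat = eu_hat + sum(siblings_eu)
        c = T
    return eu_hat

# A hypothetical hierarchy: TG is a sum of two max tasks.
tree = {
    "TG": {"parent": None, "type": "sum", "children": ["T1", "T2"]},
    "T1": {"parent": "TG", "type": "max", "children": ["m1", "m2"], "eu": 9.6},
    "T2": {"parent": "TG", "type": "max", "children": ["m3"], "eu": 4.0},
    "m1": {"parent": "T1", "avg_util": 12.0, "eu": 9.6},
    "m2": {"parent": "T1", "avg_util": 5.0, "eu": 4.0},
    "m3": {"parent": "T2", "avg_util": 4.0, "eu": 4.0},
}
# The utility-difference estimate uses two calls, with h = 1 and h = 0:
ud = approximate_eu_tg(tree, "m1", 1.0) - approximate_eu_tg(tree, "m1", 0.0)
```

With h = 0 the max task falls back on its sibling m2, so the two calls differ by 8.0: losing m1 costs less than its full expected utility because m2 partially covers for it.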
Typically, the estimate the calculation provides is good. The main source of inaccuracy is in cases where an activity c and an activity it enables would separately cause the same task to lose utility if they fail. For example, if c and an activity it enables, c_e, were both crucial to the success of a common ancestral task, both the core algorithm and the recursive call would return a high utility difference stemming from that common ancestor. As we sum these values together, we end up double-counting the utility difference at this task, and the overall estimate is higher than it should be. Ultimately, however, we did not deem this a critical error: even though we overestimate the impact c's failure would have on the schedule, in such cases it typically has a high impact anyway.
5.3.5 Practical Considerations
When combining discrete distributions, the potential exists for generating large distributions: the size of a distribution Z = X + Y, for example, is bounded by |X| · |Y|. As described in Section 3.3.2 (on page 44), however, typically such a distribution has duplicate values which we collapse to reduce its size (i.e., if the distribution already contained an entry {5, 0.25} and PADS adds {5, 0.2} to it, the resulting distribution contains {5, 0.45}). In part because of this, we did not find distribution size to be a problem. Even so, we demonstrate how the size can be further reduced by eliminating very unlikely outcomes. We found that eliminating outcomes that occur with less than a 1% probability worked well for us; however, depending on what point in the trade-off of speed
versus accuracy is needed for a given domain, that threshold can be adjusted. The elimination did not much impair the calculation's accuracy, as is shown in the next section, and can readily be done for systems which are negatively affected by the distribution sizes.
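Both size controls, collapsing duplicates and pruning sub-threshold outcomes, can be sketched as one helper (hypothetical; redistributing the pruned mass proportionally is one reasonable policy, not necessarily the one PADS uses):

```python
def compact(dist, threshold=0.01):
    """Collapse duplicate values, then drop outcomes below the threshold,
    redistributing the removed mass proportionally over what remains."""
    merged = {}
    for v, p in dist:
        merged[v] = merged.get(v, 0.0) + p  # {5, 0.25} + {5, 0.2} -> {5, 0.45}
    kept = {v: p for v, p in merged.items() if p >= threshold}
    scale = sum(merged.values()) / sum(kept.values())
    return sorted((v, p * scale) for v, p in kept.items())

dist = compact([(5, 0.25), (5, 0.2), (7, 0.545), (9, 0.005)])
# the {9, 0.005} outcome is pruned; 5 and 7 keep rescaled masses near 0.45 and 0.55
```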
We also memoized the results of function calls to evaluateUtilityFormula. This was
found to greatly reduce the memory usage and increase the speed of updating probability proﬁles.
It is effective because many activities have identical formulas or portions of formulas (e.g., a parent
task with only one scheduled child), and because many activities’ proﬁles do not change over time.
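Memoization of this kind can be sketched with `functools.lru_cache`, keyed on a hashable representation of the formula's value assignment. This is illustrative only, shown here for a pure conjunction of independent events rather than the full evaluateUtilityFormula:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def eval_and(assignment):
    """Probability that every event in a conjunction holds; `assignment` is
    a hashable tuple of (event, probability) pairs, so results are cached."""
    p = 1.0
    for _, prob in assignment:
        p *= prob
    return p

key = (("st_m1", 1.0), ("ft_m1", 1.0), ("suc_m1", 0.8))
eval_and(key)  # computed once
eval_and(key)  # served from the cache on the repeat call
```

Because a parent task with a single scheduled child shares its child's formula exactly, such repeated lookups hit the cache instead of re-evaluating.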
In the distributed version of PADS, our approach has the potential to increase communication
costs as compared to a system not using PADS, as more information is being exchanged. One
simple way to decrease this effect is to communicate only those changes in probability proﬁles that
are “large enough” to be important, based on a threshold value. We discuss other ways to reduce
communication in our discussion of future work, Section 8.1.1.
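The thresholding idea amounts to a one-line gate on outgoing profile updates (the function name and threshold value are invented for illustration):

```python
def should_broadcast(old_eu, new_eu, threshold=0.5):
    """Send a profile update only if the expected-utility change is large enough."""
    return abs(new_eu - old_eu) >= threshold

assert should_broadcast(9.6, 8.9)       # a 0.7 shift is worth communicating
assert not should_broadcast(9.6, 9.55)  # a 0.05 shift is suppressed
```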
5.4 Experiments and Results
We performed experiments in the single-agent domain to test the speed of PADS, as well as in the multi-agent domain to test the speed and accuracy of both the global and distributed versions of PADS.
5.4.1 Single-Agent Domain

We performed an experiment to test the speed of PADS in this domain. We used as our test problems instances PSP94, PSP100 and PSP107 of the mmj20 benchmark suite of the resource-relaxed Multi-Mode Resource Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max) [Schwindt, 1998]. The problems were augmented as described in Section 3.3.1.
In the experiment, we executed the three problems 250 times each, using PADS to calculate and
maintain all activity probability proﬁles, and timed all PADS calculations. To execute the problems,
we implemented a simple simulator. Execution starts at time zero, and methods are started at their
est. To simulate method execution the simulator picks a random duration for the method based on
its duration distribution, rounds the duration to the nearest integer, advances time by that duration,
and updates the STN as appropriate. The planner is invoked both initially, to generate an initial plan, and then during execution whenever a method finishes at a time that causes an inconsistency in the STN. PADS is invoked to analyze the initial plan before execution begins, as well as to update probability profiles when execution results become known and when replanning occurs.

Table 5.1: Average time spent by PADS calculating and updating probability profiles while executing schedules in the single-agent domain.

          average         standard deviation   average
          PADS time (s)   PADS time (s)        makespan (s)
PSP94     8.65            2.25                 130.99
PSP100    14.82           7.54                 140.94
PSP107    6.43            1.68                 115.41
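The simulator loop described above can be sketched as follows (a toy version; the method list and distributions are invented for illustration):

```python
import random

def simulate(methods, rng):
    """Start each method at its est, draw a duration from its distribution,
    round it, and advance the clock; returns the executed makespan."""
    t = 0
    for est, durations in methods:  # durations: [(value, probability), ...]
        t = max(t, est)             # start at the est, or when the clock allows
        r, acc = rng.random(), 0.0
        for dur, p in durations:
            acc += p
            if r <= acc:
                t += round(dur)
                break
        # a real simulator would also update the STN and trigger replanning
    return t

makespan = simulate([(0, [(4, 0.5), (6, 0.5)]), (10, [(3, 1.0)])], random.Random(0))
# here the second method dominates, so the makespan is 13 regardless of the first draw
```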
Table 5.1 shows the results. For each problem, it shows the average time taken to perform
all PADS calculations, as well as the corresponding standard deviation. It also shows the average
executed schedule makespan. All values are shown in seconds. Clearly, PADS incurs only a modest time cost in this domain: an average overhead of 10.0 seconds, or 7.6% of the duration of execution.
5.4.2 Multi-Agent Domain

Test Suite

For these experiments, we used a test suite consisting of 20 problems taken from the Phase II evaluation test problem suite of the DARPA Coordinators program (http://www.darpa.mil/ipto/programs/coor/coor.asp). In this test suite, methods have deterministic utility (before the effects of soft NLEs), which promotes consistency in the utility of the results. Method durations were, in general, 3-point distributions. Table 5.2 shows the main features of each of the problems, roughly ordered by size. Three problems had 10 agents, 15 problems had 20 agents, and two problems had 25 agents. The problems had, on average, between 200 and 300 methods in the activity network, and each method had, on average, 2.4 possible execution outcomes. The table also shows how many max-leaf tasks, tasks overall, and NLEs of each type were present in the activity network. Recall that "max-leaf" tasks are max tasks that are the parents of methods. Typically, a deterministic planner would choose only one child of a max-leaf task to schedule. We include this information here because we also use this test suite in our probabilistic schedule strengthening experiments in the following chapter. In general, 20-30% of the total methods in the activity network were scheduled in the initial schedule of these problems, or roughly 50-90 methods.
Table 5.2: Characteristics of the 20 DARPA Coordinators test suite problems.
     agents   methods   avg # method   max-    tasks   enable   disable   facilitate   hinder
                        outcomes       leafs           NLEs     NLEs      NLEs         NLEs
1 10 189 3 63 89 18 12 0 18
2 10 225 2.4 45 67 0 0 13 0
3 10 300 2.66 105 140 50 36 33 33
4 20 126 2.95 42 59 21 2 13 10
5 20 135 2.67 52 64 22 0 11 0
6 20 135 2.67 45 64 23 0 12 0
7 20 198 2.67 66 93 38 12 21 10
8 20 225 2.64 75 105 43 10 28 8
9 20 225 5 75 106 23 12 14 11
10 20 225 5 75 106 23 12 20 19
11 20 225 5 75 106 28 12 8 13
12 20 225 5 75 106 29 13 10 13
13 20 225 5 75 106 33 11 15 17
14 20 384 1.55 96 137 33 10 20 17
15 20 384 1.58 96 137 38 23 16 15
16 20 384 1.59 96 137 32 18 17 16
17 20 384 1.59 96 137 35 23 19 14
18 20 384 1.62 96 137 47 16 19 23
19 25 460 3.01 115 146 106 28 42 47
20 25 589 3.06 153 194 81 12 60 26
Figure 5.4: Time spent calculating 20 schedules’ initial expected utilities.
Analysis of Multi-Agent Global PADS
We performed two experiments in which we characterized the running time and accuracy of global
PADS. For each problem in the test suite, we timed the calculation of the expected utility of the
initial schedule, using a 3.6 GHz Intel computer with 2GB RAM. Figure 5.4 shows how long it took
to calculate the initial expected utility for the entire activity network for each of the 20 problems.
The problems are numbered as presented in Table 5.2, and so their order is roughly correlated
to their size. The calculation is very fast, with the largest problem taking 2.1 seconds, well within an acceptable range.
Next, we ran an experiment to characterize the accuracy of the calculation. We executed the
initial schedules of each of the 20 test suite problems in a distributed execution environment to
see what utility was actually earned by executing the schedules. Each problem in the test suite
was run 25 times. In order to directly compare the calculated expected utility with the utility the schedule earns during execution, we executed the schedules as is, with no replanning or strengthening allowed. The schedule's flexible-times representation does, however, allow the time bounds of methods to float in response to execution dynamics, and we did retain basic conflict-resolution
functionality that naively repairs inconsistencies in the STN, such as removing methods from a
schedule if they miss their lst.
In the distributed execution environment, when execution begins, agents are given their local
view of the activity network and their portion of the precomputed initial joint schedule, and are
left to manage execution of their local schedules while trying to work together to maximize overall
joint schedule utility. They communicate method start commands to the simulator, receive back execution results (based on the stochastics of the problem) as they become known, and share status information asynchronously. Each agent and the simulator run on their own processor and communicate via message passing. We assume communication is perfect and sufficient bandwidth is available.

Figure 5.5: The accuracy of global PADS: the calculated expected utility compared to the average utility, when executed.
Figure 5.5 shows the experimental results, and compares the calculated expected utility with the
average utility that was actually earned during execution. The problems are numbered as presented
in Table 5.2, and for visual clarity are sorted in the graph by their calculated expected utility. Error
bars indicate the standard deviation of utility earned over the 25 execution trials.
Note that although the calculations are close to the average achieved utilities, they are not exactly the same. In addition to assumptions we make that affect the accuracy (Section 5.3.1), there are sources of inaccuracy due to aspects of the problem that we do not model. For example, methods may not start as early as possible (at their est) because of the asynchrony of execution; this accounts in large part for the inaccuracy of problem 3. Also, very rarely, when a method misses
its lst it is left on the schedule and, instead, a later method is removed to increase schedule slack,
going against our assumption that the method that misses its lst is always removed. This occurs
during the execution of problem 1, accounting for most of its inaccuracy.
Despite these differences, only 4 of the problems (1, 14, 17, and 18) have average executed utilities that are statistically different from the corresponding calculated expected utility. Problem 1 we have already discussed. Each of the other problems' 25 simulated executed utilities consists solely of two different values; this occurs, for example, when there is one high-level (and high-utility) task that has a non-zero probability of failing but all others always succeed. In cases such as these, a slight error in the calculation of the probability that that task earns utility can lead to a reasonable error in the calculated expected utility, and this is what happens for these three problems. Overall, however, as the results show, our PADS calculation is very accurate: there is an average error of only 7% between the calculated expected utility and the utility earned during execution for these schedules.
Analysis of Multi-Agent Distributed PADS
We also performed experiments to characterize the runtime and accuracy of DPADS. Recall that
when execution begins, agents have a ﬁxed period of time to perform initial computations, such
as calculating initial probability proﬁles, before they actually begin to execute methods. The ﬁrst
experiment measured how much time, during this period, agents spent on PADS from when they
received their initial schedule until all probability proﬁles for the entire activity network were
calculated. The experiment was performed on the 20 problem test suite described above. Agent
threads were distributed among ten 3.6 GHz Intel computers, each with 2GB RAM. Figure 5.6
shows the average computation time each agent spent distributively calculating the initial expected
utilities for each problem, as well as the total computation time spent across agents calculating
proﬁles for that problem. On average, the clock time that passed between when execution began
and when all proﬁles were ﬁnished being calculated was 4.9 seconds. In this ﬁgure, the problems
are numbered as presented in Table 5.2.
Comparing this graph with Figure 5.4, we see that distributing the PADS algorithm does incur some overhead. This is due to probability profiles on an agent needing to be updated multiple times as profile information arrives asynchronously from other agents and affects already-calculated profiles. Even considering this, however, the average time spent per agent calculating probability profiles is quite low, and suitable for a runtime algorithm.
Once agents begin executing methods and execution progresses, the workload in general is not as high as these results suggest, since profiles are only updated on an as-needed basis. On average across these problems, once method execution has begun, individual PADS updates take an average of
Figure 5.6: Time spent distributively calculating 20 schedules' initial expected utilities, shown as both the average time per agent and the total time across all agents.
0.003 seconds (with standard deviation σ = 0.01). Overall, each agent spends an average of
1.5 seconds updating probabilities. In contrast, each individual replan takes an average of 0.309
seconds (σ = 0.39) with agents spending an average of 47.7 seconds replanning. Thus the overhead
of distributed PADS is insigniﬁcant when compared to the cost of replanning.
Because there are no approximations in our initial calculation of the probability profiles of the
activity tree, we do not duplicate our accuracy results for distributed PADS. We did, however, perform an
experiment that tested the accuracy of our utility difference approximation. For each method
scheduled in each of the 20 test suite problems, we calculated the utility difference exactly (using
the method described in Section 5.3.3) and approximately (using the approximation algorithm
described in Section 5.3.4). Overall, 1413 methods were used in the experiment. A repeated-measures
ANOVA showed a weakly significant difference between the two calculations (p <
0.07); however, the utility difference approximations were, on average, within 4.58% of the actual
utility difference.
5 Probabilistic Analysis of Deterministic Schedules
Chapter 6
Probabilistic Schedule Strengthening
This chapter discusses how to effectively strengthen schedules by utilizing the information provided
by PADS. The next section describes the approach at a high level. The two sections following
that provide detailed descriptions of the algorithms, and can be skipped by the casual reader.
Finally, Section 6.4 shows probabilistic schedule strengthening experimental results.
6.1 Overview
The goal of probabilistic schedule strengthening is to make improvements to the most vulnerable
portions of the schedule. It relies on PADS to target, or focus strengthening efforts on, "at-risk"
activities (activities in danger of failing) whose failure impacts taskgroup utility the most. It
then takes steps to strengthen the schedule by protecting those activities against failure. Probabilistic
schedule strengthening also utilizes PADS to ensure that all strengthening actions raise the
taskgroup expected utility. We implemented probabilistic schedule strengthening for the multiagent
domain [Hiatt et al., 2008, 2009]. In this domain, schedule strengthening is performed both
on the initial schedule before it is distributed to agents, and during distributed execution in conjunction
with replanning.
At a high level, schedule strengthening works within the structure of the current schedule to
improve its robustness and minimize the number of activity failures that arise during execution. In
this way it is somewhat dependent on the quality of the schedules generated by the deterministic
planner. It is most effective if domain knowledge can be leveraged to find the best ways
to make the schedule more robust. In the multiagent domain, for example, a common structural
feature of the activity networks is to have max tasks as the direct parents of several leaf node
methods. These sibling methods are typically of different durations and utilities, and often belong
to different agents. For example, a search and rescue domain might have "Restore Power" modeled
as a max task, with the option of fully restoring power to a town (higher utility and longer duration)
or restoring power only to priority buildings (lower utility and shorter duration). We refer to these
max tasks as "max-leafs." Typically, a deterministic planner would choose only one child of a
max-leaf task to schedule. Probabilistic schedule strengthening, in contrast, might schedule an
additional child, or substitute one child for another, in order to make the schedule more robust.
Given our knowledge of the multiagent domain's characteristics, such as the presence of max-leaf
tasks, we have investigated three strengthening tactics for this domain:
1. "Backfilling" - Scheduling redundant methods under a max-leaf task to improve the task's
likelihood of earning utility (recall that max tasks accrue the utility of the highest-utility child
that executes successfully). This decreases schedule slack, however, which may lead to a loss
of utility.
2. "Swapping" - Replacing a scheduled method with a currently unscheduled sibling method
that has a lower a priori failure likelihood, a shorter duration or a different set of NLEs.¹ This
potentially increases the parent's probability of earning utility by lowering the likelihood of
a failure outcome, or potentially increases overall schedule slack and the likelihood that this
(or another) method earns utility. Swapping may lead to a loss of utility, however, as the
sibling may have a lower average utility, or a set of NLEs that is less desirable.
3. "Pruning" - Unscheduling a method to increase overall schedule slack, possibly strengthening
the schedule by increasing the likelihood that remaining methods earn utility. This can
also lead to utility loss, however, as the removed method will not earn utility.
Note that each of these tactics has an associated trade-off. PADS allows us to explicitly address
this trade-off to ensure that all schedule modifications improve the taskgroup expected utility. A
priori, we expect backfilling to improve expected utility the most, as it preserves the higher-utility
option while still providing a backup in case that higher-utility option fails. Swapping is expected
to be the next most useful, as the task remains scheduled (although possibly at a lower utility).
Pruning is expected to be the weakest individual tactic. Our results (Section 6.4.2) support these
conjectures.
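To see why backfilling raises a max-leaf task's expected utility, consider a task whose children fail independently. The sketch below is illustrative only: the failure probabilities and utilities are invented, independence is an assumption, and `max_task_eu` is a simplification of the domain's max-task semantics (the task earns the utility of the highest-utility child that succeeds), not the actual PADS computation:

```python
def max_task_eu(children):
    """Expected utility of a max task over its scheduled children.

    children: list of (failure_probability, utility) pairs. Assumes
    independent failures and that the task earns the utility of the
    highest-utility child that executes successfully.
    """
    eu, p_all_better_failed = 0.0, 1.0
    # Walk children from highest to lowest utility; a child contributes
    # only when every higher-utility sibling has failed.
    for p_fail, utility in sorted(children, key=lambda c: -c[1]):
        eu += p_all_better_failed * (1.0 - p_fail) * utility
        p_all_better_failed *= p_fail
    return eu

# One scheduled child: fully restore power (high utility, but risky).
single = max_task_eu([(0.3, 10.0)])                   # 0.7 * 10 = 7.0
# Backfilled: also schedule the safer "priority buildings" option.
backfilled = max_task_eu([(0.3, 10.0), (0.1, 4.0)])   # 7.0 + 0.3*0.9*4 = 8.08
```

The redundant child raises the task's expected utility; whether the backfill is worthwhile overall still depends on the slack it consumes, which is why PADS evaluates the whole taskgroup before committing.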
Activities to strengthen are identified and prioritized differently for the three tactics. For backfilling,
max-leaf tasks that have a non-zero likelihood of failing are targeted, and methods are
¹ The reason to consider siblings with a different set of NLEs is that they may change the effective time bounds
of the method (via different hard NLEs) or its effective duration (via soft NLEs).
added to reduce this failure probability. Parent tasks are prioritized for backfilling by their expected
utility loss (Section 5.3.2, on page 84). Recall that this metric specifies how much utility the
overall taskgroup is expected to lose because of a task earning zero utility. As schedules can support
only a limited amount of backfilling before they become too saturated with methods, prioritizing
the tasks in this way ensures that tasks whose expected failure affects taskgroup utility the most are
strengthened first and have the best chance of being successfully backfilled. While this does not
always result in optimal behavior (for example, if backfilling the activity with the highest eul prevents
backfilling two subsequent ones, which could have led to a higher overall expected utility),
we found it to be an effective heuristic in prioritizing activities for strengthening.
For both swapping and pruning, "chains" of methods are first identified where methods are
scheduled consecutively and threaten to cause one or more methods to miss their deadlines. Methods
within a chain are sorted by increasing expected utility gain (Section 5.3.2). Recall that this metric
specifies how much utility the overall taskgroup is expected to gain by a task earning positive
utility. Sorting methods in this way ensures that low-importance methods, which do not contribute
much to the schedule, are removed or swapped first to try to increase overall schedule slack. The
chains themselves are prioritized by decreasing expected utility loss so that the most important
potential problems are solved first. Methods with non-zero likelihoods of a failure outcome are
also considered candidates for swapping; such methods are identified and incorporated into the list
of chains, above. As with backfilling, using the eul and eug heuristics is not guaranteed to result
in optimal behavior, but we found them to be effective when used with our swapping and pruning
tactics.
Every time a tactic attempts to modify the schedule, it takes a step. A step of any tactic
begins by sorting the appropriate activities in the ways specified above. The step then considers
strengthening each activity sequentially until a valid strengthening action is made. An action is
valid if (1) it is successful (i.e., a method is successfully scheduled, swapped or unscheduled) and
(2) it raises the taskgroup's expected utility. When a valid action is found, the step returns without
trying to modify the schedule further. A step also returns if no valid action was found after all
possible actions were considered (e.g., all max-leaf tasks have been looked at but no valid backfill
is found). The main computational cost of schedule strengthening comes from updating probability
profiles while performing strengthening during these steps.
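The step pattern just described — sort candidates, tentatively act, re-evaluate with PADS, then commit or undo — is common to all three tactics and can be sketched generically. Every name below is a hypothetical stand-in for the planner and PADS interfaces, not the actual implementation:

```python
def do_one_step(candidates, priority_key, try_action, undo_action,
                update_profiles, taskgroup_eu):
    """Generic strengthening step: commit the first valid action.

    An action is valid if it succeeds AND raises taskgroup expected
    utility; otherwise it is rolled back and the next candidate is tried.
    """
    baseline = taskgroup_eu()                 # eu(TG) before any change
    for activity in sorted(candidates, key=priority_key):
        if not try_action(activity):          # e.g., schedule/swap/unschedule
            continue                          # planner could not apply it
        update_profiles()                     # recompute probability profiles
        if taskgroup_eu() > baseline:
            return activity                   # valid action found: stop here
        undo_action(activity)                 # EU did not improve: roll back
        update_profiles()
    return None                               # no valid action exists
```

Each concrete tactic instantiates this loop with its own candidate set (max-leaf tasks, chains) and priority key (eul or eug).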
Given the three tactics introduced above, we next address the question of the order in which
they should be applied. The ordering of tactics can affect the effectiveness of probabilistic schedule
strengthening because the tactics are not independent: performing a step of one tactic, such as unscheduling
a method, may make another tactic, such as swapping that method, infeasible. We call
an approach to determining how tactics can be applied to perform probabilistic schedule strengthening
a strategy.

Figure 6.1: The baseline strengthening strategy explores the full search space of the different
orderings of backfills, swapping and pruning steps that can be taken to strengthen a schedule.

The baseline "pseudo-optimal" strategy that finds the best possible combination
of these actions considers each possible order of backfilling, swapping and pruning steps, finds the
taskgroup expected utility that results from each ordering, and returns the sequence of steps that
results in the highest expected utility (Figure 6.1). This strategy, however, can take an unmanageable
amount of time, as it is exponential in the number of possible strengthening actions. Although
its practical use is infeasible, we use it as a baseline to compare more efficient strategies against.
The high-level strategy we use is a variant of a round-robin strategy. Agents perform one
backfill step, one swap step and one prune step, in that order, repeating until no more actions are
possible. This strategy is illustrated in Figure 6.2. Two additional heuristics are incorporated in
order to improve the strategy. The first prevents swapping methods for lower-utility methods (to
lower the a priori failure likelihood) while backfilling the parent max-leaf task is still possible.
The second prevents pruning methods that are a task's sole scheduled child, and thus pruning entire
branches of the taskgroup, when a backfill or swap is still possible. These heuristics help to
prioritize the tactics in order of their individual effectiveness, leading to a more effective probabilistic
schedule strengthening strategy. If other strengthening actions were being used, analogous
heuristics could be used to prioritize the new set in its order of effectiveness. Although we found
these heuristics to be very effective at increasing the amount of expected utility gained by performing
round-robin schedule strengthening, they are not guaranteed to be optimal. For example, if a
backfill was performed instead of a prune to bolster the expected utility of a task, it is possible that
backfilling some other set of tasks may then not be possible, which may have led to a higher overall
expected utility.
We also experimented with a greedy heuristic variant that performs one “trial” step of each
tactic, and commits to the step that raises the expected utility the most, repeating until no more
Figure 6.2: The round-robin strengthening strategy performs one backfill, one swap and one prune,
repeating until no more strengthening actions are possible.
Figure 6.3: The greedy strengthening strategy repeatedly performs one trial backﬁll, one trial swap
and one trial prune, and commits at each point to the action that raises the expected utility of the
joint schedule the most.
steps are possible (Figure 6.3). It utilizes the same two heuristics as the round-robin strategy. In
general, it performs roughly as well as the round-robin strategy, but with a higher running time,
as it performs up to three trial actions for each successful strengthening action; the round-robin
strategy, in contrast, performs only one.
During distributed execution, probabilistic schedule strengthening and replanning work well
when used together, as they ensure that the current schedule takes advantage of opportunities that
arise while also being robust to execution uncertainty; ideally, therefore, replanning would not be
performed without being followed by probabilistic schedule strengthening. Schedule strengthening
may also be useful when performed alone if the schedule has changed since the last time it
was performed, as the changes may have made it possible to strengthen activities that could not be
strengthened before, or they may have weakened new activities that could be helped by strengthening.
Agents operate independently and asynchronously during execution. This means that actions
such as replanning and probabilistic schedule strengthening are also performed asynchronously.
Coordination is required, therefore, to prevent agents from simultaneously performing duplicate
actions needlessly and/or undesirably (e.g., two agents backﬁlling the same task when one backﬁll
would sufﬁce). To address this, agents communicate a “strengthening status” for each local method
indicating whether it is scheduled, whether it is unscheduled and the agent has not tried to backﬁll it
yet, or whether the agent has tried to backﬁll it but the backﬁll action was invalid. Agents use this
information to reduce these unwanted situations and to inform the two heuristics of round-robin
strengthening described above.
Recall also that, as agents make changes to their own probability proﬁles, they do not explicitly
wait for or keep track of the effects these changes have on the rest of the hierarchy, even though
changes resulting from their update may eventually propagate back to them. In a domain with
high executional uncertainty such as this one, doing so would be very difﬁcult because typically
many probability profiles are changing concurrently. Instead, agents replan and strengthen with
the most complete information they have. Although this of course has the potential to lead to suboptimal
behavior, the situation is alleviated because agents have the option of replanning and strengthening
again later if their actions become unsuitable for the updated situation.
6.2 Tactics
The next three sections describe the tactics in detail, for both the global and distributed versions of
our multiagent domain.
6.2.1 Backﬁlling
Global
Algorithm 6.1 shows pseudocode for a step of the global backfilling tactic. The step performs
one backfill, if possible, and then returns. It begins by sorting all scheduled max-leaf tasks T
where po(T) < 1 by eul(T) (Line 2), targeting improvements at max-leaf tasks that might fail.
We define the function maxLeafs to return scheduled max-leaf tasks. Additionally, in this and
subsequent algorithms we assume sort takes as its arguments the set of activities to sort, the sort
order, and the field of the activity to sort by. Sorting in this way ensures that tasks whose expected
failure leads to the highest utility loss are strengthened first.
Once the max-leaf tasks are sorted, the step stores the original eu(TG) value (Line 3) to use
later for comparison. Then, each child of each task is sequentially considered for backfilling. If a
child is unscheduled, and the step is able to schedule it via a call to the planner doschedule
(Line 6), then the activity hierarchy's probability profiles are updated (Line 7). Next, the step
compares the taskgroup's new eu value to the original (Line 8). If it is higher, the step returns the
successfully backfilled child c. If not, it undoes the schedule (Lines 11-12) and continues looping
through the tasks and children. If no successful backfill is found once all max-leaf tasks have
been considered, the step returns null (Line 17).
Algorithm 6.1: Global backfilling step utilizing max-leaf tasks.
Function: doOneGlobalBackfill
1:  MLT = maxLeafs()
2:  MLT_s = sort({T ∈ MLT | po(T) < 1}, >, eul)
3:  eu_TG = eu(TG)
4:  foreach T ∈ MLT_s do
5:    foreach c ∈ children(T) do
6:      if unscheduled(c) and doschedule(c) then
7:        updateAllProfiles()
8:        if eu(TG) > eu_TG then
            // expected utility increased, commit to backfill
9:          return c
10:       else
            // expected utility decreased or stayed the same, undo backfill
11:         undo(doschedule(c))
12:         updateAllProfiles()
13:       end
14:     end
15:   end
16: end
17: return null
This step, as well as the steps for the swapping and pruning tactics (described below), can be
modified in several simple ways to suit the needs of different domains. Threshold values for failure
probabilities could be utilized, if desired, to limit computation. Also, if a domain has an effective
way to sort the child methods of a max-leaf task for backfilling, that could be added at
Line 5; in our domain, the order of the methods considered for backfilling and swapping was found
to be unimportant.
Distributed
The distributed backfilling step has several complications. First, as agents no longer have access to
the entire activity network in order to calculate eul values, the step must instead use the ˆeul approximation
(Section 5.3.4). Second, updateAllProfiles can no longer update profiles throughout the
entire task tree, and the step must instead rely only on updating local probabilities via the function
updateLocalProfiles. It follows that the change in taskgroup expected utility that results
from a backfill can only be estimated, which can be done using the same approximation technique
used to approximate eul and eug. Third, the maxLeafs function returns only max-leafs that the
agent knows about (whether local or remote), and their children must be checked to see if they
are local before they can be considered by an agent. Fourth, agents strengthen their local
schedules asynchronously, and so coordination is required to prevent agents from simultaneously
performing duplicate actions needlessly and/or undesirably.
To address the fourth issue, agents communicate a "strengthening status" for each method:
• no-attempt: the method is unscheduled and the owning agent has not yet tried to schedule
it as part of a backfill
• failed-backfill: the method is unscheduled because the owning agent tried, but failed, to
schedule it as part of a backfill
• scheduled: the method is scheduled
The strengthening status is communicated along with probability profiles during execution. Agents
use this information to implicitly and partially coordinate schedule strengthening during execution.
Namely, given a predefined sort order for max-leaf children, agents will not try to backfill a task
with their own methods if there is a method earlier in the list of the task's children whose owner has
not yet attempted to use it to backfill the task. This helps to prevent the duplicate actions mentioned
above, as the first time that a max-leaf task is distributively strengthened all of its children are
scheduled (or attempted to be scheduled) in order. It is still possible for duplicate strengthening
actions to be performed, such as if two agents simultaneously revisit the max-leaf task on a later
schedule strengthening pass and both succeed in backﬁlling it; however, in practice we rarely
see this occur. We deﬁne the function status to return a method’s strengthening status. The
strengthening status is also used to inform the heuristics for some of our strengthening strategies,
described in Section 6.3.
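The coordination rule can be sketched as a simple predicate over the shared statuses. The function and accessor names here are hypothetical; the status values are those defined above:

```python
def may_attempt_backfill(task_children, statuses, my_methods):
    """Return the first child this agent may try to backfill, or None.

    task_children: the max-leaf's children in the predefined sort order.
    statuses: map from child to 'no-attempt', 'failed-backfill' or
    'scheduled'. An agent defers when an earlier unscheduled child
    belongs to another agent who has not yet attempted a backfill.
    """
    for child in task_children:
        status = statuses[child]
        if status == "scheduled":
            continue                  # already covered by a backfill
        if child in my_methods:
            return child              # our turn: try this child next
        if status == "no-attempt":
            return None               # defer to the earlier child's owner
        # remote child whose backfill already failed: keep looking
    return None
```

This encodes the "in order" rule from the text: an agent only acts once every earlier child has been scheduled or has had its backfill attempted by its owner.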
Algorithm 6.2 presents the pseudocode for a distributed backfill step. It begins by sorting all
max-leaf tasks by their estimated eul values (Line 2). The step next stores the original eu(TG)
value (Line 3) to use later for comparison. Then, each child of each task is sequentially considered
for backfilling. If a child is not local and its owner has not yet attempted to backfill it (i.e., its
strengthening status is no-attempt), the step skips this max-leaf task and continues to the next. This
ensures that methods are backfilled (or attempted to be backfilled) in order, discouraging concurrent
backfilling. If the method is local, unscheduled, and the algorithm is able to schedule it via a call
to the planner doschedule (Line 8), then the agent's local activity profiles are updated (Line 9).

Next, the step looks at each method in the agent's local schedule. It estimates the change in
expected utility of the taskgroup by summing the estimated change resulting from differences in
each method's profiles, if any; this ensures that any effects a backfill has on other methods in the
schedule (e.g., pushing them towards a deadline) are accounted for (Line 11). If the expected utility
is estimated to be higher, the step returns the successfully backfilled child c. If not, it undoes the
schedule (Lines 16-17) and continues looping through the tasks and their children. If no successful
backfill is found once all max-leaf tasks have been considered, the step returns null (Line 22).
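The summation used to estimate the expected-utility change can be sketched as follows. The names are illustrative: `approximate_eu` stands in for the approximation technique of Section 5.3.4, and `po` for the method's probability profile accessor:

```python
def estimate_delta_eu(local_schedule, po, approximate_eu, eu_tg):
    """Estimate the taskgroup EU change after a tentative backfill.

    Sums, over every method on the local schedule, the difference between
    the taskgroup EU implied by the method's new probability profile and
    the stored pre-backfill taskgroup EU. This captures side effects of
    the backfill on other methods (e.g., being pushed toward a deadline).
    """
    return sum(approximate_eu(m, po(m)) - eu_tg for m in local_schedule)
```

A positive estimate causes the step to commit to the backfill; otherwise the backfill is undone.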
6.2.2 Swapping
Global
A step of the swapping tactic performs one swap, if possible. The step begins by identifying
methods that are candidates for swapping. Swapping can serve to strengthen the schedule in two
different ways: it can include methods with lower a priori failure outcome likelihoods, and it can
increase schedule slack to raise the likelihood that this (or another) method meets a deadline.
Algorithm 6.3 shows pseudocode for a step of the global swapping tactic. The step first finds
chains of methods, scheduled back-to-back on the same agent, that have overlapping execution
intervals, where at least one method in the chain has a probability of missing a deadline (Line 1).
It sorts the methods m within the chains by increasing eug(m), so that methods that are expected
to contribute the least to the overall utility are swapped first (Line 2).

To this set of chains, the step adds scheduled methods that have an a priori likelihood of a
failure outcome (Lines 3-5). It identifies the maximum eul value of the methods in each group, and sorts
Algorithm 6.2: Distributed backfilling step utilizing max-leaf tasks.
Function: doOneDistributedBackfill
1:  MLT = maxLeafs()  // all max-leafs the agent knows about, whether local or remote
2:  MLT_s = sort({T ∈ MLT | po(T) < 1}, >, ˆeul)
3:  eu_TG = eu(TG)
4:  foreach T ∈ MLT_s do
5:    foreach c ∈ children(T) do
6:      if not(local(c)) and status(c) == no-attempt then
7:        continue  // continue to next task
8:      else if local(c) and unscheduled(c) and doschedule(c) then
9:        updateLocalProfiles()
10:       foreach m ∈ localSchedule() do
            // estimate how m's new po value impacts TG expected utility
11:         ¯∆eu = ¯∆eu + (approximateEU(m, po(m)) − eu_TG)
12:       end
13:       if ¯∆eu > 0 then
            // expected utility increased, commit to backfill
14:         return c
15:       else
            // expected utility decreased, undo backfill
16:         undo(doschedule(c))
17:         updateLocalProfiles()
18:       end
19:     end
20:   end
21: end
22: return null
the resulting groups by decreasing maximum eul, so that groups whose worst expected failure
leads to the most utility loss are considered first (Line 9). Next, the step iterates through the groups
and the methods in the groups to find candidates for swapping (Lines 11-12).
A method and its sibling are candidates for swapping if they are owned by the same agent
and the sibling is unscheduled. In addition, the sibling must be considered a good candidate for
swapping. If the original method was considered for swapping because it has a failure likelihood,
the sibling method must have a lower failure likelihood. If the original method was identified
because it was part of a chain, the sibling must meet one of two criteria to be considered a candidate:
(1) the sibling has a shorter average duration, so swapping may increase schedule slack; or (2) the
sibling has a different set of NLEs, so swapping might increase schedule slack due to the different
temporal constraints (Lines 13-18). If the two methods meet these criteria, the planner is called to
attempt to swap them. If the swap is successful, the activity hierarchy's probability profiles
are updated, and the taskgroup's new eu value is compared to the original (Lines 19-20). If it is
higher, the step returns the successfully swapped methods (Line 21). If not, it undoes the swap and
continues looping through the methods and chains (Lines 23-24). If no successful swap is found
once all chains have been considered, the step returns the empty set {} (Line 30).
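The candidacy condition just described can be summarized as a predicate. This is a sketch with illustrative accessor names passed in as parameters; it mirrors the condition in the global swapping pseudocode rather than reproducing the actual implementation:

```python
def is_swap_candidate(m, sibling, from_failure_group,
                      owner, unscheduled, fail, avg_dur, nles):
    """Swap-candidacy test for a method m and one of its siblings.

    The sibling must be co-owned and unscheduled. Beyond that, a method
    flagged for its failure likelihood needs a safer sibling, while a
    method flagged as part of a deadline chain needs a shorter or
    differently constrained sibling.
    """
    if owner(sibling) != owner(m) or not unscheduled(sibling):
        return False
    if from_failure_group:
        return fail(sibling) < fail(m)      # lower a priori failure risk
    return (avg_dur(sibling) < avg_dur(m)   # may increase schedule slack
            or nles(sibling) != nles(m))    # different temporal constraints
```

As in the tactic itself, passing this test only makes the pair eligible; the swap is kept only if PADS confirms an expected-utility gain.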
Distributed
The adaptation of this step to make it distributed is analogous to changes made for the distributed
version of the backﬁlling step, where localSchedule is called instead of jointSchedule,
not all proﬁles can be updated at once, and eul, eug and the change in expected utility of the
taskgroup have to be approximated.
6.2.3 Pruning
Global
Algorithm 6.4 shows pseudocode for a step of the global pruning tactic. The step performs one
prune, if possible. It begins by finding chains of methods on agents' schedules that have overlapping
execution intervals, exactly as in the swapping algorithm (Line 1). Also as with swapping, it
sorts the methods m within the chains by increasing eug, and the chains themselves by decreasing
eul (Lines 2-3). Next, the step iterates through the chains and each method in the chains to find
methods that can be unscheduled (Lines 5-7). When it reaches a method that can be unscheduled,
the activity hierarchy's probability profiles are updated, and the new eu(TG) value is compared
to the original (Lines 8-9). If it is higher, the step returns the successfully pruned method
Algorithm 6.3: Global swapping tactic for a schedule with failure likelihoods and NLEs.
Function: doOneGlobalSwap
1:  C = findChains(jointSchedule())
2:  foreach c ∈ C do c ← sort(c, <, eug)
3:  foreach m ∈ jointSchedule() do
4:    if fail(m) > 0 then
5:      push({m}, F)
6:    end
7:  end
8:  G = C ∪ F
9:  G = sort(G, >, eul)
10: eu_TG = eu(TG)
11: foreach g ∈ G do
12:   foreach m ∈ g do
13:     foreach m′ ∈ siblings(m) do
14:       if owner(m′) = owner(m) and unscheduled(m′)
15:         and ((g ∈ F and fail(m′) < fail(m))
16:         or (g ∈ C and
17:         (averageDur(m′) < averageDur(m) or NLEs(m′) ≠ NLEs(m))))
18:         and doswap(m, m′) then
19:         updateAllProfiles()
20:         if eu(TG) > eu_TG then
21:           return {m, m′}
22:         else
23:           undo(doswap(m, m′))
24:           updateAllProfiles()
25:         end
26:       end
27:     end
28:   end
29: end
30: return {}
(Line 10). If not, it undoes the unschedule and continues looping through the chains and methods
(Lines 12-13). If no successful prune is found once all chains have been considered, the step
returns null (Line 18).
Algorithm 6.4: Global pruning step.
Function: doOneGlobalPrune
1:  C = findChains(jointSchedule())
2:  foreach c ∈ C do c ← sort(c, <, eug)
3:  G = sort(C, >, eul)
4:  eu_TG = eu(TG)
5:  foreach g ∈ G do
6:    foreach m ∈ g do
7:      if dounschedule(m) then
8:        updateAllProfiles()
9:        if eu(TG) > eu_TG then
10:         return m
11:       else
12:         undo(dounschedule(m))
13:         updateAllProfiles()
14:       end
15:     end
16:   end
17: end
18: return null
Distributed
The adaptation of this step to make it distributed is analogous to changes made for the distributed
version of the backﬁlling step, where localSchedule is called instead of jointSchedule,
not all proﬁles can be updated at once, and eul, eug and the change in expected utility of the
taskgroup have to be approximated.
6.3 Strategies
Given these three strengthening tactics, the next step is to ﬁnd a strategy that speciﬁes the order in
which to apply them. The ordering of tactics can affect the effectiveness of probabilistic schedule
strengthening because the tactics are not independent: performing an action of one tactic, such
as unscheduling a method, may make another tactic, such as swapping that method, infeasible.
Similarly, performing an action of one tactic, such as swapping sibling methods, may increase
schedule slack and make another tactic, such as backfilling a max-leaf task, possible.

In order to develop a good strategy for probabilistic schedule strengthening, we first consider
a search-based strategy that optimizes the application of the tactics. This baseline, "pseudo-optimal"
solution considers each possible order of backfilling, swapping and pruning steps, finds
the taskgroup expected utility that results from each ordering, and returns the sequence of steps
that results in the highest expected utility. This strategy is not optimal, since the ordering of
activities within each tactic is just a heuristic; however, as the strategy is exponential in the number
of possible strengthening actions, it quickly grows to take an unmanageable amount of time.
Although this makes its own use infeasible, we use it as a baseline against which to compare more
efficient strategies.
Algorithm 6.5 shows the pseudocode for globally finding the best possible ordering of tactics
for the baseline strategy. It performs a depth-first search to enumerate all possible orderings, recursing
until no more strengthening actions are possible. It returns the highest expected utility it found
for the schedule, as well as the ordering of tactics that produced that schedule. As we are using
this strategy as a baseline, it is not necessary to show how it would be implemented distributively.
We implemented two more efficient strategies to compare against the baseline strategy. One
is a variant of a round-robin strategy, shown in Algorithm 6.6. The global version of the strategy
performs one backfilling step, one swapping step and one pruning step, repeating until no more
steps are possible. Two additional heuristics are incorporated. The first concerns swapping methods
to find a lower a priori failure likelihood. The heuristic specifies that a swap cannot take place
for a method unless the method's parent is not eligible for backfilling, where not eligible means
that either its po value equals 1 or backfilling has already been attempted on all of the method's
siblings (i.e., no siblings have a strengthening status of no-attempt). This heuristic prevents swapping
methods for lower-utility methods while backfilling a parent max-leaf task (and preserving a
higher-utility option) is still possible. It is incorporated as part of the swapping tactic; we do
not show the revised algorithm.
The second heuristic is that a prune cannot take place unless either (1) the candidate for
pruning is not the only scheduled child of its parent, or (2) no swap or backfill of any method is
possible at the current point in the algorithm. This heuristic prevents pruning methods, and thus
entire branches of the taskgroup, when a backfill or swap is still possible and would better preserve
the existing task structure. Again, this is added as part of the pruning tactic; we do not show the
revised algorithm.
Algorithm 6.5: Global baseline strengthening strategy utilizing backfills, swaps and prunes.
Function: baselineStrengthening
1:  b = doOneBackfill()
2:  if b then
3:    eu_b, orderings_b = baselineStrengthening()
4:    undo(doschedule(b))
5:    updateAllProfiles()
6:  end
7:  {s, s′} = doOneSwap()
8:  if {s, s′} then
9:    eu_{s,s′}, orderings_{s,s′} = baselineStrengthening()
10:   undo(doswap(s, s′))
11:   updateAllProfiles()
12: end
13: p = doOnePrune()
14: if p then
15:   eu_p, orderings_p = baselineStrengthening()
16:   undo(dounschedule(p))
17:   updateAllProfiles()
18: end
19: if b or {s, s′} or p then
20:   switch max(eu_b, eu_{s,s′}, eu_p) do
21:     case eu_b: return eu_b, {b} ∪ orderings_b
22:     case eu_{s,s′}: return eu_{s,s′}, {{s, s′}} ∪ orderings_{s,s′}
23:     case eu_p: return eu_p, {p} ∪ orderings_p
24:   end
25: end
26: return eu(TG), {}
Algorithm 6.6: Global round-robin strengthening strategy utilizing backfills, swaps and prunes.
Function: roundRobinStrengthening
1: while true do
2:   b = doOneBackfill()
3:   {s, s′} = doOneSwap()
4:   p = doOnePrune()
5:   if not(b or {s, s′} or p) then return
6: end
The distributed version of this strategy differs only in how the distributed versions of the backfilling,
swapping and pruning functions are invoked. The overall structure and heuristics remain the
same. As the details are purely implementational, we do not provide them here.
The other strategy is a greedy variant, shown in Algorithm 6.7. This strategy is similar to
hill-climbing: it performs one "trial" step of each tactic, records the increase in expected
utility that results from each (Lines 2-7), and commits to the step that raises the expected utility the most
(Lines 9-20), repeating until no more steps are possible. This greedy strategy also incorporates
the same two heuristics as the round-robin strategy. In general, it performed the same as the
round-robin strategy in our preliminary testing, but at a much higher computational cost; therefore
we omit discussion of its distributed counterpart.
Algorithm 6.7: Global greedy strengthening strategy utilizing backfills, swaps and prunes.
Function: greedyStrengthening
 1  while true do
 2      b = doOneBackfill(), eu_b = eu(TG)
 3      if b then undo(doschedule(b)), updateAllProfiles()
 4      ⟨s, s′⟩ = doOneSwap(), eu_{s,s′} = eu(TG)
 5      if ⟨s, s′⟩ then undo(doswap(s, s′)), updateAllProfiles()
 6      p = doOnePrune(), eu_p = eu(TG)
 7      if p then undo(dounschedule(p)), updateAllProfiles()
 8      if b or ⟨s, s′⟩ or p then
 9          switch max(eu_b, eu_{s,s′}, eu_p) do
10              case eu_b
11                  doschedule(b)
12              end
13              case eu_{s,s′}
14                  doswap(s, s′)
15              end
16              case eu_p
17                  dounschedule(p)
18              end
19          end
20          updateAllProfiles()
21      else
22          return
23      end
24  end
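The greedy variant can be sketched the same way: each iteration trials one step per tactic, rolls every trial back, and commits only the single best step. `ToySchedule` and its per-action utility deltas below are illustrative stand-ins for the real STN and PADS machinery, not the thesis implementation.

```python
class ToySchedule:
    """Toy stand-in for the real schedule: each strengthening action
    maps to the change in expected utility it contributes once applied."""
    def __init__(self, deltas):
        self.deltas, self.applied = deltas, []

    def eu(self):                        # stand-in for eu(TG) from PADS
        return sum(self.deltas[a] for a in self.applied)

    def try_action(self, kind):          # stand-in for doOneBackfill() etc.
        for name in self.deltas:
            if name.startswith(kind) and name not in self.applied:
                self.applied.append(name)
                return name
        return None

    def apply(self, action):
        self.applied.append(action)

    def undo(self, action):
        self.applied.remove(action)


def greedy_strengthening(schedule, kinds=("backfill", "swap", "prune")):
    """Per iteration: trial one step of each tactic, record the resulting
    expected utility, roll the trial back, then commit the single best
    step; stop when no tactic can produce a valid step."""
    while True:
        best = None
        for kind in kinds:
            action = schedule.try_action(kind)   # trial step
            if action is None:
                continue
            eu = schedule.eu()
            schedule.undo(action)                # roll the trial back
            if best is None or eu > best[0]:
                best = (eu, action)
        if best is None:
            return
        schedule.apply(best[1])                  # commit the winning step
```

With toy deltas `{"backfill-a": 10.0, "swap-a": 4.0}`, the loop commits the backfill first and the swap second, mirroring the trial-and-commit structure of Algorithm 6.7.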
6.4 Experimental Results
We used the DARPA test suite of problems described in Section 5.4.2 (on page 98) for the probabilistic schedule strengthening experiments. We ran four experiments: one testing the effectiveness of using PADS to target and prioritize activities for schedule strengthening; one testing the effectiveness of the various schedule strengthening strategies; one testing how schedule utility improves when executing pre-strengthened schedules; and one measuring the effect of probabilistic schedule strengthening when performed in conjunction with replanning during execution.
6.4.1 Effects of Targeting Strengthening Actions
We ran an experiment to test the effectiveness of using PADS to target and prioritize activities when performing probabilistic schedule strengthening [Hiatt et al., 2008]. For simplicity, the only tactic considered in this experiment was backfilling. We compared three different classes of schedules: (1) the original schedule generated by the deterministic, centralized planner; (2) the schedule generated by performing backfilling on the original schedule, using PADS to target and prioritize activities (for clarity we refer to this as probabilistic backfilling); and (3) the schedule generated by performing unconstrained backfilling on the original schedule. The unconstrained backfilling tactic is deterministic: it loops repeatedly through the set of all scheduled max-leaf tasks, attempting to add one additional method under each task in each pass, and continues until it can no longer backfill under any of the max-leafs due to lack of space in agents' schedules. This third mode provides the schedule strengthening benefits of backfilling without either the algorithmic overhead or the probabilistic focus of probabilistic backfilling.
We ran the precomputed schedules for each condition through a multi-agent simulation environment in order to test their relative robustness. All agents and the simulator ran on separate threads on a 2.4 GHz Intel Core 2 Duo computer with 2 GB RAM. For each problem in our test suite, and each condition, we ran 10 simulations, each with its own stochastically chosen durations and outcomes, but purposefully seeded such that the same set of durations and outcomes was executed by each of the conditions. In order to directly compare the relative robustness of schedules, we executed the schedules as is, with no replanning allowed. The schedule's flexible times representation does, however, allow the time bounds of methods to float in response to execution dynamics, and we did retain basic conflict-resolution functionality that naively repairs inconsistencies in the STN, such as removing methods from a schedule if they miss their lst.
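The "purposefully seeded" design above is the common-random-numbers technique: every condition within a run executes against the same sampled durations and outcomes, so differences in earned utility reflect the schedules rather than lucky draws. A minimal sketch, where the duration range and function names are illustrative:

```python
import random

def sample_durations(run_seed, methods):
    """Draw one set of stochastic method durations for a simulation run;
    every condition in that run replays the exact same draws."""
    rng = random.Random(run_seed)
    return {m: rng.uniform(15.0, 30.0) for m in methods}

methods = ["m1", "m2", "m3"]
for run in range(10):                     # 10 seeded simulations per problem
    draws = sample_durations(run, methods)
    # hypothetical per-condition execution against identical draws:
    # execute(original_schedule, draws)
    # execute(unconstrained_backfilled, draws)
    # execute(probabilistically_backfilled, draws)
```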
The average utility earned by the three conditions for each of the 20 problems is shown in
Figure 6.4. The utility is expressed as a function of the scheduled utility of the original initial
[Figure 6.4: bar chart; y-axis: average proportion of scheduled utility earned; x-axis: problem number; conditions: no backfilling, unconstrained backfilling, probabilistic backfilling.]
Figure 6.4: Utility earned by different backfilling conditions, expressed as the average proportion of the originally scheduled utility earned.
schedule: it is represented as the average proportion of that scheduled utility earned during execution. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did; this roughly correlates to sorting the problems by their level of uncertainty. On all but 4 of the problems (16, 19, 20 and 6), probabilistic backfilling earned the highest percentage of the scheduled utility. In the four problems where unconstrained backfilling wins, it does so only marginally. On average across all problems, no backfilling earned a proportional utility of 0.64, unconstrained backfilling earned a proportional utility of 0.66, and probabilistic backfilling earned a proportional utility of 0.77. Using a repeated measures ANOVA, probabilistic backfilling is significantly better than either other mode with p < 0.02. Unconstrained backfilling and no backfilling are not statistically different; although unconstrained backfilling sometimes outperformed no backfilling, it also sometimes performed worse.
Looking more closely at the 4 problems where unconstrained backfilling outperformed probabilistic backfilling, we observe that these results are in some cases due to occasional subpar planning by the deterministic planner. For example, under some circumstances, the method allocated by the planner under a max-leaf may not be its highest-utility child, even though there is enough schedule space for the latter. Unconstrained backfilling may correct this coincidentally due to its undirected nature, while probabilistic backfilling will not even consider this max-leaf if the originally scheduled method is not in danger of earning no utility. This highlights our earlier supposition that the benefit of probabilistic schedule strengthening is affected by the goodness of the original, deterministic schedule.
Figure 6.5 shows the percent increase in the number of methods in the initial schedule after
[Figure 6.5: bar chart; y-axis: percent increase in number of methods; x-axis: problem number; conditions: unconstrained backfilling, probabilistic backfilling.]
Figure 6.5: The percent increase of the number of methods in the initial schedule after backfilling for each mode.
backfilling for the two modes that performed backfilling. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted to be consistent with Figure 6.4. In all cases but one, probabilistic backfilling added at least 1 method. Despite resulting in less utility earned during execution, unconstrained backfilling far exceeds probabilistic backfilling in the number of backfills.
Comparing Figures 6.4 and 6.5 emphasizes the effective, probabilistic nature of our schedule strengthening approach. In general, for the more certain problems to the left, targeted backfilling does not add many methods because not many are needed. For the more uncertain, difficult problems to the right, it adds methods in order to hedge against that uncertainty. Unconstrained backfilling, in contrast, shows no discernible pattern in the graph from left to right. These figures again highlight how schedule strengthening can be limited by the goodness of the original schedule and how it is unable to fix all problems: for problems 6 and 3, for example, probabilistic backfilling does not much affect the utility earned, as it is able to make only a small number of effective strengthening actions.
Overall, these results illustrate the advantage to be gained by our probabilistic schedule strengthening approach. There is an inherent trade-off associated with reinforcing schedules in resource-constrained circumstances: there is advantage to be gained by adding redundant methods, as seen by comparing the probabilistic backfilling results to the no backfilling results; however, this advantage can be largely negated if backfilling is done without guidance, as seen by comparing the no backfilling results to the relatively similar results of unconstrained backfilling. Our explicit use of probabilities to target and prioritize activities provides a means of effectively balancing this trade-off.
6.4.2 Comparison of Strengthening Strategies
We compared the effectiveness of the different strengthening strategies by strengthening initial schedules for the 20 problems in the DARPA test suite in 6 ways: (1) pruning only; (2) swapping only; (3) backfilling only; (4) the round-robin strategy; (5) the greedy strategy; and (6) the baseline strategy [Hiatt et al., 2009]. Strengthening was performed on a 3.6 GHz Intel computer with 2 GB RAM. For 8 of the 20 problems the baseline strategy did not finish within 36 hours; for these cases we considered the run time to be 36 hours and the utility earned to be the best found when it was stopped. The resulting average expected utility after probabilistic schedule strengthening, as well as the average running time for each of the strategies, is shown in Table 6.1. The table also shows, on average, how many strengthening actions each strategy successfully performed; recall, for comparison, that these problems initially had roughly between 50 and 90 methods scheduled.
Table 6.1: A comparison of strengthening strategies.

               avg expected utility   avg running time   avg num actions
               (% of baseline)        (in seconds)
  initial      74.5                   –                  –
  prune        77.2                   6.7                0.8
  swap         85.1                   32.9               3.5
  backfill     92.8                   68.5               10.4
  round-robin  99.5                   262.0              13.1
  greedy       98.5                   354.5              13.0
  baseline     100                    32624.3            13.0
These results support our supposition about the effectiveness of the individual tactics: namely, that backfilling is the strongest individual tactic, followed by swapping and then pruning. Further, despite the limitations that each individual tactic incurs from its associated trade-off, their different properties can also have synergistic effects (e.g., swapping can increase schedule slack, making room for backfilling), and using all three results in stronger schedules than using any one individually.
The round-robin strategy uses, on average, fewer strengthening actions than the sum of the number of actions performed by the pruning, swapping and backfilling strategies independently (13.1 compared to 14.6). In contrast, its running time is much more than the sum of the running times of the three independent strategies (262.0 seconds compared to 108.1 seconds). This is because of the increased search space: when attempting to perform a valid action, instead of only trying to backfill, the round-robin strategy must consider the space of backfills, swaps and prunes. Therefore, in situations where only backfills are possible, for example, the round-robin strategy incurs a higher overhead than the pure backfilling strategy.
Clearly the round-robin strategy outperforms each of the individual strategies it employs, at a modest run-time cost. When compared to the greedy strategy, it performs mostly the same. There are, however, a couple of problems where round-robin outperforms greedy. An example of such a situation is when the greedy strategy backfilled a max-leaf task twice, but round-robin backfilled it and then swapped two of its children; the round-robin strategy turned out to be better, as it resulted in more schedule slack and left room for other backfills. Although we do not consider the difference between the two conditions to be very meaningful, the difference in run time leads us to choose round-robin as the dominant strategy. Additionally, round-robin is almost as effective as the baseline strategy and, with a run time that is over two orders of magnitude faster, is the preferred choice.
6.4.3 Effects of Global Strengthening
We performed an experiment to compare the utility earned when executing initial deterministic schedules with the utility earned when executing strengthened versions of the schedules. We ran simulations comparing the utility earned by executing three classes of schedules: (1) the schedule generated by the deterministic, centralized planner; (2) the schedule generated by performing backfilling (the best individual tactic) only on the schedule in condition (1); and (3) the schedule generated by performing round-robin schedule strengthening on the schedule in condition (1). Experiments were run on a network of computers with Intel Pentium D 2.80 GHz and Intel Pentium 4 3.60 GHz processors; each agent and the simulator ran on its own machine.
For each problem in our DARPA test suite, and each condition, we ran 25 simulations, each with its own stochastically chosen durations and outcomes, but purposefully seeded such that the same set of durations and outcomes was executed by each of the three conditions. As with previous experiments, in order to directly compare the relative robustness of schedules, we executed the schedules as is, with no replanning allowed. Recall that the schedule's flexible times representation does, however, allow the time bounds of methods to float in response to execution dynamics, and we did retain basic conflict-resolution functionality that naively repairs inconsistencies in the STN, such as removing methods from a schedule if they miss their lst.
Figure 6.6 shows the average utility earned by the three sets of schedules for each of the 20 problems. The utility is expressed as the fraction of the originally scheduled utility earned during execution. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did; this roughly correlates to sorting the problems by their level of uncertainty. Note that although this graph and Figure 6.4 share two conditions, the data are slightly different; this is because the previous experiment ran only 10 simulations per problem, in contrast to the 25 simulations this experiment performs.
[Figure 6.6: bar chart; y-axis: average proportion of scheduled utility earned; x-axis: problem number; conditions: initial schedule, backfill strengthening, round-robin strengthening.]
Figure 6.6: Utility for each problem earned by different strengthening strategies, expressed as the average proportion of the originally scheduled utility earned.
For problems 14, 15, 2, 8, 3 and 10, round-robin strengthening identified only backfill steps to be of value, and as a result produced results identical to the backfilling-only strategy. Problems 18, 5, 17 and 20, while not identical, are on average very close for these conditions. For individual simulations of these four problems, we see that round-robin strengthening usually performs slightly worse than backfilling alone, but occasionally far outperforms it. This interesting behavior arises because round-robin can hedge against method failure by swapping in a lower-utility, shorter-duration method for a method with a small chance of earning zero utility. While for many runs this tactic sacrifices a few utility points when the original method succeeds, on average round-robin avoids the big utility loss that occurs when the method fails. For problem 3, strengthening is unable to provide much help in making the schedule more robust, and so all three conditions earn roughly the same utility.
As the figure shows, round-robin schedule strengthening outperforms backfilling, which in turn outperforms the initial schedule condition. Round-robin schedule strengthening, overall, outperforms the initial schedule condition by 31.6%. The differences between the conditions are significant with p < 0.01, when tested for significance with a repeated measures ANOVA.
6.4.4 Effects of RunTime Distributed Schedule Strengthening
We next ran an experiment to quantify the effect that schedule strengthening has when performed in conjunction with replanning during execution. We used a 16-problem subset of the 20 problems described above. Twelve of the problems involved 20 agents, 2 involved 10 agents, and 2 involved 25 agents. Three of the problems had fewer than 200 methods, 6 had between 200 and 300 methods, 5 had between 300 and 400 methods, and 2 had more than 400 methods. We initially avoided 4 of the problems due to difficulties getting the planner we were using to work well with them; as the results were significant with only 16 problems, we did not spend the time necessary to incorporate them into the test suite for these experiments.
We ran simulations comparing three conditions: (1) agents are initially given their portion
of the precomputed initial joint schedule and during execution replan as appropriate; (2) agents
are initially given their portion of a strengthened version of the initial joint schedule and during
execution replan as appropriate; (3) agents are initially given their portion of a strengthened version
of the initial joint schedule and replan as appropriate during execution; further, agents perform
probabilistic schedule strengthening each time a replan is performed. Experiments were run on a
network of computers with Intel(R) Pentium(R) D CPU 2.80GHz and Intel(R) Pentium(R) 4 CPU
3.60 GHz processors; each agent and the simulator ran on its own machine. For each problem in our test suite we ran 25 simulations, each with its own stochastically chosen durations and outcomes, but purposefully seeded such that the same set of durations and outcomes was executed by each of the three conditions.
Figure 6.7 shows the average utility earned by the three conditions for each of the 16 problems. The utility is expressed as the fraction of the “omniscient” utility earned during execution. We define the omniscient utility for a given simulation to be the utility produced by a deterministic, centralized version of the planner on an omniscient version of the problem, where the actual method durations and outcomes generated by the simulator are given as inputs to the planner. As the planner is heuristic, the fraction of omniscient utility earned can be greater than one. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did;
[Figure 6.7: bar chart; y-axis: average proportion of omniscient utility earned; x-axis: problem number; conditions: replanning only; replanning, strengthened initial schedule; replanning and online schedule strengthening.]
Figure 6.7: Utility for each problem for the different online replanning conditions, expressed as the average proportion of omniscient utility earned.
this roughly correlates to sorting the problems by their level of uncertainty.
Online schedule strengthening results in higher utility earned for all problems. For some, the difference is considerable; for others it results in only slightly more utility on average, as replanning is able to sufficiently respond to most unexpected outcomes. On average, probabilistic schedule strengthening outperforms condition (1) by 13.6%. Note that this difference is less dramatic than the difference between the probabilistic schedule strengthening and initial schedule conditions of the previous experiment on global schedule strengthening. This is because replanning is able to recover from some of the problems that arise during execution. Online probabilistic schedule strengthening does, however, have the additional benefit of protecting schedules against an agent's planner going down during execution, or an agent losing communication so that it can no longer send information to and receive information from other agents, because the last schedule produced will likely be more effective in the face of unexpected outcomes. Note, though, that our results do not reflect this benefit, as we assume that communication is perfect and never fails, and so these communication problems are not present in our experiments. Overall, the differences between the online schedule strengthening condition and the other two conditions are significant with p < 0.01, when tested for significance with a repeated measures ANOVA.
[Figure 6.8: bar chart with error bars; y-axis: utility earned; x-axis: problem number; conditions: replanning only; replanning and online schedule strengthening.]
Figure 6.8: Utility earned across various conditions, with standard deviation shown.
Interestingly, conditions (1) and (2), which differ in which joint schedule they begin with, have very similar results for many of the problems. Ultimately, this is because when an agent replans for the first time, it effectively “undoes” much of the schedule strengthening: the deterministic planner believes that redundant methods are unnecessary, and that as long as all methods fit in the schedule, the amount of slack is acceptable. This indicates that pre-strengthening schedules is not critical to the success of probabilistic schedule strengthening; still, we include it in future experiments for completeness.
Figure 6.8 compares, for the same problems, the utility earned when replanning only (condition (1)) to the utility earned when both replanning and schedule strengthening (condition (3)). Error bars indicate one standard deviation above and below the mean. A notable feature of this graph is how probabilistic schedule strengthening, in general, reduces the standard deviation of the utility earned across the 25 simulations. These results emphasize how probabilistic schedule strengthening improves the average utility by helping to protect against activity failures, lessening the impact of uncertainty on the outcome of execution.
Timing measurements gathered during these experiments show that each individual call to update probability profiles and to strengthen a schedule was very fast. Individual profile updates took on average 0.003 seconds (with standard deviation σ = 0.01) and individual schedule strengthenings took on average 0.105 seconds (σ = 0.27). Overall, each agent spent an average of 1.7 seconds updating probabilities and 37.6 seconds strengthening schedules for each problem. In contrast, each individual replan took an average of 0.309 seconds (σ = 0.39), with agents spending an average of 47.7 seconds replanning. Even with the fast-paced execution of our simulations, with methods lasting between 15 and 30 seconds, these computations are well within range to keep pace with execution.
Ultimately, the results clearly show that the best approach utilizes both replanning and schedule strengthening to generate and maintain the best schedule possible: replanning provides the means to take advantage of opportunities for earning higher utility when possible, while schedule strengthening works to protect against activity failure and minimizes the problems that arise to which the planner cannot sufficiently respond.
Chapter 7
Probabilistic Meta-Level Control
This chapter discusses how to perform effective meta-level control of planning actions by utilizing the information provided by PADS. The first section discusses the control strategy at a high level and describes how our overall probabilistic plan management approach comes together. The two following sections provide a more detailed description of probabilistic meta-level control, and can be skipped by the casual reader. Finally, Section 7.4 presents experimental results for probabilistic meta-level control and PPM.
7.1 Overview
We developed versions of probabilistic meta-level control for both the single-agent and multi-agent domains. The goal of these control strategies is to intelligently decide, based on a probabilistic analysis, when replanning is likely to be helpful and when it is not. Ideally, agents would replan only if it would be useful either in avoiding failure or in taking advantage of new opportunities. Similarly, they ideally would not replan if it would not be useful, either because: (1) there was no potential failure to avoid; (2) there was a possible failure but replanning could not fix it; (3) there was no new opportunity to leverage; or (4) replanning would not be able to exploit the opportunities that were there. Many of these situations are difficult to explicitly predict, in large part because it is difficult to predict whether a replan is beneficial without actually doing it. As an alternative, probabilistic meta-level control uses the probabilistic analysis to identify times when there is potential for replanning to be useful, either because there is likely a new opportunity to leverage or a possible failure to avoid. The probabilistic meta-level control strategy is the final component of our probabilistic plan management approach.
7.1.1 Single-Agent Domain
The probabilistic meta-level control strategy has two components in the single-agent domain. First, there is a probability-based planning horizon, called the uncertainty horizon, which limits how much of the schedule is replanned at any given time. A planning horizon is an effective meta-level strategy for this domain due to its relatively flat activity network, its lack of AND nodes, and the presence of activity dependencies only at the method level. The uncertainty horizon is placed where the probability of a replan occurring before execution reaches that point in the schedule rises above a threshold. The threshold can be adjusted via trial and error as necessary to achieve the right balance between expected utility and computation time. The portion of the schedule within the horizon is considered for further PPM; the portion outside is ignored, lowering the computational overhead of managing the plan during execution.
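Placing the uncertainty horizon can be sketched as a threshold test over the replan probabilities computed by the probabilistic analysis. The probability values and the threshold of 0.8 below are illustrative:

```python
def uncertainty_horizon(replan_probs, threshold=0.8):
    """replan_probs[i] is the probability (from the probabilistic
    analysis) that a replan occurs before execution reaches method i.
    The horizon is placed at the first method where that probability
    rises above the threshold; methods beyond it are left unmanaged."""
    for i, p in enumerate(replan_probs):
        if p > threshold:
            return i
    return len(replan_probs)   # no horizon needed: manage everything

# probabilities grow as we look further into the schedule:
horizon = uncertainty_horizon([0.05, 0.20, 0.45, 0.70, 0.90, 0.97])
# methods before the horizon are managed; those after it are ignored
```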
The second meta-level control component determines when replanning is performed during execution. Because this domain has a clear maximal and minimal utility, an effective meta-level control strategy is to replan based on a threshold of expected utility. As execution progresses, as long as the expected utility of the schedule within the uncertainty horizon remains above a threshold, execution continues without replanning, because there is likely enough flexibility to successfully absorb unexpected method durations during execution and avoid failure. If the expected utility ever falls below the threshold, a replan is performed to improve the schedule. The threshold can be adjusted via trial and error as necessary to achieve the right balance between expected utility and computation time. We term this strategy likely replanning.
7.1.2 Multi-Agent Domain
The probabilistic meta-level control strategy for our oversubscribed, distributed, multi-agent domain determines when to replan and perform probabilistic schedule strengthening during execution. Because the domain is oversubscribed and the maximal feasible utility is unknown, a likely replanning strategy thresholding on expected utility is not a good fit for this domain. Instead, meta-level control is based on learning when it is effective to replan or strengthen based on the probabilistic state of the schedule: it assumes the initial schedule is as robust as possible and then learns when to replan and strengthen, or only strengthen, based on how execution affects probability profiles at the method level.
There are several reasons for considering only method-level profiles. First, since this is a distributed scenario, looking at the probabilistic state of the taskgroup or other high-level tasks is not always helpful for determining whether locally replanning could be of use. Second, due to the presence of OR nodes in this scenario, the probability profile of the taskgroup or another high-level task does not always accurately represent the state of the plan, due to the different uafs at each level of the tree. For example, if a task has two children, one of which will always earn utility and one of which will almost never earn utility, the task itself will have a probability of earning utility of 100%; this probability does not adequately capture the idea that one of its children will probably fail. Finally, we also wanted to learn when probabilistic schedule strengthening would be a good action to take, and lower-level profiles provide a better indication of when this might be useful.
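The OR-node effect is easy to reproduce numerically: under an OR-style uaf, a task earns utility if at least one child does, so one reliable child masks a failing sibling. A sketch, assuming independent children for simplicity:

```python
def prob_task_earns_utility(child_probs):
    """P(at least one child earns utility), assuming independent
    children under an OR-style uaf."""
    p_all_fail = 1.0
    for p in child_probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# one certain child and one nearly hopeless child:
task_prob = prob_task_earns_utility([1.0, 0.02])   # exactly 1.0
# the task-level profile reports 100% and hides the second child's 98%
# failure risk, which is why control decisions use method-level profiles
```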
The probabilistic meta-level control strategy for this domain learns when to replan using a classifier; specifically, decision trees with boosting (described in Section 3.4, on page 44). The classifier is learned offline, based on data from previous execution traces. Once the classifier has been learned, it is used to decide what planning action to take at any given time during execution.
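The learned controller reduces to a three-way classification over features of the method-level probabilistic state. The sketch below shows only the decision interface; the feature names and the hand-written rule standing in for the learned classifier are hypothetical (for the actual boosted decision trees, a library implementation such as scikit-learn's AdaBoostClassifier would be a natural choice).

```python
ACTIONS = ("replan_and_strengthen", "strengthen_only", "nothing")

def control_action(features, classify):
    """Map the current method-level probabilistic state to a planning
    action using a classifier trained offline on execution traces."""
    label = classify(features)
    assert label in ACTIONS
    return label

def toy_classifier(f):
    """Hand-written stand-in for the learned boosted decision trees."""
    if f["max_method_eu_drop"] > 0.3:      # a method's profile collapsed
        return "replan_and_strengthen"
    if f["any_profile_worsened"]:          # mild degradation only
        return "strengthen_only"
    return "nothing"

action = control_action({"max_method_eu_drop": 0.1,
                         "any_profile_worsened": True}, toy_classifier)
```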
7.1.3 Probabilistic Plan Management
With a meta-level control strategy in hand, our overall probabilistic plan management approach can be used to manage plans during execution. For the purposes of this discussion, we assume agent execution is a message-based system, where during execution agents take planning actions in response to update messages, either from other agents providing information such as changes in probability profiles, or from the simulator providing feedback on method execution such as a method's actual start or end time. Such a message is referred to as updateMessage. We assume this solely for clarity of presentation.
High-level, general pseudocode for the run-time approach is shown in Algorithm 7.1. During execution, PPM waits for updates to be received, such as updates about the status of a method the agent is currently executing or from other agents communicating status updates (Line 2). When a message is received, PADS is used to update all (or, in distributed cases, all local) probability profiles (Line 3). Once the profiles have been updated, probabilistic meta-level control is used to decide what planning action to take, if any (Line 4). Once that action has been taken, probability profiles are updated again (Line 7), and any actions the agent needs to take to process the changes (e.g., communicating changes to other agents) are done (Line 9). This process repeats until execution halts.
This algorithm can be instantiated depending on the specifics of probabilistic plan management for a domain. For the single-agent domain, the overall structure of probabilistic plan management is: during execution, each time a method finishes and the agent gets a message from the simulator about its end time (corresponding to Line 2 of Algorithm 7.1), the agent first updates the uncertainty horizon and then updates all profiles within the horizon to reflect the current state of execution (Line 3). At this point, the probabilistic meta-level control strategy is invoked (Line 4). To decide whether to replan, it considers the current expected utility
Algorithm 7.1: Probabilistic plan management.
Function: ProbabilisticPlanManagement()
 1  while execution continues do
 2      if updateMessage() then
 3          updateAllProfiles()
 4          act = metaLevelControlAction()
 5          if act then
 6              plan(act)
 7              updateAllProfiles()
 8          end
 9          process()
10      end
11  end
of the schedule. If it is within the acceptable range, the strategy returns null and the agent does nothing. Otherwise, it returns replan, and the agent replans all activities within the uncertainty horizon, ignoring all others, and updates all profiles within the horizon to reflect the new schedule (Lines 6–7). There is no processing necessary in this domain (Line 9). This process continues until execution has completed.
For the multi-agent domain, any time an agent receives an update message, whether from the simulator specifying method execution information or from another agent communicating information about remote activities (corresponding to Line 2 of Algorithm 7.1), the agent updates its local probabilities (Line 3). It then passes the probabilistic state to the probabilistic meta-level control classifier, which indicates whether the agent should replan and strengthen, only strengthen, or do nothing (Line 4). The agent acts accordingly, and communicates changes to interested agents (Lines 6-9). This process continues until execution has finished.
7.2 Single-Agent Domain
We developed a probabilistic meta-level control strategy for our single-agent domain [Hiatt and Simmons, 2007]. Recall once again that the objective of the domain is for each task to have exactly one method underneath it execute without violating any temporal constraints, and that the only source of uncertainty is in method duration. The schedule is represented as an STN.
The probabilistic meta-level control strategy for this domain has two components: an uncertainty horizon and likely replanning. We begin by discussing a baseline, deterministic plan management strategy for this domain, and then describe our more informed, probability-driven strategies.
7.2.1 Baseline Plan Management
In the baseline plan management strategy for this domain, ASPEN is invoked both initially, to
generate an initial plan, and then during execution whenever a method ﬁnishes at a time that causes
an inconsistency in the STN, such as a method missing a deadline or running so early or so late
that a future method is unable to start in its speciﬁed time window.
7.2.2 Uncertainty Horizon
In general, we would like to spend time managing only methods that are near enough to the present to warrant the effort, and not waste time replanning methods far in the future that will only have to be replanned again later. The problem, then, is how to determine which methods are "near enough." We pose this as the problem of finding the first method for which the probability of a replan occurring before that method executes rises above a threshold, which we call the horizon threshold. We calculate the probability of a replan occurring before a method starts executing based on the baseline, deterministic plan management system.
The high-level equation for finding the probability that the baseline system will need to replan before the execution of some method m_i in the schedule is:

    1 − po_b(m_1, ..., m_i)

where po_b(m_j) is defined as the probability that method m_j meets its plan objective by observing the constraints in the deterministic STN. We simplify the calculation by assuming this means it ends within its range of valid end times, [eft(m_j), lft(m_j)], given that it starts at est(m_j). Because po_b(m_i) is independent of po_b(m_1, ..., m_{i−1}), po_b(m_1, ..., m_i) = ∏_{j=1}^{i} po_b(m_j). To begin calculating, we find the probability that each method m ends in the range [eft(m), lft(m)] given that it starts at its est:

    po_b(m) = normcdf(lft(m), µ(DUR(m)) + est(m), σ(DUR(m)))
            − normcdf(eft(m), µ(DUR(m)) + est(m), σ(DUR(m)))
Multiplying these together, we can find the probability of a replan occurring before execution reaches method m_i. The horizon is set at the method where this probability rises above the user-defined horizon threshold. The algorithm pseudocode is shown in Algorithm 7.2, where the variable max horizon indicates that the horizon includes all scheduled methods. Recall that the function schedule returns the agent's current schedule, with methods sorted by their scheduled start time.
Algorithm 7.2: Uncertainty horizon calculation for a schedule with normal duration distributions.
Function: calcUncertaintyHorizon()
  po_b = 1
  foreach method m ∈ schedule do
    po_b = po_b · (normcdf(lft(m), µ(DUR(m)) + est(m), σ(DUR(m)))
                 − normcdf(eft(m), µ(DUR(m)) + est(m), σ(DUR(m))))
    if (1 − po_b) > horizon threshold then return m
  end
  return max horizon
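Under the assumption of independent normal method durations, this calculation can be sketched in Python. The dict field names (est, eft, lft, mu, sigma) are illustrative, not from the original system:

```python
import math

def normcdf(x, mu, sigma):
    # Normal CDF of N(mu, sigma) evaluated at x, via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def calc_uncertainty_horizon(schedule, horizon_threshold):
    """Return the first method at which the probability of an earlier replan
    exceeds horizon_threshold, or None if the horizon spans the whole schedule."""
    po_b = 1.0
    for m in schedule:  # methods sorted by scheduled start time
        # Probability m ends inside [eft, lft], given it starts at its est.
        po_b *= (normcdf(m['lft'], m['mu'] + m['est'], m['sigma'])
                 - normcdf(m['eft'], m['mu'] + m['est'], m['sigma']))
        if (1.0 - po_b) > horizon_threshold:
            return m
    return None  # corresponds to max horizon: all scheduled methods included
```

A method with a tight end-time window drives po_b down quickly, pulling the horizon boundary toward it.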
We ﬁnd the uncertainty horizon boundary and use it to limit the scope of probabilistic plan
management, ignoring methods, tasks and constraints outside of the horizon. A higher horizon
threshold is a more conservative approach and results in a longer horizon, where more methods
and tasks are considered during each replan. A lower threshold corresponds to a shorter horizon,
and results in fewer methods being considered by PPM. The threshold can be adjusted via trial
and error as necessary to achieve the right balance between expected utility and computation time.
In our experimental results (Section 7.4.1), we show experiments run with various thresholds and
demonstrate the impact they have on plan management.
When a replan is performed, the planner replans all methods and tasks within the uncertainty
horizon. This may cause inconsistencies between portions of the schedule inside and outside of
the horizon as constraints between them are not observed while replanning. As long as at least
one activity involved in the constraint remains outside of the horizon, such constraint violations
are ignored. As execution progresses, however, and the horizon correspondingly advances, these
conﬂicts shift to be inside the uncertainty horizon. They are handled at that time as necessary by
replanning.
7.2.3 Likely Replanning
The decision of when to replan is based on the taskgroup’s expected utility within the uncertainty
horizon. The algorithm to calculate this value is very similar to Algorithm 5.3 (on page 68), which
calculates the expected utility of the entire schedule. The differences are that instead of iterating
through all methods and tasks to update activity probability proﬁles, only the proﬁles of methods
and tasks within the uncertainty horizon are updated. Similarly, when calculating po (TG), only
those tasks within the uncertainty horizon are considered. The expected utility calculation within
the uncertainty horizon is shown in Algorithm 7.3.
Algorithm 7.3: Taskgroup expected utility calculation, within an uncertainty horizon, for the single-agent domain with normal duration distributions and temporal constraints.
Function: calcExpectedUtility(uh)
  foreach method m_i ∈ schedule until uh do
    // calculate m_i's start time distribution
    ST(m_i) = truncate(est′(m_i), ∞, µ(FT_s(m_{i−1})), σ(FT_s(m_{i−1})))
    // calculate m_i's finish time distribution
    FT(m_i) = ST(m_i) + DUR(m_i)
    // calculate m_i's probability of meeting its objective
    po(m_i) = normcdf(lft′(m_i), µ(FT(m_i)), σ(FT(m_i)))
            − normcdf(eft′(m_i), µ(FT(m_i)), σ(FT(m_i)))
    // calculate m_i's expected utility
    eu(m_i) = po(m_i)
    // adjust the FT distribution to be m_i's FT distribution given it succeeds
    FT_s(m_i) = truncate(eft′(m_i), lft′(m_i), µ(FT(m_i)), σ(FT(m_i)))
  end
  foreach T ∈ children(TG) until uh do
    if |scheduledChildren(T)| == 1 then
      po(T) = po(soleelement(T))
      eu(T) = eu(soleelement(T))
    else
      po(T) = 0
      eu(T) = 0
    end
  end
  po(TG) = ∏_{T ∈ children(TG) until uh} po(T)
  eu(TG) = po(TG)
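Rather than propagating truncated normal distributions analytically as Algorithm 7.3 does, the quantity it computes, po(TG) within the horizon, can also be illustrated by Monte Carlo simulation. This sketch is only an illustration of the estimated quantity, not the thesis algorithm; the dict field names are hypothetical:

```python
import random

def estimate_eu(schedule, runs=20000, seed=0):
    """Monte Carlo estimate of po(TG) = eu(TG) within a horizon: the
    probability that every method finishes inside its [eft, lft] window."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(runs):
        ft = 0.0
        ok = True
        for m in schedule:
            st = max(m['est'], ft)                    # start at est, or when the
            ft = st + rng.gauss(m['mu'], m['sigma'])  # predecessor finishes
            if not (m['eft'] <= ft <= m['lft']):
                ok = False                            # objective missed
                break
        successes += ok
    return successes / runs
```

Because method successes are treated as a chain of conditional events, the fraction of fully successful runs converges to the product of per-method success probabilities that the analytic algorithm computes.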
Note that, as before, we set eu(TG) equal to po(TG), where po(TG) is the probability that the schedule within the horizon executes successfully. This means the range of values for eu(TG) within the horizon is [0, 1]; this is ideal, as we are comparing the value to a fixed threshold, called the replanning threshold. As long as the expected utility of the taskgroup remains above the replanning threshold, no replanning is performed; if the expected utility ever falls below the threshold, however, a replan is done to improve schedule utility. A high replanning threshold, close to 1, corresponds to a more conservative approach, rejecting even schedules with high expected utility and replanning more often. At very high thresholds, for example, even a conflict-free schedule just generated by the planner may not satisfy the criterion if its probability of successful execution, and thus its expected utility, is less than 1. Lower thresholds equate to more permissive approaches, allowing schedules with less certain outcomes and lower expected utility to remain, and so performing fewer replans. The threshold can be adjusted via trial and error as necessary to achieve the right balance between utility and replanning time. In our experimental results (Section 7.4.1), we show experiments run with various thresholds and demonstrate the impact they have on plan management.
If a replan is not performed, there may be a conflict in the STN due to a method running early or late. If that is the case, we adjust methods' scheduled durations to make the STN consistent. To do so, we first use the STN to find all valid integer durations for unexecuted methods. Then we iteratively choose for each method the valid duration closest to its original one as the method's new scheduled, deterministic duration. The durations are reset to their domain-defined values when the next replan is performed. The ability to perform this duration adjustment is why we are able to perform the extra step of extending the STN to STN′ for our analysis (see Section 5.2, on page 63), and is what allows us to assume that methods can violate their lst as long as they do not violate their lst′.
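The closest-valid-duration choice can be sketched in one line; valid_durations stands in for the set of valid integer durations obtained from an STN query, and the function name is illustrative:

```python
# A sketch of the duration-adjustment step described above: each unexecuted
# method is assigned the valid integer duration closest to its originally
# scheduled one. Names are hypothetical, not from the original system.
def adjust_duration(original, valid_durations):
    """Return the valid integer duration closest to the original scheduled one."""
    return min(valid_durations, key=lambda d: abs(d - original))
```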
7.2.4 Probabilistic Plan Management
The overall PPM algorithm is derived from Algorithm 7.1. As before, the agent waits for update messages to be received. When one arrives, the agent updates all probability profiles. The function updateAllProfiles (Line 3 in Algorithm 7.1) for this domain is defined as:

    calcExpectedUtility(calcUncertaintyHorizon())

This expression finds the uncertainty horizon and updates all probability profiles within it. The agent next determines whether to replan. The meta-level control function (Line 4) is defined as:

    if eu(TG) < replanning threshold then return replan else return null
Finally, after the agent has replanned (or not), there is nothing to process at this point, so process does nothing.
7.3 Multi-Agent Domain
We implemented a probabilistic meta-level control strategy for the multi-agent domain, where the plan objective is to maximize the utility earned. We begin by discussing baseline plan management for this domain, then describe our probabilistic meta-level control strategy and overall probabilistic plan management approach.
7.3.1 Baseline Plan Management
We use the planning system described in [Smith et al., 2007] as our baseline plan management
system for this domain. During execution, agents replan locally in order to respond to updates
from the simulator specifying the current time and the results of execution for their own methods,
as well as updates passed from other agents communicating the results of their execution, changes
due to replanning, or changes to their schedules that occur because of messages from other agents.
Speciﬁcally, replans occur when:
• A method completes earlier than its expected ﬁnish time and the next method cannot start
immediately (a replan here exploits potential schedule slack)
• A method ﬁnishes later than the lst of its successor or misses a deadline
• A method starts later than requested and causes a constraint violation
• A method ends in a failure outcome
• Current time advances and causes a constraint violation
No updates are processed while an agent is replanning or schedule strengthening, and so important
updates must wait until the current replan or strengthening pass, if one is occurring, has ﬁnished
before they can be incorporated into the agent’s schedule and responded to. More detail on this
planning system can be found in [Smith et al., 2007].
7.3.2 Learning When to Replan
Our probabilistic meta-level control strategy is based on a classifier that labels the current state of execution, indicating what planning action, if any, the agent should take. We use decision trees with boosting (see Section 3.4) to learn the classifier. The learning algorithm takes training examples that pair states (in our case, probabilistic information about the current schedule) with classifications (e.g., "replan and strengthen," "do nothing," etc.), and learns a function that maps from states to classifications. States are based on changes in probability profiles at the method and max-leaf task level that result from update messages. Considering profiles at this level is appropriate for distributed applications like this one: higher-level profiles can be a poor basis for a control algorithm because the state of the joint plan does not necessarily directly correspond to the state of an agent's local schedule and activities.
For each local method, we look at how its probability of meeting its objective (po), its finish time distribution when it earns utility (FT_u), and its finish time distribution when it does not earn utility (FT_nu) change. For each of these values, we consider the maximum increase and decrease in the value itself or, for distributions, in its average and standard deviation. This allows the meta-level control strategy to respond to, for example, methods' projected finish times becoming earlier or later, or a large change in a method's probability of earning utility. We also look at the po value of max-leaf tasks and their remote children, as well as whether any of their remote children have been scheduled or unscheduled, since changes in these values are indicative of whether probabilistic schedule strengthening may be useful. The overall state consists of:
1. the largest increase in local method po
2. the largest decrease in local method po
3. the largest increase in local method µ(FT_u)
4. the largest decrease in local method µ(FT_u)
5. the largest increase in local method σ(FT_u)
6. the largest decrease in local method σ(FT_u)
7. the largest increase in local method µ(FT_nu)
8. the largest decrease in local method µ(FT_nu)
9. the largest increase in local method σ(FT_nu)
10. the largest decrease in local method σ(FT_nu)
11. the largest increase in max-leaf task po
12. the largest decrease in max-leaf task po
13. the largest increase in any max-leaf task's remote child po
14. the largest decrease in any max-leaf task's remote child po
15. whether any max-leaf task's remote children have been scheduled or unscheduled
To distinguish this state from the probabilistic state of the schedule as a whole, we call this state
the “learning state.”
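Assembling these 15 features from per-update changes can be sketched as follows; the record field names (d_po, d_mu_ftu, etc.) are hypothetical stand-ins for the deltas described above:

```python
# A sketch of building the 15-feature learning state. Each method-delta record
# holds the change in po and in the mean/stdev of FT_u and FT_nu for one local
# method; task_po_deltas and remote_po_deltas hold po changes for max-leaf
# tasks and their remote children.
def learning_state(method_deltas, task_po_deltas, remote_po_deltas,
                   remote_sched_changed):
    def max_inc(vals):
        return max([v for v in vals if v > 0], default=0.0)

    def max_dec(vals):
        return max([-v for v in vals if v < 0], default=0.0)

    feats = []
    for key in ('d_po', 'd_mu_ftu', 'd_sigma_ftu', 'd_mu_ftnu', 'd_sigma_ftnu'):
        vals = [m[key] for m in method_deltas]
        feats += [max_inc(vals), max_dec(vals)]          # features 1-10
    feats += [max_inc(task_po_deltas), max_dec(task_po_deltas)]      # 11-12
    feats += [max_inc(remote_po_deltas), max_dec(remote_po_deltas)]  # 13-14
    feats.append(1.0 if remote_sched_changed else 0.0)               # 15
    return feats
```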
We trained two different classifiers: one to decide whether to replan, and one to decide whether to replan and strengthen, only strengthen, or do nothing. We use the former in our experiments to test the effectiveness of the probabilistic meta-level control strategy when used alone. It learned when to replan based on the baseline system with no strengthening; its possible classifications were "replan only" and "do nothing." We use the latter to test the overall plan management approach, which also includes probabilistic schedule strengthening. Its possible classifications were "replan and strengthen," "strengthen only," and "do nothing." We do not consider "replan only" an option here because, as previously argued (Section 6.1), replanning should always be used in conjunction with strengthening, if possible.
To generate our training data, we used 4 training problems that closely mirrored the structure of those used in our experiments, described in Section 7.4.2. We executed each problem 5 times. During execution, agents replanned in response to every update received (for the first classifier), or replanned and strengthened in response to every update received (for the second classifier). For the rest of this section, we describe the learning approach in terms of the more general case, which includes strengthening.
We analyzed the execution traces of these problems to generate training examples. Each time
a replan and strengthening pass was performed by an agent in response to an update, we recorded
the learning state that resulted from the changes in probabilities caused by that update. We next
looked to see if either replanning or strengthening resulted in “important” changes to the schedule.
Important changes were deﬁned as changes that either: (1) removed methods originally in the
schedule but not ultimately executed; or (2) added methods not originally in the schedule but
ultimately executed. Note that a method could be removed from the schedule but still executed
if, for example, it was added back onto the schedule at a later time; this would be considered a non
important change. If an important change was made, the learning state was considered a “replan
and strengthen” or “strengthen only” example, depending on which action resulted in the change.
If no important change was made, the learning state was considered a “do nothing” example.
Using this data, we learned a classiﬁer of 100 decision trees, each of height 2. When given a
state as an input, the classiﬁer outputs a sum of the weighted votes for each of the three classes.
Each sum represents the strength of the belief that the example is in that class according to the
classiﬁer. In classic machine learning, the classiﬁcation with the maximal sum is used as the state’s
classiﬁcation.
In our domain, however, the data are very noisy and, despite using decision trees with boosting (which are typically effective with noisy data), we found that the noise was still adversely affecting classification: almost all learning states were being classified as "do nothing." The source of the noise is the lack of direct correlation between a schedule's probabilistic state and whether a replan will improve it, since it is hard to predict whether a replan will be useful. For example, if something is drastically wrong with the schedule, a replan might not be able to fix it; similarly, if nothing is wrong with the schedule, a replan still might improve scheduled utility. In terms of our classifier, this means that in situations where the need to replan may seem obvious according to the learning state (e.g., many probability values have changed), replanning is not always useful, and so in our training data such learning states were often classified as "do nothing."
We address this by changing the interpretation of the weighted votes for the three classes output by the classifier. Instead of treating them as absolute values, we treat them as relative values and compare them to a baseline classification to see whether the weights of the classifications for "replan and strengthen" or "strengthen only" increase, decrease, or remain the same. If the weight for one of the classifications increases, such as "replan and strengthen," then we determine that replanning and strengthening may be useful in that case. More technically, to get a baseline classification, we classify the baseline learning state in which no probabilities have changed (i.e., a learning state of all zeroes). Then, for each class, we calculate the proportion of the current learning state's weighted vote to the baseline example's, and use that as the class's new weighted vote. As a second step to address the noisy data, we randomly choose a class, with each class weighted by its adjusted vote. These steps are shown below.
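The two noise-handling steps, scaling by a baseline classification and then sampling a class in proportion to the scaled votes, can be sketched as follows, under the assumption that classify returns one nonzero weighted vote per class:

```python
import random

# A sketch of the relative-vote adjustment. classify is a hypothetical
# stand-in for the boosted-decision-tree classifier; it maps a learning
# state to one weighted vote per class.
def choose_action(classify, state, rng):
    rs, so, dn = classify(state)
    # Classify the baseline learning state (no probability changes).
    rs_b, so_b, dn_b = classify([0.0] * len(state))
    # Scale each vote relative to the baseline vote (assumed nonzero).
    scaled = [rs / rs_b, so / so_b, dn / dn_b]
    # Sample a class in proportion to the scaled votes.
    r = rng.uniform(0.0, sum(scaled))
    if r < scaled[0]:
        return 'replan_and_strengthen'
    elif r < scaled[0] + scaled[1]:
        return 'strengthen_only'
    return 'do_nothing'
```

A class whose vote doubles relative to the baseline is thus twice as likely to be chosen as one whose vote is unchanged.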
7.3.3 Probabilistic Plan Management
The overall PPM algorithm is derived from Algorithm 7.1. As before, the agent waits for update messages to be received. When one arrives, the agent updates all of its local probability profiles, via a call to updateLocalProfiles, and generates the learning state based on the new profile probabilities. Then, the meta-level control algorithm is invoked to classify the learning state. The pseudocode for the probabilistic meta-level control algorithm for this domain, which implements metaLevelControlAction from Algorithm 7.1, is:
rs, so, dn = classify(learning state)
rs_b, so_b, dn_b = classify(0, 0, 0, 0, 0, ...)
rs′ = rs / rs_b,  so′ = so / so_b,  dn′ = dn / dn_b
rand = random(0, rs′ + so′ + dn′)
switch rand
  case rand < rs′:
    return replan(), updateLocalProfiles(), roundRobinStrengthening()
  case rs′ ≤ rand < rs′ + so′:
    return roundRobinStrengthening()
  case rs′ + so′ ≤ rand < rs′ + so′ + dn′:
    return null
end
The algorithm classiﬁes both the learning state and the baseline learning state of all zeros, and
scales the learning state’s classiﬁcation weights as described above. Then, it randomly samples a
class from the scaled values. Based on this, it returns the appropriate planning action(s), and the
agent performs that action as per Algorithm 7.1. If a replan is performed, probabilities must be
updated before probabilistic schedule strengthening can be performed; it is not necessary to update
probability proﬁles after a strengthen, however, as probabilistic schedule strengthening takes care
of that. If no replan or strengthening pass is performed, execution continues with the schedule as
is. If there is an inconsistency in the STN due to a method running late, however, the next method
is removed from the schedule to make the STN consistent again. At this point, all that is left to do
is for the agent to process the changes (process in Algorithm 7.1) by sending updates to other
agents notifying them of appropriate changes.
7.4 Experiments and Results
We performed experiments to characterize the benefits of probabilistic meta-level control for the single-agent domain, which uses a thresholding meta-level control strategy, as well as for the multi-agent domain, which uses a learning-based meta-level control strategy. Finally, we ran additional experiments characterizing the overall benefit of probabilistic plan management for each domain.
7.4.1 Thresholding Meta-Level Control
We performed several experiments to characterize the effectiveness of the different aspects of meta-level control, as well as PPM overall, for the single-agent domain [Hiatt and Simmons, 2007]. We used as our test problems instances PSP94, PSP100 and PSP107 of the mmj20 benchmark suite of the resource-relaxed Multi-Mode Resource-Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max) [Schwindt, 1998]. The problems were augmented as described in Section 3.3.1.
We developed a simple simulator to execute schedules for this domain. Execution starts at time zero, and methods are started at their est. To simulate method execution, the simulator picks a random duration for the method based on its duration distribution, advances time by that duration, and updates the STN as appropriate. In this simple simulator, time does not advance while replanning is performed; instead, time is effectively paused and resumed when the replan is complete.
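The simulator's core loop can be sketched as follows; the dict field names are illustrative, and STN updates are left to the caller:

```python
import random

# A sketch of the simple single-agent simulator described above: execution
# starts at time zero, each method starts at its est (or when its predecessor
# finishes), and each duration is drawn from the method's normal distribution.
def simulate(schedule, rng=None):
    """Yield (method, finish_time) pairs; the caller updates the STN after each."""
    rng = rng or random.Random(0)
    now = 0.0
    for m in schedule:
        now = max(now, m['est'])                          # start at est at the earliest
        now += max(0.0, rng.gauss(m['mu'], m['sigma']))   # sampled, nonnegative duration
        yield m, now
```

Time spent replanning never advances the clock here, matching the paused-time behavior described above.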
Uncertainty Horizon Analysis
The first experiment tested how the choice of a horizon threshold affects the performance of execution. In this experiment, the agent replanned whenever a method ended at a duration other than its scheduled one, but replanned activities only within the uncertainty horizon. We ran experiments under 5 conditions: the baseline deterministic system, and four conditions testing horizon threshold values of 0.75, 0.9, 0.95 and 0.99. A threshold of 0.75 corresponds to replanning approximately 2 methods into the future, and a threshold of 0.99 to replanning approximately all 20 methods; in between these two extremes, the number of methods scales roughly exponentially as the threshold increases linearly.
The experiment was run on a 3.4 GHz Pentium D computer with 2 GB RAM. We performed
250 simulation runs for each condition and problem instance combination. Method duration means
and standard deviations were randomly repicked for each simulated run.
The results are shown in Figure 7.1. Figure 7.1(a) shows both the average makespan of the executed schedule alone and the average makespan plus average time spent managing and replanning, which we refer to as execution time. It also has error bars for PSP94 showing the standard deviation of the execution time; for clarity of the graph, we omit error bars for the other two problems. We believe the high variances in execution time, shown by the error bars, are due to the randomness of execution and the randomness of ASPEN, the planner. Figure 7.1(b) shows the average number of replans per simulation run.
[Figure 7.1 graphics omitted.]
(a) Average makespan and execution time using an uncertainty horizon at various thresholds.
(b) Average number of replans using an uncertainty horizon at various thresholds.
(c) Average utility using an uncertainty horizon at various thresholds.
Figure 7.1: Uncertainty horizon results.
Figure 7.1(c) shows average expected utility. Simulations that did not earn utility failed due to the planner timing out during a replan, either because a solution did not exist (e.g., a constraint violation could not be repaired) or a solution was not found in time. We began by allowing ASPEN to take 100,000 repair iterations to replan (Section 3.3.1); this resulted, however, in the planner timing out surprisingly often. Believing this to be due in part to ASPEN's randomness, we therefore switched to a modified random-restart approach [Schöning, 1999], invoking it 5 separate times with an increasing limit of repair iterations: 5,000, 5,000, 10,000, 25,000, and finally 50,000. This approach significantly decreased the number of timeouts. All other statistics reported are compiled from only those runs that completed successfully. Note that the notion of timeouts would be less important when using more methodical, and less randomized, planners.
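The restart scheme itself is simple to express; try_replan below is a hypothetical stand-in for one randomized planner invocation with a given repair-iteration budget:

```python
# A sketch of the modified random-restart scheme used with the randomized
# planner: invoke it up to five times with escalating repair-iteration
# limits, stopping at the first success.
def random_restart_replan(try_replan, limits=(5000, 5000, 10000, 25000, 50000)):
    for limit in limits:
        plan = try_replan(limit)
        if plan is not None:
            return plan          # first successful restart wins
    return None                  # timed out: no solution within the budgets
```

Because each invocation re-randomizes the planner's search, several small budgets often succeed where one large budget stalls.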
Figure 7.1(a) shows that, as the horizon threshold increases, the average schedule makespan decreases, an unanticipated benefit. This occurs because, with longer horizons, the planner is able to generate better overall plans, since it plans with more complete information. As the horizon threshold increases, however, the average execution time also increases. This is due to replanning more methods and tasks each time a replan is performed, which increases the amount of time each replan takes. This point is highlighted by Figure 7.1(b), which shows that, despite the execution time and replanning time increasing as the horizon threshold increases, the number of replans performed decreases; each replan must therefore take longer as the threshold approaches 1.
The trend in the number of replans in Figure 7.1(b) occurs because, as the threshold decreases, the number of replans increases, due to the need to replan the extra conflicts that shift inside the uncertainty horizon as it advances. This trend in turn explains Figure 7.1(c), where we believe higher thresholds earn higher utility on average both because they have fewer conflicts between activities inside and outside of the horizon, and because fewer replans are performed, lessening the number of replans that result in a timeout.
Although the execution time plateaus as the threshold in Figure 7.1(a) decreases, the balance of management time and makespan continues to change, with the schedule span increasing as the time spent managing continues to decrease. This suggests using a threshold where the overall execution time is near the lower bound and the schedule span is as low as possible, such as a threshold of 0.75 or 0.9.
A repeated measures ANOVA test showed that the choice of horizon threshold signiﬁcantly
affects schedule makespan, execution time, number of replans, and utility earned, all with a conﬁ
dence of 5% or better.
Likely Replanning Analysis
We also tested how the choice of a replanning threshold affects the performance of execution.
In this experiment, the agent replanned all activities (i.e., there was no uncertainty horizon), but
replanned only when the schedule’s expected utility fell below the replanning threshold. We ran
experiments under 7 conditions testing replanning threshold values of 0.01, 0.05, 0.25, 0.45, 0.65,
0.85 and 1. Note that a replanning threshold of 1 has the potential to replan even if there are no
actual conﬂicts in the plan, since there may still be a small possibility of a constraint violation.
The experiment was run on a 3.4 GHz Pentium D computer with 2 GB RAM. We performed
250 simulation runs for each condition and problem instance combination. Method duration means
and standard deviations were randomly repicked for each simulated run.
The results are shown in Figure 7.2. Figure 7.2(a) shows both the average makespan of the
executed schedule and the execution time, with error bars for PSP94 showing the standard deviation
of execution time; we do not show error bars for the other two problems for clarity of the graph.
As with the uncertainty horizon results, we believe that the high variances in execution time are
due to the randomness of execution and the planner. Figure 7.2(b) shows the average number of
replan episodes per simulation run, and Figure 7.2(c) shows the average utility earned. As with
the uncertainty horizon results, simulations that did not earn utility failed due to the planner timing
out during a replan, either because a solution did not exist (e.g., a constraint violation could not
be repaired) or the solution was not found in time. The same modiﬁed randomrestart approach
was utilized, and all other statistics reported are compiled from only those runs that completed
successfully.
As Figure 7.2(a) shows, schedule makespan tends to decrease as the replanning threshold increases, due to its requirement that schedules have a certain amount of expected utility, which loosely correlates with schedule temporal flexibility. The overall execution time, however, increases, as more replans are performed in response to more schedules' expected utility not meeting the replanning threshold. Figure 7.2(b) explicitly shows this trend of the average number of replans increasing as the replanning threshold increases. The threshold of 1 clearly takes this to an extreme. This threshold, in fact, becomes problematic when one considers Figure 7.2(c). The high expectations incurred by a replanning threshold of 1 result in a higher number of failures, due to more frequent replans, each of which has a chance of not finding a solution in the allotted amount of time. We expect that the average utility would increase if we increased the number of iterations allowed to the planner.
As with the uncertainty horizon, Figure 7.2(a) shows that as the replanning threshold decreases,
the execution time plateaus, but the balance of management time and makespan continues to
[Figure 7.2 graphics omitted.]
(a) Average makespan and execution time using likely replanning at various thresholds.
(b) Average number of replan episodes using likely replanning at various thresholds.
(c) Average utility using likely replanning at various thresholds.
Figure 7.2: Likely replanning results.
7.4 Experiments and Results
change, with the schedule span increasing as the time spent continues to decrease. This again suggests using a threshold at which the overall execution time is near its lower bound and the schedule span is as low as possible, such as 0.65 or 0.85.
A repeated measures ANOVA showed that the choice of replanning threshold significantly affects schedule makespan, execution time, number of replans and utility, all at the 5% significance level or better.
Analysis of Probabilistic Plan Management
We ran a third experiment to test overall PPM for this domain, which integrates the probabilistic meta-level control strategies of uncertainty horizon and likely replanning. The experiment compared the probabilistic meta-level control strategy with the baseline management strategy for this domain (Section 7.2.1). In the experiment we also varied the thresholds for the uncertainty horizon and probabilistic flexibility.
As with the two previous experiments, this experiment was run on a 3.4 GHz Pentium D computer with 2 GB RAM. We performed 250 simulation runs for each condition and problem instance combination. Method duration means and standard deviations were randomly re-picked for each simulated run.
To summarize the results, Figure 7.3 shows the average execution time across all three problems using various combinations of thresholds of the uncertainty horizon and likely replanning. We also show the average utility in Figure 7.4; note that the axes are reversed in that diagram for legibility. The label nouh refers to no uncertainty horizon being used (and so only likely replanning), and the label nolr refers to not utilizing likely replanning (and so only the uncertainty horizon). The bar with both nouh and nolr, then, is baseline deterministic plan management. The bars are also labeled with their numerical values.
The analyses of the two individual components hold when we consider the simulations that incorporate both likely replanning and the uncertainty horizon. Specifically, the average utility is highest at higher horizon thresholds and lower replanning thresholds, as Figure 7.4 shows. Figure 7.3 also shows that as the thresholds decrease, the average execution time in general decreases, both because fewer tasks are replanned during each replan episode and because fewer replans are made. These figures also show that using the uncertainty horizon and likely replanning together results in a lower execution time than either technique achieves separately, while retaining the high average utility earned when using likely replanning alone. The graphs also indicate that, after a certain point, the analysis is not especially parameter sensitive, and the execution time plateaus at a global minimum.
Figure 7.3: Average execution time across all three problems using PPM with both the uncertainty
horizon and likely replanning at various thresholds.
Overall, these results show that the benefit of PPM is clear: it achieves a significantly better balance of time spent replanning during execution versus schedule makespan. Using a horizon threshold of 0.9 and a replanning threshold of 0.85, PPM decreases the schedule execution time from an average of 187s to an average of 144s (a 23% reduction), while increasing average utility by 30%.
7.4.2 Learning Probabilistic Meta-Level Control
We performed several experiments to characterize the effectiveness of the different aspects of meta-level control, as well as PPM overall, for the multi-agent domain. To perform the experiments, we use a multi-agent, distributed execution environment. Agents' planners and executors run on separate threads, which allows deliberation to be separated from execution. The executor executes the current schedule by activating the next pending activity as soon as all of its causal and temporal constraints are satisfied. A simulator also runs separately; it receives method start commands from the agent executors and provides back execution results as they become known. All agents, and the simulator, run and communicate asynchronously but are grounded by a current
Figure 7.4: Average utility across all three problems using PPM with both the uncertainty horizon
and likely replanning at various thresholds.
understanding of time. Time (e.g., method durations and deadlines) is measured in abstract "ticks"; for our purposes, one tick equals 5 seconds.
Test Suite
We created a test suite specifically for this series of experiments in order to have problems that were structurally similar to one another, for the benefit of the learning algorithm. The test suite was generated by an in-house problem generator. All problems had 10 agents. The structures of the problems were randomly created according to a variable template. Each problem had, underneath the sum taskgroup, two subproblems with a sum_and uaf. The subproblems had between 3 and 6 tasks underneath them, and all deadlines were imposed at this level. We call these tasks "windows" for clarity. The windows had either sum_and or sum uafs, and there were enabling constraints between many of them. Each window had 4 tasks underneath it. All tasks at this level were max uafs and had four child methods each. Two of these four had a utility and expected duration of 10; the other two had a utility and expected duration of 5. The methods were also typically assigned between two different agents. In this test suite, methods have deterministic utility, which promotes consistency in the utility of the results. The duration distributions were 3-point distributions: the
highest was 1.3 times the expected duration; the middle was the expected duration; and the lowest
was 0.7 times the expected duration. Forty percent of methods had a 20% likelihood of a failure
outcome. There were no disablers, facilitators or hinderers.
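The outcome model described above can be sketched as follows. The function and its arguments are hypothetical, and the uniform choice among the three duration points is our assumption, since the text does not state the branch probabilities:

```python
import random

def sample_outcome(expected_duration, can_fail, rng):
    """Sample one method execution outcome: a 20% chance of a failure
    outcome for failure-prone methods, otherwise a duration drawn from
    the 3-point distribution of 0.7x, 1.0x, or 1.3x the expected
    duration."""
    if can_fail and rng.random() < 0.2:
        return ("failure", None)
    factor = rng.choice([0.7, 1.0, 1.3])
    return ("success", factor * expected_duration)
```

Drawing repeatedly from such a distribution is what makes a method's contribution to expected utility depend on both its failure likelihood and its duration spread.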
Several parameters were varied throughout problem generation to get our overall test suite:
• Subproblem overlap: how much the execution intervals of the subproblems overlapped.
• Window overlap: how much the execution intervals of windows overlapped.
• Deadline tightness: how tight the deadlines were at the window level.
Each problem in our test suite was generated with a different set of values for these three parameters. The generator then constructed problems randomly, using the parameters as a guide to determine deadlines, etc., according to the structure above. Table 7.1 shows the main features of each of the problems, sorted by the number of methods and average number of method outcomes. In general, the problems have between 128 and 192 methods in the activity network, with roughly 4 execution outcomes possible for each method. The table also shows the number of max-leaf tasks, tasks overall, enables NLEs, and ticks for each problem. The "ticks" field specifies the maximum length of execution for each problem; recall that method durations, deadlines, etc. for this domain are represented in terms of ticks, and that for our purposes, one tick equals 5 seconds.
Analysis of Probabilistic Meta-Level Control
We performed two experiments in which we try to characterize the benefits gained by using our probabilistic meta-level control strategy. For our first experiment, we tested probabilistic meta-level control alone, without schedule strengthening. We ran trials under three different control algorithms:
1. control-baseline: replanning according to the baseline, deterministic plan management (described in Section 7.3.1).
2. control-always: replanning in response to every update.
3. control-likely: replanning according to the probabilistic meta-level control strategy.
The control-always condition was included to provide an approximate upper bound for how much replanning could be supported during execution; we are most interested in the difference between the control-baseline and control-likely conditions to measure the effectiveness of probabilistic meta-level control.
Table 7.1: Characteristics of the 20 generated test suite problems.
#   methods   avg # method outcomes   max leafs   tasks   enable NLEs   ticks
1 128 3.91 32 43 2 125
2 128 4.01 32 43 2 164
3 128 4.01 32 43 2 232
4 128 4.10 32 43 2 210
5 128 4.17 32 43 2 131
6 128 4.20 32 43 2 183
7 128 4.20 32 43 2 79
8 128 4.24 32 43 2 76
9 128 4.27 32 43 2 120
10 144 4.33 36 48 3 93
11 144 4.33 36 48 3 233
12 160 4.03 40 53 4 101
13 160 4.11 40 53 4 80
14 160 4.14 40 53 4 164
15 160 4.14 40 53 4 233
16 160 4.16 40 53 4 212
17 160 4.22 40 53 4 221
18 176 4.28 44 58 5 150
19 176 4.50 44 58 5 169
20 192 4.08 48 63 6 232
153
7 Probabilistic MetaLevel Control
Table 7.2: Average amount of effort spent managing plans under various probabilistic meta-level control strategies.

                   PADS time   replanning time   num replans   num update messages
control-baseline   0           5.9               40.8          114.2
control-always     0           17.4              117.8         117.8
control-likely     1.7         1.3               10.2          149.0
Agent threads were distributed among 10 3.6 GHz Intel computers, each with 2 GB RAM. For each problem and condition combination, we ran 10 simulations, each with its own stochastically chosen durations and outcomes. To ensure fairness across conditions, these simulations were purposefully seeded such that the same set of durations and outcomes was executed by each of the conditions of the experiment.
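The seeding scheme can be sketched as follows: keying the random stream on the (problem, run) pair alone, never on the control condition, guarantees that every condition replays the identical sequence of realized durations and outcomes. The function and seed construction are hypothetical illustrations:

```python
import random

def outcome_stream(problem_id, run_id, n):
    """Generate the n stochastic draws that drive one simulation run.
    The seed depends only on the problem and run number, never on the
    control condition, so all conditions for the same problem and run
    see the same realized execution."""
    rng = random.Random(problem_id * 10_000 + run_id)
    return [rng.random() for _ in range(n)]
```

This way, any difference in earned utility between conditions is attributable to the control strategy rather than to luck of the draw.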
Table 7.2 shows how much effort was spent replanning for each of the conditions. It specifies the average time spent on PADS by each agent updating profiles (which is, of course, nonzero only for the control-likely case). It also shows the average number of replans performed by each agent, as well as the number of update messages received; recall that agents decide whether to replan in response to update messages (Section 7.1.3). The table shows that the control-always condition performs almost three times as many replans as the control-baseline condition, which replans in response to 36% of the update messages it receives. The control-likely condition, on the other hand, performs roughly a quarter as many replans as the control-baseline condition, and replans in response to 7% of update messages received, reducing plan management time by 49%. Note how PADS causes more messages to be passed during execution, as probability profiles tend to change more frequently than their deterministic counterparts. Despite the overhead that PADS incurs, the overall effect of the probabilistic meta-level control algorithm is a significant decrease in computational effort.
Figure 7.5 shows the average utility earned by each condition. The utility is expressed as the
fraction of the omniscient utility earned during execution. We deﬁne the omniscient utility for a
given simulation to be the utility produced by a deterministic, centralized version of the planner on
omniscient versions of the problems, where the actual method durations and outcomes generated
by the simulator are given as inputs. As the planner is heuristic, the fraction of omniscient utility
earned can be greater than one. Problem numbers correspond to the numbers in Table 7.1, and the
problems are sorted by how well the control-baseline condition performs; this roughly correlates
Figure 7.5: Average proportion of omniscient utility earned for various probabilistic meta-level control strategies.
to sorting by increasing problem difﬁculty.
An important feature of this graph is the difference between control-baseline and control-always. We had expected control-always to always outperform the control-baseline condition. Although the two conditions are not significantly different, their behavior disproves our conjecture: control-always does outperform control-baseline on some of the problems, but there are several on which it performs worse than the control-baseline condition. This is, in part, due to control-always replanning at inopportune times and missing the opportunity to respond to an important update (such as an important, remote method failing), as the system is unresponsive to updates during replans. On average, in fact, control-always earns slightly less utility than the control-baseline condition, although not significantly so.
In contrast, the control-likely condition slightly outperforms the control-baseline condition with respect to utility, although again without a significant difference. These results establish the control-likely condition as the best option. By using probabilistic meta-level control, deliberation during execution is reduced by 49%, while still maintaining high-utility schedules.
Our second experiment tested probabilistic meta-level control in conjunction with probabilistic schedule strengthening. For each problem in the generated test suite, we ran simulations executing the problem under a variety of control algorithms:
1. PPM-always: replanning and strengthening in response to every update.
2. PPM-change: replanning and strengthening whenever anything in the local schedule's deterministic state changes, or when a remote max-leaf task's deterministic state changes; otherwise, strengthening (without replanning) when anything in the probabilistic state changes.
3. PPM-baseline: replanning and strengthening whenever the baseline, deterministic plan management system chooses to replan.
4. PPM-likely: replanning and strengthening, or only strengthening, according to the probabilistic meta-level control strategy.
5. PPM-interval: replanning and strengthening, or only strengthening, at fixed intervals, with roughly the same frequency as PPM-likely.
6. PPM-random: replanning and strengthening, or only strengthening, at random times, with roughly the same frequency as PPM-likely. In this condition, a replan and strengthen were programmed to occur automatically if a replan or strengthen had not happened within 2.5 times the expected interval between replans.
The PPM-always and PPM-change conditions were included to provide approximate upper bounds for how much replanning and strengthening could be supported during execution. We conjectured that PPM-likely would show a considerable drop in computational effort compared to these conditions, while meeting or exceeding their average utility; after seeing the above results for control-always, we expected PPM-always and PPM-change to be very adversely affected by their large computational demands, which far exceed those of control-always.
PPM-interval and PPM-random, on the other hand, were included to test our claim that PPM-likely intelligently limits replanning by correctly identifying when to replan. Each condition replanned and strengthened, or only strengthened, with roughly the same frequency as PPM-likely, but not necessarily at the same times. The frequency was determined with respect to the number of updates received. For each simulation performed by PPM-likely, we calculated the ratio of replans and strengthens performed to the number of updates received. Separate ratios were calculated for replanning and strengthening together, and for strengthening only. Then, for each problem and each agent, we conservatively took the maximum of these ratios. We then passed the frequencies to the appropriate agents during execution so they could use them in their control algorithm. The design decision to base the frequency of replanning on updates instead of time was made with the belief that this would prove to be the stronger strategy, as replans and strengthens would be more likely to be performed at highly dynamic times.
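The frequency calculation described above amounts to the following sketch; the log format and function name are hypothetical:

```python
def control_frequencies(simulation_logs):
    """For each PPM-likely simulation log of the form
    (num_replans_and_strengthens, num_strengthens_only, num_updates),
    compute per-update ratios and conservatively keep the maximum,
    yielding the per-update frequencies handed to the PPM-interval and
    PPM-random conditions."""
    replan_ratio = max(r / u for r, _, u in simulation_logs)
    strengthen_ratio = max(s / u for _, s, u in simulation_logs)
    return replan_ratio, strengthen_ratio
```

Taking the maximum over simulations is deliberately conservative: it tends to make PPM-interval and PPM-random replan at least as often as PPM-likely did.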
Agent threads were distributed among 10 3.6 GHz Intel computers, each with 2 GB RAM. For each problem and condition combination, we ran 10 simulations, each with their own stochastically chosen durations and outcomes. To ensure fairness across conditions, these simulations were purposefully seeded such that the same set of durations and outcomes were executed by each of the conditions of the experiments.

Table 7.3: Average amount of effort spent managing plans under various probabilistic meta-level control strategies using strengthening.

               PADS time   strengthen time   replan time   num only strengthens   num replans and strengthens
PPM-always     2.3         79.1              31.9          0                      140.3
PPM-change     2.1         55.1              22.1          4.0                    66.1
PPM-baseline   2.1         38.8              13.7          0                      64.9
PPM-likely     1.9         20.5              3.2           10.4                   9.4
PPM-interval   2.2         25.7              4.8           13.9                   13.4
PPM-random     2.2         26.5              4.6           13.8                   14.0
Table 7.3 shows how much effort was spent during plan management for each of the conditions. The table shows the average time spent by each agent on PADS profile updates, schedule strengthening, and replanning. It also shows the average number of schedule strengthenings, and replans and strengthenings, each agent performed; note that the column showing the number of schedule strengthenings specifies the number of times the classifier returned "strengthen only," and does not include strengthening following a replan.
The results are what we expected. Clearly, PPM-always provides an upper bound on computation time, spending a total of 113.3 seconds managing execution. The average time spent on each replan for this condition, however, is much lower than the average for all the other conditions; this is because it replans and strengthens even when nothing in the deterministic or probabilistic state has changed, and such replans and strengthens tend to be faster than those that actually modify the schedule. PPM-change is next in terms of computation time (79.3 seconds), followed by PPM-baseline (54.6 seconds). PPM-likely drastically reduces planning effort compared to these conditions, with a total management time of 25.6 seconds. Note that PPM-interval and PPM-random end up with slightly greater numbers of replans and strengthens than PPM-likely. This is due to their conservative implementation, which uses the maximum of the replanning and strengthening ratios from PPM-likely, described above.
Figure 7.6 shows the average utility earned by each condition. The utility is expressed as the
fraction of the omniscient utility earned during execution. We deﬁne the omniscient utility for a
given simulation to be the utility produced by a deterministic, centralized version of the planner on
omniscient versions of the problems, where the actual method durations and outcomes generated
by the simulator are given as inputs. As the planner is heuristic, the fraction of omniscient utility
earned can be greater than one. Problem numbers correspond to the numbers in Table 7.1, and the
problems are sorted by how well the PPM-always condition did.
The conditions PPM-always, PPM-change, PPM-baseline and PPM-likely are very close for all of the problems, with PPM-likely ending up earning slightly more utility, on average. Surprisingly, and in contrast to the control-always condition (Figure 7.5), PPM-always and PPM-change did not seem to be as adversely affected by their high computational load. Looking into this further, we discovered that even though these conditions do suffer from a lack of responsiveness, utility in general does not suffer very much because their schedules are strengthened, protecting them against execution outcomes that would otherwise have led to failure in the face of their unresponsiveness. This highlights one of the benefits of probabilistic schedule strengthening: it protects against agents not being responsive. Using both replanning and probabilistic schedule strengthening together truly does allow agents to take advantage of opportunities that may arise during execution while also protecting their schedules against unexpected or undesirable outcomes.
Along these lines, however, we do expect that if we lessened the number of seconds that pass per tick (currently, that value is 5), PPM-always, and perhaps even PPM-baseline, would also be adversely affected by their high computational loads, and probabilistic schedule strengthening would not be able to sufficiently protect them from the resulting utility loss. We expect, therefore, that in such a situation probabilistic meta-level control would become a necessity for executing schedules with high utility, instead of only reducing the amount of computation.
Using a repeated measures ANOVA, no combination of PPM-always, PPM-change, PPM-baseline or PPM-likely is statistically significantly different. Each of these conditions, however, is significantly better than both PPM-interval and PPM-random with p < 0.001; similarly, PPM-interval and PPM-random are significantly different from each other with a less confident p-value of p < 0.09, with PPM-interval being the worse condition. Notice that PPM-likely is significantly better than PPM-interval and PPM-random despite replanning and strengthening fewer times, furthering our claim of PPM-likely's dominance over these strategies.
These results confirm that PPM-likely does, in fact, intelligently limit replanning and strengthening, eliminating likely wasteful and unnecessary planning effort during execution. The results of PPM-interval and PPM-random, in addition, confirm our claim that PPM-likely intelligently chooses not only how much to replan, but also when to replan: although they performed slightly more replans and strengthens on average than PPM-likely, on a number of problems their performance was much worse due to their unfocused manner of replanning and strengthening. Overall,
Figure 7.6: Average proportion of omniscient utility earned for various probabilistic meta-level control strategies.
the benefit of probabilistic meta-level control is clear. When PPM-likely is used, the amount of time spent replanning and schedule strengthening is decreased by 79% compared to PPM-always, and by 55% compared to PPM-baseline, with no significant difference in utility earned.
Analysis of Probabilistic Plan Management
To highlight the overall benefit of PPM, we ran an experiment comparing PPM-likely with four other conditions:
1. control-baseline: replanning whenever the baseline, deterministic system would have.
2. control-likely: replanning according to the probabilistic meta-level control strategy.
3. ds-baseline: replanning and deterministically strengthening whenever the baseline, deterministic system would replan.
4. PPM-baseline: replanning and (probabilistically) strengthening whenever the baseline, deterministic plan management system chooses to replan.
The first, second and fourth conditions are straightforward, and are repeated from the above experiments. The condition ds-baseline is similar to control-baseline, with the additional step of following every replan with a less informed, deterministic version of schedule strengthening [Gallagher, 2009]. Deterministic strengthening makes two types of modifications to the schedule in order to protect it from activity failure:
• Schedule a backup method for methods under a max-leaf that either have a nonzero failure likelihood or whose est plus their maximum duration is greater than their deadline. The backup method must have an average utility of at least half the average utility of the original method.
• Swap in a shorter-duration sibling for methods under a max-leaf whose est plus their maximum duration is greater than the lst of the subsequently scheduled method (on the same local schedule). The sibling method's average utility must be at least half the average utility of the original method.
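The two triggers above can be expressed as predicates over a method's probabilistic and temporal attributes; the field names here are illustrative (est and lst denote earliest and latest start times):

```python
def needs_backup(method, deadline):
    """Trigger 1: schedule a backup if the method can fail outright, or
    cannot be guaranteed to finish by its deadline in the worst case."""
    return (method["failure_likelihood"] > 0 or
            method["est"] + method["max_duration"] > deadline)

def needs_sibling_swap(method, next_method_lst):
    """Trigger 2: swap in a shorter-duration sibling if the method's
    worst-case finish time overruns the latest start time of the next
    method on the same local schedule."""
    return method["est"] + method["max_duration"] > next_method_lst
```

Both checks are purely deterministic worst-case tests, which is exactly why ds-baseline misses failure cases that the probabilistic analysis of PADS can catch.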
As with the previous experiment, agent threads were distributed among 10 3.6 GHz Intel computers, each with 2 GB RAM. For each problem and condition combination, we ran 10 simulations, each with their own stochastically chosen durations and outcomes. To ensure fairness across conditions, these simulations were purposefully seeded such that the same set of durations and outcomes were executed by each of the conditions of the experiments.
Table 7.4 shows the average amount of time spent managing plans under the various strategies. It also shows the average number of schedule strengthenings, and replans and strengthenings, each agent performed; note that the column showing the number of schedule strengthens specifies the number of times the classifier returned "strengthen only," and does not include strengthening following a replan. For the deterministic conditions that do not strengthen, the replans and strengthens column denotes the average number of replans performed.
In these results, PPM-likely performed far fewer replans/strengthens than the deterministic baseline condition (19.8 versus 40.8 for control-baseline); the time spent managing plans, however, was much higher. This is because probabilistic schedule strengthening has a higher computational cost than replanning for the problems of this test suite. In the previous test suite, agents spent an average of 47.7 seconds replanning in the control condition; for this test suite, agents in the control condition spent an average of only 6.1 seconds replanning. This large difference is due to the simplified activity network structure of the problems we utilized for this test suite (i.e., no disablers or soft NLEs, etc.). We anticipate that the savings PPM-likely provides would be much more significant for more complicated problems. In contrast, with more complicated problem structures, schedule strengthening time would not increase by as much, as its complexity is more dependent
Table 7.4: Average amount of effort spent managing plans under various probabilistic plan management approaches.

                   PADS time   strengthen time   replan time   num only strengthens   num replans (and strengthens)
control-baseline   0           0                 5.9           0                      40.8
control-likely     1.7         0                 1.3           0                      10.2
ds-baseline        0           2.8               5.9           0                      51.1
PPM-baseline       2.1         38.8              13.7          0                      64.9
PPM-likely         1.9         20.5              3.2           10.4                   9.4
on the number of methods than it is on the complexity of the activity network. Still, even in this domain, PPM-likely incurs an average overhead of only 17.8 seconds over control-baseline, a minor overhead when compared to an average execution time of 802 seconds.
Figure 7.7 shows the average utility earned on each problem. The utility is expressed as the
fraction of the omniscient utility earned during execution. We deﬁne the omniscient utility for a
given simulation to be the utility produced by a deterministic, centralized version of the planner on
omniscient versions of the problems, where the actual method durations and outcomes generated
by the simulator are given as inputs. As the planner is heuristic, the fraction of omniscient utility
earned can be greater than one. Problem numbers correspond to the numbers in Table 7.1, and the
problems are sorted by how well the control condition did; this roughly correlates to sorting the
problems by increasing problem difﬁculty.
The deterministic strengthening condition, ds-baseline, performs better on average than the control-baseline condition. Ultimately, though, it fails to match PPM-baseline or PPM-likely in utility, because its limited analysis does not identify all cases where failures might occur and because it does not determine whether its modifications ultimately help the schedule.
PPM-likely clearly outperforms, on average, all of the deterministic conditions. It does underperform ds-baseline on problem 8; the utility difference, however, was less than 5 utility points. Figure 7.8 shows this explicitly, specifying the absolute average utility earned by each condition. On average, PPM-likely earned 13.6% more utility than the control-baseline condition and 8.1% more than the ds-baseline condition, and is the clear winner.
Using a repeated measures ANOVA, there is no statistically significant difference between PPM-likely and PPM-baseline; these conditions, however, are significantly different from each of
Figure 7.7: Average proportion of omniscient utility earned for various plan management approaches.
!"
#!"
$!!"
$#!"
%!!"
%#!"
&!!"
&#!"
'!!"
'#!"
#!!"
$(" $!" '" $)" $*" (" *" $#" &" $$" #" %" %!" )" $%" $+" $" $'" +" $&"
!
"
#
$
!
%
#
&
'
(
)
*
)
+
,
&
$./*#0&
,./012345617.6"
,./012178619"
:52345617.6"
;;<2345617.6"
;;<2178619"
Figure 7.8: Average utility earned for various plan management approaches.
Table 7.5: Average size of communication log file.

                   avg log size (MB)
control-baseline   0.29
control-always     0.31
control-likely     0.71
PPM-always         0.92
PPM-change         0.90
PPM-baseline       0.96
PPM-likely         0.79
the other conditions with p < 0.001. The control-baseline and control-likely conditions are not significantly different. The ds-baseline condition is significantly different from each of the other conditions with p < 0.001.
In distributed scenarios, PPM increases the amount of communication between agents, as agents must communicate more information about each activity. Communicating such information also increases the frequency of communication, as probability profiles change more frequently than deterministic information (as was suggested by Table 7.2). Additionally, communication happens more often because schedules change more frequently due to the added step of schedule strengthening. Table 7.5 shows the average amount of communication that occurred for each agent, expressed as the average file size (in megabytes) of a text log file that recorded all outgoing messages across agents.
The table suggests that the main source of communication increase is communicating the probability profiles, not the added step of schedule strengthening. We determine this by comparing the large difference in the amount of communication of control-baseline versus PPM-baseline (0.29 MB versus 0.96 MB) with the small difference between control-likely and PPM-likely (0.71 MB versus 0.79 MB). The difference between the PPM-likely and PPM-always conditions also shows that probabilistic meta-level control is able to decrease communication somewhat by limiting the frequency of replanning and probabilistic schedule strengthening. In these experiments, the level of communication was within a reasonable range for our available bandwidth. If it were not, communication could be explicitly limited as described in Sections 5.3.5 and 8.1.1.
Overall, these results highlight the benefit of probabilistic plan management. When probabilistic meta-level control is used alone, it can greatly decrease the computational costs of plan management for this domain. PPM, including both probabilistic schedule strengthening and probabilistic meta-level control, is capable of significantly increasing the utility earned at a modest increase in computational effort for this domain, and achieves its goal of effectively managing plans during execution in a scalable way.
Chapter 8
Future Work and Conclusions
8.1 Future Work
Our current system of probabilistic plan management has been shown to be effective in managing the execution of plans for uncertain, temporal domains. There are other ways to use the probabilistic information provided by PADS, however, that remain to be explored. Below, we discuss the main areas in which we believe it would be beneficial to extend PPM in the future.
8.1.1 Meta-Level Control Extensions
Although we do not explicitly describe it as such, probabilistic schedule strengthening is an anytime algorithm: because strategies and activities are prioritized by expected utility loss, the benefit of the initial effort is high and tapers off as the algorithm continues. In this thesis, we do not limit its computation time, and let it run until completion; we found this to be effective for our domains. If it is necessary to limit schedule strengthening computation, we previously suggested imposing a fixed threshold on the po values of activities considered (Section 6.2.1). A more dynamic approach would be to utilize a meta-level control technique (e.g., [Russell and Wefald, 1991, Hansen and Zilberstein, 2001b]) to adaptively decide whether further strengthening is worth the effort.
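One way such a stopping rule might look is sketched below. This is an illustrative fragment, not the thesis's algorithm: `strengthen_anytime`, the pre-sorted action list, and the fixed per-action cost estimate are all hypothetical, and a real meta-level controller would estimate both quantities online rather than take them as inputs.

```python
# Hypothetical sketch of an anytime strengthening loop with a meta-level
# stopping rule: stop once the expected benefit of the next (lower-priority)
# strengthening action no longer justifies its estimated computation cost.

def strengthen_anytime(actions, cost_per_action):
    """actions: list of (expected_utility_gain, apply_fn) pairs, sorted by
    decreasing expected utility loss averted; cost_per_action: an assumed
    fixed estimate of the utility-equivalent cost of one more step."""
    applied_gains = []
    for gain, apply_fn in actions:
        if gain <= cost_per_action:   # further effort no longer worth it
            break
        apply_fn()                    # perform the strengthening action
        applied_gains.append(gain)
    return applied_gains
```

Because the actions are already ordered by expected benefit, the loop can be cut off at any point and still have spent its effort on the most valuable strengthening actions first.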
As implemented, our meta-level control strategies ignore the cost of computation, and focus instead on its possible benefit. This works well to lessen computation, and is sufficient for domains, such as the multi-agent domain we present, where deliberation and execution occur at the same time. It would be interesting to see, however, whether our approach could be combined with one such as in [Raja and Lesser, 2007, Musliner et al., 2003] to fully integrate probabilistic models of both the domain and the computation into a meta-level control strategy. This would involve using our probabilistic state information as part of an MDP model that chooses deliberative actions based on both the state of the current schedule and performance profiles of deliberation.
We also did not explicitly consider the cost of communication, as we assume sufficient bandwidth in our multi-agent domain. Although we propose a simple way to limit communication, by communicating only changes with a minimum change in probability profile, in situations with a firm cap on bandwidth more thorough deliberation may be useful. One example strategy would be to communicate only probability changes that are likely to cause a planning action in another agent; this would involve a learning process similar to how we learn when an agent should replan based on the updates it receives. An alternate strategy would be to reason about which information is most important to pass on at a given time, and which can be reserved for later.
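As a rough illustration of the simpler threshold-based idea, an agent might filter its outgoing updates against a minimum change. The function and profile representation below are invented for the sketch; in particular, we reduce an activity's probability profile to a single expected-utility number, which the actual protocol does not do.

```python
# Illustrative sketch of threshold-based communication limiting: broadcast
# an activity's profile only when it has shifted by more than a threshold.
# The representation (activity id -> expected utility) is an assumption.

def updates_to_send(old_profiles, new_profiles, threshold):
    """Return the ids of activities whose expected utility has changed by
    more than threshold since the last broadcast (new ids count as changed
    from 0.0)."""
    return [aid for aid, new_eu in new_profiles.items()
            if abs(new_eu - old_profiles.get(aid, 0.0)) > threshold]
```

A learned variant would replace the fixed threshold with a predictor of whether the receiving agent is likely to replan in response, analogous to how we learn when an agent should replan based on the updates it receives.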
Similarly, reasoning could be performed to decide what level of coordination between agents is useful, and when it would be useful [Rubinstein et al., 2010, Raja et al., 2010]. For example, if one agent is unable to improve its expected execution outcome via probabilistic schedule strengthening, in oversubscribed settings it may be useful to coordinate with other agents to change which portions of the task tree are scheduled, with the goal of finding tasks to execute that have a better expected outcome. This coordinated probabilistic schedule strengthening could be achieved by agents communicating with each other how they expect their schedules to be impacted by hypothetical changes, with the group as a whole committing to the change which results in the highest overall expected utility.
8.1.2 Further Integration with Planning
An interesting area of exploration would be to integrate PPM with the deterministic planner being used, instead of layering PPM on top of the plans the planner produces. Although many deterministic planners have been augmented to handle decision-theoretic planning (e.g., [Blythe, 1999, Kushmerick et al., 1995, Blythe, 1998]), most have done so at the expense of tractability. In accordance with the philosophy of this approach, however, PADS could be integrated with the planning process at a lower level, improving the search without fully integrating probability information into the planner. A simple example of this in the single-agent domain is using PADS to determine which child of a ONE task should be scheduled to maximize expected utility. PADS could also help to guide the search at key points by using the expected utility loss and expected utility gain heuristics to identify activities that are profitable to include in the schedule. For example, in the multi-agent domain, when choosing between two children of an OR max task, the eug heuristic could be used to suggest which would have the higher positive impact on task-group utility.
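A minimal sketch of how such a heuristic-driven choice might look follows. The names (`pick_child`, `eug`) are placeholders, and `eug` is treated as a black-box estimate of the gain in task-group expected utility from scheduling a given child; this is an illustration of the idea, not the thesis's search procedure.

```python
# Hypothetical sketch: use an expected-utility-gain (eug) estimate to pick
# which child of an OR max task to schedule.

def pick_child(children, eug):
    """children: candidate child activities; eug: activity -> estimated
    gain in task-group expected utility if that child is scheduled.
    Returns the most promising child, or None if no child is expected
    to help (or there are no candidates)."""
    best = max(children, key=eug, default=None)
    if best is not None and eug(best) <= 0.0:
        return None   # scheduling any child is expected to hurt or do nothing
    return best
```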
8.1.3 Demonstration on Other Domains / Planners
One extension of this work that we would like to see is applying it to other planners in order to demonstrate its principles in another setting. For example, we would like to use our approach in conjunction with FF-Replan [Yoon et al., 2007] in order to see what improvement could be garnered. In the problems FF-Replan considers, uncertainty is in the effects of actions. In its determinization of these problems, FF-Replan uses an "all-outcomes" technique that expands a probabilistic action such that there is one deterministic action available to the planner for each possible probabilistic effect of the original action. Planning is then done based on these deterministic actions. One possible way FF-Replan could benefit from our approach is to analyze the current deterministic plan, find a likely point of failure, and then try switching to alternate deterministic actions to try to bolster the probability of success.
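The all-outcomes determinization described above can be sketched in a few lines. The action representation here is invented for illustration and is far simpler than FF-Replan's actual PPDDL input; the point is only that each probabilistic outcome yields its own deterministic action, with the probabilities discarded.

```python
# Minimal sketch of "all-outcomes" determinization: each probabilistic
# effect of an action becomes a separate deterministic action.

def all_outcomes_determinize(action_name, outcomes):
    """outcomes: list of (probability, effect) pairs. Returns one
    deterministic (name, effect) action per possible effect; the
    probabilities are dropped, which is exactly what our probabilistic
    analysis could later restore when judging a plan's chance of success."""
    return [(f"{action_name}_o{i}", effect)
            for i, (_prob, effect) in enumerate(outcomes)]
```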
8.1.4 Human-Robot Teams
Although, as is, PPM could be used to manage the execution of plans for teams of both human and robotic agents, it could also be extended to specifically tailor to those types of situations. First, in many situations it would be appropriate to include humans' preferences in the plan objective, complicating the analysis. For example, humans may like their jobs more if they are frequently assigned tasks that they enjoy performing. Other modifications to the objective function include discounting utility earned later in the plan or discounting utility once a certain amount has been earned. This would reflect, for example, a human getting tired and assigning a higher cost to execution as time goes on. These more complex plan objective functions would make schedule strengthening more difficult, since it is harder to predict what types of actions will bolster low-utility portions of the plan.
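As one concrete (and purely illustrative) possibility, a time-discounted objective could take the form below, where $u_i$ is the utility of activity $i$, $t_i$ its completion time, and $\gamma$ a decay factor reflecting, for example, fatigue; this is not the objective function used in the thesis.

```latex
% Illustrative discounted plan objective; not the thesis's objective.
U_{\text{discounted}} = \sum_{i} \gamma^{t_i}\, u_i, \qquad 0 < \gamma < 1
```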
Additionally, including humans in the loop introduces the possibility of having plan stability as a component of the objective function [van der Krogt and de Weerdt, 2005]. Ideally, in human-robot scenarios, plans should remain relatively stable throughout execution in order to facilitate continued human understanding of the plan as well as agent predictability. Instead of using meta-level control to limit only the frequency of replanning, which implicitly encourages plan stability, it could be used to explicitly encourage plan stability and limit the number of times (and the degree to which) the plans change overall. It could also be used to choose the best times to replan; an example of this is discouraging replanning a human's schedule at times when the human is likely mentally overloaded, and favoring replanning at times when the human has the capacity to fully process and understand the resulting changes.
8.1.5 Non-Cooperative Multi-Agent Scenarios
Our multi-agent work assumes that agents are cooperative and work together to maximize joint plan utility. Breaking that assumption entails rethinking much of the PPM strategies. The distributed PADS algorithm and basic communication protocol, for example, would need to change, as agents may not have an incentive to share all of their probability profile information. Instead, agents may share information or query other agents for information if they feel that it would raise their expected utility [Gmytrasiewicz and Durfee, 2001]. Negotiation would also be a possibility for an information sharing protocol. Ideally, agents would track not only the expected benefit of receiving information, but also whether sharing information would have an expected cost; such metrics could be used to decide whether an information trade was worthwhile.
Similar ideas apply to probabilistic schedule strengthening. Non-cooperative, self-interested agents have little incentive, for example, to schedule a backup activity for other agents. Again, negotiation could be one way of achieving coordination, based on agents' projected expected gain in utility.
8.2 Conclusions
This thesis has described an approach to planning and execution for uncertain domains called Probabilistic Plan Management (PPM). The thesis claims that explicitly analyzing the probabilistic state of deterministic plans built for uncertain domains provides important information that can be leveraged in a scalable way to effectively manage plan execution, providing some of the benefits of non-deterministic planning while avoiding much of its computational overhead. PPM relies on a probabilistic analysis of deterministic schedules to inform probabilistic schedule strengthening and to act as the basis of a probabilistic meta-level control strategy. We support this thesis by developing algorithms for each of the three components of the approach, in the contexts of a single-agent domain with temporal constraints and activity durational uncertainty and a multi-agent domain with a rich set of activity constraints and sources of uncertainty.
We developed PADS, a general way to probabilistically analyze deterministic schedules. PADS assumes that plan execution has a specific objective that can be expressed in terms of utility and that the domain has a known uncertainty model. With this information, PADS analyzes deterministic schedules to calculate their expected utility. It explicitly considers the uncertainty model of the domain and propagates probability information throughout the schedule. A novelty of our algorithm is how quickly PADS can be computed and its results utilized to make plan improvements, due to analytically calculating the uncertainty of a problem in the context of a given schedule. For large, 10-25 agent problems, schedule expected utility can be globally calculated in an average of 0.58 seconds.
We also developed a way to strengthen deterministic schedules based on the probabilistic analysis, making them more robust to the uncertainty of execution. This step is best done in conjunction with replanning during execution to ensure that robust, high-utility schedules are always being executed. Probabilistic schedule strengthening is also best done when the structure of the domain can be exploited to identify classes of strengthening actions that may be helpful in different situations. Through experimentation, we found strengthened schedules to earn 36% more utility, on average, than their initial deterministic counterparts, when both were executed with no replanning allowed. We also found that even in cases where it increases computation time, probabilistic schedule strengthening makes up for the lack of agent responsiveness by ensuring that agents' schedules are strengthened and robust to unexpected outcomes that may occur while an agent is tied up replanning or strengthening.
We investigated the use of a PADS-driven probabilistic meta-level control strategy, which effectively controls planning actions during execution by performing them only when they are likely to be beneficial. It identifies cases when there are likely opportunities to exploit or failures to avoid, and replans only at those times. Probabilistic meta-level control reduces unnecessary replanning, and improves agent responsiveness during execution. Experiments in the multi-agent domain showed that this strategy can reduce plan management effort by an average of 49% over approaches driven by deterministic heuristics, with no significant change in utility earned.
As part of this thesis, we performed experiments to test the efficacy of our overall PPM approach. For a single-agent domain, PPM afforded a 23% reduction in schedule execution time (where execution time is defined as the schedule makespan plus the time spent replanning during execution). For a multi-agent, distributed domain, PPM using probabilistic schedule strengthening in conjunction with replanning and probabilistic meta-level control resulted in 14% more utility earned on average, with only a small added overhead.
Overall, probabilistic plan management is able to take advantage of the strengths of non-deterministic planning while, in large part, avoiding its computational burden. It is able to significantly improve the expected outcome of execution while also reasoning about the probabilistic state of the schedule to reduce the computational overhead of plan management. We conclude that probabilistic plan management, including both probabilistic schedule strengthening and probabilistic meta-level control, achieves its goal of being able to leverage the probabilistic information provided by PADS to effectively manage plan execution in a scalable way.
Chapter 9
Bibliography
George Alexander, Anita Raja, and David J. Musliner. Controlling deliberation in a Markov decision process-based agent. In Seventh International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS '08), 2008. 2.3
George Alexander, Anita Raja, and David Musliner. Controlling deliberation in coordinators. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about thinking. MIT Press, 2010. 2.3
N. Fazil Ayan, Ugur Kuter, Fusun Yaman, and Robert P. Goldman. HOTRiDE: Hierarchical ordered task replanning in dynamic environments. In Proceedings of the 3rd Workshop on Planning and Plan Execution for Real-World Systems (held in conjunction with ICAPS 2007), September 2007. 2.2.2
Donald R. Barr and E. Todd Sherrill. Mean and variance of truncated normal distributions. The American Statistician, 53(4):357–361, November 1999. 3.3.1, 4.1
J. C. Beck and N. Wilson. Proactive algorithms for job shop scheduling with probabilistic durations. Journal of Artificial Intelligence Research, 28:183–232, 2007. 1, 2.1.3
Piergiorgio Bertoli, Alessandro Cimatti, Marco Roveri, and Paolo Traverso. Planning in nondeterministic domains under partial observability via symbolic model checking. In Proceedings of IJCAI '01, 2001. 2.1.2
Lars Blackmore, Askar Bektassov, Masahiro Ono, and Brian C. Williams. Robust, optimal predictive control of jump Markov linear systems using particles. Hybrid Systems: Computation and Control, 2007. 2.1.3
Avrim Blum and John Langford. Probabilistic planning in the graphplan framework. In Proceedings of the Fifth European Conference on Planning (ECP '99), 1999. 2.1.2
Jim Blythe. Decision-theoretic planning. AI Magazine, 20(2):37–54, 1999. 8.1.2
Jim Blythe. Planning under Uncertainty in Dynamic Domains. PhD thesis, Carnegie Mellon University, 1998. 2.1.2, 8.1.2
Jim Blythe and Manuela Veloso. Analogical replay for efficient conditional planning. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI '97), Menlo Park, California, 1997. 2.1.2
Mark Boddy and Thomas Dean. Decision-theoretic deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence, 67(2):245–286, 1994. 2.3
Mark Boddy, Bryan Horling, John Phelps, Robert Goldman, Regis Vincent, C. Long, and Bob Kohout. C-TAEMS language specification v. 1.06. October 2005. 3.3.2
Justin A. Boyan and Michael L. Littman. Exact solutions to time-dependent MDPs. In Advances in Neural Information Processing Systems (NIPS), 2000. 2.1.1
L. Breiman. Bagging predictors. Machine Learning, 24:123–140, 1996. 3.4
Daniel Bryce, Mausam, and Sungwook Yoon. Workshop on a reality check for planning and scheduling under uncertainty (held in conjunction with ICAPS '08). 2008. URL http://www.ai.sri.com/~bryce/ICAPS08workshop.html. 1
Olivier Buffet and Douglas Aberdeen. The factored policy-gradient planner. Artificial Intelligence, 173(5-6):722–747, 2009. 2.1.1
Amedeo Cesta and Angelo Oddi. Gaining efficiency and flexibility in the simple temporal problem. In Proceedings of the Third International Workshop on Temporal Representation and Reasoning, 1996. 3.3.2
S. Chien, G. Rabideau, R. Knight, R. Sherwood, B. Engelhardt, D. Mutz, T. Estlin, B. Smith, F. Fisher, T. Barrett, G. Stebbins, and D. Tran. ASPEN -- automated planning and scheduling for space mission operations. In Space Ops, Toulouse, June 2000. URL http://citeseer.ist.psu.edu/chien00aspen.html. 2.2.2, 2.3, 3.3.1
Alessandro Cimatti and Marco Roveri. Conformant planning via symbolic model checking. Journal of Artificial Intelligence Research, 13:305–338, 2000. URL citeseer.ist.psu.edu/cimatti00conformant.html. 2.1.3
Michael T. Cox. Metacognition in computation: A selected research review. Artificial Intelligence, 169(2):104–141, 2005. 2.3
Richard Dearden, Nicolas Meuleau, Sailesh Ramakrishnan, David E. Smith, and Rich Washington. Incremental contingency planning. In Proceedings of the ICAPS Workshop on Planning under Uncertainty, 2003. URL citeseer.ist.psu.edu/dearden03incremental.html. 2.1.2
Rina Dechter, Itay Meiri, and Judea Pearl. Temporal constraint networks. Artificial Intelligence, 49:61–95, May 1991. 2.2.1, 3.2.2
Keith Decker. TAEMS: A framework for environment centered analysis & design of coordination mechanisms. In G. O'Hare and N. Jennings, editors, Foundations of Distributed Artificial Intelligence, chapter 16, pages 429–448. Wiley InterScience, 1996. 3.3.2
Denise Draper, Steve Hanks, and Daniel Weld. Probabilistic planning with information gathering and contingent execution. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems (AIPS), Menlo Park, California, 1994. 2.1.2
Mark Drummond, John Bresina, and Keith Swanson. Just-in-case scheduling. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994. URL citeseer.ist.psu.edu/drummond94justcase.html. 2.1.2
Kutluhan Erol, James Hendler, and Dana S. Nau. HTN planning: Complexity and expressivity. In Proceedings of AAAI-94, 1994. 3.2.1
Tara Estlin, Rich Volpe, Issa Nesnas, Darren Mutz, Forest Fisher, Barbara Engelhardt, and Steve Chien. Decision-making in a robotic architecture for autonomy. In Proceedings of iSAIRAS 2001, Montreal, CA, June 2001. 2.3
Zhengzhu Feng, Richard Dearden, Nicolas Meuleau, and Richard Washington. Dynamic programming for structured continuous Markov decision problems. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI '04), 2004. 2.1.1
Janae Foss, Nilufer Onder, and David E. Smith. Preventing unrecoverable failures through precautionary planning. In Workshop on Moving Planning and Scheduling Systems into the Real World (held in conjunction with ICAPS '07), 2007. 2.1.2
Maria Fox, Richard Howey, and Derek Long. Exploration of the robustness of plans. In Proceedings of AAAI-06, 2006. 2.1.3
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997. 3.4
Christian Fritz and Sheila McIlraith. Computing robust plans in continuous domains. In Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), 2009. 2.1.3
Anthony Gallagher. Embracing Conflicts: Exploiting Inconsistencies in Distributed Schedules using Simple Temporal Network Representation. PhD thesis, Carnegie Mellon University, 2009. 1.2, 1.2, 2.2.1, 4.3, 7.4.2
Anthony Gallagher, Terry L. Zimmerman, and Stephen F. Smith. Incremental scheduling to maximize quality in a dynamic environment. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '06), 2006. 2.2.2
Alfonso Gerevini and Ivan Serina. Fast plan adaptation through planning graphs: Local and systematic search techniques. In Proceedings of the International Conference on Artificial Intelligence Planning Systems (AIPS '00), 2000. 2.2.2
Piotr J. Gmytrasiewicz and Edmund H. Durfee. Rational communication in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 4(3):233–273, 2001. 8.1.5
Robert P. Goldman and Mark S. Boddy. Expressive planning and explicit knowledge. In AIPS, 1996. 2.1.3
Charles M. Grinstead and J. Laurie Snell. Introduction to Probability. American Mathematical Society, 1997. 3.3.2
Eric A. Hansen and Shlomo Zilberstein. LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence, 129(1-2):35–62, 2001a. URL http://rbr.cs.umass.edu/shlomo/papers/HZaij01b.html. 2.3
Eric A. Hansen and Shlomo Zilberstein. Monitoring and control of anytime algorithms: A dynamic programming approach. Artificial Intelligence, 126(1-2):139–157, 2001b. URL http://rbr.cs.umass.edu/shlomo/papers/HZaij01a.html. 2.3, 8.1.1
Laura M. Hiatt and Reid Simmons. Towards probabilistic plan management. In Proceedings of the 3rd Workshop on Planning and Plan Execution for Real-World Systems (held in conjunction with ICAPS 2007), Providence, RI, September 2007. URL http://www.cs.drexel.edu/~pmodi/papers/sultanikijcai07.pdf. 5.2, 7.2, 7.4.1
Laura M. Hiatt, Terry L. Zimmerman, Stephen F. Smith, and Reid Simmons. Reasoning about executional uncertainty to strengthen schedules. In Proceedings of the Workshop on A Reality Check for Planning and Scheduling Under Uncertainty (held in conjunction with ICAPS-08), 2008. 2, 5.3, 6.1, 6.4.1
Laura M. Hiatt, Terry L. Zimmerman, Stephen F. Smith, and Reid Simmons. Strengthening schedules through uncertainty analysis. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), 2009. 2, 5.3, 6.1, 6.4.2
Jorg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14:253–302, 2001. 2.2
Ronald A. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960. 2.1, 2.1.1
Ronald A. Howard. Dynamic Probabilistic Systems: Semi-Markov and Decision Processes. John Wiley & Sons, New York, New York, 1971. 2.1.1
Sergio Jiménez, Andrew Coles, and Amanda Smith. Planning in probabilistic domains using a deterministic numeric planner. In Proceedings of the 25th Workshop of the UK Planning and Scheduling Special Interest Group (PlanSIG 2006), December 2006. 2.2.1
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 1998. 2.1.1
Nicholas Kushmerick, Steve Hanks, and Daniel Weld. An algorithm for probabilistic planning. Artificial Intelligence, 76(1-2):239–286, 1995. 2.1.3, 8.1.2
P. Laborie and M. Ghallab. Planning with sharable resource constraints. In Proceedings of IJCAI-95, Montreal, Canada, 1995. URL http://dli.iiit.ac.in/ijcai/IJCAI95VOL2/PDF/081.pdf. 2.2.1
Hoong Chuin Lau, Thomas Ou, and Fei Xiao. Robust local search and its application to generating robust local schedules. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '07), September 2007. 2.2.1
Thomas Léauté and Brian C. Williams. Coordinating agile systems through the model-based execution of temporal plans. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI '05), 2005. 2.3
Lihong Li and Michael L. Littman. Lazy approximation for solving continuous finite-horizon MDPs. In Proceedings of the Twentieth American National Conference on Artificial Intelligence (AAAI '05), 2005. 2.1.1
Iain Little and Sylvie Thiébaux. Concurrent probabilistic planning in the graphplan framework. In Proceedings of ICAPS '06, 2006. 2.1.2
Iain Little, Douglas Aberdeen, and Sylvie Thiébaux. Prottle: A probabilistic temporal planner. In Proceedings of the Twentieth American National Conference on Artificial Intelligence (AAAI '05), Pittsburgh, PA, July 2005. 2.1.2
A. K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99–118, 1977. 3.3.1
Rajiv T. Maheswaran and Pedro Szekely. Criticality metrics for distributed plan and schedule management. In Proceedings of the 18th International Conference on Automated Planning and Scheduling (ICAPS), 2008. 2.2.2
Janusz Marecki, Sven Koenig, and Milind Tambe. A fast analytical algorithm for solving Markov decision processes with real-valued resources. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2007. 2.1.1
Mausam and Daniel S. Weld. Probabilistic temporal planning with uncertain durations. In Proceedings of AAAI-06, Boston, MA, July 2006. 2.1.1
Mausam and Daniel S. Weld. Concurrent probabilistic temporal planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '05), 2005. 2.1.1
K. N. McKay, T. E. Morton, P. Ramnath, and J. Wang. 'Aversion dynamics' scheduling when the system changes. Journal of Scheduling, 3(2), 2000. 1
Nicolas Meuleau and David E. Smith. Optimal limited contingency planning. In Workshop on Planning under Uncertainty (held in conjunction with ICAPS '03), 2003. 2.1.2
Paul Morris, Nicola Muscettola, and Thierry Vidal. Dynamic control of plans with temporal uncertainty. In Proceedings of IJCAI, pages 494–502, 2001. 1, 2.2.1
David J. Musliner, Edmund H. Durfee, and Kang G. Shin. CIRCA: A cooperative intelligent real-time control architecture. IEEE Transactions on Systems, Man, and Cybernetics, 23(6), 1993. 2.1.3
David J. Musliner, Robert P. Goldman, and Kurt D. Krebsbach. Deliberation scheduling strategies for adaptive mission planning in real-time environments. In Third International Workshop on Self Adaptive Software, 2003. 2.3, 8.1.1
David J. Musliner, Edmund H. Durfee, Jianhui Wu, Dmitri A. Dolgov, Robert P. Goldman, and Mark S. Boddy. Coordinated plan management using multiagent MDPs. In Working Notes of the AAAI Spring Symposium on Distributed Plan and Schedule Management, March 2006. 2.1.1
David J. Musliner, Jim Carciofini, Edmund H. Durfee, Jianhui Wu, Robert P. Goldman, and Mark S. Boddy. Flexibly integrating deliberation and execution in decision-theoretic agents. In Proceedings of the 3rd Workshop on Planning and Plan Execution for Real-World Systems (held in conjunction with ICAPS-07), September 2007. 2.1.1
Nilufer Onder and Martha Pollack. Conditional, probabilistic planning: A unifying algorithm and effective search control mechanisms. In Proceedings of the 16th National Conference on Artificial Intelligence, 1999. 2.1.2
Nilufer Onder, Garrett C. Whelan, and Li Li. Engineering a conformant probabilistic planner. Journal of Artificial Intelligence Research, 25:1–15, 2006. 2.1.3
Mark Alan Peot. Decision-Theoretic Planning. PhD thesis, Department of Engineering-Economic Systems, Stanford University, 1998. 2.1.2
Nicola Policella, Amedeo Cesta, Angelo Oddi, and Stephen F. Smith. From precedence constraint posting to partial order schedules: A CSP approach to robust scheduling. AI Communications, 20(3):163–180, 2007. 1, 2.2.1
Nicola Policella, Amedeo Cesta, Angelo Oddi, and Stephen F. Smith. Solve-and-robustify: Synthesizing partial order schedules by chaining. Journal of Scheduling, 12(3):299–314, June 2009. 3.2.2
Martha E. Pollack and John F. Horty. There's more to life than making plans: Plan management in dynamic, multi-agent environments. AI Magazine, 20(4):71–84, 1999. 2.2.2
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993. 3.4
J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81–106, 1986. 3.4
Anita Raja and Victor Lesser. A framework for meta-level control in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 15(2):147–196, 2007. 1.3, 2.3, 8.1.1
Anita Raja, George Alexander, Victor Lesser, and Mike Krainin. Coordinating agents' meta-level control. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about thinking. MIT Press, 2010. 8.1.1
Zachary Rubinstein, Stephen F. Smith, and Terry Zimmerman. The role of metareasoning in achieving effective multi-agent coordination. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about thinking. MIT Press, 2010. 8.1.1
Stuart Russell and Eric Wefald. Principles of metareasoning. Artificial Intelligence, 49, 1991. 2.3, 8.1.1
Uwe Schöning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, 1999. 7.4.1
Christoph Schwindt. Generation of resource-constrained project scheduling problems subject to temporal constraints. Technical Report WIOR-543, Institut für Wirtschaftstheorie und Operations Research, Universität Karlsruhe, November 1998. 3.3.1, 5.4.1, 7.4.1
Brennan Sellner and Reid Simmons. Towards proactive replanning for multi-robot teams. In Proceedings of the 5th International Workshop on Planning and Scheduling in Space 2006, Baltimore, MD, October 2006. 2.2.1, 3
David E. Smith and Daniel S. Weld. Conformant graphplan. In Proceedings of AAAI-98, Madison, WI, 1998. 2.1.3
Stephen F. Smith, Anthony Gallagher, Terry Zimmerman, Laura Barbulescu, and Zachary Rubin
stein. Multiagent management of joint schedules. In Proceedings of the 2006 AAAI Spring
Symposium on Distributed Plan and Schedule Management, March 2006. 2.2.1
Stephen F. Smith, Anthony Gallagher, Terry Zimmerman, Laura Barbulescu, and Zachary Rubin
stein. Distributed management of ﬂexible times schedules. In Proceedings of AAMAS07, May
2007. 2.2.2, 3.2.2, 3.3.2, 5.3, 7.3.1
Austin Tate, Brian Drabble, and Richard Kirby. OPlan 2: an open architecture for command,
planning and control. In M. Fox. and M. Zweben, editors, Knowledge Based Scheduling. Morgan
Kaufmann, 1994. URL citeseer.ist.psu.edu/tate94oplan.html. 2.2.2
Roman van der Krogt and Mathijs de Weerdt. Plan repair as an extension of planning. In Pro
ceedings of the International Conference on Automated Planning and Scheduling (ICAPS ’05),
Monterey, California, 2005. 2.2.2, 8.1.4
Vincent Vidal and H´ ector Geffner. Branching and pruning: An optimal temporal POCL planner
based on constraint programming. Artiﬁcial Intelligence, 170(3):298–335, 2006. 2.2.1
Thomas Wagner, Anita Raja, and Victor Lesser. Modeling uncertainty and its implications to
sophisticated control in TAEMS agents. Autonomous Agents and MultiAgent Systems, 13(3):
235–292, 2006. 2.1.2
Ian Warﬁeld, Chad Hogg, Stephen LeeUrban, and H´ ector Mu˜ nozAvila. Adaptation of hierarchi
cial task network plans. In FLAIRS ’07, May 2007. 2.2.2
Daniel S. Weld, Corin R. Anderson, and David E. Smith. Extending graphplan to handle uncer
tainty and sensing actions. In Proceedings of the Sixteenth National Conference on Artiﬁcial
Intelligence (AAAI98), 1998. 2.1.2
David E. Wilkins and Marie desJardins. A call for knowledgebased planning. AI Magazine, 22:
99–115, 2000. 3.2.1
Ping Xuan and Victor Lesser. Incorporating uncertainty in agent commitments. In Proceedings of
ATAL99, 1999. 2.1.2
Sungwook Yoon, Alan Fern, and Robert Givan. FF-Replan: A baseline for probabilistic
planning. In Proceedings of the Seventeenth International Conference on Automated Planning and
Scheduling (ICAPS ’07), Providence, RI, September 2007. 2.2, 8.1.3
Håkan L. S. Younes and David J. Musliner. Probabilistic plan verification through acceptance
sampling. In Froduald Kabanza and Sylvie Thiébaux, editors, Proceedings of the AIPS-02
Workshop on Planning via Model Checking, pages 81–88, Toulouse, France, 2002. URL
http://www.tempastic.org/papers/aipsws2002.pdf. 2.1.3
Håkan L. S. Younes and Reid G. Simmons. Solving generalized semi-Markov decision processes
using continuous phase-type distributions. In Proceedings of AAAI-04, San Jose, CA, 2004a.
URL http://www.tempastic.org/papers/aaai2004.pdf. 2.1.1
Håkan L. S. Younes and Reid G. Simmons. VHPOP: Versatile heuristic partial order planner.
Journal of Artificial Intelligence Research, 20:405–430, 2003. 2.2.1
Håkan L. S. Younes and Reid G. Simmons. Policy generation for continuous-time stochastic
domains with concurrency. In Proceedings of ICAPS ’04, 2004b. URL
citeseer.ist.psu.edu/younes04policy.html. 2.1.2
Håkan L. S. Younes, David J. Musliner, and Reid G. Simmons. A framework for planning in
continuous-time stochastic domains. In ICAPS, pages 195–204, 2003. 2.1.1
Ji Zhu, Saharon Rosset, Hui Zou, and Trevor Hastie. Multi-class AdaBoost. Technical Report 430,
University of Michigan, 2005. URL http://www-stat.stanford.edu/~hastie/Papers/samme.pdf. 3.4
Appendix A
Sample Problems
A.1 Sample MRCPSP/max problem in ASPEN syntax
activity a001 {
decompositions = a001_0 or a001_1;
};
activity a001_0 {
decompositions = a001_0_by_a012_0 or a001_0_by_a012_1;
};
activity a001_0_by_a012_0 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1, 1000000]));
};
activity a001_0_by_a012_1 {
duration = 10;
constraints = ((ends_before start of a012_1 by [3, 1000000]));
};
activity a001_1 {
decompositions = a001_1_by_a012_0 or a001_1_by_a012_1;
};
activity a001_1_by_a012_0 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1, 1000000]));
};
activity a001_1_by_a012_1 {
duration = 4;
constraints = ((ends_before start of a012_1 by [6, 1000000]));
};
activity a002 {
decompositions = a002_0 or a002_1;
};
activity a002_0 {
decompositions = a002_0_by_a015_0_a020_0 or a002_0_by_a015_0_a020_1 or
a002_0_by_a015_1_a020_0 or a002_0_by_a015_1_a020_1;
};
activity a002_0_by_a015_0_a020_0 {
duration = 7;
constraints = ((ends_before start of a015_0 by [19, 1000000]) and
(ends_before start of a020_0 by [10, 1000000]));
};
activity a002_0_by_a015_0_a020_1 {
duration = 7;
constraints = ((ends_before start of a015_0 by [34, 1000000]) and
(ends_before start of a020_1 by [1, 1000000]));
};
activity a002_0_by_a015_1_a020_0 {
duration = 7;
constraints = ((ends_before start of a015_1 by [19, 1000000]) and
(ends_before start of a020_0 by [10, 1000000]));
};
activity a002_0_by_a015_1_a020_1 {
duration = 7;
constraints = ((ends_before start of a015_1 by [19, 1000000]) and
(ends_before start of a020_1 by [1, 1000000]));
};
activity a002_1 {
decompositions = a002_1_by_a015_0_a020_0 or a002_1_by_a015_0_a020_1 or
a002_1_by_a015_1_a020_0 or a002_1_by_a015_1_a020_1;
};
activity a002_1_by_a015_0_a020_0 {
duration = 2;
constraints = ((ends_before start of a015_0 by [2, 1000000]) and
(ends_before start of a020_0 by [1, 1000000]));
};
activity a002_1_by_a015_0_a020_1 {
duration = 2;
constraints = ((ends_before start of a015_0 by [2, 1000000]) and
(ends_before start of a020_1 by [4, 1000000]));
};
activity a002_1_by_a015_1_a020_0 {
duration = 2;
constraints = ((ends_before start of a015_1 by [6, 1000000]) and
(ends_before start of a020_0 by [1, 1000000]));
};
activity a002_1_by_a015_1_a020_1 {
duration = 2;
constraints = ((ends_before start of a015_1 by [6, 1000000]) and
(ends_before start of a020_1 by [4, 1000000]));
};
activity a003 {
decompositions = a003_0 or a003_1;
};
activity a003_0 {
decompositions = a003_0_by_a010_0_a013_0_a020_0 or a003_0_by_a010_0_a013_1_a020_0 or
a003_0_by_a010_1_a013_0_a020_0 or a003_0_by_a010_1_a013_1_a020_0 or
a003_0_by_a010_0_a013_0_a020_1 or a003_0_by_a010_0_a013_1_a020_1 or
a003_0_by_a010_1_a013_0_a020_1 or a003_0_by_a010_1_a013_1_a020_1;
};
activity a003_0_by_a010_0_a013_0_a020_0 {
duration = 4;
constraints = ((ends_before start of a010_0 by [6, 1000000]) and
(ends_before start of a013_0 by [1000000, 16]) and
(ends_before start of a020_0 by [2, 1000000]));
};
activity a003_0_by_a010_0_a013_1_a020_0 {
duration = 4;
constraints = ((ends_before start of a010_0 by [6, 1000000]) and
(ends_before start of a013_1 by [1000000, 18]) and
(ends_before start of a020_0 by [2, 1000000]));
};
activity a003_0_by_a010_1_a013_0_a020_0 {
duration = 4;
constraints = ((ends_before start of a010_1 by [8, 1000000]) and
(ends_before start of a013_0 by [1000000, 16]) and
(ends_before start of a020_0 by [2, 1000000]));
};
activity a003_0_by_a010_1_a013_1_a020_0 {
duration = 4;
constraints = ((ends_before start of a010_1 by [8, 1000000]) and
(ends_before start of a013_1 by [1000000, 18]) and
(ends_before start of a020_0 by [2, 1000000]));
};
activity a003_0_by_a010_0_a013_0_a020_1 {
duration = 4;
constraints = ((ends_before start of a010_0 by [6, 1000000]) and
(ends_before start of a013_0 by [1000000, 16]) and
(ends_before start of a020_1 by [8, 1000000]));
};
activity a003_0_by_a010_0_a013_1_a020_1 {
duration = 4;
constraints = ((ends_before start of a010_0 by [6, 1000000]) and
(ends_before start of a013_1 by [1000000, 18]) and
(ends_before start of a020_1 by [8, 1000000]));
};
activity a003_0_by_a010_1_a013_0_a020_1 {
duration = 4;
constraints = ((ends_before start of a010_1 by [8, 1000000]) and
(ends_before start of a013_0 by [1000000, 16]) and
(ends_before start of a020_1 by [8, 1000000]));
};
activity a003_0_by_a010_1_a013_1_a020_1 {
duration = 4;
constraints = ((ends_before start of a010_1 by [8, 1000000]) and
(ends_before start of a013_1 by [1000000, 18]) and
(ends_before start of a020_1 by [8, 1000000]));
};
activity a003_1 {
decompositions = a003_1_by_a010_0_a013_0_a020_0 or a003_1_by_a010_0_a013_1_a020_0 or
a003_1_by_a010_1_a013_0_a020_0 or a003_1_by_a010_1_a013_1_a020_0 or
a003_1_by_a010_0_a013_0_a020_1 or a003_1_by_a010_0_a013_1_a020_1 or
a003_1_by_a010_1_a013_0_a020_1 or a003_1_by_a010_1_a013_1_a020_1;
};
activity a003_1_by_a010_0_a013_0_a020_0 {
duration = 5;
constraints = ((ends_before start of a010_0 by [8, 1000000]) and
(ends_before start of a013_0 by [1000000, 12]) and
(ends_before start of a020_0 by [17, 1000000]));
};
activity a003_1_by_a010_0_a013_1_a020_0 {
duration = 5;
constraints = ((ends_before start of a010_0 by [15, 1000000]) and
(ends_before start of a013_1 by [1000000, 30]) and
(ends_before start of a020_0 by [32, 1000000]));
};
activity a003_1_by_a010_1_a013_0_a020_0 {
duration = 5;
constraints = ((ends_before start of a010_1 by [3, 1000000]) and
(ends_before start of a013_0 by [1000000, 12]) and
(ends_before start of a020_0 by [47, 1000000]));
};
activity a003_1_by_a010_1_a013_1_a020_0 {
duration = 5;
constraints = ((ends_before start of a010_1 by [3, 1000000]) and
(ends_before start of a013_1 by [1000000, 30]) and
(ends_before start of a020_0 by [62, 1000000]));
};
activity a003_1_by_a010_0_a013_0_a020_1 {
duration = 5;
constraints = ((ends_before start of a010_0 by [22, 1000000]) and
(ends_before start of a013_0 by [1000000, 12]) and
(ends_before start of a020_1 by [13, 1000000]));
};
activity a003_1_by_a010_0_a013_1_a020_1 {
duration = 5;
constraints = ((ends_before start of a010_0 by [29, 1000000]) and
(ends_before start of a013_1 by [1000000, 30]) and
(ends_before start of a020_1 by [13, 1000000]));
};
activity a003_1_by_a010_1_a013_0_a020_1 {
duration = 5;
constraints = ((ends_before start of a010_1 by [3, 1000000]) and
(ends_before start of a013_0 by [1000000, 12]) and
(ends_before start of a020_1 by [13, 1000000]));
};
activity a003_1_by_a010_1_a013_1_a020_1 {
duration = 5;
constraints = ((ends_before start of a010_1 by [3, 1000000]) and
(ends_before start of a013_1 by [1000000, 30]) and
(ends_before start of a020_1 by [13, 1000000]));
};
activity a004 {
decompositions = a004_0 or a004_1;
};
activity a004_0 {
decompositions = a004_0_by_a006_0_a012_0 or a004_0_by_a006_1_a012_0 or
a004_0_by_a006_0_a012_1 or a004_0_by_a006_1_a012_1;
};
activity a004_0_by_a006_0_a012_0 {
duration = 9;
constraints = ((ends_before start of a006_0 by [20, 1000000]) and
(ends_before start of a012_0 by [25, 1000000]));
};
activity a004_0_by_a006_1_a012_0 {
duration = 9;
constraints = ((ends_before start of a006_1 by [22, 1000000]) and
(ends_before start of a012_0 by [25, 1000000]));
};
activity a004_0_by_a006_0_a012_1 {
duration = 9;
constraints = ((ends_before start of a006_0 by [20, 1000000]) and
(ends_before start of a012_1 by [18, 1000000]));
};
activity a004_0_by_a006_1_a012_1 {
duration = 9;
constraints = ((ends_before start of a006_1 by [22, 1000000]) and
(ends_before start of a012_1 by [18, 1000000]));
};
activity a004_1 {
decompositions = a004_1_by_a006_0_a012_0 or a004_1_by_a006_1_a012_0 or
a004_1_by_a006_0_a012_1 or a004_1_by_a006_1_a012_1;
};
activity a004_1_by_a006_0_a012_0 {
duration = 7;
constraints = ((ends_before start of a006_0 by [18, 1000000]) and
(ends_before start of a012_0 by [18, 1000000]));
};
activity a004_1_by_a006_1_a012_0 {
duration = 7;
constraints = ((ends_before start of a006_1 by [6, 1000000]) and
(ends_before start of a012_0 by [31, 1000000]));
};
activity a004_1_by_a006_0_a012_1 {
duration = 7;
constraints = ((ends_before start of a006_0 by [18, 1000000]) and
(ends_before start of a012_1 by [20, 1000000]));
};
activity a004_1_by_a006_1_a012_1 {
duration = 7;
constraints = ((ends_before start of a006_1 by [6, 1000000]) and
(ends_before start of a012_1 by [20, 1000000]));
};
activity a005 {
decompositions = a005_0;
};
activity a005_0 {
decompositions = a005_0_by_a014_0 or a005_0_by_a014_1;
};
activity a005_0_by_a014_0 {
duration = 1;
constraints = ((ends_before start of a014_0 by [2, 8]));
};
activity a005_0_by_a014_1 {
duration = 1;
constraints = ((ends_before start of a014_1 by [1, 7]));
};
activity a006 {
decompositions = a006_0 or a006_1;
};
activity a006_0 {
decompositions = a006_0_by_a017_0 or a006_0_by_a017_1;
};
activity a006_0_by_a017_0 {
duration = 9;
constraints = ((ends_before start of a017_0 by [27, 1000000]));
};
activity a006_0_by_a017_1 {
duration = 9;
constraints = ((ends_before start of a017_1 by [15, 1000000]));
};
activity a006_1 {
decompositions = a006_1_by_a017_0 or a006_1_by_a017_1;
};
activity a006_1_by_a017_0 {
duration = 4;
constraints = ((ends_before start of a017_0 by [4, 1000000]));
};
activity a006_1_by_a017_1 {
duration = 4;
constraints = ((ends_before start of a017_1 by [10, 1000000]));
};
activity a007 {
decompositions = a007_0 or a007_1 with (duration < duration);
};
activity a007_0 {
decompositions = a007_0_by_a010_0_a013_0 or a007_0_by_a010_0_a013_1 or
a007_0_by_a010_1_a013_0 or a007_0_by_a010_1_a013_1;
};
activity a007_0_by_a010_0_a013_0 {
duration = 3;
constraints = ((ends_before start of a010_0 by [1000000, 1]) and
(ends_before start of a013_0 by [5, 42]));
};
activity a007_0_by_a010_0_a013_1 {
duration = 3;
constraints = ((ends_before start of a010_0 by [1000000, 1]) and
(ends_before start of a013_1 by [8, 31]));
};
activity a007_0_by_a010_1_a013_0 {
duration = 3;
constraints = ((ends_before start of a010_1 by [1000000, 29]) and
(ends_before start of a013_0 by [5, 42]));
};
activity a007_0_by_a010_1_a013_1 {
duration = 3;
constraints = ((ends_before start of a010_1 by [1000000, 29]) and
(ends_before start of a013_1 by [8, 31]));
};
activity a007_1 {
decompositions = a007_1_by_a010_0_a013_0 or a007_1_by_a010_0_a013_1 or
a007_1_by_a010_1_a013_0 or a007_1_by_a010_1_a013_1;
};
activity a007_1_by_a010_0_a013_0 {
duration = 3;
constraints = ((ends_before start of a010_0 by [1000000, 1]) and
(ends_before start of a013_0 by [1, 18]));
};
activity a007_1_by_a010_0_a013_1 {
duration = 3;
constraints = ((ends_before start of a010_0 by [1000000, 1]) and
(ends_before start of a013_1 by [6, 20]));
};
activity a007_1_by_a010_1_a013_0 {
duration = 3;
constraints = ((ends_before start of a010_1 by [1000000, 7]) and
(ends_before start of a013_0 by [1, 18]));
};
activity a007_1_by_a010_1_a013_1 {
duration = 3;
constraints = ((ends_before start of a010_1 by [1000000, 7]) and
(ends_before start of a013_1 by [6, 20]));
};
activity a008 {
decompositions = a008_0 or a008_1;
};
activity a008_0 {
decompositions = a008_0_by_a009_0_a016_0 or a008_0_by_a009_0_a016_1 or
a008_0_by_a009_1_a016_0 or a008_0_by_a009_1_a016_1;
};
activity a008_0_by_a009_0_a016_0 {
duration = 10;
constraints = ((ends_before start of a009_0 by [1000000, 29]) and
(ends_before start of a016_0 by [18, 1000000]));
};
activity a008_0_by_a009_0_a016_1 {
duration = 10;
constraints = ((ends_before start of a009_0 by [1000000, 29]) and
(ends_before start of a016_1 by [17, 1000000]));
};
activity a008_0_by_a009_1_a016_0 {
duration = 10;
constraints = ((ends_before start of a009_1 by [1000000, 4]) and
(ends_before start of a016_0 by [18, 1000000]));
};
activity a008_0_by_a009_1_a016_1 {
duration = 10;
constraints = ((ends_before start of a009_1 by [1000000, 4]) and
(ends_before start of a016_1 by [17, 1000000]));
};
activity a008_1 {
decompositions = a008_1_by_a009_0_a016_0 or a008_1_by_a009_0_a016_1 or
a008_1_by_a009_1_a016_0 or a008_1_by_a009_1_a016_1;
};
activity a008_1_by_a009_0_a016_0 {
duration = 9;
constraints = ((ends_before start of a009_0 by [1000000, 10]) and
(ends_before start of a016_0 by [7, 1000000]));
};
activity a008_1_by_a009_0_a016_1 {
duration = 9;
constraints = ((ends_before start of a009_0 by [1000000, 10]) and
(ends_before start of a016_1 by [11, 1000000]));
};
activity a008_1_by_a009_1_a016_0 {
duration = 9;
constraints = ((ends_before start of a009_1 by [1000000, 1]) and
(ends_before start of a016_0 by [7, 1000000]));
};
activity a008_1_by_a009_1_a016_1 {
duration = 9;
constraints = ((ends_before start of a009_1 by [1000000, 1]) and
(ends_before start of a016_1 by [11, 1000000]));
};
activity a009 {
decompositions = a009_0 or a009_1;
};
activity a009_0 {
decompositions = a009_0_by_a012_0_a013_0_a016_0 or a009_0_by_a012_0_a013_0_a016_1 or
a009_0_by_a012_0_a013_1_a016_0 or a009_0_by_a012_0_a013_1_a016_1 or
a009_0_by_a012_1_a013_0_a016_0 or a009_0_by_a012_1_a013_0_a016_1 or
a009_0_by_a012_1_a013_1_a016_0 or a009_0_by_a012_1_a013_1_a016_1;
};
activity a009_0_by_a012_0_a013_0_a016_0 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1000000, 4]) and
(ends_before start of a013_0 by [1000000, 21]) and
(ends_before start of a016_0 by [1000000, 43]));
};
activity a009_0_by_a012_0_a013_0_a016_1 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1000000, 4]) and
(ends_before start of a013_0 by [1000000, 21]) and
(ends_before start of a016_1 by [1000000, 39]));
};
activity a009_0_by_a012_0_a013_1_a016_0 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1000000, 4]) and
(ends_before start of a013_1 by [1000000, 22]) and
(ends_before start of a016_0 by [1000000, 43]));
};
activity a009_0_by_a012_0_a013_1_a016_1 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1000000, 4]) and
(ends_before start of a013_1 by [1000000, 22]) and
(ends_before start of a016_1 by [1000000, 39]));
};
activity a009_0_by_a012_1_a013_0_a016_0 {
duration = 10;
constraints = ((ends_before start of a012_1 by [1000000, 17]) and
(ends_before start of a013_0 by [1000000, 21]) and
(ends_before start of a016_0 by [1000000, 43]));
};
activity a009_0_by_a012_1_a013_0_a016_1 {
duration = 10;
constraints = ((ends_before start of a012_1 by [1000000, 17]) and
(ends_before start of a013_0 by [1000000, 21]) and
(ends_before start of a016_1 by [1000000, 39]));
};
activity a009_0_by_a012_1_a013_1_a016_0 {
duration = 10;
constraints = ((ends_before start of a012_1 by [1000000, 17]) and
(ends_before start of a013_1 by [1000000, 22]) and
(ends_before start of a016_0 by [1000000, 43]));
};
activity a009_0_by_a012_1_a013_1_a016_1 {
duration = 10;
constraints = ((ends_before start of a012_1 by [1000000, 17]) and
(ends_before start of a013_1 by [1000000, 22]) and
(ends_before start of a016_1 by [1000000, 39]));
};
activity a009_1 {
decompositions = a009_1_by_a012_0_a013_0_a016_0 or a009_1_by_a012_0_a013_0_a016_1 or
a009_1_by_a012_0_a013_1_a016_0 or a009_1_by_a012_0_a013_1_a016_1 or
a009_1_by_a012_1_a013_0_a016_0 or a009_1_by_a012_1_a013_0_a016_1 or
a009_1_by_a012_1_a013_1_a016_0 or a009_1_by_a012_1_a013_1_a016_1;
};
activity a009_1_by_a012_0_a013_0_a016_0 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1000000, 1]) and
(ends_before start of a013_0 by [1000000, 3]) and
(ends_before start of a016_0 by [1000000, 47]));
};
activity a009_1_by_a012_0_a013_0_a016_1 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1000000, 1]) and
(ends_before start of a013_0 by [1000000, 3]) and
(ends_before start of a016_1 by [1000000, 18]));
};
activity a009_1_by_a012_0_a013_1_a016_0 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1000000, 1]) and
(ends_before start of a013_1 by [1000000, 6]) and
(ends_before start of a016_0 by [1000000, 47]));
};
activity a009_1_by_a012_0_a013_1_a016_1 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1000000, 1]) and
(ends_before start of a013_1 by [1000000, 6]) and
(ends_before start of a016_1 by [1000000, 18]));
};
activity a009_1_by_a012_1_a013_0_a016_0 {
duration = 4;
constraints = ((ends_before start of a012_1 by [1000000, 16]) and
(ends_before start of a013_0 by [1000000, 3]) and
(ends_before start of a016_0 by [1000000, 47]));
};
activity a009_1_by_a012_1_a013_0_a016_1 {
duration = 4;
constraints = ((ends_before start of a012_1 by [1000000, 16]) and
(ends_before start of a013_0 by [1000000, 3]) and
(ends_before start of a016_1 by [1000000, 18]));
};
activity a009_1_by_a012_1_a013_1_a016_0 {
duration = 4;
constraints = ((ends_before start of a012_1 by [1000000, 16]) and
(ends_before start of a013_1 by [1000000, 6]) and
(ends_before start of a016_0 by [1000000, 47]));
};
activity a009_1_by_a012_1_a013_1_a016_1 {
duration = 4;
constraints = ((ends_before start of a012_1 by [1000000, 16]) and
(ends_before start of a013_1 by [1000000, 6]) and
(ends_before start of a016_1 by [1000000, 18]));
};
activity a010 {
decompositions = a010_0 or a010_1;
};
activity a010_0 {
duration = 2;
};
activity a010_1 {
duration = 10;
};
activity a011 {
decompositions = a011_0 or a011_1;
};
activity a011_0 {
decompositions = a011_0_by_a012_0_a018_0 or a011_0_by_a012_0_a018_1 or
a011_0_by_a012_1_a018_0 or a011_0_by_a012_1_a018_1;
};
activity a011_0_by_a012_0_a018_0 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1000000, 1]) and
(ends_before start of a018_0 by [9, 1000000]));
};
activity a011_0_by_a012_0_a018_1 {
duration = 4;
constraints = ((ends_before start of a012_0 by [1000000, 1]) and
(ends_before start of a018_1 by [6, 1000000]));
};
activity a011_0_by_a012_1_a018_0 {
duration = 4;
constraints = ((ends_before start of a012_1 by [1000000, 7]) and
(ends_before start of a018_0 by [9, 1000000]));
};
activity a011_0_by_a012_1_a018_1 {
duration = 4;
constraints = ((ends_before start of a012_1 by [1000000, 7]) and
(ends_before start of a018_1 by [6, 1000000]));
};
activity a011_1 {
decompositions = a011_1_by_a012_0_a018_0 or a011_1_by_a012_0_a018_1 or
a011_1_by_a012_1_a018_0 or a011_1_by_a012_1_a018_1;
};
activity a011_1_by_a012_0_a018_0 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1000000, 3]) and
(ends_before start of a018_0 by [24, 1000000]));
};
activity a011_1_by_a012_0_a018_1 {
duration = 10;
constraints = ((ends_before start of a012_0 by [1000000, 3]) and
(ends_before start of a018_1 by [25, 1000000]));
};
activity a011_1_by_a012_1_a018_0 {
duration = 10;
constraints = ((ends_before start of a012_1 by [1000000, 7]) and
(ends_before start of a018_0 by [39, 1000000]));
};
activity a011_1_by_a012_1_a018_1 {
duration = 10;
constraints = ((ends_before start of a012_1 by [1000000, 7]) and
(ends_before start of a018_1 by [25, 1000000]));
};
activity a012 {
decompositions = a012_0 or a012_1;
};
activity a012_0 {
decompositions = a012_0_by_a019_0 or a012_0_by_a019_1;
};
activity a012_0_by_a019_0 {
duration = 6;
constraints = ((ends_before start of a019_0 by [1000000, 16]));
};
activity a012_0_by_a019_1 {
duration = 6;
constraints = ((ends_before start of a019_1 by [1000000, 17]));
};
activity a012_1 {
decompositions = a012_1_by_a019_0 or a012_1_by_a019_1;
};
activity a012_1_by_a019_0 {
duration = 6;
constraints = ((ends_before start of a019_0 by [1000000, 27]));
};
activity a012_1_by_a019_1 {
duration = 6;
constraints = ((ends_before start of a019_1 by [1000000, 25]));
};
activity a013 {
decompositions = a013_0 or a013_1;
};
activity a013_0 {
duration = 7;
};
activity a013_1 {
duration = 10;
};
activity a014 {
decompositions = a014_0 or a014_1;
};
activity a014_0 {
duration = 4;
};
activity a014_1 {
duration = 4;
};
activity a015 {
decompositions = a015_0 or a015_1;
};
activity a015_0 {
duration = 8;
};
activity a015_1 {
duration = 1;
};
activity a016 {
decompositions = a016_0 or a016_1;
};
activity a016_0 {
duration = 7;
};
activity a016_1 {
duration = 2;
};
activity a017 {
decompositions = a017_0 or a017_1;
};
activity a017_0 {
duration = 9;
};
activity a017_1 {
duration = 10;
};
activity a018 {
decompositions = a018_0 or a018_1;
};
activity a018_0 {
duration = 5;
};
activity a018_1 {
duration = 2;
};
activity a019 {
decompositions = a019_0 or a019_1;
};
activity a019_0 {
duration = 5;
};
activity a019_1 {
duration = 6;
};
activity a020 {
decompositions = a020_0 or a020_1;
};
activity a020_0 {
duration = 10;
};
activity a020_1 {
duration = 8;
};
A.2 Sample C-TAEMS Problem
;;Scenario time bounds
(spec_boh 20)
(spec_eoh 115)
;;Agents
(spec_agent (label Miek))
(spec_agent (label Marc))
(spec_agent (label Christie))
(spec_agent (label Hector))
(spec_agent (label Moses))
(spec_agent (label Jesper))
(spec_agent (label Blayne))
(spec_agent (label Amy))
(spec_agent (label Les))
(spec_agent (label Kevan))
(spec_agent (label Starbuck))
(spec_agent (label Tandy))
(spec_agent (label Betty))
(spec_agent (label Ole))
(spec_agent (label Liyuan))
(spec_agent (label Naim))
(spec_agent (label Galen))
(spec_agent (label Sandy))
(spec_agent (label Kyu))
(spec_agent (label Cris))
;;Activities
(spec_task_group (label scenario1)
(subtasks problem1 problem2)
(qaf q_sum))
(spec_task (label problem1)
(subtasks gutturalwinroot songbirdwinroot hunkwinroot)
(qaf q_sum_and))
(spec_task (label gutturalwinroot)
(earliest_start_time 30) (deadline 61)
(subtasks averment exudate healthcare)
(qaf q_sum_and))
(spec_task (label averment)
(subtasks solve solveff1 solverd1)
(qaf q_max))
(spec_method (label solve) (agent Betty)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_method (label solveff1) (agent Betty)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_method (label solverd1) (agent Moses)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_task (label exudate)
(subtasks snarf snarfff1 snarfrd1)
(qaf q_max))
(spec_method (label snarf) (agent Moses)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 11 0.50 14 0.25 7 0.25))))
(spec_method (label snarfff1) (agent Moses)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 4 0.25 5 0.50 6 0.25))))
(spec_method (label snarfrd1) (agent Hector)
(outcomes
(default (density 0.800000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))
(failure (density 0.200000)
(quality_distribution 0.00 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_task (label healthcare)
(subtasks mousetrap mousetraprd1)
(qaf q_max))
(spec_method (label mousetrap) (agent Hector)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.50 11 0.25 6 0.25))))
(spec_method (label mousetraprd1) (agent Jesper)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_task (label songbirdwinroot)
(earliest_start_time 45) (deadline 55)
(subtasks owngoal tandoori)
(qaf q_sum))
(spec_task (label owngoal)
(subtasks scorn scornff1 scornrd1)
(qaf q_max))
(spec_method (label scorn) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_method (label scornff1) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 8 0.25 5 0.25 7 0.50))))
(spec_method (label scornrd1) (agent Cris)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
(spec_task (label tandoori)
(subtasks unstop unstoprd1)
(qaf q_max))
(spec_method (label unstop) (agent Marc)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.50 11 0.25 6 0.25))))
(spec_method (label unstoprd1) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
(spec_task (label hunkwinroot)
(earliest_start_time 50) (deadline 74)
(subtasks barmecide glacis cherry)
(qaf q_sum_and))
(spec_task (label barmecide)
(subtasks gutter gutterff1 gutterrd1)
(qaf q_max))
(spec_method (label gutter) (agent Kevan)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_method (label gutterff1) (agent Kevan)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 5 0.50 6 0.25))))
(spec_method (label gutterrd1) (agent Liyuan)
(outcomes
(default (density 1.000000)
(quality_distribution 8.75 1.0)
(duration_distribution 9 0.50 11 0.25 6 0.25))))
(spec_task (label glacis)
(subtasks purchase purchaseff1 purchaserd1)
(qaf q_max))
(spec_method (label purchase) (agent Liyuan)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_method (label purchaseff1) (agent Liyuan)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_method (label purchaserd1) (agent Kevan)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 9 0.50 11 0.25 6 0.25))))
(spec_task (label cherry)
(subtasks misreport misreportff1)
(qaf q_max))
(spec_method (label misreport) (agent Kevan)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 10 0.50 13 0.25 7 0.25))))
(spec_method (label misreportff1) (agent Kevan)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 9 0.50 10 0.25 7 0.25))))
(spec_task (label problem2)
(subtasks elevonwinroot pingowinroot extentwinroot)
(qaf q_sum_and))
(spec_task (label elevonwinroot)
(earliest_start_time 72) (deadline 84)
(subtasks racerunner washup reticle)
(qaf q_sum))
(spec_task (label racerunner)
(subtasks fissure fissureff1 fissurerd1)
(qaf q_max))
(spec_method (label fissure) (agent Ole)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_method (label fissureff1) (agent Ole)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 3 0.25 4 0.75))))
(spec_method (label fissurerd1) (agent Naim)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_task (label washup)
(subtasks tie tierd1)
(qaf q_max))
(spec_method (label tie) (agent Kyu)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 10 0.50 13 0.25 7 0.25))))
(spec_method (label tierd1) (agent Tandy)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_task (label reticle)
(subtasks stem stemff1 stemrd1)
(qaf q_max))
(spec_method (label stem) (agent Tandy)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_method (label stemff1) (agent Tandy)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 5 0.50 6 0.25))))
(spec_method (label stemrd1) (agent Betty)
(outcomes
(default (density 1.000000)
(quality_distribution 8.75 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_task (label pingowinroot)
(earliest_start_time 78) (deadline 90)
(subtasks bromicacid aquanaut mandinka)
(qaf q_sum))
(spec_task (label bromicacid)
(subtasks randomize randomizeff1)
(qaf q_max))
(spec_method (label randomize) (agent Naim)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 10 0.50 13 0.25 7 0.25))))
(spec_method (label randomizeff1) (agent Naim)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 8 0.50 9 0.25 6 0.25))))
(spec_task (label aquanaut)
(subtasks enervate enervateff1)
(qaf q_max))
(spec_method (label enervate) (agent Ole)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_method (label enervateff1) (agent Ole)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_task (label mandinka)
(subtasks entrust entrustff1)
(qaf q_max))
(spec_method (label entrust) (agent Kyu)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
(spec_method (label entrustff1) (agent Kyu)
(outcomes
(default (density 1.000000)
(quality_distribution 8.75 1.0)
(duration_distribution 4 0.25 5 0.50 6 0.25))))
(spec_task (label extentwinroot)
(earliest_start_time 84) (deadline 110)
(subtasks squashbugsyncpoint civilwarsyncpoint cellaragesyncpoint)
(qaf q_sum_and))
(spec_task (label squashbugsyncpoint)
(subtasks cloudscape salesman plainsman angina)
(qaf q_sync_sum))
(spec_task (label cloudscape)
(subtasks orient orientrd1)
(qaf q_max))
(spec_method (label orient) (agent Miek)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_method (label orientrd1) (agent Cris)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
(spec_task (label salesman)
(subtasks reface refaceff1)
(qaf q_max))
(spec_method (label reface) (agent Blayne)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 11 0.50 14 0.25 7 0.25))))
(spec_method (label refaceff1) (agent Blayne)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 9 0.50 10 0.25 7 0.25))))
(spec_task (label plainsman)
(subtasks diarize diarizerd1)
(qaf q_max))
(spec_method (label diarize) (agent Amy)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.50 11 0.25 6 0.25))))
(spec_method (label diarizerd1) (agent Blayne)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_task (label angina)
(subtasks search searchff1)
(qaf q_max))
(spec_method (label search) (agent Cris)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
(spec_method (label searchff1) (agent Cris)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 8 0.25 5 0.25 7 0.50))))
(spec_task (label civilwarsyncpoint)
(subtasks gyrocopter shoreline putoption denverboot)
(qaf q_sum))
(spec_task (label gyrocopter)
(subtasks intoxicate intoxicateff1 intoxicaterd1)
(qaf q_max))
(spec_method (label intoxicate) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.50 11 0.25 6 0.25))))
(spec_method (label intoxicateff1) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 4 0.75))))
(spec_method (label intoxicaterd1) (agent Cris)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
(spec_task (label shoreline)
(subtasks enwreathe enwreatherd1)
(qaf q_max))
(spec_method (label enwreathe) (agent Cris)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 10 0.50 13 0.25 7 0.25))))
(spec_method (label enwreatherd1) (agent Miek)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_task (label putoption)
(subtasks configure configureff1 configurerd1)
(qaf q_max))
(spec_method (label configure) (agent Marc)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_method (label configureff1) (agent Marc)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 4 0.25 5 0.50 6 0.25))))
(spec_method (label configurerd1) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_task (label denverboot)
(subtasks skitter skitterff1)
(qaf q_max))
(spec_method (label skitter) (agent Starbuck)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_method (label skitterff1) (agent Starbuck)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 4 0.75))))
(spec_task (label cellaragesyncpoint)
(subtasks bonfire mojarra skimmilk flapjack)
(qaf q_sum))
(spec_task (label bonfire)
(subtasks preprogram preprogramff1 preprogramrd1)
(qaf q_max))
(spec_method (label preprogram) (agent Galen)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_method (label preprogramff1) (agent Galen)
(outcomes
(default (density 1.000000)
(quality_distribution 8.75 1.0)
(duration_distribution 9 0.50 10 0.25 7 0.25))))
(spec_method (label preprogramrd1) (agent Starbuck)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_task (label mojarra)
(subtasks barf barfff1)
(qaf q_max))
(spec_method (label barf) (agent Starbuck)
(outcomes
(default (density 1.000000)
(quality_distribution 10.75 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_method (label barfff1) (agent Starbuck)
(outcomes
(default (density 1.000000)
(quality_distribution 7.75 1.0)
(duration_distribution 8 0.25 5 0.25 7 0.50))))
(spec_task (label skimmilk)
(subtasks conceive conceiveff1 conceiverd1)
(qaf q_max))
(spec_method (label conceive) (agent Marc)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 11 0.50 14 0.25 7 0.25))))
(spec_method (label conceiveff1) (agent Marc)
(outcomes
(default (density 1.000000)
(quality_distribution 8.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_method (label conceiverd1) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 8.75 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
(spec_task (label flapjack)
(subtasks import importrd1)
(qaf q_max))
(spec_method (label import) (agent Les)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 3 0.25 5 0.50 6 0.25))))
(spec_method (label importrd1) (agent Marc)
(outcomes
(default (density 1.000000)
(quality_distribution 10.00 1.0)
(duration_distribution 4 0.25 6 0.50 7 0.25))))
;;NLEs
(spec_hinders (label salesman_to_fissureff1)
(from salesman) (to fissureff1) (delay 1)
(quality_power 0.50) (duration_power 0.50))
(spec_enables (label gutturalwinroot_to_tandoori)
(from gutturalwinroot) (to tandoori) (delay 1))
(spec_enables (label gutterrd1_to_famous)
(from gutterrd1) (to famous) (delay 1))
(spec_enables (label scornrd1_to_gutterrd1)
(from scornrd1) (to gutterrd1) (delay 1))
(spec_disables (label racerunner_to_randomize)
(from racerunner) (to randomize) (delay 1))
(spec_facilitates (label barmecide_to_exudate)
(from barmecide) (to exudate) (delay 1)
(quality_power 0.50) (duration_power 0.50))
(spec_facilitates (label enwreatherd1_to_elevonwinroot)
(from enwreatherd1) (to elevonwinroot) (delay 1)
(quality_power 0.50) (duration_power 0.50))
(spec_enables (label elevonwinroot_to_entrust)
(from elevonwinroot) (to entrust) (delay 1))
(spec_enables (label tandoori_to_barmecide)
(from tandoori) (to barmecide) (delay 1))
(spec_facilitates (label gutturalwinroot_to_purchaseff1)
(from gutturalwinroot) (to purchaseff1) (delay 1)
(quality_power 0.50) (duration_power 0.50))
(spec_hinders (label enervate_to_diarize)
(from enervate) (to diarize) (delay 1)
(quality_power 0.50) (duration_power 0.50))
(spec_enables (label stemrd1_to_enervate)
(from stemrd1) (to enervate) (delay 1))
;;Initial Schedule
(spec_schedule
(schedule_elements
(mousetrap (start_time 30) (spec_attributes (performer "Hector") (duration 9)))
(snarf (start_time 30) (spec_attributes (performer "Moses") (duration 11)))
(solve (start_time 30) (spec_attributes (performer "Betty") (duration 7)))
(unstop (start_time 45) (spec_attributes (performer "Marc") (duration 9)))
(scorn (start_time 45) (spec_attributes (performer "Les") (duration 7)))
(scornrd1 (start_time 45) (spec_attributes (performer "Cris") (duration 8)))
(misreport (start_time 50) (spec_attributes (performer "Kevan") (duration 10)))
(gutterrd1 (start_time 55) (spec_attributes (performer "Liyuan") (duration 9)))
(gutter (start_time 60) (spec_attributes (performer "Kevan") (duration 5)))
(purchase (start_time 64) (spec_attributes (performer "Liyuan") (duration 6)))
(misreportff1 (start_time 65) (spec_attributes (performer "Kevan") (duration 9)))
(stemff1 (start_time 72) (spec_attributes (performer "Tandy") (duration 5)))
(stemrd1 (start_time 72) (spec_attributes (performer "Betty") (duration 7)))
(fissurerd1 (start_time 72) (spec_attributes (performer "Naim") (duration 6)))
(tie (start_time 72) (spec_attributes (performer "Kyu") (duration 10)))
(randomize (start_time 78) (spec_attributes (performer "Naim") (duration 10)))
(enervate (start_time 80) (spec_attributes (performer "Ole") (duration 7)))
(entrust (start_time 82) (spec_attributes (performer "Kyu") (duration 8)))
(enwreatherd1 (start_time 84) (spec_attributes (performer "Miek") (duration 6)))
(importrd1 (start_time 84) (spec_attributes (performer "Marc") (duration 6)))
(intoxicate (start_time 84) (spec_attributes (performer "Les") (duration 9)))
(barf (start_time 84) (spec_attributes (performer "Starbuck") (duration 5)))
(preprogram (start_time 84) (spec_attributes (performer "Galen") (duration 5)))
(preprogramrd1 (start_time 89) (spec_attributes (performer "Starbuck") (duration 6)))
(orient (start_time 90) (spec_attributes (performer "Miek") (duration 6)))
(conceive (start_time 90) (spec_attributes (performer "Marc") (duration 11)))
(refaceff1 (start_time 90) (spec_attributes (performer "Blayne") (duration 9)))
(diarize (start_time 90) (spec_attributes (performer "Amy") (duration 6)))
(search (start_time 90) (spec_attributes (performer "Cris") (duration 8)))
(skitterff1 (start_time 95) (spec_attributes (performer "Starbuck") (duration 4)))
(configure (start_time 101) (spec_attributes (performer "Marc") (duration 6))))
(spec_attributes (Quality 421.35208)))
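The distributions in the listing above can be read mechanically: each duration_distribution and quality_distribution is a flat list of alternating value/probability pairs. As an illustration (the helper function and the trimmed two-method excerpt below are ours, not part of the problem file), the following Python sketch extracts each method's duration_distribution and computes its expected duration:

```python
import re

# A two-method excerpt of the sample problem above, trimmed for illustration.
SPEC = """
(spec_method (label enervate) (agent Ole)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 9 0.25 4 0.25 7 0.50))))
(spec_method (label entrust) (agent Kyu)
(outcomes
(default (density 1.000000)
(quality_distribution 11.75 1.0)
(duration_distribution 8 0.50 10 0.25 5 0.25))))
"""

def expected_durations(spec_text):
    """Return {method label: expected duration}, reading each
    duration_distribution as alternating value/probability pairs."""
    results = {}
    # For each spec_method block, capture its label and the contents of
    # its duration_distribution list.
    pattern = re.compile(
        r"\(spec_method \(label (\w+)\).*?"
        r"\(duration_distribution ([\d. ]+)\)", re.DOTALL)
    for label, dist in pattern.findall(spec_text):
        nums = [float(x) for x in dist.split()]
        values, probs = nums[0::2], nums[1::2]
        assert abs(sum(probs) - 1.0) < 1e-9  # probabilities must sum to 1
        results[label] = sum(v * p for v, p in zip(values, probs))
    return results

print(expected_durations(SPEC))
# {'enervate': 6.75, 'entrust': 7.75}
```

Note that the initial schedule above uses the single most likely duration for each method (e.g., 7 for enervate), not the expectation, which is one source of the deterministic modeling error that PADS reasons about.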
Abstract

The general problem of planning for uncertain domains remains a difficult challenge. Research that focuses on constructing plans by reasoning with explicit models of uncertainty has produced some promising mechanisms for coping with specific types of domain uncertainties; however, these approaches generally have difficulty scaling. Research in robust planning has alternatively emphasized the use of deterministic planning techniques, with the goal of constructing a flexible plan (or set of plans) that can absorb deviations during execution. Such approaches are scalable, but either result in overly conservative plans, or ignore the potential leverage that can be provided by explicit uncertainty models.

The main contribution of this work is a composite approach to planning that couples the strengths of both the above approaches while minimizing their weaknesses. Our approach, called Probabilistic Plan Management (PPM), takes advantage of the known uncertainty model while avoiding the overhead of non-deterministic planning. PPM takes as its starting point a deterministic plan that is built with deterministic modeling assumptions. PPM begins by layering an uncertainty analysis on top of the plan. The analysis calculates the overall expected outcome of execution and can be used to identify expected weak areas of the schedule.

PPM uses the analysis in two main ways to maximize the utility of and manage execution. First, it makes deterministic plans more robust by minimizing the negative impact that unexpected or undesirable contingencies can have on plan utility. PPM strengthens the current schedule by fortifying the areas of the plan identified as weak by the probabilistic analysis, increasing the likelihood that they will succeed. In experiments, probabilistic schedule strengthening is able to significantly increase the utility of execution while introducing only a modest overhead.

Second, PPM reduces the amount of replanning that occurs during execution via a probabilistic meta-level control algorithm. It uses the probability analysis as a basis for identifying cases where replanning probably is (or is not) necessary, and acts accordingly. This addresses the tradeoff of too much replanning, which can lead to the overuse of computational resources and lack of responsiveness, versus too little, which can lead to undesirable errors or missed opportunities during execution. Experiments show that probabilistic meta-level control is able to considerably decrease the amount of time spent managing plan execution, without affecting how much utility is earned. In these ways, our approach effectively manages the execution of deterministic plans for uncertain domains both by producing effective plans in a scalable way, and by intelligently controlling the resources that are used to maintain these high-utility plans during execution.
Acknowledgments

I would first like to thank my advisor, Reid Simmons, for being such a great mentor through this process. This would not have been possible without all of his support, questions, and encouragement. Thanks also to him and the rest of the Trestle/IDSR group – Fred Heger, Seth Koterba, Anthony Gallagher, Nik Melchior, Matt Danish, Brad Hamner and Greg Armstrong – for all the fun times in the highbay, even if sometimes the "fun" was in a painful way.

Thanks to Dave Musliner and Manuela Veloso for being part of my committee, and for all of their good suggestions and guidance. Thanks also to Steve Smith, for letting me join his project for part of this thesis; I really enjoyed the opportunity to apply this work to a real domain. Along those lines, thanks to Zack Rubinstein, Laura Barbulescu, and the rest of the ICLL group for the many hours discussing the intricacies of C TAEMS and the Coordinators project. Terry Zimmerman gets a special thanks for suggesting the "PADS" acronym, saving everyone from days of repeating "probabilistic analysis of deterministic schedules" over and over again. Thanks to Brennan Sellner for lending me his code for and helping me with ASPEN, and for sharing his ASPEN contacts, Charlie Collins and Dave Crimm, with me to provide additional support.

I would like to thank my mother and my sister for not giving me too much trouble about never having a "real job," and for consistently expressing (or, at least, pretending to express) interest in what I've been up to these many years. I would also like to firmly point out to them, this one last time, that my robots are not taking over the world.

Finally, I want to thank my husband, Stephen, for, well, everything, and I cannot thank him enough.
Contents

1 Introduction
  1.1 Probabilistic Analysis of Deterministic Schedules
  1.2 Probabilistic Schedule Strengthening
  1.3 Probabilistic Meta-Level Control
  1.4 Thesis Statement
  1.5 Thesis Outline

2 Related Work
  2.1 Non-Deterministic Planning
    2.1.1 Policy-Based Planning
    2.1.2 Search-Based Contingency Planning
    2.1.3 Conformant Planning
  2.2 Deterministic Planning
    2.2.1 Plan Flexibility and Robustness
    2.2.2 Plan Management and Repair
  2.3 Meta-Level Control

3 Background
  3.1 Domains
    3.1.1 Single-Agent Domain
    3.1.2 Multi-Agent Domain
  3.2 Domain and Schedule Representation
    3.2.1 Plan Versus Schedule Terminology
    3.2.2 Flexible Times Schedules
    3.2.3 Activity Networks
  3.3 Decision Trees with Boosting
  3.4 Domain Suitability

4 Approach
  4.1 Probabilistic Analysis of Deterministic Schedules
  4.2 Probabilistic Schedule Strengthening
  4.3 Probabilistic Meta-Level Control

5 Probabilistic Analysis of Deterministic Schedules
  5.1 Overview
  5.2 Single-Agent PADS
    5.2.1 Probability Profiles
    5.2.2 Activity Dependencies
  5.3 Multi-Agent PADS
    5.3.1 Probability Profiles
    5.3.2 Activity Dependencies
    5.3.3 Global Algorithm
    5.3.4 Distributed Algorithm
  5.4 Experiments and Results
    5.4.1 Single-Agent Domain
    5.4.2 Multi-Agent Domain
  5.5 Practical Considerations

6 Probabilistic Schedule Strengthening
  6.1 Overview
    6.1.1 Single-Agent Domain
    6.1.2 Multi-Agent Domain
  6.2 Tactics
    6.2.1 Backfilling
    6.2.2 Swapping
    6.2.3 Pruning
  6.3 Strategies
  6.4 Experimental Results
    6.4.1 Comparison of Strengthening Strategies
    6.4.2 Effects of Targeting Strengthening Actions
    6.4.3 Effects of Global Strengthening
    6.4.4 Effects of Run-Time Distributed Schedule Strengthening

7 Probabilistic Meta-Level Control
  7.1 Overview
  7.2 Baseline Plan Management
    7.2.1 Single-Agent Domain
    7.2.2 Multi-Agent Domain
  7.3 Probabilistic Meta-Level Control
    7.3.1 Uncertainty Horizon
    7.3.2 Likely Replanning
    7.3.3 Probabilistic Plan Management
  7.4 Learning When to Replan
    7.4.1 Learning Probabilistic Meta-Level Control
    7.4.2 Thresholding Meta-Level Control
    7.4.3 Experiments and Results
    7.4.4 Probabilistic Plan Management

8 Future Work and Conclusions
  8.1 Future Work
    8.1.1 Further Integration with Planning
    8.1.2 Demonstration on Other Domains / Planners
    8.1.3 Non-Cooperative Multi-Agent Scenarios
    8.1.4 Human-Robot Teams
    8.1.5 Meta-Level Control Extensions
  8.2 Conclusions

9 Bibliography

A Sample Problems
  A.1 Sample MRCPSP/max problem in ASPEN syntax
  A.2 Sample C TAEMS Problem
List of Figures

3.1 A sample STN for a single scheduled method m with duration 5 and a deadline of 10. Notice that the method is constrained to start no earlier than time 0 (via the constraint between time 0 and m_start) and finish by time 10 (via the constraint between time 0 and m_finish).
3.2 A sample MRCPSP/max problem.
3.3 A simple, deterministic 3-agent problem.
3.4 A small 3-agent problem.
3.5 The limited local view for agent Buster of Figure 3.4.
3.6 The activity dependencies (denoted by solid arrows) and propagation order (denoted by circled numbers) that result from the schedule shown for the problem in Figure 3.5.
3.7 An example decision tree built from a data set. The tree classifies 11 examples, giving state information about the time of day and how much rain there has been in the past 10 minutes, as having or not having traffic at that time. Each node of the tree shows the attribute and threshold for that branch, as well as how many instances of each class are present at that node. Instances that are not underlined refer to traffic being present; underlined instances refer to traffic not being present.
3.8 A graph of a normal distribution.
4.1 Flow diagram for PPM.
5.1 A simple 3-agent problem and a corresponding schedule.
5.2 The same propagation as Figure 5.1, but done distributively. Jeremy first calculates the profiles for its activities, m1 and T1, and passes the profile of T1 to Buster. Buster can then finish the profile calculation for the rest of the activity network.
5.3 Time spent calculating 20 schedules' initial expected utilities.
5.4 Time spent distributively calculating 20 schedules' initial expected utilities, shown as both the average time per agent and total time across all agents.
5.5 The accuracy of global PADS: the calculated expected utility compared to the average utility when executed, with standard deviation shown.
6.1 The baseline strengthening strategy explores the full search space of the different orderings of backfilling, swapping and pruning steps that can be taken to strengthen a schedule.
6.2 The greedy strengthening strategy repeatedly performs one trial backfill, one trial swap and one trial prune, and commits at each point to the action that raises the expected utility of the joint schedule the most, repeating until no more strengthening actions are possible.
6.3 The round-robin strengthening strategy performs one backfill, one swap and one prune, repeating until no more strengthening actions are possible.
6.4 Utility earned by different backfilling conditions, expressed as the average proportion of the originally scheduled utility earned.
6.5 The percent increase of the number of methods in the initial schedule after backfilling for each mode.
6.6 Utility for each problem earned by different strengthening strategies, expressed as the average proportion of the originally scheduled utility earned.
6.7 Utility for each problem for the different online replanning conditions, expressed as the average proportion of omniscient utility earned.
6.8 Utility earned across various conditions.
7.1 Uncertainty horizon results, expressed as the average proportion of omniscient utility earned.
7.2 Likely replanning results, expressed as the average proportion of omniscient utility earned.
7.3 Average execution time across all three problems using PPM with both the uncertainty horizon and likely replanning at various thresholds.
7.4 Average utility across all three problems using PPM with both the uncertainty horizon and likely replanning at various thresholds.
7.5 Average proportion of omniscient utility earned for various probabilistic meta-level control strategies.
7.6 Average proportion of omniscient utility earned for various probabilistic meta-level control strategies.
7.7 Average proportion of omniscient utility earned for various plan management approaches.
7.8 Average utility earned for various plan management approaches.
List of Tables

3.1 A summary of utility accumulation function (uaf) types.
3.2 A summary of non-local effect (NLE) types.
5.1 Average time spent by PADS calculating and updating probability profiles while executing schedules in the single-agent domain.
5.2 Characteristics of the 20 generated test suite problems.
6.1 A comparison of strengthening strategies.
7.1 Characteristics of the 20 DARPA Coordinators test suite problems.
7.2 Average amount of effort spent managing plans under various probabilistic meta-level control strategies.
7.3 Average amount of effort spent managing plans under various probabilistic meta-level control strategies using strengthening.
7.4 Average amount of effort spent managing plans under various probabilistic plan management approaches.
7.5 Average size of communication log file.
List of Algorithms

5.1 Taskgroup expected utility calculation for the single-agent domain with normal duration distributions and temporal constraints.
5.2 Method probability profile calculation.
5.3 Method start time distribution calculation with discrete distributions, temporal constraints and hard NLEs.
5.4 Method finish time distribution calculation with discrete distributions.
5.5 Method soft NLE duration and utility effect distribution calculation with discrete distributions, temporal constraints and soft NLEs.
5.6 Method expected utility calculation with discrete distributions and soft NLEs.
5.7 Method probability of earning utility calculation with utility formulas.
5.8 Activity utility formula evaluation.
5.9 Task probability profile calculation.
5.10 OR task finish time distribution calculation with discrete distributions and uafs.
5.11 OR task expected utility calculation with discrete distributions and uafs.
5.12 AND task finish time distribution calculation with discrete distributions and uafs.
5.13 AND task expected utility calculation with discrete distributions and uafs.
5.14 Distributed taskgroup expected utility approximation with enablers and uafs.
6.1 Global backfilling step utilizing max-leaf tasks.
6.2 Distributed backfilling step utilizing max-leaf tasks.
6.3 Global swapping tactic for a schedule with failure likelihoods and NLEs.
6.4 Global pruning step.
6.5 Global baseline strengthening strategy utilizing backfills, swaps and prunes.
6.6 Global greedy strengthening strategy utilizing backfills, swaps and prunes.
6.7 Global round-robin strengthening strategy utilizing backfills, swaps and prunes.
7.1 Probabilistic plan management.
7.2 Uncertainty horizon calculation for a schedule with normal duration distributions.
7.3 Taskgroup expected utility calculation, within an uncertainty horizon, for the single-agent domain with normal duration distributions and temporal constraints.
in contrast.. operate autonomously in realworld situations. [Morris et al. Beck and Wilson. For example. multiagent execution. 2007]).. However. These approaches are much more scalable than probabilistic planning or scheduling.Chapter 1 Introduction We are motivated by the highlevel goal of building intelligent robotic systems in which robots. 2008].g. Research that focuses on constructing plans (or policies) by reasoning with explicit models of uncertainty has produced some promising results (e.. 2007]. Not surprisingly. 2001]) or ignore the potential leverage that can be provided by an explicit uncertainty model. with the goal of constructing a ﬂexible plan (or set of plans) that can absorb deviations at execution time [Policella et al. has emphasized the use of expected models and deterministic planning techniques. to deciding which activities to execute given unforeseen circumstances.. in complex domains these approaches can quickly become prohibitively expensive. or teams of robots and humans.. Many domains of this type are further complicated by the presence of durative actions and concurrent. the deterministic assumptions unavoidably can create problems to which 13 . Research in robust planning. Such approaches typically produce optimal or expected optimal solutions that encode a set of plans that cover many or all of the contingencies of execution.g.. executing and managing robust plans for agents operating in such situations. from sending commands to the controller to navigate over unpredictable terrain. [McKay et al. 2000. even though such systems typically replan during execution. making them undesirable or impractical for realworld use [Bryce et al. This thesis focuses on the highlevel aspect of the problem: generating. but either result in overly conservative schedules at the expense of performance (e. the general problem of planning and scheduling for uncertain domains remains a difﬁcult challenge. 
The rich sources of uncertainty common in these scenarios pose challenges for the various aspects of execution.
Deterministic planning approaches typically simplify away this uncertainty, such as by assuming deterministic durations for activities or by ignoring unlikely outcomes or effects of activity execution. Replanning very frequently can help to alleviate the number of problems that occur as a result, but a large amount of online replanning has the potential to lead to a lack of responsiveness and an overuse of computational resources during execution. Further, the deterministic assumptions can create problems to which replanning cannot adequately respond, such as a missed deadline that cannot be repaired.

The main contribution of this work is adopting a composite approach to planning and scheduling that couples the strengths of both deterministic and nondeterministic planning while minimizing their weaknesses. Our approach, called Probabilistic Plan Management (PPM), is a natural synthesis of both of the above research streams that takes advantage of the known uncertainty model while avoiding the large computational overhead of nondeterministic planning.

As a starting point, we assume a deterministic plan, or schedule, that is built to maximize utility under deterministic modeling assumptions. Our approach first layers an uncertainty analysis, called a probabilistic analysis of deterministic schedules (PADS), on top of the deterministic schedule. The analysis calculates the overall expected outcome of execution and can be used to identify expected weak areas of the schedule. We use this analysis in two main ways to manage and bolster the utility of execution. Probabilistic schedule strengthening strengthens the current schedule by targeting local improvements at identified likely weak points. Such improvements have inherent tradeoffs associated with them; these tradeoffs are explicitly addressed by PADS, which ensures that schedule strengthening efforts increase the schedule's expected utility. Probabilistic meta-level control uses the expected outcome of the plan to intelligently control the amount of replanning that occurs. During execution, PADS tracks and updates the probabilistic state of the schedule. Based on this, PPM makes the decision of when replanning and probabilistic schedule strengthening is necessary, and replans and strengthens at the identified times, explicitly addressing the tradeoff of too much replanning, which can lead to the overuse of computational resources and lack of responsiveness, versus too little, which can lead to undesirable errors or missed opportunities during execution.

Probabilistic plan management combines these different components to effectively manage the execution of plans for uncertain domains, maintaining high-utility schedules while also limiting unnecessary replanning. In the following sections, we introduce the concepts of PADS, probabilistic schedule strengthening and probabilistic meta-level control, and our basic approaches to them.
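The management flow described above can be illustrated with a small, self-contained sketch. Everything here (the toy schedule model, the probability-boosting "strengthening action", the threshold values) is an invented assumption for illustration only, not the thesis implementation:

```python
import math

def pads_analyze(schedule):
    """Toy 'PADS': overall success probability, assuming independent activities."""
    overall = math.prod(a["p_success"] for a in schedule)
    weak = [a["name"] for a in schedule if a["p_success"] < 0.8]
    return overall, weak

def strengthen(schedule, weak_names):
    """Toy strengthening action: boost the weak activities' success probability."""
    for a in schedule:
        if a["name"] in weak_names:
            a["p_success"] = min(1.0, a["p_success"] + 0.15)
    return schedule

def manage(schedule, threshold=0.75):
    """Strengthen only when the analysis says the schedule is likely to fail."""
    overall, weak = pads_analyze(schedule)
    if overall < threshold and weak:
        schedule = strengthen(schedule, weak)
        overall, _ = pads_analyze(schedule)
    return overall

schedule = [{"name": "drive", "p_success": 0.95},
            {"name": "rescue", "p_success": 0.70}]
print(round(manage(schedule), 4))
```

The point of the sketch is the control structure: analysis first, and costly plan improvement only when the analysis indicates it is likely to pay off.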
1.1 Probabilistic Analysis of Deterministic Schedules

Central to our approach is the idea of leveraging domain uncertainty models to provide information that can help with more effective execution-time management of deterministic schedules. We have developed an algorithm that performs a Probabilistic Analysis of Deterministic Schedules (PADS). At the highest level of abstraction, the algorithm explores all known contingencies within the context of the current schedule in order to find the probability that execution "succeeds" by meeting its objective. It calculates, for each activity, the probability of the activity meeting its objective, as well as its expected contribution to the schedule. By explicitly calculating these values, PADS is able to summarize the expected outcome of execution, as well as identify portions of the schedule where something is likely to go wrong. PADS can be used as the basis for many planning decisions. In this thesis, we show how its low computational overhead makes it a useful analysis for even complex, multi-agent domains. We also discuss how it can be used to strengthen plans to increase the expected utility of execution, and how it can be used as the basis for different meta-level control strategies that control the computational resources used to replan during execution.

1.2 Probabilistic Schedule Strengthening

While deterministic planners have the benefit of being more scalable than probabilistic planners, they make strong, and often inaccurate, assumptions on how execution will unfold in uncertain domains such as the ones we consider. In the case of domains with temporal uncertainty (e.g., uncertain activity durations), deterministic modeling assumptions have the potential to lead to frequent constraint violations when activities end at unanticipated times and cause inconsistencies in the schedule. Replanning during execution can effectively respond to such unexpected dynamics, but only as long as dependencies allow sufficient time for one or more agent planners to respond to problems that occur. Further, even if the agent is able to sufficiently respond to a problem, utility loss may occur due to the last-minute nature of the schedule repair. One way to prevent this situation is to layer a step on top of the planning process that proactively tries to prevent these problems from occurring, or to minimize their effects, thereby strengthening schedules against the uncertainty of execution [Gallagher, 2009]. Schedules are strengthened via strengthening actions, which make small modifications to the schedule to make it more robust, given their associated uncertainty model. The use of such improvements, however, typically has an associated tradeoff. Example strengthening actions, and their tradeoffs, are:
1. Adding redundant backup activities - This provides protection against activity failure, but can clog schedules and reduce slack. In a search and rescue scenario, for example, trying to reach survivors via two rescue points, instead of one, may increase the likelihood of a successful rescue, but, as a result, the agents at the added rescue point may not have enough time to execute the rest of their schedule.

2. Reordering scheduled activities - This may or may not help, depending on which activities have tight deadlines and/or constraints with other activities. Deciding to take a patient to the hospital before refueling, for example, may help the patient, but the driver then has a higher risk of running out of fuel and failing to perform other activities.

3. Swapping activities for shorter-duration alternatives - This provides protection against an activity missing a deadline, but shorter-duration alternatives tend to be less effective or have a lower utility. For example, restoring power to only critical buildings instead of the entire area may allow power to be restored more quickly, but it is less desirable as other buildings go without power.

The set of strengthening actions used can be adapted to suit a variety of domains, depending on what is most appropriate for the domain's structure. As described, all such improvements have an associated tradeoff. In deterministic systems, schedule strengthening can be performed by using simple deterministic heuristics, or limited stochastic analyses of the current schedule (e.g., considering the uncertainty of the duration of an activity without any propagation) [Gallagher, 2009]. In this thesis, however, we argue that explicitly reasoning about the uncertainty associated with scheduled activities provides a more effective basis for strengthening schedules. Probabilistic schedule strengthening first relies on PADS to identify the most profitable activities to strengthen: improvements are targeted at activities that have a high likelihood of not doing their part to contribute to the schedule's success, and whose failure would have the largest negative impact. Then, PADS ensures that all strengthening actions have a positive expected benefit and improve the expected outcome of execution.

Ideally, schedule strengthening should be used in conjunction with replanning during execution, allowing agents to take advantage of opportunities that may arise while also ensuring that they always have schedules that are robust to unexpected or undesirable outcomes. We have evaluated the effect of probabilistic schedule strengthening in a multi-agent simulation environment, where the objective is to maximize utility earned during execution and there is temporal and activity outcome uncertainty. Probabilistic schedule strengthening produced schedules that are significantly more robust and earn 32% more utility than their deterministic counterparts when executed. When it is used in conjunction with replanning during execution, it results in 14% more utility earned as compared to replanning alone during execution.
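The "positive expected benefit" test described above can be sketched in a few lines. The candidate actions, probabilities, and utilities below are invented toy data; the expected-utility model (probability times utility) is a simplifying assumption made only for this example:

```python
def expected_utility(p, u):
    """Expected utility of an activity: success probability times utility."""
    return p * u

def choose_actions(candidates):
    """Keep only strengthening actions whose expected benefit is positive."""
    chosen = []
    for c in candidates:
        gain = (expected_utility(c["p_after"], c["u_after"])
                - expected_utility(c["p_before"], c["u_before"]))
        if gain > 0:
            chosen.append(c["action"])
    return chosen

candidates = [
    # a redundant backup raises success odds but crowds the schedule slightly
    {"action": "add_backup", "p_before": 0.6, "u_before": 10.0,
     "p_after": 0.9, "u_after": 9.0},
    # a shorter alternative avoids a deadline but earns much less utility
    {"action": "swap_shorter", "p_before": 0.7, "u_before": 10.0,
     "p_after": 0.95, "u_after": 6.0},
]
print(choose_actions(candidates))
```

Here only the backup action survives the filter: its expected utility rises, while the shorter alternative's expected utility falls despite its higher success probability, mirroring the tradeoffs listed above.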
1.3 Probabilistic Meta-Level Control

Replanning provides an effective way of responding to the dynamics of execution: it helps agents avoid failure or utility loss resulting from unpredicted and undesirable outcomes, and takes advantage of opportunities that may arise from unpredicted favorable outcomes. Replanning too much during execution, however, has the potential to become counterproductive by leading to a lack of responsiveness and a waste of computational resources. For example, always replanning the entire schedule means that activities farther down the schedule are potentially replanned many times before they are actually executed, possibly wasting computational effort. Similarly, in situations with high uncertainty, the benefit of planning actions such as probabilistic schedule strengthening can be greatly affected by the robustness of the current schedule.

Ideally, agents would replan if it would be useful either in avoiding failure or in taking advantage of new opportunities. Similarly, they ideally would not replan if it would not be useful, either because: (1) there is no potential failure to avoid; (2) there is a possible failure but replanning would not fix it; (3) there is no new opportunity to leverage; or (4) replanning would not be able to exploit the opportunities that were there. Many of these situations are difficult to explicitly predict, in large part because it is difficult to predict whether a replan is beneficial without actually doing it. Meta-level control is one technique commonly used to address this tradeoff. Meta-level control allows agents to optimize their performance by selectively choosing deliberative actions to perform during execution. In deterministic systems, meta-level control can be done by learning a model for deliberative action selection based on deterministic, abstract features of the current schedule [Raja and Lesser, 2007]. Deterministic approaches such as this one, however, ignore the benefit that explicitly considering the uncertainty model can provide; whether replanning would result in a better schedule can depend, in part, on the probability of the current schedule meeting its objective.

As an alternative, probabilistic meta-level control uses the probabilistic analysis to identify times where there is potential for replanning to be useful, either because there is likely a new opportunity to leverage or a possible failure to avoid. When such times are identified, replanning is possibly useful and so should be performed; otherwise, no replanning is done. In this thesis, we show that a probabilistic meta-level control algorithm, based on probability profile information obtained by the PADS algorithm, is an effective way of selecting when to replan (or probabilistically strengthen) during execution. The control strategy can also be used to select the amount of the current schedule that is replanned: to avoid repeatedly replanning activities far in the future before they are executed, we introduce an uncertainty horizon.
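The replan/no-replan decision rule described above can be sketched as a simple predicate. The threshold and the two trigger signals are illustrative assumptions, not the thesis algorithm, which derives its triggers from the full PADS probability profiles:

```python
def should_replan(p_success, p_success_baseline, new_opportunity,
                  failure_threshold=0.1):
    """Replan when the schedule's success probability has dropped notably
    (a likely failure to avoid) or when an unexpectedly good outcome has
    opened up a new opportunity to exploit."""
    likely_failure = (p_success_baseline - p_success) > failure_threshold
    return likely_failure or new_opportunity

# No large drop and no opportunity: skip the replan.
print(should_replan(0.88, 0.90, new_opportunity=False))
# Success probability fell sharply: replanning is possibly useful.
print(should_replan(0.60, 0.90, new_opportunity=False))
```

In both failure cases the rule errs on the side of cheap checks: the expensive deliberative action is only taken when the probabilistic state signals it may pay off.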
The uncertainty horizon distinguishes between portions of the schedule that are likely to need to be managed and replanned at the current time, and portions that are not. This distinction is made based on the probability that activities beyond the uncertainty horizon will need to be replanned before execution reaches them.

We have shown, through experimentation, the effectiveness of probabilistic meta-level control. Using probabilistic meta-level control in a multi-agent scenario with temporal and activity outcome uncertainty results in a 49% decrease in plan management time as compared to a plan management strategy that replans according to deterministic heuristics, with no significant change in utility earned. Its use in a single-agent scenario with temporal uncertainty has shown a 23% reduction in schedule execution time (where execution time is defined as the schedule makespan plus the time spent replanning during execution) versus replanning whenever a schedule conflict arises.

1.4 Thesis Statement

Explicitly analyzing the probabilistic state of deterministic plans built for uncertain domains provides important information that can be leveraged in a scalable way to effectively manage plan execution. We support this statement by presenting algorithms for probabilistic plan management that probabilistically analyze schedules, strengthen schedules based on this analysis, and limit the amount of computational effort spent replanning and strengthening schedules during execution. Our approach was validated through a series of experiments in simulation for two different domains. For one domain, probabilistic plan management resulted in increasing utility earned by 14% with only a small added overhead. For another, using probabilistic plan management was able to reduce schedule execution time by 23%. Overall, PPM provides some of the benefits of nondeterministic planning while avoiding its computational overhead.

1.5 Thesis Outline

The rest of this document is organized as follows. In Chapter 2, we discuss related work, describing various approaches to the problem of planning and scheduling for uncertain domains, as well as different methodologies for meta-level control. Next, in Chapter 3, we discuss background concepts related to our work, including details on the two domains we consider. Chapter 4 provides an overview of our approach to probabilistic plan management. The following three chapters go into more detail about the three main components of our approach.
They cover, in turn, probabilistic analyses of deterministic schedules, probabilistic schedule strengthening, and probabilistic meta-level control. Finally, in Chapter 8, we discuss future work and conclusions.
Chapter 2

Related Work

There are many different approaches that handle planning and execution for uncertain domains. We present some of these approaches, categorized by whether uncertainty is taken into account in the planning process or planning is done deterministically. Approaches can be distinguished by what restrictions they place on the domains they can handle, such as whether they consider durative actions and temporal uncertainty, and whether they account for concurrency and multiple agents. We also discuss various approaches to the meta-level control of computation during execution, including work utilizing planning horizons.

2.1 Non-Deterministic Planning

The main distinction between planners that deal with uncertainty is whether they assume agents can observe the world during execution. Planning approaches that assume the world is not observable are commonly referred to as conformant planners. Of those that do assume observability, there are policy-based approaches based on the basic Markov Decision Process (MDP) model [Howard, 1960] and more search-based approaches such as contingency planners. Some MDP models also assume that the world is only partially observable.

2.1.1 Policy-Based Planning

One common policy-based planner is based on an MDP model [Howard, 1960]. The classic MDP model has four components: a collection of states; a collection of actions; a transition model (i.e., which states actions transition between, and the probability of the transition happening); and the rewards associated with each transition (or state).
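A tiny, self-contained instance of this four-component model, solved by standard value iteration, might look as follows. The two-state domain, rewards, and discount factor are invented purely for illustration:

```python
states = ["s0", "s1"]
actions = ["go", "stay"]
# transitions[(state, action)] = list of (next_state, probability)
transitions = {
    ("s0", "go"): [("s1", 0.8), ("s0", 0.2)],
    ("s0", "stay"): [("s0", 1.0)],
    ("s1", "go"): [("s1", 1.0)],
    ("s1", "stay"): [("s1", 1.0)],
}
# rewards associated with each (state, action) transition
rewards = {("s0", "go"): 1.0, ("s0", "stay"): 0.0,
           ("s1", "go"): 2.0, ("s1", "stay"): 2.0}

def value_iteration(gamma=0.9, eps=1e-6):
    """Iterate the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in states}
    while True:
        V_new = {}
        for s in states:
            V_new[s] = max(
                rewards[(s, a)]
                + gamma * sum(p * V[s2] for s2, p in transitions[(s, a)])
                for a in actions)
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new

V = value_iteration()
print(round(V["s1"], 2))
```

Solving the MDP yields the optimal value of each state, from which the optimal policy (the maximizing action at each state) can be read off directly.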
The MDP can be solved to find the optimal policy that maximizes reward based on the problem specification. This classic MDP model is unable to handle explicit durative actions. There is a growing body of work, however, that has been done to remove this limitation. Semi-MDPs handle domains with uncertain action durations and no deadlines [Howard, 1971]. Time-dependent MDPs [Boyan and Littman, 2000] go one step further and can model deadlines by undiscounting time in the MDP model. One of the main difficulties of this model is convolving the probability density functions and value functions while solving the model; for simple distribution types, this has been approached in a variety of ways [Feng et al., 2004, Li and Littman, 2005, Marecki et al., 2007]. Hybrid MDPs can also handle deadlines, which can also be considered by our approach, and can represent both discrete and continuous variables, at the cost of very high complexity. None of the above MDP models, however, can handle concurrency, and so they cannot be used for multi-agent scenarios.

Generalized Semi-MDPs, an extension of Semi-MDPs [Younes and Simmons, 2004a], can handle concurrency and actions with uncertain, continuous durations, at a very high computational cost. The DUR family of MDP planners can also handle both concurrency and durative actions, and can trade off between plan qualities, such as if users prefer a safer plan, or a more optimal but riskier one, again at a very high computational cost [Mausam and Weld, 2005]. The work shows, in fact, that the ∆DUR planner that plans with expected durations outperforms the ∆DUR planner that explicitly models temporal uncertainty [Mausam and Weld, 2006].

MDPs have been used to solve a distributed, multi-agent domain with activity duration and outcome uncertainty, the same domain that we use as part of this thesis work [Musliner et al., 2007]. This domain is especially challenging for MDP approaches, as plans must be constructed in real time. Each agent in this approach has its own MDP, which it partially and approximately solves in order to generate a solution in acceptable time. Despite this success, however, scalability ultimately prevented it from performing better than competing deterministic approaches.

The above models assume that the world is fully observable. MDPs that assume the world is partially observable, or POMDPs, assume that agents can make some local observations during execution but cannot directly observe the true state; instead, agents must probabilistically infer it [Kaelbling et al., 1998]. This assumption makes POMDPs very difficult to solve, and the additional complexity is unnecessary for the domains considered in this document.

In addition to the MDP systems above, a non-MDP policy-based approach is FPG, a Factored Policy Gradient planner [Buffet and Aberdeen, 2006]. This planner uses gradient ascent to optimize a parameterized policy. It still, however, suffers from the high complexity of policy-based probabilistic planners, and the authors suggest that it may be improved by combining deterministic reasoning with their probabilistic approach in order to allow it to handle larger and more difficult policies.
2.1.2 Search-Based Contingency Planning

Conditional, or contingency, planners create branching plans, and assume observations during execution to choose which branch of the plan to follow. Many contingency planners such as SGP [Weld et al., 1998] and Mahinur [Onder and Pollack, 1999], however, are unable to support durative actions. Partial-order contingency approaches include CBuridan [Draper et al., 1994], DTPOP [Peot, 1998], B-Prodigy [Blythe and Veloso, 1997] and MBP [Bertoli et al., 2001]. PGraphPlan [Blum and Langford, 1999] and Paragraph [Little and Thiébaux, 2006] do not support concurrency or durative actions, although it is possible to add them to these architectures. Prottle uses forward search planning to solve a concurrent, durative problem over a finite horizon [Little et al., 2005]. Although it uses a probabilistic planning graph-based heuristic to reduce the search space, memory usage prevents this algorithm from being applied to anything but small problems. [Xuan and Lesser, 1999, Wagner et al., 2006] consider a domain similar to ours, and use contingency planning in their scheduling process and to manage the uncertainty between agents.

Due to its difficulty, some work tries to lessen the computational burden of contingency planners by planning incrementally. One common incremental approach is Just-In-Case (JIC) scheduling [Drummond et al., 1994], which creates branch points in a schedule by finding the place where it is most likely to fail based on normal task distributions overrunning their scheduled duration. For simple distribution types, this involves performing probability calculations similar to how we calculate our uncertainty horizon for single-agent schedules. [Meuleau and Smith, 2003]'s OKP generates the optimal plan having a limited number of contingency branches, but at the cost of high complexity. ICP [Dearden et al., 2003] and Tempastic [Younes and Simmons, 2004b] also build contingency plans incrementally, by branching on utility gain and fixing plan failure points, respectively. Both of these approaches use stochastic simulation to find plan outcome probabilities. [Blythe, 1998] calculates plan certainty by translating the plan into a Bayesian network, and adds contingent branches to increase the probability of success. These incremental approaches make contingency planning more manageable or tractable; the approach in general, however, is still unable to model large numbers of outcomes. These approaches share some similarities with our approach, as they try to identify weak parts of the plan to improve at plan time. Overall, contingency planning is best for dealing with avoiding unrecoverable or critical errors in domains with small numbers of outcomes, as in [Foss et al., 2007].
2.1.3 Conformant Planning

Conformant planners such as [Goldman and Boddy, 1996], CGP [Smith and Weld, 1998], CMBP [Cimatti and Roveri, 2000], BURIDAN [Kushmerick et al., 1995] and PROBAPOP [Onder et al., 2006] handle uncertainty by developing plans that will lead to the goal without assuming that the results of actions are observable. A common way to think about conformant planners is that they try to force the world into a single, known state by generating "fail-safe" plans that are robust to anything that may happen during execution. Conformant planning is very difficult, and also very limiting; typically, effective conformant planning requires domains with limited uncertainty and very strong actions. Additionally, none of the above conformant planners can currently handle durative actions.

Planning to add "robustness" to plans is similar to conformant planning, and has the goal of building plans that are robust to various types of uncertainty. In [Younes and Musliner, 2002], a probabilistic extension to the CIRCA planner [Musliner et al., 1993] is described that uses Monte Carlo simulation to verify whether a potential deterministic plan meets specified safety constraints. [Fox et al., 2006] "probes" a temporal plan for robustness by testing the effectiveness of different slight variations of the plan, where each variation is generated by stochastically altering the timings of execution points in the plan. [Beck and Wilson, 2007] also uses Monte Carlo simulation to generate effective deterministic schedules for the job-shop scheduling problem. Using Monte Carlo simulation in this way, while able to calculate the robustness of plans, is not practical for our run-time analysis, as the number of particles needed increases quickly with the amount of domain uncertainty. [Fritz and McIlraith, 2009] also considers probabilities analytically, but still partially relies on sampling and cannot handle complex plan objective functions. In contrast, our approach analytically calculates and stores probability information for each activity.

2.2 Deterministic Planning

The basic case for deterministic planning in uncertain domains is to generate a plan based on a deterministic reduction of the domain, and then replan as necessary during execution. A well-known example of this strategy is FF-Replan [Yoon et al., 2007], which determinizes probabilistic input domains, plans with the deterministic planner FF [Hoffmann and Nebel, 2001], and replans when unexpected states occur. This very simple strategy for dealing with domain uncertainty won the Probabilistic Track of the 2004 International Planning Competition.
FF-Replan was also the top performer in the 2006 competition. Planning in this way, however, suffers from problems such as unrecoverable errors and spending large amounts of time replanning at execution time (even though the time spent planning may be smaller overall). [Jiménez et al., 2006] look at how to create more robust deterministic plans for probabilistic planning problems by directly considering the underlying probabilistic model. They incorporate probabilistic information as an action "cost model," and by solving the resulting deterministic problems they see a decrease in the frequency of replanning. This approach has been demonstrated to outperform FF-Replan on single-agent domains without durative actions, but it is so far unable to handle the types of domains we are considering.

Below we discuss two main bodies of work that have been developed to improve deterministic planning for uncertain domains: increasing plan flexibility and robustness to lessen the need to replan, and making each replan that does happen as fast as possible.

2.2.1 Plan Flexibility and Robustness

Whereas fixed-times schedules specify only one valid start time for each task, flexible-times schedules allow execution intervals for tasks to drift within a range of feasible start and end times [Smith et al., 2007]. The schedules are modeled as simple temporal networks (STNs) [Dechter et al., 1991], enabling fast constraint propagation through the network as start and end times are updated after execution to generate new execution intervals for later tasks. This allows the system to adjust to unforeseen temporal events as far as constraints will allow, replanning only if one or more constraints in the current deterministic schedule cannot be satisfied.

Flexible-times schedules may be total- or partial-order. IxTeT [Laborie and Ghallab, 1995], VHPOP [Younes and Simmons, 2003] and [Vidal and Geffner, 2006] are temporal planners that generate partial-order plans. Partial-order planners repair plans at run time by searching through the partial plan space. These planners in general, however, have a much larger state space than total-order planners, and so replanning in general is more expensive. In a robustness-focused approach to partial-order planning, [Policella et al., 2007] generates robust partial-order plans by expanding a fixed-times schedule to be partial-order, showing improvements with respect to fewer precedence orderings and more temporal slack over a more top-down planning approach. [Lau et al., 2007] also generates partial-order schedules that are guaranteed to have a minimum utility with probability (1 − ε) for scenarios where the only uncertainty is in the processing time of each task. This work is orthogonal to our approach, as either total- or partial-order plans could be used as part of our analysis.
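The STN constraint propagation mentioned above can be sketched with an all-pairs shortest-path computation over the STN's distance graph, the standard minimal-network construction for STNs; the three-event network below is a toy example invented for illustration:

```python
INF = float("inf")

def propagate(n, edges):
    """edges: dict (i, j) -> w, encoding the STN constraint t_j - t_i <= w.
    Returns the minimal network (tightest implied bounds) as a distance
    matrix, or None if the constraints are inconsistent."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        d[i][j] = min(d[i][j], w)
    for k in range(n):                      # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[i][i] < 0 for i in range(n)):
        return None                         # negative cycle: unsatisfiable
    return d

# Events: 0 = reference "time zero", 1 = start of a task, 2 = end of the task.
edges = {(0, 1): 10, (1, 0): 0,    # task starts within [0, 10]
         (1, 2): 5, (2, 1): -2,    # task duration within [2, 5]
         (0, 2): 12, (2, 0): 0}    # task ends by time 12
d = propagate(3, edges)
print(d[0][1])
```

Propagation tightens the latest start from 10 down to the implied bound end-by-12 minus minimum-duration-2, which here coincides at 10; after an execution event, updating one edge and re-propagating yields the new execution intervals for later tasks.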
[Gallagher, 2009] makes plans more robust by using simple schedule strengthening that relies on a limited stochastic analysis of the current schedule. It is not as effective, however, as our probabilistic schedule strengthening approach, which relies on a probability analysis that propagates probability information throughout the entire schedule. Alternately, a plan can be configured to be dynamically controllable [Morris et al., 2001]. A plan with this property is guaranteed to have a successful solution strategy for an uncertain temporal planning problem. This approach, as well as the above approaches in general, are either overly conservative or ignore the extra leverage that explicit consideration of uncertainty can provide.

2.2.2 Plan Management and Repair

A second way to optimize the deterministic planning approach is to manage plans during execution. [Pollack and Horty, 1999] describes an approach to plan management which includes commitment management, alternate plan assessment, and selective plan elaboration to manage plans during execution, identifying problems and, ideally, providing the means to readily change plans if better alternatives are found. An approach that helps to promote a different kind of flexibility during execution allows agents to switch tasks during execution in order to expedite the execution of bottleneck tasks, minimizing schedule makespan [Sellner and Simmons, 2006]. [Gallagher et al., 2006] and [Smith et al., 2007] rely on an incremental scheduler to manage schedules during execution, and use heuristics to both maximize schedule utility and minimize disruption. In order to anticipate problems, the agents predict how much longer executing tasks will take to complete; in theory, this prediction could also be used as part of our probability calculations.

Another common way to manage plans is to perform fast replanning during execution when errors are identified. This typically involves plan repair, where the planner starts with the current, conflicted plan and makes changes to it, such as by adding or removing tasks, until a new plan is found. ASPEN [Chien et al., 2000] is one such planner that makes heuristic, randomized changes to a conflicted plan to find a new, valid plan. Because of the randomness, however, there are no guarantees on how long it will take before a solution is found. Other plan repair systems are O-Plan [Tate et al., 1994], GPG [Gerevini and Serina, 2000] and [van der Krogt and de Weerdt, 2005]; none of these three approaches, however, has yet been demonstrated to work with durative tasks. HOTRiDE (Hierarchical Ordered Task Replanning in Dynamic Environments) [Ayan et al., 2007] is a Hierarchical Task Network (HTN) planning system that replans the lowest task in the task network that has a conflict whose parent does not. This helps promote plan stability, but no data is available about the time spent replanning during execution.
RepairSHOP [Warfield et al., 2007] is similar to HOTRiDE, but minimizes replanning based on "explanations" of plan conflicts. RepairSHOP results indicate that it also helps to lessen the time spent replanning during execution. The only uncertainty these two approaches address, however, is whether an executed task succeeds.

[Maheswaran and Szekely, 2008] considers a distributed, multi-agent domain with activity duration and outcome uncertainty that we use as part of this thesis work. Probability-based criticality metrics are used to repair the plan during execution according to predefined policies, in the spirit of our schedule strengthening. Their approach, however, is not as methodical in its probabilistic analysis nor in its use of it, and is highly tailored to the domain in question.

Frequent replanning can help to overcome the downsides of deterministic planning, but it should also be used with care, as it has the potential to grow out of hand in extremely uncertain environments, and the deterministic assumptions unavoidably can create problems to which replanning cannot adequately respond. Overall, deterministic planning systems are at a disadvantage because they ignore the potential leverage that can be provided by an explicit uncertainty model.

2.3 Meta-Level Control

Meta-level control is "the ability of complex agents operating in open environments to sequence domain and deliberative actions to optimize expected performance" [Raja and Lesser, 2007]. There is a large body of work in the area of meta-level control that addresses a variety of aspects of the problem [Cox, 2005, Boddy and Dean, 1994, Chien et al., 2000, Estlin et al., 2000]. Work has been performed to control the computational effort of anytime algorithms [Russell and Wefald, 1991, Hansen and Zilberstein, 2001b]. [Alexander et al., 2003] perform meta-level control of a real-time agent, and adaptively learn performance profiles of deliberation and the meta-level strategy. [Raja and Lesser, 2007] develop a meta-level control framework for a distributed multi-agent domain that learns a meta-level MDP to decide what control actions to take at any given time. The MDP is based on a state space of abstract features of the current schedule; it would be interesting to see how the MDP would perform if the state space also incorporated the probability profiles generated by our PADS algorithm. [Musliner et al., 2010] describe an anytime approach that unrolls an MDP incrementally at run time based on the tradeoff of improving the policy versus executing it.

Using a planning horizon is another way of performing meta-level control [Léauté and Williams, 2005]. Typically, planning horizons specify how far into the future a planner should plan.
In the above approaches, the horizon is set at a fixed amount of time in the future. Anytime algorithms such as [Hansen and Zilberstein, 2001a, Musliner et al., 2010] also inherently include a planning horizon as they expand an MDP's horizon while time is available; both of these approaches consider the cost of computation in their choice of a horizon. Although planning with a horizon can threaten the effectiveness of the schedule, as portions of the solution are being ignored, horizons can increase the tractability of planning (or replanning).

In general, many of these approaches explicitly reason about the cost of computation, to ensure that any extra time spent finding a better solution is worth the expected increase in utility of that solution; we do not, for several reasons. First, execution and replanning can occur concurrently in our domains, removing the need to explicitly choose between deliberation and action at any given time. Second, deterministic replanning tends to be fast, lessening the benefit of explicitly modeling it to use in a meta-level control algorithm. Finally, performance profiles can be difficult to model for heuristic or randomized planners like the ones we use in this document. Instead, our approach analyzes the schedule to see if replanning is likely necessary, based on probabilistic information such as how likely it is that execution will succeed if a replan does not occur.
however. similarly. Two of our algorithms. Finally. For the purposes of this document. By this deﬁnition. a probabilistic 29 . and scheduling as the problem of deciding when to execute it. Our terminology. Next. “Probabilistic Analysis of Deterministic Schedules” and “Probabilistic Schedule Strengthening” work to analyze plans at the schedule level. We name our algorithms for the general case. many domains typically considered scheduling domains include at least a limited planning component. a technique we use in our probabilistic metalevel control strategy. followed by the representation of domains and schedules that we use. we introduce decision trees with boosting.” reﬂects the idea that we are managing plans to satisfy goals at the highest level. all temporal planning domains include a scheduling component.Chapter 3 Background In this chapter. should not be thought to limit the scope of our approach. This leads us to the choice of planning/scheduling in our title and in the name of our algorithms. “Probabilistic Plan Management. we describe core concepts and technologies upon which we build our work. 3. We ﬁrst discuss our use of plan versus schedule terminology. where there is an overall plan with a schedule that is executed to realize it. The ideas in this document could very well be tailored to probabilistic schedule management. so we use the word “schedule” to describe them. Our title.1 Plan Versus Schedule Terminology In this document. we deﬁne planning as the problem of deciding what to execute. we use both the terms planning and scheduling freely. we discuss in detail the two domains we speciﬁcally analyze as part of this thesis.
analysis of deterministic plans, or plan strengthening, performed at the schedule level.

3.2 Domain and Schedule Representation

3.2.1 Activity Networks

We present much of the description of our work in terms of activity networks, of which Hierarchical Task Networks (HTNs) are a wellknown example [Erol et al., 1994]. Activity networks represent and organize domain knowledge by expressing dependencies among executable actions as a hierarchical network (or tree) of activities. In this work, we assume all domains can be represented by activity networks. The assumption that such domain knowledge is available is common for many realworld domains [Wilkins and desJardins, 2001].

At the lowest level of the tree are the primitive, executable actions, which are called methods. The methods are collected under aggregate nodes, or tasks. Tasks can be children of other tasks, and ultimately all activities (a term that refers to either methods or tasks) are collected under the root node of the network, called the taskgroup. The successful satisfaction of the objective of the root node corresponds to the goal of the domain.

In our activity networks, each task has a type: AND, OR, or ONE. An AND task must have all children underneath it succeed by meeting their objective in order for its own objective to be satisfied; an OR task must have at least one child underneath it execute successfully; and a ONE task can have only one child underneath it successfully execute. Typically, depending on the needs of the domain, a good schedule would include all children under an AND task, one or more under an OR task, and only one child under a ONE task; otherwise, the schedule is at risk of losing utility by not observing the task types.

Constraints are used to describe dependency relationships between any two activities in the network (although typically there are no explicit dependency constraints between activities with direct ancestry relationships). The most common constraint type is a temporal constraint between activity execution time points, specifying that one activity must come before another, which we refer to as a prerequisite relationship; a prerequisite relationship from activity A to activity B means that the source A must come before the target B. As part of our analysis of one of our domains we also consider other types of constraints, such as facilitates, an optional prerequisite that, if observed, magnifies the utility of an action. Activities can also have constraints such as release (earliest possible start) times and deadlines; these types of constraints are passed down such that child activities inherit constraints from their parents, and an activity's effective execution window is defined by the tightest of these boundaries. Sibling activities can be executed concurrently in the absence of any constraints that require otherwise.
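The AND/OR/ONE task-type semantics just described can be made concrete with a small sketch (illustrative code; the classes and names here are ours, not from the thesis):

```python
# Minimal activity-network sketch with AND/OR/ONE objective satisfaction.
from dataclasses import dataclass, field

@dataclass
class Method:
    name: str
    succeeded: bool = False   # set once the executable action completes

@dataclass
class Task:
    name: str
    kind: str                 # "AND", "OR", or "ONE"
    children: list = field(default_factory=list)

def satisfied(node):
    """Is the activity's objective met, per the AND/OR/ONE semantics?"""
    if isinstance(node, Method):
        return node.succeeded
    results = [satisfied(c) for c in node.children]
    if node.kind == "AND":    # all children must succeed
        return all(results)
    if node.kind == "OR":     # at least one child must succeed
        return any(results)
    if node.kind == "ONE":    # exactly one child may succeed
        return sum(results) == 1
    raise ValueError(node.kind)

m1, m2 = Method("m1", succeeded=True), Method("m2")
root = Task("taskgroup", "AND", [Task("T1", "ONE", [m1, m2])])
print(satisfied(root))  # True: exactly one child of the ONE task succeeded
```

The taskgroup here is an AND node over a single ONE task, mirroring the structure described above; executing the second method as well would violate the ONE semantics and unsatisfy the root.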
e. Notice that the method is constrained to start no earlier than time 0 (via the constraint between time 0 and m start) and ﬁnish by time 10 (via the constraint between time 0 and m f inish). A simple example of an activity network representation for a domain where a set of activities must be executed is a network where all methods are children of an AND taskgroup.. so that slack can be explicitly used as a hedge against temporal uncertainty.. As the underlying representation we use Simple Temporal Networks (STNs) [Dechter et al. ∞] between the ﬁnish time of the ﬁrst method and the start of the second. durations. 2000]. method durations are [d.1: A sample STN for a single scheduled method m with duration 5 and a deadline of 10. a deadline dl is represented as an edge with bounds [0.2. The STN for a single scheduled method with duration 5 whose execution must ﬁnish before its deadline at time 10 is shown in Figure 3. are represented as edges in the graph. An STN is a graph where nodes represent event time points. termed predecessors and successors. dl] between time 0 and the ﬁnish time of the constrained method. In the deterministic systems we consider.'. where d is the method’s deterministic. ﬂexible times schedules as in [Smith et al.. 3. desJardins. This type of schedule representation allows the execution intervals of activities to ﬂoat within a range of feasible start and ﬁnish times. d]. latest start time 31 . An agent’s schedule is represented by posting precedence constraints between the start and end points of each neighboring pair of methods. Constraints are propagated in order to maintain lower and upper bounds on all time points in the STN. and edges specify lower and upper bounds (i.2 Domain and Schedule Representation /%04%2$ /%012$ !"#$%$ "&'()*($ /3032$ "&+. 1991]. For example. grounding the network. d ] between a method’s start and ﬁnish time.3. ub]) on temporal distances between pairs of time points. ancestry relationships and prerequisites. [lb. 
Method durations are represented as an edge of [d. such as method start and ﬁnish times. and prerequisites may be represented as edges of [0. the constraint network provides an earliest start time est. A special time point represents time 0. Speciﬁcally. 2007].2 Flexible Times Schedules We assume that schedules are represented as deterministic. All constraints and relationships between activities. such as deadlines.$ Figure 3.1. scheduled duration.
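The propagation just described can be sketched as an all-pairs shortest-path computation over the STN's distance graph (a minimal illustration with our own naming; the planners in this work use more sophisticated incremental propagation):

```python
# STN bounds via Floyd-Warshall (illustrative; "z" is the time-0 node).
# An edge (u, v, lb, ub) constrains t_v - t_u to [lb, ub]; a negative
# cycle in the distance graph means the network is inconsistent.
INF = float("inf")

def stn_bounds(nodes, edges):
    # d[u][v] = tightest upper bound on t_v - t_u
    d = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for u, v, lb, ub in edges:
        d[u][v] = min(d[u][v], ub)    # t_v - t_u <= ub
        d[v][u] = min(d[v][u], -lb)   # t_u - t_v <= -lb
    for k in nodes:                   # propagate all constraints
        for i in nodes:
            for j in nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    if any(d[u][u] < 0 for u in nodes):
        return None                   # negative cycle: some bound violated
    # execution window of each time point relative to time 0
    return {v: (-d[v]["z"], d["z"][v]) for v in nodes}

# Figure 3.1's network: duration edge [5, 5] and deadline edge [0, 10].
nodes = ["z", "m_start", "m_finish"]
edges = [("z", "m_start", 0, INF),       # start no earlier than time 0
         ("m_start", "m_finish", 5, 5),  # duration edge [d, d]
         ("z", "m_finish", 0, 10)]       # deadline edge [0, dl]
print(stn_bounds(nodes, edges))
```

For Figure 3.1's method this yields est/lst of [0, 5] for m_start and eft/lft of [5, 10] for m_finish, i.e., the execution interval floats within the slack left by the deadline.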
Conflicts occur in the STN whenever negative cycles are present, which indicates that some time bound is violated. To generate initial time bounds, each node can be initialized with all possible times (e.g., times in the range [0, h], where h is a time point known to be after all deadlines and when execution will have halted).

Even if a schedule is not generated by a flexible times planner, it can easily be translated into a flexible times schedule, similar to [Policella et al., 2009]. The process begins by representing the schedule as described above, preserving the sequencing of the original schedule and observing all temporal domain constraints, with constraint propagation used to generate the valid execution intervals of each activity.

3.3 Domains

We have investigated our approach in two domains: a singleagent domain and a multiagent one. The singleagent domain is concerned with executing a set of tasks by a final deadline, where there is uncertainty in method durations. The domain is based on a wellstudied problem with temporal constraints and little schedule slack, and its success is greatly affected by the temporal uncertainty. These characteristics make it an ideal domain for which to demonstrate the fundamental principles of our probabilistic analysis. The second domain is a multiagent domain that has a plan objective of maximizing utility. Utility is explicitly modeled as part of the domain description. There is uncertainty in method duration and outcome, as well as a rich set of constraints between activities. Using this large, complex domain demonstrates the generality and scalability of our algorithms. The next subsections describe these domains in detail.

3.3.1 SingleAgent Domain

We first demonstrate our work on a singleagent planning problem. The objective in the singleagent domain is to execute a set of tasks by a final deadline. Each task has a variable number of children methods, only one of which can be executed to fulfill that task's role in the plan. All tasks must be executed in order for the plan to succeed, and the agent can execute at most one method at a time.

Our domain is based on the MultiMode Resource Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max) [Schwindt, 1998]. This NPcomplete makespan minimization problem is wellstudied in the scheduling community. Minimal and maximal time lags specify temporal constraints between activity start and finish time points. As originally formulated, the problem does not include uncertainty, tasks do not have explicit utility, and the problem is fixed times.
* . The problems specify a duration durm for each method m. intuitively. These constraints act as prerequisite constraints that have an associated minimal and/or maximal time delay between the execution time points of the two methods involved. represented as ONE nodes. each problem we use in this thesis has 20 tasks. Although all children of a source task are constrained with all children of a target task. These values were chosen because. Note how all of T1’s children are constrained with each of T2’s children.1* . which is the method’s deterministic. but with different time lags. we modiﬁed the problems by assigning each method a normal duration distribution DUR (m) with a standard deviation σ (DUR (m)) that is uniformly sampled from the range [1. A sample problem (with only 2 tasks) is shown in Figure 3. scheduled duration.2.3 Domains !"#$%&'()* +. after experimenting with sev33 . 3 or 5 methods underneath them.3. Activity dependency constraints are in the form of time lags between the ﬁnish times of the children of a source task and the start times of the children of a target task. Constraints are in the form of prerequisite constraints between the end time of the source task and the start time of the target task. The domain can be represented as an activity network with the AND taskgroup decomposing into each of the 20 tasks of the problem. The time lags speciﬁed in the problem description can also be negative.1* 3/4/* 5(&"6'78*/9* 3/42* 5(&"6'78*:* =2>*?@* =/>*?@* 324/* 5(&"6'78*./* 0. each with potentially different durations. this reverses the prerequisite constraint. main description allows a variable number of tasks and methods.2: A sample MRCPSP/max problem. Underneath the parent taskgroup are the two tasks. the speciﬁcs of the constraints can vary. each with two children. further decompose into their child methods. those tasks. As the domain itself does not include uncertainty. and tasks have 2. 
and a mean µ (DUR (m)) of (durm − σ (DUR (m)) /2).2* 0.* =:>*?@* 3242* 5(&"6'78*<* =A>*?@* Figure 3. 2 · durm /5].
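The duration-distribution modification above can be sketched as follows (illustrative helper names; durm is the problem-specified deterministic duration):

```python
# Sketch of assigning each method a normal duration distribution
# (illustrative; function names are ours, not from the thesis).
import random

def duration_distribution(dur_m, rng=random):
    """Return (mu, sigma) for method m's duration distribution DUR(m)."""
    sigma = rng.uniform(1, 2 * dur_m / 5)   # sigma ~ U[1, 2*dur_m/5]
    mu = dur_m - sigma / 2                  # mu = dur_m - sigma/2
    return mu, sigma

def sample_actual_duration(mu, sigma, rng=random):
    # Sampled durations are rounded to the nearest integer to simplify
    # the simulation of execution.
    return round(rng.gauss(mu, sigma))

mu, sigma = duration_distribution(10)
print(mu, sigma, sample_actual_duration(mu, sigma))
```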
Finally, since the original problem formulations do not have explicit deadlines represented in them, we assign each problem a final deadline. The addition of deadlines adds to the temporal complexity of the domain. When sampling from duration distributions to find methods' actual durations, durations are rounded to the nearest integer in order to simplify the simulation of execution. A sample problem for this domain in ASPEN syntax is shown in Appendix A.1; in order to fit the problem on a reasonable number of pages, the problem was simplified and some constraints between methods were removed.

Core Planner

We use ASPEN (Automated Scheduling/Planning ENvironment) [Chien et al., 2000] as the planner for this domain. ASPEN is an iterativerepair planner that produces fixed times plans and is designed to support a wide range of planning and scheduling domains. We used ASPEN instead of a flexible times planner because it was what we had available to use at the time. To generate a valid plan, ASPEN first generates an initial, flawed plan, then iteratively and randomly repairs plan conflicts until the plan is valid. It will continue to attempt to generate a plan until a solution is found or the number of allowed repair iterations is reached. When replanning during execution, the planner begins with the current, potentially conflicted plan and performs the same repair process until the plan is valid or until it times out, in which case it returns with no solution. After planning (or replanning), we expand the plan output by ASPEN into a totalorder, flexible times representation, as described in Section 3.2.2.

During execution, the executive monitors the prerequisite conditions of any pending activities, and executes the next activity as soon as its prerequisites are satisfied. The agent replans whenever the STN becomes inconsistent. We will present the runtime plan management approach for this domain later, in Section 7.2.

As our constraint propagation algorithm for the STN, we implemented a version of the standard constraint propagation algorithm AC-3 [Mackworth, 1977] that propagates constraints both forward and backward. The algorithm begins with all edges marked as pending, meaning they are yet to be checked. It examines each edge and removes possible values from the two connected nodes if they are not consistent with the constraint asserted by the edge. If any node has a value removed, all edges except for the current one are added back onto the pending list. The algorithm terminates when the pending list is empty.
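The AC-3-style propagation just described can be sketched over finite domains of times (an illustrative version under our own naming; the implementation used in this work may differ in detail):

```python
# AC-3 sketch: domains maps each node to a set of possible times;
# edges maps (u, v) to (lb, ub), constraining t_v - t_u to [lb, ub].
from collections import deque

def ac3(domains, edges):
    # Build directed arcs in both directions (forward and backward).
    arcs = {}
    for (u, v), (lb, ub) in edges.items():
        arcs[(u, v)] = lambda tu, tv, lb=lb, ub=ub: lb <= tv - tu <= ub
        arcs[(v, u)] = lambda tv, tu, lb=lb, ub=ub: lb <= tv - tu <= ub
    pending = deque(arcs)                   # all edges start out pending
    while pending:
        x, y = pending.popleft()
        ok = arcs[(x, y)]
        # Remove values of x that have no consistent value of y.
        removed = {tx for tx in domains[x]
                   if not any(ok(tx, ty) for ty in domains[y])}
        if removed:
            domains[x] -= removed
            # Re-check every other arc that ends at x.
            pending.extend(a for a in arcs if a[1] == x and a != (y, x))
    return domains

# Figure 3.1's example: duration [5, 5], deadline [0, 10], horizon h = 12.
h = 12
domains = {"z": {0}, "m_start": set(range(h + 1)),
           "m_finish": set(range(h + 1))}
edges = {("m_start", "m_finish"): (5, 5), ("z", "m_finish"): (0, 10),
         ("z", "m_start"): (0, h)}
print(sorted(ac3(domains, edges)["m_start"]))  # [0, 1, 2, 3, 4, 5]
```

Here each node starts with all times in [0, h], as in the initialization described in Section 3.2.2, and propagation prunes the start times of m down to its feasible window.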
Normal Distributions

Task duration distributions are modeled in this domain as normal distributions. Normal, or Gaussian, distributions are continuous probability distributions typically used to describe data that is clustered around a mean, or average, µ (Figure 3.3). A distribution's standard deviation (σ) describes how tightly the distribution is clustered around the mean; its variance σ² is the square of the standard deviation. To indicate that a realvalued random variable X is normally distributed with mean µ and variance σ², we write X ∼ N(µ, σ²).

Figure 3.3: A graph of a normal distribution.

Normal distributions have the nice property that the mean of the sum of two normal distributions equals the sum of the means of each of the distributions; the same is true for variance. Thus, the normal distribution exactly representing the distribution of the sum of two random variables drawn from ∼N(µ1, σ1²) and ∼N(µ2, σ2²) is ∼N(µ1 + µ2, σ1² + σ2²). This property makes normal distributions very easy to work with in our analysis of this domain, which requires summing distributions over groups of tasks, as the sum of n distributions can be done in O(n) time. Of course, if a domain's distribution type is not represented as a normal, these calculations can be changed as appropriate. For other functions that some domains may require (e.g., maximum, minimum, etc.), normals do not combine as nicely and an approximation or discretization may be appropriate.

We use two functions as part of our use of normal distributions. The first is normcdf(c, µ, σ), the normal cumulative distribution function. This function calculates the probability that a variable drawn from the distribution will have a value less than or equal to c. As an example, the value of normcdf(µ, µ, σ) is 0.5, as 50% of the distribution's probability is less than the mean.
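The two properties used here, the cumulative distribution function and O(n) summing of normals, can be sketched as follows (illustrative code; the summing step assumes the distributions are independent):

```python
# normcdf and normal summation sketch (helper names are ours).
import math

def normcdf(c, mu, sigma):
    # P(X <= c) for X ~ N(mu, sigma^2), via the error function
    return 0.5 * (1 + math.erf((c - mu) / (sigma * math.sqrt(2))))

def sum_normals(dists):
    # Means and variances of independent normals both add, so summing
    # n distributions is a single O(n) pass.
    mu = sum(m for m, s in dists)
    var = sum(s * s for m, s in dists)
    return mu, math.sqrt(var)

mu, sigma = sum_normals([(10, 3), (20, 4)])
print(mu, sigma)               # 30 5.0
print(normcdf(mu, mu, sigma))  # 0.5: half the mass lies below the mean
```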
The second function is truncate, which we use in our probabilistic analysis of this domain. As part of its analysis, PADS needs to calculate the distribution of a method's finish time, given that it meets a deadline. This requires truncating distributions and combining them, such as by addition, with other normal distributions. There is no known way to represent exactly the combination of a truncated normal distribution and a normal distribution; the best approximation we are aware of finds the mean and standard deviation of the truncated distribution and treats it as the mean and standard deviation of a normal distribution.

To find the mean and standard deviation of a truncated distribution, we base our calculations on [Barr and Sherrill, 1999], which specifies that the mean of a standard normal random variable ∼N(0, 1) truncated below at a fixed point c is

µtrunc = e^(−c²/2) / (√(2π) · (1 − normcdf(c, 0, 1)))

and the variance¹ is

σ²trunc = (1 − chisquarecdf(c², 3)) / (2 · (1 − normcdf(c, 0, 1))) − µ²trunc

From this we derive equations that truncate both above and below at [lb, ub]:

µtrunc = (−e^(−ub²/2) + e^(−lb²/2)) / (√(2π) · (normcdf(ub, 0, 1) − normcdf(lb, 0, 1)))

σ²trunc = (chisquarecdf(ub², 3) − chisquarecdf(lb², 3)) / (2 · (normcdf(ub, 0, 1) − normcdf(lb, 0, 1))) − µ²trunc

These equations are the basis of the function

(µtrunc, σtrunc) = truncate(lb, ub, µ, σ)

meaning that the normal distribution ∼N(µ, σ²) is being truncated below and above at [lb, ub].

¹The formula used in this calculation, chisquarecdf(x, N), is the chisquared cumulative distribution function; it returns the probability that a single observation from the chisquared distribution with N degrees of freedom will fall in the interval [0, x].
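A sketch of truncate built from these equations (illustrative; as written it is valid when the standardized truncation points are nonnegative, i.e., lb ≥ µ, which is the regime of the chi-squared identity above):

```python
# Truncated-normal moments via the [Barr and Sherrill, 1999] equations
# (illustrative implementation; helper names are ours).
import math

def normcdf(c, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((c - mu) / (sigma * math.sqrt(2))))

def chisquarecdf3(x):
    # Chi-squared CDF with N = 3 degrees of freedom, in closed form:
    # P(chi2_3 <= x) = 2*normcdf(sqrt(x)) - 1 - sqrt(2x/pi) * e^(-x/2)
    return 2 * normcdf(math.sqrt(x)) - 1 - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def truncate(lb, ub, mu, sigma):
    # Standardize the truncation points, apply the equations above,
    # then map the moments back to the original scale.
    a, b = (lb - mu) / sigma, (ub - mu) / sigma
    z = normcdf(b) - normcdf(a)
    m = (math.exp(-a * a / 2) - math.exp(-b * b / 2)) / (math.sqrt(2 * math.pi) * z)
    var = (chisquarecdf3(b * b) - chisquarecdf3(a * a)) / (2 * z) - m * m
    return mu + sigma * m, sigma * math.sqrt(var)
```

For instance, truncating a standard normal below at its mean (lb = 0, with a large ub standing in for ∞) recovers the half-normal moments, mean ≈ 0.798 and standard deviation ≈ 0.603.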
3.3.2 MultiAgent Domain

Our second domain is an oversubscribed, multiagent planning problem that is concerned with the collaborative execution of a joint mission by a team of agents in a highly dynamic environment. Missions are formulated as a hierarchical network of activities in a version of the TAEMS language (Task Analysis, Environment Modeling and Simulation) [Decker, 1996] called C TAEMS [Boddy et al., 2005]. At the root is the OR taskgroup; on successive levels, interior nodes constitute aggregate activities that can be decomposed into sets of tasks and/or methods (leaf nodes), which are directly executable in the world. The plan objective is to maximize earned utility, which is explicitly modeled in this domain², and agents cooperate to accrue as much utility at this level as possible. Each method can be executed only by a specified agent (agent: Name) and each agent can execute at most one method at a time. Other than this, there is no predetermined structure to the problems of this domain.

To illustrate the basic principles of this domain, consider the simple activity network shown in Figure 3.4. The taskgroup node of the network, which is an OR node, is the overall mission task, with two child tasks, one OR and one AND; each child task has two methods underneath it. The enables relationship shown is a prerequisite relationship, as described further below.

Figure 3.4: A simple, deterministic 3agent problem.

Given this activity network, one could imagine a number of schedules being generated, as well as situations in which they would be appropriate. We list below all possible valid schedules.

²In the terminology of the domain and our previous publications of PPM for this domain [Hiatt et al., 2008, 2009], utility is called "quality"; for consistency in this document, we will continue to refer to it as "utility."
• m1 → {m3, m4}. Jeremy executes method m1, which fulfills the prerequisite constraint and allows Buster to execute m3 and m4, which can be executed in any order. This schedule is appropriate when there is room in the agents' schedules, since m1 earns higher utility.

• m2 → {m3, m4}. Jeremy does not have enough room in its schedule to execute method m1, so Sarah executes the lowerutility option of m2; this fulfills the prerequisite constraint and allows Buster to execute m3 and m4, which can be executed in any order.

• {m1 or m2} → {m3, m4}. Jeremy executes method m1, and Sarah concurrently executes method m2; whichever finishes first fulfills the prerequisite constraint and allows Buster to execute m3 and m4, which can be executed in any order. An example of when this schedule is appropriate is if Buster's schedule is tight and it is useful for m3 and m4 to begin execution before m1 is able to finish, calling for the scheduling of m2; although it is the lowerutility option, it is also scheduled.

• m1. Jeremy executes method m1, but there is not enough room in Buster's schedule for m3 and m4.

Note that the reason why methods m3 and m4 are not executed concurrently in this example is that they are owned by the same agent.

Method durations are specified as discrete probability distributions over times (DUR: X), with the exact duration of a method known only after it has been executed. Method utilities, similarly, are also specified as discrete probability distributions over utilities (UTIL: X). Methods do not earn utility until their execution has completed. Methods may also have an a priori likelihood of ending with a failure outcome (fail: X%), which results in zero utility. Methods can fail for reasons other than a failure outcome, however, such as by missing a deadline; any type of failure results in the method earning zero utility. Even if a method fails, execution of the schedule still continues, since the complicated activity structure and plan objective means that there is very likely still potential for earning further utility at the taskgroup level.

In subsequent text, we refer to DUR, UTIL and fail as functions, which when given a method as input return the duration distribution, utility distribution and failure likelihood, respectively, of that method. We also define the functions averageDur and averageUtil, which return the deterministic expectation of an activity's duration and utility, respectively. We use "average" here to distinguish these values from the expected utility which is calculated as part of the probabilistic analysis.

Now that we have illustrated the basic principles of the domain, we show the full model of this example problem in Figure 3.5, and refer to it as we describe the further characteristics of the domain. Each task in the activity network has a specified utility accumulation function (uaf) defining
how, and when, a task accumulates utility. Uafs also specify how much final utility the tasks earn. The uafs are grouped by node type: OR, AND, and ONE. Tasks with OR uafs earn positive utility as soon as their first child executes with positive utility, while tasks with AND uafs earn positive utility once all children execute with positive utility; trivially, ONE tasks earn utility when their sole child executes with positive utility. The most common uaf is sum; a sum task's utility at any point equals the total earned utility of all children. Other uafs are max, whose utility equals the maximum utility of its executed children, and sum_and, which earns the sum of the utilities of its children once they all execute with positive utility. A less common uaf is min, which specifies that tasks accumulate the minimum of the utilities of their children once they all execute with positive utility. A summary of the uaf types is shown in Table 3.1.

Figure 3.5: A small 3agent problem.

Additional constraints in the network include a release (earliest possible start) time and a deadline (denoted by rel: X and dl: X, respectively) that can be specified for any activity. Each descendant of a task inherits these constraints from its ancestors, and its effective execution window is defined by the tightest of these constraints. An executed activity fails if any of these constraints are violated. In the rest of this document, we refer to rel and dl as functions, which when given an activity as input return its release time and deadline, respectively.
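The uaf semantics above can be sketched with a small function over a task's children's earned utilities (illustrative encoding; names are ours, and a child that has not yet earned positive utility is represented as 0):

```python
# Utility accumulation function (uaf) sketch.
def task_utility(uaf, child_utils):
    """child_utils: earned utility per child (0 if not yet positive)."""
    earned = [u for u in child_utils if u > 0]
    if uaf == "sum":            # OR: accumulates as children earn
        return sum(earned)
    if uaf == "max":            # OR: maximum over executed children
        return max(earned, default=0)
    if uaf == "sum_and":        # AND: sum, but only once all children earn
        return sum(earned) if len(earned) == len(child_utils) else 0
    if uaf == "min":            # AND: minimum, once all children earn
        return min(earned) if len(earned) == len(child_utils) else 0
    if uaf == "exactly_one":    # ONE: sole executed child's utility
        return earned[0] if len(earned) == 1 else 0
    raise ValueError(uaf)

print(task_utility("sum", [3, 0, 5]))      # 8
print(task_utility("sum_and", [3, 0, 5]))  # 0: not all children have earned
print(task_utility("min", [3, 2, 5]))      # 2
```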
Table 3.1: A summary of utility accumulation function (uaf) types.

uaf          type  description
sum          OR    utility is sum of children's utility (task earns utility as soon as the first child earns positive utility)
max          OR    utility is max of children's utility (task earns utility as soon as the first child earns positive utility)
sum_and      AND   utility is sum of children's utility (task earns utility once all children earn positive utility)
min          AND   utility is min of children's utility (task earns utility once all children earn positive utility)
exactly_one  ONE   utility is only executed child's utility (task earns utility as soon as the first child earns positive utility; task has zero utility if any other children earn positive utility)

Interdependencies between activities are modeled via constraints called nonlocal effects (NLEs). NLEs express causal effects between a source activity and a target activity. NLEs can be between any two activities in the network that do not have a direct ancestry relationship. If the target activity is a task, the NLE is interpreted as applying individually to each of the methods in the activity network under the task. Therefore, we treat NLEs where the target is a task as the analogous situation where there are enabling NLEs from the source to each of the task's method descendants.

There are four types of NLEs. The most common is the enables relationship. The enables NLE stipulates that unless the target method begins execution after the source activity earns positive utility, it will fail. For example, in Figure 3.5, the execution of targets m3 and m4 will not result in positive utility unless the methods are executed after the source T1 earns positive utility. The enables relationship is unique among the NLEs because an effective agent will wait for all of a method's enablers to earn positive utility before it attempts to begin executing it; otherwise, its execution would be a waste as the method would not earn utility.

The second NLE is the disables NLE, which has the opposite effect of enabling. The disables relationship stipulates that the target method fails (earns zero utility) if it begins executing after its source has accrued positive utility. Typically, an effective planner will schedule sources of disables to finish after the targets start. If a source activity does finish before the target has started, executing the target serves no purpose and so the planner should remove it from the schedule.

NLEs are activated by the source activity accruing positive utility, and must be activated before a target method begins execution in order to have an effect on it. All NLEs have an associated delay parameter specifying how much time passes from when the source activity accrues positive utility until the NLE is activated; however, for the sake of simplicity, in much of our future discussion we present our discussion as if delays do not exist, although our approach can handle any arbitrary delay.
The disables NLE and the enables NLE are referred to as "hard" NLEs, as they must be observed to earn utility.

The third NLE is the facilitates NLE. This NLE, when activated, increases the target method's utility and decreases its executed duration. The amount of the effect depends on the proportional utility (proputil) of the source s at the time e when the target starts execution:

proputil(s, e) = min(1, utilearned(s, e) / MaxU(s))

In this formula, utilearned refers to the utility so far accumulated by s, and MaxU(s) refers to an approximation of the maximum possible utility of the activity s. As it is not important to our approach, we omit describing how MaxU is calculated. This formula is used to specify the effect of the NLE in terms of a utility coefficient uc such that 0 ≤ uc ≤ 1, and a duration coefficient dc such that 0 ≤ dc < 1. Putting it all together, the utility distribution of the target t if it starts at time e becomes:

UTIL(t) = UTIL(t) · (1 + uc · proputil(s, e))

The duration distribution, similarly, becomes:

DUR(t) = ⌈DUR(t) · (1 − dc · proputil(s, e))⌉

Note that the presence of ⌈⌉ in the duration ensures that durations are integral; utilities, however, may not be integral after a facilitating effect. Also note that, according to this feature of the domain description, a target's utility can be, at most, doubled by the effect of a single facilitator.

The hinders NLE is exactly the same as the facilitates, except that the signs of the effect are changed:

UTIL(t) = UTIL(t) · (1 − uc · proputil(s, e))

DUR(t) = ⌈DUR(t) · (1 + dc · proputil(s, e))⌉

Again, note that durations are always integral and, similarly, that a target's duration can, at most, be doubled by the effect of a single hinderer. If a method is the target of multiple facilitates or hinders, the effects can be applied in any order, as multiplication is commutative; for the duration calculation, the ceiling function is applied after the individual NLE effects have been combined. The facilitates and hinders NLEs are referred to as "soft" NLEs as, unlike the hard NLEs, utility can be earned without observing them. A summary of the NLE types is shown in Table 3.2.

In the rest of this document, we refer to en, dis, fac and hind as functions that, when given an activity as input, return the scheduled enablers, disablers, facilitators and hinderers of the activity.
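The soft-NLE effects above can be sketched on scalar expectations (illustrative; the thesis applies them to full distributions, and the helper names are ours):

```python
# Soft NLE (facilitates/hinders) effect sketch on a target's utility
# and duration, with the ceiling applied after all effects combine.
import math

def proputil(util_earned, max_u):
    return min(1.0, util_earned / max_u)

def apply_soft_nles(util, dur, effects):
    """effects: list of (kind, uc, dc, p) with kind 'facilitates' or
    'hinders' and p the source's proportional utility at target start."""
    for kind, uc, dc, p in effects:
        sign = 1 if kind == "facilitates" else -1
        util *= 1 + sign * uc * p     # facilitates raises utility,
        dur *= 1 - sign * dc * p      # and shortens duration
    # Ceiling applied once, after the individual effects are combined.
    return util, math.ceil(dur)

# One facilitator at full proportional utility, uc = 0.5, dc = 0.25:
print(apply_soft_nles(10.0, 8, [("facilitates", 0.5, 0.25, 1.0)]))  # (15.0, 6)
```

With the signs flipped, the same call models a hinderer: at full proportional utility with uc = 0.5 and dc = 0.25, the target's utility is halved and its duration is lengthened by a quarter.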
Additionally, NLEs returns the scheduled source NLEs of all types of an activity, and the functions enabled and disabled return the activities directly enabled (or disabled) by the input activity. We will also refer later to the function directlyDisabledAncestor which, when given a method and (one of) its disablers, returns the ancestor of the method that is directly targeted by the disables NLE.

Table 3.2: A summary of nonlocal effect (NLE) types.

NLE          hard/soft  description
enables      hard       target will fail if it begins executing before the source earns positive utility
disables     hard       target will fail if the source earns positive utility before the target begins executing
facilitates  soft       target's duration/utility are decreased/increased if the source earns positive utility before the target begins executing
hinders      soft       target's duration/utility are increased/decreased if the source earns positive utility before the target begins executing

We assume that a planner generates a complete flexible times schedule for all the agents. When execution begins, agents receive a copy of their portion of the initial schedule and limited local views of the activity network. Agents' local views consist of activities that are designated as local (the agent is responsible for executing and/or managing them) or remote (the agent tracks current information for these activities via status updates from the agent responsible for maintaining them). Agents receive copies of all of their local activities, as well as remote copies of all activities that are constrained via NLEs with their local activities. Additionally, all activities are ultimately connected to the root node (i.e., the local view consists of a single tree rooted at the taskgroup node). The limited local view for agent Buster of Figure 3.5 is shown in Figure 3.6; in order to fit the problem on a reasonable number of pages, the problem was simplified and some activities were deleted. A sample problem for this domain in C TAEMS syntax is shown in Appendix A.2.

After execution begins, agents have a short, fixed period of time to perform initial computations (such as calculating initial probability information) before they actually begin to execute methods.

We have developed two versions of our probabilistic plan management approach for this domain. One analysis applies to the initial, global schedule and is used to strengthen it before it is distributed to agents. The second analysis is used by each agent during execution to distributively manage their portion of the schedule. During our discussion of PADS, we distinguish between the functions jointSchedule, which returns the scheduled methods across all agents, and localSchedule, which returns the methods scheduled locally on a single agent.
Figure 3.6: The limited local view for agent Buster of Figure 3.5.

The function local, when given an activity, returns whether that activity is local to the agent.

Core Planner

The planner for this domain is the incremental flexible-times scheduling framework described in [Smith et al., 2007]. It adopts a partial-order schedule (POS) representation, with an underlying implementation of an STN, and focuses on producing a POS for the set of agents that maximizes overall problem utility. The constraint propagation algorithm used is an incremental, arc-consistency based shortest path algorithm, based on [Cesta and Oddi, 1996]. The planner uses expected duration and utility values for methods, and ignores a priori failure likelihoods. The schedule for any individual agent is total-order; the joint schedule as a whole, however, may be partial-order, as inter-agent activities are constrained with each other only by NLEs, and not by explicit sequencing constraints.

Given the complexity of solving this problem optimally and an interest in scalability, the planner uses a heuristic approach. The initial, global schedule is developed via an iterative two-phase process. In the first phase, a relaxed version of the problem is solved optimally to determine the set of "contributor" methods, or methods that would maximize overall utility if there were no capacity constraints. In the second phase, an attempt is made to sequentially allocate these methods to agent schedules. If a contributor being allocated depends on one or more enabling activities that are not already scheduled, these are scheduled first. Should a method fail to be inserted at any point (due to capacity constraints), the first phase is re-invoked with the problematic method excluded. A similar process takes place within each agent when it replans locally.
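The iterative two-phase process described above can be sketched as follows; `solve_relaxed`, `try_insert`, and `enablers_of` are hypothetical stand-ins for the planner's actual machinery, so this is an outline of the control flow rather than the planner itself:

```python
# Sketch of the two-phase initial scheduling loop: phase 1 finds
# "contributor" methods via a relaxed problem; phase 2 allocates them
# (scheduling unscheduled enablers first); on an insertion failure,
# phase 1 is re-invoked with the problematic method excluded.

def build_initial_schedule(problem, solve_relaxed, try_insert, enablers_of):
    excluded = set()
    while True:
        # Phase 1: contributors from an optimally solved relaxed problem.
        contributors = [m for m in solve_relaxed(problem) if m not in excluded]
        schedule = []
        failed = None
        # Phase 2: sequentially allocate contributors to agent schedules.
        for m in contributors:
            for e in enablers_of(m):      # schedule enabling activities first
                if e not in schedule and not try_insert(schedule, e):
                    failed = m
                    break
            if failed is None and not try_insert(schedule, m):
                failed = m
            if failed is not None:
                break
        if failed is None:
            return schedule
        excluded.add(failed)  # re-invoke phase 1 without the problem method
```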
More detail on the planner can be found in [Smith et al., 2007]. Agents replan selectively in response to execution, such as if the STN becomes inconsistent, or if a method ends in a failure outcome. During distributed execution, the executor monitors the NLEs of any pending activities and executes the next pending activity as soon as all the necessary constraints are satisfied. We describe the runtime plan management system further in Section 7.1.

Discrete Probability Distributions

In this domain, uncertainty is represented as discrete distributions. Unlike normal distributions, discrete distributions have the benefit of being able to be combined or modified exactly by any arbitrary operator. We use summation, multiplication, maximization and minimization in our algorithms in later chapters. The sum of two discrete probability distributions, Z = X + Y, is [Grinstead and Snell, 1997]:

P(Z = z) = Σ_{k=−∞}^{∞} P(X = k) · P(Y = z − k)

The max of two discrete probability distributions, Z = max(X, Y), is:

P(Z = z) = P(X = z) · P(Y ≤ z) + P(X < z) · P(Y = z)

For the sake of brevity, we omit the versions of these formulas for the multiplication and the minimum of two distributions. These equations can be implemented by iterating through the distributions in question; such calculations are not linear, however, and are done in O(|X| · |Y|) time. Similarly, the size of the resulting distribution is bounded by |X| · |Y| as well (although typically such a distribution has duplicate values and so its representation is not that large). If a domain's distribution type is different than discrete, the above equations can be changed as appropriate.
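For illustration, the pairwise iteration just described can be written directly, with distributions represented as {value: probability} dicts. This is a minimal sketch (the pairwise product for max also accounts for ties), not the thesis's implementation:

```python
# Combining discrete distributions by iterating over all value pairs,
# which is the O(|X| * |Y|) computation described above.

def dist_sum(X, Y):
    """P(Z = z) for Z = X + Y (discrete convolution)."""
    Z = {}
    for x, px in X.items():
        for y, py in Y.items():
            Z[x + y] = Z.get(x + y, 0.0) + px * py
    return Z

def dist_max(X, Y):
    """P(Z = z) for Z = max(X, Y), including tied values."""
    Z = {}
    for x, px in X.items():
        for y, py in Y.items():
            z = max(x, y)
            Z[z] = Z.get(z, 0.0) + px * py
    return Z
```

Duplicate sums or maxima collapse into one entry, which is why the resulting representation is typically smaller than the |X| · |Y| bound.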
3.4 Decision Trees with Boosting

We use decision trees with boosting [Quinlan, 1986] as part of our work on probabilistic meta-level control. For our purposes, this means determining what high-level planning action to take based on the current probabilistic analysis of the schedule. Decision trees, and classifiers in general, are a way of assigning one of a number of classes to a given state. Decision trees, however, are easily susceptible to overfitting: they can become unnecessarily complex and describe random
error or noise in the dataset instead of the true underlying model. Boosting is one technique that is commonly used to overcome this problem. Boosting is a meta-algorithm that creates many small, weak classifiers (such as decision trees restricted in size), weighs them according to their goodness, and lets them "vote" to determine the overall classification. We choose this type of classification because it is easy to implement and very effective; [Breiman, 1996] even called AdaBoost with decision trees the "best off-the-shelf classifier in the world."

To construct a decision tree, one needs a training set of example states x1, ..., xj, each with a known class Y ∈ y1, ..., yk. Each example consists of a number of attributes; in our case, these attributes are variables that have continuous values. Because we are using a boosting algorithm, the examples also have associated weights w1, ..., wj. An example decision tree where all examples are weighted equally is shown in Figure 3.7.

To build a decision tree, we use the algorithm C4.5 [Quinlan, 1993]. A decision tree is built recursively: it starts at the root node and inspects the data, continuing until the algorithm terminates. If all examples have the same class, the node becomes a leaf node specifying that class. If all examples have identical states, it becomes a leaf node specifying the class that the weighted majority of the states have. Otherwise, the node branches based on an attribute and a threshold for that attribute: all examples with a value for that attribute less than the threshold are passed to the right child; all others are passed to the left. This process recurses for each child.

To choose which attribute to branch on, at each node of the tree C4.5 chooses an attribute A and, if A is a continuous attribute, a threshold t, which maximize Information Gain (IG), or the reduction in entropy that occurs by splitting the data by A and t. IG is defined as

IG(A, t) = H(Y) − H(Y | A, t)

where H(Y) represents the entropy of the class variable Y before the split:

H(Y) = − Σ_{i=1}^{k} P(Y = yi) · log2 P(Y = yi)

and H(Y | A, t) represents the entropy of the class variable Y after the split, which divides the data according to whether the value for attribute A is less than t:

H(Y | A, t) = − P(A < t) · Σ_{i=1}^{k} P(Y = yi | A < t) · log2 P(Y = yi | A < t)
              − P(A ≥ t) · Σ_{i=1}^{k} P(Y = yi | A ≥ t) · log2 P(Y = yi | A ≥ t)
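A minimal sketch of the entropy and information-gain computations above, over weighted examples given as (attribute value, class label, weight) triples. These helpers are illustrative, not C4.5 itself:

```python
import math

def entropy(examples):
    """H(Y) over weighted (value, label, weight) examples."""
    total = sum(w for _, _, w in examples)
    if total == 0:
        return 0.0
    mass = {}
    for _, y, w in examples:
        mass[y] = mass.get(y, 0.0) + w
    return -sum((m / total) * math.log2(m / total) for m in mass.values())

def info_gain(examples, threshold):
    """IG(A, t) = H(Y) - H(Y | A, t) for a continuous attribute A."""
    left = [e for e in examples if e[0] < threshold]
    right = [e for e in examples if e[0] >= threshold]
    total = sum(w for _, _, w in examples)
    h_split = (sum(w for _, _, w in left) / total) * entropy(left) \
            + (sum(w for _, _, w in right) / total) * entropy(right)
    return entropy(examples) - h_split
```

A split that perfectly separates the classes drives H(Y | A, t) to zero, so its information gain equals the pre-split entropy.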
As the data is weighted, P(Y = yi) is:

P(Y = yi) = Σ_{i=1}^{j} (wi · 1(Y(xi) = yi)) / Σ_{i=1}^{j} wi

where 1 is the indicator function, which returns 1 if the passed-in equation is true and 0 otherwise.

Figure 3.7: An example decision tree built from a data set. The tree classifies 11 examples, giving state information about the time of day and how much rain there has been in the past 10 minutes, as having or not having traffic at that time. Each node of the tree shows the attribute and threshold for that branch, as well as how many instances of each class are present at that node. Instances that are not underlined refer to traffic being present; underlined instances refer to traffic not being present.

Because we are solving a multi-class classification problem, we use the boosting algorithm SAMME [Zhu et al., 2005], a multi-class variant of AdaBoost [Freund and Schapire, 1997]. The algorithm begins by weighing all training examples equally. It builds a simple decision tree (e.g., a decision tree with two levels), and calculates a "goodness" term α for the classifier based on the (weighted) percent of examples that were misclassified. Next, it increases the weight of the
3. intuitively. means that the classiﬁer returns the classiﬁcation that receives the highest total of votes from each of the weighted decision trees.4 Decision Trees with Boosting misclassiﬁed examples and decreases the weight of the correctly classiﬁed examples. The overall classiﬁcation C (x) of an example x is then: M C (x) = arg max y m=1 α (m) · 1 (C m (x) = y) This equation. and builds another decision tree based on the newly weighted examples. It repeats the process of building decision trees and reweighing the examples until the desired number M of classiﬁers are built. 47 .
Chapter 4

Approach

Probabilistic plan management (PPM) is a composite approach to planning and scheduling under uncertainty that couples the strengths of deterministic and non-deterministic planning while avoiding their weaknesses. As its starting point, PPM assumes a deterministic planner that generates an initial schedule to execute and replans as necessary during execution. It then layers a probabilistic analysis on top of the planner to explicitly analyze the uncertainty of the schedules the planner produces. Both before and during execution, it uses the analysis to make the schedule more robust to execution uncertainty, via probabilistic schedule strengthening. It also uses the analysis to manage the execution of the schedule via probabilistic meta-level control, limiting the use of resources to replan and strengthen the schedule during execution while maintaining the effectiveness of the schedule. PPM maintains this probabilistic analysis of the current deterministic schedule as it changes due to both the dynamics of execution and replanning. The goal is to manage and execute effective deterministic plans in a scalable way.

In this chapter, we first discuss the types of domains that our approach can analyze. We then detail each of the three main components of probabilistic plan management, as well as discussing how they fit together to probabilistically manage plans.

4.1 Domain Suitability

In the problems we consider, plan execution has an overall, explicit objective. Objectives typically fall under one of two general categories: all activities must be executed within their constraints, or reward (or another metric such as cost) is maximized or minimized. The objective can also be a combination of these two categories. In all cases, the objective must be specified in terms of a
numeric utility, which quantifies how well the objective was achieved. For example, if all activities must be executed within their constraints, then one possible utility function is assigning a schedule that does so a utility of 1, and one that does not a utility of 0. Similarly, the objective of maximizing reward leads to a utility that simply equals the reward earned during execution. Such problems are in general difficult to solve; in this work, however, we show our approach is effective in such domains. For multi-agent scenarios, we assume agents cooperate together to jointly achieve the plan objective.

Additionally, the objective must be affected in some way by the presence of uncertainty in the problem. If it is not, then our analysis will not provide any information helpful for probabilistically managing plan execution. Any domain that is to be considered by our approach must have an explicit model of its uncertainty. No specific representation of uncertainty is assumed, as long as the distributions representing the uncertainty can be combined, or approximately combined, according to various functions (e.g., sum, max). Two common distribution types (and the two that we consider explicitly in this document) are normal and discrete distributions. Normal distributions can be very readily summed to calculate activities' finish time distributions; they are unable, however, to exactly capture more complicated functions, such as the finish time of an activity given that it meets a deadline [Barr and Sherrill, 1999]. Discrete distributions, on the other hand, allow for the exact calculation of combinations of distributions; the size of the combination of two distributions, however, is bounded by the product of the size of the two individual distributions.

Our approach does not place any restrictions on what planner is used or how it determinizes the domain, but ideally it generates good deterministic plans for the problem of interest. This is because we do not integrate our approach with the planner, but instead layer an uncertainty analysis on top of the schedules it produces; good deterministic plans allow for more effective runtime probabilistic plan management. For the purposes of this analysis, we make the assumption that domain representations can be translated into hierarchical activity networks, which are then considered by our analysis (Section 3.2). We do not, however, assume the planner plans with activity networks; the planner can work with whatever domain representation is most convenient. We assume that schedules are represented as STNs. Discussion of our analysis does presuppose that we are considering planning problems with durative tasks and temporal uncertainty; the temporal aspect of the analysis can easily be left out, however, to perform PPM for domains without durative actions. Our analysis also assumes that plans do not branch; this assumption could be broken, however, and the approach extended to include branching or conditional plans.

Although a variety of executive systems can be used in our approach, the most suitable would provide execution results as they become known throughout execution. In particular, we assume that the executive is perfect (e.g., no latency exists, tasks always start when requested, etc.). If the executive
system broke this assumption often enough, it would adversely affect our approach, as agents would only intermittently be able to share probability information. For multi-agent domains, we also assume that communication is reliable and has sufficient bandwidth for agent communications. If communication is unreliable and readily goes down, it should be explicitly modeled and incorporated into the uncertainty analysis to increase PPM's effectiveness. The effect of communication accidentally going down can be alleviated somewhat, however, by probabilistic schedule strengthening, because agents' last schedules would have the benefit of having been strengthened. We discuss this further in Section 6.4. Ways to limit communication in domains with limited bandwidth are discussed in Sections 5.5 and 8.1. This assumption is not broken in the multi-agent domain we specifically consider, as all communication is done via message passing over a wired intranet.

In this thesis, we use PPM in two different domains. One is a single-agent domain where the objective is to execute all tasks within their temporal constraints. The second is a distributed, oversubscribed multi-agent domain where the objective is to maximize reward.

4.2 Probabilistic Analysis of Deterministic Schedules

At the core of our approach is a Probabilistic Analysis of Deterministic Schedules (PADS). There are a variety of sources of uncertainty, such as in method durations and outcomes; in our domains, duration uncertainty is represented as normal distributions, and outcome uncertainty as discrete distributions. At the highest level of abstraction, the algorithm explores all known contingencies within the context of the current schedule in order to find its expected utility. It calculates, for each activity, the probability of the activity meeting its objective, as well as its expected utility; these values, in part, make up a probability profile for each activity. PADS culminates in finding the probability profile of the taskgroup, which summarizes the expected outcome of execution and provides the overall expected utility. For domains with an explicit notion of reward, the taskgroup expected utility equals the expected earned reward. For domains where the plan objective is to execute all activities, the overall expected utility is between 0 and 1; trivially, it equals the probability that the plan objective will be met by executing the schedule.

By explicitly calculating a schedule's expected utility, PADS allows us to distinguish between the deterministic and probabilistic states of a schedule. We consider the deterministic state to be the status of the deterministic schedule, such as the deterministically scheduled start and finish times of activities. The probabilistic state, on the other hand, tracks, for each activity, values such as its finish time distribution, its probabilities of meeting deadlines, and its expected utility.
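The notion of a per-activity probability profile can be made concrete with a small sketch. The field names and the deadline-based objective below are illustrative assumptions, with distributions represented as {value: probability} dicts:

```python
# A hypothetical per-activity probability profile: the fields mirror the
# quantities described above (finish time distribution, expected utility,
# and probability of meeting the activity's objective).
from dataclasses import dataclass

@dataclass
class ProbabilityProfile:
    finish_time_dist: dict      # P(activity finishes at time t)
    expected_utility: float     # mean utility over the known contingencies
    prob_meets_objective: float # e.g., P(finish time <= deadline)

def profile_from_dists(finish_dist, util_dist, deadline):
    eu = sum(u * p for u, p in util_dist.items())
    p_ok = sum(p for t, p in finish_dist.items() if t <= deadline)
    return ProbabilityProfile(finish_dist, eu, p_ok)
```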
The deterministic state and the probabilistic state have very different notions of what it means for something to be "wrong" with the schedule. The deterministic state knows about one (or a small set of) possible execution paths, whereas the probabilistic state expands that to summarize all execution paths within the scope of the current schedule. In the deterministic state, something is wrong if execution does not follow that one path (or small set of paths) and an event happens at a non-scheduled time or in a non-planned way; in the probabilistic state, something is wrong if the average utility is below the targeted amount or the likelihood of meeting the plan objective is unacceptably low.

In general, domains can have a wide variety of sources of uncertainty. Common examples are uncertainty associated with an activity's duration, utility, and outcome (e.g., was the activity successful in achieving its goal or not), which can affect the accomplishment of their objectives. These sources of uncertainty propagate through the schedule, affecting the execution of other scheduled activities and, ultimately, determining how much utility is earned during execution. The overall structure of PADS analyzes this, starting with the earliest and least constrained activities and propagating probability values throughout the schedule to find the overall expected utility.

In distributed, multi-agent scenarios, PADS cannot be calculated all at once, since no agent has complete knowledge or control over the entire activity hierarchy. In such situations, the PADS algorithm is also distributed, with each agent calculating as much as it can for its portion of the activity network and communicating the information with others. Calculations such as expected utility loss are approximated by agents as needed during execution, because agents cannot calculate changes in expected utility for the entire schedule with only partial views of the activity network.

PADS can also provide useful information about the relative importance of activities via the expected utility loss metric. This metric measures the relative importance of an activity by quantifying how much the expected utility of the schedule would suffer if the activity failed, scaled by the activity's likelihood of failing. This metric is most useful for plan objectives that involve maximizing or minimizing some characteristic of execution; for plan objectives that involve executing all activities, however, the schedule's expected utility suffers equally from any activity failing, and so expected utility loss could simply equal the probability of the activity failing.

PADS can be used as the basis of many potential planning actions or decisions. In this document, we discuss how it can be used to strengthen plans to increase expected utility, and how it can be used as the basis of different meta-level control algorithms that intelligently control the computational resources used to replan during execution. Other potential uses are described in Chapter 8, which discusses future work.

A novelty of our algorithm is how quickly PADS can be calculated and its results utilized to
make plan improvements. Due to analytically calculating the uncertainty of a problem in the context of a given schedule, the analysis is fast: on large reference multi-agent problems with 10-25 agents, the expected utility is globally calculated in an average of 0.58 seconds, which is a very reasonable time for large, complicated multi-agent problems such as these.

4.3 Probabilistic Schedule Strengthening

While deterministic planners have the benefit of being more scalable than probabilistic planners, they make strong, and often inaccurate, assumptions on how execution will unfold. Replanning during execution can effectively counter such unexpected dynamics, but only as long as planners have sufficient time to respond to problems that occur. Even if the agent is able to sufficiently respond to a problem, utility loss may occur due to the last-minute nature of the schedule repair. Moreover, there are domains where replanning is always too late to respond to activity failure, as backup tasks would have needed to be set up ahead of time. Schedule strengthening is one way to proactively protect schedules against such contingencies [Gallagher, 2009]. In this thesis, we utilize the information provided by PADS to layer an additional probabilistic schedule strengthening step on top of the planning process, making local improvements to the schedule by anticipating problems that may arise and probabilistically strengthening the schedule to protect against them.

Probabilistic schedule strengthening improvements are targeted at activities that have a (high) likelihood of failing to contribute to the plan's objective and whose failure negatively impacts schedule utility the most. By minimizing the likelihood of these failures occurring, probabilistic schedule strengthening increases schedule robustness, thereby raising the expected utility of the schedule.

The philosophy behind schedule strengthening is that it preserves the existing plan structure while making it more robust: probabilistic schedule strengthening improves the expected outcome of the current schedule, not the schedule itself. Because schedule strengthening works within the existing plan structure, its effectiveness can be limited by the original goodness of the schedule. If a schedule has low utility to begin with, probabilistic schedule strengthening will not be that useful; if a schedule has high utility but with a high chance of not meeting its objective or earning that utility, however, probabilistic schedule strengthening can help to decrease the risk associated with the schedule and raise the schedule's expected utility. The effectiveness of schedule strengthening also depends on the structure of the domain, and on knowledge of that structure, as the specific actions that can be used to strengthen activities depend on the domain's activity network.
We list here example strengthening actions that can be applied to a variety of domains:

1. Reordering activities - Changing the execution order of activities. This has the potential to order activities so that constraints and deadlines are more likely to be met; it might also, however, increase the likelihood that activities miss their deadlines or violate their constraints.

2. Adding a redundant activity - Scheduling a redundant, backup activity under a parent OR task. This protects against a number of contingencies, such as an activity having a failure outcome or not meeting a deadline. Although an unexecuted activity does not contribute to a schedule's utility, it can decrease schedule slack, potentially increasing the risk that other activities will violate their temporal constraints.

3. Decreasing effective activity duration - Adding an extra agent to assist with the execution of an activity to shorten its effective duration [Sellner and Simmons, 2006], or to facilitate its execution in some other way. This decreases the chance that the activity will violate a temporal constraint such as a deadline; adding agents to activities or facilitating them in another way may, however, decrease schedule slack and increase the likelihood that other activities miss their deadlines.

4. Switching scheduled activities - Replacing an activity with a sibling under a parent OR or ONE task, with the goal of shortening activity duration or lowering a priori failure likelihood. This may result, however, in a lower utility activity being scheduled.

5. Removing an activity - Unscheduling an activity. Removing activities increases schedule slack, potentially increasing the likelihood that remaining activities meet their deadlines.

Strengthening actions can be prioritized by their effectiveness: in general, the most effective actions preserve the existing schedule structure (such as (1) and (3)); the least effective do not (such as (4)). PADS allows us to explicitly address the trade-offs inherent in each of these actions to ensure that all modifications make improvements with respect to the domain's objective. After any given strengthening action, the expected utility of the plan is recalculated to reflect the modifications made. If expected utility increases, the strengthening action is helpful; otherwise it is not helpful, and the action can be rejected and undone. During overall probabilistic schedule strengthening, activities are prioritized for strengthening by expected utility loss, so that the most critical activities are strengthened first.

Domains well-suited to schedule strengthening ideally have a network structure that allows for flexibility in terms of activity redundancy, substitution, or constraints; without this flexibility, it would be difficult to find ways to make schedules more robust.
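The recalculate-and-undo pattern just described can be sketched as a simple loop; `pads_expected_utility` and the action objects are hypothetical stand-ins for PADS and the domain's strengthening actions:

```python
# Accept/reject strengthening loop: try an action, recompute expected
# utility, keep the modification only if expected utility increased.

def strengthen(schedule, candidate_actions, pads_expected_utility):
    baseline = pads_expected_utility(schedule)
    for action in candidate_actions:  # assumed sorted by expected utility loss
        action.apply(schedule)
        new_eu = pads_expected_utility(schedule)
        if new_eu > baseline:
            baseline = new_eu         # helpful: keep the modification
        else:
            action.undo(schedule)     # not helpful: reject and undo
    return schedule
```

Sorting the candidates by expected utility loss before the loop realizes the prioritization described above, so the most critical activities are strengthened first.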
On reference problems in the multi-agent domain, we have shown that probabilistically strengthened schedules are significantly more robust, and have an average of 32% more expected utility than their initial, deterministic counterparts. Probabilistic schedule strengthening can be performed at any time to robustify the schedule as execution proceeds; it is best, however, when coupled with replanning to build effective, robust schedules. Replanning during execution allows agents to generate schedules that take advantage of opportunities that may arise, increasing schedule expected utility; schedule strengthening in turn makes those schedules more robust to unexpected or undesirable outcomes. Using probabilistic schedule strengthening in conjunction with replanning during execution results, on average, in 14% more utility earned than replanning alone during execution.

4.4 Probabilistic Meta-Level Control

Replanning during execution provides an effective way of responding to the unexpected dynamics of execution, either to take advantage of opportunities that arise or to minimize problems and/or conflicts during execution. There are clear trade-offs, however, in replanning too much or too little during execution. On one hand, an agent could replan in response to every update; this could lead to lack of responsiveness and a waste of computational resources. On the other hand, an agent could limit replanning, but run the risk of not meeting its goals because errors were not corrected or opportunities were not taken advantage of.

Ideally, an agent would always and only replan (or take some other planning action, like strengthening) when it was useful, either because there is a potential new opportunity to leverage or a potential failure to avoid. At other times, the agent would know that the schedule had enough flexibility – temporal or otherwise – to absorb unexpected execution outcomes as they arise without replanning. The question of whether replanning would be beneficial, however, is difficult to answer, in large part because it is difficult to predict whether replanning is beneficial without actually doing it. As an alternative to explicitly answering the question, we use our probabilistic analysis to identify times where there is potential for replanning to be useful, and times when it is not. This eliminates potentially unnecessary and expensive replanning, while also ensuring that replans are done when they are probably necessary or may be useful. The probabilistic control algorithm thus distinguishes between times when it is likely necessary (or beneficial) to replan, and times when it is not. There are several different ways to use PADS to drive the meta-level decision of when to replan.
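One simple way to operationalize this decision is a threshold rule on the schedule's expected utility, optionally sensitive to how the value changes over time. This is a hypothetical sketch; the threshold and drop values are illustrative, not the thesis's:

```python
# A toy meta-level control rule: replan when expected utility is below an
# absolute threshold, or when it has just dropped sharply.

def should_replan(eu_history, threshold=0.9, drop=0.05):
    """eu_history: expected-utility values over time, most recent last."""
    if eu_history[-1] < threshold:
        return True
    return len(eu_history) > 1 and eu_history[-2] - eu_history[-1] > drop
```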
When designing meta-level control strategies, the objective of the plan can help lead to finding the best strategy. For planning problems with a clear upper bound on utility, such as ones in which the objective is to execute all or a set of activities within their temporal constraints, a good strategy is to replan whenever the expected utility of the entire schedule falls below a threshold. For oversubscribed problems with the goal of maximizing reward, however, the maximum feasible utility is typically not known, and a better strategy might be a learning-based approach that learns when to replan based on how activities' expected utilities change over time.

Whether the approach is centralized or distributed can also affect the choice of a control strategy. Strategies can differ in which activity probability profiles they consider when deciding whether to replan. For centralized applications, looking at the plan as a whole may suffice for deciding when to replan. For distributed applications, when agents decide to replan independently of one another, the plan as a whole may be a poor basis for a control algorithm, because the state of the joint plan does not necessarily directly correspond to the state of an agent's local schedule and activities. Instead, in distributed cases, it may be more appropriate to consider the local probabilistic state, such as the probability profiles of methods in an agent's local schedule. Which component of a probability profile is utilized (e.g., probability of meeting objective, expected utility, etc.) can also vary between strategies. Additionally, different strategies can either consider activities' profiles on their own, or look at how the profiles change over time.

The control strategy can also be used to limit not only the number of replans, but also the amount of the current schedule that is replanned. In situations with high uncertainty, always replanning the entire schedule means that tasks far removed from what is currently being executed are replanned again and again as the near term causes conflicts, potentially causing a waste of computational effort. To this end, we introduce an uncertainty horizon, which uses PADS to distinguish between portions of the schedule that are likely to need to be replanned at the current time, and portions that are not. This distinction is made based on the probability that activities beyond the uncertainty horizon will need to be replanned before executing them.

Experiments in simulation for the single-agent domain have shown a 23% reduction in schedule execution time (where execution time is defined as the schedule makespan plus the time spent replanning during execution) when probabilistic meta-level control is used to limit the amount of replanning and to impose an uncertainty horizon, versus when the entire plan is replanned whenever a schedule conflict arises. Experiments in the multi-agent domain show a 49% decrease in plan management time when our probabilistic meta-level control strategy is used, as compared to a plan management strategy that replans according to deterministic heuristics, with no significant change in utility earned.
When all components of PPM are used, the average utility earned in the multi-agent domain increases by 14%, with an average overhead of only 18 seconds per problem, which corresponds to only 2% of the total time of execution. Overall, the benefit of PPM is clear.

In PPM, all these components come together to probabilistically and effectively manage plan execution. A flow diagram of PPM is shown in Figure 4.1. The default state of an agent is a "wait" state, where it waits to hear of changed information, either from the simulator specifying the results of execution or, in multi-agent scenarios, messages from other agents specifying changes to their probabilistic information. Any time such an update is received, the agent updates its probability profiles via PADS (described in Chapter 5); in this way, PADS calculates and updates probabilistic information about the current schedule as execution progresses. In multi-agent scenarios, agents would communicate these changes to other agents. Then, probabilistic meta-level control (PMLC in Figure 4.1, described in Chapter 7) decides whether to replan (or probabilistically strengthen the schedule) in response to the changes in the agents' probability profiles, in order to control the amount of replanning. Probabilistic meta-level control makes this decision with the goal of maintaining high-utility schedules while also limiting unnecessary replanning. Each time that a replan is performed, PADS first updates the probability profiles of the new schedule, and then a probabilistic schedule strengthening pass is performed (PSS in Figure 4.1, described in Chapter 6) in order to improve the expected utility of the new schedule. Note that this diagram illustrates only the flow of PPM in order to demonstrate its principles; it does not, for example, show the flow of information to or from agents' executive systems.

In the next chapter, we discuss the PADS algorithm for both the single-agent and multi-agent domains. Chapter 6 details our probabilistic schedule strengthening approach, while Chapter 7 explains in depth the probabilistic meta-level control strategy. Finally, Chapter 8 presents our conclusions and future work.
Figure 4.1: Flow diagram for PPM.
Chapter 5

Probabilistic Analysis of Deterministic Schedules

This chapter discusses in detail how to perform a Probabilistic Analysis of Deterministic Schedules (PADS), which is the basis of our probabilistic plan management approach. We developed versions of this algorithm for both of our domains of interest. Section 5.1 provides an overview of how the probability analysis is performed in the two domains we consider, while Sections 5.2 and 5.3 provide details about the algorithms. Finally, Section 5.4 presents PADS experimental results. The casual reader, after reading the overview section, can safely pass over the two more detailed sections that follow.

5.1 Overview

The goal of the PADS algorithm is to calculate, for each activity in a problem's activity network, a probability profile, which includes the activity's probability of meeting its objective (po) and its expected utility (eu). The algorithm propagates probability profiles through the network, culminating in finding the probability profile of the taskgroup, which summarizes the expected outcome of execution and provides the overall expected utility. For domains with temporal constraints like the two we consider, a method's finish time distribution (FT) is typically necessary for calculating the probability that the method meets the plan objective; its start time distribution (ST) and duration distribution are, in turn, necessary for calculating its finish time distribution. Along with the method's probability of meeting its objective and expected utility, we call these values, correspondingly, the method's probability profile. Note that in this and following chapters we use uppercase variables to refer to distributions or sets, and lowercase variables to refer to scalar values. We also use lowercase variables to refer to methods or activities, but uppercase variables to refer specifically to tasks.

Algorithm 5.1 provides the general, high-level structure for calculating a method's probability profile. It can be adjusted or instantiated differently for different domain characteristics, as we will see in Sections 5.2 and 5.3, when we discuss PADS in detail for the two domains.

Algorithm 5.1: Method probability profile calculation.

  Function: calcMethodProbabilityProfile(m)
  1  // calculate m's start time distribution
     ST(m) = calcStartTimes(m)
  2  // calculate m's finish time distribution
     FT(m) = calcFinishTimes(m)
  3  // calculate m's probability of meeting its objective
     po(m) = calcProbabilityOfObjective(m)
  4  // calculate m's expected utility
     eu(m) = calcExpectedUtility(m)

The order in which activities' profiles are calculated is based on the structure of activity dependencies in the activity network. In general, a method is dependent on its predecessor and its prerequisites, as it cannot start until those activities finish, as well as any other dependencies (such as soft NLEs) that may affect it; its profile depends on the profiles of those activities and cannot be calculated until they are. Once these profiles are known, a method's start time distribution can be calculated (Line 1). Typically, calculating a method's start time distribution includes finding the maximum of its earliest start time, est, and its predecessor's and prerequisite activities' finish time distributions. A method's finish time distribution is found next (Line 2), and generally includes summing its start time distribution and duration distribution. At any point during these calculations, a method may be found to have a nonzero probability of violating a constraint or otherwise not meeting its objective, leading to a failure; similarly, if a method is found to have a nonzero a priori probability of failing, such information should be incorporated into the po(m) calculation. These cases are collected and combined to find m's probability of meeting its objective (Line 3). Finally, a method's expected utility is calculated (Line 4); its expected utility depends on po(m) and its utility.
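The start and finish time steps of the method-profile calculation can be sketched with discrete distributions. The following is a minimal illustration under simplifying assumptions of our own (distributions as dicts mapping integer times to probabilities, a single lft constraint); the function names are ours, not the thesis's.

```python
# Minimal discrete-distribution sketch of the method-profile steps:
# ST = max(est, predecessor FT), FT = ST + DUR, po = P(FT <= lft).
# Distributions are dicts {time: probability}; names are illustrative.

def shift_max(dist, floor):
    """Start time: clamp a predecessor finish time distribution at est."""
    out = {}
    for t, p in dist.items():
        out[max(t, floor)] = out.get(max(t, floor), 0.0) + p
    return out

def add_dists(a, b):
    """Finish time: sum of two independent discrete distributions."""
    out = {}
    for t1, p1 in a.items():
        for t2, p2 in b.items():
            out[t1 + t2] = out.get(t1 + t2, 0.0) + p1 * p2
    return out

def prob_meets(dist, lft):
    """po: probability of finishing by the latest finish time."""
    return sum(p for t, p in dist.items() if t <= lft)
```

For example, a predecessor finishing at 4 or 6 with equal probability, an earliest start time of 5, and a fixed duration of 2 give a finish time of 7 or 8, and a po of 0.5 against an lft of 7.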
Its utility depends on the plan objective. If the plan objective is to execute all scheduled methods, individual methods should all have the same utility; if, instead, the plan objective includes an explicit model of reward or utility, a method's utility is defined by that.

Probability profiles are found differently for tasks. The general, high-level algorithm is shown in Algorithm 5.2. The algorithm calculates tasks' probability profiles based on whether the task is an AND, OR or ONE task. In this algorithm, the function scheduledChildren returns the scheduled children of a task. For tasks, start time distributions are ignored, as they are not necessary for calculating a task's finish time distribution; for AND and OR tasks, a task's finish time distribution depends on the finish time distributions of its children. A task's finish time distribution may only be necessary to explicitly calculate if there are direct relationships between it or its ancestors and other activities. Note that the algorithm does not explicitly check the number of children that are scheduled. That is because we assume that unscheduled tasks have a po and eu of 0; this property means that the equations in the algorithm evaluate to the desired values without such a check, so we do not show that check here.

Algorithm 5.2: Task probability profile calculation.

  Function: calcTaskProbabilityProfile(T)
  1   switch Type of task T do
  2     case AND
  3       FT(T) = calcANDTaskFinishTimes(T)
  4       po(T) = calcANDTaskProbabilityOfObjective(T)
  5       eu(T) = calcANDTaskExpectedUtility(T)
  6     case OR
  7       FT(T) = calcORTaskFinishTimes(T)
  8       po(T) = calcORTaskProbabilityOfObjective(T)
  9       eu(T) = calcORTaskExpectedUtility(T)
  10    case ONE
  11      if |scheduledChildren(T)| == 1 then
  12        c = sole-element(scheduledChildren(T))
  13        FT(T) = FT(c)
  14        po(T) = po(c)
  15        eu(T) = eu(c)
  16      else
  17        po(T) = eu(T) = 0
  18      end
  19  end

An AND task's finish time distribution includes calculating the max of its children's, as it does not meet its plan objective until all children do (Line 3). Its probability of meeting the plan objective should equal the probability that all children do (Line 4). An AND task's expected utility is generally a function of the utility of its children; typically, this function depends on the domain specifics (such as what the task's uaf is) (Line 5). An OR task's finish time distribution includes calculating the min of its scheduled children's, as it meets its plan objective when its first child does (Line 7). Its probability of meeting the plan objective is the probability that at least one child does (which is equivalent to the probability that not all scheduled children do not) (Line 8). An OR task's expected utility is generally a function of the utility of its children; typically, this function depends on the domain specifics (such as what the task's uaf is) (Line 9). Trivially, ONE tasks' profiles equal those of their sole scheduled child. Typically, planners do not schedule more than one child under a ONE task; we therefore omit considering, for cases where more than one child is scheduled, the probability that only one succeeds, and say instead that in such a case the task has an expected utility of 0.

Algorithms 5.1 and 5.2, put together, enable the expected utility of the entire schedule to be found.

5.1.1 Single-Agent PADS

Recall the representation of the single-agent domain: at the root is the AND taskgroup, and below it are the ONE tasks, each with their possible child methods underneath. The plan objective is to execute one child of each task within its imposed temporal constraints. We define plan utility for this domain to be 1 if all tasks have one child method that executes within its constraints, or 0 if they do not; po(a) equals the probability that an activity a meets this objective. Due to this simple plan structure, PADS for this domain is fairly intuitive. It moves linearly through the schedule, calculating the probability profiles of each method and passing them up to their parents, until all tasks' profiles have been found and the expected utility of the schedule is known.

5.1.2 Multi-Agent PADS

The overall objective of our multi-agent domain is to maximize utility earned. Therefore, we define po(a) to equal the probability that an activity a earns positive utility. While calculating probability profiles for this domain, a method's utility is trivially its domain-defined utility; its expected utility is its utility multiplied by the probability that it meets its objective. In subsequent equations, we refer to the expected utility of the schedule as a whole as eu(TG), where TG is the taskgroup.
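As a concrete illustration of the AND and OR po rules discussed above, the following sketch computes a task's probability of meeting its objective from its scheduled children's po values, assuming (for this illustration only) that the children's outcomes are independent. The function names are ours.

```python
# Sketch of the AND/OR task po rules, assuming independent children:
# an AND task needs every scheduled child to meet its objective; an OR
# task needs at least one (1 minus the probability that all fail).

from functools import reduce

def po_and(child_pos):
    return reduce(lambda acc, p: acc * p, child_pos, 1.0)

def po_or(child_pos):
    # probability that not all scheduled children fail
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), child_pos, 1.0)
```

As the example in Section 5.3 shows, the independence assumption does not always hold for po values, which is why the multi-agent version of PADS tracks event dependencies explicitly.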
This domain is inherently more complex than our single-agent domain, leading to a number of extensions to the simple algorithms and equations presented above. First, the calculations of tasks' expected utilities depend on the tasks' uaf types, not just node types (i.e., AND, OR or ONE). Second, the disables, enables, facilitates and hinders relationships need to be factored in at various stages in the calculations as appropriate: disablers affect whether a method starts, and facilitators and hinderers affect its finish time and expected utility calculations. The analysis begins at the lowest method level, and propagates profiles both forward and upward through agents' schedules and the activity hierarchy until all activities' (including the taskgroup's) probability profiles have been calculated and the schedule's expected utility is known.

We have developed two versions of PADS for this domain. The first is intended to be performed offline and centrally, in order to perform plan management before the initial global schedule is distributed to agents. The second version is used for probabilistic plan management during distributed execution. When PADS is performed globally, calculations begin with the first method(s) scheduled to execute, and later activities' profiles can be calculated as soon as the profiles of all activities they are dependent on have been determined. In the distributed case, the same basic propagation structure takes place; however, agents can update only portions of the activity network at a time and must communicate (via message passing) relevant activity probability profiles to other agents as the propagation progresses.

Figure 5.1 shows a schedule and the resulting activity dependencies and profile propagation order for the small example shown in Chapter 3 (on page 39). Figure 5.2 shows how the same profile propagation is done in a distributed way. Although it is not illustrated in the diagram for visual clarity, after agent Buster finishes its calculations, it would also pass the profiles of T2 and the taskgroup back to agent Jeremy, as Jeremy has remote copies of them.

5.2 Single-Agent Domain

We have implemented PADS for a single-agent domain [Hiatt and Simmons, 2007]. Recall the characteristics of this domain: the plan's objective is that all tasks, each with exactly one method underneath them, execute without violating any temporal constraints. Temporal constraints are in the form of minimal and maximal time lags between tasks, as well as a final deadline. The only source of uncertainty is in method duration. We use a fixed-times planner to generate schedules, and we expand solutions to be flexible-times by translating them into an STN. In the STN representing the deterministic schedules of this domain, time bounds are based only on methods' scheduled durations; the edge representing a method's duration constraint specifies lower and upper bounds of [durm, durm] (see Chapter 3 for this description). Recall from our discussion on STNs (on page 31) that if a method does not begin by its latest start time (lst), the STN will become inconsistent.
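The propagation order described above, in which an activity's profile is computed as soon as the profiles of everything it depends on are known, can be sketched as a simple dependency-driven ordering. This is an illustrative sketch with hypothetical names, not the thesis's propagation code.

```python
# Sketch of dependency-driven profile propagation order.  `deps` maps
# each activity to the set of activities whose profiles it needs first;
# activities become "ready" once all of their dependencies are done.

def propagation_order(deps):
    done, order = set(), []
    pending = set(deps)
    while pending:
        ready = [a for a in pending if deps[a] <= done]
        if not ready:
            raise ValueError("cyclic dependencies")
        for a in sorted(ready):   # sorted only for deterministic output
            order.append(a)
            done.add(a)
            pending.remove(a)
    return order
```

In the distributed case, each agent would run the same readiness test over its own portion of the network, treating profiles received by message passing as "done".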
Figure 5.1: The activity dependencies (denoted by solid arrows) and propagation order (denoted by circled numbers) that result from the schedule shown for the problem in Figure 3.
Figure 5.2: The same propagation as Figure 5.1, but done distributively. Jeremy first calculates the profiles for its activities, m1 and T1, and passes the profile of T1 to Buster. Buster can then finish the profile calculation for the rest of the activity network.
If a method misses its lst, however, the schedule may still execute successfully, such as if the method begins after its lst but executes with a duration less than durm. Ideally, this contingency would be captured by the probabilistic analysis. To do so, before performing its analysis, PADS expands the duration edges in the STN to include the range of possible integer durations for methods, bounded by

    [max(1, µ(DUR(m)) − 3 · σ(DUR(m))), µ(DUR(m)) + 3 · σ(DUR(m))]

This covers approximately 99.7% of methods' duration distributions. We use integer durations because methods' executed durations are rounded to the nearest integer during simulation. We mark this modified STN, and the time bounds generated from it (est, lst, etc.), with a distinguishing symbol, and refer to them below as the revised STN and revised bounds. By expanding method durations in the STN, we can interpret the resulting time bounds differently: now, if a method does not start by its revised lst, it is almost certain that the schedule cannot successfully be executed as-is. This allows PADS to more accurately calculate the schedule's probability of succeeding.

5.2.1 Activity Dependencies

In this domain, since all tasks need to be executed within their constraints, the probability that the schedule meets its objective for a problem with T tasks is

    po(TG) = po(m1, ..., mT)

where po(mi) is the probability that method mi executes without violating any constraints. That is, the schedule's po value is calculated by multiplying the po values for the individual ONE tasks, which equal the po values of their sole scheduled child, as shown in Section 5.1. The probabilities po(m1), ..., po(mT) are not independent, however: since the agent can execute only one thing at a time, the successful execution of any method is dependent on the successful execution of its predecessors. Conditional probability allows us to rewrite the equation as:

    po(m1, ..., mT) = ∏_{i=1}^{T} po(mi | m1, ..., mi−1)

That is, ideally, a method's po value should equal the probability that it executes within all its constraints given that all previous ones do. Note that we can avoid explicit consideration of prerequisite relationships: they are implicitly considered by the above consideration that all of a method's predecessors must successfully execute before it can, because if a predecessor does not execute successfully (for example by violating its lft), then execution of the current schedule no longer has any possibility of succeeding as-is.
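The chain-rule decomposition above can be sketched numerically: the schedule's po is the running product of each method's conditional probability of meeting its constraints given that all earlier methods did. The numbers below are hypothetical.

```python
# Numeric sketch of po(TG) = product over i of po(m_i | m_1 .. m_{i-1}).
# Each entry is a method's conditional probability of executing within
# its constraints given that every earlier method did (hypothetical
# values for illustration).

def po_taskgroup(conditional_pos):
    total = 1.0
    for p in conditional_pos:
        total *= p
    return total
```

For instance, conditional po values of 0.9, 0.95 and 1.0 give a schedule po of 0.855, which (with unit method utilities) is also the schedule's expected utility.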
5.2.2 Probability Profiles

As the structure of the entire activity network is standard for all problems in this domain, we are able to present a closed-form PADS algorithm. Algorithm 5.3 shows the pseudocode for this. Note how this algorithm provides domain-specific implementations of the functions in Algorithm 5.1, and reuses its calculation of ONE tasks' probability profiles. Since the distributions in this domain are represented as normal distributions, we present them in terms of their mean and variance; distributions are combined according to the equations presented in Section 3.1 (on page 35). The function schedule in this and later algorithms returns the agent's current schedule, with methods sorted by their scheduled start time, and children and scheduledChildren return the children and scheduled children of a task, respectively.

The algorithm iterates through each scheduled method and finds its probability of violating temporal constraints given that all previous methods successfully complete. To do so, it first calculates the method's start time distribution, assuming all prior methods have finished successfully: its approximate start time distribution is its predecessor's finish time distribution, truncated at its lower bound by est(mi), as reported by the revised STN (Line 2). We use the truncate function to approximate the max function (see Section 3.1 for specifics). The method's finish time distribution is the sum of its start time and duration distributions (Line 3). The probability of the method ending at a valid time equals the probability of it ending in the interval defined by its earliest finish time eft and its latest finish time lft (Line 4); this value is m's po value. Assigning each method an arbitrary constant utility of 1, mi's expected utility equals po(mi) (Line 5). Finally, the method's finish time distribution is truncated to approximate its finish time distribution given it succeeds, FTs, in order to pass that forward to future methods (Line 6). It is not necessary to find mi's failure finish time distribution, as if any method fails the schedule cannot execute successfully as-is.

After the profiles of all methods have been found, they are passed up to the task level. All tasks are ONE tasks, so their profiles equal that of their sole scheduled child (Lines 9-11); if no child (or more than one child) is scheduled, the tasks' po and eu values are 0 (Lines 12-14). We do not need to calculate the finish time distributions of the ONE tasks, because neither they nor their ancestors have any direct dependencies with other activities. Finally, after this has been done for all tasks, the individual task po values can be multiplied to find the overall probability of the schedule meeting its objective (Line 17). This, trivially, equals the schedule's expected utility (Line 18).
Algorithm 5.3: Taskgroup expected utility calculation for the single-agent domain with normal duration distributions and temporal constraints.

  Function: calcExpectedUtility
  1   foreach method mi ∈ schedule do
  2     // calculate mi's start time distribution
        ST(mi) = truncate(est(mi), ∞, µ(FTs(mi−1)), σ(FTs(mi−1)))
  3     // calculate mi's finish time distribution
        FT(mi) = ST(mi) + DUR(mi)
  4     // calculate mi's probability of meeting its objective
        po(mi) = normcdf(lft(mi), µ(FT(mi)), σ(FT(mi))) − normcdf(eft(mi), µ(FT(mi)), σ(FT(mi)))
  5     // calculate mi's expected utility
        eu(mi) = po(mi)
  6     // adjust the FT distribution to be mi's FT distribution given it succeeds
        FTs(mi) = truncate(eft(mi), lft(mi), µ(FT(mi)), σ(FT(mi)))
  7   end
  8   // calculate ONE tasks' profiles
  9   foreach T ∈ children(TG) do
  10    if |scheduledChildren(T)| == 1 then
  11      po(T) = po(sole-element(T)); eu(T) = eu(sole-element(T))
  12    else
  13      po(T) = 0
  14      eu(T) = 0
  15    end
  16  end
  17  po(TG) = ∏_{T ∈ children(TG)} po(T)
  18  eu(TG) = po(TG)
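An executable sketch of Algorithm 5.3's core loop is shown below. It is illustrative only: we stand in for the thesis's truncate function by simply clamping the start time mean at est (a cruder approximation), so the numbers it produces are not the thesis's.

```python
# Executable sketch of Algorithm 5.3's loop for normal distributions.
# The truncate step is approximated by clamping the mean at est, which
# is cruder than the truncation described in the text.

import math

def normcdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def schedule_po(methods):
    """methods: list of dicts with est, eft, lft, dur_mu, dur_sd."""
    po_tg = 1.0
    prev_mu, prev_sd = 0.0, 0.0       # finish time of the (empty) predecessor
    for m in methods:
        st_mu = max(m["est"], prev_mu)        # crude stand-in for truncate
        st_sd = prev_sd
        ft_mu = st_mu + m["dur_mu"]           # FT = ST + DUR
        ft_sd = math.sqrt(st_sd ** 2 + m["dur_sd"] ** 2)
        po_m = normcdf(m["lft"], ft_mu, ft_sd) - normcdf(m["eft"], ft_mu, ft_sd)
        po_tg *= po_m                         # po(TG) multiplies method po values
        prev_mu, prev_sd = ft_mu, ft_sd
    return po_tg
```

With unit method utilities, the returned po(TG) also equals the schedule's expected utility, mirroring Lines 17-18.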
5.3 Multi-Agent Domain

This section discusses the implementation of PADS for the multi-agent domain we consider [Hiatt et al., 2008, 2009]. Recall that the goal of this domain is to maximize utility earned; we say, therefore, that the probability of meeting the plan objective equals the probability of earning positive utility. The multi-agent domain has many sources of uncertainty and a complicated activity structure. Unlike the single-agent domain, there is a more complicated dependency structure, due to more types of NLEs and, more specifically, NLEs between activities on different agents' schedules. There are also more ways for activities to fail than violating their temporal constraints, including other activities explicitly disabling them. Because of the varying activity network structure and dependencies between activities, we do not present a closed-form formula representing po(TG).

We assume a partial-order deterministic planner exists, which outputs schedules represented as STNs. Unlike the single-agent domain, when performing PADS we do not expand the STN to include all possible method durations; this design decision was made to facilitate integration with the existing architecture [Smith et al., 2007]. Before execution begins, PADS is used to find the probability profiles of the entire schedule. During execution, feedback of execution results, i.e., the methods' actual, executed durations, are asserted into the STN as they become known. As PADS continues to update its probability values, it skips over activities that have already been executed and finds only the profiles for activities that have not yet completed. These profiles will be used as part of a probabilistic meta-level control algorithm, which will be described in Chapter 7.

The distributions we work with in this domain are discrete. PADS combines the distributions according to the equations defined in Section 3.1. As part of its algorithms, PADS also iterates through distributions; we use the foreach loop construct to designate this, e.g., "foreach ft ∈ FT(m)", which iterates through each item ft in the distribution FT(m). Inside the loop, PADS will also refer to p(ft); this is the probability of ft occurring according to FT(m).

When calculating activity finish time profiles for this domain, we make the simplifying assumption that a method's nonlocal source NLEs' finish times are independent of one another (e.g., a method's enabler is different from its disabler or facilitator). This assumption is rarely broken in practice (e.g., an enabler rarely also serves as a facilitator).
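The discrete representation just described can be sketched concretely: a distribution is a collection of (value, probability) pairs, "foreach ft ∈ FT(m)" is an ordinary loop, and p(ft) is the paired probability. The helpers below are our own illustrative names; dist_max mirrors the max operation needed when an activity must wait for the later of two finish times.

```python
# Sketch of discrete finish time distributions as (value, probability)
# pairs.  dist_max takes the max of two independent distributions;
# prob_after sums p(ft) over the outcomes past a deadline.

def dist_max(dist_a, dist_b):
    """Max of two independent discrete distributions."""
    out = {}
    for va, pa in dist_a:
        for vb, pb in dist_b:
            v = max(va, vb)
            out[v] = out.get(v, 0.0) + pa * pb
    return sorted(out.items())

def prob_after(dist, deadline):
    """Probability mass of finishing strictly after a deadline."""
    return sum(p for ft, p in dist if ft > deadline)
```

Iterating over a distribution in this form is exactly the "foreach ft ∈ FT(m)" construct: each loop step sees one ft and its p(ft).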
5.3.1 Activity Dependencies

Activity dependencies and po calculations need to be handled differently for this domain than in the single-agent domain. When calculating activity finish time profiles, we also assume that all of a task's children's finish time distributions are independent of one another. This assumption fails if, for example, both children are enabled by the same enabler; this approximation was found to be accurate enough for our purposes, however, and the extra computational effort required to sort out these dependencies was deemed unnecessary (see Section 5.4). We also show an example supporting this decision at the end of this section. We do not, however, make this type of independence assumption when calculating po values, as it is too costly with respect to accuracy, and so put forth the computational effort to keep track of these dependencies.

To see why, consider the activity network and associated schedule shown in Figure 5.1 (on page 64). po(T1) = 0.8 because its only scheduled child is m1, and m1 only fails with a 20% likelihood. Both m3 and m4 are enabled by T1. Since neither m3 nor m4 will fail on their own, and neither has a probability of missing its deadline, po(m3) = 0.8 and po(m4) = 0.8, as they both can earn utility only if T1 does. It is clear from this example that po(T2) should also equal 0.8, as both m3 and m4 must earn utility for T2 to earn utility and, intuitively, this happens with a probability of 0.8. Assuming that these values are independent, however, means that po(T2) = 0.8 · 0.8 = 0.64, when in fact it should equal 0.8: essentially, we are double-counting T1's probability of failing in po(T2). We deem this error unacceptably high.

To track the dependencies, our representation of po for all activities is based on a utility formula, UF(a), which indicates what events must occur during execution for activity a to earn utility and how to combine the events (e.g., at least one needs to happen, or all must happen). We define events as possible contingencies of execution, such as a method ending in a failure outcome, a method's disabler ending before the method begins, or a method missing its deadline. UF formulas have two parts: the event formula component, UFev, and the value assignment component, UFval. The event formula component specifies which events must come true in order for the activity to earn utility; the value assignment component specifies probability assignments for the events. The formula assigns probabilities to each individual event, and can be evaluated to find an overall probability of earning utility.

As an example of how the utility formula is handled, consider again the example in Figure 5.1. We define UF(T1) = (utilm1, (utilm1 ← 0.8)). This formula says, essentially, that "m1 must execute with positive utility for T1 to earn utility." The event variable utilm1 designates "the event of m1 earning utility" and is used in place of m1 to avoid confusion; the value assignment says that the event of m1 earning utility will occur with probability 0.8 (due to m1's possible failure outcome). Formally, UFev(T1) = utilm1 and UFval(T1) = (utilm1 ← 0.8). UF(T1) can be evaluated to find po(T1), which here, trivially, is 0.8.

Since neither m3 nor m4 will fail on their own, but both depend on T1 earning utility, their formulas mimic that of T1: UF(m3) = UF(m4) = (utilm1, (utilm1 ← 0.8)); here, too, just m1 must earn utility.
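The key property of this evaluation, that a shared event variable is only counted once, can be sketched as follows. Formulas are nested tuples of our own devising (an event name, or ("and", ...) / ("or", ...)); evaluating by enumerating joint outcomes of the distinct events avoids the double-counting that naive probability multiplication produces.

```python
# Sketch of utility-formula evaluation with shared event variables.
# A formula is an event name, or ("and", f1, f2, ...) / ("or", ...).
# Enumerating joint outcomes of the *distinct* events, rather than
# multiplying subformula probabilities, avoids double-counting.

from itertools import product

def events_in(f):
    if isinstance(f, str):
        return {f}
    return set().union(*(events_in(sub) for sub in f[1:]))

def holds(f, world):
    if isinstance(f, str):
        return world[f]
    results = [holds(sub, world) for sub in f[1:]]
    return all(results) if f[0] == "and" else any(results)

def evaluate(f, values):
    names = sorted(events_in(f))
    total = 0.0
    for outcome in product([True, False], repeat=len(names)):
        world = dict(zip(names, outcome))
        p = 1.0
        for n in names:
            p *= values[n] if world[n] else 1.0 - values[n]
        if holds(f, world):
            total += p
    return total
```

Evaluating ("and", "util_m1", "util_m1") with util_m1 ← 0.8 yields 0.8, the correct po, where naive multiplication of the two conjuncts would yield 0.64.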
We can now define UF(T2). For T2 to earn utility, both m3 and m4 need to; UFev(T2) is therefore the and of UFev(m3) and UFev(m4), as both m3 and m4 must earn utility for T2 to earn utility, and UFval(T2) is the union of UFval(m3) and UFval(m4), ensuring all events have probability value assignments. So, UF(T2) = ((and utilm1 utilm1), (utilm1 ← 0.8)). This can be simplified to UF(T2) = (utilm1, (utilm1 ← 0.8)), which evaluates to po(T2) = 0.8, the correct value.

More generally, in this domain there are five events that affect whether a method earns utility: (1) whether it is disabled before it begins to be executed; (2) whether its enablers earn positive utility; (3) whether it begins executing by its lst; (4) whether it meets its deadline; and (5) whether its execution yields a successful outcome. These events have event variables associated with them (e.g., the sucm variable represents a method m executing with a successful outcome), which can be combined together as an event formula that represents what must happen for a method to earn utility; that event formula can then be evaluated in conjunction with value assignments to find the method's probability of earning utility.

We represent FT(a) in terms of two components: FTu(a), or a's finish time distribution when it earns utility, and FTnu(a), or a's finish time distribution when it does not earn utility. Together, the probabilities of FTu(a) and FTnu(a) sum to 1; when we refer to FT(a) alone, it is the concatenation of these distributions. In this domain, in contrast to the single-agent domain, it is necessary to explicitly calculate methods' failure finish time distributions to propagate forward, because if a method ends in a failure outcome, violates a domain constraint or cannot be executed, executing the remainder of the schedule as-is could still earn utility. A method's finish time is either when it actually finishes (whether it earns utility or not) or, if it is not able to be executed, when the next activity could begin executing. For example, if, while a method's predecessor is executing, the agent learns that the method has been disabled by an activity d, the "finish time" of the disabled method is max(FT(pred(m)), ft(d)), as that is the earliest time that the subsequently scheduled method (on the same agent's schedule) could start. The function pred returns the predecessor of a method (the method scheduled before it on the same agent's schedule).

We now return to our assumption that a task's children's finish time distributions are independent of one another. When calculating the finish time distribution of task T2 from the above example, because T2 is an AND task, FTu(T2) equals the maximum of its children's FTu distributions, since it does not earn utility until both children have. Say, hypothetically, that we have already calculated that FTu(m3) = {{118, 0.3}, {120, 0.5}} and FTu(m4) = {{127, 0.6}, {129, 0.2}}. Intuitively, the finish time distribution of T2 should be {{127, 0.6}, {129, 0.2}}, because m3 and m4 always either both earn utility or both do not earn utility. If we find the maximum of these distributions as if they were independent, however, we get FTu(T2) = {{127, 0.48}, {129, 0.16}}: we would double-count the probability that T1 fails and does not enable either m3 or m4. To compensate, we scale FTu(T2) by our previously calculated po(T2), so that the distribution of T2's finish time when it earns utility sums to po(T2). Then, we get the desired value of FTu(T2) = {{127, 0.6}, {129, 0.2}}. Although this approach does not always result in the exactly correct finish time distributions, this example illustrates why we make the independence assumption for tasks' finish time distributions, and introduces the extra step we take of scaling the distributions by activities' po values.

5.3.2 Probability Profiles

We define the algorithms of PADS recursively, with each activity calculating its profile in terms of the profiles of other activities it is connected to. The complex activity dependency structure for this domain also means that the high-level algorithms presented in Algorithms 5.1 and 5.2 (on pages 60 and 61) must be elaborated upon in order to calculate probability profiles for activities in this domain; we present much of the probability profile calculation in this domain in terms of providing appropriate individual functions to instantiate these algorithms. The following algorithms are used in both the global and distributed versions of PADS for this domain; the global and distributed versions themselves are addressed in later sections.

Method Probability Profile Calculations

The calculation of methods' profiles for this domain is the most complicated, as all activity dependencies (e.g., predecessors, enablers, disablers, facilitators and hinderers) are considered at this level. Recall again the five events determining whether a method earns utility: (1) it is not disabled; (2) it is enabled; (3) it meets its lst and begins executing; (4) it meets its deadline; and (5) it does not end in a failure outcome. Of these events, we assume, as previously stated, that (1) and (2) are independent; (5) is independent of the others. Event (3), specifically, depends on whether the method is able to begin execution and so is contingent on (1) and (2); (4), similarly, is dependent on (3). To find the overall event formula of m earning utility, we find the probability value assignments of these individual events given these dependencies. We list them explicitly below, along with the associated event variable:

1. dism: the event of m being disabled before it starts.
72 . (3) it meets its lst and begins executing. however.3. predecessors. because m3 and m4 always either both earn utility. facilitators and hinderers) are considered at this level. disablers.48}. we get the desired value of F Tu (T 2) = {{127. dism . and (5) it does not end in a failure outcome..5 Probabilistic Analysis of Deterministic Schedules {{127.2}}. as all activity dependencies (e.g. 0. 5.6}. 0. (4) is dependent on (3).1 and 5.3. we also. as previously stated. To ﬁnd the overall event formula of m earning utility. 0. assume that (1) and (2) are independent. 0.
Overall. Two of these events. The values of the other three. 4. sucm . Therefore as we calculate the method’s start time and ﬁnish time distributions. as execution continues in this domain past a method failure. where en1. f tm . In the algorithms presented below.The event of m being enabled. 5. holds.1 consideration for what happens when a method’s enablers fail. 3.. a method’s start time distribution is a function of three things: (1) the ﬁnish time distribution of the method’s predecessor.The event of m not ending in a failure outcome. however.3 MultiAgent Domain 2.. etc.The event of m meeting its deadline given that it begins execution.4. en2.2. Of the ﬁve events determining whether a method earns utility.. (2) the ﬁnish time distributions of its enablers given that they earn utility. these functions in general do not have return values. and (3) the method’s 73 . For example. Start Time Distributions We assume in our calculations that methods begin at their est. The algorithm for calcStartTimes is shown in Algorithm 5. enm = (and U Fev (en1) U Fev (en2) . we also calculate the value assignments (and event formulas. dism and enm are actually event formulas.5.). F T (pred (m)). F T (m). as execution continues even if a constraint is violated. etc.4. are the enablers of m. therefore. due to its a priori likelihood. We must also check for disablers. And.The event of m beginning execution given that it is not disabled and is enabled. if applicable) of two – all enablers earning utility. F Tu (e) (recall that m will not execute until all enablers earn utility). and the method having a successful outcome – can be found initially. depend on the method’s start and ﬁnish times. stm .g. The variable dism equals a similar formula. it is independent of the other events. in general. U Fev (m) = (and (not dism) enm stm f tm sucm). as executives have incentive to start methods as early as possible. 
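The three inputs to a method's start time described above (predecessor finish time, enablers' finish-with-utility times, and release time) can be sketched as follows. This is a simplification that assumes the input distributions are independent and omits the lst and disabler checks; all names and numbers are ours:

```python
def max_combine(dists):
    # Joint distribution of the max of independent discrete
    # distributions, each given as {time: probability}.
    out = {0: 1.0}
    for d in dists:
        nxt = {}
        for t1, p1 in out.items():
            for t2, p2 in d.items():
                t = max(t1, t2)
                nxt[t] = nxt.get(t, 0.0) + p1 * p2
        out = nxt
    return out

def start_times(release, pred_ft, enabler_ft_us):
    # A method starts as early as possible: at the max of its release
    # time, its predecessor's finish time, and the time by which all
    # of its enablers have finished with utility.
    combined = max_combine([pred_ft] + enabler_ft_us)
    st = {}
    for t, p in combined.items():
        s = max(release, t)
        st[s] = st.get(s, 0.0) + p
    return st

# Hypothetical inputs (illustrative only):
st_m = start_times(100, {98: 0.5, 103: 0.5}, [{101: 1.0}])
```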
for the case of disablers) for these three cases in order to avoid duplicate work. we must explicitly check to see if methods’ will be able to start by their lst. When calculating a method’s starting distribution. These formulas appear in the place of dism and enm in m’s event formula. ST (m). we discuss one example of a method not being able to start at its est in our PADS experiments in Section 5. This assumption.). in that they are determined by the outcome of other events. enm . the probability values (and event formulas. we assume that variables are global (e. we assume this is independent of dism. In this domain. we must include in our implementation of Line 1 of Algorithm 5.
We add these to F Tnu (m) (Line 24). To calculate the probability that m is disabled. instead. the planner will wait to remove it just in case the failed enabler(s) can be rescheduled. The variable is of the form ddisd . m is not removed from the schedule until its lst has passed. if it is. If there is a chance a disabler will ﬁnish with utility by st. if m is not disabled and is enabled. several types of failures can occur: (1) the method might be disabled. If all enablers are not scheduled. Therefore we add lst (m) + 1 to F Tnu (m) along with the probability pst − pdisbl . the group as a whole never ﬁnishes and so has an end time of ∞. The ﬁrst component shown. the algorithm considers the distribution of the ﬁrst of m’s disablers to successfully ﬁnish with utility (Lines 8 9). then po (m) and eu (m) both equal 0. then m is not enabled in this iteration. the algorithm checks f te . {maxe∈en(m) F Tu (e)} is the distribution representing when all enablers will successfully ﬁnish. Next. Before a method’s start time distribution can be calculated. If f te equals ∞. The algorithm increments the probability of this event and adds it to m’s disable formula (Lines 1314). it adds this probability (scaled by the probability of st) to F Tnu (m). The associated ﬁnish time is the maximum of the predecessor’s ﬁnish time and lst (m) + 1. (2) the method might not be enabled. If m might be disabled it also makes note of it for U F (m). it sums to the probability that m successfully begins execution. in this case. then this is an invalid start time (Line 23). but st > lst (m). and (3) the method might miss its lst. Otherwise. The second component. For each potential disabler the algorithm creates a unique event variable representing the disabling (Line 12). Along the way. The algorithm iterates through this distribution.5 Probabilistic Analysis of Deterministic Schedules release (earliest possible start) time. 
the proﬁles of m’s predecessor and all enablers must have been calculated. Note that ST (m) does not sum to 1. represents the group’s ﬁnish time if not all enablers succeed. {{∞. It also increments the probability that m will be disabled at this start time for later use (Line 15). where d is either m or an ancestor of m. 1 − e∈en(m) (1 − po (e))}}. which is the probability of this iteration minus the probability that m has already been disabled (Line 20). The algorithm begins by generating a distribution of the maximum of its enablers’ ﬁnish times (Line 1). m can start at time st and so we add pst − pdisbl and st to ST (m) (Line 26). In this case m will miss its latest start time with probability pst − pdisbl . The event formula representing 74 . It ﬁrst checks to see if the method might be disabled by the time it starts. Finally. the method would immediately be removed from the schedule without being executed. to consider individual possible start times st. At the end of the algorithm we collect information for U F (m). This process is repeated for all possible st’s. In such cases. as well as the distribution of m’s predecessor’s ﬁnish time (Line 4). whichever is directly disabled by d. along with m’s ﬁnish time in that case (Lines 1011).
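The per-start-time disabler check described above can be sketched as the probability mass of a disabler finishing with utility no later than the candidate start time and no later than m's lst (the helper name and numbers are ours, and a single disabler is assumed for simplicity):

```python
def p_disabled_by(st, lst_m, disabler_ft_u):
    # Probability that a disabler finishes with utility in time to
    # disable m before it starts: its finish time must be <= m's
    # candidate start time st and <= m's latest start time.
    return sum(p for t, p in disabler_ft_u.items()
               if t <= st and t <= lst_m)

# Hypothetical disabler finishing with utility at 100 (p=0.3) or 130 (p=0.4):
p = p_disabled_by(120, 121, {100: 0.3, 130: 0.4})
```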
f tpred ) . pst − pdisbl }. F Tnu (m)) else push ({st. f tpred ) . pst − pdisbl }. d) p (ddisa) = p (ddisa) + pst · p (f td ) disev = disev ∪ {ddisa} pdisbl = pdisbl + pst · p (f td ) end end end //is the method enabled for this start time? if f te == ∞ then push ({max (lst (m) + 1. F Tnu (m)) a = directlyDisabledAncestor (m. F Tnu (m)) else pnorm = pnorm + pst − pdisbl if st > lst (m) then //invalid start time push ({max (lst (m) + 1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Function : calcStartTimes (m) E = {maxe∈en(m) F Tu (e)} {{∞.4: Method start time distribution calculation with discrete distributions.3 MultiAgent Domain Algorithm 5. pst · p (f td )}. f te ) pst = p (f te ) · p (f tpred ) //might the method be disabled in this combination? pdisbl = 0 foreach D ∈ P (dis (m)) do foreach f td ∈ mind∈dis(m) (d ∈ D ? F Tu (d) F Tnu (d) + ∞) do if f td ≤ st and f td ≤ lst (m) then //the method is disabled push ({max (f tpred . ST (m)) end end end end //collect the probability formulas for U F (m) 31 dism = (or disev ) 32 p (stm) = Σst∈ST (m) p (st) /pnorm 75 . pst − pdisbl }. f tpred . f td ) .5. 1 − e∈en(m) po (e)}} pnorm = 0 foreach f te ∈ E do foreach f tpred ∈ F T (pred (m)) do st = max (rel (m) . temporal constraints and hard NLEs.
Soft NLE Distributions In order to calculate m’s ﬁnish time and expected utility. Algorithm 5.5 Probabilistic Analysis of Deterministic Schedules m being disabled is (not (or disev )) where disev and the corresponding value assignments of it were found earlier (Line 31).5 presents the algorithm for this. 1 − pnle}}. U C) end foreach dc ∈ ci ∈DC ci do push ({dc. {1. st)) end foreach uc ∈ ci ∈U C ci do push ({uc. the algorithm iterates through m’s start times. 1 − pnle}}. it ﬁnds the probability that the facilitator (hinderer) will earn utility by time st (Line 5). 1 − pnle}}. facilitators and hinderers. U C) end foreach h ∈ hind (m) do pnle = f t∈F Tu (h) f t ≤ st ? p (f t) : 0 push ({{(1 + df · proputil (h. st)) . p (dc)}. DC) push ({{(1 − uf · proputil (h.5: Method soft NLE duration and utility effect distribution calculation with discrete distributions. 1 − pnle}}. p (uc)}. each entry is the distribution of possible NLE coefﬁcients for that start time. it is normalized in this way so that it equals the probability that it starts given that it is not disabled and is enabled (Line 32). DC) push ({{(1 + uf · proputil (f. {1. N LEu (m)) end end At a high level. For each start time st in ST (m) and facilitator (hinderer). The event variable representing m meeting its lst and starting is stm. {1. pnle}. which happens with probability Σst∈ST (m) p (st) /pnorm . it indexes the distribution N LEd by m’s possible start times. pnle}. Further. st)) . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Function : calcNLEs (m) foreach st ∈ ST (m) do DC = {1. we ﬁrst need to know m’s facilitating and hindering effects. 1} U C = {1. The algorithm returns a set of distributions representing the effects on both m’s duration N LEd and utility N LEu . It then adds a small distribution to the lists DC and 76 . 1} foreach f ∈ fac (m) do pnle = f t∈F Tu (f ) f t ≤ st ? p (f t) : 0 push ({{(1 − df · proputil (f. st)) . Algorithm 5. st)) . pnle}. N LEd (m. {1. pnle}.
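The combination of soft-NLE duration coefficients at a given start time can be sketched as below. This is our illustration, not the chapter's implementation; it assumes the NLE sources are independent and that each coefficient differs from 1:

```python
def nle_duration_coeffs(st, nles):
    # Combine soft-NLE duration coefficients at start time st.
    # Each NLE is (ft_u, coeff): ft_u maps finish time -> probability
    # of finishing with utility, and coeff is the duration multiplier
    # that applies if the NLE source has earned utility by st.
    dist = {1.0: 1.0}
    for ft_u, coeff in nles:
        p_active = sum(p for t, p in ft_u.items() if t <= st)
        small = {}
        small[coeff] = small.get(coeff, 0.0) + p_active
        small[1.0] = small.get(1.0, 0.0) + 1.0 - p_active
        nxt = {}
        for c1, p1 in dist.items():
            for c2, p2 in small.items():
                c = c1 * c2
                nxt[c] = nxt.get(c, 0.0) + p1 * p2
        dist = nxt
    return dist

# One hypothetical facilitator reducing duration by 20% when active:
coeffs = nle_duration_coeffs(120, [({110: 0.5, 130: 0.5}, 0.8)])
```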
With these formulas and value assignments in hand. with probability 1−pnle it will not (Lines 67). F Tu (m) is calculated. At this point. there are two options: reduce the event formula U Fev (m) such that each event appears only once and evaluate it using the variable probability values in U Fval (m). Probability of Earning Utility To calculate m’s probability of meeting the plan objective (which equals its probability of earning positive utility). as well as p (stm) and p (f tm). Then. Otherwise. This is repeated for each possible start time until the full distributions are found. so we chose the 77 .3 MultiAgent Domain U C representing the possible coefﬁcients of what the facilitator’s (hinderer’s) effect on m will be: with probability pnle it will have an effect.1 (on page 60) is complicated due to the presence of NLEs and the separation of F Tu and F Tnu . for each assignment. To calculate po (m) from U F (m).6. Iterating through the effective duration distribution. It iterates through st ∈ ST (m). calculate the probability of that assignment occurring as well as whether it leads to an eventual success or failure. additionally. As all of the events must happen for m to earn utility. or consider all possible assignments of truth values to those events and. Therefore p (f t) · fail (m) and the time f t are added to F Tnu (m) (Line 10). however. The implementation of Line 2 of Algorithm 5. The former. The last line of the algorithm collects the probability value assignment for f tm (Line 15).5. the only other thing to consider is that m might end with a failure outcome. all of the event probabilities that have been calculated in preceding algorithms are combined. the method earns zero utility and the algorithm adds the probability of this ﬁnish time p (f t) and f t to F Tnu (m) (Line 7). This is shown in Algorithm 5. it is a valid ﬁnish time. st) to ﬁnd m’s effective duration distribution when it starts at time st. 
the algorithm can put it all together to get m’s ﬁnal utility formula. the event formula is the and of these ﬁve individual event formulas (Line 1). Trivially. Its value is the probability that it does not miss a deadline given that it begins execution. and the result is added to m’s NLE distributions (Line 15). is not guaranteed to be possible. and for each start time multiplies m’s duration distribution with N LEd (m. If f t > dl(m). the event formulas and value assignments for enm and sucm are already known. In the preceding algorithms the event formula for dism and the corresponding value assignments were found. The algorithm is illustrated in Algorithm 5. the algorithm combines these small distributions via multiplication to ﬁnd an overall distribution of what the possible NLE coefﬁcients for m are at time st (Line 14). the algorithm considers each ﬁnish time f t that results from adding each possible start time and duration together (Lines 23). and p (f t) · (1 − fail (m)) and the time f t are added to F Tu (m) (Line 11). the value assignments are the union of the individual value assignments (Line 2). Finish Time Distributions Next.7.
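The split of finish times into utility and no-utility parts can be sketched as follows (our simplification, assuming independence of start and duration and a single deadline). The example inputs mirror the m1 numbers from the worked example later in the chapter: start 100, durations {14: 0.75, 16: 0.25}, deadline 120, failure likelihood 0.2.

```python
def finish_times(st_dist, dur_dist, deadline, p_fail):
    # Split finish times into the earns-utility part (ft_u) and the
    # no-utility part (ft_nu): deadline misses go to ft_nu outright;
    # otherwise the a-priori failure-outcome mass goes to ft_nu and
    # the remainder to ft_u.
    ft_u, ft_nu = {}, {}
    for st, p_st in st_dist.items():
        for dur, p_dur in dur_dist.items():
            ft, p = st + dur, p_st * p_dur
            if ft > deadline:
                ft_nu[ft] = ft_nu.get(ft, 0.0) + p
            else:
                ft_nu[ft] = ft_nu.get(ft, 0.0) + p * p_fail
                ft_u[ft] = ft_u.get(ft, 0.0) + p * (1 - p_fail)
    return ft_u, ft_nu

ft_u, ft_nu = finish_times({100: 1.0}, {14: 0.75, 16: 0.25}, 120, 0.2)
```

This reproduces the profile FTu(m1) = {{114, 0.6}, {116, 0.2}} and FTnu(m1) = {{114, 0.15}, {116, 0.05}} given in the worked example.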
6: Method ﬁnish time distribution calculation with discrete distributions. p (f t) · fail (m)}.7: Method probability of earning utility calculation with utility formulas. e∈en(m) U Fval (e) . p (f t) · (1 − fail (m))}. F Tu (m)) end end end //collect the probability for m meeting its deadline p (f tm) = Σst∈ST (m) p (st) − pdl /Σst∈ST (m) p (st) 15 Algorithm 5. sucm ← (1 − fail (m))} //evaluate U F (m) 3 po (m) = evaluateUtilityFormula (U F (m)) 78 . 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Function : calcFinishTimes (m) pdl = 0 foreach st ∈ ST (m) do foreach dur ∈ DUR (m) · N LEd (m. st) do f t = st + dur p (f t) = p (st) · p (dur) if f t > dl (m) then //the method did not meet its deadline push ({f t. F Tnu (m)) //otherwise it ends with positive utility push ({f t.5 Probabilistic Analysis of Deterministic Schedules Algorithm 5. stm ← p (stm) . F Tnu (m)) pdl = pdl + p (f t) else //the method might finish with a failure outcome push ({f t. f tm ← p (f tm) . Function : calcProbabilityOfObjective (m) //first put it all together to get U F (m) 1 2 U Fev (m) = and (not dism) e∈en(m) U Fev (e) stm f tm sucm U Fval (m) = { ddisa∈dism ddisa ← p (ddisa) . temporal constraints and soft NLEs. p (f t)}.
providing an implementation for Line 4 of Algorithm 5. and evaluating the formula. PADS normalizes F Tu (a) and F Tnu (a) so that they sum to the values po (a) and 1 − po (a). po (a)) normalize (F Tnu (a) . after soft NLEs.9 shows how eu (m) is calculated for methods. Algorithm 5. If it is true. Therefore its F Tu distribution represents when the ﬁrst child earns utility. Once all subsets are considered we have the ﬁnal value of po (a).5.3 MultiAgent Domain latter to use in our approach. as discussed in Section 5. The algorithm iterates through UTIL (m) and N LEu (m) to ﬁnd the method’s effective average utility. and then multiplies that by m’s probability of earning utility. It does so by substituting all variables in U Fen (a) that are in S with true and all that are not with false. 1 − po (a)) Expected Utility Algorithm 5.1. We use the powerset function P to generate the set containing all subsets of the events (Line 1). the task does as well. similarly if no child has positive utility. the algorithm evaluates whether U Fen (a) is true when all variables in S are true and all that are not are false (Line 3).8. For each subset S.1 (on page 60). shown as Algorithm 5.3. respectively. This algorithm is the same for all activities in the hierarchy. OR Task Probability Proﬁle Calculations An OR task’s utility at any time is a direct function of its children’s: if any child has positive utility.8: Activity utility formula evaluation. and its F Tnu distribution represents when 79 . The algorithm creates different truth value assignments by looking at all possible subsets of the events and assigning “true” to the events in the set and “false” to those not in the set. Finally. nor will the task. 1 2 3 4 5 6 7 8 9 Function : evaluateUtilityFormula (a) foreach S ∈ P (U Fvar (a)) do L = U Fvar (a) − S if evaluate (U Fen /∀s ∈ S ← true/∀l ∈ L ← false) then p (S) = s∈S p (s) · l∈L (1 − p (l)) po (a) = po (a) + p (S) end end normalize (F Tu (a) . 
If the formula evaluates to true under a given assignment, the algorithm calculates the probability of that subset and adds it to po (a).
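The powerset-based formula evaluation can be sketched in Python as below; the representation of the formula as a predicate over the set of true variables is ours, and the sketch assumes the event variables are independent (the real algorithm handles shared sub-events by construction of the formulas):

```python
from itertools import chain, combinations

def evaluate_utility_formula(formula, probs):
    # Enumerate all truth assignments over the event variables (the
    # powerset of the variable set) and sum the probabilities of the
    # assignments under which the formula holds.
    names = list(probs)
    subsets = chain.from_iterable(
        combinations(names, r) for r in range(len(names) + 1))
    po = 0.0
    for s in subsets:
        true_set = set(s)
        if formula(true_set):
            p = 1.0
            for v in names:
                p *= probs[v] if v in true_set else 1.0 - probs[v]
            po += p
    return po

# m1's formula from the worked example: (and stm1 ftm1 sucm1)
po_m1 = evaluate_utility_formula(
    lambda true_set: {"stm1", "ftm1", "sucm1"} <= true_set,
    {"stm1": 1.0, "ftm1": 1.0, "sucm1": 0.8})
```

With these values only the all-true assignment satisfies the conjunction, giving po(m1) = 0.8, matching the worked example.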
The probability of this ﬁnish time under these circumstances is the probability of f t multiplied by the probability that all children in L do not earn utility (i. to ﬁnd when T earns utility if only a earns utility. Algorithm 5. to ﬁnd when T fails to earn utility if neither a nor b earn utility. A task’s probability proﬁle cannot be calculated until the proﬁles of all scheduled children have been found. if there are no scheduled children.2 (on page 61). To ﬁnd these distributions. scaled by 1 − po (a). a set of children that hypothetically does not earn utility. the algorithm considers each possible set of task T ’s children earning utility by taking the powerset of T ’s scheduled children (Line 1).5 Probabilistic Analysis of Deterministic Schedules Algorithm 5. its eu and po values equal 0. considering each member f t in turn (Line 4). the set difference between T ’s scheduled children and S is L. As described in the above paragraph. 1 2 3 4 5 6 7 8 9 Function : calcExpectedUtility (m) eu (m) = 0 foreach util ∈ UTIL (m) do enle = 0 foreach nle ∈ N LEu (m) do enle = enle + nle · p (nle) end eu (m) = eu (m) + (enle · util · p (util)) end eu (m) = eu (m) · po (m) the last child fails to earn utility. Each set S in the powerset represents a set of children that hypothetically earns utility. the algorithm ﬁnds the distribution representing the minimum ﬁnish time of the children in S and iterates through it. where each subset represents a hypothetical set of children earning utility. We do this algorithmically by calculating the powerset of T ’s children. to ﬁnd when T earns utility if both a and b earn utility. to ﬁnd when T earns utility if only b earns utility.9: Method expected utility calculation with discrete distributions and soft NLEs.10 shows how to calculate these distributions. The max of F Tnu (a) and F Tnu (b). As a simple example of this. If the size of set S is at least one. providing implementation for Line 7 of Algorithm 5. 80 . 
it is necessary to consider all possible combinations of children succeeding and failing. the product of 1 − po (l) for each l ∈ L) (Line 5).e. if a task T has two scheduled children a and b.. to calculate the ﬁnish time distribution of T we must consider: • • • • The min of F Tu (a) and F Tu (b). F Tu (b). F Tu (a). scaled by 1 − po (b). meaning at least one child earns utility in this hypothetical case.
F Tu (T )) end else //no child earned utility foreach f t ∈ maxl∈L F Tnu (l) do push ({f t.10: OR task ﬁnish time distribution calculation with discrete distributions and uafs. For a max task. The expected utility of an OR task depends on its uaf.. to do so the algorithm considers the powerset of all scheduled children. As with an OR task’s ﬁnish time distribution. Algorithm 5.e. as the expectation of a sum is the sum of the expectation (Line 3). pf t }. the algorithm ﬁnds the latest ﬁnish time distribution of the hypothetically failed children and adds it to F Tnu (T ) (Line 10). its expected utility is a function of the expected utilities of its children provided they earn positive utility. it is the sum of the expected utilities of its children. if no child earns utility in the current hypothetical case (i.5. Otherwise.11. The probability that any set S in that powerset occurs is the probability that the children in S earn utility and the children in L do not (Line 10). 1 2 3 4 5 6 7 8 9 10 11 12 13 Function : calcORTaskFinishTimes (T ) foreach S ∈ P (scheduledChildren (T )) do L = scheduledChildren (T ) − S if S ≥ 1 then //at least one child earned utility foreach f t ∈ mins∈S F Tu (s) do pf t = p (f t) · l∈L (1 − po (l)) push ({f t. p (f t)}. its calculation is shown in Algorithm 5.8. where each set in the powerset represents a hypothetical combination of children earning utility. and the value assignments are the union of its children’s assignments. F Tnu (T )) end end end An OR task’s utility formula is simple to calculate.3 MultiAgent Domain The algorithm adds each possible ﬁnish time and its probability to F Tu (T ) (Line 6). The event formula is the or of its children’s formulas. For a sum task. the algorithm must consider all possible combinations of children succeeding or failing and ﬁnd what the maximum utility is for each combination. We call this value epu (Line 11). S is the empty set). 81 . 
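The OR-task finish-time enumeration can be sketched as follows (our illustration, assuming independence between children; each child is given as its FTu and FTnu distributions, which sum to po and 1 − po respectively, plus its po value):

```python
from itertools import combinations

def combine(dists, op):
    # Distribution of op(t1, ..., tk) over independent discrete
    # distributions {time: prob}; inputs may be sub-normalized.
    out = None
    for d in dists:
        if out is None:
            out = dict(d)
            continue
        nxt = {}
        for t1, p1 in out.items():
            for t2, p2 in d.items():
                t = op(t1, t2)
                nxt[t] = nxt.get(t, 0.0) + p1 * p2
        out = nxt
    return out

def or_task_finish_times(children):
    # children: name -> (ft_u, ft_nu, po). For each hypothetical set S
    # of succeeding children, FT_u collects the earliest success among
    # S scaled by the failure mass of the rest; FT_nu collects the
    # latest failure when no child succeeds.
    names = list(children)
    ft_u, ft_nu = {}, {}
    for r in range(len(names) + 1):
        for s in combinations(names, r):
            rest = [c for c in names if c not in s]
            if s:
                p_rest = 1.0
                for c in rest:
                    p_rest *= 1.0 - children[c][2]
                for t, p in combine([children[c][0] for c in s], min).items():
                    ft_u[t] = ft_u.get(t, 0.0) + p * p_rest
            else:
                for t, p in combine([children[c][1] for c in rest], max).items():
                    ft_nu[t] = ft_nu.get(t, 0.0) + p
    return ft_u, ft_nu

# Two hypothetical children (illustrative numbers):
ft_u, ft_nu = or_task_finish_times({
    "a": ({5: 0.5}, {5: 0.5}, 0.5),
    "b": ({10: 0.8}, {10: 0.2}, 0.8),
})
```

Here FTu sums to 0.9 = 1 − (1 − 0.5)(1 − 0.8), the OR task's probability of earning utility, as expected.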
As the task's expected utility is the max of only the children that earn utility, it requires each child's expected utility given that the child earns positive utility; this number can be easily calculated for activities as part of their probability profiles. The task's utility formula can be evaluated by using the function evaluateUtilityFormula from Algorithm 5.8.
the set difference between T ’s scheduled children and L is S. the children that do earn utility in this hypothetical case.11: OR task expected utility calculation with discrete distributions and uafs. the algorithm ﬁnds the distribution representing the minimum ﬁnish time of the children who do not earn utility and consider each possible value f t in the distribution (Line 4). The ﬁnish time distribution calculation is shown in Algorithm 5. The algorithm considers each possible set of task T ’s children not earning utility by taking the powerset of T ’s scheduled children (Line 1). Therefore its F Tu distribution represents when the last child earns utility. A task’s probability proﬁle cannot be calculated until the proﬁles of all scheduled children have been found. If any child has zero utility. its eu and po values equal 0 and no further calculation is necessary. if not all children are scheduled. Each set L in the powerset represents a hypothetical set of children that do not earn utility. The probability of this ﬁnish time under these circumstances is the probability of f t times the probability that all children 82 . and its F Tnu distribution represents when the ﬁrst child fails to earn utility. then the task does as well.5 Probabilistic Analysis of Deterministic Schedules Algorithm 5. As with an OR task’s ﬁnish time distribution. If at least one child does not earn utility in the current set. which can be generated algorithmically by the powerset operator. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Function : calcORTaskExpectedUtility (T ) switch Type of task T do case sum eu (T ) = s∈scheduledChildren(T ) eu (s) case max foreach S ∈ P (scheduledChildren (T )) do if S == 0 then eu (T ) = 0 else L = scheduledChildren (T ) − S p (S) = s∈S po (s) l∈L (1 − po (l)) eu (T ) = eu (T ) + p (S) · maxs∈S epu (s) end end end end AND Task Probability Proﬁle Calculations An AND task’s utility at any time depends on whether all children have earned utility. 
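The max-uaf case of the OR-task expected utility calculation can be sketched as below (our illustration; each child is given as its po value and its epu, the expected utility given that it earns positive utility, and children are assumed independent):

```python
from itertools import combinations

def or_max_expected_utility(children):
    # children: name -> (po, epu). For a max task, sum over each
    # nonempty hypothetical set S of succeeding children the
    # probability that exactly S succeeds times the max epu in S.
    names = list(children)
    eu = 0.0
    for r in range(1, len(names) + 1):
        for s in combinations(names, r):
            p = 1.0
            for c in names:
                po, _ = children[c]
                p *= po if c in s else 1.0 - po
            eu += p * max(children[c][1] for c in s)
    return eu

# Hypothetical children: a succeeds w.p. 0.5 (epu 10), b w.p. 0.8 (epu 6)
eu_t = or_max_expected_utility({"a": (0.5, 10.0), "b": (0.8, 6.0)})
```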
to do so it is necessary to consider all possible combinations of children succeeding and failing.12.
it is the minimum of these estimates (Line 5). the task will not earn utility and its proﬁle is trivial. if the number of scheduled children is not one. if all children earn utility. Because an and task earns utility only when all children do. epu.2 (on page 61). its expected utility is a function of the expected utilities of its children provided they earn positive utility. We add each possible ﬁnish time and its probability to F Tnu (T ) (Line 10). F Tu (T )) end end end An AND task’s event formula is the and of its children’s formulas. The algorithm for calculating the expected utility of an AND task is shown in Algorithm 5. 83 . then we ﬁnd the latest ﬁnish time of the children and add it to F Tu (T ) (Line 10). pf t }. if the activity is a min task. and the value assignments are the union of its children’s assignments. this value.5.3 MultiAgent Domain in S do earn utility (Line 5). Its implementation is the same as in Algorithm 5. Otherwise. 1 2 3 4 5 6 7 8 9 10 11 12 13 Function : calcANDTaskFinishTimes (T ) foreach L ∈ P (scheduledChildren (T )) do S = scheduledChildren (T ) − L if L ≥ 1 then //at least one child did not earn utility foreach f t ∈ minl∈L F Tnu (l) do pf t = p (f t) · s∈S po (s) push ({f t.8. As mentioned in the discussion of OR tasks’ expected utility calculations. It can be evaluated by using the function evaluateUtilityFormula from Algorithm 5. p (f t)}. Algorithm 5. ONE Task Probability Proﬁle Calculation A ONE task’s probability proﬁle is that of its sole scheduled child.12: AND task ﬁnish time distribution calculation with discrete distributions and uafs. can easily be calculated as part of an activity’s probability proﬁle. If the activity is a sum and task. F Tnu (T )) end else //all children earned utility foreach f t ∈ maxs∈S F Tu (s) do push ({f t.13. it is the sum over all children of this estimate (Line 3).
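The AND-task expected utility cases just described (sum over children's epu values for a sum_and task, minimum for a min task, each scaled by the task's po) can be sketched directly; the function signature and numbers are ours:

```python
def and_task_expected_utility(uaf, po_t, child_epus):
    # child_epus: each child's expected utility given that it earns
    # positive utility. Both cases scale by the task's probability of
    # earning utility, po_t, since an AND task earns utility only
    # when all of its children do.
    if uaf == "sum_and":
        return po_t * sum(child_epus)
    if uaf == "min":
        return po_t * min(child_epus)
    raise ValueError("unknown uaf: " + uaf)

eu_sum = and_task_expected_utility("sum_and", 0.5, [10.0, 6.0])
eu_min = and_task_expected_utility("min", 0.5, [10.0, 6.0])
```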
it may be negative. which signiﬁes a change to the structure or proﬁles of the taskgroup. speciﬁes by how much the expected utility of the taskgroup would differ if the activity were to deﬁnitely succeed versus if it were to deﬁnitely fail. The utility difference can be interpreted as representing two different measures: how much utility the taskgroup would lose if the activity were to fail. These two interpretations are the bases for the ideas behind expected utility loss and expected utility gain. For example. we scale ud by 1 − po (a). The above equation. we rely on the  operator. however. we call this utility difference (ud). how much utility the taskgroup would gain if the activity were to earn positive utility.5 Probabilistic Analysis of Deterministic Schedules Algorithm 5. eu (T G  po (a) = 1) denotes the expected utility of the taskgroup if po (a) were to equal 1. minus the expected utility of the taskgroup assuming that a earns zero utility: eu (T G  po (a) = 1) − eu (T G  po (a) = 0) In this and subsequent chapters. Both of these metrics are based on the expected utility of the taskgroup assuming that an activity a earns positive utility. 1 2 3 4 5 6 7 Function : calcANDTaskExpectedUtility (T ) switch Type of task T do case sum and eu (T ) = po (T ) · Σc∈children(T ) epu (c) case min eu (T ) = po (T ) · minc∈children(T ) epu (c) end end Expected Utility Loss / Gain As part of its analysis. or. in the presence of disablers. instead. when considering eul we do not care if an activity has a huge utility difference if it will never fail. PADS also calculates two heuristics which are used to rank activities by their importance by probabilistic schedule strengthening (Chapter 6). It measures this importance in terms of expected utility loss (eul) and expected utility gain (eug). we would want to prioritize looking at activities with medium utility differences but large probabilities of failing. in general. 
The utility difference is, in general, positive; scaling it by 1 − po (a) turns the measure into the utility loss expected to happen based on a's probability of failing. The overall equation for expected utility loss is: eul (a) = (1 − po (a)) · (eu (T G | po (a) = 1) − eu (T G | po (a) = 0))
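The two heuristics reduce to a small amount of arithmetic once the two conditional taskgroup expected utilities are available; a minimal sketch (names ours, numbers hypothetical):

```python
def utility_loss_gain(po_a, eu_tg_if_success, eu_tg_if_failure):
    # ud = eu(TG | po(a)=1) - eu(TG | po(a)=0). eul weights ud by the
    # activity's probability of failing; eug weights it by the
    # activity's probability of earning utility.
    ud = eu_tg_if_success - eu_tg_if_failure
    return (1.0 - po_a) * ud, po_a * ud

# Hypothetical activity: po = 0.8, taskgroup eu 100 if it succeeds,
# 40 if it fails.
eul, eug = utility_loss_gain(0.8, 100.0, 40.0)
```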
PADS temporarily sets po (a) to be 1 and 0. for example. and are updated by PADS if the information associated with upstream (or child) activities changes. The problem that we show is similar to Figures 3. utility difference can also be used as a metric without considering po values. in turn. but by a’s probability of earning utility. global PADS goes through the activity network and calculates probability proﬁles for each activity. Using ud alone is suitable for systems that want to prioritize.3) to give further intuition for how PADS works . for example. ud = eu (T G  po (a) = 1)−eu (T G  po (a) = 0).5 and 5. respectively. Probability Proﬁle Propagation Example In this section. and propagates that information throughout the activity network until the eu (T G) has been updated and provides the values of eu (T G  po (a) = 1) and eu (T G  po (a) = 0).3 Global Algorithm When initially invoked. eliminating the possibility of a catastrophic failure.3 MultiAgent Domain Expected utility gain also scales ud.3. a medium utility difference but high probability of earning utility over an activity with a high utility difference but a very low probability of earning utility. (1 − po) · ud and po · ud are suitable for systems with the goal of maximizing expected utility. so that it measures the gain in utility expected to happen based on a’s probability of earning utility. no matter how unlikely. depending on what the goals of the programmer are. but is simpliﬁed here to promote understanding. 5.1. it has deterministic utilities and smaller 85 . and multiply it by 1−po (a) or po (a) to ﬁnd eul (a) or eug (a). as ours. PADS begins with the method(s) with the earliest est(s). Proﬁles are stored with each activity. This ensures that eug prioritizes activities with. To calculate eul (a) and eug (a). The equation for expected utility gain is: eug (a) = po (a) · (eu (T G  po (a) = 1) − eu (T G  po (a) = 0)) Although we do not to so in this work. 
Because these calculations ideally require knowledge of the entire activity hierarchy. we show global probability proﬁle calculations for a simple example (Figure 5. as utility difference can be calculated for any activity in any activity network. These heuristics are not speciﬁc to this domain. they are implemented differently depending on whether it is global or distributed.5. PADS can then ﬁnd the utility difference. it then propagates proﬁles through the network until proﬁles for the entire tree have been found. We discuss details of this in the following two sections.
as it is the earliest scheduled method.D4@/5*E*. duration distributions. It has no disablers. We also show the schedule for this problem.7D@/* "%2F!+*G2&2<H* 52&26(# )!%#"**# $%&%!'(# )!%#"7*# 2F"N32#* 07* &23+*1145*63+*184* !"#+*$!%(")*&.A. lst (m4) = 121. lft (m3) = 121 • est (m4) = 120. lst (m1) = 101. 1. so N LEu (m1) = {{1.5 Probabilistic Analysis of Deterministic Schedules !"#$%&'()* !"#+*$!%&. so p (dism1) = 0.# %1 23 4%./%&(# !+# !0# Figure 5. lft (m1) = 116 • est (m3) = 115. so its start time distribution depends on nothing but its release time (rel). It has no enablers and no predecessor.0}}. ST (m1) = {{100.D4@/* "%2F!+*L(#!2&* "%2F!+*L(#!2&* !"# %123 4%. 1}} and N LEd (m1) = {{1.D4@/* .+*B*.3: A simple 3agent problem and a corresponding schedule. this implies that p (stm1) = 1 because it never misses its lst of 101. eft (m1) = 115.9:.D4@/5*11*.+*C*. Therefore. The time bounds for this schedule generated by the deterministic STN (and thus based on the methods’ average./& <1* (=3+*175*>"?3+*74@* . 1}} for all start times. since methods are always started as early as possible.A. 86 . eft (m4) = 129. # )!%#"+*# . It also has no soft NLEs. eft (m3) = 120.+*1B*./& <7* (=3+*D* 6(&+*I* "%2F!+*J"&"K* <8* <B* (=3+*E* (=3+*M* . lst (m3) = 116.. scheduled durations) are: • est (m1) = 100./& 01* &23+*1445*63+*174* !"#+*%"'&.CD@/5*1E*..A. lft (m4) = 130 Our calculation begins with method m1..
The finish time of m1 is a combination of ST(m1) = {{100, 1.0}} and its duration distribution, {{14, 0.75}, {16, 0.25}}. Iterating through these, PADS first gets ft = 100 + 14 = 114, which occurs with p(ft) = 1.0 · 0.75 = 0.75. This meets m1's deadline of 120; however, m1 might fail, so PADS pushes {114, 0.75 · 0.8 = 0.6} onto FTu(m1) and {114, 0.75 · 0.2 = 0.15} onto FTnu(m1) to account for its failure outcome likelihood. The method's other possible finish time is ft = 100 + 16 = 116, which occurs with p(ft) = 1.0 · 0.25 = 0.25 and which also meets m1's deadline. So, PADS pushes {116, 0.25 · 0.8 = 0.2} onto FTu(m1) and {116, 0.25 · 0.2 = 0.05} onto FTnu(m1). Trivially, p(ftm1) = 1.

The utility event formula for m1 is UFev(m1) = (and stm1 ftm1 sucm1); again, we omit dism1 and enm1 because m1 does not have enablers or disablers. The utility value assignment is UFval(m1) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8}. To evaluate this, in the context of Algorithm 5.1, PADS creates different truth value assignments to the variables in UFev(m1) by looking at all possible subsets of the events, generated by the powerset operator, and assigning "true" to the events in the set and "false" to those not in the set. We list below the different sets, and how the utility formula evaluates for those sets.

1. S = {}: T = {stm1, ftm1, sucm1}
   p(S) = (1 − p(stm1)) · (1 − p(ftm1)) · (1 − p(sucm1)) = 0
   UFen/∀s ∈ S ← true/∀t ∈ T ← false = (and false false false) = false
2. S = {stm1}: T = {ftm1, sucm1}
   p(S) = p(stm1) · (1 − p(ftm1)) · (1 − p(sucm1)) = 0
   UFen/∀s ∈ S ← true/∀t ∈ T ← false = (and true false false) = false
3. S = {ftm1}: not shown, evaluates to false
4. S = {sucm1}: not shown, evaluates to false
5. S = {stm1, ftm1}: not shown, evaluates to false
6. S = {stm1, sucm1}: not shown, evaluates to false
7. S = {ftm1, sucm1}: not shown, evaluates to false
0. 0.6.6 Because m3 is enabled by T 1.8 · 12 = 9.6}. It can skip Line 4 as m3 has no predecessors.2 and PADS pushes {116. 0. 0.2}} F Tnu (m1) = {{114. {116. 0. 1}}. f tpred ) = 88 .6.05}} U Fev (T 1) = (and stm1 f tm1 sucm1) U Fval (T 1) = {stm1 ← 1. eu (m1) = 0. 0.8 eu (T 1) = 9.2} onto ST (m3). so E = {{114. {116. Because the method is not enabled in this case. st = 116 with probability 0. its proﬁle is the same as m1’s: F Tu (T 1) = {{114. {∞.15}. Next. sucm1 ← 0. the method fails at max (lst (m3) + 1. First.05}} U Fev (m1) = (and stm1 f tm1 sucm1) U Fval (m1) = {stm1 ← 1.5 Probabilistic Analysis of Deterministic Schedules 8.8 Based on these iterations. then.6}. which here equals F Tu (T 1) appended by {∞. st = 114 with probability 0. To put together m1’s probability proﬁle: ST (m1) = {{100. 0.8 eu (m1) = 9. 0. {116. sucm1}: T = {} p (S) = p (stm1) · p (f tm1) · p (sucm1) = 0. 1 − po (T 1)}.6} onto ST (m3).8.8} po (T 1) = 0. S = {stm1.2}. As util (m1) = 12 and N LEu (m1) = {{1. st = ∞ with probability 0. sucm1 ← 0. Next.2}} F Tnu (T 1) = {{114.6}. Because it only has one scheduled child. This represents the case when T 1 does not earn utility.8 U Fen /∀s ∈ S ← true/∀t ∈ T ← f alse = (and true true true) = true po (m1) = po (m1) + 0.4). PADS pushes {114. f tm1.15}. 0. 0. Since this is a valid start time. 0.2}}. its start time depends on T 1’s ﬁnish time. 1.8} po (m1) = 0. po (m1) = 0. 0.0}} F Tu (m1) = {{114. {116. The algorithm begins by iterating through the distribution E (Line 1 of Algorithm 5. 0. {116.6 Next we can calculate T 1’s probability proﬁle. f tm1 ← 1. f tm1 ← 1.2.
117, and PADS pushes {117, 0.2} onto FTnu(m3). (Recall that in cases where a method's enabler fails, the method is not removed from the schedule until its lst has passed, in case the planner is able to reschedule the enabler.)

To find m3's finish time distribution, ST(m3) is combined with DUR(m3). The first combination is st = 114 and dur = 4, which happens with probability 0.6 · 0.5 = 0.3; the finish time 118 is less than m3's deadline, and so PADS pushes {118, 0.3} onto FTu(m3). The next combination is st = 114 and dur = 6, which happens with probability 0.6 · 0.5 = 0.3; the finish time 120 is less than m3's deadline, and so PADS pushes {120, 0.3} onto FTu(m3). The next combination is st = 116 and dur = 4, which happens with probability 0.2 · 0.5 = 0.1; the finish time 120 is less than m3's deadline, and so PADS pushes {120, 0.1} onto FTu(m3). The final combination is st = 116 and dur = 6, which happens with probability 0.2 · 0.5 = 0.1; the finish time 122 is less than m3's deadline, and so PADS pushes {122, 0.1} onto FTu(m3). So, p(stm3) = 0.8 and p(ftm3) = 0.8, and p(dism3) = 0, as m3 has no disablers. As m3 has no soft NLEs, NLEu(m3) = {{1, 1}} and NLEd(m3) = {{1, 1}} for all possible start times.

The utility event formula for m3 is (omitting dism3, as m3 has no disablers):

UFev(m3) = (and (and stm1 ftm1 sucm1) stm3 ftm3 sucm3)

and the utility value assignment is UFval(m3) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8, stm3 ← 1, ftm3 ← 1, sucm3 ← 1}; the assignments stm3 ← 1 and ftm3 ← 1 reflect that, given its enabler T1 succeeds, m3 always starts and always finishes by its deadline. UF(m3) is evaluated the same as is m1's utility formula; the calculation is not shown, but it evaluates to 0.8, and m3's expected utility is 0.8 · 6 = 4.8. Thus m3's full profile is:

ST(m3) = {{114, 0.6}, {116, 0.2}, {∞, 0.2}}
FTu(m3) = {{118, 0.3}, {120, 0.4}, {122, 0.1}}
FTnu(m3) = {{117, 0.2}}
UFev(m3) = (and (and stm1 ftm1 sucm1) stm3 ftm3 sucm3)
UFval(m3) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8, stm3 ← 1, ftm3 ← 1, sucm3 ← 1}
po(m3) = 0.8
eu(m3) = 4.8

Method m4's profile is calculated next, as per Algorithm 5.3. Its start time depends on T1's and m3's finish times.
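The ST and DUR combination performed above for m3 is a discrete convolution. A minimal sketch, using the example's distributions and `math.inf` for the never-starts case:

```python
import math

def combine(start_dist, dur_dist):
    """Convolve a start time and a duration distribution into finish times.

    Distributions are dicts {value: probability}; math.inf marks the case
    where the method never starts (e.g., its enabler failed).
    """
    finish = {}
    for st, p_st in start_dist.items():
        for dur, p_dur in dur_dist.items():
            ft = st + dur if st != math.inf else math.inf
            # Duplicate finish times collapse by summing probabilities.
            finish[ft] = finish.get(ft, 0.0) + p_st * p_dur
    return finish

# m3: ST = {114: 0.6, 116: 0.2, inf: 0.2}, DUR = {4: 0.5, 6: 0.5}
ft_m3 = combine({114: 0.6, 116: 0.2, math.inf: 0.2}, {4: 0.5, 6: 0.5})
# ft_m3 ~= {118: 0.3, 120: 0.4, 122: 0.1, inf: 0.2}
```

Note how the two combinations that finish at 120 (start 114 with duration 6, start 116 with duration 4) merge into the single entry {120, 0.4}, matching the profile above.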
PADS iterates through the distribution E and m4's predecessor's (m3's) finish time distribution, calculating m4's start time distribution along the way. Recall that lst(m4) = 121. We collapse the cases where fte equals ∞ and do not bother showing the iteration through m4's predecessor's distribution for that case.

1. fte = 114, ftpred = 117, pst = 0.6 · 0.2 = 0.12
   Method starts, st = 117; push {117, 0.12} onto ST(m4)
2. fte = 114, ftpred = 118, pst = 0.6 · 0.3 = 0.18
   Method starts, st = 118; push {118, 0.18} onto ST(m4)
3. fte = 114, ftpred = 120, pst = 0.6 · 0.4 = 0.24
   Method starts, st = 120; push {120, 0.24} onto ST(m4)
4. fte = 114, ftpred = 122, pst = 0.6 · 0.1 = 0.06
   Method misses lst; push {122, 0.06} onto FTnu(m4)
5. fte = 116, ftpred = 117, pst = 0.2 · 0.2 = 0.04
   Method starts, st = 117; push {117, 0.04} onto ST(m4)
6. fte = 116, ftpred = 118, pst = 0.2 · 0.3 = 0.06
   Method starts, st = 118; push {118, 0.06} onto ST(m4)
7. fte = 116, ftpred = 120, pst = 0.2 · 0.4 = 0.08
   Method starts, st = 120; push {120, 0.08} onto ST(m4)
8. fte = 116, ftpred = 122, pst = 0.2 · 0.1 = 0.02
   Method misses lst; push {122, 0.02} onto FTnu(m4)
9. fte = ∞, st = ∞, pst = 0.2
   Method is not enabled; push {122, 0.2} onto FTnu(m4)

PADS ultimately ends up with ST(m4) = {{117, 0.16}, {118, 0.24}, {120, 0.32}}. As m4 is enabled with probability 0.8 but starts with probability 0.72, p(stm4) = 0.72/0.8 = 0.9.

To find m4's finish time distribution, ST(m4) is combined with DUR(m4). Since DUR(m4) has only two possible values, m4 has six possible finish time combinations; we do not show the calculation. The latest one, when m4 starts at time 120 and has duration 11, is the only one where m4 misses its deadline, and so p(ftm4) = 0.56/0.72 = 0.77778. As m4 has no soft NLEs, NLEu(m4) = {{1, 1}} and NLEd(m4) = {{1, 1}}.

The utility event formula for m4 is

UFev(m4) = (and (and stm1 ftm1 sucm1) stm4 ftm4 sucm4)

The utility value assignment is UFval(m4) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8, stm4 ← 0.9, ftm4 ← 0.77778, sucm4 ← 1}. We evaluate this as we evaluated m1's utility formula.
The calculation is not shown, but it evaluates to 0.56, and m4's expected utility is 0.56 · 9 = 5.04. Thus m4's full profile is:

ST(m4) = {{117, 0.16}, {118, 0.24}, {120, 0.32}}
FTu(m4) = {{124, 0.08}, {125, 0.12}, {127, 0.16}, {128, 0.08}, {129, 0.12}}
FTnu(m4) = {{122, 0.28}, {131, 0.16}}
UFev(m4) = (and (and stm1 ftm1 sucm1) stm4 ftm4 sucm4)
UFval(m4) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8, stm4 ← 0.9, ftm4 ← 0.77778, sucm4 ← 1}
po(m4) = 0.56
eu(m4) = 5.04

T2 is calculated next. PADS begins by iterating through the superset of T2's children to consider the possible combinations of T2's children not earning utility. When L = {}, all children earn utility; PADS finds the distribution representing the max of their FTu distributions and adds it to FTu(T2). Because m4 always finishes later, this equals {{124, 0.064}, {125, 0.096}, {127, 0.128}, {128, 0.064}, {129, 0.096}} (note that this is the same as FTu(m4) times 0.8, the probability that m3 earns utility). When L = {m3}, T2 will fail at the same time m3 does, and so PADS adds {117, 0.2 · 0.56 = 0.112} to FTnu(T2). When L = {m4}, T2 will fail at the same time m4 does, and so PADS adds {122, 0.28 · 0.8 = 0.224} and {131, 0.16 · 0.8 = 0.128} to FTnu(T2). Finally, when L = {m3, m4}, T2 will fail at the same time m3 does, as it always fails earlier, and so PADS adds {117, 0.2 · 0.44 = 0.088} to FTnu(T2).

UFev(T2) equals the and of its children's event formulas, and UFval(T2) equals the union of its children's value assignments. UF(T2) is evaluated the same way; the calculation is not shown, but it evaluates to 0.56. T2's expected utility, then, is 0.56 · (6 + 9) = 8.4, where 6 and 9 are the utilities of m3 and m4, respectively.

The last part of the calculation is to normalize T2's FT distributions so that they sum to po(T2) and 1 − po(T2), respectively. Note here the discrepancy between the sum of the members of FTu and po: FTu(T2) sums to 0.448, but po(T2) = 0.56. This highlights the difference between assuming that the enablers of m3 and m4 are independent, and not making that assumption.
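The normalization step just described is a simple rescaling; a minimal sketch, using T2's raw FTnu masses from above:

```python
def normalize(dist, target):
    """Rescale a {value: prob} distribution so its masses sum to `target`."""
    total = sum(dist.values())
    return {v: p * target / total for v, p in dist.items()}

# T2's raw FTnu sums to 0.2 + 0.224 + 0.128 = 0.552, but 1 - po(T2) = 0.44:
ftnu_t2 = normalize({117: 0.2, 122: 0.224, 131: 0.128}, 0.44)
# ftnu_t2 ~= {117: 0.159, 122: 0.179, 131: 0.102}
```

The rescaled values round to the {117, 0.16}, {122, 0.18}, {131, 0.1} entries of T2's final profile; the same helper, with target po(T2) = 0.56, produces the final FTu(T2).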
T2's final probability profile is:

FTu(T2) = {{124, 0.08}, {125, 0.12}, {127, 0.16}, {128, 0.08}, {129, 0.12}}
FTnu(T2) = {{117, 0.16}, {122, 0.18}, {131, 0.1}}
UFev(T2) = (and (and (and stm1 ftm1 sucm1) stm3 ftm3 sucm3) (and (and stm1 ftm1 sucm1) stm4 ftm4 sucm4))
UFval(T2) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8, stm3 ← 1, ftm3 ← 1, sucm3 ← 1, stm4 ← 0.9, ftm4 ← 0.77778, sucm4 ← 1}
po(T2) = 0.56
eu(T2) = 8.4

Finally, we can find the task group's probability profile. UFev(TG) equals the or of its children's event formulas, and UFval(TG) equals the union of its children's value assignments. We first find its finish time distribution. In the first iteration of the algorithm, S = {T1, T2}, meaning that in this hypothetical case both children earn utility; as FTu(T1) is strictly less than FTu(T2), we add FTu(T1) · po(T2), or {{114, 0.336}, {116, 0.112}}, to FTu(TG). Next, S = {T1}, meaning that in this hypothetical case T1 earns utility and T2 does not; we add FTu(T1) · (1 − po(T2)), or {{114, 0.264}, {116, 0.088}}, to FTu(TG). S next equals {T2}; for the same reasoning, we add FTu(T2) · (1 − po(T1)), or {{124, 0.016}, {125, 0.024}, {127, 0.032}, {128, 0.016}, {129, 0.024}}, to FTu(TG). Finally, S = {}, which means that in this hypothetical case no child earns utility. As T2's failure finish time distribution is strictly later than T1's, we add FTnu(T2) · (1 − po(T1)), or {{117, 0.032}, {122, 0.036}, {131, 0.02}}, to FTnu(TG). Given these values, we can normalize
FTu(TG) and FTnu(TG) so they sum to 0.8 and 0.2, respectively. Overall:

FTu(TG) = {{114, 0.53}, {116, 0.175}, {124, 0.014}, {125, 0.021}, {127, 0.028}, {128, 0.014}, {129, 0.021}}
FTnu(TG) = {{117, 0.073}, {122, 0.082}, {131, 0.045}}
UFev(TG) = (or (and stm1 ftm1 sucm1) (and (and (and stm1 ftm1 sucm1) stm3 ftm3 sucm3) (and (and stm1 ftm1 sucm1) stm4 ftm4 sucm4)))
UFval(TG) = {stm1 ← 1, ftm1 ← 1, sucm1 ← 0.8, stm3 ← 1, ftm3 ← 1, sucm3 ← 1, stm4 ← 0.9, ftm4 ← 0.77778, sucm4 ← 1}
po(TG) = 0.8
eu(TG) = 18

Therefore, the probability that the schedule will earn positive utility during execution is 0.8, and its expected utility is 18.

5.3.4 Distributed Algorithm

The distributed version of PADS (D-PADS) is used by agents locally during execution. It is intuitively very similar to global PADS; in fact, the algorithms for calculating activities' start times, finish times and expected utilities are for the large part the same. However, as no agent has a complete view of the activity network, not all activity profiles can be found in a single pass.

Recall from Section 3.2 (on page 36) that when execution begins, agents have a fixed period of time to perform initial computations, such as calculating initial probability profiles, before they actually begin to execute methods. During this time, the probability profiles of the activity network are computed incrementally, with each agent calculating as many profiles as it can at a time. Agents then pass the newly-found profiles on to other agents in order to allow them to calculate further portions of the network. This process repeats until the profiles of the entire network have been found.

Once execution begins and methods begin executing, probability profiles are continuously and asynchronously updated to reflect the stochasticity of the domain and the evolving schedule. Whenever an agent receives updates concerning an activity's execution, its enablers, disablers, facilitators, hinderers, NLE information, or (for tasks) children, it updates that activity's profile. If the activity's profile does in fact change, D-PADS passes that information on to other agents interested in that activity.
Agents do not explicitly wait for or keep track of the effects these changes have on the rest of the hierarchy, even though changes resulting from their update may eventually propagate back to them; doing so would be very difficult because typically many probability profiles are changing concurrently. After an activity has been executed, its profile is changed to the actual outcome of its execution. The runtime aspect of the approach also necessitates an additional notion of the current time, which is used as a lower bound on when a method will start (assuming its execution has not already begun). Since handling these issues involves very small changes to the PADS algorithms presented in Section 5.2, we will not explicitly show the changes needed to those algorithms.

The distributed version does, therefore, make several approximations of expected utility at the task group level, which is necessary for eul and eug. In the global version, eu(TG | po(a) = 1) and eu(TG | po(a) = 0) can be calculated exactly by setting po(a) to the appropriate value and updating the entire network. In the distributed case, a different approach must be taken, since no agent has a complete view of the network. There are three options for calculating ud in a distributed way. The first is to exactly calculate such values on demand. This involves agents querying other agents for information to use in their calculations, which would introduce a large delay in the calculation. The second option, which can also calculate the values exactly, avoids this delay by always having such values cached. This requires every agent to calculate, for each of its activities, how the success or failure of every other activity in the network would affect it, and communicate that to other agents. Such computations would take long to compute due to the large number of contingencies to explore, and would also require a large increase in the amount of communication; in a domain with high executional uncertainty such as this one, we consider them intractable given the real-time constraints of the domain. Instead, we utilize a third option in which these values are estimated locally by agents as needed.

In this option, agents estimate ud based on an estimation of the expected utility of the task group if an activity a were to have a hypothetical probability value h of earning utility, notated euTG(a, h). We use the symbol ˆ to designate a number as an estimate. The algorithm for calculating euTG is shown in Algorithm 5.14. To begin, po(a | po(a) = h) = h and eu(a | po(a) = h) = h · averageUtil(a) (where averageUtil is defined in Section 3.2) (Lines 1-2). The algorithm then begins at the local activity a and ascends the activity hierarchy one parent at a time until it reaches the task group (Line 5). At each level it estimates the po and eu that the current task T would have if po(a) = h, notated as po(T | po(a) = h) and eu(T | po(a) = h); the algorithm estimates these numbers according to tasks' uafs. For example, if a's parent is a max task T, its estimated expected utility is the maximum of a's new estimated expected utility and the expected utility of T's other children (Lines 20-21).
Algorithm 5.14: Distributed task group expected utility approximation with enablers and uafs.

    Function approximateEU(a, h):
 1      po(a | po(a) = h) = h
 2      eu(a | po(a) = h) = h · averageUtil(a)
 3      c = a
 4      nleΔ = 0
 5      while T = parent(c) do
 6          foreach ce ∈ enabled(c) do
 7              p̂o(ce | po(c)) = po(ce) · po(c | po(a) = h) / po(c)
 8              nleΔ = nleΔ + approximateEU(ce, p̂o(ce | po(c)))
 9          end
10          foreach cd ∈ disabled(c) do
11              p̂o(cd | po(c)) = po(cd) · (1 − po(c | po(a) = h)) / (1 − po(c))
12              nleΔ = nleΔ + approximateEU(cd, p̂o(cd | po(c)))
13          end
14          children = scheduledChildren(T) − {c}
15          switch type of task T do
16              case sum
17                  po(T | po(a) = h) = 1 − (1 − po(T)) · (1 − po(c | po(a) = h)) / (1 − po(c))
18                  eu(T | po(a) = h) = eu(c | po(a) = h) + Σ_{c' ∈ children} eu(c')
19              case max
20                  po(T | po(a) = h) = 1 − (1 − po(T)) · (1 − po(c | po(a) = h)) / (1 − po(c))
21                  eu(T | po(a) = h) = max(eu(c | po(a) = h), max_{c' ∈ children} eu(c'))
22              case exactly one
23                  po(T | po(a) = h) = po(c | po(a) = h)
24                  eu(T | po(a) = h) = eu(c | po(a) = h)
25              case sum and
26                  po(T | po(a) = h) = po(T) · po(c | po(a) = h) / po(c)
27                  eu(T | po(a) = h) = po(T | po(a) = h) · (eu(c | po(a) = h) / po(c | po(a) = h) + Σ_{c' ∈ children} eu(c') / po(c'))
28              case min
29                  po(T | po(a) = h) = po(T) · po(c | po(a) = h) / po(c)
30                  eu(T | po(a) = h) = po(T | po(a) = h) · min(eu(c | po(a) = h) / po(c | po(a) = h), min_{c' ∈ children} eu(c') / po(c'))
31          end
32          end
33          c = T
34      end
35      return eu(TG | po(a) = h) + nleΔ
Enabling and disabling relationships are also included in the estimation. If an activity c is reached that enables another activity ce, ce's probability of earning utility given po(c | po(a) = h) is estimated; we abbreviate this for clarity as p̂o(ce | po(c)). How this hypothetical po value for ce would impact the overall task group utility, euTG(ce, p̂o(ce | po(c))), is then estimated by a recursive call (Line 8). The same process happens for disablers. We do not include facilitating or hindering relationships, as they do not affect whether a method earns utility, and we expected little benefit in including them in the estimation. Our experimental results, showing the accuracy of this approximation, support this decision (Section 5.4.2).

Overall, the algorithm returns the sum of the estimated task group utilities (Line 35). This algorithm, if it recurses, clearly returns a value not directly comparable to eu(TG), as it includes potentially multiple summands of it (Lines 8, 35). Instead, it is meant to be compared to another call to the estimation algorithm. So, to estimate ud, agents use this algorithm twice, with h = 1 and h = 0:

    ud = euTG(a, 1) − euTG(a, 0)

This value can then be used to provide estimates of eul and eug.

Typically, the estimate the calculation provides is good. The main source of inaccuracy is in cases where an activity c and its enabler would separately cause the same task to lose utility if they fail. For example, if c and an activity it enables, ce, were crucial to the success of a common ancestral task, both the core algorithm as well as the recursive call would return a high utility difference stemming from that common ancestor. As we sum these values together, we end up double-counting the utility difference at this task, and the overall estimate is higher than it should be.
Ultimately, however, we did not deem this a critical error, as even though we are overestimating the impact c’s failure would have on the schedule, in such cases it typically has a high impact, anyway.
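As a concrete illustration of the sum and max cases of the estimation, the single-level update can be sketched as follows (the function name and numbers are hypothetical; the real algorithm iterates this up the hierarchy and adds the enabler/disabler terms):

```python
def estimate_task_po(po_T, po_c, po_c_hyp):
    """Estimate po(T | po(a) = h) for a sum or max task T.

    po_T     -- task's current probability of earning utility
    po_c     -- current po of the child c on the path from a up to T
    po_c_hyp -- child's hypothetical po given po(a) = h
    """
    # A sum/max task fails only if all of its children fail, so rescale
    # the task's failure probability by the change in the child's
    # failure probability (children treated as independent).
    return 1.0 - (1.0 - po_T) * (1.0 - po_c_hyp) / (1.0 - po_c)

# If T currently fails 20% of the time and its child c (po 0.5) is
# hypothetically forced to succeed, the estimated failure mass vanishes:
po_hi = estimate_task_po(0.8, 0.5, 1.0)   # -> 1.0
# Forcing c to fail instead doubles the task's failure probability:
po_lo = estimate_task_po(0.8, 0.5, 0.0)   # -> 0.6
```

Differences such as po_hi versus po_lo, propagated to the task group level, are what feed the distributed estimate ud = euTG(a, 1) − euTG(a, 0).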
5.3.5 Practical Considerations
When combining discrete distributions, the potential exists for generating large distributions: the size of a distribution Z = X + Y, for example, is bounded by |X| · |Y|. As described in Section 3.3.2 (on page 44), however, typically such a distribution has duplicate values, which we collapse to reduce its size (i.e., if the distribution already contained an entry {5, 0.25} and PADS adds {5, 0.2} to it, the resulting distribution is {5, 0.45}). In part because of this, we did not find distribution size to be a problem. Even so, we demonstrate how the size can be further reduced by eliminating very unlikely outcomes. We found that eliminating outcomes that occur with less than a 1% probability worked well for us; however, depending on what point in the trade-off of speed
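The duplicate-collapsing and low-probability pruning just described can be sketched as follows (helper names are illustrative):

```python
def push(dist, value, prob):
    """Add {value, prob} into a {value: prob} distribution, collapsing
    duplicate values by summing their probabilities."""
    dist[value] = dist.get(value, 0.0) + prob

def prune(dist, threshold=0.01):
    """Drop outcomes below `threshold` (the 1% cutoff the text reports
    working well), trading a little accuracy for a smaller distribution."""
    return {v: p for v, p in dist.items() if p >= threshold}

d = {5: 0.25}
push(d, 5, 0.2)   # duplicate value collapses: d == {5: 0.45}
push(d, 9, 0.005)
d = prune(d)      # the 0.5%-likely outcome is eliminated
```

A real implementation would prune only after all mass for a value has been accumulated, since several small pushes to the same value can together exceed the threshold.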
versus accuracy is needed for a given domain, that threshold can be adjusted. The elimination did not much impair the calculation's accuracy, as is shown in the next section, and can readily be done for systems which are negatively affected by the distribution sizes.

We also memoized the results of function calls to evaluateUtilityFormula. This was found to greatly reduce the memory usage and increase the speed of updating probability profiles. It is effective because many activities have identical formulas or portions of formulas (e.g., a parent task with only one scheduled child), and because many activities' profiles do not change over time.

In the distributed version of PADS, our approach has the potential to increase communication costs as compared to a system not using PADS, as more information is being exchanged. One simple way to decrease this effect is to communicate only those changes in probability profiles that are "large enough" to be important, based on a threshold value. We discuss other ways to reduce communication in our discussion of future work, Section 8.1.1.
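The memoization of evaluateUtilityFormula described above can be sketched with hashable formula representations; this is an assumption-laden simplification (nested tuples as formulas, a flat tuple of event probabilities, and independent-event conjunction), not the thesis's actual data structures:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def evaluate_utility_formula(formula, values):
    """Memoized evaluation of a conjunction formula.

    formula -- nested tuple such as ('and', 'stm1', 'ftm1', 'sucm1')
    values  -- tuple of (event, probability) pairs; both arguments are
               hashable, so identical (sub)formulas are computed once.
    """
    env = dict(values)
    if isinstance(formula, str):        # a bare event
        return env[formula]
    op, *args = formula
    parts = [evaluate_utility_formula(a, values) for a in args]
    if op == "and":                     # independent-event conjunction
        p = 1.0
        for x in parts:
            p *= x
        return p
    raise ValueError(op)

vals = (("stm1", 1.0), ("ftm1", 1.0), ("sucm1", 0.8))
p = evaluate_utility_formula(("and", "stm1", "ftm1", "sucm1"), vals)
# p == 0.8
```

Because a parent task's formula embeds its children's formulas verbatim, the cache is hit every time a shared subformula recurs, which is the effect the text reports exploiting.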
5.4 Experiments and Results
We performed experiments in the single-agent domain to test the speed of PADS, as well as in the multi-agent domain to test the speed and accuracy of both the global and distributed versions of PADS.
5.4.1 Single-Agent Domain
We performed an experiment to test the speed of PADS in this domain. We used as our test problems instances PSP94, PSP100 and PSP107 of the mmj20 benchmark suite of the resource-relaxed Multi-Mode Resource-Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max) [Schwindt, 1998]. The problems were augmented as described in Section 3.3.1. In the experiment, we executed the three problems 250 times each, using PADS to calculate and maintain all activity probability profiles, and timed all PADS calculations. To execute the problems, we implemented a simple simulator. Execution starts at time zero, and methods are started at their est. To simulate method execution, the simulator picks a random duration for the method based on its duration distribution, rounds the duration to the nearest integer, advances time by that duration, and updates the STN as appropriate. The planner is invoked both initially, to generate an initial plan, and then during execution whenever a method finishes at a time that causes an inconsistency in the STN. PADS is invoked to analyze the initial plan before execution begins, as well as to update
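The simulator's inner loop can be sketched as follows; the method names and distributions are illustrative (not one of the PSP instances), and the real simulator also updates the STN and may trigger replanning:

```python
import random

def simulate(schedule, duration_dists, seed=None):
    """One execution run: start at time zero, start each method in order,
    sample its duration, round to the nearest integer, and advance time.

    schedule       -- method names in execution order (hypothetical data)
    duration_dists -- {method: {duration: probability}}
    Returns the executed schedule's makespan.
    """
    rng = random.Random(seed)
    time = 0
    for method in schedule:
        durations, probs = zip(*duration_dists[method].items())
        # Pick a random duration by its likelihood and round it.
        time += round(rng.choices(durations, weights=probs)[0])
    return time

# Illustrative two-method problem:
makespan = simulate(["a", "b"], {"a": {10: 0.5, 12.4: 0.5},
                                 "b": {3: 0.8, 7: 0.2}}, seed=0)
```

Running such a loop 250 times per problem, as the experiment does, yields an empirical makespan distribution against which PADS's analytic profiles can be timed and compared.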
Table 5.1: Average time spent by PADS calculating and updating probability profiles while executing schedules in the single-agent domain.

            average PADS    standard deviation    average
            time (s)        of PADS time (s)      makespan (s)
    PSP94        8.65              2.25               130.99
    PSP100      14.82              7.54               140.94
    PSP107       6.43              1.68               115.41
probability proﬁles when execution results become known and when replanning occurs. Table 5.1 shows the results. For each problem, it shows the average time taken to perform all PADS calculations, as well as the corresponding standard deviation. It also shows the average executed schedule makespan. All values are shown in seconds. Clearly, PADS incurs only a modest time cost in this domain, incurring an average overhead of 10.0 seconds, or 7.6% of the duration of execution.
5.4.2 Multi-Agent Domain
Test Suite

For these experiments, we used a test suite consisting of 20 problems taken from the Phase II evaluation test problem suite of the DARPA Coordinators program¹. In this test suite, methods have deterministic utility (before the effects of soft NLEs), which promotes consistency in the utility of the results. Method durations were, in general, 3-point distributions. Table 5.2 shows the main features of each of the problems, roughly ordered by size. In general, three problems had 10 agents, fifteen problems had 20 agents, and two problems had 25 agents. The problems had, on average, between 200 and 300 methods in the activity network, and each method had, on average, 2.4 possible execution outcomes. The table also shows how many max-leaf tasks, tasks overall, and NLEs of each type were present in the activity network. Recall that "max-leaf" tasks are max tasks that are the parents of methods; typically, a deterministic planner would choose only one child of a max-leaf task to schedule. We include this information here because we also use this test suite in our probabilistic schedule strengthening experiments in the following chapter. In general, 20-30% of the total methods in the activity network were scheduled in the initial schedule of these problems, or roughly 50-90 methods.
¹ http://www.darpa.mil/ipto/programs/coor/coor.asp
Table 5.2: Characteristics of the 20 DARPA Coordinators test suite problems.

         agents  methods  avg method  max-   tasks  enable  disable  facilitate  hinder
                          outcomes    leafs         NLEs    NLEs     NLEs        NLEs
     1     10      189       3          63     89     18      12        0          18
     2     10      225       2.4        45     67      0       0       13           0
     3     10      300       2.66      105    140     50      36       33          33
     4     20      126       2.95       42     59     21       2       13          10
     5     20      135       2.67       52     64     22       0       11           0
     6     20      135       2.67       45     64     23       0       12           0
     7     20      198       2.67       66     93     38      12       21          10
     8     20      225       2.64       75    105     43      10       28           8
     9     20      225       5          75    106     23      12       14          11
    10     20      225       5          75    106     23      12       20          19
    11     20      225       5          75    106     28      12        8          13
    12     20      225       5          75    106     29      13       10          13
    13     20      225       5          75    106     33      11       15          17
    14     20      384       1.55       96    137     33      10       20          17
    15     20      384       1.58       96    137     38      23       16          15
    16     20      384       1.59       96    137     32      18       17          16
    17     20      384       1.59       96    137     35      23       19          14
    18     20      384       1.62       96    137     47      16       19          23
    19     25      460       3.01      115    146    106      28       42          47
    20     25      589       3.06      153    194     81      12       60          26
Figure 5.4: Time spent calculating 20 schedules' initial expected utilities.

Analysis of Multi-Agent Global PADS

We performed two experiments in which we characterized the running time and accuracy of global PADS. For each problem in the test suite, we timed the calculation of the expected utility of the initial schedule, using a 3.6 GHz Intel computer with 2GB RAM. Figure 5.4 shows how long it took to calculate the initial expected utility for the entire activity network for each of the 20 problems. The problems are numbered as presented in Table 5.2, and so their order is roughly correlated to their size. The time is very fast, with the largest problem taking 2.1 seconds, and well within acceptable range.

Next, we ran an experiment to characterize the accuracy of the calculation. We executed the initial schedules of each of the 20 test suite problems in a distributed execution environment to see what utility was actually earned by executing the schedules. Each problem in the test suite was run 25 times. In order to directly compare the calculated expected utility with the utility the schedule earns during execution, we executed the schedules as is, with no replanning or strengthening allowed; we did retain basic conflict-resolution functionality that naively repairs inconsistencies in the STN, such as removing methods from a schedule if they miss their lst. The schedule's flexible times representation does, however, allow the time bounds of methods to float in response to execution dynamics.

In the distributed execution environment, agents are given their local view of the activity network and their portion of the precomputed initial joint schedule, and are left to manage execution of their local schedules while trying to work together to maximize overall
joint schedule utility. Each agent and the simulator run on their own processor and communicate via message passing. Agents communicate method start commands to the simulator, receive back execution results (based on the stochastics of the problem) as they become known, and share status information asynchronously. We assume communication is perfect and sufficient bandwidth is available.

Figure 5.5: The accuracy of global PADS: the calculated expected utility compared to the average utility.

Figure 5.5 shows the experimental results, and compares the calculated expected utility with the average utility that was actually earned during execution. Error bars indicate the standard deviation of utility earned over the 25 execution trials. The problems are numbered as presented in Table 5.2, and for visual clarity are sorted in the graph by their calculated expected utility. Note that although the calculations are close to the average achieved utilities, they are not exactly the same. In addition to assumptions we make that affect the accuracy (Section 5.3.1), there are sources of inaccuracy due to aspects of the problem that we do not model. For example, methods may not start as early as possible (at their est) because of the asynchronousness of execution. Also, when a method misses its lst it is, very rarely, left on the schedule and, instead, a later method is removed to increase schedule slack; this accounts in large part for the inaccuracy of problem 3.
This goes against our assumption that the method that misses its lst is always removed; this occurs during the execution of problem 1, accounting for most of its inaccuracy. Only 4 of the problems (problems 1, 14, 17 and 18) have average executed utilities that are statistically different from the corresponding calculated expected utility. Problem 1 we have already discussed. Each of the other problems' 25 simulated executed utilities consist solely of two different values; this occurs, for example, when there is one high-level (and high utility) task that has a non-zero probability of failing but all others always succeed. In cases such as these, a slight error in the calculation of the probability that the task earns utility can lead to a reasonable error in the calculated expected utility, and this is what happens for these three problems. Despite these differences, our PADS calculation is very accurate: on average across these problems, there is an error of only 7% between the calculations of the expected utility and the utility earned during execution for these schedules.

Analysis of Multi-Agent Distributed PADS

We also performed experiments to characterize the runtime and accuracy of D-PADS. The experiments were performed on the 20 problem test suite described above. Agent threads were distributed among 10 3.6 GHz Intel computers, each with 2GB RAM.

Recall that when execution begins, agents have a fixed period of time to perform initial computations, such as calculating initial probability profiles, before they actually begin to execute methods. The first experiment measured how much time agents spent on PADS from when they received their initial schedule until all probability profiles for the entire activity network were calculated. On average, the clock time that passed between when execution began and when all profiles were finished being calculated was 4.9 seconds.

Figure 5.6 shows the average computation time each agent spent distributively calculating the initial expected utilities for each problem, as well as the total computation time spent across agents calculating profiles for that problem. In this figure, the problems are numbered as presented in Table 5.2.

[Figure 5.6: Time spent distributively calculating 20 schedules' initial expected utility, shown as both the average time per agent and total time across all agents.]

Comparing this graph with Figure 5.4, we see that distributing the PADS algorithm does incur some overhead. This is due to probability profiles on an agent needing to be updated multiple times as profile information arrives asynchronously from other agents and affects already-calculated profiles. Even considering this, the average time spent per agent calculating probability profiles is quite low, and suitable for a runtime algorithm. Once agents begin executing methods and execution progresses, the workload in general is not as high as these results suggest, since profiles are only updated on an as-needed basis: individual PADS updates take an average of 0.003 seconds (with standard deviation σ = 0.01). In contrast, each individual replan takes an average of 0.309 seconds (σ = 0.39), with agents spending an average of 47.7 seconds replanning and only 1.5 seconds updating probabilities. Thus the overhead of distributed PADS is insignificant when compared to the cost of replanning.

Because there are no approximations in our initial calculation of the probability profiles of the activity tree, we do not duplicate our accuracy results for D-PADS. We did, however, perform an experiment which tested the accuracy of our utility difference approximation. For each method scheduled for each of the 20 test suite problems, we calculated the utility difference exactly (using the method described in Section 5.3.3) and approximately (using the approximation algorithm described in Section 5.3.4). Overall, there were 1413 methods used in the experiment. A repeated measures ANOVA showed a weakly significant difference between the two calculations with p < 0.07; the utility difference approximations were, on average, within 4.58% of the actual utility difference.
Chapter 6: Probabilistic Schedule Strengthening

This chapter discusses how to effectively strengthen schedules by utilizing the information provided by PADS. At a high level, schedule strengthening works within the structure of the current schedule to improve its robustness and minimize the number of activity failures that arise during execution. It relies on PADS to target, or focus strengthening efforts on, "at-risk" activities, or activities in danger of failing, whose failure impacts task-group utility the most. It then takes steps to strengthen the schedule by protecting those activities against failure. Probabilistic schedule strengthening also utilizes PADS to ensure that all strengthening actions raise the task-group expected utility.

We implemented probabilistic schedule strengthening for the multi-agent domain [Hiatt et al., 2008, 2009]. In this domain, schedule strengthening is performed both on the initial schedule, before it is distributed to agents, and during distributed execution in conjunction with replanning. The next section describes the approach at a high level. The two sections following that provide detailed descriptions of the algorithms, and can be skipped by the casual reader. Finally, Section 6.4 shows probabilistic schedule strengthening experimental results.

6.1 Overview

The goal of probabilistic schedule strengthening is to make improvements to the most vulnerable portions of the schedule. It is most effective if domain knowledge can be taken advantage of to find the best ways to make the schedule most robust; in this way it is somewhat dependent on the goodness of the schedules generated by the deterministic planner. In the multi-agent domain, for example, a common structural feature of the activity networks is to have max tasks as the direct parents of several leaf-node methods.
These sibling methods are typically of different durations and utilities, and often belong to different agents. For example, a search and rescue domain might have "Restore Power" modeled as a max task, with the option of fully restoring power to a town (higher utility and longer duration) or restoring power only to priority buildings (lower utility and shorter duration). Typically, a deterministic planner would choose only one child of a max-leaf task to schedule. Probabilistic schedule strengthening, in contrast, might schedule an additional child, or substitute one child for another, in order to make the schedule more robust. We refer to these max tasks as "max-leafs."

Given our knowledge of the multi-agent domain's characteristics, such as the presence of max-leaf tasks, we have investigated three strengthening tactics for this domain:

1. "Backfilling" - Scheduling redundant methods under a max-leaf task to improve the task's likelihood of earning utility (recall that max tasks accrue the utility of the highest-utility child that executes successfully). This potentially increases the parent's probability of earning utility by lowering the likelihood of a failure outcome; it decreases schedule slack, however, which may lead to a loss of utility.

2. "Swapping" - Replacing a scheduled method with a currently unscheduled sibling method that has a lower a priori failure likelihood, a shorter duration or a different set of NLEs.(1) Swapping may lead to a loss of utility, as the sibling may have a lower average utility; it may, however, also increase overall schedule slack and the likelihood that this (or another) method earns utility.

3. "Pruning" - Unscheduling a method to increase overall schedule slack, possibly strengthening the schedule by increasing the likelihood that remaining methods earn utility. This can also lead to utility loss, as the removed method will not earn utility.

Note that each of these tactics has an associated trade-off; PADS allows us to explicitly address this trade-off to ensure that all schedule modifications improve the task-group expected utility. A priori, we expect backfilling to improve expected utility the most, as it preserves the higher-utility option while still providing a backup in case that higher-utility option fails. Swapping is expected to be the next most useful, as the task remains scheduled (although possibly at a lower utility). Pruning is expected to be the weakest individual tactic. Our results (Section 6.4.2) support these conjectures.

Activities to strengthen are identified and prioritized differently for the three tactics. For backfilling, max-leaf tasks are targeted which have a non-zero likelihood of failing, and methods are added to reduce this failure probability.

(1) The reason to consider siblings with a different set of NLEs is that they may change the effective time bounds of the method (via different hard NLEs) or its effective duration (via soft NLEs).
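To see why backfilling raises a max-leaf task's probability of earning utility, consider a toy profile calculation. The numbers below are hypothetical, and child successes are treated as independent, which is a simplification of the real activity model:

```python
def max_leaf_profile(children):
    """Probability of earning utility, and expected utility, of a max task
    whose children are (utility, success probability) pairs; the task accrues
    the utility of the highest-utility child that executes successfully."""
    p_all_fail = 1.0
    for _, p in children:
        p_all_fail *= 1.0 - p
    expected, p_higher_fail = 0.0, 1.0
    for u, p in sorted(children, reverse=True):
        expected += u * p * p_higher_fail   # u earned iff this child succeeds
        p_higher_fail *= 1.0 - p            # ...and all higher-utility ones fail
    return 1.0 - p_all_fail, expected

# Only a risky, high-utility method scheduled (e.g., "fully restore power"):
p_before, eu_before = max_leaf_profile([(100, 0.7)])
# After backfilling a lower-utility but reliable sibling ("priority buildings"):
p_after, eu_after = max_leaf_profile([(100, 0.7), (60, 0.95)])
```

Backfilling here raises the probability of earning any utility from 0.7 to 0.985 and the expected utility from 70 to 87.1, while preserving the higher-utility option.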
Parent tasks are prioritized for backfilling by their expected utility loss (Section 5.2.3, on page 84). Recall that this metric specifies by how much utility the overall task group is expected to suffer because of a task earning zero utility. As schedules can support a limited amount of backfilling before they become too saturated with methods, prioritizing the tasks in this way ensures that tasks whose expected failure affects task-group utility the most are strengthened first and have the best chance of being successfully backfilled. While this does not always result in optimal behavior (for example, if backfilling the activity with the highest eul prevents backfilling two subsequent ones, which could have led to a higher overall expected utility), we found it to be an effective heuristic in prioritizing activities for strengthening.

For both swapping and pruning, "chains" of methods are first identified where methods are scheduled consecutively and threaten to cause a method (or methods) to miss its deadline. Methods within the chains are sorted by increasing expected utility gain (Section 5.2.3). Recall that this metric specifies by how much utility the overall task group is expected to gain by a task earning positive utility. Sorting methods in this way ensures that low-importance methods, which do not contribute much to the schedule, are removed or swapped first to try to increase overall schedule slack. The chains themselves are prioritized by decreasing expected utility loss so that the most important potential problems are solved first. Methods with non-zero likelihoods of a failure outcome are also considered candidates for swapping; such methods are identified and incorporated into the list of chains. As with backfilling, the eul and eug heuristics are not guaranteed to result in optimal behavior, but we found them to be effective when used with our swapping and pruning tactics.

Every time that a tactic attempts to modify the schedule, it takes a step. A step of any tactic begins with sorting the appropriate activities in the ways specified above. The step then considers strengthening each activity sequentially until a valid strengthening action is made. An action is valid if (1) it is successful (i.e., a method is successfully scheduled, swapped or unscheduled) and (2) it raises the task group's expected utility. When a valid action is found, the step returns without trying to modify the schedule further. A step will also return if no valid action was found after all possible actions were considered (e.g., all max-leaf tasks have been looked at but no valid backfill is found). The main computational cost of schedule strengthening comes from updating probability profiles while performing strengthening during these steps.

Given the three tactics introduced above, we next address the question of the order in which they should be applied. The ordering of tactics can affect the effectiveness of probabilistic schedule strengthening because the tactics are not independent: performing a step of one tactic, such as unscheduling a method, may make another tactic, such as swapping that method, infeasible. We call an approach to determining how tactics can be applied to perform probabilistic schedule strengthening a strategy.
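The validity test inside a step can be sketched as a try/undo pattern: apply a candidate action, check whether the expected utility rose, and roll back if not. This is a minimal illustration only; the schedule and action models here are invented stand-ins for the planner and PADS machinery:

```python
class SwapAction:
    # Hypothetical stand-in for any strengthening action: applying it
    # changes the task group's expected utility by `delta`.
    def __init__(self, delta):
        self.delta = delta
    def apply(self, schedule):
        schedule["eu"] += self.delta
    def undo(self, schedule):
        schedule["eu"] -= self.delta

def take_valid_action(schedule, actions):
    """Try candidate actions in priority order; commit to the first one that
    both succeeds and raises the expected utility, undoing the rest."""
    for action in actions:
        eu_before = schedule["eu"]
        action.apply(schedule)            # (1) the action is successful
        if schedule["eu"] > eu_before:    # (2) expected utility rises
            return action                 # valid: the step returns immediately
        action.undo(schedule)             # invalid: roll back, keep looking
    return None                           # no valid action was found

schedule = {"eu": 100.0}
chosen = take_valid_action(schedule, [SwapAction(-3.0), SwapAction(+5.0)])
```

The first candidate lowers the expected utility and is undone; the second raises it and is committed, so the step ends with the schedule's expected utility at 105.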
The baseline "pseudo-optimal" strategy that finds the best possible combination of these actions considers each possible order of backfilling, swapping and pruning steps, finds the task-group expected utility that results from each ordering, and returns the sequence of steps that results in the highest expected utility (Figure 6.1). This strategy, however, can take an unmanageable amount of time, as it is exponential in the number of possible strengthening actions. Although its practical use is infeasible, we use it as a baseline to compare more efficient strategies against.

[Figure 6.1: The baseline strengthening strategy explores the full search space of the different orderings of backfill, swap and prune steps, repeating until no more actions are possible.]

The high-level strategy we use is a variant of a round-robin strategy. Agents perform one backfill step, one swap step and one prune step, in that order, repeating until no more actions are possible. This strategy is illustrated in Figure 6.2. Two additional heuristics are incorporated in order to improve the strategy. The first prevents swapping methods for lower-utility methods (to lower the a priori failure likelihood) while backfilling the parent max-leaf task is still possible. The second prevents pruning methods that are a task's sole scheduled child, and thus pruning entire branches of the task group, when a backfill or swap is still possible. These heuristics help to prioritize the tactics in order of their individual effectiveness, leading to a more effective probabilistic schedule strengthening strategy. If other strengthening actions were being used, analogous heuristics could be used to prioritize the new set in its order of effectiveness. Although we found these heuristics to be very effective at increasing the amount of expected utility gained by performing round-robin schedule strengthening, they are not guaranteed to be optimal. For example, if a backfill was performed instead of a prune to bolster the expected utility of a task, it is possible that backfilling some other set of tasks may then not be possible, which may have led to a higher overall expected utility.

We also experimented with a greedy heuristic variant that performs one "trial" step of each tactic and commits to the step that raises the expected utility the most, repeating until no more steps are possible.
[Figure 6.2: The round-robin strengthening strategy performs one backfill, one swap and one prune step, repeating until no more steps are possible.]

[Figure 6.3: The greedy strengthening strategy repeatedly performs one trial backfill, one trial swap and one trial prune, and commits at each point to the action that raises the expected utility of the joint schedule the most, repeating until no more strengthening actions are possible.]

The greedy strategy (Figure 6.3) performs, in general, roughly as well as the round-robin strategy, but with a higher running time, as for each successful strengthening action it performs up to three trial actions; the round-robin strategy, on the other hand, only performs the one. It utilizes the same two heuristics as the round-robin strategy.

In general, probabilistic schedule strengthening and replanning work well when used together, as they ensure that the current schedule takes advantage of opportunities that arise while also being robust to execution uncertainty. Ideally, therefore, replanning would not be performed without being followed by probabilistic schedule strengthening. Schedule strengthening may also be useful when performed alone if the schedule has changed since the last time it was performed, as the changes may have made it possible to strengthen activities that could not be strengthened before, or they may have weakened new activities that could be helped by strengthening.

Agents operate independently and asynchronously during execution. This means that actions such as replanning and probabilistic schedule strengthening are also performed asynchronously. Coordination is required, therefore, to prevent agents from simultaneously performing duplicate actions needlessly and/or undesirably (e.g., two agents backfilling the same task when one backfill would suffice). To address this, agents communicate a "strengthening status" for each local method indicating whether it is scheduled, whether it is unscheduled and the agent has not tried to backfill it yet, or whether the agent has tried to backfill it but the backfill action was invalid. Agents use this information to reduce these unwanted situations and to inform the two heuristics of round-robin strengthening described above.
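The per-method strengthening status can be modeled as a small enumeration, along with a guard that uses it to avoid duplicate backfills. This is a hypothetical rendering; the text specifies the three status values and the in-order backfilling rule, but not this exact code:

```python
from enum import Enum

class Status(Enum):
    NO_ATTEMPT = "noattempt"            # unscheduled; owner has not tried to backfill it
    FAILED_BACKFILL = "failedbackfill"  # owner tried to backfill it, but failed
    SCHEDULED = "scheduled"

def may_backfill_with(children, my_index):
    """children: a max-leaf's children in backfill order, as (status, is_local)
    pairs. An agent defers using its own child while an earlier remote child's
    owner has not yet attempted a backfill, so children are tried in order."""
    for status, is_local in children[:my_index]:
        if not is_local and status is Status.NO_ATTEMPT:
            return False   # wait: the remote owner should try first
    return True

# Remote sibling untried: defer.
defer = may_backfill_with([(Status.NO_ATTEMPT, False),
                           (Status.NO_ATTEMPT, True)], my_index=1)
# Remote sibling already failed its backfill: this agent may proceed.
proceed = may_backfill_with([(Status.FAILED_BACKFILL, False),
                             (Status.NO_ATTEMPT, True)], my_index=1)
```

The guard gives only implicit, partial coordination, matching the text: it discourages concurrent backfilling without requiring agents to wait for explicit agreement.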
In a domain with high executional uncertainty such as this one, agents replan and strengthen with as complete information as they have. As agents make changes to their own probability profiles, they do not explicitly wait for or keep track of the effects these changes have on the rest of the hierarchy, even though changes resulting from their update may eventually propagate back to them; doing so would be very difficult because typically many probability profiles are changing concurrently. Although this of course has the potential to lead to suboptimal behavior, this situation is alleviated since agents have the option of replanning and strengthening again later if their actions became unsuitable for the updated situation.

6.2 Tactics

The next three sections describe the tactics in detail, for both the global and distributed versions of our multi-agent domain.

6.2.1 Backfilling

Global: Algorithm 6.1 shows pseudocode for a step of the global backfilling tactic, targeting improvements at max-leaf tasks which might fail. The step performs one backfill, if possible. It begins by sorting all scheduled max-leaf tasks T where po(T) < 1 by eul(T) (Line 2); we define the function maxLeafs to return scheduled max-leaf tasks. Sorting in this way ensures that tasks whose expected failure leads to the highest utility loss are strengthened first. Recall also that, in this and subsequent algorithms, we assume sort takes as its arguments the set of activities to sort, the sort order, and the field of the activity to sort by. Next, the step stores the original eu(TG) value (Line 3) to use later for comparison. Once the max-leaf tasks are sorted, each child of each task is sequentially considered for backfilling. If a child is unscheduled, and the step is able to schedule it via a call to the planner doschedule (Line 6), then the activity hierarchy's probability profiles are updated (Line 7) and the step compares the task group's new eu value to the original (Line 8). If it is higher, the step returns the successfully backfilled child c. If not, it undoes the schedule (Lines 11-12) and continues looping through the tasks and children. If no successful backfills are found once all max-leaf tasks have been considered, the step returns null (Line 17).
Algorithm 6.1: Global backfilling step utilizing max-leaf tasks.

    Function: doOneGlobalBackfill
    1   MLT = maxLeafs()
    2   MLTs = sort({T ∈ MLT | po(T) < 1}, >, eul)
    3   euTG = eu(TG)
    4   foreach T ∈ MLTs do
    5       foreach c ∈ children(T) do
    6           if unscheduled(c) and doschedule(c) then
    7               updateAllProfiles()
    8               if eu(TG) > euTG then
    9                   return c   // expected utility increased; commit to backfill
    10              else
    11                  undo(doschedule(c))   // expected utility decreased or stayed the same; undo backfill
    12                  updateAllProfiles()
    13              end
    14          end
    15      end
    16  end
    17  return null
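Algorithm 6.1's commit-or-undo loop can be rendered in a few lines of Python over a toy schedule model. The classes and numbers below are illustrative only, and child successes are treated as independent:

```python
class MaxLeaf:
    # Toy max-leaf task: children are (utility, success probability) pairs;
    # `scheduled` holds the indices of currently scheduled children.
    def __init__(self, children, scheduled):
        self.children, self.scheduled = children, set(scheduled)
    def p_earn(self):
        q = 1.0
        for i in self.scheduled:
            q *= 1.0 - self.children[i][1]
        return 1.0 - q
    def eu(self):
        # Expected utility of the highest-utility successful scheduled child.
        total, p_higher_fail = 0.0, 1.0
        for u, p in sorted((self.children[i] for i in self.scheduled),
                           reverse=True):
            total += u * p * p_higher_fail
            p_higher_fail *= 1.0 - p
        return total

def do_one_backfill(tasks):
    # Sketch of Algorithm 6.1: visit at-risk max-leafs (highest stakes first,
    # approximated here by eu), try each unscheduled child, and commit to the
    # first backfill that raises the total expected utility; otherwise undo.
    eu_before = sum(t.eu() for t in tasks)
    for t in sorted((t for t in tasks if t.p_earn() < 1.0),
                    key=MaxLeaf.eu, reverse=True):
        for i, _ in enumerate(t.children):
            if i in t.scheduled:
                continue
            t.scheduled.add(i)                        # doschedule(c)
            if sum(x.eu() for x in tasks) > eu_before:
                return (t, i)                         # commit to the backfill
            t.scheduled.remove(i)                     # undo(doschedule(c))
    return None

task = MaxLeaf([(100, 0.7), (60, 0.95)], scheduled=[0])
result = do_one_backfill([task])
```

In this instance the step schedules the second, more reliable child under the max-leaf, raising the task's expected utility from 70 to 87.1.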
This step, as well as the steps for the swapping and pruning tactics (described below), can be modified in several simple ways to suit the needs of different domains. Threshold values for failure probabilities could be utilized, if desired, to limit computation. Also, in our domain, the order of the methods considered for backfilling and swapping was found to be unimportant; if a domain has an effective way to sort the child methods of a max-leaf task for backfilling, then that could be added as well at Line 5.

Distributed: The distributed backfilling step has several complications. First, agents no longer have access to the entire activity network in order to calculate eul values; the step must use instead the eul approximation (Section 5.3.4). Second, updateAllProfiles can no longer update profiles throughout the entire task tree, and the step must instead rely only on updating local probabilities via the function updateLocalProfiles. It follows that the change in task-group expected utility that results from a backfill can only be estimated, which can be done using the same approximation technique used to approximate eul and eug. Third, the maxLeafs function returns only max-leafs that the agent knows about (whether local or remote), and their children must be checked to see if they are local before they can be considered by an agent. Fourth, agents are strengthening their local schedules asynchronously, and so coordination is required to prevent agents from simultaneously performing duplicate actions needlessly and/or undesirably.

To address the fourth issue, agents communicate a "strengthening status" for each method:

• noattempt: this method is unscheduled, and the owning agent has not yet tried to schedule it as part of a backfill
• failedbackfill: this method is unscheduled because the owning agent tried, but failed, to schedule it as part of a backfill
• scheduled: this method is scheduled

The strengthening status is communicated along with probability profiles during execution. Agents use this information to implicitly and partially coordinate schedule strengthening: agents will not try to backfill a task with their own methods if there is a method earlier in the list of the task's children whose owner has not yet attempted to use it to backfill the task. This ensures that methods are (attempted to be) backfilled in order, discouraging concurrent backfilling, and helps to prevent the duplicate actions mentioned above. It is still possible for duplicate strengthening actions to be performed, such as if two agents simultaneously revisit the same max-leaf task on a later schedule strengthening pass and both succeed in backfilling it; in practice, however, we rarely see this occur. The strengthening status is also used to inform the heuristics for some of our strengthening strategies, described in Section 6.3.
Algorithm 6.2 presents the pseudocode for a distributed backfill step. It begins by sorting all max-leaf tasks by their estimated eul values (Line 2). The step next stores the original eu(TG) value (Line 3) to use later for comparison. Then, each child of each task is sequentially considered for backfilling. If a child is not local and its owner has not yet attempted to backfill it (i.e., its strengthening status is noattempt), the step skips this max-leaf task and continues to the next. If the child is local, unscheduled, and the step is able to schedule it via a call to the planner doschedule (Line 8), then the agent's local activity profiles are updated (Line 9). Next, for each method in the agent's local schedule, the step estimates the change in expected utility of the task group by summing the estimated change resulting from the differences in each method's profiles; this ensures that any effects a backfill has on other methods in the schedule (e.g., pushing them towards a deadline) are accounted for (Line 11). If the expected utility is estimated to be higher, the step returns the successfully backfilled child c. If not, it undoes the schedule (Lines 16-17) and continues looping through the tasks and their children. If no successful backfill is found once all max-leaf tasks have been considered, the step returns null (Line 22).

6.2.2 Global Swapping

A step of the swapping tactic performs one swap, if possible. Swapping can serve to strengthen the schedule in two different ways: it can include methods with lower a priori failure outcome likelihoods, and it can increase schedule slack to increase the likelihood that this (or another) method meets a deadline.

Algorithm 6.3 shows pseudocode for a step of the global swapping tactic. The step begins by identifying methods that are candidates for swapping. It first finds chains of methods, if any, that have overlapping execution intervals, scheduled back-to-back on the same agent, where at least one method in the chain has a probability of missing a deadline (Line 1). It sorts the methods m within the chains by increasing eug(m), so that methods that are expected to contribute the least to the overall utility are swapped first (Line 2). To this set of chains, the step adds scheduled methods that have an a priori likelihood of a failure outcome (Lines 3-5). It then identifies the maximum eul value of the methods in each group, and sorts the resulting groups by decreasing maximum eul, so that groups whose worst expected failure leads to the most utility loss are considered first (Line 9).
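A sketch of what chain identification might look like: group methods that are scheduled back-to-back on the same agent with overlapping execution intervals, and keep only the groups containing a method that might miss its deadline. The field names here are hypothetical simplifications of the activity model:

```python
def find_chains(schedule):
    """schedule: per-agent lists of methods sorted by start time; each method
    is a dict with simplified `start`, `finish` and `p_miss` fields, where
    `p_miss` is the method's probability of missing its deadline."""
    chains = []
    for methods in schedule.values():
        chain = [methods[0]] if methods else []
        for prev, cur in zip(methods, methods[1:]):
            if cur["start"] <= prev["finish"]:   # back-to-back / overlapping
                chain.append(cur)
            else:
                chains.append(chain)
                chain = [cur]
        if chain:
            chains.append(chain)
    # Keep only chains where some method threatens to miss its deadline.
    return [c for c in chains if any(m["p_miss"] > 0 for m in c)]

agent_schedule = {"agent1": [
    {"name": "m1", "start": 0, "finish": 10, "p_miss": 0.0},
    {"name": "m2", "start": 10, "finish": 25, "p_miss": 0.2},
    {"name": "m3", "start": 40, "finish": 50, "p_miss": 0.0},
]}
chains = find_chains(agent_schedule)
```

Here m1 and m2 form an at-risk chain (m2 might miss its deadline), while the isolated, safe m3 is discarded from consideration.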
Algorithm 6.2: Distributed backfilling step utilizing max-leaf tasks.

    Function: doOneDistributedBackfill
    1   MLT = maxLeafs()   // all max-leafs the agent knows about, whether local or remote
    2   MLTs = sort({T ∈ MLT | po(T) < 1}, >, eul)
    3   euTG = eu(TG)
    4   foreach T ∈ MLTs do
    5       foreach c ∈ children(T) do
    6           if not(local(c)) and status(c) == noattempt then
    7               continue   // continue to the next task
    8           else if local(c) and unscheduled(c) and doschedule(c) then
    9               updateLocalProfiles()
    10              ∆eu = 0
    11              foreach m ∈ localSchedule() do ∆eu = ∆eu + (approximateEU(m, po(m)) − euTG) end   // estimate how m's new po value impacts TG expected utility
    12              if ∆eu > 0 then
    13                  // expected utility estimated to increase; commit to backfill
    14                  return c
    15              else
    16                  undo(doschedule(c))   // expected utility decreased; undo backfill
    17                  updateLocalProfiles()
    18              end
    19          end
    20      end
    21  end
    22  return null
Next, the step iterates through the groups and the methods in the groups to find candidates for swapping (Lines 11-12). A method and its sibling are a candidate for swapping if they are owned by the same agent and the sibling is unscheduled. In addition, the sibling must be considered a good candidate for swapping. If the original method was considered for swapping because it has a failure likelihood, the sibling method must have a lower failure likelihood. If the original method was identified because it was part of a chain, the sibling must meet one of two criteria to be considered a candidate: (1) the sibling has a shorter average duration, so swapping may increase schedule slack; or (2) the sibling has a different set of NLEs, so swapping might increase schedule slack due to the different temporal constraints (Lines 13-18). If the two methods meet these criteria, the planner is called to attempt to swap them. Then, if the swap is successful, the activity hierarchy's probability profiles are updated and the task group's new eu value is compared to the original (Lines 19-20). If it is higher, the step returns the successfully swapped methods (Line 21). If not, it undoes the swap and continues looping through the methods and chains (Lines 23-24). If no successful swaps are found once all chains have been considered, the step returns the empty set {} (Line 30).

Distributed: The adaptation of this step to make it distributed is analogous to the changes made for the distributed version of the backfilling step: localSchedule is called instead of jointSchedule, not all profiles can be updated at once, and eul, eug and the change in expected utility of the task group have to be approximated.

6.2.3 Global Pruning

A step of the pruning tactic performs one prune, if possible. Algorithm 6.4 shows pseudocode for a step of the global pruning tactic. It begins by finding chains of methods on agents' schedules that have overlapping execution intervals, exactly as in the swapping algorithm (Line 1). As with swapping, it sorts the methods m within the chains by increasing eug, and the chains themselves by decreasing eul (Lines 2-3). The step next stores the original eu(TG) value to use later for comparison. Then, the step iterates through the chains and each method in the chains to find methods that can be unscheduled (Lines 5-7). When it reaches a method that can be unscheduled, the activity hierarchy's probability profiles are updated, and the new eu(TG) value is compared to the original (Lines 8-9). If it is higher, the step returns the successfully pruned method (Line 10).
Algorithm 6.3: Global swapping step for a schedule with failure likelihoods and NLEs.

    Function: doOneGlobalSwap
    1   C = findChains(jointSchedule())
    2   foreach c ∈ C do c ← sort(c, <, eug) end
    3   foreach m ∈ jointSchedule() do
    4       if fail(m) > 0 then
    5           push({m}, F)
    6       end
    7   end
    8   G = C ∪ F
    9   G = sort(G, >, eul)
    10  euTG = eu(TG)
    11  foreach g ∈ G do
    12      foreach m ∈ g do
    13          foreach m′ ∈ siblings(m) do
    14              if owner(m′) = owner(m) and unscheduled(m′)
    15                 and ((g ∈ F and fail(m′) < fail(m))
    16                      or (g ∈ C
    17                          and (averageDur(m′) < averageDur(m) or NLEs(m′) ≠ NLEs(m))))
    18                 and doswap(m, m′) then
    19                  updateAllProfiles()
    20                  if eu(TG) > euTG then
    21                      return {m, m′}
    22                  else
    23                      undo(doswap(m, m′))
    24                      updateAllProfiles()
    25                  end
    26              end
    27          end
    28      end
    29  end
    30  return {}
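The candidacy test of Algorithm 6.3 (Lines 14-18) can be written as a small predicate. The field names are invented, and NLE sets are reduced to plain Python sets:

```python
def swap_candidate(method, sibling, from_failure_group):
    """Return True if `sibling` is a good swap candidate for `method`,
    following the criteria in the text: same owner, sibling unscheduled,
    and either a lower failure likelihood (for failure-group candidates) or
    a shorter average duration / different NLE set (for chain candidates)."""
    if sibling["owner"] != method["owner"] or sibling["scheduled"]:
        return False
    if from_failure_group:
        return sibling["p_fail"] < method["p_fail"]
    return (sibling["avg_dur"] < method["avg_dur"]
            or sibling["nles"] != method["nles"])

m = {"owner": "a1", "scheduled": True, "p_fail": 0.3, "avg_dur": 20, "nles": {"e1"}}
s = {"owner": "a1", "scheduled": False, "p_fail": 0.1, "avg_dur": 20, "nles": {"e1"}}
ok_failure = swap_candidate(m, s, from_failure_group=True)   # lower p_fail
ok_chain = swap_candidate(m, s, from_failure_group=False)    # same dur and NLEs
```

The same sibling can qualify under one criterion and not the other: here it is a valid candidate for a failure-motivated swap but not for a chain-motivated one, since it offers no extra slack.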
If not, it undoes the unschedule and continues looping through the chains and methods (Lines 12-13). If no successful prunes are found once all chains have been considered, the step returns null (Line 18).
Algorithm 6.4: Global pruning step.

    Function: doOneGlobalPrune
    1   C = findChains(jointSchedule())
    2   foreach c ∈ C do c ← sort(c, <, eug) end
    3   G = sort(C, >, eul)
    4   euTG = eu(TG)
    5   foreach g ∈ G do
    6       foreach m ∈ g do
    7           if dounschedule(m) then
    8               updateAllProfiles()
    9               if eu(TG) > euTG then
    10                  return m
    11              else
    12                  undo(dounschedule(m))
    13                  updateAllProfiles()
    14              end
    15          end
    16      end
    17  end
    18  return null
Distributed: The adaptation of this step to make it distributed is analogous to the changes made for the distributed version of the backfilling step: localSchedule is called instead of jointSchedule, not all profiles can be updated at once, and eul, eug and the change in expected utility of the task group have to be approximated.
6.3 Strategies
Given these three strengthening tactics, the next step is to find a strategy that specifies the order in which to apply them. The ordering of tactics can affect the effectiveness of probabilistic schedule strengthening because the tactics are not independent: performing an action of one tactic, such as unscheduling a method, may make another tactic, such as swapping that method, infeasible. Similarly, performing an action of one tactic, such as swapping sibling methods, may increase schedule slack and make another tactic, such as backfilling a max-leaf task, possible.

In order to develop a good strategy to use for probabilistic schedule strengthening, we first consider a search-based strategy that optimizes the application of the tactics. This baseline, "pseudo-optimal" solution considers each possible order of backfilling, swapping and pruning steps, finds the task-group expected utility that results from each ordering, and returns the sequence of steps that results in the highest expected utility. This strategy is not optimal, since the ordering of activities within each tactic is just a heuristic; however, as this strategy is exponential in the number of possible strengthening actions, it still quickly grows to take an unmanageable amount of time. Although this makes its own use infeasible, we use it as a baseline against which to compare more efficient strategies.

Algorithm 6.5 shows the pseudocode for globally finding the best possible ordering of tactics for the baseline strategy. It performs a depth-first search to enumerate all possible orderings, recursing until no more strengthening actions are possible. It returns the highest expected utility for the schedule it found, as well as the ordering of tactics that produced that schedule. As we are using this strategy as a baseline, it is not necessary to show how it would be implemented distributively.

We implemented two more efficient strategies to compare against the baseline strategy. One is a variant of a round-robin strategy, shown in Algorithm 6.6.
The global version of the strategy performs one backfilling step, one swapping step and one pruning step, repeating until no more steps are possible. Two additional heuristics are incorporated. The first concerns swapping methods to find a lower a priori failure likelihood. The heuristic specifies that a swap cannot take place for a method unless the method's parent is not eligible for backfilling, where not eligible means that either its po value equals 1 or backfilling has already been attempted on all of the method's siblings (i.e., no siblings have a strengthening status of noattempt). This heuristic prevents swapping methods for lower-utility methods while backfilling a parent max-leaf task (and preserving a higher-utility option) is still possible. This is incorporated as part of the swapping tactic; we do not show the revised algorithm.

The second heuristic is that a prune cannot take place unless either: (1) the candidate for pruning is not the only scheduled child of its parent; or (2) swapping or backfilling any method is not possible at the current point in the algorithm. This heuristic prevents pruning methods, and thus entire branches of the task group, when a backfill or swap is still possible and would better preserve the existing task structure. Again, this is added as part of the pruning tactic; we do not show the revised algorithm.
6.3 Strategies
Algorithm 6.5: Global baseline strengthening strategy utilizing backfills, swaps and prunes.
Function: baselineStrengthening
    b = doOneBackfill()
    if b then
        eu_b, orderings_b = baselineStrengthening()
        undo(doschedule(b)); updateAllProfiles()
    end
    {s, s′} = doOneSwap()
    if {s, s′} then
        eu_{s,s′}, orderings_{s,s′} = baselineStrengthening()
        undo(doswap(s, s′)); updateAllProfiles()
    end
    p = doOnePrune()
    if p then
        eu_p, orderings_p = baselineStrengthening()
        undo(dounschedule(p)); updateAllProfiles()
    end
    if b or {s, s′} or p then
        switch max(eu_b, eu_{s,s′}, eu_p) do
            case eu_b: return eu_b, {b} ∪ orderings_b
            case eu_{s,s′}: return eu_{s,s′}, {{s, s′}} ∪ orderings_{s,s′}
            case eu_p: return eu_p, {p} ∪ orderings_p
        end
    end
    return eu(TG), {}
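The recursive try-recurse-undo structure of Algorithm 6.5 can be sketched in ordinary code. The following is an illustrative sketch only, not the thesis implementation: ToySchedule, its action names and its conflict sets are hypothetical stand-ins for the real backfill, swap and prune tactics and their STN bookkeeping.

```python
# Illustrative sketch of the exhaustive search in Algorithm 6.5.

class ToySchedule:
    """Toy model: each strengthening action adds a fixed expected-utility
    gain, and some actions disable others (e.g., a prune making a swap
    infeasible, mirroring the tactic interactions described in the text)."""
    def __init__(self, actions, conflicts):
        self.actions = dict(actions)    # action name -> expected-utility gain
        self.conflicts = conflicts      # action name -> set of actions it disables
        self.applied = []
        self.eu = 0.0

    def available(self):
        blocked = set()
        for a in self.applied:
            blocked |= self.conflicts.get(a, set())
        return [a for a in self.actions
                if a not in self.applied and a not in blocked]

    def do(self, a):
        self.applied.append(a)
        self.eu += self.actions[a]

    def undo(self, a):
        self.applied.remove(a)
        self.eu -= self.actions[a]

def baseline_strengthening(sched):
    """Depth-first search over all orderings of strengthening actions;
    returns (best expected utility, ordering that achieves it)."""
    best_eu, best_order = sched.eu, []
    for a in sched.available():
        sched.do(a)                     # tentatively take the action
        eu, order = baseline_strengthening(sched)
        sched.undo(a)                   # undo it, as in Algorithm 6.5
        if eu > best_eu:
            best_eu, best_order = eu, [a] + order
    return best_eu, best_order
```

For example, with a conflict that makes a prune disable a swap, the search discovers that applying the swap before the prune lets all of the actions apply, which a fixed ordering might miss.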
Algorithm 6.6: Global round-robin strengthening strategy utilizing backfills, swaps and prunes.
Function: roundRobinStrengthening
    while true do
        b = doOneBackfill()
        {s, s′} = doOneSwap()
        p = doOnePrune()
        if not (b or {s, s′} or p) then return
    end
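The round-robin control flow of Algorithm 6.6 can be sketched as a simple loop over tactic callables. This is a hypothetical rendering: each tactic function is assumed to attempt one step and return a description of the action taken (or None if no step was possible), with the swap and prune eligibility heuristics described earlier assumed to be enforced inside the tactics themselves.

```python
# Hypothetical sketch of the round-robin loop of Algorithm 6.6.

def round_robin_strengthening(do_one_backfill, do_one_swap, do_one_prune):
    steps = []
    while True:
        b = do_one_backfill()       # one backfilling step
        s = do_one_swap()           # one swapping step
        p = do_one_prune()          # one pruning step
        steps += [a for a in (b, s, p) if a]
        if not (b or s or p):       # no tactic made progress: done
            return steps
```

The loop interleaves the tactics, so a swap that frees schedule slack in one pass can enable a backfill in the next.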
The distributed version of this strategy differs only in how the distributed versions of the backfilling, swapping and pruning functions are invoked; the overall structure and heuristics remain the same. As the details are purely implementational, we do not provide them here.

The other strategy is a greedy variant, shown in Algorithm 6.7. This strategy is similar to hill-climbing: it performs one 'trial' step of each tactic, records for each the increase in expected utility that resulted (Lines 2-7), and commits to the step that raises the expected utility the most (Lines 9-20), repeating until no more steps are possible. This greedy strategy also incorporates the same two heuristics as the round-robin strategy. In general, it performed the same as the round-robin strategy in our preliminary testing, but at a much higher computational cost; therefore we omit discussion of its distributed counterpart.
Algorithm 6.7: Global greedy strengthening strategy utilizing backfills, swaps and prunes.

Function: greedyStrengthening
 1  while true do
 2      b = doOneBackfill(), eu_b = eu(TG)
 3      if b then undo(doschedule(b)), updateAllProfiles()
 4      {s, s′} = doOneSwap(), eu_{s,s′} = eu(TG)
 5      if {s, s′} then undo(doswap(s, s′)), updateAllProfiles()
 6      p = doOnePrune(), eu_p = eu(TG)
 7      if p then undo(dounschedule(p)), updateAllProfiles()
 8
 9      if b or {s, s′} or p then
10          switch max(eu_b, eu_{s,s′}, eu_p) do
11              case eu_b
12                  doschedule(b)
13              end
14              case eu_{s,s′}
15                  doswap(s, s′)
16              end
17              case eu_p
18                  dounschedule(p)
19              end
20          end
21          updateAllProfiles()
22      else
23          return
24      end
25  end
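The greedy trial-and-commit pattern of Algorithm 6.7 can be sketched as follows. This is a hypothetical rendering, not the thesis code: each trial function is assumed to tentatively perform one step of a tactic, undo it, and report the expected-utility gain along with a callback that re-applies the step, returning None when no step is possible.

```python
# Sketch of the greedy trial-and-commit pattern of Algorithm 6.7.

def greedy_strengthening(trials):
    gains = []
    while True:
        options = [t() for t in trials]           # one trial step per tactic
        options = [o for o in options if o is not None]
        if not options:
            return gains
        gain, commit = max(options, key=lambda o: o[0])
        commit()                                  # commit only the best step
        gains.append(gain)
```

Each iteration pays for one trial of every tactic but keeps only one step, which is why the strategy costs more computation than round-robin for similar results.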
6.4 Experimental Results

We used the DARPA test suite of problems described in Section 5.4.2 (on page 98) for the probabilistic schedule strengthening experiments. We ran four experiments: one testing the effectiveness of using PADS to target and prioritize activities for schedule strengthening; one testing the effectiveness of the various schedule strengthening strategies; one testing how schedule utility improves when executing pre-strengthened schedules; and one measuring the effect of probabilistic schedule strengthening when performed in conjunction with replanning during execution.

6.4.1 Effects of Targeting Strengthening Actions

We ran an experiment to test the effectiveness of using PADS to target and prioritize activities when performing probabilistic schedule strengthening [Hiatt et al., 2008]. We compared three different classes of schedules: (1) the original schedule generated by the deterministic, centralized planner; (2) the schedule generated by performing backfilling on the original schedule, using PADS to target and prioritize activities (for clarity, we refer to this as probabilistic backfilling); and (3) the schedule generated by performing unconstrained backfilling on the original schedule. The unconstrained backfilling tactic is deterministic; it loops repeatedly through the set of all scheduled max-leaf tasks, attempts to add one additional method under each task in each pass, and continues until it can no longer backfill under any of the max-leafs due to lack of space in agents' schedules. This third mode provides the schedule strengthening benefits of backfilling without either the algorithmic overhead or the probabilistic focus of probabilistic backfilling.

We ran the pre-computed schedules for each condition through a multi-agent simulation environment in order to test their relative robustness. All agents and the simulator ran on separate threads on a 2.4 GHz Intel Core 2 Duo computer with 2GB RAM. For each problem in our test suite, and each condition, we ran 10 simulations, each with their own stochastically chosen durations and outcomes, but purposefully seeded such that the same set of durations and outcomes were executed by each of the conditions. In order to directly compare the relative robustness of schedules, we executed the schedules as is, with no replanning allowed. The schedule's flexible times representation does, however, allow the time bounds of methods to float in response to execution dynamics, and we did retain basic conflict-resolution functionality that naively repairs inconsistencies in the STN, such as removing methods from a schedule if they miss their lst.

The average utility earned by the three conditions for each of the 20 problems is shown in Figure 6.4. The utility is expressed as a function of the scheduled utility of the original initial schedule: it is represented as the average proportion of that scheduled utility earned during execution. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did; this roughly correlates to sorting the problems by their level of uncertainty.

Figure 6.4: Utility earned by different backfilling conditions, expressed as the average proportion of the originally scheduled utility earned. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did.

On all but 4 of the problems (16, 19, 20 and 6), probabilistic backfilling earned the highest percentage of the scheduled utility. On average across all problems, no backfilling earned a proportional utility of 0.64, unconstrained backfilling earned a proportional utility of 0.66, and probabilistic backfilling earned a proportional utility of 0.77. Using a repeated measures ANOVA, probabilistic backfilling is significantly better than either other mode with p < 0.02. Unconstrained backfilling and no backfilling are not statistically different: although unconstrained backfilling sometimes outperformed no backfilling, it also sometimes performed worse than no backfilling, and in the four problems that unconstrained backfilling wins, it does so marginally.

Looking more closely at the 4 problems where unconstrained backfilling outperformed probabilistic backfilling, we observe that these results in some cases are due to occasional subpar planning by the deterministic planner. For example, under some circumstances, the method allocated by the planner under a max-leaf may not be the highest utility child, even though there is enough schedule space for the latter. Unconstrained backfilling may correct this coincidentally due to its undirected nature, while probabilistic backfilling will not even consider this max-leaf if the originally scheduled method is not in danger of earning no utility. This highlights our earlier supposition that the benefit of probabilistic schedule strengthening is affected by the goodness of the original, deterministic schedule.

Figure 6.5 shows the percent increase in the number of methods in the initial schedule after backfilling for the two modes that performed backfilling.

Figure 6.5: The percent increase of the number of methods in the initial schedule after backfilling for each mode. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted to be consistent with Figure 6.4.

Comparing Figures 6.4 and 6.5, the effective, probabilistic nature of our schedule strengthening approach is emphasized. For the more certain problems to the left, targeted backfilling does not add many methods because not many are needed; for the more uncertain, difficult problems to the right, it adds methods in order to hedge against that uncertainty. In all cases but one, probabilistic backfilling added at least 1 method. Unconstrained backfilling, in contrast, has no discernible pattern in the graph from left to right and, despite resulting in less utility earned during execution, far exceeds probabilistic backfilling in the number of backfills. These figures again highlight how schedule strengthening can be limited by the goodness of the original schedule and how it is unable to fix all problems: for problems 6 and 3, for example, probabilistic backfilling does not much affect the utility earned, as it is able to make only a small number of effective strengthening actions.

There is an inherent trade-off associated with reinforcing schedules in resource-constrained circumstances: there is advantage to be gained by adding redundant methods, as seen by comparing the probabilistic backfilling results to the no backfilling results; however, this advantage can be largely negated if backfilling is done without guidance, as seen by comparing the no backfilling results to the relatively similar results of unconstrained backfilling. Our explicit use of probabilities to target and prioritize activities provides a means of effectively balancing this trade-off. Overall, these results illustrate the advantage to be gained by our probabilistic schedule strengthening approach.

6.4.2 Comparison of Strengthening Strategies

We compared the effectiveness of the different strengthening strategies by strengthening initial schedules for the 20 problems in the DARPA test suite in 6 ways: (1) pruning only; (2) swapping only; (3) backfilling only; (4) the round-robin strategy; (5) the greedy strategy; and (6) the baseline strategy [Hiatt et al., 2009]. Strengthening was performed on a 3.6 GHz Intel computer with 2GB RAM; recall that these problems initially had roughly between 50-90 methods initially scheduled. For 8 of the 20 problems, the baseline strategy did not finish by 36 hours; for these cases we considered the run time to be 36 hours and the utility earned to be the best found when it was stopped.

The resulting average expected utility after the probabilistic schedule strengthening, as well as the average running time for each of the strategies, is shown in Table 6.1. The table also shows, for comparison, how many strengthening actions each strategy successfully performed.

Table 6.1: A comparison of strengthening strategies, showing for each strategy (initial, prune, swap, backfill, round-robin, greedy, baseline) the average expected utility as a percentage of the baseline's, the average number of strengthening actions performed, and the average running time in seconds.

Our supposition of the effectiveness of the individual tactics is supported by these results: namely, that backfilling is the strongest individual tactic, followed by swapping and then pruning. Further, despite the limitations of each individual tactic due to each one's associated trade-off, their different properties can also have synergistic effects (e.g., swapping can increase schedule slack, making room for backfilling), and using all three results in stronger schedules than using any one individually.
The round-robin strategy uses, on average, fewer strengthening actions than the sum of the number of actions performed by the pruning, swapping and backfilling strategies independently (13.1 compared to 14.0); its running time, however, is much more than the sum of the running times of the three independent strategies (262.0 seconds compared to 108.1 seconds). This is because of the increased search space: when attempting to perform a valid action, instead of only trying to backfill, for example, the round-robin strategy must consider the space of backfills, swaps and prunes. Additionally, in situations where only backfills are possible, the round-robin strategy incurs a higher overhead than the pure backfilling strategy. Clearly, though, the round-robin strategy outperforms each of the individual strategies it employs at a modest runtime cost. Moreover, round-robin is almost as effective as the baseline strategy and, with a runtime that is over two orders of magnitude faster, is the preferred choice.

When compared to the greedy strategy, the round-robin strategy turned out to be better. On average it performs mostly the same; there are, however, a couple problems where round-robin outperforms greedy. An example of such a situation is when the greedy strategy backfilled a max-leaf task twice, but round-robin backfilled it and then swapped two of its children, as the swap resulted in more schedule slack and left room for other backfills. Although we do not see the difference between the two conditions to be very meaningful, the difference in run time leads us to choose round-robin as the dominant strategy.

6.4.3 Effects of Global Strengthening

We performed an experiment to compare the utility earned when executing initial deterministic schedules with the utility earned when executing strengthened versions of the schedules. We ran simulations comparing the utility earned by executing three classes of schedules: (1) the schedule generated by the deterministic, centralized planner; (2) the schedule generated by performing backfilling (the best individual tactic) only on the schedule in condition (1); and (3) the schedule generated by performing round-robin schedule strengthening on the schedule in condition (1).

For each problem in our DARPA test suite, and each condition, we ran 25 simulations, each with their own stochastically chosen durations and outcomes, but purposefully seeded such that the same set of durations and outcomes were executed by each of the three conditions. As with previous experiments, in order to directly compare the relative robustness of schedules, we executed the schedules as is, with no replanning allowed. Recall that the schedule's flexible times representation does, however, allow the time bounds of methods to float in response to execution dynamics, and we did retain basic conflict-resolution functionality that naively repairs inconsistencies in the STN, such as removing methods from a schedule if they miss their lst. Experiments were run on a network of computers with Intel Pentium D CPU 2.80GHz and Intel Pentium 4 CPU 3.60GHz processors; each agent and the simulator ran on its own machine.

Figure 6.6 shows the average utility earned by the three sets of schedules for each of the 20 problems. The utility is expressed as the fraction of the originally scheduled utility earned during execution. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did; this roughly correlates to sorting the problems by their level of uncertainty. Note that although this graph and Figure 6.4 share two conditions, the data are slightly different; this is due to the previous experiment running only 10 simulations per problem, in contrast to the 25 simulations this experiment performs. The results, while not identical, are on average very close for these conditions.

Figure 6.6: Utility for each problem earned by different strengthening strategies, expressed as the average proportion of the originally scheduled utility earned. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did.

As the figure shows, overall, round-robin schedule strengthening outperforms backfilling, which in turn outperforms the initial schedule condition. Round-robin schedule strengthening outperforms the initial schedule condition by 31.6%. When tested for significance with a repeated measures ANOVA, the differences between the conditions are significant with p < 0.02.

For problems 14, 17 and 20, round-robin strengthening identified only backfill steps to be of value, and as a result produced results identical to the backfilling-only strategy. For problems 2, 5, 8 and 15, we see that round-robin strengthening usually performs slightly worse than backfilling alone; for individual simulations of these four problems, however, it occasionally far outperforms it. This interesting behavior arises because round-robin can hedge against method failure by swapping in a lower utility, shorter duration method for a method with a small chance of earning zero utility. While for many runs this tactic sacrifices a few utility points when the original method succeeds, on average round-robin avoids the big utility loss that occurs when the method fails. For problems 18, 3 and 10, strengthening is unable to provide much help in making the schedule more robust, and so all three conditions earn roughly the same utility.

6.4.4 Effects of Run-Time Distributed Schedule Strengthening

We next ran an experiment to quantify the effect that schedule strengthening has when performed in conjunction with replanning during execution. We used a 16 problem subset of the 20 problems described above. We initially avoided 4 of the problems due to difficulties getting the planner we were using to work well with them; as the results were significant with only 16 problems, we did not spend the time necessary to incorporate them into the test suite for these experiments. Twelve of the problems involved 20 agents, 2 involved 10 agents, and 2 involved 25 agents. Three of the problems had fewer than 200 methods, 6 had between 200 and 300 methods, 5 between 300 and 400 methods, and 2 had more than 400 methods.

We ran simulations comparing three conditions: (1) agents are initially given their portion of the precomputed initial joint schedule and during execution replan as appropriate; (2) agents are initially given their portion of a strengthened version of the initial joint schedule and during execution replan as appropriate; and (3) agents are initially given their portion of a strengthened version of the initial joint schedule and replan as appropriate during execution; further, agents perform probabilistic schedule strengthening each time a replan is performed. For each problem in our test suite we ran 25 simulations, each with their own stochastically chosen durations and outcomes, but purposefully seeded such that the same set of durations and outcomes were executed by each of the three conditions. Experiments were run on a network of computers with Intel(R) Pentium(R) D CPU 2.80GHz and Intel(R) Pentium(R) 4 CPU 3.60GHz processors; each agent and the simulator ran on its own machine.

Figure 6.7 shows the average utility earned by the three conditions for each of the 16 problems. The utility is expressed as the fraction of the "omniscient" utility earned during execution. We define the omniscient utility for a given simulation to be the utility produced by a deterministic, centralized version of the planner on omniscient versions of the problems, where the actual method durations and outcomes generated by the simulator are given as inputs to the planner. As the planner is heuristic, the fraction of omniscient utility earned can be greater than one. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did; this roughly correlates to sorting the problems by their level of uncertainty.

Figure 6.7: Utility for each problem for the different online replanning conditions, expressed as the average proportion of omniscient utility earned. Problem numbers correspond to the numbers in Table 5.2, and problems are sorted by how well condition (1) did.

Online schedule strengthening results in higher utility earned for all problems. For some the difference is considerable; for others it results in only slightly more utility on average, as replanning is able to sufficiently respond to most unexpected outcomes. On average, probabilistic schedule strengthening outperforms condition (1) by 13.6%. When tested for significance with a repeated measures ANOVA, the differences between the online schedule strengthening condition and the other two conditions are significant with p < 0.01. Note that this difference is less dramatic than the difference between the probabilistic schedule strengthening and initial schedule conditions of the previous experiment of global schedule strengthening. This is because replanning is able to recover from some of the problems that arise during execution, lessening the impact of uncertainty on the outcome of execution.

Online probabilistic schedule strengthening does, however, have an additional benefit of protecting schedules against an agent's planner going down during execution, or an agent losing communication so that it can no longer send and receive information to other agents; this is because the last schedule produced will likely be more effective in the face of unexpected outcomes. Note, though, that our results do not reflect this, because we assume that communication is perfect and never fails, and so these communication problems are not present in our experiments.

Interestingly, conditions (1) and (2), which differ in which joint schedule they begin with, have very similar results for many of the problems. Ultimately this is because when an agent replans for the first time, it effectively "undoes" much of the schedule strengthening: the deterministic planner believes that redundant methods are unnecessary. This indicates that pre-strengthening schedules is not critical to the success of probabilistic schedule strengthening; still, we include it in future experiments for completeness.

Figure 6.8 compares, for the same problems, the utility earned when replanning only (condition (1)) to the utility earned when both replanning and schedule strengthening (condition (3)), with standard deviation shown.

Figure 6.8: Utility earned across various conditions, with standard deviation shown. Error bars indicate one standard deviation above and below the mean.

A notable feature of this graph is how probabilistic schedule strengthening, in general, reduces the standard deviation of the utility earned across the 25 simulations. These results emphasize how probabilistic schedule strengthening improves the average utility by helping to protect against activity failures, lessening the impact of uncertainty on the outcome of execution.

Timing measurements gathered during these experiments show that each individual call to update probability profiles and to strengthen a schedule was very fast. Individual profile updates took on average 0.003 seconds (with standard deviation σ = 0.01), and individual schedule strengthenings took on average 0.105 seconds (σ = 0.27). Overall, each agent spent an average of 1.7 seconds updating probabilities and 37.6 seconds strengthening schedules for each problem. In contrast, each individual replan took an average of 0.309 seconds (σ = 0.39), with agents spending an average of 47.7 seconds replanning. Even with the fast-paced execution of our simulations, with methods lasting between 15 and 30 seconds, these computations are well within range to keep pace with execution.

Ultimately, the results clearly show that the best approach utilizes both replanning and schedule strengthening to generate and maintain the best schedule possible: replanning provides the means to take advantage of opportunities for earning higher utility when possible, while schedule strengthening works to protect against activity failure and minimize the problems that arise to which the planner cannot sufficiently respond.
Chapter 7

Probabilistic Meta-Level Control

This chapter discusses how to perform effective meta-level control of planning actions by utilizing the information provided by PADS. The probabilistic meta-level control strategy is the final component of our probabilistic plan management approach. The goal of these control strategies is to intelligently decide, based on a probabilistic analysis, when replanning is likely to be helpful and when it is not.

Ideally, agents would replan only if it would be useful either in avoiding failure or in taking advantage of new opportunities. Similarly, they ideally would not replan if it would not be useful, either because: (1) there was no potential failure to avoid; (2) there was a possible failure but replanning could not fix it; (3) there was no new opportunity to leverage; or (4) replanning would not be able to exploit the opportunities that were there. Many of these situations are difficult to explicitly predict, in large part because it is difficult to predict whether a replan is beneficial without actually doing it. As an alternative, probabilistic meta-level control uses the probabilistic analysis to identify times where there is potential for replanning to be useful, either because there is likely a new opportunity to leverage or a possible failure to avoid.

The first section discusses the control strategy at a high level, and describes how our overall probabilistic plan management approach comes together. The two following sections provide a more detailed description of probabilistic meta-level control, and can be skipped by the casual reader. Finally, Section 7.4 shows probabilistic meta-level control and PPM experimental results.

7.1 Overview

We developed versions of probabilistic meta-level control for both the single-agent and multi-agent domains.
7.1.1 Single-Agent Domain

The probabilistic meta-level control strategy has two components in the single-agent domain. First, there is a probability-based planning horizon, called the uncertainty horizon, which limits how much of the schedule is replanned at any given time. The portion of the schedule within the horizon is considered for further PPM; the portion outside is ignored, lowering the computational overhead of managing the plan during execution. The uncertainty horizon is placed where the probability of a replan occurring before execution reaches that point in the schedule rises above a threshold. A planning horizon is an effective meta-level strategy for this domain due to its relatively flat activity network, its lack of AND nodes, and the presence of activity dependencies only at the method level.

The second meta-level control component determines when replanning is performed during execution. Because this domain has a clear maximal and minimal utility, an effective meta-level control strategy is to replan based on a threshold of expected utility. As execution progresses, as long as the expected utility of the schedule within the uncertainty horizon remains above a threshold, execution continues without replanning, because there is likely enough flexibility to successfully absorb unexpected method durations during execution and avoid failure. If the expected utility ever goes below the threshold, a replan is performed to improve the schedule. We term this strategy likely replanning. The threshold can be adjusted via trial and error as necessary to achieve the right balance between expected utility and computation time.

7.1.2 Multi-Agent Domain

The probabilistic meta-level control strategy for our oversubscribed, distributed, multi-agent domain determines when to replan and perform probabilistic schedule strengthening during execution, based on how execution affects probability profiles at the method level. Because the domain is oversubscribed and the maximal feasible utility is unknown, a likely replanning strategy thresholding on expected utility is not a good fit for this domain. Instead, meta-level control is based on learning when it is effective to replan or strengthen based on the probabilistic state of the schedule: it assumes the initial schedule is as robust as possible, and then learns when to replan and strengthen.

There are several reasons for considering only method-level profiles. First, since this is a distributed scenario, looking at the probabilistic state of the task group or other high-level tasks is not always helpful for determining whether locally replanning could be of use, and lower-level profiles provide a better indication of when this might be useful. Second, due to the presence of OR nodes in this scenario, the probability profile of the task group or another high-level task does not always accurately represent the state of the plan, due to the different uafs at each level of the tree. For example, if a task has two children, one of which will always earn utility and one of which will almost never earn utility, the task itself will have a probability of earning utility of 100%; this probability does not adequately capture the idea that one of its children will probably fail.

The probabilistic meta-level control strategy for this domain learns when to replan using a classifier, specifically decision trees with boosting (described in Section 3.4, on page 44). The classifier is learned offline, based on data from previous execution traces; we wanted also to learn when probabilistic schedule strengthening would be a good action to take. Once the classifier has been learned, it is used to decide what planning action to take at any given time during execution.

7.1.3 Probabilistic Plan Management

With a meta-level control strategy in hand, our overall probabilistic plan management approach can be used to manage plans during execution. High-level, general pseudocode for the run-time approach is shown in Algorithm 7.1; this algorithm can be instantiated depending on the specifics of probabilistic plan management for a domain. For the purposes of this discussion, we assume agent execution is a message-based system, where during execution agents take planning actions in response to update messages, either from other agents providing information such as changes in probability profiles, or from the simulator providing feedback on method execution, such as a method's actual start or end time. Such a message is referred to as updateMessage. We assume this solely for clarity of presentation.

During execution, PPM waits for updates to be received, such as about the status of a method the agent is currently executing or from other agents communicating status updates (Line 2). When a message is received, PADS is used to update all (or, for distributed cases, local) probability profiles (Line 3). Once the profiles have been updated, probabilistic meta-level control is used to decide what planning action to take, if any (Line 4). Once that action has been taken, probability profiles are updated again (Line 7), and any actions the agent needs to take to process the changes (e.g., communicate changes to other agents) are done (Line 9). This process repeats until execution halts.

For the single-agent domain, the overall structure of probabilistic plan management during execution is as follows. Each time a method finishes and the agent gets a message from the simulator about its end time (corresponding to Line 2 of Algorithm 7.1), the agent first updates the uncertainty horizon and then updates all profiles within the horizon to reflect the current state of execution (Line 3). At this point, the probabilistic meta-level control strategy is invoked (Line 4). To decide whether to replan, it considers the current expected utility of the schedule. If it is within the acceptable range, it returns null, and so the agent does nothing; otherwise, it returns replan, and so the agent replans all activities within the uncertainty horizon and updates all profiles within the horizon to reflect the new schedule (Lines 6-7). There is no processing necessary in this domain (Line 9). This process continues until execution has completed.
The agent acts accordingly. any time an agent receives an update message. Recall once again that the objective of the domain is that all tasks have exactly one method underneath them execute without violating any temporal constraints.7 Probabilistic MetaLevel Control Algorithm 7. deterministic plan manage134 . it returns replan.1: Probabilistic plan management. which indicates whether the agent should replan and strengthen. There is no processing necessary in this domain (Line 9). the agent updates its local probabilities (Line 3). whether from the simulator specifying method execution information or from another agent communicating information about remote activities (corresponding to Line 2 of Algorithm 7. If it is within the acceptable range. It then passes the probabilistic state to the probabilistic metalevel control classiﬁer. and the only source of uncertainty is in method duration. and communicates changes to interested agents (Lines 69). it returns null and so the agent does nothing. ignoring all others. The schedule is represented as an STN. only strengthen.1). 2007]. or do nothing (Line 4). and updates all proﬁles within the horizon to reﬂect the new schedule (Lines 67). The probabilistic metalevel control strategy for this domain has two components: an uncertainty horizon and likely replanning. 7. We begin by discussing a baseline. For the multiagent domain. Otherwise.2 SingleAgent Domain We developed a probabilistic metalevel control strategy for our singleagent domain [Hiatt and Simmons. and so the agent replans all activities within the uncertainty horizon. 1 2 3 4 5 6 7 8 9 10 11 Function : ProbabilisticPlanManagement () while execution continues do if updateMessage () then updateAllProfiles () act = metaLevelControlAction () if act then plan (act) updateAllProfiles () end process () end end of the schedule. This process continues until execution has completed. This process continues until execution has ﬁnished.
Because pob (mi ) is independent of pob (m1 . then. deterministic plan management system.7.. given that it starts at est (mj ). lft (mj )]. and not waste time replanning methods far in the future when they will only have to be replanned again later. µ (DUR (m)) + est (m) . mi )) where pob (mj ) is deﬁned as the probability that method mj meets its plan objective by observing the constraints in the deterministic STN. We simplify the calculation by assuming this means it ends within its range of valid end times. we would like to spend time managing only methods that are near enough to the present to warrant the effort. . with methods sorted by their scheduled start time. where the variable max horizon indicates that the horizon includes all scheduled methods. σ (DUR (m))) Multiplying these together.2.. such as a method missing a deadline or running so early or so late that a future method is unable to start in its speciﬁed time window. ASPEN is invoked both initially. We calculate the probability of a replan occurring before a method starts executing based on the baseline. and then describe our more informed. The algorithm pseudocode is shown in Algorithm 7. The highlevel equation for ﬁnding the probability that the baseline system will need to replan before the execution of some method mi in the schedule is: (1 − pob (m1 . mi ) = i pob (mj ). we can ﬁnd the probability of a replan occurring before execution reaches method mi . 7. σ (DUR (m))) − normcdf (eft (m) . The horizon is set at the method where this probability rises above the userdeﬁned horizon threshold..2. and then during execution whenever a method ﬁnishes at a time that causes an inconsistency in the STN. we ﬁnd the probability that each method m ends in the range [eft (m) . pob (m1 ... The problem. to generate an initial plan. is how to determine which methods are “near enough. which we call the horizon threshold. .. Recall that the function schedule returns the agent’s current schedule. mi−1 )... 
probabilitydriven strategies. lft (m)] given that it starts at its est: pob (m) =normcdf (lft (m) . 135 . .” We pose this as how to ﬁnd the ﬁrst method at which the probability of a replan occurring before that method is executed rises above a threshold.2 SingleAgent Domain ment strategy for this domain.1 Baseline Plan Management In the baseline plan management strategy for this domain. [eft (mj ) .2.. µ (DUR (m)) + est (m) .2 Uncertainty Horizon In general. To j=1 begin calculating. 7.
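The p_ob calculation above is straightforward to render in code. The following is a minimal illustrative sketch (not the thesis implementation); the dictionary layout for a method, with est, eft, lft, and normal duration parameters, is an assumption made for the example:

```python
import math

def normcdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_ob(m):
    """Probability that method m ends within [eft(m), lft(m)], assuming it
    starts at est(m) and its duration is N(dur_mu, dur_sigma^2), so its
    finish time is N(dur_mu + est, dur_sigma^2)."""
    mean = m["dur_mu"] + m["est"]
    return (normcdf(m["lft"], mean, m["dur_sigma"])
            - normcdf(m["eft"], mean, m["dur_sigma"]))

def replan_probability(schedule, i):
    """1 - prod over the first i methods of p_ob: the probability that the
    baseline system replans before method m_i begins executing."""
    prob_no_replan = 1.0
    for m in schedule[:i]:
        prob_no_replan *= p_ob(m)
    return 1.0 - prob_no_replan
```

Because the per-method probabilities are independent, the product form makes the replan probability monotonically non-decreasing in i, which is what lets the horizon be found with a single forward scan.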
Algorithm 7.2: Uncertainty horizon calculation for a schedule with normal duration distributions.

       Function: calcUncertaintyHorizon()
   1   pob = 1
   2   foreach method m ∈ schedule do
   3       pob = pob · (normcdf(lft(m), µ(DUR(m)) + est(m), σ(DUR(m)))
   4               − normcdf(eft(m), µ(DUR(m)) + est(m), σ(DUR(m))))
   5       if (1 − pob) > horizon threshold then return m
   6   end
   7   return max horizon

We find the uncertainty horizon boundary and use it to limit the scope of probabilistic plan management. A higher horizon threshold is a more conservative approach and results in a longer horizon, where more methods and tasks are considered during each replan. A lower threshold corresponds to a shorter horizon, and results in fewer methods being considered by PPM. The threshold can be adjusted via trial and error as necessary to achieve the right balance between expected utility and computation time. In our experimental results (Section 7.4.1), we show experiments run with various thresholds and demonstrate the impact they have on plan management.

When a replan is performed, the planner replans all methods and tasks within the uncertainty horizon. This may cause inconsistencies between portions of the schedule inside and outside of the horizon, as constraints between them are not observed while replanning. As long as at least one activity involved in the constraint remains outside of the horizon, such constraint violations are ignored. As execution progresses, these conflicts shift to be inside the uncertainty horizon, and the horizon correspondingly advances; they are handled at that time as necessary by replanning.

7.2.3 Likely Replanning

The decision of when to replan is based on the task group's expected utility within the uncertainty horizon. The expected utility calculation within the uncertainty horizon is shown in Algorithm 7.3. The algorithm to calculate this value is very similar to Algorithm 5.3 (on page 68), which calculates the expected utility of the entire schedule. The differences are that instead of iterating through all methods and tasks to update activity probability profiles, only the profiles of methods and tasks within the uncertainty horizon are updated. Similarly, when calculating po(TG), only those tasks within the uncertainty horizon are considered, ignoring methods, tasks and constraints outside of the horizon.
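The forward scan of Algorithm 7.2 can be sketched as follows. This is an illustrative rendering, not the thesis code; the schedule is assumed to be a list of method dictionaries sorted by scheduled start time, with len(schedule) standing in for max horizon:

```python
import math

def normcdf(x, mu, sigma):
    # CDF of N(mu, sigma^2) at x
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def calc_uncertainty_horizon(schedule, horizon_threshold):
    """Return the index of the first method whose prior-replan probability
    exceeds horizon_threshold; len(schedule) plays the role of max horizon
    (the horizon includes all scheduled methods)."""
    pob = 1.0
    for idx, m in enumerate(schedule):       # sorted by scheduled start time
        mean = m["dur_mu"] + m["est"]
        pob *= (normcdf(m["lft"], mean, m["dur_sigma"])
                - normcdf(m["eft"], mean, m["dur_sigma"]))
        if (1.0 - pob) > horizon_threshold:
            return idx
    return len(schedule)
```

Because the running product only shrinks, the scan can stop at the first method that crosses the threshold.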
Algorithm 7.3: Task-group expected utility calculation, within an uncertainty horizon, for the single-agent domain with normal duration distributions and temporal constraints.

   1   Function: calcExpectedUtility(uh)
   2   foreach method m_i ∈ schedule until uh do
   3       // calculate m_i's start time distribution
           ST(m_i) = truncate(est(m_i), ∞, µ(FT_s(m_{i−1})), σ(FT_s(m_{i−1})))
   4       // calculate m_i's finish time distribution
           FT(m_i) = ST(m_i) + DUR(m_i)
   5       // calculate m_i's probability of meeting its objective
           po(m_i) = normcdf(lft(m_i), µ(FT(m_i)), σ(FT(m_i))) − normcdf(eft(m_i), µ(FT(m_i)), σ(FT(m_i)))
   6       // calculate m_i's expected utility
           eu(m_i) = po(m_i)
   7       // adjust the FT distribution to be m_i's FT distribution given it succeeds
           FT_s(m_i) = truncate(eft(m_i), lft(m_i), µ(FT(m_i)), σ(FT(m_i)))
   8   end
   9   foreach T ∈ children(TG) until uh do
   10      if |scheduledChildren(T)| == 1 then
   11          po(T) = po(soleelement(T))
   12          eu(T) = eu(soleelement(T))
   13      else
   14          po(T) = 0
   15          eu(T) = 0
   16      end
   17  end
   18  po(TG) = ∏_{T ∈ children(TG) until uh} po(T);  eu(TG) = po(TG)
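A runnable sketch of this expected-utility pass is below. It makes two simplifying assumptions beyond Algorithm 7.3: the schedule is a single chain of methods, each task having exactly one scheduled method (so po(TG) is the product of the methods' po), and each truncated distribution is approximated by a moment-matched normal. The thesis's exact distribution representation may differ:

```python
import math

def _phi(x):
    # standard normal pdf; underflows to 0.0 for very large |x|
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def _xphi(x):
    # x * pdf(x), guarded so that inf * 0 does not produce NaN
    p = _phi(x)
    return 0.0 if p == 0.0 else x * p

def truncate(a, b, mu, sigma):
    """Moment-matched normal approximation of N(mu, sigma^2) truncated to [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = _Phi(beta) - _Phi(alpha)
    d = (_phi(alpha) - _phi(beta)) / z
    mean = mu + d * sigma
    var = sigma * sigma * (1.0 + (_xphi(alpha) - _xphi(beta)) / z - d * d)
    return mean, math.sqrt(max(var, 1e-12))

def calc_expected_utility(schedule, uh):
    """Sketch of Algorithm 7.3 for a chain of methods, one per task."""
    po_tg = 1.0
    prev_mu, prev_sigma = 0.0, 1e-6            # assumed start-of-execution distribution
    for m in schedule[:uh]:
        # start time: previous success finish time, truncated below at est
        st_mu, st_sigma = truncate(m["est"], float("inf"), prev_mu, prev_sigma)
        # finish time: start plus an independent normal duration
        ft_mu = st_mu + m["dur_mu"]
        ft_sigma = math.hypot(st_sigma, m["dur_sigma"])
        po_m = _Phi((m["lft"] - ft_mu) / ft_sigma) - _Phi((m["eft"] - ft_mu) / ft_sigma)
        po_tg *= po_m                          # each task's po equals its sole method's po
        # finish-time distribution given the method succeeds
        prev_mu, prev_sigma = truncate(m["eft"], m["lft"], ft_mu, ft_sigma)
    return po_tg                               # eu(TG) = po(TG)
```

The moment-matched truncation keeps every intermediate distribution normal, which is what allows the chain to be evaluated in one pass.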
Note that, as before, we set eu(TG) equal to po(TG), where po(TG) is the probability that the schedule within the horizon executes successfully. This means the range of values for eu(TG) within the horizon is [0, 1]; as we are using the value to compare to a fixed threshold, this is ideal.

As long as the expected utility of the task group remains above a threshold, called the replanning threshold, no replanning is performed; if the expected utility ever goes below the threshold, a replan is done to improve schedule utility. A high replanning threshold, close to 1, corresponds to a more conservative approach, rejecting schedules with high expected utilities and replanning more often. At very high thresholds, even a conflict-free schedule just generated by the planner may not satisfy the criterion if its probability of successful execution, and thus its expected utility, is less than 1. Lower thresholds equate to more permissive approaches, allowing schedules with less certain outcomes and lower expected utility to remain, and so performing fewer replans. The threshold can be adjusted via trial and error as necessary to achieve the right balance between utility and replanning time. In our experimental results (Section 7.4.1), we show experiments run with various thresholds and demonstrate the impact they have on plan management.

7.2.4 Probabilistic Plan Management

The overall PPM algorithm is derived from Algorithm 7.1. As before, the agent waits for update messages to be received. When one is, the agent updates all probability profiles. The function updateAllProfiles (Line 3 in Algorithm 7.1) for this domain is defined as:

    calcExpectedUtility(calcUncertaintyHorizon())

This equation finds the uncertainty horizon and updates all probability profiles within it.

The agent next determines whether to replan. If a replan is not performed, however, there may be a conflict in the STN due to a method running early or late. If that is the case, we adjust methods' scheduled durations to make the STN consistent. To do so, we first use the STN to find all valid integer durations for unexecuted methods. Then we iteratively choose for each method the valid duration closest to its original one as the method's new scheduled, deterministic duration. The durations are reset to their domain-defined values when the next replan is performed. The ability to perform this duration adjustment is why we are able to perform the extra step of extending the STN to STN∆ for our analysis (see Section 5.2, on page 63), and is what allows us to assume that methods can violate their lst as long as they do not violate their lst∆.

The meta-level control function (Line 4) is defined as:

    if eu(TG) < replanning threshold then return replan
    else return null
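The duration-adjustment step described above can be sketched as follows. This only shows the "closest valid integer duration" selection; the STN propagation that produces each method's set of valid durations is elided, and the dictionary-based data layout is hypothetical:

```python
def adjust_durations(scheduled, valid_durations):
    """Pick, for each unexecuted method, the valid integer duration closest
    to its originally scheduled one. `scheduled` maps method name to its
    scheduled duration; `valid_durations` maps method name to the integer
    durations the STN currently allows for it."""
    return {name: min(valid_durations[name], key=lambda d: abs(d - dur))
            for name, dur in scheduled.items()}
```

In the full approach this selection would be interleaved with STN re-propagation, since fixing one method's duration can narrow the valid durations of later methods.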
7.3 Multi-Agent Domain

We implemented a probabilistic meta-level control strategy for the multi-agent domain, where the plan objective is to maximize utility earned. We begin by discussing baseline plan management for this domain, then describe our probabilistic meta-level control strategy and overall probabilistic plan management approach.

7.3.1 Baseline Plan Management

We use the planning system described in [Smith et al., 2007] as our baseline plan management system for this domain. During execution, agents replan locally in order to respond to updates from the simulator specifying the current time and the results of execution for their own methods, as well as updates passed from other agents communicating the results of their execution, changes due to replanning, or changes to their schedules that occur because of messages from other agents. Specifically, replans occur when:

• A method completes earlier than its expected finish time and the next method cannot start immediately (a replan here exploits potential schedule slack)
• A method finishes later than the lst of its successor or misses a deadline
• A method starts later than requested and causes a constraint violation
• A method ends in a failure outcome
• Current time advances and causes a constraint violation

No updates are processed while an agent is replanning or schedule strengthening, and so important updates must wait until the current replan or strengthening pass, if one is occurring, has finished before they can be incorporated into the agent's schedule and responded to. Finally, after agents have replanned (or not), they have nothing to process at this point, so process does not do anything. More detail on this planning system can be found in [Smith et al., 2007].

7.3.2 Learning When to Replan

Our probabilistic meta-level control strategy is based on a classifier which classifies the current state of execution and indicates what planning action, if any, the agent should take. We use decision trees with boosting (see Section 3.4) to learn the classifier. The learning algorithm takes training examples that pair states (in our case, probabilistic information about the current schedule) and classifications (e.g., "replan and strengthen," "do nothing," etc.), and learns a function that maps from states to classifications.
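The replan triggers above can be summarized as a predicate over an update message. This is an illustrative sketch only; the message fields are invented for the example and are not the interface of the planning system of [Smith et al., 2007]:

```python
def should_replan(update):
    """Baseline replanning triggers for the multi-agent domain, expressed
    over a single (hypothetical) update-message dictionary."""
    if update["type"] == "method_end":
        if update["outcome"] == "failure":
            return True                            # failure outcome
        if update["end"] > update["deadline"]:
            return True                            # missed a deadline
        if update["end"] > update["successor_lst"]:
            return True                            # successor can no longer start in time
        if (update["end"] < update["expected_end"]
                and not update["successor_can_start_now"]):
            return True                            # early finish: exploit schedule slack
        return False
    if update["type"] in ("method_start", "time_advance"):
        return update["constraint_violation"]      # late start / time advance violation
    return False
```

Every condition maps to one bullet in the list above; anything else leaves the schedule untouched.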
3. 14. We also look at the po value of maxleaf tasks and their remote children. For each of these values. 10. “replan and strengthen. we call this state the “learning state. 4. for example.” etc. 8. 5.” “do nothing. 12. 15.. methods’ projected ﬁnish times becoming earlier or later. Considering proﬁles at this level is appropriate for distributed applications like this one: higherlevel proﬁles can be poor basis for a control algorithm because the state of the joint plan does not necessarily directly correspond to the state of an agents’ local schedule and activities. and learns a function that maps from states to classiﬁcations. 6. States are based on changes in probability proﬁles at the method and maxleaf task level that result from update messages.). as well as whether any of their remote children have been scheduled or unscheduled. This allows the metalevel control strategy to respond to. we consider the maximum increase and decrease in their value or.g. for distributions. 2. or a large change in a method’s probability of earning utility.” 140 . probabilistic information about the current schedule) and classiﬁcations (e. their average and standard deviation. the largest increase in local method po the largest decrease in local method po the largest increase in local method µ (F Tu ) the largest decrease in local method µ (F Tu ) the largest increase in local method σ (F Tu ) the largest decrease in local method σ (F Tu ) the largest increase in local method µ (F Tnu ) the largest decrease in local method µ (F Tnu ) the largest increase in local method σ (F Tnu ) the largest decrease in local method σ (F Tnu ) the largest increase in maxleaf task po the largest decrease in maxleaf task po the largest increase in any maxleaf tasks’ remote child po the largest decrease in any maxleaf tasks’ remote child po whether any maxleaf task’s remote children have been scheduled or unscheduled To distinguish this state from the probabilistic state of the schedule as a whole. 
11.7 Probabilistic MetaLevel Control examples that pair states (in our case. 13. 9. since changes in these values are indicative of whether probabilistic schedule strengthening may be useful. For each local method. The overall state consists of: 1. ﬁnish time distribution when it earns utility (F Tu ) and ﬁnish time distribution when it does not earn utility (F Tnu ) change. 7. we look at how its probability of meeting its objective (po).
We trained two different classifiers: one to decide whether to replan, and one to decide whether to replan and strengthen. We use the former in our experiments to test the effectiveness of the probabilistic meta-level control strategy when used alone; it learned when to replan based on the baseline system with no strengthening, described in Section 7.3.1, and its possible classifications were "replan only" or "do nothing." We use the latter to test the overall plan management approach, which also includes probabilistic schedule strengthening. Its possible classifications were "replan and strengthen," "strengthen only," and "do nothing." We do not consider "replan only" a condition here because, as previously argued (Section 6.4.2), replanning should always be used in conjunction with strengthening, if possible. For the rest of this section, we describe the learning approach in terms of the more general case, which includes strengthening.

To generate our training data, we used 4 training problems that closely mirrored in structure those used in our experiments. We executed the problems 5 times each. During execution, agents replanned in response to every update received (for the first classifier), or replanned and strengthened in response to every update received (for the second classifier). We analyzed the execution traces of these problems to generate training examples. Each time a replan and strengthening pass was performed by an agent in response to an update, we recorded the learning state that resulted from the changes in probabilities caused by that update. We next looked to see if either replanning or strengthening resulted in "important" changes to the schedule. Important changes were defined as changes that either: (1) removed methods originally in the schedule but not ultimately executed, or (2) added methods not originally in the schedule but ultimately executed. Note that a method could be removed from the schedule but still executed if, for example, it was added back onto the schedule at a later time; in our domain, this would be considered a non-important change. If an important change was made, the learning state was considered a "replan and strengthen" or "strengthen only" example, depending on which action resulted in the change. If no important change was made, the learning state was considered a "do nothing" example.

Using this data, we learned a classifier of 100 decision trees, each of height 2. When given a state as an input, the classifier outputs a sum of the weighted votes for each of the three classes. Each sum represents the strength of the belief that the example is in that class according to the classifier. In classic machine learning, the classification with the maximal sum is used as the state's classification. In our domain, however, the data are very noisy and, despite using decision trees with boosting (which are typically effective with noisy data), we found that the noisy data were still adversely affecting classification and almost all learning states were being classified as "do nothing." The source of the noise is the lack of direct correlation between a schedule's probabilistic state and whether a replan will improve it.
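The "important change" labeling rule described above can be sketched as a function over the sets of scheduled methods before and after each action. The set-based representation is an assumption made for illustration:

```python
def label_example(before, after_replan, after_strengthen, executed):
    """Label a learning state by whether replanning or strengthening made an
    important change: removing methods that were never executed, or adding
    methods that ultimately were. Arguments are sets of method names."""
    def important(old, new):
        removed_never_executed = (old - new) - executed
        added_and_executed = (new - old) & executed
        return bool(removed_never_executed or added_and_executed)

    if important(before, after_replan):
        return "replan_and_strengthen"     # the replan caused the change
    if important(after_replan, after_strengthen):
        return "strengthen_only"           # the strengthening pass caused it
    return "do_nothing"
```

Intersecting against the set of ultimately executed methods is what makes a remove-then-re-add sequence count as a non-important change.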
If something is drastically wrong with the schedule, for example, a replan might not be able to fix it; similarly, if nothing is wrong with the schedule, a replan still might improve scheduled utility. In terms of our classifier, this means that in situations where the need to replan may seem obvious according to the learning state (e.g., many probability values have changed), replanning is not always useful, and so in our training data such learning states were often classified as "do nothing."

We address this by changing the interpretation of the weighted votes for the three classes output by the classifier. Instead of treating them as absolute values, we treat them as relative values and compare them to a baseline classification to see whether the weight of the classifications for "replan and strengthen" or "strengthen only" increase, decrease, or remain the same. If the weight for one of the classifications increases, such as "replan and strengthen," then we determine that replanning and strengthening may be useful in that case. More technically, we classify the baseline learning state where no probabilities have changed (i.e., a learning state of all zeroes) to get a baseline classification. Then, for each class, we calculate the proportion that the current learning state's weighted votes have as compared to the baseline example, and use that as the class's new weighted vote. As a second step to address the noisy data, since it is hard to predict whether a replan will be useful, we randomly choose a class, with each class weighted by their adjusted votes.

7.3.3 Probabilistic Plan Management

The overall PPM algorithm is derived from Algorithm 7.1. As before, the agent waits for update messages to be received. When one is, the agent updates all of its local probability profiles, via a call to updateLocalProfiles, and generates the learning state based on the new profile probabilities. Then, the meta-level control algorithm is invoked to classify the learning state. These steps are shown below; the pseudocode for the probabilistic meta-level control algorithm for this domain implements metaLevelControlAction from Algorithm 7.1.
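The control step just described, classifying the learning state, scaling the votes by a zero-change baseline, and sampling an action in proportion to the adjusted weights, can be sketched in Python as follows. Here `classify` stands in for the boosted decision-tree classifier and is assumed to return the three weighted vote sums, all nonzero for the baseline state:

```python
import random

def meta_level_control_action(classify, state, rng=random.random):
    """Scale the classifier's weighted votes by the zero-state baseline and
    sample a planning action proportionally to the adjusted weights."""
    rs, so, dn = classify(state)                      # replan+strengthen, strengthen, nothing
    rs_b, so_b, dn_b = classify([0.0] * len(state))   # baseline: nothing changed
    rs, so, dn = rs / rs_b, so / so_b, dn / dn_b      # relative vote weights
    r = rng() * (rs + so + dn)
    if r < rs:
        return "replan_and_strengthen"
    if r < rs + so:
        return "strengthen_only"
    return "do_nothing"
```

Sampling, rather than taking the maximum adjusted weight, is the second noise-handling step: a state whose replan vote barely rises above baseline still triggers a replan occasionally rather than never.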
    updateLocalProfiles()
    rs, so, dn = classify(learning state)
    rs_b, so_b, dn_b = classify(0, 0, ..., 0)
    rs = rs / rs_b;  so = so / so_b;  dn = dn / dn_b
    rand = random(0, rs + so + dn)
    switch rand
        case rand < rs:                      return replan(), roundRobinStrengthening()
        case rs ≤ rand < rs + so:            return roundRobinStrengthening()
        case rs + so ≤ rand < rs + so + dn:  return null
    end

The algorithm classifies both the learning state and the baseline learning state of all zeros, and scales the learning state's classification weights as described above. Then, it randomly samples a class from the scaled values. Based on this, it returns the appropriate planning action(s), and the agent performs that action as per Algorithm 7.1. Note that probabilities must be updated before probabilistic schedule strengthening can be performed; it is not necessary, however, to update probability profiles after a strengthen, as probabilistic schedule strengthening takes care of that.

If no replan or strengthening pass is performed, execution continues with the schedule as-is. If there is an inconsistency in the STN due to a method running late, however, the next method is removed from the schedule to make the STN consistent again. At this point, all that is left to do is for the agent to process the changes (process in Algorithm 7.1) by sending updates to other agents notifying them of appropriate changes.

7.4 Experiments and Results

We performed experiments to characterize the benefits of probabilistic meta-level control for the single-agent domain, which uses a thresholding meta-level control strategy, as well as for the multi-agent domain, which uses a learning-based meta-level control strategy. Finally, we ran additional experiments characterizing the overall benefit of probabilistic plan management for each domain.
7.4.1 Thresholding Meta-Level Control

We performed several experiments to characterize the effectiveness of the different aspects of meta-level control, as well as PPM overall, for the single-agent domain [Hiatt and Simmons, 2007]. We used as our test problems instances PSP94, PSP100 and PSP107 of the mm-j20 benchmark suite of the resource-relaxed Multi-Mode Resource-Constrained Project Scheduling Problem with Minimal and Maximal Time Lags (MRCPSP/max) [Schwindt, 1998]. The problems were augmented as described in Section 3.3. Method duration means and standard deviations were randomly re-picked for each simulated run.

We developed a simple simulator to execute schedules for this domain. In this simple simulator, time does not advance while replanning is performed; instead, time effectively is paused and resumed when the replan is complete. Execution starts at time zero, and methods are started at their est. To simulate method execution, the simulator picks a random duration for the method based on its duration distribution, advances time by that duration, and updates the STN as appropriate.

Uncertainty Horizon Analysis

The first experiment tested how the choice of a horizon threshold affects the performance of execution. In this experiment, the agent replanned whenever a method ended at a duration other than its scheduled one, but replanned activities only within the uncertainty horizon. We ran experiments under 5 conditions: the baseline deterministic system, and four conditions testing horizon threshold values of 0.75, 0.9, 0.95 and 0.99. We performed 250 simulation runs for each condition and problem instance combination. The experiment was run on a 3.4 GHz Pentium D computer with 2 GB RAM.

The results are shown in Figure 7.1. Figure 7.1(a) shows both the average makespan of the executed schedule alone and the average makespan plus average time spent managing and replanning, which we refer to as execution time. It also has error bars for PSP94 showing the standard deviation of the execution time; for clarity of the graph, we omit error bars for the other two problems. We believe the high variances in execution time, shown by the error bars, are due to the randomness of execution and the randomness of ASPEN, the planner. Figure 7.1(b) shows the average number of replans per simulation run. A threshold of 0.75 corresponds to replanning approximately 2 methods in the future, and a threshold of 0.99 replans approximately all 20 methods; in between these two extremes, the number of methods scales roughly exponentially as the threshold linearly increases.
Figure 7.1: Uncertainty horizon results. (a) Average makespan and execution time using an uncertainty horizon at various thresholds. (b) Average number of replans using an uncertainty horizon at various thresholds. (c) Average utility using an uncertainty horizon at various thresholds.
Figure 7.1(c) shows average utility. Simulations that did not earn utility failed due to the planner timing out during a replan, either because a solution did not exist (e.g., a constraint violation could not be repaired) or a solution was not found in time. We began by allowing ASPEN to take 100,000 repair iterations to replan (Section 3.1); this resulted, however, in the planner timing out unintuitively often. Believing this to be due in part to ASPEN's randomness, we therefore switched to a modified random-restart approach [Schöning, 1999], invoking it 5 separate times with an increasing limit of repair iterations: 5000, 10000, 25000, and finally 50000. This approach significantly decreased the number of time outs. Note that the notion of timeouts would be less important when using more methodical, and less randomized, planners. All other statistics reported are compiled from only those runs that completed successfully.

Figure 7.1(a) shows that, as the horizon threshold increases, the average schedule makespan decreases, an unanticipated benefit. It occurs because with longer horizons, the planner is able to generate better overall plans, since it plans with more complete information. As the horizon threshold increases, however, the average execution time also increases. This is due to replanning more methods and tasks each time a replan is performed, increasing the amount of time each replan takes; clearly, each replan must take longer as the threshold approaches 1. Although as the threshold in Figure 7.1(a) decreases the execution time plateaus, the balance of management time and makespan continues to change, with the schedule span increasing as the time spent managing continues to decrease. This suggests the use of a threshold where the overall execution time is near the lower bound and the schedule span is as low as possible, such as at the thresholds of 0.75 or 0.9.

The trend in the number of replans in Figure 7.1(b) occurs because, as the threshold decreases, the number of replans performed decreases, and because extending the uncertainty horizon brings extra conflicts inside it that must then be replanned. This point is highlighted when one considers Figure 7.1(c), which shows that despite the execution time and replanning time increasing as the horizon threshold increases, higher thresholds earn on average higher utility; we believe this is both because they have fewer conflicts between activities inside and outside of the horizon, and because fewer of the replans performed result in a timeout.

A repeated measures ANOVA test showed that the choice of horizon threshold significantly affects schedule makespan, execution time, number of replans, and utility earned, all with a confidence of 5% or better.
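The modified random-restart scheme described above can be sketched as a simple wrapper around a randomized planner call. Here `try_replan` is a stand-in for invoking ASPEN with a repair-iteration limit, and the default iteration schedule is illustrative rather than the exact one used in the experiments:

```python
def replan_with_restarts(try_replan, limits=(100, 5000, 10000, 25000, 50000)):
    """Invoke a randomized planner repeatedly with an increasing limit of
    repair iterations, stopping at the first success.
    `try_replan(limit)` should return a plan, or None on failure/timeout."""
    for limit in limits:
        plan = try_replan(limit)
        if plan is not None:
            return plan
    return None   # no solution found within the allotted iterations
```

The benefit comes from the planner's randomness: a fresh short attempt often succeeds where one long attempt stalls, so the expected time to a solution drops even though the worst-case total iteration budget is similar.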
Likely Replanning Analysis

We also tested how the choice of a replanning threshold affects the performance of execution. In this experiment, the agent replanned all activities (i.e., there was no uncertainty horizon), but replanned only when the schedule's expected utility fell below the replanning threshold. We ran experiments under 7 conditions, testing replanning threshold values of 0.01, 0.05, 0.25, 0.45, 0.65, 0.85 and 1. Note that a replanning threshold of 1 has the potential to replan even if there are no actual conflicts in the plan, since there may still be a small possibility of a constraint violation. We performed 250 simulation runs for each condition and problem instance combination. Method duration means and standard deviations were randomly re-picked for each simulated run. The experiment was run on a 3.4 GHz Pentium D computer with 2 GB RAM.

The results are shown in Figure 7.2. Figure 7.2(a) shows both the average makespan of the executed schedule and the execution time, with error bars for PSP94 showing the standard deviation of execution time; as with the uncertainty horizon results, we do not show error bars for the other two problems, for clarity of the graph, and we believe that the high variances in execution time are due to the randomness of execution and the planner. Figure 7.2(b) shows the average number of replan episodes per simulation run, and Figure 7.2(c) shows the average utility earned. The same modified random-restart approach was utilized, and, as with the uncertainty horizon results, simulations that did not earn utility failed due to the planner timing out during a replan, either because a solution did not exist (e.g., a constraint violation could not be repaired) or the solution was not found in time. All other statistics reported are compiled from only those runs that completed successfully.

As Figure 7.2(a) shows, schedule makespan tends to decrease as the replanning threshold increases, due to its requirement that schedules have a certain amount of expected utility, which loosely correlates with schedule temporal flexibility. The threshold of 1 clearly takes this to an extreme. The overall execution time, however, increases, as more replans are performed in response to more schedules' expected utility not meeting the replanning threshold; Figure 7.2(b) explicitly shows this trend of the average number of replans increasing as the replanning threshold increases. The threshold of 1, however, becomes problematic when one considers Figure 7.2(c). The high expectations incurred by a replanning threshold of 1 result in a higher number of failures, due to more frequent replans which have a chance of not finding a solution in the allotted amount of time. We expect that the average utility would increase if we increased the number of iterations allowed to the planner. Figure 7.2(a) also shows that as the replanning threshold decreases, the execution time plateaus, but the balance of management time and makespan continues to change, with the schedule span increasing as the time spent managing continues to decrease.
[Figure 7.2: Likely replanning results. (a) Average makespan and execution time using likely replanning at various thresholds. (b) Average number of replan episodes using likely replanning at various thresholds. (c) Average utility using likely replanning at various thresholds.]
To summarize the results, a repeated measures ANOVA showed that the choice of replanning threshold significantly affects schedule makespan, execution time, number of replans and utility, all with a confidence of 5% or better. This again suggests the use of a threshold where the overall execution time is near the lower bound and the schedule span is as low as possible.

Analysis of Probabilistic Plan Management

We ran a third experiment to test overall PPM for this domain, which integrates the probabilistic meta-level control strategies of uncertainty horizon and likely replanning. Specifically, the experiment compared the probabilistic meta-level control strategy with the baseline management strategy for this domain (Section 7.1); in the experiment we also varied the thresholds for the uncertainty horizon and probabilistic flexibility. As with the two previous experiments, this experiment was run on a 3.4 GHz Pentium D computer with 2 GB RAM. Method duration means and standard deviations were randomly re-picked for each simulated run. We performed 250 simulation runs for each condition and problem instance combination.

Figure 7.3 shows the average execution time across all three problems using various combinations of thresholds of the uncertainty horizon and likely replanning; note the axes are reversed in the diagram for legibility. The bars are also labeled with their numerical values. The label nouh refers to no uncertainty horizon being used (and so only using likely replanning), and the label nolr refers to not utilizing likely replanning (and so only using the uncertainty horizon); the bar with both nouh and nolr, then, is baseline deterministic plan management.

The analyses of the two individual components hold when we consider the simulations that incorporate both likely replanning and the uncertainty horizon. Figure 7.3 also shows that as the thresholds decrease, the average execution time in general decreases, both because fewer tasks are replanned during each replan episode and because fewer replans are made; the execution time plateaus at a global minimum. We also show the average utility, in Figure 7.4. As Figure 7.4 shows, the average utility is highest at higher horizon thresholds and lower replanning thresholds, such as at a threshold of 0.65 or 0.85. These figures also show that using the uncertainty horizon and likely replanning together results in a lower execution time than each technique results in separately, while retaining the high average utility earned when using likely replanning alone. The graphs also indicate that after a certain point the analysis is not extremely parameter sensitive.
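The threshold-selection rule suggested by these results, taking the combination whose execution time is near the lower bound without giving up utility, can be sketched as a small sweep over measured results. The data structure and all names here are hypothetical:

```python
def pick_thresholds(results, utility_tolerance=0.02):
    """Choose a (horizon_threshold, replanning_threshold) pair.

    results maps (horizon_thresh, replan_thresh) -> (avg_exec_time,
    avg_utility). Keep only combinations whose utility is within
    utility_tolerance of the best observed utility, then take the one
    with the lowest execution time."""
    best_utility = max(u for _, u in results.values())
    acceptable = {k: v for k, v in results.items()
                  if v[1] >= best_utility - utility_tolerance}
    return min(acceptable, key=lambda k: acceptable[k][0])
```

Because the analysis is not very parameter sensitive past a point, a coarse sweep like this is usually enough to land near the plateau.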
Using a horizon threshold of 0.9 and a replanning threshold of 0.85, PPM decreases the schedule execution time from an average of 187s to an average of 144s, a 23% reduction, while increasing average utility by 30%, and achieves a significantly better balance of time spent replanning during execution versus schedule makespan. Overall, these results show that the benefit of PPM is clear.

[Figure 7.3: Average execution time across all three problems using PPM with both the uncertainty horizon and likely replanning at various thresholds.]

7.4.2 Learning Probabilistic Meta-Level Control

We performed several experiments to characterize the effectiveness of the different aspects of meta-level control, as well as PPM overall, for the multi-agent domain. To perform the experiments, we use a multi-agent, distributed execution environment, which allows deliberation to be separated from execution. Agents' planners and executors run on separate threads. The executor executes the current schedule by activating the next pending activity as soon as all of its causal and temporal constraints are satisfied. A simulator also runs separately; it receives method start commands from the agent executors and provides back execution results as they become known. All agents, as well as the simulator, run and communicate asynchronously but are grounded by a current understanding of time.
Time, for our purposes, is measured in an abstract unit of "ticks"; method durations and deadlines, e.g., are expressed in ticks. In this test suite, one tick equals 5 seconds.

[Figure 7.4: Average utility across all three problems using PPM with both the uncertainty horizon and likely replanning at various thresholds.]

Test Suite

We created a test suite specifically for this series of experiments in order to have problems that were structurally similar to each other, which promotes consistency in the utility of the results, for the benefit of the learning algorithm. The test suite was generated by an in-house problem generator; the structures of the problems were randomly created according to a variable template. All problems had 10 agents. Each problem had, underneath the sum taskgroup, two subproblems with a sum_and uaf. The subproblems had between 3 and 6 tasks underneath them; we call these tasks "windows" for clarity. The windows had either sum_and or sum uafs, and all deadlines were imposed at this level. Each window had 4 tasks underneath it. All tasks at this level were max uafs and had four child methods each. Two of these four had a utility and expected duration of 10; the other two had a utility and expected duration of 5. For our purposes, methods have deterministic utility. The methods were also typically assigned between two different agents, and there were enabling constraints between many of them. The duration distributions were 3-point distributions.
The highest point was 1.3 times the expected duration, the middle was the expected duration, and the lowest was 0.7 times the expected duration. Forty percent of methods had a 20% likelihood of a failure outcome; in general, this left roughly 4 execution outcomes possible for each method. There were no disablers, facilitators or hinderers.

Several parameters were varied throughout problem generation to get our overall test suite:

• Subproblem overlap - how much the execution intervals of the subproblems overlapped.
• Window overlap - how much the execution intervals of windows overlapped.
• Deadline tightness - how tight the deadlines were at the window level.

Each problem in our test suite was generated with a different set of values for these three parameters. The generator then constructed problems randomly, according to the structure above, using the parameters as a guide to determine deadlines, etc.

Table 7.1 shows the main features of each of the problems, sorted by the number of methods and average number of method outcomes. The table also shows the number of max-leaf tasks, tasks overall, enables NLEs, and ticks for each problem. In general, the problems have between 128 and 192 methods in the activity network. The "ticks" field specifies the maximum length of execution for each problem; recall that method durations, deadlines, etc. for this domain are represented in terms of ticks, and that for our purposes, one tick equals 5 seconds.

Analysis of Probabilistic Meta-Level Control

We performed two experiments in which we try to characterize the benefits gained by using our probabilistic meta-level control strategy. For our first experiment, we tested probabilistic meta-level control alone, without schedule strengthening. We ran trials under three different control algorithms:

1. control-baseline - replanning according to the baseline, deterministic plan management (described in Section 7.1).
2. control-likely - replanning according to the probabilistic meta-level control strategy.
3. control-always - replanning in response to every update.

The control-always condition was included to provide an approximate upper bound for how much replanning could be supported during execution; we are most interested in the difference between the control-baseline and control-likely conditions to measure the effectiveness of probabilistic meta-level control.
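The method outcome model described above can be sketched as below. The sketch assumes the three duration points are equally likely, which the text does not specify, and all names are hypothetical:

```python
import random

def sample_outcome(expected, fail_prob=0.0, rng=random):
    """Sample one execution outcome for a method. With probability
    fail_prob the method fails outright; otherwise its duration is
    drawn from a 3-point distribution at 0.7x, 1.0x, or 1.3x the
    expected duration (uniform weights assumed here)."""
    if rng.random() < fail_prob:
        return ('fail', None)
    duration = expected * rng.choice([0.7, 1.0, 1.3])
    return ('success', duration)
```

With a 20% failure likelihood this gives a method the "roughly 4 execution outcomes" mentioned above: three durations plus failure.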
Table 7.1: Characteristics of the 20 generated test suite problems.

      methods   avg # outcomes   max-leafs   tasks   enable NLEs   ticks
  1     128          3.91           32         43         2         125
  2     128          4.01           32         43         2         164
  3     128            -            32         43         2          -
  4     128          4.10           32         43         2         210
  5     128          4.17           32         43         2         131
  6     128          4.20           32         43         2         183
  7     128          4.20           32         43         2          79
  8     128          4.24           32         43         2          76
  9     128          4.27           32         43         2         120
 10     144          4.33           36         48         3          93
 11     144          4.33           36         48         3         233
 12     160          4.03           40         53         4         101
 13     160          4.11           40         53         4          80
 14     160          4.14           40         53         4         164
 15     160          4.14           40         53         4         233
 16     160          4.16           40         53         4         212
 17     160          4.22           40         53         4         221
 18     176          4.28           44         58         5         150
 19     176          4.50           44         58         5         169
 20     192          4.08           48         63         6         232
Agent threads were distributed among 10 3.6 GHz Intel computers, each with 2 GB RAM. For each problem and condition combination, we ran 10 simulations, each with their own stochastically chosen durations and outcomes. To ensure fairness across conditions, these simulations were purposefully seeded such that the same set of durations and outcomes were executed by each of the conditions of the experiment.

Table 7.2 shows how much effort was spent replanning for each of the conditions. It specifies the average time spent on PADS by each agent updating profiles (which is, of course, non-zero only for the control-likely case). It also shows the average number of replans performed by each agent, as well as the number of update messages received; recall that agents decide whether to replan in response to update messages (Section 7.1.3). Note how PADS causes more messages to be passed during execution, as probability profiles tend to change more frequently than their deterministic counterparts.

Table 7.2: Average amount of effort spent managing plans under various probabilistic meta-level control strategies.

                   PADS time   replanning time   num replans   num update messages
control-baseline       0             5.9             40.8            114.2
control-always         0            17.4            117.2            117.8
control-likely        1.3            1.7             10.2            149.7

The table shows that the control-always condition performs almost three times as many replans as does the control-baseline condition, which replans in response to 36% of the update messages it receives. The control-likely condition, on the other hand, performs roughly a quarter of the number of replans of the control-baseline condition, and replans in response to 7% of the update messages it receives. Despite the overhead that PADS incurs, the overall effect of the probabilistic meta-level control algorithm is a significant decrease in computational effort, reducing the plan management time by 49%.

Figure 7.5 shows the average utility earned by each condition. The utility is expressed as the fraction of the omniscient utility earned during execution. We define the omniscient utility for a given simulation to be the utility produced by a deterministic, centralized version of the planner on omniscient versions of the problems, where the actual method durations and outcomes generated by the simulator are given as inputs. As the planner is heuristic, the fraction of omniscient utility earned can be greater than one. Problem numbers correspond to the numbers in Table 7.1, and the problems are sorted by how well the control-baseline condition performs.
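The seeding discipline described above, in which every condition executes the identical stream of durations and outcomes, can be sketched as a paired-run harness. `simulate` here is a toy stand-in for the real simulator, and all names are hypothetical:

```python
import random

def simulate(policy, durations):
    # Toy stand-in for a full simulation: "utility" is just a function
    # of the sampled durations minus the policy's fixed overhead.
    return sum(durations) - policy['overhead']

def run_paired(policies, n_runs=10, base_seed=42):
    """Run every condition on the same seeded duration streams, so that
    differences in earned utility reflect the policy rather than
    sampling luck (a paired comparison)."""
    results = {name: [] for name in policies}
    for run in range(n_runs):
        rng = random.Random(base_seed + run)        # one seed per run...
        durations = [rng.uniform(5, 15) for _ in range(20)]
        for name, policy in policies.items():       # ...shared by all conditions
            results[name].append(simulate(policy, durations))
    return results
```

Pairing the runs this way is what makes a repeated-measures analysis, as used throughout this section, appropriate.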
This ordering roughly correlates to sorting the problems by increasing difficulty.

[Figure 7.5: Average proportion of omniscient utility earned for various probabilistic meta-level control strategies.]

On average, the control-likely condition slightly outperforms the control-baseline condition with respect to utility, although not significantly so. An important feature of this graph is the difference between control-baseline and control-always. We had expected control-always to always outperform the control-baseline condition; in fact, their behavior disproves our conjecture: control-always does outperform control-baseline on some of the problems, but there are several in which it performs worse than the control-baseline condition. On average, control-always earns slightly less utility than the control-baseline condition, although again without a significant difference. This is, in part, due to control-always replanning at inopportune times and missing the opportunity to respond to an important update (such as an important, remote method failing), as the system is unresponsive to updates during replans.

These results establish the control-likely condition as the best option. By using probabilistic meta-level control, deliberation during execution is reduced by 49%, while still maintaining high-utility schedules.

Our second experiment tested probabilistic meta-level control in conjunction with probabilistic schedule strengthening. For each problem in the generated test suite, we ran simulations executing the problem under a variety of control algorithms:

1. PPM-always - replanning and strengthening in response to every update.
2. PPM-change - replanning and strengthening whenever anything in the local schedule's deterministic state changes, or when a remote max-leaf task's deterministic state changes; strengthening (without replanning) when anything in the probabilistic state changes.

3. PPM-baseline - replanning and strengthening whenever the baseline, deterministic plan management system chooses to replan.

4. PPM-likely - replanning and strengthening, or only strengthening, according to the probabilistic meta-level control strategy.

5. PPM-interval - replanning and strengthening, or only strengthening, at fixed intervals, with roughly the same frequency as PPM-likely.

6. PPM-random - replanning and strengthening, or only strengthening, at random times, with roughly the same frequency as PPM-likely.

The PPM-always and PPM-change conditions were included to provide approximate upper bounds for how much replanning and strengthening could be supported during execution. After seeing the above results for control-always, however, we expected PPM-always and PPM-change to be very adversely affected by their large computational demands, which far exceed those of control-always; we conjectured that PPM-likely would have a considerable drop in computational effort as compared to these conditions, while meeting or exceeding their average utility.

The PPM-interval and PPM-random conditions were included to test our claim that PPM-likely intelligently limits replanning by correctly identifying when to replan. In these conditions, each agent replanned and strengthened, or only strengthened, with roughly the same frequency as PPM-likely, but not necessarily at the same times. For each simulation performed by PPM-likely, we calculated, for each problem and each agent, the ratio of replans and strengthens performed to the number of updates received; separate ratios were calculated for replanning and strengthening together, and for strengthening only. We conservatively took the maximum of these ratios, and then passed the frequencies to the appropriate agents during execution so they could use them in their control algorithm. The frequency was determined with respect to the number of updates received; otherwise, a replan and strengthen were programmed to occur automatically if a replan or strengthen had not happened in 2.5 times the expected interval between replans. The design decision to base the frequency of replanning on updates instead of time was made with the belief that this would prove to be the stronger strategy, as replans and strengthens would be more likely to be performed at highly dynamic times.

Agent threads were distributed among 10 3.6 GHz Intel computers, each with 2 GB RAM. For each problem and condition combination, we ran 10 simulations.
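The frequency-matching scheme for PPM-interval, with its fallback at 2.5 times the expected interval, can be sketched as follows; the names and exact bookkeeping are hypothetical:

```python
def interval_replan_times(update_times, ratio, expected_interval, slack=2.5):
    """Schedule replans at a fixed update frequency matched to an
    observed replan/update ratio, with a fallback replan whenever
    slack * expected_interval passes with no replan.

    update_times: times at which updates arrive, in order.
    ratio: replans performed per update, taken from a PPM-likely run.
    """
    period = max(1, round(1 / ratio))  # replan once every `period` updates
    times, last = [], 0.0
    for i, t in enumerate(update_times, start=1):
        if i % period == 0 or (t - last) > slack * expected_interval:
            times.append(t)
            last = t
    return times
```

A "random" variant would draw the replan points stochastically at the same expected rate instead of using a fixed period.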
Each simulation had its own stochastically chosen durations and outcomes. To ensure fairness across conditions, these simulations were purposefully seeded such that the same set of durations and outcomes were executed by each of the conditions of the experiments.

Table 7.3 shows how much effort was spent during plan management for each of the conditions. The table shows the average time spent by each agent on PADS updating profiles, strengthening schedules, and replanning. It also shows the average number of schedule strengthenings, and replans and strengthenings, each agent performed; note that the column showing the number of schedule strengthenings specifies the number of times the classifier returned "strengthen only," and does not include strengthening following a replan.

[Table 7.3: Average amount of effort spent managing plans under various probabilistic meta-level control strategies using strengthening. For each of the conditions PPM-always, PPM-change, PPM-baseline, PPM-likely, PPM-interval and PPM-random, the table reports PADS time, replan time, strengthen time, the number of replans and strengthens, and the number of strengthen-only episodes.]

The results are what we expected. Clearly, PPM-always provides an upper bound on computation time, with each agent spending a total of 113.3 seconds managing execution; this is because it replans and strengthens even when nothing in the deterministic or probabilistic state has changed. The average time spent on each replan for this condition, however, is much lower than the average for all the other conditions, as replans and strengthens that do not actually modify the schedule tend to be faster than those that do. PPM-change is next in terms of computation time (79.6 seconds), followed by PPM-baseline (54.3 seconds). PPM-likely, with a total management time of 25.6 seconds, drastically reduces planning effort compared to these conditions. Note that PPM-interval and PPM-random end up with slightly greater numbers of replans and strengthens than PPM-likely; this is due to their conservative implementation, which uses the maximum of the replanning and strengthening ratios from PPM-likely.

Figure 7.6 shows the average utility earned by each condition. The utility is expressed as the fraction of the omniscient utility earned during execution.
We define the omniscient utility for a given simulation to be the utility produced by a deterministic, centralized version of the planner on omniscient versions of the problems, where the actual method durations and outcomes generated by the simulator are given as inputs. As the planner is heuristic, the fraction of omniscient utility earned can be greater than one. Problem numbers correspond to the numbers in Table 7.1, and the problems are sorted by how well the PPM-always condition did.

PPM-baseline and PPM-likely are very close for all of the problems, with PPM-likely ending up earning only a little more utility. Surprisingly, and in contrast to the control-always condition (Figure 7.5), PPM-always and PPM-change did not seem to be as adversely affected by their high computational load. Looking into this further, we discovered that even though these conditions do suffer from lack of responsiveness, utility in general does not suffer very much because their schedules are strengthened, protecting them against execution outcomes that would otherwise have led to failure in the face of their unresponsiveness. This highlights one of the benefits of probabilistic schedule strengthening: it protects against agents not being responsive. Along these lines, we do expect that if we lessened the number of seconds that pass per tick (currently, that value is 5), PPM-always, PPM-change, and perhaps even PPM-baseline, would be adversely affected by their high computational loads, and probabilistic schedule strengthening would not be able to sufficiently protect them from the resulting utility loss. We expect, therefore, that in such a situation probabilistic meta-level control would become a necessity for executing schedules with high utility, instead of only reducing the amount of computation.

Notice that PPM-likely is significantly better than PPM-interval and PPM-random despite replanning and strengthening fewer times. The results of PPM-interval and PPM-random confirm our claim that PPM-likely intelligently chooses not only how much to replan, but also when to replan: although they performed slightly more replans and strengthens on average than PPM-likely, on a number of problems their performance was much worse due to their unfocused manner of replanning and strengthening.

Using a repeated measures ANOVA, no combination of PPM-always, PPM-change, PPM-baseline or PPM-likely is statistically significantly different. Each of these conditions, in fact, is significantly better than both PPM-interval and PPM-random with p < 0.001; PPM-interval and PPM-random are significantly different from each other with the less confident p value of p < 0.09, with PPM-interval being the worse condition.

These results confirm that PPM-likely does, in fact, intelligently limit replanning and strengthening, eliminating likely wasteful and unnecessary planning effort during execution, and furthering our claim of PPM-likely's dominance over these strategies. Using both replanning and probabilistic schedule strengthening together truly does allow agents to take advantage of opportunities that may arise during execution while also protecting their schedules against unexpected or undesirable outcomes.
Overall, the benefit of probabilistic meta-level control is clear. When PPM-likely is used, the amount of time spent replanning and schedule strengthening is decreased by 79% as compared to PPM-always, and 55% as compared to PPM-baseline, with no significant difference in utility earned.

[Figure 7.6: Average proportion of omniscient utility earned for various probabilistic meta-level control strategies.]

Analysis of Probabilistic Plan Management

To highlight the overall benefit of PPM, we ran an experiment comparing PPM-likely with four other conditions:

1. control-baseline - replanning whenever the baseline, deterministic system would replan.
2. control-likely - replanning according to the probabilistic meta-level control strategy.
3. ds-baseline - replanning and deterministically strengthening whenever the baseline, deterministic system would have replanned.
4. PPM-baseline - replanning and (probabilistically) strengthening whenever the baseline, deterministic plan management system chooses to replan.
The first, second and fourth conditions are straightforward, and are repeated from the above experiments. The condition ds-baseline is similar to control-baseline, with the additional step of following every replan with a less informed, deterministic version of schedule strengthening [Gallagher, 2009]. Deterministic strengthening makes two types of modifications to the schedule in order to protect it from activity failure:

• Schedule a backup method for methods under a max-leaf that either have a non-zero failure likelihood or whose est plus their maximum duration is greater than their deadline. The backup method must have an average utility of at least half the average utility of the original method.

• Swap in a shorter-duration sibling for methods under a max-leaf whose est plus their maximum duration is greater than the lst of the subsequently scheduled method (on the same local schedule). The sibling method's average utility must be at least half the average utility of the original method.

As with the previous experiment, agent threads were distributed among 10 3.6 GHz Intel computers, each with 2 GB RAM. For each problem and condition combination, we ran 10 simulations, each with their own stochastically chosen durations and outcomes. To ensure fairness across conditions, these simulations were purposefully seeded such that the same set of durations and outcomes were executed by each of the conditions of the experiments.

Table 7.4 shows the average amount of time spent managing plans under the various strategies. It also shows the average number of schedule strengthenings, and replans and strengthenings, each agent performed on average; note that the column showing the number of schedule strengthens specifies the number of times the classifier returned "strengthen only," and does not include strengthening following a replan. For the deterministic conditions that do not strengthen, the replans and strengthens column denotes the average number of replans performed.

In these results, PPM-likely performed far fewer replans/strengthens than the deterministic baseline condition (19.8 versus 40.8 for control-baseline). The time spent managing plans was much higher, however; this is because of a higher computational cost to probabilistic schedule strengthening than to replanning for the problems of this test suite. In the previous test suite, agents spent an average of 47.1 seconds replanning; in contrast, agents in the control condition spent an average of 6.7 seconds replanning. This large difference is due to the simplified activity network structure of the problems we utilized for this test suite (i.e., no disablers or soft NLEs, etc.). We anticipate that the savings PPM-likely provides would be much more significant for more complicated problems; in more complicated problem structures, schedule strengthening time would not increase by as much, as its complexity is more dependent on the number of methods than it is on the complexity of the activity network.
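The two deterministic-strengthening modifications can be sketched as eligibility checks over a method's schedule data. The dictionary fields and function names are hypothetical, and only the tests themselves are shown, not the rescheduling step:

```python
def needs_backup(method):
    """Rule 1: schedule a backup for a method under a max-leaf that can
    fail outright, or that can run past its own deadline."""
    return (method['fail_prob'] > 0 or
            method['est'] + method['max_duration'] > method['deadline'])

def can_swap_sibling(method, next_lst, sibling):
    """Rule 2: swap in a shorter-duration sibling when the method could
    run past the lst of the next method on the same local schedule,
    provided the sibling keeps at least half the original's average
    utility (the same floor applies to backup methods in Rule 1)."""
    overruns = method['est'] + method['max_duration'] > next_lst
    worthwhile = sibling['avg_utility'] >= 0.5 * method['avg_utility']
    shorter = sibling['max_duration'] < method['max_duration']
    return overruns and worthwhile and shorter
```

As the surrounding text notes, these checks are purely structural: they neither cover every way a failure can arise nor verify that a modification actually improves the schedule, which is where probabilistic strengthening gains its edge.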
[Table 7.4: Average amount of effort spent managing plans under various probabilistic plan management approaches. For each of the conditions control-baseline, control-likely, ds-baseline, PPM-baseline and PPM-likely, the table reports PADS time, replan time, strengthen time, the number of replans (and strengthens), and the number of strengthen-only episodes.]

Figure 7.7 shows the average utility earned for each problem. The utility is expressed as the fraction of the omniscient utility earned during execution. We define the omniscient utility for a given simulation to be the utility produced by a deterministic, centralized version of the planner on omniscient versions of the problems, where the actual method durations and outcomes generated by the simulator are given as inputs. As the planner is heuristic, the fraction of omniscient utility earned can be greater than one. Problem numbers correspond to the numbers in Table 7.1, and the problems are sorted by how well the control condition did; this roughly correlates to sorting the problems by increasing problem difficulty.

The deterministic strengthening condition, ds-baseline, performs on average better than the control-baseline condition. Because its limited analysis does not identify all cases where failures might occur, however, and because it does not determine whether its modifications ultimately help the schedule, it fails to match PPM-baseline or PPM-likely in utility. Still, even in this domain, PPM-likely clearly outperforms, on average, all of the deterministic conditions, and is the clear winner. On average, PPM-likely earned 13.6% more utility than the control-baseline condition, and 8.1% more than the ds-baseline condition. It does underperform ds-baseline on problem 8, though the utility difference was less than 5 utility points. Figure 7.8 explicitly shows this, specifying the absolute average utility earned by each condition. Ultimately, PADS incurs an average overhead of only 17.8 seconds over control-baseline, a minute overhead when compared to an average execution time of 802 seconds.

Using a repeated measures ANOVA, there is no statistically significant difference between PPM-likely and PPM-baseline. These conditions, however, are significantly different from each of the other conditions with p < 0.001.
[Figure 7.7: Average proportion of omniscient utility earned for various plan management approaches.]

[Figure 7.8: Average utility earned for various plan management approaches.]
The ds-baseline condition is significantly different from each of the other conditions with p < 0.001, and the control-baseline and control-likely conditions are not significantly different.

Table 7.5 shows the average amount of communication that occurred for each agent, expressed as the average file size (in megabytes) of a text log file that recorded all outgoing messages across agents.

Table 7.5: Average size of communication log file.

                   avg log size (MB)
control-baseline         0.29
control-always           0.31
control-likely           0.71
PPM-always               0.92
PPM-change               0.90
PPM-baseline             0.96
PPM-likely               0.79

In distributed scenarios, PPM increases the amount of communication between agents, as agents must communicate more information about each activity. The table suggests that the main source of the communication increase is communicating the probability profiles, not the added step of schedule strengthening. We determine this by comparing the large difference in the amount of communication of control-baseline versus PPM-baseline (0.29MB versus 0.96MB) with the small difference between the amount of communication for control-likely versus PPM-likely (0.71MB versus 0.79MB). Communicating such information also increases the frequency of communication, as probability profiles change more frequently than deterministic information (as was suggested by Table 7.2); communication also happens more often because schedules are changing more frequently due to the added step of schedule strengthening. The difference between the PPM-likely and PPM-always conditions also shows that probabilistic meta-level control is able to decrease communication somewhat by limiting the frequency of replanning and probabilistic schedule strengthening. In these experiments, the level of communication was within reasonable range for our available bandwidth; if not, communication could be explicitly limited as described in Sections 5.5 and 8.2.

Overall, these results highlight the benefit of probabilistic plan management. When probabilistic meta-level control is used alone, it can greatly decrease the computational costs of plan management for this domain.
PPM. 164 . is capable of signiﬁcantly increasing the utility earned at a modest increase in computational effort for this domain. including both probabilistic schedule strengthening and probabilistic metalevel control.7 Probabilistic MetaLevel Control management for this domain. and achieves its goal effectively managing plans during execution in a scalable way.
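One simple way to limit communication, mentioned above, is to suppress outgoing probability-profile updates whose values have changed by less than some threshold. The sketch below illustrates the idea only; the dict-based profile representation and the 0.05 threshold are hypothetical assumptions, not the data structures or settings used in this work.

```python
def should_send_profile(old_profile, new_profile, min_delta=0.05):
    """Return True if any probability in the new profile differs from
    the last communicated profile by at least min_delta.

    Profiles are represented here as dicts mapping outcome names to
    probabilities; both the representation and the threshold value are
    illustrative assumptions."""
    keys = set(old_profile) | set(new_profile)
    return any(
        abs(new_profile.get(k, 0.0) - old_profile.get(k, 0.0)) >= min_delta
        for k in keys
    )

# A small drift is suppressed; a larger change triggers communication.
should_send_profile({"meets_deadline": 0.90}, {"meets_deadline": 0.93})  # False
should_send_profile({"meets_deadline": 0.90}, {"meets_deadline": 0.70})  # True
```

Raising the threshold trades communication volume against how current other agents' views of the profiles remain, which is the bandwidth trade-off discussed above.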
Chapter 8

Future Work and Conclusions

8.1 Future Work

Our current system of probabilistic plan management has been shown to be effective in the management of the execution of plans for uncertain, temporal domains. There are other ways to use the probabilistic information provided by PADS, however, that remain to be explored. We discuss below some main areas in which we believe it would be beneficial to extend PPM in the future.

8.1.1 Meta-Level Control Extensions

Although we do not explicitly describe it as such, probabilistic schedule strengthening is an anytime algorithm where the benefit of the initial effort is high and tapers off as the algorithm continues, due to the prioritization of strategies and activities by expected utility loss. As implemented, we do not limit its computation time, and let it run until completion; we found this to be effective for our domains. If it is necessary to limit schedule strengthening computation, we previously suggested imposing a fixed threshold on the po values of activities considered (Section 6.2.1). A more dynamic approach would be to utilize a meta-level control technique (e.g., [Russell and Wefald, 1991; Hansen and Zilberstein, 2001b]) to adaptively decide whether further strengthening was worth the effort.

In this thesis, our meta-level control strategies ignore the cost of computation, and focus instead on its possible benefit. This works well to lessen computation, and is sufficient for domains, such as the multi-agent domain we present, where deliberation and execution occur at the same time. It would be interesting to see, however, whether our approach could be combined with one such as in [Raja and Lesser, 2010; Musliner et al., 2003] to fully integrate both domain and computation probabilistic models into a meta-level control strategy. This would involve using our probabilistic state information as part of an MDP model that chooses deliberative actions based on both the state of the current schedule and performance profiles of deliberation.

We also did not explicitly consider the cost of communication, as we assume sufficient bandwidth in our multi-agent domain. Although we propose a simple way to limit communication, by only communicating changes with a minimum change in probability profile, in situations with a firm cap on bandwidth more thorough deliberation may be useful. An alternate strategy would be to reason about which information is most important to pass on at a given time, and which can be reserved for later. One example strategy would be to communicate only probability changes that are likely to cause a planning action in another agent; this would involve a learning process similar to how we learn when an agent should replan based on the updates it receives.

8.1.2 Further Integration with Planning

An interesting area of exploration would be to integrate PPM with the deterministic planner being used, instead of layering PPM on top of the plans the planner produces. Although many deterministic planners have been augmented to handle decision-theoretic planning (e.g., [Blythe, 1999; Kushmerick et al., 1995]), most have done so at the expense of tractability. PADS could be integrated with the planning process at a lower level, improving the search without fully integrating probability information into the planner. PADS could also help to guide the search at key points by using the expected utility loss and expected utility gain heuristics to identify activities that are profitable to include in the schedule. A simple example of this in the single-agent domain is using PADS to determine which child of a ONE task should be scheduled to maximize expected utility. Similarly, in the multi-agent domain, when choosing between two children of an OR max task, the eug heuristic could be used to suggest which would have the highest positive impact on taskgroup utility.

Additionally, in oversubscribed settings it may be useful to coordinate with other agents to change which portions of the task tree are scheduled, with the goal of finding tasks to execute that have a better expected outcome. For example, if one agent is unable to improve its expected execution outcome via probabilistic schedule strengthening, reasoning could be performed to decide what level of coordination between agents is useful. This coordinated probabilistic schedule strengthening could be achieved by agents communicating with each other how they expect their schedules to be impacted by hypothetical changes, with the group as a whole committing to the change which results in the highest overall expected utility.

8.1.3 Demonstration on Other Domains / Planners

One extension of this work that we would like to see is applying it to other planners in order to demonstrate its principles in another setting. First, we would like to use our approach in conjunction with FF-Replan [Yoon et al., 2007] in order to see what improvement could be garnered. In the problems FF-Replan considers, uncertainty is in the effects of actions. In its determinization of these problems, FF-Replan uses an "all-outcomes" technique that expands a probabilistic action such that there is one deterministic action available to the planner for each possible probabilistic effect of the original action. Planning is then done based on these deterministic actions. In accordance with the philosophy of this approach, one possible way FF-Replan could benefit from our approach is to analyze the current deterministic plan, find a likely point of failure, and then try switching to alternate deterministic actions to try to bolster the probability of success.

8.1.4 Human-Robot Teams

PPM could be used to manage the execution of plans for teams of both human and robotic agents. Although it could be used as is, it could also be extended to specifically tailor to those types of situations. First, in many situations it would be appropriate to include humans' preferences in the plan objective. For example, humans may like their job more if they are frequently assigned tasks that they enjoy performing. Other modifications to the objective function include discounting utility earned later in the plan or discounting utility once a certain amount has been earned. This would reflect, for example, a human getting tired and assigning a higher cost to execution as time goes on. These more complex plan objective functions would make schedule strengthening more difficult, since it is harder to predict what types of actions will bolster low-utility portions of the plan, complicating the analysis.

Additionally, including humans in the loop introduces the possibility of having plan stability as a component of the objective function [van der Krogt and de Weerdt, 2005]. Ideally, in human-robot scenarios, plans should remain relatively stable throughout execution in order to facilitate continued human understanding of the plan as well as agent predictability. Instead of using meta-level control to limit only the frequency of replanning, which implicitly encourages plan stability, it could be used to explicitly encourage plan stability and limit the number of times (and the degree to which) the plans change overall. It could also be used to choose the best times to replan, and when it would be useful [Rubinstein et al., 2010]; an example of this is discouraging replanning a human's schedule at times when the human is likely mentally overloaded, and favoring replanning at a time when the human has the capacity to fully process and understand the resulting changes.

8.1.5 Non-Cooperative Multi-Agent Scenarios

Our multi-agent work assumes that agents are cooperative and work together to maximize joint plan utility. Non-cooperative, self-interested agents have little incentive, for example, to schedule a backup activity for other agents. Breaking that assumption entails rethinking much of the PPM strategies. The distributed PADS algorithm and basic communication protocol, for example, would need to change, as agents may not have incentive to share all of their probability profile information. Instead, agents may share information or query other agents for information if they feel that it would raise their expected utility [Gmytrasiewicz and Durfee, 2001]. Ideally, agents would track not only the expected benefit from receiving information, but also whether sharing information would have an expected cost. With this information, such metrics could be used to decide whether an information trade was worthwhile. Negotiation would also be a possibility for an information sharing protocol. Similar ideas apply to probabilistic schedule strengthening. Again, negotiation could be one way of achieving coordination, based on agents' projected expected gain in utility.

8.2 Conclusions

This thesis has described an approach to planning and execution for uncertain domains called Probabilistic Plan Management (PPM). PPM relies on a probabilistic analysis of deterministic schedules to inform probabilistic schedule strengthening and to act as the basis of a probabilistic meta-level control strategy. The thesis claims that explicitly analyzing the probabilistic state of deterministic plans built for uncertain domains provides important information that can be leveraged in a scalable way to effectively manage plan execution. We support this thesis by developing algorithms for each of the three components of the approach, in the contexts of a single-agent domain with temporal constraints and activity durational uncertainty and a multi-agent domain with a rich set of activity constraints and sources of uncertainty.

We developed PADS, a general way to probabilistically analyze deterministic schedules. PADS analyzes deterministic schedules to calculate their expected utility. It assumes that plan execution has a specific objective that can be expressed in terms of utility and that the domain has a known uncertainty model; it explicitly considers the uncertainty model of the domain and propagates probability information throughout the schedule. A novelty of our algorithm is how quickly PADS can be calculated and its results utilized to make plan improvements, in large part due to analytically calculating the uncertainty of a problem in the context of a given schedule. For large, 10-25 agent problems, schedule expected utility can be globally calculated in an average of 0.58 seconds.

We also developed a way to strengthen deterministic schedules based on the probabilistic analysis, making them more robust to the uncertainty of execution and providing some of the benefits of non-deterministic planning while avoiding much of its computational overhead. Probabilistic schedule strengthening is also best done when the structure of the domain can be exploited to identify classes of strengthening actions that may be helpful in different situations. Through experimentation, we found strengthened schedules to earn 36% more utility, on average, than their initial deterministic counterparts, when both were executed with no replanning allowed. This step is best done in conjunction with replanning during execution to ensure that robust, high-utility schedules are always being executed.

We investigated the use of a PADS-driven probabilistic meta-level control strategy, which effectively controls planning actions during execution by performing them only when they would likely be beneficial. It identifies cases when there are likely opportunities to exploit or failures to avoid, and replans only at those times. Probabilistic meta-level control reduces unnecessary replanning and improves agent responsiveness during execution. Experiments in the multi-agent domain showed that this strategy can reduce plan management effort by an average of 49% over approaches driven by deterministic heuristics, with no significant change in utility earned.

As part of this thesis, we performed experiments to test the efficacy of our overall PPM approach. For a single-agent domain, PPM afforded a 23% reduction in schedule execution time (where execution time is defined as the schedule makespan plus the time spent replanning during execution). For a multi-agent, distributed domain, PPM using probabilistic schedule strengthening in conjunction with replanning and probabilistic meta-level control resulted in 14% more utility earned on average. We also found that even in cases where it increases computation time, probabilistic schedule strengthening makes up for the lack of agent responsiveness by ensuring that agents' schedules are strengthened and robust to unexpected outcomes that may occur while an agent is tied up replanning or strengthening.

Overall, probabilistic plan management is able to take advantage of the strengths of non-deterministic planning while, with only a small added overhead, in large part avoiding its computational burden. It is able to significantly improve the expected outcome of execution while also reasoning about the probabilistic state of the schedule to reduce the computational overhead of plan management. We conclude that probabilistic plan management, including both probabilistic schedule strengthening and probabilistic meta-level control, achieves its goal of being able to leverage the probabilistic information provided by PADS to effectively manage plan execution in a scalable way.
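The speed of the analysis comes from propagating probability information through the schedule analytically rather than by sampling. The following toy sketch conveys the flavor of that idea for a chain of activities with discrete duration distributions, where utility is earned only if an activity finishes by its deadline; the domain model here is invented for illustration and is not the PADS algorithm itself.

```python
def expected_utility(activities):
    """Propagate a discrete finish-time distribution through a sequence
    of activities and accumulate expected utility.

    Each activity is (duration_dist, deadline, utility), where
    duration_dist maps a possible duration to its probability and
    utility is earned only if the activity finishes by its deadline.
    This is an illustrative model, not the thesis's actual analysis."""
    finish = {0: 1.0}  # distribution over the current finish time
    total = 0.0
    for dist, deadline, util in activities:
        new_finish = {}
        for t, p in finish.items():
            for d, q in dist.items():
                new_finish[t + d] = new_finish.get(t + d, 0.0) + p * q
        finish = new_finish
        total += util * sum(p for t, p in finish.items() if t <= deadline)
    return total

acts = [({2: 0.5, 4: 0.5}, 3, 10.0),  # 50% chance of meeting deadline 3
        ({1: 1.0}, 5, 5.0)]           # finishes at 3 or 5; deadline 5 always met
expected_utility(acts)  # 0.5*10 + 1.0*5 = 10.0
```

Because each activity's distribution is combined with the incoming one in a single pass, the cost grows with the size of the distributions rather than with a number of simulation runs, which is the property that makes an analytic propagation fast.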
Chapter 9

Bibliography

George Alexander, Anita Raja, Robert P. Goldman, and David J. Musliner. Controlling deliberation in coordinators. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about thinking. MIT Press, 2010.
George Alexander, Anita Raja, and David Musliner. Controlling deliberation in a markov decision process-based agent. In Seventh International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS '08), 2008.
N. Fazil Ayan, Ugur Kuter, Fusun Yaman, and Robert P. Goldman. HOTRiDE: Hierarchical ordered task replanning in dynamic environments. In Proceedings of the 3rd Workshop on Planning and Plan Execution for Real-World Systems (held in conjunction with ICAPS 2007), September 2007.
Donald R. Barr and E. Todd Sherrill. Mean and variance of truncated normal distributions. The American Statistician, 53(4):357-361, November 1999.
J. C. Beck and N. Wilson. Proactive algorithms for job shop scheduling with probabilistic durations. Journal of Artificial Intelligence Research, 28:183-232, 2007.
Piergiorgio Bertoli, Alessandro Cimatti, Marco Roveri, and Paolo Traverso. Planning in nondeterministic domains under partial observability via symbolic model checking. In Proceedings of IJCAI '01, 2001.
Lars Blackmore, Masahiro Ono, Askar Bektassov, and Brian C. Williams. Robust, optimal predictive control of jump markov linear systems using particles. In Hybrid Systems: Computation and Control, 2007.
Avrim Blum and John Langford. Probabilistic planning in the graphplan framework. In Proceedings of the Fifth European Conference on Planning (ECP '99), 1999.
Jim Blythe. Planning under Uncertainty in Dynamic Domains. PhD thesis, Carnegie Mellon University, 1998.
Jim Blythe. Decision-theoretic planning. AI Magazine, 20(2):37-54, 1999.
Jim Blythe and Manuela Veloso. Analogical replay for efficient conditional planning. In Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI '97), 1997.
Mark Boddy and Thomas Dean. Decision-theoretic deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence, 67(2):245-286, 1994.
Mark Boddy, Bryan Horling, John Phelps, Regis Vincent, D. Long, and Bob Kohout. C TAEMS language specification, October 2005.
Justin A. Boyan and Michael L. Littman. Exact solutions to time-dependent MDPs. In Advances in Neural Information Processing Systems (NIPS), 2000.
L. Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.
Daniel Bryce, Mausam, and Sungwook Yoon. Workshop on a reality check for planning and scheduling under uncertainty (held in conjunction with ICAPS '08), 2008.
Olivier Buffet and Douglas Aberdeen. The factored policy-gradient planner. Artificial Intelligence, 173(5-6):722-747, 2009.
Amedeo Cesta and Angelo Oddi. Gaining efficiency and flexibility in the simple temporal problem. In Proceedings of the Third International Workshop on Temporal Representation and Reasoning, 1996.
S. Chien, G. Rabideau, R. Knight, R. Sherwood, B. Engelhardt, D. Mutz, T. Estlin, B. Smith, F. Fisher, T. Barrett, G. Stebbins, and D. Tran. ASPEN - automated planning and scheduling for space mission operations. In Space Ops, June 2000.
Alessandro Cimatti and Marco Roveri. Conformant planning via symbolic model checking. Journal of Artificial Intelligence Research, 13:305-338, 2000.
Michael T. Cox. Metacognition in computation: A selected research review. Artificial Intelligence, 169(2):104-141, 2005.
Richard Dearden, Nicolas Meuleau, Sailesh Ramakrishnan, David E. Smith, and Rich Washington. Incremental contingency planning. In Proceedings of the ICAPS Workshop on Planning under Uncertainty, 2003.
Rina Dechter, Itay Meiri, and Judea Pearl. Temporal constraint networks. Artificial Intelligence, 49:61-95, 1991.
Keith Decker. TAEMS: A framework for environment centered analysis & design of coordination mechanisms. In G. O'Hare and N. Jennings, editors, Foundations of Distributed Artificial Intelligence, chapter 16, pages 429-448. Wiley Inter-Science, 1996.
Denise Draper, Steve Hanks, and Daniel Weld. Probabilistic planning with information gathering and contingent execution. In Proceedings of the Second International Conference on Artificial Intelligence Planning Systems (AIPS), 1994.
Mark Drummond, John Bresina, and Keith Swanson. Just-in-case scheduling. In Proceedings of AAAI-94, 1994.
Kutluhan Erol, James Hendler, and Dana S. Nau. HTN planning: Complexity and expressivity. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 1994.
Tara Estlin, Rich Volpe, Issa Nesnas, Darren Mutz, Forest Fisher, Barbara Engelhardt, and Steve Chien. Decision-making in a robotic architecture for autonomy. In Proceedings of iSAIRAS 2001, June 2001.
Zhengzhu Feng, Richard Dearden, Nicolas Meuleau, and Richard Washington. Dynamic programming for structured continuous markov decision problems. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI '04), 2004.
Janae Foss, Nulifer Onder, and David E. Smith. Preventing unrecoverable failures through precautionary planning. In Workshop on Moving Planning and Scheduling Systems into the Real World (held in conjunction with ICAPS '07), 2007.
Maria Fox, Richard Howey, and Derek Long. Exploration of the robustness of plans. In Proceedings of AAAI-06, 2006.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119-139, 1997.
Christian Fritz and Sheila McIlraith. Computing robust plans in continuous domains. In Proceedings of the 19th International Conference on Automated Planning and Scheduling (ICAPS), 2009.
Anthony Gallagher. Embracing Conflicts: Exploiting Inconsistencies in Distributed Schedules using Simple Temporal Network Representation. PhD thesis, Carnegie Mellon University, 2009.
Anthony Gallagher, Terry L. Zimmerman, and Stephen F. Smith. Incremental scheduling to maximize quality in a dynamic environment. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '06), 2006.
Alfonso Gerevini and Ivan Serina. Fast plan adaptation through planning graphs: Local and systematic search techniques. In Proceedings of the International Conference on Artificial Intelligence Planning Systems (AIPS '00), 2000.
Piotr J. Gmytrasiewicz and Edmund H. Durfee. Rational communication in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 4(3):233-273, 2001.
Robert P. Goldman and Mark S. Boddy. Expressive planning and explicit knowledge. In Proceedings of AIPS, 1996.
Charles M. Grinstead and J. Laurie Snell. Introduction to Probability. American Mathematical Society, 1997.
Eric A. Hansen and Shlomo Zilberstein. LAO*: A heuristic search algorithm that finds solutions with loops. Artificial Intelligence, 129(1-2):35-62, 2001a.
Eric A. Hansen and Shlomo Zilberstein. Monitoring and control of anytime algorithms: A dynamic programming approach. Artificial Intelligence, 126(1-2):139-157, 2001b.
Laura M. Hiatt and Reid Simmons. Towards probabilistic plan management. In Proceedings of the Workshop on a Reality Check for Planning and Scheduling Under Uncertainty (held in conjunction with ICAPS-08), 2008.
Laura M. Hiatt, Terry L. Zimmerman, Stephen F. Smith, and Reid Simmons. Strengthening schedules through uncertainty analysis. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), 2009.
Laura M. Hiatt, Terry L. Zimmerman, Stephen F. Smith, and Reid Simmons. Reasoning about executional uncertainty to strengthen schedules, 2009.
Jorg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research, 14:253-302, 2001.
Ronald A. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.
Ronald A. Howard. Dynamic Probabilistic Systems: Semi-Markov and Decision Processes. John Wiley & Sons, New York, 1971.
Sergio Jimenez, Andrew Coles, and Amanda Smith. Planning in probabilistic domains using a deterministic numeric planner. In Proceedings of the 25th Workshop of the UK Planning and Scheduling Special Interest Group (PlanSIG 2006), December 2006.
Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 1998.
Nicholas Kushmerick, Steve Hanks, and Daniel Weld. An algorithm for probabilistic planning. Artificial Intelligence, 76(1-2):239-286, 1995.
P. Laborie and M. Ghallab. Planning with sharable resource constraints. In Proceedings of IJCAI-95, 1995.
Hoong Chuin Lau, Thomas Ou, and Fei Xiao. Robust local search and its application to generating robust schedules. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '07), September 2007.
Thomas Leaute and Brian C. Williams. Coordinating agile systems through the model-based execution of temporal plans. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI '05), July 2005.
Lihong Li and Michael L. Littman. Lazy approximation for solving continuous finite-horizon MDPs. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI '05), 2005.
Iain Little and Sylvie Thiebaux. Concurrent probabilistic planning in the graphplan framework. In Proceedings of ICAPS '06, 2006.
Iain Little, Douglas Aberdeen, and Sylvie Thiebaux. Prottle: A probabilistic temporal planner. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI '05), 2005.
A. K. Mackworth. Consistency in networks of relations. Artificial Intelligence, 8(1):99-118, 1977.
Rajiv T. Maheswaran and Pedro Szekely. Criticality metrics for distributed plan and schedule management. In Working Notes of the AAAI Spring Symposium on Distributed Plan and Schedule Management, March 2006.
Janusz Marecki, Sven Koenig, and Milind Tambe. A fast analytical algorithm for solving markov decision processes with real-valued resources. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2007.
Mausam and Daniel S. Weld. Concurrent probabilistic temporal planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '05), 2005.
Mausam and Daniel S. Weld. Probabilistic temporal planning with uncertain durations. In Proceedings of AAAI-06, July 2006.
K. N. McKay, T. E. Morton, F. R. Ramnath, and J. Wang. 'Aversion dynamics' scheduling when the system changes. Journal of Scheduling, 3(2), 2000.
Nicolas Meuleau and David E. Smith. Optimal limited contingency planning. In Workshop on Planning under Uncertainty (held in conjunction with ICAPS '03), 2003.
Paul Morris, Nicola Muscettola, and Thierry Vidal. Dynamic control of plans with temporal uncertainty. In Proceedings of IJCAI, 2001.
David J. Musliner, Edmund H. Durfee, and Kang G. Shin. CIRCA: A cooperative intelligent real-time control architecture. IEEE Transactions on Systems, Man, and Cybernetics, 23(6), 1993.
David J. Musliner, Robert P. Goldman, and Kurt D. Krebsbach. Deliberation scheduling strategies for adaptive mission planning in real-time environments. In Third International Workshop on Self Adaptive Software, 2001.
David J. Musliner, Edmund H. Durfee, Jianhui Wu, Dmitri A. Dolgov, Robert P. Goldman, and Mark S. Boddy. Coordinated plan management using multiagent MDPs. In Working Notes of the AAAI Spring Symposium on Distributed Plan and Schedule Management, March 2006.
David J. Musliner, Jim Carciofini, Edmund H. Durfee, Jianhui Wu, Robert P. Goldman, and Mark S. Boddy. Flexibly integrating deliberation and execution in decision-theoretic agents. In Proceedings of the 3rd Workshop on Planning and Plan Execution for Real-World Systems (held in conjunction with ICAPS-07), September 2007.
Nulifer Onder and Martha Pollack. Conditional, probabilistic planning: A unifying algorithm and effective search control mechanisms. In Proceedings of the 16th National Conference on Artificial Intelligence, 1999.
Nulifer Onder, Garrett C. Whelan, and Li Li. Engineering a conformant probabilistic planner. Journal of Artificial Intelligence Research, 25:1-15, 2006.
Mark Alan Peot. Decision-Theoretic Planning. PhD thesis, Department of Engineering-Economic Systems, Stanford University, 1998.
Nicola Policella, Amedeo Cesta, Angelo Oddi, and Stephen F. Smith. From precedence constraint posting to partial order schedules: A CSP approach to robust scheduling. AI Communications, 20(3):163-180, 2007.
Nicola Policella, Amedeo Cesta, Angelo Oddi, and Stephen F. Smith. Solve-and-robustify: Synthesizing partial order schedules by chaining. Journal of Scheduling, 12(3):299-314, 2009.
Martha E. Pollack and John F. Horty. There's more to life than making plans: Plan management in dynamic, multiagent environments. AI Magazine, 20(4):71-84, 1999.
J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
Anita Raja and Victor Lesser. A framework for meta-level control in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 15(2):147-196, 2007.
Anita Raja, George Alexander, Victor Lesser, and Mike Krainin. Coordinating agents' meta-level control. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about thinking. MIT Press, 2010.
Zachary Rubinstein and Stephen F. Smith. The role of metareasoning in achieving effective multi-agent coordination. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about thinking. MIT Press, 2010.
Stuart Russell and Eric Wefald. Principles of metareasoning. Artificial Intelligence, 49, 1991.
Uwe Schoning. A probabilistic algorithm for k-SAT and constraint satisfaction problems. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, 1999.
Christoph Schwindt. Generation of resource-constrained project scheduling problems subject to temporal constraints. Technical Report WIOR-543, Institut fur Wirtschaftstheorie und Operations Research, Universitat Karlsruhe, November 1998.
Brennan Sellner and Reid Simmons. Towards proactive replanning for multi-robot teams. In Proceedings of the 5th International Workshop on Planning and Scheduling in Space, October 2006.
David E. Smith and Daniel S. Weld. Conformant graphplan. In Proceedings of AAAI-98, 1998.
Stephen F. Smith, Anthony Gallagher, Terry Zimmerman, Laura Barbulescu, and Zachary Rubinstein. Multi-agent management of joint schedules. In Proceedings of the 2006 AAAI Spring Symposium on Distributed Plan and Schedule Management, March 2006.
Stephen F. Smith, Anthony Gallagher, Terry Zimmerman, Laura Barbulescu, and Zachary Rubinstein. Distributed management of flexible times schedules. In Proceedings of AAMAS '07, May 2007.
Austin Tate, Brian Drabble, and Richard Kirby. O-Plan2: an open architecture for command, planning and control. In M. Zweben and M. Fox, editors, Knowledge Based Scheduling. Morgan Kaufmann, 1994.
Roman van der Krogt and Mathijs de Weerdt. Plan repair as an extension of planning. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS '05), 2005.
Vincent Vidal and Hector Geffner. Branching and pruning: An optimal temporal POCL planner based on constraint programming. Artificial Intelligence, 170(3):298-335, 2006.
Thomas Wagner, Anita Raja, and Victor Lesser. Modeling uncertainty and its implications to sophisticated control in TAEMS agents. Autonomous Agents and Multi-Agent Systems, 13(3):235-292, 2006.
Ian Warfield, Chad Hogg, Stephen Lee-Urban, and Hector Munoz-Avila. Adaptation of hierarchical task network plans. In FLAIRS '07, May 2007.
Daniel S. Weld, Corin R. Anderson, and David E. Smith. Extending graphplan to handle uncertainty and sensing actions. In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-98), 1998.
David E. Wilkins and Marie desJardins. A call for knowledge-based planning. AI Magazine, 22:99-115, 2001.
Ping Xuan and Victor Lesser. Incorporating uncertainty in agent commitments. In Proceedings of ATAL-99, 1999.
Sungwook Yoon, Alan Fern, and Robert Givan. FF-Replan: A baseline for probabilistic planning. In Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS '07), September 2007.
Hakan L. S. Younes and David J. Musliner. Probabilistic plan verification through acceptance sampling. In Froduald Kabanza and Sylvie Thiebaux, editors, Proceedings of the AIPS-02 Workshop on Planning via Model Checking, pages 81-88, 2002.
Hakan L. S. Younes, David J. Musliner, and Reid G. Simmons. A framework for planning in continuous-time stochastic domains. In Proceedings of ICAPS '03, 2003.
Hakan L. S. Younes and Reid G. Simmons. VHPOP: Versatile heuristic partial order planner. Journal of Artificial Intelligence Research, 20:405-430, 2003.
Hakan L. S. Younes and Reid G. Simmons. Policy generation for continuous-time stochastic domains with concurrency. In Proceedings of ICAPS '04, 2004a.
Hakan L. S. Younes and Reid G. Simmons. Solving generalized semi-markov decision processes using continuous phase-type distributions. In Proceedings of AAAI-04, 2004b.
Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. Multi-class adaboost. Technical Report 430, University of Michigan, 2005.
9 Bibliography 180 .
Appendix A

Sample Problems
A.1 Sample MRCPSP/max problem in ASPEN syntax

activity a001 { decompositions = a001_0 or a001_1; };
activity a001_0 { decompositions = a001_0_by_a012_0 or a001_0_by_a012_1; };
activity a001_0_by_a012_0 { duration = 10; constraints = ((ends_before start of a012_0 by [1, 1000000])); };
activity a001_0_by_a012_1 { duration = 10; constraints = ((ends_before start of a012_1 by [3, 1000000])); };
activity a001_1 { decompositions = a001_1_by_a012_0 or a001_1_by_a012_1; };
activity a001_1_by_a012_0 { duration = 4; constraints = ((ends_before start of a012_0 by [1, 1000000])); };
activity a001_1_by_a012_1 { duration = 4; constraints = ((ends_before start of a012_1 by [6, 1000000])); };
activity a002 { decompositions = a002_0 or a002_1; };
activity a002_0 { decompositions = a002_0_by_a015_0_a020_0 or a002_0_by_a015_0_a020_1 or a002_0_by_a015_1_a020_0 or a002_0_by_a015_1_a020_1; };
activity a002_0_by_a015_0_a020_0 { duration = 7; constraints = ((ends_before start of a015_0 by [19, 1000000]) and (ends_before start of a020_0 by [10, 1000000])); };
activity a002_0_by_a015_0_a020_1 { duration = 7; constraints = ((ends_before start of a015_0 by [34, 1000000]) and (ends_before start of a020_1 by [1, 1000000])); };
activity a002_0_by_a015_1_a020_0 {
duration = 7; constraints = ((ends_before start of a015_1 by [6, 1000000]) and (ends_before start of a020_0 by [10, 1000000])); };
activity a002_0_by_a015_1_a020_1 { duration = 7; constraints = ((ends_before start of a015_1 by [6, 1000000]) and (ends_before start of a020_1 by [4, 1000000])); };
activity a002_1 { decompositions = a002_1_by_a015_0_a020_0 or a002_1_by_a015_0_a020_1 or a002_1_by_a015_1_a020_0 or a002_1_by_a015_1_a020_1; };
activity a002_1_by_a015_0_a020_0 { duration = 2; constraints = ((ends_before start of a015_0 by [2, 1000000]) and (ends_before start of a020_0 by [1, 1000000])); };
activity a002_1_by_a015_0_a020_1 { duration = 2; constraints = ((ends_before start of a015_0 by [2, 1000000]) and (ends_before start of a020_1 by [4, 1000000])); };
activity a002_1_by_a015_1_a020_0 { duration = 2; constraints = ((ends_before start of a015_1 by [19, 1000000]) and (ends_before start of a020_0 by [1, 1000000])); };
activity a002_1_by_a015_1_a020_1 { duration = 2; constraints = ((ends_before start of a015_1 by [19, 1000000]) and (ends_before start of a020_1 by [4, 1000000])); };
activity a003 { decompositions = a003_0 or a003_1; };
activity a003_0 { decompositions = a003_0_by_a010_0_a013_0_a020_0 or a003_0_by_a010_0_a013_1_a020_0 or a003_0_by_a010_1_a013_0_a020_0 or a003_0_by_a010_1_a013_1_a020_0 or a003_0_by_a010_0_a013_0_a020_1 or a003_0_by_a010_0_a013_1_a020_1 or a003_0_by_a010_1_a013_0_a020_1 or a003_0_by_a010_1_a013_1_a020_1; };
activity a003_0_by_a010_0_a013_0_a020_0 { duration = 4; constraints = ((ends_before start of a010_0 by [6, 1000000]) and (ends_before start of a013_0 by [1000000, 16]) and (ends_before start of a020_0 by [2, 1000000])); };
activity a003_0_by_a010_0_a013_1_a020_0 { duration = 4; constraints = ((ends_before start of a010_0 by [6, 1000000]) and (ends_before start of a013_1 by [1000000, 18]) and (ends_before start of a020_0 by [2, 1000000])); };
activity a003_0_by_a010_1_a013_0_a020_0 { duration = 4; constraints = ((ends_before start of a010_1 by [8, 1000000]) and (ends_before start of a013_0 by [1000000, 16]) and (ends_before start of a020_0 by [2, 1000000])); };
activity a003_0_by_a010_1_a013_1_a020_0 { duration = 4; constraints = ((ends_before start of a010_1 by [8, 1000000]) and (ends_before start of a013_1 by [1000000, 18]) and (ends_before start of a020_0 by [2, 1000000])); };
activity a003_0_by_a010_0_a013_0_a020_1 { duration = 4; constraints = ((ends_before start of a010_0 by [6, 1000000]) and (ends_before start of a013_0 by [1000000, 16]) and (ends_before start of a020_1 by [8, 1000000])); };
activity a003_0_by_a010_0_a013_1_a020_1 { duration = 4; constraints = ((ends_before start of a010_0 by [6, 1000000]) and (ends_before start of a013_1 by [1000000, 18]) and (ends_before start of a020_1 by [8, 1000000])); };
activity a003_0_by_a010_1_a013_0_a020_1 { duration = 4; constraints = ((ends_before start of a010_1 by [8, 1000000]) and (ends_before start of a013_0 by [1000000, 16]) and (ends_before start of a020_1 by [8, 1000000])); };
activity a003_0_by_a010_1_a013_1_a020_1 { duration = 4; constraints = ((ends_before start of a010_1 by [8, 1000000]) and (ends_before start of a013_1 by [1000000, 18]) and (ends_before start of a020_1 by [8, 1000000])); };
activity a003_1 { decompositions = a003_1_by_a010_0_a013_0_a020_0 or a003_1_by_a010_0_a013_1_a020_0 or a003_1_by_a010_1_a013_0_a020_0 or a003_1_by_a010_1_a013_1_a020_0 or a003_1_by_a010_0_a013_0_a020_1 or a003_1_by_a010_0_a013_1_a020_1 or a003_1_by_a010_1_a013_0_a020_1 or a003_1_by_a010_1_a013_1_a020_1; };
activity a003_1_by_a010_0_a013_0_a020_0 { duration = 5; constraints = ((ends_before start of a010_0 by [15, 1000000]) and (ends_before start of a013_0 by [1000000, 12]) and (ends_before start of a020_0 by [17, 1000000])); };
activity a003_1_by_a010_0_a013_0_a020_1 { duration = 5; constraints = ((ends_before start of a010_0 by [8, 1000000]) and (ends_before start of a013_0 by [1000000, 12]) and (ends_before start of a020_1 by [13, 1000000])); };
activity a003_1_by_a010_0_a013_1_a020_0 { duration = 5; constraints = ((ends_before start of a010_0 by [22, 1000000]) and (ends_before start of a013_1 by [1000000, 30]) and (ends_before start of a020_0 by [32, 1000000])); };
activity a003_1_by_a010_0_a013_1_a020_1 { duration = 5; constraints = ((ends_before start of a010_0 by [29, 1000000]) and (ends_before start of a013_1 by [1000000, 30]) and (ends_before start of a020_1 by [13, 1000000])); };
activity a003_1_by_a010_1_a013_0_a020_0 { duration = 5; constraints = ((ends_before start of a010_1 by [3, 1000000]) and (ends_before start of a013_0 by [1000000, 12]) and (ends_before start of a020_0 by [47, 1000000])); };
activity a003_1_by_a010_1_a013_0_a020_1 { duration = 5; constraints = ((ends_before start of a010_1 by [3, 1000000]) and (ends_before start of a013_0 by [1000000, 12]) and (ends_before start of a020_1 by [13, 1000000])); };
activity a003_1_by_a010_1_a013_1_a020_0 { duration = 5; constraints = ((ends_before start of a010_1 by [3, 1000000]) and (ends_before start of a013_1 by [1000000, 30]) and (ends_before start of a020_0 by [62, 1000000])); };
activity a003_1_by_a010_1_a013_1_a020_1 { duration = 5; constraints = ((ends_before start of a010_1 by [3, 1000000]) and (ends_before start of a013_1 by [1000000, 30]) and (ends_before start of a020_1 by [13, 1000000])); };
activity a004 {
1000000])). 1000000])). }. 1000000])). }. activity a004_1_by_a006_1_a012_1 { duration = 7. activity a004_1 { decompositions = a004_1_by_a006_0_a012_0 or a004_1_by_a006_1_a012_0 or a004_1_by_a006_0_a012_1 or a004_1_by_a006_1_a012_1. constraints = ((ends_before start of a006_1 by [22. constraints = ((ends_before start of a006_0 by [20. 1000000])). constraints = ((ends_before start of a006_0 by [20. activity a004_0_by_a006_0_a012_0 { duration = 9. activity a004_0_by_a006_1_a012_1 { duration = 9. activity a004_0_by_a006_0_a012_1 { duration = 9. 1000000]) and (ends_before start of a012_0 by [25. 1000000]) and (ends_before start of a012_1 by [18. activity a004_0_by_a006_1_a012_0 { duration = 9. }. }. activity a004_1_by_a006_1_a012_0 { duration = 7. 1000000]) and (ends_before start of a012_0 by [25. }. }. 1000000]) and (ends_before start of a012_1 by [18. activity a004_0 { decompositions = a004_0_by_a006_0_a012_0 or a004_0_by_a006_1_a012_0 or a004_0_by_a006_0_a012_1 or a004_0_by_a006_1_a012_1. activity a004_1_by_a006_0_a012_1 { duration = 7. constraints = ((ends_before start of a006_0 by [18. constraints = ((ends_before start of a006_1 by [22.A Sample Problems decompositions = a004_0 or a004_1. }. constraints = ((ends_before start of a006_0 by [18. 186 . constraints = ((ends_before start of a006_1 by [6. 1000000])). 1000000])). }. }. }. 1000000]) and (ends_before start of a012_1 by [20. activity a004_1_by_a006_0_a012_0 { duration = 7. 1000000]) and (ends_before start of a012_0 by [31. 1000000]) and (ends_before start of a012_0 by [18. 1000000])).
1000000]) and (ends_before start of a012_1 by [20, 1000000])); };
activity a005 { decompositions = a005_0; };
activity a005_0 { decompositions = a005_0_by_a014_0 or a005_0_by_a014_1; };
activity a005_0_by_a014_0 { duration = 1; constraints = ((ends_before start of a014_0 by [2, 7])); };
activity a005_0_by_a014_1 { duration = 1; constraints = ((ends_before start of a014_1 by [1, 8])); };
activity a006 { decompositions = a006_0 or a006_1; };
activity a006_0 { decompositions = a006_0_by_a017_0 or a006_0_by_a017_1; };
activity a006_0_by_a017_0 { duration = 9; constraints = ((ends_before start of a017_0 by [27, 1000000])); };
activity a006_0_by_a017_1 { duration = 9; constraints = ((ends_before start of a017_1 by [15, 1000000])); };
activity a006_1 { decompositions = a006_1_by_a017_0 or a006_1_by_a017_1; };
activity a006_1_by_a017_0 { duration = 4; constraints = ((ends_before start of a017_0 by [4, 1000000])); };
activity a006_1_by_a017_1 { duration = 4; constraints = ((ends_before start of a017_1 by [10, 1000000])); };
activity a007 { decompositions = a007_0 or a007_1 with (duration <- duration); };
activity a007_0 { decompositions = a007_0_by_a010_0_a013_0 or a007_0_by_a010_0_a013_1 or a007_0_by_a010_1_a013_0 or a007_0_by_a010_1_a013_1; };
activity a007_0_by_a010_0_a013_0 { duration = 3; constraints = ((ends_before start of a010_0 by [1000000, 7]) and (ends_before start of a013_0 by [1, 20])); };
activity a007_0_by_a010_0_a013_1 { duration = 3; constraints = ((ends_before start of a010_0 by [1000000, 7]) and (ends_before start of a013_1 by [6, 42])); };
activity a007_0_by_a010_1_a013_0 { duration = 3; constraints = ((ends_before start of a010_1 by [1000000, 1]) and (ends_before start of a013_0 by [5, 31])); };
activity a007_0_by_a010_1_a013_1 { duration = 3; constraints = ((ends_before start of a010_1 by [1000000, 1]) and (ends_before start of a013_1 by [6, 18])); };
activity a007_1 { decompositions = a007_1_by_a010_0_a013_0 or a007_1_by_a010_0_a013_1 or a007_1_by_a010_1_a013_0 or a007_1_by_a010_1_a013_1; };
activity a007_1_by_a010_0_a013_0 { duration = 3; constraints = ((ends_before start of a010_0 by [1000000, 1]) and (ends_before start of a013_0 by [1, 20])); };
activity a007_1_by_a010_0_a013_1 { duration = 3; constraints = ((ends_before start of a010_0 by [1000000, 1]) and (ends_before start of a013_1 by [8, 42])); };
activity a007_1_by_a010_1_a013_0 { duration = 3; constraints = ((ends_before start of a010_1 by [1000000, 29]) and (ends_before start of a013_0 by [5, 31])); };
activity a007_1_by_a010_1_a013_1 { duration = 3; constraints = ((ends_before start of a010_1 by [1000000, 29]) and (ends_before start of a013_1 by [8, 18])); };
activity a008_1_by_a009_0_a016_0 { duration = 9; constraints = ((ends_before start of a009_0 by [1000000,
4]) and (ends_before start of a016_0 by [18, 1000000])); };
activity a008 { decompositions = a008_0 or a008_1; };
activity a008_0 { decompositions = a008_0_by_a009_0_a016_0 or a008_0_by_a009_0_a016_1 or a008_0_by_a009_1_a016_0 or a008_0_by_a009_1_a016_1; };
activity a008_0_by_a009_0_a016_0 { duration = 10; constraints = ((ends_before start of a009_0 by [1000000, 10]) and (ends_before start of a016_0 by [7, 1000000])); };
activity a008_0_by_a009_0_a016_1 { duration = 10; constraints = ((ends_before start of a009_0 by [1000000, 10]) and (ends_before start of a016_1 by [11, 1000000])); };
activity a008_0_by_a009_1_a016_0 { duration = 10; constraints = ((ends_before start of a009_1 by [1000000, 29]) and (ends_before start of a016_0 by [18, 1000000])); };
activity a008_0_by_a009_1_a016_1 { duration = 10; constraints = ((ends_before start of a009_1 by [1000000, 29]) and (ends_before start of a016_1 by [17, 1000000])); };
activity a008_1 { decompositions = a008_1_by_a009_0_a016_0 or a008_1_by_a009_0_a016_1_none or a008_1_by_a009_1_a016_0_none or a008_1_by_a009_1_a016_1_none; };
activity a008_1_by_a009_0_a016_1 { duration = 9; constraints = ((ends_before start of a009_0 by [1000000, 4]) and (ends_before start of a016_1 by [17, 1000000])); };
activity a008_1_by_a009_1_a016_0 { duration = 9; constraints = ((ends_before start of a009_1 by [1000000, 1]) and (ends_before start of a016_0 by [7, 1000000])); };
activity a008_1_by_a009_1_a016_1 { duration = 9; constraints = ((ends_before start of a009_1 by [1000000, 1]) and (ends_before start of a016_1 by [11, 1000000])); };
activity a009 { decompositions = a009_0 or a009_1; };
activity a009_0 { decompositions = a009_0_by_a012_0_a013_0_a016_0 or a009_0_by_a012_0_a013_0_a016_1 or a009_0_by_a012_0_a013_1_a016_0 or a009_0_by_a012_0_a013_1_a016_1 or a009_0_by_a012_1_a013_0_a016_0 or a009_0_by_a012_1_a013_0_a016_1 or a009_0_by_a012_1_a013_1_a016_0 or a009_0_by_a012_1_a013_1_a016_1; };
activity a009_0_by_a012_0_a013_0_a016_0 { duration = 10; constraints = ((ends_before start of a012_0 by [1000000, 4]) and (ends_before start of a013_0 by [1000000, 21]) and (ends_before start of a016_0 by [1000000, 43])); };
activity a009_0_by_a012_0_a013_0_a016_1 { duration = 10; constraints = ((ends_before start of a012_0 by [1000000, 4]) and (ends_before start of a013_0 by [1000000, 21]) and (ends_before start of a016_1 by [1000000, 39])); };
activity a009_0_by_a012_0_a013_1_a016_0 { duration = 10; constraints = ((ends_before start of a012_0 by [1000000, 4]) and (ends_before start of a013_1 by [1000000, 22]) and (ends_before start of a016_0 by [1000000, 43])); };
activity a009_0_by_a012_0_a013_1_a016_1 { duration = 10; constraints = ((ends_before start of a012_0 by [1000000, 4]) and (ends_before start of a013_1 by [1000000, 22]) and (ends_before start of a016_1 by [1000000, 39])); };
activity a009_0_by_a012_1_a013_0_a016_0 { duration = 10; constraints = ((ends_before start of a012_1 by [1000000, 17]) and (ends_before start of a013_0 by [1000000, 21]) and (ends_before start of a016_0 by [1000000, 43])); };
activity a009_0_by_a012_1_a013_0_a016_1 { duration = 10; constraints = ((ends_before start of a012_1 by [1000000, 17]) and (ends_before start of a013_0 by [1000000, 22]) and (ends_before start of a016_1 by [1000000, 39])); };
activity a009_0_by_a012_1_a013_1_a016_0 { duration = 10; constraints = ((ends_before start of a012_1 by [1000000, 17]) and (ends_before start of a013_1 by [1000000, 22]) and (ends_before start of a016_0 by [1000000, 43])); };
activity a009_0_by_a012_1_a013_1_a016_1 { duration = 10; constraints = ((ends_before start of a012_1 by [1000000, 17]) and (ends_before start of a013_1 by [1000000, 22]) and (ends_before start of a016_1 by [1000000, 39])); };
activity a009_1 { decompositions = a009_1_by_a012_0_a013_0_a016_0 or a009_1_by_a012_0_a013_0_a016_1 or a009_1_by_a012_0_a013_1_a016_0 or a009_1_by_a012_0_a013_1_a016_1 or a009_1_by_a012_1_a013_0_a016_0 or a009_1_by_a012_1_a013_0_a016_1 or a009_1_by_a012_1_a013_1_a016_0 or a009_1_by_a012_1_a013_1_a016_1; };
activity a009_1_by_a012_0_a013_0_a016_0 { duration = 4; constraints = ((ends_before start of a012_0 by [1000000, 1]) and (ends_before start of a013_0 by [1000000, 3]) and (ends_before start of a016_0 by [1000000, 47])); };
activity a009_1_by_a012_0_a013_0_a016_1 { duration = 4; constraints = ((ends_before start of a012_0 by [1000000, 1]) and (ends_before start of a013_0 by [1000000, 3]) and (ends_before start of a016_1 by [1000000, 18])); };
activity a009_1_by_a012_0_a013_1_a016_0 { duration = 4; constraints = ((ends_before start of a012_0 by [1000000, 1]) and (ends_before start of a013_1 by [1000000, 6]) and (ends_before start of a016_0 by [1000000, 47])); };
activity a009_1_by_a012_0_a013_1_a016_1 { duration = 4; constraints = ((ends_before start of a012_0 by [1000000, 1]) and (ends_before start of a013_1 by [1000000, 6]) and (ends_before start of a016_1 by [1000000, 18])); };
activity a009_1_by_a012_1_a013_0_a016_0 { duration = 4; constraints = ((ends_before start of a012_1 by [1000000, 16]) and (ends_before start of a013_0 by [1000000, 3]) and (ends_before start of a016_0 by [1000000, 47]));
};
activity a009_1_by_a012_1_a013_0_a016_1 { duration = 4; constraints = ((ends_before start of a012_1 by [1000000, 16]) and (ends_before start of a013_0 by [1000000, 3]) and (ends_before start of a016_1 by [1000000, 18])); };
activity a009_1_by_a012_1_a013_1_a016_0 { duration = 4; constraints = ((ends_before start of a012_1 by [1000000, 16]) and (ends_before start of a013_1 by [1000000, 6]) and (ends_before start of a016_0 by [1000000, 47])); };
activity a009_1_by_a012_1_a013_1_a016_1 { duration = 4; constraints = ((ends_before start of a012_1 by [1000000, 16]) and (ends_before start of a013_1 by [1000000, 6]) and (ends_before start of a016_1 by [1000000, 18])); };
activity a010 { decompositions = a010_0 or a010_1; };
activity a010_0 { duration = 2; };
activity a010_1 { duration = 10; };
activity a011 { decompositions = a011_0 or a011_1; };
activity a011_0 { decompositions = a011_0_by_a012_0_a018_0 or a011_0_by_a012_0_a018_1 or a011_0_by_a012_1_a018_0 or a011_0_by_a012_1_a018_1; };
activity a011_0_by_a012_0_a018_0 { duration = 4; constraints = ((ends_before start of a012_0 by [1000000, 1]) and (ends_before start of a018_0 by [9, 1000000])); };
activity a011_0_by_a012_0_a018_1 { duration = 4; constraints = ((ends_before start of a012_0 by [1000000, 1]) and (ends_before start of a018_1 by [6, 1000000])); };
activity a011_0_by_a012_1_a018_0 {
duration = 4; constraints = ((ends_before start of a012_1 by [1000000, 7]) and (ends_before start of a018_0 by [9, 1000000])); };
activity a011_0_by_a012_1_a018_1 { duration = 4; constraints = ((ends_before start of a012_1 by [1000000, 7]) and (ends_before start of a018_1 by [6, 1000000])); };
activity a011_1 { decompositions = a011_1_by_a012_0_a018_0 or a011_1_by_a012_0_a018_1 or a011_1_by_a012_1_a018_0 or a011_1_by_a012_1_a018_1; };
activity a011_1_by_a012_0_a018_0 { duration = 10; constraints = ((ends_before start of a012_0 by [1000000, 3]) and (ends_before start of a018_0 by [24, 1000000])); };
activity a011_1_by_a012_0_a018_1 { duration = 10; constraints = ((ends_before start of a012_0 by [1000000, 3]) and (ends_before start of a018_1 by [25, 1000000])); };
activity a011_1_by_a012_1_a018_0 { duration = 10; constraints = ((ends_before start of a012_1 by [1000000, 7]) and (ends_before start of a018_0 by [39, 1000000])); };
activity a011_1_by_a012_1_a018_1 { duration = 10; constraints = ((ends_before start of a012_1 by [1000000, 7]) and (ends_before start of a018_1 by [25, 1000000])); };
activity a012 { decompositions = a012_0 or a012_1; };
activity a012_0 { decompositions = a012_0_by_a019_0 or a012_0_by_a019_1; };
activity a012_0_by_a019_0 { duration = 6; constraints = ((ends_before start of a019_0 by [1000000, 16])); };
activity a012_0_by_a019_1 { duration = 6; constraints = ((ends_before start of a019_1 by [1000000, 17]));
};
activity a012_1 { decompositions = a012_1_by_a019_0 or a012_1_by_a019_1; };
activity a012_1_by_a019_0 { duration = 6; constraints = ((ends_before start of a019_0 by [1000000, 27])); };
activity a012_1_by_a019_1 { duration = 6; constraints = ((ends_before start of a019_1 by [1000000, 25])); };
activity a013 { decompositions = a013_0 or a013_1; };
activity a013_0 { duration = 7; };
activity a013_1 { duration = 10; };
activity a014 { decompositions = a014_0 or a014_1; };
activity a014_0 { duration = 4; };
activity a014_1 { duration = 4; };
activity a015 { decompositions = a015_0 or a015_1; };
activity a015_0 { duration = 8; };
activity a015_1 { duration = 1; };
activity a016 { decompositions = a016_0 or a016_1; };
activity a016_0 {
duration = 7; };
activity a016_1 { duration = 2; };
activity a017 { decompositions = a017_0 or a017_1; };
activity a017_0 { duration = 9; };
activity a017_1 { duration = 10; };
activity a018 { decompositions = a018_0 or a018_1; };
activity a018_0 { duration = 5; };
activity a018_1 { duration = 2; };
activity a019 { decompositions = a019_0 or a019_1; };
activity a019_0 { duration = 5; };
activity a019_1 { duration = 6; };
activity a020 { decompositions = a020_0 or a020_1; };
activity a020_0 { duration = 10; };
activity a020_1 { duration = 8; };
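Each `ends_before start of X by [lb, ub]` clause in the listing above is a temporal lag constraint between a pair of activities, with 1000000 standing in for an unbounded limit. As a rough illustration of the intended semantics only (the function name and the assumption that the lag runs from the end of the constrained activity to the start of the named one are this sketch's, not ASPEN's), such a constraint can be checked against a candidate schedule as follows:

```python
INF = 1000000  # the listing uses 1000000 to mean "unbounded"

def ends_before_start_ok(end_a, start_b, lb, ub):
    """Check an 'ends_before start of B by [lb, ub]' lag constraint:
    B must start between lb and ub time units after A ends."""
    lag = start_b - end_a
    return lb <= lag <= ub

# a012_0 ends at t=10 and the constrained activity requires a lag in [1, INF]
print(ends_before_start_ok(end_a=10, start_b=13, lb=1, ub=INF))  # True
```

A scheduler would evaluate one such check per clause of a mode's `constraints` expression, conjoining the results exactly as the `and` connectives in the listing do.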
A.2 Sample C TAEMS Problem

; Scenario time bounds
(spec_boh 20)
(spec_eoh 115)

; Agents
(spec_agent (label Miek))
(spec_agent (label Marc))
(spec_agent (label Christie))
(spec_agent (label Hector))
(spec_agent (label Moses))
(spec_agent (label Jesper))
(spec_agent (label Blayne))
(spec_agent (label Amy))
(spec_agent (label Les))
(spec_agent (label Kevan))
(spec_agent (label Starbuck))
(spec_agent (label Tandy))
(spec_agent (label Betty))
(spec_agent (label Ole))
(spec_agent (label Liyuan))
(spec_agent (label Naim))
(spec_agent (label Galen))
(spec_agent (label Sandy))
(spec_agent (label Kyu))
(spec_agent (label Cris))

; Activities
(spec_task_group (label scenario1) (subtasks problem1 problem2) (qaf q_sum))
(spec_task (label problem1) (subtasks gutturalwinroot songbirdwinroot hunkwinroot) (qaf q_sum_and))
(spec_task (label gutturalwinroot) (earliest_start_time 30) (deadline 61) (subtasks averment exudate healthcare) (qaf q_sum_and))
(spec_task (label averment) (subtasks solve solveff1 solverd1) (qaf q_max))
(spec_method (label solve) (agent Betty) (outcomes (default (density 1.000000) (quality_distribution 11.00 1.00) (duration_distribution 11 0.25 7 0.25 4 0.50))))
(spec_method (label solveff1) (agent Betty) (outcomes (default (density 1.000000) (quality_distribution 10.0) (duration_distribution 9 0.
75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.25 7 0.25))))
(spec_method (label solverd1) (agent Moses) (outcomes (default (density 1.000000) (quality_distribution 7.0) (duration_distribution 3 0.25 4 0.25 5 0.25 6 0.25))))
(spec_task (label exudate) (subtasks snarf snarfff1 snarfrd1) (qaf q_max))
(spec_method (label snarf) (agent Moses) (outcomes (default (density 1.000000) (quality_distribution 11.00 1.00) (duration_distribution 9 0.50 11 0.25 7 0.25))))
(spec_method (label snarfff1) (agent Moses) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.25 7 0.25))))
(spec_method (label snarfrd1) (agent Hector) (outcomes (default (density 0.800000) (quality_distribution 10.50 1.00) (duration_distribution 14 0.25 11 0.50 9 0.25)) (failure (density 0.200000) (quality_distribution 0.0) (duration_distribution 3 0.50 6 0.50))))
(spec_task (label healthcare) (subtasks mousetrap mousetraprd1) (qaf q_max))
(spec_method (label mousetrap) (agent Hector) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.25 7 0.25))))
(spec_method (label mousetraprd1) (agent Jesper) (outcomes (default (density 1.000000) (quality_distribution 7.50 1.00) (duration_distribution 3 0.25 4 0.25 6 0.50))))
(spec_task (label songbirdwinroot) (earliest_start_time 45) (deadline 55) (subtasks owngoal tandoori) (qaf q_sum))
(spec_task (label owngoal) (subtasks scorn scornff1 scornrd1) (qaf q_max))
(spec_method (label scorn) (agent Les) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.25 11 0.50 10 0.25))))
(spec_method (label scornff1) (agent Les) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.50))))
(spec_method (label scornrd1) (agent Cris) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 3 0.25 5 0.25 7 0.25 6 0.25))))
(spec_task (label tandoori) (subtasks unstop unstoprd1) (qaf q_max))
(spec_method (label unstop) (agent Marc) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 8 0.50 10 0.25 9 0.25))))
(spec_method (label unstoprd1) (agent Les) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.25 7 0.25))))
(spec_task (label hunkwinroot) (earliest_start_time 50) (deadline 74) (subtasks barmecide glacis cherry) (qaf q_sum_and))
(spec_task (label barmecide) (subtasks gutter gutterff1 gutterrd1) (qaf q_max))
(spec_method (label gutter) (agent Kevan) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 8 0.50 10 0.25 9 0.25))))
(spec_method (label gutterff1) (agent Kevan) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 3 0.25 5 0.25 6 0.25 7 0.25))))
(spec_method (label gutterrd1) (agent Liyuan)
(outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 4 0.25 6 0.25 7 0.50))))
(spec_task (label glacis) (subtasks purchase purchaseff1 purchaserd1) (qaf q_max))
(spec_method (label purchase) (agent Liyuan) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.50 11 0.25 10 0.25))))
(spec_method (label purchaseff1) (agent Liyuan) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 6 0.25 7 0.50))))
(spec_method (label purchaserd1) (agent Kevan) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 6 0.25 7 0.50))))
(spec_task (label cherry) (subtasks misreport misreportff1) (qaf q_max))
(spec_method (label misreport) (agent Kevan) (outcomes (default (density 1.000000) (quality_distribution 11.00 1.00) (duration_distribution 9 0.25 10 0.50 13 0.25))))
(spec_method (label misreportff1) (agent Kevan) (outcomes (default (density 1.000000) (quality_distribution 8.75 1.00) (duration_distribution 9 0.25 6 0.25 7 0.50))))
(spec_task (label problem2) (subtasks elevonwinroot pingowinroot extentwinroot) (qaf q_sum_and))
(spec_task (label elevonwinroot) (earliest_start_time 72) (deadline 84) (subtasks racerunner washup reticle) (qaf q_sum))
(spec_task (label racerunner) (subtasks fissure fissureff1 fissurerd1) (qaf q_max))
(spec_method (label fissure) (agent Ole) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.25 10 0.50 13 0.25))))
(spec_method (label fissureff1) (agent Ole) (outcomes (default (density 1.000000) (quality_distribution 8.00 1.00) (duration_distribution 3 0.25 4 0.75))))
(spec_method (label fissurerd1) (agent Naim) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 3 0.25 4 0.25 5 0.25 7 0.25))))
(spec_task (label washup) (subtasks tie tierd1) (qaf q_max))
(spec_method (label tie) (agent Kyu) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.25 10 0.50 6 0.25))))
(spec_method (label tierd1) (agent Tandy) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 3 0.25 5 0.25 6 0.25 7 0.25))))
(spec_task (label reticle) (subtasks stem stemff1 stemrd1) (qaf q_max))
(spec_method (label stem) (agent Tandy) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.25 10 0.50 13 0.25))))
(spec_method (label stemff1) (agent Tandy) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.25 7 0.25))))
(spec_method (label stemrd1) (agent Betty) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 3 0.25 4 0.25 6 0.50))))
(spec_task (label pingowinroot)
(earliest_start_time 78) (deadline 90) (subtasks bromicacid aquanaut mandinka) (qaf q_sum))
(spec_task (label bromicacid) (subtasks randomize randomizeff1) (qaf q_max))
(spec_method (label randomize) (agent Naim) (outcomes (default (density 1.000000) (quality_distribution 11.00 1.00) (duration_distribution 9 0.25 10 0.50 13 0.25))))
(spec_method (label randomizeff1) (agent Naim) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.25 7 0.25))))
(spec_task (label aquanaut) (subtasks enervate enervateff1) (qaf q_max))
(spec_method (label enervate) (agent Ole) (outcomes (default (density 1.000000) (quality_distribution 8.75 1.00) (duration_distribution 8 0.25 9 0.25 10 0.50))))
(spec_method (label enervateff1) (agent Ole) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 5 0.25 6 0.50))))
(spec_task (label mandinka) (subtasks entrust entrustff1) (qaf q_max))
(spec_method (label entrust) (agent Kyu) (outcomes (default (density 1.000000) (quality_distribution 11.00 1.00) (duration_distribution 8 0.25 10 0.50 13 0.25))))
(spec_method (label entrustff1) (agent Kyu) (outcomes (default (density 1.000000) (quality_distribution 11.75 1.00) (duration_distribution 4 0.25 6 0.25 7 0.50))))
(spec_task (label extentwinroot) (earliest_start_time 84) (deadline 110) (subtasks squashbugsyncpoint civilwarsyncpoint cellaragesyncpoint) (qaf q_sum_and))
(spec_task (label squashbugsyncpoint) (subtasks cloudscape salesman plainsman angina) (qaf q_sync_sum))
(spec_task (label cloudscape) (subtasks orient orientrd1) (qaf q_max))
(spec_method (label orient) (agent Miek) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.25 10 0.50 11 0.25))))
(spec_method (label orientrd1) (agent Cris) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 4 0.25 5 0.25 7 0.50))))
(spec_task (label salesman) (subtasks reface refaceff1) (qaf q_max))
(spec_method (label reface) (agent Blayne) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 9 0.25 11 0.50 10 0.25))))
(spec_method (label refaceff1) (agent Blayne) (outcomes (default (density 1.000000) (quality_distribution 7.75 1.00) (duration_distribution 4 0.25 6 0.25 7 0.50))))
(spec_task (label plainsman) (subtasks diarize diarizerd1) (qaf q_max))
(spec_method (label diarize) (agent Amy) (outcomes (default (density 1.000000) (quality_distribution 10.00 1.00) (duration_distribution 8 0.50 11 0.25 9 0.25))))
(spec_method (label diarizerd1) (agent Blayne) (outcomes (default (density 1.000000) (quality_distribution 10.75 1.00) (duration_distribution 9 0.25 7 0.25 6 0.50))))
(spec_task (label angina) (subtasks search searchff1) (qaf q_max))
(spec_method (label search) (agent Cris)
75 1.50 10 0.000000) (quality_distribution 7.25 4 0.50)))) (spec_task (label civilwarsyncpoint) (subtasks gyrocopter shoreline putoption denverboot) (qaf q_sum)) (spec_task (label gyrocopter) (subtasks intoxicate intoxicateff1 intoxicaterd1) (qaf q_max)) (spec_method (label intoxicate) (agent Les) (outcomes (default (density 1.25)))) (spec_method (label enwreatherd1) (agent Miek) (outcomes (default (density 1.50 13 0.25)))) (spec_task (label putoption) (subtasks configure configureff1 configurerd1) (qaf q_max)) 203 .000000) (quality_distribution 11.75)))) (spec_method (label intoxicaterd1) (agent Cris) (outcomes (default (density 1.0) (duration_distribution 4 0.000000) (quality_distribution 10.00 1.75 1.75 1.0) (duration_distribution 3 0.75 1.000000) (quality_distribution 10.50 10 0.00 1.25 7 0.25 5 0.25)))) (spec_task (label shoreline) (subtasks enwreathe enwreatherd1) (qaf q_max)) (spec_method (label enwreathe) (agent Cris) (outcomes (default (density 1.0) (duration_distribution 8 0.0) (duration_distribution 9 0.000000) (quality_distribution 10.0) (duration_distribution 10 0.0) (duration_distribution 8 0.A.25 7 0.000000) (quality_distribution 10.25 5 0.0) (duration_distribution 8 0.25 5 0.50 7 0.50 11 0.000000) (quality_distribution 7.25)))) (spec_method (label searchff1) (agent Cris) (outcomes (default (density 1.75 1.25 6 0.25 6 0.2 Sample C TAEMS Problem (outcomes (default (density 1.25)))) (spec_method (label intoxicateff1) (agent Les) (outcomes (default (density 1.
75 1.25 7 0.50 7 0.A Sample Problems (spec_method (label configure) (agent Marc) (outcomes (default (density 1.00 1.25 4 0.50 6 0.000000) (quality_distribution 10.75 1.000000) (quality_distribution 11.0) (duration_distribution 3 0.0) (duration_distribution 4 0.00 1.25 6 0.25)))) (spec_task (label denverboot) (subtasks skitter skitterff1) (qaf q_max)) (spec_method (label skitter) (agent Starbuck) (outcomes (default (density 1.0) (duration_distribution 4 0.0) (duration_distribution 3 0.25 5 0.25)))) (spec_method (label preprogramrd1) (agent Starbuck) (outcomes (default (density 1.000000) 204 .75 1.0) (duration_distribution 3 0.25 5 0.25)))) (spec_method (label preprogramff1) (agent Galen) (outcomes (default (density 1.25 7 0.00 1.25 5 0.75)))) (spec_task (label cellaragesyncpoint) (subtasks bonfire mojarra skimmilk flapjack) (qaf q_sum)) (spec_task (label bonfire) (subtasks preprogram preprogramff1 preprogramrd1) (qaf q_max)) (spec_method (label preprogram) (agent Galen) (outcomes (default (density 1.0) (duration_distribution 9 0.000000) (quality_distribution 11.75 1.50 6 0.50)))) (spec_method (label skitterff1) (agent Starbuck) (outcomes (default (density 1.000000) (quality_distribution 10.25)))) (spec_method (label configurerd1) (agent Les) (outcomes (default (density 1.50 10 0.000000) (quality_distribution 7.000000) (quality_distribution 8.0) (duration_distribution 9 0.25)))) (spec_method (label configureff1) (agent Marc) (outcomes (default (density 1.50 6 0.25 4 0.000000) (quality_distribution 10.
25 7 0.25)))) (spec_method (label barfff1) (agent Starbuck) (outcomes (default (density 1.25)))) 205 .75 1.A.0) (duration_distribution 3 0.75 1.50 7 0.0) (duration_distribution 4 0.000000) (quality_distribution 10.50 7 0.000000) (quality_distribution 8.25 6 0.25 6 0.000000) (quality_distribution 7.00 1.0) (duration_distribution 4 0.00 1.0) (duration_distribution 8 0.25 5 0.000000) (quality_distribution 10.25 6 0.000000) (quality_distribution 8.50 7 0.50 14 0.000000) (quality_distribution 10.25 5 0.50 6 0.25)))) (spec_method (label conceiveff1) (agent Marc) (outcomes (default (density 1.25)))) (spec_task (label mojarra) (subtasks barf barfff1) (qaf q_max)) (spec_method (label barf) (agent Starbuck) (outcomes (default (density 1.75 1.25 5 0.00 1.0) (duration_distribution 11 0.0) (duration_distribution 3 0.00 1.000000) (quality_distribution 10.2 Sample C TAEMS Problem (quality_distribution 10.25 7 0.50)))) (spec_task (label skimmilk) (subtasks conceive conceiveff1 conceiverd1) (qaf q_max)) (spec_method (label conceive) (agent Marc) (outcomes (default (density 1.25)))) (spec_task (label flapjack) (subtasks import importrd1) (qaf q_max)) (spec_method (label import) (agent Les) (outcomes (default (density 1.0) (duration_distribution 4 0.25)))) (spec_method (label conceiverd1) (agent Les) (outcomes (default (density 1.50 7 0.0) (duration_distribution 4 0.25)))) (spec_method (label importrd1) (agent Marc) (outcomes (default (density 1.25 6 0.75 1.50 6 0.
50) (duration_power 0.50) (duration_power 0.

NLEs

(spec_hinders (label salesman_to_fissureff1) (from salesman) (to fissureff1) (delay 1) (quality_power 0.50) (duration_power 0.50))
(spec_facilitates (label enwreatherd1_to_elevonwinroot) (from enwreatherd1) (to elevonwinroot) (delay 1) (quality_power 0.50) (duration_power 0.50))
(spec_enables (label gutturalwinroot_to_tandoori) (from gutturalwinroot) (to tandoori) (delay 1))
(spec_enables (label gutterrd1_to_famous) (from gutterrd1) (to famous) (delay 1))
(spec_enables (label scornrd1_to_gutterrd1) (from scornrd1) (to gutterrd1) (delay 1))
(spec_disables (label racerunner_to_randomize) (from racerunner) (to randomize) (delay 1))
(spec_facilitates (label barmecide_to_exudate) (from barmecide) (to exudate) (delay 1) (quality_power 0.50) (duration_power 0.50))
(spec_enables (label elevonwinroot_to_entrust) (from elevonwinroot) (to entrust) (delay 1))
(spec_enables (label tandoori_to_barmecide) (from tandoori) (to barmecide) (delay 1))
(spec_facilitates (label gutturalwinroot_to_purchaseff1) (from gutturalwinroot) (to purchaseff1) (delay 1) (quality_power 0.50) (duration_power 0.50))
(spec_hinders (label enervate_to_diarize) (from enervate) (to diarize) (delay 1) (quality_power 0.50) (duration_power 0.50))
(spec_enables (label stemrd1_to_enervate) (from stemrd1) (to enervate) (delay 1))

...

Initial Schedule

(spec_schedule
  (schedule_elements
    (mousetrap (start_time 30) (spec_attributes (performer "Hector") (duration 9)))
    (snarf (start_time 30) (spec_attributes (performer "Moses") (duration 11)))
    (solve (start_time 30) (spec_attributes (performer "Betty") (duration 7)))
    (unstop (start_time 45) (spec_attributes (performer "Marc") (duration 9)))
    (scorn (start_time 45) (spec_attributes (performer "Les") (duration 7)))
    (scornrd1 (start_time 45) (spec_attributes (performer "Cris") (duration 8)))
    (misreport (start_time 50) (spec_attributes (performer "Kevan") (duration 10)))
    (gutterrd1 (start_time 55) (spec_attributes (performer "Liyuan") (duration 9)))
    (gutter (start_time 60) (spec_attributes (performer "Kevan") (duration 5)))
    (purchase (start_time 64) (spec_attributes (performer "Liyuan") (duration 6)))
    (misreportff1 (start_time 65) (spec_attributes (performer "Kevan") (duration 9)))
    (stemff1 (start_time 72) (spec_attributes (performer "Tandy") (duration 5)))
    (stemrd1 (start_time 72) (spec_attributes (performer "Betty") (duration 7)))
    (fissurerd1 (start_time 72) (spec_attributes (performer "Naim") (duration 6)))
    (tie (start_time 72) (spec_attributes (performer "Kyu") (duration 10)))
    (randomize (start_time 78) (spec_attributes (performer "Naim") (duration 10)))
    (enervate (start_time 80) (spec_attributes (performer "Ole") (duration 7)))
    (entrust (start_time 82) (spec_attributes (performer "Kyu") (duration 8)))
    (enwreatherd1 (start_time 84) (spec_attributes (performer "Miek") (duration 6)))
    (importrd1 (start_time 84) (spec_attributes (performer "Marc") (duration 6)))
    (intoxicate (start_time 84) (spec_attributes (performer "Les") (duration 9)))
    (barf (start_time 84) (spec_attributes (performer "Starbuck") (duration 5)))
    (preprogram (start_time 84) (spec_attributes (performer "Galen") (duration 5)))
    (preprogramrd1 (start_time 89) (spec_attributes (performer "Starbuck") (duration 6)))
    (orient (start_time 90) (spec_attributes (performer "Miek") (duration 6)))
    (conceive (start_time 90) (spec_attributes (performer "Marc") (duration 11)))
    (refaceff1 (start_time 90) (spec_attributes (performer "Blayne") (duration 9)))
    (diarize (start_time 90) (spec_attributes (performer "Amy") (duration 6)))
    (search (start_time 90) (spec_attributes (performer "Cris") (duration 8)))
    (skitterff1 (start_time 95) (spec_attributes (performer "Starbuck") (duration 4)))
    (configure (start_time 101) (spec_attributes (performer "Marc") (duration 6))))
  (spec_attributes (Quality 421.35208)))
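The schedule block above is plain s-expression data, so it can be read back mechanically. The following is a minimal sketch (not part of the thesis toolchain; the function names are illustrative) of parsing such a fragment into nested lists and extracting each schedule element as a (label, performer, start_time, duration) tuple:

```python
# Minimal s-expression reader for a (spec_schedule ...) fragment.
# Illustrative only; not from the thesis or the C-TAEMS tools.

def tokenize(text):
    """Split an s-expression string into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Recursively build nested lists from the token stream."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # consume the closing ")"
        return node
    return token

def schedule_entries(tree):
    """Return (label, performer, start_time, duration) per element."""
    assert tree[0] == "spec_schedule"
    entries = []
    for elem in tree[1][1:]:  # tree[1] is (schedule_elements ...)
        fields = {part[0]: part for part in elem[1:]}
        attrs = {a[0]: a[1] for a in fields["spec_attributes"][1:]}
        entries.append((elem[0],
                        attrs["performer"].strip('"'),
                        int(fields["start_time"][1]),
                        int(attrs["duration"])))
    return entries

sample = """(spec_schedule (schedule_elements
  (mousetrap (start_time 30) (spec_attributes (performer "Hector") (duration 9)))
  (snarf (start_time 30) (spec_attributes (performer "Moses") (duration 11)))))"""

print(schedule_entries(parse(tokenize(sample))))
# -> [('mousetrap', 'Hector', 30, 9), ('snarf', 'Moses', 30, 11)]
```

The whitespace-padding tokenizer works here because atoms in these files (labels, numbers, quoted single-word names) contain no internal spaces.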