https://doi.org/10.1007/s10588-022-09359-y
Disaster world
Decision-theoretic agents for simulating population responses to
hurricanes
Abstract
Artificial intelligence (AI) research provides a rich source of modeling languages
capable of generating socially plausible simulations of human behavior, while also
providing a transparent ground truth that can support validation of social-science
methods applied to that simulation. In this work, we leverage two established AI
representations: decision-theoretic planning and recursive modeling. Decision-theoretic
planning (specifically Partially Observable Markov Decision Processes)
provides agents with quantitative models of their corresponding real-world entities’
subjective (and possibly incorrect) perspectives of ground truth in the form of
probabilistic beliefs and utility functions. Recursive modeling gives an agent a theory
of mind, which is necessary when a person’s (again, possibly incorrect) subjective
perspectives are of another person, rather than of just his/her environment. We
used PsychSim, a multiagent social-simulation framework combining these two AI
frameworks, to build a general parameterized model of human behavior during disaster
response, grounding the model in social-psychological theories to ensure social
plausibility. We then instantiated that model into alternate ground truths for simulating
population response to a series of natural disasters, namely, hurricanes. The
simulations generate data in response to socially plausible instruments (e.g., surveys)
that serve as input to the Ground Truth program’s designated research teams
for them to conduct simulated social science. The simulation also provides a graphical
ground truth and a set of outcomes to be used as the gold standard in evaluating
the research teams’ inferences.
* David V. Pynadath
pynadath@usc.edu
Extended author information available on the last page of the article
Disaster world 85
1 Introduction
Social science modeling approaches are designed to make causal inferences about
social dynamics observed in the real world, but they have a major drawback:
The accuracy of these approaches and their inferred causal links is impossible to
evaluate without knowing the ground truth underlying the observed phenomena.
The aptly named Ground Truth program sought to assess the accuracy of such
approaches with respect to a known, albeit simulated, underlying model of social
dynamics. The program tasked one set of teams to create computational “test-
beds” with a known ground truth that another set of teams would try to infer from
observable data generated by those testbeds. This process afforded the means
to characterize the limitations of today’s modeling tools in inferring a social-
dynamics ground truth from observable data. This paper presents our computa-
tional approach in building a multiagent social simulation to provide a testbed for
the evaluation of teams’ efforts in inferring its ground truth.
Multiagent social simulation has proven to be a useful tool for answering
the hypothetical questions asked by policy makers (Carley et al. 2006; JASSS
1998-present; Luke et al. 2005; MABS 1998-present; Sun 2006). Such simula-
tions represent people as autonomous agents that reflect individuals’ or groups’
decision-making perspectives and behavior. In most approaches, each agent
chooses actions based on a simple set of rules that captures a hypothesis underly-
ing the simulation. While such rules provide a generative model of human behav-
ior, they are less suited for capturing a “ground truth” to be inferred by others.
Changing the hypotheses underlying the simulated mental processes typically
requires encoding a new set of rules. Furthermore, the rules gain efficiency at
the cost of transparency of the reasoning process they encode. Rules reduce the
entire reasoning process into a set of stimulus-response behaviors, so that reason-
ing process is no longer accessible to researchers who may wish to observe it.
Artificial intelligence (AI) research provides a rich source of alternate mod-
eling languages capable of addressing the requirements of a transparent ground
truth. For example, we can use decision-theoretic models to capture people’s deci-
sion-making processes, in the form of beliefs, choices, preferences, etc. (Goodie
et al. 2012; Hoey and Little 2007; Paruchuri et al. 2013; Pynadath and Marsella
2005; Wang et al. 2015). By representing these relatively persistent characteris-
tics, the agent can make decisions that are aligned with the corresponding real
people in hypothetical situations of interest.
We claim that two established AI representations are particularly appropriate
for the needed simulation framework: decision-theoretic planning and recursive
modeling. Decision-theoretic planning [specifically Partially Observable
Markov Decision Processes (POMDPs); Kaelbling et al. 1998] provides agents
with quantitative models of their corresponding real-world entities’ subjective
(i.e., possibly incorrect) perspectives of ground truth in the form of probabilistic
beliefs and utility functions. Recursive modeling gives an agent a theory of mind,
which is necessary when a person’s subjective perspectives are of another person,
rather than of just his/her environment. A theory of mind enables people (and
agents) to form beliefs about the mental states of others, generate expectations
about the behaviors of others, and update their beliefs in response to observations
of their actual behavior.
We have leveraged such AI technology to develop a social-simulation framework,
called PsychSim (Pynadath and Marsella 2005), that we have used to build genera-
tive models of diverse social scenarios, spanning training simulations in urban sta-
bilization (McAlinden et al. 2014) and bilateral negotiation (Kim et al. 2009), and a
Department of Homeland Security study of disaster response (Pynadath et al. 2016),
among others. Each PsychSim agent represents an entity (individual, organization,
state, etc.) within a general language for encapsulating a variety of phenomena stud-
ied in the psychological literature, such as appraisal processes in emotion (Si et al.
2010), wishful thinking (Ito et al. 2010), influence factors (Marsella et al. 2004), and
stereotype formation/person perception (Pynadath and Marsella 2007). All of these
existing models have been captured within a general graphical language that can
encode first principles as quantitative links among variables.
In this work, we used PsychSim to build a general parameterized model of human
behavior during disaster response, grounding the model in social-psychological
theories to ensure social plausibility. We then instantiated that model into alternate
ground truths for simulating population response to a series of natural disasters,
namely, hurricanes. The framework provides simulation flexibility by supporting
the reconfiguration of the simulation through a relatively small set of parameters.
Changing these parameters produces an alternate simulation grounded in a per-
turbation of the original ground truth, but one that still leads to socially plausible
behavior.
Section 2 describes the properties of the real-world phenomena that our model
seeks to capture. Section 3 presents PsychSim, the multiagent social-simulation
framework that we used to represent and simulate the agent models. Section 4 pre-
sents those agent models of the hurricane, the urban area it affects, the individual
actors who live there, the groups they potentially form in response to the hurricane,
and the government that encapsulates the system-level response. Section 5 presents
the separate explanation, prediction, and prescription challenges that we posed to
external analysts. Section 6 describes the data that the resulting simulation provides
up front, as well as in response to research requests from external analysts of the
simulated society. Section 7 discusses potential future directions for this work and
concludes.
Hurricanes are natural storms, energized by warm ocean temperatures, with sustained
wind speeds of 74 mph or more. Increasing wind speed (and category classification;
Schott et al. 2019) and concomitant heavy rain, storm surges, and flooding
have led to thousands of fatalities and over $400 billion in estimated property
damage in the last decade alone. Given global warming, coastal communities are
increasingly vulnerable as sea levels rise, compounding surge effects (NOAA 2020).
3 PsychSim
proven to be a robust and powerful tool for decision analysis, a method for con-
structing a model of the causal relationships underlying an individual’s or group’s
decision-making process (Howard 1988).
While a factored POMDP thus provides an expressive model of an entity’s deci-
sion-making process, we also need to support multiple POMDPs to capture the enti-
ties’ different decision-making perspectives. Furthermore, each of these entities may
have beliefs about not only their environment (e.g., “The hurricane poses a great
risk to me and my family”), but also beliefs about other entities (e.g., “I do not think
the government cares about my ethnic group”). Recursive models allow agents to
have such a theory of mind, by representing their beliefs about the mental states of
other agents in the same form as their own mental states, allowing them to reuse
the same AI algorithms to generate expectations of others’ behavior as they use for
their own actual behavior (Gmytrasiewicz and Durfee 1995). Interactive POMDPs
(Gmytrasiewicz and Doshi 2005) re-use POMDP representations and algorithms to
represent agents’ subjective perceptions of others that can deviate from reality (e.g.,
individual citizens may have different beliefs about their government’s reward func-
tion than what the government’s actual reward function is).
PsychSim is our implementation of recursive, factored POMDPs, with additional
restrictions to aid AI non-experts in creating and understanding the simulation mod-
els and output. To allow such potential authors to be able to encode their knowledge
within PsychSim agents, these encodings take the form of a graphical representation
of probabilistic and utility interdependencies among scenario variables. We start
from the standard factored POMDP’s use of Dynamic Bayesian Networks (Kjaerulff
1992) and influence diagrams (Howard and Matheson 1984) to exploit conditional
independence in modeling the effects of actions (Boutilier et al. 1999). We can thus
express dependencies among our states and actions as links among the nodes of a
dynamic influence diagram (DID) (Tatman and Shachter 1990). PsychSim’s com-
putational realization of such graphs in a multiagent context draws upon existing
decision-theoretic graphical models like MultiAgent Influence Diagrams (MAIDs)
(Koller and Milch 2003) and Interactive Dynamic Influence Diagrams (I-DIDs)
(Polich and Gmytrasiewicz 2007).
4 Agent models
Figure 1 shows the DID visualization of a particular instantiation of our agent mod-
els. Different colored nodes correspond to variables associated with different enti-
ties: red for the hurricane (Nature), dark green for regions of the urban environment
(Regions), two shades of yellow for two different residents of the area (Actors), light
green for groups of actors that emerge to perform joint actions (Groups), gray for
system-wide behaviors that emerge through political processes (System), and blue
for global variables not associated with any entity.
The shapes of the nodes indicate the type of variable: ovals for random variables,
rectangles for actions, and hexagons for utility functions. We can further distinguish
between the values of random variables before actions are taken (to the left of the
columns of rectangular nodes) and those afterward (to the right). Edges between
nodes can thus capture both interdependencies among variables at the same point in
time and the effects of actions on the change in variables over time.
The simulation uses the latter to update the state of the world once per day, with
each day consisting of updates to the state of individual entities in the following
sequence: (1) Groups, (2) Actors, (3) System, and (4) Nature (Sect. 4.1). Much of
the interaction among these entities is mediated by their effects on the shared environment,
represented by 16 regions (Sect. 4.2). Unless otherwise noted, the variables
take on real values ∈ [0, 1]. The initial values of such variables are drawn from
a normal distribution (whose parameters can vary from simulation to simulation)
and then mapped to the smallest element in {0, 0.2, 0.4, 0.6, 0.8, 1} that is greater
than the value drawn.
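The initialization just described can be sketched as follows. This is a minimal illustration rather than PsychSim code: the names `LEVELS`, `discretize`, and `sample_initial_value` are hypothetical, a draw that lands exactly on a level is assumed to keep that level, and draws above 1 are assumed to clamp to 1.

```python
import random

# The six discrete levels named in the text.
LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def discretize(value, levels=LEVELS):
    """Map a raw draw to the smallest level that is not below it.

    Assumptions (not stated in the text): an exact match keeps its level,
    and draws above the top level clamp to the top level.
    """
    for level in levels:
        if level >= value:
            return level
    return levels[-1]


def sample_initial_value(mean, std, rng):
    """Draw from a normal distribution, then snap to the discrete levels."""
    return discretize(rng.gauss(mean, std))


rng = random.Random(42)
print([sample_initial_value(0.5, 0.2, rng) for _ in range(5)])
```

Because the mapping rounds up, the discretized values are biased slightly above the underlying normal draw.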
Section 4.1 presents our model of the hurricane dynamics. Section 4.2 presents
our representation of the regional environment and how it is affected by the hurri-
cane. Section 4.3 presents our actor model. Section 4.4 presents how we model the
emergence of group behaviors from the actors within a region. Section 4.5 presents
our model of how system-wide dynamics emerge from the actors across the entire
area.
4.1 Nature
In line with the unpredictable characteristics of hurricanes in the real world, evolu-
tion of hurricanes in our model is governed by a stochastic process that is independ-
ent of the actions of any of the people. The state of the hurricane is defined by four
variables (the terms in bold represent their initial values):
Category ∈{none, 1–5} indicates the severity of the hurricane along the Saffir-
Simpson scale, with 1 being the least and 5 being the most severe (Schott et al.
2019). A value of none indicates that there is no hurricane present.
A change of phase from none to approaching can happen only after a mini-
mum number of days, with that minimum fixed within a simulation instance, but
free to vary across instances. Upon reaching that minimum, there is a fixed prob-
ability of the phase changing to approaching, with that value again constant for a
given instance, but varying across instances. The dynamics of the transition from
approaching to active are the same, with the same fixed minimum number of days
and the fixed transition probability.
When phase transitions to approaching, the category of this new hurricane is
drawn from a uniform distribution over 1–5. The predicted landfall location is drawn
from a uniform distribution over the four coastal regions (Region01, Region05,
Region09, and Region13). While the hurricane is approaching, there is a fixed prob-
ability of its category going up or down by 1 as permitted by the 1–5 range. How-
ever, the location stays constant during this phase.
When phase transitions to active, the category does not change. We restrict
the movement of the hurricane to be only east (inland) or north. Each simulation
instance has its own fixed probability distribution over whether the hurricane’s loca-
tion on the next day will be the region directly to the east of its current location, the
region directly to the north, or the same region.
When the hurricane’s location makes a transition to the north or east that would
take it out of the specified regions, the hurricane is declared over. Its category,
phase, and location all reset to none. The days variable resets to 0, and the phase-transition cycle
begins again.
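A compact way to read the dynamics of this section is as a daily state-transition function. The sketch below is illustrative only and makes several assumptions that the text does not confirm: the 16 regions are laid out as a 4x4 grid with column 0 on the coast and row 0 to the north, the day counter restarts on every phase change, and all parameter names (`min_days`, `p_transition`, `p_category_change`, `move_weights`) are hypothetical.

```python
import random

GRID = 4  # assumption: the 16 regions form a 4x4 grid, column 0 on the coast


def step(state, params, rng):
    """Advance the hurricane by one day and return the new state dict."""
    s = dict(state)
    if s["phase"] == "active":
        move = rng.choices(("east", "north", "stay"),
                           weights=params["move_weights"])[0]
        row, col = s["location"]
        if move == "east":
            col += 1
        elif move == "north":
            row -= 1  # assumption: row 0 is the northernmost row
        if col >= GRID or row < 0:
            # The hurricane has left the map, so everything resets.
            return {"phase": "none", "category": None,
                    "location": None, "days": 0}
        s["location"] = (row, col)
        return s

    s["days"] += 1
    ready = s["days"] >= params["min_days"]
    if ready and rng.random() < params["p_transition"]:
        s["days"] = 0  # assumption: the counter restarts in each phase
        if s["phase"] == "none":
            s["phase"] = "approaching"
            s["category"] = rng.randint(1, 5)         # uniform over 1-5
            s["location"] = (rng.randrange(GRID), 0)  # uniform over coast
        else:
            s["phase"] = "active"  # category and location are unchanged
    elif (s["phase"] == "approaching"
          and rng.random() < params["p_category_change"]):
        # Choose among the one-step changes that stay inside 1-5.
        current = s["category"]
        options = [c for c in (current - 1, current + 1) if 1 <= c <= 5]
        s["category"] = rng.choice(options)
    return s
```

Keeping the per-instance parameters in one dictionary mirrors the text's distinction between values that are fixed within a simulation instance and values that vary across instances.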
4.2 Regions
Risk encompasses all of the ways that the hurricane can make a region unsafe
(e.g., property damage, high winds, flooding, etc.). Each region’s risk level
can never drop below its initial value (i.e., its baseline level of safety never
improves). These initial values are drawn from a normal distribution with a
mean and standard deviation specified for each simulation instance. In Fig. 2,
the regions all start with an equally low level of risk, depicted by the green
color of the corresponding rectangles.
Fig. 2 Population map at the start of a sample simulation instance, showing the risk of regions (green
rectangles) and the health of individual actors (color-coded circles). (Color figure online)
Economy represents the economic viability of the region, in terms of the abil-
ity of businesses to stay in operation. As in the risk level, a region’s economic
level can never improve to exceed its initial value. These initial values are also
drawn from a normal distribution with a mean and standard deviation specified
for each simulation instance.
On each day with no hurricane over land (i.e., phase is none or approaching),
each region’s risk decreases toward its initial value by a fixed percentage. When
phase is active, then each region’s risk increases toward 1 by a percentage that
increases with category and decreases with the Manhattan distance between that
region and the hurricane’s location.
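The daily risk update can be illustrated with a short sketch. The text gives only the direction of each dependency, so the specific formula for the active phase (`impact * category / (1 + distance)`, capped at 1) is an assumption chosen purely for illustration, as are the function and parameter names.

```python
def approach(value, target, fraction):
    """Move value toward target by the given fraction of the remaining gap."""
    return value + fraction * (target - value)


def manhattan(a, b):
    """Manhattan distance between two (row, col) grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def update_region_risk(risk, initial_risk, region, hurricane,
                       decay=0.1, impact=0.1):
    """Return a region's risk after one day.

    hurricane is None when phase is none or approaching; otherwise it is
    a (category, (row, col)) pair for an active hurricane.
    """
    if hurricane is None:
        # No hurricane over land: relax toward the baseline.
        return approach(risk, initial_risk, decay)
    category, location = hurricane
    # Hypothetical form: grows with category, shrinks with distance.
    distance = manhattan(region, location)
    fraction = min(1.0, impact * category / (1 + distance))
    return approach(risk, 1.0, fraction)
```

Because both branches move risk a fraction of the way toward a target, risk never overshoots 1 and never falls below the initial value once it starts at or above it.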
Figure 3 illustrates the state of the simulation in the midst of a hurricane,
depicted by the red icon in the center of the region second from the left in the top
row. The region containing the icon is the hurricane’s location, and the number
in the icon (2) gives the hurricane’s category. Green regions are still those with
low levels of risk, but we can see that the regions that the hurricane has passed
through have higher levels of risk from their orange color.
Figure 4 shows the effects of a different hurricane, this time with a category of
3 and a location in the bottom left region. The hurricane has just made landfall,
so that its phase has just become active. The higher category of the hurricane
has led to a larger impact on the regions’ risk levels than is seen in Fig. 3. There
are areas of red (highest risk), and no regions retain their original dark green
color.
The hurricane’s effect on the regions’ economy is based on their risk level. If a
region’s risk exceeds a fixed threshold, then its economy will decrease by a fixed
percentage; otherwise, it will recover toward its initial value by a fixed percentage.
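As a rough sketch of this threshold rule, assuming that "decrease by a fixed percentage" means a multiplicative loss and that recovery closes a fixed fraction of the gap to the initial value (all names are hypothetical):

```python
def update_economy(economy, initial_economy, risk,
                   risk_threshold, loss, recovery):
    """Return a region's economy after one day."""
    if risk > risk_threshold:
        # Risk is too high: the economy shrinks by a fixed percentage.
        return economy * (1.0 - loss)
    # Otherwise it recovers part of the way back to its initial value.
    return economy + recovery * (initial_economy - economy)
```

Under these assumptions the economy never exceeds its initial value, consistent with the constraint stated for the Economy variable.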
Regions which have public shelters have four more variables to represent their
current state (a region can have at most one shelter):
Shelter risk has the same meaning as the region’s risk variable, except that it
captures the level of risk specific to the shelter, rather than the region at large.
Shelter pets ∈{True, False} indicates whether the shelter allows pets inside
or not (True means that it does). Shelters do not change their policy during the
course of the simulation.
Shelter capacity is an integer indicating how many households can stay in the
shelter. A shelter’s capacity does not change during the course of the simulation.
Shelter occupancy is an integer indicating how many households are currently
staying in the shelter.
4.3 Actors
Actors are the most complex entities in the simulation by far, as they are the primary
target for the inference challenge.
4.3.1 Actor state
There are internal dependencies among some of these variables. The most impor-
tant dependency is the effect of actors’ risk on their health. Actors whose risk is
in the lowest quintile (i.e., ≤ 0.2 ) face no threat and will instead recover from any
prior injuries. More precisely, they will have their health approach its initial value
by a fixed percentage (that possibly varies across simulation instances) each day. All
other actors face a nonzero chance of injury, with the likelihood of injury increas-
ing at higher quintiles of risk. Their health will change stochastically, approaching
either 0 or its initial value by a fixed (but possibly unequal) percentage. The likeli-
hood of going up vs. down is based on which quintile an actor’s current risk is in.
In particular, if it is in the highest quintile, there is an 80% chance that the actor’s
health will decrease; if in the next highest, there is a 60% chance; etc. Thus, even a
low level of risk (e.g., just above 0.2) can still result in injury, while even a maximum level of
risk still has a chance of being survived injury-free. This provides room for different
actors to assess such uncertain outcomes differently.
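The quintile-based health dynamics can be sketched as below. The 80% and 60% figures come from the text, and the remaining probabilities follow its "etc." pattern; treating each quintile's upper bound as inclusive, and the names `injury_probability`, `recover`, and `worsen`, are assumptions.

```python
import random


def injury_probability(risk):
    """Chance that health decreases today, by risk quintile.

    Assumption: quintile upper bounds are inclusive, matching the text's
    "<= 0.2" for the lowest quintile.
    """
    for upper, probability in ((0.2, 0.0), (0.4, 0.2), (0.6, 0.4), (0.8, 0.6)):
        if risk <= upper:
            return probability
    return 0.8


def update_health(health, initial_health, risk, recover, worsen, rng):
    """Stochastically move health toward 0 or toward its initial value."""
    if rng.random() < injury_probability(risk):
        return health - worsen * health  # approach 0
    return health + recover * (initial_health - health)  # approach initial


rng = random.Random(7)
print(update_health(0.5, 1.0, 0.9, recover=0.1, worsen=0.3, rng=rng))
```

The same function could be called a second time with an independent draw to update children health, since the text states that the two outcomes are drawn independently from the same distribution.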
Children of actors face the same injury risks as their parents. In other words, the
dynamics of children health follows the same distribution as their parents’ health,
conditioned on the same value of risk. However, the health and children health are
drawn independently, so children are no more or less likely to suffer an injury when
their parents are injured.
The dynamics of other state variables depend on the actors’ behavior, captured by
the actions they choose to perform. The actors’ choices center around locations they
decide to move to. They can either:
Evacuate The actor and any family leave the area completely, at least temporar-
ily. This action changes the actor’s location to evacuated.
Move to shelter The actor and any family move into a public shelter. This action
changes the actor’s location to shelter.
Move home The actor and family move back to their residence. This action
changes the actor’s location to home.
Stay in location The actor and family stay where they currently are. This action
does not change the actor’s location.
When not moving (i.e., choosing stay in location), the actors’ risk changes based
on their location. When at home, their risk is set to the risk associated with their
region of residence. When already at a public shelter, it is set to the shelter risk of
the region in which they are sheltering. When their location is already evacu-
ated, their risk drops by 90%.
When returning home (i.e., choosing move home), actors also incur the risk of
their region of residence. The other movement actions affect the actor’s risk slightly
differently. When evacuating, the actor incurs the risk of the last region traversed
before leaving the area. While traveling to a shelter (i.e., choosing move to shelter),
actors incur the risk (not the shelter risk) of the region containing that shelter, to
capture the danger they are exposed to while in transit.
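The rules for how location and movement determine an actor's risk reduce to a small lookup, sketched here. The function signature and the `env` dictionary keys are hypothetical; only the mapping itself is taken from the text.

```python
def next_risk(action, location, risk, env):
    """Return the actor's risk after choosing an action.

    env holds the relevant environment values (hypothetical keys):
      "home_region_risk":    risk of the actor's region of residence
      "shelter_risk":        shelter risk where the actor is sheltering
      "shelter_region_risk": regional risk of the region with the shelter
      "exit_region_risk":    risk of the last region crossed when leaving
    """
    if action == "move home":
        return env["home_region_risk"]
    if action == "evacuate":
        return env["exit_region_risk"]
    if action == "move to shelter":
        # In transit the actor is exposed to the region, not the shelter.
        return env["shelter_region_risk"]
    if action == "stay in location":
        if location == "home":
            return env["home_region_risk"]
        if location == "shelter":
            return env["shelter_risk"]
        if location == "evacuated":
            return risk * 0.1  # risk drops by 90%
    raise ValueError(f"unsupported combination: {action!r}, {location!r}")
```

Note that an actor who moves to a shelter only benefits from the shelter's own risk level on the following day, once the chosen action is stay in location.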
Moving to a shelter is the only action that can affect the status of an actor’s pet.
The only circumstance under which a pet will die is when its owner moves to a public
shelter that does not allow pets, thus forcing the owner to abandon it at home during
the hurricane. More precisely, an actor’s pet becomes False only if the actor chooses
to move to shelter in a region whose shelter pets is False. Under all other circumstances,
the status of the pet stays the same.
Movement also has financial ramifications for the actors. There is a fixed proba-
bility (possibly changing across simulation instances) of actors losing their job (i.e.,
employed becomes False) while their location is evacuated. There is no chance of
their losing their job under any other circumstance, nor is there a chance of actors
gaining employment when unemployed. Actors who are unemployed (employed is
False) will see their resources drop by a fixed percentage.
However, being employed is not a guarantee of income either. Each simulation
instance can set a threshold which a region’s economy must exceed for jobs in that
region to generate income. We minimize the number of regions for actors to model
by assuming that they are employed in their region of residence. If a region’s economy
does not exceed this threshold, then the resources of that region’s residents
drop just as they would if their employed were False.
Actors employed in a region with a sufficiently high economy are able to gain
resources. For actors whose location is either home or evacuated, their resources
level increases toward its original value by a fixed percentage (i.e., all actors receive
the same income). Actors whose location is a shelter may or may not be able to gain
an income, depending on a fixed Boolean parameter associated with each simulation
instance. This parameter controls, across the population, whether actors staying at
the shelter are able to continue gaining income (identical to being at home) or
are not (identical to being unemployed).
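The employment and income rules of the preceding paragraphs can be combined into one daily update, sketched below. The parameter names are hypothetical, and "drop by a fixed percentage" is assumed to be a multiplicative loss.

```python
def update_resources(resources, initial_resources, employed, location,
                     economy, economy_threshold, shelter_income,
                     income, loss):
    """Return an actor's resources after one day of income or loss.

    shelter_income is the per-instance Boolean controlling whether
    actors staying in a shelter keep earning.
    """
    earning = employed and economy > economy_threshold
    if location == "shelter" and not shelter_income:
        earning = False  # treated the same as being unemployed
    if earning:
        # Income moves resources toward their original level.
        return resources + income * (initial_resources - resources)
    return resources * (1.0 - loss)
```

One consequence of this form is that income can only restore resources to their original level, never raise them above it.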
The other impact on an actor’s resources is the cost associated with choosing to
evacuate. We define this as a fixed cost for each simulation instance, with the cost
being subtracted from the actor’s resources when evacuating. Actors whose level of
resources is less than this cost end up with 0 resources upon evacuating. Note that
this cost in no way blocks actors from evacuating; in fact, actors with 0 resources
can still evacuate and will suffer no financial repercussion for it. While this might
not be completely accurate, it does capture the fact that people who have already lost
everything have less to lose by abandoning their homes.
Actors whose location is home have the option of performing prosocial and antiso-
cial actions as well:
Decrease risk is a prosocial action which lowers the risk of the actor’s region of
residence.
Take resources is an antisocial action (equivalent to looting) which increases the
actor’s resources.
Both of these actions require actors to leave the safety of their home, so they increase
the actor’s personal risk, possibly beyond the baseline risk of the region of residence. An
instance parameter specifies a fixed percentage (one for each action type) by which
the actor’s risk will approach 1.
The decrease risk action will decrease the risk and increase the economy of the
actor’s region of residence. There is thus an immediate benefit to the risk levels of
the actor’s neighbors. However, the actor does not see this benefit until the follow-
ing day, because of the exposure to greater risk incurred on the day of the prosocial
behavior.
The take resources action brings the actor’s resources closer to 1 by a fixed per-
centage that can change across simulation instances. There is thus a benefit to the
individual actor, but we do not model the cost to the region of residence. A more
realistic model of looting would perhaps cause the region’s economy to decrease
based on the number of actors choosing the take resources action on a given day.
4.3.4 Actor reward
The reward function represents the utility that an actor derives from the current state.
We choose a linear reward structure for the actors, with constant weights over a sub-
set of state features. The initial values for these weights are sampled from normal
distributions (all using the same standard deviation). The values remain unchanged
throughout, reflecting the unchanging values held by the actors. The following are
the state features from which the actors derive direct utility:
Priority of health assigns a positive weight to the actor’s level of health. The
mean of the distribution can change based on the actor’s gender.
Priority of resources assigns a positive weight to the actor’s level of resources.
The mean of the distribution can change based on the actor’s gender.
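A linear reward of this kind is a weighted sum over state features. The sketch below covers only the two features listed above; the feature keys, the function names, and the example means are hypothetical, and no clipping is applied to keep sampled weights positive.

```python
import random


def sample_weights(means, std, rng):
    """Draw one constant weight per feature from a normal distribution.

    means maps feature name to the mean of its weight distribution; all
    features share the same standard deviation, as in the text.
    """
    return {feature: rng.gauss(mean, std) for feature, mean in means.items()}


def reward(state, weights):
    """Linear reward: sum of weight times feature value."""
    return sum(weight * state[feature] for feature, weight in weights.items())


rng = random.Random(3)
weights = sample_weights({"health": 1.0, "resources": 0.5}, std=0.1, rng=rng)
print(reward({"health": 0.8, "resources": 0.4}, weights))
```

Because the weights are sampled once and then held constant, two actors in identical states can still rank the same outcomes differently.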
4.3.5 Actor beliefs
Actors are able to observe all state features associated with themselves, giving them
accurate beliefs about the values of those variables. This model thus assumes that
actors are able to accurately assess their health, location, employed, grievance,
etc. Furthermore, when there is no hurricane (phase=none), they are also able to
observe that fact, as well as the levels of risk and shelter risk in their region of
residence.
However, when a hurricane is present (phase is either approaching or
active), actors have only partial observability of the true values of the variables
not associated with themselves. In particular, they do not directly observe the values
of the hurricane’s category, the risk of their region of residence, the shelter risk of
their nearest shelter(s), nor their own risk.
Instead, they receive an uncertain observation, perceived category, which
is drawn each day from a distribution that is conditioned on the hurricane’s true
category. Actors are given a static assignment to one of three such distributions,
recorded in their information distortion attribute: overestimate either overestimates
or matches the true category value, underestimate either underestimates or matches,
and none yields the correct observations with 100% probability. The overestimate
(underestimate) distribution yields either the true category value or that value plus
(minus) one. The probability of receiving the incorrect value is fixed throughout a
given simulation instance and is the same for both types of information distortion.
If the incorrect value is outside the acceptable category range of 1–5, then the probability
of receiving it is 0.
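The three observation distributions can be written as a single function, sketched here with hypothetical names. `p_wrong` stands for the per-instance probability of receiving the incorrect value.

```python
def perceived_category_distribution(true_category, distortion, p_wrong):
    """Return {perceived category: probability} for one day's observation."""
    offset = {"overestimate": 1, "underestimate": -1, "none": 0}[distortion]
    wrong = true_category + offset
    if offset == 0 or not 1 <= wrong <= 5:
        # No distortion, or the distorted value would leave the 1-5 range.
        return {true_category: 1.0}
    return {true_category: 1.0 - p_wrong, wrong: p_wrong}


print(perceived_category_distribution(3, "overestimate", 0.25))
```

At the ends of the scale the distortion therefore vanishes: an overestimating actor facing a true category of 5 always perceives 5.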
Upon receiving information about the hurricane’s category, the actors do a
Bayesian belief update, following the standard POMDP belief-update algorithm
(Kaelbling et al. 1998). The actors all have complete knowledge of the hurricane
dynamics as described in Sect. 4.1. They are thus able to compute the likelihood
over possible transitions in the category value from the current values in their
beliefs. They combine these expectations with the observation they receive to com-
pute a posterior distribution over category in their new belief state.
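The belief update over category follows the usual predict-then-correct pattern of a discrete Bayes filter. The sketch below is a generic illustration restricted to the category variable, not the PsychSim implementation; the `transition` and `likelihood` callables are hypothetical stand-ins for the hurricane dynamics and the perceived-category distribution.

```python
def belief_update(prior, transition, likelihood, observation):
    """Return the posterior belief over category after one observation.

    prior:       {category: probability}
    transition:  function category -> {next category: probability}
    likelihood:  function (observation, next category) -> probability
    """
    # Predict: push the prior through the known hurricane dynamics.
    predicted = {}
    for category, p in prior.items():
        for nxt, q in transition(category).items():
            predicted[nxt] = predicted.get(nxt, 0.0) + p * q
    # Correct: weight each hypothesis by how well it explains the observation.
    unnormalized = {c: p * likelihood(observation, c)
                    for c, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return {c: p / total for c, p in unnormalized.items() if p > 0.0}


# Example: an overestimating source that is wrong half the time.
prior = {2: 0.5, 3: 0.5}
unchanged = lambda c: {c: 1.0}
overestimate = lambda obs, c: 0.5 if obs in (c, c + 1) else 0.0
print(belief_update(prior, unchanged, overestimate, 4))
```

In the example, an observation of 4 rules out a true category of 2, because an overestimating source can report at most one category above the truth.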
Unlike category, actors do not receive any information about the risk of their
region of residence, the shelter risk of their nearest shelter(s), nor their own risk
while a hurricane is active. They are still able to form and update their beliefs
about these variables using their complete knowledge of the effects of the hurri-
cane on them. In other words, they start from the possible values for category in
their updated beliefs and then apply the deterministic effects described in Sect. 4.2
to determine the implied values for the regional risk variables. They then apply the
deterministic effects described in Sect. 4.3.2 to determine the values for their risk
implied by these updated values of regional risk. They are thus able to compute a
posterior joint distribution over category, the regional risk and shelter risk vari-
ables, and their own risk for their updated belief state.
While actors do not receive any direct information about their own risk, they do
receive observations of variables affected by it, such as their own health. For exam-
ple, after an actor sees its health decrease (e.g., due to injury), it should believe its
risk level to be higher than it previously thought. The POMDP belief-update algo-
rithm used by our agents realizes such changes in posterior beliefs. In particular,
the actors use their complete knowledge about the dependency between their risk
and health (described in Sect. 4.3.1) and their observed health and children health
values to compute a posterior distribution over the joint category, regional risk,
and personal risk variables. This distribution reflects the consistency of the hypoth-
esized severity of the hurricane and its impact with the actors’ information received
(perceived category) and personal experience (health and children health).
Thus, despite the possibly erroneous information about category and their inabil-
ity to directly observe the risk at either the regional or personal level, the actors
are able to update their beliefs based on their own experience and their knowledge
of how their physical environment works. Each day brings additional evidence that
they use to update their beliefs, resulting in the vast majority of actors having accu-
rate beliefs about the hurricane severity by the time it leaves the area, regardless of
what misconceptions may have formed at its onset. The distribution from which per-
ceived category is drawn quantifies the stochasticity in the belief-update process.
Actors’ beliefs are limited to only those variables that concern themselves, the
hurricane, and regions they may travel to or through. They do not form beliefs about
other actors, not even their friends or neighbors. This restriction greatly reduces the
size of the actors’ belief states and the computation time needed for them to rea-
son over the outcomes of their actions given those belief states. It also captures the
limited reasoning that people do about each other, in that it is unrealistic for actors
to form and maintain beliefs about all of the other individual residents in the area.
They are able to use their regions’ risk as a proxy for the well-being of their neigh-
bors. It would be plausible for the actors to maintain beliefs over their small group
of friends, but we do not have them do so in the current simulation.
The heart of the actors’ behavior generation is in their POMDP-based decision mak-
ing. We describe this decision making using an online version of the algorithm,
where each actor reasons about which action maximizes expected reward given its
current beliefs (Ross et al. 2008). To do so, it considers each of its available actions
separately, generating expectations of the effect of each candidate action, and choos-
ing the action that leads to the highest expected reward.
fidelity of the actors’ expectations about their own behavior as they look farther into
the future.
The end result of these hypothetical simulations is a value function that computes
the actor’s Expected Reward, a table over its possible action choices. Actors choose
the action that has the highest value in this table, with ties broken based on a fixed
ordering generated at runtime. Despite the stochasticity in the hurricane dynamics
and the possibility of injury, the computation of Expected Reward is deterministic
given the actors’ beliefs. Thus, their action choices are also a deterministic func-
tion of their belief states. While it is trivial to replace the strict maximization with a
softmax instead, we deliberately avoided introducing such a random component into
the actors’ decision making in this scenario. The variability in the belief states led to
sufficiently plausible diversity of behaviors, so introducing stochasticity into action
selection would have served only to obfuscate the decision-making process.
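The strict maximization with deterministic tie-breaking can be sketched as follows. This is an illustrative sketch rather than the PsychSim implementation: the function name, the reward values, and the particular ordering are hypothetical, and the ordering is passed in as a list to stand for the fixed ordering generated at runtime.

```python
def choose_action(expected_reward, ordering):
    """Return the action with the highest expected reward.

    Ties are broken by position in `ordering`, a fixed list of all actions,
    so the choice is a deterministic function of the expected-reward table.
    """
    best = max(expected_reward.values())
    for action in ordering:
        if expected_reward[action] == best:
            return action
    raise ValueError("ordering does not contain any maximal action")


# Hypothetical expected-reward table with a tie between two actions.
expected_reward = {"stay in location": 0.4, "evacuate": 0.7, "move to shelter": 0.7}
ordering = ["move to shelter", "evacuate", "stay in location"]

chosen = choose_action(expected_reward, ordering)
```

Because the table and the ordering fully determine the result, repeated calls with the same belief-derived table always return the same action.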
4.3.7 Actor relationships
There are three types of binary relationships that can exist between actors:
Married to represents a marriage relationship between two actors. Pairings are
chosen randomly at the beginning of the simulation, based on a specified percentage
of married vs. single members of the population and a percentage of same-gender
marriages. An actor can be married to at most one other actor, and the relation-
ship holds for the duration of the simulation. Actors who are married must perform
the same action whenever possible. Whichever actor makes a decision first imposes
that decision on the partner (regardless of what that partner’s expected-reward
calculation would have otherwise dictated); each actor’s decision-making function is
invoked in an arbitrarily determined sequence. Thus, married couples act in perfect
unison throughout the course of the simulation. However, partners not making the decision
are still able to answer questions about how (dis)satisfied they were with the action
imposed on them. To do so, actors can compare their expected reward of the action
imposed on them against their expected reward of the alternative action they would
have chosen otherwise.
Friend of represents a friendship between two actors. Pairings are chosen randomly
at the beginning of the simulation, and the pairing holds for the duration
of the simulation. Fixed parameters specify a minimum and maximum number of
friends, and each actor has a number drawn from a uniform distribution over that
range. Friends do not influence each other’s reward function, but they do influence
each other’s beliefs. Every day, actors send a message containing their beliefs about
the hurricane’s category to all of their friends. Each actor then updates its beliefs by
computing a weighted sum over the probability distributions in these messages and
its own beliefs. The weighted sum is governed by three parameters fixed for a
given simulation instance: trust in self, a weight for the actor’s own beliefs; trust
in optimists, a weight for messages that are more optimistic (i.e., a lower expectation
for category) than the actor’s own beliefs; and trust in pessimists, a weight
for messages that are more pessimistic (i.e., a higher expectation for category). The
weighted sum of these distributions becomes the actor’s new belief over category.
Decreasing trust in self makes the population more susceptible to social influence.
Increasing trust in pessimists leads to amplification of risk, as the influence process
will lead to higher perceptions of risk. Increasing trust in optimists has the opposite
effect, leading to deamplification of risk.
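The friend-based belief update can be sketched as a trust-weighted mixture of category distributions. Several details here are assumptions made for illustration and are not stated in the text: the mixture is normalized by the total weight so that it remains a probability distribution, messages whose expectation equals the actor's own are weighted by trust in self, and the function and parameter names are hypothetical.

```python
def expected_category(belief):
    """Expected hurricane category under a distribution {category: probability}."""
    return sum(category * p for category, p in belief.items())


def social_update(own, messages, trust_self, trust_optimists, trust_pessimists):
    """Mix an actor's own belief with its friends' messages.

    Assumptions: weights are normalized by their total, and a message with the
    same expectation as the actor's own belief is weighted by trust_self.
    """
    own_mean = expected_category(own)
    weighted = [(trust_self, own)]
    for message in messages:
        mean = expected_category(message)
        if mean < own_mean:
            weighted.append((trust_optimists, message))   # more optimistic friend
        elif mean > own_mean:
            weighted.append((trust_pessimists, message))  # more pessimistic friend
        else:
            weighted.append((trust_self, message))
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return dict(own)
    categories = sorted({c for _, belief in weighted for c in belief})
    return {c: sum(w * b.get(c, 0.0) for w, b in weighted) / total for c in categories}


own_belief = {1: 0.5, 2: 0.5}       # actor expects category 1.5
pessimistic_friend = {3: 1.0}       # friend is certain of category 3
new_belief = social_update(own_belief, [pessimistic_friend],
                           trust_self=0.5, trust_optimists=0.1, trust_pessimists=0.4)
```

With a nonzero trust in pessimists, the actor's expected category moves upward from 1.5 toward the friend's message, which is the amplification effect described above.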
Neighbor of is an implicit relationship between actors who share the same region
of residence. Because the actors’ region of residence is constant throughout the sim-
ulation, the neighbor of relationships are similarly constant. Actors do not explic-
itly reason about their neighbors, nor even form beliefs about them as individuals.
However, the reward associated with the risk in their region of residence (priority
of neighbors) causes actors to be indirectly incentivized to help their neighbors, as
decreasing the region’s risk also decreases the risk of any neighbors currently stay-
ing home.
4.4 Groups
Each region has an organized group which all residents are eligible to join. Actors
are free to join and leave the group in their region of residence as often as they
prefer, but they cannot join groups in other regions. This possible membership in
a group is captured in each actor’s member of Group Region X, where Region X
is the actor’s region of residence. This is a Boolean variable that is True when the
actor is a member of the group and False otherwise. The initial group members are
selected randomly from the residents according to a fixed probability (which can
change across simulation instances).
Each group essentially acts as a monolithic agent whose beliefs and reward function
are an aggregation of those of its current members. Each group considers performing
a joint prosocial action (a group version of decrease risk) or else leaving its
members to choose their own individual action. The group makes this decision by
evaluating the expected reward of each option just as actors do, with the only differ-
ence being the group’s particular beliefs, reward function, and horizon. Section 4.4.1
describes how groups arrive at their aggregated belief state. The group’s reward
function simply computes the sum of the rewards received by its individual mem-
bers. Finally, all groups within a simulation instance share the same fixed horizon.
4.4.1 Group beliefs
Because a group’s beliefs are derived from the beliefs of its individual members, the
uncertainty in its beliefs is confined to the same uncertain variables from the actors’
beliefs: category, any relevant regions’ risk and shelter risk, and the personal risk
of its members. The group adopts the same beliefs about its members’ risk levels as
the individual members themselves, as actors do not have beliefs about each others’
risk.
However, the group must form a coherent belief over the hurricane and regional
variables out of the possibly divergent beliefs (i.e., probability distributions) held
by its members. To do so, it extracts one of three distributions from the aggregation
of its members’ beliefs: a mean that averages the members’ beliefs, a max distribu-
tion that is the belief with the highest expected value out of all the members’, and a
min distribution with the lowest expected value. The selection used is specified by
a group aggregation attribute on each group, with the value chosen randomly from
a fixed distribution, and with the value staying constant throughout the simulation.
Groups whose group aggregation is mean will form beliefs that are in the “middle”
of their individual members’ beliefs (although the exact beliefs formed may not be
shared by any members). Groups whose group aggregation is max (min) will form
beliefs that mirror the highest (lowest) perception of hurricane severity across their
members.
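The three aggregation modes can be sketched over discrete distributions as follows. This is a minimal sketch under stated assumptions: beliefs are represented as dictionaries from category to probability, the mean is taken pointwise over the union of the members' supports, and the function names are hypothetical rather than taken from the simulation code.

```python
def expected_value(belief):
    """Expected value of a distribution {value: probability}."""
    return sum(value * p for value, p in belief.items())


def aggregate_beliefs(member_beliefs, mode):
    """Combine members' distributions according to the group aggregation mode."""
    if not member_beliefs:
        raise ValueError("a group needs at least one member belief")
    if mode == "mean":
        support = sorted(set().union(*member_beliefs))
        n = len(member_beliefs)
        return {v: sum(b.get(v, 0.0) for b in member_beliefs) / n for v in support}
    if mode == "max":
        return dict(max(member_beliefs, key=expected_value))
    if mode == "min":
        return dict(min(member_beliefs, key=expected_value))
    raise ValueError(f"unknown aggregation mode: {mode}")


# Two members with opposite, fully certain beliefs about category.
members = [{1: 1.0}, {3: 1.0}]
mean_belief = aggregate_beliefs(members, "mean")
max_belief = aggregate_beliefs(members, "max")
min_belief = aggregate_beliefs(members, "min")
```

In this example the mean belief splits its probability evenly between categories 1 and 3, a distribution that neither member holds, whereas the max and min modes each adopt one member's belief outright.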
4.4.2 Group actions
Each day, the groups first decide whether or not to perform a joint decrease risk
action. If a group decides not to, all of the actors in that region choose their action
out of those allowable from Sect. 4.3.2. If a group does decide to perform the joint
action, then actors who already belong to the group have a reduced set of options, as
they can either participate in the joint action or else leave the group (leave Group
Region X ). Leaving the group changes the value of member of Group Region X to
False, but is otherwise identical in effect to the action stay in location.
Actors who do not belong to the group consider their full set of available action
choices, including participating in the joint action, if one was chosen for that day
by their relevant group. Deciding to participate in the joint action is labeled as join
Group Region X and changes the value of member of Group Region X to True.
Joint execution of the decrease risk action has two benefits. First, actors are
exposed to less personal risk when acting jointly instead of individually. In par-
ticular, the increase to personal risk when performing decrease risk is reduced
by a fixed percentage when performing it jointly. Second, the resulting decrease
in the region’s risk is magnified by the same fixed percentage. Thus, acting jointly
increases the benefit to the region as a whole, while also reducing the risk to the
individuals. On the other hand, this benefit occurs regardless of the size of the group,
so there is room for free riders, who still reap the benefit when the group makes their
region safer, while also avoiding even the reduced increase in personal risk.
Just as the actors do, a group performs a hypothetical simulation to compute its
Expected Reward under both options, subject to the misconceptions possibly con-
tained within its aggregated belief state. The group has the benefit of access to all of
its individual members’ beliefs, so it can generate a more accurate expectation of the
cumulative effect during the actors’ turn. Just as the actors’ hypothetical simulation
deterministically generates their Expected Reward table, so does the group’s. Thus,
there is no stochasticity in a group’s decision as a function of its belief state, though
again, the group’s belief state dynamics will exhibit stochasticity due to the stochas-
ticity of its individual members’ beliefs.
Groups that have fewer altruistic members are less likely to engage in joint
prosocial behavior, as their aggregated reward functions will reflect this lesser degree
of altruism. The interaction with the group’s aggregated belief about the risk is
less straightforward. While a higher level of perceived risk would cause more self-
ish actors to avoid danger, it would also offer more room for improvement by the
decrease risk action (which decreases the region’s risk by a percentage). In such
cases, the interaction between the benefit to the region and the health outcomes for
individual members (rather than simply the risk levels) comes into play.
4.5 System
The system level reflects government response to the hurricane. We describe here
the simulation’s default government policy, but this policy could be overridden in
the Prescribe Challenge (described in Sect. 5.3). By default, the government allo-
cates aid to a single region each day. The aid reduces the risk in the chosen region
toward its initial level by a fixed percentage. This percentage is fixed for a given
simulation instance, unless modified by an external prescription (e.g., a tax policy
that gives the government more resources to allocate).
The government chooses which region receives aid by examining the risk of each
region. In the simplest government policy, it simply chooses the region with the
highest risk value. An alternative policy has the government choose the region for
which the product of risk and the number of residents is highest. The most complex
alternative policy replaces the number of residents with a weighted count, account-
ing for the government being susceptible to bias along ethnic and religious lines:
Ethnic bias ∈ [−1, 1] represents how much more weight government gives to
residents of the ethnic majority.
Religious bias ∈ [−1, 1] represents how much more weight government gives to
residents of the religious majority.
Positive bias numbers incentivize the government to favor regions that have more
residents of the ethnic and religious majorities, while negative numbers have the
opposite effect.
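The three aid-allocation policies can be sketched as alternative scoring functions over regions. The text does not give the exact form of the weighted count, so the weighting below is an assumption: a resident's weight is multiplied by (1 + ethnic bias) if in the ethnic majority and by (1 + religious bias) if in the religious majority. The policy names, region names, and data layout are likewise hypothetical.

```python
def resident_weight(resident, ethnic_bias, religious_bias):
    """Assumed weighting: majority membership scales a resident's weight by 1 + bias."""
    weight = 1.0
    if resident["ethnic_majority"]:
        weight *= 1.0 + ethnic_bias
    if resident["religious_majority"]:
        weight *= 1.0 + religious_bias
    return weight


def choose_region(regions, policy, ethnic_bias=0.0, religious_bias=0.0):
    """Return the name of the region that receives aid under the given policy."""
    def score(name):
        risk = regions[name]["risk"]
        residents = regions[name]["residents"]
        if policy == "highest_risk":
            return risk
        if policy == "risk_times_population":
            return risk * len(residents)
        if policy == "risk_times_weighted_population":
            return risk * sum(resident_weight(r, ethnic_bias, religious_bias)
                              for r in residents)
        raise ValueError(f"unknown policy: {policy}")
    # Sorting the names makes ties resolve to the alphabetically first region.
    return max(sorted(regions), key=score)


minority = {"ethnic_majority": False, "religious_majority": False}
majority = {"ethnic_majority": True, "religious_majority": True}
regions = {
    "Region01": {"risk": 0.8, "residents": [minority]},
    "Region02": {"risk": 0.5, "residents": [majority, majority, majority]},
}
```

In this toy example, the risk-only policy selects Region01, the population-weighted policy selects Region02, and a strongly negative ethnic bias moves the aid back to Region01.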
The actors’ grievance captures their dissatisfaction with the government’s
response. Residents of regions that do not receive aid will have their grievance
increase toward 1 by a fixed percentage, while residents of the region receiving aid
will have their grievance decrease by that same percentage. This is a very narrow
model of dissatisfaction, as actors do not consider any other variables, not even the
degree to which their region even needs aid (i.e., its risk).
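One plausible reading of this grievance dynamic is sketched below. It assumes that "increase toward 1 by a fixed percentage" closes that fraction of the remaining gap to 1, and that the decrease shrinks grievance toward 0 by the same fraction; the text does not spell out either form, and the function name and rate are hypothetical.

```python
def update_grievance(grievance, received_aid, rate):
    """Move grievance toward 0 if the actor's region received aid, else toward 1.

    Assumption: `rate` is the fraction of the distance to the target (0 or 1)
    that is covered in one day.
    """
    if received_aid:
        return grievance * (1.0 - rate)
    return grievance + rate * (1.0 - grievance)


without_aid = update_grievance(0.5, received_aid=False, rate=0.2)
with_aid = update_grievance(0.5, received_aid=True, rate=0.2)
```

Note that the update depends only on whether aid arrived, not on the region's risk, which mirrors the narrowness of the model described above.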
System-level dynamics emerge from an election process that we did not exercise
in any of the challenges, but which allows the biases of the government to change
periodically. More specifically, after an election, the ethnic (religious) bias at the
system level increments by the difference between the total of the grievance values
over all members of the ethnic (religious) majority and the total over all members
of the minority, divided by the total number of voters. If all actors have the same
level of grievance, then the biases toward the majority should increase (assuming
the majority to be more populous). On the other hand, this function allows the biases
to shift toward the minority should their grievance values sufficiently exceed those
of the majority.
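The post-election bias increment can be sketched directly from this description. The clamping of the result to [−1, 1] is an assumption added to keep the bias within its stated range, and the function and argument names are hypothetical.

```python
def election_bias_update(bias, grievances, in_majority):
    """Increment a system-level bias after an election.

    The increment is (total majority grievance - total minority grievance)
    divided by the number of voters. Clamping to [-1, 1] is an assumption.
    """
    total_majority = sum(g for g, m in zip(grievances, in_majority) if m)
    total_minority = sum(g for g, m in zip(grievances, in_majority) if not m)
    new_bias = bias + (total_majority - total_minority) / len(grievances)
    return max(-1.0, min(1.0, new_bias))


membership = [True, True, True, False]  # three majority voters, one minority voter

# Equal grievance everywhere: the more populous majority pulls the bias upward.
equal_case = election_bias_update(0.0, [0.5, 0.5, 0.5, 0.5], membership)

# A sufficiently aggrieved minority shifts the bias the other way.
minority_case = election_bias_update(0.0, [0.1, 0.1, 0.1, 0.9], membership)
```

With equal grievance of 0.5, the increment is (1.5 − 0.5) / 4 = 0.25; with the minority voter at 0.9 and the others at 0.1, it is (0.3 − 0.9) / 4 = −0.15.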
This section presents the specific challenges we posed to the Ground Truth pro-
gram’s designated research teams, who were tasked with conducting social science
on our simulated world.
5.1 Explain
The following scenario description was provided to the research teams as back-
ground material on the simulation in all three challenges:
You are a public policy consultant, assisting local governments in hurricane-
ravaged areas. Although each of these communities has its own particularities,
they all share a common problem: despite the government’s best efforts to pro-
vide shelter and aid to its residents, each hurricane season brings more death,
destruction, and dissatisfaction. Officials are mystified by why their diverse
constituents respond to each hurricane as they do, making it hard to predict
their behavior and choose the most effective intervention. These governments
seek your advice on what policies to implement to minimize the negative
effects of these hurricanes.
You have many tools at your disposal to augment the available government
data with information more targeted to your analysis. Surveys are the most
rudimentary instrument, and also the least expensive to implement. Of course,
self-reported perceptions, motivations, etc. are notoriously fickle in their accu-
racy, so you also have a team of observers ready to give you first-hand reports
of conditions and behavior on the ground. But the main advantage you have
over your competitors is the HoloCane®, your proprietary hurricane simulator.
The HoloCane® places subjects within a hyper-compressed hurricane time-
line of the experimenter’s choosing, and, despite being completely artificial
and almost completely safe, it still engenders responses that reflect its subjects’
behavior during the real thing. Armed with these tools, you approach each new
area with full confidence that you can fulfill the hopes of its government and
residents.
5.2 Predict challenge
The predict challenge provided data from an initial sequence of hurricanes and
asked for predictions on certain outcomes over a subsequent number of hurricanes.
We provided two different simulation instances: one used for a short-term challenge,
and one used for a long-term one.
5.2.1 Short‑term challenge
In this challenge, data for the first N hurricanes of a given season were provided,
along with the complete trajectory (time series of category, phase, location) for
the (N + 1)st one. A specific target actor was also identified to be the subject
of queries at the individual level. The goal of this challenge is to predict outcomes
of a future hurricane, but with the hurricane’s inherent stochasticity taken
out of the equation. The following questions were posed with respect to the (N + 1)st
hurricane:
1. Global prediction: How many people will evacuate at least once during the new
hurricane?
2. Local prediction: Which region will have the highest percentage of evacuations?
3. Individual prediction: Will the target actor evacuate during the new hurricane?
4. Counterfactual prediction: How would your answers to questions 1 and 3
change if all of the area’s shelters became unusable at the end of hurricane N and
remained unusable throughout hurricane N + 1?
The correct predictions are generated by Monte Carlo simulation. With the hurri-
cane trajectory fixed, the only remaining uncertainty is the injuries incurred by the
actors, which are conditional on their risk levels. Their evacuation decisions are
deterministic given their belief states (as described in Sect. 4.3.6), but the random-
ness of injuries means that there is stochasticity in those belief state trajectories that
makes the correct predictions an expectation over possible outcomes, even at the
individual level of the target actor.
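The idea that a deterministic decision rule still yields a probabilistic prediction once injuries are random can be illustrated with a toy Monte Carlo estimate. Everything in this sketch is an assumption for illustration: the fixed daily risk values, the rule that the target actor evacuates as soon as it is injured, and the function names. The actual simulation runs the full agent model on each sample.

```python
import random


def target_evacuates(daily_risk, rng):
    """One simulated hurricane with a fixed trajectory.

    Toy rule (assumption): injury occurs each day with probability equal to
    that day's risk, and the actor deterministically evacuates once injured.
    """
    for risk in daily_risk:
        injured = rng.random() < risk
        if injured:
            return True
    return False


def monte_carlo_estimate(daily_risk, runs=20000, seed=0):
    """Estimate the probability of evacuation by averaging over sampled runs."""
    rng = random.Random(seed)
    hits = sum(target_evacuates(daily_risk, rng) for _ in range(runs))
    return hits / runs


daily_risk = [0.1, 0.2, 0.3]
estimate = monte_carlo_estimate(daily_risk)
```

For this toy rule the exact answer is 1 − 0.9 × 0.8 × 0.7 = 0.496, so the sampled estimate should fall close to that value; the correct prediction for the target actor is this expectation rather than a single yes or no.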
5.2.2 Long‑term challenge
In this challenge, data for an entire hurricane season (4–6 months) were provided,
and we posed questions regarding the subsequent hurricane season, starting exactly
one year from the start of the first. A specific target actor was also singled out for
individual-level queries. Unlike the short-term challenge, no information was given
regarding the hurricanes in the second season. We instead asked for predictions
over a longer time period.
As in the short-term challenge, the correct predictions are generated by Monte Carlo
simulation. The taxation affects the actors’ resources, but does not affect any other
variables directly. One could easily imagine such an action affecting their grievance
level, but we omit such a dependency as there would be no evidence of such an
effect in any of the data provided prior to this challenge.
5.3 Prescribe challenge
We divided our prescribe challenge into short- and long-term challenges, with the
same data provided as in the predict challenge described in Sect. 5.2.
1. Constrained prescription: To which region should the government direct its aid
on each day?
2. Unconstrained prescription: What should the government do on each day, without
changing its current policy of aid allocation?
3. Combined prescription: What should the government do on each day, while
being free to change its current policy of aid allocation? In other words, a combination
of items 1 and 2.
We again used Monte Carlo simulation to generate the outcomes of alternate pre-
scriptions. For a “reasonable” prescription baseline, we evaluated the default gov-
ernment policy of allocating aid to the region with the highest risk, weighted by the
residents and the system-level biases. For a null baseline, we generated outcomes
under a condition where the government did not allocate any aid at all.
The submitted Unconstrained Prescriptions included evacuation incentives.
The long-term challenge also asked for prescriptions that would minimize casualties.
In addition to a day-to-day policy for the government to follow, we also asked
for actions that the government could take between the first season (whose data is
provided) and the second season (over which the prescriptions will be evaluated).
1. Offseason prescription What should the government do before the next hurricane
season?
2. In-season prescription What should the government do during the next hurricane
season?
3. Combined prescription What should the government do both before and during
the next hurricane season?
As in the short-term challenge, we used null and default aid policies as the baseline.
Submitted prescriptions included taxation (as illustrated in our long-term counter-
factual prediction question). Other prescriptions
6 Accessibility
During each challenge, a package of data was provided that contained the results
of applying a variety of instruments to the simulated population. These instruments
were designed to be representative of the types of instruments that could be applied
to a real population under similar circumstances.
6.1.1 Census
6.1.2 Environmental data
The initial data package includes accurate records of the hurricanes in the form of a
daily log of each one, with each entry recording:
6.1.3 Casualty statistics
A time series of various casualty numbers was also provided, with values for the fol-
lowing statistics provided for each day of the simulation run:
6.1.4 Surveys
Surveys are the simplest instrument for accessing the subjective perspectives
of our actors. We included data from two survey instruments in our initial package:
one conducted during the approach of each hurricane, and one conducted during the
aftermath of each hurricane. In addition to providing data for explanation, prediction,
and prescription, these surveys were designed to exemplify the types of
questions our agents were capable of answering in follow-up research requests.
Pre-hurricane survey For each hurricane, over the days between its phase
changing to approaching and the subsequent change to active, we sampled 10%
of the population. To maximize the coverage of the survey, actors who previously
answered the survey were removed from the pool of available responders; once all
actors had answered the survey, we reset the pool to be all actors. A more
realistic model would have some percentage of actors refuse to ever answer the survey,
but we prioritized covering all actors instead of realism in this case.
We designed the questions to gather basic information about the actors’ current
circumstances, as well as to probe their thought processes:
The Category item provides data on actors’ subjective perceptions of the hurricane
severity and demonstrates that they do indeed differ from reality (as reported in
Sect. 6.1.2’s data). The actors respond to this item by simply computing an expecta-
tion over their belief about the hurricane’s category. They then map this expectation
into a 7-point scale.
The Risk item similarly illustrates the actors’ subjective perceptions, but it also demonstrates
their ability to project into the future. To respond to this item, the actors
do a hypothetical simulation of the hurricane, but using their beliefs as the start-
ing point, instead of the true state. They run this simulation in their “heads” until
the hurricane passes through the area, following the most likely trajectory (accord-
ing to their perceptions of hurricane dynamics, which are accurate in these simu-
lations). This simulation also includes the most likely observation the actor will
receive and the updated belief state resulting from that observation. However, this
hypothetical simulation does not consider the actions (whether good or bad) of other
actors, because other actors are not represented within the actors’ current beliefs. So
these expectations are almost guaranteed to diverge from reality. Regardless, actors
respond to this item by computing an expectation of their beliefs about risk on every
day and then mapping the maximum value over those days to the desired 7-point
Likert scale.
The items anticipated shelter and anticipated evacuation also illustrate how
actors can project into the future, not just to generate expectations of exogenous
events (i.e., the hurricane), but also of their own behavior. The actors are able to reason
about and explicitly consider alternate actions (i.e., their behavior is not
generated by stimulus-response rules). On each day of this hypothetical simulation,
the actors generate expectations about what they themselves will do, computing
their expected reward over their possible action choices. They then use a softmax to
convert these expected rewards into a probability distribution over action choices.
Their responses to anticipated shelter and anticipated evacuation are the maxi-
mum probability they foresee for the corresponding action choice (move to shelter
or evacuate, respectively) over all of the days that the hurricane is approaching or
active. These items thus demonstrate the actors’ ability to reason about the future
and form communicable expectations about their own behavior.
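The computation behind the anticipated shelter and anticipated evacuation responses can be sketched as a softmax over each projected day's expected rewards, followed by a maximum over days. This is a minimal sketch: the temperature parameter, the reward values, and the function names are assumptions, and the projected expected-reward tables are supplied directly rather than produced by a hypothetical simulation.

```python
import math


def softmax(expected_reward, temperature=1.0):
    """Convert expected rewards into a probability distribution over actions."""
    top = max(expected_reward.values())  # subtract the max for numerical stability
    weights = {a: math.exp((v - top) / temperature) for a, v in expected_reward.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def anticipated_probability(daily_expected_rewards, action, temperature=1.0):
    """Maximum probability of `action` over all projected days."""
    return max(softmax(day, temperature)[action] for day in daily_expected_rewards)


# Hypothetical expected-reward tables for two projected days.
projection = [
    {"stay in location": 1.0, "evacuate": 0.0},  # staying clearly preferred
    {"stay in location": 0.0, "evacuate": 0.0},  # indifferent between the two
]
anticipated_evacuation = anticipated_probability(projection, "evacuate")
```

On the second projected day the actor is indifferent, so the softmax assigns evacuation a probability of 0.5, which becomes the survey response because it exceeds the first day's value.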
Post-hurricane survey After each hurricane, we sampled 10% of the population
(again covering the entire population over time, as in the pre-hurricane survey)
to answer a survey after the phase returned to none, but before the next hurricane
began approaching. We designed the questions to gather basic information about the
actors’ behavior during the just-ended hurricane, as well as to probe their decision-
making processes behind that behavior:
The “possibility” questions provide insight into the actors’ decision-making process,
in that the responses indicate how close they were to choosing an alternate behavior.
The aim of these questions is to highlight that the actors explicitly consider alternate
courses of action, hopefully eliminating the possibility that their behaviors are gov-
erned by a rule-based system.
6.2 Research requests
Although informative, these initial data packages (IDPs) were deliberately insufficient for
answering the challenges, as a key goal of the exercise was to understand what different data were required
by different methods for solving them. We instead provided an accessibility interface
that could support research requests into the simulated world in the same way that
social scientists operationalize their instruments in the real world.
The most-used method of this interface was the one for extracting a time series of
a single variable from the simulation history and then aggregating that series in some
way. For example, the at shelter question from the IDP post-hurricane survey simply
accumulated the values of the actor’s location variable for the interval of the previ-
ous hurricane and returned “yes” if and only if a shelter appeared in those values. We
made a strong assumption that actors had perfect recall of their experiences, behavior,
and even their belief states, no matter how far in the past. While this assumption does
not match human memory, it makes the data more useful in addressing the challenges
presented here.
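The extract-then-aggregate pattern behind the at shelter item can be sketched as follows. The data layout (a per-day list of location strings), the location labels, and the function name are assumptions for illustration and do not reflect the actual accessibility interface.

```python
def was_at_shelter(location_history, start_day, end_day):
    """Answer "yes" if any recorded location in [start_day, end_day] is a shelter.

    `location_history` is assumed to be a list indexed by day, with shelter
    locations labeled by strings that start with "shelter".
    """
    window = location_history[start_day:end_day + 1]
    return "yes" if any(loc.startswith("shelter") for loc in window) else "no"


# Hypothetical location time series for one actor over six days.
history = ["Region01", "Region01", "shelter03", "shelter03", "Region01", "Region01"]

during_hurricane = was_at_shelter(history, start_day=1, end_day=4)
after_hurricane = was_at_shelter(history, start_day=4, end_day=5)
```

Because the response is computed from the recorded history itself, it embodies the perfect-recall assumption: the answer is the same regardless of how long ago the hurricane occurred.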
Across the research requests received, the vast majority were surveys, with most of
the remaining requests being “event journals” or briefs by experts/observers. All such
requests started from the same method for extracting a history of values for a particular
variable. If the variable in question was one for which an actor had some uncertainty,
the time series would be over the actor’s belief states for that variable, not the
actual values.
Some research requests did not fit into this common mold. For example, we received
several requests asking actors whether they were willing to provide aid to different
groups of people (e.g., friends, neighbors, people of the same ethnicity as themselves).
In this case, the question is asking about the relative willingness of the actor to help
these different groups. As described in Sect. 4.3.4, the actors’ priority of neighbors
completely encapsulates their altruism, so their answer to this question weighed the
degree to which this altruism applied to the group in question, and the degree to which
this altruism was able to override their other goals. Thus, actors would be more willing
to give aid to those of the same ethnicity if more of them lived in the same region of
residence. Also, the higher their priority of health, the less willing the actors would be
to give aid to this group.
As another example, a separate request asked actors in the pre-hurricane survey
about the likelihood that the hurricane would hit their region of residence. To answer
this survey item, the actors did a hypothetical simulation of the hurricane using their
beliefs about the hurricane dynamics. This simulation reused the code for the actors’
decision-making, except that it considered only the hurricane’s “turn” and ignored
those of the groups, system, and the actors themselves. Because all of the actors share
correct beliefs about the hurricane dynamics, actors with the same region of residence
responded to this item with the same perceived likelihood.
6.3 HoloCane
7 Conclusion
Funding This study was supported by Defense Sciences Office, DARPA [Grant No. HR00111820004].
References
Boutilier C, Poole D (1996) Computing optimal policies for partially observable decision processes
using compact representations. In: Proceedings of the national conference on artificial intelli-
gence, pp 1168–1175
Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and compu-
tational leverage. J Artif Intell Res 11(1):94
Carley KM, Fridsma DB, Casman E, Yahja A, Altman N, Chen LC, Kaminsky B, Nave D (2006) Bio-
War: scalable agent-based model of bioattacks. IEEE Trans Syst Man Cybern A 36(2):252–265
Collins J, Ersing R, Polen A (2017) Evacuation decision-making during Hurricane Matthew: an
assessment of the effects of social connections. Weather Clim Soc 9(4):769–776
Collins J, Ersing R, Polen A, Saunders M, Senkbeil J (2018) The effects of social connections on
evacuation decision making during Hurricane Irma. Weather Clim Soc 10(3):459–469
Dash N, Gladwin H (2007) Evacuation decision making and behavioral responses: individual and
household. Nat Hazards Rev 8(3):69–77
Demuth JL, Morss RE, Morrow BH, Lazo JK (2012) Creation and communication of hurricane risk
information. Bull Am Meteorol Soc 93(8):1133–1145
Farmer AK, DeYoung SE, Wachtendorf T (2017) Pets and evacuation: an ongoing challenge in disas-
ters. J Homel Secur Emerg Manag. https://doi.org/10.1515/jhsem-2016-0051
Gmytrasiewicz PJ, Durfee EH (1995) A rigorous, operational formalization of recursive modeling. In:
Proceedings of the international conference on multi-agent systems. pp 125–132
Gmytrasiewicz PJ, Doshi P (2005) A framework for sequential planning in multi-agent settings. J
Artif Intell Res 24:49–79
Goodie AS, Doshi P, Young DL (2012) Levels of theory-of-mind reasoning in competitive games. J
Behav Decis Mak 25(1):95–108
Heath SE, Kass PH, Beck AM, Glickman LT (2001) Human and pet-related risk factors for household
evacuation failure during a natural disaster. Am J Epidemiol 153(7):659–665
Hoey J, Little JJ (2007) Value-directed human behavior analysis from video using partially observable
Markov decision processes. IEEE Trans Pattern Anal Mach Intell 29(7):1118–1132
Howard RA (1988) Decision analysis: practice and promise. Manag Sci 34(6):679–695
Howard RA, Matheson JE (eds) (1984/2005a) Influence diagrams. In: The principles and applica-
tions of decision analysis, Vol. II. Strategic Decisions Group, Menlo Park, California, 719–763.
Reprinted, Decision Anal 2, 127–143.
Huang SK, Lindell MK, Prater CS (2016) Who leaves and who stays? A review and statistical meta-
analysis of hurricane evacuation studies. Environ Behav 48(8):991–1029
Hunt MG, Bogue K, Rohrbaugh N (2012) Pet ownership and evacuation prior to Hurricane Irene.
Animals 2(4):529–539
Ito JY, Pynadath DV, Marsella SC (2010) Modeling self-deception within a decision-theoretic frame-
work. J Auton Agents Multiagent Syst 20(1):3–13
JASSS (1998–present) The Journal of Artificial Societies and Social Simulation.
http://jasss.soc.surrey.ac.uk/JASSS.html
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochas-
tic domains. Artif Intell 101:99–134
Kim JM, Hill Jr RW, Durlach PJ, Lane HC, Forbell E, Core M, Marsella S, Pynadath D, Hart J (2009)
BiLAT: a game-based environment for practicing negotiation in a cultural context. Int J Artif
Intell Educ 19(3):289–308
Kjaerulff U (1992) A computational scheme for reasoning in dynamic probabilistic networks. In: Pro-
ceedings of the eighth international conference on uncertainty in artificial intelligence. Morgan
Kaufmann Publishers Inc., Milan, pp 121–129
Koller D, Milch B (2003) Multi-agent influence diagrams for representing and solving games. Games
Econ Behav 45(1):181–221
Lazo JK, Bostrom A, Morss RE, Demuth JL, Lazrus H (2015) Factors affecting hurricane evacuation
intentions. Risk Anal 35(10):1837–1857
Lindell MK, Perry RW (2012) The protective action decision model: theoretical modifications and
additional evidence. Risk Anal 32(4):616–632
Lindell MK, Lu JC, Prater CS (2005) Household decision making and evacuation in response to Hur-
ricane Lili. Nat Hazards Rev 6(4):171–179
Luke S, Cioffi-Revilla C, Panait L, Sullivan K, Balan G (2005) MASON: a multiagent simulation envi-
ronment. Simulation 81(7):517–527
MABS (1998–present) Proceedings of the international workshop on multi-agent-based simulation.
http://www.pcs.usp.br/~mabs/
Marsella SC, Pynadath DV, Read SJ (2004) PsychSim: agent-based modeling of social interactions and
influence. In: Proceedings of the international conference on cognitive modeling. pp 243–248
McAlinden R, Pynadath D, Hill RW Jr (2014) UrbanSim: using social simulation to train for stability
operations. In: Ehlschlaeger C (ed) Understanding megacities with the reconnaissance, surveillance,
and intelligence paradigm, chap 10. pp 90–99
NOAA (2020) U.S. billion-dollar weather and climate disasters. https://www.ncdc.noaa.gov/billions/.
Accessed 23 Sept 2020
Paruchuri P, Chakraborty N, Gordon G, Sycara K, Brett J, Adair W (2013) Inter-cultural opponent behav-
ior modeling in a POMDP based automated negotiating agent. In: Models for intercultural collabo-
ration and negotiation. Springer, pp 165–182
Polich K, Gmytrasiewicz P (2007) Interactive dynamic influence diagrams. In: Proceedings of the Inter-
national joint conference on autonomous agents and multiagent systems. ACM, p 34
Pynadath DV, Marsella SC (2005) PsychSim: modeling theory of mind with decision-theoretic agents. In:
Proceedings of the international joint conference on artificial intelligence. pp 1181–1186
Pynadath DV, Marsella SC (2007) Minimal mental models. In: Proceedings of the conference on artificial
intelligence. pp 1038–1046
Pynadath DV, Rosoff H, John RS (2016) Semi-automated construction of decision-theoretic models of
human behavior. In: Proceedings of the international conference on autonomous agents and multia-
gent systems
Ross S, Pineau J, Paquet S, Chaib-Draa B (2008) Online planning algorithms for POMDPs. J Artif Intell
Res 32:663–704
Schott T, Landsea C, Hafele G, Lorens J, Taylor A, Thurm H, Ward B, Willis M, Zaleski W (2019)
Saffir–Simpson hurricane wind scale. https://www.nhc.noaa.gov/pdf/sshws.pdf, published by the
NOAA. Accessed 23 Sept 2020
Si M, Marsella SC, Pynadath DV (2010) Modeling appraisal in theory of mind reasoning. J Auton Agents
MultiAgent Syst 20(1):14–31
Sun R (2006) Cognition and multi-agent interaction: from cognitive modeling to social simulation. Cam-
bridge University Press, Cambridge
Tatman JA, Shachter RD (1990) Dynamic programming and influence diagrams. IEEE Trans Syst Man
Cybern 20(2):365–379
Wang N, Pynadath DV, Marsella SC (2015) Subjective perceptions in wartime negotiation. IEEE Trans
Affect Comput 6(2):118–126
David V. Pynadath is the Director for Social Simulation Research at the USC Institute for Creative Tech-
nologies and a Research Assistant Professor in the USC Computer Science Department. He has published
papers on social simulation, multiagent systems, teamwork, plan recognition, and adjustable autonomy.
He is the co-creator and maintainer of PsychSim, a multiagent social simulation framework that has been
used in interactive simulations for teaching urban stabilization operations, cross-cultural negotiation, and
avoiding risky behavior. Dr. Pynadath’s work on PsychSim is a key component of his long-term research
into applying decision-theoretic multiagent methods to models of behavior. He has developed multia-
gent systems for applications in social simulation, virtual training environments, human-robot interac-
tion, automated personal assistants, and UAV coordination. He has used such systems to create models
of human decision-making in scenarios including ethnic conflict, traffic, classroom violence, negotiation,
and disaster response.
Bistra Dilkina is an Associate Professor of Computer Science at the University of Southern California,
co-director of the USC Center of AI in Society, and the inaugural Dr. Allen and Charlotte Ginsburg Early
Career Chair at the USC Viterbi School of Engineering. Her research and teaching center around the
integration of machine learning and discrete optimization, with a strong focus on AI applications in com-
putational sustainability and social good. She received her PhD from Cornell University in 2012 and was
a Post-Doctoral Associate at the Institute for Computational Sustainability. Her research has contributed
significant advances to machine-learning-guided combinatorial solving including mathematical program-
ming and planning, as well as decision-focused learning where combinatorial reasoning is integrated
in machine learning pipelines. Her applied research in Computational Sustainability spans using AI for
wildlife conservation planning, using AI to understand the impacts of climate change in terms of energy,
water, habitat and human migration, and using AI to optimize the fortification of lifeline infrastructures
for disaster resilience. She has over 80 publications and has co-organized or served as a chair to numerous
workshops, tutorials, and special tracks at major conferences.
David C. Jeong is an Assistant Professor in the Department of Communication at Santa Clara University.
His research areas include the study of VR, haptics, and gaming within human-computer interaction,
as well as critical approaches to online toxicity within games and social media. At Santa Clara Univer-
sity, he leads the Imaginarium Lab, which specializes in VR/AR/XR development, 3D modeling, data
visualization, digital humanities, and high performance computing. His recent work has been published
in Frontiers in Psychology, IEEE Robotics and Automation Letters, Proceedings of AI and VR (AIVR),
the Proceedings of the Autonomous Agents and MultiAgent Systems (AAMAS), and the Proceedings of
the International Conference on Intelligent Virtual Agents (IVA).
Richard S. John is a Professor of Psychology and Associate Director at the Center for Risk and Economic
Analysis of Threats and Emergencies (CREATE) at the University of Southern California. His research
focuses on normative and descriptive models of human judgment and decision making and methodologi-
cal issues in the application of decision analysis and probabilistic risk analysis (PRA). Richard received
his PhD in quantitative psychology from the University of Southern California in 1984, M.S. in applied
mathematics from the University of Southern California in 1983, and B.S. in applied mathematics
(summa cum laude) from the Georgia Institute of Technology in 1976.
Stacy C. Marsella is a Professor at Northeastern University in the Khoury College of Computer Sciences
with a joint appointment in the Department of Psychology. His research is in the computational modeling
of human cognition, emotion and social behavior, both as a basic research method in the study of human
behavior as well as for use in a range of applications. His work has been applied to the modeling of
human behavior for large scale social simulations, realization of effective human-AI teamwork as well as
design of virtual humans, software entities that look human and can interact with humans in virtual envi-
ronments using verbal and nonverbal behavior. He is the co-creator of the PsychSim multi-agent social
simulation framework.
Chirag Merchant is a software engineer based in Los Angeles, CA, USA. He has developed software
professionally for 20 years. At USC’s Institute for Creative Technologies, he develops prototypes, simula-
tions, game-based training applications, educational games, and research support software. He has led
the development of applications used to train leaders, treat PTSD, prevent sexual harassment and assault,
deliver survivor testimonies, teach children AI, learn foreign languages, and visualize simulations. He
holds a Master’s degree in Computer Science from the University of Southern California with a speciali-
zation in Multimedia and Creative Technologies.
Lynn C. Miller is Professor of Communication and Psychology at USC. She is a pioneer in developing and
testing representative assessments (e.g., for risky behavior) and interventions in virtual environments. She
developed systematic representative design (SRD), a new experimental design with both greater causal
inference capacity and generalizability to everyday life. With Read, she developed social computational
models of the underlying personality dynamics (e.g., goals, plans, resources, and beliefs) that could produce
within-person variability across contexts and that, when aggregated, could produce the Big-5 (between-person)
structure, linking within-person variability to between-person personality trait structures.
Stephen J. Read is Professor of Psychology at USC. He is a social and personality psychologist, and
cognitive scientist, expert in the computational modeling of human social behavior and social reasoning.
Over the last 30 years, he and Miller have worked on theoretical and computational models of motiva-
tion and human personality, as well as social perception. He has created both symbolic models of human
personality, and neural network models of human motivation and personality, and social perceptions. His
research covers human motivation and personality, social perception, and human decision-making. He
has published four edited books and over 100 articles.