Disaster World: Decision-Theoretic Agents For Simulating Population Responses To Hurricanes

Computational and Mathematical Organization Theory (2023) 29:84–117
https://doi.org/10.1007/s10588-022-09359-y
S.I. : GROUND TRUTH: IN SILICO SOCIAL SCIENCE (GTIS3)
David V. Pynadath1 · Bistra Dilkina2 · David C. Jeong2,3 · Richard S. John2 ·
Stacy C. Marsella4 · Chirag Merchant1 · Lynn C. Miller2 · Stephen J. Read2
Published online: 18 May 2022

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature
2022
Abstract
Artificial intelligence (AI) research provides a rich source of modeling languages
capable of generating socially plausible simulations of human behavior, while also
providing a transparent ground truth that can support validation of social-science
methods applied to that simulation. In this work, we leverage two established AI
representations: decision-theoretic planning and recursive modeling. Decision-
theoretic planning (specifically Partially Observable Markov Decision Processes)
provides agents with quantitative models of their corresponding real-world enti-
ties’ subjective (and possibly incorrect) perspectives of ground truth in the form of
probabilistic beliefs and utility functions. Recursive modeling gives an agent a the-
ory of mind, which is necessary when a person’s (again, possibly incorrect) subjec-
tive perspectives are of another person, rather than of just his/her environment. We
used PsychSim, a multiagent social-simulation framework combining these two AI
frameworks, to build a general parameterized model of human behavior during dis-
aster response, grounding the model in social-psychological theories to ensure social
plausibility. We then instantiated that model into alternate ground truths for simu-
lating population response to a series of natural disasters, namely, hurricanes. The
simulations generate data in response to socially plausible instruments (e.g., sur-
veys) that serve as input to the Ground Truth program’s designated research teams
for them to conduct simulated social science. The simulation also provides a graphi-
cal ground truth and a set of outcomes to be used as the gold standard in evaluating
the research teams’ inferences.
Keywords Social simulation · Decision theory · Partially observable Markov
decision processes (POMDPs) · Multiagent-based simulation · Disaster response
* David V. Pynadath
pynadath@usc.edu
Extended author information available on the last page of the article
1 Introduction
Social science modeling approaches are designed to make causal inferences about
social dynamics observed in the real-world, but they have a major drawback:
The accuracy of these approaches and their inferred causal links is impossible to
evaluate without knowing the ground truth underlying the observed phenomena.
The aptly named Ground Truth program sought to assess the accuracy of such
approaches with respect to a known, albeit simulated, underlying model of social
dynamics. The program tasked one set of teams to create computational “test-
beds” with a known ground truth that another set of teams would try to infer from
observable data generated by those testbeds. This process afforded the means
to characterize the limitations of today’s modeling tools in inferring a social-
dynamics ground truth from observable data. This paper presents our computa-
tional approach in building a multiagent social simulation to provide a testbed for
the evaluation of teams’ efforts in inferring its ground truth.
Multiagent social simulation has proven to be a useful tool for answering
the hypothetical questions asked by policy makers (Carley et al. 2006; JASSS
1998-present; Luke et al. 2005; MABS 1998-present; Sun 2006). Such simula-
tions represent people as autonomous agents that reflect individuals’ or groups’
decision-making perspectives and behavior. In most approaches, each agent
chooses actions based on a simple set of rules that captures a hypothesis underly-
ing the simulation. While such rules provide a generative model of human behav-
ior, they are less suited for capturing a “ground truth” to be inferred by others.
Changing the hypotheses underlying the simulated mental processes typically
requires encoding a new set of rules. Furthermore, the rules gain efficiency at
the cost of transparency of the reasoning process they encode. Rules reduce the
entire reasoning process into a set of stimulus-response behaviors, so that reason-
ing process is no longer accessible to researchers who may wish to observe it.
Artificial intelligence (AI) research provides a rich source of alternate mod-
eling languages capable of addressing the requirements of a transparent ground
truth. For example, we can use decision-theoretic models to capture people’s deci-
sion-making processes, in the form of beliefs, choices, preferences, etc. (Goodie
et al. 2012; Hoey and Little 2007; Paruchuri et al. 2013; Pynadath and Marsella
2005; Wang et al. 2015). By representing these relatively persistent characteris-
tics, the agent can make decisions that are aligned with the corresponding real
people in hypothetical situations of interest.
We claim that two established AI representations are particularly appropriate
for the needed simulation framework: decision-theoretic planning and recursive
modeling. Decision-theoretic planning (specifically Partially Observable
Markov Decision Processes, or POMDPs; Kaelbling et al. 1998) provides agents
with quantitative models of their corresponding real-world entities’ subjective
(i.e., possibly incorrect) perspectives of ground truth in the form of probabilistic
beliefs and utility functions. Recursive modeling gives an agent a theory of mind,
which is necessary when a person’s subjective perspectives are of another person,
rather than of just his/her environment. A theory of mind enables people (and
agents) to form beliefs about the mental states of others, generate expectations
about the behaviors of others, and update their beliefs in response to observations
of their actual behavior.
We have leveraged such AI technology to develop a social-simulation framework,
called PsychSim (Pynadath and Marsella 2005), that we have used to build genera-
tive models of diverse social scenarios, spanning training simulations in urban sta-
bilization (McAlinden et al. 2014) and bilateral negotiation (Kim et al. 2009), and a
Department of Homeland Security study of disaster response (Pynadath et al. 2016),
among others. Each PsychSim agent represents an entity (individual, organization,
state, etc.) within a general language for encapsulating a variety of phenomena studied
in the psychological literature, such as appraisal processes in emotion (Si et al.
2010), wishful thinking (Ito et al. 2010), influence factors (Marsella et al. 2004), and
stereotype formation/person perception (Pynadath and Marsella 2007). All of these
existing models have been captured within a general graphical language that can
encode first principles as quantitative links among variables.
In this work, we used PsychSim to build a general parameterized model of human
behavior during disaster response, grounding the model in social-psychological
theories to ensure social plausibility. We then instantiated that model into alternate
ground truths for simulating population response to a series of natural disasters,
namely, hurricanes. The framework provides simulation flexibility by supporting
the reconfiguration of the simulation through a relatively small set of parameters.
Changing these parameters produces an alternate simulation grounded in a per-
turbation of the original ground truth, but one that still leads to socially plausible
behavior.
Section 2 describes the properties of the real-world phenomena that our model
seeks to capture. Section 3 presents PsychSim, the multiagent social-simulation
framework that we used to represent and simulate the agent models. Section 4 pre-
sents those agent models of the hurricane, the urban area it affects, the individual
actors who live there, the groups they potentially form in response to the hurricane,
and the government that encapsulates the system-level response. Section 5 presents
the separate explanation, prediction, and prescription challenges that we posed to
those analysts. Section 6 describes the data that the resulting simulation provides
up front, as well as in response to research requests from external analysts of the
simulated society. Section 7 discusses potential future directions for this work and
concludes.
2 Disaster response in real‑world hurricane scenarios
Hurricanes are natural storms, energized by warm ocean temperatures, with sustained
wind speeds of 74 mph or more. Increasing wind speed (and category classification;
Schott et al. 2019) and concomitant heavy rain, storm surges, and flooding
have led to thousands of fatalities and over $400 billion in estimated property
damage in the last decade alone. Given global warming, coastal communities are
increasingly vulnerable as sea levels rise, compounding surge effects (NOAA 2020).
Understanding how the residents of these communities respond to hurricanes is
critical in mitigating the damage done by these disasters.
The decision of when and whether to evacuate is complicated by the uncertain
future path and intensity of a hurricane. Studies revealed a range of factors influ-
encing the decision in the face of such uncertainty. Huang et al. (2016) performed
a meta-analysis of 49 studies spanning both responses to real hurricane events and
hypothetical hurricane scenarios. The analysis identified a range of factors that con-
sistently had significant effects on evacuation, including official warnings, mobile
home residence, residence in risk areas, observations of environmental factors such
as storm conditions, social cues such as other people’s behavior, and perception of
severe personal impacts. Other studies have reinforced the role of the perception of
risks (Lindell et al. 2005) and social factors, including social connectedness (Collins
et al. 2017, 2018) and pet ownership (Heath et al. 2001; Farmer et al. 2017; Hunt
et al. 2012).
We developed the models for agent decision-making by drawing upon the litera-
ture on how people obtain, interpret, and use information for hurricanes and other
weather hazards (Dash and Gladwin 2007; Lindell and Perry 2012; Demuth et al.
2012), in conjunction with relevant expertise from members of our research team.
Existing regression models of hurricane evacuation (Lazo et al. 2015) are limited
because their predictive accuracy varies considerably across populations. For
instance, even if a model indicates that people prioritize their families, this
factor may be far less predictive in cities where most people live alone.
PsychSim attempts to account for such between-subject variability among agents,
which may afford improved a priori understanding of crisis situations. Most impor-
tantly, it also addresses the internal reasoning processes of goal-driven mechanisms
operating for people within such situations. Section 3 describes PsychSim’s general
decision-making framework, and Section 4 describes the specific agent models built
for this hurricane scenario.
3 PsychSim
PsychSim draws from a variety of decision-theoretic frameworks that are rich
enough to capture real-world decision-making and social-science theories (Pynadath
and Marsella 2005). These quantitative frameworks also support domain-independent
algorithms for generating beliefs and behaviors that capture real-world phenomena.
We focus here on the subset of such frameworks that PsychSim uses to model
hurricane response.
Factored POMDPs (Boutilier and Poole 1996) separate the joint spaces of states,
actions, and observations into finer-grained dimensions (e.g., a state space that is
a combination of variables like “employment”, “location”, “health”). They exploit
the sparse dependencies among these variables (e.g., one individual’s location has
no direct effect on another individual’s employment status) through a graphical
representation of the links among them. Each agent also has a quantitative reward
function, allowing the model to capture tradeoffs among potentially competing goals
(e.g., financial stability vs. personal safety). Such graphical representations have
proven to be a robust and powerful tool for decision analysis, a method for con-
structing a model of the causal relationships underlying an individual’s or group’s
decision-making process (Howard 1988).
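To make the role of the reward function concrete, the following Python sketch scores two hypothetical actions by their expected reward under a probabilistic belief about the hurricane. The weighted-sum form of the reward, the outcome values, and every name below are illustrative assumptions, not PsychSim's actual interface, and the one-step lookahead stands in for full POMDP planning.

```python
from typing import Dict

State = Dict[str, float]  # state variable name -> value in [0, 1]


def reward(state: State, weights: Dict[str, float]) -> float:
    """Weighted sum of state variables (an assumed functional form)."""
    return sum(weight * state[name] for name, weight in weights.items())


def expected_reward(belief: Dict[str, float],
                    outcomes: Dict[str, State],
                    weights: Dict[str, float]) -> float:
    """Average the reward of each possible outcome, weighted by its believed probability."""
    return sum(prob * reward(outcomes[event], weights)
               for event, prob in belief.items())


def best_action(belief: Dict[str, float],
                outcomes_by_action: Dict[str, Dict[str, State]],
                weights: Dict[str, float]) -> str:
    """Pick the action with the highest expected reward."""
    return max(outcomes_by_action,
               key=lambda action: expected_reward(belief, outcomes_by_action[action], weights))


# Hypothetical belief and outcomes, chosen only for illustration.
BELIEF = {"hit": 0.3, "miss": 0.7}
OUTCOMES = {
    "stay": {"hit": {"health": 0.4, "resources": 0.8},
             "miss": {"health": 1.0, "resources": 1.0}},
    "evacuate": {"hit": {"health": 1.0, "resources": 0.6},
                 "miss": {"health": 1.0, "resources": 0.6}},
}

cautious = best_action(BELIEF, OUTCOMES, {"health": 0.8, "resources": 0.2})
frugal = best_action(BELIEF, OUTCOMES, {"health": 0.2, "resources": 0.8})
```

Under these made-up numbers, the agent that weights health heavily prefers to evacuate, while the agent that weights resources heavily prefers to stay, which is the kind of tradeoff the reward function is meant to express.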
While a factored POMDP thus provides an expressive model of an entity’s deci-
sion-making process, we also need to support multiple POMDPs to capture the enti-
ties’ different decision-making perspectives. Furthermore, each of these entities may
have beliefs about not only their environment (e.g., “The hurricane poses a great
risk to me and my family”), but also beliefs about other entities (e.g., “I do not think
the government cares about my ethnic group”). Recursive models allow agents to
have such a theory of mind, by representing their beliefs about the mental states of
other agents in the same form as their own mental states, allowing them to reuse
the same AI algorithms to generate expectations of others’ behavior as they use for
their own actual behavior (Gmytrasiewicz and Durfee 1995). Interactive POMDPs
(Gmytrasiewicz and Doshi 2005) re-use POMDP representations and algorithms to
represent agents’ subjective perceptions of others that can deviate from reality (e.g.,
individual citizens may have different beliefs about their government’s reward func-
tion than what the government’s actual reward function is).
PsychSim is our implementation of recursive, factored POMDPs, with additional
restrictions to aid AI non-experts in creating and understanding the simulation mod-
els and output. To allow such potential authors to be able to encode their knowledge
within PsychSim agents, these encodings take the form of a graphical representation
of probabilistic and utility interdependencies among scenario variables. We start
from the standard factored POMDP’s use of Dynamic Bayesian Networks (Kjaerulff
1992) and influence diagrams (Howard and Matheson 1984) to exploit conditional
independence in modeling the effects of actions (Boutilier et al. 1999). We can thus
express dependencies among our states and actions as links among the nodes of a
dynamic influence diagram (DID) (Tatman and Shachter 1990). PsychSim’s com-
putational realization of such graphs in a multiagent context draws upon existing
decision-theoretic graphical models like MultiAgent Influence Diagrams (MAIDs)
(Koller and Milch 2003) and Interactive Dynamic Influence Diagrams (I-DIDs)
(Polich and Gmytrasiewicz 2007).
4 Agent models
Figure 1 shows the DID visualization of a particular instantiation of our agent mod-
els. Different colored nodes correspond to variables associated with different enti-
ties: red for the hurricane (Nature), dark green for regions of the urban environment
(Regions), two shades of yellow for two different residents of the area (Actors), light
green for groups of actors that emerge to perform joint actions (Groups), gray for
system-wide behaviors that emerge through political processes (System), and blue
for global variables not associated with any entity.
The shapes of the nodes indicate the type of variable: ovals for random variables,
rectangles for actions, and hexagons for utility functions. We can further distinguish
between the values of random variables before actions are taken (to the left of the
columns of rectangular nodes) and those afterward (to the right). Edges between
nodes can thus capture both interdependencies among variables at the same point in
time and the effects of actions on the change in variables over time.
Fig. 1 Dynamic influence diagram visualization of a simulation instance
The simulation uses the latter to update the state of the world once per day, with
each day consisting of updates to the state of individual entities in the following
sequence: (1) Groups, (2) Actors, (3) System, and (4) Nature (Sect. 4.1). Much of
the interaction among these entities is mediated by their effects on the shared environment,
represented by 16 regions (Sect. 4.2). Unless otherwise noted, the variables
take on real values in [0, 1]. The initial values of such variables are drawn from
a normal distribution (whose parameters can vary from simulation to simulation)
and then mapped to the smallest element in {0, 0.2, 0.4, 0.6, 0.8, 1} that is greater
than the value drawn.
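The initialization rule above can be sketched in a few lines of Python. The function names are hypothetical, and two details are assumptions the text does not settle: a draw that lands exactly on a level keeps that level, and draws above 1 are clamped to 1.

```python
import random

LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def snap_up(value: float) -> float:
    """Map a raw draw to the smallest level that is not below it."""
    for level in LEVELS:
        if level >= value:
            return level
    return LEVELS[-1]  # assumption: draws above 1 are clamped to 1


def draw_initial_value(mean: float, stdev: float, rng: random.Random) -> float:
    """Draw from a normal distribution, then discretize to one of the six levels."""
    return snap_up(rng.gauss(mean, stdev))
```

Negative draws fall through to 0 because 0 is the smallest level that is not below them.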
Section 4.1 presents our model of the hurricane dynamics. Section 4.2 presents
our representation of the regional environment and how it is affected by the hurri-
cane. Section 4.3 presents our actor model. Section 4.4 presents how we model the
emergence of group behaviors from the actors within a region. Section 4.5 presents
our model of how system-wide dynamics emerge from the actors across the entire
area.
4.1 Nature
In line with the unpredictable characteristics of hurricanes in the real world, evolu-
tion of hurricanes in our model is governed by a stochastic process that is independ-
ent of the actions of any of the people. The state of the hurricane is defined by four
variables (the terms in bold represent their initial values):
Category ∈{none, 1–5} indicates the severity of the hurricane along the Saffir-
Simpson scale, with 1 being the least and 5 being the most severe (Schott et al.
2019). A value of none indicates that there is no hurricane present.
Phase ∈{none, approaching, active} specifies the hurricane’s current phase:
none if no hurricane is present, approaching if a hurricane is present but has not
yet made landfall, and active if the hurricane is over land.
Location ∈{none, Region1–16} specifies which region contains the center of the
hurricane if one is active, the predicted region of landfall if the hurricane is only
approaching, and none if no hurricane is present.
Days ∈{0, 1, 2, ...} specifies how many days the hurricane has been in its current
phase. It increments by one on each day with no change in phase and resets to 0
on days when the phase changes.
A change of phase from none to approaching can happen only after a mini-
mum number of days, with that minimum fixed within a simulation instance, but
free to vary across instances. Upon reaching that minimum, there is a fixed prob-
ability of the phase changing to approaching, with that value again constant for a
given instance, but varying across instances. The dynamics of the transition from
approaching to active are the same, with the same fixed minimum number of days
and the fixed transition probability.
When phase transitions to approaching, the category of this new hurricane is
drawn from a uniform distribution over 1–5. The predicted landfall location is drawn
from a uniform distribution over the four coastal regions (Region01, Region05,
Region09, and Region13). While the hurricane is approaching, there is a fixed prob-
ability of its category going up or down by 1 as permitted by the 1–5 range. How-
ever, the location stays constant during this phase.
When phase transitions to active, the category does not change. We restrict
the movement of the hurricane to be only east (inland) or north. Each simulation
instance has its own fixed probability distribution over whether the hurricane’s loca-
tion on the next day will be the region directly to the east of its current location, the
region directly to the north, or the same region.
When the hurricane’s location makes a transition to the north or east that would
take it out of the specified regions, the hurricane is declared over. Its category,
phase, and location all reset to none, days resets to 0, and the phase-transition cycle
begins again.
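The hurricane life cycle described in this section can be read as a small stochastic state machine. The Python sketch below is one possible reading, not the PsychSim implementation: all parameter names and default values are made up, regions are assumed to be numbered row by row with Region01 in the north-west corner, "after a minimum number of days" is read as days ≥ minimum, and an upward or downward category change is assumed to be equally likely whenever both are permitted.

```python
import random
from dataclasses import dataclass
from typing import Optional

COASTAL_REGIONS = (1, 5, 9, 13)  # Region01, Region05, Region09, Region13
GRID_WIDTH = 4


@dataclass(frozen=True)
class Hurricane:
    category: Optional[int] = None  # None encodes "none"
    phase: str = "none"             # "none", "approaching", or "active"
    location: Optional[int] = None  # region number 1-16; None encodes "none"
    days: int = 0


@dataclass(frozen=True)
class NatureParams:  # hypothetical per-instance parameters
    min_days: int = 3
    p_phase_change: float = 0.5
    p_category_change: float = 0.2
    p_east: float = 0.5
    p_north: float = 0.3  # the remaining probability mass means "stay"


def step(h: Hurricane, p: NatureParams, rng: random.Random) -> Hurricane:
    """Advance the hurricane by one simulated day."""
    if h.phase in ("none", "approaching"):
        if h.days >= p.min_days and rng.random() < p.p_phase_change:
            if h.phase == "none":  # a new hurricane appears offshore
                return Hurricane(rng.randint(1, 5), "approaching",
                                 rng.choice(COASTAL_REGIONS), 0)
            # Landfall: category and location carry over unchanged.
            return Hurricane(h.category, "active", h.location, 0)
        category = h.category
        if h.phase == "approaching" and rng.random() < p.p_category_change:
            # Up or down by 1, restricted to moves that stay within 1-5.
            category += rng.choice([d for d in (-1, 1) if 1 <= category + d <= 5])
        return Hurricane(category, h.phase, h.location, h.days + 1)
    # Active phase: move east (inland), move north, or stay in place.
    row, col = divmod(h.location - 1, GRID_WIDTH)
    draw = rng.random()
    if draw < p.p_east:
        col += 1
    elif draw < p.p_east + p.p_north:
        row -= 1  # assumption: row 0 is the northern edge of the grid
    if col >= GRID_WIDTH or row < 0:
        return Hurricane()  # the hurricane is over; every variable resets
    return Hurricane(h.category, "active", row * GRID_WIDTH + col + 1, h.days + 1)
```

Calling `step` once per simulated day with a seeded `random.Random` gives a reproducible hurricane trajectory for a given parameter setting.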
4.2 Regions
We divide the area into a 4 × 4 grid of rectangular regions, as represented by the
green rectangles in Fig. 2. The blue rectangles represent the water from which the
hurricanes originate, while the brown rectangles represent land that is outside the
area being simulated. The small circles represent the actors, with their placement
indicating the location of their residences and their color indicating their level of
health (described in Sect. 4.3.1).
The state of each region is captured by two variables:
Risk encompasses all of the ways that the hurricane can make a region unsafe
(e.g., property damage, high winds, flooding). Each region’s risk level
can never drop below its initial value (i.e., its baseline level of safety never
improves). These initial values are drawn from a normal distribution with a
mean and standard deviation specified for each simulation instance. In Fig. 2,
the regions all start with an equally low level of risk, depicted by the green
color of the corresponding rectangles.
Economy represents the economic viability of the region, in terms of the ability
of businesses to stay in operation. As with the risk level, a region’s economic
level can never improve to exceed its initial value. These initial values are also
drawn from a normal distribution with a mean and standard deviation specified
for each simulation instance.
Fig. 2 Population map at the start of a sample simulation instance, showing the risk of regions (green
rectangles) and the health of individual actors (color-coded circles). (Color figure online)
On each day with no hurricane over land (i.e., phase is none or approaching),
each region’s risk decreases toward its initial value by a fixed percentage. When
phase is active, each region’s risk increases toward 1 by a percentage that
increases with category and decreases with the Manhattan distance between that
region and the hurricane’s location.
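A minimal Python sketch of this risk update follows. It reads "by a fixed percentage" as closing a fixed fraction of the remaining gap to the target, which is an assumption. The `impact_fraction` formula is purely hypothetical: the text states only that the percentage rises with category and falls with Manhattan distance. Regions are assumed to be numbered 1 to 16, row by row.

```python
from typing import Optional


def approach(value: float, target: float, fraction: float) -> float:
    """Move value toward target by the given fraction of the remaining gap."""
    return value + fraction * (target - value)


def manhattan_distance(region_a: int, region_b: int, width: int = 4) -> int:
    """Grid distance between two regions numbered 1..16, row by row."""
    row_a, col_a = divmod(region_a - 1, width)
    row_b, col_b = divmod(region_b - 1, width)
    return abs(row_a - row_b) + abs(col_a - col_b)


def impact_fraction(category: int, distance: int, scale: float = 0.5) -> float:
    """Hypothetical form: grows with category, shrinks with distance."""
    return scale * (category / 5) / (1 + distance)


def update_region_risk(risk: float, initial_risk: float, region: int, phase: str,
                       category: Optional[int], hurricane_region: Optional[int],
                       recovery: float = 0.1) -> float:
    """One daily update of a region's risk."""
    if phase == "active":
        distance = manhattan_distance(region, hurricane_region)
        return approach(risk, 1.0, impact_fraction(category, distance))
    # No hurricane over land: recover, but never below the initial value.
    return max(initial_risk, approach(risk, initial_risk, recovery))
```

Because both branches move the value only part of the way toward a target inside [0, 1], the risk stays within [0, 1] as long as it starts there.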
Figure 3 illustrates the state of the simulation in the midst of a hurricane,
depicted by the red icon in the center of the region second from the left in the top
row. The region containing the icon is the hurricane’s location, and the number
in the icon (2) gives the hurricane’s category. Green regions are still those with
low levels of risk, but we can see that the regions that the hurricane has passed
through have higher levels of risk from their orange color.
Figure 4 shows the effects of a different hurricane, this time with a category of
3 and a location in the bottom left region. The hurricane has just made landfall,
so that its phase has just become active. The higher category of the hurricane
has led to a larger impact on the regions’ risk levels than is seen in Fig. 3. There
are areas of red (highest risk), and no regions retain their original dark green
color.
Fig. 3 Population map in the middle of a category 2 hurricane
Fig. 4 Population map in the middle of a category 3 hurricane
The hurricane’s effect on the regions’ economy is based on their risk level. If a
region’s risk exceeds a fixed threshold, then its economy will decrease by a fixed
percentage; otherwise, it will recover toward its initial value by a fixed percentage.
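The threshold rule for the economy can be sketched as follows in Python. The parameter names and default values are hypothetical stand-ins for the per-instance constants, and the loss is read as a proportional reduction of the current value.

```python
def update_region_economy(economy: float, initial_economy: float, risk: float,
                          risk_threshold: float = 0.5, loss: float = 0.1,
                          recovery: float = 0.1) -> float:
    """One daily update of a region's economy, driven by the region's risk."""
    if risk > risk_threshold:
        # Too risky for businesses to operate: shrink by a fixed percentage.
        return economy * (1.0 - loss)
    # Otherwise recover toward, but never beyond, the initial value.
    return min(initial_economy, economy + recovery * (initial_economy - economy))
```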
Regions which have public shelters have four more variables to represent their
current state (a region can have at most one shelter):
Shelter risk has the same meaning as the region’s risk variable, except that it
captures the level of risk specific to the shelter, rather than the region at large.
Shelter pets ∈{True, False} indicates whether the shelter allows pets inside
or not (True means that it does). Shelters do not change their policy during the
course of the simulation.
Shelter capacity is an integer indicating how many households can stay in the
shelter. A shelter’s capacity does not change during the course of the simulation.
Shelter occupancy is an integer indicating how many households are currently
staying in the shelter.
4.3 Actors
Actors are the most complex entities in the simulation by far, as they are the primary
target for the inference challenge.
4.3.1 Actor state
An actor’s state is represented by the following variables:
Location ∈{home, shelter, evacuated} indicates the actor’s current location,
where home means somewhere in its region of residence (randomly assigned at
the beginning of the simulation and static throughout). If there are multiple shel-
ters in the area, shelter refers to the shelter in the region nearest to the actor’s
region of residence. Some simulation instances have cases where there may be
two shelters equally close to the actor’s residence, in which case these two shel-
ters are allowed as separate possible values for location.
Risk captures the current level of personal risk to an actor. This level is primar-
ily driven by the actor’s current location, but it can be increased by the actor’s
behavioral choices as well.
Health represents the health level of the actor. If this value drops below a fixed
threshold (0.01 for all simulations run so far), the actor is considered dead and
is removed from the simulation. Actors’ initial health values are drawn from a
normal distribution whose mean value is fixed based on which age quintile they
fall into. A simulation instance can also specify an amount to increase or decrease
that mean for actors of the minority ethnic group. The standard deviation for the
distribution is the same for all actors. An actor’s health can never improve over
its initial value.
Children health represents the health level of an actor’s children (for those actors
who have children). There is only a single value, regardless of how many children
the actor has. The initial value is the same as that of the actor’s health. However,
unlike actors, children can still recover even if children health drops to 0.
Pet ∈{True, False} indicates whether the actor’s pet (for those actors who
have one) is alive. Once the pet dies, this value stays False, with no opportunity
for the actor to acquire a new pet.
Resources aggregates whatever financial resources an actor can currently bring
to bear on their situation. Initial values are drawn from a normal distribution
whose mean is a fixed value based on an actor’s age, ethnicity, gender, and reli-
gion. An actor’s resources can never exceed this initial value, although the value
is allowed to recover even if it drops to 0.
Employed ∈{True, False} indicates whether the actor currently has a full-time
job or not. The initial values are drawn from a fixed distribution that can poten-
tially differ between the two ethnic groups. Actors can potentially lose their jobs
due to the hurricane, but there is no mechanism for them to regain them.
Grievance represents an actor’s level of dissatisfaction with the government. Initial
values are drawn from a normal distribution whose mean is a fixed value
based on an actor’s ethnicity, gender, religion, and wealthiness (whether its initial
resources exceed a fixed threshold). The dynamics of this variable are discussed
in Sect. 4.5, which covers the government’s actions.
perceivedCategory, with the same possible values as Nature’s category, repre-
sents the aggregated, but possibly incorrect, “observation” that the actor receives
about the hurricane’s current severity.
There are internal dependencies among some of these variables. The most impor-
tant dependency is the effect of actors’ risk on their health. Actors whose risk is
in the lowest quintile (i.e., ≤ 0.2) face no threat and will instead recover from any
prior injuries. More precisely, they will have their health approach its initial value
by a fixed percentage (that possibly varies across simulation instances) each day. All
other actors face a nonzero chance of injury, with the likelihood of injury increas-
ing at higher quintiles of risk. Their health will change stochastically, approaching
either 0 or its initial value by a fixed (but possibly unequal) percentage. The likeli-
hood of going up vs. down is based on which quintile an actor’s current risk is in.
In particular, if it is in the highest quintile, there is an 80% chance that the actor’s
health will decrease; if in the next highest, there is a 60% chance; etc. Thus, even a
low level of risk (e.g., 0.2) can still result in injury, while even a maximum level of
risk still has a chance of being survived injury-free. This provides room for different
actors to assess such uncertain outcomes differently.
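The quintile-based health dynamics can be sketched in Python as below. The 80% and 60% values come from the text; the 40% and 20% values for the middle quintiles are inferred from the "etc." and should be treated as an assumption, as should the exact quintile boundaries, the function names, and the default percentages.

```python
import random

# Probability that health decreases, indexed by risk quintile (lowest first).
# Only the top two entries are stated in the text; 0.2 and 0.4 are inferred.
INJURY_PROBABILITY = (0.0, 0.2, 0.4, 0.6, 0.8)
DEATH_THRESHOLD = 0.01


def risk_quintile(risk: float) -> int:
    """Return 0 for risk <= 0.2, up to 4 for risk > 0.8 (assumed boundaries)."""
    return sum(risk > bound for bound in (0.2, 0.4, 0.6, 0.8))


def update_health(health: float, initial_health: float, risk: float,
                  rng: random.Random, recovery: float = 0.2,
                  injury: float = 0.2) -> float:
    """One daily update: health approaches either 0 or its initial value."""
    if rng.random() < INJURY_PROBABILITY[risk_quintile(risk)]:
        return health * (1.0 - injury)  # approach 0 by a fixed percentage
    return health + recovery * (initial_health - health)  # approach initial value


def is_dead(health: float) -> bool:
    """Actors whose health falls below the threshold leave the simulation."""
    return health < DEATH_THRESHOLD
```

Since `rng.random()` is never below 0, actors in the lowest quintile always recover, matching the "face no threat" case.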
Children of actors face the same injury risks as their parents. In other words, the
dynamics of children health follows the same distribution as their parents’ health,
conditioned on the same value of risk. However, the health and children health are
drawn independently, so children are no more or less likely to suffer an injury when
their parents are injured.
4.3.2 Actor movement actions
The dynamics of other state variables depend on the actors’ behavior, captured by
the actions they choose to perform. The actors’ choices center around locations they
decide to move to. They can either:
Evacuate The actor and any family leave the area completely, at least temporar-
ily. This action changes the actor’s location to evacuated.
Move to shelter The actor and any family move into a public shelter. This action
changes the actor’s location to shelter.
Move home The actor and family move back to their residence. This action
changes the actor’s location to home.
Stay in location The actor and family stay where they currently are. This action
does not change the actor’s location.
When not moving (i.e., choosing stay in location), the actors’ risk changes based
on their location. When at home, their risk is set to the risk associated with their
region of residence. When already at a public shelter, it is set to the shelter risk of
the region in which they are sheltering. When their location is already evacu-
ated, their risk drops by 90%.
When returning home (i.e., choosing move home), actors also incur the risk of
their region of residence. The other movement actions affect the actor’s risk slightly
differently. When evacuating, the actor incurs the risk of the last region traversed
before leaving the area. While traveling to a shelter (i.e., choosing move to shelter),
actors incur the risk (not the shelter risk) of the region containing that shelter, to
capture the danger they are exposed to while in transit.
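The rules relating movement actions to personal risk amount to a small lookup, sketched below in Python. The function and argument names are hypothetical; the string labels mirror the action and location names used in the text.

```python
def actor_risk_after(action: str, location: str, current_risk: float, *,
                     home_region_risk: float, shelter_risk: float,
                     shelter_region_risk: float, exit_region_risk: float) -> float:
    """Personal risk after one day, given the chosen action.

    location is the actor's location before acting. exit_region_risk is the
    risk of the last region traversed on the way out of the area.
    """
    if action == "stay in location":
        if location == "home":
            return home_region_risk
        if location == "shelter":
            return shelter_risk
        return current_risk * 0.1  # already evacuated: risk drops by 90%
    if action == "move home":
        return home_region_risk
    if action == "move to shelter":
        # In transit, the actor is exposed to the region, not the shelter.
        return shelter_region_risk
    if action == "evacuate":
        return exit_region_risk
    raise ValueError(f"unknown action: {action}")
```

Under this reading, an actor who moves to a shelter pays the surrounding region's risk on the day of travel and benefits from the shelter's own risk level only on later days spent staying there.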
Moving to a shelter is the only action that can affect the status of an actor’s pet.
A pet will die only if its owner moves to a public shelter that does not allow pets,
thus forcing the owner to abandon it at home during the hurricane. More precisely,
an actor’s pet becomes False only if the actor chooses to move to shelter in a
region whose shelter pets is False. Under all other circumstances, the status of
the pet stays the same.
Movement also has financial ramifications for the actors. There is a fixed proba-
bility (possibly changing across simulation instances) of actors losing their job (i.e.,
employed becomes False) while their location is evacuated. There is no chance of
their losing their job under any other circumstance, nor is there a chance of actors
gaining employment when unemployed. Actors who are unemployed (employed is
False) will see their resources drop by a fixed percentage.
However, being employed is not a guarantee of income either. Each simulation
instance can set a threshold which a region’s economy must exceed for jobs in that
region to generate income. We minimize the number of regions for actors to model
by assuming that they are employed in their region of residence. If a region’s economy
does not exceed this threshold, then the resources of that region’s residents
drop just as they would if their employed were False.
Actors employed in a region with a sufficiently high economy are able to gain
resources. For actors whose location is either home or evacuated, their resources
level increases toward its original value by a fixed percentage (i.e., all actors receive
the same income). Actors whose location is a shelter may or may not be able to gain
an income, depending on a fixed Boolean parameter associated with each simulation
instance. This parameter controls, for the whole population, whether actors staying
at a shelter continue to gain income (identical to being at home) or do not
(identical to being unemployed).
The other impact on an actor’s resources is the cost associated with choosing to
evacuate. We define this as a fixed cost for each simulation instance, with the cost
being subtracted from the actor’s resources when evacuating. Actors whose level of
resources is less than this cost end up with 0 resources upon evacuating. Note that
this cost in no way blocks actors from evacuating; in fact, actors with 0 resources
can still evacuate and will suffer no financial repercussion for it. While this might
not be completely accurate, it does capture the fact that people who have already lost
everything have less to lose by abandoning their homes.
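The resource dynamics described above can be sketched in Python as follows. This is only an illustrative sketch, not the PsychSim implementation: the function and parameter names (daily_resources, pay_evacuation_cost, income_pct, loss_pct) are hypothetical, all quantities are assumed to lie in [0, 1], and the shelter-income parameter is omitted for brevity.

```python
def daily_resources(resources, original, employed, economy, threshold,
                    income_pct, loss_pct):
    """One day's change in an actor's resources (hypothetical helper).

    Assumes the actor's location permits income (home or evacuated).
    """
    if employed and economy > threshold:
        # Income: move toward the original level by a fixed percentage.
        return resources + income_pct * (original - resources)
    # Unemployed, or the regional economy is too weak: lose a fixed percentage.
    return resources * (1.0 - loss_pct)


def pay_evacuation_cost(resources, cost):
    """Subtract the fixed evacuation cost; resources never go below 0."""
    return max(0.0, resources - cost)
```

Under these assumptions, an actor with 0 resources who evacuates still ends with 0 resources, matching the observation that the cost never blocks evacuation.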
4.3.3 Actor pro/antisocial actions
Actors whose location is home have the option of performing prosocial and antiso-
cial actions as well:
Decrease risk is a prosocial action which lowers the risk of the actor’s region of
residence.
Take resources is an antisocial action (equivalent to looting) which increases the
actor’s resources.
Both of these actions require actors to leave the safety of their home, so they increase
the personal risk possibly beyond the baseline risk of the region of residence. An
instance parameter specifies a fixed percentage (one for each action type) by which
the actor’s risk will approach 1.
The decrease risk action will decrease the risk and increase the economy of the
actor’s region of residence. There is thus an immediate benefit to the risk levels of
the actor’s neighbors. However, the actor does not see this benefit until the follow-
ing day, because of the exposure to greater risk incurred on the day of the prosocial
behavior.
The take resources action brings the actor’s resources closer to 1 by a fixed per-
centage that can change across simulation instances. There is thus a benefit to the
individual actor, but we do not model the cost to the region of residence. A more
realistic model of looting would perhaps cause the region’s economy to decrease
based on the number of actors choosing the take resources action on a given day.
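Several effects in this section move a variable toward a target value by a fixed percentage. One plausible reading, which we assume here purely for illustration, is that the variable closes that fraction of the remaining gap. The helper names approach and take_resources below are hypothetical.

```python
def approach(value, target, pct):
    """Move value toward target by closing a fraction pct of the gap.

    This gap-closing interpretation is an assumption of the sketch.
    """
    return value + pct * (target - value)


def take_resources(resources, risk, gain_pct, risk_pct):
    """Effect of the antisocial action on the actor's own variables.

    Resources move toward 1 by gain_pct; personal risk moves toward 1
    by risk_pct because the actor leaves the safety of home.
    """
    return approach(resources, 1.0, gain_pct), approach(risk, 1.0, risk_pct)
```

For example, take_resources(0.4, 0.2, 0.25, 0.5) raises resources to about 0.55 and personal risk to about 0.6.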
4.3.4 Actor reward
The reward function represents the utility that an actor derives from the current state.
We choose a linear reward structure for the actors, with constant weights over a sub-
set of state features. The initial values for these weights are sampled from normal
distributions (all using the same standard deviation). The values remain unchanged
throughout, reflecting the unchanging values held by the actors. The following are
the state features from which the actors derive direct utility:
Priority of health assigns a positive weight to the actor’s level of health. The
mean of the distribution can change based on the actor’s gender.
Priority of resources assigns a positive weight to the actor’s level of resources.
The mean of the distribution can change based on the actor’s gender.
Priority of children’s health assigns a positive weight to the actor’s children
health (for actors who have children). The mean of the distribution can change
based on the actor’s gender.
Priority of pets assigns a positive weight to the actor’s pet (for actors who have
a pet).
Priority of neighbors assigns a negative weight to the risk present in the actor’s
region of residence. This reward incentivizes actors to make their neighborhood
safer, benefiting all of the residents, not just themselves. We use the magnitude of
this weight to capture an actor’s degree of altruism, as this component is the only
one not associated with the actors themselves. The mean of the distribution used
to generate the magnitude of this weight can change based on the actor’s religion.
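A linear reward of this form can be written compactly. The sketch below is illustrative only: the dictionary keys are hypothetical names for the state features listed above, the neighbors weight is stored as a magnitude and subtracted, and actors without children or pets are assumed to contribute 0 for those terms.

```python
def reward(weights, state):
    """Linear reward over a subset of state features (hypothetical keys)."""
    total = weights["health"] * state["health"]
    total += weights["resources"] * state["resources"]
    # Optional features default to 0 for actors without children or pets.
    total += weights.get("children", 0.0) * state.get("children_health", 0.0)
    total += weights.get("pet", 0.0) * float(state.get("pet", False))
    # Priority of neighbors: a negative weight on the region's risk.
    total -= weights.get("neighbors", 0.0) * state["region_risk"]
    return total
```

With weights of 1.0 on health, 0.5 on resources, and 0.2 on neighbors, a state with health 0.8, resources 0.4, and regional risk 0.5 yields a reward of about 0.9.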
4.3.5 Actor beliefs
Actors are able to observe all state features associated with themselves, giving them
accurate beliefs about the values of those variables. This model thus assumes that
actors are able to accurately assess their health, location, employed, grievance,
etc. Furthermore, when there is no hurricane (phase=none), they are also able to
observe that fact, as well as the levels of risk and shelter risk in their region of
residence.
However, when a hurricane is present (phase is either approaching or
active), actors have only partial observability of the true values of the variables
not associated with themselves. In particular, they do not directly observe the values
of the hurricane’s category, the risk of their region of residence, the shelter risk of
their nearest shelter(s), nor their own risk.
Instead, they receive an uncertain observation, perceived category, which
is drawn each day from a distribution that is conditioned on the hurricane’s true
category. Actors are given a static assignment to one of three such distributions,
recorded in their information distortion attribute: overestimate either overestimates
or matches the true category value, underestimate either underestimates or matches,
and none yields the correct observations with 100% probability. The overestimate
(underestimate) distribution yields either the true category value or that value plus
(minus) one. The probability of receiving the incorrect value is fixed throughout a
given simulation instance and is the same for both types of information distortion.
If the incorrect value is outside the acceptable category range of 1–5, then the prob-
ability is 0.
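The three observation distributions can be sketched as a single function. The names perceived_category_dist and p_wrong are hypothetical; the logic follows the description above, including the rule that an out-of-range incorrect value has probability 0.

```python
def perceived_category_dist(true_cat, distortion, p_wrong):
    """Distribution over perceived category given the true category.

    distortion is one of "overestimate", "underestimate", or "none".
    Returns a dict mapping perceived category to probability.
    """
    offset = {"overestimate": 1, "underestimate": -1, "none": 0}[distortion]
    wrong = true_cat + offset
    # No distortion, or the distorted value falls outside the 1-5 range:
    # the true category is observed with certainty.
    if offset == 0 or not 1 <= wrong <= 5:
        return {true_cat: 1.0}
    return {true_cat: 1.0 - p_wrong, wrong: p_wrong}
```

An overestimating actor facing a category-5 hurricane therefore always perceives category 5, since category 6 is not an acceptable value.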
Upon receiving information about the hurricane’s category, the actors do a
Bayesian belief update, following the standard POMDP belief-update algorithm
(Kaelbling et al. 1998). The actors all have complete knowledge of the hurricane
dynamics as described in Sect. 4.1. They are thus able to compute the likelihood
over possible transitions in the category value from the current values in their
beliefs. They combine these expectations with the observation they receive to com-
pute a posterior distribution over category in their new belief state.
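The predict-then-correct structure of this update can be sketched as below. This is a generic discrete Bayes filter written for illustration, not the PsychSim code; transition and likelihood are hypothetical callables that return dictionaries of probabilities for the hurricane dynamics and the observation model, respectively.

```python
def belief_update(prior, transition, likelihood, observation):
    """Discrete Bayesian belief update over the hurricane category.

    prior: dict category -> probability.
    transition(c): dict next_category -> probability (hurricane dynamics).
    likelihood(c): dict perceived_category -> probability (observation model).
    """
    # Prediction step: push the prior through the known dynamics.
    predicted = {}
    for c, p in prior.items():
        for c_next, t in transition(c).items():
            predicted[c_next] = predicted.get(c_next, 0.0) + p * t
    # Correction step: weight each hypothesis by the observation likelihood.
    unnorm = {c: p * likelihood(c).get(observation, 0.0)
              for c, p in predicted.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return {c: v / total for c, v in unnorm.items() if v > 0.0}
```

As a small worked case, an actor who believes categories 2 and 3 are equally likely, whose observations overestimate with probability 0.2, and who perceives category 3 (with the category held fixed for the day) ends with a posterior of 0.2 on category 2 and 0.8 on category 3.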
Unlike category, actors do not receive any information about the risk of their
region of residence, the shelter risk of their nearest shelter(s), nor their own risk
while a hurricane is active. They are still able to form and update their beliefs
about these variables using their complete knowledge of the effects of the hurri-
cane on them. In other words, they start from the possible values for category in
their updated beliefs and then apply the deterministic effects described in Sect. 4.2
to determine the implied values for the regional risk variables. They then apply the
deterministic effects described in Sect. 4.3.2 to determine the values for their risk
implied by these updated values of regional risk. They are thus able to compute a
posterior joint distribution over category, the regional risk and shelter risk vari-
ables, and their own risk for their updated belief state.
While actors do not receive any direct information about their own risk, they do
receive observations of variables affected by it, such as their own health. For exam-
ple, after an actor sees its health decrease (e.g., due to injury), it should believe its
risk level to be higher than it previously thought. The POMDP belief-update algo-
rithm used by our agents realizes such changes in posterior beliefs. In particular,
the actors use their complete knowledge about the dependency between their risk
and health (described in Sect. 4.3.1) and their observed health and children health
values to compute a posterior distribution over the joint category, regional risk,
and personal risk variables. This distribution reflects the consistency of the hypoth-
esized severity of the hurricane and its impact with the actors’ information received
(perceived category) and personal experience (health and children health).
Thus, despite the possibly erroneous information about category and their inabil-
ity to directly observe the risk at either the regional or personal level, the actors
are able to update their beliefs based on their own experience and their knowledge
of how their physical environment works. Each day brings additional evidence that
they use to update their beliefs, resulting in the vast majority of actors having accu-
rate beliefs about the hurricane severity by the time it leaves the area, regardless of
what misconceptions may have formed at its onset. The distribution from which per-
ceived category is drawn quantifies the stochasticity in the belief-update process.
Actors’ beliefs are limited to only those variables that concern themselves, the
hurricane, and regions they may travel to or through. They do not form beliefs about
other actors, not even their friends or neighbors. This restriction greatly reduces the
size of the actors’ belief states and the computation time needed for them to rea-
son over the outcomes of their actions given those belief states. It also captures the
limited reasoning that people do about each other, in that it is unrealistic for actors
to form and maintain beliefs about all of the other individual residents in the area.
They are able to use their regions’ risk as a proxy for the well-being of their neigh-
bors. It would be plausible for the actors to maintain beliefs over their small group
of friends, but we do not have them do so in the current simulation.
4.3.6 Actor decision making
The heart of the actors’ behavior generation is in their POMDP-based decision mak-
ing. We describe this decision making using an online version of the algorithm,
where each actor reasons about which action maximizes expected reward given its
current beliefs (Ross et al. 2008). To do so, it considers each of its available actions
separately, generating expectations of the effect of each candidate action, and choos-
ing the action that leads to the highest expected reward.
Expectation generation occurs by running a hypothetical simulation within
the actor’s subjective frame of reference. In other words, the actor uses the same
algorithms as the actual simulation itself, but restricted to only the variables in
its beliefs and the possibly incorrect and uncertain values it attributes to them.
Each actor has its own horizon attribute that specifies how many days it runs this
hypothetical simulation. The values for this attribute are drawn from a uniform
distribution over an interval of integers (specified for each simulation instance),
and they do not change.
The actors’ hypothetical simulation follows the same sequence of turns on each
day, although starting with their own turn first, because at this point the groups have
already made their decisions. Their expectations of the effect of their current action
choice must also consider the effects of these other turns. During an actor’s
own turn, it would theoretically need expectations about all of its fellow actors’
actions to correctly assess the effect of its own. For example, move to home will be
more desirable if the actor’s neighbors all choose to perform decrease risk. How-
ever, actors do not form beliefs about each other, so they have no basis for forming
such expectations. Thus, their hypothetical simulation ignores the possible actions
of other actors and, under most circumstances, will generate different outcomes
from the actual simulation.
In contrast, we give the actors accurate models of the system in that they can pre-
dict how the government would allocate aid under different circumstances. In this
simulation, that allocation is a function of the risk levels across all of the regions.
However, actors do have beliefs about the risk in only those regions that are rel-
evant to them: their region of residence, the region on their evacuation path, and the
region of the nearest shelter(s). Actors are therefore unable to form accurate expec-
tations of what the system-level action will be, leading to another deviation of this
hypothetical simulation from the actual one.
The actors also have accurate models of the hurricane dynamics, and they apply
those models to their uncertain beliefs about the hurricane state. However, reason-
ing about all of the possible hurricane trajectories (in terms of both category and
location changes) incurs a prohibitive computational cost. Furthermore, it is not
psychologically plausible that real-world actors are able to reason so exhaustively.
Our actors instead consider only the most likely outcome of the hurricane dynam-
ics, leading to the possibility of yet another divergence between the hypothetical and
actual simulations.
The actors have accurate models of their regional group’s inner workings
(described in Sect. 4.4), but again, they do not have beliefs about the other members
of that group. They are thus unable to generate expectations about what the group
is going to do. We therefore skip the group’s turn within the actor’s hypothetical
simulations.
For actors whose horizon is more than one day (which is the vast majority of
actors across our simulations), the hypothetical simulation must generate expecta-
tions about what they themselves will do in the future. To generate such expecta-
tions for the actors’ actions on the following day, we recursively invoke this same
hypothetical simulation procedure, with the only change being to use a horizon that
is horizon − 1. This avoids an otherwise infinite recursion, as well as reducing the
fidelity of the actors’ expectations about their own behavior as they look farther into
the future.
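The recursive lookahead can be sketched as follows. The sketch makes several simplifying assumptions that are ours rather than the paper’s: step is a hypothetical deterministic function returning the most likely next state under the actor’s beliefs, rewards are summed without discounting, and the actor expects to pick its own best action on each future day.

```python
def expected_reward(state, action, horizon, actions, step, reward):
    """Expected reward of taking action now and looking horizon days ahead.

    step(state, action) returns the most likely next state;
    reward(state) returns the actor's reward in that state.
    """
    next_state = step(state, action)
    total = reward(next_state)
    if horizon > 1:
        # Recurse with horizon - 1 to predict the actor's own future choice.
        total += max(
            expected_reward(next_state, a, horizon - 1, actions, step, reward)
            for a in actions
        )
    return total
```

Because each recursive call shortens the horizon by one day, the recursion terminates, and the actor’s predictions about its own later behavior rest on progressively shallower lookahead.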
The end result of these hypothetical simulations is a value function that computes
the actor’s Expected Reward, a table over its possible action choices. Actors choose
the action that has the highest value in this table, with ties broken based on a fixed
ordering generated at runtime. Despite the stochasticity in the hurricane dynamics
and the possibility of injury, the computation of Expected Reward is deterministic
given the actors’ beliefs. Thus, their action choices are also a deterministic func-
tion of their belief states. While it is trivial to replace the strict maximization with a
softmax instead, we deliberately avoided introducing such a random component into
the actors’ decision making in this scenario. The variability in the belief states led to
sufficiently plausible diversity of behaviors, so introducing stochasticity into action
selection would have served only to obfuscate the decision-making process.
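Deterministic selection with a fixed tie-breaking order can be sketched in a few lines. The names choose_action and tie_order are hypothetical; tie_order stands for the fixed ordering generated at runtime.

```python
def choose_action(values, tie_order):
    """Pick the action with the highest Expected Reward.

    values: dict action -> expected reward.
    tie_order: list of actions; the earliest one wins among ties.
    """
    best = max(values.values())
    return next(a for a in tie_order if values.get(a) == best)
```

Given identical belief states and the same ordering, this function always returns the same action, which is the property the deterministic design relies on.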
4.3.7 Actor relationships
There are three types of binary relationships that can exist between actors:
Married to represents a marriage relationship between two actors. Pairings are
chosen randomly at the beginning of the simulation, based on a specified percentage
of married vs. single members of the population and a percentage of same-gender
marriages. An actor can be married to at most one other actor, and the relation-
ship holds for the duration of the simulation. Actors who are married must perform
the same action whenever possible. Whichever actor makes a decision first (see footnote 1) imposes
that decision on the partner (regardless of what that partner’s expected-reward cal-
culation would have otherwise dictated). Thus, married couples act in perfect unison
throughout the course of the simulation. However, partners not making the decision
are still able to answer questions about how (dis)satisfied they were with the action
imposed on them. To do so, actors can compare their expected reward of the action
imposed on them against their expected reward of the alternative action they would
have chosen otherwise.
Friend of represents a friendship between two actors. Pairings are chosen ran-
domly at the beginning of the simulation, and the pairing holds for the duration
of the simulation. Fixed parameters specify a minimum and maximum number of
friends, and each actor has a number drawn from a uniform distribution over that
range. Friends do not influence each other’s reward function, but they do influence
each other’s beliefs. Every day, actors send a message containing their beliefs about
the hurricane’s category to all of their friends. Actors then update their beliefs by
computing a weighted sum over the probability distributions in these messages and
their own beliefs. The weighted sum is governed by three parameters fixed for a
given simulation instance: trust in self, a weight for the actor’s own beliefs; trust
in optimists, a weight for messages that are more optimistic (i.e., a lower expec-
tation for category) than the actor’s own beliefs, and trust in pessimists, a weight
for messages that are more pessimistic (i.e., a higher expectation for category). The
weighted sum of these distributions becomes the actor’s new belief over category.
Decreasing trust in self makes the population more susceptible to social influence.
Increasing trust in pessimists leads to amplification of risk, as the influence process
will lead to higher perceptions of risk. Increasing trust in optimists has the
opposite effect, leading to deamplification of risk.
1 Each actor’s decision-making function is invoked in an arbitrarily determined sequence.
Neighbor of is an implicit relationship between actors who share the same region
of residence. Because the actors’ region of residence is constant throughout the sim-
ulation, the neighbor of relationships are similarly constant. Actors do not explic-
itly reason about their neighbors, nor even form beliefs about them as individuals.
However, the reward associated with the risk in their region of residence (priority
of neighbors) causes actors to be indirectly incentivized to help their neighbors, as
decreasing the region’s risk also decreases the risk of any neighbors currently stay-
ing home.
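The friend-to-friend belief exchange described in this section can be sketched as a trust-weighted mixture. Two details are assumptions of the sketch rather than statements from the model description: the weights are normalized so that the result is a probability distribution, and a message whose expected category equals the actor’s own is given the trust in self weight.

```python
def expectation(belief):
    """Expected category under a belief (dict category -> probability)."""
    return sum(c * p for c, p in belief.items())


def merge_beliefs(own, messages, trust_self, trust_optimists, trust_pessimists):
    """Combine an actor's belief with friends' messages by weighted sum."""
    own_mean = expectation(own)
    weighted = [(trust_self, own)]
    for msg in messages:
        mean = expectation(msg)
        if mean < own_mean:
            weight = trust_optimists      # friend expects a weaker hurricane
        elif mean > own_mean:
            weight = trust_pessimists     # friend expects a stronger hurricane
        else:
            weight = trust_self           # assumption: equal means
        weighted.append((weight, msg))
    total = sum(w for w, _ in weighted)   # assumed to be positive
    merged = {}
    for w, belief in weighted:
        for c, p in belief.items():
            merged[c] = merged.get(c, 0.0) + w * p / total
    return merged
```

Raising trust_pessimists relative to the other two weights shifts the merged belief toward higher categories, which is the amplification effect described above.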
4.4 Groups
Each region has an organized group which all residents are eligible to join. Actors
are free to join and leave the group in their region of residence as often as they
prefer, but they cannot join groups in other regions. This possible membership in
a group is captured in each actor’s member of Group Region X, where Region X
is the actor’s region of residence. This is a Boolean variable that is True when the
actor is a member of the group and False otherwise. The initial group members are
selected randomly from the residents according to a fixed probability (which can
change across simulation instances).
Groups essentially act as a monolithic agent, whose beliefs and reward functions
are an aggregation of those of its current members. Each group considers perform-
ing a joint prosocial action (a group version of decrease risk) or else leaving its
members to choose their own individual action. The group makes this decision by
evaluating the expected reward of each option just as actors do, with the only differ-
ence being the group’s particular beliefs, reward function, and horizon. Section 4.4.1
describes how groups arrive at their aggregated belief state. The group’s reward
function simply computes the sum of the rewards received by its individual mem-
bers. Finally, all groups within a simulation instance share the same fixed horizon.
4.4.1 Group beliefs
Because a group’s beliefs are derived from the beliefs of its individual members, the
uncertainty in its beliefs is confined to the same uncertain variables from the actors’
beliefs: category, any relevant regions’ risk and shelter risk, and the personal risk
of its members. The group adopts the same beliefs about its members’ risk levels as
the individual members themselves, as actors do not have beliefs about each other’s
risk.
However, the group must form a coherent belief over the hurricane and regional
variables out of the possibly divergent beliefs (i.e., probability distributions) held
by its members. To do so, it extracts one of three distributions from the aggregation
of its members’ beliefs: a mean that averages the members’ beliefs, a max distribu-
tion that is the belief with the highest expected value out of all the members’, and a
min distribution with the lowest expected value. The selection used is specified by
a group aggregation attribute on each group, with the value chosen randomly from
a fixed distribution, and with the value staying constant throughout the simulation.
Groups whose group aggregation is mean will form beliefs that are in the “middle”
of their individual members’ beliefs (although the exact beliefs formed may not be
shared by any members). Groups whose group aggregation is max (min) will form
beliefs that mirror the highest (lowest) perception of hurricane severity across their
members.
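The three aggregation modes can be sketched as follows, treating each member’s belief as a dictionary from category to probability. The function names are hypothetical, and the sketch covers only a single variable (for instance, category) rather than the full joint belief.

```python
def expectation(belief):
    """Expected value of a belief (dict value -> probability)."""
    return sum(v * p for v, p in belief.items())


def aggregate_beliefs(beliefs, mode):
    """Aggregate member beliefs according to the group aggregation attribute."""
    if mode == "mean":
        # Average the members' distributions pointwise.
        out = {}
        for belief in beliefs:
            for v, p in belief.items():
                out[v] = out.get(v, 0.0) + p / len(beliefs)
        return out
    if mode == "max":
        # Adopt the member belief with the highest expected value.
        return dict(max(beliefs, key=expectation))
    if mode == "min":
        # Adopt the member belief with the lowest expected value.
        return dict(min(beliefs, key=expectation))
    raise ValueError(f"unknown aggregation mode: {mode}")
```

Note that the mean of two members who are certain of categories 1 and 3 is a distribution split evenly between those categories, a belief that neither member holds individually.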
4.4.2 Group actions
Each day, the groups first decide whether or not to perform a joint decrease risk
action. If a group decides not to, all of the actors in that region choose their action
out of those allowable from Sect. 4.3.2. If a group does decide to perform the joint
action, then actors who already belong to the group have a reduced set of options, as
they can either participate in the joint action or else leave the group (leave Group
Region X ). Leaving the group changes the value of member of Group Region X to
False, but is otherwise identical in effect to the action stay in location.
Actors who do not belong to the group consider their full set of available action
choices, including participating in the joint action, if one was chosen for that day
by their relevant group. Deciding to participate in the joint action is labeled as join
Group Region X and changes the value of member of Group Region X to True.
Joint execution of the decrease risk action has two benefits. First, actors are
exposed to less personal risk when acting jointly instead of individually. In par-
ticular, the increase to personal risk when performing decrease risk is reduced
by a fixed percentage when performing it jointly. Second, the resulting decrease
in the region’s risk is magnified by the same fixed percentage. Thus, acting jointly
increases the benefit to the region as a whole, while also reducing the risk to the
individuals. On the other hand, this benefit occurs regardless of the size of the group,
so there is room for free riders, who still reap the benefit when the group makes their
region safer, while also avoiding even the reduced increase in personal risk.
4.4.3 Group decision making
Just as the actors do, a group performs a hypothetical simulation to compute its
Expected Reward under both options, subject to the misconceptions possibly con-
tained within its aggregated belief state. The group has the benefit of access to all of
its individual members’ beliefs, so it can generate a more accurate expectation of the
cumulative effect during the actors’ turn. Just as the actors’ hypothetical simulation
deterministically generates their Expected Reward table, so does the group’s. Thus,
there is no stochasticity in a group’s decision as a function of its belief state, though
again, the group’s belief state dynamics will exhibit stochasticity due to the stochas-
ticity of its individual members’ beliefs.
Groups that have fewer altruistic members are less likely to engage in joint
prosocial behavior, as their aggregated reward function will reflect this lesser degree
of altruism. The interaction with the group’s aggregated belief about the risk is
less straightforward. While a higher level of perceived risk would cause more self-
ish actors to avoid danger, it would also offer more room for improvement by the
decrease risk action (which decreases the region’s risk by a percentage). In such
cases, the interaction between the benefit to the region and the health outcomes for
individual members (rather than simply the risk levels) comes into play.
4.5 System
The system level reflects government response to the hurricane. We describe here
the simulation’s default government policy, but this policy could be overridden in
the Prescribe Challenge (described in Sect. 5.3). By default, the government allo-
cates aid to a single region each day. The aid reduces the risk in the chosen region
toward its initial level by a fixed percentage. This percentage is fixed for a given
simulation instance, unless modified by an external prescription (e.g., a tax policy
that gives the government more resources to allocate).
The government chooses which region receives aid by examining the risk of each
region. In the simplest government policy, it simply chooses the region with the
highest risk value. An alternative policy has the government choose the region for
which the product of risk and the number of residents is highest. The most complex
alternative policy replaces the number of residents with a weighted count, account-
ing for the government being susceptible to bias along ethnic and religious lines:
Ethnic bias ∈ [−1, 1] represents how much more weight government gives to
residents of the ethnic majority.
Religious bias ∈ [−1, 1] represents how much more weight government gives to
residents of the religious majority.
Positive bias numbers incentivize the government to favor regions that have more
residents of the ethnic and religious majorities, while negative numbers have the
opposite effect.
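The three allocation policies can be sketched as a single scoring function. The exact form of the weighted count is not specified above, so the sketch assumes, for illustration only, that each resident counts 1 plus the ethnic bias if in the ethnic majority, plus the religious bias if in the religious majority. All names and dictionary keys are hypothetical.

```python
def resident_weight(resident, ethnic_bias, religious_bias):
    """Weight of one resident under the assumed additive bias model."""
    weight = 1.0
    if resident["ethnic_majority"]:
        weight += ethnic_bias
    if resident["religious_majority"]:
        weight += religious_bias
    return weight


def choose_region(regions, policy, ethnic_bias=0.0, religious_bias=0.0):
    """Return the name of the region that receives aid today.

    regions: dict name -> {"risk": float, "residents": [resident, ...]}.
    policy: "risk", "population", or "biased".
    """
    def score(name):
        region = regions[name]
        if policy == "risk":
            return region["risk"]
        if policy == "population":
            return region["risk"] * len(region["residents"])
        if policy == "biased":
            count = sum(resident_weight(r, ethnic_bias, religious_bias)
                        for r in region["residents"])
            return region["risk"] * count
        raise ValueError(f"unknown policy: {policy}")
    return max(regions, key=score)
```

Under this assumed weighting, a bias of −1 removes majority residents from the count entirely, while a bias of 0 reduces the biased policy to the population-weighted one.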
The actors’ grievance captures their dissatisfaction with the government’s
response. Residents of regions that do not receive aid will have their grievance
increase toward 1 by a fixed percentage, while residents of the region receiving aid
will have their grievance decrease by that same percentage. This is a very narrow
model of dissatisfaction, as actors do not consider any other variables, not even the
degree to which their region even needs aid (i.e., its risk).
System-level dynamics emerge from an election process that we did not exercise
in any of the challenges, but which allows the biases of the government to change
periodically. More specifically, after an election, the ethnic (religious) bias at the
system level increments by the difference between the total of the grievance values
over all members of the ethnic (religious) majority and the total over all members
of the minority, divided by the total number of voters. If all actors have the same
level of grievance, then the biases toward the majority should increase (assuming
the majority to be more populous). On the other hand, this function allows the biases
to shift toward the minority should their grievance values sufficiently exceed those
of the majority.
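The post-election bias update can be sketched directly from this description. The function name is hypothetical, and clamping the result to [−1, 1] is an assumption of the sketch, motivated only by the stated range of the bias parameters.

```python
def election_bias_update(bias, majority_grievances, minority_grievances):
    """Update one system-level bias (ethnic or religious) after an election.

    The increment is the majority's total grievance minus the minority's
    total grievance, divided by the total number of voters.
    """
    voters = len(majority_grievances) + len(minority_grievances)
    delta = (sum(majority_grievances) - sum(minority_grievances)) / voters
    # Assumption: keep the bias within its stated range of [-1, 1].
    return max(-1.0, min(1.0, bias + delta))
```

With three majority voters and one minority voter all at grievance 0.5, the bias rises by 0.25; with two majority voters at 0.1 and one minority voter at 0.9, it falls by roughly 0.23.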
5 Simulated research challenges
This section presents the specific challenges we posed to the Ground Truth pro-
gram’s designated research teams, who were tasked with conducting social science
on our simulated world.
5.1 Explain
The following scenario description was provided to the research teams as back-
ground material on the simulation in all three challenges:
You are a public policy consultant, assisting local governments in hurricane-
ravaged areas. Although each of these communities has its own particularities,
they all share a common problem: despite the government’s best efforts to pro-
vide shelter and aid to its residents, each hurricane season brings more death,
destruction, and dissatisfaction. Officials are mystified by why their diverse
constituents respond to each hurricane as they do, making it hard to predict
their behavior and choose the most effective intervention. These governments
seek your advice on what policies to implement to minimize the negative
effects of these hurricanes.
You have many tools at your disposal to augment the available government
data with information more targeted to your analysis. Surveys are the most
rudimentary instrument, and also the least expensive to implement. Of course,
self-reported perceptions, motivations, etc. are notoriously fickle in their accu-
racy, so you also have a team of observers ready to give you first-hand reports
of conditions and behavior on the ground. But the main advantage you have
over your competitors is the HoloCane®, your proprietary hurricane simulator.
The HoloCane® places subjects within a hyper-compressed hurricane time-
line of the experimenter’s choosing, and, despite being completely artificial
and almost completely safe, it still engenders responses that reflect its subjects’
behavior during the real thing. Armed with these tools, you approach each new
area with full confidence that you can fulfill the hopes of its government and
residents.
5.2 Predict challenge
The predict challenge provided data from an initial sequence of hurricanes and
asked for predictions on certain outcomes over a subsequent number of hurricanes.
We provided two different simulation instances: one used for a short-term challenge,
and one used for a long-term one.
5.2.1 Short‑term challenge
In this challenge, data for the first N hurricanes of a given season were provided,
along with the complete trajectory (time series of category, phase, location) for
the (N+1)st one. A specific target actor was also identified to be the subject of
queries at the individual level. The goal of this challenge was to predict outcomes
of a future hurricane, but with the hurricane’s inherent stochasticity taken out of
the equation. The following questions were posed with respect to the (N+1)st
hurricane:
1. Global prediction: How many people will evacuate at least once during the new
hurricane?
2. Local prediction: Which region will have the highest percentage of evacuations?
3. Individual prediction: Will the target actor evacuate during the new hurricane?
4. Counterfactual prediction: How would your answers to questions 1 and 3
change if all of the area’s shelters became unusable at the end of hurricane N and
remained unusable throughout hurricane N + 1?
The correct predictions are generated by Monte Carlo simulation. With the hurri-
cane trajectory fixed, the only remaining uncertainty is the injuries incurred by the
actors, which are conditional on their risk levels. Their evacuation decisions are
deterministic given their belief states (as described in Sect. 4.3.6), but the random-
ness of injuries means that there is stochasticity in those belief state trajectories that
makes the correct predictions an expectation over possible outcomes, even at the
individual level of the target actor.
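The Monte Carlo estimate amounts to averaging an outcome over repeated runs. In the minimal sketch below, run_once is a hypothetical stand-in for one full run of the simulation with the hurricane trajectory fixed; it receives a random number generator (used, for instance, to sample injuries) and returns the outcome of interest, such as the number of evacuees or a 0/1 indicator for the target actor.

```python
import random


def monte_carlo_mean(run_once, n_runs, seed=0):
    """Average the outcome of run_once over n_runs independent runs.

    run_once(rng) must return a number; rng is a random.Random instance
    shared across runs so that the whole estimate is reproducible.
    """
    rng = random.Random(seed)
    return sum(run_once(rng) for _ in range(n_runs)) / n_runs
```

For an individual-level query, the estimate is the fraction of runs in which the target actor evacuates, which is why even that prediction is an expectation rather than a single yes-or-no answer.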
5.2.2 Long‑term challenge
In this challenge, data for an entire hurricane season (4–6 months) were provided,
and we posed questions regarding the subsequent hurricane season, starting exactly
one year from the start of the first. A specific target actor was also singled out for
individual-level queries. Unlike the short-term challenge, no information was given
regarding the hurricanes in the second season. We instead asked for predictions over
a longer time period:
1. Global prediction: How many people will die?

2. Local prediction: Which region will suffer the highest percentage of deaths?
3. Individual prediction: Will the target actor survive the following hurricane
season?
4. Counterfactual prediction: How would your answers to questions 1 and 3
change if, after the first season ends, but before the following season begins, the
government taxes everyone (decreasing everyone’s wealth by 10%), thus enabling
a 50% increase in its aid impact during the following season?
As in the short-term challenge, the correct predictions are generated by Monte Carlo
simulation. The taxation affects the actors’ resources, but does not affect any other
variables directly. One could easily imagine such an action affecting their grievance
level, but we omit such a dependency as there would be no evidence of such an
effect in any of the data provided prior to this challenge.
5.3 Prescribe challenge
We divided our prescribe challenge into short- and long-term challenges, with the
same data provided as in the predict challenge described in Sect. 5.2.
5.3.1 Short‑term prescribe challenge
In the short-term challenge, alternate prescriptions were measured in terms of
Casualties, defined as the number of people who die or are seriously injured
(health < 0.2). Actors whose health drops below the threshold at any point during
the (N + 1)st hurricane count once toward this metric, regardless of whether they
recover and regardless of how many times they drop below the threshold.
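The counting rule for this metric can be sketched as follows. The data layout (a dictionary mapping each actor to a list of daily health values over the evaluated hurricane) and the function name are assumptions for illustration; the sketch also assumes that a death shows up as a health value below the same threshold.

```python
def count_casualties(health_by_actor, threshold=0.2):
    """Count actors whose health fell below the threshold on at least one day.

    Each actor contributes at most once, however often they cross the
    threshold and whether or not they later recover.
    """
    return sum(
        1
        for series in health_by_actor.values()
        if any(h < threshold for h in series)
    )
```

For example, an actor with daily health values 0.9, 0.1, 0.5, 0.1 crosses the threshold twice but adds only one to the total.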
We ask for prescriptions both in the form of the regional aid allocation of our imple-
mented government (Sect. 4.5) and in the form of an unconstrained policy:
1. Constrained prescription To which region should the government direct its aid
on each day?
2. Unconstrained prescription What should the government do on each day, with-
out changing its current policy of aid allocation?
3. Combined prescription What should the government do on each day, while
being free to change its current policy of aid allocation? In other words, a com-
bination of items 1 and 2.
We again used Monte Carlo simulation to generate the outcomes of alternate pre-
scriptions. For a “reasonable” prescription baseline, we evaluated the default gov-
ernment policy of allocating aid to the region with the highest risk, weighted by the
residents and the system-level biases. For a null baseline, we generated outcomes
under a condition where the government did not allocate any aid at all.
The submitted Unconstrained Prescriptions included evacuation incentives.
5.3.2 Long‑term prescribe challenge
The long-term challenge also asked for prescriptions that would minimize Casual-
ties. In addition to a day-to-day policy for the government to follow, we also asked
for actions that the government could take between the first season (whose data is provided) and the second season (over which the prescriptions will be evaluated). This
“offseason” prescription provides room for preparatory activities (building shelters,
accumulating a “war chest”, etc.) that would then lead to different outcomes during
the hurricane season itself.
1. Offseason prescription What should the government do before the next hurricane
season?
2. In-season prescription What should the government do during the next hurricane
season?
3. Combined prescription What should the government do both before and during
the next hurricane season?
As in the short-term challenge, we used null and default aid policies as the baseline.
Submitted prescriptions included taxation (as illustrated in our long-term counter-
factual prediction question). Other prescriptions
6 Accessibility
Addressing the challenges of Sect. 5 obviously requires access to the simulation,
but we aimed to limit accessibility in ways that reflect the real-world limitations on
social scientists studying the same phenomena. This section describes those limita-
tions, in terms of the set of data that we generated up front for potential researchers
(Sect. 6.1), as well as the support provided for both backward- and forward-looking
instruments (Sects. 6.2 and 6.3, respectively).
6.1 Initial data package (IDP)
During each challenge, a package of data was provided that contained the results
of applying a variety of instruments to the simulated population. These instruments
were designed to be representative of the types of instruments that could be applied
to a real population under similar circumstances.
6.1.1 Census
Aggregated demographic statistics of the population were provided, both in total
and broken down by region:
Population Number of residents (including children)

Gender Number of adult residents broken down by gender
Ethnicity Number of adult residents who are of each ethnic group
Religion Number of adult residents of each religion
Age Histogram of ages (in intervals of 5 years)
Employment Number of adult residents who do or do not have a full-time job
6.1.2 Environmental data
The initial data package includes accurate records of the hurricanes in the form of a
daily log of each one, with each entry recording:
Category The actual category of the hurricane on that day

Location The actual location of the hurricane on that day
Landed “Yes” if and only if the hurricane’s phase is active; otherwise, “No”.
6.1.3 Casualty statistics
A time series of various casualty numbers was also provided, with values for the fol-
lowing statistics provided for each day of the simulation run:
Deaths Cumulative number of deaths on the given day
Casualties Number of residents who are either dead or injured (health < 0.2) on
the given day
Evacuated Number of residents whose location is evacuated on the given day
Sheltered Number of residents whose location is shelterX on the given day
6.1.4 Surveys
Surveys are the simplest instrument for accessing the subjective perspectives
of our actors. We included data from two survey instruments in our initial package:
one conducted during the approach of each hurricane, and one conducted during the
aftermath of each hurricane. In addition to providing data for explanation, prediction, and prescription, these surveys were also designed to exemplify the types of
questions our agents were capable of answering in follow-up research requests.
Pre-hurricane survey For each hurricane, over the days between its phase
changing to approaching and the subsequent change to active, we sampled 10%
of the population. To maximize the coverage of the survey, actors who previously
answered the survey were removed from the pool of available responders2. A more
realistic model would have some percentage of actors refuse to ever answer the sur-
vey, but we prioritized covering all actors instead of realism in this case.
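The sampling scheme described here (10% per survey, no repeat respondents until everyone has answered, then a reset) could be sketched as below. The class and attribute names are hypothetical, and the handling of a final partial sample is an assumption, since the text does not say what happens when fewer than 10% of actors remain in the pool.

```python
import random

class SurveyPool:
    """Samples respondents without repeats until everyone has answered once."""

    def __init__(self, actors, fraction=0.1, seed=0):
        self.actors = list(actors)
        self.sample_size = max(1, round(fraction * len(self.actors)))
        self.available = list(self.actors)
        self.rng = random.Random(seed)

    def sample(self):
        # Reset the pool once every actor has answered the survey.
        if not self.available:
            self.available = list(self.actors)
        # Assumption: a final partial sample simply takes whoever is left.
        size = min(self.sample_size, len(self.available))
        chosen = self.rng.sample(self.available, size)
        picked = set(chosen)
        self.available = [a for a in self.available if a not in picked]
        return chosen
```

With 20 actors, ten successive calls to `sample()` return two actors each and cover the whole population exactly once before the pool resets.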
We designed the questions to gather basic information about the actors’ current
circumstances, as well as to probe their thought processes:
Demographics A report of their age, children, employment status, ethnicity, gender, pet ownership, region of residence, and religion. We also included “wealth”,
which mapped the actor’s resources onto a 7-point Likert scale.
At shelter “Are you currently staying at a public shelter?”
Evacuated “Are you currently residing outside the area?”
2 Once all actors had answered the survey, we reset the pool to be all actors.
Category “What category do you think the approaching hurricane is?”
Risk “I expect the approaching hurricane to pose a significant risk to my fam-
ily and me.” Response on a 1-7 scale, ranging from strongly disagree to strongly
agree.
Anticipated shelter (only if answer to at shelter is no) “I expect to go to a public
shelter during the approaching hurricane.” Response on a 1-7 scale, ranging from
strongly disagree to strongly agree.
Anticipated evacuation (only if answer to evacuated is no) “I expect to evacu-
ate the area during the approaching hurricane.” Response on a 1-7 scale, ranging
from strongly disagree to strongly agree.
The Category item provides data on actors’ subjective perceptions of the hurricane
severity and demonstrates that they do indeed differ from reality (as reported in
Sect. 6.1.2’s data). The actors respond to this item by simply computing an expecta-
tion over their belief about the hurricane’s category. They then map this expectation
into a 7-point scale.
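A minimal sketch of this two-step response follows. The belief is represented as a dictionary from category to probability, which is an assumption about the data layout. The text does not specify how the expectation is mapped onto the 7-point scale, so the linear rescaling from an assumed category range of 1 to 5 is purely illustrative.

```python
def expected_category(belief):
    """Expectation over a belief given as {category: probability}."""
    return sum(category * prob for category, prob in belief.items())

def to_likert(value, low, high, points=7):
    """Linearly rescale value from [low, high] onto 1..points (assumed mapping)."""
    clipped = min(max(value, low), high)
    return 1 + round((clipped - low) / (high - low) * (points - 1))

def category_response(belief, low=1, high=5):
    """Survey answer: expected category mapped onto a 7-point scale."""
    return to_likert(expected_category(belief), low, high)
```

An actor who is evenly split between categories 2 and 4 has an expected category of 3, which this mapping reports as the scale midpoint, 4.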
The Risk item similarly illustrates the actors’ subjective perceptions, but it also demonstrates their ability to project into the future. To respond to this item, the actors
do a hypothetical simulation of the hurricane, but using their beliefs as the start-
ing point, instead of the true state. They run this simulation in their “heads” until
the hurricane passes through the area, following the most likely trajectory (accord-
ing to their perceptions of hurricane dynamics, which are accurate in these simu-
lations). This simulation also includes the most likely observation the actor will
receive and the updated belief state resulting from that observation. However, this
hypothetical simulation does not consider the actions (whether good or bad) of other
actors, because other actors are not represented within the actors’ current beliefs. So
these expectations are almost guaranteed to diverge from reality. Regardless, actors
respond to this item by computing an expectation of their beliefs about risk on every
day and then mapping the maximum value over those days to the desired 7-point
Likert scale.
The items anticipated shelter and anticipated evacuation also illustrate how
actors can project into the future, not just to generate expectations of exogenous
events (i.e., the hurricane), but also of their own behavior. The actors are able to reason about and explicitly consider alternate actions (i.e., their behavior is not
generated by stimulus-response rules). On each day of this hypothetical simulation,
the actors generate expectations about what they themselves will do, computing
their expected reward over their possible action choices. They then use a softmax to
convert these expected rewards into a probability distribution over action choices.
Their responses to anticipated shelter and anticipated evacuation are the maxi-
mum probability they foresee for the corresponding action choice (move to shelter
or evacuate, respectively) over all of the days that the hurricane is approaching or
active. These items thus demonstrate the actors’ ability to reason about the future
and form communicable expectations about their own behavior.
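The computation behind these two items can be sketched as a softmax over each day's expected rewards, followed by a maximum over days. The function names, the temperature parameter, and the final linear mapping of a probability onto the 1-7 scale are assumptions for this example; only the softmax and the maximum over days come from the description above.

```python
import math

def softmax(rewards, temperature=1.0):
    """Convert {action: expected reward} into {action: probability}."""
    top = max(rewards.values())  # subtract the max for numerical stability
    exps = {a: math.exp((r - top) / temperature) for a, r in rewards.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def anticipated_probability(action, daily_rewards):
    """Highest probability of choosing `action` over all projected days."""
    return max(softmax(day)[action] for day in daily_rewards)

def to_likert7(probability):
    """Assumed linear mapping of a probability onto a 1-7 scale."""
    return 1 + round(probability * 6)
```

For instance, if evacuating and staying look equally rewarding on one projected day and staying looks far better on every other day, the reported probability is 0.5, which this mapping places at the scale midpoint.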
Post-hurricane Survey After each hurricane, we sampled 10% of the popula-
tion (again covering the entire population over time, as in the pre-hurricane survey)
to answer a survey after the phase returned to none, but before the next hurricane
began approaching. We designed the questions to gather basic information about the
actors’ behavior during the just-ended hurricane, as well as to probe their decision-
making processes behind that behavior:
Demographics As in the pre-hurricane survey, a report of their age, children,
employment status, ethnicity, gender, pet ownership, region of residence, reli-
gion, and wealth (i.e., resources).
At shelter “Did you stay at a public shelter at any point during the last hurri-
cane?” Actors answer “yes” if and only if their location was shelter on any day
during the last hurricane.
Evacuated “Did you stay outside the area at any point during the last hurricane?”
Actors answer “yes” if and only if their location was evacuated on any day dur-
ing the last hurricane.
Injured “Were you injured at any point during the last hurricane?” Actors answer
“yes” if and only if their health < 0.2 on any day during the last hurricane.
Risk “The last hurricane posed a significant risk to myself and my family.” The
actors respond on a 1–7 scale, mapping the highest personal risk level that they
perceived (i.e., an expectation over their beliefs).
Dissatisfaction “The government response to the last hurricane was unfair and
inadequate.” The actors respond on a 1–7 scale, mapping their level of grievance
on the day they take the survey.
Shelter possibility (Only if answer to at shelter is “no”) “I considered moving
to a public shelter during the last hurricane.” The actors respond on a 1–7 scale,
computing the probability of this alternate course of action using a softmax over
the expected reward tables recorded over each day in the previous hurricane.
They report using the value from the day with the highest probability of selecting
move to shelter.
Evacuation possibility (Only if answer to evacuated is “no”) “I considered
evacuating the area during the last hurricane.” The actors respond on a 1–7 scale,
computing the probability of evacuation using the same reasoning as in shelter
possibility.
Stay at home possibility (Only if either at shelter or evacuated was “yes”) “I
considered staying at home throughout the last hurricane.” The actors respond on
a 1–7 scale, computing the probability of staying at home using the same reasoning as
in shelter possibility.
The “possibility” questions provide insight into the actors’ decision-making process,
in that the responses indicate how close they were to choosing an alternate behavior.
The aim of these questions is to highlight that the actors explicitly consider alternate
courses of action, hopefully eliminating the possibility that their behaviors are gov-
erned by a rule-based system.
6.2 Research requests
Although informative, these IDPs were deliberately insufficient for answering the chal-
lenges, as a key goal of the exercise was to understand what different data were required
by different methods for solving them. We instead provided an accessibility interface
that could support research requests into the simulated world in the same way that
social scientists operationalize their instruments in the real world.
The most-used method of this interface was the one for extracting a time series of
a single variable from the simulation history and then aggregating that series in some
way. For example, the at shelter question from the IDP post-hurricane survey simply
accumulated the values of the actor’s location variable for the interval of the previ-
ous hurricane and returned “yes” if and only if a shelter appeared in those values. We
made a strong assumption that actors had perfect recall of their experiences, behavior,
and even their belief states, no matter how far in the past. While this assumption does
not match human memory, it makes the data more useful in addressing the challenges
presented here.
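The extract-then-aggregate pattern can be sketched as follows, using the at shelter item as the example. The history layout (a list indexed by day, holding a dictionary of variables per actor), the function names, and the convention that shelter locations are strings beginning with "shelter" are all assumptions; the actual accessibility interface is not shown in this paper.

```python
def extract_series(history, actor, variable, first_day, last_day):
    """Values of one variable for one actor over an inclusive range of days."""
    return [history[day][actor][variable] for day in range(first_day, last_day + 1)]

def at_shelter(history, actor, first_day, last_day):
    """Answer "yes" if and only if a shelter appears among the actor's locations."""
    locations = extract_series(history, actor, "location", first_day, last_day)
    # Assumption: shelter locations are strings that start with "shelter".
    in_shelter = any(str(loc).startswith("shelter") for loc in locations)
    return "yes" if in_shelter else "no"
```

Because recall is assumed to be perfect, the same extraction works for any interval in the simulation history, however far back it lies.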
Across the research requests received, the vast majority were surveys, with most of
the remaining requests being “event journals” or briefs by experts/observers. All such
requests started from the same method for extracting a history of values for a particular variable. If the variable in question was one for which an actor had some uncertainty, the time series would be over the actor’s belief states for that variable, not the
actual values.
Some research requests did not fit into this common mold. For example, we received
several requests asking actors whether they were willing to provide aid to different
groups of people (e.g., friends, neighbors, people of the same ethnicity as themselves).
In this case, the question is asking about the relative willingness of the actor to help
these different groups. As described in Sect. 4.3.4, the actors’ priority of neighbors
completely encapsulates their altruism, so their answer to this question weighed the
degree to which this altruism applied to the group in question, and the degree to which
this altruism was able to override their other goals. Thus, actors would be more willing
to give aid to those of the same ethnicity if more of them lived in the same region of
residence. Also, the higher their priority of health, the less willing the actors would be
to give aid to this group.
As another example, a separate request asked actors in the pre-hurricane survey
about the likelihood that the hurricane would hit their region of residence. To answer
this survey item, the actors did a hypothetical simulation of the hurricane using their
beliefs about the hurricane dynamics. This simulation reused the code for the actors’
decision-making, except that it considered only the hurricane’s “turn” and ignored
those of the groups, system, and the actors themselves. Because all of the actors share
correct beliefs about the hurricane dynamics, actors with the same region of residence
responded to this item with the same perceived likelihood.
6.3 HoloCane
In addition to backward-looking data extraction and aggregation, the simulation also
allows researchers to bring actors into the “laboratory” for experiments. As outlined
in Sect. 5.1, we explicitly described support for a particular set of experiments via
the HoloCane, a fictional holodeck for subjecting our actors to simulated hurricanes.
To invoke the HoloCane, researchers can specify a sampling of actors and a period
of time (e.g., a number of days or hurricanes). They also have the option to override
the default aid policy of the government with one of their own. They can also pro-
vide a specific hurricane trajectory in place of the usual stochastic dynamics.
The HoloCane then simulates the same ground truth as the actual simulation,
albeit restricted to this smaller pool of subjects and under any alternate aid policy
or hurricane trajectory provided. Actors will exchange messages with only those
friends who are also within this pool. Group membership is likewise restricted to
only those actors participating in the experiment. Consequently, the results of a HoloCane experiment will reflect the actual outcome, but may not mirror it exactly.
The utility of the HoloCane lies in the ability to conduct repeated experiments
over the same timeline under different conditions, a luxury not afforded to real-
world hurricane response planners. In addition to the options for overriding gov-
ernmental aid and hurricane dynamics, the researcher can manipulate the initial
conditions of the simulation as desired. Some example manipulations taken from
the research requests submitted are:
1. “participants experience a one-time tax of 20% of their starting wealth”

2. “they are exposed to a psychoactive drug in their drinking water that makes them
more risk averse”
3. “they are subsidized for their evacuation expenses such that they know ahead of
time that the government will reimburse such expenses”
We implemented item 1 by simply reducing each actor’s resources by 20%, contributing to an increased amount of aid to be distributed by the government. We
implemented item 2 by distorting the actors’ expected-reward calculation to not
just weigh the reward of individual outcomes by their likelihood, but to also exag-
gerate the impact of low-reward outcomes and make them less desirable. Item 3
required a change to the impact of evacuate actions on actor resources (reducing
the evacuation cost as described in Sect. 4.3.2) in the transition probability and
the actors’ beliefs about that transition probability.
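To make the item 2 manipulation more concrete, the sketch below shows one possible way to exaggerate low-reward outcomes in an expected-reward calculation. The paper does not give the actual distortion, so the penalty form, the `aversion` parameter, and the function names are assumptions for illustration only.

```python
def expected_reward(outcomes):
    """Plain expectation over (probability, reward) pairs."""
    return sum(p * r for p, r in outcomes)

def risk_averse_reward(outcomes, aversion=1.0):
    """Expectation with an extra penalty on below-average outcomes.

    Hypothetical distortion: each outcome that falls short of the plain
    expectation is penalized in proportion to its shortfall.
    """
    baseline = expected_reward(outcomes)
    return sum(p * (r - aversion * max(0.0, baseline - r)) for p, r in outcomes)
```

Under this sketch, an even gamble between rewards of 1.0 and 0.0 has the same plain expectation (0.5) as a certain reward of 0.5, but its distorted value drops to 0.25, so the risk-averse actor prefers the certain option.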
In the Predict and Prescribe challenges, we did not allow requests that ran the
HoloCane over the evaluation time period (the (N + 1)st hurricane in the short-term challenge and the second season in the long-term one). Such a request could
directly generate answers to the challenges (e.g., run a HoloCane experiment to
generate the counterfactual prediction). However, we did allow retroactive experi-
ments (i.e., conducted during the first N hurricanes in the short-term challenge
and the first season in the long-term one), even if they otherwise used exactly the
same conditions as in the challenge.
7 Conclusion
The simulated hurricane-response scenario covers a broad range of psycho-social
phenomena observed in its real-world counterparts. However, there are obviously
many phenomena not present. While adding even more phenomena would have
made the challenges infeasible to solve within the evaluation timeline, one could
imagine a meta-simulation that supports all of these phenomena, with simulation
instances realizing only a subset of them.
The current simulation’s setup mechanism uses each instance’s parameter set-
tings to specify such subsets. For example, setting the economy threshold for
generating income to 0 essentially removes the economy variable from ground
truth. Even though each region’s economy variable would change value as usual,
it would have no impact on any observable variable, making it a spurious variable
that would be unfair to require in answers to the Explain challenge.
We can potentially make use of this parameterization to automatically gener-
ate simulation instances using randomly selected parameter settings. Each such
setting would correspond to a different ground truth, drawn from a circumscribed
set of possible graphs. The agent decision-making and belief-update algorithms
would proceed in the same manner, ensuring that the behavior is guided by prin-
ciples of rationality, regardless of the type of information distortion, group belief
aggregation, system biases, etc. While there will be edge cases that would lead to
degenerate simulation outcomes (e.g., everyone evacuates as soon as a hurricane
appears), there should be a large space of parameter settings that lead to socially
plausible outcomes that are sufficiently different from each other to support the
challenges.
To make full use of such automatically generated ground truths, we would also
need a fully automated accessibility interface. Human-in-the-loop handling of
research requests is not feasible as the volume of requests goes up. We would
instead have to restrict the accessibility interface to the backward- and forward-
looking methods we already have implemented (e.g., HoloCane requests). These
methods would be able to auto-generate data, both in the form of IDPs and
responses to research requests.
On the other hand, because each simulation instance makes use of only a subset of a broad set of potentially relevant variables, we can provide researchers with the variable superset, ensuring that the requests map directly onto variables that the simulation either contains or knows with certainty that it does not
contain. Providing this variable set up front also eliminates much of the guessing
on the part of the researchers as to what features are even present in the simula-
tion, even though it does not eliminate the need for vigilance with respect to pos-
sibly spurious variables. Thus, the development done through the Ground Truth
program paves the way for an automated version of the challenges that can sup-
port a larger number of inference teams and a wider range of simulation ground
truth.
Funding This study was supported by Defense Sciences Office, DARPA [Grant No. HR00111820004].
13
References
Boutilier C, Poole D (1996) Computing optimal policies for partially observable decision processes
using compact representations. In: Proceedings of the national conference on artificial intelli-
gence, pp 1168–1175
Boutilier C, Dean T, Hanks S (1999) Decision-theoretic planning: structural assumptions and compu-
tational leverage. J Artif Intell Res 11(1):94
Carley KM, Fridsma DB, Casman E, Yahja A, Altman N, Chen LC, Kaminsky B, Nave D (2006) Bio-
War: scalable agent-based model of bioattacks. IEEE Trans Syst Man Cybern A 36(2):252–265
Collins J, Ersing R, Polen A (2017) Evacuation decision-making during Hurricane Matthew: an
assessment of the effects of social connections. Weather Clim Soc 9(4):769–776
Collins J, Ersing R, Polen A, Saunders M, Senkbeil J (2018) The effects of social connections on
evacuation decision making during Hurricane Irma. Weather Clim Soc 10(3):459–469
Dash N, Gladwin H (2007) Evacuation decision making and behavioral responses: individual and
household. Nat Hazards Rev 8(3):69–77
Demuth JL, Morss RE, Morrow BH, Lazo JK (2012) Creation and communication of hurricane risk
information. Bull Am Meteorol Soc 93(8):1133–1145
Farmer AK, DeYoung SE, Wachtendorf T (2017) Pets and evacuation: an ongoing challenge in disas-
ters. J Homel Secur Emerg Manag. https://doi.org/10.1515/jhsem-2016-0051
Gmytrasiewicz PJ, Durfee EH (1995) A rigorous, operational formalization of recursive modeling. In:
Proceedings of the international conference on multi-agent systems. pp 125–132
Gmytrasiewicz PJ, Doshi P (2005) A framework for sequential planning in multi-agent settings. J
Artif Intell Res 24:49–79
Goodie AS, Doshi P, Young DL (2012) Levels of theory-of-mind reasoning in competitive games. J
Behav Decis Mak 25(1):95–108
Heath SE, Kass PH, Beck AM, Glickman LT (2001) Human and pet-related risk factors for household
evacuation failure during a natural disaster. Am J Epidemiol 153(7):659–665
Hoey J, Little JJ (2007) Value-directed human behavior analysis from video using partially observable
Markov decision processes. IEEE Trans Pattern Anal Mach Intell 29(7):1118–1132
Howard RA (1988) Decision analysis: practice and promise. Manag Sci 34(6):679–695
Howard RA, Matheson JE (eds) (1984/2005a) Influence diagrams. In: The principles and applica-
tions of decision analysis, Vol. II. Strategic Decisions Group, Menlo Park, California, 719–763.
Reprinted, Decision Anal 2, 127–143.
Huang SK, Lindell MK, Prater CS (2016) Who leaves and who stays? A review and statistical meta-
analysis of hurricane evacuation studies. Environ Behav 48(8):991–1029
Hunt MG, Bogue K, Rohrbaugh N (2012) Pet ownership and evacuation prior to Hurricane Irene.
Animals 2(4):529–539
Ito JY, Pynadath DV, Marsella SC (2010) Modeling self-deception within a decision-theoretic frame-
work. J Auton Agents Multiagent Syst 20(1):3–13
JASSS (1998–present) The Journal of Artificial Societies and Social Simulation. http://jasss.soc.sur-
rey.ac.uk/JASSS.html
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochas-
tic domains. Artif Intell 101:99–134
Kim JM, Hill Jr RW, Durlach PJ, Lane HC, Forbell E, Core M, Marsella S, Pynadath D, Hart J (2009)
BiLAT: a game-based environment for practicing negotiation in a cultural context. Int J Artif
Intell Educ 19(3):289–308
Kjaerulff U (1992) A computational scheme for reasoning in dynamic probabilistic networks. In: Pro-
ceedings of the eighth international conference on uncertainty in artificial intelligence. Morgan
Kaufmann Publishers Inc., Milan, pp 121–129
Koller D, Milch B (2003) Multi-agent influence diagrams for representing and solving games. Games
Econ Behav 45(1):181–221
Lazo JK, Bostrom A, Morss RE, Demuth JL, Lazrus H (2015) Factors affecting hurricane evacuation
intentions. Risk Anal 35(10):1837–1857
Lindell MK, Perry RW (2012) The protective action decision model: theoretical modifications and
additional evidence. Risk Anal 32(4):616–632
Lindell MK, Lu JC, Prater CS (2005) Household decision making and evacuation in response to Hur-
ricane Lili. Nat Hazards Rev 6(4):171–179
13
Disaster world 115
Luke S, Cioffi-Revilla C, Panait L, Sullivan K, Balan G (2005) MASON: a multiagent simulation envi-
ronment. Simulation 81(7):517–527
MABS (1998–present) Proceedings of the international workshop on multi-agent-based simulation.
http://www.pcs.usp.br/~mabs/
Marsella SC, Pynadath DV, Read SJ (2004) PsychSim: agent-based modeling of social interactions and
influence. In: Proceedings of the international conference on cognitive modeling. pp 243–248
McAlinden R, Pynadath D, Hill RW Jr (2014) UrbanSim: using social simulation to train for stability
operations. In: Ehlschlaeger C (ed) Understanding megacities with the reconnaissance, surveillance,
and intelligence paradigm, chap 10. pp 90–99
NOAA (2020) U.S. billion-dollar weather and climate disasters. https://www.ncdc.noaa.gov/billions/.
Accessed 23 Sept 2020
Paruchuri P, Chakraborty N, Gordon G, Sycara K, Brett J, Adair W (2013) Inter-cultural opponent behav-
ior modeling in a POMDP based automated negotiating agent. In: Models for intercultural collabo-
ration and negotiation. Springer, pp 165–182
Polich K, Gmytrasiewicz P (2007) Interactive dynamic influence diagrams. In: Proceedings of the Inter-
national joint conference on autonomous agents and multiagent systems. ACM, p 34
Pynadath DV, Marsella SC (2005) PsychSim: modeling theory of mind with decision-theoretic agents. In:
Proceedings of the international joint conference on artificial intelligence. pp 1181–1186
Pynadath DV, Marsella SC (2007) Minimal mental models. In: Proceedings of the conference on artificial
intelligence. pp 1038–1046
Pynadath DV, Rosoff H, John RS (2016) Semi-automated construction of decision-theoretic models of
human behavior. In: Proceedings of the international conference on autonomous agents and multia-
gent systems
Ross S, Pineau J, Paquet S, Chaib-Draa B (2008) Online planning algorithms for POMDPs. J Artif Intell
Res 32:663–704
Schott T, Landsea C, Hafele G, Lorens J, Taylor A, Thurm H, Ward B, Willis M, Zaleski W (2019)
Saffir–Simpson hurricane wind scale. https://www.nhc.noaa.gov/pdf/sshws.pdf, published by the
NOAA. Accessed 23 Sept 2020
Si M, Marsella SC, Pynadath DV (2010) Modeling appraisal in theory of mind reasoning. J Auton Agents
MultiAgent Syst 20(1):14–31
Sun R (2006) Cognition and multi-agent interaction: from cognitive modeling to social simulation. Cam-
bridge University Press, Cambridge
Tatman JA, Shachter RD (1990) Dynamic programming and influence diagrams. IEEE Trans Syst Man
Cybern 20(2):365–379
Wang N, Pynadath DV, Marsella SC (2015) Subjective perceptions in wartime negotiation. IEEE Trans
Affect Comput 6(2):118–126
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.
David V. Pynadath is the Director for Social Simulation Research at the USC Institute for Creative Tech-
nologies and a Research Assistant Professor in the USC Computer Science Department. He has published
papers on social simulation, multiagent systems, teamwork, plan recognition, and adjustable autonomy.
He is the co-creator and maintainer of PsychSim, a multiagent social simulation framework that has been
used in interactive simulations for teaching urban stabilization operations, cross-cultural negotiation, and
avoiding risky behavior. Dr. Pynadath’s work on PsychSim is a key component of his long-term research
into applying decision-theoretic multiagent methods to models of behavior. He has developed multia-
gent systems for applications in social simulation, virtual training environments, human-robot interac-
tion, automated personal assistants, and UAV coordination. He has used such systems to create models
of human decision-making in scenarios including ethnic conflict, traffic, classroom violence, negotiation,
and disaster response.
Bistra Dilkina is an Associate Professor of Computer Science at the University of Southern California,
co-director of the USC Center of AI in Society, and the inaugural Dr. Allen and Charlotte Ginsburg Early
Career Chair at the USC Viterbi School of Engineering. Her research and teaching center around the
integration of machine learning and discrete optimization, with a strong focus on AI applications in com-
putational sustainability and social good. She received her PhD from Cornell University in 2012 and was
a Post-Doctoral Associate at the Institute for Computational Sustainability. Her research has contributed
significant advances to machine-learning-guided combinatorial solving including mathematical program-
ming and planning, as well as decision-focused learning where combinatorial reasoning is integrated
in machine learning pipelines. Her applied research in Computational Sustainability spans using AI for
wildlife conservation planning, using AI to understand the impacts of climate change in terms of energy,
water, habitat and human migration, and using AI to optimize the fortification of lifeline infrastructures
for disaster resilience. She has over 80 publications and has co-organized or served as a chair to numerous
workshops, tutorials, and special tracks at major conferences.
David C. Jeong is an Assistant Professor in the Department of Communication at Santa Clara University.
His research areas include the study of VR, haptics, and gaming within human-computer interaction,
as well as critical approaches to online toxicity within games and social media. At Santa Clara Univer-
sity, he leads the Imaginarium Lab, which specializes in VR/AR/XR development, 3D modeling, data
visualization, digital humanities, and high performance computing. His recent work has been published
in Frontiers of Psychology, IEEE Robotics and Automation Letters, Proceedings of AI and VR (AIVR),
the Proceedings of the Autonomous Agents and MultiAgent Systems (AAMAS), and the Proceedings of
the International Conference on Intelligent Virtual Agents (IVA).
Richard S. John is a Professor of Psychology and Associate Director at the Center for Risk and Economic
Analysis of Threats and Emergencies (CREATE) at the University of Southern California. His research
focuses on normative and descriptive models of human judgment and decision making and methodologi-
cal issues in the application of decision analysis and probabilistic risk analysis (PRA). Richard received
his PhD in quantitative psychology from the University of Southern California in 1984, M.S. in applied
mathematics from the University of Southern California in 1983, and B.S. in applied mathematics
(summa cum laude) from the Georgia Institute of Technology in 1976.
Stacy C. Marsella is a Professor at Northeastern University in the Khoury College of Computer Sciences
with a joint appointment in the Department of Psychology. His research is in the computational modeling
of human cognition, emotion and social behavior, both as a basic research method in the study of human
behavior as well as for use in a range of applications. His work has been applied to the modeling of
human behavior for large scale social simulations, realization of effective human-AI teamwork as well as
design of virtual humans, software entities that look human and can interact with humans in virtual envi-
ronments using verbal and nonverbal behavior. He is the co-creator of the PsychSim multi-agent social
simulation framework.
Chirag Merchant is a software engineer based in Los Angeles, CA, USA. He has developed software
professionally for 20 years. At USC’s Institute for Creative Technologies, he develops prototypes, simula-
tions, game-based training applications, educational games, and research support software. He has led
the development of applications used to train leaders, treat PTSD, prevent sexual harassment and assault,
deliver survivor testimonies, teach children AI, learn foreign languages, and visualize simulations. He
holds a Master’s degree in Computer Science from the University of Southern California with a speciali-
zation in Multimedia and Creative Technologies.
Lynn C. Miller is Professor of Communication and Psychology at USC. She is a pioneer in developing and
testing representative assessments (e.g., for risky behavior) and interventions in virtual environments. She
developed systematic representative design (SRD), a new experimental design with both greater causal
inference capacity and generalizability to everyday life. With Read, she developed social computational
models of the underlying personality dynamics (e.g., goals, plans, resources, and beliefs) that could
produce within-person variability across contexts and that, when aggregated, could produce the Big-5
(between-persons) structure, linking within-person variability to between-person personality trait structures.
Stephen J. Read is Professor of Psychology at USC. He is a social and personality psychologist, and
cognitive scientist, expert in the computational modeling of human social behavior and social reasoning.
Over the last 30 years, he and Miller have worked on theoretical and computational models of motiva-
tion and human personality, as well as social perception. He has created both symbolic models of human
personality, and neural network models of human motivation and personality, and social perceptions. His
research covers human motivation and personality, social perception, and human decision-making. He
has published four edited books and over 100 articles.
Authors and Affiliations
David V. Pynadath1 · Bistra Dilkina2 · David C. Jeong2,3 · Richard S. John2 ·

Stacy C. Marsella4 · Chirag Merchant1 · Lynn C. Miller2 · Stephen J. Read2
Bistra Dilkina
dilkina@usc.edu
David C. Jeong
dcjeong@scu.edu
Richard S. John
richardj@usc.edu
Stacy C. Marsella
Stacy.Marsella@glasgow.ac.uk
Chirag Merchant
merchant@ict.usc.edu
Lynn C. Miller
lmiller@usc.edu
Stephen J. Read
read@usc.edu
1 University of Southern California Institute for Creative Technologies, Los Angeles, USA
2 University of Southern California, Los Angeles, USA
3 Santa Clara University, Santa Clara, USA
4 University of Glasgow, Glasgow, UK