
International Journal of Disaster Risk Reduction 55 (2021) 102110

Contents lists available at ScienceDirect

International Journal of Disaster Risk Reduction


journal homepage: http://www.elsevier.com/locate/ijdrr

An uncertainty-aware framework for reliable disaster damage assessment via crowdsourcing
Asim B. Khajwal, Arash Noshadravan *
Zachry Department of Civil & Environmental Engineering, Texas A&M University, College Station, TX, 77843, USA

ARTICLE INFO

Keywords: Natural disaster; Citizen science; Post-disaster damage; Risk assessment

ABSTRACT

Accurate and timely estimation of incurred damages is a critical component of effective disaster management, usually performed by trained inspectors and experts. The limitations in resources and workforce can hinder the timely acquisition of critical information and make the process costly. Crowdsourcing and participatory disaster damage assessment have emerged as a possible solution to address this challenge. However, such approaches generally suffer from a lack of reliability. This research improves the effectiveness of crowdsourcing in post-disaster damage assessment by enhancing the content and reliability of information gathered through public participation. The paper presents a novel framework for quantification and reduction of uncertainty in the outcome of participatory damage assessment. First, to reduce the complexity and subjectivity, the classification of overall damage state is decomposed into more straightforward microtasks in the form of a questionnaire survey. A decision rule is implemented to infer the damage state of buildings from the participant responses. Second, an information-theoretic model based on a maximum a posteriori probability estimation is presented for obtaining an accurate probabilistic description of the inferred damage states while quantifying and accounting for the reliability of the citizen participants as well as the relative ambiguity of images. A pilot study is presented by involving 70 non-expert citizen participants to assess the post-disaster imagery of 60 buildings collected following Hurricane Harvey. A comparison of the outcome with the available expert labels shows relatively high accuracy. The proposed model also outperforms the common majority-vote approach, especially as the number of unreliable participants increases.

1. Introduction

Post-disaster damage assessment forms a critical undercurrent for any disaster-related decision-making. Information and narratives of disaster impact provided by the damage assessment have a significant impact on the development of both post-disaster response and recovery models as well as planning for future mitigation strategies. These assessments are often performed by experts, trained professionals, and reconnaissance teams by means of visually inspecting the damaged structures and infrastructure in the affected region. Assessment teams mainly rely on photographs and videos collected during the survey to assess the state of damage off-site or leverage them in the verification and validation of their on-site assessments. With emerging technologies in remote sensing and imaging, especially using images captured by satellites and Unmanned Aerial Vehicles (UAVs) in the disaster assessment and inspection domain, the speed and spatial extent of visual data collected in the aftermath of a disaster have increased manifold. This development can potentially increase the speed at which post-disaster assessments are performed. However, the reliance on specific experts and trained professionals to make inferences about the state of damage of the buildings from the visual data poses many limitations in terms of cost, human resources, spatial coverage, accessibility, and speed. Moreover, it can cause a delay in addressing other expert-reliant demands that may emerge in disaster management. These challenges call for alternative resources and more efficient, yet reliable means to perform preliminary damage assessment and extract desired information from the collected field data in the disaster-affected community.

As the availability of social networking infrastructure and advances in data retrieval and data processing techniques over the last decade have facilitated web-based research collaborations, the reliance on external participants and freelancers has gained significant popularity as an opportunity to enhance the efficiency of various scientific tasks and collaborative science projects [1,2]. This has led to the emergence of the concept of citizen science or various other similar terms such as

* Corresponding author.
E-mail address: noshadravan@tamu.edu (A. Noshadravan).

https://doi.org/10.1016/j.ijdrr.2021.102110
Received 15 September 2020; Received in revised form 2 January 2021; Accepted 31 January 2021
Available online 5 February 2021
2212-4209/© 2021 Elsevier Ltd. All rights reserved.

participatory research, participatory sensing, and crowdsourcing, etc. [3]. In addition to its applications in a wide range of domains like astronomy [4], geography [5], genetic genealogy [6], ornithology [7], etc., crowdsourcing is considered a promising approach for generating databases for disaster-related studies and decision making [8,9]. With the increase in frequency and intensity of extreme climate events and the management complexities encountered during disaster relief, our thinking about disaster response has witnessed a paradigm shift towards leveraging and managing public participation and negotiated integration of information [10,11]. The last few years have witnessed a trend of increasing focus on the use of crowdsourcing in disaster risk reduction and damage evaluation literature [9]. Although volunteerism has always been associated with disaster management, the extensive and pervasive technological advancement has extended the scope and opportunities for the volunteers to participate in emergency situations like disaster response, disaster risk reduction, etc. Ricardi [12] conducted a study that involved interviewing academicians about the use of crowdsourcing during disaster operations. The study revealed that disaster managers could significantly benefit from the use of crowdsourcing. The usefulness of citizen involvement against disaster risk has been successfully demonstrated in several previous events, such as the 2010 Haiti earthquake [13,14], the 2011 Christchurch earthquake [15], the 2016 California drought and wildfire [16], and 2017 Hurricane Harvey [17], to name a few. With the ability to provide and generate actionable information, crowdsourcing and citizen-driven assessment not only can relieve disaster analysts and experts of the pressure following the disaster but also have the potential to make the entire process of post-disaster management quicker, more efficient, and more economical. Although several studies over the last few years have focused on leveraging citizen-collaborations and crowdsourcing as a tool to address issues in the field of disaster management and damage assessment [18,19], the major issue remains with the reliability and exact quantification of uncertainties around the crowdsourced responses. Although efficient and cost-effective, crowdsourcing falls short of high quality and completely reliable results, more so when the tasks are complicated and require domain-specific skills and specialties [20]. As a result, crowdsourced annotations and labels are generally noisy and poor in quality and hence require additional processing and validations to infer the underlying ground truth [21].

The main goal of this paper is to advance the effectiveness of crowdsourcing in post-disaster damage assessment by enhancing cognition, content, and reliability of information gathered through public participation. In particular, for the case of hurricane-induced damage, this study aims at improving the process of conducting and inferring citizen assessment of building damage states such that the outcome is more reliable and informative. This is achieved by proposing and testing an information-theoretic framework based on a maximum a posteriori (MAP) probability estimate that enables enhancing the reliability as well as quantifying, and where possible, reducing the uncertainty in disaster damage assessment under the purview of citizen science. The proposed study will lead to a more reliable design, interpretation, and uncertainty quantification, thus enabling more informed decision making and definitive estimations of the damage from the information obtained by engaging volunteer citizens in disaster management. The specific contributions of the present study are threefold: (a) we propose and test a micro-tasking scheme in the form of a questionnaire survey for reducing complexity and subjectivity of participatory damage assessment when engaging citizen-participants in post-disaster damage assessment; (b) we present a probabilistic framework for quantifying reliability and incorporating the underlying uncertainty, which leads to a more informed inference of damage assessment outcome; (c) we demonstrate how the proposed approach can support informed decisions on disaster risk assessment.

The structure of the paper is organized as follows. Section 2 consists of the background and a thorough review of literature in the context of natural disasters and quality control involved in crowdsourcing applications. Section 3 outlines the methodology adopted in the present study along with the underlying mathematical descriptions. Section 4 presents the case study and implementation of the proposed methodology to the post-hurricane damage assessment. Section 5 presents and discusses the results obtained from the study followed by the limitations and future directions.

2. Literature review and background

The term “crowdsourcing” first found its mention in an article by Howe [22], and ever since then, its applications in various fields of research have been tested and practiced. Crowdsourcing refers to outsourcing tasks to the public domain and harnessing the collective intelligence of the citizens/crowd for inferring/obtaining the desired information. Also referred to as “participatory sensing”, the idea is to capitalize on the power of a crowd and rely on citizen participation for the realization of the desired objectives [23]. In several applications, the so-called “wisdom of the crowd” has proven to provide insights beyond the capabilities of individual experts. All this can be obtained at a small monetary expense or even free of cost at times. With advances in internet connectivity and social media networking, crowdsourcing applications have not only risen in popularity but are also becoming more and more efficient.

Most of the traditional applications for crowdsourcing and citizen participation have been restricted to employing citizens as “citizen observatories” [24]. However, with emerging technologies, citizen involvement in science has started progressing from mere participation in data acquisition to being a part of scientific analysis and the decision-making process [25]. Haklay [26] classified the participation of citizens in citizen science-related projects into four levels based on the degree of citizen engagement. The first and most basic level refers to the participation of citizens as mere sensors or observation points or data collectors. As we move up in the citizen engagement ladder, the cognitive capability and intelligence of the participating citizens can be leveraged to use them as data interpreters, participants in problem definition, and even in the scientific analysis. [27] is an example of a citizen science project working at the level of “distributed intelligence”, in which the intelligence of the participants is used to classify galaxies.

2.1. Crowdsourcing in the context of natural disasters

In the context of natural disasters, the citizen-centric approach has been widely used and can potentially prove to be a vital breakthrough compared to the conventional monopolistic expert-driven approach [9]. With the help of crowdsourcing, we can easily extend the regional coverage of the damage assessment while being logistically more effective and feasible at the same time. Citizen involvement in different phases of disaster management (mitigation, preparedness, response, and recovery) has already gathered promising attention from the scientific community, more so with the growing technological support for systematic information transfer and social media. The concept of “citizen observatory” has started gaining traction over the last few years, specifically in the context of flood risk mitigation and environment monitoring [24]. Goodchild and Glennon [28] employed citizen participation to map the geographic information vital to effective post-disaster response, exemplified with its application to a series of wildfire events in Santa Barbara. Wang et al. [29] used crowdsourced data obtained from social media platforms to monitor urban flooding risk and validate hyper-resolution numerical models. Yuan and Liu [30] used the crowdsourced information collected via social media to identify critically affected areas during a disaster. Hao and Wang [31] proposed a data-driven method based on social media images and textual messages from the crowd to acquire information and supplement conventional damage assessment methods. Kumar [32] discusses the use of crowdsourcing to rescue cultural heritage during disasters. See [33] reviews


the current activities in crowdsourcing and citizen science applications of flooding scenarios. Poblet et al. [34] review the various crowdsourcing tools and methods developed for application to different stages of emergency and disaster management. Concerning post-disaster damage assessment of buildings, Ghosh et al. [35] successfully used crowdsourcing for rapid damage assessment following the 2010 Haiti earthquake based on remote-sensing as the source of information for building damage. The assessment performed with crowdsourcing was validated with the field assessment carried out by the European Commission’s Joint Research Centre (JRC) team and the Earthquake Engineering Field Investigation Team (EEFIT). This assessment was a part of the Global Earth Observation Catastrophe Assessment Network (GEO-CAN) developed to facilitate rapid damage assessment. The same network was also employed to assess building damages from 2008 Wenchuan, China post-earthquake satellite imagery through interpretations from experts by crowdsourcing [15]. Building upon the efforts in Haiti and the pilot study on the Wenchuan earthquake, the crowdsourcing-based post-disaster damage assessment of buildings was expanded for the 2011 Christchurch earthquake with the goal of making the process faster and more efficient [36]. Unlike the previous two efforts, the 2011 Christchurch effort allowed non-experts as well to participate in the damage assessment. The success of these studies led to the development of other crowdsourcing-based damage assessment platforms like the 2011 Japan Disaster Response Platform (DRP) to assess the damage to buildings affected by the Japan tsunami of 2011. Xie et al. [37] developed a web-based platform to collect the results of post-earthquake building damage assessments contributed by public participants, following the 2010 Yushu earthquake in China. However, most of these studies are based on remote sensing and satellite images and hence focus on identifying the damage ‘extent’ for buildings rather than damage level assessments [38].

2.2. Unreliability and quality control in crowdsourcing applications

The above discussion notably demonstrates the applicability of crowdsourcing specifically for damage assessment. However, despite considerable advantages, crowdsourcing and citizen-centric approaches suffer from an inherent lack of reliability and quality assurance [39]. While such projects do rely solely on the intellectual resource of the citizens, the possibilities of unreliable participants or unreliable responses, sometimes even from reliable participants, cannot be ruled out. There is a potential lack of consistency in the results of the tasks carried out by participating citizens from diverse professional backgrounds. Thus, it is necessary to account for the quality of the participating crowd and efficiently extract reliable inferences from the anonymous crowd response. This has given rise to another paradigm of research in citizen science to ensure reliable collection of information from the citizens as well as the proper quantification of unreliability and uncertainty in the participant response. In the context of natural disasters, the goal hence may be a quick and accurate analysis of large datasets by employing a network of citizen analysts. For instance, large volumes of pre- and post-disaster aerial and satellite images can be distributed to an online crowd of annotators for identification, classification, and prioritization of damaged regions [15]. In fact, Barrington et al. [15] outlined three interconnected stages working together to realize a successful crowdsourcing project. These are (i) micro-tasking the high-level, complex tasks into simpler, manageable microtasks, (ii) motivating the crowd to participate in the task, and (iii) collecting and combining the crowd responses of varying quality to arrive at reliable conclusions/solutions. Several strategies have been proposed in the literature to account for the abovementioned objectives and improve on the existing methods and practices. Varshney et al. [40] proposed an innovative approach based on the use of simple binary questions as micro-tasks instead of complex high-level tasks and a decision rule to map the binary responses to the desired high-level task, which otherwise may require a sufficient amount of domain-related knowledge. Loos et al. [41] proposed a comparison-based approach to enhance the effectiveness of the citizen-workers in assessing the damage state of the region. The use of pairwise-comparison as a strategy to rank and classify images in citizen-based applications is also studied in the work of several other researchers [42–44]. Although micro-tasking and proper design of the crowdsourcing activity/experiment can lead to better and comparably reliable responses from the crowd, the citizen responses may still not be consistent. The possible uncertainties and offsets in crowd response still need to be accounted for in order to make reliable inferences. In this regard, several studies focused on identifying and accounting for the unreliability in the crowd response and making inferences from the subjective labels provided by multiple crowd-workers. Raykar et al. [45] describe different probabilistic approaches to estimate the hidden labels from the noisy response provided by multiple annotators. Whitehill et al. [46] developed an Expectation-Maximization (EM) based inference model called GLAD (Generative model of Labels, Abilities, and Difficulties) to simultaneously infer the participant expertise and task difficulty along with making inferences about the true labels associated with a given task. Raykar and Yu [47] proposed a model based on Receiver Operating Characteristics (ROC) curve analysis to infer ordinal annotations from multiple annotators. Kamar et al. [48] investigate the use of machine learning and Bayesian predictive models to predict the behavior of anonymous workers in a crowdsourced classification task. Venanzi et al. [49] proposed a community-based Bayesian label aggregation model for the accurate extraction of labels from a crowdsourced dataset. Nguyen et al. [50] presented a probabilistic statistical learning model to identify unreliable crowd responses to partially subjective tasks. Li et al. [51] proposed a probabilistic graphical annotation model to make inferences about the underlying ground-truth and annotator’s behavior. While these studies present the potential possibilities of making accurate inferences from the possibly unreliable and noisy responses from the crowd, their application in the disaster efforts, specifically in damage assessment, can lead to significant advancement in the citizen-driven research in the disaster community.

3. Methodology

The present methodology proposes the use of crowdsourcing to evaluate the post-disaster damage state of buildings. While crowdsourcing may promise an appealing alternative to the conventional damage assessment performed by experienced domain-experts, it suffers a major limitation regarding the reliability and credibility of the crowdsourced information. In the case of the present application, crowdsourcing may result in unreliable damage classification of buildings because of two factors: (a) complexity and subjectivity of the damage classification task and (b) unreliability of the participants. These sources of uncertainty are also present in the case of expert involvement, though to a much lower extent. Owing to the domain-specific knowledge, the expert is in a better position to make subjective decisions as compared to a non-expert. Also, the reliability of an expert is naturally higher than that of an anonymous volunteer whose domain-related knowledge is unknown. Therefore, to make reliable inferences from participatory damage assessment, it is vital to rigorously quantify and, where possible, reduce the uncertainty in crowdsourcing of disaster damage stemming from the two abovementioned sources of unreliability. To achieve this, we propose the following twofold procedure to quantify and mitigate unreliability and uncertainty so that the inferences about the damage states of the affected buildings based on crowdsourcing could be more rigorous and reliable.

3.1. Addressing the unreliability due to the subjectivity of the task

Since the post-disaster damage assessment of built infrastructure defines the qualitative description of the damage in the building, it is intrinsically subjective in nature. The unreliability due to the subjective


nature of the problem is magnified in crowdsourcing because the non-expert volunteers may not be competent enough to capture the damage state of the buildings accurately. As such, it is not straightforward for the citizen participants to accurately identify the damage state of the buildings, more so when the participating citizens have little or no domain-specific knowledge. To ensure the effective participation of the diverse group of contributing volunteers, the entire task of assessing the overall damage state of a building is broken down into simple questions, requiring very little or no domain-specific knowledge. Instead of asking the participants to assess a building for its damage level according to, for instance, the FEMA standard, they can be asked to complete a questionnaire consisting of a set of simpler observational questions, referred to as microtasks, which require comparatively less domain-specific knowledge and are as such easier to answer. For example, it can be complicated for a participant who has no background in engineering, architecture, or related fields to assign a damage level to a building unless she is extensively trained to do so. On the contrary, if the participant is asked to observe an image or set of images depicting a damaged building and answer whether the doors and windows are majorly damaged, whether the roof has minor damage, etc., it will be a relatively easy task given they are provided with some basic guidelines or examples. Moreover, their reliability in responding to these simple questions will be much higher than when responding to the main task. This, in turn, will lead to a more reliable overall assessment. The present study aims to capitalize on this strategy to extract as much reliable information as possible from the volunteer participants and infer the classification of the damaged buildings. A set of appropriate questions has been carefully designed along with a decision rule to infer the damage class of a building from the participant responses to our designed questions.

3.2. Addressing the unreliability in the participant responses

Besides the unreliability associated with the subjective nature of the classification, another issue with the crowdsourcing-based inference of damage states is the varying levels of reliability or expertise among the non-expert participants, which is not known a priori. There is always a possibility that some participants do not provide accurate responses for different reasons, which may lead to an erroneous classification and labeling of buildings. For instance, there may be some participants who provide random responses to the assigned tasks, either out of lack of interest or lack of required expertise. It may also happen that a participant may lose interest in the middle of the assessment and may provide a mix of reliable and unreliable responses to questions. There is even the possibility of some adversarial participants who would always respond with incorrect responses. In general, due to the anonymity of the crowd, it is not possible to specifically point out the unreliable workers from among the participating workers. In addition to this, the level of difficulty associated with each task, which is not known a priori, can negatively affect the reliability, and the degree to which it does so can be different for different participants. For example, in the case of assigning damage levels to the buildings based on their visual imagery, it is very likely that some images may not clearly depict all the details required to assign the damage level to the building correctly.

To address these issues, the present study proposes a probabilistic approach based on inferential statistical analysis and a maximum a posteriori estimation to enable more rigorous and reliable inference of the unknown ground truth of building damage levels from the outcome of crowdsourcing-based damage assessment tasks. First, the responses from the participants to the questions about damage evaluations (microtasks) are collected and mapped into appropriate damage states based on a standard guideline using a suitable decision rule. Then, an iterative stochastic algorithm is used to obtain the maximum likelihood estimation of probability distributions associated with the damage states of each building. Furthermore, this probabilistic inference is performed through a parameterization that also allows obtaining a posterior estimation of the relative reliability associated with each participant, as well as the ambiguity or the level of difficulty associated with each image. The outcome from the proposed model is hence expected to reduce the discrepancy between the crowdsourcing-based damage assessment and unobserved ground truth, for instance, the one based on expert-evaluation. This added reliability is achieved without investing excessive resources and time training the non-expert citizen to perform the damage assessment as in traditional approaches towards participatory damage assessment.

3.3. Mathematical setup

The mathematical framework used for the present study is adapted from [46]. An overview of the mathematical setup in the context of our study is presented here. In the present scenario, there are, say, N buildings in the affected region, each represented by a set of images taken from different directions and angles. Each building belongs to one of the five damage categories (D_k, k = 0, 1, 2, 3, 4) outlined by the HAZUS-MH hurricane model residential damage scale [52]. The aim is to determine the category label of each building by querying from a group of M participants. However, due to the lack of distinct visual characteristics between different damage categories associated with the building, such damage assessment is very subjective in nature. The observed features not only depend on the true features of the damaged building but may also vary depending on (a) the ability or competence of the participant in identifying the actual damage state, and (b) the inherent level of difficulty associated with observing the damage from the given set of image(s). These two factors are quantified in terms of two vector-valued random parameters θ ∈ (−∞, +∞) and 1/ξ ∈ [0, ∞), respectively. θ_i, where i ranges over the number of participants (i = 1, 2, ⋯, M), represents the components of θ, and each component accounts for the relative competence of each participant to make correct observations about the assigned visual data. A large positive value of θ_i indicates that the participant is highly expert in the assigned task and, in the limit θ_i → +∞, always responds with the correct label/observation. On the other hand, a negative value indicates that the participant is adversarial and maliciously provides the incorrect label, contrary to the true label. θ_i = 0 implies that the participant is disingenuous or has no skill or competence to identify the true features in the assigned images and hence provides arbitrary labels. The inherent difficulty in identifying the representative damage features of the true damage state of the given building from the set of provided visual data is quantified in terms of the reciprocal of ξ_j, where j = 1, 2, ⋯, N corresponds to the building in question and ξ_j is always positive. 1/ξ_j = 0 implies that the identification task is very straightforward, and the true damage state of the building can be easily identified from the provided visual data. 1/ξ_j = ∞ indicates that the ambiguity in identifying the true label from the given image(s) is very high. In this case, the probability of misclassification, even for the most reliable participant, is expected to be close to 0.5.

Let us denote by l_ij, i = 1, 2, ⋯, M and j = 1, 2, ⋯, N, the random variable describing the label assigned to building j based on the response of participant i. The ground truth or the true label of building j is described by another random variable denoted as Z_j. Then the probability that the assigned label is the same as the true label can be modeled as the following logistic function:

$$\Pr\left(l_{ij} = Z_j \mid \theta_i, \xi_j\right) = \frac{1}{1 + e^{-\theta_i \xi_j}} \qquad (1)$$

The choice of the logistic function used here is based on the manner in which the underlying parameters defining the task difficulty and subjectivity are defined. Given the ways these parameters are defined, the behavior of the logistic function provides a suitable and consistent mathematical model to map these parameters to the underlying probability. For instance, in the case of more competent or skilled participants with higher θ_i, the probability of labeling correctly is high (close to 1).


As the image difficulty, quantified by $1/\xi_j$, increases, the probability of correct labeling approaches 0.5. Similar behavior is expected when participant competence decreases: as $\theta_i$ approaches 0, the chance that a label is correct drops to 50%. Thus, the logistic function appears to be a suitable and intuitive mathematical form to model the probability of the assigned label being correct in the form of a bi-linear function of the participant competence $\theta_i$ and the task difficulty parameter $\xi_j$. However, one should note that this is not the only choice and that the general framework presented here is not restricted to this mathematical model. For instance, if the uncertainty due to task difficulty and subjectivity were parameterized differently, other candidate models might be suitable for use in this framework.

It is also assumed that the probability that the assigned label/damage state is incorrect ($l_{ij} \neq Z_j$) is uniform. Thus, the probability of incorrect labeling ($k' \neq k$) is expressed as:

$$\Pr\left(l_{ij} = k' \mid Z_j = k, \theta_i, \xi_j\right) = \frac{1}{4}\left[1 - \frac{1}{1 + e^{-\theta_i \xi_j}}\right] \tag{2}$$

Here, the label $l_{ij}$ assigned to building $j$ by participant $i$ is the observed random variable, while the true damage state/label $Z_j$, the participant accuracy $\theta_i$ and the inherent task difficulty $\xi_j$ are the unknown parameters. Given this setting, the aim is to determine a maximum a posteriori (MAP) probability estimate of the unobserved variables $Z_j$, $\theta_i$ and $\xi_j$ given the observed labels from different participants. To find the MAP estimates of these unknown variables, the expectation-maximization (EM) algorithm is used. The application of the EM algorithm in the present case can be summarized in the following two steps:

1. E-Step: Let $l_j$ denote the set of all labels assigned to building $j$ by all the participants who labeled that building. The posterior probabilities of the true damage state of each building, given the observed labels/damage states $l_j$ obtained from the participants and θ, ξ obtained from the previous M-step, can be calculated using:

$$\begin{aligned} \Pr\left(z_j \mid l, \theta, \xi\right) &= \Pr\left(z_j \mid l_j, \theta, \xi_j\right) \propto \Pr\left(z_j \mid \theta, \xi_j\right) \Pr\left(l_j \mid z_j, \theta, \xi_j\right) \\ &\propto \Pr\left(z_j\right) \prod_{i} \Pr\left(l_{ij} \mid z_j, \theta_i, \xi_j\right) \end{aligned} \tag{3}$$

Here the true label is conditionally independent of the parameters θ and ξ, hence $\Pr\left(z_j \mid \theta, \xi_j\right) = \Pr\left(z_j\right)$.

2. M-Step: In this step the expectation of the joint log-likelihood of the observed and the unknown random variables ($l_{ij}$, $Z_j$) is maximized, given the parameters θ and ξ obtained in the previous step. Denoting the expected value of the log-likelihood by Q, this step can be expressed mathematically as:

$$\begin{aligned} Q(\theta, \xi) &= E\left[\ln \Pr(l, z \mid \theta, \xi)\right] \\ &= E\left[\ln \prod_{j} \left(\Pr(z_j) \prod_{i} \Pr(l_{ij} \mid z_j, \theta_i, \xi_j)\right)\right] \\ &= \sum_{j} E\left[\ln \Pr(z_j)\right] + \sum_{ij} E\left[\ln \Pr(l_{ij} \mid z_j, \theta_i, \xi_j)\right] \end{aligned} \tag{4}$$

where the parameters $\theta_i$ and $\xi_j$ are estimated in the previous E-step and $l_{ij}$ are conditionally independent given z, θ and ξ. Denoting $\Pr(z_j = k \mid l, \theta, \xi) = p^{j}(k)$, Q(θ, ξ) can be expanded as:

$$Q(\theta, \xi) = \sum_{j} \sum_{k=1} p^{j}(k) \ln \Pr\left(z_j = k\right) + \sum_{ij} \sum_{k=1} p^{j}(k) \ln \Pr\left(l_{ij} \mid z_j = k, \theta_i, \xi_j\right) \tag{5}$$

Now, from equations (1) and (2), the likelihood function for the label assigned to each building ($l_{ij}$) can be calculated using:

$$\Pr\left(l_{ij} \mid z_j = k, \theta_i, \xi_j\right) = \left[\frac{1}{4}\left(1 - \frac{1}{1 + e^{-\theta_i \xi_j}}\right)\right]^{1 - \delta(l_{ij}, k)} \times \left[\frac{1}{1 + e^{-\theta_i \xi_j}}\right]^{\delta(l_{ij}, k)} \tag{6}$$

where δ denotes the Kronecker delta function. It must be noted that equation (6) is based on the assumption that there is a uniform probability over all incorrect responses. As such, the random variable $l_{ij}$ represents the outcomes of independent Bernoulli trials, each with a success probability given by equation (1) and the probability of incorrect labelling given by equation (2). Equation (6) can then be substituted in the likelihood term in equation (5) to calculate Q(θ, ξ). The values of θ and ξ that maximize Q(θ, ξ) can be obtained by setting the gradient of Q with respect to $\theta_i$ and $\xi_j$ equal to 0 and solving the resulting non-linear equations using iterative methods such as the gradient descent algorithm.

$$\hat{\theta}, \hat{\xi} = \arg\max_{\theta, \xi} Q(\theta, \xi) \tag{7}$$

where $\hat{\theta}$ and $\hat{\xi}$ are the values of the parameters that maximize Q(θ, ξ). The optimum values of the unknown parameters are passed on to the following E-step (equation (3)) to iteratively obtain the posterior probabilities of all the possible damage states for each building, given the observed labels and the quantified unknown parameters. The final posterior probabilities corresponding to each damage state are calculated using equation (3) based on the likelihood term estimated using the optimum values of the unknown parameters θ and ξ. Equation (5) can be easily modified to account for a prior over each $\theta_i$ and $\xi_j$ by adding a log-prior term for each of these variables. The choice of the prior for $\theta_i$ is based on the prior information about the nature and reliability of the participants. For example, if it is known that most of the participants are not adversarial, the prior probability for $\theta_i$ can be made very low for $\theta_i < 0$. On the other hand, if the majority of the participants are known to be adversarial, the prior distribution for $\theta_i$ may be shifted more towards negative values. For $\xi_j$, the prior should have support in the range [0, ∞). The prior for each $z_j$ can also be clamped if, say, it is known a priori that most of the buildings fall in a specific range of damage states. In the absence of such information, however, the prior distribution for $z_j$ can be taken as uniform over the possible damage states. A flowchart illustrating the outlined computations is presented in Appendix A.

4. Case study

4.1. Data source

The post-hurricane visual damage data collected following Hurricane Harvey is used in this study to demonstrate the applicability of the proposed inference model to the post-disaster damage assessment of buildings. Hurricane Harvey made landfall on August 25, 2017, and is considered one of the worst (Category 4) hurricanes to hit the United States in a decade, mainly affecting Texas, Louisiana, and some adjoining regions. The data has been collected by the NSF-supported StEER (Structural Extreme Event Reconnaissance) [53] network, which primarily deals with the systematic collection of perishable data to inform and address disaster-related research questions. The available data is in the form of geotagged images of the inspected buildings along with detailed observations based on the door-to-door survey of each inspected building by different damage inspection teams mobilized by StEER. The database also contains the expert-assigned damage states for each building based on the on-site evaluation and assessment of damages carried out by the expert inspection team [66]. The data is hosted by the Natural Hazards Engineering Research Infrastructure (NHERI) cyberinfrastructure Reconnaissance Portal (https://www.DesignSafe-ci.org). The visual data compiled for the study includes both satellite imagery and geo-tagged ground and aerial photographs of the affected buildings obtained from the abovementioned database. Fig. 1 shows a sample of the visual data compiled for a representative building used in the study. Photographs showing damage in different directions provide more or less comprehensive details about the damage state of the observed building as required in the present study.
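The E-step and M-step summarized in equations (3) to (7) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`p_correct`, `e_step`, `m_step`, `fit`), the learning rate, the iteration counts, the uniform prior on the damage state, the Gaussian log-priors with mean and variance 1, and the substitution ξ_j = exp(η_j) used to keep ξ_j non-negative are all assumptions made for illustration. Labels are assumed to be (participant, building, state) triples with states numbered 0 to K−1, and the M-step uses plain gradient ascent on Q.

```python
import math

def p_correct(theta_i, xi_j):
    """Equation (1): logistic probability that a label is correct."""
    x = max(-30.0, min(30.0, theta_i * xi_j))  # clip to avoid overflow
    return 1.0 / (1.0 + math.exp(-x))

def e_step(labels, theta, xi, n_buildings, n_states):
    """Equation (3) with a uniform prior on z_j (an assumption of this sketch)."""
    log_post = [[0.0] * n_states for _ in range(n_buildings)]
    for i, j, l in labels:
        s = p_correct(theta[i], xi[j])
        for k in range(n_states):
            # Equation (6): s if the label matches state k, else (1 - s)/(K - 1)
            log_post[j][k] += math.log(s if l == k else (1.0 - s) / (n_states - 1))
    post = []
    for row in log_post:
        m = max(row)
        w = [math.exp(v - m) for v in row]
        total = sum(w)
        post.append([v / total for v in w])
    return post

def m_step(labels, post, theta, eta, lr=0.01, n_iter=50, mean=1.0, var=1.0):
    """Gradient ascent on Q (equations (5) and (7)) plus Gaussian log-priors.

    Task difficulty is handled through eta_j with xi_j = exp(eta_j).
    """
    theta, eta = list(theta), list(eta)
    for _ in range(n_iter):
        g_theta = [-(t - mean) / var for t in theta]  # log-prior gradients
        g_eta = [-(e - mean) / var for e in eta]
        for i, j, l in labels:
            xi_j = math.exp(eta[j])
            # Derivative of the expected log-likelihood w.r.t. theta_i * xi_j
            d = post[j][l] - p_correct(theta[i], xi_j)
            g_theta[i] += d * xi_j
            g_eta[j] += d * theta[i] * xi_j
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        eta = [e + lr * g for e, g in zip(eta, g_eta)]
    return theta, eta

def fit(labels, n_participants, n_buildings, n_states=5, n_em=10):
    """Alternate E- and M-steps; return posteriors, theta and xi."""
    theta = [1.0] * n_participants
    eta = [0.0] * n_buildings
    for _ in range(n_em):
        xi = [math.exp(e) for e in eta]
        post = e_step(labels, theta, xi, n_buildings, n_states)
        theta, eta = m_step(labels, post, theta, eta)
    xi = [math.exp(e) for e in eta]
    return e_step(labels, theta, xi, n_buildings, n_states), theta, xi
```

Calling `fit(labels, n_participants, n_buildings)` returns one posterior distribution per building together with the estimated competence and difficulty parameters; a participant who is consistently wrong should end up with a negative θ under these assumptions.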


Fig. 1. Sample of the visual data compiled for a single building showing damage in different directions and parts of the building (Source: https://www.DesignSafe-ci.org).

4.2. The survey questionnaire and decision rule

Based on the visual data available for the inspected buildings, a questionnaire was prepared to ask the participants about the identification of different types of damages observable in the given set of images corresponding to each building. The questions and their possible responses were purposely made simple, and appropriate instructions were provided at the beginning of the survey to acquaint the participants with some common terms used in the questionnaire, in case the participants were not aware. Fig. 2 shows a concise representation of the survey questionnaire distributed among the participants, along with the visual data of the corresponding buildings. The idea is to get more reliable information from the participants by reducing the complexity of the questions such that they can be answered with higher confidence and

Fig. 2. Concise representation of the survey questionnaire distributed among the participants.


ease. The citizen participants were also provided with minimal guidelines about a few technical details involved in the questionnaire to ensure accurate and consistent interpretation of the questions. A decision rule was implemented to map the set of responses received from each participant to the relevant damage state based on the guidelines in the HAZUS-MH hurricane model residential damage scale [52,54]. Some aspects of the guidelines provided in the modified damage scale presented by Friedland [55] are also considered while implementing the decision rule. Fig. 3 presents the HAZUS-MH hurricane model damage scale for residential buildings. This damage assessment scale is widely used in the field since it is easy to implement and yields damage assessment data comparable with the modeled results. It is pertinent to mention here that the damage states presented in Fig. 3 are restricted to wind damage only. While several combined wind and flood damage states have also been proposed [55], the present study has been restricted to buildings with predominantly wind-induced damages. However, the extension to other damage scales is straightforward. The logic rule used for deciding the damage state of the buildings based on the responses from the participants is shown in Appendix A. It is worth noting that crowdsourcing and citizen participation in the context of disaster management is mainly intended to support preliminary damage assessment (PDA) following a disaster, which is a critical component of the post-disaster response and recovery efforts. In later stages of disaster management, the PDA is often followed by more detailed disaster damage evaluation and loss estimations once sufficient funds and resources are allocated. The adopted decision rule in this study is consistent with the standard practice and guidelines for post-disaster PDA. Specifically, the decision rule and the description of damage states used in this work are based on standard FEMA guidelines for single-family residential buildings, generally wooden frame houses, which comprise a large portion of residential construction in the United States. Thus it is important to note that the adopted decision rule is not applicable to other types of buildings such as multistory reinforced concrete buildings or steel structures. However, the core contribution of this research, which is to address the uncertainty and unreliability involved in the outcome of crowdsourcing for post-disaster damage assessment to support more-informed decisions, is general and can be applied to other types of buildings with different proposed decision rules. The presented case study illustrates its applicability in a specific class of buildings.

Fig. 3 divides the damage into 5 damage states, varying between 0 (no damage) and 4 (destruction). A building is considered to be in a higher damage state if any of the shaded damage in the corresponding row occurs [52]. Therefore, if wall damage indicates that a building is in damage state 3 while the roof damage is insignificant, the building will still be classified as in damage state 3. However, a potential challenge in inferring the damage states from the outcome of damage evaluation microtasks is that the potential inaccuracy and inconsistencies in the responses to observational questions may lead to a set of responses that cannot be uniquely mapped, through the decision rule, to any of the standard damage states. This may lead to artifact bias in the outcome of the final assessment. To reduce this bias, in designing and implementing the decision rule, care has been taken to minimize the number of misclassifications that could potentially occur because of varying individual responses. The responses to questions regarding more important and visually distinct observational damage features are weighted more heavily than other features. For example, if a participant responds that a building roof is completely damaged (implying DS-4), but at the same time identifies that around 25% of the roof deck is damaged (implying DS-3), the former is given precedence, and damage state 4 is assigned to the building. Similarly, if a participant responds that the wall is completely or partially collapsed, but the response to the question about the damage to wall-cladding implies moderate or minor damage or that doors and windows are intact, the former is given precedence since it is easier to identify and report extreme damages with more confidence. This is addressed by placing certain checks during the implementation of the decision rule so that the misjudgment and inconsistency of the damage state based on component-wise responses is minimized. In particular, the components that are prominent and easily observable are preferred over others in case of any conflicting responses. For example, if a participant observes that the wall structure is partially damaged but then also responds that the door and window damage is minor, the wall structure damage is considered for the evaluation of the overall preferred damage state. It should also be mentioned that the issue of inconsistency can also be present in the case of field damage assessment by experts, although to a lesser extent. In that case, experts use their judgment to do the final

Fig. 3. Guidelines in HAZUS-MH hurricane model for estimation of damage states for residential buildings. (Source: FEMA (2012), HAZUS-MH 2.1 hurricane model technical manual.)


evaluation. So the outcome can still be prone to subjectivity. The methodology used in this work uses collective responses from the participants to reduce the subjectivity, and consequently the uncertainty, so the outcome would be more reliable.

4.3. Implementation of the stochastic algorithm for reliable participatory damage assessment

Once the participant responses are collected and mapped into the damage state of each inspected building based on the decision rule, the outcome is fed into the proposed model outlined in section 3.3. It is worth noting that using this model for probabilistic inference of final damage states does not require all the participants to label all the buildings or respond to all the questions asked in the survey. Any prior knowledge about the reliability of contributing participants can be incorporated by assuming a prior distribution for the participant competence parameter θ. The proper choice of the prior may be very useful if the overall credibility of the participants is known a priori. Here it is assumed that θ follows a normal distribution with mean and variance equal to 1. This assumption is entirely guided by the prior belief and information about the nature of the participants involved and can be modified based on the available information. The particular choice of prior assumed here does not restrict the generality of the framework; however, its proper and informed selection is necessary for reliable estimates of the desired posterior inferences of the final damage states. Since the present study is based on decomposing the overarching task of overall damage assessment into simple, recognizable, and relatively less subjective microtasks, more participants are expected to be reliable in their responses. Also, given that most of the participants are selected from among graduate students and almost all participants in the study have had at least college-level education, it is comparatively less likely to have adversarial participants. As such it is reasonable to assume the prior shifted towards the right, implying a smaller number of adversarial participants. For the present case, one in every 6 participants (≈ 16% of the total participants) is taken to be adversarial, as implied by the choice of a normal prior distribution with mean and variance of 1. Note that this is not an ad-hoc assumption and is implied by the previous assumption about the choice of the prior distribution for θ. The idea is to point out that the expected number of adversarial participants could also be used as a means to assume a suitable prior. In this case, for the assumed normal prior distribution with mean and variance 1, the probability that θ < 0, i.e., the probability of an adversarial participant, comes out to be approximately 16%, or vice versa. This assumption can be modified based on the nature of participants employed for the study and in case any further participant information is known a priori. Similarly, the prior distribution for the exponent of the image difficulty parameter ξ was also assumed to be a normal distribution with mean and variance of 1. The re-parameterization in terms of the exponent of ξ is taken to ensure that ξ lies in the range [0, ∞). Having defined the observed labels as samples from the distribution given by Equation (1), and the prior distributions for the participant competence parameter θ and image difficulty parameter ξ, our goal is to efficiently search for the most probable values of the unobserved variables Z and the parameters (ξ, θ) given the observed data. This is achieved by implementation of the MAP inference method outlined in the methodology section to estimate θ, ξ, Z.

Fig. 4 illustrates the overall methodology adopted in the present study. The end-product of the presented methodology is the probabilistic description of the unobserved ground truth of damage states of the assessed buildings. This probabilistic inference provides more information and insight into the damage labels as compared to otherwise deterministic inferences. As mentioned in section 4.1, for the database of buildings considered in this study, the expert-assigned labels (damage states) from the post-disaster on-site inspection are available. This information is used as the representation of ground truth for the sake of comparison and analyzing the performance of the model. For the purpose of performance evaluation with respect to the ground truth, the mode (most likely value) of the estimated posterior distribution of the damage states obtained for each building using the proposed model is used.

In addition to the proposed methodology, an alternative approach based on the widely adopted majority-vote heuristic is also tested to determine the correct damage level for each building from the potentially noisy labels observed from the participant responses. In the case of the majority-vote approach, the probability of a building being in each damage state is calculated as the ratio of the number of labels assigned to each damage level to the number of participants who labeled that specific building. The deterministic damage state of a building is chosen to be the damage level voted by the majority of the participants. Although this approach is very simple and intuitive, it can be misleading and give wrong results when the competence level or reliability of the involved annotators or participants is very diverse. While the majority rule can be considered an optimal rule for aggregating the labels in case all the participants are equally competent and reliable, it does not account for varying reliability among the participants, which may be the case when the participants are anonymous and come from diverse backgrounds. In contrast, the damage levels and their corresponding probabilities obtained using the proposed probabilistic model based on MAP estimation account for the varying level of competence and credibility among the participants. The model provides a more reliable and accurate inference of the true damage labels as compared to the majority-vote heuristic. It also quantifies the difficulty associated with each task and hence provides added information that can be leveraged to further understand and enhance the quality of citizen-involved damage assessment.
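The majority-vote baseline described above can be sketched in a few lines. The function name `majority_vote`, the default of five damage states (0 to 4), and the tie-breaking rule (the lowest damage state wins a tie) are assumptions of this sketch; the text does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(labels_for_building, n_states=5):
    """Return (probabilities, modal state) from the labels given to one building.

    The probability of each damage state is the share of participants who
    assigned that state; the deterministic label is the most frequent state.
    """
    counts = Counter(labels_for_building)
    n = len(labels_for_building)
    probs = [counts.get(k, 0) / n for k in range(n_states)]
    # Assumption: ties are resolved in favor of the lowest damage state.
    mode = max(range(n_states), key=lambda k: probs[k])
    return probs, mode
```

For instance, five labels `[3, 3, 4, 2, 3]` give a probability of 0.6 for damage state 3 and 0.2 each for states 2 and 4, regardless of how reliable each participant is.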

Fig. 4. An illustration showing the methodology adopted in the present study.
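The link between the normal prior on θ in section 4.3 and the stated share of adversarial participants can be checked numerically: for θ with mean 1 and variance 1, Pr(θ < 0) = Φ((0 − 1)/1) = Φ(−1). The helper name `normal_cdf` is an assumption of this sketch; it relies only on the standard-library error function.

```python
import math

def normal_cdf(x, mean=0.0, var=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

# Prior probability that a participant is adversarial (theta < 0)
# under the normal prior with mean 1 and variance 1.
p_adversarial = normal_cdf(0.0, mean=1.0, var=1.0)
print(f"{p_adversarial:.3f}")
```

The result is close to 0.16, i.e. roughly one participant in six, which matches the figure quoted in the text. Shifting the prior mean to the right lowers this share.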


5. Results and discussion

A total of 60 buildings with varying levels of damage were selected for the present study. The visual data corresponding to these 60 buildings, in the form of ground images and an aerial satellite view, was distributed among 70 participants along with a questionnaire. While the majority of the participants had a background in civil engineering, some volunteers from other branches of engineering and some non-engineering backgrounds also responded to the damage assessment survey for compiling the crowdsourcing database. Approximately 56% of participants were male, 27% were female and the remainder chose to remain anonymous. Approximately 55% of the participants were in the age group of 20–24 years, 17% were in the age group 25–29 years, and 27% were aged 30–34 years. Around 78% of the participants were graduate-level students, 14% were at the doctorate level or above and around 8% were undergraduates. Each participant was assigned at least 10 buildings and was asked to respond to the questionnaire based on their observations from the provided damage visuals. Based on the responses provided by different participants, the probabilistic description of the underlying damage state of the buildings is inferred. Fig. 5 shows the typical output of the proposed model for making inferences about the damage state of each building from the citizen responses. Although the final deterministic label based on the mode of the inferred distributions is similar in most cases for both methods, the distribution of probabilities across different damage states is more ambiguous in the case of the majority-vote heuristic. The likelihood associated with the mode of the distribution is higher using the MAP-based model. Thus, this approach leads to a more confident description of the damage states as compared with the majority-vote, which tends to distribute the probabilities across all possible damage states. In other words, the majority-vote heuristic tends to assign higher probabilities to the labels other than the true damage state of the buildings. For example, it is observed in Fig. 5 that, when majority-vote is used, the probability of a building being in its true damage state (corresponding to the mode of the distribution) is often less than the probability of not being in the true damage state (corresponding to the sum of all probabilities other than the probability corresponding to the mode). On the other hand, the MAP-based approach assigns higher probabilities to the true damage state and provides better interpretability and confidence in the inferred results. To explore whether the proposed approach based on MAP provides a more accurate representation of ground truth as compared with the commonly used majority-vote heuristic, the final probability distributions of damage states from these two approaches were compared with the labels provided by the experts during the post-disaster field assessment. The expert labels from field surveys were regarded as the benchmark for the ground truth, based on which the accuracy of citizen-based damage assessments was quantified. Since the final outcomes of damage states in the present study are stochastic, i.e., they are represented in terms of probability distributions rather than deterministic labels, the accuracy metric should be defined in terms of the prediction vectors based on the probabilities of being at each damage state.

To that end, a metric based on the normalized mean absolute error (MAE) between the prediction vectors of citizen-based and the expert-assigned damage labels is used. This accuracy metric is mathematically defined as:

$$\text{Accuracy}_{\text{MAE}}\,(\%) = \left(1 - \frac{\sum_{k=0}^{4} |p_k - q_k|}{\max\left(\sum_{k=0}^{4} |p_k - q_k|\right)}\right) \times 100 \tag{8}$$

where $p_k$ and $q_k$ are the probabilities of a building being in damage state k based on the distributions obtained using citizen-based assessment and the expert-assigned labels, respectively. Because of the nature of probability mass functions that are normalized to one, the normalization factor in the denominator of the above equation always takes the value of 4. The accuracy of the damage levels inferred using the citizen-

Fig. 5. Probabilistic inference of the damage levels of buildings based on (a) the MAP-based model and (b) the majority-vote based model for citizen-driven post-disaster damage assessment.


based assessment was evaluated for both approaches, namely the MAP-based and the majority-vote approach. The accuracy of the proposed model based on MAP was calculated as 75.1%, whereas the majority-vote resulted in a slightly lower accuracy of 74.2%. The narrow difference between the performances of these two can be due to the relatively homogeneous and reliable participating crowd used in this study, who are mostly graduate students majoring in civil engineering. This observation is also supported by looking at the distribution of the competence parameter θ characterizing the relative reliability of each participant. Fig. 6 shows the final distribution of θ obtained at the end of the iterations. It is seen that a very small percentage of the total number of participants (about 3%) are adversarial, with values of θ < 0. The range of values of θ is relatively small (note that θ theoretically takes a real value in (−∞, +∞)).

Fig. 6. Distribution of participant competence parameter (θ) for the participants who volunteered in the study.

Next, to test the effect of the number of participants on model performance, a sensitivity analysis was carried out. A subset of the total number of participants was chosen randomly in an incremental fashion to observe the effect of the increasing number of participants on the accuracy of the results. The procedure was repeated multiple times, and the corresponding measure of accuracy was calculated for different numbers of participants associated with each trial. The overall accuracy corresponding to a specific number of participants was calculated as the average of the accuracies obtained across all the trials. Fig. 7 shows the sensitivity of the results with respect to the number of participants. As mentioned previously, because the participants in the present case are relatively reliable and consistent in their assessment, a reasonable level of accuracy can be achieved even with a smaller number of participants. Thus, in this case, the outcome is not very sensitive to the number of participants. This may not be the case when the crowd is more heterogeneous in terms of their level of reliability and, especially, when there are a considerable number of unreliable and/or adversarial participants, which is likely in the case of large-scale crowdsourcing damage assessment.

Fig. 7. Effect of the number of participants on the overall accuracy of the results obtained.

We hypothesize that with the presence of unreliable participants in the experiment, the performance of the majority-vote heuristic is expected to degrade, as it cannot systematically account for and mitigate the bias due to the presence of unreliable participants. To test this hypothesis, additional adversarial participants, who would always respond with incorrect responses, were simulated artificially and incorporated into the study. These additional participants were introduced in increments of 5 participants, ranging from 5 to 55 participants. Their responses represented unreliable assessments and were chosen randomly among the possible damage states other than the ground truth. The effect of an increase in the number of such unreliable participants in the study on the overall accuracy of the damage assessment is shown in Fig. 8. It is observed that as the number of adversarial participants increases among the participating volunteers, the merits of using the proposed MAP inference model over the majority-vote heuristic are evident. While the accuracy of the majority-vote based approach declines with the increasing representation of unreliable participants, the MAP model identifies and accounts for the unreliable labelers, thus maintaining the accuracy at a relatively higher level.

Fig. 8. Effect of increasing number of additional adversarial participants on the accuracy of the proposed MAP-based model in comparison to majority-vote based approach.

With the increasing number of adversarial participants, it is expected that the average value of θ decreases. This is observed from the boxplot of the competence parameter θ values shown in Fig. 9a, which depicts the median line along with the box extending from the first to the third quartile and the whiskers representing the range of values. It can be clearly seen that the median of θ decreases with an increasing number of adversarial participants. A similar trend can be seen with the task difficulty parameter (ξ) in Fig. 9b. With the increase in the number of adversarial participants, the consensus on each task gets more and more skewed, which will be manifested as the increased difficulty in assigning the true damage state. It is worth noting that, although θ and ξ represent two different aspects of the subjectivity of participatory assessment, one expects an inherent interplay between these parameters in the sense that increasing the level of difficulty in evaluating the incurred damage from a set of images naturally leads to a more pronounced discrepancy between responses of reliable and unreliable participants. From a statistical point of view, this should be reflected in the correlation between θ and ξ. Fig. 10 depicts the correlation observed in


Fig. 9. Boxplot showing the variation in the distribution of θ and ξ with increasing number of adversarial participants in the study.

Fig. 10. Plot showing the correlation between the mean values of parameters obtained for different cases.

Fig. 11. Risk evaluation plots obtained using the proposed methodology.

recovery planning and or fund-allocations for mitigation following the


the mean values of the parameters θ and ξ obtained for different cases
disaster event. For example, with reference to Fig. 11, it is observed that
involving varying number of unreliable participants. For the cases with a
the damage in 26% of the total number of buildings are expected to be in
more ambiguous visual description and hence higher task difficulty, the
damage state 4 or more, if we consider the probability of exceedance as
unreliability observed in the participant responses is expected to be
90% or more. Corresponding to the same level of risk (i.e. 90% or more
higher, leading to a lower average participant competence parameter θ.
probability of exceedance), 46% of the buildings are expected to be in or
Similarly, lower task difficulty implying higher values for ξ will result in
exceed damage state 1. Similarly, if the level of risk considered in terms
more reliable participant responses leading to higher values for θ.
of probability of exceedance is 50% or more, 52% of buildings are ex­
pected to be in damage state 1 or more and 30% of the buildings are
5.1. Application in disaster risk assessment expected to belong to or exceed damage state 4. The percentage of
buildings exceeding damage states 2 and 3 can be found likewise. This
A crucial purpose of carrying out the post-disaster damage assess­ not only gives us a probabilistic description of the distribution of
ment is to provide a comprehensive and an objective estimate of the different damage states at a coarser community level but also leads to a
incurred loss, recovery requirements as well as the planning and allo­ more informed and reliable communication of the underlying damage
cation of resources required to overcome the state of crisis. To facilitate scenario. However, it is important to mention that the abovementioned
informed decision making and implementation, the proper communi­ description is based on a specific scenario event at a specific location.
cation of the assessed damage and the uncertainties involved is critical. The probabilities associated with the occurrence and the intensity of the
The outcome of citizen-based post-disaster damage assessment is useful disaster events are not considered here. While exceedance probability
when it can support a risk-informed decision about different aspects of (EP) curves are widely used to quantify the probabilistic occurrence of
disaster management in the affected community. The exceedance disaster events and the associated risks in terms of the intensity of the
probability curve is one way of presenting the damage assessment for disaster event [56–58], the presented approach is based entirely on the
informed decision making. Thus, to show the implication of this study, distribution of damage states in a community based on the uncertainties
the MAP model’s outcome is used to generate the exceedance proba­ involved with the citizen-based assessment of post-disaster damages.
bility curves representing the percentage of buildings having or The aleatory uncertainties associated with the occurrence of disaster
exceeding each damage state with a certain likelihood. These results are event itself is not accounted for.
shown in Fig. 11. Here each plot represents the percentage of buildings
having a given probability or more to exceed the corresponding damage
state. This information can be valuable in prioritizing and optimizing the
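As an illustration of how exceedance curves of this kind could be derived from per-building damage-state probabilities, the following is a minimal sketch rather than the implementation used in the study. It assumes the inferred posterior is available as an array with one row per building and one column per ordered damage state (0 to 4); the function names, the numerical tolerance and the toy probabilities are hypothetical and are not taken from the pilot study.

```python
import numpy as np


def exceedance_probabilities(posterior):
    """Return P(damage state >= k) for every building and every state k.

    posterior: array of shape (n_buildings, n_states); row i holds the
    inferred probabilities of building i being in damage state
    0..n_states-1 (assumed layout, rows sum to 1).
    """
    posterior = np.asarray(posterior, dtype=float)
    # Reverse cumulative sum over the ordered damage states.
    return np.cumsum(posterior[:, ::-1], axis=1)[:, ::-1]


def percent_buildings_exceeding(posterior, state, prob_level, tol=1e-12):
    """Percentage of buildings whose probability of being in `state`
    or higher is at least `prob_level` (tol guards against round-off)."""
    exceed = exceedance_probabilities(posterior)[:, state]
    return 100.0 * float(np.mean(exceed >= prob_level - tol))


# Toy posterior for four buildings (hypothetical values, not study data).
toy_posterior = np.array([
    [0.00, 0.00, 0.00, 0.05, 0.95],
    [0.05, 0.15, 0.20, 0.20, 0.40],
    [0.90, 0.10, 0.00, 0.00, 0.00],
    [0.02, 0.48, 0.30, 0.20, 0.00],
])

if __name__ == "__main__":
    # Share of buildings in damage state 4 with at least 90% probability.
    print(percent_buildings_exceeding(toy_posterior, state=4, prob_level=0.9))  # 25.0
    # Share of buildings in damage state 1 or more with at least 90% probability.
    print(percent_buildings_exceeding(toy_posterior, state=1, prob_level=0.9))  # 75.0
```

Sweeping `prob_level` from 0 to 1 for a fixed `state` traces one curve of the type described for Fig. 11; repeating this for each damage state gives the full set of plots.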


5.2. Limitations and future directions

The presented pilot study shows the application of the proposed citizen-driven approach for reliable assessment of post-disaster damage states of the affected buildings while quantifying the underlying uncertainties. It is worth reiterating that the database used in the study is rather limited, and ideally, this should be done in a large-scale setting. The use of crowdsourcing platforms like Amazon Mechanical Turk (www.mturk.com) or Prolific (www.prolific.com) can further extend the database, which would provide more leverage in exploring the effectiveness of the proposed approach. Such a large-scale experiment will provide an improved understanding of the underlying uncertainties due to the unreliability of participants and subjectivity of the tasks, as well as the degree to which the uncertainty can be reduced by proper design and inference of the damage evaluation survey. The presence of unreliable and adversarial participants will be more pronounced and representative of a community-scale citizen-based setting.

In the presented study, microtasks were formed by splitting the individual task into several simple and observational questions. It should be noted that other strategies may be employed in conducting an initial damage evaluation survey to help reduce the subjectivity of the outcome. One such approach is comparison-based, which relies on the notion of pairwise comparison and ranking [42–44]. The participants can be asked to compare each image to be assessed with at least one among a set of predetermined images which represent different levels of damage or, as an alternative strategy, compare the images among themselves. The outcome can then be fed to an unsupervised learning model (e.g., clustering) to identify final damage levels.

It is also worth noting that most of the images used in the present study were obtained from the database collected and archived during a previous disaster event. The images were most likely intended to serve the purpose of off-site validation and auxiliary data collected during the damage assessment survey. They were not collected for the specific purpose of citizen evaluations. As such, there is significant scope to have better quality visual data, which is more representative of the damage incurred by the buildings. Also, the satellite images for many buildings were not as clear as required. With comprehensive and good quality visual data of each building, the performance of the proposed citizen-centric damage assessment can be significantly enhanced.

Finally, the present study offers scope for many potential extensions and improvements towards realizing a faster and more reliable post-disaster building damage assessment. The probabilistic inferences and results from the large-scale crowd-based assessment of damaged buildings can also be used to augment Artificial Intelligence (AI) and deep learning applications to realize a sustainable and reliable Human-AI partnered solution towards the post-disaster damage assessment of the built environment. Several recent studies have explored the use of AI and deep learning in visual inspections, damage assessment, post-disaster building evaluation etc. [59–65]. The crowd-based assessment can be directly used as the training data for deep learning models, and the stochastic representation of the results, along with the quantification of suggested parameters, can inform the stochastic deep learning models for robust damage identification and assessment.

6. Conclusion

The objective of this study was to improve the process of conducting and inferring citizen assessment of building damage states such that the outcome is more reliable and informative. To that end, first, a micro-tasking scheme in the form of a questionnaire survey was presented for reducing the complexity and subjectivity involved when engaging citizen-participants in post-disaster damage assessment. This was followed by proposing and implementing a probabilistic framework for quantifying reliability and incorporating the underlying uncertainty, leading to a more informed inference of damage assessment outcome. A pilot study was conducted by involving a crowd of 70 participants in conducting damage assessment using the post-hurricane imagery of 60 buildings collected following Hurricane Harvey. The probabilistic descriptions of overall damage based on the HAZUS-MH hurricane model residential damage scale were quantified using the proposed model and were compared with the majority-vote and the available field-assessment data based on expert-evaluation. To show the effectiveness of the proposed method, the performance and accuracy of the model, as the number of artificially-generated adversarial participants increases, were evaluated and compared with the majority-vote approach. The results indicate that the proposed inference method outperforms the majority-vote approach. While the accuracy of the majority-vote based approach declines with the increasing representation of unreliable participants, the proposed probabilistic method accounts for and mitigates the bias due to the unreliable labelers, thus maintaining the accuracy at a relatively higher level. Finally, to show the application in disaster risk assessment, the outcome of the probabilistic framework for reliable participatory damage assessments was used to generate the exceedance probability curves representing the percentage of buildings having or exceeding each damage state with a certain likelihood. With the reliable and enhanced inference of the damage information, the paper presents crowdsourcing as an efficient and reliable alternative to the conventional methods of post-disaster damage assessment. Although the proposed approach is demonstrated here for hurricane-induced damage, it can be applied to different types of natural disasters involving varying damage scenarios as well. The primary purpose of the presented study is to propose a methodology aimed at enhancing the speed and quality of the immediate post-disaster damage assessment in case of natural disasters like hurricanes. The methodology adopted and proposed in the presented study also contributes towards realizing a more resilient and risk-aware community apart from aiding in the emergency response and immediate post-disaster decision making.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

Early discussions on this topic with Dr. Hadi Meidani are gratefully acknowledged. This work was partially supported by faculty startup funds from Texas A&M Engineering Experiment Station. Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing.

Appendix A

Figure A.1 illustrates the mathematical computations involved in the adopted MAP estimation.


Fig. A.1. Flowchart illustrating mathematical calculations

Appendix B

Fig. A2 illustrates the set of decision rules that are implemented for transforming the responses of damage evaluation questions (microtasks) to
FEMA building damage states.


Fig. A2. Logic Rule for building damage assessment

References

[1] E.S. Gol, M.-K. Stein, M. Avital, Crowdwork platform governance toward organizational value creation, J. Strat. Inf. Syst. 28 (2) (2019) 175–195.
[2] D. Nevo, J. Kotlarsky, Crowdsourcing as a strategic IS sourcing phenomenon: critical review and insights for future research, J. Strat. Inf. Syst. 29 (4) (2020) 101593.
[3] A. Wiggins, K. Crowston, From conservation to crowdsourcing: a typology of citizen science, in: 2011 44th Hawaii International Conference on System Sciences, IEEE, 2011, pp. 1–10.
[4] D. Harvey, T.D. Kitching, J. Noah-Vanhoucke, B. Hamner, T. Salimans, A. Pires, Observing Dark Worlds: a crowdsourcing experiment for dark matter mapping, Astronomy and Computing 5 (2014) 35–44.
[5] C.J. Parker, A. May, V. Mitchell, User-centred design of neogeography: the impact of volunteered geographic information on users’ perceptions of online map ‘mashups’, Ergonomics 57 (7) (2014) 987–997.
[6] C. Barreto, D.E. Fastovsky, P.M. Sheehan, A model for integrating the public into scientific research, J. Geosci. Educ. 51 (1) (2003) 71–75.
[7] R.E. McCaffrey, Using citizen science in urban bird studies, Urban Habitats 3 (1) (2005) 70–86.
[8] N. Prutzer, The mapping crowd: macrotask crowdsourcing in disaster response, in: Macrotask Crowdsourcing, Springer, 2019, pp. 253–275.
[9] N. Kankanamge, T. Yigitcanlar, A. Goonetilleke, M. Kamruzzaman, Can volunteer crowdsourcing reduce disaster risk? A systematic review of the literature, International Journal of Disaster Risk Reduction 35 (2019) 101097.
[10] D. Bunker, L. Levine, C. Woody, Repertoires of collaboration for common operating pictures of disasters and extreme events, Inf. Syst. Front 17 (1) (2015) 51–65.
[11] M. Poblet, E. García-Cuesta, P. Casanovas, Crowdsourcing roles, methods and tools for data-intensive disaster management, Inf. Syst. Front 20 (6) (2018) 1363–1379.
[12] M.T. Riccardi, The power of crowdsourcing in disaster response operations, International Journal of Disaster Risk Reduction 20 (2016) 123–128.
[13] S.B. Liu, Crisis crowdsourcing framework: designing strategic configurations of crowdsourcing for the emergency management domain, Comput. Support. Coop. Work 23 (4–6) (2014) 389–443.
[14] T.A. Gurman, N. Ellenberger, Reaching the global community during disasters: findings from a content analysis of the organizational use of Twitter after the 2010 Haiti earthquake, J. Health Commun. 20 (6) (2015) 687–696.
[15] L. Barrington, S. Ghosh, M. Greene, S. Har-Noy, J. Berger, S. Gill, A.Y.-M. Lin, C. Huyck, Crowdsourcing earthquake damage assessment using remote sensing imagery, Ann. Geophys. 54 (6).
[16] S. Sachdeva, S. McCaffrey, D. Locke, Social media approaches to modeling wildfire smoke dispersion: spatiotemporal and social scientific investigations, Inf. Commun. Soc. 20 (8) (2017) 1146–1161.
[17] F. Yuan, R. Liu, Crowdsourcing for forensic disaster investigations: hurricane Harvey case study, Nat. Hazards 93 (3) (2018) 1529–1546.
[18] F. Yuan, R. Liu, Mining social media data for rapid damage assessment during Hurricane Matthew: feasibility study, J. Comput. Civ. Eng. 34 (3) (2020), 05020001.
[19] Z. Song, H. Zhang, C. Dolan, Promoting disaster resilience: operation mechanisms and self-organizing processes of crowdsourcing, Sustainability 12 (5) (2020) 1862.
[20] S. Liu, C. Chen, Y. Lu, F. Ouyang, B. Wang, An interactive method to improve crowdsourced annotations, IEEE Trans. Visual. Comput. Graph. 25 (1) (2018) 235–245.
[21] P. Wais, S. Lingamneni, D. Cook, J. Fennell, B. Goldenberg, D. Lubarov, D. Marin, H. Simons, Towards building a high-quality workforce with mechanical turk, Proceedings of computational social science and the wisdom of crowds (NIPS) (2010) 1–5.
[22] J. Howe, The rise of crowdsourcing, Wired magazine 14 (6) (2006) 1–4.
[23] M.N.K. Boulos, B. Resch, D.N. Crowley, J.G. Breslin, G. Sohn, R. Burtner, W.A. Pike, E. Jezierski, K.-Y.S. Chuang, Crowdsourcing, citizen sensing and sensor web technologies for public and environmental health surveillance and crisis management: trends, OGC standards and application examples, Int. J. Health Geogr. 10 (1) (2011) 1–29.
[24] F. Montargil, V. Santos, Citizen observatories: concept, opportunities and communication with citizens in the first EU experiences, in: Beyond Bureaucracy, Springer, 2017, pp. 167–184.
[25] M. Evers, A. Jonoski, A. Almoradie, L. Lange, Collaborative decision making in sustainable flood risk management: a socio-technical approach and tools for participatory governance, Environ. Sci. Pol. 55 (2016) 335–344.
[26] M. Haklay, Citizen science and volunteered geographic information: overview and typology of participation, in: Crowdsourcing Geographic Knowledge, Springer, 2013, pp. 105–122.
[27] C.J. Lintott, K. Schawinski, A. Slosar, K. Land, S. Bamford, D. Thomas, M.J. Raddick, R.C. Nichol, A. Szalay, D. Andreescu, et al., Galaxy zoo: morphologies derived from visual inspection of galaxies from the sloan digital sky survey, Mon. Not. Roy. Astron. Soc. 389 (3) (2008) 1179–1189.
[28] M.F. Goodchild, J.A. Glennon, Crowdsourcing geographic information for disaster response: a research frontier, International Journal of Digital Earth 3 (3) (2010) 231–241.
[29] R.-Q. Wang, H. Mao, Y. Wang, C. Rae, W. Shaw, Hyper-resolution monitoring of urban flooding with social media and crowdsourcing data, Comput. Geosci. 111 (2018) 139–147.
[30] F. Yuan, R. Liu, Feasibility study of using crowdsourcing to identify critical affected areas for rapid damage assessment: hurricane Matthew case study, International Journal of Disaster Risk Reduction 28 (2018) 758–767.
[31] H. Hao, Y. Wang, Leveraging multimodal social media data for rapid disaster damage assessment, International Journal of Disaster Risk Reduction 51 (2020) 101760.
[32] P. Kumar, Crowdsourcing to rescue cultural heritage during disasters: a case study of the 1966 Florence Flood, International Journal of Disaster Risk Reduction 43 (2020) 101371.
[33] L. See, A review of citizen science and crowdsourcing in applications of pluvial flooding, Front. Earth Sci. 7 (2019) 44.
[34] M. Poblet, E. García-Cuesta, P. Casanovas, Crowdsourcing tools for disaster management: a review of platforms and methods, in: International Workshop on AI Approaches to the Complexity of Legal Systems, Springer, 2013, pp. 261–274.
[35] S. Ghosh, C.K. Huyck, M. Greene, S.P. Gill, J. Bevington, W. Svekla, R. DesRoches, R.T. Eguchi, Crowdsourcing for rapid damage assessment: the global earth observation catastrophe assessment network (GEO-CAN), Earthq. Spectra 27 (S1) (2011) S179–S198.
[36] A. Huynh, M. Eguchi, A.Y.-M. Lin, R. Eguchi, Limitations of crowdsourcing using the EMS-98 scale in remote disaster sensing, in: 2014 IEEE Aerospace Conference, IEEE, 2014, pp. 1–7.
[37] S. Xie, J. Duan, S. Liu, Q. Dai, W. Liu, Y. Ma, R. Guo, C. Ma, Crowdsourcing rapid assessment of collapsed buildings early after the earthquake based on aerial remote sensing image: a case study of yushu earthquake, Rem. Sens. 8 (9) (2016) 759.
[38] F. Dell’Acqua, P. Gamba, Remote sensing and earthquake damage assessment: experiences, limits, and perspectives, Proc. IEEE 100 (10) (2012) 2876–2890.
[39] M.F. Goodchild, L. Li, Assuring the quality of volunteered geographic information, Spatial statistics 1 (2012) 110–120.
[40] L.R. Varshney, A. Vempaty, P.K. Varshney, Assuring privacy and reliability in crowdsourcing with coding, in: 2014 information theory and applications workshop (ITA), IEEE 1–6 (2014).
[41] S. Loos, K. Barns, G. Bhattacharjee, R. Soden, B. Herfort, M. Eckle, C. Giovando, B. Girardot, K. Saito, G. Deierlein, et al., Crowd-sourced remote assessments of regional-scale post disaster damage, in: Eleventh U.S. National Conference on Earthquake Engineering, Earthquake Engineering Research Institute, Los Angeles, California, 2018.
[42] R.A. Bradley, M.E. Terry, Rank analysis of incomplete block designs: I. The method of paired comparisons, Biometrika 39 (3/4) (1952) 324–345.
[43] M.G. Kendall, Further contributions to the theory of paired comparisons, Biometrics 11 (1) (1955) 43–62.
[44] S. Negahban, S. Oh, D. Shah, Rank centrality: ranking from pairwise comparisons, Oper. Res. 65 (1) (2017) 266–287.
[45] V.C. Raykar, S. Yu, L.H. Zhao, G.H. Valadez, C. Florin, L. Bogoni, L. Moy, Learning from crowds, J. Mach. Learn. Res. 11 (4).
[46] J. Whitehill, T.-f. Wu, J. Bergsma, J.R. Movellan, P.L. Ruvolo, Whose vote should count more: optimal integration of labels from labelers of unknown expertise, in: Advances in Neural Information Processing Systems, 2009, pp. 2035–2043.
[47] V.C. Raykar, S. Yu, Annotation models for crowdsourced ordinal data, J. Mach. Learn. Res. 13.
[48] E. Kamar, S. Hacker, E. Horvitz, Combining human and machine intelligence in large-scale crowdsourcing, AAMAS 12 (2012) 467–474.
[49] M. Venanzi, J. Guiver, G. Kazai, P. Kohli, M. Shokouhi, Community-based bayesian aggregation models for crowdsourcing, in: Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 155–164.
[50] A.T. Nguyen, M. Halpern, B.C. Wallace, M. Lease, Probabilistic modeling for crowdsourcing partially-subjective ratings, in: HCOMP, 2016, pp. 149–158.
[51] J. Li, S. Ling, J. Wang, Z. Li, P.L. Callet, GPM: A Generic Probabilistic Model to Recover Annotator’s Behavior and Ground Truth Labeling, arXiv preprint arXiv:2003.00475.
[52] FEMA, HAZUS-MH 2.1 Hurricane Model Technical Manual, 2012.
[53] StEER, ‘Structural extreme event reconnaissance’, URL, https://www.steer.network/, 2020.
[54] P.J. Vickery, J. Lin, P.F. Skerlj, L.A. Twisdale Jr., K. Huang, HAZUS-MH hurricane model methodology. I: hurricane hazard, terrain, and wind load modeling, Nat. Hazards Rev. 7 (2) (2006) 82–93.
[55] C. Friedland, Residential building damage from hurricane storm surge: proposed methodologies to describe, assess and model building damage, Ph.D. thesis, in: Louisiana State University and Agricultural and Mechanical College, 2009.
[56] G. Woo, Natural catastrophe probable maximum loss, Br. Actuar. J. (2002) 943–959.
[57] P. Grossi, Catastrophe Modeling: a New Approach to Managing Risk, vol. 25, Springer Science & Business Media, 2005.
[58] A.B. Khajwal, A. Noshadravan, Probabilistic hurricane wind-induced loss model for risk assessment on a regional scale, ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering 6 (2) (2020), 04020020.
[59] C.M. Yeum, S.J. Dyke, B. Benes, T. Hacker, J. Ramirez, A. Lund, S. Pujol, Postevent reconnaissance image documentation using automated classification, J. Perform. Constr. Facil. 33 (1) (2019), 04018103.
[60] K.R. Nia, G. Mori, Building damage assessment using deep learning and ground-level image data, in: 2017 14th Conference on Computer and Robot Vision (CRV), IEEE, 2017, pp. 95–102.
[61] S. Wang, S.A. Zargar, F.-G. Yuan, Augmented Reality for Enhanced Visual Inspection through Knowledge-Based Deep Learning, Structural Health Monitoring, 2020, 1475921720976986.
[62] F.-G. Yuan, S.A. Zargar, Q. Chen, S. Wang, Machine learning for structural health monitoring: challenges and opportunities, in: Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2020, vol. 11379, International Society for Optics and Photonics, 2020, 1137903.
[63] Z. Mao, Y. Yan, J. Wu, J.F. Hajjar, T. Padlr, Towards automated post-disaster damage assessment of critical infrastructure with small unmanned aircraft systems, in: 2018 IEEE International Symposium on Technologies for Homeland Security (HST), IEEE, 2018, pp. 1–6.
[64] S.A. Zargar, F.-G. Yuan, Impact Diagnosis in Stiffened Structural Panels Using a Deep Learning Approach, Structural Health Monitoring, 2020, 1475921720925044.
[65] C.-S. Cheng, A.H. Behzadan, A. Noshadravan, Deep Learning for Post-hurricane Aerial Damage Assessment of Buildings, Computer-Aided Civil and Infrastructure Engineering (2021).
[66] David B. Roueche, Frank T. Lombardo, Richard J. Krupar III, Daniel J. Smith, Collection of Perishable Data on Wind- and Surge-Induced Residential Building Damage During Hurricane Harvey (TX), in Collection of Perishable Data on Wind- and Surge-Induced Residential Building Damage During Hurricane Harvey (TX), DesignSafe-CI, 2018, https://doi.org/10.17603/DS2DX22.
