Abstract: In data-driven structural health monitoring (SHM), the signals recorded from systems in operation can be noisy and incomplete.
Data corresponding to each of the operational, environmental, and damage states are rarely available a priori; furthermore, labeling to describe
the measurements is often unavailable. In consequence, the algorithms used to implement SHM should be robust and adaptive while accommodating missing information in the training data—such that new information can be included if it becomes available. By reviewing
novel techniques for statistical learning (introduced in previous work), it is argued that probabilistic algorithms offer a natural solution to the
modeling of SHM data in practice. In three case studies, probabilistic methods are adapted for applications to SHM signals, including semisupervised learning, active learning, and multitask learning. DOI: 10.1061/AJRUA6.0001106. © 2020 American Society of Civil Engineers.
Author keywords: Structural health monitoring (SHM); Statistical machine learning; Pattern recognition; Semisupervised learning; Active
learning; Multitask learning; Transfer learning.
1 Dept. of Mechanical Engineering, Univ. of Sheffield, Mappin St., Sheffield S1 3JD, UK (corresponding author). ORCID: https://orcid.org/0000-0002-0225-5010. Email: l.a.bull@sheffield.ac.uk
2 Dept. of Mechanical Engineering, Univ. of Sheffield, Mappin St., Sheffield S1 3JD, UK. ORCID: https://orcid.org/0000-0002-1882-9728. Email: p.gardner@sheffield.ac.uk
3 Dept. of Mechanical Engineering, Univ. of Sheffield, Mappin St., Sheffield S1 3JD, UK. ORCID: https://orcid.org/0000-0002-3433-3247. Email: tim.rogers@sheffield.ac.uk
4 Professor, Dept. of Mechanical Engineering, Univ. of Sheffield, Mappin St., Sheffield S1 3JD, UK. ORCID: https://orcid.org/0000-0001-5204-1910. Email: e.j.cross@sheffield.ac.uk
5 Dept. of Mechanical Engineering, Univ. of Sheffield, Mappin St., Sheffield S1 3JD, UK. Email: n.dervilis@sheffield.ac.uk
6 Professor, Dept. of Mechanical Engineering, Univ. of Sheffield, Mappin St., Sheffield S1 3JD, UK. Email: k.worden@sheffield.ac.uk

Note. This manuscript was published online on November 27, 2020. Discussion period open until April 27, 2021; separate discussions must be submitted for individual papers. This paper is part of the ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, © ASCE, ISSN 2376-7642.

Introduction: Probabilistic SHM

Under the pattern recognition paradigm associated with structural health monitoring (SHM) (Farrar and Worden 2012), data-driven methods have been established as a primary focus of research. Various machine learning tools have been applied in the literature (for example, Vanik et al. 2000; Sohn et al. 2003; Chatzi and Smyth 2009) and used to infer the health or performance state of the monitored system, either directly or indirectly. Generally, algorithms for regression, classification, density estimation, or clustering learn patterns in the measured signals (available for training), and the associated patterns can be used to infer the state of the system in operation, given future measurements (Worden and Manson 2006).

Unsurprisingly, there are numerous ways to apply machine learning to SHM. Notably (and categorized generally), advances have focussed on various probabilistic (e.g., Vanik et al. 2000; Ou et al. 2017; Flynn and Todd 2010) and deterministic (e.g., Bornn et al. 2009; Zhao et al. 2019; Janssens et al. 2018) methods. Each approach has its advantages; however, considering certain challenges associated with SHM data (outlined in the next section), the current work focuses on probabilistic (i.e., statistical) tools: these algorithms appear to offer a natural solution to some key issues, which can otherwise prevent practical implementation. Additionally, probabilistic methods can lead to predictions under uncertainty (Papoulis 1965)—a significant advantage in risk-based applications.

SHM, Uncertainty, and Risk

It should be clear that measured/observed data in SHM will be inherently uncertain to some degree. Uncertainties can enter via experimental sources, including limitations to sensor accuracy, precision, or human error; further uncertainties will be associated with the model—machine learning or otherwise—including parametric variability, model discrepancy, and interpolation uncertainty. Considering the implications of risk, financially and in terms of safety, uncertainty should be mitigated (during data acquisition) and quantified (within models) as far as possible to inform decision-making (Zonta et al. 2014; Cappello et al. 2015). That is, when supporting a financial or safety-critical decision, predictions should be presented with confidence: clearly, a certain prediction, which implies a system is safe to use, differs significantly from an uncertain prediction supporting the same decision. If there is no attempt to quantify the associated uncertainties, there is no distinction between these scenarios.

Various methods can return predictions with confidence (or credibility) (Murphy 2012). The current work focuses on probabilistic models, which—under Kolmogorov's axioms (Papoulis 1965)—allow for predictions under well-defined uncertainty, provided the model assumptions are appropriate.

Probabilistic Approach

Discussions in this work will consider the general strategy illustrated in Fig. 1. That is, SHM is viewed as a multiclass problem, which categorizes measured data into groups, corresponding to the condition of the monitored system. The ith input, denoted by xi, is defined by a d-dimensional vector of variables, which represents an observation of the system, such that xi ∈ R^d. The data labels yi are
© ASCE 03120003-1 ASCE-ASME J. Risk Uncertainty Eng. Syst., Part A: Civ. Eng.
ASCE-ASME J. Risk Uncertainty Eng. Syst., Part A: Civ. Eng., 2021, 7(1): 03120003
used to specify the condition of the system, directly or indirectly. Machine learning is introduced via the pattern recognition model, denoted f(·), and is used to infer relationships between the input and output variables to inform predictive maintenance.

Fig. 1. Simplified framework for pattern recognition within SHM.

The inputs xi are assumed to be represented by some random vector X (in this case, a continuous random vector), which can take any value within a given feature-space 𝒳. The random vector is therefore associated with an appropriate probability density function (p.d.f.), denoted p(·), such that the probability P of X falling within the interval a < X ≤ b is P(a < X ≤ b) = ∫_a^b p(xi) dxi, such that p(xi) ≥ 0 and ∫_𝒳 p(xi) dxi = 1. For a discrete classification problem, the labels yi are represented by a discrete random variable Y, which can take any value from the finite set yi ∈ 𝒴 = {1, …, K}. Note that discrete classification is presented in this work, although SHM is regularly informed by regression models, i.e., yi is continuous; this is application-specific, and most of the motivational arguments remain the same. K is the number of classes defining the (observed) operational, environmental, and health conditions, while 𝒴 denotes the label-space. An appropriate probability mass function (p.m.f.), also denoted p(·), is such that P(Y = yi) = p(yi), where 0 ≤ P(Y = yi) ≤ 1 and Σ_{yi ∈ 𝒴} P(Y = yi) = 1. Note that the context should make the distinction between p.m.f.s and p.d.f.s clear. Further details regarding the probability theory for pattern recognition can be found in a number of well-written textbooks—for example, Murphy (2012), Barber (2012), and Gelman et al. (2013).

Layout

The section "Incomplete Data and Missing Information" summarizes the most significant challenges for data-driven SHM, while the section "New Modes of Probabilistic Inference" suggests probabilistic methods to mitigate these issues. The section "Directed Graphical Models" introduces the theory behind directed graphical models (DGMs), which will be used to introduce each method formally. The section "Case Studies" collects four case studies to highlight the advantages of probabilistic inference. Active learning and Dirichlet process clustering are applied to the Z24 bridge data. Semisupervised learning is applied to data recorded during ground vibration tests of a Gnat aircraft. Multitask learning is applied to simulated and experimental data from shear-building structures.

Note that the applications presented in this study were introduced in previous work by the authors. The related SHM literature is referenced in the descriptions of each mode of inference.

models [that can localize and classify damage, as well as to detect it (Worden and Manson 2006)] is unavailable or not obtained.

For the measurements xi that are available—as well as those that are recorded during operation (in situ)—labels to describe what the signals represent, yi, are rarely at hand. This missing information is usually due to the cost associated with manually inspecting structures (or data), as well as the practicality of investigating each observation. The absence of labels makes defining and updating (multiclass) machine learning models difficult, particularly in the online setting, as it can become difficult to determine if/when novel, valuable information has been recorded and what it represents (Bull et al. 2019b). For example, consider streaming data recorded from a subsea pipeline. Comparisons of measured data to the model might indicate novelty; however, without labels, it is difficult to include this new information in a supervised manner: the measurements might represent another operational condition, abnormal wave loads, actual damage, or some other condition.

New Modes of Probabilistic Inference

New modes of probabilistic inference are being proposed to address challenges with SHM data. Specifically, the algorithms focus on probabilistic frameworks to deal with limited labeled data, as well as incomplete measured data that only correspond to a subset of the expected conditions in situ.

Partially-Supervised Learning

Partially-supervised learning allows multiclass inference in cases in which labeled data are limited. Missing label information is especially relevant to practical applications of SHM: while fully labeled data are often infeasible, it can be possible to include labels for a limited set (or budget) of measurements. Typically, the budget is limited by some expense incurred when investigating the signals; this might include direct costs associated with inspection or loss of income due to downtime (Bull et al. 2020b).

Generally speaking, partially-supervised methods can be used to perform multiclass classification while utilizing both labeled Dl and unlabeled Du signals within a unifying training scheme (Schwenker and Trentin 2014). As such, the training set D becomes

D = Dl ∪ Du (1)
  = {X, y} ∪ X̃ (2)
{X, y} ≜ {xi, yi}_{i=1}^{n} (3)
X̃ ≜ {x̃i}_{i=1}^{m} (4)

Active and semisupervised techniques are suggested—as two variants of partially-supervised learning—to combine/include information from labeled and unlabeled SHM data (Bull et al. 2018, 2019b, 2020b).
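The notation of Eqs. (1)-(4) can be made concrete with a short sketch; Python is used here for illustration, and the array shapes and values are placeholder assumptions, not the dimensions of any dataset in the paper.

```python
import numpy as np

# Training set D = D_l ∪ D_u, as in Eqs. (1)-(4):
# n labeled pairs {x_i, y_i} and m unlabeled observations {x~_i},
# each input a d-dimensional feature vector. Data are synthetic.
rng = np.random.default_rng(0)
n, m, d, K = 20, 100, 4, 3

X = rng.normal(size=(n, d))           # labeled inputs, X
y = rng.integers(1, K + 1, size=n)    # labels y_i ∈ {1, ..., K}
X_tilde = rng.normal(size=(m, d))     # unlabeled inputs, X~

D_l = (X, y)                          # {X, y}  -- Eq. (3)
D_u = X_tilde                         # X~      -- Eq. (4)
```

A partially-supervised learner would then consume both `D_l` and `D_u` within one training scheme, rather than discarding the unlabeled block.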
Arguably, the most simple/intuitive method to introduce unlabeled data is self-labeling (Zhu 2005). In this case, a classifier is trained using Dl, which is used to predict labels for the unlabeled set Du. This defines a new training-set—some labels in D are the ground truth from the supervised data, and the others are pseudo-labels, predicted by the classifier. Self-labeling is simple, and it can be applied to any supervised method; however, the effectiveness is highly dependent on the method of implementation and the supervised algorithm within it (Chapelle et al. 2006).

Generative mixture models offer a formal probabilistic framework to incorporate unlabeled data (Cozman et al. 2003; Nigam et al. 1998). Generative mixtures apply the cluster assumption: if points are in the same cluster, they are likely to be of the same class. Note that the cluster assumption does not necessarily imply that each class is represented by a single, compact cluster; instead, the implication is that observations from different classes are unlikely to appear in the same cluster (Chapelle et al. 2006). Through density estimation (Barber 2012), a mixture of base-distributions can be used to estimate the underlying distribution of the data, p(xi, yi), and unlabeled observations can be included in various ways (McCallum and Nigam 1998; Vlachos et al. 2009). For example, the expectation-maximization (EM) algorithm [used to learn mixture models in the unsupervised case (Murphy 2012)] can be modified to incorporate labeled observations (Nigam et al. 1998; McCallum and Nigam 1998). Fig. 2 demonstrates how a Gaussian mixture, given acoustic emission data (Rippengill et al. 2003), can be improved by considering the surrounding unlabeled examples (via EM).

To summarize, semisupervised methods allow algorithms to learn from the information in the available unlabeled measurements as well as a limited set of labeled data. In practice, semisupervised inference implies that the cost associated with labeling data could be managed in SHM (Chen et al. 2013, 2014), as the information in a small set of labeled signals is combined with larger sets of unlabeled data (Bull et al. 2019c).

Fig. 2. Semisupervised GMM for three-class AE data: (a) supervised learning, given the labeled data only (closed circle markers); and (b) semisupervised learning, given the labeled and unlabeled data (closed circle/open circle markers). [Adapted from Bull (2019a).]

Active Learning

Active learning is an alternative partially-supervised method; the key hypothesis is that an algorithm can provide improved performance, using fewer training labels, if it is allowed to select the data from which it learns (Settles 2012). As with semisupervised techniques, the learner utilizes Dl and Du; however, active algorithms query/annotate the unlabeled data in Du to extend the labeled set Dl. Therefore, an active learner attempts to define an accurate mapping, f: 𝒳 ↦ 𝒴, while keeping queries to a minimum (Dasgupta 2011); general (and simplified) steps are illustrated in Fig. 3.

Fig. 3. General/simplified active learning heuristic.

The critical step for active algorithms is how to select the most informative signals to investigate (Wang et al. 2017; Schwenker and Trentin 2014). For example, query by committee (QBC) methods build an ensemble/committee of classifiers using a small, initial (random) sample of labeled data, leading to multiple predictions for unlabeled instances. Observations with the most conflicted label predictions are viewed as informative, and thus, they are queried (Wang et al. 2017). On the other hand, uncertainty sampling usually refers to a framework that is based around a single classifier (Kremer et al. 2014; Settles 2012) in which signals with the least confident predicted label, given the model, are queried. (It is acknowledged that QBC methods can also be viewed as a type of uncertainty sampling.) Uncertainty sampling is (perhaps) most interpretable when considering probabilistic algorithms, as the posterior probability over the class-labels p(yi | xi) can be used to quantify uncertainty/confidence (Bull et al. 2020c). For example, consider a binary (two-class) problem: intuitively, uncertain samples could be instances whose posterior probability is nearest to 0.5 for both classes. This view can be extended to multiple (>2) classes using the Shannon entropy (MacKay 2003) as a measure of uncertainty; i.e., high entropy (uncertain) signals given the GMM of the acoustic emission data (Rippengill et al. 2003) are illustrated in Fig. 4(a).

In summary, as label information is limited by cost implications in practical SHM (Bull et al. 2019a), active algorithms can be utilized to automatically administer the label budget by selecting the most informative data to be investigated, such that the performance of predictive models is maximized (Bull et al. 2019d).

Fig. 4. Uncertainty sampling for the AE data: right, left, and down arrow markers show the training set, and the closed circle markers show the unlabeled data; the circles indicate queries by the active learner: (a) based on entropy; and (b) based on likelihood. [Adapted from Bull (2019a).]
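The self-labeling scheme described earlier can be sketched as follows. A nearest-centroid classifier stands in for the supervised base learner (any supervised method could be substituted), and the two well-separated classes are synthetic, not the AE dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_centroids(X, y):
    """Supervised step: class centroids from labeled data."""
    return {k: X[y == k].mean(axis=0) for k in np.unique(y)}

def predict(centroids, X):
    """Assign each row of X to the nearest class centroid."""
    keys = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[k], axis=1) for k in keys])
    return np.array(keys)[np.argmin(dists, axis=0)]

# Few labeled points (D_l), many unlabeled points (D_u)
X_l = np.vstack([rng.normal(-3, 1, (5, 2)), rng.normal(3, 1, (5, 2))])
y_l = np.array([1] * 5 + [2] * 5)
X_u = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])

model = fit_centroids(X_l, y_l)
pseudo = predict(model, X_u)                        # pseudo-labels for D_u
model = fit_centroids(np.vstack([X_l, X_u]),        # retrain on D_l ∪ D_u
                      np.concatenate([y_l, pseudo]))
```

As the surrounding text notes, the quality of the result hinges on the base learner: pseudo-label errors are fed back into training, so self-labeling works best when the initial classifier is already reasonable.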
Dirichlet Process Mixture Models for Nonparametric Clustering

Dirichlet process (DP) mixture models (Neal 2000) offer another probabilistic framework to deal with limited labels as well as incomplete data a priori. The DP is suggested as an (unsupervised) Bayesian algorithm for nonparametric clustering, used to perform inference online such that the need for extensive training data (before implementing the SHM strategy) is mitigated (Rogers et al. 2019). As such, unlike partially-supervised methods, labels are always an additional latent variable (they are never observed); thus, the ground truth of yi is not known during inference. However, label information has the potential to be incorporated, either within
Fig. 6. Visualization of knowledge transfer via domain adaptation. Ellipses represent clusters of data: (a and b) are the source and target domains,
respectively, in their original sample spaces; and (c) shows the source and target data mapped into a shared, more consistent latent space.
applied in the projected latent space of Fig. 6(c) should generalize to the target structure, despite missing label information.

Multitask learning considers shared information from an alternative perspective. As with domain adaptation, knowledge from multiple domains is used to improve tasks (Pan and Yang 2010); however, in this case, each domain is weighted equally (Zhang and Yang 2018). Therefore, the goal is to generate an improved predictive function f(·) across multiple tasks by utilizing labeled feature data from several different source domains. This approach to inference is particularly useful when labeled training data are insufficient across multiple tasks or systems. By considering the shared knowledge across various labeled domains, the amount of training data can, in effect, be increased.

This work suggests kernelized Bayesian transfer learning (KBTL) (Gönen and Margolin 2014) to model shared information. KBTL is a particular form of multitask learning, which can be viewed as a method for heterogeneous transfer; i.e., at least one feature space 𝒳j for a domain Dj is not the same dimension as another feature space 𝒳k (in the set of domains), such that dj ≠ dk (Gardner et al. 2020c). KBTL is a probabilistic method that performs two tasks: (1) finding a shared latent subspace for each domain; and (2) inferring a discriminative classifier in the shared latent subspace in a Bayesian manner. It is assumed that there is a relationship between the feature space and the label space for each domain and that all domains provide knowledge that will improve the predictive function f(·) for all domains (Gardner et al. 2020c).

In practice, methods such as KBTL should be particularly useful for SHM, as the (labeled) training data are often insufficient or incomplete across structures. If, through multitask/transfer learning, tasks from different structures can be considered together, this should increase the amount of information available to train algorithms. In turn, this should increase the performance of predictive models, utilizing the shared information between systems.

Directed Graphical Models

It will be useful to introduce basic concepts behind directed graphical models (DGMs), as these will be used to (visually) introduce each probabilistic algorithm. The terminology in this study follows that of Murphy (2012). Generally speaking, DGMs can be used to represent the joint distribution of the variables in a statistical model by making assumptions of conditional independence. For these ideas to make sense, the chain rule is needed; that is, the joint distribution of a probabilistic model can be represented as follows, using any ordering of the variables {X1, X2, …, XV}:

p(X_{1:V}) = p(X1) p(X2 | X1) p(X3 | X1, X2) … p(XV | X_{1:V−1}), where X_{1:V} ≜ {X1, X2, …, XV} (5)

In practice, a problem with Eq. (5) is that it becomes difficult to represent the conditional distribution p(XV | X_{1:V−1}) as V gets large. Therefore, to efficiently approximate large joint distributions, assumptions of conditional independence in Eq. (6) are critical. Specifically, conditional independence is denoted with ⊥, and it implies that

A ⊥ B | C ↔ p(A, B | C) = p(A | C) p(B | C) (6)

Considering these ideas, nodes in a graphical model can be used to represent variables, while edges represent conditional dependencies. For example, for the AE data [in Figs. 2, 4, or 5(a)], one can consider a random vector xi to describe the (two-dimensional)
DGMs; for details behind each algorithm, the reader is referred to
the SHM application papers (Bull et al. 2019b, 2020b; Rogers et al.
2019; Gardner et al. 2020a, c).
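The chain rule of Eq. (5) and the conditional-independence factorization of Eq. (6) can be checked numerically for a small discrete model; the conditional probability tables below are arbitrary placeholders, chosen only so that A ⊥ B | C holds by construction.

```python
import numpy as np

# Three binary variables with A ⊥ B | C
p_C = np.array([0.4, 0.6])                 # p(C)
p_A_given_C = np.array([[0.2, 0.8],        # rows index C = 0, 1
                        [0.7, 0.3]])       # p(A | C)
p_B_given_C = np.array([[0.5, 0.5],
                        [0.1, 0.9]])       # p(B | C)

# Chain rule, Eq. (5): p(A, B, C) = p(C) p(A | C) p(B | C)
joint = np.einsum('c,ca,cb->abc', p_C, p_A_given_C, p_B_given_C)

# Recover p(A, B | C) by dividing out p(C), and verify Eq. (6)
p_AB_given_C = joint / joint.sum(axis=(0, 1))
factorized = np.einsum('ca,cb->abc', p_A_given_C, p_B_given_C)
```

The point of the DGM is exactly this economy: the full joint over V binary variables needs 2^V − 1 numbers, while the factorized form above needs only the small conditional tables.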
p(μk, Σk) = NIW(μk, Σk; m0, κ0, ν0, S0) (14)

This introduces the hyperparameters {m0, κ0, ν0, S0} associated with the prior, which can be interpreted as follows: m0 is the prior mean for the location of each class μk, and κ0 determines the strength of the prior; S0 is (proportional to) the prior mean of the covariance, Σk, and ν0 determines the strength of that prior (Murphy 2012). Considering that the streaming data will be normalized (online), it is reasonable that hyperparameters are defined such that the prior belief states that each class is represented by a zero-mean and unit-variance Gaussian distribution. For the mixing proportions, the conjugate prior is a Dirichlet (Dir) distribution, parameterized by α, which encodes the prior belief of the mixing proportion (or weight) of each class. In this case, each class is assumed equally weighted a priori for generality—although care should be taken when setting this prior, as it is application-specific, particularly for streaming data (Bull et al. 2019b)

p(λ) = Dir(λ; α) ∝ ∏_{k=1}^{K} λk^{αk − 1} (15)

α ≜ {α1, …, αK} (16)

With this information, the joint distribution of the model p(xi, yi, θ) can be approximated, such that p(X, y, θ) = ∏_{i=1}^{n} p(xi, yi, θ). The associated DGM can be drawn, including conditional dependencies and hyperparameters, for n (supervised) training data in Fig. 8.

Fig. 8. Directed graphical model for the GMM p(xi, yi, θ) over the labeled data Dl. As training data are supervised, both xi and yi are observed variables. Shaded and white nodes are the observed and latent variables, respectively, the arrows represent conditional dependencies, and the dots represent constants (i.e., hyperparameters). [Adapted from Bull (2019a).]

Having observed the labeled training data Dl = {X, y}, the posterior distributions can be defined by applying Bayes' theorem to each conjugate pair, where Xk denotes the observations xi ∈ X with the labels yi = k

p(μk, Σk | Xk) = p(Xk | μk, Σk) p(μk, Σk) / p(Xk) (17)

p(λ | y) = p(y | λ) p(λ) / p(y) (18)

In general terms, while the prior p(θ) was the distribution over the parameters before any data were observed, the posterior distribution p(θ | Dl) describes the parameters given the training data (i.e., conditioned on the training data). Conveniently, each of these has analytical solutions (Barber 2012; Murphy 2012).

Active Sampling

To use the DGM to query informative data recorded from the motorway bridge, an initial model is learned, given a small sample of data recorded at the beginning of the monitoring regime. In this case, it should be safe to assume the labels yi = 1, which corresponds to the normal condition of the structure. As new (unlabeled) measurements arrive online, denoted x̃i, the model can be used to predict the labels under uncertainty. The predictive equations are found by marginalizing (integrating) out the parameters from the joint distribution (for each conjugate pair)

p(x̃i | ỹi = k, Dl) = ∫∫ p(x̃i | μk, Σk) p(μk, Σk | Dl) dμk dΣk (19)

[where the posterior p(μk, Σk | Dl) is given by Eq. (17)]

p(ỹi | Dl) = ∫ p(ỹi | λ) p(λ | Dl) dλ (20)

[where the posterior p(λ | Dl) is given by Eq. (18)]

Again, due to conjugacy, these have analytical solutions (Murphy 2012). The posterior predictive Eqs. (19) and (20) can be combined to define the posterior over the label estimates given unlabeled observations of the bridge

p(ỹi | x̃i, Dl) = p(x̃i | ỹi, Dl) p(ỹi | Dl) / p(x̃i | Dl) (21)

Considering the predictive distribution Eq. (21), labels that appear most uncertain can be investigated by the engineer. This observation is now labeled {xi, yi}, thus extending the (supervised) training set Dl. Two measures of uncertainty are considered: (1) the marginal likelihood of the new observation given the model [the denominator of Eq. (21)]; and (2) the entropy of the predicted label, given by

H(ỹi) = − Σ_{k=1}^{K} p(ỹi = k | x̃i, Dl) log p(ỹi = k | x̃i, Dl) (22)

Queries with high entropy consider data at the boundary between two existing classes, while queries given low likelihood will select data that appear unlikely given the current model estimate. Visual examples of data that would be selected given these measures are shown in Fig. 4(a) for high entropy and Fig. 4(b) for low likelihood.

Fig. 9 demonstrates how streaming SHM signals might be queried using these uncertainty measures. The (unlabeled) data arrive online in batches of size B; the data that appear most uncertain (given the current model) are investigated. The number of investigations per batch qb is determined by the label budget, which, in turn, is limited by cost implications. Once labeled by the engineer, these data can be added to Dl and used to update the classification model.

Z24 Bridge Dataset

The Z24 bridge was a concrete highway bridge in Switzerland, connecting the villages of Koppigen and Utzenstorf. Before its demolition in 1998, the bridge was used for experimental SHM purposes (de Roeck 2003). Over a 12-month period, a series of sensors were used to capture dynamic response measurements to extract the first four natural frequencies of the structure. Air/deck temperature, humidity, and wind speed were also recorded (Peeters and de Roeck 2001). There are a total of 3,932 observations in the dataset.

Before demolition, different types of damage were artificially introduced, starting from observation 3,476 (Dervilis et al. 2014). The natural frequencies and deck temperature are shown in Fig. 10.
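The entropy measure of Eq. (22) and the per-batch query selection can be sketched as follows; the posterior probabilities are illustrative placeholders, not outputs of the actual GMM, and the batch size and budget are arbitrary.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy, Eq. (22), of each row of class probabilities
    p(y~_i = k | x~_i, D_l)."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

# Posterior label probabilities for a batch of B = 4 observations, K = 3
posterior = np.array([[0.98, 0.01, 0.01],   # confident -> low entropy
                      [0.34, 0.33, 0.33],   # conflicted -> high entropy
                      [0.80, 0.15, 0.05],
                      [0.50, 0.49, 0.01]])

H = entropy(posterior)
q_b = 2                                     # label budget per batch
queries = np.argsort(H)[::-1][:q_b]         # most uncertain investigated first
```

The queried indices are then passed to the engineer for labeling, and the labeled pairs are appended to Dl before the model is updated on the next batch.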
Fig. 9. Flow chart to illustrate the online active learning process. [Adapted from Bull et al. (2019b).]
Visible fluctuations in the natural frequencies can be observed in Fig. 10, for 1,200 ≤ n ≤ 1,500, while there is little variation following the introduction of damage at observation 3,476. It is believed that the asphalt layer in the deck experienced very low temperatures during this time, leading to increased structural stiffness.

In the analysis, the four natural frequencies are the observation data, such that xi ∈ R^4. The damage data are assumed to represent their own class from observation 3,476. Outlying observations within the remaining dataset are determined using the robust minimum covariance determinant (MCD) algorithm (Rousseeuw and Driessen 1999; Dervilis et al. 2014). In consequence, a three-class classification problem is defined, according to Fig. 10: normal data, outlying data due to environmental effects, and damage data, corresponding to yi ∈ {1, 2, 3}, respectively.

Clearly, it is undesirable for an engineer to investigate the bridge following each data acquisition. Therefore, if active learning can provide an improved classification performance, compared to passive learning (random sampling) with the same sample budget, this demonstrates the relevance of active methods to SHM.

Results: Active Learning

The model is applied online to the frequency data from the Z24 bridge. To provide an online performance metric, the dataset is divided into two equal subsets: one is used for training and querying by the active learner, {Dl, Du}; the other is used as a distinct/independent test set. The f1 score is used as the performance metric (throughout this work). This is a weighted average of precision and recall (Murphy 2012), with values between 0 and 1; a perfect score corresponds to f1 = 1. Precision (P) and recall (R) can be defined in terms of numbers of true positives (TP), false positives (FP), and false negatives (FN) for each class, k ∈ 𝒴 (Murphy 2012)

Pk = TPk / (TPk + FPk) (23a)

Rk = TPk / (TPk + FNk) (23b)

The (macro) f1 score is then defined by (Murphy 2012)

f1,k = 2 Pk Rk / (Pk + Rk) (24a)

f1 = (1/K) Σ_{k ∈ 𝒴} f1,k (24b)
Fig. 11. Online classification performance (f1 score) for the Z24 data, for query budgets of (a) 25%; and (b) 12.5% of the total dataset. [Adapted from
Bull et al. (2019b).]
and additional SHM applications, refer to the study by Bull et al. (2019b). Code and animations of uncertainty sampling for the Z24 data are available (Bull 2019b).

Semisupervised Updates to Gaussian Mixture Models

While active learning considered the unlabeled data D_u for querying, the observations only contribute to the model once labeled, i.e., once included in the labeled set D_l. However, a semisupervised model can consider both the labeled and unlabeled data when approximating the parameters. Therefore, θ is estimated given both labeled and unlabeled observations, such that the posterior becomes p(θ | D_l, D_u). This is advantageous for SHM, as unlabeled observations can also contribute to the model estimate, reducing the dependence on costly supervised data. Continuing the probabilistic approach, the original DGM in Fig. 8 can be updated (relatively simply) to become semisupervised (Fig. 12). The inclusion of D_u introduces another latent variable ỹ_i, and, as a result, obtaining the posterior distribution over the parameters becomes less simple. One solution adopts an expectation-maximization (EM) approach (Dempster et al. 1977). The implementation in this study involves finding the maximum a posteriori (MAP) estimate of the parameters θ̂ (the mode of the full posterior distribution) while maximizing the likelihood of the model. Specifically, from the joint distribution, and using Bayes' theorem, the MAP estimate of the parameters θ given the labeled and unlabeled subsets is

θ̂ | D = argmax_θ [ p(D | θ) p(θ) / p(D) ]
       = argmax_θ [ p(D_u | θ) p(D_l | θ) p(θ) / p(D_u, D_l) ],   D ≜ D_u ∪ D_l   (25)

Again, it is assumed that the data are i.i.d., so that D_l and D_u can be factorized. Thus, the marginal likelihood of the model [the denominator of Eq. (25)] considers both the labeled and unlabeled data. This is referred to as the joint likelihood, and it is the value that is maximized while inferring the parameters of the model through EM.

The EM algorithm iterates E and M steps until convergence in the joint (log) likelihood. During each E-step, the parameters are fixed, and the unlabeled observations are classified using the current model estimate p(ỹ | X̃, D). The M-step corresponds to finding θ̂, given the predicted labels from the E-step and the absolute labels for the supervised data. This involves some minor modifications to the conventional MAP estimates, such that the contribution of the unlabeled data is shared between classes, weighted according to the posterior distribution p(ỹ | X̃, D) (Barber 2012; Bull et al. 2020b). Pseudocode is provided in Algorithm 1; MATLAB (version 2019b) code for the semisupervised GMM is also available (Bull 2019c).
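The E and M steps described above can be sketched in a few lines. The following Python code is a simplified one-dimensional illustration (maximum-likelihood updates are shown for brevity, whereas the study uses MAP estimates with conjugate priors):

```python
import numpy as np

def semisupervised_gmm(x_l, y_l, x_u, K, n_iter=50):
    """EM sketch for a 1-D, K-component GMM using labelled data (x_l, y_l)
    and unlabelled data x_u. Labelled points carry fixed one-hot
    responsibilities; unlabelled points carry posterior responsibilities
    computed in each E-step, so their contribution is shared between
    classes during the M-step."""
    x = np.concatenate([x_l, x_u])
    r = np.zeros((len(x), K))
    r[np.arange(len(x_l)), y_l] = 1.0  # absolute labels, never updated
    mu = np.array([x_l[y_l == k].mean() for k in range(K)])
    var = np.ones(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: classify the unlabelled points under the current model
        lik = np.stack([pi[k]
                        * np.exp(-0.5 * (x_u - mu[k]) ** 2 / var[k])
                        / np.sqrt(2 * np.pi * var[k])
                        for k in range(K)], axis=1)
        r[len(x_l):] = lik / lik.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    return mu, var, pi
```

Because the labelled responsibilities are clamped, the supervised data anchor each component, while the unlabelled data refine the estimates; this is the mechanism by which D_u reduces the dependence on costly labels.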
Semisupervised Learning with the Gnat Aircraft Data

A visual example of improvements to a GMM via semisupervision was shown in Fig. 2. To quantify potential advantages for SHM, the method is also applied to experimental data from aircraft experiments, originally presented by Bull et al. (2020b). For details behind the Gnat aircraft data, refer to the study by Manson et al. (2003). Briefly, during the tests, the aircraft was excited with an electrodynamic shaker and band-limited white noise. Transmissibility data were recorded using a network of sensors distributed over the wing. Artificial damage was introduced by sequentially removing one of nine inspection panels in the wing. A total of 198 measurements were recorded for the removal of each panel, such that the total number of (frequency-domain) observations was 1,782. Over the network of sensors, nine transmissibilities were recorded (Manson et al. 2003). Each transmissibility was converted to a one-dimensional novelty detector, with reference to a distinct set of normal data, where all the panels were intact (Worden et al. 2008). Therefore, the data represent a nine-class classification problem, one class for the removal of each panel, such that y_i ∈ {1, ..., 9}. The measurements are nine-dimensional, x_i ∈ R⁹; each feature is a novelty index, representing one of the nine transmissibilities.

When applying semisupervised learning, 1/3 of the total data were set aside as an independent test set. The remaining 2/3 were used for training, i.e., D = D_l ∪ D_u. Of the training data D, the number of labeled observations n was increased (in 5% increments) until all the observations were labeled. The results are compared to standard supervised learning for the same budget n. The changes in the classification performance through semisupervised updates are shown in Fig. 13; the inclusion of the unlabeled data consistently improves the f_1 score. For very low proportions of labeled data, <1.26% (m > n), semisupervised updates can decrease the predictive performance; this is likely due to the unlabeled data outweighing the labeled instances in the likelihood cost function. Notably, the maximum increase in the f_1 score is 0.0405, corresponding to a 3.83% reduction in the classification error for the 2.94% labeled data. Such improvements to the classification performance for low proportions of labeled data should highlight significant advantages for SHM, reducing the dependence on large sets of costly supervised data.

Dirichlet Process Clustering of Streaming Data

Returning to the streaming data recorded from the Z24 bridge, an alternative perspective considers that labels are not needed to infer the model. In this case, an unsupervised algorithm could be used to cluster data online, and labels could be assigned to the resulting clusters outside of the inference, within the wider SHM scheme, as suggested by Rogers et al. (2019). However, if y_i is unobserved for the purposes of inference, the number of class components K becomes an additional latent variable, unlike the GMM from previous case studies.

As aforementioned, the Dirichlet process Gaussian mixture model (DPGMM) is one solution to this problem. The DPGMM allows for the probabilistic selection of K through a Dirichlet process prior. Initially, this involves defining a GMM in a Bayesian manner, using the same priors as before; however, by following Rasmussen (2000), it is possible to take the limit K → ∞ to form an infinite Gaussian mixture model. Surprisingly, this concept can be shown through another simple modification to the first DGM in Fig. 8, leading to Fig. 14. The generative equations remain the same as Eqs. (9), (10), (14), and (15).

A collapsed Gibbs sampler can be used to perform efficient online inference over this model (Neal 2000). Although potentially faster algorithms for variational inference exist (Blei and Jordan 2006), it can be more practical to implement the Gibbs sampler when performing inference online. The nature of the Gibbs sampling solution is that each data point is assessed conditionally in the sampler, which allows the addition of new points online, rather than batch updates (Rogers et al. 2019).

Within the Gibbs sampler, only components k ∈ {1, ..., K + 1} need to be considered to cover the full set of possible clusters
Fig. 13. Classification performance (f_1 score) for the supervised GMM versus the semisupervised GMM: (a) f_1 for an increasing proportion of labeled data; and (b) gain in f_1 score through semisupervised updates; the horizontal line highlights zero gain. [Adapted from Bull et al. (2020b).]
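The novelty-index features used for the Gnat data summarize each transmissibility as a single discordance value relative to the normal condition. A common construction for such features is a Mahalanobis-type measure with respect to the normal-condition reference set; the sketch below is a generic illustration, not the exact detector of Worden et al. (2008):

```python
import numpy as np

def novelty_index(X_normal, x):
    """Mahalanobis-distance novelty index: discordance of a new feature
    vector x with respect to a reference set of normal-condition data
    X_normal (rows are observations). Large values indicate departure
    from the normal condition."""
    mu = X_normal.mean(axis=0)
    cov = np.cov(X_normal, rowvar=False)
    diff = x - mu
    # Solve rather than invert, for numerical stability
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))
```

Applying one such detector per transmissibility yields the nine-dimensional novelty-index features x_i ∈ R⁹ described above.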
in an online manner, and thus, the hyperparameters of the prior p(μ, Σ) encode this knowledge. The choice of the dispersion value α, defining p(λ), is more application dependent, as discussed in the restaurant analogy; this determines the likelihood that new clusters will be generated. In the study by Rogers et al. (2019), sensible values for online SHM applications were found to lie in the range 0 < α < 20; for the Z24 data, this is set to α = 10. As with the active GMM, a small set of data from the start of the monitoring regime makes up an initial training set. Fig. 15 shows the algorithm's progress for the streaming data. A normal-condition cluster is quickly established. As the temperature cools, three more clusters are created, corresponding to the progression of freezing of the deck. Two additional clusters are also created: one around point
Fig. 15. Figure showing online DP clustering applied to the Z24 bridge data using the first four natural frequencies as the features. Vertical lines
indicate that a new cluster has been formed. [Adapted from Rogers et al. (2019).]
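The conditional assignment step within the sampler can be illustrated as follows. This sketch scores an incoming observation against the K existing clusters plus one candidate new cluster; for brevity, fixed-variance Gaussian likelihoods stand in for the full DPGMM predictive (Student-t) distributions, and the broad prior variance of 10.0 is an assumed placeholder:

```python
import numpy as np

def assignment_probs(x, clusters, alpha, var=1.0):
    """Conditional assignment step of a collapsed Gibbs sampler (sketch).
    An incoming 1-D point x is assessed against the K existing clusters
    plus one potential new cluster:
        p(z = k)     ∝ n_k   * p(x | cluster k)   for existing clusters,
        p(z = K + 1) ∝ alpha * p(x | prior)       for a new cluster,
    so only K + 1 components ever need to be evaluated."""
    means = np.array([np.mean(c) for c in clusters])
    counts = np.array([len(c) for c in clusters])
    lik = np.exp(-0.5 * (x - means) ** 2 / var) / np.sqrt(2 * np.pi * var)
    # Likelihood under the (broad) prior, governing new-cluster creation
    prior_var = var + 10.0
    prior_lik = np.exp(-0.5 * x ** 2 / prior_var) / np.sqrt(2 * np.pi * prior_var)
    weights = np.append(counts * lik, alpha * prior_lik)
    return weights / weights.sum()
```

The dispersion value α appears only in the new-cluster weight, which is why it directly controls how readily the model opens a new cluster for streaming observations.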
shown to categorize multiple damaged and undamaged states while automatically inferring an appropriate number of mixture components K. The method requires little user input, and it updates online with simple feedback to the user as to when an inspection is likely required. If desired, the unsupervised clusters can be assigned meaningful descriptions to be interpreted by the end-user.

Multitask Learning

In the final case study, supervised data from different structures (each represented by their own domain) are considered simultaneously, with the data from each domain projected into a shared latent subspace, using common classification parameters. In terms of notation, the kernel embedding for each domain K_t is projected into a subspace, {H_t = A_t⊤ K_t}, t = 1, ..., T. In this shared space, a coupled discriminative classifier is inferred for the projected data from each domain, {f_t = H_t⊤ w + 1b}, t = 1, ..., T. This implies the same set of parameters {w, b} is used across all tasks.

In a Bayesian manner, prior distributions are associated with the parameters of the model. For the n_t × R task-specific projection matrices A_t, there is an n_t × R matrix of priors, denoted Λ_t. For the weights of the coupled classifier, the prior is η, and for the bias b, the prior is γ. These are standard priors given the parameter types in the model; for details, refer to the study by Gönen and Margolin (2014). Collectively, the priors are Ξ = {{Λ_t}, η, γ}, the latent variables are Θ = {{H_t, A_t, f_t}, w, b}, and the observed variables (training data) are given by {K_t, y_t}, t = 1, ..., T. The DGM is inferred via a variational approximation, whereby the posterior over {Θ, Ξ} factorizes as

q(Θ, Ξ) = [ ∏_{t=1}^{T} q(Λ_t) q(A_t) q(H_t) ] q(γ) q(η) q(b, w) [ ∏_{t=1}^{T} q(f_t) ]
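At the level of array shapes, the projection and coupled classifier can be sketched with random placeholder values (illustrative only; in kernelized Bayesian transfer learning these quantities are inferred variationally, not sampled):

```python
import numpy as np

rng = np.random.default_rng(0)

# T domains, each with n_t training points; shared subspace of dimension R.
# A_t (n_t x R) projects the kernel embedding K_t (n_t x n_t) into
# H_t = A_t^T K_t (R x n_t); one coupled classifier {w, b} scores all domains.
R = 2
n_t = {1: 5, 2: 8}                       # two toy domains of different size
w = rng.standard_normal(R)               # shared discriminative weights
b = 0.1                                  # shared bias

f = {}
for t, n in n_t.items():
    X = rng.standard_normal((n, 3 + t))  # heterogeneous feature dimensions
    K = X @ X.T                          # linear kernel embedding, n x n
    A = rng.standard_normal((n, R))      # task-specific projection matrix
    H = A.T @ K                          # R x n shared-subspace data
    f[t] = H.T @ w + b                   # n scores from the coupled classifier
```

Note that the domains have different feature dimensions yet land in the same R-dimensional space, which is what permits heterogeneous transfer between them.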
The approximation is optimized with respect to each factor separately while fixing the remaining factors (iterating until convergence).

Numerical + Experimental Example: Shear-Building Structures

A numerical case study, supplemented with experimental data, is used for demonstration, an extension of the work by Gardner et al. (2020a). A population of six different shear-building structures is considered: five are simulated, and one is experimental. A domain and task are associated with each structure (such that T = 6), and the experimental rig and (simulated) lumped-mass models are shown in Fig. 18. For each structure (domain), there is a two-class classification problem (task), which is viewed as binary damage detection (normal or damaged).

Each simulated structure is represented by d mass, stiffness, and damping coefficients, i.e., {m_i, k_i, c_i}, i = 1, ..., d. The masses have length l_m, width w_m, thickness t_m, and density ρ. The stiffness elements are calculated from four cantilever beams in bending, 4k_b = 4(3EI/l_b³), where E is the elastic modulus, I the second moment of area, and l_b the length of the beam. The damping coefficients are specified rather than derived from a physical model. Damage is simulated via an open crack, using a reduction in EI (Christides and Barr 1984). For each structure, each observation is a random draw from a base distribution for E, ρ, and c. The properties of the five simulated structures are shown in Table 1.

The experimental structure is constructed from aluminum 6082, with dimensions nominally similar to those in Table 1. Observational data (the first three natural frequencies) were collected via modal testing, in which an electrodynamic shaker applied up to 6,553.6 Hz broadband white-noise excitation containing 16,384 spectral lines (0.2 Hz resolution). Forcing was applied to the first story, and three uniaxial accelerometers measured the response at all stories. The damage was artificially introduced as a 50% saw-cut to the midpoint of the front-right beam in Fig. 18(a).

Fig. 18. Shear structures: (a) test rig; (b) a nominal representation of the five simulated systems; and (c) depiction of the cantilever beam component, where k_i = 4k_b, i = 1, ..., d.

In each domain, the damped natural frequencies act as features, such that X_t[i, :] = {ω_i}, i = 1, ..., d. Therefore, as each domain has different DOFs/dimensions, a heterogeneous transfer is required. The label set is consistent across all domains, corresponding to normal or damaged, i.e., y_i ∈ {−1, +1}, respectively. The training and test data for each domain are summarized in Table 2. The training data have various degrees of class imbalance, to reflect scenarios in which certain structures in SHM provide more information about a particular state.

Fig. 19 shows the coupled binary classifier in the (expected) shared latent subspace for all the data {H_t}, t = 1, ..., T. The observations associated with each of the six domains are distinguished via different markers. The left plot shows the test data and their predicted labels given f_t, while the right plot shows the ground-truth labels. KBTL has successfully embedded and projected data from different domains into a shared latent space (R = 2), where the data can be categorized by a coupled discriminative classifier. It can also be seen that, due to class imbalance (weighted toward the undamaged class −1 for each structure), there is greater uncertainty in the damaged class (+1), leading to more significant scatter in the latent space.

The classification results for each domain are presented in Fig. 20. An observation is considered to belong to class +1 if p(y_t[i] = +1 | f_t[i]) ≥ 0.5. KBTL is compared to a relevance vector machine (RVM) (Tipping 2000) as a benchmark, learned for each domain independently. It is acknowledged that the RVM differs in implementation; however, similarities make it useful for comparison as a standard (nonmultitask) alternative to KBTL.

Multitask learning has accurately inferred a general model. For domains {1, 2, 3, 5, 6}, the SHM task is improved by considering the data from all structures in a shared latent space. In particular, extending the (effective) training data has improved the classification for Domain 5. This is because there are few training data associated with the damage class for Domain 5 (Table 2); therefore, considering damage data from similar structures (in the latent space) has proved beneficial. Interestingly, for Domain 4 (t = 4), there is a marginal decrease in the classification performance. Like Domain 1, Domain 4 has a less severe class imbalance, and thus, it appears that the remaining domains (with severe class imbalance) have negatively impacted the score for this specific domain/task.

These results highlight that the data from a group (or population) of similar structures can be considered together to increase the (effective) amount of training data (Bull et al. 2020a; Gosliga et al. 2020; Gardner et al. 2020b). This can lead to significant improvements in the predictive performance of SHM tools, particularly those learned from small sets of supervised data.

Conclusions

Three new techniques for statistical inference with SHM signals have been collected and summarized (originally introduced in previous work).
Table 2. Number of data for all domains

Domain (t)   Training, y = −1   Training, y = +1   Testing, y = −1   Testing, y = +1
1            250                100                500               500
2            100                25                 500               500
3            120                20                 500               500
4            200                150                500               500
5            500                10                 500               500
6*           3                  3                  2                 2

Note: *Domain 6 represents the numerical and experimental case.

In practice: (1) label information (to describe what the measurements represent) is likely to be incomplete; and (2) the available data a priori will usually correspond to a subset of the expected in situ conditions only. Considering the importance of uncertainty quantification in SHM, probabilistic methods are suggested, which can be (intuitively) updated to account for missing information.

The case study applications for each mode of inference highlight the potential advantages for SHM. Partially supervised methods for active and semisupervised learning were utilized to manage the cost of system inspections (to label data) while considering the unlabeled instances, both offline and online. Dirichlet process clustering has been applied to streaming data as an unsupervised method for automatic damage detection and classification. Finally, multitask
Acknowledgments
Bull, L. A., K. Worden, and N. Dervilis. 2019c. "Damage classification using labelled and unlabelled measurements." In Structural health monitoring 2019. Lancaster: Destech Publications.
Bull, L. A., K. Worden, and N. Dervilis. 2020b. "Towards semi-supervised and probabilistic classification in structural health monitoring." Mech. Syst. Sig. Process. 140 (Jun): 106653. https://doi.org/10.1016/j.ymssp.2020.106653.
Bull, L. A., K. Worden, G. Manson, and N. Dervilis. 2018. "Active learning for semi-supervised structural health monitoring." J. Sound Vib. 437 (Dec): 373–388. https://doi.org/10.1016/j.jsv.2018.08.040.
Bull, L. A., K. Worden, T. J. Rogers, E. J. Cross, and N. Dervilis. 2020c. "Investigating engineering data by probabilistic measures." In Vol. 5 of Special topics in structural dynamics and experimental techniques, 77–81. Cham, Switzerland: Springer.
Bull, L. A., K. Worden, T. J. Rogers, C. Wickramarachchi, E. J. Cross, T. McLeay, W. Leahy, and N. Dervilis. 2019d. "A probabilistic framework for online structural health monitoring: Active learning from machining data streams." In Vol. 1264 of Proc., Journal of Physics: Conf. Series, 012028. Bristol, UK: Institute of Physics Publishing.
Cappello, C., D. Bolognani, and D. Zonta. 2015. "Mechanical equivalent of logical inference from correlated uncertain information." In Proc., 7th Int. Conf. on Structural Health Monitoring of Intelligent Infrastructure. New York: Curran Associates.
Chakraborty, D., N. Kovvali, B. Chakraborty, A. Papandreou-Suppappola, and A. Chattopadhyay. 2011. "Structural damage detection with insufficient data using transfer learning techniques." In Sensors and smart structures technologies for civil, mechanical, and aerospace systems, 798147. New York: Curran Associates.
Chapelle, O., B. Scholkopf, and A. Zien. 2006. Semi-supervised learning. Cambridge, MA: MIT Press.
Chatzi, E. N., and A. W. Smyth. 2009. "The unscented Kalman filter and particle filter methods for nonlinear structural system identification with non-collocated heterogeneous sensing." Struct. Control Health Monit. 16 (1): 99–123.
Chen, S., F. Cerda, J. Guo, J. B. Harley, Q. Shi, P. Rizzo, J. Bielak, J. H. Garrett, and J. Kovacevic. 2013. "Multiresolution classification with semi-supervised learning for indirect bridge structural health monitoring." In Proc., 2013 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 3412–3416. New York: IEEE.
Chen, S., F. Cerda, P. Rizzo, J. Bielak, J. H. Garrett, and J. Kovacevic. 2014. "Semi-supervised multiresolution classification using adaptive graph filtering with application to indirect bridge structural health monitoring." IEEE Trans. Signal Process. 62 (11): 2879–2893. https://doi.org/10.1109/TSP.2014.2313528.
Christides, S., and A. Barr. 1984. "One-dimensional theory of cracked Bernoulli-Euler beams." Int. J. Mech. Sci. 26 (11–12): 639–648. https://doi.org/10.1016/0020-7403(84)90017-1.
Cozman, F. G., I. Cohen, and M. C. Cirelo. 2003. "Semi-supervised learning of mixture models." In Proc., 20th Int. Conf. on Machine Learning (ICML-03), 99–106. Washington, DC: Association for the Advancement of Artificial Intelligence Press.
Dasgupta, S. 2011. "Two faces of active learning." Theor. Comput. Sci. 412 (19): 1767–1781. https://doi.org/10.1016/j.tcs.2010.12.054.
de Roeck, G. 2003. "The state-of-the-art of damage detection by vibration monitoring: The SIMCES experience." Struct. Control Health Monit. 10 (2): 127–134. https://doi.org/10.1002/stc.20.
Dempster, A. P., N. M. Laird, and D. B. Rubin. 1977. "Maximum likelihood from incomplete data via the EM algorithm." J. R. Stat. Soc. Ser. B (Methodol.) 39 (1): 1–22.
Dervilis, N., E. Cross, R. Barthorpe, and K. Worden. 2014. "Robust methods of inclusive outlier analysis for structural health monitoring." J. Sound Vib. 333 (20): 5181–5195. https://doi.org/10.1016/j.jsv.2014.05.012.
Dorafshan, S., R. J. Thomas, and M. Maguire. 2018. "Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete." Constr. Build. Mater. 186 (Oct): 1031–1045. https://doi.org/10.1016/j.conbuildmat.2018.08.011.
Farrar, C. R., and K. Worden. 2012. Structural health monitoring: A machine learning perspective. Chichester, UK: Wiley.
Flynn, E. B., and M. D. Todd. 2010. "A Bayesian approach to optimal sensor placement for structural health monitoring with application to active sensing." Mech. Syst. Sig. Process. 24 (4): 891–903. https://doi.org/10.1016/j.ymssp.2009.09.003.
Gao, Y., and K. M. Mosalam. 2018. "Deep transfer learning for image-based structural damage recognition." Comput.-Aided Civ. Infrastruct. Eng. 33 (9): 748–768.
Gardner, P., L. A. Bull, N. Dervilis, and K. Worden. 2020a. "Kernelised Bayesian transfer learning for population-based structural health monitoring." In Proc., 38th Int. Modal Analysis Conf. London: Springer.
Gardner, P., L. A. Bull, N. Dervilis, and K. Worden. Forthcoming. "A sparse Bayesian approach to heterogeneous transfer learning for population-based structural health monitoring." Mech. Syst. Sig. Process.
Gardner, P., L. A. Bull, J. Gosliga, N. Dervilis, and K. Worden. 2020b. "Foundations of population-based structural health monitoring. Part III: Heterogeneous populations—Mapping and transfer." Mech. Syst. Sig. Process. 149 (Feb): 107142. https://doi.org/10.1016/j.ymssp.2020.107142.
Gardner, P., X. Liu, and K. Worden. 2020c. "On the application of domain adaptation in structural health monitoring." Mech. Syst. Sig. Process. 138 (Apr): 106550. https://doi.org/10.1016/j.ymssp.2019.106550.
Gelman, A., H. S. Stern, J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2013. Bayesian data analysis. Boca Raton, FL: Chapman and Hall/CRC.
Gönen, M., and A. Margolin. 2014. "Kernelized Bayesian transfer learning." In Proc., 28th AAAI Conf. on Artificial Intelligence. Palo Alto, CA: Association for the Advancement of Artificial Intelligence Press.
Gosliga, J., P. Gardner, L. Bull, N. Dervilis, and K. Worden. 2020. "Foundations of population-based structural health monitoring. Part II: Heterogeneous populations—Graphs, networks and communities." Mech. Syst. Sig. Process. 148 (Feb): 107144. https://doi.org/10.1016/j.ymssp.2020.107144.
Huang, Y., J. L. Beck, and H. Li. 2019. "Multitask sparse Bayesian learning with applications in structural health monitoring." Comput.-Aided Civ. Infrastruct. Eng. 34 (9): 732–754. https://doi.org/10.1111/mice.12408.
Jang, K., N. Kim, and Y. An. 2019. "Deep learning-based autonomous concrete crack evaluation through hybrid image scanning." Struct. Health Monit. 18 (5–6): 1722–1737. https://doi.org/10.1177/1475921718821719.
Janssens, O., R. Van de Walle, M. Loccufier, and S. Van Hoecke. 2018. "Deep learning for infrared thermal image based machine health monitoring." IEEE/ASME Trans. Mechatron. 23 (1): 151–159. https://doi.org/10.1109/TMECH.2017.2722479.
Kremer, J., K. P. Steenstrup, and C. Igel. 2014. "Active learning with support vector machines." Wiley Interdiscip. Rev.: Data Min. Knowl. Discovery 4 (4): 313–326.
MacKay, D. J. 2003. Information theory, inference and learning algorithms. Cambridge, UK: Cambridge University Press.
Manson, G., K. Worden, and D. Allman. 2003. "Experimental validation of a structural health monitoring methodology. Part III: Damage location on an aircraft wing." J. Sound Vib. 259 (2): 365–385. https://doi.org/10.1006/jsvi.2002.5169.
McCallum, A. K., and K. Nigam. 1998. "Employing EM and pool-based active learning for text classification." In Proc., Int. Conf. on Machine Learning (ICML), 359–367. Princeton, NJ: Citeseer.
Murphy, K. P. 2012. Machine learning: A probabilistic perspective. Cambridge, MA: MIT Press.
Neal, R. M. 2000. "Markov chain sampling methods for Dirichlet process mixture models." J. Comput. Graphical Stat. 9 (2): 249–265.
Nigam, K., A. McCallum, S. Thrun, and T. Mitchell. 1998. "Learning to classify text from labeled and unlabeled documents." AAAI/IAAI 792 (6): 792–799.
Ou, Y., E. N. Chatzi, V. K. Dertimanis, and M. D. Spiridonakos. 2017. "Vibration-based experimental damage detection of a small-scale wind turbine blade." Struct. Health Monit. 16 (1): 79–96. https://doi.org/10.1177/1475921716663876.
Pan, S. J., and Q. Yang. 2010. "A survey on transfer learning." IEEE Trans. Knowl. Data Eng. 22 (10): 1345–1359. https://doi.org/10.1109/TKDE.2009.191.
Papoulis, A. 1965. Probabilities, random variables, and stochastic processes. New York: McGraw-Hill.
Peeters, B., and G. de Roeck. 2001. "One-year monitoring of the Z24-bridge: Environmental effects versus damage events." Earthquake Eng. Struct. Dyn. 30 (2): 149–171. https://doi.org/10.1002/1096-9845(200102)30:2<149::AID-EQE1>3.0.CO;2-Z.
Rasmussen, C. E. 2000. "The infinite Gaussian mixture model." In Advances in neural information processing systems, 554–560. Cambridge, MA: MIT Press.
Rasmussen, C. E., and Z. Ghahramani. 2001. "Occam's razor." In Advances in neural information processing systems, 294–300. Cambridge, MA: MIT Press.
Rippengill, S., K. Worden, K. M. Holford, and R. Pullin. 2003. "Automatic classification of acoustic emission patterns." Strain 39 (1): 31–41. https://doi.org/10.1046/j.1475-1305.2003.00041.x.
Rogers, T. J., K. Worden, R. Fuentes, N. Dervilis, U. T. Tygesen, and E. J. Cross. 2019. "A Bayesian non-parametric clustering approach for semi-supervised structural health monitoring." Mech. Syst. Sig. Process. 119 (Mar): 100–119. https://doi.org/10.1016/j.ymssp.2018.09.013.
Rousseeuw, P. J., and K. V. Driessen. 1999. "A fast algorithm for the minimum covariance determinant estimator." Technometrics 41 (3): 212–223. https://doi.org/10.1080/00401706.1999.10485670.
Schwenker, F., and E. Trentin. 2014. "Pattern classification and clustering: A review of partially supervised learning approaches." Pattern Recognit. Lett. 37 (1): 4–14. https://doi.org/10.1016/j.patrec.2013.10.017.
Settles, B. 2012. "Active learning." Synth. Lect. Artif. Intell. Mach. Learn. 6 (1): 1–114. https://doi.org/10.2200/S00429ED1V01Y201207AIM018.
Sohn, H., C. R. Farrar, F. M. Hemez, D. D. Shunk, D. W. Stinemates, B. R. Nadler, and J. J. Czarnecki. 2003. A review of structural health monitoring literature: 1996–2001. Los Alamos, NM: Los Alamos National Laboratory.
Tipping, M. E. 2000. "The relevance vector machine." In Advances in neural information processing systems, 652–658. Cambridge, MA: MIT Press.
Vanik, M. W., J. L. Beck, and S. Au. 2000. "Bayesian probabilistic approach to structural health monitoring." J. Eng. Mech. 126 (7): 738–745. https://doi.org/10.1061/(ASCE)0733-9399(2000)126:7(738).
Vlachos, A., A. Korhonen, and Z. Ghahramani. 2009. "Unsupervised and constrained Dirichlet process mixture models for verb clustering." In Proc., Workshop on Geometrical Models of Natural Language Semantics, 74–82. Stroudsburg, PA: Association for Computational Linguistics.
Wan, H., and Y. Ni. 2019. "Bayesian multi-task learning methodology for reconstruction of structural health monitoring data." Struct. Health Monit. 18 (4): 1282–1309. https://doi.org/10.1177/1475921718794953.
Wang, M., F. Min, Z.-H. Zhang, and Y.-X. Wu. 2017. "Active learning through density clustering." Expert Syst. Appl. 85 (Nov): 305–317. https://doi.org/10.1016/j.eswa.2017.05.046.
Worden, K., and G. Manson. 2006. "The application of machine learning to structural health monitoring." Philos. Trans. R. Soc. London, Ser. A 365 (1851): 515–537. https://doi.org/10.1098/rsta.2006.1938.
Worden, K., G. Manson, G. Hilson, and S. Pierce. 2008. "Genetic optimization of a neural damage locator." J. Sound Vib. 309 (3): 529–544. https://doi.org/10.1016/j.jsv.2007.07.035.
Ye, J., T. Kobayashi, H. Tsuda, and M. Murakawa. 2017. "Robust hammering echo analysis for concrete assessment with transfer learning." In Proc., 11th Int. Workshop on Structural Health Monitoring, 943–949. Stanford, CA: Stanford Univ.
Zhang, Y., and Q. Yang. 2018. "An overview of multi-task learning." Natl. Sci. Rev. 5 (1): 30–43. https://doi.org/10.1093/nsr/nwx105.
Zhao, R., R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao. 2019. "Deep learning and its applications to machine health monitoring." Mech. Syst. Sig. Process. 115 (Jan): 213–237. https://doi.org/10.1016/j.ymssp.2018.05.050.
Zhu, X. J. 2005. Semi-supervised learning literature survey. Report. Madison, WI: Univ. of Wisconsin–Madison.
Zonta, D., B. Glisic, and S. Adriaenssens. 2014. "Value of information: Impact of monitoring on decision-making." Struct. Control Health Monit. 21 (7): 1043–1056. https://doi.org/10.1002/stc.1631.