Yong Huang∗ & James L. Beck
Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA
Abstract: Bayesian system identification has attracted substantial interest in recent years for inferring structural models and quantifying their uncertainties based on measured dynamic response in a structure. The relative plausibility of each structural model in a specified model class is quantified by its posterior probability from Bayes' Theorem. The relative plausibility of each model class within a set of candidate model classes for the structure can also be assessed via Bayes' Theorem. Computation of this posterior probability over all candidate model classes automatically applies a quantitative Ockham's razor that trades off a data-fit measure with an information-theoretic measure of model complexity, which penalizes model classes that "over-fit" the data. In this article, we first present a general Bayesian system identification framework and point out that combining it with sparse Bayesian learning (SBL) is an effective strategy to implement the Bayesian Ockham razor. Then we review our recent progress in exploring SBL with the automatic relevance determination likelihood concept to detect and quantify spatially sparse substructure stiffness reductions. To characterize the full posterior uncertainty for this problem, an improved Gibbs sampling procedure for SBL is then developed. Finally, illustrative results are provided to compare the performance and validate the capability of the presented SBL algorithms for structural system identification.

∗To whom correspondence should be addressed. E-mail: huangyong@hit.edu.cn and huangyongthere@outlook.com.

© 2018 Computer-Aided Civil and Infrastructure Engineering. DOI: 10.1111/mice.12358

1 INTRODUCTION

In the last decade, worldwide efforts to implement structural health monitoring (SHM) systems on civil infrastructure have been rewarded by an increasing number of collected data sets of structural vibration response. Despite the importance of these structural vibration records for understanding the behavior and performance of real, full-scale structures under real environmental conditions, analysis of these data has lagged far behind their rate of collection. Systematic computer-based information-extracting techniques, such as those developed in system identification research, are a key component in model-based inversions in SHM for detection and assessment of damage. In system identification, the goal is to use observed structural response data and prior knowledge about the structure to update mathematical models of the behavior of a system such as a bridge or building when subject to dynamic excitation. In addition to SHM, the goals of such data-informed modeling might also include providing a better understanding of the structural system's behavior and allowing more accurate predictions of its future response to specified excitations. Despite a long history, the development of algorithms for system identification continues to be an active area of research in structural dynamics (e.g., Green et al., 2015; Shan et al., 2016; Perez-Ramirez et al., 2016; Huang et al., 2017b; Vakilzadeh et al., 2017; Oh et al., 2017; Li et al., 2017; Amezquita-Sanchez et al., 2017).

One of the main difficulties in system identification is that it is impossible to exactly model the full behavior of a structure by using the limited sensor data and prior knowledge available. As any model gives an approximation to the real system behavior, there are always modeling uncertainties involved; for example, what values of the model parameters are appropriate and how well does the model predict the real system response? Another difficulty is that for complex system models, single-point parameter estimation often gives nonunique results (e.g., multiple least-squares or maximum likelihood estimates). To make more robust predictions, one should track all plausible values of the parameters based on the data and also explicitly treat the uncertain prediction errors (the difference between the response of the real system and that of the system model), as well as possible measurement errors. These issues have motivated numerous researchers to tackle the problem of structural system identification from a Bayesian perspective (e.g., Beck, 2010; Green et al., 2015; Huang et al., 2017b).

In contrast to the point estimates of the parameters used in conventional deterministic or frequentist-probabilistic methods, the Bayesian probabilistic framework uses Bayes' Theorem to quantify the relative plausibility, based on the data, of all possible values of the model parameters via their posterior probability density function (PDF). This procedure is used to learn about all plausible models for representing the system's behavior, where each parameter value specifies a possible model for the system. As there is always uncertainty in which parameterized model class to choose to represent a system, one can also choose a set of candidate model classes and calculate their posterior probability based on the data by applying Bayes' Theorem at the model class level. An information-theoretic interpretation (Beck, 2010) shows that the posterior probability of each model class depends on the difference between a measure of the average data-fit of the model class and the amount of information extracted from the data by the model class, which penalizes model classes that "over-fit" the data. Comparing the posterior probability of each model class therefore provides a quantitative Ockham (Occam) Razor (Gull, 1988; Jefferys and Berger, 1992; Mackay, 1992); that is, loosely speaking, models should be no more complex than is sufficient to explain the data.

Sparse Bayesian learning (SBL) is a method that uses this Bayesian Ockham Razor to induce sparseness during parameter estimation and model updating. By introducing a hierarchical Bayesian model class, this machine learning technique is able to select automatically a sparse subset of all the uncertain parameters solely from the available data. SBL with an automatic relevance determination (ARD) prior was originally introduced for the relevance vector machine (Tipping, 2000) and sparse principal component analysis (Tipping, 2001b). It has been used recently in earthquake engineering (Mu and Yuen, 2017), compressive sensing of SHM signals (Huang et al., 2014), and geotechnical engineering (Ching et al., 2017; Wang and Zhao, 2017). A key feature in using SBL with the ARD prior is that the model of the observations is a linear function of those model parameters for which sparseness is to be enforced. This is not the case, however, when updating the stiffness parameters in a structural model.

Recently, we introduced SBL into Bayesian system identification by incorporating the concept of ARD through a likelihood function rather than through a prior as in the original SBL approach (Tipping, 2001a). Our procedure gives an effective implementation of the Bayesian Ockham Razor and automatically induces sparse changes in model updating, which reduces the ill-conditioning in inferring the updated stiffness parameter values. In a series of three papers, we have introduced and then improved our SBL theory for system identification by finding ways to remove approximations, so that we can better characterize the posterior uncertainty of the parameters and hyperparameters (Huang and Beck, 2015; Huang et al., 2017a, b). In this work, we further improve our SBL approach by providing a full characterization of the posterior uncertainty rather than just using maximum a posteriori (MAP) values for the hyperparameters, which is accomplished by developing an SBL procedure for Bayesian system identification based on full Gibbs sampling (GS).

The remainder of the article is organized as follows. In Section 2, we present a general framework for Bayesian system identification. SBL with the ARD prior is then introduced together with the Bayesian Ockham Razor in Section 3, with a discussion of why they induce sparseness during Bayesian updating. In Section 4, we give an overview of our recent progress in developing SBL algorithms for system identification and propose a new full GS-based SBL algorithm to fully characterize the posterior uncertainty. We then compare our new SBL algorithm with two previous SBL algorithms of ours in terms of theory and computation. Applications of our Bayesian methods to structural data for a well-known experimental benchmark problem are presented in Section 5 to show the capability of these methods. Concluding remarks are made in Section 6.
Full Gibbs sampling procedure for Bayesian system identification incorporating sparse Bayesian learning
2 BAYESIAN SYSTEM IDENTIFICATION

Consider the problem of predicting the output z(t) to some input u(t) of a real dynamic system over some time interval, t ∈ [0, tf], by using a computational model of the system. We use un = u(nΔt) ∈ R^NI and zn = z(nΔt) ∈ R^No to denote the real system input and output, respectively, at discrete times tn = nΔt, n ∈ Z+, and use u0:n = [u0ᵀ, u1ᵀ, ..., unᵀ]ᵀ and z0:n = [z0ᵀ, z1ᵀ, ..., znᵀ]ᵀ to denote the discrete-time histories of the system input and output up to time tn.

2.1 Stochastic model class

In modeling the input and output (I/O) behavior of a real system, one cannot expect any chosen deterministic model to make perfect predictions, and the prediction errors of any such model will be uncertain. This motivates the introduction of a stochastic (or Bayesian) model class M (Beck, 2010) for a system that consists of a set of stochastic I/O predictive models (also called stochastic forward models) {p(z1:n|u0:n, w, M) : w ∈ W ⊂ R^Np}, where each model is specified by a PDF valid for any n ∈ Z+, together with a chosen prior probability distribution p(w|M) over this set that quantifies the initial relative plausibility of each I/O probability model corresponding to each value of the parameter vector w. Any deterministic I/O model of a system that involves uncertain parameters can be used to construct such a model class for the system by stochastic embedding (Beck, 2010), in which the Principle of Maximum Information Entropy plays an important role (Jaynes, 1983, 2003) (see Equation (1) in the next subsection).

some prior constraints. This procedure is called stochastic embedding of the parameterized deterministic model in Beck (2010). A probability model can also be chosen for the measurement error history m1:N based on a separate study of the sensors, where m1:N is taken to be probabilistically independent of the prediction error history e1:N. This leads to a probability model p(y1:N|û0:N, w, M) for predicting the sensor output y1:N. In many applications, mn is negligible compared with en and so it can be dropped; that is, the difference between the measured system output yn and the actual output zn is ignored, but not the difference between the real system and model outputs, zn and qn.

The data DN can be used to update the relative plausibility of each stochastic I/O model p(y1:N|û0:N, w, M), w ∈ W ⊂ R^Np, defined by the stochastic model class M, by computing the posterior PDF p(w|DN, M) from Bayes' Theorem:

p(w|DN, M) = p(DN|w, M) p(w|M) / p(DN|M) = c⁻¹ p(DN|w, M) p(w|M)    (2)

where c = p(DN|M) is the normalizing constant, which is called the evidence or marginal likelihood for the model class M given by data DN; p(DN|w, M), as a function of w, is the likelihood function, which expresses the probability of getting data DN based on the PDF p(y1:N|û0:N, w, M) by substituting the measured output data ŷ1:N for y1:N. Note that a model class can be used to perform both prior (initial) and posterior (updated by using system sensor data) robust predictive analyses, which can be used during design and operation, respectively, of a structure, based purely on the probability logic axioms (Papadimitriou et al., 2001).
where wm is the vector of model parameters for Mm. A uniform prior probability distribution can be chosen for the candidate model classes, that is, p(Mm|M) = 1/M, if the model classes are considered equally plausible a priori (our convention is to use P(·) for probabilities and p(·) for PDFs).

The calculation of the posterior probability P(Mm|DN, M) in Equation (3) provides a procedure for Bayesian model class selection (or comparison, or assessment), where the computation of the multidimensional integral in Equation (4) for the evidence function is vital. If there is no analytical solution for Equation (4), Laplace's approximation method can be used when the model class is globally identifiable based on the available data DN (e.g., Beck and Yuen, 2004; Beck, 2010). When the chosen class of models is unidentifiable or locally identifiable based on the data DN, so that there are multiple MLEs (maximum likelihood estimates) (Beck and Katafygiotis, 1998), only stochastic simulation methods are practical for calculating the model class evidence, such as the Transitional Markov chain Monte Carlo (MCMC) simulation method (Ching and Chen, 2007) or the Approximate Bayesian Computation method (Chiachio et al., 2014; Vakilzadeh et al., 2017). When the posterior probability of each model class, P(Mm|DN, M), has been calculated, the Total Probability Theorem can be applied to produce the posterior hyper-robust predictive models that combine the predictions of all plausible model classes in a specified set (Beck, 2010).

3 SBL AND BAYESIAN OCKHAM RAZOR

3.1 Bayesian Ockham Razor

Comparing the posterior probability of each candidate model class by Equation (4) automatically implements an elegant and powerful version of Ockham's (Occam's) Razor, known as the Bayesian Ockham Razor. A recent information-theoretic interpretation (Beck, 2010) shows that the evidence p(DN|Mm) in Equation (4) explicitly builds in a trade-off between a data-fit measure for the model class and an information-theoretic measure of its complexity that quantifies the amount of information that the model class extracts from the data DN. This result is based on using Equation (2) in the expression for the normalization of the posterior PDF:

log[p(DN|Mm)] = ∫ log[p(DN|Mm)] p(wm|DN, Mm) dwm
= ∫ log[ p(DN|wm, Mm) p(wm|Mm) / p(wm|DN, Mm) ] p(wm|DN, Mm) dwm
= ∫ log[p(DN|wm, Mm)] p(wm|DN, Mm) dwm − ∫ log[ p(wm|DN, Mm) / p(wm|Mm) ] p(wm|DN, Mm) dwm    (5)
= E[log(p(DN|wm, Mm))] − E[log( p(wm|DN, Mm) / p(wm|Mm) )]

where the expectations E[·] are taken with respect to the posterior p(wm|DN, Mm). The first term is the posterior mean of the log likelihood function, which is a measure of the average data-fit of the model class Mm, and the second term is the Kullback–Leibler information, or relative entropy, of the posterior relative to the prior, which is a measure of the model complexity (the amount of information gained about wm from the data DN) and is always nonnegative. The merit of Equation (5) is that it shows rigorously, without introducing ad hoc concepts, that the log evidence for Mm explicitly builds in a trade-off between the data-fit of the model class and its information-theoretic complexity. This is important in system identification applications because overly complex models often lead to over-fitting of the data, and the subsequent response predictions may then be unreliable because they depend too much on the details of the specific data, for example, measurement noise and environmental effects.

3.2 General formulation of SBL with the ARD prior

Given a set of I/O data D = {û, ŷ}, suppose that the model prediction of the output is y = f(û) + e + m ∈ R^No, involving a deterministic function f of the input vector û, along with uncertain prediction error e and measurement noise m. Assume that the function f is chosen as a weighted sum of Np basis functions {φj(û), j = 1, ..., Np}:

f(û) = Σ_{j=1}^{Np} wj φj(û) = Φ(û) w    (6)

where Φ is an No × Np matrix with the basis functions {φj} as columns. Analysis of this model is facilitated by the adjustable parameters (or weights) w ∈ R^Np appearing linearly. The objective here is to infer values of the parameters {wj, j = 1, ..., Np} such that Φ(û)w is a "good" approximation of f(û) and the parameter vector w is sparse. SBL encodes a preference for sparser parameter vectors by making a special choice for the prior distribution of the parameter vector w that is known as the ARD prior (Mackay, 1992; Tipping, 2001a; Oh et al., 2008):

p(w|α) = Π_{j=1}^{Np} p(wj|αj) = Π_{j=1}^{Np} N(wj|0, αj⁻¹) = Π_{j=1}^{Np} (2π)^(−1/2) αj^(1/2) exp(−αj wj²/2)    (7)
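The identity in Equation (5) can be checked numerically on a conjugate Gaussian model class, where the log evidence is known in closed form: the Monte Carlo estimate of the data-fit term minus the Kullback–Leibler term should match it. The one-datum setup below is an illustrative assumption, not an example from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
y, s2 = 1.3, 1.0                        # one datum and likelihood variance (hypothetical)

# Conjugate Gaussian model class: prior w ~ N(0,1), likelihood y|w ~ N(w, s2)
post_var = 1.0 / (1.0 + 1.0 / s2)
post_mean = post_var * y / s2

def log_norm(x, m, v):
    """Log of the univariate Gaussian density N(x|m, v)."""
    return -0.5 * np.log(2 * np.pi * v) - 0.5 * (x - m) ** 2 / v

ws = rng.normal(post_mean, np.sqrt(post_var), 200_000)   # samples from the posterior
data_fit = np.mean(log_norm(y, ws, s2))                  # E[log p(D|w)], first term of Eq. (5)
kl = np.mean(log_norm(ws, post_mean, post_var)
             - log_norm(ws, 0.0, 1.0))                   # KL(posterior || prior), second term

log_evidence = log_norm(y, 0.0, 1.0 + s2)                # exact: marginally y ~ N(0, 1 + s2)
```

The difference `data_fit - kl` reproduces `log_evidence` to within Monte Carlo error, which is the rigorous data-fit versus complexity trade-off that Equation (5) expresses.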
responding coefficient wj has an insignificant contribution to the modeling of the measurements y, because it produces essentially a Dirac delta function at zero for the prior, and so for the posterior. By using the principle of maximum information entropy (Jaynes, 1983) and incorporating the first two moments of y as constraints, the combination of the prediction error and measurement noise e is modeled as a zero-mean Gaussian vector with covariance matrix β⁻¹ I_No, which gives a Gaussian predictive PDF:

p(y|w, β) = (2πβ⁻¹)^(−No/2) exp(−(β/2) ||y − Φ(û)w||₂²) = N(y|Φ(û)w, β⁻¹ I_No)    (8)

By substituting the data ŷ for y, Equation (8) gives a Gaussian likelihood function that measures how well the model for specified parameters w and β predicts the measurements ŷ. A stochastic model class M(α, β) is then defined by the I/O predictive model in Equation (8) and the prior PDF on w given by Equation (7).

The posterior distribution p(w|ŷ, α, β) over the weight parameters given by model class M(α, β) is computed based on Bayes' Theorem:

p(w|ŷ, α, β) = p(ŷ|w, β) p(w|α) / p(ŷ|α, β)    (9)

where p(ŷ|α, β) = ∫ p(ŷ|w, β) p(w|α) dw is the evidence of the model class M(α, β). As both the prior and likelihood for w are Gaussian and the likelihood mean Φ(û)w is linear in w, the posterior PDF can be expressed analytically as a multivariate Gaussian distribution:

p(w|ŷ, α, β) = N(w | (ΦᵀΦ + β⁻¹A)⁻¹ Φᵀŷ, (βΦᵀΦ + A)⁻¹)    (10)

where A = diag(α1, ..., αNp).

A continuous set of candidate model classes M(α, β) is defined above, and the robust posterior PDF p(w|ŷ) can be computed by integrating out the posterior uncertainty in α and β as below. We assume that the posterior p(α, β|ŷ) is highly peaked at {α̃, β̃} (the MAP value of {α, β}). We then treat [α, β] as a "nuisance" parameter vector and integrate it out by applying Laplace's asymptotic approximation (Beck and Katafygiotis, 1998):

p(w|ŷ) = ∫ p(w|ŷ, α, β) p(α, β|ŷ) dα dβ ≈ p(w|ŷ, α̃, β̃)    (11)

If we assign flat, noninformative prior PDFs for α and β, this is equivalent to maximizing the evidence function p(ŷ|α, β).

3.3 Bayesian Ockham Razor implementation in SBL

Finding the optimal values (α̃, β̃) of the hyperparameters as in Equation (12) is a procedure corresponding to Bayesian model class selection (Beck and Yuen, 2004), because some of the terms in the linear-in-the-parameters expansion of possible terms in Equation (6) are suppressed, where each subset of terms could form a separate model class. Recall from Equation (5) that the evidence function p(ŷ|α, β) can be expressed as a difference of posterior expectations:

log[p(ŷ|α, β)] = E[log(p(ŷ|w, β))] − E[log[p(w|ŷ, α, β) / p(w|α)]]    (13)

It is found in Huang et al. (2014) that the data-fit measure (the first term) decreases with a lower specified prediction accuracy (smaller β), and this is associated with sparser models (more αj's tend to infinity during the optimization). This is because smaller β allows more of the data misfit to be treated as prediction errors. At the same time, smaller β, with the associated larger αj's, produces a smaller Kullback–Leibler information (the second term in Equation (13)) between the posterior PDF and the prior PDF, indicating that less information is extracted from the measurements ŷ by the updated model and that the data-fit term is penalized less by the positive second term in Equation (13). On the other hand, larger β produces a model that fits the measurements with smaller error (larger data-fit measure in Equation (13)), but the model is under-sparse (more nonzero terms in Equation (6)) and so its relative entropy (second term in Equation (13)) is large and penalizes the data-fit more. In both cases, smaller and larger β, the models give a trade-off between data-fitting and model complexity (more sparseness corresponds to less model complexity) that may not be the optimal one that maximizes the log evidence in Equation (13). The learning of the hyperparameters α and β by maximizing the evidence function p(ŷ|α, β) as in Equation (12) produces the optimal trade-off that causes many hyperparameters αj to approach infinity with a reasonably large value of β, giving a model w that is both sufficiently sparse and fits the data vector ŷ well; that is, it gives the best balance between data-fitting and
model complexity. We can also say that SBL automatically penalizes models that "under-fit" or "over-fit" the associated data ŷ. This is the Bayesian Ockham Razor at work.

Remark 3.1. We have found that the SBL algorithm with ARD suffers from a robustness problem if the number of measurements No is much smaller than the number of model parameters Np: there are local maxima for Equation (12) that may trap the hyperparameter optimization, leading to non-robust Bayesian updating results (Huang et al., 2014). Several robustness-enhancement algorithms (Huang et al., 2014, 2016) have been developed based on different strategies, with the goal of increasing signal reconstruction accuracy in compressive sensing for structural health monitoring signals.

4 APPLYING SBL TO SYSTEM IDENTIFICATION

4.1 Hierarchical Bayesian model class for system identification

where Kj ∈ R^(Nd×Nd), j = 1, ..., Nθ, is the prior choice of the jth substructure stiffness matrix and the corresponding stiffness scaling parameter θj is a factor that allows modification of the nominal jth substructure stiffness so that it is more consistent with the real structure behavior. The stiffness matrices Kj could come from a finite-element model of the structure; then it would be appropriate to choose all θj = 1 to give the most probable value a priori for the parameter vector θ ∈ R^Nθ. For damage detection purposes, we will exploit the fact that damage-induced stiffness reductions typically occur in a small number of locations in the absence of structural collapse, and so the potential change in θ compared with that of a reference calibration stage is expected to be a sparse vector with relatively few nonzero components.

The following joint prior PDF for the system parameters ω² and φ and the stiffness scaling parameters θ is chosen (Huang and Beck, 2015):

p(ω², φ, θ|β) ∝ (2π/β)^(−Nm Nd/2) exp( −(β/2) Σ_{m=1}^{Nm} ||(K(θ) − ωm² M) φm||² )    (15)
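The exponent in Equation (15) is the total eigenequation error Σm ||(K(θ) − ωm²M)φm||², which vanishes when (ω², φ) are exact modal parameters of K(θ) and grows when θ is perturbed. A toy check for a 3-DOF shear building follows; the substructure matrices Kj, the stiffness value k, and the identity mass matrix are all hypothetical choices for illustration:

```python
import numpy as np

# 3-DOF shear building: K(theta) = theta_1*K1 + theta_2*K2 + theta_3*K3 (hypothetical)
k = 1000.0
K1 = k * np.array([[1.0, 0, 0], [0, 0, 0], [0, 0, 0]])            # 1st-story contribution
K2 = k * np.array([[1.0, -1, 0], [-1, 1, 0], [0, 0, 0]])          # 2nd-story contribution
K3 = k * np.array([[0.0, 0, 0], [0, 1, -1], [0, -1, 1]])          # 3rd-story contribution
M = np.eye(3)                                                     # unit mass matrix

def K_of(theta):
    return theta[0] * K1 + theta[1] * K2 + theta[2] * K3

def eig_error(theta, w2, phi):
    # Sum over modes of ||(K(theta) - w2_m * M) @ phi_m||^2, the exponent sum in Eq. (15)
    return sum(np.sum(((K_of(theta) - w2m * M) @ phim) ** 2)
               for w2m, phim in zip(w2, phi.T))

theta = np.array([1.0, 1.0, 1.0])
w2, phi = np.linalg.eigh(K_of(theta))       # exact modal parameters of K(theta) (M = I)

assert eig_error(theta, w2, phi) < 1e-10    # exact modes: zero eigenequation error
assert eig_error(np.array([0.8, 1.0, 1.0]), w2, phi) > 1.0   # stiffness loss raises it
```

This is why Equation (15) acts as a soft constraint tying the modal parameters to the stiffness model: larger β concentrates the prior on triples that nearly satisfy the eigenequation.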
more or less information, respectively, from the system modal parameters ω² and φ, and so from the "measured" modal data ω̂² and ψ̂, which can be seen from the hierarchical model in Figure 1). This process suppresses the occurrence of false and missed alarms for stiffness reductions.

Remark 4.2. It was found that the trade-off in the quantitative Bayesian Ockham Razor stated in Subsection 3.1 is sensitive to the selection of the equation-error precision parameter β. This motivated us to develop a more sophisticated method, described in the next subsection, to provide a fuller treatment of the posterior uncertainties.

Remark 4.3. In the fast SBL algorithm, the use of the pseudo-data θ̂u is based on the assumption that it is a unique MAP estimate from the calibration stage, due to the large amount of time-domain vibration data and identified modal parameters that can be collected at this stage. In the next subsection, we relax this assumption by explicitly considering the posterior uncertainty of θu from the calibration stage, in case there is insufficient data to get a posterior on θu that is highly peaked at θ̂u.

4.3 SBL algorithms using GS

4.3.1 Partial GS combined with Laplace approximations. The goal of the algorithm presented here is to provide a fuller treatment of the posterior uncertainty by employing MCMC simulation methods, so that the Laplace approximations in the fast SBL algorithm that involve the system modal parameters {ω², φ} and the equation-error precision parameter β can be avoided. We implement GS to draw posterior samples from p(φ, ω², θ, β|ω̂², ψ̂, θ̂u) by decomposing the whole model parameter vector into four groups and repeatedly sampling from one parameter group conditional on the other three groups and the available data. The effective dimension is then four, rather than the much higher total number of model parameters. Laplace's approximation is used for the integrals that marginalize the hyperparameters from the posterior PDF, as in Equations (19) and (20).

In this GS method, the conditional posterior PDFs

p(φ|ω̂², ψ̂, θ̂u, ω², θ, β) = p(φ|ψ̂, ω², θ, β),
p(ω²|ω̂², ψ̂, θ̂u, φ, θ, β) = p(ω²|ω̂², φ, θ, β),
p(θ|ω̂², ψ̂, θ̂u, φ, ω², β) = p(θ|θ̂u, φ, ω², β),
and p(β|ω̂², ψ̂, θ̂u, φ, ω², θ)

are successively sampled to generate samples from the full posterior PDF p(φ, ω², θ, β|ω̂², ψ̂, θ̂u) when the number of samples n is sufficiently large (beyond burn-in) and the Markov chain created by the GS is ergodic (Gelman et al., 2013). From a practical point of view, GS is ergodic when the regions of high values of the full posterior PDF p(φ, ω², θ, β|ω̂², ψ̂, θ̂u) are effectively connected (corresponding to the model class being either globally identifiable or unidentifiable (Beck and Katafygiotis, 1998)), which means that sampling of the Markov chain can fully explore its stationary state when n is large, no matter how the GS algorithm is initialized.

To derive the generic form p(w1|ŷ, w2, w3, β) for the conditional posterior PDFs p(φ|ψ̂, ω², θ, β), p(ω²|ω̂², φ, θ, β) and p(θ|θ̂u, φ, ω², β), we derive the conditional prior PDFs p(φ|ω², θ, β), p(ω²|φ, θ, β) and p(θ|φ, ω², β) from Equation (15) and express them in the following general form:

p(w1|w2, w3, β) = N(w1 | (EᵀE)⁻¹Eᵀr, (βEᵀE)⁻¹)    (21)

where E and r are a matrix and a vector that depend only on the parameters w2 and w3, and β is the eigenequation-error precision parameter. Each choice of w1, w2, and w3 is a permutation of φ, ω² and θ.

Similarly, the likelihood functions in Equations (16), (17), and (18) can be written in the following general form:

p(ŷ|w1, κ) = N(ŷ|Θw1, L(κ))    (22)

where the vector ŷ ∈ R^(K×1) is the available data, either θ̂u, ψ̂ or ω̂²; w1 is θ, φ or ω²; Θ ∈ R^(K×N) is a matrix; and L(κ) ∈ R^(K×K) is a diagonal covariance matrix (the diagonal elements are composed of the components in κ). For Equations (16)–(18), the choice of κ is the vector α, and the scalars η and ρ, respectively.

The posterior PDF of w1 is computed by Bayes' Theorem:

p(w1|ŷ, w2, w3, β) ≈ p(w1|ŷ, w2, w3, β, κ̃) ∝ p(ŷ|w1, κ̃) p(w1|w2, w3, β)    (23)

where κ̃ = arg max p(κ|ŷ, w2, w3, β) and we have used Laplace's approximation to marginalize κ from the full posterior for w1. By combining the Gaussian prior p(w1|w2, w3, β) in Equation (21) and the Gaussian likelihood p(ŷ|w1, κ̃) in Equation (22), the Gaussian posterior PDF p(w1|ŷ, w2, w3, β) is obtained.

The conditional posterior PDF for β is derived as:

p(β|ω̂², ψ̂, θ̂u, φ, ω², θ) = Gamma(β|a0, b0)    (24)

where the shape parameter a0 and rate parameter b0 for the posterior gamma distribution on β are given by:

a0 = 1 + Nm Nd / 2    (25a)

b0 = b̃0 + Σ_{i=1}^{Nm} ||(K(θ) − ωi² M) φi||² / 2    (25b)
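Each model-parameter step of the GS draws from the Gaussian obtained by combining the conditional prior of Equation (21) with the likelihood of Equation (22), as in Equation (23). A generic sketch of that conjugate Gaussian update follows; the matrices E, Θ and the vectors r, ŷ are randomly generated stand-ins, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_conditional(E, r, beta, Theta, y_hat, L_diag):
    """Draw w1 from the Gaussian combining the prior N((E'E)^-1 E'r, (beta E'E)^-1)
    of Eq. (21) with the likelihood N(y_hat | Theta w1, diag(L_diag)) of Eq. (22)."""
    P0inv = beta * E.T @ E                        # prior precision matrix
    m0 = np.linalg.solve(E.T @ E, E.T @ r)        # prior mean (E'E)^-1 E' r
    Linv = 1.0 / L_diag                           # diagonal likelihood precisions
    post_prec = P0inv + Theta.T @ (Linv[:, None] * Theta)
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (P0inv @ m0 + Theta.T @ (Linv * y_hat))
    return rng.multivariate_normal(post_mean, post_cov), post_mean

# Toy dimensions (hypothetical): 4 unknowns, 6 prior rows, 5 observations
E = rng.normal(size=(6, 4))
r = rng.normal(size=6)
Theta = rng.normal(size=(5, 4))
y_hat = rng.normal(size=5)
w1, mean = sample_conditional(E, r, beta=10.0, Theta=Theta, y_hat=y_hat,
                              L_diag=np.full(5, 0.1))
```

Because prior and likelihood are both Gaussian and linear in w1, the draw is exact, which is what makes the grouped GS sweep over φ, ω² and θ tractable.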
b̃0 = arg max p(b0 |ω̂2 , ψ̂, θ̂u , φ, ω2 , θ) (25c) p(κ|ŷ, w1 ). Also, b0 is irrelevant in the condi-
tional posterior PDF p(w1 |ŷ, w2 , w3 , β, b0 , κ) =
The reader is referred to Huang et al. (2017b) for de- p(w1 |ŷ, w2 , w3 , β, κ) because w1 is independent of b0
tailed information of the SBL algorithm using GS, in- when β is given. Similarly, from Figure 1,
cluding the derivation of MAP values of the hyperpa- the conditional posterior PDF on β satisfies
rameters and the pseudocodes. p(β|ω̂2 , ψ̂, θ̂u , ω2 , φ, θ, ρ, η, α, b0 ) = p(β|φ, ω2 , θ, b0 ),
Remark 4.4. It is tractable to marginalize out the which is just the conditional prior on β. Finally, we
equation-error precision parameter β to remove assign a Gamma prior distribution, Gamma(b0 |ab , bb ),
it from the posterior distributions as a “nuisance” for b0 and using Figure 1, we see that the conditional
parameter, where the generic conditional posterior posterior for b0 is given by:
PDF becomes p(w1 |ŷ, w2 , w3 ) = ∫ p(w1 |ŷ, w2 , w3 , β)
p(β|ŷ, w2 , w3 )dβ, which is a Student’s t distribution p b0 |ω̂2 , ψ̂, θ̂u , φ, ω2 , θ, η, ρ, α, β
(28)
(Huang et al., 2017b). The Student’s t PDFs have heavier ∝ Gamma(b0 |1/2 + ab , β + bb )
tails than the Gaussian PDFs sampled in Algorithm 1
and so the algorithm is more robust to noise and outliers. The prior scale and rate parameters ab and bb for b0 are
selected to be very small values in the examples later.
4.3.2 Full GS procedure for SBL. In this proposed The straight-forward way to implement GS is to
method, the posterior uncertainties of all unknown pa- draw posterior samples from the joint posterior PDF
rameters are explicitly characterized by implementing p(ω2 , φ, θ, ρ, η, α, β, b0 |ω̂2 , ψ̂, θ̂u ) by decomposing the
GS to draw posterior samples from the joint posterior whole uncertain parameter vector into eight groups
PDF p(ω², φ, θ, β, ρ, η, α, b0|ω̂², ψ̂, θ̂u).

We have already introduced the conditional posterior PDFs for the model parameters that are needed for GS in the previous subsection. It remains to derive the conditional posterior PDFs for the hyperparameters. The prior PDF for the mth component κm of the hyperparameter vector κ in Equation (22) (equal to η, ρ or α) is chosen as a Gamma distribution with parameters am and bm, which gives a conjugate prior and allows exact posterior sampling of the hyperparameter κ. The posterior PDF of κ is obtained by using Bayes' Theorem:

p(κ|ŷ, w1, w2, w3, β, b0) = p(κ|ŷ, w1)
    ∝ p(ŷ|w1, κ) ∏_{m=1}^{M} p(κm|am, bm)        (26)
    ∝ ∏_{m=1}^{M} Gamma(κm|ãm, b̃m)

where, as before, w1 is θ, φ or ω², and the shape and rate parameters for the posterior Gamma distribution become:

ãm = am + trace(Xm)/2        (27a)

b̃m = bm + (ŷ − θw1)ᵀ Xm⁻¹ (ŷ − θw1)/2        (27b)

where K is the length of the data vector ŷ and Xm = ∂L1(κ)/∂κm. Inspired by Tipping (2001a), we fix the prior shape and rate parameters, am and bm, for κm to be very small values in the illustrative examples later, which makes these priors noninformative over a logarithmic scale. It is seen from Equation (26) that p(κ|ŷ, w1, w2, w3, β, b0) = p(κ|ŷ, w1), so GS can be implemented by decomposing the whole parameter vector into the eight groups {ω², φ, θ, α, ρ, η, β, b0} and then iterating over the groups by repeatedly sampling from the PDF of one parameter group conditional on the other seven groups and the available data.

If there are enough modal data ω̂² and ψ̂ available to provide sufficient information to constrain the updated stiffness parameters, we can produce a GS algorithm which allows a nonsparse set of stiffness changes to be inferred. In this case, the soft constraint provided by Equation (16) is dropped. The pseudocode for the algorithm is presented in Table 1, where we implement GS to draw posterior samples from p(ω², φ, θ, ρ, η, β, b0|ω̂², ψ̂) by decomposing the whole model parameter vector into the seven groups {ω², φ, θ, ρ, η, β, b0} and repeatedly sampling from one parameter group conditional on the other six groups and the available modal data ω̂² and ψ̂. The conditional posterior PDFs are readily derived from Equations (23)–(28). Note that Markov chain samples from the marginal posterior PDF for any parameter or any parameter group are obtained by simply examining the appropriate components of the joint posterior GS samples beyond the burn-in period.

However, when we incorporate the sparseness constraint for the stiffness changes as in Equation (16) and characterize the joint posterior p(ω², φ, θ, ρ, η, α, β, b0|ω̂², ψ̂, θ̂u) by a full GS procedure similar to the above, it becomes inefficient to converge to the stationary state of the joint posterior PDF; the Markov chain samples may be trapped in local maxima of the posterior PDF of the hyperparameter α due to the very large number of uncertain parameters to be inferred. Instead, we introduce a sequential Bayesian inference procedure to produce a more effective sampling method based on the hierarchical Bayesian model in Figure 1.
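The conjugate Gamma update in Equations (26) and (27) is what makes exact sampling of each hyperparameter group possible inside a GS sweep. The snippet below is a minimal sketch of that pattern on a toy scalar-precision model (all values illustrative, not the paper's actual conditionals): the shape grows with the number of residuals and the rate with a quadratic form in them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of the conjugate Gamma pattern of Eqs. (26)-(27):
# residuals r ~ N(0, 1/kappa), with a Gamma(a, b) prior on the precision
# hyperparameter kappa; very small a, b make the prior nearly
# noninformative over a logarithmic scale, as done in the paper.
a, b = 1e-4, 1e-4
kappa_true = 4.0
r = rng.normal(0.0, 1.0 / np.sqrt(kappa_true), size=2000)

# Conjugacy gives a Gamma posterior: the shape grows with the data count
# and the rate with a quadratic form in the residuals (cf. Eqs. (27a)-(27b)).
a_post = a + r.size / 2.0
b_post = b + (r @ r) / 2.0

# Exact posterior sampling of the hyperparameter, as used in one GS step.
kappa = rng.gamma(shape=a_post, scale=1.0 / b_post, size=5000)
print(round(kappa.mean(), 1))  # close to kappa_true
```

Because the conditional posterior is available in closed form, no accept/reject step is needed for this group, which is one reason the analytical conditionals behind the GS algorithm matter for efficiency.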
10 Huang & Beck
Table 1
Pseudocode of GS for generating posterior samples {(ω²)(n), φ(n), ρ(n), η(n), θ(n), β(n), b0(n)}, n = 1, ..., N1, conditional on modal data {ω̂², ψ̂}

1. Initialize the samples with ρ(0) = 100, η(0) = 100, β(0) = 100, b0(0) = 100, (ω²)(0) = Σ_{r=1}^{Ns} ω̂r²/Ns, and θ(0) = θ̂u, a chosen "calibration" value for the stiffness parameter vector
2. For n = 1 to N1
3.   Sample φ(n) ∼ p(φ|ψ̂, (ω²)(n−1), θ(n−1), η(n−1), β(n−1))
4.   Sample η(n) ∼ p(η|ψ̂, φ(n))
5.   Sample (ω²)(n) ∼ p(ω²|ω̂², φ(n), θ(n−1), β(n−1), ρ(n−1))
6.   Sample ρ(n) ∼ p(ρ|ω̂², (ω²)(n))
7.   Sample θ(n) ∼ p(θ|φ(n), (ω²)(n), β(n−1))
8.   Sample β(n) ∼ p(β|φ(n), (ω²)(n), θ(n), b0(n−1))
9.   Sample b0(n) ∼ p(b0|β(n))
10. End for
11. Samples {(ω²)(n), φ(n), ρ(n), η(n), θ(n), β(n), b0(n): n = 1, ..., N1} are obtained, which are consistent with the joint posterior p(ω², φ, ρ, η, θ, β, b0|ω̂², ψ̂)

Note: Based on the hierarchical model in Figure 1, variables and data sets are dropped in the conditioning if the parameter to be sampled is independent of them.

Table 2
Pseudocode of GS for generating posterior samples {θ(n), α(n), β(n), b0(n), φ(n), η(n), (ω²)(n), ρ(n)}, n = 1, ..., N2, conditional on modal data {ω̂², ψ̂} and calibration value θ̂u

1. Initialize the samples with αj(0) = 100 (j = 1, ..., Np) and β(0) = 100
2. Get samples {(ω²)(n), φ(n), ρ(n), η(n)}, n = 1, ..., N2, by ignoring the Markov chain samples for θ, β, and b0 obtained from the pseudocode in Table 1
3. For n = 1 to N2
4.   Sample θ(n) ∼ p(θ|θ̂u, φ(n), (ω²)(n), β(n−1), α(n−1))
5.   Sample α(n) ∼ p(α|θ̂u, θ(n))
6.   Sample β(n) ∼ p(β|φ(n), (ω²)(n), θ(n), b0(n−1))
7.   Sample b0(n) ∼ p(b0|β(n))
8. End for
9. For n = 1 to N2
10.  Sample φ(n) ∼ p(φ|ψ̂, (ω²)(n−1), θ(n), η(n−1), β(n))
11.  Sample η(n) ∼ p(η|ψ̂, φ(n))
12.  Sample (ω²)(n) ∼ p(ω²|ω̂², φ(n), θ(n), ρ(n), β(n))
13.  Sample ρ(n) ∼ p(ρ|ω̂², (ω²)(n))
14. End for
Samples {θ(n), α(n), β(n), b0(n), φ(n), η(n), (ω²)(n), ρ(n): n = 1, ..., N2} are obtained, which are consistent with the joint posterior p(θ, α, β, b0, ω², φ, ρ, η|ω̂², ψ̂, θ̂u)

Note: Based on the hierarchical model in Figure 1, variables and data sets are dropped in the conditioning if the parameter to be sampled is independent of them.
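The initialize/sweep/collect pattern of Tables 1 and 2 can be sketched with a deliberately simple two-group analogue (a Gaussian mean standing in for one model-parameter group and a precision standing in for a hyperparameter group; the model and all values here are illustrative assumptions, not the structural model of this article):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y ~ N(mu_true, 1/beta_true), mu_true = 2, beta_true = 4.
y = rng.normal(2.0, 0.5, size=500)
N = y.size
a_prior, b_prior = 1e-4, 1e-4   # near-noninformative Gamma prior on beta

n_steps, burn_in = 3000, 500
mu, beta = 0.0, 1.0             # step 1: initialize the chain
chain = np.empty((n_steps, 2))
for n in range(n_steps):        # the GS sweep over the parameter groups
    # Sample mu from p(mu | beta, y): Gaussian (flat prior on mu assumed).
    mu = rng.normal(y.mean(), 1.0 / np.sqrt(N * beta))
    # Sample beta from p(beta | mu, y): conjugate Gamma conditional.
    resid = y - mu
    beta = rng.gamma(a_prior + N / 2.0,
                     1.0 / (b_prior + (resid @ resid) / 2.0))
    chain[n] = mu, beta

# Marginal posterior samples for any group: examine the appropriate
# components of the joint chain beyond the burn-in period.
mu_s, beta_s = chain[burn_in:, 0], chain[burn_in:, 1]
print(round(mu_s.mean(), 1), round(beta_s.mean(), 1))
```

The seven- and eight-group sweeps of Tables 1 and 2 have exactly this shape, with each "Sample ..." line replaced by the corresponding analytical conditional PDF from Equations (23)–(28).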
Using the probability product rule, the marginal posterior PDF for {θ, α, β, b0} can be expressed as:

p(θ, α, β, b0|ω̂², ψ̂, θ̂u)
    ≈ ∫ p(θ, α, β, b0|ω², φ, ρ, η, θ̂u) p(ω², φ, ρ, η|ω̂², ψ̂) dω² dφ dρ dη        (29)

where, from Figure 1, the first factor is independent of ω̂² and ψ̂, whereas the second factor is approximated to be independent of θ̂u. Based on Monte Carlo estimation of the integral in Equation (29), samples can be obtained from the marginal posterior PDF p(θ, α, β, b0|ω̂², ψ̂, θ̂u) by first sampling from the marginal posterior p(ω², φ, ρ, η|ω̂², ψ̂), and then using each of these samples to sample from the conditional posterior PDF p(θ, α, β, b0|ω², φ, ρ, η, θ̂u). To generate the samples {(ω²)(n), φ(n), ρ(n), η(n)}, n = 1, ..., N, conditional on ω̂² and ψ̂, we can use the pseudocode in Table 1 and ignore the samples for θ, β and b0 to automatically get samples from the marginal PDF p(ω², φ, ρ, η|ω̂², ψ̂). Then we implement GS to collect the samples {θ(n), α(n), β(n), b0(n)}, n = 1, ..., N, generated from the PDF p(θ, α, β, b0|(ω²)(n), φ(n), ρ(n), η(n), θ̂u), where we incorporate the nth sample {(ω²)(n), φ(n), ρ(n), η(n)} in the nth iteration sequentially. We then express the full joint posterior PDF for the whole uncertain parameter vector as:

p(ω², φ, θ, ρ, η, α, β, b0|ω̂², ψ̂, θ̂u)
    = p(ω², φ, ρ, η|θ, α, β, b0, ω̂², ψ̂, θ̂u) p(θ, α, β, b0|ω̂², ψ̂, θ̂u)        (30)

To characterize the full posterior uncertainty, we then take the previously generated samples {θ(n), α(n), β(n), b0(n)}, n = 1, ..., N, and draw posterior samples {(ω²)(n), φ(n), ρ(n), η(n)}, n = 1, ..., N, from the conditional posterior PDF p(ω², φ, ρ, η|θ(n), α(n), β(n), b0(n), ω̂², ψ̂, θ̂u), n = 1, ..., N, by using GS. The reader can refer to Table 2 for the pseudocode of the full GS algorithm, which generates the posterior samples {θ(n), α(n), β(n), b0(n), (ω²)(n), φ(n), ρ(n), η(n)}, n = 1, ..., N, conditional on ω̂², ψ̂ and θ̂u.

Remark 4.5. The analytical derivation of the conditional posterior PDFs from Equations (23)–(28) is important for the effectiveness of this GS algorithm. The model parameters are arranged in eight groups represented by {ω², φ, θ, α, ρ, η, β, b0}, which leads to a very desirable feature that it should be applicable to linear Bayesian model updating problems of arbitrarily high dimensions,
Full Gibbs sampling procedure for Bayesian system identification incorporating sparse Bayesian learning 11
in contrast with other MCMC algorithms, because it is effectively like an eight-dimensional updating problem.

Remark 4.6. For the updating of the stiffness scaling parameter θ and the system modal parameters ω² and φ, the corresponding model classes M(α, β), M(ρ, β), and M(η, β) are investigated, as seen from the hierarchical Bayesian model in Figure 1. The application of Bayes' Theorem at the model class level automatically penalizes models of θ (ω² or φ) that "under-fit" or "over-fit" the associated data θ̂u (ω̂² or ψ̂), therefore obtaining reliable updating results for the three parameter vectors, which is the Bayesian Ockham Razor at work (Beck, 2010).

Remark 4.7. In the implementation of GS algorithms, the burn-in period before the Markov chain reaches its stationary state is determined by visual inspection of a plot of the Markov chain samples as they are sequentially generated, following the strategy in Ching et al. (2006). To determine how many Markov chain samples to generate after burn-in is achieved, we visually check the convergence of the plotted stiffness samples of interest.

Remark 4.8. For damage detection and assessment, we can follow Huang et al. (2017b), where the posterior uncertainty of θu from the calibration stage can be incorporated in the posterior sampling of the stiffness scaling parameter θd for the monitoring stage.

Remark 4.9. Much more computing resources are required for the GS SBL algorithms than for the fast SBL algorithm, which is the cost of better posterior uncertainty quantification. Therefore, the choice between these two methods in real applications is a trade-off between the computation time and the level of accuracy of the uncertainty quantification that the user is willing to accept.

Table 3
A comparison among the three SBL algorithms for detection and quantification of sparse substructural stiffness changes

4.4 Comparison among the three algorithms

In Table 3, we compare the three algorithms presented above with respect to several aspects of the theory and computation. It is seen that the full GS SBL algorithm can be applied without any concern, unlike the other two algorithms. In addition, the full GS algorithm is also capable of characterizing the full posterior uncertainty of all uncertain parameters in the hierarchical model in Figure 1. This is important because the Laplace approximation for the optimization of hyperparameters in the fast SBL and partial GS algorithms is based on the assumption that the posterior of these hyperparameters has a unique maximum with a sharp peak, which may not be the case, especially when using noisy incomplete data from a complicated SHM field environment. However, the full GS procedure presented in this section is still applicable, even when the model class is not globally identifiable. If the posterior uncertainty of some of the stiffness parameters is large, however, it means that the modal data are not sufficiently informative for reliable inference, even with the inducement of sparse stiffness changes. In such a case, we will have much less confidence in the identification of the corresponding substructure stiffness reductions and we will be left with significant uncertainty about damage in those substructures.

The illustrative results given later show that the full GS algorithm outperforms the other two algorithms. We have explicitly addressed several important issues to enhance reliable application of the full GS algorithm to real structural data. We discuss these issues in this section.

4.4.1 Producing model sparseness. With regard to the generation of model sparseness in the stiffness change (θ − θ̂u), both the full and partial GS algorithms treat the uncertainty in the calibration value θ̂u because it may be difficult to get an "accurate" structural model based on the available data in the calibration stage. This
is not done in the standard SBL algorithms. However, the partial GS algorithm ignores the posterior uncertainty in the hyperparameter α because it learns the hyperparameter values by maximization of the evidence function, which induces many αj to approach infinity. The full posterior uncertainty of the hyperparameter vector α is treated in the full GS algorithm by sampling from its posterior PDF, which is derived from the conditional dependency in the hierarchical model in Figure 1. This is a new strategy for inducing model sparseness that avoids the optimization procedure for evidence maximization. This is useful because we have found that there are local maxima of the evidence function that may trap the hyperparameter optimization if the data quantity and quality are insufficient, leading to non-robust Bayesian updating results (Huang et al., 2014). We will show later in some illustrative results that the full GS algorithm also promotes model sparseness well.

and system modal parameter estimation because the uncertainties of the hyperparameters are not explicitly treated.

Remark 4.10. In real applications with various environmental and operational conditions, if we do not have any information about these conditions, prediction errors with large variances are required to give model predictions. This will lead to large posterior uncertainty for the stiffness parameters. In this case, the proposed full GS SBL algorithm can be applied without any concern because it is capable of characterizing the full posterior uncertainty of all uncertain parameters.

5 ILLUSTRATIVE RESULTS USING THE PROPOSED FULL GS ALGORITHM

Table 4
Considered damage pattern cases
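Before turning to the illustrative results, the sparseness mechanism discussed in Section 4.4.1 can be made concrete with a toy sketch. It assumes the standard ARD conjugate structure (each change θj − θ̂u,j given a Gaussian pseudo-prior N(0, 1/αj) with a Gamma(a, b) hyperprior); this follows the spirit of the full GS strategy but is not necessarily the exact conditional used in this article. The point is that αj is sampled from its conditional Gamma posterior rather than driven to infinity by evidence maximization:

```python
import numpy as np

rng = np.random.default_rng(2)

# ARD sketch (assumed toy model): for each substructure j, the stiffness
# change d_j = theta_j - theta_hat_u_j has pseudo-prior N(0, 1/alpha_j)
# and alpha_j ~ Gamma(a, b). The conditional posterior of alpha_j given
# the current d_j sample is then Gamma(a + 1/2, b + d_j**2 / 2).
a, b = 1e-4, 1e-4
d = np.array([0.0, 0.0, -0.30, 0.0])   # one "damaged" substructure

alpha = rng.gamma(a + 0.5, 1.0 / (b + d**2 / 2.0), size=(10_000, d.size))
alpha_med = np.median(alpha, axis=0)
print(alpha_med.round(1))
# Unchanged components draw very large alpha (their changes are pinned
# near zero), while the truly changed component keeps a moderate alpha;
# sampling preserves this posterior uncertainty instead of sending
# alpha_j to infinity deterministically.
```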
Fig. 4. Post burn-in samples for some posterior stiffness parameters for the Configuration 5 scenario, plotted in: (a) and (c) {θ1,+x, θ1,+y}; (b) and (d) {θ1,−x, θ1,−y} spaces, by running: (a) and (b) partial GS SBL algorithm; (c) and (d) full GS SBL algorithm.

Fig. 5. (a) and (b) MAP values; (c) and (d) post burn-in samples for hyperparameter α for the Configuration 5 scenario, plotted in: (a) and (c) {α1,+x, α1,+y}; (b) and (d) {α1,−x, α1,−y} spaces, by running: (a) and (b) partial GS SBL algorithm; (c) and (d) full GS SBL algorithm.

Fig. 7. Configuration 3 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The damaged substructures correspond to θ1,−y, θ2,−y, θ3,−y and θ4,−y.

Fig. 8. Configuration 4 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The damaged substructures correspond to θ1,−y and θ4,−y.

In real implementations, there is a concern that when implementing the partial GS SBL algorithm, there may be some local maxima that trap the optimization of the hyperparameters {α, ρ, η, b0} and hence reduce the robustness of the algorithm. We observed this type of behavior when applying SBL for Bayesian compressive sensing by optimization of the evidence with respect to the hyperparameters of the hierarchical model (Huang et al., 2014, 2016). This concern is alleviated for the full GS SBL algorithm because its correct characterization of the posterior uncertainty of all uncertain parameters is expected to help the algorithm escape being trapped in local maxima.

Fig. 9. Configuration 5 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The only damaged substructure corresponds to θ1,−y.

Fig. 10. Configuration 6 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The only damaged substructure corresponds to θ2,+x.

6 CONCLUDING REMARKS

Probability as a logic provides a rigorous framework for a Bayesian approach to quantifying modeling uncertainty in model updating in system identification.
It allows plausible reasoning about structural behavior based on incomplete information. A key concept is a stochastic system model class, which defines the fundamental probability models that allow robust stochastic structural analyses to be performed. Such a model class can be constructed by stochastic embedding of any deterministic model of the structure's I/O behavior. One distinguishing aspect of the Bayesian framework is marginalization of posterior PDFs, where instead of seeking to estimate all "nuisance" parameters in the models, the goal is to integrate them out to properly preserve their contribution to the posterior uncertainty of the parameters of interest. This allows us to assess the relative plausibility of each model within a set of candidate model classes chosen to represent the uncertain structural behavior. Applying Bayes' Theorem at the model class level automatically penalizes models that are too simple ("under-fit" the data) and too complex ("over-fit" the data), which is the Bayesian Ockham Razor at work. This quantitative implementation of Ockham's Razor is a natural consequence of applying Bayesian updating at the model class level.

SBL is an effective strategy to incorporate sparseness during model updating by introducing a hierarchical model and automatically implementing the Bayesian Ockham Razor with respect to the hyperparameters that control the sparseness. The focus in this article is applying SBL with ARD in Bayesian system identification based on noisy incomplete modal data, where we can impose spatially sparse stiffness changes when updating a structural model. In general, the ARD approach allows model class selection by suppressing terms in a linear-in-the-parameters expansion of possible terms, where each subset of terms could form a separate model class. It does this by inducing a sparse parameter vector during updating. In this work, a modified ARD approach is used during the updating to induce sparseness in the change in the stiffness scaling parameters θ from the values appropriate for the original undamaged structure to those for the current, potentially damaged structure. This reduces the potential ill-conditioning to provide more reliable identification results.

Our recently developed fast SBL algorithm and partial GS SBL algorithm have been briefly reviewed in the article. Based on a similar hierarchical SBL model, we improved our SBL theory for system identification and developed a full GS procedure to provide a full characterization of the posterior uncertainty of all unknown parameters and hyperparameters, which is especially useful when the model class is not globally identifiable because the Laplace approximations in the earlier SBL algorithms are then not accurate. An efficient sampling strategy for the full GS SBL algorithm has also been developed. The full and partial GS algorithms differ in their strategies to deal with the posterior uncertainty of the hyperparameters. A comparative study of the effectiveness and robustness of the presented algorithms was performed through the analysis of the IASC–ASCE Phase II experimental benchmark problem to identify brace damage. The results suggest that the full GS SBL algorithm is more reliable than the other two SBL algorithms when using real experimental data where there is significant modeling error. This superior performance is attributed to its more robust promotion of model sparseness and more accurate posterior uncertainty quantification for the stiffness parameters and system modal parameters.

ACKNOWLEDGMENTS

Yong Huang was supported by the George W. Housner Earthquake Engineering Research fund at the California Institute of Technology and grants from the National Natural Science Foundation of China (Nos. 51778192 and 51308161). This support is gratefully acknowledged.

REFERENCES

Amezquita-Sanchez, J. P., Park, H. S. & Adeli, H. (2017), A novel methodology for modal parameters identification of large smart structures using music, empirical wavelet transform, and Hilbert transform, Engineering Structures, 147, 148–59.
Beck, J. L. (2010), Bayesian system identification based on probability logic, Structural Control and Health Monitoring, 17, 825–47.
Beck, J. L. & Katafygiotis, L. S. (1998), Updating models and their uncertainties. I: Bayesian statistical framework, Journal of Engineering Mechanics, 124, 455–61.
Beck, J. L. & Yuen, K. V. (2004), Model selection using response measurements: a Bayesian probabilistic approach, Journal of Engineering Mechanics, 130, 192–203.
Chiachio, M., Beck, J. L., Chiachio, J. & Guillermo, R. (2014), Approximate Bayesian computation by subset simulation, SIAM Journal on Scientific Computing, 36(3), A1339–58.
Ching, J. & Beck, J. L. (2003), Two-Step Bayesian Structure Health Monitoring Approach for IASC-ASCE Phase II Simulated and Experimental Benchmark Studies, Technical Report EERL 2003-02, Earthquake Engineering Research Laboratory, California Institute of Technology, Pasadena, CA.
Ching, J. & Chen, Y. (2007), Transitional Markov Chain Monte Carlo method for Bayesian model updating, model class selection and model averaging, Journal of Engineering Mechanics, 133, 816–32.
Ching, J., Muto, M. & Beck, J. L. (2006), Structural model updating and health monitoring with incomplete modal data using Gibbs sampler, Computer-Aided Civil and Infrastructure Engineering, 21(4), 242–57.
Ching, J., Phoon, K.-K., Beck, J. L. & Huang, Y. (2017), Identifiability of geotechnical site-specific trend functions, ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, 3(4), 04017021.
Dyke, S. J., Bernal, D., Beck, J. L. & Ventura, C. (2003), Experimental phase II of the structural health monitoring benchmark problem, in Proceedings of the 16th Engineering Mechanics Conference, ASCE, Reston, VA.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013), Bayesian Data Analysis, 3rd edn., Chapman & Hall/CRC, Boca Raton, FL.
Green, P. L., Cross, E. J. & Worden, K. (2015), Bayesian system identification of dynamical systems using highly informative training data, Mechanical Systems and Signal Processing, 56–57, 109–22.
Gull, S. F. (1988), Bayesian inductive inference and maximum entropy, in G. J. Erickson and C. R. Smith (eds.), Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 53–74.
Huang, Y. & Beck, J. L. (2015), Hierarchical sparse Bayesian learning for structural health monitoring with incomplete modal data, International Journal for Uncertainty Quantification, 5(2), 139–69.
Huang, Y., Beck, J. L. & Li, H. (2017a), Hierarchical sparse Bayesian learning for structural damage detection: theory, computation and application, Structural Safety, 64, 37–53.
Huang, Y., Beck, J. L. & Li, H. (2017b), Bayesian system identification based on hierarchical sparse Bayesian learning and Gibbs sampling with application to structural damage assessment, Computer Methods in Applied Mechanics and Engineering, 318, 382–411.
Huang, Y., Beck, J. L., Wu, S. & Li, H. (2014), Robust Bayesian compressive sensing for signals in structural health monitoring, Computer-Aided Civil and Infrastructure Engineering, 29(3), 160–79.
Huang, Y., Beck, J. L., Wu, S. & Li, H. (2016), Bayesian compressive sensing for approximately sparse signals and application to structural health monitoring signals for data loss recovery, Probabilistic Engineering Mechanics, 46, 62–79.
Jaynes, E. T. (1983), Papers on Probability, Statistics and Statistical Physics, R. D. Rosenkrantz (ed.), D. Reidel Publishing, Dordrecht, Holland.
Jaynes, E. T. (2003), Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK.
Jefferys, W. H. & Berger, J. O. (1992), Ockham's razor and Bayesian analysis, American Scientist, 80, 64–72.
Li, Z., Park, H. S. & Adeli, H. (2017), New method for modal identification and health monitoring of superhighrise building structures using discretized synchrosqueezed wavelet and Hilbert transforms, The Structural Design of Tall and Special Buildings, 26(3), 1312–28.
Mackay, D. J. C. (1992), Bayesian methods for adaptive models, Ph.D. thesis in Computation and Neural Systems, California Institute of Technology, Pasadena, CA.
Mu, H. Q. & Yuen, K. V. (2017), Novel sparse Bayesian learning and its application to ground motion pattern recognition, Journal of Computing in Civil Engineering, 31(5), https://doi.org/10.1061/(ASCE)CP.1943-5487.0000668.
Oh, B. K., Kim, D. & Park, H. S. (2017), Modal response-based visual system identification and model updating methods for building structures, Computer-Aided Civil and Infrastructure Engineering, 32(1), 34–56.
Oh, C. K., Beck, J. L. & Yamada, M. (2008), Bayesian learning using automatic relevance determination prior with an application to earthquake early warning, Journal of Engineering Mechanics, 134(12), 1013–20.
Papadimitriou, C., Beck, J. L. & Katafygiotis, L. S. (2001), Updating robust reliability using structural test data, Probabilistic Engineering Mechanics, 16, 103–13.
Perez-Ramirez, C. A., Amezquita-Sanchez, J. P., Adeli, H., Valtierra-Rodriguez, M., Camarena-Martinez, D. & Rene Romero-Troncoso, R. J. (2016), New methodology for modal parameters identification of smart civil structures using ambient vibrations and synchrosqueezed wavelet, Engineering Applications of Artificial Intelligence, 48, 1–16.
Shan, J., Ouyang, Y., Yuan, H. & Shi, W. (2016), Seismic data driven identification of linear physical models for building structures using performance and stabilizing objectives, Computer-Aided Civil and Infrastructure Engineering, 31(11), 846–70.
Tipping, M. E. (2000), The relevance vector machine, in S. A. Solla, T. K. Leen, and K.-R. Müller (eds.), Advances in Neural Information Processing Systems 12, MIT Press, Cambridge, MA, pp. 652–58.
Tipping, M. E. (2001a), Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research, 1, 211–44.
Tipping, M. E. (2001b), Sparse kernel principal component analysis, in Advances in Neural Information Processing Systems, vol. 13, MIT Press, Cambridge, MA, pp. 633–39.
Vakilzadeh, M. K., Huang, Y., Beck, J. L. & Abrahamsson, T. (2017), Approximate Bayesian computation by subset simulation using hierarchical state-space models, Mechanical Systems and Signal Processing, 84(Part B), 2–20.
Vanik, M. W., Beck, J. L. & Au, S. K. (2000), Bayesian probabilistic approach to structural health monitoring, Journal of Engineering Mechanics-ASCE, 126(7), 738–45.
Wang, Y. & Zhao, T. (2017), Statistical interpretation of soil property profiles from sparse data using Bayesian compressive sampling, Geotechnique, 67(6), 523–36.
Yuen, K. V. & Katafygiotis, L. S. (2001), Bayesian time-domain approach for modal updating using ambient data, Probabilistic Engineering Mechanics, 16(3), 219–31.