
Computer-Aided Civil and Infrastructure Engineering 00 (2018) 1–19

Full Gibbs Sampling Procedure for Bayesian System Identification Incorporating Sparse Bayesian Learning with Automatic Relevance Determination
Yong Huang*
Key Lab of Structures Dynamic Behavior and Control of the Ministry of Education, School of Civil Engineering, Harbin
Institute of Technology and Key Lab of Smart Prevention and Mitigation for Civil Engineering Disasters of the Ministry
of Industry and Information, Harbin Institute of Technology, Harbin, China

&

James L. Beck
Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA

Abstract: Bayesian system identification has attracted substantial interest in recent years for inferring structural models and quantifying their uncertainties based on measured dynamic response in a structure. The relative plausibility of each structural model in a specified model class is quantified by its posterior probability from Bayes' Theorem. The relative plausibility of each model class within a set of candidate model classes for the structure can also be assessed via Bayes' Theorem. Computation of this posterior probability over all candidate model classes automatically applies a quantitative Ockham's razor that trades off a data-fit measure with an information-theoretic measure of model complexity, which penalizes model classes that "over-fit" the data. In this article, we first present a general Bayesian system identification framework and point out that combining it with sparse Bayesian learning (SBL) is an effective strategy to implement the Bayesian Ockham razor. Then we review our recent progress in exploring SBL with the automatic relevance determination likelihood concept to detect and quantify spatially sparse substructure stiffness reductions. To characterize the full posterior uncertainty for this problem, an improved Gibbs sampling procedure for SBL is then developed. Finally, illustrative results are provided to compare the performance and validate the capability of the presented SBL algorithms for structural system identification.

*To whom correspondence should be addressed. E-mail: huangyong@hit.edu.cn and huangyongthere@outlook.com.

© 2018 Computer-Aided Civil and Infrastructure Engineering. DOI: 10.1111/mice.12358

1 INTRODUCTION

In the last decade, worldwide efforts to implement structural health monitoring (SHM) systems on civil infrastructure have been rewarded by an increasing number of collected data sets of structural vibration response. Despite the importance of these structural vibration records for understanding the behavior and performance of real, full-scale structures under real environmental conditions, analysis of these data has lagged far behind their rate of collection. Systematic computer-based information extraction techniques, such as those developed in system identification research, are a key component in model-based inversions in SHM for detection and assessment of damage. In system identification, the goal is to use observed structural response data and prior knowledge about the structure to update mathematical models of the behavior of a system such as a bridge or building when subject to dynamic excitation. In addition to SHM, the goals of such data-informed modeling might also include providing a better understanding of the structural system's behavior and allowing more accurate predictions of its future response to specified excitations. Despite a long history, the development of algorithms for system identification continues to be an active area of research in structural dynamics (e.g., Green et al., 2015; Shan et al., 2016; Perez-Ramirez et al., 2016; Huang et al., 2017b; Vakilzadeh et al., 2017; Oh et al., 2017; Li et al., 2017; Amezquita-Sanchez et al., 2017).

One of the main difficulties in system identification is that it is impossible to exactly model the full behavior of a structure by using the limited sensor data and prior knowledge available. As any model gives an approximation to the real system behavior, there are always modeling uncertainties involved; for example, what values of the model parameters are appropriate and how well does the model predict the real system response? Another difficulty is that for complex system models, single-point parameter estimation often gives nonunique results (e.g., multiple least-squares or maximum likelihood estimates). To make more robust predictions, one should track all plausible values of the parameters based on the data and also explicitly treat the uncertain prediction errors (the difference between the response of the real system and that of the system model), as well as possible measurement errors. These issues have motivated numerous researchers to tackle the problem of structural system identification from a Bayesian perspective (e.g., Beck, 2010; Green et al., 2015; Huang et al., 2017b).

In contrast to the point estimates of the parameters used in the conventional deterministic or frequentist-probabilistic methods, the Bayesian probabilistic framework uses Bayes' Theorem to quantify the relative plausibility, based on the data, of all possible values of the model parameters via their posterior probability density function (PDF). This procedure is used to learn about all plausible models for representing the system's behavior, where each parameter value specifies a possible model for the system. As there is always uncertainty in which parameterized model class to choose to represent a system, one can also choose a set of candidate model classes and calculate their posterior probability based on the data by applying Bayes' Theorem at the model class level. An information-theoretic interpretation (Beck, 2010) shows that the posterior probability of each model class depends on the difference between a measure of the average data-fit of the model class and the amount of information extracted from the data by the model class, which penalizes model classes that "over-fit" the data. Comparing the posterior probability of each model class therefore provides a quantitative Ockham (Occam) Razor (Gull, 1988; Jefferys and Berger, 1992; Mackay, 1992), that is, loosely speaking, models should be no more complex than is sufficient to explain the data.

Sparse Bayesian learning (SBL) is a method that uses this Bayesian Ockham Razor to induce sparseness during parameter estimation and model updating. By introducing a hierarchical Bayesian model class, this machine learning technique is able to select automatically a sparse subset of all the uncertain parameters solely from the available data. SBL with an automatic relevance determination (ARD) prior was originally introduced for the relevance vector machine (Tipping, 2000) and sparse principal component analysis (Tipping, 2001b). It has been used recently in earthquake engineering (Mu and Yuen, 2017), compressive sensing of SHM signals (Huang et al., 2014), and geotechnical engineering (Ching et al., 2017; Wang and Zhao, 2017). A key feature in using SBL with the ARD prior is that the model of the observations is a linear function of those model parameters for which sparseness is to be enforced. This is not the case, however, when updating the stiffness parameters in a structural model.

Recently, we introduced SBL into Bayesian system identification by incorporating the concept of ARD through a likelihood function rather than through a prior as in the original SBL approach (Tipping, 2001a). Our procedure gives an effective implementation of the Bayesian Ockham Razor and automatically induces sparse changes in model updating, which reduces the ill-conditioning in inferring the updated stiffness parameter values. In a series of three papers, we have introduced and then improved our SBL theory for system identification by finding ways to remove approximations, so that we can better characterize the posterior uncertainty of the parameters and hyperparameters (Huang and Beck, 2015; Huang et al., 2017a, b). In this work, we further improve our SBL approach by providing a full characterization of the posterior uncertainty rather than just using maximum a posteriori (MAP) values for the hyperparameters, which is accomplished by developing an SBL procedure for Bayesian system identification based on full Gibbs sampling (GS).

The remainder of the article is organized as follows. In Section 2, we present a general framework for Bayesian system identification. SBL with the ARD prior is then introduced together with the Bayesian Ockham Razor in Section 3, with a discussion of why they induce sparseness during Bayesian updating. In Section 4, we give an overview of our recent progress in developing SBL algorithms for system identification and propose a new full GS based SBL algorithm to fully characterize the posterior uncertainty. We then compare our new SBL algorithm with two previous SBL algorithms of ours in terms of theory and computation. Applications of our Bayesian methods to structural data for a well-known experimental benchmark problem are presented in Section 5 to show the capability of these methods. Concluding remarks are made in Section 6.

2 BAYESIAN SYSTEM IDENTIFICATION

Consider the problem of predicting the output z(t) to some input u(t) of a real dynamic system over some time interval, t ∈ [0, t_f], by using a computational model of the system. We use u_n = u(nΔt) ∈ R^{N_I} and z_n = z(nΔt) ∈ R^{N_o} to denote the real system input and output, respectively, at discrete times t_n = nΔt, n ∈ Z+, and use u_{0:n} = [u_0^T, u_1^T, ..., u_n^T]^T and z_{0:n} = [z_0^T, z_1^T, ..., z_n^T]^T to denote the discrete-time histories of the system input and output up to time t_n.

2.1 Stochastic model class

In modeling the input and output (I/O) behavior of a real system, one cannot expect any chosen deterministic model to make perfect predictions, and the prediction errors of any such model will be uncertain. This motivates the introduction of a stochastic (or Bayesian) model class M (Beck, 2010) for a system that consists of a set of stochastic I/O predictive models (also called stochastic forward models) {p(z_{1:n}|u_{0:n}, w, M) : w ∈ W ⊂ R^{N_p}}, where each model is specified by a PDF valid for any n ∈ Z+, together with a chosen prior probability distribution p(w|M) over this set that quantifies the initial relative plausibility of each I/O probability model corresponding to each value of the parameter vector w. Any deterministic I/O model of a system that involves uncertain parameters can be used to construct such a model class for the system by stochastic embedding (Beck, 2010), in which the Principle of Maximum Information Entropy plays an important role (Jaynes, 1983, 2003) (see Equation (1) in the next subsection).

2.2 Bayesian updating for a given model class

If sensor data D_N = {û_{0:N}, ŷ_{1:N}} are available, where ŷ_{1:N} and û_{0:N} are the measured time histories of the system output and the corresponding measured system input (if available), respectively, sampled at time interval Δt, then a model can be developed to predict the measured system output y_n at each time t_n by using:

y_n = z_n + m_n = q_n(û_{0:n}, w) + e_n + m_n    (1)

where m_n and e_n denote the measurement noise and output prediction error at time t_n, and the system output equation z_n = q_n(û_{0:n}, w) + e_n is used, where q_n is the corresponding output of a parameterized deterministic model that can be based on theoretical principles (e.g., a finite element model). A probability model can be chosen for the I/O behavior by selecting a PDF for e_{1:n} that maximizes Shannon's entropy (a measure of uncertainty due to missing information) subject to some prior constraints. This procedure is called stochastic embedding of the parameterized deterministic model in Beck (2010). A probability model can also be chosen for the measurement error history m_{1:N} based on a separate study of the sensors, where m_{1:N} is taken to be probabilistically independent of the prediction error history e_{1:N}. This leads to a probability model p(y_{1:N}|û_{0:N}, w, M) for predicting the sensor output y_{1:N}. In many applications, m_n is negligible compared with e_n and so it can be dropped, that is, the difference between the measured system output y_n and the actual output z_n is ignored but not the difference between the real system and model outputs, z_n and q_n.

The data D_N can be used to update the relative plausibility of each stochastic I/O model p(y_{1:N}|û_{0:N}, w, M), w ∈ W ⊂ R^{N_p}, defined by the stochastic model class M, by computing the posterior PDF p(w|D_N, M) from Bayes' Theorem:

p(w|D_N, M) = p(D_N|w, M) p(w|M) / p(D_N|M) = c⁻¹ p(D_N|w, M) p(w|M)    (2)

where c = p(D_N|M) is the normalizing constant, which is called the evidence or marginal likelihood for the model class M given by data D_N; p(D_N|w, M), as a function of w, is the likelihood function, which expresses the probability of getting data D_N based on the PDF p(y_{1:N}|û_{0:N}, w, M) by substituting the measured output data ŷ_{1:N} for y_{1:N}. Note that a model class can be used to perform both prior (initial) and posterior (updated by using system sensor data) robust predictive analyses, which can be used during design and operation, respectively, of a structure, based purely on the probability logic axioms (Papadimitriou et al., 2001).
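To make Equation (2) concrete, the following minimal sketch (ours, not from the paper) updates a single stiffness-like parameter w of a hypothetical unit-mass oscillator on a parameter grid, with the prediction error e_n modeled as zero-mean Gaussian (the maximum-entropy choice above). The oscillator, noise levels, and variable names are illustrative assumptions.

```python
import numpy as np

# Grid-based evaluation of Eq. (2) for a hypothetical one-parameter model
# class: free vibration of a unit-mass oscillator with stiffness k = 100*w.
rng = np.random.default_rng(0)

def model_output(w, t):
    return np.cos(np.sqrt(100.0 * w) * t)          # q_n(u_hat, w)

t = np.linspace(0.0, 2.0, 200)
y_hat = model_output(1.0, t) + 0.05 * rng.standard_normal(t.size)  # data

w_grid = np.linspace(0.8, 1.2, 400)
dw = w_grid[1] - w_grid[0]
sigma_e = 0.05                                     # prediction-error std
log_like = np.array([-0.5 * np.sum((y_hat - model_output(w, t))**2) / sigma_e**2
                     for w in w_grid])             # likelihood p(D|w, M)
log_prior = -0.5 * (w_grid - 1.0)**2 / 0.1**2      # Gaussian prior p(w|M)

log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * dw                            # c^-1 normalization, Eq. (2)
print("posterior mean of w:", np.sum(w_grid * post) * dw)
```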

2.3 Bayesian updating for multiple model classes

Often the system modeler must choose among a set of competing candidate model classes because of the uncertainty in which model class best represents the dynamic behavior of a system. If M denotes the proposition that specifies a discrete set of candidate model classes {M_m : m = 1, 2, ..., N_M} that is being considered for a system, together with a prior probability distribution p(M_m|M) over this discrete set, then the posterior probability P(M_m|D_N, M) can be computed from Bayes' Theorem at the model class level:

P(M_m|D_N, M) = p(D_N|M_m) P(M_m|M) / p(D_N|M)    (3)

Here, p(D_N|M_m) is the evidence for M_m provided by the data D_N (additional conditioning on M is irrelevant), which is given by the Total Probability Theorem:

p(D_N|M_m) = ∫ p(D_N|w_m, M_m) p(w_m|M_m) dw_m    (4)

where w_m is the vector of model parameters for M_m. A uniform prior probability distribution can be chosen for the candidate model classes, that is, p(M_m|M) = 1/N_M, if the model classes are considered equally plausible a priori (our convention is to use P(·) for probabilities and p(·) for PDFs).

The calculation of the posterior probability P(M_m|D_N, M) in Equation (3) provides a procedure for Bayesian model class selection (or comparison, or assessment), where the computation of the multidimensional integral in Equation (4) for the evidence function is vital. If there is no analytical solution for Equation (4), Laplace's approximation method can be used when the model class is globally identifiable based on the available data D_N (e.g., Beck and Yuen, 2004; Beck, 2010). When the chosen class of models is unidentifiable or locally identifiable based on the data D_N, so that there are multiple MLEs (maximum likelihood estimates) (Beck and Katafygiotis, 1998), only stochastic simulation methods are practical to calculate the model class evidence, such as the Transitional Markov chain Monte Carlo (MCMC) simulation method (Ching and Chen, 2007) or the Approximate Bayesian Computation method (Chiachio et al., 2014; Vakilzadeh et al., 2017). When the posterior probability of each model class, P(M_m|D_N, M), has been calculated, the Total Probability Theorem can be applied to produce the posterior hyper-robust predictive models that combine the predictions of all plausible model classes in a specified set (Beck, 2010).
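When stochastic simulation is required, the simplest (though often very inefficient) baseline for the evidence integral in Equation (4) is direct Monte Carlo over prior samples, sketched below with a hypothetical placeholder likelihood; the Transitional MCMC and ABC methods cited above are the practical alternatives.

```python
import numpy as np

# Direct Monte Carlo estimate of the evidence in Eq. (4):
# p(D|M_m) ~ mean of p(D|w, M_m) over prior samples w ~ p(w|M_m),
# with the log-sum-exp trick for numerical stability.
rng = np.random.default_rng(1)

def log_likelihood(w):
    return -0.5 * np.sum((w - 0.9)**2) / 0.05**2   # placeholder data-fit term

def log_evidence(n_samples=100_000, n_params=2):
    w = rng.normal(1.0, 0.2, size=(n_samples, n_params))   # prior samples
    ll = np.array([log_likelihood(wi) for wi in w])
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))             # log-sum-exp

print("log p(D|M):", log_evidence())
```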
3 SBL AND BAYESIAN OCKHAM RAZOR

3.1 Bayesian Ockham Razor

Comparing the posterior probability of each candidate model class by Equation (4) automatically implements an elegant and powerful version of Ockham's (Occam's) Razor, known as the Bayesian Ockham Razor. A recent interesting information-theoretic interpretation (Beck, 2010) shows that the evidence p(D_N|M_m) in Equation (4) explicitly builds in a trade-off between a data-fit measure for the model class and an information-theoretic measure of its complexity that quantifies the amount of information that the model class extracts from the data D_N. This result is based on using Equation (2) in the expression for the normalization of the posterior PDF:

log[p(D_N|M_m)] = ∫ log[p(D_N|M_m)] p(w_m|D_N, M_m) dw_m
= ∫ log[p(D_N|w_m, M_m) p(w_m|M_m) / p(w_m|D_N, M_m)] p(w_m|D_N, M_m) dw_m
= ∫ log[p(D_N|w_m, M_m)] p(w_m|D_N, M_m) dw_m − ∫ log[p(w_m|D_N, M_m) / p(w_m|M_m)] p(w_m|D_N, M_m) dw_m    (5)
= E[log(p(D_N|w_m, M_m))] − E[log(p(w_m|D_N, M_m) / p(w_m|M_m))]

where the expectations E[·] are taken with respect to the posterior p(w_m|D_N, M_m). The first term is the posterior mean of the log likelihood function, which is a measure of the average data-fit of the model class M_m, and the second term is the Kullback–Leibler information, or relative entropy, of the posterior relative to the prior, which is a measure of the model complexity (the amount of information gain about w_m from the data D_N) and is always nonnegative. The merit of Equation (5) is that it shows rigorously, without introducing ad hoc concepts, that the log evidence for M_m explicitly builds in a trade-off between the data-fit of the model class and its information-theoretic complexity. This is important in system identification applications because too complex models often lead to over-fitting of the data, and the subsequent response predictions may then be unreliable as they depend too much on the details of the specific data, for example, measurement noise and environmental effects.
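The decomposition in Equation (5) can be checked numerically. The sketch below (our illustration, with an arbitrary one-dimensional Gaussian prior and likelihood) computes the log evidence directly and as the data-fit term minus the Kullback–Leibler term; the two agree to quadrature accuracy.

```python
import numpy as np

# Numerical check of Eq. (5) on a grid:
# log evidence = E[log-likelihood] - KL(posterior || prior).
w = np.linspace(-3, 3, 4001)
dw = w[1] - w[0]

log_like = -0.5 * (1.5 - w)**2 / 0.3**2            # Gaussian likelihood in w
prior = np.exp(-0.5 * w**2) / np.sqrt(2 * np.pi)   # N(0, 1) prior

evidence = np.sum(np.exp(log_like) * prior) * dw
post = np.exp(log_like) * prior / evidence

data_fit = np.sum(post * log_like) * dw            # E[log p(D|w)]
kl = np.sum(post * np.log(post / prior)) * dw      # information gained
print(np.log(evidence), data_fit - kl)             # the two should agree
```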

3.2 General formulation of SBL with the ARD prior

Given a set of I/O data D = {û, ŷ}, suppose that the model prediction of the output is y = f(û) + e + m ∈ R^{N_o}, involving a deterministic function f of the input vector û, along with uncertain prediction error e and measurement noise m. Assume that the function f is chosen as a weighted sum of N_p basis functions {Θ_j(û), j = 1, ..., N_p}:

f(û) = Σ_{j=1}^{N_p} w_j Θ_j(û) = Θ(û) w    (6)

where Θ is an N_o × N_p matrix with the basis functions {Θ_j} as columns. Analysis of this model is facilitated by the adjustable parameters (or weights) w ∈ R^{N_p} appearing linearly. The objective here is to infer values of the parameters {w_j, j = 1, ..., N_p} such that Θ(û)w is a "good" approximation of f(û) and the parameter vector w is sparse. SBL encodes a preference for sparser parameter vectors by making a special choice for the prior distribution for the parameter vector w that is known as the ARD prior (Mackay, 1992; Tipping, 2001a; Oh et al., 2008):

p(w|α) = Π_{j=1}^{N_p} p(w_j|α_j) = Π_{j=1}^{N_p} N(w_j|0, α_j⁻¹) = Π_{j=1}^{N_p} (2π)^{-1/2} α_j^{1/2} exp(−α_j w_j² / 2)    (7)

where the hyperparameter α_j is the prior precision (inverse variance) for w_j. An individual hyperparameter α_j is associated independently with each weight w_j, thereby moderating the strength of the Gaussian prior. Note that an infinite value of α_j implies that the corresponding coefficient w_j has an insignificant contribution to the modeling of the measurements y, because it produces essentially a Dirac delta function at zero for the prior, and so the posterior. By using the principle of maximum information entropy (Jaynes, 1983) and incorporating the first two moments of y as constraints, the combination of the prediction error and measurement noise is modeled as a zero-mean Gaussian vector with covariance matrix β⁻¹ I_{N_o}, which gives a Gaussian predictive PDF:

p(y|w, β) = (2πβ⁻¹)^{-N_o/2} exp(−(β/2) ||y − Θ(û)w||₂²) = N(y|Θ(û)w, β⁻¹ I_{N_o})    (8)

By substituting the data ŷ for y, Equation (8) gives a Gaussian likelihood function that measures how well the model for specified parameters w and β predicts the measurements ŷ. A stochastic model class M(α, β) is then defined by the I/O predictive model in Equation (8) and the prior PDF on w given by Equation (7).

The posterior distribution p(w|ŷ, α, β) over the weight parameters given by model class M(α, β) is computed based on Bayes' Theorem:

p(w|ŷ, α, β) = p(ŷ|w, β) p(w|α) / p(ŷ|α, β)    (9)

where p(ŷ|α, β) = ∫ p(ŷ|w, β) p(w|α) dw is the evidence of the model class M(α, β). As both the prior and likelihood for w are Gaussian and the likelihood mean Θ(û)w is linear in w, the posterior PDF can be expressed analytically as a multivariate Gaussian distribution:

p(w|ŷ, α, β) = N(w | (Θ^T Θ + β⁻¹ A)⁻¹ Θ^T ŷ, (β Θ^T Θ + A)⁻¹)    (10)

where A = diag(α_1, ..., α_{N_p}).

A continuous set of candidate model classes M(α, β) is defined above, and the robust posterior PDF p(w|ŷ) can be computed by integrating out the posterior uncertainty in α and β as below. We assume that the posterior p(α, β|ŷ) is highly peaked at {α̃, β̃} (the MAP value of {α, β}). We then treat [α, β] as a "nuisance" parameter vector and integrate it out by applying Laplace's asymptotic approximation (Beck and Katafygiotis, 1998):

p(w|ŷ) = ∫ p(w|ŷ, α, β) p(α, β|ŷ) dα dβ ≈ p(w|ŷ, α̃, β̃)    (11)

where:

{α̃, β̃} = arg max_{[α,β]} p(α, β|ŷ) = arg max_{[α,β]} p(ŷ|α, β) p(α) p(β)    (12)

If we assign flat, noninformative prior PDFs for α and β, this is equivalent to maximizing the evidence function p(ŷ|α, β).
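For reference, a direct transcription of the conjugate update in Equations (9) and (10) is sketched below; the design matrix Theta, the data y_hat, and the sparse test weights are illustrative assumptions.

```python
import numpy as np

# Gaussian posterior over the weights w for given hyperparameters
# (alpha, beta), per Eq. (10). Theta is the N_o x N_p design matrix.
def weight_posterior(Theta, y_hat, alpha, beta):
    A = np.diag(alpha)                              # prior precision matrix
    cov = np.linalg.inv(beta * Theta.T @ Theta + A) # Eq. (10) covariance
    mean = beta * cov @ Theta.T @ y_hat             # Eq. (10) mean
    return mean, cov

rng = np.random.default_rng(2)
Theta = rng.standard_normal((50, 10))
w_true = np.zeros(10); w_true[[2, 7]] = [1.0, -0.5]            # sparse truth
y_hat = Theta @ w_true + 0.1 * rng.standard_normal(50)

alpha = np.ones(10)        # one precision per weight (the ARD structure)
mean, cov = weight_posterior(Theta, y_hat, alpha, beta=100.0)
print(np.round(mean, 2))
```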

3.3 Bayesian Ockham Razor implementation in SBL

Finding the optimal values (α̃, β̃) of the hyperparameters as in Equation (12) is a procedure corresponding to Bayesian model class selection (Beck and Yuen, 2004) because some of the terms in the linear-in-the-parameters expansion of possible terms in Equation (6) are suppressed, where each subset of terms could form a separate model class. Recall from Equation (5) that the evidence function p(ŷ|α, β) can be expressed as a difference of posterior expectations:

log[p(ŷ|α, β)] = E[log(p(ŷ|w, β))] − E[log(p(w|ŷ, α, β) / p(w|α))]    (13)

It is found in Huang et al. (2014) that the data-fit measure (the first term) decreases with a lower specified prediction accuracy (smaller β), and this is associated with sparser models (more α_j's tend to infinity during the optimization). This is because smaller β allows more of the data misfit to be treated as prediction errors. At the same time, smaller β, with the associated larger α_j's, produces a smaller Kullback–Leibler information (the second term in Equation (13)) between the posterior PDF and the prior PDF, indicating that less information is extracted from the measurements ŷ by the updated model and that the data-fit term is penalized less by the positive second term in Equation (13). On the other hand, larger β produces a model that fits the measurements with smaller error (larger data-fit measure in Equation (13)), but the model is insufficiently sparse (more nonzero terms in Equation (6)) and so its relative entropy is large (the second term in Equation (13) penalizes the data-fit more). In both cases, smaller and larger β, the models give a trade-off between data-fitting and model complexity (more sparseness corresponds to less model complexity) that may not be the optimal one that maximizes the log evidence in Equation (13). The learning of the hyperparameters α and β by maximizing the evidence function p(ŷ|α, β) as in Equation (12) produces the optimal trade-off that causes many hyperparameters α_j to approach infinity with a reasonably large value of β, giving a model w that is both sufficiently sparse and fits the data vector ŷ well, that is, it gives the best balance between data-fitting and model complexity. We can also say that SBL automatically penalizes models that "under-fit" or "over-fit" the associated data ŷ. This is the Bayesian Ockham Razor at work.

Remark 3.1. We have found the SBL algorithm with ARD suffers from a robustness problem if the number of measurements N_o is much smaller than the number of model parameters N_p: there are local maxima for Equation (12) that may trap the hyperparameter optimization, leading to non-robust Bayesian updating results (Huang et al., 2014). Several robustness enhancement algorithms (Huang et al., 2014, 2016) have been developed based on different strategies, with the goal of increasing signal reconstruction accuracy in compressive sensing for structural health monitoring signals.
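A minimal sketch of this evidence-maximization strategy is given below, using the standard RVM-style re-estimation updates of Tipping (2001a); this is the generic SBL machinery only, not the structural algorithms of Section 4, and the iteration counts and clipping thresholds are pragmatic assumptions.

```python
import numpy as np

# Evidence maximization as in Eq. (12) via the usual RVM re-estimation:
# alternate the posterior moments of w with updates of alpha and beta.
def rvm(Theta, y, n_iter=200, alpha0=1.0, beta0=100.0, alpha_max=1e8):
    n, p = Theta.shape
    alpha, beta = np.full(p, alpha0), beta0
    for _ in range(n_iter):
        cov = np.linalg.inv(beta * Theta.T @ Theta + np.diag(alpha))
        mu = beta * cov @ Theta.T @ y
        gamma = np.clip(1.0 - alpha * np.diag(cov), 1e-12, 1.0)  # eff. params
        alpha = np.minimum(gamma / np.maximum(mu**2, 1e-12), alpha_max)
        beta = (n - gamma.sum()) / np.sum((y - Theta @ mu)**2)
    return mu, alpha, beta

rng = np.random.default_rng(3)
Theta = rng.standard_normal((100, 20))
w_true = np.zeros(20); w_true[[3, 11]] = [2.0, -1.0]
y = Theta @ w_true + 0.05 * rng.standard_normal(100)
mu, alpha, beta = rvm(Theta, y)
print("retained terms:", np.flatnonzero(alpha < 1e6))  # expect [3, 11]
```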
4 APPLYING SBL TO SYSTEM IDENTIFICATION

4.1 Hierarchical Bayesian model class for system identification

Suppose that we apply Bayesian modal identification (e.g., Yuen and Katafygiotis, 2001) to appropriate dynamic data from a structure and take the most probable posterior (MAP) values of the modal parameters. Denote the vector of identified natural frequencies by ω̂² ∈ R^{N_s N_m × 1} (N_s and N_m are the number of modal identifications performed and the number of extracted modes from the response data, respectively) and mode shapes by ψ̂ ∈ R^{N_s N_m N_o × 1} (N_o is the number of measured degrees of freedom [DOFs]). As N_o is usually smaller than N_d, the number of DOFs of an appropriate structural model, we introduce the system natural frequencies ω² ∈ R^{N_m × 1} and system mode shapes φ ∈ R^{N_d N_m × 1} to represent the actual underlying modal parameters of the assumed linear dynamics of the structural system at all DOFs corresponding to those of the structural model.

We choose a set of parameterized linear structural models with classical damping to produce normal modes of vibration, where each model has the same known mass matrix M ∈ R^{N_d × N_d}, which can be accurately inferred from structural drawings. Taking an appropriate substructuring (perhaps focusing on likely damage locations), we decompose the uncertain stiffness matrix K ∈ R^{N_d × N_d} as a linear combination of (N_θ + 1) substructure stiffness matrices K_j, j = 0, 1, ..., N_θ:

K(θ) = K_0 + Σ_{j=1}^{N_θ} θ_j K_j    (14)

where K_j ∈ R^{N_d × N_d}, j = 1, ..., N_θ, is the prior choice of the jth substructure stiffness matrix, and the corresponding stiffness scaling parameter θ_j is a factor that allows modification of the nominal jth substructure stiffness so it is more consistent with the real structure behavior. The stiffness matrices K_j could come from a finite-element model of the structure; then it would be appropriate to choose all θ_j = 1 to give the most probable value a priori for the parameter vector θ ∈ R^{N_θ}. For damage detection purposes, we will exploit the fact that damage-induced stiffness reductions typically occur in a small number of locations in the absence of structural collapse, and so the potential change in θ compared with that of a reference calibration stage is expected to be a sparse vector with relatively few nonzero components.

The following joint prior PDF for the system parameters ω² and φ and the stiffness scaling parameters θ is chosen (Huang and Beck, 2015):

p(ω², φ, θ|β) ∝ (2π/β)^{−N_m N_d/2} exp(−(β/2) Σ_{m=1}^{N_m} ||(K(θ) − ω²_m M) φ_m||²)    (15)

where the finite value of the equation-error precision parameter β in Equation (15) provides a soft constraint for the eigen-equation, and it allows for the explicit control of how closely the system and model modal parameters agree. Note that we can decompose the joint prior PDF p(ω², φ, θ|β) into the product of a Gaussian PDF for any one of the parameter vectors that is conditional on the other two parameter vectors and a marginal PDF for these two parameters; for example, p(ω², φ, θ|β) = p(θ|ω², φ, β) p(ω², φ|β). This helps to establish a linear relationship between any one of the parameter vectors and the combination of the other two parameter vectors. Although the modal parameters are a nonlinear function of the stiffness parameters, we will show a series of coupled linear-in-the-parameter problems involved in our formulation.

Let θ̂_u denote the unique MAP value of θ_u from applying Bayesian updating using a large amount of time-domain vibration data from the calibration (undamaged) state. We choose θ̂_u as pseudo-data to define the likelihood function for θ when monitoring the structure for possible damage by:

p(θ̂_u|θ, α) = Π_{j=1}^{N_θ} N(θ̂_{u,j}|θ_j, α_j⁻¹)    (16)

Although the conventional strategy in SBL is to use an ARD Gaussian prior PDF (Tipping, 2001a) to model sparseness as shown in Equation (7), here we incorporate the ARD concept in the likelihood function, along with the prior on θ in Equation (15). When learning θ, Equation (16) keeps it close to θ̂_u, where the closeness of θ_j to θ̂_{u,j} is controlled by α_j. In addition, Gaussian likelihood functions:

p(ψ̂|φ, η) = N(ψ̂|Γφ, η⁻¹ I_{N_o N_s N_m})    (17)

p(ω̂²|ω², ρ) = N(ω̂²|Tω², ρ⁻¹ I_{N_s N_m})    (18)

are defined for the system parameters ω² and φ with precision parameters ρ and η, respectively. In Equation (17), Γ ∈ R^{N_o N_s N_m × N_d N_m} with "1s" and "0s" picks the observed DOFs in the "measured" mode shape data set from the full system mode shapes φ. In Equation (18), T = [I_{N_m}, ..., I_{N_m}]^T ∈ R^{N_s N_m × N_m} is the matrix which connects the vector of N_s sets of N_m identified natural frequencies ω̂² and the N_m system natural frequencies ω². In addition, we model our prior uncertainty in the equation-error precision β by an exponential hyperprior p(β|b_0) with rate parameter b_0. The prior PDFs for η, ρ, α_j and b_0 are taken as Gamma distributions with very small scale and rate parameters, which make these priors noninformative over a logarithmic scale. The proposed modeling constitutes a multistage hierarchical model, as shown in Figure 1. The bidirectional arrows in the graph of the hierarchical Bayesian model represent the information dependence between the structural modal parameters ω² and φ and the stiffness parameter θ, which comes from the joint prior p(ω², φ, θ|β).

Fig. 1. Acyclic graph representing the information flow in the hierarchical Bayesian model for SBL algorithms with ARD.
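As an illustration of the parameterization in Equation (14) and the eigen-equation error penalized in Equation (15), the sketch below assembles K(θ) for a hypothetical 4-DOF shear building with one stiffness scaling parameter per story; the unit masses, story stiffness value, and matrix pattern are our own assumptions, not the benchmark model of Section 5.

```python
import numpy as np

# Substructure stiffness assembly, Eq. (14), and the eigen-equation error
# sum_m ||(K(theta) - omega2_m * M) phi_m||^2 from the exponent of Eq. (15).
n_d = 4
M = np.eye(n_d)                                    # known (unit) mass matrix

def story_matrix(j, k=100.0):
    # K_j: nominal stiffness contribution of story j (shear-building pattern)
    Kj = np.zeros((n_d, n_d))
    Kj[j, j] += k
    if j > 0:
        Kj[j - 1, j - 1] += k
        Kj[j - 1, j] -= k
        Kj[j, j - 1] -= k
    return Kj

K_sub = [story_matrix(j) for j in range(n_d)]

def K_of_theta(theta):
    return sum(t * Kj for t, Kj in zip(theta, K_sub))        # K_0 = 0 here

def eigen_error(theta, omega2, phi):
    K = K_of_theta(theta)
    return sum(np.sum(((K - w2 * M) @ p)**2) for w2, p in zip(omega2, phi.T))

omega2, phi = np.linalg.eigh(K_of_theta(np.ones(n_d)))       # exact modes
print(eigen_error(np.ones(n_d), omega2, phi))                # ~ 0 at truth
```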

In the next subsections, we review our recently proposed SBL algorithms with ARD to detect and quantify spatially sparse substructure stiffness reductions. The fast SBL algorithm uses MAP values for all model parameters except the stiffness scaling parameters θ, whereas the partial GS SBL algorithm uses MAP values only for the hyperparameters, not the parameters representing the behavior of the structure. To characterize the full posterior uncertainty for this problem, we further improved our SBL theory for system identification by finding a way to remove all approximations by a full GS procedure for SBL.

4.2 Fast SBL algorithm

To facilitate the goal of presenting a fast algorithm to perform SBL, we focus on an analytical derivation of the posterior PDF of the stiffness scaling parameter θ and collect all uncertain parameters except θ in the vector δ = [(ω²)^T, ρ, φ^T, η, α^T, β, b_0]^T as "nuisance" parameters, which are treated by using Laplace's approximation method (their posterior uncertainties are effectively ignored). The stochastic model class M(δ) for the structural model is defined by the likelihood functions p(θ̂_u|θ, α), p(ω̂²|ω², ρ), and p(ψ̂|φ, η) given in Equations (16)–(18) and the joint priors given by the product of p(ω², φ, θ|β) in Equation (15) and the exponential prior p(β|b_0). Based on this defined stochastic model class M(δ), one can use the available modal data ω̂² and ψ̂ and pseudo-data θ̂_u to update the structural model parameters θ for system identification purposes. We assume that the posterior p(δ|ω̂², ψ̂, θ̂_u) is highly peaked at δ̃ (the MAP value of δ). We then use Laplace's asymptotic approximation (Beck and Katafygiotis, 1998):

p(θ|ω̂², ψ̂, θ̂_u) = ∫ p(θ|δ, ω̂², ψ̂, θ̂_u) p(δ|ω̂², ψ̂, θ̂_u) dδ ≈ p(θ|δ̃, ω̂², ψ̂, θ̂_u)    (19)

where p(θ|δ, ω̂², ψ̂, θ̂_u) is the posterior PDF for a given model class M(δ), δ̃ = arg max p(δ|ω̂², ψ̂, θ̂_u), and

p(δ|ω̂², ψ̂, θ̂_u) ∝ p(ω̂², ψ̂, θ̂_u|δ) p(δ) = [∫ p(θ̂_u|θ, α) p(θ|ω², φ, β) dθ] · p(ω̂²|ω², ρ) p(ψ̂|φ, η) p(ω², φ|β) p(β|b_0)    (20)

where p(ω̂², ψ̂, θ̂_u|δ) is the evidence function for the model class M(δ), which can be explicitly determined analytically. The full posterior uncertainty in θ is explicitly incorporated when finding the MAP estimates of all parameters in δ, although it is a nontrivial task. The full details of the fast SBL algorithm are given in Huang et al. (2017a).

Remark 4.1. The maximization of the evidence p(ω̂², ψ̂, θ̂_u|δ) effectively implements the Bayesian Ockham Razor by assigning lower probabilities to a structural model whose parameter vector θ has too large or too small differences from the MAP value θ̂_u identified from the calibration state (i.e., the model extracts relatively more or less information, respectively, from the system modal parameters ω² and φ, and so from the "measured" modal data ω̂² and ψ̂, which can be seen from the hierarchical model in Figure 1). This process suppresses the occurrence of false and missed alarms for stiffness reductions.

Remark 4.2. It was found that the trade-off in the quantitative Bayesian Ockham Razor stated in Subsection 3.1 is sensitive to the selection of the equation-error precision parameter β. This motivated us to develop a more sophisticated method, described in the next subsection, to provide a fuller treatment of the posterior uncertainties.

Remark 4.3. In the fast SBL algorithm, the use of the pseudo-data θ̂_u is based on the assumption that it is a unique MAP estimate from the calibration stage due to the large amount of time-domain vibration data and identified modal parameters that can be collected at this stage. In the next subsection, we relax this assumption by explicitly considering the posterior uncertainty of θ_u from the calibration stage in case there is insufficient data to get a posterior on θ_u that is highly peaked at θ̂_u.

4.3 SBL algorithms using GS

4.3.1 Partial GS combined with Laplace approximations. The goal of the algorithm presented here is to provide a fuller treatment of the posterior uncertainty by employing MCMC simulation methods, so that the Laplace approximations in the fast SBL algorithm that involve the system modal parameters {ω², φ} and the equation-error precision parameter β can be avoided. We implement GS to draw posterior samples from p(φ, ω², θ, β|ω̂², ψ̂, θ̂_u) by decomposing the whole model parameter vector into four groups and repeatedly sampling from one parameter group conditional on the other three groups and the available data. The effective dimension is then four, rather than the much higher total number of model parameters. Laplace's approximation is used for the integrals that marginalize the hyperparameters from the posterior PDF, as in Equations (19) and (20).

In this GS method, the conditional posterior PDFs:

p(φ|ω̂², ψ̂, θ̂_u, ω², θ, β) = p(φ|ψ̂, ω², θ, β),
p(ω²|ω̂², ψ̂, θ̂_u, φ, θ, β) = p(ω²|ω̂², φ, θ, β),
p(θ|ω̂², ψ̂, θ̂_u, φ, ω², β) = p(θ|θ̂_u, φ, ω², β),
and p(β|ω̂², ψ̂, θ̂_u, φ, ω², θ)

are successively sampled to generate samples from the full posterior PDF p(φ, ω², θ, β|ω̂², ψ̂, θ̂_u) when the number of samples n is sufficiently large (beyond burn-in) and the Markov chain created by the GS is ergodic (Gelman et al., 2013). From a practical point of view, GS is ergodic when the regions of high values of the full posterior PDF p(φ, ω², θ, β|ω̂², ψ̂, θ̂_u) are effectively connected (corresponding to the model class being either globally identifiable or unidentifiable (Beck and Katafygiotis, 1998)), which means that sampling the Markov chain can fully explore its stationary state when n is large, no matter how the GS algorithm is initialized.

To derive the generic form p(w₁|ŷ, w₂, w₃, β) for the conditional posterior PDFs p(φ|ψ̂, ω², θ, β), p(ω²|ω̂², φ, θ, β) and p(θ|θ̂_u, φ, ω², β), we derive the conditional prior PDFs p(φ|ω², θ, β), p(ω²|φ, θ, β) and p(θ|φ, ω², β) from Equation (15) and express them in the following general form:

p(w₁|w₂, w₃, β) = N(w₁ | (E^T E)⁻¹ E^T r, (β E^T E)⁻¹)    (21)

where E and r are a matrix and a vector which only depend on the parameters w₂ and w₃, and β is the eigen-equation-error precision parameter. Each choice of w₁, w₂, and w₃ is a permutation of φ, ω² and θ.

Similarly, the likelihood functions in Equations (16), (17), and (18) can be written in the following general form:

p(ŷ|w₁, κ) = N(ŷ|Θw₁, L(κ))    (22)

where the vector ŷ ∈ R^{K × 1} is the available data, either θ̂_u, ψ̂ or ω̂²; w₁ is θ, φ or ω²; Θ ∈ R^{K × N} is a matrix; and L(κ) ∈ R^{K × K} is a diagonal covariance matrix (whose diagonal elements are composed of the components of κ). For Equations (16)–(18), the choice of κ is the vector α, and the scalars η and ρ, respectively.

The posterior PDF of w₁ is computed by Bayes' Theorem:

p(w₁|ŷ, w₂, w₃, β) ≈ p(w₁|ŷ, w₂, w₃, β, κ̃) ∝ p(ŷ|w₁, κ̃) p(w₁|w₂, w₃, β)    (23)

where κ̃ = arg max p(κ|ŷ, w₂, w₃, β) and we have used Laplace's approximation to marginalize κ from the full posterior for w₁. By combining the Gaussian prior p(w₁|w₂, w₃, β) in Equation (21) and the Gaussian likelihood p(ŷ|w₁, κ̃) in Equation (22), the Gaussian posterior PDF p(w₁|ŷ, w₂, w₃, β) is obtained.

The conditional posterior PDF for β is derived as:

p(β|ω̂², ψ̂, θ̂_u, φ, ω², θ) = Gamma(β|ā₀, b̄₀)    (24)

where the shape parameter ā₀ and rate parameter b̄₀ for the posterior Gamma distribution on β are given by:

ā₀ = 1 + N_m N_d / 2    (25a)

b̄₀ = b̃₀ + Σ_{i=1}^{N_m} ||(K(θ) − ω²_i M) φ_i||² / 2    (25b)

b̃₀ = arg max p(b₀|ω̂², ψ̂, θ̂_u, φ, ω², θ)    (25c)

The reader is referred to Huang et al. (2017b) for detailed information on the SBL algorithm using GS, including the derivation of the MAP values of the hyperparameters and the pseudocode.

Remark 4.4. It is tractable to marginalize out the equation-error precision parameter β to remove it from the posterior distributions as a "nuisance" parameter, where the generic conditional posterior PDF becomes p(w₁|ŷ, w₂, w₃) = ∫ p(w₁|ŷ, w₂, w₃, β) p(β|ŷ, w₂, w₃) dβ, which is a Student's t distribution (Huang et al., 2017b). The Student's t PDFs have heavier tails than the Gaussian PDFs sampled in Algorithm 1, and so the algorithm is more robust to noise and outliers.

4.3.2 Full GS procedure for SBL. In this proposed method, the posterior uncertainties of all unknown parameters are explicitly characterized by implementing GS to draw posterior samples from the joint posterior PDF p(ω², φ, θ, β, ρ, η, α, b₀|ω̂², ψ̂, θ̂_u).

We have already introduced the conditional posterior PDFs for the model parameters that are needed for GS in the previous subsection. It remains to derive the conditional posterior PDFs for the hyperparameters. The prior PDF for the mth component κ_m of the hyperparameter vector κ in Equation (22) (equal to η, ρ or α) is chosen as a Gamma distribution with parameters a_m and b_m, which gives a conjugate prior and allows exact posterior sampling of the hyperparameter κ. The posterior PDF of κ is obtained by using Bayes' Theorem:

p(κ|ŷ, w₁, w₂, w₃, β, b₀) = p(κ|ŷ, w₁) ∝ p(ŷ|w₁, κ) Π_{m=1}^{M} p(κ_m|a_m, b_m) ∝ Π_{m=1}^{M} Gamma(κ_m|ā_m, b̄_m)    (26)

where, as before, w₁ is θ, φ or ω², and the shape and rate parameters for the posterior Gamma distribution become:

ā_m = a_m + trace(X_m)/2    (27a)

b̄_m = b_m + (ŷ − Θw₁)^T X_m (ŷ − Θw₁)/2    (27b)

where K is the length of the data vector ŷ and X_m = ∂L⁻¹(κ)/∂κ_m. Inspired by Tipping (2001a), we fix the prior scale and rate parameters, a_m and b_m, for κ_m to be very small values in the illustrative examples later, which makes these priors noninformative over a logarithmic scale. It is seen from Equation (26) that p(κ|ŷ, w₁, w₂, w₃, β, b₀) = p(κ|ŷ, w₁). Also, b₀ is irrelevant in the conditional posterior PDF p(w₁|ŷ, w₂, w₃, β, b₀, κ) = p(w₁|ŷ, w₂, w₃, β, κ) because w₁ is independent of b₀ when β is given. Similarly, from Figure 1, the conditional posterior PDF on β satisfies p(β|ω̂², ψ̂, θ̂_u, ω², φ, θ, ρ, η, α, b₀) = p(β|φ, ω², θ, b₀). Finally, we assign a Gamma prior distribution, Gamma(b₀|a_b, b_b), for b₀ and, using Figure 1, we see that the conditional posterior for b₀ is given by:

p(b₀|ω̂², ψ̂, θ̂_u, φ, ω², θ, η, ρ, α, β) ∝ Gamma(b₀|1/2 + a_b, β + b_b)    (28)

The prior scale and rate parameters a_b and b_b for b₀ are selected to be very small values in the examples later.

The straightforward way to implement GS is to draw posterior samples from the joint posterior PDF p(ω², φ, θ, ρ, η, α, β, b₀|ω̂², ψ̂, θ̂_u) by decomposing the whole uncertain parameter vector into the eight groups {ω², φ, θ, α, ρ, η, β, b₀} and then iterating over the groups by repeatedly sampling from the PDF of one parameter group conditional on the other seven groups and the available data.

If there are enough modal data ω̂² and ψ̂ available to provide sufficient information to constrain the updated stiffness parameters, we can produce a GS algorithm which allows a nonsparse set of stiffness changes to be inferred. In this case, the soft constraint provided by Equation (16) is dropped. The pseudocode for this algorithm is presented in Table 1, where we implement GS to draw posterior samples from p(ω², φ, θ, ρ, η, β, b₀|ω̂², ψ̂) by decomposing the whole model parameter vector into the seven groups {ω², φ, θ, ρ, η, β, b₀} and repeatedly sampling from one parameter group conditional on the other six groups and the available modal data ω̂² and ψ̂. The conditional posterior PDFs are readily derived from Equations (23)–(28). Note that the Markov chain samples from the marginal posterior PDF for any parameter or any parameter group are obtained by simply examining the appropriate components of the joint posterior GS samples beyond the burn-in period.

Table 1. Pseudocode of GS for generating posterior samples {(ω²)^(n), φ^(n), ρ^(n), η^(n), θ^(n), β^(n), b₀^(n)}, n = 1, ..., N₁, conditional on modal data {ω̂², ψ̂}

1. Initialize the samples with ρ^(0) = 100, η^(0) = 100, β^(0) = 100, b₀^(0) = 100, (ω²)^(0) = Σ_{r=1}^{N_s} ω̂²_r / N_s, and θ^(0) = θ̂_u, a chosen "calibration" value for the stiffness parameter vector
2. For n = 1 to N₁
3. Sample φ^(n) ~ p(φ|ψ̂, (ω²)^(n−1), θ^(n−1), η^(n−1), β^(n−1))
4. Sample η^(n) ~ p(η|ψ̂, φ^(n))
5. Sample (ω²)^(n) ~ p(ω²|ω̂², φ^(n), θ^(n−1), β^(n−1), ρ^(n−1))
6. Sample ρ^(n) ~ p(ρ|ω̂², (ω²)^(n))
7. Sample θ^(n) ~ p(θ|φ^(n), (ω²)^(n), β^(n−1))
8. Sample β^(n) ~ p(β|φ^(n), (ω²)^(n), θ^(n), b₀^(n−1))
9. Sample b₀^(n) ~ p(b₀|β^(n))
10. End for
11. Samples {(ω²)^(n), φ^(n), ρ^(n), η^(n), θ^(n), β^(n), b₀^(n) : n = 1, ..., N₁} are obtained, which are consistent with the joint posterior p(ω², φ, ρ, η, θ, β, b₀|ω̂², ψ̂)

Note: Based on the hierarchical model in Figure 1, variables and data sets are dropped in the conditioning if the parameter to be sampled is independent of them.

However, when we incorporate the sparseness constraint for the stiffness changes as in Equation (16) and characterize the joint posterior p(ω², φ, θ, ρ, η, α, β, b₀|ω̂², ψ̂, θ̂_u) by a full GS procedure similar to the above, it becomes inefficient to converge to the stationary state of the joint posterior PDF; the Markov chain samples may be trapped in local maxima of the posterior PDF of the hyperparameter α due to the very large number of uncertain parameters to be inferred. Instead, we introduce a sequential Bayesian inference procedure to produce a more effective sampling method based on the hierarchical Bayesian model in Figure 1. Using the probability product rule, the marginal posterior PDF for {θ, α, β, b₀} can be expressed as:

p(θ, α, β, b₀|ω̂², ψ̂, θ̂_u) ≈ ∫ p(θ, α, β, b₀|ω², φ, ρ, η, θ̂_u) p(ω², φ, ρ, η|ω̂², ψ̂) dω² dφ dρ dη    (29)

where, from Figure 1, the first factor is independent of ω̂² and ψ̂, whereas the second factor is approximated to be independent of θ̂_u. Based on Monte Carlo estimation of the integral in Equation (29), samples can be obtained from the marginal posterior PDF p(θ, α, β, b₀|ω̂², ψ̂, θ̂_u) by first sampling from the marginal posterior p(ω², φ, ρ, η|ω̂², ψ̂), and then using each of these samples to sample from the conditional posterior PDF p(θ, α, β, b₀|ω², φ, ρ, η, θ̂_u). To generate samples {(ω²)^(n), φ^(n), ρ^(n), η^(n)}, n = 1, ..., N, conditional on ω̂² and ψ̂, we can use the pseudocode in Table 1 and ignore the samples for θ, β and b₀ to automatically get samples from the marginal PDF p(ω², φ, ρ, η|ω̂², ψ̂). Then we implement GS to collect samples {θ^(n), α^(n), β^(n), b₀^(n)}, n = 1, ..., N, generated from the PDF p(θ, α, β, b₀|(ω²)^(n), φ^(n), ρ^(n), η^(n), θ̂_u), where we incorporate the nth sample {(ω²)^(n), φ^(n), ρ^(n), η^(n)} in the nth iteration sequentially. We then express the full joint posterior PDF for the whole uncertain parameter vector as:

p(ω², φ, θ, ρ, η, α, β, b₀|ω̂², ψ̂, θ̂_u) = p(ω², φ, ρ, η|θ, α, β, b₀, ω̂², ψ̂, θ̂_u) p(θ, α, β, b₀|ω̂², ψ̂, θ̂_u)    (30)

To characterize the full posterior uncertainty, we then take the previously generated samples {θ^(n), α^(n), β^(n), b₀^(n)}, n = 1, ..., N, and draw posterior samples {(ω²)^(n), φ^(n), ρ^(n), η^(n)}, n = 1, ..., N, from the conditional posterior PDF p(ω², φ, ρ, η|θ^(n), α^(n), β^(n), b₀^(n), ω̂², ψ̂, θ̂_u), n = 1, ..., N, by using GS. The reader can refer to Table 2 for the pseudocode of the full GS algorithm, which generates the posterior samples {θ^(n), α^(n), β^(n), b₀^(n), (ω²)^(n), φ^(n), ρ^(n), η^(n)}, n = 1, ..., N, conditional on ω̂², ψ̂ and θ̂_u.

Table 2. Pseudocode of GS for generating posterior samples {θ^(n), α^(n), β^(n), b₀^(n), φ^(n), η^(n), (ω²)^(n), ρ^(n)}, n = 1, ..., N₂, conditional on modal data {ω̂², ψ̂} and calibration value θ̂_u

1. Initialize the samples α_j^(0) = 100 (j = 1, ..., N_p) and β^(0) = 100
2. Get samples {(ω²)^(n), φ^(n), ρ^(n), η^(n)}, n = 1, ..., N₂, by ignoring the Markov chain samples for θ, β, and b₀ obtained from the pseudocode in Table 1
3. For n = 1 to N₂
4. Sample θ^(n) ~ p(θ|θ̂_u, φ^(n), (ω²)^(n), β^(n−1), α^(n−1))
5. Sample α^(n) ~ p(α|θ̂_u, θ^(n))
6. Sample β^(n) ~ p(β|φ^(n), (ω²)^(n), θ^(n), b₀^(n−1))
7. Sample b₀^(n) ~ p(b₀|β^(n))
8. End for
9. For n = 1 to N₂
10. Sample φ^(n) ~ p(φ|ψ̂, (ω²)^(n−1), θ^(n), η^(n−1), β^(n))
11. Sample η^(n) ~ p(η|ψ̂, φ^(n))
12. Sample (ω²)^(n) ~ p(ω²|ω̂², φ^(n), θ^(n), ρ^(n−1), β^(n))
13. Sample ρ^(n) ~ p(ρ|ω̂², (ω²)^(n))
14. End for
15. Samples {θ^(n), α^(n), β^(n), b₀^(n), φ^(n), η^(n), (ω²)^(n), ρ^(n) : n = 1, ..., N₂} are obtained, which are consistent with the joint posterior p(θ, α, β, b₀, ω², φ, ρ, η|ω̂², ψ̂, θ̂_u)

Note: Based on the hierarchical model in Figure 1, variables and data sets are dropped in the conditioning if the parameter to be sampled is independent of them.
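The alternating-blocks pattern of Tables 1 and 2 is the ordinary systematic-scan Gibbs sampler. The runnable miniature below (our illustration) shows the same pattern on a toy conjugate model with an unknown mean and precision; the structural conditionals of Equations (21)–(28) would slot into the same kind of loop.

```python
import numpy as np

# Miniature systematic-scan GS: draw each block from its exact conditional,
# then discard burn-in, exactly as in the pseudocode of Tables 1 and 2.
rng = np.random.default_rng(6)
y = rng.normal(2.0, 0.5, size=50)                  # synthetic data

mu, tau = 0.0, 1.0
samples = []
for n in range(10_000):
    # block 1: mu | tau, y (Gaussian conditional, flat prior on mu)
    mu = rng.normal(y.mean(), 1.0 / np.sqrt(tau * y.size))
    # block 2: tau | mu, y (Gamma conditional, Gamma(1e-4, 1e-4) prior)
    tau = rng.gamma(1e-4 + y.size / 2,
                    1.0 / (1e-4 + 0.5 * np.sum((y - mu)**2)))
    samples.append((mu, tau))

post = np.array(samples[2_000:])                   # discard burn-in
print("posterior mean of mu:", post[:, 0].mean())  # close to 2.0
print("posterior mean of tau:", post[:, 1].mean()) # close to 1/0.5^2 = 4
```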

Remark 4.5. The analytical derivation of the conditional posterior PDFs from Equations (23)–(28) is important for the effectiveness of this GS algorithm. The model parameters are arranged in eight groups represented by {ω², φ, θ, α, ρ, η, β, b₀}, which leads to the very desirable feature that it should be applicable to linear Bayesian model updating problems of arbitrarily high dimensions, in contrast with other MCMC algorithms, because it is effectively like an eight-dimensional updating problem.

Remark 4.6. For the updating of the stiffness scaling parameter θ and the system modal parameters ω² and φ, the corresponding model classes M(α, β), M(ρ, β), and M(η, β) are investigated, as seen from the hierarchical Bayesian model in Figure 1. The application of Bayes' Theorem at the model class level automatically penalizes models of θ (ω² or φ) that "under-fit" or "over-fit" the associated data θ̂_u (ω̂² or ψ̂), therefore obtaining reliable updating results for the three parameter vectors, which is the Bayesian Ockham Razor at work (Beck, 2010).

Remark 4.7. In the implementation of GS algorithms, the burn-in period before the Markov chain reaches its stationary state is determined by visual inspection of a plot of the Markov chain samples as they are sequentially generated, following the strategy in Ching et al. (2006). To determine how many Markov chain samples to generate after burn-in is achieved, we visually check the convergence of the plotted stiffness samples of interest.

Remark 4.8. For damage detection and assessment, we can follow Huang et al. (2017b), where the posterior uncertainty of θ_u from the calibration stage can be incorporated in the posterior sampling of the stiffness scaling parameter θ_d for the monitoring stage.

Remark 4.9. Much more computing resources are required for the GS SBL algorithms than for the fast SBL algorithm, which is the cost of better posterior uncertainty quantification. Therefore, the choice between these two methods in real applications is a trade-off between the computation time and the level of accuracy of the uncertainty quantification that the user is willing to accept.

4.4 Comparison among the three algorithms

In Table 3, we compare the three algorithms presented above with respect to several aspects of theory and computation. It is seen that the full GS SBL algorithm can be applied without any concern, unlike the other two algorithms. In addition, the full GS algorithm is also capable of characterizing the full posterior uncertainty of all uncertain parameters in the hierarchical model in Figure 1. This is important because the Laplace approximation for the optimization of hyperparameters in the fast SBL and partial GS algorithms is based on the assumption that the posterior of these hyperparameters has a unique maximum with a sharp peak, which may not be the case, especially when using noisy incomplete data from a complicated SHM field environment. However, the full GS procedure presented in this section is still applicable, even when the model class is not globally identifiable. If the posterior uncertainty of some of the stiffness parameters is large, however, it means that the modal data are not sufficiently informative for reliable inference, even with the inducement of sparse stiffness changes. In such a case, we will have much less confidence in the identification of the corresponding substructure stiffness reductions and we will be left with significant uncertainty about damage in those substructures.

The illustrative results given later show that the full GS algorithm outperforms the other two algorithms. We have explicitly addressed several important issues to enhance reliable application of the full GS algorithm to real structural data. We discuss these issues in this section.

Table 3. A comparison among the three SBL algorithms for detection and quantification of sparse substructural stiffness changes

Application assumption:
- Fast SBL algorithm: p(ω², φ, β, ρ, η, α, b₀|ω̂², ψ̂, θ̂_u) has a unique global maximum
- Partial GS SBL algorithm: hyperparameters ρ, η, and α have a unique MAP value
- Full GS SBL algorithm: no assumption

Calibration value θ̂_u:
- Fast SBL algorithm: required to be unique with small uncertainty
- Partial GS SBL algorithm: no assumption
- Full GS SBL algorithm: no assumption

Producing model sparseness in (θ − θ̂_u):
- Fast SBL algorithm: optimize α by maximizing the evidence function p(θ̂_u|ω̃², φ̃, β̃, ρ̃, η̃, α, b̃₀)
- Partial GS SBL algorithm: optimize α by maximizing the evidence function p(θ̂_u|(ω²)^(n), φ^(n), β^(n), α)
- Full GS SBL algorithm: sample α from the conditional posterior PDF p(α|θ̂_u, θ^(n)) in a GS procedure

Output of the algorithm:
- Fast SBL algorithm: marginal posterior distribution of θ; MAP estimates of {ω², φ, β, ρ, η, α, b₀}
- Partial GS SBL algorithm: joint and marginal posteriors of parameters in {θ, ω², φ, β}; MAP estimates of {ρ, η, α, b₀}
- Full GS SBL algorithm: joint and marginal posterior distributions of parameters in {θ, ω², φ, β, ρ, η, α, b₀}

is not done in the standard SBL algorithms. However, and system modal parameter estimation because the
the partial GS algorithm ignores the posterior uncer- uncertainties of the hyperparameters are not explicitly
tainty in the hyperparameter α because it learns the hy- treated.
perparameter values by maximization of the evidence
function, which induces many α j to approach infinity. Remark 4.10. In real applications with various envi-
The full posterior uncertainty of the hyperparameter ronmental and operational conditions, if we do not have
vector α is treated in the full GS algorithm by sampling any information about these conditions, prediction errors
from its posterior PDF, which is derived from the condi- with large variances are required to give model predic-
tional dependency in the hierarchical model in Figure 1. tions. This will lead to large posterior uncertainty for the
This is a new strategy for inducing model sparseness that stiffness parameters. In this case, the full GS SBL algo-
avoids the optimization procedure for evidence maxi- rithm proposed can be applied without any concern be-
mization. This is useful as we have found that there are cause the full GS algorithm is capable of characterizing
local maxima of the evidence function that may trap the the full posterior uncertainty of all uncertain parameters.
hyperparameter optimization if the data quantity and
quality is insufficient, leading to non-robust Bayesian
updating results (Huang et al., 2014). We will show later
in some illustrative results that the full GS algorithm 5 ILLUSTRATIVE RESULTS USING THE
also promotes model sparseness well. PROPOSED FULL GS ALGORITHM

The performance of the proposed full GS algorithm is


4.4.2 Computational efficiency. The computational ef- illustrated and compared with the fast SBL algorithm
ficiency of the full GS algorithm is higher than that of and partial GS SBL algorithm by applying them to the
the partial GS algorithm. For sampling the conditional brace-damage patterns in the IASC-ASCE experimen-
posterior PDF of w1 (that is, θ, ω2 or φ) in each iter- tal Phase II SHM benchmark problem (Dyke et al.,
ation of the partial GS algorithm, an iterative method 2003; Ching and Beck, 2003). The benchmark struc-
is required to determine the MAP value of the hyper- ture is a four-story, two-bay by two-bay steel braced
parameters κ and the posterior mean and covariance frame that is 3.6 m high. A picture of the scaled bench-
matrix for the model parameter vector w1 because both mark structure is shown in Figure 2. We note that there
of these moments of w1 depend on the MAP value κ̃ are only a few publications that have reported success-
and vice versa. Also, in implementation of the partial ful localization and quantification of the damage in the
GS algorithm, the dimension Nθ of the stiffness scal- experimental Phase II SHM benchmark problem us-
ing parameters θ is a concern for efficiency because its ing structural response data (in contrast to the Phase I
MAP estimation has a computational effort of O(Nθ3 ), SHM benchmark problems that use synthetic/simulated
which is computationally demanding when the dimen- dynamic response data) (e.g., Ching and Beck, 2003;
sion Nθ is not small. In the new full GS algorithm, Huang et al., 2017b). The real data benchmarks are
we enhance the computational efficiency by sampling
the hyperparameters directly and no iterative method
is required for learning them. In addition, we intro-
duce a sequential Bayesian inference procedure for GS
to alleviate the problem that it may be inefficient to
converge to the stationary state of the joint posterior
p(ω2 , φ, θ, ρ, η, α, β, b0 |ω̂2 , ψ̂, θ̂u ).

4.4.3 Uncertainty quantification of the stiffness parameters and system modal parameters. For the partial GS algorithm, the posterior uncertainties of the stiffness parameters and system modal parameters are underestimated because the uncertainty of the approximated conditional posterior PDF p(w1 | ŷ, w2, w3, β, κ̃) is always smaller than that of the posterior PDF p(w1 | ŷ, w2, w3, β) = ∫ p(w1 | ŷ, w2, w3, β, κ) p(κ | ŷ, w2, w3, β) dκ, especially when the posterior uncertainty of κ is large. This may lead to overconfidence in the stiffness identification results, particularly when hyperparameters with large variances are required to give model predictions, which leads to large posterior uncertainty for the stiffness parameters. In this case, the proposed full GS SBL algorithm can be applied without any concern because it is capable of characterizing the full posterior uncertainty of all uncertain parameters.
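As a simple numerical illustration of why the plug-in approximation underestimates uncertainty, the following toy sketch (our own scalar construction, not the paper's model, with arbitrary toy numbers) compares draws of w1 with κ fixed at its MAP value against draws with κ sampled and thereby marginalized out:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy scalar analogue: y | w1 ~ N(w1, 1) and w1 | kappa ~ N(0, 1/kappa),
# so p(w1 | y, kappa) = N(y / (1 + kappa), 1 / (1 + kappa)).
y = 5.0
kappa_post = rng.gamma(2.0, 1.0, size=200_000)  # stand-in for p(kappa | y)

# Marginal: integrate kappa out by sampling it first (full GS style)
w1_marginal = rng.normal(y / (1.0 + kappa_post),
                         1.0 / np.sqrt(1.0 + kappa_post))

# Plug-in: fix kappa at its MAP value (partial GS style)
kappa_map = 1.0  # mode of Gamma(shape=2, scale=1)
w1_plugin = rng.normal(y / (1.0 + kappa_map),
                       1.0 / np.sqrt(1.0 + kappa_map), size=200_000)

print(w1_marginal.std(), w1_plugin.std())  # marginal spread is clearly larger
```

Here the marginal standard deviation is roughly 50% larger than the plug-in one, because marginalization adds the variability of the conditional mean and variance of w1 over the posterior of κ.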
5 ILLUSTRATIVE RESULTS USING THE PROPOSED FULL GS ALGORITHM

The performance of the proposed full GS algorithm is illustrated and compared with the fast SBL algorithm and partial GS SBL algorithm by applying them to the brace-damage patterns in the IASC-ASCE experimental Phase II SHM benchmark problem (Dyke et al., 2003; Ching and Beck, 2003). The benchmark structure is a four-story, two-bay by two-bay steel braced frame that is 3.6 m high. A picture of the scaled benchmark structure is shown in Figure 2. We note that only a few publications have reported successful localization and quantification of the damage in the experimental Phase II SHM benchmark problem using structural response data (in contrast to the Phase I SHM benchmark problems that use synthetic/simulated dynamic response data) (e.g., Ching and Beck, 2003; Huang et al., 2017b). The real data benchmarks are more challenging because of larger modeling errors due to discrepancies between the mathematical model's behavior and the real structure's behavior.

Fig. 2. The steel frame scale model structure for the benchmark studies.

Table 4
Considered damage pattern cases

Damage pattern     Damaged substructures and their true stiffness ratios
                   with respect to the undamaged values
Configuration 3    θ1,−y, θ2,−y, θ3,−y, and θ4,−y: 77.4%
Configuration 4    θ1,−y and θ4,−y: 77.4%
Configuration 5    θ1,−y: 77.4%
Configuration 6    θ2,+x: 54.9%

The task group for the benchmark problems investigated various damage configurations by removing bracing and loosening beam-column connections within the test structure. In this study, four brace-damage configurations are investigated (see Figure 3): (a) Configuration 3: removal of the left-hand side brace in each story on the −y face; (b) Configuration 4: removal of the left-hand side braces in the first and fourth stories on the −y face; (c) Configuration 5: removal of the left-hand side brace in the first story on the −y face; and (d) Configuration 6: removal of two braces in the second story on the +x face. The red dashed lines in Figure 3 indicate the corresponding damaged (removed) braces. We note that Configuration 1 is the calibration (undamaged) case, whereas Configuration 2 is the same as Configuration 3 except that all braces on the −y face are removed, so it is a less challenging case.

Fig. 3. Damage patterns for the brace-damage cases (the dashed lines indicate the corresponding damaged locations): (a) Configuration 3; (b) Configuration 4; (c) Configuration 5; and (d) Configuration 6.

For each damage configuration, experimental acceleration data were generated by impacts of a sledge hammer. Results using a two-step probabilistic system identification approach are presented: the modal parameters are identified in the first step and are then used in a subsequent step to determine the probability distribution on the fractional stiffness reductions f (i.e., damage extent) for each substructure. For the identified modal parameters, the reader can refer to Ching and Beck (2003) for detailed information.

The chosen substructuring follows that of Ching and Beck (2003), where a stiffness scaling parameter vector θ with 16 components is defined, one for each of the four faces of each of the four stories. When learned from the modal data, the reduction of any θj, j = 1, ..., 16, from an undamaged calibration stage corresponds to damage in the jth substructure. The true stiffness ratios of the damaged substructures with respect to the undamaged values are tabulated in Table 4.

In Figure 4, 6,000 Markov chain samples after the burn-in period of 4,000 samples, generated from the partial and full GS algorithms, are plotted in the {θ1,+x, θ1,+y} and {θ1,−x, θ1,−y} spaces for Configuration 5. They show that the stiffness reduction corresponding to θ1,−y is correctly identified and quantified as far as the sample means are concerned for both SBL algorithms. However, the full GS SBL algorithm identifies this stiffness reduction with higher confidence (smaller posterior uncertainty), and all of its samples indicate that damage occurs in substructure θ1,−y. Regarding the posterior uncertainties for different substructures, smaller posterior uncertainties are observed in the stiffness scaling parameters θj of the undamaged substructures (e.g., θ1,+x and θ1,−x) for both algorithms, because the corresponding hyperparameters αj approach very large values during the optimization (for the partial GS algorithm) or sampling (for the full GS algorithm), inhibiting change in these parameters. To show this, in Figure 5 we plot the MAP values (partial GS algorithm) and posterior samples (full GS algorithm) of each hyperparameter αj corresponding to each θj in Figure 4. It is seen that for both algorithms, the α1,+x and α1,−x values for the undamaged substructures θ1,+x and θ1,−x are much larger than the α1,−y values for the damaged substructure, which indicates much higher confidence in the closeness to the calibration value and means that the stiffness parameters of the undamaged substructures θ1,+x and θ1,−x are identified with smaller posterior uncertainties.

Fig. 4. Post burn-in samples for some posterior stiffness parameters for the Configuration 5 scenario, plotted in: (a) and (c) {θ1,+x, θ1,+y}; (b) and (d) {θ1,−x, θ1,−y} spaces, by running: (a) and (b) partial GS SBL algorithm; (c) and (d) full GS SBL algorithm.

Fig. 5. (a) and (b) MAP values; (c) and (d) post burn-in samples for hyperparameter α for the Configuration 5 scenario, plotted in: (a) and (c) {α1,+x, α1,+y}; (b) and (d) {α1,−x, α1,−y} spaces, by running: (a) and (b) partial GS SBL algorithm; (c) and (d) full GS SBL algorithm.

What is interesting is that the α1,+y values corresponding to the undamaged substructure θ1,+y are also larger than the α1,−y values but smaller than the α1,+x and α1,−x values (Figures 5a and c), which induces many θ1,+y samples with large stiffness increases from the calibration value, as seen in Figures 4a and c. This is because the chosen Gaussian likelihood function for the stiffness scaling parameter θ in Equation (18) allows the stiffness to increase and decrease from the calibration value with equal chance. In Huang et al. (2017a), a constraint that suppresses stiffness increases from the calibration state is imposed in the fast SBL algorithm, and the ill-conditioning in real damage inversions is thereby effectively alleviated.
Overall, from the observations in Figures 4 and 5, we conclude that both SBL algorithms with ARD are capable of promoting model sparseness. With regard to the comparison between the two algorithms, some αj values are larger for the full GS SBL algorithm than for the partial GS algorithm, a difference that comes from the distinct strategies for learning the hyperparameter α in the two algorithms.

In Huang et al. (2017a), it is found that the stiffness inversion is sensitive to the selection of the equation-error precision parameter β. In Figure 6, the Markov chain samples of β for the Configuration 5 scenario are presented. It is observed that the posterior PDF of β characterized by the GS samples has large uncertainties and that only a few samples are required to reach the stationary state, presumably because the large posterior uncertainties in the prediction-error precision parameter in this real-data case conceal the fluctuation of the samples in the transient (burn-in) period. In addition, the mean and variance of the β samples are similar for the two algorithms, which is consistent with the observation that similar stiffness results are produced by the two GS algorithms in the Configuration 5 scenario.
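A minimal sketch of the kind of post burn-in chain summary used for Figure 6 is given below; the burn-in length of 4,000 follows the text, while the function and variable names are ours:

```python
import numpy as np

def chain_summary(beta_chain, n_burn=4000):
    """Post burn-in summary of a Gibbs chain of the equation-error
    precision parameter beta. The running mean offers a simple visual
    check of how quickly the chain reaches its stationary state."""
    post = np.asarray(beta_chain)[n_burn:]
    running_mean = np.cumsum(post) / np.arange(1, post.size + 1)
    return {
        'mean': post.mean(),
        'std': post.std(ddof=1),
        'running_mean': running_mean,  # plot this to inspect convergence
    }
```

Comparing the returned mean and standard deviation across the two GS algorithms corresponds to the consistency check described above.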
Fig. 6. Markov chain samples for the equation-error precision parameter β for the Configuration 5 scenario.
The plots in Figures 7–10 for Configurations 3–6 show the posterior probability densities of the damage extent fraction f, which is the decrease in each stiffness parameter divided by its original calibration value. Damaged substructures should have large posterior probability density values where the stiffness reduction is close to its real value. For the fast SBL algorithm (Figures 7a–10a), the probability densities are approximated by Gaussian PDFs based on the posterior mean and variance of the substructure stiffness parameters before and after possible damage (Vanik et al., 2000). For the GS SBL algorithms (Figures 7b, 7c, 8b, 8c, 9b, 9c, 10b, and 10c), the probability densities are approximated by kernel PDFs based on the posterior Markov chain samples (excluding the 4,000 in the burn-in period) of the substructure stiffness parameters before and after possible damage (Ching et al., 2006).
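A minimal sketch of this kernel-PDF construction is given below, assuming paired post burn-in samples of a substructure stiffness parameter from the calibration and potentially damaged states; the function and variable names are illustrative, not the cited authors' code:

```python
import numpy as np
from scipy.stats import gaussian_kde

def damage_extent_pdf(theta_damaged, theta_calib, grid):
    """Kernel density approximation of the posterior PDF of the damage
    extent fraction f = (theta_cal - theta_pd) / theta_cal, built from
    paired post burn-in Gibbs samples of a substructure stiffness
    parameter before (calibration) and after possible damage. A sketch
    of the construction cited from Ching et al. (2006)."""
    f_samples = (theta_calib - theta_damaged) / theta_calib
    kde = gaussian_kde(f_samples)  # Gaussian kernels, default bandwidth
    return kde(grid)               # density evaluated on the f grid

# Example usage (hypothetical sample arrays):
# pdf = damage_extent_pdf(theta_pd, theta_cal, np.linspace(0.0, 0.5, 200))
```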
It is clear that all the actual damage patterns (see Table 4) are detected by all three algorithms because the probability densities are shifted toward the larger damage extents. However, when running the fast SBL algorithm, there is a concern of a missed damage detection for Configuration 3 (Figure 7), where the real damaged substructure θ2,−y has a probability density centered close to zero damage extent; this is not the case for the two GS SBL algorithms. The performance of the two GS algorithms is similar, although false damage detections (actual undamaged substructures that have probability densities shifted to larger damage extents) occur less often for the full GS SBL algorithm for Configuration 5 (Figure 9) and Configuration 6 (Figure 10), due to its robust treatment of the hyperparameters through a fuller posterior uncertainty quantification. For example, false damage detections are observed for the Configuration 5 scenario due to biased estimates of θ2,+x and θ2,−x for the partial GS algorithm that are not seen for the full GS algorithm. Also, for the Configuration 6 scenario, the probability density for θ4,−y is shifted to larger damage extents for the fast and partial GS algorithms, which produces false detections, but this does not occur for the full GS algorithm. In Configuration 4 (Figure 8), two undamaged substructures are observed to have significant probability densities at larger damage extents for all three algorithms, producing false damage detections for substructures θ4,+x and θ4,−x.

Fig. 7. Configuration 3 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The damaged substructures correspond to θ1,−y, θ2,−y, θ3,−y, and θ4,−y.


Fig. 8. Configuration 4 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The damaged substructures correspond to θ1,−y and θ4,−y.

In real implementations, there is a concern that when implementing the partial GS SBL algorithm, there may be some local maxima that trap the optimization of the hyperparameters {α, ρ, η, b0} and hence reduce the robustness of the algorithm. We observed this type of behavior when applying SBL for Bayesian compressive sensing by optimization of the evidence with respect to the hyperparameters of the hierarchical model (Huang et al., 2014, 2016). This concern is alleviated for the full GS SBL algorithm because its correct characterization of the posterior uncertainty of all uncertain parameters is expected to help the algorithm escape being trapped in local maxima.


Fig. 9. Configuration 5 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The only damaged substructure corresponds to θ1,−y.


Fig. 10. Configuration 6 scenario: (a) posterior PDFs for the fast SBL algorithm; (b) and (c) posterior PDFs by running: (b) partial GS SBL algorithm; (c) full GS SBL algorithm. The only damaged substructure corresponds to θ2,+x.

6 CONCLUDING REMARKS

Probability as a logic provides a rigorous framework for a Bayesian approach to quantifying modeling uncertainty in model updating in system identification.

It allows plausible reasoning about structural behavior based on incomplete information. A key concept is a stochastic system model class, which defines the fundamental probability models that allow robust stochastic structural analyses to be performed. Such a model class can be constructed by stochastic embedding of any deterministic model of the structure's I/O behavior. One distinguishing aspect of the Bayesian framework is marginalization of posterior PDFs, where instead of seeking to estimate all "nuisance" parameters in the models, the goal is to integrate them out to properly preserve their contribution to the posterior uncertainty of the parameters of interest. This allows us to assess the relative plausibility of each model within a set of candidate model classes chosen to represent the uncertain structural behavior. Applying Bayes' Theorem at the model class level automatically penalizes models that are too simple ("under-fit" the data) and too complex ("over-fit" the data), which is the Bayesian Ockham Razor at work. This quantitative implementation of Ockham's Razor is a natural consequence of applying Bayesian updating at the model class level.

SBL is an effective strategy to incorporate sparseness during model updating by introducing a hierarchical model and automatically implementing the Bayesian Ockham Razor with respect to the hyperparameters that control the sparseness. The focus in this article is applying SBL with ARD in Bayesian system identification based on noisy incomplete modal data, where we can impose spatially sparse stiffness changes when updating a structural model. In general, the ARD approach allows model class selection by suppressing terms in a linear-in-the-parameters expansion of possible terms, where each subset of terms could form a separate model class. It does this by inducing a sparse parameter vector during updating. In this work, a modified ARD approach is used during the updating to induce sparseness in the change in the stiffness scaling parameters θ from the values appropriate for the original undamaged structure to the current, potentially damaged structure. This reduces the potential ill-conditioning to provide more reliable identification results.

Our recently developed fast SBL algorithm and partial GS SBL algorithm have been briefly reviewed in the article. Based on a similar hierarchical SBL model, we improved our SBL theory for system identification and developed a full GS procedure to provide a full characterization of the posterior uncertainty of all unknown parameters and hyperparameters, which is especially useful when the model class is not globally identifiable because the Laplace approximations in the earlier SBL algorithms are not accurate. An efficient sampling strategy for the full GS SBL algorithm has also been developed. The full and partial GS algorithms differ in their strategies to deal with the posterior uncertainty of the hyperparameters. The comparison study of the effectiveness and robustness of the presented algorithms was demonstrated through the analysis of the IASC-ASCE Phase II experimental benchmark problem to identify brace damage. The results suggest that the full GS SBL algorithm is more reliable than the other two SBL algorithms when using real experimental data where there is significant modeling error. This superior performance is attributed to its more robust promotion of model sparseness and more accurate posterior uncertainty quantification for the stiffness parameters and system modal parameters.

ACKNOWLEDGMENTS

Yong Huang was supported by the George W. Housner Earthquake Engineering Research fund at the California Institute of Technology and grants from the National Natural Science Foundation of China (No. 51778192 and 51308161). This support is gratefully acknowledged.

REFERENCES

Amezquita-Sanchez, J. P., Park, H. S. & Adeli, H. (2017), A novel methodology for modal parameters identification of large smart structures using MUSIC, empirical wavelet transform, and Hilbert transform, Engineering Structures, 147, 148–59.
Beck, J. L. (2010), Bayesian system identification based on probability logic, Structural Control and Health Monitoring, 17, 825–47.
Beck, J. L. & Katafygiotis, L. S. (1998), Updating models and their uncertainties. I: Bayesian statistical framework, Journal of Engineering Mechanics, 124, 455–61.
Beck, J. L. & Yuen, K. V. (2004), Model selection using response measurements: a Bayesian probabilistic approach, Journal of Engineering Mechanics, 130, 192–203.
Chiachio, M., Beck, J. L., Chiachio, J. & Guillermo, R. (2014), Approximate Bayesian computation by subset simulation, SIAM Journal on Scientific Computing, 36(3), A1339–58.
Ching, J. & Beck, J. L. (2003), Two-Step Bayesian Structure Health Monitoring Approach for IASC-ASCE Phase II Simulated and Experimental Benchmark Studies, Technical Report EERL 2003-02, Earthquake Engineering Research Laboratory, California Institute of Technology, Pasadena, CA.
Ching, J. & Chen, Y. (2007), Transitional Markov chain Monte Carlo method for Bayesian model updating, model class selection and model averaging, Journal of Engineering Mechanics, 133, 816–32.
Ching, J., Muto, M. & Beck, J. L. (2006), Structural model updating and health monitoring with incomplete modal data using Gibbs sampler, Computer-Aided Civil and Infrastructure Engineering, 21(4), 242–57.

Ching, J., Phoon, K.-K., Beck, J. L. & Huang, Y. (2017), Identifiability of geotechnical site-specific trend functions, ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, 3(4), 04017021.
Dyke, S. J., Bernal, D., Beck, J. L. & Ventura, C. (2003), Experimental phase II of the structural health monitoring benchmark problem, in Proceedings of the 16th Engineering Mechanics Conference, ASCE, Reston, VA.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. & Rubin, D. B. (2013), Bayesian Data Analysis, 3rd edn., Chapman & Hall/CRC, Boca Raton, FL.
Green, P. L., Cross, E. J. & Worden, K. (2015), Bayesian system identification of dynamical systems using highly informative training data, Mechanical Systems and Signal Processing, 56–57, 109–22.
Gull, S. F. (1988), Bayesian inductive inference and maximum entropy, in G. J. Erickson and C. R. Smith (eds.), Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 53–74.
Huang, Y. & Beck, J. L. (2015), Hierarchical sparse Bayesian learning for structural health monitoring with incomplete modal data, International Journal for Uncertainty Quantification, 5(2), 139–69.
Huang, Y., Beck, J. L. & Li, H. (2017a), Hierarchical sparse Bayesian learning for structural damage detection: theory, computation and application, Structural Safety, 64, 37–53.
Huang, Y., Beck, J. L. & Li, H. (2017b), Bayesian system identification based on hierarchical sparse Bayesian learning and Gibbs sampling with application to structural damage assessment, Computer Methods in Applied Mechanics and Engineering, 318, 382–411.
Huang, Y., Beck, J. L., Wu, S. & Li, H. (2014), Robust Bayesian compressive sensing for signals in structural health monitoring, Computer-Aided Civil and Infrastructure Engineering, 29(3), 160–79.
Huang, Y., Beck, J. L., Wu, S. & Li, H. (2016), Bayesian compressive sensing for approximately sparse signals and application to structural health monitoring signals for data loss recovery, Probabilistic Engineering Mechanics, 46, 62–79.
Jaynes, E. T. (1983), Papers on Probability, Statistics and Statistical Physics, R. D. Rosenkrantz (ed.), D. Reidel Publishing, Dordrecht, Holland.
Jaynes, E. T. (2003), Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK.
Jefferys, W. H. & Berger, J. O. (1992), Ockham's razor and Bayesian analysis, American Scientist, 80, 64–72.
Li, Z., Park, H. S. & Adeli, H. (2017), New method for modal identification and health monitoring of superhighrise building structures using discretized synchrosqueezed wavelet and Hilbert transforms, The Structural Design of Tall and Special Buildings, 26(3), 1312–28.
MacKay, D. J. C. (1992), Bayesian methods for adaptive models, Ph.D. thesis in Computation and Neural Systems, California Institute of Technology, Pasadena, CA.
Mu, H. Q. & Yuen, K. V. (2017), Novel sparse Bayesian learning and its application to ground motion pattern recognition, Journal of Computing in Civil Engineering, 31(5), https://doi.org/10.1061/(ASCE)CP.1943-5487.0000668.
Oh, B. K., Kim, D. & Park, H. S. (2017), Modal response-based visual system identification and model updating methods for building structures, Computer-Aided Civil and Infrastructure Engineering, 32(1), 34–56.
Oh, C. K., Beck, J. L. & Yamada, M. (2008), Bayesian learning using automatic relevance determination prior with an application to earthquake early warning, Journal of Engineering Mechanics, 134(12), 1013–20.
Papadimitriou, C., Beck, J. L. & Katafygiotis, L. S. (2001), Updating robust reliability using structural test data, Probabilistic Engineering Mechanics, 16, 103–13.
Perez-Ramirez, C. A., Amezquita-Sanchez, J. P., Adeli, H., Valtierra-Rodriguez, M., Camarena-Martinez, D. & Romero-Troncoso, R. J. (2016), New methodology for modal parameters identification of smart civil structures using ambient vibrations and synchrosqueezed wavelet, Engineering Applications of Artificial Intelligence, 48, 1–16.
Shan, J., Ouyang, Y., Yuan, H. & Shi, W. (2016), Seismic data driven identification of linear physical models for building structures using performance and stabilizing objectives, Computer-Aided Civil and Infrastructure Engineering, 31(11), 846–70.
Tipping, M. E. (2000), The relevance vector machine, in S. A. Solla, T. K. Leen and K.-R. Müller (eds.), Advances in Neural Information Processing Systems 12, MIT Press, Cambridge, MA, pp. 652–58.
Tipping, M. E. (2001a), Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research, 1, 211–44.
Tipping, M. E. (2001b), Sparse kernel principal component analysis, in Advances in Neural Information Processing Systems, vol. 13, MIT Press, Cambridge, MA, pp. 633–39.
Vakilzadeh, M. K., Huang, Y., Beck, J. L. & Abrahamsson, T. (2017), Approximate Bayesian computation by subset simulation using hierarchical state-space models, Mechanical Systems and Signal Processing, 84(Part B), 2–20.
Vanik, M. W., Beck, J. L. & Au, S. K. (2000), Bayesian probabilistic approach to structural health monitoring, Journal of Engineering Mechanics, 126(7), 738–45.
Wang, Y. & Zhao, T. (2017), Statistical interpretation of soil property profiles from sparse data using Bayesian compressive sampling, Geotechnique, 67(6), 523–36.
Yuen, K. V. & Katafygiotis, L. S. (2001), Bayesian time-domain approach for modal updating using ambient data, Probabilistic Engineering Mechanics, 16(3), 219–31.
