You are on page 1of 18

Structural Safety 97 (2022) 102216

Contents lists available at ScienceDirect

Structural Safety
journal homepage: www.elsevier.com/locate/strusafe

A review and assessment of importance sampling methods for reliability


analysis
Armin Tabandeh a ,∗, Gaofeng Jia b , Paolo Gardoni a
a Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
b
Department of Civil and Environmental Engineering, Colorado State University, Fort Collins, CO, USA

ARTICLE INFO ABSTRACT

Keywords: This paper reviews the mathematical foundation of the importance sampling technique and discusses two
Importance sampling general classes of methods to construct the importance sampling density (or probability measure) for reliability
Reliability analysis analysis. The paper first explains the failure probability estimator of the importance sampling technique,
Gaussian mixture
its statistical properties, and computational complexity. The optimal but not implementable importance
Kernel density estimation
sampling density, derived from the variational calculus, is the starting point of the two general classes
Surrogate
of importance sampling methods. For time-variant reliability analysis, the optimal but not implementable
stochastic control is derived that induces the corresponding optimal importance sampling probability measure.
In the first class, the optimal importance sampling density is directly approximated by a member of a family
of parametric or nonparametric probability density functions. This approximation requires defining the family
of approximating probability densities, a measure of distance between two probability densities, and an
optimization algorithm. In the second class, the approximating importance sampling density has the general
functional form of the optimal solution. The approximation amounts to replacing the limit-state function
with a computationally convenient surrogate. The paper then explores the performances of the two classes
of importance sampling methods through several benchmark numerical examples. The challenges and future
directions of the importance sampling technique are also discussed.

1. Introduction required to achieve the desired accuracy level in the standard MC


simulation is inversely proportional to the failure probability (e.g., for
The performance of reliability analysis methods depends on the small failure probabilities, accurate reliability analysis usually requires
( )
interplay between their mathematical flexibility and computational ∼  106 number of limit-state function evaluations.)
complexity. Gradient-based methods like the First-Order Reliability Importance Sampling (IS) is an enhanced MC simulation technique
Method (FORM) and the Second-Order Reliability Method (SORM) [1] whose objective is to reduce the computational complexity of the stan-
generally have low computational complexity at the cost of imposing dard MC simulation while retaining its mathematical flexibility. The
restrictive regularity conditions on the limit-state function that defines idea of the IS technique is to define a biased sampling density, called
the failure domain. When the required regularity conditions are met, IS density, that can sample from the failure domain more frequently
FORM and SORM usually converge with ∼  (10) number of limit-state than the base sampling density in the standard MC simulation [2–
function evaluations. However, they are prone to significant errors or
4]. Samples drawn from the IS density are then weighted to yield
infeasible when used for (1) problems with a strongly nonlinear fail-
an unbiased estimate of the failure probability. Implementing the IS
ure boundary, particularly in the vicinity of design-point(s) (the most
technique for the time-variant reliability analysis of dynamical systems
likely failure point(s) in the standard Gaussian probability space), (2)
amounts to designing an IS probability measure that generates sample
component reliability problems with multiple design points or a single
paths out-crossing the failure boundary more frequently than that
but non-dominant one, and (3) problems with non-smooth limit-state
functions (e.g., system reliability problems.) In contrast, sampling tech- under the base probability measure. The IS probability measure is
niques based on the variants of the Monte Carlo (MC) simulation relax induced by introducing a stochastic control into the equation of motion
the restrictive regularity conditions at the cost of higher computational of the dynamical system that increases the likelihood of out-crossing the
complexity than FORM and SORM. For example, the number of samples failure boundary. The generated sample paths under the IS probability

∗ Corresponding author.
E-mail addresses: tabande2@illinois.edu (A. Tabandeh), gjia@colostate.edu (G. Jia), gardoni@illinois.edu (P. Gardoni).

https://doi.org/10.1016/j.strusafe.2022.102216
Received 28 June 2021; Received in revised form 12 February 2022; Accepted 9 March 2022
Available online 21 March 2022
0167-4730/© 2022 Elsevier Ltd. All rights reserved.
A. Tabandeh et al. Structural Safety 97 (2022) 102216

measure are then weighted according to the Radon–Nikodym derivative 2.1. Formulation of reliability analysis
to yield an unbiased estimate of the failure probability (i.e., the first-
passage probability) [5]. The success of the IS technique depends on Following the conventional notation in reliability analysis [4,29,
the prudent choice of the IS density or stochastic control [6,7]. It 30], consider a system whose performance depends on a vector of state
( )
is possible to derive the IS density or stochastic control directly via variables 𝐱⊺ = 𝑥1 , … , 𝑥𝑑 ∈ 𝒳 ⊆ R𝑑 , where 𝒳 denotes the space of
the optimization of the failure probability estimator. However, the all possible values of 𝐱. We assume that the probability distribution
dependence of the optimal solution (probability density or stochastic of 𝐱 that captures its uncertainty has a probability density function
control) on the unknown failure probability limits its use to a mere (PDF) 𝑝 (𝐱). Also, suppose 𝑔 (𝐱) ∶ 𝒳 → R denotes a limit-state function
guide for developing efficient IS densities or stochastic control that are that characterizes the system performance, defined such that 𝛺𝐹 =
implementable. {𝐱 ∈ 𝒳 ∶ 𝑔 (𝐱) ≤ 0} indicates its failure domain (i.e., realizations for
The reliability analysis literature includes many IS methods. How- which the system fails to meet the desired performance level.) Con-
ever, the existing methods share similar theoretical foundations. In sidering the uncertainties in 𝐱, we can compute the failure probability
general, we recognize two classes of IS methods. This review paper 𝑃𝐹 as
focuses on these two classes and discusses them in a consistent theo- [ ]
𝑃𝐹 = 𝑝 (𝐱) d𝐱 = 𝐼𝛺𝐹 (𝐱) 𝑝 (𝐱) d𝐱 = E𝑝 𝐼𝛺𝐹 (𝐱) , (1)
retical framework for comparison purposes. The first class of methods ∫𝛺𝐹 ∫𝒳
develops the IS density by minimizing a discrepancy measure between
where 𝐼𝛺𝐹 (𝐱) is an indicator function, defined as 𝐼𝛺𝐹 (𝐱) = 1, if 𝐱 ∈ 𝛺𝐹 ,
the optimal solution and a family of approximating densities. For
and 𝐼𝛺𝐹 (𝐱) = 0, if 𝐱 ∉ 𝛺𝐹 ; and E𝑝 [⋅] is the expectation with respect to
example, Ang et al. [8] derived the IS density by minimizing the mean
𝑝 (𝐱). The reliability of the system is then 𝑅 = 1 − 𝑃𝐹 .
squared error between the optimal solution and kernel densities (a
For systems with a large vector of uncertain state variables (i.e., 𝑑 ≫
family of nonparametric probability densities.) The estimator of the
1), the estimation of 𝑃𝐹 requires evaluating a high-dimensional integral
mean squared error should be evaluated at samples from the optimal
that is often analytically intractable. Simulation techniques (e.g., MC
solution, for which the sampling algorithm controls the computational
simulation) provide a general means to evaluate such high-dimensional
complexity. Later, Au and Beck [9] improved the work of Ang et al. [8]
integrals [3,4,31]. Simulation techniques rely on an unbiased and con-
by proposing a more efficient sampling algorithm based on a Markov
sistent estimator of 𝑃𝐹 to ensure convergence to the true, but unknown,
Chain Monte Carlo simulation technique. More recently, Kurtz and
value of 𝑃𝐹 . For example, in the standard MC simulation,
[ ]the following
Song [10] derived the IS density by minimizing the relative entropy
estimator is commonly used to estimate 𝑃𝐹 = E𝑝 𝐼𝛺𝐹 (𝐱) :
between the optimal solution and Gaussian mixtures (a family of para-
metric probability densities.) As before, the estimator of the relative
1 ∑
𝑁
( ) ( (𝑛) )
entropy should be evaluated at samples from the optimal solution. 𝑃̂𝐹 ,𝑝 𝑁 = 𝐼 𝐱 , (2)
𝑁 𝑛=1 𝛺𝐹
Kurtz and Song [10] developed an adaptive estimation method to
{ }
improve the computational efficiency, where the Gaussian mixture where 𝑁 = 𝐱(𝑛) ∶ 𝑛 = 1, … , 𝑁 is a set of 𝑁 statistically independent
gradually approaches the optimal solution. The second class of methods and identically distributed samples from 𝑝 (𝐱). The above expression fol-
( )
is based on the general form of the optimal solution in which the lows by replacing 𝑝 (𝐱) in Eq. (1) with the empirical density 𝑝̂ 𝐱, 𝑁 =
1 ∑𝑁 ( )
computationally demanding limit-state function is replaced with a fast- 𝛿 𝐱 − 𝐱(𝑛) , where 𝛿 (⋅) is the Dirac delta function. Note that
𝑁 (𝑛=1 )
to-evaluate surrogate. Examples of such surrogates include polynomial 𝑃̂𝐹 ,𝑝 𝑁 is a random function of the samples in 𝑁 ; hence, the
( )
response surface [11–15], support vector regression [16–19], neural performance of 𝑃̂𝐹 ,𝑝 𝑁 can be evaluated in terms of its statistical
[ ( )]
network [20–23], Gaussian process [24–26], and polynomial chaos properties. In particular, one can show that the mean E𝑝 𝑃̂𝐹 ,𝑝 𝑁
[ ( )] ( )
expansion [27,28]. The algorithm that generates required samples for and variance Var𝑝 𝑃̂𝐹 ,𝑝 𝑁 of 𝑃̂𝐹 ,𝑝 𝑁 are [31]
the training of the surrogate controls the computational complexity of [ ( )]
the method. After the initial training, this class of methods usually uses E𝑝 𝑃̂𝐹 ,𝑝 𝑁 = 𝑃𝐹 ,
( )
an active learning technique to generate additional training samples [ ( )] 𝑃𝐹 1 − 𝑃𝐹 (3)
Var𝑝 𝑃̂𝐹 ,𝑝 𝑁 = .
that improve the failure boundary representation. Besides theoretical 𝑁
discussions, the performances of the two classes will be illustrated ( )
The above statistics indicate that 𝑃̂𝐹 ,𝑝 𝑁 satisfies the desired
through several benchmark numerical examples. We also discuss the [ ( )]
unbiasedness (i.e., E𝑝 𝑃̂𝐹 ,𝑝 𝑁 = 𝑃𝐹 ) and consistency (i.e., Var𝑝
challenges and future directions of the IS technique. [ ( )] 𝑁→∞
𝑃̂𝐹 ,𝑝 𝑁 ⟶ 0) conditions. More generally, one can show that
The rest of the paper is organized into six sections. The next ( ) 𝑎.𝑠. ∑ 𝑎.𝑠.
𝑃̂𝐹 ,𝑝 𝑁 ⟶ 𝑃𝐹 almost surely (𝑎.𝑠.) or, equivalently, 𝑁 𝑛=1 𝑟𝑛 ∕𝑁 ⟶
section defines the reliability problem and presents the mathematical ( (𝑛) ) ( 4)
0, where 𝑟𝑛 = 𝐼𝛺𝐹 𝐱 − 𝑃𝐹 . Using E 𝑟𝑛 < ∞ and the statistical
basis of the IS technique. Sections 3 and 4 explain the mathematical [( )4 ]
∑𝑁
development of the two classes of IS methods and elaborate notable independence of 𝑟1 , … , 𝑟𝑁 , it can be shown that E 𝑛=1 𝑟𝑛 =
contributions in each class. Section 5 evaluates the performance of ( 4) ( 2 ) [ ( 2 )]2 2
the discussed IS methods through benchmark reliability problems. Sec- 𝑁E 𝑟1 + 3 𝑁 − 𝑁 E( 𝑟1 ≤ 𝑁 𝑐0 for ) some (constant 𝑐0 > 0. For
|∑ | )
tion 6 discusses the challenges of the IS technique and its future any 𝜀 > 0, we have P | 𝑁 𝑟 ∕𝑁 | ≥ 𝜀 ≤ 𝑐 ∕ 𝑁 2 𝜀4 according to
| 𝑛=1 𝑛 | ( 0∑ )
direction. Finally, the last section summarizes the paper and draws | |
Chebyshev inequality, leading to lim𝑁→∞ P | 𝑁 𝑟𝑛 ∕𝑁 | ≥ 𝜀 = 0.
some conclusions. | 𝑛=1 |
Invoking
(∑ the Borel–Cantelli ) Lemma, the latter implies
| 𝑁 |
P | 𝑛=1 𝑟𝑛 ∕𝑁 | ≥ 𝜀 infinitely often = 0 [32]. The Chebyshev inequal-
| | ( ) ( )
2. Formulation of reliability analysis using importance sampling ity also implies that at least 𝑁 (𝜀, 𝛾) = 𝑃𝐹 1 − 𝑃𝐹 ∕ 𝛾𝜀2 number
of samples is required to ensure, with confidence 1 − 𝛾, that the
In this section, we first briefly present the formulation of reliability estimated failure probability is bounded between 𝑃𝐹 −𝜀 and 𝑃𝐹 +𝜀. The
analysis focusing on sampling techniques based on Monte Carlo sim- mapping (𝜀, 𝛾) ↦ 𝑁 (𝜀, 𝛾) is a measure of the estimator’s computational
( )
ulation. We then lay out the foundation of the importance sampling complexity. The Coefficient of Variation (CoV) of 𝑃̂𝐹 ,𝑝 𝑁 provides
technique and discuss its computational complexity. Finally, we intro- similar information about its computational complexity. i.e.,
duce the two classes of IS methods that will be reviewed in this paper. √ [ ( )] √
This introductory section aims to facilitate the review and discussion Var𝑝 𝑃̂𝐹 ,𝑝 𝑁 1 − 𝑃𝐹
𝛿𝑀𝐶 = [ ( )] = , (4)
of the two classes of IS methods presented in subsequent sections. E𝑝 𝑃̂𝐹 ,𝑝 𝑁 𝑁𝑃𝐹

2
A. Tabandeh et al. Structural Safety 97 (2022) 102216

( )
where 𝛿 is the CoV of 𝑃̂𝐹 ,𝑝 𝑁 , which implies that almost 𝑃𝐹 for all 𝑞𝐼𝑆 (𝐱) ∈ ; thus, the estimator is unbiased. We then write
⌈ ( 2 𝑀𝐶 )⌉ [ ( )]
1∕ 𝛿𝑀𝐶 𝑃𝐹 samples is required to achieve the confidence level of the estimator’s variance Var𝑞 𝑃̂𝐹 ,𝑞 𝑁 as
( )
1 − 𝛿𝑀𝐶 . In reliability analysis, the failure of a system is typically [ ]
[ ( )] 1 𝑝 (𝐱)
a rare event whose probability of occurrence usually does not exceed Var𝑞 𝑃̂𝐹 ,𝑞 𝑁 = Var𝑞 𝐼𝛺𝐹 (𝐱)
( ) 𝑁 𝑞𝐼𝑆 (𝐱)
 10−3 (i.e., 𝑃𝐹 < 10−3 ); thus, the MC estimator in Eq. (2) requires { [ ] } (10)
a large number of 𝑔 (𝐱) evaluations to yield an accurate estimate of 1 𝑝2 (𝐱)
= E𝑞 𝐼𝛺𝐹 (𝐱) − 𝑃𝐹2 .
𝑃𝐹 . For systems with a computationally demanding 𝑔 (𝐱), the slow 𝑁 𝑞 2 (𝐱) 𝐼𝑆
convergence of the standard MC simulation significantly limits its use
The consistency and computational complexity of the estimator
for reliability analysis. [ ( )]
depend on the choice of 𝑞𝐼𝑆 . Hence, the functional Var𝑞 𝑃̂𝐹 ,𝑞 𝑁
We can also use the IS technique for the reliability analysis of
can be minimized for 𝑞𝐼𝑆 [6,7,31]. i.e.,
dynamical systems driven by stochastic excitations (i.e., estimate the
first-passage probability.) The vector of state variables 𝐱𝑡 is an 𝒳 -valued ∗
[ ( )] 𝐼𝛺𝐹 (𝐱) 𝑝 (𝐱)
𝑞𝐼𝑆 (𝐱) = arg min Var𝑞 𝑃̂𝐹 ,𝑞 𝑁 = , (11)
stochastic process governed by 𝑞𝐼𝑆 ∈ 𝑃𝐹
( ) ( ) ∗ (𝐱) is the optimal IS density, which results in Var 𝑃̂
[ ( )]
d𝐱𝑡 = 𝐚 𝐱𝑡 , 𝑡 d𝑡 + 𝐁 𝐱𝑡 , 𝑡 d𝐰𝑡 , (5) where 𝑞𝐼𝑆 𝑞 𝐹 ,𝑞 𝑁

= 0 for all 𝑁 ∈ N (see Eq. (10)). The density 𝑞𝐼𝑆 (𝐱) is obtained from
with the initial condition 𝐱𝑡𝑠 at time 𝑡𝑠 , where 𝐚 (⋅, ⋅) is the drift vector; the Euler–Lagrange equation 𝜕𝐿∕𝜕𝑞𝐼𝑆 = 0 subject to ∫ 𝑞𝐼𝑆 (𝐱) d𝐱 = 1,
′ ′ ( )
𝐁 (⋅, ⋅) is an R𝑑×𝑑 -valued matrix; and 𝐰𝑡 is the R𝑑 -valued standard where the Lagrangian is 𝐿 𝑞𝐼𝑆 = 𝐼𝛺𝐹 (𝐱) 𝑝2 (𝐱) ∕𝑞𝐼𝑆 (𝐱). See [36] for
{ ( ) }
Wiener process. The failure domain 𝛺𝐹 (𝒯 ) = 𝐱𝑡 ∶ inf 𝑡∈𝒯 𝑔 𝐱𝑡 ≤ 0 more discussion on the computational complexity of the IS technique.
[ ]
is now a function of the considered time interval 𝒯 = 𝑡𝑠 , 𝑡𝑒 ⊂ R+ . The In general, 𝑞𝐼𝑆∗ (𝐱) is not realizable since its normalizing constant 𝑃
𝐹
failure domain also depends on 𝐱𝑡𝑠 and 𝑡𝑠 , which is assumed implicitly. is the ultimate quantity of interest. However, the functional form of
We can express the failure probability as ∗ (𝐱) provides insight into the design of a computationally efficient
𝑞𝐼𝑆
( ) [ ( )] IS density. Deviation among available importance sampling methods
𝑃𝐹 (𝒯 ) = dP 𝐱𝑡 = EP 𝐼𝛺𝐹 (𝒯 ) 𝐱𝑡 . (6) initiates from how the knowledge of 𝑞𝐼𝑆 ∗ (𝐱) is used in the design of a
∫𝛺𝐹 (𝒯 )
( ) near-optimal IS density.
where P 𝐱𝑡 is the probability measure on the space of R𝑑 -valued con- For dynamical systems, we may capture the uncertainty in 𝐱𝑡 by
tinuous functions, containing the samples of 𝐱𝑡 . As for time-invariant an equivalent set of random variables using spectral representation
problems, the standard MC estimator of 𝑃𝐹 (𝒯 ) is methods (e.g., Karhunen–Loève expansion [37]) and proceed with the
( ) IS estimator of time-invariant problems in Eq. (8). However, the grow-
1 ∑
𝑁
( )
𝑃̂𝐹 ,P 𝒯 , 𝑁 = 𝐼 𝐱(𝑛) , (7) ing number of required random variables can lead to computational
𝑁 𝑛=1 𝛺𝐹 (𝒯 ) 𝑡
demands exceeding available resources. Instead, one can extend the
{ }
where 𝑁 = 𝐱𝑡(𝑛) ∶ 𝑛 = 1, … , 𝑁 is now a set of 𝑁 statistically general idea of the IS technique to dynamical systems by designing
independent sample paths under a common probability measure P. a new probability measure that increases the likelihood of the failure
Alternatively, the first-passage probability of the Markov process 𝐱𝑡 event [5,38–40]. Introducing the IS probability measure Q, we can
satisfies the Kolmogorov backward equation, a parabolic partial differ- estimate the failure probability as
[ ]
ential equation, that can be solved numerically for simpler dynamical dP ( ) ( ) ( ) dP ( )
𝑃𝐹 (𝒯 ) = 𝐱̌ 𝑡 dQ 𝐱̌ 𝑡 = EQ 𝐼𝛺𝐹 (𝒯 ) 𝐱̌ 𝑡 𝐱̌ 𝑡 , (12)
systems. See [33,34] for a review of the equation and discussion on its ∫𝛺𝐹 (𝒯 ) dQ dQ
numerical solution.
where dP∕dQ (⋅) is the Radon–Nikodym derivative of P with respect to
Q, and 𝐱̌ 𝑡 is the vector of state variables with probability measure Q.
2.2. Concept of importance sampling A common approach to obtain dP∕dQ is based on the Girsanov
transformation [41]. In this approach, the Wiener process 𝐰𝑡 in Eq. (5)
Importance sampling can significantly improve the computational is replaced by
complexity of the standard Monte Carlo estimator, making an other- 𝑡
wise infeasible reliability analysis amenable to simulations [6,7,31,35]. 𝐰̌ 𝑡 = 𝐰𝑡 − 𝐮𝜏 d𝜏, (13)
Instead of using samples from 𝑝 (𝐱), the importance sampling (IS) ∫ 𝑡𝑠
technique introduces an alternative sampling density that can reduce ′
in which 𝐮𝜏 is an R𝑑 -valued stochastic control that should be estimated.
the estimator’s variance [6,7,31]. Introducing the importance sampling Substituting Eq. (13) into Eq. (5) yields the transformed governing
density 𝑞𝐼𝑆 (𝐱) into the definition of 𝑃𝐹 in Eq. (1) yields equation of 𝐱̌ 𝑡 , written as
[ ]
𝑝 (𝐱) 𝑝 (𝐱) [ ( ) ( ) ] ( )
𝑃𝐹 = 𝐼𝛺𝐹 (𝐱) 𝑞𝐼𝑆 (𝐱) d𝐱 = E𝑞 𝐼𝛺𝐹 (𝐱) , (8) d𝐱̌ 𝑡 = 𝐚 𝐱̌ 𝑡 , 𝑡 + 𝐁 𝐱̌ 𝑡 , 𝑡 𝐮𝑡 d𝑡 + 𝐁 𝐱̌ 𝑡 , 𝑡 d𝐰̌ 𝑡 . (14)
∫𝒳 𝑞𝐼𝑆 (𝐱) 𝑞𝐼𝑆 (𝐱)
[ ( ) ( ) ]
where E𝑞 [⋅] is the expectation with respect to 𝑞𝐼𝑆 (𝐱). For the integral The design of the drift term 𝐚 𝐱̌ 𝑡 , 𝑡 + 𝐁 𝐱̌ 𝑡 , 𝑡 𝐮𝑡 via 𝐮𝑡 is the lever
to be well-defined, it is required that 𝑞𝐼𝑆 (𝐱) > 0 for all 𝐱 ∈ 𝛺𝐹 , unless to control the likelihood of arriving at the failure domain. According
𝐼𝛺𝐹 (𝐱) 𝑝 (𝐱) = 0. Let  denote the space of admissible IS densities. to the Girsanov theorem [41], the probability measures P and Q are
Accordingly, we may write the failure probability estimator as related via the Radon–Nikodym derivative
( ) ( )
dP ( )
𝑡𝑒 𝑡𝑒
1 ∑
𝑁
( ) ( ) 𝑝 𝐱(𝑛) 𝐱̌ 𝑡 = exp −

𝐮𝜏 d𝐰̌ 𝜏 −
1 ⊺
𝐮𝜏 𝐮𝜏 d𝜏 . (15)
𝑃̂𝐹 ,𝑞 𝑁 = 𝐼𝛺𝐹 𝐱(𝑛) ( ), (9) dQ ∫ 𝑡𝑠 2 ∫𝑡𝑠
𝑁 𝑛=1 𝑞𝐼𝑆 𝐱(𝑛)
where 𝑁 is a set of 𝑁 statistically independent and identically dis- Combining Eqs. (12)–(15), we can write the IS estimator of the
tributed samples from 𝑞𝐼𝑆 (𝐱). The above expression follows by replac- failure probability as
( )
ing 𝑞𝐼𝑆 (𝐱) in Eq. (8) with the empirical sampling density 𝑞̂𝐼𝑆 𝐱, 𝑁 = ( )
1 ∑
𝑁
∑𝑁 ( ) ( ) ( ) ( )
𝑛=1 𝑤𝑛 𝛿 𝐱 − 𝐱
(𝑛) , where 𝑤 ∝ 𝑝 𝐱(𝑛) ∕𝑞
𝑛 𝐼𝑆 𝐱
(𝑛) is the weight as- 𝑃̂𝐹 ,Q 𝒯 , 𝑁 = 𝐼𝛺𝐹 (𝒯 ) 𝐱̌ 𝑡(𝑛)
𝑁 𝑛=1
signed to 𝐱(𝑛) ∈ 𝑁 . The proportionality notation ∝ in the definition ( ) (16)

of 𝑤𝑛 means to ensure that 𝑁 𝑛=1 𝑤𝑛 = 1, leading to a proper sampling 𝑡𝑒
1
𝑡
𝑒
( ) [ ( )] × exp −
⊺(𝑛)
𝐮𝜏 d𝐰̌ (𝑛)
𝜏 −
⊺(𝑛)
𝐮 𝐮(𝑛)𝜏 d𝜏 .
density 𝑞̂𝐼𝑆 𝐱, 𝑁 . It is straightforward to show that E𝑞 𝑃̂𝐹 ,𝑞 𝑁 = ∫ 𝑡𝑠 2 𝑡𝑠 𝜏

3
A. Tabandeh et al. Structural Safety 97 (2022) 102216

Like for time-invariant problems, we can find the optimal control as follows [38,44]:
𝐮∗𝑡 by minimizing the estimator’s variance ( ) ⊺ ( )
𝐁 𝐱̌ 𝑡 , 𝑡 ∇𝐱𝑡 𝑃𝐹 𝐱𝑡𝑠 = 𝐱̌ 𝑡 , 𝑡𝑠 = 𝑡
{ [ 𝐮∗𝑡 = (
𝑠
) , (22)
[ ( )] 1 ( )
VarQ 𝑃̂𝐹 ,Q 𝒯 , 𝑁 = EQ 𝐼𝛺𝐹 (𝒯 ) 𝐱̌ 𝑡 𝑃𝐹 𝐱𝑡𝑠 = 𝐱̌ 𝑡 , 𝑡𝑠 = 𝑡
𝑁
( )] ( )
𝑡𝑒

𝑡𝑒

where ∇𝐱𝑡 𝑃𝐹 𝐱𝑡𝑠 = 𝐱̌ 𝑡 , 𝑡𝑖 = 𝑡 is the gradient of 𝑃𝐹 with the initial
𝑠
× exp −2 𝐮𝜏 d𝐰̌ 𝜏 − 𝐮𝜏 𝐮𝜏 d𝜏 (17) condition 𝐱𝑡𝑠 = 𝐱̌ 𝑡 at time 𝑡𝑠 = 𝑡. The optimal control 𝐮∗𝑡 leads to a zero-
∫ 𝑡𝑠 ∫ 𝑡𝑠
} variance estimator [44]. As for time-invariant problems, the optimal
− 𝑃𝐹2 (𝒯 ) . IS probability measure with 𝐮∗𝑡 is not directly implementable since 𝐮∗𝑡
depends on the estimates of the failure probability and its gradient for
different values of 𝐱𝑡𝑠 and 𝑡𝑠 . However, the general form of 𝐮∗𝑡 is the
We can derive 𝐮∗𝑡 by invoking the stochastic control theory [42]. To starting point to design implementable controls 𝐮̂ ∗𝑡 .
facilitate the optimization, we focus on the part of Eq. (17) that involves
the unknown control 𝐮𝑡 . We cast the optimization problem as 2.3. Scope of this review paper on importance sampling methods for relia-
( ) [ ( )] bility analysis
𝜓 𝐱̌ 𝑡𝑠 , 𝑡𝑠 = min EQ 𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡 , (18)
𝐮𝑡
( ) [ ( )] This review paper discusses several notable importance sampling
( ) 𝑡 ⊺ 𝑡 ⊺ ∗ (𝐱) for time-invariant reliability
with 𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡 = ln 𝐼𝛺𝐹 (𝒯 ) 𝐱̌ 𝑡 exp −2 ∫𝑡 𝑒 𝐮𝜏 d𝐰̌ 𝜏 − ∫𝑡 𝑒 𝐮𝜏 𝐮𝜏 d𝜏 , methods developed to approximate 𝑞𝐼𝑆
𝑠 𝑠
where the logarithmic transformation ln (⋅) leaves the solution 𝐮∗𝑡 un- analysis. We also briefly discuss available methods to approximate the
changed. optimal control 𝐮∗𝑡 for the time-variant reliability analysis of dynamical
{ [ ( We must
)]} also consider the constraint
EQ exp 𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡 ≥ 𝑃𝐹2 (𝒯 ) in finding 𝐮∗𝑡 . The optimal control 𝐮∗𝑡 systems. The scope of the IS methods goes well beyond reliability
satisfies the Hamilton–Jacobi–Bellman equation [43] formulated next. analysis; researchers have implemented them in different areas, in-
cluding particle filtering [45,46], statistical signal processing [47], and
( By discretizing
) 𝒯 into sub-intervals with increments 𝛥𝑡, we express
Bayesian inference [48]. These applications are usually concerned with
𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡 as
estimating a probability distribution that is known up to a propor-
tionality constant (e.g., the posterior distribution in Bayesian infer-
( ) [ ( )]
𝑡𝑒 −𝛥𝑡
∑ ⊺ 𝑡𝑒 −𝛥𝑡
∑ ⊺ ence) or estimating its statistical moments. This review paper focuses
𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡𝑠 , … , 𝐮𝑡𝑒 −𝛥𝑡 = ln 𝐼𝛺𝐹 (𝒯 ) 𝐱̌ 𝑡 − 2 𝐮𝜏 𝛥𝐰̌ 𝜏 − 𝐮𝜏 𝐮𝜏 𝛥𝑡,
specifically on IS methods developed for reliability analysis, aiming to
𝜏=𝑡𝑠 𝜏=𝑡𝑠
approximate 𝑞𝐼𝑆 ∗ (𝐱) in Eq. (11) or 𝐮∗ in Eq. (22). The current literature
𝑡
(19) includes a wide range of methods to approximate 𝑞𝐼𝑆 ∗ (𝐱) and 𝐮∗ ; we
𝑡
group these methods into two general classes depending on whether
leading to the following optimization problem for 𝐮𝑡𝑠 , … , 𝐮𝑡𝑒 −𝛥𝑡 [42]: or not they use the derived optimal solution in Eq. (11) or (22). The
( ) [ ( )] first class, called density approximation methods, deals with methods
𝜓 𝐱̌ 𝑡𝑠 , 𝑡𝑠 = min EQ 𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡𝑠 , … , 𝐮𝑡𝑒 −𝛥𝑡 that directly approximate 𝑞𝐼𝑆 ∗ (𝐱) by a member of a family of parametric
𝐮𝑡𝑠 ,…,𝐮𝑡𝑒 −𝛥𝑡
{ or nonparametric probability density functions. The second class, called

= min −𝐮𝑡 𝐮𝑡𝑠 𝛥𝑡 limit-state approximation methods, deals with methods that exploit the
𝐮𝑡𝑠 𝑠

)]} general form of 𝑞𝐼𝑆 ∗ (𝐱) ∝ 𝐼


[ ( 𝛺𝐹 (𝐱) 𝑝 (𝐱) while approximating the limit-
+ min EQ 𝑣 𝐱̌ 𝑡𝑠 +𝛥𝑡 , 𝑡𝑠 + 𝛥𝑡, 𝐮𝑡𝑠 +𝛥𝑡 , … , 𝐮𝑡𝑒 −𝛥𝑡 state function 𝑔 (𝐱) in the definition of 𝐼𝛺𝐹 (𝐱). In the next two sections,
𝐮𝑡𝑠 +𝛥𝑡 ,…,𝐮𝑡𝑒 −𝛥𝑡
{ [ ( )]} these two classes of IS methods will be discussed in a consistent theoret-

= min −𝐮𝑡 𝐮𝑡𝑠 𝛥𝑡 + EQ 𝜓 𝐱̌ 𝑡𝑠 +𝛥𝑡 , 𝑡𝑠 + 𝛥𝑡 ical framework for comparison purposes, notable contributions under
𝐮𝑡𝑠 𝑠
{ each class will be reviewed, and their key characteristics, advantages,
( )
⊺ and limitations will be highlighted.
= min − 𝐮𝑡 𝐮𝑡𝑠 𝛥𝑡 + 𝜓 𝐱̌ 𝑡𝑠 , 𝑡𝑠
𝐮𝑡𝑠 𝑠
[ ( ) ( ) ] ( )
+ 𝛥𝑡 𝐚 𝐱̌ 𝑡𝑠 , 𝑡𝑠 + 𝐁 𝐱̌ 𝑡𝑠 , 𝑡𝑠 𝐮𝑡𝑠 ⋅ ∇𝜓 𝐱̌ 𝑡𝑠 , 𝑡𝑠 3. Density approximation methods
( )
𝜕𝜓 𝐱̌ 𝑡𝑠 , 𝑡𝑠 Density approximation methods generally involve two main tasks,
+ 𝛥𝑡 defining the space of candidate densities ̂ ⊆  to approximate 𝑞𝐼𝑆 ∗ (𝐱),
𝜕𝑡 }
[ ( ) ( )] ( ) ( ) and finding a density 𝑞̂𝐼𝑆 ⋄ (𝐱) ∈  ̂ that closely approximates 𝑞 ∗ (𝐱). Re-
𝛥𝑡 𝐼𝑆
+ 𝐁 𝐱̌ 𝑡𝑠 , 𝑡𝑠 𝐁⊺ 𝐱̌ 𝑡𝑠 , 𝑡𝑠 ∶ ∇2 𝜓 𝐱̌ 𝑡𝑠 , 𝑡𝑠 +  𝛥𝜏 2 , call that  is the space of admissible IS densities. The first task restricts
2
the space of candidate densities that the learning algorithm is allowed
(20) ⋄ (𝐱), and the second task requires a learning algorithm
to search for 𝑞̂𝐼𝑆
( ) ⋄ (𝐱) as a function of training data 
{ (𝑚) }
where the second line follows by substituting 𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡𝑠 , … , 𝐮𝑡𝑒 −𝛥𝑡 to find 𝑞̂𝐼𝑆 𝑀 = 𝐱 ∶ 𝑚 = 1, … , 𝑀
sampled from 𝑞𝐼𝑆 ∗ (𝐱). Since ̂ ⊆ , it is implicit in the definition of ̂
with Eq. (19), separating its first term at time 𝑡𝑠 from the rest, and
( ) in the first task that 𝑞̂𝐼𝑆 (𝐱) > 0 for all 𝑞̂𝐼𝑆 (𝐱) ∈ ̂ and 𝐱 ∈ 𝛺𝐹 , unless
using EQ 𝛥𝐰̌ 𝜏 = 0; the third line follows from the definition
( of 𝜓 (⋅, ⋅);
) 𝐼𝛺𝐹 (𝐱) 𝑝 (𝐱) = 0. To clarify the role of each task, we decompose the
and, the last line follows from the Taylor expansion of 𝜓 𝐱̌ 𝑡𝑠 +𝛥𝑡 , 𝑡𝑠 + 𝛥𝑡
( ) incurred error in the learning process as
about 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , and substituting d𝐱̌ 𝑡 with Eq. (14). Letting 𝛥𝑡 → 0 in [ ∗ ⋄
] [ ∗ ∗
] [ ∗ ⋄
]
𝑑 𝑞𝐼𝑆 (𝐱) , 𝑞̂𝐼𝑆 (𝐱) ≤ 𝑑 𝑞𝐼𝑆 (𝐱) , 𝑞̂𝐼𝑆 (𝐱) + 𝑑 𝑞̂𝐼𝑆 (𝐱) , 𝑞̂𝐼𝑆 (𝐱) , (23)
Eq. (20) yields the Hamilton–Jacobi–Bellman equation
[ ] that follows from the triangle inequality [49], where 𝑑 (⋅, ⋅) is a measure
𝜕𝜓 1 ( ⊺)
+ min −𝐮⊺ 𝐮 + (𝐚 + 𝐁𝐮) ⋅ ∇𝜓 + 𝐁𝐁 ∶ ∇2 𝜓 = 0, (21) of the distance between two densities; and 𝑞̂𝐼𝑆 ̂ is the best possi-
∗ (𝐱) ∈ 
𝜕𝑡 𝐮 2
ble density in ̂ to approximate 𝑞𝐼𝑆 ∗ (𝐱) ∗ (𝐱)
(i.e., 𝑞̂𝐼𝑆
where ∇ and ∶ are the gradient and contraction operators. We dropped [ ∗ ]
= arg min𝑞̂ ∈̂ 𝑑 𝑞𝐼𝑆 (𝐱) , 𝑞̂𝐼𝑆 (𝐱) .) The first term on the right-hand
𝐼𝑆
the arguments and indices of functions in Eq. (21) for the brevity of side of Eq. (23), i.e., the distance between 𝑞𝐼𝑆 ∗ (𝐱) and 𝑞̂∗ (𝐱), is the
𝐼𝑆
notation. It can be shown that 𝐮{∗𝑡 that
[ (satisfies the
)]}Hamilton–Jacobi– approximation error that depends on the complexity of the search
Bellman equation subject to EQ exp 𝑣 𝐱̌ 𝑡𝑠 , 𝑡𝑠 , 𝐮𝑡 ≥ 𝑃𝐹2 (𝒯 ) will be space .̂ The second term on the right-hand side of Eq. (23), i.e., the

4
A. Tabandeh et al. Structural Safety 97 (2022) 102216

∗ (𝐱) and 𝑞̂⋄ (𝐱), is the estimation error that depends


distance between 𝑞̂𝐼𝑆 importance sampling is an effort to reduce the estimation error while
𝐼𝑆
on the learning algorithm, training data 𝑀 , and the complexity of managing the computational [ cost. In adaptive
] importance sampling,
̂ The overall performance of a learning algorithm depends on the
. the evaluation of E𝑝 𝐼𝛺𝐹 (𝐱) ln 𝑞̂𝐼𝑆 (𝐱, 𝜣) undergoes several rounds of
interplay between these two sources of error. In general, for a given refinement to draw sufficient samples from the failure domain 𝛺𝐹 such
learning algorithm and training data 𝑀 , expanding ̂ can reduce the that a convergence criterion is met (e.g., based on the CoV of the failure
approximation error at the cost of increasing the likelihood of larger probability estimate.) If the computational cost of evaluating the limit-
estimation errors. state function 𝑔 (𝐱) is 𝑐𝑔 , the total computational cost of estimating
Next, we discuss parametric and nonparametric methods to define 𝑃𝐹 based on this IS method will become (𝑀 + 𝑁) 𝑐𝑔 , where 𝑀 is the
̂ and to develop the learning algorithm. Parametric methods usually number of training data 𝑀 required ⋄
( ) to estimate 𝜣 and 𝑁 is the
define ̂ based on a finite mixture model, a convex combination ⋄
number of samples 𝑁 from 𝑞̂𝐼𝑆 𝐱, 𝜣 required to estimate 𝑃𝐹 based
of probability distributions from a parametric family. Alternatively, on Eq. (9). For the desired level of accuracy, the numbers of ( 𝑔 (𝐱))
the (joint) IS density 𝑞̂𝐼𝑆 (𝐱) in ̂ might be designed as 𝑞̂𝐼𝑆 (𝐱) = evaluations 𝑀 and 𝑁 are dependent, implying that a refined 𝑞̂𝐼𝑆 𝐱, 𝜣 ⋄
[ ( ) ( ) ] ( ) ( )
𝒯 𝑞̂𝐼𝑆 𝑥1 , … , 𝑞̂𝐼𝑆 𝑥𝑑 ; 𝝍 , where 𝑞̂𝐼𝑆 𝑥1 , … , 𝑞̂𝐼𝑆 𝑥𝑑 are paramet- can reduce the number of samples 𝑁 in Eq. (9), and vice versa.
ric marginal densities and 𝒯 (⋅) is a parametric function with a vector of The broader literature of the IS technique also includes several
parameters 𝝍 to model the dependence structure of 𝑞̂𝐼𝑆 (𝐱). Examples of variants of adaptive methods that are less explored in the context of
density approximation methods with this design are copula models [50] reliability analysis. Notable examples are Adaptive Multiple IS [55,61]
and generative models [51,52] based on neural networks [53]. Non- and Adaptive Population IS [62,63] methods. In the Adaptive Multiple
parametric methods, however, allow the functional form of densities in IS method, 𝑞̂𝐼𝑆 (𝐱, 𝜣) is a deterministic mixture model with a grow-
̂ to grow indefinitely with training data and, hence, significantly ex- ing number of components. ( ) The adaptive process starts with a single
pand the search space .̂ A notable example of nonparametric methods, parametric PDF 𝑞 [1] 𝐱, 𝜽[1] and the number of components grows with
discussed in this section, is the kernel density and its adaptive variants. each adaptive step.
) After 𝐽 steps, the ( IS density becomes 𝑞̂𝐼𝑆 (𝐱, 𝜣) =
∑𝐽 1 [𝑗] ( )
𝑗=1 𝐽 𝑞 𝐱, 𝜽[𝑗] , where each 𝑞 [𝑗] 𝐱, 𝜽[𝑗] generates the same fixed
3.1. Importance sampling using finite mixture models (say, 𝑀) number of samples (i.e., 𝑞̂𝐼𝑆 (𝐱, 𝜣) is a deterministic mixture

model.) At each Step 𝑗 ′ (∈ {1, …), 𝐽 }, the vector of parameters 𝜽[𝑗 ] of the
[𝑗 ′
] [𝑗′ ]
3.1.1. Formulation added component 𝑞 𝐱, 𝜽 is estimated using all the sampled data
[ ( )]
For parametric importance sampling methods (e.g., [10,54–59]), we up to Step 𝑗 ′ (i.e., evaluating E𝑝 𝐼𝛺𝐹 (𝐱) ln 𝑞 [𝑗 ] 𝐱, 𝜽[𝑗 ]
′ ′
in Eq. (26)
may
{ define a search space in terms of finite}mixture models as ̂ = ∑𝑗 ′ −1 1 [𝑗] ( )
∑ ( ) ( ) using the IS estimator in Eq. (9) with 𝑞̂𝐼𝑆 (𝐱) ≡ 𝑗=1 𝑗 ′ −1 𝑞 𝐱, 𝜽[𝑗] .)
𝑞̂𝐼𝑆 (𝐱, 𝜣) = 𝐾𝑘=1 𝑤𝑘 𝑞𝑘 𝐱, 𝜽𝑘 ∶ 𝜣 ∈▵𝐾 ×
𝐾 , where 𝜣 = 𝐰, 𝜽
1∶𝐾
The Adaptive Population IS method follows ) a similar idea with the
are
{ model parameters; } 𝐾 is the vector of mixture weights;
𝐰 ∈▵ ▵ = ∑𝐾 1 (
∑ ( ) 𝐾 difference that 𝑞̂𝐼𝑆 (𝐱, 𝜣) = 𝑞 𝐱, 𝜽 𝑘 is a deterministic mix-
𝐰 ∈ [0, 1]𝐾 ∶ 𝐾 𝑘=1 𝑤𝑘 = 1 is a probability simplex; 𝑞𝑘 𝐱, 𝜽𝑘 is a 𝑘=1 𝐾 𝑘
ture model with a given number of components 𝐾 that is not evolv-
mixture component, a parametric PDF with parameters 𝜽𝑘 ∈ ; 
ing with the adaptive steps. As before, the deterministic (part )im-
denotes the space of all possible values of 𝜽𝑘 ; and 𝐾 is the prescribed
( ) plies that the number of samples from each component 𝑞𝑘 𝐱, 𝜽𝑘 is
number of mixture components. The selected 𝐾 and form of 𝑞𝑘 𝐱, 𝜽𝑘
fixed. At adaptive Step 𝑗 ∈ {1, … , 𝐽 }, we( can update ) the estimate
control the approximation error.
of 𝜽𝑘 using the sampled data from 𝑞𝑘[𝑗−1] 𝐱, 𝜽[𝑗−1] (i.e., evaluating
We then need to equip ̂ with a distance measure, required to define [ ( )]
𝑘

convergence. A typical distance measure is the relative entropy (also E𝑝 𝐼𝛺𝐹 (𝐱) ln 𝑞𝑘 𝐱, 𝜽𝑘 in Eq. (26) using the IS estimator in Eq. (9) with
( )
known as the Kullback–Leibler (KL) divergence) [60], which is defined 𝑞̂𝐼𝑆 (𝐱) ≡ 𝑞𝑘[𝑗−1] 𝐱, 𝜽[𝑗−1] .) The 𝐾 components of the mixture model are
𝑘
for two densities 𝑞1 (𝐱) and 𝑞2 (𝐱) as treated as separate IS densities in the adaptive process; however,
∑ ( )the
[ ] [ ] mixture model in its complete form (i.e., 𝑞̂𝐼𝑆 (𝐱) ≡ 𝐾 1
𝑞𝑘 𝐱, 𝜽𝑘 ) is
𝐷𝐾𝐿 𝑞1 (𝐱) ∥ 𝑞2 (𝐱) = 𝑞1 (𝐱) ln 𝑞1 (𝐱) − ln 𝑞2 (𝐱) d𝐱, (24) 𝑘=1 𝐾
∫𝒳 used in Eq. (9) to estimate the failure probability. A review of these
where 𝐷𝐾𝐿 (⋅ ∥ ⋅) is the relative entropy or KL divergence. Strictly methods and their performances in the context of signal processing
speaking, 𝐷𝐾𝐿 is not a proper distance measure since it is not symmetric can be found in [64,65]. The consistency of the adaptive IS methods
[ ] [ ] when 𝑁, 𝑀 → ∞ with a fixed number of adaptive steps 𝐽 follows from
(i.e., 𝐷𝐾𝐿 𝑞1 (𝐱) ∥ 𝑞2 (𝐱) ≠ 𝐷𝐾𝐿 𝑞2 (𝐱) ∥ 𝑞1 (𝐱) ) and does not satisfy the
triangle inequality. Nonetheless, 𝐷𝐾𝐿 provides useful information to the same argument discussed in Section 2.2. However, evaluating the
∗ (𝐱, 𝜣) ∈ .
find the optimal IS density 𝑞̂𝐼𝑆 ̂ Finding 𝑞̂∗ (𝐱, 𝜣) ∈ ̂ then consistency of the estimator when 𝐽 → ∞ with fixed 𝑁 and 𝑀 is more
𝐼𝑆
nuanced since the estimator at each adaptive step is usually biased (see
amounts to solving the following optimization problem:
[ ∗ ] the discussion in [60,63].)
𝜣 ∗ = arg min 𝐷𝐾𝐿 𝑞𝐼𝑆 (𝐱) ∥ 𝑞̂𝐼𝑆 (𝐱, 𝜣) ∗
𝜣∈▵𝐾 × 𝐾 (For dynamical)systems, one idea is to approximate 𝐮𝑡 by replacing
(25) 𝑃𝐹 𝐱𝑡𝑖 = 𝐱̌ 𝑡 , 𝑡𝑖 = 𝑡 in Eq. (22) with the 𝑃𝐹 (⋅) of an equivalent linear

= arg max 𝑞𝐼𝑆 (𝐱) ln 𝑞̂𝐼𝑆 (𝐱, 𝜣) d𝐱, system [66,67]. This method can yield reasonable approximations for
𝜣∈▵𝐾 × 𝐾 ∫𝒳
mildly nonlinear systems, but the results become unsatisfactory with
where the transition to the second line follows from the definition of the increasing level of nonlinearity [68]. A viable solution to this
∗ (𝐱) ∝
𝐷𝐾𝐿 and retaining only the terms that involve 𝜣. Substituting 𝑞𝐼𝑆 limitation is based on the work of Tanaka [5], where 𝐮∗𝑡 is approximated
𝐼𝛺𝐹 (𝐱) 𝑝 (𝐱) from Eq. (11) into Eq. (25) then yields by a design-point excitation — the most likely sample path of 𝐰̇ 𝑡
(𝐰̇ 𝑡 d𝑡 = d𝐰𝑡 ) in Eq. (5) out-crossing the failure boundary at a given
𝜣 ∗ = arg max 𝐼𝛺𝐹 (𝐱) ln 𝑞̂𝐼𝑆 (𝐱, 𝜣) 𝑝 (𝐱) d𝐱 time 𝑡 = 𝑡𝑘 . The idea is first to cast the problem of approximating 𝐮∗𝑡
𝜣∈▵𝐾 × 𝐾 ∫𝒳
[ ] (26) as a time-invariant reliability analysis and then use FORM to find the
= arg max E𝑝 𝐼𝛺𝐹 (𝐱) ln 𝑞̂𝐼𝑆 (𝐱, 𝜣) , design-point excitation. The time-invariant reliability analysis is based
𝜣∈▵𝐾 × 𝐾
[ ] on the discrete representation of d𝐰𝑡 in Eq. (5) in terms of a finite set
where E𝑝 𝐼𝛺𝐹 (𝐱) ln 𝑞̂𝐼𝑆 (𝐱, 𝜣) is estimated based on the training data of random vectors, leading to the discrete governing equation
𝑀 , using the Monte Carlo estimator (c.f. Eq. (2)) or, more effi- ( ) ( )
𝐱𝜏+𝛥𝑡 = 𝐱𝜏 + 𝐚 𝐱𝜏 , 𝜏 𝛥𝑡 + 𝐁 𝐱𝜏 , 𝜏 𝛥𝐰𝜏 , (27)
ciently, the importance sampling estimator (c.f. Eq.
[ (9)). The selected] √
estimator, number of samples, landscape of E𝑝 𝐼𝛺𝐹 (𝐱) ln 𝑞̂𝐼𝑆 (𝐱, 𝜣) , for 𝜏 = 𝑡𝑠 , … , 𝑡𝑒 − 𝛥𝑡, with the initial condition 𝐱𝑡𝑠 , and 𝛥𝐰𝜏 = 2𝜋𝛥𝑡𝐳𝜏 ,
and optimization algorithm collectively control the estimation error; where 𝐳𝜏 is the vector of independent and identically distributed stan-
hence, the optimization problem may yield 𝜣 ⋄ instead of 𝜣 ∗ . Adaptive dard Gaussian random variables at time 𝜏. The time-integration in

5
A. Tabandeh et al. Structural Safety 97 (2022) 102216

Eq. (27) is the Euler–Maruyama scheme. A good introduction to differ- 3.1.2. Characteristics, challenges, and recent developments
ent time-integration schemes for stochastic differential equations can A common finite mixture model is the Gaussian mixture, where
( )
be found in [69]. the mixture components are Gaussian distributions. i.e., 𝑞𝑘 𝐱, 𝜽𝑘 ≡
( ) 𝑑
The design-point excitation at time 𝑡 = 𝑡𝑘 is the result of the 𝑘 𝐱, 𝝁𝑘 , 𝜮 𝑘 , where 𝝁𝑘 ∈ R is the component mean vector, and
𝜮 𝑘 ∈ R𝑑×𝑑 is the component covariance matrix. Gaussian mixture
following time-invariant reliability analysis [70,71]: ( )
with a free parameter 𝐾 is dense in 𝐿2 𝛺𝐹 (e.g., [73]), the space
⎧ ⎫ of square-integrable densities; thus, by letting 𝐾 grow indefinitely, the
( ) ⎪ ∑ ⊺ [ ( )] ⎪ ∗ (𝐱) as close as

𝐳 𝑡𝑘 = arg min ⎨ 𝐳𝜏 𝐳𝜏 ∶ 𝑔 𝐳 𝑡𝑘 ≤ 0⎬ , (28) Gaussian mixture model can in principle approach 𝑞𝐼𝑆
⎪𝜏∈{𝑡𝑠 ,…,𝑡𝑒 −𝛥𝑡} ⎪ desired (i.e., it reduces the approximation error.) However, the number
⎩ ⎭ of unknown model parameters rapidly increases with 𝐾, as does the
[( ) ]
( ) likelihood of a larger estimation error, unless the number of training
that is solved using FORM, where 𝐳∗ 𝑡𝑘 = vec 𝐳𝑡∗ , … , 𝐳𝑡∗ −𝛥𝑡 is a data 𝑀 increases accordingly. The number of unknown model param-
𝑠 𝑒 𝑡=𝑡𝑘
column vector [with( operator vec)] (⋅) stacking columns into a vector, and eters also rapidly increases with the dimensionality of the uncertain
[ ( )]
𝑔 𝐳 𝑡𝑘 = 𝑔 𝐱𝑡𝑘 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 . The design-point excitation yields state variables (a Gaussian mixture model with 𝐾 components for
( ) √ ∑ state variables 𝐱 of dimension 𝑑 includes (𝐾 − 1) + 𝐾𝑑 + 𝐾𝑑 (𝑑 + 1) ∕2
the control 𝐮̂ ∗𝑡 𝑡𝑘 = 2𝜋∕𝛥𝑡 𝜏∈{𝑡𝑠 ,…,𝑡𝑒 −𝛥𝑡} 𝐳𝜏∗ 𝛿 (𝑡 − 𝜏). Substituting
( ) unknown model parameters in 𝐰, 𝝁1∶𝐾 , and 𝜮 1∶𝐾 .) The rapid growth

𝐮̂ 𝑡 𝑡𝑘 into Eq. (14), we can find the response of the system under the ( )
( ) rate of parameters  𝑘𝑑 2 (also known as the curse of dimensionality)
effect of 𝐮̂ ∗𝑡 𝑡𝑘 as
of Gaussian mixture models limits their use to estimate small failure
[ ( ) ( ) ( )] ( ) probabilities for high-dimensional problems (i.e., 𝑑 ≳ 100.)
𝐱̌ 𝜏+𝛥𝑡 = 𝐱̌ 𝜏 + 𝐚 𝐱̌ 𝜏 , 𝜏 + 𝐁 𝐱̌ 𝜏 , 𝜏 𝐮̂ ∗𝑡 𝑡𝑘 𝛥𝑡 + 𝐁 𝐱̌ 𝜏 , 𝜏 𝛥𝐰̌ 𝜏 , (29)
The current literature includes several efforts to overcome the curse
for 𝜏 = 𝑡𝑠 , … , 𝑡𝑒 − 𝛥𝑡. The first-passage failure event is not restricted to of dimensionality problem while maintaining the general flexibility of
finite mixture models in approximating 𝑞𝐼𝑆 ∗ (𝐱). These efforts exploit
a single out-crossing event at the specific time 𝑡 = 𝑡𝑘 . Instead, we need
( ) [ ] ̂ In
to find 𝐮̂ ∗𝑡 𝑡𝑘 for all 𝑡𝑘 ∈ 𝑡𝑠 , 𝑡𝑒 . The out-crossing times 𝑡𝑘 and their the specific properties of 𝛺𝐹 in high-dimensional space to define 𝑄.
( ) high-dimensional standard Gaussian probability space, the
associated controls 𝐮̂ ∗𝑡 𝑡𝑘 are not equally likely, but have a probability √ probability
distribution [38]. Following Bucher [72], we can express the first-order density concentrates about the surface of a ball of radius 𝑑 (e.g., [74,
75]). Leveraging this property, Wang and Song [58] proposed a von
approximation of the out-crossing time’s PDF as:
[ ( )] Mises–Fisher mixture model for importance sampling whose samples
( ) 𝛷 −𝛽 𝑡𝑘 d𝜏 are on the surface of a ball in R𝑑 . This method reduces the growth
𝑤𝑘 = 𝑓 𝑡𝑘 = 𝑡 , ( )
∫𝑡 𝑒 𝛷 [−𝛽 (𝜏)] d𝜏 rate of parameters from  𝑘𝑑 2 in Gaussian mixtures to  (𝑘𝑑); though,
𝑠
[ ∗ ( )] (30) it requires a transformation from the original probability space to the
( ) ∇𝐳(𝑡𝑘 ) 𝑔 𝐳 𝑡𝑘

( ) standard Gaussian probability space. The von Mises–Fisher mixture
𝛽 𝑡𝑘 = − [ ( )]‖ 𝐳 𝑡 ,
‖ 𝑘
model is exclusively designed for high-dimensional importance sam-
‖∇𝐳(𝑡𝑘 ) 𝑔 𝐳∗ 𝑡𝑘 ‖
‖ ‖ pling since the required density concentration property does not hold
( )
where 𝛽 𝑡𝑘 is the reliability index of the out-crossing event at time in low-dimensional space. Alternatively, the variants of dimensionality
| ( )| reduction techniques (e.g., [76–79]) can be used to first identify a
𝑡 = 𝑡 (the magnitude of the reliability index simplifies to |𝛽 𝑡𝑘 | =
√ 𝑘 | |
𝑡 ∗⊺ low-dimensional subspace of 𝒳 in which the training data 𝑀 re-
∫𝑡 𝑒 𝐳𝜏 𝐳𝜏∗ d𝜏), and 𝛷 (⋅) is the standard Normal (Gaussian) probability
𝑠 { } side, and then develop the finite mixture model for the projected
distribution function. Letting 𝑡𝑘 ∈ 𝑡𝑠 , … , 𝑡𝑒 − 𝛥𝑡 , we replace the inte- 𝑀 on the identified low-dimensional subspace. The presented IS
∑ [ ( )] ( )
grand in the definition of 𝑤𝑘 with 𝑡𝑘 ∈{𝑡𝑠 ,…,𝑡𝑒 −𝛥𝑡} 𝛷 −𝛽 𝑡𝑘 𝛿 𝜏 − 𝑡𝑘 , method for dynamical systems also offers a solution for academic type
leading to a probability mass function for the out-crossing time. high-dimensional problems (i.e., idealized dynamical systems driven
Accordingly, we can express the IS estimate of the failure probabil- by stochastic processes.) The computational cost of finding design-
ity as point excitations for many out-crossing times is a significant challenge
( ) to implement this IS method for more realistic nonlinear dynamical
( ) 𝑝 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 systems. There have been some recent efforts to facilitate the computa-
𝑃𝐹 = 𝐼 𝐱̌ ( ) tion of design-point excitations (see, for example, [40].) However, the
∫R𝑑×𝐾 𝛺𝐹 𝑡 (31)
𝑞𝐼𝑆 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 controls defined based on the design-point excitation generally carry
( )
the limitations of FORM, discussed earlier. Other methods of designing
× 𝑞𝐼𝑆 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 d𝐳𝑡𝑠 … d𝐳𝑡𝑒 −𝛥𝑡 ,
such (stochastic) controls require an advanced level of knowledge on
( ) ∏ ∏ ( ) the stochastic control theory, making any possible solutions even less
where 𝑝 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 = 𝜏∈{𝑡𝑠 ,…,𝑡𝑒 −𝛥𝑡} 𝑑𝑖=1 𝜙 𝑧𝑖,𝜏 is the base joint
convenient.
PDF, ( in which ) 𝜙 (⋅) is the standard
( Normal ) (Gaussian) PDF; and

𝑞𝐼𝑆 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 = 𝐾 𝑘=1 𝑤𝑘 𝑞𝑘 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 is the IS density, a mix- 3.2. Importance sampling using kernel densities
ture of standard Normal (Gaussian) densities ( centered
) at∏design-points
{ ∗( ) }
𝐳 𝑡𝑘 ∶ 𝑘 = 1, … , 𝐾 , in which 𝑞𝑘 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 = 𝜏∈{𝑡𝑠 ,…,𝑡𝑒 −𝛥𝑡} 3.2.1. Formulation
∏𝑑 ( )
∗ with the bijective mapping {1, … , 𝐾} ↔ For nonparametric importance sampling methods (e.g.,
𝑖=1 𝜙 𝑧𝑖,𝜏 − 𝑧𝑖,𝜏 ; 𝑡𝑘 {
{ } [8,9,80–83]), we may define the search space as ̂ = 𝑞̂𝐼𝑆} (𝐱, 𝜃) = 1∕
𝑡𝑠 , … , 𝑡𝑒 − 𝛥𝑡 . ( ) ∑ [( ) ]
𝑀 (𝑚) ∕𝜃 ∶ 𝐾 (𝐱) ≥ 0, ∫ 𝐾 (𝐱) d𝐱 = 1 , where 𝐾
We can retrieve the discrete counterpart of the Radon–Nikodym 𝑀𝜃 𝑑 𝑚=1 𝐾𝑞 𝐱 − 𝐱 𝑞 𝒳 𝑞 𝑞
( ) is a kernel density, a non-negative function that integrates to one; 𝜃 > 0
derivative in Eq. (15) by setting 𝐮̂ ∗𝑡 ≡ 𝐮̂ ∗𝑡 𝑡𝑘 and 𝑞𝐼𝑆 (⋅) ≡ 𝑞𝑘 (⋅) in
is a model parameter, called bandwidth. A common kernel density is
Eq. (31) [38]. Each sample from 𝑞𝐼𝑆 (⋅) yields a particular control 𝐮̂ ∗𝑡 = ( )
( ) based on the Gaussian kernel, defined as 𝐾𝐺 (𝐱) ∝ exp −1∕2𝐱𝜮 −1 𝐱⊺ ,
𝐮̂ ∗𝑡 𝑡𝑘 with probability 𝑤𝑘 . Associated with the selected control is a
where 𝜮 can be taken as the covariance matrix of the sampled data 𝑀
sample path of 𝐱̌ 𝑡 , obtained
( from the
) numerical solution of Eq. (29). to capture the spread of data in different directions. We may use 𝐿1 -
For each realization 𝐳𝑡𝑠 , … , 𝐳𝑡𝑒 −𝛥𝑡 ∼ 𝑞𝐼𝑆 , finding the sample path norm to measure the distance between 𝑞̂𝐼𝑆 (𝐱, 𝜃) ∈ ̂ and 𝑞𝐼𝑆∗ (𝐱) for the

𝐱̌ 𝑡(𝑛) amounts to solving a first-order deterministic initial value problem. given 𝑀 . We gain insight into the overall performance of the kernel
( ) ( )
Given a set of sampled pairs 𝐮̂ ∗𝑡 𝑡𝑘 and 𝐱̌ 𝑡(𝑛) , we can estimate the failure density estimator by evaluating the behavior of its 𝐿1 -error  𝑀 =
‖ ∗ (𝐱)‖
probability using the IS estimator in Eq. (16) (or, equivalently, Eq. (9).) ‖𝑞̂𝐼𝑆 (𝐱, 𝜃) − 𝑞𝐼𝑆 ‖ 1 (i.e., error defined based on the 𝐿1 -norm.) As
‖ ‖𝐿

6
A. Tabandeh et al. Structural Safety 97 (2022) 102216

( ) ∗ (𝐱). It can
before,  𝑀 is( a random ) function of samples 𝑀 ∼ 𝑞𝐼𝑆 together with the sensitivity factor 𝛼 (i.e., the unknown model parame-
( )
be shown that  𝑀 has a bounded difference (i.e., if we change ters in the optimization problem in Eq. (34) become 𝜣 = 𝛼, 𝜃1 , … , 𝜃𝑑 .)
the value of any 𝐱(𝑚) in the definition ( of )𝑞̂𝐼𝑆 (𝐱, 𝜃) while keeping all The broader literature includes more variants of adaptive methods for
the others fixed, then the value of  𝑀 will not change by more nonparametric IS density that are not explored in the context of reli-
than
( (a finite value); accordingly, ) the McDiarmid’s inequality yields ability analysis. A notable example is the Standard Population Monte
| ) [ ( )]| ( )
P | 𝑀 − E𝑞 ∗  𝑀 | > 𝜀 ≤ 2 exp −𝑀𝜀2 ∕2 [84]. This result Carlo method [87] that caused a resurgence of interest in adaptive IS
| |
explains how quickly the 𝐿1 -error decays with the number of training methods after earlier contributions by, for example, Oh and Berger [48]
data 𝑀 . (see [64,65] for more background.) In this adaptive method, 𝑞̂𝐼𝑆 (𝐱, 𝜣)
The task of finding 𝑞̂𝐼𝑆 ∗ (𝐱, 𝜃) ∈  ̂ amounts to finding 𝜃 ∗ such is a deterministic mixture with 𝑀 components, where each 𝐱(𝑚) ∈ 𝑀
( )

that a discrepancy measure between 𝑞̂𝐼𝑆 (𝐱, 𝜃) and 𝑞𝐼𝑆 ∗ (𝐱) is minimized. is drawn from a specific density 𝑞̂𝑚 𝐱, 𝜽𝑚 . The mixture components
Examples of such discrepancy measures include the KL divergence and can all belong to a given family of probability distributions (e.g., all
‖ ∗ (𝐱)‖ . A common discrepancy measure is the 𝐿2 -error,
‖𝑞̂𝐼𝑆 (𝐱, 𝜃) − 𝑞𝐼𝑆 ‖ 1 Gaussian) with different parameters, or each component can have its
( )
‖ ‖𝐿
which yields own parametric form. In updating the estimates of 𝜣 = 𝜽1 , … , 𝜽𝑀 ,
‖ ‖ the adaptive process involves a re-sampling step that aims to address
𝜃 ∗ = arg min ‖𝑞̂𝐼𝑆 (𝐱, 𝜃) − 𝑞𝐼𝑆 ∗
(𝐱)‖ 2
𝜃∈R+ ‖ ‖𝐿 the degeneracy problem arising due to samples with small IS weights
(32) ( ) ( )
| ∗ |2 (i.e., samples 𝐱(𝑚) with 𝑝 𝐱(𝑚) ∕𝑞̂𝑚 𝐱(𝑚) , 𝜽𝑚 ≪ 1.) The interested
= arg min |𝑞̂𝐼𝑆 (𝐱, 𝜃) − 𝑞𝐼𝑆 (𝐱)| d𝐱,
𝜃∈R+ ∫𝒳 | | reader is referred to [47] (Chapter 5) for the schematic illustration of
the degeneracy problem and the solution offered by the re-sampling
where R+ indicates positive real numbers. The 𝐿2 -error in the above
step in the context of particle filtering.
optimization problem assigns equal weights to small and large differ-
ences between 𝑞̂𝐼𝑆 (𝐱, 𝜃) and 𝑞𝐼𝑆∗ (𝐱), leading to poor tail estimate of
3.2.2. Characteristics, challenges, and recent developments
𝑞̂𝐼𝑆 (𝐱, 𝜃). However, the accuracy of the tail estimate is the most relevant
to failure probability. Introducing a weighting function can help avoid It is known that the adaptive kernel density in Eq. (36) may suffer
the dominance of large error values in regions closer to the density from the curse of dimensionality, where the number of required train-
mode(s). Accordingly, we may rewrite the optimization problem as ing data 𝑀 rapidly increases with the number of dimensions 𝑑. One
idea to overcome the curse of dimensionality is to focus on the most
1 | |2
𝜃 ∗ = arg min ∗
|𝑞̂ (𝐱, 𝜃) − 𝑞𝐼𝑆 (𝐱)| d𝐱, (33) influential or important random variables and develop the IS density
( )
𝜃∈R+ ∫𝒳 𝑞̂𝐼𝑆 (𝐱, 𝜃) | 𝐼𝑆 |
for those random variables (i.e., focus on optimizing 𝑞̂𝐼𝑆 𝐱𝑢 , 𝜣 , where
where the weighting function 1∕𝑞̂𝐼𝑆 (𝐱, 𝜃) captures the importance of the 𝐱𝑢 is a sub-vector of 𝐱 indexed by 𝑢 ⊆ {1, … , 𝑑}.) Jia and Taflanidis
tail estimate by inflating small error values in the tail. [88] proposed a probabilistic sensitivity analysis method to identify 𝐱𝑢 ,
Minimizing the weighted 𝐿2 -error in Eq. (33) is equivalent to mini- a sub-vector of random variables that contribute the most to the failure
mizing the variance of the failure probability estimator in Eq. (7) using probability estimate.
𝑞̂𝐼𝑆 (𝐱, 𝜃) as the IS density (see [85] for the proof.) Substituting Eq. (8) Another idea to overcome the curse of dimensionality is to de-
into Eq. (7) and using the estimator of E𝑞 [⋅] (c.f. Eq. (6)) yields [8] velop efficient sampling algorithms for generating 𝑀 . The adaptive IS
[ ( ) ] method in [9] uses the Markov Chain Monte Carlo (MCMC) simulation
1 ∑
𝑀
∗ 𝑃𝐹 𝑝 𝐱(𝑚)
𝜃 = arg min ( ) − 𝑃𝐹 , (34) algorithm to generate 𝑀 , which is an improvement with respect to
𝜃∈R+ 𝑁 𝑀 𝑚=1 𝑞̂𝐼𝑆,∼𝑚 𝐱(𝑚) , 𝜃
earlier contributions that used the standard MC simulation (e.g., [8]).
( ) [ ] ∑𝑀 [( (𝑛) ) ]
where 𝑞̂𝐼𝑆,∼𝑛 𝐱(𝑛) , 𝜃 = 1∕ (𝑀 − 1) 𝜃 𝑑 − 𝐱(𝑚) ∕𝜃 . However, the MCMC algorithm (with a low acceptance rate) may gen-
𝑚=1,𝑚≠𝑛 𝐾𝑞 𝐱
The above expression can be further simplified, retaining only the erate many repeated samples, leading to a reduced number of distinct
terms that involve the unknown parameter 𝜃. The selected discrepancy samples for kernel density estimation. For this, the idea that Jia et al.
measure, estimators, number of samples, and optimization algorithm [89] proposed can be applied, where the adaptive kernel density with
collectively control the estimation error; hence, the optimization prob- an accept-reject simulation algorithm is used to efficiently generate
lem may yield 𝜃 ⋄ instead of 𝜃 ∗ . The number of training data 𝑀 and 𝑀 , rather than using the kernel density only for the failure probability
number of samples 𝑁 from 𝑞̂𝐼𝑆 ⋄ (𝐱, 𝜃) = 𝑞̂ (𝐱, 𝜃 ⋄ ) to estimate 𝑃 in estimation. Recent algorithmic advances like in Metropolis adjusted
𝐼𝑆 𝐹
Eq. (6) can be increased if the estimator’s variance from Eq. (7) exceeds Langevin diffusion algorithm (e.g., [90,91]) and Hamiltonian Monte
the desired acceptable level. As before, the total computational cost of Carlo (e.g., [92,93]) made it possible to improve the acceptance rate of
estimating 𝑃𝐹 based on this IS method is (𝑀 + 𝑁) 𝑐𝑔 . MCMC algorithm and accelerate the exploration of the failure domain
A large value of 𝜃 ⋄ may lead to over-smoothed 𝑞̂𝐼𝑆 (𝐱, 𝜃 ⋄ ), whereas ∗ (𝐱) is transformed
𝛺𝐹 . In these algorithms, the task of sampling from 𝑞𝐼𝑆
a small value of 𝜃 ⋄ may lead to spurious noises in the tail of 𝑞̂𝐼𝑆 (𝐱, 𝜃 ⋄ ) into simulating from a dynamical system, governed by an over-damped
due to sparse samples [9]. Adaptive kernel density estimate [8,9] is Langevin equation, whose response is a Markov process with 𝑞𝐼𝑆 ∗ (𝐱) as

an effort to address these issues by adjusting the bandwidth according the density of its stationary distribution (e.g., [83].)
to the samples’ sparsity. The strategy is to introduce a local correction
factor 𝜆𝑛 for 𝜃 associated with 𝐱(𝑛) ∈ 𝑀 as [8] 4. Limit-state approximation methods
[ ( (𝑚) )]1∕𝑀 ⎫𝛼
⎧ ∏𝑀
⎪ 𝑚=1 𝑝 𝐱 ⎪ 4.1. Formulation
𝜆𝑛 = ⎨ ( ) ⎬ , (35)
⎪ 𝑝 𝐱(𝑛) ⎪
⎩ ⎭ The objective of limit-state approximation methods (e.g., [11,24,
where 𝛼 ∈ [0, 1] is the sensitivity factor, for which a value of 𝛼 = 0.5 28,94–97]) is to develop a fast-to-evaluate surrogate 𝑔̂ ⋄ (𝐱) that closely
is suggested based on experience [86]. Setting 𝛼 = 0 or equivalently approximates the computationally demanding 𝑔 (𝐱). We can then write
𝜆𝑛 = 1 leads to the fixed-bandwidth kernel density. We then write the ⋄ (𝐱) ∝ 𝐼 ̂⋄ ⋄
the IS density as 𝑞̂𝐼𝑆 𝛺̂ 𝐹⋄ (𝐱) 𝑝 (𝐱), where 𝛺𝐹 = {𝐱 ∈ 𝒳 ∶ 𝑔̂ (𝐱) ≤ 0}
general form of the adaptive kernel densities in the search space ̂ as is the approximated failure domain. Since ̂ ⊆ , the surrogate 𝑔̂ ⋄ (𝐱)
( ) of a well-defined IS density 𝑞̂⋄ (𝐱) should satisfy 𝛺̂ ⋄ ⊇ 𝛺𝐹 . The
1 ∑
𝑀
1 𝐱 − 𝐱(𝑚) 𝐼𝑆 𝐹
𝑞̂𝐼𝑆 (𝐱, 𝜃) = ( )𝑑
𝐾𝑞 . (36) development of 𝑔̂ ⋄ (𝐱) involves two main tasks, defining the space of
𝑀 𝑚=1 𝜆 𝜃 𝜆𝑚 𝜃
𝑚 candidate surrogates ̂ to approximate 𝑔 (𝐱), and finding a surrogate
The adaptive method 𝑔̂ ⋄ (𝐱) ∈ ̂ that closely approximates 𝑔 (𝐱). The first task restricts the
( ) might be further refined to consider separate
bandwidths 𝜃1 , … , 𝜃𝑑 for different coordinates of 𝐱 and optimize them space of candidate surrogates that the learning algorithm is allowed to

7
A. Tabandeh et al. Structural Safety 97 (2022) 102216

[ ( )]⊺
search for 𝑔̂ ⋄ (𝐱), and the second task requires a learning algorithm to 1, … , 𝑀; 𝐠 = 𝑔 𝐱(𝑚) is the column vector of limit-state functions
[ ( )]⊺
find 𝑔̂ ⋄ (𝐱). We can bound the total error ‖𝑔̂ ⋄ (𝐱) − 𝑔 (𝐱)‖𝐿2 (𝑝) as evaluated at training data 𝑀 ; 𝐇 = 𝐡 𝐱(𝑚) is the matrix that collects
the basis functions values evaluated at training data 𝑀 ; and 𝐑 =
‖𝑔̂ ⋄ (𝐱) − 𝑔 (𝐱)‖ 2 ≤ ‖𝑔̂ ⋄ (𝐱) − 𝑔̂ ∗ (𝐱)‖ 2 + ‖𝑔̂ ∗ (𝐱) − 𝑔 (𝐱)‖ 2 , (37)
‖ ‖𝐿 (𝑝) ‖ ‖𝐿 (𝑝) ‖ ‖𝐿 (𝑝) 𝐡 (𝐱) − 𝐇𝐊−1 𝐤⊺ (𝐱). It can also be shown that the maximum likelihood
√ ( )−1
estimate of the unknown model parameters is 𝜽⋄ = 𝐇𝐊−1 𝐇⊺ 𝐇𝐊−1 𝐠
where ‖⋅‖𝐿2 (𝑝) = ∫ |⋅|2 𝑝 (𝐱) d𝐱 is the 𝐿2 (𝑝)-norm; and 𝑔̂ ∗ (𝐱) ∈ ̂ (e.g., [99]).
( )
is the surrogate with the minimum 𝐿2 (𝑝)-error in ̂ (i.e., 𝑔̂ ∗ (𝐱) = The covariance function 𝐾 𝐱, 𝐱′ of the Gaussian process also in-
arg min𝑔(𝐱)∈
̂ ̂ ‖𝑔̂ (𝐱) − 𝑔 (𝐱)‖𝐿2 (𝑝) .) The overall performance of a learn- cludes some unknown hyperparameters that need to be estimated.
ing algorithm depends on the interplay between the estimation error Hence, the results presented so far are conditioned on the estimates
‖𝑔̂ ⋄ (𝐱) − 𝑔̂ ∗ (𝐱)‖𝐿2 (𝑝) and the approximation error ‖𝑔̂ ∗ (𝐱) − 𝑔 (𝐱)‖𝐿2 (𝑝) . of these hyperparameters. It is the estimation of these hyperparame-
We further explain the error analysis through a general regression ters that controls the estimation error and may cause achieving 𝑔̂ ⋄ (𝐱)
{ }
problem. Let ̂ = 𝑔̂ (𝐱) ∈ 𝐾 (𝒳 ) ∶ ‖𝑔̂ (𝐱)‖𝐾 ≤ 𝜌 be the search space, instead of 𝑔̂ ∗ (𝐱). The estimation of the unknown model parameters
where 𝐾 (𝒳 ) is a reproducing kernel Hilbert space, equipped with a often entails an iterative two-step process between the estimation of
Mercer kernel 𝐾 ∶ 𝒳 × 𝒳 → R (e.g., [98]); and 𝜌 is the regularization ( )
𝜽⋄ and the hyperparameters in 𝐾 𝐱, 𝐱′ until a convergence criterion
constant, with a similar role as a prior distribution in Bayesian infer- is met. We finally remark that the mean function 𝝁⋄ (𝐱) of the esti-
ence (e.g., [99]). For every function 𝑔̂ (𝐱) ∈ 𝐾 (𝒳 ), there exists some
∑ ( ) mated Gaussian process in Eq. (36) resembles typical functions in a
𝑠 ∈ N, 𝜉1 , … , 𝜉𝑠 ∈ R, and 𝐱1 , … , 𝐱𝑠 ∈ 𝒳 such that 𝑔̂ (𝐱) = 𝑠𝑖=1 𝜉𝑖 𝐾 𝐱𝑖 , 𝐱 reproducing kernel Hilbert space; to help see the analogy, we rewrite
(e.g., [84].) We can then express the kernel-induced norm as ‖𝑔̂ (𝐱)‖𝐾 = ∑ ( (𝑚) )
√ ( ) the mean function as 𝜇 ⋄ (𝐱) = 𝐡⊺ (𝐱) 𝜽⋄ + 𝑀 𝑚=1 𝜉𝑚 𝐾 𝐱, 𝐱 , where
∑𝑠 −1
( ⊺ ⋄
)
̂ 𝝃=𝐊 𝐠−𝐇 𝜽 .
𝑖 ,𝑖 =1 𝜉𝑖1 𝜉𝑖2 𝐾 𝐱𝑖1 , 𝐱𝑖2 . The defined search space  is general and
1 2
closely related to the search space of the Gaussian process regression Predictions based on 𝑔̂ ⋄ (𝐱) should ideally consider the statistical
(see, for example, [99,100].) We also assume that 𝑔̂ (𝐱) is bounded with uncertainty arising from the estimates of model parameters and model
probability one (i.e., |𝑔̂ (𝐱)| ≤ 𝐵 for some constant 𝐵 < ∞.) Using the uncertainty captured by 𝝈 ⋄ (𝐱). The current methods in the litera-
least-square algorithm, we obtain 𝑔̂ ⋄ (𝐱) from ture, however, typically use a point estimate of the model parameters
(e.g., maximum a posterior estimate) and set 𝑔̂ ⋄ (𝐱) = 𝜇 ⋄ (𝐱). Though,
1 ∑ | ( (𝑚) )
𝑀
( )|2 the approach in [24] uses the predictive estimate of 𝐼𝛺̂ ⋄ (𝐱) in the
\[
\hat{g}^{\diamond}(\mathbf{x}) = \arg\min_{\hat{g}(\mathbf{x})\in\hat{\mathcal{G}}} \frac{1}{M}\sum_{m=1}^{M}\left|\hat{g}\left(\mathbf{x}^{(m)}\right)-g\left(\mathbf{x}^{(m)}\right)\right|^{2}, \tag{38}
\]
where the training data 𝒟_M = {𝐱^(m) : m = 1, …, M} are sampled from p(𝐱). Hajek and Raginsky [84] showed that, with confidence 1 − γ, the least-square result ĝ⋄(𝐱) satisfies
\[
\left\|\hat{g}^{\diamond}(\mathbf{x})-g(\mathbf{x})\right\|_{L_{2}(p)}^{2}\le\left\|\hat{g}^{*}(\mathbf{x})-g(\mathbf{x})\right\|_{L_{2}(p)}^{2}+16\,\frac{\left(B+C_{K}\rho\right)^{2}}{\sqrt{M}}+\left(B^{2}+C_{K}^{2}\rho^{2}\right)\sqrt{\frac{8\ln\left(1/\gamma\right)}{M}}, \tag{39}
\]
where C_K = sup_{𝐱∈𝒳} √K(𝐱, 𝐱) < ∞. We learn from the above expression that the estimation error for the considered search space 𝒢̂ and least-square algorithm decays at the rate 1/√M.

The distinction among available limit-state approximation methods arises from the design of the search space 𝒢̂ and the learning algorithm to find ĝ⋄(𝐱). Examples of surrogates with different designs of 𝒢̂ and learning algorithms include the variants of response surface [11–15], support vector regression [16–19], neural network [20–23], Gaussian process [24–26], and polynomial chaos expansion [27,28]. More recently, importance sampling based on the Gaussian process surrogate has received increasing attention; hence, the rest of this section mainly focuses on this limit-state approximation method.

Gaussian process regression builds on the premise that the actual limit-state function g(𝐱) is a realization of an underlying Gaussian process; hence, the generated search space 𝒢̂ consists of random functions ĝ(𝐱) = 𝐡^⊺(𝐱)𝜽 + ε(𝐱), where 𝐡^⊺(𝐱)𝜽 is the mean function, in which 𝐡(𝐱) is the vector of basis functions; 𝜽 is the vector of model parameters; and ε(𝐱) ∼ 𝒢𝒫(0, K(𝐱, 𝐱′)) is a zero-mean Gaussian process with covariance function K(𝐱, 𝐱′). It can be shown that the optimization problem in Eq. (38) for the designated 𝒢̂ and training data 𝒟_M = {𝐱^(m) : m = 1, …, M} sampled from p(𝐱) yields ĝ⋄(𝐱) ∼ 𝒩(μ⋄(𝐱), σ⋄(𝐱)) (e.g., [99]), where 𝒩(μ⋄(𝐱), σ⋄(𝐱)) is a Normal (Gaussian) distribution with mean μ⋄(𝐱) and standard deviation σ⋄(𝐱) defined as
\[
\begin{aligned}
\mu^{\diamond}(\mathbf{x}) &= \mathbf{h}^{\intercal}(\mathbf{x})\,\boldsymbol{\theta}^{\diamond}+\mathbf{k}^{\intercal}(\mathbf{x})\,\mathbf{K}^{-1}\left(\mathbf{g}-\mathbf{H}^{\intercal}\boldsymbol{\theta}^{\diamond}\right),\\
\sigma^{\diamond}(\mathbf{x}) &= \sqrt{K(\mathbf{x},\mathbf{x})-\mathbf{k}^{\intercal}(\mathbf{x})\,\mathbf{K}^{-1}\mathbf{k}(\mathbf{x})+\mathbf{R}^{\intercal}\left(\mathbf{H}\mathbf{K}^{-1}\mathbf{H}^{\intercal}\right)^{-1}\mathbf{R}},
\end{aligned} \tag{40}
\]
in which 𝐤(𝐱) = [K(𝐱, 𝐱^(m))]^⊺ is the column vector of covariances between ĝ⋄(𝐱) and ĝ⋄(𝐱^(m)), for m = 1, …, M; 𝐊 = [K(𝐱^(n), 𝐱^(m))] is the matrix of covariances between ĝ⋄(𝐱^(n)) and ĝ⋄(𝐱^(m)), for n, m = 1, …, M; and 𝐠, 𝐇, 𝜽⋄, and 𝐑 collect, respectively, the limit-state function values at the training data, the associated basis function vectors, the estimated regression coefficients, and the resulting auxiliary vector that accounts for the uncertainty in 𝜽⋄.

The estimated ĝ⋄(𝐱) can enter the IS density either through a point estimate or through a predictive estimate of the indicator function in the definition of q̂⋄_IS(𝐱) ∝ I_{Ω̂⋄_F}(𝐱) p(𝐱). The predictive estimate Ĩ_{Ω̂⋄_F}(𝐱) incorporates model uncertainty as
\[
\tilde{I}_{\hat{\Omega}_{F}^{\diamond}}(\mathbf{x})=\mathrm{E}\left[I_{\hat{\Omega}_{F}^{\diamond}}(\mathbf{x})\right]=\mathrm{P}\left(\mathbf{x}\in\hat{\Omega}_{F}^{\diamond}\right)=\Phi\left(-\frac{\mu^{\diamond}(\mathbf{x})}{\sigma^{\diamond}(\mathbf{x})}\right). \tag{41}
\]
In the point estimate of I_{Ω̂⋄_F}(𝐱), the random function ĝ⋄(𝐱) is replaced by its mean function μ⋄(𝐱) in the definition of Ω̂⋄_F. The point estimate is only an approximation of the predictive estimate that can be obtained from the Taylor expansion of a smoothed version of I_{Ω̂⋄_F}(𝐱) around μ⋄(𝐱). Using the predictive estimate, we can then write q̂⋄_IS(𝐱) as
\[
\hat{q}_{IS}^{\diamond}(\mathbf{x})=\frac{\tilde{I}_{\hat{\Omega}_{F}^{\diamond}}(\mathbf{x})\,p(\mathbf{x})}{P_{F}^{\diamond}}, \tag{42}
\]
where P⋄_F = ∫_𝒳 Ĩ_{Ω̂⋄_F}(𝐱) p(𝐱) d𝐱 is the base failure probability estimate obtained by the mere replacement of I_{Ω_F}(𝐱) with Ĩ_{Ω̂⋄_F}(𝐱) (i.e., before applying the IS modification.) Substituting q_IS(𝐱) = q̂⋄_IS(𝐱) in the failure probability integral in Eq. (8) yields
\[
P_{F}=\int_{\mathcal{X}}I_{\Omega_{F}}(\mathbf{x})\,\frac{p(\mathbf{x})}{\hat{q}_{IS}^{\diamond}(\mathbf{x})}\,\hat{q}_{IS}^{\diamond}(\mathbf{x})\,\mathrm{d}\mathbf{x}=P_{F}^{\diamond}\int_{\mathcal{X}}\frac{I_{\Omega_{F}}(\mathbf{x})}{\tilde{I}_{\hat{\Omega}_{F}^{\diamond}}(\mathbf{x})}\,\hat{q}_{IS}^{\diamond}(\mathbf{x})\,\mathrm{d}\mathbf{x}=P_{F}^{\diamond}\,D_{IS}\left[g(\mathbf{x}),\hat{g}^{\diamond}(\mathbf{x})\right], \tag{43}
\]
where D_IS(⋅, ⋅) ≥ 0 is the correction factor of applying importance sampling (a measure of distance between the actual and approximated limit-state functions.)

Replacing the computationally demanding g(𝐱) with a fast-to-evaluate ĝ⋄(𝐱) enables evaluating P⋄_F and D_IS using standard MC estimators. Specifically, the standard MC estimator of P⋄_F = E_p[Ĩ_{Ω̂⋄_F}(𝐱)] is
\[
\hat{P}_{F}^{\diamond}\left(N_{1}\right)=\frac{1}{N_{1}}\sum_{m=1}^{N_{1}}\tilde{I}_{\hat{\Omega}_{F}^{\diamond}}\left(\mathbf{x}^{(m)}\right), \tag{44}
\]
where 𝐱^(1), …, 𝐱^(N₁) ∈ 𝒟_{N₁} are statistically independent samples from p(𝐱). Likewise, the standard MC estimator of D_IS = E_{q̂⋄_IS}[I_{Ω_F}(𝐱)/Ĩ_{Ω̂⋄_F}(𝐱)] is
\[
\hat{D}_{IS}\left(N_{2}\right)=\frac{1}{N_{2}}\sum_{n=1}^{N_{2}}\frac{I_{\Omega_{F}}\left(\mathbf{x}^{(n)}\right)}{\tilde{I}_{\hat{\Omega}_{F}^{\diamond}}\left(\mathbf{x}^{(n)}\right)}, \tag{45}
\]
where 𝐱^(1), …, 𝐱^(N₂) ∈ 𝒟_{N₂} are statistically independent samples from q̂⋄_IS(𝐱). The number of samples N₁ and N₂ for each estimator can be decided according to the desired level of accuracy, captured by the CoV of the estimator. In the end, the failure probability can be estimated from
\[
\hat{P}_{F}\left(N_{1},N_{2}\right)=\hat{P}_{F}^{\diamond}\left(N_{1}\right)\hat{D}_{IS}\left(N_{2}\right), \tag{46}
\]
with a CoV of √(δ²_MC(P̂⋄_F) + δ²_MC(D̂_IS)), which holds for δ_MC(P̂⋄_F), δ_MC(D̂_IS) ≪ 1 [24], where δ_MC(P̂⋄_F) and δ_MC(D̂_IS) are the CoVs of the estimators in Eqs. (25) and (26).
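To make Eqs. (41) and (44)–(46) concrete, the following minimal Python sketch assembles the Gaussian-process-based IS estimator for a generic problem. The toy limit-state function, the scikit-learn surrogate, the crude training design, and the weighted-resampling shortcut used to draw approximate samples from q̂⋄_IS(𝐱) are illustrative assumptions rather than the implementations used in the cited toolboxes.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

def g(x):                      # toy limit-state function (assumption), x of shape (n, d)
    return 5.0 - x[:, 1] - 0.5 * (x[:, 0] - 0.1) ** 2

d, M = 2, 40                   # dimension and number of training points (assumptions)
X_train = rng.standard_normal((M, d)) * 2.0      # crude space-filling design (assumption)
gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, g(X_train))    # surrogate g_hat_diamond(x); costs M limit-state evaluations

def I_tilde(x):                # predictive estimate of the indicator, Eq. (41)
    mu, sigma = gp.predict(x, return_std=True)
    return norm.cdf(-mu / np.maximum(sigma, 1e-12))

# Eq. (44): base failure probability estimate with N1 samples from p(x) (surrogate only)
N1 = 100_000
X1 = rng.standard_normal((N1, d))
w1 = I_tilde(X1)
P_F_base = w1.mean()

# Approximate samples from q_IS_diamond(x) ~ I_tilde(x) p(x) by weighted resampling of the
# p(x) samples (a simplification for this sketch; not the exact sampler used in the paper).
N2 = 200
idx = rng.choice(N1, size=N2, replace=True, p=w1 / w1.sum())
X2 = X1[idx]

# Eq. (45): correction factor, using N2 evaluations of the actual limit-state function
D_IS = np.mean((g(X2) <= 0.0) / I_tilde(X2))

# Eq. (46): importance sampling estimate of the failure probability
P_F_hat = P_F_base * D_IS
print(P_F_hat)
```

The two sample sizes play the roles described above: N₁ only costs surrogate evaluations and controls the accuracy of P̂⋄_F, whereas N₂ costs actual limit-state evaluations and controls the accuracy of D̂_IS.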

The sampling algorithm that generates the training data 𝒟_M to find ĝ⋄(𝐱) plays a crucial role in controlling the computational cost and in bounding the estimation error. As before, the total computational cost of estimating P_F based on this IS method is N₁c_ĝ + (M + N₂)c_g, where c_ĝ is the computational cost of evaluating the surrogate ĝ⋄(𝐱). Selecting more informative training data 𝒟_M can help reduce this computational cost. Adaptive methods using active learning techniques offer a solution to select such informative data points. From a small set of training data, a sequential strategy is followed, where each step provides the next best (set of) training data point(s) to be included in 𝒟_M that would help improve the surrogate, IS density, or failure probability estimate. In the context of IS, a perfect surrogate only needs to correctly predict the sign of g(𝐱) for every 𝐱 ∈ 𝒳 or, equivalently, correctly classify each 𝐱 ∈ Ω_F. Thus, active learning techniques usually select the next best (set of) training data point(s) from the subsets of 𝒳 in which the surrogate has the highest misclassification likelihood (i.e., data points for which μ⋄(𝐱) is close to zero and σ⋄(𝐱) is large.) There have been many efforts to improve the classification accuracy of different surrogates based on adaptive methods (e.g., [94,101–103].) A recent account of available adaptive methods in the literature to improve ĝ⋄(𝐱) can be found in [104].

4.2. Characteristics, challenges, and recent developments

In evaluating the failure probability, the importance of misclassification is more prominent for those 𝐱's that are in the vicinity of the design-point(s) (i.e., the point(s) with the highest failure probability.) Limited research has considered the importance of data in the adaptive improvements of ĝ⋄(𝐱). For example, Dubourg et al. [24] considered the importance of data by sampling the next best set of training data points from the current estimate of q̂⋄_IS(𝐱). In a slightly different context, Breitung [105] proposed a novel approach that reduces the computational cost of sampling by first finding the design-point(s) through a global optimization method. In the standard Gaussian space, the computed design-point indicates that Ω_F lies outside the sphere S_β(0), centered at the origin with radius β (the reliability index). A sampling algorithm is then used to explore Ω_F ⊆ 𝒳 ∖ S_β(0), starting from the vicinity of the design-point(s). More recently, Kim and Song [106] proposed an adaptive method that combines misclassification uncertainty with a data importance measure defined in terms of the Euclidean distance from the design-point(s) in the standard Gaussian space. The estimation of the design-point(s) is based on the current estimate of ĝ⋄(𝐱).

Limit-state approximation methods can significantly reduce the computational cost of evaluating the failure probability. However, as for the density approximation methods, the curse of dimensionality can limit the use of surrogates for high-dimensional problems (i.e., problems with d ∼ 𝒪(10²).) The variants of dimensionality reduction techniques can help overcome this issue by mapping the high-dimensional 𝐱 into a lower dimensional space such that specific statistical properties are preserved. The distinction among different dimensionality reduction techniques arises from the applied mapping and the considered statistical properties. Kernel dimension reduction [107] and gradient-based kernel dimension reduction [108] are two examples that implemented dimensionality reduction techniques in regression analysis. These techniques, however, have been barely explored in the context of limit-state approximation methods for importance sampling. A recent development can be found in [109].

A series of hybrid methods have also been developed that leverage a low-fidelity surrogate ĝ⋄(𝐱) to generate a large set of training data 𝒟_M that are then used to estimate a parametric or a nonparametric IS density (e.g., [53,95,110–112]). These methods are distinguished by their employed surrogate and density approximation methods. For example, the design of the IS density in [112] avoids the usual optimization step to estimate the unknown parameters of the IS density. Instead, q̂⋄_IS(𝐱) is a weighted empirical density designed as follows: a large set of training data sampled from p(𝐱) are partitioned into failure Ω̂⋄_F = {𝐱 ∈ 𝒳 : ĝ⋄(𝐱) ≤ 0} and safe Ω̂⋄_S = {𝐱 ∈ 𝒳 : ĝ⋄(𝐱) > 0} subsets. The empirical IS density q̂⋄_IS(𝐱) = Σ_{m=1}^{M} w_m δ(𝐱 − 𝐱^(m)) is then developed, where the weights are determined such that Σ_{m:𝐱^(m)∈Ω̂⋄_F} w_m δ(𝐱 − 𝐱^(m)) = Σ_{m:𝐱^(m)∈Ω̂⋄_S} w_m δ(𝐱 − 𝐱^(m)). Instead of using active learning techniques to improve the performance of ĝ⋄(𝐱) in regions close to the limit-state surface g(𝐱) = 0, the actual limit-state function g(𝐱) is used for estimating P_F in Eq. (9) when |ĝ⋄(𝐱)| ≤ κ for a positive real κ. This is because, under specific conditions, the mere use of ĝ⋄(𝐱) to estimate P_F can lead to erroneous results regardless of the accuracy of ĝ⋄(𝐱) [27,110]. The hyperparameter κ is a key factor in controlling the computational cost of the IS estimator in Eq. (9). Refs. [27,110,112] have developed iterative algorithms to select κ based on heuristics.

5. Performances of importance sampling methods

This section discusses the performance of the reviewed IS methods in terms of their accuracy and computational cost, considering five benchmark reliability problems. These problems serve to illustrate some of the key characteristics of the reviewed IS methods based on their standard implementations. Though it might be possible to improve the standard implementation of each method with extra effort, it is not pursued herein.

5.1. Benchmark problem 1: Component with a parabolic limit-state function

The first example is a component reliability problem with two design-points. The limit-state function is a parabola, defined as [113]
\[
g(\mathbf{x}) = 5 - x_{2} - 0.5\left(x_{1} - 0.1\right)^{2}, \tag{47}
\]
where x₁ and x₂ are statistically independent and identically distributed random variables with standard Gaussian distributions. The optimal IS density is q*_IS(𝐱) ∝ I_{Ω_F}(𝐱) φ(x₁)φ(x₂), in which the failure domain is defined as Ω_F = {(x₁, x₂) ∈ ℝ² : 5 − x₂ − 0.5(x₁ − 0.1)² ≤ 0}. The functional forms of the IS density according to the considered approximation methods are
\[
\begin{aligned}
\text{Gaussian mixture:}\quad & \hat{q}_{IS}(\mathbf{x},\boldsymbol{\Theta}) = \sum_{k=1}^{K} w_{k}\,\mathcal{N}_{k}\left(\mathbf{x},\boldsymbol{\mu}_{k},\boldsymbol{\Sigma}_{k}\right),\\
\text{Kernel density:}\quad & \hat{q}_{IS}(\mathbf{x},\boldsymbol{\Theta}) = \frac{1}{M}\sum_{m=1}^{M}\left[\prod_{i=1}^{d}\frac{1}{\lambda_{m}\theta_{i}}\,K_{G}\left(\frac{x_{i}-x_{i}^{(m)}}{\lambda_{m}\theta_{i}}\right)\right],\\
\text{Gaussian process:}\quad & \hat{q}_{IS}(\mathbf{x},\boldsymbol{\Theta}) = \frac{\tilde{I}_{\hat{\Omega}_{F}}(\mathbf{x})\,p(\mathbf{x})}{\int_{\mathcal{X}}\tilde{I}_{\hat{\Omega}_{F}}(\mathbf{x})\,p(\mathbf{x})\,\mathrm{d}\mathbf{x}}.
\end{aligned} \tag{48}
\]

Fig. 1. The contour plots of estimated IS densities for the component reliability problem with a parabolic limit-state function.

We compare the performances of these methods (Gaussian mixture, kernel density, and Gaussian process) in terms of their computational cost and estimation accuracy. We use the number of g(𝐱) evaluations as the proxy for the computational cost. The convergence criterion of simulations in all reliability analyses is based on the CoV of the estimator P̂_F(N), which is set to 0.05. Fig. 1 shows the optimal IS density q*_IS(𝐱) together with the estimated IS densities according to the three IS methods. Each subfigure shows the contour plot of the estimated IS density, superimposed on the limit-state surface g(𝐱) = 0 and the base PDF p(𝐱) = φ(x₁)φ(x₂). The figure enables a qualitative comparison of the three IS methods in terms of capturing the general shape of q*_IS(𝐱). The estimated IS densities based on all three approximation methods have concentrated around the two design-points on g(𝐱) = 0. However, q̂_IS(𝐱, 𝜣) based on the Gaussian process approximation of g(𝐱) captures the general shape of q*_IS(𝐱) more closely. An improved IS density can significantly facilitate the estimation of the failure probability.

Table 1 summarizes the performances of the three IS methods. The Monte Carlo estimate of P_F with δ_MC = 0.05 is 3.02 × 10⁻³. To implement these methods, we use the Python toolbox from https://www.bgu.tum.de/era/software/ for the Gaussian mixture IS method [114], our own Matlab® codes for the kernel density IS method, and the Matlab-based uncertainty quantification toolbox, called UQLab, from https://www.uqlab.com/ for the Gaussian process IS method [115]. The specific settings of the three IS methods used in this example are as follows: (1) The Gaussian mixture model consists of two components (i.e., K = 2), and we use the adaptive algorithm developed in [114] to estimate its unknown model parameters. The algorithm converges in four steps with 2000 training data per step (i.e., 2000 evaluations of g(𝐱) per step.) (2) For the kernel density method, to establish the initial failure samples needed to build the kernel density, we use the adaptive sampling algorithm developed in [89] within the Subset Simulation [116]. Three subset levels (with level probability 0.1) and a total of 2500 simulations are needed to generate the M = 50 initial failure samples. (3) For the Gaussian process regression, we initially select a set of max{10, 2d} = 10 training data points using Latin Hypercube Sampling [117]. We then use the adaptive algorithm based on the active learning technique developed in [94] to select additional training data that help improve the model performance. The convergence criterion is based on δ_MC(P̂⋄_F) (see Eq. (46)), set to 0.03. In this example, we use the ordinary Gaussian process regression with an unknown constant mean (i.e., ĝ(𝐱) = θ₀ + ε(𝐱).)

The obtained results indicate that the estimated failure probabilities from different methods are comparable for the selected convergence criterion. However, there are significant differences in their computational costs. For each method, the table reports the number of g(𝐱) evaluations for the training of the respective model and for the IS simulations. For example, 6000 + 636 in the first row implies that we use 6000 evaluations of g(𝐱) to train the Gaussian mixture and conduct an additional 636 evaluations of g(𝐱) for the IS simulations. A close approximation of q*_IS can significantly reduce the computational cost of the IS simulations. In this example, the limit-state approximation method based on the Gaussian process is computationally superior to the two density approximation methods. It follows that the Gaussian process regression can closely approximate the smooth parabolic limit-state function in this example with only a few g(𝐱) evaluations (i.e., 41). Also, the IS density in this method has the same functional form as q*_IS. However, in the density approximation methods, the training of q̂_IS(𝐱, 𝜣) requires samples from the failure domain Ω_F that, in turn, require many g(𝐱) evaluations. Also, the functional forms of q̂_IS(𝐱, 𝜣) are different from q*_IS, and their training involves a highly non-convex optimization. These factors control the quality of q̂_IS(𝐱, 𝜣) in approximating q*_IS with a limited number of g(𝐱) evaluations.
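The adaptive enrichment of the Gaussian process surrogate described above can be sketched as follows. The learning function U(𝐱) = |μ⋄(𝐱)|/σ⋄(𝐱) and the stopping threshold follow the spirit of the active learning technique in [94], but the random candidate pool, the kernel, the initial design, and the one-point-per-iteration update are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def adaptive_gp(g, X_init, X_cand, n_add=50, u_stop=2.0):
    """Enrich the training set where misclassification of the sign of g is most likely."""
    X_train = X_init.copy()
    y_train = g(X_train)                            # initial limit-state evaluations
    for _ in range(n_add):
        gp = GaussianProcessRegressor(ConstantKernel() * RBF(), normalize_y=True)
        gp.fit(X_train, y_train)
        mu, sigma = gp.predict(X_cand, return_std=True)
        u = np.abs(mu) / np.maximum(sigma, 1e-12)   # small U: mu near zero, sigma large
        if u.min() >= u_stop:                       # all candidates classified confidently
            break
        x_next = X_cand[np.argmin(u)][None, :]      # next best training point
        X_train = np.vstack([X_train, x_next])
        y_train = np.append(y_train, g(x_next))     # one actual g(x) evaluation per step
    return gp, X_train

# Example usage with the parabolic limit-state function of Eq. (47); the design sizes are assumptions.
rng = np.random.default_rng(2)
g = lambda x: 5.0 - x[:, 1] - 0.5 * (x[:, 0] - 0.1) ** 2
gp, X_train = adaptive_gp(g, X_init=rng.standard_normal((10, 2)),
                          X_cand=rng.standard_normal((5_000, 2)))
print(len(X_train))
```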

Table 1
The performances of three IS methods to solve the component reliability problem with a parabolic limit-state function.
  Method             Number of g(x) evaluations    Failure probability
  Gaussian mixture   6000 + 636                    3.08 × 10⁻³
  Kernel density     2500 + 300                    2.70 × 10⁻³
  Gaussian process   41 + 100                      3.05 × 10⁻³

5.2. Benchmark problem 2: Component with changing topological structure

The second example is a component reliability problem with the changing topological structure of its level-sets {𝐱 ∈ 𝒳 : g(𝐱) = c} as the constant c approaches 0 from above (i.e., c ↓ 0). Adaptive methods that gradually move towards the design-point(s) can break down due to the changes in the topological structure of the level-sets (see [118,119] for more discussion.) The limit-state function of this example is a metaball, defined as [118]
\[
g(\mathbf{x}) = \frac{30}{\left[4\left(x_{1}+2\right)^{2}/9 + x_{2}^{2}/25 + 1\right]^{2}} + \frac{20}{\left[\left(x_{1}-2.5\right)^{2}/4 + \left(x_{2}-0.5\right)^{2}/25 + 1\right]^{2}} - 5, \tag{49}
\]
where x₁ and x₂ are statistically independent and identically distributed random variables with standard Gaussian distributions. Fig. 2 illustrates the changing topological structure of the level-sets, where the path of the points on the level-sets with the shortest distance to the origin has a discontinuity. Fig. 2 shows this path as c in the level-set {𝐱 ∈ 𝒳 : g(𝐱) = c} varies from 20 to 0. The top plot in the figure also shows the limit-state surface (a curve herein) in red, which is the level-set with c = 0.

Fig. 2. The landscape of the metaball limit-state function, its level-sets, and the search path to find the design-point on the limit-state surface. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

As in the first example, we compare the performances of the three IS methods in terms of their computational cost and estimation accuracy. However, in this example, we use two different algorithms to generate the initial samples needed to build the kernel density. One is based on the adaptive sampling algorithm developed in [89] used within the Subset Simulation [116] (labeled kernel density), and the other one is based on MCMC with multiple chains starting from points uniformly generated in the plane [−6, 6]² (labeled kernel density (MCMC).) The convergence criterion in all reliability analyses is based on the CoV of the estimator P̂_F(N), set to 0.05. Fig. 3 shows the optimal IS density q*_IS(𝐱) together with the estimated IS densities q̂_IS(𝐱, 𝜣) according to the three approximation methods. The description of the plots is the same as those in Fig. 1. Because of the changing topological structure of the level-sets, q̂_IS(𝐱, 𝜣) based on the two adaptive density approximation methods concentrated around a point on g(𝐱) = 0 that is farther away from the origin than the design-point (see Figs. 3b and 3c). Failing to explore the design-point neighborhood causes underestimating P_F. However, as illustrated in Fig. 3(d), a non-adaptive sampling method can help overcome the challenge of the changing topological structure. Fig. 3(e) shows that q̂_IS(𝐱, 𝜣) based on the Gaussian process representation of g(𝐱) concentrates around the design point and, hence, yields a close approximation of q*_IS(𝐱).

Table 2
The performances of the IS methods to solve the component reliability problem with a metaball limit-state function.
  Method                  Number of g(x) evaluations    Failure probability
  Gaussian mixture        7000 + 359                    1.60 × 10⁻⁸
  Kernel density          9700 + 200                    2.11 × 10⁻⁸
  Kernel density (MCMC)   135 + 2500                    1.24 × 10⁻⁵
  Gaussian process        29 + 100                      1.07 × 10⁻⁵

Table 2 summarizes the performances of the IS methods. The Monte Carlo estimate of P_F with δ_MC = 0.05 is 1.02 × 10⁻⁵. The settings of the three IS methods are similar to those in the previous example. The adaptive algorithm of the Gaussian mixture model converges in eight steps with 1000 training data per step. Applying the usual changes to the algorithm, like the number of components and the number of samples per step, does not help find the actual design-point. As expected, adaptive algorithms lead to the underestimation of P_F since q̂_IS(𝐱, 𝜣) concentrates around a point that is far from the design-point. Even a small under- or over-estimation of the design-point can significantly affect the failure probability estimate since the tail of the standard Gaussian PDF decays with a squared exponential rate. The adaptive algorithm of the kernel density based on the Subset Simulation also converges to a small region (see Fig. 3c) and cannot overcome the discontinuity of the path to the actual design-point, leading to the significant underestimation of P_F. The changing topological structure also challenges the convergence of adaptive algorithms like the Subset Simulation. The algorithm will "think" that the problem has a small failure probability and will run many Subset Simulation levels unless some additional checks stop the algorithm. Instead, the algorithm based on MCMC with multiple chains can generate many samples in the neighborhood of the design-point (see Fig. 3d), leading to the improved estimate of P_F. The comparison of the results based on the two sampling algorithms shows that once the training samples are generated properly, the idea of using a kernel density estimate to establish the IS density still works. The adaptive algorithm of the Gaussian process regression is the same as before. The performance of this IS method has to do with the adaptive algorithm conducting an implicit stochastic optimization to find the design-point. Unlike its counterparts in the density approximation methods, the adaptive algorithm in the limit-state approximation method does not gradually move toward the design-point. Instead, the selected active learning technique determines the search space by favoring training data that are close to the limit-state surface and yield ĝ⋄(𝐱) predictions with high uncertainty (i.e., training data with small |μ⋄(𝐱)| and large σ⋄(𝐱) values in Eq. (40).) The selected active learning technique has been successful in this example. However, its performance can generally be improved by further considering the importance of candidate training data on the limit-state surface based on their distance to the origin in the standard Gaussian space (see, for example, the active learning technique developed in [103].)
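For reference, the metaball limit-state function of Eq. (49) and a crude Monte Carlo estimate of its failure probability can be reproduced with a few lines of Python; the sample size and seed below are arbitrary choices for illustration.

```python
import numpy as np

def g_metaball(x):                             # metaball limit-state function, Eq. (49)
    t1 = 30.0 / (4.0 * (x[:, 0] + 2.0) ** 2 / 9.0 + x[:, 1] ** 2 / 25.0 + 1.0) ** 2
    t2 = 20.0 / ((x[:, 0] - 2.5) ** 2 / 4.0 + (x[:, 1] - 0.5) ** 2 / 25.0 + 1.0) ** 2
    return t1 + t2 - 5.0

rng = np.random.default_rng(3)
n = 5_000_000                                  # large sample because P_F is on the order of 1e-5
fails = g_metaball(rng.standard_normal((n, 2))) <= 0.0
P_F = fails.mean()
cov = np.sqrt((1.0 - P_F) / (n * P_F))         # CoV of the crude MC estimator
print(P_F, cov)
```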

Fig. 3. The contour plots of estimated IS densities for the component reliability problem with a metaball limit-state function.

5.3. Benchmark problem 3: Series system with two components

The third example is the reliability analysis of a series system with two components. The limit-state function of the system is g(𝐱) = min{g₁(𝐱), g₂(𝐱)}, where g₁(𝐱) and g₂(𝐱) are the limit-state functions of the two components, defined as [9]
\[
\begin{aligned}
g_{1}(\mathbf{x}) &= 3 - x_{2} + \exp\left(-x_{1}^{2}/10\right) + \left(x_{1}/5\right)^{4},\\
g_{2}(\mathbf{x}) &= 8 - x_{1} x_{2},
\end{aligned} \tag{50}
\]
in which x₁ and x₂ are statistically independent and identically distributed random variables with standard Gaussian distributions.

Fig. 4. The contour plots of estimated IS densities for the system reliability problem with two components in series.

The limit-state function g₁(𝐱) features a single design point at (0, 4), and the limit-state function g₂(𝐱) features two design points at (2√2, 2√2) and (−2√2, −2√2).

As in the previous examples, we compare the performances of the three IS methods in terms of their computational cost and estimation accuracy. The convergence criterion in all reliability analyses is based on the CoV of the estimator P̂_F(N), set to 0.05. Fig. 4 shows the optimal IS density q*_IS(𝐱) together with the estimated IS densities according to the three IS methods. The descriptions of the plots are the same as those in Fig. 1. All three methods successfully identified the three design points on g(𝐱) = 0, and their estimated IS densities have concentrated around the design points. However, as in Example 1, we observe that q̂_IS(𝐱, 𝜣) based on the Gaussian process representation of g(𝐱) yields a closer approximation of q*_IS(𝐱).

Table 3 summarizes the performances of the three IS methods. The Monte Carlo estimate of P_F with δ_MC = 0.05 is 8.70 × 10⁻⁵. The general settings of the three IS methods are similar to those in the previous examples. The Gaussian mixture model in this example consists of three components, and the adaptive algorithm converges in six steps with 3500 training data per step. To establish the initial samples for the kernel density, we use the adaptive sampling algorithm developed in [89] used within the Subset Simulation. Six subset levels (with level probability 0.2) with a total of 9500 simulations are needed to generate the M = 100 initial failure samples. The adaptive algorithm of the Gaussian process regression is the same as before. The obtained results indicate that the estimated P_F based on the three IS methods are comparable for the selected convergence criterion. However, compared to the first example, the differences in their computational costs have significantly increased. The geometry of the failure domain and its small probability are responsible for the increased gap in the computational costs (i.e., the order of the failure probability has decreased from 𝒪(10⁻³) in Example 1 to 𝒪(10⁻⁵) in the current example.) Specifically, the geometry of the failure domain affects the generation of required samples for the training of the IS density. For example, the sampling algorithm for the training of the Gaussian mixture requires more adaptive steps to eventually draw samples from the failure domain Ω_F. The separation of the design-point at (−2√2, −2√2) from the other two design-points also affects the performance of the sampling algorithm. The geometry of the failure domain affects the approximation methods in different ways. The density approximation methods can only learn from samples in the failure domain (see Eqs. (26) and (36)), whereas the limit-state approximation methods can learn from samples in both the failure and safe domains. In general, the limit-state function's behavior in regions far from the design-points barely affects the failure probability estimate. However, the sampled data from the failure or safe domains far from the design-points may adversely impact the limit-state approximation methods' performance. The sampling algorithm and the considered estimation error can help incorporate the training data's importance in developing the limit-state function's surrogate.
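A minimal sketch of the series-system limit-state function in Eq. (50), together with a crude Monte Carlo reference estimate, is given below; the sample size is an arbitrary choice for illustration.

```python
import numpy as np

def g_series(x):                               # system limit-state function, Eq. (50)
    g1 = 3.0 - x[:, 1] + np.exp(-x[:, 0] ** 2 / 10.0) + (x[:, 0] / 5.0) ** 4
    g2 = 8.0 - x[:, 0] * x[:, 1]
    return np.minimum(g1, g2)                  # series system: failure of either component

rng = np.random.default_rng(4)
x = rng.standard_normal((5_000_000, 2))        # P_F is on the order of 1e-4, so a large sample is needed
P_F = np.mean(g_series(x) <= 0.0)
print(P_F)
```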

Table 3
The performances of the three IS methods to solve the system reliability problem with two components in series.
  Method             Number of g(x) evaluations    Failure probability
  Gaussian mixture   17,500 + 432                  8.97 × 10⁻⁵
  Kernel density     9500 + 400                    8.61 × 10⁻⁵
  Gaussian process   48 + 100                      8.91 × 10⁻⁵

5.4. Benchmark problem 4: Component with a high-dimensional linear limit-state function

The fourth example is a component reliability problem with a high-dimensional limit-state function, defined as [3]
\[
g(\mathbf{x}) = \beta\sqrt{d} - \sum_{i=1}^{d} x_{i}, \tag{51}
\]
where x₁, …, x_d are statistically independent and identically distributed random variables with standard Gaussian distributions. The optimal IS density is q*_IS(𝐱) ∝ I_{Ω_F}(𝐱) ∏_{i=1}^{d} φ(x_i), in which the failure domain is the closed halfspace Ω_F = {𝐱^⊺ = (x₁, …, x_d) ∈ ℝ^d : β√d − Σ_{i=1}^{d} x_i ≤ 0}. The exact failure probability is P_F = Φ(−β), regardless of the problem dimension d.

In this example, we compare the performances of the three IS methods for the case in which β = 3.5 and d = 20. The convergence criterion in all reliability analyses is based on the CoV of the estimator P̂_F(N), set to 0.05. The curse of dimensionality in this problem prevents the IS density based on the Gaussian mixture from achieving the desired CoV of 0.05. Instead, we implement the IS density based on the von Mises–Fisher mixture to overcome the curse of dimensionality.

Table 4
The performances of three IS methods to solve the high-dimensional component reliability problem with a linear limit-state function.
  Method                     Number of g(x) evaluations    Failure probability
  von Mises–Fisher mixture   4500 + 1255                   2.39 × 10⁻⁴
  Kernel density             3300 + 10,000                 2.23 × 10⁻⁴
  Gaussian process           386 + 100                     2.38 × 10⁻⁴

Table 4 summarizes the performances of the three IS methods. The actual failure probability is Φ(−3.5) = 2.32 × 10⁻⁴. The von Mises–Fisher mixture model in this example consists of a single component, and the adaptive algorithm converges in four steps with 1500 training data per step. To establish the initial samples for the kernel density, we use MCMC within the Subset Simulation instead of the adaptive sampling algorithm in [89]. This is because the adaptive sampling algorithm in [89] works well for problems with low-dimensional dominant variables, where we can build the adaptive kernel sampling density for the subset of low-dimensional dominant variables to achieve good sampling efficiency. However, there are no low-dimensional dominant variables for the current problem, while building the kernel sampling density for relatively high-dimensional variables creates challenges that render the adaptive sampling algorithm with the accept-reject algorithm inefficient. Therefore, we use MCMC instead of the accept-reject algorithm. This also highlights some potential challenges in generating samples needed to build the IS density. Four subset levels (with level probability 0.1) with a total of 3300 simulations are needed to generate the M = 100 initial failure samples. The adaptive algorithm of the Gaussian process regression is the same as before. Notable insights about the performances of the three IS methods in a high-dimensional problem are as follows: (1) We observe that, in the case of the von Mises–Fisher mixture, a proper design of the approximating IS density based on the specific conditions of the problem can make an otherwise infeasible method amenable to accurate estimates. (2) The kernel density needs a large number of samples to estimate P_F. This highlights the challenge of building high-dimensional IS densities, which actually exists in general [120] and is not unique to the kernel density. But for the kernel density, the curse of dimensionality also comes into play, and even with the optimization of the bandwidth and the use of adaptive algorithms, the established IS density still requires a large number of evaluations to reach a low CoV. (3) Similar to the previous example, the limit-state IS method based on the Gaussian process yields P̂_F that is comparable to that from the density IS methods but with a significantly smaller number of g(𝐱) evaluations. However, the clock time of training ĝ⋄(𝐱) is much longer than that of training q̂⋄_IS(𝐱) for the two density IS methods. In general, for a simple analytical g(𝐱), like the one in this problem, the computational cost of training a model dominates that of evaluating g(𝐱). In Gaussian process regression, this issue is particularly highlighted with the increased dimensionality of the problem, where more training data are required to achieve the desired level of accuracy. However, the dimensionality of the covariance matrix 𝐊 in Eq. (40) increases with the number of training data, which in turn increases the computational cost of estimating the unknown hyperparameters. The repeated estimations of the hyperparameters, required in active learning techniques, further increase the computational cost. Nuanced basis functions 𝐡(𝐱) in Eq. (40) would be necessary to implement the Gaussian process regression for d ≥ 20. Finally, since the estimators in Eqs. (44) and (45) include the uncertainty σ⋄(𝐱) in the definition of Ĩ_{Ω̂⋄_F}, the convergence criterion of the active learning technique should account for the convergence of σ⋄(𝐱) in addition to μ⋄(𝐱) (see, for example, [121]).

5.5. Benchmark problem 5: Component with a high-dimensional nonlinear limit-state function

In the last example, we consider a Duffing oscillator defined by the stochastic differential equation [67]
\[
\ddot{y}_{t} + 2\zeta\omega_{0}\dot{y}_{t} + \omega_{0}^{2}\left(y_{t} + \varepsilon y_{t}^{3}\right) = \sqrt{s_{0}}\,\dot{w}_{t}, \tag{52}
\]
with initial conditions y₀ = 0 and ẏ₀ = 0, where ÿ_t, ẏ_t, and y_t are, respectively, the acceleration, velocity, and displacement responses; ζ ∈ (0, 1) is the damping ratio; ω₀ is the natural frequency of the oscillator; ε is a positive constant that controls the level of nonlinearity (ε = 0 yields a linear oscillator); and √s₀ ẇ_t is the Gaussian white-noise excitation with power spectral density s₀. Introducing the state vector 𝐱_t^⊺ = (x_{1,t}, x_{2,t}) = (y_t, ẏ_t), we can rewrite Eq. (52) in the state-space form as
\[
\mathrm{d}\mathbf{x}_{t} = \begin{pmatrix} x_{2,t} \\ -2\zeta\omega_{0}x_{2,t} - \omega_{0}^{2}\left(x_{1,t} + \varepsilon x_{1,t}^{3}\right) \end{pmatrix}\mathrm{d}t + \begin{pmatrix} 0 \\ \sqrt{s_{0}} \end{pmatrix}\mathrm{d}w_{t}, \tag{53}
\]
where the discrete representation of the white-noise in the frequency domain, dw_t ≅ Σ_{i=1}^{d/2} {√(2Δω)[z_i sin(ω_i t) + z_{d/2+i} cos(ω_i t)]} dt, is based on the vector of independent and identically distributed standard Gaussian random variables 𝐳^⊺ = (z₁, …, z_d) [122].

The quantity of interest is the probability of the first-passage failure, defined by the limit-state function
\[
g(\mathbf{z}) = x_{c} - \sup_{t\in[0,t_{e}]} x_{1,t}(\mathbf{z}), \tag{54}
\]
where x_c is the prescribed deformation capacity. The limit-state function describes the event where the displacement response x_{1,t}(𝐳) exceeds the threshold x_c at any time in the interval [0, t_e].

In this example, in addition to the three IS methods considered earlier, we evaluate the performance of the IS method for dynamical systems, discussed in Section 3.1. For numerical illustrations, we use the following values of the model parameters: ζ = 0.25, ω₀ = 1, ε = 1, s₀ = 1, x_c = 4.15, t_e = 5, and d = 50. We also use a band-limited representation of the Gaussian white-noise with the power spectral density s(ω) = s₀ for |ω| ≤ 4π and s(ω) = 0, otherwise. The convergence criterion in all reliability analyses is based on the CoV of the estimator P̂_F(N), set to 0.1 in this example. As in the previous example, we implement the IS density based on the von Mises–Fisher mixture to overcome the curse of dimensionality.
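A minimal sketch of evaluating the first-passage limit-state function in Eq. (54) for one realization of 𝐳 is given below. The equally spaced frequency discretization of the band-limited excitation and the explicit Euler time stepping are assumptions made for illustration; any consistent discretization of Eqs. (52)–(53) can be substituted.

```python
import numpy as np

zeta, w0, eps, s0 = 0.25, 1.0, 1.0, 1.0
x_c, t_e, d = 4.15, 5.0, 50

# Band-limited discrete representation of the white-noise (see Eq. (53));
# equally spaced frequencies over (0, 4*pi] are an assumed discretization.
d_omega = 4.0 * np.pi / (d // 2)
omega = d_omega * np.arange(1, d // 2 + 1)

def excitation(t, z):
    return np.sqrt(2.0 * d_omega) * (np.sin(np.outer(t, omega)) @ z[: d // 2]
                                     + np.cos(np.outer(t, omega)) @ z[d // 2:])

def g_first_passage(z, dt=0.002):
    """Limit-state function of Eq. (54): x_c minus the peak displacement over [0, t_e]."""
    t = np.arange(0.0, t_e + dt, dt)
    w_dot = excitation(t, z)
    y = np.zeros_like(t)          # displacement x_{1,t}
    v = np.zeros_like(t)          # velocity x_{2,t}
    for k in range(len(t) - 1):   # explicit Euler integration of Eq. (53) (assumed scheme)
        a = (-2.0 * zeta * w0 * v[k] - w0 ** 2 * (y[k] + eps * y[k] ** 3)
             + np.sqrt(s0) * w_dot[k])
        y[k + 1] = y[k] + dt * v[k]
        v[k + 1] = v[k] + dt * a
    return x_c - y.max()

rng = np.random.default_rng(5)
z = rng.standard_normal(d)        # one realization of the standard Gaussian vector z
print(g_first_passage(z))
```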

Table 5
The performances of the four IS methods to solve the high-dimensional component reliability problem with an implicit nonlinear limit-state function.
  Method                                  Number of g(x) evaluations    Failure probability
  Control with design-point excitation    19 + 50,000                   6.02 × 10⁻⁴
  von Mises–Fisher mixture                37,500 + 11,200               6.27 × 10⁻⁴
  Kernel density                          22,000 + 10,000               6.22 × 10⁻⁴
  Gaussian process                        1100 + 851                    6.46 × 10⁻⁴

Table 5 summarizes the performances of the four IS methods. The Monte Carlo estimate of P_F with δ_MC = 0.05 is 6.35 × 10⁻⁴. The approach developed in [70] facilitates estimating the design-point excitations û*_t(t_k) as the optimal controls in Eq. (16). It is shown that the design-point excitation is identical to the excitation that generates the mirror image of the free-vibration response with the initial condition of 𝐱₀ = (x_c, 0). We use this method to compute the design-point excitations at equally spaced times t_k in the interval [0.25, 5] with increments of 0.25 s. After some adjustments, we found that the von Mises–Fisher mixture model with ten components and 12,500 training data per step yields P̂_F with the target CoV level. We can compare this number of components with the 20 components in the Gaussian mixture model of the IS method for dynamical systems. To establish the initial samples for the kernel density, we use the adaptive sampling algorithm developed in [89] used within the Subset Simulation. Four subset levels (with level probability 0.1) with a total of 22,000 simulations are needed to generate the M = 50 initial failure samples. The adaptive algorithm of the Gaussian process regression is the same as before.

The IS method based on the controls with design-point excitations involves 20 evaluations of g(𝐳) to estimate the design-point excitations and another 50,000 to estimate P_F using Eqs. (16) and (31). The performance of this method primarily depends on the accuracy of estimating the controls. The mirror image excitation approach [70] used in this example is based on the premise of having zero displacement and velocity responses at the end of the free-vibration (i.e., x_{1,τ} = x_{2,τ} = 0 for each up-crossing time τ ∈ [0, t_e].) Since the considered time interval and damping ratio do not allow us to satisfy this condition, we had to approximate the design-point excitations based on the stored energy in the system at the end of the free-vibrations (more details on the approximation method can be found in [70].) The density approximation method using the von Mises–Fisher mixture has a comparable performance but with fewer components. In the first-passage problem, each component captures the probability of a dominant up-crossing event in a sub-interval of [0, t_e]. In the approximation method based on the kernel density, we develop the IS density for the top six important dimensions based on a probabilistic sensitivity analysis, while the rest are kept at their base probability distribution [89]. In the sensitivity analysis, we use the failure samples and kernel density to estimate the relative entropy between the optimal IS density and the base PDF as the importance measure of each variable [88]. We use this approach because it is challenging to build the kernel density for high-dimensional variables (this is a general challenge and is not unique to the kernel density.) If the kernel IS density is built for all the dimensions, the failure probability estimates would have large errors. Even with building the kernel density for the subset of important variables, optimizing the bandwidth, and using an adaptive algorithm, the established IS density still requires a large number of evaluations to reach a low CoV. The results from this example and the previous one highlight the challenges in building IS density for high-dimensional problems, especially when there are no dominant low-dimensional subsets of important variables. Finally, the discussion on the performance of the limit-state approximation method based on the Gaussian process for the high-dimensional linear limit-state function still holds in this problem. However, the clock time of training ĝ⋄(𝐱) and the computational demand have significantly increased with the increase in the problem dimension and the limit-state function nonlinearity.

6. Summary of challenges and future directions

In this section, we summarize the challenges of the reviewed IS methods and briefly discuss their future directions. The main challenges and future directions generally focus on two main aspects: the design of 𝒢̂ for high-dimensional problems, and sampling algorithms that can effectively explore Ω_F at a manageable computational cost. A systematic testing of any new solution approach is crucial since often there is no rigorous mathematical proof that can show when and how a new solution approach would work or break down. Still, testing a new solution approach with diverse examples can be beneficial in establishing its limitations and revealing where it needs improvement (e.g., [118,123]).

Two general approaches have been pursued to address the curse of dimensionality in density approximation methods. The first approach consists of the variants of the dimensionality reduction technique. The main idea is that the high-dimensional 𝐱 ∈ 𝒳 is embedded in a low-dimensional subspace of 𝒳 that should be identified. The low-dimensional subspace can be achieved through a projection or finding a small number of state variables that control P_F. For example, manifold learning techniques (e.g., [79,124–126]) offer a general means to identify such embedded low-dimensional subspaces, as do the results of importance measures in FORM. Once a low-dimensional subspace is identified, the usual density approximation methods would be applicable. The second approach consists of designing IS densities that are amenable to high-dimensional problems. This includes IS densities that are exclusively designed for high-dimensional problems, like the von Mises–Fisher mixture, and those that suffer less from the curse of dimensionality. Examples of the latter case include recent efforts that integrate marginal or factorized densities of state variables with a parametric dependence structure like Copula or Generative Adversarial Network (GAN) [53,127].

The current literature also includes several treatments for the curse of dimensionality in limit-state approximation methods. One idea is to focus on a few important state variables while accounting for the impacts of unimportant state variables on the statistics of g(𝐱) by treating the statistical information of g (e.g., mean and variance) as outputs for the surrogate [128,129]. Others (e.g., [130]) have explored the idea of decomposing high-dimensional problems into many low-dimensional ones and then developing multiple surrogates for the low-dimensional problems. Also, a combination of different surrogates can be used to address some of the challenges with high-dimensional 𝐱 (e.g., polynomial chaos and Gaussian process in [121,131]). Different approaches have also been proposed to build surrogates in a reduced latent space. For example, Principal Component Analysis (PCA) has been used to reduce the dimension of state variables for topology optimization, where a surrogate is built based on low-dimensional latent state variables [132]. Likewise, the so-called active subspace method has been used to find the direction with the strongest variability in g(𝐱) and then build a surrogate in a low-dimensional subspace [133].

The starting point of sampling algorithms and their ability to rapidly approach and explore Ω_F are determining factors in the computational cost of model training. Sampling algorithms usually waste computational resources, to different degrees, before reaching the important regions in 𝒳 that control the estimation of P_F. This issue is particularly pronounced in problems with a high-dimensional 𝐱 due to the curse of dimensionality. One idea to reduce the computational cost is to find such important regions through an optimization problem like that in FORM. The sampling algorithm can then take over from the identified important regions to explore Ω_F. Besides MCMC and rejection sampling, recent developments in machine learning like GAN can also be used to generate training data that populate Ω_F.
The attractive feature of GAN is that, based on existing samples, it can learn an implicit model of the underlying distribution and then use the generator (in GAN) to directly generate samples from the learned model instead of relying on the further use of stochastic sampling algorithms. It is also effective for learning high-dimensional distributions. It was recently used in [134] to generate samples in Subset Simulation for a high-dimensional problem. Dimensionality reduction techniques have also been explored to facilitate the stochastic sampling in high dimension (e.g., active subspace has been used to accelerate MCMC in high-dimensional space [135]).

For the density approximation methods, due to the way that the densities are established, they cannot consider the boundary of g(𝐱) or, equivalently, the bounds of the samples; thus, the established density generates many samples in the safe domain, which contribute to the relatively higher number of samples needed for the IS estimation of the failure probability. This limitation can be seen in the first two benchmark examples compared with the limit-state approximation method, which establishes an IS density that considers the boundary of g(𝐱). The performance of the sampling algorithm can be improved by taking into account the bounds of the samples, either through boundary correction techniques (e.g., the multivariate boundary corrected kernel density used in [136]) or through combination with a limit-state approximation method that can approximately check whether the samples generated by the IS density are failure samples or not.

7. Conclusions

This paper presented a critical review of importance sampling methods for reliability analysis. After presenting the mathematical formulation of the importance sampling for reliability analysis, the paper discussed failure probability estimators, their statistical properties, and computational complexities. Accordingly, a closed-form solution is derived for the optimal importance sampling density, leading to a zero-variance estimator for failure probability (i.e., only one sample is required to estimate the failure probability.) For the time-variant reliability analysis of dynamical systems, a closed-form solution is derived for the optimal stochastic control that induces the optimal importance sampling probability measure. However, the dependence of the optimal solution (i.e., optimal importance sampling density or stochastic control) on the failure probability prevents its implementation. The paper reviewed two classes of methods that try to approximate the optimal solution with a near-optimal importance sampling density or stochastic control that is implementable. The paper discussed two general competing sources of error in developing the importance sampling density. The first source, called the approximation error, is related to the search space's richness for approximating the optimal solution. The richer the search space, the smaller the approximation error. The second source, called the estimation error, is related to the learning algorithm that selects a search space member with the minimum distance from the optimal solution. The richness of the search space increases the likelihood of larger estimation errors. In the first class of approximation methods, the search space is defined by a family of parametric or nonparametric probability densities. The learning algorithm aims to find a member of the search space that has the minimum distance from the optimal solution. Typical distance measures include Kullback–Leibler divergence and mean squared error. The IS density of dynamical systems is also a mixture of standard Gaussian probability densities, each centered at a point defined by an associated control function. Each control function represents the most likely excitation that causes the system to out-cross the failure boundary at a specific time. The second class of approximation methods follows the optimal solution's functional form but replaces the computationally demanding limit-state function with a fast-to-evaluate surrogate. Thus, the search space is defined by the class of approximating surrogates (e.g., neural network, polynomial chaos, or Gaussian process.) The sampling algorithm that generates the required data for training the importance sampling density in the first class and the surrogate in the second class controls the computational cost. The paper evaluated the performance of methods from the two classes through benchmark problems. The obtained results indicate that the overall performance of the first class of methods mainly depends on the magnitude of the failure probability, whereas the performance of the second class of methods mainly depends on the general shape of the limit-state function and its smoothness. The obtained results also highlighted the prominent role of the defined density approximation space and selected sampling algorithms in addressing the curse of dimensionality in high-dimensional problems. Finally, the paper presented a summary of important challenges and future directions of IS methods.

References

[1] Der Kiureghian A. First-and second-order reliability methods. In: Nikolaidis E, Ghiocel DM, Singhal S, editors. Engineering design reliability handbook. FL: CRC Press Boca Raton; 2004, p. 465–94.
[2] Schuëller GI, Stix R. A critical appraisal of methods to determine failure probabilities. Struct Saf 1987;4:293–309.
[3] Engelund S, Rackwitz R. A benchmark study on importance sampling techniques in structural reliability. Struct Saf 1993;12:255–76.
[4] Ditlevsen O, Madsen HO. Structural reliability methods. New York, NY: Wiley; 1996.
[5] Tanaka H. Application of importance sampling method to time-dependent system reliability analyses using the Girsanov transformation. In: Shiraishi N, Shinozuka M, Wen Y-K, editors. Proceedings of the seventh international conference on structural safety and reliability. Rotterdam, Netherland: Balkema; 1997, p. 411–8.
[6] Srinivasan R. Importance sampling: Applications in communications and detection. Springer Science & Business Media; 2013.
[7] Owen AB. Monte Carlo theory, methods and examples. 2013.
[8] Ang GL, Ang AH-S, Tang WH. Optimal importance-sampling density estimator. J Eng Mech 1992;118:1146–63.
[9] Au S-K, Beck JL. A new adaptive importance sampling scheme for reliability calculations. Struct Saf 1999;21:135–58.
[10] Kurtz N, Song J. Cross-entropy-based adaptive importance sampling using Gaussian mixture. Struct Saf 2013;42:35–44.
[11] Bucher CG, Bourgund U. A fast and efficient response surface approach for structural reliability problems. Struct Saf 1990;7:57–66.
[12] Gayton N, Bourinet JM, Lemaire M. CQ2RS: A new statistical approach to the response surface method for reliability analysis. Struct Saf 2003;25:99–121.
[13] Allaix DL, Carbone VI. An improvement of the response surface method. Struct Saf 2011;33:165–72.
[14] Zhao W, Fan F, Wang W. Non-linear partial least squares response surface method for structural reliability analysis. Reliab Eng Syst Saf 2017;161:69–77.
[15] Guimarães H, Matos JC, Henriques AA. An innovative adaptive sparse response surface method for structural reliability analysis. Struct Saf 2018;73:12–28.
[16] Hurtado JE. Filtered importance sampling with support vector margin: A powerful method for structural reliability analysis. Struct Saf 2007;29:2–15.
[17] Bourinet J-M. Rare-event probability estimation with adaptive support vector regression surrogates. Reliab Eng Syst Saf 2016;150:210–21.
[18] Pan Q, Dias D. An efficient reliability method combining adaptive support vector machine and Monte Carlo simulation. Struct Saf 2017;67:85–95.
[19] Roy A, Manna R, Chakraborty S. Support vector regression based metamodeling for structural reliability analysis. Probab Eng Mech 2019;55:78–89.
[20] Papadrakakis M, Lagaros ND. Reliability-based structural optimization using neural networks and Monte Carlo simulation. Comput Methods Appl Mech Engrg 2002;191:3491–507.
[21] Deng J, Gu D, Li X, Yue ZQ. Structural reliability analysis for implicit performance functions using artificial neural network. Struct Saf 2005;27:25–48.
[22] Chojaczyk AA, Teixeira AP, Neves LC, Cardoso JB, Soares CG. Review and application of artificial neural networks models in reliability analysis of steel structures. Struct Saf 2015;52:78–89.
[23] Xiao N-C, Zuo MJ, Zhou C. A new adaptive sequential sampling method to construct surrogate models for efficient reliability analysis. Reliab Eng Syst Saf 2018;169:330–8.
[24] Dubourg V, Sudret B, Deheeger F. Metamodel-based importance sampling for structural reliability analysis. Probab Eng Mech 2013;33:47–57.
[25] Balesdent M, Morio J, Marzat J. Kriging-based adaptive importance sampling algorithms for rare event estimation. Struct Saf 2013;44:1–10.
[26] Zhao H, Yue Z, Liu Y, Gao Z, Zhang Y. An efficient reliability method combining adaptive importance sampling and Kriging metamodel. Appl Math Model 2015;39:1853–66.
[27] Li J, Xiu D. Evaluation of failure probability via surrogate models. J Comput Phys 2010;229:8966–80.

[28] Marelli S, Sudret B. An active-learning algorithm that combines sparse polynomial chaos expansions and bootstrap for structural reliability analysis. Struct Saf 2018;75:67–74.
[29] Gardoni P. Risk and reliability analysis: Theory and applications. In: Risk and reliability analysis. Springer International Publishing; 2017.
[30] Jia G, Tabandeh A, Gardoni P. A density extrapolation approach to estimate failure probabilities. Struct Saf 2021;93:102128.
[31] Robert C, Casella G. Monte Carlo statistical methods. New York, NY: Springer; 2004.
[32] Evans LC. An introduction to stochastic differential equations. Providence, RI: American Mathematical Society; 2013.
[33] Schuëller GI, editor. A state-of-the-art report on computational stochastic mechanics. Probab Eng Mech 1997;12:197–321.
[34] Singer H. Kolmogorov backward equations with singular diffusion matrices. FernUniversität in Hagen; 2019.
[35] Jia G, Gardoni P. Simulation-based approach for estimation of stochastic performances of deteriorating engineering systems. Probab Eng Mech 2018;52:28–39.
[36] Agapiou S, Papaspiliopoulos O, Sanz-Alonso D, Stuart AM. Importance sampling: Intrinsic dimension and computational cost. Statist Sci 2017;405–31.
[37] Loève M. Probability theory II. In: Graduate texts in mathematics. New York, NY: Springer-Verlag; 1978.
[38] Macke M, Bucher C. Importance sampling for randomly excited dynamical systems. J Sound Vib 2003;268:269–90.
[39] Ogawa J, Tanaka H. Importance sampling for stochastic systems under stationary noise having a specified power spectrum. Probab Eng Mech 2009;24:537–44.
[40] Kanjilal O, Manohar CS. Girsanov's transformation based variance reduced Monte Carlo simulation schemes for reliability estimation in nonlinear stochastic dynamics. J Comput Phys 2017;341:278–94.
[41] Girsanov IV. On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Theory Probab Appl 1960;5:285–301.
[42] Kappen HJ, Ruiz HC. Adaptive importance sampling for control and inference. J Stat Phys 2016;162:1244–66.
[43] Evans LC. Partial differential equations. American Mathematical Society; 2010.
[44] Milstein GN. Numerical integration of stochastic differential equations, volume 313. Dordrecht, Netherlands: Springer; 1994.
[45] Doucet A, De Freitas N, Gordon NJ, et al. Sequential Monte Carlo methods in practice, volume 1. Springer; 2001.
[46] Andrieu C, Doucet A, Holenstein R. Particle Markov chain Monte Carlo methods. J R Stat Soc Ser B Stat Methodol 2010;72:269–342.
[47] Adali T, Haykin S, editors. Adaptive signal processing: Next generation solutions, volume 55. John Wiley & Sons; 2010.
[48] Oh M-S, Berger JO. Adaptive importance sampling in Monte Carlo integration. J Stat Comput Simul 1992;41:143–68.
[49] Steele JM. The Cauchy-Schwarz master class: An introduction to the art of mathematical inequalities. Cambridge University Press; 2004.
[50] Joe H. Dependence modeling with copulas. Boca Raton, FL: CRC Press; 2014.
[51] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Advances in neural information processing systems. 2014, arXiv preprint arXiv:1406.2661.
[52] Dinh L, Sohl-Dickstein J, Bengio S. Density estimation using real NVP. 2016, arXiv preprint arXiv:1605.08803.
[53] Wan X, Wei S. Coupling the reduced-order model and the generative model for an importance sampling estimator. J Comput Phys 2020;408:109281.
[54] Cappé O, Douc R, Guillin A, Marin J-M, Robert CP. Adaptive importance sampling in general mixture classes. Stat Comput 2008;18:447–59.
[55] Cornuet J-M, Marin J-M, Mira A, Robert CP. Adaptive multiple importance sampling. Scand J Stat 2012;39:798–812.
[56] Dai H, Zhang H, Rasmussen KJR, Wang W. Wavelet density-based adaptive importance sampling method. Struct Saf 2015;52:161–9.
[57] Papaioannou I, Papadimitriou C, Straub D. Sequential importance sampling for structural reliability analysis. Struct Saf 2016;62:66–75.
[58] Wang Z, Song J. Cross-entropy-based adaptive importance sampling using von Mises-Fisher mixture for high dimensional reliability analysis. Struct Saf 2016;59:42–52.
[59] Dai H, Zhang H, Wang W. A new maximum entropy-based importance sampling for reliability analysis. Struct Saf 2016;63:71–80.
[60] Douc R, Guillin A, Marin J-M, Robert CP. Convergence of adaptive mixtures of importance sampling schemes. Ann Statist 2007;35:420–48.
[61] Marin J-M, Pudlo P, Sedki M. Consistency of the adaptive multiple importance sampling. 2012, arXiv preprint arXiv:1211.2548.
[62] Martino L, Elvira V, Luengo D, Corander J. An adaptive population importance sampler. In: 2014 IEEE international conference on acoustics, speech and signal processing. IEEE; 2014, p. 8038–42.
[63] Martino L, Elvira V, Luengo D, Corander J. An adaptive population importance sampler: Learning from uncertainty. IEEE Trans Signal Process 2015;63:4422–37.
[64] Bugallo MF, Martino L, Corander J. Adaptive importance sampling in signal processing. Digit Signal Process 2015;47:36–49.
[65] Bugallo MF, Elvira V, Martino L, Luengo D, Miguez J, Djuric PM. Adaptive importance sampling: The past, the present, and the future. IEEE Signal Process Mag 2017;34:60–79.
[66] Macke M. Variance reduction in Monte Carlo simulation of dynamic systems. In: Melchers RE, Stewart MG, editors. Proceedings of the eighth international conference on applications of statistics and probability. Rotterdam, Netherland: Balkema; 2000.
[67] Olsen AI, Naess A. An importance sampling procedure for estimating failure probabilities of non-linear dynamic systems subjected to random noise. Int J Non-Linear Mech 2007;42:848–63.
[68] Macke M, Harnpornchai N. Importance sampling of dynamic systems – A comparative study. In: Corotis R, Schuëller GI, Shinozuka M, editors. Proceedings of the eighth international conference on structural safety and reliability.
[69] Higham DJ. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Rev 2001;43:525–46.
[70] Koo H, Der Kiureghian A, Fujimura K. Design-point excitation for non-linear random vibrations. Probab Eng Mech 2005;20:136–47.
[71] Fujimura K, Der Kiureghian A. Tail-equivalent linearization method for nonlinear random vibration. Probab Eng Mech 2007;22:63–76.
[72] Bucher C. An importance sampling technique for randomly excited systems discretized by finite elements. In: Ko J-M, Xu Y-L, editors. Proceedings of the international conference on advances in structural dynamics, vol. 2. Amsterdam: Elsevier; 2000, p. 1135–42.
[73] Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning. Cambridge, MA: MIT press; 2016.
[74] Ledoux M. The concentration of measure phenomenon. American Mathematical Society; 2001.
[75] Katafygiotis LS, Zuev KM. Geometric insight into the challenges of solving high-dimensional reliability problems. Probab Eng Mech 2008;23:208–18.
[76] Bach FR, Jordan MI. Kernel independent component analysis. J Mach Learn Res 2002;3:1–48.
[77] Abdi H, Williams LJ. Principal component analysis. Wiley Interdiscip Rev Comput Stat 2010;2:433–59.
[78] Coifman RR, Lafon S. Diffusion maps. Appl Comput Harmon Anal 2006;21:5–30.
[79] Soize C, Ghanem R. Data-driven probability concentration and sampling on manifold. J Comput Phys 2016;321:242–58.
[80] Hjort NL, Holmes C, Müller P, Walker SG. Bayesian nonparametrics. Cambridge, UK: Cambridge University Press; 2010.
[81] Morio J. Extreme quantile estimation with nonparametric adaptive importance sampling. Simul Model Pract Theory 2012;27:76–89.
[82] Zhang J, Xiao M, Gao L, Chu S. A combined projection-outline-based active learning Kriging and adaptive importance sampling method for hybrid reliability analysis with small failure probabilities. Comput Methods Appl Mech Engrg 2019;344:13–33.
[83] ShangGuan D. A general purpose strategy for realizing the zero-variance importance sampling and calculating the unknown integration constant. J Comput Phys 2021;436:110311.
[84] Hajek B, Raginsky M. ECE 543: Statistical learning theory. Urbana, IL: University of Illinois at Urbana-Champaign; 2014.
[85] Ang GL. Kernel method in Monte Carlo importance sampling (Ph.D. thesis). Urbana, IL, USA: University of Illinois at Urbana-Champaign; 1991.
[86] Abramson IS. On bandwidth variation in kernel estimates – a square root law. Ann Statist 1982;10:1217–23.
[87] Cappé O, Guillin A, Marin J-M, Robert CP. Population Monte Carlo. J Comput Graph Statist 2004;13:907–29.
[88] Jia G, Taflanidis AA. Sample-based evaluation of global probabilistic sensitivity measures. Comput Struct 2014;144:103–18.
[89] Jia G, Taflanidis AA, Beck JL. A new adaptive rejection sampling method using kernel density approximations and its application to subset simulation. ASCE-ASME J Risk Uncertain Eng Syst A 2017;3:D4015001.
[90] Ma Y-A, Chen T, Fox EB. A complete recipe for stochastic gradient MCMC. In: Advances in neural information processing systems 28, NIPS 2015. 2015.
[91] Mou W, Ma Y-A, Wainwright MJ, Bartlett PL, Jordan MI. High-order Langevin diffusion yields an accelerated MCMC algorithm. J Mach Learn Res 2021;22:1–41.
[92] Neal RM. Handbook of Markov chain Monte Carlo. In: Brooks S, Gelman A, Jones G, Meng X-L, editors. Handbook of Markov Chain Monte Carlo, volume 2. Boca Raton, FL: CRC press; 2011, p. 113–62.
[93] Chen Y, Dwivedi R, Wainwright MJ, Yu B. Fast mixing of metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. J Mach Learn Res 2020;21:1–63.
[94] Echard B, Gayton N, Lemaire M. AK-MCS: An active learning reliability method combining Kriging and Monte Carlo simulation. Struct Saf 2011;33:145–54.
[95] Peherstorfer B, Cui T, Marzouk Y, Willcox K. Multifidelity importance sampling. Comput Methods Appl Mech Engrg 2016;300:490–509.
[96] Yun W, Lu Z, Jiang X. An efficient reliability analysis method combining adaptive Kriging and modified importance sampling for small failure probability. Struct Multidiscip Optim 2018;58:1383–93.

[97] Xiao N-C, Zhan H, Yuan K. A new reliability method for small failure probability problems by combining the adaptive importance sampling and surrogate models. Comput Methods Appl Mech Engrg 2020;372:113336.
[98] Cucker F, Zhou DX. Learning theory: An approximation theory viewpoint. Cambridge University Press; 2007.
[99] Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, MA: MIT Press; 2006.
[100] Kanagawa M, Hennig P, Sejdinovic D, Sriperumbudur BK. Gaussian processes and kernel methods: A review on connections and equivalences. 2018, arXiv preprint arXiv:1807.02582.
[101] Picheny V, Ginsbourger D, Roustant O, Haftka RT, Kim N-H. Adaptive designs of experiments for accurate approximation of a target region. J Mech Des 2010;132.
[102] Bect J, Ginsbourger D, Li L, Picheny V, Vazquez E. Sequential design of computer experiments for the estimation of a probability of failure. Stat Comput 2012;22:773–93.
[103] Sun Z, Wang J, Li R, Tong C. LIF: A new Kriging based learning function and its application to structural reliability analysis. Reliab Eng Syst Saf 2017;157:152–65.
[104] Teixeira R, Nogal M, O’Connor A. Adaptive approaches in metamodel-based reliability analysis: A review. Struct Saf 2021;89:102019.
[105] Breitung KW. Metaheuristics of failure probability estimation in high dimensions. In: Song J, editor. Proceedings of the 13th international conference on applications of statistics and probability in civil engineering.
[106] Kim J, Song J. Probability-adaptive Kriging in n-Ball (PAK-Bn) for reliability analysis. Struct Saf 2020;85:101924.
[107] Fukumizu K, Bach FR, Jordan MI, et al. Kernel dimension reduction in regression. Ann Statist 2009;37:1871–905.
[108] Fukumizu K, Leng C. Gradient-based kernel dimension reduction for regression. J Amer Statist Assoc 2014;109:359–70.
[109] Zuniga MM, Murangira A, Perdrizet T. Structural reliability assessment through surrogate based importance sampling with dimension reduction. Reliab Eng Syst Saf 2021;207:107289.
[110] Li J, Li J, Xiu D. An efficient surrogate-based method for computing rare failure probability. J Comput Phys 2011;230:8683–97.
[111] Dalbey KR, Swiler LP. Gaussian process adaptive importance sampling. Int J Uncertain Quantif 2014;4.
[112] Grigoriu M. Data-based importance sampling estimates for extreme events. J Comput Phys 2020;412:109429.
[113] Der Kiureghian A, Dakessian T. Multiple design points in first and second-order reliability. Struct Saf 1998;20:37–49.
[114] Geyer S, Papaioannou I, Straub D. Cross entropy-based importance sampling using Gaussian densities revisited. Struct Saf 2019;76:15–27.
[115] Marelli S, Sudret B. UQLab: A framework for uncertainty quantification in MATLAB. In: Vulnerability, uncertainty, and risk: Quantification, mitigation, and management, ICVRAM 2014. 2014, p. 2554–63.
[116] Au S-K, Beck JL. Estimation of small failure probabilities in high dimensions by subset simulation. Probab Eng Mech 2001;16:263–77.
[117] McKay MD, Beckman RJ, Conover WJ. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 1979;21.
[118] Breitung K. The geometry of limit state function graphs and subset simulation: Counterexamples. Reliab Eng Syst Saf 2019;182:98–106.
[119] Breitung K. SORM, design points, subset simulation, and Markov chain Monte Carlo. ASCE-ASME J Risk Uncertain Eng Syst A 2021;7:04021052.
[120] Au SK, Beck JL. Important sampling in high dimensions. Struct Saf 2003;25:139–63.
[121] Schöbi R, Sudret B, Marelli S. Rare event estimation using polynomial-chaos Kriging. ASCE-ASME J Risk Uncertain Eng Syst A 2017;3:D4016002.
[122] Shinozuka M, Deodatis G. Simulation of stochastic processes by spectral representation. Appl Mech Rev 1991;44:191–204.
[123] Rackwitz R. Reliability analysis—a review and some perspectives. Struct Saf 2001;23:365–95.
[124] Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc Natl Acad Sci 2005;102:7426–31.
[125] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313:504–7.
[126] Ma Y, Fu Y. Manifold learning theory and applications. Boca Raton, FL: CRC Press; 2012.
[127] Tang K, Wan X, Liao Q. Deep density estimation via invertible block-triangular mapping. Theor Appl Mech Lett 2020;10:143–8.
[128] Gidaris I, Taflanidis AA, Mavroeidis GP. Kriging metamodeling in seismic risk assessment based on stochastic ground motion models. Earthq Eng Struct Dyn 2015;44:2377–99.
[129] Taflanidis AA, Zhang J, Patsialis D. Applications of reduced order and surrogate modeling in structural dynamics. In: Model validation and uncertainty quantification, volume 3. Springer; 2020, p. 297–9.
[130] Micheli L, Alipour A, Laflamme S. Multiple-surrogate models for probabilistic performance assessment of wind-excited tall buildings under uncertainties. ASCE-ASME J Risk Uncertain Eng Syst A 2020;6:04020042.
[131] Schöbi R, Sudret B, Wiart J. Polynomial-chaos-based Kriging. Int J Uncertain Quantif 2015;5.
[132] Li M, Cheng Z, Jia G, Shi Z. Dimension reduction and surrogate based topology optimization of periodic structures. Compos Struct 2019;229:111385.
[133] Constantine PG, Dow E, Wang Q. Active subspace methods in theory and practice: Applications to Kriging surfaces. SIAM J Sci Comput 2014;36:A1500–24.
[134] Li M, Jia G, Cheng Z, Shi Z. Generative adversarial network guided topology optimization of periodic structures via subset simulation. Compos Struct 2021;260:113254.
[135] Constantine PG, Kent C, Bui-Thanh T. Accelerating Markov chain Monte Carlo with active subspaces. SIAM J Sci Comput 2016;38:A2779–805.
[136] Jia G, Taflanidis AA. Non-parametric stochastic subset optimization utilizing multivariate boundary kernels and adaptive stochastic sampling. Adv Eng Softw 2015;89:3–16.