
Accelerated subset simulation with neural networks

for reliability analysis

by
Edson David Cifuentes Gómez

Advisor
Dr. Techn. Diego Andrés Alvarez Marı́n

Submitted in partial fulfilment of


the requirements for the degree of
Civil Engineer

Department of Civil Engineering

Faculty of Engineering and Architecture
National University of Colombia at Manizales

Manizales, June 2016

Abstract
The present work aims to reduce the variance of the subset simulation algorithm using a neural network approach. To reach this goal, some of the steps followed when implementing subset simulation were modified so that the number of samples used increases without increasing the number of function evaluations; in this way the variance is expected to be reduced with no extra computational effort. For the sake of comparison, an example is developed using both the modified subset simulation and the conventional subset simulation. Finally, the work assesses whether the goal was achieved with the changes made to the subset simulation algorithm.

Introduction
Subset simulation is a useful and widely used tool for calculating failure probabilities in structural systems. It is an excellent alternative to crude Monte-Carlo simulation, the traditional simulation method for calculating failure probabilities. Monte-Carlo simulation gives very accurate results; however, it becomes computationally expensive when calculating small failure probabilities (Pf ≤ 10⁻³), because the number of evaluations of the system must increase in order to reach such a level of accuracy. To overcome this disadvantage, subset simulation is proposed: a probabilistic simulation method that computes such probabilities efficiently. Implementing subset simulation requires Markov Chain Monte Carlo (MCMC) methods, which are used to draw random samples from any probability distribution. Despite computing small failure probabilities in an efficient way, subset simulation has a problem: its variance is inversely proportional to the number of samples used in the simulation, and reducing it is difficult, since as the number of samples increases so does the number of function evaluations, making the simulation slower. For this reason, an approach using neural networks will be used. Neural networks are a useful machine learning tool that imitates the behavior of the brain using elements called neurons, which relate data and "learn" a function that fits the data as well as possible according to certain parameters. Finally, an example will be developed to show the reduction of the variance when the neural network is used to learn the function that defines each problem. Throughout this work, several concepts of probability theory will be used; the reader is referred to [1] for a better look at the probability concepts used here.

Monte-Carlo simulation
Monte-Carlo simulation solves a problem by directly simulating the physical process, so it is not necessary to find the solution of the equations that describe the problem. It is widely used in various fields of knowledge such as chemistry, physics, etc.
The failure probability estimate by crude Monte-Carlo simulation is

\hat{P}_f = \frac{1}{N} \sum_{k=1}^{N} I(\theta_k \in F) \qquad (1)

where θ_k, k = 1, 2, ..., N are i.i.d. samples drawn from the PDF of the random variable being analysed, which are used to evaluate the limit-state function g(θ_k). An analysis is required for each θ_k, aiming to find whether it belongs to the failure region or not; this is done by following these criteria:

Failure region: I(θ_k ∈ F) = 1, which implies that g(θ_k) ≤ 0

Safe region: I(θ_k ∈ F) = 0, which implies that g(θ_k) > 0
With the conditions stated for Eq. (1), it is clear that the failure probability can be computed as

\hat{P}_f = \frac{N_f}{N} \qquad (2)

where N_f is the number of samples that belong to the failure region. In this equation, as N tends to infinity, the estimator gets closer to the true failure probability.
The variance of the estimator in Eq. (2) can be calculated by assuming that each simulation cycle is a Bernoulli trial with probability P_f. Then the number of failures in N trials follows a binomial distribution; knowing this, the variance of the Monte-Carlo estimator can be calculated as

\mathrm{Var}(\hat{P}_f) = \mathrm{Var}\!\left(\frac{N_f}{N}\right) = \frac{1}{N^2}\,\mathrm{Var}(N_f) = \frac{N P_f (1 - P_f)}{N^2} = \frac{P_f (1 - P_f)}{N} \qquad (3)

The true value of P_f is not known, so the value of the estimator \hat{P}_f is a fair approximation, because its expected value is equal to P_f, that is,

E(\hat{P}_f) = E\!\left(\frac{N_f}{N}\right) = \frac{1}{N}\,E(N_f) = \frac{N P_f}{N} = P_f \qquad (4)

and finally

\mathrm{Var}(\hat{P}_f) = \frac{\hat{P}_f (1 - \hat{P}_f)}{N} \qquad (5)
Additionally, the MCS estimator in Eq. (1), computed using the i.i.d. samples θ_k, k = 1, 2, ..., N, is, as is well known, unbiased.
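As a quick numerical illustration of Eqs. (1), (2) and (5), the sketch below estimates a small failure probability for a hypothetical limit-state function g(θ) = 3 − θ with a standard-normal θ (these choices are assumptions for illustration only, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical limit-state function: failure when g(theta) <= 0
def g(theta):
    return 3.0 - theta

N = 100_000
theta = rng.standard_normal(N)       # i.i.d. samples from the PDF q
failures = g(theta) <= 0             # indicator I(theta_k in F), Eq. (1)

Pf_hat = failures.mean()             # Eq. (2): Nf / N
var_hat = Pf_hat * (1 - Pf_hat) / N  # Eq. (5): variance of the estimator

print(Pf_hat, var_hat)
```

For a standard-normal θ the true value is P(θ ≥ 3) ≈ 1.35×10⁻³, so only about a hundred of the 10⁵ samples fall in the failure region; this is precisely why crude Monte-Carlo becomes expensive for small P_f.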

Subset Simulation Algorithm
The subset simulation algorithm appeared as an alternative to crude Monte-Carlo simulation for calculating small failure probabilities. Its basic idea is to express the small failure probability as a product of larger conditional failure probabilities; this approach takes the problem of computing a very small probability, which is computationally expensive, and makes it much easier to solve by means of Monte-Carlo simulation.

The probability of failure is defined as

P_F = P(\theta \in F) = \int_{\Theta} I_F(\theta)\, q(\theta)\, d\theta \qquad (6)

where θ = [θ_1, ..., θ_n] ∈ Θ ⊂ Rⁿ represents an uncertain state of the system with probability density function (PDF) q; F denotes the failure region in the variable space Θ; and I_F is the indicator function: I_F(θ) = 1 if θ ∈ F and zero otherwise. The components of θ are assumed to be independent, i.e. q(\theta) = \prod_{j=1}^{n} q_j(\theta_j), where for every j, q_j is a one-dimensional PDF of θ_j.

Now, F will denote the failure region, which for convenience will be expressed in terms of a decreasing sequence of nested failure regions F_1 ⊃ F_2 ⊃ ... ⊃ F_m = F, so that F_k = ∩_{i=1}^{k} F_i, k = 1, 2, ..., m.
By the definition of conditional probability we have:

P_f = P(F_m) = P(F_1 \cap F_2 \cap \ldots \cap F_m) = P\!\left(\bigcap_{i=1}^{m} F_i\right) \qquad (7)

P_f = P\!\left(\bigcap_{i=1}^{m-1} F_i\right) P\!\left(F_m \,\Big|\, \bigcap_{i=1}^{m-1} F_i\right) = P(F_{m-1})\, P(F_m \mid F_{m-1}) \qquad (8)

P_f = P(F_{m-1})\, P(F_m \mid F_{m-1}) = \ldots = P(F_1) \prod_{i=2}^{m} P(F_i \mid F_{i-1}) \qquad (9)

Equation (9) shows that instead of calculating a small probability directly, the problem can be reduced to calculating the product of larger conditional failure probabilities.
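As a quick arithmetic check of Eq. (9) with hypothetical values: a failure probability of 10⁻³, which crude Monte-Carlo can only resolve with thousands of samples, factors into three conditional probabilities of 0.1 each, and each factor is cheap to estimate:

```python
# Hypothetical decomposition of a small failure probability, Eq. (9):
# Pf = P(F1) * P(F2|F1) * P(F3|F2), with each factor equal to p0.
p0 = 0.1      # conditional failure probability per level
m = 3         # number of failure levels
Pf = p0**m    # ~1e-3, up to floating-point rounding
print(Pf)
```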

After defining the failure probability, the next step is to choose the intermediate failure events; the Modified Metropolis algorithm must also be described in order to fully understand the algorithm.

In subset simulation, the failure domain of a system is represented as the exceedance of a demand D above some specified threshold level C; the intermediate failure regions can then be represented as [4]

F_i = \{ D > C_i \}

where 0 < C_1 < C_2 < ... < C_m = C form an increasing sequence of intermediate threshold values. How this sequence of values is chosen is of great importance in the development of subset simulation, for it affects the estimation of the intermediate failure probabilities.

These values cannot be very different from each other, since that would bring back the problem of calculating small probabilities. Conversely, if they are too similar to each other, a much greater number of levels would be needed to calculate the intermediate probabilities, which would be computationally expensive.

It is a difficult task to find optimal values for these intermediate thresholds. In order to get suitable estimations of the probability, these values will be chosen adaptively so that the probability at each level equals a common specified value p0. According to several researchers, the optimal value of p0 lies in the interval [0.1, 0.3].

With a value for p0, the intermediate threshold value C1 that defines F1 can be obtained as the p0-percentile of the N evaluations of the limit-state function, sorted in descending order. It should be mentioned that p0 and N are chosen so that N p0 is an integer.

The intermediate thresholds depend on the conditional sampling and thus vary in each simulation run. It should be noted that P(F_i | F_{i-1}) is not exactly equal to p0, since it is an estimator of the conditional probability; knowing this, N should be large enough that the variability of C_i, and thus the error in P(F_i | F_{i-1}) ≈ p0, is kept small.
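The adaptive choice of C1 described above can be sketched as follows, using illustrative demand values (the standard-normal distribution is an assumption; in practice each D value comes from one evaluation of the system):

```python
import numpy as np

rng = np.random.default_rng(1)

N, p0 = 1000, 0.1                 # chosen so that N*p0 is an integer
D = rng.standard_normal(N)        # illustrative demand values D(theta)

D_desc = np.sort(D)[::-1]         # evaluations sorted in descending order
C1 = D_desc[int(N * p0) - 1]      # p0-percentile: the (N*p0)-th largest demand

seeds = D[D >= C1]                # samples defining F1 = {D > C1}
print(C1, len(seeds))             # exactly N*p0 seeds (no ties for continuous D)
```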

As said before, for a full understanding of subset simulation, the Modified Metropolis Algorithm (MMA) must be explained.

This is a Markov Chain Monte-Carlo (MCMC) technique that can draw samples from a given probability distribution. The significance of the MMA lies in the fact that it can generate samples from a conditional distribution q(· | F_i), and it guarantees that the generated samples will be distributed as q(· | F_i). The theory of Markov Chain Monte-Carlo methods is very extensive and is not limited to what is shown here; for a better understanding of this theory, the reader is referred to [3].

The MMA proceeds as follows:

"p*_j(ξ | θ) is called the 'proposal PDF' for every j = 1, ..., n, and is a one-dimensional PDF for ξ centered at θ with the symmetry property p*_j(ξ | θ) = p*_j(θ | ξ). From a given sample θ, generate a sequence of samples {θ_1, θ_2, ...} by computing θ_{k+1} from θ_k = [θ_k(1), ..., θ_k(n)], k = 1, 2, ...

1. Generate a candidate state θ̃: for each component j = 1, ..., n, simulate ξ_j from p*_j(· | θ_k(j)). Compute the ratio r_j = q_j(ξ_j)/q_j(θ_k(j)). Set θ̃(j) = ξ_j with probability min{1, r_j} and set θ̃(j) = θ_k(j) with the remaining probability 1 − min{1, r_j}.

2. Accept/reject θ̃: check the location of θ̃. If θ̃ ∈ F_i, accept it as the next sample, i.e. θ_{k+1} = θ̃; otherwise reject it and take the current sample as the next sample, i.e. θ_{k+1} = θ_k." [4]
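A minimal sketch of one MMA transition for a two-dimensional standard-normal q with a uniform random-walk proposal; the proposal width and the intermediate failure region F_i used below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def q_pdf(x):
    """One-dimensional target PDF q_j: standard normal (illustrative)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def mma_step(theta, in_Fi, width=1.0):
    """One Modified Metropolis transition from sample theta (1-D array)."""
    candidate = theta.copy()
    for j in range(len(theta)):
        xi = theta[j] + rng.uniform(-width, width)     # symmetric proposal p*_j
        r = q_pdf(xi) / q_pdf(theta[j])                # component-wise ratio r_j
        if rng.uniform() < min(1.0, r):
            candidate[j] = xi                          # accept the component
    # Step 2: accept the whole candidate only if it lies in F_i
    return candidate if in_Fi(candidate) else theta

# Illustrative conditional level F_i = {theta(1) > 1}
in_Fi = lambda t: t[0] > 1.0
chain = [np.array([1.5, 0.0])]                         # seed already in F_i
for _ in range(100):
    chain.append(mma_step(chain[-1], in_Fi))
chain = np.array(chain)
print(chain[:, 0].min())  # every state of the chain remains in F_i
```

The rejection in step 2 is what keeps every state of the chain inside the conditional level, so the chain samples from q(· | F_i).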

With the most important aspects of the algorithm mentioned, the algorithm itself can now be presented.

The subset simulation method proceeds as follows:

1. Define rv as the number of variables involved in the problem, N as the number of random samples that will be generated per level, and p0 as the conditional failure probability.

2. Generate N random samples by Monte-Carlo sampling from the marginal PDF of each variable involved in the problem.

3. Evaluate the generated samples in the limit-state function, and sort these evaluations in descending order.

4. Find the threshold value as the p0-percentile of the N evaluations.

5. Define the intermediate failure level as the samples whose evaluations exceed the threshold value, i.e. those lying in F_i.

6. Using the MMA, generate 1/p0 samples with each sample of the intermediate failure level as seed; this way the next intermediate failure level will again have N samples.

7. Return to step 3 and repeat until the threshold value exceeds the target demand level, that is, C_m > C, where m is the last intermediate failure level.

8. Compute the failure probability as P_f = p_0^{m-1} (N_f / N), where N_f is the number of samples that lie in the final level F_m.

9. Return P_f.
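The steps above can be assembled into a compact sketch for a single standard-normal variable. The demand D(θ) = θ, target threshold C = 3, uniform random-walk proposal, and parameter values are all illustrative assumptions, and for brevity the component-wise MMA step is collapsed into a one-dimensional Metropolis move:

```python
import numpy as np

rng = np.random.default_rng(3)

def subset_simulation(demand, C, N=1000, p0=0.1, width=1.0):
    """Estimate Pf = P(demand(theta) > C) for a standard-normal theta."""
    phi = lambda x: np.exp(-0.5 * x**2)        # unnormalised standard-normal PDF
    theta = rng.standard_normal(N)             # step 2: initial Monte-Carlo samples
    Pf_partial = 1.0                           # accumulates p0^(m-1)
    for _ in range(50):                        # safety cap on the number of levels
        D = demand(theta)                      # step 3: evaluate all samples
        order = np.argsort(D)[::-1]            # sort evaluations in descending order
        n_seed = int(N * p0)
        Ci = D[order[n_seed - 1]]              # step 4: p0-percentile threshold
        if Ci >= C:                            # step 7: threshold reached the target
            return Pf_partial * np.mean(D > C) # step 8: Pf = p0^(m-1) * Nf/N
        seeds = theta[order[:n_seed]]          # step 5: samples exceeding Ci
        Pf_partial *= p0
        samples = []                           # step 6: grow each seed with the MMA
        for x in seeds:
            for _ in range(int(1 / p0)):
                xi = x + rng.uniform(-width, width)
                accept = rng.uniform() < min(1.0, phi(xi) / phi(x))
                if accept and demand(np.array([xi]))[0] > Ci:
                    x = xi                     # stay inside F_i = {D > Ci}
                samples.append(x)
        theta = np.array(samples)
    return Pf_partial * np.mean(demand(theta) > C)

# Illustrative problem: D(theta) = theta, C = 3, so Pf ~ P(theta > 3) ~ 1.35e-3
Pf = subset_simulation(lambda t: t, C=3.0)
print(Pf)
```

With only a few thousand limit-state evaluations in total, crude Monte-Carlo would see just a handful of failures here; the level-wise decomposition is what makes the small probability reachable.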

It should be noted that the computational efficiency of the subset simulation algorithm, compared with Monte-Carlo simulation, increases as the complexity of the problem increases, i.e. as the value of the failure probability decreases; this is because the computational effort depends strongly on the logarithm of P(F) [3].

Despite being very useful for calculating small failure probabilities, the variance of subset simulation depends strongly on the number of samples in each intermediate failure level, which is likely to be small because the algorithm seeks to estimate the failure probability with the smallest possible number of samples. Increasing the number of samples in an attempt to reduce the variance of the estimator would make the algorithm slow, because the entire system has to be evaluated in order to calculate the failure probability; moreover, if the system to be analyzed is highly complex, it would take a very long time to calculate the failure probability with a large number of samples per intermediate failure level. To solve this problem, the neural network approach is proposed and explained next.

Subset simulation enhanced with Neural Networks
As stated previously, the variance of subset simulation decreases as the number of samples used in each level increases, but the problem cannot be solved simply by increasing the number of samples, because the number of evaluations of the system increases as well.
Neural networks are machine learning algorithms that can learn the behavior of a system given pairs of input-output variables, through a procedure called training. There are several training procedures and several neural network algorithms, but they will not be explained here, as that is not the goal of this work; the reader is referred to [2] for a better understanding of this useful tool.

The interest in the neural network approach is that it can learn the behavior of a function from given input-output pairs, which is very advantageous for the subset simulation algorithm: the neural network can learn the behavior of the analysed system, making necessary only one set of function evaluations per conditional level, from which the neural network will learn the behavior of the system. By doing so, the number of evaluations of the system is strongly reduced, because the generated samples are evaluated in the neural network instead of the limit-state function, solving the main problem (in terms of system evaluations) of subset simulation.

By following this idea, the number of samples can be increased considerably, with little or no extra computational effort; in this way the variance is expected to be reduced, as the number of samples in each conditional level can be increased.

The subset simulation algorithm was modified in order to use the neural network approach; the modified algorithm proceeds as follows.

1. Define rv as the number of variables involved in the problem, N_MC as the initial number of Monte-Carlo samples, R as a positive integer so that R·N_MC will be the number of samples at each intermediate failure level, and p0 as the conditional failure probability at each conditional level.
2. Generate N_MC Monte-Carlo samples and evaluate them in the limit-state function.
3. With the samples as the input set and the evaluations as the output set, train a neural network.
4. Generate (R − 1)·N_MC Monte-Carlo samples and evaluate them in the trained neural network.
5. With the evaluations of steps 2 and 4, create a vector, sort it in descending order, and find the threshold value as the p0-percentile of the R·N_MC evaluations.
6. From the samples of the next intermediate failure level, select every R-th sample; these will be called group A, and the remaining samples group B. After the MMA generation below, group A will give rise to N_MC samples for each variable and group B to (R − 1)·N_MC samples for each variable.
7. Take group A from the previous step and, with the MMA procedure, generate 1/p0 samples with each sample as a seed; it should be noted that here the MMA uses the limit-state function to evaluate the samples.

8. Take the samples generated in step 7 and evaluate them in the limit-state function.

9. Train a neural network with the samples generated in step 7 as the input set and the evaluations made in step 8 as the output set.

10. Take group B from step 6 and, with the MMA procedure, generate 1/p0 samples with each sample as a seed; this time the MMA must use the neural network trained in step 9 to evaluate the generated samples.

11. Evaluate the samples generated in step 10 in the neural network trained in step 9.

12. Take the evaluations made in steps 8 and 11, create a vector, sort it in descending order, and find the threshold value as the p0-percentile of the R·N_MC evaluations.

13. Return to step 6.

14. Repeat until the threshold value exceeds the maximum demand.
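The heart of steps 2-4 (and, at later levels, steps 9-11), training a surrogate on a small set of limit-state evaluations and then evaluating the cheap extra samples on it, can be sketched with a minimal one-hidden-layer network written from scratch. The architecture, learning rate, and limit-state function g below are illustrative assumptions; a real implementation would use an established neural-network library:

```python
import numpy as np

rng = np.random.default_rng(4)

def train_nn(X, y, hidden=16, lr=0.05, epochs=3000):
    """Train a tiny one-hidden-layer regression net by full-batch gradient descent."""
    n = X.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(n), (n, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)); b2 = np.zeros(1)
    m = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                     # hidden layer
        err = (H @ W2 + b2).ravel() - y              # residual of the squared loss
        gW2 = H.T @ err[:, None] / m
        gb2 = err.mean(keepdims=True)
        dH = (err[:, None] @ W2.T) * (1.0 - H**2)    # backprop through tanh
        gW1 = X.T @ dH / m
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xq: (np.tanh(Xq @ W1 + b1) @ W2 + b2).ravel()

# Step 2: N_MC expensive limit-state evaluations (g is an illustrative choice)
g = lambda x: 3.0 - x.ravel()
N_MC, R = 200, 5
X_train = rng.standard_normal((N_MC, 1))
y_train = g(X_train)

# Step 3: train the surrogate on the input/output pairs
g_hat = train_nn(X_train, y_train)

# Step 4: (R-1)*N_MC extra samples evaluated on the cheap surrogate,
# with no additional limit-state evaluations
X_extra = rng.standard_normal(((R - 1) * N_MC, 1))
y_extra = g_hat(X_extra)
print(np.abs(g_hat(X_train) - y_train).mean())  # surrogate training error
```

The expensive system is evaluated only N_MC times, while R·N_MC evaluations are available for the percentile estimate; this is exactly the mechanism the modified algorithm exploits at every conditional level.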

With this modification made, the next thing to do is to compare the results of the modified subset simulation algorithm with those of the conventional subset simulation.

Example
The analysed function is f(x) = 3 − x, so the maximum demand is 3 and g(x) = x. Each algorithm was run 5 times and the results are shown next.

This is the result for the modified subset simulation algorithm.

The mean of the failure probabilities is on the horizontal axis. As seen in the graphic, most of the probability values tend to be near 1.3×10⁻³; it is also seen that a few values of the failure probability tend to be near 1.7×10⁻³. The time in which the 5 simulations were run was approximately 10 minutes.

Now the result for the conventional subset simulation algorithm:

The mean of the failure probabilities is on the horizontal axis. As shown in the graphic, the conventional subset simulation gives very different results in each run; almost every one of the calculated failure probabilities can be seen individually. The time in which the 5 simulations were run was approximately 0.15 seconds.

Conclusions
1. As the results showed, the modified subset simulation reduces to some degree the variance of the estimator of the failure probability.

2. The complexity of the model plays an important role in the implementation of subset simulation; this was seen in the implementation of the example: while the modified subset simulation took several minutes to complete every simulation, the conventional subset simulation needed a dramatically smaller time.

Bibliography
[1] Kottegoda, N. T. & Rosso, R. (2008). Applied statistics for civil and environmental engineers. New York, United States of America: McGraw-Hill.

[2] Bishop, C. M. (2006). Pattern recognition and machine learning (pp. 225-284). Singapore: Springer.

[3] Uribe, F. (2011). Implementation of simulation methods in structural reliability (Bachelor thesis). National University of Colombia, Manizales, Colombia.

[4] Au, S. K. & Beck, J. (2001). Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4), pp. 263-277. ISSN 0266-8920. doi:10.1016/S0266-8920(01)00019-4.

