
# Nuclear Engineering and Design 77 (1984) 49-62

North-Holland, Amsterdam

MONTE CARLO SIMULATION OF MARKOV UNRELIABILITY MODELS

E.E. LEWIS and Franz BÖHM *
Department of Mechanical and Nuclear Engineering, The Technological Institute, Northwestern University, Evanston, Illinois
60201, USA
Received 9 June 1983

A Monte Carlo method is formulated for the evaluation of the unreliability of complex systems with known component
failure and repair rates. The formulation is in terms of a Markov process allowing dependencies between components to be
modeled and computational efficiencies to be achieved in the Monte Carlo simulation. Two variance reduction techniques,
forced transition and failure biasing, are employed to increase computational efficiency of the random walk procedure. For an
example problem these result in improved computational efficiency by more than three orders of magnitude over analog
Monte Carlo. The method is generalized to treat problems with distributed failure and repair rate data, and a batching
technique is introduced and shown to result in substantial increases in computational efficiency for an example problem. A
method for separating the variance due to the data uncertainty from that due to the finite number of random walks is
presented.

1. Introduction
Fault tree methodologies [1-3] are widely employed
in probabilistic risk assessment of nuclear reactors. Reactor shutdown, emergency core cooling and other safety
systems require such low failure probabilities that sufficient reliability estimates often cannot be made from
operating experience or system test data. Fault trees
provide a method for estimating system reliability
parameters in terms of more easily obtainable data for
component failure and repair rates. For large fault trees
computer analysis is needed both to express the logical
structure of the tree in terms of minimal cut sets, and
for the quantitative evaluation of the system unreliability or unavailability. In what follows we shall consider
Markov Monte Carlo methods for the evaluation of
system unreliability.
Early use of Monte Carlo techniques was made for
the quantitative evaluation of fault trees [4,5]. While
some effort has continued in the use of purely Monte
Carlo methods, they have largely been supplanted by deterministic techniques often referred to as Kinetic Tree
methods [5-9]. Two limitations, however, present themselves in the use of Kinetic Tree methods.
* Current address: Institut für Kerntechnik und Energiesysteme, Universität Stuttgart, Pfaffenwaldring 32, 7000 Stuttgart 80, Fed. Rep. Germany.

First, in Kinetic Tree methods the reliability characteristics of each component are modeled separately.
To evaluate the fault tree by combining component
failure probabilities, the components are assumed to
behave independently of one another. In fact, dependencies often arise from common mode failures, from the
increased stress in partially disabled systems, and from
a variety of errors in testing, maintenance and repair.
Due to this limitation of the Kinetic Tree formulation there is increasing use of Markov models for reliability analysis [2,3,10-12], for with such models quite
general dependencies between components may be
treated. For systems with more than a few components,
however, Markov analysis by deterministic means becomes a prodigious task. For even while innovative
methods have been employed to reduce the complexity
of the computations [10-12], the fact remains that one
must solve a set of $2^I$ coupled first-order differential
equations, where $I$ is the number of components. Thus
even a system with only ten components will result in a
system of over one thousand coupled equations with a
transition matrix with over a million elements. Moreover, if some of the components are repairable, the
equations are likely to be quite stiff, requiring that very
small time steps be used in the numerical integration.
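
The growth in problem size quoted above follows from counting two states (operating or failed) per component. As a quick check, the Python sketch below tabulates the number of system states and the number of elements in a full transition matrix. The helper name `markov_problem_size` is hypothetical and is used only for illustration; it is not part of the original work.

```python
def markov_problem_size(n_components):
    """Return (number of system states, number of transition-matrix elements).

    Assumes each component is either operating or failed, so a system of
    I components has 2**I states and a full (dense) 2**I x 2**I matrix.
    """
    n_states = 2 ** n_components
    return n_states, n_states ** 2


if __name__ == "__main__":
    for n in (5, 10, 20):
        states, elements = markov_problem_size(n)
        print(f"I = {n:2d}: {states} states, {elements} matrix elements")
```

For ten components this gives 1024 states and 1,048,576 matrix elements, in line with the figures quoted in the text.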
A second limitation on Kinetic Tree methods is a
result of the lack of precision to which the component
failure and repair rates are normally known. A means is

0029-5493/84/\$03.00 © Elsevier Science Publishers B.V.
(North-Holland Physics Publishing Division)

required to determine the variance in the result of the fault tree analysis in terms of the variance of the failure and repair rate data from which the component characteristics are calculated. Invariably this is accomplished by Monte Carlo sampling of the failure rate data using log-normal or other distributions [13-18]. The fault tree is evaluated deterministically with data from each data sampling, and the mean, variance and other characteristics of the system are estimated. A similar procedure is also applied to Markov models [10], requiring that the solution of the coupled set of differential equations be repeated thousands of times.

What follows is the formulation of a class of Monte Carlo methods which provides a natural framework for the treatment of both component dependencies and data uncertainties. In section 2 we formulate Monte Carlo simulation of the unreliability of systems with repairable components within the framework of a Markov process. This approach retains the power of deterministic Markov methods in modeling component dependencies that would not be possible if direct Monte Carlo simulation were to be carried out. At the same time the Monte Carlo simulation requires very little computer memory. In section 3, variance reduction techniques, similar to those that have been highly developed for neutral particle transport calculations [19-22], are applied to greatly increase the computational efficiency of Monte Carlo reliability calculations. In section 4 the Monte Carlo formulation is generalized to include probability distributions that represent the uncertainty in the component failure and repair rate data. The variance in the result is then due to two causes: the finite number of random walk simulations, and the uncertainty in the data. A batching technique is introduced and is shown to further reduce that part of the variance due to the finite number of random walks without a commensurate increase in computing effort.

2. Analog Markov Monte Carlo

2.1. Markov formulation

To formulate a Markov process [23,24] for system failure in a form suitable for Monte Carlo simulation of the random walks, we assume a system consisting of $I$ components, each of which may be either operating or failed. There are then $2^I$ system states arising from all possible combinations of operating and failed components. Certain combinations of component failures correspond to system failure. Let $\Omega$ represent the set of all possible system states, and let $\Gamma$ be the set of all system failure states.

The equations governing system failure are constructed from two probability density functions. Let

$$f(t|t',k') = \text{probability density that the system will make a state transition at } t \text{ given that it is in state } k' \text{ at time } t' \ (t' \le t), \tag{1}$$

$$q(k|k') = \text{probability that the system will enter state } k \text{ following a transition out of state } k'. \tag{2}$$

Two classes of states may be considered, absorbing and nonabsorbing [23]. For nonabsorbing states the probability densities are normalized by

$$\int_{t'}^{\infty} dt\, f(t|t',k') = 1, \tag{3}$$

$$\sum_{k \in \Omega} q(k|k') = 1. \tag{4}$$

Since all transitions by definition must cause a change in system state, for nonabsorbing states $q(k'|k') = 0$. An absorbing state is one out of which no transitions can be made. Consequently, for absorbing states

$$f(t|t',k') = 0 \quad \text{and} \quad q(k|k') = \delta_{kk'}. \tag{5}$$

In this study we consider only system unreliability: once the system fails it remains in the failed state. Thus the set of absorbing states is identical to $\Gamma$, the set of failed states. If system unavailability were to be considered, there would be failed states that are not absorbing states, for repairs could be made on the failed system.

In the numerical examples that follow only time-independent failure and repair rates are considered. Under these restrictions the transition probability density may be written as

$$f(t|t',k') = \gamma_{k'} \exp[-\gamma_{k'}(t - t')], \tag{6}$$

where $0 < \gamma_{k'} < \infty$ for nonabsorbing states and $\gamma_{k'} = 0$ for absorbing states. We refer to $\gamma_k$ as the transition rate, since it may include both failure and repair rates.

2.2. Integral equations

The Markov process is governed by a set of integral equations which we now derive. Although they are equivalent, these equations are not in the same form used in deterministic calculations. In deterministic calculations the equations are cast in terms of $P_k(t)$, the probability that the system occupies state $k$ at time $t$ [10]. For Monte Carlo simulation we are interested in

$$\psi_k^n(t) = \text{probability density that the system arrives in state } k \text{ at time } t \text{ after the } n\text{th transition}, \tag{7}$$

and in the probability density for arrival in state $k$:

$$\psi_k(t) = \sum_{n=0}^{\infty} \psi_k^n(t). \tag{8}$$

The problem is initialized by placing the system in state $k_0 = 0$ at $t_0 = 0$. Hence

$$\psi_k^0(t) = \delta_{k0}\,\delta(t). \tag{9}$$

A recursive relationship for the states for which $n > 0$ follows immediately from the definitions of the probability densities:

$$\psi_k^n(t) = \sum_{k'} q(k|k') \int_0^t dt'\, f(t|t',k')\, \psi_{k'}^{n-1}(t'). \tag{10}$$

The integral equation for $\psi_k(t)$ is obtained by combining eqs. (8) through (10):

$$\psi_k(t) = \delta_{k0}\,\delta(t) + \sum_{k'} q(k|k') \int_0^t dt'\, f(t|t',k')\, \psi_{k'}(t'). \tag{11}$$

In order to show the relationship between the solution of this set of integral equations and the Monte Carlo simulation it is useful to write the Von Neumann series solution for $\psi_k^n(t)$ and $\psi_k(t)$. From eqs. (9) and (10) we have

$$
\begin{aligned}
\psi_k^0(t) &= \delta_{k0}\,\delta(t),\\
\psi_k^1(t) &= q(k|0)\, f(t|0,0),\\
\psi_k^2(t) &= \sum_{k_1} \int_0^t dt_1\, q(k|k_1)\, q(k_1|0)\, f(t|t_1,k_1)\, f(t_1|0,0),\\
\psi_k^3(t) &= \sum_{k_2} \sum_{k_1} \int_0^t dt_2 \int_0^{t_2} dt_1\, q(k|k_2)\, q(k_2|k_1)\, q(k_1|0)\, f(t|t_2,k_2)\, f(t_2|t_1,k_1)\, f(t_1|0,0),
\end{aligned}
\tag{12}
$$

and so on. For higher terms we may write these sums of multiple integrals more compactly by defining a random walk in which $n$ state transitions are made by transition time and state vectors $\mathbf{t}_n$ and $\mathbf{k}_n$:

$$\mathbf{t}_n = \{t_1, t_2, \ldots, t_n\}^T, \tag{13}$$

$$\mathbf{k}_n = \{k_1, k_2, \ldots, k_n\}^T. \tag{14}$$

Let

$$\sum_{\mathbf{k}_n} \equiv \sum_{k_n} \cdots \sum_{k_2} \sum_{k_1} \tag{15}$$

and

$$\int d\mathbf{t}_n \equiv \int_0^t dt_n \int_0^{t_n} dt_{n-1} \cdots \int_0^{t_2} dt_1. \tag{16}$$

Then generalizing eq. (12) we may write

$$\psi_k^{n+1}(t) = \sum_{\mathbf{k}_n} \int d\mathbf{t}_n\, q(k|\mathbf{k}_n)\, f(t|\mathbf{t}_n,\mathbf{k}_n), \qquad n > 0, \tag{17}$$

where the probability densities for a particular random walk, specified by the components of $\mathbf{t}_n$ and $\mathbf{k}_n$, are

$$q(k|\mathbf{k}_n) = q(k|k_n) \prod_{n'=1}^{n} q(k_{n'}|k_{n'-1}), \tag{18}$$

$$f(t|\mathbf{t}_n,\mathbf{k}_n) = f(t|t_n,k_n) \prod_{n'=1}^{n} f(t_{n'}|t_{n'-1},k_{n'-1}), \tag{19}$$

where $t_0 = 0$ and $k_0 = 0$. Note that the probability density $\psi_k^n(t)$ consists of the sum of contributions from random walks with transitions at $t_1, t_2, \ldots, t_{n-1}$ and passing through states $k_1, k_2, \ldots, k_{n-1}$.

Finally, as in eq. (8), the probability density for the system entering a state $k$ at $t$ is determined by summing over random walks of all lengths. We may write the Von Neumann expansion as follows:

$$\psi_k(t) = \delta_{k0}\,\delta(t) + \sum_{n=0}^{\infty} \sum_{\mathbf{k}_n} \int d\mathbf{t}_n\, q(k|\mathbf{k}_n)\, f(t|\mathbf{t}_n,\mathbf{k}_n), \tag{20}$$

where in the $n = 0$ term the sum and integral over $\mathbf{k}_n$ and $\mathbf{t}_n$ are replaced by unity.

Monte Carlo simulations of the integral equations may be used to estimate weighted integrals of the form

$$\bar{I} = \sum_k \int dt\, z_k(t)\, \psi_k(t). \tag{21}$$

In particular, the unreliability is just the probability that the system will enter a failed state during $0 < t < T$, where $T$ is the design life. Thus the unreliability is given by

$$\bar{u} = \sum_{k \in \Gamma} \int_0^T dt\, \psi_k(t). \tag{22}$$

2.3. Analog simulation

The analog Monte Carlo simulation of the foregoing Markov process is straightforward. It is shown schematically for one random walk in fig. 1. The walk is initialized at time zero. To sample time intervals $\Delta t$ between transitions, one first finds the cumulative distribution corresponding to eq. (6):

$$\int_{t'}^{t'+\Delta t} dt\, f(t|t',k') = 1 - \exp\{-\gamma_{k'} \Delta t\}. \tag{23}$$
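
Because the cumulative distribution of the transition time is a simple exponential form, it can be inverted in closed form, which keeps the time sampling cheap. The Python sketch below illustrates this for a single transition rate. The function name `sample_dt` and the numerical values are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_dt(gamma, rng):
    """Sample the time to the next transition for transition rate gamma.

    Sets the exponential cumulative distribution 1 - exp(-gamma * dt)
    equal to a uniform random number and solves for dt.
    """
    xi = rng.random()                      # uniform on [0, 1)
    return -math.log(1.0 - xi) / gamma     # 1 - xi lies in (0, 1], so log is finite


if __name__ == "__main__":
    rng = random.Random(12345)
    gamma = 0.5                            # hypothetical transition rate
    samples = [sample_dt(gamma, rng) for _ in range(100_000)]
    mean = sum(samples) / len(samples)
    print(f"sample mean {mean:.3f}, expected 1/gamma = {1.0 / gamma:.3f}")
```

The sample mean should approach $1/\gamma$, the mean of the exponential distribution, as the number of samples grows.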

We sample $\Delta t$ using the inverse transform method [24]. A random number $\xi$ is sampled from a uniform distribution $0 < \xi \le 1$, and set equal to the cumulative distribution on the left. Inverting the equation then yields

$$\Delta t = -\frac{1}{\gamma_{k'}} \ln \xi, \tag{24}$$

where we have utilized the fact that $\xi$ and $1 - \xi$ have the same probability densities. To sample for the resulting state $k$, one chooses a second uniformly distributed random number, say $\xi'$, and chooses the state which satisfies

$$\sum_{k''=0}^{k} q(k''|k') < \xi' \le \sum_{k''=0}^{k+1} q(k''|k'). \tag{25}$$

Each random walk is terminated when $t$ exceeds the design life $T$, or when a failed state is reached.

Fig. 1. Procedure for one random walk (i.e., trial) with design life $T$.

Since the unreliability is given by eq. (22), the analog sampling is binomial, with 1 for failure and 0 for nonfailure. Thus if $M$ random walks are carried out and $m$ of these are found to result in system failure, the sample mean unreliability is

$$\hat{u} = m/M. \tag{26}$$

To estimate the variance of the result we first note that with the general weight function given in eq. (21) the variance of $z_k$ is given by [21]

$$\sigma^2 = \sum_k \int dt\, z_k^2(t)\, \psi_k(t) - \bar{I}^2. \tag{27}$$

For the unreliability defined by eq. (22), however, this reduces to the binomial form:

$$\sigma^2 = \bar{u}(1 - \bar{u}), \tag{28}$$

which may be approximated by the sample variance

$$S^2 = \hat{u}(1 - \hat{u}). \tag{29}$$

Finally, using the central limit theorem, for a large number of histories we find the variance of $\hat{u}$ to be

$$\sigma^2(\hat{u}) = \frac{1}{M}\, \bar{u}(1 - \bar{u}), \tag{30}$$

or using the sample variance

$$\sigma^2(\hat{u}) \approx \frac{1}{M}\, \hat{u}(1 - \hat{u}). \tag{31}$$

In contrast to deterministic methods, Monte Carlo simulation of the foregoing Markov process is greatly facilitated by small computer memory requirements. It is unnecessary to store the large transition matrix required in deterministic methods, or even to enumerate the $2^I$ possible states for a system consisting of $I$ components. Rather, the necessary coefficients may be calculated "on the fly" as the simulation proceeds from the $2I$ component failure and repair rates along with a nominal amount of data representing common mode failures and other dependencies.

One of a number of standard notations may be employed to include dependencies in the failure and repair rates. For clarity we consider the case of no component dependencies. Consider a system of $I$ components, each component with a failure rate $\lambda_i$ and a repair rate $\mu_i$. Initially the system is assumed to have no failed components. Hence the initial value of the transition rate is

$$\gamma_0 = \sum_{i=1}^{I} \lambda_i. \tag{32}$$

Succeeding transition rates are determined by subtracting $\lambda_i$ and adding $\mu_i$ for each failure and adding $\lambda_i$ and subtracting $\mu_i$ for each repair of component $i$. Failed state determination is made by comparing the set of failed components to the minimal cut sets [1] after each transition. After the first transition only the singlets in the cut sets need to be considered, and after the second transition only singlets or singlets and doublets, depending on the number of failed components, and so on. We express $\gamma$ as

$$\gamma = \lambda + \mu, \tag{33}$$

where

$$\lambda = \sum_{i \in U} \lambda_i \tag{34}$$

and

$$\mu = \sum_{i \in F} \mu_i. \tag{35}$$

Here $U$ and $F$ are the sets of unfailed and failed components, respectively. Determination of which component has failed or been repaired, and thereby of the new state of the system, is carried out as follows. A random number $\xi'$ is first generated. If $\gamma\xi' < \lambda$, the failed component is determined from

$$\sum_{\substack{i'=1 \\ i' \in U}}^{i} \lambda_{i'} \le \gamma\xi' \le \sum_{\substack{i'=1 \\ i' \in U}}^{i+1} \lambda_{i'}. \tag{36}$$

If $\gamma\xi' > \lambda$, the repaired component is determined from

$$\lambda + \sum_{\substack{i'=1 \\ i' \in F}}^{i} \mu_{i'} < \gamma\xi' \le \lambda + \sum_{\substack{i'=1 \\ i' \in F}}^{i+1} \mu_{i'}. \tag{37}$$

This procedure may be further speeded up by using rejection sampling [21] on the failure rates.

2.4. Figure of merit

The standard criterion for judging the efficiency of Monte Carlo methods is the figure of merit [24,25]: $1/(\sigma^2 t)$, where $\sigma^2$ is the variance of the sampled distribution and $t$ is the time per trial (i.e., random walk). It is instructive to compare the figure of merit of the Markov Monte Carlo formulation to that of the more traditional formulation [4,5], realizing that with the traditional model dependencies between components are not readily simulated.

Consider a system with a substantial number of components and a small unreliability. Since both traditional and Markov simulation are binomial, the variance will be the same. In traditional simulation, however, each trial must consist, at a minimum, of a sampling of the time to first failure of every component, even though most components do not fail within the design life $T$. In contrast, for systems with small unreliability the majority of Markov trials will consist of only one sampling of eq. (24), from which it is determined that no transitions take place before $t = T$. Thus the average time per trial is much shorter for the Markov process. Consequently the Markov formulation results in the larger figure of merit.

3. Variance reduction methods

While Markov Monte Carlo may offer substantial improvements in modeling and computational efficiency over traditional analog Monte Carlo methods, it is still likely to be expensive for the simulation of highly reliable systems. For if $\sigma(\hat{u})/\hat{u}$ is used as a measure of relative error, then for $\bar{u} \ll 1$ we have from eqs. (26) and (31)

$$\frac{\sigma(\hat{u})}{\hat{u}} \approx \frac{1}{(\hat{u} M)^{1/2}}. \tag{38}$$

Even though the time per trial may be small, very large numbers of trials are required. Fortunately, Markov Monte Carlo lends itself well to a number of powerful variance reduction techniques that are similar to those in Monte Carlo neutral particle transport calculations. We next consider these.

Broadly defined, variance reduction methods modify analog Monte Carlo simulations in order to increase the figure of merit, $1/(\sigma^2 t)$, while maintaining an unbiased estimate of the expectation value. Ordinarily a substantial reduction in $\sigma^2$ is achieved, more than compensating for any increase that may result in the time per trial [19-21]. Some techniques, such as Russian Roulette, however, achieve increased figures of merit by decreasing the time per history even though some increase in $\sigma^2$ may result. Here two techniques are considered, which we shall refer to as forced transition and transition biasing. Both of these may be considered within the following framework.

3.1. Biased random walk formulation

Suppose we consider a modified or "biased" random walk where the probability density $f(t|t',k')$ and the discrete probabilities $q(k|k')$, both defined above, are replaced respectively by the modified distributions $\hat{f}(t|t',k')$ and $\hat{q}(k|k')$. We require that $\hat{f}$ and $\hat{q}$ also satisfy eqs. (3) and (4), and that $\hat{f} > 0$ for values of $t$, $t'$, and $k'$ for which $f > 0$, and similarly if $q(k|k') > 0$ then $\hat{q}(k|k') > 0$. Likewise we may define

$$\hat{\psi}_k^n(t) = \text{modified density that the system arrives in state } k \text{ at time } t \text{ after the } n\text{th transition}, \tag{39}$$

and similar to eqs. (8) and (9)

$$\hat{\psi}_k(t) = \sum_{n=0}^{\infty} \hat{\psi}_k^n(t) \tag{40}$$

and

$$\hat{\psi}_k^0(t) = \delta_{k0}\,\delta(t). \tag{41}$$

Following the same logic as for analog Markov Monte Carlo we may show that $\hat{\psi}_k(t)$ satisfies integral equations analogous to eqs. (10) and (11):

$$\hat{\psi}_k^n(t) = \sum_{k'} \hat{q}(k|k') \int_0^t dt'\, \hat{f}(t|t',k')\, \hat{\psi}_{k'}^{n-1}(t'), \tag{42}$$

$$\hat{\psi}_k(t) = \delta_{k0}\,\delta(t) + \sum_{k'} \hat{q}(k|k') \int_0^t dt'\, \hat{f}(t|t',k')\, \hat{\psi}_{k'}(t'). \tag{43}$$

Suppose a Markov Monte Carlo is carried out with $\hat{f}(t|t',k')$ and $\hat{q}(k|k')$ to sample $\hat{\psi}_k(t)$. We require a relationship from which $\psi_k(t)$ can be estimated, given a sampling of $\hat{\psi}_k(t)$. To do this we define a weight $w_k^n(t)$ such that

$$\psi_k^n(t) = w_k^n(t)\, \hat{\psi}_k^n(t). \tag{44}$$

Substituting this expression into eq. (10) we obtain

$$w_k^n(t)\, \hat{\psi}_k^n(t) = \sum_{k'} q(k|k') \int_0^t dt'\, f(t|t',k')\, w_{k'}^{n-1}(t')\, \hat{\psi}_{k'}^{n-1}(t'). \tag{45}$$

Then if eq. (44) is substituted into eq. (9), requiring eq. (41) to hold yields the initial condition

$$w_k^0(t) = 1. \tag{46}$$

However $\hat{\psi}_k^n(t)$ must also satisfy eq. (42). This condition is met provided the weights satisfy the recursive relationship

$$w_k^n(t) = \frac{q(k|k')\, f(t|t',k')}{\hat{q}(k|k')\, \hat{f}(t|t',k')}\; w_{k'}^{n-1}(t'). \tag{47}$$

From this expression it is clear that the weighting of the $\hat{\psi}_k^n(t)$ sampling is a function of the successive steps in the random walk defined by $\mathbf{t}_n$ and $\mathbf{k}_n$. Combining this expression with eq. (46), the weight for such a random walk is

$$u(t,k|\mathbf{t}_n,\mathbf{k}_n) = \frac{q(k|\mathbf{k}_n)\, f(t|\mathbf{t}_n,\mathbf{k}_n)}{\hat{q}(k|\mathbf{k}_n)\, \hat{f}(t|\mathbf{t}_n,\mathbf{k}_n)}, \tag{48}$$

where, analogous to eqs. (18) and (19), we have defined

$$\hat{q}(k|\mathbf{k}_n) = \hat{q}(k|k_n) \prod_{n'=1}^{n} \hat{q}(k_{n'}|k_{n'-1}), \tag{49}$$

$$\hat{f}(t|\mathbf{t}_n,\mathbf{k}_n) = \hat{f}(t|t_n,k_n) \prod_{n'=1}^{n} \hat{f}(t_{n'}|t_{n'-1},k_{n'-1}). \tag{50}$$

With the foregoing weight definitions we may insert eqs. (45) and (48) into eq. (20), to obtain the solution for $\psi_k(t)$ in terms of $\hat{f}$ and $\hat{q}$:

$$\psi_k(t) = \delta_{k0}\,\delta(t) + \sum_{n=0}^{\infty} \sum_{\mathbf{k}_n} \int d\mathbf{t}_n\, u(t,k|\mathbf{t}_n,\mathbf{k}_n)\, \hat{q}(k|\mathbf{k}_n)\, \hat{f}(t|\mathbf{t}_n,\mathbf{k}_n). \tag{51}$$

3.2. Implementation

With the foregoing equations we may construct a biased Monte Carlo simulation in which $\hat{f}$ and $\hat{q}$ instead of $f$ and $q$ are sampled, while still obtaining an unbiased estimate of the unreliability. This is done by associating a weight with each random walk. The weight is initialized to one, and then adjusted to correct for the bias at each sampling. From eq. (22) the unreliability may be expressed in terms of $\hat{f}$ and $\hat{q}$ as

$$\bar{u} = \sum_{k \in \Gamma} \sum_{n=0}^{\infty} \sum_{\mathbf{k}_n} \int_0^T dt \int d\mathbf{t}_n\, u(t,k|\mathbf{t}_n,\mathbf{k}_n)\, \hat{q}(k|\mathbf{k}_n)\, \hat{f}(t|\mathbf{t}_n,\mathbf{k}_n). \tag{53}$$

Since the unreliability is now expressed as an average of $u(t,k|\mathbf{t}_n,\mathbf{k}_n)$ over those biased random walks leading to system failure, the variance of $u$ becomes

$$\sigma^2 = \sum_{k \in \Gamma} \sum_{n=0}^{\infty} \sum_{\mathbf{k}_n} \int_0^T dt \int d\mathbf{t}_n\, \left[ u(t,k|\mathbf{t}_n,\mathbf{k}_n) - \bar{u} \right]^2 \hat{q}(k|\mathbf{k}_n)\, \hat{f}(t|\mathbf{t}_n,\mathbf{k}_n). \tag{54}$$

Observe that as $\hat{f} \to f$ and $\hat{q} \to q$, the weight of the trial will go to one, and eqs. (53) and (55) revert to the binomial sampling expressions. The estimate of the unreliability for $M$ random walks is now

$$\hat{u} = \frac{1}{M} \sum_{m \in F} u_m, \tag{55}$$

where $u_m$ here denotes the weight $u(t,k|\mathbf{t}_n,\mathbf{k}_n)$ of the $m$th random walk at the time a failed state is entered, and the sum runs over those random walks that end in system failure. Likewise the variance of the random walk contributions about $\bar{u}$, given by eq. (54), may be estimated from the sample variance:

$$S^2 = M^{-1} \sum_{m \in F} (u_m - \hat{u})^2. \tag{56}$$
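
The role of the weight can be seen in the simplest possible case: a single transition that leads to failure with a small analog probability `q` but is sampled with a larger biased probability `q_hat`. The Python sketch below is a toy illustration under that assumption, not the authors' code. The function name `weighted_estimate` is hypothetical, and the sample variance is computed over all `M` walks with zero weight assigned to walks that do not fail, which is a convention assumed here.

```python
import random


def weighted_estimate(q, q_hat, n_walks, seed=0):
    """Estimate a failure probability q by sampling failures with probability q_hat.

    Each walk that fails carries the weight q / q_hat, which corrects for the
    biased sampling; walks that do not fail contribute zero.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_walks):
        if rng.random() < q_hat:            # biased sampling of the failure transition
            weights.append(q / q_hat)       # weight corrects for the bias
    u_hat = sum(weights) / n_walks          # average weight over all walks
    # Sample variance over all walks (zero-weight walks included; an assumption).
    s2 = sum(w * w for w in weights) / n_walks - u_hat ** 2
    return u_hat, s2


if __name__ == "__main__":
    q = 1.0e-3                              # hypothetical analog failure probability
    for q_hat in (q, 0.5):                  # analog sampling, then biased sampling
        u_hat, s2 = weighted_estimate(q, q_hat, 10_000, seed=1)
        print(f"q_hat = {q_hat:g}: estimate {u_hat:.3e}, sample variance {s2:.3e}")
```

When `q_hat` equals `q` every weight is one and the sample variance reduces to the binomial form $\hat{u}(1 - \hat{u})$. With the larger `q_hat` many more walks contribute, each with a small weight, and the sample variance drops sharply.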

Then from the central limit theorem the variance of the unreliability estimate $\hat{u}$ is given by

$$\sigma(\hat{u})^2 = S^2/M. \tag{57}$$

The object of the biasing is to derive Monte Carlo methods with reduced variance. This can be accomplished most directly by causing more trials to contribute to the result than in the binomial case. For as this happens there will be fewer zero weight trials, but those trials that do contribute will have weights much smaller than one, leading to smaller values of the bracketed term in eq. (54) and thus in eq. (56). In what follows we utilize two such variance reduction techniques.

3.2.1. Forced transitions

For the forced transition method we take

$$
\hat{f}(t|t',k') =
\begin{cases}
\dfrac{\gamma\, e^{-\gamma(t-t')}}{1 - e^{-\gamma(T-t')}}, & t' \le t \le T,\\[2ex]
0, & \text{otherwise}.
\end{cases}
\tag{58}
$$

We then multiply the trial weight by

$$\frac{f(t|t',k')}{\hat{f}(t|t',k')} = 1 - e^{-\gamma(T-t')}, \qquad t' < t < T. \tag{59}$$

This technique is only applied when $\gamma$ is sufficiently small that $\gamma(T - t')$ is not large, and therefore there is only a small chance that an additional transition will take place before the end of design life. When no failed components are present often $\gamma(T - t') \ll 1$ and therefore the rare event approximation may be used. This is accomplished by expanding the foregoing exponentials to yield

$$\hat{f}(t|t',k') = \frac{1}{T - t'}, \tag{60}$$

and hence, to maintain an unbiased result,

$$\frac{f(t|t',k')}{\hat{f}(t|t',k')} = \gamma(T - t'). \tag{61}$$

When failed components are present $\gamma$ is usually sufficiently large that there is no significant gain in variance if we take simply $\hat{f} = f$.

3.2.2. Failure biasing

The second type of variance reduction that is employed we shall refer to as failure biasing. The notion here is to force the system toward states with more component failures and fewer repaired components in order to bring it closer to system failure. The need for such biasing is amplified by the fact that for most component types repair rates are much larger than failure rates.

Suppose for a particular system state $k'$ we divide all possible transitions $k' \to k$ into two classes: the transition $k \in A$ corresponds to an additional component failure, while $k \in R$ corresponds to a component repair. Hence for nonabsorbing states

$$\sum_{k \in A} q(k|k') + \sum_{k \in R} q(k|k') = 1. \tag{62}$$

Note that if there are no dependences, then the two sums in this expression have numbers of terms equal to the number of unfailed components and to the number of failed components, respectively. Since repair rates are much larger than failure rates the second sum normally dominates for systems with any failed components. Thus if analog sampling is used most failures are followed by repair at the next transition, leading to very few system failures even though the forced transition method is being applied.

To bias the transitions toward additional failures, we define a parameter $x$ such that

$$\sum_{k \in A} \hat{q}(k|k') = x, \tag{63}$$

and hence

$$\sum_{k \in R} \hat{q}(k|k') = 1 - x. \tag{64}$$

Thus in the nonanalog random walk a fraction $x$ of the transitions is forced to result in additional component failures. Within the class of failure transitions, the $\hat{q}(k|k')$ are determined by increasing $q(k|k')$ by the same multiplier:

$$\hat{q}(k|k') = \frac{x\, q(k|k')}{\sum_{k'' \in A} q(k''|k')}, \qquad k \in A, \tag{65}$$

and

$$\hat{q}(k|k') = \frac{(1 - x)\, q(k|k')}{\sum_{k'' \in R} q(k''|k')}, \qquad k \in R. \tag{66}$$

With such a transition the weight of the trial is multiplied by

$$\frac{q(k|k')}{\hat{q}(k|k')} = \frac{1}{x} \sum_{k'' \in A} q(k''|k') \tag{67}$$

for component failures, and by

$$\frac{q(k|k')}{\hat{q}(k|k')} = \frac{1}{1 - x} \sum_{k'' \in R} q(k''|k') \tag{68}$$

for repair.
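
The two techniques can be combined in a single weighted random walk. The Python sketch below is a minimal illustration under stated assumptions, not the authors' PDP 11/44 code: components are independent with constant rates, the forced transition uses the exact truncated exponential rather than the rare event approximation and is applied only when no component is failed, and when only one class of transition is possible the biasing fraction is set to 1 or 0 (an assumption, since the text does not cover that case). All function names and the two-component example are hypothetical.

```python
import math
import random


def pick(items, rates, rng):
    """Choose one index from items with probability proportional to its rate."""
    target = rng.random() * sum(rates[i] for i in items)
    acc = 0.0
    for i in items:
        acc += rates[i]
        if target < acc:
            return i
    return items[-1]                      # guard against floating-point round-off


def biased_walk(lam, mu, cut_sets, T, x, rng):
    """One nonanalog random walk; returns its weight if the system fails, else 0."""
    failed, t, w = set(), 0.0, 1.0
    while True:
        up = [i for i in range(len(lam)) if i not in failed and lam[i] > 0.0]
        down = [i for i in sorted(failed) if mu[i] > 0.0]
        lam_sum = sum(lam[i] for i in up)           # total failure rate
        mu_sum = sum(mu[i] for i in down)           # total repair rate
        gamma = lam_sum + mu_sum                    # transition rate
        if gamma == 0.0:
            return 0.0                              # no further transitions possible
        if not failed:
            # Forced transition: sample from the exponential truncated to (t, T).
            p = -math.expm1(-gamma * (T - t))       # 1 - exp(-gamma (T - t))
            t += -math.log1p(-rng.random() * p) / gamma
            w *= p
        else:
            # Analog time sampling when failed components are present.
            t += -math.log(1.0 - rng.random()) / gamma
            if t > T:
                return 0.0
        # Failure biasing: a fraction x_eff of transitions is forced to be failures.
        x_eff = 1.0 if not down else (0.0 if not up else x)
        if rng.random() < x_eff:
            failed.add(pick(up, lam, rng))
            w *= lam_sum / (gamma * x_eff)
        else:
            failed.remove(pick(down, mu, rng))
            w *= mu_sum / (gamma * (1.0 - x_eff))
        if any(cs <= failed for cs in cut_sets):
            return w


def estimate_unreliability(lam, mu, cut_sets, T, x, n_walks, seed=0):
    """Return the weighted unreliability estimate and its sample variance."""
    rng = random.Random(seed)
    ws = [biased_walk(lam, mu, cut_sets, T, x, rng) for _ in range(n_walks)]
    u_hat = sum(ws) / n_walks
    s2 = sum(v * v for v in ws) / n_walks - u_hat ** 2   # zero weights included
    return u_hat, s2


if __name__ == "__main__":
    # Hypothetical example: two identical repairable components in parallel.
    lam, mu = [0.01, 0.01], [1.0, 1.0]
    u_hat, s2 = estimate_unreliability(lam, mu, [frozenset({0, 1})], 10.0, 0.5, 20_000)
    print(f"unreliability estimate {u_hat:.4e}, standard error {math.sqrt(s2 / 20_000):.1e}")
```

For this small example the result can be checked against the closed-form solution of the three-state Markov chain, which is a convenient sanity test before applying the same walk to larger systems.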

As in analog Monte Carlo there is no need to store all of the $\hat{q}(k|k')$ in the computer, for in the absence of component dependencies only the failure and repair rates of the components are required. Suppose that in state $k'$ before the transition eqs. (34) and (35) are the sums of the failure and repair rates for the unfailed and failed components respectively, and $\gamma$ is again given by eq. (33). The sampling proceeds as follows: Choose a random number $\xi'$ and compare it to $x$. If $\xi' < x$, the failure is of the component $i$ for which

$$\sum_{\substack{i'=1 \\ i' \in U}}^{i} \lambda_{i'} \le \frac{\xi'}{x}\, \lambda \le \sum_{\substack{i'=1 \\ i' \in U}}^{i+1} \lambda_{i'} \tag{69}$$

and the weight of the trial is multiplied by

$$\frac{q(k|k')}{\hat{q}(k|k')} = \frac{\lambda}{x\,\gamma}. \tag{70}$$

If $\xi' > x$, the newly repaired component $i$ is determined from

$$\sum_{\substack{i'=1 \\ i' \in F}}^{i} \mu_{i'} \le \frac{\xi' - x}{1 - x}\, \mu \le \sum_{\substack{i'=1 \\ i' \in F}}^{i+1} \mu_{i'} \tag{71}$$

and the weight is multiplied by

$$\frac{q(k|k')}{\hat{q}(k|k')} = \frac{\mu}{(1 - x)\,\gamma}. \tag{72}$$

The foregoing forced transition and failure biasing techniques lead to large improvements in the figure of merit for systems with small unreliabilities and repairable components. This may be seen from the numerical results that follow.

Fig. 2. Fault tree for example problem.

3.3. Code structure

Two versions of a Markov process Monte Carlo code have been written for a PDP 11/44 minicomputer. Both have the logical arrangement shown in fig. 1. The analog and nonanalog versions differ in the methods for determining the time to transition, the system state following the transition, and the tallies. The input consists of failure and repair rates for each component along with the minimal cut sets for the system. The module for determining whether system failure has occurred compares the current system state to the minimal cut sets. This is carried out by testing only

those cut sets that are possible for the current number of component failures: singlets for one failure, singlets and doublets for two failures and so on. By defining a number of component failures beyond which the system is assumed to be failed the cut set testing can be truncated at the price of a conservative overestimate of the system unreliability.

3.4. Illustrative example

The characteristics of the variance reduction techniques are nicely illustrated by a problem very similar to that published by Vesely [5]. The fault tree is shown in fig. 2 and the failure and repair rate data is given in table 1. The problem is a severe test of the Monte Carlo in that the unreliability is very small. Both analog and nonanalog (with $x = 0.85$) methods have been applied to the problem, and the results are compared in columns a and b in table 2. A very large number of trials was required for the analog run in order to obtain statistically significant results. The most important result is the increase of well over three orders of magnitude of the figure of merit, $1/(S^2 t)$, when the nonanalog method is employed. Although the time per history is nominally larger, the great reduction in the variance is the prominent feature of the nonanalog method.

Table 1. Data for example problem: failure rates $\lambda_i$ ($10^{-5}$ h$^{-1}$) and repair rates $\mu_i$ (h$^{-1}$) for components $i = 1, \ldots, 10$.

Table 2. Comparison of Monte Carlo unreliability results

| Quantity | a: Analog (point data) | b: Nonanalog (point data) | c: Nonanalog (distributed data) |
|---|---|---|---|
| Trials $M$ | 1,000,000 | 10,000 | 10,000 |
| Time/trial $t$ (ms) | 0.61 | 6.86 | 119.4 |
| Sample variance $S^2$ | $0.50 \times 10^{-4}$ | $0.24 \times 10^{-8}$ | $0.18 \times 10^{-7}$ |
| Figure of merit $1/(S^2 t)$ | $3.26 \times 10^{7}$ | $6.16 \times 10^{10}$ | $4.53 \times 10^{8}$ |
| Unreliability | $0.50 \times 10^{-4}$ | $0.444 \times 10^{-4}$ | $0.673 \times 10^{-4}$ |
| Uncertainty $\pm$ | | $0.0049 \times 10^{-4}$ | $0.0136 \times 10^{-4}$ |

The reason for the variance reduction in the nonanalog method may be illustrated quite graphically [25] by considering the density distribution of weights in the tallies. Suppose we let

$$f(u) = \text{probability density of a failed system weight } u \text{ resulting from a Monte Carlo random walk}. \tag{73}$$

Written in terms of $f(u)$ the unreliability is

$$\bar{u} = \int du\, u\, f(u), \tag{74}$$

and the variance of $u$ is

$$\sigma^2 = \int du\, (u - \bar{u})^2 f(u). \tag{75}$$

Since according to the central limit theorem the variance in the estimate $\hat{u}$ about $\bar{u}$ is $\sigma^2/M$, where $M$ is the number of random walks, bunching the weights about $\bar{u}$ leads to small variance. In analog Monte Carlo the sampling is binomial,

yielding

$$f(u) = (1 - \bar{u})\,\delta(u) + \bar{u}\,\delta(u - 1), \tag{76}$$

which is consistent with the results given in eqs. (28) and (30). In the nonanalog Monte Carlo the distribution takes on the form

$$f(u) = (1 - c)\,\delta(u) + c\, g(u), \tag{77}$$

where

$$\int du\, g(u) = 1, \tag{78}$$

$$\bar{u} = c \int du\, u\, g(u), \tag{79}$$

$$\sigma^2 = (1 - c)\,\bar{u}^2 + c \int du\, (u - \bar{u})^2 g(u). \tag{80}$$

Here $c$ is just the fraction of trials that terminate in failed states and thus contribute to the unreliability tally. The effectiveness of the nonanalog modification of the game depends first on the degree to which the fraction of random walks which contribute to the tally can be increased over the analog value $\bar{u}$, and second, the degree to which the weights are bunched about $\bar{u}$. For the nonanalog sample of 10000 random walks the sample value of $c$ is found to be increased from $\bar{u} = 0.5 \times 10^{-4}$ to $c = 0.997$, since only 30 random walks do not contribute to the nonanalog results. The nonzero weights are clustered about the value of $\bar{u}$, as indicated by the histogram of $f(u)$ shown in fig. 3. These features lead to the small variance observed in table 2 when the variance reduction techniques are applied.

Fig. 3. Approximate representation of $f(u)$ (horizontal axis: weight, $\times 10^{-5}$).

4. Monte Carlo with data uncertainties

It has been assumed implicitly in the foregoing analysis that the failure and repair rates $\lambda_i$ and $\mu_i$ for each of the components are known with perfect precision. In actuality these data most often have substantial uncertainties associated with them [1]. Accordingly, to carry out realistic unreliability analysis, it is necessary to represent failure and repair data as random variables characterized by probability density functions. In this section we perform this task as follows. First the input (i.e., failure and repair rate) data is expressed as probability density functions. Then a Monte Carlo random walk procedure is generalized to take the uncertainty of the data into account. This leads naturally to the idea of batching as well as to some insight as to the relative contributions to the variance of the system unreliability made by the random walk simulation, on the one hand, and by the data uncertainty on the other.

4.1. Data distributions

To begin, we let $\nu$ represent a vector whose components are the $\lambda_i$ and $\mu_i$ of each component of a system. Thus the probabilities of the preceding section are conditioned on the data $\nu$. In particular the probability density given by eq. (73) is expressed in terms of $\nu$ as

$$f(u|\nu) = \text{probability density of a failed system weight } u \text{ resulting from a Monte Carlo random walk, given the data } \nu. \tag{81}$$

Consequently, the unreliability and variance given by eqs. (74) and (75) also become dependent on $\nu$. We indicate this by a subscript $\nu$:

$$\bar{u}_\nu = \int du\, u\, f(u|\nu), \tag{82}$$

$$\sigma_\nu^2 = \int du\, [u - \bar{u}_\nu]^2 f(u|\nu). \tag{83}$$

To estimate the unreliability $\bar{u}$ in the presence of

To estimate the unreliability $\bar{u}$ in the presence of data uncertainty, we define for each piece of data $\nu_i$

$$ f(\nu_i) = \text{probability density of } \nu_i. \tag{84} $$

We then define

$$ f(\nu) = \text{probability density of } \nu. \tag{85} $$

If the uncertainties in each $\nu_i$ are independent,

$$ f(\nu) = f(\nu_1)\, f(\nu_2) \cdots. \tag{86} $$

In the presence of data uncertainty the unreliability is given by

$$ \bar{u} = \int du \int d\nu\; u\, f(u|\nu)\, f(\nu), \tag{87} $$

and the variance of the associated random walk is

$$ \sigma^2 = \int du \int d\nu\; (u - \bar{u})^2 f(u|\nu)\, f(\nu), \tag{88} $$

where we have used the convention

$$ \int d\nu = \int d\nu_1 \int d\nu_2 \cdots. \tag{89} $$

4.2. Illustrative example

To carry out the Monte Carlo simulation, the PDP 11/44 code discussed earlier was generalized so that at the beginning of each random walk the failure and repair rates are chosen from probability density functions. At present each of these is sampled from independent log-normal distributions, with the spread in data characterized by a factor c as detailed in the Appendix. Generalization to other distributions or to dependencies between component data is straightforward.

To examine the effects of data uncertainties the problem described in the preceding section was reconsidered using the data in table 1 as the distribution mean values, and with c = 3 for all failure and repair rates. The nonanalog results with point and distributed data are compared in columns b and c of table 2. It is apparent that both the time per random walk, t, and the sample variance of the result are larger for distributed than for point data. The reasons for this may be understood as follows. The time per random walk is substantially larger because a sampling of the log-normal distributions must be carried out at the beginning of each random walk. That the variance should be larger is clear if we perform some algebra to write eq. (88) in the more instructive form

$$ \sigma^2 = \int d\nu\, f(\nu)\, \sigma_\nu^2 + \int d\nu\, f(\nu)\, (\bar{u}_\nu - \bar{u})^2, \tag{90} $$

where $\bar{u}_\nu$ and $\sigma_\nu^2$ with fixed data are given by eqs. (82) and (83). The variance is thus made up of two contributions. The first is due to the variance of the random walk procedure with fixed data, while the second is due to the data uncertainty. If there is no data uncertainty, the second term vanishes. This is seen by taking

$$ f(\nu) = \delta(\nu - \nu_0), \tag{91} $$

in which case $\sigma^2 = \sigma_{\nu_0}^2$. This is just the result obtained in the preceding section. Conversely, if the Monte Carlo simulation of the random walk is replaced by an analytical calculation of $\bar{u}_\nu$, then $\sigma_\nu = 0$ and only the second term remains. This corresponds to the situation where Kinetic Tree methods are combined with Monte Carlo sampling of failure and repair rate data.

4.3. Random walk batching

Substantial improvements in the results given in table 2 have been found to result if, instead of performing just one random walk for each set of data, a batch of M random walks is run for each of N batches. This is illustrated by the plot of the figure of merit, $1/(S^2 t)$, vs batch size shown in fig. 4. In batch calculations the unreliability is estimated as an average over N independent batches,

$$ \bar{u} = \frac{1}{N} \sum_{n=1}^{N} \bar{u}_n, \tag{92} $$

where $\bar{u}_n$ is the average over the M random walks with data $\nu_n$. The variance in $\bar{u}$ is $S^2/N$, where for large N

$$ S^2 = \frac{1}{N} \sum_{n=1}^{N} \bar{u}_n^2 - \bar{u}^2 \tag{93} $$

is the sample variance corresponding to eq. (90). Substantial variance reduction can be achieved. For the 10000 batch calculations plotted in fig. 4, for example, the estimates $\bar{u} \times 10^4$ of 0.6827 and 0.6878 carry standard deviations of 0.0085 and 0.0071 respectively for M = 8 and 64.

Fig. 4. Figure of merit vs. batch size M [calculated results from eqs. (94) and (97)]; observed and calculated values are shown.

For small batch sizes the improvement in the figure of merit comes from two sources. First, the substantial computational effort required to sample the input data is carried out only once per batch. Thus the time per batch does not rise in proportion to the number of random walks per batch but rather as

$$ \bar{t} = a + bM, \tag{94} $$

where a is the data sampling time and b is the time per random walk. For the present problem we find empirically that a ≈ 110 ms and b ≈ 10 ms/random walk. Thus for this problem about twelve random walks can be included in a batch before the time per batch doubles.

Second, for small batch sizes the variance of the result is substantially reduced. This is illustrated in fig. 5, where the sample variance is plotted versus batch size. The reason for this is understood by considering M sufficiently large (say > 30) that the central limit theorem can be applied to the batch average. According to the central limit theorem the batch averages form a Gaussian distribution about $\bar{u}_\nu$,

$$ f_M(\bar{u}_M|\nu) = \frac{1}{\sqrt{2\pi}\,\sigma_{M,\nu}} \exp\left[ -\frac{(\bar{u}_M - \bar{u}_\nu)^2}{2\sigma_{M,\nu}^2} \right]. \tag{95} $$

The batch variance then is related to the variance of the underlying random walk by

$$ \sigma_{M,\nu}^2 = \sigma_\nu^2 / M. \tag{96} $$

If the batch quantities are used to evaluate the variance $\sigma^2$ in eq. (88), we obtain instead of eq. (90)

$$ \sigma_M^2 = \frac{1}{M} \int d\nu\, f(\nu)\, \sigma_\nu^2 + \int d\nu\, f(\nu)\, (\bar{u}_\nu - \bar{u})^2. \tag{97} $$

Thus by taking sufficiently large batches, the variance due to the random walk simulation can be reduced to insignificance compared to that due to the data uncertainty. By comparing the results in fig. 5 with eq. (97) we may estimate the contribution to the variance due to the data uncertainty to be

$$ \int d\nu\, f(\nu)\, (\bar{u}_\nu - \bar{u})^2 = 0.51 \times 10^{-8}. \tag{98} $$

Subtracting this result from the variance for M = 1 then yields the contribution from the underlying random walk to be

$$ \int d\nu\, f(\nu)\, \sigma_\nu^2 = 1.33 \times 10^{-8}. \tag{99} $$

The dotted line in fig. 5 represents the estimate of eq. (97) using these values.

Fig. 5. Sample variance vs batch size [calculated result from eq. (97)]; observed and calculated values are shown.

From the above it is seen that an optimum similar to that illustrated in fig. 4 may be expected to exist for all problems. For too small a batch size the variance $\sigma_\nu^2$ in the first term of eq. (97) causes the figure of merit to deteriorate, while for too large a batch size $\sigma_M^2$ remains essentially constant, causing the dominant effect to be the increase of the time per batch with batch size.

5. Discussion

In the foregoing sections we have formulated the evaluation of fault trees for the unreliability of repairable systems as a Markov process suitable for effective simulation by Monte Carlo methods. Following the exposition of analog Monte Carlo simulation, a pair of variance reduction methods was constructed. For an illustrative example these were shown to lead to improvements in a figure of merit for computational efficiency by a factor of well over three orders of magnitude. The benefit should be larger as more components are considered.
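Returning to the batching trade-off of section 4.3, the short Python sketch below combines the timing model of eq. (94) with the variance model of eq. (97), using the fitted values quoted in the text (a ≈ 110 ms, b ≈ 10 ms per random walk, and the two variance contributions of eqs. (98) and (99)). It rests on two assumptions that the text does not state explicitly: that the time t in the figure of merit $1/(S^2 t)$ is the time per batch, and that eq. (97) may be applied at every batch size. All function and constant names are hypothetical.

```python
import math

# Fitted values quoted in section 4.3 (variances per batch, times in ms).
WALK_VARIANCE = 1.33e-8   # random walk contribution, eq. (99)
DATA_VARIANCE = 0.51e-8   # data uncertainty contribution, eq. (98)
DATA_TIME_MS = 110.0      # a: data sampling time per batch
WALK_TIME_MS = 10.0       # b: time per random walk


def batch_variance(m):
    """Variance of one batch average of m random walks, eq. (97)."""
    return WALK_VARIANCE / m + DATA_VARIANCE


def batch_time_ms(m):
    """Time per batch of m random walks, eq. (94)."""
    return DATA_TIME_MS + WALK_TIME_MS * m


def figure_of_merit(m):
    """1 / (S^2 t), with t taken as the time per batch (an assumption)."""
    return 1.0 / (batch_variance(m) * batch_time_ms(m))


def standard_error(m, n_batches):
    """Standard deviation of the mean over n_batches batches, sqrt(S^2 / N)."""
    return math.sqrt(batch_variance(m) / n_batches)


def best_batch_size(candidates):
    """Batch size with the largest figure of merit among the candidates."""
    return max(candidates, key=figure_of_merit)


if __name__ == "__main__":
    for m in (1, 2, 4, 8, 16, 32, 64):
        print(m, f"{figure_of_merit(m):.3e}",
              f"{standard_error(m, 10000) * 1e4:.4f}")
    print("best batch size in 1..64:", best_batch_size(range(1, 65)))
```

With these inputs the model gives standard deviations of about 0.0082 × 10⁻⁴ and 0.0073 × 10⁻⁴ for M = 8 and 64 with 10000 batches, close to the quoted 0.0085 and 0.0071, and it places the best figure of merit at a batch size of about five. That optimum is only what the two fitted formulas imply; it is not read from fig. 4.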

The Markov Monte Carlo formulation was extended to problems with data uncertainties, and a batching technique was shown to lead to further improvements in the figure of merit. While we have not had the opportunity to make numerical comparisons between Markov Monte Carlo and Kinetic Tree methods which use Monte Carlo data sampling, an observation seems in order. For equal data sampling one would equate the number of Markov Monte Carlo batches to the number of Kinetic Tree trials. If one then chose the batch size just large enough so that the random walk variance could be ignored relative to the variance due to data uncertainty, a fair comparison of computational efficiency would be the Monte Carlo time per batch versus the Kinetic Tree time per trial. This, of course, assumes that the problem chosen is one in which component dependencies do not rule out the use of Kinetic Tree methods.

With the encouraging results obtained from this initial study it appears reasonable to generalize the method both to treat more general reliability models and to achieve yet greater computational efficiency. Modeling generalization may proceed along several lines including (1) the inclusion of unavailability as well as unreliability estimates, (2) the restructuring of the computational algorithms to treat the full range of component dependencies with which the Markov process is capable of dealing, and (3) the inclusion of time-dependent failure and/or repair rate sampling. A variety of variance reduction techniques beyond those discussed in section 3 may lead to further improvements in computational efficiency; these may include Russian roulette and splitting, correlated sampling, and a number of importance sampling techniques.

Acknowledgements

This work was supported in part by the Deutscher Akademischer Austauschdienst. It is based in part on work submitted by the second author in partial fulfillment of the requirements for the Diplom Ingenieur, Universität Stuttgart.

Appendix

The log-normal distribution is widely used in fault tree analysis for the representation of uncertainty in failure and repair rate data [1,13]. The probability density is written as

$$ f(\nu) = \frac{1}{\sqrt{2\pi}\,\sigma\nu} \exp\left\{ -\frac{(\ln\nu - \mu)^2}{2\sigma^2} \right\}, $$

where $\nu$ is a failure or repair rate. The parameters $\mu$ and $\sigma^2$ are not identical to the mean value $\bar{\nu}$ and the variance of $\nu$. They are determined as follows. Suppose that there is a 90% probability that $\nu$ lies between $e^{\mu}/c$ and $e^{\mu}c$, where $c \ge 1$ is the error factor referred to in the text. Then using the log-normal probability density, it may be shown that the parameters $\mu$ and $\sigma^2$ are given by

$$ \sigma = \ln c / 1.645, \qquad \mu = \ln\bar{\nu} - 0.5\,\sigma^2. $$

In the Monte Carlo simulation the normal (error function) distribution is first sampled using the central limit technique [24], with twelve random numbers $\xi_i$,

$$ \eta = \sigma \sum_{i=1}^{12} \left[ \xi_i - \tfrac{1}{2} \right] + \mu, $$

and then the value of $\nu$ is determined from $\eta = \ln\nu$.

References

[1] N.H. Roberts, W.E. Vesely, D.F. Haasl and F.F. Goldberg, Fault Tree Handbook, U.S. Nuclear Regulatory Commission Report NUREG-0492 (1981).
[2] E.J. Henley and H. Kumamoto, Reliability Engineering and Risk Assessment (Prentice-Hall, Englewood Cliffs, NJ, 1981).
[3] N.J. McCormick, Reliability and Risk Analysis (Academic Press, New York, 1981).
[4] B.J. Garrick, Principles of unified safety analysis, Nucl. Engrg. Des. 15 (1970) 245-321.
[5] W.E. Vesely, Time dependent methodology for fault tree evaluation, Nucl. Engrg. Des. 13 (1970) 337-357.
[6] W.E. Vesely and R.E. Narum, PREP and KITT: computer codes for the automatic evaluation of a fault tree, USAEC Report IN-1349 (1970).
[7] W.E. Vesely and F.F. Goldberg, FRANTIC--a computer code for time-dependent unavailability analysis, U.S. Nuclear Regulatory Commission Report NUREG-0193 (1977).
[8] R.C. Erdmann, J.E. Kelley, H.R. Kirch, F.L. Leverenz and E.T. Rumble, A method for quantifying logic models for safety analysis, in: Nuclear Systems Reliability Engineering and Risk Assessment, Eds.: J.B. Fussell and G.R. Burdick (SIAM, Philadelphia, 1977).
[9] J.K. Vaurio, Availability of standby safety systems, in: Proc. ANS/ENS Intl. Topical Meeting on Probabilistic Risk Assessment, Port Chester, NY, Sept. 20-24, 1981.
[10] I.A. Papazoglou and E.P. Gyftopoulos, Markovian reliability analysis under uncertainty with an application on the shutdown system of the Clinch River Breeder Reactor, Nucl. Sci. Engrg. 73 (1980) 1-18.

[15] R.D. Topical Meeting on Probabilistic Risk Assessment. [14] S. Erdmann et al. Comparison of the Monte Carlo and systems moments methods for uncertainty analysis. Handscome. Wakefield and D. Goertzel and M. Reading.S.O. Hockenbury and P.) A review of the theory and applications of Monte Carlo.J.. Englewood Cliffs. ANS/ENS Intl. NY.K. Los Alamos Scientific Laboratory Report LA-7396-M (1981). Wolf. 1I. NY. in: Proc. ANS/ENS Intl..S. [25] " N C N P . Gelbard and J. Oak Ridge National Laboratory Report O R N L / R S I C 44(1980). ANS/ENS Intl. Modarres. Ligon. Horwitz (Pergamon Press.E. London. 1967). 1981. Probabilistic Safety Analysis 111. Gerrard. in: Progress in Nuclear Energy.M. Lee. 20-24. Saskawa and M. in: Proc. Port Chester.a general Monte Carlo code for neutron and photon transport. 1981. Soc. N. Monte Carlo methods in transport problems. Trans. [12] R. U.D. Lewis. 20-24.: G. Hammersley and D.E. Series II. Spanier. Simulation and the Monte Carlo Method (John Wiley & Sons. commercial nuclear power plants. Quantification of uncertainties in risk assessment using the STADIC-2 code. S. in: Proc. New York.J. S. in: Proc. [21] E. 20-24. Nuclear Regulatory Commission Report WASH-1400. F. Jackson. NUREG-75/014 (1975). Hughes. [24] R. Port Chester. Am. 20-24. Failure mode analysis using state variables derived from fault trees with application. Sugawara. Monte Carlo Methods (Methuen. NY. 1981.L.S. Trubey and B. . MOCARS: a Monte Carlo simulation code for determining the distribution of simulation limits.C. Electric Power Research Institute Report EPRI NP-749 (1978). Introduction to Stochastic Processes (PrenticeHall.H. 19811 Port Chester.Y. [23] E. Cinlar. ANS/ENS Intl.J. Levinson. Vol. Rasmussen and L. Mathews.H. Topical Meeting on Probabilistic Risk Assessment. Energy Research and Development Agency Report TREE-1138 (1977). Eds.C. Monte Carlo simulation by the DL-MODMC code. [22] D. [19] J. Topical Meeting on Probabilistic Risk Assessment. 
B6hm / Monte Carlo simulation [11] H. [13] Reactor Safety Study--An assessment of accident risks in U. 1975). U. Port Chester.62 E. Rubinstein. New Jersey. R.H. Bartholomew. J. [20] G. [16] M. Sept.M. [17] D. 1969). McGill (eds. M. methods. Sept. 1981). 1958). Application of time-dependent reliability code MARKOV to nuclear power plants. Kalos. 35 (1980) 395. Monte Carlo Principles and Neutron Transport Problems (Addison-Wesley. Topical Meeting on Probabilistic Risk Assessment. Yeater. Sept.S.. [18] P. NY. New York. Sept. Sanders and J. Nucl.Y.B.W. Mass.
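The log-normal sampling recipe of the Appendix can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' PDP 11/44 code: the twelve random numbers are taken to be uniform on (0, 1), so that their sum minus six approximates a standard normal deviate, and the function names, the seed and the example rate are hypothetical.

```python
import math
import random


def lognormal_parameters(mean_rate, error_factor):
    """Return (mu, sigma) from the mean rate and the 90% error factor c."""
    sigma = math.log(error_factor) / 1.645
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return mu, sigma


def sample_rate(mu, sigma, rng):
    """Draw one failure or repair rate by the central limit technique."""
    # Sum of twelve uniforms minus six has zero mean and unit variance.
    z = sum(rng.random() for _ in range(12)) - 6.0
    return math.exp(mu + sigma * z)


if __name__ == "__main__":
    rng = random.Random(1984)                      # hypothetical seed
    mu, sigma = lognormal_parameters(1.0e-3, 3.0)  # hypothetical mean rate, c = 3
    rates = [sample_rate(mu, sigma, rng) for _ in range(20000)]
    median = math.exp(mu)
    inside = sum(median / 3.0 <= r <= median * 3.0 for r in rates) / len(rates)
    print("sample mean:", sum(rates) / len(rates))
    print("fraction within the error-factor band:", inside)
```

The sample mean should fall close to the requested mean rate and roughly 90% of the draws should lie within a factor c of the median $e^{\mu}$, which is the property that fixes $\sigma = \ln c / 1.645$.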