Publisher: CRC Press
Publication details
https://www.routledgehandbooks.com/doi/10.1201/b21983-3
Andreas Joanni, Qamar Mahboob, Enrico Zio
Published online on: 29 Mar 2018
How to cite: Andreas Joanni, Qamar Mahboob, Enrico Zio. 29 Mar 2018, Advanced Methods for RAM Analysis and Decision Making. From: Handbook of RAMS in Railway Systems, Theory and Practice.
CRC Press
Accessed on: 03 Jul 2021
https://www.routledgehandbooks.com/doi/10.1201/b21983-3
3 Advanced Methods for RAM Analysis and Decision Making
CONTENTS
3.1 Introduction
3.2 Markov Models
  3.2.1 Theoretical Background
  3.2.2 Markov Model Development
  3.2.3 Further Extensions
  3.2.4 Applications in Railway Industry
  3.2.5 Strengths and Limitations
3.3 Monte Carlo Simulation
  3.3.1 Generation of Random Variates
  3.3.2 Basic Concepts of Monte Carlo Simulation
  3.3.3 Variance Reduction Techniques
  3.3.4 Applications in Railway Industry
  3.3.5 Strengths and Limitations
3.4 Petri Nets
  3.4.1 Basic Elements of GSPNs and Construction Guidelines
  3.4.2 Applications in Railway Industry
3.5 Bayesian Networks and Influence Diagrams
  3.5.1 Basic Concepts of Bayesian Networks
  3.5.2 Inference in BNs
  3.5.3 Mapping of Fault Trees and Event Trees to BNs
  3.5.4 Influence Diagrams
  3.5.5 Applications in Railway Industry
  3.5.6 Strengths and Limitations
3.6 Summary
References
3.1 Introduction
For the complex systems used in the railway industry, the basic methods of reliability,
availability, and maintainability (RAM) analysis, and the subsequent decision making as
illustrated in the previous chapter may not be sufficient. Thus, in this chapter, we intro-
duce advanced techniques for modeling such complex systems. For the application of
these methods to railway systems, please refer to other relevant chapters in this handbook
and references therein. Some of the basic mathematical models, expressions, definitions,
and terminology related to RAM analysis can be found in IEC 61703:2016 and IEC 60050-
192:2015 (for terminology, see also http://www.electropedia.org). The examples presented
in this chapter are for academic and illustration purposes only, for instance, by consider-
ing limited failure scenarios only.
3.2 Markov Models
When applied in the context of reliability and safety analyses, Markov models provide a
quantitative technique to describe the time evolution of a system in terms of a set of dis-
crete states and transitions between them, given that the current and future states of the
system do not depend on its state at any time in the past but only on the present state; see,
e.g., Zio (2009). Markov models are particularly useful for analyzing systems with redun-
dancy, as well as systems where the occurrence of system failure depends on the sequence
of occurrence of individual component failures. They are also well suited for analyzing
systems with complex operation and maintenance strategies such as cold standby compo-
nents, prioritized repair actions, and limited resources for corrective maintenance activi-
ties. Considering the system as a whole, the system states can be classified into states with
various degrees of degraded performance, therefore allowing the modeling of multistate
systems as described, e.g., by Lisnianski, Frenkel, and Ding (2010) and Podofillini, Zio, and
Vatn (2004).
Markov models can be visualized by means of state-transition diagrams as shown in
the following. Depending on the system and the type of analysis, various transient and
stationary reliability and availability measures can be obtained by solving the correspond-
ing model, such as instantaneous and steady-state availability, average availability, mean
operation time to (first) failure, and mean down time. In the context of functional safety,
Markov models allow the computation of measures such as dangerous failure rate and
average probability of failure on demand.
A detailed discussion of Markov models can be found, e.g., in the books by Zio (2009),
Birolini (2007), and Rausand and Høyland (2004). Guidelines for the application of Markov
models as well as some theoretical background are also described in IEC 61165:2006.
In the following section, we will confine attention to Markov processes with finite state
space (where the possible states of the system are numbered 1, 2, …, n) and continuous
time parameter t. Such a process can be characterized by means of the so-called transition
probabilities $p_{i,j}(t,s)$, where

$$p_{i,j}(t,s) := P\{X(t) = j \mid X(s) = i\} \qquad (3.1)$$
with 0 ≤ s ≤ t and i, j = 1, …, n. X(t) denotes the state of the system at time t. The preceding
relation is a sufficient description of a Markov process due to the assumption that the cur-
rent and future states of the system do not depend on its state at any time in the past (i.e.,
at any time before s).
3.2.1 Theoretical Background
For a homogeneous Markov process with a finite state space and continuous time param-
eter, we have, instead of Equation 3.1,
$$p_{i,j}(t) := P\{X(t+s) = j \mid X(s) = i\} = P\{X(t) = j \mid X(0) = i\}$$
with s ≥ 0, t ≥ 0, and i, j = 1, …, n. Noting that
$$p_{i,j}(0) = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{for } i \neq j \end{cases}$$
for i, j = 1, …, n and under certain conditions, it can be shown that the transition probabili-
ties are the solution of the following system of differential equations:
$$\frac{d}{dt}\, p_{i,j}(t) = \sum_{k=1,\,k\neq j}^{n} p_{i,k}(t)\, a_{k,j} + p_{i,j}(t)\, a_{j,j},$$

with the so-called transition rates $a_{k,j} := \lim_{t \downarrow 0} p_{k,j}(t)/t$, where $i, j = 1, \ldots, n$, $i \neq j$, and

$$a_{j,j} := -\sum_{l=1,\,l\neq j}^{n} a_{j,l}.$$
The so-called state probabilities pj(t) := P{X(t) = j}, with t ≥ 0 and j = 1, …, n, are obtained
from the transition probabilities as
$$p_j(t) = \sum_{i=1}^{n} p_i(0)\, p_{i,j}(t).$$
Hence, it follows that the state probabilities are the solution of the system of differential
equations
$$\frac{d}{dt}\, p_j(t) = \sum_{k=1,\,k\neq j}^{n} p_k(t)\, a_{k,j} + p_j(t)\, a_{j,j}, \qquad (3.2)$$

which is equivalent to

$$\frac{d}{dt}\, \mathbf{p}(t) = \mathbf{A}\, \mathbf{p}(t), \qquad (3.3)$$
where the matrix of transition rates is

$$\mathbf{A} = \begin{pmatrix} -\sum_{l=2}^{n} a_{1,l} & a_{2,1} & \cdots & a_{n,1} \\ a_{1,2} & -\sum_{l=1,\,l\neq 2}^{n} a_{2,l} & \cdots & a_{n,2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1,n} & a_{2,n} & \cdots & -\sum_{l=1}^{n-1} a_{n,l} \end{pmatrix}.$$
The sums over the individual columns are equal to zero, which reflects the fact that the
change in probability over all states must always cancel out (independent of the current
state probabilities). The preceding system of differential equations for the state probabilities has the formal solution

$$\mathbf{p}(t) = e^{\mathbf{A}t}\, \mathbf{p}(0), \qquad (3.4)$$

which makes use of the matrix exponential function. In simple cases, the preceding equations can be solved in a closed form; for the numerical computation of the matrix exponential function, see Moler and Van Loan (2003).
One can visualize the preceding relations using a state-transition diagram, as dem-
onstrated with the following examples. Consider first case (a) of a parallel system (hot
standby) consisting of two components A and B with failure rates λA and λB, respectively.
Suppose that there is no possibility of repair. The corresponding state-transition diagram
is shown in Figure 3.1. The fact that the system is supposed to be in the initial state at t = 0
is modeled by assuming p(0) = (1,0,0,0)T in Equation 3.4.
The set of possible states for the system is represented by four states: initial state 1, where both components A and B are functioning; state 2, where component A is failed ($\bar{A}$), given that no repair is possible after failure, and component B is functioning; state 3, where component A is functioning and component B is failed ($\bar{B}$); and system failure state 4, where both components A and B are failed ($\bar{A}\bar{B}$). State 4 is a so-called absorbing state, which
FIGURE 3.1
State-transition diagram for a parallel system of components A and B without repair. The overline negation sign
means that a component is failed.
implies that the system remains in this state because there are no outgoing transitions. The
corresponding transition matrix A of Equation 3.3 takes the following form:
$$\mathbf{A} = \begin{pmatrix} -(\lambda_A + \lambda_B) & 0 & 0 & 0 \\ \lambda_A & -\lambda_B & 0 & 0 \\ \lambda_B & 0 & -\lambda_A & 0 \\ 0 & \lambda_B & \lambda_A & 0 \end{pmatrix}.$$
The failure probability F(t) for the system is equal to the probability p4(t) of the system being in state 4, which is—by definition—the failure state. The mean operating time to (first) failure is obtained as $\int_0^{\infty} [1 - F(t)]\, dt$.
As a variant of the previous case, consider the case (b) in which both components are
repairable with repair rate μ in those configurations in which the system is functioning; on
the contrary, once the system has failed the components remain failed. The corresponding
state-transition diagram is shown in Figure 3.2.
The corresponding transition matrix A of Equation 3.3 now takes the following form:
$$\mathbf{A} = \begin{pmatrix} -(\lambda_A + \lambda_B) & \mu & \mu & 0 \\ \lambda_A & -(\lambda_B + \mu) & 0 & 0 \\ \lambda_B & 0 & -(\lambda_A + \mu) & 0 \\ 0 & \lambda_B & \lambda_A & 0 \end{pmatrix}.$$
State 4 is again an absorbing state, and the failure probability F(t) for the system is equal
to the probability p4(t).
Finally, as the last case (c), consider repair of components A and B with repair rate μ
also when the system is in the failed state. The corresponding state-transition diagram is
shown in Figure 3.3, and the transition matrix A equals
$$\mathbf{A} = \begin{pmatrix} -(\lambda_A + \lambda_B) & \mu & \mu & 0 \\ \lambda_A & -(\lambda_B + \mu) & 0 & \mu \\ \lambda_B & 0 & -(\lambda_A + \mu) & \mu \\ 0 & \lambda_B & \lambda_A & -2\mu \end{pmatrix}.$$
FIGURE 3.2
State-transition diagram for a parallel system of components A and B with partial repair.
FIGURE 3.3
State-transition diagram for the parallel system of components A and B with full repair.
It should be noted that the probability p4(t) associated with the—by definition—failure
state 4 here represents the instantaneous unavailability U(t), i.e., the probability that an item
is not in a state to perform as required at given instants. In order to calculate the mean
operating time to first failure, one would have to remove the outgoing transitions from the
failure state, which would effectively yield the transition matrix from case b.
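To make this concrete, the state probabilities of case (c) can be obtained by numerically integrating Equation 3.3. The following sketch (with assumed illustrative rate values for λA, λB, and μ) uses a classical Runge–Kutta scheme rather than the matrix exponential:

```python
import numpy as np

lam_a, lam_b, mu = 1e-3, 2e-3, 0.1  # assumed illustrative rates (per hour)

# Transition matrix A of case (c) with full repair; every column sums to zero
A = np.array([
    [-(lam_a + lam_b),  mu,             mu,             0.0      ],
    [ lam_a,           -(lam_b + mu),   0.0,            mu       ],
    [ lam_b,            0.0,           -(lam_a + mu),   mu       ],
    [ 0.0,              lam_b,          lam_a,         -2.0 * mu ],
])

def state_probabilities(t_end, p0, steps=10_000):
    """Integrate dp/dt = A p (Equation 3.3) with the classical Runge-Kutta
    scheme, starting from the initial distribution p0."""
    p = np.array(p0, dtype=float)
    h = t_end / steps
    for _ in range(steps):
        k1 = A @ p
        k2 = A @ (p + 0.5 * h * k1)
        k3 = A @ (p + 0.5 * h * k2)
        k4 = A @ (p + h * k3)
        p = p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

# System starts in state 1, i.e., p(0) = (1, 0, 0, 0)^T
p = state_probabilities(1000.0, [1.0, 0.0, 0.0, 0.0])
print("instantaneous unavailability U(1000 h) = p4 =", p[3])
```

For production use, a dedicated matrix exponential routine (such as the scaling-and-squaring methods surveyed by Moler and Van Loan) is preferable to this direct integration.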
The last case (c) of the example represents what is called an irreducible Markov process,
i.e., every state can be reached from every other state (possibly through intermediate states).
For an irreducible Markov process, it can be shown that there exist limiting distributions
$p_i > 0$, $i = 1, \ldots, n$, such that

$$\lim_{t \to \infty} p_i(t) = p_i$$
independent of the initial distribution p(0). These limiting distributions are computed by noting that at stationarity $\frac{d}{dt} p_j(t) = 0$, $j = 1, \ldots, n$, and solving the linear system of equations derived from Equation 3.2, i.e., $\mathbf{A}\,\mathbf{p} = \mathbf{0}$. The state probabilities are uniquely determined when one (arbitrarily chosen) equation is replaced by the normalization condition $\sum_{j=1}^{n} p_j = 1$. For the case (c) of the example, this could look like
$$\begin{pmatrix} -(\lambda_A + \lambda_B) & \mu & \mu & 0 \\ \lambda_A & -(\lambda_B + \mu) & 0 & \mu \\ \lambda_B & 0 & -(\lambda_A + \mu) & \mu \\ 1 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$
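The linear system above can be solved directly; a minimal sketch (with assumed illustrative rate values) is:

```python
import numpy as np

lam_a, lam_b, mu = 1e-3, 2e-3, 0.1  # assumed illustrative rates (per hour)

# Transition matrix A of case (c) with full repair
A = np.array([
    [-(lam_a + lam_b),  mu,             mu,             0.0      ],
    [ lam_a,           -(lam_b + mu),   0.0,            mu       ],
    [ lam_b,            0.0,           -(lam_a + mu),   mu       ],
    [ 0.0,              lam_b,          lam_a,         -2.0 * mu ],
])

# Stationarity gives A p = 0; one equation is linearly dependent, so the
# last row is replaced by the normalization p1 + p2 + p3 + p4 = 1
M = A.copy()
M[3, :] = 1.0
b = np.array([0.0, 0.0, 0.0, 1.0])
p_stat = np.linalg.solve(M, b)

print("limiting state probabilities:", p_stat)
print("steady-state unavailability p4:", p_stat[3])
```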
In general, the system states can comprise more than one failure state. That is, the sys-
tem states are divided into two classes: one for the system being in an up state and one for
the down state. For the calculation of corresponding values for the mean operating time
between failures and mean down times for the stationary case, see, e.g., Birolini (2007).
3.2.2 Markov Model Development

The development of a Markov model typically comprises the following steps:

• Definition of the objectives of the analysis, for instance, defining whether transient or stationary measures are of interest and identifying the relevant system properties and boundary conditions, e.g., whether the system is repairable or nonrepairable
• Verification that Markov modeling is suitable for the problem at hand; this includes
verification whether the random failure and repair times of the system compo-
nents can indeed be assumed independent and exponentially distributed
• Identification of the states required to describe the system for the objectives of the
analysis, for instance, several failure states may be required depending on the pos-
sible modes a system can fail and ways it can recover from a failed state; moreover,
system failure due to common causes may require additional failure states
• Verification whether the system states can be combined for model simplification
without affecting the results; this may be the case if the system exhibits certain
symmetries: for example, it may be sufficient to distinguish between the number
of functioning identical components, instead of distinguishing each of the indi-
vidual functioning components
• Setting up the transition matrix based on the identified system states; evaluating
the model according to the objective of the analysis; and interpreting and docu-
menting the results
3.2.3 Further Extensions
One extension of a homogeneous Markov process with finite state space is the so-called
semi-Markov process, where the times the system resides in a given state can be arbitrarily
distributed and not necessarily exponentially distributed. Another extension are Markov
reward models that take into account reward functions for transitions between states as
well as reward functions for the time the system is in a given state, hence allowing the
computation of a wider class of performance measures, such as average maintenance cost.
3.2.4 Applications in Railway Industry

Markov chains with discrete time parameter change their state at discrete time steps only. They are described by initial probabilities p0 and transition probabilities (instead of transition rates) represented through a transition matrix Π. The transition probabilities indicate the probability that the system attains a certain state at time step i, given that the system was in a certain state at time step i − 1. Suppose that the initial probabilities of the four states of a nonrepairable railway facility are collected in the vector p0.
The system states 3 and 4 can cause risks on railways, and their probabilities change over time. The transition matrix Π that governs the changes of state occurring within the system is given by
$$\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} & \pi_{13} & \pi_{14} \\ \pi_{21} & \pi_{22} & \pi_{23} & \pi_{24} \\ \pi_{31} & \pi_{32} & \pi_{33} & \pi_{34} \\ \pi_{41} & \pi_{42} & \pi_{43} & \pi_{44} \end{pmatrix} = \begin{pmatrix} 0.95 & 0 & 0 & 0 \\ 0.05 & 0.9 & 0.1 & 0 \\ 0 & 0.05 & 0.8 & 0 \\ 0 & 0.05 & 0.1 & 1 \end{pmatrix}.$$
FIGURE 3.4
Failure probabilities of a nonrepairable railway facility.
In the preceding equation, $\pi_{jk}$, $j \neq k$, is the transition probability from state k to state j in successive time steps. For example, $\pi_{21}$ is the probability of being in state 2 at time step i, given that the system was in state 1 at time step i − 1. The probability of system failure as a function
of time is shown in Figure 3.4.
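The evolution of the state probabilities under Π can be sketched as follows; note that the initial probability vector p0 used here is an assumption for illustration, since the original values are not reproduced above:

```python
import numpy as np

# Column-stochastic transition matrix from the example: entry [j-1, k-1] is
# the probability of being in state j at step i given state k at step i - 1
Pi = np.array([
    [0.95, 0.00, 0.00, 0.0],
    [0.05, 0.90, 0.10, 0.0],
    [0.00, 0.05, 0.80, 0.0],
    [0.00, 0.05, 0.10, 1.0],
])

# Assumed initial distribution: the facility starts in state 1 with certainty
p = np.array([1.0, 0.0, 0.0, 0.0])

trajectory = []
for year in range(50):
    p = Pi @ p  # one time step
    trajectory.append(p[3])  # probability of the absorbing failure state 4

print("P(state 4) after 50 steps:", round(float(trajectory[-1]), 3))
```

Because state 4 is absorbing (its column in Π is (0, 0, 0, 1)ᵀ), the failure probability is nondecreasing over time, as in Figure 3.4.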
3.3 Monte Carlo Simulation

3.3.1 Generation of Random Variates

Let X be a random variable that follows an arbitrary probability distribution with cumulative distribution function FX(x). The inverse transform method (see, e.g., the book by Zio [2013]) uses the generalized inverse function $F_X^{-1}(y)$, where

$$F_X^{-1}(y) = \inf\{x : F_X(x) \geq y\}, \quad 0 \leq y \leq 1.$$
If the random variable U is uniformly distributed between 0 and 1, then it can be shown that $X = F_X^{-1}(U)$ has the cumulative distribution function FX(x). This is also illustrated in Figure 3.5.
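As a sketch of the inverse transform method, the following draws exponentially distributed variates; the exponential distribution is chosen purely for illustration because its inverse CDF is available in closed form:

```python
import math
import random

def sample_exponential(lam):
    """Inverse transform sampling for an exponential distribution:
    F(x) = 1 - exp(-lam * x), hence F^{-1}(u) = -ln(1 - u) / lam."""
    u = random.random()  # U uniformly distributed between 0 and 1
    return -math.log(1.0 - u) / lam

random.seed(0)
samples = [sample_exponential(2.0) for _ in range(100_000)]
print("sample mean (expected about 1/2):", sum(samples) / len(samples))
```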
It may happen that it is not easy to compute the inverse of a complex cumulative distri-
bution function FX(x). In this case, the so-called acceptance–rejection method may be used;
see, e.g., the book by Zio (2013). Let f X(x) be the probability density function corresponding
to FX(x), i.e.,
d
f X ( x) = FX ( x)
dx
and let fG(x) be the probability density function of another random variable G, for which it
is easy to generate realizations. Further, there must exist a constant C ≥ 1 such that $f_X(x) \leq C \cdot f_G(x)$ for each x ∈ ℝ. Then, the following steps are carried out:

1. Generate a realization x of the random variable G.
2. Generate a realization u of a random variable uniformly distributed between 0 and 1.
3. If $u \cdot C \cdot f_G(x) \leq f_X(x)$, accept x as a realization of X; otherwise, reject x and return to step 1.

This is illustrated in Figure 3.6, where the crosses indicate accepted values, and the circles, rejected values.
FIGURE 3.5
Illustration of the inverse transform method.
FIGURE 3.6
Illustration of the acceptance–rejection method.
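The acceptance–rejection method can be sketched as follows, for an assumed target density $f_X(x) = 2x$ on [0, 1] with a uniform proposal density (so that C = 2 suffices):

```python
import random

def f_target(x):
    """Assumed target density f_X(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

C = 2.0  # f_X(x) <= C * f_G(x), with f_G the uniform density on [0, 1]

def sample_acceptance_rejection():
    while True:
        x = random.random()             # realization of G (easy to generate)
        u = random.random()             # uniform on (0, 1)
        if u * C <= f_target(x):        # accept with probability f_X(x)/(C f_G(x))
            return x                    # a "cross" in Figure 3.6
        # otherwise a "circle" in Figure 3.6: rejected, draw again

random.seed(1)
xs = [sample_acceptance_rejection() for _ in range(100_000)]
print("sample mean (expected about 2/3):", sum(xs) / len(xs))
```

The expected fraction of accepted draws is 1/C, so a tight bounding constant keeps the method efficient.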
3.3.2 Basic Concepts of Monte Carlo Simulation

Consider the expected value of a random variable Y with probability density function $f_Y(y)$:

$$\mu = E[Y] = \int_{-\infty}^{\infty} y \cdot f_Y(y)\, dy.$$

If Y = g(X) is a function of another random variable X with probability density function $f_X(x)$, this can equivalently be written as

$$\mu = E[Y] = \int g(x) \cdot f_X(x)\, dx. \qquad (3.5)$$
The Monte Carlo estimator of μ from n independent samples Y1, …, Yn is

$$\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$
It should be noted that $\hat{\mu}_n$ is a random variable, too. Intuitively, the estimated value $\hat{\mu}_n$ approximates well the true value μ if the number of samples n is large enough. Indeed, if Y has a finite variance Var(Y) = σ² < ∞, then it can be shown that the expected value of $\hat{\mu}_n$ is $E[\hat{\mu}_n] = \mu$, and its variance is

$$\mathrm{Var}(\hat{\mu}_n) = \frac{\sigma^2}{n}. \qquad (3.6)$$
Approximate confidence intervals for μ can be obtained as

$$\hat{\mu}_n - z_{\alpha/2}\, \frac{s}{\sqrt{n}} \leq \mu \leq \hat{\mu}_n + z_{\alpha/2}\, \frac{s}{\sqrt{n}},$$

where $z_{\alpha/2}$ denotes the upper α/2 quantile of the standard normal distribution and

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \hat{\mu}_n)^2, \quad n \geq 2.$$
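This estimation procedure, including the approximate confidence interval, can be sketched as follows (the integrand and sample size are illustrative assumptions):

```python
import math
import random

def monte_carlo_estimate(g, sample_x, n, z_half_alpha=1.96):
    """Estimate mu = E[g(X)] (Equation 3.5) together with an approximate
    95% confidence interval based on the sample variance s^2."""
    ys = [g(sample_x()) for _ in range(n)]
    mu_hat = sum(ys) / n
    s2 = sum((y - mu_hat) ** 2 for y in ys) / (n - 1)
    half_width = z_half_alpha * math.sqrt(s2 / n)
    return mu_hat, mu_hat - half_width, mu_hat + half_width

random.seed(2)
# Estimate E[X^2] for X uniform on (0, 1); the exact value is 1/3
mu_hat, lower, upper = monte_carlo_estimate(lambda x: x * x, random.random, 100_000)
print(f"estimate {mu_hat:.4f}, approximate 95% CI [{lower:.4f}, {upper:.4f}]")
```

By Equation 3.6, halving the standard error requires quadrupling the number of samples, which motivates the variance reduction techniques discussed in this chapter.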
An important special case of Equation 3.5 is when the function Y = g(X) is the indicator function $g(x) = \mathbf{1}_A(x)$, which is equal to 1 if x ∈ A and 0 otherwise. In this case, $E[Y] = E[\mathbf{1}_A(X)]$ is equal to the probability P(X ∈ A) = p. The estimated value $\hat{p}_n$ of p is obtained as

$$\hat{p}_n = \frac{n_A}{n},$$
where nA is the number of samples out of n for which X ∈ A, i.e., the number of “successes” of
falling in A. Approximate confidence intervals for p can also be computed for this case using
$$s^2 = \frac{n}{n-1}\, \hat{p}_n (1 - \hat{p}_n), \quad n \geq 2,$$
but it is also possible to obtain exact confidence intervals by noting that nA follows a binomial distribution; see, e.g., the book by Birolini (2007).
For instance, in the context of risk analysis where measures such as value at risk are
sought, it may be required to compute the so-called empirical cumulative distribution function
FˆY ( y ) from the samples Y1, Y2,…,Yn, where
FˆY ( y ) =
1
n ∑ (Y ≤ y).
i=1
i
One possibility to estimate the error is through pointwise confidence intervals, which
may essentially be computed for each fixed point in the same way as for simple prob-
abilities described in the previous paragraph. A better alternative is to construct confidence
bands using the Dvoretzky–Kiefer–Wolfowitz inequality; see, e.g., the book by Wasserman
(2006). Estimation procedures for the quantiles of a cumulative distribution function are
also available as described by Guyader, Hengartner, and Matzner-Lober (2011).
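A pointwise empirical CDF evaluation and the half-width of the Dvoretzky–Kiefer–Wolfowitz confidence band can be sketched as:

```python
import math

def empirical_cdf(samples, y):
    """F_hat(y): fraction of the samples that are less than or equal to y."""
    return sum(1 for s in samples if s <= y) / len(samples)

def dkw_half_width(n, alpha=0.05):
    """Half-width eps of the Dvoretzky-Kiefer-Wolfowitz confidence band:
    with probability at least 1 - alpha, |F_hat(y) - F(y)| <= eps for all y,
    where eps = sqrt(ln(2 / alpha) / (2 n))."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))

samples = [0.1, 0.4, 0.4, 0.7, 0.9]  # assumed toy sample
print("F_hat(0.5) =", empirical_cdf(samples, 0.5))  # 3 of 5 samples <= 0.5
print("DKW band half-width for n = 1000:", dkw_half_width(1000))
```

Unlike pointwise intervals, the DKW band holds simultaneously for all y, which is why it is the better choice when the whole distribution function is of interest.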
3.3.4 Applications in Railway Industry

One application in the railway industry concerns the statistical dispersion of the emergency braking performance. For instance, the correction factor Kdry takes into account the rolling stock characteristics and represents the statistical dispersion of braking efforts on dry rails. It depends on a given Emergency Brake
Confidence Level (EBCL), which is transmitted by trackside and can range from 1 − 10−1 up
to 1 − 10−9. Due to the high confidence levels, variance reduction techniques and estimation
procedures suitable for extreme quantiles can prove useful.
Other applications of Monte Carlo simulation in the railway industry arise in the con-
text of risk analysis, in order to quantify scenarios that involve uncertain events; see,
e.g., the study by Vanorio and Mera (2013). Furthermore, the Monte Carlo simulation can
prove useful in order to model uncertainties related to the reliability characteristics of the
components. One might, for instance, assume uniform distributions for the failure rates assigned to the basic events of a fault tree (FT) model, which then propagate through the tree to yield a distribution for the failure rate of the top event of interest.
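Such an uncertainty propagation can be sketched for a hypothetical FT consisting of a single AND gate; all rates, uncertainty bounds, and the mission time below are assumed values for illustration:

```python
import math
import random

# Hypothetical fault tree: the top event occurs if both components fail
# (a single AND gate).  The failure rates themselves are uncertain and
# modeled by uniform distributions; the uncertainty is propagated to the
# top-event probability at a mission time t by repeated sampling.
t = 1000.0  # assumed mission time in hours

def sample_top_event_probability():
    lam1 = random.uniform(1e-4, 3e-4)  # assumed bounds for failure rate 1
    lam2 = random.uniform(2e-4, 4e-4)  # assumed bounds for failure rate 2
    q1 = 1.0 - math.exp(-lam1 * t)     # failure probability of component 1
    q2 = 1.0 - math.exp(-lam2 * t)     # failure probability of component 2
    return q1 * q2                     # AND gate with independent components

random.seed(3)
qs = sorted(sample_top_event_probability() for _ in range(10_000))
print("median top-event probability:", qs[len(qs) // 2])
print("90% uncertainty interval:", (qs[500], qs[9_499]))
```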
3.4 Petri Nets
Petri nets (PNs) are a technique for describing the global behavior of a system by model-
ing local states, local events (transitions), and their relations to each other. Their graphical
notation, in the form of a directed graph with tokens, allows intuitive and compact mod-
eling of deterministic or stochastic systems for various applications, such as concurrent
processes in logistics and transportation, as well as distributed and parallel communica-
tion systems.
Very relevant for RAM purposes are the so-called generalized stochastic PNs (GSPNs),
where random variables are introduced to represent random times until certain events
occur. In this context, local events may represent failure or repair events, while local states
may represent functioning states, failure states, or states with various degrees of degraded
performance.
In case the random transitions in the model are exponentially distributed and the deter-
ministic transitions have zero delay (immediate transitions), GSPNs can be represented as
Markov models; however, the former usually have a much more compact representation.
A detailed description of PNs can be found, e.g., in IEC 62551:2012 and in the book by
Marsan et al. (1996).
3.4.1 Basic Elements of GSPNs and Construction Guidelines

The basic elements of GSPNs are the following:

• Places, which are graphically indicated by circles and denoted by P1, P2, …, Pn. They may contain zero, one, or more tokens to describe the local state. The combination of the numbers of tokens in the places is referred to as the marking of a PN and represents information about the global system state at a given time
• Transitions denoted by T1, T2,…,Tm, which are indicated by outlined rectangles in
case of exponentially distributed random transitions and by solid bars in case of
immediate transitions
• Directed arcs, which indicate the relationships between places and transitions. A
so-called input arc is an arc from a (input) place to a transition, whereas an output
arc is an arc from a transition to a (output) place. A transition is said to be enabled,
i.e., any deterministic or random times are counted from that instant, when the
input places connected to it each have the required number of tokens, which is
determined by the multiplicity of the arc. Output arcs can have a multiplicity,
too, which describes the number of tokens put into a place after a transition has
occurred. If no multiplicity is indicated next to an arc, it defaults to one.
In addition, GSPNs may contain other elements such as inhibitor arcs, which indicate
that a local state may disable a transition instead of enabling it. These can have multiplicities similar to ordinary arcs. Moreover, there exist testing arcs, which do not reduce the number of tokens in an input place (IEC 62551:2012).
The dynamic behavior of the system described by means of GSPNs is determined by the following rules: when an enabled transition fires, the required number of tokens is removed from each of its input places and the specified number of tokens is added to each of its output places; immediate transitions fire as soon as they are enabled, whereas timed transitions fire after their (random) delay has elapsed, the transition whose delay elapses first being the one that fires.
As a simple example, consider the stochastic PN from Figure 3.8, which models a paral-
lel system of two components with identical failure rates λ and repair rates μ. The (initial)
marking shown in the figure indicates that both components are currently functioning.
Despite the similarity of the graph shown in the figure to an analogous Markov model,
there is, generally, a fundamental notational difference between PNs and Markov models:
FIGURE 3.7
Transition in a stochastic PN.
FIGURE 3.8
Example of a parallel system modeled with a stochastic PN. (Places P1, P2, P3; transitions T1 [2λ], T2 [λ], T3 [μ], T4 [μ].)
in a Markov model, each state denotes one of the possible global states of the system; in a
PN, the individual places with given numbers of tokens in that place denote local states,
while the possible states of the system are represented by all possible combinations of
tokens in the respective places (which may depend on the initial marking).
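Under the race semantics of exponentially distributed transitions, the PN of Figure 3.8 can be simulated event by event. The following sketch (with assumed values for λ and μ) estimates the steady-state unavailability of the two-component parallel system:

```python
import random

LAM, MU = 1e-3, 1e-2  # assumed failure and repair rates per component

def simulate_unavailability(t_end):
    """Simulate the GSPN of Figure 3.8 under race semantics: the marking
    tracks the number of functioning components (2, 1, or 0); all enabled
    exponential transitions race and the fastest one fires.  Returns the
    fraction of time spent with zero functioning components."""
    working, t, down = 2, 0.0, 0.0
    while True:
        total_rate = working * LAM + (2 - working) * MU
        dt = random.expovariate(total_rate)
        if t + dt >= t_end:
            if working == 0:
                down += t_end - t
            break
        if working == 0:
            down += dt
        t += dt
        # choose the firing transition proportionally to its rate
        if random.random() < working * LAM / total_rate:
            working -= 1  # a failure transition (T1 or T2) fires
        else:
            working += 1  # a repair transition (T3 or T4) fires
    return down / t_end

random.seed(4)
u_hat = simulate_unavailability(1_000_000.0)
print("estimated unavailability:", u_hat)
```

Because all transitions here are exponential, the same result could be obtained from the equivalent Markov model; the simulation approach, however, carries over unchanged to arbitrarily distributed transition times.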
A set of iterative steps for constructing and analyzing complex PN models is described in IEC 62551:2012.
Extensions of PNs have been developed, for instance, where certain properties can be attached to the tokens of so-called colored PNs; see, e.g., the book by Jensen and Kristensen (2009). For guidelines on efficient construction of PNs, one may consult Signoret et al. (2013).
FIGURE 3.9
Simplified PN model of a railway level crossing. (Based on IEC 62551:2012. Analysis Techniques for Dependability—
Petri Net Techniques, IEC, Geneva, 2012.)
3.5 Bayesian Networks and Influence Diagrams

3.5.1 Basic Concepts of Bayesian Networks

Bayesian networks (BNs) are probabilistic graphical models in which the nodes are random variables, and the directed links between them represent the dependence structure (or causal relation). The random variables represent a finite set of possible mutually exclusive states. The directed links are used to create dependencies among the random variables in the BNs.
A conditional probability table (CPT) is attached to child random variables, which is defined
conditional on its parents. In Figure 3.10, X3 and X4 are the children of the parent X2. The
variable X2 has no parents and therefore it has unconditional probabilities, called prior
probabilities.
The BNs are based on the so-called d-separation properties of causal networks. In a serial connection (such as that formed by random variables X2, X4, X6 in Figure 3.10), the evidence
is transmitted through the network if the state of the intermediate random variable is not
known with certainty. In a diverging connection (such as formed by random variables X2,
X3, X4), the evidence is transmitted when the state of the common parent variable is not
known with certainty. Random variables X3, X4, X5 form a converging connection in which
evidence is transmitted only if there is some information about the child variable or one
of its descendants.
Suppose we have the BN from Figure 3.10 with random variables that are defined in a
finite space X = [X1, X2, …, X6]. For the discrete case, the probability mass function (PMF) of each random variable is defined conditional on its parents, and the joint PMF p(x) of the set of random variables X is obtained by multiplying all the conditional probabilities:

$$p(x) = p(x_6 \mid x_4) \cdot p(x_5 \mid x_4, x_3, x_2) \cdot p(x_4 \mid x_2) \cdot p(x_3 \mid x_2, x_1) \cdot p(x_2) \cdot p(x_1).$$
FIGURE 3.10
Exemplary BN.
BNs have the advantage that they can predict (using the total probability theorem, which
is the denominator in Bayes’s theorem) and estimate (using Bayes’s theorem) states of ran-
dom variables:
$$p(X \mid Y) = \frac{p(Y \mid X)\, p(X)}{\sum_X p(Y \mid X)\, p(X)}.$$
This equation essentially states that the posterior probability of X given Y is equal to the
probability p(Y|X) p(X) = p(X ∩ Y) divided by the probability of Y.
The details on how random variables are dependent on each other and how information
can flow in such networks are well explained, for instance, in the books by Jensen and
Nielsen (2007) and Barber (2012). For the application of BNs in various industries, we refer
to Lampis and Andrews (2009) and Oukhellou et al. (2008).
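As a sketch, consider a small fragment of the structure of Figure 3.10 with assumed CPTs: the joint PMF follows from the chain rule, and Bayes's theorem yields a posterior by summing the joint over the unobserved variables:

```python
# A minimal BN fragment with assumed CPTs: X2 is a parent of X3 and X4
# (the diverging connection of Figure 3.10); all variables are binary.
p_x2 = {True: 0.3, False: 0.7}       # prior probabilities of X2
p_x3_true = {True: 0.9, False: 0.2}  # CPT: P(X3 = True | X2)
p_x4_true = {True: 0.8, False: 0.1}  # CPT: P(X4 = True | X2)

def joint(x2, x3, x4):
    """Chain rule for this BN: p(x2, x3, x4) = p(x3|x2) p(x4|x2) p(x2)."""
    p = p_x2[x2]
    p *= p_x3_true[x2] if x3 else 1.0 - p_x3_true[x2]
    p *= p_x4_true[x2] if x4 else 1.0 - p_x4_true[x2]
    return p

# Posterior P(X4 = True | X3 = True) via Bayes's theorem: the denominator
# marginalizes (total probability theorem) over the remaining variables
numerator = sum(joint(x2, True, True) for x2 in (True, False))
denominator = sum(joint(x2, True, x4)
                  for x2 in (True, False) for x4 in (True, False))
print(f"P(X4 = True | X3 = True) = {numerator / denominator:.4f}")
```

Observing X3 changes the belief about X4 even though neither is a parent of the other, because evidence flows through the common parent X2, exactly as described for diverging connections.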
3.5.2 Inference in BNs
BNs are used for performing three tasks—structural learning (SL), parameter learning (PL),
and probabilistic inference (PI). SL and PL are data-driven processes, whereas PI is used to
compute the probability values for a set of variables, given some evidences in the network.
In SL, algorithms determine the topology of the network model, such as the number of
links and directions among the random variables in the network. In PL, unknown param-
eters of the joint or conditional probability distributions in BNs are determined from given
data, using algorithms such as the expectation maximization algorithm. For more details
on SL, PL, and PI, the reader is referred to the books by Jensen and Nielsen (2007), Koller
and Friedman (2009), and Russell and Norvig (2009).
A number of questions related to the (conditional) probability of variables being in a
given state can be answered using PI. Consider again the network in Figure 3.10, where all
random variables have m = 2 states named “True” and “False.” In order to answer ques-
tions such as what is the probability of random variable X6 being in a specific state, say,
“True”, given that another (set of) random variables X3 is observed to be equal to “True”,
one must compute the following:
One has to compute the joint PMF (numerator) and the marginalized probability (denominator) in the preceding equation. A number of algorithms (both exact and approximate) exist to answer probability questions in a BN. Exact inference methods such as inference by enumeration and variable elimination are used for relatively small and simple BNs with discrete random variables. Large or complex BNs
require simulation-based approximate algorithms such as direct sampling, rejection sam-
pling, likelihood weighting, and Markov chain Monte Carlo. The major advantage of the
approximate algorithms is that they allow handling continuous random variables. In other
words, the discretization of continuous variables can be avoided. The main disadvantage
of the approximate algorithms is that it is difficult to analyze and assess their accuracy.
For details on approximate algorithms, one may consult Koller and Friedman (2009) and
Russell and Norvig (2009).
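As an illustration of exact inference by enumeration, the following sketch answers a conditional probability query on a small hand-built BN. The three-node chain X1 → X2 → X3 and all CPT values are hypothetical and are not taken from Figure 3.10:

```python
from itertools import product

# Minimal BN: X1 -> X2 -> X3, all variables binary (True/False).
# CPT values below are illustrative only.
p_x1 = {True: 0.3, False: 0.7}
p_x2_given_x1 = {True: {True: 0.9, False: 0.1},   # p(X2 | X1)
                 False: {True: 0.2, False: 0.8}}
p_x3_given_x2 = {True: {True: 0.8, False: 0.2},   # p(X3 | X2)
                 False: {True: 0.1, False: 0.9}}

def joint(x1, x2, x3):
    """Chain-rule factorization of the joint PMF."""
    return p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]

def query(x3_val, x1_val):
    """p(X3 = x3_val | X1 = x1_val) by enumerating the hidden variable X2."""
    numerator = sum(joint(x1_val, x2, x3_val) for x2 in (True, False))
    denominator = sum(joint(x1_val, x2, x3)
                      for x2, x3 in product((True, False), repeat=2))
    return numerator / denominator

print(query(True, True))  # p(X3 = True | X1 = True), here approx. 0.73
```

Enumeration sums the joint PMF over all unobserved variables, which is exact but grows exponentially with the number of hidden variables; variable elimination and the sampling methods above address exactly this cost.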
Advanced Methods for RAM Analysis and Decision Making 57
required directed links. For example, the logical AND and OR gates for the random variables X1 and X2 from Figure 3.10 are expressed as CPTs, as shown in Table 3.1.
It is easy to see that BNs represent a methodology with which some of the limitations
of classical FT analysis (FTA) can be overcome. For instance, the modeling of more than
only two binary states per variable, as well as dependent events and noncoherent systems
(introduced by Exclusive-OR or NOT gates) is straightforward. More details regarding the
mapping of FTA and BNs can be found, e.g., in the study by Bobbio et al. (2001).
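The gate-to-CPT mapping can be sketched in a few lines. The CPTs are deterministic (entries 0 or 1, as in Table 3.1), and the failure probabilities of the basic events below are illustrative assumptions; marginalizing the CPTs over independent basic events reproduces the familiar FT formulas p1·p2 (AND) and p1 + p2 − p1·p2 (OR):

```python
# Deterministic CPTs for logical gates, as in Table 3.1:
# p(X3 = True | X1, X2) is 1 exactly when the gate condition holds.
def and_gate_cpt(x1, x2):
    return {True: 1.0, False: 0.0} if (x1 and x2) else {True: 0.0, False: 1.0}

def or_gate_cpt(x1, x2):
    return {True: 1.0, False: 0.0} if (x1 or x2) else {True: 0.0, False: 1.0}

# Independent basic events X1, X2 with illustrative failure probabilities.
p1, p2 = 0.01, 0.05
prior = lambda x, p: p if x else 1 - p

# Marginalize the deterministic CPT over the basic-event priors.
p_and = sum(and_gate_cpt(x1, x2)[True] * prior(x1, p1) * prior(x2, p2)
            for x1 in (True, False) for x2 in (True, False))
p_or = sum(or_gate_cpt(x1, x2)[True] * prior(x1, p1) * prior(x2, p2)
           for x1 in (True, False) for x2 in (True, False))
# p_and equals p1 * p2; p_or equals p1 + p2 - p1 * p2
```

Replacing the 0/1 entries with intermediate probabilities yields "noisy" gates, which is one way BNs go beyond classical FTA.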
Similar to FTs, event trees (ETs) can be mapped to BNs by performing a set of numerical and graphical mapping tasks. A graphical representation of the simplified mapping
procedure for both FTA and ET analysis (ETA) based on the thesis by Mahboob (2014) is
presented in Figure 3.11.
To aid the understanding of the mapping procedure, the nodes in the BNs are further classified as follows:
• Root nodes of BNs are used to represent basic events of the FTA, and technical and
operational barriers as well as neutralizing factors in the ETA. Barriers can actively
mitigate the hazard evolution to an accident. They can be characterized by their
functionality, frequency of occurrence, and effectiveness in reference to specific
hazard occurrences. Neutralizing factors also mitigate the hazard propagation to
an accident after the barriers have failed.
• Intermediate nodes of BNs are used to represent intermediate events, i.e., logical
gates for FTA and intermediate scenarios for ETA.
• Leaf nodes of BNs are used to represent the outputs of the network models, i.e., one or more top events for FTA and one or more consequences in ETA.
3.5.4 Influence Diagrams
Influence diagrams (IDs) are extensions of BNs in which decision nodes (rectangular) and utility nodes (diamond shaped) are added to the regular chance nodes (see Figure 3.12). The directed links (arrows) represent the probabilistic dependencies among the system, decision, and utility variables in the IDs.
A decision analysis with given information (such as on system states and their probabilities, decision alternatives, and their utilities) is called prior analysis. In the IDs with given information, the decision nodes are parents to the system variables and the utility nodes. In other words, the decision nodes influence both the system variables and the utility outcomes. Figure 3.12 is a simple representation of a prior decision analysis using IDs.

TABLE 3.1
CPTs Equivalent to Logical AND and OR Gates in FTs

AND Gate
X1:              True   True   False  False
X2:              True   False  True   False
P(X3 = “True”)    1      0      0      0
P(X3 = “False”)   0      1      1      1

OR Gate
X1:              True   True   False  False
X2:              True   False  True   False
P(X3 = “True”)    1      1      1      0
P(X3 = “False”)   0      0      0      1

FIGURE 3.11
Simplified mapping procedure of FTs and ETs to BNs. (Based on Q. Mahboob, A Bayesian Network Methodology for Railway Risk, Safety and Decision Support, PhD thesis, Technische Universität Dresden, Dresden, 2014.)

FIGURE 3.12
Exemplary ID.
In the prior analysis, the evaluation and decision optimization are made on the basis of
probabilistic modeling and statistical values available prior to any decision. Guidelines on
the interpretation of uncertainties and probabilities in engineering decision making can
be found in the study by Faber and Vrouwenvelder (2008).
The optimal set of decisions dopt is identified as the set of decisions that maximizes the
expected utility. Mathematically, it is written as
dopt = arg max_d E[u(d, x)] = arg max_d Σ_x u(d, x) ⋅ p(x|d).
In the equation above, d = (d1, d2, …, dn) denotes a set of decision alternatives, u(d, x) is the utility function, p(x|d) is the conditional probability of the system state x given the decision d, E[⋅] is the expectation operator, and x is a vector of system variables (from the BNs). By using prior analysis, a simple comparison of the utilities associated with different system outcomes, which are influenced by the system variables and the sets of decision alternatives, can be performed, and the sets of decision alternatives can be ranked and optimized.
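The arg max computation above can be sketched directly. The two decision alternatives, the state probabilities p(x|d), and the utilities u(d, x) below are hypothetical numbers (e.g., for a maintenance decision), chosen only to illustrate the expected-utility optimization:

```python
# Prior decision analysis: pick the decision d maximizing
# E[u(d, x)] = sum_x u(d, x) * p(x | d).
# All decisions, states, and numbers are illustrative assumptions.
p_x_given_d = {                      # p(x | d) over system states x
    "do_nothing": {"up": 0.90, "down": 0.10},
    "maintain":   {"up": 0.99, "down": 0.01},
}
utility = {                          # u(d, x): benefit minus decision cost
    ("do_nothing", "up"): 100.0, ("do_nothing", "down"): -500.0,
    ("maintain", "up"): 80.0,    ("maintain", "down"): -520.0,
}

def expected_utility(d):
    """E[u(d, x)] for one decision alternative d."""
    return sum(utility[(d, x)] * p for x, p in p_x_given_d[d].items())

d_opt = max(p_x_given_d, key=expected_utility)
# Here "maintain" wins: its lower best-case utility is outweighed by
# the much smaller probability of the costly "down" state.
```

In a full ID evaluation, p(x|d) would come from the underlying BN rather than being specified by hand.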
IDs are described, e.g., by Jensen and Nielsen (2007) and Barber (2012). For a classical
treatment of statistical decision theory, see the book by Raiffa and Schlaifer (1961).
In the preceding equation, the basic events CTA and HTS are statistically independent, meaning that p(CTA ∩ HTS) = p(CTA) ⋅ p(HTS).

FIGURE 3.13
BN model for the train derailment accident, equivalent to the models based on FTA and ETA from Chapter 2. (From Q. Mahboob, A Bayesian Network Methodology for Railway Risk, Safety and Decision Support, PhD thesis, Technische Universität Dresden, Dresden, 2014.)
3.6 Summary
A number of advanced modeling techniques for complex systems used in the railway
industry have been described in this chapter. Particularly for modeling of systems with
redundancy, concurrent behavior, and complex operation and maintenance strategies,
Markov models are presumably more intuitive to develop and understand compared with
PNs. However, PNs allow a more compact modeling and are therefore better suited for
modeling complex systems encountered in practice. Monte Carlo simulation offers great
modeling flexibility to take into account any uncertainties and to provide solutions for
complex systems of any kind. This, however, comes at the cost of increased effort in devel-
oping the simulation models, oftentimes by means of writing programming code. Further,
the convergence of the numerical results must always be assessed. Finally, BNs and IDs
are powerful techniques for modeling and handling uncertainties and complex dependen-
cies in risk scenarios and decision problems. However, BNs are less suitable for modeling
problems in continuous time and for seeking stationary solutions.
References
D. Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press, Cambridge, UK,
2012.
A. Birolini, Reliability Engineering: Theory and Practice, Fifth Edition, Springer, Berlin, 2007.
A. Bobbio, L. Portinale, M. Minichino, and E. Ciancamerla, Improving the analysis of dependable
systems by mapping fault trees into Bayesian networks, Reliability Engineering and System
Safety 71, pp. 249–260, 2001.
M. H. Faber and A. C. W. M. Vrouwenvelder, Interpretation of Uncertainties and Probabilities in Civil
Engineering Decision Analysis, Background Documents on Risk Assessment in Engineering,
No. 2, Joint Committee of Structural Safety, http://www.jcss.byg.dtu.dk/, 2008.
A. Giua and C. Seatzu, Modeling and supervisory control of railway networks using Petri nets, IEEE
Transactions on Automation Science and Engineering, 5, 3, pp. 431–445, July 2008.
A. Guyader, N. Hengartner, and E. Matzner-Lober, Simulation and estimation of extreme quantiles
and extreme probabilities, Applied Mathematics & Optimization, 64, 2, pp. 171–196, 2011.
IEC (International Electrotechnical Commission) 61165:2006: Application of Markov techniques, IEC,
Geneva.
IEC 62551:2012: Analysis Techniques for Dependability—Petri Net Techniques, IEC, Geneva.
IEC 60050-192:2015: International Electrotechnical Vocabulary—Part 192: Dependability, IEC, Geneva.
IEC 61703:2016: Mathematical Expressions for Reliability, Availability, Maintainability and Maintenance
Support Terms, IEC, Geneva.
International Union of Railways, UIC B 126/DT 414: Methodology for the Safety Margin Calculation of
the Emergency Brake Intervention Curve for Trains Operated by ETCS/ERTMS, International Union
of Railways, Paris, 2006.
K. Jensen and L. M. Kristensen, Coloured Petri Nets; Modelling and Validation of Concurrent Systems,
Springer, Berlin, 2009.
F. V. Jensen and T. D. Nielsen, Bayesian Networks and Decision Graphs, Springer, Berlin, 2007.
D. Koller and N. Friedman, Probabilistic Graphical Models—Principles and Techniques, MIT Press,
Cambridge, MA, 2009.
D. P. Kroese, T. Taimre, and Z. I. Botev, Handbook of Monte Carlo Methods, Wiley, Hoboken, NJ, 2011.
M. Lampis and J. D. Andrews, Bayesian belief networks for system fault diagnostics, Quality and
Reliability Engineering International, 25, 4, pp. 409–426, 2009.
A. Lisnianski, I. Frenkel, and Y. Ding, Multi-state System Reliability Analysis and Optimization for
Engineers and Industrial Managers, Springer, Berlin, 2010.
Y. Liu and M. Rausand, Proof-testing strategies induced by dangerous detected failures of safety-
instrumented systems, Reliability Engineering & System Safety, 145, pp. 366–372, 2016.
Q. Mahboob, A Bayesian Network Methodology for Railway Risk, Safety and Decision Support, PhD thesis,
Technische Universität Dresden, Dresden, 2014.
M. A. Marsan, G. Balbo, G. Conte et al., Modelling with Generalized Stochastic Petri Nets, Wiley, Hoboken, NJ, 1996.
C. Moler and C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-
five years later, SIAM Review, 45, 1, pp. 3–49, 2003.
L. Oukhellou, E. Côme, L. Bouillaut, and P. Aknin, Combined use of sensor data and structural
knowledge processed by Bayesian network: Application to a railway diagnosis aid scheme,
Transportation Research Part C: Emerging Technologies, 16, 6, pp. 755–767, 2008.
L. Podofillini, E. Zio, and J. Vatn, Modelling the degrading failure of a rail section under periodic
inspection. In: Spitzer, C., Schmocker, U., and Dang V. N. (eds), Probabilistic Safety Assessment
and Management (pp. 2570–2575). Springer, London, 2004.
D. Prescott and J. Andrews, Investigating railway track asset management using a Markov analysis,
Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit, 229,
4, pp. 402–416, 2015.
H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory, Harvard Business School, Boston, MA,
1961.
M. Rausand and A. Høyland, System Reliability Theory: Models, Statistical Methods, and Applications,
Second Edition, Wiley, Hoboken, NJ, 2004.
F. J. Restel and M. Zajac, Reliability model of the railway transportation system with respect to hazard
states, In: 2015 IEEE International Conference on Industrial Engineering and Engineering Management
(IEEM), pp. 1031–1036, Institute of Electrical and Electronics Engineers, Piscataway, NJ, 2015.
R. Y. Rubinstein and D. P. Kroese, Simulation and the Monte Carlo Method, Third Edition, Wiley, Hoboken,
NJ, 2016.
S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Pearson, London, 2009.
J.-P. Signoret, Y. Dutuit, P.-J. Cacheux, C. Folleau, S. Collas, and P. Thomas. Make your Petri nets
understandable: Reliability block diagrams driven Petri nets, Reliability Engineering and System
Safety, 113, pp. 61–75, 2013.
G. Vanorio and J. M. Mera, Methodology for risk analysis in railway tunnels using Monte Carlo
simulation, pp. 673–683, Computers in Railways XIII, WIT Press, Ashurst, UK, 2013.
L. Wasserman, All of Nonparametric Statistics, Springer, Berlin, 2006.
D. Wu and E. Schnieder, Scenario-based system design with colored Petri nets: An application to
train control systems, Software & Systems Modeling, pp. 1–23, 2016.
A. Zimmermann and G. Hommel. A train control system case study in model-based real time system
design. In: Parallel and Distributed Processing Symposium, Institute of Electrical and Electronics
Engineers, Piscataway, NJ, 2003.
E. Zio, Computational Methods for Reliability and Risk Analysis, World Scientific, Singapore, 2009.
E. Zio, The Monte Carlo Simulation Method for System Reliability and Risk Analysis, Springer, Berlin,
2013.