
Reliability Engineering and System Safety 66 (1999) 171–175

www.elsevier.com/locate/ress

Application of micro Markov models for quantitative safety assessment to determine safety integrity levels as defined by the IEC 61508 standard for functional safety
B. Knegtering a,*, A.C. Brombacher b
a Honeywell Safety Management Systems, P.O. Box 116, 5201 AC 's-Hertogenbosch, The Netherlands
b Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Received 28 October 1998; accepted 13 February 1999

Abstract
This paper presents a method that drastically reduces the calculation effort required to obtain the quantitative safety and reliability assessments used to determine safety integrity levels for applications in the process industry. The method combines all the benefits of Markov modeling with the practical benefits of reliability block diagrams. © 1999 Elsevier Science Ltd. All rights reserved.
Keywords: Reliability block diagrams; Fault tree analysis; Micro Markov analysis

* Corresponding author. Fax: +31-73-621-9125.
E-mail address: bert.knegtering@netherlands.honeywell.com (B. Knegtering)
0951-8320/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved.
PII: S0951-8320(99)00034-4

1. Introduction

In this high-tech age, most industrial processes are safeguarded at all times. If important process parameters are not monitored closely, industrial processes may constitute a real safety risk to their environment. The main objective of safeguarding systems is to reduce the risk posed by the Equipment Under Control to an acceptable level, considering human safety, safety of the environment and economic benefits.
In order to categorize the risk reduction factors, safety standards such as IEC 61508 and ANSI/ISA S84.01 have defined Safety Integrity Levels (SILs). It has turned out to be extremely difficult to validate the SILs of safety functions, because doing so requires calculating the probability of failure on demand. A number of techniques are available to perform these calculations, including Fault Tree Analysis, Reliability Block Diagrams (RBDs) and Markov analysis.
Markov modeling is the most comprehensive technique used today. However, large Markov models are very difficult to handle, as they require a tremendous amount of calculation [1,2,3].
Several calculation tools have been developed, but they all suffer from one big drawback: the exploding size of Markov models. For redundant safety loops, it is not uncommon for the generated Markov models to contain up to 500–600 states. This means that solving Markov models is a time-consuming task, even on today's powerful computers.
Fig. 1 illustrates the dramatic increase in the number of Markov states as the complexity of the safety function grows. A Markov model has 48 states if only the Main Part components are considered. If the model is expanded to include Input Module parts and Output Module parts, the number of states grows fast, and this growth rate increases even further if field devices such as sensors and actuators are added to the safety function.
The proposed calculation model focuses on a redundant safety system equipped with a Watchdog module.

2. Basic probability calculation assumptions

At the first level, a safeguarding system has two system states:
• the system is active; or
• the system has tripped.
Fig. 2 shows the probabilities of the two system states over a certain period of time.
Consider the system at a second level. The probability that the system is active is split up into two states:
• the system is active and functioning okay; or
• the system is active and not functioning okay, i.e. it has failed to function.

Fig. 1. Growth of Markov states.

The probabilities of the states that the safeguarding system may be in can be calculated (see Fig. 3) using the following basic probability axioms:

$$P_{\text{system is active}} = 1 - P_{\text{system has tripped}},$$
$$P_{\text{system is okay}} = P_{\text{system is active}} - P_{\text{system has failed to function}}.$$

The above principle serves as the basis for splitting up a full Markov model into "micro" Markov models, using rearranged Reliability Block Diagrams.

Fig. 2. Active vs. tripped.
Fig. 3. Fail to function.

3. Full Markov modeling technique

The Reliability Block Diagram (Fig. 4) shows a system that consists of three components with two-out-of-three (2oo3) voting. The system fails if two of the three components have failed, resulting in a lost connection between U and V.
Fig. 5 shows the Markov model generated for the 2oo3 system of Fig. 4. It distinguishes a total of eight states. Note that only the failure transitions ($\lambda_i$) are shown; the repair rates have not been added.
Solving the model means calculating the probability of the states in which the system has failed. In Fig. 5, these states are marked dark gray (states 5, 6, 7 and 8).
It is not hard to imagine that the Markov models for large systems will be huge. After all, safety loops often contain many more than three components, each of which can fail in four different ways: Safe Detected, Safe Undetected, Dangerous Detected and Dangerous Undetected. This is why Markov models for large systems often consider only combinations of up to two failures. This may be justified by assuming that the probability of states in which three failures have occurred is negligible compared with the probability of being in a state in which only one or two failures have occurred.

4. Combining reliability block diagrams and Markov modeling

As the considered 2oo3 system fails due to a combination of two failures, the Reliability Block Diagram can be rearranged into pairs of parallel Reliability Blocks (Fig. 6). Once again, the system has failed if two of its components fail.
Using the conventional RBD method, the probability of failure would be:

$$P_{\text{system has failed}} = P_a P_b + P_a P_c + P_b P_c - 2 P_a P_b P_c,$$

where $P_a$, $P_b$ and $P_c$ denote the probabilities that components a, b and c, respectively, have failed.
As the third-order term is extremely small, it may be neglected. This allows the probability of system failure after a certain period of time to be calculated for each pair of Reliability Blocks (Fig. 7).
The probability of failure would then be:

$$P_{\text{system has failed}} = 1 - (1 - P_{I})(1 - P_{II})(1 - P_{III}),$$

where $P_{I}$ is the probability of a lost connection between U and V, $P_{II}$ is the probability of a lost connection between W and X, and $P_{III}$ is the probability of a lost connection between Y and Z.
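The two failure formulas above can be compared numerically. The sketch below is a minimal illustration, not the authors' tool: it assumes independent, non-repairable components with hypothetical failure probabilities, and it obtains each pair probability simply as the product of its two component probabilities, whereas the paper derives $P_{I}$, $P_{II}$ and $P_{III}$ from micro Markov models. The helper names are our own. Treating the three pairs as independent blocks in series replaces the exact third-order correction with higher-order terms, so the pairwise result comes out slightly higher, i.e. on the conservative side.

```python
from itertools import combinations
from math import prod


def p_2oo3_exact(pa: float, pb: float, pc: float) -> float:
    """Conventional RBD result for 2oo3 voting: at least two of three failed."""
    return pa * pb + pa * pc + pb * pc - 2 * pa * pb * pc


def pair_probabilities(probs: dict) -> dict:
    """Probability that both members of each parallel pair have failed.

    Assumption: independent, non-repairable components, so a pair is lost
    with probability P_x * P_y (the paper uses micro Markov models here).
    """
    return {(x, y): probs[x] * probs[y] for x, y in combinations(sorted(probs), 2)}


def p_pairs_in_series(pair_probs) -> float:
    """P_system = 1 - (1 - P_I)(1 - P_II)(1 - P_III), pairs treated as independent."""
    return 1.0 - prod(1.0 - p for p in pair_probs)


# Hypothetical component failure probabilities after some mission time.
probs = {"a": 0.01, "b": 0.02, "c": 0.03}
pairs = pair_probabilities(probs)
exact = p_2oo3_exact(probs["a"], probs["b"], probs["c"])
pairwise = p_pairs_in_series(pairs.values())

print(sorted(pairs))  # the three parallel pairs
print(f"exact 2oo3 RBD : {exact:.6e}")
print(f"pairs in series: {pairwise:.6e}")
```

Expanding the product shows the gap explicitly: the pairwise form exceeds the exact one by $P_aP_bP_c\,(2 - P_a - P_b - P_c + P_aP_bP_c)$, which is positive whenever the component probabilities are small.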


Fig. 4. Three-component system with 2oo3 voting.
Fig. 6. Pairs of parallel reliability blocks.

A simple Markov model can be used to calculate probability $P_{I}$, as shown in Fig. 8. This simplified model includes not only the failure transitions ($\lambda_i$) but also the repair transitions ($\mu_i$).
The probability of Markov state 4 corresponds to $P_{I}$. Probabilities $P_{II}$ and $P_{III}$ are calculated in the same manner.
The Venn diagram shown in Fig. 9 visualizes the application of the probability assumptions for $P_{I}$, $P_{II}$, $P_{III}$ and $P_{\text{system has failed}}$. $P_{I}$ is represented by sectors 5 and 8, $P_{II}$ by sectors 6 and 8, and $P_{III}$ by sectors 7 and 8. $P_{\text{system has failed}}$ is represented by sectors 5, 6, 7 and 8.
The state numbers in the Venn diagram correspond to the state numbers of the full Markov model shown in Fig. 5. It is immediately apparent that the occurrence of all three failures is also covered by sector 8, without this state having to be generated in the micro Markov models.
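The micro Markov model for one pair can be sketched with a discrete-time matrix multiplication approach, in which each time step multiplies the state-probability vector by $P = I + Q\,\Delta t$. Fig. 8 is not reproduced here, so the state layout and repair transitions below are assumptions: state 1 = both components of the pair OK, states 2 and 3 = exactly one component failed, state 4 = both failed (the lost connection whose probability is $P_{I}$), with each failed component repaired independently at rate $\mu$. All rates and helper names are hypothetical.

```python
import math

# Assumed state indices for one pair (x, y) of reliability blocks.
BOTH_OK, X_FAILED, Y_FAILED, BOTH_FAILED = range(4)


def pair_generator(lam_x, lam_y, mu_x, mu_y):
    """Transition-rate matrix Q (per hour) of an assumed 4-state micro Markov model."""
    q = [[0.0] * 4 for _ in range(4)]
    q[BOTH_OK][X_FAILED] = lam_x          # first failure: x
    q[BOTH_OK][Y_FAILED] = lam_y          # first failure: y
    q[X_FAILED][BOTH_FAILED] = lam_y      # second failure: pair lost
    q[Y_FAILED][BOTH_FAILED] = lam_x
    q[X_FAILED][BOTH_OK] = mu_x           # repair of the single failed component
    q[Y_FAILED][BOTH_OK] = mu_y
    q[BOTH_FAILED][Y_FAILED] = mu_x       # x repaired, y still failed
    q[BOTH_FAILED][X_FAILED] = mu_y       # y repaired, x still failed
    for i in range(4):
        q[i][i] = -sum(q[i])              # each row of a generator sums to zero
    return q


def step_matrix(q, dt):
    """Discrete-time transition matrix P = I + Q*dt (needs rates*dt well below 1)."""
    n = len(q)
    return [[(1.0 if i == j else 0.0) + q[i][j] * dt for j in range(n)] for i in range(n)]


def propagate(p_matrix, start, steps):
    """Multiply the state-probability row vector by P once per time step."""
    p = list(start)
    n = len(p)
    for _ in range(steps):
        p = [sum(p[i] * p_matrix[i][j] for i in range(n)) for j in range(n)]
    return p


LAM_X, LAM_Y = 1e-5, 2e-5     # hypothetical failure rates (per hour)
MU = 1.0 / 8.0                # hypothetical repair rate (8 h mean repair time)
DT, STEPS = 1.0, 8760         # 1 h steps over one year
START = [1.0, 0.0, 0.0, 0.0]  # pair fully functional at t = 0

no_repair = propagate(step_matrix(pair_generator(LAM_X, LAM_Y, 0.0, 0.0), DT), START, STEPS)
with_repair = propagate(step_matrix(pair_generator(LAM_X, LAM_Y, MU, MU), DT), START, STEPS)

# Without repair, the exact continuous-time answer is (1 - e^{-lam_x t})(1 - e^{-lam_y t}).
t = DT * STEPS
closed_form = (1 - math.exp(-LAM_X * t)) * (1 - math.exp(-LAM_Y * t))
print(f"P_I without repair: {no_repair[BOTH_FAILED]:.4e} (closed form {closed_form:.4e})")
print(f"P_I with repair:    {with_repair[BOTH_FAILED]:.4e}")
```

Setting the repair rates to zero reduces the sketch to a failure-only model, which lets it be checked against the closed-form result printed alongside.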
Fig. 5. Markov model for 2oo3 system.

5. Impact on the reliability calculation effort

The impact on the calculation effort is determined using the Markov matrix multiplication method (ISA dTR 84, draft 5) [4]. In order to calculate the system safety, the transition rates of the Markov model are presented in matrix notation.
If an N × N Markov matrix is multiplied once, $2N^3$ flops (floating-point operations) are required to complete the calculation. For an 8 × 8 matrix multiplication, 1024 flops are required. If the micro Markov matrices are multiplied once, a 4 × 4 matrix is multiplied, which requires only 128 flops. As there are three micro Markov models to be multiplied, a total of 384 flops are required:

$$(8 \times 8) \times (8 \times 8):\quad 2 \cdot 8^{3} = 1024 \text{ flops},$$
$$(4 \times 4) \times (4 \times 4):\quad 2 \cdot 4^{3} = 128 \text{ flops}.$$

It can be concluded that the significant reduction in the number of flops (384 vs. 1024) reduces the calculation effort by almost a factor of 3 (1024/384 ≈ 2.7). This factor becomes even higher when more complex systems are considered.
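The $2N^3$ count can be checked by instrumenting a naive matrix product: each of the $N^2$ entries of the result takes $N$ multiplications and $N$ additions. The sketch below is illustrative only; it uses plain Python lists rather than an optimized linear-algebra library, and the helper names are our own. It reproduces the 1024 and 384 figures above and, applying the same formula, the 684-state comparison given in Section 6.

```python
def matmul_counting(a, b):
    """Naive N x N matrix product that also counts floating-point operations.

    Each output entry accumulates N products starting from 0.0, i.e.
    N multiplications and N additions, so the total is exactly 2 * N**3.
    """
    n = len(a)
    flops = 0
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                flops += 2  # one multiplication, one addition
            c[i][j] = acc
    return c, flops


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def flops_per_product(n):
    """Closed-form count used in the text: 2 * N**3 flops per N x N product."""
    return 2 * n ** 3


_, FULL = matmul_counting(identity(8), identity(8))    # full 2oo3 model, 8 states
_, MICRO = matmul_counting(identity(4), identity(4))   # one micro model, 4 states
MICRO_TOTAL = 3 * MICRO                                # three micro models

# Section 6 case: one 684-state model vs. 24 two-state and 118 four-state models.
FULL_684 = flops_per_product(684)
MICRO_684 = 24 * flops_per_product(2) + 118 * flops_per_product(4)

print(f"2oo3: {FULL} vs {MICRO_TOTAL} flops, ratio {FULL / MICRO_TOTAL:.2f}")
print(f"684-state system: {FULL_684:.3e} vs {MICRO_684} flops")
```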


Fig. 7. Failure probability for pairs of reliability blocks.
Fig. 8. Simplified Markov model.
Fig. 9. Combined failure probability.

6. Creating micro Markov models

Fig. 10 presents the steps to be taken from the Process & Instrumentation Diagram up to the calculation of the system reliability.
Conventionally, three steps need to be carried out:
1. translating a Process & Instrumentation Diagram into a Reliability Block Diagram;
2. generating a Markov model from the Reliability Block Diagram; and
3. solving the Markov model and calculating the system reliability.
Redefining the Reliability Block Diagram adds an extra step, but this saves a lot of calculation time when solving the Markov model.
If the safeguarding system shown in Fig. 1 is analyzed, the total number of states in a full Markov model is 684. If micro Markov models are used, the analysis results in 24 Markov models with 2 states and 118 Markov models with 4 states. This means that the number of flops is dramatically reduced: $6.40 \times 10^{8}$ flops vs. $15.48 \times 10^{3}$ flops ($2 \times 684^{3}$ vs. $24 \times 2 \times 2^{3} + 118 \times 2 \times 4^{3}$). The calculation time required is reduced from hours for a full Markov model to seconds for micro Markov models.

Fig. 10. Comparison of conventional vs. micro Markov modeling.

7. Comparison of reliability calculations using different techniques

To reveal how the micro Markov modeling technique affects the calculation results compared with conventional Markov modeling and Reliability Block Diagrams, PFD calculations are performed (PFD = Probability of Failure on Demand, as laid down in IEC 61508). The PFDs are calculated for the 2oo3 voting system shown in Fig. 4 [5,6].
Four calculation methods are applied:
1. Full Markov analysis, including system states in which all three modules have failed.
2. Full Markov analysis restricted to system states in which a maximum of two modules have failed.
3. Micro Markov analysis, which considers combinations of two failures.
4. Reliability Block Diagrams.
Fig. 11 shows the calculation results using the four techniques mentioned above.
The first thing that catches the eye is the considerable difference between Markov modeling and Reliability Block Diagrams. This can be explained by the different usage of the repair rates. (See also Rouvroye et al. "New
quantitative safety standards: different techniques, different results?" [7].)
Another notable point is the small difference between the three Markov-based modeling techniques. However, restricted Markov models show a more optimistic safety performance than full Markov models, which also consider states in which more than two modules have failed. If micro Markov modeling is applied, the values turn out to be a bit more conservative, i.e. the calculated performance errs on the safe side. The micro Markov modeling technique is therefore preferable to the restricted Markov modeling technique, which considers a maximum of two failures.

Fig. 11. Comparison of PFD calculations using different techniques.

8. Conclusions

1. Building large Markov models is very time-consuming and very susceptible to modeling errors.
2. To practically handle reliability calculations using Markov modeling, the Reliability Block Diagram should first be redefined. This must be done in such a way that the number of failure-redundant parts is minimized. Solving many small Markov models takes far less calculation effort than solving one huge Markov model that contains everything [7].
3. When systems are considered that are at most single-fault tolerant, the micro Markov models can easily be solved analytically, which once again simplifies the probability calculation.
4. The results of the micro Markov modeling calculation appear to be more conservative (i.e. "safer") compared with restricted Markov modeling, which considers a maximum of two failures.

References

[1] Xing L, Fleming KN, Loh WT. Comparison of Markov model and fault tree approach in determining initiating event frequency for systems with two train configurations. Reliability Engineering and System Safety 1996;53:17–29.
[2] Rouvroye JL, Brombacher AC, et al. Uncertainty in safety. New techniques for the assessment and optimisation of safety in process industry. SERA-Vol. 4, Safety Engineering and Risk Analysis, ASME, San Francisco, 1995.
[3] ISA S84. 67 Alexander Drive, P.O. Box 12277, Research Triangle Park, NC 27709.
[4] ISA TR84.0.02. Version 3, 67 Alexander Drive, P.O. Box 12277, Research Triangle Park, NC 27709, December 1997.
[5] IEC 61508. Functional safety of electrical/electronic/programmable electronic safety-related systems.
[6] IEC 61078. Analysis techniques for dependability - Reliability block diagram method, 1991.
[7] Rouvroye JL, Brombacher AC. New quantitative safety standards: different techniques, different results? Proceedings of the ESREL conference on European Safety and Reliability, Trondheim, 16–19 June, 1998.
