
www.acm.ab.ca
info@acm.ab.ca

A division of ACM Automation Inc

Failure rates – Analysis and calculations as per IEC 61511

Mohammed Al-Sayed, Mr.


Ken Bingham, Mr.

ACM, Calgary, Alberta, Canada

02 February, 2004

Abstract

With the adoption of IEC 61511 Functional Safety – Safety Instrumented Systems for the Process
Industry Sector by many companies, the question of which standard to follow in designing Safety
Instrumented Systems (SIS) has been answered. A more challenging question, however, is where to find
quality failure rate data and how to use it in a way that complies with the intent of IEC 61511. ACM Facility Safety
reviewed OREDA failure rate data and applied the IEC 61511 standard to this data source to calculate the
value of the failure rate. Specifically, the article addresses how to determine a failure rate that
demonstrates the mean time to failure on a statistical basis to a single-sided lower confidence limit of at
least 70%, as specified in IEC 61511-1, Section 11.9.2-C. Selecting failure rates from data sources for use
in PFD calculations is not as simple as one might think. The user needs to understand the assumptions
made in the IEC standard and in data sources such as OREDA in terms of accuracy, uncertainty, modeling,
and values. Correctly applying the IEC standards means that instrumentation (valves, transmitters, logic
solvers, etc.) is optimized, resulting in significant lifecycle savings. A mathematical solution is offered to
help select accurate and reasonable failure rate data that are not overly conservative and yet compliant
with the IEC standard. This is a good starting point for system designers.
Conclusions – Using the recommended calculation, based on λMTTF, the PFD for a SIL loop is
approximately 20% higher than using the mean λ. Care should be taken when using failure rate data
sources, especially by those seeking to meet IEC 61511 requirements. The recommended calculation
provides designers with a failure rate value that complies with the IEC requirements.

This document is available on www.safetyusersgroup.com



1.0 Purpose
During SIL Assessment (also called SIL Determination) and SIL Validation (also called SIL Calculation),
the failure rate values used must comply with the IEC 61511 standard¹. The standard stipulates that,
when choosing failure rate data for a subsystem, a “recognized industry source or experience of the
previous use of the subsystem in the same environment as for the intended application” be used.
Moreover, the standard recommends that failure rates be determined such that they demonstrate “that the
claimed mean time to failure on a statistical basis to a single sided lower confidence limit of at least 70%”.

This article will focus on three main areas:


• An interpretation of the IEC standard¹;
• A general understanding of the phrase “failure rate” and an example of a typical source of
failure rate data (OREDA); and
• The application of the IEC standard¹ to a typical source of data, such as OREDA, to calculate
the value of the failure rate.

¹ IEC Standard 61511-1, Section 11.9.2-C.



2.0 General
Equipment failures can lead to hazardous events with significant undesired consequences. To minimize
the risks associated with these hazardous events, an operator needs to take a number of actions. The IEC
standards 61508 and 61511 provide performance-based, rather than prescriptive, guidance on how to
identify, assess, and mitigate process hazards.

Calculating the probability of failure on demand (PFD) for process equipment, which may act as
safeguards on a system, allows the operator to determine the level of risk associated with the hazardous
event. It also allows the operator to identify the level of reliability of that system. If the system does not
meet the required safety integrity level, the operator may be required to take appropriate action to
increase its reliability, for example by adding redundant equipment, increasing the diagnostic coverage, or
selecting more reliable equipment with a lower failure rate. The question then is: how can one obtain the
failure rate data, and how reliable are they?

The probability of failure on demand can be calculated by various methods, some of which are mentioned
in the IEC standard², such as LOPA and fault tree analysis. All of these methods, however, require the
failure rate (λ). The standard assumes that “component failure rates are constant over the life time of the
system”³. It can be shown mathematically that for the exponential distribution the failure rate is
constant⁴ and, conversely, that a constant failure rate implies an exponentially distributed time to failure.
Consequently, the assumption in the IEC standard is effectively saying that the time to failure follows an
exponential distribution.
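As a quick numerical check of the statement above (and of footnote 4), the short Python sketch below evaluates the failure rate function λ(t) = f_T(t)/R_T(t) for an exponential time to failure at several component ages and shows that it returns the same value each time. The rate of 17 failures per 10⁶ h is borrowed from the ball valve example in Section 4.3 purely for illustration, and the function name `exponential_hazard` is hypothetical.

```python
import math


def exponential_hazard(t_hours: float, lam: float) -> float:
    """Failure rate function lambda(t) = f_T(t) / R_T(t) for an exponential life.

    lam is the constant failure rate in failures per hour (illustrative input).
    """
    pdf = lam * math.exp(-lam * t_hours)      # f_T(t) = lambda * exp(-lambda t)
    reliability = math.exp(-lam * t_hours)    # R_T(t) = 1 - F_T(t) = exp(-lambda t)
    return pdf / reliability


LAM = 17e-6  # 17 failures per 10^6 h (illustrative value)
ages = [0.0, 1_000.0, 10_000.0, 100_000.0]  # component ages in hours
rates = [exponential_hazard(t, LAM) for t in ages]

for t, r in zip(ages, rates):
    print(f"t = {t:>9.0f} h  ->  lambda(t) = {r * 1e6:.6f} failures per 10^6 h")
```

Every printed value is the same 17 failures per 10⁶ h, whatever the age of the component, which is exactly the "no deterioration" behaviour the IEC assumption implies.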

In the past, the failure rate was chosen by going to an industry source such as OREDA, finding the
subsystem under consideration in the tables, and reading off the λ value. It is not that simple, however, for
a typical subsystem and a typical failure mode, because OREDA provides the mean, the standard
deviation, and what are called the upper and lower limits of the failure rate. One may therefore ask:
• Why are all these values provided, and what do they mean?
• Which value should be used, if any, and why?
• What is recommended by the IEC standard¹, and how is it related to the available data
sources?
• How would the choice of failure rate data affect the SIL validation assessment?

This article answers these questions and provides an interpretation of what is stated in the IEC
standard¹.

² Refer to IEC 61508-6 for further reading on the calculation of the probability of failure on demand.
³ IEC 61508-6, page 22.
⁴ For the exponential distribution the probability density function is $f_T(t) = \lambda e^{-\lambda t}$ and the cumulative distribution function is $F_T(t) = 1 - e^{-\lambda t}$. With the reliability function $R_T(t) = 1 - F_T(t) = e^{-\lambda t}$, the failure rate function is $\lambda(t) = f_T(t)/R_T(t) = \lambda$ = constant.



For example⁵, when choosing the failure rate data for critical failures of ball valves, one will find three
values listed. What do these values mean? Which one should be used: the mean, the upper, or the lower
value? Choosing the mean value leads to a probability of failure F(t) that is smaller than the one obtained
if the upper value is chosen. This may lead the operator to conclude that the subsystem under
consideration is more reliable than it actually is, and therefore to claim an integrity level that has not been
demonstrated. It will be shown that, measured against the IEC standard¹, none of the above-mentioned
failure rate values is strictly correct; in fact, the failure rate needs to be recalculated to meet the
requirements in the IEC standard.
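To make the effect concrete, the sketch below evaluates F(t) = 1 − e^(−λt) for a mean and an upper failure rate. It borrows the gate valve values from Table 1 (12.21 and 27.35 failures per 10⁶ h) and assumes a one-year interval of 8,760 h; both the interval and the helper name `probability_of_failure` are illustrative assumptions, not part of the original analysis.

```python
import math


def probability_of_failure(lam_per_million_h: float, t_hours: float) -> float:
    """F(t) = 1 - exp(-lambda * t) for a constant failure rate.

    lam_per_million_h is given in failures per 10^6 h, as in OREDA tables.
    """
    lam = lam_per_million_h * 1e-6  # convert to failures per hour
    return 1.0 - math.exp(-lam * t_hours)


T_HOURS = 8_760.0  # assumed one-year interval
f_mean = probability_of_failure(12.21, T_HOURS)   # gate valve mean value (Table 1)
f_upper = probability_of_failure(27.35, T_HOURS)  # gate valve upper value (Table 1)

print(f"F(1 yr) with mean lambda:  {f_mean:.4f}")
print(f"F(1 yr) with upper lambda: {f_upper:.4f}")
```

With these inputs the mean value gives roughly a 10% chance of failure within the year and the upper value roughly 21%, so the choice of value alone roughly doubles the estimated probability of failure.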

The intention of this article is to provide a technical reference and a practical methodology for deriving
the failure rate (λ) that complies with the requirements in IEC 61511-1.

The following sections provide conceptual information about the failure rate. An interpretation of the
IEC standard¹ is then given on a statistical basis for both the mean time to failure and the failure rate.
This is followed by a mathematical/statistical solution to the problem, using a systematic approach to
calculate the failure rate from raw data.

⁵ OREDA 2002, page 575.



3.0 Failure Rate Data

3.1 General Understanding


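Failure⁶ rate is defined as the probability⁷, per unit time, that the actual life (T) of a component will lie in
the interval (t, t + δt), given that the component has survived up to time t. Hence, the failure rate is the
average rate at which failures occur in this interval of time. When n failures occur among K components
observed over an interval Δt, it is estimated as:

$$\lambda = \frac{n}{K\,\Delta t}$$

Equation 1 – Failure Rate Definition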
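Equation 1 can be applied directly to field counts. The sketch below uses hypothetical numbers (3 failures observed among 50 identical components over one year of 8,760 h) that are not taken from any data source, and reports the result in the failures-per-10⁶-hours unit used by OREDA. The function name `failure_rate` is hypothetical.

```python
def failure_rate(n_failures: int, k_components: int, interval_hours: float) -> float:
    """Equation 1: lambda = n / (K * delta_t), in failures per hour."""
    if k_components <= 0 or interval_hours <= 0:
        raise ValueError("K and delta_t must be positive")
    return n_failures / (k_components * interval_hours)


# Hypothetical field data, for illustration only
lam = failure_rate(n_failures=3, k_components=50, interval_hours=8_760.0)
lam_per_million_h = lam * 1e6
print(f"lambda = {lam:.3e} per hour = {lam_per_million_h:.2f} per 10^6 h")
```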

The failure rate function tells us how likely it is that an item that has survived up to time (t) will fail
during the next unit of time (OREDA 97, page 18).

In the case of the exponential distribution, the failure rate function λ(t) = λ is constant. This means the
item does not deteriorate during this time, which corresponds to the useful-life phase of the bathtub curve
of failure rate (see Figure 1 below; OREDA 97, page 19).

Note that the burn-in and wear-out phases do not follow an exponential distribution; as Figure 1 shows,
their failure rates are not constant.

⁶ Failure is defined for a component or system as “reaching, or being in, a state in which the component
or system fails to fulfill its intended design function.”
⁷ Probability: “The relative frequency with which a specified event A can be expected to occur in a large
number of repetitions (of an ‘experiment’) under controlled conditions.”


Figure 1 - Bathtub Curve

3.2 How Meaningful is a Point Estimate for λ


To estimate the failure rate, one first needs to determine whether the failure data set is homogeneous⁸
or non-homogeneous⁹. For a homogeneous sample, the failure rate estimate is the ratio of the number of
failures (n) to the aggregated time in service (τ), as shown in Equation 2 for sample i:

$$\lambda_i = \frac{n_i}{\tau_i}$$

Equation 2 – A Point Estimate for the Failure Rate

Equation 2 gives an average point estimate of the failure rate for that sample of data, over the period in
which the data were collected. For a different set of data and/or a different period, this point estimate will
be different. It is therefore recommended to provide a range within which the failure rate is likely to fall,
to account for the uncertainty in the failure rate value. The Offshore Reliability Data (OREDA) handbook
provides a range in which the failure rate has a 90% chance of falling. This range can be referred to as the
confidence interval for the failure rate.
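The point made above, that the estimate depends on the data set, is easy to see numerically. The sketch below applies Equation 2 to each of three hypothetical samples (failure counts and aggregated service times invented purely for illustration) and shows how far the individual estimates spread; it is this kind of spread that an interval such as OREDA's is meant to capture.

```python
# Hypothetical samples: (number of failures n_i, aggregated service time tau_i in hours)
samples = {
    "installation A": (2, 150_000.0),
    "installation B": (5, 200_000.0),
    "installation C": (1, 250_000.0),
}

estimates = {}
for name, (n_i, tau_i) in samples.items():
    lam_i = n_i / tau_i                 # Equation 2: lambda_i = n_i / tau_i (per hour)
    estimates[name] = lam_i * 1e6       # convert to failures per 10^6 h
    print(f"{name}: {estimates[name]:.2f} failures per 10^6 h")

spread = max(estimates.values()) - min(estimates.values())
print(f"spread between samples: {spread:.2f} failures per 10^6 h")
```

With these made-up inputs the point estimates range from 4 to 25 failures per 10⁶ h, so quoting any single one of them would hide most of the uncertainty.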

⁸ Homogeneous: sample of data obtained from failures of identical items that have been operating under
the same operational and environmental conditions (OREDA 97, page 20).
⁹ Non-homogeneous: sample of data obtained from different installations with different operational and
environmental conditions (OREDA 97, page 21).



3.3 Offshore Reliability Data Approach
For a non-homogeneous sample, the variation of the failure rate between samples may be modeled by
assuming that the failure rate is a random variable whose probability density function follows a
Chi-Squared (χ²) distribution (OREDA 97, pages 22-24).

The OREDA failure rate calculator in the Offshore Reliability Data handbook calculates the upper and lower
limits of the failure rate (λ) within which λ has a 90% chance of falling, as shown in Figure 2. Providing a
confidence interval for the failure rate accounts for the uncertainty in the point estimate value, but does
not meet the requirements in the IEC standard¹.

The mean value and the upper and lower limits are shown in Figure 2 as (A), (B), and (C), respectively.

Figure 2 - Upper and lower limits of 90% Confidence Interval for (λ)



3.4 IEC Interpretation
The IEC standard¹⁰ says: “the claimed mean time to failure on a statistical basis to a single sided lower
confidence limit of at least 70%”, but what does that actually mean? The following sections interpret that
statement for both the mean time to failure (MTTF) and the failure rate (λ).

3.4.1 70% confidence limit for the time to fail


According to the IEC standard¹⁰, the time to failure follows an exponential distribution, as in Figure 3.

A single-sided lower confidence limit of 70% for the mean time to failure (MTTF)¹¹ means that the
probability that the time to failure will not be less than the predicted mean time to failure is 70%
(Equation 3 and Figure 3). It also means that the probability that the time to failure will not exceed the
predicted mean time to failure is 30% (Equation 4 and Figure 3).

Based on that condition, the following definitions were developed:


• Success is defined as the event that failure occurs at a time (t) later than the predicted (or
calculated) mean time to failure. The probability of success is therefore 70%, as shown in
Equation 3.
• Failure is defined as the event that failure occurs at a time (t) earlier than the predicted (or
calculated) mean time to failure. The probability of failure is therefore 30%, as shown in
Equation 4.

$$P(t > \mathrm{MTTF}) = 70\% \quad \text{(success)}$$

Equation 3 – Probability of Success

$$P(t \le \mathrm{MTTF}) = 30\% \quad \text{(failure)}$$

Equation 4 – Probability of Failure

¹⁰ IEC standard 61511-1, page 54.
¹¹ Mean Time To Failure, or the expected life of a component. When the probability density function is exponential, it is given by $\mathrm{MTTF} = E(t) = \int_0^\infty t\, f_T(t)\, dt = \int_0^\infty t\, \lambda e^{-\lambda t}\, dt = \dfrac{1}{\lambda}$.


Figure 3 - P (t > MTTF) = 70%, P (t ≤ MTTF) = 30%

3.4.2 70% confidence limit for failure rate


Now that the IEC requirement has been clarified for the mean time to failure (MTTF), what is the
requirement for the failure rate (λ)?

Since the time to failure follows an exponential distribution, the mathematical relationship between the
mean time to failure and the failure rate (λ) is given by Equation 5.

$$\mathrm{MTTF} = \frac{1}{\lambda}$$

Equation 5 – MTTF Equation for Exponential Distribution
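Equation 5 (and footnote 11) can be checked numerically. The sketch below integrates t·λe^(−λt) with the trapezoidal rule and compares the result with 1/λ, then converts the mean value of 17 and the λmttf of 20.7399 failures per 10⁶ h from Example #1 (Section 4.3) into an MTTF in hours and years. The integration limit of 40/λ, the step count, and the function name `mttf_numeric` are arbitrary choices made for this illustration.

```python
import math


def mttf_numeric(lam: float, steps: int = 100_000) -> float:
    """Approximate MTTF = integral_0^inf t * lam * exp(-lam t) dt (trapezoidal rule).

    The integral is truncated at 40 / lam, where the neglected tail is negligible.
    """
    upper = 40.0 / lam
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * t * lam * math.exp(-lam * t)
    return total * h


HOURS_PER_YEAR = 8_760.0
for label, lam_per_million_h in (("mean", 17.0), ("lambda_mttf", 20.7399)):
    lam = lam_per_million_h * 1e-6            # failures per hour
    mttf_closed = 1.0 / lam                   # Equation 5
    mttf_num = mttf_numeric(lam)
    print(f"{label}: MTTF = {mttf_closed:,.0f} h "
          f"({mttf_closed / HOURS_PER_YEAR:.2f} years); numeric check {mttf_num:,.0f} h")
```

The numeric integral agrees with 1/λ, and the higher λmttf corresponds to a shorter MTTF (about 5.5 years instead of about 6.7 years), which is the conservative direction the IEC wording asks for.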



Because MTTF = 1/λ, a larger failure rate means a shorter mean time to failure, so a single-sided lower
confidence limit on the MTTF corresponds to a single-sided upper confidence limit on λ. Therefore, the IEC
standard appears to recommend that the failure rate be chosen so that there is at most a 30% probability
that the true failure rate exceeds it. In other words, the probability that the failure rate will be less than the
calculated value is 70% (Equation 6).

Success can therefore be defined as the event that the failure rate (λ) is smaller than the calculated value
(λMTTF). Consequently, the probability of success is 70%, as shown in Equation 6.

$$P(\lambda \le \lambda_{\mathrm{MTTF}}) = 70\%$$

Equation 6

Figure 4 - Probability (λ ≤ λMTTF) = 70%
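Equation 6 can also be checked by simulation. The sketch below assumes, as Section 4 does, that λ follows a Gamma distribution whose mean θ and standard deviation σ are the OREDA ball valve values used in Example #1 (θ = 17, σ = 12.47 failures per 10⁶ h). It draws samples with Python's `random.gammavariate`, which takes a shape and a *scale* parameter (so the scale here is σ²/θ), and counts how often λ ≤ 20.7399, the λmttf obtained in Example #1. The seed and sample size are arbitrary.

```python
import random

THETA = 17.0           # mean failure rate, failures per 10^6 h (Example #1)
SIGMA = 12.47          # standard deviation, failures per 10^6 h (Example #1)
LAMBDA_MTTF = 20.7399  # value obtained with Equation 7 in Example #1

shape = (THETA / SIGMA) ** 2   # alpha = theta^2 / sigma^2 (= beta * theta)
scale = SIGMA**2 / THETA       # 1 / beta, with beta = theta / sigma^2

rng = random.Random(2004)      # fixed seed for reproducibility
N = 200_000
below = sum(rng.gammavariate(shape, scale) <= LAMBDA_MTTF for _ in range(N))
fraction = below / N
print(f"P(lambda <= {LAMBDA_MTTF}) is approximately {fraction:.3f}")
```

The simulated fraction comes out close to 0.70, which is the probability statement of Equation 6.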



3.5 Conclusions
The IEC standard recommends that the failure rate be taken at a single-sided upper confidence limit that is
exceeded with a probability of at most 30%, whereas data sources such as OREDA provide other values, as
described in Section 3.3.

Figure 5 shows the data provided by OREDA: the mean value λθ, the upper confidence limit (B), and the
lower confidence limit (C). It also shows the value recommended by the IEC standard, λmttf. The
difference between λθ and λmttf is shown as Δλ.

Figure 5 – OREDA Values in Relation to λmttf

The following section provides a mathematical approach for deriving the failure rate recommended by the
IEC standard (λmttf) from the mean value (λθ) and the standard deviation (σ) given in the tables of the
Offshore Reliability Data (OREDA) handbook.



4.0 Calculations

4.1 General

The methodology in OREDA was used here to calculate the required λmttf, that is, the failure rate value for
which the corresponding mean time to failure can be demonstrated on a statistical basis to a single-sided
lower confidence limit of at least 70%.

The χ² distribution¹² was used in the OREDA handbook to model the probability density function of the
failure data. The Chi-Squared distribution is a special case of the Gamma distribution: if λ follows a
Gamma distribution with shape parameter (α) and rate parameter (β), then 2βλ follows a Chi-Squared
distribution with 2α degrees of freedom. The percentile values of the Chi-Squared distribution can be
found in the “New Cambridge Statistical Tables, 2nd edition”.

4.2 Calculation steps

The starting point for the calculation is the final estimate of the mean (θ) and the standard deviation (σ)
of the failure rate distribution. These values can be calculated as shown in OREDA 97, page 23. The
following steps then lead to the required λmttf:
Step 1 Calculate the Gamma parameters:
• β = θ/σ²
• α = β · θ
Step 2 Calculate the number of degrees of freedom = 2α.
Step 3 Interpolate $\chi^2_{2\alpha}(0.3)$, the value that a Chi-Squared variable with 2α degrees of
freedom exceeds with probability 0.3 (its 70th percentile), from the tables (Ref. 6).
Step 4 Calculate the upper 30% limit for λ using Equation 7:

$$\lambda_{\mathrm{mttf}} = \frac{\chi^2_{2\alpha}(0.3)}{2\beta}$$

Equation 7 – Recommended Calculation
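The four steps can be sketched in Python without a statistics package. This is an illustration, not the authors' implementation: instead of interpolating a printed table (Step 3), it computes the Chi-Squared percentile directly from the regularized incomplete gamma function (summed as a power series) and inverts it by bisection, so its results can differ from table-interpolated values in the third or fourth significant figure. All function names are hypothetical. With p = 0.70 it returns λmttf; evaluated at p = 0.95 for the Table 1 data, the same function reproduces OREDA's upper values to within rounding, which suggests those limits are the 95th percentiles of the same distribution.

```python
import math


def regularized_lower_gamma(a: float, x: float) -> float:
    """P(a, x) = gamma(a, x) / Gamma(a), summed from its power series.

    The series converges for all x >= 0 and is adequate for the modest
    shape parameters that arise from OREDA-style mean/standard deviation data.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
        if n > 100_000:
            raise RuntimeError("incomplete gamma series did not converge")
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x: float, dof: float) -> float:
    """Chi-Squared CDF with (possibly non-integer) degrees of freedom."""
    return regularized_lower_gamma(dof / 2.0, x / 2.0)


def chi2_percentile(p: float, dof: float) -> float:
    """Return x with P(X <= x) = p; chi2_2a(0.3) in the text corresponds to p = 0.70."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, max(1.0, dof)
    while chi2_cdf(hi, dof) < p:   # widen the bracket until it contains x
        hi *= 2.0
    for _ in range(100):           # bisection
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def failure_rate_percentile(theta: float, sigma: float, p: float) -> float:
    """p-th percentile of a Gamma-distributed failure rate with mean theta, std sigma."""
    beta = theta / sigma**2                # Step 1
    alpha = beta * theta                   # Step 1
    dof = 2.0 * alpha                      # Step 2
    chi2_value = chi2_percentile(p, dof)   # Step 3 (computed, not interpolated)
    return chi2_value / (2.0 * beta)       # Step 4; Equation 7 when p = 0.70


def lambda_mttf(theta: float, sigma: float) -> float:
    """Failure rate with a 70% probability of not being exceeded (Equation 6)."""
    return failure_rate_percentile(theta, sigma, 0.70)


if __name__ == "__main__":
    # Ball valve data from Example #1 (failures per 10^6 h)
    print(f"ball valve lambda_mttf = {lambda_mttf(17.0, 12.47):.2f}")
    # Gate valve data from Table 1: 95th percentile versus the OREDA upper value 27.35
    print(f"gate valve 95th percentile = {failure_rate_percentile(12.21, 7.87, 0.95):.2f}")
```

For the ball valve this gives a value close to the 20.7399 of Example #1; the small difference comes from the linear interpolation between tabulated χ² values used there.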

¹² Refer to http://mathworld.wolfram.com/Chi-SquaredDistribution.html for more information on the Chi-Squared distribution.



4.3 Example #1
Ball Valve Failure Rate

A simple example of the impact of the λmttf approach is shown below, where a λmttf of 20.7399 compares
with a mean (θ) failure rate of 17.00 (failure data taken from page 575 of OREDA 2002).

• θ = 17 failures/10⁶ h.
• σ = 12.47 failures/10⁶ h.
• β = θ/σ² = 17/12.47² = 0.1093.
• α = β · θ = 0.1093 × 17 = 1.8585.
• Number of degrees of freedom = 2α = 3.7170.
• $\chi^2_{3.7170}(0.3) = 4.5347$.
• $\lambda_{\mathrm{mttf}} = \chi^2_{2\alpha}(0.3)/(2\beta) = 4.5347/(2 \times 0.1093) = 20.7399$ failures/10⁶ h.
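The arithmetic of Example #1 can be reproduced in a few lines. The sketch below takes the χ² value of 4.5347 exactly as quoted in the example rather than computing it, so it checks only Steps 1, 2 and 4; note that it evaluates 2α as 2 × 1.8585 directly.

```python
theta = 17.0      # mean failure rate, failures per 10^6 h
sigma = 12.47     # standard deviation, failures per 10^6 h

beta = theta / sigma**2                  # Step 1: about 0.1093
alpha = beta * theta                     # Step 1: about 1.8585
dof = 2.0 * alpha                        # Step 2: about 3.7170
chi2_upper_30 = 4.5347                   # Step 3: value quoted in Example #1
lam_mttf = chi2_upper_30 / (2.0 * beta)  # Step 4, Equation 7

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}, 2*alpha = {dof:.4f}")
print(f"lambda_mttf = {lam_mttf:.4f} failures per 10^6 h")
```

The result agrees with the 20.7399 above to within the rounding of the quoted χ² value.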

The previous example shows that, to comply with the requirements in IEC 61511-1 when choosing the
failure rate, it is not sufficient simply to use the values provided by data sources.

The methodology described above allows the user to derive the failure rate from the data provided in
recognized industry sources such as OREDA.



4.4 Example #2
Assume that a propane tank has a SIL loop with three elements: a level sensor, a PLC, and a gate valve
(Figure 6).

Figure 6 – Simple SIL Loop

Using three different failure rate values (upper, mean, and λmttf) gives significantly different results.

Using the Mean or the Upper Values

Table 1 was compiled from data in the 1997 Offshore Reliability Data handbook. It shows the mean value,
the upper value, and the standard deviation of the failure rate for each component, which are used to
calculate λmttf and the probability of failure on demand (PFD).


Equipment      Mean value    Upper value   Standard deviation   Repair time   OREDA 97 page
               (per 10⁶ h)   (per 10⁶ h)   (per 10⁶ h)
PLC            134.83        276.33        74.71                1.1           282
Level sensor   6.27          11.2          2.67                 7.9           329
Gate valve     12.21         27.35         7.87                 30.8          377
Table 1 – λ Values Obtained from OREDA 1997

Using the Recommended Calculation

Table 2 shows the failure rates (λmttf) obtained for the Safety Instrumented System components using the
recommended calculation. These values comply with IEC 61511-1 and, more importantly, reflect a
reasonable interpretation of the data obtained from the OREDA handbook.

Equipment      β             α             2α            χ²₂α(0.3)     λmttf (per 10⁶ h)
PLC            0.024156225   3.256983784   6.513967567   7.823         161.9251369
Level sensor   0.879518579   5.514581492   11.02916298   12.93237093   7.351960058
Gate valve     0.197136118   2.407031995   4.814063991   5.843479904   14.82092672
Table 2 – λmttf Values Using Recommended Calculation
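Table 2 appears to have been produced by linear interpolation between integer-degree-of-freedom rows of a printed χ² table, as Step 3 describes. The sketch below reproduces it that way from the Table 1 means and standard deviations. The χ²(0.3) entries for 1 to 12 degrees of freedom are standard table values typed in for this illustration, and the names `CHI2_UPPER_30` and `chi2_upper_30_interpolated` are hypothetical.

```python
import math

# Upper 30% points chi2_n(0.3) (i.e. 70th percentiles) for integer degrees of freedom n
CHI2_UPPER_30 = {
    1: 1.074, 2: 2.408, 3: 3.665, 4: 4.878, 5: 6.064, 6: 7.231,
    7: 8.383, 8: 9.524, 9: 10.656, 10: 11.781, 11: 12.899, 12: 14.011,
}


def chi2_upper_30_interpolated(dof: float) -> float:
    """Linearly interpolate chi2_dof(0.3) between neighbouring integer table rows."""
    lower = math.floor(dof)
    frac = dof - lower
    if frac == 0.0 and lower in CHI2_UPPER_30:
        return CHI2_UPPER_30[lower]
    if lower not in CHI2_UPPER_30 or lower + 1 not in CHI2_UPPER_30:
        raise ValueError(f"{dof:.3f} degrees of freedom is outside the table")
    step = CHI2_UPPER_30[lower + 1] - CHI2_UPPER_30[lower]
    return CHI2_UPPER_30[lower] + frac * step


# Mean and standard deviation (failures per 10^6 h) from Table 1
TABLE_1 = {
    "PLC": (134.83, 74.71),
    "Level sensor": (6.27, 2.67),
    "Gate valve": (12.21, 7.87),
}

results = {}
for equipment, (theta, sigma) in TABLE_1.items():
    beta = theta / sigma**2                               # Step 1
    alpha = beta * theta                                  # Step 1
    chi2_value = chi2_upper_30_interpolated(2.0 * alpha)  # Steps 2 and 3
    results[equipment] = chi2_value / (2.0 * beta)        # Step 4, Equation 7
    print(f"{equipment:<13} 2a = {2 * alpha:7.4f}  chi2 = {chi2_value:8.4f}  "
          f"lambda_mttf = {results[equipment]:9.4f}")
```

The printed λmttf values match the last column of Table 2 to about three decimal places.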



Impact on Total SIL Loop PFD

Table 3 shows the PFD results obtained with SilCore™ software, assuming constant values for the test
interval, mean time to repair (MTTR), and diagnostic coverage. The total PFD using the upper λ values is
0.092556, compared with 0.054384 using λmttf and 0.045421 using the mean λ values.

Equipment      PFD using upper λ   PFD using λmttf   PFD using mean λ
PLC            0.0588              0.034887          0.029136
Level sensor   0.009794            0.00644           0.005495
Gate valve     0.023962            0.013057          0.01079
Total          0.092556            0.054384          0.045421
Table 3 – PFD Comparison

Conclusions

Using different values for the individual component failure rates gives significantly different results. Using
the recommended calculation, the PFD for the SIL loop is approximately 20% higher than using the mean
λ. This is consistent with expectations. It also means that Safety Instrumented System designers will need
to mitigate approximately 20% more risk if they use the recommended calculation instead of the mean λ.
However, the recommended calculation gives a PFD 41% lower than that obtained using the upper λ
values (0.054384 versus 0.092556).
Therefore, care should be taken when using failure rate data sources, especially by those seeking to meet
IEC 61511 requirements. The recommended calculation provides designers with a failure rate value that
complies with the IEC requirements.
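The two percentages quoted in these conclusions follow directly from the totals in Table 3, as this short check shows.

```python
# Total loop PFD values from Table 3
pfd_upper, pfd_mttf, pfd_mean = 0.092556, 0.054384, 0.045421

increase_vs_mean = (pfd_mttf / pfd_mean - 1.0) * 100.0     # percent above the mean-lambda PFD
reduction_vs_upper = (1.0 - pfd_mttf / pfd_upper) * 100.0  # percent below the upper-lambda PFD

print(f"lambda_mttf PFD is {increase_vs_mean:.1f}% higher than with mean lambda")
print(f"lambda_mttf PFD is {reduction_vs_upper:.1f}% lower than with upper lambda")
```

This gives about 19.7% and 41.2%, the "approximately 20%" and "41%" figures stated above.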



5.0 List of acronyms

Symbol              Explanation
λ                   The failure rate.
$f_T(t)$            The probability density function of the time to failure.
$F_T(t)$            The failure probability function (cumulative distribution function).
$R_T(t)$            The reliability function.
E(t), MTTF          The expected time to failure, the mean time to failure.
n                   Number of failures.
K                   Number of components.
δt, Δt              Time interval.
τ, t                Time (τ: aggregated time in service).
λmttf               Failure rate that corresponds to the single-sided lower confidence limit of 70% on the MTTF.
2α                  The number of degrees of freedom of the Chi-Squared distribution.
β                   The rate parameter of the Gamma distribution (β = θ/σ²).
$\chi^2_{2\alpha}$  The Chi-Squared distribution with 2α degrees of freedom.
θ                   The mean value of the failure rate.
σ                   The standard deviation of the failure rate.


6.0 References

1 IEC Standard 61508, 1998, IEC Publications.
2 IEC Standard 61511, 2003, IEC Publications.
3 Offshore Reliability Data, 1997, Det Norske Veritas, OREDA Publications.
4 Offshore Reliability Data, 2002, Det Norske Veritas, OREDA Publications.
5 Baker, M.J. (1996), “RISK MODELLING AND QUANTIFICATION”. Lecture No 2, Hazards Forum
Lecture Programme on “An engineer’s responsibility for safety”, Hazards Forum, London, 25-53.
6 New Cambridge Statistical Tables, 2nd edition, Cambridge University Press.
7 http://mathworld.wolfram.com/Chi-SquaredDistribution.html
