Welcome to Functional Safety Engineering I by exida.com.

This course covers material useful in the first parts of the safety
lifecycle process. It is focused on helping participants understand the
important methods and skills needed to support their organization's
functional safety efforts. Much of the material relates to the widely used
IEC 61508 and IEC 61511 standards for functional safety. The material is
also a useful review for the Certified Functional Safety Expert examination
administered by the CFSE Governance Board as approved by TUV.

1
2
3
4
This course has been developed over the last several years by a team of
experts in various aspects of functional safety engineering. Ongoing review
and improvements ensure that the material is up to date and reflects the
best current practices.

5
Before beginning instruction it is helpful for all of the participants to get
to know a little more about each other. This will assist the educational
process by allowing the instructor and participants to know where each
other's interests lie. It will also allow the instructor to develop examples
that relate directly to the industries and problems being faced by the
participants.

6
Over the duration of this course the participant will develop the following
skills.

7
8
This course is divided into eight sections covering the first parts of the
safety lifecycle.

9
10
This first section defines a safety instrumented system and describes its
purpose. Safety instrumented functions are defined and described.

11
12
13
14
15
16
During the analysis phase of the safety lifecycle, the objective is to identify
potential hazards and estimate their risk. If that risk is tolerable according
to corporate and regulatory standards, no further risk reduction is needed.
However, in many cases, the risk must be reduced. An analysis of all
existing layers of protection should be conducted as part of this
determination.

17
18
19
20
The main objective of IEC 61508 is to help ensure that functional safety is
achieved by following the basic safety lifecycle (SLC) procedure and by
maintaining the associated documentation. The essence of the SLC procedure
is to first analyze the situation and document the safety requirements. Then,
these requirements are translated into a documented safety system design,
using appropriate software and hardware subsystems and design methodology.
Next, the system is evaluated against the required integrity and reliability
specifications and is modified as needed. Finally, the system is operated and
maintained according to accepted procedures, and the results are documented
to ensure that performance standards are maintained throughout the system’s
life.
Equipment is usually certified – not the vendors.

21
22
IEC 61511 (draft) defines a Safety Instrumented System (SIS) as an
“instrumented system used to implement one or more safety instrumented
functions. A SIS is composed of any combination of sensor(s), logic
solver(s), and final element(s).”

There is no restriction as to what type of technology is used or the size of
the system.

23
IEC 61508 does not use the term Safety Instrumented System (SIS).
Instead it uses the term Safety Related System (SRS) to mean the same
thing. Many expect the 61508 standard to be updated to the newer term,
SIS.

24
Practitioners often prefer a more functional definition of SIS such as:

“A SIS is defined as a system composed of sensors, logic solvers and final
elements designed for the purpose of:

1) Automatically taking an industrial process to a safe state when
specified conditions are violated;

2) Permit a process to move forward in a safe manner when specified
conditions allow (permissive functions); or

3) Taking action to mitigate the consequences of an industrial hazard.”

A SIS is much like a basic process control system (BPCS) in that both have
sensors, logic solvers and final elements. But a SIS operates in a
completely different mode, and unique design and maintenance (or
mechanical integrity) requirements are needed.

25
Like other types of control systems, a Safety Instrumented System
typically consists of many safety loops, called safety instrumented
functions.

26
A safety instrumented function is defined by the IEC standards as a
“Function to be implemented by a SIS which is intended to achieve or
maintain a safe state for the process with respect to a specific hazardous
event.”

27
Thus an individual Safety Instrumented Function (SIF) is designed to first
identify the need and then act to bring the system to a safe state for each
hazard scenario. The effectiveness of the SIF is measured by its risk
reduction factor (often expressed as a Safety Integrity Level). The required
risk reduction is the difference between the process risk before a SIF and
the “tolerable level” of risk to be achieved for that process or piece of
equipment.
It is important to note that a SIF is an individual function and a SIS can
include multiple functions, so the SIL refers to each SIF rather than to the
entire safety instrumented system.
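
As a rough sketch of the idea above, the required risk reduction is commonly
expressed as a ratio (the risk reduction factor, RRF) of the unmitigated
event frequency to the tolerable frequency. The function names and example
frequencies below are hypothetical, and the RRF bands assumed here are the
low demand mode bands usually associated with SIL 1 through SIL 4 (10 to
100, 100 to 1,000, 1,000 to 10,000 and 10,000 to 100,000).

```python
def required_rrf(unmitigated_freq_per_year, tolerable_freq_per_year):
    """Required risk reduction factor as a ratio of event frequencies."""
    if unmitigated_freq_per_year <= 0 or tolerable_freq_per_year <= 0:
        raise ValueError("frequencies must be positive")
    return unmitigated_freq_per_year / tolerable_freq_per_year


def sil_for_rrf(rrf):
    """Map a required RRF to the assumed low demand SIL band.

    Returns 0 when the requirement is below the SIL 1 band (an assumption
    of this sketch) and None when it exceeds the SIL 4 band.
    """
    if rrf <= 10:
        return 0
    for sil, upper_rrf in ((1, 100), (2, 1_000), (3, 10_000), (4, 100_000)):
        if rrf <= upper_rrf:
            return sil
    return None


# Hypothetical scenario: the hazard is expected once every 20 years without
# the SIF, and the tolerable frequency is once every 10,000 years.
rrf = required_rrf(5e-2, 1e-4)
print(f"required RRF is about {rrf:.0f}, target SIL {sil_for_rrf(rrf)}")
```

In this hypothetical case the required RRF is about 500, which falls in the
assumed SIL 2 band. The result applies to that one SIF only, not to the SIS
as a whole.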

28
29
Each safety instrumented function is intended to protect against a
particular hazard via shutdown, permissive or mitigation functions such as:

• Shut down fuel supply to furnace;
• Supply emergency coolant to reduce extreme temperature;
• Open valve to relieve excessive pressure;
• Direct escaping liquid to waste handling system;
• Issue pre-recorded emergency message to response team.

Note that a SIF must include elements of detection, decision and action
to achieve or maintain a safe (or mitigated) state.

30
Also like a control system, SIS sensors measure relevant equipment and
process parameters. In the process industries these include pressure,
temperature, flow, level, gas concentrations, flame presence or other
similar measurements.
In machine safety, sensors detect operator intrusion into a dangerous
zone, release of a two hand control switch, and other similar physical
conditions.

31
A SIS has a logic solver. This is typically a special purpose PLC but can
also be a relay system or solid state logic. The controller reads signals
from the sensors, executes pre-programmed functions designed to
prevent or mitigate a potentially dangerous process hazard and takes
action by sending signals to final elements.

32
In the process industries, the final element in a SIF is often a remotely
actuated valve. Sometimes solenoid valves are used directly, as are
power relays, motors or other devices that do things like interrupt fuel
flow, vent high pressure gas, flood with cooling water or release inert gas.

As with sensors, final elements in a safety instrumented function handle
the same process materials and environmental conditions as a control
system and need to be designed with the same considerations for
materials, hazardous area classifications, and so forth.

33
The actual implementation of any single safety instrumented function may
include multiple sensors, signal conditioning modules, multiple final
elements and dedicated circuit utilities like electrical power or instrument
air.

34
The IEC 61508 safety lifecycle is one of the more comprehensive guidelines
for how to effectively achieve functional safety through the use of safety
instrumented systems.

35
This section defined a safety instrumented system and described its purpose.
Safety instrumented functions were defined and described next. The
differences between control systems and safety systems were then
discussed in the context of standards compliance, risk reduction and
failure modes. Finally the various laws, regulations and relevant standards
relating to safety systems were generally discussed before introducing the
safety lifecycle (SLC) concept.

36
This section focuses on the safety lifecycle process for achieving functional
safety. After identifying its objectives, this section describes the version of
the lifecycle put forward in the different standards. The section then
finishes with a more detailed discussion of the different phases of the
lifecycle.

37
A study of actual industrial accident causes done by the Health and Safety
Executive in the United Kingdom showed a number of different cause
categories. The most significant of these was poor safety function
specification at 44%. Other causes included “changes after commissioning
(including online changes)” at 21%, operation and maintenance errors at
15%, design and implementation errors at 15% and installation and
commissioning errors at 6%. There were problems in almost every activity
leading to an operational safety instrumented system. However, it was
apparent that better methods were needed in the front end of the
engineering process as well as the back end. It appeared that a “lifecycle
approach” was not being used.

38
39
The IEC 61508 standard lays out a nominal 16 step process which can be
divided into three main classifications. The analysis phases deal with
gathering background information to identify and specify the needs for the
system. The realization phases concern system design and fabrication,
while the operation phases deal with using and maintaining the system
properly during its operating life.

40
41
For a safety instrumented system designed to automatically protect an
industrial process, the safety lifecycle steps cannot begin until the
conceptual design for the process is complete. At that point the process is
examined for its potential hazards. The risk of each hazard is assessed by
estimating the likelihood or frequency of occurrence and the magnitude of
the consequence if it does occur. For those risks that need to be reduced,
safety requirements are created. Often the needed safety can be achieved
without a safety instrumented system. For those places where a SIS is
judged to be the best solution, a risk reduction target is defined with a
Safety Integrity Level (SIL). A description of the needed safety functions
along with all important information including the SIL is documented in a
safety requirements specification (SRS).

42
During the analysis phase of the safety lifecycle, the objective is to identify
potential hazards and estimate their risk. If that risk is tolerable according
to corporate and regulatory standards, no further risk reduction is needed.
However, in many cases, the risk must be reduced. An analysis of all
existing layers of protection should be conducted as part of this
determination.

43
Once the layers of protection are identified, an analysis can be done using
either qualitative or quantitative methods. The decision to use qualitative
versus quantitative methods is based on the severity of the consequences.

44
For those hazards where a SIS is required to achieve the necessary risk
reduction, an order of magnitude value called a safety integrity level (SIL)
is selected to specify that required capability.

45
The hazard identification and risk analysis process culminates in a
document called the safety requirements specification. For all safety
instrumented functions defined, certain information must be specified. This
should include the specific conditions sensed, the actions to be taken,
timing, maintenance and bypass requirements as well as any known
special requirements needed to properly reduce the risk.
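
One way to picture the per-function content of a safety requirements
specification is as a simple record. The sketch below is only an
illustration: the class name, field names and example values are all
hypothetical and are not taken from any standard or template.

```python
from dataclasses import dataclass, field


@dataclass
class SifRequirement:
    """Hypothetical record of what the SRS states for one SIF."""
    tag: str
    sensed_condition: str          # the specific condition sensed
    action: str                    # the action taken to reach the safe state
    target_sil: int                # risk reduction target for this function
    response_time_s: float         # timing requirement
    maintenance_requirements: str
    bypass_requirements: str
    special_requirements: list = field(default_factory=list)

    def missing_items(self):
        """Return the names of required text items that were left empty."""
        required = ("sensed_condition", "action",
                    "maintenance_requirements", "bypass_requirements")
        return [name for name in required if not getattr(self, name).strip()]


# Hypothetical example entry with the bypass requirements not yet written.
entry = SifRequirement(
    tag="SIF-001",
    sensed_condition="High pressure in vessel",
    action="Close feed valve",
    target_sil=2,
    response_time_s=5.0,
    maintenance_requirements="Proof test at each turnaround",
    bypass_requirements="",
)
print(entry.missing_items())
```

A completeness check like `missing_items` is only a convenience; it does not
replace review of the specification by the hazard analysis team.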

46
47
The realization phase begins with conceptual design of the safety instrumented system
based on the Safety Requirements Specification. The desired technology is chosen for
the sensors, logic solver and final elements.
Once the technology is chosen, redundant device configurations or architectures
are often selected based on experience in safety instrumented system design. There are
several different architectures that can be used depending on the performance of the
individual components and the needs of the system. They commonly have names like
1oo1, 1oo2, 2oo3 and 1oo2D. The 2oo3 or “two out of three” configuration means that
three elements are present and two of those three devices must indicate a trip in
order for that trip to be signalled.
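
The voting idea behind names like 1oo2 and 2oo3 can be sketched in a few
lines. This is a minimal illustration with a hypothetical function name. It
covers plain M-out-of-N voting only and does not model the diagnostics that
distinguish an architecture such as 1oo2D.

```python
def moon_trip(channel_trips, m):
    """Return True when at least m of the N channels indicate a trip."""
    votes = list(channel_trips)
    n = len(votes)
    if not 1 <= m <= n:
        raise ValueError("m must be between 1 and the number of channels")
    return sum(1 for vote in votes if vote) >= m


# 2oo3: two of the three channels must indicate a trip.
print(moon_trip([True, False, True], 2))    # True
print(moon_trip([True, False, False], 2))   # False
# 1oo2: either of the two channels is enough.
print(moon_trip([False, True], 1))          # True
```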

48
Conceptual design begins with identification of the equipment to be used in
the SIS. The criteria used to select equipment for process control (such
as materials, accuracy and environmental conditions) apply fully to
safety applications as well. In addition, failure rate data should be
available to assist in the design.
For equipment certified to a particular SIL, obtain the equipment’s
Safety Manual. That manual includes essential information for proper
application of the equipment. For equipment not safety certified, the user
is responsible for proper application.

Instructor’s Note: For application assistance on particular equipment,
contact the process safety engineer.

49
Often the designer will specify a redundant architecture to achieve a higher
level of safety or equipment availability. Different redundant architectures
can achieve fault tolerance against different failure modes. A detailed
treatment of redundant architectures is presented in a separate set of
lessons.

50
If testing is automatically performed by the safety equipment, the test
interval is determined by the manufacturer or when the equipment is
set up. Normally for on-line automatic testing, this interval is short, on
the order of seconds or minutes. For more involved off-line testing,
the requirement is normally to provide periodic proof testing during
turnaround. In that case the time period between turnaround shutdowns
is the target proof test interval.

51
If the target SIL is not achieved by the initial design, different technology, different
levels of redundancy, or different test philosophies must be chosen. The revision
process iterates until an acceptable conceptual design is achieved.

52
Tools such as simplified equations, fault trees or Markov models are used to calculate
whether the system achieves its required SIL using data obtained from various sources.
Failure rate databases that are product and application specific are best. In some
cases, the manufacturer’s failure data is a good source. The end result of
the analysis is a set of reliability and safety metrics that are used to verify that the
requirements have been met.
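
As an illustration of the simplified equation approach, a commonly used
approximation for a single (1oo1) device in low demand mode is
PFDavg = λDU × TI / 2, where λDU is the dangerous undetected failure rate
and TI is the proof test interval. The sketch below assumes a constant
failure rate, a perfect proof test and no credit for diagnostics or repair
time. The failure rate is a hypothetical value, and the function names are
illustrative only.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_1oo1(lambda_du_per_hour, test_interval_hours):
    """Simplified average probability of failure on demand, 1oo1 device."""
    return lambda_du_per_hour * test_interval_hours / 2


def sil_band(pfd_avg):
    """Low demand SIL band covering a PFDavg value (0 means below SIL 1)."""
    if pfd_avg >= 1e-1:
        return 0
    if pfd_avg >= 1e-2:
        return 1
    if pfd_avg >= 1e-3:
        return 2
    if pfd_avg >= 1e-4:
        return 3
    return 4  # capped here, since SIL 4 is the highest band


# Hypothetical device: 1e-6 dangerous undetected failures per hour,
# proof tested once per year.
pfd = pfd_avg_1oo1(1e-6, HOURS_PER_YEAR)
print(f"PFDavg ~ {pfd:.2e}, SIL {sil_band(pfd)} band")
```

With these assumed inputs the PFDavg is about 4.4 × 10⁻³, which lies in the
SIL 2 band. The PFDavg band is only one part of verification; a real
assessment also considers factors such as architectural constraints.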

53
If the proposed design does not meet the SIL requirement, a number of options are
available to achieve the required SIL:
1. The hazard review team could be asked to re-evaluate the SIL requirement based
on suggestions for additional layers of protection or some other basis.
2. The periodic proof test interval could be reduced. Often this will result in the need
for some on-line test facility if the process cannot be shut down for each test.
3. Better equipment can be chosen in terms of safety ratings: lower dangerous
failure rates or better automatic diagnostics are needed to achieve a higher SIL.
4. Additional redundancy can be added to the SIF, which will also help achieve a
higher SIL.
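
To give a feel for options 2 and 4 in the list above, the sketch below
compares a single device, the same device with half the proof test interval,
and a redundant 1oo2 pair. It uses the common simplified approximations
λDU × TI / 2 for 1oo1 and (λDU × TI)² / 3 for 1oo2. The failure rate is
hypothetical, and the 1oo2 formula here ignores common cause failures, which
in practice often limit the benefit of redundancy.

```python
HOURS_PER_YEAR = 8760


def pfd_1oo1(lambda_du, ti_hours):
    """Simplified PFDavg for a single device."""
    return lambda_du * ti_hours / 2


def pfd_1oo2(lambda_du, ti_hours):
    """Simplified PFDavg for two identical devices, common cause ignored."""
    return (lambda_du * ti_hours) ** 2 / 3


lambda_du = 1e-6  # hypothetical dangerous undetected failures per hour

baseline = pfd_1oo1(lambda_du, HOURS_PER_YEAR)
shorter_interval = pfd_1oo1(lambda_du, HOURS_PER_YEAR / 2)
redundant = pfd_1oo2(lambda_du, HOURS_PER_YEAR)

for label, value in [("1oo1, yearly test", baseline),
                     ("1oo1, six-month test", shorter_interval),
                     ("1oo2, yearly test", redundant)]:
    print(f"{label}: PFDavg ~ {value:.2e}")
```

Under these assumptions, halving the test interval halves the PFDavg, while
adding a second device lowers it by roughly two orders of magnitude.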

54
When the conceptual design is complete, the detailed design work, including wiring
drawings, installation planning and programming of the logic solver, is done. As in
normal controls practice, this typically must be recorded in a detailed design
document.
The realization phase ends with the system installation, commissioning and startup
acceptance testing where the design verification is completed.

55
The operation phase of the safety lifecycle begins with a validation of the
design. This validation must answer the following questions:

• Does the system solve the problems identified during the hazard analysis?
• Have all necessary design steps been carried out successfully?
• Has the design met the target SIL for each safety instrumented function?
• Have the maintenance procedures been created and verified?
• Is there a management of change procedure in place?
• Are operators and maintenance personnel qualified and trained?

The answers to these questions must be acceptable before proceeding into
startup and operation.

56
The validation task in the operation phase of the safety lifecycle is
especially important. It is here that all the earlier safety lifecycle
activities are reviewed to ensure that the right steps were carried out and
that the documentation is in place. Specific testing is also done to confirm
that the SIS functions according to design requirements. IEC 61511 states
that factory acceptance testing (FAT) may be a part of the validation
activities.

57
Periodic proof testing is done per the schedules and methods established
during the conceptual and detailed design steps. This is done to ensure the
system is as safe on the day before decommissioning as it was when it was
brought on stream.

58
Periodically during a system’s operating life the hazards should be
reviewed. If any changes are identified from these reviews or if system
changes are made for other reasons, these changes should be made
according to the safety lifecycle, picking up at the step most appropriate to
the change under consideration.

59
60
61
62
The picture above shows the result of a risk assessment performed by Shell for
an existing Hydrogen Manufacturing Unit.
49% of all safety functions were too safe (over-engineered).
4% of the safety functions were under-engineered, which means that they were
not safe enough.

63
The NAM gathered the results of risk assessments of 7 different existing
plants. The classification of the 5319 safety functions in these 7 plants shows
results that are approximately similar to the results of the Hydrogen
Manufacturing Unit that was assessed by Shell.
In the overall results of the NAM,
37% of all safety functions were too safe (over-engineered).
6% of the safety functions were under-engineered, which means that they were
not safe enough.

64
An additional component of the standards that should be mentioned here
is that of personnel competency. All those individuals responsible for
safety lifecycle tasks must be competent in those tasks.

65
One way to demonstrate and ensure that competence is through the
Certified Functional Safety Expert program.

66
The CFSE has several specific requirements.

67
The CFSE is offered in several different specialties.

68
69
This section focused on the safety lifecycle process for achieving functional
safety. After identifying its objectives, this section described the version of
the lifecycle put forward in the different standards. The section then
presented a more detailed discussion of the different phases of the
lifecycle. It finished with a discussion of the requirements for personnel
competency.

70
This section reviews the basic principles of risk management and how they
apply to safety instrumented systems. It begins with a specific definition of
risk and develops the idea of a non-zero risk tolerance. The section then
details ways to properly measure risk and moves on to characterize the
level of risk reduction needed to reduce the existing level of risk to a level
that is tolerable. Finally the section covers how risk considerations fit into
the safety lifecycle process.

71
The definition of risk includes components of likelihood and consequence,
which both contribute to the risk in any given situation. Risk events often
have multiple consequences that cause harm in multiple areas. In safety
engineering, these areas are referred to as receptors.

72
Organizations have legal, moral, and financial responsibilities to limit the
risks that their processes pose to workers and neighbors. Understanding
the principles an organization already uses (stated and un-stated) should
be the basis for developing a safety strategy. The strategy is more likely to
be used and achieve results if it is consistent with the way that organization
works. The amount of risk that is tolerable is a function of the laws and
values of the community, the magnitude and type of consequences, and
the amount of resources required for risk reduction projects. The balance
between these will vary with the company, industry and culture of the
surrounding community.

73
Humankind has always engaged in activities that involve some amount of
risk, particularly if an activity has yielded a benefit. Almost all activities
involve some degree of risk and, in nearly every case, the only way to
reduce the risk to zero is to avoid the activity altogether. In the extreme,
the only way to avoid death is to never be alive. The objective of risk
management is to effectively balance an activity’s benefits against its risks
based on a knowledge of the trade offs that exist between them.

74
Risk reduction engineering is complicated by the wide variety of ways in
which the magnitude of a risk’s consequences or forms of harm can be
expressed. This complication makes it difficult to compare different
receptors of that harm. For example, how severe is a personnel injury or
death when compared to varying levels of business interruption or
equipment damage? While it may be difficult or legally awkward to
compare these different receptors, their relative importance must be
qualitatively determined in order to identify an intelligent balance between
different potential risks.

75
The receptors of an event’s harmful consequences and the nature of those
consequences will determine how that risk is expressed. If the
consequence of an unwanted event is dangerous to humans, the risk
might be measured in potential number of fatalities or injuries. If the
consequence does not involve human injury, the risk might be expressed
in terms of amount of spilled material or financial loss.
Risks to humans are often further subdivided into the risk faced by an
individual and the risk to society posed by events that will cause multiple
persons to be injured or killed.

76
Individual risk of fatality is one of the most frequently used measures of
risk in process plants. Individual risk targets are frequently expressed in
terms of chance of fatality per year. The UK HSE Tolerability of Risk
framework has established individual risk of fatality limits of 1 × 10⁻⁵ per
year as a de minimis level, or the level below which risk is so small as to
be considered negligible, and 1 × 10⁻³ per year as the de manifestus level,
or the level above which risk is not tolerable under any circumstances.
In between these bounds, there is a region where the typical practice is to
reduce the risk to a level as low as is reasonably practicable. This is based
on weighing the costs of risk reduction versus the amount of risk reduction
that can be achieved.
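
The three regions described above can be expressed as a simple
classification. This is a minimal sketch using the two limits exactly as
quoted in the text. The function and region names are hypothetical, and an
organization's own criteria may use different limits.

```python
DE_MINIMIS = 1e-5      # per year, as quoted above: negligible below this
DE_MANIFESTUS = 1e-3   # per year, as quoted above: intolerable above this


def tolerability_region(individual_risk_per_year):
    """Classify an individual risk of fatality into one of three regions."""
    if individual_risk_per_year < 0:
        raise ValueError("risk cannot be negative")
    if individual_risk_per_year > DE_MANIFESTUS:
        return "intolerable"
    if individual_risk_per_year < DE_MINIMIS:
        return "negligible"
    return "reduce as low as reasonably practicable"


for risk in (1e-6, 1e-4, 1e-2):   # hypothetical example values
    print(f"{risk:.0e} per year: {tolerability_region(risk)}")
```

Only the middle region calls for the cost versus benefit weighing described
above; the outer two regions are decided by the limits alone.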

77
ALARP’s framework is based on “maximizing utility,” combined with limits placed on it such that “no
one is made considerably worse off.” The philosophical principle behind ALARP can be stated as:
“Maximize the expected utility of your investment, but do not expose anyone, whether your workers or
neighbors, to an excessive increase in risk.”

78
79
This approach is good in that it addresses all forms of harm and uses precise numbers.
It may face some difficulty in certain environments because it is so precise.
Consider an opposing attorney asking why you think it is acceptable to kill 5 people at
your 1000 person plant every 10 years. Also, the financial number is probably
somewhat high.
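
The attorney's question is simple arithmetic applied to an individual risk
figure. A minimal sketch, assuming every person at the plant carries the
same individual risk and that risk simply adds up across people and years
(both are simplifying assumptions, and the function names are hypothetical):

```python
def implied_individual_risk(fatalities, population, years):
    """Individual risk per year implied by a fatality count."""
    return fatalities / (population * years)


def expected_fatalities(individual_risk_per_year, population, years):
    """Expected fatalities if everyone carries the same individual risk."""
    return individual_risk_per_year * population * years


# Figures from the example above: 5 people, 1000 person plant, 10 years.
risk = implied_individual_risk(5, 1000, 10)
print(f"implied individual risk: {risk:.0e} per year")
print(f"expected fatalities: {expected_fatalities(risk, 1000, 10):.1f}")
```

Stated as a per-person, per-year number the figure looks small, which is why
the same criterion can sound very different when restated as a body count.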

80
This approach is good in that it gives precise numbers while also allowing flexibility to identify
a practical level of risk reduction within the moderate risk zone. Also,
especially for the United States, it does not give any official acceptance of
fatalities from a legal perspective. With this in mind, it is important that the
limits of practical risk reduction be well understood by the SIL selection
workshop team so that they can use their best judgment to properly assign SIL for high
consequence events. It is also good in that it easily lends itself to later SIL selection
work.
It is bad in that it does not include other forms of harm aside from personnel
safety, although equivalent matrices may be developed for the other forms of
harm.

81
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 1. The solution to Application
Exercise 1 can be found in the additional resources section of this training
module.

82
The objective of the safety lifecycle process is to reduce risk. Inherent risk is
defined as the amount of risk in a completed process design resulting from
given quantities of materials and given process parameters. For example, a
process involving a large storage tank of highly toxic phosgene would have
a much higher inherent risk than a similar process using a smaller quantity
of a less hazardous species.

83
The design objective of a safety instrumented system is to reduce the risk
of any process hazard from a region known as the ‘unacceptable risk
region’ to the ‘tolerable risk region.’ The inherent risk of a process from a
consequence perspective is fixed once the process design is fixed.
Inherent risk takes no credit for protective measures such as safety
instrumented systems, relief devices, etc.

NOTE: ALARP (As Low As Reasonably Practicable). For more details on the
application of this term see exida.com on-line lesson – ALARP.

84
Geographic risk is a measure of the probability that an event will occur in a
specific geographic location. Geographic risks typically are shown by
drawing lines of constant risk, or risk isopleths, on a process plot plan as
shown in the drawing above.
NOTE: In this diagram, inside the inner black circle, an event that will
cause fatality will occur with a frequency of 10⁻³ per year or more. Inside
the middle red line, fatal events will occur with a frequency of 10⁻⁴ per
year or more. Outside the outer green line, the risk is below the common
tolerance level; in this case with a frequency of occurrence of 10⁻⁵ per
year or less.
Geographic risk criteria are extremely useful for purposes such as facility
siting, but have found little use in the task of risk reduction design
engineering. As with individual risk criteria, geographic risk criteria do not
weigh multiple fatality incidents more strongly than single fatality
incidents.

85
86
Where possible, it is desirable to reduce the inherent process risk by
modifying the process. Consequences can be reduced by lowering
quantities of materials or building physical protection. Likelihood can be
reduced by reviewing methods used to control the process as well as any
means to recover from upsets, such as alarms.

87
If this reduction in the likelihood and consequence still leaves the
estimated process risk too high, safety instrumented functions are often
designed to reduce risk further. Answer to “what is wrong”: there is no
reason to use SIL 2 or 3 if SIL 1 already gets to an acceptable risk level.

88
The IEC and ISO have both issued relevant standards for risk
management. These relate to general technical and environmental risk
management as well as to specific risk reduction safety systems. There are
numerous government regulations specific to each country and
jurisdiction. The regulations and standards listed here are just some
examples of what exists. Thus it is usually helpful to search a specific
jurisdiction for what is applicable for a given project.

89
This general risk management (RM) flow sheet, based on the Australian
risk management standard AS/NZS 4360, shows a typical risk management
process. It is generally similar to the RM processes outlined in other
standards. Although the names of the steps may be different, the
functions of identification, characterization, action, review, and
communication are common to essentially all accepted procedures.
This general concept has been used as the basis of the safety lifecycle.

90
Thus the primary features that make up the safety lifecycle work together
to support the overall process of reducing risk to a tolerable level.

91
This section reviewed the basic principles of risk management and how
they apply to safety instrumented systems. It began with a specific
definition of risk and developed the idea of a non-zero risk tolerance. The
section then detailed ways to properly measure risk and went on to
characterize the level of risk reduction needed to reduce the existing level
of risk to a level that is tolerable. Finally the section covered how risk
considerations fit into the safety lifecycle process.

92
This section begins with some basic rules of probability. Then it develops
several important definitions regarding events before developing the
concepts of probability multiplication and addition. The section then
finishes with some exercises and a treatment of the fault tree analysis
method.

93
The probability of an event is determined by one of two methods.
Probability can be assigned based on physical properties. For instance,
the geometry and physical shape of a die determine the probability of
one particular side facing up after the die is rolled.
The second way probability is determined is by analyzing the outcome of
experimental trials. When using experimental outcomes, the probability of
an event is determined by dividing the number of times that the event of
interest occurs by the total number of trials.
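The experimental method can be sketched in a few lines of Python. This is only an illustration: the list of coin tosses is made-up data, and the function name relative_frequency is an assumption introduced here, not part of the course material.

```python
from fractions import Fraction

def relative_frequency(outcomes, event):
    """Estimate P(event) as (times the event occurred) / (total trials)."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    hits = sum(1 for outcome in outcomes if outcome == event)
    return Fraction(hits, len(outcomes))

# Hypothetical record of ten coin tosses (illustration data only).
tosses = ["H", "T", "H", "H", "T", "T", "H", "T", "H", "T"]
p_heads = relative_frequency(tosses, "H")
print(p_heads, float(p_heads))
```

With so few trials the estimate is rough. The estimate becomes more trustworthy as the number of trials grows.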
Probability is a real number that always lies between zero and one.
A probability of zero
When a probability of zero is assigned, it means that the event is never
expected to happen. It does not mean that the event cannot happen.
Consider a coin toss. A probability of 0.5 is assigned to heads and a
probability of 0.5 is assigned to tails. A probability of zero is assigned to
the edge of the coin. Once when teaching this material one of the
instructors tossed a coin and it landed on edge. There was a hardwood
floor with cracks. The coin landed in a crack and stuck there, on edge. It
can happen. The challenge is that when working with low probabilities,
these kinds of assumption errors can matter a lot and will often dominate
the result.

94
Often, Venn diagrams are used to express probability in a graphical
manner. A rectangle is used to show all possible “outcomes,” which have
a total probability of one. A subset area inside the rectangle is used to
represent a possible outcome; in this case the event is designated E. The
size of the subset area is proportional to the probability.

95
In this example Venn diagram, two possible outcomes are shown. They
are marked “H” and “T.” Each occupies one half of the total area so it is
clear that a probability of 0.5 has been assigned to each. What
“experiment” does this represent? A coin toss.

96
Venn diagrams can effectively show probability estimates. In this example
one can quickly see that the probability of a software failure is roughly half
of the total.

97
Twenty objects exist in a container. If they are counted, sixteen are gold
in color. Based on that experimental observation, we can assign a
probability of 0.8 (16/20) that an object randomly selected will be gold.
This can be seen on the Venn diagram. In another count, it is determined
that fifteen of the objects are round marbles. Based on this data we can
assign a probability of 0.75 (15/20) that a randomly chosen object will be
a round marble.

98
Experimental outcomes or “events” have properties. When one outcome
does not affect another, those events are called independent. Events can
be independent in parallel or in series. In parallel, as when flipping two
coins, one outcome does not affect the other. In series, as when flipping the
same coin twice, the first toss does not affect the second. It is often assumed
that equipment failures in SIS design are independent.
When a situation exists where there are only two possible outcomes and only
one can occur, the events are called complementary. In reliability
engineering, success and failure are complementary.
When a situation exists where more than two outcomes may occur, the
events are called mutually exclusive if only one can occur. The toss of a
fair die is an example. There are six possible outcomes but only one will
occur. Failure events are rarely mutually exclusive. However, failure
modes are considered mutually exclusive.
Events
Can independent events be mutually exclusive? The answer comes from
the definitions. With mutually exclusive events, one outcome assures that
the others cannot happen. With independent events, one outcome has no
effect on another. Two events that can each actually occur cannot be both
independent and mutually exclusive.

99
As stated, independent events have no effect on each other. A classic
example is a coin toss. If one coin lands heads, how could that have any
effect on the second coin? Realistically it does not. Another example is
the dice toss. Each die provides a result independent of the other.
What about component failures? Often we consider failures to be
independent. If one component fails due to a stress, normally no other
component would fail. Therefore it is reasonable to assume that
component failures are independent. Of course, it is possible that one
failure would create a stress in another component, so the assumption of
independence must be applied with care.

100
Complementary events occur when there are exactly two possible outcomes and
they are mutually exclusive. If one event occurs, the other will not. The
outcome of a coin toss is complementary. Success and failure are
complementary. If a system is operating successfully, it has not failed.
With complementary events, the probability of one event equals one minus
the probability of the other. If the probability of successful operation for
the next year equals 0.8, what is the probability of failure?
1 - 0.8 = 0.2

101
When more than two outcomes are possible and only one will occur, they are
mutually exclusive. If one die is tossed, the outcome is mutually exclusive.
If a pair of dice is tossed, the outcome is also mutually exclusive. If the
outcome is a 7, it will not be an 11.
Are mutually exclusive outcomes complementary? No.
Are complementary outcomes mutually exclusive? Yes.

102
When there is a logical relationship between events, the probabilities of
event combinations can be calculated. For two independent events, the
probability of getting event A AND event B equals the probability of A
times the probability of B. This is represented in a Venn diagram as the
overlap between area A and area B.

103
As an example, consider a system with a limit switch and a solenoid valve.
The system is successful if both the limit switch AND the solenoid valve are
successful. If the probability of successful operation for the next year
equals 0.9 for the limit switch and 0.98 for the solenoid valve, what is the
probability of successful operation for the combined system?

104
The system is successful if both the limit switch AND the solenoid valve are
successful. Probability of system success = 0.9 * 0.98 = 0.882 based on
the events being fully independent.
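The same calculation can be written as a short Python sketch. The function name p_all_succeed is an assumption introduced for illustration, and the result is only valid if the events really are independent.

```python
def p_all_succeed(*probabilities):
    """Probability that every independent event occurs (probability multiplication)."""
    result = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
        result *= p
    return result

p_limit_switch = 0.9
p_solenoid_valve = 0.98
p_system = p_all_succeed(p_limit_switch, p_solenoid_valve)
print(round(p_system, 3))  # 0.882
```

Note that the combined system is less likely to succeed than either component alone, because both must work.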

105
When the logical relationship between two events is represented by an OR
function, the probability of the combination of two mutually exclusive
events can be calculated by adding the individual probabilities. If, for
instance, there are two outcomes, A and B, that are mutually exclusive, then
the probability of A or B occurring is PA plus PB. Mutually exclusive events
show no overlap on a Venn diagram.

106
For example, if one fair die is rolled what is the probability of getting a 4 or
a 6? These two outcomes are mutually exclusive so the probabilities of
each can be added. The result is then 2/6 or 0.33.

108
If a pair of fair dice are thrown once, what is the probability of getting a 7
or a 9? Since the outcomes are mutually exclusive, the individual
probabilities are added. Of the thirty-six possible outcomes, there are six
combinations that give a seven and four combinations that give a nine.
P(7 OR 9) = 6/36 + 4/36 = 10/36.
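The counting argument can be checked by listing all thirty-six outcomes. The sketch below is one possible way to do that in Python; the helper name p_sum is an assumption introduced here.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes for a pair of fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def p_sum(total):
    """Probability that the two dice add up to `total`."""
    ways = sum(1 for a, b in outcomes if a + b == total)
    return Fraction(ways, len(outcomes))

# One roll cannot total both 7 and 9, so the two probabilities simply add.
p_7_or_9 = p_sum(7) + p_sum(9)
print(p_sum(7), p_sum(9), p_7_or_9)
```

Fraction reduces its results automatically, so 10/36 is printed in lowest terms as 5/18.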

109
When the events are not mutually exclusive, the situation becomes more
complex. Consider the Venn diagram of Event A and Event B such that A
and B are not mutually exclusive. The total probability of each event is
represented by the area of each circle. The probability that either event
occurs would be represented by the area of both circles. Since the events
are not mutually exclusive there is an overlap between the two sets. This
overlap situation represents both events occurring simultaneously (A AND
B). If the probability of A OR B is calculated using simple addition, then
the overlap section will be counted twice! To correct this error we must
subtract the area of the overlap. For independent events, the area of the
overlap is calculated using probability multiplication. The final equation
that is used for probability addition of non-mutually exclusive events is:

P(A or B) = P(A) + P(B) – P(A and B)

and, when A and B are independent, P(A and B) = P(A)*P(B), so that

P(A or B) = P(A) + P(B) – P(A)*P(B)

110
Consider an example where a sack contains 100 objects. The shape is
either a round marble or a square block. The color is either red or gold.
What is the probability that a randomly selected object will be either a
marble or gold?

111
The events MARBLE and GOLD are not mutually exclusive because it is
possible to withdraw an object that is both a marble AND gold. Thus, the
non-mutually exclusive form of probability addition is used.
P(M or G) = 0.75 + 0.8 – (0.75 * 0.8) = 0.95

112
Another view of the problem is useful. What is the probability of not
getting an object that is either gold or a marble? That happens if the
object is a red block. That probability is 0.2 * 0.25 = 0.05. Using the rule
of complementary events, the probability of getting a gold object or a
marble is therefore 1 - 0.05 = 0.95.
A red block
The probability of getting a red object equals one minus the probability of
getting a gold object since these outcomes are complementary - we get
either red or gold. That equals 1 - 0.8 = 0.2.
The probability of getting a block equals one minus the probability of
getting a marble since these outcomes are complementary - we select
either a marble or a block. That equals 1 - 0.75 = 0.25.
The probability of getting an object that is red and a block equals 0.2 *
0.25 = 0.05. That is a five percent chance of getting a red block.
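Both routes to the answer can be compared in a short Python sketch. The variable names are assumptions introduced for illustration. As in the worked example, the multiplication steps assume that shape and color are independent.

```python
p_marble = 0.75
p_gold = 0.8

# Addition rule for events that are not mutually exclusive. The overlap term
# uses multiplication, which assumes shape and color are independent.
p_marble_or_gold = p_marble + p_gold - p_marble * p_gold

# Complement route: the only object that is neither gold nor a marble
# is a red block.
p_red_block = (1 - p_gold) * (1 - p_marble)
p_via_complement = 1 - p_red_block

print(round(p_marble_or_gold, 2), round(p_via_complement, 2))
```

The two results agree, which is a useful cross-check when working a problem by hand.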

113
What happens when there are three independent events? The situation
gets more complicated and the A or B or C equation is shown in the slide.
In this case, the probabilities of each event are first added together. Then
probabilities of each combination of two events at a time are subtracted.
Finally the probability of all three events is added. Why is this?

114
The objective is to calculate the area represented by event A or B or C. First
start by adding the probabilities of event A plus event B. There is too
much area as shown by the green zone of double counting. The area of A
AND B must be subtracted as was the case for the two event situation
considered earlier.

115
When the area of A AND B is subtracted, the extra area is gone.

116
If the area of C is added, too much area is again included as shown in
green. This must be removed.

117
Removing the area of A AND C is a partial solution but too much area still
remains.

118
When the area of B AND C is removed, the result is too small.

119
When the area of A AND B AND C is added back in, the result is finally
correct.

120
A general form of the equation using the complementary event rule can
also be used for probability addition of three independent events.
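As a rough check, the two approaches can be compared in Python. The sketch assumes the three events are independent; the function names and the input probabilities are hypothetical, chosen only for illustration.

```python
def p_or_inclusion_exclusion(pa, pb, pc):
    """P(A or B or C) for independent events: add singles, subtract pairs, add the triple."""
    return pa + pb + pc - pa * pb - pa * pc - pb * pc + pa * pb * pc

def p_or_complement(*probabilities):
    """P(at least one of several independent events) = 1 - P(none of them occurs)."""
    p_none = 1.0
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none

# Hypothetical input probabilities, chosen only for illustration.
pa, pb, pc = 0.1, 0.2, 0.3
print(p_or_inclusion_exclusion(pa, pb, pc), p_or_complement(pa, pb, pc))
```

The complement form extends to any number of independent events without the growing list of correction terms.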

121
Try these problems. The answers are covered on the following slides.

122
The probability of getting a four on each die is 1/6. The probability of
getting a four on one AND a four on the other equals 1/6 * 1/6 = 1/36.

123
The probability of getting any specific number on both dice is 1/36. There
are six ways to get the same number, each mutually exclusive. Therefore,
the probability of getting the same number on both dice is 1/36 + 1/36 +
1/36 + 1/36 + 1/36 + 1/36 = 1/6.

124
The answer is 0.5. While one may argue with the assumption that birth
probabilities are independent events, this is generally assumed. Therefore
previous events do not affect future events.

125
Thinking about a variation on the old “Russian Roulette” problem, the
probability of an incident in the first year can be assigned based on
physical construction. The assigned value is 1/6.
What is the probability over an interval of three years? A natural answer is
3/6. Is this correct?
What about an interval of ten years? A natural answer is 10/6. But this is
not a valid probability since it is greater than 1! An answer of 10/6 cannot
be right.
As is usually the case, the trick is in the definitions not the math. Applying
the independent event definition is the key to this problem.

126
What is the probability over an interval of three years? One approach is
to calculate the probability of not having an incident in one year. This is
a complementary event which equals 5/6. An incident does not occur in
three years only if there is no incident in year one AND year two AND year
three. That probability of no incident is 5/6 * 5/6 * 5/6 = 0.579.
The probability of an incident is therefore 1 - 0.579 = 0.421.
Following a similar approach for a period of ten years, the probability
equals 1 - (5/6)^10 = 0.838. That number is much better than 10/6!
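The multi-year calculation generalizes to any number of years. The Python sketch below assumes each year is an independent trial with the same 1/6 probability; the function name p_incident_within is an assumption introduced here.

```python
from fractions import Fraction

def p_incident_within(years, p_per_year=Fraction(1, 6)):
    """Probability of at least one incident in `years` independent years."""
    p_no_incident = (1 - p_per_year) ** years
    return 1 - p_no_incident

for years in (1, 3, 10):
    print(years, round(float(p_incident_within(years)), 3))
```

However many years are considered, the result approaches one but never exceeds it, unlike the naive sum of yearly probabilities.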

127
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 1. The solution to Application
Exercise 1 can be found in the additional resources section of this training
module.

128
Fault tree analysis is a top down approach to describing failures of complex
systems. A fault tree analysis begins with the “top event,” typically the
failure of the system. This top event is the result of a number of basic
events that contribute to, or initiate, the system failure. The logic of a
fault tree is displayed by the symbols representing the basic events and
gates that logically relate those events. Each of the common fault tree
symbols represents a type of event or logical relationship.

129
A rectangle represents an event or a resulting fault.
A circle represents a basic fault.
A trapezoid represents an incomplete event. An incomplete event is one
where more information is required to fully understand the event, but the
analysis does not require this level of detail, and has not been performed.
A hexagon represents an inhibit gate. An inhibit gate represents a logical
event which can inhibit another event from occurring.
The “half moon” shaped symbol represents an OR gate. Logically, the
output of the gate is true if any of the inputs are true. The probability of
the output is calculated using probability addition.
The semicircular symbol represents an AND gate. Logically, the output of
the gate is true if all of the inputs are true. The probability of the output is
calculated using probability multiplication.
The pentagon represents a trigger event, or a house event, which is an
event that is virtually guaranteed to occur given the conditions under which
the modeling is occurring.

130
As one would expect, the output of an AND gate is the logical ANDing of
the inputs to the gate. Since the operation is a logical AND, probability
multiplication of the inputs is used to calculate the probability of the
output.
In this example, the failure of a battery system is related to the system's
batteries being discharged combined with the charger failing. They are
related using an AND gate. The battery discharge probability is 0.2, and
the charger failure probability is 0.01. What is the probability of system
failure?
Assuming the events are independent, use probability multiplication to
determine the output probability.
Psys = Pbat * Pcharge
Psys = 0.2 * 0.01 = 0.002

131
Also as one would expect, the output of an OR gate is the logical ORing of
the inputs to the gate. Since the operation is a logical OR, probability
addition of the inputs is used to calculate the probability of the output.
In this example, the failure of a pressure sensing system is related to the
failure of the two transmitters that comprise the system. The events are
related using an OR gate. The probability of each transmitter
failing is 0.001. What is the probability of system failure?
Assuming the events are independent, use probability addition to
determine the output probability.
Psys = 1 – [(1-P1) * (1-P2)]
Psys = 1 – [(1-0.001) * (1-0.001)] = 0.001999, or approximately 0.002
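The AND and OR gate calculations can be sketched as two small Python functions. This is a minimal illustration that assumes all gate inputs are independent; the function names and_gate and or_gate are assumptions introduced here, not part of any fault tree tool.

```python
def and_gate(*inputs):
    """Output probability of an AND gate with independent inputs (multiplication)."""
    result = 1.0
    for p in inputs:
        result *= p
    return result

def or_gate(*inputs):
    """Output probability of an OR gate with independent inputs: 1 - P(no input occurs)."""
    p_none = 1.0
    for p in inputs:
        p_none *= 1.0 - p
    return 1.0 - p_none

# Battery system: batteries discharged AND charger failed.
p_battery_system = and_gate(0.2, 0.01)

# Pressure sensing system: transmitter 1 fails OR transmitter 2 fails.
p_pressure_system = or_gate(0.001, 0.001)

print(round(p_battery_system, 6), round(p_pressure_system, 6))
```

Both functions accept any number of inputs, so the same sketch covers gates with three or more independent inputs.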

132
Fault tree gates can have more than two inputs. The triple OR gate
probability outcome is calculated in the same way as described earlier, with
the simple addition formula for mutually exclusive events and the more
involved formula for independent events. The triple AND gate uses the same
probability multiplication formula as earlier for independent events.

133
Frequency and probability are often mixed as gate inputs. Sometimes this mix
is not appropriate, as in the case of an OR gate with one frequency input and
one probability input. Sometimes it is possible to combine the two forms, as
in the case of an AND gate. Here the frequency result is the product of the
probability and the frequency inputs.

Note that it is possible to treat mixed input to an OR gate but only if a time
period is specified for the frequency to be first converted to a probability of
an occurrence during that fixed time period. Then the two probabilities are
treated as for a normal OR gate.
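The handling of mixed inputs can be sketched in Python. The notes do not state how a frequency is converted to a probability, so the conversion below is an assumption: it uses the common constant-rate model, P = 1 - exp(-frequency * time). All function names and numbers are hypothetical and serve only as illustration.

```python
import math

def and_gate_frequency(frequency_per_year, probability):
    """AND gate with one frequency and one probability input; the result is a frequency."""
    return frequency_per_year * probability

def frequency_to_probability(frequency_per_year, years):
    """Probability of at least one occurrence within `years`.

    Assumption (not from the course notes): events arrive at a constant
    average rate, so P = 1 - exp(-frequency * time).
    """
    return 1.0 - math.exp(-frequency_per_year * years)

def or_gate(*probabilities):
    """OR gate for independent probability inputs: 1 - P(no input occurs)."""
    p_none = 1.0
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none

# Hypothetical numbers for illustration only.
initiating_frequency = 0.1   # events per year
p_safeguard_fails = 0.05     # probability, dimensionless
print(and_gate_frequency(initiating_frequency, p_safeguard_fails))  # events per year

# Mixed OR gate: convert the frequency to a probability over one year first.
p_from_frequency = frequency_to_probability(initiating_frequency, years=1.0)
p_other_input = 0.02
print(or_gate(p_from_frequency, p_other_input))
```

The time period must be stated along with the result, since the converted probability changes with the period chosen.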

134
Fault tree logic treats frequency OR gates as simple addition provided the
events are independent. The logical AND operation is not defined for
frequencies alone. All but one of the frequency inputs must first be
converted to probabilities using a specified time base.

135
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 1. The solution to Application
Exercise 1 can be found in the additional resources section of this training
module.

136
This section began with some basic rules of probability. Then it developed
several important definitions regarding events before developing the
concepts of probability multiplication and addition. The section then
finished with some exercises and a treatment of the fault tree analysis
method.

137
This section on Process Hazards Analysis (PHA) begins with some
definitions and description of the process. It then moves on to discuss the
HAZOP method of identifying hazards, how the consequence component of
the risk of a hazard is analyzed, and how the corresponding likelihood
analysis is performed. The section then finishes with the event tree
method of fault propagation as it relates to likelihood analysis.

138
Accepted standards and practices define a hazard as a potential source of
harm with the potential to cause various types of damage. The common
idea is potential, so something only needs to be capable of causing harm
to be considered a hazard. Thus a vessel containing flammable liquid or an
industrial robot that cuts sheet metal both represent hazards since they
can both potentially cause damage and harm.

139
The first step in the sequence between a hazard, which is contained and
controlled by the process, and the unwanted accident, is the initiating
event.

140
Although the example shows an initiating event where a significant
outcome will occur, this is not always the case. Consider an initiating
event where fuel gas supply to a fired heater is temporarily contaminated
with nitrogen, causing a loss of flame at the burner. If loss of flame is
detected, and the fuel gas flow to the burner is then stopped, no
significant outcome will result from this initiating event. An intermediate
event can be any event in the series between the initiating event and the final
outcome.

141
An incident is the result of an initiating event that is not stopped from
propagating. The incident is the most basic description of an unwanted
accident, and provides the least information. The term incident is simply
used to convey the fact that the process has lost containment of the
chemical, or other potential energy source.

142
Given that a loss of containment event, or incident, has occurred,
a number of different consequences are often possible. For instance,
the release of acrylonitrile could result in a pool fire, flash fire,
unconfined vapor cloud explosion, toxic effects, or a combination of
these effects. The term used to describe the particular effect that is
being analyzed is incident outcome.
The important distinction between an incident and an incident outcome is
that an incident describes all of the possible results of a loss of
containment, including the outcome where the released material
harmlessly disperses. Incident outcomes describe specific
manifestations of the incident.

143
Once the effect zone is established, the consequence can be determined
by estimating the number of receptors that will receive the incident
outcome’s effects. The consequence is simply a measurement of the
impact, in terms of loss, of the incident outcome.
While the definition above stresses the fact that consequence is
determined for an individual outcome case, the problem of SIL selection
usually requires that an overall or “average” outcome for all potential
incident outcome cases stemming from a single incident be considered.
Calculation of an “average” consequence for an incident can be performed
using event tree analysis.
Both effect zones and consequences will vary depending on the type of
receptor that is under study. For instance, the hazard zone surrounding a
fire will be larger for injuries than for fatalities. For a consequence to be
stated in such a way that can be understood, both the size of the
consequence, and the type of receptor, or damage, must be stated. The
receptors that are typically considered when performing risk analyses are
injuries to people, loss of life, equipment damage, environmental damage,
and business interruption.
Any of the types of damages, or receptors, listed above can be used for
representing consequences depending on the type of decision that is being
made. For instance, decisions about personnel safety usually require the
consequence to be stated in terms of both injuries and fatalities, but cost-
benefit analysis of equipment protection systems will require that
consequence be stated in terms of property damage, in monetary units.
Optimally, all of the types of losses should be considered simultaneously.

144
This process uses conversion factors to determine the relative utility of each of
the loss types to trade off amongst them. With this procedure, a single uniform
basis of loss can be derived, which will facilitate the decision making process by
making the decision one-dimensional.

145
146
The IEC 61508 standard focuses on the functions of identifying the
hazards, determining event sequences leading to the harm or damage,
and identifying the risks associated with these events. It is important to
note that the standard defines risk as a combination of the likelihood of
harm and the magnitude of that harm.

147
Generally, the standards place hazard identification, consequence estimation,
and likelihood estimation for the harmful outcome under the
category of Process Hazards Analysis.

148
The US regulations list several approved methods that can be used to
address a wide range of situations and circumstances. Although the
methods are required by law only in cases in the United States where
certain hazardous materials are present, the methods are much more
broadly applicable. There is even the flexibility in the US regulation to
employ other methods provided they are indeed appropriate and
equivalent. Thus these other methods should be thorough, established,
and well documented to ensure appropriateness and equivalence.

149
The US regulations list several important requirements for the PHA that
should be addressed by any effective system. Many of these are general
good engineering practice, but it is helpful to have the full list to be sure
that all of the key components are addressed.

150
The US regulations also specify some useful requirements for the PHA
process that also fall under the category of good engineering practice. The
team conducting the analysis should have a diverse range of skills to
provide a broad coverage of the system in question. Similarly, the PHA
should incorporate a clear documentation function and should ensure that
the recommendations from the analysis are indeed acted upon. Finally, the
process must always be fresh and evergreen with provisions for ongoing
hazards analysis at least every five years.

151
152
When developing a list of safety instrumented functions that require SIL
selection analysis from a PHA report, the following information will be
required.
1) A description of the safety instrumented function. The description of
the safety instrumented function that is either in place or has been
recommended will be found in the safeguards and recommendations
columns of the report respectively.
2) The hazard that is being prevented. It can be found in the same row as
the safety instrumented function identification, in the consequences
column.
3) The initiating events that cause an accident to occur can be found in
one of two places. Either the report will contain a causes column, or the
cause of the hazard will be found as part of the question that prompted the
hazard to be identified – for instance it might be a what-if question or a
checklist item.
4) Safeguards other than the SIF under consideration will be found in the
safeguards column. It is important to list not only the non safety
instrumented safeguards, but also other SIS that might perform an action
that will prevent the same hazard.

153
This slide shows a basic warm end piping protection function.

154
The basic HAZOP results section shown above lists two guideword entries
for the node defined by the warm end cryogenic exchanger considering the
design parameter of temperature. The recommendations are only initial
suggestions and based on the action responsibility list, Jones and Smith
will investigate each situation in more detail to determine what action is
indeed appropriate in each case.
This figure shows a typical PHA report, in this case a report for a HAZOP
study. The report has a tabular format. Each individual hazard that is
identified is listed on a row, and the characteristics of the hazard are listed
in the columns. There are variations in reporting among the various
styles, but much of the same information is presented. The report shown
here has columns for causes, consequences, safeguards, and
recommendations. This report also shows the deviation, in this case
“temperature too low” which caused the hazard to be identified.
The main difference between PHA reports generated by the various
methods is the prompt which caused the hazard to be identified. In a
HAZOP the prompt is a guide-word combination. In checklist studies the
prompt is a checklist item, and in a what-if study the prompt is the what-if
question that is posed by the facilitator. Regardless of the type of study,
the consequence of the hazard, the safeguards that prevent the hazard
and the recommendations for improvement should all be listed, usually in
a tabular format.

155
The first step in the SIL selection process is identification of all of the SIF
that need to have their SIL selected. SIF are listed in two locations in a
PHA report. Existing safeguards are listed in the safeguards column of the
report, and recommended safeguards that do not already exist in the
process are listed in the recommendations column of the report.
Review of the example PHA report yields three potential layers of
protection. Scanning the safeguards column, a set of flow alarms (with
presumed operator action), a process shut off, and an independent PLC
low T shut off are found. These layers of protection already exist in the
process but are not SIF. Thus no SIF currently exist in the process.
Scanning the recommendations column of the PHA report yields one SIF.
The first row contains a recommendation of “Should Indep. PLC low T shut
off be an SIS?” Since the SIS is in the recommendations column of the
PHA report, this SIF is recommended and does not already exist in the
process. Note that this SIF recommendation is essentially an upgrade of an
existing PLC action. Thus in a LOPA, care must be taken to avoid taking
credit for both the existing non-SIS function and the recommended
upgraded SIF actions to prevent the hazard.

156
The hazard that is being prevented and its associated consequence can be
found in the consequences or description of hazard column of the PHA
report.
In the example PHA above, the consequence that has been identified is
“brittle fracture of downstream piping and fire.” This consequence will also
be considered in the associated Layer of Protection Analysis (LOPA) part of
the overall likelihood analysis.

157
Initiating events are related to the consequence that was determined in
the previous step. For each row where a consequence is listed, which
is to be mitigated by a listed SIF, the initiating events that cause that
consequence should be determined. In a HAZOP report, the initiating
events are found in the causes column of the PHA report. In what-if and
checklist studies, the initiating events might also be found in the prompts,
(e.g., checklist items, and what-if questions).
There is a potential that multiple initiating events are present for a single
outcome. When this situation occurs, the same outcome, or consequence,
will be shown in multiple rows of the PHA report. Each row that the
consequence appears in will have different causes.
In the example PHA shown above, the consequence “Potential brittle
fracture of downstream piping and fire” appears in two rows. In this
situation, there are two causes, or initiating events, related to this
outcome. Those two initiating events are 1) flow imbalance between
streams causing a drop in temperature for the piping in question, and 2) a
weather extreme.

158
In a LOPA analysis, each individual initiating event will have an associated
set of safeguards. The safeguards that are listed are unique to each
initiating event. They are not generic to the consequence that is being
prevented. The PHA report facilitates the determination of safeguards for
each individual initiating event, because the protection layers are listed for
each individual initiating event in the safeguards column of the report.
Each row of the report is dedicated to a single initiating event.
In the PHA report example shown above, there are two initiating events,
both of which lead to the same outcome. The initiating event “flow
imbalance between streams” has the safeguards of alarms (operator
intervention), process shut off and independent PLC low T shut off. The
initiating event “weather extreme” has the safeguard of PLC low T shut off.
Note that the sets of safeguards for the two initiating events are not the
same.
Finally, it is worth noting that some of the listed safeguards may not be
effective layers of protection and some layers of protection may be
missing. More on that in later slides.

159
In all cases, the engineering documents that describe a process should be
reviewed to determine if any SIFs are incorporated in the process that
were not listed in the PHA study. Review of engineering documents is
required because the PHA study may not identify all of the SIF that are
already incorporated in the process.
The history of operating experience for processes is incorporated into a
design package by detailed design contractors and process licensors. The
process designers often include SIS in new designs to prevent accidents
and near-misses that have occurred in other similar process designs.
Identifying SIFs from the engineering design package is often difficult. The
SIFs in the design package are typically not differentiated from the other
control loops in the process. Identification based on a P&ID representation
of a control loop requires control engineering expertise, so that the
function which is being performed can be understood.
A knowledge of the process under study is also required to be able to
understand why each of the control functions shown on a P&ID is being
employed. When SIFs are identified by design documentation review, the
LOPA is more complicated. In this case there is no document
that describes the hazard being prevented, the consequence if the hazard
is realized, or the safeguards in place to prevent the hazard. All of this
information must be developed by the analyst through expertise in risk
analysis and a thorough understanding of the process under study.
Performing this task might require expert assistance from either the
process design engineers in the detailed design or licensing firm, or a
specialist consulting firm.

160
The consequence analysis part of the safety lifecycle follows after the
potential hazards have been identified.

161
Consequence analysis deals with the potential harmful effects of the
hazard on the surrounding people and property. Although traditional risk
management focuses on the impacts to people, which include potential
fatalities and other injuries, a good risk management program should take
a broader view. Good consequence analysis processes also take into
account losses due to business interruption, property damage, damage to
sensitive environments resulting from any chemical release, and third-
party liability. The losses from a process incident can be tangible, such as
property damage, or intangible, such as degraded corporate image. The
common thread in all of these losses is that they affect a company’s
bottom line. While the result of the chemical release modeling is an “effect
zone,” the consequence analysis is completed only after impact analysis,
which translates the effect of the consequence into loss of personnel and
property terms.

162
Release of a toxic chemical might produce little, if any, property damage,
but may result in significant impact on workforce and surrounding off-site
population. The effect of a toxic release is caused by the presence of
the toxic chemical, not by any energetic force that it produces. As such,
the effects of a toxic chemical release are determined by estimating
what concentrations of the material will be present in various
areas downwind of the release. The analysis of the concentrations of
materials downwind of releases is called dispersion modeling.

163
164
Consequences are often grouped into qualitative and quantitative
categories. In this case, Minor consequences are those that are initially
limited to the area of the event. Serious consequences are those that
could cause serious injury or fatality on-site or off-site or cause property
damage between one and five million dollars. (Note that this table makes
an effort to include consequences that are not specifically safety related,
such as property damage.) An Extensive event is one that is expected to
be five times worse than a serious accident.

165
166
167
Consequence analysis is the task of estimating the damage resulting from
a process accident. In the terminology of risk assessment, as published in
the CCPS Quantitative Risk Analysis Guidance, consequence is a measure
of the expected outcome of an event and is expressed as “effect distances”
or “effect zones.” Consequence analysis of potential accidents in the
process industries typically involves analyzing the release of hazardous
chemicals. This analysis is normally carried out using mathematical models
and computer software addressing the chemical and physical phenomena.

168
All of these techniques, terms, and methods work toward the same
purpose of measuring the magnitude of harm that results from a given
accident.

169
For each incident outcome case that is being considered, the risk analysis
will determine the resulting effect zone. The effect zone is an area where
the concentration of a measurable criterion that corresponds to the
magnitude of the incident outcome under study is above a certain
threshold. For instance, the magnitude of a fire is proportional to the heat
radiation produced by that fire, usually measured in kW/m². The actual
value of the criterion that is used, which is called the endpoint, is typically
selected to represent a certain level of vulnerability of a receptor to that
incident outcome. For example, EPA selected a thermal radiation endpoint
of 5 kW/m² to represent an injury endpoint when formulating its Risk
Management Program regulation.
For a flammable vapor release, the effect zone is the area over which a
particular incident outcome case produces an effect based on a specified
overpressure criterion (e.g., the effect zone from an unconfined vapor cloud
explosion of 28,000 kg of hexane, assuming 1% yield, is 0.18 km² if an
overpressure criterion of 3 psig is established).
For a loss of containment incident producing thermal radiation effects, the
effect zone is the area over which a particular incident outcome case
produces an effect based on a specified damage criterion [e.g., a circular
effect zone surrounding a pool fire resulting from a flammable liquid spill,
whose boundary is defined by the radial distance at which the radiative flux
from the pool fire has decreased to 5 kW/m² (approximately 1600 BTU/hr-ft²)].
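
As a rough check on the figures quoted in these two examples, the sketch below converts the 5 kW/m² endpoint to BTU/hr-ft² and back-calculates the radius of a zone with an area of 0.18 km². Treating the explosion effect zone as a circle is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

# Standard unit conversion factors, rounded to seven significant figures.
BTU_PER_HR_PER_WATT = 3.412142   # 1 W = 3.412142 BTU/hr
SQ_FT_PER_SQ_M = 10.76391        # 1 m^2 = 10.76391 ft^2


def kw_per_m2_to_btu_per_hr_ft2(flux_kw_m2: float) -> float:
    """Convert a radiant heat flux from kW/m^2 to BTU/(hr*ft^2)."""
    return flux_kw_m2 * 1000.0 * BTU_PER_HR_PER_WATT / SQ_FT_PER_SQ_M


def circular_zone_radius_m(area_km2: float) -> float:
    """Radius in metres of a circle with the given area in km^2.

    Assumes the effect zone is circular, which is an idealization.
    """
    area_m2 = area_km2 * 1.0e6
    return math.sqrt(area_m2 / math.pi)


endpoint_btu = kw_per_m2_to_btu_per_hr_ft2(5.0)
zone_radius = circular_zone_radius_m(0.18)
print(f"5 kW/m2 is about {endpoint_btu:.0f} BTU/hr-ft2")
print(f"0.18 km2 corresponds to a radius of about {zone_radius:.0f} m")
```

The conversion gives roughly 1585 BTU/hr-ft², consistent with the "approximately 1600" quoted in the notes.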

170
There is a wide range of models that are available to conduct consequence
analysis. This table presents some of the more popular models that can
be used for consequence analysis for SIL selection. The table also shows
the strengths and limitations of the various models. The table includes
models that are both publicly available and proprietary models that are
used within industry or government but is by no means comprehensive.
There are a large number of models that are commercially available, and
several have been developed for analysis niches, such as EPA RMP
compliance and environmental engineering.

171
At this time, the course participant should practice the skills learned in this
section by completing Application Exercise 1. The solution to Application
Exercise 1 can be found in the additional resources section of this training
module.

172
Likelihood Analysis and the closely related Layer of Protection Analysis
(LOPA) are generally conducted after the hazards have been identified and
before determining if a safety instrumented system is required to achieve
a tolerable risk level.

173
Likelihood of a hazard is defined as the frequency of the harmful outcome
event. This is most often expressed in units of events per year or events
per million hours. It is important not to confuse likelihood with probability
since the two terms are distinctly different in safety analysis.
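
The two frequency units mentioned here differ only by a constant factor. A minimal sketch of the conversion, assuming 8760 hours per year (leap years ignored) and using hypothetical function names:

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours; leap years are ignored


def per_million_hours_to_per_year(rate_per_million_hours: float) -> float:
    """Convert a frequency in events per million hours to events per year."""
    return rate_per_million_hours * HOURS_PER_YEAR / 1.0e6


def per_year_to_per_million_hours(rate_per_year: float) -> float:
    """Convert a frequency in events per year to events per million hours."""
    return rate_per_year * 1.0e6 / HOURS_PER_YEAR


# Illustrative value only: 10 events per million hours.
print(per_million_hours_to_per_year(10.0))
```

Unlike a probability, a frequency has units of inverse time and is not bounded by 1; an event that occurs twice a year has a frequency of 2 per year.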

174
175
As with consequence analysis, likelihood is also commonly broken out into
qualitative or quantitative categories. In this case, a Low likelihood is one
where the event is not expected to occur within the lifetime of the plant
and is assigned a frequency of less than 1 x 10⁻⁴ events per year.

176
The challenge in applying any statistically derived values for the likelihood
of an event is to be sure that they are relevant for the situation at hand.
This is much easier for common equipment such as valves or pressure
vessels in normal service than it is for less prevalent equipment under
more unusual conditions.
The lack of sufficient relevant data is often a severe limitation of straight
statistical analysis considering the overall harmful outcome.

177
When statistical analysis of the overall outcome is not an effective way to
determine harmful event likelihood, fault propagation modeling can be
used. Fault propagation modeling analyzes the chain of events that leads
to a harmful event. By using statistics on a smaller scale and analyzing
what events initiate and contribute to that chain and establishing how they
are logically related, the final harmful event likelihood can be determined.

178
These techniques use the failure event rates of individual components in
the chain of events to determine the frequency of a harmful accident
involving the overall system. As mentioned before, failure rates of more
common individual components like pumps, compressors, valves, and
vessels are easier to find in reference databases or to estimate because a
large experience base is available for these components.

179
An event tree consists of a single initiating event, which is an action or
a failure of a piece of equipment that starts the chain of events that can
lead to one of several outcomes. The paths to the different outcomes are
determined by the event tree’s branches. Each branch is a set of events
that can occur in the chain of events that leads to the final outcomes.
Each branch has an associated probability of occurrence for each possible
path.

180
This slide shows a typical event tree diagram. The diagram, which
represents the chain-of-events for an accident, is viewed from left to right.
The diagram begins with an initiating event. In the middle section are the
branches that lead to various outcomes. In this case there are two sets of
branches. In the first set, there are three potential intermediate
outcomes, and in the second set there are two potential outcomes for each
branch point. Overall, there are five outcomes that can result from the
initiating event shown in this diagram.

181
The probabilities in an event chain can be used to calculate the probability or
frequency of the overall final outcome. Probability multiplication can be used
as long as each event or layer of protection in the chain is independent of all of
the others.

182
In this example, an event tree is constructed for possible outcomes from
our brittle piping fracture. The problem states that the initiating event for
the tree is the pipe fracturing. Note that this will include causes of process
flow imbalance and weather extreme. The problem also states that there
are three branch point sets for the tree. The first branch is the possibility
that the leak may be minor or catastrophic, the second branch is whether
a source of ignition is present, and the third branch is whether other areas
are also ignited.

183
From the initiating event on the left, each of the three branch point sets is
drawn sequentially. The first branch is the size classification of the fracture
and leak. There are two possible events in this set, thus two branch paths.
The second branch is the question of whether the material ignites. The
third branch point considers whether the fire propagates to other areas, so
it only applies to the cases where there is a fire. The third branch
therefore affects only two of the events that are possible in the second
branch set, since a fire has to exist in the first place. The order in which
the branches are shown on the tree is important in some cases and should be
considered when building an event tree.
The six paths through this event tree lead to five distinct outcomes. From
the top down, the paths end in: 1) full plant explosion, 2) large fire but
plant intact, 3) large release with no fire, full plant explosion again (same
as outcome 1), 4) small fire but plant intact, and 5) small release with no
fire.

184
In the previous example an event tree diagram was built to describe the
events that might result from a pipe rupture. In this example that event
tree must be quantified with the data provided in the slide. The problem
gives the initiating accident likelihood and the probability of each of the
branch events. The problem also requires that two outcomes be
determined, first the likelihood of a plant explosion, and second the
likelihood of a small release and small fire.

185
The likelihood and probability data is placed on the event tree as shown in
the diagram above. Note that probabilities were placed on both of the
events in each branch. This was done by assuming that the events were
complementary (no other events were possible and the probabilities add to
one). Calculation of the final outcome likelihood is then performed using
probability multiplication.
For the outcome of the full plant explosion, the likelihood of each
contribution is calculated as follows:
(1/20) * (0.67) * (0.3) * (0.2) = 0.00201 per year down the large fire path.
For the same outcome down the small fire path, the likelihood is calculated
as follows:
(1/20) * (0.33) * (0.1) * (0.04) = 0.000066 per year.
These frequencies are added to give a total of 0.00208 per year.
Note that the likelihood units of the outcome are the same as the units of
the initiating event, since the probability multiplication does not affect the
initial units.
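
The calculation can be sketched in a few lines of code. The slide data is not reproduced in these notes, so the branch probabilities below are assumptions chosen to reproduce the quoted path frequencies of 0.00201 and 0.000066 per year: leak size probabilities of 0.67 and 0.33, ignition probabilities of 0.3 (large leak) and 0.1 (small leak), and propagation probabilities of 0.2 and 0.04. The names are hypothetical.

```python
from math import prod

# Initiating event frequency from the worked example: one fracture per 20 years.
INITIATING_FREQUENCY_PER_YEAR = 1 / 20

# ASSUMED branch probabilities along each path that ends in a plant explosion:
# [leak size, ignition, propagation to other areas].
EXPLOSION_PATHS = {
    "large fire path": [0.67, 0.3, 0.2],
    "small fire path": [0.33, 0.1, 0.04],
}


def path_frequency(initiating_frequency: float, branch_probabilities: list) -> float:
    """Frequency of one event tree path.

    Valid only if every branch event is independent of the others.
    """
    return initiating_frequency * prod(branch_probabilities)


contributions = {
    name: path_frequency(INITIATING_FREQUENCY_PER_YEAR, probabilities)
    for name, probabilities in EXPLOSION_PATHS.items()
}
total_explosion_frequency = sum(contributions.values())

for name, frequency in contributions.items():
    print(f"{name}: {frequency:.6f} per year")
print(f"total: {total_explosion_frequency:.5f} per year")
```

Because the branch probabilities are dimensionless, the result keeps the units of the initiating event frequency, as the note above points out.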

186
These risk integrals are a measure of the total expected loss, i.e., a
summation of the products of likelihood and consequence for all potential
loss events that are being considered.
In the case of Safety Instrumented System (SIS) design, this would be all
of the consequences that are prevented by a single Safety Instrumented
Function (SIF).

187
So far in this section, we have considered cases where there is a single
defined consequence and corresponding single likelihood of it taking place.
In many situations, there are multiple different impact outcomes with
corresponding multiple likelihoods. In these cases, they must be combined
to determine a single overall risk to properly select the SIL. In order to
combine the consequences of the potential harmful outcomes related to a
single SIF and compare them to the tolerable risk, they must be expressed
in the same terms as the tolerable risk levels. No matter whether the
consequence is expressed as a single overall cost or loss variable or if
personnel impacts are kept separate from financial impacts, it is possible
to use a risk integral approach to continue the SIL selection process.

188
In mathematical form, this summation includes a consequence times
frequency risk contribution to the total for each event in question.
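
A minimal sketch of that summation, assuming each loss event carries a frequency in events per year and a consequence expressed in a single loss unit. The event names and numbers are hypothetical and are not taken from the course examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossEvent:
    name: str
    frequency_per_year: float  # likelihood of the outcome
    consequence: float         # loss per event, in the same units as the tolerable risk


def risk_integral(events: list) -> float:
    """Expected loss per year: the sum of consequence times frequency."""
    return sum(e.frequency_per_year * e.consequence for e in events)


# Hypothetical outcomes all prevented by the same SIF.
events = [
    LossEvent("plant explosion", 2.0e-3, 50.0e6),
    LossEvent("contained fire", 1.0e-2, 2.0e6),
]
expected_loss = risk_integral(events)
print(f"expected loss: {expected_loss:,.0f} per year")
```

The same function applies whether the consequence is a combined cost figure or a separate personnel measure, as long as all events use the same unit.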

189
190
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 3. The solution to Application
Exercise 3 can be found in the additional resources section of this training
module.

191
This section on Process Hazards Analysis (PHA) began with some
definitions and description of the process. It then moved on to discuss the
HAZOP method of identifying hazards, how the consequence component of
the risk of a hazard is analyzed, and how the corresponding likelihood
analysis is performed. The section then finished with the event tree
method of fault propagation as it relates to likelihood analysis.

192
This section covers a specific form of fault propagation modeling known as
Layer of Protection Analysis or LOPA. It begins with a description of the
fault propagation and event tree context for the method. Then it
covers the basics of using LOPA with some examples. The section then
finishes with a discussion of initiating events and failure rates along with
descriptions of typical protection layers.

193
As covered in the last section, when statistical analysis alone is not an
effective way to determine harmful event likelihood, fault propagation
modeling can be used. Fault propagation modeling is a class of techniques
that analyze the chain of events that can lead to a harmful outcome. By
analyzing what events initiate and contribute to that chain and establishing
how they are logically related, the final harmful event likelihood can be
determined.

194
Layer of Protection Analysis (LOPA) is a variation of event tree analysis
that is limited and optimized for a specific situation. The specific situation
involves an initiating event that can lead to an unwanted accident, but the
event can be prevented by one or more protection layers that may stop
the chain of events from leading to the harmful outcome.
In a similar fashion to event tree analysis, an initiating event starts the
chain of events that leads to the unwanted impact. The layers of
protection are similar to the event tree’s branches. Unlike an event tree,
only two outcomes are possible: the unwanted accident or no event. Only
one of those two outcomes, accident likelihood, is generally calculated.
Also, for SIL determination, it is usually easier and clearer to specifically
exclude the potential SIL rated safety function to determine the accident
frequency without the function. The SIL is then selected to provide
enough risk reduction (low enough PFD) to bring the accident frequency
down to the maximum tolerable level.

195
196
If the protection layer fails, the analysis proceeds to the next protection
layer or to the outcome. Thus the occurrence of the undesired final
accident outcome is the only outcome of interest in the LOPA analysis.
NOTE: If a relief valve or other device that may reduce but not eliminate
the hazard is used as a protection layer, the reduced consequence scenario
may need to be considered as a separate harmful outcome.

197
A layer of protection analysis for a pipe rupture is shown in the following
example. The problem statement lists the accidental outcome that is
being considered as a fire subsequent to the pipe rupture. The rupture of
the pipe is caused by brittle fracture initiated by two different events.
Several layers of protection that prevent the process flow imbalance from
propagating to a fire are proposed in the problem statement. These are 1)
operator response, 2) DCS action, 3) separate PLC (potential SIF), 4)
uncertain rupture even with low temperature, and 5) control of ignition
sources; low ignition probability.

198
A layer of protection analysis for a pipe rupture is shown in the following
example. The problem statement lists the accidental outcome that is
being considered as a fire subsequent to the pipe rupture. The rupture of
the pipe is caused by brittle fracture initiated by two different events.
Several layers of protection that prevent the weather extreme from
propagating to a fire are proposed in the problem statement. These are 1)
operator response, 2) separate PLC (potential SIF), 3) uncertain rupture
even with low temperature, and 4) control of ignition sources; low ignition
probability.

199
The initiating event, process flow imbalance, starts the diagram on the left.
The protection layers are then listed after the initiating event in order of
their escalation, with the failure branch of each layer leading to either the
next protection layer or the final accident outcome, and the success branches
leading to the no event outcome. The unwanted accident outcome, or pipe rupture
and fire, is the outcome of interest on the right side of the LOPA diagram.
The other outcome is ‘no event’.
Note that the Operator Response has been combined with the DCS. The
reason for this is that they are not independent since they share the DCS
panel and main processing system as a significant common element.
Counting them both separately is a dangerous and incorrect practice since it
overestimates the effectiveness in preventing the accident. To properly
take credit for this combined layer of protection in a quantitative analysis,
a common mode calculation must be done using fault trees or some
equivalent model.
Note also that the potential SIF is not included in the analysis to give a
resulting frequency without the function present.

200
The initiating event, weather extreme, starts the diagram on the left.
The protection layers are then listed after the initiating event in order of
their escalation, with the failure branch of each layer leading to either the
next protection layer or the final accident outcome, and the success branches
leading to the no event outcome. The unwanted accident outcome, or pipe
rupture and fire, is the outcome of interest on the right side of the LOPA
diagram. The other outcome is ‘no event’.
Note that the layers are slightly different than for the other initiating
event.

201
Quantification of the outcome of the layer of protection analysis is
performed using an identical procedure to event tree analysis. Only one
outcome is of interest, the unwanted accident. The frequency at which an
accident is initiated, but is prevented by a layer of protection is typically of
no interest to the analyst, and is not usually calculated.
The frequency of the unwanted accident is logically related to the
initiating event and the protection layers by logical ‘AND’s. The accident
frequency is the frequency at which the initiating event occurs and all of
the protection layers fail. Since the relationship is a logical AND,
probability multiplication is used to calculate the outcome frequency. The
outcome frequency is the initiating event frequency multiplied by the
probability of failure on demand, or PFD, of each of the protection layers.
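
The multiplication described above can be sketched as follows. The function name and the example values are hypothetical, and the calculation assumes that every protection layer is independent of the others and of the initiating event.

```python
from math import prod


def lopa_accident_frequency(initiating_frequency_per_year: float, layer_pfds: list) -> float:
    """Accident frequency = initiating event frequency x product of layer PFDs.

    Assumes all protection layers are independent of each other.
    """
    if initiating_frequency_per_year < 0:
        raise ValueError("initiating event frequency must be non-negative")
    if any(not 0.0 <= pfd <= 1.0 for pfd in layer_pfds):
        raise ValueError("each PFD must be a probability between 0 and 1")
    return initiating_frequency_per_year * prod(layer_pfds)


# Hypothetical scenario: initiating event once per 10 years, three layers.
frequency = lopa_accident_frequency(0.1, [0.1, 0.5, 0.2])
print(f"{frequency:.1e} per year")
```

With no protection layers the product is 1, so the accident frequency equals the initiating event frequency.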

202
The LOPA diagram that was created in the previous example is quantified
with the information provided above.

203
The initiating event frequency and protection layer failure probabilities are
added to the LOPA diagram as shown. The ‘no event’ outcome is of no
interest, so its frequency is not calculated. The rupture and fire outcome
is the outcome which is calculated. Probability multiplication is used to
calculate the resulting outcome frequency of 9.49 x 10⁻³ per year. Note
that this is only part of the total frequency.

204
The initiating event frequency and protection layer failure probabilities are
added to the LOPA diagram as shown. The ‘no event’ outcome is of no
interest, so its frequency is not calculated. The rupture and fire outcome
is the outcome which is calculated. Probability multiplication is used to
calculate the resulting outcome frequency of 1.52 x 10⁻³ per year. Since
the accident can result from two different initiating events, the total
frequency is the sum of the contributions from each event path, giving a
total frequency of 1.1 x 10⁻² per year, or about once per 90 years, without
the SIF present.
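
The sum and the corresponding return period can be checked with a short sketch. The two path frequencies are the values quoted in the notes; the variable names are hypothetical.

```python
# Outcome frequencies quoted for the two initiating event paths (per year).
path_frequencies_per_year = {
    "first initiating event path": 9.49e-3,
    "second initiating event path": 1.52e-3,
}

# The paths are alternative routes to the same accident, so frequencies add.
total_frequency_per_year = sum(path_frequencies_per_year.values())
return_period_years = 1.0 / total_frequency_per_year

print(f"total: {total_frequency_per_year:.2e} per year")
print(f"about once per {return_period_years:.0f} years")
```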

205
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 5. The solution to Application
Exercise 5 can be found in the additional resources section of this training
module.

206
Quantification of LOPA diagrams depends on historical information about
how the events that cause or propagate an accident have occurred in the
past. The best source of this reliability data is records of the failures and
maintenance of equipment that exists in the process plant of interest. This
information is best suited for quantifying the LOPA diagram because the
failure rate best describes the actual conditions under which the process
equipment is being used. Information gathered from another plant or
industry databases might not be representative because the equipment is
maintained differently or used in more severe process conditions.
Unfortunately, such historical reliability information is often not available.
Plant maintenance records can be used in addition to SIS function test
data. Since ISA S84 requires that SIS be tested, records from these tests
will produce data that can be used to develop failure rates.
Industry average data is also available. This data is collected by
consortiums of operating companies and engineering societies such as ISA
and AIChE. This data is accumulated for a range of operating conditions
and categorized into generic equipment types. It is unlikely that failure
rates for a specific model of instrument will be found in a generic industry
database.
Use of expert judgment in the quantification process is inevitable. Failure
rates obtained from several sources may vary substantially, and the failure
rates within one particular database might have a large range of
applicability. Applying this data to a specific situation will require an
analyst to consider the equipment that is installed in the plant versus the
equipment that is described in the database.

207
Most operating companies maintain records of maintenance and repair for
process equipment items. These records are required by many national
laws and regulations. For instance, in the United States, maintenance
records for safety related equipment items are required by the mechanical
integrity clauses of the OSHA process safety management regulation and
the EPA risk management plan regulation. In addition, industry consensus
standards such as ISA S84 require that safety instrumented systems be
tested. The records from these tests are another valuable source of data.
Function test data can be used to calculate a number of reliability
measures. The most common measure is the failure rate. Failure rate is
defined as the number of failures that have occurred divided by the total
time the equipment has been in operation. The equation used to calculate
failure rate, in terms of hours, is shown here.
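
The slide equation is not reproduced in these notes, so the sketch below simply implements the definition given in the text: failures divided by total operating time. The function name and the fleet data are hypothetical.

```python
HOURS_PER_YEAR = 8760  # leap years ignored


def failure_rate_per_hour(failures: int, units: int, hours_per_unit: float) -> float:
    """Failure rate = number of failures / total operating time (unit-hours)."""
    total_hours = units * hours_per_unit
    if total_hours <= 0:
        raise ValueError("total operating time must be positive")
    return failures / total_hours


# Hypothetical records: 3 failures among 20 valves, each in service for 5 years.
rate = failure_rate_per_hour(failures=3, units=20, hours_per_unit=5 * HOURS_PER_YEAR)
print(f"{rate:.2e} failures per hour")
```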

208
Calculating the probability of failure on demand is more complex than a
simple unit conversion. The PFD depends not only on the failure rate of
the device, but also the failure mode of the equipment and the interval at
which the device is tested.
The failure mode of a device is a symptom, condition, or fashion in which a
device fails. A mode might be identified as a loss of function, premature
function (function without demand), an out of tolerance condition, or
simply a physical characteristic such as a leak (incipient failure mode)
observed during inspection.
In some cases, the failure rate that is listed in a database is divided into
those failures that will cause a failure on demand, or dangerous failures,
and those that will cause a nuisance trip, or safe failures. These databases
provide an overall failure rate that includes both modes, and then provides
a distribution between safe and dangerous failures. This type of
presentation is common in failure rate databases that cover electronic
components.
Most databases list the failure rates for each mode of failure. For process
equipment items, many failure modes may exist. For instance, in the
CCPS Reliability Guidelines, the failure modes that are listed for a pilot
operated relief valve include 1) seat leakage, 2) fails to open, 3) spurious
operation, 4) fails to open on demand, and 5) inter-stage leakage. It is
important to select the proper mode of failure when quantifying a LOPA
diagram.
The probability of failure on demand of a device depends on the frequency
of testing. An untested device’s PFD gets larger as time increases. The
relationship between failure rate and test interval is exponential and
approximated by the equation shown.
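
The slide equation is not reproduced in these notes. The sketch below assumes the commonly used model of a constant dangerous failure rate λ with a perfect proof test every TI hours, for which PFD(t) = 1 - exp(-λt) and the average over the test interval is approximately λ·TI/2 when λ·TI is small. The function names and example values are hypothetical.

```python
import math


def pfd_at_time(failure_rate_per_hour: float, hours_since_test: float) -> float:
    """PFD of an untested device t hours after its last successful test."""
    return 1.0 - math.exp(-failure_rate_per_hour * hours_since_test)


def pfd_avg_exact(failure_rate_per_hour: float, test_interval_hours: float) -> float:
    """Time-average of 1 - exp(-lambda * t) over one test interval."""
    x = failure_rate_per_hour * test_interval_hours
    return 1.0 - (1.0 - math.exp(-x)) / x


def pfd_avg_approx(failure_rate_per_hour: float, test_interval_hours: float) -> float:
    """Simplified average PFD, lambda * TI / 2; valid when lambda * TI << 1."""
    return failure_rate_per_hour * test_interval_hours / 2.0


# Hypothetical device: 2e-6 dangerous failures per hour, tested yearly.
lam, ti = 2.0e-6, 8760.0
print(f"PFD just before the test: {pfd_at_time(lam, ti):.4f}")
print(f"PFDavg, exact:            {pfd_avg_exact(lam, ti):.5f}")
print(f"PFDavg, approximation:    {pfd_avg_approx(lam, ti):.5f}")
```

For this example the approximation is within about 1% of the exact average; the gap widens as λ·TI grows.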

209
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 5. The solution to Application
Exercise 5 can be found in the additional resources section of this training
module.

210
In order to be considered an independent layer of protection, several
criteria must be met.
Specificity – An independent protection layer must be specifically
designed to be capable of preventing the consequences of the potentially
hazardous event.
Independence – The operation of the protection layer must be
completely independent from all other protection layers; no common
equipment can be shared with other protection layers.
Dependability – The device must be able to dependably prevent the
consequence from occurring. Both systematic and random faults need to
be considered in its design.
Auditability – The device should be proof tested and maintained. These
audits of operation are necessary to ensure that the specified level of risk
reduction is being achieved.

211
One of the most common layers of protection in a process plant
environment is the basic process control system itself. However, to properly
take credit for this layer, it must be fully capable of preventing the
accident given the series of events leading to it in the LOPA.

212
Operator response is also a commonly available layer of protection. It is
important to ensure that this is independent of all other layers of protection
considered, including the basic process control system. In very special
cases based on human response analysis (HRA), it may be possible to
credit a PFD better than 0.1 but such cases must meet numerous strict
criteria.

213
One simple protection layer is whether the hazard is continuously present
or not. It is important not to double count this relative to the initiating
event frequency which may already take this into account.

214
In some cases, it is possible to credit the mechanical integrity of a
containment vessel as a layer of protection.

215
Various other mechanical relief devices can also be layers of protection.
Again it is vital that they be considered properly in the context of the
earlier layers having already failed.

216
External risk reduction methods can definitely improve safety. However,
they are best considered separately from layer of protection analysis
unless they completely prevent the harmful outcome. If they only reduce
its intensity, then they should be considered as part of the consequence
magnitude analysis.

217
Ignition probability is an important consideration for fires. Detailed
attention to limiting ignition sources through plant procedures and
equipment design is vital to ensuring that the credit can indeed be taken
here.

218
Explosion probability depends on many highly complicated factors relating
to turbulence of a potential flame front leading to a shockwave and
explosion. This should not be credited unless it is well understood in the
particular case at hand.

219
Occupancy may be credited as well but only if it can be verified that the
occupancy was not already considered as part of the consequence
analysis.

220
221
This section covered a specific form of fault propagation modeling known as
Layer of Protection Analysis or LOPA. It began with a description of the
fault propagation and event tree context for the method. Then it
covered the basics of using LOPA with some examples. The section then
finished with a discussion of initiating events and failure rates along with
descriptions of typical protection layers.

222
This section focuses on the selection of the correct safety integrity level for
a system based on the results of the likelihood and consequence analysis.
It begins with a discussion of the SILs themselves. Then the balance of the
section covers different methods for selection including the hazard matrix,
risk graph and several quantitative techniques.

223
Determining whether a SIS is required and what integrity level is needed
for each safety function follows the likelihood and consequence analysis
steps in the safety lifecycle.

224
The Safety Integrity Level is a measure defined in the IEC 61508 standard.
The key measure of a system’s integrity is how well it can be counted on
to do what it is supposed to do when it is supposed to do it. For the Low
Demand mode operation common in the process industry, the average
probability of failure on demand (PFDavg) is the variable that defines the
SIL, as shown in the table on this slide. The risk reduction factor is the
reciprocal of the PFDavg, and the SIL number itself represents the
minimum number of orders of magnitude of risk reduction that the SIF will
provide.
For the High Demand mode common in machinery applications, SIL relates
to the frequency of unsafe failures of the SIF per hour, since the systems
used are required to act more frequently than they are tested and
repaired.
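
The relationship between PFDavg, risk reduction factor, and SIL described above can be sketched in a few lines of Python. This is only an illustration: the band limits are the low demand mode values from IEC 61508, while the function names and the handling of values outside the four bands are assumptions made for this sketch.

```python
# Low demand mode bands (IEC 61508): SIL n covers 10^-(n+1) <= PFDavg < 10^-n.
SIL_BANDS = {
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}


def risk_reduction_factor(pfd_avg: float) -> float:
    """The risk reduction factor is the reciprocal of PFDavg."""
    if not 0.0 < pfd_avg <= 1.0:
        raise ValueError("PFDavg must be in the interval (0, 1]")
    return 1.0 / pfd_avg


def sil_band(pfd_avg: float) -> int:
    """Return the SIL band that a PFDavg falls in (0 means no SIL band).

    Assumption for this sketch: values better than the SIL 4 band are
    reported as 4, since SIL 4 is the highest level defined.
    """
    if not 0.0 < pfd_avg <= 1.0:
        raise ValueError("PFDavg must be in the interval (0, 1]")
    for sil, (low, high) in SIL_BANDS.items():
        if low <= pfd_avg < high:
            return sil
    return 0 if pfd_avg >= 1e-1 else 4


if __name__ == "__main__":
    pfd = 0.005
    print(sil_band(pfd), risk_reduction_factor(pfd))
```

A PFDavg of 0.005 falls in the SIL 2 band, which corresponds to at least two orders of magnitude of risk reduction.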

225
As was mentioned in section 1, an individual Safety Instrumented Function
(SIF) is designed to identify the need and then act to bring the system to a
safe state for each hazard scenario. The effectiveness of the risk reduction
is measured by the function’s risk reduction factor (often expressed as a
Safety Integrity Level). The required risk reduction is the difference
between the process risk before a SIF and the “tolerable level” of risk to be
achieved for that process or piece of equipment.
It is important to note that a SIF is an individual function and a SIS can
include multiple functions, so the SIL refers to each SIF rather than to the
entire safety instrumented system.

226
To properly assign a SIL, the tolerable level of risk must first be identified.
This must be clearly understood to allow objective, uniform decisions
about how much risk reduction is required in each case. Then, one must
identify the level of risk initially present in a given situation without any
SIS present. The SIL selection process characterizes any difference to
identify what the proposed safety system must do to move the overall risk
level to the tolerable region.

227
With the Hazard Matrix procedure, the consequence and likelihood each
form an axis of the SIL selection matrix. Based on the agreed risk
tolerance, the analyst or team will select a SIL corresponding to identified
consequence and likelihood categories. This SIL represents the amount of
risk reduction that is required to move an event with the selected
consequence and likelihood to the agreed tolerable risk region. The
number of categories and thus the size of the matrix will depend on the
specifics of an organization’s risk definition categories.

228
Although this typical consequence category set from Guidelines for the
Safe Automation of Chemical Processes (AIChE) shows three categories,
four and five categories are also commonly used. Each entry contains a
qualitative term for severity and a description of the type of consequence
that would be considered typical of the category.

229
The selection of the proper category is performed either qualitatively,
using expert judgment, or can be assisted by quantitative calculation tools.
The output from a quantitative consequence analysis might be, for
example, PLL=0.1, or probable loss of life of 0.1 fatalities for the incident
that is being studied. Using this criterion, the analyst might select Minor or
Serious. A PLL of 0.1 indicates that there is a 10% chance of one fatality.

230
These typical likelihood categories also come from the Guidelines for the
Safe Automation of Chemical Processes textbook from the AIChE. Although
this table shows three categories, four and five categories are also
commonly used. The table contains a qualitative term for a likelihood
rating, a description of the type of frequency considered typical of the
category, and quantitative ranges for unmitigated event likelihood. The
quantitative ranges for categories are often not included in consequence
and likelihood tables, leaving the analyst or team with responsibility to
select the proper category based on their expert judgment.

231
The final SIL assignment is usually performed using the two-dimensional
type of matrix we have been discussing. There are variations of this
technique that use a third dimension to represent the layers of protection
that are available to prevent an incident from occurring, but since LOPA
more accurately assesses the effectiveness of the independent layers of
protection, this type of matrix is not described here.
The matrix that is shown here is based on information provided in Section
E of IEC draft standard 61511-3, which describes matrix-based SIL
selection methods in some detail. Each consequence-likelihood pair has an
associated SIL level that represents the amount of risk reduction required
to make the given situation tolerable.

232
Example 1 asks for the SIL of a safety instrumented function whose
consequence is Serious and likelihood is High. Selecting the High likelihood
row, and then the Serious consequence column from the matrix yields SIL
3.

233
234
235
236
Risk graphs, like hazard matrices, are usually qualitative and category
based. While a hazard matrix specifically considers only likelihood and
consequence, a risk graph considers four items: likelihood, consequence,
probability of occupancy, and probability of avoiding the hazard. However,
probability of occupancy and the chance to avoid the hazard can also be
included in the likelihood and consequence analysis when the hazard matrix
option is used.

237
Using the selected categories for each parameter, the analyst or team
follows a decision path that leads to the box that contains a SIL
assignment.
The tolerable level of risk is used to set up the risk graph and to assign a
required SIL for each possible case listed in the graph, similar to the set-
up of the hazard matrix.

238
Risk graph analysis uses four parameters to make a SIL selection. These
parameters are consequence, occupancy, probability of avoiding the
hazard, and demand rate. The Consequence parameter represents the
average number of fatalities that are likely to result from a hazard when
the area is occupied, and should include the expected size of the hazard
and the receptors’ vulnerability to the hazard.

239
This table shows some typical consequence categories. Four categories is
the number most frequently used in risk graphs. The table contains a two-
letter code to describe that rating, and provides a description of the
probable loss of life range that would be considered typical of the category.
The table also contains some descriptive comments about the classification
process.
In this table, consequence categories are based on the probable loss of life
related to an accident scenario. If the PLL is less than 0.01, then the
category is CA. Likewise, a PLL from 0.01 to 0.1 relates to category CB, a
PLL from 0.1 to 1 relates to a category of CC, and a PLL that is greater than
1 relates to a category of CD. Other documents contain consequence
descriptions that are more qualitative than the PLL-based assignments
shown here. In examples presented in IEC61508, CA relates to minor
injury, CB relates to serious injury or fatality, CC relates to multiple
fatalities, and CD relates to a large number of fatalities.
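
The PLL-based assignment described above can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and because the notes do not say which category a value exactly on a boundary belongs to, the sketch assumes that 0.01 and 0.1 go to the higher category and that exactly 1 stays in CC.

```python
def consequence_category(pll: float) -> str:
    """Map a probable loss of life (PLL) value to a risk graph category.

    Boundary handling is an assumption of this sketch, not taken from
    the course table.
    """
    if pll < 0.0:
        raise ValueError("PLL cannot be negative")
    if pll < 0.01:
        return "CA"
    if pll < 0.1:
        return "CB"
    if pll <= 1.0:
        return "CC"
    return "CD"


if __name__ == "__main__":
    for pll in (0.005, 0.05, 0.21, 2.0):
        print(pll, consequence_category(pll))
```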

240
This table shows two typical occupancy categories. The table contains two-
letter identifiers for occupancy categories that relate to the fraction of time
that the area that would be impacted by the hazard is occupied by
personnel.
FA is assigned to the occupancy parameter if the amount of time that the
hazardous zone is occupied is less than 10% of the operating time. If the
hazard zone is occupied more frequently, FB is selected.

241
This table shows two typical probability of avoidance categories. The table
contains two-letter identifiers for probability of avoidance categories that
relate to the probability that an exposed operator would be able to detect
a hazardous condition and have a means of escaping the effects of that
condition.
PA is assigned only if all of the following conditions are true:
1) Facilities are provided to alert the operator that the SIS has failed.
2) Independent facilities are provided to shut down such that the hazard
can be avoided or which enable persons to escape the area safely.
3) The time between the operator being alerted and a hazardous event
occurring exceeds one hour.
If any of the previous conditions is not met, PB is selected.
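
The occupancy rule from the previous slide and the avoidance rule on this one are both simple decisions, sketched below. The function and parameter names are hypothetical. The 10% occupancy limit and the one-hour warning time come from the notes above.

```python
def occupancy_category(occupied_fraction: float) -> str:
    """FA if the hazard zone is occupied less than 10% of the time, else FB."""
    if not 0.0 <= occupied_fraction <= 1.0:
        raise ValueError("occupied_fraction must be between 0 and 1")
    return "FA" if occupied_fraction < 0.10 else "FB"


def avoidance_category(operator_alerted: bool,
                       independent_shutdown_or_escape: bool,
                       hours_until_event: float) -> str:
    """PA only if all three conditions hold; otherwise PB."""
    all_met = (operator_alerted
               and independent_shutdown_or_escape
               and hours_until_event > 1.0)
    return "PA" if all_met else "PB"


if __name__ == "__main__":
    print(occupancy_category(0.05))             # rarely occupied
    print(avoidance_category(True, True, 0.5))  # too little time to react
```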

242
This table shows some typical demand rate categories. This table shows
three categories, which is typical of risk graph analysis. The table contains
two-letter identifiers for demand rate categories that relate to the
unmitigated frequency at which the hazardous event will occur.
The table shown here assigns demand rate categories based on
quantitative rate information. If the demand rate is less than 0.03 per year,
then the category is W1. Likewise, a demand rate from 0.03 to 0.3 per year
relates to category W2, and a demand rate from 0.3 to 3 per year relates to a
category of W3. A demand rate that is higher than 3 occurrences per year will
require recalibration of the risk graph to account for high frequency demands.
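
The demand rate assignment can be sketched the same way. The function name is hypothetical, the rates are taken to be in events per year, and the treatment of values exactly on a boundary (assigned to the higher category, with exactly 3 per year kept in W3) is an assumption of this sketch.

```python
def demand_rate_category(demands_per_year: float) -> str:
    """Map an unmitigated demand rate (per year) to W1, W2, or W3."""
    if demands_per_year < 0.0:
        raise ValueError("demand rate cannot be negative")
    if demands_per_year > 3.0:
        # The notes call for recalibrating the risk graph in this case.
        raise ValueError("more than 3 demands per year: recalibrate the risk graph")
    if demands_per_year < 0.03:
        return "W1"
    if demands_per_year < 0.3:
        return "W2"
    return "W3"


if __name__ == "__main__":
    for rate in (0.01, 0.1, 1.0):
        print(rate, demand_rate_category(rate))
```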

243
Some references, such as IEC61508 part 5, contain demand rate
descriptions that are more qualitative than the frequency-based
assignments. In that reference, W1 relates to a very slight probability, W2
relates to a slight probability, and W3 relates to a relatively high probability
of the unwanted event.

244
Once the consequence, likelihood, occupancy, and probability of avoidance
categories have been determined, the risk graph is used to determine the
SIL that will reduce the risk to a tolerable level.
The SIL is selected by drawing a path from the starting point on the left to
the boxes at the right by following the categories selected for
consequence, occupancy, and probability of avoidance. The combination of
those three determines the row that is selected. The specific box of the
selected row is determined by the selected demand rate category.
The box that is selected will contain one of the following: 1, 2, 3, 4, a, b,
or “---.” If the box contains a numeral, then that is the required SIL for
the SIF. If the box contains an “a,” then no special safety systems (e.g.,
an SIS) are required to achieve a tolerable level of risk. If the box contains
a “b,” then a single SIS is not sufficient to reduce the risk to a tolerable
level. If the box contains a “---,” then no safety requirements exist for the
function.
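
The meaning of each possible box entry can be captured in a short helper, shown here only as a sketch. The function name and the wording of the returned strings are hypothetical; the interpretation of each symbol follows the notes above.

```python
def interpret_box(box: str) -> str:
    """Translate a risk graph box entry into its meaning."""
    if box in {"1", "2", "3", "4"}:
        return f"SIL {box} required"
    if box == "a":
        return "no special safety requirements (no SIS needed)"
    if box == "b":
        return "a single SIS is not sufficient"
    if box == "---":
        return "no safety requirements"
    raise ValueError(f"unknown risk graph entry: {box!r}")


if __name__ == "__main__":
    for entry in ("3", "a", "b", "---"):
        print(entry, "->", interpret_box(entry))
```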

245
This example asks for the SIL of a Safety Instrumented Function with the
following parameters: fatality is likely, the area is normally occupied, there
is no possibility of avoiding the hazard, and the demand rate is low.

246
Based on the description of consequence, the PLL is between 0.5 and 1.0,
which falls into the CC category. Since the area is normally occupied, FB is
chosen for occupancy. No possibility of avoidance requires that PB is
selected for probability of avoidance. A low demand rate falls into the W2
category, which is between the very slight probability of level 1 and the
relatively high probability of level 3. Following from the starting point using
the selected categories yields row 5. Since W2 is selected, the second
column is chosen. The selected box contains SIL 3.

247
While the risk graphs and hazard matrices can be either qualitative or
quantitative, the method of using frequency based targets is inherently
quantitative. A maximum allowable frequency target is selected based on
the consequence of the hazard that the SIF is preventing. The required
risk reduction is the difference between the unmitigated event frequency
and the maximum event frequency target. The selected SIL represents
the probability category for SIS failure on demand that will ensure the
resulting event frequency with the SIS does not violate the maximum
frequency target.

248
When using frequency based targets, tolerable frequencies for unwanted
events are established first. The frequency that is tolerable will depend on
the consequence of the event. The table shown here gives some typical
tolerable frequency limits for a range of consequences. The table includes
a consequence category name, a description of the range of consequences
that are represented by that category, and the associated maximum
allowable frequency target.
This table gives a target frequency of 1 chance in 1,000 per year for
“minor” consequences, which are those that are limited to a local area and
generally involve minor injuries. One chance in 10,000 per year is
assigned “serious” consequences, which are those that involve serious
injury or fatality. “Extensive” consequences are those that are five times
greater than a “serious” consequence. Extensive consequences are
assigned a target maximum frequency of 1 chance in 1,000,000 per year.

249
The amount of risk reduction that an SIS can provide is a function of the
amount that the system can decrease the frequency of an unwanted
event. This decrease in frequency of the unwanted event is a function of
the probability of failure on demand, or PFD, of the SIS.
For an event that is mitigated by an SIS to occur anyway, both the
unmitigated event must occur and the SIS must fail. Since the events are
logically related by an “AND”, mitigated event frequency is calculated with
probability multiplication. In this instance, the PFD of the safety
instrumented system is the value that is desired, so the probability
multiplication equation is solved for PFD. The final equation is shown here.
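
Written out, the relationship described in these notes takes the following form. The symbol names are chosen here for illustration and may differ from those on the slide.

```latex
% F_unmitigated : frequency of the event without the SIS (per year)
% F_target      : maximum allowable (tolerable) event frequency (per year)
% PFD           : probability of failure on demand of the SIS
\begin{align}
  F_{\text{mitigated}} &= F_{\text{unmitigated}} \times \mathit{PFD} \le F_{\text{target}} \\
  \mathit{PFD} &\le \frac{F_{\text{target}}}{F_{\text{unmitigated}}} \\
  \mathit{RRF} &= \frac{1}{\mathit{PFD}} \ge \frac{F_{\text{unmitigated}}}{F_{\text{target}}}
\end{align}
```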

250
251
This example considers a safety instrumented function where the hazard
being prevented has a consequence of PLL=0.21 and a likelihood of 1/576
events per year. Using the targets described several slides earlier, select
the appropriate SIL for the situation.
Based on the description of consequence, the selected category is “Serious”,
which has an associated target frequency of 1.0×10⁻⁴ per year. The target
frequency of 1.0×10⁻⁴ and the unmitigated event frequency of 1/576 are
combined to calculate the PFD using the equation from two slides earlier.
The result is a required PFD of 0.058. Although 0.058 lies inside the SIL 1
band, not every SIL 1 system achieves it. A probability of failure on demand
of 0.058 can be achieved with any SIL 2 system.
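
The arithmetic of this example can be checked with a short script. It is a sketch under stated assumptions: the function names are hypothetical, and `conservative_sil` encodes one reading of the selection rule these examples appear to follow, namely choosing the lowest SIL whose entire PFDavg band meets the requirement.

```python
def required_pfd(target_frequency: float, unmitigated_frequency: float) -> float:
    """Required PFD = tolerable frequency / unmitigated frequency (capped at 1)."""
    if target_frequency <= 0.0 or unmitigated_frequency <= 0.0:
        raise ValueError("frequencies must be positive")
    return min(1.0, target_frequency / unmitigated_frequency)


def conservative_sil(pfd_required: float):
    """Lowest SIL whose whole low demand band is at or below pfd_required.

    Returns 0 if no risk reduction is needed and None if even SIL 4
    cannot guarantee the requirement.
    """
    if pfd_required >= 1.0:
        return 0
    for sil in (1, 2, 3, 4):
        band_upper_limit = 10.0 ** -sil  # the SIL n band lies below 10^-n
        if band_upper_limit <= pfd_required:
            return sil
    return None


if __name__ == "__main__":
    pfd = required_pfd(target_frequency=1.0e-4, unmitigated_frequency=1 / 576)
    print(f"required PFD = {pfd:.3f}, SIL = {conservative_sil(pfd)}")
```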

252
Individual risk based targets are the most quantitative method for SIL
assignment. The maximum allowable frequency target is calculated based
on the maximum allowable individual risk (for a single loss of life event)
and the expected loss of life for the event in question. The required risk
reduction is the difference between the unmitigated event frequency and
the event frequency target calculated for the scenario. The selected SIL
represents the probability category for SIS failure on demand that will
ensure the mitigated event frequency does not violate the maximum
frequency target.

253
If individual risk criteria are used, then a tolerable frequency for the event
can be calculated based on the probable loss of life due to the event in
question and the maximum risk target for a single loss of life event. The
target frequency is the individual risk of fatality frequency divided by the
probable loss of life, as shown in the equation above.
Once the frequency target is determined, the risk reduction or required
PFD is calculated using the same equation as in the general frequency
based target method.
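
In symbols, the two steps described above can be written as follows. The symbol names are chosen here for illustration and may differ from those on the slide.

```latex
% F_IR : maximum allowable individual risk of fatality (per year)
% PLL  : probable loss of life for the event in question
\begin{align}
  F_{\text{target}} &= \frac{F_{\text{IR}}}{\mathit{PLL}} \\
  \mathit{PFD} &\le \frac{F_{\text{target}}}{F_{\text{unmitigated}}}
\end{align}
```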

254
This example asks for the SIL of a safety instrumented function where the
hazard being prevented has a consequence of PLL=0.21 and a likelihood of
1/576 events per year, only this time an individual risk target is used.
The PLL in this case is 0.21 and the individual risk target frequency is
1.0×10⁻⁴ per year. Dividing the individual risk target by the PLL gives a
target frequency of 4.8×10⁻⁴ per year for the event in question. This value
and the unmitigated event frequency of 1/576 are combined to calculate the PFD
using the equation from several slides earlier. The result is a required PFD
of 0.27. A probability of failure on demand of 0.27 can be achieved with a
SIL 1 system.
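
The numbers in this example can be reproduced as follows. This is a minimal sketch with hypothetical function names. The inputs are the ones given in the example.

```python
def target_frequency(individual_risk_target: float, pll: float) -> float:
    """Tolerable event frequency = individual risk target / probable loss of life."""
    if individual_risk_target <= 0.0 or pll <= 0.0:
        raise ValueError("inputs must be positive")
    return individual_risk_target / pll


def required_pfd(target: float, unmitigated_frequency: float) -> float:
    """Required PFD = tolerable frequency / unmitigated frequency."""
    return target / unmitigated_frequency


if __name__ == "__main__":
    f_target = target_frequency(individual_risk_target=1.0e-4, pll=0.21)
    pfd = required_pfd(f_target, unmitigated_frequency=1 / 576)
    print(f"target frequency = {f_target:.2e} per year, required PFD = {pfd:.2f}")
```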

255
This example asks for the SIL of a safety instrumented function where the
hazard being prevented has an existing risk of 0.044 deaths per year, again
using an individual risk target. The individual risk target frequency is
1.0×10⁻⁴ per year. Taking the ratio of these two numbers gives a
target risk reduction of 440 or greater. This value can also be used to
calculate the PFD using the equation from several slides earlier. The result
is a required PFD of 0.00227. A probability of failure on demand of 0.00227
can be achieved with only some SIL 2 systems but all SIL 3 systems will
meet the requirement.
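
A quick check of this example is shown below, with hypothetical function names. It also makes the reason for the split answer visible: the required PFD lies inside the SIL 2 band (0.001 to 0.01), so only the better SIL 2 designs reach it.

```python
def required_rrf(existing_risk: float, risk_target: float) -> float:
    """Required risk reduction factor = existing risk / tolerable risk."""
    if existing_risk <= 0.0 or risk_target <= 0.0:
        raise ValueError("inputs must be positive")
    return existing_risk / risk_target


if __name__ == "__main__":
    rrf = required_rrf(existing_risk=0.044, risk_target=1.0e-4)
    pfd = 1.0 / rrf  # PFD is the reciprocal of the risk reduction factor
    inside_sil2_band = 1e-3 <= pfd < 1e-2
    print(f"RRF = {rrf:.0f}, required PFD = {pfd:.5f}, "
          f"inside SIL 2 band: {inside_sil2_band}")
```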

256
At this time the course participant should practice the skills learned in this
section by completing Application Exercise 6. The solution to Application
Exercise 6 can be found in the additional resources section of this training
module.

257
The key requirement for using risk integrals is that a single loss variable
must be able to be applied to the system in question. This can easily be
done if all of the harm is expressed or converted to financial units.
Risk integrals can also be applied to consider multiple personnel safety
consequences through the use of probable loss of life or PLL. The
important aspect of PLL is that it can take on fractional values, i.e., an
injury event can have a PLL of 0.1 or some other value less than one
representing the severity of the event in these probable loss of life terms.

258
Risk integrals are only now gaining acceptance in the design-engineering
field as a means of measuring risk. Despite this only-recent acceptance,
risk integrals have several advantages over other methods for measuring
risk:
· The single risk variable is easy to use in optimization and decision-
making
· The risk considers the impact of multiple fatality events
· Different risks can be considered on a uniform financial basis for cost-
benefit analysis
As a result of these advantages, the risk integrals of Potential Loss of Life
for personnel safety and Expected Value for overall financial impact are
ideal for risk reduction design engineering.

259
Using the single variable approach, it is possible to express each
consequence in that variable as shown on this slide. The total hazard
consequence can now be readily determined by adding the consequences
of each receptor in terms of the single variable. Assuming that the hazard
will cause all of these traceable impacts, the total cost of the brittle pipe
fracture risk is ~1.27 M$ per year without any safety function.
Note that in this case, the decrease in company image caused by the
hazard was determined to be accounted for in the other categories and no
additional cost was assessed in the analysis.

260
Considering the brittle pipe fracture and explosion example developed
earlier along with the safety system cost data, which SIF option should be
chosen from purely financial considerations?

This analysis is worth doing for a few key SIFs to check how well the
tolerable risk guidelines agree with the financially driven cost benefit
analysis recommendation. The two may or may not agree, even when
the tolerable risk limit comes from financial considerations.

261
Putting each case on an annual cost basis clarifies the choice significantly.
Since the first option provides a $31,000 per year savings relative to doing
nothing, it has significant potential.

262
Continuing the analysis for the second option shows the lowest total cost is
for the SIL 2 system at $106,350 per year, more than $1 million in savings
relative to doing nothing. Thus in this case, the cost benefit analysis
indicates the SIL 2 function is the best option.

263
For multiple receptors per hazard, some companies calculate required risk
reduction factors or integrity levels (IL) for each receptor. The RRF for the
instrumented function in this situation is chosen to be the highest one,
since it will automatically satisfy the other lesser requirements.
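
Choosing the governing requirement amounts to taking the maximum over the receptors, as in this small sketch. The receptor names and RRF values are purely hypothetical and are not taken from the course example.

```python
def governing_requirement(rrf_by_receptor: dict) -> tuple:
    """Return the (receptor, RRF) pair with the highest required RRF."""
    if not rrf_by_receptor:
        raise ValueError("at least one receptor is required")
    receptor = max(rrf_by_receptor, key=rrf_by_receptor.get)
    return receptor, rrf_by_receptor[receptor]


if __name__ == "__main__":
    # Hypothetical required risk reduction factors for one hazard.
    requirements = {"personnel": 440.0, "environment": 50.0, "financial": 120.0}
    print(governing_requirement(requirements))
```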

264
265
This section focused on the selection of the correct safety integrity level for
a system based on the results of the likelihood and consequence analysis.
It began with a discussion of the SILs themselves. Then the balance of the
section covered different methods for selection including the hazard
matrix, risk graph and several quantitative techniques.

266
This final section of the course deals with the safety requirements
specification. Beginning with the definition of this key piece of
documentation, this section then describes the required input needed to
properly specify the system safety requirements. Next, the section
discusses useful formats for the SRS before finishing with common SRS
problems and how to address them.

267
The overall objective of the SRS is to specify everything needed to allow
the safe and effective realization of the safety instrumented system. Some
have called the creation of this document an administrative burden. Most
realize that a complete, clear, and accurate SRS saves a lot of
misunderstanding and rework.
One way to minimize the effort and maximize the net benefit is to use SIL
selection, risk, and hazard analysis work process templates that generate
most of the SRS as their direct output.

268
Once all of the hazard identification, analysis, and SIL selection work is
complete, the results must still be clearly documented before the analysis
phase is complete.

269
When developing an SRS, all of the information generated during the analysis
phase of the safety lifecycle is needed. This includes all the process or
machinery information, potential hazards, regulatory requirements, and a list
of all safety instrumented functions. For each SIF, the required risk reduction
(usually in the form of a SIL) and related risk analysis information are also
required.

270
271
272
This sample list of functional requirements includes both process and
safety system details.

273
Other functional requirement information includes consideration for
abnormal circumstances.

274
The integrity requirements listed here address both “safe” and
“dangerous” failure modes for the safety system.

275
A suggested format for the SRS is to start with the material needed for a
reader to understand the requirements. This introduction can be followed
with any general requirements, especially those common to all
SIFs. SIF-specific requirements can follow.

276
Many different kinds of general requirements can be stated, including the
decision to use de-energize-to-trip as the standard implementation
method. This slide and the following two slides show typical examples of
entries in the general requirements section of an SRS.

277
Applicable standards are also a good candidate for the general
requirements list.

278
General requirements can include the false trip rate (usually expressed as a
mean time to spurious failure, or MTTF spurious) and the process safety time.

279
Once the introduction and general requirements sections are presented,
the set of specific requirements for each SIF follows naturally. An excellent
format for the requirements specific to each SIF is a predefined chart
containing all required information for that SIF.

280
When this kind of specific requirement chart is accompanied by a cause-
and-effect diagram (or other description of the logic), the requirements for
each SIF can be quickly and effectively presented.

281
Several different methods are commonly used to present the safety logic
requirements, including plain text, cause and effect diagrams, and binary
logic diagrams. Each has advantages and disadvantages that will help
determine which method is best in each situation.

282
Consider an example with two inputs and three outputs.

283
While some use a straight textual description, it is considered better to
provide structure and write the description almost like a computer
language. Most consider this far easier to understand and more precise.

284
Cause and Effect diagrams are also useful in many applications to describe
the logic.

285
Logic can also be expressed in the form of binary logic diagrams. This
drawing generally takes the most time but provides the most precise
description.
NOTE: The logic shows the nature of the de-energize to trip application.
Reading the textual description, one might use the word OR to describe
the logic. For example, the valves should close when BS01 indicates a trip
OR when PSL01 indicates a trip. But these signals go to logic zero when
trip is indicated, therefore the AND gate should be used per De Morgan’s
theorem.
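
The point of the note can be checked exhaustively in a few lines. In this sketch the tag names BS01 and PSL01 come from the note above, while the function names and the convention that True means a healthy (logic one) signal are assumptions made for illustration.

```python
from itertools import product


def valves_close_text(bs01_trip: bool, psl01_trip: bool) -> bool:
    """Textual description: close the valves when BS01 OR PSL01 indicates a trip."""
    return bs01_trip or psl01_trip


def output_energized(bs01_signal: bool, psl01_signal: bool) -> bool:
    """De-energize-to-trip logic: inputs are logic one when healthy, so the
    output stays energized (valves open) only if BOTH signals are one."""
    return bs01_signal and psl01_signal


if __name__ == "__main__":
    # A trip is signalled by logic zero, so trip = not signal.
    for bs01_signal, psl01_signal in product([False, True], repeat=2):
        closes = valves_close_text(not bs01_signal, not psl01_signal)
        energized = output_energized(bs01_signal, psl01_signal)
        # De Morgan: (not A) or (not B) == not (A and B)
        assert closes == (not energized)
        print(bs01_signal, psl01_signal, "-> valves close:", closes)
```

All four input combinations agree, which is why the OR in the textual description appears as an AND gate in the drawing.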

286
Items marked in red may be general or specific depending on the application.

287
The process of creating safety requirements specifications is not foolproof.
Studies have shown that problems common to general documentation
systems, such as incomplete input, incomplete information, and poor document
maintenance and revision control, are also relevant to SRS documentation.

288
IEC61508 makes recommendations of techniques to help avoid problems in
requirements specifications. The use of predefined formats for SIF specific
data is equivalent to the recommended checklists technique. The use of a
cause-and-effect diagram can also help, since it provides a predefined format
as well.
One other technique strongly recommended is the inspection or review. When
complete, the SRS should be reviewed by a group with good knowledge of the
process or machinery in question. One person needs to be appointed to be
responsible for the review to help achieve the needed quality.

289
It should be remembered that the measure of quality for any document is
not the number of pages or the weight, but how precisely and how quickly
the reader can accurately comprehend the required information.

290
This final section of the course dealt with the safety requirements
specification. Beginning with the definition of this key piece of
documentation, this section then described the required input needed to
properly specify the system safety requirements. Next, the section
discussed useful formats for the SRS before finishing with common SRS
problems and how to address them.

291
Over the past two days, we have covered a lot of ground concerning the
concepts and methods to identify hazards, assess their impact, and
manage the resulting risk through properly characterizing the
requirements of any necessary safety instrumented systems.

292
Looking at our progress another way, we have covered the analysis
phase of the Safety Lifecycle in some detail. Hazards are identified and
analyzed, and risk reduction targets are established for any SIF where
risk reduction is needed. The list of SIFs and the requirements are then
documented to provide a proper platform for the realization phase of
designing and building the safety instrumented system.

293
To close, the analysis phase is part of the overall safety lifecycle
method to help achieve functional safety or freedom from
unacceptable risk as expressed by IEC 61508 and other
international and national standards. It is a very powerful and useful
tool capable of saving lives and preventing the waste of resources
on low value safety efforts.

294
295
