SIL and SIS Design - SHELL PDF
Functional Safety is the safety to be achieved by IPFs.
SIL Assessment & SIS design for non-functional safety experts
The title was intended to be "IPF for Dummies". However, "For Dummies" is a registered trademark of Wiley Publishing Ltd, the well-known U.S. publishing company, so we could not use that title.
This presentation and hand-outs are intended for process engineers, operational personnel and others who are involved in the process of IPF classification and testing. It is made for those who need to know the basics and essentials of IPF classification without having to know all the details, ifs and buts.
This presentation aims to provide an appreciation of the IPF method (e.g. why an IPF study needs to be done) as well as buy-in to the conclusions and the resulting IPF design and test effort.
IPFs are all about risks. IPFs are intended to reduce risk using instrumentation. IEC 61511, the relevant international standard, refers to the risk reduction achieved by instrumentation as functional safety.
The IPF methodology is intended to allow the design and maintenance of trip systems to be based on the risk to be reduced. The higher the risk, the more effort we have to make to keep the remaining risk acceptable.
E.g. if a certain process hazard may occur every 10 years (e.g. the failure of a control loop in the dangerous direction) and the consequence is that a large compressor is exposed to liquid carry-over from the inlet scrubber, we can assess the risk. If the cost of repairing the compressor plus the lost revenue from downtime is $5 million per event, we can estimate the risk at $500K per year. This is not acceptable and needs to be reduced. By installing a high level switch that trips the compressor, we can avoid the consequences (the hazardous event). This IPF should reduce the risk from $500K to, say, $5K per year.
The IPF in the above example reduces the risk by a factor of at least 100. Instead of referring to the risk reduction to be achieved, we refer to the SIL as per IEC 61511.
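The arithmetic of the compressor example can be written out as a short calculation. The figures below are the illustrative ones from the text, not real plant data:

```python
# Illustrative risk arithmetic from the compressor example above.
# All figures are the example values from the text, not real plant data.

demand_rate = 1 / 10          # hazardous event: once per 10 years
consequence = 5_000_000       # repair cost + lost revenue, in $

base_risk = demand_rate * consequence          # $ per year, unmitigated
print(f"Base risk: ${base_risk:,.0f}/year")    # $500,000/year

# A SIL 2 IPF (the high level trip) gives a risk reduction factor of at
# least 100, i.e. a PFDavg of at most 1e-2.
pfd_avg = 1e-2
end_risk = base_risk * pfd_avg
print(f"End risk:  ${end_risk:,.0f}/year")     # $5,000/year

risk_reduction = base_risk / end_risk
print(f"Risk reduction factor: {risk_reduction:.0f}")  # 100
```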
What is Risk?
Risk is the likelihood of an event times the severity of the
consequences.
The likelihood is expressed as a frequency (e.g. 0.2 times per
year)
In Shell, the severity of consequences is expressed in terms of consequences to people, the environment and the business ($).
For IPFs the risks are assessed for each hazardous event to be
protected against. E.g. burner flame-out leads to furnace
explosion.
Flame-out happens about once per 5 years; the consequence will be $5M plus possible casualties.
Because an IPF is intended to reduce the risk, we first have to assess the risk to be reduced.
What is risk?
Risk in the process industry is commonly expressed as the frequency at which the problem may occur multiplied by the severity of the consequences if it is not stopped by any protective measure.
The severity of the consequences is expressed as the consequences to people,
environment and assets (repair costs and production losses).
In the IPF method, only the risk associated with the specific hazardous situation that the IPF is protecting against is assessed. So the hazardous situations are taken one by one. The totalised risk of operating an LNG plant is not calculated.
Where such total remaining risk is a concern, other techniques are applied (e.g.
QRA).
Only where the cumulative risk may be reduced by very obvious measures does the IPF methodology recognise the situation and improve the trip system design. This is the so-called adding rule, which is not discussed in this presentation.
[Figure: risk graph — likelihood vs severity of consequences, with lines of equal risk; risk increases towards the upper right corner]
Risk increases from the lower left corner to the upper right corner of the graph.
One could now try to assess the risk by plotting the likelihood and the severity of
consequences and establish the risk as a dot (the intersection) on the graph.
However assessing the risk accurately is very difficult.
[Figure: risk matrix — likelihood and consequence divided into categories; the low-risk region sits in the lower left corner]
It would be much easier if we only needed to assess in which category the likelihood
and consequence severity falls.
E.g. I do not know the likelihood but it is between once per year and once per 10
years. I do not know about the consequences but I do know that it is between 1 and
10 M$.
By doing so I can relatively quickly assess the risk category (e.g. high or medium-high).
This technique is the basis for the Shell Hazards and Effect Management Process
(HEMP) matrix that is also used by all Shell OUs.
Risk reduction
Preventive and Mitigating IPF effects
Base Risk = Demand rate x consequence = DR x CQ1
End Risk = DR x PFDtarget x CQ1 (preventative, "normal" IPF)
End Risk = DR x PFDtarget x CQ2 (mitigating IPF, e.g. F&G)
[Figure: risk matrix — a preventative (normal) IPF moves the risk downwards (lower likelihood, consequence stays at CQ1); a mitigating IPF (F&G) also moves it left (consequence reduced from CQ1 to CQ2)]
Shell Global Solutions
As discussed, an IPF is intended to reduce risk, but we need to know how and how
much.
Normal IPFs prevent the hazardous situation from developing into an event with undesired consequences. Sometimes the IPF may fail such that the undesired consequences occur after all. However, the frequency at which these events occur is reduced dramatically. So normal IPFs move the risk downwards on the risk matrix.
Some IPFs cannot reduce the frequency of occurrence of the event. E.g. a fire detector cannot reduce the frequency at which fires occur. However, it can reduce the severity of the consequences by e.g. initiating a sprinkler system.
According to the Shell group HEMP, risks should be reduced to a level where they are either as low as reasonably practicable (ALARP) or so low that there is no longer a need to demonstrate that the risk is ALARP. However, in all cases we should strive towards further risk reduction (especially of personal and environmental risk) as soon as suitable techniques become available and society's acceptance of risks changes.
Some risks are so high that HEMP classifies them as intolerable. No matter what it takes, we have to do something about them.
In the ALARP region we would need to demonstrate either that the risk can be reduced further (e.g. with IPFs) or that the efforts (and money) required to reduce the risk further would be disproportionate compared to the risk reduction gained. If that is the case, the risk is ALARP.
E.g. if a risk is $50,000 per year and further reduction would also take $50,000 per
year, the risk does not need to be reduced further and is ALARP.
Normally IPFs are not that expensive, and using the normal IPF risk graph (see slide 12 14) will result in a remaining risk level that Shell considers broadly acceptable, i.e. there is no need to demonstrate ALARP.
Only in cases where IPF testing needs to be waived may ALARP considerations be used to justify a waiver.
SIL Classes
[Table: IPF Class | SIL | PFD | Risk Reduction | Typical Implementation — the table body was not recovered from the source]
As discussed IPF Classes are used as categories of IPFs that achieve a certain risk
reduction.
Below IPF Class III (PFD < 0.1) there are no requirements with regard to the risk reduction to be achieved; however, there may still be a requirement/opportunity to reduce the risk further by having an alarm or an automated DCS action.
For SIL 4 IPFs there is no equivalent IPF Class. Indeed, a risk reduction of better than 10,000 is very difficult to achieve, and seeking alternative risk-reducing measures is often a better option.
A High Integrity Pressure Protection System (HIPPS) is the only known practical example of a SIL 4 IPF. E.g. PDO's Main Oil Line has a few.
[Figure: risk graph, likelihood (y⁻¹) vs consequence — an initial risk in the high-risk region at a likelihood of 1 per year must be moved down through 10⁻¹ to 10⁻² per year, i.e. a risk reduction of a factor >100 => SIL 2, so that the remaining risk falls in the low-risk region]
When the initial risk has been mapped on the risk graph/matrix, and the areas of
tolerable and acceptable risks are known, one can determine how much risk
reduction is needed.
In the example above, the risk reduction required is 100 to get into the broadly
acceptable risk.
These kinds of considerations form the basis of the calibration of a risk matrix that yields the required SIL directly.
Likelihood (y⁻¹)      Consequence severity →
  ≥ 1                 1    2    3    4     (High Risk: upper right)
  1 … 10⁻¹            a    1    2    3
  10⁻¹ … 10⁻²         a    a    1    2
  < 10⁻²              -    a    a    1     (Low Risk: lower left)
As seen in the previous slide, each cell of the risk matrix requires a certain risk
reduction to achieve broadly acceptable risks. So we can immediately put the
required SIL in each cell such that after the implementation of the IPF the risk
becomes broadly acceptable.
This has been done in the risk graph above. Please note that the above example is
just an example and should not be used for any risk or IPF study!
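The cell-by-cell mapping can be sketched as a small lookup. The matrix values below are the slide's illustrative example only and, like the slide, must not be used for any real risk or IPF study; "a" is read here as "alarm only / no SIL requirement", which is an interpretation, not a definition from the hand-out:

```python
# Look up the required SIL from the example risk matrix above.
# Matrix values are the slide's ILLUSTRATIVE example only -- not for use
# in a real risk or IPF study. "a" = alarm only / no SIL (our reading).

# Likelihood band thresholds (events per year), highest band first.
LIKELIHOOD_THRESHOLDS = [1.0, 1e-1, 1e-2]

# Required SIL per (likelihood band, consequence category 1..4).
MATRIX = [
    [1, 2, 3, 4],         # likelihood >= 1 per year
    ["a", 1, 2, 3],       # 0.1 <= likelihood < 1
    ["a", "a", 1, 2],     # 0.01 <= likelihood < 0.1
    [None, "a", "a", 1],  # likelihood < 0.01
]

def required_sil(likelihood_per_year, consequence_category):
    """Return the SIL (int), "a" (alarm only) or None (no requirement)."""
    for row, threshold in enumerate(LIKELIHOOD_THRESHOLDS):
        if likelihood_per_year >= threshold:
            return MATRIX[row][consequence_category - 1]
    return MATRIX[-1][consequence_category - 1]

# Burner flame-out about once per 5 years (0.2/y), severe consequence (cat. 4):
print(required_sil(0.2, 4))   # 3
```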
RAM calibration
For every RAM, the calibration is extremely important as it
embeds acceptable remaining risk criteria
Assumptions and guidelines for use are critical e.g.:
Average consequences or potential consequences?
Credit for post-top-event mitigation layers built in or not?
(the RRM RAM does include it, the SOPUS and SIC RAMs do not)
How to assess likelihood? Include which non-IPF protection
layers?
Etc.
For those of you with a special interest in risk assessment, and who might have been exposed to different risk graphs and matrices used in and outside Shell: please note that the road to a calibrated risk matrix is full of pitfalls and assumptions that should be clarified and enforced when it is used.
E.g. some matrices (like the RRM-IPF RAM and the 1996 IPF DEP risk graph) assume potential credible consequences, where others assume average consequences.
The RRM-IPF RAM as well as the 1996 IPF DEP risk graph take credit for other post-top-event (see slide 16) mitigation layers, such that the user does not need to take them into account specifically. This makes the matrix/graph easy to use but creates seemingly high remaining risks, especially for personnel safety.
E.g. if a hazardous situation occurs every 10 years and a casualty may result, both the RRM-IPF RAM and the 1996 IPF DEP risk graph require an IPF Class IV / SIL 2 (risk reduction of 100). This means that the casualty is now experienced once per 1000 years. This is too high as per common corporate acceptable risk criteria (less than once per 10,000 years per hazardous situation). However, if the embedded credit for other post-top-event (see slide 16) mitigation layers is taken into account, the remaining risk becomes better than once per 10,000 years.
Further discussion of this subject would be way beyond the scope of this hand-out!
[Figure: risk vs trip system complexity — risk falls with added protection, then rises again as spurious-trip risk grows; LOPA and ALARP mark the optimum between under-engineering and over-engineering]
"Every advantage has its disadvantage"² also applies to installing SIFs in a process plant. By installing a SIF, a new situation is created that may give rise to new hazardous situations. If the instruments fail spuriously, economic losses are incurred and the event often results in flaring (environmental consequences).
So the risks associated with the original hazardous situations are reduced and new ones are created, such that at some stage the total risk increases again. At this point the plant becomes over-engineered.
Therefore, to arrive at a fit-for-purpose SIS, the risks associated with spurious trips (safe failures of instruments) also need to be studied.
Tools such as Layer of Protection analysis (LOPA) and ALARP evaluation help to prevent
over and under engineering.
LOPA helps to estimate the unmitigated event frequency (the hazardous event frequency
if the SIF were not realised) more accurately.
An ALARP evaluation also considers the new risk created by the various SIF designs
planned.
Therefore SIFpro includes both tools to help to arrive at a design that is fit for purpose.
² Johan Cruyff
The Fundamentals of Safety are at the heart of IEC 61508 and IEC 61511. They concentrate on the following:
When designing and planning your process, you have to evaluate all your potential
hazards. This may be done using HAZOP or any other method that arrives at a
similar result.
For each hazard, one should establish whether the hazard is acceptable without additional measures or whether safeguards may be required. These may be procedural, changes in the design, mechanical (RVs etc.) or instrumented.
For instruments, you have to classify the safety functions into safety integrity levels (SIL) that essentially give a measure of the degree of risk reduction these functions should offer. This risk reduction is expressed as a probability of failure on demand.
Of course the instruments should be able to bring the process to a safe state!
Following the establishment of the SIL, one should design and maintain the instruments to ensure that the requirements of the SIL are met. Moreover, these design, construction, testing, commissioning and maintenance activities shall be planned and auditable (documented).
Design intent: prevent the hazardous situation from turning into the <released hazard>.
[Figure: the bowtie — threats on the left lead, via (independent) barriers, to the released hazard (top event) in the middle and on to the consequences on the right. Preventive IPFs and alarms act before the top event; mitigative IPFs reduce the consequences after it. A failure on demand lets the hazardous event develop its consequences.]
Criticality
This is the RAM used in SIFpro. Either by direct selection or by doing a LOPA analysis, the unmitigated event frequency is established. The unmitigated event frequency is often referred to as the demand rate, although this term is essentially misleading.
The risk to be reduced by the SIF also depends on other protection layers that would act in case the SIF fails on demand, i.e. act after the SIF has had its chance (e.g. a non-return valve as part of a backflow protection system). This means that the hazardous event (e.g. actual backflow) does not necessarily occur at the same frequency at which the SIF is demanded to work.
Next, the consequence severity is established. Depending on the consequence category, different questionnaires are available to help assess the severity.
The highest consequence severity and the demand rate establish the initial risk or criticality.
SIFpro allows the RAM to be calibrated; it therefore rates the initial risk (using letters like L, M, H, etc.), and the SIL is mapped against each cell in the RAM.
Design of an IPF
The SIL is a measure of the risk reduction expected to be
delivered by the IPF.
Two requirements for each IPF:-
1. The IPF shall meet the required degree of fault tolerance
2. The IPF shall meet the required PFD
In order to comply with IEC 61508 and IEC 61511, the IPF methodology requires the design of an IPF to comply with both of the following requirements:
The deterministic requirements (minimum degree of fault tolerance). E.g. for SIL 3, at least a 1oo2 voting architecture is required. Detailed rules would be too much detail for this slide pack. This rule is intended to protect the designer against over-optimistic probabilistic assumptions in cases of high risk (lies, damned lies and statistics).
The probabilistic requirements (meet the maximum PFD of the SIL; see slide 8). E.g. for SIL 3 the overall PFD should be better than 1E-3. See the next slides for further details.
PFD_IPF = PFD_initiator + PFD_logic_solver + PFD_final_element

PFD_SIL - PFD_logic_solver = PFD_target (to be divided between initiator and final element)
The following slides aim to introduce the statistical calculations that should demonstrate that the probabilistic requirements for the SIL have been met.
In general, the PFD of the IPF is the sum of the PFDs of all independent components: the initiators (the sensors), the logic solver (e.g. the safety PLC) and the final elements (the valves, etc.). Invariably, the field devices (sensors and final elements) are the weakest part of the IPF.
Many IPFs share initiators and final elements. If the test effort (see next slides) is optimised for one IPF, it will influence other IPFs as well. Optimising all test efforts of all components of the trip system is quite a calculation task!
To simplify calculations, the PFD budget of an initiator or final element is often established by subtracting the PFD of the logic solver from the available PFD of the complete IPF and dividing the remainder equally between the initiator and the final element. This is the approach taken by the RRM-IPF software. SIFpro, on the other hand, optimises the whole function and takes the complete available PFD into account to optimise initiator and final element testing.
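The simple budgeting rule described above (subtract the logic solver's PFD, split the remainder equally between the two field devices) can be sketched as follows; the numeric values are illustrative, not from a real design:

```python
# Sketch of the simple PFD budgeting described above: subtract the logic
# solver's PFD from the SIL budget and split the remainder equally between
# the initiator and the final element. Figures are illustrative only.

def field_device_budget(pfd_sil: float, pfd_logic_solver: float) -> float:
    """PFD budget per field device (initiator or final element)."""
    remainder = pfd_sil - pfd_logic_solver
    if remainder <= 0:
        raise ValueError("logic solver alone already consumes the SIL budget")
    return remainder / 2

# SIL 2 budget (PFD < 1e-2) with a safety PLC contributing 1e-4:
budget = field_device_budget(1e-2, 1e-4)
print(f"{budget:.5f}")   # 0.00495
```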
[Figure: instrument failure rate vs time — early-life failures (decreasing, green), age-related failures (increasing, red), combining into the bathtub curve (blue)]
The calculation of the likelihood of failure of a safety-related (IPF) instrument at the moment it is demanded to act (the probability of failure on demand or PFD) is based on the assumption that the failure behaviour of instruments is generally random. This assumption is illustrated above and on the next slide.
Instruments are initially exposed to early-life failures caused by manufacturing defects and application and commissioning problems. The likelihood (frequency of occurrence) of these decreases rapidly over time (the green curve).
On the other hand, instruments are also subject to ageing, caused by corrosion, erosion, fatigue, the effects of a possibly stressful environment (UV, RFI etc.), etc. The effects of these age-related failures tend to rise slowly over time until wear-out sets in and the likelihood rises rapidly. This is shown with the red curve. E.g. for ESD valves used in refineries, statistics from Exxon suggest that this effect sets in after 10 years or so.
The combination of both is the famous bathtub curve (named after its shape), in blue.
[Figure: instrument life timeline — testing & commissioning first, then the mission time, then replacement/overhaul]
During the initial phase of the life of an instrument, it is not really used for its safety
mission yet. The purpose of testing and commissioning is to find systematic (wiring,
configuration, integration etc. problems) and early life failures.
After commissioning, the instrument is really used; before old age takes its toll, it is either replaced or overhauled to reinstate the as-new condition.
In the mission time, the failure rate (the frequency at which a failure occurs) remains
practically constant.
The failure rate could be e.g. 2E-2 per year. Obviously an instrument cannot fail "for 2%": it fails or it doesn't. A failure rate of 2E-2 should be interpreted as 2 out of 100 instruments failing in one year. Which instrument, and when in the year, is taken as random.
Probability of failure
Imagine a bucket with 95 black and 5 red balls
Every year I take one ball and put it back if it is black.
If it is red I keep it and stop sampling.
A red ball indicates that the instrument failed dangerously, but I do not know it (unrevealed).
What is the chance that I have a red ball after 1 year? (5%)
What is the chance that I have a red ball after 2 years?
(0.05+0.05*0.95=9.75%). Etc.
The chance of having a red ball increases over time until it is
100%.
The probability of failure on demand is the probability that I will find an instrument
failed at the moment it is actually required to work properly as caused by a demand
on the IPF. So we can compare it with an experiment with red and black balls in a
bucket.
The bucket contains 100 balls, of which 5 are red. Each year I take one ball (blindfolded) and check the colour. If it is black, there is no failure and I put it back. If it is red, it symbolises a failed instrument. Once the instrument has failed, it cannot really fail again, and therefore I stop taking samples once the red ball is taken.
After one year, I check the colour of the ball. What is the chance that it is red? 5% of
course. What is the chance after 2 years that it is red? This is the probability that it is
red after 1 year + the probability that I take a red ball the next year. For the 2nd year
the chance is equal to the probability that it was black the 1st year (95%) times the
chance that it is red the 2nd year (5%).
If this experiment is carried on for many, many years, the probability that there is a red ball approaches 100%. The probability over the years is shown in the next slide.
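The red-ball experiment can be checked both in closed form and by simulation; p = 0.05 is the chance of drawing a red (failed) ball in any one year, as in the example above:

```python
# The red-ball experiment above, in closed form and by Monte Carlo.
# p = 0.05 is the chance of drawing a red (failed) ball in any one year.

import random

def p_failed_by(year: int, p: float = 0.05) -> float:
    """Probability the instrument has failed (a red ball was drawn)
    at some point during the first `year` years."""
    return 1 - (1 - p) ** year

print(f"{p_failed_by(1):.4f}")   # 0.0500
print(f"{p_failed_by(2):.4f}")   # 0.0975  (= 0.05 + 0.95 * 0.05)
print(f"{p_failed_by(40):.4f}")  # 0.8715  (creeping towards 100%)

# Quick Monte Carlo cross-check of the 2-year figure (short-circuiting
# `or` stops sampling once the first draw is red, as in the experiment):
random.seed(0)
trials = 100_000
failed = sum(
    1 for _ in range(trials)
    if random.random() < 0.05 or random.random() < 0.05
)
print(f"simulated: {failed / trials:.3f}")   # close to 0.0975
```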
[Figure: PFDt vs years — the probability of failure on demand rises from 0 towards 1 over roughly 40 years, almost linearly in the first years]
In the first few years, the probability of failure on demand (PFD) rises almost linearly with time. This is shown as the purple line on the slide above.
IPF testing
Imagine: after a while I check whether I have a red ball. If I do, I put it back and start over again.
In other words I check if the instrument failed dangerously and
unrevealed. If it did, I will repair.
The PFDt is now reset to zero after the test because:-
I am sure it did not fail yet (PFD = zero)
I repair if failed (PFD is zero again after the repair)
Suppose I test every 2 years
Testing has the effect of putting the red ball back into the bucket: I verify whether I have one, and if so I put it back.
Suppose I test every 2 years whether there is a red ball, and put it back if I have one.
[Figure: saw-tooth PFDt curve with a 2-year test interval — the PFD resets to zero at each test; the horizontal line shows the resulting PFDavg]
Because for real IPFs, the demand may come at any time, we are interested in the
average PFD throughout the life of the IPF. This is the time average PFD or PFDavg.
PFDavg of an instrument
As can be seen, PFDavg ≈ ½ · λdu · T
Where:
λdu is the random dangerous unrevealed failure rate
T is the test interval.
This assumes perfect testing, no unavailability during the test, no unavailability due to repairs, etc.
If the test is not perfect there is a remnant PFD.
From the previous slide one can see that the PFDavg is about ½ · λdu · T.
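The commonly used approximation PFDavg ≈ ½ · λdu · T (perfect testing assumed) can be checked numerically against the exact exponential expression; the λdu and T values below are illustrative:

```python
# Numeric check of PFDavg ~ (1/2) * lambda_du * T under perfect testing.
# Between tests PFD(t) = 1 - exp(-lambda*t), which is ~ lambda*t for small
# lambda*t, so the time average over a test interval T is ~ lambda*T/2.
# lam and T below are illustrative values, not from a real design.

import math

lam = 2e-2      # dangerous unrevealed failure rate, per year
T = 2.0         # test interval, years

# Average the exact PFD(t) over one test interval (midpoint rule).
steps = 100_000
pfd_avg = sum(1 - math.exp(-lam * (i + 0.5) * T / steps)
              for i in range(steps)) / steps

print(f"numeric : {pfd_avg:.5f}")        # 0.01974
print(f"lam*T/2 : {lam * T / 2:.5f}")    # 0.02000
```

The small difference shows the linear rule is a slight over-estimate, i.e. conservative, for small λ·T.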
[Figure: PFDt with imperfect testing — the testable part resets at each test, while the untestable remnant keeps rising over time, so the resulting PFDavg is higher than with perfect testing]
If the test is not perfect (i.e. the probability that a dangerous fault, if present, will be found by the test is not 100%), there is a remaining probability that a dangerous unrevealed failure is left after the test.
Every time the test is carried out, there is an aspect of the instrument that is not looked at by the test. The probability that this part of the instrument develops a dangerous problem increases over time. This is the purple line.
The resulting overall PFDt rises over time and hence the PFDavg is higher as
compared to the situation with perfect testing.
This implies that the test coverage (how good is the test?) has an effect on the
PFDavg.
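The effect of test coverage can be sketched with a commonly used split of the failure rate into a tested and an untested fraction, where the untested remnant only clears at replacement/overhaul (the mission time). The formula is a standard approximation, not one given in this hand-out, and the numbers are illustrative:

```python
# Sketch of PFDavg with imperfect test coverage C: the covered fraction of
# lambda_du is reset by each test (average ~ C*lam*T/2), while the uncovered
# remnant accumulates over the mission time MT (average ~ (1-C)*lam*MT/2).
# Standard approximation, not a formula from this hand-out; values illustrative.

def pfd_avg(lam_du: float, T: float, coverage: float,
            mission_time: float) -> float:
    return (coverage * lam_du * T / 2
            + (1 - coverage) * lam_du * mission_time / 2)

lam, T, MT = 2e-2, 2.0, 10.0

print(f"perfect test (C=1.0): {pfd_avg(lam, T, 1.0, MT):.4f}")  # 0.0200
print(f"90% coverage        : {pfd_avg(lam, T, 0.9, MT):.4f}")  # 0.0280
```

Even 10% of faults escaping the test noticeably raises the PFDavg, which is why test coverage matters.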
The above slide summarises the parameters that affect the PFDavg of an instrument. The list is self-explanatory.
Obviously, diagnostics are very powerful, because a dangerous failure that would otherwise be left unnoticed until the next test will be detected and alarmed. Repairs are initiated immediately, resulting in a much improved fractional dead time: the fraction of the time that the instrument has a dangerous failure is much reduced, because we do not wait until the next test is carried out.
MVC is widely used in Shell to do exactly that.
Learnings (1)
IPF testing effectively reduces the time a dangerous undetected failure remains lurking in the dark: it reduces the PFD, and thus the risk.
IPF testing is dictated by the risk reduction (= PFDavg) to be achieved (PFDavg ≈ ½ · λdu · T).
The required risk reduction is dictated by the initial risk.
Learnings (2)
Unrevealed failure robustness dramatically improves the PFDavg.
Diagnostics dramatically reduce manual testing efforts.
MVC is an effective way to diagnose transmitters.
Reducing the test interval by a factor of 2 reduces the PFDavg by a factor of 2, and thus reduces the remaining risk by a factor of 2.
The initiator(s), logic solver and the final element(s) should all successfully work to avert the hazardous event. Hence:

PFD_IPF = PFD_initiator + PFD_logic_solver + PFD_final_element
Quiz
What is risk?
What are the Shell risk criteria?
What is safety?
Do we need an IPF if the initial risk is acceptable?
What happens to a risk if an IPF is installed as classified using
the corporate risk graph?
What happens to the risk if tests are postponed or waived?
Why does testing reduce the PFDavg?
How can I improve the PFDavg of an instrument without testing
more?
What is risk? For the process industry (IEC 61511) it is defined as the product of the event frequency and the severity of the consequence. The unit is consequence per unit of time (e.g. 0.1 casualty per year).
What are the Shell risk criteria? Discussion..
What is safety? The absence of unacceptable risk (class discussion; not discussed in this slide pack!).
Do we need an IPF if the initial risk is acceptable? No.
What happens to a risk if an IPF is installed as classified using the corporate risk
graph? It becomes broadly acceptable.
What happens to the risk if IPF tests are postponed or waived? The risk increases and will likely become merely tolerable; ALARP should then be demonstrated (according to HEMP). It is not expected that the risk becomes intolerable, because that would require the test interval to increase by more than a factor of 10 (inferred from the HEMP).
Why does testing reduce the PFDavg? Because it reduces the time an undetected dangerous failure may be present, i.e. it reduces the fractional dead time: the fraction of time the device is not available to carry out its safety mission.
How can I improve the PFDavg of an instrument without testing more? Add unrevealed failure robustness, improve diagnostics, or improve (lower) the dangerous failure rate.
Contact details: