
University Of Western Australia

Subsea Technology module OENA8589

RISK, RELIABILITY AND AVAILABILITY

Kevin Mullen

Risk

What is Risk?

• “The chance of something happening that will have an impact on the objective”
• Frequency x Consequence
• “Expected value of an unwanted outcome measured in dollars”

What is Risk?

• “Expected value of an unwanted outcome measured in dollars”
– Injury or death of personnel
– Damage or destruction of the environment
– Excessive production costs
– Reduction or loss of production
– Project delays

Consequence

Likelihood

Typical Risk Matrix

Consequence \ Likelihood   Never heard of   Has occurred   Has occurred   Occurs often   Occurs often
                           in industry      in industry    in company     in company     at site
No injury                  LOW              LOW            LOW            LOW            LOW
Slight injury              LOW              LOW            MED            MED            MED
Minor injury               LOW              MED            MED            HIGH           HIGH
Major injury               MED              MED            HIGH           HIGH           VERY HIGH
Fatality                   MED              HIGH           HIGH           VERY HIGH      VERY HIGH
Multiple fatality          HIGH             HIGH           VERY HIGH      VERY HIGH      VERY HIGH

VERY HIGH Rectify immediately


HIGH Rectify with urgency, unless clearly impracticable
MED Reduce risk as far as practicable
LOW Accept, but manage through competency and awareness
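
The matrix and action key above can be read as a simple lookup. The sketch below is illustrative only: the names LIKELIHOODS, MATRIX, ACTIONS and assess are assumptions made for this example, while the ratings and actions are copied from the slide.

```python
# Likelihood columns, in the order shown on the slide.
LIKELIHOODS = [
    "Never heard of in industry",
    "Has occurred in industry",
    "Has occurred in company",
    "Occurs often in company",
    "Occurs often at site",
]

# One row per consequence; entries follow the LIKELIHOODS order.
MATRIX = {
    "No injury":         ["LOW", "LOW", "LOW", "LOW", "LOW"],
    "Slight injury":     ["LOW", "LOW", "MED", "MED", "MED"],
    "Minor injury":      ["LOW", "MED", "MED", "HIGH", "HIGH"],
    "Major injury":      ["MED", "MED", "HIGH", "HIGH", "VERY HIGH"],
    "Fatality":          ["MED", "HIGH", "HIGH", "VERY HIGH", "VERY HIGH"],
    "Multiple fatality": ["HIGH", "HIGH", "VERY HIGH", "VERY HIGH", "VERY HIGH"],
}

# Action key from the slide.
ACTIONS = {
    "VERY HIGH": "Rectify immediately",
    "HIGH": "Rectify with urgency, unless clearly impracticable",
    "MED": "Reduce risk as far as practicable",
    "LOW": "Accept, but manage through competency and awareness",
}


def assess(consequence, likelihood):
    """Return (rating, required action) for one consequence/likelihood pair."""
    rating = MATRIX[consequence][LIKELIHOODS.index(likelihood)]
    return rating, ACTIONS[rating]


rating, action = assess("Major injury", "Has occurred in company")
print(rating, "-", action)
```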

Enterprise-Wide Risk Ranking Matrix (Appendix A)

SEVERITY OF CONSEQUENCES

(1) Threat to Enterprise (Catastrophic)
• PERSONNEL – Multiple (five or more) fatalities.
• COMMUNITY – Widespread impact to nearby communities.
• ENVIRONMENTAL – Long term environmental impact, and/or adverse, worldwide publicity.
• FACILITY – Total destruction to installation(s) estimated at a cost greater than $100,000,000; extended facility shutdown, and/or potential for permanent closure. For floating production systems, loss of floating structure.

(2) Major
• PERSONNEL – One or several fatalities, limited to immediate area of incident.
• COMMUNITY – One or more severe injuries.
• ENVIRONMENTAL – Significant release with serious off-site impact and more likely than not to cause immediate or long-term health effects.
• FACILITY – Damage to installation(s) estimated at a cost greater than $10,000,000 but less than $100,000,000; downtime in excess of 90 days.

(3) Serious
• PERSONNEL – One or more severe injuries, including permanently disabling injuries.
• COMMUNITY – One or more minor injuries.
• ENVIRONMENTAL – Significant release with serious off-site impact.
• FACILITY – Damage to process area(s) at an estimated cost greater than $1,000,000 but less than $10,000,000; 10 to 90 days of downtime.

(4) Minor
• PERSONNEL – Single injury, not severe, possible lost time.
• COMMUNITY – Odor or noise complaint from the public.
• ENVIRONMENTAL – Release which results in Agency notification or Permit violation.
• FACILITY – Some equipment damage at an estimated cost greater than $100,000 but less than $1,000,000; 1 to 10 days of downtime.

(5) Incidental
• PERSONNEL – Minor or no injury, no lost time.
• COMMUNITY – No injury, hazard, or annoyance to the public.
• ENVIRONMENT – Environmentally recordable event with no Agency notification or Permit violation.
• FACILITY – Minimal equipment damage at an estimated cost less than $100,000; negligible downtime.

LIKELIHOOD OF OCCURRENCE (risk rank for each severity column)

Likelihood        (1) Threat to   (2) Major   (3) Serious   (4) Minor   (5) Incidental
                  Enterprise
(1) Frequent      1               2           2             3           5
(2) Occasional    2               2           3             4           6
(3) Seldom        3               3           4             5           6
(4) Unlikely      4               4           6             6           6
(5) Remote        4               5           6             6           6

(1) Frequent – Incident is very likely to occur at this facility, possibly several times during its lifetime. Statistical probability: P > 10^-2
(2) Occasional – Incident may occur at this facility some time during its lifetime. Statistical probability: 10^-2 > P > 10^-3
(3) Seldom – Incident has occurred at a similar facility and may reasonably occur at this facility. Statistical probability: 10^-3 > P > 10^-4
(4) Unlikely – Given current practices and procedures, this incident is not likely to occur at this facility. Statistical probability: 10^-4 > P > 10^-6
(5) Remote – Highly unlikely, although statistics show that a similar event has happened. Statistical probability: P < 10^-6

PRIMARY DRIVER
• ENTERPRISE RISK MANAGEMENT DRIVEN
• SAFETY MANAGEMENT SYSTEM DRIVEN
• OCCUPATIONAL HEALTH AND SAFETY DRIVEN

Risk Assessments
QRA – Quantitative Risk Assessment

1. Identifying what could go wrong
2. Estimating the likelihood of these events occurring
3. Examining the possible consequences of these events
   (steps 1 to 3: Risk Analysis)
4. Deciding which risks are tolerable and which aren’t
   (steps 1 to 4: Risk Assessment)
5. Modifying the activity so the intolerable risks are reduced or eliminated.
   (steps 1 to 5: Risk Management - changes to design and operational practice)

Fatal Accident Rates

Implied Cost of Averting a Fatality (ICAF)


58. In making an assessment of reasonable practicability, there is a need to set criteria on the
value of a life or implied cost of averting a statistical fatality (ICAF). HSE’s ‘Reducing Risks
Protecting People’ document sets the value of a life at £1,000,000 and by implication
therefore the level at which the costs are disproportionate to the benefits gained. In simplistic
terms, a measure that costs less than £1,000,000 and saves a life over the lifetime of an
installation is reasonably practicable, while one that costs significantly more than £1,000,000,
is disproportionate and therefore is not justified. However case law indicates that costs should
be grossly disproportionate and therefore costs in excess of this figure (usually multiples) are
used in the offshore industry. In reality of course there is no simple cut-off and a whole range
of factors, including uncertainty need to be taken account of in the decision making process.
59. In the offshore industry there is a need to take account of the increased focus on societal (or
group) risk, i.e. the risk of multiple fatalities in a single event, as a result of society's
perceptions of these types of accident. Therefore the offshore industry typically addresses this
by using a high proportion factor for the maximum level of sacrifice that can be borne without
it being judged ‘grossly disproportionate’; this has the effect of increasing the ICAF value used
for decision-making. The typical ICAF value used by the offshore industry is around
£6,000,000, i.e. a proportion factor of 6. HSE considers this to be the minimum level for the
application of Cost Benefit Analysis (CBA) in the offshore industry.
60. Use of a proportion factor of 6 ensures that any CBA tends towards the conservative end of
the spectrum and therefore takes account of the potential for multiple fatalities and
uncertainty. Although a proportion factor of 6 tends to be used, there are no agreed standards
and it is for each duty holder to apply higher levels if appropriate, for example in very novel
designs.
Extract from Assessment Principles for Offshore Safety Cases (APOSC)
Issued March 2006
UK Health and Safety Executive
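
The gross disproportion test described in paragraphs 58 to 60 can be sketched as a short calculation. This is a simplified illustration, not an HSE method: the function names and the two example measures are hypothetical, and only the £1,000,000 value of a life and the proportion factor of 6 come from the extract.

```python
VALUE_OF_LIFE_GBP = 1_000_000      # from the APOSC extract (para 58)
OFFSHORE_PROPORTION_FACTOR = 6     # typical offshore value (para 59)


def implied_cost_of_averting_fatality(measure_cost_gbp, fatalities_averted):
    """ICAF = cost of the measure / statistical fatalities averted over the installation life."""
    if fatalities_averted <= 0:
        raise ValueError("fatalities_averted must be positive")
    return measure_cost_gbp / fatalities_averted


def is_grossly_disproportionate(measure_cost_gbp, fatalities_averted,
                                proportion_factor=OFFSHORE_PROPORTION_FACTOR):
    """True if the ICAF exceeds proportion_factor x value of a life."""
    threshold = proportion_factor * VALUE_OF_LIFE_GBP
    return implied_cost_of_averting_fatality(measure_cost_gbp, fatalities_averted) > threshold


# Hypothetical measures, for illustration only.
# GBP 2,000,000 averting 0.5 statistical fatalities -> ICAF of GBP 4,000,000
print(is_grossly_disproportionate(2_000_000, 0.5))
# GBP 10,000,000 averting 1 statistical fatality -> ICAF of GBP 10,000,000
print(is_grossly_disproportionate(10_000_000, 1.0))
```

As the extract notes, there is no simple cut-off in practice; uncertainty and other factors also enter the decision.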

Safety Terminology
• Risk Assessment - a subjective evaluation, involving judgment,
intuition and experience, where the level of risk is classified in
four levels and their associated measures of
Fatalities/Person/Year
– 1) Tolerable Risk - level prepared to accept but will continue
to seek reduction. 10^-3 to 10^-5
– 2) Acceptable Risk - level prepared to accept without seeking
further reduction. Below 10^-5
– 3) Unacceptable Risk - level prepared to reject for oneself
and others. Above 10^-3
– 4) ALARP - As low as reasonably practicable.
• The usual measure of risk at a global level is
Fatalities/Person/Year, but for the local view, i.e., for your
immediate corporate mission, risk can be viewed as simply the
“failure of your product.”
• The usual format for the analysis of Risk Assessment is a “Cost-
Benefit” Analysis, lives saved versus monetary costs.

What is Risk Management?


Risk Management is the effective identification, assessment and
control of Risk

• Establish Context and Scope


• Identify the Hazards
• Assess the Risk
– frequency
– consequences
– safeguards
• Rank the Risks
• Eliminate / Minimise the Risk
• Ongoing review and monitoring

How is Risk Managed?

• Useful Tools:
– QRA
– RAM studies
– FMECA
– HAZID / HAZOP
– Audits
• Best implemented during design
• Qualitatively first, then quantitatively

Why is Risk Management needed?


• Legislation / Standards
• Control of Major Hazard Facilities
• Pipeline Acts
• OS&H Regulations 1984
• AS/NZS 4360 Risk Management
• Necessary for business optimisation ($)
• Increase value by:
– minimising loss ($)
– maximising opportunity ($)
• Optimises the performance of the facility
• Reduces probability of becoming:
– Piper Alpha
– Longford
– Exxon Valdez

History of Major Hazards Control
1960’s Flixborough UK (explosion and fire, 1974)
Prescriptive
• Recommendations for design and operation
• (USA) style statutory provisions
• Consideration of the operation of safety procedures

1970’s Alexander L. Kielland (accommodation platform capsize, 1980)
The “Safety Report” approach.
• Operator has to describe safety management to the Regulator.

1980’s Bhopal India (toxic release, 1984)
• Concept Safety Evaluations based on Quantified Risk Analysis (QRA) techniques
• Aims to identify and quantify risks and reduce them to an acceptable level

1990’s Piper Alpha oil platform (explosion and fire, 1988)
The “Safety Case” approach.
• Operator has to convince Regulator on safety management.
• Companies now responsible for their actions - must assess and determine the level of risk

2000’s Bombay High North platform (explosion and fire, 2005)
Control of Major Hazards
• Safety Integrity Levels (SILs)

Bowtie Diagram

[Figure: bowtie diagram. Events leading to critical event -> Critical Event -> Events following critical event]

The process of risk analysis, with a sequence of events leading to a hazardous situation (critical event), followed by a series of events leading to a variety of possible consequences

Identify the Control Measures
[Figure: Causes -> Hazards -> Incidents -> Outcomes, with Proactive Controls on the left and Reactive Controls on the right. Control measure labels: Elimination measures, Prevention measures, Reduction measures, Mitigation measures, Emergency Response, Prevention of escalation]

Safety Case
“A documented body of evidence that provides a convincing and
valid argument that a system is adequately safe for a given
application in a given environment”
To implement a safety case we need to:
• make an explicit set of claims about the system
• produce the supporting evidence
• provide a set of safety arguments that link the claims to the
evidence
• make clear the assumptions and judgements underlying the
arguments
The Safety Case must demonstrate that the control measures are
adequate to eliminate or reduce as far as practicable risks
associated with Major Incidents
Demonstration is typically achieved through:
• Reference to Codes of Practice, Standards, Guidance, etc.
• Through risk assessment (qualitative or quantitative)
The safety case is a “living document” which evolves over the safety
life-cycle.

Reliability

RAM DEFINITIONS
• RAM – Reliability, Availability, Maintainability
• Reliability - The ability of an item to perform a required function
under stated conditions for a stated period of time (BS4778) –
UPTIME
• Failure – The termination of the ability of an item to perform a
required function (BS4778) - FAILURE EVENT
• Maintainability - The ability of an item, under stated conditions of
use, to be retained in, or restored to, a state in which it can
perform its required functions, when maintenance is performed
under stated conditions and using prescribed procedures and
resources (BS4778) - DOWNTIME
• Availability - The ability of an item (under combined aspects of its
reliability, maintainability and maintenance support) to perform a
required function at a stated instant of time or over a stated
period of time (BS4778) - UPTIME / (UPTIME + DOWNTIME) or
MTTF / (MTTF + MTTR)
• Deliverability – The ability of a system to deliver gas to the LNG
plant (under combined aspects of availability and capacity)
under stated conditions and at a stated instant of time or over a
stated period of time – (AVAILABILITY * CAPACITY)
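
The availability and deliverability ratios defined above can be sketched as follows. The function names are assumptions for this example; the MTTF of 4.5 years and MTTR of 0.5 years are the SCM figures used later in these slides, and the capacity figure is hypothetical.

```python
def availability(mttf, mttr):
    """Steady-state availability = MTTF / (MTTF + MTTR); both in the same time unit."""
    return mttf / (mttf + mttr)


def deliverability(avail, capacity):
    """Deliverability = Availability * Capacity (capacity in any consistent flow unit)."""
    return avail * capacity


a = availability(mttf=4.5, mttr=0.5)      # years
d = deliverability(a, capacity=1000.0)    # hypothetical capacity, arbitrary units
print(f"availability = {a:.3f}, deliverability = {d:.1f}")
```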

Reliability: Key Design Requirement
• Reliability is as fundamental a design requirement as function and
performance
• For every Functional requirement a Reliability requirement can (in
principle) be specified
– Function: Seal A must not leak
– Reliability: P(seal A does not leak) > 0.99

• For every Performance requirement a Reliability requirement can
(in principle) be specified
– Performance: Valve must close in less than 10 seconds
– Reliability: P(time to close < 10 s) > 0.99

Failure Characteristics
• Different components fail in different patterns
– Flow components, chokes & valves - wear out
– Mechanical components, wellheads – long life
– Electronic components - fail early or last a long time
– Pressure containment, pipes – system fails pressure
test, or long life
– Environmental influences, CO2, H2S, chlorides, over-
protective CP and H2 build-up – corrode progressively
or induce rapid cracking failures
• These create various distributions: Normal, Exponential,
Weibull, etc.
• Simple prediction uses the Exponential distribution,
R(t) = e^(-t/MTTF), as an approximation for constant failure rates
• Complex simulation programs use distributions matched
to components
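
The difference between the simple exponential model and a matched distribution can be illustrated with the Weibull form R(t) = exp(-(t/eta)^beta). A shape factor beta below 1 gives a falling failure rate (early failures), beta = 1 reduces to the exponential case, and beta above 1 gives wear-out. The parameter values below are hypothetical and chosen only to show the three behaviours.

```python
import math


def reliability_exponential(t, mttf):
    """R(t) = exp(-t / MTTF) for a constant failure rate."""
    return math.exp(-t / mttf)


def reliability_weibull(t, eta, beta):
    """R(t) = exp(-(t / eta) ** beta); eta is the scale, beta the shape."""
    return math.exp(-((t / eta) ** beta))


T = 10.0      # years in service (hypothetical)
ETA = 50.0    # characteristic life in years (hypothetical)

for label, beta in [("early failures", 0.5), ("constant rate", 1.0), ("wear-out", 3.0)]:
    print(f"{label:15s} beta={beta}: R({T:g}) = {reliability_weibull(T, ETA, beta):.4f}")
```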

Factors influencing failure rate

In general the failure rate of a component or element


depends on four main factors:
(a) Quality
(b) Temperature
(c) Environment
(d) Stress

These factors are influenced by:


• the design process
• manufacture
• the way the system is operated

Probabilistic Design

Probability Distribution Function of Load and Resistance

Stress and Strength

Overlapping of stress and


strength distributions
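
The overlap of the two distributions is the probability that stress exceeds strength. A minimal sketch, assuming both are independent and normally distributed (an assumption made here, not stated on the slide), with hypothetical means and standard deviations:

```python
from statistics import NormalDist


def probability_of_failure(strength_mean, strength_sd, stress_mean, stress_sd):
    """P(stress > strength) for independent normal stress and strength.

    The safety margin (strength - stress) is then also normal, and failure
    is the event that the margin is below zero.
    """
    margin_mean = strength_mean - stress_mean
    margin_sd = (strength_sd ** 2 + stress_sd ** 2) ** 0.5
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


# Hypothetical values in consistent units (e.g. MPa).
p_fail = probability_of_failure(strength_mean=500.0, strength_sd=40.0,
                                stress_mean=350.0, stress_sd=30.0)
print(f"P(failure) = {p_fail:.5f}")
```

Widening either distribution, or moving the means closer together, increases the overlap and therefore the failure probability.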

Failure Rate and Mean Time To Failure


Example: Constant Failure Rate
• Set h(t) = λ, a constant failure rate. Integrate to find the reliability R(t)
• R(t) = exp (-λ t),
This is often used in reliability analysis of systems.
Mean Time To Failure (MTTF) - average time a device or system will
operate, without repair, before failure. From the Expected Value
Theorem:

• E(x) = ∫ x f(x) dx, and introducing an integration by parts, it follows
that the MTTF can be determined as:

• MTTF = ∫ t f(t) dt = ∫ R(t) dt, with both integrals taken from t = 0 to ∞


For the special case of a constant failure rate:
• MTTF = 1 / λ
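
The result MTTF = 1 / λ can be checked numerically by integrating R(t) = exp(-λ t). The sketch below uses a plain trapezoidal rule; the failure rate of 0.02 per year is the PZT sensor value from the repair data table later in these slides, and the integration limits and step count are arbitrary choices for the example.

```python
import math


def reliability(t, failure_rate):
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate, upper_limit, steps):
    """Approximate MTTF = integral of R(t) dt from 0 to upper_limit (trapezoidal rule)."""
    dt = upper_limit / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(upper_limit, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


LAMBDA = 0.02   # failures per year
estimate = mttf_numeric(LAMBDA, upper_limit=2500.0, steps=100_000)
print(f"numerical MTTF = {estimate:.3f} years, 1/lambda = {1 / LAMBDA:.3f} years")
```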

Availability

Availability Improvement
• Availability = MTTF / (MTTF+MTTR)

• It is expressed as a fixed ratio, NOT time dependent

• Availability can be improved in 2 ways:

– Extend failure free operating period (reliability)

– Reduce time to restore system (maintainability)

• Subsea time to repair must include: Detection, Location, Analysis of repair, Spares / repair kit, Qualification, Mobilisation, Deployment, Repair execution, Commissioning.

• Increased value in driving for Reliability rather than Maintainability to achieve Availability

Reliability & Repair Data

Reliability / Availability of Repairable Items

Assessment period (t): 30 years

Columns: MTTF in years; λ = failure rate (years^-1); Qty = quantity of items; Re = reliability over period = exp(-λt); 1-Re = unreliability over period; MTTR in days; μ = repair rate (years^-1); A = availability (proportion) = μ / (λ + μ); 1-A = unavailability (proportion).

Hydraulic System Elements

No.  Repairable item                                   MTTF    λ       Qty  Re       1-Re    MTTR  μ       A         1-A
1    Production Piping                                 10000   0.0001  1    0.99700  0.0030  100   3.650   0.999973  0.000027
2    Test / Vent Piping                                5000    0.0002  1    0.99402  0.0060  100   3.650   0.999945  0.000055
3    10 inch 10 kpsi gate valve, Isolation function    1000    0.0010  1    0.97045  0.0296  70    5.214   0.999808  0.000192
4    10 inch 10 kpsi gate valve, HIPPS function        250     0.0040  1    0.88692  0.1131  20    18.250  0.999781  0.000219
5    1/2" Test Valve                                   250     0.0040  1    0.88692  0.1131  20    18.250  0.999781  0.000219
6    1/2" Vent Valve                                   250     0.0040  1    0.88692  0.1131  20    18.250  0.999781  0.000219
7    PZT Sensor                                        50      0.0200  1    0.54881  0.4512  20    18.250  0.998905  0.001095
8    HIPPS Hydraulic Module                            210     0.0048  1    0.86688  0.1331  20    18.250  0.999739  0.000261
9    Check valve                                       500     0.0020  1    0.94176  0.0582  20    18.250  0.999890  0.000110
10   HIPPS SEM                                         42      0.0238  1    0.48954  0.5105  20    18.250  0.998697  0.001303
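
The derived columns of the table can be reproduced from MTTF and MTTR alone. The sketch below is an illustration under the table's own formulas (reliability = exp(-λt), availability = μ / (λ + μ)); the function name item_metrics and the 365-day year used to convert MTTR are assumptions of this example.

```python
import math

ASSESSMENT_PERIOD_YEARS = 30.0
DAYS_PER_YEAR = 365.0   # assumed conversion for MTTR in days


def item_metrics(mttf_years, mttr_days, period_years=ASSESSMENT_PERIOD_YEARS):
    """Return failure rate, reliability over the period, repair rate and availability."""
    failure_rate = 1.0 / mttf_years                 # lambda, per year
    repair_rate = DAYS_PER_YEAR / mttr_days         # mu, per year
    reliability = math.exp(-failure_rate * period_years)
    availability = repair_rate / (failure_rate + repair_rate)
    return failure_rate, reliability, repair_rate, availability


# Three rows from the table: (name, MTTF in years, MTTR in days)
for name, mttf, mttr in [("Production Piping", 10000, 100),
                         ("PZT Sensor", 50, 20),
                         ("HIPPS SEM", 42, 20)]:
    lam, rel, mu, avail = item_metrics(mttf, mttr)
    print(f"{name:18s} lambda={lam:.4f} Re={rel:.5f} mu={mu:.3f} A={avail:.6f}")
```

Note how the PZT sensor has only about a 55% chance of surviving 30 years without failure, yet its availability is still above 99.8% because each repair is short relative to the time between failures.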

Types of Redundancy

• Classified by how the redundant elements are introduced into the circuit
• Active or Static Redundancy
– External components are not required to perform the function of
detection, decision and switching when an element or path in the
structure fails.
• Standby or Dynamic Redundancy
– External elements are required to detect, make a decision and switch
to another element or path as a replacement for a failed element or
path.
• Generally subsea systems (e.g. umbilicals, the MCS) use active
redundancy – hot standby

• As an alternative to redundancy, consider Diversity
– using alternative arrangements of a different kind
– e.g. the Back-Up Intervention Control system (BUICS) available on
Snohvit, in case the umbilical fails

Simple Parallel Redundancy
Active - Type 1

In its simplest form, redundancy consists of a simple parallel combination of elements. If any element fails open, identical paths exist through parallel redundant elements.

Bimodal Parallel Redundancy
Active - Type 3

(a) Bimodal Parallel/Series Redundancy
(b) Bimodal Series/Parallel Redundancy

A series connection of parallel redundant elements provides protection against shorts and opens. Direct short across the network due to a single element shorting is prevented by a redundant element in series. An open across the network is prevented by the parallel element. Network (a) is useful when the primary element failure mode is open. Network (b) is useful when the primary element failure mode is short.

Series and Parallel Availability Calculations

SAP: Series - Availability - Product

Umbilical: Av 90.000%, UnAv 10.000%
Subsea:    Av 80.000%, UnAv 20.000%
Umbilical and Subsea in series: Availability = 90% x 80% = 72.000%, UnAvail 28.000%

PUP: Parallel - Unavailability - Product

SCM A: Re 90.000%, UnRe 10.000%, MTTF 4.5 years, MTTR 0.5 years
SCM B: Re 90.000%, UnRe 10.000%, MTTF 4.5 years, MTTR 0.5 years
SCM A OR SCM B: UnRe = 10% x 10% = 1.000%, Re 99.000%
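
The two rules (series: multiply availabilities; parallel: multiply unavailabilities) can be sketched as below. The function names are assumptions for this example, and both rules assume the elements fail independently.

```python
from math import prod


def series_availability(availabilities):
    """All elements must work: multiply the availabilities."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Any one element is enough: multiply the unavailabilities, then subtract from 1."""
    return 1.0 - prod(1.0 - a for a in availabilities)


umbilical, subsea = 0.90, 0.80
print(f"series:   {series_availability([umbilical, subsea]):.3%}")

scm_a = scm_b = 4.5 / (4.5 + 0.5)   # MTTF / (MTTF + MTTR), in years
print(f"parallel: {parallel_availability([scm_a, scm_b]):.3%}")
```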

Maintainability

Maintainability

• Philosophy - preventative, corrective, opportunistic


• Actions to demonstrate function is in good condition
– In service monitoring, testing and footprinting
– Corrosion monitoring
– Noise / vibration monitoring
– Fluid monitoring, sand detection, SRBs, chlorides, scale
• Repair planning and contingencies, pipeline repair systems,
spares stock holding, stand-by or call-off intervention contracts,
alternative temporary systems
• Access systems and tooling
• All aim to reduce MTTR
• Reliability Centred Maintenance
• Historic records, Trends, Predictive capability & feed back loops

Maintenance Philosophy

• Subsea: Excess Capacity (typical)
• Subsea: High Redundancy (typical)
– spare wells
– valves
– spare control systems
• Mobilise maintenance when…?

Maintaining the Gorgon Field

Deliverability

Deliverability

• Deliverability = Availability * Capacity


• Useful terms
– DCQ, Daily Contract Quantity
– Shortfall, Quantity not supplied
• Security of supply
• Contract shape and style
• Business Risk and Exposure
• Best Programs focus on the issue
• Used to understand, Quantify risk & contract accordingly
• Shapes contract terms DCQ to rolling 24 hour average quantity
• “It’s about the money, stupid”

Deliverability
• How to get high deliverability
– System analysis & engineering
– Understanding frequency & duration of failures
– Standard sizes and component rating at no extra cost
– De-bottlenecking & tuning capacity of system
– Line pack and storage
– Ability of downstream to respond to peak turn-up rates
– Capacity and ullage as pressure drops due to well failure
– Temporary increase of flow velocity / erosion limits wrt life
– N out of M philosophy and sparing insurance
• Operability studies & modelling
• Supply chain models based on “Just In Time” logistics
• Define value of Re Av De in relationship to project
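
The "N out of M" philosophy mentioned above can be quantified with a binomial sum: the system delivers if at least N of M identical units are available. This sketch assumes the units are identical and independent; the well count and per-well availability are hypothetical.

```python
from math import comb


def n_out_of_m_availability(n_required, m_installed, unit_availability):
    """Probability that at least n_required of m_installed independent units are available."""
    a = unit_availability
    return sum(comb(m_installed, k) * a ** k * (1.0 - a) ** (m_installed - k)
               for k in range(n_required, m_installed + 1))


# Hypothetical example: 4 wells needed to meet contract quantity, each 90% available.
no_spare = n_out_of_m_availability(4, 4, 0.9)
one_spare = n_out_of_m_availability(4, 5, 0.9)
print(f"4 out of 4: {no_spare:.4f}")
print(f"4 out of 5: {one_spare:.4f}")
```

Comparing the two figures gives a first estimate of what a spare well is worth as "sparing insurance".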

Safety Integrity Levels

What is a Safety Integrity Level?


Safety Integrity Level is the required “reliability” of a safety function

Low demand mode of operation
(Average probability of failure to perform its design function on demand)

Safety Integrity Level   Range
4                        ≥ 10^-5 to < 10^-4
3                        ≥ 10^-4 to < 10^-3
2                        ≥ 10^-3 to < 10^-2
1                        ≥ 10^-2 to < 10^-1

High demand or continuous mode of operation
(Probability of a dangerous failure per annum)

Safety Integrity Level   Range
4                        ≥ 10^-5 to < 10^-4
3                        ≥ 10^-4 to < 10^-3
2                        ≥ 10^-3 to < 10^-2
1                        ≥ 10^-2 to < 10^-1
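
The low demand table can be read as a band lookup from average PFD to SIL. A minimal sketch, with the function name sil_for_pfd chosen for this example and the band limits taken from the table:

```python
# (SIL, lower bound inclusive, upper bound exclusive) for low demand mode.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd(pfd_avg):
    """Return the SIL whose band contains pfd_avg, or None if it falls outside all bands."""
    for sil, lower, upper in SIL_BANDS:
        if lower <= pfd_avg < upper:
            return sil
    return None


for pfd in (1.1e-2, 5.5e-3, 1.2e-4):
    print(f"PFDavg = {pfd:.1e} -> SIL {sil_for_pfd(pfd)}")
```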

PFD

• Risk reduction requiring a SIL 4 function should not be implemented. Rather, this
should prompt a redistribution of required risk reduction across other measures.

Classic HIPPS Configuration

SIL 3 HIPPS example

Risk Reduction

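[Figure: risk scale, with risk increasing from residual risk through tolerable risk to initial risk]

• Initial risk: high pressure getting past the tree production choke (Pressure Regulating System), 10^0 (once per annum)
• Tolerable risk: 10^-5 pa (Acceptable failure rate per DNV)
• Residual risk: 1.87 x 10^-6 pa
• Necessary risk reduction: from initial risk down to tolerable risk
• Actual risk reduction: from initial risk down to residual risk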

Risk Reduction

Layers of Protection

Pressure Protection System for Pipeline

[Figure: risk scale, with risk increasing from residual risk through tolerable risk to initial risk]

• Initial risk: hydrate blockage, and overpressuring the pipeline, 10^0 (once per annum)
• Tolerable risk: 10^-5 (Acceptable failure rate per DNV)
• Necessary risk reduction: from initial risk down to tolerable risk
• Actual risk reduction: from initial risk down to residual risk

The actual risk reduction is made up of:
• Partial risk covered by other systems, e.g. manual shutdown, Pipeline Simulator etc.
• Risk reduction by Pressure Safety System, SIL 3
• Risk reduction by Pressure Regulating System, SIL 2

Risk reduction achieved by all safety-related systems and external risk reduction facilities
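
The layered risk reduction can be sketched numerically. The initial frequency (once per annum) and tolerable frequency (10^-5 per annum) come from the slide; the PFD assigned to each layer is a hypothetical value inside its SIL band, and the layers are assumed to be independent.

```python
import math

INITIAL_FREQUENCY = 1.0      # events per annum (10^0)
TOLERABLE_FREQUENCY = 1e-5   # events per annum


def required_risk_reduction_factor(initial, tolerable):
    """Factor by which the initial event frequency must be reduced."""
    return initial / tolerable


def residual_frequency(initial, layer_pfds):
    """Initial frequency multiplied by the PFD of each independent protection layer."""
    return initial * math.prod(layer_pfds)


# Hypothetical PFD values chosen inside the SIL 3 and SIL 2 bands.
layers = {
    "Pressure Safety System (SIL 3)": 5e-4,
    "Pressure Regulating System (SIL 2)": 5e-3,
}

rrf = required_risk_reduction_factor(INITIAL_FREQUENCY, TOLERABLE_FREQUENCY)
residual = residual_frequency(INITIAL_FREQUENCY, layers.values())
print(f"required risk reduction factor: {rrf:.0f}")
print(f"residual frequency: {residual:.2e} per annum")
print("tolerable" if residual <= TOLERABLE_FREQUENCY else "not tolerable")
```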

Equipment Failure Rates

Equipment PFDs

PFD as a function of Test Interval

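PFDavg = ½ λ τi

where λ is the failure rate and τi is the test interval.

[Figure: Probability of Failure on Demand against time, showing the test interval τi, the average value PFDavg and the Test Independent Failure (TIF) level]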

PFD for a simple system

Proof Test = 1 yr

For the Pressure Transmitter,
PFD_SE = 0.44 x 10^-3

For the logic solving element,
PFD_LS = 7.0 x 10^-3

For the final element,
PFD_FE = 3.5 x 10^-3

Therefore, for the safety function,
PFD_AVG = 0.44 x 10^-3 + 7.0 x 10^-3 + 3.5 x 10^-3 = 1.1 x 10^-2
≡ Safety Integrity Level 1

Change proof test interval to 6 months

PFD_SE = 0.22 x 10^-3
PFD_LS = 3.5 x 10^-3
PFD_FE = 1.75 x 10^-3
PFD_AVG = 5.5 x 10^-3
≡ Safety Integrity Level 2
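
The effect of the proof test interval follows directly from PFDavg = ½ λ τ: halving the interval halves each element's PFD. The sketch below back-calculates an implied failure rate from each annual-test PFD on the slide and recomputes the six-month case; the function and variable names are assumptions for this example.

```python
# Element PFDs for a 1 year proof test interval, from the slide.
PFD_ANNUAL = {
    "pressure transmitter": 0.44e-3,
    "logic solver": 7.0e-3,
    "final element": 3.5e-3,
}


def pfd_avg(failure_rate_per_year, test_interval_years):
    """PFDavg = 1/2 * lambda * tau."""
    return 0.5 * failure_rate_per_year * test_interval_years


def implied_failure_rate(pfd, test_interval_years):
    """Invert PFDavg = 1/2 * lambda * tau to recover lambda."""
    return 2.0 * pfd / test_interval_years


rates = {name: implied_failure_rate(pfd, 1.0) for name, pfd in PFD_ANNUAL.items()}
pfd_six_month = {name: pfd_avg(rate, 0.5) for name, rate in rates.items()}

total_annual = sum(PFD_ANNUAL.values())
total_six_month = sum(pfd_six_month.values())
print(f"annual test:    PFDavg = {total_annual:.2e}")
print(f"six-month test: PFDavg = {total_six_month:.2e}")
```

The unrounded totals are 1.094 x 10^-2 and 5.47 x 10^-3, which the slide rounds to 1.1 x 10^-2 and 5.5 x 10^-3.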

Layered Protection System
[Figure: layered protection system. Labels: Subsea Control Module, Subsea Electronics Module, Dump Valve, Gas Plant DCS, PPS card]

Single layer
PFD_AVG = 1.1 x 10^-2 ≡ Safety Integrity Level 1 (annual testing)

Dual layers
PFD_AVG = (1.1 x 10^-2) x (1.1 x 10^-2) = 1.2 x 10^-4
(assuming no common mode failure)
≡ “Safety Integrity Level 3”
(annual testing)
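
The dual layer figure depends heavily on the "no common mode failure" assumption. The sketch below first reproduces the simple product, then applies a rough beta-factor adjustment in which a fraction beta of each layer's failures is shared by both layers. The beta-factor form and the value beta = 0.1 are not from the slides; they are a simplified illustration of why common mode failures matter.

```python
def dual_layer_pfd(pfd_a, pfd_b):
    """Two fully independent layers: both must fail on demand."""
    return pfd_a * pfd_b


def dual_layer_pfd_with_common_cause(pfd, beta):
    """Rough beta-factor sketch for two identical layers.

    The independent part of each layer's PFD is (1 - beta) * pfd, and the
    common part beta * pfd defeats both layers at once. This is an
    illustrative simplification, not a standard-compliant calculation.
    """
    independent = ((1.0 - beta) * pfd) ** 2
    common = beta * pfd
    return independent + common


single = 1.1e-2
print(f"independent layers:   {dual_layer_pfd(single, single):.2e}")
print(f"with beta = 0.1:      {dual_layer_pfd_with_common_cause(single, 0.1):.2e}")
```

With the hypothetical beta of 0.1, the combined PFD falls back out of the SIL 3 band, which is why the slide puts “Safety Integrity Level 3” in quotation marks.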

Conclusion

The cost of failure - BP experience

These are the direct costs only. Foinaven also incurred:

• FPSO demurrage charges
• NPV of production (20% * 80,000 bbl/d * 300 days * 25 USD/bbl) = 120 MUSD
• Share value erosion and significantly lower dividends for period
• Loss of public / shareholder confidence in BP’s ability to manage
technology
• Reputation damage
• Tangible losses > 250 MUSD, measurable losses at least the same again
• Changed BP contracting philosophy, EPC to EPCM Managed Engineering
• Schiehallion SCMs were run at single high pressure but DCV pilots were
not requalified and subsequently overstressed and leaked.

The BP Bathtub Curve

Value of Performance
An interesting echo from the 1970’s

or SAFETY
