
ASM: Abnormal Situation Management
Defining the way things will be.
The birth of ASM...

• ASM grew from an initial focus on alarm management. Most sites are
aware that operator overload and alarm floods are common during
abnormal operations. As we analyzed the issues around alarm
management, we discovered that operator problems with the alarm
system were only a symptom of a more general issue:
– the design, implementation, and maintenance of many facilities,
systems, and practices.
ASM Consortium
• Charter:
– Research the causes of abnormal situations and create technologies to address this problem
• Deliverables:
– Technology, best practices, application knowledge, prototypes, metrics
• History:
– Started in 1994
– Co-funded by US Govt (NIST)
– Budget: +$16M USD
• Current Status:
– Committed through 2002
– Honeywell leadership
– Expanding membership
• Current Membership: [member logos not reproduced; recoverable text: "BRAD ADAMS WALKER ARCHITECTURE, P.C." and "University Affiliates"]
Requirements for Safe Operation
• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management, most companies fail to meet these
basic requirements for safe operation.
Various cost elements
[Figure: plant performance/efficiency scale showing the cost elements. Labels, from top to bottom: Theoretical Limit (theoretically possible; currently unsustainable; future upgrades, e.g., Advanced Control); Current Limit; Comfort Margin (lost opportunity, the cost of comfort); Operating Target; Profit; Lost Profit; Incident; Break-even; Lost Revenue; Loss; Fixed Costs; Shut down (Idle Plant); Additional unplanned costs (accident, equipment damage, etc.).]
[Callouts: "Savings from reducing the comfort margin"; "Losses due to incidents, accidents"; "(about 10% of operating costs)".]
A Look At Plant Operations
A typical production profile for an asset-intensive facility for a calendar year.
[Figure: histogram of Days per Year against Daily Production (x-axis from < 60% to 95% and 100%). Bar labels as extracted: 95, 79, 62, 47, 23, 30, 16, 8 and 5 days (365 days in total). Marker: Production Target set by Enterprise.]
Factors Affecting Plant Operations
[Figure: Days per Year vs. Daily Production profile (< 60% to 95% and 100%), annotated with the factors listed below.]
• Plant Operating Target
• Planning Constraints
• Operational Constraints
• Plant Availability
• Plant Incidents
• Production Effectiveness
• Asset Utilization
• Plant Capacity Limit
• Agility/Flexibility
Real Life Examples
[Figure: three histograms of Frequency (# Days) against Feed Rate, Production Rate and Total Feed Rate. Chart annotations: 3.2%, $33.5 M; 5.8%, $24.2M; $38.5 M.]
• This plant had $24.2M in lost capacity due to asset availability & incidents!
• This plant had 5.8% in lost capacity!
• This plant lost $38.5M!
• And this plant lost $33.5M!
Site Studies have identified Plant Lost Opportunity
Between 3-15% in lost capacity is attributed to asset unavailability and incidents.
[Figure: Days per Year vs. Daily Production profile (< 60% to 95% and 100%). Profile labels: Plant Operating Target; Planning Constraints; Operational Constraints; Plant Availability; Plant Incidents; Plant Capacity Limit.]
[Callouts as extracted: Production Management; NEW EMPHASIS!!; DCS/APC/Optimization efforts; Asset Management; Reliability & CMMS; Manufacturing Execution; Scheduling & ERP.]
Major Profit Potential
Emphasis on plant & equipment reliability improvements and reduced
incidents can result in a recovery of 3-15% of lost capacity!
[Figure: Days per Year vs. Daily Production profile (< 60% to 95% and 100%) with the Plant Capacity Limit marked and three changes: Higher Plant Operating Target, Fewer Planning Constraints, Fewer Operational Constraints.]
The Importance of Alarm Management Improvement Project
Alarm management is the proper design, implementation, operation,
and maintenance of industrial manufacturing plant alarm systems.
• Current alarming practices are leading to incidents
• Major problems are:
– Alarm floods
– Standing alarms
– Poor configuration of alarms
– Nuisance alarms
• Technology exists to significantly contribute to effective alarm
systems and provide good situation awareness
Alarms identified as a contributing factor: A Case

The lightning struck just before 9:00 AM on a Sunday. It immediately
started a fire in the crude distillation unit of the refinery. The control
operators on duty responded by calling out the fire brigade, and then
had to divert their attention to a growing number of alarms while
desperately trying to bring the crude unit to a safe emergency
shutdown.

Hydrocarbon flow was lost to the deethanizer in the FCCU recovery
section, which fed the debutanizer further along. The system was
arranged to prevent total loss of liquid level in the two vessels, so the
falling level in the deethanizer caused the deethanizer discharge valve
to close. This, in turn, caused the level in the debutanizer to drop
rapidly and its discharge valve also closed. Heat remained on the
debutanizer and the trapped liquid vaporized as the pressure rose,
causing the pressure relief valve to “pop” (for the first of three times)
into the flare KO drum and then immediately onto the flare itself.

In a matter of minutes, the board operator was able to restore flow to
the deethanizer. This permitted the deethanizer discharge valve to
be opened, allowing renewed flow forward to the debutanizer. The
rising level in the debutanizer should have caused the debutanizer
discharge valve to open (by the level controller action) and allow
flow on to the naphtha splitter. Although the operators in the
control room received a signal indicating the valve had opened, the
debutanizer nonetheless was filling rapidly with liquid while the
naphtha splitter was emptying. The operators were concentrating
on the displays which focussed on the problems with the
deethanizer and debutanizer, and had no overview of the process
available to indicate that even though the debutanizer discharge
valve registered as open, there was no flow going from the
debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became liquid-
logged about an hour later and the pressure relief valve lifted for the
second time, venting to the flare via the flare KO drum. Because there
were enormous volumes of gas venting, the level of liquid in the flare
KO drum was rising to a very high value.

About 2-1/2 hours later, the debutanizer vented to the flare a third time AND
CONTINUED VENTING FOR 36 MINUTES. The high level alarm for
the flare drum was activated at this time. But with alarms going off every
2 to 3 seconds, there appears to be no evidence that that alarm was ever
seen. By this time, the flare KO drum had filled with liquid well beyond
its design capacity. The fast-flowing gas through the overfilled drum
forced liquid out of the drum’s discharge pipe. The discharge line was not
designed for liquid, so the force of the liquid caused a rupture at an elbow.
This released over 20 tons of highly flammable hydrocarbon.

The ensuing release quickly formed an ominous
drifting cloud of vapor and droplets. In a matter
of minutes, this cloud found its ignition source
350 feet downwind. The resulting explosion was
heard 80 miles away. In the town nearest the
plant, few windows still held intact panes, so
overpowering was the pressure shock wave from
the blast. The last fires in the refinery were
eventually extinguished 2 days later.
[Figure: accident causation model spanning the organization and the individual. Column headings: Management (Source Failure Types; Functional Failure Types), Workplace (Condition Tokens; Precursors), Unsafe Acts (Errors & Violations). The diagram marks the interface between the organization & the individual.]
Labels recovered from the figure (grouping is approximate):
• Stylistic or cultural indicators, top down: Commitment, Competence, Cognizance
• General Failure Types: Accidents, Incidents, Near-Misses; 1-10 hit list; data collected & analyzed
• Workplace conditions: poor workplace design, high workload, unsociable hours, inadequate training, poor perception of hazards
• Unsafe acts and the individual: near miss auditing, Du Pont Training, workspace, motivation, attitude, alarms, human factors, group factors, working practice
• Measures: Proactive Design, SI Projects, Safety Information System, control room design, Best Practices, diagnostic and remedial measures
Managing Abnormal Situations
Anatomy of a Disaster from Operations Perspective
[Figure: layered diagram with five columns. Each column is listed below from the bottom layer (normal) to the top layer (disaster), in the order recovered from the slide.]
• Plant States: Normal, Abnormal, Out of Control, Accident, Disaster
• Operational Modes: Normal, Abnormal, Emergency
• Critical Systems: Plant Management Systems; Process Equipment, DCS, Automatic Controls; Decision Support System; DCS Alarm System; Safety Shutdown, Protective Systems, Hardwired Emergency Alarms; Physical and Mechanical Containment System; Site Emergency Response System; Area Emergency Response System
• Operational Goals: Keep Normal, Return to Normal, Bring to Safe State, Minimize Impact
• Plant Activities: Preventative Monitoring & Testing; Manual Control & Troubleshooting; Evacuation; Rescue; First Aid; Firefighting
Summarized Production Data
[Figure: the three histograms of Frequency (# Days) against Feed Rate, Production Rate and Total Feed Rate, shown together with a Days per Year vs. Daily Production axis (< 60% to 95% and 100%). Chart annotations: 3.2%, $33.5 M; 5.8%, $24.2M; $38.5 M.]
Unexpected Upsets Cost 3-8% of Capacity
[Figure: Days per Year vs. Daily Production profile with labels: Plant Operating Target, Planning Constraints, Operational Constraints, Optimization efforts, Plant Capacity Limit.]
~ $10 Billion annually in lost production!
Major Profit Potential
Focused efforts can result in recovery of 3-8% of capacity.
[Figure: Days per Year vs. Daily Production profile (< 60% to 95% and 100%) with the Plant Capacity Limit marked and three changes: Higher Plant Operating Target, Fewer Planning Constraints, Fewer Operational Constraints.]
~ $10 Billion potential to the bottom line!


Timing diagram of DIN V 19251 as applicable for a single channel SRS
with ultimate self tests executed within the PST
[Figure: timeline along t with three events in sequence: (1) failure occurrence in the process or in the safeguarding system; (2) failure is detected; (3) safe status of the process assured.]
• System internal diagnostic time (from failure occurrence to detection)
• Time for corrective action
• Time for reaction of the process on the corrective action
• Fault Tolerance Time: the fault tolerance time of the process, or Process Safety Time (PST), within which the three intervals above must fit
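The timing relationship in the diagram can be sketched as a simple time budget check. This is a minimal illustration and not part of DIN V 19251; the class name, field names and the example values are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeguardTiming:
    """Time budget of one safeguarding loop; all values in seconds (hypothetical names)."""

    diagnostic_time: float  # system internal diagnostic time (failure to detection)
    corrective_time: float  # time for the corrective action
    reaction_time: float    # time for the process to react to the corrective action

    def total(self) -> float:
        # Time from failure occurrence until the safe status is assured.
        return self.diagnostic_time + self.corrective_time + self.reaction_time

    def margin(self, process_safety_time: float) -> float:
        # Positive margin: the safe state is reached before the PST expires.
        return process_safety_time - self.total()

    def fits_within(self, process_safety_time: float) -> bool:
        return self.margin(process_safety_time) >= 0.0


# Illustrative numbers only, not taken from the presentation.
timing = SafeguardTiming(diagnostic_time=2.0, corrective_time=1.5, reaction_time=4.0)
print(timing.total())            # 7.5
print(timing.fits_within(10.0))  # True
```

A negative margin would indicate that diagnosis, corrective action and process reaction together take longer than the process can tolerate.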


Reliability Requirements for Alarms
• Claimed PFDavg: 1 – 0.1
• Alarm system integrity/reliability requirements: Alarms may be integrated into the process control system
• Human reliability requirements: No special requirements; however, the alarm system should be operated, engineered and maintained to the good engineering standards identified in the EEMUA Guide
EEMUA Alarm Systems Guide page 17


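CONCEPT 1: RISK REDUCTION
[Figure: scale of increasing risk marking three points: the actual remaining risk, the risk to meet the required level of safety, and the EUC risk.]
• Necessary minimum risk reduction [R]: from the EUC risk down to the risk that meets the required level of safety
• Actual risk reduction: from the EUC risk down to the actual remaining risk
• Risk reduction achieved by all SRSs & External Risk Reduction Facilities:
– Partial risk covered by E/E/PES SRSs
– Partial risk covered by Other Technology SRSs
– Partial risk covered by External Risk Reduction Facilities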
SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES

SAFETY INTEGRITY LEVEL (SIL) | DEMAND MODE OF OPERATION (average probability of failure to perform its design function on demand) | CONTINUOUS/HIGH DEMAND MODE OF OPERATION (average probability of a dangerous failure per year)
4 | 10^-5 to < 10^-4 | 10^-5 to < 10^-4
3 | 10^-4 to < 10^-3 | 10^-4 to < 10^-3
2 | 10^-3 to < 10^-2 | 10^-3 to < 10^-2
1 | 10^-2 to < 10^-1 | 10^-2 to < 10^-1
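The bands in Table 2 can be read as a lookup from a target failure measure to a SIL. The sketch below assumes the lower bound of each band is inclusive and the upper bound exclusive (the table only states "to <"); the function name and constant are hypothetical.

```python
from typing import Optional

# (SIL, lower bound, upper bound) as listed in Table 2.
# Assumption: lower bound inclusive, upper bound exclusive.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_target(failure_measure: float) -> Optional[int]:
    """Return the SIL whose band contains the target failure measure, or None."""
    for sil, low, high in SIL_BANDS:
        if low <= failure_measure < high:
            return sil
    return None  # outside the range covered by the table


print(sil_for_target(5e-3))  # 2
print(sil_for_target(0.5))   # None
```

Values outside the table (for example 0.5) return `None` rather than a SIL, since the table assigns no level to them.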
Reliability requirements for alarms
• Claimed PFDavg: 0.1 – 0.01
• Alarm system integrity/reliability requirements:
– Alarm system should be designated as safety related & categorized as SIL 1
– Alarm system should be independent from the process control system
• Human reliability requirements:
– The operator should be trained in the management of the specific plant failure that the alarm indicates
– The alarm presentation arrangements should make the claimed alarm very obvious to the operator and distinguishable from other alarms
– The alarm should remain on view to the operator for the whole of the time it is active
EEMUA Alarm Systems Guide page 17
Reliability requirements for alarms
• Claimed PFDavg: below 0.01
• Alarm system integrity/reliability requirements: Alarm system would have to be designated as safety related and categorized as at least SIL 2
• Human reliability requirements: It is not recommended that claims for a PFDavg below 0.01 are made for any operator action, even if it is multiple alarmed and very simple
• For all credible accident scenarios, the designer should demonstrate that the total number of safety related alarms and their maximum rate of presentation does not overload the operator
EEMUA Alarm Systems Guide page 17
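The three claimed-PFDavg bands from the reliability tables can be sketched as one lookup. This is an illustration only: the function name and return strings are hypothetical, and assigning the boundary values 0.1 and 0.01 to the less demanding band is an assumption, since the tables do not say which band a boundary value falls into.

```python
def alarm_claim_requirements(claimed_pfd_avg: float) -> str:
    """Summarize the alarm system requirement for a claimed PFDavg (sketch)."""
    if not 0.0 < claimed_pfd_avg <= 1.0:
        raise ValueError("PFDavg must lie in the interval (0, 1]")
    if claimed_pfd_avg >= 0.1:
        # Band 1 - 0.1: no special requirements.
        return "standard: may be integrated into the process control system"
    if claimed_pfd_avg >= 0.01:
        # Band 0.1 - 0.01 (boundary handling is an assumption).
        return "safety related, SIL 1: independent from the process control system"
    # Below 0.01.
    return "safety related, at least SIL 2: claim not recommended for operator action"


print(alarm_claim_requirements(0.05))
```

The operator training, presentation and overload requirements in the tables are not captured here; the sketch only covers the classification of the alarm system itself.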


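The Setting of a High Pre-trip Alarm
[Figure: the alarmed variable rising at its maximum rate of change during a fault. Levels marked, from lowest to highest: limit of largest normal operational fluctuation; alarm setting (point A); limit at which protection operates (point B). The band above the normal fluctuation limit is labeled Abnormal Operating Region. The time between A and B is the time for the operator to respond to the alarm and correct the fault.]
EEMUA Alarm Systems Guide page 17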
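The relationship in the pre-trip alarm figure can be worked through numerically. The sketch below assumes the alarmed variable rises at a constant maximum rate during the fault, so the highest usable alarm setting is the protection limit minus that rate multiplied by the operator response time. The function names and example numbers are hypothetical.

```python
from typing import Tuple


def highest_alarm_setting(trip_limit: float,
                          max_rate_of_change: float,
                          operator_response_time: float) -> float:
    """Highest alarm setting that still leaves the operator enough time to act."""
    return trip_limit - max_rate_of_change * operator_response_time


def check_pre_trip_alarm(trip_limit: float,
                         normal_fluctuation_limit: float,
                         max_rate_of_change: float,
                         operator_response_time: float) -> Tuple[float, bool]:
    """Return the setting and whether it clears the largest normal fluctuation."""
    setting = highest_alarm_setting(trip_limit, max_rate_of_change,
                                    operator_response_time)
    return setting, setting >= normal_fluctuation_limit


# Illustrative values: trip at 100 units, variable rising at 0.5 units/s,
# operator needs 60 s to respond and correct the fault.
print(check_pre_trip_alarm(100.0, 65.0, 0.5, 60.0))  # (70.0, True)
print(check_pre_trip_alarm(100.0, 80.0, 0.5, 60.0))  # (70.0, False)
```

When the second value is `False`, the required setting falls inside the range of normal fluctuations, so there is no setting that avoids both spurious alarms and insufficient response time.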
[Figure: Gas Concentration (percentage of LEL, axis 0 to 120) against time after onset of fault (seconds, axis 0 to 80). Curves: Actual Gas Concentration and Measured Gas Concentration. Levels marked: Explosion; Lower Explosive Limit (LEL); actual trip point; set trip point; normal operating level; gas concentration prior to fault; level error. Phases along the time axis: Fault Occurs, Sampling Delay, Sensor Delay, Error Delay, Shut Down System Delay.]
Redesign Choices
• Redesign - redesigning the plant or its controls to provide greater margin between the
normal operating limits & the trip limits. This is the most desirable solution but
is often impractical or too expensive;
• Setting within normal operating limits - setting the alarm within the limits of
normal operating fluctuations & accepting that spurious alarms will occur
during large normal disturbances. This is ergonomically very undesirable and
will tend to increase alarm rates and reduce the operator's confidence in the alarm
system. In effect it increases the Average Probability of Failure on Demand
(PFDavg) for the alarm system as a whole;
• Setting nearer trip limits - setting the alarm closer to the trip limits and
accepting that some fast transients will not be corrected by the operator before
they reach the trip level. This will increase the production losses due to plant
trips and, because there are more demands on the protection system, tend to make
the plant less safe. It also implies an increased PFDavg for the alarm system.

EEMUA Alarm Systems Guide page 17


Different Kinds of Events
[Figure: Potential Impact of Initiating Event against Time, with three event profiles: Abrupt/Catastrophic, Manageable, Insidious.]
Impact of DCS Alarm System
Awareness of Disturbances
With typical alarm systems, orienting begins after an event creates an
abnormal plant state. The extent of the problem can impact the
operator’s ability to be fully aware of the locations of process
disturbances.
As disturbances propagate, the number of conditions to be aware of
increases, as do the response requirements and the likelihood of
missing important information.
[Figure: Potential Impact of Initiating Event against Time, rising toward Incident. Markers: failure occurrence in the process or in the safeguarding system; failure is detected; point of operator awareness; safe status of the process assured. Correct intervention causes return to normal.]


Impact of DCS Alarm System
Management of Problems
• Standing alarms interfere with orientation
• Alarm floods delay evaluation
• Inadequate filtering interferes with action
[Figure: Potential Impact of Initiating Event against Time, rising toward Incident, with the point of operator awareness marked. Correct intervention causes return to normal.]


Impact of Good Alarm Management in Situation Awareness
• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events
[Figure: Potential Impact of Initiating Event against Time, showing the average shift in awareness with decision support.]


Impact of Protection System
[Figure: Impact of Initiating Event against Time, divided into SAFE and UN-SAFE regions. Impact levels: Profit, Quality, Loss, Incident. Protection layers marked: High Alarm, Emergency High Alarm, Trip from SIS. Time spans marked: operator diagnostic time, FTT, Process Safety Time.]
FTT = Fault Tolerance Time
[Figure: Potential Impact of Initiating Event against Time for four operator response outcomes: No response, Incorrect, Suboptimal, Best.]
Impact of Decision Support System
Support for Optimal Response
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
[Figure: Potential Impact of Initiating Event against Time.]
ASM Alarm Management Solutions
• Education for Management, Engineers, Technicians and Operators.
• Alarm Performance Assessment.
• Requirement for alarm optimization tools.
• Alignment with Company & EEMUA Guidelines.
• Alarm Rationalization.
• User Interface Design.
• Decision Support Activities
Alarm Management Optimization
Objectives
• Enhance operator effectiveness
– Avoid alarm floods
– Identify root causes
– Eliminate nuisance alarms
• Enhance profitability
– Reduce variability
– Maximize plant up time
– Prevent damage to equipment
• Reduce risk of:
– Injury to personnel
– Environmental incidents
Alarm Management Optimization
The Process
[Figure: cycle diagram built around a central element, "Develop Plant Alarm Management Standards & Philosophy". Steps around the cycle, as recovered from the slide:]
• Collect Data
• Analyze
• Identify Enhancements
• Verify Against Standards
• Implement Enhancements
• Change Management
Alarm Management Optimization
• Increase the effectiveness of the existing alarm system through proven methodology
– Analyze existing system performance
– Assist in developing an alarm strategy and educating operations staff
– Rationalize existing alarm system
• Recommend and apply new alarm management software
– UserAlert
– Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
[Figure: alarm distribution before and after. Before: 30 points account for ~85% of all alarms (chart annotation: 100 K). After: 30 points account for ~52% of all alarms (chart annotation: 2 K).]
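The "30 points account for ~85% of all alarms" style of metric can be computed from an alarm journal by counting alarms per point. The following is a minimal sketch and not the method used by the tools named above; the function name, tag names and sample data are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def top_n_share(alarm_tags: Iterable[str], n: int = 30) -> float:
    """Fraction of all alarms raised by the n most frequently alarming points."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical journal: one entry per alarm occurrence, keyed by point tag.
events = ["FI101"] * 6 + ["LI202"] * 3 + ["PI303"] * 1
print(top_n_share(events, n=2))  # 0.9
```

Tracking this share before and after rationalization gives a simple measure of how strongly a few bad actors dominate the alarm load.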
Optimization Suite…
Alarm Rationalization
• Alarm priority (class) is based on severity and level of impact
and on time to respond
• Available priority options in TPS:
– No Action
– Journal
– Print
– Print & Journal
– Low
– High
– Emergency
Optimization Suite…
Alarm Rationalization
• Recommends alarm priorities based on plant
philosophy
– Severity of impact
– Time to respond
– Trip Point
• Electronically captures plant alarm
management philosophy
– Time to respond rules definition
– Impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
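A priority recommendation driven by severity of impact and time to respond can be sketched as a small rule table. Everything below is hypothetical: the severity scale, the time thresholds and the scoring are assumptions for illustration and do not describe the rules of the Optimization Suite. Only the priority names are taken from the list of TPS options above.

```python
from enum import IntEnum
from typing import Optional

# Priority names as listed for TPS in the slides.
TPS_PRIORITIES = ("No Action", "Journal", "Print", "Print & Journal",
                  "Low", "High", "Emergency")


class Severity(IntEnum):
    """Hypothetical three-step severity-of-impact scale."""
    MINOR = 1
    MAJOR = 2
    SEVERE = 3


def urgency(minutes_to_respond: float) -> int:
    """Hypothetical time-to-respond rule: less available time means more urgent."""
    if minutes_to_respond < 3:
        return 3
    if minutes_to_respond < 15:
        return 2
    return 1


def recommend_priority(severity: Severity,
                       minutes_to_respond: float,
                       override: Optional[str] = None) -> str:
    """Recommend a priority; a manual override, if given, takes precedence."""
    if override is not None:
        if override not in TPS_PRIORITIES:
            raise ValueError(f"unknown priority: {override}")
        return override
    score = int(severity) + urgency(minutes_to_respond)
    if score >= 5:
        return "Emergency"
    if score >= 4:
        return "High"
    return "Low"


print(recommend_priority(Severity.SEVERE, 2))   # Emergency
print(recommend_priority(Severity.MAJOR, 10))   # High
print(recommend_priority(Severity.MINOR, 30))   # Low
```

Keeping the rules in one place mirrors the idea of capturing the plant alarm philosophy electronically, so that every alarm is prioritized by the same criteria and overrides stay explicit.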
