November 12 2003 - Presentation On Alarm Management

ASM: Abnormal Situation Management
Defining the way things will be.
The birth of ASM...
• ASM grew from an initial focus on alarm
management. Most sites are aware that operator
overload and alarm floods are common during
abnormal operations. As we analyzed the issues
around alarm management, we discovered that
operator problems with the alarm system were
only a symptom of a general issue:
– the design, implementation, and maintenance
of many facilities, systems, and practices.
ASM Consortium
• Charter:
– Research the causes of
abnormal situations and
create technologies to
address this problem
• Deliverables:
– Technology, best practices,
application knowledge,
prototypes, metrics
• History:
– Started in 1994
– Co-funded by US Govt
(NIST)
– Budget: +$16M USD
• Current Status:
– Committed through 2002
– Honeywell leadership
– Expanding membership
[Figure: logos for Current Membership and University Affiliates, including Brad Adams Walker Architecture, P.C.]
Requirements for Safe Operation
• Hazards must be recognized and
understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain
plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management, most companies fail to
meet these basic requirements for safe operation.
Various cost elements
[Figure: plant performance (efficiency) scale. Levels, from top: Theoretical Limit (theoretically possible; currently unsustainable; future upgrades, e.g., Advanced Control), Current Limit, Operating Target, Break-even, Shut down (Idle Plant), Incident, Accident. Bands: Comfort Margin (lost opportunity, the cost of comfort), Profit, Lost Profit, Lost Revenue, Loss, Fixed Costs, additional unplanned costs (equipment damage, etc.). Callouts: savings from reducing the comfort margin; losses due to incidents and accidents (about 10% of operating costs).]
A Look At Plant Operations
A typical Production Profile for an Asset Intensive Facility for a calendar year.
[Figure: histogram of Days per Year against Daily Production (< 60%, 95%, 100%), with the Production Target set by Enterprise marked. Bar labels: 95, 79, 62, 47, 23, 30, 16, 8 and 5 days.]
Factors Affecting Plant Operations
[Figure: production histogram (Days per Year against Production) annotated with Plant Capacity Limit, Plant Operating Target, Planning Constraints, Operational Constraints, Plant Availability, Plant Incidents, and the measures Effectiveness, Asset Utilization, and Agility/Flexibility.]
Real Life Examples
[Figure: frequency histograms (# Days) of Feed Rate, Production rate, and Total Feed, annotated 3.2% / $33.5 M, 5.8% / $24.2M, and $38.5 M.]
This plant lost $38.5M!
This plant had 5.8% in lost capacity!
This plant had $24.2M in lost capacity due to asset availability & incidents!
And this plant lost $33.5M!
Site Studies have identified Plant Lost Opportunity
Between 3-15% in lost capacity is attributed to asset
unavailability and incidents.
[Figure: production histogram (Days per Year against Production) showing Plant Operating Target, Operational Constraints, Plant Availability, and Plant Incidents.]
[Figure: stack of plant systems labeled Asset Management, Reliability & CMMS, DCS/APC/Optimization efforts, Manufacturing Execution, and Scheduling & ERP, with a "NEW EMPHASIS!!" callout.]
Major Profit Potential
Emphasis on plant & equipment reliability improvements and reduced
incidents can result in a recovery of 3-15% of lost capacity!
[Figure: production histogram (Days per Year) showing Higher Plant Operating Target, Fewer Planning Constraints, and Fewer Operational Constraints.]

The Importance of Alarm Management
Improvement Project
Alarm management is the proper
design, implementation, operation,
and maintenance of industrial
manufacturing plant alarm
systems.
Current alarming practices are leading to incidents.
Major problems are:
• Alarm floods
• Standing alarms
• Poor configuration of alarms
• Nuisance alarms
Technology exists to significantly contribute to
effective alarm systems and provide good
situation awareness.
Alarms identified as contribution
A Case
The lightning struck just before 9:00 AM on a Sunday. It immediately
started a fire in the crude distillation unit of the refinery. The control
operators on duty responded by calling out the fire brigade, and then
had to divert their attention to a growing number of alarms while
desperately trying to bring the crude unit to a safe emergency
shutdown.
Hydrocarbon flow was lost to the deethanizer in the FCCU recovery
section, which fed the debutanizer further along. The system was
arranged to prevent total loss of liquid level in the two vessels, so the
falling level in the deethanizer caused the deethanizer discharge valve
to close. This, in turn, caused the level in the debutanizer to drop
rapidly and its discharge valve also closed. Heat remained on the
debutanizer and the trapped liquid vaporized as the pressure rose
causing the pressure relief valve to “pop” (for the first of three times)
into the flare KO drum and then immediately onto the flare itself.
In a matter of minutes, the board operator was able to restore flow to
the deethanizer. This permitted the deethanizer discharge valve to
be opened, allowing renewed flow forward to the debutanizer. The
rising level in the debutanizer should have caused the debutanizer
discharge valve to open (by the level controller action) and allow
flow on to the naphtha splitter. Although the operators in the
control room received a signal indicating the valve had opened, the
debutanizer nonetheless was filling rapidly with liquid while the
naphtha splitter was emptying. The operators were concentrating
on the displays which focussed on the problems with the
deethanizer and debutanizer, and had no overview of the process
available to indicate that even though the debutanizer discharge
valve registered as open, there was no flow going from the
debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became liquid-
logged about an hour later and the pressure relief valve lifted for the
second time, venting to the flare via the flare KO drum. Because there
were enormous volumes of gas venting, the level of liquid in the flare
KO drum was rising to a very high value.
About 2-1/2 hours later, the debutanizer vented to the flare a third time AND
CONTINUED VENTING FOR 36 MINUTES. The high level alarm for
the flare drum was activated at this time. But with alarms going off every
2 to 3 seconds, there appears to be no evidence that that alarm was ever
seen. By this time, the flare KO drum had filled with liquid well beyond
its design capacity. The fast-flowing gas through the overfilled drum
forced liquid out of the drum’s discharge pipe. The discharge line was not
designed for liquid, so the force of the liquid caused a rupture at an elbow.
This released over 20 tons of highly flammable hydrocarbon.
The ensuing release quickly formed an ominous
drifting cloud of vapor and droplets. In a matter
of minutes, this cloud found its ignition source
350 feet downwind. The resulting explosion was
heard 80 miles away. In the town nearest the
plant, few windows still held intact panes, so
overpowering was the pressure shock wave from
the blast. The last fires in the refinery were
eventually extinguished 2 days later.
[Figure: accident causation chain running from Management (Organization) to Workplace (Individual), with the interface between the organization & the individual in the middle. Stages: Source Failure Types, Functional Failure Types, Condition Tokens (Precursors), Unsafe Acts (Errors & Violations).
Source Failure Types: stylistic or cultural indicators; top down: Commitment, Competence, Cognizance.
Functional Failure Types: General Failure Types; accidents, incidents, and near-misses; 1-10 hit list; data collected & analyzed; Proactive Design; SI Projects; Safety Information System; Best Practices; diagnostic and remedial measures.
Condition Tokens: poor workplace design; high workload; unsociable hours; inadequate training; poor perception of hazards; Alarms; Human Factors; control room design.
Unsafe Acts: near miss auditing; Du Pont; Training; Workspace; Motivation; Attitude; Group Factors; Working Practice.]
Managing Abnormal Situations
Anatomy of a Disaster from Operations Perspective
[Figure: layered diagram; each column is listed here from most to least severe.]
Plant States: Disaster; Accident; Out of Control; Abnormal; Normal
Operational Modes: Emergency; Abnormal; Normal
Critical Systems:
– Area Emergency Response System
– Site Emergency Response System
– Physical and Mechanical Containment System
– Safety Shutdown, Protective Systems, Hardwired Emergency Alarms
– DCS Alarm System, Decision Support System
– Process Equipment, DCS, Automatic Controls, Plant Management Systems
Operational Goals: Minimize Impact; Bring to Safe State; Return to Normal; Keep Normal
Plant Activities: Firefighting, First Aid, Rescue, Evacuation; Manual Control & Troubleshooting; Preventative Monitoring & Testing
Summarized Production Data
[Figure: frequency histograms (# Days) of Feed Rate, Production rate, and Total Feed, annotated 3.2% / $33.5 M, 5.8% / $24.2M, and $38.5 M, summarized as Days per Year against Daily Production (< 60%, 95%, 100%).]
Unexpected Upsets Cost 3-8% of Capacity
[Figure: production histogram (Days per Year) showing Plant Operating Target, Operational Constraints, and Optimization efforts.]
~ $10 Billion annually in lost production!
Major Profit Potential
Focused efforts can result in recovery of 3-8% of capacity.
[Figure: production histogram (Days per Year) showing Higher Plant Operating Target, Fewer Planning Constraints, and Fewer Operational Constraints.]
~ $10 Billion potential to the bottom line!

Timing diagram of DIN V 19251 as applicable
for a single channel SRS with ultimate self tests
executed within the PST
[Figure: timeline (t) with three events:
1. Failure occurrence in the process or in the safeguarding system
2. Failure is detected
3. Safe status of the process assured
Intervals along the timeline: system internal diagnostic time; time for corrective action; time for reaction of the process on the corrective action. All three fall within the Fault Tolerance Time, that is, the fault tolerance time of the process or Process Safety Time (PST).]
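The timing relationship in the diagram can be checked with simple arithmetic. The sketch below is illustrative only: the class name, the function names, and the example durations are assumptions, not values from DIN V 19251. It adds the three intervals and compares the sum with the Process Safety Time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTimes:
    """Intervals from the timing diagram, all in seconds (hypothetical names)."""
    diagnostic_time: float         # failure occurrence until the failure is detected
    corrective_action_time: float  # time to carry out the corrective action
    process_reaction_time: float   # time for the process to react to that action

    def total(self) -> float:
        return (self.diagnostic_time
                + self.corrective_action_time
                + self.process_reaction_time)


def fits_within_pst(times: ResponseTimes, process_safety_time: float) -> bool:
    """True if the safe state is reached within the Process Safety Time."""
    return times.total() <= process_safety_time


def pst_margin(times: ResponseTimes, process_safety_time: float) -> float:
    """Time left over (negative if the PST is exceeded)."""
    return process_safety_time - times.total()


# Example with made-up durations: 2 s diagnostics, 5 s action, 10 s reaction.
example = ResponseTimes(2.0, 5.0, 10.0)
print(fits_within_pst(example, 30.0), pst_margin(example, 30.0))
```

A negative margin means the safe state would be reached only after the process could no longer tolerate the fault.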

Reliability Requirements for Alarms
Claimed PFDavg: 1 – 0.1
Alarm system integrity/reliability requirements:
• Alarms may be integrated into the process control system
Human reliability requirements:
• No special requirements; however, the alarm system should be operated,
engineered and maintained to the good engineering standards
identified in the EEMUA Guide
EEMUA Alarm Systems Guide page 17

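CONCEPT 1 : RISK REDUCTION
[Figure: risk axis (increasing risk) marking the actual remaining risk, the risk to meet the required level of safety, and the EUC risk. The necessary minimum risk reduction [ R ] runs from the EUC risk down to the required level; the actual risk reduction runs from the EUC risk down to the actual remaining risk. The risk reduction achieved by all SRSs & External Risk Reduction Facilities is made up of the partial risk covered by E/E/PES SRSs, the partial risk covered by Other Technology SRSs, and the partial risk covered by External Risk Reduction Facilities.]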
SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES
SAFETY INTEGRITY   DEMAND MODE OF OPERATION            CONTINUOUS/HIGH DEMAND MODE OF OPERATION
LEVEL (SIL)        (Average probability of failure     (Average probability of a dangerous
                   to perform its design function      failure per year)
                   on demand)
4                  10^-5 to < 10^-4                    10^-5 to < 10^-4
3                  10^-4 to < 10^-3                    10^-4 to < 10^-3
2                  10^-3 to < 10^-2                    10^-3 to < 10^-2
1                  10^-2 to < 10^-1                    10^-2 to < 10^-1
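As a worked illustration of the demand-mode column, the following sketch maps an average probability of failure on demand to the SIL band shown in the table. The function name is hypothetical, and treating each lower bound as inclusive is an assumption; the table only marks the upper bound as strict.

```python
from typing import Optional

# (SIL, lower bound, upper bound) for demand mode, copied from the table.
# Assumption: the lower bound is inclusive and the upper bound is exclusive.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd_avg(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None if no band does."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


print(sil_for_pfd_avg(5e-3))  # falls in the 10^-3 to < 10^-2 band
print(sil_for_pfd_avg(0.5))   # outside every band in the table
```

Values of 0.1 or above, and values below 10^-5, return None because the table defines no band for them.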
Reliability requirements for alarms
Claimed PFDavg: 0.1 – 0.01
Alarm system integrity/reliability requirements:
• Alarm system should be designated as safety related & categorized as SIL 1
• Alarm system should be independent from the process control system
Human reliability requirements:
• The operator should be trained in the management of the specific plant
failure that the alarm indicates
• The alarm presentation arrangements should make the claimed alarm very
obvious to the operator and distinguishable from other alarms
• The alarm should remain on view to the operator for the whole of the
time it is active
Reliability requirements for alarms
Claimed PFDavg: Below 0.01
Alarm system integrity/reliability requirements:
• Alarm system would have to be designated as safety related and
categorized as at least SIL 2
Human reliability requirements:
• It is not recommended that claims for a PFDavg below 0.01 are made for
any operator action, even if it is multiple alarmed and very simple.
For all credible accident scenarios the designer should demonstrate that the
total number of safety related alarms and their maximum rate of
presentation does not overload the operator.
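The three PFDavg bands above can be read as a lookup. The sketch below is one possible encoding, not part of the EEMUA guide: the type and field names are hypothetical, and the treatment of the exact boundary values 0.1 and 0.01 is an assumption because the slides do not say which band they belong to.

```python
from typing import NamedTuple, Optional


class AlarmRequirements(NamedTuple):
    safety_related: bool               # must the alarm system be designated safety related?
    minimum_sil: Optional[int]         # minimum SIL category, if any
    independent_of_pcs: Optional[bool] # None where the slides do not say
    operator_claim_recommended: bool   # is a claim on operator action recommended?


def alarm_requirements(claimed_pfd_avg: float) -> AlarmRequirements:
    """Look up the alarm system requirements for a claimed PFDavg."""
    if not 0.0 < claimed_pfd_avg <= 1.0:
        raise ValueError("claimed PFDavg must be in (0, 1]")
    if claimed_pfd_avg >= 0.1:   # band "1 - 0.1" (assumed to include 0.1)
        return AlarmRequirements(False, None, False, True)
    if claimed_pfd_avg >= 0.01:  # band "0.1 - 0.01" (assumed to include 0.01)
        return AlarmRequirements(True, 1, True, True)
    # Below 0.01: at least SIL 2, and such a claim is not recommended.
    return AlarmRequirements(True, 2, None, False)


print(alarm_requirements(0.05))
```

Returning None for independence in the lowest band keeps the encoding from claiming more than the slides state.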

The Setting of a high pre-trip alarm
[Figure: alarmed variable against time. The Alarm Setting (A) lies above the limit of largest normal operational fluctuation and below the limit at which protection operates (B); the band between them is the Abnormal Operating Region. The gap between A and B is set by the maximum rate of change of the alarmed variable during a fault and the time for the operator to respond to the alarm and correct the fault.]
[Figure: Gas Concentration (Percentage of LEL, 0 to 120) against Time after onset of fault (0 to 80 seconds). Curves: Actual Gas Concentration and Measured Gas Concentration, starting from the normal operating level (gas concentration prior to fault). Marked levels: Set trip point, Actual trip point, Lower Explosive Limit (LEL) at 100, Explosion at 120. Labels along the time axis: Fault Occurs, Sampling Delay, Sensor Delay, Error Delay, Shut Down System Delay.]
Redesign Choices
• Redesign - redesign the plant or its controls to provide greater margin between the
normal operating limits & the trip limits. This is the most desirable solution but
is often impractical or too expensive;
• Setting within normal operating limits - setting the alarm within the limits of
normal operating fluctuations & accepting that spurious alarms will occur
during large normal disturbances. This is ergonomically very undesirable and
will tend to increase alarm rates and reduce the operator's confidence in the alarm
system. In effect it increases the Average Probability of Failure on Demand
(PFDavg) for the alarm system as a whole;
• Setting nearer trip limits - setting the alarm closer to the trip limits and
accepting that some fast transients will not be corrected by the operator before
they reach the trip level. This will increase the production losses due to plant
trips and, because there are more demands on the protection system, tend to make
the plant less safe. It also implies an increased PFDavg for the alarm system.
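The trade-off behind these choices can be made concrete for a high alarm. The sketch below is an illustration under stated assumptions, not a method from the EEMUA guide: it assumes the variable rises at a constant worst-case rate during a fault, and the function name, parameter names, and example numbers are hypothetical.

```python
from typing import Optional, Tuple


def alarm_setting_window(
    normal_fluctuation_limit: float,
    trip_limit: float,
    max_rate_of_change: float,      # units of the variable per second, during a fault
    operator_response_time: float,  # seconds to respond to the alarm and correct the fault
) -> Optional[Tuple[float, float]]:
    """Return (lowest, highest) usable high-alarm setting, or None if none exists."""
    # The alarm must sound early enough for the operator to act before the trip.
    highest = trip_limit - max_rate_of_change * operator_response_time
    # The alarm should sit above the largest normal fluctuation to avoid spurious alarms.
    lowest = normal_fluctuation_limit
    if highest < lowest:
        return None
    return (lowest, highest)


# Made-up example: normal fluctuation up to 60, trip at 100, operator needs 60 s.
print(alarm_setting_window(60.0, 100.0, 0.5, 60.0))  # slow fault: a window exists
print(alarm_setting_window(60.0, 100.0, 1.0, 60.0))  # fast fault: no window
```

When the function returns None, no single setting satisfies both constraints, which is the situation the three redesign choices address.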

Different Kinds of Events
[Figure: Potential Impact of Initiating Event against Time, with three event profiles: Abrupt/Catastrophic, Manageable, and Insidious.]
Impact of DCS Alarm System
Awareness of Disturbances
With typical alarm systems, orienting begins after an event
creates an abnormal plant state.
The extent of the problem can impact the operator’s ability to be fully
aware of the locations of process disturbances.
As disturbances propagate, the number of conditions to be aware of
increases, as do the response requirements and the likelihood of
missing important information.
[Figure: Potential Impact of Initiating Event against Time, rising toward Incident. Marked points: failure occurrence in the process or in the safeguarding system; failure is detected; point of operator awareness; safe status of the process assured. Correct intervention causes return to normal.]

Impact of DCS Alarm System
Management of Problems
• Standing alarms interfere with Orientation
• Alarm floods delay Evaluation
• Inadequate filtering interferes with Action
[Figure: Potential Impact of Initiating Event against Time, rising toward Incident, with the point of operator awareness marked. Correct intervention causes return to normal.]

Impact of Good Alarm Management in Situation Awareness
• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events
[Figure: Potential Impact of Initiating Event against Time, showing the average shift in awareness with decision support.]

Impact of Protection System
[Figure: Impact of Initiating Event against Time, with bands labeled Profit, Quality, Loss, and Incident and a SAFE / UN-SAFE boundary. Protection layers: High Alarm, Emergency High Alarm, Trip from SIS. Marked intervals: operator diagnostic time, Process Safety Time, FTT (FTT = Fault Tolerance Time).]
[Figure: Potential Impact of Initiating Event against Time for four operator responses: No response, Incorrect, Suboptimal, Best.]
Impact of Decision Support System
Support for Optimal Response
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
[Figure: Potential Impact of Initiating Event against Time.]
ASM Alarm Management Solutions
• Education for Management, Engineers, Technicians
and Operators.
• Alarm Performance Assessment.
• Requirement for alarm optimization tools.
• Alignment with Company & EEMUA Guidelines.
• Alarm Rationalization.
• User Interface Design.
• Decision Support Activities.
Alarm Management Optimization
Objectives
• Enhance operator effectiveness
– Avoid alarm floods
– Identify root causes
– Eliminate nuisance alarms
• Enhance profitability
– Reduce variability
– Maximize plant uptime
– Prevent damage to equipment
• Reduce risk of:
– Injury to personnel
– Environmental incidents
The Process
[Figure: improvement cycle built around "Develop Plant Alarm Management Standards & Philosophy", with the steps Collect Data, Analyze, Identify Enhancements, Implement Enhancements, Verify Against Standards, and Change Management.]
Alarm Management
• Increase the effectiveness of the existing
alarm system through proven
methodology
– Analyze existing system performance
– Assist in developing an alarm strategy and educating
operations staff
– Rationalize existing alarm system
• Recommend and apply new alarm
management software
– UserAlert
– Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
[Figure: two charts of alarm counts by point. Before: 30 points account for ~85% of all alarms (axis label 100 K). After: 30 points account for ~52% of all alarms (axis label 2 K).]
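The "30 points account for ~85% of all alarms" figure is a top-contributor share. A minimal sketch of how such a share could be computed from an alarm journal follows; the function name and the tag names in the example are hypothetical, and the journal is assumed to be a plain sequence of point names, one entry per alarm.

```python
from collections import Counter
from typing import Iterable


def top_n_share(alarm_points: Iterable[str], n: int = 30) -> float:
    """Fraction of all alarms raised by the n most frequently alarming points."""
    counts = Counter(alarm_points)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Made-up journal: ten alarms from three points.
journal = ["FIC101"] * 6 + ["LI204"] * 3 + ["TI330"]
print(top_n_share(journal, n=1))
print(top_n_share(journal, n=2))
```

Tracking this share before and after rationalization gives a simple measure of whether a few bad actors still dominate the alarm load.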
Optimization Suite…
Alarm Rationalization
• Alarm priority (class) is based on severity and
level of impact and time
• Available priority options in TPS:
– No Action
– Journal
– Print
– Print & Journal
– Low
– High
– Emergency
Optimization Suite…
Alarm Rationalization
• Recommends alarm priorities based on plant
philosophy
– Severity of impact
– Time to respond
– Trip Point
• Electronically captures plant alarm
management philosophy
– Time to respond rules definition
– Impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
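Priority recommendation from severity of impact and time to respond can be pictured as a lookup with a manual override. The sketch below is not the Optimization Suite's actual logic: the matrix contents, the category names, the 300-second threshold, and the function names are all assumptions chosen for illustration, and only the priority labels come from the TPS list above.

```python
from typing import Optional

# Hypothetical plant philosophy: (severity, time class) -> priority label.
PRIORITY_MATRIX = {
    ("minor", "long"): "Journal",
    ("minor", "short"): "Low",
    ("major", "long"): "Low",
    ("major", "short"): "High",
    ("severe", "long"): "High",
    ("severe", "short"): "Emergency",
}


def classify_time_to_respond(seconds: float, short_threshold_s: float = 300.0) -> str:
    """Bucket the available response time; the threshold is an assumed rule."""
    return "short" if seconds < short_threshold_s else "long"


def recommend_priority(
    severity: str,
    time_to_respond_s: float,
    override: Optional[str] = None,
) -> str:
    """Recommend a priority from the matrix unless a manual override is given."""
    if override is not None:
        return override
    return PRIORITY_MATRIX[(severity, classify_time_to_respond(time_to_respond_s))]


print(recommend_priority("severe", 60.0))
print(recommend_priority("minor", 600.0))
print(recommend_priority("minor", 600.0, override="High"))
```

Keeping the rules in a table rather than in code paths is one way to capture a plant's alarm philosophy in a form that can be reviewed and changed.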
