
Best Ever Alarm System Toolkit

Xihui Chen,
Katia Danilova,
Kay Kasemir
SNS/ORNL
kasemirk@ornl.gov
July 2009

Previous Attempts at SNS
• ALH, soft-IOCs and EDM screens
• Issues
– GUI
• Static Layouts
• N clicks to see (some of the) active alarms
– Configuration
• … was bad: always too many alarms
• Changes required contacting one of the 2 experts, restart ALH, …
– Information
• Operator guidance?
• Related displays?
• Most frequent alarm?
• Timeline of alarm?

Managed by UT-Battelle
for the U.S. Department of Energy

New End-User View: Alarm Table
• All current alarms
– new, ack’ed
• Sort by PV, Descr., Time, Severity, …
• Optional: Annunciate
• Acknowledge one or multiple alarms
– Select by PV or description
– BNL/RHIC type un-ack’

Another View: Alarm Tree
• All alarms
– Disabled, inactive, new, ack’ed
• Hierarchical
– Optionally only show active alarms
– Ack’/Un-ack’ PVs or sub-tree

Guidance, Related Displays, Commands
• Basic Text
• Start EDM screen
• Open web page
• Run ext. command
• Hierarchical: Including info of parent entries
• Merges Guidance etc. from all selected alarms

.. Within CSS Alarms History of PV EPICS Config. 6 Managed by UT-Battelle for the U.S. Department of Energy .

E-Log Entries
• “Logbook” from context menu creates text w/ basic info about selected alarms. Edit, submit.
• Pluggable implementation, not limited to Oracle-based SNS ELog

Online Configuration Changes
• .. may require Authentication/Authorization
• Log in/out while CSS is running

Configure PV
• Again online
• Especially useful for operators to update guidance and related screens.

Logging
• .. into generic CSS log also used for error/warn/info/debug messages
• Alarm Server: State transitions, Annunciations
• Alarm GUI: Ack/Un-Ack requests, Config changes
• Generic Message History Viewer
– Example w/ Filter on TEXT=CONFIG

Logging: Get timeline
• Example: Filter on TYPE, PV
1. PV triggers, clears, triggers again
2. Alarm Server latches alarm
3. Alarm Server annunciates
4. Ack’ed by operator
5. Problem fixed
6. All OK

All Sorts of Web Reports

Technical View
[Architecture diagram. Components: IOCs; Alarm Server (Current Alarms: Acknowledged? Transient? Annunciated?); Alarm Cfg & State RDB; Alarm Client GUI; CSS Applications; JMS 2 RDB; Message RDB; JMS 2 Speech; Tomcat Reports. JMS topics: ALARM_SERVER, ALARM_CLIENT, TALK, LOG. Message flows: PV Updates (Channel Access, …), Alarm Updates, Ack’, Config Updates, Annunciations, Log Messages.]

General Alarm Server Behavior
• Latch highest severity, or non-latching
– like ALH “ack. transient”
• Annunciate
• Chatter filter ala ALH
– Alarm only if severity persists some minimum time
– .. or alarm happens >=N times within period
• Optional formula-based alarm enablement:
– Enable if “(pv_x > 5 && pv_y < 7) || pv_z==1”
– … but we prefer to move that logic into IOC
• When acknowledging MAJOR alarm, subsequent MINOR alarms not annunciated
– ALH would again blink/require ack’
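The latching and chatter-filter behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the BEAST implementation: the class `LatchingAlarm`, its fields, and the assumption that `update()` is called periodically with the current PV severity are all hypothetical.

```python
from dataclasses import dataclass

RANK = {"OK": 0, "MINOR": 1, "MAJOR": 2}


@dataclass
class LatchingAlarm:
    """Hypothetical model of one alarm PV (illustration only)."""
    min_seconds: float = 0.0   # chatter filter: severity must persist this long
    latched: str = "OK"        # highest severity latched so far
    acknowledged: bool = False
    pending: str = "OK"        # most recent raw PV severity
    pending_since: float = 0.0

    def update(self, severity: str, now: float) -> bool:
        """Feed the current PV severity; return True if it should be annunciated."""
        if severity != self.pending:
            self.pending, self.pending_since = severity, now
        if severity == "OK":
            # A latched alarm only clears once it has been acknowledged.
            if self.acknowledged:
                self.latched, self.acknowledged = "OK", False
            return False
        if now - self.pending_since < self.min_seconds:
            return False  # too brief so far: treated as chatter
        if RANK[severity] > RANK[self.latched]:
            self.latched, self.acknowledged = severity, False
            return True
        return False  # e.g. MINOR after a latched MAJOR: no new annunciation

    def acknowledge(self) -> None:
        if self.pending == "OK":
            self.latched, self.acknowledged = "OK", False
        else:
            self.acknowledged = True
```

With `min_seconds=2`, a MAJOR that persists is annunciated once; after the operator acknowledges it, a following MINOR on the same PV stays silent until the PV has returned to OK.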

Best Ever Alarm System Tools, Indeed ..
.. but Tools are only half the issue
Good configuration requires plan & follow-up.
B. Hollifield, E. Habibi, "Alarm Management: Seven (??) Effective Methods for Optimum Performance", ISA, 2007

Alarm Philosophy
Goal: Help operators take correct actions
– Alarms with guidance, related displays
– Manageable alarm rate (<150/day)
– Operators will respond to every alarm (corollary to manageable rate)

What’s a valid alarm?
• DOES IT REQUIRE IMMEDIATE OPERATOR ACTION?
– What action? Alarm guidance!
• Not “make elog entry”, “tell next shift”, …
• Consider consequence of no action
• Is it the best alarm?
– Would other subsystems, with better PVs, alarm at the same time?

How are alarms added?
• Alarm triggers: PVs on IOCs
– But more than just setting HIGH, HIHI, HSV, HHSV
– HYST is good idea
– Dynamic limits, enable based on machine state, …
Requires thought, communication, documentation
• Added to alarm server with
– Guidance: How to respond
– Related screen: Reason for alarm (limits, …), link to screens mentioned in guidance
– Link to rationalization info (wiki)
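Why HYST helps can be shown with a simplified sketch of a single HIGH limit with hysteresis: once the alarm is active, it only clears after the value drops below the limit by more than the hysteresis band. The class name and numbers are hypothetical, and this is a simplification of the idea rather than the actual EPICS record code.

```python
class HighLimitWithHysteresis:
    """Simplified single-limit alarm with a hysteresis band (illustration only)."""

    def __init__(self, high: float, hyst: float):
        self.high = high
        self.hyst = hyst
        self.active = False

    def update(self, value: float) -> bool:
        """Return True while the alarm is active for the given value."""
        if self.active:
            # Stay in alarm until the value is clearly back below the limit.
            self.active = value >= self.high - self.hyst
        else:
            self.active = value >= self.high
        return self.active
```

With `high=10` and `hyst=1`, a value hovering between 9.5 and 10 produces one alarm instead of a new one on every small fluctuation.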

Impact/Consequence Grid
Columns: Category | So What | Minor Consequence | Major Consequence
• Personnel Safety: PPS independent from EPICS?
• Environment, Public: Can EPICS cause contained spill of mercury? Uncontained spill??
• Cost (Beam Production, Downtime, Beam Quality): No effect, Beam off < 1 sec? | Beam off <10 min, <$10000 | Beam off >10min, >$10000
• Mostly: How long will beam be off?

.. combined with Response Time

Time to Respond   | Minor Consequence | Major Consequence
>30 Minutes       | NO_ALARM          | MINOR
10..30 minutes    | MINOR             | MAJOR
<10 minutes       | MAJOR             | MAJOR + Annunciate

– This part is still evolving…
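The grid above can be read as a small lookup. The following sketch encodes it directly; the function name is hypothetical, and treating exactly 10 and exactly 30 minutes as part of the middle row is an assumption, since the slide does not define the boundaries.

```python
from typing import Tuple


def alarm_priority(minutes_to_respond: float, major_consequence: bool) -> Tuple[str, bool]:
    """Return (severity, annunciate) from response time and consequence.

    Boundary handling (10 and 30 minutes fall in the middle row) is an assumption.
    """
    if minutes_to_respond > 30:
        return ("MINOR" if major_consequence else "NO_ALARM", False)
    if minutes_to_respond >= 10:
        return ("MAJOR" if major_consequence else "MINOR", False)
    # Less than 10 minutes: always MAJOR, annunciated if the consequence is major.
    return ("MAJOR", major_consequence)
```

For example, `alarm_priority(5, True)` yields `("MAJOR", True)`, the "MAJOR + Annunciate" cell.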

Example: Elevated Temp/Press/Res.Err./…
• Immediate action required?
– Do something to prevent interlock trip
• Impact, Consequence?
– Beam off: Reset & OK, 5 minutes?
– Cryo cold box trip: Off for a day?
• Time to respond?
– 10 minutes to prevent interlock?
• MINOR? MAJOR?
• Guidance: “Open Valve 47 a bit, …”
• Related Displays: Screen that shows Temp, Valve, …

“Safety System” Alarms
• Protection Systems not per se high priority
– Action is required, but we’re safe for now, it won’t get worse if we wait
• Pick One
“Mommy, I need to gooo!”
“Mommy, I went”
(Does it require operator action? How much time is there?)

Avoid Multiple Alarm Levels

Bad Example: Old SNS ‘MEBT’ Alarms
• Each amplifier trip: ≥ 3 ~identical alarms, no guidance
• Rethought w/ subsystem engineer, IOC programmer and operators: 1 better alarm

Alarms for Redundant Pumps

Alarm Generation: Redundant Pumps the wrong way
• Control System
– Pump1 on/off status
– Pump2 on/off status
• Simple Config setting: Pump Off => Alarm:
– It’s normal for the ‘backup’ to be off
– Both running is usually bad as well
• Except during tests or switchover
– During maintenance, both can be off

Redundant Pumps
Required Pumps: 1
• Control System
– Pump1 on/off status
– Pump2 on/off status
– Number of running pumps
– Configurable number of desired pumps
• Alarm System: Running == Desired?
– … with delay to handle tests, switchover
• Same applies to devices that are only needed on-demand
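The "Running == Desired, with delay" check could look roughly like the sketch below. In practice this logic would more likely live on the IOC; the class name, the 30-second default delay, and the periodic `update()` call are assumptions made for illustration.

```python
class PumpCountAlarm:
    """Alarm when the number of running pumps differs from the desired number
    for longer than a delay (hypothetical sketch)."""

    def __init__(self, desired: int = 1, delay_seconds: float = 30.0):
        self.desired = desired            # configurable number of desired pumps
        self.delay_seconds = delay_seconds
        self._mismatch_since = None       # time when the mismatch started

    def update(self, pump_states, now: float) -> bool:
        """pump_states: iterable of booleans (True = running). Returns alarm state."""
        running = sum(1 for on in pump_states if on)
        if running == self.desired:
            self._mismatch_since = None
            return False
        if self._mismatch_since is None:
            self._mismatch_since = now
        # Brief mismatches (tests, switchover) are tolerated until the delay expires.
        return now - self._mismatch_since >= self.delay_seconds
```

Setting `desired=0` during maintenance keeps two stopped pumps from alarming, and a short overlap during switchover never outlasts the delay.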

Weekly Review: How Many? Top 10?

A lot of information available
• How often did PV trigger?
• For how long?
• When?
• Temporary issue? Or need HYST, alarm delay, fix to hardware?
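The questions above amount to simple statistics over logged alarm episodes. A minimal sketch, assuming the log has already been reduced to hypothetical `(pv, start, end)` tuples with times in seconds (the real message log has a different layout):

```python
from collections import defaultdict


def weekly_summary(episodes, top=10):
    """Rank PVs by how often they alarmed, then by total time in alarm.

    episodes: iterable of (pv_name, start_seconds, end_seconds); this layout
    is an assumption for illustration.
    Returns a list of (pv_name, trigger_count, total_seconds).
    """
    stats = defaultdict(lambda: [0, 0.0])
    for pv, start, end in episodes:
        stats[pv][0] += 1
        stats[pv][1] += end - start
    ranked = sorted(
        stats.items(),
        key=lambda item: (-item[1][0], -item[1][1], item[0]),
    )
    return [(pv, count, total) for pv, (count, total) in ranked[:top]]
```

A PV with many short episodes is a candidate for HYST or an alarm delay, while one long episode points more toward a hardware fix.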

Weekly Check: Stale, Forgotten?

What about the DESY Alarm System?
• No Channel Monitor of alarm PVs!
• Selected IOCs push all alarms via new protocol into Interconnection Server
• GUI: Similar to SNS GUI shown here
[Architecture diagram. Labels: IOC, Other, Interconnection Server, JMS, LOG, ALARM, JMS2RDB, RDB, Filter, LDAP, Alarm Access, CSS.]

Design Choices
• Similar alarm table and tree GUIs
• JMS for communication
– slightly different messages, though
• JMS Listeners
– SNS: Logger, Annunciator
– DESY: Logger, Send SMS, EMail, Voice Mail
• DESY/SNS: LDAP vs. RDB for configuration/state
– Choice was based on available infrastructure.
• DESY IOCs send all alarms, then filtered in AMS
– DESY: All IOC alarms should show up in AMS, zero additional configuration
– At SNS, how many of the 350000 PVs would send alarms? We want to make the addition of alarms simple, but not automatic, and encourage guidance, related displays.

Summary
• BEAST operational at SNS since Feb’09
– DESY AMS is similar and has been operational for longer
• Pick either, but good configuration requires work in any case
– Started with previous “annunciated” alarms
• ~300, no guidance, no related displays
• Now ~400, all with guidance, rel. displays, links to operational procedures
– “Philosophy” helps decide what gets added and how
• Immediate Operator Action? Consequence? Response Time?
– Weekly review spots troubles and tries to improve configuration