
Control Engineering Practice 79 (2018) 50–64

Contents lists available at ScienceDirect

Control Engineering Practice


journal homepage: www.elsevier.com/locate/conengprac

Design of visualization plots of industrial alarm and event data for enhanced alarm management✩
Wenkai Hu a, *, Ahmad W. Al-Dabbagh a , Tongwen Chen a , Sirish L. Shah b
a Department of Electrical & Computer Eng., University of Alberta, Edmonton, Alberta, Canada
b Department of Chemical & Materials Eng., University of Alberta, Edmonton, Alberta, Canada

ARTICLE INFO

Keywords:
Industrial alarm systems
Alarm management
Alarm floods
Alarm data
Visual analytics

ABSTRACT

The availability of large volumes of alarm & event data in complex industrial facilities has prompted the development of alarm management techniques and also resulted in a great demand to transform such data and derived results into effective visual forms. Even though good visualization applications can be found in many existing studies, systematic studies on the design of visualization plots are still rare in the area of industrial alarm monitoring. More efforts need to be devoted to enriching the family of visualization techniques, so as to help industrial practitioners in better understanding the behavior of alarm systems and to facilitate decision making for the enhancement of alarm management. This paper presents timely work in the design of visualization plots of alarm & event data. First, a comprehensive literature survey is carried out to investigate existing visualization techniques, which are categorized into three classes based on the input information. Problems in the existing studies are summarized and design requirements for visual analytics are presented. Then, design studies on the development of visualization plots are presented in three categories, including visualization towards overall performance, visualization towards pattern insights, and visualization towards realtime applications. Examples are provided to demonstrate the effectiveness and utility of these visualization techniques.

1. Introduction

Alarm systems serve as critical assets to assist operators in perceiving near misses and managing hazardous situations in various large-scale process industries, such as oil and gas, chemical, metallurgical, pharmaceutical, and utility facilities. An efficient alarm system must: (i) detect abnormalities and warn operators promptly, and (ii) not mislead, overload or distract operators (Gupta, Giridhar, Venkatasubramanian, & Reklaitis, 2013). A well designed alarm system with excellent performance is critical to process safety and operational efficiency. Hence, industry standards and guidelines, such as ANSI/ISA-18.2 (2009), EEMUA-191 (2013), and IEC-62682 (2014), have been widely used by industrial practitioners to guide the design, implementation, and maintenance of industrial alarm systems. With the emphasis on process safety and lessons from many industrial disasters, increasing attention is being paid to this research area, aiming at improving alarm management and reducing the risk in process operation.

Real industrial alarm systems usually have poor performance owing to alarm overloading, nuisance alarms, and alarm floods, which distract operators and lead to deterioration of process operations (Wang, Yang, Chen, & Shah, 2016). Motivated by big gaps between the poor performance of alarm systems and requirements (e.g., the benchmarks in industrial standards ANSI/ISA-18.2, 2009; EEMUA-191, 2013; IEC-62682, 2014) for efficient alarm monitoring, tremendous research efforts have been devoted to improving alarm monitoring in the following aspects, including the evaluation and design of alarm systems (Cheng, Izadi, & Chen, 2013a; Xu, Wang, Izadi, & Chen, 2012; Yu, Wang, & Yang, 2017), the detection and removal of nuisance alarms (Hu, Wang, & Chen, 2015; Kondaveeti et al., 2013; Schleburg, Christiansen, Thornhill, & Fay, 2013; Stauffer, Booth, & Bogdan, 2011), the deployment of state-based alarm suppression (Hollifield & Habibi, 2010; Hu, Chen, & Shah, 2018a; Parker, 2010; Wang et al., 2016), as well as the reduction and inhibition of alarm floods (Beebe, Ferrer, & Logerot, 2013; Charbonnier, Bouchair, & Gayet, 2015, 2016; Hu, Wang, & Chen, 2016; Pariyani, Seider, Oktem, & Soroush, 2010; Stanic, Subramaniam, Sahin, Choi, & Choi, 2010; Vogel-Heuser, Schütz, & Folmer, 2015; Wang, Ten, Hou, & Ginter, 2017). These innovative ideas and new methods provide effective solutions to overcome alarm


This work was supported by the Natural Sciences and Engineering Research Council of Canada.
* Corresponding author.
E-mail addresses: wenkai@ualberta.ca (W. Hu), aaldabba@ualberta.ca (A.W. Al-Dabbagh), tchen@ualberta.ca (T. Chen), sirish.shah@ualberta.ca (S.L. Shah).

https://doi.org/10.1016/j.conengprac.2018.07.005
Received 4 April 2018; Accepted 11 July 2018
Available online 24 July 2018
0967-0661/© 2018 Elsevier Ltd. All rights reserved.

overloading, reduce alarm floods, and improve operator awareness of critical events.

The extensive deployment of Distributed Control Systems (DCSs) and the Supervisory Control and Data Acquisition (SCADA) systems in complex industrial facilities makes the acquisition, transmission, and storage of large volumes of process operation data simple and feasible. The availability of such data is a critical component for successful operation of industrial processes (Kourti, 2002; Qin, 2014), and also a driving force that has prompted the development of alarm monitoring techniques (as discussed above). Using either traditional data statistical methods (Gupta et al., 2013; Kondaveeti et al., 2013), or new technologies, such as data mining methods (Hu et al., 2018a; Vogel-Heuser et al., 2015) and big data infrastructure (Li, Hu, Wang, & Huang, 2015; Qin, 2014), a large amount of historical data can be easily mined to extract useful knowledge and information that will help in decision-making (Qin, 2014). For example, hidden patterns can be recovered from historical alarm & event data, and used to design advanced alarm management techniques (Charbonnier et al., 2015, 2016; Hu et al., 2018a; Hu, Wang et al., 2016; Vogel-Heuser et al., 2015). Dynamic Risk Analysis (DRA) exploits alarm data to calculate the failure probabilities and the probabilities of a plant shut-down or accident, and uses these results to track near misses or faults (Abimbola, Khan, & Khakzad, 2016; Khakzad, Khan, & Amyotte, 2011; Oktem, Seider, Soroush, Pariyani, et al., 2013; Pariyani, Seider, Oktem, & Soroush, 2012a, 2012b). However, these results obtained using advanced data analytic methods are usually not easily understandable, especially to process engineers and operators who may not have a background in data science. This motivates the need for careful investigation of the challenges associated with knowledge representation in a more informative manner, as well as exploring the opportunities for data visualization to assist operators of industrial facilities in better understanding the process operation.

The concept of data visualization is to transform data into visual representations to allow users to get a rapid understanding of insights, observe hidden patterns, and interact with the data (Chen, Guo, & Wang, 2015; Keim, 2002; Munzner, 2015). A variety of data visualization techniques have been proposed (Chen et al., 2015; Keim, 2002; Setlur & Stone, 2016; Thorvaldsdóttir, Robinson, & Mesirov, 2013; Tyanova et al., 2016; Weber, Alexa, & Muller, 2011) and applied to different fields, such as intelligent transportation (Chen et al., 2015) and biological analysis (Tyanova et al., 2016). However, the exploration of data visualization in the field of alarm monitoring has not kept pace. Kondaveeti, Izadi, Shah, Black, and Chen (2012) proposed novel alarm data visualization tools for routine assessment of industrial alarm systems. Laberge, Bullemer, Tolsma, and Reising (2014) and Satuf, Kaszkurewicz, Schirru, and de Campos (2016) designed new graphical panels to display and track alarms in real-time. However, the focus of most relevant literature is on the development of alarm management techniques, whereas visualization plots are only used as auxiliary tools to represent results. There is a lack of systematic studies on the design of visualization plots. Compared to the limited outcomes, the demand for visualization of alarm & event data is huge. More efforts need to be devoted to enriching the family of visualization techniques, so as to help industrial practitioners in better understanding the behavior of alarm systems and to facilitate decision making for the enhancement of alarm management.

The goal of this paper is to guide the development of visualization techniques for alarm & event data, and to design visualization plots that meet industrial demands for visual analytics. First, a comprehensive literature survey is carried out to investigate the existing visualization techniques, which are categorized into three classes based on the input information. Problems in the existing studies are summarized and design requirements for visualization plots are presented. Then, design studies on the development of visualization plots are presented in three domains, including visualization towards overall performance, visualization towards pattern insights, and visualization towards realtime applications. Examples are provided to demonstrate the effectiveness and utilitarian value of these visualization techniques. Discussions on these visualization plots are provided to guide the implementation under different circumstances or for different objectives.

The remainder of the paper is organized as follows: Section 2 provides a comprehensive overview of industrial alarm systems and alarm data. Section 3 presents a literature survey on existing visualization techniques for industrial alarm & event data. Section 4 presents some new visualization plots, which have the capability of delivering informative insights to alarm information. Concluding remarks are presented in the last section.

2. Overview of industrial alarm monitoring

This section introduces the architecture of alarm systems in complex industrial facilities, and also provides discussions on the structure and properties of alarm data with supporting examples and schematics.

2.1. The basics of alarm systems

An alarm system is an integration of hardware and software that serves as a core asset in complex industrial facilities to notify operators of abnormal conditions or equipment malfunctions. Fig. 1 presents the architecture of an industrial alarm system, which connects processes and operators through four components: the data Inputs/Outputs (I/O) server, control and safety systems, alarm logs, and Human-Machine Interface (HMI) systems (ANSI/ISA-18.2, 2009; IEC-62682, 2014). The control and safety systems are typically associated with two major parts, namely, the Basic Process Control System (BPCS) and the Safety Instrumented System (SIS), which produce alarms based on process measurements or logic conditions. Further, the HMI (namely, through a computer screen or an annunciator panel) presents alarms to operators in audible or visual notifications, and enables operators to respond to alarms and to make changes to processes. The alarm logs store alarms and related events in certain structures, and perhaps in a specified format.

In order to supervise the design, implementation, and maintenance of alarm systems, quite a few industrial standards and guidelines have been proposed. Specifically, ANSI/ISA-18.2 (2009), EEMUA-191 (2013), and IEC-62682 (2014) are the most widely used standards. They not only provide benchmark specifications on the performance of alarm systems, but also suggest exploiting advanced alarm management techniques to improve alarm monitoring. In Fig. 1, the advanced alarm management techniques serve to analyze the alarm & event data in alarm logs, and extract useful knowledge from the data; such knowledge is then utilized to supervise alarm rationalization and alarm system design, or to directly assist operators with decision-making (for example, while facing difficult situations, such as during critical alarms or alarm floods). The advanced alarm management techniques can be implemented in external systems, and many tools developed in the literature can be used to implement the techniques. For example, the association rules of mode-dependent alarms can be extracted from historical alarm data using the method in Hu et al. (2018a) and then used as candidates to configure state-based alarming modules, which are commonly used to reduce standing alarms (referred to as alarms remaining in the active state for an extended duration, according to ANSI/ISA-18.2, 2009). Also, methods in Charbonnier et al. (2015), Charbonnier et al. (2016) and Hu, Wang et al. (2016) provide solutions to help operators perceive alarm floods in an early stage to prevent the deterioration of the situation.

2.2. The basics of alarm and event data

Alarm signals are produced as binary time series in alarm systems to indicate the associated processes deviating from their normal specifications or violation of some safety conditions. Whether the monitored process is analog or digital, the mechanism to generate an alarm signal is the same, and can be uniformly described as

$$
\tilde{x}_{a_i}(t) =
\begin{cases}
1, & \text{if } y(t) \notin Y, \\
0, & \text{otherwise,}
\end{cases}
\tag{1}
$$


Fig. 1. The architecture of an industrial alarm system. The arrows indicate dataflow and the dashed line indicates the off-line utilization of uncovered knowledge.

Fig. 2. An overview of messages stored in an Alarm & Event (A&E) database.

where $a_i$ represents a unique alarm variable that belongs to the whole set of configured alarms $\mathcal{A} = \{a_i, i = 1, 2, \ldots, |\mathcal{A}|\}$ in an alarm system (the notation $|\cdot|$ denotes the cardinality of a set). $Y$ is the normal operating specification of a process variable $y$, and it indicates an alarm limit (e.g., a high or low limit) for an analog $y$, or a certain logic condition for a digital $y$. The occurrences of $a_i \in \mathcal{A}$ can be found at triggering time instants as

$$
x_{a_i}(t) =
\begin{cases}
1, & \text{if } \tilde{x}_{a_i}(t-1) = 0 \text{ and } \tilde{x}_{a_i}(t) = 1, \\
0, & \text{otherwise.}
\end{cases}
\tag{2}
$$

When an alarm occurs, it is presented to operators in an audible or visual form, and is displayed, for example, on a computer screen as a textual message in a scrolling list (ANSI/ISA-18.2, 2009; EEMUA-191, 2013; Hollifield & Habibi, 2010; IEC-62682, 2014). Then, the operator responds to this alarm and takes corrective actions to bring the system back to normal. There are two types of operator responses: (i) a response involving state transitions of alarms, for example, acknowledging, shelving/unshelving, suppressing/unsuppressing, and disabling/enabling alarms, as shown in the alarm state transition diagram in ANSI/ISA-18.2 (2009); (ii) a response involving operator actions dealing with the process or equipment, for example, manipulating a valve, adjusting control parameters, and changing over to a standby pump (EEMUA-191, 2013; Hollifield & Habibi, 2010). All these messages involving alarms and operator responses can be historized and stored in an Alarm & Event (A&E) database. In addition to alarm messages and operator responses, events about automatic process changes, for example, the variation of operating rates and transition of operating modes (Hu et al., 2018a), are also available in many A&E databases. Fig. 2 presents the categorization of events in an A&E database along with some real examples for each category. It is also noteworthy that each event is associated with many attributes to uniquely identify the process, describe the functionality of the alarm, and denote real-time messages.

Fig. 3 displays some commonly used data attributes in industrial alarm systems. These attributes are important bricks constituting the A&E database that describe the alarms. For example, a tag name uniquely identifies a basic element, such as a process variable, a device, and a control loop, which may have various built-in alarm functions distinguished by alarm types, such as Low (LO), Low–Low (LL), High (HI), High–High (HH), Low Deviation (LD), and High Deviation (HD) (Arnold & Darius, 1989). Priorities, such as critical, high, medium, and low, are essential attributes indicating the importance of alarms, and are usually assigned based on safety, financial, and environmental factors (Timms, 2009). In addition to these configuration attributes, which do not usually change over time, the other data attributes, such as alarm messages, time stamps, and event IDs, are real-time messages indicating if an alarm occurs (ALM) or returns-to-normal (RTN), along with the time as well as the numerical identifier information related to the event. Arnold and Darius (1989), Timms (2009), and Hollifield and Habibi (2010) provided more details about the data attributes in industrial alarm systems.

Further, Table 1 presents an illustrative example of a structured A&E database, where each column header describes an attribute and each row corresponds to an event displayed through the textual message. More specifically, the column “Event type” identifies if the event is an alarm or an action; the column “Message” presents the changes in alarm states or operator actions; the column “Description” describes the functionality of each alarm or action. For example, for the process tag


Fig. 3. An overview of data attributes associated with alarms.

Table 1
An example of a structured A&E database.
| Time stamp | Event type | Tag name | Alarm type | Message | Priority | Plant unit | Description |
|---|---|---|---|---|---|---|---|
| 21/07/2017 14:21:19 | Alarm | LV2 | HI | ALM | Critical | Area B | The fluid level is above its high limit |
| 21/07/2017 14:26:47 | Action | VAV2 | | OPEN | | Area B | Open a drainage valve at the tank outlet |
| 21/07/2017 14:27:33 | Alarm | LV2 | HI | RTN | Critical | Area B | The fluid level is below its high limit |
| 21/07/2017 17:55:27 | Action | PMP1 | | STOP | | Area C | Stop a pump that supplies lube oil |
| 21/07/2017 17:56:31 | Alarm | PI1 | LO | ALM | Low | Area C | The discharge pressure is below its low limit |
| 21/07/2017 21:17:23 | Action | PMP1 | | START | | Area C | Start a pump that supplies lube oil |
| 21/07/2017 21:18:02 | Alarm | PI1 | LO | RTN | Low | Area C | The discharge pressure is above its low limit |
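
As a rough illustration of how records such as those in Table 1 might be processed, the following minimal Python sketch pairs each ALM message with the next RTN message of the same tag and alarm type and reports how long the alarm stayed active. The function name `alarm_durations`, the tuple layout, and the pairing rule are assumptions made for this sketch only; they are not taken from the paper or from any particular A&E database product.

```python
from datetime import datetime

# Rows copied from Table 1, reduced to the fields needed here:
# (time stamp, event type, tag name, alarm type, message)
RECORDS = [
    ("21/07/2017 14:21:19", "Alarm", "LV2", "HI", "ALM"),
    ("21/07/2017 14:26:47", "Action", "VAV2", "", "OPEN"),
    ("21/07/2017 14:27:33", "Alarm", "LV2", "HI", "RTN"),
    ("21/07/2017 17:55:27", "Action", "PMP1", "", "STOP"),
    ("21/07/2017 17:56:31", "Alarm", "PI1", "LO", "ALM"),
    ("21/07/2017 21:17:23", "Action", "PMP1", "", "START"),
    ("21/07/2017 21:18:02", "Alarm", "PI1", "LO", "RTN"),
]


def alarm_durations(records):
    """Pair each ALM with the next RTN of the same (tag, alarm type).

    Hypothetical helper: returns a list of (tag, alarm type, seconds active).
    Records are assumed to be sorted by time stamp, as in Table 1.
    """
    active = {}      # (tag, alarm type) -> time of the pending ALM
    durations = []
    for stamp, event_type, tag, alarm_type, message in records:
        if event_type != "Alarm":
            continue  # operator actions are skipped in this sketch
        t = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        key = (tag, alarm_type)
        if message == "ALM":
            active[key] = t
        elif message == "RTN" and key in active:
            start = active.pop(key)
            durations.append((tag, alarm_type, (t - start).total_seconds()))
    return durations


durations = alarm_durations(RECORDS)
for tag, alarm_type, seconds in durations:
    print(f"{tag} {alarm_type}: active for {seconds:.0f} s")
```

For the Table 1 rows, this reports 374 s for the LV2 HI alarm (14:21:19 to 14:27:33) and 12091 s for the PI1 LO alarm (17:56:31 to 21:18:02).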

“LV2” measuring the fluid level located in “Area B”, its high limit “HI” alarm configured with a “Critical” priority was triggered at 14:21:19, 21/07/2017. Then, the operator responded by opening a drainage valve “VAV2” at 14:26:47. This action brought the process back to normal and the alarm cleared at 14:27:33. It can be observed that the availability of such detailed information stored in an A&E database, such as that of Table 1, allows for further analysis to gain a deeper understanding of the operation of complex industrial facilities. To facilitate such efforts, data visualization is considered an instrumental step.

3. Taxonomy of visualization of alarm and event data

This section investigates existing visualization techniques for alarm & event data, and categorizes them into three classes depending on the input information, including the visualization of raw information, visualization of statistical results, and visualization of alarm patterns. A detailed investigation into the visualization techniques is provided in the subsequent subsections, and Table 2 presents a list of these visualization techniques, along with comparisons from various perspectives.

3.1. Visualization of raw information

The raw information refers to binary-valued alarm signals as well as information contained in a historical A&E database. Visualization of such raw information does not involve any statistical calculation or advanced data analytics. The main objective of the visualization of raw data is related to two aspects, namely, (i) providing an overview picture of the performance of alarm systems for offline analysis, and (ii) tracking the annunciation of alarms for real-time monitoring. Visualizing historical data on alarms and related events could be beneficial in offline analysis. The High Density Alarm Plot (HDAP) proposed in Kondaveeti et al. (2012) is a useful tool that provides a holistic view of alarm system performance by visualizing the alarm occurrences of the top-most bad actors (referred to as alarms with highest alarm counts) as color coded bins. Yang et al. (2012) and Wang et al. (2016) exploit another type of high density plot that displays both the alarm occurrences and alarm states, so that users can observe long standing alarms and alarm flood situations. In addition, Hu et al. (2018a), Hu et al. (2015), Hu, Wang et al. (2017) and Wang et al. (2016) visualize the time trends of historical alarms and related events as stem plots or line plots, so as to observe or validate the uncovered relations, such as correlated alarms and mode-dependent alarms.

In real-time monitoring, alarms are displayed on physical screens in a traditional list-based alarm summary display, for example Fig. 1 in Laberge et al. (2014), which has the capacities to discriminate alarm priorities by colors, provide parameter information and descriptions, and allow operators to sort the order of the displayed list (Hollifield & Habibi, 2010). Laberge et al. (2014) proposed a new alarm tracker summary display, which combines the list-based display with a new trend-style overview display in two integrated panels. Experimental results demonstrated that operators were more responsive to alarms and showed a better capability in handling alarm floods. In the experiments to study human factors (Adhitya et al., 2014), a schematic display was used together with the alarm display. The schematic display provides an overview of physical components. The alarm display visualizes the alarms associated with these components using a traditional list-based pane and a historical alarm pane, where the latter displays temporal trends of alarms that have occurred recently. A similar alarm display pane was also used in Xu et al. (2014) to visualize both historical alarms in a recent period and anticipatory alarms in the near future. In order to improve operator situation awareness in alarm floods, Satuf et al. (2016) proposed a new ecological alarm interface, called Advanced System of Intelligent Alarms, to display alarms prioritized in real-time with relevant graphical information on process conditions. Overall, the list-based alarm summary and the historical panel of alarm trends are useful for real-time alarm monitoring.

3.2. Visualization of statistical results

In contrast to the raw information, the statistical results are not directly available from the A&E database and usually involve certain calculations. The basic statistical analysis derives performance metrics, such as the alarm count, average alarm rate, and peak alarm rate, to judge whether an alarm system experienced alarm overloading or bad alarm flood situations. The dynamic risk analysis computes the failure probabilities of safety systems and the probabilities of a plant shut-down or accident, so as to evaluate risks of process operations and diagnose faults (Oktem et al., 2013; Pariyani et al., 2012a, 2012b). Commonly known plots, such as bar charts, pie charts, spider charts, and box plots, are used in practice to visualize such statistical results. For example, bar charts were employed to visualize alarm counts and alarm priority


Table 2
Categorization of existing visualization techniques applied to alarm & event data; the abbreviations AM, PC, and OR refer to alarm messages, process changes, and
operator responses, respectively; U and M refer to univariate and multivariate, respectively; L and S refer to large and small, respectively.
| Input information | Visualization plots | Event type (AM / PC / OR) | Dimension (U/M) | On/off-line | Data amount (L/S) |
|---|---|---|---|---|---|
| Raw information | High density alarm plot (Kondaveeti et al., 2012) | ✓ | M | Off | L/S |
| Raw information | High density plot of time trends (Wang et al., 2016; Yang, Shah, Xiao, & Chen, 2012) | ✓ | M | Off | L/S |
| Raw information | Stem or line plot (Hu et al., 2018a; Hu et al., 2015; Hu, Wang, Chen, & Shah, 2017; Wang et al., 2016) | ✓ ✓ ✓ | U/M | Off | S |
| Raw information | List-based alarm summary display (Hollifield & Habibi, 2010; Laberge et al., 2014) | ✓ | M | On | S |
| Raw information | Alarm tracker summary display (Laberge et al., 2014) | ✓ | M | On | S |
| Raw information | Alarm list and trend display panes (Adhitya, Cheng, Lee, & Srinivasan, 2014; Satuf et al., 2016; Xu, Adhitya, & Srinivasan, 2014) | ✓ | M | On | S |
| Statistical results | Bar chart (Beebe et al., 2013; Hu et al., 2015; Hu, Wang et al., 2017; Kondaveeti et al., 2013; Oktem et al., 2013; Pariyani et al., 2010, 2012b; Soares, Pinto, & de Souza Jr, 2016; Wang et al., 2016; Yu, Khan, & Garaniya, 2016) | ✓ ✓ ✓ | U/M | Off | S |
| Statistical results | Pie chart (Timms, 2009) | ✓ | U | Off | S |
| Statistical results | Spider chart (Satuf et al., 2016) | ✓ ✓ ✓ | M | Off | S |
| Statistical results | Layered radar chart (Al-Dabbagh, Hu, Lai, Chen, & Shah, 2018) | ✓ ✓ ✓ | M | Off | S |
| Statistical results | Box plot (Charbonnier et al., 2016; Pariyani et al., 2010, 2012b) | ✓ ✓ | U | Off | S |
| Statistical results | Segmented area plot (EEMUA-191, 2013; Gao, Xu, Gu, Lin, & Zhu, 2015) | ✓ | M | Off | S |
| Statistical results | Alarm burst plot (Hollifield & Habibi, 2010; Wang et al., 2016) | ✓ | U | On/off | L/S |
| Statistical results | Bayesian network, tree diagram (Abimbola et al., 2016; Khakzad et al., 2011; Pariyani et al., 2012a; Wang, Khan, & Ahmed, 2015) | ✓ ✓ ✓ | M | Off | S |
| Alarm patterns | Similarity color map (Charbonnier et al., 2016; Cheng, Izadi, & Chen, 2013b; Kondaveeti et al., 2012; Soares et al., 2016; Yang et al., 2012) | ✓ | M | Off | S |
| Alarm patterns | Workflow diagram (Gao et al., 2015; Hu, Al-Dabbagh, Chen, & Shah, 2016; Hu, Wang et al., 2017; Simeu-Abazi, Lefebvre, & Derain, 2011) | ✓ ✓ ✓ | M | Off | S |
| Alarm patterns | Heat map (Charbonnier et al., 2015) | ✓ | M | Off | S |

distributions (Beebe et al., 2013), time durations in alarm states (Wang et al., 2016), time delays between alarm occurrences (Hu et al., 2015; Hu, Wang et al., 2017), average recovery time (Pariyani et al., 2010), run length distributions and chattering indices (Kondaveeti et al., 2013), and the probabilities of failures or emergency shutdown (Oktem et al., 2013; Pariyani et al., 2012b). Three-Dimensional (3D) bar charts were utilized to display alarm counts with respect to months and years (Soares et al., 2016), and joint probability distributions on a self-organizing map (Yu et al., 2016). A stacked bar chart was used in Pariyani et al. (2010) to visualize the number of alarms categorized by priorities and types. In addition to bar charts, other plots, such as pie charts, box charts, and spider charts, were also exploited to display alarm priority distributions (Timms, 2009), length of alarm sequences (Charbonnier et al., 2016), probabilities of an emergency shutdown or an accident (Pariyani et al., 2012b), recovery time for abnormal events (Pariyani et al., 2010), and performance metrics of two scenarios (Satuf et al., 2016). In Al-Dabbagh et al. (2018), a layered radar chart was proposed to track the changes of alarm counts for a large number of alarm variables.

Based on the Key Performance Indicators (KPIs) recommendations in EEMUA-191 (2013), a segmented area plot is commonly used in practice to determine whether the overall performance of an alarm system is predictive, robust, stable, reactive, or overloaded (EEMUA-191, 2013; Gao et al., 2015). An alarm burst plot is used to visualize the burst alarm rate, which is calculated as the number of alarms in a 10 min time window (Hollifield & Habibi, 2010; Wang et al., 2016). The dynamic analysis visualizes alarms and related events using networks or tree diagrams, such as the Bayesian network (Khakzad et al., 2011; Wang et al., 2015), the event tree (Pariyani et al., 2012a), and the fault tree (Abimbola et al., 2016). In summary, each visualization plot has its own advantage and specific usage. The bar chart is usually used to make simple comparisons of one or multiple metrics. The pie chart is more appropriate to display percentages of the whole. The spider chart can be used to compare multiple metrics over different periods or between different categories. The box plot is useful to show time information, for example, response time and time delays. The matrix plot is a good tool to evaluate the overall performance, and the alarm burst plot helps in identifying alarm floods. The networks or tree diagrams can reveal the transition probabilities between events and help to track abnormalities.

3.3. Visualization of alarm patterns

Alarm patterns refer to the results, such as correlations, similarities, association rules, and sequential patterns, that are not directly observable and require advanced data analytic methods for their extraction from data. These alarm patterns can help industrial practitioners with decision-making under different scenarios, such as configuring state-based alarming, coping with alarm floods, and reducing redundant alarm configurations. The correlations or similarities between alarms or alarm floods can be represented by cross-correlations, Jaccard coefficients, and Sorgenfrei coefficients (Hu et al., 2015; Hu, Wang et al., 2016; Yang et al., 2012). To visualize such relations, correlation color maps and similarity color maps were used in Charbonnier et al. (2016), Cheng et al. (2013b), Kondaveeti et al. (2012), Soares et al. (2016) and Yang et al. (2012). A correlation color map displays correlation levels that range from −1 to 1, between alarms in pairs (Soares et al., 2016; Yang et al., 2012). Each correlation level is represented using a colored block. Analogously, a similarity color map displays similarities that range from 0 to 1, between pairwise alarms, alarm floods, or faults (Charbonnier et al., 2016; Cheng et al., 2013b; Kondaveeti et al., 2012). By contrast, some relations, such as causal relations between alarms (Gao et al., 2015; Hu, Wang et al., 2017) and dependencies between events or alarms (Hu, Al-Dabbagh et al., 2016; Simeu-Abazi et al., 2011), are asymmetric and acyclic, and thus can only be represented using workflow diagrams, such as signed directed graphs and Petri nets. Some patterns, such as alarm sets or alarm sequences, are usually associated with some specific alarm floods or faults. To show whether alarms are present in such situations, a binary matrix plot or a heat map could be a good choice (Charbonnier et al., 2015).

3.4. Discussions

The visualization plots in the above literature survey are useful tools to help in observing and understanding raw A&E data, as well as


statistical results and patterns derived from the data. Even though the applicability and effectiveness of these plots have been demonstrated by many numerical or industrial applications, there are still quite a few common problems that arise, which are summarized in the following three aspects:

(i) Most existing plots in literature are designed to help researchers with better representations of the derived statistical and analytical results, rather than to assist industrial practitioners, such as alarm analysts, process engineers and plant operators, to understand the performance of alarm systems or to improve alarm management.
(ii) Most of these plots are only capable of presenting very limited information, such as results based on a single attribute, and results related to a small number of variables. They are not informative enough to give a comprehensive view of the performance of an alarm system, or to help users to get a deep insight into the problems hidden in the data.
(iii) There is a lack of comparisons to the industrial benchmarks, such as the acceptable alarm rate and the threshold for alarm floods. It is important that this be done since these benchmarks represent the targets for maintaining a healthy industrial alarm system.

In summary, the studies on the design of visualization techniques for industrial alarm & event data are quite limited, and more efforts should be taken to enrich the family of visualization techniques, so as to help industrial practitioners to better understand the behavior of alarm systems and take actions to enhance industrial alarm management. The development of visualization techniques of alarm & event data should incorporate the following requirements:

(i) The capability of being easily understandable to industrial practitioners, such as alarm analysts, process engineers and plant operators.
(ii) The capability of presenting the overall performance of an alarm system from multiple perspectives and revealing hidden problems in the historical data.
(iii) The capability of making comparisons between groups, variables, time periods, as well as metrics and the corresponding benchmarks.
(iv) A better design methodology to deliver effective display of the data and catch users' attention immediately.

Due to the availability of large volumes of historical data in modern industrial facilities, the demand for more and better visualization techniques is huge, so as to augment human capabilities to discover problems and patterns from the alarm & event data. The development of new visualization techniques should overcome the exposed problems in the literature and incorporate the requirements in the above discussions.

4. Design of visualization plots of alarm & event data

This section presents the design of visualization plots that provide comprehensive and intuitive insights of alarm & event data. The visualization techniques are demonstrated for three categories, namely, visualization towards overall performance, visualization towards pattern insights, and visualization towards realtime applications.

4.1. Visualization towards overall performance

The first type of visualization techniques involves evaluation of overall performance of an alarm system or a complex industrial facility, which can be based on performance metrics or basic statistical values. In this subsection, three visualization plots are proposed, namely, a bubble chart to evaluate overall performance, a treemap to quickly locate problems, and a ranking chart to track top bad actors. They allow users to gain more abstract information, quickly and easily assess the overall performance, and make comparisons to benchmarks.

Table 3
An example of performance metrics for the alarm management of five consoles.

Console      Peak alarm rate     Average alarm rate    Num. of unique alarms
             (alarms/10 min)     (alarms/10 min)
Console 1    6                   0.5                   300
Console 2    87                  4.8                   1273
Console 3    359                 3.4                   897
Console 4    89                  33                    2534
Console 5    1496                129                   1867

(1) Bubble Chart to Evaluate Overall Performance
A bubble chart is a variant of the scatter plot, but displays data in three dimensions. It is adopted here and combined with the segmented area plot (EEMUA-191, 2013; Gao et al., 2015) to evaluate the overall performance of an alarm system. Hereby, this bubble chart consists of two essential parts:

(i) The segmented area plot divides the performance of alarm systems into 5 zones based on the KPI recommendations in EEMUA-191 (2013);
(ii) The bubbles are displayed in three dimensions, including the average alarm rate $\gamma_a$ as the vertical coordinate, the peak alarm rate $\gamma_p$ as the horizontal coordinate, and the number of unique alarms $\gamma_n$ as the area of the bubble.

Based on Eq. (2), the average alarm rate $\gamma_a$ for the data in a given time period $[t_s, t_e]$ is

$$\gamma_a = \frac{\sum_{i=1}^{|\mathcal{A}|} \sum_{k=t_s}^{t_e} x_{a_i}(k)}{t_e - t_s + 1}, \tag{3}$$

where $t_s$ and $t_e$ indicate the start and end time stamps of the studied time period. The peak alarm rate refers to the largest value of the burst alarm rate $\nu(t)$, which is calculated as the alarm count in the past time bin $[t - T + 1, t]$. Based on Eq. (2), the formula of $\nu(t)$ is given by

$$\nu(t) = \sum_{i=1}^{|\mathcal{A}|} \sum_{k=t-T+1}^{t} x_{a_i}(k), \tag{4}$$

where $T$ takes the value of 600 s based on benchmarks in ANSI/ISA-18.2 (2009). Then, the peak alarm rate is calculated as

$$\gamma_p = \max_{t \in [t_s + T - 1,\, t_e]} \nu(t). \tag{5}$$

There are two ways to calculate the number of unique alarms $\gamma_n$: (i) it can be the number of configured alarms in an alarm system, namely, $\gamma_n = |\mathcal{A}|$, or (ii) it can be the number of unique alarms with at least one occurrence within the studied period, namely, $\gamma_n = |\{a_i \in \mathcal{A} : \sum_{t=t_s}^{t_e} x_{a_i}(t) > 0\}|$. Five colors are used to distinguish the five performance zones. Given the average alarm rate and peak alarm rate as $\gamma_a$ and $\gamma_p$ (alarms/10 min for each operator), the coordinates $(x, y)$ (the center of each bubble) are calculated as

$$(x, y) = \big(\min(f(\gamma_a), 4),\ \min(f(\gamma_p), 5)\big), \tag{6}$$

where the formula $f(\cdot)$ is given by

$$f(\gamma) = \begin{cases} 1 + \log_{10}\gamma & \text{if } \gamma > 1, \\ \gamma & \text{if } 0 \le \gamma \le 1, \end{cases} \tag{7}$$

where $\gamma$ is either $\gamma_a$ or $\gamma_p$. The minimization in Eq. (6) guarantees that the center of a bubble never exceeds the boundary of the area plot, even if $\gamma_a > 1000$ and $\gamma_p > 10\,000$. The area $A$ of a bubble is proportional to the number of unique alarms $\gamma_n$, i.e., $A = K\gamma_n$, where $K$ is a user defined parameter to adjust the size of the bubble. It can be set to a large value, so as to avoid overspreading of large bubbles.

Fig. 4 gives an example of the bubble chart, which displays the metrics (shown in Table 3) calculated based on the data over one full month. Each bubble represents one console; the area of each bubble is proportional to the number of unique alarms (shown in the last column of


Table 3) in the data of each console. It can be observed from the plot that the alarm system in Console 1 had a predictive performance, meaning that the average alarm rate was acceptable to the operator and there was not any alarm flood problem. By contrast, the alarm systems in Consoles 4 and 5 had extremely high alarm rates, making the alarm systems lose their effectiveness and thus overloading operators. The alarm rates in Consoles 2 and 3 were high but tolerable. Further efforts can be taken to bring the alarm rates down, targeting a predictive performance.

Fig. 4. An example of the bubble chart to evaluate the overall performance of alarm systems. The five colors correspond to five performance levels: predictive, robust, stable, reactive, and overloaded, from the best to the worst. The center of each bubble corresponds to the average alarm rate and the peak alarm rate of each console. The area of the bubble is proportional to the number of unique alarms in each console. A bubble could be as small as a point, representing only one unique alarm. To view the numeric information, an interactive design can be included to display the three metrics as data tips by pointing to a bubble.

Remark 1. This visualization plot is designed to present the overall performance of industrial alarm systems. It is a combination of the trendy scatter/bubble chart and the segmented area plot. Compared to conventional scatter/bubble charts, the designed plot is capable of comparing metrics with industrial benchmarks, making the performance evaluation more intuitive. Compared to the segmented area plot (EEMUA-191, 2013; Gao et al., 2015), it provides additional information about the scale of each console or alarm system, and enables comparisons between different consoles or alarm systems in three dimensions. This visualization plot can be used by managers, team leaders, alarm analysts, and process engineers, for routine alarm audits. Targets can be set for future alarm management based on the zone location of each alarm system in the plot and the size of its scale. It can also be used in realtime to track the performance change of an alarm system. An unhealthy status, e.g., ‘‘Reactive’’ and ‘‘Overloaded’’, will lead to interventions to improve the performance of the alarm system.

(2) Treemap to Quickly Locate Problems
A treemap is a simple visualization plot displaying data of multiple categories in a hierarchical structure (Shneiderman, 1992). Each category is represented by a rectangular area with its size determined by the associated quantity value. Rectangle areas representing subcategories are nested inside of each rectangle in the parent level. The sum of the sizes of all child rectangles in each parent category is usually equal to the size of their parent rectangle. Here, the treemap is adapted to visualize statistical information, such as alarm counts. It consists of the following essential parts:

(i) The hierarchical structure is used to show categories (e.g., plant units, priorities, and alarm types) and items (e.g., alarm tags) in the first and second levels of the treemap, respectively;
(ii) The interactive design is incorporated to allow observation from different perspectives, to enable zooming-in on the second level of the treemap, and to show the associated basic information as data tips (e.g. the tag name, priority, unit, alarm count, and chattering index);
(iii) Colors are effectively used to highlight certain groups or items, and texts are adjusted based on sizes of rectangles.

The treemap has several advantages: it makes efficient use of the screen space and thus can display a large number of items; it can reveal patterns or problems based on colors and sizes of rectangles; it can easily carry out comparisons between items or groups.

An example of the treemap displaying alarm counts categorized by plant units is shown in Fig. 5. There are 48 categories corresponding to 48 areas in a thermal power plant. The name of each plant unit and the alarm count in proportion to the whole alarm count are shown at the header of each rectangle. Top 10 bad actors are displayed with tag names and highlighted using an orange color. It can be observed from the plot where the most alarms were from as well as which the top bad actors were. In addition, an interactive design is incorporated: two dropdown menus are included to facilitate observations from different perspectives rather than just to display top 10 bad actors. The first dropdown menu enables grouping alarms based on different categories, such as alarm priorities and alarm types. The second dropdown menu allows highlighting alarms based on different criteria, e.g., the user can choose to highlight all chattering alarms.

The second interactive design is incorporated: the treemap in the second level is generated by clicking on the appropriate rectangle in the first level. For example, Fig. 6 presents the second level of the treemap in Fig. 5 by selecting the plant unit ‘‘Coal pulverizer’’. Moreover, in order to observe detailed information, the third interactive design is incorporated: a plot shows up to present detailed information of one alarm tag when the mouse pointer moves over its associated rectangle. For example, moving the mouse pointer over the rectangle of ‘‘TAG165.TP18’’, this rectangular area is highlighted by a golden frame; meanwhile a data tip window shows up to display the basic information of this alarm tag, including the tag name, alarm priority, plant unit, alarm count, chattering index, and the line graph of time series, shown in Fig. 7. The orange color is used for a ‘‘High’’ priority. The information of chattering index is highlighted using the red color to capture attention, since it is higher than the threshold of chattering alarms (Kondaveeti et al., 2013). Moving off this rectangle, this data tip window for ‘‘TAG165.TP18’’ disappears automatically. Such an interactive design is helpful for the user to quickly observe the detailed information hidden in the treemap. It can also be incorporated in the first level of the treemap.

Remark 2. This visualization technique is designed to observe statistical results from different perspectives and quickly locate problems at rectangular areas. Compared to bar charts and pie charts, this treemap presents more information involving groups and individual alarms through the hierarchical structure, and makes more efficient use of space. Compared to conventional tables, it enables direct comparisons based on the rectangular areas, and allows flexible grouping and highlighting. In addition, the interactive designs help to observe detailed information while displaying the abstracted results. Potential users include alarm analysts and process engineers, who take the responsibility of maintaining alarm systems and solving alarm management problems. It can be concluded quickly from the treemap which groups (e.g., units, priorities, and types) tend to receive more alarms and which alarms account for a high alarm rate. As a result, the problems with these groups and alarms will be given priority for investigation and solution.

(3) Ranking Chart to Track Top Bad Actors
As an essential stage of the alarm management life cycle described in ANSI/ISA-18.2 (2009), an audit provides periodic reviews of the performance of an alarm system and may reveal gaps that are not apparent from routine monitoring. A typical task of alarm audit is to


Fig. 5. An example of the treemap of alarm counts categorized by plant units. The size of each rectangle (or child rectangle) corresponds to the total number of
alarm occurrences in a plant unit (or of an alarm tag). The orange color highlights the top 10 bad actors. The tag names and alarm count proportions are displayed.
The font size of each tag name is proportional to the size of its rectangle. Two dropdown menus are designed to enable grouping alarms based on different categories
and to allow highlighting items based on different criteria. Moreover, clicking on the appropriate rectangle can generate the treemap in the second level.

Fig. 6. An example of the treemap in the second layer.

track top bad actors, which are known as the alarms with the highest alarm counts. The top bad actors are often the largest contributors to a high alarm rate. Tracking and solving top bad actors help to relieve alarm overloading. To track the changes of top bad actors, a ranking chart is proposed. It consists of four essential components:

(i) Two zones display the top bad actors over two time periods, and the bad actors in each zone are ranked in a decreasing order of the alarm count;
(ii) Each zone consists of a group of stacked blocks corresponding to different bad actors;
(iii) Different colors are used to indicate disappearing, appearing, increasing rank, and decreasing rank of the bad actors;
(iv) Alarm counts are displayed beside the colored blocks.

Based on Eq. (2), the alarm count $\psi_{a_i}$ of each unique alarm variable $a_i$ is calculated as $\psi_{a_i} = \sum_{k=t_s}^{t_e} x_{a_i}(k)$ in a studied time period of $[t_s, t_e]$. Given alarm data over two time periods, two zones of top bad actors are drawn based on the alarm counts. In each zone (denoted by $j = 1$ or $2$), the top $N$ bad actors over the corresponding time period are found and denoted by $\hat{a}_i^j \in \mathcal{A}$, $i = 1, 2, \ldots, N$, with their alarm counts $\psi_{\hat{a}_i^j} = \sum_{k=t_s}^{t_e} x_{\hat{a}_i^j}(k)$. Then, the total alarm count of the top $N$ bad actors in each zone is $\Psi^j = \sum_{i=1}^{N} \psi_{\hat{a}_i^j}$. The height $H^j$ of each zone of the ranking chart is calculated as

$$H^j = \frac{\Psi^j}{\max(\Psi^1, \Psi^2)}.$$

Accordingly, the block height $h_i^j$ of each bad actor $\hat{a}_i^j$ is

$$h_i^j = H^j \cdot \frac{\psi_{\hat{a}_i^j}}{\Psi^j}.$$


Fig. 7. An example of the interactive design that displays the basic information of an alarm in a data tip window. This window appears automatically by moving
the mouse pointer over a rectangle in the treemap.
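The two-level structure behind the treemap in Figs. 5 to 7 can be prepared with a simple aggregation. The following is only a minimal sketch under stated assumptions: the input layout (one `(plant_unit, tag)` pair per alarm occurrence), the function name `build_treemap_data`, and the dictionary keys are hypothetical and are not taken from the original implementation. The rectangle layout and the interactive data tips are left to a plotting library.

```python
from collections import Counter, defaultdict


def build_treemap_data(records, top_n=10):
    """Aggregate alarm occurrences into a two-level treemap structure.

    `records` is an iterable of (plant_unit, tag) pairs, one per alarm
    occurrence (a hypothetical input layout chosen for this sketch).
    """
    tag_counts = Counter(records)            # (unit, tag) -> alarm count
    total = sum(tag_counts.values())
    if total == 0:
        return []

    # Top bad actors: the tags with the highest alarm counts overall.
    top = {key for key, _ in tag_counts.most_common(top_n)}

    children = defaultdict(list)
    for (unit, tag), count in tag_counts.items():
        children[unit].append(
            {"tag": tag, "count": count, "highlight": (unit, tag) in top}
        )

    tree = []
    for unit, items in children.items():
        items.sort(key=lambda item: item["count"], reverse=True)
        unit_count = sum(item["count"] for item in items)
        tree.append(
            {
                "unit": unit,
                "count": unit_count,           # size of the parent rectangle
                "share": unit_count / total,   # proportion shown in the header
                "children": items,             # second-level rectangles
            }
        )
    tree.sort(key=lambda node: node["count"], reverse=True)
    return tree
```

By construction, the counts of the child rectangles sum to the count of their parent rectangle, which matches the nesting property of the treemap described in Section 4.1.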

Fig. 8. An example of the ranking chart to track top bad actors. The left and right zones indicate the top 10 bad actors (denoted by 10 colored blocks) over two
months. The height of each block is proportional to the alarm count of the bad actor. Two colors are used: orange represents the appearing or increasing rank; green
indicates the disappearing or decreasing rank. The numbers on the side are alarm counts for the top 10 bad actors over the week. (For interpretation of the references
to color in this figure legend, the reader is referred to the web version of this article.)
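The zone and block heights of the ranking chart, together with the appearing/disappearing and rank-change labels, can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary-based input (tag name mapped to alarm count for each period), and the extra ``unchanged'' label for a bad actor that keeps its rank are assumptions made here, not part of the original design.

```python
def top_bad_actors(counts, n):
    """Return the n (tag, count) pairs with the highest alarm counts."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


def ranking_chart(counts_prev, counts_curr, n=10):
    """Compute zone heights H^j, block heights h_i^j and rank-change labels.

    `counts_prev` and `counts_curr` map tag names to alarm counts over the
    first and second time period (hypothetical input layout).
    """
    zones = [top_bad_actors(counts_prev, n), top_bad_actors(counts_curr, n)]
    totals = [sum(count for _, count in zone) for zone in zones]  # Psi^1, Psi^2
    scale = max(totals) or 1                                      # avoid 0/0
    ranks = [{tag: r for r, (tag, _) in enumerate(zone)} for zone in zones]

    chart = []
    for j, zone in enumerate(zones):
        blocks = []
        for tag, count in zone:
            if tag not in ranks[1 - j]:
                status = "disappearing" if j == 0 else "appearing"
            elif ranks[1][tag] < ranks[0][tag]:
                status = "increasing rank"
            elif ranks[1][tag] > ranks[0][tag]:
                status = "decreasing rank"
            else:
                status = "unchanged"   # assumption: case not named in the text
            # h_i^j = H^j * psi / Psi^j reduces to psi / max(Psi^1, Psi^2)
            blocks.append({"tag": tag, "count": count,
                           "height": count / scale, "status": status})
        chart.append({"height": totals[j] / scale, "blocks": blocks})
    return chart
```

Since the block heights within a zone sum to the zone height, the taller zone always has height 1 and the other zone is scaled relative to it, so a drop in the total alarm count of the top bad actors is visible directly as a shorter stack.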

An example of the ranking chart to track the changes of the top 10 bad actors over two months is presented in Fig. 8. It can be observed from the plot that the total alarm count of the top 10 bad actors decreased in the current month compared to that in the previous month. Four bad actors disappeared whereas four new bad actors appeared. In addition, four bad actors had increasing ranks whereas the ranks of two other bad actors decreased.

Remark 3. This visualization plot is designed to track top 10 bad actors, which is an important task of periodic alarm audit. Compared to conventional bar charts and pie charts, this ranking chart allows for comparisons between different time periods, which involve two aspects, including the total alarm count of top 10 bad actors and that of individual alarms. Potential users include alarm analysts and process engineers, who will take actions to deal with top 10 bad actors based on the decrease or increase of alarm counts, e.g., evaluating whether some implemented solutions are effective in reducing nuisance alarms, and determining what alarms require further efforts to bring the alarm counts down.

4.2. Visualization towards pattern insights

The second type of visualization involves pattern insights, which is useful for decision-making, but may not be obtained in a straightforward fashion from the data. In this subsection, two new visualization


techniques, namely, an event flow chart for event interactions and a spiral graph for alarm floods, are designed to assist users in identifying patterns of interest and importance in a visual form as well as to gain insights into the historical alarm & event data.

(1) Event Flow Chart for Event Interactions
As discussed in Section 2.2, the alarm & event data includes three pieces of information, namely, alarm messages, process changes, and operator responses. These events could be related to each other. For instance, alarms may come on simultaneously or in a sequential order, due to configuration duplications or physical connections (Hu et al., 2015; Hu, Wang et al., 2017). Process changes, such as state transitions, may cause a series of consequence alarms (Hu et al., 2018a). Operator responses are conducted to acknowledge alarms or cope with abnormalities (ANSI/ISA-18.2, 2009; Hollifield & Habibi, 2010; Hu, Al-Dabbagh et al., 2016). Such event interactions can be uncovered using the methods in Hu, Al-Dabbagh et al. (2016), Hu et al. (2018a), Hu et al. (2015) and Hu, Wang et al. (2017). In order to see interactions and to make them easily understandable, it is necessary to present information, such as orders, frequencies, and time intervals, in a good visual form. Monroe, Lan, Lee, Plaisant, and Shneiderman (2013) proposed an event flow chart to transform an entire dataset of temporal event records into an aggregated display, allowing analysis of patterns and trends at a population level. This visualization technique is exploited to present the interactions between alarms and related events. An event flow chart consists of three essential components:

(i) A sequence of alarms and related events is shown as an event flow in one row of the chart;
(ii) Alarms and related events are presented in a chronological order, and different alarms and events are denoted by different symbols and colors;
(iii) An interactive design is incorporated to enable the alignment of events in different perspectives and the simplification of the event flow.

An example of the event flow chart to visualize the alarm state transitions and the related operator responses is presented in Fig. 9. It can be observed from the plot that two types of operator responses were effective in clearing the alarms. In order to have abstracted information and compare different cases, the event flow chart is simplified using the method in Monroe et al. (2013), and accordingly, Fig. 10 presents a simplified graph. The width of the red area before the blue or green bar indicates the time to respond to the alarm occurrences. The width of that after the blue or green bar denotes the time for the alarm to return to normal after a response. It can be observed from the plot that the response marked by the green color was more timely and effective than the response marked by the blue color.

Fig. 9. An example of the event flow chart of alarm states and operator responses for the low alarm of a tank level. The vertical axis corresponds to the sequence index number that indicates a pair of alarm occurrence and clearance over different periods. The horizontal axis corresponds to the time that indicates the duration of an alarm state. Two distinct color bars within the alarm states represent two different operator responses (i.e., green represents running feed pump 1; blue represents running feed pump 2). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. An example of the simplified plot of the event flow chart presented in Fig. 9. The height of each area indicates the frequency of the associated operator response (i.e., marked by a blue or green color). The topmost area indicates that there were no responses for three alarm occurrences. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Remark 4. This visualization plot is designed to present the abstracted information about interactions between alarms and related events. It can be used to augment flow diagrams, such as the Petri nets in Hu, Al-Dabbagh et al. (2016), which are used to show what actions are usually taken to respond to alarms. Potential users include alarm analysts and process engineers, who can discover interesting patterns from historical data, e.g., what actions are usually taken to respond to certain alarms and whether or not they are effective. Such knowledge can be used to train new operators or to improve operator responses.

(2) Spiral Graph for Alarm Floods
Alarm floods are situations where the number of annunciated alarms exceeds what can be effectively managed by an operator. They are usually caused by propagated abnormalities in interconnected areas (Stanic et al., 2010; Wang et al., 2017) and may lead to serious consequences (Beebe et al., 2013; EEMUA-191, 2013; Pariyani et al., 2010; Wang et al., 2016). Beebe et al. (2013), Vogel-Heuser et al. (2015), Charbonnier et al. (2015, 2016), Wang et al. (2016), Hu, Wang et al. (2016) and Wang et al. (2017) investigated the causes and consequences of alarm floods, and proposed systematic methods to cope with alarm floods. In practice, visualization techniques are needed to help users to observe alarm floods and discover hidden problems from historical data. Here, a spiral graph (Weber et al., 2011) is adapted to visualize alarm flood situations. Preliminary work is presented in Hu, Chen, and Shah (2018b). This visualization plot consists of the following essential parts:

(i) A spiral grows clockwise and continuously in circles with each circle representing a certain time period, e.g., one day;
(ii) Each point on one circle represents a time instant of a smaller time resolution depending on the sampling rate, e.g., one second;


(iii) Colors are used to represent values of the time series related to alarm flood situations.

The spiral graph has several advantages: it makes an effective use of screen space and can visualize data over a very long time period; it supports effective comparisons in two dimensions, namely, comparisons in a neighborhood and comparisons of circles; it allows for an easy observation of periodic behaviors and trends. Using this graph, the user can not only identify when an alarm flood starts and ends, but also determine which day or what time tends to receive more alarm floods.

Alarm floods are identified by comparing the burst alarm rate $\nu(t)$ in Eq. (4) with the benchmark thresholds. A variable $\psi$ indicating the presence of alarm floods is calculated as

$$\psi(t) = \begin{cases} 1 & \text{if } \nu(t) \ge \Delta_{\mathrm{start}} \ \&\ \psi(t-1) = 0, \\ 0 & \text{if } \nu(t) < \Delta_{\mathrm{end}} \ \&\ \psi(t-1) = 1, \\ \psi(t-1) & \text{otherwise,} \end{cases} \tag{8}$$

where $\Delta_{\mathrm{start}}$ and $\Delta_{\mathrm{end}}$ denote the benchmark thresholds to identify the start and end of an alarm flood. According to ANSI/ISA-18.2 (2009), the benchmark thresholds are $\Delta_{\mathrm{start}} = 10$ and $\Delta_{\mathrm{end}} = 5$ based on a time bin of size $T = 600$ s in Eq. (4). The index values $\psi(t) = 1$ and $\psi(t) = 0$ represent the presence and absence of an alarm flood at time instant $t$, respectively. $\psi(t)$ is initialized to be $\psi(0) = 0$. An alarm flood is said to begin at time instant $t_s$ if $\psi(t_s) = 1$ & $\psi(t_s - 1) = 0$, and end at time instant $t_e$ if $\psi(t_e) = 0$ & $\psi(t_e - 1) = 1$.

The spiral graph to visualize alarm flood situations is implemented based on the following discussion. Given alarm data spanning $N$ days, the time instant $t$ is scaled to the range $[1, N \times 86400]$ (1 day is equal to 86,400 s). Then the angle in the spiral graph for each time instant $t$ is

$$\theta(t) = \frac{2\pi t}{\alpha} + 2\beta\pi. \tag{9}$$

More specifically, given alarm data spanning $N$ days, the indexing signal $\psi(t)$ in Eq. (8) that indicates the presence of alarm floods is calculated first. Then, the coordinates at each time instant $t$ are calculated as

$$x(t) = \gamma\theta(t)\cos\left(\theta(t) + \frac{\pi}{2}\right), \qquad y(t) = \gamma\theta(t)\sin\left(\theta(t) + \frac{\pi}{2}\right), \tag{10}$$

where

$$\theta(t) = \frac{2\pi t}{86400} + 2\beta\pi, \tag{11}$$

where $\beta$ and $\gamma$ are user-specified parameters to adjust the radii of the innermost and outermost spiral circles, respectively. The specification of the two parameters $\beta$ and $\gamma$ depends on the size of the screen so as to give users the best view. As a result, a spiral graph with $N$ continuous circles is produced. Each circle of the graph represents 24 h. As the final step, the values of a certain variable are represented by different colors in the spiral graph.

Figs. 11 and 12 present spiral graphs for the burst alarm rate $\nu(t)$ and the alarm flood index $\psi(t)$, respectively. Each graph displays a time series of 30 days, starting at 00:00:00 at the innermost circle, and goes clockwise until reaching 23:59:59 at the outermost circle. From Fig. 11, the time periods which are likely to receive fewer alarms (i.e., marked by green colors) and the time periods which tend to have high alarm rates (i.e., marked by red colors) can be observed. It can also be found that most points on the spiral are green and yellow, implying the alarm system was healthy in most of the studied period. But there are still quite a few spots with red colors, indicating high alarm rates and alarm flood situations. According to Fig. 12, alarm floods were quite common and they were more likely to appear during the following periods: 3:00−4:00 h, 11:00−13:00 h, and 19:00−23:00 h.

Remark 5. This visualization plot is designed to observe alarm flood situations. Compared to bar charts, heatmaps, and line graphs, this spiral graph has the following advantages: (1) A bar chart shows the statistical information, e.g., the number of alarm floods over different hours or days. In comparison, the spiral graph provides more details: users can not only tell which hours or days tend to receive more alarm floods, but also identify the exact time instants when an alarm flood starts and ends. (2) A heat map presents alarm rates in two coordinates, namely, time and day. However, the continuity of the data is broken, making it hard for users to observe the changes from the end of one day to the beginning of the next day. In comparison, the spiral graph can visualize data over a very long time period without breaking its continuity with respect to time. (3) A line graph, such as the alarm burst plot (Hollifield & Habibi, 2010; Wang et al., 2016), shows the alarm rate on the time axis, which is straight. Due to the limitation of the screen width, alarm floods become hard to observe if the data spreads over a very long time period. In comparison, the spiral graph makes effective use of space and can effectively visualize data on a single line growing spirally. Potential users of a spiral graph include alarm analysts and process engineers, who conduct alarm flood analysis and develop solutions. More specifically, based on periodic patterns, the user needs to consider why these time periods tend to receive more alarm floods, whether such periodic patterns could be caused by transitions of work shifts, and what solutions should be taken to improve operators' workflow patterns. The user can also identify which alarm floods are long and serious, and then investigate how to prevent the reproduction of such alarm floods.

4.3. Visualization towards realtime application

The previously discussed visualization plots are specifically designed for off-line analysis, and thus they are static. This subsection presents two dynamic plots towards realtime visualization, namely, the dynamic high density plot and the dynamic 3D bar chart, to allow for a comprehensive view of the alarm system performance.

(1) Dynamic High Density Alarm Plot
The High Density Alarm Plot (HDAP) in Kondaveeti et al. (2012) provides an overview picture of the whole alarm system by tracking the occurrences of individual alarms in the studied time period. This subsection turns this static plot into a dynamic plot for realtime applications. Compared to the static one in Kondaveeti et al. (2012), the new dynamic HDAP has the following special features:

(i) The plot presents the alarm counts for different alarm variables in a short recent period (e.g., hours or days);
(ii) The time window, color bins, and alarm orders update with time (alarm tags are sorted by their alarm counts in a descending order);
(iii) An alarm burst plot is shown to indicate alarm flood situations, and benchmark thresholds are included and shown as horizontal lines.

Fig. 13 presents an example of the dynamic HDAP. The upper subplot shows the burst alarm rate, namely, the number of alarm occurrences in each 10 min time bin, and compares it to the red line, which indicates the benchmark threshold of alarm floods (namely, 10 alarms/10 min) in ANSI/ISA-18.2 (2009). Depending on the application, the threshold of the acceptable alarm rate (namely, 1 alarm/10 min) can also be included in the alarm burst plot. The upper and lower subplots update with time in a direction as indicated by the black arrow at the bottom of Fig. 13. It can be observed from this plot that the alarm system had a good performance in the past day according to the low burst alarm rate in the upper plot. However, the increase of the burst alarm rate in the most recent hour indicates the emergence of new alarm floods. This can also be observed from the lower plot when more bins of red and orange colors appear. Moreover, it can also be observed that there was a group of correlated alarms, namely, ‘‘Tag10.FAU’’, ‘‘Tag14.FAU’’, and ‘‘Tag15.FAU’’, since they appeared almost simultaneously.


Fig. 11. An example of the spiral graph to show burst alarm rates. Green colors denote lower burst alarm rates, whereas yellow and orange colors represent higher
burst alarm rates. Red colors indicate alarm flood situations. The spiral starts at 00:00:00 of the first day at the innermost circle, and goes clockwise until reaching
23:59:59 of the last day at the outermost circle. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of
this article.)
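The burst alarm rate of Eq. (4) and the alarm flood index of Eq. (8), which drive the coloring of the spiral graphs, can be sketched as below. This is a minimal sketch under stated assumptions: the input is taken to be a per-second sequence holding the total number of alarm occurrences over all alarm variables, and the function and parameter names are hypothetical. For the first samples, where a full window is not yet available, the sketch simply sums the samples seen so far.

```python
def burst_alarm_rate(occurrences, window=600):
    """nu(t): alarm count in the past time bin [t - T + 1, t], T = `window`.

    `occurrences[t]` is the total number of alarm occurrences (summed over
    all alarm variables) at sample t; this input layout is an assumption.
    """
    rates, running = [], 0
    for t, x in enumerate(occurrences):
        running += x
        if t >= window:
            running -= occurrences[t - window]   # drop the sample leaving the bin
        rates.append(running)
    return rates


def flood_index(rates, start_threshold=10, end_threshold=5):
    """psi(t): 1 during an alarm flood, 0 otherwise, with two thresholds."""
    psi, state = [], 0                           # psi is initialised to 0
    for nu in rates:
        if state == 0 and nu >= start_threshold:
            state = 1                            # flood begins
        elif state == 1 and nu < end_threshold:
            state = 0                            # flood ends
        psi.append(state)
    return psi


def flood_periods(psi):
    """Return (t_start, t_end) pairs; t_end is None for a flood still active."""
    periods, start, previous = [], None, 0
    for t, value in enumerate(psi):
        if value == 1 and previous == 0:
            start = t
        elif value == 0 and previous == 1:
            periods.append((start, t))
            start = None
        previous = value
    if start is not None:
        periods.append((start, None))
    return periods
```

Using a lower threshold to end a flood than to start one keeps the index from toggling when the burst alarm rate hovers around a single threshold.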

Fig. 12. An example of the spiral graph to show alarm flood periods, which are highlighted by orange colors. The spiral starts at 00:00:00 of the first day at the
innermost circle, and goes clockwise until reaching 23:59:59 of the last day at the outermost circle. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of this article.)
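
The spiral layout described in the captions of Figs. 11 and 12 can be illustrated with a minimal Python sketch. It assumes one full turn per day, an angle measured clockwise from the 12 o'clock position, and a radius that grows linearly with elapsed time; these conventions, as well as the function name `spiral_position` and the layout parameters `r0` and `ring_gap`, are illustrative assumptions rather than details given in the paper.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def spiral_position(ts, first_day, r0=1.0, ring_gap=1.0):
    """Map a timestamp to (x, y) on a clockwise spiral with one turn per day.

    first_day is midnight (00:00:00) of the first day in the data set.
    r0 (innermost radius) and ring_gap (distance between consecutive
    turns) are hypothetical layout parameters.
    """
    turns = (ts - first_day).total_seconds() / SECONDS_PER_DAY
    radius = r0 + ring_gap * turns
    # Fraction of the current day converted to an angle; the angle is
    # measured clockwise from 12 o'clock (an assumed convention).
    angle = 2.0 * math.pi * (turns % 1.0)
    x = radius * math.sin(angle)
    y = radius * math.cos(angle)
    return x, y
```

With this layout, points that share the same time of day on different days fall along the same ray from the center, which is what makes recurring daily high-rate periods visible on the spiral.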


Fig. 13. An example of the dynamic high density alarm plot. The upper subplot displays the burst alarm rate in a moving window. The lower subplot displays alarm
counts for the top 20 bad actors in a time window of 24 h. The alarm count for one specific alarm variable in a time bin of 10 min is color coded and shown as a
color bin. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
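
The data behind the lower subplot of Fig. 13 can be sketched as a matrix of alarm counts, one row per alarm variable and one column per 10 min bin over the most recent 24 h. The following Python sketch assumes that alarm events are available as `(timestamp, tag)` pairs and that the top bad actors are ranked by total alarm count in the window; the function name `hdap_counts` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def hdap_counts(events, now, window=timedelta(hours=24),
                bin_width=timedelta(minutes=10), top_n=20):
    """Return (tags, matrix) for a high density alarm plot.

    events is an iterable of (timestamp, tag) pairs (assumed input format).
    matrix[i][j] is the alarm count of tags[i] in the j-th time bin of the
    window [now - window, now).
    """
    start = now - window
    n_bins = int(window / bin_width)
    per_tag = defaultdict(lambda: [0] * n_bins)
    totals = Counter()
    for ts, tag in events:
        if start <= ts < now:
            idx = int((ts - start) / bin_width)
            per_tag[tag][idx] += 1
            totals[tag] += 1
    # Keep only the top_n alarm variables with the highest counts.
    tags = [tag for tag, _ in totals.most_common(top_n)]
    return tags, [per_tag[tag] for tag in tags]
```

Calling the function again with a later `now` shifts the window, which corresponds to the dynamic update of the plot; each matrix entry would then be mapped to a color bin.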

Remark 6. This visualization plot is designed to present an overview picture of the whole alarm system by tracking the occurrences of individual alarms in realtime. Compared to the static plot in Kondaveeti et al. (2012), this plot is dynamic and updated with time, and thus it can be used in realtime to assist online alarm monitoring. Moreover, it includes a subplot to display the burst alarm rate and compare it with the benchmark threshold, and thus can help users to identify alarm flood situations. Plant operators are the potential users. By observing this dynamic plot, plant operators can identify whether the alarm system is performing well, what the top bad actors are, and whether there are any chattering, repeating, standing, or correlated alarms, or alarm floods. Based on such observations, actions can be taken to reduce nuisance alarms and improve alarm management.
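
The comparison of the burst alarm rate with benchmark thresholds can be sketched in Python as follows. The sketch assumes that the burst alarm rate at time t is the number of alarms in the trailing 10 min window ending at t, and it uses the thresholds of 1 and 10 alarms per 10 min that appear in the caption of Fig. 14. The function names, the level labels, and the choice of strict inequalities are illustrative assumptions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def burst_alarm_rate(sorted_times, t, window=timedelta(minutes=10)):
    """Count alarms in the trailing window (t - window, t].

    sorted_times must be a list of alarm timestamps in ascending order.
    """
    lo = bisect_right(sorted_times, t - window)
    hi = bisect_right(sorted_times, t)
    return hi - lo


def classify(rate, low=1, high=10):
    """Label a burst alarm rate against two benchmark thresholds.

    Whether the thresholds are inclusive is an assumption of this sketch.
    """
    if rate > high:
        return "flood"
    if rate > low:
        return "elevated"
    return "normal"
```

Evaluating `burst_alarm_rate` on a regular time grid gives the curve of the upper subplot, and `classify` marks the points at which the curve crosses a benchmark line.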

(2) Dynamic 3D Bar Chart

In addition to representing alarm counts using color bins as shown in Fig. 13, this part proposes a dynamic 3D bar chart to display the same information using vertical bars. Analogous to the dynamic HDAP, the 3D bar chart has the following features:

(i) The chart presents the alarm information in three dimensions, including the time on the 𝑥-axis, the names of alarms on the 𝑦-axis, and the alarm counts on the 𝑧-axis;
(ii) The time window, bar lengths, and alarm orders update with time (alarm tags are sorted by their alarm counts in a descending order);
(iii) An alarm burst plot is integrated to denote alarm flood situations, and benchmark thresholds are included and shown as horizontal lines.

Fig. 14 presents an example of the dynamic 3D bar chart. From the plot, the alarms that have high alarm rates in the most recent time period can be observed from the vertical bars, and the changes of the burst alarm rate can be observed from the black curve. The time window moves in a direction as indicated by the black arrow at the bottom of Fig. 14. It can be found from the black line that the burst alarm rate was fluctuating around the benchmark threshold of alarm floods, indicating a bad performance of the alarm system. It can also be found that the alarm “Tag48.LO” was chattering or repeating, which accounts for the peaks of the black line. Overall, the alarm system had alarm floods and alarm chattering problems in the recent 2 h time period, and thus actions were required to solve the two problems.

Fig. 14. An example of the dynamic 3D bar chart. Each bar indicates the alarm count for one alarm variable over a 10 min time period. The black curve indicates the burst alarm rate. The green and red lines represent benchmark thresholds of 1 and 10 alarms over a 10 min time period.

Remark 7. This visualization plot is designed to present an overview picture of the whole alarm system in realtime. Compared to the previous dynamic high density alarm plot, this plot has the same function, but presents data in a different visual form. Plant operators are the potential users. This visualization plot provides an alternative to the dynamic high density plot for realtime alarm monitoring.
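
Features (i) and (ii) of the dynamic 3D bar chart can be sketched in Python as a function that prepares one frame of bar data. The sketch assumes `(timestamp, tag)` input pairs, a 2 h display window with 10 min bins, and alphabetical tie-breaking when two tags have equal counts; the function name `bar_chart_frame` and these choices are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta


def bar_chart_frame(events, now, window=timedelta(hours=2),
                    bin_width=timedelta(minutes=10)):
    """Return (tags, bars) for one frame of a dynamic 3D bar chart.

    tags lists alarm variables sorted by alarm count in descending order.
    bars is a sorted list of (x_bin, y_rank, z_count) triples: the time
    bin index, the position of the tag in tags, and the alarm count.
    """
    start = now - window
    counts = Counter()
    totals = Counter()
    for ts, tag in events:
        if start <= ts < now:
            b = int((ts - start) / bin_width)
            counts[(b, tag)] += 1
            totals[tag] += 1
    # Descending by count; ties broken by tag name (assumed rule).
    tags = sorted(totals, key=lambda t: (-totals[t], t))
    rank = {t: i for i, t in enumerate(tags)}
    bars = sorted((b, rank[t], c) for (b, t), c in counts.items())
    return tags, bars
```

Recomputing the frame as `now` advances updates the time window, the bar heights, and the tag order together, which matches the dynamic behavior described in feature (ii).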


5. Concluding remarks

This paper studied visualization of industrial alarm & event data, which complies with the new research and development trend of industrial alarm monitoring and management in the era of big data. By transforming data into visual representations, users can gain a clear and rapid understanding of the insights as well as hidden patterns, which can lead to improved decision-making support in alarm rationalization, management, and monitoring. This paper presented a comprehensive literature survey on the existing alarm data visualization plots, and categorized them into three classes based on the input information. Their characteristics were compared from the perspectives of event types, dimensions, online/offline applications, and the amount of data. Further, new visualization techniques were designed based on generic visualization plots or were motivated by special requirements of industrial alarm systems. These new visualization techniques are capable of providing a comprehensive, informative, and intuitive understanding of the performance and behaviors of industrial alarm systems.

According to Munzner (2015) and Sedlmair, Meyer, & Munzner (2012), validation is critical to bring the designed visualization techniques to the deployment step in real systems. A significant limitation of the current work is the lack of validation from industrial users, especially plant operators, even though the designed visualization in this work has successfully attracted interest from industry. Thus, a future direction is to implement these visualization plots in real systems and improve the design based on the limitations of the systems and feedback from industrial users. In addition, more effort will be devoted in the future to designing better visualizations, e.g., incorporating efficient interactive design based on industrial workflows, and developing better alarm display panels for alarm monitoring, so as to provide efficient visual analytics to reveal problems and discover patterns from alarm & event data.

References

Abimbola, M., Khan, F., & Khakzad, N. (2016). Risk-based safety analysis of well integrity operations. Safety Science, 84, 149–160.
Adhitya, A., Cheng, S. F., Lee, Z., & Srinivasan, R. (2014). Quantifying the effectiveness of an alarm management system through human factors studies. Computers & Chemical Engineering, 67, 1–12.
Al-Dabbagh, A. W., Hu, W., Lai, S., Chen, T., & Shah, S. L. (2018). Towards the advancement of decision support tools for industrial automation: operation metrics, visualization plots, and ranking in alarm floods. IEEE Transactions on Automation Science and Engineering, PP(99), 1–14.
ANSI/ISA-18.2, (2009). Management of alarm systems for the process industries. Durham, NC USA: ISA (International Society of Automation).
Arnold, M. W., & Darius, I. H. (1989). Alarm management in batch process control. ISA Transactions, 28(3), 33–40.
Beebe, D., Ferrer, S., & Logerot, D. (2013). The connection of peak alarm rates to plant incidents and what you can do to minimize. Process Safety Progress, 32(1), 72–77.
Charbonnier, S., Bouchair, N., & Gayet, P. (2015). A weighted dissimilarity index to isolate faults during alarm floods. Control Engineering Practice, 45, 110–122.
Charbonnier, S., Bouchair, N., & Gayet, P. (2016). Fault template extraction to assist operators during industrial alarm floods. Engineering Applications of Artificial Intelligence, 50, 32–44.
Chen, W., Guo, F., & Wang, F. Y. (2015). A survey of traffic data visualization. IEEE Transactions on Intelligent Transportation Systems, 16(6), 2970–2984.
Cheng, Y., Izadi, I., & Chen, T. (2013). Optimal alarm signal processing: Filter design and performance analysis. IEEE Transactions on Automation Science and Engineering, 10(2), 446–451.
Cheng, Y., Izadi, I., & Chen, T. (2013). Pattern matching of alarm flood sequences by a modified Smith-Waterman algorithm. Chemical Engineering Research and Design, 91(6), 1085–1094.
EEMUA-191, (2013). Alarm systems: A guide to design, management and procurement. London: EEMUA (Engineering Equipment and Materials Users’ Association).
Gao, H., Xu, Y., Gu, X., Lin, X., & Zhu, Q. (2015). Systematic rationalization approach for multivariate correlated alarms based on interpretive structural modeling and Likert scale. Chinese Journal of Chemical Engineering, 23(12), 1987–1996.
Gupta, A., Giridhar, A., Venkatasubramanian, V., & Reklaitis, G. V. (2013). Intelligent alarm management applied to continuous pharmaceutical tablet manufacturing: an integrated approach. Industrial and Engineering Chemistry Research, 52(35), 12357–12368.
Hollifield, B. R., & Habibi, E. (2010). Alarm management: A comprehensive guide. Research Triangle Park, NC: ISA.
Hu, W., Al-Dabbagh, A. W., Chen, T., & Shah, S. L. (2016). Process discovery of operator actions in response to univariate alarms. IFAC-PapersOnLine, 49(7), 1026–1031.
Hu, W., Chen, T., & Shah, S. L. (2018a). Discovering association rules of mode-dependent alarms from alarm and event logs. IEEE Transactions on Control Systems Technology, 26(3), 971–983.
Hu, W., Chen, T., & Shah, S. L. (2018b). Detection of frequent alarm patterns in industrial alarm floods using itemset mining methods. IEEE Transactions on Industrial Electronics, 65(9), 7290–7300.
Hu, W., Wang, J., & Chen, T. (2015). A new method to detect and quantify correlated alarms with occurrence delays. Computers & Chemical Engineering, 80, 189–198.
Hu, W., Wang, J., & Chen, T. (2016). A local alignment approach to similarity analysis of industrial alarm flood sequences. Control Engineering Practice, 55, 13–25.
Hu, W., Wang, J., Chen, T., & Shah, S. L. (2017). Cause-effect analysis of industrial alarm variables using transfer entropies. Control Engineering Practice, 64, 205–214.
IEC-62682, (2014). Management of alarm systems for the process industries. IEC (International Electrotechnical Commission).
Keim, D. A. (2002). Information visualization and visual data mining. IEEE Transactions on Visualization and Computer Graphics, 8(1), 1–8.
Khakzad, N., Khan, F., & Amyotte, P. (2011). Safety analysis in process facilities: comparison of fault tree and Bayesian network approaches. Reliability Engineering & System Safety, 96(8), 925–932.
Kondaveeti, S. R., Izadi, I., Shah, S. L., Black, T., & Chen, T. (2012). Graphical tools for routine assessment of industrial alarm systems. Computers & Chemical Engineering, 46, 39–47.
Kondaveeti, S. R., Izadi, I., Shah, S. L., Shook, D. S., Kadali, R., & Chen, T. (2013). Quantification of alarm chatter based on run length distributions. Chemical Engineering Research and Design, 91(12), 2550–2558.
Kourti, T. (2002). Process analysis and abnormal situation detection: from theory to practice. IEEE Control Systems, 22(5), 10–25.
Laberge, J. C., Bullemer, P., Tolsma, M., & Reising, D. V. C. (2014). Addressing alarm flood situations in the process industries through alarm summary display design and alarm response strategy. International Journal of Industrial Ergonomics, 44(3), 395–406.
Li, D., Hu, J., Wang, H., & Huang, W. (2015). A distributed parallel alarm management strategy for alarm reduction in chemical plants. Journal of Process Control, 34, 117–125.
Monroe, M., Lan, R., Lee, H., Plaisant, C., & Shneiderman, B. (2013). Temporal event sequence simplification. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2227–2236.
Munzner, T. (2015). Visualization analysis & design. Boca Raton, FL: CRC Press.
Oktem, U. G., Seider, W. D., Soroush, M., Pariyani, A., et al. (2013). Improve process safety with near-miss analysis. Chemical Engineering Progress, 109(5), 20–27.
Pariyani, A., Seider, W. D., Oktem, U. G., & Soroush, M. (2010). Incidents investigation and dynamic analysis of large alarm databases in chemical plants: A fluidized-catalytic-cracking unit case study. Industrial and Engineering Chemistry Research, 49(17), 8062–8079.
Pariyani, A., Seider, W. D., Oktem, U. G., & Soroush, M. (2012). Dynamic risk analysis using alarm databases to improve process safety and product quality: Part I-Data compaction. AIChE Journal, 58(3), 812–825.
Pariyani, A., Seider, W. D., Oktem, U. G., & Soroush, M. (2012). Dynamic risk analysis using alarm databases to improve process safety and product quality: Part II-Bayesian analysis. AIChE Journal, 58(3), 826–841.
Parker, B. (2010). How to avoid alarm overload with centralized alarm management. Power, 154(2), 38–41.
Qin, S. J. (2014). Process data analytics in the era of big data. AIChE Journal, 60(9), 3092–3100.
Satuf, E. N., Kaszkurewicz, E., Schirru, R., & de Campos, M. C. M. M. (2016). Situation awareness measurement of an ecological interface designed to operator support during alarm floods. International Journal of Industrial Ergonomics, 53, 179–192.
Schleburg, M., Christiansen, L., Thornhill, N. F., & Fay, A. (2013). A combined analysis of plant connectivity and alarm logs to reduce the number of alerts in an automation system. Journal of Process Control, 23(6), 839–851.
Sedlmair, M., Meyer, M., & Munzner, T. (2012). Design study methodology: Reflections from the trenches and the stacks. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2431–2440.
Setlur, V., & Stone, M. C. (2016). A linguistic approach to categorical color assignment for data visualization. IEEE Transactions on Visualization and Computer Graphics, 22(1), 698–707.
Shneiderman, B. (1992). Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics (TOG), 11(1), 92–99.
Simeu-Abazi, Z., Lefebvre, A., & Derain, J. P. (2011). A methodology of alarm filtering using dynamic fault tree. Reliability Engineering & System Safety, 96(2), 257–266.
Soares, V. B., Pinto, J. C., & de Souza, M. B., Jr. (2016). Alarm management practices in natural gas processing plants. Control Engineering Practice, 55, 185–196.
Stanic, S., Subramaniam, S., Sahin, G., Choi, H., & Choi, H. A. (2010). Active monitoring and alarm management for fault localization in transparent all-optical networks. IEEE Transactions on Network and Service Management, 7(2), 118–131.
Stauffer, T., Booth, S., & Bogdan, J. (2011). Managing alarms using rationalization. Control Engineering, 58(3), 30–35.


Thorvaldsdóttir, H., Robinson, J. T., & Mesirov, J. P. (2013). Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, 14(2), 178–192.
Timms, C. (2009). Hazards equal trips or alarms or both. Process Safety and Environmental Protection, 87(1), 3–13. 12th international symposium of loss prevention and safety promotion in the process industries.
Tyanova, S., Temu, T., Sinitcyn, P., Carlson, A., Hein, M. Y., Geiger, T., et al. (2016). The Perseus computational platform for comprehensive analysis of (prote)omics data. Nature Methods, 13(9), 731–740.
Vogel-Heuser, B., Schütz, D., & Folmer, J. (2015). Criteria-based alarm flood pattern recognition using historical data from automated production systems (aps). Mechatronics, 31, 89–100.
Wang, C., Ten, C. W., Hou, Y., & Ginter, A. (2017). Cyber inference system for substation anomalies against alter-and-hide attacks. IEEE Transactions on Power Systems, 32(2), 896–909.
Wang, H., Khan, F., & Ahmed, S. (2015). Design of scenario-based early warning system for process operations. Industrial and Engineering Chemistry Research, 54(33), 8255–8265.
Wang, J., Yang, F., Chen, T., & Shah, S. L. (2016). An overview of industrial alarm systems: main causes for alarm overloading, research status, and open problems. IEEE Transactions on Automation Science and Engineering, 13(2), 1045–1061.
Weber, M., Alexa, M., & Muller, W. (2011). Visualizing time-series on spirals. In Proceedings of IEEE symposium on information visualization INFOVIS 2001 (pp. 7–13).
Xu, S., Adhitya, A., & Srinivasan, R. (2014). Hybrid model-based framework for alarm anticipation. Industrial and Engineering Chemistry Research, 53(13), 5182–5193.
Xu, J., Wang, J., Izadi, I., & Chen, T. (2012). Performance assessment and design for univariate alarm systems based on FAR, MAR, and AAD. IEEE Transactions on Automation Science and Engineering, 9(2), 296–307.
Yang, F., Shah, S., Xiao, D., & Chen, T. (2012). Improved correlation analysis and visualization of industrial alarm data. ISA Transactions, 51(4), 499–506.
Yu, H., Khan, F., & Garaniya, V. (2016). Risk-based process system monitoring using self-organizing map integrated with loss functions. The Canadian Journal of Chemical Engineering, 94(7), 1295–1307.
Yu, Y., Wang, J., & Yang, Z. (2017). Design of alarm trippoints for univariate analog process variables based on alarm probability plots. IEEE Transactions on Industrial Electronics, 64(8), 6496–6505.
