
Annals of Nuclear Energy 153 (2021) 108077


Development of a diagnostic algorithm for abnormal situations using long short-term memory and variational autoencoder

Hyojin Kim, Awwal Mohammed Arigi, Jonghyun Kim *

Department of Nuclear Engineering, Chosun University, 309 Pilmun-daero, Dong-gu, Gwangju 501-709, Republic of Korea

Article history: Received 1 June 2020; Received in revised form 3 December 2020; Accepted 7 December 2020

Keywords: Event diagnosis algorithm; Abnormal situation; Long short-term memory; Variational autoencoder

Abstract: It is recognized that abnormal situation diagnosis is a challenging task for nuclear power plant (NPP) operators because of the excessive information and high workload in such situations. To help operators, several studies have proposed operator support systems using artificial intelligence techniques. However, those methods could neither classify anomalous cases as unknown situations nor confirm whether their outputs are reliable. In this study, an algorithm that can confirm the diagnosis results and determine unknown situations using long short-term memory (LSTM) and variational autoencoder (VAE) is proposed. LSTM was adopted as the primary network for diagnosing abnormal situations. Meanwhile, VAE-based assistance networks were added to the algorithm to ensure that the credibility of the diagnosis is estimated via the anomaly-score-based negative log-likelihood. The algorithm was tested and implemented using the compact nuclear simulator for the Westinghouse 900 MWe NPP. Furthermore, the simulation also considered noise-added data.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

Diagnosing nuclear power plants (NPPs) in abnormal situations is considered one of the most difficult tasks of operators. First, there is excess information to consider in the decision-making process of operators. NPPs not only have approximately 4,000 alarms and monitoring devices in the main control room, but they also have more than 100 operating procedures that should be followed during abnormal situations (Bae et al., 2006). Using this large body of information, operators diagnose abnormal situations and execute specific actions in accordance with the relevant operating procedures (Park et al., 2017). However, this excessive information can confuse operators and increase the likelihood of error caused by excess mental workload (Park et al., 2019). Additionally, some abnormal situations require speedy diagnosis and response to prevent the reactor from being tripped.

To address these issues, several researchers have suggested operator support systems and algorithms to reduce the burden on operators. For instance, Hsieh et al. (2012) used an abnormal symptom matrix, while Kim and Jung (2013) used a flowchart of the AOPs generated through the C5.0 package to implement operator support systems. Other works using support vector machines (SVMs) (Na et al., 2008), expert systems (Wang et al., 2016), and artificial neural networks (ANNs) (Ohga and Seki, 1993) have demonstrated the applicability of artificial intelligence (AI) techniques. However, SVM algorithms and expert systems have their downsides: SVM algorithms are not suitable for large datasets, while expert systems are usually developed for specific domains and acquiring the necessary knowledge is time consuming. Meanwhile, ANNs are regarded as one of the most relevant approaches to handle pattern recognition and large nonlinear data (e.g., handwriting recognition, translation, financial forecasting). Therefore, some papers have proposed algorithms using ANNs for the diagnosis of abnormal situations in NPPs (Fantoni and Mazzola, 1996; Şeker et al., 2003; Embrechts and Benedek, 2004; Uhrig and Hines, 2005; Yang and Kim, 2018).

Several diagnostic algorithms using ANNs have performed well in trained cases, but some drawbacks are observed. The first one is that they cannot correctly classify anomalous cases as an unknown situation if an unknown abnormal situation is provided. As some abnormal events are not known and are unpredictable in actual

Abbreviations: NPP, nuclear power plant; LSTM, long short-term memory; VAE, variational autoencoder; SVM, support vector machine; AI, artificial intelligence; ANN, artificial neural network; AE, autoencoder; CNS, compact nuclear simulator; RNN, recurrent neural network; AUC, area under the curve; CCW, component coolant water; ROC, receiver operating characteristic.
* Corresponding author.
E-mail addresses: kim05140@chosun.kr (H. Kim), awwal@chosun.kr (A.M. Arigi), jonghyun.kim@chosun.ac.kr (J. Kim).
https://doi.org/10.1016/j.anucene.2020.108077
0306-4549/© 2020 Elsevier Ltd. All rights reserved.
NPPs, defining all abnormal events is also difficult. This limitation could harm the safety of NPPs through wrong diagnostic results from an algorithm.

To determine unknown events, Yang and Kim (2020) suggested a diagnostic algorithm utilizing long short-term memory (LSTM) networks (i.e., a type of recurrent neural network) and an autoencoder (AE). To identify unknown events, the authors proposed a detector using AE based on the reconstruction error (i.e., the difference between the original and the reconstructed data). The idea behind this detection is that an AE cannot adequately reconstruct unpredicted patterns of anomalous data compared to predicted non-anomalous data. Further, it has shown good diagnostic performance and the capability of handling non-trained situations.

However, there are several disadvantages to identifying unknown events using AE. In this study, variational autoencoders (VAEs) have been applied to compensate for the disadvantages of AE. First, the AE lacks the ability to address variability. The latent variables of AE are defined by deterministic mappings. Even though known events and unknown events might share the same mean value, their variability can differ; presumably, unknown events will have greater variance. Since the deterministic mapping of AE can be regarded as a mapping to the mean value of a Dirac delta distribution, the AE cannot address variability. On the other hand, since VAE uses a probabilistic encoder to model the distribution of the latent variables rather than the latent variable itself, the variability of the latent space can be accounted for from the data (Murphy, 2012; Kingma et al., 2014).

Second, the reconstruction errors are difficult to calculate if the input variables are heterogeneous. AE-based unknown event detection uses reconstruction errors as anomaly scores, which are difficult to calculate when there are multiple input variables. Fundamentally, a weighted sum is required to utilize multiple data. The problem is that a universal, objective method to decide the appropriate weights is not available, since the weights will differ depending on the data. Furthermore, even after the weights are decided, deciding the threshold for the reconstruction error is complicated. However, a VAE can calculate the reconstruction log-likelihood of the input by modeling the original probability distribution of the data (Park et al., 2018). This VAE-based detector can detect an anomaly when the reconstruction log-likelihood, which is based on the expected distribution of the current observation, is lower than the threshold. Since the probability distribution of each variable allows the variables to be evaluated separately through their own variability, the calculation of the reconstruction log-likelihood does not require weighting of the reconstruction errors of heterogeneous data and does not vary depending on the characteristics of the heterogeneous data. Thus, deciding the threshold of the reconstruction log-likelihood is much more objective, reasonable, and easy to understand than that of the reconstruction error (An and Cho, 2015).

In addition, the above diagnostic algorithm (Yang and Kim, 2020) has room for improvement. One issue is that the algorithm produces the diagnosis result with probabilities, and the operators need to make the final decision based on the probability result. In some situations, the algorithm may provide multiple diagnosis results with competing probability values; for example, it may produce an output such that Event-A has a probability of 0.6 and Event-B has a probability of 0.4. In this case, operators might make errors due to confusion and uncertainty when the probabilities of diagnosis results are competing, especially during the initial period of abnormal situations, when the probabilities of those events may not be clearly distinguished. In some abnormal situations, fast decision making is important. Hence, confirming the diagnostic result of the algorithm during the initial period of an abnormal situation is necessary. Moreover, the noise in the signal, which exists in actual NPPs, was not considered in previous studies.

Therefore, this study aims not only to propose a diagnostic algorithm for abnormal situations, by identifying unknown events, confirming the diagnostic results, and conducting the final diagnosis through the algorithm, but also to increase the reliability of diagnostic results. The algorithm combines LSTM and VAE to identify unknown situations and confirm the diagnosis of the LSTM network. First, the utilized methodologies are briefly introduced in Section 2. Second, this study develops a functional architecture of the diagnostic algorithm that consists of several functions; it then defines the functions and the interactions between them in Section 3. Finally, the implementation, training, and testing of the proposed diagnostic algorithm are described in detail. The training and testing data include Gaussian noise to reflect actual situations in NPPs. The algorithm was trained and implemented using a compact nuclear simulator (CNS). The reference plant is based on the three-loop Westinghouse 930 MWe pressurized water reactor.

2. Methods

This section briefly explains the LSTM and VAE networks that were mainly used for developing the diagnosis algorithm. LSTM was applied as the primary network for diagnosing abnormal events. Meanwhile, VAE-based assistance networks were adopted to ensure the credibility of the diagnosis result from the LSTM network.

2.1. Long short-term memory

LSTM is a neural network architecture based on the recurrent neural network (RNN) for processing long temporal sequences of data. It has been suggested for long temporal sequence learning to handle the vanishing gradient problem. Sequential data can also be handled with other sequence models, such as Markov models, conditional random fields, and Kalman filters. However, unlike LSTM, these models exhibit difficulty in handling long-range dependencies. Although LSTM has the same structure as RNNs, it uses a different equation to calculate the hidden state. Specifically, LSTM uses a memory cell instead of an RNN neuron. It combines fast training with efficient learning on tasks that require sequential short-term memory storage for many time steps during a trial. LSTM can learn to bridge minimal time lags of over 1,000 discrete time steps by enforcing special units called memory cells. It determines whether the previous memory value should be altered and then calculates the value to be stored in the current memory based on the current state and input value of the memory cell. This structure is highly effective in storing long sequences. Further, the alternative models (i.e., Markov models, conditional random fields, and Kalman filters) require domain knowledge or feature engineering, and they offer less chance for unexpected discovery, whereas LSTM can learn representations and discover unpredicted structures (Fantoni and Mazzola, 1996; Embrechts and Benedek, 2004; Monner and Reggia, 2012; Lipton et al., 2016; Lee et al., 2020).

Fig. 1 shows the architecture of the LSTM cell applied in this study. The input sample x passes through every gate as in a conveyor-belt system. As in other LSTM models, each LSTM cell adjusts the output value using the input gate, forget gate, and output gate while maintaining the cell state. The information in the cell state is unchanged and can be added or deleted through each gate. Furthermore, as the operation of each gate is composed of an additive operation attached to the cell state, it can avoid the vanishing gradient problem.
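The gate mechanism just described can be sketched as a single time step in NumPy. This is purely illustrative (random weights, invented sizes; the paper's networks are built with Keras), with the gate forms following Eqs. (1)-(5) of Section 2.1, where the forget and input gates also see the previous cell state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM cell step following Eqs. (1)-(5); W and b hold one weight
    matrix and bias per gate (f, i, o, c)."""
    zc = np.concatenate([C_prev, h_prev, x_t])    # [C_{t-1}, h_{t-1}, x_t]
    zh = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t] for Eq. (4)

    f_t = sigmoid(W["f"] @ zc + b["f"])                       # Eq. (1)
    i_t = sigmoid(W["i"] @ zc + b["i"])                       # Eq. (2)
    C_t = f_t * C_prev + i_t * np.tanh(W["c"] @ zh + b["c"])  # Eq. (4)
    zo = np.concatenate([C_t, h_prev, x_t])       # output gate sees C_t
    o_t = sigmoid(W["o"] @ zo + b["o"])                       # Eq. (3)
    h_t = o_t * np.tanh(C_t)                                  # Eq. (5)
    return h_t, C_t

rng = np.random.default_rng(0)
D, M = 3, 4                              # input and hidden sizes (example only)
W = {k: rng.normal(0, 0.1, (M, 2 * M + D)) for k in ("f", "i", "o")}
W["c"] = rng.normal(0, 0.1, (M, M + D))
b = {k: np.zeros(M) for k in ("f", "i", "o", "c")}

h, C = np.zeros(M), np.zeros(M)
h, C = lstm_step(rng.normal(size=D), h, C, W, b)
print(h.shape, C.shape)                  # (4,) (4,)
```

Because the additive update of C_t in Eq. (4) only scales and adds to the previous cell state, gradients can flow across many such steps, which is the mechanism behind the vanishing-gradient remark above.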
Fig. 1. Architecture of LSTM.

The input gate determines the capacity of the input value, the forget gate determines the degree to which the previous cell state is forgotten, and the output gate determines the value of the output. Eqs. (1)-(3) represent the input gate, forget gate, and output gate, denoted by i, f, and o, respectively. Here, σ and t represent the sigmoid function and the time step, respectively. The cell state is determined as shown in Eq. (4), where C represents the cell state. Finally, LSTM provides the output using Eq. (5), where h represents the output of the LSTM network.

f_t = σ(W_f · [C_{t−1}, h_{t−1}, x_t] + b_f)   (1)

i_t = σ(W_i · [C_{t−1}, h_{t−1}, x_t] + b_i)   (2)

o_t = σ(W_o · [C_t, h_{t−1}, x_t] + b_o)   (3)

C_t = f_t ∗ C_{t−1} + i_t ∗ tanh(W_c · [h_{t−1}, x_t] + b_c)   (4)

h_t = o_t ∗ tanh(C_t)   (5)

2.2. Autoencoder

AE is a neural network that is trained by unsupervised learning; it learns reconstructions that approximate the input value to derive the output value (Zhou and Paffenroth, 2017). Fig. 2 shows the architecture of AE. AE comprises an encoder that encodes the input to the hidden layer and a decoder that decodes the encoded hidden units and outputs data of the same size as the input. As shown in Eq. (6), X ∈ R^D and h(X) ∈ R^M are the input data and the hidden data representation, respectively. f(z) is the nonlinear transformation function; generally, the logistic sigmoid function f(z) = 1/(1 + exp(−z)) is applied. W_e ∈ R^{M×D} and b_e ∈ R^M are the weight matrix and bias vector, respectively. The reconstruction X̂ ∈ R^D produced at the network output from the hidden representation is shown in Eq. (7), where W_d ∈ R^{D×M} and b_d ∈ R^D are the weight matrix and bias vector, respectively.

h(X) = f(W_e X + b_e)   (6)

X̂ = f(W_d h(X) + b_d)   (7)

L(X, X̂) = ‖X − X̂‖²   (8)

Fig. 2. Architecture of AE.

The encoder in Eq. (6) maps an input X to a hidden-layer representation h(X) by an affine mapping followed by a nonlinearity. The decoder in Eq. (7) maps the hidden representation h(X) back to the original input space as a reconstruction by the same form of transformation as the encoder. The square of the error between the input and the output, as in Eq. (8), is called the reconstruction error. AE learns to minimize this reconstruction error.

2.3. Variational autoencoder

VAE (Kingma et al., 2014) is an unsupervised deep learning generative model that can model the distribution of the training data. If input data are similar to the training data, the output appears similar to the input; otherwise, a probabilistic measure that considers the variability of the distribution variables decreases (Chen et al., 2019). Several studies have suggested fault detection algorithms using the reconstruction log-likelihood of VAE and showed the compatibility of VAE with LSTM (An and Cho, 2015; Park et al., 2018; Xu et al., 2018).
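The contrast between the raw reconstruction error of Eq. (8) and a likelihood-based score, which the introduction argues motivates the VAE, can be illustrated numerically. All values below are invented for illustration: a squared error is dominated by the variable with the largest physical range, whereas a per-variable Gaussian negative log-likelihood judges each deviation against that variable's own modeled variance:

```python
import numpy as np

# Two heterogeneous parameters: a pressure (~158 kg/cm2) and a 0/1 alarm state.
x     = np.array([158.0, 1.0])   # observed values
x_hat = np.array([150.0, 0.0])   # reconstructions (both noticeably wrong)
sigma = np.array([4.0, 0.25])    # per-variable modeled std (assumed values)

# Plain squared reconstruction error: the pressure term swamps the alarm term,
# so a hand-tuned weighting would be needed to combine them.
sq_err = (x - x_hat) ** 2
print(sq_err)    # [64.  1.]

# Gaussian negative log-likelihood per variable: each deviation is scaled by
# its own variance, so the two terms land on a comparable scale.
nll = 0.5 * np.log(2 * np.pi * sigma**2) + (x - x_hat) ** 2 / (2 * sigma**2)
print(nll)
```

Here the alarm's deviation, tiny in squared error, actually carries the larger likelihood penalty once its small variance is taken into account, which is the point made about heterogeneous inputs above.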
VAE provides a flexible formulation for interpreting and encoding z as a latent variable in probabilistic generative models. As shown in Fig. 3, the input sample x passes through the encoder to obtain the parameters of the latent-space distribution. The latent variable z is obtained by sampling from the current distribution, and z is then used to generate a reconstructed sample through the decoder (Chen et al., 2019). VAE comprises a probabilistic encoder (q_φ(z|x)) and decoder (p_θ(x|z)). As the posterior distribution (p_θ(z|x)) is intractable, VAE approximates p_θ(z|x) using the encoder q_φ(z|x), which is assumed to be Gaussian and is parameterized by φ. This enables the encoder to learn and predict the latent variables z, which makes it possible to draw samples from this distribution.

To decode a sample z drawn from q_φ(z|x) to the input x, the reconstruction loss (Eq. (9)) must be minimized. The first term of Eq. (9) is the Kullback-Leibler (KL) divergence between the approximate posterior and the prior of the latent variable z; it forces the posterior distribution to be similar to the prior distribution, working as a regularization term. The second term of Eq. (9) can be understood in terms of the reconstruction of x through the posterior distribution q_φ(z|x) and the likelihood p_θ(x|z).

L(θ, φ; x) = D_KL(q_φ(z|x) ‖ p_θ(z)) − E_{q_φ(z|x)}[log p_θ(x|z)]   (9)

The choice of distribution types is important because VAE models the approximated posterior distribution q_φ(z|x) from a prior p_θ(z) and the likelihood p_θ(x|z). A typical choice for the posterior is the Gaussian distribution, where the standard normal distribution N(0, 1) is used for the prior p_θ(z).

2.4. Softmax

The softmax function is used for the post-processing of the LSTM output. It is an activation function commonly used in the output layer of deep learning models to categorize multiple classes of output (Bishop, 2006). Softmax amplifies the deviations between the values and then normalizes the outputs. Eq. (10) represents the softmax function, where y ∈ R^k is the input vector to the softmax function and k is the number of output classes. The normalized output values S(y)_1, …, S(y)_k lie between 0 and 1, and their sum is always 1.

S(y)_i = e^{y_i} / Σ_{j=1}^{k} e^{y_j}   (for i = 1, …, k)   (10)

3. Diagnostic algorithm for abnormal situations using LSTM and VAE

This section describes the overall structure of the diagnostic algorithm for abnormal situations using ANNs. Fig. 4 illustrates the functional architecture of the diagnostic algorithm design, which consists of four functions: 1) the input pre-processing function, 2) the unknown event identification function, 3) the event diagnosis function, and 4) the confirmation of diagnosis result function. Table 1 presents a summary of the descriptions, inputs, and outputs of each function.

3.1. Input pre-processing function

The first function of the algorithm is to process the plant parameters and make them suitable as network inputs. The inputs for the networks are selected from operating procedures based on their importance and ability to affect the state of the plant or system availability. These inputs should have a range of values from 0 to 1. However, plant parameters have different ranges of values or states (e.g., pressurizer pressure: 158 kg/cm²; alarm: on or off). Generally, variables with higher values will have a larger impact on the network results, yet higher values are not necessarily more important for prediction; this problem produces local minima. Therefore, the input pre-processing function receives the raw plant parameters as input and then outputs the normalized plant parameters that will be utilized by the networks.

Min-max normalization is used to prevent local minima and increase the learning speed. Thus, the input of the networks is calculated using Eq. (11). Here, X is the current value of a plant parameter, and X_min and X_max are the minimum and maximum values of the collected data, respectively. In this equation, X_input has a range of 0–1.

X_input = (X − X_min) / (X_max − X_min)   (11)

3.2. Unknown event identification function

This function is used to identify the unknown event via a combination of the VAE and LSTM networks. The VAE network was combined with LSTM (Park et al., 2018), and it was used not only to support the sequence of input data but also to capture the complex temporal dependence in time series. This function receives

Fig. 3. Architecture of VAE.
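The encode-sample-decode path of Fig. 3 can be sketched in NumPy. In this sketch, single random affine maps stand in for the encoder and decoder (in the paper these are LSTMs), and the latent size M is an assumed value, so only the (μ, σ) parameterization and the reparameterized sampling are meaningful:

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 139, 8    # 139 plant parameters (Section 4.1); latent size M assumed

# Stand-ins for the encoder/decoder networks: plain affine maps.
W_mu_z, W_lv_z = rng.normal(0, 0.05, (M, D)), rng.normal(0, 0.05, (M, D))
W_mu_x, W_ls_x = rng.normal(0, 0.05, (D, M)), rng.normal(0, 0.05, (D, M))

x_t = rng.uniform(0, 1, D)            # normalized plant parameters at time t

# Encoder: parameters of the approximate Gaussian posterior q(z_t | x_t).
mu_z, log_var_z = W_mu_z @ x_t, W_lv_z @ x_t

# Reparameterized sample: z = mu + sigma * eps with eps ~ N(0, 1), which is
# what makes the sampling step differentiable during training.
z_t = mu_z + np.exp(0.5 * log_var_z) * rng.standard_normal(M)

# Decoder: reconstruction mean and variance over the D input parameters.
mu_x, sigma_x = W_mu_x @ z_t, np.exp(W_ls_x @ z_t)
print(mu_x.shape, sigma_x.shape)      # (139,) (139,)
```

The decoder returning a full (μ_x, σ_x) pair per parameter, rather than a point reconstruction, is what later allows the anomaly score to be a log-likelihood instead of a weighted reconstruction error.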

Fig. 4. Functional architecture of the diagnostic algorithm design.

normalized NPP parameters from the input pre-processing function and identifies the unknown event in real time. Anomaly scores that indicate discrepancies between the actual and the trained data are used for this function. If the anomaly score is below the threshold, the event is identified as a known event for which the diagnosis network in the next function has been trained. If the anomaly score is above the threshold, the event is unknown, and the message "Unknown event occurrence" is provided to the operators as the output.

The process of the unknown event identification function is shown in Fig. 5. To consider the temporal dependency of time-series data in a VAE, the VAE is combined with LSTMs by replacing the feed-forward network in the VAE with LSTMs, similar to conventional temporal AEs. Given the multi-parameter input x_t ∈ R^D (where D is the number of input parameters), made up of x_{1,t}, …, x_{D,t}, at time t, which is the normalized plant parameters from the input pre-processing function, the encoder approximates the posterior p(z_t|x_t) by the LSTM of the encoder to estimate the mean μ_{z_t} ∈ R^M and variance σ_{z_t} ∈ R^M of the latent variable z_t ∈ R^M. Then, the randomly sampled z_t from the posterior p(z_t|x_t) is fed into the LSTM of the decoder. The final outputs are the reconstruction mean μ_{x_t} ∈ R^D (μ_{x_{1,t}}, …, μ_{x_{D,t}}) and variance σ_{x_t} ∈ R^D (σ_{x_{1,t}}, …, σ_{x_{D,t}}).

To identify the unknown event, the unknown event identification function detects an anomalous execution when the current anomaly score is above the threshold α in Eq. (12). The term
Table 1
Summary of diagnostic algorithm functions.

No. | Function | Input | Output
1 | Input pre-processing function | Plant parameters (from the NPP) | Normalized plant parameters (to networks)
2 | Identification of unknown event function | Normalized plant parameters (from input pre-processing function) | Identification as a known event (to event diagnosis function); identification as an unknown event (to operators)
3 | Event diagnosis function | Identification of known event (from identification of unknown event function); normalized plant parameters (from input pre-processing function) | Eventual highest probability event (to confirmation of diagnosis result function)
4 | Confirmation of diagnosis result function | Highest probability event (from event diagnosis function); normalized plant parameters (from input pre-processing function) | Identified event (to operators); incorrect diagnosis results (to event diagnosis function)
Fig. 5. Various processes in the unknown event identification function.

f_s(x_t; φ, θ) is an anomaly score calculator. The anomaly score is defined as the negative log-likelihood of x_t, represented in Eq. (13), with respect to the reconstructed distribution of x_t from an LSTM-VAE network. Here, μ_{x_t} and σ_{x_t} are the mean and variance of the reconstructed distribution from an LSTM-VAE network with parameters φ and θ, respectively. Fig. 6 shows an example of the anomaly score calculation. Note that x_t ∈ R³ (for x_{1,t}, x_{2,t}, x_{3,t}) represents the normalized input parameters, μ_{x_t} ∈ R³ (for μ_{x_{1,t}}, μ_{x_{2,t}}, μ_{x_{3,t}}) represents the reconstruction mean, and σ_{x_t} ∈ R³ (for σ_{x_{1,t}}, σ_{x_{2,t}}, σ_{x_{3,t}}) represents the reconstruction variance. Each element goes through Eq. (13), and then the anomaly score is calculated. Notice that in this example the number of input parameters (D) is three; the anomaly score calculated in this example is 0.927, as shown in Fig. 6. A high anomaly score means that the input has not been adequately reconstructed by the LSTM-VAE network.

Unknown event, if f_s(x_t; φ, θ) > α; Known event, otherwise   (12)

f_s(x_t; φ, θ) = −(1/D) Σ_{i=1}^{D} log p(x_{i,t}; μ_{x_{i,t}}, σ_{x_{i,t}})   (13)

α = S_mean + 3·S_std   (14)

The threshold α was determined using three-sigma limits (Shewhart, 1931) after considering the anomaly score distribution of the training data. Smaller anomaly scores indicate that the output data (i.e., the data reconstructed by the LSTM-VAE) are similar to the training data, and the threshold is taken as the upper control limit. S_mean and S_std are the mean and standard deviation of the anomaly scores of the training data, respectively (Eq. (14)). This not only sets a range for the process parameter at the 0.27% control limits (corresponding to three-sigma in a normal distribution) but also minimizes the cost associated with preventing the error of classifying a known event as unknown. Section 4.2.2 discusses the method to determine the threshold and hyperparameters.

3.3. Event diagnosis function

This function produces diagnosis results of the plant situation using LSTM. Fig. 7 shows the processes in the event diagnosis function. The LSTM network receives normalized plant parameters from the input pre-processing function and produces identified abnormal events with their probabilities. The output is post-processed using the softmax function. The probability represents

Fig. 6. Example of anomaly score calculation.
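A calculation in the spirit of Fig. 6 and Eqs. (12)-(14) can be sketched as follows, using a Gaussian log-density for Eq. (13). All numbers are invented for illustration; in the algorithm, μ and σ come from the trained LSTM-VAE and the training-score statistics from the 115 threshold-determination cases:

```python
import numpy as np

def anomaly_score(x, mu, sigma):
    """Eq. (13): average Gaussian negative log-likelihood over D parameters."""
    nll = 0.5 * np.log(2.0 * np.pi * sigma**2) + (x - mu) ** 2 / (2.0 * sigma**2)
    return float(nll.mean())

rng = np.random.default_rng(2)

# Simulated training data that the network reconstructs well (D = 3 here).
mu = rng.uniform(0.2, 0.8, size=(500, 3))            # reconstruction means
sigma = np.full((500, 3), 0.05)                      # reconstruction stds
x_train = mu + 0.05 * rng.standard_normal((500, 3))  # inputs close to the means
train_scores = np.array([anomaly_score(x, m, s)
                         for x, m, s in zip(x_train, mu, sigma)])

# Eq. (14): three-sigma upper control limit on the training-score distribution.
alpha = train_scores.mean() + 3.0 * train_scores.std()

# Eq. (12): an input far from its reconstructed distribution is flagged unknown.
x_far = mu[0] + 10 * sigma[0]
is_unknown = anomaly_score(x_far, mu[0], sigma[0]) > alpha
print(is_unknown)
```

A perfectly reconstructed input scores well below α, while the 10σ deviation exceeds it by a wide margin, reproducing the known/unknown decision rule of Eq. (12).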


Fig. 7. Processes in the event diagnosis function.

Fig. 8. Processes in the confirmation of diagnosis results function.
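The interplay of the event diagnosis and confirmation functions shown in Figs. 7 and 8 can be sketched as follows. The helper names are hypothetical: `score_fn` stands for the per-event LSTM-VAE anomaly-score calculation and `thresholds` for the per-event three-sigma limits, while the softmax of Eq. (10) turns LSTM logits into the competing probabilities:

```python
import numpy as np

def softmax(y):
    """Eq. (10): exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(y - np.max(y))        # max-shift for numerical stability
    return e / e.sum()

def confirm_diagnosis(events, logits, score_fn, thresholds):
    """Try candidate events in descending probability (Section 3.3) and return
    the first whose anomaly score stays below its event-specific threshold
    (Section 3.4); otherwise report that no trained event was confirmed."""
    probs = softmax(np.asarray(logits, dtype=float))
    for idx in np.argsort(probs)[::-1]:          # highest probability first
        if score_fn(events[idx]) <= thresholds[events[idx]]:
            return events[idx], float(probs[idx])   # diagnosis confirmed
    return None, None

# Toy run: Event-A is more probable but fails confirmation, so Event-B wins.
events = ["Event-A", "Event-B"]
scores = {"Event-A": 1.7, "Event-B": 0.4}        # pretend anomaly scores
event, prob = confirm_diagnosis(events, [1.2, 0.8],
                                lambda e: scores[e],
                                {"Event-A": 0.9, "Event-B": 0.9})
print(event)                                     # Event-B
```

This mirrors the loop in the text: when the most probable candidate is inconsistent with the current plant behavior, the next-highest-probability event is checked instead of forcing operators to arbitrate between competing probabilities.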

Fig. 9. Plant overview interface of CNS.

the confidence level of the identified event. Then, this function selects the event with the highest probability among the diagnostic results for the confirmation function. In addition, multiple events can be identified with different probabilities in the previous function. If the confirmation function returns information stating that the current situation is not consistent with the diagnostic result, then this function will select the next event with the highest probability until the current situation is consistent with the diagnostic result. The procedure to determine hyperparameters, such as the number of layers and nodes, is described in Section 4.2.3.

3.4. Confirmation of diagnosis result function

This function is used to confirm whether the current abnormal situation is identical to the event selected in the event diagnosis function. This function has a library that consists of LSTM-VAE net-

Table 2
Abnormal scenarios and the number of simulations.

No. Scenarios Training cases Verification cases Total cases


1 Failure of pressurizer pressure channel (High) 14 4 18
2 Failure of pressurizer pressure channel (Low) 20 6 26
3 Failure of pressurizer water level channel (High) – 6 6
4 Failure of pressurizer water level channel (Low) 11 4 15
5 Failure of steam generator water level channel (Low) 32 8 40
6 Failure of steam generator water level channel (High) 35 6 41
7 Control rod drop 38 10 48
8 Continuous insertion of control rod 7 1 8
9 Continuous withdrawal of control rod 6 2 8
10 Opening of pressurizer power-operated relief valve 42 10 52
11 Failure of pressurizer safety valve 35 8 43
12 Opening of pressurizer spray valve 41 9 50
13 Stopping of charging pump – 1 1
14 Stopping of two main feedwater pumps – 3 3
15 Main steam line isolation – 3 3
16 Rupture at the inlet of the regenerative heat exchanger 40 10 50
17 Leakage from chemical volume and control system to component coolant water (CCW) 40 10 50
18 Leakage at the outlet of charging control flow valve 24 6 30
19 Leakage into the CCW system from the reactor coolant system 24 6 30
20 Leakage from steam generator tube – 36 36
Total 409 149 558

Fig. 10. Examples of pressurizer temperature data ((a): original CNS data and (b): data with ± 5% Gaussian noise).

Fig. 11. Example of normalized pressurizer temperature.
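Data like those shown in Figs. 10 and 11 can be produced with a short sketch. This is illustrative only: the ±5% noise is applied relative to each signal value, which is one plausible reading of the text, and the min-max bounds are taken from the collected noisy data before the 10% margin of Section 4.2.1 is added:

```python
import numpy as np

rng = np.random.default_rng(3)

# A clean CNS trace of one plant parameter (one sample per second).
clean = np.linspace(320.0, 310.0, 600)

# Add +/-5% Gaussian noise to mimic actual NPP signals (cf. Fig. 10(b)).
noisy = clean * (1.0 + 0.05 * rng.standard_normal(clean.size))

# Min-max bounds from all collected (noisy) data, then a 10% margin on each
# side so that normalized values stay strictly inside (0, 1), even for new
# cases that slightly exceed the collected extremes (Section 4.2.1).
x_min, x_max = noisy.min(), noisy.max()
margin = 0.10 * (x_max - x_min)
lo, hi = x_min - margin, x_max + margin

# Eq. (11) applied with the widened bounds.
normalized = (noisy - lo) / (hi - lo)
print(round(normalized.min(), 3), round(normalized.max(), 3))   # 0.083 0.917
```

With a 10% margin on both sides, the collected data occupy [1/12, 11/12] of the unit interval, leaving headroom at both ends for out-of-range values at test time.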


Fig. 12. Identification of unknown event function using the LSTM-VAE network.

Table 3
Accuracy comparison results of various configured networks.

No. Time step Batch size Layers Accuracy


1 5 32 2 0.9668
2 5 32 3 0.9638
3 5 64 2 0.9634
4 5 64 3 0.9650
5 10 32 2 0.9768
6 10 32 3 0.9746
7 10 64 2 0.9764
8 10 64 3 0.9741
9 15 32 2 0.9767
10 15 32 3 0.9762
11 15 64 2 0.9764
12 15 64 3 0.9766

Fig. 13. Result of ROC and AUC of the identification of unknown event function.
works for trained events. Further, the LSTM-VAE network for the selected event is used to confirm that the selected event is identical to the trained event.

Fig. 8 shows the confirmation process of the selected event. First, this function selects the LSTM-VAE network from the library that corresponds to the event identified in the previous function. Next, it verifies whether the current situation is identical to the selected event by using the LSTM-VAE network. To estimate the anomaly score, this function uses the negative log-likelihood (i.e., similar to the unknown event identification function). If the negative log-likelihood is below the threshold, then the algorithm declares that the diagnosis result from the event diagnosis function is correct and confirmed. If the negative log-likelihood is beyond the threshold, then it returns to the previous function to select another
H. Kim, Awwal Mohammed Arigi and J. Kim Annals of Nuclear Energy 153 (2021) 108077

event. The thresholds of LSTM-VAE networks are determined in a The collected data were added with ± 5% Gaussian noise to
similar manner as that described in Section 3.2. reflect actual signals from NPPs. The CNS produces data without
noises, as shown in Fig. 10(a). The noise was added to the CNS data
intentionally, as shown in Fig. 10(b).
4. Implementation

The suggested algorithm was implemented using a CNS that simulates a Westinghouse 900 MWe, three-loop pressurized water reactor. Fig. 9 shows the plant overview interface of the CNS. For implementation, a desktop computer with an NVIDIA GeForce GTX 1080 11 GB GPU, an Intel 4.00 GHz CPU, a Samsung 850 PRO 512 GB MZ-7KE512B SSD, and 24 GB of RAM was used. Python 3.7.3 was used as the coding language, and several Python libraries, including Keras and Pandas, were used to model the algorithm.

4.1. Data collection

To implement, train, and validate the algorithm, the CNS was used as a real-time testbed. A total of 20 abnormal situations and 558 cases were simulated to collect data. Table 2 shows the abnormal scenarios and the number of simulations for each event. The scenarios include representative abnormal situations in actual NPPs, such as instrument failures (Nos. 1–6), component failures (Nos. 7–16), and leakages (Nos. 17–20).

Based on the abnormal operating procedures of the reference plant, 139 parameters were selected as inputs, including plant variables (e.g., temperature or pressure) and component states (e.g., pump or valve status). These parameters were collected every second during the simulations.

Among the 20 scenarios collected, 15 scenarios comprising 409 cases were used for training, and the remaining 5 scenarios were used for validating untrained events. A total of 149 cases were used for validating the algorithm, including the five untrained events (i.e., Nos. 3, 13, 14, 15, and 20). Among them, 115 cases were used for determining the thresholds of the identification of unknown event function and the confirmation of diagnosis result function.

4.2. Function implementation

4.2.1. Input preprocessing function
To implement the min–max normalization in the input preprocessing function, we determined the maximum and minimum values of the parameters based on all the collected data (i.e., 558 cases). Moreover, because ±5% Gaussian noise was added to the simulator data (making it similar to actual NPP data), some data can be larger than the recorded maximum values or smaller than the minimum values; such data would not fall between 0 and 1 when normalized. To prevent this problem, we added a 10% margin to the maximum and minimum values of each parameter. Fig. 11 shows an example of a normalized pressurizer temperature.

4.2.2. Identification of unknown event function
The identification of unknown event function was implemented using the LSTM-VAE network. In this network, a dataset consists of a 10 s sequence of 139 input values. As mentioned in Section 4.1, 409 scenarios (i.e., 192,637 datasets for 139 parameters) were used for training. Fig. 12 illustrates how an unknown event is identified using the LSTM-VAE network. The VAE does not necessarily require tuning of its hyperparameters, as it provides a variational information bottleneck that prevents overfitting; consequently, it tends toward the optimal bottleneck size for a given set of hyperparameters (Tishby and Zaslavsky, 2015; Alemi et al., 2019; Ruder, 2016). Additionally, this LSTM-VAE network used the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.0001 and was trained for 100 epochs.
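The margin-widened min–max normalization of the input preprocessing function (Section 4.2.1) can be sketched as follows, assuming the 10% margin is taken as 10% of each parameter's observed range (the helper names and the range-based interpretation of the margin are assumptions):

```python
import numpy as np

def fit_minmax_with_margin(data, margin=0.10):
    """Per-parameter min/max over all collected cases, widened by `margin`
    of the observed range so noisy signals still normalize into [0, 1]."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    span = hi - lo
    return lo - margin * span, hi + margin * span

def normalize(data, lo, hi):
    """Standard min-max scaling against the widened bounds."""
    return (data - lo) / (hi - lo)
```

Fitting the bounds once on all 558 cases and reusing them online keeps the normalization consistent between training and real-time diagnosis.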

Fig. 14. Event diagnosis function using LSTM and softmax.
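As a minimal numeric illustration of the softmax stage in Fig. 14, the diagnosed event is simply the class with the highest softmax probability (the logits below are made up; the real network produces them from a 10 s sequence of 139 parameters):

```python
import numpy as np

def softmax(logits):
    """Convert class logits to probabilities; shift by the max for stability."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits over 16 classes (15 abnormal events + normal state)
logits = np.zeros(16)
logits[7] = 4.0                      # pretend the LSTM strongly favors one event
probs = softmax(logits)
diagnosis = int(np.argmax(probs))    # index of the diagnosed event
```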


Fig. 15. Illustration of the confirmation of diagnosis results function.
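The confirmation step in Fig. 15 amounts to a threshold test on the anomaly score of the event-specific LSTM-VAE; a sketch with thresholds taken from Table 4 (the helper name and the example scores are illustrative, not the authors' implementation):

```python
# Per-event thresholds from Table 4 (subset shown for brevity)
THRESHOLDS = {"Ab 01": 0.923, "Ab 08": 0.924, "Normal": 0.923}

def confirm_diagnosis(event, anomaly_score, thresholds=THRESHOLDS):
    """Accept the LSTM diagnosis only if the event-specific VAE's
    anomaly score stays at or below that event's threshold."""
    return anomaly_score <= thresholds[event]
```

A low anomaly score means the event-specific VAE reconstructs the current sequence well, i.e., the situation is consistent with the diagnosed event.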

Table 4
Results of training data, performance, and thresholds of LSTM-VAE networks.

No. Name Trained data AUC Threshold


1 VAE Ab 01 3459 datasets (Ab 01) 0.995 0.923
2 VAE Ab 02 3785 datasets (Ab 02) 0.934 0.923
3 VAE Ab 04 5658 datasets (Ab 04) 0.993 0.922
4 VAE Ab 05 5730 datasets (Ab 05) 0.998 0.921
5 VAE Ab 06 8312 datasets (Ab 06) 0.974 0.923
6 VAE Ab 07 34,735 datasets (Ab 07) 0.999 0.920
7 VAE Ab 08 2831 datasets (Ab 08) 0.958 0.924
8 VAE Ab 09 2070 datasets (Ab 09) 0.955 0.927
9 VAE Ab 10 8394 datasets (Ab 10) 0.985 0.921
10 VAE Ab 11 8004 datasets (Ab 11) 0.938 0.921
11 VAE Ab 12 23,885 datasets (Ab 12) 0.996 0.920
12 VAE Ab 16 24,600 datasets (Ab 16) 0.965 0.920
13 VAE Ab 17 30,904 datasets (Ab 17) 0.953 0.934
14 VAE Ab 18 14,805 datasets (Ab 18) 0.984 0.922
15 VAE Ab 19 1546 datasets (Ab 19) 0.967 0.920
16 VAE Normal 7602 datasets (Normal state) 0.995 0.923

This study used the receiver operating characteristic (ROC) curve to evaluate the performance of the network. The ROC curve, created by plotting the true positive rate (i.e., the ratio of correctly predicted positive observations to all observations in the actual class) against the false positive rate (i.e., the ratio of incorrectly predicted negative observations to all observations in the actual class), is a useful method of interpreting the performance of a binary classifier. The area under the ROC curve (AUC) is an effective measure for summarizing the overall diagnostic accuracy of the test and is interpreted as the average value of sensitivity for all possible values of specificity. It can take any value between 0 and 1, where a value of 0 indicates a perfectly inaccurate test; the closer the AUC is to 1, the better the overall diagnostic performance. In general, an AUC of 0.5 suggests no discrimination, 0.7–0.8 is considered acceptable, 0.8–0.9 is considered excellent, and more than 0.9 is considered outstanding (Bergstra and Bengio, 2012; Mandrekar, 2010; Park et al., 2004). The threshold was determined as 0.923 using Eq. (14). Fig. 13 shows the ROC curve and AUC of the LSTM-VAE.

4.2.3. Event diagnosis function
The event diagnosis function was developed using LSTM. A total of 409 scenarios (i.e., 192,637 datasets for 139 parameters) were trained. To optimize the LSTM network in the event diagnosis function, a manual search method was used while individually adjusting the hyperparameters (e.g., input sequence length, batch size, and number of layers); in general, there is no golden rule for determining the hyperparameters that optimize a network (Ruder, 2016; Bergstra and Bengio, 2012; Snoek et al., 2012). Table 3 shows the accuracy comparison of the differently configured networks, where accuracy is defined as the ratio of correctly predicted data to the total verification data. Consequently, an optimal LSTM network with a 10 s time step, a batch size of 32, and 2 layers was selected. This network used the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 0.0001 and was trained for 100 epochs. Fig. 14 illustrates the event diagnosis function using LSTM and softmax.

4.2.4. Confirmation of diagnosis result function
To implement this function, an LSTM-VAE library was developed. This library comprises 16 LSTM-VAE networks, one trained for each known event (i.e., 15 abnormal events and 1 normal state). Fig. 15 illustrates the confirmation of diagnosis result function (i.e., when the Ab 01 event is diagnosed). Each LSTM-VAE network and its threshold were composed and determined in a manner similar to the identification of unknown event function. Table 4 shows the training dataset, AUC, and threshold of each LSTM-VAE network in the confirmation of diagnosis result function.

4.3. Verification

Verifications were performed for 149 scenarios (i.e., 57,309 datasets). Among them, 100 cases (47,987 datasets) were trained events, while 49 cases (9322 datasets) were untrained events. As a result of the verification, the accuracy of the network for unknown event identification (i.e., predicting a trained event as a known event and an untrained event as an unknown event) is 96.72%. In addition, the accuracy of the network for confirmation of diagnosis results is 98.44%. The verification demonstrated that the algorithm could successfully diagnose the trained events and identify the untrained events. Fig. 16 shows an example of the diagnosis of an untrained event (i.e., Ab 20), a leakage from the steam generator tubes. In this case, the identification of unknown event function identifies the event as untrained because the reconstruction error exceeds the threshold; thus, the message "Unknown Event" is provided.

Fig. 17 illustrates how the algorithm diagnoses a trained event (i.e., Ab 08), the continuous insertion of a control rod. The input preprocessing function normalizes the plant parameters. Subsequently, the identification of unknown event function identifies the current situation as a trained event. The event diagnosis function diagnoses the event as Ab 08, and the confirmation of diagnosis result function confirms that the diagnosis is correct. Finally, "Ab 08: continuous insertion of control rod" is presented as the diagnosis result for the current situation.
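The AUC values reported in Table 4 can be reproduced with a generic rank-based estimator; the sketch below is not the authors' code, only an equivalent definition of AUC (the probability that a randomly chosen positive outranks a randomly chosen negative):

```python
import numpy as np

def roc_auc(scores, labels):
    """Rank-based AUC: fraction of positive/negative pairs where the
    positive example receives the higher score (ties count as half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

With anomaly scores as `scores` and "is an unknown event" as `labels`, an AUC near 1 means thresholds like those in Table 4 separate known from unknown cases cleanly.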

Fig. 16. Process of diagnosing an untrained event.
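The decision illustrated in Fig. 16 reduces to comparing the LSTM-VAE reconstruction error with the unknown-event threshold (0.923, from Section 4.2.2); a short paraphrase with an illustrative error value:

```python
UNKNOWN_THRESHOLD = 0.923  # threshold from Eq. (14), Section 4.2.2

def is_unknown_event(reconstruction_error, threshold=UNKNOWN_THRESHOLD):
    """Flag the situation as untrained ("Unknown Event") when the
    LSTM-VAE reconstruction error exceeds the threshold."""
    return reconstruction_error > threshold

# Illustrative error value only; the real score comes from the LSTM-VAE
message = "Unknown Event" if is_unknown_event(1.2) else "Known Event"
```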


Fig. 17. Process of diagnosing a trained event.


5. Result and discussion

Many other AI-based diagnostic algorithms have used noise-free data, whereas in our algorithm ±5% Gaussian noise was added to the simulator data, as actual NPP signals generally contain some noise. In the previous section, the proposed algorithm showed good performance on the noise-added data. We also compared the performance of the algorithm between the noise-added and noise-free data. Tables 5 and 6 show the accuracy comparison of the LSTM networks and the AUC comparison of the VAE networks for the two datasets, respectively. The comparison indicates that the algorithm proposed in this study shows similar performance for noise-added and noise-free data.

Table 5
Accuracy comparison between LSTM networks.

Data type                          Accuracy
Original CNS data without noise    0.9735
CNS data with noise                0.9768

Table 6
AUC comparison between VAE networks.

No.   Name          AUC (original CNS data)   AUC (CNS data with noise)
1     VAE Unknown   0.973                     0.981
2     VAE Ab 01     0.987                     0.995
3     VAE Ab 02     0.931                     0.934
4     VAE Ab 04     0.993                     0.993
5     VAE Ab 05     0.998                     0.998
6     VAE Ab 06     0.974                     0.974
7     VAE Ab 07     0.998                     0.999
8     VAE Ab 08     0.959                     0.958
9     VAE Ab 09     0.956                     0.955
10    VAE Ab 10     0.977                     0.985
11    VAE Ab 11     0.937                     0.938
12    VAE Ab 12     0.995                     0.996
13    VAE Ab 16     0.954                     0.965
14    VAE Ab 17     0.952                     0.953
15    VAE Ab 18     0.982                     0.984
16    VAE Ab 19     0.969                     0.967
17    VAE Normal    0.995                     0.995

In this study, VAEs were applied to compensate for the disadvantages of AEs. We compared the performance of the algorithm between the AE network and the VAE network for the identification of unknown event function. The confusion matrix and the accuracy of the verification results were used to evaluate the accuracy and validity of the algorithms; the confusion matrix shows the diagnostic accuracy for the trained and untrained data. Figs. 18 and 19 show the confusion matrices of the AE network and the VAE network for the identification of unknown event function, respectively. The comparison indicates that the AE and VAE networks show similar accuracy for the trained data. However, for the untrained data, the VAE network shows more than 20% higher accuracy than the AE network (i.e., AE: 70.01% and VAE: 92.37%).

Fig. 18. Confusion matrix of the AE network for identification of unknown event function.


Fig. 19. Confusion matrix of the VAE network for identification of unknown event function.
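The quoted accuracies can be recovered from a 2×2 confusion matrix such as those in Figs. 18 and 19; the sketch below uses toy labels, not the actual verification counts:

```python
import numpy as np

def confusion_matrix_2x2(y_true, y_pred):
    """Rows: actual class (known=0, unknown=1); columns: predicted class."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Overall accuracy: correctly classified cases over all cases."""
    return np.trace(cm) / cm.sum()
```

Comparing the per-row accuracies of the AE and VAE matrices isolates where the VAE gains: the "unknown" row, i.e., the untrained events.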

6. Conclusion

In this study, an algorithm that uses LSTM and VAE networks to diagnose abnormal situations in NPPs was proposed. The proposed algorithm has the capability of finding unknown situations, diagnosing known situations, and confirming the results. For a more realistic evaluation, noise-added signals were also considered. The validation demonstrated that the proposed algorithm could provide correct diagnostic results as intended. This algorithm can be applied to an operator support system to improve the operator's situation awareness during abnormal situations in NPPs.

However, further work remains before this approach can be implemented as an operator support system in actual NPPs. First, an operator support system needs to provide information about how the system reaches a diagnostic result as well as the event identified. One solution to this "black box" problem is applying explainable AI technologies. Second, the capability to detect the failure of sensors or signals is necessary. Since the correctness of support systems largely relies on the integrity and quality of the signals from the NPP, wrong signals may directly lead to wrong information for operators. Therefore, an algorithm to identify signal failures needs to be studied.

CRediT authorship contribution statement

Hyojin Kim: Conceptualization, Software, Visualization, Writing - Original draft, Data curation, Investigation, Formal analysis. Awwal Mohammed Arigi: Writing - Review & Editing, Validation. Jonghyun Kim: Supervision, Project administration, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (2018M2B2B1065651) and by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT & Future Planning (N01190021-06).

References

Alemi, A.A., Fischer, I., Dillon, J.V., Murphy, K., 2019. Deep variational information bottleneck. 5th Int. Conf. Learn. Represent. ICLR 2017 - Conf. Track Proc., 1–19.
An, J., Cho, S., 2015. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE 2 (1).
Bae, H., Chun, S.P., Kim, S., 2006. Predictive fault detection and diagnosis of nuclear power plant using the two-step neural network models. Lect. Notes Comput. Sci. 3973 LNCS, 420–425. https://doi.org/10.1007/11760191_62
Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning. Springer.
Chen, R.-Q., Shi, G.-H., Zhao, W.-L., Liang, C.-H., 2019. Sequential VAE-LSTM for anomaly detection on time series, 1–7.


Embrechts, M.J., Benedek, S., 2004. Hybrid identification of nuclear power plant transients with artificial neural networks. IEEE Trans. Ind. Electron. 51, 686–693. https://doi.org/10.1109/TIE.2004.824874
Fantoni, P.F., Mazzola, A., 1996. A pattern recognition-artificial neural networks based model for signal validation in nuclear power plants. Ann. Nucl. Energy 23, 1069–1076. https://doi.org/10.1016/0306-4549(96)84661-5
Hsieh, M.H., Hwang, S.L., Liu, K.H., Liang, S.F.M., Chuang, C.F., 2012. A decision support system for identifying abnormal operating procedures in a nuclear power plant. Nucl. Eng. Des. 249, 413–418. https://doi.org/10.1016/j.nucengdes.2012.04.009
Kim, Y., Jung, W., 2013. A diagnosis support system for abnormal situations of Hanbit units. Transactions of the Korean Nuclear Society Autumn Meeting.
Kingma, D.P., Ba, J.L., 2015. Adam: A method for stochastic optimization. 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., 1–15.
Kingma, D.P., Mohamed, S., Rezende, D.J., Welling, M., 2014. Semi-supervised learning with deep generative models. Advances in Neural Information Processing Systems, 3581–3589.
Lee, D., Arigi, A.M., Kim, J., 2020. Algorithm for autonomous power-increase operation using deep reinforcement learning and a rule-based system. IEEE Access. https://doi.org/10.1109/access.2020.3034218
Lipton, Z.C., Kale, D.C., Elkan, C., Wetzel, R., 2016. Learning to diagnose with LSTM recurrent neural networks. 4th Int. Conf. Learn. Represent. ICLR 2016 - Conf. Track Proc., 1–18.
Mandrekar, J., 2010. Receiver operating characteristic curve in diagnostic test assessment. J. Thorac. Oncol. 5 (9), 1315–1316. https://doi.org/10.1097/JTO.0b013e3181ec173d
Monner, D., Reggia, J.A., 2012. A generalized LSTM-like training algorithm for second-order recurrent neural networks. Neural Networks 25, 70–83. https://doi.org/10.1016/j.neunet.2011.07.003
Murphy, K.P., 2012. Machine Learning: A Probabilistic Perspective. MIT Press.
Na, M.G., Park, W.S., Lim, D.H., 2008. Detection and diagnostics of loss of coolant accidents using support vector machines. IEEE Trans. Nucl. Sci. 55, 628–636. https://doi.org/10.1109/TNS.2007.911136
Ohga, Y., Seki, H., 1993. Abnormal event identification in nuclear power plants using a neural network and knowledge processing. Nucl. Technol. 101, 159–167. https://doi.org/10.13182/NT93-A34777
Park, D., Hoshi, Y., Kemp, C.C., 2018. A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot. Autom. Lett. 3, 1544–1551. https://doi.org/10.1109/LRA.2018.2801475
Park, J., Lee, D., Jung, W., Kim, J., 2017. An experimental investigation on relationship between PSFs and operator performances in the digital main control room. Ann. Nucl. Energy 101, 58–68. https://doi.org/10.1016/j.anucene.2016.10.020
Park, J., Arigi, A.M., Kim, J., 2019. A comparison of the quantification aspects of human reliability analysis methods in nuclear power plants. Ann. Nucl. Energy 133, 297–312. https://doi.org/10.1016/j.anucene.2019.05.031
Park, S.H., Goo, J.M., Jo, C.H., 2004. Receiver operating characteristic (ROC) curve: practical review for radiologists. Korean J. Radiol. 5, 11–18. https://doi.org/10.3348/kjr.2004.5.1.11
Ruder, S., 2016. An overview of gradient descent optimization algorithms, 1–14.
Şeker, S., Ayaz, E., Türkcan, E., 2003. Elman's recurrent neural network applications to condition monitoring in nuclear power plant and rotating machinery. Eng. Appl. Artif. Intell. 16, 647–656. https://doi.org/10.1016/j.engappai.2003.10.004
Shewhart, W.A., 1931. Economic Control of Quality of Manufactured Product. Macmillan and Co Ltd, London.
Snoek, J., Larochelle, H., Adams, R.P., 2012. Practical Bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst., 2951–2959.
Tishby, N., Zaslavsky, N., 2015. Deep learning and the information bottleneck principle. 2015 IEEE Inf. Theory Workshop (ITW). https://doi.org/10.1109/ITW.2015.7133169
Uhrig, R.E., Hines, J., 2005. Computational intelligence in nuclear engineering. Nucl. Eng. Technol. 37 (2), 127–138.
Wang, W., Yang, M., Seong, P.H., 2016. Development of a rule-based diagnostic platform on an object-oriented expert system shell. Ann. Nucl. Energy 88, 252–264. https://doi.org/10.1016/j.anucene.2015.11.008
Xu, H., Chen, W., Zhao, N., Li, Zeyan, Bu, J., Li, Zhihan, Liu, Y., Zhao, Y., Pei, D., Feng, Y., Chen, J., Wang, Z., Qiao, H., 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. Proc. World Wide Web Conf. (WWW 2018), 187–196. https://doi.org/10.1145/3178876.3185996
Yang, J., Kim, J., 2018. An accident diagnosis algorithm using long short-term memory. Nucl. Eng. Technol. 50, 582–588.
Yang, J., Kim, J., 2020. Accident diagnosis algorithm with untrained accident identification during power-increasing operation. Reliab. Eng. Syst. Saf. 202. https://doi.org/10.1016/j.ress.2020.107032
Zhou, C., Paffenroth, R.C., 2017. Anomaly detection with robust deep autoencoders. Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 665–674. https://doi.org/10.1145/3097983.3098052
