Article history: Received 31 May 2021; Revised 8 July 2021; Accepted 20 July 2021; Available online 31 July 2021

Keywords: Remaining useful lifetime estimation; Encoder-decoder networks; Bidirectional long short-term memory; Attention mechanism

Abstract: We propose a novel sequence-to-sequence prediction approach for the estimation of the remaining useful lifetime (RUL) of technical components. The approach is based on deep recurrent neural network structures, namely bidirectional Long Short Term Memory (LSTM) networks, which we augment with an attention mechanism to allow for a more fine-grained information flow between the input and output sequence. Using the base architecture as a reference, we experiment with various forms of attention mechanisms as well as different forms of additional input embeddings. Further, we analyse the impact of the sequence length on the estimation quality. We apply our approach to the well-known C-MAPSS data set, which has previously served as a benchmark dataset for RUL prediction. We obtain state-of-the-art results on the data set and provide a thorough hyperparameter study which underlines that a simpler but well-tuned architecture can achieve comparable or better performance than highly complex architectures.

© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
https://doi.org/10.1016/j.iswa.2021.200049
Neural networks have been researched in different application fields including modelling Esen et al. (2008); Hikmet Esen et al. (2009, 2017), forecasting Hikmet Esen et al. (2008a,b), machine health monitoring Zhao et al. (2019a) and RUL estimation. Mainly, convolutional neural networks (CNN) Yann LeCun et al. (1990) and recurrent neural networks (RNN) such as long short term memories (LSTM) Hochreiter and Schmidhuber (1997) and gated recurrent units (GRU) Junyoung Chung et al. (2014) have provided more dynamic and efficient ways of performing RUL estimations.

This paper introduces a novel deep learning architecture for remaining useful lifetime estimation that operates directly on raw data sets. Notably, we propose to apply sequence-to-sequence architectures Sutskever et al. (2014), best known from natural language processing tasks, to extract the complex degradation pattern inherent to RUL estimation from a machine's sensor readings. To this end, we opt for a bidirectional LSTM as the base network, which we operate in an encoder-decoder (ED) style manner. Following the recent improvements to encoder-decoder architectures provided by attention mechanisms, we augment the architecture with different attention modules differing mainly in the mapping of input and hidden variables to the attention keys and queries. Further, we provide a detailed discussion on the impact of different ED architecture hyperparameters on the RUL estimation performance and the input encoding required for the sequence processing. The approach is applied to the well-known C-MAPSS benchmark dataset Saxena et al. (2008) and compared with other methods. The results obtained with the proposed architecture are superior to or on par with current state-of-the-art approaches, underlining the approach's applicability. Particularly, we show that even simpler architectures can result in comparable or better performance than more complex architectures with a high number of parameters.

The contributions of the paper can be summarised as follows:

• We present a novel approach for remaining useful lifetime estimation based on attention-augmented sequence-to-sequence neural networks using bidirectional LSTMs. The sequence-to-sequence framework appears highly suitable to cover the inherent nonlinear trend analysis problem underlying RUL estimations.
• A novel Shared Kernel Convolution (SKC) Neural Network approach as an encoder to the bidirectional LSTM has been proposed to model the inter-dependencies among the different input variables.
• A universally applicable sliding window approach for target sequence generation has been presented to better represent the health status of a system.
• We discuss architectural variations, including different attention mechanisms operating on the hidden representation of the bidirectional LSTM and various types of input embeddings. Further, we analyse the impact of the sequence length on the prediction accuracy and provide a thorough analysis of hyperparameters, resulting in guidelines on architectural components' suitability.
• We apply the approach to the well-known C-MAPSS data set, which has served as a very challenging benchmark in previous works. We report state-of-the-art results on the data set with comparably lightweight and straightforward architectures.

The paper is organised as follows. Section 2 discusses work related to our approach. Section 3 presents preliminaries concerning the considered neural network structures. Section 4 presents the novel approach for RUL estimation using RNNs, followed by the description of the training data set generation for the sequence-to-sequence NN in Section 5 and the dataset description in Section 6. In Section 7 we provide results on the C-MAPSS data set. Section 8 concludes the paper.

2. Related work

We present an overview on remaining useful lifetime estimation methods with particular emphasis on the recently prevalent deep learning approaches, while we refer to Jardine et al. (2006); Tsui et al. (2015) for overviews of classical techniques. A review of the application of deep learning approaches for prognostics and health management is presented in Khan and Yairi (2018).

Notably, two types of Deep Neural Networks (DNN) have been used in RUL estimation, namely RNNs and CNNs. Additionally, various architectures combining both network types have been proposed in the literature. In Zheng et al. (2017), a stacked LSTM network combined with multiple fully connected (FC) layers has been presented, while in Wang et al. (2018) a stacked bidirectional LSTM network with additional FC layers for RUL prediction of the C-MAPSS dataset is shown. Alternatively, Zhao et al. (2019b) first construct trend features that are fed to a stacked LSTM network for predicting RULs. The approach proposed in Listou Ellefsen et al. (2019) introduces an unsupervised pre-training stage using a Restricted Boltzmann Machine (RBM) to extract complex raw input features and subsequently employs a genetic algorithm to tune the hyperparameters. Following the pre-training, supervised training is performed for RUL prediction. A vanilla LSTM neural network model along with dynamic differential technology to obtain good RUL prediction accuracy is proposed in Wu et al. (2018). A bidirectional LSTM model for RUL estimation of the C-MAPSS dataset is presented in Elsheikh et al. (2019).

To further strengthen RNN-based encoder-decoder layers' learning abilities, attention mechanisms have been proposed in Dzmitry Bahdanau et al. (2015) and Luong et al. (2015) to allow each decoder state to attend to all the encoder hidden states before generating the next output. Please refer to Chaudhari et al. (2019b) for a detailed overview of the different types of attention methods. While these approaches concentrate on machine translation tasks, Chen et al. (2021) use such an attention mechanism for RUL prediction, where handcrafted features are extracted, concatenated with the LSTM output and fed to a regression layer. The attention-based LSTM encoder-decoder network proposed in Ragab et al. (2020) reconstructs the input data and generates the RUL using a parallel RUL predictor layer. The RUL predictor layer receives dual latent feature representations, i.e., the attended encoder latent representations and the latent decoder representations, as inputs.

A survey of 1D CNN models for condition monitoring in machines is presented in Kiranyaz et al. (2021). The first use of CNNs for RUL estimation dates back to Sateesh Babu et al. (2016), while Li et al. (2018) perform the convolution operation with kernels of unit width. These unit-width convolution kernels allow for kernel weight sharing across raw sensors, thereby enhancing the network's ability to learn abstract feature information. Recently, a generalized dilation CNN methodology for RUL estimation has been proposed in Gavneet Singh Chadha et al. (2021) to model long-term time dependencies. An attention-based CNN approach is proposed in Tan and Teo (2021), where the CNN extracts features across multiple temporal axes that are fed to an attention layer to predict the RUL. The work in Tan and Teo (2021) replaces the Softmax activation with a Sigmoid activation in the attention mechanism to add multivariate RUL estimation features. The global Luong concatenation method to calculate attention alignment scores is proposed in Paulo Roberto De Oliveira Da Costa et al. (2020) for RUL prediction on the C-MAPSS dataset. An attention CNN-LSTM architecture is introduced in Zhang et al. (2017), where a CNN is used for raw input feature extraction before feeding into a stacked LSTM network. The LSTM output from all time-steps is attended
single output feature map is given by Gu et al. (2018),

$$Z_i^f = [W^T \cdot x_{i:i+K_L-1} + b]_f \quad (8)$$

where $W$ is the kernel weight from the input to the output feature map with $W \in \mathbb{R}^{N_{fm}^{out} \times N_{fm}^{in} \times K_L \times K_W}$, and $b \in \mathbb{R}^{N_{fm}^{out}}$ is the bias. In the case of the SK 2D-CNN with one input channel, $W \in \mathbb{R}^{N_{fm}^{out} \times 1 \times K_L \times 1}$. For $n = 0, 1, \ldots, N$ output feature maps, the final convolved output is given by

$$C_{out}^{0:N} = [Z_i^0, Z_i^1, Z_i^2, \ldots, Z_i^F]^{0:N} \quad (9)$$

The convolution output is flattened and passed through a fully connected (FC) layer with weight matrix $W_{flatten} \in \mathbb{R}^{(N_{fm} \cdot N_{tw} \cdot N_{ft}) \times (N_{tw} \cdot N_{ft})}$ in order to combine all the extracted information across all feature maps. The FC output is then reshaped to create a vector of the original input size, i.e., $\mathbb{R}^{N_{tw} \times N_{ft}}$.

4. Sequence-to-sequence Neural Network based RUL estimation

In this section, we present the novel sequence-to-sequence neural network architecture for RUL estimation. To this end, we first give an overview of the proposed architecture and subsequently explain the individual components in detail.

4.1. Overview on the sequence-to-sequence architecture

Fig. 4 shows the architecture of the novel proposed sequence-to-sequence prediction based neural network. At its core, we propose an encoder-decoder structure where a bidirectional LSTM represents the encoder, while the decoder is composed of a unidirectional LSTM and a fully connected layer that forms the output of the network, predicting a sequence of RUL estimation values. Additionally, we propose using an attention mechanism that augments the network with relevance weights between the information on each decoder time-step and the encoder's entire hidden state representation. We experiment with two types of encoder-decoder attention mechanisms, namely Dzmitry Bahdanau et al. (2015) and Luong et al. (2015). Optionally, we further add an input encoding provided by a shared kernel CNN as described in Section 3.3. In the following, we discuss the components of the architecture in detail.

4.2. Encoder-Decoder sequence-to-sequence predictor

As the core architecture of our proposed Sequence-to-Sequence RUL Predictor, we employ an LSTM-based encoder-decoder (LSTM ED) structure. The ability to map sequences of variable input and output lengths, i.e., $T_i \neq T_o$, is exploited in the model shown in Fig. 4. The encoder is a BiLSTM network where the initial hidden and cell states are initialized with zeros. After processing the input information in a recurrent manner, the final hidden states $\overrightarrow{h}_T$ and $\overleftarrow{h}_T$ from both directions are concatenated to form a latent representation $s_0$. For each direction, $h_T \in \mathbb{R}^{L \times d_e}$, where $L$ denotes the number of stacked LSTM layers ($L = 1$ in our case) and $d_e$ denotes the encoder hidden dimension. Therefore, the combined latent representation is $s_0 \in \mathbb{R}^{1 \times 2d_e}$. The complex degradation pattern information from the input sequence is encoded in this latent representation, which is then decoded and mapped into the corresponding RUL points. Hence, the hidden state of the first LSTM cell in the decoder is initialized by this encoded latent representation. The input to the first decoder LSTM cell is a tensor of zeros, which serves as a start-of-sequence <SOS> token. The output from every decoder LSTM cell is passed through an FC layer with weight $W \in \mathbb{R}^{d_d \times 1}$, where $d_d$ is the decoder hidden dimension and 1 is the RUL dimension. The output is the predicted RUL at that decoder timestep, and it is passed on as the input to the next decoder LSTM cell. This recurrent operation continues until outputs from all of the pre-defined number of decoder LSTM cells, $T_o$, have been generated. The FC weights and biases are shared across all decoder LSTM timesteps.
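To make the data flow described above concrete, the following minimal sketch implements such an LSTM ED predictor. It assumes PyTorch and sets $d_d = 2d_e$ so that the concatenated encoder states can initialise the decoder directly; all class, function and variable names are illustrative and do not reproduce the authors' implementation.

```python
# A minimal sketch of the LSTM encoder-decoder (LSTM ED) sequence RUL predictor
# described above. Assumes PyTorch; names and default sizes are illustrative.
import torch
import torch.nn as nn

class LSTMEncoderDecoderRUL(nn.Module):
    def __init__(self, n_features: int, d_e: int = 32, t_out: int = 10):
        super().__init__()
        self.t_out = t_out
        d_d = 2 * d_e  # decoder hidden size matches the concatenated encoder states
        # BiLSTM encoder; hidden and cell states are implicitly initialised with zeros.
        self.encoder = nn.LSTM(n_features, d_e, batch_first=True, bidirectional=True)
        # Unidirectional decoder cell; its input is the previous RUL prediction (dim 1).
        self.decoder = nn.LSTMCell(input_size=1, hidden_size=d_d)
        # Shared FC layer mapping each decoder hidden state to a scalar RUL value.
        self.fc_out = nn.Linear(d_d, 1)

    def forward(self, x):                        # x: (batch, T_i, n_features)
        _, (h_T, c_T) = self.encoder(x)          # h_T, c_T: (2, batch, d_e)
        # Concatenate forward and backward final states -> s_0 with dimension 2*d_e.
        s = torch.cat([h_T[0], h_T[1]], dim=-1)
        c = torch.cat([c_T[0], c_T[1]], dim=-1)
        y_prev = x.new_zeros(x.size(0), 1)        # zero tensor as the <SOS> input
        outputs = []
        for _ in range(self.t_out):               # autoregressive decoding of T_o RULs
            s, c = self.decoder(y_prev, (s, c))
            y_prev = self.fc_out(s)               # predicted RUL fed back as next input
            outputs.append(y_prev)
        return torch.cat(outputs, dim=1)          # (batch, T_o) RUL sequence
```

For example, `LSTMEncoderDecoderRUL(n_features=14)(torch.randn(8, 30, 14))` returns an `(8, 10)` tensor, i.e. one sequence of $T_o = 10$ RUL values per input window; the number of sensors and the window length in this call are arbitrary.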
4.3. Shared CNN encoder-decoder sequence-to-sequence predictor

As the multivariate time series input data results in a potentially high-dimensional encoder hidden dimension, we optionally augment the encoder BiLSTM with an additional input embedding provided by a 2D-CNN architecture with shared kernels (SKC) as introduced in Section 3.3. The combined model is referred to in this work as the Shared-Kernel Convolutional LSTM Encoder-Decoder (SKC LSTM ED) sequence RUL predictor and is shown in Fig. 5.

The CNN layer potentially extracts complex features from the raw input before feeding them to the BiLSTM encoder. The SKC layer kernel is of length $K_L$ and unit width, $K_W = 1$. The number of feature maps is $N_{fm}$. A non-linear Leaky ReLU activation is applied after the CNN layer. Zero padding of appropriate length is applied along each feature map's temporal edges to prevent information loss at the sliding kernels. The CNN layer's output is flattened to rearrange the information from the extracted feature maps into a single column. The flattened tensor is then passed through an FC layer with weight $W \in \mathbb{R}^{(N_{fm} \cdot N_{tw} \cdot N_{ft}) \times (N_{tw} \cdot N_{ft})}$, where $N_{tw}$ and $N_{ft}$ are the number of timesteps and the number of features, respectively, in the input feature map.
The output is then reshaped to a form that resembles the shape of the input feature map. This is done to enable the bidirectional LSTM encoder to perform the recurrent operation on an equivalent number of input time-steps and features as in the raw input tensor.
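A minimal sketch of this SKC input embedding, again assuming PyTorch with illustrative names, is given below; the returned tensor has the same shape as the raw input window and can be fed directly to the BiLSTM encoder.

```python
# Shared-kernel convolution (SKC) input embedding as described in Sections 3.3 and 4.3.
# Minimal sketch, assuming PyTorch; names are illustrative.
import torch
import torch.nn as nn

class SKCEmbedding(nn.Module):
    def __init__(self, n_tw: int, n_ft: int, n_fm: int = 8, k_l: int = 3):
        super().__init__()
        # One input channel; kernel of length k_l along time and width 1 across features,
        # so the same kernel weights are shared over all raw sensor channels.
        # Zero padding (k_l // 2) preserves the temporal length for odd k_l (3 or 5).
        self.conv = nn.Conv2d(1, n_fm, kernel_size=(k_l, 1), padding=(k_l // 2, 0))
        self.act = nn.LeakyReLU()
        # FC layer combining all feature maps back into the original input size.
        self.fc = nn.Linear(n_fm * n_tw * n_ft, n_tw * n_ft)
        self.n_tw, self.n_ft = n_tw, n_ft

    def forward(self, x):                        # x: (batch, N_tw, N_ft)
        z = self.act(self.conv(x.unsqueeze(1)))  # (batch, N_fm, N_tw, N_ft)
        z = self.fc(z.flatten(start_dim=1))      # combine all extracted feature maps
        return z.view(-1, self.n_tw, self.n_ft)  # reshape to the raw input shape
```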
4.4. Attention mechanism

The attention mechanism in deep neural networks refers to "attending to" certain parts of the input when generating the output. As discussed in the previous sections, vanilla RNN or LSTM structures use the input data's temporal dynamics and map them to sequential output data. However, there is still the question of relevance between the output generated at a particular time-step and the input sequence used to generate that output. Even though the LSTM reduces the effects of vanishing and exploding gradients for very long sequences, it does not eliminate them entirely. Moreover, NN architectures such as RNN, LSTM or CNN can fall short of processing highly complex feature representations to generate accurate outputs. The attention mechanism counters this problem by finding the relevance between the output data it needs to generate and the encoder's hidden state representations. The attention structure creates a context vector from the encoder hidden states. It then allows the decoder to use the context vector to generate a more precise and relevant output Chaudhari et al. (2019a); Dzmitry Bahdanau et al. (2015); Luong et al. (2015); Swati Meshram (2019). Hence, attention structures are simply an addition to the existing encoder-decoder networks. The different types of attention methodologies and their use in this work are detailed in the following sections.

4.4.1. Bahdanau Attention

The attention model proposed by Dzmitry Bahdanau et al. (2015) calculates the decoder output $\hat{y}_j$ at any time $j$ based on the $i = 1, 2, \ldots, T$ encoder timesteps as follows:

1. Alignment Score, $S_{ij}$:
$$S_{ij} = W_{align} \cdot \tanh\left([(W_h \cdot \bar{h}_i); (W_s \cdot s_{j-1})]\right) \quad (10)$$
where $W_h \in \mathbb{R}^{d_e \times d_d}$ and $W_s \in \mathbb{R}^{d_d \times d_d}$ are the weight matrices for all encoder hidden states $\bar{h}_i$ and the corresponding previous decoder hidden state $s_{j-1}$, respectively, and $\tanh$ is the hyperbolic tangent function with $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$.

2. Attention Weights, $W_{ij}$:
$$W_{ij} = \mathrm{Softmax}(S_{ij}) \quad (11)$$
where
$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \quad (12)$$

3. Context Vector, $C_{ij}$:
$$C_{ij} = W_{ij}^T \cdot \bar{h}_i \quad (13)$$

4. Attention Combined, $I_{decoder,j-1}$: This step performs a weighted concatenation with weight matrix $W_{applied} \in \mathbb{R}^{(d_d + d_{RUL}) \times d_d}$ between the context vector $C_{ij}$ and the decoder output from the previous timestep, $\hat{y}_{j-1}$:
$$I_{decoder,j-1} = W_{applied} \cdot [C_{ij}; \hat{y}_{j-1}] \quad (14)$$
And finally,
$$\hat{y}_j = \mathrm{LSTM}(I_{decoder,j-1}, s_{j-1}) \quad (15)$$
where LSTM is the decoder in the model.

It can be deduced from Eqs. (10)–(15) that the Bahdanau attention model computes the relevance of all encoder hidden states $\bar{h}_i$ w.r.t. the previous decoder hidden state $s_{j-1}$ to decide the input $I_{decoder,j-1}$ for the current decoder timestep. This input is passed through the decoder LSTM cell at that timestep to generate an output $\hat{y}_j$. Moreover, this method deploys a weighted additive alignment technique Dzmitry Bahdanau et al. (2015). These are the key differences between this attention technique and the Luong attention model, which is detailed in the next section.
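The following sketch expresses one decoding step of Eqs. (10)–(15) in code. It assumes PyTorch and, for brevity, that the encoder state dimension equals the decoder hidden dimension; all module names are illustrative rather than the authors' implementation.

```python
# One additive (Bahdanau-style) attention decoding step, following Eqs. (10)-(15).
# Minimal sketch, assuming PyTorch; dimension handling is simplified.
import torch
import torch.nn as nn

class BahdanauAttentionStep(nn.Module):
    def __init__(self, d_enc: int, d_dec: int):
        super().__init__()
        self.W_h = nn.Linear(d_enc, d_dec, bias=False)    # projects encoder states
        self.W_s = nn.Linear(d_dec, d_dec, bias=False)    # projects previous decoder state
        self.w_align = nn.Linear(2 * d_dec, 1, bias=False)
        self.W_applied = nn.Linear(d_enc + 1, d_dec)      # combines context and previous RUL
        self.decoder_cell = nn.LSTMCell(d_dec, d_dec)

    def forward(self, enc_states, s_prev, c_prev, y_prev):
        # enc_states: (batch, T_i, d_enc); s_prev, c_prev: (batch, d_dec); y_prev: (batch, 1)
        s_rep = self.W_s(s_prev).unsqueeze(1).expand(-1, enc_states.size(1), -1)
        scores = self.w_align(torch.tanh(
            torch.cat([self.W_h(enc_states), s_rep], dim=-1))).squeeze(-1)   # Eq. (10)
        weights = torch.softmax(scores, dim=-1)                              # Eqs. (11)-(12)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)     # Eq. (13)
        dec_in = self.W_applied(torch.cat([context, y_prev], dim=-1))        # Eq. (14)
        s_new, c_new = self.decoder_cell(dec_in, (s_prev, c_prev))           # Eq. (15)
        return s_new, c_new, weights
```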
4.4.2. Luong attention

The Luong attention model deploys an approach similar to Bahdanau attention, which considers all encoder hidden states to calculate the context vector and attention weights Luong et al. (2015), also called the global attention method. However, this model also suggests using a more localised approach, where the context and attention weights are computed by focusing on a windowed segment of the encoder hidden states. Such attention is known as local attention or window-based attention, since the mechanism deploys a window of a certain length upon the encoder states, depending upon a position token $p_i$ received from the current decoder time-step. The selection of the window length and its centre can be made in two ways Luong et al. (2015):
• Monotonic Alignment: A fixed window length is selected where the centre point $p_i$ depends upon the current decoder position. The window moves along with the progress of each decoder time-step to the next. For an encoder with $i$ timesteps and a decoder with $j$ timesteps, the window length $L_{window}$ in the local attention models in this work is calculated as
$$L_{window} = i - j + 1 \quad (16)$$
• Predictive Alignment: This method predicts the alignment window by applying a Gaussian distribution centred around $p_t$. Predictive alignment is not applied in this work.

Regardless of global or local attention, the Luong attention model also follows the four previously mentioned steps, but in a different manner. For simplicity, the steps are shown below for global attention Luong et al. (2015).

1. Alignment Score, $S_{ij}$: The alignment score $S_{ij}$ in the Luong attention model can be calculated by any of the following three methods:
$$S_{ij} = \begin{cases} \bar{h}_i \cdot s_j, & \text{Dot} \\ W_h(\bar{h}_i) \cdot s_j, & \text{General} \\ \tanh(W_{align} \cdot [\bar{h}_i; s_j]), & \text{Concatenate} \end{cases} \quad (17)$$
where $W_h \in \mathbb{R}^{d_e \times d_d}$, $\bar{h}_i$ denotes the complete encoder hidden states and $s_j$ is the current decoder hidden state.

2. Attention Weights, $W_{ij}$:
$$W_{ij} = \mathrm{Softmax}(S_{ij}) \quad (18)$$
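The three alignment variants of Eq. (17) can be sketched as follows, assuming PyTorch and equal encoder and decoder dimensions for brevity; `W_h`, `W_align` and the reducing vector `v_align` (added here so that the concatenation variant yields a scalar score) are illustrative linear layers, not the authors' exact parameterisation.

```python
# Luong alignment scores (Eq. (17)) and attention weights (Eq. (18)).
# Minimal sketch, assuming PyTorch; purely illustrative.
import torch
import torch.nn as nn

def luong_scores(enc_states, s_j, method="dot", W_h=None, W_align=None, v_align=None):
    # enc_states: (batch, T_i, d) encoder hidden states; s_j: (batch, d) decoder state.
    if method == "dot":                                       # h_i . s_j
        return torch.bmm(enc_states, s_j.unsqueeze(-1)).squeeze(-1)
    if method == "general":                                   # W_h(h_i) . s_j
        return torch.bmm(W_h(enc_states), s_j.unsqueeze(-1)).squeeze(-1)
    if method == "concat":                                    # tanh(W_align [h_i ; s_j])
        s_rep = s_j.unsqueeze(1).expand(-1, enc_states.size(1), -1)
        hidden = torch.tanh(W_align(torch.cat([enc_states, s_rep], dim=-1)))
        return v_align(hidden).squeeze(-1)
    raise ValueError(f"unknown alignment method: {method}")

# Illustrative usage with d = 64:
d = 64
enc_states, s_j = torch.randn(4, 30, d), torch.randn(4, d)
scores = luong_scores(enc_states, s_j, "general", W_h=nn.Linear(d, d, bias=False))
weights = torch.softmax(scores, dim=-1)                       # Eq. (18)
```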
It is evident from Eqs. (17)–(21) that the Luong attention model computes the relevance of all encoder hidden states $\bar{h}_i$ w.r.t. the current decoder hidden state $s_j$ and the candidate output $\bar{y}_j$ to decide the output $\hat{y}_j$ for the current decoder timestep. This method also provides the flexibility to deploy multiplicative, weighted multiplicative and weighted additive alignment calculations Luong et al. (2015). The possibility of aligning the decoder representations with a windowed segment of the encoder hidden states reduces the computation cost and focuses more on the relevant input information. Global and local attention techniques, along with the different alignment formulas, have been used in developing the encoder-decoder architectures in this work, and their performances will be compared in the experimental section.

4.5. Attention enhanced encoder-decoder predictor

As previously discussed, the attention mechanism generates relevance weights between the information on each decoder time-step and the encoder's entire hidden state representation. Hence, it allows a stronger emphasis on the beneficial input information, i.e. degradation information, for the corresponding RUL prediction, and improves the prediction performance. We test the proposed models with the two types of encoder-decoder attention mechanisms proposed in Dzmitry Bahdanau et al. (2015) and Luong et al. (2015). In both architectures, the encoder remains the stated BiLSTM network with zero initialisation of the hidden and cell states. The final hidden states are concatenated and then used to initialise the decoder hidden state. The LSTM ED architectures with attention are explained in the following.
timestep input. Thus, for every recurring decoder timestep, the attention mechanism generates attended inputs for that decoder cell using the predicted RUL and hidden state from the previous step, until the end of the sequence is reached. The collective outputs from all decoder timesteps represent the predicted sequence RUL.

4.5.2. RUL-Prediction using Luong Attention

The proposed LSTM ED RUL predictor with Luong attention is shown in Fig. 7. The Luong attention mechanism is implemented according to Section 4.4.2. At any decoder timestep $s_{j-1}$, an alignment score is created between the entire encoder hidden representation $\bar{h}_i \in \mathbb{R}^{T_i \times 2d_e}$ and the instantaneous decoder hidden state $s_{j-1} \in \mathbb{R}^{1 \times d_d}$. This score is softmaxed, and element-wise multiplication is performed between the softmaxed score and $\bar{h}_i$ to generate a context vector. The decoder cell $s_{j-1}$ creates a candidate output, and a weighted addition between the candidate output and the context vector generates the final predicted RUL, $\hat{y}_{j-1}$. This predicted RUL $\hat{y}_{j-1}$ also serves as the input to the next decoder timestep $s_j$. Thus, for every candidate output generated by a decoder timestep, the Luong attention block creates a final predicted output and the next timestep input by attending to the encoder hidden representation with respect to the candidate output.

The Luong attention mechanism can also attend to a localised segment of the encoder hidden states instead of the global hidden states by performing local attention. In local attention mode, a positional token $P_j$ is sent to the attention block from the decoder hidden state, based on which a local window is generated using monotonic alignment (Section 4.4.2). The window "rolls over" the encoder hidden states with the progression of each decoder time-step until the end of the sequence is reached. The RULs collected from the decoder's recurrent operation are the sequence RUL prediction of the Luong attention-based LSTM ED network.

4.6. Teacher Forcing

In encoder-decoder models dealing with sequence-to-sequence predictions, teacher forcing (TF) is a training technique that feeds the ground truth label from the previous time-step, instead of the model's predicted value, as the input for processing the subsequent time-step output. TF is a commonly used feature in NLP applications and is used in recurrent learning in RNNs Williams and Zipser (1989). NN models that heavily depend on previously decoded outputs can often suffer from slow convergence due to processing faulty predictions at the initial training stages. The accumulation of error over more extended periods can be avoided by feeding the true label instead of the predicted outcome. Apart from recurrent models, TF can also be used in auto-regressive models, e.g. the transformer, since it also operates on the principle of an encoder-decoder structure. However, studies in Bengio et al. (2015); Goyal et al. (2016) have shown models to suffer from "exposure bias" during inference, where the model fails to provide accurate and stable predictions due to its high dependency on the exposed ground truth labels. As the model moves farther away from its own predicted outputs, the discrepancy between training and inference increases. This can be avoided by setting a TF-free running ratio, where the model will be trained for a certain period with right-shifted ground truth labels and for the rest of the time with its predictions from the previous time step.
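One common way of realising such a teacher forcing ratio is a random per-step choice inside the decoding loop, sketched below for a PyTorch decoder cell and shared output layer such as those used earlier; the exact scheduling used during training may differ from this illustration.

```python
# Decoding loop with a teacher forcing ratio. Minimal sketch, assuming PyTorch;
# decoder_cell and fc_out mirror the LSTM ED components sketched earlier.
import random
import torch

def decode_with_teacher_forcing(decoder_cell, fc_out, s, c, targets, tf_ratio=0.3):
    # s, c: initial decoder hidden/cell state; targets: (batch, T_o) ground-truth RULs.
    y_prev = targets.new_zeros(targets.size(0), 1)   # zero <SOS> input
    outputs = []
    for t in range(targets.size(1)):
        s, c = decoder_cell(y_prev, (s, c))
        y_hat = fc_out(s)                             # RUL prediction at step t
        outputs.append(y_hat)
        # With probability tf_ratio feed the ground-truth label of this step,
        # otherwise feed the model's own prediction, as the next decoder input.
        y_prev = targets[:, t:t + 1] if random.random() < tf_ratio else y_hat
    return torch.cat(outputs, dim=1)                  # (batch, T_o)
```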
5. Sliding Window approach and sequence target RUL generation

Training the sequence-to-sequence model in this work requires feeding the input data in batches of sub-sequences. These sub-sequences optimise the model's learning process, since it learns to map the corresponding health status from a short sequence of localised data Ye and Keogh (2009). This creation of sub-sequences is achieved by using sliding windows, where a window of a specific length moves across the input signal over time, captures the information at a certain instance and feeds it to the model to predict the corresponding RUL. The step size of the window is termed the stride in this work. Additionally, a novelty of this work is the generation of RUL outputs as a sequence instead of a single data point. This is proposed based on the hypothesis that a model can better represent an engine's health status with a sequence of RUL labels. Sudden degradation or fluctuations occurring in a specific input time window can be more easily mapped to a sequence of RULs instead of averaging the learned information into a single value. The RUL labels created by using the piece-wise linear function are unfolded to a specific size to achieve this. This size depends on the length of the target sliding window, $v$. The space between two target RUL labels is filled with $v-1$ equidistant interval points. Hence, an output RUL sequence from the initial to the final cycle of length,
Table 1. C-MAPSS Dataset Sensor Parameters Saxena et al. (2008).
Table 2. Operational Information of the C-MAPSS Dataset (Train / Test).

$$\mathrm{cycles}^{(e,d)}_{degrading} = \mathrm{cycles}^{(e,d)}_{total} - \mathrm{cycles}^{(e,d)}_{healthy} \quad (23)$$
Table 3. Hyperparameters for ED RUL Predictors.

Hyperparameters                  Range                    Range
Predicted RUL Length (T_RUL)     {10, 15, 20}             {10, 15, 20}
Batch Size (BS)                  {1024}                   {256, 512}
SKC Kernel Length (K_L)          {3, 5}                   {3, 5}
Encoder Hidden Size (d_e)        {32, 48, 64}             {40, 64}
Decoder Hidden Size (d_d)        {64, 96, 128}            {80, 128}
Learning Rate (LR)               {1e-1, 5e-2, 1e-2}       {5e-2, 1e-2}
Teacher Forcing (TF%)            {0, 30, 40}              {0, 30, 40}
Table 4. Best Hyperparameters for ED Models.

T_RUL    10      10      20
BS       1024    1024    512
K_L      –       3       –
d_e      32      64      40
d_d      64      128     80
LR       1e-2    1e-2    1e-2
TF%      30      40      0
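The ranges in Table 3 define a discrete hyperparameter search space. The sketch below merely illustrates how such a space can be sampled randomly (in the spirit of Bergstra and Bengio, 2012) or enumerated as a grid; it is not a restatement of the authors' search procedure.

```python
# Illustrative sampling of a discrete hyperparameter space mirroring Table 3.
import itertools
import random

search_space = {
    "t_rul":      [10, 15, 20],
    "batch_size": [1024],
    "k_l":        [3, 5],
    "d_e":        [32, 48, 64],
    "d_d":        [64, 96, 128],
    "lr":         [1e-1, 5e-2, 1e-2],
    "tf_percent": [0, 30, 40],
}

def sample_config(space, rng=random):
    # Random search: draw each hyperparameter independently and uniformly.
    return {name: rng.choice(values) for name, values in space.items()}

config = sample_config(search_space)
# Exhaustive grid over the same space, for comparison.
grid = [dict(zip(search_space, combo))
        for combo in itertools.product(*search_space.values())]
```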
7.1. Effect of the type of attention with encoder-decoder architectures

The performance of the LSTM encoder-decoder model with different attention types (LSTM AT ED) is evaluated in this section. The LSTM AT ED model is separately evaluated with Bahdanau attention and the three variants of Luong attention (Dot product, General and Concatenation) for both Global (Gl.) and Local (L) attention operations. The hyperparameters for each of the models are shown in Table 4. As illustrated in Fig. 11, the model with Bahdanau attention performs excellently and much better than the other attention types. A very significant improvement is observed on sub-datasets FD001, FD003 and FD004 compared to the scores from previous models, while maintaining a very good score on FD002. Among the Luong attention variants, the best perfor-

7.3. Comparison among different encoder structures in the encoder-decoder architectures

The SKC LSTM ED model adds a feature-extracting SKC layer on top of the LSTM Bahdanau AT ED layer as discussed in Section 3.3. The best performing hyperparameters for achieving the optimum performance from this model are shown in Table 4. The inclusion of a CNN layer requires new additional hyperparameters to be optimized. A fixed number of output feature maps, $N_{fm} = 8$, is selected, while kernel lengths of 3 and 5 are inspected. All other training parameters remain the same as for the LSTM ED model. The scores of the SKC LSTM Bahdanau AT model deteriorate compared to the LSTM Bahdanau AT ED model, since it fails to strike a balance in any case throughout the four sub-datasets. Therefore, it can be
Fig. 13. Analysis of the SKC layer in the LSTM Bahdanau Attention Encoder-Decoder Network.

Fig. 14. Effect of hidden sizes ($d_e$, $d_d$) in the LSTM Bahdanau Attention Encoder-Decoder Network.
Table 5. Comparison of Model Performance (Scores) with Related Literature.
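For reference, the scores compared in Table 5 follow the asymmetric scoring function introduced with the C-MAPSS benchmark in Saxena et al. (2008), which penalises late predictions more heavily than early ones:

$$s = \sum_{i=1}^{N} \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{\,d_i/10} - 1, & d_i \geq 0 \end{cases}, \qquad d_i = \widehat{RUL}_i - RUL_i,$$

where $N$ is the number of test engines and $d_i$ is the prediction error for engine $i$.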
Fig. 17. Remaining Useful Lifetime (RUL) Estimation Plots by Proposed Models for sub-dataset FD001 Engine ID 100.
Fig. 18. Testing loss comparison between LSTM ED, LSTM Bahdanau AT ED and LSTM Luong Concat Gl. AT ED architectures for all C-MAPSS sub-datasets.

Fig. 19. Testing loss comparison between LSTM Bahdanau AT ED and SKC LSTM Bahdanau AT ED architectures for all C-MAPSS sub-datasets.
decoder cell, allowing each candidate RUL to directly attend to the encoder hidden representations. This evidently benefits the model, as it starts at a much lower loss magnitude than the LSTM ED or the LSTM Bahdanau AT ED. The LSTM AT ED models, especially with Bahdanau attention, appear to converge to the minimum loss magnitude in later epochs than the vanilla LSTM ED model. The testing loss comparison between the LSTM Bahdanau AT ED and the SKC LSTM Bahdanau AT ED is shown in Fig. 19. While both models require a fair number of epochs to converge to a minimum loss point, the SKC LSTM Bahdanau AT ED accomplishes this faster than the LSTM Bahdanau AT ED. This reflects the contribution of the feature-extracting SKC layer, which helps the model to reach early convergence and predict RULs that are closer to the actual values from an earlier stage. These loss plots reflect the overall prediction capability of the models throughout the life cycle of the engines. They do not, however, represent the score-based performance of the model architectures, since the scoring function depends on the final RUL values of each engine instead of the RUL at every instance.

7.8. Comparison with existing literature

The sequence RUL predictor models presented in this work are evaluated against the literature model performances based on their generated scores (Table 5) for the C-MAPSS Turbofan Engine Dataset. As evident in Table 5, the proposed LSTM ED and the LSTM Bahdanau AT ED model outperform the best literature score Tan and Teo (2021) for FD002. The proposed SKC LSTM ED and LSTM Bahdanau AT ED models outperform the best literature score Paulo Roberto De Oliveira Da Costa et al. (2020) for FD003.
Table 6. Comparison of Model Performance (RMSE) with Related Literature.

The performance of the LSTM Bahdanau AT ED model for FD001 and FD004 very closely matches the best literature score in Tan and Teo (2021).

A thorough observation of Table 5 confirms that the proposed LSTM Bahdanau AT ED model outmatches every literature model in providing a highly balanced performance throughout all the C-MAPSS sub-datasets. Among the literature, the attention-based temporal CNN architecture Tan and Teo (2021) provides a good overall performance. Additionally, an overall comparison between attention and non-attention architectures in Table 5 shows the significant improvement yielded by the attention-based models. The Transformer encoder model with Gated CNN Mo et al. (2021) does not deploy the scoring function as a performance metric and hence scores from this work are not available for comparison.

A distinctive analysis can be made by comparing the models based on their performances throughout the engines' life cycles. This is evaluated by the RMSE losses of the models during the testing phase. The RMSE values for all the implemented and literature models are shown in Table 6. Even in this case, the proposed LSTM Bahdanau AT ED model RMSE is either very close to, or much lower than, the lowest RMSE from the literature for all sub-datasets. In addition, the attention-based SKC LSTM ED also produces excellent losses, reflecting the ability of the attention mechanism to attend to specific parts of the input for predicting nearly accurate RULs throughout the prediction horizon.

The average score over all the operating conditions in the C-MAPSS dataset in relation to the total number of trainable parameters of the best performing models from the literature and our best performing model is shown in Fig. 20. The best performing LSTM Bahdanau AT ED based models require a similar number of trainable parameters as the ones in the literature but perform drastically better, with an average score of 847. The obtained results are superior to the current state of the art.

Fig. 20. Average score of our best architecture, the LSTM Bahdanau Attention Encoder-Decoder Network, on the C-MAPSS dataset compared to existing work in the literature.

8. Conclusions

In this paper we propose a novel approach for remaining useful lifetime estimation based on sequence-to-sequence LSTM encoder-decoder structures augmented with Bahdanau and Luong attention. The different Luong attention variants are examined in both global and local format. The contributions of a shared kernel CNN layer for feature extraction in the LSTM-based attention and non-attention encoder-decoders are also evaluated. Not all attention variants perform equally well, with Bahdanau attention outperforming the rest in terms of balanced sub-dataset scores.

A novel network architecture is proposed which allows for the reading of sequences of the multidimensional time series using a sliding window for the input as well as the output sequences. This significantly increases the sample efficiency of the training procedure and hence allows for the estimation of RUL also in sparse data sets with strongly limited test cases of machine degradation. Therefore, the primary hypothesis of enhanced performance with a sequence of RULs is confirmed in this work. A comparison of the obtained results with the current state-of-the-art results from the literature shows the superior performance of the proposed architectures.

In future work, we will further optimize the training procedure for the proposed architecture, especially by using deeper CNN
models with more feature maps and other kernel lengths. Furthermore, we will test alternative neural network architectures, particularly transformer networks, for their ability to tackle remaining useful lifetime estimation problems.

Declaration of competing interest

• All authors have participated in the conception of the manuscript, including the analysis and interpretation of the results, drafting and revising the article for important intellectual content.
• The enclosed manuscript has not been submitted to, nor is it concurrently under review at, another journal or publishing venue.
• The authors have no affiliation or involvement with any organization or entity with a direct or indirect financial interest in the subject matter discussed in the manuscript.
• The names of the authors below have affiliations or involvement with organizations with a direct or indirect financial interest in the subject matter discussed in the manuscript.

CRediT authorship contribution statement

Sayed Rafay Bin Shah: Software, Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing – original draft, Writing – review & editing, Visualization. Gavneet Singh Chadha: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing – original draft, Writing – review & editing, Project administration. Andreas Schwung: Conceptualization, Writing – original draft, Supervision, Project administration. Steven X. Ding: Supervision, Project administration.

References

Albawi, S., Mohammed, T. A., & Al-Zawi, S. (2017). Understanding of a convolutional neural network. In I. C. o. E. &. Technology (Ed.), Proceedings of 2017 International Conference on Engineering & Technology (ICET'2017) (pp. 1–6). Piscataway, NJ: IEEE. https://doi.org/10.1109/ICEngTechnol.2017.8308186.
Bektas, O., Jones, J. A., Sankararaman, S., Roychoudhury, I., & Goebel, K. (2019). A neural network filtering approach for similarity-based remaining useful life estimation. The International Journal of Advanced Manufacturing Technology, 101(1–4), 87–103. https://doi.org/10.1007/s00170-018-2874-0.
Bengio, S., Vinyals, O., Jaitly, N., & Shazeer, N. (2015). Scheduled sampling for sequence prediction with recurrent neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15 (pp. 1171–1179). Cambridge, MA, USA: MIT Press.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281–305.
Botchkarev, A. (2019). A new typology design of performance metrics to measure errors in machine learning regression algorithms. Interdisciplinary Journal of Information, Knowledge, and Management, 14, 045–076. https://doi.org/10.28945/4184.
Chatterjee, S., & Litt, J. (2003). Online model parameter estimation of jet engine degradation for autonomous propulsion control. In Guidance, Navigation, and Control and Co-Located Conferences. https://doi.org/10.2514/6.2003-5425.
Chaudhari, S., Polatkan, G., Ramanath, R., & Mithal, V. (2019a). An attentive survey of attention models. ArXiv, abs/1904.02874.
Chaudhari, S., Polatkan, G., Ramanath, R., & Mithal, V. (2019b). An attentive survey of attention models. arXiv preprint arXiv:1904.02874.
Chen, Z., Wu, M., Zhao, R., Guretno, F., Yan, R., & Li, X. (2021). Machine remaining useful life prediction via an attention-based deep learning approach. IEEE Transactions on Industrial Electronics, 68(3), 2521–2531. https://doi.org/10.1109/TIE.2020.2972443.
Dzmitry Bahdanau, Kyunghyun Cho, & Yoshua Bengio (2015). Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.
Elsheikh, A., Yacout, S., & Ouali, M.-S. (2019). Bidirectional handshaking LSTM for remaining useful life prediction. Neurocomputing, 323, 148–156. https://doi.org/10.1016/j.neucom.2018.09.076.
Esen, H., Inalli, M., Sengur, A., & Esen, M. (2008). Artificial neural networks and adaptive neuro-fuzzy assessments for ground-coupled heat pump system. Energy and Buildings, 40(6), 1074–1083. https://doi.org/10.1016/j.enbuild.2007.10.002.
Frederick, D. K., DeCastro, J. A., & Litt, J. S. (2007). Users Guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS).
Gavneet Singh Chadha, Ambarish Panambilly, Andreas Schwung, & Steven X. Ding (2020). Bidirectional deep recurrent neural networks for process fault classification. ISA Transactions, 106, 330–342. https://doi.org/10.1016/j.isatra.2020.07.011.
Gavneet Singh Chadha, Utkarsh Panara, Andreas Schwung, & Steven X. Ding (2021). Generalized dilation convolutional neural networks for remaining useful lifetime estimation. Neurocomputing, 452, 182–199. https://doi.org/10.1016/j.neucom.2021.04.109.
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research: 9 (pp. 249–256). PMLR.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Adaptive computation and machine learning. Cambridge, Massachusetts: The MIT Press.
Goyal, A., Lamb, A., Zhang, Y., Zhang, S., Courville, A., & Bengio, Y. (2016). Professor forcing: A new algorithm for training recurrent networks. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16 (pp. 4608–4616). Red Hook, NY, USA: Curran Associates Inc.
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks: The Official Journal of the International Neural Network Society, 18(5–6), 602–610. https://doi.org/10.1016/j.neunet.2005.06.042.
Gu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., & Chen, T. (2018). Recent advances in convolutional neural networks. Pattern Recognition, 77, 354–377. https://doi.org/10.1016/j.patcog.2017.10.013.
Heimes, F. O. (2008). Recurrent neural networks for remaining useful life estimation. In 2008 International Conference on Prognostics and Health Management (pp. 1–6). https://doi.org/10.1109/PHM.2008.4711422.
Hikmet Esen, Filiz Ozgen, Mehmet Esen, & Abdulkadir Sengur (2009). Artificial neural network and wavelet neural network approaches for modelling of a solar air heater. Expert Systems with Applications, 36(8), 11240–11248. https://doi.org/10.1016/j.eswa.2009.02.073.
Hikmet Esen, Mehmet Esen, & Onur Ozsolak (2017). Modelling and experimental performance analysis of solar-assisted ground source heat pump system. Journal of Experimental & Theoretical Artificial Intelligence, 29(1), 1–17. https://doi.org/10.1080/0952813X.2015.1056242.
Hikmet Esen, Mustafa Inalli, Abdulkadir Sengur, & Mehmet Esen (2008a). Forecasting of a ground-coupled heat pump performance using neural networks with statistical data weighting pre-processing. International Journal of Thermal Sciences, 47(4), 431–441. https://doi.org/10.1016/j.ijthermalsci.2007.03.004.
Hikmet Esen, Mustafa Inalli, Abdulkadir Sengur, & Mehmet Esen (2008b). Performance prediction of a ground-coupled heat pump system using artificial neural networks. Expert Systems with Applications, 35(4), 1940–1948. https://doi.org/10.1016/j.eswa.2007.08.081.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Isermann, R. (2005). Model-based fault-detection and diagnosis – status and applications. Annual Reviews in Control, 29(1), 71–85. https://doi.org/10.1016/j.arcontrol.2004.12.002.
Jardine, A. K., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20(7), 1483–1510. https://doi.org/10.1016/j.ymssp.2005.09.012.
Junyoung Chung, Çaglar Gülçehre, KyungHyun Cho, & Yoshua Bengio (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555.
Khan, S., & Yairi, T. (2018). A review on the application of deep learning in system health management. Mechanical Systems and Signal Processing, 107, 241–265. https://doi.org/10.1016/j.ymssp.2017.11.024.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. CoRR, abs/1412.6980.
Kiranyaz, S., Avci, O., Abdeljaber, O., Ince, T., Gabbouj, M., & Inman, D. J. (2021). 1D convolutional neural networks and applications: A survey. Mechanical Systems and Signal Processing, 151, 107398. https://doi.org/10.1016/j.ymssp.2020.107398.
Kong, Z., Cui, Y., Xia, Z., & Lv, H. (2019). Convolution and long short-term memory hybrid deep neural networks for remaining useful life prognostics. Applied Sciences, 9(19), 4156. https://doi.org/10.3390/app9194156.
Lei, Y., Li, N., Guo, L., Li, N., Yan, T., & Lin, J. (2018). Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mechanical Systems and Signal Processing, 104, 799–834. https://doi.org/10.1016/j.ymssp.2017.11.016.
Li, J., Li, X., & He, D. (2019a). A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction. IEEE Access, 7, 75464–75475. https://doi.org/10.1109/ACCESS.2019.2919566.
Li, X., Ding, Q., & Sun, J.-Q. (2018). Remaining useful life estimation in prognostics using deep convolution neural networks. Reliability Engineering & System Safety, 172, 1–11. https://doi.org/10.1016/j.ress.2017.11.021.
Li, X., Zhang, W., & Ding, Q. (2019b). Deep learning-based remaining useful life estimation of bearings using multi-scale feature extraction. Reliability Engineering & System Safety, 182, 208–218. https://doi.org/10.1016/j.ress.2018.11.011.
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. https://arxiv.org/pdf/1506.00019.
Listou Ellefsen, A., Bjørlykhaug, E., Æsøy, V., Ushakov, S., & Zhang, H. (2019). Remaining useful life predictions for turbofan engine degradation using semi-supervised deep architecture. Reliability Engineering & System Safety, 183, 240–251. https://doi.org/10.1016/j.ress.2018.11.027.
Liu, H., Liu, Z., Jia, W., & Lin, X. (2019). A novel deep learning-based encoder-decoder model for remaining useful life prediction. In 2019 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE. https://doi.org/10.1109/IJCNN.2019.8852129.
Liu, H., Liu, Z., Jia, W., & Lin, X. (2021). Remaining useful life prediction using a novel feature-attention-based end-to-end approach. IEEE Transactions on Industrial Informatics, 17(2), 1197–1207. https://doi.org/10.1109/TII.2020.2983760.
Luong, T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1412–1421). Lisbon, Portugal: Association for Computational Linguistics. https://doi.org/10.18653/v1/D15-1166.
Martin, K. F. (1994). A review by discussion of condition monitoring and fault diagnosis in machine tools. International Journal of Machine Tools and Manufacture, 34(4), 527–551. https://doi.org/10.1016/0890-6955(94)90083-3.
Mo, Y., Wu, Q., Li, X., & Huang, B. (2021). Remaining useful life estimation via transformer encoder enhanced by a gated convolutional unit. Journal of Intelligent Manufacturing. https://doi.org/10.1007/s10845-021-01750-x.
Nandi, S., Toliyat, H. A., & Li, X. (2005). Condition monitoring and fault diagnosis of electrical motors—A review. IEEE Transactions on Energy Conversion, 20(4), 719–729. https://doi.org/10.1109/TEC.2005.847955.
Paulo Roberto De Oliveira Da Costa, Alp Akcay, Yingqian Zhang, & Uzay Kaymak (2020). Attention and long short-term memory network for remaining useful lifetime predictions of turbofan engine degradation. International Journal of Prognostics and Health Management, 10, 12.
Peng, C., Chen, Y., Chen, Q., Tang, Z., Li, L., & Gui, W. (2021). A remaining useful life prognosis of turbofan engine using temporal and spatial feature fusion. Sensors (Basel, Switzerland), 21(2). https://doi.org/10.3390/s21020418.
Ragab, M., Chen, Z., Wu, M., Kwoh, C.-K., Yan, R., & Li, X. (2020). Attention sequence to sequence model for machine remaining useful life prediction. https://arxiv.org/pdf/2007.09868.
Sateesh Babu, G., Zhao, P., & Li, X.-L. (2016). Deep convolutional neural network based regression approach for estimation of remaining useful life. In Database Systems for Advanced Applications (pp. 214–228). Springer International Publishing. https://doi.org/10.1007/978-3-319-32025-0_14.
Saxena, A., Goebel, K., Simon, D., & Eklund, N. (2008). Damage propagation modeling for aircraft engine run-to-failure simulation. In 2008 International Conference on Prognostics and Health Management (pp. 1–9). IEEE. https://doi.org/10.1109/PHM.2008.4711414.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. http://arxiv.org/pdf/1409.3215v3.
Swati Meshram (2019). Survey on attention neural network models for natural language processing: 8. https://doi.org/10.15680/IJIRSET.2019.0810037.
Tan, W. M., & Teo, T. H. (2021). Remaining useful life prediction using temporal convolution with attention. Artificial Intelligence (AI), 2(1), 48–70. https://doi.org/10.3390/ai2010005.
Tsui, K. L., Chen, N., Zhou, Q., Hai, Y., & Wang, W. (2015). Prognostics and health management: A review on data driven approaches. Mathematical Problems in Engineering, 2015(6), 1–17. https://doi.org/10.1155/2015/793161.
Wang, J., Wen, G., Yang, S., & Liu, Y. (2018). Remaining useful life estimation in prognostics using deep bidirectional LSTM neural network. In 2018 Prognostics and System Health Management Conference (PHM-Chongqing) (pp. 1037–1042). https://doi.org/10.1109/PHM-Chongqing.2018.00184.
Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2), 270–280. https://doi.org/10.1162/neco.1989.1.2.270.
Wu, J. (2017). Introduction to convolutional neural networks. National Key Lab for Novel Software Technology, Nanjing University, China, 5, 23.
Wu, Y., Yuan, M., Dong, S., Lin, L., & Liu, Y. (2018). Remaining useful life estimation of engineered systems using vanilla LSTM neural networks. Neurocomputing, 275, 167–179. https://doi.org/10.1016/j.neucom.2017.05.063.
Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne E. Hubbard, & Lawrence D. Jackel (1990). Handwritten digit recognition with a back-propagation network [Denver, Colorado, USA, November 27–30, 1989]. In David S. Touretzky (Ed.), Advances in Neural Information Processing Systems 2 (pp. 396–404). Morgan Kaufmann.
Ye, L., & Keogh, E. (2009). Time series shapelets: A new primitive for data mining. In KDD '09 (pp. 947–956). New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/1557019.1557122.
Yu, Z., Ramanarayanan, V., Suendermann-Oeft, D., Wang, X., Zechner, K., Chen, L., Tao, J., Ivanou, A., & Qian, Y. (2015). Using bidirectional LSTM recurrent neural networks to learn high-level abstractions of sequential features for automated scoring of non-native spontaneous speech. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) (pp. 338–345). IEEE. https://doi.org/10.1109/ASRU.2015.7404814.
Zhang, H., Zhang, Q., Shao, S., Niu, T., & Yang, X. (2017). Attention-based LSTM network for rotatory machine remaining useful life prediction. IEEE Access, 1–9. https://doi.org/10.1109/ACCESS.2020.3010066.
Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., & Gao, R. X. (2019a). Deep learning and its applications to machine health monitoring. Mechanical Systems and Signal Processing, 115, 213–237. https://doi.org/10.1016/j.ymssp.2018.05.050.
Zhao, S., Zhang, Y., Wang, S., Zhou, B., & Cheng, C. (2019b). A recurrent neural network approach for remaining useful life prediction utilizing a novel trend features construction method. Measurement, 146, 279–288. https://doi.org/10.1016/j.measurement.2019.06.004.
Zheng, S., Ristovski, K., Farahat, A., & Gupta, C. (2017). Long short-term memory network for remaining useful life estimation. In 2017 IEEE International Conference on Prognostics and Health Management (ICPHM) (pp. 88–95). https://doi.org/10.1109/ICPHM.2017.7998311.