Keywords: Degradation level estimation; Conceptors; Reservoir computing; Time series clustering; Convolutional Neural Network (CNN); Bearings

In practical applications, degradation level estimation often faces the challenge of dealing with unlabeled time series characterized by long-term temporal dependencies, which are typically not properly represented using sliding time windows. Inspired by the idea of representing temporal patterns by a mechanism of neurodynamical pattern learning, called Conceptors, a two-stage method for the estimation of the equipment degradation level is developed. In the first stage, clusters of Conceptors representing similar patterns of degradation within complete run-to-failure trajectories are identified; in the second stage, the obtained clusters are used to supervise the training of a convolutional neural network classifier of the equipment degradation level. The proposed method is applied to a synthetic case study and to two literature case studies regarding bearing degradation level estimation. The obtained results show that the proposed method provides more accurate estimation of the equipment degradation level than other state-of-the-art methods.
∗ Corresponding author.
E-mail addresses: mingjing.xu@polimi.it (M. Xu), piero.baraldi@polimi.it (P. Baraldi), zhe.yang@polimi.it (Z. Yang), enrico.zio@polimi.it (E. Zio).
https://doi.org/10.1016/j.eswa.2022.118962
Received 8 August 2021; Received in revised form 25 June 2022; Accepted 1 October 2022
Available online 8 October 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.
M. Xu et al. Expert Systems With Applications 213 (2023) 118962
Notice that the unsupervised learning methods previously introduced cannot be reapplied to the problem of this work. For example, the methods in Al-Dahidi et al. (2018), Baraldi et al. (2015a) and Liu et al. (2018), which cluster transients of components, are not able to diagnose the degradation level, since they do not consider the dynamics of the degradation evolution.

Since representation learning is adaptively capable of learning features from raw data, it can constitute an excellent a priori choice for the development of diagnostic techniques. In the work of Lei et al. (2016), an unsupervised sparse filtering method based on a two-layer neural network is used to directly learn features from mechanical vibration signals. In the work of Han et al. (2019), a Spatio-Temporal Pattern Network (STPN) based on Probabilistic Finite State Automata (PFSA) and Markov machines is proposed to represent temporal and spatial structures for diagnostics in complex systems. However, these conventional representation learning methods cannot capture long-term temporal dependencies in the time series and they typically require high computational complexity.

Conceptor is a mechanism of neurodynamical temporal pattern representation proposed in the work of Jaeger (2017). Considering a reservoir, i.e. a randomly generated and sparsely connected Recurrent Neural Network (RNN) (Lukoševičius & Jaeger, 2009), Conceptors can be understood as filters characterizing the geometries of the temporal states of the reservoir neurons in the form of square matrices (Jaeger, 2017), achieving a direction-selective damping of high-dimensional reservoir states (Qian & Zhang, 2018). The choice of Conceptors is motivated by their capability of catching the degradation dynamics of multivariate time series of variable length and representing them in the form of a matrix, thus making the clustering and classification of the degradation level more accurate and easier.

The proposed method develops in two stages. In the first stage, the Conceptors extracted from the training run-to-failure degradation trajectories are clustered into several non-overlapping time series segments representing different degradation levels. The first stage addresses the challenge of simultaneously coping with feature correlations and long-term degradation dynamics in run-to-failure trajectories, which is the key issue in the clustering of component degradation levels. In the second stage, the Conceptors and the corresponding labels obtained from the clustering in the first stage are used to train a Convolutional Neural Network (CNN) for real-time diagnosis of the equipment degradation level. The CNN receives in input the Conceptor extracted from the reservoir states at the current time, which contains information about the long-term evolution of the degradation, and the difference between the Conceptors extracted at the present and previous time steps, which contains information about the short-term degradation variation. The second stage addresses the challenge of bridging the degradation dynamic representations, which are in the form of Conceptors and Conceptor differences, and their associated classification level. Specifically, the ESN allows embedding the local degradation dynamic into the reservoir state space, and the Conceptor extracts the global degradation dynamic by projecting the reservoir states into a regularized eigenspace. The choice of using CNNs for the classification task is motivated by the type of input, i.e. matrices, which contain information about the correlation among the reservoir states.

Three case studies are considered to verify the performance of the proposed method. The first one is a problem of degradation level estimation built on synthetic data, whereas the second and third concern the classification of the degradation level of bearings; the data are taken from the repositories of the Center for Intelligent Maintenance Systems (IMS) (Qiu et al., 2006) and of the PHM 2012 data challenge based on the PRONOSTIA experiment platform (Nectoux et al., 2012a), respectively. Notice that in all cases we assume that the degradation level of the training data is not known.

The performance of the proposed method has been compared with that of other state-of-the-art methods. Specifically, the clustering stage has been performed by combining Spectral Clustering (Ng et al., 2001), K-Means (Jain, 2010) and Self-Organizing Maps (SOM) (Vesanto & Alhoniemi, 2000) with Laplacian Score (LS) (He et al., 2005) for feature selection, Fourier Transform (FT) and Short Time Fourier Transform (STFT) (Borisagar et al., 2019) for feature extraction, and Dynamic Time Warping (DTW) (Salvador & Chan, 2007) for distance computation; in addition, a deep learning clustering approach, based on an Auto-Encoder and a Generative Adversarial Network for extracting features and on K-means for clustering them, is considered for comparison. The classification stage has been performed by K-Nearest Neighbors (KNN) (Zhang et al., 2017), Temporal Convolutional Networks (TCN) (Lea et al., 2017) and Long-Short Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997).

The present work builds upon the Conceptor-based clustering method proposed by the authors in Xu et al. (2020b), which is here extended to develop a classifier of the test component degradation level. Differently from the method in Xu et al. (2020b), which can be applied only to an entire run-to-failure degradation trajectory, the method here developed can be applied to the data collected from a component until the present time, which allows a real-time degradation assessment.

Other differences with respect to Xu et al. (2020b) are:

• The formalization of the method for Conceptor Clustering in an algorithm, which is reported in Section 4.1.2;
• The analysis of the time and space complexity of the Conceptor Clustering algorithm, which is reported in Section 4.3 of the present work;
• The visualization of the embedding space of the normalized eigenvectors found by the Conceptor Clustering algorithm, which is shown in Section 5.4 of the present work;
• The validation of the clustering method on two new case studies.

The original contributions of the work are:

1. the development of a novel Conceptor-aided clustering approach for multivariate time series of variable length;
2. the development of a Conceptor-based CNN for degradation level estimation;
3. the applicability of the proposed methodology to generic unsupervised degradation level estimation problems with time series input.

The remainder of the paper is organized as follows: Section 2 states the problem and illustrates the work objectives; Section 3 introduces the background and preliminaries of the work and Section 4 describes the proposed method of unsupervised degradation level estimation; Section 5 introduces the numerical synthetic case study with long-term temporal dependencies and the two bearing case studies, and, then, discusses the obtained results. Finally, some conclusions and remarks are drawn in Section 6.

2. Problem statement

We consider a population of 𝑅 similar components, which have already failed in the past; each component is monitored by 𝑆 sensors. The 𝑆 measured signals, which relate to the equipment operating conditions and degradation, are assumed to be synchronously measured up to failure. The run-to-failure trajectory of the generic 𝑟th component (𝑟 = 1, …, 𝑅) is represented by a matrix 𝑷^𝑟, whose generic entry 𝑝^𝑟_{𝑠,𝑖} is the value of signal 𝑠 at time 𝑡_𝑖, 𝑠 = 1, …, 𝑆 and 𝑖 = 1, …, 𝑁_𝑟, where 𝑁_𝑟 is the number of measurements taken during the entire life of component 𝑟. Notice that, due to the stochasticity of the degradation process, the lifetime of each component is different and, therefore, we need to develop a method able to deal with time series of different lengths, 𝑁_𝑟, 𝑟 = 1, …, 𝑅. We realistically assume that no information is available for directly labeling the degradation level of the components. This situation is common for those components whose degradation cannot
Fig. 1. Scheme of the problem statement. Note that 𝑁_𝑟 may take different values for different trajectories 𝑟.
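As a concrete illustration of the data layout described in the problem statement, the run-to-failure trajectories 𝑷^𝑟 can be stored as a list of per-component arrays, one per lifetime. The sizes below (R = 3 components, S = 2 sensors, lifetimes 120, 95 and 140) are hypothetical, chosen only to show the variable-length structure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: R = 3 run-to-failure components, S = 2 sensors.
# Lifetimes N_r differ from component to component, so each trajectory
# P^r is stored as its own (S, N_r) array rather than in a single tensor.
S = 2
lifetimes = [120, 95, 140]  # N_r for r = 1, 2, 3 (illustrative values)

trajectories = [rng.standard_normal((S, n)) for n in lifetimes]

# P[s, i] plays the role of p^r_{s,i}: the value of signal s at time t_i.
for r, P in enumerate(trajectories, start=1):
    print(f"component {r}: S={P.shape[0]}, N_r={P.shape[1]}")
```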
3. Preliminaries
Note that the Conceptors are obtained from the reservoir state 𝒙(𝑡_𝑖) generated by the Echo State Network.

3.3. Spectral clustering

Spectral clustering is based on the construction of a similarity graph 𝐺 = (𝑉, 𝐸), where 𝑉 = {𝑣_1, …, 𝑣_𝑙} identifies the set of vertices and 𝐸 = {𝑣_1−𝑣_2, 𝑣_1−𝑣_3, …, 𝑣_𝑙−𝑣_{𝑙−1}} is the set of edges connecting the vertices. Each vertex represents an object, and the weight associated to the edge connecting two generic vertices 𝑣_𝑚 and 𝑣_𝑛 is the measure of similarity between objects 𝑚 and 𝑛, denoted by 𝐴_{𝑚𝑛}, 𝑚, 𝑛 = 1, 2, …, 𝑙. Spectral clustering aims at partitioning the vertices into clusters such that the edges between objects belonging to the same partition have associated large weights (objects within the same cluster are similar), whereas the edges between objects belonging to different partitions have associated small weights (objects in different clusters are dissimilar). The spectral clustering technique entails five main steps (Baraldi et al., 2015b, 2013; Li, Song et al., 2020):

(1) Construct the similarity matrix 𝑨, whose generic entry 𝐴_{𝑚𝑛} is the measure of similarity between 𝑣_𝑚 and 𝑣_𝑛, 𝑚, 𝑛 = 1, 2, …, 𝑙.
(2) Build the normalized Laplacian matrix 𝑳. This requires the computation of the diagonal degree matrix 𝜦, whose entries 𝛬_1, 𝛬_2, …, 𝛬_𝑙 are given by 𝛬_𝑚 = ∑_{𝑛=1}^{𝑙} 𝐴_{𝑚𝑛}, from which the normalized Laplacian matrix is obtained:

𝑳 = 𝑰 − 𝜦^{−1∕2} 𝑨 𝜦^{−1∕2}  (4)

where 𝑰 is the identity matrix of size 𝑙 × 𝑙.
(3) Compute the eigenvalues 𝜆_1, 𝜆_2, …, 𝜆_𝑙, sorted from the smallest to the largest, and the corresponding eigenvectors 𝑏_1, 𝑏_2, …, 𝑏_𝑙 of the normalized Laplacian matrix 𝑳. Select the 𝑘 smallest eigenvalues 𝜆_1, …, 𝜆_𝑘 and the corresponding eigenvectors 𝑏_1, …, 𝑏_𝑘 (Von Luxburg, 2007).
(4) Build the matrix 𝑩 of size 𝑙 × 𝑘, in which the 𝑘 columns are the eigenvectors 𝑏_1, …, 𝑏_𝑘 found in step (3). A matrix 𝒁 is, then, obtained by normalizing the rows of 𝑩 (Von Luxburg, 2007):

𝑧_{𝑚𝑛̃} = 𝑏_{𝑚𝑛̃} ∕ ( ∑_{𝑛̃=1}^{𝑘} 𝑏_{𝑚𝑛̃}^2 )^{1∕2},  𝑚 = 1, …, 𝑙,  𝑛̃ = 1, …, 𝑘  (5)

It has been shown that this change of data representation allows identifying clusters more easily (Von Luxburg, 2007).
(5) Apply a clustering algorithm to the rows of the matrix 𝒁, each one representing an object in the space of the first 𝑘 normalized eigenvectors. In this work, we use the k-means clustering algorithm for this.

3.4. Convolutional Neural Networks (CNNs)

CNNs have shown superior performances in various degradation level estimation applications due to their ability of extracting features for the classification task (Li, Hu et al., 2020). The CNN structure is based on the repetition of convolutional, pooling, fully connected and softmax classification layers (Krizhevsky et al., 2012).

Convolutional layers convolve the input by using kernels, i.e. filters with local receptive fields, which apply the same weights over the entire input field. The feature map 𝑿_𝑛^{𝑚+1} provided in output by the 𝑛th kernel matrix, 𝑾̃_𝑛^{𝑚+1}, associated to the (𝑚+1)th layer is:

𝑿_𝑛^{𝑚+1} = 𝑾̃_𝑛^{𝑚+1} ⊙ 𝑿^𝑚 + 𝒃_𝑛^{𝑚+1}  (6)

where 𝑿^𝑚 is the input to the (𝑚+1)th convolutional layer, ⊙ denotes the convolution operation and 𝒃_𝑛^{𝑚+1} is the corresponding bias term. After the convolutional operation, the Batch-Normalization (BN) technique is applied to overcome the difficulty of training CNNs with saturating nonlinearities and to speed up the training process. The Rectified Linear Unit (ReLU), 𝑓(𝑥) = max(0, 𝑥), is typically used as activation function of the neurons to implement a nonlinear transformation as well as to improve the representation ability (Glorot et al., 2011; Maas et al., 2013).

The pooling layer typically receives in input the output of a convolutional layer, and performs a sub-sampling operation to lower the spatial size of the feature maps and reduce the parameters of the whole network (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014). The pooling operator adopted in this work is max pooling (Zhang et al., 2020).

Similarly to what is done in traditional convolutional neural networks, a fully connected layer is stacked after a pooling layer to connect all previous feature maps (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014), and a dropout operation is applied during training to reduce overfitting (Zhang et al., 2020).

Finally, a softmax classification layer (Simonyan & Zisserman, 2014), which receives in input the output of the fully connected layer and provides the classification result, is used.

4. Proposed method

This Section describes the proposed two-stage unsupervised learning method for degradation level estimation. Given the lack of data labeled with the component degradation, the method is based on the two stages of clustering, aimed at labeling the training trajectories 𝑷^𝑟, 𝑟 = 1, …, 𝑅 (Stage 1), followed by classification of the test trajectory 𝑷 (Stage 2) (Fig. 3).

Stage 1: Clustering. It receives in input the training trajectories and provides in output the clusters of Conceptors. The generic time 𝑡_𝑖 at which the trajectories change cluster corresponds to the time at which the equipment degradation goes from one level to the successive one.

Stage 2: Classification. A classifier is trained by using the cluster labels found in Stage 1. It receives in input the Conceptors at the present time and the difference between the Conceptors at the present and previous time steps, and provides in output the estimation of the equipment degradation level.

The introduction of the intermediate phase of generating Conceptors, instead of directly clustering and classifying signals, is motivated by the fact that: (1) Conceptors catch the dynamics of the degradation patterns, reducing the influence of noise and operational conditions; (2) Conceptors allow measuring similarities among time series of different lengths; (3) Conceptors provide a synthetic representation of the information in the signals, which allows reducing the computational complexity of measuring the similarity among time series.

4.1. Stage 1: Clustering of the Conceptors

The labeling of the training trajectories is based on the two phases of generating Conceptors and clustering them. The diagram illustrating the Conceptor Clustering algorithm is shown in Fig. 4.

4.1.1. Generation of the Conceptors

The objective of the first phase is to represent the segment from time 𝑡_1 to time 𝑡_𝑖 of the generic 𝑟th degradation trajectory, 𝑝^𝑟_{1∶𝑆,1∶𝑖} ∈ ℝ^{𝑆×𝑖}, by means of the Conceptor 𝑪_𝑖^𝑟 ∈ ℝ^{𝑁×𝑁}.

The Conceptor matrix 𝑪_𝑖^𝑟 is computed by applying Eqs. (1) and (3) to the multivariate time series 𝑝^𝑟_{1∶𝑆,1∶𝑖}, with the reservoir states at time 𝑡_0, 𝒙(𝑡_0), initialized to zero to avoid unnecessary uncertainty. The correlation matrix 𝑹 is computed by adopting an iterative updating procedure:

𝑹(𝑖) = 𝑹(𝑖−1) ⋅ (𝑖−1)∕𝑖 + 𝒙(𝑡_𝑖) 𝒙(𝑡_𝑖)^𝑇 ⋅ 1∕𝑖  (7)

where 𝑹(𝑖) denotes the correlation matrix at time 𝑡_𝑖 and 𝑹(0) is the null matrix. Notice that the matrix 𝑹(𝑖), from which the Conceptor matrix 𝑪_𝑖^𝑟 is obtained, contains information about the evolution of the signals from time 𝑡_1 to time 𝑡_𝑖. Therefore, 𝑪_𝑖^𝑟 provides a synthetic, fixed-size representation of the evolution of the time signals in the time window [0, 𝑡_𝑖], whose length can vary from 𝑡_1 to 𝑡_{𝑁_𝑟}.
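The iterative update of Eq. (7) and the subsequent Conceptor computation can be sketched as follows. Eqs. (1) and (3) are not reproduced in this excerpt, so the sketch assumes the standard ESN state update 𝒙(𝑡_𝑖) = tanh(𝑾𝒙(𝑡_{𝑖−1}) + 𝑾_in 𝒑(𝑡_𝑖)) and the Conceptor formula 𝑪 = 𝑹(𝑹 + 𝛼^{−2}𝑰)^{−1} of Jaeger (2017); a small dense reservoir is used for simplicity, whereas the paper uses a sparse one:

```python
import numpy as np

def conceptor(P, N=20, alpha=1.0, seed=0):
    """Compute the Conceptor of a multivariate segment P of shape (S, i).

    Assumes the standard ESN update x(t) = tanh(W x + W_in p) and the
    Conceptor formula C = R (R + alpha^-2 I)^-1 (Jaeger, 2017); a dense
    reservoir is used here for simplicity (the paper uses a sparse one).
    """
    S, n_steps = P.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((N, N))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.9
    W_in = rng.standard_normal((N, S))

    x = np.zeros(N)               # reservoir state x(t_0) initialized to zero
    R = np.zeros((N, N))          # R(0) is the null matrix
    for i in range(1, n_steps + 1):
        x = np.tanh(W @ x + W_in @ P[:, i - 1])
        # Eq. (7): R(i) = R(i-1) * (i-1)/i + x x^T * 1/i
        R = R * (i - 1) / i + np.outer(x, x) / i
    return R @ np.linalg.inv(R + alpha ** -2 * np.eye(N))

# Hypothetical two-sensor segment of length 50.
P = np.random.default_rng(1).standard_normal((2, 50))
C = conceptor(P)
```

By construction the eigenvalues of 𝑪 lie in [0, 1), which is what makes the Conceptor act as a soft projection of the reservoir state space.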
Fig. 4. Illustration of the Conceptor Clustering algorithm. Given any two time series from the starting point 𝑡_1 to times 𝑡_𝑖 and 𝑡_𝑗, they are separately converted into the Conceptor matrices 𝑪_𝑖 and 𝑪_𝑗 by the Conceptor Generator; then, for any 𝑖, 𝑗 = 1, …, 𝑁_𝑟, the distance matrix of the Conceptors is obtained; finally, Spectral Clustering is applied to the distance matrix and the clustering results are obtained.
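Steps (1)-(5) of the spectral clustering technique recalled in Section 3.3 can be sketched as follows; the toy block similarity matrix and the minimal Lloyd-style k-means (with a deterministic farthest-point initialization) are illustrative choices, not the paper's actual settings:

```python
import numpy as np

def spectral_clustering(A, k, n_iter=50):
    """Normalized spectral clustering, steps (2)-(5) of Section 3.3;
    the similarity matrix A (step (1)) is assumed to be given."""
    l = A.shape[0]
    # Step (2): L = I - Lambda^{-1/2} A Lambda^{-1/2}  (Eq. (4)).
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(l) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Step (3): eigenvectors of the k smallest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(L)
    B = eigvecs[:, :k]
    # Step (4): row-normalize B to obtain Z  (Eq. (5)).
    Z = B / np.linalg.norm(B, axis=1, keepdims=True)
    # Step (5): k-means on the rows of Z (minimal Lloyd iteration,
    # deterministic farthest-point initialization for reproducibility).
    centers = [Z[0]]
    for _ in range(1, k):
        dists = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Toy 4-object similarity matrix with two evident blocks (hypothetical values).
A = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
labels = spectral_clustering(A, k=2)
```

On this toy matrix the two similarity blocks are recovered as two clusters, illustrating why the row-normalized eigenvector embedding of step (4) makes the subsequent k-means step easy.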
step the degradation can at most increase by one level, i.e. 𝑪_𝑖^𝑟 ∈ 𝛱^𝑟_{𝑑_𝑖^𝑟} → 𝑪_{𝑖+1}^𝑟 ∈ {𝛱^𝑟_{𝑑_𝑖^𝑟} ∪ 𝛱^𝑟_{𝑑_𝑖^𝑟+1}}.

4.2. Stage 2: Classification of the Conceptors

The objective of the classification module is to estimate the degradation level of the test component at time 𝑡_𝑐. Let 𝑪_{𝑐−1}^{𝑡𝑒𝑠𝑡}, 𝑪_𝑐^{𝑡𝑒𝑠𝑡} be the Conceptor matrices representing the degradation trajectory of the test component from time 𝑡_1 to time 𝑡_{𝑐−1} and from time 𝑡_1 to time 𝑡_𝑐, respectively. Inputs of the classification module are the Conceptor 𝑪_𝑐^{𝑡𝑒𝑠𝑡}, which represents the long-term temporal dependency of the test degradation trajectory, and the difference 𝛽 ⋅ (𝑪_𝑐^{𝑡𝑒𝑠𝑡} − 𝑪_{𝑐−1}^{𝑡𝑒𝑠𝑡}) (Fig. 3), which represents the short-term variation of the degradation dynamic between time 𝑡_{𝑐−1} and time 𝑡_𝑐. With respect to the setting of 𝛽, i.e. the scaling factor of the difference matrix of two adjacent Conceptors, since the

The time complexity of the Conceptor clustering algorithm is 𝑂(𝑅 ⋅ 𝑁_𝑟 ⋅ 𝑁^3) + 𝑂(𝑅 ⋅ 𝑁_𝑟^2 ⋅ 𝑁^2) + 𝑂(𝑅 ⋅ 𝑁_𝑟^3 ⋅ 𝐾). The first term accounts for the cost of generating the Conceptors, the second term for the cost of computing the similarity matrix, and the last term for the cost of the spectral clustering algorithm and the Silhouette Index computation. The time complexity of the classification algorithm is 𝑂(𝑁_𝑐𝑜𝑛𝑣 ⋅ 𝑁_𝑓^2 ⋅ 𝑤_𝑓^2 ⋅ 𝑁^2) + 𝑂(𝑁_𝑓 ⋅ 𝑁^2) + 𝑂(𝑁_𝑓^2), where the first term accounts for the cost of the convolution operations in the 𝑁_𝑐𝑜𝑛𝑣 layers, each one with time complexity 𝑂(𝑁_𝑓^2 ⋅ 𝑤_𝑓^2 ⋅ 𝑁^2), where 𝑤_𝑓 indicates the width of the convolution kernel, the second term the cost of the max-pooling layers and the third term the cost of the fully connected layers.

With respect to the space complexity, Algorithm 1 needs to store 𝑪_𝑖^𝑟, 𝑨^𝑟 and the intermediate clustering results {𝛱_1^𝑟, …, 𝛱_𝑘^𝑟}^{(𝑘)} during the learning procedure. Its space complexity is therefore 𝑂(𝑁_𝑟 ⋅ 𝑁^2) + 𝑂(𝑁_𝑟^2) + 𝑂(𝑅 ⋅ 𝑁_𝑟 ⋅ 𝐾). Considering that 𝑁^2 < 𝑁_𝑟 and 𝑅 ⋅ 𝐾 < 𝑁_𝑟 in most cases, the overall space complexity is 𝑂(𝑁_𝑟^2). In Stage 2, the space complexity is 𝑂(𝑁_𝑐𝑜𝑛𝑣 ⋅ 𝑁_𝑓^2 ⋅ 𝑤_𝑓^2) + 𝑂(𝑁_𝑓^2), where the first term accounts for the space needed for storing the parameters of the 𝑁_𝑐𝑜𝑛𝑣 convolutional layers, each one of space complexity equal to 𝑂(𝑁_𝑓^2 ⋅ 𝑤_𝑓^2), and the second term is the space complexity of the fully connected layers. Notice that no storage cost is associated to the max-pooling layers because they do not have weight parameters.

Table 1. Setting of the hyperparameters in Stage 1.
Table 2. The architecture of the Conceptor-based CNN network.

5. Case studies

The proposed method is compared to state-of-the-art methods for clustering and classification, considering one synthetic and two real case studies.

5.1. Hyperparameters setting for the proposed method

Table 1 reports the setting of the hyperparameters used to generate the Conceptors. Since the larger the reservoir size 𝑁, the larger the Memory Capacity (MC), which quantifies the memory span of the ESN (Büsing et al., 2010; Qiao et al., 2016), but also the larger the computational burden, a trade-off value of 𝑁 = 100 is chosen. The spectral radius 𝜌(|𝑾|) is set equal to 0.9 to allow a longer retainment of the system state, which requires a large spectral radius, while ensuring the echo state property, which requires 𝜌(|𝑾|) ≤ 1 (Yildiz et al., 2012). The connectivity 𝑐, i.e. the ratio between the number of connections in the reservoir and the number 𝑁^2 of all possible connections, is set equal to 5∕𝑁 to guarantee a proper MC without extensively increasing the computational burden (Qiao et al., 2016). The Conceptor aperture 𝛼, which can be interpreted as the scaling factor of the reservoir state in Eq. (3), is set equal to 1 according to Jaeger (2017).

The architecture of the CNN employed in this work is reported in Table 2. Inspired by the Visual Geometry Group (VGG) net architecture proposed by Simonyan and Zisserman (2014), multiple convolutional layers characterized by kernels with small receptive fields of size 𝑤_𝑓 × 𝑤_𝑓, with 𝑤_𝑓 = 3, are stacked. This architecture provides the ability of obtaining a wide receptive field with a relatively limited number of parameters (Simonyan & Zisserman, 2014). For example, a stack of two convolution layers with kernel size 3 × 3 gives a receptive field of size 5 × 5 using only 2 × (3 × 3) = 18 < 25 parameters, and a stack of three of these convolution layers gives a receptive field of size 7 × 7 using only 3 × (3 × 3) = 27 < 49 parameters. Notice that the wider the receptive field applied to a Conceptor, the more information on the degradation dynamic is captured. The convolution stride is fixed to 1 to provide more detailed degradation information, and spatial pooling is carried out by three max-pooling layers connected after the convolutional layers. Max-pooling is performed over a 2 × 2 window with stride 2. Note that the setting of the number of channels follows the principle according to which the number of channels doubles for each max-pooling layer added, and the base channel number is 𝑁_𝑓 = 64 (Simonyan & Zisserman, 2014).

The CNN training is carried out by using the method of mini-batch stochastic gradient descent with momentum (Sutskever et al., 2013). The mini-batch size is set to 64 and the momentum to 0.9 (Simonyan & Zisserman, 2014). The training is regularized by using 𝐿2 norm weight decay, with the 𝐿2 penalty multiplier set to 5 × 10^{−4} (Simonyan & Zisserman, 2014). The maximal number of epochs is set to 30.

5.2. Methods considered for the comparison

The methods considered for the comparison of the clustering results are characterized by different choices with respect to the type of input data (time windows of prefixed length, all the time series until the present time), the feature extraction technique (statistical features, Fourier transform, short time Fourier transform, deep learning), the distance measure (DTW, Euclidean) and the clustering algorithm (spectral clustering, k-means, self-organizing maps). Methods that receive in input time windows of pre-fixed length of the time series will be indicated by the letter (a) (Clustering Methods 1a, 2a, 3a and 4a) and methods that receive in input all the time series from time 𝑡_1 until the current time will be indicated by the letter (b) (Clustering Methods 1b, 2b, 3b and 4b).

Methods 1, 2, 3 and 4 in Table 3 have been used to perform an ablation study for the validation of the proposed Conceptor Clustering method. In particular: Method 1 uses the same clustering algorithm
Table 3
Clustering methods considered for the comparison. (a) indicates that the method receives in input a time window formed by the last 𝑙𝑤 signal
measurements, whereas (b) indicates that the input is from time 𝑡1 to the current time.
of the proposed method, from which it differs only for the distance metric. Therefore, it allows investigating the effect of measuring similarity using the Conceptors instead of another state-of-the-art distance, i.e. DTW; Method 2 allows investigating the effect of using Conceptors and Spectral Clustering instead of traditional feature extraction and selection techniques in the time domain; Method 3 allows investigating (i) the difference between Conceptors and Fourier transform-based approaches for feature extraction and selection in the frequency domain, and (ii) another neuron-based method for clustering; Method 4 allows comparing (i) the proposed method for feature extraction with state-of-the-art deep learning-based methods for feature extraction and (ii) a clustering method which combines features extracted by deep learning approaches with k-means; specifically, Clustering Method 4 uses a Generative Adversarial Network (GAN) to reconstruct the distribution of the data and an Auto-Encoder (AE), formed by an encoder and the GAN generator, to project the data into a latent space. Clustering Method 4b first transforms the variable-length time series into reservoir states and, then, applies the same procedure of Method 4a.

The design of the ablation study for the proposed clustering method has considered the fact that some combinations of feature extraction, selection and distance measure are not possible or are not expected to provide satisfactory results. For example, Clustering Method 1 directly uses the pairwise DTW distance for the clustering without performing an intermediate step of feature extraction and selection. Although Methods 2 and 3 can provide intermediate feature extraction and selection results, the feature selection procedure of Method 3 uses the phase values of the Fourier spectrum, which cannot be applied to Method 2. Clustering Method 4 does not support feature selection, because the deep learning methods (AE, GAN) used in the feature extraction procedure automatically generate the features required for the clustering purpose.

The other comparison methods have been introduced to investigate the effect of the spectral clustering algorithm with respect to other possible clustering approaches. Since Conceptors are matrices, whereas the state-of-the-art clustering approaches receive in input vectors, different techniques for feature extraction and selection have been used for the comparison.

With respect to the setting of the hyperparameters, the DTW warping path is set within 5 samples of a straight line fitting the two sequences; the parameter 𝜎 of the Spectral Clustering (Eq. (9)) and of the Laplacian Score is set to 1; the number of training epochs of the SOMs is set equal to 200, the number of neurons is set equal to the number of clusters 𝑘, the maximum number of neighbors to 3 and the learning rate to 0.02. The number of layers of the neural network in Method 4 is set to 2, and the number of neurons to 50. To avoid the complexity of optimizing the number of clusters, Clustering Methods 1, 2, 3 and 4 search for a number of clusters equal to the number of degradation levels of the component.

The methods considered for the comparison of the classification results are characterized by different feature extraction/representation techniques (windowed time series, Conceptors), distance measures (DTW and Frobenius norm of the Conceptor difference) and classification methods (KNN, TCN, LSTM and CNN). Classification Method 1 receives in input the time windows made of the last 𝑙_𝑤 measurements, whereas Classification Methods 2, 3, 4 and 5 receive in input all the time series from time 𝑡_1 to the current time 𝑡_𝑚. The choice of reducing the length of the time series provided in input to Classification Method 1 is motivated by its large computational complexity, which is quadratic with respect to the number of inputs. Classification Method 6 is combined with the proposed clustering method to verify the motivation of using a two-stage method, since it directly compares the similarity between the test sample and the cluster centers and, then, assigns the membership to the classes.

Methods 1, 2, 3, 4, 5 and 6 in Table 4 allow performing an ablation study for the validation of the proposed Conceptor-based CNN classification method. In particular, Method 1 investigates (i) the effect of providing in input the whole trajectory (used by the Conceptor) instead of time windows made of the last 𝑙_𝑤 measurements and (ii) the effect of measuring the similarity between Conceptors instead of the similarity between time series through DTW; Method 2 investigates the effect of using Conceptor-based CNN classifiers instead of state-of-the-art convolution-based classifiers of time series, characterized by the use of a TCN for extracting features by temporal convolution and of logistic regression as the last classification layer; Method 3 investigates the effect of using Conceptor-based CNN classifiers instead of state-of-the-art Recurrent Neural Networks; Method 4 investigates the effect of classifying Conceptors by using a CNN instead of KNN; and Method 5 investigates the effect of providing in input to the classifiers also the difference between the Conceptors extracted at the present and previous time steps. Method 6 investigates the need of adopting a two-stage approach with clustering and classification instead of a single-stage approach. Notice that the objective of the ablation study is not to compare the proposed classification method with all the possible combinations of input quantities, feature extraction and representation techniques, distance measures, and classification algorithms, but to justify the choices made in the design of the proposed method by comparison with state-of-the-art methods suitable for the specific problems to be tackled.

Due to the characteristics of time series classification, the feature extraction/representation procedures are designed for the different classification methods. For example, Classification Method 1 uses the DTW distance metric to perform the KNN classification; however, this is not applicable to the deep learning classification methods TCN and LSTM (end-to-end learning procedures) and to the Conceptor-based methods (Methods 4, 5 and 6), because DTW is not applicable to Conceptor matrices. Vice versa, the distance
Table 4
Classification methods considered for the comparison.
metric used in the Conceptor-based methods is not applicable to Method 1.

The hyperparameters of the classification methods are optimized by performing a grid search on a validation set with the objective of maximizing the classification accuracy. The number of nearest neighbors of KNN is searched in the range {1, 2, …, 51}. The filter size of the TCN convolution layer is searched in the set {3, 4, 5, 6, 7, 8} (Bai et al., 2018), the number of filters in the set {125, 150, 175, 200} and the number of residual blocks in the set {1, 2, 3, 4} (Bai et al., 2018). The number of hidden neurons of the LSTMs is searched in the set {50, 100, 200} and the number of LSTM layers in the set {1, 2}. The architecture of the CNNs used by the proposed method and by Classification Method 5 is reported in Table 2, with the parameters reported in Section 4.2. The hyperparameters used for generating the Conceptors used by Classification Methods 4 and 5 and by the proposed method are listed in Table 1.

5.3. Performance metrics

The performance of the methods is evaluated at the two stages of: (i) unsupervised clustering of the 𝑅 run-to-failure training trajectories 𝑝^𝑟_{1∶𝑆,1∶𝑁_𝑟}, 𝑟 = 1, …, 𝑅, and (ii) classification of the degradation level of the 𝑅_𝑡𝑒𝑠𝑡 test trajectories 𝑝^𝑡𝑒𝑠𝑡_{1∶𝑆,1∶𝑁_𝑡𝑒𝑠𝑡}, 𝑡𝑒𝑠𝑡 = 1, …, 𝑅_𝑡𝑒𝑠𝑡. The metrics Accuracy and Normalized Mutual Information (NMI) are used in both cases. Considering the generic 𝑟th run-to-failure training trajectory 𝑝^𝑟_{1∶𝑆,1∶𝑁_𝑟} and indicating as 𝑚 the segment of the trajectory 𝑝^𝑟_{1∶𝑆,1∶𝑚} going from time 𝑡_1 to time 𝑡_𝑚, with 𝑚 = 1, 𝑙_𝑤, 2𝑙_𝑤, …, 𝑁_𝑟, the Accuracy of the clustering of trajectory 𝑟 is:

𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦_𝑟 = ( ∑_{𝑚=1}^{𝑁_𝑟} 𝛿(𝑦_𝑚, 𝑐_𝑚) ) ∕ 𝑁_𝑟,  𝑟 = 1, …, 𝑅  (11)

where 𝑦_𝑚 and 𝑐_𝑚 are the ground-truth and the assigned degradation

where 𝐻[𝑦_1, …, 𝑦_{𝑁_𝑟}] and 𝐻[𝑐_1, …, 𝑐_{𝑁_𝑟}] indicate the entropy of the true and of the assigned degradation levels obtained from the clustering of the 𝑟th trajectory, respectively:

𝐻[𝑦_1, …, 𝑦_{𝑁_𝑟}] = − ∑_{𝛽=1}^{𝑔} 𝑝_𝑟(𝛽) log 𝑝_𝑟(𝛽),
𝐻[𝑐_1, …, 𝑐_{𝑁_𝑟}] = − ∑_{𝛾̂=1}^{𝑔̂} 𝑝_𝑟(𝛾̂) log 𝑝_𝑟(𝛾̂)  (14)

A NMI value of 1 indicates the most satisfactory performance.

Similarly, the Accuracy metric assessing the performance in the classification of a test trajectory is:

𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦_𝑡𝑒𝑠𝑡 = ( ∑_{𝑚=1}^{𝑁_𝑡𝑒𝑠𝑡} 𝛿(𝑦_𝑚, 𝑐_𝑚) ) ∕ 𝑁_𝑡𝑒𝑠𝑡  (15)

and the MI metric is:

𝑀𝐼_𝑡𝑒𝑠𝑡 = ∑_{𝛾̂=1}^{𝑔̂} ∑_{𝛽=1}^{𝑔} 𝑝_𝑡𝑒𝑠𝑡(𝛽, 𝛾̂) log [ 𝑝_𝑡𝑒𝑠𝑡(𝛽, 𝛾̂) ∕ ( 𝑝_𝑡𝑒𝑠𝑡(𝛽) 𝑝_𝑡𝑒𝑠𝑡(𝛾̂) ) ]  (16)

5.4. Case study I: Synthetic dataset

We consider a component characterized by a discrete three-level degradation process (Alaswad & Xiang, 2017). A run-to-failure trajectory is simulated by using an auxiliary continuous variable 𝜂(𝑡), which evolves following the exponential function (Gebraeel, 2006):

𝜂(𝑡) = 𝜃 ⋅ 𝑒^{𝛽𝑡}  (17)

where 𝜃 and 𝛽 are independent random variables associated to a specific component, with ln 𝜃 ∼ 𝒩(−3, 0.6^2) and 𝛽 ∼ 𝒩(−0.015, 0.003^2). The component fails when 𝜂(𝑡) reaches the failure threshold 𝜂_𝑓𝑎𝑖𝑙𝑢𝑟𝑒 = 1,
whereas the first state transition, from degradation level 1 to degrada-
level of segment 𝑚, respectively, 𝛿(𝑦𝑚 , 𝑐𝑚 ) is the delta function, which
tion level 2, occurs when 𝜂 (𝑡) reaches (1∕3) ⋅ 𝜂𝑓 𝑎𝑖𝑙𝑢𝑟𝑒 , and the second
is equal to 1 if 𝑦𝑚 = 𝑐𝑚 , and 0 otherwise. The Accuracy metric is in the
state transition, from degradation level 2 to degradation level 3, occurs
range [0, 1], with 1 indicating the most satisfactory performance.
when 𝜂 (𝑡) reaches 2∕3 ⋅ 𝜂𝑓 𝑎𝑖𝑙𝑢𝑟𝑒 .
The Mutual Information (MI ) metric measures the degree of depen-
To mimic the complexity of a real industrial case, the measured
dency between the ground-truth and assigned degradation levels (Cakir
signals are influenced by component degradation, operational and en-
et al., 2019). It is preferred to the Accuracy metric when the dataset
vironmental conditions, and process noise. Operational and environ-
is imbalanced (Jain, 2010); the MI on the 𝑟th trajectory is (Ye et al.,
mental conditions are assumed to have periodic behaviors to simulate
2018):
seasonal effects:
∑
𝑔̂ ∑
𝑔
𝑝𝑟 (𝛽, ̂
𝛾) ( )
2𝜋
𝑀𝐼𝑟 = 𝑝𝑟 (𝛽, ̂
𝛾 ) log , 𝑟 = 1, … , 𝑅 (12) 𝛤 (𝑡) =𝑠𝑖𝑛 𝑡 +𝜔𝑐 (𝑡) (18)
𝛾 =1 𝛽=1
̂
𝑝𝑟 (𝛽) 𝑝𝑟 (̂
𝛾) 50
with 𝜔𝑐 (𝑡) being a Gaussian noise with distribution (0, 0.2) represent-
where, considering a randomly sampled segment 𝑚 extracted from the
ing the stochasticity of the environmental changes. The process noise
𝑟th training trajectory, 𝑝𝑟 (𝛽) and 𝑝𝑟 (̂
𝛾 ) are the probabilities that its
𝜔 (𝑡) is sampled from the Gaussian distribution 𝑁(0, 0.1). Component
true and assigned degradation levels are 𝛽 and ̂ 𝛾 , respectively, 𝑝𝑟 (𝛽, ̂
𝛾)
degradation is quantified by the step function 𝐷(𝑡) (Nourelfath et al.,
is the joint probability that its true degradation level is 𝛽 and its
2012):
assigned degradation level is ̂ 𝛾 , and 𝑔 and 𝑔̂ are the number of true
and assigned degradation levels, respectively. NMI normalizes MI in ⎧ 1 ⋅ ∫ 𝑡12 𝜂 (𝑡) 𝑑𝑡 𝑖𝑓 𝑑𝑒𝑔𝑟𝑎𝑑𝑎𝑡𝑖𝑜𝑛 𝑙𝑒𝑣𝑒𝑙= 1
the range [0, 1] (Wulandari et al., 2019; Ye et al., 2018): ⎪ 𝑡12 0
⎪ 1 𝑡23
𝑀𝐼𝑟 𝐷 (𝑡) = ⎨ 𝑡23 −𝑡12 ⋅ ∫𝑡12 𝜂 (𝑡) 𝑑𝑡 𝑖𝑓 𝑑𝑒𝑔𝑟𝑎𝑑𝑎𝑡𝑖𝑜𝑛 𝑙𝑒𝑣𝑒𝑙= 2 (19)
𝑁𝑀𝐼𝑟 = ( ) (13) ⎪ 1 𝑡𝑓
𝐻[𝑦1 , …,𝑦𝑁𝑟 ] + 𝐻[𝑐1 , …,𝑐𝑁𝑟 ] ∕2 ⎪ 𝑡 −𝑡 ⋅ ∫𝑡23 𝜂 (𝑡) 𝑑𝑡 𝑖𝑓 𝑑𝑒𝑔𝑟𝑎𝑑𝑎𝑡𝑖𝑜𝑛 𝑙𝑒𝑣𝑒𝑙= 3
⎩ 𝑓 23
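For concreteness, the metrics of Eqs. (11)–(16) can be computed directly from arrays of true and assigned degradation levels. The sketch below is a minimal NumPy implementation (the function names are ours, not from the paper's code; natural logarithms are assumed, since the base only rescales MI and cancels in NMI):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Eq. (11)/(15): fraction of segments whose assigned level matches the true one
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def entropy(labels):
    # Eq. (14): entropy of the empirical label distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(y_true, y_pred):
    # Eq. (12)/(16): MI between true and assigned degradation levels
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mi = 0.0
    for b in np.unique(y_true):
        for g in np.unique(y_pred):
            p_joint = np.mean((y_true == b) & (y_pred == g))
            if p_joint > 0:
                p_b = np.mean(y_true == b)
                p_g = np.mean(y_pred == g)
                mi += p_joint * np.log(p_joint / (p_b * p_g))
    return float(mi)

def nmi(y_true, y_pred):
    # Eq. (13): MI normalized by the arithmetic mean of the two entropies
    denom = (entropy(y_true) + entropy(y_pred)) / 2.0
    return mutual_information(y_true, y_pred) / denom
```

For a label-preserving clustering the NMI equals 1; for a pure permutation of the cluster labels the Accuracy drops while the NMI stays at 1, which is exactly why NMI is the appropriate metric when cluster indices carry no intrinsic meaning.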
with 𝑡12 , 𝑡23 , 𝑡𝑓 indicating the times of the transitions from state 1 to
state 2, from state 2 to state 3 and from state 3 to failure, respectively.
In practice, 𝐷(𝑡) indicates the expected value of the variable 𝜂(𝑡) when
the component is in a given degradation state.
The values of ten signals χ_s(t), s = 1, …, 10, are simulated by using equations that each combine at least two of the three factors D(t), Γ(t) and ω(t).
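Under the stated model, a run-to-failure trajectory can be simulated as in the sketch below. For reproducibility, θ and β are fixed at the means of their distributions (in the paper they are sampled per component); taking the mean of β as positive (0.015) is our reading, since η(t) must grow from θ ≈ e⁻³ toward the failure threshold 1 to produce trajectory lengths of 150–250 steps. The ten signal-mixing equations are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Eq. (17): eta(t) = theta * exp(beta * t); fixed at the distribution means
# (in the paper: ln(theta) ~ N(-3, 0.6^2), beta ~ N(0.015, 0.003^2))
theta, beta = np.exp(-3.0), 0.015
t = np.arange(0, 1000)
eta = theta * np.exp(beta * t)

# state-transition and failure times from the thresholds on eta(t)
eta_failure = 1.0
t_12 = int(np.argmax(eta >= eta_failure / 3))      # level 1 -> level 2
t_23 = int(np.argmax(eta >= 2 * eta_failure / 3))  # level 2 -> level 3
t_f = int(np.argmax(eta >= eta_failure))           # failure

# Eq. (18): periodic operational/environmental factor with Gaussian noise
gamma = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.2, size=t.shape)
# process noise omega(t)
omega = rng.normal(0.0, 0.1, size=t.shape)

# Eq. (19): D(t) is the average of eta over the current degradation state
D = np.empty(t_f)
D[:t_12] = eta[:t_12].mean()
D[t_12:t_23] = eta[t_12:t_23].mean()
D[t_23:t_f] = eta[t_23:t_f].mean()
```

With these mean parameter values the transitions fall near t ≈ 127 and t ≈ 173 and failure near t ≈ 200, consistent with the trajectory lengths reported below.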
A total of 200 run-to-failure degradation trajectories have been simulated by sampling β and ln θ from their distributions, and applying Eq. (19) to obtain the corresponding signal values. They are divided into a set of 100 training trajectories used for clustering the patterns within each trajectory and 100 test trajectories used for verifying the classification performance. The lengths of the time series are set randomly among 150, 160, …, 250; an example of a trajectory of length equal to 200 is given in Fig. 6.

Fig. 6 shows the obtained evolution of D(t), Γ(t) and ω(t) during a simulated run-to-failure trajectory representing the life of one component. Notice that the ten measured signals χ_1(t), …, χ_10(t) do not explicitly show any evolution trend that is easy to correlate to the equipment degradation process.

The hyperparameters and model architecture of the proposed method are set equal to the values reported in Tables 1 and 2. The first ten Conceptor matrices of each trajectory are discarded due to the burn-in period of the reservoir states caused by their initialization, and the Conceptor matrix is computed for segment m with m = 11l_w, 12l_w, …, N_r (l_w = 5). Fig. 7 shows the pairwise distance matrix among the segments p^r_{1:S,1:m} extracted from a run-to-failure training trajectory, as provided by the proposed method. It can be seen that the segments characterized by the same degradation level are similar among them (lighter colors) and dissimilar to those of different degradation levels (darker colors). This effect can also be seen in Fig. 8, which shows the scatter plot of the Conceptors in the space of the first three normalized eigenvectors found by the spectral clustering algorithm. Time sequences of the same degradation level are close and those of different degradation levels are well separated.

Fig. 9 shows the distribution of the number of clusters {k*_r}_{r=1,…,R=100} identified by applying the spectral clustering algorithm to the training trajectories. Notice that the statistical mode of the distribution corresponds to the true number of degradation states in the dataset.

Table 5 reports the obtained clustering performances. Notice that the proposed method provides significantly more satisfactory performances than those of Methods 1, 2, 3 and 4. This is due to the capabilities of the proposed Conceptors-aided clustering method of effectively capturing the global degradation dynamics and of the Frobenius distance of assessing the similarity among the Conceptors.

In the second stage, whose objective is the classification of the test trajectories, the following sets of labels are associated to the trajectory segments p^r_{1:S,1:m} (r = 1, …, R, m = 1, l_w, 2l_w, …, N_r) of the training set to train the classifiers: (i) the labels provided by the clustering Methods 1a, 2b, 3a and 4b, which have been shown to be
Table 5
Comparison of the performance obtained in clustering the training trajectories. The best performance is reported in bold-italic; (a) indicates that the method receives in input a time window formed by the last l_w signal measurements, whereas (b) indicates that the input is formed by the time interval between t_1 and the current time.
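The pairwise distance matrix of Fig. 7 and the spectral embedding of Fig. 8 can be sketched as below, assuming the Conceptors are given as a list of square matrices. The Gaussian affinity with σ = 1 follows Eq. (9) as set in Section 5.2; the symmetric-normalized Laplacian and the Ng–Jordan–Weiss-style row normalization are the standard spectral clustering recipe and are our assumption about the exact variant used:

```python
import numpy as np

def frobenius_distance_matrix(conceptors):
    # pairwise Frobenius distances between Conceptor matrices
    n = len(conceptors)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(conceptors[i] - conceptors[j], ord="fro")
            dist[i, j] = dist[j, i] = d
    return dist

def spectral_embedding(dist, k, sigma=1.0):
    # Gaussian affinity built from the distance matrix (sigma = 1 as in Section 5.2)
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = np.eye(len(W)) - d_inv_sqrt @ W @ d_inv_sqrt
    # eigenvectors of the k smallest eigenvalues, row-normalized
    _, vecs = np.linalg.eigh(L)  # eigh returns ascending eigenvalues
    U = vecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

# toy demo: two groups of perturbed matrices stand in for Conceptors of two
# degradation levels (illustrative data, not the paper's reservoir states)
rng = np.random.default_rng(1)
group1 = [0.1 * np.eye(3) + rng.normal(0.0, 0.01, (3, 3)) for _ in range(4)]
group2 = [0.9 * np.eye(3) + rng.normal(0.0, 0.01, (3, 3)) for _ in range(4)]
dist = frobenius_distance_matrix(group1 + group2)
embedding = spectral_embedding(dist, k=2)
```

In this toy example the within-group Frobenius distances are much smaller than the between-group ones, reproducing the block structure visible in Fig. 7.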
Fig. 10. Confusion matrix of the degradation level estimation results provided by the
proposed clustering and classification methods.
Table 6
Comparison of the performance in the classification of the test trajectories. The best performance of each line is reported in bold-italic; notice that the proposed Conceptors-based CNN achieves the best performance when true labels are available. Note that Classification Methods 4, 5 and 6 act as the ablation study to verify the importance of each procedure in the proposed classification method and, thus, are not combined with other clustering methods.
Fig. 11. Example of transformation of the raw signals into a multivariate time series.
Fig. 12. Pairwise distance matrix for bearing No. 4.
5.5. Case study II: IMS bearing dataset

The bearing dataset provided by the Center for Intelligent Maintenance Systems (IMS) of the University of Cincinnati (Qiu et al., 2006) is considered. A shaft supporting a 6000 lbs load is coupled to an AC motor and rotates at 2000 rpm. Four force-lubricated bearings are mounted on the shaft and two accelerometers are placed close to each bearing for measuring the vibrations at a frequency of 20 kHz. Every 10 min of operation the acceleration signals are measured for a time window of 1 s containing 20 480 values. In this work, we consider the degradation trajectories of bearings No. 3 and 4, which are the only ones that fail in experiment 1 (Qiu et al., 2006). Each trajectory is formed by N_r = 2156 (r = 1, 2) 1-second time windows, with r = 1, 2 denoting bearings No. 3 and 4, respectively. The onset of abnormal conditions has been detected on bearing No. 3 at times 1617 (Qiu et al., 2006) and 2027 (Hasani et al., 2017), and on bearing No. 4 at times 1617 (Qiu et al., 2006), 1760 (Yu, 2012) and 1641 (Hasani et al., 2017). Given the unavailability of the ground-truth time of the transition from a normal (healthy) to an abnormal (degraded) condition, we assume that it has occurred at the average time among the transition times reported in the literature (1822 for bearing No. 3 and 1672 for bearing No. 4), and we refer to these two quantities as the reference transition times.

Given the large dimensionality of the data, which makes it unfeasible to directly use all the acceleration values acquired at 20 kHz as input of the clustering and classification methods, a feature extraction procedure has been applied. In particular, according to Jardine et al. (2006) and Yang et al. (2021), the six statistical features of mean, root mean square, standard deviation, kurtosis, skewness and min–max range of the acceleration signal measured by the sensor closest to the monitored bearing are computed from each 1 s time window. This way, the raw signals are transformed into multivariate time series features as illustrated in Fig. 11. The Conceptor matrix is then computed when ten 1-second windows are acquired and the corresponding six features are extracted. The first 20 Conceptor matrices of each trajectory are discarded due to the burn-in period of the reservoir states caused by their initialization, and the Conceptor matrices are computed every five measurements.

By applying Algorithm 1, an optimal number of degradation levels equal to 2 has been identified. Fig. 12 shows the pairwise distance matrix among the segments p^r_{1:S,1:m} with m = 21l_w, 22l_w, …, N_r (l_w = 10) extracted from the run-to-failure trajectory of bearing No. 4. It can be seen that there is a clear partition between degradation levels 1 (healthy) and 2 (degraded). Notice that the separation boundary between degradation levels 1 and 2 fits the reference value of 1672. The vertical and horizontal lines indicate the average of the times of degradation onset detection in the literature works.

Table 7 reports the clustering performance. The proposed method is more accurate in the labeling of the training trajectories than the other clustering methods. It is interesting to notice that Clustering Methods 1b, 3b and 4b, which receive in input the whole trajectory, are not able to properly deal with the long-term dynamic of the degradation process and, therefore, underperform with respect to Clustering Methods 1a, 3a and 4a. Due to the better performance of time series representation and similarity measurement, the Conceptor Clustering method outperforms the other clustering methods.

The labels assigned to the segments of the trajectories by Clustering Methods 1a, 2b, 3a and 4a, which are the best options among the corresponding methods (a) and (b), have been considered for the comparison of the classification results.
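The six statistical features used in case studies II and III can be extracted from each window as in the sketch below. SciPy's default conventions for kurtosis and skewness are an assumption, since the paper does not specify whether excess (Fisher) kurtosis is used:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def window_features(x):
    """Six statistical features of one acceleration window (1-D array)."""
    x = np.asarray(x, dtype=float)
    return np.array([
        x.mean(),                    # mean
        np.sqrt(np.mean(x ** 2)),    # root mean square
        x.std(),                     # standard deviation
        kurtosis(x),                 # kurtosis (SciPy default: excess/Fisher)
        skew(x),                     # skewness
        np.ptp(x),                   # min-max range
    ])

# a 1 s window at 20 kHz (IMS dataset) holds 20 480 samples and is mapped
# to a single 6-dimensional pattern (synthetic noise used here for the demo)
window = np.random.default_rng(2).normal(0.0, 1.0, 20_480)
features = window_features(window)
```

Each run-to-failure trajectory thus becomes a 6-dimensional multivariate time series with one pattern per window, which is what the reservoir and the Conceptor computation receive in input.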
Table 7
Comparison of the performance in clustering the training trajectories. The best performances are reported in bold-italic. (a) indicates that the
method receives in input a time window formed by the last 𝑙𝑤 signal measurements, whereas (b) indicates that the input is from time 𝑡1 to
the current time.
Table 8
Comparison of the performance in the classification of the test trajectories. The best performance of each line is reported in bold-italic; notice that the proposed Conceptors-based CNN achieves the best accuracy and NMI values. Note that Classification Methods 4, 5 and 6 act as the ablation study to verify the importance of each procedure in the proposed classification method and, thus, are not combined with other clustering methods.
A leave-one-out cross-validation strategy is employed for setting the scaling factor β used for the classification. In practice, one trajectory is used for training and the other one for performance evaluation, and vice-versa. The obtained optimal value of β is 1.

Table 8 reports the degradation level estimation performances on the test trajectories. Similarly, it is organized for two purposes: (a) the comparison of the classification performances of the proposed classification method and of the others, and (b) the comparison of the results of unsupervised degradation level estimation obtained by the proposed two-stage approach and by the compositional method of combining other clustering and classification methods. According to the results, for purpose (a), given that only very few trajectories (only 2) are used for training, the accuracies of the TCN (Classification Method 2) and of the LSTM (Classification Method 3) are worse than those of the KNN and of the proposed method, which shows that the proposed classification method does not easily overfit the training data; for purpose (b), the proposed two-stage degradation level estimation approach outperforms the combinations of other clustering and classification methods. Note that the proposed two-stage Conceptor-CNN method achieves slightly better performance than Method 6, which compares the similarity to the cluster centers and assigns the membership. Therefore, the use of a training set with some errors can improve the generalization ability of the models and, therefore, improve the classification performance.

5.6. Case study III: PHM challenge 2012 bearing dataset

We consider the dataset of the PHM challenge 2012, which contains bearings run-to-failure data obtained using the PRONOSTIA experimental platform (Nectoux et al., 2012b).

Three different operating conditions characterized by different motor rotation speeds and loads on the bearings were considered during the run-to-failure degradation experiments: 1800 rpm and 4000 N (1st operating condition), 1650 rpm and 4200 N (2nd operating condition) and 1500 rpm and 5000 N (3rd operating condition). Two accelerometers are installed in the horizontal and vertical positions of the shaft to measure vibrations at a frequency of 25.6 kHz. Time windows of length equal to 0.1 s, containing 2560 acceleration values, are measured every 10 s. Seven bearings are tested under the 1st and 2nd operating conditions. Table 9 reports the times at which the onset of abnormal conditions has been detected in literature works. Notice that no transitions to abnormal conditions have been observed for bearings No. 3, 4, 5 and 7 of the experiment in the 2nd operating condition. The experiment in the 3rd operating condition is not considered, since no literature works have investigated the 3rd operating condition and we cannot find the reference transition times.

The same six statistical features (mean, root mean square, standard deviation, kurtosis, skewness and min–max range) of the acceleration signal considered in case study II have been provided in input to the classification methods. They are extracted from the raw signal measurements acquired during the 0.1-second windows containing 2560 acceleration values. Therefore, each time window becomes a single 6-dimensional pattern of the time series given in input to the clustering and classification algorithms.

The Conceptor matrix is computed each time 10 windows are acquired and, as before, the first 20 Conceptor matrices of each run-to-failure trajectory are discarded due to the burn-in period of the reservoir states caused by their initialization. By applying Algorithm 1, an optimal number of degradation levels equal to 2 has been identified.

Table 10 reports the performance of the clustering methods. The proposed clustering method provides the most satisfactory clustering of the training trajectories in operating condition 1. With respect to the 2nd operating condition, the proposed method provides the largest NMI and an Accuracy slightly smaller than that of Clustering Method 3a. Note that Clustering Method 4 (deep learning-based) is nearly the worst among the comparison methods, because the run-to-failure trajectories are few and, thus, the deep learning approach is prone to overfit and obtains poor generalization ability. Due to the better performance of time series representation and similarity measurement, the Conceptor Clustering method outperforms the other clustering methods.

With respect to the classification of the test trajectories, a leave-one-out cross-validation procedure is applied to the labeled trajectories and the average accuracy is used as the optimization objective to set the scaling factor β. The obtained optimal β is 10 000 for the 1st operating condition and 1 for the 2nd operating condition.

Table 11 reports the degradation level estimation performances obtained on the test trajectories. Note that, due to the absence of true labels, the Table is organized to compare the results of unsupervised degradation level estimation obtained by the proposed two-stage approach and by the compositional method of combining other clustering and classification methods. The proposed two-stage approach provides the best performance in terms of Accuracy on the trajectories of the 1st and 2nd operating conditions. The NMI metric estimations are affected by large uncertainty due to the highly unbalanced number of observations in normal (healthy) and abnormal (degraded) conditions. Therefore, a difference of a few missed or false alarms can cause a large modification of the NMI metric. Notice that, although classification
Table 9
Times of transition from the healthy state to the degraded state in the literature works.
Operating condition # of bearing Time of state transition in literature works Mean value
Mao, Zhang et al. (2020) Xiao et al. (2020) Jin et al. (2014) Mao, Tian et al. (2020)
1st No. 1 1405 1462 1129 1410 1351.5
1st No. 2 826 826 762 – 804.7
1st No. 3 1174 1365 891 1204 1158.5
1st No. 4 1087 1084 1083 – 1084.7
1st No. 5 2443 2411 1141 – 1998.3
1st No. 6 1590 1631 1641 – 1620.7
1st No. 7 2212 2206 885 – 1767.7
2nd No. 1 155 – – – 155
2nd No. 2 255 – – – 255
2nd No. 6 688 – – – 688
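The "Mean value" column of Table 9 is, as the numbers allow to verify, the average of the available literature transition times, with missing entries ("–") skipped; e.g., for bearing No. 2 under the 1st operating condition:

```python
# transition times for bearing No. 2, 1st operating condition (Table 9);
# None marks the entry missing in Mao, Tian et al. (2020)
times = [826, 826, 762, None]
available = [t for t in times if t is not None]
mean_value = sum(available) / len(available)  # 2414 / 3 = 804.66..., reported as 804.7
```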
Table 10
Comparison of the performance in clustering the training trajectories. The best performance of each line is reported in italic. (a) indicates that
the method receives in input a time window formed by the last 𝑙𝑤 signal measurements, whereas (b) indicates that the input is from time 𝑡1
to the current time.
Table 11
Comparison of the performance in the classification of the test trajectories. The best performance of each line is reported in bold-italic; notice that the proposed Conceptor-based CNN achieves the best accuracy for both operating conditions. Note that Classification Methods 4, 5 and 6 act as the ablation study to verify the importance of each procedure in the proposed classification method and, thus, are not combined with other clustering methods.
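The leave-one-out selection of the scaling factor β used in case studies II and III can be sketched as below; `train_and_score` stands for fitting the classifier on the retained labeled trajectories and evaluating accuracy on the held-out one, and is a placeholder for the paper's Conceptor-CNN pipeline, not part of it:

```python
import numpy as np

def loo_select_beta(trajectories, candidate_betas, train_and_score):
    """Leave-one-out selection of the scaling factor beta.

    train_and_score(train_idx, test_idx, beta) must return the accuracy on
    the held-out trajectory; it stands in for the actual training and
    evaluation of the classifier."""
    n = len(trajectories)
    best_beta, best_acc = None, -np.inf
    for beta in candidate_betas:
        held_out_scores = [
            train_and_score([i for i in range(n) if i != j], [j], beta)
            for j in range(n)
        ]
        avg = float(np.mean(held_out_scores))
        if avg > best_acc:
            best_beta, best_acc = beta, avg
    return best_beta, best_acc

# toy stand-in whose held-out accuracy peaks at beta = 1 (illustrative only)
def fake_score(train_idx, test_idx, beta):
    return 1.0 / (1.0 + abs(np.log10(beta)))

best_beta, best_acc = loo_select_beta(["traj_A", "traj_B"], [0.1, 1, 10, 10_000], fake_score)
```

With only two labeled trajectories, as in case study II, the loop reduces exactly to "train on one, test on the other, and vice-versa", and the β maximizing the average held-out accuracy is retained.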
method 6 is slightly better than classification method 4, method 6 is worse than the other methods and than the proposed two-stage Conceptor-based approach. This is because method 6 is prone to be affected by outlier trajectories when only a few run-to-failure trajectories are available.

6. Conclusion

A two-stage method based on the combined use of Conceptors and CNNs has been proposed for data-driven degradation level estimation in the case in which only few unlabeled run-to-failure degradation trajectories are available.

The proposed method has outperformed state-of-the-art methods in accuracy on a synthetic case study and on two literature case studies containing vibrational data extracted from bearings, with a competitive computational efficiency. The obtained results show that Conceptors allow effectively dealing with multivariate time series characterized by long-term temporal dependencies, which are difficult to treat with methods based on the use of sliding time windows. Also, the combination of Conceptors and convolutional neural networks has been shown to be effective in the classification of the degradation level of test components.

It is, therefore, possible to conclude that the proposed method contributes to overcoming one of the main limitations toward the practical estimation of equipment degradation level, i.e., the need for a large amount of labeled data for model training.

CRediT authorship contribution statement

Mingjing Xu: Methodology, Writing – original draft, Writing – review & editing, Coding with python. Piero Baraldi: Conceptualization, Methodology, Supervision, Reviewing and editing. Zhe Yang: Synthetic dataset preparing. Enrico Zio: Conceptualization, Methodology, Supervision, Reviewing and editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

The work is developed within the research project ‘‘SMART MAINTENANCE OF INDUSTRIAL PLANTS AND CIVIL STRUCTURES BY 4.0 MONITORING TECHNOLOGIES AND PROGNOSTIC APPROACHES -
MAC4PRO’’, sponsored by the call BRIC-2018 of the National Institute for Insurance against Accidents at Work – INAIL. Mingjing Xu gratefully acknowledges the financial support from the China Scholarship Council (No. 201606420061).

References

Al-Dahidi, S., Di Maio, F., Baraldi, P., Zio, E., & Seraoui, R. (2018). A framework for reconciliating data clusters from a fleet of nuclear power plants turbines for fault diagnosis. Applied Soft Computing, 69, 213–231.
Alaswad, S., & Xiang, Y. (2017). A review on condition-based maintenance optimization models for stochastically deteriorating system. Reliability Engineering & System Safety, http://dx.doi.org/10.1016/j.ress.2016.08.009.
Ali, J. B., Fnaiech, N., Saidi, L., Chebel-Morello, B., & Fnaiech, F. (2015). Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals. Applied Acoustics, 89, 16–27.
Arn, R., Narayana, P., Draper, B., Emerson, T., Kirby, M., & Peterson, C. (2018). Motion segmentation via generalized curvatures. IEEE Transactions on Pattern Analysis and Machine Intelligence, http://dx.doi.org/10.1109/TPAMI.2018.2869741.
Atamuradov, V., & Medjaher, K. (2018). Railway point machine prognostics based on feature fusion and health state assessment. IEEE Transactions on Instrumentation and Measurement, PP(99), 1–14.
Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271.
Baraldi, P., Di Maio, F., Rigamonti, M., Zio, E., & Seraoui, R. (2015a). Clustering for unsupervised fault diagnosis in nuclear turbine shut-down transients. Mechanical Systems and Signal Processing, 58, 160–178.
Baraldi, P., Di Maio, F., Rigamonti, M., Zio, E., & Seraoui, R. (2015b). Unsupervised clustering of vibration signals for identifying anomalous conditions in a nuclear turbine. Journal of Intelligent & Fuzzy Systems, 28(4), 1723–1731.
Baraldi, P., Di Maio, F., & Zio, E. (2013). Unsupervised clustering for fault diagnosis in nuclear power plant components. International Journal of Computational Intelligence Systems, 6(4), 764–777.
Belhadi, A., Djenouri, Y., Srivastava, G., Djenouri, D., & Lin, C. W. (2020). A two-phase anomaly detection model for secure intelligent transportation ride-hailing trajectories. IEEE Transactions on Intelligent Transportation Systems, PP(99).
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, http://dx.doi.org/10.1109/TPAMI.2013.50, arXiv:1206.5538.
Borisagar, K. R., Thanki, R. M., & Sedani, B. S. (2019). Fourier transform, short-time Fourier transform, and wavelet transform. In Speech enhancement techniques for digital hearing aids (pp. 63–74). Springer.
Büsing, L., Schrauwen, B., & Legenstein, R. (2010). Connectivity, dynamics, and memory in reservoir computing with binary and analog neurons. Neural Computation, 22(5), 1272–1311.
Cakir, F., He, K., Bargal, S. A., & Sclaroff, S. (2019). Hashing with mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10), 2424–2437.
Cao, C., Huang, Y., Yang, Y., Wang, L., Wang, Z., & Tan, T. (2019). Feedback convolutional neural network for visual localization and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, http://dx.doi.org/10.1109/TPAMI.2018.2843329.
Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies. Expert Systems with Applications, 83, 187–205.
Costa, B. S. J., Angelov, P. P., & Guedes, L. A. (2015). Fully unsupervised fault detection and identification based on recursive density estimation and self-evolving cloud-based classifier. Neurocomputing, 150, 289–303.
Diaz, M., Henriquez, P., Ferrer, M. A., Pirlo, G., Alonso, J. B., Carmona-Duarte, C., & Impedovo, D. (2017). Stability-based system for bearing fault early detection. Expert Systems with Applications, 79, 65–75.
Dibaj, A., Ettefagh, M. M., Hassannejad, R., & Ehghaghi, M. B. (2021). A hybrid fine-tuned VMD and CNN scheme for untrained compound fault diagnosis of rotating machinery with unequal-severity faults. Expert Systems with Applications, 167, Article 114094.
Ellefsen, A. L., Bjørlykhaug, E., Æsøy, V., & Zhang, H. (2019). An unsupervised reconstruction-based fault detection algorithm for maritime components. IEEE Access, 7, 16101–16109.
Ferreira, A. A., Ludermir, T. B., & De Aquino, R. R. B. (2013). An approach to reservoir computing design and training. Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa.2013.01.029.
Gangsar, P., & Tiwari, R. (2020). Signal based condition monitoring techniques for fault detection and diagnosis of induction motors: A state-of-the-art review. Mechanical Systems and Signal Processing, 144, Article 106908.
Gebraeel, N. (2006). Sensory-updated residual life distributions for components with exponential degradation patterns. IEEE Transactions on Automation Science and Engineering, http://dx.doi.org/10.1109/TASE.2006.876609.
Ghafoori, Z., Erfani, S. M., Bezdek, J. C., Karunasekera, S., & Leckie, C. (2020). LN-SNE: Log-normal distributed stochastic neighbor embedding for anomaly detection. IEEE Transactions on Knowledge and Data Engineering, http://dx.doi.org/10.1109/TKDE.2019.2934450.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 315–323).
Han, T., Liu, C., Wu, L., Sarkar, S., & Jiang, D. (2019). An adaptive spatiotemporal feature learning approach for fault diagnosis in complex systems. Mechanical Systems and Signal Processing, http://dx.doi.org/10.1016/j.ymssp.2018.07.048.
Hasani, R. M., Wang, G., & Grosu, R. (2017). An automated auto-encoder correlation-based health-monitoring and prognostic method for machine bearings. arXiv preprint arXiv:1703.06272.
He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. Advances in Neural Information Processing Systems, 18, 507–514.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Jaeger, H. (2017). Using conceptors to manage neural long-term memories for temporal patterns. Journal of Machine Learning Research, 18(1), 387–429.
Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666.
Jardine, A. K., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 20(7), 1483–1510.
Jiao, J., Zhao, M., & Lin, J. (2020). Unsupervised adversarial adaptation network for intelligent fault diagnosis. IEEE Transactions on Industrial Electronics, 67(11), 9904–9913. http://dx.doi.org/10.1109/TIE.2019.2956366.
Jin, X., Sun, Y., Shan, J., Wang, Y., & Xu, Z. (2014). Health monitoring and fault detection using wavelet packet technique and multivariate process control method. In 2014 prognostics and system health management conference (PHM-2014 Hunan) (pp. 257–260). IEEE.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
Kwon, D., Kim, H., Kim, J., Suh, S. C., Kim, I., & Kim, K. J. (2019). A survey of deep learning-based network anomaly detection. Cluster Computing, 1–13.
Längkvist, M., Karlsson, L., & Loutfi, A. (2014). A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters, http://dx.doi.org/10.1016/j.patrec.2014.01.008.
Lea, C., Flynn, M. D., Vidal, R., Reiter, A., & Hager, G. D. (2017). Temporal convolutional networks for action segmentation and detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 156–165).
Lei, Y., Jia, F., Lin, J., Xing, S., & Ding, S. X. (2016). An intelligent fault diagnosis method using unsupervised feature learning towards mechanical big data. IEEE Transactions on Industrial Electronics, 63(5), 3137–3147.
Lei, J., Liu, C., & Jiang, D. (2019). Fault diagnosis of wind turbine based on long short-term memory networks. Renewable Energy, http://dx.doi.org/10.1016/j.renene.2018.10.031.
Li, L., Hansman, R. J., Palacios, R., & Welsch, R. (2016). Anomaly detection via a Gaussian mixture model for flight operation and safety monitoring. Transportation Research Part C (Emerging Technologies), http://dx.doi.org/10.1016/j.trc.2016.01.007.
Li, X., Hu, Y., Li, M., & Zheng, J. (2020). Fault diagnostics between different type of components: A transfer learning approach. Applied Soft Computing, 86, Article 105950.
Li, X., Li, X., & Ma, H. (2020). Deep representation clustering-based fault diagnosis method with unsupervised data applied to rotating machinery. Mechanical Systems and Signal Processing, 143, Article 106825.
Li, Q., Song, Y., Zhang, J., & Sheng, V. S. (2020). Multiclass imbalanced learning with one-versus-one decomposition and spectral clustering. Expert Systems with Applications, 147, Article 113152.
Li, C., Zia, M. Z., Tran, Q. H., Yu, X., Hager, G. D., & Chandraker, M. (2019). Deep supervision with intermediate concepts. http://dx.doi.org/10.1109/TPAMI.2018.2863285, arXiv:1801.03399.
Lin, K., Lu, J., Chen, C. S., Zhou, J., & Sun, M. T. (2019). Unsupervised deep learning of compact binary descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, http://dx.doi.org/10.1109/TPAMI.2018.2833865.
Liu, H., Song, W., Niu, Y., & Zio, E. (0000). A generalized cauchy method for remaining useful life prediction of wind turbine gearboxes. Mechanical Systems and Signal Processing, 153, 107471.
Liu, H., Zhou, J., Xu, Y., Zheng, Y., Peng, X., & Jiang, W. (2018). Unsupervised fault diagnosis of rolling bearings using a deep neural network based on generative adversarial networks. Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2018.07.034.
Lu, S., He, Q., & Wang, J. (2019). A review of stochastic resonance in rotating machine fault detection. Mechanical Systems and Signal Processing, 116, 230–260.
Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review, 3(3), 127–149.
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, Vol. 30 (p. 3).
Mao, W., Tian, S., Fan, J., Liang, X., & Safian, A. (2020). Online detection of bearing incipient fault with semi-supervised architecture and deep feature representation. Journal of Manufacturing Systems, 55, 179–198.
Mao, W., Zhang, D., Tian, S., & Tang, J. (2020). Robust detection of bearing early fault based on deep transfer learning. Electronics, 9(2), 323.
Miao, H., Li, B., Sun, C., & Liu, J. (2019). Joint learning of degradation assessment and RUL prediction for aero-engines via dual-task deep LSTM networks. IEEE Transactions on Industrial Informatics, 1.
Nectoux, P., Gouriveau, R., Medjaher, K., Ramasso, E., Chebel-Morello, B., Zerhouni, N., & Varnier, C. (2012a). PRONOSTIA: An experimental platform for bearings accelerated degradation tests. In IEEE international conference on prognostics and health management, PHM'12.
Nectoux, P., Gouriveau, R., Medjaher, K., Ramasso, E., Chebel-Morello, B., Zerhouni, N., & Varnier, C. (2012b). PRONOSTIA: An experimental platform for bearings accelerated degradation tests.
Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14, 849–856.
Nourelfath, M., Châtelet, E., & Nahas, N. (2012). Joint redundancy and imperfect preventive maintenance optimization for series-parallel multi-state degraded systems. Reliability Engineering & System Safety, http://dx.doi.org/10.1016/j.ress.2012.03.004.
Qian, G., & Zhang, L. (2018). A simple feedforward convolutional conceptor neural network for classification. Applied Soft Computing, 70, 1034–1041.
Qiao, Z., Lei, Y., & Li, N. (2019). Applications of stochastic resonance to machinery fault detection: A review and tutorial. Mechanical Systems and Signal Processing, 122, 502–536.
Qiao, J., Li, F., Han, H., & Li, W. (2016). Growing echo-state network with multiple subreservoirs. IEEE Transactions on Neural Networks and Learning Systems, 28(2), 391–404.
Qiu, H., Lee, J., Lin, J., & Yu, G. (2006). Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. Journal of Sound and Vibration, 289(4–5), 1066–1090.
Rocco, S. C. M., & Zio, E. (2007). A support vector machine integrated system for the classification of operation anomalies in nuclear components and systems. Reliability Engineering & System Safety, http://dx.doi.org/10.1016/j.ress.2006.02.003.
Rodríguez-Ramos, A., da Silva Neto, A. J., & Llanes-Santiago, O. (2018). An approach to fault diagnosis with online detection of novel faults using fuzzy clustering tools. Expert Systems with Applications, 113, 200–212.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.
Salvador, S., & Chan, P. (2007). Toward accurate dynamic time warping in linear time and space. Intelligent Data Analysis, 11(5), 561–580.
Saravanan, N., & Ramachandran, K. I. (2010). Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN). Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa.2009.11.006.
Sarmadi, H., & Karamodin, A. (2020). A novel anomaly detection method based on adaptive Mahalanobis-squared distance and one-class kNN rule for structural health monitoring under environmental effects. Mechanical Systems and Signal Processing, http://dx.doi.org/10.1016/j.ymssp.2019.106495.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. In International conference on machine learning (pp. 1139–1147).
Vesanto, J., & Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3), 586–600.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
Wu, Z., Luo, H., Yang, Y., Zhu, X., & Qiu, X. (2018). An unsupervised degradation estimation framework for diagnostics and prognostics in cyber-physical system. In 2018 IEEE 4th world forum on internet of things (WF-IoT) (pp. 784–789). IEEE.
Wulandari, C. P., Ou-Yang, C., & Wang, H.-C. (2019). Applying mutual information for discretization to support the discovery of rare-unusual association rule in cerebrovascular examination dataset. Expert Systems with Applications, 118, 52–64.
Xiao, L., Liu, Z., Zhang, Y., Zheng, Y., & Cheng, C. (2020). Degradation assessment of bearings with trend-reconstruct-based features selection and gated recurrent unit network. Measurement, 165, Article 108064.
Xiao, Y., Wang, H., Zhang, L., & Xu, W. (2014). Two methods of selecting Gaussian kernel parameters for one-class SVM and their application to fault detection. Knowledge-Based Systems, 59, 75–84.
Xu, M., Baraldi, P., Al-Dahidi, S., & Zio, E. (2019). Fault prognostics in presence of event-based measurements: Proceedings of ESREL 2019, Sep 22-26, 2019, Hannover, Germany. Research Publishing.
Xu, M., Baraldi, P., Al-Dahidi, S., & Zio, E. (2020). Fault prognostics by an ensemble of Echo State Networks in presence of event based measurements. Engineering Applications of Artificial Intelligence, 87, Article 103346.
Xu, M., Baraldi, P., & Zio, E. (2020b). Fault diagnostics by conceptors-aided clustering. In 30th European safety and reliability conference, ESREL 2020 and 15th probabilistic safety assessment and management conference, PSAM 2020 (pp. 3656–3663). Research Publishing Services.
Xu, M., Baraldi, P., & Zio, E. (2020c). Fault diagnostics by conceptors-aided clustering: Proceedings of ESREL 2020, Nov 1-5, 2020, Venice, Italy. Research Publishing.
Yang, Z., Baraldi, P., & Zio, E. (2021). A method for fault detection in multi-component systems based on sparse autoencoder-based deep neural networks. Reliability Engineering & System Safety, Article 108278.
Ye, J., Qi, G.-J., Zhuang, N., Hu, H., & Hua, K. A. (2018). Learning compact features for human activity recognition via probabilistic first-take-all. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(1), 126–139.
Yildiz, I. B., Jaeger, H., & Kiebel, S. J. (2012). Re-visiting the echo state property. Neural Networks, 35, 1–9.
Yin, S., Ding, S. X., Xie, X., & Luo, H. (2014). A review on basic data-driven approaches for industrial process monitoring. http://dx.doi.org/10.1109/TIE.2014.2301773.
Yu, J. (2012). Health condition monitoring of machines based on hidden Markov model and contribution analysis. IEEE Transactions on Instrumentation and Measurement, 61(8), 2200–2211.
Yuan, J., & Liu, X. (2013). Semi-supervised learning and condition fusion for fault diagnosis. Mechanical Systems and Signal Processing, 38(2), 615–627.
Zhang, Y., Li, X., Gao, L., & Li, P. (2018). A new subset based deep feature learning method for intelligent fault diagnosis of bearing. Expert Systems with Applications, 110, 125–142.
Zhang, S., Li, X., Zong, M., Zhu, X., & Wang, R. (2017). Efficient kNN classification with different numbers of nearest neighbors. IEEE Transactions on Neural Networks and Learning Systems, 29(5), 1774–1785.
Zhang, M., Wang, N., Li, Y., & Gao, X. (2019). Deep latent low-rank representation for face sketch synthesis. IEEE Transactions on Neural Networks and Learning Systems, http://dx.doi.org/10.1109/TNNLS.2018.2890017.
Zhang, S., Zhang, S., Wang, B., & Habetler, T. G. (2020). Deep learning algorithms for bearing fault diagnostics—A comprehensive review. IEEE Access, 8, 29857–29881.
Zheng, Y., Li, S., Yan, R., Tang, H., & Tan, K. C. (2018). Sparse temporal encoding of visual features for robust object recognition by spiking neurons. IEEE Transactions on Neural Networks and Learning Systems, http://dx.doi.org/10.1109/TNNLS.2018.2812811.
Zhu, H., Lu, L., Yao, J., Dai, S., & Hu, Y. (2018). Fault diagnosis approach for photovoltaic arrays based on unsupervised sample clustering and probabilistic neural network model. Solar Energy, http://dx.doi.org/10.1016/j.solener.2018.10.054.