
Expert Systems With Applications 213 (2023) 118962


A two-stage estimation method based on Conceptors-aided unsupervised clustering and convolutional neural network classification for the estimation of the degradation level of industrial equipment
Mingjing Xu a, Piero Baraldi a,*, Zhe Yang b,a, Enrico Zio a,c,d
a Energy Department, Politecnico di Milano, Via La Masa 34, 20156 Milan, Italy
b School of Mechanical Engineering, Dongguan University of Technology, Dongguan, 523808, China
c MINES ParisTech, PSL Research University, CRC, Sophia Antipolis, France
d Aramis Srl, Via Pergolesi 5, Milano, Italy

ARTICLE INFO

Keywords: Degradation level estimation; Conceptors; Reservoir computing; Time series clustering; Convolutional Neural Network (CNN); Bearings

ABSTRACT

In practical applications, degradation level estimation often faces the challenge of dealing with unlabeled time series characterized by long-term temporal dependencies, which are typically not properly represented using sliding time windows. Inspired by the idea of representing temporal patterns by a mechanism of neurodynamical pattern learning, called Conceptors, a two-stage method for the estimation of the equipment degradation level is developed. In the first stage, clusters of Conceptors representing similar patterns of degradation within complete run-to-failure trajectories are identified; in the second stage, the obtained clusters are used to supervise the training of a convolutional neural network classifier of the equipment degradation level. The proposed method is applied to a synthetic case study and to two literature case studies regarding bearings degradation level estimation. The obtained results show that the proposed method provides more accurate estimation of the equipment degradation level than other state-of-the-art methods.

1. Introduction

Fault detection entails the identification of the occurrence of abnormal conditions in the behavior of industrial equipment, whereas fault diagnostics entails the estimation of the equipment degradation level and the classification of the cause and type of the abnormal condition (Gangsar & Tiwari, 2020; Lu et al., 2019; Qiao et al., 2019). Accurate and efficient fault detection and diagnostics can support effective maintenance and, in turn, increase production availability and system safety, also possibly reducing the overall maintenance costs (Al-Dahidi et al., 2018; Lei et al., 2019; Liu et al., 0000; Zhang et al., 2018).

Fault detection approaches are typically categorized as supervised, unsupervised and one-class classification (Belhadi et al., 2020). Supervised methods require the availability of a sufficient number of signal measurements labeled with the information on the component health state, i.e. normal or anomalous. They typically face the problems of dealing with imbalanced datasets, being abnormal condition data typically rare, and of the variability of the operating conditions. Unsupervised methods do not need labeled data, but they typically assume that (i) a sufficient number of patterns collected in both normal and anomalous conditions is available, and (ii) anomalous condition patterns are sufficiently dissimilar to normal condition patterns to allow discriminating them (Ghafoori et al., 2020). On the other hand, in many industrial applications anomalous conditions are rare and changes in operating and environmental conditions cause variations of the measured signals that are larger than the variations caused by the onset of the degradation of a component, at least at the early stages after its occurrence. For this reason, one-class classification methods (Xiao et al., 2014), which are trained on a dataset containing only normal condition patterns, are among the most used for fault detection. Examples of classification methods applied to one-class classification problems are: Support Vector Machines (SVMs) (Rocco & Zio, 2007), nearest neighbor-based methods (Sarmadi & Karamodin, 2020), statistical-based models (Li et al., 2016) and Deep Learning (DL)-based methods (Kwon et al., 2019).

Fault diagnostics methods are typically based on supervised approaches such as Artificial Neural Networks (ANNs) (Ali et al., 2015), Support Vector Machines (SVMs) (Atamuradov et al., 2018) and Long Short-Term Memory networks (LSTMs) (Miao et al., 2019). Notice that the availability of the labeled data needed for the model training depends on the characteristics of the system and on the probability of investigating the cause of faults or of performing laboratory experiments for acquiring the labeled data. According to Baraldi et al. (2015b) and Yuan and Liu (2013), in many industrial sectors characterized by high-risk and high-value systems, labeled data of faults are rarely available and, therefore, the use of supervised methods for fault diagnostics is limited.

∗ Corresponding author.
E-mail addresses: mingjing.xu@polimi.it (M. Xu), piero.baraldi@polimi.it (P. Baraldi), zhe.yang@polimi.it (Z. Yang), enrico.zio@polimi.it (E. Zio).

https://doi.org/10.1016/j.eswa.2022.118962
Received 8 August 2021; Received in revised form 25 June 2022; Accepted 1 October 2022
Available online 8 October 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.

Nomenclature

ANN: Artificial Neural Network
SVM: Support Vector Machine
RNN: Recurrent Neural Network
DTW: Dynamic Time Warping
ESN: Echo State Network
NMI: Normalized Mutual Information
$N$: Number of internal neurons
$S$: Number of signals
$s$: Generic $s$th signal
$R$: Number of monitored equipment
$r$: Generic $r$th equipment
$N_r$: Number of measurements taken from equipment $r$
$N_{test}$: Number of measurements taken from the test equipment
$t_i$: Generic time instant
$p^r_{s,i}$: Value of signal $s$ measured at time $t_i$ from equipment $r$
$\boldsymbol{P}^r$: Matrix whose generic entry is $p^r_{s,i}$
$p^{test}_{s,i}$: Value of signal $s$ measured at time $t_i$ from the test equipment
$\boldsymbol{W}_{in}$: Weights of the connections from the input neurons to the internal neurons
$\boldsymbol{W}$: Weights of the connections among the internal neurons
$\rho$: ESN spectral radius
$c$: Connectivity of the reservoir neurons
$\boldsymbol{x}(t_i)$: Reservoir state at time $t_i$
$\boldsymbol{u}(t_i)$: Input at time $t_i$
$\boldsymbol{R}^{(i)}$: Correlation matrix of $\boldsymbol{x}(t_i)$ at time $t_i$
$\alpha$: Control parameter called aperture
$\boldsymbol{C}^r_i$: Conceptor representing the degradation of equipment $r$ from time $t_1$ to time $t_i$
$\boldsymbol{C}^{test}_i$: Conceptor representing the degradation of the test equipment from time $t_1$ to time $t_i$
$D^r_{ij}$: Frobenius distance between Conceptors $\boldsymbol{C}^r_i$ and $\boldsymbol{C}^r_j$
$A^r_{ij}$: Similarity measure between Conceptors $\boldsymbol{C}^r_i$ and $\boldsymbol{C}^r_j$
$\boldsymbol{A}^r$: Matrix of the similarity among Conceptors representing equipment $r$ degradation
$\boldsymbol{\Lambda}$: Diagonal degree matrix of $\boldsymbol{A}^r$
$\boldsymbol{L}$: Normalized Laplacian matrix
$\lambda_1, \dots, \lambda_l$: Eigenvalues of the normalized Laplacian matrix $\boldsymbol{L}$
$b_1, \dots, b_l$: Eigenvectors of the normalized Laplacian matrix $\boldsymbol{L}$
$\boldsymbol{Z}$: Spectral normalization matrix of $\boldsymbol{L}$
$K$: Maximum possible number of Conceptors clusters
$SI^r_k$: Silhouette Index of the $k$ clusters obtained from equipment $r$
$k^*_r$: Optimal number of Conceptors clusters for equipment $r$
$\hat{g}$: Estimated number of degradation levels
$\tilde{\boldsymbol{W}}^{m+1}_n$: $n$th convolutional kernel associated to the $(m+1)$th layer
$N_{conv}$: Number of convolutional layers
$w_f$: Width of the convolutional filter
$\Pi^r_1, \dots, \Pi^r_k$ $(k)$: $k$ clusters of the Conceptors $\boldsymbol{C}^r_i$, $i = 1, \dots, N_r$
$\Pi^r_1, \dots, \Pi^r_{\hat{g}}$: Obtained clusters of the Conceptors $\boldsymbol{C}^r_i$, $i = 1, \dots, N_r$
$d^r_i$: Degradation level of equipment $r$ at time $t_i$
$\Pi^r_{d^r_i}$: Cluster representing degradation level $d^r_i$
$d^{test}_i$: Degradation level of the test equipment at time $t_i$

This work considers the common situation in which labeled data containing information on the equipment health state are not available and only run-to-failure data collected during the entire life of the equipment can be exploited for fault detection and diagnostics. In this case, given the impossibility of using supervised learning techniques, methods of unsupervised learning for clustering the data can be developed. Depending on the specific characteristics of the industrial application, the identified clusters can be used for fault detection, when one of the identified clusters corresponds to normal condition and the other clusters to abnormal conditions that are not further labeled by experts, or for fault diagnostics, when it is possible to assign the clusters to specific degradation levels or types of malfunctioning.

Unsupervised learning is an important topic in machine learning for time series segmentation (Arn et al., 2018; Cakir et al., 2019; Saravanan & Ramachandran, 2010) and pattern recognition (Cao et al., 2019; Costa et al., 2015; Längkvist et al., 2014; Lin et al., 2019; Zheng et al., 2018). In degradation level estimation applications, it is used to provide abstract representations of the raw measurement data and to discriminate between healthy conditions and working conditions with different levels of degradation (Al-Dahidi et al., 2018; Baraldi et al., 2015a; Wu et al., 2018). In the work of Wu et al. (2018), a cluster-based Hidden Markov Model is proposed for learning the internal states and the transitions of the degraded states. In the work of Al-Dahidi et al. (2018), a framework for fleets of nuclear power plant turbines based on the reconciliation of data clusters through unsupervised consensus has been proposed. In the work of Baraldi et al. (2015a), a methodology based on unsupervised spectral clustering combined with fuzzy C-means (FCM) is developed for identifying groups of similar shutdown transients performed by a nuclear turbine.

Representation learning techniques can disentangle the different explanatory factors of variation behind the raw data, making it easier to extract and organize the discriminative information when building fault diagnostic models (Bengio et al., 2013; Chong et al., 2017; Li et al., 2019; Zhang et al., 2019). In principle, features able to catch the dynamics of the measured signals can be extracted from the raw measurements by applying ad hoc signal processing, such as Short Time Fourier and Wavelet transformations (Yin et al., 2014). However, the processing is heavily dependent on a priori knowledge and diagnostic expertise (Liu et al., 2018; Zhu et al., 2018), and can be quite time-consuming and labor-intensive (Lei et al., 2016).

Given the above, the objective of this work is to develop a method for estimating the degradation level of working equipment using signals measured during operation. To realistically reproduce industrial applications, we assume that few unlabeled run-to-failure degradation trajectories are available to build the degradation level estimation model and that, given the stochasticity of the degradation process, the trajectories can have very different durations. This condition is encountered in many applications, such as rotating machinery (Diaz et al., 2017; Dibaj et al., 2021; Li, Li et al., 2020), maritime components (Ellefsen et al., 2019), actuator systems (Rodríguez-Ramos et al., 2018) and planetary gearboxes (Jiao et al., 2020).


Notice that the unsupervised learning methods previously introduced cannot be directly applied to the problem of this work. For example, the methods in Al-Dahidi et al. (2018), Baraldi et al. (2015a) and Liu et al. (2018), which cluster transients of components, are not able to diagnose the degradation level, since they do not consider the dynamics of the degradation evolution.

Since representation learning is adaptively capable of learning features from raw data, it can constitute an excellent a priori choice for the development of diagnostic techniques. In the work of Lei et al. (2016), an unsupervised sparse filtering method based on a two-layer neural network is used to directly learn features from mechanical vibration signals. In the work of Han et al. (2019), a Spatio-Temporal Pattern Network (STPN) based on Probabilistic Finite State Automation (PFSA) and Markov machines is proposed to represent temporal and spatial structures for diagnostics in complex systems. However, these conventional representation learning methods cannot capture long-term temporal dependencies in the time series and they typically require high computational complexity.

The Conceptor is a mechanism of neurodynamical temporal pattern representation proposed in the work of Jaeger (2017). Considering a reservoir, i.e. a randomly generated and sparsely connected Recurrent Neural Network (RNN) (Lukoševičius & Jaeger, 2009), Conceptors can be understood as filters characterizing the geometries of the temporal states of the reservoir neurons in the form of square matrices (Jaeger, 2017), achieving a direction-selective damping of the high-dimensional reservoir states (Qian & Zhang, 2018). The choice of Conceptors is motivated by their capability of catching the degradation dynamics of multivariate time series of variable length and of representing them in the form of a matrix, which makes the clustering and classification of the degradation level more accurate and easier.

The proposed method develops in two stages. In the first stage, the Conceptors extracted from the training run-to-failure degradation trajectories are clustered into several non-overlapping time series segments representing different degradation levels. The first stage addresses the challenge of simultaneously coping with feature correlations and long-term degradation dynamics in run-to-failure trajectories, which is the key issue in the clustering of the component degradation levels. In the second stage, the Conceptors and the corresponding labels obtained from the clustering of the first stage are used to train a Convolutional Neural Network (CNN) for the real-time diagnosis of the equipment degradation level. The CNN receives in input the Conceptor extracted from the reservoir states at the current time, which contains information about the long-term evolution of the degradation, and the difference between the Conceptors extracted at the present and previous time steps, which contains information about the short-term degradation variation. The second stage addresses the challenge of bridging the degradation dynamic representations, which are in the form of Conceptors and Conceptor differences, and their associated classification level. Specifically, the ESN allows embedding the local degradation dynamics into the reservoir state space, and the Conceptor extracts the global degradation dynamics by projecting the reservoir states into a regularized eigenspace. The choice of using CNNs for the classification task is motivated by the type of input, i.e. matrices, which contain information about the correlation among the reservoir states.

Three case studies are considered to verify the performance of the proposed method. The first one is a problem of degradation level estimation built on synthetic data, whereas the second and third concern the classification of the degradation level of bearings; the data are taken from the repositories of the Center for Intelligent Maintenance Systems (IMS) (Qiu et al., 2006) and of the PHM 2012 data challenge based on the PRONOSTIA experiment platform (Nectoux et al., 2012a), respectively. Notice that in all cases we assume that the degradation level of the training data is not known.

The performance of the proposed method has been compared with that of other state-of-the-art methods. Specifically, the clustering stage has been performed by combining Spectral Clustering (Ng et al., 2001), K-Means (Jain, 2010) and Self-Organizing Maps (SOMs) (Vesanto & Alhoniemi, 2000) with the Laplacian Score (LS) (He et al., 2005) for feature selection, the Fourier Transform (FT) and the Short Time Fourier Transform (STFT) (Borisagar et al., 2019) for feature extraction, and Dynamic Time Warping (DTW) (Salvador & Chan, 2007) for distance computation; in addition, a deep learning clustering approach based on an Auto-Encoder and a Generative Adversarial Network for extracting features, combined with K-means for clustering them, is considered for comparison. The classification stage has been performed by K-Nearest Neighbors (KNN) (Zhang et al., 2017), Temporal Convolutional Networks (TCN) (Lea et al., 2017) and Long Short-Term Memory networks (LSTM) (Hochreiter & Schmidhuber, 1997).

The present work builds on the Conceptor-based clustering method proposed by the authors in Xu et al. (2020b), which is here extended to develop a classifier of the test component degradation level. Differently from the method in Xu et al. (2020b), which can be applied only to an entire run-to-failure degradation trajectory, the method developed here can be applied to the data collected from a component up to the present time, which allows a real-time degradation assessment.

Other differences with respect to Xu et al. (2020b) are:

• The formalization of the method for Conceptor Clustering in an algorithm, which is reported in Section 4.1.2;
• The analysis of the time and space complexity of the Conceptor Clustering algorithm, which is reported in Section 4.3 of the present work;
• The visualization of the embedding space of the normalized eigenvectors found by the Conceptor Clustering algorithm, which is shown in Section 5.4 of the present work;
• The validation of the clustering method on two new case studies.

The original contributions of the work are:

1. the development of a novel Conceptors-aided clustering approach for multivariate time series of variable length;
2. the development of a Conceptors-based CNN for degradation level estimation;
3. a methodology that can be applied to generic applications of unsupervised degradation level estimation with time series input.

The remainder of the paper is organized as follows: Section 2 states the problem and illustrates the work objectives; Section 3 introduces the background and preliminaries of the work and Section 4 describes the proposed method of unsupervised degradation level estimation; Section 5 introduces the numerical synthetic case study with long-term temporal dependencies and the two bearing case studies and, then, discusses the obtained results. Finally, some conclusions and remarks are drawn in Section 6.

2. Problem statement

We consider a population of $R$ similar components, which have already failed in the past; each component is monitored by $S$ sensors. The $S$ measured signals, which relate to the equipment operating conditions and degradation, are assumed to be synchronously measured up to failure. The run-to-failure trajectory of the generic $r$th component ($r = 1, \dots, R$) is represented by a matrix $\boldsymbol{P}^r$, whose generic entry $p^r_{s,i}$ is the value of signal $s$ at time $t_i$, $s = 1, \dots, S$ and $i = 1, \dots, N_r$, where $N_r$ is the number of measurements taken during the entire life of component $r$. Notice that, due to the stochasticity of the degradation process, the lifetime of each component is different and, therefore, we need to develop a method able to deal with time series of different lengths, $N_r$, $r = 1, \dots, R$. We realistically assume that no information is available for directly labeling the degradation level of the components. This situation is common for those components whose degradation cannot be assessed during in-field operation but only by performing time- and resource-consuming tests.


Fig. 1. Scheme of the problem statement. Note that $N_r$ may take a different value for each trajectory $r$.

The objective of the present work is to develop a method for the real-time estimation of the degradation level, $d_c$, at the present time, $t_c$, of a test component, using the set of measurements $p_{1:S,1:c}$ of the $S$ signals collected at times $t_1, \dots, t_c$. Without loss of generality, we describe the equipment degradation using $g \in \mathbb{N}^+$ discrete levels. Fig. 1 shows a pictorial representation of the problem statement.
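To make the setting concrete, the following minimal Python sketch shows the assumed data layout and the estimator interface implied by the problem statement; the variable names and the helper signature are illustrative choices, not part of the original formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

S, R = 10, 5                                           # number of signals and of training components
N = [int(rng.integers(150, 251)) for _ in range(R)]    # variable trajectory lengths N_r
P = [rng.normal(size=(S, n)) for n in N]               # P[r][s-1, i-1] = p^r_{s,i}, run-to-failure data

def estimate_degradation_level(p_test: np.ndarray) -> int:
    """Map the S x c matrix of test measurements collected up to the present
    time t_c into a discrete degradation level in {1, ..., g}.
    The two-stage method of Section 4 is one way to build this estimator."""
    raise NotImplementedError
```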

3. Preliminaries

3.1. Reservoir Computing


Fig. 2. ESN architecture.
In Reservoir Computing (RC), a reservoir is a large, randomly connected recurrent neural network that receives time-varying input signals (Lukoševičius & Jaeger, 2009). In this work, we consider Echo State Networks (ESNs) (Ferreira et al., 2013), whose architecture is characterized by $S$ input neurons and a reservoir with $N \gg 1$ internal neurons (Fig. 2). The matrix $\boldsymbol{W}_{in}$ of size $N \times S$ contains the weights of the connections from the input neurons to the internal neurons. The matrix $\boldsymbol{W}$ of size $N \times N$ contains the weights of the connections among the internal neurons. The elements of $\boldsymbol{W}_{in}$ and $\boldsymbol{W}$ are randomly initialized from a uniform distribution $U(-1, 1)$ and remain unchanged during the learning stage (Lukoševičius & Jaeger, 2009; Xu et al., 2019; Xu, Baraldi, Al-Dahidi et al., 2020; Xu et al., 2020c). A precondition for ESN learning algorithms to function is that the underlying reservoir network possesses the Echo State Property (ESP), i.e. the effect of the current states of the reservoir internal neurons and of the input on a future state should vanish gradually as time passes and not get amplified or even persist (Yildiz et al., 2012). A sufficient condition for the verification of the ESP in standard ESNs is that the spectral radius $\rho(|\boldsymbol{W}|)$ of $|\boldsymbol{W}|$, i.e. the magnitude of the largest eigenvalue of $|\boldsymbol{W}|$ (Yildiz et al., 2012), be smaller than 1, $\rho(|\boldsymbol{W}|) < 1$.

The reservoir state at time $t_i$, represented by the $N$-dimensional vector $\boldsymbol{x}(t_i) \in \mathbb{R}^N$, is updated according to:

$$\boldsymbol{x}(t_i) = \boldsymbol{f}\left(\boldsymbol{W}_{in}\boldsymbol{u}(t_i) + \boldsymbol{W}\boldsymbol{x}(t_{i-1})\right) \tag{1}$$

where $\boldsymbol{u}(t_i)$ denotes the input at time $t_i$. The state activation function $\boldsymbol{f} = (f_1, \dots, f_N)^T$ is a sigmoid function, such as $\tanh$, and is applied element-wise.

3.2. Conceptors

Conceptors can be understood as filters characterizing the temporal reservoir activation patterns (Jaeger, 2017). The Conceptor matrix $\boldsymbol{C}$ is defined as the linear transformation of the reservoir state $\boldsymbol{x}(t_i)$ that minimizes the loss function:

$$\mathrm{E}_{t_i = t_1, t_2, \dots, t_l}\left[\left\|\boldsymbol{x}(t_i) - \boldsymbol{C}\boldsymbol{x}(t_i)\right\|^2\right] + \alpha^{-2}\left\|\boldsymbol{C}\right\|^2_{fro} \tag{2}$$

where $\alpha$ is a control parameter called aperture, $\|\cdot\|_{fro}$ is the Frobenius norm (Jaeger, 2017) and $l$ is the length of the time sequence. The closed-form solution of the minimization problem is:

$$\boldsymbol{C} = \boldsymbol{R}\left(\boldsymbol{R} + \alpha^{-2}\boldsymbol{I}\right)^{-1} \tag{3}$$

where $\boldsymbol{R} = \mathrm{E}_{t_i = t_1, t_2, \dots, t_l}\left[\boldsymbol{x}(t_i)\boldsymbol{x}(t_i)^T\right]$ is the $N \times N$ correlation matrix of $\boldsymbol{x}(t_i)$ and $\boldsymbol{I}$ is the $N \times N$ identity matrix. In intuitive terms, $\boldsymbol{C}$ is a soft projection matrix on the linear subspace where the samples of $\boldsymbol{x}(t_i)$ lie.
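As an illustration of Eqs. (1)-(3), the following minimal NumPy sketch drives a randomly initialized reservoir with a multivariate time series and computes the corresponding Conceptor in closed form; the reservoir size, the rescaling of $\boldsymbol{W}$ to the desired spectral radius and the other defaults are illustrative assumptions consistent with Section 3.1.

```python
import numpy as np

def conceptor_from_series(U, N=100, rho=0.9, alpha=1.0, seed=0):
    """U: array of shape (S, l) holding one multivariate time series.
    Returns the N x N Conceptor matrix C of Eq. (3)."""
    rng = np.random.default_rng(seed)
    S, l = U.shape
    W_in = rng.uniform(-1.0, 1.0, size=(N, S))
    W = rng.uniform(-1.0, 1.0, size=(N, N))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))      # enforce spectral radius < 1 (ESP)

    x = np.zeros(N)                                       # x(t_0) initialized to zero
    R = np.zeros((N, N))
    for i in range(l):
        x = np.tanh(W_in @ U[:, i] + W @ x)               # Eq. (1), f = tanh
        R += np.outer(x, x)
    R /= l                                                # R = E[x(t_i) x(t_i)^T]
    return R @ np.linalg.inv(R + alpha**-2 * np.eye(N))   # Eq. (3)
```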


Note that the Conceptors are obtained from the reservoir states $\boldsymbol{x}(t_i)$ generated by the Echo State Network.

3.3. Spectral clustering

Spectral clustering is based on the construction of a similarity graph $G = (V, E)$, where $V = \{v_1, \dots, v_l\}$ identifies the set of vertices and $E = \{v_1 - v_2, v_1 - v_3, \dots, v_l - v_{l-1}\}$ is the set of edges connecting the vertices. Each vertex represents an object, and the weight associated to the edge connecting two generic vertices $v_m$ and $v_n$ is the measure of similarity between objects $m$ and $n$, denoted by $A_{mn}$, $m, n = 1, 2, \dots, l$. Spectral clustering aims at defining partitions of the vertices into clusters such that the edges between objects belonging to the same partition have associated large weights (objects within the same cluster are similar), whereas the edges between objects belonging to different partitions have associated small weights (objects in different clusters are dissimilar). The spectral clustering technique entails five main steps (Baraldi et al., 2015b, 2013; Li, Song et al., 2020):

(1) Construct the similarity matrix $\boldsymbol{A}$, whose generic entry $A_{mn}$ is the measure of similarity between $v_m$ and $v_n$, $m, n = 1, 2, \dots, l$.

(2) Build the normalized Laplacian matrix $\boldsymbol{L}$. This requires the computation of the diagonal degree matrix $\boldsymbol{\Lambda}$, whose entries $\Lambda_1, \Lambda_2, \dots, \Lambda_l$ are given by $\Lambda_m = \sum_{n=1}^{l} A_{mn}$, from which the normalized Laplacian matrix is obtained:
$$\boldsymbol{L} = \boldsymbol{I} - \boldsymbol{\Lambda}^{-\frac{1}{2}}\boldsymbol{A}\boldsymbol{\Lambda}^{-\frac{1}{2}} \tag{4}$$
where $\boldsymbol{I}$ is the identity matrix of size $l \times l$.

(3) Compute the eigenvalues, $\lambda_1, \lambda_2, \dots, \lambda_l$, and the corresponding eigenvectors, $b_1, b_2, \dots, b_l$, of the normalized Laplacian matrix $\boldsymbol{L}$, with the eigenvalues sorted from the smallest to the largest. Select the $k$ smallest eigenvalues $\lambda_1, \dots, \lambda_k$ and the corresponding eigenvectors $b_1, \dots, b_k$ (Von Luxburg, 2007).

(4) Build the matrix $\boldsymbol{B}$ of size $l \times k$, whose $k$ columns are the eigenvectors $b_1, \dots, b_k$ found in step (3). A matrix $\boldsymbol{Z}$ is, then, obtained by normalizing the rows of $\boldsymbol{B}$ (Von Luxburg, 2007):
$$z_{m\tilde{n}} = b_{m\tilde{n}} \Big/ \left(\sum_{\tilde{n}=1}^{k} b_{m\tilde{n}}^2\right)^{1/2}, \quad m = 1, \dots, l, \ \tilde{n} = 1, \dots, k \tag{5}$$
It has been shown that this change of data representation allows identifying clusters more easily (Von Luxburg, 2007).

(5) Apply a clustering algorithm to the rows of the matrix $\boldsymbol{Z}$, each one representing an object in the space of the first $k$ normalized eigenvectors. In this work, we use the k-means clustering algorithm for this.

3.4. Convolutional Neural Networks (CNNs)

CNNs have shown superior performances in various degradation level estimation applications due to their ability of extracting features for the classification task (Li, Hu et al., 2020). The CNN structure is based on the repetition of convolutional, pooling, fully connected and softmax classification layers (Krizhevsky et al., 2012).

Convolutional layers convolve the input by using kernels, i.e. filters with local receptive fields, which apply the same weights over the entire input field. The feature map $\boldsymbol{X}^{m+1}_n$ provided in output by the $n$th kernel matrix, $\tilde{\boldsymbol{W}}^{m+1}_n$, associated to the $(m+1)$th layer is:

$$\boldsymbol{X}^{m+1}_n = \tilde{\boldsymbol{W}}^{m+1}_n \odot \boldsymbol{X}^{m} + \boldsymbol{b}^{m+1}_n \tag{6}$$

where $\boldsymbol{X}^m$ is the input to the $(m+1)$th convolutional layer, $\odot$ denotes the convolution operation and $\boldsymbol{b}^{m+1}_n$ is the corresponding bias term. After the convolution operation, the Batch-Normalization (BN) technique is applied to overcome the difficulty of training CNNs with saturating nonlinearities and to speed up the training process. The Rectified Linear Unit (ReLU), $f(x) = \max(0, x)$, is typically used as activation function of the neurons to implement a nonlinear transformation as well as to improve the representation ability (Glorot et al., 2011; Maas et al., 2013).

The pooling layer typically receives in input the output of a convolutional layer and performs a sub-sampling operation to lower the spatial size of the feature maps and reduce the parameters of the whole network (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014). The pooling operator adopted in this work is max pooling (Zhang et al., 2020).

Similarly to what is done in traditional convolutional neural networks, a fully connected layer is stacked after a pooling layer to connect all previous feature maps (Krizhevsky et al., 2012; Simonyan & Zisserman, 2014), and a dropout operation is applied during training to reduce overfitting (Zhang et al., 2020).

Finally, a softmax classification layer (Simonyan & Zisserman, 2014), which receives in input the output of the fully connected layer and provides the classification result, is used.

4. Proposed method

This Section describes the proposed two-stage unsupervised learning method for degradation level estimation. Given the lack of data labeled with the component degradation, the method is based on the two stages of clustering, aimed at labeling the training trajectories, $\boldsymbol{P}^r$, $r = 1, \dots, R$ (Stage 1), and of classification of the test trajectory $\boldsymbol{P}$ (Stage 2) (Fig. 3).

Stage 1: Clustering. It receives in input the training trajectories and provides in output the clusters of Conceptors. The generic time $t_i$ at which a trajectory changes cluster corresponds to the time at which the equipment degradation goes from one level to the successive one.

Stage 2: Classification. A classifier is trained by using the cluster labels found in Stage 1. It receives in input the Conceptor at the present time and the difference between the Conceptors at the present and previous time steps, and provides in output the estimation of the equipment degradation level.

The introduction of the intermediate phase of generating Conceptors, instead of directly clustering and classifying the signals, is motivated by the fact that: (1) Conceptors catch the dynamics of the degradation patterns, reducing the influence of noise and operational conditions; (2) Conceptors allow measuring similarities among time series of different lengths; (3) Conceptors provide a synthetic representation of the information in the signals, which allows reducing the computational complexity of measuring the similarity among time series.

4.1. Stage 1: Clustering of the Conceptors

The labeling of the training trajectories is based on the two phases of generating the Conceptors and clustering them. The diagram illustrating the Conceptor Clustering algorithm is shown in Fig. 4.

4.1.1. Generation of the Conceptors

The objective of the first phase is to represent the segment from time $t_1$ to time $t_i$ of the generic $r$th degradation trajectory, $p^r_{1:S,1:i} \in \mathbb{R}^{S \times i}$, by means of the Conceptor $\boldsymbol{C}^r_i \in \mathbb{R}^{N \times N}$.

The Conceptor matrix $\boldsymbol{C}^r_i$ is computed by applying Eqs. (1) and (3) to the multivariate time series $p^r_{1:S,1:i}$, with the reservoir state at time $t_0$, $\boldsymbol{x}(t_0)$, initialized to zero to avoid unnecessary uncertainty. The correlation matrix $\boldsymbol{R}$ is computed by adopting an iterative updating procedure:

$$\boldsymbol{R}^{(i)} = \boldsymbol{R}^{(i-1)} \cdot \frac{i-1}{i} + \boldsymbol{x}(t_i)\boldsymbol{x}(t_i)^T \cdot \frac{1}{i} \tag{7}$$

where $\boldsymbol{R}^{(i)}$ denotes the correlation matrix at time $t_i$ and $\boldsymbol{R}^{(0)}$ is the null matrix. Notice that the matrix $\boldsymbol{R}^{(i)}$ from which the Conceptor matrix $\boldsymbol{C}^r_i$ is obtained contains information about the evolution of the signals from time $t_1$ to time $t_i$. Therefore, $\boldsymbol{C}^r_i$ provides a synthetic, fixed-size representation of the evolution of the time signals in the time window $[0, t_i]$, whose length can vary from $t_1$ to $t_{N_r}$.
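A minimal sketch of this iterative update, assuming the reservoir states have already been generated as in the earlier sketch, is the following; it yields the whole sequence of Conceptors $\boldsymbol{C}^r_1, \dots, \boldsymbol{C}^r_{N_r}$ needed by Stage 1 without reprocessing the trajectory from scratch at every time step. The function name is illustrative.

```python
import numpy as np

def running_conceptors(X_states, alpha=1.0):
    """X_states: iterable of reservoir states x(t_1), ..., x(t_Nr), each of shape (N,).
    Yields the Conceptor C_i after each time step, using the update of Eq. (7)."""
    R = None
    I = None
    for i, x in enumerate(X_states, start=1):
        if R is None:
            N = x.shape[0]
            R = np.zeros((N, N))                           # R^(0) is the null matrix
            I = np.eye(N)
        R = R * (i - 1) / i + np.outer(x, x) / i           # Eq. (7)
        yield R @ np.linalg.inv(R + alpha**-2 * I)         # Eq. (3) applied to R^(i)
```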


Fig. 3. Sketch of the unsupervised degradation level estimation method.

Fig. 4. Illustration of the Conceptors Clustering algorithm. Any two time series, going from the starting point $t_1$ to times $t_i$ and $t_j$, respectively, are separately converted into the Conceptor matrices $\boldsymbol{C}_i$ and $\boldsymbol{C}_j$ by the Conceptor generator; then, for any $i, j = 1, \dots, N_r$, the distance matrix of the Conceptors is obtained; finally, Spectral Clustering is applied to the distance matrix and the clustering results are obtained.
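The distance-matrix step summarized in the caption of Fig. 4 can be sketched as follows; the Frobenius distance used here is formally defined in Eq. (8) of Section 4.1.2, and the function name is an illustrative choice.

```python
import numpy as np

def conceptor_distance_matrix(prefix_conceptors):
    """prefix_conceptors: list of N x N Conceptor matrices C_1, ..., C_Nr,
    one per growing prefix of a trajectory. Returns the pairwise Frobenius
    distance matrix D with D[i, j] = ||C_i - C_j||_fro."""
    n = len(prefix_conceptors)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(prefix_conceptors[i] - prefix_conceptors[j], ord="fro")
            D[i, j] = D[j, i] = d
    return D
```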

6
M. Xu et al. Expert Systems With Applications 213 (2023) 118962

4.1.2. Clustering of the Conceptors

Let $\{\boldsymbol{C}^r_i\}_{i=1,\dots,N_r}$ be the set of Conceptor matrices obtained from the generic $r$th trajectory. The objective of this module is to cluster them into $\hat{g} > 1$ clusters, $\Pi^r_1, \dots, \Pi^r_{\hat{g}}$, where $\hat{g}$ is the estimate of the number $g$ of equipment degradation levels experienced during a run-to-failure degradation trajectory.

Since traditional clustering methods cannot be directly used for clustering matrices, and the vectorization of a matrix causes a quadratic increase of the problem dimensionality, we resort to the spectral clustering algorithm (Von Luxburg, 2007).

The distance metric between two Conceptors $\boldsymbol{C}^r_i$ and $\boldsymbol{C}^r_j$ ($i, j = 1, \dots, N_r$) is computed using the Frobenius norm (Jaeger, 2017):

$$D^r_{ij} = \left\|\boldsymbol{C}^r_i - \boldsymbol{C}^r_j\right\|_{fro} = \sqrt{\mathrm{Tr}\left(\left(\boldsymbol{C}^r_i\right)^T \boldsymbol{C}^r_i\right) + \mathrm{Tr}\left(\left(\boldsymbol{C}^r_j\right)^T \boldsymbol{C}^r_j\right) - 2\,\mathrm{Tr}\left(\left(\boldsymbol{C}^r_i\right)^T \boldsymbol{C}^r_j\right)} \tag{8}$$

with $\mathrm{Tr}(\cdot)$ indicating the trace of the matrix. Then, the similarity measure $A^r_{ij}$ between the two Conceptors $\boldsymbol{C}^r_i$ and $\boldsymbol{C}^r_j$ extracted from the $r$th training trajectory is defined by (Von Luxburg, 2007):

$$A^r_{ij} = \exp\left(-\frac{\left(D^r_{ij}\right)^2}{2\sigma^2}\right) \tag{9}$$

where $\sigma$ is a parameter set to shape the desired interpretation of similarity. The larger the value of $\sigma$, the wider the width of the neighborhoods and the slower the decline of the similarity with respect to the distance metric.

Finally, the spectral clustering algorithm of Section 3.3 is applied to obtain $\hat{g}$ clusters of Conceptors $\{\Pi^r_1, \dots, \Pi^r_{\hat{g}}\}$ from the $r$th trajectory. With respect to the identification of $\hat{g}$, the optimal number of clusters $k^*_r$ to be considered for clustering the Conceptors obtained from each training trajectory $r$, $r = 1, \dots, R$, is found by maximizing the Silhouette Index $SI$ (Rousseeuw, 1987), which is a measure of how similar an object is to the objects of its own cluster compared to the objects of the other clusters; $k^*_r$ is, therefore, given by:

$$k^*_r = \operatorname*{argmax}_{k=2,\dots,K} SI^r_k, \quad r = 1, \dots, R \tag{10}$$

where $K$ is the maximum possible number of degradation levels (Table 1).

The estimated number of degradation levels $\hat{g}$ to be considered for clustering all the $R$ degradation trajectories is taken equal to the value appearing most often in the set $\{k^*_r\}_{r=1,\dots,R}$. Then, for each run-to-failure trajectory, the clusters identified by the spectral clustering algorithm when $\hat{g}$ clusters were searched are used. The Conceptors clustering procedure is reported in Algorithm 1.

Algorithm 1: Conceptors Clustering
Input: $R$ run-to-failure trajectories $\{\boldsymbol{P}^r\}_{r=1,\dots,R}$
Initialize: reservoir states $\boldsymbol{x}(t_0) \leftarrow$ zeros
1  for $r = 1$ to $R$ do
2      for $i = 1$ to $N_r$ do
3          Compute $\boldsymbol{x}(t_i)$ by Eq. (1)
4          Update the correlation matrix $\boldsymbol{R}^{(i)}$ by Eq. (7)
5          Compute the Conceptor matrix $\boldsymbol{C}^r_i$ by Eq. (3)
6      Compute the similarity matrix $\boldsymbol{A}^r$ by Eqs. (8), (9)
7      for $k = 2$ to $K$ do
8          Obtain the clusters $\{\Pi^r_1, \dots, \Pi^r_k\}^{(k)}$ by using the spectral clustering algorithm (Section 3.3)
9          Compute the Silhouette Index $SI^r_k$
10     Compute the optimal number of clusters, $k^*_r$, for trajectory $r$ by Eq. (10)
11 Assign $\hat{g} \leftarrow$ statistical mode of the set $\{k^*_r\}_{r=1,\dots,R}$
12 for $r = 1$ to $R$ do
13     Use the clusters $\{\Pi^r_1, \dots, \Pi^r_k\}^{(k)}$ identified in Line 8 for $k = \hat{g}$
Output: clusters $\Pi^r_1, \dots, \Pi^r_{\hat{g}}$, ($r = 1, \dots, R$)

To label the Conceptors clusters, $\Pi^r_1, \dots, \Pi^r_{\hat{g}}$ ($r = 1, \dots, R$), as specific degradation levels, we observe that the degradation process is typically monotonic, i.e. $d^r_i \leq d^r_{i+1}$, and continuous, i.e. in one time step the degradation can at most increase by one level, i.e. $\boldsymbol{C}^r_i \in \Pi^r_{d^r_i} \rightarrow \boldsymbol{C}^r_{i+1} \in \{\Pi^r_{d^r_i} \cup \Pi^r_{d^r_i + 1}\}$.

4.2. Stage 2: Classification of the Conceptors

The objective of the classification module is to estimate the degradation level of the test component at time $t_c$. Let $\boldsymbol{C}^{test}_{c-1}$, $\boldsymbol{C}^{test}_c$ be the Conceptor matrices representing the degradation trajectory of the test component from time $t_1$ to time $t_{c-1}$ and from time $t_1$ to time $t_c$, respectively. The inputs of the classification module are the Conceptor $\boldsymbol{C}^{test}_c$, which represents the long-term temporal dependency of the test degradation trajectory, and the difference $\beta \cdot (\boldsymbol{C}^{test}_c - \boldsymbol{C}^{test}_{c-1})$ (Fig. 3), which represents the short-term variation of the degradation dynamics between time $t_{c-1}$ and time $t_c$. With respect to the setting of $\beta$, i.e. the scaling factor of the difference matrix of two adjacent Conceptors, since the elements of the Conceptor difference matrix, which consider the short-term degradation variation, are expected to be from one to three orders of magnitude smaller than the elements of the Conceptor matrix, which consider the long-term temporal dependency, the possible values of $\beta$ are chosen in the set $\{1, 10, 100, 1000\}$.

The classification module is based on a CNN dedicated to inputs formed by Conceptors and Conceptor differences (Fig. 5), since CNNs can naturally deal with Conceptors and Conceptor differences, which have the form of square matrices. In particular, CNNs are efficient and robust due to the use of small convolutional kernels, which apply the same weights over the entire input field, significantly decreasing the number of model parameters with respect to those needed by other classification methods based on the transformation of matrices into vectors.

A CNN has been chosen as classification method for the following reasons: (1) the reduced number of model parameters with respect to methods that transform the Conceptor matrices into vectors, due to the fact that the same small convolution kernels are applied to the entire input field; (2) the possibility of simultaneously convolving the Conceptor matrix $\boldsymbol{C}^{test}_c$ and the Conceptor difference matrix $\boldsymbol{C}^{test}_c - \boldsymbol{C}^{test}_{c-1}$, so as to consider in the analysis the correlation between the long-term degradation dynamics, extracted by $\boldsymbol{C}^{test}_c$, and the short-term variation of the degradation dynamics, extracted by $\boldsymbol{C}^{test}_c - \boldsymbol{C}^{test}_{c-1}$.

4.3. Complexity analysis

The time complexity of the Conceptors clustering algorithm is $\mathcal{O}(R \cdot N_r \cdot N^3) + \mathcal{O}(R \cdot N_r^2 \cdot N^2) + \mathcal{O}(R \cdot N_r^3 \cdot K)$. The first term accounts for the cost of generating the Conceptors, the second term for the cost of computing the similarity matrix, and the last term for the cost of the spectral clustering algorithm and of the Silhouette Index computation. The time complexity of the classification algorithm is $\mathcal{O}(N_{conv} \cdot N_f^2 \cdot w_f^2 \cdot N^2) + \mathcal{O}(N_f \cdot N^2) + \mathcal{O}(N_f^2)$, where the first term accounts for the cost of the convolution operations in the $N_{conv}$ layers, each one with time complexity $\mathcal{O}(N_f^2 \cdot w_f^2 \cdot N^2)$, where $w_f$ indicates the width of the convolution kernel, the second term for the cost of the max-pooling layers and the third term for the cost of the fully connected layers.

With respect to the space complexity, Algorithm 1 needs to store $\boldsymbol{C}^r_i$, $\boldsymbol{A}^r$ and the intermediate clustering results $\{\Pi^r_1, \dots, \Pi^r_k\}^{(k)}$ during the learning procedure. Its space complexity is therefore $\mathcal{O}(N_r \cdot N^2) + \mathcal{O}(N_r^2) + \mathcal{O}(R \cdot N_r \cdot K)$. Considering that $N^2 < N_r$ and $R \cdot K < N_r$ in most cases, the overall space complexity is $\mathcal{O}(N_r^2)$.
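A compact sketch of the per-trajectory part of Algorithm 1 (Eqs. (8)-(10)), assuming the Conceptors and their pairwise Frobenius distances have already been computed as in the earlier sketches, could rely on scikit-learn for the spectral clustering and the Silhouette Index; the function and parameter names are illustrative.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

def cluster_one_trajectory(D, K, sigma=1.0):
    """D: N_r x N_r Frobenius distance matrix between a trajectory's Conceptors.
    Returns (labels_per_k, k_star), where k_star maximizes the Silhouette Index."""
    A = np.exp(-(D ** 2) / (2 * sigma ** 2))               # Eq. (9) similarity matrix
    labels_per_k, scores = {}, {}
    for k in range(2, K + 1):
        labels = SpectralClustering(n_clusters=k,
                                    affinity="precomputed").fit_predict(A)
        labels_per_k[k] = labels
        scores[k] = silhouette_score(D, labels, metric="precomputed")
    k_star = max(scores, key=scores.get)                    # Eq. (10)
    return labels_per_k, k_star

def estimate_g_hat(k_stars):
    """g_hat = statistical mode of the optimal numbers of clusters (Line 11)."""
    return Counter(k_stars).most_common(1)[0][0]
```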


Fig. 5. Classification module based on the use of CNN.

Table 1. Setting of the hyperparameters in Stage 1.

Table 2. The architecture of the Conceptor-based CNN network.

In Stage 2, the space complexity is $\mathcal{O}(N_{conv} \cdot N_f^2 \cdot w_f^2) + \mathcal{O}(N_f^2)$, where the first term accounts for the space needed for storing the parameters of the $N_{conv}$ convolutional layers, each one of space complexity equal to $\mathcal{O}(N_f^2 \cdot w_f^2)$, and the second term is the space complexity of the fully connected layers. Notice that no storage cost is associated to the max-pooling layers because they do not have weight parameters.

5. Case studies

The proposed method is compared to state-of-the-art methods for clustering and classification, considering one synthetic and two real case studies.

5.1. Hyperparameters setting for the proposed method

Table 1 reports the setting of the hyperparameters used to generate the Conceptors. Since the larger the reservoir size $N$, the larger the Memory Capacity (MC), which quantifies the memory span of the ESN (Büsing et al., 2010; Qiao et al., 2016), but also the larger the computational burden, a trade-off value of $N = 100$ is chosen. The spectral radius $\rho(|\boldsymbol{W}|)$ is set equal to 0.9 to allow a longer retainment of the system state, which requires a large spectral radius, while ensuring the echo state property, which requires $\rho(|\boldsymbol{W}|) \leq 1$ (Yildiz et al., 2012). The connectivity $c$, i.e. the ratio between the number of connections in the reservoir and the number $N^2$ of all possible connections, is set equal to $5/N$ to guarantee a proper MC without excessively increasing the computational burden (Qiao et al., 2016). The Conceptor aperture $\alpha$, which can be interpreted as the scaling factor of the reservoir state in Eq. (3), is set equal to 1 according to Jaeger (2017).

The architecture of the CNN employed in this work is reported in Table 2. Inspired by the Visual Geometry Group (VGG) net architecture proposed by Simonyan and Zisserman (2014), multiple convolutional layers characterized by kernels with small receptive fields of size $w_f \times w_f$, with $w_f = 3$, are stacked. This architecture provides the ability of obtaining a wide receptive field with a relatively limited number of parameters (Simonyan & Zisserman, 2014). For example, a stack of two convolution layers with kernel size $3 \times 3$ gives a receptive field of size $5 \times 5$ using only $2 \times (3 \times 3) = 18 < 25$ parameters, and a stack of three of these convolution layers gives a receptive field of size $7 \times 7$ using only $3 \times (3 \times 3) = 27 < 49$ parameters. Notice that the wider the receptive field applied to a Conceptor, the more information on the degradation dynamics is captured. The convolution stride is fixed to 1 to provide more detailed degradation information, and the spatial pooling operation is carried out by three max-pooling layers connected after the convolutional layers. Max-pooling is performed over a $2 \times 2$ window with stride size 2. Note that the setting of the number of channels follows the principle according to which the number of channels doubles for each max-pooling layer added, and the base channel number is $N_f = 64$ (Simonyan & Zisserman, 2014).

The CNN training is carried out by using the method of mini-batch stochastic gradient descent with momentum (Sutskever et al., 2013). The mini-batch size is set to 64 and the momentum to 0.9 (Simonyan & Zisserman, 2014). The training is regularized by using $L_2$-norm weight decay, with the $L_2$ penalty multiplier set to $5 \times 10^{-4}$ (Simonyan & Zisserman, 2014). The maximal number of epochs is set to 30.
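Since the body of Table 2 is not reproduced in this extraction, the exact layer counts are not known here; the following PyTorch sketch assembles one plausible VGG-style network consistent with the description above (two-channel $N \times N$ input formed by the Conceptor and the scaled Conceptor difference, $3 \times 3$ kernels with stride 1, Batch Normalization and ReLU, three $2 \times 2$ max-pooling stages with channel doubling from a base of 64, a fully connected layer with dropout, and a softmax classification layer) together with the reported training settings. The layer counts, the input arrangement and the learning rate are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 convolution with stride 1, followed by Batch Normalization and ReLU
    return nn.Sequential(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ConceptorCNN(nn.Module):
    def __init__(self, n_levels, N=100):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(2, 64),    nn.MaxPool2d(2, 2),   # channels double after each pooling
            conv_block(64, 128),  nn.MaxPool2d(2, 2),
            conv_block(128, 256), nn.MaxPool2d(2, 2))
        side = N // 2 // 2 // 2                          # spatial size after three poolings
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(256 * side * side, 256),
            nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(256, n_levels))                    # softmax applied via CrossEntropyLoss

    def forward(self, x):                                # x: (batch, 2, N, N)
        return self.classifier(self.features(x))

model = ConceptorCNN(n_levels=3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,       # lr not given in the paper
                            momentum=0.9, weight_decay=5e-4)   # settings of Section 5.1
loss_fn = nn.CrossEntropyLoss()
```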


Table 3. Clustering methods considered for the comparison. (a) indicates that the method receives in input a time window formed by the last $l_w$ signal measurements, whereas (b) indicates that the input is from time $t_1$ to the current time.

5.2. Methods considered for the comparison

The methods considered for the comparison of the clustering results are characterized by different choices with respect to the type of input data (time windows of prefixed length, or all the time series until the present time), the feature extraction technique (statistical features, Fourier transform, short time Fourier transform, deep learning), the distance measure (DTW, Euclidean) and the clustering algorithm (spectral clustering, k-means, self-organizing maps). Methods that receive in input time windows of pre-fixed length of the time series will be indicated by the letter (a) (Clustering Methods 1a, 2a, 3a and 4a) and methods that receive in input all the time series from time $t_1$ until the current time will be indicated by the letter (b) (Clustering Methods 1b, 2b, 3b and 4b).

Methods 1, 2, 3 and 4 in Table 3 have been used to perform an ablation study for the validation of the proposed Conceptors Clustering method. In particular: Method 1 uses the same clustering algorithm of the proposed method, from which it differs only for the distance metric; therefore, it allows investigating the effect of measuring similarity using the Conceptors instead of another state-of-the-art distance, i.e. DTW. Method 2 allows investigating the effect of using Conceptors and Spectral Clustering instead of traditional feature extraction and selection techniques in the time domain. Method 3 allows investigating: (i) the difference between Conceptors and Fourier transform-based approaches for feature extraction and selection in the frequency domain, and (ii) another neuron-based method for clustering. Method 4 allows comparing: (i) the proposed method for feature extraction with state-of-the-art deep learning-based methods for feature extraction, and (ii) a clustering method which combines features extracted by using deep learning approaches with k-means; specifically, Clustering Method 4 uses a Generative Adversarial Network (GAN) to reconstruct the distribution of the data and an Auto-Encoder (AE), formed by an encoder and the GAN generator, to project the data into a latent space. Clustering Method 4b first transforms the variable-length time series into reservoir states and, then, applies the same procedure of Method 4a.

The design of the ablation study for the proposed clustering method has considered the fact that some combinations of feature extraction, selection and distance measure are not possible or are not expected to provide satisfactory results. For example, Clustering Method 1 directly uses the pairwise DTW distance for the clustering without performing an intermediate step of feature extraction and selection. Although Methods 2 and 3 can provide intermediate feature extraction and selection results, the feature selection procedure of Method 3 uses the phase value of the Fourier spectrum, which cannot be applied to Method 2. Clustering Method 4 does not support feature selection, because the deep learning methods (AE, GAN) used in the feature extraction procedure automatically generate the features required for the clustering.

The other comparison methods have been introduced to investigate the effect of the spectral clustering algorithm with respect to other possible clustering approaches. Since Conceptors are matrices, whereas the state-of-the-art clustering approaches receive in input vectors, different techniques for feature extraction and selection have been used for the comparison.

With respect to the setting of the hyperparameters, the DTW warping path is set within 5 samples of a straight line fitting the two sequences, the parameter $\sigma$ of the Spectral Clustering (Eq. (9)) and of the Laplacian Score is set to 1, the number of training epochs of the SOMs is set equal to 200, the number of neurons is set equal to the number of clusters $k$, the maximum number of neighbors to 3 and the learning rate to 0.02. The number of layers of the neural network in Method 4 is set to 2, with 50 neurons per layer. To avoid the complexity of optimizing the number of clusters, Clustering Methods 1, 2, 3 and 4 search for a number of clusters equal to the number of degradation levels of the component.

The methods considered for the comparison of the classification results are characterized by different feature extraction/representation techniques (windowed time series, Conceptors), distance measures (DTW and Frobenius norm of the Conceptors difference) and classification methods (KNN, TCN, LSTM and CNN). Classification Method 1 receives in input the time windows made of the last $l_w$ measurements, whereas Classification Methods 2, 3, 4 and 5 receive in input all the time series from time $t_1$ to the current time $t_m$. The choice of reducing the length of the time series provided in input to Classification Method 1 is motivated by its large computational complexity, which is quadratic with respect to the number of inputs. Classification Method 6 is combined with the proposed clustering method to verify the motivation of using a two-stage method, since it directly compares the similarity between the test sample and the cluster centers and, then, assigns the membership to the classes.

Methods 1, 2, 3, 4, 5 and 6 in Table 4 allow performing an ablation study for the validation of the proposed Conceptors-based CNN classification method. In particular, Method 1 investigates (i) the effect of providing in input the whole trajectory (used by the Conceptor) instead of time windows made by the last $l_w$ measurements and (ii) the effect of measuring the similarity between Conceptors instead of the similarity between time series through DTW; Method 2 investigates the effect of using Conceptors-based CNN classifiers instead of state-of-the-art convolution-based classifiers of time series, characterized by the use of a TCN for extracting features by temporal convolution and of a logistic regression as last classification layer; Method 3 investigates the effect of using Conceptors-based CNN classifiers instead of state-of-the-art Recurrent Neural Networks; Method 4 investigates the effect of classifying the Conceptors by using a CNN instead of KNN; and Method 5 investigates the effect of providing in input to the classifiers also the difference between the Conceptors extracted at the present and previous time steps. Method 6 investigates the need of adopting a two-stage approach with clustering and classification instead of a single-stage approach. Notice that the objective of the ablation study is not to compare the proposed classification method with all the possible combinations of input quantities, feature extraction and representation techniques, distance measures and classification algorithms, but to justify the choices made in the design of the proposed method by comparison with state-of-the-art methods suitable for the specific problems to be tackled.

Due to the characteristics of time series classification, the feature extraction/representation procedures are designed differently for the different classification methods. For example, Classification Method 1 uses the DTW distance metric to perform the KNN classification; however, this is not applicable to the deep learning classification methods TCN and LSTM (end-to-end learning procedures) or to the Conceptor-based methods (Methods 4, 5 and 6), because DTW is not applicable to the Conceptor matrix. Vice versa, the distance metric used in the Conceptor-based methods is not applicable to Method 1.


Table 4. Classification methods considered for the comparison.

The hyperparameters of the classification methods are optimized by performing a grid search on a validation set with the objective of maximizing the classification accuracy. The number of nearest neighbors of KNN is searched in the range $\{1, 2, \dots, 51\}$. The filter size of the TCN convolution layer is searched in the set $\{3, 4, 5, 6, 7, 8\}$ (Bai et al., 2018), the number of filters in the set $\{125, 150, 175, 200\}$ and the number of residual blocks in the set $\{1, 2, 3, 4\}$ (Bai et al., 2018). The number of hidden neurons of the LSTMs is searched in the set $\{50, 100, 200\}$ and the number of LSTM layers in the set $\{1, 2\}$. The architecture of the CNNs used by the proposed method and by Classification Method 5 is reported in Table 2 and the corresponding parameters in Section 4.2. The hyperparameters used for generating the Conceptors used by Classification Methods 4 and 5 and by the proposed method are listed in Table 1.

5.3. Performance metrics

The performance of the methods is evaluated at the two stages of: (i) unsupervised clustering of the $R$ run-to-failure training trajectories $p^r_{1:S,1:N_r}$, $r = 1, \dots, R$, and (ii) classification of the degradation level of the $R_{test}$ test trajectories $p^{test}_{1:S,1:N_{test}}$, $test = 1, \dots, R_{test}$. The metrics Accuracy and Normalized Mutual Information (NMI) are used in both cases. Considering the generic $r$th run-to-failure training trajectory $p^r_{1:S,1:N_r}$ and indicating as $m$ the segment of the trajectory $p^r_{1:S,1:m}$ going from time $t_1$ to time $t_m$, with $m = 1, l_w, 2l_w, \dots, N_r$, the Accuracy of the clustering of trajectory $r$ is:

$$Accuracy_r = \frac{\sum_{m=1}^{N_r} \delta(y_m, c_m)}{N_r}, \quad r = 1, \dots, R \tag{11}$$

where $y_m$ and $c_m$ are the ground-truth and the assigned degradation level of segment $m$, respectively, and $\delta(y_m, c_m)$ is the delta function, which is equal to 1 if $y_m = c_m$ and 0 otherwise. The Accuracy metric is in the range $[0, 1]$, with 1 indicating the most satisfactory performance.

The Mutual Information (MI) metric measures the degree of dependency between the ground-truth and the assigned degradation levels (Cakir et al., 2019). It is preferred to the Accuracy metric when the dataset is imbalanced (Jain, 2010); the MI on the $r$th trajectory is (Ye et al., 2018):

$$MI_r = \sum_{\hat{\gamma}=1}^{\hat{g}} \sum_{\beta=1}^{g} p_r(\beta, \hat{\gamma}) \log \frac{p_r(\beta, \hat{\gamma})}{p_r(\beta)\, p_r(\hat{\gamma})}, \quad r = 1, \dots, R \tag{12}$$

where, considering a randomly sampled segment $m$ extracted from the $r$th training trajectory, $p_r(\beta)$ and $p_r(\hat{\gamma})$ are the probabilities that its true and assigned degradation levels are $\beta$ and $\hat{\gamma}$, respectively, $p_r(\beta, \hat{\gamma})$ is the joint probability that its true degradation level is $\beta$ and its assigned degradation level is $\hat{\gamma}$, and $g$ and $\hat{g}$ are the numbers of true and assigned degradation levels, respectively. NMI normalizes MI in the range $[0, 1]$ (Wulandari et al., 2019; Ye et al., 2018):

$$NMI_r = \frac{MI_r}{\left(H[y_1, \dots, y_{N_r}] + H[c_1, \dots, c_{N_r}]\right)/2} \tag{13}$$

where $H[y_1, \dots, y_{N_r}]$ and $H[c_1, \dots, c_{N_r}]$ indicate the entropy of the true and of the assigned degradation levels obtained from the clustering of the $r$th trajectory, respectively:

$$H[y_1, \dots, y_{N_r}] = -\sum_{\beta=1}^{g} p_r(\beta) \log p_r(\beta), \qquad H[c_1, \dots, c_{N_r}] = -\sum_{\hat{\gamma}=1}^{\hat{g}} p_r(\hat{\gamma}) \log p_r(\hat{\gamma}) \tag{14}$$

A NMI value of 1 indicates the most satisfactory performance.

Similarly, the Accuracy metric assessing the performance in the classification of a test trajectory is:

$$Accuracy_{test} = \frac{\sum_{m=1}^{N_{test}} \delta(y_m, c_m)}{N_{test}} \tag{15}$$

and the MI metric is:

$$MI_{test} = \sum_{\hat{\gamma}=1}^{\hat{g}} \sum_{\beta=1}^{g} p_{test}(\beta, \hat{\gamma}) \log \frac{p_{test}(\beta, \hat{\gamma})}{p_{test}(\beta)\, p_{test}(\hat{\gamma})} \tag{16}$$

5.4. Case study I: Synthetic dataset

We consider a component characterized by a discrete three-level degradation process (Alaswad & Xiang, 2017). A run-to-failure trajectory is simulated by using an auxiliary continuous variable $\eta(t)$, which evolves following the exponential function (Gebraeel, 2006):

$$\eta(t) = \theta \cdot e^{\beta t} \tag{17}$$

where $\theta$ and $\beta$ are independent random variables associated to a specific component, with $\ln\theta \sim \mathcal{N}(-3, 0.6^2)$ and $\beta \sim \mathcal{N}(-0.015, 0.003^2)$. The component fails when $\eta(t)$ reaches the failure threshold $\eta_{failure} = 1$, whereas the first state transition, from degradation level 1 to degradation level 2, occurs when $\eta(t)$ reaches $(1/3) \cdot \eta_{failure}$, and the second state transition, from degradation level 2 to degradation level 3, occurs when $\eta(t)$ reaches $(2/3) \cdot \eta_{failure}$.

To mimic the complexity of a real industrial case, the measured signals are influenced by the component degradation, the operational and environmental conditions, and the process noise. Operational and environmental conditions are assumed to have periodic behaviors to simulate seasonal effects:

$$\Gamma(t) = \sin\!\left(\frac{2\pi}{50}\, t\right) + \omega_c(t) \tag{18}$$

with $\omega_c(t)$ being a Gaussian noise with distribution $\mathcal{N}(0, 0.2)$ representing the stochasticity of the environmental changes. The process noise $\omega(t)$ is sampled from the Gaussian distribution $\mathcal{N}(0, 0.1)$. The component degradation is quantified by the step function $D(t)$ (Nourelfath et al., 2012):

$$D(t) = \begin{cases} \dfrac{1}{t_{12}} \cdot \displaystyle\int_{0}^{t_{12}} \eta(t)\, dt & \text{if degradation level} = 1 \\[2ex] \dfrac{1}{t_{23} - t_{12}} \cdot \displaystyle\int_{t_{12}}^{t_{23}} \eta(t)\, dt & \text{if degradation level} = 2 \\[2ex] \dfrac{1}{t_f - t_{23}} \cdot \displaystyle\int_{t_{23}}^{t_f} \eta(t)\, dt & \text{if degradation level} = 3 \end{cases} \tag{19}$$

with $t_{12}$, $t_{23}$, $t_f$ indicating the times of the transitions from state 1 to state 2, from state 2 to state 3 and from state 3 to failure, respectively. In practice, $D(t)$ indicates the expected value of the variable $\eta(t)$ when the component is in a given degradation state.
⎩ 𝑓 23


Fig. 6. Simulated run-to-failure trajectory of one component.
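The simulation of a single run-to-failure trajectory described by Eqs. (17)-(19) can be sketched as follows; the time discretization, the interpretation of the second parameter of the Gaussian distributions as a standard deviation, and the positive sign of the mean of the sampled growth rate (so that $\eta(t)$ actually reaches the failure threshold within the reported trajectory lengths) are assumptions made for illustration.

```python
import numpy as np

def simulate_trajectory(rng, dt=1.0, t_max=500):
    """Simulate eta(t), the discrete degradation levels and D(t) for one component."""
    theta = np.exp(rng.normal(-3.0, 0.6))
    growth = rng.normal(0.015, 0.003)         # assumed positive mean so that eta reaches 1
    t = np.arange(0.0, t_max, dt)
    eta = theta * np.exp(growth * t)          # Eq. (17)
    eta = eta[eta <= 1.0]                     # the component fails when eta reaches 1
    t = t[: len(eta)]
    level = 1 + (eta >= 1 / 3).astype(int) + (eta >= 2 / 3).astype(int)   # 3 discrete levels
    # D(t): mean of eta over the sojourn interval of the current degradation level, Eq. (19)
    D = np.array([eta[level == lv].mean() for lv in level])
    gamma = np.sin(2 * np.pi * t / 50) + rng.normal(0.0, 0.2, size=t.size)   # Eq. (18)
    omega = rng.normal(0.0, 0.1, size=t.size)                                # process noise
    return t, eta, level, D, gamma, omega

t, eta, level, D, gamma, omega = simulate_trajectory(np.random.default_rng(0))
```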

The values of ten signals $\chi_s(t)$, $s = 1, \dots, 10$, are simulated by using the following equations, which combine at least two of the three factors $D(t)$, $\Gamma(t)$ and $\omega(t)$:

$$\begin{aligned}
\chi_1(t) &= \sin\left((5/2) \cdot D(t) + 2\Gamma(t)\right) \\
\chi_2(t) &= \sin\left(5D(t) + \tan(\Gamma(t))\right) \\
\chi_3(t) &= \tan\left((1/2) \cdot \Gamma(t) - (2/3) \cdot \omega(t)\right) \\
\chi_4(t) &= \cos^3\left(\omega(t)\right) \cdot \sin(\Gamma(t)) \\
\chi_5(t) &= \cos\left(5D(t) + 3\tan(3\omega(t))\right) \\
\chi_6(t) &= \sin\left(5D(t) + \omega(t) - \tan^2(\Gamma(t) + 1)\right) \\
\chi_7(t) &= \tan\left((5/6) \cdot D(t)\Gamma(t)\right) + \omega(t) \\
\chi_8(t) &= \cos^2\left((7/2) \cdot D(t) + 2\Gamma(t)\right) \\
\chi_9(t) &= \cos\left(5D(t) + \tan\Gamma(t) - \tan((1/2) \cdot \omega(t))\right) \\
\chi_{10}(t) &= \tan\left((1/4) \cdot \omega(t)\right) - (1/4) \cdot \cos\left((4/3)\, D(t)\sin(\Gamma(t))\right)
\end{aligned} \tag{20}$$

Fig. 7. Pairwise distance matrix based on the Conceptors distance.

A total of 200 run-to-failure degradation trajectories have been simulated by sampling $\beta$ and $\ln\theta$ from their distributions and applying Eq. (19) to obtain the corresponding signal values. They are divided into a set of 100 training trajectories, used for clustering the patterns within each trajectory, and 100 test trajectories, used for verifying the classification performance. The lengths of the time series are set randomly among $150, 160, \dots, 250$; an example of a trajectory with length equal to 200 is given in Fig. 6.

Fig. 6 shows the obtained evolution of $D(t)$, $\Gamma(t)$ and $\omega(t)$ during a simulated run-to-failure trajectory representing the life of one component. Notice that the ten measured signals $\chi_1(t), \dots, \chi_{10}(t)$ do not explicitly show any evolution trend that is easy to correlate to the equipment degradation process.

The hyperparameters and the model architecture of the proposed method are set equal to the values reported in Tables 1 and 2. The first ten Conceptor matrices of each trajectory are discarded due to the burn-in period of the reservoir states caused by their initialization, and the Conceptor matrix is computed for segment $m$ with $m = 11 l_w, 12 l_w, \dots, N_r$ ($l_w = 5$). Fig. 7 shows the pairwise distance matrix among the segments $p^r_{1:S,1:m}$ extracted from a run-to-failure training trajectory, as provided by the proposed method. It can be seen that the segments characterized by the same degradation level are similar among them (lighter colors) and dissimilar to those of different degradation levels (darker colors). This effect can also be seen in Fig. 8, which shows the scatter plot of the Conceptors in the space of the first three normalized eigenvectors found by the spectral clustering algorithm. Time sequences of the same degradation level are close and those of different degradation levels are well separated.

Fig. 9 shows the distribution of the numbers of clusters $\{k^*_r\}_{r=1,\dots,R=100}$ identified by applying the spectral clustering algorithm to the training trajectories. Notice that the statistical mode of the distribution corresponds to the true number of degradation states in the dataset.

Table 5 reports the obtained clustering performances. Notice that the proposed method provides significantly more satisfactory performances than those of Methods 1, 2, 3 and 4. This is due to the capabilities of the proposed Conceptors-aided clustering method of effectively capturing the global degradation dynamics and of the Frobenius distance of assessing the similarity among the Conceptors.

In the second stage, whose objective is the classification of the test trajectories, the following sets of labels are associated to the trajectory segments $p^r_{1:S,1:m}$ ($r = 1, \dots, R$, $m = 1, l_w, 2l_w, \dots, N_r$) of the training set to train the classifiers: (i) the labels provided by the clustering Methods 1a, 2b, 3a and 4b, which have been shown to be more accurate than those provided by the clustering Methods 1b, 2a, 3b and 4a; (ii) the labels provided by the Proposed Method; (iii) the ground

11
M. Xu et al. Expert Systems With Applications 213 (2023) 118962

Table 5
Comparison of the performance obtained in clustering the training trajectories. The best performance is reported in bold-italic, (a) indicates
that the method receives in input a time window formed by the last 𝑙𝑤 signal measurements, whereas (b) indicates that the input is formed
by the time interval between 𝑡1 and the current time.

Fig. 10. Confusion matrix of the degradation level estimation results provided by the
proposed clustering and classification methods.

and others and (b) comparison of the result of unsupervised degrada-


tion level estimation between the proposed two-stage approach and
Fig. 8. Scatter plot of the Conceptors in the space of the first three normalized the compositional method of combining other clustering and classifi-
eigenvectors found by the spectral clustering algorithm.
cation methods. According to the result, for purpose (a), the proposed
conceptor-aided classification method provides the best classification
performances, note that only true labels are considered; for purpose
(b), the proposed two-stage method achieves better degradation level
estimation performances than the combination of all clustering and
classification methods that do not use Conceptors (Clustering Methods
1, 2, 3, 4 combined with Classification Methods 1, 2 and 3). It is
possible to conclude that the use of Conceptors allow to achieve better
generalization at the expenses of the learning capability. However, in
Table 6, when provided prelabels by clustering method, the proposed
classification method has a larger drop in performance (0.0797’s accu-
racy drop and 0.114’s NMI drop) than compared methods. This result
shows that the performance of proposed method is probable to decline
if training labels are noisy. Note that, the proposed two-stage method
achieves slightly better performance than Method 6, which compares
similarity to the clusters centers and assigns the membership and serves
the purpose of investigating the need of adopting a two stage approach
with clustering and classification.
Fig. 9. Distribution of the optimal number of clusters identified for each one of the
Fig. 10 shows the confusion matrix of the degradation level esti-
100 training trajectories.
mation results provided by the combination of the proposed clustering
and classification methods. Notice that no misclassifications involve the
assignment to degradation level 3 of patterns whose true degradation
more accurate than those provided by clustering Methods 1b, 2a, 3b
level is 1 or vice versa. Furthermore, since there are more misclassi-
and 4a; (ii) the labels provided by the Proposed Method; (iii) the ground
fications involving degradation levels 1 and 2 than degradation level
truth labels. Notice that the last case, which assumes the unrealistic
3, the method is apt for risk prone applications, where the focus is
hypothesis of knowing the true labels of the training trajectories, is
to recognize when the component is very degraded for maintenance
considered to have an ideal best situation. intervention.
With respect to the learning of the proposed Conceptor-CNN clas- The experiments of computational time are performed in MATLAB
sifiers, the 100 labeled run-to-failure trajectories have been divided 2019b with a hardware Intel Core i5 CPU at 2.9 GHz and 16G RAM.
into two parts: 80 are used to train the classification model and the For brevity, the detailed numbers are not showed in the present work.
remaining for validation to set the model hyperparameter scaling factor It is found that the proposed clustering method is faster than Clustering
𝛽, obtaining the best accuracy with 𝛽 equal to 1. Methods 3 and 4, while slower than Clustering Methods 1 and 2.
Table 6 reports the degradation level estimation performances on Considering the classification task, the proposed method is slower than
the test trajectories. It is organized in two purposes, (a) comparison of traditional classification Methods 1, 2 and 3 as expected, this is due to
classification performances between the proposed classification method the more time consumed of CNN in training.
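A simplified sketch of the clustering step discussed above is reported here: Conceptor matrices are compared through the Frobenius distance and the resulting affinity matrix is passed to spectral clustering. The Conceptor formula follows the standard definition cited in the paper (Jaeger, 2017); the aperture, the Gaussian kernel width, the number of clusters and the random data are illustrative choices, not the authors' settings.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def conceptor(states, aperture=10.0):
    # Standard Conceptor of a reservoir state sequence (Jaeger, 2017):
    # C = R (R + aperture^-2 I)^-1, with R the state correlation matrix.
    R = states.T @ states / states.shape[0]
    n = R.shape[0]
    return R @ np.linalg.inv(R + aperture ** (-2) * np.eye(n))

def frobenius_distance_matrix(conceptors):
    # pairwise Frobenius distances between Conceptor matrices
    m = len(conceptors)
    dist = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d = np.linalg.norm(conceptors[i] - conceptors[j], ord="fro")
            dist[i, j] = dist[j, i] = d
    return dist

# one Conceptor per segment of a trajectory (random states used as stand-ins)
rng = np.random.default_rng(1)
conceptors = [conceptor(rng.standard_normal((50, 20))) for _ in range(30)]
dist = frobenius_distance_matrix(conceptors)
affinity = np.exp(-dist ** 2 / (2 * dist.std() ** 2 + 1e-12))  # Gaussian affinity
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)
```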


Table 6
Comparison of the performance in the classification of the test trajectories. The best performance of each line is reported in bold-italic; notice that the proposed Conceptors-based CNN achieves the best performance when the true labels are available. Note that Classification Methods 4, 5 and 6 act as an ablation study to verify the importance of each procedure in the proposed classification method and, thus, are not combined with the other clustering methods.

Fig. 11. Example of transformation of the raw signals into a multivariate time series.

Fig. 12. Pairwise distance matrix for bearing No. 4.

5.5. Case study II: IMS bearing dataset

The bearing dataset provided by the Center for Intelligent Maintenance Systems (IMS) of the University of Cincinnati (Qiu et al., 2006) is considered. A shaft supporting a 6000 lbs load is coupled to an AC motor and rotates at 2000 rpm. Four force-lubricated bearings are mounted on the shaft and two accelerometers are placed close to each bearing for measuring the vibrations at a frequency of 20 kHz. Every 10 min of operation, the acceleration signals are measured for a time window of 1 s containing 20 480 values. In this work, we consider the degradation trajectories of bearings No. 3 and 4, which are the only ones that fail in experiment 1 (Qiu et al., 2006). Each trajectory is formed by 𝑁𝑟 = 2156 (𝑟 = 1, 2) 1-second time windows, with 𝑟 = 1, 2 denoting bearings No. 3 and 4, respectively. The onset of abnormal conditions has been reported on bearing No. 3 at times 1617 (Qiu et al., 2006) and 2027 (Hasani et al., 2017), and on bearing No. 4 at times 1617 (Qiu et al., 2006), 1760 (Yu, 2012) and 1641 (Hasani et al., 2017). Given the unavailability of the ground-truth time of the transition from a normal (healthy) to an abnormal (degraded) condition, we assume that it has occurred at the average time among the transition times reported in the literature (1822 for bearing No. 3 and 1672 for bearing No. 4), and we refer to these two quantities as the reference transition times.

Given the large dimensionality of the data, which makes it unfeasible to directly use all the acceleration values acquired at 20 kHz as input of the clustering and classification methods, a feature extraction procedure has been applied. In particular, according to Jardine et al. (2006) and Yang et al. (2021), the six statistical features of mean, root mean square, standard deviation, kurtosis, skewness and min–max range of the acceleration signal measured by the sensor closest to the monitored bearing are computed from each 1 s time window (a sketch of this feature extraction is given at the end of this subsection). This way, the raw signals are transformed into multivariate time series of features, as illustrated in Fig. 11. The Conceptor matrix is then computed when ten 1-second windows are acquired and the corresponding six features are extracted. The first 20 Conceptor matrices of each trajectory are discarded due to the burn-in period of the reservoir states caused by their initialization, and the Conceptor matrices are computed every five measurements.

By applying Algorithm 1, an optimal number of degradation levels equal to 2 has been identified. Fig. 12 shows the pairwise distance matrix among the segments 𝑝𝑟1∶𝑆,1∶𝑚 with 𝑚 = 21𝑙𝑤, 22𝑙𝑤, …, 𝑁𝑟 (𝑙𝑤 = 10) extracted from the run-to-failure trajectory of bearing No. 4. It can be seen that there is a clear partition between degradation levels 1 (healthy) and 2 (degraded). Notice that the separation boundary between degradation levels 1 and 2 fits the reference value of 1672. The vertical and horizontal lines indicate the average of the degradation onset detection times reported in the literature works.

Table 7 reports the clustering performance. The proposed method is more accurate in the labeling of the training trajectories than the other clustering methods. It is interesting to notice that Clustering Methods 1b, 3b and 4b, which receive in input the whole trajectory, are not able to properly deal with the long-term dynamics of the degradation process and, therefore, underperform with respect to Clustering Methods 1a, 3a and 4a. Due to the better performance of the time series representation and similarity measurement, the Conceptor Clustering method outperforms the other clustering methods.

The labels assigned to the segments of the trajectories by Clustering Methods 1a, 2b, 3a and 4a, which are the best options among the corresponding methods (a) and (b), have been considered for the comparison of the classification results.
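The following is a small, self-contained sketch of the six-feature extraction described above; the window values are synthetic placeholders, and the snippet is meant to illustrate the transformation of a raw 1 s vibration window into one 6-dimensional pattern, not to reproduce the authors' code.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def window_features(x):
    # Six statistical features of one acceleration window, as listed in the text:
    # mean, RMS, standard deviation, kurtosis, skewness and min-max range.
    x = np.asarray(x, dtype=float)
    return np.array([
        x.mean(),
        np.sqrt(np.mean(x ** 2)),      # root mean square
        x.std(),
        kurtosis(x),
        skew(x),
        x.max() - x.min(),             # min-max range
    ])

# e.g. one synthetic 1-second window sampled at 20 kHz (20 480 values)
rng = np.random.default_rng(0)
window = rng.standard_normal(20480)
print(window_features(window))         # one 6-dimensional pattern
```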


Table 7
Comparison of the performance in clustering the training trajectories. The best performances are reported in bold-italic. (a) indicates that the
method receives in input a time window formed by the last 𝑙𝑤 signal measurements, whereas (b) indicates that the input is from time 𝑡1 to
the current time.

Table 8
Comparison of the performance in the classification of the test trajectories. The best performance of each line is reported in bold-italic; notice that the proposed Conceptors-based CNN achieves the best accuracy and NMI values. Note that Classification Methods 4, 5 and 6 act as an ablation study to verify the importance of each procedure in the proposed classification method and, thus, are not combined with the other clustering methods.

A leave-one-out cross validation strategy is employed for setting the scaling factor 𝛽 used for the classification. In practice, one trajectory is used for training and the other one for performance evaluation, and vice-versa. The obtained optimal value of 𝛽 is 1.

Table 8 reports the degradation level estimation performances on the test trajectories. Similarly to the previous case study, it is organized according to two purposes: (a) comparison of the classification performances of the proposed classification method and of the other classification methods, and (b) comparison of the unsupervised degradation level estimation results of the proposed two-stage approach and of the compositional approaches obtained by combining other clustering and classification methods. According to the results, for purpose (a), considering that only very few trajectories (only 2) are used for training, the accuracies of the TCN (Classification Method 2) and of the LSTM (Classification Method 3) are worse than those of the KNN and of the proposed method, which shows that the proposed classification method does not easily overfit the training data; for purpose (b), the proposed two-stage degradation level estimation approach outperforms the combinations of the other clustering and classification methods. Note that the proposed two-stage Conceptor-CNN method achieves slightly better performance than Method 6, which assigns the membership by comparing the similarity to the cluster centers. Therefore, the use of a training set with some errors can improve the generalization ability of the models and, therefore, the classification performance.

5.6. Case study III: PHM challenge 2012 bearing dataset

We consider the dataset of the PHM challenge 2012, which contains bearings run-to-failure data obtained using the PRONOSTIA experimental platform (Nectoux et al., 2012b).

Three different operating conditions, characterized by different motor rotation speeds and loads on the bearings, were considered during the run-to-failure degradation experiments: 1800 rpm and 4000 N (1st operating condition), 1650 rpm and 4200 N (2nd operating condition) and 1500 rpm and 5000 N (3rd operating condition). Two accelerometers are installed in the horizontal and vertical positions of the shaft to measure vibrations at a frequency of 25.6 kHz. Time windows of length equal to 0.1 s, containing 2560 acceleration values, are measured every 10 s. Seven bearings are tested under the 1st and 2nd operating conditions. Table 9 reports the times at which the onset of abnormal conditions has been detected in literature works. Notice that no transitions to abnormal conditions have been observed for bearings No. 3, 4, 5 and 7 of the experiment in the 2nd operating condition. The experiment in the 3rd operating condition is not considered, since no literature works have investigated the 3rd operating condition and we cannot find the reference transition times.

The same six statistical features (mean, root mean square, standard deviation, kurtosis, skewness and min–max range) of the acceleration signal considered in case study II have been provided in input to the classification methods. They are extracted from the raw signal measurements acquired during the 0.1-second windows containing 2560 acceleration values. Therefore, each time window becomes a single 6-dimensional pattern of the time series given in input to the clustering and classification algorithms.

The Conceptor matrix is computed each time 10 windows are acquired and, as before, the first 20 Conceptor matrices of each run-to-failure trajectory are discarded due to the burn-in period of the reservoir states caused by their initialization. By applying Algorithm 1, an optimal number of degradation levels equal to 2 has been identified.

Table 10 reports the performance of the clustering methods. The proposed clustering method provides the most satisfactory clustering of the training trajectories in the 1st operating condition. With respect to the 2nd operating condition, the proposed method provides the largest NMI and an Accuracy slightly smaller than that of Clustering Method 3a. Note that Clustering Method 4 (deep learning-based) is nearly the worst among the comparison methods, because the run-to-failure trajectories are few and, thus, the deep learning approach is prone to overfitting and obtains poor generalization ability. Due to the better performance of the time series representation and similarity measurement, the Conceptor Clustering method outperforms the other clustering methods.

With respect to the classification of the test trajectories, a leave-one-out cross validation procedure is applied to the labeled trajectories and the average accuracy is used as the optimization objective to set the scaling factor 𝛽 (see the sketch reported below). The obtained optimal 𝛽 is 10 000 for the 1st operating condition and 1 for the 2nd operating condition.

Table 11 reports the degradation level estimation performances obtained on the test trajectories. Note that, due to the absence of true labels, the Table is organized to compare the unsupervised degradation level estimation results of the proposed two-stage approach and of the compositional approaches obtained by combining other clustering and classification methods. The proposed two-stage approach provides the best performance in terms of Accuracy on the trajectories of the 1st and 2nd operating conditions. The NMI metric estimations are affected by large uncertainty due to the highly unbalanced number of observations in normal (healthy) and abnormal (degraded) conditions. Therefore, a difference of a few missed or false alarms can cause a large modification of the NMI metric.
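The following schematic sketch illustrates the leave-one-out selection of the scaling factor 𝛽 described above; train_classifier and accuracy are hypothetical placeholders standing for the Conceptor-CNN training and evaluation routines, and the candidate 𝛽 values are only examples.

```python
import numpy as np

def loo_select_beta(trajectories, labels, betas, train_classifier, accuracy):
    # Leave-one-out cross validation: each labeled trajectory is used once as
    # validation while the others are used for training; the beta with the
    # largest average accuracy is retained.
    scores = []
    for beta in betas:
        fold_acc = []
        for i in range(len(trajectories)):
            train_x = [x for j, x in enumerate(trajectories) if j != i]
            train_y = [y for j, y in enumerate(labels) if j != i]
            model = train_classifier(train_x, train_y, beta)
            fold_acc.append(accuracy(model, trajectories[i], labels[i]))
        scores.append(np.mean(fold_acc))
    return betas[int(np.argmax(scores))]

# example candidate values spanning several orders of magnitude
candidate_betas = [1, 10, 100, 1000, 10000]
```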


Table 9
Times of transition from the healthy state to the degraded state in the literature works.
Operating condition | Bearing | Mao, Zhang et al. (2020) | Xiao et al. (2020) | Jin et al. (2014) | Mao, Tian et al. (2020) | Mean value
1st | No. 1 | 1405 | 1462 | 1129 | 1410 | 1351.5
1st | No. 2 | 826 | 826 | 762 | – | 804.7
1st | No. 3 | 1174 | 1365 | 891 | 1204 | 1158.5
1st | No. 4 | 1087 | 1084 | 1083 | – | 1084.7
1st | No. 5 | 2443 | 2411 | 1141 | – | 1998.3
1st | No. 6 | 1590 | 1631 | 1641 | – | 1620.7
1st | No. 7 | 2212 | 2206 | 885 | – | 1767.7
2nd | No. 1 | 155 | – | – | – | 155
2nd | No. 2 | 255 | – | – | – | 255
2nd | No. 6 | 688 | – | – | – | 688
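As a quick check of the "Mean value" column, the reference transition time of each bearing is the average of the detection times available in the literature, ignoring the missing entries; the snippet below reproduces, for example, the values 1351.5 and 804.7 of the first two rows.

```python
import numpy as np

# detection times from the literature works (np.nan marks missing entries)
times = np.array([
    [1405, 1462, 1129, 1410],        # 1st condition, bearing No. 1
    [826,  826,  762,  np.nan],      # 1st condition, bearing No. 2
])
print(np.round(np.nanmean(times, axis=1), 1))   # -> [1351.5  804.7]
```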

Table 10
Comparison of the performance in clustering the training trajectories. The best performance of each line is reported in italic. (a) indicates that
the method receives in input a time window formed by the last 𝑙𝑤 signal measurements, whereas (b) indicates that the input is from time 𝑡1
to the current time.

Table 11
Comparison of the performance in the classification of the test trajectories. The best performance of each line is reported in bold-italic; notice that the proposed Conceptor-based CNN achieves the best accuracy for both operating conditions. Note that Classification Methods 4, 5 and 6 act as an ablation study to verify the importance of each procedure in the proposed classification method and, thus, are not combined with the other clustering methods.

Notice that, although classification method 6 is slightly better than classification method 4, method 6 is worse than the other methods and than the proposed two-stage Conceptor-based approach. This is because method 6 is prone to be affected by outlier trajectories when only a few run-to-failure trajectories are available.

6. Conclusion

A two-stage method based on the combined use of Conceptors and CNNs has been proposed for data-driven degradation level estimation in the case in which only few unlabeled run-to-failure degradation trajectories are available.

The proposed method has outperformed state-of-the-art methods in accuracy on a synthetic case study and on two literature case studies containing vibrational data from bearings, with a competitive computational efficiency. The obtained results show that Conceptors allow effectively dealing with multivariate time series characterized by long-term temporal dependencies, which are difficult to treat with methods based on the use of sliding time windows. Also, the combination of Conceptors and convolutional neural networks has been shown to be effective in the classification of the degradation level of test components.

It is, therefore, possible to conclude that the proposed method contributes to overcoming one of the main limitations to the practical estimation of equipment degradation level, i.e. the need of a large amount of labeled data for model training.

CRediT authorship contribution statement

Mingjing Xu: Methodology, Writing – original draft, Writing – review & editing, Coding with Python. Piero Baraldi: Conceptualization, Methodology, Supervision, Reviewing and editing. Zhe Yang: Synthetic dataset preparation. Enrico Zio: Conceptualization, Methodology, Supervision, Reviewing and editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

The work is developed within the research project ‘‘SMART MAINTENANCE OF INDUSTRIAL PLANTS AND CIVIL STRUCTURES BY 4.0 MONITORING TECHNOLOGIES AND PROGNOSTIC APPROACHES -
MAC4PRO’’, sponsored by the call BRIC-2018 of the National Institute for Insurance against Accidents at Work – INAIL. Mingjing Xu gratefully acknowledges the financial support from the China Scholarship Council (No. 201606420061).

References

Ghafoori, Z., Erfani, S. M., Bezdek, J. C., Karunasekera, S., & Leckie, C. (2020). LN-SNE: Log-normal distributed stochastic neighbor embedding for anomaly detection. IEEE Transactions on Knowledge and Data Engineering, http://dx.doi.org/10.1109/TKDE.2019.2934450.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (pp. 315–323).
Han, T., Liu, C., Wu, L., Sarkar, S., & Jiang, D. (2019). An adaptive spatiotemporal
feature learning approach for fault diagnosis in complex systems. Mechanical
Al-Dahidi, S., Di Maio, F., Baraldi, P., Zio, E., & Seraoui, R. (2018). A framework for Systems and Signal Processing, http://dx.doi.org/10.1016/j.ymssp.2018.07.048.
reconciliating data clusters from a fleet of nuclear power plants turbines for fault Hasani, R. M., Wang, G., & Grosu, R. (2017). An automated auto-encoder correlation-
diagnosis. Applied Soft Computing, 69, 213–231. based health-monitoring and prognostic method for machine bearings. arXiv
Alaswad, S., & Xiang, Y. (2017). A review on condition-based maintenance optimization preprint arXiv:1703.06272.
models for stochastically deteriorating system. Reliability Engineering & System He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. Advances in
Safety, http://dx.doi.org/10.1016/j.ress.2016.08.009. Neural Information Processing Systems, 18, 507–514.
Ali, J. B., Fnaiech, N., Saidi, L., hebel Morello, & Fnaiech, F. (2015). Application of Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation,
empirical mode decomposition and artificial neural network for automatic bearing 9(8), 1735–1780.
fault diagnosis based on vibration signals. Applied Acoustics, 89, 16–27. Jaeger, H. (2017). Using conceptors to manage neural long-term memories for temporal
Arn, R., Narayana, P., Draper, B., Emerson, T., Kirby, M., & Peterson, C. (2018). Motion patterns. Journal of Machine Learning Research, 18(1), 387–429.
segmentation via generalized curvatures. IEEE Transactions on Pattern Analysis and Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters,
Machine Intelligence, http://dx.doi.org/10.1109/TPAMI.2018.2869741. 31(8), 651–666.
Atamuradov, V., Member, IEEE, Medjaher, K., & Member (2018). Railway point Jardine, A. K., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics
machine prognostics based on feature fusion and health state assessment. IEEE and prognostics implementing condition-based maintenance. Mechanical Systems and
Transactions on Instrumentation and Measurement, PP(99), 1–14. Signal Processing, 20(7), 1483–1510.
Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic Jiao, J., Zhao, M., & Lin, J. (2020). Unsupervised adversarial adaptation network
convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv: for intelligent fault diagnosis. IEEE Transactions on Industrial Electronics, 67(11),
1803.01271. 9904–9913. http://dx.doi.org/10.1109/TIE.2019.2956366.
Baraldi, P., Di Maio, F., Rigamonti, M., Zio, E., & Seraoui, R. (2015a). Clustering for Jin, X., Sun, Y., Shan, J., Wang, Y., & Xu, Z. (2014). Health monitoring and fault
unsupervised fault diagnosis in nuclear turbine shut-down transients. Mechanical detection using wavelet packet technique and multivariate process control method.
Systems and Signal Processing, 58, 160–178. In 2014 prognostics and system health management conference (PHM-2014 Hunan)
Baraldi, P., Di Maio, F., Rigamonti, M., Zio, E., & Seraoui, R. (2015b). Unsupervised (pp. 257–260). IEEE.
clustering of vibration signals for identifying anomalous conditions in a nuclear Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep
turbine. Journal of Intelligent & Fuzzy Systems, 28(4), 1723–1731. convolutional neural networks. In Advances in neural information processing systems
Baraldi, P., Di Maio, F., & Zio, E. (2013). Unsupervised clustering for fault diagnosis in (pp. 1097–1105).
nuclear power plant components. International Journal of Computational Intelligence Kwon, D., Kim, H., Kim, J., Suh, S. C., Kim, I., & Kim, K. J. (2019). A survey of deep
Systems, 6(4), 764–777. learning-based network anomaly detection. Cluster Computing, 1–13.
Belhadi, A., Djenouri, Y., Srivastava, G., Djenouri, D., & Lin, C. W. (2020). A two- Längkvist, M., Karlsson, L., & Loutfi, A. (2014). A review of unsupervised feature
phase anomaly detection model for secure intelligent transportation ride-hailing learning and deep learning for time-series modeling. Pattern Recognition Letters,
trajectories. IEEE Transactions on Intelligent Transportation Systems, PP(99). http://dx.doi.org/10.1016/j.patrec.2014.01.008, arXiv:1602.07261.
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and Lea, C., Flynn, M. D., Vidal, R., Reiter, A., & Hager, G. D. (2017). Temporal
new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, convolutional networks for action segmentation and detection. In Proceedings of
http://dx.doi.org/10.1109/TPAMI.2013.50, arXiv:1206.5538. the IEEE conference on computer vision and pattern recognition (pp. 156–165).
Borisagar, K. R., Thanki, R. M., & Sedani, B. S. (2019). Fourier transform, short-time Lei, Y., Jia, F., Lin, J., Xing, S., & Ding, S. X. (2016). An intelligent fault diagnosis
Fourier transform, and wavelet transform. In Speech enhancement techniques for method using unsupervised feature learning towards mechanical big data. IEEE
digital hearing aids (pp. 63–74). Springer. Transactions on Industrial Electronics, 63(5), 3137–3147.
Lei, J., Liu, C., & Jiang, D. (2019). Fault diagnosis of wind turbine based on
Büsing, L., Schrauwen, B., & Legenstein, R. (2010). Connectivity, dynamics, and mem-
long short-term memory networks. Renewable Energy, http://dx.doi.org/10.1016/
ory in reservoir computing with binary and analog neurons. Neural Computation,
j.renene.2018.10.031.
22(5), 1272–1311.
Li, L., Hansman, R. J., Palacios, R., & Welsch, R. (2016). Anomaly detection via a
Cakir, F., He, K., Bargal, S. A., & Sclaroff, S. (2019). Hashing with mutual information.
Gaussian mixture model for flight operation and safety monitoring. Transportation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(10), 2424–2437.
Research Part C (Emerging Technologies), http://dx.doi.org/10.1016/j.trc.2016.01.
Cao, C., Huang, Y., Yang, Y., Wang, L., Wang, Z., & Tan, T. (2019). Feedback convo-
007.
lutional neural network for visual localization and segmentation. IEEE Transactions
Li, X., Hu, Y., Li, M., & Zheng, J. (2020). Fault diagnostics between different type
on Pattern Analysis and Machine Intelligence, http://dx.doi.org/10.1109/TPAMI.2018.
of components: A transfer learning approach. Applied Soft Computing, 86, Article
2843329.
105950.
Chong, E., Han, C., & Park, F. C. (2017). Deep learning networks for stock market
Li, X., Li, X., & Ma, H. (2020). Deep representation clustering-based fault diagnosis
analysis and prediction: Methodology, data representations, and case studies. Expert
method with unsupervised data applied to rotating machinery. Mechanical Systems
Systems with Applications, 83, 187–205.
and Signal Processing, 143, Article 106825.
Costa, B. S. J., Angelov, P. P., & Guedes, L. A. (2015). Fully unsupervised fault Li, Q., Song, Y., Zhang, J., & Sheng, V. S. (2020). Multiclass imbalanced learning
detection and identification based on recursive density estimation and self-evolving with one-versus-one decomposition and spectral clustering. Expert Systems with
cloud-based classifier. Neurocomputing, 150, 289–303. Applications, 147, Article 113152.
Diaz, M., Henriquez, P., Ferrer, M. A., Pirlo, G., Alonso, J. B., Carmona-Duarte, C., Li, C., Zia, M. Z., Tran, Q. H., Yu, X., Hager, G. D., & Chandraker, M. (2019).
& Impedovo, D. (2017). Stability-based system for bearing fault early detection. Deep supervision with intermediate concepts. http://dx.doi.org/10.1109/TPAMI.
Expert Systems with Applications, 79, 65–75. 2018.2863285, arXiv:1801.03399,
Dibaj, A., Ettefagh, M. M., Hassannejad, R., & Ehghaghi, M. B. (2021). A hybrid fine- Lin, K., Lu, J., Chen, C. S., Zhou, J., & Sun, M. T. (2019). Unsupervised deep learning
tuned VMD and CNN scheme for untrained compound fault diagnosis of rotating of compact binary descriptors. IEEE Transactions on Pattern Analysis and Machine
machinery with unequal-severity faults. Expert Systems with Applications, 167, Article Intelligence, http://dx.doi.org/10.1109/TPAMI.2018.2833865.
114094. Liu, H., Song, W., Niu, Y., & Zio, E. (0000). A generalized cauchy method for remaining
Ellefsen, A. L., Bjørlykhaug, E., Æsøy, V., & Zhang, H. (2019). An unsupervised useful life prediction of wind turbine gearboxes, Mechanical Systems and Signal
reconstruction-based fault detection algorithm for maritime components. IEEE Processing, 153, 107471.
Access, 7, 16101–16109. Liu, H., Zhou, J., Xu, Y., Zheng, Y., Peng, X., & Jiang, W. (2018). Unsupervised fault
Ferreira, A. A., Ludermir, T. B., & De Aquino, R. R. B. (2013). An approach to diagnosis of rolling bearings using a deep neural network based on generative
reservoir computing design and training. Expert Systems with Applications, http: adversarial networks. Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2018.
//dx.doi.org/10.1016/j.eswa.2013.01.029. 07.034.
Gangsar, P., & Tiwari, R. (2020). Signal based condition monitoring techniques for fault Lu, S., He, Q., & Wang, J. (2019). A review of stochastic resonance in rotating machine
detection and diagnosis of induction motors: A state-of-the-art review. Mechanical fault detection. Mechanical Systems and Signal Processing, 116, 230–260.
Systems and Signal Processing, 144, Article 106908. Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent
Gebraeel, N. (2006). Sensory-updated residual life distributions for components with neural network training. Computer Science Review, 3(3), 127–149.
exponential degradation patterns. IEEE Transactions on Automation Science and Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural
Engineering, http://dx.doi.org/10.1109/TASE.2006.876609. network acoustic models. In Proc. Icml, Vol. 30 (p. 3).


Mao, W., Tian, S., Fan, J., Liang, X., & Safian, A. (2020). Online detection of bearing Wu, Z., Luo, H., Yang, Y., Zhu, X., & Qiu, X. (2018). An unsupervised degradation
incipient fault with semi-supervised architecture and deep feature representation. estimation framework for diagnostics and prognostics in cyber-physical system. In
Journal of Manufacturing Systems, 55, 179–198. 2018 IEEE 4th world forum on internet of things (WF-IoT) (pp. 784–789). IEEE.
Mao, W., Zhang, D., Tian, S., & Tang, J. (2020). Robust detection of bearing early fault Wulandari, C. P., Ou-Yang, C., & Wang, H.-C. (2019). Applying mutual information
based on deep transfer learning. Electronics, 9(2), 323. for discretization to support the discovery of rare-unusual association rule in
Miao, H., Li, B., Sun, C., & Liu, J. (2019). Joint learning of degradation assessment cerebrovascular examination dataset. Expert Systems with Applications, 118, 52–64.
and RUL prediction for aero-engines via dual-task deep LSTM networks. IEEE Xiao, L., Liu, Z., Zhang, Y., Zheng, Y., & Cheng, C. (2020). Degradation assessment of
Transactions on Industrial Informatics, 1. bearings with trend-reconstruct-based features selection and gated recurrent unit
Nectoux, P., Gouriveau, R., Medjaher, K., Ramasso, E., Chebel-Morello, B., Zerhouni, N., network. Measurement, 165, Article 108064.
& Varnier, C. (2012a). PRONOSTIA : An experimental platform for bearings Xiao, Y., Wang, H., Zhang, L., & Xu, W. (2014). Two methods of selecting Gaussian
accelerated degradation tests. In IEEE international conference on prognostics and kernel parameters for one-class SVM and their application to fault detection.
health management, PHM’12. Knowledge-Based Systems, 59, 75–84.
Nectoux, P., Gouriveau, R., Medjaher, K., Ramasso, E., Chebel-Morello, B., Zerhouni, N., Xu, M., Baraldi, P., Al-Dahidi, S., & Zio, E. (2019). Fault prognostics in presence of
& Varnier, C. (2012b). PRONOSTIA: An experimental platform for bearings event-based measurements: Proceedings of ESREL 2019, Sep 22-26, 2019, Hannover,
accelerated degradation tests. Germany. Research Publishing.
Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an Xu, M., Baraldi, P., Al-Dahidi, S., & Zio, E. (2020). Fault prognostics by an ensemble
algorithm. Advances in Neural Information Processing Systems, 14, 849–856. of Echo State Networks in presence of event based measurements. Engineering
Nourelfath, M., Châtelet, E., & Nahas, N. (2012). Joint redundancy and imperfect Applications of Artificial Intelligence, 87, Article 103346.
preventive maintenance optimization for series-parallel multi-state degraded sys- Xu, M., Baraldi, P., & Zio, E. (2020b). Fault diagnostics by conceptors-aided clustering.
tems. Reliability Engineering & System Safety, http://dx.doi.org/10.1016/j.ress.2012. In 30th European safety and reliability conference, ESREL 2020 and 15th probabilistic
03.004. safety assessment and management conference, PSAM 2020 (pp. 3656–3663). Research
Qian, G., & Zhang, L. (2018). A simple feedforward convolutional conceptor neural Publishing Services.
network for classification. Applied Soft Computing, 70, 1034–1041. Xu, M., Baraldi, P., & Zio, E. (2020c). Fault diagnostics by conceptors-aided clustering:
Qiao, Z., Lei, Y., & Li, N. (2019). Applications of stochastic resonance to machinery Proceedings of ESREL 2020, Nov 1-5, 2020, Venice, Italy. Research Publishing.
fault detection: A review and tutorial. Mechanical Systems and Signal Processing, 122, Yang, Z., Baraldi, P., & Zio, E. (2021). A method for fault detection in multi-
502–536. component systems based on sparse autoencoder-based deep neural networks.
Qiao, J., Li, F., Han, H., & Li, W. (2016). Growing echo-state network with multiple Reliability Engineering & System Safety, Article 108278.
subreservoirs. IEEE Transactions on Neural Networks and Learning Systems, 28(2), Ye, J., Qi, G.-J., Zhuang, N., Hu, H., & Hua, K. A. (2018). Learning compact features
391–404. for human activity recognition via probabilistic first-take-all. IEEE Transactions on
Qiu, H., Lee, J., Lin, J., & Yu, G. (2006). Wavelet filter-based weak signature detection Pattern Analysis and Machine Intelligence, 42(1), 126–139.
method and its application on rolling element bearing prognostics. Journal of Sound Yildiz, I. B., Jaeger, H., & Kiebel, S. J. (2012). Re-visiting the echo state property.
and Vibration, 289(4–5), 1066–1090. Neural Networks, 35, 1–9.
Rocco, S. C. M., & Zio, E. (2007). A support vector machine integrated system for the Yin, S., Ding, S. X., Xie, X., & Luo, H. (2014). A review on basic data-driven approaches
classification of operation anomalies in nuclear components and systems. Reliability for industrial process monitoring. http://dx.doi.org/10.1109/TIE.2014.2301773.
Engineering & System Safety, http://dx.doi.org/10.1016/j.ress.2006.02.003. Yu, J. (2012). Health condition monitoring of machines based on hidden Markov model
Rodríguez-Ramos, A., da Silva Neto, A. J., & Llanes-Santiago, O. (2018). An approach and contribution analysis. IEEE Transactions on Instrumentation and Measurement,
to fault diagnosis with online detection of novel faults using fuzzy clustering tools. 61(8), 2200–2211.
Expert Systems with Applications, 113, 200–212. Yuan, J., & Liu, X. (2013). Semi-supervised learning and condition fusion for fault
Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation diagnosis. Mechanical Systems and Signal Processing, 38(2), 615–627.
of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. Zhang, Y., Li, X., Gao, L., & Li, P. (2018). A new subset based deep feature learning
Salvador, S., & Chan, P. (2007). Toward accurate dynamic time warping in linear time method for intelligent fault diagnosis of bearing. Expert Systems with Applications,
and space. Intelligent Data Analysis, 11(5), 561–580. 110, 125–142.
Saravanan, N., & Ramachandran, K. I. (2010). Incipient gear box fault diagnosis using Zhang, S., Li, X., Zong, M., Zhu, X., & Wang, R. (2017). Efficient knn classification
discrete wavelet transform (DWT) for feature extraction and classification using with different numbers of nearest neighbors. IEEE Transactions on Neural Networks
artificial neural network (ANN). Expert Systems with Applications, http://dx.doi.org/ and Learning Systems, 29(5), 1774–1785.
10.1016/j.eswa.2009.11.006. Zhang, M., Wang, N., Li, Y., & Gao, X. (2019). Deep latent low-rank representation for
Sarmadi, H., & Karamodin, A. (2020). A novel anomaly detection method based on face sketch synthesis. IEEE Transactions on Neural Networks and Learning Systems,
adaptive mahalanobis-squared distance and one-class kNN rule for structural health http://dx.doi.org/10.1109/TNNLS.2018.2890017.
monitoring under environmental effects. Mechanical Systems and Signal Processing, Zhang, S., Zhang, S., Wang, B., & Habetler, T. G. (2020). Deep learning algorithms for
http://dx.doi.org/10.1016/j.ymssp.2019.106495. bearing fault diagnosticsx—A comprehensive review. IEEE Access, 8, 29857–29881.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale Zheng, Y., Li, S., Yan, R., Tang, H., & Tan, K. C. (2018). Sparse temporal encoding of
image recognition. arXiv preprint arXiv:1409.1556. visual features for robust object recognition by spiking neurons. IEEE Transactions
Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of on Neural Networks and Learning Systems, http://dx.doi.org/10.1109/TNNLS.2018.
initialization and momentum in deep learning. In International conference on machine 2812811.
learning (pp. 1139–1147). Zhu, H., Lu, L., Yao, J., Dai, S., & Hu, Y. (2018). Fault diagnosis approach for
Vesanto, J., & Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE photovoltaic arrays based on unsupervised sample clustering and probabilistic
Transactions on Neural Networks, 11(3), 586–600. neural network model. Solar Energy, http://dx.doi.org/10.1016/j.solener.2018.10.
Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 054.
395–416.
