Available online at www.sciencedirect.com

ScienceDirect
www.elsevier.com/locate/procedia

Procedia Manufacturing 51 (2020) 316–323

30th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM2021)


15-18 June 2021, Athens, Greece.
Abnormal vibration detection in the bearing-shaft system via semi-supervised classification of accelerometer signal patterns
Sujeong Baek (a,*), Hyun Sik Yoon (b), Duck Young Kim (b)
(a) Department of Industrial and Management Engineering, Hanbat National University, Donseodae-Ro 125, Daejeon, 34158, Republic of Korea
(b) Department of Industrial and Management Engineering, Pohang University of Science and Technology, Cheongam-Ro 77, Pohang, 37673, Republic of Korea
* Corresponding author. Tel.: +82-42-821-1228; fax: +82-42-821-1591. E-mail address: sbaek@hanbat.ac.kr

Abstract

Several methods have been proposed for fault detection in mechanical systems based on sensor signals. Ideally, a label is available for each sensor signal so that the signals can be analyzed with an appropriate supervised classification method. However, the label information about a system's status is often not perfectly paired with the corresponding data. Therefore, we apply semi-supervised classification for fault detection using pattern extraction from multivariate signals. This approach transforms continuous time series into a set of contiguous bins via multivariate discretization. Then, we identify informative patterns in the system states by using a self-training method with limited label information. To demonstrate the effectiveness of the proposed extraction method, five accelerometer signals are collected from a bearing-shaft system. The proposed method successfully reveals informative fault patterns that can be applied as references for fault detection. The method achieved higher detection performance regardless of the ratio of unlabeled inputs in the datasets.

© 2020 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the FAIM 2021.

Keywords: Fault detection; Pattern extraction; Semi-supervised classification; Time series

1. Introduction

With the introduction of new technology and the divergence of customer preferences, mechanical systems have become more intelligent, more complex, and therefore more uncertain [1, 2, 3, 4]. Failure to adequately control a system leads to faults and system downtime; these result in economic losses, such as high warranty costs, as well as social losses, including negative impacts on consumer confidence [5, 6, 7]. Therefore, to operate a system reliably while preventing unexpected faults, appropriate strategies for fault detection and identification have been pursued.

Because it is difficult to obtain clear and quantitative information in advance about a mechanical system, such as the fault-related frequency band and the state-space representation [7], many researchers have tried to indirectly analyze the sensor signals related to faults in various mechanical systems [8, 9]. However, very few sensor signals can typically be obtained for fault detection in a mechanical element, and these signals are typically very short, nonlinear, and non-stationary [10, 11]. These signal characteristics can easily give rise to unclear relationships between the signal and the system's operational states (e.g., no-fault state and fault state). In such a case, the typical geometric similarity is not satisfied, because inconsistencies lead to signal values or trends associated with fault states being incorrectly identified as statistical outliers [12]. To address this issue, pattern extraction with time series representations has been introduced to identify informative indicators of faults.

Among such approaches, time series discretization has been widely used to analyze unexpected or anomalous signal changes [13, 14, 15] by preserving the meaningful signal changes while reducing the size of the dataset. After
2351-9789 © 2020 The Authors. Published by Elsevier Ltd.
10.1016/j.promfg.2020.10.045

transforming the original signals, the significant signal patterns are extracted from target states of a system as motifs [16, 17, 18]. For example, the transformation of the original signals to discretized state vectors (DSVs) is applied both in discrete wavelet transforms and in one of the time series discretization methods, namely, symbolic aggregate approximation (SAX) [19]. By counting the number of occurrences of each DSV and using the k-nearest neighbor classifier, George et al. extracted meaningful patterns about rotor faults in induction motors [20]. Duan et al. analyzed compressor faults by SAX transformation and bit maps [21]. After transforming the original vibration signals, they generated a meaningful bit map from each different fault state (e.g., spring faults, valve fracture, and valve wear). In addition, many researchers have used time series classification with a supervised learning classifier for the extraction of significant markers of faults [22, 23, 24].

However, obtaining corresponding and correct labels for the entire time series data is very expensive and time consuming [25, 26, 27], and mechanical systems are usually rife with unknown faults [28]. In other words, in practice, the label information about a system's status often fails to pair perfectly with the corresponding sensor data. For example, fully labeled datasets are often collected from in-lab experiments or test runs, whereas unlabeled datasets are collected using the identical system in practical environments. Therefore, several semi-supervised learning approaches have been introduced in order to leverage the advantages of both supervised and unsupervised learning [29, 30]. Zhao et al. proposed a semi-supervised learning classifier for fault detection in solar photovoltaic arrays [31]. They normalized and filtered measurements and then distinguished fault data from other data by using supervised and unsupervised classifiers together in a semi-supervised manner. This method showed good performance without labeling costs in continuously updating models. For the extraction of meaningful vibration signals, a semi-supervised classifier was applied with kernel marginal Fisher analysis [32]. Those authors first reduced redundant information in order to highlight meaningful signal behaviors related to system status change. They then extracted optimal low-dimensional features in order to improve the classification performance of various bearing fault types.

In many studies, semi-supervised learning approaches improved fault detection with partially labeled input datasets. However, it is not easy to apply a semi-supervised classifier to DSVs, because time series discretization reduces the amount of data, making the preprocessed dataset smaller than the original dataset. Jun et al. obtained a symbolic representation of time series with a semi-supervised classification [33]. They transformed the time-series data into a series of granules by applying a hidden Markov model, and then used both the hidden Markov model and a shallow neural network together with symbolic and original real-valued data, respectively. Also, a convolutional neural network (CNN) was modified and used for semi-supervised learning in time-series classification [34]. First, the investigators artificially increased the amount of training data to handle partially labeled inputs in the supervised classification problem. In addition, they conducted the pre-training of each layer in an unsupervised manner to find appropriate parameter settings of the classifier. Then, in a supervised manner, they applied convolution filters to discretized time-series data. A combination of SAX and neural networks was also used to distinguish gestures and actions [35]. A 3D posed image was converted into a symbol matrix using SAX, and then a hierarchical-clustering method was used to assign symbol matrices to several 3D posed-image groups. After generating the groups in a semi-supervised manner, the investigators converted a symbol matrix into a feature vector using a CNN and predicted the current gesture by a long short-term memory model in a supervised condition.

However, during the time-series classification and fault detection, a decrease in the performance of semi-supervised learning approaches was observed compared with that of supervised learning. In addition, several detection approaches provided high detection performance either with many labeled and few unlabeled inputs or with few labeled and many unlabeled inputs [27, 33, 34]. Because the conditions of semi-supervised detection cannot be controlled in practical environments, it should not be susceptible to the number of unlabeled inputs in the entire dataset. Therefore, in this study, we propose adding the number of DSV occurrences in no-fault and fault patterns as input data in semi-supervised learning to re-weight the extracted patterns and make up for the information loss from the discretization. The proposed occurrence information should make it possible to treat severity as a fault marker. Subsequently, we analyzed the dependence of fault detection performance on the amount of data in training sets to compare the performance of our proposed method with the existing supervised or semi-supervised detection methods.

The remainder of the present study is organized as follows. Section 2 details the transformation of the original continuous signals into DSVs and describes the extraction of fault patterns using semi-supervised classification in consideration of the number of DSV occurrences. Section 3 describes the bearing-shaft system analyzed here, the collection of acceleration signals, and the control of abnormal states. The pattern extraction method is experimentally verified and validated for the detection of fault states in the bearing shaft. Finally, Section 4 presents concluding remarks and future directions following this work.

2. Semi-supervised classification of signal patterns

2.1. Pattern generation from multivariate signals

We used multivariate discretization for pattern extraction, to observe informative signal behaviors, as developed in our previous study [11] (illustrated in Fig. 1). This multivariate discretization-based pattern analysis was composed of (i) a digitizing and partitioning step and (ii) an abnormal pattern extraction step. The developed pattern extraction method showed higher fault detection and prediction performance in electromechanical systems, such as an automobile engine and a laser welding machine. In this study, we therefore continue to apply the method to extract accelerometer signal patterns to detect abnormal vibrations in
the bearing-shaft system. The digitizing step is also usually called a labeling step, but to prevent confusion between labeling in discretization and the assignment of class information such as fault or no-fault for classification, we use the term digitizing. In this research, the second step, supervised abnormal pattern extraction, was replaced with the semi-supervised classification step, as explained in Section 2.2.

Fig. 1. DSV generation by multivariate time series discretization using acceleration signals acquired from the bearing-shaft system: (a) the original data; (b) the partitioning step (a series of time segments); (c) the digitizing step (a series of labeled time segments); (d) the generated DSV.

First, we divided the entire continuous time series data into $s$ segments (i.e., time windows) of equal length $w$. A small portion of the multiple-sensor data was transformed into either a single label or a few labels. Various statistical indices of time series data can be utilized as discretization labels. Herein, we selected the mean value of the original time series data within a time segment, as shown in Fig. 1. Because the obtained accelerometer signals oscillated up and down quite symmetrically, exhibiting a mean value of zero, the absolute values of the original sensor signals were used to investigate the changes in magnitude over time.

To convert a continuous mean value into a discrete digital label, we divided the sensor signal domain into digitalized bins by maximum likelihood estimation (MLE). MLE was used to estimate the entire collection of sensor signals as a parametric probability density function (PDF) (i.e., an estimated distribution). Based on user-defined parameters (i.e., the number of bins and the bin width threshold), the estimated distribution of each sensor signal was cut into several digitalized bins $l_{ij}$, where $i$ represents the identification number of the sensor signal and $j$ represents the relative location of the current bin. The maximum value of $j$ is $b$, the number of bins.

Finally, we were able to approximate the original multivariate sensor signal $\mathbf{X}$ as the $m \times s$ matrix of DSVs, $D(\mathbf{X})$, denoted as

\[
D(\mathbf{X}) =
\begin{bmatrix}
D(X)_{11} & D(X)_{12} & \cdots & D(X)_{1s} \\
D(X)_{21} & \ddots & & \vdots \\
\vdots & & D(X)_{ik} & \vdots \\
D(X)_{m1} & D(X)_{m2} & \cdots & D(X)_{ms}
\end{bmatrix}
\tag{1}
\]

where $m$ is the number of sensor signals, $s$ is the number of time segments, and $D(X)_{ik}$ is one of the assigned digitalized labels corresponding to the original $i$th sensor signal within the $k$th time segment.

During the discretization procedure, each discretization parameter listed in Table 1 was applied for pattern extraction of the accelerometer signals. We selected the optimal values for these parameters according to an empirical sensitivity analysis of discretization parameters [11]. We also calculated the key characteristics (KCIs) of the accelerometer signals, because KCIs are highly significant in selecting the optimal discretization parameters. The signals showed a large abrupt variance, which indicates that the given signals were highly fluctuating. Also, the datasets had a low discernibility index, which means that it was not straightforward to distinguish between the two distributions from the normal and abnormal states. Therefore, we chose seven bins and an 80% bin width threshold, which focuses on separating outliers into as many different bins as possible, rather than dividing the measurements around the centroid of the given distribution. That is, the median bin that includes the centroid of the distribution occupies 80% of the total distribution, whereas the other bins occupy 3.33% each.

Table 1. Parameters used for multivariate time series discretization.

Discretization parameter                   Parameter setting
Number of data points in a time segment    100 points (every 1 s) without any nested points
Number of bins                             7 bins
Bin width threshold                        80%

2.2. Semi-supervised classification considering the occurrences of the DSVs

The generated DSV matrix from the previous section was applied to a semi-supervised classifier to classify the current status as abnormal or normal. To do this, we adopted a self-training classifier based on a deep neural network (DNN). Self-training is a simple semi-supervised method and has shown good performance when the number of labeled datasets is small [27, 36]. The method consisted of three steps, as illustrated in Fig. 2: (i) initial classification with labeled DSVs, (ii) unlabeled data prediction, and (iii) retraining with both labeled datasets and pseudo-labeled data.

Before training, the DNN structure needs to be defined. In this study, we employed a neural network with three hidden layers. The number of output nodes was one, and the number of input nodes was seven: the number of elements in one DSV (herein five labels) plus two neurons for the occurrence numbers of the corresponding DSV in each system state. If the value of

the output node was 0, the current state of the bearing-shaft system was considered normal. If the output node provided 1, abnormal vibration was considered to be present in the system. The number of other nodes and the corresponding transfer functions should be optimized, and we determined the settings by conducting an empirical experiment. We applied several parameter settings for the numbers of nodes in the hidden layers, such as increasing numbers (e.g., 5, 10, and 20 for the first, second, and third layers, respectively) or decreasing numbers (e.g., 20, 10, and 5 for the first, second, and third layers, respectively). We also considered different transfer functions, including a positive linear function and a hyperbolic tangent sigmoid function. After trying different parameter settings of the DNN, the structure with the best fault detection performance and computation time was selected; it is summarized in Table 2.

Fig. 2. The general framework of the self-training classifier.

Table 2. The structure of input, hidden, and output layers used in abnormal vibration detection.

Layer ID          The number of nodes    Transfer function
Input layer       7
Hidden layer 1    5
Hidden layer 2    10                     Positive linear transfer function
Hidden layer 3    20
Output layer      1

First, we trained the DNN model with the selected architecture on the labeled dataset in a supervised condition and predicted the unlabeled data via the initially trained classifier. Then, the predicted outputs of the unlabeled datasets were sorted. An output equal to 1 was considered an abnormal pattern with complete confidence. On the other hand, an output equal to 0 meant that the corresponding DSV was considered to completely represent only a normal status of the bearing-shaft system. In this regard, among the pseudo-labeled data, the maximum (most abnormal) and minimum (most normal) outputs were considered most confident, and they were used as new input data with corresponding pseudo-labels in the DNN model for retraining. If only the most abnormal and most normal data had been selected, the computational cost would have been very high. Therefore, in this study, we employed a user-selected threshold to determine the confidence limits on datasets, such as above 0.95 and below 0.05. Steps 2 and 3 were repeated until no unlabeled datasets remained.

To overcome the small number of training DSVs and compute the predictive power of DSVs, we added the numbers of DSV occurrences in the normal and abnormal states as input variables. In step 1, we counted the occurrences of each labeled DSV in normal and abnormal states. Because a given system status often lasted for a long period (mean length: 83 s), it was possible to find a DSV in one specific state more than once. We counted these occurrences as distinct, because a repetitive DSV can serve as a more powerful index for a given system's state. We then applied the DNN model to the labeled DSVs and their corresponding occurrence numbers in normal and abnormal states for the first training. After the initial training, we conducted the same procedure for step 2 with unlabeled inputs. When unlabeled DSVs had been given pseudo-labels by the DNN model, the 'most confident DSVs', whose output values were within the confidence limits, were selected. Then, we counted the occurrences of both the existing labeled DSVs and the unlabeled but most confident DSVs from normal and abnormal states. Consequently, both the DSVs and the corresponding numbers of occurrences were used as new training datasets in the next iteration. That is, the training datasets in the self-training classifier were changed in every training iteration by adding the new confident DSVs among the pseudo-labeled DSVs. By considering the occurrences of DSVs in both system states as input nodes, it was possible to give a higher output value to a particular DSV that was frequently found in abnormal states but never in normal states. Vice versa, we assigned a lower output value to a DSV that was rarely found in abnormal states but usually observed in normal states. In addition, we could differentiate the known DSVs and unknown DSVs in the test phase of abnormal vibration detection.

3. Experimental results of abnormal vibration detection using signal patterns

3.1. The bearing-shaft system

For validation and verification of the proposed detection method, we designed the bearing-shaft system shown in Fig. 3a. The bearing-shaft system was composed of two shafts, five bearings, two plates, and two gears. Mechanical rotation of the system is powered by a DC motor, which transfers power through shafts and interlocking gears. To monitor the system's operational status in real time, five acceleration sensors (PCB-352A21) were attached to each bearing cover and the upper side of the motor, as illustrated in Fig. 3b. Vibration signals were collected by a real-time data acquisition device (NI-cDAQ-9178) at a sampling rate of 100 Hz for an average of 83 s (standard deviation: 7.8 s) per dataset; the signals were directly stored in a database.
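As an illustration of how signals acquired in this way could be turned into DSVs (Section 2.1), the following Python sketch partitions a multi-sensor recording into non-overlapping 1 s windows of 100 samples, takes the mean absolute value per window, and digitizes each sensor into seven bins whose central bin covers 80% of the distribution. It is a minimal sketch and not the implementation used in this study: the function names are hypothetical, and empirical quantiles of the segment means are assumed as a stand-in for the MLE-fitted parametric PDF, whose distribution family is not specified here.

```python
import numpy as np


def segment_means(signals, window):
    """Mean absolute value per non-overlapping window.

    signals: array of shape (n_samples, n_sensors).
    Returns an array of shape (n_segments, n_sensors).
    """
    n_segments = signals.shape[0] // window
    trimmed = np.abs(signals[: n_segments * window])
    return trimmed.reshape(n_segments, window, -1).mean(axis=1)


def bin_edges(values, n_bins=7, central_mass=0.80):
    """Interior bin edges: one wide central bin, equal-mass outer bins.

    Assumption: empirical quantiles replace the fitted PDF of the paper.
    With 7 bins and 80% central mass, each outer bin holds about 3.33%.
    """
    tail = (1.0 - central_mass) / 2.0
    n_side = (n_bins - 1) // 2
    lower = np.linspace(0.0, tail, n_side + 1)[1:]  # e.g. 3.33%, 6.67%, 10%
    upper = 1.0 - lower[::-1]                       # e.g. 90%, 93.33%, 96.67%
    return np.quantile(values, np.concatenate([lower, upper]))


def discretize(signals, window=100, n_bins=7, central_mass=0.80):
    """Return the m x s matrix of labels (1..n_bins), one column per DSV."""
    means = segment_means(signals, window)
    labels = np.empty(means.shape, dtype=int)
    for i in range(means.shape[1]):
        edges = bin_edges(means[:, i], n_bins, central_mass)
        labels[:, i] = np.digitize(means[:, i], edges) + 1
    return labels.T
```

Under these assumptions, an 83 s recording at 100 Hz from five sensors yields a 5 × 83 label matrix with values between 1 and 7, where label 4 is the central bin; each column is one DSV.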

The characteristics of the bearing-shaft system, including the gear ratio, motor speed, and dimensions, are further detailed in Table 3, and the detailed specifications of the accelerometers used are summarized in Table 4. Additional descriptions of the collected datasets are provided in Table 5. A sample of a collected dataset is given as an $n \times 5$ array, where $n$ indicates the length of the time series in either a normal or an abnormal state and the 5 columns indicate the 5 sensors, as shown in Table 6. For example, if sensor signals were collected for 60 s at a sampling rate of 100 Hz, the size of the collected dataset would be $6000 \times 5$.

Table 3. Experimental setup of the bearing-shaft system and data acquisition device to collect vibration signals during normal and abnormal states.

Total dimensions                              0.95 m × 0.25 m × 0.35 m
Gear diameter (driving, driven)               0.11 m, 0.07 m
The number of gear teeth (driving, driven)    50, 30
Motor speed                                   250 rpm

Table 4. Specifications of the sensor used to measure the vibration of the bearing-shaft system.

Sensor positions           On the motor and on the four bearing covers
Sensitivity                1.0 mV/(m/s²)
Measurement range          ± 4,900 m/s² pk
Frequency range (± 5%)     1.0 to 10,000 Hz
Spectral noise (100 Hz)    60 /√Hz

Table 5. Description of the collected vibration datasets during normal and abnormal states.

Number of sensors in the system                         Five accelerometer sensors
Number of abnormal states (type 1 / type 2 / type 3)    10 times for each fault type
Total number of normal states                           30 states
Monitoring period for each state                        About 83 s (standard deviation: 7.8 s)
Sampling frequency                                      100 Hz

Table 6. A sample of a collected dataset.

Time      Sensor 1    Sensor 2    Sensor 3    Sensor 4    Sensor 5
00.000    -0.0084     0.0766      0.0287      -0.1182     -0.0193
00.001    0.0947      0.0011      -0.01125    0.1374      -0.0047
00.002    0.0042      0.0195      0.0622      0.0637      -0.0119
00.003    0.0011      0.0171      -0.0052     0.1322      0.0353
00.004    0.0211      -0.0484     0.0265      0.0611      0.0134
...
65.623    0.0106      0.0105      0.0413      -0.1881     0.0251

Fig. 3. Bearing-shaft system: (a) photo of the bearing-shaft system; (b) the corresponding brief diagram of the bearing-shaft system.

Five signals were collected at a sampling rate of 100 Hz, which is relatively low but sufficient for monitoring the status of the rotating equipment. If the sampling rate is extremely low, the signal is susceptible to distortions such as aliasing or folding [37]. On the other hand, if it is too high, more computation cost is required. According to previous research, any value over the Nyquist sampling rate (herein 0.5 Hz) was adequate and appropriate to collect signals for fault pattern extraction through DSVs [38]. When multivariate discretization was used for fault pattern extraction, 100 Hz was an allowable sampling rate because of the partitioning step. As several measurements in the user-defined length of the time segment are transformed into one label, above the minimum sampling rate (i.e., the Nyquist sampling rate) the performance of the extracted fault pattern was not significantly different according to a Tukey honest significant-difference test. That is, time segmentation showed a similar effect to having a low sampling rate, and there was no significant difference for any sampling rate higher than the Nyquist minimum sampling rate. Therefore, in this study, we selected 100 Hz, which was not only sufficiently high to capture the signal behavior over time but also efficient for collecting a huge amount of time series data.

We artificially simulated three different fault states by controlling the basic components (i.e., shaft and bearing) of the bearing-shaft system as follows. For the first fault type, two shafts were arranged in a non-parallel configuration. Thus, the rotational force was not fully transmitted to the shorter shaft due to improper engagement between the teeth of the two gears, as illustrated in Fig. 4a. Another fault was generated by loosening a bolt in a bearing cover that holds both a bearing and a shaft in place, as depicted in Fig. 4b. As the bearing cover becomes loose, the adjacent shafts, gears, and plates may tend to rotate unstably. To generate a third fault type,

non-parallel shafts and a loose bearing cover were applied simultaneously.

Fig. 4. Controlling the bearing-shaft system to generate a fault: (a) fault type 1 (non-parallel shafts); (b) fault type 2 (a loose bearing cover).

Although the physical experimental setup for any fault type depicted in Fig. 4 seemed to generate distinguishable sensor signals, enabling an abnormal state to be detected easily, it was not simple to classify the two health states of the system through a traditional statistical detection method. Because some acceleration signals from the normal state and any abnormal state are extremely non-linear and non-stationary, as Fig. 5 illustrates, it was not straightforward to find significant differences between normal and abnormal system states. Therefore, we employed the proposed fault-detection method described in Section 2.

3.2. Experimental results

In order to identify the performance of the proposed method, we collected and analyzed five vibration signals in 30 no-fault and 30 fault states (of three different fault types) of the bearing-shaft system. To identify the performance of the derived classification model, 5-fold cross-validation was conducted. In particular, the performance of self-training methods was dependent on the ease of finding local optima in initially labeled datasets, so it was necessary to conduct k-fold cross-validation. Four-fifths of the total datasets were used in a training phase to identify abnormal vibration patterns. Although we had fully labeled the entire datasets as a normal or an abnormal state, for performance measurement we needed to artificially generate partially labeled datasets for semi-supervised classification. To do this, one quarter, one half, or three-fourths of the training datasets were randomly selected for removal of the corresponding labels in a training phase. For example, we assigned each dataset to one of five groups. Training datasets without labels, those with corresponding labels, and test datasets were divided as summarized in Table 7.

Table 7. Datasets divided into test datasets, training datasets without labels, and training datasets with corresponding labels for 5-fold cross-validation.

Ratio of removed labels among the training datasets    Test datasets    Training datasets without labels    Training datasets with the corresponding labels
25%                                                    Group 1          Group 2                             Group 3, 4, 5
50%                                                    Group 1          Group 2, 3                          Group 4, 5
75%                                                    Group 1          Group 2, 3, 4                       Group 5
25%                                                    Group 2          Group 1                             Group 3, 4, 5
50%                                                    Group 2          Group 1, 3                          Group 4, 5
75%                                                    Group 2          Group 1, 3, 4                       Group 5
…
75%                                                    Group 5          Group 1, 2, 3                       Group 4

Based on the multivariate discretization described in Section 2.1, we transformed the collected sensor signals into a series of DSVs. We obtained the absolute value of every measurement and found the optimal PDF of the accelerometer signal. Digitized bins were generated according to the user-defined discretization parameters (i.e., seven bins and an 80% bin width threshold) and the computed PDFs. Subsequently, the measurements of a sensor within a certain time segment were converted into the corresponding label, and the five labels in the same time segment were considered as one DSV at that time segment. Fig. 1 illustrates how a series of DSVs of an accelerometer signal is formed from the signal collected during the normal state of the bearing-shaft system. In one case, only three-fifths of the total datasets had the correct corresponding labels, and we used them in the first training phase (i.e., the first row in Table 7). About 177 DSVs were found in the training datasets, and representative DSVs for each system state (one normal state and three abnormal states) are summarized in Table 8.

Table 8. An example of the representative DSV for each system state when three-fifths of the total datasets had the correct corresponding labels and were used in the first training phase (i.e., the first row in Table 7).

System state Representative DSV

Normal state �𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� ��

Abnormal state (Fault type 1) �𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� ��

Abnormal state (Fault type 2) �𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� ��

Abnormal state (Fault type 3) �𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� , 𝑙𝑙�� ��
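The occurrence counting of Section 2.2 and the notion of a representative DSV (a DSV seen in only one system state, as in Table 8) can be illustrated with a short Python sketch. It counts how often each labeled DSV appears in normal and abnormal states, appends the two counts to the five labels to form the seven network inputs, lists DSVs seen in only one state, and selects confident pseudo-labels with the 0.95 and 0.05 limits. The function names and the toy DSVs in the usage note are hypothetical, and the DNN itself is omitted; this is a sketch under those assumptions rather than the implementation used in this study.

```python
from collections import Counter

import numpy as np


def count_occurrences(dsvs, labels):
    """Count each DSV separately in normal (0) and abnormal (1) states."""
    normal, abnormal = Counter(), Counter()
    for dsv, label in zip(dsvs, labels):
        target = abnormal if label == 1 else normal
        target[tuple(dsv)] += 1
    return normal, abnormal


def build_inputs(dsvs, normal, abnormal):
    """Seven inputs per DSV: five bin labels plus the two occurrence counts.

    A DSV never seen in a state gets a count of 0 for that state.
    """
    rows = [list(d) + [normal[tuple(d)], abnormal[tuple(d)]] for d in dsvs]
    return np.array(rows, dtype=float)


def representative_dsvs(normal, abnormal):
    """DSVs that occur in exactly one of the two states."""
    only_normal = {d for d in normal if d not in abnormal}
    only_abnormal = {d for d in abnormal if d not in normal}
    return only_normal, only_abnormal


def select_confident(outputs, upper=0.95, lower=0.05):
    """Mask of confident predictions and their pseudo-labels (1 = abnormal)."""
    outputs = np.asarray(outputs, dtype=float)
    mask = (outputs > upper) | (outputs < lower)
    pseudo = (outputs > upper).astype(int)
    return mask, pseudo
```

For instance, if the hypothetical DSV (7, 4, 4, 6, 4) is observed twice in abnormal states and never in normal states, its input row is [7, 4, 4, 6, 4, 0, 2] and it is returned as representative of the abnormal state. In a self-training loop, the counters would be recomputed after each round so that the confident pseudo-labeled DSVs also contribute to the counts.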

Fig. 5. An example of the magnitude of accelerometer signals in different system states: (a) a normal state; (b) fault type 1 (non-parallel shafts); (c) fault type 2 (a loose bearing cover); (d) fault type 3 (non-parallel shafts and a loose bearing cover simultaneously).

A representative DSV means that the DSV occurred only in the target state and was never found in other states. As shown in Table 8, a certain DSV discovered from an abnormal state contained labels different from those of the DSVs in the normal state. Hence it was quite simple to distinguish the abnormal states from the normal state, as observed with the representative

DSV from fault type 3 against the representative DSV from the normal state in Table 8. However, as several of the DSVs found in abnormal states showed only slight differences from the DSVs of the normal states, it was still not straightforward to detect abnormal vibrations using traditional statistical methods. Note that these representative DSVs depend on the setting of k-fold cross-validation.

To examine the effectiveness of the proposed detection method, we implemented three methods for detecting abnormal vibrations: supervised fault detection [11] with the labeled training datasets; the original self-training-based, semi-supervised classification; and the proposed detection method (semi-supervised classification considering the number of DSV occurrences). We measured detection performance by calculating the number of discernible abnormal states in the test datasets. According to our previous supervised fault detection research [11], a discernible abnormal state was defined as a system state in which one or more abnormal patterns are detected.

Table 9 summarizes the abnormal vibration detection performance of each label removal regime. Regardless of the degree of label removal (i.e., of the prevalence of unlabeled datasets) among the training sets, semi-supervised classifications showed better detection. As the ratio of

corresponding occurrence numbers of input DSVs in normal and abnormal states. As a result, the proposed method provides higher detection performance regardless of the ratio of unlabeled DSVs in the training datasets. In particular, the proposed method correctly determines novel unknown DSVs in the test phases by considering the occurrence of DSVs.

Although the primary objectives of this study were achieved, further work is needed to overcome the limitations of the present study. The detection performance of the proposed method was rarely affected by any alterations of the test conditions (such as a different motor speed, a different gear ratio, and different lengths of shafts), but it required a new training model with the identical architecture of the proposed method, because of the different signal characteristics of the collected data. Therefore, we need (i) to test the performance of the proposed detection method across different test conditions. In addition, it is also necessary (ii) to generalize the analysis results using more diverse types of faults, (iii) to validate and verify the proposed method in early detection of degradation before fault occurrence, and (iv) to discuss how to apply the method to unsupervised problems.

Acknowledgements
removed labels in training datasets increased, semi-supervised This work was supported by the National Research
methods provided higher detection rates, whereas supervised Foundation of Korea (NRF) grant funded by the Korea
detection worsened. Semi-supervised classification also can government (MSIT) (No. 2019R1G1A1097478).
provide more accurate detection results for unknown DSVs
that are never found in the training DSVs. In addition, we References
were able to identify the effect of considering the occurrence
numbers of DSVs with an increasing proportion of unlabeled [1] Lu Y, Xie R, Liang SY. Detection of weakfault using sparse
DSVs. empirical wavelet transform for cyclicfault. The International Journal of
Advanced Manufac-turing Technology 2018;99(5-8):1195–12012.
[2] Peng Y, Dong M, Zuo MJ. Current status of machine prognostics in
Table 9. The number of discernible abnormal states as a percentage of total
condition-based maintenance: a review. The International Journal of
abnormal states (=30 abnormal states) for supervised fault detection with the
labeled training datasets, for the original self-training based semi-supervised Advanced Manufac-turing Technology 2010;50(1-4):297–3133.
[3] Addo-Tenkorang R, Helo PT. Big data applications in
classification, and the proposed detection method. A number beside the
operations/supply-chain management: A literature review. Computers &
percentage denotes the exact number of discernible states among total
abnormal states. Industrial Engineering 2016;101:528–5434.
[4] Khan S, Phillips P, Jennions I, Hockley C. No fault found events in
The proposed maintenance engineering part 1: Current trends, implications and
The ratio of Supervised The original
detection organizationalpractices. Reliability Engineering & System Safety
removed labels fault detection self-training
method 2014;123:183–1955.
among the with the based semi-
(considering [5] Sydor P, Kavade R, Hockley CJ. Warranty impacts from no fault found
training labeled training supervised
the DSV’s (nff) and an impact avoidance benchmarking tool. Advances in Through-
datasets datasets classification
occurrences) life Engineering Services, Cham: Springer; 2017. p 245–2596.
25% 90% (27.0) 93% (28.0) 93% (28.0) [6] Tjahjono B, Teixeira ELS, Alfaro SCA. An on-line simulation to link
asset condition monitoring andoperations decisions in through-life
50% 89% (26.8) 86% (25.8) 97% (29.0) engineering services. In Proceedings of 2013 Winter Simulation
Conference. 2013.
75% 85% (25.4) 91% (27.2) 100% (30.0) [7] Erkoyuncu JA, Khan S, Hussain SMF, Roy R. A framework to estimate
the cost of no-fault found events. International Journal of production
economics 2016;173:207–2228.
4. Conclusion [8] Lee WJ, Wu H, Huang A, Sutherland JW. Learning via acceleration
spectrograms of a dc motor systemwith application to condition
In this study, we demonstrate semi-supervised abnormal monitoring. The International Journal of Advanced Manufacturing
Technology 2020;106(3-4):803–8169.
vibration detection in a constructed bearing-shaft system that
[9] Lu Y, Xie R, Liang SY. Detection of weakfault using sparse
can exhibit three different fault states based on the placement empirical wavelet transform for cyclicfault. The International Journal of
of the shafts and the tightness of a bearing cover. These fault Advanced Manufac-turing Technology 2018;99(5-8):1195–1201.
states are analyzed by identifying fault patterns based on the [10] Dragomir OE, Gouriveau R, Dragomir F, Minca E,Zerhouni N.
DSVs from five vibration sensor signals. Because time series Review of prognostic problem incondition-based maintenance. In 2009
European Con-trol Conference. 2009.
discretization reduces the number of datasets, we consider as
input variables both the transformed DSVs and the

[11] Baek S, Kim DY. Empirical sensitivity analysis of discretization parameters for fault pattern extraction from multivariate time series data. IEEE transactions on cybernetics 2016;47(5):1198–1209.
[12] Chandola V, Banerjee A, Kumar V. Anomaly detection: A survey. ACM computing surveys 2009;41(3):1–58.
[13] Georgoulas G, Karvelis P, Loutas T, Stylios CD. Rolling element bearings diagnostics using the symbolic aggregate approximation. Mechanical Systems and Signal Processing 2015;60:229–242.
[14] Mattes A, Schöpka U, Schellenberger M, Scheibelhofer P, Leditzky G. Virtual equipment for benchmarking predictive maintenance algorithms. In Proceedings of the 2012 Winter Simulation Conference. 2012.
[15] Yiakopoulos C, Gryllias K, Chioua M, Hollender M, Antoniadis I. An on-line sax and hmm-based anomaly detection and visualization tool for early disturbance discovery in a dynamic industrial process. Journal of Process Control 2016;44:134–159.
[16] Liu B, Li J, Chen C, Tan W, Chen Q, Zhou M. Efficient motif discovery for large-scale time series in health care. IEEE Transactions on Industrial Informatics 2015;11(3):583–590.
[17] Keogh E, Lin J, Lee SH, Van Herle H. Finding the most unusual time series subsequence: Algorithms and applications. Knowledge and Information Systems 2007;11(1):1–27.
[18] Mitsa T. Temporal data mining. Temporal Pattern Discovery, Boca Raton: CRC Group; 2010. p. 153–200.
[19] Karvelis P, Georgoulas G, Tsoumas IP, Antonino-Daviu JA, Climente-Alarcón V, Stylios CD. A symbolic representation approach for the diagnosis of broken rotor bars in induction motors. IEEE Transactions on Industrial Informatics 2015;11(5):1028–1037.
[20] Georgoulas G, Karvelis P, Stylios CD, Tsoumas IP, Antonino-Daviu JA, Corral-Hernández J, Climente-Alarcón V, Nikolakopoulos G. Automatizing the detection of rotor failures in induction motors operated via soft-starters. In Proceedings of IECON 2015-41st Annual Conference of the IEEE Industrial Electronics Society, 2015.
[21] Duan L, Zhang Y, Zhao J, Wang J, Wang X, Zhao F. A hybrid approach of sax and bitmap for machinery fault diagnosis. In Proceedings of 2016 International Symposium on Flexible Automation. 2016.
[22] Othman Z, Eshames HF. Abnormal patterns detection in control charts using classification techniques. Int J Adv Comput Technol 2012;4(10):61–70.
[23] Wang J, Balasubramanian A, Mojica de la Vega L, Green JR, Samal A, Prabhakaran B. Word recognition from continuous articulatory movement time-series data using symbolic representations. In Proceedings of the Fourth Workshop on Speech and Language Processing for Assistive Technologies, Association for Computational Linguistics, 2013.
[24] Chen J. A predictive system for blast furnaces by integrating a neural network with qualitative analysis. Engineering Applications of Artificial Intelligence 2001;14(1):77–85.
[25] Huang G, Song S, Gupta JN, Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Transactions on cybernetics 2014;44(12):2405–2417.
[26] Seliya N, Khoshgoftaar TM. Software quality estimation with limited fault data: A semi-supervised learning perspective. Software Quality Journal 2007;15(3):327–344.
[27] Wei L, Keogh E. Semi-supervised time series classification. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006.
[28] Theissler A. Detecting known and unknown faults in automotive systems using ensemble-based anomaly detection. Knowledge-Based Systems 2017;123:163–173.
[29] Schwenker F, Trentin E. Pattern classification and clustering: A review of partially supervised learning approaches. Pattern Recognition Letters 2014;37:4–14.
[30] Wu H, Yu Z, Wang Y. Real-time FDM machine condition monitoring and diagnosis based on acoustic emission and hidden semi-markov model. The International Journal of Advanced Manufacturing Technology 2017;90:2027–2036.
[31] Zhao Y, Ball R, Mosesian J, de Palma JF, Lehman B. Graph-based semi-supervised learning for fault detection and classification in solar photovoltaic arrays. IEEE Transactions on Power Electronics 2014;30(5):2848–2858.
[32] Jiang L, Xuan J, Shi T. Feature extraction based on semi-supervised kernel marginal fisher analysis and its application in bearing fault diagnosis. Mechanical Systems and Signal Processing 2013;41(1-2):113–126.
[33] Meng J, Wu L, Wang X, Lin T. Granulation-based symbolic representation of time series and semi-supervised classification. Computers and Mathematics with Applications 2011;62:3581–3590.
[34] Le Guennec A, Malinowski S, Tavenard R. Data augmentation for time series classification using convolutional neural networks. In Proceedings of ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data, 2016.
[35] Batch A, Lee K, Maddali HT, Elmqvist N. Gesture and action discovery for evaluating virtual environments with semi-supervised segmentation of telemetry records. In Proceedings of 2018 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR). 2018.
[36] Veselý K, Hannemann M, Burget L. Semi-supervised training of deep neural networks. In Proceedings of 2013 IEEE Workshop on Automatic Speech Recognition and Understanding. 2013.
[37] Landau HJ. Sampling, data transmission, and the nyquist rate. Proceedings of the IEEE 1967;55(10):1701–1706.
[38] Baek S, Kim DY. Effects of sampling rate on the performance of multidimensional discretization-based fault detection. In Proceedings of the 2015 Spring Conference of Korean Institute of Industrial Engineers, 2015.
