
Journal of Marine Science and Engineering

Article

Failure Analysis and Intelligent Identification of Critical Friction Pairs of an Axial Piston Pump

Yong Zhu 1,2,3, Tao Zhou 1, Shengnan Tang 4,5,6,* and Shouqi Yuan 1

1 National Research Center of Pumps, Jiangsu University, Zhenjiang 212013, China
2 International Shipping Research Institute, Gongqing Institute of Science and Technology, Jiujiang 332020, China
3 Leo Group Co., Ltd., Wenling 317500, China
4 Institute of Advanced Manufacturing and Modern Equipment Technology, Jiangsu University, Zhenjiang 212013, China
5 Saurer (Changzhou) Textile Machinery Co., Ltd., Changzhou 213200, China
6 Wenling Fluid Machinery Technology Institute of Jiangsu University, Wenling 317525, China
* Correspondence: tangsn@ujs.edu.cn; Tel.: +86-188-5286-6635

Abstract: Hydraulic axial piston pumps are the power source of fluid power systems and have important applications in many fields. They have a compact structure, high efficiency, large transmission power, and excellent flow variable performance. However, the crucial components of pumps easily suffer from different faults. It is therefore important to investigate a precise fault identification method to maintain the reliability of the system. The use of deep models in feature learning, data mining, automatic identification, and classification has led to the development of novel fault diagnosis methods. In this research, typical faults and wears of the important friction pairs of piston pumps were analyzed. Different working conditions were considered by monitoring outlet pressure signals. To overcome the low efficiency and time-consuming nature of traditional manual parameter tuning, the Bayesian algorithm was introduced for adaptive optimization of an established deep learning model. The proposed method can explore potential fault feature information from the signals and adaptively identify the main fault types. The average diagnostic accuracy was found to reach up to 100%, indicating the ability of the method to detect typical faults of axial piston pumps with high precision.

Keywords: axial piston pump; fault identification; deep learning; Bayesian algorithm

Citation: Zhu, Y.; Zhou, T.; Tang, S.; Yuan, S. Failure Analysis and Intelligent Identification of Critical Friction Pairs of an Axial Piston Pump. J. Mar. Sci. Eng. 2023, 11, 616. https://doi.org/10.3390/jmse11030616

Academic Editor: Leszek Chybowski

Received: 27 February 2023; Revised: 10 March 2023; Accepted: 12 March 2023; Published: 14 March 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

An axial piston pump is one of the hydraulic components with the highest technical content in a hydraulic transmission system [1–3]. Axial piston pumps have been widely applied in the transmission systems of equipment in many fields, including offshore drilling platforms, drilling machines, and deep-sea diving equipment, as depicted in Figure 1. The structure of an axial piston pump consists of many friction pairs, meaning friction and wear can occur during operation. Some critical friction pairs are illustrated in Figure 2. Any failure will lead to great loss of production and have an effect on the safety and validity of the system [4–6]. Therefore, the efficient fault diagnosis of axial piston pumps is greatly valuable and worth in-depth exploration.
Fault diagnosis aims to determine the cause, position, type, and level of faults and predict the present condition as well as evolutionary trends. It generally includes the detection, isolation, and identification of faults. Fault diagnosis methods are mainly based on signal, model, and knowledge [7–9]. Signal-based methods can decide the fault type and nature by analyzing the time domain, frequency domain, and time–frequency features of original signals. Model-based methods maintain inherent sensitivity to unknown faults, and an accurate mathematical model of the object needs to be constructed for its faults to be diagnosed. Methods based on knowledge achieve fault diagnosis by analyzing and processing the raw signals. Considering the influence of noise interference on bearing vibration signals, the variational mode extraction method can be used for signal processing. An improved one-dimensional convolutional neural network (1D CNN) was developed for fault diagnosis of rolling bearings by introducing batch normalization, a self-attention layer, and global average pooling [10]. Yu et al. developed a fusion method of raw signals by combining modified empirical wavelet transform and the variance contribution rate [11]. The slight faults of hydraulic pumps can be accurately detected based on fused feature information. By fusing wavelet packet transformation and a new tangent hyperbolic fuzzy entropy measure, Zhou et al. proposed a new method to effectively achieve defect identification of axial piston pumps [12]. Owing to the insufficient analysis of the dynamic characteristics of piston pumps, Tang et al. investigated the effects of external loading and health status by constructing a virtual prototype model [13]. Furthermore, loose slipper failure was effectively detected for different loads. By acquiring the thermal images of a brushless direct current electric motor, three different faulty states were investigated using a new feature extraction method and deep learning methods. High accuracy of around 100% was obtained using the power of normalized image difference method and three deep models [14].

Figure 1. Applications of hydraulic piston pumps. (a) Offshore drilling platform, (b) deep-sea drill, (c) deep-sea manned submersible, (d) deep-sea oil and gas exploitation.


Figure 2. Main friction pairs in an axial piston pump. (a) Piston ball head and slipper, (b) slipper and
swash plate, (c) piston and its hole, (d) cylinder and valve plate.
Artificial intelligence is regarded as an advance on knowledge-based methods and has
boosted the intelligence of mechanical fault diagnosis [15–17]. Kumar et al. used an artificial
neural network to carry out the intelligent detection of a centrifugal pump under varying
speeds. The method comprehensively learnt multisource feature information in acceler-
ation, pressure, and motor line current signals and improved the accuracy of detecting
blockage failures [18]. To explore the causes of anomalies in wind turbines, Wang et al.
proposed a two-stage decomposition strategy for fault detection and identification. Multi-
head self-attention mechanism and position-wise feed-forward network were used. The
method presented superior identification performance compared to autoencoder-based
methods and to attention-based long short-term memory [19]. Yuan et al. explored an
intelligent index on the basis of an enhanced weighted square envelope spectrum and
solved the adaptive selection of multiwavelet basis functions. A support vector machine
was applied for weight optimization. The results for two bearing datasets indicated that
the method was effective for early machinery fault diagnosis [20]. Considering the large
amounts of unlabeled data in project practice, Shi et al. constructed a novel fault diagnosis
method based on deep hypergraph autoencoder embedding. The method combined the
strengths of the extreme learning machine autoencoder in computation and the powerful
capability of deep learning approaches in representation learning [21]. The method showed
superiority in accuracy and computation of fault diagnosis of rolling bearings and rotors.
Because of the severe distribution difference of samples in healthy and faulty conditions,
Liu constructed a diagnosis method by integrating a hierarchical extreme learning machine
and transfer learning. The method achieved fault identification of the gas path of an aero-
engine and showed strong generalization ability [22]. Taking into account the potential
complicated and heterogenous feature distribution difference, a strategy called interdo-
main decision discrepancy minimization was introduced to enhance the transfer learning
method [23]. Through experimental validation, it was revealed to be more accurate and
robust for fault diagnosis of bearings and gearbox datasets with small and massive samples.
Considering the small faulty samples in nuclear power plants, a new method based on
lightweight conditional generative adversarial network was developed for fault diagnosis
via sample augmentation. The performance of the method was validated using three
different public datasets [24]. Given the problems of sparse training data and the reliability
of the conclusions in present intelligent diagnosis methods, a new unsupervised learning
framework was established [25]. Contrastive learning was used to obtain the common
features between the same fault types and distinguish differences between different fault
types. A feature-assisted multibranch method was employed as a guidance for the model
to weaken the possible interference from operational parameters. The proposed method
was especially efficient for bearing fault diagnosis under different speeds. Zhao and Shen
proposed a novel diagnosis method based on semisupervised domain generalization for
unseen working conditions. Some new samples with pseudo labels were obtained on
the strength of the samples with labels, and a sample purification strategy based on the
entropy theory was introduced to enhance the restructured samples. Semisupervised
fault diagnosis was accomplished for the bearings and gearbox [26]. The method was
shown to be effective by analysis of the computation complexity and performance in noisy
environments. However, the automatic sample selection rate and its generalization for
different machines need to be further explored. To deal with unreliability from unseen
faults, an ensemble diagnosis method was developed by combining five different CNNs,
CNNs with different convolutional layers, ResNet, and inception network. The constructed
method could identify unknown faults, and was found to be trustworthy for fault diagnosis
of wind turbines and gearboxes [27]. To determine possible misdiagnosis, Zhou et al.
constructed a probabilistic Bayesian CNN framework for more accurate fault identification.
The method could complete diagnosis of visible bearing faults and was also verified to be
valid for unseen gear faults under normal and noisy conditions [28]. Due to the multiple
parameters and time-consuming model optimization, a multihierarchy compound network
compression approach was used for bearing fault diagnosis [29]. Unlike traditional CNNs,
structured and unstructured pruning was carried out by removing unimportant filters in
the convolutional layer and inconsequential connections in the fully connected layer. The
parameters were reduced while achieving equal recognition accuracy. Based on denoising
time–frequency images, Chao et al. explored the identification of cavitation severity on a
high-speed aviation hydraulic pump using a CNN-based intelligent method [30]. Driven
by acoustic signals, Kumar et al. investigated fault identification of a centrifugal pump
employing a CNN for feature extraction and classification [31]. The experiments indicated
increased accuracy and powerful generalization capability.
Most of the present literature concerns the use of fault diagnosis methods for bearings,
gears, and wind turbines. In an earlier study, a deep model named two-layer CNN was
constructed for fault diagnosis of axial piston pumps based on vibration signals [32]. In
this research, similar defect types and degree of damage of key friction pairs of axial piston
pumps were considered. A new deeper CNN network structure with five convolutional
layers was built for fault identification. The innovations are embodied in the following
three points.
(1) The structure of a hydraulic axial piston pump is composed of many friction pairs
and is very complex. This makes it difficult to achieve fault monitoring and diagnosis, and
it is even harder to accomplish an intelligent diagnosis process. This research provides a new
intelligent method for fault diagnosis of the key friction pairs of axial piston pumps.
(2) Deep learning methods are employed for fault diagnosis of typical rotating ma-
chinery. However, the construction of the current deep models has shortcomings, such as
a time-consuming nature and inefficient manual parameter tuning. This research utilizes
the strengths of the Bayesian optimization algorithm and completes automatic learning of
the model hyperparameters.
(3) Many fault diagnosis methods are based on machinery vibration signals. Some are imple-
mented based on acoustic emission technology. Taking into account the pressure change
associated with component failure and easy monitoring of the signal, this study probes into
the monitored outlet pressure signals and dissects the hidden fault characteristics from a
special angle.
The rest of the paper is structured as follows. A brief introduction of CNN and
Bayesian optimization (BO) is provided in Section 2. Section 3 describes the implementation
of the proposed fault identification method. The experimental setup and data acquisition
are outlined in Section 4. In Section 5, the diagnosis accuracy and performance of the
method are analyzed and discussed. Finally, a conclusion is drawn and prospects for future
research are discussed in Section 6.

2. Theoretical Background
2.1. Convolutional Neural Network
A typical fully connected neural network can generally result in information loss,
and its large number of parameters can lead to overfitting. With the progress of deep models, CNN is
gradually exhibiting superiority in processing large-scale images [33–35]. In general, the
basic structure of a CNN consists of an input layer, convolution layer, pooling layer, fully
connected layer, and output layer. Convolution layer, pooling layer, and fully connected
(FC) layer are collectively called hidden layers. The connection between the convolutional
layer and the subsampled layer is a local connection. A fully connected mode is adopted
in the fully connected layer. A CNN presents three distinguished characteristics: local
connection, weight sharing, and downsampling. The local receptive field is a particular
structure that endows CNN with the function of local connection and weight sharing.
The calculation of the convolutional layer is as follows [36]:

M_m = A(I · W_m + b_m)    (1)

where A denotes a nonlinear activation function; m denotes the m-th feature map; I is the input; · is used as the convolution operator; and the new feature map, weight, and bias are denoted as M, W, and b, respectively.

The classification is achieved by the Softmax function, which can convert the outputs for multiple classes into probabilities in the range [0, 1] that sum to 1. It can be expressed as follows [37]:

Output = Softmax(W_n x + b_n) = e^(W_n x + b_n) / Σ_{o=1}^{P} e^(W_o x + b_o)    (2)

where the input of the output layer is presented as x; n denotes one class; and P is the number of output nodes, also known as the number of categories.

Figure 3 presents the convolutional operation. The convolutional kernel, known as a filter, conducts the relevant operation in the receptive field for feature extraction [38]. The pooling layer shown in Figure 4 can reduce dimensions and parameters. It can complete downsampling using max-pooling or average pooling.

Figure 3. Convolutional layer.

Figure 4. Pooling layer.
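To make these operations concrete, the following is a minimal sketch (assuming PyTorch; the channel counts and kernel sizes are illustrative placeholders, not the architecture used later in this paper) that chains a convolutional layer implementing Eq. (1), the nonlinear activation A, max-pooling for downsampling, and a fully connected layer whose class scores are normalized by Softmax as in Eq. (2).

```python
import torch
import torch.nn as nn

# Convolution (Eq. (1)), activation A, pooling, and a fully connected layer;
# Softmax (Eq. (2)) turns the class scores into probabilities summing to 1.
blocks = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3),  # M_m = A(I . W_m + b_m)
    nn.ReLU(),                                                # activation A
    nn.MaxPool2d(kernel_size=2),                              # downsampling
    nn.Flatten(),
    nn.LazyLinear(out_features=5),                            # fully connected layer
)

image = torch.randn(1, 3, 224, 224)            # one dummy input image
probabilities = torch.softmax(blocks(image), dim=1)
print(probabilities.sum())                     # 1.0 up to rounding
```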

To reduce the interdependence of the parameters and achieve sparsity of the network, a rectified linear unit (ReLU) layer is usually used together with a convolutional layer [39]. The ReLU function is shown in Figure 5. The FC layer is located at the end of a CNN, and each node is fully interconnected with the nodes in the front layer. It integrates the features extracted from the previous layer and maps them to the label space. The structure of the FC layer is presented in Figure 6.

Figure 5. ReLU function.

Figure 6. Fully connected layer.
2.2. Bayesian Algorithm

Machine learning algorithms contain numerous parameters. The ones that can be optimized through training are called parameters, such as the weights in neural networks, while those that cannot be optimized by training are called hyperparameters, such as the learning rate (LR), the neurons per layer, and the batch size. It should be noted that parameters are updated, while hyperparameters remain constant during training. It is essential to adjust the hyperparameters to acquire more efficient machine learning models [40,41].

There are two major challenges in hyperparameter optimization. First, it is a combinatorial optimization problem that is hard to solve using gradient descent. Second, it is time-consuming to evaluate a group of hyperparameters, which makes some evolutionary algorithms inapplicable. Grid search aims to find a suitable configuration by trying all the possible combinations of the hyperparameters. Random search chooses the best configuration via random combinations of the hyperparameters. Compared to grid search, it can prevent unnecessary attempts at optimizing unimportant hyperparameters. The limitation is that these two search-based methods ignore the relationship between different hyperparameter combinations.

Bayesian optimization is an adaptive method for hyperparameter selection. BO can predict the next possible combination that can bring the greatest benefit according to the groups tried so far [42]. BO includes prior information of parameters and shows fast convergence. BO is aimed at optimal hyperparameter combinations that can obtain the best model performance. BO uses two important functions: the Gaussian process (GP) and the acquisition function. GP can effectively deal with nonlinear tasks and is able to fit the objective function in optimization.
The following formula represents a multivariate Gaussian distribution [43,44]:

f ( p1:k ) ∼ N ( a( p1:k ), cθ ( p1:k , p1:k )) (3)

where f signifies a smooth function; a finite input collection is expressed as p_{1:k}; a(p_{1:k})_j = a(p_j) presents the mean vector; c_θ(p_{1:k}, p_{1:k})_{ji} = c_θ(p_j, p_i) denotes the covariance matrix; and the parameterization is accomplished by θ.
The mean value and variance of samples are two critical statistics in Gaussian dis-
tribution. Exploitation refers to choosing points where the mean value is larger, while
exploration refers to selection of points where the variance is larger. BO aims to balance
exploitation and exploration. An appropriate point should be a tradeoff of exploitation and
exploration. Acquisition function is considered as a suitable choice for sampling. Among
many kinds of acquisition functions, expected improvement (EI) based on improvement is
easy and effective. It does not have many parameters and has simple computation. The
formula of EI is indicated as follows [44]:

α_EI(D; σ, l) = E[max(0, y(D) − y(D∗))]    (4)

where l denotes a dataset, y(D) presents the posterior expected value of the objective, and y(D∗) shows the current best result, i.e., the maximum that has been encountered so far.

EI combined with GP is as follows:

α_EI(D; σ, l) = β_t(D; θ, l)[xΦ(x) + φ(x)]    (5)

where x = (µ_t(D; σ, l) − y(D∗)) / β_t(D; σ, l), Φ denotes the cumulative distribution function, and φ presents the probability density function of the standard normal distribution.

EI with stochastic noise can be expressed as follows:

α_EI(D; σ, l) = β(D; σ, l)[tΦ(t) + φ(t)]    (6)

where t = (µ_t(D; σ, l) − µ(θ∗)) / β(D; σ, l), and µ(θ∗) signifies the most desired result that was obtained by calculating the mean value.
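As an illustration of how Eq. (5) is evaluated in practice, the following is a minimal sketch (assuming NumPy and SciPy; the posterior means and deviations are made-up numbers standing in for a fitted GP) of the EI acquisition and of choosing the next candidate.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, beta, y_best):
    """EI for maximization, Eq. (5): x = (mu - y_best) / beta,
    EI = beta * (x * Phi(x) + phi(x)), over an array of candidates."""
    beta = np.maximum(beta, 1e-12)        # guard against zero predictive deviation
    x = (mu - y_best) / beta
    return beta * (x * norm.cdf(x) + norm.pdf(x))

# Hypothetical GP posterior over three candidate hyperparameter groups.
mu = np.array([0.991, 0.994, 0.989])      # posterior mean (e.g., accuracy)
beta = np.array([0.004, 0.002, 0.008])    # posterior standard deviation
ei = expected_improvement(mu, beta, y_best=0.993)
print(ei, ei.argmax())                    # the next point to evaluate
```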

3. Proposed Diagnosis Method


A refined CNN model was constructed based on the traditional AlexNet (T-AlexNet) [45].
The strategies of data transformation and dropout [46], together with the Adam algorithm [47], were employed in establishing the model. The proposed identification method combines a
modified CNN model and Bayesian algorithm. The flowchart is displayed in Figure 7,
while a more visualized account is depicted in Figure 8. The method can be divided into
two steps.
The first step is called signal-to-image, where raw pressure signals are acquired with
a pressure sensor. The obtained time series are converted into images using continuous
wavelet transform. The time–frequency images are input into the deep model followed by
data transform processing.
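A minimal sketch of this signal-to-image step and the subsequent preprocessing is given below (assuming PyWavelets, Matplotlib, and torchvision; the wavelet 'cmor3-3' corresponds to the bandwidth of 3 and center frequency of 3 used in Section 5.1, while the random segment and file name are illustrative placeholders).

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt
from torchvision import transforms

fs = 10_000                          # sampling frequency, 10 kHz (Section 4)
segment = np.random.randn(1024)      # placeholder for one measured pressure segment
scales = np.arange(1, 257)           # scale sequence of length 256 (Section 5.1)

# Continuous wavelet transform with a complex Morlet basis.
coeffs, freqs = pywt.cwt(segment, scales, 'cmor3-3', sampling_period=1 / fs)

# Save the time-frequency magnitude as an image for the deep model.
plt.imshow(np.abs(coeffs), aspect='auto', cmap='jet')
plt.axis('off')
plt.savefig('sample_0001.png', bbox_inches='tight', pad_inches=0)

# Data transform processing applied before images enter the model (Section 5.1).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```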
The second step is called feature extraction and identification, where a common
CNN model is built based on T-AlexNet with initial hyperparameters. The important
model hyperparameters are optimized using the Bayesian algorithm. LR, epoch, batch
size, dropout ratio, convolutional kernels, and other features are included. The best
hyperparameter groups are obtained through optimization, and the model achieves high
performance. The Bayesian optimized AlexNet (B-AlexNet) is used for adaptive failure
identification of an axial piston pump based on the above feature images.
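One possible realization of this optimization step is sketched below (assuming the scikit-optimize library; the LR, batch size, and dropout ranges follow Section 5.2, the kernel-number range is illustrative, and train_and_evaluate is a hypothetical stand-in for training the modified AlexNet and returning its validation accuracy).

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

# Search space: LR in [0.0001, 0.0009], batch size in [8, 84],
# dropout ratio in [0.1, 0.9]; kernel count range is an assumption.
space = [
    Real(1e-4, 9e-4, name='learning_rate'),
    Integer(8, 84, name='batch_size'),
    Real(0.1, 0.9, name='dropout_ratio'),
    Integer(32, 128, name='conv_kernels'),
]

def train_and_evaluate(lr, batch_size, dropout, kernels):
    # Hypothetical stand-in: substitute real model training and validation here.
    return 1.0 - abs(lr - 1e-4) * 100

def objective(params):
    lr, batch_size, dropout, kernels = params
    return -train_and_evaluate(lr, batch_size, dropout, kernels)  # minimize negative accuracy

result = gp_minimize(objective, space, n_calls=25, random_state=0)
print(result.x, -result.fun)   # best hyperparameter group and its accuracy
```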

Figure 7. Flowchart of the proposed method.

Figure 8. Proposed diagnosis method.
4. Design of the Experimental Bench

The bench of a swash plate plunger pump was used for fault simulation tests, as shown in Figure 9. It contained a Y132M-4 motor and an MCY14-1B axial piston pump with seven plungers. In the experiment, three types of sensors were installed: vibration sensors in three different directions, a pressure sensor, and a sound sensor. Data from the different sensors were acquired. This research was conducted based on the analysis of the pressure signals. We performed sampling with a frequency of 10 kHz.

The nominal pressure of the piston pump is 31.5 MPa. The motor has a constant rotating speed of 1470 r/min. The pressure sensor is an SYB-351 instrument that can display parameters. It can measure from 0 to 25 MPa, mainly because the actual working pressure of the pump is generally lower than the nominal pressure. The pump outlet pipeline is equipped with a pressure sensor to monitor the outlet pressure signal of the pump. Five conditions were analyzed in the research: wear of the slipper (hx), wear of the swash plate (xp), failure of the loose slipper (sx), wear of the central spring (th), and no visible failure (zc). Different pressures and wear degrees were examined.

Figure 9. Test bed of an axial piston pump.

5. Results and Discussion
5.1. Signal Analysis

The sample length in the experiment was 1024. The complex wavelet was taken as the wavelet basis function. The scale sequence length was 256. The bandwidth and center frequency were both 3. Figure 10 shows the time–frequency diagrams of the pressure signal. The time–frequency images were used to build the dataset. Each fault type consisted of 1200 samples. The whole dataset was composed of 6000 images with a size of 256 × 256. The images were resized to 224 × 224, and random horizontal flip was performed. The training dataset accounted for 70%, and the remaining 30% was the test dataset. The frequency changed with time under each condition. The features were similar, and different fault modes were hard to identify. The visual information was almost the same in the states of slipper wear and loose slipper failure. Establishing a learning model to mine the valid information is important to complete fault classification of axial piston pumps.

Figure 10. Time–frequency distributions of pressure signal via continuous wavelet transform. (a) Normal condition, (b) wear of the slipper, (c) failure of the loose slipper, (d) wear of the swash plate, (e) wear of the central spring.
5.2. Identification Results
5.2.1. Analysis of LR

The range of LR was set from 0.0001 to 0.0009. Then, optimization of the model was conducted according to the decreasing principle of LR, which was 0.0009, 0.0008, 0.0006, 0.0004, 0.0002, and 0.0001 in the experiments. A small loss was found at low LR, as shown in Figure 11. The lowest loss was found when the LR was 0.0001. The rate of reduction became slow with the increase in LR. The loss decreased to the minimum value fastest when the LR was 0.0002. It fluctuated with the increase in epochs when the LR was 0.0009. Therefore, it can be seen that a large LR is detrimental to the training of this model. The change in test accuracy followed an opposite pattern compared to the loss. It reached the lowest level when the LR was 0.0009 and the highest when it was 0.0001. The accuracy curve presented good convergence. Therefore, the initial LR was chosen as 0.0001.

Figure 11. The curves of loss and test accuracy. (a) Loss, (b) accuracy.

The classification accuracy in 10 tests was analyzed for each LR. As can be seen from Figure 12, the improved AlexNet (I-AlexNet) presented relative stability with an LR of 0.0001. The accuracy fluctuated to varying degrees with small or large LRs, especially when the LR was 0.0009, 0.0008, and 0.0004. Therefore, for pressure signals, I-AlexNet showed good stability when the LR was 0.0001.

Figure 12. The accuracy in 10 trials.

The average accuracy for 10 tests under different LRs is shown in Figure 13. The average identification accuracy of 10 experiments reached the highest level of 99.99% when the LR was 0.0001. Combined with the above analysis on model convergence, the model showed great performance under this LR.

Figure 13. The average accuracy with different LRs.

5.2.2. Analysis of Epoch

When analyzing the influence of epochs on fault classification, only the epochs were changed and the other parameters were fixed. The batch size was 32, and the initial size and number of the convolution kernels were the same as in Section 5.2.1. The initial LR was selected as 0.0001. Ten independent trials were carried out with the epoch set to 60. The average accuracy is shown in Figure 14.

Figure 14. Tendency of loss and accuracy. (a) Loss, (b) accuracy.

The training loss gradually decreased and the accuracy showed an increasing trend with increasing epochs. The test accuracy was slightly higher than that of the training samples in fewer than 10 epochs. The accuracy exceeded 99% after more than 10 epochs. The loss was stable after more than 30 epochs, indicating that the neural network had been trained to convergence. Therefore, an epoch of 30 was found to be suitable.
5.2.3. Analysis of Dropout Rate

In the model training stage, the dropout strategy was employed in the fully connected layer. The range was set as 0.1 to 0.9, with ratios of 0.1, 0.3, 0.5, 0.7, and 0.9 analyzed for model performance. The accuracy curves under different dropout ratios are shown in Figure 15. The loss showed an increasing tendency as the dropout ratio increased. In contrast, the accuracy decreased gradually with the increase in dropout. The model presented better convergence under small ratios with the increase in epochs, and a small difference was found among the maximum accuracies of the test samples under different ratios.

Figure 15. Trend of loss and accuracy. (a) Loss, (b) accuracy.

The average accuracy of 10 trials under different dropout ratios is displayed in Figure 16. There was no obvious difference in the classification accuracy with different dropout ratios. The average accuracy was slightly higher when the dropout was 0.5. Considering the convergence analysis, and since a ratio of 0.5 yields the largest number of possible thinned network structures, the dropout ratio of 0.5 was selected.

Figure 16. Classification accuracy under the different dropout ratios.
5.2.4. Effect of Batch Size on Diagnostic Accuracy

The epoch was set as 30, and the batch sizes were 8, 16, 21, 32, 42, 56, 64, and 84. The average accuracy of 10 independent repeated tests is shown in Figure 17. Little difference in accuracy was found among the different batch sizes, and the average was more than 99.9%. When the batch size was 21, the classification accuracy reached 100%. The model achieved good recognition for the different fault categories of the axial piston pump.

Figure 17. Relationship between classification accuracy and batch size.

As there was no significant difference in the accuracy of the models at different batch sizes, the calculation time was analyzed. As can be seen from Table 1, when the batch size was small, the calculation time was long. When it increased to more than 32, the time difference of each epoch was smaller. For the present sample set, a smaller batch size was conducive to improving the accuracy of fault classification and was more beneficial to the judgment of the final fault category. Therefore, the batch size of 32 was found to be suitable according to a comprehensive analysis of classification accuracy and calculation cost.
Table 1. Computational time of different batch sizes in one epoch.

Batch Size   1      2      3      4      5      6      7      8      9      10     Avg. Time (s)
8            16.16  16.08  16.17  16.27  16.10  16.20  16.02  16.20  16.16  16.12  16.15
16           12.12  12.93  12.01  11.99  12.37  12.53  12.23  12.15  12.00  12.06  12.24
21           11.94  12.04  11.77  11.82  11.79  11.89  11.75  11.77  11.72  11.74  11.82
32           10.40  10.33  10.38  10.34  10.63  10.51  10.37  10.47  10.47  10.58  10.45
42           10.09  10.02  10.08  10.15  10.05  10.20  10.03  10.13  9.94   10.19  10.09
56           9.78   9.90   10.05  10.06  9.80   9.89   9.93   9.92   10.09  10.09  9.95
64           9.97   9.82   9.97   9.78   9.77   9.70   9.94   9.96   10.28  9.90   9.91
84           9.99   9.83   9.54   9.69   10.28  10.26  10.02  9.91   9.79   9.97   9.93
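For reference, per-epoch time as a function of batch size can be measured with a loop of the following form (a sketch assuming PyTorch; the tiny linear model and synthetic tensors are placeholders for the actual network and training images, so the printed numbers will not match Table 1).

```python
import time
import torch
import torch.nn as nn

# Stand-in model and data; substitute the real network and loader in practice.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
data = torch.randn(420, 3, 64, 64)
labels = torch.randint(0, 5, (420,))

for batch_size in (8, 16, 21, 32, 42, 56, 64, 84):
    start = time.perf_counter()
    for i in range(0, len(data), batch_size):   # one epoch over the set
        optimizer.zero_grad()
        loss = loss_fn(model(data[i:i + batch_size]), labels[i:i + batch_size])
        loss.backward()
        optimizer.step()
    print(batch_size, f'{time.perf_counter() - start:.2f} s')
```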

5.2.5. Effect of Kernel Number on Diagnostic Accuracy


The epoch was set as 30 and the batch size was set as 32 in the experiment. Meanwhile,
the other parameters remained unchanged. Figure 18 depicts the accuracy at different
kernel ratios. The accuracy of the model was more than 99.9%. There was a slight increase
in accuracy with the increase in ratio. The accuracy reached up to 99.97% when the ratio
was 2. There was no obvious change when the ratio increased beyond 3. Due to
the insignificant difference in accuracy with different kernel ratios, the time consumption
was analyzed. As shown in Table 2, a small change in average time was found with the
increase in kernel number of C2.

Figure 18. Classification accuracy with different kernel numbers.

Table 2. Computational time of different kernel numbers in one epoch.

C2/C1   1      2      3      4      5      6      7      8      9      10     Avg. Time (s)
1       10.21  10.65  11.01  10.32  10.57  10.73  10.56  11.42  10.38  10.58  10.64
1.5     11.22  11.17  11.01  11.14  11.15  11.00  11.17  11.12  11.03  11.03  11.10
2       11.12  11.19  10.99  11.26  11.09  11.06  10.91  10.93  11.05  11.21  11.08
2.5     10.66  10.91  10.98  11.00  10.90  10.95  10.88  10.65  10.85  10.75  10.85
3       11.89  11.98  12.05  10.89  10.96  10.95  11.38  11.55  11.03  11.02  11.37
3.5     11.20  11.14  11.05  11.22  11.04  11.08  10.96  11.03  10.91  11.16  11.08
4       11.41  11.14  11.23  11.09  11.02  11.10  10.90  11.03  11.11  11.35  11.14
5.2.6. Effect of Kernel Size on Diagnostic Accuracy

The influence of kernel size on accuracy was also examined, as shown in Figure 19. The highest test accuracy was obtained with the combination k(9,5). The kernel size was therefore selected as 9 × 9 for C1 and 5 × 5 for C2.

Figure 19. Classification accuracy with different kernel sizes.
As can be seen from the above analysis, the different hyperparameters had different effects on the fault diagnosis result. An obvious difference could be found from the analysis of the LR and the kernel size. As for the other parameters, such as the epoch, the batch size, and the kernel number, there was no significant influence on the diagnosis performance. In particular, only a slight difference was observed with different dropout ratios.

The optimal parameters obtained above were as follows: convolutional kernels C1 and C2 numbered 48 and 96, with a k1 size of 9 × 9 and a k2 size of 5 × 5. The batch size was 32, the epoch was 30, the dropout ratio was 0.5, and the initial LR was 0.0001. To verify the reliability and stability of these parameters, 10 independent tests were carried out. As shown in Figure 20, the difference between the test and training accuracy was not significant, and the test accuracy was above 99.9%. The stable results proved the effectiveness of the model structure and parameters.

5.2.7. Validation of Diagnostic Model

Dimensionality reduction was achieved using t-stochastic neighbor embedding (t-SNE). The visualization results are shown in Figure 21. The features of the five states in the original input signal were evenly distributed, and it was difficult to complete identification of the fault categories at this stage. The characteristics of the different fault classes became much easier to distinguish from the initial convolution layer to the final fully connected layer.

Figure 20. Curves of training and testing accuracy in 10 trials.

Figure 21. Visualization of feature representation via t-SNE. (a) Input data, (b) C1, (c) C3, (d) C5, (e) FC2, (f) classifier layer.
There was some overlapping among the feature distributions of the different fault types when the model was shallow. The mixing phenomenon was more obvious through the learning of C1, and it could be seen that misclassification of the different fault types existed at this stage. The features of th gradually gathered after the learning of C3, while the features of sx and hx were mixed with each other. The features of each fault type showed an aggregated trend by the extraction of C5, although there was still some overlapping between the feature distributions of th and sx. The features of the different categories in the FC layer could be clearly identified. Although there was some distance among the features of the same fault type, the features of one type were collected in the same area. The feature representations of low level were transformed into those of high level, and the identification and classification of the signal were effectively accomplished for an axial piston pump via the feature learning of the different network layers.

To explore the diagnostic performance of the proposed model, 10 independent tests were conducted and different CNN models were used for comparative analysis. As can be observed from Figure 22, the I-AlexNet converged better than the other models. In the initial stage of feature extraction, the accuracy of the LeNet-based models was lower. When the epochs were more than 10, the accuracy of I-AlexNet was more than 99%, but the accuracy of T-LeNet was still less than 90%. After 30 epochs of training, I-AlexNet, T-AlexNet, 3-CNN, and 4-CNN all achieved high classification accuracy.

Figure 22. Curve of testing accuracy for different CNN models.

As can be seen from Table 3, the average accuracy of I-AlexNet reached 99.99%, while the average accuracy of T-LeNet was only 94.06% and the other models achieved more than 99%. Compared with the much deeper VGG11, I-AlexNet attained the same average accuracy and standard deviation, which further confirmed its recognition performance. I-AlexNet showed a good classification effect and stability in implementing fault identification of an axial piston pump.

Table 3. Average classification accuracy of different CNN models.

Model       Average Accuracy (%)    Standard Deviation
T-LeNet     94.06                   0.08442
I-LeNet     99.61                   0.001624
3-CNN       99.92                   0.001342
4-CNN       99.93                   0.001135
VGG11       99.99                   0.0002530
T-AlexNet   99.94                   0.001386
I-AlexNet   99.99                   0.0002530
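The statistics in Table 3 are simply the mean and standard deviation of the test accuracies over the 10 independent trials; a minimal sketch with illustrative (not reported) trial values:

import numpy as np

acc = np.array([99.99, 100.00, 99.98, 100.00, 99.99,
                100.00, 99.98, 100.00, 99.99, 100.00])  # hypothetical accuracies (%)
print(f"mean = {acc.mean():.2f}%, std = {acc.std(ddof=1):.5f}")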

The confusion matrix of one trial is shown in Figure 23. For I-LeNet and T-LeNet, the recognition errors were mainly concentrated in hx, sx, and th, although the identification of xp was obviously improved for I-LeNet. Misrecognition of sx and th was found in 3-CNN and 4-CNN. As a much deeper model, VGG11 achieved accurate classification. Misclassification was observed for the sx and th faults using the T-AlexNet model: six samples of sx were misclassified as hx, and two samples of th were misclassified as sx. The I-AlexNet model produced no misclassification for any of the fault categories. The classification precision of each category is displayed in Table 4. The precision of I-AlexNet for the five states of an axial piston pump reached 100%, and the precision for sx and hx was improved by the model improvement compared to the traditional model.
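A minimal sketch of deriving these quantities with scikit-learn (an assumption; the paper does not name its tooling), using the five state labels from the paper:

from sklearn.metrics import confusion_matrix, precision_score

CLASSES = ["zc", "xp", "sx", "hx", "th"]  # the five health states

def report(y_true, y_pred):
    # Rows of cm are true classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=CLASSES)
    prec = precision_score(y_true, y_pred, labels=CLASSES, average=None)
    return cm, dict(zip(CLASSES, prec))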

Figure 23. Confusion matrix in one trial. (a) T-LeNet, (b) I-LeNet, (c) 3-CNN, (d) 4-CNN, (e) VGG11, (f) I-AlexNet, (g) T-AlexNet.
Table 4. Precision of I-AlexNet and T-AlexNet for different fault types.

Fault Type    I-AlexNet (%)    T-AlexNet (%)
zc            100.0            100.0
xp            100.0            100.0
sx            100.0            99.4
hx            100.0            98.4
th            100.0            100.0

To investigate the classification effect of the models on each fault category, the different CNN models were used for comparative analysis. As can be seen from Figure 24, the performance of T-LeNet was significantly lower than that of the other models. The other four models had no significant difference in the classification results of zc, xp, and sx. However, for the categories of hx and th, the classification effect of the I-AlexNet model showed a slight improvement and was better than the other models.

The computing time of one independent test was taken for analysis, and the results are depicted in Table 5. The mean computing time of the T-AlexNet model was 11.04 s, while the time consumption of the I-AlexNet model was slightly higher. Compared to 3-CNN, 4-CNN, and VGG11, the average time consumption of the I-AlexNet model was significantly lower.

Figure 24. Histogram of accuracy of different CNN models for each fault type. The values underlying the histogram (accuracy, %) were:

Fault    I-AlexNet    T-LeNet    I-LeNet    3-CNN     4-CNN     VGG11     T-AlexNet
hx       100.00       96.40      99.72      100.00    98.89     100.00    100.00
sx       100.00       90.00      100.00     100.00    100.00    98.33     100.00
th       100.00       98.89      99.44      99.44     100.00    99.44     99.72
xp       100.00       94.44      100.00     100.00    100.00    100.00    100.00
zc       100.00       98.06      100.00     100.00    100.00    100.00    100.00

Table 5. Computational time of different CNN models in one epoch.

Model       1       2       3       4       5       6       7       8       9       10      Mean
I-AlexNet   11.09   11.06   11.22   12.23   11.14   11.12   11.19   11.09   11.09   11.16   11.24
T-AlexNet   11.15   11.05   11.22   11.15   11.08   10.90   10.95   10.91   11.05   10.96   11.04
T-LeNet     4.14    4.38    4.13    4.08    4.28    4.18    4.18    4.28    4.16    4.06    4.19
I-LeNet     4.78    4.63    4.89    4.83    7.72    4.72    4.80    4.75    4.88    4.80    5.08
3-CNN       12.99   13.05   12.97   13.24   13.20   13.26   13.03   12.99   12.89   13.01   13.06
4-CNN       13.38   13.45   13.60   13.68   13.67   13.56   13.79   13.52   13.67   13.43   13.58
VGG11       31.85   32.05   32.12   31.25   31.27   31.67   31.43   31.19   31.02   32.18   31.60

All times are in seconds; columns 1–10 are the ten independent runs and the final column is their mean.

5.2.8. Optimization of Diagnostic Model

In the process of BO, some presetting hyperparameters of the I-AlexNet model were the same as above. The selection of the range for each hyperparameter was based on previous manual tuning. The parameter optimization is depicted in Figure 25, and Table 6 shows the optimization results. The convergence speed of the improved AlexNet was faster with BO, and the classification accuracy reached 100% after more than 10 iterations.
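The paper does not name a specific BO library; the sketch below uses scikit-optimize as one possible realization, with the search ranges of Table 6 below and an assumed helper train_and_eval_bo that trains the model with the candidate settings and returns its test accuracy:

from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args

space = [
    Real(1e-4, 1e-3, name="lr"),
    Integer(24, 56, name="batch_size"),
    Integer(20, 50, name="epochs"),
    Integer(5, 9, name="k1"),            # C1 kernel size
    Integer(3, 7, name="k2"),            # C2 kernel size
    Integer(30, 60, name="n1"),          # C1 kernel number
    Integer(80, 140, name="n2"),         # C2 kernel number
    Integer(1000, 1600, name="fc1"),
    Integer(400, 800, name="fc2"),
    Real(0.1, 0.9, name="dropout"),
]

def train_and_eval_bo(**params):
    # Placeholder: train I-AlexNet with these settings, return test accuracy.
    raise NotImplementedError

@use_named_args(space)
def objective(**params):
    return -train_and_eval_bo(**params)  # minimize the negative test accuracy

# result = gp_minimize(objective, space, n_calls=100, random_state=0)
# result.x then holds the best combination found (cf. Table 6).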

Figure 25. Classification accuracy with BO (accuracy (%) versus iteration).

Table 6. The range and results of hyperparameter optimization.

Serial Number    Hyperparameter                           Range              Optimal Result
1                LR                                       [0.0001, 0.001]    0.00012
2                Batch size                               [24, 56]           51
3                Epoch                                    [20, 50]           33
4                Size of convolutional kernel (C1)        [5, 9]             5
5                Size of convolutional kernel (C2)        [3, 7]             5
6                Number of convolutional kernels (C1)     [30, 60]           54
7                Number of convolutional kernels (C2)     [80, 140]          97
8                Neurons of FC1                           [1000, 1600]       1457
9                Neurons of FC2                           [400, 800]         541
10               Dropout ratio                            [0.1, 0.9]         0.31

The hyperparameter combinations obtained by BO were used for model verification. As can be seen from the training loss curve in Figure 26, the model had good convergent results. The accuracy trends of the train and test samples showed a high degree of coincidence, and the small difference indicated the good stability of the B-AlexNet model. The confusion matrix of one experiment and the classification precision are depicted in Figure 27 and Table 7, respectively. The identification effects of I-AlexNet and B-AlexNet were superior to that of T-AlexNet, and the precision reached up to 100%. B-AlexNet especially showed an improvement for sx and hx compared to T-AlexNet.
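For concreteness, the Table 6 optima can be translated into a configuration sketch; the C1/C2 settings, FC widths, dropout ratio, and LR come from Table 6, while the overall layer arrangement and input channel count are assumptions for illustration:

import torch
import torch.nn as nn

def build_b_alexnet(num_classes=5):
    return nn.Sequential(
        nn.Conv2d(3, 54, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # C1: 54 kernels, 5 x 5
        nn.Conv2d(54, 97, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # C2: 97 kernels, 5 x 5
        nn.Flatten(),
        nn.LazyLinear(1457), nn.ReLU(), nn.Dropout(0.31),              # FC1: 1457 neurons
        nn.Linear(1457, 541), nn.ReLU(), nn.Dropout(0.31),             # FC2: 541 neurons
        nn.Linear(541, num_classes),                                   # classifier layer
    )

model = build_b_alexnet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00012)  # optimal LR from Table 6
# Training then runs for 33 epochs with a batch size of 51 (Table 6).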
Figure 26. Trend of loss and identification accuracy. (a) Loss, (b) accuracy.

Figure 27. Confusion matrix of a trial. (a) B-AlexNet, (b) I-AlexNet, (c) T-AlexNet.
Table 7. Precision of each condition with different models.

Fault Type    B-AlexNet (%)    I-AlexNet (%)    T-AlexNet (%)
zc            100.0            100.0            100.0
xp            100.0            100.0            100.0
sx            100.0            100.0            99.4
hx            100.0            100.0            98.4
th            100.0            100.0            100.0

The feature learning of B-AlexNet was investigated by employing t-SNE, and the results are shown in Figure 28. The features in the raw input are presented as scattered points, and it was difficult to find a certain principle between the features of different faults. Gradually, the features of the same fault type got together in clusters due to the learning of the convolutional layers. Five categories were clearly observed through the subsequent fully connected layers and the classifier layer. B-AlexNet could gain useful knowledge from the time–frequency representations and achieve the classification of the five typical states of an axial piston pump.

Figure 28. Visualization of feature representation via t-SNE. (a) Input data, (b) C1, (c) C3, (d) C5, (e) FC2, (f) classifier layer.

6. Conclusions
An integrated method was constructed based on a deep learning model and Bayesian
algorithm. The feature information hidden in the pressure signal was fully mined, and the
identification of typical faults of an axial piston pump was automatically accomplished.
Continuous wavelet transform was used for signal analysis to maintain information on both time and frequency. The Bayesian algorithm adaptively learned the critical hyperparameters of the deep model by absorbing the merits of the Gaussian process to improve the model. The following conclusions were obtained:
(1) Among the main selected hyperparameters, the learning rate showed the greatest effect on the diagnosis results. The kernel number and kernel size also had considerable influence. Compared to the above parameters, the epoch, dropout ratio, and batch size did not have a remarkable effect.
(2) The experiments showed that the improved AlexNet with manual tuning performed better than the other contrastive models.
(3) By adopting the Bayesian optimization algorithm, the proposed method, named B-AlexNet, had higher accuracy and stronger robustness. The average accuracy reached up to 100% for the five health states of an axial piston pump, and the precision for each fault type was also enhanced.
(4) The identification performance of the diagnostic model was further validated by the feature distributions learned by the different network layers. The constructed model can fuse feature extraction and fault classification, and it requires less professional knowledge and experience in signal processing.
In the future, further improvement to the Bayesian optimization algorithm will be investigated, taking into account the influence of different kernel functions in the Gaussian process, and an improvement strategy based on noise addition will be developed. Moreover, the generalizability of the diagnosis model will be explored by mining more comprehensive information from multiple signals.

Author Contributions: Validation, writing—review and editing and funding acquisition, Y.Z.; formal
analysis and writing—original draft preparation, T.Z.; conceptualization, methodology, investigation,
and writing—original draft preparation, S.T.; supervision, S.Y. All authors have read and agreed to
the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (52205057,
52175052), National Key Research and Development Program of China (2020YFC1512402), Natural
Science Foundation of the Jiangsu Higher Education Institutions of China (22KJB460002), China Post-
doctoral Science Foundation (2022M723702), Taizhou Science and Technology Plan Project (22gyb42),
and the Youth Talent Development Program of Jiangsu University.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available from the corresponding author upon reasonable request.
Acknowledgments: The authors are grateful to Wanlu Jiang and Siyuan Liu from Yanshan University
for their support in the experiments.
Conflicts of Interest: The authors declare no conflict of interest.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
