Robust GMM least square twin K-class support vector machine for urban water pipe leak recognition

Mingyang Liu a, Jin Yang a,*,1, Shuaiyong Li b, Zhihao Zhou b, Endong Fan a, Wei Zheng a

a Key Laboratory of Optoelectronic Technology & System of China Education Ministry, Chongqing University, Chongqing 400044, PR China
b Key Laboratory of Industrial Internet of Things & Networked Control of China Education Ministry, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Keywords: Leak recognition; Outliers; LST-KSVC; GMM; GLT-KSVC

Abstract: Monitoring pipe operation status is very important for saving water resources and realizing the sustainability of the water supply system. Currently, the support vector machine (SVM) and its improved algorithms have been used to detect leaks in pipe networks. Among them, the least square twin multi-class support vector machine (LST-KSVC) is a novel multi-classification method. However, LST-KSVC assigns the same weight to all leak samples, including the outliers that degrade its performance. In addition, through its quadratic loss function, LST-KSVC often misclassifies the rest class. To overcome these two drawbacks, we propose a weighted version of LST-KSVC, referred to as the GMM least square twin K-class support vector machine (GLT-KSVC). Based on the Gaussian mixture model (GMM) method, GLT-KSVC assigns larger weights to the main leak samples and smaller weights to outliers, making it insensitive to outliers. Moreover, GLT-KSVC avoids misclassifying rest samples by using a weighted least squares linear loss function. The leak recognition experiment revealed that GLT-KSVC outperforms LST-KSVC in classification accuracy, while its computation time is only slightly higher than that of LST-KSVC.
* Corresponding author.
E-mail addresses: 20160802053@cqu.edu.cn (M. Liu), yangjin@cqu.edu.cn (J. Yang), lishuaiyong@cqupt.edu.cn (S. Li).
1 ORCID: https://orcid.org/0000-0002-6606-8310.
https://doi.org/10.1016/j.eswa.2022.116525
Received 26 May 2021; Received in revised form 2 September 2021; Accepted 7 January 2022
Available online 12 January 2022
0957-4174/© 2022 Published by Elsevier Ltd.
M. Liu et al. Expert Systems With Applications 195 (2022) 116525
et al., 2018; Sun et al., 2016; Xiao et al., 2019; Zhou et al., 2019), which use online monitoring algorithms or models in the background server. These methods can analyze pipe vibration data, pressure data, flow data and other pipe operation parameters online to recognize the pipe operation status and prompt the water company to deal with leak risks. The GLT-KSVC leak recognition algorithm proposed in this paper is an online pipeline monitoring method.

In the offline leak detection field, Ebrahimi-Moghadam et al. (2018) developed leak calculator functions for low and medium pressure buried pipelines; the main calculator parameters included pipe diameter, leak hole diameter, and flow pressure. Keramat et al. (2021) used the transfer matrix to derive frequency responses of transient waves in a leaking viscoelastic pipe, after which the maximum likelihood estimate was applied to locate leaks. Brennan et al. (2019) proposed the random telegraph theory to derive approximate analytical solutions for leak location cross-correlation functions. This method shows that even if leak signals suffer from severe amplitude distortion, an accurate time delay estimation can still be obtained through cross-correlation functions as long as the zero crossings in the leak noise data are retained. Asada et al. (2020) presented a leak location method that used transient pressure damping by energy dissipation from the leak; this method minimized the effect of high-frequency noise on leak location. However, these methods have two drawbacks. One is that the mathematical forms of the leak model are easily interfered with by real environmental factors, while the other is that calculations using these methods are too complicated and require high-quality hardware equipment. Therefore, real leak location finding may highly depend on hardware quality.

In the online leak detection field, Hu et al. (2021) combined density-based spatial clustering of applications with noise (DBSCAN) and multiscale fully convolutional networks (MFCN) to detect leaks and manage water losses: DBSCAN divided a large water network into a number of zones, and the MFCN was then used to manage each single zone. Zhou et al. (2019) proposed a novel leak location recognition method based on the Fully-linear DenseNet (BLIFF), which can monitor sudden bursts in water pipe networks. Pérez-Pérez et al. (2021) presented an online leak detection system based on artificial neural network (ANN) techniques; this system can monitor pipe networks through online pressure and flow rate data. Cody and Narasimhan (2020) used the linear prediction (LP) data-driven method to detect and localize small leaks. Zhou et al. (2011) proposed an expert system based on the Bayesian reasoning approach to complete leak detection and size estimation in complex real pipe systems. Mandal et al. (2012) used rough set theory, the artificial bee colony (ABC) algorithm and the support vector machine (SVM) to detect pipe leaks. Lee et al. (2013) com-

LST-KSVC is associated with two limitations: one, by using quadratic loss functions, LST-KSVC often misclassifies the rest class; and two, LST-KSVC is sensitive to outliers. In leak recognition, outliers appear in the leak sample data due to inevitable environmental noise interference, and these outliers are often situated away from the main data. LST-KSVC assigns the same weights to all leak sample points, including outliers, which makes the classification trend unsatisfactory. To obtain an excellent leak classification trend, the classification algorithm should be made insensitive to these outliers. Therefore, we propose an improved weighted version of LST-KSVC, referred to as the GMM least square twin K-class support vector machine (GLT-KSVC), which can overcome the drawbacks of LST-KSVC.

3. Theory

3.1. Introduction to LST-KSVC

LST-KSVC is a novel multi-class classification algorithm that utilizes the "one-versus-one-versus-rest" structure to evaluate all training samples with ternary outputs {−1, 0, +1}. In this section, we briefly describe LST-KSVC. We consider D = {(x_1, y_1), (x_2, y_2), ⋯, (x_m, y_m)} to be the training dataset, where x_i denotes an input sample in the n-dimensional real space R^n, y_i ∈ N_q is the q-class output term, and i = 1, ⋯, m. Let the matrix A ∈ R^{l_1×n} hold the training samples with label "+1", B ∈ R^{l_2×n} hold the training samples with label "−1", and C ∈ R^{l_3×n} hold the remaining class data with label "0", where l_1 + l_2 + l_3 = m. In LST-KSVC classification, two non-parallel hyperplanes are formulated as follows:

    (w_1^+)^T x + b_+ = 0,    (w_2^-)^T x + b_- = 0    (1)

where w_1^+, w_2^- ∈ R^n are the normal vectors of the hyperplanes and b_+, b_- ∈ R are the scalar biases. The decision functions of LST-KSVC are obtained through the following two optimization problems:

    min_{w_1^+, b_+, δ, ξ, η}  (1/2) δ^T δ + (c_1/2) ξ^T ξ + (c_2/2) η^T η    (2)

    s.t.  A w_1^+ + e_+ b_+ = δ,
          −e_− − (B w_1^+ + e_− b_+) = ξ,
          e_0 (ε − 1) − (C w_1^+ + e_0 b_+) = η
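The "one-versus-one-versus-rest" split of the training set into the matrices A, B and C can be sketched in a few lines (an illustrative sketch, not the authors' code; the function name and array layout are our own):

```python
import numpy as np

def ovo_rest_split(X, y, i, j):
    """Build the (i, j) sub-classifier's data matrices: class i samples
    form A (ternary label +1), class j samples form B (label -1), and
    every remaining class forms C (the "rest" class, label 0)."""
    A = X[y == i]
    B = X[y == j]
    C = X[(y != i) & (y != j)]
    return A, B, C
```

For q classes, calling this for every pair (i, j) with i < j yields the q(q − 1)/2 sub-classifiers used by the voting scheme described below.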
Substituting the constraints into the objectives, the pair of problems can be written in the unconstrained form:

    min_{w_1^+, b_+}  (1/2) ||A w_1^+ + e_+ b_+||^2 + (c_1/2) ||e_− + B w_1^+ + e_− b_+||^2 + (c_2/2) ||e_0 (ε − 1) − (C w_1^+ + e_0 b_+)||^2

    min_{w_2^-, b_-}  (1/2) ||B w_2^- + e_− b_-||^2 + (c_3/2) ||e_+ − (A w_2^- + e_+ b_-)||^2 + (c_4/2) ||e_0 (1 − ε) − (C w_2^- + e_0 b_-)||^2    (4)

Let E = [A e_+], F = [B e_−], G = [C e_0]. In this regard, two non-parallel hyperplanes are obtained. Then, LST-KSVC separates all training samples based on the "one-versus-one-versus-rest" structure, in which q(q − 1)/2 LST-KSVC sub-classifiers are established for q-class classification. When a new testing sample x appears, LST-KSVC determines its class label through a vote process. For each (i, j) sub-classifier, LST-KSVC labels "+1" to the i-th class samples, "−1" to the j-th class samples, and "0" to all remaining classes, where i, j ∈ {1, 2, ⋯, q}. Then, w_1^+, b_+, w_2^- and b_- are obtained from Eq. (5). In the case of linear LST-KSVC, classification labels are determined through the following decision function:

    f(x) = +1, if x^T w_1^+ + b_+ > −1 + ε
           −1, if x^T w_2^- + b_- < 1 − ε    (6)
            0, otherwise

In non-linear LST-KSVC, the decision function is designed as:

    f(x) = +1, if K(x^T, D^T) w_1^+ + b_+ > −1 + ε
           −1, if K(x^T, D^T) w_2^- + b_- < 1 − ε    (7)
            0, otherwise

As shown in Fig. 1, linear LST-KSVC evaluates three classes of samples based on the two decision hyperplanes in Eq. (6), where the red and green lines are the boundaries x^T w_1^+ + b_+ = −1 + ε and x^T w_2^- + b_- = 1 − ε determined by LST-KSVC. The red and green straight lines intersect and pass through the "0" class (rest samples, green points); therefore, the decision function (7) also makes misclassification decisions for some rest samples. Moreover, LST-KSVC gives the same weight to all sample points, including outliers, which means that LST-KSVC is sensitive to outliers. These two drawbacks can be overcome by the novel GLT-KSVC algorithm described below.

3.2. Gaussian mixture model least square twin K-class support vector machine (GLT-KSVC)

In GMM, μ_j is the mean vector and Σ_j is the covariance matrix of the j-th component, while K represents the number of mixture components. Therefore, the main parameters of GMM are p_j, μ_j and Σ_j. We then introduce a hidden variable, y_i, to represent the probability that the i-th sample point belongs to each Gaussian component; y_i obeys a multinomial distribution. A probability model is established as follows:

    p(y_i = j) = φ_j,    φ_j ≥ 0,  Σ_{j=1}^{K} φ_j = 1    (9)

where φ_j is the mixing coefficient of the j-th mixture component, and p(x_i | y_i) obeys the j-th Gaussian component:

    p(x_i | y_i = j) ∼ N(μ_j, Σ_j)    (10)

Therefore, according to Eq. (9) and Eq. (10), the likelihood function of the parameters p_j, μ_j and Σ_j is established:

    γ(p, μ, Σ) = Σ_{i=1}^{m} log P(x_i; p, μ, Σ) = Σ_{i=1}^{m} log Σ_{y_i=1}^{K} P(x_i | y_i; μ, Σ) P(y_i; p)    (11)

where m represents the number of samples. The parameters p_j, μ_j and Σ_j are then estimated by maximizing the likelihood in Eq. (11).
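The mixture log-likelihood of Eq. (11) can be evaluated directly in numpy (a minimal sketch of the standard computation, with our own function names; the fitted parameters are taken as given):

```python
import numpy as np

def gauss_pdf(X, mu, cov):
    # density of N(mu, cov) evaluated at each row of X
    d = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

def log_likelihood(X, p, mus, covs):
    """Eq. (11): sum over samples of the log of the mixture density,
    i.e. sum_j P(x_i | y_i = j) P(y_i = j)."""
    mix = sum(pj * gauss_pdf(X, mj, Sj) for pj, mj, Sj in zip(p, mus, covs))
    return np.log(mix).sum()
```

Maximizing this quantity is exactly what the EM iterations described next accomplish.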
The estimated parameters are updated through the following M-step formulas, where ω_j^{(i)} denotes the E-step responsibility (posterior probability) of the j-th component for the i-th sample:

    p_j = (1/m) Σ_{i=1}^{m} ω_j^{(i)}

    μ_j = Σ_{i=1}^{m} ω_j^{(i)} x^{(i)} / Σ_{i=1}^{m} ω_j^{(i)}    (13)

    Σ_j = Σ_{i=1}^{m} ω_j^{(i)} (x^{(i)} − μ_j)(x^{(i)} − μ_j)^T / Σ_{i=1}^{m} ω_j^{(i)}

The E-step and M-step are cyclically iterated until convergence. GMM has been described in detail previously (Lu et al., 2019). In this section, we describe how the weighted values of GLT-KSVC are generated by GMM. First, the weights obtained by GMM for the j-th "+1" cluster sample and the j-th "−1" cluster sample are denoted by p_j^+ and p_j^- (0 < p_j^+ < 1, 0 < p_j^- < 1). Then, the weight diagonal matrices W_1^+ = diag([p_1^+, ⋯, p_{l_1}^+]) and W_2^- = diag([p_1^-, ⋯, p_{l_2}^-]) are established from the series of iterated p_j^+ and p_j^-, respectively.

Linear GLT-KSVC

In linear GLT-KSVC, two non-parallel hyperplanes are defined as:

    (w_1^+)^T x + b_+ = 0,    (w_2^-)^T x + b_- = 0    (14)

They are determined by the following pair of weighted optimization problems:

    min_{w_1^+, b_+, δ, ξ, η}  (1/2) δ^T δ + (c_1/2) ξ^T ξ + c_2 λ^T η    (15)

    s.t.  W_1^+ (A w_1^+ + e_+ b_+) = δ,
          W_2^- {−e_− − (B w_1^+ + e_− b_+)} = ξ,
          e_0 (ε − 1) − (C w_1^+ + e_0 b_+) = η

and

    min_{w_2^-, b_-, δ*, ξ*, η*}  (1/2) ξ*^T ξ* + (c_3/2) δ*^T δ* + c_4 λ*^T η*    (16)

    s.t.  W_2^- (B w_2^- + e_− b_-) = ξ*,
          W_1^+ {e_+ − (A w_2^- + e_+ b_-)} = δ*,
          e_0 (1 − ε) − (C w_2^- + e_0 b_-) = η*

where the weight matrices W_1^+ and W_2^- are obtained by GMM; δ and δ* belong to the l_1-dimensional real space, ξ and ξ* belong to the l_2-dimensional real space, and η and η* belong to the l_3-dimensional real space; A, B, C ∈ R^{l_i×n} (i = 1, 2, 3), and e_+, e_− and e_0 are vectors of ones of appropriate dimensions. The vectors λ and λ* belong to the l_3-dimensional real space and are determined by the least squares linear loss function (Wang & Zhong, 2014) to eliminate the local infinitesimal effect. The first two terms of Eq. (15) and Eq. (16) carry the weights W_1^+ and W_2^-; that is, the GMM algorithm gives a definite weighted value to every sample point. As mentioned in subsection 3.2, the weights of the main data cluster are much larger than the weights of outliers, which is equivalent to GLT-KSVC reducing its sensitivity to outliers in classification. Furthermore, the third term of Eq. (15) and Eq. (16) differs from that of Eq. (2) and Eq. (3), respectively, because LST-KSVC and GLT-KSVC apply different loss functions to the rest samples. As shown in Fig. 2, we used the same data set as plotted in Fig. 1 of subsection 3.2. The weighted least squares linear loss function in Eq. (15) and Eq. (16) inhibits the classification hyperplanes of GLT-KSVC from passing through the rest samples, thereby improving the classification accuracy for the rest class (class 0).

Then, we substituted the constraint conditions into the objective functions, as shown in Eq. (17) and Eq. (18):

    min_{w_1^+, b_+}  (1/2) ||W_1^+ (A w_1^+ + e_+ b_+)||^2 + (c_1/2) ||W_2^- {−e_− − (B w_1^+ + e_− b_+)}||^2 + c_2 λ^T {e_0 (ε − 1) − (C w_1^+ + e_0 b_+)}    (17)

and

    min_{w_2^-, b_-}  (1/2) ||W_2^- (B w_2^- + e_− b_-)||^2 + (c_3/2) ||W_1^+ {e_+ − (A w_2^- + e_+ b_-)}||^2 + c_4 λ*^T {e_0 (1 − ε) − (C w_2^- + e_0 b_-)}    (18)
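The M-step updates of Eq. (13) translate directly into vectorized code (a minimal sketch with our own function names; the responsibility matrix `omega` comes from the E-step, which is not reproduced in this excerpt):

```python
import numpy as np

def gmm_m_step(X, omega):
    """One M-step (Eq. (13)): X is (m, d); omega[i, j] is the E-step
    responsibility of component j for sample i. Returns the updated
    mixing weights p_j, means mu_j and covariances Sigma_j."""
    m = X.shape[0]
    Nj = omega.sum(axis=0)                 # effective count per component
    p = Nj / m                             # p_j = (1/m) sum_i omega_j^(i)
    mu = (omega.T @ X) / Nj[:, None]       # responsibility-weighted means
    covs = []
    for j in range(omega.shape[1]):
        d = X - mu[j]
        covs.append((omega[:, j, None] * d).T @ d / Nj[j])
    return p, mu, np.array(covs)
```

Alternating this step with the E-step until the likelihood of Eq. (11) stops improving yields the fitted mixture from which the weight matrices are built.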
Setting the gradients of Eq. (17) and Eq. (18) with respect to (w_1^+, b_+) and (w_2^-, b_-) to zero yields:

    A^T (W_1^+)^T W_1^+ (A w_1^+ + e_+ b_+) + c_1 B^T (W_2^-)^T W_2^- (B w_1^+ + e_− b_+ + e_−) − c_2 C^T λ = 0
    e_+^T (W_1^+)^T W_1^+ (A w_1^+ + e_+ b_+) + c_1 e_−^T (W_2^-)^T W_2^- (B w_1^+ + e_− b_+ + e_−) − c_2 e_0^T λ = 0    (19)

    B^T (W_2^-)^T W_2^- (B w_2^- + e_− b_-) + c_3 A^T (W_1^+)^T W_1^+ (A w_2^- + e_+ b_- − e_+) − c_4 C^T λ* = 0
    e_−^T (W_2^-)^T W_2^- (B w_2^- + e_− b_-) + c_3 e_+^T (W_1^+)^T W_1^+ (A w_2^- + e_+ b_- − e_+) − c_4 e_0^T λ* = 0    (20)

Next, we arranged Eq. (19) and Eq. (20) into matrix forms (Eq. (21) and Eq. (22)) and solved for the parameters w_1^+, b_+ (Eq. (23)) and w_2^-, b_- (Eq. (24)). Using E = [A e_+], F = [B e_−] and G = [C e_0]:

    E^T (W_1^+)^T W_1^+ E [w_1^+; b_+] + c_1 F^T (W_2^-)^T W_2^- (F [w_1^+; b_+] + e_−) − c_2 G^T λ = 0    (21)

    F^T (W_2^-)^T W_2^- F [w_2^-; b_-] + c_3 E^T (W_1^+)^T W_1^+ (E [w_2^-; b_-] − e_+) − c_4 G^T λ* = 0    (22)

    [w_1^+; b_+] = (E^T (W_1^+)^T W_1^+ E + c_1 F^T (W_2^-)^T W_2^- F)^{-1} (c_2 G^T λ − c_1 F^T (W_2^-)^T W_2^- e_−)    (23)

    [w_2^-; b_-] = (M + c_3 N)^{-1} (c_4 G^T λ* + c_3 E^T (W_1^+)^T W_1^+ e_+),
    where  M = F^T (W_2^-)^T W_2^- F,    N = E^T (W_1^+)^T W_1^+ E    (24)

Non-linear GLT-KSVC

In the non-linear case, the two kernel-generated surfaces are defined as:

    K(x^T, D^T) w_1^+ + b_+ = 0,    K(x^T, D^T) w_2^- + b_- = 0    (25)

where K(·) is an arbitrary kernel function (Franken, 1997).
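Eq. (23) reduces to one small dense linear solve per hyperplane. An illustrative numpy sketch of that solve (our own reconstruction from the equations above, not the authors' code; the ridge term `reg` is an added assumption for numerical invertibility, and the loss vector `lam` is taken as given):

```python
import numpy as np

def aug(Z):
    # append a column of ones: [Z e]
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def solve_positive_plane(A, B, C, W1, W2, lam, c1=1.0, c2=1.0, reg=1e-8):
    """Eq. (23)-style solve for (w1+, b+) of the linear GLT-KSVC."""
    E, F, G = aug(A), aug(B), aug(C)
    # left-hand matrix: weighted Gram blocks of E and F
    H = E.T @ W1.T @ W1 @ E + c1 * F.T @ W2.T @ W2 @ F
    e_minus = np.ones(B.shape[0])
    rhs = c2 * G.T @ lam - c1 * F.T @ W2.T @ W2 @ e_minus
    z = np.linalg.solve(H + reg * np.eye(H.shape[0]), rhs)
    return z[:-1], z[-1]        # (w1+, b+)
```

The solve for (w2-, b-) in Eq. (24) has the same structure with the roles of the weighted blocks swapped.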
The corresponding weighted optimization problems become:

    min_{w_1^+, b_+, δ, ξ, η}  (1/2) δ^T δ + (c_1/2) ξ^T ξ + c_2 λ^T η    (26)

    s.t.  W_1^+ (K(A, D^T) w_1^+ + e_+ b_+) = δ,
          W_2^- {−e_− − (K(B, D^T) w_1^+ + e_− b_+)} = ξ,
          e_0 (ε − 1) − (K(C, D^T) w_1^+ + e_0 b_+) = η

and

    min_{w_2^-, b_-, δ*, ξ*, η*}  (1/2) ξ*^T ξ* + (c_3/2) δ*^T δ* + c_4 λ*^T η*    (27)

    s.t.  W_2^- (K(B, D^T) w_2^- + e_− b_-) = ξ*,
          W_1^+ {e_+ − (K(A, D^T) w_2^- + e_+ b_-)} = δ*,
          e_0 (1 − ε) − (K(C, D^T) w_2^- + e_0 b_-) = η*

Similar to the linear case, we expressed Eq. (26) and Eq. (27) in matrix forms (Eq. (28) and Eq. (29)), from which the parameters w_1^+, b_+ and w_2^-, b_- are obtained (Eq. (30) and Eq. (31)). Writing E_K = [K(A, D^T) e_+], F_K = [K(B, D^T) e_−] and G_K = [K(C, D^T) e_0]:

    E_K^T (W_1^+)^T W_1^+ E_K [w_1^+; b_+] + c_1 F_K^T (W_2^-)^T W_2^- (F_K [w_1^+; b_+] + e_−) − c_2 G_K^T λ = 0    (28)

    F_K^T (W_2^-)^T W_2^- F_K [w_2^-; b_-] + c_3 E_K^T (W_1^+)^T W_1^+ (E_K [w_2^-; b_-] − e_+) − c_4 G_K^T λ* = 0    (29)

    [w_1^+; b_+] = (E_K^T (W_1^+)^T W_1^+ E_K + c_1 F_K^T (W_2^-)^T W_2^- F_K)^{-1} (c_2 G_K^T λ − c_1 F_K^T (W_2^-)^T W_2^- e_−)    (30)

    [w_2^-; b_-] = (F_K^T (W_2^-)^T W_2^- F_K + c_3 E_K^T (W_1^+)^T W_1^+ E_K)^{-1} (c_4 G_K^T λ* + c_3 E_K^T (W_1^+)^T W_1^+ e_+)    (31)

For each (i, j) sub-classifier, GLT-KSVC labels "+1" to the i-th class samples, "−1" to the j-th class samples, and "0" to all remaining classes, where i, j ∈ {1, 2, ⋯, q}. Then, w_1^+, b_+, w_2^- and b_- in the (i, j)-th sub-classifier are obtained using Eq. (30) and Eq. (31). In the case of linear GLT-KSVC, classification labels are determined using the following decision function:

    f(x) = +1, if x^T w_1^+ + b_+ > −1 + ε
           −1, if x^T w_2^- + b_- < 1 − ε    (32)
            0, otherwise

Finally, the kernel decision surfaces of non-linear GLT-KSVC are established based on Eq. (30) and Eq. (31); that is, the non-linear version of GLT-KSVC is obtained.

Fig. 5. Three different leak levels and background noise.
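In the non-linear case, the ternary rule of Eq. (32) is evaluated through a kernel row K(x^T, D^T) against the surfaces of Eq. (25). A sketch with the RBF kernel, the paper's choice in Section 4.5 (function names, the `sigma` default and the `eps` default are our assumptions):

```python
import numpy as np

def rbf_row(x, D, sigma=1.0):
    # K(x^T, D^T): RBF kernel of x against every training sample in D
    sq = ((D - x) ** 2).sum(axis=1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kernel_decision(x, D, w1, b1, w2, b2, eps=0.1, sigma=1.0):
    # ternary output of one sub-classifier on the kernel surfaces
    k = rbf_row(x, D, sigma)
    if k @ w1 + b1 > -1 + eps:
        return +1
    if k @ w2 + b2 < 1 - eps:
        return -1
    return 0
```

Running every (i, j) sub-classifier and voting (+1 for class i, −1 for class j, no vote for 0) gives the final predicted class.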
Fig. 6. (a) The original 2-D leak and background noise samples. (b) The simplified 2-D leak and background noise samples.
Fig. 7. (a) The result of GMM model fitting. (b) The result of GMM clustering.
Fig. 8. (a) The posterior probability of cluster 1. (b) The posterior probability of cluster 2. (c) The posterior probability of cluster 3. (d) The posterior probability of cluster 4.
Fig. 9. Non-linear LST-KSVC and GLT-KSVC classifications. (a) Non-linear LST-KSVC classification for leak data. (b) Non-linear GLT-KSVC classification for leak data.
4.1. An overview of the proposed leak recognition procedure

The proposed leak recognition model for urban water pipe networks is based on acoustic emission (AE) data, variance and other statistical characteristics, GMM weighted values, and GLT-KSVC. A schematic presentation of the leak recognition procedure is shown in Fig. 3, and it can be described in detail as follows.

Step 1: AE piezoelectric (PZT) sensors were used to acquire pipe vibration AE data.
Step 2: Eight statistical feature parameters (variance, standard deviation, kurtosis, sample entropy, skewness, mean, energy, RMS) were used to extract vibration characteristics from the AE data and to build the training samples T as well as the test samples D.
Step 3: GMM assigned different weighted values to each training sample point.
Step 4: The presented GLT-KSVC was run on the training samples T and the test samples D.

4.4. GMM pre-processing

As described in subsection 3.2, the GMM cluster method was used to reduce outlier interference. In the first step of GMM clustering, the EM algorithm was used to fit the sample points, which gave the GMM model parameters. Fig. 7(a) shows that the obtained fitting model is close to the sample point distributions. After the GMM model parameters were obtained, the sample points were clustered as shown in Fig. 7(b): red dots represent cluster 1, yellow dots represent cluster 2, green dots represent cluster 3, blue dots represent cluster 4, while the purple dots represent the outlier set.

Then, GMM was used to calculate the posterior probability of each sample point belonging to each cluster label, called the membership degree, which was used to construct the weighted matrix W. Fig. 8 shows the membership degrees of the four cluster labels. In Fig. 8(a), the closer a sample point is to red, the higher the probability that it belongs to cluster 1, and the higher its degree of membership in cluster 1, the greater the weighted value it is assigned; on the contrary, the closer the sample point is to blue, the
lower the probability that the sample point belongs to cluster 1, and the lower the membership degree, the smaller the weighted value the sample point is assigned. As shown in Fig. 8, outliers have low membership in every cluster label; that is, each outlier is given a small weighted value, which makes the GLT-KSVC classification algorithm insensitive to these outliers.

4.5. GLT-KSVC classification for leak recognition

In pipe leak detection, the leak signal is affected by noise and leak data samples are often not linearly separable; thus, linear classification is no longer applicable. We therefore used the non-linear GLT-KSVC method to detect leak levels. In the non-linear case, the RBF was selected as the kernel function, the fundamental parameters C and σ were optimized by a grid search method, and the experiments were completed using MATLAB 2019a.

4.6. Experimental result comparisons and discussion

Fig. 9(a) and (b) show the non-linear LST-KSVC and GLT-KSVC classifications for leak data. As described in subsection 4.2, this case includes four classification labels: small leak, medium leak, serious leak, and background noise. Green represents small leak, black represents medium leak, white represents serious leak, pink-orange represents background noise, while the red circles are the support vectors. In Fig. 9(a), the yellow plane and light-blue plane have an intersection area, which shows that outliers have a great influence on the classification outcomes. In Fig. 9(b), the pink-orange outliers do not affect the classification trend, and the four classification areas are obvious. In classification accuracy, GLT-KSVC reached 98.52%, while LST-KSVC reached 89.04%. Regarding sample training time, GLT-KSVC used 0.2945 s while LST-KSVC used 0.2876 s. To further describe the classification performance of GLT-KSVC, we used the GLT-KSVC, LST-KSVC and classic SVM methods to recognize leak samples over many tests. Table 1 compares the outcomes of the SVM, LST-KSVC and GLT-KSVC methods; the best accuracy and best computational time for every method are marked in bold. GLT-KSVC exhibited the highest classification accuracy for leak samples. When the number of features was 4, 6, 7, and 8, the SVM method could not accomplish the classification. Regarding computational time, GLT-KSVC and LST-KSVC are comparable, but both take far less time than SVM.

5. Conclusion

We propose a multi-class SVM algorithm that is based on the LST-KSVC method, referred to as the GLT-KSVC algorithm. The GLT-KSVC algorithm assigns different weight values to leak sample points based on the GMM method. Since GLT-KSVC assigns small weight values to outliers in the leak dataset, the algorithm is insensitive to outliers, whereas the LST-KSVC algorithm cannot overcome outlier interference. Moreover, by using the weighted least squares linear loss function, the classification effect of GLT-KSVC on the rest class is better than that of LST-KSVC; that is, GLT-KSVC overcomes the misclassification effect on the rest class. However, there are some limitations: when the data contain too many outliers, GLT-KSVC may fail to recognize the leak. Therefore, our algorithm should be further improved for samples with a large number of outliers, which is our next research plan.

CRediT authorship contribution statement

Mingyang Liu: Methodology, Software, Validation, Formal analysis, Data curation, Writing – original draft, Writing – review & editing, Visualization. Jin Yang: Conceptualization, Writing – review & editing, Supervision, Project administration, Funding acquisition. Shuaiyong Li: Validation, Resources, Data curation. Zhihao Zhou: Investigation, Resources, Validation. Endong Fan: Investigation, Resources, Software. Wei Zheng: Investigation, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

J.Y. acknowledges the National Natural Science Foundation of China (No. 51675069) and the Fundamental Research Funds for the Central Universities (Nos. 2018CDQYGD0020, cqu2018CDHB1A05).

References

Arifin, B. M. S., Li, Z., Shah, S. L., Meyer, G. A., & Colin, A. (2018). A novel data-driven leak detection and localization algorithm using the Kantorovich distance. Computers & Chemical Engineering. https://doi.org/10.1016/j.compchemeng.2017.09.022
Asada, Y., Kimura, M., Azechi, I., Iida, T., & Kubo, N. (2020). Leak detection method using energy dissipation model in a pressurized pipeline. Journal of Hydraulic Research. https://doi.org/10.1080/00221686.2020.1818308
Brennan, M. J., Gao, Y., Ayala, P. C., Almeida, F. C. L., Joseph, P. F., & Paschoalini, A. T. (2019). Amplitude distortion of measured leak noise signals caused by instrumentation: Effects on leak detection in water pipes using the cross-correlation method. Journal of Sound and Vibration, 461, Article 114905. https://doi.org/10.1016/j.jsv.2019.114905
Cody, R. A., & Narasimhan, S. (2020). A field implementation of linear prediction for leak-monitoring in water distribution networks. Advanced Engineering Informatics, 45, Article 101103. https://doi.org/10.1016/j.aei.2020.101103
Dawood, T., Elwakil, E., Novoa, H. M., & Gárate Delgado, J. F. (2021). Toward urban sustainability and clean potable water: Prediction of water quality via artificial neural networks. Journal of Cleaner Production, 291. https://doi.org/10.1016/j.jclepro.2020.125266
Diao, X., Jiang, J., Shen, G., Chi, Z., Wang, Z., Ni, L., … Hao, Y. (2020). An improved variational mode decomposition method based on particle swarm optimization for leak detection of liquid pipelines. Mechanical Systems and Signal Processing. https://doi.org/10.1016/j.ymssp.2020.106787
Ebrahimi-Moghadam, A., Farzaneh-Gord, M., Arabkoohsar, A., & Moghadam, A. J. (2018). CFD analysis of natural gas emission from damaged pipelines: Correlation development for leakage estimation. Journal of Cleaner Production, 199, 257–271. https://doi.org/10.1016/j.jclepro.2018.07.127
Franken, D. (1997). Positiveness of the solutions for the convergence-modified Twomey-algorithm to solve Fredholm-integral-equation of the first kind with arbitrary kernel-functions. Journal of Aerosol Science, 28(97), 275–276.
Gao, Y., Liu, Y., Ma, Y., Cheng, X., & Yang, J. (2018). Application of the differentiation process into the correlation-based leak detection in urban pipeline networks. Mechanical Systems and Signal Processing. https://doi.org/10.1016/j.ymssp.2018.04.036
Hu, X., Han, Y., Yu, B., Geng, Z., & Fan, J. (2021). Novel leakage detection and water loss management of urban water supply network using multiscale neural networks. Journal of Cleaner Production, 278, Article 123611. https://doi.org/10.1016/j.jclepro.2020.123611
Keramat, A., Karney, B., Ghidaoui, M. S., & Wang, X. (2021). Transient-based leak detection in the frequency domain considering fluid–structure interaction and viscoelasticity. Mechanical Systems and Signal Processing, 153, Article 107500. https://doi.org/10.1016/j.ymssp.2020.107500
Kim, Y., Lee, S. J., Park, T., Lee, G., Suh, J. C., & Lee, J. M. (2016). Robust leak detection and its localization using interval estimation for water distribution network. Computers & Chemical Engineering, 92, 1–17. https://doi.org/10.1016/j.compchemeng.2016.04.027
Lee, L. H., Rajkumar, R., Lo, L. H., Wan, C. H., & Isa, D. (2013). Oil and gas pipeline failure prediction system using long range ultrasonic transducers and Euclidean-Support Vector Machines classification approach. Expert Systems with Applications, 40(6), 1925–1934. https://doi.org/10.1016/j.eswa.2012.10.006
Lu, Y., Tian, Z., Peng, P., Niu, J., Li, W., & Zhang, H. (2019). GMM clustering for heating load patterns in-depth identification and prediction model accuracy improvement of district heating system. Energy and Buildings, 190, 49–60. https://doi.org/10.1016/j.enbuild.2019.02.014
Mandal, S. K., Chan, F. T. S., & Tiwari, M. K. (2012). Leak detection of pipeline: An integrated approach of rough set theory and artificial bee colony trained SVM. Expert Systems with Applications, 39(3), 3071–3080. https://doi.org/10.1016/j.eswa.2011.08.170
Nasiri, J. A., Moghadam Charkari, N., & Jalili, S. (2015). Least squares twin multi-class classification support vector machine. Pattern Recognition, 48(3), 984–992. https://doi.org/10.1016/j.patcog.2014.09.020
Nguyen, S. T. N., Gong, J., Lambert, M. F., Zecchin, A. C., & Simpson, A. R. (2018). Least squares deconvolution for leak detection with a pseudo random binary sequence excitation. Mechanical Systems and Signal Processing, 99, 846–858. https://doi.org/10.1016/j.ymssp.2017.07.003
Ni, L., Jiang, J., Pan, Y., & Wang, Z. (2014). Leak location of pipelines based on characteristic entropy. Journal of Loss Prevention in the Process Industries, 30(1), 24–36. https://doi.org/10.1016/j.jlp.2014.04.004
Pérez-Pérez, E. J., López-Estrada, F. R., Valencia-Palomo, G., Torres, L., Puig, V., & Mina-Antonio, J. D. (2021). Leak diagnosis in pipelines using a combined artificial neural network approach. Control Engineering Practice, 107, Article 104677. https://doi.org/10.1016/j.conengprac.2020.104677
Rajeswaran, A., Narasimhan, S., & Narasimhan, S. (2018). A graph partitioning algorithm for leak detection in water distribution networks. Computers & Chemical Engineering, 108, 11–23. https://doi.org/10.1016/j.compchemeng.2017.08.007
Reddy, H. P., Narasimhan, S., Bhallamudi, S. M., & Bairagi, S. (2011). Leak detection in gas pipeline networks using an efficient state estimator. Part-I: Theory and simulations. Computers & Chemical Engineering, 35(4), 651–661. https://doi.org/10.1016/j.compchemeng.2010.10.006
Saade, M., & Mustapha, S. (2020). Assessment of the structural conditions in steel pipeline under various operational conditions – A machine learning approach. Measurement: Journal of the International Measurement Confederation, 166, Article 108262. https://doi.org/10.1016/j.measurement.2020.108262
Sun, J., Xiao, Q., Wen, J., & Zhang, Y. (2016). Natural gas pipeline leak aperture identification and location based on local mean decomposition analysis. Measurement: Journal of the International Measurement Confederation. https://doi.org/10.1016/j.measurement.2015.10.015
Wang, K., & Zhong, P. (2014). Robust non-convex least squares loss function for regression with outliers. Knowledge-Based Systems, 71, 290–302. https://doi.org/10.1016/j.knosys.2014.08.003
Xiao, R., Hu, Q., & Li, J. (2019). Leak detection of gas pipelines using acoustic signals based on wavelet transform and Support Vector Machine. Measurement: Journal of the International Measurement Confederation. https://doi.org/10.1016/j.measurement.2019.06.050
Xu, Y., Guo, R., & Wang, L. (2013). A Twin Multi-Class Classification Support Vector Machine. Cognitive Computation, 5(4), 580–588. https://doi.org/10.1007/s12559-012-9179-7
Žalik, B. (2005). An efficient sweep-line Delaunay triangulation algorithm. CAD Computer Aided Design, 37(10), 1027–1038. https://doi.org/10.1016/j.cad.2004.10.004
Zhou, X., Tang, Z., Xu, W., Meng, F., Chu, X., Xin, K., & Fu, G. (2019). Deep learning identifies accurate burst locations in water distribution networks. Water Research, 166, Article 115058. https://doi.org/10.1016/j.watres.2019.115058
Zhou, Z. J., Hu, C. H., Xu, D. L., Yang, J. B., & Zhou, D. H. (2011). Bayesian reasoning approach based recursive algorithm for online updating belief rule based expert system of pipeline leak detection. Expert Systems with Applications, 38(4), 3937–3943. https://doi.org/10.1016/j.eswa.2010.09.055