
Volume 4 Number 6 December 2021 (596-607)

DOI: 10.1016/j.gloei.2022.01.008

Global Energy Interconnection
Production and hosting by Elsevier on behalf of KeAi
Contents lists available at ScienceDirect
https://www.sciencedirect.com/journal/global-energy-interconnection

Full-length article

Fault diagnosis of electric transformers based on infrared image processing and semi-supervised learning
Jian Fang1, 2, Fan Yang1, 2, Rui Tong2, Qin Yu2, Xiaofeng Dai2
1. Key Laboratory of Middle-low Voltage Electric Equipment Inspection and Testing of China Southern
Power Grid Co., Ltd, Guangzhou 510620, P.R. China
2. Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd, Guangzhou 510620, P.R. China

Abstract: It is crucial to maintain the safe and stable operation of distribution transformers, which constitute a key part of
power systems. In the event of transformer failure, the fault type must be diagnosed in a timely and accurate manner. To this
end, a transformer fault diagnosis method based on infrared image processing and semi-supervised learning is proposed
herein. First, we perform feature extraction on the collected infrared-image data to extract temperature, texture, and shape
features as the model reference vectors. Then, a generative adversarial network (GAN) is constructed to generate synthetic
samples for the minority subset of labelled samples. The proposed method can learn information from unlabeled sample
data, unlike conventional supervised learning methods. Subsequently, a semi-supervised graph model is trained on the
entire dataset, i.e., both labeled and unlabeled data. Finally, we test the proposed model on an actual dataset collected from
a Chinese electricity provider. The experimental results show that the use of feature extraction, sample generation, and
semi-supervised learning model can improve the accuracy of transformer fault classification. This verifies the effectiveness
of the proposed method.

Keywords: Transformer, Fault diagnosis, Infrared image, Generative adversarial network, Semi-supervised learning.

Received: June 19 2021 Accepted: November 15 2021 Published: December 25 2021
Fan Yang, yangfansyy@126.com
Jian Fang, fjenglish@163.com
Rui Tong, x645337515@126.com
Qin Yu, z645337515@126.com
Xiaofeng Dai, 1943973624@qq.com
2096-5117/© 2021 Global Energy Interconnection Development and Cooperation Organization. Production and hosting by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

0 Introduction

The distribution transformer is the key equipment that ensures the stable and reliable operation of a power system. However, factors such as disrepair and external environmental conditions can cause faults or transformer dysfunction [1-3]. The common fault types include equipment discharge faults and overheating faults [4-7]. In the event of a fault, it is necessary to analyze the fault type in a timely and accurate manner to ensure that the transformer can be repaired and rapidly restored to normal operation. Therefore, research directed at improving the accuracy of fault type diagnosis of transformers is highly important [8-11].
Existing methods for fault diagnosis and condition assessment of transformers include the application of the IEC three-ratio method [12], Rogers ratio method [13], Duval triangle [6], and uncoded ratio method [14] to analyze the
chromatographic data of transformer oil. In addition, researchers have used infrared images to perform fault diagnosis and status assessment of equipment. If the equipment remains in a faulty operating state for a long time, the temperature of the faulty area and its surroundings will increase. Infrared images can reflect the contour shape and texture characteristics of the device to a certain extent [16-19]. From the perspective of image acquisition, the detection process is relatively straightforward, the detection time is short, and the process is not subjected to electromagnetic interference. Thus, it is unnecessary to shut down the equipment [20]. However, once the collected infrared images are fed to a central monitoring system, the system operators require substantial time to handle these data. Moreover, depending entirely on the knowledge and experience of the staff may result in omission and/or misdiagnosis of phenomena [21-22].
At present, with the continuous development of big data and artificial intelligence technology, an increasing number of intelligent analysis systems are being proposed. These can mine inherent information from collected data samples and improve the diagnosis accuracy [23-26]. A method to apply the infrared technology in the fault diagnosis of substation equipment was proposed in Reference [27]. Reference [28] put forward a method to automatically detect the oil level in electrical power transformers in substations, based on infrared images. The edge points of the transformer oil conservator are identified by applying the edge detection process on an infrared image of the conservator. Then, its location and shape can be obtained by an iterative ellipse fitting approach. Reference [29] used infrared thermography data to diagnose the fault type of electrical equipment. In that study, the K-means algorithm was used to cluster the infrared images, and a support vector machine was used as a classifier to estimate the fault type. Reference [30] used the infrared thermal imaging technology for the fault diagnosis of power equipment. Reference [31] used a video surveillance system to monitor the operating condition of electric equipment, based on the infrared theory, a temperature measurement model, and a temperature modification model. Reference [32] proposed a real-time and off-line method to monitor temperature variations and analyze the fault region of electrical equipment using infrared thermograms. Reference [33] located heating faults using the infrared heating thermal imaging technology. Reference [34] proposed an auto-diagnosis system for electrical distribution panels using a matter-element model.
Labeled sample datasets involve sample imbalance issues. The samples of one label type account for a large majority of the entire data. More attention would be paid to these samples during model training, whereas the information on the remaining samples would be omitted. This would reduce the generalization capability of the model and thereby cause overfitting problems. To address this, Reference [35] proposed an adaptive over-sampling method for imbalanced datasets to improve the diagnostic performance for power transformers. In that study, an adaptive synthetic minority over-sampling technique was used in the data pre-processing stage to generate new data. Based on this enriched dataset, certain classification methods were used to validate the effectiveness of this over-sampling method. Reference [36] used data preprocessing and gradient boosting methods for fault diagnosis of an oil-immersed transformer. The method is used to identify and replace outliers to obtain denoised samples. The high dimensionality of infrared image data makes this data expansion work more challenging.
Considering this, a transformer fault diagnosis model based on image processing and semi-supervised learning is proposed in this paper. We first extract the temperature feature, texture feature, and shape feature from the collected infrared image data as the feature parameters of the model. Second, the GAN algorithm is used to generate samples for the labeled sample dataset. Then, labeled and unlabeled data are used to construct a graph-based semi-supervised learning network. Finally, the model is tested on actual data. The experimental results show that the method proposed in this paper has high accuracy. Compared with conventional methods, this paper makes the following contributions: it 1) extracts key information parameters from the original infrared image; 2) reduces the imbalance between classes of labeled samples; and 3) constructs a semi-supervised learning model, which can fully use the unlabeled data in the database to further improve the accuracy of transformer fault diagnosis.
The remainder of this paper is organized as follows: Section 1 introduces the feature extraction of infrared images. Section 2 describes the semi-supervised graph model. Section 3 presents the GAN for sample synthesis. Section 4 describes the fault diagnosis process. Case studies are shown in Section 5, and the conclusions are presented in Section 6.

1 Feature Extraction of Infrared Images

Infrared images can accurately and effectively reflect the thermal characteristics of transformers. Infrared images are presently being used widely to evaluate the operating status of power equipment. However, effective information
cannot be obtained by relying only on the originally acquired infrared image. It is necessary to extract key information from the original image. Based on the extracted feature information, the diagnosis model can be designed and analyzed more effectively, and the accuracy of model results can be improved further. This study focuses on the extraction of the following features from an infrared image of a transformer: temperature features, texture features, and shape features.

1.1 Temperature Features of Infrared Images

Substation equipment failure is generally accompanied by drastic variations in the equipment temperature. For example, when the equipment undergoes an overheating fault, the temperature of the fault location and surrounding area increases significantly. Therefore, we selected three fundamental temperature features as the most intuitive feature information: the maximum regional temperature, average regional temperature, and variance of the regional temperature distribution. Although the temperature information can directly reflect the temperature variation characteristics of the device, it can be affected by environmental conditions (e.g., the temperature of the external environment, light intensity, and wind speed) during the acquisition of the infrared image. The recorded temperature information exhibits deviations; hence, it cannot accurately reflect the actual variations in the temperature distribution of the detected equipment. Therefore, to reduce the interference caused by variations in the external environment and limit the measurement error of the infrared thermal imager of the detection device, additional texture and shape features were incorporated in this study as characteristic parameters for the fault type analysis of transformers.

1.2 Texture Features of Infrared Images

After the integrated information on the gray value and gradient of the infrared image is extracted, we calculate the co-occurrence matrix of the two values. Then, we process it to obtain the final extracted texture information. This texture feature can be more sensitive to the boundary information of the image and reflect the roughness and uniformity of the image. This is conducive to the subsequent decision analysis. The specific process of texture feature extraction is as follows:
Let $g(i,j)$ and $f(i,j)$ denote the gray value and gradient, respectively, of pixel $(i,j)$ in the transformer infrared image. The size of the image is $M \times M$. First, these need to be normalized and enlarged to the ranges $[0, N_g - 1]$ and $[0, N_f - 1]$, respectively:

$$G(i,j) = \frac{g(i,j) - g_{\min}}{g_{\max} - g_{\min}} \times (N_g - 1) \tag{1}$$

$$F(i,j) = \frac{f(i,j) - f_{\min}}{f_{\max} - f_{\min}} \times (N_f - 1) \tag{2}$$

where $g_{\max}$, $g_{\min}$, $f_{\max}$, and $f_{\min}$ represent the maximum gray value, minimum gray value, maximum gradient value, and minimum gradient value, respectively, of the image. The number of pixels $(m,n)$ for which $G(m,n) = i$ and $F(m,n) = j$ is the value at $(i,j)$ of the gray-gradient co-occurrence matrix $H$. The normalization yields the following:

$$\hat{H}(i,j) = \frac{H(i,j)}{M \times M} \tag{3}$$

On this basis, we can calculate the non-uniformity of gray distribution $U_1$ and that of gradient distribution $U_2$ as the texture features of infrared images:

$$U_1 = \frac{\sum_{i=1}^{N_g} \left[ \sum_{j=1}^{N_f} \hat{H}(i,j) \right]^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_f} \hat{H}(i,j)} \tag{4}$$

$$U_2 = \frac{\sum_{j=1}^{N_f} \left[ \sum_{i=1}^{N_g} \hat{H}(i,j) \right]^2}{\sum_{i=1}^{N_g} \sum_{j=1}^{N_f} \hat{H}(i,j)} \tag{5}$$

1.3 Shape Features of Infrared Images

This study calculates the Hu moment features of the infrared images and linearly combines these to extract the moment features that are invariant to translation and rotation. The specific process is as follows:
The $(p+q)$-th-order moment of a two-dimensional discrete image can be expressed as

$$m_{pq} = \sum_{x} \sum_{y} x^p y^q s(x,y) \tag{6}$$

where $s(x,y)$ is the pixel value at $(x,y)$. The zero-order moment of the image $m_{00}$ is the sum of all pixel values. The centroid $(\bar{x}, \bar{y})$ can be determined using the first moment of the image:

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \tag{7}$$

The second-order moment of the image contains the direction and size information. It is also called the moment of inertia. The third-order moment reflects the degree of distortion of the image projection, and the fourth-order moment reflects the projection kurtosis. The central moment of the image is defined as

$$\alpha_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q s(x,y) \tag{8}$$
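
The texture features of Eqs. (1)-(5) and the central moment of Eq. (8) can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the default level counts `n_gray` and `n_grad`, the finite-difference gradient, and the truncation used for quantization are all assumptions, since the paper does not specify them.

```python
import numpy as np


def quantize(values, levels):
    """Map values linearly onto integer levels 0..levels-1, as in Eqs. (1)-(2)."""
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:  # flat input: every pixel falls into level 0
        return np.zeros(values.shape, dtype=int)
    scaled = (values - vmin) / (vmax - vmin) * (levels - 1)
    return scaled.astype(int)  # assumption: truncate to the lower level


def gray_gradient_nonuniformity(image, n_gray=16, n_grad=16):
    """Return (U1, U2) of Eqs. (4)-(5) for a 2-D gray image."""
    image = np.asarray(image, dtype=float)
    d_row, d_col = np.gradient(image)  # assumption: finite-difference gradient
    grad = np.hypot(d_row, d_col)      # gradient magnitude f(i, j)
    G = quantize(image, n_gray)
    F = quantize(grad, n_grad)
    H = np.zeros((n_gray, n_grad))
    np.add.at(H, (G.ravel(), F.ravel()), 1.0)  # count pixels with G = i and F = j
    H /= image.size                            # Eq. (3): divide by the pixel count
    total = H.sum()
    u1 = (H.sum(axis=1) ** 2).sum() / total    # Eq. (4): inner sum over gradient levels
    u2 = (H.sum(axis=0) ** 2).sum() / total    # Eq. (5): inner sum over gray levels
    return u1, u2


def central_moment(image, p, q):
    """Central moment alpha_pq of Eq. (8), using the centroid of Eq. (7)."""
    image = np.asarray(image, dtype=float)
    x, y = np.indices(image.shape)  # x: row index, y: column index
    m00 = image.sum()
    x_bar = (x * image).sum() / m00
    y_bar = (y * image).sum() / m00
    return ((x - x_bar) ** p * (y - y_bar) ** q * image).sum()
```

Because the central moment is computed relative to the centroid, shifting a bright region inside the image leaves `central_moment(image, 2, 0)` unchanged, which is the translation invariance that the Hu moments build on.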

After normalization, we obtain

$$\hat{\alpha}_{pq} = \frac{\alpha_{pq}}{\alpha_{00}^{(p+q)/2+1}} \tag{9}$$

We can calculate the Hu moments on the basis of the normalized central moments, which are written as $\alpha_{pq}$ in Eqs. (10)-(16) for brevity. These are expressed as follows:

$$\kappa_1 = \alpha_{20} + \alpha_{02} \tag{10}$$

$$\kappa_2 = (\alpha_{20} - \alpha_{02})^2 + 4\alpha_{11}^2 \tag{11}$$

$$\kappa_3 = (\alpha_{30} - 3\alpha_{12})^2 + (3\alpha_{21} - \alpha_{03})^2 \tag{12}$$

$$\kappa_4 = (\alpha_{30} + \alpha_{12})^2 + (\alpha_{21} + \alpha_{03})^2 \tag{13}$$

$$\begin{aligned} \kappa_5 = {} & (\alpha_{30} - 3\alpha_{12})(\alpha_{30} + \alpha_{12})[(\alpha_{30} + \alpha_{12})^2 - 3(\alpha_{21} + \alpha_{03})^2] \\ & + (3\alpha_{21} - \alpha_{03})(\alpha_{21} + \alpha_{03})[3(\alpha_{30} + \alpha_{12})^2 - (\alpha_{21} + \alpha_{03})^2] \end{aligned} \tag{14}$$

$$\begin{aligned} \kappa_6 = {} & (\alpha_{20} - \alpha_{02})[(\alpha_{30} + \alpha_{12})^2 - (\alpha_{21} + \alpha_{03})^2] \\ & + 4\alpha_{11}(\alpha_{30} + \alpha_{12})(\alpha_{21} + \alpha_{03}) \end{aligned} \tag{15}$$

$$\begin{aligned} \kappa_7 = {} & (3\alpha_{21} - \alpha_{03})(\alpha_{30} + \alpha_{12})[(\alpha_{30} + \alpha_{12})^2 - 3(\alpha_{21} + \alpha_{03})^2] \\ & + (3\alpha_{12} - \alpha_{30})(\alpha_{21} + \alpha_{03})[3(\alpha_{30} + \alpha_{12})^2 - (\alpha_{21} + \alpha_{03})^2] \end{aligned} \tag{16}$$

2 Graph-based Semi-supervised Learning Method

A collected sample set mainly includes two categories: labeled samples and unlabeled samples. The proportion of unlabeled sample data is generally higher owing to factors such as the cost of data labeling. The conventional supervised learning model uses only labeled sample data for network training, and the information contained in a large amount of unlabeled data is omitted. This results in a low generalization capability of the trained model. This study addresses this problem using the graph learning model for semi-supervised learning. It enables the full use of these unlabeled sample data for model training [20].
Graph learning is the mapping of a sample set to a corresponding structure graph. Each sample element in the set corresponds to a node in the constructed graph. An edge connects the nodes corresponding to two samples, and the weight of the edge is proportional to the similarity of the two original samples. Imagine that each node of a labeled sample is colored according to the category of its label, whereas the nodes of unlabeled samples have no color. Then, semi-supervised learning can be considered as the spreading of the colors in the graph along the path of the edges. Because the points and edges in the graph can be expressed in the form of a matrix, matrix operations can be used to derive the graph network [37-39].
Let $\mathcal{D}_l = \{(x_1, y_1), \ldots, (x_l, y_l)\}$ and $\mathcal{D}_u = \{x_{l+1}, x_{l+2}, \ldots, x_{l+u}\}$ denote the labeled dataset and unlabeled dataset, respectively. Here, the lengths of the datasets are $l$ and $u$, respectively. Moreover, $l < u$ and $l + u = m$, where $m$ is the length of the total dataset. A graph $G = (V, E)$ can be constructed based on the dataset $\mathcal{D} = \mathcal{D}_l \cup \mathcal{D}_u$. Here, $V = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_{l+u}\}$ is the node set, and $E$ is the edge set. The affinity matrix can be defined as

$$W(i,j) = \begin{cases} \exp\left( -\dfrac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right), & \text{if } i \neq j \\ 0, & \text{otherwise} \end{cases} \tag{17}$$

where $\sigma$ is the bandwidth of the Gaussian function.
Suppose $f$ is the mapping function learned from the graph $G = (V, E)$. Then, $f$ can be used for classification. Furthermore, $y_i = \mathrm{sign}(f(x_i))$, and $y_i \in \{-1, 1\}$. The semi-supervised learning model needs to be constructed based on a fundamental assumption: similar sample inputs would have similar corresponding output values. An energy function can be defined based on this assumption:

$$\begin{aligned} E(f) &= \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} W(i,j) \left( f(x_i) - f(x_j) \right)^2 \\ &= \sum_{i=1}^{m} d_i f^2(x_i) - \sum_{i=1}^{m} \sum_{j=1}^{m} W(i,j) f(x_i) f(x_j) \\ &= F^T (D - W) F \end{aligned} \tag{18}$$

where $F = (F_l^T \; F_u^T)^T$. The classification results of labeled and unlabeled data are $F_l = (f(x_1); f(x_2); \ldots; f(x_l))$ and $F_u = (f(x_{l+1}); f(x_{l+2}); \ldots; f(x_{l+u}))$, respectively. The diagonal matrix is $D = \mathrm{diag}(d_1, d_2, \ldots, d_m)$. Here, each diagonal element $d_i$ is the sum of the elements of the $i$-th row of the matrix $W$.
When the energy function $E(f)$ attains the minimum value, the classification function satisfies $f(x_i) = y_i$ for labeled samples and $(\Delta F)_u = 0$ for unlabeled samples. Here, $\Delta$ is the Laplacian matrix, and $\Delta = D - W$. $D$ and $W$ are divided into blocks with the $l$-th row and $l$-th column as the dividing lines. This can be expressed as

$$W = \begin{bmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{bmatrix}, \qquad D = \begin{bmatrix} D_{ll} & 0_{lu} \\ 0_{ul} & D_{uu} \end{bmatrix} \tag{19}$$

Then, Eq. (18) can be rewritten as

$$\begin{aligned} E(f) &= (F_l^T \; F_u^T) \left( \begin{bmatrix} D_{ll} & 0_{lu} \\ 0_{ul} & D_{uu} \end{bmatrix} - \begin{bmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{bmatrix} \right) \begin{bmatrix} F_l \\ F_u \end{bmatrix} \\ &= F_l^T (D_{ll} - W_{ll}) F_l - 2 F_u^T W_{ul} F_l + F_u^T (D_{uu} - W_{uu}) F_u \end{aligned} \tag{20}$$

Let $\partial E(f) / \partial F_u = 0$. Then,

$$F_u = (D_{uu} - W_{uu})^{-1} W_{ul} F_l \tag{21}$$
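
The closed-form solution of Eq. (21) can be illustrated with a short NumPy sketch. It builds the Gaussian affinity matrix of Eq. (17), forms the degree matrix, and solves the linear system for the unlabeled block. The function names, the default bandwidth `sigma`, and the toy points in the usage example are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def affinity_matrix(X, sigma=1.0):
    """Gaussian affinity of Eq. (17), with zeros on the diagonal."""
    X = np.asarray(X, dtype=float)
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def harmonic_labels(X_labeled, y_labeled, X_unlabeled, sigma=1.0):
    """Solve Eq. (21) for F_u; labels in y_labeled must be -1 or +1."""
    X = np.vstack([X_labeled, X_unlabeled])
    l = len(X_labeled)                      # labeled samples come first
    W = affinity_matrix(X, sigma)
    D = np.diag(W.sum(axis=1))              # d_i = sum of the i-th row of W
    F_l = np.asarray(y_labeled, dtype=float)
    # Solve (D_uu - W_uu) F_u = W_ul F_l instead of forming the inverse.
    F_u = np.linalg.solve(D[l:, l:] - W[l:, l:], W[l:, :l] @ F_l)
    return np.sign(F_u), F_u


# Toy usage: one labeled point per class and two unlabeled points.
labels, scores = harmonic_labels(
    X_labeled=[[0.0, 0.0], [5.0, 5.0]],
    y_labeled=[-1, 1],
    X_unlabeled=[[0.2, 0.1], [4.8, 5.1]],
)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical design choice of this sketch; it gives the same $F_u$ as Eq. (21) when $D_{uu} - W_{uu}$ is non-singular.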

Let $P = D^{-1} W$. We can obtain

$$P = \begin{bmatrix} D_{ll}^{-1} & 0_{lu} \\ 0_{ul} & D_{uu}^{-1} \end{bmatrix} \begin{bmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{bmatrix} = \begin{bmatrix} D_{ll}^{-1} W_{ll} & D_{ll}^{-1} W_{lu} \\ D_{uu}^{-1} W_{ul} & D_{uu}^{-1} W_{uu} \end{bmatrix} \tag{22}$$

Now, $P_{uu} = D_{uu}^{-1} W_{uu}$, and $P_{ul} = D_{uu}^{-1} W_{ul}$. Then, Eq. (21) can be rewritten as

$$\begin{aligned} F_u &= \left( D_{uu} (I - D_{uu}^{-1} W_{uu}) \right)^{-1} W_{ul} F_l \\ &= (I - P_{uu})^{-1} P_{ul} F_l \end{aligned} \tag{23}$$

We can use the data of $\mathcal{D}_l$ to calculate $F_l = (y_1; y_2; \ldots; y_l)$. Then, we can obtain $F_u$ to classify the unlabeled data.
For the multi-label classification problem, let $\mathcal{Y}$ denote the label set. Accordingly, a graph $G = (V, E)$ can be constructed. In addition, a new non-negative label matrix $J = (J_1^T, J_2^T, \ldots, J_m^T)^T$ must be constructed. Here, $J \in \mathbb{R}^{m \times |\mathcal{Y}|}$. The label of sample $x_i$ corresponds to the $i$-th row of the matrix $J$, i.e., $J_i = (J_{i,1}, J_{i,2}, \ldots, J_{i,|\mathcal{Y}|})$. The classification rule is $y_i = \arg\max_{1 \le j \le |\mathcal{Y}|} J_{i,j}$.
For $i \in [1, m]$ and $j \in [1, |\mathcal{Y}|]$, the matrix $J$ needs to be initialized. This is given as

$$J(0) = Y, \qquad Y_{i,j} = \begin{cases} 1, & \text{if } (1 \le i \le l) \wedge (y_i = j) \\ 0, & \text{otherwise} \end{cases} \tag{24}$$

The matrix $W$ is used to construct the transfer matrix of the label $S$:

$$S = D^{-1/2} W D^{-1/2} \tag{25}$$

where $D^{-1/2} = \mathrm{diag}\left( \frac{1}{\sqrt{d_1}}, \frac{1}{\sqrt{d_2}}, \ldots, \frac{1}{\sqrt{d_{l+u}}} \right)$. Then, the iteration formula can be expressed as

$$J(t+1) = \beta S J(t) + (1 - \beta) Y \tag{26}$$

where $\beta \in (0, 1)$ is a hyper-parameter of the model. After Eq. (26) converges, we can obtain

$$J_{\mathrm{opt}} = \lim_{t \to \infty} J(t) = (1 - \beta)(I - \beta S)^{-1} Y \tag{27}$$

The above iterative process is essentially equivalent to the solving of an optimization problem:

$$\min_{J} \; \frac{1}{2} \left( \sum_{i,j=1}^{m} W(i,j) \left\lVert \frac{1}{\sqrt{d_i}} J_i - \frac{1}{\sqrt{d_j}} J_j \right\rVert^2 \right) + \mu \sum_{i=1}^{l} \lVert J_i - Y_i \rVert^2 \tag{28}$$

where $\mu > 0$ is the regularization parameter of the function. When $\mu = \frac{1 - \beta}{\beta}$, the optimal solution of Eq. (28) is the convergent solution of Eq. (27).

3 Sample Synthesis Based on Generative Adversarial Network

3.1 Fundamental Principles

The generative adversarial network (GAN) was inspired by the two-player zero-sum game. It is divided into two confrontational parts: the generator and the discriminator. The function of the generator is to learn the characteristics of the actual data and generate synthetic data based on it. The function of the discriminator is to distinguish the source of the input sample, i.e., to correctly assess whether the input sample is from the actual sample set or synthetic sample dataset. As the training of the GAN progresses, the capabilities of the generator and discriminator improve gradually [40].
A structural diagram of the GAN is shown in Fig. 1. The input of the generator $G$ is a random noise vector $z$. The noise is generally Gaussian or uniformly distributed, and the output is synthesized sample data $G(z)$. The input of the discriminator $D$ is either real data $x$ or synthesized data $G(z)$. Its objective is to perform binary classification on $x$ and $G(z)$. If it is determined that the input source is a real sample, it outputs one. Otherwise, it outputs zero. Then, the output result of the discriminator is propagated back to optimize the parameters and thereby improve the capability of $G$ (so that the distribution of $G(z)$ is as close as feasible to the distribution of the actual data $p_{\mathrm{data}}$). Simultaneously, the discriminator improves its classification performance. The two continue to optimize in this adversarial training. When the discriminator cannot determine which dataset the input sample originated from, it can be considered that the generator has learnt the distribution characteristics of the actual data [41].

[Fig. 1 Flowchart of GAN: random variable z → generator G → G(z); real data x and G(z) → discriminator D → True/False]

3.2 Generative Adversarial Network

The GAN is a generative model. According to probability and statistics theory, generative models usually include a set of probability distribution functions $P_x(x, \theta)$ with a parameter $\theta$ and can be trained to approximate the probability distribution of real data $P_{\mathrm{data}}(x)$ to the extent feasible. Then, new synthetic data can be generated via sampling according to the aforementioned probability distribution [42-44].
If the probability distribution function $P_x(x, \theta)$ is restricted further (e.g., it is assumed that the probability distribution function conforms to the Gaussian distribution), the maximum likelihood method is generally used to optimize $\theta$. Assume that a set of real samples exists, and that these samples conform to the probability distribution $P_{\mathrm{data}}(x)$ and are mutually independent. Let $p_{\mathrm{data}}(x)$ and $p_x(x, \theta)$ be the
probability density functions of $P_{\mathrm{data}}(x)$ and $P_x(x, \theta)$, respectively. Then, the likelihood function of this sample set is

$$L(\theta) = L(x_1, x_2, x_3, \ldots, x_n, \theta) = \prod_{i=1}^{n} p_x(x_i, \theta) \tag{29}$$

Take the logarithm of both sides of the likelihood function:

$$\log L(\theta) = \sum_{i=1}^{n} \log p_x(x_i, \theta) \tag{30}$$

The objective of the generative model is to determine a maximum likelihood estimate that maximizes the value of the likelihood function:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p_x(x_i, \theta) \tag{31}$$

Because the samples are drawn from the probability distribution $P_{\mathrm{data}}(x)$, Eq. (31) can be rewritten as

$$\hat{\theta} \approx \arg\max_{\theta} \, E_{x \sim p_{\mathrm{data}}} [\log p_x(x, \theta)] \tag{32}$$

By transforming the above formula into integral operation, we obtain

$$\hat{\theta} \approx \arg\max_{\theta} \int p_{\mathrm{data}}(x) \log p_x(x, \theta) \, dx \tag{33}$$

The concept of KL divergence is introduced below. It is an index used to measure the degree of difference between two probability distributions. The KL divergence is zero when the two probability distributions are identical. Furthermore, a higher KL divergence implies a larger difference between the two probability distribution functions. The following is the expression for KL divergence:

$$D_{KL}(P_{\mathrm{data}} \,\|\, P_x) = \int p_{\mathrm{data}}(x) \log \frac{p_{\mathrm{data}}(x)}{p_x(x, \theta)} \, dx \tag{34}$$

$$\hat{\theta} \approx \arg\min_{\theta} D_{KL}(P_{\mathrm{data}} \,\|\, P_x) \tag{35}$$

The objectives of the discriminator are to maximize the output value $D(x)$ for a sample $x$ from the true probability distribution $P_{\mathrm{data}}(x)$, and minimize the output value $D(G(z))$ for the pseudo data $G(z)$ produced by the generator. To express and calculate the objective function of the GAN model conveniently, we take the logarithms of $D(x)$ and $D(G(z))$, imitating the logarithm taken of the likelihood function above. Both $D(x)$ and $D(G(z))$ lie in the range from zero to one. So that the maximization of $\log D(x)$ and the minimization of $\log D(G(z))$ can be handled with the same formula, $\log D(G(z))$ is replaced by $\log(1 - D(G(z)))$. Finally, the expectations of $\log D(x)$ and $\log(1 - D(G(z)))$ are taken to obtain the discriminator's objective function.

3.3 Training Steps

The optimization of GAN is a minimax problem. In the actual training process, the generator and discriminator models have their respective loss functions. The two are trained alternately. The detailed steps for training a GAN are given below:
(1) Given the probability distribution $P_{\mathrm{data}}(x)$ of the actual data and the prior distribution $P_z(z)$ of the noise, set the hyper-parameters $n$ (the number of samples drawn per update) and $\eta$ (the step size). In addition, initialize the parameters $\phi$ and $\theta$ of the discriminator and generator.
(2) Extract real samples $\{x_1, x_2, \ldots, x_n\}$ from $P_{\mathrm{data}}(x)$, and collect random noise samples $\{z_1, z_2, \ldots, z_n\}$ from the prior distribution. Then, input the noise samples into the generator to obtain the generated data $\{G(z_1), G(z_2), \ldots, G(z_n)\}$. Use the gradient ascent method to update the parameter $\phi$ of the discriminator. That is,

$$J_D(\phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} \log D(x_i) + \frac{1}{n} \sum_{i=1}^{n} \log(1 - D(G(z_i))) \tag{36}$$

$$\phi \leftarrow \phi + \eta \nabla_{\phi} J_D(\phi, \theta) \tag{37}$$

(3) Repeat Step 2 $k$ times.
(4) Extract random noise samples $\{z_1, z_2, \ldots, z_n\}$ from the prior distribution, and update the parameter $\theta$ of the generator using the gradient descent method:

$$J_G(\phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} \log(1 - D(G(z_i))) \tag{38}$$

$$\theta \leftarrow \theta - \eta \nabla_{\theta} J_G(\phi, \theta) \tag{39}$$

(5) Repeat Steps 2, 3, and 4 until the network converges or attains the set maximum number of iterations.

4 Transformer Fault Diagnosis Process

4.1 Fault Diagnosis Process

Image processing and graph-based semi-supervised learning methods are used to realize fault diagnosis of transformers. The specific process is shown in Fig. 2. First, the infrared image data of the transformers are obtained as a sample case library. In addition, the temperature feature, texture feature, and shape feature of the infrared image are extracted as the characteristic features. The sample case library is divided into two categories: the labeled dataset and the unlabeled dataset. For the labeled dataset, the imbalance ratio of the samples is first calculated. The imbalance ratio is equal to the number of samples in the majority subset divided by the number of samples in the minority subset. If the imbalance ratio is higher than the set threshold, it can be considered that the labeled dataset has a sample imbalance issue. It is then necessary to use the GAN to generate samples for the minority subset. The imbalance
threshold in this study is set to three. Then, the labeled dataset and unlabeled dataset are used to construct a graph network, train it, and finally output the fault type of the device. This study mainly considers two types of substation equipment failures: equipment defect failures and overheating failures.

[Fig. 2 Fault diagnosis process: obtain infrared images of transformers → extract temperature, texture, and shape features of the infrared images → split into labeled and unlabeled datasets → if the imbalance ratio of the labeled dataset is above 3, use the GAN to generate data → build graph G = (V, E) → train the graph → output fault type]

4.2 GAN Structure

The structure of the GAN is shown in Fig. 3. The generator model is based mainly on LSTM. It includes an embedding layer, an LSTM network layer, a fully connected layer, and a softmax layer. The discriminator model is based mainly on CNN. It includes the embedding layer, two-dimensional convolutional layer, one-dimensional maximum pooling layer, fully connected layer, and softmax layer. The input features of the GAN model are the features extracted from the infrared images, i.e., temperature features, texture features, and shape features.

[Fig. 3 Structure of GAN. Panels: Discriminator, Generator; layer labels shown in each panel: Input, Embedding layer, LSTM layer, Fully connected layer, Softmax layer]

5 Case Studies

5.1 Evaluation Metrics

The performance of the model in terms of fault classification is evaluated by constructing a confusion matrix, as shown in Table 1. For classification problems with imbalanced data, the classification performance of the model cannot be evaluated completely based only on the classification accuracy (ACC). Therefore, based on the confusion matrix, we use two additional evaluation indicators: recall (REC) and precision (PRE).

Table 1 Confusion matrix

                   Estimated label 1    Estimated label 0
Actual label 1     True Positive        False Negative
Actual label 0     False Positive       True Negative

With $n_{TP}$, $n_{TN}$, $n_{FP}$, and $n_{FN}$ denoting the numbers of true positive, true negative, false positive, and false negative samples, the indicators are

$$ACC = \frac{n_{TP} + n_{TN}}{n_{TP} + n_{TN} + n_{FP} + n_{FN}} \times 100\% \tag{40}$$

$$REC = \frac{n_{TP}}{n_{TP} + n_{FN}} \times 100\% \tag{41}$$

$$PRE = \frac{n_{TP}}{n_{TP} + n_{FP}} \times 100\% \tag{42}$$

5.2 Field Test Results

The infrared images of transformers were collected from a Chinese electric company. The numbers of unlabeled and labeled samples are 359 and 182, respectively. Of the latter number, 41 are defective faults and 141 are overheating faults. The imbalance ratio of the labeled samples is 141/41 ≈ 3.44, which is higher than the set imbalance threshold. Therefore, the minority labeled samples need to be generated using the proposed GAN. First, the temperature features, texture features, and shape features are extracted from the infrared image as the characteristic features. Then, the GAN model is used to generate data for the labeled samples of the
defective faults. Subsequently, these characteristic features 84 equipment defect


of both labeled and unlabeled samples are fed as model inputs to train the semi-supervised graph classification model. Finally, the well-trained classification model is applied in a field test. Fig. 4 shows the infrared image of the tested transformer. Based on the output of the algorithm presented in this paper, it is concluded that the transformer has an overheating fault, and this information is sent to the field staff in a timely manner. An inspection by the staff revealed that the transformer was damaged at the point marked by a black box in the figure, and that this damage caused the transformer to overheat. This observation also verifies the effectiveness of the method proposed in this paper.

Fig. 4 Field test results

5.3 Hyper-parameter Selection
We first analyze the effect of the hyper-parameter β. Let β ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. The ACC, REC, and PRE classification results for the two types of faults are shown in Fig. 5. The classification performance of the proposed model is evidently the highest when β = 0.7, so the hyper-parameter is set to this value.

Fig. 5 Effects of different hyper-parameter values (panels: ACC (%), REC (%), and PRE (%) versus β; curves: equipment defect, overheating, and mean value)

5.4 Effects of Feature Extraction
In this subsection, we test the effectiveness of feature extraction on the original data. The following methods are designed for comparison. Method 1 uses the original image information as the input of the model. Note that because the dimensionality of this input vector is high, no sample expansion method is used. Methods 2–4 each extract only one feature of the infrared image from among the temperature feature, texture feature, and shape feature. The classification results for the two types of faults are shown in Table 2.

Table 2 Effects of different feature extraction methods

Fault type        Method    ACC (%)  REC (%)  PRE (%)
Equipment defect  Method 1  52.2     53.1     54.7
Equipment defect  Method 2  62.4     64.8     63.6
Equipment defect  Method 3  68.5     70.2     71.1
Equipment defect  Method 4  74.5     72.6     76.8
Equipment defect  Proposed  82.2     84.7     83.1
Overheating       Method 1  57.5     60.1     58.8
Overheating       Method 2  65.5     68.2     67.7
Overheating       Method 3  72.3     77.2     74.1
Overheating       Method 4  78.8     75.6     80.1
Overheating       Proposed  86.2     84.8     83.5
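As a reading aid for Table 2, the following sketch tabulates the ACC gain of each method over Method 1 (raw image input). The ACC values are copied from Table 2; the dictionary layout and the helper name `acc_gain_over_baseline` are illustrative assumptions and are not part of the original study.

```python
# ACC (%) values copied from Table 2; the data layout is an illustrative assumption.
TABLE2_ACC = {
    "Equipment defect": {"Method 1": 52.2, "Method 2": 62.4, "Method 3": 68.5,
                         "Method 4": 74.5, "Proposed": 82.2},
    "Overheating": {"Method 1": 57.5, "Method 2": 65.5, "Method 3": 72.3,
                    "Method 4": 78.8, "Proposed": 86.2},
}


def acc_gain_over_baseline(acc_by_method, baseline="Method 1"):
    """Return the ACC gain (in percentage points) of every other method over the baseline."""
    base = acc_by_method[baseline]
    return {method: round(acc - base, 1)
            for method, acc in acc_by_method.items()
            if method != baseline}


if __name__ == "__main__":
    for fault_type, accs in TABLE2_ACC.items():
        print(fault_type, acc_gain_over_baseline(accs))
```

For both fault types the gain grows monotonically from Method 2 to Method 4 and is largest for the proposed method, which combines all three features.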
The results in Table 2 reveal that effective feature extraction can significantly improve the model’s capability to classify transformer faults. Method 1 performs worst: although its input vector has a high dimensionality, the model cannot learn effectively from it owing to the limited number of samples, which lowers the classification accuracy. Simple feature extraction (Methods 2–4) improves the results only moderately, because a single type of feature quantity is biased and cannot fully reflect the information in the input. It should be noted that among the three feature quantities, the improvement corresponding to the temperature feature is the lowest and that corresponding to the shape feature is the highest.
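The shape feature in this study is based on Hu moments. As a rough illustration of why such a descriptor does not depend on where a hot region sits in the frame, the sketch below computes the first two Hu invariants of a small intensity grid in plain Python. The function names and the toy image are assumptions made for illustration; the paper does not specify its implementation, and the full Hu descriptor has seven invariants rather than the two shown here.

```python
def raw_moment(img, p, q):
    """Raw image moment M_pq = sum of x^p * y^q * I(x, y) over all pixels."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_first_two(img):
    """First two Hu invariants (phi1, phi2) of a 2-D list of non-negative intensities."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    # Intensity centroid; central moments are taken about this point.
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00

    def eta(p, q):
        # Scale-normalized central moment eta_pq.
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                 for y, row in enumerate(img)
                 for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


def toy_image(height, width, top, left, blob_h=2, blob_w=3):
    """Hypothetical test image: zeros with a blob_h x blob_w block of ones."""
    img = [[0.0] * width for _ in range(height)]
    for y in range(top, top + blob_h):
        for x in range(left, left + blob_w):
            img[y][x] = 1.0
    return img


if __name__ == "__main__":
    original = toy_image(8, 8, 1, 1)
    shifted = toy_image(8, 8, 4, 3)
    transposed = [list(col) for col in zip(*original)]
    for name, img in [("original", original), ("shifted", shifted),
                      ("transposed", transposed)]:
        print(name, hu_first_two(img))
```

Shifting or transposing the toy blob leaves both values unchanged up to floating-point error, which is the kind of robustness the text attributes to the shape feature.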


This ranking also indicates that the temperature feature is susceptible to interference from the external measurement environment and is limited by the accuracy of the measurement tool, both of which ultimately reduce its accuracy. In contrast, the Hu moments used as the shape feature are more robust. To combine the advantages of the above-mentioned features, the method in this study uses all three types of information simultaneously. The infrared image features can thereby be extracted fully and accurately, which in turn significantly improves the classification accuracy of the model.

5.5 Effects of Semi-supervised Graph Model
We test the effect of using the graph network for semi-supervised learning to classify the infrared images of substation equipment. The following two methods are designed for comparison. Method 1 uses a support vector machine (SVM) for fault classification. The SVM model uses the radial basis function as the kernel function. The penalty factor is set to 0.2, and the parameter of the kernel function is set to 104. Method 2 uses a multi-layer neural network (DNN). The number of network layers of the DNN is set to five, the learning rate is set to 0.001, and the learning period is set to 3000. Because both Methods 1 and 2 involve supervised learning, they use only the labeled sample datasets for model training. The classification results for the two faults are shown in Table 3.

Table 3 Effects of different classification models

Fault type        Method    ACC (%)  REC (%)  PRE (%)
Equipment defect  Method 1  65.1     67.5     64.5
Equipment defect  Method 2  71.5     73.2     72.1
Equipment defect  Proposed  82.2     84.7     83.1
Overheating       Method 1  70.2     72.2     70.9
Overheating       Method 2  74.4     73.3     71.0
Overheating       Proposed  86.2     84.8     83.5

Table 3 shows that the classification accuracy of the method proposed in this paper is significantly higher than that of the other two methods: its ACC exceeds that of the better baseline (Method 2) by 10.7 percentage points for equipment defects and by 11.8 percentage points for overheating. This is because Methods 1 and 2 are supervised learning methods, which require a large number of labeled samples for effective model training. In practice, however, such labeled samples are highly difficult to obtain. Training on a limited set of labeled samples yields poor classification performance and causes over-fitting, which results in a low generalization capability. In contrast, the graph learning method proposed in this paper is a semi-supervised learning framework. It can learn the mapping relationship from labeled samples and effectively mine information from a large number of unlabeled samples. This significantly improves the classification capability of the model and ensures that it has a good generalization capability. Therefore, it is more suitable for practical applications, and particularly for fault classification of transformers, because samples of such faults are difficult to obtain.

5.6 Effects of GAN
In this section, we verify the effectiveness of generating minority samples through GAN. Two comparison methods are designed to better compare the model classification effect. Method 1 uses only the original collected database for analysis, i.e., no sample generation method is used. Method 2 applies the model with oversampling. The transformer fault classification results for the different methods are shown in Table 4.

Table 4 Effects of different classification models

Fault type        Method    ACC (%)  REC (%)  PRE (%)
Equipment defect  Method 1  61.1     64.2     60.5
Equipment defect  Method 2  71.5     72.0     74.1
Equipment defect  Proposed  82.2     84.7     83.1
Overheating       Method 1  64.4     65.3     61.1
Overheating       Method 2  68.5     71.1     67.7
Overheating       Proposed  86.2     84.8     83.5

Table 4 shows that the strategy of sample synthesis can improve the accuracy of model classification. Although the classification effect can be improved by applying the oversampling strategy, the improvement is limited. This is because the oversampling method does not add new data samples and only repeats the original samples. This strategy results in overfitting of the trained model and a reduction in its generalization capability. In contrast, this study uses GAN to generate new samples. The sample generation process considers the spatial distribution of the minority sample data, which ensures the randomness and effectiveness of the generated samples. This, in turn, improves the generalization capability of the model and enhances its capability to classify infrared images of transformers.

5.7 Effects of number of samples
Because both labeled and unlabeled data are applied to train the model, it is worthwhile to investigate the effect of
the proportion of labeled to unlabeled samples on the model performance. Herein, the proportion of labeled data to unlabeled data is chosen from {1/3, 1/2, 1, 2, 3}. The transformer fault classification results for the different ratios are shown in Table 5.

Table 5 ACC (%) of model with different proportions of samples

Data proportion   1/3   1/2   1     2     3
Equipment defect  78.3  80.5  82.2  83.5  84.1
Overheating       79.2  81.5  82.9  85.2  86.0

Table 5 shows that the ACC of the model increases, i.e., the model performance improves, as the proportion of labeled samples increases. This is because the labeled samples contain more information for model training. Note that the proposed model performs better than the compared methods in all five cases.

6 Conclusions

A fault diagnosis model for transformers is proposed based on image processing, graph-based semi-supervised learning, and GAN. First, the key feature parameters are extracted from the infrared images, including temperature features, texture features, and shape features. Then, the GAN algorithm is used with the labeled sample data to generate samples for the minority classes. Finally, a graph-based semi-supervised network is constructed and trained on all the data, including the labeled and unlabeled samples. The experimental results show that the method presented in this paper achieves a high accuracy in the fault classification of distribution transformers. The following are the advantages of the proposed method: (1) The key feature parameters are extracted from the original infrared images to ensure the validity and universality of the input information; (2) The problem of imbalance between classes in labeled samples is reduced, and the model’s performance is improved; (3) The information of unlabeled samples is used fully. This reduces the model’s dependence on labeled sample data and makes it more suitable for actual fault diagnosis of transformers.

7 Acknowledgements

This work was supported by China Southern Power Grid Co. Ltd. science and technology project (Research on the theory, technology and application of stereoscopic disaster defense for power distribution network in large city, GZHKJXM20180060) and National Natural Science Foundation of China (No. 51477100).

Declaration of Competing Interest

We declare that we have no conflict of interest.


Biographies

Jian Fang received the master’s degree at Wuhan University in 2011. His main research interests include distribution network operation management, distribution network technology supervision, and distribution network new technology promotion.

Fan Yang received the master’s degree at South China University of Technology in 2019. His main research interests include smart distribution network and cable technology.

Rui Tong received the master’s degree at Tianjin University in 2013. His main research interests include smart distribution network and external insulation of power equipment.

Qin Yu received the master’s degree at Tsinghua University in 2010. His main research interests include smart distribution network, robot control and high voltage technology.

Xiaofeng Dai received the master’s degree at North China Electric Power University in 2006. His main research interests include smart distribution network and cable technology.

(Editor Dawei Wang)
