
A Fault Diagnosis Method of Power Transformer Based on Cost Sensitive One-Dimensional Convolution Neural Network
Lijing Zhang, Gehao Sheng, Huijuan Hou, Xiuchen Jiang
Department of Electrical Engineering, Shanghai Jiaotong University, Shanghai, China
lijingzhang1874@163.com, shenghe@sjtu.edu.cn, houhuijuan@sjtu.edu.cn, xcjiang@sjtu.edu.cn

Abstract: Machine learning based dissolved gas analysis (DGA) is a significant technique for the incipient fault diagnosis of power transformers. However, such diagnosis methods have limitations in learning from an imbalanced fault dataset, especially when the class distribution is severely uneven. How to establish a suitable model for imbalanced fault diagnosis is therefore drawing a great deal of attention. In this paper, a novel one-dimensional convolution neural network (1D CNN) model based on cost sensitive learning is proposed for transformer fault classification. Firstly, the structure of the 1D CNN is designed with three hidden layers and two fully connected layers. Then, a class-dependent cost matrix is introduced into the softmax function, which modifies the training process of the proposed cost sensitive 1D CNN (CS-1D CNN) so that it pays more attention to the minority classes. Moreover, the particle swarm optimization (PSO) algorithm is adopted to optimize the cost matrix for the CS-1D CNN. The performance of the proposed model is evaluated by case studies on a real-world fault dataset. The results reveal that the CS-1D CNN model improves the accuracies of the minority fault classes and thus the overall accuracy on the whole dataset.

Keywords: power transformer, dissolved gas analysis (DGA), fault diagnosis, cost sensitive learning, one-dimensional convolution neural network (1D CNN)

I. INTRODUCTION

As one of the most critical pieces of equipment in the power system, power transformers have a significant impact on the safety and reliability of electricity supply. However, the operating condition of transformers is affected by many internal and external factors, including natural aging, overloading, and erosion by the external environment [1], [2]. Hence, it is necessary to detect incipient faults in transformers as early as possible, so as to avoid catastrophic failure of the power system.

At present, various techniques are employed in fault diagnosis of transformers. Amongst them, dissolved gas analysis (DGA) is an effective technique, due to its convenience for online monitoring [3]. The DGA based diagnosis methods can be mainly divided into two types. One type is the traditional ratio criteria interpretation, e.g. Doernenburg, Duval Triangle, and IEC Ratio [4], [5], which compares key gas ratios with predefined criteria intervals for the different faults. However, this kind of method is prone to misjudgment because of its absolute classification boundaries. With the development of artificial intelligence (AI), machine learning based DGA interpretations, such as support vector machine (SVM) [6], [7], neural network [8], and deep belief network (DBN) [9], have drawn much attention in recent research. This type of method can extract the non-linear relationship between gas concentrations or ratios and transformer faults, and has therefore made great progress in the field of fault diagnosis. However, these machine learning based methods train on all fault samples with equal importance, which leads the results to favor the majority fault class. Statistically, a transformer fault dataset is inevitably imbalanced, which degrades the performance of most machine learning methods.

Nowadays, cost sensitive deep learning methods can effectively address the drawbacks of traditional machine learning methods in imbalanced classification [10]. By introducing cost sensitive elements, the training process of a deep network can be modified so as to perform better on the minority classes. For instance, a cost-sensitive convolutional neural network (CNN) was proposed for imbalanced image classification [11]. The effectiveness of a cost-sensitive DBN was discussed for both binary and multi-class classification [12]. However, in the transformer fault diagnosis field, the potential advantages of cost-sensitive deep networks on imbalanced datasets have not been fully researched yet.

Based on the above discussion, a cost sensitive deep learning method suitable for imbalanced fault diagnosis is proposed in this paper. According to the characteristics of the input vector, a one-dimensional convolution neural network (1D CNN) involving three hidden layers and two fully connected layers is used as the basic classifier. Further, a cost matrix is introduced into the softmax output layer to construct the cost sensitive 1D CNN (CS-1D CNN)

978-1-7281-5281-3/20/$31.00 ©2020 IEEE 1824

Authorized licensed use limited to: Universidade Estadual de Campinas. Downloaded on June 08,2021 at 18:17:25 UTC from IEEE Xplore. Restrictions apply.
model. In addition, the PSO algorithm is adopted to optimize the cost matrix, avoiding the limitations of the traditional determination method. Finally, a case study is carried out to evaluate the effect of the proposed CS-1D CNN method. It shows that the accuracy of most fault classes, and of the minority classes in particular, can be improved by employing cost sensitive learning.

The rest of this paper is organized as follows. Section II briefly describes the 1D CNN and presents how to construct the CS-1D CNN for fault diagnosis. Section III elaborates the proposed CS-1D CNN based transformer fault diagnosis model. In Section IV, a case study of fault diagnosis is analyzed by applying the proposed method. Section V draws the conclusions.

II. COST-SENSITIVE ONE-DIMENSIONAL CONVOLUTION NEURAL NETWORK

A. Introduction of one-dimensional convolution neural network

A 1D CNN deals well with one-dimensional input signals. It is composed of an input layer, hidden layers (convolution layers and pooling layers), fully connected layers and an output layer [13]. The typical structure of a 1D CNN is shown in Fig. 1.

Fig. 1. Typical structure of 1D CNN

The feature extraction layer, that is the hidden layer, consists of convolution layers and pooling layers. In the convolution layer, multiple convolution kernels are used to extract the features of the input layer. The pooling operation extracts the main features from the output of the previous layer, so as to reduce the dimensionality of the feature vectors as well as improve the robustness of the nonlinear features. Then, all the extracted feature vectors are concatenated to form a one-dimensional vector that serves as the input of the fully connected layer. The number of neurons in the last fully connected layer is equal to the number of fault classes. Finally, a softmax classifier is used to obtain the target output classification.

B. Construction of 1D CNN

In this paper, a 1D CNN with three convolution layers is proposed for transformer fault diagnosis. Fig. 2 presents the detailed structure of the proposed network. Conv1(64, 2)/ReLU means that the first convolution layer has 64 convolution kernels, each of size 2, and that the ReLU activation function is used for nonlinear mapping. The other two convolution layers are denoted in the same way. Maxpooling(2) means that the pooling window size is 2. Further, FC1(16)/ReLU means that the first fully connected layer has 16 neurons, and the activation function is also ReLU.

Fig. 2. Proposed structure of 1D CNN for transformer fault diagnosis. From input to output: DGA feature vector; feature extraction by Conv1(64, 2)/ReLU, Conv2(32, 2)/ReLU, Conv3(32, 1)/ReLU and Maxpooling(2); fault classification by FC1(16)/ReLU, FC2(9)/ReLU and softmax classification.

The one-dimensional input vector is formed by the DGA features of the transformer. To learn fault features effectively from the input vector, three convolution layers are used to extract features, each followed by the ReLU activation function. Then, a max-pooling layer is employed to reduce the dimension of the feature vectors. After the pooling operation, the output dimension is compressed to half of the input dimension. Subsequently, two fully connected layers, with 16 neurons and 9 neurons respectively, are utilized to reduce the dimension of the flattened feature vector. At last, the fault diagnosis result is obtained from the softmax classifier. In addition, dropout is employed to avoid overfitting of the proposed network. Dropout is applied to the convolution layers, following each ReLU activation function.

C. Integrating cost matrix into 1D CNN

The fault diagnosis of transformers is a multi-class classification task. In the softmax layer of the 1D CNN, the outputs of the multiple neurons are mapped to the range between 0 and 1, so that the probability distribution over the different fault classes can be obtained as

$$P(y = k \mid X_n) = \frac{\exp(o_k)}{\sum_{j=1}^{K} \exp(o_j)} \tag{1}$$

where $P(y = k \mid X_n)$ is the probability of sample $X_n$ belonging to class $k$, $o_k$ is the output of the $k$th neuron of the second fully connected layer before the softmax function, and $K$ is the total number of possible fault classes, which is equal to 9.

In order to construct the cost sensitive 1D CNN (CS-1D CNN), a cost matrix $C$ is introduced to modify the output of the softmax layer, which keeps the training process stable. The output of the softmax classifier is then modified as

$$P(y = k \mid X_n) = \frac{c_{y_n,k} \exp(o_k)}{\sum_{j=1}^{K} c_{y_n,j} \exp(o_j)} \tag{2}$$
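As a minimal sketch of Eqs. (1) and (2), assuming a hypothetical 3-class problem (the paper uses K = 9), the plain and cost-weighted softmax can be compared in a few lines of Python. The function names, the outputs and the cost values are illustrative only; `cost_row` stands for the row of the cost matrix C that belongs to the true class of the sample.

```python
import math

def softmax(outputs):
    """Plain softmax of Eq. (1) over the raw outputs o_k."""
    shift = max(outputs)  # subtract the maximum for numerical stability
    exps = [math.exp(o - shift) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]

def cost_sensitive_softmax(outputs, cost_row):
    """Cost-weighted softmax of Eq. (2).

    cost_row[k] plays the role of c_{y_n,k} for the true class y_n.
    """
    shift = max(outputs)
    weighted = [c * math.exp(o - shift) for c, o in zip(cost_row, outputs)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical values, for illustration only.
outputs = [2.0, 1.0, 0.5]    # raw outputs o_k of the last fully connected layer
cost_row = [0.6, 1.0, 0.8]   # costs for a sample whose true class is 0

p_plain = softmax(outputs)
p_cost = cost_sensitive_softmax(outputs, cost_row)
print("plain:", [round(p, 4) for p in p_plain])
print("cost-sensitive:", [round(p, 4) for p in p_cost])
```

With the smallest cost on the true class, the cost-weighted probability of that class is lower than the plain softmax probability, so a negative log-likelihood on the true class is larger and the sample weighs more heavily in training. A cost row of all ones reproduces Eq. (1).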

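The layer sizes given for the structure in Fig. 2 can be checked with a short shape walk-through. This is a sketch under stated assumptions: the paper does not give stride or padding, so stride 1 without padding is assumed for the convolutions and stride 2 for the pooling window, and the input is taken to be the eight-attribute DGA vector of Section III-B. The helper name `conv1d_out_len` is hypothetical.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution or pooling window."""
    return (length + 2 * padding - kernel) // stride + 1

# Layer list read off Fig. 2: (name, output channels, kernel size).
conv_layers = [("Conv1", 64, 2), ("Conv2", 32, 2), ("Conv3", 32, 1)]

length, channels = 8, 1  # eight DGA attributes, one input channel
shapes = [("input", channels, length)]
for name, out_channels, kernel in conv_layers:
    length = conv1d_out_len(length, kernel)  # assumption: stride 1, no padding
    channels = out_channels
    shapes.append((name, channels, length))

# Maxpooling(2); assumption: the stride equals the window size.
length = conv1d_out_len(length, kernel=2, stride=2)
shapes.append(("Maxpooling", channels, length))

flattened = channels * length  # vector fed to FC1(16), then FC2(9)
fc_sizes = [flattened, 16, 9]

for name, c, n in shapes:
    print(f"{name:<10} channels={c:<3} length={n}")
print("fully connected sizes:", fc_sizes)
```

Under these assumptions the pooling layer halves the length from 6 to 3, which agrees with the statement that pooling compresses the output to half of its input, and the flattened vector has 96 elements before the two fully connected layers.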
In Eq. (2), $c_{y_n,k}$ represents the misclassification cost of classifying instance $X_n$ into class $k$ when it actually belongs to class $y_n$. It is an element of the cost matrix $C$:

$$C = \begin{bmatrix} c_{1,1} & c_{1,2} & \cdots & c_{1,K} \\ c_{2,1} & c_{2,2} & \cdots & c_{2,K} \\ \vdots & \vdots & \ddots & \vdots \\ c_{K,1} & c_{K,2} & \cdots & c_{K,K} \end{bmatrix} \tag{3}$$

The cross-entropy loss function is used to evaluate the error between the estimated softmax output probability distribution and the target class probability distribution, and is given by

$$L(C, y_n) = -\log\big(P(y = y_n \mid X_n)\big) = -\log\left(\frac{c_{y_n,y_n} \exp(o_{y_n})}{\sum_{k=1}^{K} c_{y_n,k} \exp(o_k)}\right) \tag{4}$$

During the training procedure, the parameters of the 1D CNN are updated to reshape the class probabilities so that the desired class has the maximum value. However, because a minority fault class is less frequent and under-represented in the training dataset, the parameters of the 1D CNN tend to favor the majority classes. The newly introduced cost matrix $C$ modifies the updating process of the 1D CNN so as to encourage the correct classification of the infrequent fault classes.

III. TRANSFORMER FAULT DIAGNOSIS MODEL BASED ON CS-1D CNN

A. Fault classes of power transformers

According to the DL/T 722-2014 standard, transformer incipient faults can be divided into overheating faults, discharging faults and compound faults. The overheating faults have three degrees: thermal faults of low temperature (LT), medium temperature (MT), and high temperature (HT). There are three kinds of discharging faults: partial discharge (PD), low-energy discharge (LD), and high-energy discharge (HD). Moreover, two compound faults, i.e. low-energy discharge with thermal fault (LDT) and high-energy discharge with thermal fault (HDT), are also considered. In this section, all eight fault classes mentioned above and the normal state (NS) class are included in the transformer fault dataset, and are denoted as C_LT, C_MT, C_HT, C_PD, C_LD, C_HD, C_LDT, C_HDT and C_NS, respectively.

B. Construction of input feature vector

The input feature vector has a significant impact on the effectiveness of the transformer fault diagnosis model. In this paper, both DGA concentrations and relative proportions of DGA are taken into account. The input vector has eight attributes: the five concentrations H2, CH4, C2H6, C2H4 and C2H2, as well as the three relative proportions C2H2/C2H4, CH4/H2 and C2H4/C2H6.

In order to reduce the influence of the different value levels of the attributes, all attributes in the input feature vector are normalized by

$$x'_{n,i} = \frac{x_{n,i} - x_{\min,i}}{x_{\max,i} - x_{\min,i}} \tag{5}$$

where $x_{n,i}$ and $x'_{n,i}$ are the $i$th attribute of $X_n$ before and after normalization, and $x_{\max,i}$ and $x_{\min,i}$ are the maximum and minimum values of the $i$th attribute over all samples, respectively.

C. Optimizing cost matrix for CS-1D CNN

In this section, PSO, a heuristic swarm intelligence algorithm that can quickly optimize non-linear problems, is used to optimize the cost matrix for the 1D CNN. Taking the optimization speed and the stability of the 1D CNN into consideration, the elements of the cost matrix are restricted to the range [0.6, 1] with a step of 0.1. The searching procedure involves three steps.

Step 1: Train the 1D CNN without a cost matrix, and obtain the confusion matrix on the validation dataset. Find the classes that are prone to be confused with each other, and set the cost element between them as $c_{i,k} = 1$.

Step 2: Use the PSO algorithm to optimize the other elements of the cost matrix. Generate the initial particle population, and use the G-mean as the fitness function to evaluate each particle. Update the position and speed of each particle until the maximum number of iterations is reached. In addition, both the initial values and the updated values should satisfy the constraint that $c_{i,i}$ is not greater than any $c_{i,k}$. Fig. 3 shows the flow chart of the PSO optimization process.

Fig. 3. Flow chart of PSO optimization for cost matrix. The flow is: initialize the particle swarm; calculate the fitness function (according to the G-mean metric); update the optimums of each particle and of the whole population; check whether the number of iterations equals the maximum. If yes, output the optimized cost matrix and end. If no, update the position and speed of each particle and continue.

Step 3: Introduce the optimized cost matrix into the 1D CNN to form the CS-1D CNN. Use the training dataset to train the network.

D. Evaluation metrics

In addition to the accuracy metric, the performance of the classifier on the multi-class fault dataset is evaluated by three other metrics: recall, precision and G-mean. They are expressed as follows:

$$\begin{cases} \mathrm{recall}_k = \dfrac{TP_k}{TP_k + FN_k} \\[2ex] \mathrm{precision}_k = \dfrac{TP_k}{TP_k + FP_k} \end{cases} \quad \forall k \in \{1, \dots, K\} \tag{6}$$

$$G\text{-}mean = \sqrt[K]{\prod_{k=1}^{K} \mathrm{recall}_k} \tag{7}$$

where $TP_k$, $FN_k$ and $FP_k$ are the numbers of true positives, false negatives and false positives for class $k$.

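The input construction of Section III-B, five gas concentrations plus three ratios followed by the min-max normalization of Eq. (5), can be sketched as below. The concentration values are invented for illustration, and the small `eps` guard against zero denominators as well as the handling of constant columns are assumptions of this sketch, not part of the paper.

```python
def build_features(h2, ch4, c2h6, c2h4, c2h2, eps=1e-9):
    """Eight attributes: five concentrations and three ratios.

    eps guards against division by zero (an assumption of this sketch).
    """
    return [
        h2, ch4, c2h6, c2h4, c2h2,
        c2h2 / (c2h4 + eps),
        ch4 / (h2 + eps),
        c2h4 / (c2h6 + eps),
    ]

def min_max_normalize(samples):
    """Apply Eq. (5) attribute by attribute over all samples.

    Constant columns are mapped to 0.0 (an assumption of this sketch).
    """
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in samples
    ]

# Hypothetical gas concentrations, for illustration only.
raw = [
    build_features(h2=50.0, ch4=20.0, c2h6=10.0, c2h4=5.0, c2h2=1.0),
    build_features(h2=200.0, ch4=120.0, c2h6=40.0, c2h4=80.0, c2h2=15.0),
    build_features(h2=100.0, ch4=60.0, c2h6=30.0, c2h4=20.0, c2h2=0.0),
]
normalized = min_max_normalize(raw)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization every attribute lies in [0, 1], so concentrations and ratios with very different value levels contribute on a comparable scale.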

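The cost matrix search of Section III-C can be sketched as a small PSO loop. This is only an illustration under several assumptions: the inertia and acceleration coefficients are common textbook values that the paper does not report, the problem has 3 classes instead of 9, the pinned entries are hypothetical, and `toy_fitness` is a stand-in. In the paper the fitness of a particle is the G-mean on the validation dataset, which requires training the network.

```python
import random

LOW, HIGH, STEP = 0.6, 1.0, 0.1  # range and step of the cost elements

def repair(matrix, pinned):
    """Snap to the 0.1 grid in [0.6, 1], pin Step 1 entries to 1, keep c_ii <= c_ik."""
    k = len(matrix)
    fixed = []
    for i in range(k):
        row = []
        for j in range(k):
            value = min(HIGH, max(LOW, matrix[i][j]))
            value = round(round((value - LOW) / STEP) * STEP + LOW, 1)
            row.append(1.0 if (i, j) in pinned else value)
        row[i] = min(row)  # diagonal cost not greater than any cost in its row
        fixed.append(row)
    return fixed

def pso_cost_matrix(fitness, k, pinned, particles=30, iterations=20, seed=0,
                    inertia=0.7, c1=1.5, c2=1.5):
    """Maximize fitness(cost_matrix); inertia, c1 and c2 are assumed values."""
    rng = random.Random(seed)
    positions = [
        repair([[rng.uniform(LOW, HIGH) for _ in range(k)] for _ in range(k)], pinned)
        for _ in range(particles)
    ]
    velocities = [[[0.0] * k for _ in range(k)] for _ in range(particles)]
    best_pos = list(positions)
    best_fit = [fitness(p) for p in positions]
    g = max(range(particles), key=best_fit.__getitem__)
    global_pos, global_fit = best_pos[g], best_fit[g]
    for _ in range(iterations):
        for n in range(particles):
            moved = []
            for i in range(k):
                row = []
                for j in range(k):
                    velocities[n][i][j] = (
                        inertia * velocities[n][i][j]
                        + c1 * rng.random() * (best_pos[n][i][j] - positions[n][i][j])
                        + c2 * rng.random() * (global_pos[i][j] - positions[n][i][j])
                    )
                    row.append(positions[n][i][j] + velocities[n][i][j])
                moved.append(row)
            positions[n] = repair(moved, pinned)
            fit = fitness(positions[n])
            if fit > best_fit[n]:        # update the optimum of this particle
                best_pos[n], best_fit[n] = positions[n], fit
            if fit > global_fit:         # update the optimum of the population
                global_pos, global_fit = positions[n], fit
    return global_pos, global_fit

def toy_fitness(matrix):
    """Stand-in only: negative squared distance to an arbitrary target matrix."""
    target = [[0.6, 1.0, 0.8], [0.9, 0.7, 1.0], [1.0, 0.8, 0.6]]
    return -sum((matrix[i][j] - target[i][j]) ** 2
                for i in range(3) for j in range(3))

best, score = pso_cost_matrix(toy_fitness, k=3, pinned={(0, 1), (1, 2)})
for row in best:
    print(row)
```

The `repair` step is one possible way to enforce the constraints after every move. The default of 30 particles and 20 iterations follows the values the paper reports for its own search.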
IV. CASE STUDY AND ANALYSIS

A. Fault dataset and model parameters

In this paper, a transformer fault dataset collected from distinct sources is used to verify the performance of the CS-1D CNN method. Some samples of the dataset were collected from power companies in China, while the others were taken from the IEC TC 10 database and the published literature. The fault dataset is randomly shuffled and divided in the ratio 5:2:3 to form the training dataset, validation dataset and testing dataset. The detailed sample distributions of the original dataset and the three divided datasets are given in Table I.

TABLE I. SAMPLE DISTRIBUTION OF ORIGINAL DATASET, TRAINING, VALIDATION AND TESTING DATASETS

Class    Original dataset   Training dataset   Validation dataset   Testing dataset
C_NS     685                343                137                  205
C_HT     580                290                116                  174
C_MT     309                154                62                   93
C_LT     304                152                61                   91
C_HD     428                214                86                   128
C_LD     398                199                80                   119
C_PD     196                98                 39                   59
C_HDT    79                 39                 16                   24
C_LDT    64                 32                 13                   19

The detailed network construction of the proposed CS-1D CNN has been described in Section II. In addition, the dropout ratio of neurons is set to 0.1. The Adam method is used to update the parameters of the 1D CNN, and the learning rate is 0.001. The number of training iterations is 300.

Generally, the parameters of PSO can have a significant influence on the searching process for the cost matrix. Larger values enable a better result but may lead to slower convergence. Hence, the two main parameters of PSO, i.e. the initial number of particles and the number of iterations, are empirically set to 30 and 20.

B. Application of CS-1D CNN in fault diagnosis

To verify the effectiveness of the proposed CS-1D CNN method, it is compared with the traditional 1D CNN method. Table II shows the classification performance of the two methods.

TABLE II. CLASSIFICATION RESULT OF 1D CNN METHOD AND CS-1D CNN METHOD

               1D CNN                       CS-1D CNN
Class          Recall (%)   Precision (%)   Recall (%)   Precision (%)
C_NS           98.54        96.21           99.03        97.45
C_HT           96.17        84.34           94.63        88.85
C_MT           75.27        85.90           89.60        84.62
C_LT           74.36        91.47           75.82        96.17
C_HD           81.36        84.45           78.22        83.00
C_LD           81.23        78.18           79.83        79.49
C_PD           84.75        92.01           87.01        92.92
C_HDT          68.06        59.49           79.17        57.05
C_LDT          66.67        69.65           84.21        84.71
Accuracy (%)   86.30                        88.41
G-mean (%)     80.04                        84.96

It can be seen that with the 1D CNN method, the recall values on the minority classes (C_HDT and C_LDT) are less than 70%, lower than those of the other seven classes. The accuracy of the 1D CNN is 86.30%, while the G-mean is only 80.04%. This indicates that the ability of the 1D CNN on an imbalanced fault dataset is limited. Comparing the CS-1D CNN with the 1D CNN, the recall values of most classes increase after introducing the cost matrix. In particular, the recall values of C_HDT and C_LDT are improved by 11.11 and 17.54 percentage points. Moreover, the accuracy and G-mean are increased by 2.11 and 4.92 percentage points, leaving a smaller gap between the two metrics. This shows that the cost matrix can effectively modify the training procedure of the 1D CNN, so as to reduce the difference in classification performance among the various classes.

In order to better understand the process of feature extraction, the t-SNE technique is employed to visualize the features learned by each layer. Fig. 4 and Fig. 5 display the feature visualization results of the 1D CNN and the CS-1D CNN, respectively. It is clearly observed that, through the three hidden layers, features of the same fault class gradually move closer together, and features of different fault classes become more separated. Moreover, comparing Fig. 4 with Fig. 5 shows that the visualization result of the CS-1D CNN is more separable than that of the 1D CNN. Thus, the proposed approach has a more powerful feature extraction ability, which makes it easier to classify the various transformer fault classes.

C. Comparison with other classification methods

In this section, the performance of the proposed CS-1D CNN model is compared with that of the SVM, BPNN and 1D CNN models. The RBF is adopted as the kernel function for the SVM model, and the coefficient gamma of the RBF is set to 400. In the BPNN model, the number of hidden layers is 2, with 15 and 10 hidden neurons, respectively. The Adam method is adopted, the learning rate is 0.001, and the number of training iterations is 500. The structure of the 1D CNN is the same as that of the CS-1D CNN, except for the cost matrix in the softmax function. The simulation results of these algorithms are shown in Table III.

As shown: 1) Both the SVM and BPNN models perform poorly on G-mean, with a large difference between accuracy and G-mean. The BPNN model outperforms the SVM model on the imbalanced fault diagnosis. 2) The 1D CNN model performs better on the accuracy and G-mean metrics than the SVM and BPNN models. 3) The CS-1D CNN has the best classification result among the four algorithms. Compared with the SVM method, it improves the two metrics by 9.28 and 23.66 percentage points, respectively.
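The figures quoted in this section can be cross-checked against the tables with a few lines of arithmetic. The sketch below copies the recall values from Table II and the accuracy and G-mean values from Tables II and III, recomputes the G-mean as the geometric mean of the nine recalls, and takes the differences in percentage points. The variable names are illustrative.

```python
import math

# Recall values (%) per class from Table II, in the order
# C_NS, C_HT, C_MT, C_LT, C_HD, C_LD, C_PD, C_HDT, C_LDT.
recall_1d = [98.54, 96.17, 75.27, 74.36, 81.36, 81.23, 84.75, 68.06, 66.67]
recall_cs = [99.03, 94.63, 89.60, 75.82, 78.22, 79.83, 87.01, 79.17, 84.21]

def g_mean(recalls):
    """Geometric mean of the recalls, computed in log space."""
    return math.exp(sum(math.log(r) for r in recalls) / len(recalls))

g_1d = g_mean(recall_1d)
g_cs = g_mean(recall_cs)

# Gains of CS-1D CNN in percentage points: (accuracy, G-mean).
gain_vs_1d = (round(88.41 - 86.30, 2), round(84.96 - 80.04, 2))
gain_vs_svm = (round(88.41 - 79.13, 2), round(84.96 - 61.30, 2))

# Number of classes whose recall rises after introducing the cost matrix.
raised = sum(cs > base for cs, base in zip(recall_cs, recall_1d))

print("G-mean 1D CNN:", round(g_1d, 2), "CS-1D CNN:", round(g_cs, 2))
print("gain vs 1D CNN:", gain_vs_1d, "gain vs SVM:", gain_vs_svm)
print("classes with higher recall:", raised, "of", len(recall_cs))
```

The recomputed G-mean values round to 80.04 and 84.96, matching the reported ones, and the recall rises for six of the nine classes, which is consistent with the statement that most classes improve.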

Fig. 4. Feature visualization via t-SNE of 1D CNN: (a) raw data features; (b) features in the 1st hidden layer; (c) features in the 2nd hidden layer; (d) features in the 3rd hidden layer.

Fig. 5. Feature visualization via t-SNE of CS-1D CNN: (a) raw data features; (b) features in the 1st hidden layer; (c) features in the 2nd hidden layer; (d) features in the 3rd hidden layer.

TABLE III. COMPARISON OF THE PROPOSED CS-1D CNN METHOD WITH OTHER TRADITIONAL METHODS

Metric         SVM     BPNN    1D CNN   CS-1D CNN
Accuracy (%)   79.13   83.45   86.30    88.41
G-mean (%)     61.30   70.69   80.04    84.96

V. CONCLUSIONS

A cost sensitive 1D CNN model is proposed for fault diagnosis of power transformers. Case studies have shown the feasibility and effectiveness of the proposed CS-1D CNN method. The main conclusions are as follows:

1) With the cost matrix optimized by the PSO algorithm, the training process of the proposed CS-1D CNN can be modified so as to pay more attention to the minority classes.

2) Compared with traditional machine learning algorithms, the CS-1D CNN method performs well on most fault classes, especially on the minority classes (C_HDT and C_LDT). After integrating cost sensitive learning, the accuracy and G-mean metrics are both improved, and the G-mean is increased to almost 85%.

REFERENCES

[1] D. Martin, J. Marks, T. K. Saha, O. Krause, and N. Mahmoudi, "Investigation into modeling Australian power transformer failure and retirement statistics," IEEE Trans. Power Del., vol. 33, no. 4, pp. 2011-2019, August 2018.
[2] M. M. Islam, G. Lee, S. N. Hettiwatte, and K. Williams, "Calculating a health index for power transformers using a subsystem-based GRNN approach," IEEE Trans. Power Del., vol. 33, no. 4, pp. 1903-1912, August 2018.
[3] H. Ma, C. Ekanayake, and T. Saha, "Power transformer fault diagnosis under measurement originated uncertainties," IEEE Trans. Dielectr. Electr. Insul., vol. 19, no. 6, pp. 1982-1990, December 2012.
[4] M. Duval, "A review of faults detectable by gas-in-oil analysis in transformers," IEEE Electr. Insul. Mag., vol. 18, no. 3, pp. 8-17, March/June 2002.
[5] IEEE Standard C57.104-2008, "IEEE Guide for the Interpretation of Gases Generated in Oil-Immersed Transformers," 2008.
[6] Y. Cui, H. Ma, and T. Saha, "Improvement of power transformer insulation diagnosis using oil characteristics data preprocessed by SMOTEBoost technique," IEEE Trans. Dielectr. Electr. Insul., vol. 21, no. 5, pp. 2363-2373, October 2014.
[7] J. Li, Q. Zhang, K. Wang, J. Wang, T. Zhou, and Y. Zhang, "Optimal dissolved gas ratios selected by genetic algorithm for power transformer fault diagnosis based on support vector machine," IEEE Trans. Dielectr. Electr. Insul., vol. 23, no. 2, pp. 1198-1206, April 2016.
[8] V. Miranda, A. R. G. Castro, and S. Lima, "Diagnosing faults in power transformers with autoassociative neural networks and mean shift," IEEE Trans. Power Del., vol. 27, no. 3, pp. 1350-1357, July 2012.
[9] J. Dai, H. Song, G. Sheng, and X. Jiang, "Dissolved gas analysis of insulating oil for power transformer fault diagnosis with deep belief network," IEEE Trans. Dielectr. Electr. Insul., vol. 24, no. 5, pp. 2828-2835, October 2017.
[10] X. Jiang, S. Pan, G. Long, F. Xiong, J. Jiang, and C. Zhang, "Cost-sensitive parallel learning framework for insurance intelligence operation," IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9713-9723, December 2019.
[11] S. H. Khan, M. Hayat, M. Bennamoun, F. A. Sohel, and R. Togneri, "Cost-sensitive learning of deep feature representations from imbalanced data," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 8, pp. 3573-3587, July 2015.
[12] C. Zhang, K. C. Tan, H. Li, and G. S. Hong, "A cost-sensitive deep belief network for imbalanced classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 1, pp. 109-122, January 2019.
[13] Y. Li, L. Zou, L. Jiang, and X. Zhou, "Fault diagnosis of rotating machinery based on combination of deep belief network and one-dimensional convolutional neural network," IEEE Access, vol. 7, pp. 165710-165723, November 2019.
