Abstract-Because traditional transformer diagnosis approaches are over-rigid and need almost complete, accurate testing data, a naive Bayesian (NB) classifier based model for diagnosing transformer faults is presented and constructed in this paper. Since the diagnosis performance is depressed by incomplete testing data, an SVM regression approach is used to estimate the missing data, and a new diagnosis model that integrates SVM regression and the NB classifier is constructed. Diagnosis experiments on different transformer testing scenarios show that the constructed NB diagnosis model performs well given complete testing data, and that the proposed SVM regression approach can raise the accuracy of transformer diagnosis even if a certain quantity of data, or important data, is missing.

I. INTRODUCTION

As one of the key pieces of equipment in transmission and distribution systems, the condition of a power transformer has a direct influence on the safety and reliability of a power system. Therefore, for the sake of avoiding the tremendous blackout losses caused by power apparatus failure, it is very important to evaluate transformers' conditions so as to find incipient faults and to catch any potentially extended deterioration as early as possible.

Many technologies are used to examine transformers. Among all the methods, dissolved gas analysis (DGA) is a fast and effective way of detecting incipient faults of transformers. But the actual regulations or rules based on DGA can only give determinant boundaries, which cannot represent the relation between the faults and their exterior character. Therefore, some new approaches based on DGA and artificial intelligence, such as evidential reasoning theory, SVM classifiers, and neural networks, are studied in [1,2,3,4]. However, most of these methods need complete and accurate information to give feasible diagnosis results.

Bayesian classifiers, based on classification models, have an advantage in fault diagnosis of complicated systems. Ref. [5] considers Bayesian classifiers to be effective models in the domain of uncertain knowledge representation and reasoning. Among them, the naive Bayesian (NB) classifier is popularly used for its simple structure and good performance. Although Bayesian classification can handle incomplete data, its performance degrades when key information is missing [6]. As the available testing data for transformer fault diagnosis are incomplete and biased, Ref. [6] proposes an approach that integrates Bayesian classifiers with rough sets, which can reduce the degree of incompleteness by acquiring an optimum reduction set. But when more than one key attribute value is missing in every reduction set, the classification performance is not good.

The support vector machine (SVM) has emerged as a powerful tool for data analysis. It implements structural risk minimization based on statistical learning theory, so its decision rule can still obtain a small error on independent test samples. We therefore propose an SVM regression (SVR) based algorithm to fill in missing data, and build a new model integrating the NB classifier and the SVR algorithm to diagnose transformer faults, thereby optimizing the diagnosis performance.

II. SUPPORT VECTOR MACHINE REGRESSION

The support vector machine, developed by Vapnik and Cortes, is a novel machine learning method based on statistical learning theory (SLT). The basic idea of SVM is to map the input data points into a high-dimensional feature space and to find a hyperplane there, chosen so as to maximize the margin between the separating hyperplane and the data and thereby minimize an upper bound on the generalization error. As shown in Fig. 1, the realized form of an SVM is like a neural network.

[Figure 1. The framework of SVM: an input vector x = (x1, ..., xn) at the bottom, a nonlinear conversion based on the support vectors x1, ..., xn with significances yi in the middle layer, and the decision rule f(x) at the output.]

SVM mainly has two classes of application, classification and regression; in this paper, SVM regression is introduced. Consider first a linear instance of the regression problem. Given a data set D = {(x1, y1), ..., (xl, yl)}, the regression function is approximated by the following function:

    f(x, a) = w . x + b                                          (1)

The coefficients w and b are estimated by minimizing the regularized risk function:

    R = (1/2)||w||^2 + C * sum_{i=1}^{l} L(yi, f(xi, a))         (2)

where ||w||^2 is used as a flatness measure of (1), C is a regularization constant that determines the trade-off between the training error and the model flatness, and L(y, f(x, a)) is the loss function.
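To make (1) and (2) concrete, the following sketch evaluates the regularized risk of a given linear regression function in plain Python. It is not from the paper: the data, the value of C, and the choice of the epsilon-insensitive loss are illustrative assumptions.

```python
# Sketch: regularized risk R = 1/2 ||w||^2 + C * sum_i L(y_i, f(x_i))
# for a linear model f(x) = w.x + b, using the epsilon-insensitive loss
# L(y, f) = max(0, |y - f| - eps).  Data and parameters are invented.

def f(w, b, x):
    """Linear regression function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def eps_insensitive(y, y_hat, eps):
    """Epsilon-insensitive loss: zero inside the eps-tube around the fit."""
    return max(0.0, abs(y - y_hat) - eps)

def regularized_risk(w, b, data, C=1.0, eps=0.1):
    flatness = 0.5 * sum(wi * wi for wi in w)      # the 1/2 ||w||^2 term
    empirical = sum(eps_insensitive(y, f(w, b, x), eps) for x, y in data)
    return flatness + C * empirical

data = [([0.0], 0.0), ([1.0], 1.05), ([2.0], 2.0)]  # roughly y = x
w, b = [1.0], 0.0
print(regularized_risk(w, b, data))                 # -> 0.5 (all residuals inside the tube)
```

A flatter w lowers the first term but raises the empirical losses, which is exactly the trade-off that the constant C controls in (2).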
(2) Huber loss function:

    L(y, f(x, a)) = c|y - f(x, a)| - c^2/2,   if |y - f(x, a)| > c
                    (1/2)|y - f(x, a)|^2,     if |y - f(x, a)| <= c

(3) epsilon-insensitive loss function:

    L(y, f(x, a)) = 0,                        if |y - f(x, a)| < epsilon
                    |y - f(x, a)| - epsilon,  otherwise

As to the nonlinear problem, we can map the input data x into a high-dimensional feature space F via a nonlinear mapping Phi and then settle the nonlinear problem as a linear one. That is:

    f(x) = (w . Phi(x)) + b,    Phi: R^n -> F,  w in F

For computational convenience, we introduce the so-called kernel function, of the form K(x, xi) = Phi(x)^T Phi(xi). In this way, all computations are carried out via the kernel function in the input space.

III. NAIVE BAYESIAN CLASSIFICATION

Bayesian classifiers are classification models based on statistical theory: they learn from the training set and then derive classifiers. Among them, the naive Bayesian (NB) classifier is popularly used for its simple structure and good performance. According to Bayes' rule:

    P(ck | x1, ..., xn) = P(ck) P(x1, ..., xn | ck) / P(x1, ..., xn)
                        prop. to P(ck) P(x1, ..., xn | ck)
                        prop. to P(ck) * prod_{i=1}^{n} P(xi | ck)        (3)

The naive Bayesian classifier basically learns the class-conditional probabilities P(Xi = xi | ck) of each variable Xi given the class label ck; in this way a new test case (X1 = x1, ..., Xn = xn) is classified. The details are as follows.

The prior probability of the class variable ck is:

    P(ck) = N_ck / N                                                      (4)

where N_ck is the number of samples whose class label is ck in the training set, and N is the total number of samples in the training set.

The conditional probability of each variable node is reckoned by computing its likelihood probability with the following formula:

    P(Xi = xi | ck) = N_ck(xi) / N_ck                                     (5)

where N_ck(xi) is the number of samples in the training set that satisfy C = ck and Xi = xi.

Theoretically, given that all the attributes are conditionally independent, Ref. [7] considers the performance of the naive Bayesian classifier to be excellent.

IV. DIAGNOSIS MODEL BASED ON NB CLASSIFIER AND SVR ALGORITHM

A. Attribute Selection and Data Pretreatment

Attribute selection is important for pattern recognition and data analysis methods; unreasonable attribute selection may cause under-learning or over-learning. There are many features that reflect the faults' character. According to practical considerations and dissolved gas analysis technology, we choose the key gases and the ratios of certain gases as attribute variables, which are shown in Table I.

Naive Bayesian classification is a symbolic analysis method whose attribute variables must be discrete, so the continuous data must be discretized. Taking into account the fact that the attributes' scales differ from each other, a viable discretization is to set threshold values for every attribute and to divide every attribute into several intervals, taking each interval as a distinct discrete value. According to the IEC/IEEE standards and experts' experience, the discretization criterion in the model is defined as shown in Table I.

TABLE I
DISCRETIZATION CRITERION (UNIT: uL/L)

    Code  Attribute variable      0        1         2
    X1    H2                      <=140    141-700   >=701
    X2    CH4                     <=60     61-400    >=401
    X3    C2H6                    <=100    101-150   >=151
    X4    C2H4                    <=120    121-200   >=201
    X5    C2H2                    <=5      6-35      >=36
    X6    C1+C2                   <=150    >150      -
    X7    C2H2/C2H4               <0.1     0.1-3     >3
    X8    C2H4/H2                 <0.1     0.1-1     >=1
    X9    C2H4/C2H6               <1       1-3       >=3
    X10   CO2/CO                  <3       3-7       >=7

B. Model Construction

It is very important to design a sound classification model. For the sake of balancing complexity and generalization when executing the classification, we build the NB diagnosis model with the chosen attribute variables and one decision variable. As Fig. 2 shows, the NB model is a two-layer Bayesian network in which the decision node C is the parent of each attribute Xi, and each attribute is conditionally independent of the others given C. In this model, the decision variable C has six optional values: normal running (C0), low-energy discharge (C1), high-energy discharge (C2), low- or medium-temperature thermal fault (C3), high-temperature thermal fault (C4), and partial discharge (C5).
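The counting estimates (4) and (5) and the decision rule implied by (3) can be sketched as follows. The tiny training set, attribute codes, and class labels below are invented for illustration and are not the paper's data.

```python
# Minimal naive Bayes sketch following eqs. (3)-(5): priors P(ck) = N_ck / N,
# conditionals P(Xi = xi | ck) = N_ck(xi) / N_ck, and classification by
# argmax_k P(ck) * prod_i P(xi | ck).  All records are invented.
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (attribute_tuple, class_label)."""
    n = len(samples)
    class_count = Counter(c for _, c in samples)
    cond = defaultdict(Counter)               # (class, position) -> value counts
    for x, c in samples:
        for i, xi in enumerate(x):
            cond[(c, i)][xi] += 1
    prior = {c: cnt / n for c, cnt in class_count.items()}   # eq. (4)
    return prior, cond, class_count

def classify(x, prior, cond, class_count):
    best, best_p = None, -1.0
    for c, p in prior.items():
        for i, xi in enumerate(x):
            p *= cond[(c, i)][xi] / class_count[c]           # eq. (5)
        if p > best_p:                                       # decision via (3)
            best, best_p = c, p
    return best

# Invented discretized DGA-style records: (X1, X2) -> fault class
train = [((0, 0), "normal"), ((0, 1), "normal"),
         ((2, 2), "thermal"), ((2, 1), "thermal")]
prior, cond, class_count = train_nb(train)
print(classify((2, 2), prior, cond, class_count))   # -> thermal
```

A production classifier would smooth the counts (e.g. Laplace smoothing) so that a single unseen attribute value does not zero out an entire class; the raw ratios above mirror (5) exactly as the text states it.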
... testing set. The other 501 records serve as the training set of the NB model. Diagnosing with the NB model alone, only 4 records get the right result.
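The combined idea of Section IV, estimating a missing attribute from the remaining ones with a regression model and then discretizing the filled-in value for the NB classifier, might be sketched as below. The records, the thresholds (taken in the style of Table I), and the ordinary least-squares line standing in for the paper's SVM regression are all illustrative assumptions.

```python
# Sketch of the combined pipeline: a record missing its CH4 value has that
# value estimated from H2 by a simple least-squares line (a stand-in here
# for the paper's SVM regression), then discretized for the NB classifier.
# All numbers are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares y = a*x + b (stand-in for SVR)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def discretize(v, thresholds):
    """Map a continuous value to code 0, 1, 2, ... by ordered thresholds."""
    return sum(v > t for t in thresholds)

# Complete records: (H2, CH4) in uL/L, invented
complete = [(100.0, 40.0), (300.0, 150.0), (800.0, 450.0)]
a, b = fit_line([r[0] for r in complete], [r[1] for r in complete])

h2 = 300.0                              # record whose CH4 value is missing
ch4_est = a * h2 + b                    # filled-in value (about 154.5)
code = discretize(ch4_est, [60, 400])   # CH4 thresholds as in Table I -> code 1
```

The discretized code then feeds the NB classifier exactly as a measured value would, which is how the regression step raises diagnosis accuracy when a record is incomplete.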