
Comparison of two partial discharge classification methods
J. A. Hunter*, L. Hao and P. L. Lewin
School of Electronics and Computer Science
University of Southampton
Southampton, United Kingdom

D. Evagorou, A. Kyprianou and G. E. Georghiou
Department of Electrical and Computer Engineering
University of Cyprus
Nicosia, Cyprus
E-mail (*): jh09r@ecs.soton.ac.uk

Abstract—Two signal classification methods have been examined to discover their suitability for the task of partial discharge (PD) identification. An experiment has been designed to artificially mimic signals produced by a range of PD sources that are known to occur within high voltage (HV) items of plant. The bushing tap point of a large auto-transformer has been highlighted as a possible point on which to attach PD sensing equipment and is utilized in this experiment. Artificial PD signals are injected into the HV electrode of the bushing itself and a high frequency current transformer (HFCT) is used to monitor the current between the tap-point and earth. The experimentally produced data was analyzed using two different signal processing algorithms and their classification performance compared. The signals produced by four different artificial PD sources (surface discharge in air, corona discharge in air, floating discharge in oil and internal discharge in oil) have been processed, then classified using two machine learning techniques, namely the support vector machine (SVM) and the probabilistic neural network (PNN). The feature extraction algorithms involve performing wavelet packet analysis on the PD signals recorded over a single power cycle. The dimensionality of the data has been reduced by finding the first four moments of the probability density function (mean, standard deviation, skew and kurtosis) of the wavelet packet coefficients to produce a suitable feature vector. Initial results indicate that very high identification rates are possible, with the SVM able to classify PD signals with a slightly higher accuracy than the PNN.

Keywords: Partial Discharge; Support vector machines; Probabilistic neural network; Wavelet analysis; Probability density function

I. INTRODUCTION

The term "partial discharge" (PD) is defined by IEC 60270 as 'a localized electrical discharge that only partially bridges the insulation between conductors and may or may not occur adjacent to a conductor' [1]. Modern insulation systems can consist of combinations of solid, liquid or gaseous material. This leads to a vast range of mechanisms, described as PD, that each have their own characteristics and varied effects on the quality of insulation used in modern power systems. It is widely recognized that the analysis of PD signals can be used to monitor the health of dielectric materials. The ability to accurately detect and classify PD sources is a fundamental stage in the development of an expert insulation condition monitoring system [2]. It is believed that the development of signal analysis techniques that are able to accurately classify different PD sources is one of the first steps towards producing a universal PD detection system.

Machine learning is the study of computer algorithms that improve their performance automatically through experience. Successful applications include data mining programs that discover general rules from large databases and autonomous vehicles that learn how to drive on public roads. With the increasing interest in intelligent classification systems, it was decided to investigate whether machine learning techniques, coupled with modern signal analysis tools, could be used to accurately identify different PD sources.

II. SUPPORT VECTOR MACHINE OPERATION

Support vector machines (SVM) were initially developed by V. N. Vapnik in 1995 as an application of statistical learning theory. They involve the construction of a decision function from a set of labeled training data. The functions that are produced can be used to solve either classification or regression problems. SVM operation can be explained as an extension of the optimum hyperplane problem displayed in Fig. 1. A hyperplane is formed between two linearly separable data classes, with the optimal solution providing the largest separating margin "m" between the hyperplane and the data points. The SVM iteratively optimizes the hyperplane function with each "training" data point, and the points closest to the margin are defined as the "support vectors". As it is possible to produce the hyperplane from the support vectors alone, the remaining data is deemed surplus to requirement and is discarded (reducing memory demands). After a model has been created for a training data-set and parameter optimization is completed, it can be used as a decision boundary to classify data of unknown class.

The classification performance of the technique described above is limited when dealing with data sets that are not linearly separable. The ability of kernel functions to map input vectors into higher dimensional space, coupled with Cover's theorem, which states that pattern classification is easier to achieve in higher dimensional space, produces an effective solution to this problem [3]. Previously published results have highlighted the radial basis function (RBF) kernel as a suitable choice due to its ability to provide non-linear mapping, added to the fact that it has a number of optimization parameters [3].
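The kernel mapping described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it evaluates the RBF kernel K(x, y) = exp(-γ‖x − y‖²) over a toy set of points (the points and the γ value are hypothetical stand-ins) to build the Gram matrix an SVM would work with.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: an implicit mapping into a higher dimensional space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Toy 2-D points standing in for PD feature vectors (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Gram matrix: pairwise kernel evaluations over the training set.
K = [[rbf_kernel(p, q) for q in points] for p in points]

for row in K:
    print([round(v, 4) for v in row])
```

Note that every point has kernel value 1 with itself and the matrix is symmetric, which is what makes it usable as an inner product in the implicit feature space.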
Figure 1. Optimum hyperplane diagram
Figure 2. Symlet 8 mother wavelet
III. EXPERIMENTATION

An overview of the experimental method used to generate the data sets in this paper is included below; a more detailed explanation is available in [4]. In order to artificially produce signals that closely resemble those produced by on-line equipment in the field, an electrical coupling technique was used. To achieve this, each discharge source was used to inject its unique PD pulses into a 60 kV bushing (model 60HC755). A high frequency current transformer (HFCT) was used as a sensor to monitor the current passing between the bushing tap point and earth, as shown in Fig. 4. The signal from the IPEF HFCT model CAE 140/100 was recorded, analyzed and displayed using a Tektronix DPO71254 digital oscilloscope, with a bandwidth of 12.5 GHz, a sample rate of 50 GS/s and a 400 MSample memory.

A Robinson conventional PD detector was used for calibration and to quantify the apparent charge level. This allowed the PD signal amplitude (mV) to be related to a discharge quantity (pC), enabling the severity of each PD event to be quantified; an important aspect when linking the effects of PD to electrical ageing [5]. Following calibration, the required PD source was connected to the arrangement, the inception voltage was found and PD signals were recorded at a range of voltages above the inception voltage. The acquired data consists of 500 consecutive cycles at a sampling rate of 500 kS/s for each PD source and voltage. The PD data was synchronized to the 50 Hz power cycle to allow for phase resolved analysis; each individual cycle consists of approximately 10000 points (due to fluctuations in the power frequency). The time domain signal recorded over each power cycle was used as an input for both classification algorithms. Table I documents the data-sets provided by this experiment for classification.

IV. WAVELET PACKET TRANSFORM

A wavelet can be thought of as a small, irregular, asymmetric wave that has a short duration and a zero mean value. It oscillates quickly on either side of the central point and then decays, as shown in Fig. 2. Wavelet analysis has proven to be more effective than Fourier methods at analyzing inherently "spiky" signals such as PD. A range of wavelet families, each with varying properties, are available within the MATLAB programming language. The eighth order of the Symlet wavelet family, which is used as the basis function for the analysis in this report, is shown in Fig. 2.

Wavelet packet transforms are a form of multi-scale differential operator. The outcome of wavelet analysis is to decompose a signal from the time domain to the time-scale domain. This is achieved by expressing the original signal as a set of shifted and scaled versions of a single basis function ψ(t) (the mother wavelet). The original time domain signal is fed through a series of complementary high pass filters and down-sampled by a factor of two to generate higher frequency coefficients (a representation of the detail of the original signal). The original signal is also passed through low pass filters and down-sampled by two to generate lower frequency coefficients (a representation of the approximation of the original signal). The newly generated wavelet coefficients represent decomposition level 2 (decomposition level 1 is the original signal); the detail and approximation coefficients of level 2 are themselves analyzed to produce the next decomposition level, and the process continues for a user-defined number of stages.

Figure 3. Wavelet packet decomposition coefficients: a = original signal, b = approximation coefficients, c = detail coefficients.
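The filter-and-downsample step just described can be sketched as follows. This is a simplified illustration, not the paper's implementation: it uses the two-tap Haar filter pair for brevity rather than the Symlet-8 wavelet, and follows only the approximation branch to the next level.

```python
import math

# Complementary low-pass (approximation) and high-pass (detail) Haar filters.
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def decompose(signal):
    """One decomposition level: filter, then down-sample by a factor of two."""
    approx = [LOW[0] * signal[i] + LOW[1] * signal[i + 1]
              for i in range(0, len(signal) - 1, 2)]
    detail = [HIGH[0] * signal[i] + HIGH[1] * signal[i + 1]
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

sig = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical samples
a1, d1 = decompose(sig)           # detail and approximation coefficients
a2, d2 = decompose(a1)            # next level: analyze level-1 output again
print(len(a1), len(d1), len(a2))  # 4 4 2
```

In a full wavelet packet tree both the approximation and the detail coefficients are decomposed again at every level, which is what produces the binary tree structure discussed below.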
Wavelet packet analysis produces a binary tree data structure, with each node representing a set of coefficients derived from the original signal. Each layer of the tree represents a decomposition level for the analysis. The classification algorithms in this paper use different sections of the binary tree to complete their analysis. The team using probabilistic neural networks (PNN) extracts information from an entire decomposition level to use for classification. In contrast, the classification technique that utilizes the SVM extracts information from one node of each of three decomposition levels.

V. FEATURE EXTRACTION

Effective feature extraction is an integral component in the development of an intelligent classification algorithm. It involves capturing the most relevant and unique information from the experimentally produced data-sets. The literature highlights a range of methods used to extract characteristics from PD signals: phase based parameters, pulse sequence analysis, frequency spectra and statistical operators [6-9]. As two different machine learning techniques are utilized in this paper, the methods used to manipulate the experimental data into a suitable form for each technique differ slightly. The initial stage of processing is common to both algorithms and is applied to the same data-set.

A. Common feature extraction method

Wavelet packet transforms are being increasingly used to analyze and de-noise PD signals due to their flexibility and the characteristics of their user-defined basis function: the wavelet. When comparing wavelets to a sine wave (the basis of the Fourier transform), the wavelet was seen to be more effective at analyzing inherently "spiky" signals such as PD [10]. Both feature extraction methods described in this paper share the same initial processing step: wavelet packet analysis was performed on the formatted experimental data from each power cycle to produce a set of wavelet coefficients. During this process, the data is transformed from the time domain to the time-scale domain. This has the effect of reducing the dimensionality of the data and separating its high and low frequency components. The "Symlet" family of order eight was selected to produce nine decomposition levels when transforming the original data, as these have previously been used to analyze PD signals [11].

B. Support Vector Machine algorithm

PD events appear as pulses with a fast rise time and a short duration [12]. For this reason the wavelet decomposition containing the low frequency transformation was deemed to contain irrelevant information and was discarded. The detail wavelet coefficients of decomposition levels 3, 6 and 9 were extracted from the data and arranged to form a single vector of 756 dimensions. The data-sets documented in Table I were input to the feature extraction method and the feature vector of each cycle was ordered into a matrix. These vectors were then scaled between -1 and 1, to reduce the effect of large numerical values dominating the classification process.
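The assembly and scaling steps described above can be sketched as follows. This is an illustrative outline, not the authors' code: the coefficient values and vector lengths are hypothetical stand-ins, and each feature column is min-max scaled into [-1, 1] across cycles (one plausible reading of the scaling described).

```python
def build_feature_vector(detail_coeffs_by_level):
    """Concatenate detail coefficients of levels 3, 6 and 9 into one vector."""
    vec = []
    for level in (3, 6, 9):
        vec.extend(detail_coeffs_by_level[level])
    return vec

def scale_features(matrix):
    """Scale each column of a cycles-by-features matrix into [-1, 1]."""
    n_feats = len(matrix[0])
    scaled = [row[:] for row in matrix]
    for j in range(n_feats):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        for row in scaled:
            # Map [lo, hi] onto [-1, 1]; a constant column maps to 0.
            row[j] = 0.0 if hi == lo else 2 * (row[j] - lo) / (hi - lo) - 1
    return scaled

# Two hypothetical cycles with tiny stand-in coefficient sets per level.
cycle_a = build_feature_vector({3: [0.2, -0.5], 6: [1.3], 9: [0.0]})
cycle_b = build_feature_vector({3: [0.8, 0.1], 6: [-0.7], 9: [0.4]})
scaled = scale_features([cycle_a, cycle_b])
print(scaled)
```

Scaling per feature (rather than per vector) keeps any one large-valued coefficient from dominating the kernel distance computations.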
C. SVM and principal component analysis
The general operation of a kernel is to map multi-class data
into higher dimensional space, where the data clusters are
hopefully linearly separable. Once in this new space, the SVM
creates a separating hyperplane model that is later used to
classify test data. If the feature extraction method can be
designed to maximize the linear separation of the data sets
before they are input to the kernel, it is believed to be beneficial
for classification accuracy. Principal component analysis (PCA), implemented here within MATLAB, is a technique for visualizing multi-dimensional data. It generates a set of orthogonal variables that are linear combinations of the original variables [13]. PCA produces a matrix of the same dimensions as the original matrix, but the first few principal components contain 80% of the variance of the original data. Fig. 5 shows the first two principal components of the post-processed data used to train the SVM by the team from the University of Southampton. The plot shows a degree of clustering for all PD types (meaning the feature extraction method is highlighting common characteristics of the PD signals) and that the signals produced by the internal source are linearly separable from the other sources. The points related to the corona data are mostly separable, while the data from the remaining sources are fairly overlain (although some separation is noticed when performing PCA in three dimensions). This could lead to poor classification results between surface and floating PD signals.

Figure 4. Schematic diagram of experiment

Table I. Breakdown of Partial Discharge data

  PD type     Voltage level (kV)    Total no. of cycles
  Floating    23.4, 25.5            1000
  Corona      7, 8, 9, 10           2000
  Internal    21, 24.5              1000
  Surface     6, 9                  1000

  a. The 7 and 9 kV corona data was not available to the team from the University of Southampton.
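The PCA projection described above can be reproduced in a few lines. This sketch is not the MATLAB routine the authors used: it obtains the principal components from the SVD of the mean-centred feature matrix, with a random matrix standing in for the real cycles-by-features data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # hypothetical 20 cycles x 5 features

Xc = X - X.mean(axis=0)               # centre each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                    # projections onto principal components
first_two = scores[:, :2]             # coordinates for a 2-D scatter plot

explained = (S ** 2) / (S ** 2).sum() # variance fraction per component
print(first_two.shape, explained[:2].round(3))
```

Plotting `first_two` with one marker per PD class gives a scatter plot of the kind shown in Fig. 5, and `explained` indicates how much of the total variance the plotted components actually carry.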
Figure 5. Principal component plot of SVM training data (first vs. second principal components; classes: Floating, Corona, Surface, Internal).
D. Support Vector Machine training and optimization

To produce the most rigorous SVM classification model possible, the largest practical set of training data should be used; if more data is used for training, a larger number of support vectors will be generated, resulting in a more effective hyperplane model. As this classification algorithm is to be compared with the method that utilized the PNN, it was decided to mimic the method used by the team from the University of Cyprus. 100 cycles of data were randomly selected from each PD source and used to train the SVM using the radial basis function (RBF) kernel. The choice of kernel function is important, as the kernel function defines the feature space in which the training set will be classified. A training label vector was generated to allow the SVM to identify the signals produced by different PD sources. Cross-validation and a grid search were used to find the optimum values of the regularization (C) and flexibility (γ) parameters for the SVM and enable the maximum possible classification accuracy. Cross-validation involves folding the training data a user-defined number of times and using one of the folds to test the accuracy of a model trained on the others for a range of parameters; it is completed to reduce the effects of overfitting.
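The folding and grid-search mechanics described above can be sketched as follows. This is an outline only: a real run would train an RBF-kernel SVM on each fold, whereas here `train_and_score` is a hypothetical stand-in whose score simply peaks at one grid point so the selection loop can be demonstrated.

```python
from itertools import product

def k_folds(indices, k):
    """Split a list of sample indices into k (nearly) equal folds."""
    return [indices[i::k] for i in range(k)]

def train_and_score(train_idx, valid_idx, C, gamma):
    # Stand-in for SVM training and validation; returns a dummy accuracy
    # that peaks at C=10, gamma=0.1 purely for illustration.
    return 1.0 / (1 + abs(C - 10) + abs(gamma - 0.1))

samples = list(range(100))            # e.g. 100 training cycles per source
folds = k_folds(samples, 5)

best = None
for C, gamma in product([1, 10, 100], [0.01, 0.1, 1.0]):
    # Each fold takes a turn as the validation set (cross-validation).
    scores = []
    for i, valid in enumerate(folds):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        scores.append(train_and_score(train, valid, C, gamma))
    avg = sum(scores) / len(scores)
    if best is None or avg > best[0]:
        best = (avg, C, gamma)

print("best (C, gamma):", best[1], best[2])
```

Averaging the validation score over all folds before comparing grid points is what limits the influence of any one lucky train/validation split.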
E. SVM testing

The entire data-set (1000 cycles per PD source) was input to the optimized SVM model along with a testing label vector. The SVM classified the input matrix using the optimized model, the class to which each feature vector was assigned was compared with its actual class, and a classification accuracy percentage was obtained.
F. Probabilistic Neural Network algorithm

The team from the University of Cyprus used a PNN to classify the experimental data. One limitation of the PNN is that when analyzing data of high dimensionality, the complexity of the PNN itself is increased and computational time starts to become a factor. To solve this problem, the team developed a data mining algorithm that involved finding the first four moments of the probability density function of the wavelet decomposition levels, where η is the mean, σ is the standard deviation, γ is the skew and κ is the kurtosis of a distribution, as defined by (1)-(4) for a set of N coefficients x_i:

η = (1/N) Σ_{i=1}^{N} x_i                       (1)

σ = [ (1/N) Σ_{i=1}^{N} (x_i − η)^2 ]^{1/2}     (2)

γ = (1/(Nσ^3)) Σ_{i=1}^{N} (x_i − η)^3          (3)

κ = (1/(Nσ^4)) Σ_{i=1}^{N} (x_i − η)^4          (4)

The values that were produced mathematically describe the distribution of the wavelet coefficients; each of the nine decomposition levels has four descriptors, leading to a feature vector of 36 dimensions. The use of statistical moments reduced the dimensionality of the data from 900 to 36 (a reduction factor of 25).
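The four statistical descriptors above can be computed directly; a minimal sketch, with hypothetical coefficient values, is shown below. Repeating the calculation for each of the nine decomposition levels yields the 36-dimensional feature vector.

```python
import math

def moments(coeffs):
    """Return (mean, std dev, skew, kurtosis) of one set of coefficients."""
    n = len(coeffs)
    eta = sum(coeffs) / n
    sigma = math.sqrt(sum((x - eta) ** 2 for x in coeffs) / n)
    gamma = sum((x - eta) ** 3 for x in coeffs) / (n * sigma ** 3)
    kappa = sum((x - eta) ** 4 for x in coeffs) / (n * sigma ** 4)
    return eta, sigma, gamma, kappa

# One hypothetical level's wavelet coefficients; a real run has nine lists.
level_coeffs = [0.1, -0.4, 0.9, 1.2, -0.3, 0.5]

feature_vector = []
for coeffs in [level_coeffs]:       # in practice: loop over all nine levels
    feature_vector.extend(moments(coeffs))
print(len(feature_vector))          # 4 descriptors per level
```

For a perfectly symmetric distribution the skew is zero, which makes these descriptors easy to sanity-check on synthetic data.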
G. Probabilistic Neural Network training

Reference [14] explains the operation of the PNN and its implementation for a similar application. The general operation of the PNN is to learn the pattern statistics of a set of supervised training data. This is achieved by the use of Gaussian kernel summation to estimate the conditional probability density function and selection of the class displaying the minimal expected risk. If a large enough set of training data is supplied, an accurate decision boundary is constructed and test data can be classified. The PNN was trained with the first 200 cycles of corona data and 100 cycles for each of the remaining floating, internal and surface PD types. All of the experimentally produced data was then input to the PNN and a classification accuracy was found.

VI. RESULTS

An extensive data-set of experimentally produced PD signals, produced from a range of sources and voltage levels, was classified using two machine learning techniques. Two different feature extraction methods were designed (to enhance the performance of each respective machine learning method in terms of computational efficiency and classification accuracy) and implemented to capture the unique characteristics inherent to each PD source. Table II shows the classification accuracies produced by each method.

Table II. Classification accuracy results

  PD type     Classification accuracy (%)
              PNN method     SVM method
  Floating    91.9           99.63
  Corona      97.49          99.38
  Internal    100            100
  Surface     99.8           97.62
  Total       97.30          99.17

These results highlight the high accuracy of the applied methods at classifying individual PD types. The results show that with the same ratio of training to testing data, the SVM outperforms the PNN for two of the four PD sources as well as in the overall classification value. Both methods perfectly identified all of the PD signals produced by the internal source, in agreement with the analysis of Fig. 5. Both methods effectively implemented the use of wavelet packet transforms to analyze the time domain data. Future research will investigate whether these algorithms return such impressive accuracy levels if a lower signal to noise ratio is present (as found in items of plant in the field). The possibility of introducing noise signals as an independent class of signal could also be investigated. The data acquisition method in the experimental setup used to measure the artificial PD signals was aimed at obtaining useful phase resolved data. If the sampling rate was increased to fully complement the use of the HFCT, more of the higher frequency characteristics of the PD pulses could be captured and possibly used for PD source classification. If the methods were supplied with larger training data-sets, higher classification accuracies could be achieved.

VII. CONCLUSIONS

The performance of two PD source classification algorithms has been investigated. Both methods produced high classification results, with the SVM algorithm being slightly more accurate than the method implementing the PNN. A drawback of the PNN method is its limitation in processing data of high dimensionality (hence the use of a complex feature extraction method). As the complexity of the SVM model is related to the number of support vectors involved and not the dimensionality of the data, it does not experience the same problem.

REFERENCES

[1] Partial Discharge Measurements, IEC Publication 60270, 2000.
[2] S. Boggs and J. Densley, "Fundamentals of partial discharge in the context of field cable testing," IEEE Electric. Insul. Mag., vol. 16, pp. 13-18, September-October 2000.
[3] L. Hao, P. L. Lewin and S. Dodd, "Comparison of support vector machine based partial discharge identification parameters," in Conference Record of the 2006 IEEE International Symposium on Electrical Insulation, pp. 110-113, June 2006.
[4] E. Gulski and F. H. Kreuger, "Computer-aided recognition of partial discharge sources," IEEE Trans. on Electric. Insul., vol. 27, pp. 82-92, February 1992.
[5] J. P. van Bolhuis, E. Gulski and J. J. Smit, "Monitoring and diagnostic of transformer insulation," IEEE Trans. on Power Delivery, vol. 17, pp. 528-536, 2002.
[6] F. H. Kreuger, E. Gulski and A. Krivda, "Classification of partial discharges," IEEE Trans. on Electric. Insul., vol. 28, pp. 917-931, December 1993.
[7] M. Hoof and R. Patsch, "Investigation of PD resulting from transformers using the PSA method," in IEEE 1997 Annual Report on Electric. Insul. and Dielectric Phenomena, vol. 2, pp. 562-566, October 1997.
[8] Y. Tian, P. L. Lewin, A. E. Davies, S. J. Sutton and S. G. Swingler, "Application of acoustic emission techniques and artificial neural networks to partial discharge classification," in Conference Record of the 2002 IEEE International Symposium on Electric. Insul., pp. 119-123, April 2002.
[9] N. F. Ab Aziz, L. Hao and P. L. Lewin, "Analysis of partial discharge measurement data using a support vector machine," in 5th Student Conference on Research and Development, pp. 1-6, December 2007.
[10] X. D. Ma, C. Zhou and I. J. Kemp, "DSP based partial discharge characterisation by wavelet analysis," in ISDEIV 2000, vol. 2, pp. 780-783, September 2000.
[11] L. Hao, P. L. Lewin and S. G. Swingler, "Identification of multiple partial discharge sources," in Proc. of the International Conference on Condition Monitoring and Diagnosis, pp. 118-121, April 2008.
[12] G. C. Stone, B. A. Lloyd, S. R. Campbell and H. G. Sedding, "Development of automatic, continuous partial discharge monitoring systems to detect motor and generator partial discharges," in Proc. of the IEEE Electrical Machines and Drives Conference, pp. MA2-3.1-MA2-3.3, 1997.
[13] D. K. Kim and N. S. Kim, "Rapid speaker adaptation using probabilistic principal component analysis," IEEE Trans. on Signal Processing, vol. 8, pp. 180-183, 2001.
[14] D. Evagorou, A. Kyprianou, P. L. Lewin, A. Stavrou, V. Efthymiou and G. E. Georghiou, "Classification of partial discharge signals using probabilistic neural network," in Proc. of the IEEE International Conference on Solid Dielectrics, pp. 609-615, July 2007.
