Wang 2018

IET Generation, Transmission & Distribution
Research Article
Deep-learning based fault diagnosis using computer-visualised power flow
ISSN 1751-8687
Received on 8th April 2018
Revised 14th June 2018
Accepted on 2nd July 2018
E-First on 29th August 2018
doi: 10.1049/iet-gtd.2018.5254
www.ietdl.org
Songyan Wang1, Shixiong Fan2, Jianwen Chen3, Xingwei Liu2, Bowen Hao1, Jilai Yu1
1School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin, People's Republic of China
2China Electric Power Research Institute, Beijing, People's Republic of China
3School of Computer Science and Technology, Harbin Institute of Technology, Harbin, People's Republic of China
E-mail: wangsongyan@163.com
Abstract: Changes in system topology, such as branch breaking and the loss of a generator or load, may profoundly influence
the operation security of the power system. This study introduces a novel deep-learning based fault diagnosis method using
power flow to diagnose topology changes in the power system. Power flow samples with different system states and topologies
are first computed numerically; then, they are transformed into computer-visualised images. Using massive power-flow image
samples, a convolutional neural network that aims to identify the system state is trained. A feature-map restriction technique is
used to restructure the network. To enhance the robustness of the network, the random noise of branch flow is considered in the
sample generation process. The results show that the proposed deep-learning based method may diagnose system faults
effectively.
1 Introduction

1.1 Literature review
Fault diagnosis of a power system is a complicated process because the system contains many sections such as generators, transmission lines and bus bars, and their operation states are monitored by protective relays (PRs), circuit breakers (CBs) and communication equipment [1]. Fault diagnosis analysis is generally based on the data provided by Supervisory Control and Data Acquisition (SCADA) during power system disturbances. When a fault occurs, a large number of alarm messages from SCADA are provided to system engineers in a short period of time. Under this circumstance, the fault analysis in the control centre becomes a challenge. In addition, in actual industrial environments, it is quite possible that branch breaking and the loss of a generator or load occur, causing changes in the topology of the system that substantially threaten the security of the power system. However, due to the severe interference in actual industrial environments, the measurement devices might black out or might suffer from noise interference. In the first case, the faulted element is ‘unobservable’ to the system engineer, as system engineers do not know whether the line is broken or not. In the second case, the monitored branch flow can show an ‘actual’ value rather than an ideal zero, even though the line is already broken. In other words, although the faulted element appears ‘observable’ to the system engineer, the monitored value of the faulted branch flow is completely erroneous. Both cases largely confuse system engineers seeking an accurate fault diagnosis. Therefore, it is necessary to develop efficient and intelligent methods for fault diagnosis in actual system operations.
Fault diagnosis has been implemented by various approaches. These methods include Petri-net systems [2, 3], Bayesian network systems [4], optimisation methods (OM) [5], cause-effect networks (CE-Nets) [6] and expert systems (ES) [7]. Among all these methods, the artificial neural network (ANN) is widely applied in fault diagnosis [8]. ANN overcomes the limitations of conventional ES, and it reduces the dependence on prior expert knowledge. In particular, a distinctive advantage is its generalisation capability, which is particularly beneficial for fault-diagnosis problems. However, some problems with this method are still unavoidable in actual industrial environments. These problems include

(i) ANN suffers the ‘curse of dimensionality’ problem. Conventional ANN is a fully connected network and the network grows to an unmanageable size in large-scale power systems [9, 10].
(ii) The portability of the ANN is weak. Since ANN is a fully connected network, a specific power system always corresponds to its customised network structure. If an ANN is used in another system, with even slight changes in topology or elements, the ANN must be structured again to ensure that the network can be sufficiently generalised.
(iii) Although ANN uses alarm log messages as input, the alarm log messages from SCADA are often incomplete and uncertain because many PRs and CBs might malfunction when hidden failures of the protection systems occur in a blackout. If one or more PRs and CBs do not function properly, it can be very difficult to identify faulty sections [1].

In recent years, since efficient optimisation methods were proposed to train deep neural networks, deep learning has become a superior technique [11, 12]. Several deep neural networks have been proposed, including autoencoders, recurrent neural networks (RNNs), restricted Boltzmann machines (RBMs), deep belief networks (DBNs) [13], and convolutional neural networks (CNNs). Deep learning is already used in some specific areas of power systems. Deep RNN [14] and RBM [15] have been proposed for load forecasting. Autoencoders have been applied to extract features from load profiles [16]. Some deep learning techniques are also adopted in the inspection of the images of transmission lines and electricity-theft detection [17, 18]. Mocanu et al. [19] proposed a hybrid method that combines reinforcement learning with deep learning to perform on-line optimisation of schedules for building energy management systems. López et al. [20] proposed an intelligent charging strategy using deep learning to determine when to charge electric vehicles during connections. He et al. [21] exploited deep learning techniques to recognise the behavioural features of false-data injection attacks. Kong et al. [22] proposed a long short-term memory-based deep-learning forecasting framework with appliance consumption sequences. These works enrich the application of deep learning in power system fields.
Among the deep networks described above, CNN may be the most effective approach in these studies. As a milestone in artificial-intelligence (AI) history, CNN is used in AlphaGo and
IET Gener. Transm. Distrib., 2018, Vol. 12 Iss. 17, pp. 3985-3992 3985
© The Institution of Engineering and Technology 2018
Atari game agents [23, 24] that exhibit superhuman ability in human–computer combat. CNN can extract features automatically from raw data, especially images, and can provide precise identification. Compared with conventional ANN, both sparse connectivity and shared weights techniques profoundly reduce the complexity of the network for image identification (aiming at Problem (i)). From the test of the ImageNet database with millions of image samples [25], some universal structures of CNN perform quite well in identifying photograph images with different complexities, indicating that the structure of CNN is more portable than ANN (aiming at Problem (ii)). In addition, CNN captures the features of images in a ‘fuzzy’ way, i.e. the images can still be identified even if disturbance or deficiency occurs in the image samples (aiming at Problem (iii)). These distinctive features reveal that CNN has the potential to outperform ANN in the fault diagnosis of power systems.

1.2 Scope and contribution of the paper
This paper proposes a CNN-based fault diagnosis method for the detection of branch breaking as well as the loss of generators or loads using the power flow of the system. First, the post-fault power flows for different fault types are computed numerically; then, they are transformed into the form of a computer-vision-based power flow image (CVPFI). Instead of using conventional alarm log messages, the CVPFI is used as the input of the CNN because power flow is naturally sensitive to the interference of the system and because power flow also shows demonstrable plane-image characteristics. Second, a feature-map restriction technique is proposed. Redundant feature maps of the CNN after initial trainings can be removed via a similarity check, which optimises the network structure and simplifies the training process. Third, to imitate the actual post-fault environment with severe noise, the random noise of branch flow is considered in the sample generation process to enhance the robustness of the network.
The contributions of this paper are summarised as follows:

(i) The transformation rule fully utilises the distinctive image-identification advantage of the CNN. This rule provides an alternative method in the fault diagnosis of a power system.
(ii) The removal of redundant feature maps can reduce the training cost efficiently. This technique is portable and can be used in other power system fields regarding the CNN.
(iii) The random noise added in the branch flow in the sample generation process forces the CNN to extract features outside the faulted area. This novel sample generation technique makes the CNN robust enough to tolerate interference after a branch is broken.

In this paper, for simplification, we only discuss the fault case with broken branches because the method can be easily adapted to the loss of generator/load cases. The remainder of the paper is organised as follows. In Section 2, the mechanism of training the CNN with the CVPFI is stated. In Section 3, the removal of redundant feature maps in the convolutional layer is analysed. In Section 4, a sample generation technique for feature-covered CVPFIs is presented, and performance indices are introduced. In Section 5, a comprehensive case study is presented. Conclusions and discussions are given in Section 6.

2 Training CNN with CVPFI

2.1 Feasibility of using power flow to diagnose system fault
Power flow can be considered as a reflection of the operational state of the power system. For a specific system state, once the injected power flow or topology is changed, the power flow of the whole system also varies simultaneously. In addition, active power flow may change substantially and may spread over a much larger area than reactive power flow when a fault occurs, making the fault easier to identify. Thus, monitoring the active power flow of the power system from SCADA is a technically feasible approach for fault diagnosis. A demonstration of the power flow variance before and after the fault is shown in Fig. 1. In the figure, branch L4-5 is broken after the fault is cleared. Notice that variations in power flow below 0.02 p.u. are not shown in the figure. In the remainder of this paper, only active power flow is discussed.

Fig. 1 Change in power flow after a fault occurs
(1.24: the value of the pre-fault power flow; 0.98: the value of the post-fault power flow; solid arrow: the direction of the pre-fault power flow; dotted arrow: the direction of the post-fault power flow if it is changed after the branch is broken)

From Fig. 1, compared with the pre-fault power flow, both the direction and value of the branch flow may vary after a fault is cleared. In addition, the variance of power flow is severe around the faulted section, but it decays in the outer branches, making the variation of power flow after the fault spread similar to a ‘ripple’, as shown in Fig. 1. Such a complicated variation of power flow indicates that a unique power flow corresponds to a ‘one and only’ system state, and a system fault is also correlated with a unique post-fault power flow. Under this circumstance, although the measurement devices in the faulted area might suffer a blackout or noise interference, monitoring power flow outside the faulted area is theoretically feasible to identify faults due to the unique mapping between the fault and the post-fault power flows. Most importantly, the distinctive spatial feature of the power flow indicates that the CNN, which is particularly skilled at perceiving spatial images, can be used in the identification of power flow and the diagnosis of system faults.

2.2 Transformation rule between the human-visualised power flow and CVPFI
For system operators, the numerically calculated power flow in the dispatch console is expressed as geometry, digits, characters and so on, as shown in Fig. 2a. However, if such information is directly fed into the CNN, the identification of the power flow would fail, as the CNN can only perceive the colours, shapes and outlines of the image. Therefore, the transformation rule between the numerical power flow and the CVPFI should be designed because only in this manner can the distinctive advantage of the CNN in perceiving spatial images be utilised.
Technically, the CNN is more sensitive to larger areas and deeper colours in an image. Considering the physical and electrical characteristics of the power flow, the following principles can be adopted when designing the transformation rule.

(i) Area of electrical elements in the CVPFI: The area of a generator/load should be larger than that of a branch in the CVPFI, as the ‘status’ of the injected power source is higher than that of an ordinary branch in actual system operations. However, some important branches (such as tie lines in the interconnected system)
also possess a larger area than other elements in the CVPFI to highlight their status in operation.
(ii) Topology of the system in the CVPFI: The location of the electric elements (generator, load, transformer, branch etc.) in the CVPFI should be similar to that of the electrical diagram of the actual system. In this manner, both the topology of the actual system and the spatial feature of the power flow can be preserved in the CVPFI.
(iii) Colour in the CVPFI: Both the value and the direction of the power flow can be preserved in the CVPFI via pixel colours in the area of the electric elements. In this manner, the colours in the CVPFI can be utilised to reflect the digital value of the power flow.

The principles above are guidelines. A feasible and practical transformation rule is designed below:

(a) The power output of the generator or load is depicted as a square with 4 pixels.
(b) The branch/transformer flow is expressed as a line with 3 pixels.
(c) The value of the pixel is set as that of the numerically calculated power flow.
(d) The connection buses are neglected in the CVPFI, as their injected power is zero.
(e) The values of all the pixel colours in the blank area are set to zero.

A demonstration of the proposed transformation rule is shown in Fig. 2.

Fig. 2 Transformation rule between the numerical power flow and the CVPFI
(a) Human-visualised power flow, (b) Pixel matrix of the CVPFI after transformation, (c) Coloured CVPFI

From Fig. 2, the electric elements, the system topology, and the value and direction of the power flow are fully preserved in the CVPFI, which can be used as samples for the training of the CNN.
Essentially speaking, compared with the photo images in the ImageNet database, the CVPFI is a special ‘image’ because it is a reflection of the electrical power flow in the form of shapes and colours. If the power flow of the system varies, the correlated CVPFI would also vary simultaneously. Hence, the electrical principles are fully preserved in a CVPFI.

2.3 Network structure of the CNN
In this section, a typical structure of a CNN is proposed. The CNN consists of two convolutional layers, two average-pooling layers and one fully connected layer. The structure and parameter settings follow the model in [26], whose structure is widely used in handwritten digit identification. The structure is shown in Fig. 3. The hyperparameters of the CNN include the number of kernels, the kernel size of the convolutional layers, the pool size of the average-pooling layer, and the number of outputs of the last dense layer. The hyperparameters are summarised in Table 1.

Fig. 3 Structure of the CNN

Table 1 Hyperparameters of the CNN
Layer   Layer type        Hyperparameters
C1      convolution       input size: 28 × 28 × 1; kernel size: 5 × 5; kernel number: 6
P1      ave-pooling       input size: 24 × 24 × 6
C2      convolution       input size: 12 × 12 × 6; kernel size: 5 × 5; kernel number: 12
P2      ave-pooling       input size: 8 × 8 × 12
F1      fully connected   input size: 192; neuron number: 7

Since CVPFIs are variable, two convolutional layers are applied to extract the spatial features of the image. After the first convolution, the size of a feature map in C1 is 24 × 24, which is smaller than the original 28 × 28 CVPFI input. Later, after the first subsampling (P1) and the second convolution (C2), the size of a feature map in C2 becomes 8 × 8, which is much smaller than in C1. In the end, all the sub-sampled feature maps in P2 are fed into the fully connected layer to perform the final classification. More details regarding the proposed structure can be found in [26].
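The transformation rule (a)-(e) of Section 2.2 and the layer sizes of Table 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pixel coordinates and the power values are hypothetical, the layout is a two-element toy rather than the 9-bus layout of Fig. 2, and a 2 × 2 pooling window with stride-1 convolution without padding is assumed because it reproduces the sizes listed in Table 1.

```python
import numpy as np

def render_cvpfi(injections, branches, size=28):
    """Build the pixel matrix of a CVPFI; untouched (blank) pixels stay zero.

    injections: iterable of (row, col, power); each becomes a 2 x 2 square (4 pixels).
    branches:   iterable of (pixels, flow); pixels lists the 3 (row, col) cells of the line.
    """
    img = np.zeros((size, size))                   # rule (e): blank area is zero
    for row, col, power in injections:
        img[row:row + 2, col:col + 2] = power      # rules (a) and (c)
    for pixels, flow in branches:
        for r, c in pixels:
            img[r, c] = flow                       # rules (b) and (c)
    return img

# Hypothetical layout and values (assumptions for illustration only).
injections = [(2, 2, 0.72), (2, 7, 0.70)]          # one generator, one load
branches = [([(3, 4), (3, 5), (3, 6)], 0.71)]      # one branch between them
cvpfi = render_cvpfi(injections, branches)

def conv_size(n, kernel=5):
    """Side length after a stride-1 convolution without padding."""
    return n - kernel + 1

def pool_size(n, pool=2):
    """Side length after non-overlapping pooling (pool size 2 is assumed)."""
    return n // pool

c1 = conv_size(cvpfi.shape[0])   # C1 feature map side
p1 = pool_size(c1)               # P1 output side (input of C2)
c2 = conv_size(p1)               # C2 feature map side
p2 = pool_size(c2)               # P2 output side
f1_inputs = p2 * p2 * 12         # 12 feature maps flattened into F1
print(c1, p1, c2, p2, f1_inputs)
```

Under these assumptions the sizes agree with the 24 × 24, 12 × 12 and 8 × 8 inputs and the 192 inputs of F1 given in Table 1.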
2.4 How the CNN perceives the CVPFI
Once the CVPFI is set as the input of the CNN, a square window in the input image is set as the ‘receptive field’ that slides across the entire input image. The mechanism of the perception of the CNN to the CVPFI is shown in Fig. 4.

Fig. 4 Perception of the CNN

In Fig. 4, once the local receptive field is positioned at a certain area of the image, all the pixels in the area are convolved with a kernel [27]. The fast scanning of the CVPFI via the receptive field ensures that the local spatial features of the image can be captured by the CNN and can be stored in feature maps. For the CVPFI in Fig. 3, the feature maps of layers C1 and C2 are shown in Fig. 5.

Fig. 5 Feature maps in C1 and C2
(a) 6 feature maps in C1, (b) 12 feature maps in C2

From Fig. 5, the mechanism of the perception of the CNN can be inferred as described below:

(i) After the first convolution (C1), the local spatial structure of the input image is already extracted by the CNN. On closer inspection, a feature map in C1 appears as an ‘abstracted’ image of the CVPFI, as shown in Fig. 5a. However, in C1, only shallow features of the CVPFI are extracted; thus, the outline of the original CVPFI in these feature maps can still be identified by humans.
(ii) After the first pooling (P1) and the second convolution (C2), the feature maps in C2 seem ‘more’ abstracted than those in C1, as shown in Fig. 5b. This finding indicates that deep local spatial features of the CVPFI are extracted by the CNN. Notice that these deep features can be perceived only by the CNN and can no longer be perceived by humans.
(iii) After the second pooling (P2), all the deep abstracted local spatial features converge in the fully connected layer. The fully connected layer learns at an even more abstracted level, integrating global information from across the entire image to obtain a feasible classification of the CVPFI.

From the analysis above, the CNN can extract the local spatial features via feature maps in each layer and can converge them to form a global perception of the image. This approach is quite different from the classic ANN, whose input is depicted as a vertically coded line of neurons.

3 Removal of redundant feature maps in the convolutional layer
Generally, typical structures of the CNN for the identification of photograph images, such as LeNet-5, were constructed manually via hundreds of experiments because the spatial features of the normal photograph images in the ImageNet set are complicated [25], making the CNN difficult to construct. Similarly, in the structure of a CNN for the identification of the CVPFI, some feature maps might be redundant if too many feature maps are involved in the convolutional layer. However, since the spatial features of the CVPFI are much simpler than those of the photograph images in the ImageNet database, the optimisation of the structure of the CNN for the CVPFI becomes technically feasible. A demonstration of the redundant feature map is shown in Fig. 6.

Fig. 6 Redundant feature maps in the convolutional layer

In Fig. 6, one can find that C1-f1 and C1-f5 are quite similar, revealing that the two feature maps are highly correlated to the same feature in the CVPFI.
To evaluate the similarity between the two feature maps A and B, the matrix correlation coefficient rc can be defined as

\[ r_c = \frac{\sum_m \sum_n (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_m \sum_n (A_{mn} - \bar{A})^2\right)\left(\sum_m \sum_n (B_{mn} - \bar{B})^2\right)}} \tag{1} \]

where \(\bar{A} = \sum_m \sum_n A_{mn} / N_c^2\) and \(\bar{B} = \sum_m \sum_n B_{mn} / N_c^2\).

For the two feature maps A and B, the redundancy of B can be judged via the criterion

\[ r_c > \text{threshold} \tag{2} \]

If (2) holds, then the feature map B can be removed from the convolutional layer, which may reduce both the complexity of the network and the time cost of training.

4 Sample generation and performance evaluation

4.1 Generation of the standard CVPFI samples
Theoretically, the training samples for the proposed CNN can be directly obtained from the measured data in SCADA. However, to imitate more abundant system states, including those that rarely occur in actual system operation, it is necessary to generate a large number of CVPFI samples off-line. In this paper, a sample generation procedure is provided as below:

Step 1: For generator Gi and load Lj, the random injected power of the system is calculated as

\[ \begin{aligned} P_{Gi}^{(\mathrm{rand})} &= k_{Gi} P_{Gi}, & k_{Gi} &= \mathrm{rand}[0, K_{Gi}] \\ P_{Lj}^{(\mathrm{rand})} &= k_{Lj} P_{Lj}, & k_{Lj} &= \mathrm{rand}[0, K_{Lj}] \end{aligned} \tag{3} \]

In (3), KGi and KLj are pre-defined parameters that can be set according to actual system dispatch patterns. kGi and kLj are randomly generated values.
Step 2: Using the random power of the generator and load, the injected power of each bus of the system can be obtained.
Step 3: For the branch-breaking case, the broken branch is first removed from the topology of the system. For the normal case, no change is made to the original topology of the system. If the total number of fault types is Nfault_type, the power flow of the normal case and fault cases would be calculated (Nfault_type + 1) times to cover different topologies of the system.
Step 4: By iterating all the steps above Ns times, Ns × (Nfault_type + 1) samples can be obtained.

Fig. 7 Comparison between the normal case and the branch-breaking case
(a) Normal case, (b) Branch-breaking case

From the analysis above, a large number of standard samples can be generated via the changes in both the injected power and the topologies of the system. A comparison between the CVPFI of a normal case and that of a fault case is shown in Fig. 7.
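Steps 1 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a lossless DC power flow replaces the numerical power flow computation, the network is a hypothetical 3-bus ring rather than the 9-bus test system, and all names (`dc_branch_flows`, `generate_samples`, `p_dis`) are invented for the example. The optional `p_dis` argument overwrites the broken-branch flow with uniform noise in the manner of the feature-cover technique of Section 4.2; when it is left as `None`, the broken-branch flow stays at zero as in the standard samples.

```python
import numpy as np

def dc_branch_flows(branches, injections, slack=0):
    """DC power flow. branches: list of (from_bus, to_bus, reactance)."""
    n = len(injections)
    B = np.zeros((n, n))
    for f, t, x in branches:
        B[f, f] += 1 / x
        B[t, t] += 1 / x
        B[f, t] -= 1 / x
        B[t, f] -= 1 / x
    keep = [i for i in range(n) if i != slack]
    theta = np.zeros(n)                      # slack bus angle fixed at zero
    theta[keep] = np.linalg.solve(B[np.ix_(keep, keep)], injections[keep])
    return np.array([(theta[f] - theta[t]) / x for f, t, x in branches])

def generate_samples(branches, p_gen, p_load, K_gen, K_load, n_s, rng, p_dis=None):
    """Return branch-flow samples and labels (0 = normal, i = branch i-1 broken)."""
    samples, labels = [], []
    for _ in range(n_s):                                         # Step 4
        k_g = rng.uniform(0, K_gen, size=p_gen.shape)            # Step 1
        k_l = rng.uniform(0, K_load, size=p_load.shape)
        inj = k_g * p_gen - k_l * p_load                         # Step 2
        inj[0] -= inj.sum()                  # slack bus (bus 0) balances the system
        for label in range(len(branches) + 1):                   # Step 3
            alive = [i for i in range(len(branches)) if i != label - 1]
            flows = np.zeros(len(branches))  # broken branch flow stays zero
            flows[alive] = dc_branch_flows([branches[i] for i in alive], inj)
            if label > 0 and p_dis is not None:
                flows[label - 1] = rng.uniform(-p_dis, p_dis)    # feature cover
            samples.append(flows)
            labels.append(label)
    return np.array(samples), np.array(labels)

# Hypothetical 3-bus ring: removing any single branch keeps it connected.
ring = [(0, 1, 0.1), (1, 2, 0.1), (0, 2, 0.1)]
p_gen = np.array([1.0, 0.0, 0.0])
p_load = np.array([0.0, 0.6, 0.4])
rng = np.random.default_rng(0)
X, y = generate_samples(ring, p_gen, p_load, K_gen=1.5, K_load=1.5, n_s=5, rng=rng)
print(X.shape)   # n_s * (number of fault types + 1) rows
```

Each row of `X` would then be drawn into a CVPFI with the transformation rule of Section 2.2 before being used for training.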
From Fig. 7, except for the injected power, all the branch flows of the system vary after L4-5 is broken. Moreover, such spatial variation of the power flow is fully reflected in the CVPFI, which also validates the effectiveness of the transformation rule, as stated in Section 2.2.
Note that the power flow of the broken branch is set to zero in the standard samples in this section, but it is set randomly in the feature-covered samples, as analysed in the following section. This is also the essential difference between the two types of samples.

4.2 Generation of CVPFI samples using the feature-cover technique
To simulate the possible blackout and the interference to the measurement devices after branch breaking occurs, random noise can be deliberately added to the broken branch during sample generation. For the fault case with broken line l, the branch flow with noise is calculated as

\[ P_l = \mathrm{rand}[-P_{\mathrm{dis}}, P_{\mathrm{dis}}] \tag{4} \]

In (4), Pdis is a pre-defined parameter. Two CVPFIs with the same fault but different branch flow noises are shown in Figs. 8a and b, respectively. In the figure, the two samples are correlated to the same label (L4-5 broken).

Fig. 8 Two CVPFIs with the same fault but with different branch disturbances (the fault type is L4-5 broken)
(a) Branch disturbance is −0.50 p.u., (b) Branch disturbance is 0.35 p.u.

In fact, besides simulating the actual industrial environment, the key importance of adding noise to the broken branch flow is that such action could make the CNN extract features of the image outside the area of the broken branch because, once samples with the same label are set as input of the CNN for training, the disturbance would destroy the local spatial feature of the broken branch in the CVPFI. Under this circumstance, the CNN would neglect the area of the broken branch in the image because there is no common feature in it. Instead, the CNN focuses on extracting common spatial features of the samples outside the broken branch. In other words, the local spatial feature of the broken branch is likely to be ‘covered’, and it becomes blank from the perspective of the CVPFI.
The feature-cover technique in the generation of the CVPFI samples is of importance because it indicates that the fault can be identified even though not all the information of the image is perceived by the CNN, making the CNN quite robust in the application of fault diagnosis, regardless of whether the measurement devices are suffering from a blackout or noise interference. A further validation of the technique is provided in the Case Study.

4.3 Performance indices
For a classification problem, assume that the total number of the samples in the test data is Ntest. If the total number of the correctly identified samples is Nright, then the accuracy of the identification of the CVPFIs in the test data can be defined as

\[ \text{Accuracy} = N_{\mathrm{right}} / N_{\mathrm{test}} \tag{5} \]

In (5), a higher accuracy indicates that the proposed CNN has a better performance in fault diagnosis.
The CNN may become over-fitted because of the large number of weights and biases in the network. Therefore, data augmentation is a primary method to avoid overfitting of the proposed CNN. The typical data augmentation techniques that are widely applied in CNN-based image classification, including noise injection, horizontal reflection, and random sampling, can be migrated to train a large number of CVPFI samples. In addition, dropout and weight decay can also be applied in the training of the proposed CNN [27].

5 Case study

5.1 Data description
The simulations are implemented using Python and Matlab on a standard PC with an Intel Core i7-7700 CPU running at 4.20 GHz and with 16.0 GB of RAM. The deep CNN and ANN architectures are constructed based on Matlab and Theano [26, 27]. Four cases are simulated in this section, as outlined below:

Case 1: The CNN is trained with standard CVPFI samples, and the network is restructured via removal of redundant feature maps.
Case 2: The restructured CNN in Case 1 is also used in Case 2, but the network, in this case, is trained with feature-covered samples.
Case 3: A comparison is provided between the ANN and CNN.
Case 4: A comparison is provided between the SVM and CNN.

The test system is a 9-bus system [28]. The basic power flow of the system is shown in Fig. 2. In each case, the training set contains 70,000 samples, and the test set contains 9800 samples. The samples in Case 1 are generated as standard samples, and the samples in Cases 2–4 are generated as feature-covered samples with branch noise, as analysed in Section 4. Seven types of branch
breaking faults (including normal case) and the corresponding number of training and test samples are shown in Table 2.

Table 2 Fault types
Fault type   Number of training samples, 10^3   Number of test samples, 10^3
normal       10                                 1.4
L4-5         10                                 1.4
L4-6         10                                 1.4
L5-7         10                                 1.4
L6-9         10                                 1.4
L7-8         10                                 1.4
L8-9         10                                 1.4

The original structure of the CNN is shown in Table 1. The size of the input layer is 28 × 28 (also the size of a CVPFI). The output layer contains seven neurons that are correlated to seven fault types, as shown in Table 2. The activation function in the output layer is sigmoid. The parameter settings for the training of the CNN are shown in Table 3.

Table 3 Parameters for the training of the CNN
Parameter       Value
learning rate   1.0
batch size      50
epochs          50

5.2 Case 1: Trainings of the CNN with the standard CVPFI samples

5.2.1 Accuracy analysis: After the training of 50 epochs, the elapsed time for training in Case 1 is shown in Table 4. The accuracy of each epoch is shown in Fig. 9.

Fig. 9 Accuracy of the original CNN and restructured CNN in each epoch

Table 4 Time cost of the trainings in cases 1 and 2
Case   Network            Total elapsed time of 50 epochs, s
1      original CNN       2781.62
1      restructured CNN   1290.94
2      original CNN       N/A
2      restructured CNN   1302.11

From Fig. 9, the accuracy reaches 90.75% at the 12th epoch, and the maximum accuracy is 96.18%, which occurs at the 48th epoch. Generally, the accuracy increases with the epochs. The simulation reveals that the proposed method can identify both normal and fault states of the power system, thereby demonstrating the effectiveness of the proposed method for fault diagnosis.

5.2.2 Wrongly identified samples: In the 50th epoch, 374 test samples are not correctly identified. To further analyse the reason for this result, an incorrectly identified test sample is provided below. The label of the sample is L5-7 broken, but it is incorrectly identified as L7-8 broken. The CVPFI and its left-half pixel matrix are shown in Figs. 10a and b, respectively. The output of the CNN is shown in Table 5.

Fig. 10 Incorrectly identified test sample
(a) CVPFI, (b) Pixel matrix

Table 5 Output of the CNN for the CVPFI53
Fault type   Output
normal       0.042
L4-5         0.000
L4-6         0.000
L5-7         0.268
L6-9         0.001
L7-8         0.836
L8-9         0.000

From Fig. 10b, the values of the pixels in the area of L5-7 are set to zero (marked in red) because L5-7 is broken. However, since the randomly generated power of G2 is too low, i.e. the output of G2 is only 0.004 p.u., the branch flow of L7-8 is extremely low. Under this circumstance, although L7-8 is not actually broken, it appears to be already ‘broken’ from the perspective of the CNN. In fact, such an unclear CVPFI could not be identified by humans either. In the output layer, the value of L7-8 is the highest among all the outputs, as described in Table 5. This is why L7-8 broken is incorrectly given as the final result. Note that the value of L5-7 is the second highest, which indicates that the CNN identifies L5-7 as probably broken in this case.

5.2.3 Removal of redundant feature maps: Using the similarity check analysed in Section 3, the procedure of the removal of redundant feature maps is outlined as follows:

s-1: rc between two feature maps in C1 of the original CNN is calculated, as shown in Table 6. If the threshold is set to 0.6, f1 and f3, f1 and f5, and f1 and f6 are found to be highly correlated; thus f3, f5 and f6 can be removed from C1, i.e. only three feature maps are left in C1.
s-2: After restructuring and retraining, the correlation result is shown in Table 7. From Table 7, no redundant feature map occurs in C1.
s-3: The similarity check is further extended to C2. It is found that f3 and f4, and f6 and f10 are highly correlated. Thus, f4 and f10 are removed from C2.
s-4: After the second restructuring and retraining, no redundant feature map occurs at either C1 or C2. Thus, the final structure of the CNN is three feature maps in C1 and ten feature maps in C2. A demonstration of the feature maps in the final restructured CNN is shown in Fig. 11. From the figure, the feature maps in each convolutional layer are quite different, and no redundant map exists in the network after the restructuring processes.

Table 6 Correlations of the feature maps in C1 before restructuring
        C1-f1     C1-f2     C1-f3     C1-f4     C1-f5     C1-f6
C1-f1   1         0.2361    0.6086    −0.1836   −0.7753   0.6299
C1-f2   0.2361    1         −0.1652   −0.0790   −0.1247   0.0497
C1-f3   0.6086    −0.1652   1         −0.0699   −0.0708   0.1627
C1-f4   −0.1836   −0.0790   −0.0699   1         0.3952    0.0313
C1-f5   −0.7753   −0.1247   −0.0708   0.3952    1         −0.6211
C1-f6   0.6299    0.0497    0.1627    0.0313    −0.6211   1

Table 7 Correlations of feature maps in C1 after restructuring
        C1-f1     C1-f2     C1-f3
C1-f1   1         −0.4151   −0.4207
C1-f2   −0.4151   1         0.4211
C1-f3   −0.4207   0.4211    1
Fig. 11 Feature maps in the restructured CNN
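The similarity check used in steps s-1 to s-4 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: it assumes that rc is the standard 2-D correlation coefficient (mean-centred cross product divided by the root of the product of the two sums of squares), it uses the threshold of 0.6 quoted in s-1, and it keeps the lower-indexed map of each highly correlated pair. The names corr2 and redundant_maps and the toy 4 x 4 maps are hypothetical.

```python
from math import sqrt


def corr2(a, b):
    """2-D correlation coefficient between two equally sized feature maps."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = sqrt(sum((x - mean_a) ** 2 for x in flat_a)
               * sum((y - mean_b) ** 2 for y in flat_b))
    # A constant map has zero variance; treat it as uncorrelated (assumption).
    return num / den if den > 0 else 0.0


def redundant_maps(maps, threshold=0.6):
    """Return indices of maps that are highly correlated with an earlier kept map."""
    removed = set()
    for i in range(len(maps)):
        if i in removed:
            continue  # a removed map is not used as a reference
        for j in range(i + 1, len(maps)):
            if j not in removed and corr2(maps[i], maps[j]) > threshold:
                removed.add(j)
    return sorted(removed)


# Toy feature maps (hypothetical, not taken from the paper).
ramp = [[4 * m + n for n in range(4)] for m in range(4)]
scaled = [[2 * v + 1 for v in row] for row in ramp]        # affine copy of ramp
checker = [[(m + n) % 2 for n in range(4)] for m in range(4)]

to_remove = redundant_maps([ramp, scaled, checker])
print(to_remove)
```

In this toy example the affine copy is flagged as redundant while the checkerboard map is kept. After the flagged maps are removed, the network would be retrained and the check repeated, as in s-2 and s-4.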
From Table 4 and Fig. 9, after restructuring, the training elapsed
time is reduced to 46% of that of the original CNN. Moreover, the
accuracy increases more rapidly in the first ten epochs when using
the restructured CNN because the removal of feature maps
substantially simplifies the structure of the network, i.e. the number
of feature maps in both the convolutional layers and the pooling
layers is reduced, which in turn directly reduces the complexity of
training. Therefore, the restructured CNN can be trained in
fewer epochs and in a shorter time, highlighting the advantage of the
network restructuring.
5.3 Case 2: Trainings of the CNN with feature covered CVPFI samples
The restructured CNN is applied in Case 2. Three sub-cases are
provided below for comparison:
sc-1: Training samples are standard, and test samples are feature
covered.
sc-2: Both training samples and test samples are feature covered.
sc-3: Training samples are feature covered, and test samples are
standard.
The accuracy of the three sub-cases is shown in Fig. 12. The
elapsed time for training in sc-2 is shown in Table 4.
Fig. 12 Accuracy of three sub-cases
From Fig. 12, in sc-1, the accuracy is lower than 30% after 30
epochs. Such a result is of importance because it reveals that the
CNN trained by the standard samples places more focus on the
spatial features of the broken branch, and it neglects the spatial
features outside the fault area; as a result, the CNN cannot handle a
branch disturbance. Comparatively, in sc-2, the accuracy is
profoundly increased to >96% after 40 epochs because the CNN is
trained by feature-covered samples, making the CNN quite robust
in tolerating branch disturbance. In addition, the simulated result in
sc-3 proves that the CNN trained by feature-covered samples can
also identify standard samples well. Therefore, compared with the
CNN in sc-1, the CNN trained by feature-covered samples is both
robust and feasible in fault diagnosis.

5.4 Case 3: Comparison between the ANN and the proposed CNN
A comparison is made between the ANN and the restructured
CNN. The ANN is a classic three-layer network, with the input
layer, the hidden layer, and the output layer containing 15 neurons,
16 neurons, and 7 neurons, respectively. The input of the ANN is a
vertical coded line of the power flow of the generator, load and
branches. Branch disturbance is considered in the training samples
of the ANN. The accuracy of the ANN in 50 epochs is shown in
Fig. 12.
From Fig. 12, the accuracy of the ANN increases more rapidly
than that of the CNN in the first five epochs. In contrast, in the
following epochs, the accuracy of the CNN becomes higher than
that of the ANN by ∼4%. Therefore, compared with the ANN,
which uses a randomly coded line of the power flow as inputs, the
CNN has a stronger capability in identifying system faults because
the CNN has a distinctive advantage in capturing the spatial
features of the power flow. In particular, the spatial dimensional
information of the power flow of the system cannot be extracted by
the ANN because of the defect of the technique. However, by using
a local receptive field and a convolution technique to scan the
whole CVPFI, such spatial information can be captured by the
CNN. This capability explains why the CNN performs better than
the ANN in fault diagnosis.

5.5 Case 4: Comparison between the SVM and the proposed CNN
A comparison is made between the SVM and the restructured
CNN. The support vector classification is used in this case. The
input of the SVM is also set as a vertical coded line, similar to that
of the ANN in Case 3. The training set of the SVM contains 70,000
samples, and the test set contains 9800 samples. The identification
accuracy of the SVM with different values of penalty parameter C
is shown in Table 8. Other parameters of the SVM are set as the
default values using Python sklearn [29].

Table 8 Accuracy of the SVM with different penalty parameters
C       Accuracy, %
1000    87.78
500     87.66
100     87.31
10      79.70
1       68.84
0.1     57.01

From Table 8, the accuracy of the SVM is profoundly affected
by the variance of the penalty parameter. The maximum
identification accuracy of the SVM is <88%, which is even lower
than that of the ANN and the CNN. Therefore, the proposed CNN
shows a better performance and is more robust than the SVM in the
fault diagnosis of power systems.
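The penalty-parameter sweep of Case 4 can be sketched with scikit-learn's SVC class [29]. This is a minimal sketch under stated assumptions, not the authors' experiment: the data are synthetic stand-ins (7 classes and 15 input features, chosen here only to mirror the ANN layer sizes of Case 3), the sample counts are scaled down from 70,000 and 9800, and the helper names sweep_penalty and make_split are hypothetical. Only C is varied, and every other SVC parameter is left at its library default, as described for Case 4. The resulting accuracies depend on the synthetic data and are not comparable with Table 8.

```python
import numpy as np
from sklearn.svm import SVC


def sweep_penalty(X_train, y_train, X_test, y_test, c_values):
    """Train one SVC per penalty value C and return the test accuracy in %."""
    results = {}
    for c in c_values:
        clf = SVC(C=c)  # all other parameters stay at scikit-learn defaults
        clf.fit(X_train, y_train)
        results[c] = 100.0 * clf.score(X_test, y_test)
    return results


# Synthetic stand-in data (assumption): one Gaussian cluster per class.
rng = np.random.default_rng(0)
n_classes, n_features = 7, 15
centres = rng.normal(size=(n_classes, n_features))


def make_split(n_samples):
    """Draw labelled samples around the class centres."""
    y = rng.integers(0, n_classes, size=n_samples)
    X = centres[y] + 0.5 * rng.normal(size=(n_samples, n_features))
    return X, y


X_train, y_train = make_split(700)  # scaled-down stand-in for 70,000 samples
X_test, y_test = make_split(98)     # scaled-down stand-in for 9800 samples

accuracy = sweep_penalty(X_train, y_train, X_test, y_test,
                         c_values=[1000, 500, 100, 10, 1, 0.1])
for c, acc in accuracy.items():
    print(f"C = {c}: accuracy = {acc:.2f}%")
```

Replacing the synthetic arrays with the coded power-flow vectors and their fault labels would give a sweep of the kind reported in Table 8.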
6 Conclusions
A deep-learning based fault diagnosis method using computer-
visualised power flow is proposed in this paper. Based on the
numerical calculated power flow, the CVPFI can be formed via the
transformation rule. Through the training of massive CVPFI
samples, a CNN that aims to identify system faults can be obtained.
The removal of redundant feature maps in the CNN could end the
training with fewer epochs and a shorter time. The feature-cover
technique could make the CNN robust in tolerating system
interference. The case study results revealed that the CNN
proposed in this paper can identify system faults with disturbances.
In particular, the CNN trained by feature-covered CVPFIs is both
robust and feasible in terms of fault diagnosis. Finally, the CNN
has the potential to perform better than the ANN and SVM because
it has a distinctive advantage in capturing the spatial information of
the power flow. The findings of this study can also be applied to
other power system fields, even transient stability problems.
In this paper, a large number of power flow samples are
generated using a small test system. However, the generation of
power flow samples may be a challenge for those multi-machine
systems with hundreds of generators due to the convergence
problem. Solving the problem also requires the invention of some
novel sample generation techniques in future work.

7 Acknowledgments
This work was supported by the project ‘Artificial Intelligence
Based Power System Regulation Framework and Representative
Technologies of Reactive Power and Voltage Control’ of the State
Grid Corporation of China.

8 References
[1] Salehi-Dobakhshari, A., Ranjbar, A.M.: ‘Application of synchronised phasor measurements to wide-area fault diagnosis and location’, IET Gener. Transm. Distrib., 2014, 8, (4), pp. 716–729
[2] Sun, J., Qin, S.Y., Song, Y.H.: ‘Fault diagnosis of electric power systems based on fuzzy petri nets’, IEEE Trans. Power Syst., 2004, 19, (4), pp. 2053–2059
[3] Xu, L., Kezunovic, M.: ‘Implementing fuzzy reasoning petri-nets for fault section estimation’, IEEE Trans. Power Deliv., 2008, 23, (2), pp. 676–685
[4] Zhu, Y., Huo, L., Lu, J.: ‘Bayesian networks-based approach for power systems fault diagnosis’, IEEE Trans. Power Deliv., 2006, 21, (2), pp. 634–639
[5] Oliveira, A.L., de Araújo, O.C.B., Cardoso, G., et al.: ‘A mixed integer programming model for optimal fault section estimation in power systems’, Int. J. Electr. Power Energy Syst., 2016, 77, pp. 372–384
[6] Chen, W.H., Tsai, S.H., Lin, H.I.: ‘Fault section estimation for power networks using logic cause-effect models’, IEEE Trans. Power Deliv., 2011, 26, (2), pp. 963–971
[7] Lee, H.J., Ahn, B.S., Park, Y.M.: ‘A fault diagnosis expert system for distribution substations’, IEEE Trans. Power Deliv., 2000, 15, (1), pp. 92–97
[8] dos Santos Fonseca, W.A., Bezerra, U.H., Nunes, M.V.A., et al.: ‘Simultaneous fault section estimation and protective device failure detection using percentage values of the protective devices alarms’, IEEE Trans. Power Syst., 2013, 28, (1), pp. 170–180
[9] Bi, T., Yan, Z., Wen, F., et al.: ‘On-line fault section estimation in power systems with radial basis function neural network’, Int. J. Electr. Power Energy Syst., 2002, 24, (4), pp. 321–328
[10] Cardoso, G., Rolim, J.G., Zürn, H.H.: ‘Application of neural-network modules to electric power system fault section estimation’, IEEE Trans. Power Deliv., 2004, 19, (3), pp. 1034–1041
[11] LeCun, Y., Bengio, Y., Hinton, G.: ‘Deep learning’, Nature, 2015, 521, pp. 436–444
[12] Hinton, G.E., Salakhutdinov, R.R.: ‘Reducing the dimensionality of data with neural networks’, Science, 2006, 313, (5786), pp. 504–507
[13] Schmidhuber, J.: ‘Deep learning in neural networks: an overview’, Neural Netw., 2015, 61, pp. 85–117
[14] Shi, H., Xu, M., Li, R.: ‘Deep learning for household load forecasting–a novel pooling deep RNN’, IEEE Trans. Smart Grid, 2017, PP, doi: 10.1109/TSG.2017.2686012
[15] Mocanu, E., Nguyen, P.H., Gibescu, M., et al.: ‘Deep learning for estimating building energy consumption’, Sustain. Energy Grids Netw., 2016, 6, pp. 91–99
[16] Varga, E.D., Beretka, S.F., Noce, C., et al.: ‘Robust real-time load profile encoding and classification framework for efficient power systems operation’, IEEE Trans. Power Syst., 2015, 30, (4), pp. 1897–1904
[17] Nguyen, V.N., Jenssen, R., Roverso, D.: ‘Automatic autonomous vision-based power line inspection: a review of current status and the potential role of deep learning’, Int. J. Electr. Power Energy Syst., 2018, 99, pp. 107–120
[18] Zheng, Z., Yang, Y., Niu, X., et al.: ‘Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids’, IEEE Trans. Ind. Inf., 2018, 14, (4), pp. 1606–1615
[19] Mocanu, E., Mocanu, D.C., Nguyen, P.H.: ‘On-line building energy optimization using deep reinforcement learning’, IEEE Trans. Smart Grid, 2018, PP, doi: 10.1109/TSG.2018.2834219
[20] López, K.L., Gagné, C., Gardner, M.A.: ‘Demand-side management using deep learning for smart charging of electric vehicles’, IEEE Trans. Smart Grid, 2018, PP, doi: 10.1109/TSG.2018.2808247
[21] He, Y., Mendis, G.J., Wei, J.: ‘Real-time detection of false data injection attacks in smart grid: a deep learning-based intelligent mechanism’, IEEE Trans. Smart Grid, 2017, 8, (5), pp. 2505–2516
[22] Kong, W., Dong, Z.Y., Hill, D.J., et al.: ‘Short-term residential load forecasting based on resident behaviour learning’, IEEE Trans. Power Syst., 2018, 33, (1), pp. 1087–1088
[23] Silver, D., Schrittwieser, J., Simonyan, K., et al.: ‘Mastering the game of go without human knowledge’, Nature, 2017, 550, pp. 354–359
[24] Mnih, V., Kavukcuoglu, K., Silver, D., et al.: ‘Human-level control through deep reinforcement learning’, Nature, 2015, 518, pp. 529–533
[25] ‘Imagenet database’. Available at http://www.image-net.org
[26] ‘Deep learn toolbox’. Available at https://github.com/rasmusbergpalm/DeepLearnToolbox
[27] ‘Neural networks and deep learning’. Available at http://neuralnetworksanddeeplearning.com
[28] Anderson, P.M., Fouad, A.A.: ‘Power system control and stability’ (The Iowa State University Press, Ames, 1977)
[29] ‘scikit-learn’. Available at http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
