net/publication/347907440
4 authors, including:
Ehdieh Khaledian
Washington State University
All content following this page was uploaded by Ehdieh Khaledian on 26 December 2020.
Abstract—Power grid operators assess situational awareness using time-tagged measurements from phasor measurement units (PMUs) placed at multiple locations in a network. However, synchrophasor measurements are prone to anomalies, which may impact the performance of phasor-based applications. Anomalies include any deviation from expected measurements resulting from power system events or bad data. Bad data include data errors or loss of information due to failures in the supporting synchrophasor cyber infrastructure. It is necessary to flag bad data before it is used by an application. This work proposes a tool for the detection and classification of anomalous data using an unsupervised stacked ensemble learning algorithm. The proposed synchrophasor anomaly detection and classification (SyADC) tool analyzes a selected window of data points using a combination of three unsupervised methods, namely isolation forest, KMeans and LoOP. The method classifies the data as anomalies or normal data with more than 99% recall. The method also provides a probability of the data being an event or bad data with more than 99% recall. Results for the IEEE 14 and 68 bus systems with synchrophasor data obtained using a Real-Time Digital Simulator and data from industrial PMUs highlight the superiority of the algorithm in detecting and classifying anomalies.
Index Terms—Phasor Measurement Units (PMU), Kalman Filter, Isolation Forest, Anomaly Detection, Event Detection.
I. INTRODUCTION
[...] estimation, event detection [10]–[12], protection system failure diagnosis [13]–[15], system instability detection [16], etc. The complex infrastructure needed to deliver high-resolution time-stamped data makes synchrophasor measurements prone to bad data, such as missing data or outliers. Synchrophasor data, with its high reporting rate, tends to capture the power system events occurring in the system, called event data [17]. Event data can be faults in the system, switching operations, load changes, generator drop, and other such power system events. If fed with low-quality or erroneous data, applications might produce a misleading result. Therefore, it is pertinent to detect an anomaly and classify it as bad data or event data to aid and improve the performance of synchrophasor-based applications.
The anomalies in PMU data can be modeled as outliers in time-series data (i.e., a sequence of data points). Identification of outliers in PMU time-series data is proposed in [18]. A Kalman filter-based algorithm for conditioning of synchrophasor data is developed in [19]. Several clustering-based methods for detecting bad data in PMUs have been proposed in recent times [20], [21]. Statistical, clustering and unsupervised machine learning-based methods to detect bad data have been developed in [22]. Further, [23]–[25] provide more insight into power system event detection and recovery [...]
1949-3053 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Washington State University. Downloaded on December 26,2020 at 03:54:41 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TSG.2020.3046602, IEEE
Transactions on Smart Grid
[...] is used. The proposed method detects the anomalies with a recall (true positive rate) of 99%. Furthermore, the correlation between the PMUs is computed to detect events and PDC errors, thus increasing the algorithm's accuracy. The significant contributions of this work are summarized as follows:
1) Real-time anomaly detection using unsupervised machine learning with high accuracy and a limited memory bound.
2) Classification of anomalous data into bad data and power system events using data correlation techniques.
3) Detection of anomalies in PMU data due to PDC errors.
Section II provides an overview of PMU data flow and data anomalies. Section III discusses the effect of data anomalies on applications. The proposed methodology is presented in Section IV. Results and concluding remarks are provided in Section V and Section VI, respectively.
[...] bad results for the application due to inconsistencies in archival, such as mixing of PMU IDs, data drop or PDC corruption.
These issues might corrupt the PMU data, making it unfit for use in an application. Therefore, pre-processing of data is required before an application uses it. Data received from PMUs can have outliers, missing data and event data. The data set shown in Fig. 2 contains four outlier points and two instances of missing data in chunks. It also contains three power system events, seen as transients in the plot. SyADC aims to provide the user of PMU data with real-time information about the quality of the data. This additional information will aid the application's performance. For example, if there is bad data, the application will wait for good data to arrive. Similarly, an application designed to work in a quasi-steady state will benefit if SyADC informs it in real time that the reported data is event data.
Applying the iforest to a window magnifies the anomalies in the window. The output of this step is the input to the next algorithm, which gives the probability of each data point being an outlier with respect to the normal data.
B. Probability Using LoOP
To compute the probability that a particular data point in a window is a local density outlier, the local outlier probability (LoOP) [36] algorithm is applied. The output obtained from the iforest is fed to the LoOP algorithm. LoOP closely resembles the popular local outlier factor (LOF) [37] algorithm and normalizes the outlier factors to probabilities. It uses statistical theory to determine the final score, which indicates the probability that an observation in a window is a local density outlier. These probabilities facilitate the comparison of a data point with its neighbors in the dataset.
LoOP is then applied to compute the probability that an observation $s \in \rho$ is an outlier. This probability is derived from a so-called standard distance from $s$ to reference points $R$:
\[ \sigma(s, R) = \sqrt{\frac{\sum_{r \in R} \mathrm{dist}(s, r)^2}{|R|}} \tag{7} \]
where $\mathrm{dist}(s, r)$ is the distance between $s$ and $r$ given by a distance metric (e.g., Euclidean or Manhattan distance).
The probabilistic set distance of a point $s$ to reference points $R$ with 'significance' $\lambda$ (usually 3, corresponding to 98% confidence) is defined as:
\[ \mathrm{pdist}(\lambda, s, R) = \lambda \cdot \sigma(s, R) \tag{8} \]
From this step onward, nearest neighbors (NN) are used as reference sets. NN here refers to the nearest Euclidean distance between observations resulting from the iforest. For a given neighborhood size $k$ and significance $\lambda$, the probabilistic local outlier factor (PLOF) of data point $s$ is defined as:
\[ \mathrm{PLOF}_{\lambda,k}(s) = \frac{\mathrm{pdist}(\lambda, s, NN_k(s))}{E_{r \in NN_k(s)}[\mathrm{pdist}(\lambda, r, NN_k(s))]} - 1 \tag{9} \]
Finally, this is used to define local outlier probabilities. Given the previous equations (7), (8) and (9), the probability that a data point $s \in \rho$ is a local outlier is defined as:
\[ \mathrm{LoOP}_{\lambda,k}(s) = \max\left\{0, \operatorname{erf}\left(\frac{\mathrm{PLOF}_{\lambda,k}(s)}{n\mathrm{PLOF} \cdot \sqrt{2}}\right)\right\} \tag{10} \]
\[ n\mathrm{PLOF} = \lambda \cdot \sqrt{E[\mathrm{PLOF}^2]} \tag{11} \]
Let $L$ be the output of LoOP (10), which is a set of probabilities of size $w$ with $0 \le l_i \le 1, \forall l_i \in L$. The value $l_i$ is the probability that a data point is an outlier, and it ranges from 0 to 1.
C. Clustering Using KMeans
The output of iforest is fed to a KMeans [38] algorithm as the second-layer learner. This learner classifies the results from the iforest into binary clusters of normal/anomaly classes. Let the initial set of cluster means of KMeans be $m_1^{(1)}, \ldots, m_k^{(1)}$. Each observation $s_p$ is assigned to the cluster with the least squared Euclidean distance, as given by (12).
\[ C_i^{(t)} = \left\{ s_p : \left\| s_p - m_i^{(t)} \right\|^2 \le \left\| s_p - m_j^{(t)} \right\|^2 \ \forall j, 1 \le j \le k \right\} \tag{12} \]
The cluster means in each step can be updated using (13).
\[ m_i^{(t+1)} = \frac{1}{\left| C_i^{(t)} \right|} \sum_{s_j \in C_i^{(t)}} s_j \tag{13} \]
KMeans minimizes the within-cluster sum of squared distances to $\mu_i$, where $\mu_i$ is the mean of points in $C_i$. This is equivalent to minimizing the pairwise squared deviations of points in the same cluster (15):
\[ \arg\min_{C} \sum_{i=1}^{k} \frac{1}{2 |C_i|} \sum_{x, y \in C_i} \| x - y \|^2 \tag{15} \]
Let $C$ be the output of KMeans, which is a set of cluster labels of size $w$ with $c_i = 1$ or $c_i = 0, \forall c_i \in C$.
D. Ensemble of Observations
The final output is computed by combining the results of LoOP ($l_i$) and KMeans ($c_i$) for observation $i$. The probability that observation $i$ is an anomaly is obtained as follows (17):
\[ P = \{ p_i, p_{i+1}, p_{i+2}, \ldots, p_{i+w} \} \tag{16} \]
\[ P = C \cdot L = \{ c_i l_i, c_{i+1} l_{i+1}, \ldots, c_{i+w} l_{i+w} \} \tag{17} \]
In this problem, $K = 2$ is used, where $K$ is the number of clusters, so the KMeans algorithm divides the observations into two groups. KMeans is sensitive to changes because it minimizes the sum of squares, which puts more weight on instances that differ from the normal data. It puts normal data into one group. However, it might put normal data with small changes into the second group, whose members are considered outliers. The normal data is annotated as zero and outliers as one, so $c_i = 0$ denotes normal data and $c_i = 1$ denotes outliers. Also, a value of $l_i$ (the prediction result from LoOP) closer to 0 indicates normal data. By multiplying the KMeans cluster results with the LoOP results, the data labeled normal by KMeans is masked, which increases the precision of the ensemble method because it prevents normal data from being assigned to the outliers.
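The clustering and masking steps in (12), (13) and (17) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SyADC implementation: the iforest scores and LoOP probabilities below are made-up values, the function names are hypothetical, the centers are initialized at the minimum and maximum score, and the smaller cluster is taken to be the anomalous one. The 0.1 cut-off is the threshold quoted in the data classification step.

```python
def kmeans_two_clusters(scores, iterations=20):
    """Lloyd's algorithm with k = 2 on one-dimensional scores, per (12)-(13)."""
    centers = [min(scores), max(scores)]  # simple initialization (assumption)
    labels = [0] * len(scores)
    for _ in range(iterations):
        # (12): assign each score to the center with the least squared distance
        labels = [0 if (s - centers[0]) ** 2 <= (s - centers[1]) ** 2 else 1
                  for s in scores]
        # (13): move each center to the mean of the scores assigned to it
        for c in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    # Assumption: the smaller cluster holds the anomalies and is labeled 1.
    ones = sum(labels)
    if ones > len(labels) - ones:
        labels = [1 - lab for lab in labels]
    return labels


def ensemble(cluster_labels, loop_probs):
    """(17): element-wise product P = C . L."""
    return [c * l for c, l in zip(cluster_labels, loop_probs)]


# Hypothetical iforest scores and LoOP probabilities for a window of 10 points.
iforest_scores = [0.41, 0.72, 0.40, 0.42, 0.69, 0.43, 0.41, 0.47, 0.42, 0.40]
loop_probs = [0.02, 0.53, 0.00, 0.01, 0.41, 0.00, 0.03, 0.30, 0.00, 0.01]

labels = kmeans_two_clusters(iforest_scores)
p = ensemble(labels, loop_probs)
flags = ["Anomaly" if pi > 0.1 else "Normal" for pi in p]
print(list(zip(p, flags)))
```

In this made-up window the eighth point has a LoOP probability of 0.30 but falls in the normal KMeans cluster, so the product masks it to zero and it is flagged as normal.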
E. Data classification
A combination of the results from KMeans and LoOP is used to detect the anomalies. If $p_i \le 0.1$ for all $p_i \in P$, it indicates either no anomalies in the data or a PDC error in the whole window. If $p_i > 0.1$, there are anomalies in the window. The threshold with the smallest error is defined as the optimized threshold, where the error is the ratio of the number of falsely detected data points to the total number of data points. To obtain this, the algorithm was run on several datasets for a range of thresholds from 0 to 1. For data sets with 5%, 10%, and 15% anomaly rates, a global minimum was observed when the threshold is about 0.1. If $p_i$ is greater than 0.1 for 50% of the data points, it is not possible to separate the normal data from anomalies. In such a case, the data window is extended to 1.5 times the original window size and the algorithm is applied again. In this case, to determine the type of anomaly, the correlation between individual PMU data streams such as I, V and F (Table I) is checked. For this, the Pearson correlation (18) is used. The Pearson correlation coefficient is a widely used measure of the linear relationship between two normally distributed variables and is thus often just called the correlation coefficient [39]. The correlation values vary from -1 to +1. These values are interpreted using thresholds inferred from previous studies [39]–[41] and experiments (Fig. 6). A value of $-0.4 \le corr \le 0.4$ indicates no correlation between data streams, and it can be inferred as bad data. Correlation values of $0.4 < corr < 0.7$ and $-0.7 < corr < -0.4$ indicate a moderate correlation, which might be the impact of an event from another zone. Therefore, the algorithm requires two more steps to detect the events. The same applies to correlation values of $0.7 < corr < 1$ and $-1 < corr < -0.7$, which indicate a strong correlation. Therefore, in the next level, the algorithm calculates the correlation between PMUs in the same zone using (19). An event will affect the data streams from all PMUs in a PDC. If the correlation among PMUs belonging to a certain PDC is $-0.4 \le corr \le 0.4$, then the algorithm suggests that there is bad data. Otherwise, the algorithm calculates the correlation between the PMUs of different PDCs (Table I). A value of $|corr| \le 0.4$ indicates no correlation between data streams, and it can be inferred as bad data [39]. For $|corr| > 0.4$, two more steps are required to detect the events. First, the correlation between PMUs in the same PDC is calculated using (20). If the correlation between PMUs in a PDC is less than 0.7, then it is concluded to be bad data. Otherwise, the correlation between PMUs of different PDCs is calculated (Table I).
For the correlation between two different PMUs, the matrix correlation is defined as (19). For two PDCs, the correlation is defined as (20).
\[ corr(X, Y) = \frac{cov(X, Y)}{\sigma_X \sigma_Y} \tag{18} \]
\[ corr2(A, B) = \frac{\sum_{i=1}^{n} corr(A_{F_i}, B_{F_i})}{n} \tag{19} \]
\[ corr2D(Z_1, Z_2) = \frac{\sum_{j=1}^{pm_1} \sum_{k=1}^{pm_2} corr2(PMU_j, PMU_k)}{pm_1 \, pm_2} \tag{20} \]
where $F_i$ in (19) represents the $i$th feature of the PMU, and $PMU_j$ is the $j$th PMU in the PDC. If the correlation between PMUs in adjacent PDCs is less than 0.4, it is labeled as bad data. For a correlation value greater than 0.4, a power system event is identified.
The distribution of the correlation between two PMUs belonging to the same PDC is shown in Fig. 6. First, the correlation for 806 different windows of size 10 (8060 data points) was computed for the test data set. Then, the distribution of the correlations was plotted for the 806 windows using box plots. The median value (indentation in the box) for normal data was about 0.85; it is not shown because the correlation for normal data is not required. For bad data, however, the correlation value is close to 0.30. Importantly, the boxes for bad data did not overlap with those for PDC errors, normal data, and events, which demonstrates the distinct correlation ranges. It was observed that the correlation for anomalies varied from 0 to 0.4, with the interquartile range mainly between 0.2 and 0.4. For events and PDC errors, it ranged from 0.7 to 1. The correlation for event occurrences ranged from 0.7 to 0.95, and for PDC errors the correlation was very high, at more than 0.9. The PDC error correlation was high because the PMUs have almost zero values for all data points. A correlation of 0.4 to 0.7 usually occurs for an event in a neighboring zone. The data set analyzed to derive the thresholds is from the IEEE 14 bus system simulated in a real-time digital simulator (RTDS).
Fig. 6: Correlation distribution between two PMUs
F. PDC error detection
After detecting the anomalies, the next step is the detection of PDC errors.
\[ \mu(D) = \frac{\sum_{j=1}^{n} \sum_{i=1}^{w} F_{ij}}{n w} \tag{21} \]
If $\mu(D) \le 0.2$ for a PMU in a zone, the data in the PMU might have missing packets.
If a PDC malfunctions, it reports '0' or 'NaN', depending upon settings, for all data streams in the PMUs. All the 'NaN' values are converted to zeros for simplicity. Therefore, if the average [...]
[...] anomalies are correlated with a specific PDC, that will also be detected.
V. RESULTS
To validate the proposed method, two test systems were modeled in the RTDS, and PMU data from the hardware PMU and OpenPDC software were obtained. Different power system events, such as faults, generator drop, load change, transformer tap change and capacitor bank switching, were modeled at different buses. The SyADC algorithm was run on each PMU data set, and the results are presented in the subsection below.
Fig. 8: IEEE 14 Bus network with distribution of PMUs
A. Results
The length of the data window was 8060 points at a rate of 30 frames per second. Bad data was injected into five different fractions of the data set. The percentage of bad data injections was varied from 5% to 15%. The range of data anomalies varied from 0.07% to 50%. For example, if a selected data point has a voltage of 1 p.u., a random function that follows a normal distribution updates it to a value in the range of 0.5-0.93, 1.07-1.5 or zero (missing data).
Outlier detection is an imbalanced classification problem. Data is classified as anomalous or normal, with the normal category representing the majority of the data points. In this problem, accuracy is not a proper measure for assessing model performance. Therefore, two metrics, recall and precision, were used for evaluating the results [42]. First, false positives (FP), false negatives (FN), true positives (TP) and true negatives (TN) were computed. Following this, the precision and recall were computed using (22) and (23):
\[ \mathrm{precision} = \frac{TP}{TP + FP} \tag{22} \]
\[ \mathrm{recall} = \frac{TP}{TP + FN} \tag{23} \]
where TP is the number of bad data points detected as bad data, FN is the number of bad data points detected as normal data, TN is the number of normal data points detected as normal data, and FP is the number of normal data points detected as bad data. Table II presents the recall and precision for PMU Bus 6 and PMU Bus 8 of Fig. 8. As seen in Table II, the rates of bad data ranged from 5% to 15%.
Fig. 9 presents a detailed test analysis of PMUs at Bus 6 and Bus 10, which belong to the same PDC. In Bus 6 and Bus 10, two instances of PDC errors were simulated, totaling 80 data points. Bus 6 also had an event simulated. First, the events, PDC errors, and bad data were all detected as anomalies. Next, the correlation processing classified the types of anomalies. In scenario A, a small number of data points were normal, and the majority were zeros (PDC error). At first, the unsupervised algorithm detected the smaller cluster as anomalies. The average value µ(D) of the major cluster was calculated, and it was less than the set threshold (0.2 p.u.). Therefore, the major cluster was detected as PDC error and the smaller one as normal data points. In scenario B, all the data points had the same distribution, so the ensemble algorithm detected those as normal data. However, in the post-processing step, µ(D) < 0.2 p.u. Therefore, the correlation between all the PMUs of the same PDC was computed. A strong correlation was detected, thus indicating a case of PDC error. In scenarios C and D, an individual missing data point and an outlier were simulated, and both were detected by the unsupervised algorithm. Since these were individual anomalies, a weak correlation was detected, leading to accurate classification of the anomalies. In scenario E, anomalies were first detected by the unsupervised algorithm. In the post-processing step, the correlation was calculated between data streams (current and voltage in Fig. 9). A strong correlation was observed; hence the correlation with other neighboring PMUs was computed. In this particular case, there was a correlation between PMUs at Bus 6 and Bus 10, although the event occurred at Bus 6. In scenario F, the data were normal, and the proposed algorithm correctly identified the normal data. Note that all the other anomalies simulated for the test case shown in Fig. 9 were also detected but were not color-coded, for simplicity of explanation.
TABLE II: TEST RESULTS FOR BUS 6 AND 8
PMU      Metric      Anomaly Rate
                     5%      7.5%    10%     13%     15%
Bus 6    TP          424     608     882     1051    1174
Bus 6    FP          82      80      67      81      76
Bus 6    TN          7560    7374    7110    6909    6778
Bus 6    FN          1       5       8       26      39
Bus 6    Recall      0.997   0.99    0.99    0.97    0.97
Bus 6    Precision   0.84    0.89    0.93    0.93    0.94
Bus 8    TP          418     609     875     1042    1180
Bus 8    FP          67      63      43      81      63
Bus 8    TN          7580    7390    7142    6920    6788
Bus 8    FN          2       5       7       24      36
Bus 8    Recall      0.995   0.99    0.99    0.98    0.97
Bus 8    Precision   0.86    0.90    0.95    0.93    0.95
Table III shows the data for a window of size 10 with three data streams (voltage, current and frequency) and the probability results. The probabilities were zero for all data points except data points 2, 5, and 8. Data points 2 and 5 were detected as anomalies, but the probability of being an anomaly for data point 8 was less than the selected threshold. Therefore, it was considered normal data.
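As a quick check of (22) and (23), the short Python sketch below recomputes precision and recall from the PMU Bus 6 counts at the 5% anomaly rate in Table II. It also computes the accuracy of a trivial classifier that labels every point as normal, which is an illustration added here (not a result from the paper) of why accuracy is a poor measure for imbalanced data. The function and variable names are hypothetical.

```python
def precision(tp, fp):
    """(22): fraction of detected anomalies that are true anomalies."""
    return tp / (tp + fp)


def recall(tp, fn):
    """(23): fraction of true anomalies that were detected."""
    return tp / (tp + fn)


# Counts for PMU Bus 6 at a 5% anomaly rate, taken from Table II.
tp, fp, tn, fn = 424, 82, 7560, 1
total = tp + fp + tn + fn

bus6_precision = precision(tp, fp)
bus6_recall = recall(tp, fn)

# Accuracy of a trivial classifier that labels every point as normal:
# it is right on all truly normal points (TN + FP) and finds no anomalies.
all_normal_accuracy = (tn + fp) / total

print(round(bus6_precision, 2), round(bus6_recall, 4), round(all_normal_accuracy, 3))
```

The trivial classifier scores close to 95% accuracy while detecting nothing, which is why precision and recall are reported instead.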
Fig. 9: Plot for two PMUs showing different scenarios for bad data, events, and PDC errors detected using SyADC. A) A smaller part of the window is normal data points, and the majority are PDC error B) PDC error C) Individual missing data D) Individual outlier E) Event F) Normal data.
TABLE III: DATA CLASSIFICATION USING PROBABILITY VALUE
S.No.   Voltage   Current   Frequency   Probability   Flag
1       7.46      1.87      60.02       0             Normal
2       7.46      0.87      60.02       0.53          Anomaly
3       7.46      1.87      60.02       0             Normal
4       7.46      1.87      60.02       0             Normal
5       8.06      1.88      60.02       0.41          Anomaly
6       7.47      1.89      60.02       0             Normal
7       7.47      1.89      60.02       0             Normal
8       7.46      1.87      60.02       0.024         Normal
9       7.47      1.89      60.02       0             Normal
10      7.47      1.89      60.02       0             Normal
Tables IV and V compare the performance of the proposed tool with other methods. For the data with 15% anomalies, the precision is almost 96%. Recall for the data with 5% error is 99.7%.
TABLE IV: COMPARATIVE ANALYSIS OF SYADC
Anomaly Rate   Method                   Precision   Recall
5%             Linear Regression [43]   0.78        0.857
5%             DBSCAN [44]              0.80        0.901
5%             PMUEnsemble [22]         0.86        0.90
5%             SyncAD [27]              0.88        0.936
5%             SyADC                    0.86        0.997
10%            Linear Regression        0.82        0.83
10%            DBSCAN                   0.81        0.93
10%            PMUEnsemble              0.90        0.94
10%            SyncAD                   0.91        0.947
10%            SyADC                    0.93        0.99
15%            Linear Regression        0.86        0.80
15%            DBSCAN                   0.82        0.94
15%            PMUEnsemble              0.92        0.95
15%            SyncAD                   0.93        0.965
15%            SyADC                    0.95        0.97
TABLE V: SYADC FEATURES
Feature      Reg. [43]   DBSCAN [44]   PMUEns. [22]   SyncAD [27]   SyADC
Bad Data     Yes         Yes           Yes            Yes           Yes
Events       No          No            No             No            Yes
PDC Faults   No          No            No             No            Yes
If the data is highly polluted (≈ 15%), the recall of the method still remains more than 97%, as it detects the anomalies locally for small windows of data.
Table VI shows the processing time for data with four different window sizes. The processing time for a window size of 100 data points and a 15% rate of anomalies is less than 0.13 seconds. A larger window size results in better accuracy of the tool; however, the time for anomaly detection and classification increases, which is not desirable for real-time application. A window size of 10 provides the desirable accuracy of data to be fed to PMU applications.
TABLE VI: PROCESSING TIME PER WINDOW
Window Size (# of data points)   Processing Time (Seconds)
10                               0.105
20                               0.113
50                               0.116
100                              0.129
For the selected thresholds, the area under the curve (AUC) of the receiver operating characteristic (ROC) for the proposed method is about 99%. The AUC-ROC for three other approaches compared to the proposed method is shown in Fig. 10.
The principal clustering algorithm iforest uses a small sub-sampling size, which allows it to perform anomaly detection efficiently with a minimal memory footprint. Also, iforest has a space complexity of $O(n)$ because, at worst, it needs to traverse $(\tfrac{1}{2} \times n)$ of the array, which requires a low memory bound. The time complexity is $O(\log n)$ because the array is cut in half at every iteration. For KMeans and LoOP, the result of the iforest is used as input, which is only one-dimensional, and therefore memory usage is low. For example, the space complexity of KMeans is $O((m+k)n)$, where $m$ is the number of objects and $n$ is the number of attributes of the $n$-dimensional objects, which here is one. Therefore, the space complexity is $O(m+k)$.
[2] Y. V. Makarov, P. Du, S. Lu, T. B. Nguyen, X. Guo, J. Burns, J. F. Gronquist, and M. Pai, “PMU-based wide-area security assessment: concept, method, and implementation,” IEEE Trans. Smart Grid, vol. 3, no. 3, pp. 1325–1332, 2012.
[3] P. Kundu and A. K. Pradhan, “Enhanced protection security using the system integrity protection scheme (SIPS),” IEEE Trans. Power Del., vol. 31, pp. 228–235, Feb. 2016.
[4] M. A. Donolo, “Advantages of synchrophasor measurements over SCADA measurements for power system state estimation,” tech. rep., Schweitzer Engineering Laboratories, 2006.
[5] M. Göl and A. Abur, “A hybrid state estimator for systems with limited number of PMUs,” IEEE Trans. Power Syst., vol. 30, no. 3, pp. 1511–1517, 2015.
[6] S. A. N. Sarmadi, A. S. Dobakhshari, S. Azizi, and A. M. Ranjbar, “A sectionalizing method in power system restoration based on WAMS,” IEEE Trans. Smart Grid, vol. 2, no. 1, pp. 190–197, 2011.
[7] Y. Guo, K. Li, D. Laverty, and Y. Xue, “Synchrophasor-based islanding detection for distributed generation systems using systematic principal component analysis approaches,” IEEE Trans. Power Del., vol. 30, no. 6, pp. 2544–2552, 2015.
[8] G. Liu, J. Quintero, and V. M. Venkatasubramanian, “Oscillation monitoring system based on wide area synchrophasors in power systems,” in 2007 iREP symposium-bulk power system dynamics and control-VII. Revitalizing Operational Reliability, pp. 1–13, IEEE, 2007.
[9] S. Pandey, A. K. Srivastava, P. Markham, M. Patel, et al., “Online estimation of steady-state load models considering data anomalies,” IEEE Trans. Ind. App., vol. 54, no. 1, pp. 712–721, 2018.
[10] P. Kundu and A. K. Pradhan, “Real-time event identification using synchrophasor data from selected buses,” IET Generation, Transmission Distribution, vol. 12, no. 7, pp. 1664–1671, 2018.
[11] O. P. Dahal, S. M. Brahma, and H. Cao, “Comprehensive clustering of disturbance events recorded by phasor measurement units,” IEEE Trans. Power Del., vol. 29, pp. 1390–1397, Jun 2014.
[12] C. Sun, X. Wang, Y. Zheng, S. Chen, and Y. Yue, “Early warning system for spatiotemporal prediction of fault events in a power transmission system,” IET Generation, Transmission & Distribution, vol. 13, no. 21, pp. 4888–4899, 2019.
[13] A. Gholami, A. K. Srivastava, and S. Pandey, “Data-driven failure diagnosis in transmission protection system with multiple events and data anomalies,” Journal of Modern Power Systems and Clean Energy, vol. 7, no. 4, pp. 767–778, 2019.
[14] P. Khaledian, B. K. Johnson, and S. Hemati, “Power grid resiliency improvement through remedial action schemes,” in IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society, pp. 774–779, IEEE, 2018.
[15] R. Dubey, S. R. Samantaray, and B. K. Panigrahi, “An spatiotemporal information system based wide-area protection fault identification [...]
[...] IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5, IEEE, 2019.
[25] K. Chatterjee, K. Mahapatra, and N. R. Chaudhuri, “Robust recovery of PMU signals with outlier characterization and stochastic subspace selection,” IEEE Trans. Smart Grid, 2019.
[26] E. McCollum, J. Bestebreur, J. Town, and A. Gould, “Correlating protective relay reports for system-wide, post-event analysis,” in 2018 71st Annual Conference for Protective Relay Engineers (CPRE), pp. 1–6, Mar. 2018.
[27] S. Pandey, A. Srivastava, and B. Amidan, “A real time event detection, classification and localization using synchrophasor data,” IEEE Trans. Power Syst., pp. 1–1, 2020.
[28] P. NASPI, “PMU data quality: A framework for the attributes of PMU data quality and quality impacts to synchrophasor applications,” 2017.
[29] H. Gharavi and B. Hu, “Synchrophasor sensor networks for grid communication and protection,” Proceedings of the IEEE, vol. 105, no. 7, pp. 1408–1428, 2017.
[30] A. Sundararajan, T. Khan, A. Moghadasi, and A. I. Sarwat, “Survey on synchrophasor data quality and cybersecurity challenges, and evaluation of their interdependencies,” Journal of Modern Power Systems and Clean Energy, vol. 7, no. 3, pp. 449–467, 2019.
[31] A. S. Chowdhury, E. Khaledian, and S. L. Broschat, “Capreomycin resistance prediction in two species of mycobacterium using a stacked ensemble method,” Journal of applied microbiology, 2019.
[32] M. E. Aminanto, L. Zhu, T. Ban, R. Isawa, T. Takahashi, and D. Inoue, “Automated threat-alert screening for battling alert fatigue with temporal isolation forest,” in 2019 17th International Conference on Privacy, Security and Trust (PST), pp. 1–3, IEEE, 2019.
[33] T. Zhang, E. Wang, and D. Zhang, “Predicting failures in hard drivers based on isolation forest algorithm using sliding window,” in Journal of Physics: Conference Series, vol. 1187, p. 042084, IOP Publishing, 2019.
[34] Z. Zou, Y. Xie, K. Huang, G. Xu, D. Feng, and D. Long, “A docker container anomaly monitoring system based on optimized isolation forest,” IEEE Trans. Cloud Computing, 2019.
[35] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422, IEEE, 2008.
[36] H.-P. Kriegel, P. Kröger, E. Schubert, and A. Zimek, “Loop: local outlier probabilities,” in Proceedings of the 18th ACM conference on Information and knowledge management, pp. 1649–1652, ACM, 2009.
[37] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, “Lof: identifying density-based local outliers,” in ACM sigmod record, vol. 29, pp. 93–104, ACM, 2000.
[38] N. Grira, M. Crucianu, and N. Boujemaa, “Unsupervised and semi-supervised clustering: a brief survey,” A review of machine learning techniques for processing multimedia content, vol. 1, pp. 9–16, 2004.
scheme,” International Journal of Electrical Power & Energy Systems, [39] H. Akoglu, “User’s guide to correlation coefficients,” Turkish journal of
vol. 89, pp. 136–145, 2017. emergency medicine, vol. 18, no. 3, pp. 91–93, 2018.
[16] A. Srivastava, S. Pandey, M. Zhou, P. Banerjee, and Y. Wu, “Ensemble [40] R. Artusi, P. Verderio, and E. Marubini, “Bravais-pearson and spearman
based technique for synchrophasor data quality and analyzing its impact correlation coefficients: meaning, test of hypothesis and confidence
on applications,” in North American Synchrophasor Initiative (NASPI), interval,” The International journal of biological markers, vol. 17, no. 2,
Gaithersburg, MD, pp. 1–24, March 2017. pp. 148–151, 2002.
[17] W. Li, C. Wen, J. Chen, K. Wong, J. Teng, and C. Yuen, “Location [41] Z. Gao, D. Kong, and C. Gao, “Modeling and control of complex
identification of power line outages using PMU measurements with bad dynamic systems: applied mathematical aspects,” 2012.
data,” IEEE Trans. Power Syst., vol. 31, pp. 3624–3635, Sep. 2016. [42] D. M. Powers, “Evaluation: from precision, recall and f-measure to roc,
[18] A. Lazarevic and V. Kumar, “Feature bagging for outlier detection,” in informedness, markedness and correlation,” 2011.
11th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., pp. 157–166, [43] S. Weisberg, Applied linear regression, vol. 528. John Wiley & Sons,
2005. 2005.
[19] K. D. Jones, A. Pal, and J. S. Thorp, “Methodology for performing [44] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., “A density-based
synchrophasor data conditioning and validation,” IEEE Trans. Power algorithm for discovering clusters in large spatial databases with noise.,”
Syst., vol. 30, pp. 1121–1130, May 2015. in Kdd, vol. 96, pp. 226–231, 1996.
[20] X. Wang, D. Shi, Z. Wang, C. Xu, Q. Zhang, X. Zhang, and Z. Yu,
“Online calibration of phasor measurement unit using density-based
spatial clustering,” IEEE Trans. Power Del., vol. 33, pp. 1081–1090, Ehdieh Khaledian Ehdieh Khaledian (Student
June 2018. Member, IEEE) is a PhD candidate in computer
[21] S. Pandey, S. Chanda, A. Srivastava, and R. Hovsapian, “Resiliency- science at Washington State University, Pullman,
driven proactive distribution system reconfiguration with synchrophasor WA. She received her M.S. degree in computer engi-
data,” IEEE Trans. Power Syst., pp. 1–1, 2020. neering from University of Isfahan, Isfahan in 2014.
[22] M. Zhou, Y. Wang, A. K. Srivastava, Y. Wu, and P. Banerjee, “Ensemble Her research interest includes machine learning and
based algorithm for synchrophasor data anomaly detection,” IEEE Trans. data mining, and extracting the important patterns
Smart Grid, pp. 1–1, 2018. from masses of data.
[23] T. Wu, Y. J. Zhang, and X. Tang, “Online detection of events with low-
quality synchrophasor measurements based on iforest,” IEEE Trans. Ind.
Inform., 2020.
[24] K. Chatterjee and N. R. Chaudhuri, “Corruption-resilient detection of
event-induced outliers in PMU data: A kernel pca approach,” in 2019