Professional Documents
Culture Documents
02 Etingov BigData 20180425
02 Etingov BigData 20180425
May 8, 2018 1
Project team
PNNL Partners
Pavel Etingov LANL
Jason Hou LBNL
Huiying Ren BPA
Heng Wang
Troy Zuroske
Dimitri Zarzhitsky
May 8, 2018 2
Project goals
May 8, 2018 3
PNNL cloud infrastructure
May 8, 2018 4
Apache Spark
May 8, 2018 5
Spark research cluster based on PNNL
cloud
Current configuration
20 nodes
RAM 512 Gb
Recently upgraded
to Spark 2.2
May 8, 2018 6
Cloud based ML-PMU Framework
PNNL EIOC
Spark SPARK
Parquet
(head Node 2 Python
node)
Historical
data DB
PySpark
.csv, .xml (HDFS) interface Node X Python
REST API
May 8, 2018 7
PMU data stream
PNNL receives PMU data
Data frame organization defined by IEEE C37.118.2
stream from Bonneville Power
Administration
12 PMUs
Multiple channels (Voltage and
Current Phasors, Frequency,
ROCOF)
PMU Data stored in PDAT
format
PDAT format developed by
BPA
Based on IEEE Std.
C37.118.2-2011
Binary files
Each file contains 1 minute of
data
One file ~ 5 MB
May 8, 2018 8
Ongoing work
May 8, 2018 9
PDAT data extraction
PMU data
Read information from PDAT stream
and creates SPARK data
frames PDAT HDFS PDAT
PNNL extract
Store information in Hive or EIOC
Parquet tables Spark
Data
Implemented in PySpark that frames
allows parallel processing of
multiple PDAT files DB Hive Tables
Spark
Significantly increased Parquet
performance
To read information for 1
hour takes about 20
seconds (20 nodes cluster)
May 8, 2018 10
Event detection (threshold based)
User specified
Moving Window
Average frequency
Delta frequency
Event duration
Signal
May 8, 2018 11
Examples of Detected events
Frequency events
May 8, 2018 12
Examples of Detected events
Voltage event
May 8, 2018 13
Examples of Detected events
Voltage event
May 8, 2018 14
Wavelet analysis
Wavelets transform:
Use small waves, so called wavelets, to provide localized time-frequency
analysis.
Scaling (stretching/compressing it; frequency band) and shifting
(delaying/hastening its onset) original waveform
Low scale compressed wavelets high frequency
High scale stretched wavelets low frequency
Assign a coefficient of similarity
May 8, 2018 15
Benefit for the non-stationarity signals
Offline Anomaly Detection based on
Wavelet Analysis
Wavelet-based
Wavelet Moving window Spatiotemporal Final list of
PMU multi-resolution
coefficients based anomaly Anomaly scoring correlation detected
Signals analysis
extraction candidates analysis anomalies
(MRA)
Voltage Signals are Extract the Identify anomaly Anomalous score False alarms or
phasors transformed into detail wavelet candidates at is set to be 1 when localized events
time-frequency coefficients at multiresolution an anomaly is corresponds to
Frequency domain scales matching levels with detected at each relatively weak
the durations of moving-window- resolution level. spatiotemporal
ROCOF MRA decomposes events based anomaly correlations
signals into detection The anomaly score
approximation (A) matrices are the
and detail (D) 5-min bandwidth summation of
components of moving scores at multi-
window resolution levels
across all PMUs
May 8, 2018 17
Examples of Detected Anomalies (1)
An example of detected
PMU voltage anomaly
where the PMUs have
consistent behaviors and
strong cross-correlations.
Red marks: detected events
Green marks: recorded historical
events by NERC
An example of detected
PMU frequency anomaly
where the PMUs have
consistent behaviors and
strong cross-correlations.
May 8, 2018 18
Examples of Detected Anomalies (2)
• Local anomalies
outstanding voltages
anomalies The right panel shows the
The left panel shows the first PCA by removing the
two principal components of redundant angle variation.
three attributes (voltage, angle The voltage and frequency
and frequency). are nearly orthogonal
factors
May 8, 2018 20
Online Anomaly Detection Based on
Dynamic Machine Learning
• The second order polynomial dynamic regression
model is built sequentially for PMU of subsequent 5-
minute time windows. Kalman filter is applied to
compute filtered values of the state vectors, together
with their covariance matrices.
Averaged training RMSE across 12 Units Averaged prediction RMSE across 12 Units
The red vertical lines
show temporal
locations of recorded
events
Historical recorded event and anomaly The detection rates of historical recorded events
event detected by the framework
Spark cluster for ML and PMU (big data) analysis was deployed.
It is based on the PNNL institution cloud system
PMU data has been collecting in PDAT format (PMU data stream
from PBA to PNNL EIOC)
Methodologies for both online and offline anomaly detection have
been developed
Enhanced robustness to bad data
Python (PySpark) modules are under development
PDAT data extraction
Event detection (based on thresholds)
Wavelet anomaly detection
Dynamic nonlinear model and Kalman filter based online detection
framework
May 8, 2018 24