
ISTANBUL TECHNICAL UNIVERSITY  GRADUATE SCHOOL OF SCIENCE
ENGINEERING AND TECHNOLOGY
APPLICATION OF HYBRID SIMULATION AND IMPROVEMENT OF
DECISION TREE ALGORITHMS FOR REAL-TIME TRANSIENT STABILITY
PREDICTION BASED ON PMU MEASUREMENTS
M.Sc. THESIS
Tohid BEHDADNIA
Department of Electrical Engineering
Electrical Engineering Programme
DECEMBER 2020
ISTANBUL TECHNICAL UNIVERSITY  GRADUATE SCHOOL OF SCIENCE
ENGINEERING AND TECHNOLOGY

M.Sc. THESIS
Tohid BEHDADNIA
(504171084)
Department of Electrical Engineering
Electrical Engineering Programme
Thesis Advisor: Prof. Dr. V. M. Istemihan GENC
DECEMBER 2020
ISTANBUL TECHNICAL UNIVERSITY  GRADUATE SCHOOL OF SCIENCE
ENGINEERING AND TECHNOLOGY
APPLICATION OF HYBRID SIMULATION AND IMPROVEMENT OF
DECISION TREE ALGORITHMS FOR REAL-TIME TRANSIENT STABILITY
PREDICTION BASED ON PMU MEASUREMENTS
M.Sc. THESIS
Tohid BEHDADNIA
(504171084)
Department of Electrical Engineering
Electrical Engineering Programme
Thesis Advisor: Prof. Dr. V. M. Istemihan GENC
DECEMBER 2020
Tohid BEHDADNIA, an M.Sc. student of the ITU Graduate School of Science Engineering
and Technology with student ID 504171084, successfully defended the thesis
entitled “APPLICATION OF HYBRID SIMULATION AND IMPROVEMENT OF
DECISION TREE ALGORITHMS FOR REAL-TIME TRANSIENT STABILITY
PREDICTION BASED ON PMU MEASUREMENTS”, which he prepared after
fulfilling the requirements specified in the associated legislation, before the jury
whose signatures are below.
Thesis Advisor : Prof. Dr. V. M. Istemihan GENC, Istanbul Technical University
Jury Members : Prof. Dr. Ahmet CANSIZ, Istanbul Technical University
Asst. Prof. Dr. Bülent BİLİR, İzmir University of Economics
Date of Submission : 30 November 2020
Date of Defense : 30 December 2020
To my family,
FOREWORD
First and foremost, I would like to thank my advisor, Prof. Dr. V. M. Istemihan Genc
who invited me into his group, guided me through my master's program and advised
me through my entire time at ITU. His direction kept me on track, and his valuable
feedback always pushed me forward.
I would also like to express my gratitude towards Asst. Prof. Dr. Yusuf Yaslan for
contributing to this research with his valuable expertise on artificial intelligence.
Thanks also go to Res. Asst. Mert Kesici who was kind enough to share, discuss, and
explain his previous work on this subject, which was instrumental to make this thesis
happen.
In addition, I would like to thank my family, who have always been there for me,
supporting me emotionally and financially, and to whom I can attribute all my success
in life.
This thesis is supported by The Scientific and Technical Research Council of Turkey
(TUBITAK) under project no. 118E184.
December 2020 Tohid BEHDADNIA

(Electrical Engineer)
TABLE OF CONTENTS
Importance of Using ML-based TSA Tools
Importance of Using Hybrid-Type Simulator
Importance of Enhancing Quality of PMU Measurements
Aim of the Thesis
Thesis Organization
Parameterized Model of Waveform Components
2.1.1 Fundamental frequency component
2.1.1.1 Natural oscillations
2.1.1.2 Forced oscillations
2.1.2 Inter-harmonic component
2.1.3 Harmonic component
2.1.4 Decaying DC-offset component
2.1.5 Noise component
Nature of Waveform Components
2.2.1 Electromechanical components
2.2.2 Electromagnetic components
Mathematical Algorithm Used Within PMUs
A Generic PMU
Phasor-Type Simulation
Electromagnetic Transient-Type Simulation
Hybrid-Type Simulation
Importance of Modeling Electromagnetic Components
Importance of Simulating Detailed Model of PMU in EMT Domain
Application of Hybrid Simulation to Generate Realistic Synchrophasor Data
Static Method of Data Arrangement (Conventional Method)
Dynamic Method of Data Arrangement (New Method)
ANN as Function Estimator
Mitigating Measurement Errors Using ANN
Offline Supervised Dataset Generation
Data Preprocessing
8.2.1 FC-based preprocessing
8.2.2 ANN-based preprocessing
Training and Performance Evaluation Procedure
Database Generation Using Hybrid-Type Simulation
Comparing Output of Simulators
Impacts of Dynamic Response of PMUs on Classification Performance
Data Preprocessing and Its Effects on Classification Performance
9.4.1 FC-based preprocessing
9.4.2 ANN-based preprocessing
ABBREVIATIONS
ACF : Amplitude Correction Factor

A/D : Analog-to-Digital
ANMI : Average of Normalized Mutual Information
ANN : Artificial Neural Network
BR : Bayesian Regularization
DFT : Discrete Fourier Transform
DS : Detailed System
EMT : Electromagnetic Transient
EMTP : Electromagnetic Transient Program
ES : External System
FC : Feature Cleansing
FCFFN : Fully-Connected Feed-Forward Network
GPS : Global Positioning System
LM : Levenberg-Marquardt
ML : Machine Learning
MSE : Mean Squared Error
PLL : Phase Locked Loop
PMU : Phasor Measurement Unit
RRV : Regression R-Value
SCG : Scaled Conjugate Gradient
SCADA : Supervisory Control and Data Acquisition
TCS : Training Convergence Speed
TSA : Transient Stability Assessment
TSAT : Transient Stability Assessment Tool
VA : Voltage Angle
VF : Voltage Frequency
VM : Voltage Magnitude
WGN : White Gaussian Noise
WSCC : Western Systems Coordinating Council
SYMBOLS
$X_p$ : Fundamental frequency component
$X_{ih}$ : Inter-harmonic component
$X_h$ : Harmonic component
$X_{dc}$ : Decaying DC-offset component
$\varepsilon$ : Noise component
$k$ : Sampling time index
$m$ : The disturbance time in the power system
$x_p$ : Magnitude of $X_p$
$\varphi_p$ : Phase of $X_p$
$f_c$ : Fundamental frequency (60 or 50 Hz)
$T$ : Fixed sampling period
$\varphi_p^{\mathrm{natural}}$ : $\varphi_p$ in the presence of natural oscillations
$\varphi_d$ : Amplitude of $\varphi_p^{\mathrm{natural}}$
$f_d$ : Frequency of $\varphi_p^{\mathrm{natural}}$
$\gamma$ : Damping coefficient
$x_p^{\mathrm{modulation}}$ : $x_p$ in the presence of forced modulations
$x_{am}$ : Amplitude of $x_p^{\mathrm{modulation}}$
$f_{am}$ : Frequency of $x_p^{\mathrm{modulation}}$
$\varphi_p^{\mathrm{modulation}}$ : $\varphi_p$ in the presence of forced modulations
$f_{fm}$ : Frequency of $\varphi_p^{\mathrm{modulation}}$
$\varphi_{fm}$ : Amplitude of $\varphi_p^{\mathrm{modulation}}$
$x_{ih}$ : Magnitude of $X_{ih}$
$\varphi_{ih}$ : Phase of $X_{ih}$
$f_{ih}$ : Frequency of $X_{ih}$
$h$ : Harmonic order
$H$ : The highest harmonic order in the signal
$x_h$ : Magnitude of the $h$th-order harmonic
$\varphi_h$ : Phase of the $h$th-order harmonic
$x_{dc}$ : Amplitude of decaying DC-offset
$\tau$ : Time constant of decaying DC-offset
$N$ : Number of analyzed samples
$\mathrm{WGN}(\mu, \sigma^2)$ : White Gaussian Noise with mean $\mu$ and variance $\sigma^2$
$\Delta f$ : The difference between the actual and nominal frequencies
$f$ : Frequency of $X_p$ in off-nominal frequency conditions
Im : Imaginary part
Re : Real part
$X_{ART}$ : Artificial signal (reference signal)
$f_{ART}$ : Frequency of $X_{ART}$
 : Set of state variables
 : Set of algebraic variables
$x_p^{D}$ : Erroneous (distorted) form of $x_p$
$\varphi_p^{D}$ : Erroneous (distorted) form of $\varphi_p$
$f^{D}$ : Erroneous (distorted) form of $f$
$P$ : Total number of PMUs in the power system
$T_o$ : Length of the observation time window
$\Delta t$ : Sampling interval
$c$ : Index of the PMU-equipped bus, $1 \le c \le P$
$s$ : Index of the simulated scenario, $s \in \{1, \dots, S\}$
$v_{s,c}$ : Time-series measurements obtained from bus $c$ in scenario $s$
$j$ : Number of data points in $v_{s,c}$
$D_x$ : Distorting function of amplitude ($x_p$)
$D_\varphi$ : Distorting function of phase angle ($\varphi_p$)
$D_f$ : Distorting function of frequency ($f$)
$D_x^{-1}$ : The inverse function of $D_x$
$D_\varphi^{-1}$ : The inverse function of $D_\varphi$
$D_f^{-1}$ : The inverse function of $D_f$
$\alpha_0$ : Input vector of ANN
$n_0$ : Dimension of $\alpha_0$
$\hat{\beta}$ : Output vector of ANN
$z$ : Dimension of $\hat{\beta}$
$\theta$ : Parameter vector
$\rho$ : Dimension of $\theta$
$L$ : Total number of layers in ANN (hidden layers + output layer)
$l$ : Layer index, $l \in \{1, 2, \dots, L\}$
$\hat{g}$ : Deterministic function that is characterized by $\theta$
$\hat{g}_l$ : Deterministic function of layer $l$
$\alpha_l$ : Input vector of layer $l + 1$
$\theta_l$ : Parameter vector of $\hat{g}_l$
$\sigma_l$ : Activation function of layer $l$
$W_l$ : Weight matrix of layer $l$
$b_l$ : Bias vector of layer $l$
$n_l$ : Total number of neurons in layer $l$
$I$ : Total number of training samples
$i$ : Training sample index, $i \in \{1, 2, \dots, I\}$
 : Loss function
$\theta^*$ : The parameter vector that minimizes the loss function
$t_c$ : Fault clearing time
$t_c^{\min}$ : Minimum expected clearing time
$t_c^{\max}$ : Maximum expected clearing time
 : Class label (stable or unstable)
$\delta_{\max}$ : Maximum angle separation between any two generators
$FV_r$ : $r$-th feature vector in the feature space, $r \in \{1, \dots, R\}$
V : Vector of the class label
$\mathrm{NMI}_r(FV_r; V)$ : Normalized mutual information between $FV_r$ and V
$C_{FV}$ : Set of corrupted feature vectors
LIST OF TABLES
Table 8.1 : Confusion matrix.
Table 9.1 : TSA results for WSCC 127-bus test system (39 PMUs).
Table 9.2 : TSA results when the classifiers are trained with outputs of a phasor-based simulator and tested with the outputs of a hybrid simulator.
Table 9.3 : ANMI before and after FC.
Table 9.4 : TSA results before and after FC.
Table 9.5 : Error mitigation results.
Table 9.6 : TSA results before and after mitigating measurement errors.
LIST OF FIGURES
Figure 3.1 : Major elements of the modern phasor measurement unit.
Figure 4.1 : Total scheme of a hybrid simulator.
Figure 5.1 : Total scheme of the proposed simulation method to provide realistic synchrophasor data at a reasonable speed: (a) total scheme of the proposed simulation method. (b) details of the PMU model.
Figure 6.1 : Total scheme of the transient stability status prediction based on the conventional method of data arrangement.
Figure 6.2 : Total scheme of the transient stability status prediction based on the new method of data arrangement.
Figure 7.1 : Input-output model: (a) the gray-box model characterized by and a parameter vector. (b) the fully-connected feed-forward network that fits into the box in (a).
Figure 7.2 : The procedure of training an ANN to mitigate errors of PMU measurements: (a) training phase, (b) usage phase.
Figure 8.1 : The proposed transient stability assessment methodology.
Single line diagram of the WSCC 127-bus test system.
Voltage waveform before and after superposition of non-fundamental frequency components.
Synchrophasor data obtained from an arbitrary bus of test system: (a) voltage magnitude, (b) voltage angle, (c) voltage frequency.
Graphical diagram of the built neural networks for mitigating measurement errors.
Visualizing PMU measurements before and after mitigating measurement errors: (a) voltage magnitude, (b) voltage angle, (c) voltage frequency.
SUMMARY
Stressed operating conditions, aggravated by growing load demand and made
unprecedented by the high penetration of renewable energy sources, have left modern
power systems more vulnerable to credible contingencies. Such contingencies can
trigger a series of protection actions that may develop into cascading failures and even
lead to serious blackouts. When a disturbance drives the power system toward transient
instability, a fast prediction of its security status can be vital for leaving sufficient
time to take emergency control actions.
Recently, artificial intelligence methods, including Machine Learning (ML)
techniques, have been broadly applied to real-time Transient Stability Assessment
(TSA) of power systems, mainly because they can assess the system stability status
quickly and precisely. Supervised ML models can learn the complex relationship
between the stability status of a power grid and the time-synchronized phasor
measurements at different buses, so they are trained on a vast amount of synthesized
synchrophasor data before being deployed in on-line applications.
In practical systems, synchrophasors are measured by Phasor Measurement Units
(PMUs) in the presence of waveform distortions on their input signals, such as DC and
harmonic distortions, transients, and noise. Such distortions can significantly impair
the functioning of the PMUs, leading to inaccurate, uninformative, and erroneous
measurements that can mislead ML-based algorithms. It is therefore necessary to
remove or mitigate erroneous measurements before feeding them to ML-based TSA
models.
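The effect of a waveform distortion on a phasor estimate can be illustrated with a small sketch. It assumes a plain one-cycle DFT estimator, which is a common textbook choice and not necessarily the algorithm modeled in this thesis; the function names, the 64 samples per cycle, and the DC-offset parameters are hypothetical values chosen only for illustration.

```python
import cmath
import math


def dft_phasor(samples):
    """One-cycle DFT phasor estimate (peak amplitude, cosine reference).

    Assumes `samples` covers exactly one cycle of the nominal frequency.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(samples))
    return 2.0 * acc / n


def one_cycle(n, amplitude, phase, dc=0.0, tau_cycles=1.0):
    """One cycle of a cosine wave, optionally with a decaying DC offset.

    `tau_cycles` is the DC time constant expressed in cycles (illustrative).
    """
    return [
        amplitude * math.cos(2.0 * math.pi * k / n + phase)
        + dc * math.exp(-(k / n) / tau_cycles)
        for k in range(n)
    ]


def relative_error(estimate, reference):
    """Magnitude of the phasor error relative to the reference phasor."""
    return abs(estimate - reference) / abs(reference)


N = 64                                # samples per cycle (assumed)
true_phasor = cmath.rect(1.0, 0.5)    # amplitude 1.0, phase 0.5 rad

clean = one_cycle(N, 1.0, 0.5)
distorted = one_cycle(N, 1.0, 0.5, dc=0.8, tau_cycles=0.5)

clean_err = relative_error(dft_phasor(clean), true_phasor)
dist_err = relative_error(dft_phasor(distorted), true_phasor)

print(f"relative error, pure sinusoid:   {clean_err:.2e}")
print(f"relative error, with DC offset:  {dist_err:.2e}")
```

For the pure sinusoid the estimate matches the true phasor to numerical precision, whereas the decaying DC offset leaks into the fundamental-frequency bin and produces a clearly non-zero error.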
The first requirement for designing a general and reliable algorithm that removes or
mitigates measurement errors is to analyze the characteristics of a large set of
realistic synchrophasor data that truly represents PMU datasets. Conventionally, in
TSA studies, large-scale synchrophasor data are generated by phasor-based or
quasi-static simulation methods. However, the outputs of phasor-based simulators
often lack the attributes of actual PMU data, which yields unrealistically error-free
synchrophasors. Error-free data are not a good substitute for practical data: using
them in training or validation may produce unreliable, pseudo-accurate algorithms and
may lead specialists to draw incorrect conclusions from idealized experimental results.
In the literature, many Electromagnetic Transient (EMT)-based simulation techniques
have been proposed for building broadband models of different components in order to
obtain precise and realistic responses outside the nominal sinusoidal regime.
Nevertheless, fully EMT-type simulations are limited by their processing time.
Although the synthetic data generated by EMT-type programs are very realistic, the
heavy computational burden restricts their application.
Given these difficulties in producing realistic PMU data on a large scale, the first
objective of this thesis is to propose a new hybrid-type simulation method that
generates a vast amount of realistic synchrophasor data in a reasonable time. Once a
sufficient amount of realistic PMU data has been generated for use as predictor
variables in ML-based TSA models, the second objective is to preprocess the input
dataset, either by efficiently removing highly erroneous measurements from the feature
space or by mitigating PMU data errors using Artificial Neural Network (ANN)
techniques. To this end, a new method of synchrophasor data arrangement is first
proposed to isolate and remove erroneous measurements from the feature space. With
this method, the erroneous parts of the time-series measurements are removed, while
the remaining relevant information is retained to enhance the transient stability
prediction accuracy. Next, as the second and main method of dealing with erroneous
measurements, an ANN-based algorithm is used to mitigate the measurement errors. In
this approach, a two-layer ANN is trained with three different network training
functions, namely Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), and
Bayesian Regularization (BR), and their performances are compared in terms of Mean
Squared Error (MSE), Regression R-Value (RRV), and Training Convergence Speed
(TCS).
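Two of the comparison metrics named above can be written down in a few lines. This is only a sketch: it assumes that the Regression R-Value is the Pearson correlation coefficient between the network outputs and the target values, which is the usual reading of that term but is not confirmed by the text here. The function names and the sample numbers are hypothetical.

```python
import math


def mean_squared_error(targets, outputs):
    """Average of the squared differences between targets and outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def regression_r_value(targets, outputs):
    """Pearson correlation between targets and outputs (assumed meaning of RRV)."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)


# Illustrative numbers only: reference values and corrected measurements.
targets = [1.0, 2.0, 3.0, 4.0]
outputs = [1.1, 1.9, 3.2, 3.8]

mse = mean_squared_error(targets, outputs)
rrv = regression_r_value(targets, outputs)
print(f"MSE = {mse:.4f}, R = {rrv:.4f}")
```

A lower MSE and an R-value closer to 1 both indicate that the corrected measurements track the reference values more closely.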
The efficacy of the proposed methods is investigated on the 127-bus WSCC (Western
Systems Coordinating Council) test system. The simulations show that, in sub-transient
and transient conditions, the synchrophasor dataset contains a large amount of
erroneous data, which is the fundamental cause of the derailment of ML-based TSA
models. These erroneous data are either removed from the feature space or partially
corrected using an ANN-based error mitigation algorithm. It is shown that the transient
stability prediction accuracy improves by up to 3% when the erroneous measurements
are isolated and removed from the feature space, and by up to 7% when the measurement
errors are mitigated using ANN techniques.
APPLICATION OF HYBRID SIMULATION AND IMPROVEMENT OF DECISION TREE
ALGORITHMS FOR REAL-TIME TRANSIENT STABILITY PREDICTION BASED ON
PMU MEASUREMENTS
EXTENDED SUMMARY (ÖZET)
Because of stressed operating conditions, which are aggravated by growing load demand
and are unprecedented owing to the high integration of renewable energy sources,
modern power systems have become more vulnerable to credible contingencies and
disturbances. These contingencies can sometimes trigger a series of protection actions
that may develop into cascading failures and even lead to serious blackouts. When a
fault drives the power system to transient instability, a fast prediction of the security
status can be vital for leaving sufficient time to carry out emergency control actions.
In recent years, artificial intelligence methods, including Machine Learning (ML)
techniques, have been widely applied to real-time Transient Stability Assessment
(TSA) of power systems, mainly because of their ability to assess the system stability
status precisely and quickly. Since supervised ML models can easily learn the complex
relationship between the stability status of a power system and the time-synchronized
phasor measurements of different buses, they are trained with large amounts of
synthesized synchrophasor data before being developed for on-line applications.
In practical systems, synchrophasors are measured by Phasor Measurement Units
(PMUs) in the presence of different waveform distortions, such as DC and harmonic
distortions, transients, and noise on the input signals. In general, such waveform
distortions can significantly impair the functioning of the PMUs. These impairments
lead to inaccurate and erroneous measurements that can mislead ML-based algorithms.
As a result, erroneous measurements must be removed or mitigated before they are fed
to ML-based TSA models.
The first requirement for designing and developing a general and reliable algorithm
that removes or mitigates measurement errors is to analyze the characteristics of a
large set of realistic synchrophasor data that truly represents PMU datasets.
Conventionally, in TSA studies, large-scale synchrophasor data are generated by
phasor-based or quasi-static simulation methods. However, the outputs of phasor-based
simulators often lack the attributes of actual PMU data, which leads to unrealistic,
error-free synchrophasors. Error-free data are not a good alternative to practical
data, and using them in training or validation may produce unreliable algorithms. It
may also lead researchers to draw incorrect conclusions from idealized experimental
results. In the literature, many Electromagnetic Transient (EMT)-based simulation
techniques have been proposed for building broadband models of different components
in order to obtain precise and realistic responses outside the nominal sinusoidal
regime. Nevertheless, fully EMT-type simulations are not free of limitations related
to processing time. Although the synthetic data generated by EMT-type programs are
very realistic, the heavy computational burden restricts their application.
In view of the above problems in producing realistic PMU data on a large scale, the
first aim of this thesis is to propose a new hybrid-type simulation method for
generating a large amount of realistic synchrophasor data in a reasonable time. The
basic idea of hybrid-type simulation is to split the original network into two parts
according to the required modeling accuracy, so that one part is simulated by a
phasor-based simulator and the other by an EMT simulator. EMT-type simulation is used
for the smaller part, where more detailed and accurate results are required. This part
may contain PMUs or any other element whose dynamic behavior is characterized more
accurately by simulating its detailed model with smaller time steps. The other part,
which covers large portions of the network, is simulated by the phasor-based
simulator. In this part, less detailed component models are sufficient, but the fast
computation capability of the simulator is important. Accordingly, interfacing the
phasor-based and EMT simulators forms a hybrid-type simulation that inherits the merits
of both simulators.
Once a sufficient amount of realistic PMU data has been generated for use as predictor
variables in ML-based TSA models, the second aim is to preprocess the input dataset,
either by efficiently removing highly erroneous measurements from the feature space or
by mitigating PMU measurement errors using Artificial Neural Network (ANN)
techniques. To this end, a new method of synchrophasor data arrangement is first
proposed to isolate and remove erroneous measurements from the feature space. With
this method, the erroneous parts of the time-series measurements are effectively
removed, while the remaining relevant information is retained to improve the transient
stability prediction accuracy.
Next, as the second and main method of dealing with erroneous measurements, an
ANN-based algorithm is used to mitigate the measurement errors. In this approach, a
two-layer ANN is trained with three different network training functions, namely
Scaled Conjugate Gradient (SCG), Levenberg-Marquardt (LM), and Bayesian
Regularization (BR), and their performances are then evaluated and compared in terms
of Mean Squared Error (MSE), Regression R-Value (RRV), and Training Convergence
Speed (TCS).
The SCG backpropagation algorithm is the only conjugate gradient algorithm that
requires no line search, and it is a very good general-purpose training algorithm. The
LM backpropagation algorithm is a fast training algorithm for networks of moderate
size. It also has a memory reduction feature for use when the training set is very
large. The BR backpropagation algorithm is a modified form of the LM training
algorithm that produces networks that generalize well. It considerably reduces the
difficulty of determining the optimal network architecture.
To achieve good generalization performance, 585,000 training samples are fed to the
neural network. The training procedure stops automatically when the maximum number of
epochs (100) is reached or when generalization stops improving. Once the network has
been trained, that is, once all weights and biases have been tuned, it can be tested
with test samples. To obtain a comprehensive picture of the performance of the
implemented models, we test them with 409,500 hold-out samples.
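The stopping rule described above (a cap of 100 epochs, or stopping once generalization no longer improves) can be sketched as a simple early-stopping loop. This is an illustrative sketch only: the function name, the callback interface, and the `patience` value are assumptions, and the thesis does not state how many non-improving epochs are tolerated.

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=100, patience=6):
    """Run training until max_epochs or until validation loss stops improving.

    train_one_epoch: callable that performs one pass over the training set.
    validation_loss: callable that returns the current validation loss.
    patience: number of consecutive non-improving epochs tolerated
              (hypothetical value, not taken from the thesis).
    Returns (epochs_run, best_loss, stop_reason).
    """
    best_loss = float("inf")
    stale_epochs = 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        loss = validation_loss()
        if loss < best_loss:
            best_loss = loss
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch, best_loss, "validation"
    return max_epochs, best_loss, "max_epochs"


# Usage with a made-up validation-loss sequence instead of a real network.
losses = iter([5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4])
result = train_with_early_stopping(lambda: None, lambda: next(losses), patience=3)
print(result)
```

In a real setting the two callables would wrap the network's weight update and its evaluation on a validation split; keeping them as parameters makes the stopping logic independent of the training function (LM, SCG, or BR).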
The efficacy of the proposed methods is tested on the 127-bus WSCC (Western Systems
Coordinating Council) test system. The WSCC 127-bus system is derived from the WSCC
179-bus system by reducing the number of nodes and branches. It is a well-known test
system for similar transient stability assessment studies. The 127-bus system contains
37 generators and 211 transmission lines. In this system, voltage phasors are obtained
from a limited number of PMUs that are optimally placed for stability prediction.
The simulations carried out in this study show that the high error content of the
synchrophasor measurements is what causes ML-based TSA models to deviate from high
accuracy. These erroneous data are either removed from the feature space or partially
corrected using an ANN-based error mitigation algorithm. It is shown that the transient
stability prediction accuracy increases by up to 3% when the erroneous measurements
are isolated and removed from the feature space, and by up to 7% when the measurement
errors are mitigated using ANN techniques.
INTRODUCTION
With the rapid deployment of Phasor Measurement Units (PMUs), time-synchronized
phasor measurements are widely used as predictor variables [1] in Machine Learning
(ML) models [2] to facilitate and accelerate real-time Transient Stability Assessment
(TSA) [3, 4]. Although a number of powerful models and tools have been proposed
for this purpose [5, 6], the quality, realism, and accuracy of the synthetic dataset
fed into these models are rarely considered [7, 8]. This neglect can undermine the
dependability of the entire ML-based assessment model in practical applications [9].
One important way to guarantee the reliability of ML-based TSA models is to train
them with large sets of realistic synthetic PMU data [10]. Since the required dataset
of synchrophasors is generated through offline simulation [11], the simulations must
be as realistic as possible to ensure that the attributes of the simulated datasets are
sufficiently similar to those obtained from the real power system [12, 13]. In
addition, erroneous data are the fundamental cause of the derailment of ML-based
models; to keep them out of the decision-making process, the quality of the input
feature sets must be validated and improved through data filtering and/or data
preprocessing in both the training and testing phases. Accordingly, as a first step
toward developing a reliable ML-based TSA model, we propose an efficient method for
generating a vast amount of realistic synchrophasor data in a reasonable time. As a
second step, we enhance the quality and relevance of the input feature sets through
data preprocessing using new data cleansing and/or error correction techniques.
Importance of Using ML-based TSA Tools
Present-day power grids are being pushed to operate closer to their security boundaries
by growing load demand and insufficient infrastructure investment in numerous
countries. At the same time, the high penetration of renewable energy has made the
systems more complex. Consequently, power grids have become more vulnerable to
contingencies that could trigger a sequence of dynamic or static insecurity events
leading to blackouts. This has fueled the need for faster and more accurate methods of
security assessment.
In transient stability assessment, traditional methods such as time-domain simulations
are usually computationally complex and expensive, which makes it difficult to offer
timely situational awareness of the risk of blackouts. In recent years, ML techniques
have been broadly applied to TSA of power systems, mainly because they can provide a
quick and accurate assessment of the system stability status. With ML-based TSA tools,
the computation time required to acquire stability information can be reduced
remarkably, so that the models can be applied on-line to predict the stability status
very quickly.
Importance of Using Hybrid-Type Simulator
Since the majority of ML-based transient stability assessment models are trained and
developed with an offline-generated dataset of synchrophasors, the required
synchrophasor data must be simulated realistically so that they truly represent PMU
measurements in non-stationary conditions.
Conventionally, in TSA studies, phasor-based or quasi-static simulators are used to
generate large-scale datasets because of their simplicity and speed. However, the
outputs of phasor-based simulators often lack the attributes of actual PMU data,
especially in transient conditions. Consequently, the generated synchrophasor data are
not of sufficient quality to be fed into ML-based TSA models. Realistic data can be
provided by an Electromagnetic Transient Program (EMTP). Nevertheless, because of its
detailed representation of power system components and its higher accuracy, fully
Electromagnetic Transient (EMT)-type simulation requires significantly more computing
time. In practice, therefore, fully EMT simulation may not be suitable for studies that
require extensive simulation results.
To overcome these problems, phasor-type and EMT-type simulations are usually
integrated in a way that suits the requirements of each study. In this study, the
proposed hybrid-type simulation provides a vast amount of realistic synchrophasor data
at a reasonable speed by simulating only the PMUs, as the key measuring elements, in
the EMT domain.
Importance of Enhancing Quality of PMU Measurements
As the main waveform component that carries information about the power system
stability (electromechanical mode) is the fundamental frequency component of power
system (voltage and current) waveforms with its damping, sustained, or growing
oscillations, phasors of this component should be measured as accurately as possible
from different points of the power system. In this respect, GPS-based PMUs were
invented to estimate the real-time synchrophasor measurements of the voltage and
current waveforms. The filtering mechanism, which is generally utilized in PMUs, can
precisely measure phasors for the waveforms with constant parameters (amplitude and
frequency) within an observation time window of length T [14]. However, in general,
power grid waveforms fed into the PMUs are not static sinusoidal signals. Rather, they
contain noises, sustained harmonics, inter-harmonics, etc. Besides, during power
system disturbances, abrupt step-changes and oscillations may occur due to the faults,
switching operations, and electromechanical transients of generating unit rotors [15].
Fourier transform-based phasor calculation algorithms are derived based on the pure
sinusoidal signal model (static phasor model). Consequently, considerable algorithm
errors are expected when the input electrical signal contains disturbances such as a
step-change in amplitude and phase angle, natural/forced oscillations, frequency drifts,
harmonic/inter-harmonic distortions, noise, etc. Researchers have attempted to
address this problem by presenting new filtering and phasor estimation techniques [15,
16] or by applying wavelet-based time-series data denoising techniques [17].
However, the proposed methods so far are not able to accurately estimate phasors of
the fundamental frequency component in the presence of all the aforementioned
waveform distortions (especially when the analyzing window of a phasor estimation
method is swept from pre-fault stage to post-fault stage). This may cause inaccurate
phasor estimation and thus wrong protective or control decisions. To prevent such a
disaster, it is important to remove erroneous measurements or mitigate measurement
errors before feeding PMU data to the ML-based TSA algorithms.
Aim of the Thesis
This thesis aims to propose a reliable ML-based TSA tool for online applications. The
proposed method is intended to provide high assessment performance in realistic
conditions when the dataset of synchrophasor measurements contains errors,
distortions, and noise. In this regard, as a first step towards the aim of this thesis, a
new hybrid-type simulation is presented which can simulate a vast amount of realistic
synthetic PMU data in a feasible time. Then, a new data cleansing and error mitigation
method is introduced to remove and mitigate the erroneous measurements,
respectively. The efficacy of the proposed methodology is demonstrated on the
Western Systems Coordinating Council (WSCC) 127-bus power system.
Thesis Organization
The rest of this thesis is organized as follows: In Section 2 different components of
power system waveforms are introduced and their nature is presented. Section 3
describes the formulation of the phasor estimation algorithm to compute the phasor of
power system waveforms. In Section 4 different types of real-time simulation methods
are explained. Section 5 provides insight into the requirements of generating realistic
PMU data and then illustrates how our proposed hybrid-type simulation can meet these
requirements. In Section 6 it is described how unstructured PMU data can be arranged
and transferred to the feature space through the static and dynamic method. Section 7
presents how the quality of PMU data can be improved using an artificial neural
network. In Section 8 the approach for ML-based transient stability status prediction is
explained. Section 9 presents the results obtained by applying the proposed methods
to a 127-bus test system. Section 10 draws conclusions.
COMPONENTS OF POWER SYSTEM WAVEFORMS
Parameterized Model of Waveform Components
Generally, power system waveforms are not static sinusoidal signals. Rather, they
contain natural/forced oscillations, harmonic/inter-harmonic distortions, DC offsets,
noise, etc. Accordingly, power system waveforms can be defined as a
superposition of many distinct signals, as expressed by (2.1).
$$X(k) = X_p(k) + X_{ih}(k) + X_h(k) + X_{dc}(k) + \varepsilon(k), \quad k = 1, \dots, m-1, m, m+1, \dots \tag{2.1}$$
in which $k$ represents the sampling time index and $m$ represents the disturbance time
in the power system, while $X_p(k)$, $X_{ih}(k)$, $X_h(k)$, $X_{dc}(k)$, and $\varepsilon(k)$ are the
fundamental component, inter-harmonic component, harmonic component, decaying
DC-offset, and noise component, respectively. Mainly, the waveforms from the
cycles immediately after the occurrence or clearing of network
disturbances are more chaotic than those from other cycles due to the modulation
(forced oscillation) and natural oscillation phenomena.
In order to have a more detailed look, distinct components of (2.1) are defined and
modeled as follows:
2.1.1 Fundamental frequency component
The fundamental frequency component $X_p(k)$ is the main part of waveforms in the
power system. In distribution systems, the fundamental component of voltage
waveforms is a periodic sinusoidal signal which can be characterized by amplitude
$x_p(k)$ and phase $\varphi_p(k)$. Thus we have:
$$X_p(k) = x_p(k) \cdot \cos\left(2\pi f_c kT + \varphi_p(k)\right) \tag{2.2}$$
where $f_c$ is the fundamental frequency in Hz and $T$ is a fixed sampling period. Under
normal operating conditions, both $x_p$ and $\varphi_p$ are almost constant over time; however,
when a power swing occurs, $x_p$ and $\varphi_p$ become time-varying due to the natural and
forced oscillations.
2.1.1.1 Natural oscillations
Generally, in the presence of natural oscillations $\varphi_p$ can be expressed by:
$$\varphi_p^{\text{natural}}(k \ge m) = \varphi_d \cdot e^{-\alpha kT} \cos\left(2\pi f_d kT\right) \tag{2.3}$$
where $\varphi_d$ and $f_d$ are the amplitude and frequency of the natural oscillations, respectively,
while $\alpha$ is the damping coefficient [18]. Mainly, natural oscillations are an after-effect of
the electromechanical dynamics governing the system. If the electromechanical
eigenvalues of the system are stable, the natural oscillations will be damped and the power
system will converge to a new operating point.
2.1.1.2 Forced oscillations
On the other hand, in the presence of forced oscillations (also known as forced
modulations), $x_p$ and $\varphi_p$ are modeled as follows:
$$x_p^{\text{modulation}}(k \ge m) = x_{am} \sin\left(2\pi f_{am} kT\right) + \mathrm{Const} \tag{2.4}$$
$$\varphi_p^{\text{modulation}}(k \ge m) = \varphi_{fm} \cdot \cos\left(2\pi f_{fm} kT\right) \tag{2.5}$$
where the modulation in amplitude is performed with frequency $f_{am}$ and amplitude
$x_{am}$, while frequency modulation is achieved with frequency $f_{fm}$ and amplitude $\varphi_{fm}$
[16]. In power systems, forced oscillations are the response of the system to a cyclic
input in most cases caused by control apparatuses. In many cases, forced oscillations
are caused by the mechanical input to a generator to correct the apparatus causing the
oscillation.
2.1.2 Inter-harmonic component
According to the definition given in IEEE 519-2014 [19], an inter-harmonic component
is a frequency component of a periodic electrical signal that is not an integer multiple
of the power frequency (60 or 50 Hz). Power system inter-harmonics are mainly created
by three general phenomena: 1) rapid non-periodic changes in voltage in a transient
state, 2) voltage or current amplitude modulation implemented for control
purposes, and 3) asynchronous switching. Mathematically, the inter-harmonic
component can be expressed as follows:
$$X_{ih}(k) = x_{ih}(k) \cos\left(2\pi f_{ih} kT + \varphi_{ih}(k)\right) \tag{2.6}$$
where $x_{ih}$, $f_{ih}$, and $\varphi_{ih}(k)$ are the magnitude, frequency, and phase of the
inter-harmonic component [20].
2.1.3 Harmonic component
Harmonics are a steady-state or quasi-steady-state concept; they cause continuous
distortion in voltage and current. The sources of harmonics in power systems include
power electronic converters, static VAR systems, inverters for distributed generation,
and AC phase controllers. In steady state, harmonic distortion should be kept within the
limits recommended in IEEE 519-2014 [19]. During or after
disturbances (transient conditions), however, these limits may be exceeded.
Generally, the harmonic component of power system signals can be modeled and
expressed by (2.7).
$$X_h(k) = \sum_{h=2}^{H} x_h(k) \cos\left(2\pi h f_c kT + \varphi_h(k)\right) \tag{2.7}$$
in which $h$, $H$, $x_h$, and $\varphi_h$ are the harmonic order, the highest harmonic order in the
signal, the magnitude of the $h$th harmonic, and the phase of the $h$th harmonic, respectively [20].
2.1.4 Decaying DC-offset component
Voltage waveforms commonly have a decaying DC component due to capacitive
voltage transformer transients [21]. Normally, the DC offsets decay exponentially
to negligible values after a few cycles. We can model the decaying DC
component as follows:
$$X_{dc}(k \ge m) = x_{dc}\, e^{-\frac{k-m}{\tau N}} \tag{2.8}$$
where $x_{dc}$ and $\tau$ are the amplitude and the time constant of the decaying DC-offset,
respectively, while $N$ represents the number of analyzed samples [20].
2.1.5 Noise component
Field waveforms are scarcely noise-free. A spurious frequency component which is
not an inter-harmonic, a harmonic, or a step-change may be considered to be noise. The
sources of noise are commonly electromagnetic interference and nonharmonic
components [21]. Generally, the noise in the input waveform is considered to be a
White Gaussian Noise (WGN) process. Equation (2.9) shows the noise model for
power system signals.
$$\varepsilon(k) \sim \mathrm{WGN}\left(\mu, \sigma^2\right) \tag{2.9}$$
where $\mathrm{WGN}(\mu, \sigma^2)$ is white Gaussian noise with mean $\mu$ and variance $\sigma^2$.
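As an illustration of how the parameterized components above combine, the following minimal Python sketch builds a sampled waveform as the superposition in (2.1). It is not the simulation code used in this thesis. The function name `synth_waveform` and every numeric value (sampling rate, amplitudes, the 85 Hz inter-harmonic, the 1 Hz natural oscillation, the damping and time constants, the noise level) are assumptions chosen only for demonstration.

```python
import math
import random

def synth_waveform(n_samples, m, fs=1920.0, fc=60.0, seed=0):
    """Return samples X(k) built as the superposition in (2.1).

    k is the sample index and m the disturbance sample. All numeric
    parameter values are illustrative assumptions, not thesis data.
    """
    rng = random.Random(seed)
    T = 1.0 / fs                      # fixed sampling period
    N = int(round(fs / fc))           # samples per nominal cycle
    samples = []
    for k in range(n_samples):
        t = k * T
        post = k >= m                 # True once the disturbance has occurred
        # Natural oscillation of the phase angle, as in (2.3)
        phi = 0.0
        if post:
            phi = 0.2 * math.exp(-0.5 * t) * math.cos(2 * math.pi * 1.0 * t)
        # Fundamental component, as in (2.2), with unit amplitude
        x_p = math.cos(2 * math.pi * fc * t + phi)
        # One inter-harmonic at 85 Hz, as in (2.6), after the disturbance
        x_ih = 0.02 * math.cos(2 * math.pi * 85.0 * t) if post else 0.0
        # Harmonics h = 2..5 with magnitude 0.03 / h, as in (2.7)
        x_h = sum(0.03 / h * math.cos(2 * math.pi * h * fc * t)
                  for h in range(2, 6))
        # Decaying DC offset with time constant tau = 2, as in (2.8)
        x_dc = 0.3 * math.exp(-(k - m) / (2.0 * N)) if post else 0.0
        # White Gaussian noise with zero mean, as in (2.9)
        eps = rng.gauss(0.0, 0.005)
        samples.append(x_p + x_ih + x_h + x_dc + eps)
    return samples

waveform = synth_waveform(n_samples=128, m=64)
```

Averaging the samples over one nominal cycle before and after sample `m` makes the decaying DC offset visible: the pre-disturbance average is close to zero, while the first post-disturbance cycle has a clearly positive average.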
Nature of Waveform Components
For the purpose of the present discussion, distinct components of power system
waveforms are classified into two main categories: electromechanical and
electromagnetic [22].
2.2.1 Electromechanical components
Voltage waveform components are electromechanical when they are created through
the rotor movement of large generating units. In power systems, the
fundamental component is a type of electromechanical signal with a frequency close
to 60 or 50 Hz. During a power swing¹, the amplitude and phase angle of the
fundamental component ($X_p$) fluctuate at low frequencies, mainly in the 0.1
Hz to 2 Hz range [23], basically due to the presence of natural and forced oscillations.
Correspondingly, the fundamental component ($X_p$) and its superimposed
electromechanical oscillation signals ($\varphi_p^{\text{natural}}$, $x_p^{\text{modulation}}$, and $\varphi_p^{\text{modulation}}$) are categorized
as electromechanical components of power system waveforms.
¹ A power swing occurs when the balance between power generation and demand is lost due to a fault,
line switching, generation tripping, loss of load, or other system disturbances.
2.2.2 Electromagnetic components
Voltage waveform components are electromagnetic when they are created by
the interaction between the electric energy stored in capacitors and the magnetic
energy stored in inductors, or by the interaction
between the electric energy stored in circuit components and the mechanical energy
stored in rotating machines. They can also be defined as the response of the power system
elements to a perturbation caused by an external electromagnetic field or to a change in
the physical configuration of the grid [24]. Basically, electromagnetic signals
are created as a result of sudden load or excitation changes, faults, or abnormal
operating conditions. In power grids, electromagnetic waves have a wide range of
frequencies varying from DC to several MHz. Non-fundamental frequency components
such as abrupt step-changes, inter-harmonics ($X_{ih}$), harmonics ($X_h$), decaying
DC-offset ($X_{dc}$), and noise ($\varepsilon$) are categorized as
electromagnetic components of power system waveforms.
PHASOR MEASUREMENT UNITS
A phasor measurement unit (PMU) is an apparatus used to estimate the amplitude,
phase angle, and frequency of electrical quantities (such as voltage) in the power grid
utilizing a common time source for synchronization. Time synchronization is usually
provided by the Global Positioning System (GPS), which allows synchronized real-
time measurements of multiple remote points on the network. PMUs can capture
samples from a waveform and reconstruct the phasor quantity, made up of amplitude
and angle measurement. The resulting measurements are called synchrophasors. The
synchrophasor measurements are important because if the network’s supply and
demand are not matched, frequency imbalances can cause stress on the power system,
which is potentially a cause of cascading failures and/or power system blackouts.
A standard PMU can report measurements with a high temporal resolution (120
measurements/second). This helps us in analyzing dynamic events in the power system
which is not possible with conventional Supervisory Control and Data Acquisition
(SCADA) measurements that just report one measurement every two or four seconds.
Therefore, PMUs are considered as one of the most important measuring apparatuses
of smart grids that equip utilities with enhanced control and monitoring capabilities.
Mathematical Algorithm Used Within PMUs
The fundamental component of the PMUs' input waveform is presented by (2.2).
During off-nominal frequency conditions, where the actual frequency deviates from
the nominal value, the signal of (2.2) is presented as:
$$X_p(k) = x_p \cdot \cos\left(2\pi f_c kT + 2\pi \Delta f\, kT + \varphi_p\right) \tag{3.1}$$
where $\Delta f$ is a constant offset that indicates the difference between the nominal and
actual frequencies. The above equation can be rewritten as follows:
$$X_p(k) = x_p\, e^{j\left(2\pi \Delta f\, kT + \varphi_p\right)} \tag{3.2}$$
So, the phase angle will change uniformly in proportion to $\Delta f$. To define $\varphi_p$ for
off-nominal conditions, at first the frequency is computed accurately using (3.3).
$$f(k) = f_c + \frac{\tan^{-1}\left(\dfrac{\mathrm{Im}\{X_p(k)\}}{\mathrm{Re}\{X_p(k)\}}\right) - \tan^{-1}\left(\dfrac{\mathrm{Im}\{X_p(k-1)\}}{\mathrm{Re}\{X_p(k-1)\}}\right)}{2\pi T} \tag{3.3}$$
where Re and Im stand for the real and imaginary parts, respectively. In the
next step, an artificial signal with the computed frequency and a zero phase angle is
generated and considered as the reference:
$$X_{ART}(k) = \cos\left(2\pi f_{ART} kT\right) \tag{3.4}$$
where $f_{ART}$ is equal to the frequency computed using (3.3). Finally, the phase angle
is computed as follows:
$$\varphi_p(k) = \tan^{-1}\left(\dfrac{\mathrm{Im}\{X_p(k)\}}{\mathrm{Re}\{X_p(k)\}}\right) - \tan^{-1}\left(\dfrac{\mathrm{Im}\{X_{ART}(k)\}}{\mathrm{Re}\{X_{ART}(k)\}}\right) \tag{3.5}$$
In a like manner, the amplitude of the analyzed signal at the $k$th sample is computed as
follows:
$$x_p(k) = \mathrm{ACF}_f \cdot \sqrt{\mathrm{Re}\{X_p(k)\}^2 + \mathrm{Im}\{X_p(k)\}^2} \tag{3.6}$$
where $\mathrm{ACF}_f$ is the amplitude correction factor that can be computed offline with an
arbitrary resolution for each frequency.
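The frequency, phase, and amplitude steps above can be sketched in Python on an ideal rotating phasor of the form (3.2). This is only a minimal illustration under stated assumptions: the phasor sequence is generated analytically rather than by a filter, the amplitude correction factor is taken as 1, and the reference phasor of the artificial signal is taken as a unit phasor rotating at the estimated frequency offset, which is one possible reading of (3.4) and (3.5). The helper names `wrap` and `estimate_frequency` are hypothetical.

```python
import cmath
import math

def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def estimate_frequency(X_prev, X_curr, fc, T):
    """Frequency from the angle advance of two consecutive phasors, as in (3.3)."""
    dtheta = wrap(cmath.phase(X_curr) - cmath.phase(X_prev))
    return fc + dtheta / (2 * math.pi * T)

# Assumed test conditions: 60 Hz nominal, +0.5 Hz offset, 1920 samples/s
fc, df, T = 60.0, 0.5, 1.0 / 1920.0
x_p, phi_p = 1.0, 0.3

# Ideal phasor sequence following (3.2)
phasors = [x_p * cmath.exp(1j * (2 * math.pi * df * k * T + phi_p))
           for k in range(5)]

k = 4
f_est = estimate_frequency(phasors[k - 1], phasors[k], fc, T)

# Phase angle relative to a zero-phase reference rotating at the estimated offset
reference_angle = 2 * math.pi * (f_est - fc) * k * T
phi_est = wrap(cmath.phase(phasors[k]) - reference_angle)

# Amplitude with an assumed correction factor of 1
amp_est = 1.0 * abs(phasors[k])
```

With a real filter output the phasor sequence would also carry leakage and noise, so the estimates would deviate from the true values; the sketch only shows the order of the computations.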
A Generic PMU
As the PMUs invented by various companies differ from each other in many aspects,
it is difficult to discuss the PMU hardware structure in a universally applicable way.
Nevertheless, it is possible to discuss a generic PMU, which will capture the principal
elements of typical PMUs.
Figure 3.1 is based on the structure of the first standard PMU developed at Virginia
Tech. The analog inputs are waveforms (voltage or current) obtained from the
secondary windings of the current or voltage transformers. All three phase waveforms
are used to calculate the positive-sequence measurement. The frequency response of the
anti-aliasing filters is dictated by the sampling rate chosen for the sampling process.
These filters are mostly analog-type filters with a cutoff frequency less
than half of the sampling frequency to satisfy the Nyquist criterion.
Figure 3.1 : Major elements of the modern phasor measurement unit.
REAL-TIME SIMULATION METHODS
Recently, real-time simulation has become essential because of the need to
evaluate the performance of predictive and control algorithms. There are numerous
reasons why it is important to carry out real-time simulations. One of the main reasons
is that it helps developers minimize the cost of fixing errors present in any control
system. The cost of fixing an error at the design stage is comparatively low,
whereas fixing the same error at the application stage is far more costly. Therefore,
performing real-time simulations helps to identify and fix potential errors that may
occur during the application of control algorithms. Besides, real-time simulation
makes it possible to perform dynamic tests on power grids in the laboratory environment;
previously, such tests could only be done in the field. Performing dynamic tests in the field is
extremely risky and expensive. Real-time simulation allows designs to be validated
throughout the project and algorithm errors to be detected at early
phases of engineering, and it allows a very wide range of test scenarios to be covered.
Phasor-Type Simulation
A complete power system model for TSA can be mathematically described by a set of
differential equations and algebraic equations:
$$\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}, \mathbf{y}) \tag{4.1}$$
$$\mathbf{0} = \mathbf{g}(\mathbf{x}, \mathbf{y}) \tag{4.2}$$
where $\mathbf{x}$ is the set of state variables and $\mathbf{y}$ is the set of algebraic variables. The
algebraic equations represent the loads, connecting grid, and transmission system,
while the differential equations model the dynamics of the rotating machines. These
equations are nonlinear, and the classic solution method is to utilize a discretization
approach to convert the differential equations to a new set of nonlinear algebraic
equations and then solve these two sets of nonlinear algebraic equations by a proper
iterative method.
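The discretize-then-iterate procedure described above can be sketched on a very small example. The Python code below is only an illustration, not the solver of any particular transient stability program: it assumes a single machine connected to an infinite bus, with rotor angle and speed deviation as state variables and electrical power as the single algebraic variable, uses the implicit trapezoidal rule as the discretization, and solves the resulting nonlinear algebraic equations with Newton's method. All parameter values and the names `simulate` and `solve_linear` are assumptions.

```python
import math

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def simulate(delta0, omega0, h=0.01, steps=2000,
             inertia=0.1, D=0.05, Pm=1.0, Pmax=2.0):
    """Trapezoidal integration of a one-machine swing model written as a DAE.

    Differential part: d(delta)/dt = omega, d(omega)/dt = (Pm - Pe - D*omega)/inertia
    Algebraic part:    0 = Pe - Pmax*sin(delta)
    """
    delta, omega = delta0, omega0
    Pe = Pmax * math.sin(delta)          # consistent initial algebraic variable
    trajectory = [(delta, omega, Pe)]
    for _ in range(steps):
        f_omega_old = (Pm - Pe - D * omega) / inertia
        d1, w1, p1 = delta, omega, Pe    # Newton initial guess: previous step
        for _ in range(20):
            # Residuals of the discretized differential and algebraic equations
            F = [
                d1 - delta - 0.5 * h * (w1 + omega),
                w1 - omega - 0.5 * h * ((Pm - p1 - D * w1) / inertia + f_omega_old),
                p1 - Pmax * math.sin(d1),
            ]
            if max(abs(v) for v in F) < 1e-12:
                break
            # Jacobian of the residuals with respect to (d1, w1, p1)
            J = [
                [1.0, -0.5 * h, 0.0],
                [0.0, 1.0 + 0.5 * h * D / inertia, 0.5 * h / inertia],
                [-Pmax * math.cos(d1), 0.0, 1.0],
            ]
            step = solve_linear(J, [-v for v in F])
            d1, w1, p1 = d1 + step[0], w1 + step[1], p1 + step[2]
        delta, omega, Pe = d1, w1, p1
        trajectory.append((delta, omega, Pe))
    return trajectory

delta_eq = math.asin(0.5)                # equilibrium angle for Pm/Pmax = 0.5
swing = simulate(delta_eq + 0.2, 0.0)    # start 0.2 rad away from equilibrium
```

With the assumed 10 ms step the damped swing settles back toward the equilibrium angle. The same step is far too coarse to resolve the internal filtering of a PMU, which is the limitation discussed next.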
To make the solution as fast as possible, typically a time-step in the range of
milliseconds is chosen for the simulation. However, the large integration time-step of
the transient stability programs is the main restriction for the detailed representation
of nonlinear elements (such as PMUs) and dynamically fast events. For instance, to
evaluate transient responses of PMUs a time-step in the order of a few microseconds
is required. Therefore, in traditional transient stability programs (phasor-type
simulation) these devices can just be represented as quasi-steady-state models, which
are only suitable for normal working conditions or are developed for a specific type of
fault.
Electromagnetic Transient-Type Simulation
In transient stability studies there are several cases in which the simulation of voltage and
current transient waveforms is required. Examples of such cases include simulation of
nonlinear elements (such as PMUs), frequency-dependent systems, and design of
protection algorithms. Basically, EMT-type simulation requires detailed modeling and
consequently a much smaller simulation time-step than in the phasor-type simulation.
Depending on the highest frequency in the simulation and the type of transient, the
simulation step-size can range from a few hundred microseconds for slow
transients to a few nanoseconds for faster transients.
A large number of differential and algebraic equations and a small simulation time-
step have made the EMT-type simulation a computationally expensive and inefficient
type of simulation. In practice, it is not efficient to perform electromagnetic transient
analysis for a large power grid where all of the elements and systems are represented
using detailed models.
Hybrid-Type Simulation
The main objective of the hybrid-type simulation is to split the power system into two
parts, in such a way that, based on the required modeling accuracy, one part is
simulated by the phasor-based simulator while the other is carried out by the EMT
simulator. EMT-type simulation is utilized for the smaller part in which more detailed
and precise results are required. The smaller part may comprise PMUs or any other
elements whose dynamic behavior is to be characterized more accurately by
simulating their detailed models with smaller time-steps. The other part, which comprises
extensive portions of the network, is simulated by the phasor-based simulator. In this
part, less detailed models of the components are sufficient, and the capability of
the simulator for quick computation is what matters.
In a hybrid simulator, EMT-based and phasor-based simulators are run on two separate
zones: 1) the Detailed System (DS) and 2) the External System (ES). Thus, each
simulator requires a true picture of the other zone which adequately reflects its
characteristics. This needs a converter block which should be able to convert phasors
of phasor-based simulators to equivalent instantaneous waveforms and vice versa.
Figure 4.1 illustrates the total scheme of the hybrid simulator.
Figure 4.1 : Total scheme of a hybrid simulator.
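The role of the converter block can be illustrated with a minimal Python sketch. It assumes a peak-amplitude phasor convention, a steady sinusoid at a known frequency, and an analysis window that spans a whole number of cycles; the function names `phasor_to_waveform` and `waveform_to_phasor` are hypothetical and do not refer to any specific simulator interface.

```python
import cmath
import math

def phasor_to_waveform(phasor, f, fs, n_samples):
    """Instantaneous samples x(k) = |X| * cos(2*pi*f*k*T + angle(X))."""
    T = 1.0 / fs
    amplitude, angle = abs(phasor), cmath.phase(phasor)
    return [amplitude * math.cos(2 * math.pi * f * k * T + angle)
            for k in range(n_samples)]

def waveform_to_phasor(samples, f, fs):
    """Single-frequency DFT over a window assumed to span whole cycles."""
    T = 1.0 / fs
    acc = sum(x * cmath.exp(-2j * math.pi * f * k * T)
              for k, x in enumerate(samples))
    return 2.0 * acc / len(samples)

# Round trip over one 60 Hz cycle sampled at 1920 samples/s (32 samples)
X = cmath.rect(1.05, 0.4)
samples = phasor_to_waveform(X, 60.0, 1920.0, 32)
X_back = waveform_to_phasor(samples, 60.0, 1920.0)
```

For a pure sinusoid the round trip returns the original phasor. Once the waveform contains the electromagnetic components of Section 2, the recovered phasor deviates from the true one, which is why the PMU itself is simulated in the EMT domain.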
SIMULATING REALISTIC SYNCHROPHASOR DATA
As mentioned, the state-of-the-art ML-based TSA models are trained and developed
by an offline-generated dataset of synchrophasors. As these models are to be used in
practical applications, the used synthetic synchrophasor data must be simulated in a
realistic way, accurately reflecting the impacts of waveform disturbances on the
dynamic response of PMUs. Mainly, the realism of the generated PMU data depends
on two factors: 1) the realism of the simulated input waveforms, and 2) the
model and simulation environment in which the principal elements of the PMUs are
to be simulated. In the following subsections, the requirements of simulating realistic
wide-area PMU measurements for transient stability studies are expressed and then the
proposed method of synthesizing PMU data is presented.
Importance of Modeling Electromagnetic Components
Basically, electromagnetic components of power system waveforms (e.g., harmonic,
inter-harmonic, decaying DC-offset, and noise) do not impair the stability of the
power system, and thus can often be neglected in theoretical transient stability studies.
However, in practical systems, these electromagnetic waves are superimposed on the
fundamental frequency component and distort the sinusoidal shape of current and
voltage waveforms. These nonlinear disturbances may impair the filtering and phasor
estimation algorithms, which are predominantly used in practical PMUs.
In practice, these impairments make it difficult to extract the fundamental
component (as the most determinant factor that impacts the stability of the system) and
precisely capture the required dynamics. In this respect, if the distorting effects of
superimposed electromagnetic waves are ignored during the synthetic PMU data
simulation, the produced synchrophasors would be unrealistically error-free, and
subsequently unqualified to be fed into ML-based TSA models which are to be used
in practical applications.
Importance of Simulating Detailed Model of PMU in EMT Domain
From the viewpoint of modeling and simulation, PMU models can be categorized into
two main categories: generic and detailed. A generic model is based on the general
characteristics of PMUs. It is a simplified model that behaves similarly to the PMU,
within clearly understood and specified bounds. A generic model cannot represent all
performance specifications of a PMU. Although such models can provide a fair insight
into the operation of the PMUs, they are not adequate to provide realistic PMU data in
sub-transient and transient conditions.
On the other hand, a detailed model reproduces the characteristics, algorithms, and
behavior of the practical model of PMU in detail. This approach models all internal
elements of the PMU, as well as high-frequency interactions between its principal
components (such as the interaction between the A/D converter and the anti-aliasing filter).
Detailed modeling can be complex and requires EMT-type simulation.
As a result, since the generic model of PMU does not meet the requirements of
providing realistic synchrophasor data, especially in sub-transient and transient
conditions, it must be developed to a detailed model. This detailed modeling is only
possible using EMTP which is the specialized software for electromagnetic transient
studies.
Application of Hybrid Simulation to Generate Realistic Synchrophasor Data
By analyzing the spectra of instantaneous voltage waveforms, obtained from fully
EMT simulations or real-life systems, all possible spectral components of voltage
waveforms (fundamental and non-fundamental components) can be parameterized in
terms of amplitude, phase angle, and frequency. In essence, offline spectral analysis
helps to derive the explicit parameterized model of electromagnetic components of
voltage waveforms without knowing details of electromagnetic interactions between
the non-linear elements of the power system, as the main source of noises and
electromagnetic components. Additionally, many standards and studies in the literature
estimate the exact parameters of electromagnetic components or
determine their range of variation. In this regard, without an excessive
computational burden for modeling electromagnetic interactions in the time domain,
it is possible to identify parameters of electromagnetic components and simulate their
effects on power system waveforms and PMU responses. Thus, by only
simulating a detailed model of PMUs (as the most sensitive components to waveform
disturbances) in the EMT domain, the impairing impacts of waveform disturbances
(whether electromagnetic or electromechanical) can be effectively simulated and
studied.
Figure 5.1(a) depicts the total scheme of the proposed simulation method, where
$x_p$, $\varphi_p$, and $f$ are the actual values of the amplitude, phase angle, and frequency of power
system waveforms, respectively, while $x_p^D$, $\varphi_p^D$, and $f^D$ are their measured values
(erroneous versions). Subfigure (b) shows the general structure and principal
components of a typical Phase-Locked Loop (PLL) PMU used in (a).
Figure 5.1 : Total scheme of the proposed simulation method to provide realistic
synchrophasor data at a reasonable speed: (a) total scheme of the proposed
simulation method. (b) details of the PMU model.
ARRANGING UNSTRUCTURED DATA IN FEATURE SPACE
Generally, when the time-synchronized PMU measurements are reported to the control
center, they do not have a pre-defined data model and are not arranged in a pre-defined
manner. This means that the reported database is unstructured and cannot be fed to
ML-based algorithms without any arrangement. In the literature, in order to construct
a meaningful feature space, the sequential PMU data are arranged in an n-dimensional
space in a specific order. This process is known as data arrangement. In the following
subsections, after presenting the conventional method of data arrangement, the
limitations and problems of the commonly used data model are identified and then
some alternative solutions are proposed.
Static Method of Data Arrangement (Conventional Method)
For a given power system, suppose $P$ phasor measurement units,
$\{\mathrm{PMU}_1, \mathrm{PMU}_2, \dots, \mathrm{PMU}_P\}$, are installed at $P$ buses for online monitoring. Given an
observation time window of length $T_o$, PMU measurements are sequentially acquired
from individual buses with a sampling interval of $\Delta t$. In this respect, we have:
$$\mathbf{v}_{s,c} = \left[v_{s,c}^{1}, v_{s,c}^{2}, \dots, v_{s,c}^{j}\right] \tag{6.1}$$
where $\mathbf{v}_{s,c}$ is the time series of measurements obtained from bus $c$ $(1 \le c \le P)$ in
scenario $s \in \{1, \dots, S\}$, and $j = T_o / \Delta t$ is the number of data points in $\mathbf{v}_{s,c}$.
Conventionally, in order to create a feature space, sequential PMU data of a certain
type of electrical quantity, e.g., voltage magnitude, are arranged in a feature space
according to the ID number of PMUs as follows:
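$$\mathbf{Q} = \begin{bmatrix} \mathbf{v}_{1,1} & \mathbf{v}_{1,2} & \cdots & \mathbf{v}_{1,P} \\ \mathbf{v}_{2,1} & \mathbf{v}_{2,2} & \cdots & \mathbf{v}_{2,P} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{v}_{S,1} & \mathbf{v}_{S,2} & \cdots & \mathbf{v}_{S,P} \end{bmatrix} \tag{6.2}$$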
The total scheme of transient stability prediction based on this arrangement is shown
in Figure 6.1.
Figure 6.1 : Total scheme of transient stability prediction based on the conventional method of data arrangement.
Generally, in the transient condition, significant errors are observed in the time-series
measurements of neighboring buses which are electrically close to the fault location.
This means that in different scenarios, the accuracy of the reported time-series
measurements varies dynamically as a function of fault location. Since in the
conventional procedure of data arrangement, the input feature space is constructed
through a static configuration, highly erroneous measurements can lie anywhere in n-
dimensional feature space. Thus, almost all of the feature vectors (all columns of Q )
are corrupted by erroneous data. Obviously, feeding corrupted feature vectors into the
ML models can significantly derail the classification process and result in poor training
efficiency.
Dynamic Method of Data Arrangement (New Method)
In our proposed method of data arrangement, in order to limit the corruption of feature
vectors, the sequential PMU data are arranged in a dynamic manner, based on the
electrical distance of PMU-equipped buses from the fault point. It will be illustrated
that, in this arrangement, the erroneous data are placed in some specific feature vectors
and can be easily removed without reducing mutual information between the feature
vectors and targets. This property enables us to modify the feature space by
eliminating a limited number of feature vectors that are corrupted by highly erroneous
data. The total scheme of transient stability prediction based on the new arrangement
is shown in Figure 6.2.
Figure 6.2 : Total scheme of transient stability prediction based on the new method of data arrangement.
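The difference between the two arrangements can be sketched in a few lines of Python. The sketch assumes that, for each scenario, a measure of electrical distance between every PMU-equipped bus and the fault point is already available, and that the buses closest to the fault are the ones whose series are removed. The function names, the dictionary layout, and the numbers are hypothetical and serve only as an illustration.

```python
def arrange_static(series_by_bus, bus_ids):
    """Build one row of Q by concatenating per-bus series in fixed PMU-ID order."""
    row = []
    for c in bus_ids:
        row.extend(series_by_bus[c])
    return row

def arrange_dynamic(series_by_bus, distance_to_fault, n_drop=0):
    """Build one row with buses ordered by electrical distance from the fault.

    The n_drop buses closest to the fault are discarded (an assumed rule).
    """
    ordered = sorted(series_by_bus, key=lambda c: distance_to_fault[c])
    row = []
    for c in ordered[n_drop:]:
        row.extend(series_by_bus[c])
    return row

# One scenario with three PMU buses and two samples per bus (made-up values)
series = {1: [1.00, 0.99], 2: [0.40, 0.70], 3: [0.90, 0.95]}
distance = {1: 0.8, 2: 0.1, 3: 0.5}   # bus 2 is electrically closest to the fault

static_row = arrange_static(series, [1, 2, 3])
dynamic_row = arrange_dynamic(series, distance)
cleaned_row = arrange_dynamic(series, distance, n_drop=1)
```

In the static row the series of the bus nearest the fault can land in any column block depending on the fault location, whereas in the dynamic row it always occupies the leading block and can be dropped by position.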
ENHANCING DATA QUALITY USING ARTIFICIAL NEURAL
NETWORK
For the purpose of the present section, erroneous PMU data are assumed to be a function
of the error-free (actual) values. This assumption can be mathematically represented as
follows:
$$x_p^D = D_x\left(x_p\right) \tag{7.1}$$
$$\varphi_p^D = D_\varphi\left(\varphi_p\right) \tag{7.2}$$
$$f^D = D_f\left(f\right) \tag{7.3}$$
where $D_x$, $D_\varphi$, and $D_f$ are unknown distorting functions of amplitude, phase angle,
and frequency, respectively. The inverses of these functions ($D_x^{-1}$, $D_\varphi^{-1}$, and $D_f^{-1}$), known
as error-mitigation functions, are used for error mitigation purposes:
$$D_x^{-1}\left(x_p^D\right) = x_p \tag{7.4}$$
$$D_\varphi^{-1}\left(\varphi_p^D\right) = \varphi_p \tag{7.5}$$
$$D_f^{-1}\left(f^D\right) = f \tag{7.6}$$
The traditional way to estimate the inverse of distorting functions is to determine a
suitable parameterized model of the distorting functions, then approximate the
parameters from measurements, and ultimately build their inverse functions according
to the approximations. This method is susceptible to error propagation. Another
method is to build and tune an Artificial Neural Network (ANN) to directly invert the
distorting function, without requiring parameter approximation or explicit modeling.
The following subsections describe the details of this approach.
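For contrast with the ANN-based approach, the traditional parametric route can be sketched as follows. The sketch assumes, purely for illustration, that the amplitude distortion is affine, i.e. the measured value equals a gain times the actual value plus an offset; the gain and offset are fitted by ordinary least squares and the fitted model is then inverted. The affine form, the function names, and the sample values are assumptions and do not describe the actual distortion of a PMU.

```python
def fit_affine(actual, measured):
    """Least-squares fit of measured = a * actual + b (assumed distortion model)."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, measured))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def invert_affine(a, b):
    """Return the error-mitigation function built from the fitted parameters."""
    def inverse(y):
        return (y - b) / a
    return inverse

# Made-up amplitude pairs: actual values and their distorted measurements
actual = [0.90, 0.95, 1.00, 1.05, 1.10]
measured = [0.95 * x + 0.02 for x in actual]

a, b = fit_affine(actual, measured)
mitigate = invert_affine(a, b)
recovered = [mitigate(y) for y in measured]
```

Any error in the fitted gain or offset carries over directly into the inverse, and a wrong choice of model form cannot be repaired by the fit at all. This is the error propagation that motivates learning the inverse mapping directly.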
ANN as Function Estimator
Consider a systematic network that takes an $n_0$-length input vector $\boldsymbol{\alpha}_0 \in \mathbb{R}^{n_0}$ and
produces a $z$-length output vector $\hat{\boldsymbol{\beta}} \in \mathbb{R}^{z}$, as shown in Figure 7.1(a). The output is
determined by the input via a deterministic function $\hat{g}$ as expressed by (7.7):
$$\hat{\boldsymbol{\beta}} = \hat{g}\left(\boldsymbol{\alpha}_0; \boldsymbol{\theta}\right) \tag{7.7}$$
Basically, $\hat{g}$ is a fixed function but is described by a finite-dimensional parameter
vector $\boldsymbol{\theta}$. Although many different input-output relations can be modeled in this
way by changing the parameter vector $\boldsymbol{\theta}$, they all share an underlying structure
determined by the initial choice of $\hat{g}$. This is known as the gray-box model. When the
deterministic function $\hat{g}$ is chosen to resemble the human brain's neurons, the gray
box is called an ANN. The classical form of ANN is a Fully-Connected Feed-Forward
Network (FCFFN). This network is illustrated in Figure 7.1(b).
In particular, an FCFFN can approximate any continuous function on a compact domain
to arbitrary accuracy by using a large but finite number of neurons and parameters [25].
In an FCFFN, $\hat{g}$ is a composition of $L$ functions, $\hat{g}_1, \ldots, \hat{g}_L$, which describe the transitions
from the neurons in an input layer to the neurons in an output layer via $L-1$
intermediate hidden layers. The function $\hat{g}_l$ is determined by the parameters
$\boldsymbol{\theta}_l = \{\mathbf{W}_l, \mathbf{b}_l\}$ and modeled as:
$$\hat{g}_l\left(\boldsymbol{\alpha}_{l-1} ; \boldsymbol{\theta}_l\right) = \sigma_l\left(\mathbf{W}_l \boldsymbol{\alpha}_{l-1} + \mathbf{b}_l\right) \tag{7.8}$$
where $\mathbf{W}_l \in \mathbb{R}^{n_l \times n_{l-1}}$ and $\mathbf{b}_l \in \mathbb{R}^{n_l}$ are the weight matrix and bias vector, respectively,
while $\sigma_l : \mathbb{R}^{n_l} \to \mathbb{R}^{n_l}$ is called an activation function. The function $\hat{g}_l$ can be
interpreted as taking the values $\boldsymbol{\alpha}_{l-1}$ in the $n_{l-1}$ neurons of layer $l-1$, combining
them according to the affine transition relation $\mathbf{W}_l \boldsymbol{\alpha}_{l-1} + \mathbf{b}_l$, and finally
applying the activation function $\sigma_l$ to determine the values of the $n_l$ neurons of
layer $l$.
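The layer-by-layer composition in (7.8) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`affine`, `fcffn_forward`), the toy weights, and the choice of a sigmoid hidden layer with a linear output layer are hypothetical and are not taken from the thesis implementation.

```python
import math

def affine(W, b, a):
    """Return W a + b for a weight matrix W (list of rows), bias b, and input a."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

def sigmoid(v):
    """Element-wise logistic activation."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def identity(v):
    """Element-wise linear activation."""
    return list(v)

def fcffn_forward(alpha0, layers):
    """Evaluate the network as the composition of the layer maps in (7.8).

    `layers` is a list of (W_l, b_l, sigma_l) tuples, one per layer l = 1..L.
    """
    a = alpha0
    for W, b, sigma in layers:
        a = sigma(affine(W, b, a))
    return a

# Hypothetical toy network: 2 inputs, 3 sigmoid hidden neurons, 1 linear output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], sigmoid),
    ([[1.0, -1.0, 0.5]], [0.2], identity),
]
beta_hat = fcffn_forward([1.0, 2.0], layers)
```

Each tuple in `layers` plays the role of one parameter set, so changing the numbers changes the modeled input-output relation while the structure stays fixed, which is the gray-box idea described above.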
Figure 7.1 : Input-output model: (a) the gray-box model characterized by $\hat{g}$ and a parameter vector $\boldsymbol{\theta}$, (b) the fully-connected feed-forward network
that fits into the box in (a).
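The parameter vector of an ANN can be trained to estimate an unknown function that
we call $g$; that is, $\hat{g}$ should be trained to become a fair approximation of $g$. This
is usually done by supervised learning using a set of $I$ learning samples containing
input vectors $\boldsymbol{\alpha}_{\text{train}}(i)$ and the corresponding output vectors
$\boldsymbol{\beta}_{\text{train}}(i) = g\left(\boldsymbol{\alpha}_{\text{train}}(i)\right)$ that we
desire the ANN to reproduce, for $i = 1, \ldots, I$. These training samples can be represented
as the columns of two matrices:
$$\mathbf{A}_{\text{train}} = \left[\boldsymbol{\alpha}_{\text{train}}(1) \;\cdots\; \boldsymbol{\alpha}_{\text{train}}(I)\right] \tag{7.9}$$
$$\mathbf{B}_{\text{train}} = \left[\boldsymbol{\beta}_{\text{train}}(1) \;\cdots\; \boldsymbol{\beta}_{\text{train}}(I)\right] \tag{7.10}$$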
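The inputs should preferably be selected at random from the distribution of
inputs that appears when using $g$ in reality. The training basically consists of
finding the parameter vector $\boldsymbol{\theta}^{*}$ that minimizes a loss function $\mathcal{L}$ measuring the
estimation mismatch (or, equivalently, maximizes a measure of agreement):
$$\boldsymbol{\theta}^{*} = \underset{\boldsymbol{\theta}}{\arg\min}\; \mathcal{L}\left(\boldsymbol{\theta}, \mathbf{A}_{\text{train}}, \mathbf{B}_{\text{train}}\right) \tag{7.11}$$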
 
The main purpose is that the trained ANN $\hat{g}\left(\boldsymbol{\alpha}_0 ; \boldsymbol{\theta}^{*}\right)$ will provide approximately correct
outputs not only for the training samples but for any input data $\boldsymbol{\alpha}_0$ obtained in the same
way. This is known as generalization and is the desired property for practical
application. The training in (7.11) is an optimization problem. Different
network training algorithms can be used to find a suboptimal solution to this
non-convex problem that is acceptable in terms of both computation and performance.
Scaled Conjugate Gradient (SCG) [26], Levenberg-Marquardt (LM) [27], and Bayesian
Regularization (BR) [28] are three well-known backpropagation algorithms that can
be used as network training functions to update the bias and weight values so as to
produce a network that generalizes well.
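To make the optimization in (7.11) concrete, the following sketch trains a very small network by plain full-batch gradient descent on a mean-squared loss. It is a simplified stand-in and not one of the SCG, LM, or BR algorithms named above; the network size, learning rate, function names, and the linear distortion used to generate the data are all assumptions chosen for illustration.

```python
import math
import random

def forward(theta, a):
    """Scalar-input network with tanh hidden units and one linear output neuron."""
    w1, b1, w2, b2 = theta
    hidden = [math.tanh(w * a + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2, hidden

def loss(theta, a_train, b_train):
    """Mean squared mismatch between network outputs and targets."""
    return sum((forward(theta, a)[0] - b) ** 2
               for a, b in zip(a_train, b_train)) / len(a_train)

def train(a_train, b_train, n_hidden=4, lr=0.02, epochs=300, seed=0):
    """Approximate the minimizer of the loss by full-batch gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b2 = 0.0
    n = len(a_train)
    history = [loss((w1, b1, w2, b2), a_train, b_train)]
    for _ in range(epochs):
        g_w1, g_b1, g_w2 = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden
        g_b2 = 0.0
        for a, b in zip(a_train, b_train):
            out, hidden = forward((w1, b1, w2, b2), a)
            d_out = 2.0 * (out - b) / n            # derivative of the loss w.r.t. this output
            g_b2 += d_out
            for k in range(n_hidden):
                g_w2[k] += d_out * hidden[k]
                d_pre = d_out * w2[k] * (1.0 - hidden[k] ** 2)   # back through tanh
                g_w1[k] += d_pre * a
                g_b1[k] += d_pre
        w1 = [w - lr * g for w, g in zip(w1, g_w1)]
        b1 = [v - lr * g for v, g in zip(b1, g_b1)]
        w2 = [w - lr * g for w, g in zip(w2, g_w2)]
        b2 -= lr * g_b2
        history.append(loss((w1, b1, w2, b2), a_train, b_train))
    return (w1, b1, w2, b2), history

# Hypothetical distortion D(x) = 0.8 x + 0.1; the network learns its inverse.
true_values = [i / 10.0 for i in range(-10, 11)]
distorted = [0.8 * x + 0.1 for x in true_values]
theta_star, history = train(distorted, true_values)
```

The list `history` records the loss before training and after every epoch, so plotting it gives a quick view of convergence. Second-order methods such as LM typically need far fewer iterations than this first-order sketch.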
Mitigating Measurement Errors Using ANN
Based on the description given in Section 7.1, the error-mitigation functions of different
systems can be estimated by supervised training of a well-structured ANN.
This procedure for approximating the error-mitigation functions of PMUs ($D_x^{-1}$, $D_\phi^{-1}$, and
$D_f^{-1}$) is illustrated in Figure 7.2(a).
Figure 7.2 : The procedure of training an ANN to mitigate errors of PMU measurements: (a) training phase, (b) usage phase.
   
In Figure 7.2, $\boldsymbol{\beta} = \left(x_p, \phi_p, f\right)$ is the true value, $\boldsymbol{\alpha} = \left(x_p^{D}, \phi_p^{D}, f^{D}\right)$ is the erroneous value, and
$\hat{g}$ is an estimate of the error-mitigation function. The training procedure
iteratively updates the parameter vector $\boldsymbol{\theta}$ to gradually decrease the estimation
error until it converges to some $\boldsymbol{\theta}^{*}$. Then, the trained ANN in Figure 7.2(b) is used
to counteract the distorting function $D = \left(D_x, D_\phi, D_f\right)$ without having to model it
explicitly or approximate model parameters. If the ANN is designed
to have sufficiently low complexity, then the trained ANN can be used in real-time
applications.
Different network training functions (backpropagation algorithms), such as
SCG, LM, and BR, can be used to train the ANN. The SCG backpropagation algorithm is a
conjugate gradient algorithm that needs no line search and is a good general-purpose
training algorithm. The LM backpropagation algorithm is a fast training function
for networks of medium size. It also has a memory reduction feature that can be used
when the training set is large. The BR backpropagation algorithm is a modified form
of the LM training function that yields networks that generalize well, which
reduces the difficulty of determining the optimum network architecture.
ML-BASED TRANSIENT STABILITY PREDICTION
ML-based TSA approaches consist of two phases: offline and online. The offline phase
begins with the generation of training sets, which are obtained by performing
time-domain simulations and gathering the time-synchronized measurements from
PMU-equipped buses. The generated datasets are then used to develop pre-processing
algorithms that enhance the quality of the dataset. Finally, the processed datasets
are fed into the ML-based TSA tools to train them and tune their parameters. In
the online phase, after PMU measurements are collected from different points of the
power system, the developed pre-processing algorithms are immediately applied to the
data. Next, the trained transient stability classifier
takes the processed dataset as input and predicts the stability status of the power grid.
The flowchart of the proposed method is presented in Figure 8.1. In this flowchart, the
preprocessing steps are shown for two different preprocessing methods:
1) In the first preprocessing method, erroneous PMU measurements are simply
removed from the feature space by Feature Cleansing (FC) techniques. This method is
called FC-based preprocessing.
2) In the second preprocessing method, instead of removing erroneous measurements
from the feature space, the measurement errors are mitigated using an ANN. This
method is called ANN-based preprocessing.
Preprocessing is done in order to enhance dataset quality and reduce
irrelevant information. Depending on user convenience, accuracy requirements, and
application needs, one may decide to use either the first or the second method of
preprocessing.
The following subsections describe each step of the offline phase of the proposed TSA
methodology by presenting details of dataset generation, dataset preprocessing,
training procedure of the classifiers, and performance evaluation.
Figure 8.1 : The proposed transient stability assessment methodology.
Offline Supervised Dataset Generation
Synchrophasor measurements for training are usually generated through offline
time-domain simulations. In these simulations, the power grid is subjected to a large
number of credible contingencies [29] involving disturbances that can make the system unstable.
These contingencies can include a sudden three-phase short circuit or an outage of a
transmission line or a generator, followed by a subsequent tripping. The clearing time
$t_c$ can vary between the minimum expected clearing time $t_c^{\min}$ and the maximum expected
clearing time $t_c^{\max}$ ($t_c^{\min} \le t_c \le t_c^{\max}$). For a reliable TSA, the training set should cover a
sufficient number of operating points so that the models developed are a good
representation of the practical power system and are tolerant to the uncertainties in
operating conditions. For each operating point, time-domain simulations are carried
out to determine the stability index with respect to every credible contingency. The
system's transient stability can then be classified as either stable or unstable
with a label $\Lambda$:
$$\Lambda = \begin{cases} +1 \ (\text{stable}) & \text{for } \eta > 0 \\ -1 \ (\text{unstable}) & \text{for } \eta \le 0 \end{cases} \tag{8.1}$$
where $\eta = \left(360 - \delta_{\max}\right) / \left(360 + \delta_{\max}\right)$ is the power angle-based stability index, and $\delta_{\max}$ is
the maximum angle separation (in degrees) between any two generating units at the same
time in the post-disturbance response [30].
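The labeling rule in (8.1) can be illustrated with a short sketch. The function names and the layout of the rotor-angle data (one list of generator angles per time step, in degrees, without wrap-around handling) are assumptions made for this example.

```python
def max_angle_separation(rotor_angles):
    """Largest angle difference between any two generators at the same time step.

    rotor_angles[t][g] is the rotor angle in degrees of generator g at step t.
    """
    return max(max(step) - min(step) for step in rotor_angles)

def stability_index(delta_max_deg):
    """Power angle-based index eta = (360 - delta_max) / (360 + delta_max)."""
    return (360.0 - delta_max_deg) / (360.0 + delta_max_deg)

def stability_label(delta_max_deg):
    """Return +1 (stable) if eta > 0, otherwise -1 (unstable)."""
    return 1 if stability_index(delta_max_deg) > 0 else -1

# Hypothetical post-disturbance trajectory with two generators and two time steps.
angles = [[10.0, 50.0], [0.0, 130.0]]
label = stability_label(max_angle_separation(angles))
```

Since the separation is non-negative, the index is positive exactly when the maximum separation stays below 360 degrees, so the label effectively tests whether any pair of machines has slipped by a full turn.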
Data Preprocessing
Data preprocessing is one of the most important phases in ML projects. Raw PMU
data usually come with many imperfections such as errors, missing values, and noise.
If the dataset contains many erroneous and unreliable data, knowledge
discovery during the training phase becomes more difficult. This problem can be
alleviated by data preprocessing techniques. The product of data preprocessing is the final
training set.
In this thesis, data preprocessing is performed in two different ways: FC-based
preprocessing and ANN-based preprocessing. The details of these two approaches are
given in the following subsections.
8.2.1 FC-based preprocessing
Feature Cleansing (FC) or feature cleaning is the process of detecting and removing a
subset of features that are corrupted, inaccurate, invalid, or distorted. Feature cleansing
is a type of data preprocessing technique by which the size of feature space can be
reduced for a faster learning speed and better forecast accuracy. Additionally, by FC
the quality of the input feature space can be increased for obtaining more reliable ML
models.
An efficient FC algorithm should remove the erroneous and misleading
data while keeping the relevant information. To check the efficiency of the FC
algorithm used, information-theoretic quantities, such as the Average of Normalized Mutual
Information (ANMI), can be computed and compared before and after the feature
cleansing. Let $FV_r$ be the $r$-th feature vector in the feature space ($r \in \{1, \ldots, R\}$), and
$\Lambda V$ be the vector of the class labels. After FC, the ANMI can be calculated as follows:
$$\mathrm{ANMI} = \frac{\sum_{r=1}^{R} \mathrm{NMI}_r\left(FV_r ; \Lambda V\right) - \sum_{r \in CFV} \mathrm{NMI}_r\left(FV_r ; \Lambda V\right)}{R - |CFV|} \tag{8.2}$$
where $\mathrm{NMI}_r\left(FV_r ; \Lambda V\right)$ is the normalized mutual information between the feature
vector $FV_r$ and the vector of targets $\Lambda V$, while $CFV$ is the set of corrupted feature
vectors. A significant reduction in ANMI after applying FC indicates that the informativity
of the input feature space has been reduced, which is clearly undesirable and can negatively
affect the classification performance.
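A minimal sketch of the ANMI computation in (8.2) is given below. It makes two assumptions that the text does not fix: the features are already discretized (continuous PMU quantities would first have to be binned), and the mutual information is normalized by the geometric mean of the two entropies. All function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def nmi(x, y):
    """Normalized mutual information; defined as 0 if either entropy is 0."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0.0 or hy == 0.0:
        return 0.0
    return mutual_information(x, y) / math.sqrt(hx * hy)

def anmi(features, labels, corrupted=()):
    """Average NMI over the feature vectors that remain after cleansing."""
    removed = set(corrupted)
    kept = [f for r, f in enumerate(features) if r not in removed]
    return sum(nmi(f, labels) for f in kept) / len(kept)

# Toy data: feature 0 determines the label, feature 1 is unrelated to it.
labels = [1, 1, -1, -1]
features = [[0, 0, 1, 1], [0, 1, 0, 1]]
before = anmi(features, labels)
after = anmi(features, labels, corrupted=[1])
```

In this toy case, removing the uninformative feature raises the average, which is the behavior one hopes for from a well-targeted cleansing step. Removing the informative feature instead would lower it.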
8.2.2 ANN-based preprocessing
The high capability and simplicity of ANNs have led to their increasing use
for preprocessing dynamic and erroneous datasets. In TSA studies, ANN-based data
preprocessing can serve as a powerful tool for mitigating the errors of PMU data
used in ML-based TSA models.
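As described in Section 7.1, the main objective of an ANN-based algorithm is to
minimize (or maximize) a loss function that measures the approximation mismatch
between outputs and targets. Generally, the loss function can be defined as a Mean
Squared Error (MSE)-based loss function:
$$\mathrm{MSE} = \frac{1}{I} \sum_{i=1}^{I} \left\| \boldsymbol{\beta}_{\text{train}}(i) - \hat{g}\left(\boldsymbol{\alpha}_{\text{train}}(i) ; \boldsymbol{\theta}\right) \right\|^{2} \tag{8.3}$$
or it can be defined as a correlation-based loss function:
$$\mathrm{Corr} = \frac{\sum_{i=1}^{I} \left( \hat{g}\left(\boldsymbol{\alpha}_{\text{train}}(i) ; \boldsymbol{\theta}\right) - \bar{\hat{g}} \right) \left( \boldsymbol{\beta}_{\text{train}}(i) - \bar{\boldsymbol{\beta}}_{\text{train}} \right)}{\sqrt{\sum_{i=1}^{I} \left( \hat{g}\left(\boldsymbol{\alpha}_{\text{train}}(i) ; \boldsymbol{\theta}\right) - \bar{\hat{g}} \right)^{2} \sum_{i=1}^{I} \left( \boldsymbol{\beta}_{\text{train}}(i) - \bar{\boldsymbol{\beta}}_{\text{train}} \right)^{2}}} \tag{8.4}$$
where $\bar{\hat{g}} = \frac{1}{I} \sum_{i=1}^{I} \hat{g}\left(\boldsymbol{\alpha}_{\text{train}}(i) ; \boldsymbol{\theta}\right)$ and $\bar{\boldsymbol{\beta}}_{\text{train}} = \frac{1}{I} \sum_{i=1}^{I} \boldsymbol{\beta}_{\text{train}}(i)$.
In equation (8.3), the loss function MSE measures the average squared difference
between outputs and targets. This function should be minimized to obtain better
pre-processing performance (zero means no error). Corr, on the other hand, measures the
correlation between outputs and targets. Its value lies between -1 and 1, and this
function should be maximized for better pre-processing performance. A Corr value
close to 1 means a close linear relationship between outputs and targets, while a value
close to 0 means no linear relationship.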
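For scalar outputs, the two criteria in (8.3) and (8.4) reduce to the familiar mean squared error and Pearson correlation coefficient. The sketch below uses hypothetical function names and toy numbers to show how they behave.

```python
import math

def mse(outputs, targets):
    """Average squared difference between outputs and targets, as in (8.3)."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)

def corr(outputs, targets):
    """Correlation between outputs and targets, as in (8.4)."""
    n = len(targets)
    mean_o = sum(outputs) / n
    mean_t = sum(targets) / n
    num = sum((o - mean_o) * (t - mean_t) for o, t in zip(outputs, targets))
    den = math.sqrt(sum((o - mean_o) ** 2 for o in outputs)
                    * sum((t - mean_t) ** 2 for t in targets))
    return num / den

targets = [1.0, 2.0, 3.0, 4.0]
shifted = [1.1, 2.1, 3.1, 4.1]      # constant offset: small MSE, perfect correlation
reversed_out = [4.0, 3.0, 2.0, 1.0]  # opposite trend: correlation of -1
```

The shifted example shows why the two criteria are complementary: a constant bias leaves the correlation at 1 although the MSE is non-zero.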
Training and Performance Evaluation Procedure
Training a statistical learning model and evaluating its performance on the same data
is known to produce optimistically biased results due to overfitting [31]. For this
reason, several methods for training and performance evaluation have been
proposed in the literature. These methods use different data samples to train and
evaluate the performance of ML models.
For many predictive tasks, the most widely used training method is K-fold
cross-validation. The idea behind this approach is to randomly shuffle the data and
divide it into K equal-sized folds. Of the K folds, a single fold is held out as the
validation set for testing the model, and the remaining K - 1 folds are used as the
training set. This process is then repeated K times, with each of the K folds used
exactly once as the validation data. In this way, the parameters of the classifiers are
tuned for the best validation accuracy.
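The fold construction described above can be sketched as follows. The generator name `k_fold_indices` and the fixed seed are assumptions for illustration; library implementations offer the same functionality with more options.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # random shuffle of the data
    folds = [indices[i::k] for i in range(k)]      # k folds of near-equal size
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation

splits = list(k_fold_indices(10, 5))
```

Each sample index appears in exactly one validation fold, so averaging the validation scores over the K rounds uses every sample once for evaluation.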
In order to effectively evaluate the performance of the ML-based TSA models, we use
the confusion matrix [1] shown in Table 8.1.
Table 8.1 : Confusion matrix.
                      Observed
Predicted        Stability    Instability
Stability        $f_{11}$     $f_{12}$
Instability      $f_{21}$     $f_{22}$
The indices to evaluate the performance of the ML-based TSA model are defined as
follows:
$$\mathrm{ACC} = \frac{f_{11} + f_{22}}{f_{11} + f_{12} + f_{21} + f_{22}} \times 100\% \tag{8.5}$$
$$\mathrm{TUR} = \frac{f_{22}}{f_{12} + f_{22}} \times 100\% \tag{8.6}$$
$$\mathrm{TSR} = \frac{f_{11}}{f_{11} + f_{21}} \times 100\% \tag{8.7}$$
where ACC represents the overall accuracy, TUR represents the proportion of unstable
instances that are correctly predicted to be unstable, and TSR represents the
proportion of stable instances that are correctly predicted to be stable.
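The indices in (8.5) to (8.7) can be computed directly from the four cells of the confusion matrix. In the sketch below, the first index of each count refers to the predicted class and the second to the observed class (1 for stability, 2 for instability), which is the reading implied by the definitions of TUR and TSR. The function names and the toy labels are hypothetical.

```python
def confusion_counts(observed, predicted):
    """Count the four confusion-matrix cells for labels +1 (stable) and -1 (unstable)."""
    pairs = list(zip(predicted, observed))
    f11 = sum(1 for p, o in pairs if p == 1 and o == 1)
    f12 = sum(1 for p, o in pairs if p == 1 and o == -1)
    f21 = sum(1 for p, o in pairs if p == -1 and o == 1)
    f22 = sum(1 for p, o in pairs if p == -1 and o == -1)
    return f11, f12, f21, f22

def tsa_metrics(f11, f12, f21, f22):
    """Return (ACC, TUR, TSR) in percent."""
    acc = 100.0 * (f11 + f22) / (f11 + f12 + f21 + f22)
    tur = 100.0 * f22 / (f12 + f22)   # correctly predicted share of unstable cases
    tsr = 100.0 * f11 / (f11 + f21)   # correctly predicted share of stable cases
    return acc, tur, tsr

observed = [1, 1, 1, -1, -1]
predicted = [1, 1, -1, -1, 1]
acc, tur, tsr = tsa_metrics(*confusion_counts(observed, predicted))
```

Reporting TUR separately matters for imbalanced datasets, because a classifier can reach a high ACC while still missing many unstable cases.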
CASE STUDY ON TEST SYSTEM
In this section, the analysis is carried out on the Western Systems Coordinating Council
(WSCC) 127-bus system in order to test the effectiveness of the proposed methods.
The WSCC 127-bus system is derived from the WSCC 179-bus system by reducing
the number of nodes and branches [32]. It is a widely used test system for
transient stability assessment studies [33]. There are 37 generators and 211
transmission lines in the 127-bus system. For online monitoring and state estimation,
39 buses of this system are equipped with PLL-based PMUs based on [34]. The
single-line diagram of the 127-bus system and the location of the PMU-equipped buses are
shown in Figure 9.1.
Figure 9.1 : Single-line diagram of the WSCC 127-bus test system.
In this study, TSAT software [35], MATLAB/Simulink, and Python are used for
phasor-type simulation, EMT-type simulation, and implementing machine learning
algorithms, respectively.
Database Generation Using Hybrid-Type Simulation
The datasets are generated via time-domain hybrid-type simulation. In the
proposed method, power system components, including generating units, exciters,
governors, stabilizers, and loads, are simulated as the external system in the phasor
domain using TSAT software, while the Phase Locked Loop (PLL)-based PMUs
(conforming with IEEE Std C37.118.1-2011) are simulated as detailed systems in the
EMT domain using MATLAB software.
For the test system we studied, the loading level is varied from 80% to 120%. The
contingencies considered are three-phase-to-ground faults on all buses and on all
lines, where the line faults are located at 25%, 50%, and 75% of the line length. The
line faults are cleared by removing the transmission line. The start time of the fault is
uniformly set to $t_f = 0.02$ s, the operating time of the nearest protection device is set to
4-8 cycles, and the time-domain simulation duration is set to 10 s. A class label ("+1"
or "-1") is assigned to each simulated scenario based on the description given in Section
8.1. The total number of simulated scenarios is 30000, and the ratio of unstable
instances to stable instances is almost 1:2.
As the next step, the simulated voltage phasors are converted to instantaneous
waveforms using a phasor-to-waveform converter block (see Section 5.3). Then
unmodeled dynamics such as harmonics, inter-harmonics, decaying DC offsets, and
noise components are superimposed on the voltage waveform. After superimposing
these components, the voltage waveforms become more complex but also more
realistic. The final form of the generated waveform is shown in equation (9.1):
$$\begin{aligned}
X(t) ={}& x_p(t) \cos\left(120\pi t + \phi_p(t)\right) + 0.15\, x_p(t) \cos\left(360\pi t + \phi_p(t) + \tfrac{\pi}{6}\right) \\
&+ 0.1\, x_p(t) \cos\left(600\pi t + \phi_p(t) + \tfrac{\pi}{6}\right) + 0.05\, x_p(t) \cos\left(840\pi t + \phi_p(t) + \tfrac{\pi}{6}\right) \\
&+ 0.05\, x_p(t) \cos\left(420\pi t + \phi_p(t) + \tfrac{\pi}{6}\right) + 0.05\, x_p(t) \cos\left(420\pi t + \phi_p(t) + \tfrac{\pi}{6}\right) \\
&+ 0.01\, x_p(t) \cos\left(900\pi t + \phi_p(t) + \tfrac{\pi}{6}\right) + 0.3\, e^{-8t} + \mathrm{WGN}(0, 0.01)
\end{aligned} \tag{9.1}$$
In this simulation, only harmonics of order three, five, and seven and
inter-harmonics of order 3.5 and 7.5 are considered. The decay rate and the amplitude
of the decaying DC offset are chosen as 8 per second (a time constant of 0.125 s) and 0.3,
respectively, and the mean and variance of the superimposed noise are chosen as zero
and 0.01, respectively.
Equation (9.1) is the general form of the voltage waveforms that are fed into the
PMUs in the EMT domain. Figure 9.2 illustrates the voltage waveform before and after
superimposing non-fundamental frequency components.
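The superposition described above can be sketched as a small waveform generator. The function name, the sampling rate of 1920 Hz, the constant phasor magnitude and angle, and the phase offset of pi/6 on each harmonic are assumptions made for this illustration. Only the harmonic terms are listed, and inter-harmonics can be appended to the component list in the same way by using a non-integer order.

```python
import math
import random

F0 = 60.0  # fundamental frequency in Hz, so that 2*pi*F0 equals 120*pi

def synthesize(t, x_p, phi_p, components, dc_amp=0.3, dc_rate=8.0,
               noise_var=0.01, rng=random):
    """Return one sample of the distorted voltage waveform at time t.

    components: iterable of (relative_amplitude, order, phase_offset) tuples,
    where `order` multiplies the fundamental frequency.
    """
    omega0 = 2.0 * math.pi * F0
    value = x_p * math.cos(omega0 * t + phi_p)                  # fundamental
    for amp, order, phase in components:
        value += amp * x_p * math.cos(order * omega0 * t + phi_p + phase)
    value += dc_amp * math.exp(-dc_rate * t)                    # decaying DC offset
    if noise_var > 0.0:
        value += rng.gauss(0.0, math.sqrt(noise_var))           # white Gaussian noise
    return value

# Harmonics of order 3, 5, and 7; the pi/6 phase offsets are assumed values.
HARMONICS = [(0.15, 3.0, math.pi / 6), (0.10, 5.0, math.pi / 6), (0.05, 7.0, math.pi / 6)]

rng = random.Random(0)
samples = [synthesize(n / 1920.0, 1.0, 0.0, HARMONICS, rng=rng) for n in range(64)]
```

In the hybrid simulation, the magnitude and angle would be the time-varying phasor outputs of the phasor-domain simulator rather than constants.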
Figure 9.2 : Voltage waveform before and after superposition of non-fundamental frequency components.
Comparing Output of Simulators
Figure 9.3 shows the difference between the time-synchronized PMU data obtained from
the designed hybrid simulator and the phasor data obtained from the commonly used
phasor-based simulator (transient stability simulator). In this
figure, three main electrical quantities, Voltage Magnitude (VM), Voltage Angle (VA),
and Voltage Frequency (VF), are presented to illustrate the short-time effects of
dynamically fast events, distortions, and transients on the PMU measurements. These
nonlinear effects cannot be modeled by phasor-type simulation, so their influence on
PMU data is neglected there.
Figure 9.3(a) shows that after fault occurrence and fault clearance there
are some delays and ripples in the outputs of the hybrid simulator that cannot be
modeled by phasor-based simulators. These ripples persist for a few cycles and then
die out. According to Figure 9.3(b), there is a phase
difference between the outputs of the phasor-based simulator and those of the hybrid
simulator, which is due to the filtering applied during the Discrete
Fourier Transform (DFT)-based phasor estimation process [15, 36, 37]. In Figure 9.3(c),
there are significant overshoots in the outputs of the hybrid simulator that
are not seen in the outputs of the phasor-based simulator.
In general, it can be concluded that, in the presence of dynamically fast events,
transients, and distortions, the phasors of electrical quantities cannot be measured
accurately by PMUs. This fact is neglected in phasor-type simulations, which therefore
generate unrealistically accurate, error-free data. Such error-free
synchrophasor measurements do not truly represent the data delivered by real phasor
measurement units and could be inadequate for developing a reliable ML-based
TSA model.
Figure 9.3 : Synchrophasor data obtained from an arbitrary bus of the test system: (a) voltage magnitude, (b) voltage angle, (c) voltage frequency.
Impacts of Dynamic Response of PMUs on Classification Performance
In TSA studies, the predictors/classifiers generally use the post-disturbance
measurements of the first few cycles to predict the stability status of the power system.
Consequently, these measurements must faithfully represent the power system dynamics.
However, as shown in the previous section, the accuracy of measurements may
be strongly affected by transient phenomena because of the inherent
dynamics of PMUs. To illustrate the impact of the dynamic response of PMUs on
classification performance, we first train several well-known decision tree-based
classifiers with the datasets obtained from the commonly used phasor-based simulator,
and then retrain the same models with the datasets obtained from the hybrid
simulator. In this experiment, we choose the voltage magnitude as the input feature
(predictor variable). The observation time window is 0.1 s ($T_o = 0.1$ s) and the sampling
interval is 1/60 s ($\Delta t = 1/60$ s). To evaluate the models' performance effectively and to
detect overfitting, the datasets are randomly partitioned into a
testing set and a training set with a ratio of 1:3. The hyperparameters of the classifiers are
tuned using 10-fold cross-validation. Table 9.1 shows the performance of the different
classifiers on the testing datasets of the WSCC 127-bus system.
Table 9.1 : TSA results for WSCC 127-bus test system (39 PMUs).
                        Phasor-based simulator              Hybrid simulator
                        (PMU dynamics cannot be modeled)    (PMU dynamics are modeled)
Model                   ACC (%)   TUR (%)   TSR (%)         ACC (%)   TUR (%)   TSR (%)
XGBoost                 97.6      95.4      98.5            89.7      79.0      94.1
Random Forest           97.9      96.0      98.6            91.1      83.2      94.5
Decision Tree           96.8      94.4      97.9            88.4      78.7      92.6
Bagged Tree Ensemble    97.1      95.0      98.1            89.5      78.9      94.0
Comparing the results reported in Table 9.1 shows that all models perform markedly
worse when they are trained and tested with realistic synthesized data. The loss in
performance is most evident in predicting unstable cases (TUR < 87%). It must be
noted that, although the TSA results based on the phasor-based
simulator in Table 9.1 look better than those based on the hybrid simulator,
they are not credible, since they rely on a dataset of unrealistic error-free
measurements. This means that the promising results of the error-free dataset cannot
be achieved in realistic conditions. As shown in Table 9.2, the accuracy of models
trained on error-free data drops significantly, to between 65.6% and 70.7%, when they
are tested with realistic synthesized PMU data.
Table 9.2 : TSA results when the classifiers are trained with outputs of a phasor-based simulator and tested with the outputs of a hybrid simulator.
Model                   ACC (%)   TUR (%)   TSR (%)
XGBoost                 67.8      65.6      68.7
Random Forest           70.7      68.8      71.4
Decision Tree           65.6      63.2      66.7
Bagged Tree Ensemble    67.6      65.5      68.6
Data Preprocessing and Its Effects on Classification Performance
The vulnerability of statistical learning models to erroneous or
corrupted data, illustrated in the previous section, demonstrates the importance of
data preprocessing. Two preprocessing methods are proposed in this study and
implemented on data collected from the WSCC
127-bus test system.
In all experiments of this section, voltage magnitude measurements are selected as the
predictor variables to train and test the ML-based TSA models, with $T_o = 0.1$ s and
$\Delta t = 1/60$ s.
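As a quick check of the feature dimensions implied by these settings (a worked calculation, assuming one sample is taken at each sampling instant inside the observation window), the number of samples per PMU channel is

$$N = \frac{T_o}{\Delta t} = \frac{0.1\ \text{s}}{1/60\ \text{s}} = 6 .$$

This is consistent with the six post-fault measurements used as predictor variables in the final experiment of this chapter. Under the additional assumption that each of the 39 PMUs contributes one voltage magnitude sample per instant, a scenario would be described by at most $39 \times 6 = 234$ feature values before any cleansing.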
9.4.1 FC-based preprocessing
To safeguard the quality of the input feature vectors and minimize the
misleading effects of erroneous measurements, feature cleansing is applied to the
dataset. Highly corrupted feature vectors (those whose MSE is greater
than the average MSE over all feature vectors) are removed from the datasets. The ANMI
before and after FC is shown in Table 9.3.
Table 9.3 : ANMI before and after FC.
Arrangement    Dataset of VM             Dataset of VA             Dataset of VF
method         Before FC    After FC     Before FC    After FC     Before FC    After FC
Static         0.0429       0.0205       0.0358       0.0196       0.0341       0.0180
Dynamic        0.0432       0.0431       0.0357       0.0355       0.0350       0.0376
It can be seen that, after applying FC to the statically arranged dataset, the ANMI
decreases significantly, to roughly half of its original value. This is, in essence,
due to the unintentional removal of accurate measurements that are closely intermixed
with erroneous data in the feature space. The reduction of ANMI is undesirable and could have a
considerably negative impact on classification performance. In contrast, when FC is
applied to a dynamically arranged dataset, the ANMI remains essentially unchanged for
VM and VA and even increases for VF, so the informativity of the dataset is retained.
This means that, to remove erroneous data effectively and prevent them from being
intermixed with informative data, sequential PMU measurements should be arranged
dynamically based on the contingency location (see Section 6). In this way, erroneous
data are removed from the feature space while the remaining relevant information is
retained to enhance the transient stability prediction accuracy. Table 9.4 reports the
performance of several well-known classifiers before and after FC.
Table 9.4 : TSA results before and after FC.
                        Before FC                        After FC
Model                   ACC (%)   TUR (%)   TSR (%)      ACC (%)   TUR (%)   TSR (%)
XGBoost                 90.0      79.3      94.4         92.3      86.3      94.9
Random Forest           91.4      83.5      94.8         94.1      88.5      96.4
Decision Tree           88.7      79.0      92.9         91.0      84.1      94.0
Bagged Tree Ensemble    89.8      79.2      94.3         91.8      85.3      94.4
It can be seen that by efficiently removing erroneous measurements from feature space
the performance of classifiers is considerably improved. This performance
improvement is more evident in predicting unstable cases, which shows the importance
and effectiveness of the proposed FC-based preprocessing method.
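The removal rule used in this subsection (discard feature vectors whose MSE exceeds the average MSE over all feature vectors) can be sketched as follows. The sketch assumes that, in the offline phase, each measured feature vector can be compared with an error-free reference such as the corresponding phasor-domain value. The function names and the toy numbers are hypothetical.

```python
def feature_mse(measured, reference):
    """Mean squared error of one feature vector against its reference values."""
    return sum((m - r) ** 2 for m, r in zip(measured, reference)) / len(reference)

def cleanse_features(measured_features, reference_features):
    """Return (kept, removed) feature indices using the average MSE as threshold."""
    errors = [feature_mse(m, r)
              for m, r in zip(measured_features, reference_features)]
    threshold = sum(errors) / len(errors)
    kept = [r for r, e in enumerate(errors) if e <= threshold]
    removed = [r for r, e in enumerate(errors) if e > threshold]
    return kept, removed

# Toy example: three features over two scenarios; the third one is heavily distorted.
measured = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
reference = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
kept, removed = cleanse_features(measured, reference)
```

The list `removed` corresponds to the set of corrupted feature vectors in (8.2), so it can be used to compute the ANMI after cleansing.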
9.4.2 ANN-based preprocessing
As a second method of dealing with erroneous measurements, it is also possible to
correct them instead of removing them from the feature space. In
this approach, a two-layer FCFFN with 25 sigmoid hidden-layer neurons and 17 linear
output-layer neurons is used. The input of this network is a fixed-size vector of
time-series data (e.g., VM, VA, and VF). Figure 9.4 illustrates the time-series inputs
and the structure of the built neural networks.
Figure 9.4 : Graphical diagram of the built neural networks for mitigating measurement errors.
In order to achieve a good generalization performance, 585,000 training samples are
fed into the neural network. In the training phase, the weights and bias values are tuned
by three different backpropagation algorithms: BR, LM, and SCG. The training
procedure stops automatically when the maximum number of epochs (100) is reached or when
generalization stops improving. Once the network is trained, which means that all the
weights and biases are set, it can be tested with holdout (test) samples. Figure 9.5
illustrates the error mitigation results for an arbitrary sample. It can be seen that all of
the implemented models are able to partially mitigate the errors of time-series
measurements. A closer look at the plotted graphs shows that the outputs of ANN+BR
are very similar to the actual (true) values. However,
this is only the result of one test. In order to obtain a comprehensive view of the
performance of the implemented models, we test them with 409,500 holdout instances.
Three criteria are used in this evaluation: Mean Squared Error (MSE), Regression
R-Value (RRV), and Training Convergence Speed (TCS), which respectively measure the
average squared error between outputs and targets, the correlation between outputs and
targets, and the training time. The numerical results are reported in Table 9.5.
Figure 9.5 : Visualizing PMU measurements before and after mitigating measurement errors: (a) voltage magnitude, (b) voltage angle, (c) voltage frequency.
Table 9.5 : Error mitigation results (TCS in minutes).
      Before mitigating      After mitigating measurement errors
      measurement errors     ANN+SCG                  ANN+LM                   ANN+BR
      MSE       RRV          MSE      RRV     TCS     MSE      RRV     TCS     MSE      RRV     TCS
VM    260e-5    0.916        98e-5    0.970   3.6     0.5e-5   0.997   848     0.3e-5   0.999   861
VA    175.0     0.920        75.5     0.965   2.8     11.9     0.994   702     11.3     0.995   723
VF    180e-4    0.728        21e-4    0.821   2.9     1.7e-4   0.985   630     1.4e-4   0.988   660
In Table 9.5, the best MSE and RRV values are obtained with ANN+BR, and the shortest
TCS with ANN+SCG. The MSE and RRV achieved by ANN+BR and ANN+LM are much better than
those achieved by ANN+SCG. However, these two models require far longer training
times than ANN+SCG (roughly 10 to 14 hours compared with about 3 minutes).
As the final experiment, the effect of mitigating measurement errors on the TSA results
is analyzed. In this experiment, the first six post-fault measurements are chosen as
predictor variables. It can be seen in Table 9.6 that the overall accuracy can be
significantly increased by mitigating measurement errors. Among all the error
mitigation models implemented in this study, ANN+BR has the greatest impact on the
transient stability prediction accuracy. This positive impact is most evident in
predicting unstable cases, which indicates the importance and effectiveness of the
proposed data preprocessing method.
Table 9.6 : TSA results before and after mitigating measurement errors (all values in %).
                        Before mitigating     After mitigating measurement errors
                        measurement errors    ANN+SCG               ANN+LM                ANN+BR
Model                   ACC    TUR    TSR     ACC    TUR    TSR     ACC    TUR    TSR     ACC    TUR    TSR
XGBoost                 89.7   79.0   94.1    93.8   83.1   98.2    97.0   94.8   97.9    97.4   95.2   98.4
Random Forest           91.1   83.2   94.5    95.0   87.1   98.4    97.3   95.4   97.0    97.8   96.0   98.4
Decision Tree           88.4   78.7   92.6    91.5   81.8   95.7    95.2   92.8   96.3    95.3   92.9   96.4
Bagged Tree Ensemble    89.5   78.9   94.0    93.4   83.0   98.0    96.5   94.4   97.5    96.9   94.8   97.8
CONCLUSIONS
In this thesis, to develop a reliable ML-based TSA model, a new approach of
hybrid-type simulation is proposed for generating large sets of realistic synchrophasor
data in a feasible time. It is shown that the proposed simulation method provides an adequate
implementation environment for simulating the nonlinear effects of waveform
distortions and transients (e.g., electromechanical oscillations, abrupt step changes,
harmonics/inter-harmonics, DC offsets, and noise) on the dynamic response of
PMUs. In this way, more realistic PMU data are obtained. The outputs of the hybrid
simulator are compared with the outputs of the commonly used phasor-based
simulators; then the effect of PMU errors on the stability prediction performance is
illustrated. According to the performance evaluation results, erroneous measurements
are recognized as the main cause of the performance degradation of ML-based TSA models. To
prevent this degradation, two data preprocessing
methods are proposed: FC-based data preprocessing and ANN-based data
preprocessing. It is shown that, when the measurements are arranged dynamically,
FC-based preprocessing makes it possible to remove the
corrupted feature vectors without reducing the informativity of the feature space or
unintentionally removing accurate measurements. The promising results of this
approach motivated a further investigation of data preprocessing methods. As a second
method of data preprocessing, it is proposed to use an artificial neural network as an
error mitigation function. In this approach, the errors of measurements are mitigated
via a two-layer fully-connected feed-forward artificial neural network. In order to train
the network, three different backpropagation algorithms (SCG, LM, and BR) are used.
The performance of these network training functions is evaluated in terms of MSE,
RRV, and TCS. It is shown that both LM backpropagation and BR backpropagation
achieve very low MSE, while the MSE is relatively high when SCG
backpropagation is used. However, their very long training times may restrict the
application of the LM and BR algorithms. As the final experiment, the effect of ANN-based data
preprocessing on the TSA results is analyzed. It is shown that the overall accuracy of
stability prediction improves by about 7 percentage points (6.7 to 7.7 across the tested
classifiers) when the errors of PMU measurements are
mitigated through ANN-based data preprocessing.
REFERENCES
[1] Wang, B., Fang, B., Wang, Y., Liu, H., & Liu, Y. (2016). Power system transient
stability assessment based on big data and the core vector machine.
IEEE Transactions on Smart Grid, 7(5), 2561-2570. doi:
10.1109/TSG.2016.2549063.
[2] Yu, J. J. Q., Hill, D. J., Lam, A. Y. S., Gu, J., & Li, V. O. K. (2017). Intelligent
time-adaptive transient stability assessment system. IEEE Transactions on
Power Systems, 33(1), 1049-1058. doi:
10.1109/TPWRS.2017.2707501.
[3] Zhu, Q., Chen, J., Zhu, L., Shi, D., Bai, X., Duan, X., & Liu, Y. (2018). A deep
end-to-end model for transient stability assessment with PMU data.
IEEE Access, 6, 65474-65487. doi: 10.1109/ACCESS.2018.2872796.
[4] Yan, R., Geng, G., Jiang, Q., & Li, Y. (2019). Fast transient stability batch
assessment using cascaded convolutional neural networks. IEEE
Transactions on Power Systems, 34(4), 2802-2813. doi:
10.1109/TPWRS.2019.2895592.
[5] Yu, J. J. Q., Lam, A. Y. S., Hill, D. J., & Li, V. O. K. (2017). Delay aware intelligent
transient stability assessment system. IEEE Access, 5, 17230-17239.
doi:10.1109/ACCESS.2017.2746093.
[6] Tan, B., Yang, J., Pan, X., Li, J., Xie, P., & Zeng, C. (2017). Representational
learning approach for power system transient stability assessment based
on convolutional neural network. The Journal of Engineering,
2017(13), 1847-1850. doi:10.1049/joe.2017.0651.
[7] Zheng, L., Hu, W., Zhou, Y., Min, Y., Xu, X., Wang, C., & Yu, R. (2017, July).
Deep belief network based nonlinear representation learning for
transient stability assessment. In 2017 IEEE Power & Energy Society
General Meeting (pp. 1-5). IEEE. doi: 10.1109/PESGM.2017.8274126.
[8] Zhou, Y., Wu, J., Yu, Z., Ji, L., & Hao, L. (2016). A hierarchical method for
transient stability prediction of power systems using the confidence of
a SVM-based ensemble classifier. Energies, 9(10), 778.
doi:10.3390/en9100778.
[9] Gupta, A., Gurrala, G., & Sastry, P. S. (2018). An online power system stability
monitoring system using convolutional neural networks. IEEE
Transactions on Power Systems, 34(2), 864-872. doi:
10.1109/TPWRS.2018.2872505.
[10] Rashidi, M., & Farjah, E. (2016). LEs based framework for transient instability
prediction and mitigation using PMU data. IET Generation,
Transmission & Distribution, 10(14), 3431-3440. doi: 10.1049/iet-
gtd.2015.1482.
[11] Ren, C., Xu, Y., Zhang, Y., & Hu, C. (2018, August). A multiple randomized
learning based ensemble model for power system dynamic security
assessment. In 2018 IEEE Power & Energy Society General Meeting
(PESGM) (pp. 1-5). IEEE. doi: 10.1109/PESGM.2018.8585991.
[12] Ren, C., & Xu, Y. (2019). A fully data-driven method based on generative
adversarial networks for power system dynamic security assessment
with missing data. IEEE Transactions on Power Systems, 34(6), 5044-
5052. doi: 10.1109/TPWRS.2019.2922671.
[13] Zhang, Y., Xu, Y., & Dong, Z. Y. (2017). Robust ensemble data analytics for
incomplete PMU measurements-based power system stability
assessment. IEEE Transactions on Power Systems, 33(1), 1124-1126.
doi: 10.1109/TPWRS.2017.2698239.
[14] IEEE Standards Association. (2011). IEEE Standard for Synchrophasor
Measurements for Power Systems. IEEE Std C37.118.1-2011, 1-61.
doi:10.1109/IEEESTD.2011.6111219.
[15] Ren, J., & Kezunovic, M. (2012). An adaptive phasor estimator for power system
waveforms containing transients. IEEE Transactions on Power Delivery,
27(2), 735-745. doi: 10.1109/TPWRD.2012.2183896.
[16] Ashrafian, A., Mirsalim, M., & Masoum, M. A. (2017). An adaptive recursive
wavelet based algorithm for real-time measurement of power system
variables during off-nominal frequency conditions. IEEE Transactions
on Industrial Informatics, 14(3), 818-828. doi:
10.1109/TII.2017.2727222.
[17] Lotric, U., & Dobnikar, A. (2005). Predicting time series using neural networks
with wavelet-based denoising layers. Neural Computing &
Applications, 14(1), 11-17. doi: 10.1007/s00521-004-0434-z.
[18] Dotta, D., Chow, J. H., & Wilches-Bernal, F. (2017, July). Application of
modulation techniques for power system transient response studies. In
2017 IEEE Power & Energy Society General Meeting (pp. 1-5). IEEE.
doi: 10.1109/PESGM.2017.8274328.
[19] Langella, R., Testa, A., et al. (2014). IEEE recommended practice
and requirements for harmonic control in electric power systems. IEEE.
doi: 10.1109/IEEESTD.2014.6826459.
[20] Kamwa, I., Samantaray, S. R., & Joos, G. (2013). Wide frequency range
adaptive phasor and frequency PMU algorithms. IEEE Transactions on
Smart Grid, 5(2), 569-579. doi: 10.1109/TSG.2013.2264536.
[21] Phadke, A. G., & Thorp, J. S. (2008). Phasor Estimation of Nominal Frequency
Inputs. In Synchronized Phasor Measurements and Their Applications
(pp. 29-48). Springer, Boston, MA.
[22] Phadke, A. G., & Thorp, J. S. (2008). Transient Response of Phasor
Measurement Units. In Synchronized Phasor Measurements and Their
Applications (pp. 107-131). Springer, Boston, MA.
[23] Xie, R., & Trudnowski, D. (2015, July). Distinguishing features of natural and
forced oscillations. In 2015 IEEE Power & Energy Society General
Meeting (pp. 1-5). IEEE. doi: 10.1109/PESGM.2015.7285781.
[24] Martinez-Velasco, J. A. (2015). Introduction to Electromagnetic Transient
Analysis of Power Systems. Transient Analysis of Power Systems:
Solution Techniques, Tools and Applications, 1-8. doi:
10.1002/9781118694190.ch1.
[25] Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function.
Mathematics of Control, Signals and Systems, 2(4), 303-314. doi:
10.1007/BF02551274.
[26] Møller, M. F. (1993). A scaled conjugate gradient algorithm for fast supervised
learning. Neural Networks, 6(4), 525-533. doi: 10.1016/S0893-
6080(05)80056-5.
[27] Yu, H., & Wilamowski, B. M. (2011). Levenberg-Marquardt training. Industrial
Electronics Handbook, 5(12), 1.
[28] Burden, F., & Winkler, D. (2008). Bayesian regularization of neural networks.
In Artificial neural networks (pp. 23-42). Humana Press. doi:
10.1007/978-1-60327-101-1_3.
[29] Dhar, R. N. (1982). Computer aided power system operation and analysis.
McGraw-Hill Companies.
[30] DSATools. TSAT User Manual. Powertech Labs Inc.
[31] Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for
model selection. Statistics Surveys, 4, 40-79.
[32] Fan, D. (2008). Synchronized measurements and applications during power
system dynamics (Doctoral dissertation, Virginia Tech).
[33] Mahdi, M., & Genc, V. I. (2017, April). Artificial neural network based
algorithm for early prediction of transient stability using wide area
measurements. In 2017 5th International Istanbul Smart Grid and
Cities Congress and Fair (ICSG) (pp. 17-21). IEEE. doi:
10.1109/SGCF.2017.7947611.
[34] Gou, B. (2008). Generalized integer linear programming formulation for optimal
PMU placement. IEEE Transactions on Power Systems, 23(3), 1099-
1104. doi: 10.1109/TPWRS.2008.926475.
[35] Powertech Labs Inc. DSATools, Dynamic Security Assessment Software.
[36] Wang, Y., Lu, C., Kamwa, I., Fang, C., & Ling, P. (2020). An adaptive filters
based PMU algorithm for both steady-state and dynamic conditions in
distribution networks. International Journal of Electrical Power &
Energy Systems, 117, 105714.
[37] Ren, J., Kezunovic, M., & Stenbakken, G. (2011). Characterizing dynamic
behavior of PMUs using step signals. European Transactions on
Electrical Power, 21(4), 1496-1508.
CURRICULUM VITAE
Name Surname : Tohid BEHDADNIA
Place and Date of Birth : Urmia, 01.09.1995
E-Mail : behdadnia18@itu.edu.tr
EDUCATION :
• B.Sc. : 2017, Urmia University, Electrical Engineering
PROFESSIONAL EXPERIENCE AND REWARDS:
• 2018-2020 Project Scholarship, 118E184 TÜBİTAK Project, Istanbul Technical
University, Electrical Engineering Department.
PUBLICATIONS, PRESENTATIONS AND PATENTS ON THE THESIS:
• Behdadnia, T., Yaslan, Y., Genc, V. M. I. (2020). A New Method of Decision
Tree Based Transient Stability Assessment Using Hybrid Simulation for Real-time
PMU Measurements, IET Generation, Transmission & Distribution, 1-16. doi:
10.1049/gtd2.12051.




