
144 IEEE TRANSACTIONS ON SMART GRID, VOL. 1, NO. 2, SEPTEMBER 2010

Catastrophe Predictors From Ensemble Decision-Tree Learning of Wide-Area Severity Indices
Innocent Kamwa, Fellow, IEEE, S. R. Samantaray, Member, IEEE, and Geza Joos, Fellow, IEEE

Abstract: Catastrophe precursors are essential prerequisites for response-based remedial action schemes, at both the protective and the operator levels. In this paper, wide-area-severity indices (WASI) derived from PMU measurements serve as the basis for building fast catastrophe predictors using random-forest (RF) learning. Given the randomness in the ensemble of decision trees (DTs) stacked in the RF model, it can provide at the recall stage not only an early assessment of the stable/unstable status of an ongoing contingency but also a probability outcome which quantifies the confidence level of the decision. This methodology, which to the best of our knowledge is new to the dynamic security assessment (DSA) of power systems, is also very effective in evaluating the importance of and interaction among the various WASI input features. Our research unexpectedly showed that the ensemble of trees in the RF is very robust in the presence of small changes in the training data and generalizes across widely different network dynamics. Thus, the same RF performed very well on a large database with more than 60 000 instances from a test system (10%) and an actual system (90%) combined. One such general RF (with 210 trees) boosted the reliability of a 9-cycle catastrophe predictor to 99.9%, compared to only 70% when a single conventionally trained DT is used.
Index Terms: Dynamic security assessment (DSA), decision tree (DT), early termination, phasor measurement unit (PMU), random forests (RF), remedial action schemes (RAS), stability assessment, wide-area measurement systems (WAMS), wide-area severity indices (WASI).

Fig. 1. Database generation process applied to a test [32] and actual [22] systems. The m insecure cases are replicated three times in the extended training file.
Manuscript received December 11, 2009; revised May 05, 2010; accepted June 07, 2010. Date of publication August 09, 2010; date of current version August 20, 2010. Paper no. TSG-00025-2009.
I. Kamwa is with the Hydro-Quebec/IREQ, Power Systems and Mathematics, Varennes, QC J3X 1S1, Canada (e-mail: kamwa.innocent@ireq.ca).
S. R. Samantaray and Geza Joos are with the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 2A7, Canada (e-mail: sbh_samant@yahoo.co.in, geza.joos@mcgill.ca).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2010.2052935
1949-3053/$26.00 © 2010 IEEE

I. INTRODUCTION

Pattern recognition was identified as a key method for power system security analysis as early as 1968 [1], but investigation of decision trees (DT) for dynamic security assessment [2] and response-based discrete event control [3] sparked interest only in the late eighties. Since then, many data-mining-based predictive tools such as neural networks [4] and support vector machines (SVMs) [5], [6] have also been tried on these problems, but DTs remain the prominent machine learning tool in power system dynamic performance assessment and control. Researchers have applied them to the Entergy (USA) [7], EDF (France) [8], and the Greek [9], Spanish [10], Québec (Canada) [11] and Chinese [12] power systems.

Nowadays, power system defense plans against rare contingencies are based on event detection [15], [16] using breaker status and fault signals from relays in combination, basically because the more appealing response-based approach [3], [20]–[22] is not yet fast enough to allow for effective remedial actions. However, given the data- and processor-rich environment of modern power grids, many researchers are looking at decision tree-based data mining to advance this vision, which seems more likely now than in the eighties, especially in view of the smart-grid revolution.

Why is a DT so attractive for power system applications? It not only clusters observations into groups with similar values for the response variable but it also shows exactly how these clusters were constructed using a tree whose branches are split according to the values of the explanatory variables. If the tree has only a few leaves, it is easy for the operator or the protection engineer to interpret and visualize. Categorical and continuous type variables are handled seamlessly and allowed to relate in highly nonlinear ways with the response. Most importantly for large-scale problems, training times scale efficiently with dimensionality, i.e., O(N log N) operations are needed to sort the training points in each dimension.

Unfortunately, a single decision tree typically produces less predictive power than an NNET or SVM [4], [5]. Our analysis of recent DT applications to DSA suggests a 90%–96% reliability ceiling, even on small or moderate-size problems, which is definitely unsatisfactory, considering that modern electric power systems are normally designed and operated to meet a "3 nines" reliability standard. This means that electric grid power is 99.97% reliable ([23], p. 7). Even if a DT predictor is easily interpretable and results in transparent rules for the user, it can hardly find widespread acceptance as long as the reliability and security of the underlying decision show no significant improvement over the current situation.

Another disturbing pitfall of the DT model currently used in our field is that it is not stable to small changes in the learning data: the exact position of the cut point in the tree, as well as the selection of the splitting variable, strongly depend on the particular distribution of observations in the learning sample. Thus, as an undesired side effect of the recursive partitioning approach [24], the entire tree structure could be altered if the first splitting variable, or only the first cut point, is chosen differently due to a small change in the learning data. This instability results in a high variability of DT predictions.

One approach to mitigating most of the above pitfalls of the single DT is to use the random forest learning model of Breiman [25], which is basically a large ensemble of unpruned decision trees created by randomizing the split at each node of the tree. To decide the class of a new case, each tree in the committee casts a vote for the predicted class. Random forests are not the only way to ensemble DT learning [29], but they have the great advantage of fast tuning with almost no user input, except for the number of trees in the ensemble.

One last attribute we are seeking from data mining for preemptive analysis of catastrophic events is the capability not only to classify the unlabeled samples into a fixed number of categories but also to rank them on the basis of class membership probability. The randomness built into random forests by randomization of splitting variables and cutting values in unpruned trees makes them attractive for probability estimation by averaging the individual tree probability. Stated otherwise, the RF approach allows us not only to classify the instance as secure or not, but also to rank its security level based on a more reliable sample probability estimation.

The main contribution of this paper is to demonstrate the effectiveness of the random forest learning approach in PMU-based predictive assessment of catastrophic power system events. Such a "committee of experts" paradigm is not entirely new to our field. Classification reliability was increased in [13] by combining a DT and an NNET trained on the same data, while a committee of NNETs was explored in [6]. However, we will show that, by structuring the training set properly, RF-based learning of snapshots of wide-area-severity indices (WASI) features captured at 150 ms and 300 ms following the fault-clearing time results in highly accurate predictors of the postfault security status, even in the event of a loss of security or stability 10 s or more later.

To demonstrate the great generalization capability of the proposed methodology, a single RF is shown to have a 99.9% reliability on a large data set containing a mix of 90% instances from the Hydro-Québec grid [15], [22] and 10% instances from a nine-area test system [32] (60 836 total number of cases, including 22.5% unstable cases). Other interesting properties of the RF-based predictors in the context of real-time DSA, such as probability-based ranking, two-variable nomograms, and the importance and significance of the WASI variables involved in the prediction, are illustrated and discussed.

Fig. 2. Summary of WASI feature computation and incorporation in the ensemble learning-based catastrophe prediction.

II. BUILDING THE DATA SET

A. Systems Studied

In this paper two quite different power system models will be analyzed. The first is a 783-bus representation of the Hydro-Québec system used for operations planning. The same configurations and contingencies described in [22] are used. They represent winter and summer operations planning models with about 1000 load flow patterns generated by the transfer limit search and critical clearing time search processes based on 32 carefully chosen 735-kV contingencies. The second grid model is a 67-bus, nine-area test system used in [32] to demonstrate a PMU placement method. From five base load-flow configurations, many others are generated by stressing the system by means of a power transfer limit search under 32 contingencies.

B. Training Data Organization and Motivation

Instead of applying the new method to the two systems separately, we will combine the two sets of data in the hope of finding a single, more general, predictor. Our idea, summarized in Fig. 1, can be explained as follows.
Let us suppose that we have a set of examples $(A, C)$ describing the general concept of system security, $C \in \{S, I\}$, where S and I stand for secure and insecure states respectively. For each case in Fig. 1, a set of attributes (A) is stored along with the status (C) of the case derived by analyzing the simulation. Learning the concept of "security" consists in inferring its general definition from the set of given examples. However, it is well known that the insecurity threat is very system-dependent. Years ago, the Hydro-Québec system was transient stability-limited while today it is voltage stability-limited in its most likely configurations. Similarly, the analysis in [32] showed that the nine-area system in Fig. 1 is very oscillatory but not overly sensitive to voltage instability. Therefore, learning the "security concept" on networks with widely different security attributes will result in a more general predictor encapsulating a broader security definition.

Fig. 3. Example of two key features able to split the databases into two subspaces of stable (OK = 0) and unstable (OK = 1) cases. Sample count: 60 826 with 77% secure cases.

TABLE I. SUMMARY COUNT OF TRAINING DATA SET IN CONFIGURATIONS STUDIED

The generalization error of the corresponding decision tree DT(S) trained on data set S can be defined as the misclassification rate over the performance data set F:

$$\mathrm{Err}\big(DT(S)\big) = \frac{1}{|F|} \sum_{(A, C) \in F} L\big(C, DT(S)(A)\big) \qquad (1)$$

where $L$ is the zero-one loss function defined as

$$L(C, \hat{C}) = \begin{cases} 0, & \hat{C} = C \\ 1, & \hat{C} \neq C \end{cases} \qquad (2)$$

Given the high reliability of power systems, we usually have $m \ll M$ in Fig. 1. Including more stressed configurations in the database increases the number of insecure cases to some extent, but they will still represent a small share of the total number of cases. Hence, for the Hydro-Québec system they represent 23% of cases out of a total of 55 196. For the test system, we have 940 (17%) insecure cases out of 5632.

Fig. 4. Convergence characteristics of random forest learning: training of the 300-ms response time catastrophe predictor. Scenarios S1 (left) and S2 (right) for combined HQ + test systems.

If the share of insecure cases is $\alpha$ and the sample misclassification risk of class $c$ is $R_c$, (1) can be rewritten as

$$\mathrm{Err} = (1 - \alpha)\, R_S + \alpha\, R_I \qquad (3)$$

It is obvious that, since $\alpha \ll 1 - \alpha$, the total error will be determined essentially by the incorrect classification rate of secure cases. Any DT inducer, in an attempt to minimize the overall misclassification rate, will tend to classify secure cases accurately at the expense of insecure cases, which are actually the main focus of catastrophic event analysis. To mitigate this fundamental issue [37], which is often overlooked in data mining of power system dynamic responses, we can simply incorporate as many replicas of insecure cases as needed to balance the ratio of secure/insecure cases. For instance, if the class I is replicated $k$ times ($k = 3$ in Fig. 1), (3) becomes

$$\mathrm{Err}_k = \frac{(1 - \alpha)\, R_S + k\,\alpha\, R_I}{(1 - \alpha) + k\,\alpha} \qquad (4)$$

which means that the share of the insecure set in the total misclassification error is multiplied by a factor $k$. One advantage of the DT predictor is its total insensitivity to duplication in the data: no singularity will occur during tree growing and only the allocation of errors between secure/insecure classes will be reequilibrated. As a result, insecure cases will be more accurately classified, which will greatly improve the suitability of the predictor in the power system context.

C. WASI Features

In the past, steady-state precontingency (SCADA-based) features such as line flows and voltage magnitude were used in data mining-based online DSA [5]. However, with the availability of WAMS, wide-area response-based features make more sense because they make full use of the dynamic information conveyed by the postfault phasor. In particular, COI-based quantities are also well known as key descriptors of the power grid stability state [34]. In this work, we will use a particular flavor of PMU-based features, so-called WASI, which we defined in [13].

The power grid is assumed to be a collection of electrical areas interconnected by weak tie-lines. The generation inertias of each area are supposed to be known to the catastrophe predictor device from a central authority (i.e., the EMS). This is the only external parameter required. In each area, one or more PMUs are installed to measure the cycle-by-cycle bus phasor voltages, which are then sent to the predictor device (Fig. 2). Following a fault (which is automatically detected by the PMUs), the following serial computation sequence is started:
1. compute the pilot phasor and frequency of each area by averaging the within-area measurements;
2. compute the system COI of the angle and frequency variables using the available area inertias;
3. project the pilot angle and frequency of step 1) into the COI reference of step 2);
4. compute the shift from the prefault to postfault COI-angle for each area: PostFltAngle;
5. compute the power spectrum density (PSD) of the dot-product of frequency and angle to obtain a tracking severity index by area.
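Steps 1) to 4) of the sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, angles are taken as already unwrapped scalar values in degrees (the paper averages phasors, which is not reproduced here), the COI is assumed to be the inertia-weighted mean of the area pilot values, and the sample numbers are invented. Step 5) (the PSD-based index) is not covered.

```python
from statistics import fmean


def pilot_values(area_measurements):
    """Step 1 (assumed form): average the PMU readings inside each area."""
    return [fmean(readings) for readings in area_measurements]


def center_of_inertia(values, inertias):
    """Step 2 (assumed form): inertia-weighted mean of the area pilot values."""
    total_inertia = sum(inertias)
    return sum(m * v for m, v in zip(inertias, values)) / total_inertia


def to_coi_frame(values, inertias):
    """Step 3: express each area pilot value relative to the system COI."""
    reference = center_of_inertia(values, inertias)
    return [v - reference for v in values]


def post_fault_angle_shift(pre_angles, post_angles, inertias):
    """Step 4: per-area shift of the COI-referenced angle, prefault to postfault."""
    pre = to_coi_frame(pre_angles, inertias)
    post = to_coi_frame(post_angles, inertias)
    return [after - before for before, after in zip(pre, post)]


# Hypothetical two-area example: two PMUs per area, angles in degrees.
inertias = [3.0, 1.0]
pre = pilot_values([[10.0, 14.0], [-4.0, -2.0]])
post = pilot_values([[20.0, 24.0], [-10.0, -6.0]])
shifts = post_fault_angle_shift(pre, post, inertias)
largest_shift = max(abs(s) for s in shifts)
```

Taking the largest absolute per-area shift is one plausible way to obtain a system-wide angle deviation; the exact aggregation used by the authors is not stated in this section.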
From the above algorithm, wide-area severity indices can be defined in the time and frequency domains as follows [22]:

System-wide minimum voltage over the time span of Ts = 150 ms or Ts = 300 ms after fault clearing.
Area-wide minimum voltage over the time span of Ts = 150 ms or Ts = 300 ms after fault clearing, considering only the busses in the faulted area.
FastWASITs: System-wide frequency-domain severity index defined over the time span of Ts = 150 ms or Ts = 300 ms after fault clearing.
VLowPassTs: Filtered system-wide minimum voltage for Ts = 150 or 300 ms.
Filtered system-wide severity index defined for Ts = 150 or 300 ms.
Fault duration.
PostFltAngle: System-wide maximum COI angle deviation from steady-state to fault-clearing time.

It is observed that area-wide features are not used in this work, in contrast to [22]. The reason is that, when two different networks are combined, the meaning of an area-wide feature is lost. However, we will see that the above subset of features, which considers system-wide variables only, is enough for the purpose of predictive catastrophic event analysis.

Fig. 5. Visual summary of correlations between the 10 candidate attributes and the decision variable OK: Scenario S1 with combined HQ + test systems.

D. Preliminary Screening

Fig. 3 presents the box-plots of two typical indices, which will be shown later to have the highest importance levels in the random forest model: the energy-based FastWASI 300 ms, and the voltage-based ms. These box-plots are categorized according to the security condition of the case, with OK = 0 (secure) and OK = 1 (insecure). The boxes on the plots define 50% of the sample. The lower limit of the box defines the first quartile of the data, while the upper limit corresponds to the third quartile. It can therefore be concluded that 75% of stable cases verify the relationship , while for 75% unstable cases, . Clearly, this attribute is able to split the stability domain into two largely disjoint sets. A similar reasoning shows that ms can split the database into two stable/unstable sets delimited as follows: contains about 75% of stable cases while includes 75% of unstable cases. Interestingly, the angle feature (not shown) does not display any such clear-cut values to separate the data set into the same two secure/insecure classes.

III. RANDOM FORESTS

A. Background

Random forests are large combinations of decorrelated tree predictors such that each tree depends on the values of a random vector sampled independently. Individual trees are noisy and unstable but, when grown sufficiently deep, they have relatively low bias [27]. They are therefore ideal candidates for ensemble growing as they can capture complex interactions, while fully benefiting from aggregation-based variance reduction. Using a random selection of features to split each node and resampling (with replacement) the training set to grow each tree yields error rates that are decorrelated and more robust with respect to noise. The generalization error of random forests converges to a limit as the number of trees in the forest increases.

The basic idea of most procedures for ensemble tree growing is that for the $t$th tree ($t = 1, \ldots, T$, with $T$ the number of trees in the ensemble) a random vector $\Theta_t$ is generated, independent of the past random vectors $\Theta_1, \ldots, \Theta_{t-1}$ but with the same distribution, and a single tree is grown using the training set S and the set of attributes in $\Theta_t$, resulting in a classifier $h(x, \Theta_t)$ where $x$ is an input vector. In random split selection, $\Theta$ consists of a number of independent random integers between 1 and $K$, the number of attributes in $A$.

A random forest consists of a collection of tree-structured classifiers $\{h(x, \Theta_t),\ t = 1, \ldots, T\}$, where the $\Theta_t$ are independent, identically distributed random vectors and each tree casts a unit vote for the most popular class given an input $x$. An algorithmic view of the RF growing process is summarized below [25]:

1. For $t = 1$ to $T$:
   a. Draw a bootstrap sample $S_t$ of size N from the training data S (which contains M samples).
   b. Grow a random forest tree $h_t$ to the bootstrapped data by recursively repeating the steps below for each terminal node of the tree until no other split is possible (i.e., it is an unpruned tree, of a maximal depth):
      i. Select $m_{\mathrm{try}}$ variables at random from the $K$ WASI features.
      ii. Pick the best variable/split-point among the $m_{\mathrm{try}}$.
      iii. Split the node into two daughter nodes.
2. Output the ensemble of trees $\{h_t\}_{t=1}^{T}$.

Fig. 6. Top-down importance of the variables according to the accuracy loss or misclassification rate reduction (gini) when they are respectively removed from or included in the attributes set. Random forest learning of a 300-ms response time catastrophe predictor: Scenario S1 for combined HQ + test systems.

Although the random forest is a relatively young data mining tool, scholars [24], [27], [28] have started to recognize its strengths: i) simple and easy to use; ii) very high accuracy; iii) relatively robust response to outliers and noise; iv) gives useful internal estimates of generalization error and feature importance; and v) not overfitting if the number of trees is large.

B. Prediction From Ensemble Trees

A random-forest prediction is the combination of all individual-tree predictions. For classification, the class that most trees vote for is returned as the prediction of the ensemble

$$\hat{C}_{RF}(x) = \operatorname{majority\ vote}\, \big\{ \hat{C}_t(x) \big\}_{t=1}^{T} \qquad (5)$$

where $\hat{C}_t(x)$ is the class prediction of the $t$th tree in the random forest. For predicting probabilities, i.e., relative class frequencies, the results of single trees are averaged

$$\hat{p}_{RF}(c \mid x) = \frac{1}{T} \sum_{t=1}^{T} \hat{p}_t(c \mid x) \qquad (6)$$

where $\hat{p}_t(c \mid x)$ denotes the probability of mapping the observation $x$ to a class $c$, S or I, by the random forest tree $t$.
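The two aggregation rules just described (majority vote for the class, averaging for the probability) can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical, and the per-tree outputs are supplied directly instead of being produced by trained trees.

```python
from collections import Counter


def forest_vote(tree_classes):
    """Majority vote over the class labels returned by the individual trees."""
    return Counter(tree_classes).most_common(1)[0][0]


def forest_probability(tree_probabilities):
    """Average of the per-tree probabilities for one class."""
    return sum(tree_probabilities) / len(tree_probabilities)


def rank_by_insecurity(cases):
    """Order case identifiers from most to least likely insecure.

    `cases` maps a case identifier to the list of per-tree probabilities
    of the insecure class I for that case.
    """
    return sorted(cases, key=lambda cid: forest_probability(cases[cid]), reverse=True)


# Hypothetical outputs of a five-tree forest for one contingency.
votes = ["S", "I", "I", "S", "I"]
probabilities_of_I = [0.0, 1.0, 1.0, 0.5, 0.5]
predicted_class = forest_vote(votes)
confidence = forest_probability(probabilities_of_I)

# Hypothetical ranking of three contingencies by averaged probability.
ranking = rank_by_insecurity({
    "c1": [0.1, 0.2],
    "c2": [0.9, 1.0],
    "c3": [0.5, 0.5],
})
```

Keeping the averaged probability alongside the voted class is what allows borderline contingencies to be ranked instead of only labeled.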
Fig. 7. Comparison of results between single flat, single extended and RF extended with respect to reliability, security and accuracy for mixed Hydro-Québec and test systems. (a) 150 ms. (b) 300 ms.

A traditional DT essentially represents an explicit decision boundary, and an instance E is classified into class c if E falls into the decision area (a leaf in the decision tree) corresponding to c [30]. The class probability $p(c \mid E)$ is typically estimated by the fraction of instances of class c in the leaf into which E falls. This estimate is very crude when the tree is pruned because all the instances falling into the same leaf have the same class probability. More accurate probability estimates require unpruned trees [31], which are the backbone of the random forests. Stated otherwise, RF predictors have the advantage of providing, for each noncrisp decision, a probabilistic confidence level, which is actually a proxy to security ranking. Assuming that the probability estimates from individual trees are random variables, each with a variance $\sigma^2$, the variance of the average in (6) is $\sigma^2 / T$ for independent estimates [27], which confirms that the random forest leads seamlessly to improved probability estimates.

In addition to the ordinary prediction described above, random forests have a so-called out-of-bag (oob) prediction. Remember that each tree is built on a bootstrap sample $S_t$, which serves as a learning set for this particular tree. $S_t$ contains only two-thirds of the observations [25], i.e., those M-N samples not participating in the training of a given tree can serve as a "built-in" test sample for computing the prediction accuracy of that tree. The advantage of an out-of-bag error is that a more realistic estimate of the error rate can be obtained. If we feed the random forest inducer with S containing only 70% of the original data and keep the rest for testing, given that each tree is trained on two-thirds of the data only, it turns out that, at the learning stage, only 50% of the data are actually seen by a given tree in the random forest. If the resulting predictor still works well on the external test set, we have to admit that it is a robust and general model.

C. Relative Importance of Variables

Single classification trees are easily interpretable, both intuitively at first glance and descriptively when looking in detail at the tree structure. In particular, variables that are not included in the tree do not contribute to the model. An ensemble of trees has the advantage that it gives each variable the chance to appear in different contexts with different covariates and can thus better reflect its potentially complex effect on the response. Moreover, order effects induced by the recursive variable selection scheme employed in constructing the single trees are eliminated by averaging over the entire ensemble. Therefore, in RF models variable importance measures are computed to assess the relevance of each variable over all trees of the ensemble. It is calculated as the mean decrease in accuracy using the oob observations.

Fig. 8. Comparison results between single flat, single extended, and RF extended with respect to reliability, security, and accuracy for test system: 150-ms response-time catastrophe predictor. (a) Test. (b) HQ.

Since each tree is grown from a bootstrapped sample $S_t$, the out-of-bag observations $\bar{B}^{(t)} = S \setminus S_t$ can be used to calculate the importance of variable $x_j$ in the $t$th tree as follows:

$$VI^{(t)}(x_j) = \frac{\sum_{i \in \bar{B}^{(t)}} G\big(C_i, \hat{C}_i^{(t)}\big)}{\big|\bar{B}^{(t)}\big|} - \frac{\sum_{i \in \bar{B}^{(t)}} G\big(C_i, \hat{C}_{i,\pi_j}^{(t)}\big)}{\big|\bar{B}^{(t)}\big|} \qquad (7)$$

with $\hat{C}_i^{(t)} = h_t(x_i)$ and $\hat{C}_{i,\pi_j}^{(t)} = h_t(x_{i,\pi_j})$, where $h_t$ is the $t$th tree. $G$ is a one-zero gain function defined in the opposite way from (2), i.e., the result is 1 when $C_i$ is correctly predicted by the tree and 0 otherwise. In addition, $x_i$ is the ith oob data instance with the original order of $x_j$ values, while in $x_{i,\pi_j}$, the values of $x_j$ are permuted randomly according to the ordering $\pi_j$, i.e.,

$$x_{i,\pi_j} = \big(x_{i,1}, \ldots, x_{i,j-1}, x_{\pi_j(i),j}, x_{i,j+1}, \ldots, x_{i,K}\big)$$

It can be noted that $VI^{(t)}(x_j) = 0$ by definition, if $x_j$ is not in the tree. The raw importance score for each variable is then computed as the average importance over all trees

$$VI(x_j) = \frac{1}{T} \sum_{t=1}^{T} VI^{(t)}(x_j) \qquad (8)$$

For a fixed number of trees, variables with the largest importance scores are more important for an accurate classification.
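The permutation-based importance described above can be sketched as follows. This is a simplified illustration, not the R implementation used in the paper: a "tree" is any Python callable mapping a feature tuple to a class label, the out-of-bag rows are passed in explicitly, and all names and data are hypothetical.

```python
import random


def oob_accuracy(tree, rows, labels):
    """Fraction of out-of-bag rows that the tree classifies correctly."""
    hits = sum(1 for x, c in zip(rows, labels) if tree(x) == c)
    return hits / len(rows)


def permutation_importance(tree, rows, labels, j, rng):
    """Accuracy drop of one tree when the values of feature j are shuffled."""
    column = [x[j] for x in rows]
    rng.shuffle(column)
    # Rows are tuples; rebuild each one with the shuffled value at position j.
    permuted = [x[:j] + (v,) + x[j + 1:] for x, v in zip(rows, column)]
    return oob_accuracy(tree, rows, labels) - oob_accuracy(tree, permuted, labels)


def forest_importance(trees_with_oob, j, seed=0):
    """Average the per-tree importance of feature j over the ensemble."""
    rng = random.Random(seed)
    scores = [
        permutation_importance(tree, rows, labels, j, rng)
        for tree, rows, labels in trees_with_oob
    ]
    return sum(scores) / len(scores)


# Hypothetical stump that only looks at feature 0 (say, an energy index).
def stump(x):
    return "I" if x[0] > 0.5 else "S"


oob_rows = [(0.9, 0.2), (0.1, 0.8), (0.7, 0.5), (0.2, 0.1)]
oob_labels = ["I", "S", "I", "S"]
ensemble = [(stump, oob_rows, oob_labels)]

importance_used = forest_importance(ensemble, j=0)
importance_unused = forest_importance(ensemble, j=1)
```

Feature 1 never enters the stump, so shuffling it cannot change any prediction and its score is zero, which matches the remark that the importance is zero by definition for a variable that is not in the tree.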
This importance score can also serve to rank, in a robust statistical way, the role that various attributes play in the RF emulation of the genuine power system security mechanism behind the data.

Fig. 9. Histogram of the probability of secure/insecure contingencies for the random-forest model of the combined HQ + test systems trained on the data scenario S1: 150-ms response-time catastrophe predictor.

IV. RESULTS

A. Scenarios Studied

To assess the predictors proposed in this paper, we will use the open-source software R [26], which includes implementations of conventional decision trees and random forests. The evaluation setup is summarized in Table I. We will basically be considering the performance of a single DT with flat file training versus extended file training. We will also compare the random forest to a single-tree model assuming the extended data file. The decision tree and random forest inducers are configured to randomly select only 70% of the assumed data file to build the model.

For easy comparison, all trained models are validated with confusion matrices computed using the corresponding flat data file (not the 30% set aside by the model inducer). For each model and training scenario, two predictors will be derived: the first has a 300-ms response time because it incorporates all ten features defined in Section II. The second uses the subset of only six features with a 150-ms measurement time or less.

For all random forest models, a limit of 210 trees was set, initially to avoid memory overflow. However, as shown in Fig. 4, this is actually enough because the oob error starts to stabilize around 50 trees. The training of scenario S1 with the combined HQ and test systems data took 2 min on a 2-GHz Centrino 2 laptop with 4 GB memory. Fig. 4 shows that the conventional training approach (S2) results in an oob error three times larger than the proposed extended data file-based training. Moreover, in scenario S1, the out-of-bag misclassification error is much greater for the secure than for the insecure instances. In sharp contrast, the situation is reversed for the conventional training scenario S2. Therefore, the proposed extension of the data set by replicating the unstable cases three to four times results in a radically more accurate prediction of the insecure cases.

B. Importance of the Variables

The accuracy of the classification is strongly dependent on the quality of the attributes describing the security concept. Fig. 5 shows the correlation between the decision (OK) and the attributes. Highly correlated variables are close together and presented in the same color: frequency domain features on one side and voltage features on the other, with PostFaultAngle in the middle. We interpret the degree of any correlation by both the shape and the color of the graphic elements [35]. Any variable is, of course, perfectly correlated with itself, as reflected by the straight lines on the diagonal of the graphic. Wherever the graphic element is a perfect circle, there is no correlation between the variables, as is the case in the correlation between PostFaultAngle and VlowPass 300 ms.

The colors used to shade the circles give another clue to the strength of the correlation: the color intensity is maximal for a perfect correlation and minimal (white) if there is no correlation. Shades of red are used for negative correlations and blue for positive correlations. What this result underscores is the high relevance of energy-based features like WASI for explaining the security concept based on PMU measurements. FastWasi 300 ms is the variable most positively correlated to OK while ms is the most negatively correlated to OK (largest ellipses in both cases).

This complementary behavior is highlighted by the importance analysis results from the random forest learning in Fig. 6. Considering the improvement in the misclassification rate, the top four variables are respectively ms, FastWasi
300 ms, PostFaultAngle, TDEF. Out-of-bag accuracy-based ranking results in approximately the same top four, although FastWasi 150 ms is substituted for the highly correlated FastWasi 300 ms. However, the difference in accuracy loss between these four variables is so tenuous that we should conclude, in fact, that they are all equally important for achieving a predictor with good generalization capabilities.

Fig. 10. Probability of a case being secure according to the random forest-based 150- and 300-ms catastrophe predictors.

C. Performance Assessment

In assessing the performance of the classifiers on the flat data set without replication of insecure instances, three statistical metrics are defined as follows [13], [22].
i) Reliability: (total number of insecure cases − total number of cases converted to secure cases)/total number of insecure cases.
ii) Security: (total number of secure cases − total number of cases converted to insecure cases)/total number of secure cases.
iii) Accuracy: (total number of cases − number of misclassifications)/total number of cases.

Fig. 7 summarizes the performance of the 150- and 300-ms response time predictors for the combined HQ and test systems. "Single Flat" corresponds to results obtained using the random forest with a single decision tree in data scenario S3 (conventional approach). "Single Extended" corresponds to results obtained using a random forest with a single decision tree in data scenario S2 (three replications of insecure cases added to the flat data set). "RF Extended" corresponds to results obtained using a 210-tree random forest model in the data scenario S1. When the modeling and training are performed according to conventional practices using a single tree and a flat data file with no replication of insecure cases (scenario S3) [7], [12], [36], the reliability is very poor: only 47% and 66% for the 150-ms and 300-ms predictors respectively. By contrast, the security is relatively good and comparable to typical decision-tree accuracy reported in recent power systems literature. Using an extended data file for training (scenario S2) improves the single-tree model performance drastically: the reliability jumps to 88% and 93% for the 150-ms and 300-ms predictors respectively. But this is done at the expense of a slight degradation of the overall accuracy, which dips to 88% from 91% for the 300-ms predictor.

Random-forest learning based on the extended data file appears more appropriate for the reliability, while also improving the accuracy. The 150-ms predictor now offers 99.8% reliability compared to only 47% when a traditional approach is used. Likewise, the 300-ms predictor is 99.9% reliable. This is the first time, after a decade of data mining investigation for DSA, that we have come across "3 nines" reliability using a large and realistic data set. The accuracy is very high at 99% and, again, the validation of the random forest model is based on combined HQ + test systems data. Fig. 8 shows additional performance results from separate predictive security modeling of the HQ and test systems.

As the performance of the 300-ms predictors is quite similar, only the 150-ms predictor data are illustrated. The main observation is that the conventional DT approach fails on both the test and HQ systems, with a reliability of only 85% and 59% respectively. By contrast, random forest learning in data scenario S1 produces no misclassified insecure case for the test system, even though each of the 210 trees in the random forest was trained with about half the data only. On the HQ system, the proposed predictor could achieve "3 nines" reliability, compared to only 91% reliability of a similarly trained single tree model. The overall accuracy of the two ensemble learning-based predictors exceeds 99% whereas it is stuck at around 90% for the conventional approach [3], [12], [36].

D. Probability Ranking

The ranking information is difficult to visualize given the large size of the database. However, Fig. 9 provides a glimpse of it from the separate histograms for secure and insecure instances. About 80% of secure cases are positively identified as secure (zero value of probability) while a slightly greater proportion of insecure cases is considered definitely insecure (probability value of 1). A small number of cases fall in the middle and should be described in terms of their probability rather than their class. These are actually limit cases that sit on the security boundary, but it should be borne in mind that only 600 out of

Fig. 11. Level plots of two-dimensional nomograms from a random forest trained on data scenario S1: energy versus voltage.

Fig. 12. Level plots of two-dimensional nomograms from a random forest trained on data scenario S1: energy versus PostFaultAngle.

60 826 cases are misclassified by the random forest trained with data set S1.
Fig. 10 shows a very weak correlation between the ranks of the contingency at 150 ms and 300 ms respectively. It appears that some contingencies on the verge of instability at 150 ms become more secure later on. The reverse is also true. This behavior warrants further investigation although here again, in view of the 99% accuracy of the model, it is important to remember that the number of misclassified cases is very small.
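
The reliability, security, and accuracy definitions from the Performance Assessment subsection reduce to simple counts over the true and predicted labels. The following is a minimal sketch in Python, assuming labels are encoded as 1 for insecure and 0 for secure; the function name `security_metrics` and the toy label vectors are illustrative and are not taken from the paper. The last lines check the overall accuracy implied by the 600 misclassified cases out of 60 826 quoted above.

```python
def security_metrics(actual, predicted):
    """Return (reliability, security, accuracy) for binary security labels.

    Assumed encoding (not from the paper): 1 = insecure, 0 = secure.
    """
    n_total = len(actual)
    n_insecure = sum(1 for a in actual if a == 1)
    n_secure = n_total - n_insecure
    # Insecure cases "converted to secure": missed alarms.
    missed = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    # Secure cases "converted to insecure": false alarms.
    false_alarms = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

    reliability = (n_insecure - missed) / n_insecure
    security = (n_secure - false_alarms) / n_secure
    accuracy = (n_total - missed - false_alarms) / n_total
    return reliability, security, accuracy


# Toy data: 4 insecure and 6 secure cases, one error of each kind.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
reliability, security, accuracy = security_metrics(actual, predicted)

# Overall accuracy implied by 600 misclassified cases out of 60 826.
overall_accuracy = (60826 - 600) / 60826
print(f"{overall_accuracy:.2%}")
```

Keeping the two error counts separate matters here because an overall accuracy near 99% says nothing by itself about how many of the errors are missed insecure cases, which is what the reliability metric isolates.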

Fig. 13. Example of mid-term multiswing instability from the test system—Legend: R1; . . . ; R9 represent areas 1 to 9. SlowWasi and FastWasi values shown
are computed over 2- and 5-s time windows respectively. PostEvent values correspond to the final time at the end of the record.
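
The probability ranking and the nomograms both rely on a per-contingency probability produced by the forest. One common way to obtain such a value, assumed here rather than stated in this section, is the fraction of trees that vote "insecure". The sketch below, in Python, uses hypothetical names (`insecurity_probability`, `grade`, `rank_contingencies`) and made-up vote vectors for a 210-tree forest; it only illustrates how cases at probability 0 or 1 can be separated from borderline ones and then ranked.

```python
def insecurity_probability(votes):
    """Fraction of trees voting 'insecure' (1) for one contingency.

    Assumption: the forest probability is the plain vote fraction.
    """
    return sum(votes) / len(votes)


def grade(prob):
    """Label a case by its probability instead of forcing a hard class."""
    if prob == 0.0:
        return "secure"
    if prob == 1.0:
        return "insecure"
    return "borderline"


def rank_contingencies(votes_by_case):
    """Sort contingencies from most to least likely insecure."""
    probs = {name: insecurity_probability(v) for name, v in votes_by_case.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes from a 210-tree forest for four contingencies.
votes_by_case = {
    "c1": [1] * 210,              # unanimous insecure
    "c2": [0] * 210,              # unanimous secure
    "c3": [1] * 105 + [0] * 105,  # split vote
    "c4": [1] * 21 + [0] * 189,   # mostly secure
}
ranking = rank_contingencies(votes_by_case)
grades = {name: grade(prob) for name, prob in ranking}
```

In this toy example the two unanimous cases sit at the ends of the ranking and the split votes fall in between, which mirrors the histogram description where most cases lie at probability 0 or 1 and a small number fall in the middle.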

E. Nomograms
Security prediction results are often presented in the form of two-dimensional plots called nomograms [7] but the availability of contingency probability or severity makes it possible to revisit these plots, as shown in Figs. 11 and 12. These postcontingency nomograms could be updated in real-time following the inception of a contingency. With the energy-based severity measurement set on the horizontal axis, we can assign the voltage or PostFaultAngle to the vertical axis. The probability of the contingency is then used in a level plot in such a way that insecure zones of the plane are represented in dark blue, while light white corresponds to secure zones. In the first case, the general trend is easy to capture, given that the insecure zone is confined to the bottom right (high energy, low voltage). Secure zones are white spots confined to the left upper side of the map (low energy, normal voltage). For the 150-ms decision, some white spots are also found at relatively low voltages because the voltage is slow to recover. It is also noted that some high-energy cases are secure, provided that the short-term voltage dip is not too low. Fig. 12 provides a complementary picture where the system is less prone to instability when the energy is low and the postfault angle is . However, we have some bright “secure” spots at a high angle, which may be due to changes in the power flow pattern after fault clearing.

V. DISCUSSION
The proposed predictors are far more reliable than the usual DT [7], [12], [22], [35], which is noteworthy given the large dataset with mixed dynamics ranging from transient to voltage and oscillatory instability. It is also of interest to note that in many cases, such as in Fig. 13, the system lost stability or became visibly insecure only 20 to 25 s after the initiating event, yet the cases were correctly classified by the RF predictor only 150 ms after fault clearing (i.e., with a more than 20-s preemption time).
One question remains however: do we have to retrain the RF following routine changes in the network states? The answer is yes, to some extent. Although the RF is robust over a wide

Fig. 14. On-demand incremental extension of the learning database and retraining of the random forest-based catastrophe predictor.

range of system conditions and was trained to capture the “essential” concept of system security, like any inductive knowledge, it comes with a guarantee limited to the network states that result in dynamics “similar” to those included in the learning database. While the number of possible network states is infinite, the set of resulting dynamics is limited to a finite number of clusters of similar features such as fast or multiswing transient instability, voltage or oscillatory instability, etc.
It is therefore possible to improve the RF quality using periodic checks or following major system redesigns. These steps are not different from what is done today for a critical SPS whose settings are updated every few seconds or minutes according to the network states [15]. A possible scheme for updating the RF is shown in Fig. 14. Given the improved robustness and reliability of the new predictor, such a function could be called upon infrequently on a yearly, monthly, or daily basis using forecast scenarios with some uncertainties added [7], [35]. An alternative view could be to execute this retrain functionality in real time, at the speed of SCADA information. RF training is inherently fast (only 2-min CPU time on a laptop) and the database update (when required) could take advantage of the computational facilities being deployed for fast real-time simulation and modeling [36].

VI. CONCLUSION
In this paper, the search for catastrophe precursors in a bulk power system has been stated in a WASI-based model-predictive framework. Two innovations are proposed to achieve fast predictors with “3 nines” reliability (i.e., less than 0.1% misclassified insecure cases). The first consists in extending the training database with several (three or four) replicas of insecure cases in order to increase their share of the overall training error. The second major idea consists in shifting from a single-tree learning model to one with an ensemble of decision trees, the so-called random-forest model. In so doing, the transparency inherent in the DT is lost but the random forest is shown to be an order of magnitude more accurate. In addition, advantage is taken of the randomness in the forest of trees to first develop importance measurements of the WASI features, just to confirm their physical significance, and then to rank unlabeled events based on a probabilistic assessment of their security or insecurity grade.
Overall, it is suggested that the power system security signatures and patterns have a nonlinear behavior far too complex to be captured accurately in a single DT. Instead, given the primary importance of reliability in power system security assessment, some degree of raw transparency has been sacrificed in exchange for a faster (150 ms), more robust and reliable catastrophe predictor which has performed well with a combination of actual and test systems with more than 60 000 events accounted for.

REFERENCES
[1] T. E. Dy-Liacco, “Enhancing power system security control,” IEEE Comput. Appl. Power, vol. 10, no. 3, pp. 38–41, Jul. 1997.
[2] L. Wehenkel, Th. Van Cutsem, and M. Pavella, “An artificial intelligence framework for on-line transient stability assessment of power systems,” IEEE Trans. Power Syst., vol. 4, no. 2, pp. 789–800, May 1989.
[3] S. Rovnyak, S. Kretsinger, J. Thorp, and D. Brown, “Decision trees for real-time transient stability prediction,” IEEE Trans. Power Syst., vol. 9, no. 3, pp. 1417–1426, Aug. 1994.
[4] L. S. Moulin, A. P. Alves da Silva, M. A. El-Sharkawi, and R. J. Marks, II, “Support vector machines for transient stability analysis of large scale power systems,” IEEE Trans. Power Syst., vol. 19, no. 2, pp. 818–825, May 2004.

[5] Y. Mansour, E. Vaahedi, and A. El-Sharkawi, “Dynamic security contingency screening and ranking using neural networks,” IEEE Trans. Neural Netw., vol. 8, no. 15, pp. 942–950, Jul. 1997.
[6] N. Amjady and S. F. Majedi, “Transient stability prediction by a hybrid intelligent system,” IEEE Trans. Power Syst., vol. 22, no. 3, pp. 1275–1283, Aug. 2007.
[7] K. Sun, S. Likhate, V. Vittal, V. S. Kolluri, and S. Mandal, “An online dynamic security assessment scheme using phasor measurements and decision trees,” IEEE Trans. Power Syst., vol. 22, no. 4, pp. 1935–1943, Nov. 2007.
[8] L. Wehenkel, M. Pavella, E. Euxibie, and B. Heilbronn, “Decision tree based transient stability method a case study,” IEEE Trans. Power Syst., vol. 9, no. 1, pp. 459–469, Feb. 1994.
[9] E. M. Voumvoulakis and N. D. Hatziargyriou, “Decision trees-aided self-organized maps for corrective dynamic security,” IEEE Trans. Power Syst., vol. 23, no. 2, pp. 662–630, May 2008.
[10] E. Lobato, A. Ugedo, L. Rouco, and F. M. Echavarren, “Decision trees applied to Spanish power systems applications,” in 9th Int. Conf. Probabilistic Methods Applied Power Syst., Jun. 11–15, 2006, pp. 1–6.
[11] J. A. Huang, S. Harrison, L. Wehenkel, G. Vanier, F. Lévesque, and A. Valette, “Operation rules determined by risk analysis for special protection systems at Hydro-Québec,” in Proc. CIGRÉ Session 2004, Paris, pp. 1–8, Paper B5-205.
[12] Z. Li and W. Wu, “Phasor measurements-aided decision trees for power system security assessment,” in Proc. 2009 IEEE Comput. Soc. 2nd Int. Conf. Inf. Comput. Sci., pp. 358–361.
[13] I. Kamwa, R. Grondin, and L. Loud, “Time-varying contingency screening for dynamic security assessment using intelligent-systems techniques,” IEEE Trans. Power Syst., vol. 16, no. 3, pp. 526–536, Aug. 2001.
[14] Y.-J. Wang, C.-W. Liu, and Y.-H. Liu, “A PMU based special protection scheme: A case study of Taiwan power system,” Int. J. Elec. Power Energy Syst., vol. 25, no. 3, pp. 215–223, Mar. 2005.
[15] G. Trudel, J.-P. Gingras, and J.-R. Pierre, “Designing a reliable power system: The Hydro-Québec’s integrated approach,” Proc. IEEE, vol. 93, no. 5, pp. 907–917, May 2005.
[16] V. Madani, M. Adamiak, and M. Thakur, “Design and implementation of wide area special protection schemes,” Protection Control J., Apr. 2006 [Online]. Available: http://www.gedigitalenergy.com/smartgrid/Apr06/Wide_Area_Special_Protection_Schemes.pdf
[17] K. Mei and S. M. Rovnyak, “Response-based decision trees to trigger one-shot stabilizing control,” IEEE Trans. Power Syst., vol. 19, no. 1, pp. 531–537, Feb. 2004.
[18] N. Senroy, G. T. Heydt, and V. Vittal, “Decision tree assisted controlled islanding,” IEEE Trans. Power Syst., vol. 21, no. 4, pp. 1790–1797, Nov. 2006.
[19] S. Bruno, M. De Benedictis, and M. La Scala, “Advanced monitoring and control approaches for enhancing power system security,” in Inter-area Oscillations in Power Systems: A Nonlinear and Non-stationary Perspective, A. Messina, Ed. New York: Springer, 2009, pp. 231–260.
[20] C. W. Taylor, D. C. Erickson, K. E. Martin, R. E. Wilson, and V. V. Venkatasubramanian, “WACS—Wide-area stability and voltage control system: R&D and on-line demonstration,” Proc. IEEE, vol. 93, no. 5, pp. 892–906, May 2005.
[21] E. Martínez, N. Juárez, A. Guzmán, G. Zweigle, and J. León, “Using synchronized phasor angle difference for wide-area protection and control,” Schweitzer Eng. Lab. Tech. Paper [Online]. Available: http://www.selinc.com/WorkArea/DownloadAsset.aspx?id=3691
[22] I. Kamwa, S. R. Samantaray, and G. Joos, “Development of rule-based classifiers for rapid stability assessment of wide-area post-disturbance records,” IEEE Trans. Power Syst., vol. 24, no. 1, pp. 258–270, Feb. 2009.
[23] “Smart grid: Enabler of the new energy economy,” U.S. Dept. Energy, Electric Advisory Committee, Dec. 2008 [Online]. Available: http://www.oe.energy.gov/DocumentsandMedia/final-smart-grid-report.pdf
[24] C. Strobl, J. Malley, and G. Tutz, “An introduction to recursive partitioning: Rationale, application and characteristics of classification and regression trees,” Dept. Statistics, Univ. Munich, Tech. Rep. No. 55, Apr. 2009 [Online]. Available: http://epub.ub.uni-muenchen.de/10589/1/partitioning.pdf
[25] L. Breiman, “Random forests,” Mach. Learn., vol. 45, pp. 5–32, 2001 [Online]. Available: http://www.stat.berkeley.edu/users/breiman/RandomForests
[26] A. Liaw and M. Wiener, “Classification and regression by random forest in R,” R News, vol. 2, no. 3, pp. 18–22, Dec. 2002 [Online]. Available: http://www.r-project.org
[27] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, 2nd ed. New York: Springer-Verlag, 2009, p. 745.
[28] D. S. Siroky, “Navigating random forests and related advances in algorithmic modeling,” Stat. Rev., vol. 3, pp. 147–163, 2009 [Online]. Available: http://projecteuclid.org/DPubS?service=UI&version=1.0&verb=Display&handle=euclid.ssu
[29] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, “A comparison of decision tree ensemble creation techniques,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 173–180, Nov. 2007.
[30] F. J. Provost and P. Domingos, “Tree induction for probability-based ranking,” Mach. Learn., vol. 52, no. 3, pp. 199–215, 2003.
[31] H. Liang, H. Zhang, and Y. Yan, “Decision trees for probability estimation: An empirical study,” in Proc. 18th IEEE Int. Conf. Tools With Artif. Intell. (ICTAI’06), pp. 1–9.
[32] I. Kamwa, A. K. Pradhan, and G. Joos, “Automatic segmentation of large power systems into fuzzy coherent areas for dynamic vulnerability assessment,” IEEE Trans. Power Syst., vol. 22, no. 4, pp. 1974–1985, Nov. 2007.
[33] Y. N. Zhou, L. L. Zhu, K. K. Y. Poon, D. Gan, H. Zhu, and Z. Cai, “Area center of inertia—A potential unified signal for synchronous and frequency stability control of interconnected power systems under short and long time spans,” in Proc. Int. Conf. Electr. Eng. (ICEE), Hong Kong, Jul. 8–12, 2007, pp. 1–6, Paper no. 390 [Online]. Available: http://power2.eee.hku.hk/ceespub/papers/ICEE2007-390.pdf
[34] M. Friendly, “Corrgrams: Exploratory displays for correlation matrices,” Amer. Stat., vol. 56, no. 4, pp. 316–324, 2002.
[35] K. Alcheikh-Hamoud, N. Hadjsaid, Y. Bésanger, and J. P. Rognon, “Decision tree based filter for control area external contingencies screening,” in Proc. 2009 IEEE Bucharest PowerTech, pp. 1–8.
[36] K. Moslehi, A. B. R. Kumar, E. Dehdashti, P. Hirsch, and W. Wu, “Distributed autonomous real-time system for power system operations—A conceptual overview,” in Proc. IEEE Power Syst. Conf. Expo., New York, Oct. 10–13, 2004, pp. 1–8.
[37] N. Chawla, K. Bowyer, L. Hall, and P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.

Innocent Kamwa (S’83-M’88-SM’98-F’05) received the Ph.D. degree in electrical engineering from Laval University, Laval, QC, Canada, in 1988.
Since then, he has been with the Hydro-Québec Research Institute (IREQ), Power System Analysis, Operation and Control, Varennes, QC, where he is currently a Principal Researcher in bulk system dynamic performance. He has also been an Associate Professor of Electrical Engineering at Laval University since 1990.
Dr. Kamwa is a Member of CIGRÉ and a Registered Professional Engineer. He is a recipient of the 1998 and 2003 IEEE Power Engineering Society Prize Paper Awards and is currently serving on the Adcom of the IEEE System Dynamic Performance Committee. He has been active for the last 13 years on the IEEE Electric Machinery Committee, where he is presently the Standards Coordinator.

S. R. Samantaray (M’08) received the B.Tech. degree in electrical engineering from UCE Burla, India, in 1999 and the Ph.D. degree in power system engineering from the Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, India, in 2007.
He is an Associate Professor in the Department of Electrical Engineering, National Institute of Technology Rourkela, Orissa, India. His major research interests include intelligent protection, digital signal processing, soft computing, FACTs, distributed generation, and dynamic security assessment.
Prof. Samantaray is the recipient of the Orissa Bigyana Academy Young Scientists Award 2007, the Indian National Academy of Engineering Best Ph.D. Thesis Award 2008, and Institute of Engineers, India (IEI) Young Engineers Award 2009.

Geza Joós (M’82, SM’89, F’06) received the M.Eng. and Ph.D. degrees from McGill University, Montreal, QC, Canada.
He was previously with ABB, the Ecole de Technologie Supérieure, and Concordia University. He has been a Professor with McGill University since 2001, and holds a Canada Research Chair in Power Electronics applied to power systems. He is involved in fundamental and applied research related to the application of high-power electronics to power conversion, including distributed generation and wind energy, and to power systems. He is involved on a regular basis in consulting activities in power electronics and power systems.
Prof. Joós is a Fellow of the Canadian Academy of Engineering and of the Engineering Institute of Canada. He is active in a number of IEEE Industry Applications Society committees and in IEEE Power Engineering Society working groups, and in CIGRE working groups.
