

IEEE TRANSACTIONS ON POWER SYSTEMS VOL. 19, NO. 1, FEBRUARY 2004

Enhancement of Anomalous Data Mining in Power System Predicting-Aided State Estimation
Shyh-Jier Huang, Senior Member, IEEE, and Jeu-Min Lin

Abstract: An approach to predicting-aided state estimation that includes bad data mining in a power system is proposed in this paper. The method employs both sliding surface-enhanced fuzzy control and optimal cluster number estimation to enhance state estimation. The proposed approach has been applied to test systems, and the test results reveal the feasibility of the method for the applications considered.
Index Terms: Bad data mining, cluster numbers estimation, sliding surface-enhanced fuzzy control, state estimation.

I. INTRODUCTION

FOLLOWING the open access and the operation of transmission networks, the patterns of power flows in a deregulated power system have become less predictable than those in vertically integrated systems [1]. Under such new energy market rules, real-time network modeling is vitally important, yet the quality of the formulated model depends heavily on the effectiveness of state estimators. Moreover, by utilizing the previous estimate, a state estimator can forecast the state vector one step ahead. This gives system operators more time to make related decisions such as economic dispatch, security assessment, and other related functions.
The process of a conventional state estimator is often carried out through two functions: state predicting and state filtering. State predicting is performed according to past information, and state filtering is applied to find the optimal estimate considering all available measurements and predicted states. Several algorithms have been proposed for state estimation. Among these methods, the commonly recognized extended Kalman filter (EKF) has often been employed [2]-[4]. When the system is operated normally, the EKF scheme provides an optimal state estimate. However, once the system encounters a large load variation, the estimation performance is often significantly degraded. This is mainly because large injection changes are not considered in the model formulation. Recently, based on human knowledge and experience, fuzzy control theory was also suggested [5]. Through the fuzzy control system design, the formulated state estimator was expected to be more robust because of a better management of uncertainty. Then, with the emergence of neural networks, they were also deemed a favorite candidate in several electric power applications thanks to their fast response and efficient learning. By collecting the data generated from power system operation, neural networks were seen to have high potential to assist in the dynamic tracking of system states [6], [7]. Each of the aforementioned techniques could improve the computation performance, but each has its drawbacks as well.
In this paper, a fuzzy control method enhanced with a sliding surface is proposed for the predicting-aided state estimation of a power system [8]. Predicting-aided state estimation is also known as dynamic state estimation, whose central concept is to detect, identify, and eliminate inconsistent data by use of the difference between the latest acquired measurements and the corresponding forecasted values. In the proposed approach, with the embodiment of fuzzy decision capability, the method guides the solution to a near-optimal trajectory in those cases where large load changes occur [9]. In addition, rather than following the conventional fuzzy controller design, the proposed method applies the concept of the sliding surface, which combines the error and the rate of error into an integrated input variable. Hence, the number of fuzzy rules can be largely reduced, thereby improving the computation performance.
In addition to coping with load variation scenarios, the study also investigates the possibility of gross measurement errors that may lead to biased estimates. Fig. 1 outlines the framework of the proposed approach. As seen from the figure, the predicted state vector is first determined. Then, following the acquisition of a new set of measurements, the innovation vector can be evaluated and used as the network inputs, where the innovation vector is defined as the differences between the latest acquired measurements and their corresponding predicted values. Once abnormal data occur, the related innovations are immediately replaced so as to better reflect the correct values, assisting the predicting-aided state estimator toward reliable estimates of system states.
The paper is organized as follows. Section II introduces the modules of the proposed approach, Section III describes the computation procedures, Section IV shows the numerical results, and Section V draws the conclusions.
II. PROPOSED APPROACH

Manuscript received April 15, 2003. This work was supported in part by the National Science Council of the Republic of China under Contract NSC 90-2213-E-006-104.
The authors are with the Department of Electrical Engineering, National
Cheng Kung University, Tainan 70101, Taiwan, R.O.C.
Digital Object Identifier 10.1109/TPWRS.2003.818726

A. Paradigm
A multidimensional state vector comprising node-voltage magnitudes and phase angles is utilized to describe the states of a power system, with its dimension determined by the number of nodes. From the latest acquired measurement vector, the state vector can thus be optimally estimated at each time sample. At time instant $k$, the incoming measurement vector $z(k)$ can be expressed by
$$z(k) = h(x(k)) + v(k) \qquad (1)$$
where $x(k)$ represents the state vector, $h(\cdot)$ indicates a nonlinear measurement function vector, and $v(k)$ is a white Gaussian measurement noise error vector with zero mean and covariance matrix $R(k)$. By using a state transition equation, the state vector at the $(k+1)$-th time sample can be written as
$$x(k+1) = F(k)\,x(k) + g(k) + w(k) \qquad (2)$$
In (2), $F(k)$ is a nonzero diagonal matrix, $g(k)$ is a nonzero vector, and $w(k)$ is a white Gaussian sequence with zero mean and covariance matrix $Q(k)$. Note that in this study, the measurement vector is formed by bus voltage magnitudes, power injections, and line flow measurements. All measurement data are assumed to be acquired at the fundamental frequency.
The state estimation process can be largely divided into two stages: state predicting and state filtering. In the state predicting process, based on the conditional expectation theory, the predicted state vector $\tilde{x}(k+1)$ at time instant $k+1$ can be derived from (2) as shown below
$$\tilde{x}(k+1) = F(k)\,\hat{x}(k) + g(k) \qquad (3)$$
where $\hat{x}(k)$ is an estimate of the state vector at time sample $k$.
Then, in the state filtering process, by assuming that a latest set of observed measurements is available, the predicted state vector can be filtered such that a new estimate is obtained. In such a process, the estimated state vector $\hat{x}(k+1)$ at time $k+1$ is formulated by
(4)
(5)
(6)
in which the innovation vector is the difference between the latest measurements $z(k+1)$ and the predicted measurement vector $h(\tilde{x}(k+1))$, and the covariance matrix of the predicted state vector also enters. Under a normal operation scenario, (3)-(6) are considered reasonable. However, once improper predicted states or gross measurement errors occur, this estimator has been reported to be no longer reliable [1]. Such drawbacks arise because the predicted values of system state variables fail to take sudden injection changes into account, degrading the state estimation performance. For this reason, the concept of a sliding surface-enhanced fuzzy controller is employed in the framework of the proposed method.

Fig. 1. Framework of predicting-aided state estimation with anomalous data mining embedded.

B. Sliding Surface-Enhanced Fuzzy Controller
In managing events involving uncertainty, the fuzzy controller has paved a systematic way of capturing the linguistic fuzzy information extracted from human experts. The fuzzy paradigm can be deemed a generalization of probability theory, where each variable is assigned a degree of membership for its possible values. The linguistic information can, therefore, be converted into a control decision suitable for describing nonlinear or model-free scenarios. As for sliding mode control, it is commonly recognized as a robust approach that has been widely applied in control studies. The integration of sliding mode control with the fuzzy model forms the major theme of this paper and is derived as follows.
Let $e$ be the tracking error between the current state and the desired state; then a sliding surface defined by the scalar function $s$ can be selected as expressed by
$$s = \left(\frac{d}{dt} + \lambda\right)^{n-1} e \qquad (7)$$
where $n$ is the order of the system, and $\lambda$ is a strictly positive constant that is equivalently a magnification factor [11]. Now, for a second-order system with $n = 2$, it becomes
$$s = \dot{e} + \lambda e \qquad (8)$$
where $\dot{e}$ is the rate of state error. As (8) indicates, the state error and the rate of state error are linearly combined rather than separated. The sliding surface defined by the scalar function forms a straight line of slope $-\lambda$ passing through the origin of the phase plane only if $s = 0$. Achieving zero tracking error is, therefore, equivalent to remaining on the sliding surface with the initial condition given as below
(9)
Equation (9) also illuminates that the problem of following the desired trajectory can be regarded as that of holding the scalar quantity $s$ at zero. The rule in this sliding surface-enhanced fuzzy controller can be formed through the integration of the error and the rate of error as one; namely, the error and the rate of error are both used in the antecedents of the fuzzy rules. In this way, the dimension of the input space and the number of rules can be reduced, thereby improving the computational performance. The main procedure of this controller is detailed below.
1) Decision of Input and Output Variables: Similar to (8), the input variable of the fuzzy controller is first defined by
$$s = \dot{e} + \lambda e \qquad (10)$$
where $e$ is the state error, $\dot{e}$ is the rate of state error, and $\lambda$ is the positive constant. Since $e$ can be seen as the difference between the predicted and estimated values of a state variable, the state error at time step $k$ can be given by
$$e_i(k) = \tilde{x}_i(k) - \hat{x}_i(k) \qquad (11)$$
where $\tilde{x}_i(k)$ and $\hat{x}_i(k)$ indicate the $i$th predicted and estimated system states at time $k$, respectively. Then, the rate of state error at the $k$th time sample can be expressed as
(12)
Now, let $s$ be an element of the input vector and $u$ the corresponding element of the output vector. To delimit the universes of discourse of $s$ and $u$, their intervals are
individually assigned. In this study, the value of is set to be
0.25 for the state error of voltage magnitude as well as that of
phase angle. The value of is set to be 0.07 for the state error
of voltage magnitude, and 0.05 for that of phase angle. This
selection was considered feasible based on the discussions with
utility engineers.
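As a hedged illustration of how the forecast in (3) and the controller input in (10)-(12) could be computed, the following minimal Python sketch assumes a diagonal transition matrix stored as a list, and approximates the rate of state error by a backward difference over one time sample. The function names and the numerical values are hypothetical and are not taken from the paper.

```python
def predict_state(f_diag, g, x_hat):
    """One-step forecast with a diagonal transition matrix: x_pred[i] = f_diag[i] * x_hat[i] + g[i]."""
    return [f * x + gi for f, x, gi in zip(f_diag, x_hat, g)]


def sliding_input(x_pred, x_hat, prev_error, lam):
    """Return (s, error) per state variable.

    error = predicted - estimated state,
    rate  = error - prev_error   (assumed backward difference, unit time step),
    s     = rate + lam * error   (sliding-surface variable).
    """
    error = [p - h for p, h in zip(x_pred, x_hat)]
    rate = [e - pe for e, pe in zip(error, prev_error)]
    s = [de + lam * e for de, e in zip(rate, error)]
    return s, error


# Illustrative numbers only: one voltage magnitude (p.u.) and one phase angle (rad).
x_pred = predict_state([1.0, 1.0], [0.01, -0.02], [1.00, 0.10])
s, error = sliding_input(x_pred, [1.00, 0.10], [0.0, 0.0], lam=0.25)
```

Because the error and its rate are merged into the single variable `s`, the fuzzy rule base that follows needs only one antecedent per state variable.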
2) Partition of Universe of Discourse and Rule Establishment: With a linguistic variable defined for the input and another for the output, the universe of discourse is ready to be partitioned. The input and output universes of discourse can be divided as Table I tabulates, and the diagram of a fuzzy sliding surface can then be formed as Fig. 2 depicts. In the figure, the sliding surface is seen to consist of a band in the phase plane. The width of this band indicates the available range for the input universe of discourse.
3) Fuzzification: In the fuzzification process, the isosceles triangle is selected as the shape of the membership function, by which the crisp value is fuzzified.
4) Fuzzy Inference: In the inference process, the min-max method is adopted [12]. The grade of membership of the antecedent is first formulated as below
(13)
where the value of the input variable is equal to a singleton. The grade of membership of the consequent can be written as
(14)
Therefore, the inference result obtained from the grade of membership of the consequent becomes
(15)
5) Defuzzification: In the defuzzification process, the inference results of each input variable need to be converted to a crisp value. The centroid is computed as below
(16)
where the resulting crisp value is an element of the output vector. At this stage, (3) can hence be modified to be
(17)
When the system is in normal operation, the controller output term in (17) is almost zero; yet when the system is under a drastic load change, this term becomes significant, indicating the current state of a power system.
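The fuzzification, min-max inference, and centroid defuzzification steps can be sketched as follows. This is only an illustration under stated assumptions: the five evenly spaced isosceles triangular sets, the one-to-one rule base (each input set mapped to the same-named output set), and the parameter names `s_range` and `u_range` are hypothetical, since the actual fuzzy sets and rules of Table I are not reproduced here.

```python
def tri(x, peak, half_width):
    """Isosceles triangular membership grade centred at `peak`."""
    return max(0.0, 1.0 - abs(x - peak) / half_width)


def fuzzy_controller(s, s_range, u_range, n_grid=201):
    """Map a crisp sliding-surface value s to a crisp output by min-max inference."""
    # Assumed partition: NB, NS, ZE, PS, PB on a normalised axis [-1, 1].
    peaks = [-1.0, -0.5, 0.0, 0.5, 1.0]
    s_norm = max(-1.0, min(1.0, s / s_range))      # clamp the singleton input
    firing = [tri(s_norm, p, 0.5) for p in peaks]  # antecedent grades
    num = den = 0.0
    for j in range(n_grid):
        u = -1.0 + 2.0 * j / (n_grid - 1)
        # min clips each consequent set by its rule's firing strength;
        # max aggregates the clipped sets over all rules.
        grade = max(min(f, tri(u, p, 0.5)) for f, p in zip(firing, peaks))
        num += u * grade
        den += grade
    return u_range * num / den                     # centroid, rescaled to the output range


u_zero = fuzzy_controller(0.0, s_range=0.07, u_range=1.0)
u_edge = fuzzy_controller(0.07, s_range=0.07, u_range=1.0)
```

With this assumed rule base the output vanishes when `s` is zero and grows with `s`, which mirrors the behaviour described for the controller term under normal operation and under drastic load changes.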

Fig. 2. Fuzzy sliding surface.

TABLE I
FUZZY SETS AND FUZZY RULES

C. Mining of Anomalous Data


When bad measurements take place, the squares of the differences between the observed measurements and their corresponding estimates often become significant. A back-error propagation neural network scheme has been developed to provide forewarning signals, in which the input variables of the neural network are innovation vectors, i.e., the differences between the latest acquired measurements and their corresponding predicted values [13]. The reason for choosing the innovation as the network input lies in its white Gaussian distribution feature [14]. Hence, when the neural network is well trained, it serves as a useful extrapolator. Now, with an innovation as the input variable, the corresponding output is its estimated value, one for each of the measurements. Next, by comparing the squared difference between each innovation and its estimate against a threshold, the bad data can be flagged. This method was seen to be feasible in some example systems; however, a threshold that is universally suitable for different scenarios is hard to select. An inadvertent choice of the threshold may instead degrade the diagnosis process. An efficient method to overcome this drawback is, therefore, urgently required.
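The sensitivity of threshold-based flagging can be illustrated with a minimal sketch. The function name, the innovation values, and both thresholds below are hypothetical and chosen only for illustration.

```python
def flag_bad_data(innovations, estimates, threshold):
    """Return the indices whose squared (innovation - estimate) exceeds the threshold."""
    return [i for i, (v, v_hat) in enumerate(zip(innovations, estimates))
            if (v - v_hat) ** 2 > threshold]


innovations = [0.01, 0.30, -0.02]
estimates = [0.00, 0.02, -0.01]
flag_loose = flag_bad_data(innovations, estimates, threshold=1e-2)
flag_tight = flag_bad_data(innovations, estimates, threshold=1e-5)
```

With the looser threshold only the second measurement is flagged, whereas the tighter one flags all three, which is the difficulty that motivates a threshold-free clustering strategy.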
Based on the theory of cluster analysis, an optimal cluster number decision strategy was proposed that is capable of estimating the optimal number of clusters from a set of data. This technique uses the output of any clustering algorithm and compares the change in within-cluster dispersion to that expected under an appropriate reference null distribution [10], [15]. In this way, the appropriate number of data clusters can be justified, improving the detection and identification of the abnormal data. The concept and formulation of this proposed approach are detailed as follows.
Assume that there is a set of samples $\{x_i\}$, where $x_i$ represents the $i$th sample. If the set of samples has $c$ well-separated groups, its error measure $W(c)$ can be obtained as below
$$W(c) = \sum_{r=1}^{c} \sum_{x_i \in C_r} \lVert x_i - m_r \rVert^2 \qquad (18)$$
where $m_r$ is the center of group $C_r$. By comparing the natural logarithm of $W(c)$ with a reference value, the optimal number of clusters is accordingly determined. Any significant gap between $\ln W(c)$ and its reference value is a good indicator of the existence of bad data. Note that with the natural logarithmic representation of $W(c)$ in this data mining process, the curve of the error measure is expected to be more linear, so that the gap between successive cluster numbers is easier to determine. It is also worth noting that, to discover the gap, a set of reference data must be formed as well. In view of the distribution of squared error data for each dimension, we have adopted a uniformly distributed random number generator such that each reference dataset has its Monte Carlo samples. Therefore, the probability of occurrence of each squared error datum will be mutually the same under a normal system operation scenario.
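A minimal Python sketch of this cluster-number decision is given below, in the style of the gap statistic. Several choices are assumptions rather than details from the paper: a plain one-dimensional k-means with evenly spaced initial centers stands in for "any clustering algorithm", the reference sets are drawn uniformly over the range of the data, and the function names, `c_max`, `n_ref`, and the sample data are hypothetical.

```python
import math
import random


def kmeans_1d(data, c, iters=25):
    """Plain Lloyd iterations on scalars; centers start evenly spaced over the data range."""
    lo, hi = min(data), max(data)
    centers = [lo + (r + 0.5) * (hi - lo) / c for r in range(c)]
    groups = [list(data)]
    for _ in range(iters):
        groups = [[] for _ in range(c)]
        for x in data:
            nearest = min(range(c), key=lambda r: abs(x - centers[r]))
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[r] for r, g in enumerate(groups)]
    return [g for g in groups if g]


def log_error_measure(data, c):
    """Natural log of the summed squared distances to the group centers."""
    total = 0.0
    for g in kmeans_1d(data, c):
        center = sum(g) / len(g)
        total += sum((x - center) ** 2 for x in g)
    return math.log(max(total, 1e-300))  # guard, since log(0) is undefined


def estimate_cluster_number(sq_errors, c_max=4, n_ref=20, seed=0):
    """Smallest c whose gap is at least the next gap minus its spread term."""
    rng = random.Random(seed)
    lo, hi = min(sq_errors), max(sq_errors)
    refs = [[rng.uniform(lo, hi) for _ in sq_errors] for _ in range(n_ref)]
    gap, spread = {}, {}
    for c in range(1, c_max + 1):
        ref_logs = [log_error_measure(ref, c) for ref in refs]
        mean = sum(ref_logs) / n_ref
        sd = math.sqrt(sum((v - mean) ** 2 for v in ref_logs) / n_ref)
        gap[c] = mean - log_error_measure(sq_errors, c)
        spread[c] = sd * math.sqrt(1.0 + 1.0 / n_ref)
    for c in range(1, c_max):
        if gap[c] >= gap[c + 1] - spread[c + 1]:
            return c
    return c_max


# Hypothetical squared errors: 20 small values and 5 clearly larger ones.
sq_errors = [0.001 * (i % 5) for i in range(20)] + [1.0 + 0.001 * i for i in range(5)]
n_clusters = estimate_cluster_number(sq_errors)
```

For this synthetic data the estimated cluster number is 2, so the presence of inconsistent data would be signalled without choosing a threshold by hand.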
III. COMPUTATION PROCEDURES
Fig. 3 depicts the building blocks of the proposed approach. The main procedures of these blocks are discussed as follows.

Fig. 3. Flowchart of the proposed algorithm.

Step 1: Compute the Predicted State and Innovation Vectors: In this step, the predicted state vector is first evaluated. Then, with the acquisition of the latest set of measurements, the innovation vector can be calculated as the difference between the new measurements and their related predicted values. Next, the squares of the differences between the innovations and their estimated values are determined and made ready for the subsequent clustering process.
Step 2: Cluster the Squared Error Data: In this process, the squared error data are partitioned into $c$ groups via a clustering algorithm. The error measure $W_{squ}(c)$ can be evaluated via (18), where the subscript squ indicates the squared error.
Step 3: Cluster the Reference Data: To serve as the comparison benchmark, uniformly generated reference data are also required. The error measure for each reference dataset can be computed and is termed $W_{ref,b}(c)$, $b = 1, \ldots, B$, where the subscript ref signifies the reference data, and B is the number of reference datasets. The same algorithm used in the above step is then used for the clustering of the reference datasets. After the clustering of the squared error data and the reference datasets, the next step is to determine the optimal number of clusters of the squared error data.
Step 4: Determine the Optimal Cluster Numbers: In this step, the optimal number of clusters is determined by the following rule [10]:
$$D(c) \ge D(c+1) - s(c+1) \qquad (19)$$
$$D(c) = E^{*}\left[\ln W_{ref}(c)\right] - \ln W_{squ}(c) \qquad (20)$$
$$s(c) = sd(c)\,\sqrt{1 + 1/B} \qquad (21)$$
$$sd(c) = \sqrt{\frac{1}{B}\sum_{b=1}^{B}\left(\ln W_{ref,b}(c) - \frac{1}{B}\sum_{b'=1}^{B}\ln W_{ref,b'}(c)\right)^{2}} \qquad (22)$$
where $E^{*}$ represents the statistical expectation under a null reference distribution of the squared error data. In (21)-(22), $s(c)$ accounts for the statistical variance of $\ln W_{ref}(c)$, the factor $\sqrt{1 + 1/B}$ accounts for the effect that the number of reference datasets may have on the gap quantity, and $sd(c)$ is the standard deviation of $\ln W_{ref}(c)$. In this study, the expectation of $\ln W_{ref}(c)$ is determined by averaging the $B$ copies $\ln W_{ref,b}(c)$, each of which is computed from the Monte Carlo samples. If (19) is satisfied with the calculated cluster number $c$, it points out that the set of squared error data is appropriately divided into $c$ clusters.
Step 5: Detect and Identify the Anomalous Data: Following the aforementioned steps, if the cluster number is estimated to be 1, all observed measurements are identified as normal ones. Yet, when the number of clusters is larger than 1, it implies that the set of observed measurements contains inconsistent data. Then, the average squared error is calculated for each cluster; the cluster with the smallest average is identified as the normal group while the others are deemed abnormal ones, thereby accomplishing the separation of bad data. This is followed by replacing the anomalous innovations with their estimated values in order to obtain a reliable estimate. Next, with the aid of the sliding surface-enhanced fuzzy controller, the controller output term expressed in (17) is obtained and used for the re-estimate of the system state vector. This completes the computation process of the proposed method.
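The identification and correction in Step 5 can be sketched as below, assuming the clustering result is available as lists of measurement indices. The function name and the numbers are hypothetical; the estimated innovations would come from the neural network extrapolator in the described scheme, which is not modelled here.

```python
def correct_innovations(innovations, estimates, clusters):
    """Replace innovations outside the normal cluster by their estimated values.

    `clusters` is a list of lists of measurement indices. The cluster with the
    smallest mean squared (innovation - estimate) is taken as the normal group.
    """
    def mean_sq(indices):
        return sum((innovations[i] - estimates[i]) ** 2 for i in indices) / len(indices)

    normal = min(clusters, key=mean_sq)
    bad = sorted(i for group in clusters if group is not normal for i in group)
    corrected = list(innovations)
    for i in bad:
        corrected[i] = estimates[i]
    return corrected, bad


innovations = [0.01, 0.50, -0.02, 0.40]
estimates = [0.00, 0.02, -0.01, 0.01]
corrected, bad = correct_innovations(innovations, estimates, [[0, 2], [1, 3]])
```

When the clustering returns a single group, `bad` is empty and the innovations pass through unchanged, matching the case in which all measurements are identified as normal.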

Fig. 4. IEEE 30-bus power system.

Fig. 5. Determination of optimal cluster numbers under normal operation scenarios.

TABLE II
PERFORMANCE ASSESSMENT UNDER NORMAL SCENARIOS
IV. NUMERICAL STUDY


The proposed approach has been tested on an IEEE 30-bus system as depicted in Fig. 4, where six generators, 21 loads, and 41 branches are included. A full set of 248 measurements is collected from this test system, including 30 bus voltage magnitudes $V_i$, six pairs of active and reactive generations $(P_{Gi}, Q_{Gi})$, 21 pairs of active and reactive loads $(P_{Li}, Q_{Li})$, and 82 pairs of active and reactive line flows $(P_{ij}, Q_{ij})$. For the measurements of $V_i$, $P_{Gi}$, $P_{Li}$, $Q_{Gi}$, and $Q_{Li}$, each one was individually assigned a number following the sequence of bus number. In other words, $V_i$ indicates the voltage magnitude at the $i$th bus, $P_{Gi}$ is the active power produced by the generator at the $i$th bus, $P_{Li}$ signifies the active power received by the electrical load at the $i$th bus, $Q_{Gi}$ indicates the reactive power produced by the generator at the $i$th bus, $Q_{Li}$ denotes the reactive power received at the $i$th bus, $P_{ij}$ signifies the active power flowing from the $i$th bus to the $j$th bus, and $Q_{ij}$ is the reactive power flowing from the $i$th bus to the $j$th bus. The proposed method was implemented in Microsoft Visual C++ and run on an IBM-compatible computer with an Intel Pentium IV 1.7-GHz processor. In the numerical study, a linear trend between 1.5 and 2.5% along with a random variation was added to a load curve. The variation was represented by a normally distributed random number with a standard deviation of 2% of the trend component. For each time sample, power flow computation was carried out using the load and generation data obtained via the load curve and generation participation factor under a normal system operation condition. In the study, six test scenarios are employed to validate the performance of the proposed method: normal operation, large load variations, multiple measurement errors, coexistence of measurement errors and large load changes, network unobservability, and topology errors. These cases are also simulated on an IEEE 118-bus system, where 118 buses and 179 branches along with 1116 measurements are included. For each scenario, the outcome of the load flow serves as the true values of measurements. Then, the raw measurements were generated by adding normally distributed noise with a standard deviation of 1% for voltage magnitudes and 1.5% for power flow pairs.
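The measurement count and the generation of raw measurements can be checked with a short sketch. It assumes that the stated percentages are standard deviations relative to each true value, which the text does not spell out; the function name, the measurement kinds, and the sample values are hypothetical.

```python
import random

# Relative standard deviations taken from the text: 1% and 1.5%.
REL_SD = {"voltage": 0.01, "power": 0.015}


def add_measurement_noise(true_values, kinds, seed=0):
    """Add zero-mean Gaussian noise whose standard deviation scales with each true value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, REL_SD[k] * abs(v)) for v, k in zip(true_values, kinds)]


# 30 voltage magnitudes, 6 generation pairs, 21 load pairs, 82 line-flow pairs.
n_measurements = 30 + 2 * 6 + 2 * 21 + 2 * 82

true_values = [1.02, 0.35, 0.0]
kinds = ["voltage", "power", "power"]
raw = add_measurement_noise(true_values, kinds)
```

The count evaluates to 248, in agreement with the full measurement set of the 30-bus system, and a fixed seed keeps the noisy measurements reproducible between runs.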

In the following test cases, the performance of the different methods is assessed by the following indices:
a) At the filtering stage, a performance index is employed as below
(23)
b) At the predicting stage, the averaged absolute error is used as follows:
(24)
In (23), the quantities involved at each time step are the $i$th estimated, true, and observed measurements together with the number of measurement variables. In (24), the quantities involved are the $i$th predicted and true system state variables at each time step together with the number of state variables. A smaller performance index reveals that the algorithm provides better estimates of system states. Similarly, for a case of better prediction, the value of the averaged absolute error becomes smaller. In the following, the six test cases are presented and discussed.
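As a rough illustration of how two such indices might be evaluated at one time step, the sketch below makes explicit assumptions about their form: the filtering index is taken as the ratio of summed absolute estimation errors to summed absolute measurement errors, and the prediction index as the mean absolute difference between predicted and true states. These forms and the function names are assumptions, not the paper's exact definitions in (23) and (24).

```python
def filtering_index(estimated, true, observed):
    """Assumed form: sum |estimated - true| divided by sum |observed - true|."""
    numerator = sum(abs(e - t) for e, t in zip(estimated, true))
    denominator = sum(abs(o - t) for o, t in zip(observed, true))
    return numerator / denominator


def mean_abs_prediction_error(predicted, true):
    """Assumed form: average of |predicted - true| over the state variables."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)


j_index = filtering_index([1.01, 0.99], [1.00, 1.00], [1.02, 0.98])
eps = mean_abs_prediction_error([1.1, 0.9], [1.0, 1.0])
```

Under this assumed form, a filtering index below one means the estimates lie closer to the true values than the raw measurements do.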
Test 1: Normal Operation Scenario: In this test, different methods are applied to solve the problem, where the observed measurements are assumed to behave normally. Fig. 5 plots the optimal cluster number results obtained by the proposed method at the 10th time sample. In the figure, the inequality expressed in (19) of Section III is satisfied at a cluster number of 1, revealing that the optimal number of clusters would be 1 for this case. In other words, all observed measurements of the 30-bus system are identified as normal ones. Table II tabulates the comparisons between methods. As seen from the table, the performances of the methods are similar to one another. As for the IEEE 118-bus system, Table III lists the simulation results of the normal condition. Results obtained from this case support the effectiveness of the method.
Test 2: Large Load Variations: In this test, a large load variation scenario is assumed. It is known that when encountering large load variations, the accuracy of state estimates is often significantly affected. This is mainly because the predicted system states in the previous methods often fail to take account of sudden injections. To validate the feasibility of the method, large load variations are assumed to occur at buses 14 and 18 at the 13th time sample in this study. Fig. 6 plots the numerical results obtained by the proposed approach. Based on the inequality of (19), the comparison is satisfied at a cluster number of 2, indicating that the appropriate number of clusters can be chosen to be 2. Table IV lists the clustered results, which indicate that measurement nos. 14, 18, 45, 49, 126, 127, 129, 136, 137, and 138 are identified as the anomalous data due to their relatively large squared-error value (say, 0.0242), implying that the detection result is consistent with the assumed scenario. Besides, Table V shows the comparison of the performance indices between methods. As tabulated, the smaller values of both indices for the proposed approach show that the proposed method is superior to the other method, confirming the suitability of the approach for the scenario considered.
For the 118-bus system, a large load change is assumed to occur at bus 52 at the 16th time sample. Table VI and Fig. 7 show the simulation results, by which the feasibility of the method is confirmed.

TABLE III
RESULTS OF NORMAL CONDITION IN THE 118-BUS SYSTEM

Fig. 6. D(c) and (c) versus the cluster numbers under large load variations scenarios.
Test 3: Multiple Measurement Errors: In this test case, a scenario of multiple measurement errors is assumed. Among the set of observed measurements collected from the 30-bus system, four bad data have been randomly assumed to occur at the 20th time sample, involving measurement nos. 15, 42, 63, and 215. The error size of these bad data is 20 times their measurement standard deviations. Table VII lists the clustered results obtained by the proposed method. As the table indicates, because the inequality is satisfied, the optimal number of clusters amounts to two for such a scenario, meaning that all observed measurements are grouped into two clusters. In the table, the cluster with the lowest average squared error is deemed the normal group, while the other groups covering the bad measurements (nos. 15, 42, 63, and 215) are abnormal ones.
Fig. 8 delineates the performance assessments between methods. As confirmed by the smaller index values, the proposed approach performs better than the other method. In addition, Fig. 9 depicts the comparison results with and without bad data identification and correction included. From this plot, the values of both indices are seen to be large when bad data identification and correction are not carried out, which emphasizes the importance of mining anomalous data.

TABLE IV
CLUSTERED RESULTS FOR LARGE LOAD VARIATIONS

TABLE V
COMPARISON OF PERFORMANCE INDICES BETWEEN METHODS

Fig. 7. Performance assessment for the 118-bus system.

TABLE VI
CLUSTERED RESULTS FOR THE 118-BUS SYSTEM
In the test case of the 118-bus system, five bad measurements are assumed to occur at the 15th time sample. The error of the bad data is assumed to be 20 times the standard deviation. Table VIII and Fig. 10 show the clustered results and performance assessment, respectively. The computation results support the proposed approach as well.

TABLE VII
CLUSTERED RESULTS OF MULTIPLE MEASUREMENT ERRORS

Fig. 8. Performance evaluation between methods.

Fig. 9. Performance comparisons with and without bad data identification and correction.

TABLE VIII
CLUSTERED RESULTS FOR THE 118-BUS SYSTEM

Fig. 10. Performance assessment for the 118-bus system.

Fig. 11. Evolution of voltage magnitude. (a) Bus 16. (b) Bus 26.

Fig. 12. D(c) and (c) versus the cluster numbers with both measurement errors and load changes occurred simultaneously.

Fig. 13. Comparison results with both measurement errors and load changes occurred simultaneously.

TABLE IX
CLUSTERED RESULTS WITH BOTH MEASUREMENT ERRORS AND LOAD CHANGES OCCURRED SIMULTANEOUSLY
Test 4: Simultaneous Occurrence of Gross Measurement Errors and Load Changes: In this case, the performance of the proposed method is evaluated when a gross measurement error and a large load change are assumed to occur simultaneously. In the 30-bus system, two measurement errors of 10% were assumed to occur at the 25th time sample, involving measurement nos. 97 and 144. Besides, large load decreases are simulated in the following scenarios:
1) bus 16 at the 25th time sample;
2) bus 26 at the 25th time sample.
Fig. 11 shows the evolution of voltage magnitude at bus 16 and bus 26. It can be seen that the voltage magnitude increases due to the large decrease in load. Now, by use of the proposed technique, Fig. 12 delineates the cluster-number determination curves under this condition. As the figure reveals, the squared error data are grouped into three clusters as Table IX tabulates. In the table, the abnormal group covering the anomalous data of measurement nos. 16, 26, 47, 55, 97, 132, 133, 135, 144, 153, and 155 is identified. Fig. 13 depicts the performance assessment results, where the performance of the proposed approach is again better than that of the EKF scheme, supporting the feasibility of the method.
In addition to the 30-bus system, the method was also assessed on the 118-bus system, where the errors of two measurements are assumed to be six times the standard deviation, while a large load decrease is assumed to occur at bus 83 at the 23rd time sample. Fig. 14 depicts the evolution of voltage magnitude before and after the load change. The results obtained from this 118-bus system are shown in Table X and Fig. 15, respectively. As the outcome reveals, the feasibility of the proposed method is supported.

Fig. 14. Evolution of voltage magnitude at bus 83.

Fig. 15. Performance assessment for the 118-bus system.

TABLE X
CLUSTERED RESULTS FOR THE 118-BUS SYSTEM

Fig. 16. Performance comparisons of the methods under network unobservability. (a) 30-bus system. (b) 118-bus system.

TABLE XI
CLUSTERED RESULTS OF NETWORK UNOBSERVABILITY
Test 5: Network Unobservability: An electric power network may miss some measurements such that the network becomes unobservable. The proposed method is applied to investigate this scenario, where pseudo-measurements are assumed for the problem formulation [1]. Namely, once the network becomes unobservable at the $k$th time sample, each lost measurement is replaced with a corresponding pseudo-measurement obtained from the most recent estimate. That is, the measurements determined by the estimated states at the $(k-1)$th time sample serve as the pseudo ones at the $k$th time sample so as to restore network observability. In the tests, the IEEE 30-bus and 118-bus systems are assumed to have missed some measurements at the 15th and 22nd time samples, respectively. The clustered results obtained by the method are listed in Table XI, where the optimal cluster numbers of the 30-bus and 118-bus systems are both seen to be 2. For both systems, the measurements identified as abnormal data are those listed in Table XI. Fig. 16 delineates the performance assessments of the methods. This figure shows that the performance of the proposed method is superior to that of the other method.

Fig. 17. Performance comparison of methods under topology errors. (a) 30-bus system. (b) 118-bus system.
Test 6: Topology Errors: In this case, the estimation performance of the methods under an incorrectly reported topology is assessed. Operation scenarios of different power systems are simulated as follows. For the IEEE 30-bus system, at the seventh time sample, the line connecting bus 25 and bus 26 is assumed to have lost connection but is reported as linked. For the IEEE 118-bus test system, the status of the breaker situated between bus 21 and bus 22 is incorrectly reported at the 10th time sample. With these assumed scenarios, Table XII tabulates the clustered results using the proposed method. In the table, the abnormal measurements are seen to be effectively identified. Fig. 17 plots the performance comparisons of the methods, implying a better performance by the proposed algorithm.
The aforementioned approaches were implemented in Visual C++ and run on a Pentium IV 1.7-GHz computer. Table XIII lists the averaged calculation time of four methods under different test cases, where this computation time

TABLE XII
CLUSTERED RESULTS FOR TOPOLOGY ERRORS

list of Table XIV, the performance index is summarized when


the method is performed with or without the data mining. For
all scenarios considered, the performance index of Method 4
(the proposed method) is seen to be the smallest one; thereby
consolidating the feasibility of the proposed approach while the
contribution of data mining is also confirmed.
V. CONCLUSION

TABLE XIII
COMPARISONS OF CALCULATION TIME FOR VARIOUS METHODS

In this paper, a predicting-aided estimation method involving


anomalous data mining is proposed to enhance the power system
state estimation. By embedding the sliding surface-enhanced
fuzzy control, the estimation performance under different scenarios is seen significantly improved. Moreover, the proposed
approach has been designed to intelligently search the clusters
of inconsistent data. Therefore, the robustness of the estimation work can be better ensured. Currently, with the financial
support, a research project is being carried out in anticipation
of embedding the proposed method into utility energy management software system. The results will be reported in the near
future.
ACKNOWLEDGMENT
The authors are greatly indebted to Taiwan Power Company
for providing their valuable operating experience.
REFERENCES

TABLE XIV
COMPARISONS OF ESTIMATED PERFORMANCE

is obtained by taking the average of 30 runs. It is found that although the proposed method (Method 4) has included the fuzzy
controller and data mining, its computation time is seen only
slightly increased under different scenarios. Note that because
Test 1 is assumed a normal operation scenario where the fuzzy
controller is not required to perform, the computation time of
Method 1 and 2 are of the same. As for Method 3, its computation time is found slightly longer than that of Method 1 and 2 as
the data mining is added. Then, in Method 4 where fuzzy controller and data mining are both included, the increased computation burden is very limited, yet the computation performance is improved significantly as Table XIV reveals. In the

[1] A. Monticelli, "Electric power system state estimation," Proc. IEEE, vol. 88, no. 2, pp. 262–282, Feb. 2000.
[2] W. Slutsker, S. Mokhtari, and K. A. Clements, "Real time recursive parameter estimation in energy management systems," IEEE Trans. Power Syst., vol. 11, pp. 1393–1399, Aug. 1996.
[3] P. Rousseaux, Th. Van Cutsem, and T. E. Dy Liacco, "Whither dynamic state estimation?," Int. J. Elect. Power Energy Syst., vol. 12, no. 2, pp. 104–116, Apr. 1990.
[4] A. M. Leite da Silva, M. B. Do Coutto Filho, and J. F. de Queiroz, "State forecasting in electric power systems," Proc. Inst. Elect. Eng.-Gen. Transm. Dist., vol. 130, no. 5, pp. 237–244, Sept. 1983.
[5] J. A. Momoh, X. W. Ma, and K. Tomsovic, "Overview and literature survey of fuzzy set theory in power systems," IEEE Trans. Power Syst., vol. 10, pp. 1676–1690, Aug. 1995.
[6] R. L. King, "Artificial neural networks and computational intelligence," IEEE Comput. Applicat. Power, vol. 11, pp. 14–16, 18–25, Oct. 1998.
[7] A. K. Sinha and J. K. Mandal, "Dynamic state estimator using ANN based bus load prediction," IEEE Trans. Power Syst., vol. 14, pp. 1219–1225, Nov. 1999.
[8] J. C. Souza, A. M. Silva, and A. P. Silva, "Data debugging for real-time power system monitoring based on pattern analysis," IEEE Trans. Power Syst., vol. 11, pp. 1592–1599, Aug. 1996.
[9] S. W. Kim and J. J. Lee, "Design of a fuzzy controller with fuzzy sliding surface," Fuzzy Sets Syst., vol. 71, no. 3, pp. 359–367, May 1995.
[10] R. Tibshirani, G. Walther, and T. Hastie, "Estimating the Number of Clusters in a Dataset Via the Gap Statistic," Stanford University, Technical Rep., Mar. 2000.
[11] J. J. E. Slotine and W. Li, Applied Nonlinear Control. Englewood Cliffs, NJ: Prentice-Hall, 1991.
[12] A. Kandel and G. Langholz, Fuzzy Control Systems. Boca Raton, FL: CRC Press, 1994.
[13] H. Salehfar and R. Zhao, "A neural network preestimation filter for bad-data detection and identification in power system state estimation," Electric Power Syst. Res., vol. 34, pp. 127–134, Aug. 1995.
[14] K. Nishiya, J. Hasegawa, and T. Koike, "Dynamic state estimation including anomaly detection and identification for power systems," Proc. Inst. Elect. Eng.-Gen. Transm. Dist., vol. 129, no. 5, pp. 192–198, Sept. 1982.
[15] P. Arabie, L. J. Hubert, and G. de Soete, Clustering and Classification. Singapore: World Scientific, 1996.

[16] S. J. Huang and J. M. Lin, "Enhancement of power system data debugging using GSA-based data mining technique," IEEE Trans. Power Syst., vol. 17, pp. 1022–1029, Nov. 2002.

Shyh-Jier Huang (M'95–SM'01) received the Ph.D. degree in electrical engineering from the University of Washington, Seattle, in 1994.
Currently, he is a Professor with the Department of Electrical Engineering and is the Project Manager of the Computational Intelligence Applied to Power (CIAP) Laboratory at National Cheng Kung University, Tainan, Taiwan, R.O.C. He worked on projects at the Department of Electrical Engineering and Computer Science at the University of California at Berkeley from 1989 to 1991. His main areas of interest are power system analysis, power quality, and signal-processing applications.


Jeu-Min Lin is currently pursuing the Ph.D. degree at National Cheng Kung
University, Tainan, Taiwan, R.O.C.
His main areas of interest include power system analysis, statistical analysis,
and fuzzy system applications.