fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3113900, IEEE
Internet of Things Journal
Abstract—The smart grid is now increasingly dependent on smart devices to operate, which leaves space for cyber attacks. Especially, the intentionally designed false data injection attack [...] to fit linear data, and RNN is used to fit nonlinear data feature. The second-level learner uses the fully connected layer and back propagation (BP) module to adaptively combine the results of two base learners. Then, through fitting Weibull distribution of the sum of square errors (SSEs) between the observed measurements and the predicted measurements, the dynamic threshold is obtained to judge whether FDIA occurs or not. Comprehensive simulation results show that our scheme has better performance than other neural network based and ensemble learning based FDIA detection schemes.

Index Terms—False data injection attack, Kalman filter, Recurrent neural network, Smart grid, Ensemble learning.

I. INTRODUCTION

[Figure labels: Power Grid Field Devices; Power Generation sensors; State Estimation; Optimal Power Flow; Emergency Analysis; Economic Dispatch; False Measure Data; Attack1; Attack3]
Fig. 1. Illustration of False Data Injection Attack (FDIA) in smart grid.

Especially, the traditional bad data detection (BDD) method built in the SE relies on measurement residual to detect cyber attacks. Specifically, the measurement residual (i.e., the difference between the vector of observed measurements and the vector of estimated measurements), is compared to the predefined system-based threshold value, and the presence of bad measurements is inferred if the measurement residual exceeds the threshold.
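The residual test described above can be sketched numerically. The following minimal Python example is an illustration under stated assumptions, not the paper's implementation: the 4×2 matrix `H`, the state, the noise values, the unit noise covariance, and the threshold `tau` are all hypothetical. It estimates the state by least squares, computes the residual norm, and compares three cases: clean data, a gross error on one sensor, and an injection of the form a = Hc (the stealthy attack formalized in Section III).

```python
import math

# Hypothetical 4-measurement, 2-state linear model (illustrative values only).
H = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [1.0, -1.0]]

def matvec(A, v):
    """Matrix-vector product for small list-based matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def estimate_state(H, z):
    """Least-squares estimate (H^T H)^-1 H^T z, assuming unit noise covariance."""
    g11 = sum(r[0] * r[0] for r in H)
    g12 = sum(r[0] * r[1] for r in H)
    g22 = sum(r[1] * r[1] for r in H)
    b1 = sum(r[0] * zi for r, zi in zip(H, z))
    b2 = sum(r[1] * zi for r, zi in zip(H, z))
    det = g11 * g22 - g12 * g12
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]

def residual_norm(H, z):
    """Euclidean norm of the residual z - H x_hat used by the BDD test."""
    x_hat = estimate_state(H, z)
    return math.sqrt(sum((zi - hi) ** 2 for zi, hi in zip(z, matvec(H, x_hat))))

x_true = [0.30, -0.10]                 # hypothetical state
noise = [0.01, -0.02, 0.015, -0.005]   # hypothetical measurement noise
z = [h + n for h, n in zip(matvec(H, x_true), noise)]
tau = 0.1                              # hypothetical BDD threshold

# Case 1: gross error on a single sensor.
z_bad = list(z)
z_bad[2] += 1.0

# Case 2: injection a = H c built from knowledge of H.
c = [0.5, 0.0]
z_fdia = [zi + ai for zi, ai in zip(z, matvec(H, c))]

r_clean = residual_norm(H, z)
r_bad = residual_norm(H, z_bad)
r_fdia = residual_norm(H, z_fdia)
print("clean passes BDD:", r_clean < tau)
print("gross error passes BDD:", r_bad < tau)
print("a = Hc passes BDD:", r_fdia < tau)
```

With these illustrative numbers the gross error is flagged, while the a = Hc injection yields the same residual as the clean data even though the estimated state is shifted by `c`.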
2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Vietnam National University. Downloaded on November 21,2021 at 15:43:31 UTC from IEEE Xplore. Restrictions apply.
the system state and judge whether FDIA exists or not. But Kalman filter is only a linear estimation, ignoring the non-linear feature in smart grid data. Ref. [8][9] respectively combine recurrent and convolutional neural networks with fully connected layer to predict the system state and then detect FDIA. In contrast to Kalman filter, neural networks process non-linear data well, but smart grid data also contain a linear component. Residual recurrent neural network (R2N2) is presented to detect FDIA [10], in which vector auto-regression (VAR), i.e., the linear predictive component, and recurrent neural network (RNN), i.e., the non-linear predictive component, are sequentially combined. The weak point is that the deviation caused by the first prediction component directly affects the performance of the following prediction component, which in turn hampers the overall performance.

B. Main contribution

In recent years, methods of detection by integrating multiple models have emerged and shown good results, such as [11-14]. Among them, ensemble learning has become a popular machine learning paradigm where multiple base learners are trained to solve the same problem, in which the generalization ability of an ensemble is usually much stronger than that of base learners.

Considering that the diversity of the base learners of an ensemble is known to be an important factor in determining its generalization error, this paper presents an effective FDIA detection scheme based on Kalman filter and recurrent neural network (named KFRNN). Specifically, the main contributions of this paper are given as follows.

1) Considering that the actual smart grid data are both linear and non-linear, and their time-sequential nature, the Kalman filter and RNN are adaptively combined in parallel to incorporate the data characteristics respectively. Specifically, following the stacking method, the second-level meta-learner, i.e., the BP (back propagation) based neural network, is designed to adaptively combine the prediction results of these two base learners.

2) Comprehensive simulations demonstrate that KFRNN performs better than other neural network based and other ensemble learning based schemes, including R2N2, RNNWide, and CNNWide, etc.

3) To verify the design rationality of KFRNN, we intentionally design the so-called R2N2_variant, which follows the same neural structure as KFRNN, but similar to R2N2, uses the VAR component for processing the linear data feature and RNN component for processing the non-linear data feature. The simulation results demonstrate that KFRNN and R2N2_variant outperform other deep learning based FDIA detection schemes, and moreover show the superiority of KFRNN among all schemes.

The rest of this paper is organized as follows. Section II discusses related work. In Section III, we briefly introduce the smart grid system model, state estimation and attack model. Section IV provides the proposed scheme KFRNN, and its main components. Thorough simulations are conducted to verify and compare with typical similar schemes. Finally, we briefly conclude this paper.

II. RELATED WORK

FDIA detection schemes can be mainly divided into two categories: the model-based and the data-driven [5]. The former explicitly models smart grids based on the streaming real-time measurements along with the static system data such as the system parameters and substations configuration, and adopts these models to infer the data relationship. The latter is model-free, and directly learns the data relationship through data mining and machine learning methods, especially deep learning. An example of the first category is the use of vector auto-regression [15] or Kalman filter for prediction. As stated in [16], recently, many researchers have used the method of machine learning, such as [8-10][17-19], which belongs to the second category.

Kalman filter is widely used for prediction [20-22]. Its advantage is that it can calibrate the predicted value by constructing the Kalman gain, but its weak point is that it can only deal well with the linear data feature of smart grid. However, the smart grid data naturally contain a non-linear component. Aside from Kalman filter, vector auto-regression is also a widely used process that captures the linear inter-dependencies between the different time series events, and can incorporate the dynamics of the system.

Ref. [8][9] respectively use the recurrent neural network combined with the fully connected layer (i.e., so-called Wide component), and the convolutional neural network combined with Wide component for prediction. Hereby, for comparison in our paper, the scheme [8] is named RNNWide and the scheme [9] named CNNWide. The above two schemes only use the non-linear model to predict the data and ignore the linear component in the smart grid data. In particular, in [9], it is inappropriate to stiffly convert time-sequential grid data into a two-dimensional structure for the implementation of convolutional neural network, for the temporal nature of the data can be destroyed by the rigid conversion.

[Figure labels: input data $x_t$ and $x_{t+h}$; VAR blocks with linear outputs $\tilde{x}_{t+h}$ and $\tilde{x}_{t+2h}$; observed value $x_{t+h}$; residual $r_{t+h}$; RNN; predicted residual $\tilde{r}_{t+2h}$; final output $\hat{x}_{t+2h}$]
Fig. 2. Flowchart of R2N2 based prediction model.

R2N2 is used for FDIA detection [10]. Fig. 2 presents the technical framework of R2N2 prediction model. Technically,
R2N2 is divided into two branches: the left branch uses VAR to predict the states at time (t+h), and then sends into RNN the residuals between the observed states and the predicted states at time (t+h) to predict the residuals at time (t+2h); the right branch uses vector auto-regression to directly predict the states at time (t+2h), and then the predicted states at time (t+2h) in the right branch plus the residuals at time (t+2h) from the left branch are regarded as the finally predicted states at time (t+2h).

In a sense, R2N2 also takes into account both linear and non-linear data features in smart grid, in which VAR is used to fit the linear data feature, and RNN is used to process the non-linear data feature. However, R2N2 has one defect: the deviation generated by the VAR prediction in the left branch is propagated to the following procedures of RNN. To verify our inference, besides the KFRNN scheme, we intentionally designed the so-called R2N2_variant, which has the same neural structure as KFRNN, with the only difference that the Kalman filter component in KFRNN is replaced by the VAR component. The simulation results demonstrate that the performance of R2N2_variant is better than that of the traditional R2N2 scheme, which implies the rationality of the proposed KFRNN structure.

Different from the above FDIA detection schemes, another methodology is direct classification. Especially, in [23], the superposition of multilayer convolutional neural networks is used as the classification model, and the labeled data is used to train the classification model that is then used to detect FDIA in the operating phase. Similarly, using the training data with known labels, a support vector machine is used to create a hyperplane to detect FDIA in [24]. In summary, classification based methods require that the training data have been explicitly labeled, and have the following drawbacks. First, it is difficult to obtain the labeled data in practice, especially for FDIA. Second, the classification model trained by some specifically labeled data can only work for a certain attack mode, and is not applicable to other attack modes that may generate different attack data.

In addition, there are some other detection schemes. For example, the learning model of mixture Gaussian distribution proposed by [25] is used to fit normal states for detecting FDIA. The problems are the selection of the number of Gaussian density components, and that the overall probability distribution does not vary largely if a state is caused by a small attack. In [26], FDIA is detected based on graph signal processing, but its complexity is relatively high. It is worth mentioning that Generative Adversarial Networks are used to identify and recover the data attacked by FDIA [27][28]. Anomaly detection technology is adopted in [29], which includes a Luenberger observer and an artificial neural network to identify false data.

III. THE SYSTEM MODEL AND PROBLEM STATEMENT

Table I gives the main notations and their meanings used in this paper.

TABLE I
MAIN NOTATION AND THEIR MEANINGS USED IN THIS PAPER
Notation    Definition
$z$         $M \times 1$ measurement vector of smart grid
$x$         $(N-1) \times 1$ state vector of smart grid
$n$         Noise variable
$a$         The generated attack vector on the measurements
$r$         The residual between the observed measurement and the predicted measurement
$H$         $M \times (N-1)$ measurement Jacobian matrix
$M$         The number of measurements (M=N+L) in smart grid
$L$         The number of transmission lines in smart grid
$N$         The number of buses in smart grid

In a typical power system, there are buses inter-connected by the transmission lines. The Independent System Operators (ISOs) monitor the power system through SCADA measurements using sensors. Smart grid model is based on real-time measurements of data flow and static system data. The goal of state estimation is to use real-time data collected from the measurement units to estimate the operating state of the smart grid.

For state estimation using the DC power flow model, the relation between measurements and state variables is given as:

$z = Hx + n$ (1)

$z = [z_N, z_L]^T$ represents an $M \times 1$ measurement vector of smart grid, including power injections at $N$ buses and the power flows on $L$ transmission lines. Specifically, $z_N = [P_1, \ldots, P_N, Q_1, \ldots, Q_N]^T$, $z_L = [\ldots, P_{ij}, \ldots, Q_{ij}, \ldots]^T$, where $P_k, Q_k$, $k \in [1, N]$ respectively represent the active and reactive powers injected by bus $k$, and $P_{ij}, Q_{ij}$, $i, j \in [1, N]$ respectively represent the active and reactive power flows of the transmission line between bus $i$ and bus $j$. $x$ is an $(N-1) \times 1$ state vector of smart grid, which, for example, denotes the phase angles of $N$ buses (one of the buses is regarded as a reference bus). $H$ is the $M \times (N-1)$ measurement Jacobian matrix. $n$ is the noise variable, which follows the normal distribution $n \sim N(0, \sigma_n^2)$. Through the minimum mean squared error estimator, the state vector $x$ can be estimated from Equation (2), denoted as $\hat{x}$.

$\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z$ (2)

where $R = \mathrm{diag}(\sigma_n^2)$ is the noise covariance matrix, and $\sigma_n^2$ represents the variance of the noise on the $n$-th component of the measurement vector.

Then, the residual between the observed measurements and the estimated measurements is represented as Equation (3).

$r = z - H\hat{x}$ (3)

For the traditional BDD system, if $\|r\| < \tau$ ($\tau$ is the predefined system-based threshold value), then SE is regarded as valid.

Denoting $a$ as the nonzero vector which is injected into the measurement data $z$, the measurement vector after attack can be represented as $z_a = z + a$. If the attacker knows the topology, that is, knows the $H$ matrix, then the attack vector can be designed as $a = Hc$, where $c$ is usually an $(N-1) \times 1$ vector. Then, the estimated system state vector after attack $\hat{x}_a$ can be represented as the following Equation (4).
$\hat{x}_a = (H^T R^{-1} H)^{-1} H^T R^{-1} z_a = (H^T R^{-1} H)^{-1} H^T R^{-1} (z + Hc) = (H^T R^{-1} H)^{-1} H^T R^{-1} z + (H^T R^{-1} H)^{-1} (H^T R^{-1} H) c = \hat{x} + c$ (4)

Then, the residual between the observed measurements and the estimated measurements under FDIA, $r_a$, can be computed as Equation (5).

$r_a = z_a - H\hat{x}_a = z + Hc - H\hat{x} - Hc = z - H\hat{x} = r$ (5)

Since $r_a = r$, the traditional measurement residual based BDD method cannot detect this FDIA at all, which is therefore called a stealthy FDIA.

In summary, the intentionally injected measurement attack $a = Hc$ makes the traditional SE in smart grid change the estimated state from $\hat{x}$ into $\hat{x} + c$, and successfully bypasses the measurement residual based BDD. Therefore, if the designer, instead of traditional SE, could accurately predict the state variables through exploiting the time-sequential features of historical state variables, then the FDIA attack can be detected effectively.

IV. KFRNN: THE PROPOSED FDIA DETECTION SCHEME

In a sense, our proposed scheme is inspired by ensemble learning. There are several reasons why the ensemble learning paradigm often improves performance of the prediction scheme [30]: 1) Overfitting avoidance. 2) Computational advantage. 3) Representation. In terms of the way the base learners are produced, ensemble learning can be broadly divided into parallel and sequential ensemble [31]. In parallel ensemble, each base learner performs prediction independently, and then multiple base learners are combined to output the final predictive result [32][33]. In sequential ensemble, the base learners are created over iterations and have dependency among them. Typically, larger weights are assigned to the mis-predicted instances in the preceding base learner to advance the following base learner. The advantage of parallel ensemble is that, due to the parallel execution of the base learners, the entire scheme can be executed efficiently, and the biases of individual base learners can be overcome, achieving higher overall accuracy [34]. More importantly, since the data in the FDIA scenario is dynamic and time-serial, and KFRNN detects the FDIA in a real-time way, especially in the operation phase, the Kalman filter in KFRNN is built on-site for real-time detection; therefore it is extremely difficult (or even impossible) to adjust the weights of ever-changing data instances and send them to the second learner.

Based on the above consideration, the parallel ensemble is utilized in our KFRNN scheme for real-time operation. Note that the parallel learning framework can be implemented with bagging and stacking methods. The difference lies in the way of combining the base learners' results. Basically, in bagging, a simple combiner, e.g., majority voting for classification [11] and weighted averaging for regression, is used to create ensemble estimates, while stacking utilizes a second-level learner to adaptively combine the base learner models (so-called meta-algorithm), which can alleviate the biases in the base models. Therefore, our work KFRNN adopts the stacking method.

[Figure labels: Offline, determining the weight between Kalman filter and RNN (Training set 2; RNN; Kalman filter); Offline, inferring the detection threshold (Training set 3; SSE between the predicted and observed measurements; fitting the cumulative probability density curve of SSE to infer detection threshold); KFRNN prediction model; Predicted value; Detection threshold]
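The difference between a fixed combiner and a learned second-level combiner can be sketched as follows. This is a toy illustration, not the KFRNN meta-learner: the two base predictions and the targets are made-up numbers, and the second-level learner here is a two-weight linear model fitted by ordinary least squares, an assumption made for brevity (the paper trains a fully connected layer with back propagation instead).

```python
def fit_combiner(p1, p2, y):
    """Least-squares weights (w1, w2) minimizing sum((w1*p1 + w2*p2 - y)^2)."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * t for a, t in zip(p1, y))
    b2 = sum(b * t for b, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def sse(pred, y):
    """Sum of squared errors between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, y))

# Hypothetical targets and outputs of two base learners with different biases.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
p1 = [1.3, 2.1, 3.4, 4.2, 5.5]   # base learner 1 tends to overshoot
p2 = [0.6, 1.7, 2.5, 3.9, 4.4]   # base learner 2 tends to undershoot

# Fixed combiner: plain averaging of the two base predictions.
averaged = [0.5 * (a + b) for a, b in zip(p1, p2)]

# Learned combiner: weights fitted to the data (stacking-style).
w1, w2 = fit_combiner(p1, p2, y)
stacked = [w1 * a + w2 * b for a, b in zip(p1, p2)]

sse_avg = sse(averaged, y)
sse_stack = sse(stacked, y)
print("weights:", w1, w2)
print("SSE averaging:", sse_avg, "SSE learned combiner:", sse_stack)
```

On the data used for fitting, the learned weights can never do worse than equal weights, since equal weighting is one of the candidates the least-squares fit considers. In practice the weights would be fitted on data held out from base-learner training, in line with the separate training set shown in the flowchart labels.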
Fig. 3 shows the overall flowchart of the proposed scheme KFRNN, which is composed of two phases: offline training and online detection.

The offline training phase explicitly includes the following components: training the RNN prediction model, determining the weight between Kalman filter and RNN prediction models, and inferring the FDIA detection threshold through fitting the sum of square errors (SSEs) between the observed measurements and the predicted measurements converted from the predicted states. Especially, the combination of the former two components is named KFRNN prediction model in our paper. In KFRNN, first, two parallel prediction models are built: one is the Kalman filter suitable for linear prediction; the other is the recurrent neural network suitable for the non-linear data features. After the two parts are predicted in parallel, they are fed into the fully-connected layer and back propagation (BP) module for adaptive weighting.

At the online detection phase, the built KFRNN is used to obtain the predicted states, and then the corresponding predicted measurements can be estimated through Equation (1). The residual between the finally predicted measurements and the observed measurements is compared with the detection threshold obtained in the training phase, to infer whether FDIA occurs or not.

It should be explicitly noted that, unlike the RNN component that is trained offline, in the detection phase the Kalman filter component in our proposed KFRNN scheme is dynamically built, and continually evolved using the streaming real-time states and the mathematical model for the power grid. It should also be pointed out that, in the training process, the Kalman filter is also built using the training dataset, whose purpose is to adaptively determine the weights of the two base predictors in our ensemble learning inspired KFRNN scheme. The detailed procedure is given in subsection IV.C.

As described above, to verify the design rationality of KFRNN, we intentionally design the so-called R2N2_variant, which follows the same neural structure as KFRNN shown in Fig. 3, but, as in R2N2, uses the VAR component for processing the linear data feature and the RNN component for processing the non-linear data feature.

The following subsections respectively describe the various components shown in Fig. 3, including Kalman filter, RNN models, fully connected layer and BP for weight determination, and detection threshold determination.

A. Kalman Filter Suitable for Linear Data

Kalman filter is one of the main dynamic state estimation methods in power system, which, in our scheme, is used for estimating grid state through linear state equations. The formula of Kalman filter includes two components: prediction component and filtering component. These two components are performed in every estimation step t.

1) prediction component:

$\hat{x}_t^- = F \hat{x}_{t-1}^+$ (6)

$C_t^- = F C_{t-1}^+ F^T + W_{t-1}$ (7)

2) filtering component:

$K_t = C_t^- H^T (H C_t^- H^T + R_t)^{-1}$ (8)

$\hat{x}_t^+ = \hat{x}_t^- + K_t (z_t - H \hat{x}_t^-)$ (9)

$C_t^+ = C_t^- - K_t H C_t^-$ (10)

where $F$ is the state transition matrix, which can be solved by the vector auto-regression model; $C$ is the covariance matrix of the states; $W$ is the covariance matrix of state noise; $K$ is the Kalman gain; $R$ is the covariance matrix of the measurement noise; $\{\cdot\}^-$ represents the intermediate value obtained by the prediction component, and $\{\cdot\}^+$ denotes the final value after the filter component. Compared with vector auto-regression, Kalman filter calibrates the predicted state with the observed measurements through the constructed Kalman gain at time $t$, which finally improves the prediction performance.

Based on the above Kalman filter, we can get the predicted measurements $z_{pre}$ of the Kalman component in Fig. 3 at time $t$, shown as Equation (11).

$z_{pre} = H \hat{x}_t^+$ (11)

B. Recurrent Neural Network Suitable for Non-linear Data

LSTM is a special kind of RNN, first proposed in [35]. The LSTM cell can be formulated as the following Equations (12) to (16).

$f_t = \sigma(w_f x_t + u_f h_{t-1} + b_f)$ (12)

$i_t = \sigma(w_i x_t + u_i h_{t-1} + b_i)$ (13)

$o_t = \sigma(w_o x_t + u_o h_{t-1} + b_o)$ (14)

$c_t = (f_t \otimes c_{t-1}) \oplus (i_t \otimes \tanh(w_c x_t + u_c h_{t-1} + b_c))$ (15)

$h_t = o_t \otimes \tanh(c_t)$ (16)

where $f$ is forget gate; $i$ is input gate; $o$ is output gate; $c$ is cell memory; $h$ is hidden state; $x$ is the input vector; $b$ is bias; $w$ and $u$ represent the weights applied to the input $x$ and the hidden state $h$, respectively; $\sigma(\cdot)$ and $\tanh(\cdot)$ stand for the sigmoid function and the hyperbolic tangent function, respectively.

Compared with other RNNs, LSTM has better performance in predicting time series data due to the existence of the gate functions, so we use LSTM in the nonlinear component.

Note that training a neural network requires abundant time, which cannot meet the real-time operation in FDIA detection; therefore, the RNN component in KFRNN is trained offline, and then used for real-time prediction in the detection phase.

C. Fully Connected Layer and BP for Adaptively Weighting Two Predictive Components

Kalman filter and RNN predictors can incorporate the linear and non-linear components respectively. It is imperative to investigate how to appropriately accumulate these two predictive components to finally obtain the accurate prediction state values.
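For reference, the Kalman recursion of subsection IV.A, Equations (6) to (11), can be sketched in a few lines. The example below is a minimal sketch under a simplifying assumption: it treats the scalar case (one state, one measurement), so that the matrix products and the inverse in Equation (8) reduce to ordinary arithmetic. The function name `kalman_step` and all numeric values of `F`, `H`, `W`, `R` and the measurements are hypothetical.

```python
def kalman_step(x_post_prev, c_post_prev, z, F, H, W, R):
    """One estimation step of a scalar Kalman filter.

    Returns the filtered state, its variance, and the predicted measurement.
    """
    # Prediction component, Equations (6) and (7).
    x_prior = F * x_post_prev
    c_prior = F * c_post_prev * F + W

    # Filtering component, Equations (8) to (10).
    gain = c_prior * H / (H * c_prior * H + R)
    x_post = x_prior + gain * (z - H * x_prior)
    c_post = c_prior - gain * H * c_prior

    # Predicted measurement, Equation (11).
    z_pre = H * x_post
    return x_post, c_post, z_pre

# Hypothetical run over a short stream of measurements.
x_est, c_est = 0.0, 1.0
for z_t in [1.0, 1.1, 0.9, 1.05]:
    x_est, c_est, z_pre = kalman_step(x_est, c_est, z_t, F=1.0, H=1.0, W=0.01, R=0.1)
    print(round(x_est, 4), round(c_est, 4), round(z_pre, 4))
```

In the matrix case the division in the gain becomes a matrix inverse and the products become matrix products, with `F` obtained, as the text states, from a vector auto-regression model.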
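[Figure labels: Kalman filter and RNN blocks with predicted measurements $z_{1\_pre}, \ldots, z_{M\_pre}$; inputs $z_1, \ldots, z_M, z_{M+1}, \ldots, z_{2M}$; weights and biases $w_{1,1}, b_{1,1}, \ldots, w_{1,M}, b_{1,M}$; $\delta$ terms; observed measurements $z_{obs}$]

[...] method in reliability engineering and failure analysis [36]. Finally, the corresponding threshold value is determined under a given false alarm rate.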
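The threshold step can be sketched as follows. The fitting procedure itself is not visible in this text, so the example rests on two labeled assumptions: a two-parameter Weibull distribution for the SSE values, and a simple least-squares fit on the Weibull probability plot using median ranks. The function names and the sample SSE values are hypothetical.

```python
import math

def fit_weibull(sse_samples):
    """Estimate shape k and scale lam of F(x) = 1 - exp(-(x/lam)^k).

    Uses linear regression of ln(-ln(1 - F_i)) on ln(x_i), where F_i is the
    median-rank estimate of the empirical CDF (an assumed fitting method).
    """
    xs = sorted(sse_samples)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
             / sum((a - mean_u) ** 2 for a in u))
    intercept = mean_v - slope * mean_u
    # slope = k and intercept = -k * ln(lam) on the Weibull plot.
    return slope, math.exp(-intercept / slope)

def detection_threshold(shape, scale, false_alarm_rate):
    """Value tau such that P(SSE > tau) equals the false alarm rate."""
    return scale * (-math.log(false_alarm_rate)) ** (1.0 / shape)

def is_attack(sse_value, tau):
    """Flag FDIA when the observed SSE exceeds the threshold."""
    return sse_value > tau

# Hypothetical attack-free SSE values from a training set.
sse_train = [0.8, 1.4, 2.1, 2.6, 3.0, 3.3, 3.9, 4.4, 5.2, 6.1]
k, lam = fit_weibull(sse_train)
tau = detection_threshold(k, lam, false_alarm_rate=0.05)
print("shape:", k, "scale:", lam, "threshold:", tau)
print("flag SSE of 12.0:", is_attack(12.0, tau))
```

Lowering the false alarm rate moves the threshold further into the tail of the fitted distribution, which trades missed detections against false alarms.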
power flows of all branches. The generated data consist of 2,000 samples, divided into four equal parts: three for training and one for testing. The three training sets are used for training RNN, training BP, and constructing SSE fitting curves, respectively.

Fig. 6. Illustration of the first 100 samples of bus 5-9 state data (i.e., voltage angle) in IEEE 14.

Fig. 6 graphically shows the first 100 samples of bus 5-9 training state data in IEEE 14, in which the shown state of the grid is the argument of complex voltage. Intuitively, we can observe that it contains abundant linear and non-linear components. Moreover, Fig.5 illustrates that the training state data are fluctuating, which is also in line with the fluctuation of the actual voltage argument.

2) FDIA attack model

Generally, the FDIA signal is injected into the measurements, modeled as an additive signal which is added to the measurement sensor readings. This additive signal can be of any value, and throughout the literature, the scaling attack, ramp attack, step attack, and general random attack, etc. are usually used [38].

In our work, instead of the random attack model where the attacker simply manipulates the sensor readings by inserting a random attack vector, the stealthy false data injection attack model is used, which, in a sense, can be regarded as a special random attack. This model assumes the attackers have the knowledge of the system topology represented by the Jacobian matrix $H$ and possess the capability of compromising a limited number of measurements.

Specifically, in our simulations, first the compromised state $c$ is randomly selected, then the corresponding measurements are compromised by the injected measurement attack set as $a = Hc$. The rationale of using the above FDIA attack model lies in the following fact. From Equations (2) to (5), we can observe that the traditional BDD cannot detect the intentionally designed FDIA attack, but other brute-force attacks can be easily detected by the existing BDD method. The used stealthy FDIA attack model was first proposed by [2].

Specifically, the compromised state vector $c = [0, \ldots, c_i, \ldots, 0]$, $i \in [2, N]$ is generated randomly, in which $c_i$ follows a Gaussian distribution with a mean of 0 and variance between 0.5 and 8 (commonly set as 6), and the position of $c_i$ is randomly selected. The reason why the state attack variance is set in such a way is as follows. Let $\sigma_{x_a}^2$ and $\sigma_x^2$ denote the variances of the elements of $x_a$ and $x$ respectively; it is reasonable to assume $10 \log_{10}(\sigma_c^2 / \sigma_x^2) = 3$ dB, i.e., the variance of the state vector contributed by the injected vector $c$ is assumed to be 3 dB higher than that contributed by the original state data [39]. Empirically, the state (i.e., voltage angles of all buses) variance in our data is within 1 and 3. Therefore, the variance is set as 6. Note that we also conduct simulations to evaluate the impact of the change of injected state attack variance on detection probability, shown in the following Fig. 14. Note that measurement noise following a Gaussian distribution with zero mean and variance 1 (the empirical variance of the real power) is also used.

In summary, the compromised measurements are given as: $z_a = Hx + n + a$, where $Hx$ is the real measurement, $n$ is the measurement noise, and $a$ is the injected measurement attack. Since $a$ is randomly sampled from the Gaussian distribution with varied variance, it can, in a sense, be generalized to the scaling attack, ramp attack, and step attack.

In addition, similar to [10], the number of hidden layer cells of LSTM is 100, and the number of input, output layer and fully connected layer cells depends on the used smart grid model. In this paper, KFRNN, R2N2_variant, R2N2 [10], RNNWide [8] and CNNWide [9] are analyzed and compared in the IEEE 14 model. Note that, to illustrate the applicability of the scheme to larger networks, we also provide the performance evaluation on the IEEE 57 model.

B. Performance of Prediction Accuracy of Various Scheme

First, we illustrate the measurement prediction accuracy of these five schemes, in terms of the mean relative square error (MRSE) function. MRSE is defined as Equation (20).

$MRSE = \dfrac{\sum_{t=1}^{T} \sum_{k=1}^{N-1} (x_{kt} - \hat{x}_{kt})^2}{\sum_{t=1}^{T} \sum_{k=1}^{N-1} (x_{kt} - \mathrm{mean}(x_k))^2}$ (20)

where $x_{kt}$ represents the observed state of the $k$-th dimension at time $t$, which is obtained by using the minimum mean squared error estimator described in Section III; $\hat{x}_{kt}$ is the finally predicted state of the $k$-th dimension at time $t$; $N-1$ represents the dimension of the states; $T$ is the total number of predicted time points. In brief, the smaller the MRSE value is, the better the scheme performance is.

Fig. 7. Prediction performance (i.e., MRSE) of various schemes including KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 14.
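The MRSE of Equation (20) can be computed directly from two arrays of states. The sketch below is illustrative only: the function name `mrse` and the sample values are hypothetical, and mean(x_k) is taken to be the time average of the k-th observed state dimension, which is an interpretation of the notation rather than something the text states explicitly.

```python
def mrse(observed, predicted):
    """Mean relative square error.

    observed, predicted: lists of T time steps, each a list of state values
    (one value per state dimension).
    """
    num_steps = len(observed)
    dim = len(observed[0])
    # Time average of each observed state dimension (assumed meaning of mean(x_k)).
    means = [sum(observed[t][k] for t in range(num_steps)) / num_steps
             for k in range(dim)]
    numerator = sum((observed[t][k] - predicted[t][k]) ** 2
                    for t in range(num_steps) for k in range(dim))
    denominator = sum((observed[t][k] - means[k]) ** 2
                      for t in range(num_steps) for k in range(dim))
    return numerator / denominator

# Hypothetical observed states (3 time steps, 2 dimensions) and two predictors.
observed = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
close_prediction = [[1.1, 10.5], [1.9, 19.0], [3.2, 30.5]]
mean_prediction = [[2.0, 20.0], [2.0, 20.0], [2.0, 20.0]]

print("close predictor:", mrse(observed, close_prediction))
print("mean predictor:", mrse(observed, mean_prediction))
```

A predictor that always outputs the per-dimension mean scores exactly 1, so values well below 1 indicate that the scheme explains most of the variation in the observed states.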
false positive rate, to increase the detection probability, i.e., true positive rate.

Fig. 14. The detection probability with the change of injection attack vector variance.

Fig. 14 shows the detection probabilities of the various schemes when the false alarm rate is 0.05 and the variance of the injected state attack vector changes from 0.5 to 8. The following observations can be made. First, under all attack magnitudes (characterized by the injection attack vector variance), our proposed KFRNN and R2N2_variant outperform the other schemes. Second, when the attack variance is small (for example, less than 1), all schemes have a relatively low detection probability. The reason is straightforward: since the mean of the attack is 0, a small attack variance generally means a small injected attack magnitude, so the measurement residual may fall below the detection threshold. However, when the attack variance is large enough to be recognizable, as shown in the inner figure in Fig. 14, our proposed schemes achieve good performance.

D. The theoretical implication of our proposals

In an ensemble, combining the outputs of several base learners is useful if they disagree on some inputs. This disagreement is measured as the diversity (or ambiguity) of the ensemble. It has been shown that the generalization error $E$ of the ensemble can be expressed as follows [40]:

$E = \bar{E} - \bar{D}$ (21)

where $\bar{E}$ and $\bar{D}$ are the mean error and the diversity of the ensemble, respectively. This result implies that increasing the ensemble diversity while maintaining the average error of the ensemble members should decrease the ensemble error.

We use the disagreement of an ensemble member with the ensemble's prediction as a measure of diversity. More precisely, if $C_i(s)$ is the prediction of the $i$-th predictor for the sample $s$, and $C^*(s)$ is the prediction of the entire ensemble, then the diversity of the $i$-th predictor on sample $s$ is given by

$d_i(s) = |C_i(s) - C^*(s)|$ (22)

To compute the diversity of an ensemble of size $n$ (in our proposed schemes KFRNN and R2N2_variant, $n = 2$) on a test set of size $m$, we average Equation (22) over all predictors and samples, and obtain the diversity of each scheme as

$D = \frac{1}{n \cdot m} \sum_{i=1}^{n} \sum_{j=1}^{m} d_i(s_j)$ (23)

Treating our proposed KFRNN and R2N2_variant as two ensembles, we obtain the following experimental results: $D_{KFRNN} = 0.56$ and $D_{R2N2\_variant} = 0.14$. According to Equation (21), for a comparable mean member error, a larger diversity yields a smaller ensemble error, so this result indicates that KFRNN should theoretically perform better than R2N2_variant, which is experimentally verified by our simulation results.

VI. CONCLUSION

Nowadays, the smart grid faces many threats that may cause huge economic losses, especially FDIA. Inspired by parallel ensemble learning, and the stacking method in particular, this paper proposes the KFRNN scheme to detect FDIA effectively; the scheme takes into account both the linear and the non-linear components of smart grid data. Specifically, KFRNN uses a Kalman filter and an RNN as two base learners to fit the linear and non-linear data features of the smart grid, respectively. The second-level meta-learner then adaptively combines the Kalman filter and RNN predictors through a specially designed neural network. Furthermore, to verify the advantage of our design framework, we re-structure the typical residual recurrent neural network (R2N2) as R2N2_variant for FDIA detection. The proposed scheme is then comprehensively compared with other neural network based and ensemble learning based schemes, namely RNNWide, CNNWide and R2N2. The simulation results demonstrate that, in terms of detection probability, $F_1$-score and accuracy, KFRNN achieves the best performance for various grid sizes, including the IEEE 14-bus and 57-bus systems, and that the re-designed R2N2_variant also performs better than RNNWide, CNNWide and R2N2. Moreover, by analyzing the diversity of the ensembles, we demonstrate the underlying reason why KFRNN performs better than R2N2_variant.

REFERENCES

[1] A. Muir and J. Lopatto, "Final report on the August 14, 2003 blackout in the United States and Canada: causes and recommendations," US-Canada Power System Outage Task Force, Canada, 2004.
[2] Y. Liu, P. Ning, and M.K. Reiter, "False data injection attacks against state estimation in electric power grids," ACM Transactions on Information and System Security, vol. 14, no. 1, May 2011, Art. no. 13.
[3] S. K. Singh, K. Khanna, R. Bose, B.K. Panigrahi, and A. Joshi, "Joint-Transformation-Based Detection of False Data Injection Attacks in Smart Grid," IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 89-97, Jan. 2018.
[4] G. Liang, J. Zhao, F. Luo, S.R. Weller, and Z.Y. Dong, "A Review of False Data Injection Attacks Against Modern Power Systems," IEEE Transactions on Smart Grid, vol. 8, no. 4, July 2017.
[5] A.S. Musleh, G. Chen, and Z.Y. Dong, "A Survey on the Detection Algorithms for False Data Injection Attacks in Smart Grids," IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2218-2234, May 2020.
[6] A. Amulya and K.S. Swarup, "Analysis of False Data Injection Attacks on Multiarea Load Frequency Control," in proceedings of the 8th International Conference on Power Systems (ICPS), Jaipur, India, 2019, pp. 1-6.
[7] G. Liang, S. R. Weller, J. Zhao, F. Luo, and Z.Y. Dong, "The 2015 Ukraine blackout: Implications for false data injection attacks," IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 3317-3318, July 2017.
[8] Y. Wang, D. Chen, C. Zhang, X. Chen, B. Huang, and X. Cheng, "Wide and Recurrent Neural Networks for Detection of False Data Injection in Smart Grids," in proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Springer, Cham, 2019, pp. 335-345.
[9] Z. Zheng, Y. Yang, X. Niu, H. Dai, and Y. Zhou, "Wide and Deep Convolutional Neural Networks for Electricity-Theft Detection to Secure Smart Grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606-1615, April 2018.
[10] Y. Wang, W. Shi, Q. Jin, and J. Ma, "An Accurate False Data Detection in Smart Grid Based on Residual Recurrent Neural Network and Adaptive threshold," in proceedings of IEEE International Conference on Energy Internet (ICEI), Nanjing, China, 2019, pp. 499-504.
[11] M.H. Haghighat and J. Li, "Intrusion detection system using voting-based neural network," in Tsinghua Science and Technology, vol. 26, no. 4, pp. 484-495, Aug. 2021.
[12] L. Sun, S. Sun, T. Wang, J. Li and J. Lin, "Parallel ADR detection based on spark and BCPNN," in Tsinghua Science and Technology, vol. 24, no. 2, pp. 195-206, April 2019.
[13] Y. Lv et al., "A classifier using online bagging ensemble method for big data stream learning," in Tsinghua Science and Technology, vol. 24, no. 4, pp. 379-388, Aug. 2019.
[14] G. Xi, X. Zhao, Y. Liu, J. Huang and Y. Deng, "A hierarchical ensemble learning framework for energy-efficient automatic train driving," in Tsinghua Science and Technology, vol. 24, no. 2, pp. 226-237, April 2019.
[15] W. Shi, Y. Wang, Q. Jin, and J. Ma, "PDL: An Efficient Prediction-Based False Data Injection Attack Detection and Location in Smart Grid," in proceedings of IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, 2018, pp. 676-681.
[16] M. Ozay, I. Esnaola, F.T. YarmanVural, S.R. Kulkarni, and H.V. Poor, "Machine Learning Methods for Attack Detection in the Smart Grid," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 8, pp. 1773-1786, Aug. 2016.
[17] G. Pu, L. Wang, J. Shen and F. Dong, "A hybrid unsupervised clustering-based anomaly detection method," in Tsinghua Science and Technology, vol. 26, no. 2, pp. 146-153, April 2021.
[18] Y. He, G. J. Mendis, and J. Wei, "Real-Time Detection of False Data Injection Attacks in Smart Grid: A Deep Learning-Based Intelligent Mechanism," IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2505-2516, Sept. 2017.
[19] A. Ayad, H.E.Z. Farag, A. Youssef, and E.F. El-Saadany, "Detection of false data injection attacks in smart grids using Recurrent Neural Networks," in proceedings of IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, 2018, pp. 1-5.
[20] Q. Yang, L. Chang, and W. Yu, "On false data injection attacks against Kalman filtering in power system dynamic state estimation," Security and Communication Networks, vol. 9, no. 9, pp. 833-849, June 2016.
[21] D.B. Rawat and C. Bajracharya, "Detection of False Data Injection Attacks in Smart Grid Communication Systems," IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1652-1656, Oct. 2015.
[22] K. Manandhar, X. Cao, F. Hu, and Y. Liu, "Detection of Faults and Attacks Including False Data Injection Attack in Smart Grid Using Kalman Filter," IEEE Transactions on Control of Network Systems, vol. 1, no. 4, pp. 370-379, Dec. 2014.
[23] S. Wang, S. Bi, and Y.-J. A. Zhang, "Locational Detection of the False Data Injection Attack in a Smart Grid: A Multilabel Classification Approach," IEEE Internet of Things Journal, vol. 7, no. 9, pp. 8218-8227, Sept. 2020.
[24] Y. Deng, K. Zhu, R. Wang, and Y. Wan, "Real-time Detection of False Data Injection Attacks Based on Load Forecasting in Smart Grid," in proceedings of IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 2019, pp. 1-6.
[25] S.A. Foroutan and F.R. Salmasi, "Detection of false data injection attacks against state estimation in smart grids based on a mixture Gaussian distribution learning method," IET Cyber-Physical Systems: Theory & Applications, vol. 2, no. 4, pp. 161-171, Dec. 2017.
[26] E. Drayer and T. Routtenberg, "Detection of False Data Injection Attacks in Smart Grids Based on Graph Signal Processing," IEEE Systems Journal, vol. 14, no. 2, pp. 1886-1896, June 2020.
[27] Y. Li, R. Huang, and L. Ma, "False Data Injection Attack and Defense Method on Load Frequency Control," IEEE Internet of Things Journal, Available online, doi: 10.1109/JIOT.2020.3021429.
[28] Y. Li, Y. Wang, and S. Hu, "Online Generative Adversary Network Based Measurement Recovery in False Data Injection Attacks: A Cyber-Physical Approach," IEEE Transactions on Industrial Informatics, vol. 16, no. 3, pp. 2031-2043, Mar. 2020.
[29] A. Abbaspour, A. Sargolzaei, P. Forouzannezhad, K.K. Yen, and A.I. Sarwat, "Resilient Control Design for Load Frequency Control System Under False Data Injection Attacks," IEEE Transactions on Industrial Electronics, vol. 67, no. 9, pp. 7951-7962, Sept. 2020.
[30] O. Sagi and L. Rokach, "Ensemble learning: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, pp. e1249, Feb. 2018.
[31] J. Zhang, Z. Li, K. Nai, Y. Gu, and A. Sallam, "DELR: A double-level ensemble learning method for unsupervised anomaly detection," Knowledge-Based Systems, vol. 181, no. 1, pp. 104783, Oct. 2019.
[32] E.L. Bullock, C.E. Woodcock, and C.E. Holden, "Improved change monitoring using an ensemble of time series algorithms," Remote Sensing of Environment, vol. 238, no. 1, pp. 111165, Mar. 2020.
[33] M.H. Alobaidi, F. Chebana, and M.A. Meguid, "Robust ensemble learning framework for day-ahead forecasting of household based energy consumption," Applied Energy, vol. 212, no. 15, pp. 997-1012, Feb. 2018.
[34] M. Woźniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3-17, Mar. 2014.
[35] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[36] S. Mousavian, J. Valenzuela, and J. Wang, "Real-time data reassurance in electrical power systems based on artificial neural networks," Electric Power Systems Research, vol. 96, pp. 285-295, Mar. 2013.
[37] R.D. Zimmerman, C.E. Murillo-Sánchez, and R.J. Thomas, "MATPOWER: Steady-State Operations, Planning, and Analysis Tools for Power Systems Research and Education," IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12-19, Feb. 2011.
[38] S. Sridhar and M. Govindarasu, "Model-based attack detection and mitigation for automatic generation control," IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 580-591, 2014.
[39] Z. Yu and W. Chin, "Blind False Data Injection Attack Using PCA Approximation Method in Smart Grid," IEEE Transactions on Smart Grid, vol. 6, no. 3, pp. 1219-1226, May 2015.
[40] P. Melville and R.J. Mooney, "Creating diversity in ensembles using artificial data," Information Fusion, vol. 6, no. 1, pp. 99-111, Mar. 2005.

Zhihao Zhang is currently working toward the master’s degree in telecommunications and information engineering with the Nanjing University of Posts and Telecommunications, Nanjing, China. His main research interests include deep learning and artificial intelligence, and its applications in Energy Internet.

Jianhua Ma (M’-91) is currently a professor at Digital Media Department in the Faculty of Computer and Information Sciences, in Hosei University, Japan. Dr. Ma is a member of IEEE and ACM. He has edited 10 books/proceedings, and published more than 150 academic papers in journals, books and conference proceedings. His research interest is ubiquitous computing.

Qun Jin (M’95–SM’17) is a professor at the Networked Information Systems Laboratory, Department of Human Informatics and Cognitive Sciences, Faculty of Human Sciences, Waseda University, Japan. He has been extensively engaged in research works in the fields of computer science, information systems, and human informatics. His recent research interests cover human-centric ubiquitous computing, behavior and cognitive informatics, big data, personal analytics and individual modeling, cyber security, blockchain, intelligence computing and applications in healthcare, and computing for well-being. He authored or co-authored several monographs and more than 300 refereed papers published in academic journals and international conference proceedings. He is a foreign fellow of the Engineering Academy of Japan (EAJ).