
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2021.3113900, IEEE Internet of Things Journal

KFRNN: An Effective False Data Injection Attack Detection in Smart Grid Based on Kalman Filter and Recurrent Neural Network
Yufeng Wang, Member, IEEE, Zhihao Zhang, Jianhua Ma, Member, IEEE, and Qun Jin, Senior Member, IEEE

Manuscript received April 30, 2021; revised June 17, 2021; accepted September 14, 2021. This research is sponsored by QingLan Project of JiangSu Province. (Corresponding author: Yufeng Wang.)
Yufeng Wang and Zhihao Zhang are with Nanjing University of Posts and Telecommunications, China (e-mail: wfwang@njupt.edu.cn).
Jianhua Ma is with Hosei University, Japan (e-mail: jianhua@hosei.ac.jp).
Qun Jin is with Waseda University, Japan (e-mail: jin@waseda.jp).
2327-4662 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Abstract—The smart grid is now increasingly dependent on smart devices to operate, which leaves space for cyber attacks. In particular, the intentionally designed false data injection attack (FDIA) can successfully bypass the traditional measurement residual-based bad data detection scheme. Considering that smart grid data naturally contain linear and nonlinear components, and inspired by parallel ensemble learning, especially the stacking method, this paper presents an effective two-level learner based FDIA detection scheme using Kalman filter and recurrent neural network (KFRNN). The first level includes two base learners, in which the Kalman filter is used for state prediction to fit linear data, and the RNN is used to fit nonlinear data features. The second-level learner uses a fully connected layer and back propagation (BP) module to adaptively combine the results of the two base learners. Then, by fitting a Weibull distribution to the sum of square errors (SSEs) between the observed measurements and the predicted measurements, a dynamic threshold is obtained to judge whether FDIA occurs or not. Comprehensive simulation results show that our scheme has better performance than other neural network based and ensemble learning based FDIA detection schemes.

Index Terms—False data injection attack, Kalman filter, Recurrent neural network, Smart grid, Ensemble learning.

I. INTRODUCTION

WITH the development of society, the demand for electricity keeps rising, which promotes the development of the smart grid. The application of sensing, communications and computing improves the quality of monitoring and control of smart grids, but also increases their vulnerability to various cyber attacks. The 2003 blackout in the North-East [1], for example, shows that even minor glitches in parts of the grid can eventually cause billions of dollars in economic losses.

As shown in Fig. 1, the measurement data, such as bus voltage, bus power flow, and branch power flow, from the sensors or meters are sent to a control center known as the SCADA (Supervisory Control and Data Acquisition) system. The control center then analyzes the received measurement data, conducts state estimation (SE), detects potential contingencies, and sends the corresponding control signals to the remote terminal units to ensure reliable operation.

[Figure 1: power grid field devices (power generation, transmission, distribution and consumption; sensors, meters, RTU), SCADA measurement data acquisition, and the control center/EMS (state estimation, optimal power flow, emergency analysis, economic dispatch). Attack 1 and Attack 2 act on the field devices and the data acquisition path; Attack 3 acts on the control center. The attacks produce false commands/operations and false measurement data.]
Fig. 1. Illustration of False Data Injection Attack (FDIA) in smart grid.

In particular, the traditional bad data detection (BDD) method built into the SE relies on the measurement residual to detect cyber attacks. Specifically, the measurement residual (i.e., the difference between the vector of observed measurements and the vector of estimated measurements) is compared to a predefined system-based threshold value, and the presence of bad measurements is inferred if the measurement residual exceeds the threshold.

A. Research motivation

False data injection (FDI), which attacks the integrity of data to corrupt the outputs of SE, is emerging as a severe threat to the smart grid [2]; it affects the normal operation of the system and leads to errors in the system state [3][4].

As depicted in Fig. 1, in practice an FDI attack (FDIA) can be performed by manipulating the network communication channels (i.e., attack 2) or by hacking meters and/or control centers (i.e., attack 1 or 3) in the smart grid. Generally, "basic FDIA" denotes an FDIA that can be detected by the traditional BDD, while "stealthy FDIA" denotes an FDIA that cannot be perceived by the BDD [5]. In detail, if the attacker knows the whole grid network topology, then the intentionally injected false data can successfully bypass the traditional measurement residual-based BDD scheme, which is the so-called stealthy FDIA. The effects of FDIA, such as frequency and load imbalance, load interruption and damage of grid components, are elaborated in [6]. Specifically, the 2015 Ukraine blackout [7] demonstrates the plausibility of common assumptions regarding the knowledge and capabilities required by an adversary to mount a successful FDIA on a smart grid.

Many schemes to detect FDIA exist in the literature. For example, the Kalman filter (KF) is used to estimate
the system state and judge whether FDIA exists or not. But the Kalman filter is only a linear estimator, ignoring the non-linear features in smart grid data. Refs. [8] and [9] respectively combine recurrent and convolutional neural networks with a fully connected layer to predict the system state and then detect FDIA. In contrast to the Kalman filter, neural networks process non-linear data well, but smart grid data also contain a linear component. The residual recurrent neural network (R2N2) is presented to detect FDIA in [10], in which vector auto-regression (VAR), i.e., the linear predictive component, and a recurrent neural network (RNN), i.e., the non-linear predictive component, are sequentially combined. Its weak point is that the deviation caused by the first prediction component directly affects the performance of the following prediction component, which in turn hampers the overall performance.

B. Main contribution

In recent years, detection methods that integrate multiple models have emerged and shown good results, such as [11-14]. Among them, ensemble learning has become a popular machine learning paradigm in which multiple base learners are trained to solve the same problem, and the generalization ability of an ensemble is usually much stronger than that of its base learners.

Considering that the diversity of the base learners of an ensemble is known to be an important factor in determining its generalization error, this paper presents an effective FDIA detection scheme based on Kalman filter and recurrent neural network (named KFRNN). Specifically, the main contributions of this paper are as follows.
• Considering that actual smart grid data are both linear and non-linear, and time-sequential in nature, the Kalman filter and RNN are adaptively combined in parallel to capture the respective data characteristics. Specifically, following the stacking method, the second-level meta-learner, i.e., the BP (back propagation) based neural network, is designed to adaptively combine the prediction results of these two base learners.
• Comprehensive simulations demonstrate that KFRNN performs better than other neural network based and other ensemble learning based schemes, including R2N2, RNNWide, and CNNWide.
• To verify the design rationality of KFRNN, we intentionally design the so-called R2N2_variant, which follows the same neural structure as KFRNN but, similar to R2N2, uses the VAR component for processing the linear data feature and the RNN component for processing the non-linear data feature. The simulation results demonstrate that KFRNN and R2N2_variant outperform other deep learning based FDIA detection schemes, and moreover show the superiority of KFRNN among all schemes.

The rest of this paper is organized as follows. Section II discusses related work. In Section III, we briefly introduce the smart grid system model, state estimation and attack model. Section IV provides the proposed scheme KFRNN and its main components. Thorough simulations are conducted to verify the scheme and compare it with typical similar schemes. Finally, we briefly conclude this paper.

II. RELATED WORK

FDIA detection schemes can be mainly divided into two categories: model-based and data-driven [5]. The former explicitly models smart grids based on the streaming real-time measurements along with static system data, such as the system parameters and substation configuration, and adopts these models to infer the data relationship. The latter is model-free and directly learns the data relationship through data mining and machine learning methods, especially deep learning. An example of the first category is the use of vector auto-regression [15] or Kalman filter for prediction. As stated in [16], many researchers have recently used machine learning methods, such as [8-10][17-19], which belong to the second category.

The Kalman filter is widely used for prediction [20-22]. Its advantage is that it can calibrate the predicted value by constructing the Kalman gain, but its weak point is that it deals well only with the linear data features of the smart grid, whereas smart grid data naturally contain a non-linear component. Aside from the Kalman filter, vector auto-regression is also a widely used process that captures the linear inter-dependencies between different time series events and can incorporate the dynamics of the system.

Refs. [8] and [9] respectively use the recurrent neural network combined with a fully connected layer (the so-called Wide component), and the convolutional neural network combined with the Wide component, for prediction. For comparison in our paper, the scheme in [8] is named RNNWide and the scheme in [9] is named CNNWide. These two schemes only use a non-linear model to predict the data and ignore the linear component in smart grid data. In particular, in [9], it is inappropriate to rigidly convert time-sequential grid data into a two-dimensional structure for the convolutional neural network, because the temporal nature of the data can be destroyed by the conversion.

[Figure 2: R2N2 prediction model with two branches. Left branch: input data x_t, VAR, linear output at time t+h, residual r_{t+h} against the observed value x_{t+h}, RNN, predicted residual at time t+2h. Right branch: input data x_{t+h}, VAR, linear output at time t+2h. The two are summed into the final output at time t+2h.]
Fig. 2. Flowchart of R2N2 based prediction model.

R2N2 is used for FDIA detection [10]. Fig. 2 presents the technical framework of the R2N2 prediction model. Technically,
R2N2 is divided into two branches: the left branch uses VAR to predict the states at time (t+h), and then sends the residuals between the observed states and the predicted states at time (t+h) into the RNN to predict the residuals at time (t+2h); the right branch uses vector auto-regression to directly predict the states at time (t+2h). The predicted states at time (t+2h) from the right branch plus the residuals at time (t+2h) from the left branch are regarded as the finally predicted states at time (t+2h).

In a sense, R2N2 also takes into account both linear and non-linear data features in the smart grid: VAR is used to fit the linear data feature, and RNN is used to process the non-linear data feature. However, R2N2 has one defect: the deviation generated by the VAR prediction in the left branch is propagated to the following RNN procedure. To verify our inference, besides the KFRNN scheme, we intentionally designed the so-called R2N2_variant, which has the same neural structure as KFRNN, with the only difference that the Kalman filter component in KFRNN is replaced by the VAR component. The simulation results demonstrate that the performance of R2N2_variant is better than that of the traditional R2N2 scheme, which supports the rationality of the proposed KFRNN structure.

Different from the above FDIA detection schemes, another methodology is direct classification. In [23], a stack of multilayer convolutional neural networks is used as the classification model, and labeled data is used to train the classification model, which is then used to detect FDIA in the operating phase. Similarly, using training data with known labels, a support vector machine is used to create a hyperplane to detect FDIA in [24]. In summary, classification based methods require that the training data be explicitly labeled, and have the following drawbacks. First, it is difficult to obtain labeled data in practice, especially for FDIA. Second, a classification model trained on specifically labeled data can only work for certain attack modes, and is not applicable to other attack modes that may generate different attack data.

In addition, there are some other detection schemes. For example, the mixture Gaussian distribution learning model proposed in [25] is used to fit normal states for detecting FDIA. The problems are that the number of Gaussian density components must be selected, and that the overall probability distribution does not vary much when a state is produced by a small attack. In [26], FDIA is detected based on graph signal processing, but its complexity is relatively high. It is worth mentioning that Generative Adversarial Networks are used to identify and recover the data attacked by FDIA [27][28]. Anomaly detection technology is adopted in [29], which includes a Luenberger observer and an artificial neural network to identify false data.

III. THE SYSTEM MODEL AND PROBLEM STATEMENT

Table I gives the main notations and their meanings used in this paper.

TABLE I
MAIN NOTATIONS AND THEIR MEANINGS USED IN THIS PAPER
Notation    Definition
$z$         $M \times 1$ measurement vector of smart grid
$x$         $(N-1) \times 1$ state vector of smart grid
$n$         Noise variable
$a$         The generated attack vector on the measurements
$r$         The residual between the observed measurement and the predicted measurement
$H$         $M \times (N-1)$ measurement Jacobian matrix
$M$         The number of measurements ($M = N + L$) in smart grid
$L$         The number of transmission lines in smart grid
$N$         The number of buses in smart grid

In a typical power system, buses are inter-connected by transmission lines. The Independent System Operators (ISOs) monitor the power system through SCADA measurements using sensors. The smart grid model is based on real-time measurements of data flow and static system data. The goal of state estimation is to use real-time data collected from the measurement units to estimate the operating state of the smart grid.

For state estimation using the DC power flow model, the relation between measurements and state variables is given as:
$$z = Hx + n \qquad (1)$$
$z = [z_N, z_L]^T$ represents an $M \times 1$ measurement vector of the smart grid, including the power injections at $N$ buses and the power flows on $L$ transmission lines. Specifically, $z_N = [P_1, \ldots, P_N, Q_1, \ldots, Q_N]^T$ and $z_L = [\ldots, P_{ij}, \ldots, Q_{ij}, \ldots]^T$, where $P_k, Q_k$, $k \in [1, N]$, respectively represent the active and reactive powers injected at bus $k$, and $P_{ij}, Q_{ij}$, $i, j \in [1, N]$, respectively represent the active and reactive power flows of the transmission line between buses $i$ and $j$. $x$ is an $(N-1) \times 1$ state vector of the smart grid, which, for example, denotes the phase angles of the $N$ buses (one of the buses is regarded as a reference bus). $H$ is the $M \times (N-1)$ measurement Jacobian matrix. $n$ is the noise variable, which follows the normal distribution $n \sim N(0, \sigma_n^2)$. Through the minimum mean squared error estimator, the state vector $x$ can be estimated by Equation (2), denoted as $\hat{x}$.
$$\hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z \qquad (2)$$
where $R = \mathrm{diag}(\sigma_n^2)$ is the noise covariance matrix, and $\sigma_n^2$ represents the variance of the noise on the $n$-th component of the measurement vector.

Then, the residual between the observed measurements and the estimated measurements is represented as Equation (3).
$$r = z - H\hat{x} \qquad (3)$$
For the traditional BDD system, if $\|r\| < \tau$ ($\tau$ is the predefined system-based threshold value), then SE is regarded as valid.

Denote by $a$ the nonzero vector injected into the measurement data $z$; the measurement vector after attack can then be represented as $z_a = z + a$. If the attacker knows the topology, that is, knows the $H$ matrix, then the attack vector can be designed as $a = Hc$, where $c$ is usually an $(N-1) \times 1$ vector. Then, the estimated system state vector after attack, $\hat{x}_a$, can be represented as the following Equation (4).
$$\hat{x}_a = (H^T R^{-1} H)^{-1} H^T R^{-1} z_a = (H^T R^{-1} H)^{-1} H^T R^{-1} (z + Hc) = (H^T R^{-1} H)^{-1} H^T R^{-1} z + (H^T R^{-1} H)^{-1} (H^T R^{-1} H) c = \hat{x} + c \qquad (4)$$
Then, the residual between the observed measurements and the estimated measurements under FDIA, $r_a$, can be computed as Equation (5).
$$r_a = z_a - H\hat{x}_a = z + Hc - H\hat{x} - Hc = z - H\hat{x} = r \qquad (5)$$
Since $r_a = r$, the traditional measurement residual based BDD method cannot detect this FDIA at all, which is why it is called a stealthy FDIA.

In summary, the intentionally injected measurement attack $a = Hc$ makes the traditional SE in the smart grid change the estimated state from $\hat{x}$ into $\hat{x} + c$, and successfully bypasses the measurement residual based BDD. Therefore, if the designer, instead of relying on traditional SE, could accurately predict the state variables by exploiting the time-sequential features of historical state variables, then the FDIA could be detected effectively.

IV. KFRNN: THE PROPOSED FDIA DETECTION SCHEME

In a sense, our proposed scheme is inspired by ensemble learning. There are several reasons why the ensemble learning paradigm often improves the performance of a prediction scheme [30]: 1) overfitting avoidance; 2) computational advantage; 3) representation. In terms of how the base learners are produced, ensemble learning can be broadly divided into parallel and sequential ensembles [31]. In a parallel ensemble, each base learner performs prediction independently, and then multiple base learners are combined to output the final predictive result [32][33]. In a sequential ensemble, the base learners are created over iterations and depend on each other. Typically, larger weights are assigned to the instances mis-predicted by the preceding base learner to improve the following base learner. The advantage of the parallel ensemble is that, due to the parallel execution of the base learners, the entire scheme can be executed efficiently, and the biases of individual base learners can be overcome, which achieves higher overall accuracy [34]. More importantly, the data in the FDIA scenario are dynamic time series, and KFRNN detects FDIA in real time; in the operation phase in particular, the Kalman filter in KFRNN is built on-site for real-time detection. It is therefore extremely difficult (or even impossible) to adjust the weights of ever-changing data instances and send them to a second learner, as a sequential ensemble would require.

Based on the above considerations, the parallel ensemble is utilized in our KFRNN scheme for real-time operation. Note that the parallel learning framework can be implemented with bagging or stacking methods. The difference lies in the way the base learners' results are combined. In bagging, a simple combiner, e.g., majority voting for classification [11] or weighted averaging for regression, is used to create ensemble estimates, while stacking utilizes a second-level learner (the so-called meta-algorithm) to adaptively combine the base learner models, which can alleviate the biases in the base models. Therefore, KFRNN adopts the stacking method.
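
The difference between a fixed combiner and a stacking combiner can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic signal, the two stand-in base predictors (they are not the Kalman filter or the LSTM of this paper), and the least-squares second-level learner, which is swapped in here for the fully connected layer with BP only to keep the example short.

```python
import numpy as np

def fit_stacking_weights(base_preds, target):
    """Fit target ~ base_preds @ w + b by ordinary least squares."""
    design = np.column_stack([base_preds, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[:-1], coef[-1]

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 400)
linear_part = 0.5 * t              # hypothetical linear component
nonlinear_part = np.sin(2.0 * t)   # hypothetical non-linear component
target = linear_part + nonlinear_part

# Stand-in base learners: each one captures only one component, plus noise.
pred_linear = linear_part + rng.normal(0.0, 0.05, t.size)
pred_nonlinear = nonlinear_part + rng.normal(0.0, 0.05, t.size)
base_preds = np.column_stack([pred_linear, pred_nonlinear])

# The second-level learner is fitted on the first half only.
half = t.size // 2
weights, bias = fit_stacking_weights(base_preds[:half], target[:half])

# Compare on the second half: learned combination vs. plain averaging.
stacked = base_preds[half:] @ weights + bias
averaged = base_preds[half:].mean(axis=1)
mse_stacked = mse(stacked, target[half:])
mse_averaged = mse(averaged, target[half:])
print("weights:", weights, "bias:", bias)
print("MSE stacked:", mse_stacked, "MSE averaged:", mse_averaged)
```

In this toy setting the two base predictors are complementary, so equal-weight averaging halves the signal while the learned combiner recovers weights close to one for each of them.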
[Figure 3: the KFRNN framework. Offline, training set 1 is used to train the RNN. Offline, training set 2 is fed to the RNN and the Kalman filter, and their predicted measurements are used to determine the weight between the Kalman filter and the RNN. Offline, training set 3 is fed to the KFRNN prediction model, and the SSE between predicted and observed measurements is used to fit the cumulative probability density curve of SSE and infer the detection threshold. Online detection: real-time time-sequential data pass through the RNN and the Kalman filter, then through the fully connected layer + BP (together, the KFRNN prediction model); the SSE value between the predicted and observed measurements is compared with the detection threshold to decide whether FDIA occurs or not.]
Fig. 3. Flowchart of the proposed KFRNN framework.
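
The prediction-based detector in Fig. 3 is needed because the stealthy attack of Equations (4) and (5) leaves the classical residual untouched. The following sketch checks that property numerically. The matrix `H`, the problem sizes, the noise level and the offset `c` are arbitrary illustrative assumptions, not a real grid topology.

```python
import numpy as np

rng = np.random.default_rng(1)

M, n_states = 8, 3                    # hypothetical sizes: M measurements, N-1 states
H = rng.normal(size=(M, n_states))    # stand-in Jacobian, not a real grid topology
sigma2 = np.full(M, 0.01)             # assumed per-measurement noise variances
R_inv = np.diag(1.0 / sigma2)

def estimate_state(z):
    """Weighted least-squares state estimate, as in Equation (2)."""
    return np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)

def residual(z):
    """Measurement residual r = z - H x_hat, as in Equation (3)."""
    return z - H @ estimate_state(z)

x_true = rng.normal(size=n_states)
z = H @ x_true + rng.normal(0.0, np.sqrt(sigma2))

c = np.array([0.5, -0.2, 0.1])        # arbitrary state offset chosen by the attacker
z_stealthy = z + H @ c                # structured attack a = Hc
z_random = z + rng.normal(0.0, 5.0, M)  # unstructured injection, for contrast

print("residual norm, clean:   ", np.linalg.norm(residual(z)))
print("residual norm, stealthy:", np.linalg.norm(residual(z_stealthy)))
print("residual norm, random:  ", np.linalg.norm(residual(z_random)))
print("state shift under a=Hc: ", estimate_state(z_stealthy) - estimate_state(z))
```

The clean and stealthy residual norms coincide up to floating-point error, while the state estimate moves by exactly `c`; the unstructured injection, by contrast, inflates the residual and would be caught by a residual test.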

Fig. 3 shows the overall flowchart of the proposed scheme KFRNN, which is composed of two phases: offline training and online detection.

The offline training phase explicitly includes the following components: training the RNN prediction model, determining the weight between the Kalman filter and RNN prediction models, and inferring the FDIA detection threshold by fitting the sum of square errors (SSEs) between the observed measurements and the predicted measurements converted from the predicted states. The combination of the former two components is named the KFRNN prediction model in our paper. In KFRNN, two parallel prediction models are built first: one is the Kalman filter, suitable for linear prediction; the other is the recurrent neural network, suitable for the non-linear data features. After the two models predict in parallel, their outputs are fed into the fully connected layer and back propagation (BP) module for adaptive weighting.

In the online detection phase, the built KFRNN is used to obtain the predicted states, and then the corresponding predicted measurements can be estimated through Equation (1). The residual between the finally predicted measurements and the observed measurements is compared with the detection threshold obtained in the training phase, to infer whether FDIA occurs or not.

It should be explicitly noted that, unlike the RNN component, which is trained offline, the Kalman filter component in our proposed KFRNN scheme is dynamically built in the detection phase and continually evolves using the streaming real-time states and the mathematical model of the power grid. It should also be pointed out that, in the training process, a Kalman filter is built using the training dataset as well; its purpose is to adaptively determine the weights of the two base predictors in our ensemble learning inspired KFRNN scheme. The detailed procedure is given in subsection IV.C.

As described above, to verify the design rationality of KFRNN, we intentionally design the so-called R2N2_variant, which follows the same neural structure as KFRNN shown in Fig. 3 but, as in R2N2, uses the VAR component for processing the linear data feature and the RNN component for processing the non-linear data feature.

The following subsections respectively describe the components shown in Fig. 3, including the Kalman filter, the RNN model, the fully connected layer and BP for weight determination, and detection threshold determination.

A. Kalman Filter Suitable for Linear Data

The Kalman filter is one of the main dynamic state estimation methods in power systems; in our scheme, it is used for estimating the grid state with linear state equations. The Kalman filter consists of two components: a prediction component and a filtering component. Both are performed in every estimation step $t$.
1) prediction component:
$$x_t^- = F x_{t-1}^+ \qquad (6)$$
$$C_t^- = F C_{t-1}^+ F^T + W_{t-1} \qquad (7)$$
2) filtering component:
$$K_t = C_t^- H^T (H C_t^- H^T + R_t)^{-1} \qquad (8)$$
$$x_t^+ = x_t^- + K_t (z_t - H x_t^-) \qquad (9)$$
$$C_t^+ = C_t^- - K_t H C_t^- \qquad (10)$$
where $F$ is the state transition matrix, which can be solved by a vector auto-regression model; $C$ is the covariance matrix of the states; $W$ is the covariance matrix of the state noise; $K$ is the Kalman gain; $R$ is the covariance matrix of the measurement noise; $\{\cdot\}^-$ represents the intermediate value obtained by the prediction component, and $\{\cdot\}^+$ denotes the final value after the filtering component. Compared with vector auto-regression, the Kalman filter calibrates the predicted state with the observed measurements through the constructed Kalman gain at time $t$, which improves the prediction performance.

Based on the above Kalman filter, we can get the predicted measurements $z_{pre}$ of the Kalman component in Fig. 3 at time $t$, shown as Equation (11).
$$z_{pre} = H x_t^+ \qquad (11)$$

B. Recurrent Neural Network Suitable for Non-linear Data

LSTM is a special kind of RNN, first proposed in [35]. An LSTM cell can be formulated as the following Equations (12) to (16).
$$f_t = \sigma(w_f x_t + u_f h_{t-1} + b_f) \qquad (12)$$
$$i_t = \sigma(w_i x_t + u_i h_{t-1} + b_i) \qquad (13)$$
$$o_t = \sigma(w_o x_t + u_o h_{t-1} + b_o) \qquad (14)$$
$$c_t = (f_t \otimes c_{t-1}) \oplus (i_t \otimes \tanh(w_c x_t + u_c h_{t-1} + b_c)) \qquad (15)$$
$$h_t = o_t \otimes \tanh(c_t) \qquad (16)$$
where $f$ is the forget gate; $i$ is the input gate; $o$ is the output gate; $c$ is the cell memory; $h$ is the hidden state; $x$ is the input vector; $b$ is the bias; $w$ and $u$ represent the weights applied to the input $x$ and the hidden state $h$, respectively; $\sigma(\cdot)$ and $\tanh(\cdot)$ stand for the sigmoid function and the hyperbolic tangent function, respectively.

Compared with other RNNs, LSTM performs better in predicting time series data due to its gate functions, so we use LSTM as the non-linear component.

Note that training a neural network takes considerable time and cannot meet the real-time requirement of FDIA detection; therefore, the RNN component in KFRNN is trained offline and then used for real-time prediction in the detection phase.

C. Fully Connected Layer and BP for Adaptively Weighting Two Predictive Components

The Kalman filter and RNN predictors capture the linear and non-linear components, respectively. It is therefore imperative to investigate how to appropriately combine these two predictive components to finally obtain accurate predicted state values.

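As a concrete reference for the linear base learner that enters this combination, one step of the Kalman recursion in Equations (6) to (11) of subsection IV.A can be sketched as follows. The toy system is an assumption made for illustration only: a constant two-dimensional state, an identity transition matrix `F` (the paper obtains `F` from a vector auto-regression model, which is not reproduced here), and hand-picked `H`, `W` and `R`.

```python
import numpy as np

def kalman_step(x_prev, C_prev, z, F, H, W, R):
    """One prediction + filtering step, following Equations (6) to (10)."""
    # Prediction component
    x_pred = F @ x_prev                           # Eq. (6)
    C_pred = F @ C_prev @ F.T + W                 # Eq. (7)
    # Filtering component
    innovation_cov = H @ C_pred @ H.T + R
    K = C_pred @ H.T @ np.linalg.inv(innovation_cov)  # Eq. (8)
    x_filt = x_pred + K @ (z - H @ x_pred)        # Eq. (9)
    C_filt = C_pred - K @ H @ C_pred              # Eq. (10)
    return x_filt, C_filt

# Toy system (all values are illustrative assumptions).
F = np.eye(2)                                     # assumed state transition
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])                       # assumed measurement matrix
W = 1e-6 * np.eye(2)                              # assumed state-noise covariance
R = 0.01 * np.eye(4)                              # assumed measurement-noise covariance
x_true = np.array([1.0, -0.5])

rng = np.random.default_rng(2)
x_est, C_est = np.zeros(2), np.eye(2)             # initial guess and its covariance
for _ in range(50):
    z = H @ x_true + rng.normal(0.0, 0.1, 4)      # noisy measurements
    x_est, C_est = kalman_step(x_est, C_est, z, F, H, W, R)

z_pre = H @ x_est                                 # predicted measurements, Eq. (11)
print("estimated state:", x_est)
print("predicted measurements:", z_pre)
```

The explicit matrix inverse mirrors Equation (8); for larger systems a linear solve would be the more usual numerical choice.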


  method in reliability engineering and failure analysis
z1 w
, b1,1
1,1
1 z obs [36].
w1,M , b1,M 
Kalman filter
 1   Finally, the corresponding threshold value is determined
under a given false alarm rate.
...

...
z1_ pre


M
M

zM

...
zM1 M1
δ M1

zM _ pre
RNN

...

...
δ
z2M 2M
2M

Fig.4.The combination model adaptively weighting of Kalman filter and RNN


predictors through fully connected layer and BP module.
As shown in Fig.4, in training phase, using small amount of
Fig.5.The Cumulative Distribution (CD) of SSEs and the fitted Weibull
training data (i.e., measurements data without being attacked), function of SSEs in IEEE 14 as an example to explain the detection threshold.
KFRNN trains the combination model to automatically and Fig.5 exemplifies how to determine the FDIA detection
adaptively assign the weight of the two predicted threshold. First, the acceptable false alarm rate, namely the
measurements, and output the finally predicted measurements. false positive rate (FPR), is determined, e.g., FPR=0.05
In detail, firstly, the predicted measurements from Kalman shown in Fig.5; then the designated (1-FPR), i.e., 0.95, is
filter component and RNN component are sent to the fully connected layer, which combines the two components. Then, based on the loss between the observed and the combined measurement values, the BP algorithm adaptively adjusts the weights in the fully connected layer. The formal description of the above procedure is given in Equations (17) to (19).

$$\sigma_n = \sigma\left(w_{1,n} z_1 + b_{1,n} + \cdots + w_{M,n} z_M + b_{M,n} + w_{M+1,n} z_{M+1} + b_{M+1,n} + \cdots + w_{2M,n} z_{2M} + b_{2M,n}\right) \tag{17}$$

$$w'_{n,m} = w_{n,m} + \eta\,\delta_m \frac{d\sigma_m}{dw_{n,m}} z_n \tag{18}$$

$$b'_{n,m} = b_{n,m} + \eta\,\delta_m \frac{d\sigma_m}{db_{n,m}} z_n \tag{19}$$

$$n, m \in [1, 2M]$$

where $\sigma$ stands for the activation function (here, the sigmoid function is used); $w$ stands for weight, $b$ for bias, and $\eta$ for step length; $M$ is the number of measurements; and $\delta$ is the error between the observed measurements $z_{obs}$ and the finally predicted measurements $z_{pre}$.

D. Determining the Dynamic FDIA Detection Threshold

A data-driven sum of square errors (SSEs) method is used to dynamically determine the detection threshold of FDIA [36]. In SSEs, the errors are the differences between the finally predicted measurements and the observed measurements. Specifically, the dynamic threshold is determined by the following steps.
• Firstly, the finally predicted measurements are obtained through our proposed KFRNN model, then the errors between the finally predicted measurements and the observed measurements are calculated, and then the sum of square errors (SSEs) is obtained.
• Secondly, the sample values of SSEs are fitted into a Weibull distribution function, which is a common

taken as the ordinate value of the fitting function to find the corresponding abscissa value, which is regarded as the threshold value, i.e., 0.65567. In the FDIA detection phase, if the sum of square errors at a certain time is larger than the threshold, then an FDIA attack is reported; otherwise there is no FDIA.

The dynamic nature of the threshold is reflected in the fact that the threshold can be determined from the false alarm rate allowed at the current moment, rather than being a static value, which suits practical requirements. Intuitively, the larger the threshold is, the lower the corresponding detection probability could be, since a larger threshold ignores some small attacks; on the other hand, the false alarm rate is also lower, and vice versa.

V. EXPERIMENTAL RESULTS

A. Simulation Settings and FDIA Attack Model

1) Simulation settings

In this section, we perform comprehensive simulations to evaluate the performance of the proposed KFRNN scheme. Considering that it is extremely dangerous to conduct FDIA attacks in a real smart grid, we adopt the MATPOWER [37] simulation tool to generate the smart grid topology and the operational data, which is a commonly used simulation model in the grid field [27]. The simulation rationale is that the data in MATPOWER are selected from real grid data; for example, the IEEE 14 Bus Test Case and the IEEE 57 Bus Test Case each represent a portion of the American Electric Power System (in the Midwestern US).

The specific grid systems and their corresponding data contained in MATPOWER are used to characterize the operation of a real-world grid. In our simulations, the state variables are voltage angles of all buses, and the meter measurements are real power injections of all buses and real
power flows of all branches. The generated data consist of 2,000 samples, divided into four equal parts: three for training and one for testing. The three training parts are used for training the RNN, training the BP module, and constructing the SSE fitting curve, respectively.

Fig. 6. Illustration of the first 100 samples of bus 5-9 state data (i.e., voltage angle) in IEEE 14.

Fig. 6 graphically shows the first 100 samples of bus 5-9 training state data in IEEE 14, in which the shown state of the grid is the argument of the complex voltage. Intuitively, we can observe that it contains abundant linear and non-linear components. Moreover, Fig. 6 illustrates that the training state data are fluctuating, which is in line with the fluctuation of the actual voltage argument.

2) FDIA attack model

Generally, the FDIA signal is injected into the measurements, modeled as an additive signal which is added to the measurement sensor readings. This additive signal can be of any value, and throughout the literature, the scaling attack, ramp attack, step attack, and general random attack, etc. are usually used [38].

In our work, instead of the random attack model, where the attacker simply manipulates the sensor readings by inserting a randomly generated attack vector, the stealthy false data injection attack model is used, which, in a sense, can be regarded as a special random attack. This model assumes that the attackers know the system topology represented by the Jacobian matrix $H$ and are able to compromise a limited number of measurements.

Specifically, in our simulations, first the compromised state $c$ is randomly selected, then the corresponding measurements are compromised by the injected measurement attack set as $a = Hc$. The rationale of using the above FDIA attack model lies in the following fact. From Equations (2) to (5), we can observe that the traditional BDD cannot detect the intentionally designed FDIA attack, whereas other brute-force attacks can be easily detected by the existing BDD method. The stealthy FDIA attack model used here was first proposed in [2].

Specifically, the compromised state vector $c = [0, \dots, c_i, \dots, 0]$, $i \in [2, N]$, is generated randomly, in which $c_i$ follows a Gaussian distribution with a mean of 0 and a variance between 0.5 and 8 (commonly set as 6), and the position of $c_i$ is randomly selected. The state attack variance is set in this way for the following reason. Let $\sigma_{x_a}^2$ and $\sigma_x^2$ denote the variances of the elements of $x_a$ and $x$, respectively; it is reasonable to assume $10\log(\sigma_c^2/\sigma_x^2) = 3$ dB, i.e., the variance of the state vector contributed by the injected vector $c$ is assumed to be 3 dB higher than that contributed by the original state data [39]. Empirically, the variance of the state (i.e., the voltage angles of all buses) in our data is between 1 and 3. Therefore, the variance is set as 6. Note that we also conduct simulations to evaluate the impact of the injected state attack variance on the detection probability, as shown in Fig. 14. Note also that Gaussian measurement noise with zero mean and variance 1 (the empirical variance of the real power) is used.

In summary, the compromised measurements are given as $z_a = Hx + n + a$, where $Hx$ is the real measurement, $n$ is the measurement noise, and $a$ is the injected measurement attack. Since $a$ is randomly sampled from the Gaussian distribution with varied variance, it is, in a sense, a generalization of the scaling attack, ramp attack, and step attack.

In addition, similar to [10], the number of hidden layer cells of the LSTM is 100, and the numbers of input layer, output layer and fully connected layer cells depend on the smart grid model used. In this paper, KFRNN, R2N2_variant, R2N2 [10], RNNWide [8] and CNNWide [9] are analyzed and compared in the IEEE 14 model. Note that, to illustrate the applicability of the scheme to larger networks, we also provide the performance evaluation on the IEEE 57 model.

B. Performance of Prediction Accuracy of Various Schemes

First, we illustrate the measurement prediction accuracy of these five schemes in terms of the mean relative square error (MRSE) function. MRSE is defined in Equation (20).

$$MRSE = \frac{\sum_{t=1}^{T}\sum_{k=1}^{N-1}\left(x_{kt}-\hat{x}_{kt}\right)^2}{\sum_{t=1}^{T}\sum_{k=1}^{N-1}\left(x_{kt}-mean(x_k)\right)^2} \tag{20}$$

where $x_{kt}$ represents the observed state of the $k$-th dimension at time $t$, which is obtained by using the minimum mean squared error described in Section II; $\hat{x}_{kt}$ is the finally predicted state of the $k$-th dimension at time $t$; $N-1$ represents the dimension of the states; and $T$ is the total number of predicted time points. In brief, the smaller the MRSE value is, the better the scheme performs.

Fig. 7. Prediction performance (i.e., MRSE) of various schemes including KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 14.
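As a rough illustration of the MRSE metric in Equation (20), the following minimal sketch computes it for states stored as nested lists indexed as `[t][k]`. The function name `mrse` and the data layout are assumptions made for this example only; they are not taken from our simulation code.

```python
from typing import Sequence


def mrse(observed: Sequence[Sequence[float]],
         predicted: Sequence[Sequence[float]]) -> float:
    """Mean relative square error; observed[t][k] is state k at time t."""
    T = len(observed)
    dims = len(observed[0])
    # Per-dimension mean of the observed states over all time points.
    means = [sum(observed[t][k] for t in range(T)) / T for k in range(dims)]
    # Numerator: squared prediction errors summed over time and dimensions.
    num = sum((observed[t][k] - predicted[t][k]) ** 2
              for t in range(T) for k in range(dims))
    # Denominator: squared deviations of the observed states from their mean.
    den = sum((observed[t][k] - means[k]) ** 2
              for t in range(T) for k in range(dims))
    if den == 0.0:
        raise ValueError("observed states are constant; MRSE is undefined")
    return num / den


if __name__ == "__main__":
    obs = [[1.0, 2.0], [3.0, 6.0]]
    print(mrse(obs, obs))                       # perfect prediction
    print(mrse(obs, [[2.0, 4.0], [2.0, 4.0]]))  # always predicting the mean
```

Under this definition, a predictor that always outputs the per-dimension mean scores 1, so values well below 1 indicate that the scheme captures the dynamics of the states.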

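The stealthy attack construction described in the FDIA attack model ($a = Hc$ with a single randomly placed Gaussian entry in $c$, and $z_a = Hx + n + a$) can be sketched as follows. This is only a minimal sketch: the helper names `stealthy_attack` and `compromise`, the list-of-lists representation of $H$, and the choice of picking the attacked position uniformly among the columns of $H$ are assumptions made for illustration.

```python
import math
import random


def stealthy_attack(H, variance=6.0, rng=None):
    """Return (c, a): a one-entry state attack c and the measurement attack a = Hc."""
    rng = rng or random.Random()
    n_states = len(H[0])
    c = [0.0] * n_states
    # Assumption: the attacked state position is drawn uniformly at random.
    i = rng.randrange(n_states)
    # c_i is zero-mean Gaussian with the given variance.
    c[i] = rng.gauss(0.0, math.sqrt(variance))
    # Measurement attack a = H c (plain matrix-vector product).
    a = [sum(row[j] * c[j] for j in range(n_states)) for row in H]
    return c, a


def compromise(z, a):
    """Add the attack to the noisy measurements z = Hx + n, giving z_a."""
    return [zi + ai for zi, ai in zip(z, a)]


if __name__ == "__main__":
    H = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # toy Jacobian, not a real grid
    c, a = stealthy_attack(H, variance=6.0, rng=random.Random(0))
    print(c, a, compromise([1.0, 2.0, 3.0], a))
```

Because $a$ lies in the column space of $H$, the residual-based BDD sees the compromised measurements as consistent with a shifted state, which is why this model is used instead of a brute-force random attack.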
Fig. 8. Prediction performance (i.e., MRSE) of various schemes including KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 57.

Fig. 7 and Fig. 8 respectively show the MRSE values of various schemes including KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2, under the scenarios of the IEEE 14 and 57 buses. In all scenarios, KFRNN has the best prediction performance (i.e., the minimal MRSE value), R2N2_variant is the second, and the performance of R2N2 is the worst. These results verify our conjecture about the weak point of R2N2, that is, the deviation caused by the first linear prediction component directly affects the performance of the following prediction component, which in turn hampers the overall performance. Meanwhile, this also verifies the design rationality of the proposed KFRNN structure.

In terms of the comparison between RNNWide and CNNWide, we can observe an interesting phenomenon: the CNNWide scheme has opposite performance under the IEEE 14 and IEEE 57 scenarios. Basically, CNN is suitable for processing data with spatial structure; however, the two-dimensional tensor representation that is rigidly converted from the time series data neither has a reasonable spatial structure, nor reflects the essential nature of time series data. If the pieced two-dimensional tensor occasionally conforms to the characteristics of CNN, the performance may be better, as shown in Fig. 7. If not, the performance is worse, as shown in Fig. 8. Therefore, CNN is not well suited to time series data.

C. Performance of FDIA Detection of Various Schemes

Our work adopts the widely used receiver operating characteristic (ROC) curve [10] to compare various schemes including our proposed KFRNN and R2N2_variant schemes. In detail, the ROC curve takes the false positive rate as the abscissa and the true positive rate as the ordinate. Note that, for ease of understanding, the false positive rate is also called the false alarm rate in our paper, and the true positive rate is intuitively named the detection probability. These metrics are defined as follows.

$$False\ Alarm\ Rate = \frac{FP}{FP + TN}$$

$$Detection\ Probability = \frac{TP}{TP + FN}$$

where FP (False Positive) represents the number of normal data that are considered false, TN (True Negative) is the number of normal data that are considered normal, TP (True Positive) represents the number of false data successfully detected, and FN (False Negative) represents the number of false data missed.

Note that ROC is an appropriate performance metric for FDIA detection because, in a SCADA environment, false negatives are unacceptable and a low false positive rate is desired. That is, the FDIA detection scheme should make FN as low as possible and TP as high as possible, while it is acceptable to have a low FP, due to the fact that a small number of normal data considered to be false (i.e., FP) cannot harm SCADA much; but if FN is relatively large, or TP is relatively small, the SCADA system will be broken.

Fig. 9. The detection probability of KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 14.

Fig. 9 shows the ROC curves of KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 14. It can be seen that KFRNN has the highest detection probability, followed by R2N2_variant, CNNWide, RNNWide and R2N2. Note that the performance of KFRNN is slightly better than that of R2N2_variant, which is consistent with the results shown in Fig. 7.

The reason why KFRNN has the best performance is as follows. KFRNN fully considers the linear and non-linear components in the smart grid data and uses the Kalman filter and the RNN to predict independently, which ensures that the two parts do not interfere with each other in the prediction process. Then, the predicted values of the two parts are fed into the specially trained neural network composed of the fully connected layer and BP, so the two parts are automatically assigned appropriate weights and integrated to reduce the deviation generated in the prediction.

It should be explicitly pointed out that, although both R2N2 and R2N2_variant use the same functional components (vector auto-regression and RNN), they have different structures, which is the main reason why R2N2_variant performs better than R2N2.

KFRNN, RNNWide and CNNWide all use the idea of ensemble learning to combine multiple prediction models to achieve better performance than a single model. Since RNNWide and CNNWide only adopt the non-linear way (i.e., neural networks) for prediction and do not consider the linear component of smart grid data, their performance is inferior to that of KFRNN. On the contrary, KFRNN gives full consideration to the linear and non-linear components in the smart grid data, and uses the Kalman filter and the RNN to deal with them respectively, which yields better performance.

In ensemble learning, the key is how to appropriately integrate multiple prediction components. In the above three schemes, back propagation is used to fuse the prediction components, which, instead of a simple averaging method, can adaptively assign weights to each component and obtain a more accurate output. The difference is that KFRNN adds a fully connected layer to better integrate the different models and improve performance. The use of the BP module to adaptively weight multiple predictors is another main reason why KFRNN achieves higher performance than R2N2.
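The adaptive fusion described above can be illustrated with a minimal sketch of a single fully connected layer that takes the $M$ Kalman filter predictions and the $M$ RNN predictions as its $2M$ inputs and is trained by gradient steps in the spirit of Equations (17) to (19). Several points are assumptions of this sketch rather than details of our implementation: the class name `FullyConnectedCombiner`, the choice of $M$ outputs, measurements normalized to $(0, 1)$ so that a sigmoid output can match them, one bias per output, and a bias update that uses the standard gradient without an input factor.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class FullyConnectedCombiner:
    """Second-level learner: fuses two M-dimensional base predictions."""

    def __init__(self, n_meas: int, eta: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        self.M = n_meas
        self.eta = eta  # step length
        # w[n][m]: weight from input n (0..2M-1) to output m (0..M-1).
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_meas)]
                  for _ in range(2 * n_meas)]
        self.b = [0.0] * n_meas

    def forward(self, z_kf, z_rnn):
        z = list(z_kf) + list(z_rnn)  # concatenate both base predictions
        out = [sigmoid(sum(self.w[n][m] * z[n] for n in range(2 * self.M))
                       + self.b[m]) for m in range(self.M)]
        return z, out

    def train_step(self, z_kf, z_rnn, z_obs) -> float:
        """One BP update; returns the squared error before the update."""
        z, out = self.forward(z_kf, z_rnn)
        loss = 0.0
        for m in range(self.M):
            delta = z_obs[m] - out[m]                # error on output m
            grad = delta * out[m] * (1.0 - out[m])   # error times sigmoid slope
            for n in range(2 * self.M):
                self.w[n][m] += self.eta * grad * z[n]
            self.b[m] += self.eta * grad             # assumption: no z_n factor
            loss += delta ** 2
        return loss


if __name__ == "__main__":
    comb = FullyConnectedCombiner(n_meas=2)
    for _ in range(2000):
        loss = comb.train_step([0.2, 0.7], [0.3, 0.6], [0.25, 0.65])
    print(loss, comb.forward([0.2, 0.7], [0.3, 0.6])[1])
```

The point of the sketch is only that the weights given to the two base learners are learned from the prediction error rather than fixed in advance, as they would be with simple averaging.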

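The false alarm rate and detection probability used for the ROC curves in this subsection can be computed from per-sample SSE values and a candidate threshold, using the detection rule that a sample is flagged when its SSE exceeds the threshold. The following is a minimal sketch; the function names and the toy data are hypothetical and chosen only for illustration.

```python
def confusion(sse, is_attack, threshold):
    """Count (TP, FP, TN, FN) when a sample is flagged if its SSE > threshold."""
    tp = fp = tn = fn = 0
    for s, attacked in zip(sse, is_attack):
        flagged = s > threshold
        if flagged and attacked:
            tp += 1
        elif flagged and not attacked:
            fp += 1
        elif not flagged and not attacked:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(sse, is_attack, threshold):
    """Return (false alarm rate, detection probability) for one threshold."""
    tp, fp, tn, fn = confusion(sse, is_attack, threshold)
    far = fp / (fp + tn) if (fp + tn) else 0.0
    dp = tp / (tp + fn) if (tp + fn) else 0.0
    return far, dp


def roc_points(sse, is_attack, thresholds):
    """One (false alarm rate, detection probability) point per threshold."""
    return sorted(rates(sse, is_attack, thr) for thr in thresholds)


if __name__ == "__main__":
    sse = [0.1, 0.2, 0.9, 1.5, 0.7, 0.3]
    is_attack = [False, False, True, True, False, True]
    print(rates(sse, is_attack, 0.65))
    print(roc_points(sse, is_attack, [0.0, 0.25, 0.65, 1.0, 2.0]))
```

Sweeping the threshold from large to small moves the operating point from the lower left to the upper right of the ROC plane, which mirrors the trade-off between detection probability and false alarm rate discussed for the dynamic threshold.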
Fig. 10. The detection probability of KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 57.

In practical usage, the proposed scheme should perform consistently as the grid size varies. We therefore compare the performance of these five schemes on the IEEE 57 bus system as an extension, as shown in Fig. 10. We can see that the results are consistent with those in Fig. 8, except for CNNWide. There are two reasons for this situation: on the one hand, CNN is not suitable for processing time series data; on the other hand, as the number of buses in the smart grid increases, the two-dimensional tensor expands, leading to the decline of CNN processing capacity.

In brief, our proposed KFRNN has the best performance, and the designed R2N2_variant also performs better than RNNWide, CNNWide and the traditional R2N2.

To further analyze these schemes, the $F_1$-score is used as the statistical metric, which is a comprehensive index balancing the detection probability and the precision rate. The $F_1$-score is defined as follows.

$$F_1\text{-score} = \frac{2 \times Detection\ Probability \times Precision}{Detection\ Probability + Precision}$$

where $Precision = \frac{TP}{TP+FP}$.

Fig. 11. The $F_1$-score of KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 14.

Fig. 12. The $F_1$-score of KFRNN, R2N2_variant, RNNWide, CNNWide and R2N2 in IEEE 57.

Fig. 11 and Fig. 12 show the $F_1$-score of the five schemes in IEEE 14 and IEEE 57, respectively. The $F_1$-score of KFRNN is the best, slightly better than that of R2N2_variant, and significantly larger than those of RNNWide, CNNWide and R2N2.

In summary, the simulations demonstrate that our proposed KFRNN has the best performance, and our re-designed R2N2_variant also performs better than RNNWide, CNNWide and the traditional R2N2.

For a comprehensive comparison, the accuracy under varying false alarm rate is given in Fig. 13. The accuracy metric is defined as follows.

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$$

Fig. 13. The accuracy of KFRNN, R2N2_variant, RNNWide, and R2N2 in IEEE 14.

As shown in Fig. 13, our proposed KFRNN and R2N2_variant outperform the other schemes. Moreover, interestingly, the accuracy of all schemes decreases with the increase of the false alarm rate (i.e., false positive rate). The reason is as follows. A larger false alarm rate means a smaller detection threshold, which may increase TP a little, but causes more normal data to be mistakenly classified as false data, that is, correspondingly decreasing TN and increasing FP. Considering that the number of normal samples is much larger than that of FDIA samples, the accuracy decreases gradually as the false alarm rate increases. The result also implies that, instead of the traditional accuracy metric, the ROC curve is an appropriate metric for evaluating FDIA detection performance, for the designer can naturally bear low
false positive rate, to increase the detection probability, i.e., true positive rate.

Fig. 14. The detection probability with the change of injection attack vector variance.

Fig. 14 shows the detection probabilities of various schemes when the false alarm rate is 0.05 and the injection state attack vector variance changes from 0.5 to 8. The following observations can be obtained. First, under all attack magnitudes (characterized by the injection attack vector variance), our proposed KFRNN and R2N2_variant outperform the other schemes. Second, when the attack variance is small (for example, less than 1), all schemes have relatively low detection probability. The reason is straightforward: a small attack variance generally means a small injected attack magnitude (since the mean of the attack is 0), so the measurement residual may be lower than the detection threshold. However, when the attack variance is recognizable, as shown in the inner figure in Fig. 14, our proposed schemes can achieve good performance.

D. The theoretical implication of our proposals

In an ensemble, the combination of the outputs of several base learners is useful if they disagree on some inputs. The disagreement is measured as the diversity/ambiguity of the ensemble. It is shown that the generalization error $E$ of the ensemble can be expressed as the following equation [40].

$$E = \bar{E} - \bar{D} \tag{21}$$

where $\bar{E}$ and $\bar{D}$ are the mean error and diversity of the ensemble, respectively. This result implies that increasing ensemble diversity, while maintaining the average error of ensemble members, should decrease the ensemble error.

We use the disagreement of an ensemble member with the ensemble's prediction as a measure of diversity. More precisely, if $C_i(s)$ is the prediction of the $i$-th predictor for the sample $s$, and $C^*(s)$ is the prediction of the entire ensemble, then the diversity of the $i$-th predictor on example $s$ is given by

$$d_i(s) = \left|C_i(s) - C^*(s)\right| \tag{22}$$

To compute the diversity of an ensemble of size $n$ (in our proposed schemes KFRNN and R2N2_variant, $n=2$) on a test set of size $m$, we average Equation (22) and obtain the diversity of each scheme, represented as Equation (23).

$$D = \frac{1}{n \cdot m}\sum_{i=1}^{n}\sum_{j=1}^{m} d_i(s_j) \tag{23}$$

Treating our proposed KFRNN and R2N2_variant as two ensembles, we obtain the following experimental results: $D_{KFRNN} = 0.56$ and $D_{R2N2\_variant} = 0.14$. According to Equation (21), the result demonstrates that KFRNN should theoretically perform better than R2N2_variant, which is experimentally verified by our simulation results.

VI. CONCLUSION

Nowadays, the smart grid faces many threats which may cause huge economic losses, especially FDIA. To effectively detect FDIA, inspired by parallel ensemble learning, especially the stacking method, the KFRNN scheme is proposed in this paper, which fully takes into account the linear and non-linear components in the smart grid data. Specifically, KFRNN uses the Kalman filter and the RNN as two base learners to respectively fit the corresponding linear and non-linear data features of the smart grid; then the second-level meta-learner adaptively combines the Kalman filter and RNN predictors through a specially designed neural network. Furthermore, to verify the advantage of our design framework, we re-structure the typical residual recurrent neural network (R2N2) as R2N2_variant for FDIA detection. Then, the proposed scheme is comprehensively compared with other neural network based and ensemble learning based schemes, namely, RNNWide, CNNWide and R2N2. The simulation results demonstrate that, in terms of detection probability, $F_1$-score and accuracy, KFRNN achieves the best performance for various grid network sizes including IEEE 14 and 57 buses, and the re-designed R2N2_variant also performs better than RNNWide, CNNWide and R2N2. Moreover, through analyzing the diversity of the ensemble, we demonstrate the underlying reason why KFRNN performs better than R2N2_variant.

REFERENCES

[1] A. Muir and J. Lopatto, "Final report on the August 14, 2003 blackout in the United States and Canada: causes and recommendations," US–Canada Power System Outage Task Force, Canada, 2004.
[2] Y. Liu, P. Ning, and M.K. Reiter, "False data injection attacks against state estimation in electric power grids," ACM Transactions on Information and System Security, vol. 14, no. 1, May 2011, Art. no. 13.
[3] S. K. Singh, K. Khanna, R. Bose, B.K. Panigrahi, and A. Joshi, "Joint-Transformation-Based Detection of False Data Injection Attacks in Smart Grid," IEEE Transactions on Industrial Informatics, vol. 14, no. 1, pp. 89-97, Jan. 2018.
[4] G. Liang, J. Zhao, F. Luo, S.R. Weller, and Z.Y. Dong, "A Review of False Data Injection Attacks Against Modern Power Systems," IEEE Transactions on Smart Grid, vol. 8, no. 4, July 2017.
[5] A.S. Musleh, G. Chen, and Z.Y. Dong, "A Survey on the Detection Algorithms for False Data Injection Attacks in Smart Grids," IEEE Transactions on Smart Grid, vol. 11, no. 3, pp. 2218-2234, May 2020.
[6] A. Amulya and K.S. Swarup, "Analysis of False Data Injection Attacks on Multiarea Load Frequency Control," in proceedings of the 8th International Conference on Power Systems (ICPS), Jaipur, India, 2019, pp. 1-6.
[7] G. Liang, S. R. Weller, J. Zhao, F. Luo, and Z.Y. Dong, "The 2015 Ukraine blackout: Implications for false data injection attacks," IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 3317–3318, July 2017.
[8] Y. Wang, D. Chen, C. Zhang, X. Chen, B. Huang, and X. Cheng, "Wide and Recurrent Neural Networks for Detection of False Data Injection in Smart Grids," in proceedings of the International Conference on Wireless Algorithms, Systems, and Applications, Springer, Cham, 2019, pp. 335-345.
[9] Z. Zheng, Y. Yang, X. Niu, H. Dai, and Y. Zhou, "Wide and Deep Convolutional Neural Networks for Electricity-Theft Detection to Secure Smart Grids," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606-1615, April 2018.
[10] Y. Wang, W. Shi, Q. Jin, and J. Ma, "An Accurate False Data Detection in Smart Grid Based on Residual Recurrent Neural Network and Adaptive threshold," in proceedings of IEEE International Conference on Energy Internet (ICEI), Nanjing, China, 2019, pp. 499-504.
[11] M.H. Haghighat and J. Li, "Intrusion detection system using voting-based neural network," in Tsinghua Science and Technology, vol. 26, no. 4, pp. 484-495, Aug. 2021.
[12] L. Sun, S. Sun, T. Wang, J. Li and J. Lin, "Parallel ADR detection based on spark and BCPNN," in Tsinghua Science and Technology, vol. 24, no. 2, pp. 195-206, April 2019.
[13] Y. Lv et al., "A classifier using online bagging ensemble method for big data stream learning," in Tsinghua Science and Technology, vol. 24, no. 4, pp. 379-388, Aug. 2019.
[14] G. Xi, X. Zhao, Y. Liu, J. Huang and Y. Deng, "A hierarchical ensemble learning framework for energy-efficient automatic train driving," in Tsinghua Science and Technology, vol. 24, no. 2, pp. 226-237, April 2019.
[15] W. Shi, Y. Wang, Q. Jin, and J. Ma, "PDL: An Efficient Prediction-Based False Data Injection Attack Detection and Location in Smart Grid," in proceedings of IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, 2018, pp. 676-681.
[16] M. Ozay, I. Esnaola, F.T. YarmanVural, S.R. Kulkarni, and H.V. Poor, "Machine Learning Methods for Attack Detection in the Smart Grid," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 8, pp. 1773-1786, Aug. 2016.
[17] G. Pu, L. Wang, J. Shen and F. Dong, "A hybrid unsupervised clustering-based anomaly detection method," in Tsinghua Science and Technology, vol. 26, no. 2, pp. 146-153, April 2021.
[18] Y. He, G. J. Mendis, and J. Wei, "Real-Time Detection of False Data Injection Attacks in Smart Grid: A Deep Learning-Based Intelligent Mechanism," IEEE Transactions on Smart Grid, vol. 8, no. 5, pp. 2505-2516, Sept. 2017.
[19] A. Ayad, H.E.Z. Farag, A. Youssef, and E.F. El-Saadany, "Detection of false data injection attacks in smart grids using Recurrent Neural Networks," in proceedings of IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, 2018, pp. 1-5.
[20] Q. Yang, L. Chang, and W. Yu, "On false data injection attacks against Kalman filtering in power system dynamic state estimation," Security and Communication Networks, vol. 9, no. 9, pp. 833-849, June 2016.
[21] D.B. Rawat and C. Bajracharya, "Detection of False Data Injection Attacks in Smart Grid Communication Systems," IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1652-1656, Oct. 2015.
[22] K. Manandhar, X. Cao, F. Hu, and Y. Liu, "Detection of Faults and Attacks Including False Data Injection Attack in Smart Grid Using Kalman Filter," IEEE Transactions on Control of Network Systems, vol. 1, no. 4, pp. 370-379, Dec. 2014.
[23] S. Wang, S. Bi, and Y.-J. A. Zhang, "Locational Detection of the False Data Injection Attack in a Smart Grid: A Multilabel Classification Approach," IEEE Internet of Things Journal, vol. 7, no. 9, pp. 8218-8227, Sept. 2020.
[24] Y. Deng, K. Zhu, R. Wang, and Y. Wan, "Real-time Detection of False Data Injection Attacks Based on Load Forecasting in Smart Grid," in proceedings of IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Beijing, China, 2019, pp. 1-6.
[25] S.A. Foroutan and F.R. Salmasi, "Detection of false data injection attacks against state estimation in smart grids based on a mixture Gaussian distribution learning method," IET Cyber-Physical Systems: Theory & Applications, vol. 2, no. 4, pp. 161-171, Dec. 2017.
[26] E. Drayer and T. Routtenberg, "Detection of False Data Injection Attacks in Smart Grids Based on Graph Signal Processing," IEEE Systems Journal, vol. 14, no. 2, pp. 1886-1896, June 2020.
[27] Y. Li, R. Huang, and L. Ma, "False Data Injection Attack and Defense Method on Load Frequency Control," IEEE Internet of Things Journal, Available online, doi: 10.1109/JIOT.2020.3021429.
[28] Y. Li, Y. Wang, and S. Hu, "Online Generative Adversary Network Based Measurement Recovery in False Data Injection Attacks: A Cyber-Physical Approach," IEEE Transactions on Industrial Informatics, vol. 16, no. 3, pp. 2031-2043, Mar. 2020.
[29] A. Abbaspour, A. Sargolzaei, P. Forouzannezhad, K.K. Yen, and A.I. Sarwat, "Resilient Control Design for Load Frequency Control System Under False Data Injection Attacks," IEEE Transactions on Industrial Electronics, vol. 67, no. 9, pp. 7951-7962, Sept. 2020.
[30] O. Sagi and L. Rokach, "Ensemble learning: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, pp. e1249, Feb. 2018.
[31] J. Zhang, Z. Li, K. Nai, Y. Gu, and A. Sallam, "DELR: A double-level ensemble learning method for unsupervised anomaly detection," Knowledge-Based Systems, vol. 181, no. 1, pp. 104783, Oct. 2019.
[32] E.L. Bullock, C.E. Woodcock, and C.E. Holden, "Improved change monitoring using an ensemble of time series algorithms," Remote Sensing of Environment, vol. 238, no. 1, pp. 111165, Mar. 2020.
[33] M.H. Alobaidi, F. Chebana, and M.A. Meguid, "Robust ensemble learning framework for day-ahead forecasting of household based energy consumption," Applied energy, vol. 212, no. 15, pp. 997-1012, Feb. 2018.
[34] M. Woźniak, M. Grana, and E. Corchado, "A survey of multiple classifier systems as hybrid systems," Information Fusion, vol. 16, pp. 3-17, Mar. 2014.
[35] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[36] S. Mousavian, J. Valenzuela, and J. Wang, "Real-time data reassurance in electrical power systems based on artificial neural networks," Electric Power Systems Research, vol. 96, pp. 285-295, Mar. 2013.
[37] R.D. Zimmerman, C.E. Murillo-Sánchez, and R.J. Thomas, "MATPOWER: Steady-State Operations, Planning, and Analysis Tools for Power Systems Research and Education," IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12-19, Feb. 2011.
[38] S. Sridhar and M. Govindarasu, "Model-based attack detection and mitigation for automatic generation control," IEEE Transactions on Smart Grid, vol. 5, no. 2, pp. 580–591, 2014.
[39] Z. Yu and W. Chin, "Blind False Data Injection Attack Using PCA Approximation Method in Smart Grid," IEEE Transactions on Smart Grid, vol. 6, no. 3, pp. 1219-1226, May 2015.
[40] P. Melville and R.J. Mooney, "Creating diversity in ensembles using artificial data," Information Fusion, vol. 6, no. 1, pp. 99-111, March 2005.

Yufeng Wang (M'16) is currently a Professor at Nanjing University of Posts and Telecommunications, China. From March 2008 to April 2011, he acted as Expert Researcher in the National Institute of Information and Communications Technology (NICT), Japan. He is a guest researcher at the Advanced Research Center for Human Sciences, Waseda University, Japan. His research interests focus on cyber-physical-social systems, and data sciences, etc.

Zhihao Zhang is currently working toward the master's degree in telecommunications and information engineering with the Nanjing University of Posts and Telecommunications, Nanjing, China. His main research interests include deep learning and artificial intelligence, and their applications in Energy Internet.

Jianhua Ma (M'91) is currently a professor at the Digital Media Department in the Faculty of Computer and Information Sciences, Hosei University, Japan. Dr. Ma is a member of IEEE and ACM. He has edited 10 books/proceedings, and published more than 150 academic papers in journals, books and conference proceedings. His research interest is ubiquitous computing.

Qun Jin (M'95–SM'17) is a professor at the Networked Information Systems Laboratory, Department of Human Informatics and Cognitive Sciences, Faculty of Human Sciences, Waseda University, Japan. He has been extensively engaged in research works in the fields of computer science, information systems, and human informatics. His recent research interests cover human-centric ubiquitous computing, behavior and cognitive informatics, big data, personal analytics and individual modeling, cyber security, blockchain, intelligence computing and applications in healthcare, and computing for well-being. He authored or co-authored several monographs and more than 300 refereed papers published in academic journals and international conference proceedings. He is a foreign fellow of the Engineering Academy of Japan (EAJ).