
JID: FI

ARTICLE IN PRESS [m1+;December 6, 2018;7:30]

Available online at www.sciencedirect.com

Journal of the Franklin Institute xxx (xxxx) xxx


www.elsevier.com/locate/jfranklin

Data-driven attacks and data recovery with noise on state estimation of smart grid
Qinxue Li a,b, Shanbin Li a, Bugong Xu a,∗, Yonggui Liu a
a School of Automation Science and Engineering, South China University of Technology, Guangzhou 510640, China
b College of Computer and Electronic Information, Guangdong University of Petrochemical Technology, Maoming
525000, China
Received 17 March 2018; received in revised form 10 September 2018; accepted 15 October 2018
Available online xxx

Abstract
In this paper, we focus on the false data injection attacks (FDIAs) on state estimation and correspond-
ing countermeasures for data recovery in smart grid. Without the information about the topology and
parameters of systems, two data-driven attacks (DDAs) with noisy measurements are constructed, which
can evade the residue-based bad data detection (BDD) in the state estimator. Moreover,
in view of the limited energy of adversaries, the feasibility of the proposed DDAs is improved: they are
sparser and lower-cost than the DDAs in existing work. In addition, a new algorithm for measurement data
recovery is introduced, which converts the data recovery problem against the DDAs into a problem
of low rank approximation with corrupted and noisy measurements. In particular, an online low rank
approximation algorithm is employed to improve the real-time performance. Finally, the
IEEE 14-bus power system is used for the simulation experiments. The results show that
the constructed DDAs are stealthy under BDD but can be eliminated by the proposed data recovery
algorithms, which improve the resilience of the state estimator against the attacks.
© 2018 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.

1. Introduction

The smart grid is a typical cyber physical system (CPS) [1,2] because it is increasingly
tightly integrated with information and communication technology (ICT) to collect and

∗ Corresponding author.
E-mail addresses: auliqinxue@mail.scut.edu.cn (Q. Li), aubgxu@scut.edu.cn (B. Xu).

https://doi.org/10.1016/j.jfranklin.2018.10.022
0016-0032/© 2018 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.

Please cite this article as: Q. Li, S. Li and B. Xu et al., Data-driven attacks and data recovery with noise on state
estimation of smart grid, Journal of the Franklin Institute, https:// doi.org/ 10.1016/ j.jfranklin.2018.10.022

Fig. 1. Cyber attacks against state estimation in power system.

process the physical meter data over a wide geographical range. However, the ICT depends
on complex network connections as well as the Internet, which makes it vulnerable to
malicious adversaries. Therefore, the security and reliability of the smart grid are difficult to
guarantee in the presence of cyber attacks, which has drawn the attention of more and more researchers
[3–5].
In order to assess the vulnerability of the smart grid and study countermeasures for cyber
attacks, we first study feasible attack strategies in smart grid. False data injection attacks
(FDIAs) are one of the main cyber attacks, which are implemented by injecting bias values
into the measurements transmitted between the remote terminal units (RTUs) and the state
estimator in Supervisory Control and Data Acquisition (SCADA) (false data injected into
control data is not within the scope of this paper), as shown in Fig. 1. Note that successful
FDIAs typically depend on detailed knowledge of the system topology and parameters; with this
knowledge, they have been proved to be undetectable by the residue-based bad data detection (BDD)
in the state estimator (SE) [6]. However, owing to protection settings of the system or the limited
technical ability of adversaries, the system information mentioned above is
usually hard to acquire. Therefore, many strategies of FDIAs under different constraints have been
proposed.
According to the system information obtained by adversaries, the FDIAs can be roughly
classified into three categories: FDIAs with full information of the system topology, FDIAs
with partial information of the system topology, FDIAs without the system information but
only with the measurement data [6–11]. The FDIAs with full information of the system
topology are firstly introduced in [6], and then the FDIA with information about the impedance
of transmission lines in at least one cut set of the network topology can still be constructed in
[7]. Afterwards, the research in [8] shows that the topology and parameter information of the
local attacking region are enough to construct the attack. Different from [6–8], without any
information of the system topology, the FDIAs are constructed by independent component
analysis (ICA) and principal component analysis (PCA) approximation based on linear DC
power flow model, which can acquire the estimated topological information to launch the
attacks by analysing the power flow measurement data [9,10], where the FDIA in [10] is
tested to be still valid for the nonlinear AC power data. Obviously, the FDIAs in [9,10] are
both data-driven attacks. In addition, the data-driven attacks constructed by subspace methods
are proposed [11], which can acquire the subspace structure of the system through full or
even partial measurement data. However, the attacks in [9–11] always need to corrupt all of
the measurement data, which is impractical given the limited energy [12] or constrained
capability of adversaries. Therefore, FDIA strategies under an energy constraint are proposed in
this paper, which are lower cost or more purposeful data-driven attacks (DDAs) than those in existing
work. Countermeasures against the FDIAs have also been studied
and have attracted considerable attention in recent years.
In general, the countermeasures against the FDIAs on state estimation can be divided into
four categories: self-protection strategy, attack detection, state reconstruction, and
measurements recovery.
Self-protection strategy: the protected measurements are carefully selected to increase
the attack cost and improve the precision of the state estimation, since it is too expensive
to protect all measurements [13–15]. Nevertheless, self-protection strategy is still
expensive and depends on an accurate dynamic grid topology. Combining it with other
anti-attack technologies is a better solution.
Attack detection: once being attacked, attack detection is used to detect the attacked mea-
surement data based on a prior probability distribution on the grid states or at least the grid
topology and parameter information [16,17]. The system theories, graph theories and statis-
tical structure learning are partly or wholly adopted to complete attack detection. However,
the corrupted measurement data are always discarded and can not be recovered.
State reconstruction: if the estimated state values with the corrupted measurements can
be reconstructed on the system model, the anti-attack state estimator is used. Recently, state
reconstruction technologies are proposed to solve the problem for state estimation under
attacks [4,18–22], which omit the step of attack detection and identification. However, the
state reconstruction depends on the system model, especially on state equation of systems.
It should be noted that it is difficult to model the exact state equation for complex power
systems.
Measurements recovery: without the state equation of systems but with the measurement
data, the low rank matrix factorization and nuclear norm minimization [17,23] are employed to
complete not only the detection of FDIAs but also the identification of proper operation states
in smart grid [24], which are the problem of measurement data recovery and have been widely
applied to image processing [23,25]. Especially, with the scale expansion of power systems,
low rank matrix factorization becomes more practical due to higher computational efficiency
than singular value decomposition (SVD) in nuclear norm minimization. Obviously, the two
algorithms for measurements recovery are all data-driven methods, which do not depend on
the exact state equation of the smart grid. However, in engineering, the measurement data with
gross noise and over a period of time should not be considered as a low rank matrix, which
could deteriorate the effect of the algorithms proposed in [24]. Therefore, the measurement
noise is not negligible.
Different from attacks and anti-attack countermeasures under the ideal assumptions, not
only the data-driven attack strategies but also the corresponding countermeasures for mea-
surements recovery in smart grid are presented, both of which are data-driven under the
measurement noise. In this paper, the contributions are summarized as follows:

(1) At first, it is difficult to acquire the topological structure and parameter of the smart
grid for adversaries, especially, adversaries always have the limited energy. Therefore,
two kinds of low cost data-driven FDIAs are constructed without any knowledge about
the system but with the noisy measurement data in this paper. Furthermore, the stealth
of the two attacks is proved.
(2) The problem of measurement data recovery against attacks is converted into the problem
of low rank approximation. Moreover, the Gaussian noise is considered, since it is more
coincident with the actual situation.
(3) The improved online low rank approximate algorithm is performed to complete the
recovery of measurement data more efficiently. The proposed algorithms are shown to
remain effective under both low rank and sparse attacks, a case that has not been reported
before. The experiment results show that both the real-time performance and the accuracy of the
proposed data recovery algorithms are better than those of the algorithms used in [24]. Moreover, the
online version of the proposed algorithm compensates for the deficiency of the off-line
algorithm.

The remainder of the paper is organized as follows. The preliminaries are described in
Section 2, including the state estimation model of power systems, the state estimator and
residue-based BDD (Section 2.1) and the general stealth attack model (Section 2.2). The general
data-driven attacks are presented in Section 3.1, and then two kinds of data-driven attacks are
proposed in Section 3.2, which are consistent with the actual situation and under the limited
energy of adversaries. Furthermore, data-driven countermeasures for data recovery are pro-
posed in Section 4, whose off-line and online versions are elaborated in the Section 4.1 and
Section 4.2 respectively. Then, the simulation experiments for proposed attacks on the IEEE
14-bus power system are shown in Section 5.1, and the proposed data-driven countermeasures
are displayed in Section 5.2. Finally, Section 6 concludes the paper.
Notations. $n$ denotes the state dimension and $\mathbb{R}^n$ denotes the $n$-dimensional real vector space.
$I$ denotes the identity matrix. $\|\cdot\|_0$ denotes the number of nonzero elements of a vector or
matrix; $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the 1-norm, 2-norm and Frobenius norm,
respectively. In addition, $\mathrm{diag}(b)$ denotes a diagonal matrix whose diagonal elements are the entries of the
vector $b$. $N(\cdot)$ denotes the null space of a matrix and $\mathrm{sign}(\cdot)$ denotes the signum function.
$\mathrm{vec}(\cdot)$ denotes a linear transformation that converts a matrix into a column vector. $\mathrm{rank}(\cdot)$
denotes the rank of a matrix.

2. Preliminaries

We consider the commonly used DC model of the power system for state estimation
(SE) [26,27] to approximate the nonlinear relationship between measurements and states,
since the proposed attacks are a kind of FDIAs proposed in [6]. Specifically, the linear DC
approximation model for the measurement equation is expressed as following by assuming
that all branch resistances and shunt elements are neglected and the bus voltage magnitudes
are all equal to 1 p.u.:
$$ z = Hx + e, \tag{1} $$
where the states $x = \theta \in \mathbb{R}^n$ denote the voltage phase angles. $z \in \mathbb{R}^m$ is the power measurement
data, which could be the real power flow between two buses or the real power injection at
a bus (node). $H$ denotes the measurement Jacobian matrix and $e$ denotes the measurement
noise.

2.1. State estimation and bad data detection

In SCADA of the smart grid, we assume that a state estimator based on weighted least
squares (WLS) [26,27] is employed, which can be formulated as the following problem:
$$ \hat{x} = \arg\min_{x} J(x), \qquad J(x) = (z - Hx)^T R^{-1} (z - Hx), \tag{2} $$
where $\hat{x} \in \mathbb{R}^n$ denotes the estimated value of the state $x$, $J(x)$ denotes the objective function
and $R$ denotes the covariance matrix of $e$. Then the residue vector $r \in \mathbb{R}^m$ can be defined
as in Eq. (3):
$$ r = z - H\hat{x} \tag{3} $$
Assumption 1 [27] If $H$ is a full column rank matrix, then Eq. (2) has the following
unique solution:
$$ \hat{x} = (H^T R^{-1} H)^{-1} H^T R^{-1} z \tag{4} $$
According to Eq. (4), the state estimator needs to collect enough measurements to ensure the
observability of the system. The required number of measurements is therefore large, and their
transmission exposes the system to considerable vulnerability.
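As a minimal numerical sketch of Eqs. (2)–(4), the WLS estimate and the residue can be computed as follows. The Jacobian `H`, the noise level `sigma`, the dimensions and the function names are hypothetical placeholders chosen for illustration, not values or code from the paper.

```python
import numpy as np

def wls_estimate(H, R, z):
    """Closed-form WLS solution of Eq. (4); requires H to have full column rank."""
    R_inv = np.linalg.inv(R)
    gain = H.T @ R_inv @ H                      # n x n gain matrix
    return np.linalg.solve(gain, H.T @ R_inv @ z)

def residue(H, z, x_hat):
    """Residue vector r = z - H x_hat of Eq. (3)."""
    return z - H @ x_hat

# Hypothetical toy system: m = 8 measurements, n = 3 states.
rng = np.random.default_rng(0)
m, n = 8, 3
H = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
sigma = 0.01
R = sigma**2 * np.eye(m)                        # covariance of the noise e
z = H @ x_true + sigma * rng.standard_normal(m)

x_hat = wls_estimate(H, R, z)
r = residue(H, z, x_hat)
print("estimation error:", np.linalg.norm(x_hat - x_true))
print("residue norm:", np.linalg.norm(r))
```

With noise-free measurements the estimate coincides with the true state, which is why the residue carries only the noise (and any injected bias that lies outside the column space of $H$).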
In addition, as the bad data detection (BDD) method, the chi-square ($\chi^2$) test used to detect
bad measurement data [26] during the state estimation is shown as follows:
$$ \max \left( \frac{|r_i|}{R_{ii}} \right) > \tau, \quad i \in \{1, 2, \ldots, m\}, \tag{5} $$
where $R_{ii}$ denotes the diagonal elements of $R$ and $r_i$ denotes the $i$th component of $r$. $\tau$ indicates the
detection threshold of the bad measurement data.
Moreover, we employ the largest normalized residual (LNR) test to identify the corrupted
data if the bad data exists according to Eq. (5). Otherwise, the measurement data can pass
the BDD and will be adopted to estimate the system states. The detection task continues until
making sure that all of the bad measurement data have been removed, and then the states are
re-estimated.

2.2. Stealth attack model

In order to inject a valid attack $a$ into the measurement data (e.g., $z_a = (z + a) \in \mathbb{R}^m$)
successfully, namely for the FDIA to pass the BDD, the general attack model can be constructed
[6] as the following linear combination of the column vectors of the Jacobian matrix $H$:
$$ a = Hc \tag{6} $$
where c denotes an arbitrary nonzero vector. It is important to note that Eq. (6) is established
under the assumption that the bad measurement detectors of power systems are built on the
measurement residue, such as the BDD in Section 2.1.
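The following sketch illustrates why an attack of the form (6) is invisible to a residue-based detector: adding $a = Hc$ shifts the estimate by $c$ but leaves the residue unchanged, whereas a generic bias changes it. All numbers, the threshold `tau`, and the helper names are illustrative assumptions; the detector follows the form of Eq. (5).

```python
import numpy as np

def wls_residue(H, R, z):
    """Residue r = z - H x_hat with the WLS estimate of Eq. (4)."""
    R_inv = np.linalg.inv(R)
    x_hat = np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)
    return z - H @ x_hat

def bdd_alarm(r, R, tau):
    """Residue test in the form of Eq. (5)."""
    return bool(np.max(np.abs(r) / np.diag(R)) > tau)

# Hypothetical toy system.
rng = np.random.default_rng(1)
m, n = 10, 4
H = rng.standard_normal((m, n))
R = 0.01 * np.eye(m)
z = H @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)
tau = 50.0                                   # illustrative threshold only

a_stealth = H @ rng.standard_normal(n)       # a = H c, Eq. (6)
a_naive = np.zeros(m)
a_naive[0] = 5.0                             # bias on one meter, not in col(H)

r_clean = wls_residue(H, R, z)
r_stealth = wls_residue(H, R, z + a_stealth)
r_naive = wls_residue(H, R, z + a_naive)

print("residue change, stealth attack:", np.linalg.norm(r_stealth - r_clean))
print("residue change, naive attack:  ", np.linalg.norm(r_naive - r_clean))
print("alarms (clean, stealth):", bdd_alarm(r_clean, R, tau), bdd_alarm(r_stealth, R, tau))
```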
Remark 1. It should be noted that the success of the stealth attack in (6) depends on measure-
ment Jacobian matrix H. In other words, the adversaries must have the ability to access the
current power system configuration information and manipulate the measurements of meters
at physically protected locations. If they can, such attacks can produce arbitrary errors without
being detected by BDD on the measurement residue [6]. However, in practice, the adversaries
usually cannot obtain the full system configuration information (e.g., smart grid topology and
transmission-line admittances) because of their own limited ability (limited resources and energy)
and the system safeguards; they can rely only on the intercepted measurement data [10,12]. Therefore, the
research on data-driven attacks is meaningful and more consistent with the actual situation.

3. Constructed data-driven attacks

In general, the stealth attacks can be constructed under the assumption that the adversaries
can acquire full information about the system topology of the smart grid [3,15,28]. However,
this full information is usually hard to acquire, which motivates data-driven attacks.

3.1. Data-driven attacks (DDAs)

Actually, the information from the measurement data is very rich. Therefore, inspired by
the idea of the data mining, subspace analysis or component pursuit methods are employed to
acquire the key information and then launch the valid attacks by adversaries [9–11]. With the
full or partial measurement data, the former namely subspace analysis is introduced to find the
nonzero vector in the column space of H and then construct an unobservable attack. While in
the latter, PCA or ICA approximation methods are proposed to transform each measurement
vector into the linear combination of a vector with principal or independent components, as
shown in the following form:
$$ z = H_{apx} \tilde{x} \tag{7} $$
where $\tilde{x} \in \mathbb{R}^n$ denotes the principal or independent components, and $H_{apx} \in \mathbb{R}^{m \times n}$ denotes the linear
relationship between the measurements $z$ and $\tilde{x}$. Therefore, if $\tilde{x}$ is acquired by analyzing
the data $z$, $H_{apx}$ can be calculated by Eq. (7). In other words, both $\tilde{x}$ and $H_{apx}$ can be
produced from $z$ by the PCA or ICA approximation method, which is shown as follows (taking
the PCA approximation method as the example):
$$ [H_{apx}, \tilde{x}] = \mathrm{PCA}(Z, n), \quad Z = (z_1, z_2, \ldots, z_K), \tag{8} $$
where $K$ is the total sampling time over a time period.
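One simple way to realize Eq. (8) is a truncated SVD of the measurement matrix. The sketch below is an assumption about how `PCA(Z, n)` could be implemented (no centering, orthonormal $H_{apx}$); the PCA routine actually used in [10] may differ. It checks that, for noise-free data, the estimated $H_{apx}$ spans the same column space as the true Jacobian, which is the property the attack relies on.

```python
import numpy as np

def pca_approx(Z, n):
    """Return (H_apx, X_tilde) with Z ~ H_apx @ X_tilde, from the top-n left singular vectors."""
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    H_apx = U[:, :n]                 # m x n orthonormal basis
    X_tilde = H_apx.T @ Z            # n x K principal components
    return H_apx, X_tilde

# Hypothetical data: m meters, n states, K samples.
rng = np.random.default_rng(2)
m, n, K = 12, 4, 200
H = rng.standard_normal((m, n))
X = rng.standard_normal((n, K))
Z_clean = H @ X

H_apx, X_tilde = pca_approx(Z_clean, n)

# Compare the orthogonal projectors onto col(H) and col(H_apx).
P_H = H @ np.linalg.solve(H.T @ H, H.T)
P_apx = H_apx @ H_apx.T
print("subspace gap:", np.max(np.abs(P_H - P_apx)))
```

With noisy measurements the two subspaces agree only approximately, and the gap grows with the noise level relative to the signal.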
Note that the measurement data in smart grid can be eavesdropped in the form of the
node by adversaries, including injection power data of the node (bus) and power flow data of
branches directly connected to the node. Therefore, the adversaries can acquire n according
to the number of measurement data packets. Then the stealth attacks can be produced by this
new topology matrix Hapx , which are described as the following Lemma [9,10].
Lemma 1. The attack vector $a \in \mathbb{R}^m$ can be constructed in the following form:
$$ a = H_{apx} c_1 \tag{9} $$
where $c_1 \in \mathbb{R}^n$ denotes an arbitrary nonzero vector. Note that $a$ can be almost stealthy if the
relationship $H_{apx} \approx H P_x$ holds, where $P_x \in \mathbb{R}^{n \times n}$ denotes a projection matrix from the
principal or independent components $\tilde{x}$ to the original state vector $x$, namely, $x \approx P_x \tilde{x}$.
Remark 2. Lemma 1 shows that the valid data-driven attack can be injected into smart grid
and pass the BDD successfully, which is a data-driven False Data-Injection Attack (DDFDIA)
in essence. Once the Hapx is calculated through analysis of measurement data, the attack can
be constructed with arbitrary nonzero entries of the c1 . The goal of these data-driven attacks
in [9,10] is to find stealthy data-driven FDIAs without the topological information of the
smart grid, but not to search the lower cost or optimal data-driven FDIAs. Therefore, the
research on lower cost or optimal data-driven attacks based on the [9,10] is necessary, after
all, the adversaries do not have the ability to corrupt all or most of the sensor data in smart
grid.

3.2. The optimized data-driven attacks

3.2.1. Random lower cost data-driven attacks


In this Section, we assume that the adversaries can choose any state variables to contaminate
if no sensor is specially protected. In other words, arbitrary measurements
can be chosen to be corrupted by adversaries; hence, the corrupted targets in this kind of
attacks are random. However, corrupting all of the measurement data is impractical according to
[12,29,30] because of the limited energy or power of adversaries, and would be costly for
them. Therefore, constructing lower cost DDAs that satisfy Eq. (9) is our goal. As far
as we know, there is no existing research on DDAs that are practical and take
the feasibility of attacks into account.
According to the projection matrix $P = H(H^T H)^{-1} H^T$ and Eq. (6), we have the formula
[6]:
$$ (P - I)a = \left[ H(H^T H)^{-1} H^T - I \right] Hc = 0 $$
Thus, the stealthy and sparse attacks can be constructed as the following formula, with
$B = P - I$:
$$ \min \|a\|_0 \quad \text{s.t. } Ba = 0, \; a \neq 0 \tag{10} $$
Obviously, the Eq. (10) is difficult to solve since the formula denotes a non-convex problem.
Therefore, this problem can be translated into solving v by searching the null space of B [13]:

$$ N(B) = \{ v \in \mathbb{R}^m \mid Bv = 0 \} \tag{11} $$

In order to obtain lower cost attacks, we search for a sparser attack
vector in the null space of $B$. Generally, the small entries of an attack vector are
tolerated as long as their average energy is within the variance of the measurement noise,
so injecting them is meaningless and negligible. Instead, the large entries of the attack
vector $v$ are our focus. Therefore, we introduce a shrinkage operator $S_\delta$ here:
$$ S_\delta(v) = \begin{cases} v, & |v| \ge \delta \\ 0, & |v| < \delta \end{cases}, \tag{12} $$
where $\delta$ is a key threshold that decides the stealth and sparsity of the attack vector, which
should be adjusted along with the tolerable noise level. $M$ denotes the maximum value of the
proposed attack. Thus, the random low cost DDAs based on measurement data can be constructed
as in the following Algorithm.
To testify the stealth of the attack in Algorithm 1, we have the following Proposition 1 to
show that this random data-driven attack is feasible.

Algorithm 1 Random Low Cost DDA (RLCDDA).

Input: $Z$, $z$, $n$, $M > 0$, $\delta > 0$

(1) $H_{apx} = \mathrm{PCA}(Z, n)$ by (8).
(2) Calculate $B_{apx} = H_{apx} (H_{apx}^T H_{apx})^{-1} H_{apx}^T - I$.
(3) Find the null space matrix $V$ of $B_{apx}$, so that the $i$th column $v_i$ of $V$ satisfies the equation
$B_{apx} v_i = 0$.
(4) Select the column $v_i$ of $V$ with the largest variance, namely $v$.
(5) Scale the value of $v$ down or up with the factor $\eta$: $v = \eta v$, where $\eta = M / \max(v)$.
(6) The RLCDDA $a$ can be obtained: $a = S_\delta(v)$.
(7) $z_a = z + a$

Output: corrupted measurement data $z_a$
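The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading of the algorithm under stated assumptions, not the authors' implementation: PCA is realized by a truncated SVD, the null space is taken from the SVD of $B_{apx}$, and a sign flip is added (an assumption) so that $\max(v) > 0$ before scaling. The data and the values of `M` and `delta` are hypothetical.

```python
import numpy as np

def residual_projector(H):
    """B = H (H^T H)^{-1} H^T - I, step (2)."""
    return H @ np.linalg.solve(H.T @ H, H.T) - np.eye(H.shape[0])

def null_space(B, tol=1e-8):
    """Orthonormal basis of N(B): right singular vectors with near-zero singular value."""
    _, s, Vt = np.linalg.svd(B)
    return Vt[s <= tol].T

def rlcdda(Z, z, n, M, delta):
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    H_apx = U[:, :n]                            # step (1): PCA via truncated SVD (assumption)
    B_apx = residual_projector(H_apx)           # step (2)
    V = null_space(B_apx)                       # step (3)
    v = V[:, np.argmax(V.var(axis=0))]          # step (4): column with the largest variance
    if np.max(v) <= 0:                          # assumption: make max(v) positive before scaling
        v = -v
    v = (M / np.max(v)) * v                     # step (5): eta = M / max(v)
    a = np.where(np.abs(v) >= delta, v, 0.0)    # step (6): shrinkage operator of Eq. (12)
    return z + a, a                             # step (7)

# Hypothetical intercepted data.
rng = np.random.default_rng(3)
m, n, K = 12, 4, 100
H = rng.standard_normal((m, n))
Z = H @ rng.standard_normal((n, K)) + 0.01 * rng.standard_normal((m, K))
z = Z[:, -1]
M, delta = 50.0, 5.0

z_a, a = rlcdda(Z, z, n, M, delta)
print("nonzero entries of a:", np.count_nonzero(a), "of", m)
```

Note that the shrinkage in step (6) removes small entries, so the thresholded vector satisfies $B_{apx} a = 0$ only approximately; the threshold `delta` controls this trade-off between sparsity and stealth.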

Proposition 1. $a = H_{apx} c_1$ if and only if $B_{apx} a = 0$, where $B_{apx} = H_{apx} (H_{apx}^T H_{apx})^{-1} H_{apx}^T - I$.

Proof: According to Theorem 3.2 in [6], $a = Hc \Leftrightarrow Ba = 0$, where $a$ is a stealthy attack.
With the relationship $H_{apx} \approx H P_x$ in Lemma 1, and with $P_x$ invertible, we have:

$$
\begin{aligned}
B_{apx} &= H_{apx} (H_{apx}^T H_{apx})^{-1} H_{apx}^T - I \\
        &\approx H P_x (P_x^T H^T H P_x)^{-1} P_x^T H^T - I \\
        &= H P_x P_x^{-1} (H^T H)^{-1} (P_x^T)^{-1} P_x^T H^T - I \\
        &= H (H^T H)^{-1} H^T - I \\
        &= B
\end{aligned}
$$

Therefore $B_{apx} a = Ba = 0$. According to (10), $a$ is undetected by BDD.
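The chain of equalities in the proof can be checked numerically. The sketch below assumes the idealized case $H_{apx} = H P_x$ with an invertible $P_x$ (both matrices are random, hypothetical stand-ins) and confirms that the two residue projectors coincide, so that any $a = H_{apx} c_1$ lies in the null space of $B$.

```python
import numpy as np

def residual_projector(H):
    """B = H (H^T H)^{-1} H^T - I."""
    return H @ np.linalg.solve(H.T @ H, H.T) - np.eye(H.shape[0])

rng = np.random.default_rng(4)
m, n = 10, 4
H = rng.standard_normal((m, n))                       # hypothetical true Jacobian
P_x = 3.0 * np.eye(n) + rng.standard_normal((n, n))   # hypothetical invertible mixing matrix
H_apx = H @ P_x                                       # idealized relation H_apx = H P_x

B = residual_projector(H)
B_apx = residual_projector(H_apx)
a = H_apx @ rng.standard_normal(n)                    # a = H_apx c_1, Eq. (9)

gap = np.max(np.abs(B_apx - B))
print("max |B_apx - B|:", gap)
print("||B a||:", np.linalg.norm(B @ a))
```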


Remark 3. RLCDDA in Algorithm 1 mainly depends on the collected measurement data
over a period of time. Once $H_{apx}$ is calculated and not yet updated, the attack vector $a$
does not change. Therefore, $a$ is fixed until $H_{apx}$ is updated (assuming that $M$ and $\delta$ are fixed),
which leads to the low rank of RLCDDA. In addition, the RLCDDA is sparser,
and therefore lower cost, than the DDAs proposed in [9–11].

3.2.2. Sparse targeted data-driven attacks


In this section, a kind of targeted data-driven attack that is sparser than RLCDDA is presented,
in which the states of nodes that can be attacked are limited. Moreover, the adversaries
have limited resources (such as limited energy) and can tamper with at most $k$ measurements. The set
$\Omega$ denotes the set of state variables of targeted nodes, i.e., the nodes (buses) accessible
to adversaries because the system protects only part of its nodes. Thus, $\bar{\Omega}$ denotes the set
of off-targeted state variables, and we have the following Lemma presented in [6].

Lemma 2. $a = Hc$ if and only if $B_{\bar{\Omega}} a = y$, where $y = B_{\bar{\Omega}} b$, $b = \sum_{j \in \Omega} h_j c_j$, $h_j$ denotes the $j$th
column of the Jacobian matrix $H$, and $c_j$ denotes the $j$th entry of $c$. $B_{\bar{\Omega}} = H_{\bar{\Omega}} (H_{\bar{\Omega}}^T H_{\bar{\Omega}})^{-1} H_{\bar{\Omega}}^T - I$,
where $H_{\bar{\Omega}}$ denotes the sub-matrix of $H$ whose column indices are not in the set $\Omega$.
According to Lemma 2, we have the following Proposition, which can be adopted to
construct the sparse targeted DDA.

Proposition 2. $a = H_{apx} c_1$ if and only if $B_{apx}^{\bar{\Omega}} a = y$, where $B_{apx}^{\bar{\Omega}} = H_{apx}^{\bar{\Omega}} [(H_{apx}^{\bar{\Omega}})^T H_{apx}^{\bar{\Omega}}]^{-1} (H_{apx}^{\bar{\Omega}})^T - I$.
$H_{apx}^{\bar{\Omega}}$ denotes a sub-matrix of $H_{apx}$ whose column indices are not in
the set $\Omega$ of targeted state variables. $y = B_{apx}^{\bar{\Omega}} b$.
The proof of Proposition 2 is similar to that of Proposition 1, based on the proof of
Lemma 2 in [6].
Thus, the construction of the attack can be formed into the following optimization problem:
$$ \min \|a\|_1 \quad \text{s.t. } y = B_{apx}^{\bar{\Omega}} a, \tag{13} $$
where $\|a\|_1$ denotes the $l_1$ relaxation of $\|a\|_0$ [31]. Then, problem (13) can be further reformulated
as a regressor selection problem, since the adversaries have limited resources and can tamper
with at most $k$ measurements:
$$ \min \left\| y - B_{apx}^{\bar{\Omega}} a \right\|_2^2 \quad \text{s.t. } \|a\|_0 \le k, \tag{14} $$
where $\| y - B_{apx}^{\bar{\Omega}} a \|_2^2$ denotes the cost function and should be minimized to reduce the
probability of being detected. In addition, the alternating direction method of multipliers (ADMM)
[32] is employed to obtain the solution of Eq. (14). We define the augmented Lagrangian
parameter $\rho$ and the maximum number of iterations $t_{max}$, and then Algorithm 2 is presented to
construct the sparser and targeted DDA. In Algorithm 2, the update of $a$ involves a
regularized least squares (RLS) problem. Therefore, through the input $k$, Algorithm 2 controls the
trade-off between the error of RLS and sparsity, which is a feature selection problem.

Algorithm 2 Sparse Targeted DDA (STDDA).

Input: $Z$, $z$, $n$, $k$, $\Omega$ (the set of targeted state variables), $y$, $\rho$ and $t_{max}$.

(1) Initialize: number of iterations $t = 0$; optimization variable $\beta^0 = 0$ and dual variable
$u^0 = 0$ in ADMM [32].
(2) $H_{apx} = \mathrm{PCA}(Z, n)$, $Z = (z_1, z_2, \ldots, z_i, \ldots, z_K)$, where $K$ denotes the maximum sampling number.
(3) Calculate $B_{apx}^{\bar{\Omega}} = H_{apx}^{\bar{\Omega}} [(H_{apx}^{\bar{\Omega}})^T H_{apx}^{\bar{\Omega}}]^{-1} (H_{apx}^{\bar{\Omega}})^T - I$.
(4) The vector $a$ is updated by ridge regression:
$$ a^{t+1} = \left[ (B_{apx}^{\bar{\Omega}})^T B_{apx}^{\bar{\Omega}} + \rho I \right]^{-1} \left[ (B_{apx}^{\bar{\Omega}})^T y + \rho (\beta^t - u^t) \right] $$
(5) The projection $\Pi_S$ is computed by hard thresholding to update $\beta$, in which the $k$ largest
values of $a^{t+1} + u^t$ are kept:
$$ \beta^{t+1} = \Pi_S (a^{t+1} + u^t) $$
where $\Pi_S$ denotes the projection onto $S$.
(6) $u$ is updated: $u^{t+1} = u^t + a^{t+1} - \beta^{t+1}$.
(7) If $t < t_{max}$, set $t = t + 1$ and go to step (4); otherwise stop the iteration, and the attack $a$ is
acquired.
(8) $z_a = z + a$

Output: corrupted measurement data $z_a$
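The ADMM iteration of Algorithm 2 can be sketched as follows. This is a minimal illustration under assumptions: the projection keeps the $k$ largest-magnitude entries (the usual choice for regressor selection in [32]), the matrix `H_apx`, the targeted index list `omega` and the vector `c` are hypothetical stand-ins, and no convergence guarantee is implied, since problem (14) is non-convex.

```python
import numpy as np

def keep_k_largest(v, k):
    """Projection onto {v : ||v||_0 <= k}: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
    return out

def stdda_admm(B, y, k, rho=1.0, t_max=200):
    """ADMM heuristic for min ||y - B a||_2^2 s.t. ||a||_0 <= k (steps (4)-(7))."""
    m = B.shape[1]
    beta = np.zeros(m)
    u = np.zeros(m)
    lhs = B.T @ B + rho * np.eye(m)          # fixed across iterations
    Bty = B.T @ y
    for _ in range(t_max):
        a = np.linalg.solve(lhs, Bty + rho * (beta - u))   # step (4): ridge regression
        beta = keep_k_largest(a + u, k)                    # step (5): hard thresholding
        u = u + a - beta                                   # step (6): dual update
    return a, beta

# Hypothetical setup: 4 states, the first two are targeted.
rng = np.random.default_rng(5)
m, n, k = 12, 4, 5
H_apx = rng.standard_normal((m, n))          # stand-in for the PCA estimate
omega = [0, 1]
omega_bar = [j for j in range(n) if j not in omega]
H_bar = H_apx[:, omega_bar]
B = H_bar @ np.linalg.solve(H_bar.T @ H_bar, H_bar.T) - np.eye(m)
c = rng.standard_normal(n)
b = H_apx[:, omega] @ c[omega]               # b = sum over targeted j of h_j c_j
y = B @ b

a, beta = stdda_admm(B, y, k)
print("nonzeros in beta:", np.count_nonzero(beta))
print("fit error ||y - B beta||:", np.linalg.norm(y - B @ beta))
```

The sketch returns both iterates: `beta` is exactly $k$-sparse by construction, while `a` matches it only once the iteration has converged.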



Table 1
Characteristics of three attacks and true measurement data.

         DDA in [10]   RLCDDA    STDDA     Ztrue
Sp       1             0.241     0.037     1
Rank     14            1         47        10
Max      26.760        50        18.262    9.481
Min      −24.260       −7.475    −9.476    −7.739

According to Algorithms 1 and 2, the essential difference between the two data-driven
attacks is that the RLCDDA is low rank while the STDDA is sparse but not low rank;
more details are given in Table 1 and its related description.

4. The proposed data-driven countermeasures for data recovery

The powerful ability of adversaries and the vulnerability of smart grid have been shown in
Section 3, even if the adversaries can only acquire the transmitted measurement data. Therefore,
research on countermeasures against the proposed data-driven attacks is necessary.
Similar to the data-driven attacks, and without a prior probability distribution on the grid states,
new data-driven countermeasures for measurement data recovery are proposed from the defender's side
in this Section, which not only detect the attacks but also separate the injected data and
measurement noise from the true measurement data.

4.1. Low rank approximate method with noise

In fact, the defender receives the measurement data $Z_a = Z + A = Z_{true} + E + A$, where
$Z_{true} \in \mathbb{R}^{m \times K}$ ($K$ is the total sampling time over a period of time) is the true measurement
matrix, which is a low rank matrix due to the slow change of state variables in smart grid.
In other words, the measurement data have intrinsic temporal correlations [24], which
lead to the low rank of $Z_{true}$. On the other side, adversaries have neither unlimited
capacity to attack many measurements nor the access needed to compromise all
meters; hence, the attack matrix $A \in \mathbb{R}^{m \times K}$ is sparse. $E \in \mathbb{R}^{m \times K}$ is white Gaussian noise.
Different from the model of [17,24,33] (namely $Z_a = Z_{true} + A$), we adopt the following
model to fit the actual situation closely:
$$ Z_a = Z_{true} + E + A \tag{15} $$
Based on (15), our objective is to obtain $Z_{true}$ from $Z_a$. A Low Rank Approximate
Method (LRAM) is proposed to recover $Z_{true}$ owing to the low rank of $Z_{true}$ and the sparsity of
$A$, which runs the low rank approximation and the sparse approximation alternately [34,35].
In particular, bilateral random projections (BRP) [34] replace the computationally expensive
singular value decomposition (SVD) used in the nuclear norm minimization
algorithm [23,24], which accelerates the computation. Thus, the fast low rank approximation
is completed by BRP of $Z_a \in \mathbb{R}^{m \times K}$ ($m \ge K$) as follows:
$$ Z_{true} = Y_1 (G_2^T Y_1)^{-1} Y_2^T, \tag{16} $$
where $Y_1 = Z_a G_1$, $Y_2 = Z_a^T G_2$, $G_1 \in \mathbb{R}^{K \times r}$ and $G_2 \in \mathbb{R}^{m \times r}$. $r$ is the estimated rank of $Z_{true}$, which
can be obtained through uncorrupted measurement data. Moreover, the power scheme modification
is applied to calculate BRP of $Z_a$: $\tilde{Z}_a = (Z_a Z_a^T)^q Z_a$, in which the singular values of $\tilde{Z}_a$
decay faster than those of $Z_a$. Furthermore, as the power scheme parameter $q$ increases, the error of the
fast low rank approximation becomes smaller [34]. Thus, the proposed LRAM, described
in Algorithm 3, minimizes the decomposition error as follows:
$$ \min_{Z_{true}, A} \| Z_a - Z_{true} - A \|_F^2 + \lambda \| \mathrm{vec}(A) \|_1 \quad \text{s.t. } \mathrm{rank}(Z_{true}) \le r, \tag{17} $$
where $\lambda$ is the weight of the $L_1$ norm regularization for entries of $A$, and $r$ is the maximum estimated
value of the rank for $Z_{true}$.
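A short sketch of the BRP approximation in Eq. (16) is given below, without the power scheme ($q = 0$). The matrix sizes and the Gaussian random projections are illustrative assumptions. For a matrix whose rank is exactly $r$, the BRP reconstruction is exact up to rounding, which the example checks.

```python
import numpy as np

def brp_low_rank(X, r, rng):
    """Rank-r approximation of X by bilateral random projections, Eq. (16)."""
    m, K = X.shape
    G1 = rng.standard_normal((K, r))
    G2 = rng.standard_normal((m, r))
    Y1 = X @ G1                       # m x r right projection
    Y2 = X.T @ G2                     # K x r left projection
    return Y1 @ np.linalg.solve(G2.T @ Y1, Y2.T)

# Hypothetical exactly rank-3 matrix with m >= K.
rng = np.random.default_rng(6)
m, K, r = 30, 20, 3
Z_low = rng.standard_normal((m, r)) @ rng.standard_normal((r, K))

Z_hat = brp_low_rank(Z_low, r, rng)
print("reconstruction error:", np.linalg.norm(Z_hat - Z_low))
```

Only products with the thin matrices $G_1$ and $G_2$ and one $r \times r$ solve are needed, which is where the saving over a full SVD comes from.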

Algorithm 3 Low Rank Approximate Method (LRAM).

Input: $Z_a \in \mathbb{R}^{m \times K}$, $r$, $\lambda$, $\varepsilon$ and power scheme $q$.
(1) Initialize: $Z_{true}^0 = Z_a$, $A^0 = 0$ and $t = 0$ (number of iterations),
    If $m \ge K$, $G_1 \in \mathbb{R}^{K \times r}$;
    Else $G_1 \in \mathbb{R}^{r \times K}$, $Z_a = Z_a^T$, where $G_1$ is a random matrix;
    End.
While $\|Z_a - Z_{true}^t - A^t\|_F^2 / \|Z_a\|_F^2 > \varepsilon$, do
(2) $t = t + 1$.
(3) $\tilde{Z}_{true} = [(Z_a - A^{t-1})(Z_a - A^{t-1})^T]^q (Z_a - A^{t-1})$.
(4) $Y_1 = \tilde{Z}_{true} G_1$, $G_2 = Y_1$.
(5) The QR decomposition of $Y_1$ and $Y_2$: $Y_2 = \tilde{Z}_{true}^T Y_1 = Q_2 R_2$, $Y_1 = \tilde{Z}_{true} Y_2 = Q_1 R_1$.
(6) If $\mathrm{rank}(G_2^T Y_1) < r$
        $r = \mathrm{rank}(G_2^T Y_1)$ and go to step (2)
    End
(7) Update $Z_{true}^t$: $Z_{true}^t = Q_1 [R_1 (G_2^T Y_1)^{-1} R_2^T]^{1/(2q+1)} Q_2^T$.
(8) Update $A^t$: $A^t = S_\lambda(Z_a - Z_{true}^t)$, where $\lambda$ is the weight of the L1 norm regularization for the entries of $A^t$, and $S_\lambda(d) = \{(i, j) \in [m] \times [n] \mid \mathrm{sign}(d_{ij}) \times \max(|d_{ij}| - \lambda, 0)\}$ is a soft thresholding operator with the threshold $\lambda$.
End while
Output: $Z_{true}$, $A$

According to Algorithm 3, the noise $E$ can also be obtained by $E = Z_a - Z_{true} - A$. Thus, the true measurement $Z_{true}$ can be used to complete the next state estimation. Meanwhile, the data-driven attacks are detected and identified.
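The alternation in Algorithm 3 can be sketched in NumPy as below. This is a simplified variant under stated assumptions, not the exact procedure: the low rank step projects $Z_a - A$ onto an r-dimensional range found by q power iterations with QR re-orthonormalization, which is swapped in for the closed-form update of step (7), and the rank check of step (6) is omitted. The names `soft_threshold` and `lram_sketch`, the iteration cap `max_iter`, and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(D, lam):
    """Entrywise S_lambda(d) = sign(d) * max(|d| - lambda, 0)."""
    return np.sign(D) * np.maximum(np.abs(D) - lam, 0.0)

def lram_sketch(Za, r, lam, eps=1e-6, q=3, max_iter=50, seed=0):
    """Alternate a randomized rank-r projection and soft thresholding."""
    Za = np.asarray(Za, dtype=float)
    rng = np.random.default_rng(seed)
    norm_Za = np.linalg.norm(Za, "fro") ** 2
    A = np.zeros_like(Za)
    Z_true = Za.copy()
    for _t in range(max_iter):
        X = Za - A
        # Randomized range finder; QR keeps the power iterations stable.
        Q, _ = np.linalg.qr(X @ rng.standard_normal((X.shape[1], r)))
        for _ in range(q):
            Q, _ = np.linalg.qr(X @ (X.T @ Q))
        Z_true = Q @ (Q.T @ X)                  # rank <= r approximation of Za - A
        A = soft_threshold(Za - Z_true, lam)    # sparse step, as in step (8)
        err = np.linalg.norm(Za - Z_true - A, "fro") ** 2 / norm_Za
        if err <= eps:
            break
    return Z_true, A

# Hypothetical data: rank-2 matrix, 30 attacked entries of size 15, small noise.
rng = np.random.default_rng(0)
m, K, r = 40, 60, 2
Z = 3.0 * rng.standard_normal((m, r)) @ rng.standard_normal((r, K))
A_attack = np.zeros((m, K))
idx = rng.choice(m * K, size=30, replace=False)
np.put(A_attack, idx, 15.0 * rng.choice([-1.0, 1.0], size=30))
Za = Z + A_attack + 0.05 * rng.standard_normal((m, K))

Z_hat, A_hat = lram_sketch(Za, r=r, lam=1.0)
E_hat = Za - Z_hat - A_hat   # residual, interpreted as the noise estimate
```

With soft thresholding, every entry of the residual `E_hat` is bounded in magnitude by the threshold, so the threshold should be chosen above the expected noise level and below the attack amplitudes.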
Remark 4. Note that the proposed LRAM makes two main improvements over [24]. Firstly, the low rank matrix $Z \approx Z_{true}$ in [24] is the noisy measurement matrix, which is in fact not strictly low rank. Therefore, the model (15), which considers the noise explicitly, is closer to the real situation than the model in [24], which can reduce the decomposition error. Secondly, the computational burden of the proposed LRAM is reduced because BRP is used instead of the SVD in the nuclear norm minimization algorithm (proposed in [24]). The proposed algorithm and the algorithms in [24] will be compared in the simulation experiments.

4.2. The proposed online low rank approximate method

As mentioned before, the received measurement data must be collected over a period of time to form a measurement matrix $Z_a$, which leads to missing the early opportunity to resist malicious attacks (e.g., [24]). On the one hand, when the observed measurement vector $z_a^i$, $i = 1, 2, \ldots, K$ arrives, we hope that the new data are taken into account as soon as possible. On the other hand, as more and more measurement data are collected, the proposed data-driven LRAM runs more and more slowly due to the growth of $K$. Therefore, a new problem appears: how to improve the real-time performance of the proposed LRAM and reduce the storage burden? To solve this problem, the online frame for LRAM is proposed as Algorithm 4. According to Algorithm 4, at most $m$ measurement vectors need to be stored and processed, which is an acceptable amount of data and can still reflect the characteristics of the data as $K$ grows.

Algorithm 4 Online Frame for LRAM (Online LRAM).

Input: $z_a^i \in \mathbb{R}^{m \times 1}$, $r$, $\lambda$, $\varepsilon$ and $q$
(1) Initialize: $i = 1$, $m = \mathrm{size}(z_a^i)$
While true
(2) If $i < m$
        $[Z_{true}, A] = \mathrm{LRAM}(Z_a, r, \lambda, \varepsilon, q)$,
        where $Z_a = [z_a^1, \ldots, z_a^i]$, $Z_{true} = [z_{true}^1, \ldots, z_{true}^i]$, $A = [a^1, \ldots, a^i]$, $z_{true}^i \in \mathbb{R}^{m \times 1}$, $a^i \in \mathbb{R}^{m \times 1}$;
    Else
        $[Z_{true}^{(i-m+1):i}, A^{(i-m+1):i}] = \mathrm{LRAM}(Z_a^{(i-m+1):i}, r, \lambda, \varepsilon, q)$;
    End
(3) $i = i + 1$.
(4) If the new measurement vector $z_a^i$ is received, go to step (2); Else, break.
End while
Output: $z_{true}^i$, $a^i$
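The windowing logic of Algorithm 4 can be sketched with a bounded buffer that keeps at most m of the latest measurement vectors. In this sketch, `online_frame` is a hypothetical name, and `svd_recover` is only a one-shot stand-in (truncated SVD followed by soft thresholding) for the LRAM call; any routine with the same input and output shapes could be passed instead.

```python
from collections import deque

import numpy as np

def svd_recover(Za, r=1, lam=1.0):
    """Stand-in for LRAM (assumption): truncated SVD plus soft thresholding."""
    U, s, Vt = np.linalg.svd(Za, full_matrices=False)
    k = min(r, s.size)
    L = (U[:, :k] * s[:k]) @ Vt[:k]
    D = Za - L
    A = np.sign(D) * np.maximum(np.abs(D) - lam, 0.0)
    return L, A

def online_frame(stream, recover):
    """Yield (z_true_i, a_i, window_width) for each incoming vector z_a^i."""
    window = None
    for z in stream:
        z = np.asarray(z, dtype=float).ravel()
        if window is None:
            window = deque(maxlen=z.size)       # m = size(z_a^i)
        window.append(z)                        # oldest column drops out once i > m
        Za = np.column_stack(list(window))      # m x min(i, m)
        Z_true, A = recover(Za)
        yield Z_true[:, -1], A[:, -1], Za.shape[1]

# Hypothetical stream: ten vectors of dimension m = 4 lying on one direction.
stream = [np.ones(4) * (i + 1) for i in range(10)]
results = list(online_frame(stream, svd_recover))
widths = [w for _, _, w in results]
```

The window width grows from 1 to m and then stays at m, so both the storage and the per-step cost are bounded regardless of how many sampling times have elapsed.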

Remark 5. Algorithm 4 shows that the measurement matrix $Z_a$ processed by LRAM consists of at most $m$ measurement vectors. Hence, the real-time performance is greatly improved and the storage burden is reduced. However, why is the maximum column dimension of the measurement matrix $Z_a$ equal to $m$? As mentioned before, the row dimension of $Z_a$ ($Z_a = [z_a^1, \ldots, z_a^i]$) is the constant $m$. Since the low rank components of $Z_a$ are what we need to explore, an overly small column dimension of $Z_a$ could not reflect this low rank characteristic. On the contrary, too large a column dimension of $Z_a$ is not desirable, because it degrades the real-time performance and increases the storage burden. In particular, when the number of sampling times is larger than $m$, the latest $m$ measurement vectors are enough for data analysis, although this choice is still conservative. Therefore, when $\mathrm{rank}(Z_{true}) \ll m$, $Z_{true} \in \mathbb{R}^{m \times i}$ ($i \ge 1$ is an integer), we have $\mathrm{rank}(Z_{true}^{1:i+m}) \approx \mathrm{rank}(Z_{true}^{1:m})$, $i = 1, 2, \ldots, \infty$. In other words, taking $m$ as the maximum column dimension of the measurement matrix is reasonable in the proposed online frame for LRAM.
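The rank argument of Remark 5 can be checked numerically on synthetic data. The sketch below builds a matrix with m rows and a small exact rank, then compares the rank of growing column prefixes with the rank of sliding windows of m columns. The sizes and the rank value are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, true_rank = 8, 30, 3   # illustrative sizes, with true_rank much smaller than m
Z = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, K))

# Rank of the first w columns, for w = 1, ..., K.
prefix_ranks = [int(np.linalg.matrix_rank(Z[:, :w])) for w in range(1, K + 1)]

# Rank of every sliding window made of m consecutive columns.
window_ranks = [int(np.linalg.matrix_rank(Z[:, i:i + m])) for i in range(K - m + 1)]
```

In this example the prefix rank stops growing once the number of columns reaches the underlying rank, and every window of m columns already exhibits the full low rank structure, which is consistent with keeping only the latest m measurement vectors.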

5. Simulation experiments

In order to display the results of the simulation experiments in detail, the performance of the proposed attacks and data recovery algorithms is evaluated in a simple case, the IEEE 14-bus power system [36], which includes 14 buses (n = 14), 5 generators and 20 transmission lines. The measurement data are collected over K = 100 sampling times and m = 54. The bus states and true measurement data are obtained based on MATPOWER [36]. Due to the proposed attacks (RLCDDA and STDDA), the corrupted measurements can mislead the state estimator into sending false state information to the control center, which is dangerous for the smart grid. Therefore, we first show the impacts of the proposed attacks, and then the data-driven countermeasures against these attacks are displayed in this section.

5.1. Simulation experiments for proposed attacks and attack characteristics

Firstly, the two proposed attacks (RLCDDA and STDDA) shown in Fig. 2 are lower cost and sparser than the DDA in [10], where the parameters in RLCDDA are set as M = 50, δ = 3 and the parameters in STDDA are set as k = 2, ρ = 1.8, $t_{max}$ = 1000, $y = B_{apx} \bar{A} b$ and $b = \sum_{j \in A} h_j c_j$.
Moreover, the set  is randomly generated according to k. In addition, the sparsity of the
attack matrix $A$ is defined to exhibit the characteristics of the attacks:
$$S_p = \frac{\|A\|_0}{m \times n}, \qquad 0 \le S_p \le 1 \qquad (18)$$
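As a small illustration of Eq. (18), the sparsity can be computed as the fraction of non-zero entries of the attack matrix. In this sketch the count is normalized by the number of entries of the array that is passed in, which is an assumption about how the denominator is meant; the helper name `sparsity` and the example matrix are hypothetical.

```python
import numpy as np

def sparsity(A, tol=0.0):
    """Fraction of entries of A whose magnitude exceeds tol (L0 'norm' over size)."""
    A = np.asarray(A)
    return np.count_nonzero(np.abs(A) > tol) / A.size

# Hypothetical 2 x 5 attack matrix with three injected entries.
A_demo = np.zeros((2, 5))
A_demo[0, 1] = 4.0
A_demo[1, 0] = -2.5
A_demo[1, 4] = 7.0
sp_demo = sparsity(A_demo)   # 3 non-zero entries out of 10
```

A value of 1 means that every entry is tampered with, while values close to 0 correspond to attacks that touch only a few meters or sampling times.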
Thus, the characteristics of the attacks and the true measurements, including the sparsity and rank of the matrix and the maximum (max) and minimum (min) values of the matrix entries, are shown in Table 1. According to Table 1 and Fig. 2, the sparsity of the DDA in [10] is 1, which means that all of the measurement data are tampered with, demanding high cost and capability from adversaries. In other words, the DDA in [10] is hard for adversaries to launch. Conversely, the RLCDDA and STDDA are much easier to implement due to their lower cost and sparsity: the sparsity values of RLCDDA and STDDA are 0.241 and 0.037, respectively. Specifically, the rank is the main difference between RLCDDA and STDDA: the rank of RLCDDA is 1, since the RLCDDA depends on the update of $H_{apx}$, while the rank of STDDA is 47. Overall, the RLCDDA is a sparse and low rank DDA, and the STDDA is a sparse but not low rank DDA.

Fig. 2. (a) Random Lower Cost Data-driven Attack (RLCDDA); (b) Sparse Targeted Data-driven Attacks (STDDA). [Surface plots of amplitude (MW) over the measurement index m and the sampling time.]

Furthermore, to show the stealth of the two attacks proposed in the paper, we adopt the following definition of the true positive rate $R_{tp}$ ($0 \le R_{tp} \le 1$):
$$R_{tp} = \frac{N_{hit}}{N_{hit} + N_{miss}}, \qquad (19)$$
where $N_{hit}$ is the number of attack entries detected by BDD, and $N_{miss}$ is the number of undetected attack entries. The detection threshold of bad data in BDD is set to τ = 0.5 in the experiment, which is much smaller than the amplitudes of the two attacks shown in Table 1 and Fig. 2. In addition, Fig. 3 exhibits $R_{tp}$ under the proposed attacks (RLCDDA and STDDA). According to Fig. 3, we observe that the STDDA is absolutely stealthy ($R_{tp} = 0$), while the maximum value of $R_{tp}$ under RLCDDA is 0.077, which means that most elements of the malicious attack are injected into the measurement data successfully, except for very few elements that are detected and discarded. In other words, large amounts of undetected corrupted data are used to complete the state estimation and then mislead the control center into making erroneous judgments and instructions.

Fig. 3. The true positive rate by BDD under the proposed attacks (RLCDDA and STDDA). [Plot of true positive rate versus sampling time; the marked point is X: 44, Y: 0.07692.]
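The true positive rate of Eq. (19) can be evaluated from two boolean masks, one marking the attacked entries and one marking the entries flagged by the detector. The sketch below is a minimal helper under that assumption; the function name `true_positive_rate` and the counts in the example are purely illustrative and are not taken from the experiments.

```python
import numpy as np

def true_positive_rate(attacked, detected):
    """R_tp = N_hit / (N_hit + N_miss), counted over the attacked entries only."""
    attacked = np.asarray(attacked, dtype=bool)
    detected = np.asarray(detected, dtype=bool)
    n_hit = np.count_nonzero(attacked & detected)     # attacked and flagged
    n_miss = np.count_nonzero(attacked & ~detected)   # attacked but not flagged
    total = n_hit + n_miss
    return n_hit / total if total > 0 else 0.0

# Illustrative sampling time: 13 of 54 measurements attacked, 1 of them flagged.
attacked = np.zeros(54, dtype=bool)
attacked[:13] = True
detected = np.zeros(54, dtype=bool)
detected[0] = True
r_tp = true_positive_rate(attacked, detected)   # 1 / 13, about 0.077
```

Flags raised on entries that were not attacked do not enter this ratio, so a low value of the rate indicates stealth of the attack rather than a low false alarm rate.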

5.2. Simulation experiments for proposed data-driven countermeasures (DDC)

To acquire the true measurement data, the LRAM and online LRAM based on measurement data are proposed as data recovery algorithms. The parameters of the LRAM and online LRAM are both set as $r = \min(m, K)$, $\lambda = 8$, $\varepsilon = 10^{-6}$, $q = 3$. In addition, $E_{Dec}$ is defined as the decomposition error of the proposed data recovery algorithms:
$$E_{Dec} = \frac{\|Z_a - Z_{true}^t - A^t\|_F^2}{\|Z_a\|_F^2}. \qquad (20)$$
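Eq. (20) translates directly into a few lines of NumPy. The helper below is a minimal sketch; the name `decomposition_error` and the toy matrices are assumptions for illustration.

```python
import numpy as np

def decomposition_error(Za, Z_true, A):
    """E_Dec = ||Za - Z_true - A||_F^2 / ||Za||_F^2, as in Eq. (20)."""
    Za = np.asarray(Za, dtype=float)
    residual = Za - np.asarray(Z_true, dtype=float) - np.asarray(A, dtype=float)
    return np.linalg.norm(residual, "fro") ** 2 / np.linalg.norm(Za, "fro") ** 2

# Toy example: a perfect split gives 0, an empty split gives 1.
Za_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
zeros = np.zeros_like(Za_demo)
e_perfect = decomposition_error(Za_demo, Za_demo, zeros)
e_empty = decomposition_error(Za_demo, zeros, zeros)
```

Note that this quantity measures how completely the observed matrix is explained by the two recovered parts, not how close the recovered low rank part is to the true measurements.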
Fig. 4. $E_{Dec}$ and decomposition time for inexact-alm, LMaFit and LRAM (off-line) under noise (SNR=1.137) and RLCDDA. [Plot of $E_{Dec}$ versus iterations for the three algorithms; annotated times: 0.17803, 0.047648 and 0.0022798.]

In order to reflect the superiority of the proposed data-driven countermeasures, both the nuclear norm minimization algorithm (namely the inexact augmented Lagrange multiplier method, inexact-alm [23]) and the low rank matrix factorization (namely low rank matrix fitting, LMaFit [25]) presented in [24] are introduced for comparison with our proposed algorithms on data recovery. The parameters in inexact-alm are set as follows: the tolerance for the stopping criterion is $10^{-6}$, the weight on the sparse error term in the cost function is $m^{-0.5}$ and the maximum number of iterations is 50. The initial rank estimate in LMaFit is $\min(m, K)$. Moreover, the online versions of inexact-alm and LMaFit are similar to the online frame of LRAM in Algorithm 4.

5.2.1. Simulation experiments for proposed DDC under RLCDDA


According to Table 1, the RLCDDA is a low rank and sparse attack, and the true measurement matrix $Z_{true}$ is a low rank but not sparse matrix. Moreover, it is noted that the rank of RLCDDA is much lower than the rank of $Z_{true}$. Firstly, the time and decomposition error of the proposed off-line algorithm LRAM under RLCDDA are compared with those of inexact-alm and LMaFit in Fig. 4, where the measurement data are collected over 100 s (sampling time) and the maximum number of iterations is 50. In particular, the signal-to-noise ratio (SNR) is 1.137, so the noise is large enough to display the superiority of the proposed methods against attacks under white Gaussian noise.
Compared with inexact-alm and LMaFit, the proposed LRAM achieves both faster convergence and higher precision. The decomposition time of the LRAM is just $2.280 \times 10^{-3}$ s in Fig. 4, which shows that the LRAM is an effective algorithm to acquire the true measurement data. In addition, $E_{Dec}$ and the decomposition time for the online versions of the three algorithms are shown in Figs. 5 and 6, both of which display the superiority of LRAM (online) among the three online algorithms. According to Fig. 5, the maximum values of $E_{Dec}$ for the three online algorithms (inexact-alm, LMaFit and LRAM) are $2.805 \times 10^{-14}$, $2.039 \times 10^{-3}$ and $1.187 \times 10^{-15}$, respectively. From Fig. 6, the decomposition times of the corrupted measurement data at 100 s (sampling time) for the three online algorithms are 0.0839 s, 0.00719 s and $8.080 \times 10^{-4}$ s, respectively, where the decomposition time of online LRAM is the shortest among the three online algorithms and is also less than the decomposition time ($2.280 \times 10^{-3}$ s) of off-line LRAM. Through the simulation, we validate that the proposed off-line and online LRAM can accomplish not only the decomposition of the low rank measurement matrix mixed with a sparse attack matrix in a very noisy environment, but also the decomposition of the low rank measurement matrix mixed with a sparse and low rank attack matrix.

Fig. 5. $E_{Dec}$ for three algorithms under noise (SNR=1.137) and RLCDDA. (a) By inexact-alm; (b) By LMaFit; (c) By LRAM (online). [Plots of $E_{Dec}$ versus sampling time; marked points: (a) X: 99, Y: 2.805e−14; (b) X: 22, Y: 0.002039; (c) X: 100, Y: 1.187e−15.]

Fig. 6. Decomposition time for three algorithms under noise (SNR=1.137) and RLCDDA. (a) By inexact-alm; (b) By LMaFit; (c) By LRAM (online). [Plots of decomposition time versus sampling time; marked values at X = 100: (a) 0.0839; (b) 0.00719; (c) 0.000808.]

5.2.2. Simulation experiments for proposed DDC under STDDA


Firstly, the time and decomposition error of the proposed off-line LRAM under STDDA are compared with those of inexact-alm and LMaFit in Fig. 7. In addition, the decomposition error $E_{Dec}$ and the decomposition time for the online versions of the three algorithms are shown in Figs. 8 and 9. It should be noted that the STDDA is a sparse but not low rank attack matrix, which ensures a great difference between the low rank measurement matrix and the STDDA matrix. Therefore, the STDDA differs more from the low rank measurement matrix $Z_{true}$ than the RLCDDA does, and thus fits the low rank approximation model (15) more closely.
Fig. 7. $E_{Dec}$ and decomposition time for inexact-alm, LMaFit and LRAM (off-line) under noise (SNR=1.137) and STDDA. [Plot of $E_{Dec}$ versus iterations for the three algorithms; annotated times: 0.17323, 0.048585 and 0.0017339.]

Fig. 8. $E_{Dec}$ for three algorithms under noise (SNR=1.137) and STDDA. (a) By inexact-alm; (b) By LMaFit; (c) By LRAM (online). [Plots of $E_{Dec}$ versus sampling time; marked points: (a) X: 99, Y: 2.514e−14; (b) X: 7, Y: 0.001764; (c) X: 96, Y: 1.043e−15.]

In other words, the precision of data recovery by the proposed DDC under STDDA should in theory be higher than that under RLCDDA. Moreover, the SNR is also 1.137 in this scenario. According to Figs. 7–9, the LRAM clearly converges faster and reaches higher precision than inexact-alm and LMaFit. Moreover, the off-line LRAM converges within very few iterations. Hence, the decomposition time of off-line LRAM is $1.734 \times 10^{-3}$ s in Fig. 7, which is the shortest among the three off-line algorithms. Furthermore, the online LRAM also performs well in Figs. 8 and 9. The maximum decomposition errors $E_{Dec}$ of the three online algorithms (inexact-alm, LMaFit and LRAM) are $2.514 \times 10^{-14}$, $1.764 \times 10^{-3}$ and $1.043 \times 10^{-15}$, respectively. Similarly, the decomposition times of the corrupted measurement data at 100 s (sampling time) for the three online algorithms are 0.182 s, 0.019 s and $1.500 \times 10^{-3}$ s, respectively.
Fig. 9. Decomposition time for three algorithms under noise (SNR=1.137) and STDDA. (a) By inexact-alm; (b) By LMaFit; (c) By LRAM (online). [Plots of decomposition time versus sampling time; marked values at X = 100: (a) 0.182; (b) 0.019; (c) 0.0015.]

Table 2
Influence of noise on the three online algorithms under RLCDDA.

          Max(E_Dec)                                           Decomposition time at 100 s
SNR       Inexact-alm       LMaFit            LRAM              Inexact-alm   LMaFit            LRAM
17.381    2.034 × 10^-13    1.499 × 10^-15    1.313 × 10^-15    0.113 s       9.210 × 10^-3 s   1.050 × 10^-3 s
6.499     5.625 × 10^-14    2.063 × 10^-14    1.273 × 10^-15    0.169 s       0.012 s           1.720 × 10^-3 s
1.137     2.805 × 10^-14    2.039 × 10^-3     1.187 × 10^-15    0.084 s       7.190 × 10^-3 s   8.080 × 10^-4 s
−0.8494   2.129 × 10^-14    3.950 × 10^-3     9.907 × 10^-16    0.089 s       1.240 × 10^-2 s   1.020 × 10^-3 s

Comparing Fig. 5 with Fig. 8, the maximum decomposition errors of online LRAM under RLCDDA and STDDA are $1.187 \times 10^{-15}$ and $1.043 \times 10^{-15}$, respectively. Therefore, because the RLCDDA is low rank like the true measurement matrix $Z_{true}$, the performance of online LRAM under STDDA in the paper is slightly better than that under RLCDDA.

5.2.3. Influence of noise on the proposed DDC under attacks


According to Eq. (15), the measurement noise can in fact increase the rank of the measurement matrix, namely $\mathrm{rank}(Z_{true}) < \mathrm{rank}(Z_{true} + E)$, which deteriorates the performance of the nuclear norm minimization and the low rank matrix factorization in [24]. Therefore, the measurement noise should be taken into consideration in the measurement equation. Thus, the corrupted measurement data under malicious attacks are made up of three parts: the clean low rank measurement matrix, the sparse attack matrix and the Gaussian noise. To verify this conjecture, the influence of Gaussian noise on the three online algorithms under RLCDDA and STDDA is tested through simulation experiments. Tables 2 and 3 show the corresponding maximum value of $E_{Dec}$ (Max($E_{Dec}$)) and the decomposition time at 100 s for different SNRs.
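The rank inflation caused by noise is easy to reproduce on synthetic data. The sketch below uses the matrix size of the experiments (m = 54, K = 100), but the underlying rank of 3, the noise level and the random data are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, true_rank = 54, 100, 3   # size from the experiments; the rank is hypothetical

# Exactly low rank "clean" matrix and additive white Gaussian noise.
Z_true = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, K))
E = 0.1 * rng.standard_normal((m, K))

rank_clean = int(np.linalg.matrix_rank(Z_true))
rank_noisy = int(np.linalg.matrix_rank(Z_true + E))

# The noisy matrix keeps a few dominant singular values plus a noise floor.
sv_noisy = np.linalg.svd(Z_true + E, compute_uv=False)
```

Although the noisy matrix is numerically full rank, its singular values beyond the first few are small, which is why a model with an explicit noise term can still recover the low rank part.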
According to Tables 2 and 3, whatever the SNR is, the proposed online LRAM has the best performance on decomposition accuracy and time among the three online algorithms. In addition, gross noise (SNR = 1.137 and −0.8494) has a great impact on LMaFit, which leads to reduced decomposition accuracy. However, the decomposition time of LMaFit is always less than that of inexact-alm. Hence, inexact-alm takes the longest time; its decomposition accuracy lies between those of the other two algorithms under gross noise, while it is the least accurate of the three at the higher SNRs (17.381 and 6.499).

Table 3
Influence of noise on the three online algorithms under STDDA.

          Max(E_Dec)                                           Decomposition time at 100 s
SNR       Inexact-alm       LMaFit            LRAM              Inexact-alm   LMaFit            LRAM
17.381    2.045 × 10^-13    1.803 × 10^-15    1.183 × 10^-15    0.168 s       0.012 s           1.610 × 10^-3 s
6.499     5.775 × 10^-14    1.840 × 10^-14    1.106 × 10^-15    0.085 s       8.760 × 10^-3 s   8.220 × 10^-4 s
1.137     2.514 × 10^-14    1.764 × 10^-3     1.043 × 10^-15    0.182 s       0.019 s           1.500 × 10^-3 s
−0.8494   2.215 × 10^-14    8.455 × 10^-4     1.212 × 10^-15    0.091 s       0.012 s           1.260 × 10^-3 s

6. Conclusion

In this paper, two kinds of data-driven attacks (DDAs) and the corresponding countermeasures are presented for the state estimation of the smart grid. In particular, the proposed DDAs are launched successfully without the system topology and parameters, relying only on the noisy measurements, and they are lower cost and sparser in view of the limited energy and capacity of adversaries. Hence, the proposed DDAs are more realistic than general FDIAs and easier to implement. In addition, a new algorithm for measurement data recovery against the proposed data-driven attacks is presented, which exploits the low rank of the measurement data over time and is achieved by the techniques of low rank pursuit and matrix decomposition. In particular, gross measurement noise is taken into account. Moreover, an online version of this algorithm is developed to improve the real-time performance. The simulation experiments on the 14-bus power system show the stealth of the constructed DDAs under the BDD of SE. Meanwhile, the constructed DDAs can be eliminated by the proposed data recovery algorithms, which improves the resilience of state estimation against attacks. It should be noted that the constructed attacks and the algorithms for measurement data recovery are both data-driven: they are state-model-free and topology-free and no longer depend on the exact state equation. Consequently, as long as the proposed algorithms for measurement data recovery are feasible and efficient, the design of a complex state estimator against attacks can be avoided.
In particular, the proposed algorithms for measurement data recovery are shown in the simulation experiments to be equally effective against the low rank and sparse attack, but there is as yet no theoretical support for this case, which is necessary future work. In addition, event-triggered or self-triggered schemes such as [37–39] can reduce the communication and computation burden of the proposed data recovery method, which should be considered in our next work. Furthermore, applying data-driven algorithms to resist other types of attacks in CPS is also part of our next work.

Acknowledgment

This work was supported by Natural Science Foundation of China [NSFC]-Guangdong Joint Foundation Key Project [Grant no. U1401253], NSFC [Grant nos. 61573153 and 61672174], Foundation of Guangdong Provincial Science and Technology Projects [Grant no. 2013B010401001], Fundamental Research Funds for the Central Universities [Grant no. 2015ZZ099], Guangzhou Science and Technology plan project [Grant no. 201510010132], Maoming science and technology plan project [Grant no. MM2017000004], and the National Natural Science Foundation of Guangdong Province [Grant no. 2016A030313510].

References

[1] G. Liang, J. Zhao, F. Luo, et al., A review of false data injection attacks against modern power systems, IEEE
Trans. Smart Grid 8 (4) (2017) 1630–1638, doi:10.1109/TSG.2015.2495133.
[2] X. Cao, L. Liu, W. Shen, A. Laha, J. Tang, Y. Cheng, Real-time misbehavior detection and mitigation in cyber-
physical systems over wlans, IEEE Trans. Ind. Inf. 13 (1) (2017) 186–197, doi:10.1109/TII.2015.2499123.
[3] J. Liang, L. Sankar, O. Kosut, Vulnerability analysis and consequences of false data injection attack on power
system state estimation, IEEE Trans. Power Syst. 31 (5) (2016) 3864–3872, doi:10.1109/pesgm.2017.8273736.
[4] H. Fawzi, P. Tabuada, S. Diggavi, Secure estimation and control for cyber-physical systems under adversarial
attacks, IEEE Trans. Autom. Control 59 (6) (2014) 1454–1467, doi:10.1109/TAC.2014.2303233.
[5] Q. Li, B. Xu, S. Li, Y. Liu, D. Cui, Reconstruction of measurements in state estimation strategy against
deception attacks for cyber physical systems, Control Theory Technol. 16 (1) (2018) 1–13, doi:10.1007/s11768-018-7080-y.
[6] Y. Liu, P. Ning, M.K. Reiter, False data injection attacks against state estimation in electric power grids, ACM Trans.
Inf. Syst. Secur. (TISSEC) 14 (1) (2011) 13, doi:10.1145/1952982.1952995.
[7] M.A. Rahman, H. Mohsenian-Rad, False data injection attacks with incomplete information against smart power
grids, in: Global Communications Conference (GLOBECOM), 2012 IEEE, IEEE, 2012, pp. 3153–3158, doi:10.1109/GLOCOM.2012.6503599.
[8] X. Liu, Z. Bao, D. Lu, Z. Li, Modeling of local false data injection attacks with reduced network information,
IEEE Trans. Smart Grid 6 (4) (2017) 1686–1696, doi:10.1109/tsg.2015.2394358.
[9] M. Esmalifalak, H. Nguyen, R. Zheng, Z. Han, Stealth false data injection using independent component analysis
in smart grid, in: Proceedings of the IEEE International Conference on Smart Grid Communications, 2011,
pp. 244–248, doi:10.1109/smartgridcomm.2011.6102326.
[10] Z.H. Yu, W.L. Chin, Blind false data injection attack using PCA approximation method in smart grid, IEEE
Trans. Smart Grid 6 (3) (2015) 1219–1226, doi:10.1109/tsg.2014.2382714.
[11] J. Kim, L. Tong, R.J. Thomas, Subspace methods for data attack on state estimation: A data driven approach,
IEEE Trans. Signal Process. 63 (5) (2015) 1102–1114, doi:10.1109/tsp.2014.2385670.
[12] H. Zhang, P. Cheng, L. Shi, J. Chen, Optimal denial-of-service attack scheduling with energy constraint, IEEE
Trans. Autom. Control 60 (11) (2015) 3023–3028, doi:10.1109/TAC.2015.2409905.
[13] J. Hao, R.J. Piechocki, D. Kaleshi, W.H. Chin, Z. Fan, Sparse malicious false data injection attacks and defense
mechanisms in smart grids, IEEE Trans. Ind. Inf. 11 (5) (2017) 1–12, doi:10.1109/TII.2015.2475695.
[14] R. Deng, G. Xiao, R. Lu, Defending against false data injection attacks on power system state estimation, IEEE
Trans. Ind. Inf. 13 (1) (2017) 198–207, doi:10.1109/tii.2015.2470218.
[15] Q. Yang, J. Yang, W. Yu, D. An, N. Zhang, W. Zhao, On false data-injection attacks against power system state
estimation: modeling and countermeasures, IEEE Trans. Parallel Distr. Syst. 25 (3) (2014) 717–729, doi:10.1109/tpds.2013.92.
[16] F. Pasqualetti, F. Dörfler, F. Bullo, Attack detection and identification in cyber-physical systems, IEEE Trans.
Autom. Control 58 (11) (2013) 2715–2729, doi:10.1109/tac.2013.2266831.
[17] S. Tan, W.Z. Song, M. Stewart, J. Yang, L. Tong, Online data integrity attacks against real-time electrical market
in smart grid, IEEE Trans. Smart Grid 9 (1) (2018) 313–322, doi:10.1109/tsg.2016.2550801.
[18] M.S. Chong, M. Wakaiki, J.P. Hespanha, Observability of linear systems under adversarial attacks, in: Proceed-
ings of the American Control Conference, 2015, pp. 2439–2444, doi:10.1109/acc.2015.7171098.
[19] Q. Hu, D. Fooladivanda, Y.H. Chang, C.J. Tomlin, Secure state estimation and control for cyber security of the
nonlinear power systems, IEEE Trans. Control Netw. Syst. PP (99) (2017) 1–1, doi:10.1109/tcns.2017.2704434.
[20] A. Wei, Y. Song, C. Wen, Adaptive cyber-physical system attack detection and reconstruction with application
to power systems, IET Control Theory Appl. 10 (12) (2016) 1458–1468, doi:10.1049/iet-cta.2015.1147.
[21] Y. Shoukry, P. Tabuada, Event-triggered state observers for sparse sensor noise/attacks, IEEE Trans. Autom.
Control 61 (8) (2016) 2079–2091, doi:10.1109/tac.2015.2492159.
[22] M.A. Sid, S. Chitraganti, K. Chabir, Medium access scheduling for input reconstruction under deception attacks, J.
Frankl. Inst. 354 (9) (2017) 3678–3689, doi:10.1016/j.jfranklin.2016.08.023.


[23] Z. Lin, M. Chen, Y. Ma, The augmented lagrange multiplier method for exact recovery of corrupted low-rank
matrices (2010) arXiv preprint arXiv:1009.5055, doi:10.1016/j.jsb.2012.10.010.
[24] L. Liu, M. Esmalifalak, Q. Ding, V.A. Emesih, Z. Han, Detecting false data injection attacks on power grid by
sparse optimization, IEEE Trans. Smart Grid 5 (2) (2014) 612–621, doi:10.1109/tsg.2013.2284438.
[25] Y. Shen, Z. Wen, Y. Zhang, Augmented lagrangian alternating direction method for matrix separation based on
low-rank factorization, Optim. Methods Softw. 29 (2) (2014) 239–263, doi:10.1080/10556788.2012.700713.
[26] A. Gomez-Exposito, A. Abur, Power system state estimation: theory and implementation, CRC press, 2004.
[27] A. Minot, N. Li, A fully distributed state estimation using matrix splitting methods, in: Proceedings of the
American Control Conference, 2015, pp. 2488–2493, doi:10.1109/acc.2015.7171105.
[28] M. Ozay, I. Esnaola, F.T.Y. Vural, S.R. Kulkarni, H.V. Poor, Sparse attack construction and state estimation in
the smart grid: Centralized and distributed models, IEEE J. Select. Areas Commun. 31 (7) (2013) 1306–1318,
doi:10.1109/jsac.2013.130713.
[29] L. Peng, L. Shi, X. Cao, C. Sun, Optimal attack energy allocation against remote state estimation, IEEE Trans.
Autom. Control PP (99) (2017) 1–1, doi:10.1109/tac.2017.2775344.
[30] H. Zhang, W.X. Zheng, Denial-of-service power dispatch against linear quadratic control via a fading channel,
IEEE Trans. Autom. Control (2018), doi:10.1109/tac.2018.2789479.
[31] G. Kutyniok, Theory and applications of compressed sensing, Gamm-Mitteilungen 36 (2013) 79–101, doi:10.1002/gamm.201310005.
[32] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the
alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2010) 1–122, doi:10.1561/2200000016.
[33] H. Huang, Q. Yan, Y. Zhao, W. Lu, Z. Liu, Z. Li, False data separation for data security in smart grids, Knowl.
Inf. Syst. 52 (3) (2017) 815–834, doi:10.1007/s10115-016-1019-8.
[34] T. Zhou, D. Tao, Godec: randomized low-rank and sparse matrix decomposition in noisy case, in: Proceedings
of the International Conference on Machine Learning, Omnipress, 2011.
[35] T. Zhou, D. Tao, Greedy bilateral sketch, completion and smoothing, in: Proceedings of the International Con-
ference on Artificial Intelligence and Statistics, JMLR. org, 2013.
[36] R.D. Zimmerman, C.E. Murillo-Sanchez, R.J. Thomas, Matpower: Steady-state operations, planning, and analysis
tools for power systems research and education, IEEE Trans. Power Syst. 26 (1) (2011) 12–19, doi:10.1109/tpwrs.2010.2051168.
[37] H. Yan, H. Zhang, F. Yang, X. Zhan, C. Peng, Event-triggered asynchronous guaranteed cost control for Markov
jump discrete-time neural networks with distributed delay and channel fading, IEEE Trans. Neural Netw. Learn.
Syst. PP (99) (2017) 1–11, doi:10.1109/TNNLS.2017.2732240.
[38] H. Li, W. Yan, Y. Shi, Triggering and control co-design in self-triggered model predictive control of constrained
systems: with guaranteed performance, IEEE Trans. Autom. Control PP (99) (2018) 1–1, doi:10.1109/TAC.2018.2810514.
[39] J. Liu, J. Xia, E. Tian, S. Fei, Hybrid-driven-based H∞ filter design for neural networks subject to deception
attacks, Appl. Math. Comput. 320 (2018) 158–174, doi:10.1016/j.amc.2017.09.007.
