This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIE.2019.2907451, IEEE Transactions on Industrial Electronics
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. XX, NO. XX, XXXX
A Game-Theoretic Approach to
Cross-Layer Security Decision-Making
in Industrial Cyber-Physical Systems
Kaixing Huang, Chunjie Zhou, Yuanqing Qin, Weixun Tu
Abstract—Current security measures in industrial Cyber-Physical systems (ICPS) lack the active decision capability to defend against highly-organized cyber-attacks. In this paper, a security decision-making approach based on a stochastic game model is proposed to characterize the interaction between attackers and defenders in ICPSs and to generate optimal defense strategies that minimize system losses. The major distinction of this approach is that it presents a practical way to build a cross-layer security game model for ICPSs by means of quantitative vulnerability analysis and time-based unified payoff quantification. A case study on a hardware-in-the-loop simulation testbed demonstrates the feasibility of the proposed approach.

Index Terms—Industrial Cyber-Physical system, security, game theory, decision-making.

I. INTRODUCTION

THE massive deployment of information and communication technologies in industry is transforming traditional legacy electromechanical systems into modern industrial Cyber-Physical systems (ICPS), which tightly integrate the cyber space with the physical space [1]. ICPSs are expected to significantly promote manufacturing productivity and realize smart services. However, ICPSs suffer from cyber-attacks due to their increasing connections to the Internet [2]. As cyber-attacks against ICPSs can cause equipment damage, environmental pollution, or even fatalities [3], ensuring the security of ICPSs is an issue of great concern.

Existing security countermeasures for ICPSs (e.g., encryption, access control, intrusion detection) lack a quantitative decision-making mechanism to actively defend against advanced persistent threats [4], [5]. Game theory, as an effective formal tool for strategic behavior analysis, provides the capability to quantitatively model the interaction between attackers and defenders, which can guide system operators to carry out appropriate attack mitigation strategies and reduce the loss caused by attacks [6]. Recently, some researchers have employed game theory to propose security decision-making approaches for ICPSs.

In [7], the authors propose a hybrid approach combining game theory and classical optimization to produce decision support for the defenders of ICPSs. Feng et al. [8] integrate risk assessment with a game model to optimize the allocation of defensive resources across multiple chemical facilities. Chen et al. [9] present a comprehensive game framework to seek reliability strategies for defending power systems. Yuan et al. [10] build a hierarchical Stackelberg game model to address the problem of resilient control of networked control systems under denial-of-service attacks. Niu and Jagannathan [11] use a zero-sum game to derive the optimal strategy for defending dynamic systems in the presence of both cyber-attacks and physical disturbances. In [4], the authors introduce a games-in-games principle for defending ICPSs; this cross-layer solution for resilient defense is quite promising for protecting ICPSs against cross-layer attacks in which attackers penetrate from the cyber space into the physical space.

Despite these previous efforts, however, most of them make overly simple assumptions about the cyber layer of ICPSs, generally modeling it as multiple independent elements (e.g., [7]-[9]) or as abstract dynamic systems (e.g., [4], [11]). In fact, the cyber layer of an ICPS contains many kinds of devices communicating with each other through complex networks. The lack of adequate cyber-layer modeling makes these methods not entirely applicable to real-world ICPSs.

Moreover, most previous works assume that the game-model parameters (e.g., gains, losses, game state transition probabilities) can be obtained from security experts. In reality, ICPSs are usually very complex, so it is difficult, if not impossible, to build a game model with all parameters accurately assigned by security experts. For example, the payoff (net gain) parameters in the cyber layer and the physical layer have to be evaluated with entirely different metrics in existing approaches: cyber-layer payoffs are typically measured in dollar values, while physical-layer payoffs are usually quantified in terms of control performance degradation. These differing quantification metrics increase the difficulty of building a comprehensive security game framework that covers both the cyber and physical layers. Besides, as indicated in [12], traditional methods usually fail to explore the

Manuscript received September 29, 2018; revised January 23, 2019; accepted March 5, 2019. The work of C. Zhou was supported in part by the National Science Foundation of China under Grant 61433006, Grant 61873103, and Grant 61272204. (Corresponding author: Chunjie Zhou.)
K. Huang, C. Zhou, Y. Qin, and W. Tu are with the Key Laboratory of Image Processing and Intelligent Control, Ministry of Education, and the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: hyanglu1573@hust.edu.cn; cjiezhou@hust.edu.cn; qinyuanqing@hust.edu.cn; m201672519@hust.edu.cn).
0278-0046 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
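The attacker-defender interaction described above is, at bottom, a two-player zero-sum matrix game whose mixed-strategy equilibrium can be computed by linear programming. As a minimal, purely illustrative sketch (a toy 2x2 game, not this paper's ICPS model; the helper name `solve_zero_sum` and the use of NumPy/SciPy are our assumptions), the row player's optimal mixed strategy can be found as follows:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(G):
    """Optimal mixed strategy and value for the row player (maximizer)
    of the zero-sum matrix game G, via the classic LP formulation:
    maximize v  subject to  G^T p >= v * 1,  sum(p) = 1,  p >= 0."""
    I, J = G.shape
    c = np.zeros(I + 1)
    c[-1] = -1.0                                # linprog minimizes, so use -v
    A_ub = np.hstack([-G.T, np.ones((J, 1))])   # v - sum_i p_i G[i,j] <= 0
    A_eq = np.zeros((1, I + 1))
    A_eq[0, :I] = 1.0                           # probabilities sum to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(J), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * I + [(None, None)])
    return res.x[:I], res.x[-1]

# Matching pennies: no saddle point, so the equilibrium must be mixed.
G = np.array([[1.0, -1.0], [-1.0, 1.0]])
p, v = solve_zero_sum(G)
print(p, v)   # strategy near [0.5, 0.5], game value near 0
```

The same LP machinery is what standard matrix-game solvers apply state by state in a stochastic game; the column player's strategy follows from the identical program applied to the negated transpose of G.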
With the help of the TTR metric, we can quantify the payoffs in both the cyber and physical layers in a unified way, thereby building a cross-layer security game model for ICPSs. In the cyber layer of ICPSs, R(A_{t,i}) is the amount of time it takes to bring the compromised host back to a normal state by, typically, restarting the host or switching to a backup device. When it comes to the physical layer, if the attacker has successfully attacked a sensor or actuator by taking action A_{t,i}, then R(A_{t,i}) is defined as:

    R(A_{t,i}) = R_c(A_{t,i}) + R_p(A_{t,i}),                               (8)

where R_c(A_{t,i}) is the TTR for the compromised device in the physical layer and R_p(A_{t,i}) is the TTR for the physical process under control.

When a sensor or an actuator has been compromised by an attacker, the evolution of the physical process depends on the intrinsic system dynamics and the injected attack signals. Since we usually do not know the injected attack signals in advance, it is difficult to obtain the value of R_p(A_{t,i}). Therefore, we consider the worst-case situation in order to quantify R_p(A_{t,i}) when the attacker has not disturbed the physical process. According to [18], the MIN and MAX attacks, as described in (9), are usually the most effective attacks for an attacker who wants to disrupt a plant but does not know the system dynamics:

    MIN attacks: y_i = y_i^{min}, u_i = u_i^{min},
    MAX attacks: y_i = y_i^{max}, u_i = u_i^{max}.                          (9)

In (9), y_i denotes sensor measurements and u_i control commands. Consequently, we can compute the TTRs for the physical process under MIN and MAX attacks by means of numerical simulation. Afterwards, the larger of the two TTRs is assigned as the value of R_p(A_{t,i}).

When the attacker has injected attack signals into the control system, the physical process can be described as a discrete-time linear time-invariant system with unknown inputs [19]:

    x_{k+1} = A x_k + B u_k + B_a u'_k + w_k,                               (10a)
    y_k = C x_k + v_k,                                                      (10b)

where x_k is the system state, u'_k is the injected signal, B_a is the matrix for the attack signals, and A, B and C are matrices representing the dynamics of the physical process. The authors of [19] designed an unbiased minimum-variance state estimator for the system described in (10) to estimate the system states while attacks are in progress. This estimator has the following form:

    x̂_{k+1} = A x̂_k + B u_k + L_{k+1} [y_{k+1} - C A x̂_k - C B u_k],        (11)

where x̂_k is the estimate of x_k and L_{k+1} is a gain parameter. Based on this estimator, we can estimate x_k during the time period when the physical process is under attack and the defender is recovering the compromised sensor or actuator, and finally compute the value of R_p(A_{t,i}).

E. Strategy Profiles

A strategy profile consists of the series of actions a player adopts in each game state. In each state t, the player takes actions according to a probability distribution, so the attacker's and the defender's strategies in state t can be represented as π_t^A = {π_t^A(A_{t,1}), π_t^A(A_{t,2}), ..., π_t^A(A_{t,I})} and π_t^D = {π_t^D(D_{t,1}), π_t^D(D_{t,2}), ..., π_t^D(D_{t,J})}, respectively, where π_t indicates the probability of choosing a specific action in state t, with 0 ≤ π_t^A(A_{t,i}) ≤ 1, 0 ≤ π_t^D(D_{t,j}) ≤ 1, Σ_{i=1}^{I} π_t^A(A_{t,i}) = 1 and Σ_{j=1}^{J} π_t^D(D_{t,j}) = 1.

Since the attacker and the defender choose actions in a stochastic manner, the strategies they adopt are called mixed (or stochastic) strategies [6]. If ∃ i such that π_t^A(A_{t,i}) = 1 and π_t^A(A_{t,l}) = 0 for all l ≠ i, then the attack strategy π_t^A is called a pure strategy (correspondingly for the defender: ∃ j such that π_t^D(D_{t,j}) = 1 and π_t^D(D_{t,l}) = 0 for all l ≠ j). Obviously, a pure strategy is a special case of a mixed strategy.

In a game problem, each player wants to maximize his/her total payoff by finding the optimal strategy profile, which leads to a Nash equilibrium [6]. In a Nash equilibrium, no player wants to deviate from the strategy profile unilaterally. Denote by π_{t,i}^A a possible mixed strategy for the attacker in state t (π_{t,j}^D for the defender); then a mixed strategy pair (π_{t,*}^A, π_{t,*}^D) is a Nash equilibrium solution if:

    ∀ π_{t,i}^A: U_t^A(π_{t,*}^A, π_{t,*}^D) ≥ U_t^A(π_{t,i}^A, π_{t,*}^D),
    ∀ π_{t,j}^D: U_t^D(π_{t,*}^A, π_{t,*}^D) ≥ U_t^D(π_{t,*}^A, π_{t,j}^D),   (12)

where U calculates the expected payoff under a given mixed strategy. The optimal attack and defense strategy profiles are then π_*^A = {π_{1,*}^A, π_{2,*}^A, ..., π_{T,*}^A} and π_*^D = {π_{1,*}^D, π_{2,*}^D, ..., π_{T,*}^D}, respectively.

F. Game Solution

A stochastic game is the combination of many matrix games and a Markov decision process [20]. Therefore, given the attacker's action set A_t = {A_{t,1}, A_{t,2}, ..., A_{t,I}} and the defender's action set D_t = {D_{t,1}, D_{t,2}, ..., D_{t,J}} in state t, the game in state t can be regarded as a matrix game G_t. G_t is an I×J matrix whose elements are the payoffs gained or lost when the players take the corresponding actions. The element at the ith row and jth column of G_t is given by:

    g_{i,j}^t = U_t^A(A_{t,i}, D_{t,j}) + Σ_{k=1}^{T} p(S_k | S_t, A_{t,i}, D_{t,j}) V_k,   (13)

where V_k is the value of G_k [20]. In our case, V_k is the expected payoff of the attacker in a Nash equilibrium.

In the matrix game G_t, the attacker is assigned as the row player and the defender as the column player. Accordingly, at game state t the players choose a row i and a column j, and the payoffs for the attacker and the defender are g_{i,j}^t and -g_{i,j}^t, respectively. Note that G_t is a two-player zero-sum game, so a pure-strategy Nash equilibrium is a saddle point of the matrix G_t [6]. If no saddle point exists in G_t, then the optimal strategy is a mixed
strategy. In such a situation, the solution of G_t is obtained by solving the following linear programming problem:

    ∂U_t^A(π_t^A, π_t^D) / ∂π_t^A(A_{t,i}) = 0,  i = 1, 2, ..., I - 1,
    ∂U_t^D(π_t^A, π_t^D) / ∂π_t^D(D_{t,j}) = 0,  j = 1, 2, ..., J - 1,      (14)

with constraints Σ_{i=1}^{I} π_t^A(A_{t,i}) = 1 and Σ_{j=1}^{J} π_t^D(D_{t,j}) = 1.

To solve the proposed stochastic game and obtain the optimal strategies for all game states, we adapt the algorithm proposed in [21] to our case. The principle of that algorithm is to iteratively update V_t until the deviation between two iterations is less than a predefined threshold δ. The pseudo code is given in Algorithm 1, which yields the optimal defense strategy profile π_*^D as well as the optimal attack strategy profile π_*^A.

Algorithm 1 Compute the optimal strategy profile.
Input: A stochastic game G
Output: Optimal attack-defense strategy profile (π_*^A, π_*^D)
 1: r ← 0
 2: Randomly initialize V as V^0 = {V_1^0, V_2^0, ..., V_T^0}
 3: repeat
 4:   for each G_t ∈ G = {G_1, G_2, ..., G_T} do
 5:     ∀ g_{i,j}^t ∈ G_t, replace V_k in (13) with V_k^r
 6:     Solve the matrix game G_t and obtain (π_{t,*}^{A,r}, π_{t,*}^{D,r})
 7:     ∀ V_t^{r+1} ∈ V^{r+1}, V_t^{r+1} ← U_t^A(π_{t,*}^{A,r}, π_{t,*}^{D,r})
 8:   end for
 9:   r ← r + 1
10: until ∀ t ∈ {1, 2, ..., T}, |V_t^r - V_t^{r-1}| < δ
11: for each G_t ∈ G = {G_1, G_2, ..., G_T} do
12:   Solve the game and obtain (π_{t,*}^A, π_{t,*}^D)
13: end for
14: return π_*^A = {π_{t,*}^A}, π_*^D = {π_{t,*}^D}

IV. REINFORCEMENT LEARNING

In general, one of the most difficult tasks in a game problem is to accurately specify the model parameters [4], because we usually do not have enough domain knowledge. Algorithm 1 can be used to solve the game problem only when both the attacker and the defender know all the model parameters completely. Therefore, in this section a Q-learning algorithm is devised to help the players learn the optimal strategies without accurately knowing the model parameters.

Q-learning is a semi-supervised learning algorithm that belongs to the category of reinforcement learning [22]. The goal of Q-learning is to find an action sequence which generates the maximal cumulative reward in a trial-and-error manner. The quality of each action is assessed by feedback from the environment, known as the reward. In the case of attacks against ICPSs, the attacker can probe the system to find out what actions the defender has taken, so both the attacker and the defender can know each other's actions. Therefore, by applying Q-learning to the stochastic game, the players can approximate the optimal strategy profiles by iteratively updating a Q-function.

Suppose the current game state is t, and the game transits to state k under the attack-defense action pair (A_{t,i}, D_{t,j}); then the Q-function for the attacker is updated as:

    Q^A(S_t, A_{t,i}, D_{t,j}) ← (1 - α) Q^A(S_t, A_{t,i}, D_{t,j})
                                 + α [r(S_t, A_{t,i}, D_{t,j}) + β W_k],    (15)

where the learning rate α indicates how fast the player updates the Q-function with new reward information, β is a discount factor, r(S_t, A_{t,i}, D_{t,j}) represents the reward of the attacker, and W_k is the value of the matrix game Q^A(S_k).

Empirically, setting α = 1 means that the player focuses only on immediate and future rewards, thus losing the previously learnt knowledge and causing divergence. On the contrary, α = 0 means the algorithm has no ability to learn. So α is often chosen as a tradeoff between aggressiveness and conservativeness. The reward r(S_t, A_{t,i}, D_{t,j}) of the attacker in our game model is defined as the attacker's payoff U_t^A(A_{t,i}, D_{t,j}). Since the proposed game model is a zero-sum game, we only need to specify one Q-function; the Q-function for the defender is Q^D(S_t, A_{t,i}, D_{t,j}) = -Q^A(S_t, A_{t,i}, D_{t,j}). Finally, the optimal defense strategy is the one that maximizes the value of Q^D(S_k), i.e.:

    π_{k,*}^D = arg max_{π_k^D} min_{a ∈ A_k} Σ_{d ∈ D_k} π_k^D(d) Q^D(S_k, a, d).   (16)

The solution of (16) can also be obtained by linear programming, as in (14).

V. CASE STUDY

In this section, we implement our game model on a simulated simplified Tennessee-Eastman (STE) process control system and evaluate the experimental results.

A. Experiment Setup

The STE system [23] is a chemical reactor plant which has been widely adopted as a testbed in fault diagnosis studies. Fig. 2 illustrates the architecture of the hardware-in-the-loop simulation testbed, in which the STE process is simulated on an agent host. As shown in Fig. 2, AH1 and AH2 have access to the control network, and the devices in the control network communicate with the controllers using the Modbus protocol, a widely used industrial protocol. An attacker tries to penetrate the system from the corporate network and manipulate the sensors/actuators to disrupt the STE process.

The STE testbed has several vulnerabilities, as summarized in Table II. In this table, the "Exploitability" column indicates the vulnerability exploitation probability computed with (1), where the parameters (S_AV, S_AC and S_AU) are acquired from the CVE (Common Vulnerabilities & Exposures) database by querying the CVE ID. V3 is a Modbus protocol vulnerability: Modbus lacks authentication mechanisms, thus allowing unauthenticated attackers to send arbitrary messages. V5 is a SQL (Structured Query Language) database vulnerability which only exists in Siemens WinCC (Windows Control Center), an industrial supervisory software.

By exploiting the vulnerabilities listed in Table II, the attacker can infiltrate the system. Here we assume that
the attack target is the sensors/actuators, and the attacker will not waste time attacking a target that cannot help him/her move towards this target. Consequently, when building the stochastic game model we find that most game states are infeasible. Table III lists the feasible game states we are interested in, and Table IV enumerates all of the attacker's possible actions. The available attack action set in each game state is therefore: A_1 = {a_1, a_2, a_3, a_9}, A_2 = {a_4, a_5, a_9}, A_3 = {a_4, a_5, a_9}, A_4 = {a_6, a_7, a_8, a_9}, A_5 = {a_4, a_9}, A_6 = ∅, A_7 = ∅, A_8 = ∅. In addition, the TTRs for each successful attack action are presented in Table V.

TABLE II
VULNERABILITY INFORMATION

Fig. 2. Structure of the STE simulation testbed. Legend — AH: Administration Host; HMI: Human Machine Interface; ES: Engineering Station; PC: Pressure Controller; FC: Flow Controller; CC: Composition Controller; FS: Flow Sensor; PS: Pressure Sensor; IS: Ingredient Sensor; VA: Valve; CAN: Controller Area Network.

TABLE III
IMPORTANT GAME STATES

State  Description                State  Description
S_1    Normal system operation    S_5    Root privilege on HMI
S_2    User privilege on AH1      S_6    Manipulation on IS
S_3    User privilege on AH2      S_7    Manipulation on FS
S_4    User privilege on ES       S_8    Manipulation on PS

TABLE IV
ATTACK ACTIONS

Action  Description                                    Cost (min)
a_1     Exploit V1 to acquire user privilege on AH1    4.5
a_2     Exploit V1 to acquire user privilege on AH2    3.5
a_3     Exploit V2 to acquire user privilege on AH2    2.8
a_4     Exploit V3 to disguise as ES                   5.2
a_5     Exploit V4 to acquire root privilege on HMI    3.0
a_6     Exploit V5 to manipulate the reading of IS     7.0
a_7     Exploit V5 to manipulate the reading of FS     7.5
a_8     Exploit V5 to manipulate the reading of PS     8.0
a_9     No operation                                   0

TABLE V
TIME TO RECOVERY

According to Table III and Table IV, the attacker's game state transition graph can be drawn, as shown in Fig. 3. The attacker first needs to exploit the vulnerabilities of AH1 or AH2 to obtain access to the control network, then tries to attack the devices in the control network and disguise as a legal host to communicate with the controllers, and finally seeks to compromise IS, FS or PS to disrupt the STE process.

Fig. 3. Attacker's game state transition graph.

In contrast to the attack actions, the defender's actions are listed in Table VI. The available defense action set in each state is: D_1 = {d_1, d_2, d_3, d_4, d_{12}}, D_2 = {d_5, d_6, d_7, d_8, d_{12}}, D_3 = {d_5, d_6, d_7, d_8, d_{12}}, D_4 = {d_9, d_{10}, d_{11}, d_{12}}, D_5 = {d_5, d_6, d_{12}}, D_6 = ∅, D_7 = ∅, D_8 = ∅.

B. Simulation and Result Analysis

In order to demonstrate the effectiveness of the proposed approach, two experiment scenarios are designed: 1) Solve
Action   S_1      S_2      S_3      S_4      S_5
a_1      0.4869   –        –        –        –
a_2      0.4647   –        –        –        –
a_3      0.0484   –        –        –        –
a_4      –        0.4261   0.4261   –        1
a_5      –        0.5739   0.5739   –        –
a_6      –        –        –        0.2753   –
a_7      –        –        –        0.3146   –
a_8      –        –        –        0.4100   –
a_9      0        0        0        0        0

Fig. 4. Value of W_1 with different learning rates (α = 0.2, 0.3, 0.5, 0.9) in state S_1.

The expected payoffs for the attacker and the defender in each game state are presented in Table IX. Since it is a zero-sum game, the defender's payoff is the negative of the attacker's. In Table IX, the game states S_6, S_7 and S_8 are all end states, which means that the attacker takes the no-operation action there, so the payoff values in these states are 0.

2) Experiment 2 – Q-learning: In this experiment, we first set all the Q-function values to zero and then repeat the game many times to generate the optimal strategy profiles. The discount factor is set as β = 1.

Fig. 4 shows the convergence process of W_1 under different configurations of the learning rate α. From Fig. 4 we see that α does not have a significant impact on the value of W_1, but the

Fig. 5 shows the learning process of the defense strategy in game state S_1 with α = 0.9. In Fig. 5, d_5 is not drawn because it always equals zero during the learning process. Finally, the learnt optimal mixed defense strategy is π_{1,*}^D = {0.3060, 0.3559, 0.3378, 0, 0}, which is in accordance with the result shown in Table VIII. In other words, this experiment scenario demonstrates that the optimal defense strategy profile can be obtained through the proposed Q-learning algorithm.
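Taken together, the update rule (15) and the linear-programming step behind (16) amount to a minimax-Q style iteration. The sketch below illustrates this under loud assumptions: the reward and transition tables are hypothetical toy placeholders (not the STE testbed parameters), uniform random exploration stands in for any explicit exploration policy, and SciPy's linprog is assumed for computing the matrix-game value W:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Value and optimal row (attacker) mixed strategy of the zero-sum
    matrix game Q, via the standard LP: max v s.t. Q^T p >= v, sum(p) = 1."""
    I, J = Q.shape
    c = np.zeros(I + 1)
    c[-1] = -1.0                                # linprog minimizes, so use -v
    A_ub = np.hstack([-Q.T, np.ones((J, 1))])   # v - sum_i p_i Q[i,j] <= 0
    A_eq = np.zeros((1, I + 1))
    A_eq[0, :I] = 1.0                           # probabilities sum to one
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(J), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * I + [(None, None)])
    return res.x[-1], res.x[:I]

def minimax_q(reward, transition, n_states, n_a, n_d,
              alpha=0.3, beta=0.9, episodes=500, seed=0):
    """Minimax-Q in the spirit of (15): after each attack-defense play,
    Q(s,a,d) is pulled toward r(s,a,d) + beta * W_k, where W_k is the
    matrix-game value of the successor state's Q."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_a, n_d))
    s = 0
    for _ in range(episodes):
        a, d = rng.integers(n_a), rng.integers(n_d)   # random exploration
        k = transition[s, a, d]                       # observed next state
        W_k, _ = matrix_game_value(Q[k])
        Q[s, a, d] = (1 - alpha) * Q[s, a, d] + alpha * (reward[s, a, d] + beta * W_k)
        s = k
    return Q

# Hypothetical single-state toy game with matching-pennies rewards for the
# attacker; the state is absorbing, so beta = 0 isolates the update rule.
reward = np.array([[[1.0, -1.0], [-1.0, 1.0]]])   # r(S0, a, d)
transition = np.zeros((1, 2, 2), dtype=int)
Q = minimax_q(reward, transition, n_states=1, n_a=2, n_d=2, beta=0.0)
W, attack_strategy = matrix_game_value(Q[0])
print(W, attack_strategy)   # value near 0, attack strategy near [0.5, 0.5]
```

The defender's side of (16) follows from the same LP applied to -Q[0].T, since Q^D = -Q^A in a zero-sum game.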
[1] "…revolution," IEEE Industrial Electronics Magazine, vol. 11, no. 1, pp. 6–16, Mar. 2017.
[2] M. Wolf and D. Serpanos, "Safety and security in cyber-physical systems and internet-of-things systems," Proc. IEEE, vol. 106, no. 1, pp. 9–20, Jan. 2018.
[3] S. McLaughlin, C. Konstantinou, X. Wang, L. Davi, A. R. Sadeghi, M. Maniatakos, and R. Karri, "The cybersecurity landscape in industrial control systems," Proc. IEEE, vol. 104, no. 5, pp. 1039–1057, May 2016.
[4] Q. Zhu and T. Basar, "Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems: Games-in-games principle for optimal cross-layer resilient control systems," IEEE Control Systems, vol. 35, no. 1, pp. 46–65, Feb. 2015.
[5] Y. Jiang and S. Yin, "Recursive total principle component regression based fault detection and its application to vehicular cyber-physical systems," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1415–1423, Apr. 2018.
[6] C. T. Do, N. H. Tran, C. Hong, C. A. Kamhoua, K. A. Kwiat, E. Blasch, S. Ren, N. Pissinou, and S. S. Iyengar, "Game theory for cyber security and privacy," ACM Comput. Surv., vol. 50, no. 2, pp. 30:1–30:37, May 2017.
[7] C. Hankin, Semantics, Logics, and Calculi, ch. Game Theory and Industrial Control Systems, pp. 178–190. Springer International Publishing, 2016.
[8] Q. Feng, H. Cai, Z. Chen, X. Zhao, and Y. Chen, "Using game theory to optimize allocation of defensive resources to protect multiple chemical facilities in a city against terrorist attacks," Journal of Loss Prevention in the Process Industries, vol. 43, pp. 614–628, 2016.
[9] G. Chen, Z. Y. Dong, D. J. Hill, and Y. S. Xue, "Exploring reliable strategies for defending power systems against targeted attacks," IEEE Transactions on Power Systems, vol. 26, no. 3, pp. 1000–1009, Aug. 2011.
[10] Y. Yuan, H. Yuan, L. Guo, H. Yang, and S. Sun, "Resilient control of networked control system under DoS attacks: A unified game approach," IEEE Transactions on Industrial Informatics, vol. 12, no. 5, pp. 1786–1794, Oct. 2016.
[11] H. Niu and S. Jagannathan, "Optimal defense and control of dynamic systems modeled as cyber-physical systems," The Journal of Defense Modeling and Simulation, vol. 12, no. 4, pp. 423–438, 2015.
[12] Y. Jiang, S. Yin, and O. Kaynak, "Data-driven monitoring and safety control of industrial cyber-physical systems: Basics and beyond," IEEE Access, vol. 6, pp. 47374–47384, 2018.
[13] Y. Jiang, K. Li, and S. Yin, "Cyber-physical system based factory monitoring and fault diagnosis framework with plant-wide performance optimization," in 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 240–245, May 2018.
[14] A. A. Cárdenas, S. Amin, Z.-S. Lin, Y.-L. Huang, C.-Y. Huang, and S. Sastry, "Attacks against process control systems: Risk assessment, detection, and response," in Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security (ASIACCS '11), pp. 355–366. New York, NY, USA: ACM, 2011.
[15] P. Mell, K. Scarfone, and S. Romanosky, "Common vulnerability scoring system," IEEE Security & Privacy, vol. 4, no. 6, pp. 85–89, Nov. 2006.
[16] N. Poolsappasit, R. Dewri, and I. Ray, "Dynamic security risk management using Bayesian attack graphs," IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 1, pp. 61–74, Jan. 2012.
[17] S. Ntalampiras, "Detection of integrity attacks in cyber-physical critical infrastructures using ensemble modeling," IEEE Transactions on Industrial Informatics, vol. 11, no. 1, pp. 104–111, Feb. 2015.
[18] Y.-L. Huang, A. A. Cárdenas, S. Amin, Z.-S. Lin, H.-Y. Tsai, and S. Sastry, "Understanding the physical and economic consequences of attacks on control systems," International Journal of Critical Infrastructure Protection, vol. 2, no. 3, pp. 73–83, 2009.
[19] K. Huang, C. Zhou, Y. C. Tian, S. H. Yang, and Y. Qin, "Assessing the physical impact of cyber-attacks on industrial cyber-physical systems," IEEE Transactions on Industrial Electronics, vol. 65, no. 10, pp. 8153–8162, 2018.
[20] A. M. Fink, "Equilibrium in a stochastic n-person game," Journal of Science of the Hiroshima University, vol. 28, no. 1, pp. 89–93, 1964.
[21] K. Sallhammar, B. E. Helvik, and S. J. Knapskog, "On stochastic modeling for integrated security and dependability evaluation," Journal of Networks, vol. 1, no. 5, pp. 31–42, 2006.
[22] H. Wang, T. Huang, X. Liao, H. Abu-Rub, and G. Chen, "Reinforcement learning in energy trading game among smart microgrids," IEEE Transactions on Industrial Electronics, vol. 63, no. 8, pp. 5109–5119, Aug. 2016.
[23] N. L. Ricker, "Model predictive control of a continuous, nonlinear, two-phase reactor," Journal of Process Control, vol. 3, no. 2, pp. 109–123, 1993.
[24] X. Ou, W. F. Boyer, and M. A. McQueen, "A scalable approach to attack graph generation," in Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 336–345. ACM, 2006.
[25] N. Falliere, L. O. Murchu, and E. Chien, "W32.Stuxnet dossier," White paper, Symantec Corp., Security Response, vol. 5, no. 6, pp. 1–29, 2011.

Kaixing Huang received the B.S. and Ph.D. degrees in control science and engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2012 and 2018, respectively. His research interests include security control of industrial control systems and game theory.

Chunjie Zhou received the B.S., M.S. and Ph.D. degrees in control theory and control engineering from the Huazhong University of Science and Technology, Wuhan, China, in 1988, 1991 and 2001, respectively. He is currently a Professor with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology. His research interests include safety and security control of industrial control systems, theory and application of networked control systems, and artificial intelligence.

Yuanqing Qin received the B.S. degree in electrical engineering from the Shandong University of Technology, Zibo, China, in 2000, and the M.S. and Ph.D. degrees in control theory and control engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2003 and 2007, respectively. He is currently a Lecturer with the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology. His research interests include networked control systems and artificial intelligence.

Weixun Tu received the B.S. degree in Automation from Xidian University, Xi'an, China, in 2016. He is currently working toward the M.S. degree in control science and control engineering at the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology. His research interests include networked control systems and artificial intelligence.