
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCNS.2019.2931862, IEEE Transactions on Control of Network Systems

A Differential Game Approach to Decentralized Virus-Resistant Weight Adaptation Policy over Complex Networks
Yunhan Huang, Quanyan Zhu, Member, IEEE

Abstract: Increasing connectivity of communication networks enables large-scale distributed processing over networks and improves the efficiency of information exchange. However, malware and viruses can take advantage of the high connectivity to spread over the network and take control of devices and servers for illicit purposes. In this paper, we use an SIS epidemic model to capture the virus spreading process and develop a virus-resistant weight adaptation scheme to mitigate the spreading over the network. We propose a differential game framework to provide a theoretic underpinning for decentralized mitigation in which nodes of the network cannot fully coordinate, and each node determines its own control policy based on local interactions with neighboring nodes. We characterize and examine the structure of the Nash equilibrium, and discuss the inefficiency of the Nash equilibrium in terms of minimizing the total cost of the whole network. A mechanism design through a penalty scheme is proposed to reduce the inefficiency of the Nash equilibrium and allow the decentralized policy to achieve social welfare for the whole network. We corroborate our results using numerical experiments and show that virus-resistance can be achieved by a distributed weight adaptation scheme.

Index Terms: Virus Resistance, Malware Spreading, Differential Games, Complex Networks, Decentralized Control, Mechanism Design, Network Security, Epidemic Processes.

Y. Huang and Q. Zhu are both with the Department of Electrical and Computer Engineering, New York University, Brooklyn, NY, USA; e-mail: {yh2315,qz494}@nyu.edu.
This research is partially supported by grants ECCS-1847056, CNS-1544782, and SES-1541164 from National Science Foundation (NSF), ARO grant W911NF1910041, and Grant Award #2015-ST-061-CIRC01 through the Critical Infrastructure Resilience Institute (CIRI), supported by the U.S. Department of Homeland Security (DHS).
2325-5870 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

The integration of information and communications technologies into systems upgrades system performance. However, the integration also degrades the security level of the systems and introduces vulnerabilities that undermine the reliability of critical infrastructure. The connectivity and interdependence of cyber networks make the system even more vulnerable due to the existence of wide-spreading cyber-attacks on networks. It provides opportunities for sophisticated and stealthy malware and viruses to spread over the network. One noteworthy example is the StuxNet attack [1]. In June 2010, certain control systems of a nuclear-enrichment plant in Iran were infected by a carefully crafted computer worm called StuxNet. The worm, spreading through USB devices, intended to breach the implemented cyberprotection schemes and alter both the measurement and actuation signals, which caused instabilities and damaged the physical plant [2]. More recent examples of wide-spreading cyber-attacks include WannaCry and Petya Ransomware, which have incurred billions of dollars of losses [3].

With an increasing number of wide-spreading cyber-attacks on networks, protection against malware and virus spreading in cyber networks is central to the security of network systems [3]. However, there are many challenges in designing a protection scheme for cyber networks. One challenge is due to the interdependency between the microscopic individual behaviors and the macroscopic spreading phenomenon. The local interactions over a large network where nodes communicate, share information, and make interdependent decisions can result in a macroscopic behavior, which will in turn affect the agents' behaviors. This type of microscopic and macroscopic coupling is illustrated in Fig. 1. Another challenge arises from the fact that cyber networks are often formed by a large number of self-interested agents or decision-makers. The noncooperation among the agents makes it almost impossible for the system to be coordinated as a whole to defend against wide-spreading cyber-attacks.

[Figure 1 panels: Spreading Process; Agents' Response; Updated Network Topology; Spreading Process Changed]
Fig. 1. The interactions between the spreading, topology and individual behavior. The agents adapt their behavior of contacting other agents due to virus spreading. The agents' behavior changes the topology of the network, which in turn affects the spreading process. For example, after the topology change, the virus is less likely to spread from device 1 to device 2.

To this end, one way to mitigate the malware spreading over large networks is to control the intensity of interactions with neighboring nodes. By adapting the rate of communications or contacts, nodes can reduce the likelihood of infection. This type of mechanism is called weight adaptation as the weights between two nodes of a network capture the intensity of the connectivity [4]. The most fundamental reason that virus and

malware can go viral is the inherent property of networks: connectivity. Weight adaptation is a mechanism that hits the nail on the head. Weight adaptation lowers the connectivity, which leaves virus and malware no way out. Compared with quarantining and link removal [5], weight adaptation does not need to completely disconnect nodes from others but rather adjusts weights to connect more loosely with nodes with a higher likelihood of infection. Instead of fixing the weights for the whole spreading process, in the weight adaptation scheme, each agent dynamically updates its weights in response to the state of the neighboring nodes. Weight adaptation is different from changing the infection rate. The infection rate is usually considered to be decided by some interior factors like physiological or immunological states of the individual. The weight between two nodes is usually used to describe how strongly two nodes are connected. Changing the weight can be interpreted as an exterior change.

We consider a directed weighted network where the nodes and the edges represent the agents and the connections between the agents respectively. The directed connection between two nodes can be considered as one agent acquiring information/data/packet from another agent. The weight between two agents quantifies the frequency or the volume of communication between two agents [6]. The original weight is pre-designed by multilateral agreement among agents to achieve certain goals or to optimize the system performance when there is no infection. For example, in distributed estimation or learning problems over networks such as [6], one agent needs to communicate with its neighboring agents at a sufficient rate to find the global estimate of the state. The optimal weighting on the edges quantitatively captures the minimum required frequency of contacting neighboring nodes. As illustrated in Fig. 2, when there are wide-spreading viruses or cyber attacks, the agent can decrease the likelihood of being infected by reducing its weight with infected neighbors. The agent then restores the connections when the infected neighbors are recovered. Deviation of the weights from the optimal ones introduces cost induced by performance degradation and system inefficiency. Infected agents may not function normally. The agents and the network system will suffer losses. Thus, it is essential to consider the trade-off between malfunction cost caused by infection and inefficiency or performance degradation cost caused by weight deviation.

[Figure 2 panels: a. t=0; b. t=1; c. t=2; d. t=3. Annotations: Weight decreasing; Recovering; Weight restoring]
Fig. 2. Weight adaptation scheme to mitigate the infection for the node in the center over an undirected network. The line width indicates the weight. Red nodes are the infected ones while blue nodes are the susceptible ones. The susceptible node decreases the weight of its connection to an infected node. Once the infected node is recovered, the weight of the connection to the recovered node will be restored, as shown by the process from c to d. The purple link and the green link are the ones that adapt the weights.

In this paper, an N-person nonzero-sum differential game-based model is proposed to model the virus spreading and the agents' adaptive response to virus infection. This model captures the non-cooperative behaviors among agents, dynamic properties of the spreading process, and the complexity of the local interactions. We characterize the Nash equilibrium (NE) for the game and investigate the network effects under the non-cooperative strategies. We observe that under the open-loop NE, each agent updates its weight based on its own infection level and its out-neighbors' infection level as well as the corresponding component of its costate. When the agent's own infection level is high, it does not care much about the weight of links to infected out-neighbors. When its out-neighbor's infection level is high, it lowers the weight of the corresponding connection more. The corresponding component of each agent's costate encodes the information about the network structure and the infection of the whole network.

We use a centralized optimal control problem to serve as a benchmark problem to study the efficiency of the decentralized problem. Under centralized policies, the system operator develops an optimal weight adaptation scheme to achieve the social optimum. Compared with the centralized solution, the open-loop NE solution is not the best from a system point of view since in the game, agents consider only their own cost. Such inefficiency caused by selfish behavior of agents has a significant impact on network and service management. One example is the congestion in traffic networks caused by selfish drivers [7]. To address the inefficiency, we propose a dynamic penalty approach by designing a mechanism in which each agent pays for the infection cost of all agents that are reachable to him/her. We show that with this mechanism, the open-loop NE policy achieves the social optimum.

The equilibrium analysis and the mechanism design lead to a distributed algorithm for the network operator and the agents to compute the optimal weight adaptation where each agent only has to know local information. We summarize the principal contributions as follows:

1) We propose a differential game model to develop a virus-resistant weight adaptation scheme for cyber networks formed by a group of self-interested agents.
2) We study the structure of the open-loop NE for the differential game over complex networks and show the weight adaptation rule is based on the agent's and its out-neighbors' infection level as well as the costate.
3) We discuss the inefficiency of the NE. A dynamic penalty scheme is proposed to achieve social optimum for the whole network.
4) An implementable distributed virus resistance algorithm is proposed to compute the NE-based control policy.

Game theory has long been a useful tool to design strategies on network systems for virus resistance purposes [3], [8], [9].


In [8], the authors have proposed a network formation game that balances multiple partially conflicting objectives such as the cost of installing links, the performance of the network and the resistance to virus. In their work, an undirected unweighted static network is formed. Hayel et al. in [3] have studied a large population game with heterogeneous types of individuals. They focus on the group behavior of a certain type instead of individual behavior. Besides game theory, other tools such as impulse control [10], optimal control [11], and optimization [12] have been used to design strategies to mitigate malware attacks and virus spreading.

Virus spreading over adaptive networks was first studied by Gross et al. in [13]. They investigated adaptive behavior in a homogeneous way where the whole network takes the same adaptation. Based on the work on epidemic spreading over time-varying networks [14], an optimal control method has been utilized to find the optimal time-varying topology response for the network system in [11]. However, the centralized optimal control method is not practical and lacks incentives. The effect of heterogeneous weight adaptation on virus spreading has been studied by Yun et al. in [15]. In [15], the authors have proposed a weight adaptation rule without taking cost into consideration. The weight adaptation rule is based on the infection level of the whole network.

Vaccination and immunity have been studied for control of virus spreading over decades [10], [12], [16]. But vaccination may not be efficient for some malware and viruses due to their fast upgrading and undetectability. Also, getting every individual vaccinated is costly. Quarantining [5] is equivalent to removal of all connections of one agent. Compared with the weight adaptation scheme, it is overreacting to disconnect all links since connection with healthy agents causes no harm.

The paper is organized as follows. In Sect. II, preliminaries are given and the N-person nonzero-sum differential game framework is introduced. Sect. III describes the open-loop NE of the differential game and the weight adaptation scheme. Sect. IV studies the efficiency of the NE solution. Comparisons of the differential game-based weight adaptation scheme with the optimal control based scheme and other numerical results are given in Sect. V. Conclusions are contained in Sect. VI.

II. PRELIMINARIES AND PROBLEM FORMULATION

In this section, we introduce notations and preliminary results needed in our derivations. Along the way, we describe and develop the problem formulation.

A. Graph Theory

A weighted, directed graph can be defined by a triple $\mathcal{G} \triangleq (\mathcal{V}, \mathcal{E}, W)$. $\mathcal{V} \triangleq \{v_1, v_2, \dots, v_N\}$ represents a set of $N$ nodes. Define $\mathcal{N} \triangleq \{1, \dots, N\}$. A set of directed edges is denoted by $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. The set of in-neighbors of node $i$ is defined as $\mathcal{N}_i^{in} \triangleq \{j \mid j \in \mathcal{V}, (j, i) \in \mathcal{E}\}$. Denote by $|\cdot|$ the cardinality of a set. So, the in-degree of $v_i$ is $|\mathcal{N}_i^{in}|$. Similarly, the set of out-neighbors of $v_i$ is $\mathcal{N}_i^{out} \triangleq \{j \mid j \in \mathcal{V}, (i, j) \in \mathcal{E}\}$. The out-degree of $v_i$ is $|\mathcal{N}_i^{out}|$. The weight adjacency matrix of $\mathcal{G}$ is denoted by an $N \times N$ matrix $W \triangleq [w_{ij}]$ where $w_{ij}$ refers to the weight of the edge from node $i$ to $j$. We assume that graph $\mathcal{G}$ has no self-loops.

We denote the original weight adjacency matrix by $W^o = [w^o_{ij}] \in \mathbb{R}^{N \times N}$. Let $\mathcal{N}^{out}_{i,o}$ ($\mathcal{N}^{in}_{i,o}$) be the set of out-neighbors (in-neighbors) under the original optimal weight pattern $W^o$.

B. Virus Spreading Model

Since cyber network nodes do not have human-like auto-antibody/vaccination which can prevent an individual from being infected again, we study the so-called susceptible-infected-susceptible (SIS) models. Consider a population of $N$ agents. Each agent can be either susceptible (S) or infected (I). Infected individuals infect others at rate $\beta_i \geq 0$. The intensity of interaction between $v_i$ and $v_j$ is described by the weight $w_{ij} \in \mathbb{R}$. Denote $w_i = (w_{i1}, \dots, w_{iN})' \in \mathbb{R}^N$. We assume that the weight is bounded by $\bar{w}_{ij} \in \mathbb{R}$. If $v_i \in \mathcal{V}$ is susceptible while $v_j \in \mathcal{V}$ is infected, there is a possibility that $v_i$ will be infected after the interaction. In addition, each infected agent returns to the susceptible state at some rate $\sigma_i$. The state of a node $i$ at time $t \geq 0$ is a binary random variable $X_i(t) \in \{0, 1\}$, with $X_i(t) = 0$ ($X_i(t) = 1$) indicating that agent $i$ is susceptible (infected). The state vector of all $N$ agents is denoted by $X(t) = (X_1(t), X_2(t), \dots, X_N(t))' \in \{0, 1\}^N$. With the adaptive weight $w_{ij}(t)$ from agent $i$ to $j$, the stochastic state transitions of node $v_i$ from time $t$ to $t + \Delta t$ can be written as follows:

$$
\begin{aligned}
P(X_i(t + \Delta t) = 1 \mid X_i(t) = 0, X(t)) &= \sum_{j=1}^{N} w_{ij}\beta_j X_j(t)\Delta t + o(\Delta t),\\
P(X_i(t + \Delta t) = 0 \mid X_i(t) = 1, X(t)) &= \sigma_i \Delta t + o(\Delta t).
\end{aligned}
\tag{1}
$$

The model (1) is computationally challenging under large-scale networks due to the exponentially increasing size of the state space. Hence, we resort to a mean-field approximation of the Markov process [14], [17], [18]. Denote $x_i(t) \in [0, 1]$ as the probability of agent $i$ being infected at time $t$. The mean-field approximation then provides

$$
\dot{x}_i(t) = (1 - x_i(t)) \sum_{j=1}^{N} w_{ij}(t)\beta_j x_j(t) - \sigma_i x_i(t),
\tag{2}
$$

for $i = 1, 2, \dots, N$. To write this dynamics equation in a more compact form, denote $x(t) = (x_1(t), \dots, x_N(t))'$. We have

$$
\dot{x}(t) = G(x(t), W(t)),
\tag{3}
$$

where $G(\cdot, \cdot) : \mathbb{R}^N \times \mathbb{R}^{N \times N} \to \mathbb{R}^N$ can be written as $G(x(t), W(t)) = (W(t)B - D)x(t) - X_d(t)W(t)Bx(t)$, where $W(t) = [w_{ij}(t)]_{N \times N}$, $B = \mathrm{diag}(\beta_1, \dots, \beta_N)$, $D = \mathrm{diag}(\sigma_1, \dots, \sigma_N)$, and $X_d(t) = \mathrm{diag}(x_1(t), \dots, x_N(t))$.

According to the discussion in [14], the N-intertwined model (2) gives an upper bound for the exact probability of infection, $x_i(t)$. However, the mean-field approximation considered herein, while it is an approximation, is well constructed because the scale of networks, i.e., $N$ in our model, is large and we focus on the cases where $\beta/\sigma$ is above the threshold [18].

The graph and the epidemic spreading process can be viewed as physical constraints. The agents in the network


are coupled by these constraints while trying to minimize their own cost. Such behaviors lead to differential games over networks, which will be introduced in the following section.

C. Differential Game Over Networks

As we mentioned in Section I, the self-interested agents aim to minimize their own costs. One cost arises from malfunction caused by infection. Another cost for agent $i$ describes the inefficiency or degradation of system performance caused by deviation from the original weight $w^o_{ij}$ for all $j \in \mathcal{N}$. We consider the original weight as an optimal weight under which the agent can achieve the most benefit.

For agent $i$, the infection cost function, given by $f_i : [0, 1] \to \mathbb{R}_+$, is a function of $x_i(t) \in [0, 1]$. $f_i$ is assumed to be monotonically increasing to capture the loss of being infected. A weight cost function for the edge from $i$ to $j$ is given by $g_{ij}(w_{ij}(t) - w^o_{ij})$ where $g_{ij} : \mathbb{R} \to \mathbb{R}_+$ is convex. The function satisfies $g_{ij}(w) = 0$ at and only at $w = 0$ for all $i, j \in \mathcal{N}$ because the original weight is optimal to the agent when there is no infection. It is optimal in terms of the tradeoff between price and performance. The marginal cost of deviation from the optimal weight will increase as the distance from the adapted weight to the optimal weight increases. Considering a time duration from $0$ to $T \geq 0$, the cost function of agent $i$ during the time interval $[0, T]$ is given by

$$
J_i = \int_0^T \Big[ f_i(x_i(t)) + \sum_{j=1}^{N} g_{ij}(w_{ij}(t) - w^o_{ij}) \Big] dt.
\tag{4}
$$

As each node determines its own weight adaptation policy, it naturally leads to a differential game framework defined as follows. Consider $N$ agents in the network as $N$ players with an index set $\mathcal{N} = \{1, \dots, N\}$. The duration of the evolution of the game is given by the time interval $[0, T]$. Denote $x(t) = (x_1(t), \dots, x_N(t))'$. Let $\mathcal{X} = \{x \in \mathbb{R}^N \mid x_i \in [0, 1], \forall i \in \mathcal{N}\}$ be the permissible set of the states. For each fixed $t \in [0, T]$, $x(t) \in \mathcal{X}$. Let $w_i(t) = (w_{i1}, \dots, w_{iN})$ be the controls of player $i$. The admissible control set for player $i$ is $S_i = \{0 \leq w_{ij} \leq \bar{w}_{ij}, \forall j \in \mathcal{N}\}$, i.e., for each fixed $t \in [0, T]$, $w_i \in S_i \subset \mathbb{R}^N$. A differential equation is given by (2) whose solution describes the state trajectory of the game corresponding to the $N$-tuple of control functions $\{w_i(t), 0 \leq t \leq T, i \in \mathcal{N}\}$ and the given initial state $x_0 \triangleq (x_1(0), \dots, x_N(0))' = (x_{10}, \dots, x_{N0})'$. Define a set-valued function $\eta_i(\cdot)$ for each $i \in \mathcal{N}$ to characterize the information pattern of player $i$. We consider the open-loop pattern in our case where $\eta_i(t) = \{x_0\}$, $t \in [0, T]$. We can state our problem as the following differential game problem:

$$
\begin{aligned}
\min_{w_i \in S_i} \quad & J_i(w_i)\\
\text{s.t.} \quad & (2),
\end{aligned}
\tag{5}
$$

where $J_i(\cdot) : \mathbb{R}^N \to \mathbb{R}$ and $x_i(0) = x_{i0}$, $i = 1, 2, \dots, N$. Each player aims to find a control policy $\mu_i(t, x_{i0})$ to generate a weight trajectory $w_i(t)$. Such control policies are open-loop ones that depend on the initial condition of the individual state.

Remark 1. The game defined by (5) is a differential game over networks where the cost of each player only depends on its own state and controls. Nodes interact with their neighbors. The network topology is captured by $w_{ij}$, $i \in \mathcal{N}$, $j \in \mathcal{N}$. The time-varying property of the network is described by $w_{ij}(t)$ for $t \in [0, T]$.

Remark 2. Information structure determines the state information gained and recalled by players at time $t$. The reasons why we adopt open-loop policies are three-fold. First, the obtained open-loop policy can be implemented as a feedback policy [19] as is shown in Section III. Since the dynamics (3) are deterministic, the state at any time can be computed and used to determine the control policy. Second, to obtain strongly time-consistent optimal and individual feedback policies, we have to resort to techniques of dynamic programming. However, a direct application of dynamic programming will not yield an individual feedback policy. Also, computation of the feedback control law derived from the Hamilton–Jacobi–Bellman equation requires solving nonlinear PDEs, which increases the difficulty of distributed implementation. Third, obtaining an open-loop policy resorts to the maximum principle, which well presents the structure of the optimal solution. This helps us to analyze the inefficiency of the NE and obtain a penalty function to achieve social optimum as is shown in Section IV.

III. ANALYTIC RESULTS

The solutions to the N-person non-cooperative nonzero-sum differential game (5) played with an open-loop information structure are open-loop Nash equilibria.

Definition 1. The weight adaptation trajectories, or say the control trajectories, $\{w_i^*, i \in \mathcal{N}\}$ constitute an open-loop NE solution of the differential game (5) if the inequalities

$$
\begin{aligned}
J_1(w_1^*, w_2^*, \dots, w_N^*) &\leq J_1(w_1, w_2^*, \dots, w_N^*)\\
J_2(w_1^*, w_2^*, \dots, w_N^*) &\leq J_2(w_1^*, w_2, \dots, w_N^*)\\
&\;\;\vdots\\
J_N(w_1^*, w_2^*, \dots, w_N^*) &\leq J_N(w_1^*, w_2^*, \dots, w_N)
\end{aligned}
\tag{6}
$$

hold for all control trajectories $w_i(t) \in S_i$, $t \in [0, T]$. We denote by $x_i^*(t)$, $t \in [0, T]$, the associated state trajectory for $i \in \mathcal{N}$.

The definition states that at the open-loop NE, no agents have incentives to deviate unilaterally away from the optimal trajectory from time $0$ to time $T$.

To obtain the necessary conditions for the open-loop NE, we make two mild assumptions.

Assumption 1. For each $i \in \mathcal{N}$, the infection cost function $f_i(\cdot)$ is of class $C^1$.

Assumption 2. For each $i, j \in \mathcal{N}$, the weight deviation cost function $g_{ij}(\cdot)$ is of class $C^1$.

Each player $i \in \mathcal{N}$ can decide to receive data or packets from any other agent. The following observation narrows down the set of possible solutions of the open-loop NE.
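As a rough illustration of how the pieces of game (5) fit together, the following Python sketch integrates the mean-field dynamics (2) with a forward-Euler step and evaluates the cost (4) of one player, first when every player keeps its original weights and then after that player alone changes its weights, which is the kind of comparison appearing in (6). The linear infection cost, the quadratic deviation cost, the three-node cycle, the function names and all numerical values are assumptions chosen for illustration; none of them is taken from the paper's experiments.

```python
import numpy as np

def simulate(x0, weights, beta, sigma, T=10.0, steps=2000):
    """Forward-Euler integration of the mean-field SIS dynamics (2).

    `weights(t)` returns the N x N matrix W(t); entry (i, j) is w_ij.
    """
    dt = T / steps
    x = np.array(x0, dtype=float)
    xs = [x.copy()]
    for k in range(steps):
        W = weights(k * dt)
        # (1 - x_i) * sum_j w_ij * beta_j * x_j - sigma_i * x_i
        dx = (1.0 - x) * (W @ (beta * x)) - sigma * x
        x = np.clip(x + dt * dx, 0.0, 1.0)  # keep the iterate in [0, 1]
        xs.append(x.copy())
    return np.array(xs), dt

def cost(i, xs, weights, W_orig, dt, alpha=1.0, c=1.0):
    """Left Riemann sum for J_i in (4), under the assumed costs
    f_i(x) = alpha * x and g_ij(w) = 0.5 * c * w**2."""
    total = 0.0
    for k in range(len(xs) - 1):
        W = weights(k * dt)
        infection = alpha * xs[k, i]
        deviation = 0.5 * c * np.sum((W[i] - W_orig[i]) ** 2)
        total += (infection + deviation) * dt
    return total

# Hypothetical three-node network with w_01 = w_12 = w_20 = 1.
W_orig = np.array([[0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
beta = np.full(3, 0.8)    # assumed infection rates
sigma = np.full(3, 0.3)   # assumed recovery rates
x0 = [0.0, 0.9, 0.1]      # assumed initial infection probabilities

# Baseline profile: every player keeps its original weights.
baseline = lambda t: W_orig
xs_base, dt = simulate(x0, baseline, beta, sigma)
J_base = cost(0, xs_base, baseline, W_orig, dt)

# Unilateral deviation: only player 0 halves its out-link weights.
W_dev = W_orig.copy()
W_dev[0] *= 0.5
deviated = lambda t: W_dev
xs_dev, _ = simulate(x0, deviated, beta, sigma)
J_dev = cost(0, xs_dev, deviated, W_orig, dt)

print(f"J_0 baseline: {J_base:.4f}, J_0 after deviation: {J_dev:.4f}")
```

Comparing `J_base` with `J_dev` checks a single candidate deviation only. Verifying an open-loop NE in the sense of Definition 1 would require the inequality to hold for every admissible trajectory of every player, which is why necessary conditions are derived analytically rather than by enumeration.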


Observation 1. If $\{u^*_{ij}(t), i \in \mathcal{N}, j \in \mathcal{N}^{out}_{i,o}\}$ is an open-loop NE solution for the following differential game

$$
\begin{aligned}
\min_{u_i \in U_i} \quad & J_i = \int_0^T \Big[ f_i(x_i(t)) + \sum_{j \in \mathcal{N}^{out}_{i,o}} g_{ij}(u_{ij}(t) - w^o_{ij}) \Big] dt\\
\text{s.t.} \quad & \dot{x}_i(t) = (1 - x_i(t)) \sum_{j \in \mathcal{N}^{out}_{i,o}} u_{ij}(t)\beta_j x_j(t) - \sigma_i x_i(t),\\
& x_i(0) = x_{i0}, \quad i = 1, 2, \dots, N,
\end{aligned}
\tag{7}
$$

with $u_{ij}(t) \in [0, w^o_{ij}]$ for $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$, $t \in [0, T]$, and $\{w^*_{ij}(t), i \in \mathcal{N}, j \in \mathcal{N}\}$ is an open-loop NE solution for the differential game defined by (5), then we have $w^*_{ij}(t) = u^*_{ij}(t)$ if $j \in \mathcal{N}^{out}_{i,o}$ and $w^*_{ij}(t) = 0$ otherwise, for every player $i$ and for each $t \in [0, T]$.

Proof: See Appendix B-A.

Observation 1 simplifies the search process for the open-loop NE. Instead of analyzing problem (5), we can focus on problem (7), which contains a smaller admissible control set. Define $u_i = \{u_{ij}, j \in \mathcal{N}^{out}_{i,o}\}$, and let $u_{-i}$ denote the collection of controls of all players except player $i$, i.e., $u_{-i} = \{u_1, \dots, u_{i-1}, u_{i+1}, \dots, u_N\}$. To be specific, the admissible control set of game problem (7) for player $i$ is $U_i = \{0 \leq u_{ij}(t) \leq w^o_{ij}, j \in \mathcal{N}^{out}_{i,o}, t \in [0, T]\}$. From Theorem 5.1 of [19] and Lemma 1, the differential equation in (7) admits a unique solution if the weight adaptation control is continuous in $t$.

Next, we discuss the derivation of candidate NE solutions for differential game (7) when the information structure of the game is the open-loop pattern. Utilizing techniques in optimal control theory, we arrive at the following result.

Theorem 1. For the N-person differential game (7), suppose Assumptions 1 and 2 hold. Then, if $\{u_i^*(t), i \in \mathcal{N}\}$ is an open-loop NE solution, and $\{x^*(t), 0 \leq t \leq T\}$ is the corresponding state trajectory, there exist $N$ costate functions $p_i(\cdot) : [0, T] \to \mathbb{R}^N$, $i \in \mathcal{N}$, whose $j$-th component is denoted by $p_{ij}(\cdot)$, such that the following relations are satisfied:

$$
\dot{x}_i^*(t) = (1 - x_i^*(t)) \sum_{j \in \mathcal{N}^{out}_{i,o}} u^*_{ij}(t)\beta_j x_j^*(t) - \sigma_i x_i^*(t), \quad x_i^*(0) = x_{i0}, \ \forall i \in \mathcal{N},
\tag{8}
$$

$$
u_i^*(t) = \arg\min_{u_i \in U_i} H_i(t, p_i(t), x^*, u^*_{-i}, u_i),
\tag{9}
$$

$$
\dot{p}_i(t) = \Gamma_i(t, x^*, u_1^*, \dots, u_N^*)\, p_i(t) + \gamma_i(t), \quad p_i(T) = 0,
\tag{10}
$$

where

$$
H_i(t, p_i, x, u_1, \dots, u_N) \triangleq f_i(x_i(t)) + \sum_{j \in \mathcal{N}^{out}_{i,o}} g_{ij}(u_{ij} - w^o_{ij}) + \sum_{j=1}^{N} p_{ij} \Big[ (1 - x_j) \sum_{k \in \mathcal{N}^{out}_{j,o}} u_{jk}\beta_k x_k(t) - \sigma_j x_j(t) \Big],
\tag{11}
$$

and $\Gamma_i$ is a matrix given by

$$
\Gamma_{i,mn} =
\begin{cases}
\sum_{j \in \mathcal{N}^{out}_{m,o}} u^*_{mj}(t)\beta_j x_j^*(t) + \sigma_m, & \text{if } n = m,\\
-(1 - x_n^*(t))\, u^*_{nm}(t)\beta_m, & \text{if } n \in \mathcal{N}^{in}_{m,o},\\
0, & \text{otherwise},
\end{cases}
\tag{12}
$$

and $\gamma_i$ is a vector whose $i$-th component is $-df_i/dx_i$ and whose other components are zero, for $i \in \mathcal{N}$.

Proof: See Appendix B-B.

Note that $\Gamma_i$ turns out to be the same for different $i$. In later discussion, we shall omit the index $i$. Now, the dynamics of the costate function can be given as $\dot{p}_i(t) = \Gamma(t)p_i(t) + \gamma_i(t)$ for $i \in \mathcal{N}$, which sheds some light on the design for achieving social welfare in the following section. $\Gamma(t)$ is an L-matrix [20] for every $t \in [0, T]$, where the diagonal entries of $\Gamma(t)$ are positive and all off-diagonal entries are non-positive. Therefore $\Gamma(t)$ is structurally in line with the graph Laplacian whose diagonal entries are the out-degrees of the $N$ agents [21]. That is, for every zero or negative entry of the matrix $\Gamma(t)$, the corresponding entry of the graph Laplacian is zero or negative respectively, and vice versa. If the original graph is a directed acyclic graph, $\Gamma(t)$ is a lower triangular matrix under a proper permutation of the indices. Other than the topology information, $\Gamma(t)$ also contains the infection information. Note that even though we write the dynamics of the costates in an affine form, it is actually not affine owing to the fact that $\Gamma(t)$ depends on $x^*(t)$ and $u^*_{ij}$, $i \in \mathcal{N}$, $j \in \mathcal{N}$, as we can see from (12), and $u^*_{ij}$ is dependent on $p_{ii}$ as we will show next in Theorem 2.

Theorem 2. Define $\phi_{ij}(t) := p_{ii}(t)(1 - x_i^*(t))\beta_j x_j^*(t)$ where $p_{ii}(\cdot)$ is the $i$-th component of the costate function $p_i(\cdot)$. The basic structure of the NE-based optimal weight control, i.e., the solution to (9), can be written as:

$$
u^*_{ij}(t) =
\begin{cases}
0, & -\phi_{ij}(t) \leq g'_{ij}(-w^o_{ij}),\\
w^o_{ij} + (g'_{ij})^{-1}(-\phi_{ij}(t)), & g'_{ij}(-w^o_{ij}) < -\phi_{ij}(t) < g'_{ij}(0),\\
w^o_{ij}, & -\phi_{ij}(t) \geq g'_{ij}(0),
\end{cases}
\tag{13}
$$

for $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$.

Proof: See Appendix B-C.

Condition (9) in Theorem 1 can thus be replaced by (13). Theorem 1 together with (13) provides a weight adaptation scheme where each agent adapts its weight to minimize the possibility of being infected and the loss of efficiency/interest. The weight of the edge from player $i$ to player $j$, controlled by player $i$, is based on the costate component $p_{ii}(t)$, player $i$'s own infection $x_i(t)$ and its out-neighbors' infection. Apparently, the higher the infection level of agent $j$ is, the lower the weight of edge $(i, j)$ should be. As is shown in (10) and (12), $p_{ii}(t)$ is highly coupled and it contains information about the effect of the whole network.

Remark 3. Based on the structure of the optimal control (13), the dynamics of costates (10) as well as Lemma 1, we can infer that the NE-based optimal control trajectory $u^*_{ij}(t)$ is


continuous for every $i \in \mathcal{N}$, $j \in \mathcal{N}$, which means that there is no switching in the optimal weight adaptation.

Remark 4. From (13), we know the weight between agents $i$ and $j$ may be adapted to zero at certain times, as one can see that $u^*_{ij}(t) = 0$ if $-\phi_{ij}(t) \leq g'_{ij}(-w^o_{ij})$. That means the connection between agents $i$ and $j$ may be disconnected temporarily, and it will be restored according to Theorem 3. For agent $i$, if all its out-links have weight zero, i.e., $-\phi_{ij}(t) \leq g'_{ij}(-w^o_{ij})$ for all $j \in \mathcal{N}^{out}_{i,o}$, we can view this agent as being quarantined from infection. We say being quarantined from infection because there might still be in-links connecting to agent $i$, which means that here the concept of being quarantined is different from the concept in an undirected graph. Besides, the weight adaptation scheme we have proposed is different from quarantining in the sense that the weight adaptation scheme does not need to completely disconnect nodes from all others but rather adjusts weights to connect more loosely with particular nodes with a higher likelihood of infection.

Corollary 1. If $g_{ij}(\cdot)$ is concave, i.e., the marginal cost of deviation decreases as the adapted weight becomes farther away from the optimal weight, the optimal control policy can be given as follows:

$$
u^*_{ij}(t) =
\begin{cases}
0, & \phi_{ij}(t) \geq \dfrac{g_{ij}(-w^o_{ij})}{w^o_{ij}},\\[2mm]
w^o_{ij}, & \phi_{ij}(t) < \dfrac{g_{ij}(-w^o_{ij})}{w^o_{ij}},
\end{cases}
\tag{14}
$$

for $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$, where $\phi_{ij}(t)$ is defined in Theorem 2.

We can see that if $g_{ij}(\cdot)$ is concave, the optimal control policy switches between $0$ and $w^o_{ij}$. In this paper, we focus our study on the case when $g_{ij}(\cdot)$ is convex.

Before stepping into the numerical computation of the open-loop NE candidates, we go into further analysis and obtain other structural results that would be beneficial for an insightful understanding of the weight adaptation mechanism.

Theorem 3. The costate function and the open-loop control

always on the way to meet the minimum cost. Also, from Theorem 3 (iii), we know for agent $i$ who has no in-neighbors, its out-link $u^*_{ij}(t)$ will never be $0$ if $\alpha_i\beta_j/\sigma_i \leq -g'_{ij}(-w^o_{ij})$. This can be readily shown by $\phi_{ij} = p_{ii}(1 - x_i^*(t))\beta_j x_j^*(t) \leq (\alpha_i/\sigma_i)(1 - x_i^*)\beta_j x_j^*(t) < \alpha_i\beta_j/\sigma_i \leq -g'_{ij}(-w^o_{ij})$.

IV. INEFFICIENCY OF NASH EQUILIBRIUM

It is well known that the non-cooperative NE in nonzero-sum games is generally inefficient [22]. There is a need to develop a mechanism to attain a higher social welfare or lower aggregate costs through cooperative behavior [23]. The notion of the price of anarchy has been introduced in [24] to quantify the inefficiency. In the network, the social cost is the aggregate cost of all players. Let $u = \{u_1, \dots, u_N\}$, where $u_i(t) \in \mathbb{R}^{|\mathcal{N}^{out}_{i,o}|}$, be the weight control variable for the whole network with admissible set $U_o = \{u : u_{ij}(t) \in [0, w^o_{ij}], \forall i \in \mathcal{N}, j \in \mathcal{N}^{out}_{i,o}, t \in [0, T]\}$. Denote by $u^o = \{u^o_1, \dots, u^o_N\}$ the social optimal solution. The social optimum can be attained by solving the optimal control problem:

$$
\begin{aligned}
\min_{u \in U_o} \quad & J_o = \int_0^T \Big[ \sum_{i=1}^{N} f_i(x_i(t)) + \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^{out}_{i,o}} g_{ij}(u_{ij}(t) - w^o_{ij}) \Big] dt\\
\text{s.t.} \quad & \dot{x}_i(t) = (1 - x_i(t)) \sum_{j \in \mathcal{N}^{out}_{i,o}} u_{ij}(t)\beta_j x_j(t) - \sigma_i x_i(t),\\
& x_i(0) = x_{i0}, \quad i = 1, 2, \dots, N,
\end{aligned}
\tag{15}
$$

where $J_o : \mathbb{R}^{|\mathcal{N}^{out}_{1,o}|} \times \cdots \times \mathbb{R}^{|\mathcal{N}^{out}_{N,o}|} \to \mathbb{R}$. The application of the maximum principle gives the following: the optimal control $u^o(t)$ and corresponding trajectory $x^o(t)$ must satisfy the following so-called canonical equations:

$$
\dot{x}^o_i(t) = (1 - x^o_i(t)) \sum_{j \in \mathcal{N}^{out}_{i,o}} u^o_{ij}(t)\beta_j x^o_j(t) - \sigma_i x^o_i(t), \quad x^o_i(0) = x_{i0},
\tag{16}
$$

$$
\dot{\lambda}(t) = \Gamma(t, x^o, u^o_1, \dots, u^o_N)\lambda(t) + \gamma, \quad \lambda(T) = 0,
\tag{17}
$$
u (t) = arg min H(t, xo (t), λ(t), u(t)),
o
(18)
trajectories have the following properties: u∈Uo
(i) Along the open-loop NE trajectory, pi j (t) ≥ 0 holds for for all i ∈ N , where Γ(t) is the same with the one given in
all i, j ∈ N, j , i and all t ∈ [0, T]. Furthermore, pii (t) (12) for the dynamics of the costate in the differential game
stays positive for all i ∈ N and all t ∈ [0, T). problem and γ(t) = [− f10(x1 (t)), − f20(x2 (t)), ..., − f N0 (x N )]0, the
(ii) The open-loop NE control trajectory ui∗j (t), for i ∈ N, j ∈ Hamiltonian of the optimal control problem is defined as
Ni,oout , satisfies u∗ (t) = w o at and only at t = T and for
ij ij H(t, x(t), λ(t), u(t))
t < T, ui∗j (t) < wioj .
in | = 0, i.e., the in-degree of player i is zero in N N
(iii) If |Ni,o Õ Õ Õ
= fi (xi (t))+ gi j (ui j (t) − wioj )
the original graph, under linear infection cost functions
i=1 i=1 j ∈Ni,ou t
(19)
fi (xi (t)) = αi xi (t), the components pii (t) are bounded o

above by αi /σi . That is, pii (t) ≤ αi /σi for t ∈ [0, T]. N
Õ 
 Õ 

+ λi (t) (1 − xi (t)) ui j (t)β j x j (t) − σi xi (t) ,

 

(iv) If |Ni,o out | = 0, i.e., the out-degree of player i is zero in
 ou t

the original graph, under linear infection cost function i=1  j ∈Ni, o

 
where fi (xi (t)) = αi xi (t), the costate component pii (t) is
and λ(·) : [0, T] → R N is the costate function, λi is its ith
strictly monotonically decreasing over t.
component. The Hamiltonian of the optimal control problem
Proof: See Appendix B-D. (19) is different from the individual Hamiltonian defined in
Theorem 3 indicates that during the time interval [0, T), the (11). But they are related. The Hamiltonian of the optimal
agents, with an incentive to lower their own costs, adapt their control problem includes the cost of all agents over the
weight accordingly to impede the spreading of virus. After network instead of just individual’s cost. Also, the costate λ(t)
the prescribed alert duration [0, T], a recovery of topology is corresponds to the state of all agents x(t). The counterpart of


$\lambda_j(t)$, $j \in \mathcal{N}$, in the individual Hamiltonian defined for the game problem is $p_{ij}(t)$, $j \in \mathcal{N}$. Due to the similar structure of the Hamiltonian of the optimal control problem and the individual Hamiltonians for the game problem, after applying the maximum principle, we obtain (16)-(18), which have the same structure as (8)-(10).

[Figure 3: directed graph on vertices 1, 2, 3, 4, 5.]
Fig. 3. A directed graph with 5 vertices to illustrate the equilibrium solution of a 5-person differential game over the network.

An optimal point can in principle be computed centrally by the network operator to achieve the social optimum. However, this requires the network operator to be omniscient, and not all agents have incentives to adapt their connection weights based on a rule designed to minimize the aggregate costs. Also, for a large-scale network/system, a centralized solution gives rise to computational problems and implementability issues. So, a centralized optimal control solution is impractical. To achieve the social optimum in a distributed way, a mechanism needs to be designed on behalf of the network operator.

The strategy for the network operator is to set penalties $c_i := (c_i(t), t \in [0, T])$ so that the cost for player $i$ at time $t$ is $\hat{l}_i(t, x, u) \triangleq f_i(x_i(t)) + \sum_{j \in \mathcal{N}^{out}_{i,o}} g_{ij}(u_{ij}(t) - w^o_{ij}) + c_i(t)$. To set a proper penalty for each player, we utilize the theory of potential differential games [25], which is an extension of the potential game concept for static games [26]. The following is the definition of potential differential games.

Definition 2. A differential game with cost $\hat{J}_i = \int_0^T \hat{l}_i(t, x, u)\,dt$ and dynamics defined by (2) is a potential differential game if there exists a function $\pi : X \times U_1 \times \cdots \times U_N \times [0, T] \to \mathbb{R}$ that satisfies the following condition for every player $i \in \mathcal{N}$:

$$
\begin{aligned}
& \int_0^T \hat{l}_i(t, x_i, x_{-i}, u_i, u_{-i}) - \hat{l}_i(t, \hat{x}_i, \hat{x}_{-i}, v_i, u_{-i}) \, dt \\
& \quad = \int_0^T \pi(t, x_i, x_{-i}, u_i, u_{-i}) - \pi(t, \hat{x}_i, \hat{x}_{-i}, v_i, u_{-i}) \, dt,
\end{aligned}
\tag{20}
$$

for all $u_i, v_i \in U_i$, where $x, \hat{x} \in X$ are the corresponding states under controls $\{u_i, u_{-i}\}$ and $\{v_i, u_{-i}\}$, respectively.

If we can find $c_i(t)$ for every $i \in \mathcal{N}$ such that relation (20) holds for $\pi = \sum_{i=1}^{N} f_i(x_i(t)) + \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^{out}_{i,o}} g_{ij}(u_{ij}(t) - w^o_{ij})$, then the differential game is a potential game whose open-loop NE solution is equivalent to the open-loop solution of the optimal control problem defined by (15). The following theorem gives important insights about choosing proper penalties $c_i(t)$.

Theorem 4. Consider a differential game with penalties where the cost of player $i$ is given by

$$
\hat{J}_i = \int_0^T \hat{l}_i(t)\,dt = J_i + \int_0^T c_i(t)\,dt, \tag{21}
$$

which is obtained by adding a penalty term to $J_i$ in (7), and the constraint dynamics are in accordance with the differential game defined by (7). Let $c_i(t) = \sum_{j \ne i} f_j(x_j(t))$. Then, the differential game with (21) is a potential differential game corresponding to the optimal control problem defined by (15), and if $\{u^*_i(t), i \in \mathcal{N}\}$ is an open-loop NE solution for the new differential game with cost (21), and $\{x^*(t), 0 \le t \le T\}$ is the corresponding state trajectory, the relations (16), (17) and (18) hold for $u^*$ and $x^*$ with $u^o$ replaced by $\{u^*_i(t), i \in \mathcal{N}\}$ and $x^o$ replaced by $x^*$.

Proof: See Appendix B-E.

Theorem 4 indicates that with a proper choice of the penalties $c_i(t)$, $i \in \mathcal{N}$, the necessary conditions for the open-loop NE solution of the differential game are aligned with the necessary conditions for the optimal solution of the social cost optimal control problem. The counterpart of condition (10) for the penalty-based differential game can be written as

$$
\dot{p}_i(t) = \Gamma(t) p_i(t) + \hat{\gamma}_i(t), \tag{22}
$$

where the $j$th component of $\hat{\gamma}_i(t)$ is $\partial(f_i + c_i)/\partial x_j$. In implementation, the system operator sends the penalty function to each agent. If the open-loop NE and the social optimal solution uniquely exist, the open-loop NE achieves the social optimum.

Definition 3. A directed graph is strongly connected if it contains a directed path from $j$ to $i$ for every pair of vertices $\{(i, j) : i \in \mathcal{N}, j \in \mathcal{N}, i \ne j\}$.

In a directed graph, a directed path is a sequence of edges which join a sequence of vertices, with the restriction that the edges all be directed in the same direction. What we have developed so far in this section is for cases where the original weighted network is strongly connected. If the original network is not strongly connected, we have the following.

Definition 4. Given a graph, if there exists a directed path from vertex $j$ to vertex $i$, we say $i$ is reachable from $j$. Denote by $\mathcal{R}_i \subseteq \mathcal{N}$ the set of vertices from which $i$ is reachable.

In graph theory, a single vertex is defined to connect to itself by a trivial path, so we have $i \in \mathcal{R}_i$. If the graph is strongly connected, $\mathcal{R}_i = \mathcal{N}$ for every $i \in \mathcal{N}$. Otherwise, for some $i \in \mathcal{N}$, we have $\mathcal{R}_i \subset \mathcal{N}$. Denote by $\mathcal{R}_{i,o}$ the counterpart of $\mathcal{R}_i$ for the original graph defined by $(V, E, W^o)$.

Corollary 2. Consider the differential game with cost functions defined in (21) and dynamics given by (7) as in Theorem 4. Let $c_i(t) = \sum_{j \in \mathcal{R}_{i,o} \setminus \{i\}} f_j(x_j)$. Then, if $\{u^*_i(t), i \in \mathcal{N}\}$ is an open-loop NE solution for the new differential game, and $\{x^*(t), 0 \le t \le T\}$ is the corresponding state trajectory, the relations (16), (17), and (18) hold for $u^*$ and $x^*$ with $u^o$ replaced by $\{u^*_i(t), i \in \mathcal{N}\}$ and $x^o$ replaced by $x^*$.

Proof. The proof of Corollary 2 simply follows from the proof of Theorem 4. ∎

To illustrate Corollary 2, we present an example in Appendix C.

V. ALGORITHMS AND CASE STUDIES

In this section, we provide the set-up information for the case studies. Besides, based on the equilibrium analysis


and the optimal control analysis, an algorithm is proposed to compute the optimal weight adaptation trajectory for the system operator and the agents.

A. Preliminaries

In the simulation, the infection cost function $f_i(\cdot)$ is taken to be linear in $x_i(t)$, i.e., $f_i(x_i(t)) = \alpha_i x_i(t)$. Here, we set $\alpha_i = \alpha$, $\forall i \in \mathcal{N}$. The weight adaptation cost is taken to be quadratic, where $g_{ij}(u_{ij} - w^o_{ij}) = (1/2) d_{ij} (u_{ij} - w^o_{ij})^2$ for all $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$. Unless otherwise stated, let $\alpha_i = 1$ and $d_{ij} = d = 0.2$ for all $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$. Note that under this setting, Assumptions 1 and 2 hold and $g_{ij}(\cdot)$ is even and convex.

The original network in the simulation is a bi-directional scale-free network with 150 agents generated based on the Barabási-Albert model [27], [28]. We select this model since many kinds of computer networks, including the internet and the web graph of the World Wide Web, have scale-free properties. We generate the network by following the growth and preferential attachment properties given in Section VII of [27]. For simplicity, the original weight is set to be $w^o_{ij} = 1$ for every edge $(i, j)$. Let $\langle k_{in} \rangle$ ($\langle k_{out} \rangle$) be the average in-degree (out-degree) of the network we generated. We have $\langle k_{in} \rangle = \langle k_{out} \rangle = 7.545$.

For simplicity, we take the same infection rates and curing rates for all players. Unless otherwise stated, let $\beta_i = \beta = 0.04$ and $\sigma_i = \sigma = 0.1$. From the result in [30] and the fact that the largest real part of the eigenvalues of the matrix $(W^o B - D)$ is 0.5368, we conclude that the virus epidemic will break out in the original network. The initial infection level is also set to be the same for all players, $x_{i0} = 0.16$ for all $i \in \mathcal{N}$. Table I is a summary of the setups.

B. Computational Algorithm

Note that we aim to propose an implementable distributed virus resistance (DVR) algorithm. Based on the algorithm proposed in [29] for computation of open-loop NE for nonzero-sum differential games, we present, in Algorithm 1, the DVR algorithm to compute the candidate open-loop NE solutions for the differential game described by (7) and the penalty-based differential game defined in Theorem 4. The solution of the penalty-based differential game is in line with the solution of the optimal control problem defined in (15).

Algorithm 1 DVR
Input: $x_{i0}$, $i \in \mathcal{N}$; $\beta_i$, $\sigma_i$, $i \in \mathcal{N}$; $f_i(\cdot)$, $g_{ij}(\cdot)$, $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$; $W^o$; $\epsilon$.
Step 1: Each $i \in \mathcal{N}$ selects $u_i(t) \in U_i$ arbitrarily.
Step 2: Forward part: Obtain $x_i(t)$ from $x_{i0}$ and (8) for each $i \in \mathcal{N}$.
if solving social problem (15) then
    Step 3: Backward part: Obtain $\lambda(t)$ from $\dot{p}_i(t) = \Gamma(t) p_i(t) + \hat{\gamma}_i(t)$ and $p(T) = 0$.
else
    Step 3': Backward part: Obtain $p_i(t)$, $i \in \mathcal{N}$, from $\dot{p}_i(t) = \Gamma(t) p_i(t) + \gamma_i(t)$ and $p_i(T) = 0$.
end if
Step 4: Control update: Update control $\hat{u}_i(t)$ based on (13).
Step 5: Control policy check:
if $\| \hat{u} - u \|_\infty \ge \epsilon$ then
    Set $u = \hat{u}$. Return to Step 2.
else
    Set $u^*_i = \hat{u}_i$. $u^*_i$ is the optimal weight adaptation scheme for agent $i$.
end if

Initially, the input data includes the initial infection data $x_{i0}$ for all $i \in \mathcal{N}$; the infection rate $\beta_i$ and recovery rate $\sigma_i$ for all $i \in \mathcal{N}$; the original topology $W^o$; the cost functions $f_i(\cdot)$, $g_{ij}(\cdot)$ for all $i \in \mathcal{N}$, $j \in \mathcal{N}^{out}_{i,o}$; and a stopping value $\epsilon$ to stop the algorithm. In the first step, each player arbitrarily selects a continuous control trajectory within the admissible control set $U_i$ for every out-link it has, $u_{ij}(t)$ for each $i \in \mathcal{N}$ and all $j \in \mathcal{N}^{out}_{i,o}$, and reports the weight adaptation scheme $u_i(t)$ to the network operator. In Step 2, each player utilizes the initial infection data $x_{i0}$, $i \in \mathcal{N}$, and the control policy $u_i$, $i \in \mathcal{N}$, solves (8) forward in time to obtain $x_i$, $i \in \mathcal{N}$, and reports it to the network operator. If the system aims to solve the social optimal control problem, then the algorithm goes into Step 3. Otherwise, the algorithm goes into Step 3'. In Step 3, the system operator utilizes the reported $u_i$ and $x_i$, $i \in \mathcal{N}$, and the infection damage cost $f_i(\cdot)$, $i \in \mathcal{N}$, to compute $p(t)$ backward based on (22) and sends $p_i(t)$ back to the corresponding player $i$. In Step 3', the system operator utilizes the reported $u_i$ and $x_i$, $i \in \mathcal{N}$, and the infection damage cost $f_i(\cdot)$, $i \in \mathcal{N}$, computes $p_i(t)$, $i \in \mathcal{N}$, backward based on (10) and sends $p_i(t)$ back to the corresponding player $i$. In the next step, each player updates its control based on (13), which only requires its out-neighbors' infection information, and reports the updated control policy to the network operator. Denote by $\hat{u}_i(t)$ the updated control policy. If $\| \hat{u} - u \|_\infty \ge \epsilon$, the algorithm goes back to Step 2. Otherwise, the latest updated policy $\hat{u}_i$ is the optimal control policy for agent $i$.

C. Numerical Results

In this subsection, we present the numerical results. First, we show the dynamics of the costate function for all players. Then, we show the evolution of the weight adaptation, the infection and the costate of selected agents to see individuals' behaviors. Second, we compare the optimal control-based weight adaptation scheme (this scheme is equivalent to the penalized differential game-based weight adaptation scheme) and the differential game-based weight adaptation scheme. The optimal control-based adaptation scheme is obtained by solving the optimal control problem (15). The two schemes, together with the case of no weight adaptation, are compared in terms of the total cost and the infection level of the whole network.

From (13) and $\phi_{ij}(t) := p_{ii}(t)(1 - x^*_i(t))\beta_j x^*_j(t)$, we know that the weight adaptation of player $i$ is based on its own infection, its out-neighbors' infection, and the costate component $p_{ii}$. The infection of player $i$ and its neighbors is just local information. From (10) and (12), we can see that the effect of the whole network's situation is conveyed by the costate component $p_{ii}$ to the weight adaptation strategy of player $i$. Thus, we investigate the dynamics of $p_{ii}$ in Fig. 4, where the dynamics of the costate component $p_{ii}$ are plotted for all agents. As we can see, $p_{ii}(t)$ is positive for all $i \in \mathcal{N}$ during the whole time interval, which corroborates Theorem 3. For most of the players, the value of $p_{ii}$ is high at the very beginning and then decreases to 0. One interpretation is that players are more


sensitive at the beginning to their out-neighbors' infection and tend to reduce their weights more heavily.

TABLE I
PARAMETERS USED FOR NUMERICAL STUDY UNLESS OTHERWISE STATED.

Group               Parameter                          Value
Cost functions      $f_i(x_i)$                         $\alpha_i x_i$
Cost functions      $g_{ij}(u_{ij} - w^o_{ij})$        $\frac{1}{2} d_{ij}(u_{ij} - w^o_{ij})^2$
Cost functions      $\alpha_i$                         1
Cost functions      $d_{ij}$                           0.2
Network topology    $N$                                150
Network topology    $w^o_{ij}$                         1
Network topology    $\langle k_{out} \rangle$          7.545
Network topology    $\langle k_{in} \rangle$           7.545
Spreading           $\beta_i$                          0.04
Spreading           $\sigma_i$                         0.1
Spreading           $T$                                20

[Figure 4: $p_{ii}(t)$ versus $t \in [0, 20]$ for all players.]
Fig. 4. The dynamics of costate $p_{ii}(t)$ for all players $i \in \mathcal{N}$.

[Figure 5: three panels versus $t \in [0, 20]$ with curves $u_{1,150}$, $u_{150,1}$, $x_1$, $x_{150}$ (first panel), $p_{1,1}$, $p_{150,150}$ (second panel), and $\bar{u}_1$, $\bar{u}_{50}$, $\bar{u}_{150}$ (third panel).]
Fig. 5. The weight adaptation, the infection and the costate of agents 1, 50 and 150 over time. The first plot shows the weight adaptation of edge (1, 150) and edge (150, 1) and the infection of agents 1 and 150. The second plot shows the costate dynamics of the two agents. The third gives the average weight adaptation of agents 1, 50 and 150, where $\bar{u}_i = \sum_{j \in \mathcal{N}^{out}_{i,o}} u_{ij} / |\mathcal{N}^{out}_{i,o}|$.

To see individual behaviors and states, we rank the agents based on their out-degrees. Agent 1 has the largest out-degree. From the first plot of Fig. 5, agent 1 is more likely to be infected due to its large degree. We can see that all weights equal 1 at and only at $t = T = 20$, which corroborates Theorem 3. The weight $u_{150,1}(t)$ is reduced to 0 for some time. This phenomenon occurs because the costate and its out-neighbors' infection levels are high during that time period. The third plot shows that agents with higher out-degrees reduce less weight. Usually, one needs to cut more weight on highly connected nodes to slow the infection propagation. However, the weight adaptation scheme obtained in this paper is a result of considering both the infection and the loss of efficiency of the network agents. There is a trade-off between maintaining the network's performance and lowering the infection. So, the agents with higher out-degrees may reduce less weight to maintain the performance/function.

[Figure 6: three panels versus $t \in [0, 20]$ with curves $u_{1,2}$, $u_{1,83}$, $u_{1,150}$ (first panel), $u_{83,1}$, $u_{83,60}$, $u_{83,93}$ (second panel), and $x_1$, $x_2$, $x_{60}$, $x_{83}$, $x_{93}$, $x_{150}$ (third panel).]
Fig. 6. The weight adaptation of links between a fixed agent and its neighbors. The weight adaptation of links connecting from agent 1 to its out-neighbors, agents 2, 83 and 150, is plotted in the first figure. The second figure shows the weight adaptation of links connecting from agent 83 to its out-neighbors, agents 1, 60 and 93. The infection level of all agents involved is plotted in the last figure.

To show that agents adapt heterogeneously to different neighbors, we present Fig. 6. We can see from Fig. 6 that agent 1 adapts its weights with its out-neighbors based on the evolution of their infection levels. Agent 1 cuts more weight on neighbors with higher infection levels. For example, the infection level of agent 2 is higher than that of agent 150 all the time. Thus, weight $u_{1,2}$ is lower than $u_{1,150}$. Also, agent 83 reduces its weight on agent 1 to zero due to the latter's high infection level, while its weight on agent 93 remains above 0.5.

[Figure 7: total cost $J^o$ versus $\alpha \in [0, 2]$ with curves for differential game-based adaptation, no weight adaptation, and optimal control-based adaptation.]
Fig. 7. The total cost $J^o$ versus the infection cost parameter $\alpha$ under the differential game-based weight adaptation scheme, the optimal control-based weight adaptation scheme, and the scheme without weight adaptation.

Here, we compare the NE-based weight adaptation scheme, the optimal control-based weight adaptation scheme, and the scheme without weight adaptation. In Fig. 7, we plot the total cost


$J^o$ under the three schemes for different $\alpha$. We observe that the scheme without adaptation incurs the highest total cost. For different values of $\alpha$, the NE-based scheme always incurs a higher cost than the optimal control-based scheme, which indicates the inefficiency of the NE solution. From the plot, we see that a higher $\alpha$ causes more inefficiency.

[Figure 8: four panels ($\alpha = 0.1$, $\alpha = 0.5$, $\alpha = 0.8$, $\alpha = 1.2$) of infection level versus $t \in [0, 20]$ with curves for no adaptation, optimal control-based adaptation, and game-based adaptation.]
Fig. 8. The dynamics of the whole network's infection under the differential game-based weight adaptation scheme, the optimal control-based weight adaptation scheme, and the scheme without weight adaptation.

Fig. 8 is presented to show the virus-resistance of the proposed schemes. The black line shows the infection level for the case without adaptation, the blue line shows the case with the game-based scheme, and the green line shows the case with the optimal control-based scheme. Even though the game-based scheme is inefficient in terms of minimizing the total cost, it outperforms the optimal control-based scheme in terms of infection, since the infection level under the game-based scheme is always lower than the infection level under the optimal control-based scheme. In either case, the scheme we have proposed proves to be virus-resistant and generates a lower total cost than the scheme without adaptation.

VI. CONCLUSION AND FUTURE WORK

In this paper, we have established a differential game framework to develop decentralized virus-resistant mechanisms over complex networks. We have shown that weight adaptation policies allow nodes to change weights to mitigate their infection. The differential game approach has captured the strategic and dynamic behaviors of a large number of self-interested agents over time-varying networks. Each player adapts its weight based on its own infection and its out-neighbors' infection. It has been observed that higher levels of its out-neighbors' infection lead to lower weights. The effect of non-local behaviors on the adaptation strategy has been encoded in the costate function. We have discussed the inefficiency of the open-loop Nash equilibrium and have proposed a penalty-based mechanism to achieve efficiency by imposing local costs induced by reachable nodes. The differential game framework has enabled the design and implementation of a distributed algorithm over large-scale networks to control the macroscopic behaviors of the virus spreading over networks. Numerical examples have been used to illustrate the virus-resistance of the proposed scheme and the inefficiency of the Nash equilibrium. The differential game approach achieves a better performance than its centralized counterpart in terms of the mitigation of virus spreading. One future direction for this work would be to study the steady-state behavior of a long-term virus-resistance scheme where the duration of virus spreading is sufficiently long.

APPENDIX A
LEMMAS

Lemma 1. The dynamics equation $G(x(t), W(t))$ is uniformly Lipschitz in $x$ and $w_i$ for each $i \in \mathcal{N}$.

Proof. From (3), we have

$$
\begin{aligned}
& \| G(x(t), W(t)) - G(\hat{x}(t), W(t)) \|_\infty \\
& \quad = \| (W(t)B - D)x(t) - (W(t)B - D)\hat{x}(t) - X_d(t)W(t)Bx(t) + \hat{X}_d(t)W(t)B\hat{x}(t) \|_\infty \\
& \quad \le \| (I_N - X_d(t))W(t)B(x(t) - \hat{x}(t)) \|_\infty + \| D(x(t) - \hat{x}(t)) \|_\infty + \| (X_d(t) - \hat{X}_d(t))W(t)B\hat{x}(t) \|_\infty \\
& \quad \le 2\beta_M \| W(t) \|_\infty \| x(t) - \hat{x}(t) \|_\infty + \delta_M \| x(t) - \hat{x}(t) \|_\infty \\
& \quad \le \max\{2\beta_M, \delta_M\} \| x(t) - \hat{x}(t) \|_\infty (1 + \| W(t) \|_\infty),
\end{aligned}
$$

where $\beta_M \triangleq \max_i \beta_i$, $\delta_M \triangleq \max_i \delta_i$, and $I_N \triangleq \mathrm{diag}(1, 1, \cdots, 1)$. Since $w_{ij}$ is bounded by $\bar{w}_{ij}$, $(1 + \| W(t) \|_\infty)$ is bounded. So, $G$ is uniformly Lipschitz in $x$. The proof that $G$ is uniformly Lipschitz in $w_i$ for all $i \in \mathcal{N}$ can be obtained by following similar steps. ∎

Lemma 2. Let $x_i$, $i \in \mathcal{N}$, be the corresponding solution to the ODEs in (2). For all $i \in \mathcal{N}$, given $0 < x_i(0) \le 1$, $x_i(t) \in (0, 1)$ holds for all $t \in (0, \infty)$.

Proof. The proof follows Lemma 1 of [14]. It is clear that $x_i(t)$ is a continuous function of time. When $x_i(0) = 1$, then from (2), we have $\dot{x}_i(0) < 0$, which means once $x_i(t)$ reaches 1, it cannot stay there. On the other hand, when $x_i(0) \in (0, 1)$, the solution $x_i(t)$ always lies in $(0, 1)$. Otherwise, suppose that there exists $t_1$ such that $x_i(t_1) = 0$ or $x_i(t_1) = 1$. In the first case, note that $\dot{x}_i(t) \ge -\sigma_i x_i(t)$ holds over the time interval $(0, t_1]$, which gives $x_i(t_1) \ge x_i(0)e^{-\sigma_i t_1} > 0$. This yields a contradiction. In the second case where $x_i(t_1) = 1$, we have $x_i(t) < 1$ over the time interval $[0, t_1)$. So, we have $\dot{x}_i(t_1^-) \ge 0$, which contradicts the fact obtained from (2) that $\dot{x}_i(t_1) < 0$. ∎

APPENDIX B
PROOF

A. Proof of Observation 1

Proof. Given the original weight pattern $W^o$, suppose that $w^o_{ij} = 0$, i.e., $j \notin \mathcal{N}^{out}_{i,o}$, and $w^*_{ij}(t) > 0$. Obviously, player $i$ can lower his own cost by deviating $w_{ij}(t)$ from $w^*_{ij}(t)$ to 0, which contradicts the fact that $w^*_{ij}(t)$ is the open-loop NE. A similar statement can be made for the case when $w^*_{ij} > w^o_{ij}$. ∎
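As a numerical companion to Lemma 2, Observation 1 and Algorithm 1, the following is a minimal sketch of the forward-backward sweep for the game problem (Steps 1, 2, 3', 4 and 5), not the authors' implementation. It rests on several assumptions that are not in the paper: a hypothetical 3-node graph `w0`, explicit Euler time stepping on a uniform grid, the function names `forward`, `backward`, `update` and `dvr`, and a cap `max_sweeps` on the number of iterations. The costate equations are taken from (23), and the control update uses the stationarity condition from the proof of Theorem 2 specialized to the quadratic cost of Section V-A, which gives $u_{ij} = w^o_{ij} - \phi_{ij}/d$ projected onto $[0, w^o_{ij}]$. Convergence of the plain sweep is not guaranteed, so the sketch returns the last update gap rather than assuming it is small.

```python
# Hypothetical 3-node example; graph and grid sizes are illustrative assumptions.
N = 3
T, K = 20.0, 200                      # horizon and number of Euler steps
dt = T / K
beta, sigma, alpha, d = 0.04, 0.1, 1.0, 0.2   # homogeneous rates as in Section V-A
# w0[i][j] > 0 means j is an out-neighbor of i (j contributes to i's infection).
w0 = [[0, 1, 1], [1, 0, 0], [0, 1, 0]]
x0 = [0.16] * N


def forward(u):
    """Step 2: integrate the SIS dynamics forward with explicit Euler."""
    x = [x0[:]]
    for k in range(K):
        cur = x[-1]
        nxt = []
        for i in range(N):
            s = sum(u[k][i][j] * beta * cur[j] for j in range(N))
            nxt.append(cur[i] + dt * ((1 - cur[i]) * s - sigma * cur[i]))
        x.append(nxt)
    return x                          # x[k][i], k = 0..K


def backward(x, u):
    """Step 3': integrate each costate p_i backward from p_i(T) = 0 via (23)."""
    p = [[[0.0] * N for _ in range(N)] for _ in range(K + 1)]   # p[k][i][j]
    for k in range(K, 0, -1):
        for i in range(N):
            for j in range(N):
                a_j = sum(u[k][j][m] * beta * x[k][m] for m in range(N)) + sigma
                inflow = sum(p[k][i][m] * (1 - x[k][m]) * u[k][m][j] * beta
                             for m in range(N))
                dp = a_j * p[k][i][j] - inflow - (alpha if i == j else 0.0)
                p[k - 1][i][j] = p[k][i][j] - dt * dp
    return p


def update(x, p):
    """Step 4: quadratic-cost control update, projected onto [0, w0]."""
    u = []
    for k in range(K + 1):
        uk = [[0.0] * N for _ in range(N)]
        for i in range(N):
            for j in range(N):
                if w0[i][j] > 0:
                    phi = p[k][i][i] * (1 - x[k][i]) * beta * x[k][j]
                    uk[i][j] = min(max(w0[i][j] - phi / d, 0.0), w0[i][j])
        u.append(uk)
    return u


def dvr(eps=1e-6, max_sweeps=50):
    """Steps 1 and 5: start from the original weights and sweep until the
    sup-norm change in the control is below eps or max_sweeps is reached."""
    u = [[[float(v) for v in row] for row in w0] for _ in range(K + 1)]
    gap = float("inf")
    for _ in range(max_sweeps):
        x = forward(u)
        p = backward(x, u)
        u_new = update(x, p)
        gap = max(abs(u_new[k][i][j] - u[k][i][j])
                  for k in range(K + 1) for i in range(N) for j in range(N))
        u = u_new
        if gap < eps:
            break
    return x, p, u, gap


if __name__ == "__main__":
    x, p, u, gap = dvr()
    print("last update gap:", gap)
```

By construction the returned weights stay in $[0, w^o_{ij}]$, in line with Observation 1, and equal $w^o_{ij}$ at the final grid point because $p_i(T) = 0$. With the step size used here the Euler iterates of the state also remain in $(0, 1)$, which mirrors Lemma 2 for this discretization only.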


B. Proof of Theorem 1

Proof. Based on Theorem 6.11 in [19], conditions (8) and (9) are directly derived. To obtain (10), we have

$$
\begin{aligned}
\dot{p}_{ii}(t) = -\frac{\partial H_i}{\partial x_i} = {} & -f'_i(x^*_i(t)) + p_{ii} \Big[ \sum_{j \in \mathcal{N}^{out}_{i,o}} u^*_{ij}(t)\beta_j x^*_j(t) + \sigma_i \Big] \\
& - \sum_{j \in \mathcal{N}^{in}_{i,o}} p_{ij}(t)(1 - x^*_j(t))u^*_{ji}(t)\beta_i, \\
\dot{p}_{ij}(t) = -\frac{\partial H_i}{\partial x_j} = {} & p_{ij} \Big[ \sum_{k \in \mathcal{N}^{out}_{j,o}} u^*_{jk}(t)\beta_k x^*_k(t) + \sigma_j \Big] \\
& - \sum_{k \in \mathcal{N}^{in}_{j,o}} p_{ik}(t)(1 - x^*_k(t))u^*_{kj}\beta_j.
\end{aligned}
\tag{23}
$$

Reformulating (23), we obtain condition (10). As the terminal cost is unspecified and the final state is free, we have the transversality condition $p_i(T) = 0$. ∎

C. Proof of Theorem 2

Proof. By Assumption 2 and the fact that $g_{ij}$, $i, j \in \mathcal{N}$, is convex, we know that the Hamiltonian $H_i$ is differentiable and convex in $u_{ij}$ for every $j$. Also, the admissible control set is convex. Thus, the solution of (9) can be obtained by letting $\partial H_i(u_{ij})/\partial u_{ij} = 0$, i.e.,

$$
\begin{aligned}
\frac{\partial H_i(u_{ij}(t))}{\partial u_{ij}(t)} & = g'_{ij}(u_{ij}(t) - \omega^o_{ij}) + p_{ii}(t)(1 - x_i(t))\beta_j x_j(t) \\
& = g'_{ij}(u_{ij}(t) - \omega^o_{ij}) + \phi_{ij}(t) = 0.
\end{aligned}
$$

Suppose $\partial H_i(\tilde{u}_{ij}(t))/\partial u_{ij}(t) = 0$. If $\tilde{u}_{ij}(t) \in [0, \omega^o_{ij}]$, which happens if and only if $g'_{ij}(-w^o_{ij}) < -\phi_{ij}(t) < g'_{ij}(0)$, then according to (9), $u^*_{ij}(t) = \tilde{u}_{ij}(t) = (g'_{ij})^{-1}(-\phi_{ij}(t))$. Otherwise, if $\tilde{u}_{ij}(t) < 0$, $u^*_{ij}(t) = 0$, while if $\tilde{u}_{ij}(t) > \omega^o_{ij}$, $u^*_{ij}(t) = \omega^o_{ij}$. Thus, we have the optimal control rule in the form of (13). ∎

D. Proof of Theorem 3

Proof. For the proof of (i), note the fact that if $h(x)$ is a continuous and piecewise differentiable function over $[a, b]$ such that $h(a) = h(b)$ while $h(x) \ne h(a)$ for all $x$ in $(a, b)$, then $(dh/dx)(a^+)$ and $(dh/dx)(b^-)$ cannot be negative simultaneously. From (10), we have $p_i(T) = 0$, which gives $\dot{p}_{ii}(T^-) = \lim_{t \to T^-} \dot{p}_{ii}(t) = -f'_i(x_i(T)) < 0$. Hence, there exists $\epsilon_i > 0$ such that $p_{ii} > 0$ and $p_{ij} \ge 0$ for $j \in \mathcal{N} \setminus \{i\}$ over $[T - \epsilon_i, T]$. Suppose that one of the costate components $p_{ij}$ violates the inequality first at $t_a < T$, i.e., we have $p_{ij}(t) \ge 0$ for $t \ge t_a$. We then obtain $\dot{p}_{ij}(t_a^+) \le 0$, which is not feasible. If it is $p_{ii}$ that first violates the inequality at time $t_a$, we have $\dot{p}_{ii}(t_a^+) = -f'_i(x^*_i(t)) - \sum_{j \in \mathcal{N}^{in}_{i,o}} p_{ij}(t)(1 - x^*_j(t))u^*_{ji}(t)\beta_i < 0$, which is in contradiction with the fact.

To prove (ii), note that at time $T$, we have $p_i(T) = 0$. Combined with the fact that $g_{ij}(\cdot)$ is convex and continuously differentiable, $g_{ij}(0) = 0$ and $p_{ii}(t) > 0$, by expression (13), we have that $u^*_{ij}(t) = w^o_{ij}$ holds only at $t = T$.

For the proof of (iii), from Observation 1, we know that if $w^o_{ji} = 0$, the optimal weight adaptation $u^*_{ji}(t) = 0$ for every $t$. Here, $|\mathcal{N}^{in}_{i,o}| = 0$ indicates $w^o_{ji} = 0$ for all $j \in \mathcal{N}$. Based on (10) and (12), the dynamics of the costate component $p_{ii}(t)$ can be written as

$$
\dot{p}_{ii}(t) = \Big[ \sum_{j \in \mathcal{N}^{out}_{i,o}} u^*_{ij}(t)\beta_j x^*_j(t) + \sigma_i \Big] p_{ii}(t) - \alpha_i.
$$

Note that the first term in the bracket is non-negative and $p_{ii}(T) = 0$. Moving backward from $T$, it is clear that $p_{ii}(t)$ is bounded above by $\alpha_i/\sigma_i$.

Under the conditions stated in (iv), the dynamics of the costate component $p_{ii}(t)$ can be written as $\dot{p}_{ii}(t) = -\alpha_i + \sum_{j \in \mathcal{N}^{in}_{i,o}} -(1 - x^*_j(t))u^*_{ji}(t)\beta_i p_{ij}(t)$. We have proved in (i) that $p_{ij}(t) \ge 0$ holds for all $i, j \in \mathcal{N}$, $i \ne j$, and for $t \in [0, T]$, which indicates $\dot{p}_{ii} < 0$. Thus, $p_{ii}(t)$ is strictly monotonically decreasing over $t$.

E. Proof of Theorem 4

Proof. Assume that $u^o = (u^o_1, ..., u^o_n)$ is an open-loop optimal control of the centralized control problem (15), and $x^o = (x^o_1, ..., x^o_n)$ is the state path under the optimal control. Fix an arbitrary $i \in \mathcal{N}$, and let $u_i \ne u^o_i$ be an open-loop strategy for player $i$. Let $x = (x_1, ..., x_n)$ be the new state trajectory given by (15) corresponding to $(u_i, u^o_{-i})$. As $u^o$ and $x^o$ are optimal for the optimal control problem (15), then

$$
\begin{aligned}
& \int_0^T \sum_{i=1}^{N} f_i(x_i(t)) + \sum_{j \ne i} \sum_{k \in \mathcal{N}^{out}_{j,o}} g_{jk}(u^o_{jk}(t) - w^o_{jk}) + \sum_{k \in \mathcal{N}^{out}_{i,o}} g_{ik}(u_{ik}(t) - w^o_{ik}) \, dt \\
& \quad \ge \int_0^T \sum_{i=1}^{N} f_i(x^o_i(t)) + \sum_{i=1}^{N} \sum_{j \in \mathcal{N}^{out}_{i,o}} g_{ij}(u^o_{ij}(t) - w^o_{ij}) \, dt.
\end{aligned}
$$

Adding to both sides of this inequality the constant $-\int_0^T \sum_{j \ne i} \sum_{k \in \mathcal{N}^{out}_{j,o}} g_{jk}(u^o_{jk}(t) - w^o_{jk}) \, dt$, we obtain that $\hat{J}_i(u_i, u^o_{-i}) \ge \hat{J}_i(u^o)$ for all $u_i \in U_i$. According to the definition of open-loop NE for differential games in (6), we know $u^o$ is also an open-loop NE for the differential game with penalties.

To show that the optimal control problem (15) shares the same necessary conditions with the new differential game, we again utilize the maximum principle. The Hamiltonian of player $i$ for the new differential game is $\hat{H}_i = H_i + c_i(t)$. We can find that relations (8), (9) and (10) under the Hamiltonian $\hat{H}_i$ are aligned with relations (16), (17) and (18), where $\lambda(t) = p_i(t)$ at each $t$ for all $i \in \mathcal{N}$. ∎

APPENDIX C
EXAMPLE

To illustrate Corollary 2, we consider the directed network in Fig. 3. Here, $\mathcal{R}_1 = \{1\}$, $\mathcal{R}_3 = \{1, 2, 3, 4\}$, $\mathcal{R}_5 = \{1, 4, 5\}$. The $\Gamma(t)$ associated with this network can be rewritten as an upper triangular block matrix. The upper triangular matrix is denoted by $\Gamma_{\mathcal{R}_i}$, where the first $|\mathcal{R}_i|$ rows and columns of this matrix represent the vertices in $\mathcal{R}_i$ in ascending order. The last $N - |\mathcal{R}_i|$ rows and columns represent the rest of the vertices in $\mathcal{N} \setminus \mathcal{R}_i$ in ascending order. For example, the permutation

2325-5870 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCNS.2019.2931862, IEEE
Transactions on Control of Network Systems
12

for agent 5 is $\{1, 4, 5, 2, 3\}$. Thus, the dynamics of $p_5$ under the differential game given in Corollary 2 can be written as

\[
\begin{bmatrix} \dot p_{51} \\ \dot p_{54} \\ \dot p_{55} \\ \dot p_{52} \\ \dot p_{53} \end{bmatrix}
=
\begin{bmatrix} \Gamma^{u}_{\mathcal{R}_5} & 0_{3 \times 2} \\ \Gamma^{l}_{\mathcal{R}_5} & \Gamma^{r}_{\mathcal{R}_5} \end{bmatrix}
\begin{bmatrix} p_{51} \\ p_{54} \\ p_{55} \\ p_{52} \\ p_{53} \end{bmatrix}
+
\begin{bmatrix} -f_1'(x_1) \\ -f_4'(x_4) \\ -f_5'(x_5) \\ 0 \\ 0 \end{bmatrix},
\]

where

\[
\Gamma^{u}_{\mathcal{R}_5} =
\begin{bmatrix}
\sum_{j \in \{2,3\}} u_{1j}^* \beta_j x_j^* + \sigma_1 & 0 & 0 \\
-(1 - x_1^*) u_{14}^* \beta_4 & \sum_{j \in \{3,5\}} u_{4j}^* \beta_j x_j^* + \sigma_4 & 0 \\
0 & -(1 - x_4^*) u_{45}^* \beta_5 & \sigma_5
\end{bmatrix},
\]

\[
\Gamma^{l}_{\mathcal{R}_5} =
\begin{bmatrix}
-(1 - x_1^*) u_{12}^* \beta_2 & 0 & 0 \\
-(1 - x_1^*) u_{13}^* \beta_3 & -(1 - x_4^*) u_{43}^* \beta_3 & 0
\end{bmatrix},
\]

\[
\Gamma^{r}_{\mathcal{R}_5} =
\begin{bmatrix}
u_{23}^* \beta_3 x_3^* + \sigma_2 & 0 \\
-(1 - x_2^*) u_{23}^* \beta_3 & \sigma_3
\end{bmatrix}.
\]

Thus, if we let $c_i(t) = \sum_{j \in \mathcal{R}_{i,o} \setminus \{i\}} f_j(x_j)$, the dynamics of $p_{ii}$ described by (22) are consistent with the dynamics of the $i$th component of $\lambda$ described by (17). By solving the optimization problem (18), we know that the optimal control problem shares the same control rule (13) with the differential game problem. Since $p_{ii}(t) = \lambda_i(t)$ for every $t \in [0, T]$, we can see that the statement in Corollary 2 holds.
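The permutation and the zero block in this example can be checked mechanically. The following is a minimal sketch, not the authors' code. It assumes the set of nonzero weights $u_{ij}$ can be read off the nonzero entries of the matrices in this example (the edge list below is inferred that way, not taken from Fig. 3 directly), and it tracks only the sparsity pattern of the costate matrix, not its numerical values. The names `EDGES`, `reach_set`, `permutation`, `costate_pattern` and `permuted_pattern` are hypothetical.

```python
import numpy as np

# Assumption: pairs (i, j) with u_ij != 0, inferred from the nonzero matrix
# entries of the example (1-based node labels), not from Fig. 3 itself.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 3), (4, 5)]
N = 5


def reach_set(i):
    """Sorted list of nodes whose state is influenced by node i, plus i."""
    seen, stack = {i}, [i]
    while stack:
        j = stack.pop()
        for a, b in EDGES:
            # u_ab != 0 with b == j means node a is influenced by node j.
            if b == j and a not in seen:
                seen.add(a)
                stack.append(a)
    return sorted(seen)


def permutation(i):
    """Vertices of R_i in ascending order, then the remaining vertices."""
    r = reach_set(i)
    rest = [k for k in range(1, N + 1) if k not in r]
    return r + rest


def costate_pattern():
    """Boolean sparsity pattern: entry (k, j) is nonzero on the diagonal
    and wherever u_jk != 0 (the off-diagonal terms -(1 - x_j) u_jk beta_k)."""
    pattern = np.eye(N, dtype=bool)
    for j, k in EDGES:
        pattern[k - 1, j - 1] = True
    return pattern


def permuted_pattern(i):
    """Sparsity pattern with rows and columns reordered by permutation(i)."""
    idx = [k - 1 for k in permutation(i)]
    return costate_pattern()[np.ix_(idx, idx)]


if __name__ == "__main__":
    print("R_5 =", reach_set(5))
    print("permutation for agent 5:", permutation(5))
    print(permuted_pattern(5).astype(int))
```

Under the assumed edge list, the first three rows of the permuted pattern have no entries in the last two columns, which corresponds to the $0_{3 \times 2}$ block: the costates $p_{51}$, $p_{54}$, $p_{55}$ evolve independently of $p_{52}$ and $p_{53}$.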
Yunhan Huang received the B.Eng. degree in Automation with honor from Wuhan University, Wuhan, China, in 2016. From 2016 to 2017, he was a project officer at the School of Electrical and Electronic Engineering, Nanyang Technological University. He is currently studying for a Ph.D. in Electrical Engineering at New York University, NY, USA.

Quanyan Zhu (SM02-M14) received B.Eng. in Honors Electrical Engineering from McGill University in 2006, M.A.Sc. from University of Toronto in 2008, and Ph.D. from the University of Illinois at Urbana-Champaign (UIUC) in 2013. After stints at Princeton University, he is currently an associate professor at the Department of Electrical and Computer Engineering, New York University. His current research interests include Internet of Things, cyber-physical systems, game theory, machine learning, cyber deception, network optimization and control.
