
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 71, NO. 1, JANUARY 2024

Distributed Consensus Protocol for Multi-Agent Differential Graphical Games

Shouxu Zhang, Zhuo Zhang, Senior Member, IEEE, Rongxin Cui, Member, IEEE, Weisheng Yan, Member, IEEE, and Huiping Li, Senior Member, IEEE

Manuscript received 10 May 2023; accepted 14 July 2023. Date of publication 24 July 2023; date of current version 8 January 2024. This work was supported in part by the National Natural Science Foundation of China under Grant U22A2066, Grant U21B2047, Grant 52271333, Grant 51909217, and Grant 61733014; in part by the China Postdoctoral Science Foundation under Grant 2021M692641; and in part by the Key Research and Development Program of Shaanxi Province under Grant 2022ZDLGY03-05, Grant 2020ZDLGY06-07, Grant 2021GY-289, and Grant 2021GY-257. This brief was recommended by Associate Editor C. Hua. (Corresponding author: Rongxin Cui.)

The authors are with the School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China (e-mail: zhangshouxu@nwpu.edu.cn; zhuozhang@nwpu.edu.cn; r.cui@nwpu.edu.cn; wsyan@nwpu.edu.cn; lihuiping@nwpu.edu.cn).

Digital Object Identifier 10.1109/TCSII.2023.3298015

Abstract—This brief investigates the multi-agent differential graphical game for high-order systems. A modified cost function for each agent is presented, and a fully distributed control protocol that guarantees leaderless consensus and optimization in the sense of Nash equilibrium is designed. The designed protocol allows the communication topology to contain only a directed spanning tree. A numerical example is finally reported to illustrate the effectiveness of the theoretical work.

Index Terms—Multi-agent systems, graphical game, leaderless consensus, fully distributed protocol, directed spanning tree.

I. INTRODUCTION

A. Literature Review

In recent years, multi-agent systems (MASs) have received extensive attention because of their wide applications in many areas, such as autonomous robots, sensor networks, and distributed computing. A distributed protocol using only local neighboring information is required to achieve consensus, which is the fundamental behavior of MASs [1], [2]. Moreover, optimal solutions are often desired in practical applications, and the protocol should not only reach consensus but also achieve optimality; see [3], [4], [5] for instance. In these studies, all agents cooperatively minimize a global cost function, and no agent considers its self-interest.

On the contrary, the self-interest should be optimized by each individual agent in some applications, such as intelligent transportation, social networks, and pursuit-evasion competition. Game theory provides a mathematical framework to achieve both consensus and self-interest optimization [6]. In particular, the multi-agent graphical game is a noncooperative game in the sense that each agent implements its policy by using only its own information and that of its immediate neighbors, in contrast to traditional multi-agent games where each agent has access to the global information of the MAS [7]. Kamalapurkar et al. [8] have proven that the Nash equilibrium is reached if all agents implement their best response policies in graphical games, but the solution is not distributed. In [9] and [10], distributed solutions are presented for graphical games by using the minmax strategy, where $L_2$-gain bounded performance instead of Nash equilibrium is guaranteed. To obtain an optimal solution that is both distributed and a Nash equilibrium, Peng et al. [11] have proposed a distributed control policy for autonomous vehicles, where the optimal solution is found using a reinforcement learning approach. Tan et al. [12] investigate population dynamics as an application of graphical games, and a distributed optimal solution that is a Nash equilibrium is obtained. Also, an estimator-based adaptive distributed policy that achieves consensus and Nash equilibrium has been presented in [13]. In [11], [12], [13], distributed protocols that guarantee the Nash equilibrium have been presented, but the communication topologies are assumed to be undirected. Further, a novel cost function for multi-agent graphical games is presented in [14] and [15] with directed topology graphs, where the Nash equilibrium is reached and the policies are distributed. However, it is worth noticing that in [14] and [15], the topology graphs are assumed to be detailed balanced or strongly connected.

B. Research Motivations

Based on the literature review of existing works, the main issues are summarized as follows. 1) Most existing works on multi-agent graphical games investigate leader-following consensus; however, consensus reaching in the leaderless case is more challenging than in the leader-following counterpart, because the Laplacian matrix of the communication topology in the leaderless case has a zero eigenvalue, while the matrix in the leader-following case has no zero eigenvalues. The study of leaderless consensus for graphical games is still open. 2) In [10] and [14], the communication topology graphs can be directed, but assumptions that the graphs are detailed balanced or strongly connected are required; general digraphs that only contain a directed spanning tree are still not allowed.

Based on these discussions, we study distributed optimal control for multi-agent graphical games with general digraphs, and the leaderless consensus behavior is considered.

The contributions of this brief are threefold. 1) A modified cost function for the graphical game is presented, and a distributed consensus protocol for high-order linear MASs is designed, which guarantees both leaderless consensus and Nash equilibrium. Note that the presented cost function is less conservative than that in [13] and [15]. 2) The assumptions adopted in many existing works, e.g., [10] and [14], that the topology graphs must be detailed balanced or strongly connected are removed, and the protocol proposed in this brief allows general digraphs that only contain a directed spanning tree. 3) The presented protocol can be managed in a fully distributed way; that is, global information is unneeded in both the design and the implementation of the protocol.

The remainder of this brief is organized as follows. In Section II, some preliminaries are provided. In Section III, the main results on the distributed consensus protocol for the graphical game are presented. In Section IV, numerical examples are performed, and the brief ends with conclusions in Section V.
II. PRELIMINARIES

A. Notations

$I_N$ represents the $N$-order identity matrix and $\mathbf{0}_N \in \mathbb{R}^N$ denotes the $N$-dimensional zero vector; $P \succeq 0$ means that $P$ is a positive semidefinite matrix; $x^{(n)}$ stands for the $n$-order time derivative of $x$; also, $[x_i]_{i\in\{1,\ldots,N\}} = [x_1^T \ldots x_N^T]^T$ and

$$[p_{ij}]_{i,j\in\{1,\ldots,N\}} = \begin{bmatrix} p_{11} & \cdots & p_{1N} \\ \vdots & \ddots & \vdots \\ p_{N1} & \cdots & p_{NN} \end{bmatrix}.$$
B. Graph Theory

The linkages between $N$ agents are described by a directed graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, where $\mathcal{V} = \{v_1, \ldots, v_N\}$ is the node set and $\mathcal{E}$ is the edge set. $\mathcal{E}$ contains an edge $(v_i, v_j)$ if $v_j$ is capable of obtaining information from $v_i$. A path from $v_i$ to $v_j$ is a sequence of edges $(v_i, v_{n_1}), \ldots, (v_{n_m}, v_j)$. The graph $\mathcal{G}$ is said to contain a directed spanning tree if there exists a root node having directed paths to all other nodes. The neighbor set of $v_i$ is represented by $\mathcal{N}_i = \{v_j \mid v_j \in \mathcal{V}, (v_j, v_i) \in \mathcal{E}\}$. The adjacency matrix $[a_{ij}]_{i,j\in\{1,\ldots,N\}}$ is defined as $a_{ij} = 1$ if $(v_j, v_i) \in \mathcal{E}$ and $0$ otherwise for $i \neq j$, and $a_{ii} = 0$. The Laplacian matrix $L = [l_{ij}]_{i,j\in\{1,\ldots,N\}}$ is defined as $l_{ii} = \sum_{j=1}^{N} a_{ij}$ and $l_{ij} = -a_{ij}$ for $i \neq j$.

Assumption 1: The graph has a directed spanning tree.

If Assumption 1 holds, there exists a nonnegative vector $\alpha = [\alpha_i]_{i\in\{1,\ldots,N\}}$ such that $\alpha^T L = \mathbf{0}_N^T$ and $\sum_{j=1}^{N} \alpha_j = 1$ [1].
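The left zero-eigenvector $\alpha$ above can be computed numerically from $L$. The following is a minimal sketch; the 4-node digraph and the numpy-based eigenvector extraction are illustrative choices of ours, not taken from this brief:

```python
import numpy as np

# Assumed 4-node digraph containing a directed spanning tree rooted at
# node 1; a_ij = 1 means agent i receives information from agent j.
A = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A   # l_ii = sum_j a_ij, l_ij = -a_ij

# Nonnegative left eigenvector for the zero eigenvalue: alpha^T L = 0.
w, V = np.linalg.eig(L.T)
alpha = np.real(V[:, np.argmin(np.abs(w))])
alpha /= alpha.sum()             # normalize so that sum_j alpha_j = 1

assert np.allclose(alpha @ L, 0.0)
print(alpha)                     # here alpha = [1, 0, 0, 0]: only the root weighs in
```

For a graph whose only root is a single node, $\alpha$ concentrates all weight on that root, which is exactly the situation exploited in the numerical example of Section IV.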

C. Problem Formulation

Consider a group of $N$ agents governed by

$$\dot{x}_i = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ a_1 & a_2 & \cdots & a_n \end{bmatrix} x_i + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} u_i = A x_i + B u_i, \quad (1)$$

for $i = 1, \ldots, N$, where $x_i = [x_{i,l}]_{l\in\{1,\ldots,n\}} \in \mathbb{R}^n$ and $u_i \in \mathbb{R}$ are the state and control input, respectively. Notice that any controllable single-input linear system can be transformed into the controllable canonical form described in (1).

In the leaderless consensus problem of high-order linear MASs, the primary objective is to design a distributed protocol $u_i$ for each agent such that

$$\lim_{t\to\infty} x_{i,1} = \lim_{t\to\infty} x_{j,1} = x_c, \quad i, j \in \{1, \ldots, N\}, \quad (2a)$$
$$\lim_{t\to\infty} x_{i,l} = 0, \quad i \in \{1, \ldots, N\}, \; l \in \{2, \ldots, n\}, \quad (2b)$$

where $x_c$ represents the final consensus state, which will be determined later.

Let $\varphi_i = \sum_{l=1}^{n-1} c_l x_{i,l} + x_{i,n}$, where the $c_l$ are selected such that all roots of the following equation have negative real parts:

$$r^{n-1} + c_{n-1} r^{n-2} + \ldots + c_2 r + c_1 = 0. \quad (3)$$

Design the following control protocol:

$$u_i = -\sum_{l=1}^{n} a_l x_{i,l} - \sum_{l=1}^{n-1} c_l x_{i,l+1} + \tau_i, \quad (4)$$

where $\tau_i$ is the control policy to be determined, which not only achieves consensus but also minimizes the cost function, and the remaining terms in $u_i$ are used to compensate the local information in $\dot{\varphi}_i$. Inspired by [15], a modified cost function is defined as follows:

$$J_i(\tau_i, \tau_{\mathcal{N}_i}) = \int_0^\infty \Big( R_i \tau_i^2 + \sum_{j=1}^N a_{ij} \big( \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij} - R_j \tau_j^2 \big) \Big) dt, \quad (5)$$

where $R_i > 0$, $R_j > 0$, $\tau_{\mathcal{N}_i} = \{\tau_j \mid v_j \in \mathcal{N}_i\}$, $\varepsilon_{ij} = [\varepsilon_i, \varepsilon_j, \sum_{l=1}^N a_{il}\varepsilon_l, \sum_{l=1}^N a_{jl}\varepsilon_l]^T$, $\varepsilon_i = \varphi_i - \sum_{j=1}^N \alpha_j \varphi_j(0)$, and $\varphi_j(0)$ is the initial value of $\varphi_j$. Further,

$$Q_{ij} = \begin{bmatrix} Q_i & Q_{ij}^{12} & Q_{ij}^{13} & Q_{ij}^{14} \\ Q_{ij}^{12} & Q_{ij}^{22} & Q_{ij}^{23} & Q_{ij}^{24} \\ Q_{ij}^{13} & Q_{ij}^{23} & Q_{ij}^{33} & Q_{ij}^{34} \\ Q_{ij}^{14} & Q_{ij}^{24} & Q_{ij}^{34} & Q_{ij}^{44} \end{bmatrix}, \quad Q_i > 0,$$

and the remaining entries in $Q_{ij}$ will be determined later.

The definitions of best response and Nash equilibrium are presented as follows.

Definition 1 [16]: The best response $\tau_i^*$ of agent $i$ to its neighbors' policies $\tau_{\mathcal{N}_i}$ satisfies $J_i(\tau_i^*, \tau_{\mathcal{N}_i}) \le J_i(\tau_i, \tau_{\mathcal{N}_i})$, $\forall i \in \{1, \ldots, N\}$.

Definition 2 [16]: Policies $\{\tau_1^*, \ldots, \tau_N^*\}$ are defined as the Nash equilibrium of the $N$-player game if

$$J_i(\tau_i^*, \tau_{-i}^*) \le J_i(\tau_i, \tau_{-i}^*), \quad i = 1, \ldots, N, \quad (6)$$

where $\tau_{-i} = \{\tau_l \mid v_l \in \mathcal{V}, l \neq i\}$ represents the set of all policies except that of agent $i$.
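To make the roles of (3) and (4) concrete, the sketch below picks the $c_l$ from a desired stable characteristic polynomial and assembles the local-feedback part of (4). The system order, plant coefficients, and helper names are illustrative assumptions of ours, not taken from the brief:

```python
import numpy as np

n = 3                                 # assumed system order
a = np.array([1.0, 2.0, 1.0])         # assumed plant coefficients a_1..a_n in (1)

# Choose c_1..c_{n-1} so that all roots of (3) have negative real parts,
# e.g., r^2 + 2r + 1 = (r + 1)^2, i.e., [c_1, c_2] = [1, 2].
c = np.array([1.0, 2.0])
assert np.all(np.real(np.roots([1.0, *c[::-1]])) < 0)

def u_protocol(x_i: np.ndarray, tau_i: float) -> float:
    """Protocol (4): cancel the local terms in phi_i' and inject tau_i."""
    return float(-a @ x_i - c @ x_i[1:] + tau_i)

def phi(x_i: np.ndarray) -> float:
    """phi_i = sum_{l<n} c_l x_{i,l} + x_{i,n}; under (4), phi_i' = tau_i."""
    return float(c @ x_i[:-1] + x_i[-1])
```

The design choice here is the standard one: (4) reduces each agent to the scalar integrator $\dot{\varphi}_i = \tau_i$, so the game is played over the $\varphi_i$ coordinates only.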
III. CONSENSUS PROTOCOL FOR DIFFERENTIAL GRAPHICAL GAME

The main objective of this section is to design the policy $\tau_i$ to guarantee the consensus behavior in (2) and the Nash equilibrium of the $N$-player game in (6).


The value function corresponding to the cost function is

$$V_i(\tau_i, \tau_{\mathcal{N}_i}) = \int_t^\infty \Big( R_i \tau_i^2 + \sum_{j=1}^N a_{ij} \big( \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij} - R_j \tau_j^2 \big) \Big) dt. \quad (7)$$

Then, the Hamiltonian function can be selected as follows:

$$H_i(\tau_i, \tau_{\mathcal{N}_i}) = R_i \tau_i^2 + \sum_{j=1}^N a_{ij} \big( \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij} - R_j \tau_j^2 \big) + \nabla V_i \, \dot{\varepsilon}_i, \quad (8)$$

where $\nabla V_i = \partial V_i / \partial \varepsilon_i$. The Nash equilibrium in (6) is reached if every agent plays its best response policy [16]. Using the extremum condition $\partial H_i / \partial \tau_i = 0$, the best response policy can be obtained as

$$\tau_i^* = -\frac{1}{2} R_i^{-1} \nabla V_i. \quad (9)$$

Substituting (9) into (8) yields the following Hamilton-Jacobi (HJ) equation:

$$\sum_{j=1}^N a_{ij} \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij} - \frac{1}{4} R_i^{-1} (\nabla V_i)^2 - \frac{1}{4} \sum_{j=1}^N a_{ij} R_j^{-1} (\nabla V_j)^2 = 0. \quad (10)$$

Lemma 1: Let the value function take the form $V_i = P_i \big( \sum_{j=1}^N a_{ij} (\varepsilon_i - \varepsilon_j) \big)^2$. Then, the HJ equation (10) holds for all possible $\varepsilon_i$ and $\varepsilon_j$ if the entries in $Q_{ij}$ are chosen as

$$Q_{ij}^{12} = 0, \quad Q_{ij}^{13} = -R_i^{-1} P_i^2, \quad Q_{ij}^{14} = 0,$$
$$Q_{ij}^{22} = R_j^{-1} P_j^2 l_{jj}^2, \quad Q_{ij}^{23} = 0, \quad Q_{ij}^{24} = -R_j^{-1} P_j^2 l_{jj},$$
$$Q_{ij}^{33} = R_i^{-1} P_i^2 / l_{ii}, \quad Q_{ij}^{34} = 0, \quad Q_{ij}^{44} = R_j^{-1} P_j^2, \quad (11)$$

where $P_i = \sqrt{Q_i R_i / l_{ii}}$ and $P_j = \sqrt{Q_j R_j / l_{jj}}$. Moreover, the best response policy is

$$\tau_i^* = -R_i^{-1} P_i \sum_{j=1}^N a_{ij} (\varphi_i - \varphi_j). \quad (12)$$

Proof: Taking the partial derivative of $V_i$ with respect to $\varepsilon_i$ yields $\nabla V_i = 2 P_i \sum_{j=1}^N a_{ij} (\varepsilon_i - \varepsilon_j)$. Then, substituting $\nabla V_i$ into the HJ equation (10) gives

$$\begin{aligned}
&\sum_{j=1}^N a_{ij} \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij} - \frac{1}{4} R_i^{-1} \Big( 4 P_i^2 l_{ii}^2 \varepsilon_i^2 + 4 P_i^2 \Big( \sum_{l=1}^N a_{il} \varepsilon_l \Big)^2 - 8 P_i^2 l_{ii} \varepsilon_i \sum_{l=1}^N a_{il} \varepsilon_l \Big) \\
&\quad - \frac{1}{4} \sum_{j=1}^N a_{ij} R_j^{-1} \Big( 4 P_j^2 l_{jj}^2 \varepsilon_j^2 + 4 P_j^2 \Big( \sum_{l=1}^N a_{jl} \varepsilon_l \Big)^2 - 8 P_j^2 l_{jj} \varepsilon_j \sum_{l=1}^N a_{jl} \varepsilon_l \Big) \\
&= \sum_{j=1}^N a_{ij} \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij} - R_i^{-1} P_i^2 l_{ii}^2 \varepsilon_i^2 - \sum_{j=1}^N a_{ij} R_j^{-1} P_j^2 l_{jj}^2 \varepsilon_j^2 \\
&\quad - R_i^{-1} P_i^2 \Big( \sum_{l=1}^N a_{il} \varepsilon_l \Big)^2 - \sum_{j=1}^N a_{ij} R_j^{-1} P_j^2 \Big( \sum_{l=1}^N a_{jl} \varepsilon_l \Big)^2 \\
&\quad + 2 R_i^{-1} P_i^2 l_{ii} \varepsilon_i \sum_{l=1}^N a_{il} \varepsilon_l + 2 \sum_{j=1}^N a_{ij} R_j^{-1} P_j^2 l_{jj} \varepsilon_j \sum_{l=1}^N a_{jl} \varepsilon_l \\
&= 0. \quad (13)
\end{aligned}$$

Eq. (13) can be rewritten as

$$\sum_{j=1}^N a_{ij} \varepsilon_{ij}^T (Q_{ij} - \Omega_{ij}) \varepsilon_{ij} = 0, \quad (14)$$

where

$$\Omega_{ij} = \begin{bmatrix} R_i^{-1} P_i^2 l_{ii} & 0 & -R_i^{-1} P_i^2 & 0 \\ 0 & R_j^{-1} P_j^2 l_{jj}^2 & 0 & -R_j^{-1} P_j^2 l_{jj} \\ -R_i^{-1} P_i^2 & 0 & R_i^{-1} P_i^2 / l_{ii} & 0 \\ 0 & -R_j^{-1} P_j^2 l_{jj} & 0 & R_j^{-1} P_j^2 \end{bmatrix}.$$

Then, it follows from (14) that the HJ equation holds for all possible $\varepsilon_i$ and $\varepsilon_j$ if the entries in $Q_{ij}$ are chosen as in (11). Moreover, the following best response policy is obtained by substituting $\nabla V_i = 2 P_i \sum_{j=1}^N a_{ij} (\varepsilon_i - \varepsilon_j)$ into (9):

$$\tau_i^* = -R_i^{-1} P_i \sum_{j=1}^N a_{ij} (\varepsilon_i - \varepsilon_j) = -R_i^{-1} P_i \sum_{j=1}^N a_{ij} (\varphi_i - \varphi_j). \quad (15)$$

This completes the proof.
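Lemma 1 is also easy to check numerically: with $\nabla V_i = 2P_i \sum_j a_{ij}(\varepsilon_i - \varepsilon_j)$ and $Q_{ij}$ built from (11), the left-hand side of (10) should vanish for arbitrary errors. A minimal sketch, assuming a topology in which every $l_{ii} > 0$ so the formulas in (11) apply, with all numbers illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # assumed digraph with every l_ii > 0
L = np.diag(A.sum(1)) - A
N = 3
R = np.array([1.0, 2.0, 0.5])            # illustrative R_i > 0
Q = np.array([1.0, 1.0, 2.0])            # illustrative Q_i > 0
P = np.sqrt(Q * R / np.diag(L))          # P_i = sqrt(Q_i R_i / l_ii)

def Qij(i, j):
    """Weighting matrix with entries chosen per (11) (and Q_ij^11 = Q_i)."""
    qi, qj = P[i]**2 / R[i], P[j]**2 / R[j]
    lii, ljj = L[i, i], L[j, j]
    return np.array([[qi * lii,   0.0,      -qi,       0.0],
                     [0.0,  qj * ljj**2,     0.0, -qj * ljj],
                     [-qi,        0.0,  qi / lii,       0.0],
                     [0.0,  -qj * ljj,       0.0,        qj]])

eps = rng.standard_normal(N)             # arbitrary consensus errors
gradV = 2 * P * (L @ eps)                # grad V_i = 2 P_i sum_j a_ij (eps_i - eps_j)

for i in range(N):                       # HJ residual (10) must vanish
    s = 0.0
    for j in range(N):
        e = np.array([eps[i], eps[j], A[i] @ eps, A[j] @ eps])   # eps_ij
        s += A[i, j] * (e @ Qij(i, j) @ e - gradV[j]**2 / (4 * R[j]))
    s -= gradV[i]**2 / (4 * R[i])
    assert abs(s) < 1e-10
```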
Remark 1: It is worth noticing that the weighting matrix $Q_{ij}$ in the cost function (5) is required to satisfy $Q_{ij} \succeq 0$. Let the entries in $Q_{ij}$ be chosen as in (11), that is,

$$Q_{ij} = \begin{bmatrix} Q_i & 0 & -R_i^{-1} P_i^2 & 0 \\ 0 & R_j^{-1} P_j^2 l_{jj}^2 & 0 & -R_j^{-1} P_j^2 l_{jj} \\ -R_i^{-1} P_i^2 & 0 & R_i^{-1} P_i^2 / l_{ii} & 0 \\ 0 & -R_j^{-1} P_j^2 l_{jj} & 0 & R_j^{-1} P_j^2 \end{bmatrix}. \quad (16)$$

Then the eigenvalues of $Q_{ij}$ can be calculated as follows, using the fact that $Q_i = R_i^{-1} P_i^2 l_{ii}$:

$$\big\{ 0, \; 0, \; Q_i + R_i^{-1} P_i^2 / l_{ii}, \; R_j^{-1} P_j^2 l_{jj}^2 + R_j^{-1} P_j^2 \big\}. \quad (17)$$

It can be seen from (17) that the eigenvalues of $Q_{ij}$ are nonnegative for all possible $R_i > 0$ and $Q_i > 0$, which indicates that $Q_{ij} \succeq 0$. However, in [15], the weighting matrix used to describe the control effort of neighboring agents should be large enough to guarantee the positive semidefiniteness of $Q_{ij}$, which makes the results more conservative.

Remark 2: If the graph only contains a directed spanning tree rooted at agent $i$, it follows that $l_{ii} = 0$. In this case, the HJ equation always holds for agent $i$, i.e., $H_i(\tau_i^*, \tau_{\mathcal{N}_i}^*) \equiv 0$. Hence, the entries of $Q_{ij}$ can be chosen as any values satisfying $Q_{ij} \succeq 0$, and the conditions in (11) are unneeded. Also, $P_i$ can be chosen as any positive value, and we can choose $P_i = 1$ without loss of generality. Moreover, it is worth mentioning that the leader-following consensus can be treated as a special case of leaderless consensus: the leaderless consensus problem can be transformed into the leader-following one if the graph only contains a directed spanning tree rooted at agent $i$, in which case agent $i$ can be treated as the leader.

Remark 3: It can be observed from (4) and (12) that the gains in the control protocol are independent of the global information of the communication topology: $a_l$ and $c_l$ rely on the dynamics and dimension of the individual agent, $R_i$ can be any positive real number, and $P_i$ depends only on the communication information between agent $i$ and its neighbors. Therefore, each agent can manage its individual control protocol in a fully distributed way, for both the design and the implementation.
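A quick numerical check of the eigenvalue claim (17) and of $Q_{ij} \succeq 0$, with illustrative values of our own choosing:

```python
import numpy as np

Ri, Rj, lii, ljj, Qi, Qj = 2.0, 0.5, 3.0, 2.0, 1.5, 1.0  # illustrative values
Pi, Pj = np.sqrt(Qi * Ri / lii), np.sqrt(Qj * Rj / ljj)  # as in Lemma 1
qi, qj = Pi**2 / Ri, Pj**2 / Rj                          # note: qi * lii == Qi

Qij = np.array([[Qi,          0.0,      -qi,        0.0],
                [0.0, qj * ljj**2,      0.0,  -qj * ljj],
                [-qi,         0.0, qi / lii,        0.0],
                [0.0,   -qj * ljj,      0.0,         qj]])   # matrix (16)

eigs = np.sort(np.linalg.eigvalsh(Qij))
assert np.allclose(eigs, np.sort([0.0, 0.0, Qi + qi / lii, qj * ljj**2 + qj]))
assert eigs.min() > -1e-12        # hence Q_ij is positive semidefinite
```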
Remark 4: The multi-agent graphical game studied in this brief can be considered as a pursuit-evasion game, in which each evader aims to minimize its cost function in (5). For the $i$th evader, the first term $R_i \tau_i^2$ is used to minimize its own control effort, the second term $\sum_{j=1}^N a_{ij} \varepsilon_{ij}^T Q_{ij} \varepsilon_{ij}$ is used to minimize the distance to other evaders to avoid being isolated, and the third term $\sum_{j=1}^N a_{ij} R_j \tau_j^2$ is used to maximize the control effort of the other evaders so that the $i$th evader has more chance to escape from pursuers. Potential applications include missile guidance and competitive games of autonomous vehicles [15].

The next result shows that consensus and Nash equilibrium can be achieved using the presented protocol.
Theorem 1: If Assumption 1 holds, the consensus behavior described in (2) and the Nash equilibrium in (6) are achieved by using the protocol designed in (4) and (12). Further, the final consensus state is

$$x_c = \frac{\sum_{j=1}^N \alpha_j \varphi_j(0)}{c_1}. \quad (18)$$

Proof: Taking the time derivative of $\varphi_i$, we have

$$\dot{\varphi}_i = \sum_{l=1}^{n-1} c_l \dot{x}_{i,l} + \dot{x}_{i,n} = \sum_{l=1}^{n-1} c_l x_{i,l+1} + \sum_{l=1}^{n} a_l x_{i,l} + u_i. \quad (19)$$

Substituting (4) and (12) into (19) yields

$$\dot{\varphi}_i = -R_i^{-1} P_i \sum_{j=1}^N a_{ij} (\varphi_i - \varphi_j). \quad (20)$$

It follows from (20) that $\varphi_i$ reaches consensus if Assumption 1 holds [1], i.e.,

$$\lim_{t\to\infty} \varphi_i = \sum_{j=1}^N \alpha_j \varphi_j(0). \quad (21)$$
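The convergence in (21) can be reproduced with a short simulation of the $\varphi$-dynamics (20). The sketch below assumes, for simplicity, identical gains $R_i^{-1}P_i$ for all agents and an illustrative topology; Euler integration and all constants are our own choices:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # assumed digraph with a spanning tree
L = np.diag(A.sum(1)) - A
g = 0.4                                  # identical gain R_i^{-1} P_i, for simplicity

w, V = np.linalg.eig(L.T)                # left zero-eigenvector alpha of L
alpha = np.real(V[:, np.argmin(np.abs(w))]); alpha /= alpha.sum()

phi0 = np.array([2.0, -1.0, 0.5])        # phi_i(0)
phi, dt = phi0.copy(), 1e-3
for _ in range(100_000):                 # Euler integration of (20)
    phi += dt * (-g * (L @ phi))

assert np.allclose(phi, alpha @ phi0, atol=1e-6)   # limit matches (21)
```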
Let $\varphi_c = \sum_{j=1}^N \alpha_j \varphi_j(0)$; then a nonhomogeneous linear differential equation (DE) is obtained when $\varphi_i = \varphi_c$:

$$x_{i,1}^{(n-1)} + c_{n-1} x_{i,1}^{(n-2)} + \ldots + c_2 \dot{x}_{i,1} + c_1 x_{i,1} = \varphi_c. \quad (22)$$

The solution of Eq. (22) is $x_{i,1} = g_{i,1} + p_{i,1}$, where $p_{i,1}$ represents a particular solution of (22), and $g_{i,1}$ is the general solution of the following homogeneous linear DE:

$$g_{i,1}^{(n-1)} + c_{n-1} g_{i,1}^{(n-2)} + \ldots + c_2 \dot{g}_{i,1} + c_1 g_{i,1} = 0. \quad (23)$$

Noticeably, the characteristic equation of (23) is exactly (3), which indicates that $\lim_{t\to\infty} g_{i,1} = 0$, since the $c_l$, $l = 1, \ldots, n-1$, are selected such that all roots of (3) have negative real parts. Notice that the nonhomogeneous term $\varphi_c$ is constant; thus, a particular solution $p_{i,1}$ of (22) can be sought in the form

$$p_{i,1} = d_0 + d_1 t + d_2 t^2 + \ldots + d_{n-1} t^{n-1}. \quad (24)$$

Using the method of undetermined coefficients, we have $d_l = 0$, $l = 1, \ldots, n-1$, and $d_0 = \varphi_c / c_1$; thus, the particular solution of (22) is $p_{i,1} = \varphi_c / c_1$. Therefore, we have $x_{i,1} = g_{i,1} + p_{i,1} = g_{i,1} + \varphi_c / c_1$, which indicates that $\lim_{t\to\infty} x_{i,1} = \varphi_c / c_1$, i.e., the final consensus state is $x_c = \varphi_c / c_1 = \sum_{j=1}^N \alpha_j \varphi_j(0) / c_1$. Furthermore, since $x_{i,l} = x_{i,1}^{(l-1)}$, $l = 2, \ldots, n$, it follows that $\lim_{t\to\infty} x_{i,l} = 0$. Therefore, the consensus behavior in (2) is guaranteed with the final consensus state presented in (18).

Since the value function $V_i$ is twice continuously differentiable and its gradient $\nabla V_i$ is strictly monotone, the $N$-player game admits a unique Nash equilibrium at the policies $\{\tau_1^*, \ldots, \tau_N^*\}$ if and only if $\nabla V_i = 0$, $\forall i \in \{1, \ldots, N\}$ [17]. Moreover, $\lim_{t\to\infty} \nabla V_i = 0$, $\forall i \in \{1, \ldots, N\}$, is satisfied because $\lim_{t\to\infty} \varepsilon_i = \lim_{t\to\infty} \varepsilon_j = 0$. Therefore, the Nash equilibrium in (6) is reached as $t \to \infty$.

This completes the proof.

Moreover, a control block diagram is given in Fig. 1 to further illustrate the design process of the control protocol.

Fig. 1. Control block diagram for the protocol.

IV. NUMERICAL EXAMPLE

Consider a network of five agents, with the communication topology depicted in Fig. 2.

Fig. 2. The communication topology.

Each agent is modeled by (1) with $n = 4$, $a_1 = a_2 = 1$, and $a_3 = a_4 = 2$. Initial conditions are given by $x_{i,1}(0) = 0.3(i+2)$, $x_{i,2}(0) = 0.4(i-3)$, $x_{i,3}(0) = 0.2(i-1)$, and $x_{i,4}(0) = 0.2(i-2)$, $i = 1, \ldots, 5$. Let the characteristic equation (3) be $(r+1)^3 = 0$; then it is calculated that $c_1 = 1$ and $c_2 = c_3 = 3$. It can be seen from Fig. 2 that the graph only contains a directed spanning tree rooted at agent 1, which means that $l_{11} = 0$. Then, based on Remark 2, we can choose $Q_{1j} = I_4$, $j = 1, \ldots, 5$. Moreover, let $Q_i = 1$ and $R_i = 10$, $i = 1, \ldots, 5$. The trajectories of $x_{i,1}$ and $\bar{x}_i = \sqrt{x_{i,2}^2 + x_{i,3}^2 + x_{i,4}^2}$ are presented in Fig. 3 and Fig. 4. It is shown that the consensus behavior in (2) is reached by implementing the proposed protocol, i.e., $\lim_{t\to\infty} x_{i,1} = x_c$ and $\lim_{t\to\infty} x_{i,l} = 0$, $l = 2, 3, 4$. Also, the nonnegative vector $\alpha$ is chosen as $\alpha = [1\ 0\ 0\ 0\ 0]^T$, and the theoretical final consensus state can be computed as $x_c = -1.7$ based on (18), which is consistent with the result in Fig. 3.
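This example can be reproduced with a few lines of code. The sketch below integrates (1) under (4) and (12); since the brief's Fig. 2 is not reproduced here, we assume a directed chain rooted at agent 1 (which likewise has $l_{11} = 0$), so the final state can still be checked against $x_c = -1.7$:

```python
import numpy as np

N, n = 5, 4
a = np.array([1.0, 1.0, 2.0, 2.0])             # a_1..a_4 in (1)
c = np.array([1.0, 3.0, 3.0])                  # c_1..c_3 from (r + 1)^3 = 0
Adj = np.zeros((N, N))
Adj[1, 0] = Adj[2, 1] = Adj[3, 2] = Adj[4, 3] = 1.0   # assumed chain 1 -> ... -> 5
lii = Adj.sum(1)                               # l_11 = 0: agent 1 is the root
R, Qi = 10.0, 1.0
P = np.ones(N)                                 # Remark 2: P_1 = 1 for the root
P[lii > 0] = np.sqrt(Qi * R / lii[lii > 0])    # Lemma 1 for the other agents

idx = np.arange(1, N + 1)
x = np.stack([0.3 * (idx + 2), 0.4 * (idx - 3),
              0.2 * (idx - 1), 0.2 * (idx - 2)], axis=1)   # x_i(0) in R^4

dt = 1e-3
for _ in range(100_000):                       # 100 s of Euler integration
    phi = x[:, :3] @ c + x[:, 3]               # phi_i
    tau = -(P / R) * (lii * phi - Adj @ phi)   # policy (12)
    u = -(x @ a) - (x[:, 1:] @ c) + tau        # protocol (4)
    x = x + dt * np.column_stack([x[:, 1:], x @ a + u])

assert np.allclose(x[:, 0], -1.7, atol=1e-3)   # x_c from (18)
assert np.abs(x[:, 1:]).max() < 1e-3           # x_{i,l} -> 0 for l >= 2
```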


Fig. 3. States $x_{i,1}$ of agents.

Fig. 4. State norm $\bar{x}_i$ of agents.
Fig. 5. Cost function of agent 2: the solid line denotes the case using policies $\{\tau_1^*, \tau_2^*, \tau_3^*, \tau_4^*, \tau_5^*\}$ at the Nash equilibrium, and the dashed lines denote the case using policies $\{\tau_1^*, \tau_2, \tau_3^*, \tau_4^*, \tau_5^*\}$ that are not a Nash equilibrium (red line: $k = 0.2$, green line: $k = 0.4$, brown line: $k = 0.6$).

In order to further illustrate the advantage of the best response policy $\tau_i^*$ designed in (12), which achieves the Nash equilibrium, a comparison is made with policies that are not a Nash equilibrium. For the former case, let $\{\tau_1^*, \tau_2^*, \tau_3^*, \tau_4^*, \tau_5^*\}$ be the set of policies constituting the Nash equilibrium. For the latter case, let the policy of agent 2 be $\tau_2 = -k \sum_{j=1}^N a_{2j} (\varphi_2 - \varphi_j)$, where $k \in \{0.2, 0.4, 0.6\}$, and let the other policies be taken from (12); i.e., the set of policies is chosen as $\{\tau_1^*, \tau_2, \tau_3^*, \tau_4^*, \tau_5^*\}$, which is not a Nash equilibrium. Fig. 5 shows the cost function of agent 2 under the two sets of policies. It can be seen from Fig. 5 that the cost function obtained by implementing the Nash-equilibrium policies is smaller than its counterparts for the non-equilibrium cases; that is, the condition $J_i(\tau_1^*, \tau_2^*, \tau_3^*, \tau_4^*, \tau_5^*) \le J_i(\tau_1^*, \tau_2, \tau_3^*, \tau_4^*, \tau_5^*)$ described in (6) is satisfied.
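A sketch of this comparison, under the same assumed chain topology as in the earlier simulation sketch (so agent 2's only neighbor is the root, whose policy is $\tau_1^* = 0$ and whose error stays at zero); the quadrature and constants are our illustrative choices:

```python
import numpy as np

# With eps_1(t) = 0, agent 2's error obeys eps_2' = -k * eps_2 under a
# generic feedback gain k, and the agent-2 integrand of (5) reduces to
# (R k^2 + Q2) eps_2^2 along that trajectory.
R, Q2 = 10.0, 1.0
eps0, dt, T = 1.0, 1e-3, 60.0                 # illustrative initial error

def J2(k):
    eps, cost = eps0, 0.0
    for _ in range(int(T / dt)):
        cost += dt * (R * (k * eps)**2 + Q2 * eps**2)
        eps += dt * (-k * eps)
    return cost

k_nash = np.sqrt(10.0) / R                    # gain of tau_2* in (12)
for k in (0.2, k_nash, 0.4, 0.6):
    print(f"k = {k:.3f}: J2 = {J2(k):.3f}")   # J2 is smallest at k = k_nash
```

Here the Nash gain $R_2^{-1}P_2 = 1/\sqrt{10} \approx 0.316$ indeed yields the smallest cost, consistent with Fig. 5.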
V. CONCLUSION

In this brief, a distributed consensus protocol for high-order MASs has been presented by using the optimization method in the sense of graphical games, and general digraphs that only contain a directed spanning tree are allowed. A modified cost function that describes the differential graphical game is proposed, and it has been proven that the weighting matrices are positive semidefinite. The leaderless consensus behavior and optimization in the sense of Nash equilibrium can be guaranteed by the presented protocol. Also, the design and implementation of the protocol are independent of the global information of the topology, which means that the protocol can be managed in a fully distributed way. Finally, a numerical example is provided, and it is shown that the consensus and Nash equilibrium can be achieved.

REFERENCES

[1] W. Ren and R. W. Beard, Distributed Consensus in Multi-Vehicle Cooperative Control: Theory and Applications. London, U.K.: Springer, 2008.
[2] Y. Zheng and L. Wang, "Consensus of switched multiagent systems," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 63, no. 3, pp. 314–318, Mar. 2016.
[3] Y. Ren, Q. Wang, and Z. Duan, "Optimal distributed leader-following consensus of linear multi-agent systems: A dynamic average consensus-based approach," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 3, pp. 1208–1212, Mar. 2022.
[4] Z. Zhang, W. Yan, and H. Li, "Distributed optimal control for linear multiagent systems on general digraphs," IEEE Trans. Autom. Control, vol. 66, no. 1, pp. 322–328, Jan. 2021.
[5] Z. Zhang, Y. Shi, S. Zhang, Z. Zhang, and W. Yan, "Robust cooperative optimal sliding-mode control for high-order nonlinear systems: Directed topologies," IEEE Trans. Cybern., vol. 52, no. 6, pp. 5535–5547, Jun. 2022.
[6] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory. Philadelphia, PA, USA: SIAM, 1988.
[7] F. Salehisadaghiani and L. Pavel, "Distributed Nash equilibrium seeking in networked graphical games," Automatica, vol. 87, pp. 17–24, Jan. 2018.
[8] R. Kamalapurkar, J. R. Klotz, P. Walters, and W. E. Dixon, "Model-based reinforcement learning in differential graphical games," IEEE Trans. Control Netw. Syst., vol. 5, no. 1, pp. 423–433, Mar. 2018.
[9] V. G. Lopez, F. L. Lewis, Y. Wan, M. Liu, G. Hewer, and K. Estabridis, "Stability and robustness analysis of minmax solutions for differential graphical games," Automatica, vol. 121, Nov. 2020, Art. no. 109177.
[10] B. Lian, F. L. Lewis, G. A. Hewer, K. Estabridis, and T. Chai, "Online learning of minmax solutions for distributed estimation and tracking control of sensor networks in graphical games," IEEE Trans. Control Netw. Syst., vol. 9, no. 4, pp. 1923–1936, Dec. 2022.
[11] B. Peng, A. Stancu, S. Dang, and Z. Ding, "Differential graphical games for constrained autonomous vehicles based on viability theory," IEEE Trans. Cybern., vol. 52, no. 9, pp. 8897–8910, Sep. 2022.
[12] S. Tan, Y. Wang, and A. V. Vasilakos, "Distributed population dynamics for searching generalized Nash equilibria of population games with graphical strategy interactions," IEEE Trans. Syst., Man, Cybern., Syst., vol. 52, no. 5, pp. 3263–3272, May 2022.
[13] Y. Y. Qian, M. Liu, Y. Wan, F. L. Lewis, and A. Davoudi, "Distributed adaptive Nash equilibrium solution for differential graphical games," IEEE Trans. Cybern., vol. 53, no. 4, pp. 2275–2287, Apr. 2023.
[14] F. A. Yaghmaie, K. H. Movric, F. L. Lewis, and R. Su, "Differential graphical games for H∞ control of linear heterogeneous multiagent systems," Int. J. Robust Nonlinear Control, vol. 29, no. 10, pp. 2995–3013, 2019.
[15] M. Liu, Y. Wan, V. G. Lopez, F. L. Lewis, G. A. Hewer, and K. Estabridis, "Differential graphical game with distributed global Nash solution," IEEE Trans. Control Netw. Syst., vol. 8, no. 3, pp. 1371–1382, Sep. 2021.
[16] K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, "Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality," Automatica, vol. 48, no. 8, pp. 1598–1611, 2012.
[17] Z. Li and Z. Ding, "Distributed Nash equilibrium searching via fixed-time consensus-based algorithms," in Proc. Amer. Control Conf. (ACC), Philadelphia, PA, USA, 2019, pp. 2765–2770.

