Abstract—In this article, we investigate spectrum allocation in a vehicle-to-everything (V2X) network. We first express the V2X network as a graph, where each vehicle-to-vehicle (V2V) link is a node. We apply a graph neural network (GNN) to learn a low-dimensional feature of each node based on the graph information. According to the learned features, multi-agent reinforcement learning (RL) is used to make spectrum allocation decisions. A deep Q-network is utilized to learn to optimize the sum capacity of the V2X network. Simulation results show that the proposed allocation scheme can achieve near-optimal performance.

Index Terms—Vehicular Communications, Multi-agent RL, GNN, Resource Allocation

This work was supported in part by the National Science Foundation under Grants 1815637 and 1731017.

I. INTRODUCTION

Recently, vehicle-to-everything (V2X) networks have attracted numerous studies with the aspiration to make our experience on wheels safer, greener, and more convenient [1], [2]. In V2X networks, vehicles, pedestrians, and other entities on the road are connected and coordinated to provide a whole new collection of applications, ranging from improving road safety to mitigating traffic congestion. Vehicular communication is also supported by the 3rd generation partnership project (3GPP) in the fifth-generation communication system (5G) [3], which will further push the development and deployment of V2X communication systems. Despite its potential to exert a significant impact on daily human life, there exist significant challenges in V2X networks, such as the lack of quality-of-service (QoS) guarantees and time-varying channels and network configurations [4]. Judicious spectrum allocation is thus needed in vehicular communication networks to deal with environment dynamics and to guarantee QoS. The resource management problem is often formulated as one of combinatorial optimization, which is generally NP-hard and lacks low-complexity, effective, universal solutions.

To tackle such issues, some works focus on obtaining a sub-optimal solution to reduce the complexity. In [5], the reliability requirements of vehicle-to-vehicle (V2V) communications are transformed into optimization constraints that use only the slowly-varying channel information, making the problem computable. Similarly, a spectrum and power allocation scheme is proposed in [6] to maximize the ergodic capacity of the vehicle-to-infrastructure (V2I) links, requiring only slowly varying large-scale fading information. In [7], graph partitioning algorithms are exploited to divide highly interfering V2V links into disjoint spectrum-sharing clusters before formulating the spectrum sharing problem, which reduces the computational complexity and network signaling overhead.

In recent years, machine learning, especially reinforcement learning (RL), has shown its power in addressing various engineering problems, including resource allocation in communications [8], [9]. The work in [10] shows that the deep RL framework can address resource management problems and is comparable to, and sometimes better than, heuristic-based approaches for a multi-resource cluster scheduling problem. In [11], a deep RL approach is developed to dynamically manage networking, caching, and computing resources. A distributed spectrum sharing scheme based on multi-agent RL is proposed in [12] to enhance the sum capacity of V2I links and the payload delivery rate of V2V links. In [13], each vehicle is considered as an agent and multiple agents are employed to sequentially make decisions to find available spectrum based on their local observations in V2V broadcast communications. In [14], a base station (BS) is used to aggregate and compress observations of vehicles, and the compressed information is then fed back to each agent to help improve the distributed decision performance for spectrum sharing in V2X networks.

Apart from the RL-based optimization methods, several works propose to formulate the optimization problem within a graph model framework and to solve it with graph embedding methods. The critical component is the graph neural network (GNN) [15], which is developed to capture the dependence of graphs via message passing between the nodes of graphs. Different from standard neural networks, a GNN is used to extract the feature of a node from its neighbors and the graph topology. Recent studies have shown that GNNs have attained success in network architectures, optimization techniques, and parallel computation [16]. GNNs have been combined with traditional heuristic algorithms in [17] to solve classic NP-hard optimization problems, such as Minimum Vertex Cover and Maximal Independent Set, which are formulated over weighted graphs. In [18], a device-to-device (D2D) network is expressed as a graph, where D2D pairs are nodes and interference links are edges. A GNN is applied to extract the feature for each node, which is used to
Fig. 1: (a) The structure of vehicular networks. (b) The graph representation of the vehicular network shown in (a).
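As Fig. 1(b) suggests, the network is modeled as a graph whose nodes are V2V links. The sketch below is a minimal illustration of assembling such a representation; the function name build_v2v_graph is ours, and treating the edge weights e_lk as pairwise interference gains is our assumption, since the system-model section is not reproduced above.

```python
import numpy as np

def build_v2v_graph(num_v2v, interference_gain):
    """Illustrative graph construction: each V2V pair is a node, and the
    directed edge weight e[l, k] is taken here to capture the interference
    that link l inflicts on link k (an assumption for this sketch)."""
    node_obs = np.zeros((num_v2v, 2))                  # placeholder local observations x_k
    edge_weights = np.array(interference_gain, float)  # edge weights e_{lk}
    np.fill_diagonal(edge_weights, 0.0)                # no self-loop edges
    # Incoming neighbors N(k): nodes l with a nonzero edge pointing into k.
    neighbors = [np.nonzero(edge_weights[:, k])[0] for k in range(num_v2v)]
    return node_obs, edge_weights, neighbors

# Example: a 4-link network with random pairwise gains.
rng = np.random.default_rng(0)
x, e, nbrs = build_v2v_graph(4, rng.random((4, 4)))
```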
After expressing the network as a graph, we use a GNN to extract node features. The concept of the GNN was first introduced in [15], which aims at learning a feature embedding µ_v ∈ R^{n_s} of each node, containing the information of adjacent nodes and edges, where n_s is the dimension of the feature embedding. The i-th iteration of µ_v is described as

\mu_v^i = f\big(x_v, x_{ne}[v], \{e_{uv}\}_{u \in N(v)}, \mu_v^{i-1}, \{\mu_u^{i-1}\}_{u \in N(v)}\big), \quad (5)

where x_{ne}[v] denotes the observation of node v's incoming neighbors, N(v) represents the set of node v's incoming neighbors, i.e., nodes that have a link to node v, and f(·) is the updating function we need to design.

Motivated by the popular GNN framework in [19], the updating function for the proposed GNN is derived by

\mu_v^i = \sigma\Big( W_v^i \Big[\, x_v \,\big\|\, \mu_v^{i-1} \,\big\|\, \sum_{u \in N(v)} e_{uv} \,\big\|\, \sum_{u \in N(v)} \mu_u^{i-1} \Big] \Big), \quad (6)

where ·||· denotes vector concatenation and W_v^i is the trainable weight of node v for the i-th iteration. The rectified linear unit (ReLU) is adopted as the activation function of the GNN, σ(x) = max(0, x).

In general, µ_v^0 = 0 at the beginning of feature embedding. To reduce the network signaling overhead, only the feature embeddings {µ_v^i | v ∈ V} are exchanged among adjacent nodes at the i-th iteration, where V = K is the set of nodes corresponding to V2V pairs. After I iterations, the extracted features of the nodes are attained as {µ_v^I}_{v∈V}.
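To make the update in (6) concrete, the sketch below runs the I-iteration embedding computation with plain NumPy. It is only a minimal reading of (6): the observation and embedding dimensions, the weight shapes, and the names gnn_embed and relu are our assumptions rather than the paper's implementation.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def gnn_embed(x, e, neighbors, W, num_iter):
    """One reading of (6): concatenate [x_v || mu_v^{i-1} || sum of incoming
    edge weights || sum of incoming neighbor embeddings], multiply by the
    per-node trainable weight W[i][v], and apply ReLU."""
    K = x.shape[0]
    n_s = W[0][0].shape[0]                     # embedding dimension n_s
    mu = np.zeros((K, n_s))                    # mu_v^0 = 0
    for i in range(num_iter):
        new_mu = np.zeros_like(mu)
        for v in range(K):
            nbr = list(neighbors[v])
            e_sum = np.array([e[nbr, v].sum()]) if nbr else np.zeros(1)
            mu_sum = mu[nbr].sum(axis=0) if nbr else np.zeros(n_s)
            z = np.concatenate([x[v], mu[v], e_sum, mu_sum])
            new_mu[v] = relu(W[i][v] @ z)      # Eq. (6)
        mu = new_mu
    return mu                                  # {mu_v^I}

# Toy usage with K = 4 nodes, 2-dim observations, n_s = 8, and I = 2 iterations.
rng = np.random.default_rng(1)
K, d_x, n_s, I = 4, 2, 8, 2
x = rng.standard_normal((K, d_x))
e = rng.random((K, K)); np.fill_diagonal(e, 0.0)
neighbors = [np.nonzero(e[:, k])[0] for k in range(K)]
W = [[0.1 * rng.standard_normal((n_s, d_x + n_s + 1 + n_s)) for _ in range(K)]
     for _ in range(I)]
mu_I = gnn_embed(x, e, neighbors, W, I)
```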
B. Distributed Deep Q-Network Design

After extracting the features of V2V pairs from the GNN, a Q-network is developed to select the spectrum for each V2V pair, which is treated as an agent in the RL framework. For the k-th agent at time step t, the state is defined as s_k^(t) = {x_k^(t), µ_k^{I,(t)}}, and the action is given by a_k^(t) = ρ_k^(t), where ρ_k^(t) = {ρ_k^{1,(t)}, ρ_k^{2,(t)}, ..., ρ_k^{N,(t)}}. The reward is designed as R^(t+1) in (3) for all agents globally, i.e., r_k^(t+1) = R^(t+1), ∀k ∈ K. The superscript (t) represents the information obtained at time step t. The whole structure of the GNN-RL scheme is shown in Fig. 2.

Algorithm 1 Training Process for the Proposed Framework
Input: GNN structure, Q-network structure for each V2V, and the environment simulator
Output: GNN and allocation policy π_k represented by Q-network Q_k with parameters θ_k, for all k ∈ K
Initialisation: Initialize GNN and all Q-network models
1: for each training episode do
2:   Start simulator and generate vehicles and links.
3:   for time-step t = 0, ..., T − 1 do
4:     Observe the graph information denoted by O^(t), including node observations x_k^(t), ∀k ∈ K, and edge weights e_lk^(t), ∀l, k ∈ K
5:     Each V2V utilizes the proposed GNN in (6) and, after I iterations, extracts its feature µ_k^{I,(t)}
6:     Each V2V takes s_k^(t) = {x_k^(t), µ_k^{I,(t)}} and selects action a_k^(t) based on state s_k^(t), according to the policy π_k derived from Q_k, e.g., the ϵ-greedy policy [20]
7:     All V2V pairs receive reward R^(t+1)
8:     Channels update and the new graph information of the next time step, O^(t+1), is obtained
9:     Store {O^(t), a^(t), R^(t+1), O^(t+1)} in memory M
10:    Sample mini-batch D from M uniformly
11:    for each V2V agent k do
12:      Use D to train the GNN with parameters W and the k-th Q-network with parameters θ_k jointly by minimizing the mean square error (MSE) (7) between the estimated return and the Q-value
13:      Update the k-th target Q-network: θ_k^− ← θ_k every N_target time steps
14:    end for
15:    Update the target GNN: W^− ← W every N_target time steps
16:  end for
17: end for
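Steps 5 and 6 of Algorithm 1 amount to scoring the GNN-extracted state with the agent's Q-network and acting ϵ-greedily. The following sketch illustrates this for one agent; the linear Q-network stand-in and the name select_action are our simplifications, not the paper's architecture.

```python
import numpy as np

def select_action(x_k, mu_k, theta_k, epsilon, rng):
    """Epsilon-greedy selection over spectrum choices (Algorithm 1, steps 5-6):
    the state s_k^(t) = {x_k^(t), mu_k^{I,(t)}} is scored by the k-th Q-network,
    approximated here by a single linear layer theta_k for brevity."""
    state = np.concatenate([x_k, mu_k])        # s_k^(t)
    q_values = theta_k @ state                 # one Q-value per spectrum sub-band
    if rng.random() < epsilon:
        return int(rng.integers(q_values.shape[0]))   # explore
    return int(np.argmax(q_values))                   # exploit

rng = np.random.default_rng(2)
x_k = rng.standard_normal(2)                   # local observation x_k^(t)
mu_k = rng.standard_normal(8)                  # GNN feature mu_k^{I,(t)}
theta_k = 0.1 * rng.standard_normal((4, 10))   # 4 spectrum choices, 10-dim state
a_k = select_action(x_k, mu_k, theta_k, epsilon=0.1, rng=rng)
```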
Here, a^(t) = {a_1^(t), ..., a_K^(t)} and the MSE for the k-th agent is defined as

\sum_{D} \Big[ R^{(t+1)} + \gamma \max_{a'} Q_k\big(s_k^{(t+1)}(W^-, O^{(t+1)}), a'; \theta_k^-\big) - Q_k\big(s_k^{(t)}(W, O^{(t)}), a_k^{(t)}; \theta_k\big) \Big]^2, \quad (7)

where γ is the discount parameter. The notation s_k^(t)(W, O^(t)) indicates that the state is derived by the GNN with parameters W = {W_k^i | ∀k ∈ K, i = 1, ..., I}, which implies that the parameters W and θ_k can be trained simultaneously by minimizing (7).
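As a rough illustration of (7), the sketch below computes the temporal-difference error over a mini-batch for one agent, again with a linear Q-network stand-in. The full gradient path through the GNN used in the joint training of Algorithm 1 is not reproduced, and the function name td_mse is ours.

```python
import numpy as np

def td_mse(batch, theta_k, theta_k_target, gamma):
    """Mean square error of (7) for one agent: the target combines the global
    reward R^(t+1) with the discounted max Q-value of the target network; the
    states here stand in for s_k(W, O) already produced by the GNN."""
    errors = []
    for state, action, reward, next_state in batch:
        target = reward + gamma * np.max(theta_k_target @ next_state)
        td_error = target - (theta_k @ state)[action]
        errors.append(td_error ** 2)
    return float(np.mean(errors))

# Toy mini-batch D of (s, a, R, s') transitions: 10-dim states, 4 actions.
rng = np.random.default_rng(3)
batch = [(rng.standard_normal(10), int(rng.integers(4)),
          float(rng.random()), rng.standard_normal(10)) for _ in range(8)]
theta_k = 0.1 * rng.standard_normal((4, 10))
theta_k_target = theta_k.copy()               # target network theta_k^-
loss = td_mse(batch, theta_k, theta_k_target, gamma=0.95)
```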
IV. SIMULATION RESULTS

Fig. 4: The return comparison among GNN-RL scheme, random scheme and optimal scheme with N = K = 4.

Fig. 5: Average normalized return against the dimension of the extracted features n_s.
V. CONCLUSION

We present a GNN-augmented RL spectrum allocation scheme for vehicular networks. The GNN extracts the feature of each V2V pair according to the network topology and neighbor information.
REFERENCES

[1] L. Liang, H. Peng, G. Y. Li, and X. Shen, "Vehicular communications: A physical layer perspective," IEEE Trans. Veh. Technol., vol. 66, no. 12, pp. 10647–10659, Dec. 2017.
[2] H. Peng, L. Liang, X. Shen, and G. Y. Li, "Vehicular communications: A network layer perspective," IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1064–1078, Feb. 2018.
[3] "Technical Specification Group Radio Access Network: Study on LTE-based V2X Services (Release 14)," 3GPP, TR 36.885 V14.0.0, Jun. 2016.
[4] G. Araniti, C. Campolo, M. Condoluci, A. Iera, and A. Molinaro, "LTE for vehicular networking: A survey," IEEE Commun. Mag., vol. 51, no. 5, pp. 148–157, May 2013.
[5] W. Sun, E. G. Ström, F. Brännström, K. C. Sou, and Y. Sui, "Radio resource management for D2D-based V2V communication," IEEE Trans. Veh. Technol., vol. 65, no. 8, pp. 6636–6650, Aug. 2015.
[6] L. Liang, G. Y. Li, and W. Xu, "Resource allocation for D2D-enabled vehicular communications," IEEE Trans. Commun., vol. 65, no. 7, pp.