Authorized licensed use limited to: Zhejiang University of Technology. Downloaded on April 09,2023 at 13:29:25 UTC from IEEE Xplore. Restrictions apply.
15216 IEEE INTERNET OF THINGS JOURNAL, VOL. 9, NO. 16, 15 AUGUST 2022
have the customization function for the corresponding service category that is required by users but also the ability to accommodate uncertain traffic demands [12], [13]. In order to guarantee performance isolation and maximize resource utilization (RU) under uncertain traffic needs, it is necessary to reconfigure network slices adaptively and intelligently.

With the increasing progress of artificial intelligence technology, machine learning (ML) has been widely applied in wireless communications [14], [15]. Compared with numerical optimization methods, ML shows a stronger ability to solve complex problems, which makes it more suitable for the dynamic features of IoV [13]. Moreover, reinforcement learning (RL), an important branch of ML, learns optimal decisions by maximizing the long-term cumulative reward. Because they interact directly with an unknown environment, RL methods are particularly appropriate for decision making in such a changing environment, tackling its uncertainty and dynamics.

Despite the potential and advantages of network slicing, adapting network slicing to IoV remains challenging. IoV slicing is easily influenced by the inherent variation of the vehicular environment. This requires the slicing system to make resource allocation decisions with only partial system status information, while ensuring the validity of the obtained solution over the entire operating period. In addition, as the IoV becomes progressively more dynamic, heterogeneous, and large scale, real-time tracing and explicit modeling of the network environment become gradually more difficult, which makes traditional mathematical methods intractable [16]. On the other hand, because of the high mobility of vehicles, allocating resources only according to real-time service requests cannot guarantee vehicles stable service performance within a specified time range.

To deal with these problems, this article proposes the long short-term memory-based deep deterministic policy gradient algorithm (LSTM-DDPG) to cater to the diversified QoS requirements of vehicles. An early version of this article was published in a conference [17], and the current manuscript is an extensive extension of that paper. The LSTM is used to track the long-term features of request changes and performs long-timescale resource allocation. Furthermore, to handle the susceptibility of the IoV to the varying environment, we use the RL DDPG algorithm to perform more fine-grained resource allocation by adjusting the resources allocated to each slice, such that vehicle requirements can be satisfied within a time interval. Simulation results show that the proposed orchestration is able to offer stable QoS to vehicles with a stability probability greater than 92%.

The main contributions of this article can be summarized as follows.

1) We analyze the stability performance of the slice by considering dynamic requests and vehicle movement. After that, the resource allocation problem is formulated and decoupled into two subproblems, i.e., the long-timescale resource allocation problem and the short-timescale resource scheduling problem.

2) We propose a QoS guaranteed network slicing orchestration, i.e., LSTM-DDPG, in which deep learning and RL collaboratively allocate resources. Specifically, deep learning, i.e., the LSTM, is employed to allocate dedicated resources, while the RL DDPG performs online resource scheduling that can calibrate the offset between real-time environment changes and historical knowledge.

3) Compared with the benchmarks, we demonstrate the effectiveness of the proposed QoS guaranteed resource allocation orchestration for the targeted IoV slicing problem.

The remainder of this article is organized as follows. Section II presents the related work. Section III describes the system model, which includes the network model and the slicing model. Section IV formulates the resource allocation problem and then decomposes it into two subproblems. In Section V, we introduce the proposed QoS guaranteed network slicing orchestration, LSTM-DDPG; the LSTM algorithm predicts the required resources on a long timescale, and the RL DDPG algorithm adjusts slice resources dynamically on a short timescale. In Section VI, we show simulation results that evaluate the performance of the proposed LSTM-DDPG, followed by a brief summary in Section VII.

II. RELATED WORK

Network slicing is categorized into two camps: 1) core network slicing and 2) radio access network (RAN) slicing. The former is in charge of configuring network nodes, links, and network topologies, while the latter is mainly responsible for wireless resource management [18]. Core network slicing has been investigated in a vast amount of literature [16], [19]–[21]. Cheng et al. [16] presented a two-stage slicing optimization model: they first predicted link traffic to decide a feasible long-term slicing deployment strategy, and then applied Lyapunov stability theory in an online optimization that calibrates the misloading. Limited by the capacity of links and nodes in the physical infrastructure, a mixed binary linear programming problem was formulated in [19] to minimize total link traffic, and two heuristic algorithms were then proposed. Analogously, Wei et al. [20] elaborated on the reconfiguration problem in core network slicing, using a branching dueling Q-network, which compresses the action space, to reduce the configuration cost. Guan et al. [21] introduced a multilayer complex network model to aid the analysis of slice deployment, and used this model to study the traffic performance of slices. However, from the perspective of network slicing goals, it is insufficient to consider only the status of nodes and links in the core network while ignoring the RAN, especially in the complex IoV. For instance, the ultrareliable and low-latency transmission of safety-related services with different QoS requirements is still a challenge for IoV [22]. Hence, RAN slicing urgently needs to be studied.

In the field of RAN slicing, [23]–[30] have conducted detailed studies. The network slicing request problem was investigated in [23], where the authors modeled it as a Markov decision process (MDP) and the RL method was used for
CUI et al.: QoS GUARANTEED NETWORK SLICING ORCHESTRATION FOR INTERNET OF VEHICLES 15217
Especially, we use n_v to denote the cardinality of set U_v, i.e., |U_v| = n_v. Furthermore, n^t = [n_1^t, ..., n_v^t, ..., n_V^t] is used to denote the traffic requirements of all slices at timeslot t. Without loss of generality, n_v^t = 0 when there is no traffic need for slice v at timeslot t.

According to the SLA of the slice, each slice should give an assurance of the QoS requirement for the served vehicles. Based on the characteristics of the IoV, e.g., the strict delay constraint [37], we define the QoS provided by a slice as a minimum transmission-rate threshold R_v, v ∈ V for vehicles, which means that the rate of vehicles on slice v is no less than the threshold R_v. This can be expressed as

r_u ≥ R_v, ∀u ∈ U_v, v ∈ V    (1)

where r_u is the transmission rate from vehicle u to the BS. Moreover, because of the indeterminacy of the traffic load, which varies over time, it is imperative to reserve resources deliberately for slices from the resource pool. Both tracking the request changes over a long timescale and reducing the reconfiguration overhead are accomplished by reservation [38]. So, the total resources are divided into the shared resource and the dedicated resources, i.e., W = W^s + Σ_{v∈V} W_v^d, where W^s is the shared resource and W_v^d is the dedicated resource allocated to slice v. The specific determination and calculation process will be explained in Section IV.

IV. PROBLEM FORMULATION

In this section, we analyze the stability performance of the service provided by slices to vehicles. Following this analysis, the resource allocation problem is formulated and then decomposed into two subproblems, that is, the resource allocation problem on a long timescale and the resource scheduling problem on a short timescale.

A. Resource Allocation Problem

When allocating resources to slices, it is essential to take the system state U_v^t into account, where U_v^t expresses the system state at timeslot t. Under the current U_v^t, the resource allocated to slice v is denoted by h_v^t. In order to make the instant transmission rate satisfy (1), the following inequality should hold:

h_v^t ≥ h̃_v^t = Σ_{u∈U_v^t} R_v / log2(1 + SINR_ub),  v ∈ V    (2)

where h̃_v^t indicates the minimum amount of resources required by slice v to satisfy the QoS of its vehicles, and SINR_ub = P_u G_ub / (I + σ²) is the received signal-to-interference-plus-noise ratio (SINR) of the V2I link of vehicle u, in which P_u is the vehicle transmit power, G_ub denotes the channel gain of vehicle u considering path loss, shadowing, and fast fading, I is the received interference power, and σ² is the noise power. In a similar way, the resources allocated to slice v at timeslot t + 1 can be expressed as h_v^{t+1}.

The system is supposed to allocate resources flexibly according to the slice requirements, enabling adaptation to the dynamic environment so as to avert the negative impact caused by environmental changes on the performance of slices. Furthermore, the stringent performance isolation of slices ensures that the performance for vehicles does not degrade below the QoS the SLA provides, before and after resource allocation. The system state U_v, v ∈ V may vary before and after resource allocation on account of the vehicles' mobility and the stochastic nature of requests. We denote the current state and next state as U_v^t and U_v^{t+1}, respectively. The transmission rate of vehicle u ∈ U_v^t, v ∈ V is given by

r_u^t = ω_u log2(1 + SINR_ub)    (3)

where ω_u = h_v^t / n_v^t is the spectrum resource obtained by vehicle u. Similarly, we can obtain the virtual rate of vehicle r̂_u^{t+1}, u ∈ U_v^{t+1}, according to the load at timeslot t + 1, which is defined as r̂_u^{t+1} = ω̂_u log2(1 + SINR_ub), where ω̂_u is determined by the resources allocated at timeslot t. Notice that the resource used for calculating the virtual rate r̂_u^{t+1} is h_v^t instead of h_v^{t+1}, which would be based on the requirement at timeslot t + 1, i.e., ω̂_u = h_v^t / n_v^{t+1}. The main idea behind this definition is to judge whether the last allocated resources stabilize the communication rate of the vehicle, that is to say, whether the vehicle rate at timeslot t + 1 still meets (1) under the last allocated resources. Thus, the following constraint, under which a slice can provide stable service between timeslots t and t + 1, should be satisfied:

( Σ_{u∈U_v^{t+1}} r̂_u^{t+1} ) / n_v^{t+1} ≥ R_v.    (4)

Obviously, the slice can guarantee the QoS of vehicles when the system allocates resources superfluously, but this causes serious waste. So, we also need to reduce the cost of allocating resources. Considering the depletion over the long term t ∈ T = {0, 1, 2, ..., T, T + 1, T + 2, ..., KT}, the slice resource allocation can be formulated as

P(1): min Σ_{t∈T} Σ_{v∈V} h_v^t
s.t. C1: Σ_{v∈V} h_v^t ≤ W, ∀t ∈ T
     C2: r_u^t ≥ R_v, ∀u ∈ U_v^t, v ∈ V
     C3: ( Σ_{u∈U_v^{t+1}} r̂_u^{t+1} ) / n_v^{t+1} ≥ R_v, ∀v ∈ V    (5)

where h_v^t is the optimization variable. Constraint C1 states that the sum of resources allocated to slices is bounded by the total resources. The QoS requirement of each slice emerges from constraint C2. Constraint C3 indicates that the allocated resources should keep the vehicles' transmission rates stable during the interval. In C3, the variables r̂_u^{t+1} and n_v^{t+1} are obtained according to the information of the vehicles connected to slice v at the next timeslot. The number of requests and the positions of vehicles in slice v may change with time, so this constraint reflects the change of service requests and vehicle mobility.

B. Problem Decomposition

To track the dynamic property of requests and satisfy vehicles' requirements, motivated by [29], we accomplish the resource allocation process with two decisions, i.e., resource allocation on a long timescale and resource scheduling on a short timescale.
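As a concrete illustration of (2)–(4), the minimum slice resource and the stability check can be sketched in a few lines of Python. This is a toy numerical sketch, not the paper's implementation: the rate threshold and SINR values below are invented for demonstration.

```python
import math

def min_slice_resource(rate_threshold, sinrs):
    """Minimum resource h~_v^t so each vehicle meets R_v, per (2)."""
    return sum(rate_threshold / math.log2(1.0 + s) for s in sinrs)

def is_stable(h_prev, rate_threshold, next_sinrs):
    """Stability constraint (4): split last slot's allocation h_prev over the
    n_v^{t+1} next-slot requests and test the mean virtual rate against R_v."""
    n_next = len(next_sinrs)
    omega = h_prev / n_next  # per-vehicle spectrum share, omega_u
    mean_rate = sum(omega * math.log2(1.0 + s) for s in next_sinrs) / n_next
    return mean_rate >= rate_threshold

# Two vehicles at SINR 3 each need R_v = 2: h~ = 2 * (2 / log2(4)) = 2.0,
# and that allocation keeps the slice stable if the load stays the same.
h_min = min_slice_resource(2.0, [3.0, 3.0])
print(h_min, is_stable(h_min, 2.0, [3.0, 3.0]))
```

The same check fails once the load grows or channels worsen while the old allocation h_v^t is kept, which is exactly the offset the short-timescale scheduler is meant to correct.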
Fig. 3. Internal structure of LSTM cell.

and the higher the value, the greater the influence of future rewards on the current action selection. The policy π is usually a mapping from states to a probability distribution over actions, i.e., under a certain state s, the agent takes an action based on the policy π. Furthermore, the purpose of learning is to search for the best policy π* that maximizes the expected cumulative return from any initial state, given by

π* = arg max_π E(R | π).    (13)
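The cumulative return that π* maximizes can be made concrete with a toy discounted-return calculation; the reward sequence and discount factor here are arbitrary illustrative values, not from the paper.

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted return R = sum_t gamma^t * r_t; the closer
    gamma is to 1, the more future rewards influence the current choice."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Same rewards, different horizons: gamma = 0.5 nearly ignores step 3 onward,
# while gamma = 0.99 weights all steps almost equally.
print(discounted_return([1.0, 1.0, 1.0], 0.5))   # 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1.0, 1.0, 1.0], 0.99))
```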
fine-grained resource scheduling with a significantly reduced complexity [48].

Distinct from value-based RL, the DDPG integrates the policy gradient. The policy π can be represented as [49]

π(a, s) ≈ π(a | s; θ) = P(a | s; θ).    (15)

Because the DDPG is a deterministic policy algorithm, that is, in the same state the policy is determined by the action with the maximum probability, we rewrite the policy π as

π(s; θ) = a.    (16)

In the following, we describe the state, action, and reward in detail.

1) State: For the resource scheduling problem, the real-time link situation is not suitable as the environmental state, due to the variation of network load caused by random requests. The uncertain number of links would cause dimension mismatch in the input if the real-time links were selected as the state. Thus, we define the state as

s_t = [Ŵ_1^{k,d}, Ŵ_2^{k,d}, ..., Ŵ_V^{k,d}, h̃_1^t, h̃_2^t, ..., h̃_V^t]    (17)

where Ŵ_v^{k,d} is the resource requirement predicted by the LSTM, and h̃_v^t is the resource requirement of slice v.

2) Action: The action is defined as the resources scheduled for each slice

a_t = [W_1^{k,t}, W_2^{k,t}, ..., W_V^{k,t}].    (18)

3) Reward: The reward, an important part of the RL problem, should be designed elaborately to lead the agent to find the optimal policy, and it reflects the optimization goal of our problem. Because the resource scheduling process occurs at each STI in an ATI, for notational convenience we omit the superscript k in the following formulas. As shown in problem P(3), the objective is to minimize the resource cost while keeping the transmission rates of the served vehicles stable. Considering the constraint conditions, the reward function is defined as

r_t = Σ_{v∈V} φ(h_v^t) + Σ_{v∈V} β_v · I(h_v^t) + δ(h^t)    (19)

where

φ(h_v^t) = { ln(h̃_v^t / h_v^t), if h_v^t ≥ h̃_v^t;  ln(h_v^t / h̃_v^t), if h_v^t < h̃_v^t }    (20)

δ(h^t) = { −∞, if Σ_{v∈V} h_v^t > W;  0, otherwise }    (21)

where we have used C4 in (7), and h^t = [h_1^t, h_2^t, ..., h_V^t]. The first term on the right-hand side of (19) is decided by the instantly needed resources and the allocated resources. As shown in (20), the agent gets a negative reward if superfluous resources are allocated to a slice, i.e., h_v^t > h̃_v^t. In this case, the more resources allocated, the smaller the reward acquired. This means that the agent does not allocate largely redundant resources to the slice. Inversely, when the resources of a slice are insufficient, i.e., h_v^t < h̃_v^t, the agent also gets a negative reward, but it gets a greater reward if the slice receives more resources. So, to maximize the reward of this term, whose maximum value is zero, the agent tends to allocate resources that exactly meet the needs of the slice.

Then, we explain the second term on the right-hand side of (19). I(h_v^t) is an indicator function: when the resource of the slice satisfies constraint C3, i.e., (Σ_{u∈U_v^{t+1}} r̂_u^{t+1}) / n_v^{t+1} ≥ R_v, its value is equal to 1; otherwise, it is equal to 0. This function characterizes the stability of the slice. When slice v can guarantee the stability of the link rates of the served vehicles in the STI, the agent gets a positive reward β_v. Meanwhile, β = [β_1, β_2, ..., β_V] is the weight of each slice, and the stable-performance requirements of different slices can be safeguarded by adjusting β.

The last term on the right-hand side of (19) is a Dirac impulse function. The reward will be negative infinity if the allocated resources overload the budget. Thus, massively impractical slicing deployments can be avoided.

Finally, the DDPG network is structured. The DDPG consists of two networks, namely, the evaluation network and the target network. As shown in Fig. 5, each of the two networks can be further divided into two subnetworks, i.e., the evaluation actor network, target actor network, evaluation critic network, and target critic network, whose neural network parameters are represented by θ, θ′, ω, and ω′, respectively. Moreover, to speed up convergence, we also use an experience replay memory. At each time step t, the evaluation actor network chooses an action according to the state s_t and the present parameters θ. Then, it observes the next state s_{t+1} while acquiring the reward r_t. Expressly, to encourage exploration, we represent the action as a Gaussian probability distribution

a_t ∼ N(π(s_t; θ), σ²)    (22)

where π(s_t; θ) is the output of the evaluation actor network, serving as the mean of the probability distribution, and σ is the standard deviation.

The agent saves the transition tuple (s_t, a_t, r_t, s_{t+1}) to the replay memory to train the networks. In training, the agent samples a minibatch at every iteration. The evaluation critic network is mainly responsible for calculating the Q-value according to the sampled transitions. The target actor network selects the action a′_{t+1} from s_{t+1}; it is worth noting that this action is only used for updating the network and will not be executed by the agent. The target critic network is used for forecasting the Q-value, which is formulated by

Q′(s_t, a_t; ω′) = r_t + γ Q′(s_{t+1}, a′_{t+1}; ω′).    (23)

Next, the networks are updated. Similar to DQN, the loss function of the evaluation critic network is the mean square error of the Q-value, which can be given by

J(ω) = (1/K) Σ_{i=1}^{K} ( Q′(s_i, a_i; ω′) − Q(s_i, a_i; ω) )².    (24)
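A minimal Python sketch of the reward design in (19)–(21) follows. The allocations, weights, and budget are invented for illustration, and the boolean `stable` stands in for the indicator I(·) evaluated from constraint C3.

```python
import math

def reward(h, h_need, beta, stable, W_total):
    """Sketch of (19)-(21): phi penalizes any mismatch between the allocated
    h_v and the needed h~_v (peaking at 0 when they match), beta_v is added
    for each slice whose stability indicator is 1, and allocations exceeding
    the total budget W are rejected with -inf (the Dirac term)."""
    if sum(h) > W_total:  # delta term (21): infeasible deployment
        return float("-inf")
    r = 0.0
    for hv, need, b, ok in zip(h, h_need, beta, stable):
        # phi term (20): ln(h~/h) if over-allocated, ln(h/h~) if short.
        r += math.log(need / hv) if hv >= need else math.log(hv / need)
        r += b if ok else 0.0  # stability term: beta_v when I(h_v^t) = 1
    return r

# Slice 1 gets exactly what it needs and stays stable (phi = 0, + beta);
# slice 2 is under-provisioned and unstable (phi = ln(0.5) < 0, no bonus).
print(reward([2.0, 1.0], [2.0, 2.0], [0.5, 0.5], [True, False], 10.0))
```

Note how both over- and under-allocation drive the first term below zero, so the maximizing agent is pushed toward h_v^t = h̃_v^t, exactly the behavior argued above.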
The policy gradient of the evaluation actor network can be represented as

∇_θ J(θ) ≈ (1/K) Σ_{i=1}^{K} ∇_a Q(s, a; ω)|_{s=s_i, a=π(s_i;θ)} ∇_θ π(s; θ)|_{s=s_i}.    (25)

Now, we have the update method of the evaluation networks. After that, the parameters of the target networks are updated regularly from those of the evaluation networks in a soft update manner, which can be presented as

Algorithm 1 DDPG-Based Online Resource Scheduling Algorithm
Phase 1: Initialize the networks
  Initialize the evaluation network, target network, replay memory, the number of samples M, minibatch size K, standard deviation σ, update coefficient τ, and target network update frequency C.
Phase 2: Update the networks
  for every k do
    Allocate the dedicated resources to each slice according to the LSTM prediction results Ŵ_1^{k,d}, Ŵ_2^{k,d}, ..., Ŵ_V^{k,d}
    Observe the state s_t = [Ŵ_1^{k,d}, Ŵ_2^{k,d}, ..., Ŵ_V^{k,d}, h̃_1^t, h̃_2^t, ..., h̃_V^t].
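The soft target update mentioned above, governed by the coefficient τ, can be sketched as follows. For simplicity the parameters are plain floats rather than network weight tensors, and τ is an arbitrary illustrative value.

```python
def soft_update(target, evaluation, tau):
    """Soft update: each target parameter drifts slowly toward its
    evaluation counterpart, theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * e + (1.0 - tau) * t for t, e in zip(target, evaluation)]

# With tau = 0.1, the target moves only 10% of the way per update,
# which keeps the bootstrap target in (23) slowly varying and stable.
params = [0.0, 1.0]
params = soft_update(params, [1.0, 1.0], 0.1)
print(params)
```

A small τ trades convergence speed for training stability; τ = 1 would copy the evaluation network outright, as in the periodic hard updates of DQN.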
TABLE I: Simulation Parameters
Fig. 9. Comparison of RU.

Fig. 10. Slice stability probability comparison with respect to the rate threshold of slice.
[28] Y. Hua, R. Li, Z. Zhao, X. Chen, and H. Zhang, "GAN-powered deep distributional reinforcement learning for resource management in network slicing," IEEE J. Sel. Areas Commun., vol. 38, no. 2, pp. 334–349, Feb. 2020.
[29] M. Yan, G. Feng, J. Zhou, Y. Sun, and Y.-C. Liang, "Intelligent resource scheduling for 5G radio access network slicing," IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7691–7703, Aug. 2019.
[30] R. Li, C. Wang, Z. Zhao, R. Guo, and H. Zhang, "The LSTM-based advantage actor-critic learning for resource management in network slicing with user mobility," IEEE Commun. Lett., vol. 24, no. 9, pp. 2005–2009, Sep. 2020.
[31] X. Ge, "Ultra-reliable low-latency communications in autonomous vehicular networks," IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 5005–5016, May 2019.
[32] H. D. R. Albonda and J. Pérez-Romero, "An efficient RAN slicing strategy for a heterogeneous network with eMBB and V2X services," IEEE Access, vol. 7, pp. 44771–44782, 2019.
[33] K. Xiong, S. Leng, J. Hu, X. Chen, and K. Yang, "Smart network slicing for vehicular fog-RANs," IEEE Trans. Veh. Technol., vol. 68, no. 4, pp. 3075–3085, Apr. 2019.
[34] Z. Mlika and S. Cherkaoui, "Network slicing with MEC and deep reinforcement learning for the Internet of Vehicles," IEEE Netw., vol. 35, no. 3, pp. 132–138, May/Jun. 2021.
[35] W. Wu et al., "Dynamic RAN slicing for service-oriented vehicular networks via constrained learning," IEEE J. Sel. Areas Commun., vol. 39, no. 7, pp. 2076–2089, Jul. 2021.
[36] J. Zheng, P. Caballero, G. de Veciana, S. J. Baek, and A. Banchs, "Statistical multiplexing and traffic shaping games for network slicing," IEEE/ACM Trans. Netw., vol. 26, no. 6, pp. 2528–2541, Dec. 2018.
[37] L. Liang, H. Ye, G. Yu, and G. Y. Li, "Deep-learning-based wireless resource allocation with application to vehicular networks," Proc. IEEE, vol. 108, no. 2, pp. 341–356, Feb. 2020.
[38] G. Wang, G. Feng, T. Q. S. Quek, S. Qin, R. Wen, and W. Tan, "Reconfiguration in network slicing—Optimizing the profit and performance," IEEE Trans. Netw. Service Manag., vol. 16, no. 2, pp. 591–605, Jun. 2019.
[39] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, "Deep transfer learning for intelligent cellular traffic prediction based on cross-domain big data," IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1389–1401, Jun. 2019.
[40] H. Zhang and V. W. S. Wong, "A two-timescale approach for network slicing in C-RAN," IEEE Trans. Veh. Technol., vol. 69, no. 6, pp. 6656–6669, Jun. 2020.
[41] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," in Proc. 9th Int. Conf. Artif. Neural Netw., Edinburgh, U.K., 1999, pp. 850–855.
[42] L. Wang, H. Ye, L. Liang, and G. Y. Li, "Learn to compress CSI and allocate resources in vehicular networks," IEEE Trans. Commun., vol. 68, no. 6, pp. 3640–3653, Jun. 2020.
[43] E. Sakic, F. Sardis, J. W. Guck, and W. Kellerer, "Towards adaptive state consistency in distributed SDN control plane," in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, 2017, pp. 1–7.
[44] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[45] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[46] H. Van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proc. 30th AAAI Conf. Artif. Intell., Feb. 2016, pp. 2094–2100.
[47] Z. Wang, N. de Freitas, and M. Lanctot, "Dueling network architectures for deep reinforcement learning," in Proc. Int. Conf. Learn. Represent., Caribe Hilton, Puerto Rico, 2016, pp. 1–9.
[48] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," in Proc. Int. Conf. Learn. Represent., Caribe Hilton, Puerto Rico, 2016, pp. 1–14.
[49] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2000.
[50] C. Qiu, Y. Hu, Y. Chen, and B. Zeng, "Deep deterministic policy gradient (DDPG)-based energy harvesting wireless communications," IEEE Internet Things J., vol. 6, no. 5, pp. 8577–8588, Oct. 2019.
[51] G. Barlacchi et al., "A multi-source dataset of urban life in the city of Milan and the Province of Trentino," Sci. Data, vol. 2, no. 1, 2015, Art. no. 150055.
[52] "Technical specification group radio access network; Study on LTE-based V2X services (Release 14)," 3GPP, Sophia Antipolis, France, Rep. TR 36.885 V14.0.0, Jun. 2016.
[53] L. Liang, H. Ye, and G. Y. Li, "Spectrum sharing in vehicular networks based on multi-agent reinforcement learning," IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2282–2292, Oct. 2019.

Yaping Cui received the Ph.D. degree in traffic information engineering and control from Southwest Jiaotong University, Chengdu, China, in 2017. He joined the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China, in 2017, where he is currently a Lecturer. He is also with the School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu, China; with the Advanced Network and Intelligent Connection Technology Key Laboratory of Chongqing Education Commission of China, Chongqing; and with the Chongqing Key Laboratory of Ubiquitous Sensing and Networking, Chongqing University of Posts and Telecommunications, Chongqing. His research interests include millimetre-wave communications, multiple antenna technologies, and smart antennas for vehicular networks.

Xinyun Huang is currently pursuing the M.S. degree with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China. He is also with the Advanced Network and Intelligent Connection Technology Key Laboratory of Chongqing Education Commission of China, Chongqing, and with the Chongqing Key Laboratory of Ubiquitous Sensing and Networking, Chongqing University of Posts and Telecommunications, Chongqing. His research interests include vehicular networks and network slicing.

Peng He received the Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2018. He is currently a Lecturer with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China. He is also with the Advanced Network and Intelligent Connection Technology Key Laboratory of Chongqing Education Commission of China, Chongqing, and with the Chongqing Key Laboratory of Ubiquitous Sensing and Networking, Chongqing University of Posts and Telecommunications, Chongqing. His research interests include mobile edge computing, molecular communication, and wireless body area networks. Dr. He is a member of IEEE ComSoc.

Dapeng Wu (Senior Member, IEEE) received the M.S. degree in communication and information systems from Chongqing University of Posts and Telecommunications, Chongqing, China, in June 2006, and the Ph.D. degree from Beijing University of Posts and Telecommunications, Beijing, China, in 2009. His research interests include ubiquitous networks, IP QoS architecture, network reliability, and performance evaluation in communication systems.

Ruyan Wang received the M.S. degree from the Chongqing University of Posts and Telecommunications (CQUPT), Chongqing, China, in 1997, and the Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2007. Since December 2002, he has been a Professor with the Special Research Centre for Optical Internet and Wireless Information Networks, CQUPT. He is also with the Advanced Network and Intelligent Connection Technology Key Laboratory of Chongqing Education Commission of China, Chongqing, and with the Chongqing Key Laboratory of Ubiquitous Sensing and Networking, Chongqing University of Posts and Telecommunications, Chongqing. His research interests include network performance analysis and multimedia information processing.