arXiv:2206.11328v1 [cs.DC] 22 Jun 2022

Abstract—Radio access network (RAN) slicing allows the division of the network into several logical networks tailored to different and varying service requirements in a sustainable way. It is thereby considered a key enabler of 5G and next-generation networks. However, determining optimal strategies for RAN slicing remains a challenging issue. Using machine learning algorithms to address such a difficult problem is promising. However, due to the large differences imposed by RAN deployments and the disparity of their required services, it is difficult to use the same slicing model across all the covered areas. Moreover, the data collected by each mobile virtual network operator (MVNO) in different areas is mostly limited and rarely shared among operators. Federated learning presents new opportunities for MVNOs to benefit from distributed training. In this paper, we propose a federated deep reinforcement learning (FDRL) approach to train bandwidth allocation models among MVNOs based on their interactions with their users. We evaluate the proposed approach through extensive simulations to show the importance of such collaboration in building efficient network slicing models.

Index Terms—RAN Slicing, Federated Learning, Reinforcement Learning, B5G.

* Authors contributed equally.
A. Abouaomar, A. Filali, and S. Cherkaoui are with Polytechnique Montreal, Montreal, QC, Canada.
A. Taik is with INTERLAB, Engineering Faculty, Université de Sherbrooke, QC, Canada.

I. INTRODUCTION

Modern wireless networks have seen an explosive growth of data traffic as the number of mobile devices increases every day. Mobile devices exchange data to acquire various services, with various qualities and requirements from their mobile network operators (MNOs). To meet the ever-growing needs of services, network operators are obligated to deploy new equipment to extend their coverage as network generations evolve. However, extending the coverage of next-generation networks is expensive, which makes sharing network infrastructures a valuable alternative for various service providers [1]. By sharing different network equipment and resources, such as spectrum, antennas, and radio interfaces, service providers can fulfill the requirements of highly scattered customers at a significantly reduced cost [2].

Network slicing (NS) is an advanced solution based on network virtualization that enables the transition from a static network infrastructure to a dynamic one. It allows the design of several logically independent networks, known as network slices, which operate on a common physical infrastructure [3]. In particular, radio access network (RAN) slicing consists in partitioning the RAN resources to create various RAN slices, each tailored and dedicated to meet the requirements of a specific 5G service [4], [5]. These services can be classified into enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communication (mMTC) services. In next-generation networks, MNOs consist of two main entities, namely the infrastructure provider (InP) and the mobile virtual network operators (MVNOs) [6]. On one hand, the InP owns the physical resources, including base stations, core network components, and, importantly, the radio resources. On the other hand, MVNOs lease these physical resources from the InP to deploy the RAN slices required to provide their own services. In a RAN slicing scenario, the InP allocates the radio resources to the MVNOs according to service level agreement (SLA) contracts. Then, each MVNO allocates the radio resources rented from the InP to its users [6].

The allocation of radio resources to users is an extremely intricate operation for MVNOs. This is mainly due to the scarcity of radio resources and the heterogeneous requirements of their users in terms of quality of service (QoS) [7]–[9]. To address these challenges, various approaches based on machine learning (ML) techniques have been proposed recently, specifically reinforcement learning (RL) algorithms [10]–[13]. Nevertheless, due to the dynamics of the RAN environment, in terms of density of users, user requirements, and wireless channel transmission conditions, RAN slicing remains a significantly challenging problem for MVNOs. These stochastic RAN environment factors have a major impact on the accuracy of the RL models, which decreases the performance of radio resource allocation to the users [14]–[19]. Indeed, when an MVNO builds its resource allocation RL model using training datasets related only to its users' behavior and its surrounding environment, the accuracy of the model may be limited. To benefit from a diversified dataset, MVNOs can collaborate by sharing their data with each other to build the diverse, high-quality dataset needed to train RL models. However, MVNOs are often competing entities and are unlikely to be willing to share their data for privacy and data security reasons. To overcome this issue, the federated learning (FL) paradigm can be leveraged [20]–[22].

FL is a cooperative learning approach in which multiple collaborators, MVNOs in our case, train an ML model using their private datasets and then send their trained models to an aggregation entity that builds a global model [22], [23]. The aggregation entity returns the global model to all collaborators for immediate utilization or further training. Thus, FL enables MVNOs to build a robust ML resource allocation model while maintaining data privacy, since only trained models are shared. Indeed, the shared experience will enable the RAN-slicing model to learn from varying scenarios, which makes it more adaptive to environment changes. In fact, due to the unbalanced and non-independent and identically distributed (non-i.i.d.) users across MVNOs, alongside their varying numbers and requirements, FL becomes an attractive solution to build robust models.

To promote more programmability in the RAN, the open RAN (O-RAN) architecture can be leveraged [24]–[26]. In fact, the hierarchical RAN intelligent controller (RIC), including the non-real-time RIC (non-RT) and the near-real-time RIC (near-RT), can be used to manage RAN slicing operations using ML. The former handles the heavier RAN tasks, such as running the training process, while the latter performs critical tasks, such as inference and aggregation of ML models in FL.

In this paper, we propose an FL-based cooperative radio resource allocation mechanism for MVNOs. In this mechanism, each MVNO trains an RL radio resource allocation model according to its users' requirements and sends the trained model to the near-RT RIC for aggregation. Then, the near-RT RIC sends back the global RL model to each MVNO to update its local RL model. We consider two types of users, namely URLLC users and eMBB users. URLLC users require low latency, while eMBB users need a high data rate. To the best of our knowledge, this is the first work to propose cooperative radio resource allocation between MVNOs based on FDRL. The main contributions of this paper are summarized as follows:

• We model the radio resource allocation problem for URLLC and eMBB users as a continuous non-linear optimization problem.
• We model the radio resource allocation problem of an MVNO as a Markov decision process (MDP).
• We develop a deep RL (DRL) algorithm to allocate radio resources to URLLC and eMBB users of each MVNO.
• We design a federated DRL (FDRL) mechanism on an O-RAN architecture to cooperatively improve the radio resource allocation operation of MVNOs.
• We evaluate the proposed mechanism through extensive simulations.

The remainder of this paper is organized as follows. Section II discusses related work on RAN slicing based on DRL and FL. Section III provides the system model and the problem formulation of radio resource allocation. Section IV presents the proposed FDRL mechanism. Section V discusses the evaluations and results of the proposed mechanism. The conclusion is provided in Section VI.

II. RELATED WORK

Many works have investigated RAN slicing. For instance, the authors of [16] proposed DeepSlice, a deep-learning neural-network-driven approach to efficiently address load balancing and network availability challenges. In their work, they utilize the available KPIs to train a model that analyzes incoming traffic and predicts the network slice for any user type. Intelligent resource allocation allows for efficient utilization of the available resources on established network slices and provides load balancing. The authors of [27] proposed a genetic algorithm to allocate resources in multi-tenant and multi-tier heterogeneous networks. The proposed approach consists in relaxing the problem and solving it through hierarchical decomposition methods and Monte Carlo simulation. This work addressed in particular latency and bandwidth allocation as QoS metrics. From a deeper perspective, the RAN slicing resource allocation process takes place at many levels, and ML has been widely investigated in this regard [28]. The literature separates InP resource allocation to MVNOs from MVNOs allocating their resources to users. Many works investigate MVNO RAN slicing using RL [16]–[18], [29]. However, radio resource allocation is considered only from the perspective of a single MVNO. For instance, the authors of [18] proposed a RAN slicing mechanism to enhance the performance of URLLC and eMBB services. The proposed approach considers slicing of RAN resources on two time scales (a large scale and a short scale). On the large time scale, radio resource allocation depends on the requirements of URLLC and eMBB users; the short time scale consists in gNodeBs allocating their resources to end users. This problem was modeled as a non-linear binary program solved using deep reinforcement learning, precisely a deep Q-learning model. Although the work mentions that resources can be allocated from adjacent nodes, it only considers resource allocation for a single operator. The work in [19] considered a strategic approach through Stackelberg-type games to cope with frequency and energy provisioning for the InP. The authors provided an analysis of the equilibrium where the MVNOs' users are uniformly distributed. They obtain a unique equilibrium policy at each layer in the special scenario where each MVNO manages only one category of users. For the broader scenario of MVNOs serving multiple user types, the authors proposed an evolved two-layer differential algorithm along with a gradient-based method to reach the equilibrium. The work of [30] introduces a dynamic RAN slicing approach for vehicular networks to handle various IoV services with different QoS requirements. The RL-based algorithm solves the problem in two phases, including workload distribution and resource allocation decisions. A DDPG actor-critic RL approach was adopted in particular.

Despite the significant efforts to provide solutions for dynamic and efficient RAN slicing management, many aspects are missing from the literature. The aspect of privacy, which is crucial and may represent a threat to MVNOs as well as to users, is yet to be investigated. Moreover, by sharing each other's experiences, MVNOs can improve their resource allocation schemes by collaboratively training resource allocation models and sharing them in an FL fashion. Such a research direction has not been well investigated. To the best of our knowledge, this is the first work to investigate the use of FL in next-generation network management, specifically for multi-MVNO resource allocation. The authors of [17] investigated resource allocation for wireless network slices. This work proposed a two-tier slicing resource allocation scheme using DRL. This paper also tackled the problem within a single BS, with users accessing the associated RAN
[Figure: FDRL-enabled RAN slicing architecture. Each MVNO (MVNO1, MVNO2, ..., MVNON) trains a local model on its local QoS dataset and receives the global model in return.]
resources through MVNOs. Hence, the resource allocation process is divided into two tiers. The first tier is dedicated to allocating InP resources to MVNOs using the DQN technique combined with bidding. The second tier considers the allocation of MVNO resources to users, using the dueling DQN technique to converge to an optimal solution. However, the DQN technique takes longer to converge to a stable reward, which makes it unsuitable for all DRL-based solutions.

Previous literature on RAN slicing resource allocation provides a variety of solutions and techniques that cope with resource allocation, either at the upper tier (the InP allocating resources to MVNOs) or at the lower tier (MVNOs allocating resources to users). However, while DQN is well adapted to problems where the observation space is high-dimensional, it is only capable of handling discrete action spaces of low dimension. Therefore, DQN is not well adapted to situations with continuous, significantly high-dimensional action spaces. Consequently, DQN does not apply directly to continuous domains, since it is founded on seeking the actions that maximize the action-value function; in continuous cases, this would require an iterative optimization process at every step. We adopt in this paper a deep deterministic policy gradient to deal with the continuous aspect of the action space and thereby escape the curse of dimensionality. Additionally, in the proposed approach, MVNOs can benefit from each other's experiences while preserving privacy.

III. FDRL-ENABLED MVNOs ARCHITECTURE

A. System Model

We consider a RIC-enabled RAN architecture with a single base station (BS) owned by an InP. The BS operates on a total bandwidth B. The InP is responsible for serving a set of MVNOs M = {m_i}, i ∈ {1, 2, ..., M}, by renting to each of them a fraction of the total bandwidth B based on an SLA. Each MVNO m_i has a set of users denoted by U_i. We consider two types of users, namely eMBB users and URLLC users. For a user j, let z_j^e ∈ {0, 1} and z_j^u ∈ {0, 1} denote the binary variables representing whether j is an eMBB user (z_j^e = 1) or a URLLC user (z_j^u = 1), respectively.

In this work, we consider that bandwidth allocation to MVNOs has already been performed by the InP. We denote the fraction of the total bandwidth B leased to the MVNO m_i by B_i. An MVNO allocates to each of its users a fraction f_(i,j) ∈ [0, 1] of the leased bandwidth B_i to satisfy its QoS requirements in terms of data rate and latency. Each user u_(i,j) uses the allocated bandwidth to transmit a packet of size ξ_(i,j). We consider that the packet size depends on the type of user, so we denote the packet sizes of an eMBB user and a URLLC user by ξ_(i,j)^e and ξ_(i,j)^u, respectively. We consider an orthogonal frequency division multiple access (OFDMA) uploading scenario to reduce interference between the users.

[Figure: DDPG training loop. The actor network, perturbed by OU noise, takes an action on the environment (state ⟨BW, user⟩); transitions feed a replay buffer, the policy gradient updates the actor, and the target networks are soft-updated.]

The achievable uplink data rate of the user u_(i,j) ∈ U_i using the allocated bandwidth is defined as follows:

    δ_(i,j) = f_(i,j) B_i log2(1 + ρ_(i,j))    (1)

where ρ_(i,j) is the signal-to-noise ratio between the user u_(i,j) and the BS, given as follows:

    ρ_(i,j) = (P_(i,j) g_(i,j)) / (f_(i,j) B_i σ²)    (2)

where σ² is the noise power, P_(i,j) is the transmission power of the user u_(i,j), and g_(i,j) is the channel gain between the user u_(i,j) and the BS. The transmission delay to upload a packet can be calculated as the packet size over the achievable rate:

    D_(i,j) = ξ_(i,j) / δ_(i,j)    (3)

We formulate both the minimization and maximization problems of an MVNO m_i ∈ M as a joint problem as follows:

    maximize_f   Σ_{j∈U_i} z_j^e δ_(i,j) − Σ_{j∈U_i} z_j^u D_(i,j)    (4a)
    subject to
        0 ≤ f_(i,j) ≤ f_max,  ∀j ∈ U_i,    (4b)
        Σ_{j∈U_i} f_(i,j) ≤ 1,    (4c)
        D_(i,j) ≤ D_i^max,  ∀j ∈ U_i with z_j^e = 1,    (4d)
        δ_(i,j) ≥ δ_i^min,  ∀j ∈ U_i with z_j^u = 1.    (4e)
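As a quick sanity check on the rate and delay relations above, the following Python sketch computes the SNR, the achievable uplink rate, and the resulting transmission delay for one user. All function names and numeric values are illustrative assumptions, not values from the paper.

```python
import math

def snr(p_tx, gain, f, b_i, sigma2):
    # Eq. (2): rho = (P * g) / (f * B * sigma^2)
    return (p_tx * gain) / (f * b_i * sigma2)

def uplink_rate(f, b_i, rho):
    # Eq. (1): delta = f * B * log2(1 + rho)
    return f * b_i * math.log2(1.0 + rho)

def tx_delay(packet_bits, rate):
    # Transmission delay as packet size over achievable rate
    return packet_bits / rate

# Illustrative numbers: 20 MHz leased bandwidth, a 25% fraction,
# 0.2 W transmit power, channel gain 1e-7, noise power 1e-15 W/Hz.
rho = snr(p_tx=0.2, gain=1e-7, f=0.25, b_i=20e6, sigma2=1e-15)  # -> 4.0
rate = uplink_rate(0.25, 20e6, rho)   # 5 MHz * log2(5) bit/s
delay = tx_delay(1.2e4, rate)         # seconds to upload a 12-kbit packet
```

Note how, per Eq. (2), allocating a larger fraction f lowers the per-Hz SNR while widening the band in Eq. (1), which is what makes the fraction assignment a non-trivial optimization.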
IV. FDRL BANDWIDTH ALLOCATION

In this section, we present the proposed FDRL mechanism to solve the optimization problem in Eq. (4). First, we model the bandwidth allocation problem of an MVNO as a single-agent MDP. Then, we describe the proposed FDRL mechanism by explaining the DDPG algorithm and how the latter is trained in a federated fashion.

A. MDP formulation of the bandwidth allocation

In this section, we present the formulation of the MDP. To formulate the MDP problem, we define the state space, the action space, and the reward function.

1) State space: At each time step t, each agent (i.e., MVNO) observes the environment state. The observation of each MVNO includes the types of its active users and their channel gains. The users' types are necessary as they define the SLA requirements. The estimation of the channel gain of each associated user on the communication channel is necessary to make adequate bandwidth allocation decisions. The channel gains are periodically collected by each MVNO. In fact, each MVNO broadcasts pilot signals to all its users. Subsequently, each user estimates the channel state information and sends it back to its MVNO through the return channel.

We denote by S_i(t) the observed state of MVNO m_i at time slot t:

    S_i(t) = ⟨G_i(t), U_i(t)⟩    (5)

where G_i(t) represents the channel gains between the MVNO m_i and its users U_i at time slot t, and U_i(t) represents the set of user types of MVNO m_i. The types of users are defined using two values, w^e and w^u, which represent the priority of each type. In general, since URLLC users have stringent delay requirements, they are assigned higher priority values.

2) Action space: At each time slot, the RIC provides the necessary bandwidth fraction B_i to each MVNO. An MVNO assigns fractions of B_i to its users. The action space of each MVNO m_i at a time slot t is given as follows:

    A_i(t) = [0, f_max]    (6)

where each action a_i ∈ A_i(t) is represented by a row vector {f_(i,j)(t), ∀u_(i,j) ∈ U_i}.

3) Reward function: An action a_i is considered valid if the sum of the fractions is less than 1, and if the allocated fractions result in delays and data rates that meet the SLA values. In case the action is invalid, a negative reward is returned to prevent the agent from choosing similar actions in subsequent steps.

B. Federated Deep Reinforcement Learning

Having formulated the problem as an MDP, reinforcement learning is an adequate solution. In this case, each MVNO is considered as an agent interacting with the environment composed of its users, by observing a state S and choosing an action a. The agent's goal is to learn an optimal policy π that maximizes the reward r.

Deep reinforcement learning (DRL) combines the power of deep neural networks with reinforcement learning to create agents that learn from high-dimensional states. Accordingly, the policy π is represented as a deep neural network [13]. DRL was first introduced through deep Q-networks (DQN) and was quickly adopted by the research community to solve many practical decision-making problems [12]. Nonetheless, DQN is off-policy and may not perform well in environments with high uncertainty, such as wireless networks. While value-based RL algorithms like Q-learning first optimize the value function and then derive optimal policies, policy-based methods directly optimize an objective function based on the rewards, which makes them suitable for large or infinite action spaces. Yet, policy-based RL might have noisy and unstable gradients [31]. As a result, we propose to use an actor-critic based algorithm [32]. In fact, actor-critic approaches combine strong points of both value-based and policy-based RL algorithms. Furthermore, since the fraction values are continuous, we use the deep deterministic policy gradient (DDPG) [33], which concurrently learns a Q-function and a policy and performs actions from a continuous space.

1) Deep Deterministic Policy Gradient (DDPG): DDPG is an off-policy algorithm that uses four neural networks, namely the actor network µ, the critic network v, the actor target network µ′, and the critic target network v′. For a given observed environment state, the actor chooses an action, and the critic uses a state-action Q function to evaluate this action.
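To make the moving parts of this section concrete, here is a minimal NumPy sketch of four ingredients of the federated DDPG scheme: OU exploration noise, the soft update of the target networks, a projection of a raw actor output onto valid bandwidth fractions in the spirit of constraints (4b)-(4c), and a dataset-size-weighted (FedAvg-style) aggregation of the MVNOs' weights. The aggregation rule and all hyperparameter values are illustrative assumptions, since the paper does not spell them out in this excerpt.

```python
import numpy as np

class OUNoise:
    """Temporally correlated Ornstein-Uhlenbeck exploration noise used by DDPG."""
    def __init__(self, dim, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.x = np.zeros(dim)

    def sample(self):
        # Mean-reverting step: dx = -theta * x + sigma * N(0, 1)
        self.x = self.x + self.theta * (-self.x) \
            + self.sigma * np.random.randn(*self.x.shape)
        return self.x

def soft_update(target, online, tau=0.005):
    """Polyak averaging of target-network parameters: tau*online + (1-tau)*target."""
    return tau * online + (1.0 - tau) * target

def project_action(raw, f_max=0.5):
    """Clip each fraction to [0, f_max] and rescale if the sum exceeds 1,
    so the action respects constraints (4b)-(4c)."""
    a = np.clip(raw, 0.0, f_max)
    s = a.sum()
    return a / s if s > 1.0 else a

def fedavg(local_weights, num_samples):
    """Weighted average of per-MVNO parameter arrays; each MVNO contributes
    proportionally to its local dataset size (relevant under non-i.i.d. users)."""
    total = float(sum(num_samples))
    return sum((n / total) * w for w, n in zip(local_weights, num_samples))

noise = OUNoise(dim=3)
n = noise.sample()  # would be added to the actor output during training
action = project_action(np.array([0.3, 0.9, 0.6]))          # valid fractions
target = soft_update(np.zeros(3), np.ones(3), tau=0.1)      # slow tracking
global_w = fedavg([np.full(2, 1.0), np.full(2, 2.0), np.full(2, 3.0)],
                  num_samples=[5, 4, 3])                    # RIC-side step
```

In a full implementation the actor and critic would be neural networks trained from a replay buffer; the pieces above are only the DDPG-specific and FL-specific glue around them.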
…prioritizes this type of users and is less likely to violate their required delay.

2) Varying number of users: The second considered scenario is non-i.i.d. with an unequal number of users. The models are first trained with a total of 12 users, where 5, 4, and 3 users are served by the first, second, and third MVNO, respectively. We seek to evaluate the robustness of the models when the number of users changes. In a first experiment, we changed the number of users at test time to 4, 3, and 5 for the first, second, and third MVNOs, respectively. In a second experiment, we changed these numbers to 3, 5, and 4. Fig. 6 shows the number of times the users' SLA was not satisfied by the local models of the MVNOs and by the global model, while observing the same environments for a total of 20000 observations.

[Fig. 3: Non-i.i.d and equal user distributions. Cumulative reward versus training episodes (0 to 250) for the global model and the local models.]

Similarly to the previous experiments, the global model's actions are less likely to violate the SLA requirements for
[Figures: bar charts of SLA unsatisfaction counts for MVNO1, MVNO2, and MVNO3 under the local models and the global model.]
the users. Experiments have shown that the model trained using FDRL is more robust against environment changes compared to models trained separately by each MVNO.

REFERENCES

[1] O. Sallent, J. Perez-Romero, R. Ferrus, and R. Agusti, "On radio access network slicing from a radio resource management perspective," IEEE Wireless Communications, vol. 24, no. 5, pp. 166–174, 2017.
[2] E. J. Oughton and Z. Frias, "The cost, coverage and rollout implications of 5G infrastructure in Britain," Telecommunications Policy, vol. 42, no. 8, pp. 636–652, 2018.
[3] A. Filali, A. Abouaomar, S. Cherkaoui, A. Kobbane, and M. Guizani, "Multi-access edge computing: A survey," IEEE Access, vol. 8, pp. 197017–197046, 2020.
[4] Z. Mlika and S. Cherkaoui, "Network slicing with MEC and deep reinforcement learning for the internet of vehicles," IEEE Network, vol. 35, no. 3, pp. 132–138, 2021.
[5] X. Foukas, M. K. Marina, and K. Kontovasilis, "Orion: RAN slicing for a flexible and cost-effective multi-service mobile network architecture," in Proceedings of the 23rd Annual International Conference on Mobile Computing and Networking, pp. 127–140, 2017.
[6] C. Liang and F. R. Yu, "Wireless network virtualization: A survey, some research issues and challenges," IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 358–380, 2014.
[7] A. Rago, S. Martiradonna, G. Piro, A. Abrardo, and G. Boggia, "A tenant-driven slicing enforcement scheme based on pervasive intelligence in the radio access network," Available at SSRN 4022195, 2022.
[8] H. Song, J. Bai, Y. Yi, J. Wu, and L. Liu, "Artificial intelligence enabled internet of things: Network architecture and spectrum access," IEEE Computational Intelligence Magazine, vol. 15, no. 1, pp. 44–51, 2020.
[9] H. Song, L. Liu, J. Ashdown, and Y. Yi, "A deep reinforcement learning framework for spectrum management in dynamic spectrum access," IEEE Internet of Things Journal, vol. 8, no. 14, pp. 11208–11218, 2021.
[10] M. R. Raza, C. Natalino, P. Öhlen, L. Wosinska, and P. Monti, "Reinforcement learning for slicing in a 5G flexible RAN," Journal of Lightwave Technology, vol. 37, no. 20, pp. 5161–5169, 2019.
[11] C. Ssengonzi, O. P. Kogeda, and T. O. Olwal, "A survey of deep reinforcement learning application in 5G and beyond network slicing and virtualization," Array, p. 100142, 2022.
[12] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[13] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017.
[14] A. Filali et al., "Communication and computation O-RAN resource slicing for URLLC services using deep reinforcement learning," arXiv preprint arXiv:2202.06439, 2022.
[15] A. Abouaomar et al., "Resource provisioning in edge computing for latency-sensitive applications," IEEE Internet of Things Journal, vol. 8, no. 14, pp. 11088–11099, 2021.
[16] A. Thantharate, R. Paropkari, V. Walunj, and C. Beard, "DeepSlice: A deep learning approach towards an efficient and reliable network slicing in 5G networks," in 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), pp. 0762–0767, IEEE, 2019.
[17] G. Chen, X. Zhang, F. Shen, and Q. Zeng, "Two tier slicing resource allocation algorithm based on deep reinforcement learning and joint bidding in wireless access networks," Sensors, vol. 22, no. 9, p. 3495, 2022.
[18] A. Filali, Z. Mlika, et al., "Dynamic SDN-based radio access network slicing with deep reinforcement learning for URLLC and eMBB services," IEEE Transactions on Network Science and Engineering, pp. 1–1, 2022.
[19] J. Hu, Z. Zheng, B. Di, and L. Song, "Multi-layer radio network slicing for heterogeneous communication systems," IEEE Transactions on Network Science and Engineering, vol. 7, no. 4, pp. 2378–2391, 2020.
[20] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," 2016.
[21] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, "Federated learning: Challenges, methods, and future directions," IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 50–60, 2020.
[22] A. Taïk et al., "Data-aware device scheduling for federated edge learning," IEEE Transactions on Cognitive Communications and Networking, vol. 8, no. 1, pp. 408–421, 2022.
[23] A. Abouaomar, S. Cherkaoui, Z. Mlika, and A. Kobbane, "Mean-field game and reinforcement learning MEC resource provisioning for SFC," in 2021 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, 2021.
[24] O-RAN Alliance, "O-RAN: Towards an Open and Smart RAN," White Paper, Oct. 2018.
[25] I. Chih-Lin, S. Kukliński, and T. Chen, "A perspective of O-RAN integration with MEC, SON, and network slicing in the 5G era," IEEE Network, vol. 34, no. 6, pp. 3–4, 2020.
[26] D. Johnson, D. Maas, and J. Van Der Merwe, "NexRAN: Closed-loop RAN slicing in POWDER: a top-to-bottom open-source open-RAN use case," in Proceedings of the 15th ACM Workshop on Wireless Network Testbeds, Experimental evaluation & CHaracterization, pp. 17–23, 2022.
[27] S. O. Oladejo and O. E. Falowo, "Latency-aware dynamic resource allocation scheme for multi-tier 5G network: A network slicing-multitenancy scenario," IEEE Access, vol. 8, pp. 74834–74852, 2020.
[28] B. Han and H. D. Schotten, "Machine learning for network slicing resource management: A comprehensive survey," arXiv preprint arXiv:2001.07974, 2020.
[29] A. Abouaomar, Z. Mlika, A. Filali, S. Cherkaoui, and A. Kobbane, "A deep reinforcement learning approach for service migration in MEC-enabled vehicular networks," in 2021 IEEE 46th Conference on Local Computer Networks (LCN), pp. 273–280, 2021.
[30] W. Wu, N. Chen, C. Zhou, M. Li, X. Shen, W. Zhuang, and X. Li, "Dynamic RAN slicing for service-oriented vehicular networks via constrained learning," IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 2076–2089, 2020.
[31] O. Nachum, M. Norouzi, K. Xu, and D. Schuurmans, "Bridging the gap between value and policy based reinforcement learning," Advances in Neural Information Processing Systems, vol. 30, 2017.
[32] V. Konda and J. Tsitsiklis, "Actor-critic algorithms," Advances in Neural Information Processing Systems, vol. 12, 1999.
[33] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.