
Accepted by the 20th IEEE International Conference on Mobile Ad-Hoc and Smart Systems (MASS 2023). ©2023 IEEE

Intent-driven Intelligent Control and Orchestration
in O-RAN Via Hierarchical Reinforcement Learning
Md Arafat Habib¹, Hao Zhou¹, Pedro Enrique Iturria-Rivera¹, Medhat Elsayed², Majid Bavand²,
Raimundas Gaigalas², Yigit Ozcan² and Melike Erol-Kantarci¹, Senior Member, IEEE
¹School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
²Ericsson Inc., Ottawa, Canada
Emails: {mhabi050, hzhou098, pitur008, melike.erolkantarci}@uottawa.ca,
{medhat.elsayed, majid.bavand, raimundas.gaigalas, yigit.ozcan}@ericsson.com

Abstract—rApps and xApps need to be controlled and orchestrated well in the open radio access network (O-RAN) so that they can deliver a guaranteed network performance in a complex multi-vendor environment. This paper proposes a novel intent-driven intelligent control and orchestration scheme based on hierarchical reinforcement learning (HRL). The proposed scheme can orchestrate multiple rApps or xApps according to the operator's intent of optimizing certain key performance indicators (KPIs), such as throughput, energy efficiency, and latency. Specifically, we propose a bi-level architecture with a meta-controller and a controller. The meta-controller provides the target performance in terms of KPIs, while the controller performs xApp orchestration at the lower level. Our simulation results show that the proposed HRL-based intent-driven xApp orchestration mechanism achieves a 7.5% and a 21.4% increase in average system throughput with respect to two baselines, i.e., a single xApp baseline and a non-machine learning-based algorithm, respectively. Similarly, a 17.3% and a 37.9% increase in energy efficiency is observed in comparison to the same baselines.

Index Terms—O-RAN, rApps, xApp, hierarchical reinforcement learning, orchestration

I. INTRODUCTION

Open radio access network (O-RAN) facilitates openness and intelligence to support diverse traffic types and their requirements in 5G and beyond networks [1], as well as multi-vendor RAN deployments. In a multi-vendor environment, rApps and xApps can be hosted in the non-real-time RAN intelligent controller (non-RT-RIC) and the near-real-time RAN intelligent controller (near-RT-RIC). In the literature, xApps and rApps have been studied for resource and power allocation, beamforming and management, cell sleeping, traffic steering, and so on [2]–[4]. Advanced reinforcement learning (RL) algorithms can be used to develop intelligent network functions in O-RAN. However, a multi-rApp or multi-xApp scenario with a variety of AI-enabled Apps requires intelligent control and orchestration among the Apps to avoid performance degradation.

Note that we focus on xApps as a case study, but our work generalizes to rApps as well. To elevate autonomy in O-RAN via xApp orchestration, intent-driven network optimization goals can play a pivotal role. An intent is defined as an optimization goal: a high-level command, given by the operator usually in plain language, that determines a key performance indicator (KPI) the network should meet, such as "increase throughput by 10%" or "increase energy efficiency by 5%" [5]. To better support autonomous orchestration of the xApps in a multi-vendor environment, emphasis on operators' intents is crucial [6]. Intents aid in achieving agile, flexible, and simplified configuration of the wireless networks with minimum possible intervention. Furthermore, intelligent intent-driven management has the ability to constantly acquire knowledge and adjust to changing network conditions by utilizing extensive real-time network data. The inclusion of intent-driven goals for intelligent xApp control and orchestration is a promising yet highly complex task, since there are multiple vendors involved with different network functions, and intents may trigger conflicting optimization goals in sub-systems. There are a few works on conflict mitigation or xApp cohabitation in O-RAN. For instance, Zhang et al. propose a conflict mitigation scheme among multiple xApps using team learning [7], and Polese et al. propose a machine learning (ML)-based pipeline for the cohabitation of multiple xApps in an O-RAN environment [8]. The work outlined in [9] introduces a method for achieving automation throughout the entire life cycle of xApps, beginning with the utilization scenario, requirements, design, and verification, and ultimately, the deployment within networks. However, the operator intent is not involved in these works.

To this end, we propose a hierarchical reinforcement learning (HRL) method for intent-driven xApp orchestration. Different from the previous works, the proposed scheme has a bi-level architecture, where we can pass the intents to the top-level hierarchy and process them as optimization goals for the lower-level controller to control and orchestrate xApps. Orchestration can avoid xApp conflicts and improve performance by combining xApps with similar performance objectives. The proposed method is compared with two baselines: a non-machine learning (non-ML) solution and a single xApp scenario. Our simulation results show that the proposed HRL-based intent-driven xApp orchestration mechanism achieves a 7.5% and a 21.4% increase in average system throughput along with a 17.3% and a 37.9% increase in energy efficiency, compared to the single xApp and non-ML baselines, respectively.

The rest of the paper is organized as follows: Section II discusses the related works, followed by Section III, which presents the system model in detail. The proposed HRL-based xApp orchestration in O-RAN is covered in Section IV. Performance analysis and comparison of the proposed method along with the baselines are presented in Section V. Lastly, we present our conclusions in Section VI.

II. RELATED WORK


There are a few works that investigate ML-based xApps
for RAN optimization and control. Polese et al. propose an
ML pipeline for multiple xApps in an O-RAN environment
[8]. Zhang et al. propose a conflict mitigation scheme among
deep reinforcement learning (DRL)-based power allocation
and resource allocation xApps [7]. D'Oro et al. propose the
OrchestRAN scheme, in which network operators can specify
high-level control objectives in non-RT-RIC to sort out the
optimal set of data-driven algorithms to fulfill the provided
intent [10]. While the work presented in [10] focuses on
selecting the appropriate machine learning models and their
execution locations for given inputs from the operator, it
does not put emphasis on the network operator’s goals as
optimization objectives to select and orchestrate xApps.
An intent-driven orchestration of cognitive autonomous net-
works for RAN management is presented in [11], where the
authors propose a generic design of intent-based management
for controlling RAN parameters and KPIs. Zhang et al. pro-
pose an intent conflict resolution scheme to realize conflict
avoidance in machine learning-based xApps [12]. A graph-
based solution is proposed in [13] to determine the specific network function required to fulfill an intent.

Compared with the existing literature, the main contribution of this work is that we propose an HRL scheme for intent-driven orchestration of xApps. The HRL scheme fits well with the inherent O-RAN hierarchy of non-RT-RIC and near-RT-RIC, and intent-based orchestration enables higher flexibility for network control and management. The intents from the human-level operator are provided as goals for the system to achieve, which leads to the orchestration of xApps to achieve the provided goal.

III. SYSTEM MODEL

A. System Model

We consider an O-RAN-based downlink orthogonal frequency division multiplexing cellular system having B BSs serving U users simultaneously, and multiple small cells in the system are within the range of a macro cell. There are K classes of traffic in the system, and users are connected with multiple RATs via dual connectivity. There are Q classes of RATs (q_1, q_2, ..., q_n), where q represents a certain access technology (LTE, 5G, etc.). The wireless system model considered in this work is presented in Fig. 1. The RIC platforms in the figure (non-RT-RIC and near-RT-RIC) can host rApps and xApps, which are control and optimization applications operating at different time scales.

Fig. 1. O-RAN based system model with macro cell and small cells.

We design three xApps, namely the traffic steering, cell sleeping, and beamforming xApps. In each xApp, we apply deep reinforcement learning for optimization within that xApp, as introduced in the following.

1) Traffic Steering xApp: The traffic steering xApp aims to achieve a simultaneous balance of QoS requirements for various traffic classes by introducing a traffic steering scheme based on a Deep Q-Network (DQN) [14]. We design the reward and state functions to ensure satisfactory performance, focusing on two essential KPIs: network delay and average system throughput. Traffic can be steered to a certain BS based on the experienced load, link quality, and traffic type. The details of this xApp can be found in [2].

2) Cell Sleeping xApp: The cell sleeping xApp is designed to reduce power consumption in the system by turning off idle or less busy BSs. The xApp can perform cell sleeping based on the traffic load ratio and queue length of each BS. The energy consumption model for a BS is:

P_{in} = \begin{cases} P_0 + \delta_p P_{out}, & 0 < P_{out} \le P_{max}, \\ P_{sleep}, & P_{out} = 0, \end{cases} \qquad (1)

where P_0 is the fixed power consumption, \delta_p is the slope of the load-dependent power consumption, P_{out} is the transmission power, P_{max} is the maximum transmission power, and P_{sleep} is the constant power consumption in sleep mode [15].

The goal of the cell sleeping xApp is to maximize energy efficiency as much as possible without overloading the active BSs. The optimization goal is formulated as follows:

\max \; \frac{\sum_{u \in U_o} \sum_{b \in B} T_{u,b}}{\sum_{b \in B} P_b} - \theta b_u, \qquad \text{s.t. } (1), \qquad (2)

where U_o is the set of user equipments (UEs) connected to a certain BS, T represents the throughput, \theta is the penalty factor to reduce overloading, and b_u is the number of overloaded BSs. Turning off BSs can greatly decrease power consumption, but it reduces the number of active BSs serving the live network traffic, which poses a risk of overloading the active BSs. Therefore, the penalty factor related to the number of overloaded BSs has been introduced to avoid excessive overloading.

To address the formulated problem, the following MDP is formulated:
• State: The state set consists of S = {q_L, L_R}, where L_R represents the traffic load ratio of a BS b, and the second element, q_L, is the queue length of the BSs, representing the load level.
• Action: Turning the BSs on and off constitutes the action set for the DQN implementation: A = {ON, OFF}.
• Reward: The reward function is the same as eq. (2).
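To make the cell sleeping xApp's formulation concrete, the following Python sketch shows how the power model of eq. (1) and the reward of eq. (2) could be evaluated inside a DQN environment step. It is a minimal illustration: the parameter values and function names are our own assumptions, not taken from the paper.

```python
# Illustrative BS power parameters; placeholder values, not from the paper.
P0, DELTA_P, P_MAX, P_SLEEP = 130.0, 4.7, 20.0, 75.0  # watts
THETA = 10.0  # overload penalty factor theta in eq. (2)

def bs_input_power(p_out: float) -> float:
    """Load-dependent BS power consumption model of eq. (1)."""
    if p_out == 0.0:
        return P_SLEEP
    return P0 + DELTA_P * min(p_out, P_MAX)  # valid for 0 < P_out <= P_max

def cell_sleeping_reward(ue_throughputs, bs_tx_powers, n_overloaded_bs: int) -> float:
    """Reward following eq. (2): system energy efficiency minus an overload penalty.

    ue_throughputs: throughput of every connected UE (bit/s)
    bs_tx_powers:   transmission power of every BS, 0 for sleeping BSs (W)
    """
    total_power = sum(bs_input_power(p) for p in bs_tx_powers)
    energy_efficiency = sum(ue_throughputs) / max(total_power, 1e-9)
    return energy_efficiency - THETA * n_overloaded_bs
```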
3) Beamforming xApp: The third xApp is the beamforming xApp. We deploy band-switching BSs that can switch from 3.5 GHz to mmWave frequencies [16]. This allows us to support high-throughput traffic such as enhanced mobile broadband (eMBB) via accurate intelligent beamforming. This xApp can control power based on the location of the UE, and it uses the minimum transmission power needed, which is energy efficient. The xApp employs analog beamforming, and a multi-antenna setup is adopted where each BS deploys a uniform linear array (ULA) of M antennas [17]. The beamforming weights of every beamforming vector are implemented using constant-modulus phase shifters. We also assume that there is a beam steering-based codebook, F, from which every beamforming vector is selected [17].

Every BS l has a transmit power P_{TX,l} ∈ P, where P is the set of candidate transmit powers. We want to optimize two parameters, throughput and energy efficiency, using this xApp. To obtain such a goal, the following optimization problem is addressed:

\max \; \sum_{l \in \{1,2,\dots,L\}} \left( c_1 \frac{T_{k,b}}{T_{QoS}} + c_2 \frac{\varepsilon}{\varepsilon_{max}} \right), \qquad \text{s.t. } P_{TX,l}[t] \in P, \; f_l[t] \in F, \qquad (3)

where T_{k,b} is the throughput achieved by the system, T_{QoS} is the defined throughput requirement for a traffic type k, \varepsilon represents the energy efficiency associated with the BS throughput and transmission power, \varepsilon_{max} is the maximum theoretical energy efficiency, and c_1 and c_2 are the weight factors.

To solve the formulated problem, the following MDP is defined:
• State: UE coordinates are used as the set of states, S = {C_{UE_1}, C_{UE_2}, ..., C_{UE_N}}.
• Action: The action set consists of two elements: A = {α(χ_n), δ_n}. Here, χ is the steering angle, α(χ_n) is the array steering vector in the direction χ_n of the n-th element in the codebook, and δ_n accounts for the power level change.
• Reward: The reward function is the same as eq. (3), as presented above.
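Analogously, the beamforming xApp's objective in eq. (3) can be evaluated as a weighted sum of normalized throughput and energy efficiency, and its action set is a discrete grid over codebook beams and power-level changes. The sketch below is illustrative only; the codebook size, power granularity, weights, and QoS target are assumptions rather than values reported in the paper.

```python
from itertools import product

# Illustrative constants (assumptions, not values from the paper)
C1, C2 = 0.5, 0.5                 # weight factors c1, c2 in eq. (3)
T_QOS = 100e6                     # throughput requirement of traffic type k (bit/s)
EPS_MAX = 1e7                     # maximum theoretical energy efficiency (bit/J)
CODEBOOK_SIZE = 16                # assumed size of the beam-steering codebook F
POWER_DELTAS = (-1.0, 0.0, 1.0)   # assumed power-level changes delta_n (dB)

# Action set A = {alpha(chi_n), delta_n}: pick a beam index and a power change.
ACTIONS = list(product(range(CODEBOOK_SIZE), POWER_DELTAS))

def beamforming_reward(throughput: float, tx_power: float) -> float:
    """One summand of eq. (3): c1*(T/T_QoS) + c2*(eps/eps_max) for a serving BS."""
    eps = throughput / max(tx_power, 1e-9)  # energy efficiency of the BS
    return C1 * (throughput / T_QOS) + C2 * (eps / EPS_MAX)
```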
IV. PROPOSED HRL-BASED XAPP ORCHESTRATION SCHEME

RL problems can be formulated as MDPs, where we have a set of states, actions, transition probabilities, and a reward function (S, A, T, R). The RL agent in HRL consists of two controllers: a meta-controller and a controller [18]. The MDP for HRL has an added element, which is denoted as a set of goals (G). Depending on the current state, the meta-controller is responsible for generating high-level goals (G = {g_1, g_2, ..., g_n}) for the controller. After that, these goals are transformed into high-level policies. The controller chooses a low-level action a according to the high-level policies. This process from the controller yields an intrinsic reward (r_in). Finally, an extrinsic reward (r_ex) is given to the meta-controller by the environment, and it will provide the controller with a new goal (g′). This section discusses the xApp orchestration scheme via HRL.
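The two-level interaction described above follows the h-DQN pattern of [18]. The skeleton below sketches one episode of that loop; the environment interface (reset/step), the goal encoding, and the agent internals are placeholders assumed for illustration, not the authors' implementation.

```python
import random
from collections import deque

class DQNAgent:
    """Placeholder agent; in the actual xApps this would be a deep Q-network."""
    def __init__(self, n_actions: int):
        self.n_actions = n_actions
        self.replay = deque(maxlen=10_000)  # experience replay buffer
        self.epsilon = 0.1

    def act(self, state) -> int:
        # epsilon-greedy over a (hypothetical) Q-function; random here for brevity
        return random.randrange(self.n_actions)

    def store(self, transition):
        self.replay.append(transition)

def hrl_episode(env, meta_controller: DQNAgent, controller: DQNAgent, horizon: int = 50):
    """One episode of the meta-controller/controller interaction of Section IV."""
    state = env.reset()
    goal = meta_controller.act(state)          # meta-controller picks a KPI goal g
    extrinsic_return, start_state = 0.0, state
    for _ in range(horizon):
        action = controller.act((state, goal))  # controller picks a low-level action a
        next_state, r_in, r_ex, done = env.step(action, goal)
        controller.store(((state, goal), action, r_in, (next_state, goal)))
        extrinsic_return += r_ex
        state = next_state
        if done:
            break
    # the meta-controller is trained on the accumulated extrinsic reward for this goal
    meta_controller.store((start_state, goal, extrinsic_return, state))
```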
A. xApp Coordination Using HRL

The proposed O-RAN-based system architecture is presented in Fig. 2. The RIC platforms can host rApps and xApps, which are applications operating at different time scales. The three xApps have been defined in the previous section. The rApp in the figure works as an input panel for the network operator, and it can convert these inputs into goals to be optimized. It also works as the meta-controller in the non-RT-RIC.

Let us assume that X is the set of xApps and Y is a subset of X, having at least one element (an xApp in our case), that can optimize the network performance based on the operator input. Let I be the set of candidate KPIs that an xApp can optimize and Z be the set of QoS requirements the system has to satisfy. Considering all these assumptions, the xApp orchestration problem that we want to address can be formulated as follows:

\max \; \sum_{i \in I} \sum_{z \in Z} (P_i - \rho \xi_z), \qquad \text{s.t. } \forall(X)\,\exists(Z): V(O) = 1, \qquad (4)

where P_i is the intended performance metric the operator intends to improve, \rho is the penalty parameter for QoS requirement violation, and \xi_z is the number of UEs whose QoS requirements are violated. Lastly, V(O) is the proposition that "an xApp can improve a performance metric", which is either '0' or '1'.

As presented in Fig. 2, the rApp in the system is directly connected to the user panel where the operator may provide input to the system. The operator input is provided as the percentage of increase related to a certain KPI, for example, x% throughput increase or y% energy efficiency increase, or any other intent stated in natural language. The rApp has a hierarchical deep Q-learning (h-DQN) framework [18]. The meta-controller (in the non-RT-RIC) takes the increased amount of throughput or energy efficiency as a goal, observes the state in the environment, and provides
both the goal and states to the controller in near-RT-RIC
having a bundle of xApps. This type of data passing is done
via the A1 interface by which both the non and near-RT-RIC
are connected. The controller takes the action of choosing
an xApp or a set of xApps based on the provided state and
goal. Following, we define the MDP for the meta-controller
and controller to address the xApp orchestration problem
formulated in eq. (4).
• State: The set of states consists of the traffic flow types of different users in the network. UEs having similar traffic types are grouped together: S = {T_voice, ..., T_URLLC, ..., T_gaming, ..., T_eMBB, ...}. The elements in this set stand for the five different traffic types in the system. Both the meta-controller and the controller share the same states.
• Action: xApp selection, or the selection of a combination of xApps, constitutes the actions to be performed by the controller, defined as {A_{xApp_1}, A_{xApp_{1,2}}, ..., A_{xApp_N}}.
• Intrinsic reward: The intrinsic reward function for the controller is r_in = P_i − ρ ξ_z, which follows the objective in eq. (4).
• Goal for the controller: An increased throughput or increased energy efficiency level that can satisfy the operator intent is passed to the controller as a goal. It is G = {tp_1, tp_2, ..., tp_n} for throughput-increasing intents or G = {ee_1, ee_2, ..., ee_n} for energy efficiency-increasing intents. Note that these goals can be generalized to other KPIs; however, for simplicity, we target throughput and energy efficiency.
• Extrinsic reward: The meta-controller is responsible for the overall performance of the system. Therefore, we set the extrinsic reward function for the meta-controller to the objective of the problem formulation presented in eq. (4). The following equation is basically the average of the intrinsic reward over n steps:

r_{ex,\tau} = \frac{1}{n} \sum_{\tau=1}^{n} r_{in,\tau}, \qquad \forall u \in U, \; \forall b \in B. \qquad (5)
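To illustrate the MDP above, the snippet below enumerates the controller's action space as all non-empty combinations of the three xApps and computes the intrinsic and extrinsic rewards of eqs. (4) and (5). The penalty value and helper names are illustrative assumptions.

```python
from itertools import chain, combinations

XAPPS = ("traffic_steering", "cell_sleeping", "beamforming")

# Controller action space: every non-empty combination of xApps, matching
# {A_xApp1, A_xApp1,2, ..., A_xAppN} above.
ACTIONS = list(chain.from_iterable(
    combinations(XAPPS, r) for r in range(1, len(XAPPS) + 1)))

RHO = 1.0  # QoS-violation penalty rho (illustrative value)

def intrinsic_reward(kpi_gain: float, n_qos_violations: int) -> float:
    """r_in = P_i - rho * xi_z, where P_i is the improvement of the intended KPI."""
    return kpi_gain - RHO * n_qos_violations

def extrinsic_reward(intrinsic_rewards) -> float:
    """Eq. (5): the meta-controller reward is the average intrinsic reward over n steps."""
    return sum(intrinsic_rewards) / len(intrinsic_rewards)
```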
The whole process of xApp orchestration can be summarized as follows:
• Step 1: The operator's intent is provided as input, specifying which performance metric is to be improved.
• Step 2: These performance targets are provided as goals to the controller in the near-RT-RIC by the meta-controller rApp in the non-RT-RIC.
• Step 3: The controller selects an xApp or a combination of xApps to reach the target performance as closely as possible. The system learns based on the reward it gets for such an xApp selection.
• Step 4: The selected xApps, with their own DRL-based functionalities, optimize the performance of the network as a response to the intent of the operator.

Fig. 2. Intent-based xApp orchestration with macro and micro cells having different types of traffic.

B. Baseline Algorithms

This section includes two baselines. The first baseline is a simulation of the same network scenario based on the system model presented so far, where there is no intelligent DRL-based xApp to optimize the network; instead, we use non-ML algorithms. For comparing the throughput performance of the proposed HRL-based system, we use the threshold-based traffic steering scheme proposed in [19]. It uses a predefined threshold, which is determined considering the load at each station, the channel condition, and the user service type. The mean of all these metrics is taken to obtain the threshold (T) value, and a weighted summation of the same parameters is taken to form a variable (w). The traffic is then steered to another BS based on the w and T values. This baseline does not include cell sleeping; therefore, the BSs are always on. In our second baseline, we consider single xApp scenarios; for example, the proposed HRL-based xApp orchestration mechanism is compared with scenarios where only the traffic steering xApp is in action, or only the cell sleeping xApp is in action.
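For reference, a minimal sketch of the threshold-based steering rule of this baseline [19] is shown below; the metric scaling and weights are our assumptions, since the exact procedure is defined in [19].

```python
import numpy as np

# Per-BS metrics: load, channel condition, service-type priority (all scaled to [0, 1]).
WEIGHTS = np.array([0.4, 0.4, 0.2])  # assumed weights for the weighted summation w

def steer_to_other_bs(load: float, channel_quality: float, service_priority: float) -> bool:
    """Steer the flow away from the current BS when the weighted score exceeds the threshold."""
    metrics = np.array([load, channel_quality, service_priority])
    threshold = metrics.mean()   # T: mean of the metrics
    score = WEIGHTS @ metrics    # w: weighted summation of the same metrics
    return score > threshold
```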
V. PERFORMANCE EVALUATION

A. Simulation setup

A MATLAB-based simulation environment has been developed with one eNB and four gNBs serving as one macro cell and four small cells. In total, we deploy 60 UEs with five different traffic types: voice, gaming, video, URLLC, and eMBB. The different types of traffic in the system have different requirements in terms of different KPIs. The QoS requirements of the different traffic types have been defined based on our previous work [20]; the eMBB and URLLC traffic types have been added to test the system compatibility. For the eMBB traffic type, we consider the packet size, T_QoS, and D_QoS to be 1500 bytes, 100 Mbps, and 15 ms, respectively [21]. Lastly, the delay and packet size for the URLLC traffic are set to 2.5 ms and 32 bytes, respectively.
A 5G NSA mode is used, in which different types of RAT (LTE and 5G) in the simulation environment work together. We deploy an architecture based on [22]. The carrier frequency for LTE is set to 800 MHz. For the 5G NR small cells, band-switching BSs are deployed at 3.5 GHz and 30 GHz. The BS transmission power for LTE and 5G NR is set to 38 dBm and 43 dBm, respectively [23].

For the HRL implementation, the initial learning rate is set to 0.95. In order to maintain stable learning performance, we reduce the learning rate periodically after a certain number of episodes. Additionally, the discount factor used is 0.3. The simulation is conducted 10 times using MATLAB, and the average outcomes are presented along with a 95% confidence interval.
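The reported averages and 95% confidence intervals over the 10 independent runs can be reproduced with a standard Student-t interval, as sketched below (the paper's processing is done in MATLAB; the run values here are hypothetical).

```python
import numpy as np
from scipy import stats

def mean_with_ci(run_results, confidence: float = 0.95):
    """Mean and confidence interval over independent simulation runs (e.g., 10 runs)."""
    x = np.asarray(run_results, dtype=float)
    mean = x.mean()
    # Student-t interval, appropriate for a small number of runs
    half_width = stats.t.ppf((1 + confidence) / 2, df=len(x) - 1) * stats.sem(x)
    return mean, (mean - half_width, mean + half_width)

# Example with hypothetical per-run average throughput values (Mbps):
print(mean_with_ci([48.2, 50.1, 49.5, 47.8, 50.6, 49.0, 48.7, 50.3, 49.9, 48.4]))
```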
B. Simulation results

Before conducting the performance evaluation of the proposed xApp orchestration scheme, we first present how the intent-oriented HRL-based orchestration scheme works. Fig. 3 shows that the operator intent of "increase throughput" leads to the selection of certain xApps. When there is a 5% throughput increase intent from the operator, after a few time slots there is a sharp increase in throughput. This is because xApp1 (the traffic steering xApp) has been invoked. When a 5% increase is again given as an input, a combination of xApp1 and xApp3 (the intelligent beamforming xApp) is selected. When the operator provides an intent to decrease power consumption by 5%, we can see from Fig. 3 that there is a sharp decrease in throughput. This is because xApp1 and xApp3 have been terminated at the 461st time slot and xApp2 (the cell sleeping xApp) has been invoked.

Fig. 3. Impact of operator intents on throughput.

Fig. 4 presents a similar graph to the previous one, but this time it plots the energy efficiency over time. When there is an intent from the operator to achieve a "10% increase in energy efficiency", we can see that xApp2 is initiated at the 131st time slot. This xApp performs cell sleeping and saves energy. For the next energy efficiency increase intent given by the operator, it can be seen that both xApp2 and xApp3 work together. The proposed HRL-based algorithm has successfully orchestrated these two xApps for the desired performance gain. Figs. 3 and 4 show the utility of the proposed system: not only can it take an operator intent as an optimization goal, but it can also orchestrate xApps to attain the desired performance output by using the proper combination of xApps.

Fig. 4. Impact of operator intents on energy efficiency.

Fig. 5 shows the performance comparison between the proposed HRL-based xApp orchestration scheme and the baseline scenarios in terms of average system throughput. The results are obtained under a constant load of 6 Mbps. The proposed orchestration scheme achieves a 21.4% and a 7.5% increase in average system throughput compared to the non-ML algorithm and the single xApp scenario (traffic steering xApp), respectively. This is because of the efficient orchestration mechanism that involves multiple xApps and triggers the optimal combination of xApps to reach better performance based on the operator intent.

Fig. 5. Performance comparison in terms of average system throughput.

Fig. 6 shows the performance comparison between the proposed HRL-based xApp orchestration scheme and the baseline scenarios in terms of average energy efficiency. The proposed orchestration scheme obtains a 17.3% and a 37.9% increase in average energy efficiency compared to the single xApp scenario (cell sleeping xApp) and the non-ML baseline, respectively. Similar to the former case, this is because of the HRL-based orchestration mechanism that incorporates multiple xApps to achieve better performance based on the operator intent. Also, note that we use traffic steering in the former figure and cell sleeping in this evaluation because they specifically optimize throughput and energy, respectively.

Fig. 6. Performance comparison in terms of energy efficiency.

VI. CONCLUSIONS

In this paper, we show that the HRL-based intent-driven orchestration mechanism is highly effective in not only optimizing KPIs but also providing great flexibility and control to the operator. In this study, we have introduced a novel HRL-based xApp orchestration mechanism that can perform xApp management and provide recommendations for the best combination of xApps given the operator's intent. The optimal xApp orchestration scheme has led to a 7.5% increase in average system throughput and a 17.3% increase in energy efficiency compared to single xApp usage with no orchestration. In our future work, we plan to extend this orchestration to rApps and other xApps with complex KPI interactions.
ACKNOWLEDGEMENT

This work has been supported by MITACS and Ericsson Canada, and NSERC Canada Research Chairs and NSERC Collaborative Research and Training Experience Program (CREATE) under Grant 497981.

REFERENCES

[1] L. Bonati, S. D'Oro, M. Polese, S. Basagni, and T. Melodia, "Intelligence and Learning in O-RAN for Data-Driven NextG Cellular Networks," IEEE Communications Magazine, vol. 59, no. 10, pp. 21–27, 2021.
[2] M. A. Habib, H. Zhou, P. E. Iturria-Rivera, M. Elsayed, M. Bavand, R. Gaigalas, S. Furr, and M. Erol-Kantarci, "Traffic Steering for 5G Multi-RAT Deployments using Deep Reinforcement Learning," in 2023 IEEE 20th Consumer Communications & Networking Conference (CCNC), 2023, pp. 164–169.
[3] Y. Dantas, P. E. Iturria-Rivera, H. Zhou, M. Bavand, M. Elsayed, R. Gaigalas, and M. Erol-Kantarci, "Beam Selection for Energy-Efficient mmWave Network Using Advantage Actor Critic Learning," 2023.
[4] H. Zhou, L. Kong, M. Elsayed, M. Bavand, R. Gaigalas, S. Furr, and M. Erol-Kantarci, "Hierarchical Reinforcement Learning for RIS-Assisted Energy-Efficient RAN," in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, 2022, pp. 3326–3331.
[5] K. Mehmood, K. Kralevska, and D. Palma, "Intent-Driven Autonomous Network and Service Management in Future Cellular Networks: A Structured Literature Review," Computer Networks, vol. 220, p. 109477, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1389128622005114
[6] A. Leivadeas and M. Falkner, "A Survey on Intent-Based Networking," IEEE Communications Surveys & Tutorials, vol. 25, no. 1, pp. 625–655, 2023.
[7] H. Zhang, H. Zhou, and M. Erol-Kantarci, "Team Learning-Based Resource Allocation for Open Radio Access Network (O-RAN)," in ICC 2022 - IEEE International Conference on Communications, 2022, pp. 4938–4943.
[8] M. Polese, L. Bonati, S. D'Oro, S. Basagni, and T. Melodia, "ColO-RAN: Developing Machine Learning-based xApps for Open RAN Closed-loop Control on Programmable Experimental Platforms," 2022.
[9] A. Kliks, M. Dryjanski, V. Ram, L. Wong, and P. Harvey, "Towards Autonomous Open Radio Access Networks," ITU Journal on Future and Evolving Technologies, vol. 4, no. 2, 2023.
[10] S. D'Oro, L. Bonati, M. Polese, and T. Melodia, "OrchestRAN: Network Automation Through Orchestrated Intelligence in the Open RAN," in IEEE INFOCOM 2022 - IEEE Conference on Computer Communications, 2022, pp. 270–279.
[11] A. Banerjee, S. S. Mwanje, and G. Carle, "An Intent-Driven Orchestration of Cognitive Autonomous Networks for RAN Management," in 2021 17th International Conference on Network and Service Management (CNSM), 2021, pp. 380–384.
[12] J. Zhang, J. Guo, C. Yang, X. Mi, L. Jiao, X. Zhu, L. Cao, and R. Li, "A Conflict Resolution Scheme in Intent-Driven Network," in 2021 IEEE/CIC International Conference on Communications in China (ICCC), 2021, pp. 23–28.
[13] E. J. Scheid, C. C. Machado, M. F. Franco, R. L. dos Santos, R. P. Pfitscher, A. E. Schaeffer-Filho, and L. Z. Granville, "INSpire: Integrated NFV-based Intent Refinement Environment," in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), 2017, pp. 186–194.
[14] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-Level Control Through Deep Reinforcement Learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[15] P. Ren and M. Tao, "A Decentralized Sleep Mechanism in Heterogeneous Cellular Networks with QoS Constraints," IEEE Wireless Communications Letters, vol. 3, no. 5, pp. 509–512, Oct. 2014.
[16] F. B. Mismar, A. Alammouri, A. Alkhateeb, J. G. Andrews, and B. L. Evans, "Deep Learning Predictive Band Switching in Wireless Networks," IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 96–109, 2021.
[17] F. B. Mismar, B. L. Evans, and A. Alkhateeb, "Deep Reinforcement Learning for 5G Networks: Joint Beamforming, Power Control, and Interference Coordination," IEEE Transactions on Communications, vol. 68, no. 3, pp. 1581–1592, 2020.
[18] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. B. Tenenbaum, "Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation," CoRR, vol. abs/1604.06057, 2016. [Online]. Available: http://arxiv.org/abs/1604.06057
[19] M. Khaturia, P. Jha, and A. Karandikar, "5G-Flow: A Unified Multi-RAT RAN Architecture for Beyond 5G Networks," Computer Networks, vol. 198, p. 108412, 2021.
[20] M. A. Habib, H. Zhou, P. E. Iturria-Rivera, M. Elsayed, M. Bavand, R. Gaigalas, Y. Ozcan, and M. Erol-Kantarci, "Hierarchical Reinforcement Learning Based Traffic Steering in Multi-RAT 5G Deployments," 2023.
[21] A. Chagdali, S. E. Elayoubi, and A. M. Masucci, "Impact of Slice Function Placement on the Performance of URLLC with Redundant Coverage," in 2020 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2020, pp. 1–6.
[22] P. Frenger and R. Tano. (2019) A Technical Look at 5G Energy Consumption and Performance. [Online]. Available: https://www.ericsson.com/en/blog/2019/9/energy-consumption-5G-nr
[23] E. Dahlman, S. Parkvall, and J. Skold, 4G, LTE-Advanced Pro and The Road to 5G, 3rd ed. USA: Academic Press, Inc., 2016.
