Conference Paper · December 2021
DOI: 10.23919/ITUK53220.2021.9662100


REINFORCEMENT LEARNING FOR SCHEDULING AND MIMO BEAM SELECTION
USING CAVIAR SIMULATIONS

João Paulo Tavares Borges1 ; Ailton Pinto de Oliveira1 ; Felipe Henrique Bastos e Bastos1 ; Daniel Takashi Né do Nascimento
Suzuki1 ; Emerson Santos de Oliveira Junior1 ; Lucas Matni Bezerra2 ; Cleverson Veloso Nahum1 ; Pedro dos Santos Batista3 ;
Aldebaro Barreto da Rocha Klautau Júnior1
1 Universidade Federal do Pará, Belém 66075-110, Brazil
2 Universidade Estácio de Sá, Belém 66055-260, Brazil
3 Ericsson Research, 164 80 Stockholm, Sweden

ABSTRACT

This paper describes a framework for research on Reinforcement Learning (RL) applied to scheduling and MIMO beam selection. This framework consists of asking the RL agent to schedule a user and then choose the index of a beamforming codebook to serve it. A key aspect of this problem is that the simulation of the communication system and the artificial intelligence engine is based on a virtual world created with AirSim and the Unreal Engine. These components enable the so-called CAVIAR methodology, which leads to highly realistic 3D scenarios. This paper describes the communication and RL modeling adopted in the framework and also presents statistics concerning the implemented RL environment, such as data traffic, as well as results for three baseline systems.

Figure 1 – CAVIAR simulation scenario, depicting the radiation pattern (in light green) corresponding to the chosen beamforming codebook index to serve a drone (at the right).

Keywords – 5G, 6G, beam selection, MIMO, mmWave, RL

1. INTRODUCTION

Reinforcement Learning (RL) is a learning paradigm suitable for problems in which an agent has to maximize a given reward while interacting with an ever-changing environment. This class of problems appears in several points of interest in 5th Generation (5G) and 6th Generation (6G) mobile networks, such as congestion control [1], network slicing [2], resource allocation [3], and the 5G Physical Layer (PHY) [4]. However, the lack of freely available datasets or environments to train and assess RL agents is a practical obstacle that delays the widespread adoption of RL in 5G and future networks.

To address this challenge, some works explore the use of virtual worlds to generate datasets by creating environments for communications in general [5], and for Artificial Intelligence (AI) / Machine Learning (ML) applied to 5G/6G [6], leveraging the fact that 5G-and-beyond systems will benefit from rich contextual information to improve performance and reduce the loss of radio resources needed to support their services [4, 7, 8]. So, the key idea in this paper is to use realistic representations of deployment sites together with physics and sensor simulations, to generate a virtual representation that, combined with the communication network simulator, enables training RL agents for tasks such as beam selection.

Systems such as IEEE 802.11ad are usually designed for worst-case scenarios and, in most situations, continuously send signals that do not carry information (overhead) [9]. This overhead may represent a significant parcel of the channel capacity, and decreasing it is a fundamental problem whose solution can enable systems to improve the usage of physical resources (e.g., with lower latency and higher bit rates) [10, 11, 12].

In this work, the beam selection and user scheduling problems are posed as a game that must be solved with RL. The game is based on a simulation methodology named Communication Networks, Artificial Intelligence and Computer Vision with 3D Computer-Generated Imagery (CAVIAR), with a preliminary version proposed in [13]. The CAVIAR simulation integrates three subsystems: the communication system, the AI and ML models, and finally the virtual world components. In this paper, the problem is based on simulating a communication system immersed in a virtual world created with AirSim [14] and Unreal Engine [15].

More specifically, the goal is to schedule and allocate resources to Unmanned Aerial Vehicles (UAVs), cars and pedestrians, composing a scenario with aerial and terrestrial

978-92-61-33881-7/CFP2168P @ ITU 2021 Kaleidoscope

Authorized licensed use limited to: UNIVERSIDADE FEDERAL DO PARA. Downloaded on February 10,2022 at 17:42:35 UTC from IEEE Xplore. Restrictions apply.
2021 ITU Kaleidoscope Academic Conference

Figure 2 – CAVIAR simulation overview.

User Equipment (UE). The RL agent is executed at the Base Station (BS) and periodically takes actions based on the information captured from the environment, which includes channel estimates, buffer status, and positions from a Global Navigation Satellite System, such as GPS. The RL agent receives a reward based on the service provided to the users. The training occurs “offline”, without rendering the 3D scenes, but it is possible to render the output in a post-processing step and generate a video.

This work is organized as follows. In Section 2 we discuss CAVIAR simulations in general and the specific RL problem addressed in this paper. Sections 3 and 4 describe the communication and machine learning models, respectively. Simulation results are presented in Section 5, while Section 6 concludes the paper.

2. CAVIAR SIMULATIONS

As proposed in [6] and shown in Figure 2, the CAVIAR framework incorporates three subsystems: AI/ML, virtual world, and wireless communications. In the following paragraphs we describe the framework, focusing first on the overall description of the methodology and then on how it was realized in the user scheduling and beam selection environment.

RL tasks can be continuous or episodic; the latter category corresponds to the context adopted in this work. Figure 3 exemplifies the CAVIAR data generation pipeline: the data set is provided as a set of Comma-Separated Values (CSV) files, which are named episodes, containing the trajectory data of all moving objects within a simulation. To generate an episode, a waypoint file, which is a text file with reference points, must be executed by AirSim. During its execution, the information from the mobile elements is stored in the episode. Each episode lasts about three minutes, with a sampling interval of ten milliseconds, and is composed by columns related to position and orientation for pedestrians and cars, with the addition of acceleration and linear and angular velocities for UAVs. To use the episode files to obtain information from Multiple-Input Multiple-Output (MIMO) channels and data traffic, one must execute them within the CAVIAR simulation environment.

Figure 3 – CAVIAR data generation: a waypoint generator feeds Unreal/AirSim, which produces episodes that the simulation environment converts into MIMO channels, combined channel magnitudes, and data traffic.

2.1 Overall CAVIAR description

As previously mentioned, Figure 2 displays an overview of the expected components in a CAVIAR simulation. In summary, the blocks encompassing the proposed simulation strategy can be described as follows: the Communications Engine


handles all information regarding the communication aspect of the simulation, such as data traffic, buffer and channel information. In the CAVIAR simulations, channels can be pre-computed and the communication simulation decoupled from the physics engine, as often done in AI/ML applied to beam selection [7, 4].

The 3D assets used in the Environment and as Mobile entities, such as UAVs, cars, buildings, etc., are either created or obtained online, as described in [4]. They compose the simulation environment as fixed or mobile objects, whose eventual movements and interactions are managed by the Mobility engine and by the Physics engine of the virtual world subsystem, respectively. The Sensors engine output constitutes the input to the AI/ML frontend engine.

The AI/ML engine receives signals and communication parameters that were simulated in the virtual-world environment and suggests actions that are then implemented by the Orchestrator, which also considers parameters from the Communications engine and Environment. An example of an action in the context of beam selection would be sending to the base station a list of codebook indices to try, to avoid a full beam sweeping.

2.2 CAVIAR simulation for user scheduling and beam selection problems

For the user scheduling and beam selection problems, the Communications engine used by CAVIAR simulations was defined by geometric channel models, further described in Section 3. The Physics engine and the Mobility engine are handled by Unreal Engine and AirSim, and finally, for the AI/ML engine, we assume an RL environment. Upon having the episodes available, the environment can be executed to allow an agent to assume the role of a BS, scheduling and serving the users with a specific beam chosen from its codebook. The episodes can be used to train and test agents, as well as to store the outputs in a CSV file, which can be used as is or reproduced graphically in Unreal, as described in Figure 4.

Figure 4 – CAVIAR simulation flow: a waypoint generator feeds Unreal/AirSim, which produces episodes consumed by the simulation environment for RL agent training and testing; the agent output file (scheduled user and codebook index) can then be rendered.

Using a virtual scenario provided by CAVIAR, three mobile entities (a pedestrian, a car, and a UAV) are simulated in order to generate a data set of urban mobility. This data is organized in episodes that contain the spatial information (position, orientation, acceleration, etc.) of each mobile entity. For this problem, the samples are collected every 10 ms and they contain information of 37 entities (34 pedestrians, 2 cars, and 1 UAV).

As shown in Figure 4, the spatial data generated by the virtual scenario is the input for the CAVIAR simulation environment, more specifically, the communication engine, which is responsible for computing the radio channels and other parameters related to the telecommunication system, such as buffer size, etc. The output of the communication engine, along with the spatial data, is the input for an RL agent that is trained to choose, in each time slot, a user to serve and the beam that should be used.

3. COMMUNICATION MODEL

The simulation environment incorporates not only the RL-related functions required by the OpenAI Gym Application Programming Interface (API), but also the functions related to the communication system. This section describes the adopted communication model.

We assume downlink transmission using a carrier frequency fc = 60 GHz and a bandwidth of 100 MHz. The BS serves three distinct receivers or UEs, which are located at a car, at a UAV and used by a pedestrian. The BS has an individual buffer with a size of 1 Gb for each receiver. We assume that each packet has 8188 bytes. When the buffer of a specific user becomes full, the newly arrived packets are dropped.

The MIMO system corresponds to an analog architecture using a Uniform Planar Array (UPA) with Nt antenna elements at the BS, and receivers with UPAs with Nr antennas. Therefore, the MIMO channel between the BS and a given user is represented by an Nr × Nt matrix H. The codebooks are obtained from Discrete Fourier Transform (DFT) matrices and denoted by Ct = {w̄1, ..., w̄Nt} and Cr = {f̄1, ..., f̄Nr}. They are used at the transmitter and the receiver sides, respectively. When modeling the beam selection, the chosen beam pair [p, q] is represented by a unique index i ∈ {1, 2, ..., M}, where M ≤ Nt Nr. Each index p (or q) generates a specific radiation pattern, as depicted in Figure 5 for Nt = 64.

For the i-th index, the equivalent channel is calculated as

    yi = wi* H fi,    (1)

and the optimal beam index î is given by

    î = arg max_{i ∈ {1, ..., M}} |yi|.    (2)

We do not take noise into account in order to isolate the impact of the beam selection procedure.

Ray Tracing (RT) was used in [4] to generate realistic communication channels H. For this paper, we have not used RT but a simpler procedure based on the geometric MIMO


channel model [16]. The reason for this choice is that we first want to evolve the AI/ML engine such that newer versions allow rendering in (near) real-time along with the training of the RL agent. Right now, we are not able to render each scene and the agent choices, but CAVIAR-v2 is being developed with this goal. We will later work on CAVIAR-RT-v1, where RT stands for support to ray tracing.

The simplified H currently represents a Line-of-Sight (LoS) channel. A narrowband channel model [16] is used, but wideband models can be readily incorporated in case their extra computational cost is not an issue. For simplicity, the users have a single antenna (Nr = 1) while the BS has an 8 × 8 UPA (Nt = 64).

Figure 5 – Example of radiation pattern for a specific beam index with an 8 × 8 UPA.

The geometric channel model [16] is adopted with L = 2 Multipath Components (MPCs):

    H = sqrt(Nt Nr / L) Σ_{ℓ=1}^{L} αℓ ar(φℓ^A, θℓ^A) at*(φℓ^D, θℓ^D).    (3)

The parameters in Eq. (3) are obtained as follows. The phase of the complex gain αℓ is obtained from a uniform distribution with support [0, 2π]. For generating the magnitude |αℓ|, first the distance d between the BS and the given receiver is used to calculate the received power via the Friis equation [17]. The path loss is obtained from this equation and determines |αℓ|, which decreases with d. The elevation φℓ and azimuth θℓ angles, for departure (e.g., φℓ^D) and arrival (e.g., φℓ^A), are obtained from the orientation provided by the LoS path. The nominal LoS angles are slightly changed by adding to them Gaussian random variables with zero mean and a variance of 1 degree. These angles are used to compose the steering vectors at and ar.

3.1 Traffic model

The users’ data traffic is defined as Poisson processes with time-varying mean λu[t] for user u. We specified two different network load scenarios, representing light and heavy network traffic, to simulate the traffic variations along the scenes. The simulation alternates between light and heavy traffic every 1000 scenes. The total throughput of the heavy scenario is higher than that of the light one. Each user has a specific traffic magnitude defined as a percentage of the total throughput, in accordance with Table 1, enabling the differentiation of applications. Figure 6 shows the histogram of traffic throughput for each user in Gbps. Each user presents the heavy and light traffic behavior. The incoming traffic for each user is buffered when there is buffer space available; otherwise, the excess packets are tail-dropped. Packets are also dropped when they occupy the buffer for more than 10 seconds.

Table 1 – Network load information for light and heavy scenarios.

Network load | Total throughput | UAV (%) | Pedestrian (%) | Car (%)
Light        | 0.48 Gbps        | 50%     | 20%            | 30%
Heavy        | 0.96 Gbps        | 50%     | 20%            | 30%

Figure 6 – Histogram of packets traffic received by the BS for each user.

4. MACHINE LEARNING MODEL

4.1 Evaluation of RL agents

To evaluate the RL agent, the return G over the test episodes is used. The return Ge for episode e is

    Ge = Σ_{t=1}^{Ns^e} re[t],    (4)

where Ns^e is the number of scenes in episode e. The corresponding reward re[t] at discrete time t is a weighted sum of transmitted and discarded packets given by

    re[t] = (Ptx[t] − 2 Pd[t]) / Pb[t],    (5)

where Ptx[t], Pd[t], and Pb[t] correspond, respectively, to the total amount (summation over all users) of transmitted, dropped, and buffered packets at time t. The reward re[t] is restricted to the range −2 ≤ re[t] ≤ 1. At each time t, a single user can be served, but Pb[t] accounts for the number of packets in all three buffers. Hence, re[t] = 1 only if all buffered packages


of the scheduled user are transmitted, while the buffers of the other two users were empty.

4.2 Possible inputs to RL agents

The inputs (also known as states or observations) can be selected both from the information provided in the CSV files (position (x, y, z), velocities, etc.) and from quantities obtained from the environment, such as the buffer state and the channel information for the specific beam index previously chosen.

More specifically, the RL agent can use: the UEs’ geographic position in X, Y, Z, with the origin of the coordinate system being on the BS; the UEs’ orientation in the three rotation coordinates (the front and side roll angles, as well as the rotation over its own axis); the numbers of dropped, transmitted and buffered packets; and, finally, the bit rate and the channel magnitude at each step of the simulation.

Note that we assume the BS (more specifically, the RL agent) does not know the best index î. In practice, this would require a full beam sweep, which is assumed to be unfeasible in our model due to stringent time requirements. Similarly, given that the RL agent chose user u and beam index j at time t, it learns only the magnitude |yj| and the spectral efficiency Su,t,j for this specific user and beam index at time t.

The channel throughput Tu,t,j = Su,t,j BW is obtained by multiplying the spectral efficiency by the bandwidth BW and indicates the maximum number of bits that can be transmitted. An empirical factor is used to adjust Tu,t,j in order to define the network load, such that, for the given input traffic, some packets have to be dropped.

Algorithm 1 summarizes the steps for executing an already trained RL agent.

4.3 Experiment description

We developed an experiment using CAVIAR for the problem of scheduling and beam selection. Given that a complete episode file contains information about all moving objects in a scene (all pedestrians, cars, etc.), we simplified the data generated by the simulation, assuming that the beam selection RL agent, named B-RL, only uses data from the three served users (uav1, simulation_car1 and simulation_pedestrian1).

Figure 7 – Channel maximum throughput when always using the best beam index î and a simple scheduling strategy that chooses users sequentially (1-2-3-1-2-3...), in a round-robin fashion.

The following results are extracted from an Advantage Actor Critic (A2C) agent from Stable-Baselines [18], trained with default parameters. The states of the agent are defined by seven features: X, Y, Z, packets dropped, packets transmitted, buffered packets, and bit rate. The action space is composed of a vector with two integers: the numeric identity of the user being allocated at the specific timestamp, which ranges in [0, 2], and the codebook index of the beam used to serve it, an integer in the range [0, 63]. Finally, the reward is the one given by Eq. (5).
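For concreteness, this observation/action interface can be sketched as follows. The encoding is illustrative only (the names and example values are ours, not from the CAVIAR codebase); with the OpenAI Gym API, the action space would typically be declared as `spaces.MultiDiscrete([3, 64])`.

```python
import numpy as np

# Illustrative constants taken from the text above.
N_USERS = 3    # uav1, simulation_car1, simulation_pedestrian1
N_BEAMS = 64   # DFT codebook indices for the 8 x 8 UPA

def make_observation(pos_xyz, dropped, transmitted, buffered, bit_rate):
    """Pack the seven features used by the A2C agent: X, Y, Z,
    packets dropped, packets transmitted, buffered packets, bit rate."""
    x, y, z = pos_xyz
    return np.array([x, y, z, dropped, transmitted, buffered, bit_rate],
                    dtype=np.float32)

def is_valid_action(action):
    """An action is (scheduled user in [0, 2], beam index in [0, 63])."""
    user, beam = action
    return 0 <= user < N_USERS and 0 <= beam < N_BEAMS

obs = make_observation((10.0, -4.2, 30.0), dropped=0, transmitted=12,
                       buffered=340, bit_rate=1.2e8)
assert obs.shape == (7,)
assert is_valid_action((2, 63)) and not is_valid_action((3, 0))
```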
Algorithm 1: High-level algorithm of the RL-based scheduling and beam selection problem.
    Initialization for a given episode e;
    while t ≤ Ns^e do
        1) Based on the number of bits in the buffers of the users and other input information, the RL agent schedules user u and selects beam index i;
        2) Environment calculates the combined channel magnitude |yi| and the corresponding throughput Ti;
        3) The number of transmitted bits is Ri = min(Tu,t,j; bu);
        4) Update buffers;
        5) Receive new packets;
        6) Eventually drop packets;
        7) Environment calculates rewards and updates its state;
        8) Update buffers again;
    end

Because the RL agent was designed to play the role of a simple example and not to optimize performance, two other agents were developed: B-Dummy and B-BeamOracle. The B-Dummy agent assumes random action choices for both the scheduled user and the beam index to use. The B-BeamOracle agent follows a sequential user scheduling pattern (1-2-3-1-2-3, ...) in a round-robin fashion, and always uses the optimum beam index î for the selected user. In Figure 7 we characterize the channel maximum throughput of this experiment when using B-BeamOracle.

5. EXPERIMENT RESULTS

The CAVIAR environment was used to generate 70 episodes, from which 50 were used for training the RL agent, and 20 for testing. We present results for the three agents: B-Dummy, B-BeamOracle and the RL-based A2C agent.
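To make the per-scene loop of Algorithm 1 concrete, the sketch below runs a toy version of it: a round-robin scheduler stands in for the RL policy, the beam is chosen by exhaustive search over a DFT codebook (Eqs. (1)-(2), with Nr = 1 so the combiner is trivial), and the reward follows Eq. (5). All constants here (buffer capacity in packets, throughput scaling, arrival rates, the random channel) are toy values of ours, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

NT = 64                                # BS antennas (8 x 8 UPA); users have Nr = 1
N_USERS, N_SCENES = 3, 12
BUFFER_CAP = 1000                      # toy buffer capacity in packets
CODEBOOK = np.fft.fft(np.eye(NT)) / np.sqrt(NT)   # columns are DFT beams f_i

def best_beam(h):
    """Eqs. (1)-(2) with Nr = 1: pick the index i maximizing |h^H f_i|."""
    return int(np.argmax(np.abs(CODEBOOK.conj().T @ h)))

buffers = np.zeros(N_USERS, dtype=int)
total_return = 0.0                     # Eq. (4): sum of the per-scene rewards
for t in range(N_SCENES):
    # 1) Schedule a user (round-robin stand-in for the RL policy) and pick a beam.
    u = t % N_USERS
    h = (rng.normal(size=NT) + 1j * rng.normal(size=NT)) / np.sqrt(2)  # toy channel
    i = best_beam(h)
    # 2)-4) Combined magnitude -> throughput -> transmitted packets; update buffer.
    gain = np.abs(CODEBOOK[:, i].conj() @ h)
    tx = min(int(50 * gain), buffers[u])   # toy throughput scaling
    buffers[u] -= tx
    # 5)-6) Poisson arrivals; packets beyond the buffer capacity are dropped.
    arrivals = rng.poisson(40, size=N_USERS)
    dropped = int(np.maximum(buffers + arrivals - BUFFER_CAP, 0).sum())
    buffers = np.minimum(buffers + arrivals, BUFFER_CAP)
    # 7)-8) Reward in the spirit of Eq. (5): (Ptx - 2*Pd) / Pb over all users.
    pb = int(buffers.sum())
    total_return += (tx - 2 * dropped) / pb if pb > 0 else 0.0
```

A trained agent (e.g., the Stable-Baselines A2C above) would simply replace the round-robin line with the action returned by its policy.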

In Figure 8, it is possible to verify the switching at every 1000


samples, between the “heavy” and the “light” data traffic. The sequential scheduling proves to be sufficient to attend the demand in light traffic situations; however, in intense traffic moments, even using the best beam index î, without proper scheduling the reward tends to be negative.

Figure 8 – Reward obtained by the B-BeamOracle agent for a given episode. The traffic load switches every 1000 time steps between “heavy” and “light”.

Figure 9 shows a reward histogram for the different agents along the 20 test episodes. As expected, the B-BeamOracle presents the best performance, while the B-RL achieves performance close to the B-Dummy, which simply uses random actions. One reason for the bad performance of B-RL is the choice of its input parameters: none of the seven features helps the agent to directly learn the user and beam index used in its previous decision. Better modeling of the agent can substantially improve its performance.

Figure 9 – Histogram of the total sum of rewards achieved in the test episodes.

6. CONCLUSION

This paper presented a framework for research on RL applied to scheduling and MIMO beam selection. Using the framework, we provided statistics of an experiment in which an RL agent faces the problems of user scheduling and beam selection. The experiment allowed us to validate the designed environment for RL training and testing. Future development will focus on rendering the 3D scenarios while training the RL agent, as well as on using more realistic channels via ray tracing.

ACKNOWLEDGEMENTS

This work was supported in part by the Innovation Center, Ericsson Telecomunicações S.A., Brazil, CNPq and the Capes Foundation.

REFERENCES

[1] I. Nascimento, R. Souza, S. Lins, A. Silva, and A. Klautau, “Deep Reinforcement Learning Applied to Congestion Control in Fronthaul Networks,” in Proceedings of the 2019 IEEE Latin-American Conference on Communications (LATINCOM), 2019, pp. 1–6.

[2] Y. Kim and H. Lim, “Multi-agent reinforcement learning-based resource management for end-to-end network slicing,” IEEE Access, vol. 9, pp. 56178–56190, 2021.

[3] X. Wang and T. Zhang, “Reinforcement Learning Based Resource Allocation for Network Slicing in 5G C-RAN,” in 2019 Computing, Communications and IoT Applications (ComComAp), 2019, pp. 106–111.

[4] A. Klautau, P. Batista, N. González-Prelcic, Y. Wang, and R. W. Heath, “5G MIMO data for machine learning: Application to beam-selection using deep learning,” in 2018 Information Theory and Applications Workshop (ITA). IEEE, 2018, pp. 1–9.

[5] E. Egea-Lopez, F. Losilla, J. Pascual-Garcia, and J. M. Molina-Garcia-Pardo, “Vehicular networks simulation with realistic physics,” IEEE Access, vol. 7, pp. 44021–44036, 2019.

[6] A. Klautau, A. de Oliveira, I. P. Trindade, and W. Alves, “Generating MIMO Channels For 6G Virtual Worlds Using Ray-tracing Simulations,” arXiv preprint arXiv:2106.05377, 2021.

[7] A. Klautau, N. González-Prelcic, and R. W. Heath, “LIDAR data for deep learning-based mmWave beam-selection,” IEEE Wireless Communications Letters, vol. 8, no. 3, pp. 909–912, 2019.

[8] W. Jiang, B. Han, M. A. Habibi, and H. D. Schotten, “The road towards 6G: A comprehensive survey,” IEEE Open Journal of the Communications Society, vol. 2, pp. 334–366, 2021.

[9] M. Giordani, M. Mezzavilla, C. N. Barati, S. Rangan, and M. Zorzi, “Comparative analysis of initial access techniques in 5G mmWave cellular networks,” in 2016 Annual Conference on Information Science and Systems (CISS). IEEE, 2016, pp. 268–273.

[10] J. Choi, V. Va, N. Gonzalez-Prelcic, R. Daniels, C. R. Bhat, and R. W. Heath, “Millimeter-wave vehicular communication to support massive automotive sensing,” IEEE Communications Magazine, vol. 54, no. 12, pp. 160–167, 2016.


[11] J. Kim and A. F. Molisch, “Fast millimeter-wave beam training with receive beamforming,” Journal of Communications and Networks, vol. 16, no. 5, pp. 512–522, 2014.

[12] P. Zhou, X. Fang, Y. Fang, Y. Long, R. He, and Z. Han, “Enhanced random access and beam training for millimeter wave wireless local networks with high user density,” IEEE Transactions on Wireless Communications, vol. 16, no. 12, pp. 7760–7773, 2017.

[13] A. Oliveira, F. Bastos, I. Trindade, W. Frazão, A. Nascimento, D. Gomes, F. Müller, and A. Klautau, “Simulation of Machine Learning-Based 6G Systems in Virtual Worlds,” submitted to ITU Journal on Future and Evolving Technologies, 2021.

[14] “Welcome to AirSim,” https://microsoft.github.io/AirSim, accessed: 2021-10-19.

[15] “The most powerful real-time 3D creation tool,” https://www.unrealengine.com, accessed: 2021-10-19.

[16] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An Overview of Signal Processing Techniques for Millimeter Wave MIMO Systems,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 3, pp. 436–453, Apr. 2016.

[17] C. A. Balanis, Antenna Theory: Analysis and Design. John Wiley & Sons, 2015.

[18] “Welcome to Stable Baselines docs! - RL Baselines Made Easy,” https://stable-baselines.readthedocs.io, accessed: 2021-10-19.
