Abstract—Next generation 6G wireless envisions much higher data rates and lower latency compared to 5G wireless networks. Directional antennas with narrow beams, across high mmWave frequencies, hold the key to achieving such high data rates. However, Beam Management (BM), which is the process of finding appropriate transmit and receive beams, poses significant challenges. Dynamic channel variation, user mobility, and narrow beams in high frequency mmWave channels further complicate these challenges. Efficient Machine Learning (ML) strategies can be used to alleviate this overhead. In spite of their underlying differences, sub-6 GHz and high frequency mmWave channels share some similarities in array geometry, number of paths, and surrounding environment. As sub-6 GHz channel characteristics are relatively easier to acquire and learn, we introduce a new machine learning framework, using transformers and LSTMs, to learn sub-6 GHz channel information over time for efficient beam prediction across high frequency mmWave channels. System-level simulation results point out that when using time-series data-based learning of beam patterns with transformers or LSTMs in sub-6 GHz channels, our proposed scheme achieves up to 99.5% top-5 beam prediction accuracy while reducing the BM overhead by over 50% compared to existing work.

Index Terms—6G wireless, beam management, machine learning, transformer, LSTMs, channel measurements

§ This work was done as an intern at MediaTek Inc.

I. INTRODUCTION

The insatiable demand for high data rates and ubiquitous connectivity is fueling the interest in upcoming sixth generation (6G) wireless. The vision of 6G wireless aims to provide an order-of-magnitude more spectrum and ubiquitous connectivity for emerging new applications, like eXtended Reality (XR) and the Metaverse, by exploring high-frequency millimeter wave (mmWave) bands that provide the benefits of a big chunk of available and unexplored frequencies. However, mmWave bands face challenges such as high path loss, blockage, and oxygen absorption. Dense network deployment with a large number of small-size, directional antennas is expected to alleviate these challenges. This will be aided by beam management (BM) to find the (near) optimal transmit/receive beams between the next-generation base station nodes (gNB) and mobile User Equipment (UE).

One major challenge [1] in BM lies in inefficient beam sweeping for the downlink (DL)/uplink (UL) beam correspondence. Beam sweeping involves an exhaustive search to find the beam/beam-pair having the strongest signal strength. Naturally, performing an exhaustive search in all possible directions is quite complex and expensive (both in terms of latency and power). This already complex problem becomes even more complicated in the mmWave bands, located in the frequency spectrum ranging from 30 to 300 GHz. MmWave bands provide the benefits of a big chunk of available and unexplored frequencies. However, these benefits come with a high attenuation cost in free space. To address the high attenuation, a large number of highly directional MIMO antennas are used, whose gain compensates for the path loss. The use of these highly directional MIMO antennas demands precise and efficient beam selection methods to ensure the required application data rate and meet the strict delay requirements. Another challenge such bands pose is the low diffraction capacity and severe blocking caused by most materials. Furthermore, beam searching across these large numbers of narrow beams in high mmWave frequencies incurs more delay and computational power. This makes BM in mmWave more complex and expensive compared to BM in sub-6 GHz bands. Therefore, 6G wireless will continue using sub-6 GHz bands, also known as frequency range 1 (FR1), for outdoor-to-indoor connectivity, while exploiting mmWave bands, or frequency range 2 (FR2), for high data rates in indoor and outdoor environments [2].

Artificial intelligence (AI) and machine learning (ML) are expected to play a key part in alleviating the complexity of 6G wireless. Major wireless standard bodies, like the 3rd Generation Partnership Project (3GPP), anticipate native AI-ML support in 6G wireless. 3GPP's efforts encompass AI-ML applications for networks and radio interfaces, with an eventual goal of standardization [3].
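The exhaustive beam sweep discussed above can be sketched in a few lines of Python. This is a minimal illustration rather than a standards-compliant procedure; the `measure_rsrp` callback and the beam counts are hypothetical placeholders for the per-beam-pair reference-signal measurements:

```python
def exhaustive_beam_sweep(measure_rsrp, num_tx_beams, num_rx_beams):
    """Measure every TX/RX beam pair and return the strongest one.

    measure_rsrp(tx, rx) is a hypothetical callback returning the measured
    RSRP (in dBm) for one beam pair. The sweep issues
    num_tx_beams * num_rx_beams measurements in total.
    """
    best_pair, best_rsrp = None, float("-inf")
    for tx in range(num_tx_beams):
        for rx in range(num_rx_beams):
            rsrp = measure_rsrp(tx, rx)
            if rsrp > best_rsrp:
                best_pair, best_rsrp = (tx, rx), rsrp
    return best_pair, best_rsrp
```

The measurement count is the product of the two codebook sizes, which is why narrower beams (and hence larger codebooks) at mmWave frequencies make the sweep increasingly expensive in latency and power.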
More recently, 3GPP has recognized the use of AI/ML techniques for efficient BM as a key use-case. As many fundamental relationships in wireless communications are often non-linear, deep neural networks (DNNs) have recently been adopted in various fields of wireless communications, including BM [4]–[12].

Interestingly, while different types of DNN models [11], [12] have recently been used for beam prediction across frequencies, time-series data for cross-frequency BM is not yet explored. As BM in wireless involves non-linear mappings across time-varying wireless channels, we believe that exploring time-series data for cross-frequency BM has sufficient potential for improving its efficiency and accuracy. This is where we believe that models such as transformers [13] and long short-term memory (LSTMs) [6] come into play§. Similar to recurrent neural networks (RNNs) [5], LSTM is a deep learning model architecture capable of learning long-term dependencies for sequence prediction problems. A transformer, on the other hand, is another deep learning model architecture that avoids recurrence and instead relies entirely on an 'attention' mechanism, differentially weighing the significance of each part of the input data to obtain global dependencies between input and output. Removing recurrence in favor of attention mechanisms endows transformers with significantly higher parallelization compared to other popular DNN methods like RNNs and LSTMs.

§ RNN was also a candidate, but since it experiences vanishing/exploding gradients for longer sequences, it does not help here.

In this paper, we introduce a new time-series data-based beam prediction using models, such as LSTMs and transformers, that exploits the channel characteristics of a sub-6 GHz channel to choose a mmWave beam. The use of LSTMs incurs a lower number of FLOPs, resulting in lower energy consumption, while transformers can serve inference faster due to their underlying attention mechanism. Our framework lets 6G developers choose which model they need to use based on their specific requirements. Assuming a sub-6 GHz connection is already established, we learn its underlying channel information over time by using LSTM/transformer learning. While learning over sub-6 GHz information aids in reducing the search space needed for the initial mmWave beam establishment, the use of LSTMs/transformers over time series helps in the efficient capturing of time-varying wireless channel characteristics.

The rest of the paper is organized as follows: In Section II we take a look into the major related works on beam prediction and point out our motivation behind using time series information. Section III explains the system model and problem formulation. Subsequently, we explain our proposed time-series based beam prediction method in Section IV. Simulation results in Section V show the efficacy of our solution. Section VI concludes the paper with pointers to future research.

II. RELATED WORKS AND MOTIVATION

The BM problem has recently raised increased research interest, especially for high frequency (e.g., mmWave and THz) 6G wireless networks. These high frequency networks are known to suffer from significant path loss and blockage [14], and hence dense network deployments with large-scale antenna arrays are used to achieve high beamforming gains to compensate for this loss. However, the increased scale of antenna arrays introduces high overhead when UEs try to periodically measure the signal quality of different beams for reporting the beam measurements for beam alignment and potential handovers [15].

Adaptive learning of time-varying wireless channels using AI/ML techniques has recently been used to address this challenge and enable reliable and efficient beam management [16], [17]. The work in [4] uses reinforcement learning to develop a multi-armed bandit framework for beam tracking. A data-driven strategy that fuses the reference signal received power (RSRP) with orientation information using an RNN is developed in [7]. The authors in [8] use LSTM to support large-scale antenna array enabled hybrid, directional BM in hotspot-based virtual small cells. [9] explores LSTM to utilize past channel state information (CSI) for efficient prediction of future channels in vehicular mmWave systems. Deep learning techniques similar to those in Natural Language Processing (NLP) are used in [10] to predict the best serving beams of mobile UEs. Interestingly, the work in [11] points out the efficacy of fully-connected neural networks (FCNN) to closely approximate these mapping functions, required for predicting the optimal mmWave beams from sub-6 GHz channels. [12] also uses deep neural networks to explore the power delay profile (PDP) of a sub-6 GHz channel for predicting the optimal mmWave beams.

Broadly, much of this work has focused on predicting beam parameters by either (i) looking at fewer beam parameters, (ii) looking at parameters over time, or (iii) looking at parameters in a different frequency band. Our focus is on the latter two classes of problems, where AI/ML has recently been used to share information along the temporal and frequency domains individually. However, to the best of our knowledge, exploiting time series data for beam prediction in different frequency bands is not yet explored. As BM in high frequency mmWave bands is more complex and expensive compared to BM in sub-6 GHz bands, and wireless channels are inherently time-varying, we posit that there is additional value in combining the information across the two domains (time and frequency), using temporal information to improve the prediction accuracy of frequency domain beam prediction. Fortunately, LSTMs and transformers provide an efficient tool to capture such time series information [18]. Although originally developed for Natural Language Processing (NLP) [6], [13], over the next few sections we will show how time-series based learning can delve into time-varying wireless channel information to assist and improve cross-frequency BM in 6G wireless.

III. SYSTEM AND CHANNEL MODELS

We consider a system where uplink communication happens in FR1 and downlink in FR2, as shown in Figure 1. We assume $M_{FR1}$ antennas for uplink and $M_{FR2}$ antennas for downlink. We follow a similar system operation and channel models as defined in [11]. We use the uplink channel vector as $\mathbf{h}_{FR1}[k] \in$
$\mathbb{C}^{M_{FR1} \times 1}$ at the $k$-th subcarrier, with $k = 1, \ldots, K$ ($K$ is the total number of subcarriers), and the signal received at the gNB sub-6 GHz or FR1 array as:
$$\mathbf{y}_{FR1}[k] = \mathbf{h}_{FR1}[k] s_p[k] + \mathbf{n}_{FR1}[k], \qquad (1)$$
where $s_p[k]$ is the uplink pilot signal satisfying $\mathbb{E}\|s_p[k]\|^2 = \frac{P_{FR1}}{K}$, with $P_{FR1}$ as the uplink transmit power, and $\mathbf{n}_{FR1}[k] \sim \mathcal{N}_{\mathbb{C}}(0, \sigma^2 \mathbf{I})$ is the noise at the gNB FR1 array. Similarly, defining the downlink beamforming vector as $\mathbf{f} \in \mathbb{C}^{M_{FR2} \times 1}$ leads to the signal received by the UE as:
$$y_{FR2}[\bar{k}] = \mathbf{h}^T_{FR2}[\bar{k}] \mathbf{f} s_d + n_{FR2}[\bar{k}], \qquad (2)$$
where $\mathbf{h}_{FR2}[\bar{k}] \in \mathbb{C}^{M_{FR2} \times 1}$ represents the downlink channel from the mobile UE to the gNB FR2 array at the $\bar{k}$-th subcarrier, $\bar{k} = 1, 2, \ldots, \bar{K}$. Owing to the hardware constraints on the FR2 analog beamforming vectors, these vectors are generally selected from quantized codebooks. Thus, it can be assumed that the beamforming vector $\mathbf{f}$ takes one of the candidate values collected in the codebook $\mathcal{F}$, i.e., $\mathbf{f} \in \mathcal{F}$, with cardinality $\|\mathcal{F}\| = N_{CB}$. We adopt a geometric (physical) model for the FR1 and FR2 channels [19]. Using this allows us to model the FR2 channel as
$$\mathbf{h}_{FR2}[k] = \sum_{d=0}^{D_C - 1} \sum_{l=1}^{L} \alpha_l \, e^{-j \frac{2\pi k}{K} d} \, p(dT_s - \tau_l) \, \mathbf{a}(\theta_l, \phi_l), \qquad (3)$$
where $L$ is the total number of channel paths; $\alpha_l, \tau_l, \theta_l, \phi_l$ are the path gain (including the path loss), the delay, the azimuth angle of arrival (AoA), and the elevation AoA, respectively; and $\mathbf{a}$ is the steering vector for the $l$-th channel path. $T_s$ represents the sampling time, while $D_C$ denotes the cyclic prefix length (assuming that the maximum delay is less than $D_C T_s$). Using this channel model and the system setup mentioned above, the achievable downlink rate for an FR2 channel $\mathbf{h}_{FR2}$ and beamforming vector $\mathbf{f}$ can be written as:
$$R(\{\mathbf{h}_{FR2}[\bar{k}], \mathbf{f}\}) = \sum_{\bar{k}=1}^{\bar{K}} \log_2\!\left(1 + SNR \, \|\mathbf{h}^T_{FR2}[\bar{k}] \mathbf{f}\|^2\right), \qquad (4)$$
where the SNR is defined as $SNR = \frac{P_{FR2}}{K \sigma^2_{FR2}}$. The optimal beamforming vector is the one that maximizes $R(\{\mathbf{h}_{FR2}[\bar{k}], \mathbf{f}\})$, which we denote by $\mathbf{f}^\star$. Predicting $\mathbf{f}^\star$ requires estimating the channel $\mathbf{h}_{FR2}$ or an online exhaustive beam training, which incurs a large training overhead, especially for high-frequency FR2 or mmWave channels.

Fig. 1: Depicting the system setup where the FR1 and FR2 antenna arrays are colocated on the base-station and UE. [The figure shows FR1 transceivers connected by the uplink channel $\mathbf{h}_{FR1}$ and FR2 transceivers connected by the downlink channel $\mathbf{h}_{FR2}$, between the base-station and the UE.]

[11], [12] have recently pointed out that it is feasible to use the FR1 channel information to predict the beamforming vector in FR2. Moreover, they have also shown that there exists a mapping from the FR1 channels to the achievable rate in the FR2 channel,
$$\Phi_n : \mathbf{h}_{FR1} \rightarrow R(\mathbf{h}_{FR2}, \mathbf{f}_n), \quad n = 1, 2, \ldots, \|\mathcal{F}\|, \qquad (5)$$
and that there exists a mapping function that can be leveraged to obtain the optimal mapping between FR1 and FR2 channels. Let us define the optimal mapping function as $\eta^\star_t$ at timestep $t$ as
$$\eta^\star = \operatorname{argmax}_{n \in \{1, 2, \ldots, \|\mathcal{F}\|\}} \Phi_n(\mathbf{h}_{FR1}). \qquad (6)$$
The works in [7]–[10] showed that temporal FR2 channel information can be utilized to predict the optimal FR2 beamforming vector at the next time-step, thereby giving
$$\Psi_t : \{\mathbf{h}^{t-m}_{FR2}, \mathbf{h}^{t-m+1}_{FR2}, \ldots, \mathbf{h}^{t-1}_{FR2}\} \rightarrow R(\mathbf{h}^t_{FR2}, \mathbf{f}^\star). \qquad (7)$$
Combining Equation 7 with Equation 6 gives us
$$\Psi_t : \{\eta^\star(\mathbf{h}^{t-m}_{FR1}), \eta^\star(\mathbf{h}^{t-m+1}_{FR1}), \ldots, \eta^\star(\mathbf{h}^{t-1}_{FR1})\} \rightarrow R(\mathbf{h}^t_{FR2}, \mathbf{f}^\star). \qquad (8)$$
We term this composition the $\Omega$ mapping from the FR1 channels for $m$ time-steps to the optimal beamforming vector at the latest time-step:
$$\Omega_t : \{\mathbf{h}^{t-m}_{FR1}, \mathbf{h}^{t-m+1}_{FR1}, \ldots, \mathbf{h}^{t-1}_{FR1}\} \rightarrow R(\mathbf{h}^t_{FR2}, \mathbf{f}^\star). \qquad (9)$$
Prior works have shown that high beam training time raises significant challenges when solving these functions using traditional (non-ML) approaches. Hence, we decided to look at machine learning techniques that can help solve this problem. We observe that this is quite similar to the mapping used to solve time-series problems [18] in AI/ML. Different ML approaches, like RNNs, LSTMs, and transformers, have been used to solve these types of time-series problems. We next look at which of these approaches will be suitable for this problem.

IV. CROSS FREQUENCY BM USING TIME SERIES DATA

As we discussed earlier, ML can help to drastically reduce the time associated with BM by incorporating information about the channel and predicting the best $k$ beams. Prior work [11], [12] focused on using the FR1 measurements at the current timestep to exploit the correlation between the FR1 and FR2 measurements. We observed that this correlation can be composited with the correlation between measurements at consecutive time steps, i.e., a time series of FR1 measurements helps with better prediction of FR2 beams.

The most common techniques to deal with time series data include RNNs, LSTMs, and transformers. We ruled out RNNs as a potential solution since they suffer from vanishing/exploding gradients [6] and hence won't be able to capture the time series nature of this data. We observe comparable accuracy results for transformers and LSTMs (Section V), with the transformer doing marginally better. However, transformers have a faster inference speed due to their ability to replace recurrence with attention, making them faster while also increasing the parallelism of the model. On the other hand, LSTMs incur a lower number of Floating Point Operations (FLOPs) [20], [21], which results in lower energy utilization to run the model. Our contribution in this paper is
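As a concrete illustration of the rate expression in Equation (4) and the exhaustive codebook search for $\mathbf{f}^\star$ that the paper seeks to avoid, the following minimal NumPy sketch evaluates $R$ for every candidate beam and picks the maximizer. The array shapes are illustrative assumptions; the codebook stands in for the quantized FR2 codebook $\mathcal{F}$:

```python
import numpy as np

def achievable_rate(h_fr2, f, snr):
    """Rate of Equation (4): sum over subcarriers of log2(1 + SNR*|h^T f|^2).

    h_fr2 : (K_bar, M_FR2) complex array, downlink channel per subcarrier
    f     : (M_FR2,) complex beamforming vector drawn from the codebook
    snr   : scalar SNR = P_FR2 / (K * sigma_FR2^2)
    """
    gains = np.abs(h_fr2 @ f) ** 2          # |h[k]^T f|^2 per subcarrier
    return float(np.sum(np.log2(1.0 + snr * gains)))

def best_beam_index(h_fr2, codebook, snr):
    """Exhaustive selection of f* over the codebook F (cardinality N_CB)."""
    rates = [achievable_rate(h_fr2, f, snr) for f in codebook]
    return int(np.argmax(rates))
```

For instance, with a unitary DFT codebook and a channel matched to one of its columns, the search recovers exactly that column; the point of the proposed framework is to predict this index from FR1 time-series data instead of measuring every candidate.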
Transformer model architecture:

Layer                 Output Shape   Parameters
Input                 (10, 504)      0
Transformer Block 1   (10, 504)      256159
Transformer Block 2   (10, 504)      256159
Transformer Block 3   (10, 504)      256159
Transformer Block 4   (10, 504)      256159
Average Pooling       (504,)         0
Dense                 (128,)         64640
Dropout               (128,)         0
Dense                 (504,)         65016

LSTM model architecture:

Layer     Output Shape   Parameters
Input     (10, 504)      0
LSTM 1    (10, 256)      779264
LSTM 2    (10, 256)      525312
LSTM 3    (10, 128)      197120
LSTM 4    (10, 64)       49408
LSTM 5    (10, 32)       12416
Dense     (128,)         4224
Dense     (504,)         65016

[Figure: internal structure of a transformer block, from input to output: Input, Multi-Head Attention, Feed-Forward Layer, Feed-Forward Layer, Normalization Layer, Dropout Layer.]

[Figure: beam prediction accuracy for the FCNN, RNN, LSTM, and transformer models.]

Fig. 5: Comparing the number of Floating Point Operations (in millions) required for all the different models (FCNN, TF#1, TF#5, TF#10, TF#20, TF#40).

B. Simulation Methodology

We compare our methodology of using time-series data with prior work [11], [12] that suggested the use of fully connected neural networks (FCNN) to determine the best beam in FR2. The FCNN model consists of an input layer and four hidden layers, all having the same number of neurons. All the layers are activated with the ReLU function [23] followed by the dropout function [24], except the last layer, which is activated with the softmax function [25]. In addition to the FCNN model, we also show the results for an RNN model as a baseline for comparison with our proposed LSTM and transformer models. All the models used in our simulation are developed using Keras [26]. While we use the entire models for our simulation study, using Keras provides us the benefit of the TensorFlow Lite Converter, which can easily convert our models to run efficiently on UEs.

Fig. 6: Top-1, Top-3, and Top-5 accuracy for transformer variants. (TF n refers to TF with window size n)

[Figure: Top-1, Top-3, and Top-5 accuracy results.]
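For reference, the FCNN baseline described above can be sketched in Keras. The layer count and activations follow the description in the text (an input layer, four equal-width ReLU hidden layers each followed by dropout, and a softmax output); the hidden width, dropout rate, and the 504-dimensional input/output sizes are illustrative assumptions, the latter matching the shapes in the model tables above:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_fcnn(input_dim=504, num_beams=504, hidden_units=512, dropout_rate=0.3):
    """FCNN baseline of [11], [12]: four hidden layers of equal width,
    each ReLU-activated and followed by dropout, with a softmax output
    over the FR2 beam codebook. hidden_units and dropout_rate are
    illustrative assumptions, not values taken from the paper."""
    model = keras.Sequential([keras.Input(shape=(input_dim,))])
    for _ in range(4):
        model.add(layers.Dense(hidden_units, activation="relu"))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(num_beams, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A model built this way can then be passed through the TensorFlow Lite Converter, as noted above, for efficient on-UE deployment.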