that the RAN information is available at SMO to perform resource allocation and RAN management, making a fully automated network impractical. The understanding of how O-RAN could help improve network performance by controlling data traffic and optimizing RAN functions remains rather limited in the literature. In this paper, we aim to fill this gap by conducting an in-depth analysis of the multi-layer design between the physical and higher layers and developing low-complexity algorithms for network control, scheduling and resource allocation in different time scales. We also analyze their impact on the throughput and delay performances in the 6G O-RAN context.

In light of the above discussions, this paper focuses on designing the TS control to intelligently direct the user traffic through a group of RUs, taking into account available resources and users' service requirements. To fully realize the potential performance of the TS scheme, O-RAN allows customization of user-centric strategies, multi-path routing and multi-connectivity as well as proactive optimization of network parameters through RICs. However, the problem becomes more challenging in the O-RAN setting due to several complicating factors: i) the traffic demand of user equipment (UEs) often varies over time, and the complete information of the RAN layer is indeterminate at the time of optimization algorithm execution. Hence, the policies and control decisions at the service management and orchestration (SMO) must be adapted to the variation of data traffic; ii) the total data traffic is distributed unevenly to RUs due to different downlink (DL) throughput capabilities, causing high queueing delay; and iii) the strong correlation between congestion control and scheduling optimization influences the optimal choice of flow-split distribution of data traffic across all RUs. In addition, the deployment of fully automated networks is an intricate problem in O-RAN that calls for intelligent, scalable and self-organizing strategies for a holistic multi-layer optimization framework. In this regard, reinforcement learning (RL) plays an important role in achieving long-term utility optimization. To the best of our knowledge, the TS optimization problem for

adaptively direct traffic to appropriate RUs. Our framework not only generalizes the classical queue-length-based congestion control and scheduling (QCS) method [16], but also provides a synergy between RL, QCS and updated network state information, thus enabling a closed-loop control of the TS in the O-RAN context.
• To ensure practicality, we identify inherent properties of the JFCS problem and propose an intelligent resource management algorithm to solve it effectively by leveraging the stochastic optimization framework [17]. In particular, by exploiting the historical system information accumulated from the previous time-slots, an RL process is developed to build the smoothed best response while maximizing the long-term utility for each data-flow under arbitrary changes in traffic demands. Given the updated queue-length vector and the optimal flow-split distribution, two low-complexity algorithms are developed to effectively solve the short-term power control optimization subproblem in an iterative fashion.
• Given a scaling factor φ to minimize the Lyapunov drift [18], the theoretical performance results are analyzed to show that the queueing network is stable. In addition, the expected divergence in queue-length and the optimality gap of congestion control rate still scale as O(√φ) and O(1/√φ), respectively. Thus, there always exists a scaling factor to balance utility-optimality and latency.

We numerically evaluate the performance of the proposed framework. Results show that the proposed framework can improve network resource utilization significantly while achieving fast convergence and long-term utility-optimality, compared to state-of-the-art approaches.

C. Paper Organization and Mathematical Notation
The remainder of this paper is organized as follows. The related work is discussed in Section II. In Section III, we first introduce the network model and then present the problem formulation. The proposed JFCS framework and its solutions are provided in Sections IV and V, respectively. Section VI presents the key theoretical performance results of the JFCS
framework. Numerical results are given in Section VII, while Section VIII concludes the paper.

Mathematical notation: Throughout this paper, matrices and vectors are written as bold uppercase and lowercase letters, respectively, while a scalar is denoted in lowercase. $\mathbf{h}^H$ is the Hermitian transpose of vector $\mathbf{h}$. The notation $x \sim \mathcal{CN}(0, \sigma^2)$ implies that $x$ is a circularly-symmetric complex Gaussian random variable with zero mean and variance $\sigma^2$. $\|\cdot\|$ stands for the vector's Euclidean norm. $\mathbb{C}$ and $\mathbb{R}$ denote the sets of all complex and real numbers, respectively. Finally, $\mathbb{E}\{\cdot\}$ denotes the expectation of a random variable.

II. RELATED WORK
Multi-layer (a.k.a. cross-layer) optimization for traditional cellular RAN architectures has been extensively studied in the literature (see e.g., [19] and references therein). For example, Tang et al. [20] studied a multi-layer resource allocation problem to minimize the overall system power consumption in a cloud-RAN (C-RAN), which jointly optimizes the service scaling, remote radio head selection, and beamforming. In [21], a joint design of virtual computing and radio resource allocation was proposed. It was shown that this approach can efficiently allocate the virtual computing of the baseband unit (BBU) pool to achieve load balancing among users with significantly reduced power consumption. These problems are often solved by the difference-of-convex algorithm due to the combinatorial nature and strong coupling between optimization variables. To address this challenge, graph theory techniques were introduced in [22] and [23] to effectively solve the jointly coordinated scheduling and power optimization problem in C-RAN. Recently, the multi-layer network coding was also investigated in [24]–[26], taking into account the rate heterogeneity of different users to remote radio heads. In general, these existing works only optimized radio resources, while other factors at higher layers (e.g. congestion control and routing) were overlooked, making guaranteed multi-layer QoS for O-RAN infeasible. In addition, the non-causal statistical knowledge of traffic demands is required to model queue states, which is again impractical.

So far, there have been only a few attempts to study the applicability of the O-RAN architecture. Kumar et al. [9] proposed an automatic neighbour relation (ANR) approach to manage neighbour cell relationships by leveraging ML techniques, hence improving gNodeB (gNB) handovers. The work in [13] introduced an intelligent user access control algorithm based on deep reinforcement learning, aiming to maximize the overall throughput and avoid frequent handovers. The authors in [10] developed an RL-based dynamic function splitting which is shown to be able to effectively decide the O-RAN's function splits and reduce operating costs. Based on the Working Group (WG)-2 AI/ML specifications of the O-RAN Alliance, Acumos framework and open network automation platform were introduced in [11] to generate AI/ML models to be deployed in RIC modules and monitor the designed workflow, respectively. Motalleb et al. [14] developed an iterative algorithm to jointly optimize service-aware baseband resource allocation and virtual network function activation, thus achieving better data rate and lower end-to-end delay. Very recently, a deep reinforcement learning-based intelligent session management for ultra-reliable and low latency communications (URLLC) was proposed in [15] to allocate resources for serving current and new sessions more efficiently. However, these studies did not reveal any observable information about the RAN layer to SMO via periodic feedback loops. Thus, RICs in these studies were unable to monitor RAN in a timely manner to enable their management automation within O-RAN.

In traditional RAN architectures, the TS solutions are typically determined by users' radio conditions of a serving cell while treating signals from neighboring cells as interference [27]. The authors in [28] proposed a distributed TS scheme through edge servers, where the matrix-based shortest path selection and matrix-based multipath searching algorithms were developed to dynamically determine the optimal paths for traffic steering. Very recently, Kavehmadavani et al. [29] showed that a dynamic multi-connectivity (MC)-based TS scheme can help steer traffic flows towards the most suitable cells based on user-centric conditions. In addition, the flow split for each user was purely determined by the RUs' capacity in delivering user traffic demands, resulting in a highly suboptimal solution. However, this work did not embed AI/ML solutions in Non-RT RIC within the O-RAN architecture and assumed that all network information is available at Near-RT RIC to optimize radio resource allocation.

Different from all the above works and others in the literature that focus on a single layer, we propose a fully multi-layer optimization framework that captures interplays between the physical and higher layers, enabling proactive optimization of network parameters through RICs with periodic feedback loops. This holistic multi-layer optimization framework guarantees the long-term utility-optimality with far less latency than state-of-the-art approaches, opening the door towards fully automated networks with enhanced control and flexibility.

III. NETWORK MODEL AND PROBLEM FORMULATION
A. Network Model

[Figure: K flows with arrivals A_1[t], A_2[t], ..., A_K[t]; CU with flow-split selection c[t] and scheduler; DUs; RUs.]
Fig. 2: Illustration of the O-RAN-based system model enabling TS where each DU connects to multiple RUs towards cost-effective deployment.
[Figure: frame t of duration T_c, in which β[t] is updated; time-slots t_s = tT_f + s of duration τ, from tT_f + 1 to tT_f + T_f, in which w[t_s] is optimized.]
Fig. 3: Illustration of frame structure with each time-frame t (corresponding to one large-scale coherence time) consisting of T_f time-slots.

As shown in Fig. 2, we consider an O-RAN architecture with one CU, $I$ DUs and $J$ RUs, where each DU connects to multiple RUs for cost-effective deployment. Let us denote by $\mathcal{I} \triangleq \{1, 2, \cdots, I\}$ the set of DUs. We consider a downlink multi-user multiple-input single-output (MU-MISO) system, where $J$ RUs simultaneously serve the set $\mathcal{K} \triangleq \{1, 2, \cdots, K\}$ of $K = |\mathcal{K}|$ single-antenna UEs. The $j$-th RU served by the $i$-th DU is referred to as RU $(i, j)$, which is equipped with $M_{i,j}$ antennas. The total number of RUs' antennas is thus $M_\Sigma = \sum_{\forall (i,j)} M_{i,j}$. The set of RUs served by DU $i$ is denoted by $\mathcal{J}_i \triangleq \{(i, 1), \cdots, (i, J_i)\}$ with $|\mathcal{J}_i| = J_i$ and $\sum_{i\in\mathcal{I}} J_i = J$. The total set of RUs is denoted as $\mathcal{J} \triangleq \cup_{i\in\mathcal{I}} \mathcal{J}_i$. We assume that the midhaul (MH) link between the CU and DU and fronthaul link between the DU and RU have sufficient capacity (i.e., high-speed optical ones), so that the transmission latency from CU to RUs and queueing latency at CU and DUs are negligible.

We consider that the system operates in a discrete time-frame indexed by $t \in [1, 2, \cdots, T]$, which corresponds to one large-scale coherence time with a duration of $T_c$, as illustrated in Fig. 3. Each frame is divided into $T_f$ time-slots of equal duration $\tau = T_c/T_f$, where the time-slot is indexed by $t_s = tT_f + s$ with $s \in \{1, 2, \cdots, T_f\}$. At CU, there exist $K$ independent data-flows, each of which is intended for one UE. The CU splits the data-flow of UE $k$, say flow $k$, into multiple sub-flows which are possibly transmitted through the set of paths and then aggregated at this UE [30], [31], so-called "traffic steering". For data-flow $k$, we denote by $\mathcal{P}_k \triangleq \{(i, j)\}_{\forall (i,j)\in\mathcal{J}}$ the set of path states, including queue states and routing tables. To improve the system throughput, a subset of separate paths in the set $\mathcal{P}_k$ (i.e., via neighboring RUs indexed by $(i, j)$) should be appropriately selected. Let us denote by $\mathbf{c}_k[t] \triangleq [c_k^{i,j}[t]]_{(i,j)\in\mathcal{P}_k}$ the flow-split selection

components with a low degree of mobility are assumed to be unchanged during time-slot $t_s$ with duration of $\tau$ and vary independently in the next time-slot. For example, the large-scale fading coefficients may stay invariant for a period of at least 40 small-scale fading coherence intervals for indoor scenarios [32]. The channel vector between RU $(i, j)$ and UE $k \in \mathcal{K}$ in time-slot $t_s$ is denoted by $\mathbf{h}_k^{i,j}[t_s] \in \mathbb{C}^{M_{i,j}\times 1}$, which follows the Rician fading model with the Rician factor $\kappa_k^{i,j}[t]$. In particular, $\mathbf{h}_k^{i,j}[t_s]$ is modeled as
$$\mathbf{h}_k^{i,j}[t_s] = \sqrt{\xi_k^{i,j}[t]}\Big(\sqrt{\kappa_k^{i,j}[t]/(\kappa_k^{i,j}[t] + 1)}\,\bar{\mathbf{h}}_k^{i,j}[t] + \sqrt{1/(\kappa_k^{i,j}[t] + 1)}\,\tilde{\mathbf{h}}_k^{i,j}[t_s]\Big)$$
where $\xi_k^{i,j}[t]$ represents the large-scale fading; $\bar{\mathbf{h}}_k^{i,j}[t]$ and $\tilde{\mathbf{h}}_k^{i,j}[t_s] \sim \mathcal{CN}(0, \mathbf{I})$ are the line-of-sight (LoS) and non-LoS (NLoS) components, which follow a deterministic channel and Rayleigh fading models, respectively. We let $\mathbf{H}[t_s] \triangleq [\mathbf{h}_1[t_s] \cdots \mathbf{h}_K[t_s]] \in \mathbb{C}^{M\times K}$ denote the channel matrix between all RUs and UEs in time-slot $t_s$, where $\mathbf{h}_k[t_s] \triangleq [(\mathbf{h}_k^{i,j}[t_s])^H]^H_{\forall i,j} \in \mathbb{C}^{M\times 1}$ corresponds to the channel vector between RUs and UE $k$.

Let us denote by $x_k^{i,j}[t_s]$ and $\mathbf{w}_k^{i,j}[t_s] \in \mathbb{C}^{M_{i,j}\times 1}$ a unit-power data symbol and a linear beamforming vector transmitted from RU $(i, j)$ to UE $k$ in time-slot $t_s$, respectively. The received signal at UE $k$ in time-slot $t_s$ can be written as
$$y_k[t_s] = \sum_{(i,j)\in\mathcal{P}_k} (\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_k^{i,j}[t_s]\, x_k^{i,j}[t_s] + \sum_{k'\in\mathcal{K}\setminus\{k\}} \sum_{(i,j)\in\mathcal{P}_{k'}} (\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_{k'}^{i,j}[t_s]\, x_{k'}^{i,j}[t_s] + \omega_k[t_s] \tag{1}$$
where $\omega_k[t_s]$ is the additive white Gaussian background noise (AWGN) with power $N_0$. The downlink achievable rate (bits/s) of UE $k$ from RU $(i, j)$ in time-slot $t_s$ can be written as $r_k^{i,j}(\mathbf{w}[t_s]) \triangleq W \log_2\big(1 + \gamma_k^{i,j}(\mathbf{w}[t_s])\big)$, where $W$ is the system bandwidth and the signal-to-interference-plus-noise ratio (SINR) $\gamma_k^{i,j}(\mathbf{w}[t_s])$ is given by $\gamma_k^{i,j}(\mathbf{w}[t_s]) = |(\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_k^{i,j}[t_s]|^2 / \Phi_k^{i,j}(\mathbf{w}[t_s])$ with
$$\Phi_k^{i,j}(\mathbf{w}[t_s]) \triangleq \underbrace{\sum_{(i',j')\in\mathcal{P}_k\setminus\{(i,j)\}} |(\mathbf{h}_k^{i',j'}[t_s])^H \mathbf{w}_k^{i',j'}[t_s]|^2}_{\text{Intra-user interference}} + \underbrace{\sum_{k'\in\mathcal{K}\setminus\{k\}} \sum_{(i,j)\in\mathcal{P}_{k'}} |(\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_{k'}^{i,j}[t_s]|^2}_{\text{Inter-user interference}} + N_0 \tag{2}$$
and $\mathbf{w}[t_s] \triangleq [(\mathbf{w}_k^{i,j}[t_s])^H]^H_{k\in\mathcal{K},(i,j)\in\mathcal{P}_k}$ being the vector em-
is stable if the steady-state total queue-length remains finite, i.e.,
$$\limsup_{t_s\to\infty} \mathbb{E}\{\|\mathbf{q}[t_s]\|_1\} < \infty. \tag{4}$$

B. Problem Formulation
Let $\bar r_k \triangleq \lim_{t_s\to\infty} \frac{1}{t_s}\sum_{\ell=1}^{t_s} r_k(\mathbf{w}[\ell])$ denote the long-term average rate of data-flow $k$. Each UE $k$ is associated with a utility function, denoted by $U_k(\bar r_k)$. To facilitate the analysis presented later, we make the following assumption on the utility function [18], [30], [33], [34].

Assumption 1. The utility function $U_k(\cdot)$ is assumed to satisfy the following conditions
• $U_k(\cdot)$ is twice continuously differentiable, increasing, and strictly concave.
• There exist positive constants $0 < \psi < \Psi < \infty$, such that $\psi \le -U_k''(\bar r_k) \le \Psi$, $\forall \bar r_k \in [0, \bar r^{\max}]$, with $\bar r^{\max}$ being the maximum long-term average rate of any data flow.

Our goal is to maximize the network utility function $\sum_{k\in\mathcal{K}} U_k(\bar r_k)$, subject to the probabilistic delay constraint, achievable rate region and queue-stability constraint. Based on the network utility maximization (NUM) framework, the joint flow-split distribution, congestion control and scheduling optimization problem (JFCS) can be mathematically formulated as
$$\text{JFCS}: \max_{\boldsymbol{\beta},\bar{\mathbf{r}},\mathbf{w}} \sum_{k\in\mathcal{K}} U_k(\bar r_k) \tag{5a}$$
$$\text{s.t. } \limsup_{t_s\to\infty} \mathbb{E}\{\|\mathbf{q}[t_s]\|_1\} < \infty \tag{5b}$$
$$r_k(\mathbf{w}[t_s]) \in \mathcal{C}_{\mathbf{H}[t_s]},\ \forall t_s, k \in \mathcal{K} \tag{5c}$$
$$\boldsymbol{\beta}_k[t] \in \mathcal{B}[t],\ \forall t, k \in \mathcal{K} \tag{5d}$$
$$\text{Prob}\Big(\frac{q_k^{i,j}[t_s]}{\bar A_k} \le \bar d_k\Big) \ge \epsilon_k,\ \forall t_s, k, (i, j) \tag{5e}$$
where $\boldsymbol{\beta} \triangleq [\boldsymbol{\beta}_k^T]^T_{k\in\mathcal{K}}$ and $\bar{\mathbf{r}} \triangleq [\bar r_k]^T_{k\in\mathcal{K}}$. Constraint (5e) ensures different minimum outage delay requirements for sub-flows, where $\bar d_k$ and $\epsilon_k$ ($0 \ll \epsilon_k \le 1$) are the maximum allowable average delay and the required reliable communication for each UE, respectively. It is stated that the probability of $q_k^{i,j}[t_s]/\bar A_k \le \bar d_k$ (i.e. UEs' maximum allowable delay) should be

IV. JFCS-BASED NETWORK UTILITY OPTIMIZATION
A. Tractable Form of the JFCS Problem (5)
Challenges of Solving JFCS Problem (5): We can observe that constraint (5c) is nonconvex while (5e) is a nonconvex probabilistic constraint, generally making problem (5) NP-hard. In addition, the expectations in the constraints make the problem stochastic, so it cannot be solved directly. The classical optimization approaches, such as successive convex approximation (SCA) [35], are often applied to solve optimization problems with nonconvex and deterministic constraints. However, the stochastic SCA-based algorithms can no longer guarantee a feasible and (sub)-optimal solution for all subsequent time intervals (TTIs) due to the dynamics of the physical layer at small timescales. The flow-split decisions mainly rely on the previous states updated by the RAN layer. Towards practical applications, an efficient and adaptive solution to the long-term subproblem of (5) is necessary to achieve high QoE for all UEs in every TTI.

Let us start by transforming problem (5) into a more tractable form. Towards a safe design, we consider the replacement of constraint (5e) by a deterministic constraint. From the basic property of probability, we can rewrite (5e) as $\text{Prob}\big(q_k^{i,j}[t_s] \ge \bar A_k \bar d_k\big) \le 1 - \epsilon_k$. It follows from the well-known Markov inequality [36] that $\text{Prob}\big(q_k^{i,j}[t_s] \ge \bar A_k \bar d_k\big) \le \mathbb{E}\{q_k^{i,j}[t_s]\}/(\bar A_k \bar d_k)$, yielding
$$\sum_{\ell=1}^{t} \beta_k^{i,j}[\ell]\bar A_k \tau - (1 - \epsilon_k)\bar A_k \bar d_k - \sum_{\ell=1}^{t_s-1} r_k^{i,j}(\mathbf{w}[\ell])\tau \le r_k^{i,j}(\mathbf{w}[t_s])\tau,\ \forall t_s, k \in \mathcal{K}, (i, j) \in \mathcal{P}_k \tag{6}$$
where each queue-length is always non-negative. We note that (6) is a conservative (safe) approximation of (5e): any feasible point of the former is also feasible for the latter, but not vice versa, due to the Markov upper bound on the outage probabilities.

To facilitate the following optimization, we introduce congestion control variables $\mathbf{a}[t_s] \triangleq [a_k[t_s]]^T_{k\in\mathcal{K}}$, satisfying $\bar a_k - \bar r_k \le 0$, $\forall k$, where $\bar a_k \triangleq \lim_{t_s\to\infty} \frac{1}{t_s}\sum_{\ell=1}^{t_s} a_k[\ell]$. Problem (5) is then rewritten as
$$\max_{\boldsymbol{\beta},\bar{\mathbf{a}},\bar{\mathbf{r}},\mathbf{w}} \sum_{k\in\mathcal{K}} U_k(\bar a_k) \tag{7a}$$
$$\text{s.t. } (5b), (5c), (5d), (6) \tag{7b}$$
$$\bar a_k - \bar r_k \le 0,\ \forall k. \tag{7c}$$
We also introduce a new auxiliary queue-length vector $\hat{\mathbf{q}}[t_s] \triangleq [\hat q_k[t_s]]^T_{k\in\mathcal{K}}$, where $\hat q_k[t_{s+1}] = \big[\hat q_k[t_s] + a_k[t_s]\tau - r_k(\mathbf{w}[t_s])\tau\big]^+$ to associate constraint (7c) with a penalty function and $a_k[t_s] \in [0, A^{\max}]$. We define the total queue backlog of all UEs in time-slot $t_s$ as $L[t_s] = \frac{1}{2}\Big(\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \frac{q_k^{i,j}[t_s]^2}{\tau^2} + \sum_{k\in\mathcal{K}} \frac{\hat q_k[t_s]^2}{\tau^2}\Big)$, which is the quadratic Lyapunov function [17], [37]. For given $(\mathbf{q}[t_s], \hat{\mathbf{q}}[t_s])$, the Lyapunov drift from time-slot $t_s$ to $t_{s+1}$ is given as $\Delta L[t_s] = L[t_{s+1}] - L[t_s]$. To guarantee joint network stability and penalty minimization (i.e., (5b) and (7c) hold true), we adopt the drift-plus-penalty procedure [17] to minimize the drift of a quadratic Lyapunov function and rewrite (7) as
$$\max_{\boldsymbol{\beta},\bar{\mathbf{a}},\bar{\mathbf{r}},\mathbf{w}} \varphi \sum_{k\in\mathcal{K}} \mathbb{E}\{U_k(a_k[t_s])\} - \mathbb{E}\{\Delta L[t_s]\} \tag{8a}$$
$$\text{s.t. } (5c), (5d), (6) \tag{8b}$$
where $\varphi$ is a scaling factor to balance two objective functions.

We now show that constraint (7c) holds with equality at optimum by introducing the following lemma.

Lemma 1. For each data-flow of UE $k$, the optimal congestion control rate is equal to the optimal long-term average service rate, i.e., $\bar a_k^* - \bar r_k^* = 0$, $\forall k$.

The proof of Lemma 1 is straightforward by examining the Karush–Kuhn–Tucker (KKT) complementary slackness condition over the increasing and strictly concave objective function $U_k(\cdot)$, $\forall k$.

B. Overall Intelligent Resource Management Algorithm
To solve problem (8) in different time scales, we now decompose it into three subproblems. To do so, we consider a worst-case design by developing an upper bound of $\Delta L[t_s]$ for given $(\mathbf{q}[t_s], \hat{\mathbf{q}}[t_s])$. From the inequality $([x]^+)^2 \le x^2$ and $(x + y)^2 - x^2 = 2xy + y^2$, we have
$$\Delta L_{\rm UB}[t_s] \triangleq \sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \frac{q_k^{i,j}[t_s]}{\tau}\big(\beta_k^{i,j}[t]A_k[t] - r_k^{i,j}(\mathbf{w}[t_s])\big) + \sum_{k\in\mathcal{K}} \frac{\hat q_k[t_s]}{\tau}\big(a_k[t_s] - r_k(\mathbf{w}[t_s])\big) + B[t_s] \ge \Delta L[t_s] \tag{9}$$
where $B[t_s] \triangleq \frac{1}{2}\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \big(\beta_k^{i,j}[t]A_k[t] - r_k^{i,j}(\mathbf{w}[t_s])\big)^2 + \frac{1}{2}\sum_{k\in\mathcal{K}} \big(a_k[t_s] - r_k(\mathbf{w}[t_s])\big)^2$ is the summation of the second moments of the arrival and service processes. Following [17] and [30], we consider that $B[t_s]$ is finite and bounded by $\bar B$ for all $t_s$, i.e., $\mathbb{E}\{B[t_s] \mid \mathbf{q}[t_s], \hat{\mathbf{q}}[t_s]\} \le \bar B$. As a result, problem (8) is simplified to
$$\max_{\boldsymbol{\beta},\bar{\mathbf{a}},\bar{\mathbf{r}},\mathbf{w}} \varphi \sum_{k\in\mathcal{K}} \mathbb{E}\{U_k(a_k[t_s])\} - \mathbb{E}\{\Delta L_{\rm UB}[t_s]\} \tag{10a}$$
$$\text{s.t. } (5c), (5d), (6). \tag{10b}$$

Long-term subproblem (L-SP): The flow-split distribution subproblem at time-frame $t$ is given as
$$\text{L-SP}: \max_{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t],\forall k} \sum_{k\in\mathcal{K}} L_k[t] \tag{11}$$
where $L_k[t] = \sum_{(i,j)\in\mathcal{P}_k} \frac{q_k^{i,j}[t_s]}{\tau}\big(r_k^{i,j}(\mathbf{w}[t_s]) - \beta_k^{i,j}[t]A_k[t]\big)$. Although problem (11) is a linear program in $\boldsymbol{\beta}$, it cannot be solved directly by standard optimization techniques because $A_k[t]$, $\forall k$ are incompletely known at the beginning of time-frame $t$.

Short-term subproblems (S-SPs): The congestion control subproblem at time-slot $t_s$ is
$$\text{S-SP1}: \max_{\mathbf{a}[t_s]\ge 0} \sum_{k\in\mathcal{K}} \varphi U_k(a_k[t_s]) - \frac{\hat q_k[t_s]}{\tau} a_k[t_s] \tag{12}$$
which is a convex problem with only a non-negativity constraint. The optimal solution of (12) exists and is unique, namely $a_k^*[t_s] = U_k'^{-1}\big(\frac{\hat q_k[t_s]}{\varphi\tau}\big)$, $\forall k$, where $U_k'^{-1}(\cdot)$ denotes the inverse function of the first derivative of $U_k(\cdot)$. Given the optimal solution $\boldsymbol{\beta}^*[t]$, the short-term power control optimization subproblem (i.e., the weighted queue-length-based scheduling) at time-slot $t_s$ is given as
$$\text{S-SP2}: \max_{\mathbf{r}[t_s],\mathbf{w}[t_s]} \sum_{k\in\mathcal{K}} \frac{\hat q_k[t_s]}{\tau} r_k(\mathbf{w}[t_s]),\ \text{s.t. } (5c), (6). \tag{13}$$

Algorithm 1: Intelligent Resource Management Algorithm for Solving JFCS Problem (5), compliant with O-RAN
Initialization: Set $t = 1$ and select a positive scaling factor $\varphi$. Initialize $\boldsymbol{\beta}_k[1] = \frac{1}{|\mathcal{P}_k|}[1, \cdots, 1]$ and all queues are set to be empty: $q_k^{i,j}[1_1] = 0$ and $\hat q_k[1_1] = 0$, $\forall (i, j), k$.
Main Loop:
1: for each frame $t = 1, 2, \cdots, T$ do {/*Long-term scale t*/}
2: Flow-Split Distribution: Given $\{\mathbf{q}[t-1], \mathbf{A}[t-1]\}$, CU splits data-flows of all UEs based on the optimal flow-split decisions $\boldsymbol{\beta}^*[t]$ by solving L-SP at Non-RT RIC: $\max_{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t],\forall k} \sum_{k\in\mathcal{K}} L_k[t]$.
3: for each time-slot $t_s = tT_f + s$ with $s \in \{1, \cdots, T_f\}$ do {/*Short-term scale $t_s$*/}
4: Congestion Controller: Given the queue-length vector $\hat{\mathbf{q}}[t_s]$, solve S-SP1 (12) to obtain the optimal congestion control variables: $a_k^*[t_s] = \min\big\{U_k'^{-1}\big(\frac{\hat q_k[t_s]}{\varphi\tau}\big), A^{\max}\big\}$, $\forall k$.
5: Weighted Queue-Length-Based Scheduler: Given the queue-length vector $\hat{\mathbf{q}}[t_s]$ and the flow-split distribution $\boldsymbol{\beta}^*[t]$, each RU $(i, j) \in \mathcal{P}_k$ schedules the service rate $r_k^{i,j}(\mathbf{w}[t_s])$ for UE $k \in \mathcal{K}$ by solving S-SP2: $\max_{\mathbf{r}[t_s],\mathbf{w}[t_s]} \sum_{k\in\mathcal{K}} \frac{\hat q_k[t_s]}{\tau} r_k(\mathbf{w}[t_s])$, s.t. (5c), (6).
6: Queue-Length Updates: Queue-lengths are updated as
$q_k^{i,j}[t_{s+1}] = \big[q_k^{i,j}[t_s] + \beta_k^{i,j}[t]A_k[t]\tau - r_k^{i,j}(\mathbf{w}[t_s])\tau\big]^+$, $\forall k, (i, j)$
$\hat q_k[t_{s+1}] = \big[\hat q_k[t_s] + a_k[t_s]\tau - r_k(\mathbf{w}[t_s])\tau\big]^+$, $\forall k$.
7: Set $s = s + 1$
8: end for
9: Update $\{\mathbf{q}[t], \mathbf{A}[t]\} := \{q_k^{i,j}[t], A_k[t]\}_{k,(i,j)}$ to Non-RT RIC.
10: Set $t = t + 1$
11: end for
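The short-term loop of Algorithm 1 (steps 4 and 6) can be illustrated with a minimal single-flow sketch. It assumes the utility U(a) = ln(1 + a), which the source does not specify, so that U'(a) = 1/(1 + a) and its inverse is 1/y − 1; the constant `service_rate` stands in for the S-SP2 scheduler output, and all function names are hypothetical.

```python
def congestion_rate(q_hat, phi, tau, a_max):
    """Closed-form S-SP1 rate for the ASSUMED utility U(a) = ln(1 + a)."""
    if q_hat <= 0.0:
        return a_max  # empty virtual queue: admit at the maximum rate
    price = q_hat / (phi * tau)        # argument of U'^{-1} in step 4
    a = 1.0 / price - 1.0              # inverse of U'(a) = 1 / (1 + a)
    return min(max(a, 0.0), a_max)     # keep a within [0, A_max]


def update_queue(q, arrival_rate, service_rate, tau):
    """Step 6 of Algorithm 1: [q + arrival*tau - service*tau]^+."""
    return max(q + arrival_rate * tau - service_rate * tau, 0.0)


def run_slots(num_slots, phi, tau, a_max, service_rate):
    """Iterate congestion control and the virtual-queue update for one flow.

    service_rate is a fixed placeholder for the rate that S-SP2 would return.
    """
    q_hat = 0.0
    history = []
    for _ in range(num_slots):
        a = congestion_rate(q_hat, phi, tau, a_max)
        q_hat = update_queue(q_hat, a, service_rate, tau)
        history.append((a, q_hat))
    return history


if __name__ == "__main__":
    for a, q_hat in run_slots(5, phi=10.0, tau=0.1, a_max=5.0, service_rate=2.0):
        print(f"a = {a:.4f}, q_hat = {q_hat:.4f}")
```

In this toy setting the admitted rate settles at the fixed service rate, which is the behaviour Lemma 1 describes; a larger `phi` raises the virtual queue level at which that happens.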
The overall intelligent resource management algorithm for solving the JFCS problem (5) is summarized in Algorithm 1, where the solutions of subproblems will be provided next.

V. PROPOSED ALGORITHMS FOR SOLVING SUBPROBLEMS
We are now in a position to solve L-SP (11) and S-SP2 (13) in different time scales. The optimality of the latter depends heavily on the optimal flow-split decisions, which often require a prior knowledge of the statistical information of all possible paths at Non-RT RIC. However, the assumption of complete information is unrealistic due to the dynamic environment and the data collected from the RAN layer being updated to Non-RT RIC only on the long-term scale. In this work, at time-frame $t$ we aim to exploit historical system information accumulated from the previous time-slot, which can be used to build the smoothed best response in maximizing the long-term utility for each data flow.

A. Reinforcement Learning Algorithm for Solving L-SP (11)
The flow-split decision $\boldsymbol{\beta}_k[t]$ in problem (11) can be estimated separably by maximizing $L_k[t]$. This implies that the larger the queue-length $q_k^{i,j}[t_s]$, the lower the flow-split decision value $\beta_k^{i,j}[t]$ to guarantee fairness among all RUs $(i, j) \in \mathcal{P}_k$ (i.e., to avoid large queue-lengths $q_k^{i,j}$ at some RUs in the next time-slot $t_{s+1}$). Let us denote
$$u_k^{i,j}[t] \triangleq \frac{q_k^{i,j}[t_s]}{\tau}\big(r_k^{i,j}(\mathbf{w}[t_s]) - \beta_k^{i,j}[t]A_k[t]\big)$$
as the instantaneous utility observation of data-flow $k$ at time-frame $t$ when selecting path $(i, j) \in \mathcal{P}_k$. The total utility observation of data-flow $k$, denoted by $u_k[t]$, is thus
$$u_k[t] = \sum_{(i,j)\in\mathcal{P}_k} u_k^{i,j}[t]. \tag{14}$$
However, it is not possible to build a smoothed best response based on $u_k^{i,j}[t]$ as it is not revealed at the beginning of time-frame $t$. Inspired by [38], we denote $\hat u_k^{i,j}[t]$ as the estimated utility of data-flow $k$ at time-frame $t$ when selecting path $(i, j)$. In addition, the actual utility observed by data-flow $k$ at time-frame $t$, denoted by $\bar u_k[t]$, is given as $\bar u_k[t] = u_k[t-1]$, which is based on feedback from Near-RT RIC at time $t - 1$. By initializing $\hat u_k^{i,j}[1] = 0$, the estimated utility of data-flow $k$ is updated for action $c_k[t] = c_k^{i,j}[t]$ as follows:
$$\hat u_k^{i,j}[t] = \hat u_k^{i,j}[t-1] + \eta_u[t]\,\mathbb{1}_{\{c_k[t]=c_k^{i,j}[t]\}}\big(\bar u_k[t] - \hat u_k^{i,j}[t-1]\big),\ \forall t > 1 \tag{15}$$
where $\eta_u > 0$ is the step size (i.e. the learning rate), which is often decreased over time to guarantee convergence. The indicator function $\mathbb{1}_{\{x=y\}} = 1$ (resp. 0) if the condition $x = y$ is true (resp. false).

Next, we denote $\hat{\boldsymbol{\theta}}_k[t] \triangleq [\hat\theta_k^{i,j}[t]]_{(i,j)\in\mathcal{P}_k}$ as the estimated regret vector of data-flow $k$, where each element is updated for action $c_k[t] = c_k^{i,j}[t]$ as
$$\hat\theta_k^{i,j}[t] = \hat\theta_k^{i,j}[t-1] + \eta_\theta[t]\,\mathbb{1}_{\{c_k[t]=c_k^{i,j}[t]\}}\big(\bar u_k[t] - \hat u_k^{i,j}[t] - \hat\theta_k^{i,j}[t-1]\big),\ \forall t > 1 \tag{16}$$
with $\hat\theta_k^{i,j}[1] = 0$ and $\eta_\theta[t]$ being the learning rate. In order to achieve high performance in the long term, the L-SP must balance exploration and exploitation processes. We note that trying all possible actions to choose the best paths (e.g. the exhaustive exploration) can offer the highest payoff, but at the cost of slow convergence and possibly prohibitive computation. During the exploitation process, playing an action associated with the highest estimated utility in (15) will likely result in a highly sub-optimal solution. To make this tradeoff more efficient, let us define the best response function $\hat{\boldsymbol{\beta}}[t] = f(\hat{\boldsymbol{\theta}}[t])$ as
$$f(\hat{\boldsymbol{\theta}}[t]) := \underset{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t]}{\operatorname{argmin}} \Big\{h(\boldsymbol{\beta}[t]) - \lambda \sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \beta_k^{i,j}[t]\hat\theta_k^{i,j}[t]\Big\}. \tag{17}$$
Here $\lambda$ is the so-called trade-off factor (a.k.a. Boltzmann temperature) and $h(\boldsymbol{\beta}[t])$ denotes the regularization function. We note that when $\lambda \to 0$, it leads to uniform probabilities of all actions, i.e. $\beta_k^{i,j}[t] = 1/|\mathcal{P}_k|$, $\forall (i, j) \in \mathcal{P}_k$. For $\lambda \to \infty$, the second term in (17) will dominate the best response function and then the actions associated with the highest estimated regret will be selected [38].

Regularization function: The regularization function allows each data-flow to learn the best paths that maximize its own performance and stabilizes the flow-split decisions. The solutions to problem (11) lie in the unit simplex for each data-flow. Therefore, we adopt the Gibbs-Shannon entropy as the regularization function, i.e. $h(\boldsymbol{\beta}[t]) = \sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \beta_k^{i,j}[t] \ln \beta_k^{i,j}[t]$, which is K-strongly convex. Substituting $h(\boldsymbol{\beta}[t])$ into (17), we have
$$f(\hat{\boldsymbol{\theta}}[t]) := \underset{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t],\forall k}{\operatorname{argmin}} \Big\{\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \beta_k^{i,j}[t] \ln \beta_k^{i,j}[t] - \lambda \sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k} \beta_k^{i,j}[t]\hat\theta_k^{i,j}[t]\Big\}. \tag{18}$$
The function $f(\hat{\boldsymbol{\theta}}[t])$ is convex and separable for each $\beta_k^{i,j}[t]$. By solving the following equation
$$\partial f(\hat{\boldsymbol{\theta}}[t])/\partial \beta_k^{i,j}[t] = \ln \beta_k^{i,j}[t] + 1 - \lambda\hat\theta_k^{i,j}[t] = 0$$
we have
$$\beta_k^{i,j}[t] = f(\hat\theta_k^{i,j}[t]) = \exp\big(\lambda\hat\theta_k^{i,j}[t] - 1\big).$$
To ensure $\sum_{(i,j)\in\mathcal{P}_k} \beta_k^{i,j}[t] = 1$, $\forall k$ (i.e. the unit simplex for data-flow $k$), we normalize $f_k^{i,j}(\hat{\boldsymbol{\theta}}_k[t])$ through the exponentiated mirror function as
$$f_k^{i,j}(\hat{\boldsymbol{\theta}}_k[t]) = \frac{\exp\big(\lambda[\hat\theta_k^{i,j}[t]]^+ - 1\big)}{\sum_{(i',j')\in\mathcal{P}_k} \exp\big(\lambda[\hat\theta_k^{i',j'}[t]]^+ - 1\big)} = \frac{\exp\big(\lambda[\hat\theta_k^{i,j}[t]]^+\big)}{\sum_{(i',j')\in\mathcal{P}_k} \exp\big(\lambda[\hat\theta_k^{i',j'}[t]]^+\big)}. \tag{19}$$
As a result, the estimated value of each element of flow-split vector $\boldsymbol{\beta}_k[t]$ is updated for all actions with the regret as
$$\beta_k^{i,j}[t] = \beta_k^{i,j}[t-1] + \eta_\beta[t]\big(f_k^{i,j}(\hat{\boldsymbol{\theta}}_k[t]) - \beta_k^{i,j}[t-1]\big) \tag{20}$$
for $t > 1$, where $\boldsymbol{\beta}_k[1] = \frac{1}{|\mathcal{P}_k|}[1, \cdots, 1]$ and $\eta_\beta[t]$ is the learning rate. The three-step reinforcement learning procedure includes (15), (16) and (20), which do not require expensive
computations and projection to the feasible space.

Convergence properties: The convergence conditions for the three-step reinforcement learning procedure are given as follows:
$$\lim_{T\to\infty}\sum_{t=1}^{T} \eta_u[t] = +\infty \ \&\ \lim_{T\to\infty}\sum_{t=1}^{T} \eta_u^2[t] < +\infty$$
$$\lim_{T\to\infty}\sum_{t=1}^{T} \eta_\theta[t] = +\infty \ \&\ \lim_{T\to\infty}\sum_{t=1}^{T} \eta_\theta^2[t] < +\infty$$
$$\lim_{T\to\infty}\sum_{t=1}^{T} \eta_\beta[t] = +\infty \ \&\ \lim_{T\to\infty}\sum_{t=1}^{T} \eta_\beta^2[t] < +\infty$$
$$\lim_{t\to\infty} \frac{\eta_\theta[t]}{\eta_u[t]} = 0 \ \&\ \lim_{t\to\infty} \frac{\eta_\beta[t]}{\eta_\theta[t]} = 0. \tag{21}$$

Algorithm 2: Proposed Iterative Algorithm for Solving (13) with MRT-Based Transmission Design
Initialization: Set $n := 1$ and generate an initial feasible value for $\mathbf{p}^{(0)}[t_s]$ to constraints in (25)
1: repeat
2: Solve (25) to obtain the optimal transmission power $\mathbf{p}^*[t_s]$
3: Update $\mathbf{p}^{(n)}[t_s] := \mathbf{p}^*[t_s]$
4: Set $n := n + 1$
5: until Convergence
6: Output: $\mathbf{p}^*[t_s] = \mathbf{p}^{(n)}[t_s]$ and $\mathbf{w}_k^{i,j,*}[t_s] = \frac{\sqrt{p_k^{i,j*}[t_s]}}{\sqrt{\nu_k^{i,j}[t_s]}}\,\mathbf{h}_k^{i,j}[t_s]$, $\forall k, (i, j)$.
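The three-step learning procedure in (15), (16) and (20), together with the step-size conditions in (21), can be sketched for a single data-flow as follows. This is a minimal illustration under stated assumptions: the polynomial schedules in `step_sizes` are one hypothetical choice that satisfies (21), the positive-part operator is applied to the regret estimate inside the exponential, and all function names are illustrative rather than taken from the source.

```python
import math


def step_sizes(t):
    """Hypothetical schedules eta_u, eta_theta, eta_beta for frame t >= 1.

    Exponents in (0.5, 1] give divergent sums with convergent squared sums,
    and the increasing exponents make eta_theta/eta_u and eta_beta/eta_theta
    vanish as t grows, as (21) requires.
    """
    return t ** -0.6, t ** -0.7, t ** -0.8


def best_response(theta_hat, lam):
    """Normalized exponential map of the positive part of the regrets."""
    weights = [math.exp(lam * max(th, 0.0)) for th in theta_hat]
    total = sum(weights)
    return [w / total for w in weights]


def rl_step(u_hat, theta_hat, beta, chosen, u_bar, t, lam):
    """One frame of the update for one flow; `chosen` is the played path index."""
    eta_u, eta_theta, eta_beta = step_sizes(t)
    u_hat, theta_hat = list(u_hat), list(theta_hat)
    # (15): utility estimate, updated only for the chosen path
    u_hat[chosen] += eta_u * (u_bar - u_hat[chosen])
    # (16): regret estimate, using the freshly updated utility estimate
    theta_hat[chosen] += eta_theta * (u_bar - u_hat[chosen] - theta_hat[chosen])
    # (20): move the flow-split vector toward the smoothed best response
    target = best_response(theta_hat, lam)
    beta = [b + eta_beta * (f - b) for b, f in zip(beta, target)]
    return u_hat, theta_hat, beta


if __name__ == "__main__":
    u_hat, theta_hat, beta = [0.0, 0.0], [0.0, 0.0], [0.5, 0.5]
    u_hat, theta_hat, beta = rl_step(u_hat, theta_hat, beta,
                                     chosen=0, u_bar=1.0, t=2, lam=0.5)
    print(beta)
```

Because (20) is a convex combination of two points on the unit simplex, the flow-split vector stays feasible without an explicit projection step.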
Vki,j [ts ]w̃ki,j [ts ], where w̃ki,j [ts ] ∈ C(Mi,j −K+1)×1 , ∀k, (i, j) p̃i,j∗ i,j
k [ts ] = max p̃k,min [ts ], τ µ∗
N0
− ν̃ i,j and
√ i,j∗ i,j ln 2 k [ts ]
are the solutions to the ZFBF-based problem. By defining p̃ [t ]
wki,j,∗ [ts ] = √ ki,j s Vki,j [ts ](h̃i,j H
k [ts ]) , ∀k, (i, j).
ν̃ki,j [ts ] ≜ ∥(h̃i,j H 2 i,j i,j
k [ts ]) ∥2 with h̃k [ts ] ≜ (hk [ts ]) Vk [ts ] ∈
H i,j ν̃k [ts ]
C 1×(Mi,j −K+1)
, weq can equivalently express w̃ki,j [ts ]
(h̃i,j [ts ])H
as w̃ki,j [ts ] = p̃i,j
k [ts ]
√k i,j , where p̃[ts ] ≜ Service Management &
ν̃k [ts ] Data
i,j Orchestration (SMO) Framework storage
p̃k [ts ] k,(i,j)∈P are the solutions to the following problem:
k 1 Non-RT RIC
Policy AI/ML Agent
Computing (L-SP)
X q̂k [ts ] platform
max rk (p̃i,j
k [ts ]) (27a) 2 A1 Controls (β ∗ [t]) O1
p̃[ts ] τ
k∈K Near-RT RIC
Observation updates
s.t. R̄ki,j [ts ] ≤ rki,j (p̃i,j
k [ts ])τ, ∀k, (i, j) (27b) xAPP 1 xAPP 2 xAPP 3
Fig. 5: A system topology with J = 8 RUs and K = 12 UEs.

We consider a system topology given in Fig. 5, including 8 RUs and 12 UEs located within a circle of 1-km radius. There are two DUs, each connected to 4 RUs. RUs are uniformly distributed in the area, while UEs are randomly located in each time-frame $t$. The large-scale fading coefficient $\xi[t] \in \{\xi_k^{i,j}[t]\}_{\forall (i,j),k}$ is modeled by the three-slope path loss model [32] as
$$\xi[t] = \xi_0 - 35\log_{10}(d[t]) + 20c_0\log_{10}(d/d_0) + 15c_1\log_{10}(d/d_1),$$
where $\xi_0 = -140.7 + \mathrm{SF}$ dB, $d_0 = 10$ m, $d_1 = 50$ m, and $d$ is the distance between an RU and a UE; here $c_i = \max\{0, (d_i - d)/|d_i - d|\}$ with $i \in \{0, 1\}$, and $\mathrm{SF} \sim \mathcal{CN}(0, \sigma_{\mathrm{SF}})$ denotes the shadowing factor with $\sigma_{\mathrm{SF}} = 8$ dB. The Rician factor $\kappa[t] \in \{\kappa_k^{i,j}[t]\}_{\forall (i,j),k}$ is given as $\kappa = P_{\mathrm{LoS}}(d[t]) / \big(1 - P_{\mathrm{LoS}}(d[t])\big)$, where the LoS probability follows the 3GPP-UMa model as
$$P_{\mathrm{LoS}}(d[t]) = \min\Big(\frac{18}{d[t]}, 1\Big)\Big(1 - \exp\Big(-\frac{d[t]}{36}\Big)\Big) + \exp\Big(-\frac{d[t]}{36}\Big)$$
[43]. We consider

B. Numerical Results of Algorithm 1

We first study the impacts of $\varphi$ and $\lambda$ on the convergence behavior of Algorithm 1 in Fig. 6. From Fig. 6(a), it can be observed that the congestion control rates for different values of the scaling factor $\varphi$ converge to the same optimal solution, and $\|\mathbf{a}[t_s]\|$ is almost independent of $\varphi$. In addition, increasing $\varphi$ results in a smaller divergence of the steady-state congestion control rate (see Theorem 2), but also slows down the convergence rate of Algorithm 1. The reason is that, for a large $\varphi$, the network utility function $\sum_{k\in\mathcal{K}} U_k(a_k[t_s])$ in (8a) prevails over the Lyapunov drift function $\Delta L[t_s]$, so more iterations are required to guarantee network stability. In Fig. 6(b), we increase the trade-off factor $\lambda$ (i.e. Boltzmann temperature) from 0.05 to 0.7. The result shows that the larger the value of $\lambda$, the better the estimated utility that can be achieved, at the cost of a lower convergence speed of the RL process. From (18), the paths associated with the highest estimated regret $\hat{\theta}_k^{i,j}[t]$ will be selected to minimize the best response function $f(\hat{\boldsymbol{\theta}}[t])$. Conversely, a low value of $\lambda$ can speed up convergence by allocating traffic data uniformly to all paths, but leads to a highly suboptimal solution.

In Fig. 7, we evaluate the performance of Algorithm 1 with
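The large-scale fading and LoS-probability expressions from the simulation setup can be evaluated directly. The following is a minimal Python sketch that transcribes them, under stated assumptions: the function names are illustrative, the shadowing term SF is passed in as an argument instead of being sampled, and the distance d is taken in the same unit as the quoted breakpoints d0 and d1 (the unit convention of the −35 log10(d) term is not restated in this section).

```python
import math

XI0_DB = -140.7        # xi_0 without the shadowing term SF (dB)
D0, D1 = 10.0, 50.0    # breakpoints d_0 and d_1 as quoted in the text

def slope_switch(d_i, d):
    """c_i = max{0, (d_i - d)/|d_i - d|}: 1 when d < d_i, otherwise 0."""
    if d == d_i:
        # The ratio is 0/0 here; the term it multiplies, log10(d/d_i), is 0 anyway.
        return 0.0
    return max(0.0, (d_i - d) / abs(d_i - d))

def large_scale_fading_db(d, shadowing_db=0.0):
    """Three-slope path loss in dB for distance d (same unit as D0 and D1)."""
    c0, c1 = slope_switch(D0, d), slope_switch(D1, d)
    return (XI0_DB + shadowing_db
            - 35.0 * math.log10(d)
            + 20.0 * c0 * math.log10(d / D0)
            + 15.0 * c1 * math.log10(d / D1))

def los_probability(d):
    """LoS probability min(18/d, 1)(1 - exp(-d/36)) + exp(-d/36)."""
    return min(18.0 / d, 1.0) * (1.0 - math.exp(-d / 36.0)) + math.exp(-d / 36.0)

def rician_factor(d):
    """kappa = P_LoS / (1 - P_LoS); unbounded when the link is surely LoS."""
    p = los_probability(d)
    if 1.0 - p < 1e-12:
        return math.inf
    return p / (1.0 - p)
```

With these definitions the fading value is flat below d0, falls with slope 20 between d0 and d1, and with slope 35 beyond d1, while the LoS probability equals one up to a distance of 18 and decays afterwards.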
[Fig. 6, panel (a): Impact of φ on congestion control rate; congestion control rate $\|\mathbf{a}[t_s]\|$ versus iteration.]
[Fig. 6, panel (b): Impact of λ on estimated utility; curves for λ = 0.05, 0.1, 0.3, 0.5 and 0.7 versus iteration.]
[Plot of Alg. 1 with ZFBF and MRT, each with and without EFSD, versus the number of antennas at RUs, M.]
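The effect of λ on the flow-split distribution can be illustrated with a small sketch. Equation (18) is not reproduced in this part of the paper, so the code below assumes a Boltzmann-Gibbs (softmax) form in which the probability of a path is proportional to exp(λ × estimated regret). That form, the function name and the regret values are all assumptions made for illustration; the scaling is chosen only so that a small λ spreads traffic almost uniformly and a large λ concentrates it on the path with the highest estimated regret, as described in the text.

```python
import math

def boltzmann_split(regrets, lam):
    """Softmax over estimated regrets (assumed form, not taken from (18))."""
    top = max(regrets)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp(lam * (r - top)) for r in regrets]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical estimated regrets for four candidate paths.
regrets = [1.0, 3.0, 6.0, 2.0]

low = boltzmann_split(regrets, 0.05)   # close to a uniform split
high = boltzmann_split(regrets, 0.7)   # mass concentrates on the third path
```

Comparing `low` and `high` shows the trade-off discussed for Fig. 6(b): the near-uniform split explores all paths, whereas the sharper split exploits the current regret estimates.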
Fig. 9: The steady-state total queue-length with respect to φ.

APPENDIX A: DERIVATION OF INEQUALITY

We will find the concave lower bound of $r_k^{i,j}[t_s]$. By [48, Appendix A], the function $r(x, y) = -\ln(1 - x^2/y)$ is convex in the domain $y > x^2$ with $x, y \in \mathbb{R}_+$. The global concave lower bound of $r(x, y)$ at the feasible point $(\bar{x}, \bar{y})$ is given as
$$
\begin{aligned}
r(x, y) &\ge r(\bar{x}, \bar{y}) + \Big\langle \Big(\frac{\partial r(\bar{x}, \bar{y})}{\partial \bar{x}}, \frac{\partial r(\bar{x}, \bar{y})}{\partial \bar{y}}\Big), (x - \bar{x}, y - \bar{y}) \Big\rangle \\
&= r(\bar{x}, \bar{y}) - \frac{\bar{x}^2}{\bar{y} - \bar{x}^2} + 2\frac{\bar{x}x}{\bar{y} - \bar{x}^2} - \frac{\bar{x}^2 y}{(\bar{y} - \bar{x}^2)\bar{y}}
\end{aligned} \tag{A.1}
$$
by applying the first-order Taylor approximation. By the fact that $\ln\big(1 + \frac{x^2}{z}\big) = -\ln\big(1 - \frac{x^2}{z + x^2}\big)$ and substituting $y = z + x^2$, $\bar{y} = \bar{z} + \bar{x}^2$, $x = \sqrt{v}$ and $\bar{x} = \sqrt{\bar{v}}$ into (A.1), we obtain
$$
r(v, z) \triangleq \ln\Big(1 + \frac{v}{z}\Big) \ge r(\bar{v}, \bar{z}) - \frac{\bar{v}}{\bar{z}} + 2\frac{\sqrt{v\bar{v}}}{\bar{z}} - \frac{\bar{v}(z + v)}{\bar{z}(\bar{z} + \bar{v})} := \bar{r}(v, z; \bar{v}, \bar{z}) \tag{A.2}
$$
where $\bar{r}(v, z; \bar{v}, \bar{z})$ is concave and $\bar{r}(\bar{v}, \bar{z}; \bar{v}, \bar{z}) = r(\bar{v}, \bar{z})$ whenever $(v, z) = (\bar{v}, \bar{z})$.
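The two properties claimed for the bound in (A.2), namely that it never exceeds ln(1 + v/z) and that it is tight at the expansion point, can be checked numerically. The following is a minimal sketch of such a check; the function names, the sampling range and the sample count are illustrative choices, not part of the paper.

```python
import math
import random

def rate(v, z):
    """r(v, z) = ln(1 + v/z)."""
    return math.log(1.0 + v / z)

def rate_lower_bound(v, z, v_bar, z_bar):
    """Right-hand side of (A.2), expanded around the point (v_bar, z_bar)."""
    return (rate(v_bar, z_bar)
            - v_bar / z_bar
            + 2.0 * math.sqrt(v * v_bar) / z_bar
            - v_bar * (z + v) / (z_bar * (z_bar + v_bar)))

def smallest_gap(num_samples=10000, seed=0):
    """Smallest value of rate minus bound over random positive points."""
    rng = random.Random(seed)
    gap = math.inf
    for _ in range(num_samples):
        v, z, v_bar, z_bar = (rng.uniform(0.01, 10.0) for _ in range(4))
        gap = min(gap, rate(v, z) - rate_lower_bound(v, z, v_bar, z_bar))
    return gap
```

A non-negative `smallest_gap()` (up to rounding) is consistent with (A.2) being a global lower bound, and evaluating `rate_lower_bound(v, z, v, z)` returns `rate(v, z)`, which is the tightness condition used by successive approximation methods.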
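Fig. 10: Average worst-case delay with respect to φ.

as $O(\varphi) + O(\sqrt{\varphi})$, which confirms our theoretical results in Corollary 1. We recall from Theorem 2 that the utility-optimality gap can be narrowed by increasing $\varphi$, but at the cost of higher delay, as shown in Fig. 10. When $\varphi$ is larger than 25, all the considered schemes violate the maximum allowable average delay of $\bar{d} = 10$ ms. This implies that the data traffic cannot be completely transmitted to UEs in each time-frame. Nevertheless, Algorithm 1 still provides the best performance out of the schemes considered.

VIII. CONCLUSION

We have proposed a new holistic multi-layer optimization framework, called JFCS, to enable intelligent traffic steering in a hierarchical O-RAN architecture. In particular, we have developed an intelligent resource management algorithm based on network utility maximization and stochastic optimization to efficiently and adaptively direct traffic to appropriate RUs by jointly optimizing the flow-split distribution, congestion control and scheduling. JFCS is proved to achieve fast convergence, long-term utility-optimality and significant delay reduction compared to state-of-the-art approaches. To that end, the insights in this work will foster future studies in this area, especially in the design of more advanced AI/ML solutions to

APPENDIX B: PROOF OF THEOREM 1

For a given $\varphi$, the quadratic Lyapunov function defined in Section IV-A is rewritten with respect to $\hat{\mathbf{q}}_{(\varphi)}[t_s]$ as $L(\hat{\mathbf{q}}_{(\varphi)}[t_s]) = \frac{1}{2\tau^2}\|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2$. Following [34, Theorem 3], the mean Lyapunov drift from time-slot $t_s$ to $t_{s+1}$ is computed as
$$
\begin{aligned}
\Delta\bar{L}(\hat{\mathbf{q}}_{(\varphi)}[t_s]) &= \mathbb{E}\{\Delta L(\hat{\mathbf{q}}_{(\varphi)}[t_s])\} = \mathbb{E}\{L(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]) - L(\hat{\mathbf{q}}_{(\varphi)}[t_s])\} \\
&= \frac{1}{2\tau^2}\mathbb{E}\Big\{\big(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}] + \hat{\mathbf{q}}_{(\varphi)}[t_s] - 2\hat{\mathbf{q}}^*_{(\varphi)}\big)^T \big(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}] - \hat{\mathbf{q}}_{(\varphi)}[t_s]\big)\Big\} \\
&\le \frac{1}{2\tau}\mathbb{E}\Big\{\big(2\hat{\mathbf{q}}_{(\varphi)}[t_s] + \big(\mathbf{a}[t_s] - \mathbf{r}(\mathbf{w}[t_s])\big)\tau - 2\hat{\mathbf{q}}^*_{(\varphi)}\big)^T \big(\mathbf{a}[t_s] - \mathbf{r}(\mathbf{w}[t_s])\big)\Big\} \\
&= \underbrace{\frac{1}{2}\mathbb{E}\{\|\mathbf{a}[t_s] - \mathbf{r}(\mathbf{w}[t_s])\|_2^2\}}_{\triangleq B_1} + \underbrace{\frac{1}{\tau}\mathbb{E}\Big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbf{a}[t_s] - \mathbf{r}(\mathbf{w}[t_s])\big)\Big\}}_{\triangleq B_2}
\end{aligned} \tag{B.1}
$$
by using the inequality $([x]^+)^2 \le x^2$, the identity $x^2 - y^2 = (x + y)(x - y)$, and the fact that $\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}] - \hat{\mathbf{q}}^*_{(\varphi)} = \hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)} + \big(\mathbf{a}[t_s] - \mathbf{r}(\mathbf{w}[t_s])\big)\tau$.

We first focus on providing the expected bound of $B_1$ as
$$
\begin{aligned}
B_1 &= \frac{1}{2}\mathbb{E}\{\|\mathbf{a}[t_s]\|_2^2 - 2\mathbf{a}[t_s]^T \mathbf{r}(\mathbf{w}[t_s]) + \|\mathbf{r}(\mathbf{w}[t_s])\|_2^2\} \\
&\le \frac{1}{2}\mathbb{E}\{\|\mathbf{a}[t_s]\|_2^2 + \|\mathbf{r}(\mathbf{w}[t_s])\|_2^2\} \\
&\le \frac{K}{2}\big(A_1^{\max} + (r^{\max})^2\big) \triangleq B_1^{\mathrm{UB}}
\end{aligned} \tag{B.2}
$$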
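where the last inequality follows from Assumption 2. To bound $B_2$, we first rewrite it equivalently as
$$
B_2 = \frac{1}{\tau}(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbb{E}\{\mathbf{a}[t_s]\} - \mathbf{r}^*\big) + \frac{1}{\tau}\mathbb{E}\Big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\big)\Big\}. \tag{B.3}
$$
From (7), it follows that $(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbb{E}\{\mathbf{a}[t_s]\} - \mathbf{r}^*\big) \le 0$. By applying the Cauchy–Schwarz inequality, i.e. $|\mathbf{x}^T\mathbf{y}| \le \|\mathbf{x}\|_2\|\mathbf{y}\|_2$, to the first term in (B.3), we have
$$
\frac{1}{\tau}(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbb{E}\{\mathbf{a}[t_s]\} - \mathbf{r}^*\big) \le -\frac{1}{\tau}\sum_{k\in\mathcal{K}} \big|\hat{q}_{(\varphi),k}[t_s] - \hat{q}^*_{(\varphi),k}\big|\,\big|a_k[t_s] - r_k^*\big|. \tag{B.4}
$$
By Assumption 1 on $\Psi$-smoothness and Step 4 of Algorithm 1, it is true that
$$
a_k[t_s] - r_k^* = {U_k'}^{-1}\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau}\Big) - {U_k'}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \le 0
$$
and
$$
U_k'\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau}\Big) - U_k'\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \le \Psi\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big).
$$
In addition, we have
$$
{U_k'}^{-1}\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau}\Big) - {U_k'}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \ge \frac{1}{\Psi}\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big)
$$
due to the inverse function lemma. From the fact that $(\hat{\mathbf{q}}^*_{(\varphi)})^T\mathbf{r}^* - (\hat{\mathbf{q}}^*_{(\varphi)})^T\mathbf{r}(\mathbf{w}[t_s]) \ge 0$, we can further bound $B_2$ as
$$
B_2 \le -\frac{1}{\tau^2\Psi\varphi}\|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2 + \frac{1}{\tau}\mathbb{E}\Big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s])^T \big(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\big)\Big\} \tag{B.5}
$$
where the term $\mathbb{E}\{\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\}$ is a constant with respect to $\hat{\mathbf{q}}_{(\varphi)}[t_s]$. Substituting (B.2) and (B.5) into (B.1) yields
$$
\Delta\bar{L}(\hat{\mathbf{q}}_{(\varphi)}[t_s]) \le -\frac{1}{\tau^2\Psi\varphi}\|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2 + B_1^{\mathrm{UB}} + \frac{1}{\tau}\mathbb{E}\Big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s])^T \big(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\big)\Big\}. \tag{B.6}
$$
We now compute the mean Lyapunov drift over $TT_f$ time-slots as
$$
\begin{aligned}
\sum_{t=1}^{T}\sum_{s=1}^{T_f} \Delta\bar{L} &= \sum_{t=1}^{T}\sum_{s=1}^{T_f} \mathbb{E}\{L(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]) - L(\hat{\mathbf{q}}_{(\varphi)}[t_s]) \mid \hat{\mathbf{q}}_{(\varphi)}[1_1]\} \\
&= \sum_{t=1}^{T}\sum_{s=1}^{T_f}\sum_{\hat{\mathbf{q}}_{(\varphi)} \ge 0} \mathrm{Prob}\big(\hat{\mathbf{q}}_{(\varphi)}[t_s] = \hat{\mathbf{q}}_{(\varphi)} \mid \hat{\mathbf{q}}_{(\varphi)}[1_1]\big) \\
&\qquad \times \mathbb{E}\{L(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]) - L(\hat{\mathbf{q}}_{(\varphi)}[t_s]) \mid \hat{\mathbf{q}}_{(\varphi)}[t_s] = \hat{\mathbf{q}}_{(\varphi)}\}.
\end{aligned} \tag{B.7}
$$
Let us denote by $\rho^{\infty}_{\hat{\mathbf{q}}_{(\varphi)}}$ the stationary distribution of the Markov chain $\hat{\mathbf{q}}_{(\varphi)}[t_s] \ge 0$, i.e. $\rho^{\infty}_{\hat{\mathbf{q}}_{(\varphi)}} = \lim_{T\to\infty}\frac{1}{TT_f}\sum_{t=1}^{T}\sum_{s=1}^{T_f}\mathrm{Prob}\big(\hat{\mathbf{q}}_{(\varphi)}[t_s] = \hat{\mathbf{q}}_{(\varphi)} \mid \hat{\mathbf{q}}_{(\varphi)}[1_1]\big)$. By substituting (B.6) into (B.7) and dividing both sides by $TT_f$, we have
$$
\begin{aligned}
&\sum_{\hat{\mathbf{q}}_{(\varphi)} \ge 0} \rho^{\infty}_{\hat{\mathbf{q}}_{(\varphi)}} \Big(-\frac{1}{\tau^2\Psi\varphi}\|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2 + B_1^{\mathrm{UB}} + \frac{1}{\tau}(\hat{\mathbf{q}}_{(\varphi)}[t_s])^T\,\mathbb{E}\{\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\}\Big) \\
&\quad = -\frac{1}{\tau^2\Psi\varphi}\mathbb{E}\{\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2\} + B_1^{\mathrm{UB}} + \frac{1}{\tau}\mathbb{E}\{(\hat{\mathbf{q}}^{\infty}_{(\varphi)})^T(\mathbf{r}^* - \mathbf{r}^{\infty})\} \ge 0
\end{aligned} \tag{B.8}
$$
where $\mathbf{r}^{\infty} = \operatorname*{argmax}_{r_k(\mathbf{w}) \in \mathcal{CH}[\infty],\, \forall k\in\mathcal{K}} \sum_{k\in\mathcal{K}} \hat{q}_k^{\infty} r_k(\mathbf{w})$. We note here that $(\hat{\mathbf{q}}^{\infty}_{(\varphi)})^T\mathbf{r}^{\infty} = \max_{r_k(\mathbf{w}) \in \mathcal{CH}[\infty],\, \forall k\in\mathcal{K}} \sum_{k\in\mathcal{K}} \hat{q}_k^{\infty} r_k(\mathbf{w}) \ge (\hat{\mathbf{q}}^{\infty}_{(\varphi)})^T\mathbf{r}^*$, yielding
$$
\frac{1}{\tau^2\Psi\varphi}\mathbb{E}\{\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2\} - B_1^{\mathrm{UB}} \le 0. \tag{B.9}
$$
This implies that
$$
\mathbb{E}\{\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2\} \le \sqrt{\frac{K\tau^2\Psi}{2}\big(A_1^{\max} + (r^{\max})^2\big)\varphi},
$$
where $B_1^{\mathrm{UB}} = \frac{K}{2}\big(A_1^{\max} + (r^{\max})^2\big)$, showing the inequality (31) in Theorem 1.

APPENDIX C: PROOF OF THEOREM 2

To prove (34), we first recall that
$$
a^{\infty}_{(\varphi),k} - a_k^* = {U_k'}^{-1}\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau}\Big) - {U_k'}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big)
$$
and
$$
U_k'\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau}\Big) - U_k'\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \ge \psi\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big)
$$
using Assumption 1. By the inverse function lemma, we have
$$
{U_k'}^{-1}\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau}\Big) - {U_k'}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \le \frac{1}{\psi}\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big),
$$
which yields
$$
\|\mathbf{a}^{\infty}_{(\varphi)} - \mathbf{a}^*\|_2 \le \frac{1}{\psi\tau\varphi}\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2 \overset{(31)}{\le} \frac{C_1}{\psi\tau}\frac{1}{\sqrt{\varphi}}. \tag{C.1}
$$
Next, it is assumed that $U_k(\cdot)$ is twice continuously differentiable, increasing, and strictly concave. If the utility function $U(\mathbf{a})$ has a maximizer $\mathbf{a}^*$, then
$$
U(\mathbf{a}^*) - U(\mathbf{a}^{\infty}_{(\varphi)}) \le \frac{\Psi}{2}\|\mathbf{a}^* - \mathbf{a}^{\infty}_{(\varphi)}\|_2^2 \le \frac{\Psi C_1^2}{2\psi^2\tau^2}\frac{1}{\varphi} \tag{C.2}
$$
where the last inequality follows from (C.1). The proof is thus complete.

REFERENCES

[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol. 34, no. 3, pp. 134–142, 2020.
[2] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, 2019.
[3] O-RAN Alliance, “ORAN-WG1 O-RAN architecture description v01.00.00.” Technical Specification, Feb. 2020.
[4] SAMSUNG, “ORAN - The Open Road to 5G,” White Paper, July 2019. [Online]. Available: https://www.samsung.com/global/business/networks/insights/whitepapers/ORAN-the-open-road-to-5g/
[5] O-RAN Alliance, “ORAN Working Group 2: AI/ML workflow description and requirements,” Tech. Rep., Mar. 2019.
[6] L. Bonati, S. D’Oro, M. Polese, S. Basagni, and T. Melodia, “Intelligence and learning in O-RAN for data-driven nextG cellular networks,” IEEE Commun. Mag., vol. 59, no. 10, pp. 21–27, 2021.
[7] L. Gavrilovska, V. Rakovic, and D. Denkovski, “From cloud RAN to ORAN,” Wireless Personal Communications, Mar. 2020.
[8] J. Wang, H. Roy, and C. Kelly, “OpenRAN: The next generation of radio access networks,” Accenture Strategy, Tech. Rep., Nov. 2019. [Online]. Available: https://telecominfraproject.com/openran/
[9] H. Kumar, V. Sapru, and S. K. Jaisawal, “O-RAN based proactive ANR optimization,” in IEEE Global Commun. Conf. Workshops (IEEE GLOBECOM Wkshps.), 2020, pp. 1–4.
[10] T. Pamuklu, M. Erol-Kantarci, and C. Ersoy, “Reinforcement learning based dynamic function splitting in disaggregated green open RANs,” in IEEE Inter. Conf. Commun. (IEEE ICC 2021), 2021, pp. 1–6.
[11] H. Lee, J. Cha, D. Kwon, M. Jeong, and I. Park, “Hosting AI/ML workflows on O-RAN RIC platform,” in IEEE Global Commun. Conf. Workshops (IEEE GLOBECOM Wkshps.), 2020, pp. 1–6.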
[12] J. A. Ayala-Romero, A. Garcia-Saavedra, X. Costa-Perez, and G. Iosifidis, “Bayesian online learning for energy-aware resource orchestration in virtualized RANs,” in IEEE Conf. Comput. Commun. (IEEE INFOCOM), 2021, pp. 1–10.
[13] Y. Cao, S.-Y. Lien, Y.-C. Liang, K.-C. Chen, and X. Shen, “User access control in open radio access networks: A federated deep reinforcement learning approach,” IEEE Trans. Wire. Commun., vol. 21, no. 6, pp. 3721–3736, 2022.
[14] M. Karbalaee Motalleb, V. Shah-Mansouri, S. Parsaeefard, and O. L. Alcaraz López, “Resource allocation in an Open RAN system using network slicing,” IEEE Trans. Netw. and Ser. Manag., vol. 20, no. 1, pp. 471–485, 2023.
[15] S.-Y. Lien and D.-J. Deng, “Intelligent session management for URLLC in 5G open radio access network: A deep reinforcement learning approach,” IEEE Trans. Indus. Infor., vol. 19, no. 2, pp. 1844–1853, 2023.
[16] M. J. Neely, E. Modiano, and C.-P. Li, “Fairness and optimal stochastic control for heterogeneous networks,” IEEE/ACM Trans. on Netw., vol. 16, no. 2, pp. 396–409, 2008.
[17] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synthesis Lectures Commun. Netw., vol. 3, no. 1, pp. 1–211, 2010.
[18] A. Eryilmaz and R. Srikant, “Joint congestion control, routing, and MAC for stability and fairness in wireless networks,” IEEE J. Sel. Areas in Commun., vol. 24, no. 8, pp. 1514–1524, 2006.
[19] M. A. Habibi, M. Nasimi, B. Han, and H. D. Schotten, “A comprehensive survey of RAN architectures toward 5G mobile communication system,” IEEE Access, vol. 7, pp. 70 371–70 421, 2019.
[20] J. Tang, W. P. Tay, and T. Q. S. Quek, “Cross-layer resource allocation with elastic service scaling in cloud radio access network,” IEEE Trans. Wireless Commun., vol. 14, no. 9, pp. 5068–5081, 2015.
[21] P. Luong, F. Gagnon, C. Despins, and L.-N. Tran, “Joint virtual computing and radio resource allocation in limited fronthaul green C-RANs,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2602–2617, 2018.
[22] A. Douik, H. Dahrouj, T. Y. Al-Naffouri, and M.-S. Alouini, “Coordinated scheduling and power control in cloud-radio access networks,” IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2523–2536, 2016.
[23] A. Douik, H. Dahrouj, T. Y. Al-Naffouri, and M.-S. Alouini, “Low-complexity scheduling and power adaptation for coordinated cloud-radio access networks,” IEEE Commun. Lett., vol. 21, no. 10, pp. 2298–2301, 2017.
[24] M. S. Al-Abiad, A. Douik, and S. Sorour, “Rate aware network codes for cloud radio access networks,” IEEE Trans. Mobile Comput., vol. 18, no. 8, pp. 1898–1910, 2019.
[25] A. Douik, S. Sorour, T. Y. Al-Naffouri, and M.-S. Alouini, “Rate aware instantly decodable network codes,” IEEE Trans. Wireless Commun., vol. 16, no. 2, pp. 998–1011, 2017.
[26] M. S. Al-Abiad, A. Douik, S. Sorour, and M. J. Hossain, “Throughput maximization in cloud-radio access networks using cross-layer network coding,” IEEE Trans. Mobile Comput., vol. 21, no. 2, pp. 696–711, 2022.
[27] O-RAN.WG2.Use-Case-Requirements-v02.01, “Non-RT RIC & A1 interface: Use cases and requirements,” Technical Specification, Nov. 2021. [Online]. Available: https://www.o-ran.org/specifications (accessed on 10 November 2021)
[28] M. R. Anwar, S. Wang, M. F. Akram, S. Raza, and S. Mahmood, “5G-enabled MEC: A distributed traffic steering for seamless service migration of internet of vehicles,” IEEE Internet of Things J., vol. 9, no. 1, pp. 648–661, 2022.
[29] F. Kavehmadavani, V.-D. Nguyen, T. X. Vu, and S. Chatzinotas, “Intelligent traffic steering in beyond 5G Open RAN based on LSTM traffic prediction,” IEEE Trans. Wireless Commun., pp. 1–1, 2023.
[30] T. K. Vu, M. Bennis, M. Debbah, and M. Latva-Aho, “Joint path selection and rate allocation framework for 5G self-backhauled mm-wave networks,” IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2431–2445, 2019.
[31] S. Singh, M. Geraseminko, S.-p. Yeh, N. Himayat, and S. Talwar, “Proportional fair traffic splitting and aggregation in heterogeneous wireless networks,” IEEE Commun. Lett., vol. 20, no. 5, pp. 1010–1013, 2016.
[32] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
[33] P. Key, L. Massoulie, and D. Towsley, “Path selection and multipath congestion control,” in IEEE Conf. Comput. Commun. (IEEE INFOCOM), 2007, pp. 143–151.
[34] J. Liu, A. Eryilmaz, N. B. Shroff, and E. S. Bentley, “Understanding the impacts of limited channel state information on massive MIMO cellular network optimization,” IEEE J. Sel. Areas Commun., vol. 35, no. 8, pp. 1715–1727, 2017.
[35] M. Razaviyayn, “Successive convex approximation: Analysis and applications,” Ph.D. dissertation, University of Minnesota, 2014.
[36] P. Billingsley, Probability and Measure, 3rd ed. New York: Wiley, 1995.
[37] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Trans. Automatic Control, vol. 37, no. 12, pp. 1936–1948, 1992.
[38] M. Bennis, S. M. Perlaza, P. Blasco, Z. Han, and H. V. Poor, “Self-organization in small cell networks: A reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 12, no. 7, pp. 3202–3212, 2013.
[39] D. S. Leslie and E. Collins, “Convergent multiple-timescales reinforcement learning algorithms in normal form games,” Anna. Appl. Prob., vol. 13, no. 4, pp. 1231–1251, 2003.
[40] A. Beck, A. Ben-Tal, and L. Tetruashvili, “A sequential parametric convex approximation method with applications to nonconvex truss topology design problems,” J. Global Optim., vol. 47, no. 1, pp. 29–51, May 2010.
[41] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization. Philadelphia: MPS-SIAM Series on Optimi., SIAM, 2001.
[42] A. Eryilmaz and R. Srikant, “Fair resource allocation in wireless networks using queue-length-based scheduling and congestion control,” IEEE/ACM Trans. Net., vol. 15, no. 6, pp. 1333–1344, 2007.
[43] A. H. Jafari, D. López-Pérez, M. Ding, and J. Zhang, “Study on scheduling techniques for ultra dense small cell networks,” in IEEE Veh. Technol. Conf. (VTC-Fall), 2015, pp. 1–6.
[44] 3GPP, NR, Physical channels and modulation (Release 15), document 3GPP TS 38.211 version 15.2.0 Release 15, 2017.
[45] S. Samarakoon, M. Bennis, W. Saad, and M. Latva-aho, “Backhaul-aware interference management in the uplink of wireless small cell networks,” IEEE Trans. Wireless Commun., vol. 12, no. 11, pp. 5813–5825, 2013.
[46] X. Lin, N. Shroff, and R. Srikant, “A tutorial on cross-layer optimization in wireless networks,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1452–1463, 2006.
[47] E. Stai and S. Papavassiliou, “User optimal throughput-delay trade-off in multihop networks under NUM framework,” IEEE Commun. Lett., vol. 18, no. 11, pp. 1999–2002, 2014.
[48] V.-D. Nguyen, T. Q. Duong, H. D. Tuan, O.-S. Shin, and H. V. Poor, “Spectral and energy efficiencies in full-duplex wireless information and power transfer,” IEEE Trans. Commun., vol. 65, no. 5, pp. 2220–2233, May 2017.