
Network-Aided Intelligent Traffic Steering in 6G O-RAN: A Multi-Layer Optimization Framework


Van-Dinh Nguyen, Thang X. Vu, Nhan Thanh Nguyen, Dinh C. Nguyen, Markku Juntti,
Nguyen Cong Luong, Dinh Thai Hoang, Diep N. Nguyen and Symeon Chatzinotas

V.-D. Nguyen is with the College of Engineering and Computer Science, VinUniversity, Vinhomes Ocean Park, Hanoi 100000, Vietnam (e-mail: dinh.nv2@vinuni.edu.vn).
T. X. Vu and S. Chatzinotas are with the Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, L-1855 Luxembourg City, Luxembourg (e-mail: {thang.vu, symeon.chatzinotas}@uni.lu).
N. T. Nguyen and M. Juntti are with Centre for Wireless Communications, University of Oulu, P.O.Box 4500, FI-90014, Finland (e-mail: {nhan.nguyen, markku.juntti}@oulu.fi).
N. C. Luong is with the Faculty of Computer Science, PHENIKAA University, Hanoi 12116, Vietnam (e-mail: luong.nguyencong@phenikaa-uni.edu.vn).
Dinh C. Nguyen is with the Elmore Family School of Electrical and Computer Engineering, Purdue University, USA (e-mail: nguye772@purdue.edu).
D. T. Hoang and D. N. Nguyen are with the School of Electrical and Data Engineering, University of Technology Sydney, Sydney, NSW 2007, Australia (e-mail: {hoang.dinh, diep.nguyen}@uts.edu.au).

arXiv:2302.02711v2 [cs.LG] 29 May 2023

Abstract—To enable an intelligent, programmable and multi-vendor radio access network (RAN) for 6G networks, considerable efforts have been made in standardization and development of open RAN (O-RAN). So far, however, the applicability of O-RAN in controlling and optimizing RAN functions has not been widely investigated. In this paper, we jointly optimize the flow-split distribution, congestion control and scheduling (JFCS) to enable an intelligent traffic steering application in O-RAN. Combining tools from network utility maximization and stochastic optimization, we introduce a multi-layer optimization framework that provides fast convergence, long-term utility-optimality and significant delay reduction compared to the state-of-the-art and baseline RAN approaches. Our main contributions are three-fold: i) we propose the novel JFCS framework to efficiently and adaptively direct traffic to appropriate radio units; ii) we develop low-complexity algorithms based on the reinforcement learning, inner approximation and bisection search methods to effectively solve the JFCS problem in different time scales; and iii) the rigorous theoretical performance results are analyzed to show that there exists a scaling factor to improve the tradeoff between delay and utility-optimization. Collectively, the insights in this work will open the door towards fully automated networks with enhanced control and flexibility. Numerical results are provided to demonstrate the effectiveness of the proposed algorithms in terms of the convergence rate, long-term utility-optimality and delay reduction.

Index Terms—Open radio access network, intelligent resource management, traffic steering, reinforcement learning, resource sharing.

I. INTRODUCTION

With the great success of mobile Internet, fifth generation (5G) cellular networks have been standardized to meet competing demands (e.g. extremely high data rate, low-latency and massive connectivity) and proliferation of heterogeneous devices. However, the existing “one-size-fits-all” 5G architecture lacks sufficient intelligence and flexibility to enable the coexistence of these demands [1]. As we move towards 6G, the latest frontier in this endeavor is open radio access network (O-RAN) by disaggregating RAN components and opening up interfaces, which is considered today the most promising approach to revolutionize the wireless technology from “connected things” to “connected intelligence” [2]–[4]. O-RAN is expected to fully enable programmable, intelligent, interoperable and multi-vendor RAN [5].

[Fig. 1: O-RAN Alliance reference architecture and workflow [3], with Non-RT and Near-RT RICs. A base station is disaggregated to CU, DU and RU. Blocks shown: the Service Management & Orchestration (SMO) framework with the Non-RT RIC (management layer, Non-RT), the Near-RT RIC with xApps (control layer, Near-RT), and the CU, DUs and RUs on the NFVI platform (function layer, RT), connected through the A1, O1, E2, F1 and Open FH (7-2x) interfaces.]

Fig. 1 illustrates the high-level O-RAN Alliance reference architecture [3], where the “black” parts and interfaces are defined by the 3rd Generation Partnership Project (3GPP), while the “orange” parts and interfaces are defined by O-RAN Alliance. O-RAN initiatives were developed to split the RAN into the radio unit (RU), distributed unit (DU) and centralized unit (CU), allowing for the interoperability of open hardware (HW), software (SW) and interfaces (e.g. O1, A1 and E2) [3], [4]. The O-RAN architecture typically has three main layers (or loops), including the management, control and function layers as illustrated in Fig. 1. In particular, the management layer takes place in non-real-time (Non-RT) over 1 s (second) with orchestration, automation functions and trained artificial intelligence (AI) and machine learning (ML) models. The control layer is executed in near real-time (Near-RT) between 10 ms and 1 s to provide functions like radio resource management (RRM), quality-of-service (QoS) management and interference management. Finally, the function
layer provides the RAN optimization of a timescale below 10 ms, such as scheduling, power control and radio-frequency assignment, etc. The function layer (CU, DU and RU) is also connected to the Non-RT RIC through the O1 interface for periodic feedback, aiming to fully enable autonomous and self-optimizing networks. Two important parts introduced in O-RAN are the Non-RT RAN intelligent controller (Non-RT RIC) and the Near-RT RIC, which provide access to RRM functions. The former enables AI/ML workflow for RAN components and RRM like traffic steering (TS) as well as policy-based guidance of applications in Near-RT RIC, while the latter is embedded with the control/optimization algorithms of RAN and radio resources [5]–[8].

A. Motivation

Yet the existing research efforts on O-RAN in the academic community are isolated, providing only tailored solutions to problems at either the physical or higher layers [9]–[12]. On the other hand, ML-based approaches (e.g. [9], [10], [12]–[15]) often ignored periodic feedback loops and assumed that the RAN information is available at SMO to perform resource allocation and RAN management, making a fully automated network impractical. The understanding of how O-RAN could help improve network performance by controlling data traffic and optimizing RAN functions remains rather limited in the literature. In this paper, we aim to fill this gap by conducting an in-depth analysis of the multi-layer design between the physical and higher layers and developing low-complexity algorithms for network control, scheduling and resource allocation in different time scales. We also analyze their impact on the throughput and delay performances in the 6G O-RAN context.

In light of the above discussions, this paper focuses on designing the TS control to intelligently direct the user traffic through a group of RUs, taking into account available resources and users’ service requirements. To fully realize the potential performance of the TS scheme, O-RAN allows customization of user-centric strategies, multi-path routing and multi-connectivity as well as proactive optimization of network parameters through RICs. However, the problem becomes more challenging in the O-RAN setting due to several complicating factors: i) the traffic demand of user equipments (UEs) often varies over time, and the complete information of the RAN layer is indeterminate at the time of optimization algorithm execution. Hence, the policies and control decisions at the service management and orchestration (SMO) must be adapted to the variation of data traffic; ii) the total data traffic is distributed unevenly to RUs due to different downlink (DL) throughput capabilities, causing high queueing delay; and iii) the strong correlation between congestion control and scheduling optimization influences the optimal choice of flow-split distribution of data traffic across all RUs. In addition, the deployment of fully automated networks is an intricate problem in O-RAN that calls for intelligent, scalable and self-organizing strategies for a holistic multi-layer optimization framework. In this regard, reinforcement learning (RL) plays an important role in achieving long-term utility optimization. To the best of our knowledge, the TS optimization problem for O-RAN as outlined above has not been thoroughly addressed in the literature.

B. Main Contributions

In this paper, we consider a practical scenario where the complete information of the RAN layer is not available at the beginning of each time-frame. Instead, we assume that only their expected values are available to approximately measure queueing delay. An interesting question naturally arises: How does the incomplete information of user traffic demands affect the optimal choices of the TS scheme? To answer this question and address the challenges above, we introduce a holistic multi-layer optimization framework that jointly optimizes the flow-split distribution, congestion control and scheduling (called JFCS). The proposed framework effectively characterizes the complex interactions between layers (e.g. flow-split selection, congestion control rate and power allocation). In summary, we make the following three key contributions:

• We propose a novel JFCS framework to efficiently and adaptively direct traffic to appropriate RUs. Our framework not only generalizes the classical queue-length-based congestion control and scheduling (QCS) method [16], but also provides a synergy between RL, QCS and updated network state information, thus enabling a closed-loop control of the TS in the O-RAN context.
• To ensure practicality, we identify inherent properties of the JFCS problem and propose an intelligent resource management algorithm to solve it effectively by leveraging the stochastic optimization framework [17]. In particular, by exploiting the historical system information accumulated from the previous time-slots, an RL process is developed to build the smoothed best response while maximizing the long-term utility for each data-flow under arbitrary changes in traffic demands. Given the updated queue-length vector and the optimal flow-split distribution, two low-complexity algorithms are developed to effectively solve the short-term power control optimization subproblem in an iterative fashion.
• Given a scaling factor $\varphi$ to minimize the Lyapunov drift [18], the theoretical performance results are analyzed to show that the queueing network is stable. In addition, the expected divergence in queue-length and the optimality gap of congestion control rate still scale as $O(\sqrt{\varphi})$ and $O(1/\sqrt{\varphi})$, respectively. Thus, there always exists a scaling factor to balance utility-optimality and latency.

We numerically evaluate the performance of the proposed framework. Results show that the proposed framework can improve network resource utilization significantly while achieving fast convergence and long-term utility-optimality, compared to state-of-the-art approaches.

C. Paper Organization and Mathematical Notation

The remainder of this paper is organized as follows. The related work is discussed in Section II. In Section III, we first introduce the network model and then present the problem formulation. The proposed JFCS framework and its solutions are provided in Sections IV and V, respectively. Section VI presents the key theoretical performance results of the JFCS
framework. Numerical results are given in Section VII, while Section VIII concludes the paper.

Mathematical notation: Throughout this paper, matrices and vectors are written as bold uppercase and lowercase letters, respectively, while the scalar number is denoted in lowercase. $\mathbf{h}^H$ is the Hermitian transpose of vector $\mathbf{h}$. The notation $x \sim \mathcal{CN}(0, \sigma^2)$ implies that $x$ is a circularly-symmetric complex Gaussian random variable with zero mean and variance $\sigma^2$. $\|\cdot\|$ stands for the vector’s Euclidean norm. $\mathbb{C}$ and $\mathbb{R}$ denote the sets of all complex and real numbers, respectively. Finally, $\mathbb{E}\{\cdot\}$ denotes the expectation of a random variable.

II. RELATED WORK

Multi-layer (a.k.a. cross-layer) optimization for traditional cellular RAN architectures has been extensively studied in the literature (see e.g., [19] and references therein). For example, Tang et al. [20] studied a multi-layer resource allocation problem to minimize the overall system power consumption in a cloud-RAN (C-RAN), which jointly optimizes the service scaling, remote radio head selection, and beamforming. In [21], a joint design of virtual computing and radio resource allocation was proposed. It was shown that this approach can efficiently allocate the virtual computing of the baseband unit (BBU) pool to achieve load balancing among users with significantly reduced power consumption. These problems are often solved by the difference-of-convex algorithm due to the combinatorial nature and strong coupling between optimization variables. To address this challenge, graph theory techniques were introduced in [22] and [23] to effectively solve the jointly coordinated scheduling and power optimization problem in C-RAN. Recently, the multi-layer network coding was also investigated in [24]–[26], taking into account the rate heterogeneity of different users to remote radio heads. In general, these existing works only optimized radio resources, while other factors at higher layers (e.g. congestion control and routing) were overlooked, making guaranteed multi-layer QoS for O-RAN infeasible. In addition, the non-causal statistical knowledge of traffic demands is required to model queue states, which is again impractical.

So far, there have been only a few attempts to study the applicability of the O-RAN architecture. Kumar et al. [9] proposed an automatic neighbour relation (ANR) approach to manage neighbour cell relationships by leveraging ML techniques, hence improving gNodeB (gNB) handovers. The work in [13] introduced an intelligent user access control algorithm based on deep reinforcement learning, aiming to maximize the overall throughput and avoid frequent handovers. The authors in [10] developed an RL-based dynamic function splitting which is shown to be able to effectively decide the O-RAN’s function splits and reduce operating costs. Based on the Working Group (WG)-2 AI/ML specifications of the O-RAN Alliance, Acumos framework and open network automation platform were introduced in [11] to generate AI/ML models to be deployed in RIC modules and monitor the designed workflow, respectively. Motalleb et al. [14] developed an iterative algorithm to jointly optimize service-aware baseband resource allocation and virtual network function activation, thus achieving better data rate and lower end-to-end delay. Very recently, a deep reinforcement learning-based intelligent session management for ultra-reliable and low latency communications (URLLC) was proposed in [15] to allocate resources for serving current and new sessions more efficiently. However, these studies did not reveal any observable information about the RAN layer to SMO via periodic feedback loops. Thus, RICs in these studies were unable to monitor RAN in a timely manner to enable their management automation within O-RAN.

In traditional RAN architectures, the TS solutions are typically determined by users’ radio conditions of a serving cell while treating signals from neighboring cells as interference [27]. The authors in [28] proposed a distributed TS scheme through edge servers, where the matrix-based shortest path selection and matrix-based multipath searching algorithms were developed to dynamically determine the optimal paths for traffic steering. Very recently, Kavehmadavani et al. [29] showed that a dynamic multi-connectivity (MC)-based TS scheme can help steer traffic flows towards the most suitable cells based on user-centric conditions. In addition, the flow split for each user was purely determined by the RUs’ capacity in delivering user traffic demands, resulting in a highly suboptimal solution. However, this work did not embed AI/ML solutions in Non-RT RIC within the O-RAN architecture and assumed that all network information is available at Near-RT RIC to optimize radio resource allocation.

Different from all the above works and others in the literature that focus on a single layer, we propose a fully multi-layer optimization framework that captures interplays between the physical and higher layers, enabling proactive optimization of network parameters through RICs with periodic feedback loops. This holistic multi-layer optimization framework guarantees the long-term utility-optimality with far less latency than state-of-the-art approaches, opening the door towards fully automated networks with enhanced control and flexibility.

III. NETWORK MODEL AND PROBLEM FORMULATION

A. Network Model

[Fig. 2: Illustration of the O-RAN-based system model enabling TS where each DU connects to multiple RUs towards cost-effective deployment. Labels shown: K flows with arrivals $A_1[t], A_2[t], \ldots, A_k[t], \ldots, A_K[t]$; the CU with flow-split selection $\mathbf{c}[t]$; DUs; scheduler; RUs.]
[Fig. 3: Illustration of frame structure with each time-frame $t$ (corresponding to one large-scale coherence time) consisting of $T_f$ time-slots. Labels shown: frames $1, 2, \ldots, t, t+1$, each of duration $T_c$; $\boldsymbol{\beta}[t]$ is updated per frame; $\mathbf{w}[t_s]$ is optimized per time-slot $t_s = tT_f + s$ of duration $\tau$, from $tT_f + 1$ to $tT_f + T_f$.]

As shown in Fig. 2, we consider an O-RAN architecture with one CU, $I$ DUs and $J$ RUs, where each DU connects to multiple RUs for cost-effective deployment. Let us denote by $\mathcal{I} \triangleq \{1, 2, \cdots, I\}$ the set of DUs. We consider a downlink multi-user multiple-input single-output (MU-MISO) system, where $J$ RUs simultaneously serve the set $\mathcal{K} \triangleq \{1, 2, \cdots, K\}$ of $K = |\mathcal{K}|$ single-antenna UEs. The $j$-th RU served by the $i$-th DU is referred to as RU $(i, j)$, which is equipped with $M_{i,j}$ antennas. The total number of RUs’ antennas is thus $M_\Sigma = \sum_{\forall (i,j)} M_{i,j}$. The set of RUs served by DU $i$ is denoted by $\mathcal{J}_i \triangleq \{(i, 1), \cdots, (i, J_i)\}$ with $|\mathcal{J}_i| = J_i$ and $\sum_{i \in \mathcal{I}} J_i = J$. The total set of RUs is denoted as $\mathcal{J} \triangleq \cup_{i \in \mathcal{I}} \mathcal{J}_i$. We assume that the midhaul (MH) link between the CU and DU and fronthaul link between the DU and RU have sufficient capacity (i.e., high-speed optical ones), so that the transmission latency from CU to RUs and queueing latency at CU and DUs are negligible.

We consider that the system operates in a discrete time-frame indexed by $t \in [1, 2, \cdots, T]$, which corresponds to one large-scale coherence time with a duration of $T_c$, as illustrated in Fig. 3. Each frame is divided into $T_f$ time-slots of equal duration $\tau = T_c / T_f$, where the time-slot is indexed by $t_s = tT_f + s$ with $s \in \{1, 2, \cdots, T_f\}$. At CU, there exist $K$ independent data-flows, each of which is intended for one UE. The CU splits the data-flow of UE $k$, say flow $k$, into multiple sub-flows which are possibly transmitted through the set of paths and then aggregated at this UE [30], [31], so-called “traffic steering”. For data-flow $k$, we denote by $\mathcal{P}_k \triangleq \{(i, j)\}_{\forall (i,j) \in \mathcal{J}}$ the set of path states, including queue states and routing tables. To improve the system throughput, a subset of separate paths in the set $\mathcal{P}_k$ (i.e., via neighboring RUs indexed by $(i, j)$) should be appropriately selected. Let us denote by $\mathbf{c}_k[t] \triangleq [c_k^{i,j}[t]]_{(i,j) \in \mathcal{P}_k}$ the flow-split selection (action) vector for data-flow $k$ in time-frame $t$, i.e., $c_k^{i,j}[t] = 1$ if path $(i, j) \in \mathcal{P}_k$ (i.e., via RU $(i, j)$) is selected to transmit data of flow $k$; otherwise, $c_k^{i,j}[t] = 0$. We let $\beta_k^{i,j}[t] \in [0, 1]$ be the fraction of data-flow $k$ which is routed via path $(i, j)$ in time-frame (state) $t$ by selecting action $c_k^{i,j}[t]$, where $\sum_{(i,j) \in \mathcal{P}_k} \beta_k^{i,j}[t] = 1$. The global flow-split decision is denoted by $\mathcal{B}[t] \triangleq \{\boldsymbol{\beta}_k[t], \forall k \mid \sum_{(i,j) \in \mathcal{P}_k} \beta_k^{i,j}[t] = 1, \forall k\}$, where each column flow-split vector $\boldsymbol{\beta}_k[t] \triangleq [\beta_k^{i,j}[t]]^T_{(i,j) \in \mathcal{P}_k} \in \mathbb{R}^J$ corresponds to the flow-split vector of data-flow $k$.

1) Wireless Channel Model and Downlink Throughput

The large-scale fading coefficients are assumed to be invariant within one frame $T_c$, while the small-scale fading components with a low degree of mobility are assumed to be unchanged during time-slot $t_s$ with duration of $\tau$ and vary independently in the next time-slot. For example, the large-scale fading coefficients may stay invariant for a period of at least 40 small-scale fading coherence intervals for indoor scenarios [32]. The channel vector between RU $(i, j)$ and UE $k \in \mathcal{K}$ in time-slot $t_s$ is denoted by $\mathbf{h}_k^{i,j}[t_s] \in \mathbb{C}^{M_{i,j} \times 1}$, which follows the Rician fading model with the Rician factor $\kappa_k^{i,j}[t]$. In particular, $\mathbf{h}_k^{i,j}[t_s]$ is modeled as
\[ \mathbf{h}_k^{i,j}[t_s] = \sqrt{\xi_k^{i,j}[t]} \Big( \sqrt{\kappa_k^{i,j}[t] / (\kappa_k^{i,j}[t] + 1)}\, \bar{\mathbf{h}}_k^{i,j}[t] + \sqrt{1 / (\kappa_k^{i,j}[t] + 1)}\, \tilde{\mathbf{h}}_k^{i,j}[t_s] \Big) \]
where $\xi_k^{i,j}[t]$ represents the large-scale fading; $\bar{\mathbf{h}}_k^{i,j}[t]$ and $\tilde{\mathbf{h}}_k^{i,j}[t_s] \sim \mathcal{CN}(0, \mathbf{I})$ are the line-of-sight (LoS) and non-LoS (NLoS) components, which follow a deterministic channel and Rayleigh fading models, respectively. We let $\mathbf{H}[t_s] \triangleq [\mathbf{h}_1[t_s] \cdots \mathbf{h}_K[t_s]] \in \mathbb{C}^{M \times K}$ denote the channel matrix between all RUs and UEs in time-slot $t_s$, where $\mathbf{h}_k[t_s] \triangleq [(\mathbf{h}_k^{i,j}[t_s])^H]^H_{\forall i,j} \in \mathbb{C}^{M \times 1}$ corresponds to the channel vector between RUs and UE $k$.

Let us denote by $x_k^{i,j}[t_s]$ and $\mathbf{w}_k^{i,j}[t_s] \in \mathbb{C}^{M_{i,j} \times 1}$ a unit-power data symbol and a linear beamforming vector transmitted from RU $(i, j)$ to UE $k$ in time-slot $t_s$, respectively. The received signal at UE $k$ in time-slot $t_s$ can be written as
\[ y_k[t_s] = \sum_{(i,j) \in \mathcal{P}_k} (\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_k^{i,j}[t_s] x_k^{i,j}[t_s] + \sum_{k' \in \mathcal{K} \setminus \{k\}} \sum_{(i,j) \in \mathcal{P}_{k'}} (\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_{k'}^{i,j}[t_s] x_{k'}^{i,j}[t_s] + \omega_k[t_s] \tag{1} \]
where $\omega_k[t_s]$ is the additive white Gaussian background noise (AWGN) with power $N_0$. The downlink achievable rate (bits/s) of UE $k$ from RU $(i, j)$ in time-slot $t_s$ can be written as $r_k^{i,j}(\mathbf{w}[t_s]) \triangleq W \log_2\big(1 + \gamma_k^{i,j}(\mathbf{w}[t_s])\big)$, where $W$ is the system bandwidth and the signal-to-interference-plus-noise ratio (SINR) $\gamma_k^{i,j}(\mathbf{w}[t_s])$ is given by $\gamma_k^{i,j}(\mathbf{w}[t_s]) = |(\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_k^{i,j}[t_s]|^2 / \Phi_k^{i,j}(\mathbf{w}[t_s])$ with
\[ \Phi_k^{i,j}(\mathbf{w}[t_s]) \triangleq \underbrace{\sum_{(i',j') \in \mathcal{P}_k \setminus \{(i,j)\}} |(\mathbf{h}_k^{i',j'}[t_s])^H \mathbf{w}_k^{i',j'}[t_s]|^2}_{\text{Intra-user interference}} + \underbrace{\sum_{k' \in \mathcal{K} \setminus \{k\}} \sum_{(i,j) \in \mathcal{P}_{k'}} |(\mathbf{h}_k^{i,j}[t_s])^H \mathbf{w}_{k'}^{i,j}[t_s]|^2}_{\text{Inter-user interference}} + N_0 \tag{2} \]
and $\mathbf{w}[t_s] \triangleq [(\mathbf{w}_k^{i,j}[t_s])^H]^H_{k \in \mathcal{K}, (i,j) \in \mathcal{P}_k}$ being the vector embracing all the beamformers. The overall effective data rate of data-flow $k$ (or UE $k$) can be computed as $r_k(\mathbf{w}[t_s]) = \sum_{(i,j) \in \mathcal{P}_k} r_k^{i,j}(\mathbf{w}[t_s])$. Then, for each $\mathbf{H}[t_s]$ and a given $\boldsymbol{\beta}_k[t]$, we define the instantaneous achievable rate region under beamformer $\mathbf{w}[t_s]$ as
\[ \mathcal{C}_{\mathbf{H}[t_s]} \triangleq \Big\{ r_k(\mathbf{w}[t_s]), \forall k \;\Big|\; r_k(\mathbf{w}[t_s]) = \sum_{(i,j) \in \mathcal{P}_k} r_k^{i,j}(\mathbf{w}[t_s]),\ \sum_{k \in \mathcal{K}} \|\mathbf{w}_k^{i,j}[t_s]\|_2^2 \le P_{\max}^{i,j},\ \forall (i,j) \Big\} \tag{3} \]
where $P_{\max}^{i,j}$ denotes the transmit power budget of RU $(i, j)$. We note that the achievable rate $r_k^{i,j}(\mathbf{w}[t_s])$ is upper bounded
by $r_k^{i,j}(\mathbf{w}[t_s]) \le W \log_2\big(1 + P_{\max}^{i,j} \|\mathbf{h}_k^{i,j}[t_s]\|_2^2 / N_0\big)$ for a limited transmit power budget $P_{\max}^{i,j}$, leading to $r_k(\mathbf{w}[t_s]) < \infty, \forall k, t$.

2) Queueing Model

As illustrated in Fig. 2, each RU maintains a separate queue for each UE. Let $A_k[t]$ (bits/s) be the total rate of data arriving at RU destined for UE $k$ in time-frame $t$ with mean $\mathbb{E}\{A_k\} = \bar{A}_k$. We assume that $A_k[t]$ is random and upper bounded by a finite constant $A^{\max}$, i.e., $A_k[t] \le A^{\max} < \infty, \forall k, t$, and unknown at the beginning of time-frame $t$. As a result, the queue-length of data-flow $k$ at RU $(i, j)$ in time-slot $t_s$ evolves as follows: $q_k^{i,j}[t_{s+1}] = \big[q_k^{i,j}[t_s] + \beta_k^{i,j}[t] A_k[t] \tau - r_k^{i,j}(\mathbf{w}[t_s]) \tau\big]^+$, where $[x]^+ \triangleq \max\{0, x\}$. Defining $\mathbf{q}[t_s] \triangleq [q_k^{i,j}[t_s]]^T_{k,(i,j)}$ and following [18], a queueing network is stable if the steady-state total queue-length remains finite, i.e.,
\[ \limsup_{t_s \to \infty} \mathbb{E}\{\|\mathbf{q}[t_s]\|_1\} < \infty. \tag{4} \]

B. Problem Formulation

Let $\bar{r}_k \triangleq \lim_{t_s \to \infty} \frac{1}{t_s} \sum_{\ell=1}^{t_s} r_k(\mathbf{w}[\ell])$ denote the long-term average rate of data-flow $k$. Each UE $k$ is associated with a utility function, denoted by $U_k(\bar{r}_k)$. To facilitate the analysis presented later, we make the following assumption on the utility function [18], [30], [33], [34].

Assumption 1. The utility function $U_k(\cdot)$ is assumed to satisfy the following conditions
• $U_k(\cdot)$ is twice continuously differentiable, increasing, and strictly concave.
• There exist positive constants $0 < \psi < \Psi < \infty$ such that $\psi \le -U_k''(\bar{r}_k) \le \Psi, \forall \bar{r}_k \in [0, \bar{r}^{\max}]$, with $\bar{r}^{\max}$ being the maximum long-term average rate of any data flow.

Our goal is to maximize the network utility function $\sum_{k \in \mathcal{K}} U_k(\bar{r}_k)$, subject to the probabilistic delay constraint, achievable rate region and queue-stability constraint. Based on the network utility maximization (NUM) framework, the joint flow-split distribution, congestion control and scheduling optimization problem (JFCS) can be mathematically formulated as
\begin{align}
\text{JFCS}: \max_{\boldsymbol{\beta}, \bar{\mathbf{r}}, \mathbf{w}} \quad & \sum_{k \in \mathcal{K}} U_k(\bar{r}_k) \tag{5a} \\
\text{s.t.} \quad & \limsup_{t_s \to \infty} \mathbb{E}\{\|\mathbf{q}[t_s]\|_1\} < \infty \tag{5b} \\
& r_k(\mathbf{w}[t_s]) \in \mathcal{C}_{\mathbf{H}[t_s]}, \ \forall t_s, k \in \mathcal{K} \tag{5c} \\
& \boldsymbol{\beta}_k[t] \in \mathcal{B}[t], \ \forall t, k \in \mathcal{K} \tag{5d} \\
& \mathrm{Prob}\Big( \frac{q_k^{i,j}[t_s]}{\bar{A}_k} \le \bar{d}_k \Big) \ge \epsilon_k, \ \forall t_s, k, (i,j) \tag{5e}
\end{align}
where $\boldsymbol{\beta} \triangleq [\boldsymbol{\beta}_k^T]^T_{k \in \mathcal{K}}$ and $\bar{\mathbf{r}} \triangleq [\bar{r}_k]^T_{k \in \mathcal{K}}$. Constraint (5e) ensures different minimum outage delay requirements for sub-flows, where $\bar{d}_k$ and $\epsilon_k$ ($0 \ll \epsilon_k \le 1$) are the maximum allowable average delay and the required reliable communication for each UE, respectively. It states that the probability of $q_k^{i,j}[t_s] / \bar{A}_k \le \bar{d}_k$ (i.e. UEs’ maximum allowable delay) should be greater than or equal to a certain positive constant $\epsilon_k$. This probabilistic constraint is used to tackle the randomness and variability of the arrival rate while maintaining a certain level of performance.

Remark 1. It is clear that problem (5) needs to be executed in different time scales (i.e., over the long-term scale $t$ at Non-RT RIC and the short-term scale $t_s$ at Near-RT RIC), as shown in Fig. 3. In particular, the global flow-split vector $\boldsymbol{\beta}[t]$ is only updated once per time-frame $t$ to reduce computational complexity and information exchange, as well as to ensure a stable queueing system. On the other hand, the beamforming vector $\mathbf{w}[t_s]$ and the instantaneous achievable rate $\mathbf{r}[t_s]$ are optimized based on the real-time effective CSI $\mathbf{H}[t_s]$ in time-slot $t_s$, adapting to dynamic environments.

IV. JFCS-BASED NETWORK UTILITY OPTIMIZATION

A. Tractable Form of the JFCS Problem (5)

Challenges of Solving JFCS Problem (5): We can observe that constraint (5c) is nonconvex while (5e) is a nonconvex probabilistic constraint, generally making problem (5) NP-hard. In addition, the expectations in the constraints make the problem stochastic, so it cannot be solved directly. The classical optimization approaches, such as successive convex approximation (SCA) [35], are often applied to solve optimization problems with nonconvex and deterministic constraints. However, the stochastic SCA-based algorithms can no longer guarantee a feasible and (sub)-optimal solution in all subsequent transmission time intervals (TTIs) due to the dynamics of the physical layer at small timescales. The flow-split decisions mainly rely on the previous states updated by the RAN layer. Towards practical applications, an efficient and adaptive solution to the long-term subproblem of (5) is necessary to achieve high QoE for all UEs in every TTI.

Let us start by transforming problem (5) into a more tractable form. Towards a safe design, we consider the replacement of constraint (5e) by its deterministic constraint. From the basic property of probability, we can rewrite (5e) as $\mathrm{Prob}\big(q_k^{i,j}[t_s] \ge \bar{A}_k \bar{d}_k\big) \le 1 - \epsilon_k$. It follows from the well-known Markov inequality [36] that $\mathrm{Prob}\big(q_k^{i,j}[t_s] \ge \bar{A}_k \bar{d}_k\big) \le \mathbb{E}\{q_k^{i,j}[t_s]\} / (\bar{A}_k \bar{d}_k)$, yielding
\[ \sum_{\ell=1}^{t} \beta_k^{i,j}[\ell] \bar{A}_k \tau - (1 - \epsilon_k) \bar{A}_k \bar{d}_k - \sum_{\ell=1}^{t_s - 1} r_k^{i,j}(\mathbf{w}[\ell]) \tau \le r_k^{i,j}(\mathbf{w}[t_s]) \tau, \ \forall t_s, k \in \mathcal{K}, (i,j) \in \mathcal{P}_k \tag{6} \]
where each queue-length is always non-negative. We note that (6) is a conservative approximation of (5e): any point feasible for the former is also feasible for the latter, but not vice versa, due to the Markov upper bound on the outage probabilities.

To facilitate the following optimization, we introduce congestion control variables $\mathbf{a}[t_s] \triangleq [a_k[t_s]]^T_{k \in \mathcal{K}}$, satisfying $\bar{a}_k - \bar{r}_k \le 0, \forall k$, where $\bar{a}_k \triangleq \lim_{t_s \to \infty} \frac{1}{t_s} \sum_{\ell=1}^{t_s} a_k[\ell]$. Problem (5) is then rewritten as
\[ \max_{\boldsymbol{\beta}, \bar{\mathbf{a}}, \bar{\mathbf{r}}, \mathbf{w}} \ \sum_{k \in \mathcal{K}} U_k(\bar{a}_k) \tag{7a} \]
\[ \text{s.t.} \ \text{(5b), (5c), (5d), (6)} \tag{7b} \]
āk − r̄k ≤ 0, ∀k. (7c) Algorithm 1: Intelligent Resource Management Algorithm
for Solving JFCS Problem (5), compliant with O-RAN
We also  introduce a new auxiliary queue-length vector
T  Initialization: Set t = 1 and select a positive scaling factor φ.
q̂[ts ] ≜ q̂k [ts ] k∈K , where q̂k [ts+1 ] = q̂k [ts ] + ak [ts ]τ − Initialize β k [1] = |P1k | [1, · · · , 1] and all queues are set to be
+
rk (w[ts ])τ to associate constraint (7c) with a penalty empty: qki,j [11 ] = 0 and q̂k [11 ] = 0, ∀(i, j), k.
function and ak [ts ] ∈ [0, Amax ]. We define the total P queue Main Loop:
1: for each frame t = 1, 2, · · · , T do {/*Long-term scale t*/}
backlog of all UEs in time-slot ts as L[ts ] = 12 k∈K 2: Flow-Split Distribution: Given {q[t − 1], A[t − 1]}, CU
i,j
P qk [ts ]2 P q̂k [ts ]2  splits data-flows of all UEs based on the optimal flow-split
(i,j)∈Pk τ2 + k∈K τ2 , which is the quadratic
decisions β ∗ [t] by solving L-SP at Non-RT RIC:
Lyapunov function [17], [37]. For given (q[ts ], q̂[ts ]), the X
Lyapunov drift from time-slot ts to ts+1 is given as ∆L[ts ] = max Lk [t].
β k [t]∈B[t],∀k
L[ts+1 ] − L[ts ]. To guarantee joint network stability and k∈K

penalty minimization (i.e., (5b) and (7c) hold true), we adopt 3: for each time-slot ts = tTf + s with s ∈ {1, · · · , Tf } do
the drift-plus-penalty procedure [17] to minimize the drift of {/*Short-term scale ts */}
a quadratic Lyapunov function and rewrite (7) as

\[
\max_{\boldsymbol{\beta},\bar{\mathbf{a}},\bar{\mathbf{r}},\mathbf{w}}\ \varphi\sum_{k\in\mathcal{K}}\mathbb{E}\{U_k(a_k[t_s])\}-\mathbb{E}\{\Delta L[t_s]\} \tag{8a}
\]
\[
\text{s.t.}\ (5c),\ (5d),\ (6) \tag{8b}
\]

where $\varphi$ is a scaling factor to balance two objective functions. We now show that constraint (7c) holds with equality at optimum by introducing the following lemma.

Lemma 1. For each data-flow of UE $k$, the optimal congestion control rate is equal to the optimal long-term average service rate, i.e., $\bar{a}_k^{*}-\bar{r}_k^{*}=0,\ \forall k$.

The proof of Lemma 1 is straightforward by examining the Karush–Kuhn–Tucker (KKT) complementary slackness condition over the increasing and strictly concave objective function $U_k(\cdot),\ \forall k$.

B. Overall Intelligent Resource Management Algorithm

To solve problem (8) in different time scales, we now decompose it into three subproblems. To do so, we consider a worst-case design by developing an upper bound of $\Delta L[t_s]$ for given $(\mathbf{q}[t_s],\hat{\mathbf{q}}[t_s])$. From the inequality $([x]^{+})^2\le x^2$ and $(x+y)^2-x^2=2xy+y^2$, we have

\[
\Delta L_{\mathrm{UB}}[t_s]\triangleq\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k}\frac{q_k^{i,j}[t_s]}{\tau}\Big(\beta_k^{i,j}[t]A_k[t]-r_k^{i,j}(\mathbf{w}[t_s])\Big)+\sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}\Big(a_k[t_s]-r_k(\mathbf{w}[t_s])\Big)+B[t_s]\ \ge\ \Delta L[t_s] \tag{9}
\]

where
\[
B[t_s]\triangleq\frac{1}{2}\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k}\Big(\beta_k^{i,j}[t]A_k[t]-r_k^{i,j}(\mathbf{w}[t_s])\Big)^2+\frac{1}{2}\sum_{k\in\mathcal{K}}\Big(a_k[t_s]-r_k(\mathbf{w}[t_s])\Big)^2
\]
is the summation of the second moments of the arrival and service processes. Following [17] and [30], we consider that $B[t_s]$ is finite and bounded by $\bar{B}$ for all $t_s$, i.e., $\mathbb{E}\{B[t_s]\mid\mathbf{q}[t_s],\hat{\mathbf{q}}[t_s]\}\le\bar{B}$. As a result, problem (8) is simplified to

\[
\max_{\boldsymbol{\beta},\bar{\mathbf{a}},\bar{\mathbf{r}},\mathbf{w}}\ \varphi\sum_{k\in\mathcal{K}}\mathbb{E}\{U_k(a_k[t_s])\}-\mathbb{E}\{\Delta L_{\mathrm{UB}}[t_s]\} \tag{10a}
\]
\[
\text{s.t.}\ (5c),\ (5d),\ (6). \tag{10b}
\]

Long-term subproblem (L-SP): The flow-split distribution subproblem at time-frame $t$ is given as

\[
\text{L-SP}:\ \max_{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t],\forall k}\ \sum_{k\in\mathcal{K}}L_k[t] \tag{11}
\]

where $L_k[t]=\sum_{(i,j)\in\mathcal{P}_k}\frac{q_k^{i,j}[t_s]}{\tau}\big(r_k^{i,j}(\mathbf{w}[t_s])-\beta_k^{i,j}[t]A_k[t]\big)$. Although problem (11) is a linear program in $\boldsymbol{\beta}$, it cannot be solved directly by standard optimization techniques because $A_k[t],\ \forall k$ are incompletely known at the beginning of time-frame $t$.

Short-term subproblems (S-SPs): The congestion control subproblem at time-slot $t_s$ is

\[
\text{S-SP1}:\ \max_{\mathbf{a}[t_s]\ge 0}\ \sum_{k\in\mathcal{K}}\Big(\varphi U_k(a_k[t_s])-\frac{\hat{q}_k[t_s]}{\tau}a_k[t_s]\Big) \tag{12}
\]

which is an unconstrained convex problem. The optimal solution of (12) exists and is unique, namely $a_k^{*}[t_s]=U_k'^{-1}\big(\frac{\hat{q}_k[t_s]}{\varphi\tau}\big),\ \forall k$, where $U_k'^{-1}(\cdot)$ denotes the inverse function of the first derivative of $U_k(\cdot)$. Given the optimal solution $\boldsymbol{\beta}^{*}[t]$, the short-term power control optimization subproblem (i.e., the weighted queue-length-based scheduling) at time-slot $t_s$ is given as

\[
\text{S-SP2}:\ \max_{\mathbf{r}[t_s],\mathbf{w}[t_s]}\ \sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}r_k(\mathbf{w}[t_s]),\quad\text{s.t.}\ (5c),\ (6). \tag{13}
\]

Algorithm 1 (continued)
4: Congestion Controller: Given the queue-length vector $\hat{\mathbf{q}}[t_s]$, solve S-SP1 (12) to obtain the optimal congestion control variables:
   $a_k^{*}[t_s]=\min\Big\{U_k'^{-1}\Big(\frac{\hat{q}_k[t_s]}{\varphi\tau}\Big),A^{\max}\Big\},\ \forall k.$
5: Weighted Queue-Length-Based Scheduler: Given the queue-length vector $\hat{\mathbf{q}}[t_s]$ and the flow-split distribution $\boldsymbol{\beta}^{*}[t]$, each RU $(i,j)\in\mathcal{P}_k$ schedules the service rate $r_k^{i,j}(\mathbf{w}[t_s])$ for UE $k\in\mathcal{K}$ by solving S-SP2:
   $\max_{\mathbf{r}[t_s],\mathbf{w}[t_s]}\ \sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}r_k(\mathbf{w}[t_s]),\ \text{s.t.}\ (5c),\ (6).$
6: Queue-Length Updates: Queue-lengths are updated as
   $q_k^{i,j}[t_{s+1}]=\big[q_k^{i,j}[t_s]+\beta_k^{i,j}[t]A_k[t]\tau-r_k^{i,j}(\mathbf{w}[t_s])\tau\big]^{+},\ \forall k,(i,j)$
   $\hat{q}_k[t_{s+1}]=\big[\hat{q}_k[t_s]+a_k[t_s]\tau-r_k(\mathbf{w}[t_s])\tau\big]^{+},\ \forall k.$
7: Set $s=s+1$
8: end for
9: Update $\{\mathbf{q}[t],\mathbf{A}[t]\}:=\{q_k^{i,j}[t],A_k[t]\}_{k,(i,j)}$ to Non-RT RIC.
10: Set $t=t+1$
11: end for
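Steps 4 and 6 of Algorithm 1 are simple enough to sketch in a few lines. The Python snippet below is a minimal illustration, not the authors' implementation: it assumes the proportional-fairness utility $U_k(a)=\log(0.001+a)$ adopted later in Section VII, for which $U_k'^{-1}(y)=1/y-0.001$, and the function names are hypothetical. The lower clip at zero reflects the constraint $\mathbf{a}[t_s]\ge 0$ in (12).

```python
def inv_marginal_utility(y, eps=1e-3):
    """Inverse of U'(a) = 1/(eps + a), i.e. of the derivative of log(eps + a)."""
    return 1.0 / y - eps


def congestion_rate(q_hat, phi, tau, a_max, eps=1e-3):
    """Step 4: a* = min{U'^{-1}(q_hat / (phi * tau)), A_max}, clipped at zero."""
    if q_hat <= 0.0:
        # Empty virtual queue: the queue "price" is zero, so admit at the cap.
        return a_max
    a = inv_marginal_utility(q_hat / (phi * tau), eps)
    return min(max(a, 0.0), a_max)


def update_virtual_queue(q_hat, a, r, tau):
    """Step 6: q_hat <- [q_hat + a * tau - r * tau]^+."""
    return max(q_hat + (a - r) * tau, 0.0)
```

In this sketch a larger virtual queue lowers the admitted rate, and a larger $\varphi$ raises it for the same backlog, which is the utility-versus-backlog trade-off that the scaling factor controls.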
The overall intelligent resource management algorithm for solving the JFCS problem (5) is summarized in Algorithm 1, where the solutions of subproblems will be provided next.

V. PROPOSED ALGORITHMS FOR SOLVING SUBPROBLEMS

We are now in a position to solve L-SP (11) and S-SP2 (13) in different time scales. The optimality of the latter depends heavily on the optimal flow-split decisions, which often require a prior knowledge of the statistical information of all possible paths at Non-RT RIC. However, the assumption of complete information is unrealistic due to the dynamic environment and because the data collected from the RAN layer are updated to Non-RT RIC only on the long-term scale. In this work, at time-frame $t$ we aim to exploit historical system information accumulated from the previous time-slot, which can be used to build the smoothed best response in maximizing the long-term utility for each data flow.

A. Reinforcement Learning Algorithm for Solving L-SP (11)

The flow-split decision $\boldsymbol{\beta}_k[t]$ in problem (11) can be estimated separably by maximizing $L_k[t]$. This implies that the larger the queue-length $q_k^{i,j}[t_s]$, the lower the flow-split decision value $\beta_k^{i,j}[t]$ to guarantee fairness among all RUs $(i,j)\in\mathcal{P}_k$ (i.e., to avoid large queue-lengths $q_k^{i,j}$ at some RUs in the next time-slot $t_{s+1}$). Let us denote

\[
u_k^{i,j}[t]\triangleq\frac{q_k^{i,j}[t_s]}{\tau}\Big(r_k^{i,j}(\mathbf{w}[t_s])-\beta_k^{i,j}[t]A_k[t]\Big)
\]

as the instantaneous utility observation of data-flow $k$ at time-frame $t$ when selecting path $(i,j)\in\mathcal{P}_k$. The total utility observation of data-flow $k$, denoted by $u_k[t]$, is thus

\[
u_k[t]=\sum_{(i,j)\in\mathcal{P}_k}u_k^{i,j}[t]. \tag{14}
\]

However, it is not possible to build a smoothed best response based on $u_k^{i,j}[t]$ as it is not revealed at the beginning of time-frame $t$. Inspired by [38], we denote $\hat{u}_k^{i,j}[t]$ as the estimated utility of data-flow $k$ at time-frame $t$ when selecting path $(i,j)$. In addition, the actual utility observed by data-flow $k$ at time-frame $t$, denoted by $\bar{u}_k[t]$, is given as $\bar{u}_k[t]=u_k[t-1]$, which is based on feedback from Near-RT RIC at time $t-1$. By initializing $\hat{u}_k^{i,j}[1]=0$, the estimated utility of data-flow $k$ is updated for action $c_k[t]=c_k^{i,j}[t]$ as follows:

\[
\hat{u}_k^{i,j}[t]=\hat{u}_k^{i,j}[t-1]+\eta_u[t]\,\mathbb{1}_{\{c_k[t]=c_k^{i,j}[t]\}}\Big(\bar{u}_k[t]-\hat{u}_k^{i,j}[t-1]\Big),\ \forall t>1 \tag{15}
\]

where $\eta_u>0$ is the step size (i.e. the learning rate), which is decreased over time to guarantee convergence. Naturally, $\hat{u}_k^{i,j}[1]$ is initialized as $\hat{u}_k^{i,j}[1]=0$ for $t=1$. The indicator function $\mathbb{1}_{\{x=y\}}=1$ (resp. 0) if the condition $x=y$ is true (resp. false).

Next, we denote $\hat{\boldsymbol{\theta}}_k[t]\triangleq[\hat{\theta}_k^{i,j}[t]]_{(i,j)\in\mathcal{P}_k}$ as the estimated regret vector of data-flow $k$, where each element is updated for action $c_k[t]=c_k^{i,j}[t]$ as

\[
\hat{\theta}_k^{i,j}[t]=\hat{\theta}_k^{i,j}[t-1]+\eta_\theta[t]\,\mathbb{1}_{\{c_k[t]=c_k^{i,j}[t]\}}\Big(\bar{u}_k[t]-\hat{u}_k^{i,j}[t]-\hat{\theta}_k^{i,j}[t-1]\Big),\ \forall t>1 \tag{16}
\]

with $\hat{\theta}_k^{i,j}[1]=0$ and $\eta_\theta[t]$ being the learning rate. In order to achieve high performance in the long term, the L-SP must balance exploration and exploitation processes. We note that trying all possible actions to choose the best paths (i.e., exhaustive exploration) can offer the highest payoff, but at the cost of slow convergence, and it may even be computationally prohibitive. During the exploitation process, playing an action associated with the highest estimated utility in (15) will likely result in a very sub-optimal solution. To make this tradeoff more efficient, let us define the best response function $\hat{\boldsymbol{\beta}}[t]=f(\hat{\boldsymbol{\theta}}[t])$ as

\[
f(\hat{\boldsymbol{\theta}}[t]):=\operatorname*{argmin}_{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t]}\Big\{h\big(\boldsymbol{\beta}[t]\big)-\lambda\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k}\beta_k^{i,j}[t]\hat{\theta}_k^{i,j}[t]\Big\}. \tag{17}
\]

Here $\lambda$ is the so-called trade-off factor (a.k.a. Boltzmann temperature) and $h(\boldsymbol{\beta}[t])$ denotes the regularization function. We note that when $\lambda\to 0$, it leads to uniform probabilities of all actions, i.e. $\beta_k^{i,j}[t]=1/|\mathcal{P}_k|,\ \forall(i,j)\in\mathcal{P}_k$. For $\lambda\to\infty$, the second term in (17) will dominate the best response function and then the actions associated with the highest estimated regret will be selected [38].

Regularization function: The regularization function makes it possible to learn the best paths that maximize each data-flow's own performance and to stabilize the flow-split decisions. The solutions to problem (11) lie in the unit simplex for each data-flow. Therefore, we adopt the Gibbs-Shannon entropy as the regularization function, i.e. $h(\boldsymbol{\beta}[t])=\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k}\beta_k^{i,j}[t]\ln\big(\beta_k^{i,j}[t]\big)$, which is $K$-strongly convex. Substituting $h(\boldsymbol{\beta}[t])$ into (17), we have

\[
f(\hat{\boldsymbol{\theta}}[t]):=\operatorname*{argmin}_{\boldsymbol{\beta}_k[t]\in\mathcal{B}[t],\forall k}\Big\{\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k}\beta_k^{i,j}[t]\ln\big(\beta_k^{i,j}[t]\big)-\lambda\sum_{k\in\mathcal{K}}\sum_{(i,j)\in\mathcal{P}_k}\beta_k^{i,j}[t]\hat{\theta}_k^{i,j}[t]\Big\}. \tag{18}
\]

The function $f(\hat{\boldsymbol{\theta}}[t])$ is convex and separable for each $\beta_k^{i,j}[t]$. By solving the following equation

\[
\partial f(\hat{\boldsymbol{\theta}}[t])/\partial\beta_k^{i,j}[t]=\ln\big(\beta_k^{i,j}[t]\big)+1-\lambda\hat{\theta}_k^{i,j}[t]=0
\]

we have

\[
\beta_k^{i,j}[t]=f(\hat{\theta}_k^{i,j}[t])=\exp\big(\lambda\hat{\theta}_k^{i,j}[t]-1\big).
\]

To ensure $\sum_{(i,j)\in\mathcal{P}_k}\beta_k^{i,j}[t]=1,\ \forall k$ (i.e. the unit simplex for data-flow $k$), we normalize $f_k^{i,j}(\hat{\boldsymbol{\theta}}_k[t])$ through the exponentiated mirror function as

\[
f_k^{i,j}(\hat{\boldsymbol{\theta}}_k[t])=\frac{\exp\big(\lambda[\hat{\theta}_k^{i,j}[t]]^{+}-1\big)}{\sum_{(i',j')\in\mathcal{P}_k}\exp\big(\lambda[\hat{\theta}_k^{i',j'}[t]]^{+}-1\big)}=\frac{\exp\big(\lambda[\hat{\theta}_k^{i,j}[t]]^{+}\big)}{\sum_{(i',j')\in\mathcal{P}_k}\exp\big(\lambda[\hat{\theta}_k^{i',j'}[t]]^{+}\big)}. \tag{19}
\]

As a result, the estimated value of each element of flow-split vector $\boldsymbol{\beta}_k[t]$ is updated for all actions with the regret as

\[
\beta_k^{i,j}[t]=\beta_k^{i,j}[t-1]+\eta_\beta[t]\Big(f_k^{i,j}(\hat{\boldsymbol{\theta}}_k[t])-\beta_k^{i,j}[t-1]\Big) \tag{20}
\]

for $t>1$, where $\boldsymbol{\beta}_k[1]=\frac{1}{|\mathcal{P}_k|}[1,\cdots,1]$ and $\eta_\beta[t]$ is the learning rate. The three-step reinforcement learning procedure includes (15), (16) and (20), which do not require expensive
computations and projection to the feasible space.

Convergence properties: The convergence conditions for the three-step reinforcement learning procedure are given as follows:

\[
\begin{aligned}
&\lim_{T\to\infty}\sum_{t=1}^{T}\eta_u[t]=+\infty\ \ \&\ \ \lim_{T\to\infty}\sum_{t=1}^{T}\eta_u^2[t]<+\infty\\
&\lim_{T\to\infty}\sum_{t=1}^{T}\eta_\theta[t]=+\infty\ \ \&\ \ \lim_{T\to\infty}\sum_{t=1}^{T}\eta_\theta^2[t]<+\infty\\
&\lim_{T\to\infty}\sum_{t=1}^{T}\eta_\beta[t]=+\infty\ \ \&\ \ \lim_{T\to\infty}\sum_{t=1}^{T}\eta_\beta^2[t]<+\infty\\
&\lim_{t\to\infty}\frac{\eta_\theta[t]}{\eta_u[t]}=0\ \ \&\ \ \lim_{t\to\infty}\frac{\eta_\beta[t]}{\eta_\theta[t]}=0.
\end{aligned} \tag{21}
\]

This implies that the learning rates must be decreased over time to guarantee the convergence of the proposed three-step RL procedure. The detailed proof of a multiple-timescales RL algorithm can be found in [38], [39]. Following the same arguments as those in [38] and the conditions in (21), the three-step RL procedure in (15), (16) and (20) converges to the optimal solution with the positive trade-off factor $\lambda>0$, satisfying $\lim_{t\to\infty}\boldsymbol{\beta}_k[t]=\boldsymbol{\beta}_k^{*},\ \forall k\in\mathcal{K}$.

B. Proposed Algorithm for Solving S-SP2 (13)

Given the optimal flow-split distribution of data-flow $k$, $\boldsymbol{\beta}_k^{*}[t]$, we denote by $\mathcal{P}_k^{*}[t]$ the set of selected path states in time-frame $t$, which only includes $c_k^{i,j}[t]=1$ with $(i,j)\in\mathcal{P}_k$. In this section, we present two low-complexity transmission designs for $\mathbf{w}$, namely maximum ratio transmission (MRT) and zero-forcing beamforming (ZFBF), and then develop low-complexity iterative algorithms for their solution.

1) MRT-Based Transmission Design

Each RU $(i,j)$ performs MRT beamforming (a.k.a. channel-matched beamforming) using local CSI as $\mathbf{w}_k^{i,j}[t_s]=\frac{\sqrt{p_k^{i,j}[t_s]}}{\sqrt{\nu_k^{i,j}[t_s]}}\mathbf{h}_k^{i,j}[t_s],\ \forall(i,j)\in\mathcal{P}_k[t]$ and $k\in\mathcal{K}$, where $\nu_k^{i,j}[t_s]\triangleq\|\mathbf{h}_k^{i,j}[t_s]\|_2^2$ and $p_k^{i,j}[t_s]$ is the transmit power coefficient allocated to UE $k$ from RU $(i,j)$ in time-slot $t_s$. The corresponding SINR is rewritten as

\[
\gamma_k^{i,j}(\mathbf{p}[t_s])=\frac{p_k^{i,j}[t_s]\nu_k^{i,j}[t_s]}{\Phi_k^{i,j}(\mathbf{p}[t_s])} \tag{22}
\]

where
\[
\Phi_k^{i,j}(\mathbf{p}[t_s])\triangleq\sum_{(i',j')\in\mathcal{P}_k[t]\setminus\{(i,j)\}}p_k^{i',j'}[t_s]\nu_k^{i',j'}[t_s]+\sum_{k'\in\mathcal{K}\setminus\{k\}}\sum_{(i,j)\in\mathcal{P}_{k'}[t]}p_{k'}^{i,j}[t_s]\frac{|(\mathbf{h}_k^{i,j}[t_s])^{H}\mathbf{h}_{k'}^{i,j}[t_s]|^2}{\nu_{k'}^{i,j}[t_s]}+N_0
\]
is linear in $\mathbf{p}[t_s]\triangleq\big[p_k^{i,j}[t_s]\big]_{k\in\mathcal{K},(i,j)\in\mathcal{P}_k}$. As a result, the short-term power optimization problem (13) with MRT reduces to the following problem:

\[
\max_{\mathbf{p}[t_s]}\ \sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}r_k(\mathbf{p}[t_s]) \tag{23a}
\]
\[
\text{s.t.}\ \bar{R}_k^{i,j}[t_s]\le r_k^{i,j}(\mathbf{p}[t_s])\tau,\ \forall k,(i,j) \tag{23b}
\]
\[
\sum_{k\in\mathcal{K}}p_k^{i,j}[t_s]\le P_{\max}^{i,j},\ \forall(i,j) \tag{23c}
\]

where $r_k(\mathbf{p}[t_s])=\sum_{(i,j)\in\mathcal{P}_k^{*}}r_k^{i,j}(\mathbf{p}[t_s])$ with $r_k^{i,j}(\mathbf{p}[t_s])\triangleq W\log_2\big(1+\gamma_k^{i,j}(\mathbf{p}[t_s])\big)$, and $\bar{R}_k^{i,j}[t_s]\triangleq\sum_{\ell=1}^{t}\beta_k^{i,j}[\ell]\bar{A}_k\tau-(1-\epsilon_k)\bar{A}_k\bar{d}_k-\sum_{\ell=1}^{t_{s-1}}r_k^{i,j}(\mathbf{p}[\ell])\tau$.

Problem (23) is nonconvex due to the nonconcavity of $r_k^{i,j}(\mathbf{p}[t_s])$. We will now apply the inner approximation (IA) method to effectively solve (23) in an iterative manner. Following from inequality (A.2) in Appendix A with $v=p_k^{i,j}[t_s]\|\mathbf{h}_k^{i,j}[t_s]\|_2^2$ and $z=\Phi_k^{i,j}(\mathbf{p}[t_s])$, the global concave lower bound of $r_k^{i,j}(\mathbf{p}[t_s])$ at the updated feasible point $\mathbf{p}^{(n)}[t_s]$ found at iteration $n$, denoted by $r_k^{i,j(n)}(\mathbf{p}[t_s];\mathbf{p}^{(n)}[t_s])$, is given as

\[
\begin{aligned}
r_k^{i,j}(\mathbf{p}[t_s])\ \ge\ & r_k^{i,j}(\mathbf{p}^{(n)}[t_s])-W\log_2 e\,\bigg(\gamma_k^{i,j}(\mathbf{p}^{(n)}[t_s])-2\frac{\nu_k^{i,j}[t_s]\sqrt{p_k^{i,j(n)}[t_s]}\sqrt{p_k^{i,j}[t_s]}}{\Phi_k^{i,j}(\mathbf{p}^{(n)}[t_s])}\\
&+\gamma_k^{i,j}(\mathbf{p}^{(n)}[t_s])\times\frac{p_k^{i,j}[t_s]\nu_k^{i,j}[t_s]+\Phi_k^{i,j}(\mathbf{p}[t_s])}{p_k^{i,j(n)}[t_s]\nu_k^{i,j}[t_s]+\Phi_k^{i,j}(\mathbf{p}^{(n)}[t_s])}\bigg)\\
:=\ & r_k^{i,j(n)}(\mathbf{p}[t_s];\mathbf{p}^{(n)}[t_s])
\end{aligned} \tag{24}
\]

with $r_k^{i,j(n)}(\mathbf{p}^{(n)}[t_s];\mathbf{p}^{(n)}[t_s])=W\log_2\big(1+\gamma_k^{i,j}(\mathbf{p}^{(n)}[t_s])\big)$. As a result, we successively solve the following inner convex approximate program of (23) at iteration $n$:

\[
\max_{\mathbf{p}[t_s]}\ \sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}r_k^{(n)}(\mathbf{p}[t_s]) \tag{25a}
\]
\[
\text{s.t.}\ \bar{R}_k^{i,j}[t_s]\le r_k^{i,j(n)}(\mathbf{p}[t_s];\mathbf{p}^{(n)}[t_s])\tau,\ \forall k\in\mathcal{K},(i,j) \tag{25b}
\]
\[
\sum_{k\in\mathcal{K}}p_k^{i,j}[t_s]\le P_{\max}^{i,j},\ \forall(i,j) \tag{25c}
\]

and update the feasible point $\mathbf{p}^{(n)}[t_s]$ until convergence, where $r_k^{(n)}(\mathbf{p}[t_s])=\sum_{(i,j)\in\mathcal{P}_k^{*}}r_k^{i,j(n)}(\mathbf{p}[t_s];\mathbf{p}^{(n)}[t_s])$. The proposed iterative procedure to solve (13) is summarized in Algorithm 2. An initial feasible value for $\mathbf{p}^{(0)}[t_s]$ to start Algorithm 2 is easily found by successively solving the following simple convex program:

\[
\max_{\mathbf{p}[t_s]}\ \varrho\triangleq\min_{\forall k,(i,j)}\Big\{r_k^{i,j(n)}(\mathbf{p}[t_s];\mathbf{p}^{(n)}[t_s])\tau-\bar{R}_k^{i,j}[t_s]\Big\} \tag{26a}
\]
\[
\text{s.t.}\ \sum_{k\in\mathcal{K}}p_k^{i,j}[t_s]\le P_{\max}^{i,j},\ \forall(i,j) \tag{26b}
\]

until reaching $\varrho>0$.

Algorithm 2: Proposed Iterative Algorithm for Solving (13) with MRT-Based Transmission Design
Initialization: Set $n:=1$ and generate an initial feasible value for $\mathbf{p}^{(0)}[t_s]$ to constraints in (25)
1: repeat
2: Solve (25) to obtain the optimal transmission power $\mathbf{p}^{*}[t_s]$
3: Update $\mathbf{p}^{(n)}[t_s]:=\mathbf{p}^{*}[t_s]$
4: Set $n:=n+1$
5: until Convergence
6: Output: $\mathbf{p}^{*}[t_s]=\mathbf{p}^{(n)}[t_s]$ and $\mathbf{w}_k^{i,j,*}[t_s]=\frac{\sqrt{p_k^{i,j*}[t_s]}}{\sqrt{\nu_k^{i,j}[t_s]}}\mathbf{h}_k^{i,j}[t_s],\ \forall k,(i,j)$.

Convergence and complexity analysis: The convergence of an IA-based algorithm is already provided in [40]. In particular, Algorithm 2 generates an improved solution after each iteration, which converges to at least a local optimal
solution of (13) when $n\to\infty$. The worst-case per-iteration complexity of Algorithm 2 is $\mathcal{O}\big(\sqrt{c}\,(v)^3\big)$ by the interior-point method [41, Chapter 6], where $c=KJ+J$ and $v=KJ$ are the numbers of linear constraints and scalar variables, respectively.

2) ZFBF-Based Transmission Design

To make ZFBF efficient and feasible, the number of antennas of each RU $(i,j)$ is required to be larger than the number of UEs, i.e. $M_{i,j}>K,\ \forall(i,j)\in\mathcal{J}$, to cancel the inter-user interference transmitted by this RU. In addition, the system bandwidth is equally allocated to each RU $(i,j)$, i.e. $W^{i,j}=W/J$, to completely remove the intra-user interference and interference caused by other RUs. Under the proposed ZFBF technique, beamformer $\mathbf{w}_k^{i,j}[t_s]$ at RU $(i,j)$ is designed to satisfy $(\mathbf{h}_{k'}^{i,j}[t_s])^{H}\mathbf{w}_k^{i,j}[t_s]=0,\ \forall k'\in\mathcal{K}\setminus\{k\}$. We denote by $\mathbf{H}_{-k}^{i,j}[t_s]\triangleq\big[\mathbf{h}_1^{i,j}[t_s]\cdots\mathbf{h}_{k-1}^{i,j}[t_s]\ \mathbf{h}_{k+1}^{i,j}[t_s]\cdots\mathbf{h}_K^{i,j}[t_s]\big]\in\mathbb{C}^{M_{i,j}\times(K-1)}$ the channel matrix from RU $(i,j)$ to UEs, except UE $k$. Let $\mathbf{V}_k^{i,j}[t_s]\in\mathbb{C}^{M_{i,j}\times(M_{i,j}-K+1)}$ be the null space of $(\mathbf{H}_{-k}^{i,j}[t_s])^{H}$. We can then write $\mathbf{w}_k^{i,j}[t_s]=\mathbf{V}_k^{i,j}[t_s]\tilde{\mathbf{w}}_k^{i,j}[t_s]$, where $\tilde{\mathbf{w}}_k^{i,j}[t_s]\in\mathbb{C}^{(M_{i,j}-K+1)\times 1},\ \forall k,(i,j)$ are the solutions to the ZFBF-based problem. By defining $\tilde{\nu}_k^{i,j}[t_s]\triangleq\|(\tilde{\mathbf{h}}_k^{i,j}[t_s])^{H}\|_2^2$ with $\tilde{\mathbf{h}}_k^{i,j}[t_s]\triangleq(\mathbf{h}_k^{i,j}[t_s])^{H}\mathbf{V}_k^{i,j}[t_s]\in\mathbb{C}^{1\times(M_{i,j}-K+1)}$, we can equivalently express $\tilde{\mathbf{w}}_k^{i,j}[t_s]$ as $\tilde{\mathbf{w}}_k^{i,j}[t_s]=\sqrt{\tilde{p}_k^{i,j}[t_s]}\frac{(\tilde{\mathbf{h}}_k^{i,j}[t_s])^{H}}{\sqrt{\tilde{\nu}_k^{i,j}[t_s]}}$, where $\tilde{\mathbf{p}}[t_s]\triangleq\big[\tilde{p}_k^{i,j}[t_s]\big]_{k,(i,j)\in\mathcal{P}_k}$ are the solutions to the following problem:

\[
\max_{\tilde{\mathbf{p}}[t_s]}\ \sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}r_k(\tilde{p}_k^{i,j}[t_s]) \tag{27a}
\]
\[
\text{s.t.}\ \bar{R}_k^{i,j}[t_s]\le r_k^{i,j}(\tilde{p}_k^{i,j}[t_s])\tau,\ \forall k,(i,j) \tag{27b}
\]
\[
\sum_{k\in\mathcal{K}}\tilde{p}_k^{i,j}[t_s]\le P_{\max}^{i,j},\ \forall(i,j) \tag{27c}
\]

where $r_k^{i,j}(\tilde{p}_k^{i,j}[t_s])\triangleq W^{i,j}\log_2\Big(1+\frac{\tilde{p}_k^{i,j}[t_s]\tilde{\nu}_k^{i,j}[t_s]}{N_0}\Big)$. The function $r_k^{i,j}(\tilde{p}_k^{i,j}[t_s])$ is concave in $\tilde{p}_k^{i,j}[t_s]$, leading to the convexity of problem (27). From (27b), one can show that $\tilde{p}_k^{i,j}[t_s]\ge\tilde{p}_{k,\min}^{i,j}[t_s]:=\frac{N_0}{\tilde{\nu}_k^{i,j}[t_s]}\Big(2^{\frac{\bar{R}_k^{i,j}[t_s]}{W^{i,j}\tau}}-1\Big)$. We now develop an efficient method to solve (27) by formulating the partial Lagrangian as

\[
\mathcal{L}(\tilde{\mathbf{p}}[t_s],\boldsymbol{\mu})=\sum_{k\in\mathcal{K}}\frac{\hat{q}_k[t_s]}{\tau}r_k(\tilde{p}_k^{i,j}[t_s])+\sum_{(i,j)\in\mathcal{J}}\mu_{i,j}\Big(P_{\max}^{i,j}-\sum_{k\in\mathcal{K}}\tilde{p}_k^{i,j}[t_s]\Big) \tag{28}
\]

where $\boldsymbol{\mu}\triangleq\{\mu_{i,j}\ge 0\}_{(i,j)\in\mathcal{J}}$ are the Lagrange multipliers of constraint (27c). The dual function can be written as $g(\boldsymbol{\mu})=\max_{\tilde{\mathbf{p}}[t_s]\ge 0}\{\mathcal{L}(\tilde{\mathbf{p}}[t_s],\boldsymbol{\mu})\mid\tilde{p}_k^{i,j}[t_s]\ge\tilde{p}_{k,\min}^{i,j}[t_s],\ \forall k,(i,j)\}$. We note that $\mathcal{L}(\tilde{\mathbf{p}}[t_s],\boldsymbol{\mu})$ is separable with respect to $\tilde{p}_k^{i,j}[t_s]$. Thus, by solving

\[
\tilde{p}_k^{i,j*}[t_s]=\operatorname*{argmax}_{\tilde{p}_k^{i,j}[t_s]\ge\tilde{p}_{k,\min}^{i,j}[t_s]}\Big\{\frac{\hat{q}_k[t_s]}{\tau}W^{i,j}\log_2\Big(1+\frac{\tilde{p}_k^{i,j}[t_s]\tilde{\nu}_k^{i,j}[t_s]}{N_0}\Big)-\mu_{i,j}\tilde{p}_k^{i,j}[t_s]\Big\} \tag{29}
\]

for a given $\mu_{i,j}$, the optimal solution to $\tilde{p}_k^{i,j}[t_s]$ is given as

\[
\tilde{p}_k^{i,j*}[t_s]=\max\Big\{\tilde{p}_{k,\min}^{i,j}[t_s],\ \frac{\hat{q}_k[t_s]W^{i,j}}{\tau\mu_{i,j}\ln 2}-\frac{N_0}{\tilde{\nu}_k^{i,j}[t_s]}\Big\}. \tag{30}
\]

The optimal Lagrange multiplier $\mu_{i,j}$ is efficiently found by applying a bisection search method between $\underline{\mu}_{i,j}=0$ and a sufficiently large $\overline{\mu}_{i,j}$. An efficient algorithm for solving (13) with ZFBF is summarized in Algorithm 3, which does not rely on existing convex optimization solvers.

Algorithm 3: Proposed Low-Complexity Algorithm for Solving (13) with ZFBF-Based Transmission Design
Initialization: Set $n:=1$ and generate initial values $\underline{\mu}_{i,j}=0$ and $\overline{\mu}_{i,j}=+\infty,\ \forall(i,j)\in\mathcal{J}$
1: for each RU $(i,j)\in\mathcal{J}$ in parallel do
2: repeat
3: Compute $\mu_{i,j}^{(n)}=(\underline{\mu}_{i,j}+\overline{\mu}_{i,j})/2$ and $\tilde{p}_k^{i,j(n)}[t_s]$ as in (30)
4: if $\sum_{k\in\mathcal{K}}\tilde{p}_k^{i,j(n)}[t_s]-P_{\max}^{i,j}\le 0$ then
5: Compute $\mu'_{i,j}=(\underline{\mu}_{i,j}+\overline{\mu}_{i,j})/2$ and update $\overline{\mu}_{i,j}:=\mu'_{i,j}$
6: else
7: Update $\mu'_{i,j}=(\underline{\mu}_{i,j}+\overline{\mu}_{i,j})/2$ and update $\underline{\mu}_{i,j}:=\mu'_{i,j}$
8: end if
9: Set $n:=n+1$
10: until $\overline{\mu}_{i,j}-\underline{\mu}_{i,j}\le\delta$ {/*Satisfying a given accuracy level*/}
11: end for
12: Output: $\mu_{i,j}^{*}=\mu_{i,j}^{(n)}$, $\tilde{p}_k^{i,j*}[t_s]=\max\Big\{\tilde{p}_{k,\min}^{i,j}[t_s],\ \frac{\hat{q}_k[t_s]W^{i,j}}{\tau\mu_{i,j}^{*}\ln 2}-\frac{N_0}{\tilde{\nu}_k^{i,j}[t_s]}\Big\}$ and $\mathbf{w}_k^{i,j,*}[t_s]=\frac{\sqrt{\tilde{p}_k^{i,j*}[t_s]}}{\sqrt{\tilde{\nu}_k^{i,j}[t_s]}}\mathbf{V}_k^{i,j}[t_s](\tilde{\mathbf{h}}_k^{i,j}[t_s])^{H},\ \forall k,(i,j)$.

Fig. 4: O-RAN Alliance reference architecture for implementing the proposed JFCS management scheme at time-frame t.

C. O-RAN-based Implementation of Algorithm 1

Fig. 4 illustrates the key steps for implementing the proposed JFCS management scheme at time-frame $t$ in the O-RAN architecture.
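The long-term part of this workflow, i.e. the three-step RL procedure in (15), (16) and (20) with the normalization (19), can be sketched for a single data-flow as follows. This is a minimal illustration under simplifying assumptions rather than the authors' code: exactly one path index `chosen` is played per time-frame, the fed-back utility `u_bar` is supplied by the caller, and all function and variable names are hypothetical. The step sizes and $\lambda=0.3$ in the usage lines follow Section VII; the feedback values are made up.

```python
import math


def softmax_plus(theta, lam):
    """Eq. (19): f_i = exp(lam * [theta_i]^+) / sum_j exp(lam * [theta_j]^+)."""
    weights = [math.exp(lam * max(th, 0.0)) for th in theta]
    total = sum(weights)
    return [w / total for w in weights]


def rl_step(u_hat, theta, beta, chosen, u_bar,
            eta_u, eta_theta, eta_beta, lam):
    """One time-frame of the three-step update for a single data-flow.

    u_hat, theta, beta are per-path lists; chosen is the index of the
    played path; u_bar is the utility fed back from the previous frame.
    """
    u_hat, theta = list(u_hat), list(theta)
    # (15): utility estimate, updated only for the played path.
    u_hat[chosen] += eta_u * (u_bar - u_hat[chosen])
    # (16): regret estimate, using the freshly updated utility estimate.
    theta[chosen] += eta_theta * (u_bar - u_hat[chosen] - theta[chosen])
    # (19)-(20): move beta towards the normalized best response.
    target = softmax_plus(theta, lam)
    beta = [b + eta_beta * (f - b) for b, f in zip(beta, target)]
    return u_hat, theta, beta


# Usage with the step-size schedule of Section VII (toy feedback values).
u_hat, theta, beta = [0.0] * 4, [0.0] * 4, [0.25] * 4
toy_utility = [1.0, 0.2, 0.2, 0.2]  # hypothetical per-path feedback
for t in range(2, 50):
    chosen = t % 4
    u_hat, theta, beta = rl_step(
        u_hat, theta, beta, chosen, toy_utility[chosen],
        eta_u=(t + 1) ** -0.51, eta_theta=(t + 1) ** -0.55,
        eta_beta=(t + 1) ** -0.6, lam=0.3)
```

Because (20) is a convex combination of two points of the unit simplex, `beta` stays a valid flow-split vector without any projection step, which is the property the procedure relies on.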
① At the beginning of time-frame $t>1$, the three-step RL procedure for solving L-SP is carried out at Non-RT RIC based on the collected RAN data in SMO. The collected data include performance/observation and resource updates from CU, DU, RU and Near-RT RIC to SMO. For $t=1$, the flow-split decisions are initialized as $\boldsymbol{\beta}_k[1]=\frac{1}{|\mathcal{P}_k^{(1)}|}[1,\cdots,1],\ \forall k$ where $\mathcal{P}_k^{(1)}$ is the set of RUs in the feasible communication range of UE $k$.
② The optimal flow-split decisions $\boldsymbol{\beta}^{*}[t]$ are sent to Near-RT RIC via A1 (the standardized open interface) for real deployment.
③ Near-RT RIC hosts xApps (i.e. third party applications) which communicate with CU/DU through the E2 interface. Given $\boldsymbol{\beta}^{*}[t]$, xApps deployed in Near-RT RIC control congestion and optimize RAN resources and functions in each time-slot $t_s$ of time-frame $t$ by solving S-SP1 and S-SP2 to obtain the optimal solutions of congestion control $\mathbf{a}^{*}[t_s]$ and beamformer $\mathbf{w}^{*}[t_s]$.
④ Subsequently, the RAN Data Analytic component in Near-RT RIC updates queue-lengths as in Step 6 of Algorithm 1. The updated queue-lengths are sent back to SMO through the O1 interface for periodic reporting.
⑤ Given $\boldsymbol{\beta}^{*}[t]$ and $\mathbf{w}^{*}[t_s]$, the optimal service rate $\mathbf{r}(\mathbf{w}^{*}[t_s])$ is scheduled and applied to CU and DUs through the E2 interface.
⑥ After $T_f$ time-slots in the short-term scale $t_s$, performance and observations (e.g. $\mathbf{q}[t-1]$, $\mathbf{A}[t-1]$) are updated to SMO through the O1 interface to re-estimate the flow-split decision $\boldsymbol{\beta}^{*}[t+1]$.

VI. PERFORMANCE ANALYSIS OF THE JFCS FRAMEWORK

In this section, we analyze the main theoretical performance results of Algorithm 1 and discuss their key insights, followed by concrete proofs of the theorems.

Assumption 2. To facilitate the analysis, we make the following additional assumptions.
• Under the limited transmit power budget at RUs, the achievable rate of UE $k$ is upper bounded by $r^{\max}>0$, i.e., $r_k(\mathbf{w}[t_s])\le r^{\max},\ \forall k,t_s$.
• The congestion control variable $a_k[t_s]$ satisfies the condition $\mathbb{E}\{a_k^2[t_s]\}\le A_1^{\max}$, where $A_1^{\max}$ is a sufficiently large positive constant [34].

Theorem 1 (Bounding the mean divergence of the auxiliary queue-length). For a given scaling factor $\varphi$, let $\hat{\mathbf{q}}_{(\varphi)}^{\infty}$ and $\hat{\mathbf{q}}_{(\varphi)}^{*}$ be the steady-state and optimal queue-lengths, respectively. From Assumptions 1 and 2, the expected upper bound of the divergence of $\hat{\mathbf{q}}_{(\varphi)}^{\infty}$ from $\hat{\mathbf{q}}_{(\varphi)}^{*}$ is given as

\[
\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}-\hat{\mathbf{q}}_{(\varphi)}^{*}\|_2\}\le C_1\sqrt{\varphi}=\mathcal{O}(\sqrt{\varphi}) \tag{31}
\]

where $C_1\triangleq\sqrt{\frac{K\tau^2\Psi}{2}\big(A_1^{\max}+(r^{\max})^2\big)}$ is a positive constant.

The proof is detailed in Appendix B. Theorem 1 implies that the divergence of the steady-state queue-length is bounded by $\mathcal{O}(\sqrt{\varphi})$. In particular, the smaller the value of $\varphi$, the less the divergence of $\hat{\mathbf{q}}_{(\varphi)}^{\infty}$. However, a small $\varphi$ will also result in a small congestion control rate and a faster convergence. When $\varphi$ is large, a better congestion control rate is achieved but at the cost of larger steady-state queue-length divergence (i.e. larger delay and slower convergence). Hence, there exists an appropriate value of $\varphi$ to make this tradeoff more efficient. Theorem 1 will immediately lead to the following result.

Corollary 1 (Queue-stability). Given a scaling factor $\varphi$ and $C_1$ in (31), the steady-state total queue-length remains finite and scales as $\mathcal{O}(\varphi)+\mathcal{O}(\sqrt{\varphi})$, i.e.

\[
\limsup_{t_s\to\infty}\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}[t_s]\|_1\}\le\tau\Psi KA^{\max}\varphi+\sqrt{K}C_1\sqrt{\varphi}=\mathcal{O}(\varphi)+\mathcal{O}(\sqrt{\varphi}). \tag{32}
\]

Proof. The proof of (32) is straightforward by noticing that, by the triangle inequality, $\limsup_{t_s\to\infty}\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}[t_s]\|_1\}=\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}\|_1\}\le\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}-\hat{\mathbf{q}}_{(\varphi)}^{*}\|_1+\|\hat{\mathbf{q}}_{(\varphi)}^{*}\|_1\}$. Applying the inequality $\|\mathbf{x}\|_1\le\sqrt{K}\|\mathbf{x}\|_2$ for any $\mathbf{x}\in\mathbb{R}_{+}^{K}$ yields: $\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}-\hat{\mathbf{q}}_{(\varphi)}^{*}\|_1+\|\hat{\mathbf{q}}_{(\varphi)}^{*}\|_1\le\sqrt{K}\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}-\hat{\mathbf{q}}_{(\varphi)}^{*}\|_2+\|\hat{\mathbf{q}}_{(\varphi)}^{*}\|_1$. From (31) and Step 4 of Algorithm 1, it follows that

\[
\begin{aligned}
\limsup_{t_s\to\infty}\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}[t_s]\|_1\}
&\le\sqrt{K}\,\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}-\hat{\mathbf{q}}_{(\varphi)}^{*}\|_2\}+\|\hat{\mathbf{q}}_{(\varphi)}^{*}\|_1\\
&\le\sqrt{K}C_1\sqrt{\varphi}+\tau\sum_{k\in\mathcal{K}}U_k'\big(a_k^{*}\big)\varphi\ \le\ \sqrt{K}C_1\sqrt{\varphi}+\tau\Psi\sum_{k\in\mathcal{K}}a_k^{*}\varphi\\
&\le\sqrt{K}C_1\sqrt{\varphi}+\tau\Psi KA^{\max}\varphi\quad(\text{due to } a_k^{*}\le A^{\max},\ \forall k)
\end{aligned} \tag{33}
\]

showing (32).

Let $\mathbf{a}_{(\varphi)}^{\infty}\triangleq[a_{(\varphi),k}^{\infty}]_{k\in\mathcal{K}}^{T}$ with $a_{(\varphi),k}^{\infty}=\mathbb{E}\Big\{\min\Big\{A^{\max},U_k'^{-1}\Big(\frac{\hat{q}_{(\varphi),k}}{\varphi\tau}\Big)\Big\}\Big\}$ be the mean steady-state congestion control rate vector. We also denote by $U(\mathbf{a})\triangleq\sum_{k\in\mathcal{K}}U_k(a_k)$ the total utility function of problem (7). The utility-optimality of Algorithm 1 is stated by the following theorem, whose proof is given in Appendix C.

Theorem 2 (Optimality). Given a scaling factor $\varphi$, Algorithm 1 produces the mean steady-state congestion control rate vector $\mathbf{a}_{(\varphi)}^{\infty}$, satisfying

\[
\|\mathbf{a}_{(\varphi)}^{\infty}-\mathbf{a}^{*}\|_2\le C_2\frac{1}{\sqrt{\varphi}}=\mathcal{O}(1/\sqrt{\varphi}) \tag{34}
\]

where $C_2\triangleq\frac{C_1}{\psi\tau}=\sqrt{\frac{K\Psi}{2\psi^2}\big(A_1^{\max}+(r^{\max})^2\big)}$. Therefore, the optimal network utility maximization is bounded as

\[
U(\mathbf{a}^{*})-C_3\frac{1}{\varphi}=U(\mathbf{a}^{*})-\mathcal{O}(1/\varphi)\le U(\mathbf{a}_{(\varphi)}^{\infty}) \tag{35}
\]

where $C_3\triangleq\frac{\Psi C_1^2}{2\psi^2\tau^2}=\frac{K\Psi^2}{4\psi^2}\big(A_1^{\max}+(r^{\max})^2\big)$.

The analytical results in Theorem 2 show that the divergence of the steady-state congestion control rate vector $\mathbf{a}_{(\varphi)}^{\infty}$ from $\mathbf{a}^{*}$ scales as $\mathcal{O}(1/\sqrt{\varphi})$, which is the same as in [34], [42]. The utility-optimality gap can be reduced by increasing $\varphi$, but this will also lead to a larger steady-state queue-length divergence.

VII. NUMERICAL RESULTS

In this section, we first present simulation setup and parameters in Section VII-A and then provide numerical results of
Algorithm 1 in Section VII-B. The results and performance comparison over existing schemes will be provided in Section VII-C.

A. Simulation Setups and Parameters

TABLE I: Simulation Parameters
Parameter | Value
System bandwidth, $W$ | 20 MHz
Number of RUs, $J$ | 8
Number of UEs, $K$ | 12
Number of antennas at RUs $(i,j)$, $M_{i,j}\equiv M$ | 16
RUs' height | 10 m
UEs' antenna altitude | 1.5 m
Power budget at RU $(i,j)$, $P_{\max}^{i,j}\equiv P_{\max}$ | 43 dBm
Noise figure, NF | 9 dB
Maximum average delay, $\bar{d}_k\equiv\bar{d}$ | 10 ms
Required communication reliability, $\epsilon_k\equiv\epsilon$ | 0.95
Number of frames, $T$ | 10000
Number of time-slots per frame, $T_f$ | 10
Duration of one frame, $T_c$ | 10 ms
Duration of one time-slot, $\tau$ | 1 ms
Trade-off factor (Boltzmann temperature), $\lambda$ | 0.3

[Figure: positions of the RUs of DU 1, the RUs of DU 2 and the UEs; axes: x-coordinate (m) and y-coordinate (m), each from -1000 to 1000.]
Fig. 5: A system topology with J = 8 RUs and K = 12 UEs.

We consider a system topology given in Fig. 5, including 8 RUs and 12 UEs located within a circle of 1-km radius. There are two DUs, each connected to 4 RUs. RUs are uniformly distributed in the area, while UEs are randomly located in each time-frame $t$. The large-scale fading coefficient $\xi[t]\in\{\xi_k^{i,j}[t]\}_{\forall(i,j),k}$ is modeled as the three-slope path loss model [32], such as $\xi[t]=\xi_0-35\log_{10}(d[t])+20c_0\log_{10}(d/d_0)+15c_1\log_{10}(d/d_1)$ where $\xi_0=-140.7+\mathrm{SF}$ dB, $d_0=10$ m, $d_1=50$ m, and $d$ is the distance between an RU and a UE; here $c_i=\max\{0,\frac{d_i-d}{|d_i-d|}\}$ with $i\in\{0,1\}$ and $\mathrm{SF}\sim\mathcal{CN}(0,\sigma_{\mathrm{SF}})$ denotes the shadowing factor with $\sigma_{\mathrm{SF}}=8$ dB. The Rician factor $\kappa[t]\in\{\kappa_k^{i,j}[t]\}_{\forall(i,j),k}$ is given as $\kappa=P_{\mathrm{LoS}}(d[t])/\big(1-P_{\mathrm{LoS}}(d[t])\big)$, where the LoS probability follows the 3GPP–UMa model as $P_{\mathrm{LoS}}(d[t])=\min\big(\frac{18}{d[t]},1\big)\big(1-\exp(-\frac{d[t]}{36})\big)+\exp(-\frac{d[t]}{36})$ [43]. We consider uniform linear arrays with half-wavelength distances between array elements to model the LoS channels at RUs. The array response vector is generated as $\bar{\mathbf{h}}_k^{i,j}[t]=\mathbf{a}(\phi_k^{i,j}[t])$, where each element $m$ is given as $[\mathbf{a}(\phi_k^{i,j}[t])]_m=\exp\big(j\pi(m-1)\sin\phi_k^{i,j}[t]\big)$ with $\phi_k^{i,j}[t]\in[-\pi/2,\pi/2)$ being the angle-of-departure (AoD) at RU $(i,j)$. The noise power is modeled as $N_0=-170+10\log_{10}(W)+\mathrm{NF}$ dBm, where $\mathrm{NF}=9$ dB denotes the noise figure.

We run Algorithm 1 over $T=10000$ frames, each consisting of $T_f=10$ time-slots (subframes) and having a duration of $T_c=10$ ms, following the 5G NR frame structure [44]. In each time-frame $t$, UE $k$ is served by a subset of four RUs. To illustrate the heterogeneity of UEs, we assume that the arrival rate $A_k[t]$ is uniformly distributed in $[1,3]$ Gbps. The step sizes (learning rates) are set to decrease after each frame as $\eta_u[t]=1/(t+1)^{0.51}$, $\eta_\theta[t]=1/(t+1)^{0.55}$ and $\eta_\beta[t]=1/(t+1)^{0.6}$ [45]. We adopt the proportional fairness metric to model the utility function as: $U_k(r_k)=\log(0.001+r_k),\ \forall k$ [46]. The key parameters are summarized in Table I for ease of cross-referencing, following the studies in [32], [38], [43]–[45]. In the following figures, results are averaged over the last 6000 frames.

Benchmark schemes: To demonstrate the benefits of the proposed JFCS algorithm, we consider the following three benchmark schemes:
• "NUM with fixed resource allocation (NUM-FRA)" [47]: Under Algorithm 1, RUs allocate power equally to UEs.
• "NUM with equal flow-split distribution (NUM-EFSD)": CU splits data-flows of all UEs equally among the selected paths, i.e., $\beta_k^{i,j}[t]=1/|\mathcal{P}_k|,\ \forall(i,j)\in\mathcal{P}_k$.
• "NUM with the nearest RU selection (NUM-NRU)": Under Algorithm 1, each UE $k$ selects only the nearest RU for the data transmission, i.e. $\beta_k^{i,j}[t]=1$ if RU $(i,j)$ is the nearest RU to UE $k$.

B. Numerical Results of Algorithm 1

We first study the impacts of $\varphi$ and $\lambda$ on the convergence behavior of Algorithm 1 in Fig. 6. From Fig. 6(a), it can be observed that the congestion control rates for different values of the scaling factor $\varphi$ converge to the same optimal solution, and $\|\mathbf{a}[t_s]\|$ is almost independent of $\varphi$. In addition, increasing $\varphi$ results in a smaller divergence of the steady-state congestion control rate (see Theorem 2), but also slows down the convergence rate of Algorithm 1. The reason is attributed to the fact that for a large $\varphi$, the network utility function $\sum_{k\in\mathcal{K}}U_k(a_k[t_s])$ in (8a) will prevail over the Lyapunov drift function $\Delta L[t_s]$, which requires more iterations to guarantee network stability. In Fig. 6(b), we increase the trade-off factor $\lambda$ (i.e. Boltzmann temperature) from 0.05 to 0.7. The result shows that the larger the value of $\lambda$, the better the estimated utility that can be achieved, at the cost of lower convergence speed of the RL process. From (18), the paths associated with the highest estimated regret $\hat{\theta}_k^{i,j}[t]$ will be selected to minimize the best response function $f(\hat{\boldsymbol{\theta}}[t])$. Conversely, a low value of $\lambda$ can speed up convergence by allocating traffic data uniformly to all paths but leads to a very sub-optimal solution.

In Fig. 7, we evaluate the performance of Algorithm 1 with
different transmission strategies, namely MRT and ZFBF. For a fixed $\varphi=25$, we vary the number of antennas at RUs $M\equiv M_{i,j},\ \forall(i,j)$ from 16 to 128. For each transmission design, we also plot the steady-state congestion control rate $\mathbb{E}\{\|\mathbf{a}_{(25)}^{\infty}\|\}$ with the equal flow-split distribution. It can be seen from Fig. 7 that the steady-state congestion control rate of all schemes increases as $M$ increases. Unsurprisingly, Algorithm 1 with ZFBF offers better performance in terms of congestion control rate than that of MRT when the number of antennas at RUs is sufficiently large to cancel the inter-user interference transmitted by the same RU. The higher the effective data rate of a data-flow in the downlink, the lower the total queue-length of that data-flow (or user), resulting in a higher congestion control rate.

Since Algorithm 1 with MRT is based on the IA method that requires high computation complexity and relies on existing convex optimization solvers, we provide only the performance of Algorithm 1 with ZFBF in the following section.

[Figure, panel (a) Impact of φ on congestion control rate: congestion control rate $\|\mathbf{a}[t_s]\|$ versus iteration (× 10000) for φ = 10, 25, 50, 100.]
[Figure, panel (b) Impact of λ on estimated utility: estimated utility $\sum_{k\in\mathcal{K}}L_k[t]$ versus iteration (× 1000) for λ = 0.05, 0.1, 0.3, 0.5, 0.7.]
Fig. 6: Convergence behavior of Algorithm 1 with ZFBF.

[Figure: steady-state congestion control rate $\mathbb{E}\{\|\mathbf{a}_{(25)}^{\infty}\|\}$ versus number of antennas at RUs, M = 16, 32, 64, 128, for Alg. 1 - ZFBF, Alg. 1 - ZFBF with EFSD, Alg. 1 - MRT and Alg. 1 - MRT with EFSD.]
Fig. 7: Performance of Algorithm 1 with different transmission strategies versus the number of antennas at RUs, $M\equiv M_{i,j},\ \forall(i,j)$.

[Figure: steady-state congestion control rate $\mathbb{E}\{\|\mathbf{a}_{(25)}^{\infty}\|\}$ versus number of antennas at RUs, M = 16, 32, 64, 128, for Algorithm 1 - ZFBF, NUM-EFSD, NUM-FRA and NUM-NRU.]
Fig. 8: The steady-state congestion control rate with respect to the number of antennas at RUs, $M\equiv M_{i,j},\ \forall(i,j)$.

C. Performance Comparison

Next, we show the performance comparison in terms of the steady-state congestion control rate $\mathbb{E}\{\|\mathbf{a}_{(25)}^{\infty}\|\}$ among the considered schemes versus the number of antennas at RUs in Fig. 8. We fix $\varphi=25$ and vary $M$ from 16 to 128 to investigate the impact of the physical factor. As $M$ increases, the downlink instantaneous achievable rates of all UEs also significantly increase since more degrees of freedom are added to leverage multi-user diversity, resulting in lower queue-lengths. For a fixed value of $\varphi$, the steady-state congestion control rate vector increases monotonically with $M$. Clearly, Algorithm 1 outperforms the benchmark schemes in all ranges of $M$, and the gap is larger when $M$ is small. In addition, the NUM-FRA and NUM-NRU, which equally allocate the power budget and fix the path selection to UEs, respectively, provide the worst performance. These observations demonstrate the effectiveness of the proposed Algorithm 1 by jointly optimizing the flow-split distribution, congestion control, scheduling and radio resource allocation.

Lastly, the impacts of scaling factor $\varphi$ on the steady-state total queue-length $\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}\|_1\}$ and average worst-case delay (i.e., the delay of the slowest data-flow) are plotted in Figs. 9 and 10, respectively. It can be seen from Fig. 9 that the steady-state total queue-length of all schemes monotonically scales
as $\mathcal{O}(\varphi)+\mathcal{O}(\sqrt{\varphi})$, which confirms our theoretical results in Corollary 1. We recall from Theorem 2 that the utility-optimality gap can be narrowed by increasing $\varphi$, but at the cost of higher delay, as shown in Fig. 10. When $\varphi$ is larger than 25, all the considered schemes violate the maximum allowable average delay of $\bar{d}=10$ ms. It implies that the data traffic cannot be completely transmitted to UEs in each time-frame. Nevertheless, Algorithm 1 still provides the best performance out of the schemes considered.

[Figure: steady-state queue-length $\mathbb{E}\{\|\hat{\mathbf{q}}_{(\varphi)}^{\infty}\|_1\}$ [Gb] versus scaling factor φ from 25 to 200, for Algorithm 1 - ZFBF, NUM-EFSD, NUM-FRA and NUM-NRU.]
Fig. 9: The steady-state total queue-length with respect to φ.

[Figure: average worst-case delay [ms] versus scaling factor φ from 5 to 30, for Algorithm 1 - ZFBF, NUM-EFSD, NUM-FRA and NUM-NRU.]
Fig. 10: Average worst-case delay with respect to φ.

VIII. CONCLUSION

We have proposed a new holistic multi-layer optimization framework, called JFCS, to enable intelligent traffic steering in a hierarchical O-RAN architecture. In particular, we have developed an intelligent resource management algorithm based on network utility maximization and stochastic optimization to efficiently and adaptively direct traffic to appropriate RUs by jointly optimizing the flow-split distribution, congestion control and scheduling. JFCS is proved to achieve fast convergence, long-term utility-optimality and significant delay reduction compared to state-of-the-art approaches. To that end, the insights in this work will foster future studies in this area, especially in the design of more advanced AI/ML solutions to achieve enhanced control and flexibility in O-RAN.

APPENDIX A: DERIVATION OF INEQUALITY

We will find the concave lower bound of $r_k^{i,j}[t_s]$. By [48, Appendix A], it is true that the function $r(x,y)=-\ln(1-x^2/y)$ is convex in the domain $y>x^2$ with $x,y\in\mathbb{R}_{+}$. The global concave lower bound of $r(x,y)$ at the feasible point $(\bar{x},\bar{y})$ is given as

\[
\begin{aligned}
r(x,y)&\ge r(\bar{x},\bar{y})+\Big\langle\Big(\frac{\partial r(\bar{x},\bar{y})}{\partial\bar{x}},\frac{\partial r(\bar{x},\bar{y})}{\partial\bar{y}}\Big),(x-\bar{x},y-\bar{y})\Big\rangle\\
&=r(\bar{x},\bar{y})-\frac{\bar{x}^2}{\bar{y}-\bar{x}^2}+2\frac{\bar{x}x}{\bar{y}-\bar{x}^2}-\frac{\bar{x}^2y}{(\bar{y}-\bar{x}^2)\bar{y}}
\end{aligned} \tag{A.1}
\]

by applying the first-order Taylor approximation. By the fact that $\ln\big(1+\frac{x^2}{z}\big)=-\ln\big(1-\frac{x^2}{z+x^2}\big)$ and substituting $y=z+x^2$, $\bar{y}=\bar{z}+\bar{x}^2$, $x=\sqrt{v}$ and $\bar{x}=\sqrt{\bar{v}}$ into (A.1), we obtain

\[
r(v,z)\triangleq\ln\Big(1+\frac{v}{z}\Big)\ge r(\bar{v},\bar{z})-\frac{\bar{v}}{\bar{z}}+2\frac{\sqrt{v}\sqrt{\bar{v}}}{\bar{z}}-\frac{\bar{v}(z+v)}{\bar{z}(\bar{z}+\bar{v})}:=\bar{r}(v,z;\bar{v},\bar{z}) \tag{A.2}
\]

where $\bar{r}(v,z;\bar{v},\bar{z})$ is concave and $\bar{r}(\bar{v},\bar{z};\bar{v},\bar{z})=r(\bar{v},\bar{z})$ whenever $(v,z)=(\bar{v},\bar{z})$.

APPENDIX B: PROOF OF THEOREM 1

For a given $\varphi$, the quadratic Lyapunov function defined in Section IV-A is rewritten with respect to $\hat{\mathbf{q}}_{(\varphi)}[t_s]$ as: $L(\hat{\mathbf{q}}_{(\varphi)}[t_s])=\frac{1}{2\tau^2}\|\hat{\mathbf{q}}_{(\varphi)}[t_s]-\hat{\mathbf{q}}_{(\varphi)}^{*}\|_2^2$. Following [34, Theorem 3], the mean Lyapunov drift from time-slot $t_s$ to $t_{s+1}$ is computed as

\[
\begin{aligned}
\Delta\bar{L}(\hat{\mathbf{q}}_{(\varphi)}[t_s])
&=\mathbb{E}\{\Delta L(\hat{\mathbf{q}}_{(\varphi)}[t_s])\}=\mathbb{E}\{L(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}])-L(\hat{\mathbf{q}}_{(\varphi)}[t_s])\}\\
&=\frac{1}{2\tau^2}\mathbb{E}\Big\{\big(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]+\hat{\mathbf{q}}_{(\varphi)}[t_s]-2\hat{\mathbf{q}}_{(\varphi)}^{*}\big)^{T}\big(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]-\hat{\mathbf{q}}_{(\varphi)}[t_s]\big)\Big\}\\
&\le\frac{1}{2\tau}\mathbb{E}\Big\{\big(2\hat{\mathbf{q}}_{(\varphi)}[t_s]+\big(\mathbf{a}[t_s]-\mathbf{r}(\mathbf{w}[t_s])\big)\tau-2\hat{\mathbf{q}}_{(\varphi)}^{*}\big)^{T}\big(\mathbf{a}[t_s]-\mathbf{r}(\mathbf{w}[t_s])\big)\Big\}\\
&=\underbrace{\frac{1}{2}\mathbb{E}\{\|\mathbf{a}[t_s]-\mathbf{r}(\mathbf{w}[t_s])\|_2^2\}}_{\triangleq B_1}+\underbrace{\frac{1}{\tau}\mathbb{E}\big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s]-\hat{\mathbf{q}}_{(\varphi)}^{*})^{T}\big(\mathbf{a}[t_s]-\mathbf{r}(\mathbf{w}[t_s])\big)\big\}}_{\triangleq B_2}
\end{aligned} \tag{B.1}
\]

by using the inequality $([x]^{+})^2\le x^2$, the identity $x^2-y^2=(x+y)(x-y)$, and the fact that $\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]-\hat{\mathbf{q}}_{(\varphi)}^{*}=\hat{\mathbf{q}}_{(\varphi)}[t_s]-\hat{\mathbf{q}}_{(\varphi)}^{*}+\big(\mathbf{a}[t_s]-\mathbf{r}(\mathbf{w}[t_s])\big)\tau$.

We first focus on providing the expected bound of $B_1$ as

\[
\begin{aligned}
B_1&=\frac{1}{2}\mathbb{E}\{\|\mathbf{a}[t_s]\|_2^2-2\mathbf{a}[t_s]^{T}\mathbf{r}(\mathbf{w}[t_s])+\|\mathbf{r}(\mathbf{w}[t_s])\|_2^2\}\\
&\le\frac{1}{2}\mathbb{E}\{\|\mathbf{a}[t_s]\|_2^2+\|\mathbf{r}(\mathbf{w}[t_s])\|_2^2\}\\
&\le\frac{K}{2}\big(A_1^{\max}+(r^{\max})^2\big)\triangleq B_1^{\mathrm{UB}}
\end{aligned} \tag{B.2}
\]
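where the last inequality follows from Assumption 2. To bound $B_2$, we first rewrite it equivalently as

$$
B_2 = \frac{1}{\tau} (\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbb{E}\{\mathbf{a}[t_s]\} - \mathbf{r}^*\big) + \frac{1}{\tau} \mathbb{E}\big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\big)\big\}.
\tag{B.3}
$$

From (7), it follows that $(\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbb{E}\{\mathbf{a}[t_s]\} - \mathbf{r}^*\big) \leq 0$. By applying the Cauchy–Schwarz inequality, i.e. $|\mathbf{x}^T \mathbf{y}| \leq \|\mathbf{x}\|_2 \|\mathbf{y}\|_2$, to the first term in (B.3), we have

$$
\frac{1}{\tau} (\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)})^T \big(\mathbb{E}\{\mathbf{a}[t_s]\} - \mathbf{r}^*\big) \leq -\frac{1}{\tau} \sum_{k \in \mathcal{K}} |\hat{q}_{(\varphi),k}[t_s] - \hat{q}^*_{(\varphi),k}| \, |a_k[t_s] - r_k^*|.
\tag{B.4}
$$

By the $\Psi$-smoothness in Assumption 1 and Step 4 of Algorithm 1, it is true that

$$
a_k[t_s] - r_k^* = {U'_k}^{-1}\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau}\Big) - {U'_k}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \leq 0
$$

and

$$
U'_k\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau}\Big) - U'_k\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \leq \Psi \Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big).
$$

In addition, we have

$$
{U'_k}^{-1}\Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau}\Big) - {U'_k}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \geq \frac{1}{\Psi} \Big(\frac{\hat{q}_{(\varphi),k}[t_s]}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big)
$$

due to the inverse function lemma. From the fact that $(\hat{\mathbf{q}}^*_{(\varphi)})^T \mathbf{r}^* - (\hat{\mathbf{q}}^*_{(\varphi)})^T \mathbf{r}(\mathbf{w}[t_s]) \geq 0$, we can further bound $B_2$ as

$$
B_2 \leq -\frac{1}{\tau^2 \Psi \varphi} \|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2 + \frac{1}{\tau} \mathbb{E}\big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s])^T \big(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\big)\big\}
\tag{B.5}
$$

where the term $\mathbb{E}\{(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s]))\}$ is a constant with respect to $\hat{\mathbf{q}}_{(\varphi)}[t_s]$. Substituting (B.2) and (B.5) into (B.1) yields

$$
\Delta\bar{L}(\hat{\mathbf{q}}_{(\varphi)}[t_s]) \leq -\frac{1}{\tau^2 \Psi \varphi} \|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2 + B_1^{\mathrm{UB}} + \frac{1}{\tau} \mathbb{E}\big\{(\hat{\mathbf{q}}_{(\varphi)}[t_s])^T \big(\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\big)\big\}.
\tag{B.6}
$$

We now compute the mean Lyapunov drift over $T T_f$ time-slots as

$$
\begin{aligned}
\Delta\bar{L} &= \sum_{t=1}^{T} \sum_{s=1}^{T_f} \mathbb{E}\{L(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]) - L(\hat{\mathbf{q}}_{(\varphi)}[t_s]) \,|\, \hat{\mathbf{q}}_{(\varphi)}[1_1]\} \\
&= \sum_{t=1}^{T} \sum_{s=1}^{T_f} \sum_{\hat{\mathbf{q}}_{(\varphi)} \geq 0} \Big( \mathrm{Prob}\big(\hat{\mathbf{q}}_{(\varphi)}[t_s] = \hat{\mathbf{q}}_{(\varphi)} \,|\, \hat{\mathbf{q}}_{(\varphi)}[1_1]\big) \\
&\qquad \times \mathbb{E}\{L(\hat{\mathbf{q}}_{(\varphi)}[t_{s+1}]) - L(\hat{\mathbf{q}}_{(\varphi)}[t_s]) \,|\, \hat{\mathbf{q}}_{(\varphi)}[t_s] = \hat{\mathbf{q}}_{(\varphi)}\} \Big).
\end{aligned}
\tag{B.7}
$$

Let us denote by $\rho^{\infty}_{\hat{\mathbf{q}}_{(\varphi)}}$ the stationary distribution of the Markov chain $\hat{\mathbf{q}}_{(\varphi)}[t_s] \geq 0$, i.e. $\rho^{\infty}_{\hat{\mathbf{q}}_{(\varphi)}} = \lim_{T \to \infty} \frac{1}{T T_f} \sum_{t=1}^{T} \sum_{s=1}^{T_f} \mathrm{Prob}\big(\hat{\mathbf{q}}_{(\varphi)}[t_s] = \hat{\mathbf{q}}_{(\varphi)} \,|\, \hat{\mathbf{q}}_{(\varphi)}[1_1]\big)$. By substituting (B.6) into (B.7) and dividing both sides by $T T_f$, we have

$$
\begin{aligned}
&\sum_{\hat{\mathbf{q}}_{(\varphi)} \geq 0} \rho^{\infty}_{\hat{\mathbf{q}}_{(\varphi)}} \Big( -\frac{1}{\tau^2 \Psi \varphi} \|\hat{\mathbf{q}}_{(\varphi)}[t_s] - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2 + B_1^{\mathrm{UB}} + \frac{1}{\tau} (\hat{\mathbf{q}}_{(\varphi)}[t_s])^T \mathbb{E}\{\mathbf{r}^* - \mathbf{r}(\mathbf{w}[t_s])\} \Big) \\
&= -\frac{1}{\tau^2 \Psi \varphi} \mathbb{E}\{\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2\} + B_1^{\mathrm{UB}} + \frac{1}{\tau} \mathbb{E}\{(\hat{\mathbf{q}}^{\infty}_{(\varphi)})^T (\mathbf{r}^* - \mathbf{r}^{\infty})\} \geq 0
\end{aligned}
\tag{B.8}
$$

where $\mathbf{r}^{\infty} = \underset{r_k(\mathbf{w}) \in \mathrm{CH}^{[\infty]}, \forall k \in \mathcal{K}}{\operatorname{argmax}} \sum_{k \in \mathcal{K}} \hat{q}_k^{\infty} r_k(\mathbf{w})$. We note here that $(\hat{\mathbf{q}}^{\infty}_{(\varphi)})^T \mathbf{r}^{\infty} = \max_{r_k(\mathbf{w}) \in \mathrm{CH}^{[\infty]}, \forall k \in \mathcal{K}} \sum_{k \in \mathcal{K}} \hat{q}_k^{\infty} r_k(\mathbf{w}) \geq (\hat{\mathbf{q}}^{\infty}_{(\varphi)})^T \mathbf{r}^*$, yielding

$$
\frac{1}{\tau^2 \Psi \varphi} \mathbb{E}\{\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2^2\} - B_1^{\mathrm{UB}} \leq 0.
\tag{B.9}
$$

This implies that $\mathbb{E}\{\|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2\} \leq \sqrt{\frac{K \tau^2 \Psi}{2} \big(A_1^{\max} + (r^{\max})^2\big) \varphi}$, where $B_1^{\mathrm{UB}} = \frac{K}{2} \big(A_1^{\max} + (r^{\max})^2\big)$, showing the inequality (31) in Theorem 1.

APPENDIX C: PROOF OF THEOREM 2

To prove (34), we first recall that

$$
a^{\infty}_{(\varphi),k} - a^*_k = {U'_k}^{-1}\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau}\Big) - {U'_k}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big)
$$

and

$$
U'_k\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau}\Big) - U'_k\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \geq \psi \Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big)
$$

using Assumption 1. By the inverse function lemma, we have

$$
{U'_k}^{-1}\Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau}\Big) - {U'_k}^{-1}\Big(\frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big) \leq \frac{1}{\psi} \Big(\frac{\hat{q}^{\infty}_{(\varphi),k}}{\varphi\tau} - \frac{\hat{q}^*_{(\varphi),k}}{\varphi\tau}\Big),
$$

which yields

$$
\|\mathbf{a}^{\infty}_{(\varphi)} - \mathbf{a}^*\|_2 \leq \frac{1}{\psi \tau \varphi} \|\hat{\mathbf{q}}^{\infty}_{(\varphi)} - \hat{\mathbf{q}}^*_{(\varphi)}\|_2 \overset{(31)}{\leq} \frac{C_1}{\psi \tau} \frac{1}{\sqrt{\varphi}}.
\tag{C.1}
$$

Next, it is assumed that $U_k(\cdot)$ is twice continuously differentiable, increasing, and strictly concave. If the utility function $U(\mathbf{a})$ has a maximizer $\mathbf{a}^*$, then

$$
U(\mathbf{a}^*) - U(\mathbf{a}^{\infty}_{(\varphi)}) \leq \frac{\Psi}{2} \|\mathbf{a}^* - \mathbf{a}^{\infty}_{(\varphi)}\|_2^2 \leq \frac{\Psi C_1^2}{2 \psi^2 \tau^2} \frac{1}{\varphi}
\tag{C.2}
$$

where the last inequality follows from (C.1). The proof is thus complete.

REFERENCES

[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol. 34, no. 3, pp. 134–142, 2020.
[2] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Commun. Mag., vol. 57, no. 8, pp. 84–90, 2019.
[3] O-RAN Alliance, “ORAN-WG1 O-RAN architecture description v01.00.00.” Technical Specification, Feb. 2020.
[4] SAMSUNG, “ORAN - The Open Road to 5G,” White Paper, July 2019. [Online]. Available: https://www.samsung.com/global/business/networks/insights/whitepapers/ORAN-the-open-road-to-5g/
[5] O-RAN Alliance, “ORAN Working Group 2: AI/ML workflow description and requirements,” Tech. Rep., Mar. 2019.
[6] L. Bonati, S. D’Oro, M. Polese, S. Basagni, and T. Melodia, “Intelligence and learning in O-RAN for data-driven nextG cellular networks,” IEEE Commun. Mag., vol. 59, no. 10, pp. 21–27, 2021.
[7] L. Gavrilovska, V. Rakovic, and D. Denkovski, “From cloud RAN to ORAN,” Wireless Personal Communications, Mar. 2020.
[8] J. Wang, H. Roy, and C. Kelly, “OpenRAN: The next generation of radio access networks,” Accenture Strategy, Tech. Rep., Nov. 2019. [Online]. Available: https://telecominfraproject.com/openran/
[9] H. Kumar, V. Sapru, and S. K. Jaisawal, “O-RAN based proactive ANR optimization,” in IEEE Global Commun. Conf. Workshops (IEEE GLOBECOM Wkshps.), 2020, pp. 1–4.
[10] T. Pamuklu, M. Erol-Kantarci, and C. Ersoy, “Reinforcement learning based dynamic function splitting in disaggregated green open RANs,” in IEEE Inter. Conf. Commun. (IEEE ICC 2021), 2021, pp. 1–6.
[11] H. Lee, J. Cha, D. Kwon, M. Jeong, and I. Park, “Hosting AI/ML workflows on O-RAN RIC platform,” in IEEE Global Commun. Conf. Workshops (IEEE GLOBECOM Wkshps.), 2020, pp. 1–6.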
[12] J. A. Ayala-Romero, A. Garcia-Saavedra, X. Costa-Perez, and G. Iosifidis, “Bayesian online learning for energy-aware resource orchestration in virtualized RANs,” in IEEE Conf. Comput. Commun. (IEEE INFOCOM), 2021, pp. 1–10.
[13] Y. Cao, S.-Y. Lien, Y.-C. Liang, K.-C. Chen, and X. Shen, “User access control in open radio access networks: A federated deep reinforcement learning approach,” IEEE Trans. Wire. Commun., vol. 21, no. 6, pp. 3721–3736, 2022.
[14] M. Karbalaee Motalleb, V. Shah-Mansouri, S. Parsaeefard, and O. L. Alcaraz López, “Resource allocation in an Open RAN system using network slicing,” IEEE Trans. Netw. and Ser. Manag., vol. 20, no. 1, pp. 471–485, 2023.
[15] S.-Y. Lien and D.-J. Deng, “Intelligent session management for URLLC in 5G open radio access network: A deep reinforcement learning approach,” IEEE Trans. Indus. Infor., vol. 19, no. 2, pp. 1844–1853, 2023.
[16] M. J. Neely, E. Modiano, and C.-P. Li, “Fairness and optimal stochastic control for heterogeneous networks,” IEEE/ACM Trans. on Netw., vol. 16, no. 2, pp. 396–409, 2008.
[17] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synthesis Lectures Commun. Netw., vol. 3, no. 1, pp. 1–211, 2010.
[18] A. Eryilmaz and R. Srikant, “Joint congestion control, routing, and MAC for stability and fairness in wireless networks,” IEEE J. Sel. Areas in Commun., vol. 24, no. 8, pp. 1514–1524, 2006.
[19] M. A. Habibi, M. Nasimi, B. Han, and H. D. Schotten, “A comprehensive survey of RAN architectures toward 5G mobile communication system,” IEEE Access, vol. 7, pp. 70 371–70 421, 2019.
[20] J. Tang, W. P. Tay, and T. Q. S. Quek, “Cross-layer resource allocation with elastic service scaling in cloud radio access network,” IEEE Trans. Wireless Commun., vol. 14, no. 9, pp. 5068–5081, 2015.
[21] P. Luong, F. Gagnon, C. Despins, and L.-N. Tran, “Joint virtual computing and radio resource allocation in limited fronthaul green C-RANs,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2602–2617, 2018.
[22] A. Douik, H. Dahrouj, T. Y. Al-Naffouri, and M.-S. Alouini, “Coordinated scheduling and power control in cloud-radio access networks,” IEEE Trans. Wireless Commun., vol. 15, no. 4, pp. 2523–2536, 2016.
[23] A. Douik, H. Dahrouj, T. Y. Al-Naffouri, and M.-S. Alouini, “Low-complexity scheduling and power adaptation for coordinated cloud-radio access networks,” IEEE Commun. Lett., vol. 21, no. 10, pp. 2298–2301, 2017.
[24] M. S. Al-Abiad, A. Douik, and S. Sorour, “Rate aware network codes for cloud radio access networks,” IEEE Trans. Mobile Comput., vol. 18, no. 8, pp. 1898–1910, 2019.
[25] A. Douik, S. Sorour, T. Y. Al-Naffouri, and M.-S. Alouini, “Rate aware instantly decodable network codes,” IEEE Trans. Wireless Commun., vol. 16, no. 2, pp. 998–1011, 2017.
[26] M. S. Al-Abiad, A. Douik, S. Sorour, and M. J. Hossain, “Throughput maximization in cloud-radio access networks using cross-layer network coding,” IEEE Trans. Mobile Comput., vol. 21, no. 2, pp. 696–711, 2022.
[27] O-RAN.WG2.Use-Case-Requirements-v02.01, “Non-RT RIC & A1 interface: Use cases and requirements,” Technical Specification, Nov. 2021. [Online]. Available: https://www.o-ran.org/specifications (accessed on 10 November 2021)
[28] M. R. Anwar, S. Wang, M. F. Akram, S. Raza, and S. Mahmood, “5G-enabled MEC: A distributed traffic steering for seamless service migration of internet of vehicles,” IEEE Internet of Things J., vol. 9, no. 1, pp. 648–661, 2022.
[29] F. Kavehmadavani, V.-D. Nguyen, T. X. Vu, and S. Chatzinotas, “Intelligent traffic steering in beyond 5G Open RAN based on LSTM traffic prediction,” IEEE Trans. Wireless Commun., pp. 1–1, 2023.
[30] T. K. Vu, M. Bennis, M. Debbah, and M. Latva-Aho, “Joint path selection and rate allocation framework for 5G self-backhauled mm-wave networks,” IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2431–2445, 2019.
[31] S. Singh, M. Geraseminko, S.-p. Yeh, N. Himayat, and S. Talwar, “Proportional fair traffic splitting and aggregation in heterogeneous wireless networks,” IEEE Commun. Lett., vol. 20, no. 5, pp. 1010–1013, 2016.
[32] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, Mar. 2017.
[33] P. Key, L. Massoulie, and D. Towsley, “Path selection and multipath congestion control,” in IEEE Conf. Comput. Commun. (IEEE INFOCOM), 2007, pp. 143–151.
[34] J. Liu, A. Eryilmaz, N. B. Shroff, and E. S. Bentley, “Understanding the impacts of limited channel state information on massive MIMO cellular network optimization,” IEEE J. Sel. Areas Commun., vol. 35, no. 8, pp. 1715–1727, 2017.
[35] M. Razaviyayn, “Successive convex approximation: Analysis and applications,” Ph.D. dissertation, University of Minnesota, 2014.
[36] P. Billingsley, Probability and Measure, 3rd ed. New York: Wiley, 1995.
[37] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Trans. Automatic Control, vol. 37, no. 12, pp. 1936–1948, 1992.
[38] M. Bennis, S. M. Perlaza, P. Blasco, Z. Han, and H. V. Poor, “Self-organization in small cell networks: A reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 12, no. 7, pp. 3202–3212, 2013.
[39] D. S. Leslie and E. Collins, “Convergent multiple-timescales reinforcement learning algorithms in normal form games,” Ann. Appl. Prob., vol. 13, no. 4, pp. 1231–1251, 2003.
[40] A. Beck, A. Ben-Tal, and L. Tetruashvili, “A sequential parametric convex approximation method with applications to nonconvex truss topology design problems,” J. Global Optim., vol. 47, no. 1, pp. 29–51, May 2010.
[41] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization. Philadelphia: MPS-SIAM Series on Optimi., SIAM, 2001.
[42] A. Eryilmaz and R. Srikant, “Fair resource allocation in wireless networks using queue-length-based scheduling and congestion control,” IEEE/ACM Trans. Net., vol. 15, no. 6, pp. 1333–1344, 2007.
[43] A. H. Jafari, D. López-Pérez, M. Ding, and J. Zhang, “Study on scheduling techniques for ultra dense small cell networks,” in IEEE Veh. Technol. Conf. (VTC-Fall), 2015, pp. 1–6.
[44] 3GPP, NR, Physical channels and modulation (Release 15), document 3GPP TS 38.211 version 15.2.0 Release 15, 2017.
[45] S. Samarakoon, M. Bennis, W. Saad, and M. Latva-aho, “Backhaul-aware interference management in the uplink of wireless small cell networks,” IEEE Trans. Wireless Commun., vol. 12, no. 11, pp. 5813–5825, 2013.
[46] X. Lin, N. Shroff, and R. Srikant, “A tutorial on cross-layer optimization in wireless networks,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1452–1463, 2006.
[47] E. Stai and S. Papavassiliou, “User optimal throughput-delay trade-off in multihop networks under NUM framework,” IEEE Commun. Lett., vol. 18, no. 11, pp. 1999–2002, 2014.
[48] V.-D. Nguyen, T. Q. Duong, H. D. Tuan, O.-S. Shin, and H. V. Poor, “Spectral and energy efficiencies in full-duplex wireless information and power transfer,” IEEE Trans. Commun., vol. 65, no. 5, pp. 2220–2233, May 2017.
