CARES: Computation-aware Scheduling in Virtualized Radio Access Networks

D. Bega, A. Banchs, Senior Member, IEEE, M. Gramaglia, X. Costa-Pérez, Member, IEEE, and P. Rost, Senior Member, IEEE

Abstract—In a virtualized Radio Access Network (RAN), baseband processing is performed by software running in cloud-computing platforms. However, current protocol stacks were not designed to run in this kind of environment: the high variability of the computational resources consumed by RAN functions may lead to eventual computational outages (where frames are not decoded on time), severely degrading the resulting performance. In this paper, we address this issue by re-designing two key functions of the protocol stack: (i) scheduling, to select the transmission of those frames that do not incur computational outages, and (ii) modulation and coding scheme (MCS) selection, to downgrade the selected MCS in case insufficient computational resources are available. We formulate the resulting problem as a joint optimization and compute the (asymptotically) optimal solution to this problem. We further show that this solution involves solving an NP-hard problem, and propose an algorithm to obtain an approximate solution that is computationally efficient while providing bounded performance with respect to the optimal. We thoroughly evaluate the proposed approach via simulation, showing that it can provide savings as high as 80% of the computational resources while paying a small price in performance.

Index Terms—5G, Computation-aware scheduling, Virtualized RAN, joint scheduling and MCS selection.

Manuscript received March 28, 2018; revised June 13, 2018 and September 12, 2018; accepted September 17, 2018. Date of publication xxxx xx, xxxx; date of current version xxxx xx, xxxx. The work of University Carlos III of Madrid was supported by the H2020 5G-MoNArch project (Grant Agreement No. 761445) and the work of NEC Europe Ltd. by the 5G-Transformer project (Grant Agreement No. 761536). The associate editor coordinating the review of this paper and approving it for publication was K. Kansanen. (Corresponding author: Dario Bega.)
D. Bega and A. Banchs are with the IMDEA Networks Institute, 28918 Leganés, Spain, and also with the University Carlos III of Madrid, 28911 Leganés, Spain (e-mail: dario.bega@imdea.org; banchs@it.uc3m.es).
M. Gramaglia is with the University Carlos III of Madrid, 28911 Leganés, Spain (e-mail: mgramagl@it.uc3m.es).
X. Costa-Pérez is with NEC Europe Ltd., 69115 Heidelberg, Germany (e-mail: xavier.costa@neclab.eu).
P. Rost is with Nokia Bell Labs, 81541 Munich, Germany (e-mail: peter.m.rost@nokia-bell-labs.com).

I. INTRODUCTION

Following current trends in network virtualization, virtualized RAN (Radio Access Network) is considered one of the key components of future 5G networks [1]. Enabled by the Network Function Virtualization (NFV) paradigm, this technology virtualizes the RAN functions and moves them out of the base stations. The RAPs (Radio Access Points) only perform simple processing of the Radio Frequency (RF) signal, while the majority of the baseband processing is executed in Virtual Network Functions (VNFs) running in cloud-computing platforms.1 The cloud-computing platforms typically run the baseband processing of a number of RAPs in a single location; depending on the scenario, such a location may be placed in edge nodes close to the antennas (e.g., in MEC – multi-access edge computing) or in the network core (e.g., in a centralized RAN).

1 This is the case, for instance, with options 7 and 8 for the functional split defined by 3GPP [2].

There is wide consensus in the scientific and industrial communities on the advantages of RAN virtualization. Indeed, this technology provides, among other benefits, high flexibility and increased multiplexing gains. Driven by these benefits, substantial research and standardization efforts have been devoted over the last few years to the design and analysis of virtualized RAN solutions [2]–[6].

In spite of the efforts conducted to date, there are still some unresolved research challenges with RAN virtualization; one of them is the dimensioning of the computational resources. The dimensioning of resources should be based on the processing demands of the different functions; however, the computational load of functions such as baseband processing is known to be highly variable [3], which makes resource allocation based on peak demands highly inefficient. A much more efficient strategy would be to pool the baseband processing and allocate resources to this pool based on typical (rather than peak) demands [7].2 This implies that occasionally the baseband processing pool may not have all the resources needed to process all frames within the involved timing constraints; following [10], hereafter this is referred to as a computational outage.

2 Indeed, one would expect that resources are typically allocated dynamically based on estimates of the average demand. However, such predictions may fail due to fluctuations of the demand or other events such as, e.g., an unexpected flash crowd. While in such cases more resources may be provided by adding Virtual Machines or containers to the cloud, this usually happens at time scales (of tens of seconds or minutes) [8] that are much larger than the user QoS requirements (of hundreds of milliseconds) [9].

The current RAN protocol stack has been designed under the assumption that the required computational resources are always available, and RAN functions are not prepared to cope with computational outages [11]. Indeed, when such computational outages occur (e.g., when the decoding function cannot meet the given deadline), current virtualized RAN implementations [12] simply drop the frame being processed, and as a result their performance is severely degraded.


In order to provide a more robust behavior against computational outages, a re-design of the RAN protocol stack is required. As a first step towards this end, in this paper we focus on the design of two key functions of the protocol stack: scheduling and modulation and coding scheme (MCS) selection. By making these functions computationally aware, our solution provides a graceful performance degradation in the presence of computational outages, in contrast to the severe degradation with the current stack. Our two key ideas are as follows:
• By being aware of the frames' computational resource consumption when scheduling transmissions, we can select those frames that prevent the system from incurring computational outages.
• As shown in [13], computational resource consumption decreases when selecting more robust modulation-and-coding schemes (MCS); hence, by reducing the selected MCS for some frames we can control the overall computational resource consumption.
We remark that the above techniques are not decoupled, and they may be combined in different ways to mitigate the impact of computational outages. Following this, the key contribution of this paper is the design of a mechanism that jointly optimizes scheduling and MCS selection to improve the performance of a virtualized RAN system with (temporarily) limited computational resources.

Related Work

There is a vast amount of work in the literature focusing on the design of algorithms for scheduling and for modulation-and-coding scheme selection. However, most of this work assumes an unlimited amount of computational resources and focuses only on communication resources (namely, channel quality). In this context, many scheduling algorithms have been designed to optimize performance under the given channel quality (see, e.g., [14]–[16]). Among these algorithms, the most relevant ones are those aiming at providing proportional fairness [17]–[22], which is the criterion we focus on in this paper. However, these algorithms do not take computational resources into account, and the choice of the optimal schedule may vary substantially when taking them into account.

There have also been many proposals on modulation and coding scheme (MCS) selection algorithms [23]–[25]. However, all these algorithms assume that the only constraint when selecting the MCS is the channel quality, and do not consider any other aspect. Our approach is therefore fundamentally different, as channel quality only provides us with an upper bound on the MCS that can be selected, but depending on the available computational resources we may select lower MCSs.

In traditional networks, scheduling and MCS selection are fully decoupled: first, the highest MCS under the current channel quality is selected for each device, and then devices are scheduled with the selected MCS for each of them. This is fundamentally different in our case, where we can counter computational resource outages either by scheduling less demanding users or by selecting a more robust MCS for some of them, and hence we need to address scheduling and MCS selection jointly.

Current trends on virtualization and cloudification of network functions have led to some research work focusing on understanding the computational load incurred by RAN functions [26]. Bhaumik et al. [27] analyze this based on a real experimental testbed, but do not consider the impact of the SNR, which is known to influence the computational load very substantially. Rost et al. [13] provide a more accurate model for the computational load. In this paper we employ the model in [13], which is the most accurate proposed to date, for our performance evaluation experiments.

Perhaps the closest work to ours is that of [10]. In that work, the authors propose to downgrade the MCS of scheduled users to fight computational outages. However, in contrast to our approach, [10] does not consider radio resource scheduling. In the performance evaluation, we show that our approach (which jointly optimizes scheduling and MCS selection) substantially outperforms an approach that combines standard scheduling with the MCS selection of [10] (relying on two separate optimizations).

The re-design of the VNF internals to make them robust to computational outages will be a fundamental cornerstone of virtualized mobile network protocol stacks. By building network functions that are elastic with respect to the computational resources available in the cloud, infrastructure providers can host more tenants on the same infrastructure, achieving better resource utilization and thus increasing the number of CPU cycles delivered per dollar.

Key contributions

The key contributions of the paper are as follows. After presenting the system model (Section II), we introduce the maximum step algorithm, which maximizes the resulting performance at each step, and show that this algorithm is asymptotically optimal (Section III). As the maximum step algorithm involves solving an NP-hard problem, we devise the Computation-AwaRE Scheduling (CARES) algorithm, which is computationally efficient and provides bounded performance guarantees (Section IV). Finally, we exhaustively evaluate the performance of the proposed approach via simulation, showing that CARES achieves savings in terms of computational capacity of up to 80% while paying a very small price in performance (Section V).

II. MODEL AND PROBLEM FORMULATION

In this section, we present our system model and formulate the optimization problem that will drive the computation-aware scheduling of the users. The model is inspired by LTE-A but is generally applicable to any cellular system.

A. System model

We consider a typical cloud-computing platform that is serving a set S of RAPs. The amount of resources devoted to the baseband processing for these RAPs is given by the computational capacity C. Each RAP has a number B of resource blocks. We denote by Us the set of users associated with RAP s ∈ S.
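To make the notation above concrete, the sketch below encodes the model entities as plain Python structures; the class and field names are illustrative and not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class RAP:
    """A radio access point with B resource blocks and its associated users U_s."""
    rap_id: int
    num_resource_blocks: int            # B
    users: List[int] = field(default_factory=list)

@dataclass
class CloudPlatform:
    """Cloud-computing platform serving the set S of RAPs under capacity C."""
    capacity: float                     # C, e.g., in Mbit-iter/s
    raps: Dict[int, RAP] = field(default_factory=dict)

# Example: 5 RAPs with 50 resource blocks each and 5 users per RAP
platform = CloudPlatform(
    capacity=10.0,
    raps={s: RAP(s, 50, users=list(range(5 * s, 5 * s + 5))) for s in range(5)})
```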


In line with standard LTE schedulers [28]–[32], time is divided into Transmission Time Intervals (TTIs), and at every TTI scheduling decisions are performed for the next TTI based on the current channel quality of the different users. Such a scheme allows taking into account the SNR level of each user resulting from fading, interference and other effects. We can thus select users when they experience their best channel quality; this is commonly referred to as "multi-user diversity", as it exploits the fact that different users will experience peaks in their channel quality at different times. More specifically, at each TTI a central scheduler decides (i) the mapping of users to resource blocks, and (ii) the MCS selected for each user. This decision is taken with the goal of improving the overall system performance given the SNR of each user and the available computational capacity C.

In our system model, we assume that each user sees the same SNR level in all resource blocks at a given TTI. The SNR level constrains the set of MCS schemes that may be selected for a user: we denote the set of possible MCS schemes available for user u at TTI t by Mu(t). Specifically, we may select an MCS scheme from this set under the constraint that all the resource blocks assigned to the user in a TTI have to employ the same MCS (as imposed by LTE-A). We further let cu,m(t) denote the computational load incurred to decode a resource block for user u under MCS m ∈ Mu at TTI t (given the user's channel at that time). Following [10], we impose that the aggregate computational load in the system (given by the individual loads cu,m(t)) cannot exceed the computational capacity C.

Note that the computational load cu,m(t) incurred by user u at time t is a function of the SNR level at that time and the MCS selected for the user. Indeed, the computational effort required to decode a sub-frame increases with the number of iterations, the number of code blocks and the number of information bits, among other factors; in turn, the number of iterations depends on the SNR, and the number of code blocks and information bits depend on the selected MCS. To determine the value of cu,m(t), we follow the model provided in [13], according to which the computational load for MCS m with a given SNR is equal to

cm(SNR) = rm · max{ 1, [ log2( −(ζ − 2) log10 ε̂channel ) − 2 log2( log2(1 + SNR) − rm ) ] / log2(ζ − 1) },

where rm is the rate provided by one resource block with MCS m, ε̂channel is the constraint on the channel outage probability, and ζ and K are constant parameters; following [13], we set ζ = 6, K = 0.2 and ε̂channel = 0.1%. Moreover, in the above expression we impose a minimum computational load corresponding to one iteration in order to account for large SNR values. Fig. 1 illustrates the resulting computational load per resource block, cu,m, as a function of the instantaneous SNR level; specifically, the figure shows (i) the computational load of a few selected MCSs (in different colors) and (ii) the computational load of the highest possible MCS for each SNR value (in gray).

Fig. 1: Computational load per resource block [Kbit-iter/s per PRB] as a function of the SNR level [dB], obtained from the analysis of [13].
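For illustration, the load model above can be evaluated with the short routine below. It follows the reconstructed expression with ζ = 6 and ε̂channel = 0.1% as stated in the text; the per-block rate rm of the chosen MCS is supplied by the caller, and the guard for MCSs that are not decodable at the given SNR is our own addition.

```python
import math

ZETA = 6              # constant parameter, following [13]
EPS_CHANNEL = 1e-3    # channel outage probability constraint (0.1%)

def computational_load(snr_db: float, r_m: float) -> float:
    """Computational load (bit-iterations per second and resource block) to
    decode one resource block sent at per-block rate r_m under the given SNR."""
    spectral_eff = math.log2(1.0 + 10.0 ** (snr_db / 10.0))
    if spectral_eff <= r_m:
        return float("inf")   # assumption: MCS not decodable at this SNR
    iterations = (math.log2(-(ZETA - 2) * math.log10(EPS_CHANNEL))
                  - 2.0 * math.log2(spectral_eff - r_m)) / math.log2(ZETA - 1)
    # at least one decoder iteration, to account for large SNR values
    return r_m * max(1.0, iterations)
```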


Note that the above computational load values correspond to a cloud environment (e.g., a datacenter) consisting of commodity hardware, which is the natural scenario for a virtualized RAN system such as the one considered in this paper. If we used hardware with accelerators such as GPUs (Graphics Processing Units) or FPGAs (Field-Programmable Gate Arrays), the computational load would change as a result of parallelizing some of the tasks. In any case, the algorithm we propose can be applied to any hardware platform, simply by taking the corresponding values for the computational load and capacity.

B. Legacy Proportional Fair (PF) Scheduler

In this paper, we aim at providing proportional fairness (PF), which is a widely accepted criterion for sharing resources. Indeed, much effort in the literature has been devoted to the design of PF scheduling algorithms (see, e.g., [17]–[22]). In the following, we present the formulation of a legacy PF scheduler under no computational capacity constraints, which will be taken as the baseline for our approach.

Let xu,b(t) be a variable that indicates whether resource block b is assigned to user u at TTI t. Let ru,b(t) be the rate provided to user u by resource block b at that time, depending on the user's SNR. A scheduling algorithm involves deciding the assignment of resource blocks to users (i.e., the xu,b(t) values) depending on the users' channel quality at each point in time (reflected by ru,b(t)). Once a scheduling decision is taken, the rate received by user u at TTI t is equal to ru(t) = Σ_{b∈Bs(u)} xu,b(t) ru,b(t), where Bs(u) is the set of resource blocks at user u's RAP. The average rate of user u is then given by the following expression, which corresponds to an exponential moving average updated at each TTI [17], [19], [21], [22], [33]:

Ru(t + 1) = (1 − α)Ru(t) + α ru(t),

where ru(t) is the rate received by user u at TTI t and α is a smoothing factor which discounts past values of ru(t) to reflect that more recent transmissions provide a higher value to the user. As the PF criterion aims to maximize the long-term throughputs [17]–[22], this implies that, with this criterion, recent transmissions are not valued significantly higher than previous ones. Following this, in the rest of the paper we assume that the discount factor α takes a small value.

With the above, PF scheduling can be formulated as an optimization problem that maximizes the overall performance or utility, given by the sum of the logarithms of the average rates, under the constraint that a resource block cannot be assigned to more than one user:

maximize_{xu,b(t)}  Σ_{u∈U} log E[Ru(t)]
subject to  Σ_{u∈U} xu,b(t) ≤ 1,  xu,b(t) ∈ {0, 1}  ∀ u, b, t.

This problem has been solved in the literature, showing that performance is optimized when, at a given TTI t, the user with the highest ru(t)/Ru(t) is scheduled (see, e.g., [22]).
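For reference, one TTI of the legacy PF rule just described can be sketched as follows: each resource block goes to the user with the largest ru,b(t)/Ru(t), and the averages are then updated with the exponential moving average above. The data layout is an illustrative assumption.

```python
from typing import Dict, List

ALPHA = 0.01  # smoothing factor of the exponential moving average

def legacy_pf_tti(rates: Dict[int, List[float]],
                  avg_rate: Dict[int, float]) -> Dict[int, int]:
    """One TTI of legacy PF scheduling for a single RAP (no capacity constraint).

    rates[u][b] -- instantaneous rate of user u on resource block b
    avg_rate[u] -- long-term average rate R_u(t), updated in place
    Returns the mapping from resource block to the scheduled user.
    """
    num_blocks = len(next(iter(rates.values())))
    assignment, served = {}, {u: 0.0 for u in rates}
    for b in range(num_blocks):
        # PF metric: pick the user with the highest r_u,b(t) / R_u(t)
        best = max(rates, key=lambda u: rates[u][b] / max(avg_rate[u], 1e-9))
        assignment[b] = best
        served[best] += rates[best][b]
    for u in avg_rate:
        avg_rate[u] = (1.0 - ALPHA) * avg_rate[u] + ALPHA * served[u]
    return assignment
```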


C. Computation-aware Proportional Fair scheduling and MCS selection

The previous problem formulation assumes that computational resources are unlimited; under such conditions, optimizing MCS selection and scheduling are decoupled problems: users first select the highest MCS allowed by their radio conditions, and then scheduling chooses the users that maximize the system's utility given the pre-selected MCSs. In this paper, we focus on computationally constrained systems where scheduling and MCS selection decisions take computational resources into account, which couples these decisions and leads to a joint optimization problem. In the resulting problem formulation, scheduling and MCS selection are driven by the decision variable xu,b,m(t), which indicates that resource block b is assigned to user u with MCS m at TTI t. Based on this variable, the rate received by user u at TTI t is given by:

ru(t) = Σ_{b∈Bs(u)} Σ_{m∈Mu(t)} xu,b,m(t) ru,m(t),    (3)

where Bs(u) is the set of resource blocks at user u's RAP, Mu(t) is the set of MCS options available for user u at time t (according to the radio conditions at that time) and ru,m(t) is the rate provided to user u by one resource block with MCS m.

With this, the goal of maximizing overall performance or utility in terms of PF in this system leads to the following optimization:3

maximize_{xu,b,m(t)}  Σ_{u∈U} log E[Ru(t)]    (4a)
subject to  Σ_{u∈U} Σ_{b∈Bs(u)} Σ_{m∈Mu(t)} xu,b,m(t) cu,m(t) ≤ C,  ∀ t,    (4b)
Σ_{u∈U} Σ_{m∈Mu(t)} xu,b,m(t) ≤ 1,  ∀ b, t,    (4c)
xu,b,m(t) + xu,b′,m′(t) ≤ 1,  ∀ u, b, b′, m′ ≠ m, t,    (4d)
xu,b,m(t) = 0,  ∀ u, b, m ∉ Mu(t),    (4e)
xu,b,m(t) ∈ {0, 1}  ∀ u, b, m, t.    (4f)

3 Our formulation focuses on uplink scheduling, as the load at the cloud-computing platform is dominated by uplink frames [27]; however, the formulation could be easily extended to incorporate downlink scheduling by simply adding downlink transmissions with their (low) computational cost.

The above formulation corresponds to a joint optimization problem involving both scheduling and MCS selection, under the constraints that the consumption of computational resources does not exceed the available capacity (4b), each resource block is only assigned once (4c), a user can employ only one MCS4 (4d), and we cannot select a higher MCS than the one allowed by the current radio conditions (4e). The optimization problem is similar to the legacy proportional fair scheduling presented in Section II-B, except for the constraint on computational load. However, this constraint makes the problem much harder, as it couples scheduling decisions with MCS selection (which are decoupled in the legacy case) and makes the problem NP-hard (in contrast to legacy PF scheduling).

4 In particular, following the LTE-A specifications mentioned in Section II-A, this constraint forces that if xu,b,m is equal to 1 for a certain user u and a certain MCS m, it cannot be equal to 1 for the same user u and another MCS m′.

The above problem formulation allows selecting an MCS lower than the one allowed by the user's radio conditions (which we refer to as an MCS downgrade) when computational resources are limited. While this may have a negative impact on throughput, our optimization guarantees that such MCS downgrades will only be performed when this contributes to improving the overall system performance. Indeed, the results presented in Section V show that such selective MCS downgrades can drastically reduce computational resource consumption while paying a minimum price in terms of throughput performance.

The underlying assumption behind the above problem formulation is that all users are backlogged, i.e., they always have a frame ready for transmission, and the algorithm devised hereafter is based on this assumption. However, in case some of the users are not backlogged, we can adapt our algorithm following the approach proposed in [34] to ensure that (i) a user cannot steal resources from the other users upon becoming backlogged, and (ii) under low loads, i.e., when there are no backlogged users, the system provides low latencies (see [34] for more details).

III. OPTIMAL SOLUTION

We next present the maximum step algorithm and show that this algorithm is asymptotically optimal, meaning that it provides optimal performance when α → 0. Note that, as mentioned before, we can expect that α will be set to a small value, which means that in practice the maximum step algorithm will provide a very good approximation to the optimal solution.

The maximum step algorithm maximizes the objective function increase at each step, i.e., it maximizes the following expression for each t:

Σ_{u∈U} log Ru(t + 1) − Σ_{u∈U} log Ru(t).

The above can be expressed as follows:

Σ_{u∈U} log Ru(t + 1) − Σ_{u∈U} log Ru(t) = Σ_{u∈U} log [Ru(t + 1)/Ru(t)]
= Σ_{u∈U} log [ (α Σ_{b∈Bs(u)} Σ_{m∈Mu(t)} xu,b,m(t) ru,m(t) + (1 − α)Ru(t)) / Ru(t) ],

where s(u) is the RAP serving user u. Maximizing the above expression is equivalent to maximizing

Π_{u∈U} [ α Σ_{b∈Bs(u)} Σ_{m∈Mu(t)} xu,b,m(t) ru,m(t) / Ru(t) + (1 − α) ].

If we divide the above by (1 − α)^|U|, the maximization still holds, yielding

Π_{u∈U} [ α Σ_{b∈Bs(u)} Σ_{m∈Mu(t)} xu,b,m(t) ru,m(t) / ((1 − α)Ru(t)) + 1 ].

To simplify notation, let us denote the number of resource blocks assigned to user u with MCS m at time t by nu,m(t), i.e., nu,m(t) = Σ_{b∈Bs(u)} xu,b,m(t), which allows writing the above as the more compact expression Π_{u∈U} [ α Σ_{m∈Mu(t)} nu,m(t) ru,m(t) / ((1 − α)Ru(t)) + 1 ]. It can be seen that this expression is equivalent to

Π_{u∈U} Π_{m∈Mu(t)} [ α nu,m(t) ru,m(t) / ((1 − α)Ru(t)) + 1 ],

since the terms for which nu,m(t) = 0 have no effect in either of the expressions.

Let us denote by ∆(t) the strategy followed for the assignment of users to resource blocks as well as for selecting the MCS of each user. The definition of a strategy is determined by the variables {nu,m(t), ∀ u, m}. According to the analysis presented in this section, the strategy followed by the maximum step algorithm is given by

arg max_{∆(t)} Π_{u∈U} Π_{m∈Mu(t)} [ α nu,m(t) ru,m(t) / ((1 − α)Ru(t)) + 1 ]    (5)

under the constraints (4b)-(4f).

Hence, the maximum step algorithm takes the scheduling decision at each TTI t that maximizes (5) given the value of the various variables at that time. More specifically, it decides the number of resource blocks to assign to each user, nu,m(t), depending on the rate a user would receive, ru,m(t), and her current average rate based on previous transmissions, Ru(t).

The following theorem shows that this algorithm is asymptotically optimal as α → 0, which (as discussed in Section II-B) is a reasonable assumption when dealing with long-term rates. The theorem also assumes that the channel state of the various RAPs can be modeled as a finite Markov chain. Note that this is a mild constraint, as Markov chains allow for very general models. Indeed, many works rely on similar assumptions to model wireless channels [33], [35]–[39]. The proof of the theorem is provided in Appendix I.

Theorem 1. If the wireless channels of the RAPs behave according to a Markov chain, then the maximum step algorithm becomes (asymptotically) optimal as α → 0.

While the maximum step algorithm provides (asymptotically) optimal performance, it involves finding the best strategy ∆(t) at each t, which is very costly in terms of computational complexity. This is confirmed by the following theorem, which shows that the maximum step algorithm is an NP-hard problem and thus infeasible in practical scenarios. The proof of the theorem is provided in Appendix II.

Theorem 2. The maximum step algorithm is NP-hard.

IV. APPROXIMATE SOLUTION: CARES ALGORITHM

In this section we propose a heuristic, the Computation-AwaRE Scheduling (CARES) algorithm, to obtain an approximate solution to the optimization problem of (4a)-(4f). The proposed approach (i) incurs an acceptable computational overhead, and (ii) provides a performance that is provably close to the optimal by a constant factor.

The CARES algorithm proposed here is inspired by the algorithm proposed in [40] for the multiple-choice knapsack problem. While the problem addressed in [40] and CARES have some similarities, there are also fundamental differences: (i) in [40] there are no constraints on the number of items that can be inserted in a knapsack, while in CARES this is limited by the number of available resource blocks; (ii) in CARES, a single user cannot employ different MCSs, while [40] does not have such a constraint; (iii) in CARES, a user can be assigned several resource blocks, while items cannot be repeated in [40]; and (iv) CARES aims to utilize all resource blocks, in contrast to [40]. These differences have strong implications in terms of (i) the algorithm design (the branching module has been adapted to the specifics of our problem, while the improvement module is completely new), (ii) the theoretical analysis (the complexity analysis differs substantially and the performance bound proof has been adapted, adjusting Lemma 1 and incorporating Lemma 2), and (iii) the performance results (see Figure 5 for a comparison of CARES against a variation of [40] in terms of network utilization).

A. Algorithm description

The CARES algorithm builds on the following approximation:

log[ α nu,m(t) ru,m(t) / ((1 − α)Ru(t)) + 1 ] ≈ nu,m(t) log[ α ru,m(t) / ((1 − α)Ru(t)) + 1 ],    (6)

which is accurate for small α; indeed, when α is small, both the left-hand side and the right-hand side of the above equation can be approximated by α nu,m(t) ru,m(t) / ((1 − α)Ru(t)).
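A quick numerical check of (6) with illustrative values shows how the two sides converge as α becomes small (the specific numbers below are arbitrary and only serve to exercise the formula):

```python
import math

def exact_term(alpha, n, r, R):
    """Left-hand side of (6): log(alpha * n * r / ((1 - alpha) * R) + 1)."""
    return math.log(alpha * n * r / ((1 - alpha) * R) + 1)

def approx_term(alpha, n, r, R):
    """Right-hand side of (6): n * log(alpha * r / ((1 - alpha) * R) + 1)."""
    return n * math.log(alpha * r / ((1 - alpha) * R) + 1)

# n = 10 resource blocks, r_{u,m} = 1 Mbps per block, R_u = 5 Mbps
for alpha in (0.1, 0.01, 0.001):
    print(alpha, round(exact_term(alpha, 10, 1.0, 5.0), 5),
          round(approx_term(alpha, 10, 1.0, 5.0), 5))
```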


With the above approximation, the optimal solution is given by the {xu,b,m} setting that solves the following optimization problem:

maximize_{xu,b,m}  Σ_u Σ_b Σ_m xu,b,m pu,m
subject to  (4b)–(4f),

where we define the profit as pu,m = log[ α ru,m(t) / ((1 − α)Ru(t)) + 1 ].

To solve the above optimization, the design of CARES consists of three modules: (i) the branching module; (ii) the binary search module; and (iii) the improvement module. CARES works as follows. First, it runs the binary search module to find a candidate solution. In each of the steps of the binary search, the branching module provides a possible solution. Once the binary search returns the candidate solution, the improvement module is executed to try to find a better one. Below, we describe each of the modules of CARES in detail:

Branching module: Let P∗ denote the maximum profit, P∗ = max Σ_{u,b,m} xu,b,m pu,m, resulting from finding the maximum value of Σ_{u,b,m} xu,b,m pu,m under constraints (4b)-(4f). The aim of this module is to determine, for a given P value, whether P∗ ≤ 1.6P or P∗ ≥ 0.4P. Moreover, it returns a feasible solution, {x̃u,b,m}; hereafter, we let P̃ denote Σ_{u,b,m} x̃u,b,m pu,m for the returned subset. The pseudocode is provided in Algorithm 1.

Binary search module: This module performs a binary search on P until it finds a value that satisfies certain conditions, and returns the solution {xu,b,m} corresponding to this value. When the binary search module stops, it returns the solution stored in variable x̂u,b,m, whose (aggregate) profit is given by P̂. The pseudocode is provided in Algorithm 2.

Improvement module: The objective of this module is to further improve performance when the solution x̂u,b,m returned by the binary search module does not use all the available resource blocks. The idea is to downgrade the MCS of some selected users (i.e., choosing a lower MCS for them) to make room for other users to be scheduled. This module runs in iterations: at each iteration, we obtain a new solution and compare its performance with that of the previous iteration. We stop when the current iteration does not improve performance over the previous one. The pseudocode is provided in Algorithm 3.

Algorithm 1 Branching module algorithm
Step 1: Set the initial resource block assignment {x′u,b,m}
  Discard those {u, m} pairs for which pu,m/cu,m < 0.8P/C
  {u′, m′} ← select the {u, m} with the largest pu,m for each RAP among the remaining pairs
  {x′u,b,m} ← assign all the resource blocks of the RAP to user u′ under MCS m′
Step 2: Determine whether P∗ > 0.4P or P∗ ≤ 1.6P holds
  P′ = Σ_{u,b,m} x′u,b,m pu,m      ⊲ profit of the {x′u,b,m} assignment
  If P′ > 0.8P then it holds P∗ > 0.4P
  If P′ ≤ 0.8P then it holds P∗ ≤ 1.6P
Step 3: Build the resource block assignment solution {x̃u,b,m}
  C′ = Σ_{u,b,m} x′u,b,m cu,m
  If C′ ≤ C then return {x̃u,b,m} = {x′u,b,m} and stop
  Else      ⊲ Build a subset of {x′u,b,m} denoted by {x̃u,b,m}
    {x̃u,b,m} = ∅
    While C̃ = Σ_{u,b,m} x̃u,b,m cu,m ≤ C do
      {x̃u,b,m} ← element remaining in {x′u,b,m} with the largest pu,m/cu,m
    If C̃ > C/2 then return {x̃u,b,m} and stop
    Else      ⊲ Rebuild {x̃u,b,m}
      {x̃u,b,m} = ∅, {x̃u,b,m} ← element in {x′u,b,m} with the largest pu,m
      While C̃ ≤ C do
        {x̃u,b,m} ← element in {x′u,b,m} with largest pu,m/cu,m
      Return {x̃u,b,m} and stop

Algorithm 2 Binary search module algorithm
1: Remove the {u, m} pairs for which cu,m > C
2: L ← max_{u,m} pu,m, U ← B|S| max_{u,m} pu,m, P ← U/2
3: P̂ ← L and x̂u,b,m ← user with largest pu,m scheduled      ⊲ assignment yielding P̂
4: While U/L > 5 do
5:   Run the branching module to determine if P∗ ≤ 1.6P or P∗ ≥ 0.4P and get {x̃u,b,m}
6:   If P̃ > P̂ then {x̂u,b,m} ← {x̃u,b,m}, P̂ ← P̃
7:   If P∗ ≤ 1.6P then U ← 1.6P and go to 9
8:   If P∗ ≥ 0.4P then L ← 0.4P and go to 9
9:   P ← U/2

Algorithm 3 Improvement module algorithm
Step 1: First iteration
1: Take the solution provided by the binary search module
2: Add {u, m} pairs that fit within the leftover capacity in order of decreasing pu,m
3: primary users ← users selected by the binary search module
4: secondary users ← users added above
5: Compute the performance of this solution in terms of aggregate profit
Step 2: Start a new iteration
6: Select the user with the largest cu,m and downgrade her MCS
7: Remove {u, m} pairs with a higher complexity than the old cu,m of the downgraded user
8: Keep fixed the MCS of the user downgraded above and of all the other primary users
9: Fill the remaining resource blocks following Step 3 of the branching module
Step 3: Fill leftover capacity with secondary users
10: If some resource blocks remain unassigned then
11:   Add {u, m} pairs that fit within the leftover capacity in order of decreasing pu,m
12: ⊲ Update both sets for the next iteration
13: primary users ← users selected in Step 2
14: secondary users ← users selected in Step 3
Step 4: Evaluate performance and decide whether to perform a new iteration
15: Evaluate performance with the solution provided by Step 3
16: Compare performance with the one obtained in the previous iteration
17: If performance has improved then go to Step 2
18: Else stop the algorithm and return the solution from the previous iteration
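The listings above can be condensed into the skeleton below, which keeps only the control flow of the binary search module (Algorithm 2) and treats the branching and improvement modules as callables; the interfaces and variable names are our own illustrative choices, not the paper's implementation.

```python
def cares_schedule(pairs, profits, costs, capacity, total_blocks,
                   branching_module, improvement_module):
    """Sketch of CARES for one TTI, following the structure of Algorithm 2.

    pairs        -- candidate (user, mcs) pairs
    profits      -- dict mapping (user, mcs) to the profit p_{u,m}
    costs        -- dict mapping (user, mcs) to the load c_{u,m}
    total_blocks -- B * |S|, the total number of resource blocks
    """
    # line 1: remove pairs that cannot fit within the capacity at all
    pairs = [um for um in pairs if costs[um] <= capacity]
    # line 2: initial bounds L <= P* <= U = B|S| * max p_{u,m}
    L = max(profits[um] for um in pairs)
    U = total_blocks * L
    # line 3: initialize the incumbent with the single most profitable pair
    best_solution = [max(pairs, key=lambda um: profits[um])]
    best_profit = L
    # lines 4-9: binary search on P until U/L <= 5
    while U / L > 5:
        P = U / 2
        solution, profit, upper_case = branching_module(
            pairs, profits, costs, capacity, P)
        if profit > best_profit:          # line 6: keep the best feasible solution
            best_solution, best_profit = solution, profit
        if upper_case:                    # line 7: branching decided P* <= 1.6 P
            U = 1.6 * P
        else:                             # line 8: branching decided P* >= 0.4 P
            L = 0.4 * P
    # Algorithm 3: downgrade MCSs to fill any unused resource blocks
    return improvement_module(best_solution, pairs, profits, costs, capacity)
```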


One of the key features of CARES is that it provides a performance bound. The rationale to achieve this bound is as follows. CARES runs the binary search module within an upper limit U and a lower limit L. In each step, we set P = U/2 and run the branching module to find a solution {x′u,b,m}. Based on the outcome of the branching module, we either set L = 0.4P or U = 1.6P and continue the binary search. It can be proven that in all the steps of the search it holds that (i) the profit of the solution provided by the branching algorithm satisfies P′ > L, and (ii) the upper bound satisfies U > P∗. Then, by executing the search until U/L ≤ 5, we can bound the distance between the profit provided by CARES (P′) and the maximum profit (P∗). Note that the choice of the various parameters (such as 0.4 and 1.6) is crucial to guarantee the provided performance bound. This is further analyzed in the following section.

B. Performance bound

To gain insight into the performance provided by CARES, in the following we provide a theoretical bound for the distance between CARES' performance and the optimal one. Note that, when α → 0, the approximation of (6) becomes exact and the performance of the optimal solution is equal to the maximum profit P∗. Theorem 3 below shows that under these conditions the proposed algorithm provides bounded performance guarantees; specifically, the (relative) difference between the performance provided by CARES and the optimal one is bounded by a constant factor. The proof of the theorem is provided in Appendix III.

Theorem 3. Let {x̂u,b,m} be the solution provided by CARES and let P̂ be the profit provided by this solution, i.e., P̂ = Σ_{u∈U} Σ_{b∈Bs(u)} Σ_{m∈Mu} x̂u,b,m pu,m. Then, it holds that (P∗ − P̂)/P∗ ≤ 4/5.

C. Complexity analysis

In the following we analyze the algorithm's complexity, showing that it is upper bounded by O(|W| log(B|S|) + |W|²), where |W| is the number of {u, m} pairs in the network satisfying cu,m ≤ C, |S| is the number of RAPs and B is the number of resource blocks per RAP. Moreover, the simulations of Section V-F show that the actual complexity is substantially lower than this bound. These results confirm that CARES scales well with the size of the network and that its computational complexity is sufficiently low to allow its deployment in practical settings, especially taking into account that CARES is expected to run in the cloud, where computational resources are typically not scarce.

The computational complexity bound is computed as follows. For the binary search module, initially we have L ≤ P∗ ≤ U = B|S|L. Every time we visit step 3.1, U is reduced to 0.8 of its previous value. Then, we need no more than log_a(B|S|/5) visits to step 3.1 until U ≤ 5L (where a = 10/8). If at any time the algorithm visits step 3.2, L becomes 0.2U and the algorithm stops. Thus, we run the branching module no more than log_a(B|S|/5) times. Taking into account that the complexity of the branching module is O(|W|), this yields an overall complexity of O(|W| log(B|S|)) until the binary search module provides a first candidate solution. After the binary search is complete, the improvement module runs at most one iteration for each {u, m} pair, since we can downgrade each pair only once. Every iteration incurs a complexity of O(|W|). Thus, the overall complexity of this module is O(|W|²). Adding this value to the one obtained for the binary search module gives the total complexity of the algorithm.

V. PERFORMANCE EVALUATION

We next evaluate via simulation the performance of the CARES algorithm in terms of utility, throughput and capacity savings. In line with similar works in the literature [22], [28], [41]–[44], in our simulator the wireless channel follows the log-distance path loss model with Rayleigh fading [45], with a path loss exponent γ = 3 and SNR0 = 17.6 dB (which is the SNR threshold for the highest MCS [13]). We set α = 0.01 for all experiments (unless otherwise stated) and employ 95% confidence intervals below 1% (unless explicitly shown). In order to show the gains provided by the CARES algorithm proposed here, we compare its performance against the following benchmarks:
• 'Optimal', which corresponds to the maximum step algorithm of Section III.
• 'PF w/ MCS', which first selects users by means of a proportional fair scheduler [17] and then adjusts their MCS following the approach of [10].
• 'Greedy', which implements the algorithm proposed in [46] to solve the knapsack problem resulting from the approximation given by (6).
• 'Legacy PF', which implements legacy proportional fair scheduling [17] to select user transmissions and processes as many frames as possible with the given computational capacity, dropping the frames that exceed the capacity.

A. Utility performance

We first analyze performance in terms of the objective function (4a), which corresponds to proportional fairness utility [19], comparing the utility provided by CARES against the other approaches. We consider a scenario sufficiently small to allow for the computation of the optimal solution: a network consisting of 5 RAPs, each with 5 users located at distances d0, 2d0, . . . , 5d0 from the RAP, respectively (where d0 corresponds to the distance that yields an average SNR equal to SNR0). Each base station has B = 50 resource blocks.

Results are shown in Figure 2 as a function of the computational capacity C. They show that CARES closely follows the optimal performance, way beyond the theoretically guaranteed bound, confirming that our algorithm is effective in jointly optimizing scheduling and MCS selection. We also observe that CARES substantially outperforms all the other approaches, providing a more graceful degradation as C decreases. The performance obtained with the 'PF w/ MCS' approach is substantially lower for smaller C values, where user selection is heavily constrained by the available capacity (see Sections V-B and V-C for more details). 'Greedy' tends to schedule users with higher pu,m/cu,m ratios, which is a valid strategy when the capacity is low, but it does not fully utilize the available capacity when there is plenty of it. Finally, performance with 'Legacy PF' is drastically degraded, as the number of frames that can be decoded rapidly decreases as we reduce capacity, reaching a point where no frames are decoded. As expected, when capacity is sufficient to decode all frames, all approaches (except for 'Greedy') perform optimally, as they all simply apply an optimal PF scheduler with no capacity constraints.

Fig. 2: Utility and access delay performance in a small scale scenario (utility and average frame access delay [ms] as a function of C [Mbit-iter/s]).

While the main objective of the paper is to provide PF, which mainly concerns throughput performance, delay is also relevant for many applications. In order to provide insight into the delay performance of CARES, the subplot of Figure 2 shows the average frame access delay provided by the different approaches, corresponding to the time elapsed from the instant when a frame reaches the head of the transmission queue until it is transmitted. We observe that CARES substantially outperforms all other approaches and performs close to the optimal benchmark.

B. Throughput distribution

In order to gain further understanding of the utility gains obtained in the previous section, in Figure 3 we analyze the individual throughput of each user, depending on the user's distance to the base station (d0, 2d0, . . . , 5d0), for two representative capacity values: C = 1 Mbit-iter/s and C = 10 Mbit-iter/s. We observe from the results that the CARES algorithm provides almost the same throughput distribution as the optimal approach, achieving larger throughputs and/or fairer distributions than the other approaches. It is particularly interesting to observe the behavior of the 'PF w/ MCS' algorithm: by decoupling scheduling from MCS selection, this strategy favors the users with higher SNR when scheduling frames; however, when the MCS of such users is downgraded, this decision turns out to be harmful for fairness without improving the overall throughput. As anticipated, the 'Greedy' approach performs poorly for C = 10 Mbit-iter/s, as it selects users that (although they may be very efficient in relative terms) do not exploit all the available capacity. Finally, it is also interesting to observe that for C = 1 Mbit-iter/s the 'Legacy PF' approach is unable to decode any frame, as it only schedules users with very high computational load and does not downgrade their MCS.

Fig. 3: Throughput distribution [Mbps] for two different C values (C = 1 and C = 10 Mbit-iter/s).

C. MCS downgrades

One of the key ideas behind CARES is that, under a shortage of available computational resources, we select MCS schemes that are lower (and computationally less intensive) than the highest possible MCS under the current radio conditions. Another approach that follows the same idea is 'PF w/ MCS', which also downgrades the MCS of scheduled users (until they fit within the available capacity).

Obviously, when applying the above idea, the common goal of both approaches is to select the highest MCS schemes possible under the available computational capacity. To understand how well these approaches meet this goal, Figure 4 depicts the average number of downgrades performed by both approaches, i.e., we compare the MCS selected for the transmitted frames against the highest possible MCS for those frames under the current radio conditions. As expected, the average number of downgrades decreases with the computational capacity, as larger capacities allow fitting higher MCS schemes. We further observe from the figure that CARES substantially outperforms 'PF w/ MCS', providing significantly lower downgrade rates. Indeed, as CARES takes computational needs into account when performing scheduling, it can select combinations of users that do not yield too high an aggregate computational load, thus limiting the number of required downgrades. In contrast, scheduling in 'PF w/ MCS' is blind to the computational load, and therefore may lead to scheduling combinations that require a larger number of downgrades to fit within C.

Fig. 4: Average number of downgrades with respect to the highest possible MCS.

D. Network utilization

One of the design goals of CARES is to maximize network utilization, using as many resource blocks as possible, as this brings advantages in terms of fairness and delay, among other aspects. The improvement module has been specifically designed to achieve this goal. In order to assess its effectiveness, Figure 5 illustrates the fraction of resource blocks utilized by CARES and by the solution provided by the binary search module, respectively. Note that the latter is quite similar to the algorithm of [40], although (as explained in Section IV-A) the algorithm of [40] has been designed for a different problem

25 shows the utility performance achieved in this scenario by the


PF w/ MCS various approaches (except for the optimal one). We observe
Number of MCS downgrades

20 CARES here a similar trend to the one shown by the small scale
scenario in Figure 2: the CARES algorithm provides a graceful
15 degradation as capacity decreases, substantially outperforming
all other approaches. This confirms the advantages of the
10 proposed approach in realistic (large-scale) scenarios.
In order to understand the computational complexity in-
5 curred by CARES in such a large scale scenario, we have
measured the number of operations it performs as compared to
0
1 10 the upper bound computed in Section IV-C. The results, shown
C [Mbit-iter/s] in the subplot of Figure 7, confirm that the actual complexity
of the algorithm is very far from the upper bound (in more
Fig. 4: Average number of downgrades with respect to the than two orders of magnitude), which further supports the
highest possible MCS feasibility of CARES in terms of computational complexity.
We further analyzed a simplified version of the CARES
100 algorithm, called S-CARES, which runs the improvement
CARES
90 module algorithm designed in Section IV over the solution
BS
Network utilization (%)

80 provided by the ‘Legacy PF’ algorithm rather than the one


70
provided by the binary search module. Results show that
the performance of this approach is fairly close to CARES
60
while computational complexity is reduced by one order of
50 magnitude, which means that S-CARES can be a viable
40 alternative for environments with heavily constrained compu-
30 tational resources.
20
1 10
C [Mbit-iter/s] G. Savings in computational capacity

Fig. 5: Network utilization with CARES and with the binary By making RAN functions robust against computational
search module solution (‘BS’). outages, CARES allows to reduce the amount of computa-
tional resources allocated to those functions while providing
minimum degradation in performance. In the following, we
and hence is not directly applicable to scheduling in a mobile analyze the capacity savings provided by CARES over the
network. Results show the effectiveness of CARES and its current RAN stack (that is, the ‘Legacy PF’ approach).
advantages over approaches such as [40], which have been As an intuitive measure to assess performance degradation,
devised for more general knapsack problems without taking we define the relative performance X as follows. Let us
into account the specific context of our scheduling problem. consider a reference throughput distribution r, and let U (r) be
the corresponding utility. We say that an alternative throughput
E. Impact of the α parameter distribution r0 provides a relative performance X if the utility
it provides is equivalent to multiplying by X the throughput
Since our goal in this paper, following the PF criterion, is of each individual user in the reference distribution, i.e.,
to optimize the long-term throughputs, we have focused on U (r0 ) = U (r̃), where r̃u = Xru ∀u (for instance, a relative
small α values; as a matter of fact, the optimality proof in performance of X = 99% is equivalent to multiplying all
Section III relies on this assumption. However, the proposed throughputs by 0.99).
CARES algorithm can be applied to any α value, including
For a given relative performance level X, we obtain the
large ones. In order to gain insight into the performance of
(computational) capacity savings achieved by CARES as
CARES for different α’s, Figure 6 shows the utility provided
follows. Let Cmax be the computational capacity required to
by the different approaches for a wide range of values. We
ensure that we are always able to decode all frames under the
observe from the results that CARES outperforms the other
‘Legacy PF’ approach, and let Umax be the utility achieved
approaches for all α’s. We also observe that, for the ‘PF w/
in this case. Let Cx be the capacity required by the CARES
MCS’ approach, utility improves as α grows; this is due to
algorithm to provide a relative performance of X with respect
the fact that large α values favor the selection of users with
to the maximum utility Umax . Then, the capacity savings
lower MCS, and this avoids downgrades. −Cx
provided by CARES is given by Cmax Cmax , which reflects the
amount of capacity that can be saved (in %) when admitting
F. Large scale network a relative performance degradation of X.
In the following we consider a much larger scale scenario, Figure 8 shows the capacity savings achieved by CARES
consisting of 50 RAPs serving 50 users each, placed at a for different relative performance levels, X = 99%, 95% and
random distance between d0 and 5d0 from the RAP. Figure 7 90%, as a function of the number of RAPs in the network,

1536-1276 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TWC.2018.2873324, IEEE
Transactions on Wireless Communications
10 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. XX, NO. X, XXXXXXX XXXX

PF w/ MCS Legacy PF Greedy CARES Optimal

320
C = 1 Mbit-iter/s C = 10 Mbit-iter/s
320
220
Utility

300
120

280
20
0.001 0.01 0.1 0.001 0.01 0.1
α

Fig. 6: Utility performance for different α values.

30 80
PF w/ MCS S-CARES
25 Legacy PF CARES

Capacity Savings [%]


Greedy 75
20
Utility [x1000]

Complexity (no. operations) 70


15
109
10 107 X = 99 %
65
105 X = 95 %
5
Upper Bound X = 90 %
103
0 1 10 100 60
1 10 100 20 40 60 80 100
C [Mbit-iter/s] RAPs

Fig. 7: Utility and complexity in a large scale scenario. Fig. 8: Computational capacity savings provided by CARES
over ‘Legacy PF’.

under the same conditions as the large scale scenario described in Section V-F. We observe that our approach provides very substantial capacity savings (between 70% and 80%), while paying a very small price in performance. We further observe that, while capacity savings increase when clustering more RAPs in the same pool as a result of the multiplexing gains, they are already very high for a single RAP and almost maximum with only around 5 RAPs. This implies that it is not necessary to cluster a large number of RAPs in a central data-center to achieve high capacity savings; instead, it is sufficient to cluster a small number of RAPs in a location that can be relatively close to the antennas, as would be the case for instance with multi-access edge computing (MEC).

The above results highlight the advantage of employing CARES in a virtualized RAN environment: (i) by making RAN VNFs computationally aware, CARES provides very remarkable capacity savings; and (ii) such savings are already achieved with relatively small RAP clusters. Thus, this technology allows an Infrastructure Provider to deploy edge infrastructure (which usually entails high costs) with a very high ratio of decoded bytes per deployed CPU cycle.

VI. CONCLUSIONS

There is a widely accepted trend towards the virtualization of RAN functions and their execution in cloud-computing platforms. However, the current RAN protocol stack was not designed to be executed in a virtualized environment, and its performance degrades drastically when the required computing resources are not available. To address this issue, in this paper we have re-designed two fundamental functions of the RAN protocol stack: (i) scheduling, and (ii) MCS selection. By considering computational load when scheduling frame transmissions, we aim at reducing the variability of the computational load incurred. Furthermore, by downgrading the selected MCS when needed, we reduce the impact of computational resource shortages on performance.

One of the key contributions of this paper has been to address the design of the two schemes jointly, which leads to substantial performance improvements. When addressing the joint optimization of scheduling and MCS selection, we have relied on analytical tools to obtain the solution that provides optimal performance. Building on the insights gained from this solution, we have designed the CARES algorithm, which incurs limited complexity while providing a performance that is provably close (by a certain factor) to the optimal.

The conducted performance evaluation has confirmed that CARES is a practical and effective approach for virtualizing the RAN functionality. With CARES, we can reduce the amount of computing resources allocated to RAN functions by 70% or more, while retaining almost the entire utility.
Furthermore, performance with CARES degrades gracefully in case of (temporary) resource shortages, such that an improper dimensioning of virtual machine pools has a very mild impact on performance. In summary, CARES represents a pioneering step towards a new paradigm of virtualization-enabled protocol stacks.

One of the main contributions of this paper has been the proposal of a new paradigm for scheduling algorithms: computation-aware scheduling, which takes into account computational resources and jointly optimizes scheduling and MCS selection. While here we have taken a proportional fair scheduler as baseline and have extended it to account for computational resources, the same concept could be applied to other scheduling algorithms (e.g., a scheduler that optimizes delay performance and/or provides minimum rate guarantees), which could be extended following a similar methodology to the one employed in this paper.
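To make this concrete, the toy sketch below mimics the kind of decision a computation-aware scheduler takes: for every resource block it selects at most one (user, MCS) option, using the proportional-fair profit that appears in the appendices, while keeping the aggregate decoding cost within the computational capacity C. It is only an illustration under assumed data structures (the function names and the `blocks` layout are hypothetical); it is not the CARES algorithm, which combines a branching step with a binary search and carries the bound of Theorem 3.

```python
import math

def pf_profit(alpha, rate, avg_rate):
    """Per-(user, MCS) profit of a single resource block, as used in the
    reduction of Appendix II: log(alpha*r/((1-alpha)*R) + 1)."""
    return math.log(alpha * rate / ((1.0 - alpha) * avg_rate) + 1.0)

def toy_computation_aware_pf(blocks, C, alpha):
    """Toy greedy illustration of computation-aware scheduling (NOT the CARES
    algorithm): for each resource block pick at most one (user, MCS) option,
    ranked by profit per unit of computational cost, while the total decoding
    cost stays within the pooled computational capacity C.

    `blocks` maps a resource-block id to candidate tuples
    (user, mcs, rate, cost, avg_rate); names and units are illustrative."""
    candidates = []
    for b, options in blocks.items():
        for user, mcs, rate, cost, avg_rate in options:
            p = pf_profit(alpha, rate, avg_rate)
            candidates.append((p / cost, p, cost, b, user, mcs))

    schedule, used = {}, 0.0
    for density, p, cost, b, user, mcs in sorted(candidates, reverse=True):
        if b not in schedule and used + cost <= C:
            schedule[b] = (user, mcs)   # assign block b; the MCS may be a downgraded one
            used += cost
    return schedule, used

# Example: two blocks, each with a high-MCS option and a cheaper low-MCS option.
blocks = {
    0: [("u1", 28, 5.0e5, 3.0, 1.0e5), ("u1", 16, 2.5e5, 1.2, 1.0e5)],
    1: [("u2", 28, 4.0e5, 3.0, 2.0e5), ("u2", 10, 1.0e5, 0.8, 2.0e5)],
}
print(toy_computation_aware_pf(blocks, C=4.0, alpha=0.01))
```

In the example, the budget C forces the second block to fall back to the cheaper MCS, which is precisely the kind of graceful degradation discussed above.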
our system, we proceed asP follows. Condition (i) holds for our
A PPENDIX I: P ROOF OF T HEOREM 1
objective function, U = u∈U log Ru ; indeed, this function
Theorem 1. If the wireless channels of the RAPs behave ac- is precisely one of the functions explicitly considered in [33],
cording to a Markov chain, then the maximum step algorithm referred to as ‘Type II Utility Function’. Condition (ii) follows
becomes (asymptotically) optimal as α → 0. from the fact that when we schedule some user, this user
Proof. To prove the theorem, we start by showing that as experiences a positive rate. Finally, condition (iii) holds under
α → 0 the maximum step algorithm defined by our paper some natural non-restrictive assumptions that are satisfied by
is equivalent toP
the gradient algorithm analyzed in [33]. If we our system model (see [33]).
define U (t) = u∈U log Ru , the gradient is given by Since all conditions are satisfied, the result of [33] applies,
  and the gradient algorithm is (asymptotically) optimal as
1 1 α → 0. Since, as we have shown above, the maximum step
∇U (t) = ,..., .
R0 (t) R|U | (t) algorithm and the gradient algorithms provide the same solu-
The gradient algorithm is the one that maximizes (see [33]) tion as α → 0, this means that the maximum step algorithm
is asymptotically optimal, which proves the theorem.
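The second equality in the displayed chain of the proof is where the small-$\alpha$ regime enters: it relies on the first-order expansion of the logarithm,
$$\log\left(\frac{\alpha n_{u,m}(t) r_{u,m}(t)}{(1-\alpha)R_u(t)} + 1\right) = \frac{\alpha n_{u,m}(t) r_{u,m}(t)}{(1-\alpha)R_u(t)} + O(\alpha^2) \quad \text{as } \alpha \to 0,$$
so that, to first order in $\alpha$, maximizing the sum of logarithms and maximizing $\sum_{u}\sum_{m} \frac{\alpha n_{u,m}(t) r_{u,m}(t)}{(1-\alpha)R_u(t)}$ select the same scheduling decision.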
APPENDIX II: PROOF OF THEOREM 2

Theorem 2. The maximum step algorithm is NP-hard.

Proof. The reduction is via the multiple-choice knapsack (MCKP) problem, which is known to be NP-hard. Recall that the multiple-choice knapsack problem is a variant of the knapsack problem and is stated as follows. We have $m$ mutually disjoint classes $N_1, \ldots, N_m$ of $n_i$ items to pack in a knapsack of capacity $W$. Each item $i$ of class $j$, $i \in N_j$, has a profit $p_{ij}$ and a weight $w_{ij}$, and the problem is to choose at most one item from each class in such a way that (i) the total profit sum is maximized, and (ii) the sum of weights does not exceed $W$. The MCKP can thus be formulated as:
$$\begin{aligned}
\underset{\{x_{ij}\}}{\text{maximize}} \quad & \sum_{j=1}^{m} \sum_{i \in N_j} p_{ij} x_{ij} \\
\text{subject to:} \quad & \sum_{j=1}^{m} \sum_{i \in N_j} w_{ij} x_{ij} \le W, \\
& \sum_{i \in N_j} x_{ij} \le 1 \quad \forall j, \\
& x_{ij} \in \{0,1\} \quad \forall j, \; i \in N_j.
\end{aligned}$$
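As an aside, MCKP can be solved exactly by a standard dynamic program that is pseudo-polynomial in the capacity $W$; the sketch below (assuming non-negative integer weights) illustrates this, and does not contradict NP-hardness because $W$ may be exponential in the size of the instance encoding.

```python
def mckp_dp(classes, W):
    """Pseudo-polynomial DP for MCKP: pick at most one (profit, weight) item
    per class so that the total weight is at most W and the total profit is
    maximal. `classes` is a list of lists of (profit, weight) pairs; weights
    are non-negative integers, profits may be arbitrary non-negative reals."""
    best = [0.0] * (W + 1)                # best[w] = max profit within weight budget w
    for items in classes:
        new = best[:]                     # taking nothing from this class is always allowed
        for profit, weight in items:
            for w in range(W, weight - 1, -1):
                cand = best[w - weight] + profit
                if cand > new[w]:
                    new[w] = cand
        best = new
    return best[W]

# Tiny example: two classes, capacity 5; the best choice is (4,4) + (2,1).
print(mckp_dp([[(3.0, 2), (4.0, 4)], [(2.0, 1), (5.0, 4)]], W=5))  # -> 6.0
```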
Let us define an instance of our algorithm where we have: (i) one resource block per RAP, (ii) a RAP $s$ for each class $N_j$, (iii) only one MCS available for each user,
(iv) for each item $i$ of class $N_j$, a user $u$ at RAP $s$ such that $\log\left(\frac{\alpha r_{u,m}(t)}{(1-\alpha)R_u(t)} + 1\right) = p_{ij}$ and $c_{u,m}(t) = w_{ij}$, and (v) the total computational capacity $C$ equal to $W$.

If now we take a solution to the maximum step algorithm and choose the items that correspond to the users scheduled for transmission, the resulting set of items provides a solution to the MCKP, which proves the theorem.
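A minimal sketch of this instance construction is given below, obtained by inverting the profit formula $p_{ij} = \log(\alpha r/((1-\alpha)R) + 1)$ for the rate; the function name, the data layout and the common value of $R$ are assumptions made only for illustration.

```python
import math

def scheduling_instance_from_mckp(classes, W, alpha, avg_rate=1.0):
    """Build a scheduling instance along the lines of the proof of Theorem 2.

    For each MCKP class N_j we create one RAP with a single resource block;
    each item (p_ij, w_ij) becomes a user u with a single MCS whose rate r
    satisfies log(alpha*r/((1-alpha)*R) + 1) = p_ij and whose decoding cost
    is c = w_ij. The computational capacity is C = W.
    `classes` is a list of lists of (profit, weight) pairs."""
    raps = []
    for j, items in enumerate(classes):
        users = []
        for i, (p, w) in enumerate(items):
            # Invert p = log(alpha*r/((1-alpha)*R) + 1)  =>  r = (e^p - 1)*(1-alpha)*R/alpha
            rate = (math.exp(p) - 1.0) * (1.0 - alpha) * avg_rate / alpha
            users.append({"user": f"u_{j}_{i}", "rate": rate, "cost": w})
        raps.append({"rap": f"s_{j}", "resource_blocks": 1, "users": users})
    return {"raps": raps, "capacity": W}
```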
APPENDIX III: PROOF OF THEOREM 3

To prove Theorem 3, we first prove the two Lemmas below.

Lemma 1. Let $P'$ be the value computed by the branching module for a given $P$. If $P' > 0.8P$ then $P^* > 0.4P$, and if $P' \le 0.8P$ then $P^* \le 1.6P$.

Proof. Suppose $P' > 0.8P$. Consider the following two cases: (i) $\sum_{\{u,b,m\}} x'_{u,b,m} c_{u,m} \le C$, and (ii) $\sum_{\{u,b,m\}} x'_{u,b,m} c_{u,m} > C$.

In case (i), $\{x'_{u,b,m}\}$ provides a feasible solution to the problem satisfying $\sum_{\{u,b,m\}} x'_{u,b,m} p_{u,m} > 0.8P$ and hence $P^* > 0.8P > 0.4P$.

In case (ii), we choose a subset $B'$ of resource blocks, and an additional resource block $b'$ (assigned to user $u'$ under MCS $m'$) such that $\sum_{\{b \in B', u, m\}} x'_{u,b,m} c_{u,m} < C$ and $\sum_{\{b \in B', u, m\}} x'_{u,b,m} c_{u,m} + c_{u',m'} > C$. Given that $p_{u,m}/c_{u,m} \ge 0.8P/C$ for $\{x'_{u,b,m}\}$, we have
$$\sum_{\{b \in B', u, m\}} x'_{u,b,m} p_{u,m} + p_{u',m'} \ge \frac{0.8P}{C} \left( \sum_{\{b \in B', u, m\}} x'_{u,b,m} c_{u,m} + c_{u',m'} \right) > 0.8P.$$
Further, $P^* \ge \sum_{\{b \in B', u, m\}} x'_{u,b,m} p_{u,m}$ and $P^* \ge p_{u',m'}$, which implies that
$$2P^* > \sum_{\{b \in B', u, m\}} x'_{u,b,m} p_{u,m} + p_{u',m'} > 0.8P$$
and therefore $P^* > 0.4P$.

Now suppose that $P' \le 0.8P$. Let $x^*_{u,b,m}$ be the optimal assignment that yields $P^*$. Consider the optimal assignment, and denote by $B_1$ (respectively $B_2$) the set of resource blocks within this assignment for which $p_{u,m}/c_{u,m} < 0.8P/C$ (respectively $p_{u,m}/c_{u,m} \ge 0.8P/C$) holds. Then,
$$P^* = \sum_{\{u,b,m\}} x^*_{u,b,m} p_{u,m} = \sum_{\{b \in B_1, u, m\}} x^*_{u,b,m} p_{u,m} + \sum_{\{b \in B_2, u, m\}} x^*_{u,b,m} p_{u,m}.$$
But
$$\sum_{\{b \in B_1, u, m\}} x^*_{u,b,m} p_{u,m} < \frac{0.8P}{C} \sum_{\{b \in B_1, u, m\}} x^*_{u,b,m} c_{u,m} \le 0.8P.$$
Further, from our initial assumption and the choice at step 1 of the branching module, we have
$$\sum_{\{b \in B_2, u, m\}} x^*_{u,b,m} p_{u,m} \le \sum_{\{u,b,m\}} x'_{u,b,m} p_{u,m} \le 0.8P.$$
Therefore $P^* < 1.6P$.

Lemma 2. When $P' > 0.8P$, the setting returned by the branching module satisfies $\tilde{P} > 0.4P$.

Proof. If $\sum_{\{u,b,m\}} x'_{u,b,m} c_{u,m} \le C$, we can see from the proof of Lemma 1 that $\tilde{P} = P' > 0.8P > 0.4P$ holds for the items selected by the branching algorithm. Otherwise, we consider two cases: (i) the sum of the $c_{u,m}$'s of the items selected by the branching module is larger than $C/2$, and (ii) the sum of the $c_{u,m}$'s of these items is smaller than or equal to $C/2$.

For case (i) we have $\sum_{\{u,b,m\}} \tilde{x}_{u,b,m} c_{u,m} > C/2$, and thus it holds
$$\tilde{P} = \sum_{\{u,b,m\}} \tilde{x}_{u,b,m} p_{u,m} \ge \frac{0.8P}{C} \sum_{\{u,b,m\}} \tilde{x}_{u,b,m} c_{u,m} > \frac{0.8P}{C} \cdot \frac{C}{2} = 0.4P.$$

For case (ii), we proceed as follows. If we take any item $\{u', m'\}$ that belongs to $\{x'_{u,b,m}\}$ but not to $\{\tilde{x}_{u,b,m}\}$, necessarily the sum of the $c_{u,m}$ of this item plus all the items in $\{\tilde{x}_{u,b,m}\}$ exceeds $C$, as otherwise $\{u', m'\}$ would be part of $\{\tilde{x}_{u,b,m}\}$. Applying the same reasoning as in Lemma 1, we have
$$\sum_{\{u,b,m\}} \tilde{x}_{u,b,m} p_{u,m} + p_{u',m'} > 0.8P.$$
Furthermore, since $\{\tilde{x}_{u,b,m}\}$ comprises the item with the largest $p_{u,m}$, we also have
$$\sum_{\{u,b,m\}} \tilde{x}_{u,b,m} p_{u,m} \ge p_{u',m'}.$$
The above two equations yield $\tilde{P} = \sum_{\{u,b,m\}} \tilde{x}_{u,b,m} p_{u,m} > 0.4P$.

Theorem 3. Let $\{\hat{x}_{u,b,m}\}$ be the solution provided by CARES and let $\hat{P}$ be the profit provided by this solution, i.e., $\hat{P} = \sum_{u \in \mathcal{U}} \sum_{b \in B_{p(u)}} \sum_{m \in \mathcal{M}_u} \hat{x}_{u,b,m} p_{u,m}$. Then, it holds
$$\frac{P^* - \hat{P}}{P^*} \le \frac{4}{5}.$$

Proof. Based on Lemmas 1 and 2, we proceed as follows to prove the theorem. The proof focuses only on the binary search module and does not consider the improvement module. Indeed, as the improvement module just increases the performance provided by the previous steps, any bound obtained without taking into account this module will hold for the complete algorithm.

It can be seen that $\hat{P} \ge L$; indeed, (i) initially we have $\tilde{P} \ge L$; (ii) when we visit step 3.2 of the binary search module, $L$ increases to $L = 0.4P$, but from Lemma 2 we have that the resulting $\tilde{P}$ satisfies $\tilde{P} > 0.4P$ and hence $\tilde{P} \ge L$ holds; and (iii) when we visit step 3.1, we have that $\tilde{P}$ may increase while $L$ remains constant. Hence, $\tilde{P} \ge L$ holds in any intermediate step, and will also hold in the last step, i.e., $\hat{P} \ge L$. Furthermore, from Lemma 1 it holds that $U \ge P^*$; indeed, initially $U \ge P^*$, and we only decrease $U$ to $1.6P$ when $P^* \le 1.6P$.

The algorithm is guaranteed to stop because the length of the interval $[L, U]$ decreases by at least 12.5% every time
the algorithm runs steps 3 and 4. When the algorithm stops, we have $U/L \le 5$ from the stopping conditions. Since $U \ge P^*$, this implies that $P^*/L \le 5$ and hence $L \ge P^*/5$. Furthermore, since $\hat{P} \ge L$, this yields $\hat{P} \ge P^*/5$, i.e., $(P^* - \hat{P})/P^* \le 1 - 1/5 = 4/5$, which proves the theorem.
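The control flow that this proof reasons about can be sketched as follows. The sketch is schematic only: the branching module and the rule used to pick the trial profit P are defined in the body of the paper, so `branching` and `choose_trial` below are stand-ins under stated assumptions rather than the paper's implementation.

```python
import math

def binary_search_sketch(branching, L, U, initial_solution, stop_ratio=5.0):
    """Schematic binary-search loop (NOT the paper's exact CARES code).

    Assumptions: 0 < L <= profit(initial_solution), U >= P*, and branching(P)
    returns (P_prime, solution) satisfying Lemmas 1 and 2 for the trial P.
    Invariants maintained: L never exceeds the profit of the best solution
    kept so far, and U never drops below P*; hence stopping once U/L <= 5
    guarantees a profit of at least P*/5, as in the proof of Theorem 3."""
    best = initial_solution
    while U / L > stop_ratio:
        P = choose_trial(L, U)
        P_prime, solution = branching(P)
        if P_prime > 0.8 * P:
            # Lemma 2: the returned solution achieves more than 0.4P,
            # so the lower bound can be raised ("step 3.2" in the paper).
            L = 0.4 * P
            best = solution
        else:
            # Lemma 1: P' <= 0.8P implies P* <= 1.6P,
            # so the upper bound can be lowered ("step 3.1" in the paper).
            U = 1.6 * P
    return best, L, U

def choose_trial(L, U):
    # Assumed rule (not taken from the paper): any P with 2.5*L < P < 0.625*U
    # makes both possible updates shrink [L, U]; such a P exists whenever
    # U/L > 4, which holds throughout the loop since it only runs while U/L > 5.
    return math.sqrt(2.5 * L * 0.625 * U)
```

With the assumed trial rule, both updates strictly shrink $[L, U]$ whenever $U/L > 4$, so the loop terminates; the paper's own rule guarantees a shrink of at least 12.5% per iteration.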
REFERENCES

[1] P. Rost et al., "Mobile network architecture evolution toward 5G," IEEE Communications Magazine, vol. 54, no. 5, pp. 84–91, May 2016.
[2] 3GPP, "Study on New Radio Access Technology: Radio Access Architecture and Interfaces," TR 38.801, Aug. 2016.
[3] A. Checko et al., "Cloud RAN for Mobile Networks: A Technology Overview," IEEE Communications Surveys & Tutorials, vol. 17, no. 1, pp. 405–426, Mar. 2015.
[4] China Mobile, "C-RAN: The Road Towards Green RAN," White Paper, Dec. 2013.
[5] P. Rost et al., "Cloud Technologies for Flexible 5G Radio Access Networks," IEEE Communications Magazine, vol. 52, no. 5, pp. 68–76, May 2014.
[6] D. Wubben et al., "Benefits and Impact of Cloud Computing on 5G Signal Processing: Flexible centralization through cloud-RAN," IEEE Signal Processing Magazine, vol. 31, no. 6, pp. 35–44, Nov. 2014.
[7] M. Madhavan, P. Gupta, and M. Chetlur, "Quantifying Multiplexing Gains in a Wireless Network Cloud," in Proc. of IEEE ICC, Jun. 2012, pp. 3212–3216.
[8] T. L. Nguyen and A. Lebre, "Virtual Machine Boot Time Model," in Proc. of PDP, Mar. 2017, pp. 430–437.
[9] 3GPP, "Policy and charging control architecture," TS 23.203, 2017.
[10] P. Rost, A. Maeder, M. C. Valenti, and S. Talarico, "Computationally Aware Sum-Rate Optimal Scheduling for Centralized Radio Access Networks," in Proc. of IEEE GLOBECOM, Dec. 2015, pp. 1–6.
[11] M. C. Valenti, S. Talarico, and P. Rost, "The role of computational outage in dense cloud-based centralized radio access networks," in Proc. of IEEE GLOBECOM, Dec. 2014, pp. 1466–1472.
[12] N. Nikaein et al., "OpenAirInterface: A Flexible Platform for 5G Research," ACM SIGCOMM Computer Communication Review, vol. 44, no. 5, pp. 33–38, Oct. 2014.
[13] P. Rost, S. Talarico, and M. C. Valenti, "The Complexity–Rate Tradeoff of Centralized Radio Access Networks," IEEE Transactions on Wireless Communications, vol. 14, no. 11, pp. 6164–6176, Nov. 2015.
[14] Y. Li, M. Sheng, X. Wang, Y. Zhang, and J. Wen, "Max–Min Energy-Efficient Power Allocation in Interference-Limited Wireless Networks," IEEE Transactions on Vehicular Technology, vol. 64, no. 9, pp. 4321–4326, Sep. 2015.
[15] Z. Li, S. Guo, D. Zeng, A. Barnawi, and I. Stojmenovic, "Joint Resource Allocation for Max-Min Throughput in Multicell Networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 9, pp. 4546–4559, Nov. 2014.
[16] M. Kalil, A. Shami, and A. Al-Dweik, "QoS-Aware Power-Efficient Scheduler for LTE Uplink," IEEE Transactions on Mobile Computing, vol. 14, no. 8, pp. 1672–1685, Aug. 2015.
[17] M. Andrews, "A Survey of Scheduling Theory in Wireless Data Networks," IMA Volumes in Mathematics and its Applications, vol. 143, pp. 1–17, 2007.
[18] P. Bender et al., "CDMA/HDR: A Bandwidth-Efficient High-Speed Wireless Data Service for Nomadic Users," IEEE Communications Magazine, vol. 38, no. 7, pp. 70–77, Jul. 2000.
[19] T. Bu, E. L. Li, and R. Ramjee, "Generalized Proportional Fair Scheduling in Third Generation Wireless Data Networks," in Proc. of IEEE INFOCOM, Apr. 2006, pp. 1–12.
[20] S. Borst, N. Hegde, and A. Proutiere, "Mobility-Driven Scheduling in Wireless Networks," in Proc. of IEEE INFOCOM, Apr. 2009, pp. 1260–1268.
[21] H. J. Kushner and P. A. Whiting, "Convergence of Proportional-Fair Sharing Algorithms Under General Conditions," IEEE Transactions on Wireless Communications, vol. 3, no. 4, pp. 1250–1259, Jul. 2004.
[22] P. Viswanath, D. N. C. Tse, and R. Laroia, "Opportunistic beamforming using dumb antennas," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1277–1294, Jun. 2002.
[23] R. Kwan, C. Leung, and J. Zhang, "Proportional Fair Multiuser Scheduling in LTE," IEEE Signal Processing Letters, vol. 16, no. 6, pp. 461–464, Jun. 2009.
[24] J. Fan, Q. Yin, G. Y. Li, B. Peng, and X. Zhu, "MCS Selection for Throughput Improvement in Downlink LTE Systems," in Proc. of IEEE ICCCN, Aug. 2011, pp. 1–5.
[25] R. Kwan, C. Leung, and J. Zhang, "Resource Allocation in an LTE Cellular Communication System," in Proc. of IEEE ICC, Jun. 2009, pp. 1–5.
[26] J. Bartelt et al., "Fronthaul and Backhaul Requirements of Flexibly Centralized Radio Access Networks," IEEE Wireless Communications, vol. 22, no. 5, pp. 105–111, Oct. 2015.
[27] S. Bhaumik et al., "CloudIQ: A Framework for Processing Base Stations in a Data Center," in Proc. of ACM MOBICOM, Aug. 2012, pp. 125–136.
[28] H. A. M. Ramli, R. Basukala, K. Sandrasegaran, and R. Patachaianand, "Performance of well known packet scheduling algorithms in the downlink 3GPP LTE system," in Proc. of IEEE MICC, Dec. 2009, pp. 815–820.
[29] S. B. Lee, I. Pefkianakis, A. Meyerson, S. Xu, and S. Lu, "Proportional Fair Frequency-Domain Packet Scheduling for 3GPP LTE Uplink," in Proc. of IEEE INFOCOM, Apr. 2009, pp. 2611–2615.
[30] H. Safa and K. Tohme, "LTE uplink scheduling algorithms: Performance and challenges," in Proc. of ICT, Apr. 2012, pp. 1–6.
[31] A. S. Lioumpas and A. Alexiou, "Uplink scheduling for Machine-to-Machine communications in LTE-based cellular systems," in Proc. of IEEE GLOBECOM Workshops, Dec. 2011, pp. 353–357.
[32] F. Capozzi, G. Piro, L. A. Grieco, G. Boggia, and P. Camarda, "Downlink packet scheduling in LTE cellular networks: Key design issues and a survey," IEEE Communications Surveys & Tutorials, vol. 15, no. 2, pp. 678–700, May 2013.
[33] A. L. Stolyar, "On the Asymptotic Optimality of the Gradient Scheduling Algorithm for Multiuser Throughput Allocation," Operations Research, vol. 53, no. 1, pp. 12–25, Jan. 2005.
[34] J. T. Tsai, "State-Dependent Proportional Fair Scheduling Algorithms for Wireless Forward Link Data Services," in Proc. of IEEE INFOCOM, Apr. 2008, pp. 2414–2422.
[35] A. Konrad, B. Y. Zhao, A. D. Joseph, and R. Ludwig, "A Markov-based channel model algorithm for wireless networks," Wireless Networks, vol. 9, no. 3, pp. 189–199, 2003.
[36] Q. Zhang and S. A. Kassam, "Finite-state Markov model for Rayleigh fading channels," IEEE Transactions on Communications, vol. 47, no. 11, pp. 1688–1692, Nov. 1999.
[37] H. S. Wang and N. Moayeri, "Finite-state Markov channel - a useful model for radio communication channels," IEEE Transactions on Vehicular Technology, vol. 44, no. 1, pp. 163–171, Feb. 1995.
[38] M. Zorzi, R. R. Rao, and L. B. Milstein, "On the accuracy of a first-order Markov model for data transmission on fading channels," in Proc. of IEEE ICUPC, Nov. 1995, pp. 211–215.
[39] J. McDougall and S. Miller, "Sensitivity of wireless network simulations to a two-state Markov model channel approximation," in Proc. of IEEE GLOBECOM, Dec. 2003, pp. 697–701.
[40] G. Gens and E. Levner, "An approximate binary search algorithm for the multiple-choice knapsack problem," Information Processing Letters, vol. 67, no. 5, pp. 261–265, Sep. 1998.
[41] E. Yaacoub and Z. Dawy, "A Game Theoretical Formulation for Proportional Fairness in LTE Uplink Scheduling," in Proc. of IEEE WCNC, Apr. 2009, pp. 1–5.
[42] 3GPP, "Technical Specification Group Radio Access Network; Deployment Aspects," TS 25.943, Mar. 2017.
[43] S. Schwarz, C. Mehlführer, and M. Rupp, "Low complexity approximate maximum throughput scheduling for LTE," in Proc. of Asilomar Conference on Signals, Systems and Computers, Nov. 2010, pp. 1563–1569.
[44] E. Liu and K. K. Leung, "Proportional Fair Scheduling: Analytical Insight under Rayleigh Fading Environment," in Proc. of IEEE WCNC, Mar. 2008, pp. 1883–1888.
[45] T. S. Rappaport, Wireless Communications: Principles and Practice. Prentice Hall, 1996.
[46] E. K. Burke et al., Search Methodologies. Springer, 2005.
Dario Bega is a research assistant with IMDEA Networks Institute in Madrid and a Ph.D. student in Telematic Engineering at University Carlos III of Madrid (UC3M). He received his B.Sc. (2010) and M.Sc. (2013) degrees in Telecommunication Engineering from the University of Pisa, Italy. His research work focuses on wireless networks, multi-tenancy and machine learning approaches for 5G.

Albert Banchs (M'04-SM'12) received his M.Sc. and Ph.D. degrees from the Polytechnic University of Catalonia (UPC-BarcelonaTech) in 1997 and 2002, respectively. He is currently a Full Professor with the University Carlos III of Madrid (UC3M), and has a double affiliation as Deputy Director of the IMDEA Networks Institute. Before joining UC3M, he was at ICSI Berkeley in 1997, at Telefonica I+D in 1998, and at NEC Europe Ltd. from 1998 to 2003. Prof. Banchs is Editor of IEEE Transactions on Wireless Communications and IEEE/ACM Transactions on Networking. His research interests include performance evaluation and algorithm design in wireless and wired networks.

Marco Gramaglia is a post-doc researcher at University Carlos III of Madrid (UC3M), where he received his M.Sc. (2009) and Ph.D. (2012) degrees in Telematics Engineering. He held post-doctoral research positions at Istituto Superiore Mario Boella (Italy), the Institute of Electronics, Computer, and Telecommunications Engineering (CNR-IEIIT) of the National Research Council of Italy (Torino, Italy), and IMDEA Networks (Spain). He was involved in several EU projects and authored more than 30 scientific publications that appeared in international books, conferences and journals.

Xavier Costa-Pérez (M'01) is Head of 5G Networks R&D at NEC Laboratories Europe, where he manages several projects focused on 5G mobile core, backhaul/fronthaul and access networks. His team contributes to NEC projects for product roadmap evolution, to European Commission R&D collaborative projects as well as to open-source projects and related standardization bodies, and has received several R&D awards for successful technology transfers. Dr. Costa-Pérez has served on the program committees of several conferences and holds multiple patents. He received both his M.Sc. and Ph.D. degrees in Telecommunications from the Polytechnic University of Catalonia (UPC-BarcelonaTech) and was the recipient of a national award for his Ph.D. thesis.

Peter Rost (S'06-M'10-SM'15) received his Ph.D. degree from Technische Universität Dresden, Dresden, Germany, in 2009 and his M.Sc. degree from the University of Stuttgart, Stuttgart, Germany, in 2005. Peter has been with the Fraunhofer Institute for Beam and Material Technologies, Dresden, Germany; IBM Deutschland Entwicklung GmbH, Böblingen, Germany; and NEC Laboratories Europe, Heidelberg, Germany. Since May 2015, Peter has been a member of the e2e Architecture research group at Nokia Bell Labs, Munich, Germany, where he is TM of the German BMBF project TACNET 4.0, leads the 5G Industrial IoT Testbed at Hamburg Seaport, and works in business unit projects on 5G architecture. He currently serves as a member of the VDE ITG Expert Committee on Information and System Theory and is an Executive Editor of IEEE Transactions on Wireless Communications.