QoS-Aware 5G Component Selection for Content Delivery in
Multi-access Edge Computing
Erfan Farhangi Maleki, University of Delaware, Newark, DE, USA, erfanf@udel.edu
Weibin Ma, University of Delaware, Newark, DE, USA, weibinma@udel.edu
Lena Mashayekhy, University of Delaware, Newark, DE, USA, mlena@udel.edu
Humberto La Roche, Cisco Systems, USA, hlaroche@cisco.com

ABSTRACT
The demand for content such as multimedia services with stringent latency requirements has grown significantly, causing heavy backhaul congestion in mobile networks. The integration of Multi-access Edge Computing (MEC) and 5G networks is an emerging solution that alleviates backhaul congestion to meet QoS requirements such as ultra-low latency, ultra-high reliability, and continuous connectivity, supporting various latency-critical applications for user equipment (UE). Content caching can markedly improve QoS for UEs by increasing the availability of popular content. However, uncertainty originating from user mobility is the most challenging barrier to deciding content routes for UEs that lead to minimum latency. Considering the 5G-enabled MEC components, it is critical to select the optimal 5G components, representing content routes from Edge Application Servers (EASs) to UEs, that enhance QoS for UEs with uncertain mobility patterns by reducing frequent handover (path reallocation). To this aim, we study component selection for QoS-aware content delivery in 5G-enabled MEC. We first formulate an integer programming (IP) optimization model to obtain the optimal content routing decisions. As this problem is NP-hard, we tackle its intractability by designing an efficient online learning approach, called Q-CSCD, to achieve a bounded performance. Q-CSCD learns the optimal component selection for UEs and autonomously makes decisions to minimize latency for content delivery. We conduct extensive experiments based on a real-world dataset to validate the effectiveness of our proposed algorithm. The results reveal that Q-CSCD leads to low latency and handover ratio in a reasonable time with a reduced regret over time.

CCS CONCEPTS
• Computer systems organization → Cloud computing; • Networks → Mobile networks.

KEYWORDS
Multi-access Edge Computing, 5G, Content Delivery, Mobility, Multi-Armed Bandit.

ACM Reference Format:
Erfan Farhangi Maleki, Weibin Ma, Lena Mashayekhy, and Humberto La Roche. 2021. QoS-Aware 5G Component Selection for Content Delivery in Multi-access Edge Computing. In 2021 IEEE/ACM 14th International Conference on Utility and Cloud Computing (UCC’21), December 6–9, 2021, Leicester, United Kingdom. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3468737.3494101

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
UCC’21, December 6–9, 2021, Leicester, United Kingdom
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8564-0/21/12...$15.00
https://doi.org/10.1145/3468737.3494101

1 INTRODUCTION
Cisco anticipated that approximately 500 billion devices would be connected to the Internet by 2030, generating an enormous amount of data traffic [7]. Content delivery constitutes a large share of this traffic. Mobile video alone is predicted to account for up to 79% of global mobile data traffic by 2022 [8], and most of it requires strict latency bounds for content delivery to enhance users' quality of experience. Many efforts have been made to meet the low-latency requirement of these unprecedented traffic demands. In this regard, the fifth-generation (5G) network is a promising technology suitable for managing massive capacity and connectivity and for supporting different devices and applications such as mobile object recognition, Internet of Things (IoT), data stream processing, augmented reality, natural language processing, and mobile health computing. 5G networks meet diverse quality of service (QoS) requirements such as ultra-low latency, ultra-high reliability, lower energy consumption, and reduced operational costs. However, with the rapid growth of mobile data traffic, 5G networks encounter significant challenges in capacity-limited backhaul links, particularly when considering the extreme densification and diversity of small cells [22]. In consideration of this challenge, Multi-access Edge Computing (MEC) offers cloud advantages such as computing, storage, and caching at the edge of the network close to mobile devices and delivers extremely low latency and high throughput [10, 18], which complements 5G. Therefore, the European 5G Infrastructure Public Private Partnership (5G PPP) research body recognized MEC as one of the main technologies for 5G networks [12].

Content delivery utilizing content caching at MEC is an effective approach to better deal with the rapidly growing traffic over mobile networks. It significantly reduces duplicated content transmissions from backhaul links and thus improves QoS (e.g.,
Figure 1: Framework: 5G component selection for UEs for content delivery. (Components shown, from the UEs upward: gNB RU, fronthaul, gNB DU, midhaul, gNB CU, backhaul, PSA UPF, and EAS within the 5G network. UEs submit requests, and the enterprise checks availability.)
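
To make the selection problem of Figure 1 concrete, the following minimal sketch enumerates candidate routing paths as (gNB, PSA UPF, EAS) triples and picks the one with the lowest delivery latency. It assumes that each hop costs content size divided by link speed and models only three hops (UE to gNB, gNB to PSA UPF, PSA UPF to EAS). The names `RoutingPath`, `delivery_latency`, and `best_path`, as well as all numeric values, are hypothetical illustrations and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RoutingPath:
    """One candidate route: a (gNB, PSA UPF, EAS) triple of indices."""
    gnb: int
    psa_upf: int
    eas: int


def delivery_latency(path, size_mbit, ue_gnb, gnb_upf, upf_eas):
    """Latency in seconds, assuming each hop costs size / speed (speeds in Mbit/s)."""
    return size_mbit * (
        1.0 / ue_gnb[path.gnb]
        + 1.0 / gnb_upf[(path.gnb, path.psa_upf)]
        + 1.0 / upf_eas[(path.psa_upf, path.eas)]
    )


def best_path(size_mbit, ue_gnb, gnb_upf, upf_eas, n_gnb, n_eas):
    """Exhaustively pick the lowest-latency route (only sensible for tiny instances).

    n_eas is used for both PSA UPFs and EASs because they are assumed co-located.
    """
    candidates = [
        RoutingPath(g, i, j)
        for g, i, j in product(range(n_gnb), range(n_eas), range(n_eas))
    ]
    return min(
        candidates,
        key=lambda p: delivery_latency(p, size_mbit, ue_gnb, gnb_upf, upf_eas),
    )


# Hypothetical link speeds (Mbit/s) for 2 gNBs and 2 co-located PSA UPF/EAS pairs.
ue_gnb = {0: 100.0, 1: 400.0}
gnb_upf = {(0, 0): 1000.0, (0, 1): 500.0, (1, 0): 500.0, (1, 1): 1000.0}
upf_eas = {(0, 0): 10000.0, (0, 1): 1000.0, (1, 0): 1000.0, (1, 1): 10000.0}

chosen = best_path(150.0, ue_gnb, gnb_upf, upf_eas, n_gnb=2, n_eas=2)
```

Exhaustive search like this only works when all speeds are known in advance and the instance is small, which is exactly what user mobility and uncertain network conditions rule out in practice.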

reducing latency) for UEs. Furthermore, it reduces backhaul capacity demands by up to 35% [20]. Improving QoS for UEs brings enormous economic benefits for enterprises. Amazon reported that every 100 ms of latency costs 1% in sales. Google announced that an extra 0.5 seconds in search page generation time decreased traffic by 20%. Likewise, if a trading platform is 5 ms behind the competition, a broker could lose $4 million in revenues per millisecond [1]. Despite all these advantages, however, UEs' high mobility causes severe uncertainty in deciding where to place the UEs' most popular content and which delivery routes to use.

The integration of 5G and MEC, called 5G-enabled MEC, includes the 5G components and Edge Application Servers (EASs). The 3GPP 5G RAN architecture, known as NG-RAN, consists of radio base stations called gNBs connected to the 5G core network (5GC) and to each other. The gNB component incorporates three main functional modules: the Radio Unit (RU), the Distributed Unit (DU), and the Centralized Unit (CU), which can be located separately or together in various combinations. Furthermore, the PDU Session Anchor UPF (PSA UPF) plays a pivotal role in building the content routing path between each UE and the content at an EAS [26]. Each gNB connects to a PSA UPF to enable a UE's access to the content cached at an EAS. As shown in Fig. 1, the component selection problem (the UE's routing path to content) in 5G-enabled MEC consists of selecting the best gNB, PSA UPF, and EAS to determine each UE's optimal routing path with minimum latency from a gNB to an EAS with the content. Optimal component selection requires considering the dynamics of UEs' behavior (e.g., mobility). Many existing studies in 4G/5G networks mainly focus on collecting the most popular contents and caching them on the local EAS. However, to the best of our knowledge, the component selection problem has not been studied in 5G-enabled MEC. The high mobility of a UE results in multiple gNB allocations, as the connected gNB may change over time based on the UE's new locations. Therefore, when a UE departs from the coverage of the currently connected gNB, the system needs to determine which EAS and PSA UPF to allocate to the UE to support service continuity while satisfying the latency requirement. Essentially, assigning a new EAS means migrating the content delivery service from the previous EAS to the new EAS, leading to a migration latency. On the other hand, allocating a new PSA UPF leads to relocating the previous PSA UPF to a new PSA UPF, causing a relocation latency.

This paper addresses the problem of component selection for content delivery in a 5G-enabled MEC. We first formulate an optimal integer programming (IP) model for this problem. To tackle the NP-hardness and intractability of the IP model, we then design an online learning-based approach to solve the problem in a reasonable time and minimize the content delivery latency for UEs over their lifetime. Our proposed approach is formulated as a Multi-Armed Bandit (MAB) problem. We introduce a suitable online machine learning solution to learn the optimal routing path for UEs in a time-slotted format. In this regard, at each time slot, the system assigns a proper routing path for delivering requested contents to UEs based on the currently observed information (e.g., the requests, UEs' mobility patterns, mean latency) to reduce the experienced latency.

The rest of the paper is organized as follows. In the next section, we provide an overview of existing studies in this domain. We formulate our problem in Section 3. We then describe our proposed approach, Q-CSCD, in Section 4. We evaluate the performance of our proposed approach by extensive experiments in Section 5. Finally, we summarize our results and present potential directions for future research.

2 RELATED WORK
We summarize the most relevant 5G-based studies in the literature based on their research direction.

Enhancing QoS. A group of studies has focused on enhancing QoS for users as the only objective. Solozabal et al. [23] introduced a Non-Standalone (i.e., disconnected from the Internet) 5G ETSI MEC-based architecture for Mission-Critical Push-to-Talk (MCPTT) services to achieve the delay requirement. Kiani et al. [15] proposed an edge computing-aware NOMA technique that utilizes
uplink NOMA privileges in reducing MEC users' uplink energy consumption. To this aim, the paper formulated a NOMA-based optimization framework to minimize the energy consumption of MEC users based on optimizing user clustering, computing and communication resource allocation, and transmit powers. Kitanov et al. [16] provided a comparison of energy efficiency between cloud computing and fog computing under different modulation schemes.

Multi Objective. A line of research has focused on multi-objective criteria for users. Guo et al. [9] presented cross stratum resources protection (CSRP) in fog-computing-based radio over fiber networks (F-RoFN) for 5G services under software-defined networking control, to overcome the complexity of the interaction between RRH and BBU, the resource scheduling among BBUs in the cloud, and the latency of cloud radio over fiber networks (CRoFN) in the 5G area. Rimal et al. [22] proposed a fiber wireless (FiWi) access architecture to improve MEC services such as traffic and network performance monitoring.

Caching. An efficient way to improve content delivery performance is content caching. In particular, by caching the most popular contents according to UEs' preferences at the EAS, UEs can receive their requested contents from a nearby EAS rather than from the remote cloud, which reduces the perceived latency of UEs. Zhang et al. [27] introduced a mobility-aware cooperative edge caching architecture for content-centric 5G networks that utilizes mobile edge computing resources for intensifying edge caching capability. In addition, smart vehicles act as moving and collaborative caching agents for sharing content cache tasks with base stations. Tang et al. [25] proposed a cooperative caching scheme to extend the virtual cache capacity. It addresses content caching and minimizes delay in user-centric delivery schemes in 5G CDNs under storage capacity and bandwidth constraints. Qiao et al. [21] proposed a caching-based mmWave framework. The video contents of UEs were precached at the base station to reduce the experienced latency while a satisfactory video quality is guaranteed. Markakis et al. [19] proposed a unified communication, computing, and caching (3C) solution for the 5G environment so that content and service providers can deploy their functions, services, and contents close to UEs. Bastug et al. [5] proposed a predictive and proactive caching approach to reduce peak traffic demands, leveraging 5G functions including D2D communication. Hou et al. [11] devised a proactive caching mechanism in which they exploited a transfer learning-based approach to predict the content popularity of UEs. However, this approach requires a time-consuming training phase. An online learning method could be more suitable to adaptively estimate the popularity, especially when the popularity changes over time.

Online Learning. Recently, online learning methods have gained popularity for caching content on EASs. Blasco et al. [6] applied the multi-armed bandit (MAB) approach to learn the content popularity distribution. Liu et al. [17] further improved the performance of the content caching strategy by exploiting the preferences and the spatial locality of UEs when designing the online learning content caching mechanism.

Summary. Table 1 gives a detailed comparison of our work with the existing state-of-the-art studies based on several criteria. An optimal 5G component selection from EASs to UEs significantly impacts users' perceived QoS, which has not been addressed by the existing studies. We tackle this problem by developing an efficient component selection approach for UEs to optimize the content delivery performance. Our work formulates the 5G component selection as a MAB problem. We introduce an online algorithm to learn the content routing path from EASs to UEs over time, considering users' mobility and the dynamics of the 5G network.

Table 1: Comparison with existing research
Criteria: QoS, Latency, Mobility, Multi-User, Real Dataset, Multi Objective, Fog Computing, Online Learning, Edge Computing, Content Delivery, Hybrid Computing
Study                    Criteria marked
Solozabal et al. [23]    ✓ ✓ ✓ ✓
Kiani et al. [15]        ✓ ✓ ✓
Kitanov et al. [16]      ✓ ✓ ✓ ✓ ✓ ✓
Guo et al. [9]           ✓ ✓ ✓ ✓ ✓
Rimal et al. [22]        ✓ ✓ ✓ ✓ ✓
Zhang et al. [27]        ✓ ✓ ✓ ✓ ✓ ✓ ✓
Tang et al. [25]         ✓ ✓ ✓ ✓ ✓
Qiao et al. [21]         ✓ ✓ ✓ ✓
Markakis et al. [19]     ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Bastug et al. [5]        ✓ ✓ ✓ ✓ ✓
Hou et al. [11]          ✓ ✓ ✓ ✓ ✓
Blasco et al. [6]        ✓ ✓ ✓ ✓ ✓
Liu et al. [17]          ✓ ✓ ✓ ✓ ✓ ✓ ✓
Our Study                ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓

3 SYSTEM MODEL
In the system model, we consider that each EAS is co-located with a PSA UPF, and each gNB is connected to a PSA UPF to allow a UE to acquire the content cached at an EAS [26]. In other words, the PSA UPF builds the content routing path between each UE and the content at the EAS. Note that the communication between a UE and a gNB is wireless via a radio access network (RAN). In contrast, the communication between a gNB and a PSA UPF is achieved via a wired network (e.g., a local area network (LAN)).

We assume that multiple gNBs cover a specific area. We denote a set of UEs, a set of gNBs, and a set of EASs by $\mathcal{N} = \{1, \dots, N\}$, $\mathcal{G} = \{1, \dots, G\}$, and $\mathcal{M} = \{1, \dots, M\}$, respectively. Considering that the PSA UPF is co-located with the EAS, PSA UPF $i$ indicates the PSA UPF that is co-located with EAS $i$.

We consider a time-slotted format, where the time horizon is defined as multiple time slots denoted by $\mathcal{T} = \{1, \dots, T\}$. Allocating a new PSA UPF leads to relocating (rerouting) the previous PSA UPF to a new PSA UPF. Therefore, we define the relocation latency of a PSA UPF from EAS $i \in \mathcal{M}$ to EAS $j \in \mathcal{M}$ by $r_{ij}$, and the migration latency of a content from EAS $i$ to EAS $j$ by $s_{ij}$. Each UE $n \in \mathcal{N}$ requests (requires) a content of size $c_n^t$ at time slot $t$. Moreover, there are three types of transmission speeds. We define $d_{ng}^t$ as the wireless transmission speed based on 5G between UE $n \in \mathcal{N}$ and its corresponding gNB $g \in \mathcal{G}$ at time slot $t$. In addition, the speed of transmitting content between gNB $g$ and PSA UPF $i \in \mathcal{M}$ is denoted by $d_{gi}$. Finally, the speed of transmitting
content between PSA UPF $i$ and the targeted EAS $j$ is denoted by $d_{ij}$. To formulate the problem mathematically, we first define the decision variables as follows.

\[ x_{ng}^t = \begin{cases} 1 & \text{if gNB } g \text{ is allocated to UE } n \text{ at time slot } t \\ 0 & \text{otherwise.} \end{cases} \]

\[ y_{ni}^t = \begin{cases} 1 & \text{if PSA UPF } i \text{ is allocated to UE } n \text{ at time slot } t \\ 0 & \text{otherwise.} \end{cases} \]

\[ z_{nj}^t = \begin{cases} 1 & \text{if the targeted content for UE } n \text{ is located at EAS } j \text{ at time slot } t \\ 0 & \text{otherwise.} \end{cases} \]

\[ p_{nij}^t = \begin{cases} 1 & \text{if the allocated PSA UPF for UE } n \text{ is rerouted from EAS } i \text{ to EAS } j \text{ at time slot } t \\ 0 & \text{otherwise.} \end{cases} \]

\[ q_{nij}^t = \begin{cases} 1 & \text{if the targeted content for UE } n \text{ is migrated from EAS } i \text{ to EAS } j \text{ at time slot } t \\ 0 & \text{otherwise.} \end{cases} \]

\[ k_{ngij}^t = \begin{cases} 1 & \text{if gNB } g \text{ and PSA UPF } i \text{ are allocated to UE } n \text{ and the targeted content for UE } n \text{ is located at EAS } j \text{ at time slot } t \\ 0 & \text{otherwise.} \end{cases} \]

Now, the 5G component selection problem in MEC can be formulated as an Integer Program (IP) as follows:

\[ \text{Minimize } \mathcal{D} = \sum_{t \in \mathcal{T}} \sum_{n \in \mathcal{N}} \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}} \left( p_{nij}^t r_{ij} + q_{nij}^t s_{ij} \right) + \sum_{t \in \mathcal{T}} \sum_{n \in \mathcal{N}} \sum_{g \in \mathcal{G}} \sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}} k_{ngij}^t \, c_n^t \left( \frac{1}{d_{ng}^t} + \frac{1}{d_{gd}} + \frac{1}{d_{gc}} + \frac{1}{d_{gi}} + \frac{1}{d_{ij}} \right) \tag{1} \]

Subject to:

\begin{align}
\sum_{g \in \mathcal{G}} x_{ng}^t &= 1 && \forall n \in \mathcal{N},\ t \in \mathcal{T} \tag{2} \\
\sum_{i \in \mathcal{M}} y_{ni}^t &= 1 && \forall n \in \mathcal{N},\ t \in \mathcal{T} \tag{3} \\
\sum_{j \in \mathcal{M}} z_{nj}^t &= 1 && \forall n \in \mathcal{N},\ t \in \mathcal{T} \tag{4} \\
\sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}} p_{nij}^t &\le 1 && \forall n \in \mathcal{N},\ t \in \mathcal{T} \tag{5} \\
\sum_{i \in \mathcal{M}} \sum_{j \in \mathcal{M}} q_{nij}^t &\le 1 && \forall n \in \mathcal{N},\ t \in \mathcal{T} \tag{6} \\
y_{ni}^t + y_{ni'}^{t+1} - 1 &\le p_{nii'}^{t+1} && \forall n \in \mathcal{N},\ i, i' \in \mathcal{M},\ t \in \mathcal{T} \tag{7} \\
z_{nj}^t + z_{nj'}^{t+1} - 1 &\le q_{njj'}^{t+1} && \forall n \in \mathcal{N},\ j, j' \in \mathcal{M},\ t \in \mathcal{T} \tag{8} \\
x_{ng}^t + y_{ni}^t + z_{nj}^t - 2 &\le k_{ngij}^t && \forall n \in \mathcal{N},\ g \in \mathcal{G},\ i, j \in \mathcal{M},\ t \in \mathcal{T} \tag{9} \\
x_{ng}^t &\in \{0, 1\} && \forall n \in \mathcal{N},\ g \in \mathcal{G},\ t \in \mathcal{T} \tag{10} \\
y_{ni}^t &\in \{0, 1\} && \forall n \in \mathcal{N},\ i \in \mathcal{M},\ t \in \mathcal{T} \tag{11} \\
z_{nj}^t &\in \{0, 1\} && \forall n \in \mathcal{N},\ j \in \mathcal{M},\ t \in \mathcal{T} \tag{12} \\
p_{nij}^t &\in \{0, 1\} && \forall n \in \mathcal{N},\ i, j \in \mathcal{M},\ t \in \mathcal{T} \tag{13} \\
q_{nij}^t &\in \{0, 1\} && \forall n \in \mathcal{N},\ i, j \in \mathcal{M},\ t \in \mathcal{T} \tag{14} \\
k_{ngij}^t &\in \{0, 1\} && \forall n \in \mathcal{N},\ i, j \in \mathcal{M},\ g \in \mathcal{G},\ t \in \mathcal{T} \tag{15}
\end{align}

The objective function in Eq. (1) minimizes the total sum of the relocation latency and migration latency along with the content delivery latency over the entire time horizon. The first term calculates the relocation and migration latencies. The second term calculates the content delivery latency when a specific routing path is chosen (indicated by $k_{ngij}^t$). The content delivery latency is the total sum of content delivery latencies over different components of the 5G network. For example, $c_n^t / d_{ng}^t$ denotes the content delivery latency when a requested content is passed through gNB $g$ to UE $n$. Constraints (2) ensure each UE connects to only one gNB for accessing the edge content at any time slot. Constraints (3) guarantee only one PSA UPF is selected for each UE at each time slot. Constraints (4) ensure the targeted content of a UE is located at only one EAS at each time slot. Constraints (5) guarantee the relocation between PSA UPFs for each UE happens at most once between two consecutive time slots. Constraints (6) guarantee migration of the targeted content happens at most once between two consecutive time slots. Constraints (7) set the relocation decision variables according to whether a PSA UPF relocation happens or not. Constraints (8) set the migration decision variables according to whether a content migration happens or not. Constraints (9) set the routing path decision variables according to whether a routing path composed of gNB, PSA UPF, and EAS is selected or not. Constraints (10)-(15) ensure the decision variables are binary.

4 QOS-AWARE 5G COMPONENT SELECTION FOR CONTENT DELIVERY
This section presents a scenario where several UEs submit their requests to the service provider (enterprise) to receive specific contents with minimum latency. In such a scenario, each UE $n$ owns only local information (e.g., its location) and does not have a priori information about other factors on the 5G side, such as the relocation latency between PSA UPFs, the migration latency of content, or the transmission speed of contents.

Online machine learning is a promising technique to deal with dynamics and uncertain information in real time. It is a suitable technique for making decisions when not all information is available initially and data arrives in sequential order. Therefore, online learning can efficiently cope with constant moves and changes in
Figure 2: A general framework for multi-armed bandit learning. Panels: (a) MAB Model; (b) Component Selection.
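
The learning loop of Figure 2 can be sketched for a single UE as follows. This is a minimal sketch, assuming that the selection index is the sample mean latency per unit of content minus $\beta \sqrt{2 \ln t / \chi}$, where $\chi$ is the number of times a route has been selected, and that the learner only ranks routes currently near the UE. The class name `QCSCDLearner`, its methods, and the toy latency values are hypothetical; only the exploration weight of 10^-4 comes from the paper's experimental setup.

```python
import math


class QCSCDLearner:
    """Per-UE learner: UCB1-style index adapted to latency minimization."""

    def __init__(self, beta):
        self.beta = beta   # exploration weight
        self.mean = {}     # route -> sample mean latency per unit of content
        self.count = {}    # route -> number of times the route was selected

    def select(self, nearby_routes, t):
        """Return an untried nearby route if one exists, else the lowest-index route."""
        for route in nearby_routes:
            if route not in self.count:
                return route
        return min(
            nearby_routes,
            key=lambda r: self.mean[r]
            - self.beta * math.sqrt(2.0 * math.log(t) / self.count[r]),
        )

    def update(self, route, latency, content_size):
        """Fold one observed latency, normalized by content size, into the running mean."""
        per_unit = latency / content_size
        n = self.count.get(route, 0)
        self.mean[route] = (n * self.mean.get(route, 0.0) + per_unit) / (n + 1)
        self.count[route] = n + 1


# Hypothetical environment: a fixed latency per unit of content for three routes.
true_latency = {"A": 5.0, "B": 3.0, "C": 4.0}
learner = QCSCDLearner(beta=1e-4)
picks = []
for t in range(1, 21):
    route = learner.select(list(true_latency), t)
    learner.update(route, latency=true_latency[route] * 150.0, content_size=150.0)
    picks.append(route)
```

In this toy run each route is tried once, after which the learner settles on route "B" because the small exploration weight never outweighs the latency gap. With noisy latencies or a larger weight, the less-selected routes would be revisited more often.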

massive datasets, and it can sustain changes for an extended period. Furthermore, it does not require a considerable amount of memory or data storage compared with traditional batch learning techniques. In addition, online learning saves time and is cost-efficient since it only processes a small portion of data instead of training over the entire dataset at a time, which can be computationally infeasible for real-time MEC requirements.

We design a QoS-Aware 5G Component Selection for Content Delivery (Q-CSCD) approach to solve the component selection problem efficiently, achieving a bounded performance. Q-CSCD learns the optimal component selection online for UEs, including the selection of gNB, PSA UPF, and EAS, to minimize latency for content delivery over time. Q-CSCD is an online learning approach formulated as a multi-armed bandit problem [14]. Q-CSCD provides an online learning solution for each UE to learn its optimal routing path, denoted by $\mathcal{Z} : (gNB_g, \text{PSA UPF}_i, EAS_j)$, on the fly and independently from other UEs. Q-CSCD, as an MAB-based approach, is expected to provide a near real-time solution for highly mobile UEs, and it is also lightweight, making it suitable for real-time decision-making in MEC. It includes two fundamental choices each time: 1) Exploration: collecting more information, leading to better choices in the future; 2) Exploitation: choosing the best option given current information. The core idea of our proposed approach is that Q-CSCD chooses an unselected routing path for every UE at each time slot and observes its average latency per unit of received content. During time slots with no new unselected routing path, UEs select the routing path that has the highest preference over others thus far. Q-CSCD also lowers the complexity of learning by considering only routes that could deliver the UE's requested content and are in close proximity. This approach applies a trade-off between exploration and exploitation, where Q-CSCD explores different routing paths for each UE to learn the range of possible new latencies. At the same time, it gives a chance to exploit a priori known optimal routing paths.

Several algorithms have been studied under the MAB framework. Q-CSCD is adapted from the UCB1 strategy, which sets a balance between exploration and exploitation [3, 4]. We present extensive comparisons of Q-CSCD with two other notable MAB strategies, Epsilon Greedy and Thompson Sampling, in Section 5. UCB1 allocates a counter for each route to record the number of times that route has been selected. It decides the priority of every route based on its obtained mean latency and the number of times that route has been selected. However, due to the online learning characteristics of Q-CSCD, it is inevitable to choose suboptimal routing paths and to have higher radio handover and content migrations between EASs, especially at the beginning of the learning process.

Fig. 2 further depicts a general framework of our learning-based approach for the problem of 5G component selection for UEs in 5G-enabled MEC. Fig. 2a shows that each UE independently learns the optimal component selection using the MAB model based on the history of past observations. Once a new latency is observed (for the allocated content routing path), the history of information is updated. Then, the content provider is informed about the updated policy (or updated 5G component selection). Fig. 2b shows that each UE $n$ selects a single routing path at each time $t$ based on the information available so far.

4.1 Q-CSCD Algorithm
The pseudo-code of our proposed approach is presented in Algorithm 1. For every UE $n$, the current location $l_n^t$, the routing paths in close proximity of the UE $\mathcal{N}(l_n^t)$, the amount of received content $c_n^t$, and the value of $\beta$ that specifies the weight of exploration are available at each time slot (Line 1). The history of the selected routing paths for every UE $n$ is denoted by $\mathcal{H}_n$, and it is initialized to an empty set (Line 2). Q-CSCD first checks the existence of an unselected routing path $\mathcal{Z}_m \in \mathcal{N}(l_n^t)$ (Line 4). If such a route exists, Q-CSCD selects it once (Line 5) and appends it to the history set (Line 6). It observes the perceived latency by UE $n$, when it is
assigned to routing path $\mathcal{Z}_m$ at time slot $t$ (Line 7). Then, it updates the sample mean latency per unit of the received content by routing path $\mathcal{Z}_m$ until time slot $t$, denoted by $\bar{\mathcal{D}}_{n,m,t}$, to compare the average experienced latency per unit of content by each routing path (Line 8). The notation $\chi_{n,m,t}$ denotes the number of times that the newly selected routing path $\mathcal{Z}_m$ is selected until time slot $t$. It is initially set to 1 for every newly selected routing path and is used in the calculation of the average latency per unit of content by that routing path (Line 9). If no such routing path exists, Q-CSCD selects the routing path $\pi_n^t$ among the already explored routing paths that is optimal up to this point. The parameter $\beta$ determines a balance between exploitation and exploration and specifies the value given to less selected routing paths (Line 11). In other words, the first term in the formula gives a higher value to the already best routing paths with minimum average latency, which is equivalent to exploitation. However, as the routing path $\mathcal{Z}_m$ is selected less often, especially at larger values of time slot $t$, the second term gets a higher value, causing the less observed routing paths to be explored. The value of $\beta$ further controls the weight of the first term (exploitation) versus the second term (exploration). Q-CSCD then observes the new latency $\mathcal{D}_{n,\pi_n^t,t}$ (Line 12), updates $\bar{\mathcal{D}}_{n,\pi_n^t,t}$ by calculating the mean latency over the past and newly observed latencies, and then increments $\chi_{n,\pi_n^t,t}$ (Lines 13-14).

Algorithm 1 QoS-Aware 5G Component Selection for Content Delivery (Q-CSCD)
1: Input: $l_n^t$, $\mathcal{N}(l_n^t)$, $c_n^t$, $\beta$ at the beginning of each time slot for UE $n$, $\forall t \in \mathcal{T}$
2: $\mathcal{H}_n = \emptyset$
3: for all $t \in \mathcal{T}$ do
4:   if $\exists\, \mathcal{Z}_m : (gNB_g, \text{PSA UPF}_i, EAS_j) \in \mathcal{N}(l_n^t)$ such that $\mathcal{Z}_m \notin \mathcal{H}_n$ then
5:     Select $\mathcal{Z}_m$ once
6:     $\mathcal{H}_n = \mathcal{H}_n \cup \{\mathcal{Z}_m\}$
7:     Observe $\mathcal{D}_{n,m,t}$
8:     Update $\bar{\mathcal{D}}_{n,m,t} = \mathcal{D}_{n,m,t} / c_n^t$
9:     $\chi_{n,m,t} = 1$
10:  else
11:    Select $\pi_n^t = \arg\min_{m \in \mathcal{H}_n} \left( \bar{\mathcal{D}}_{n,m,t} - \beta \sqrt{\frac{2 \ln t}{\chi_{n,m,t}}} \right)$
12:    Observe $\mathcal{D}_{n,\pi_n^t,t}$
13:    Update $\bar{\mathcal{D}}_{n,\pi_n^t,t} = \frac{\chi_{n,\pi_n^t,t} \times \bar{\mathcal{D}}_{n,\pi_n^t,t} + \mathcal{D}_{n,\pi_n^t,t} / c_n^t}{\chi_{n,\pi_n^t,t} + 1}$
14:    $\chi_{n,\pi_n^t,t} = \chi_{n,\pi_n^t,t} + 1$
15:  end if
16: end for

4.2 Q-CSCD Regret Analysis
We measure the performance loss of every UE due to learning by applying the concept of learning regret. The learning regret is the combination of sampling regret and handover regret. The regret for UE $n$ until $T$ is denoted by $R_{n,T}$ and is defined as follows:

\[ R_{n,T} = \underbrace{\mathbb{E}\left[ \sum_{t=1}^{T} \left( \mathcal{D}_{\pi_n^t,t} - \mathcal{D}_{\pi_n^*,t} \right) \right]}_{\text{sampling regret}} + \underbrace{\mathbb{E}\left[ h_{n,\pi_n} \right]}_{\text{handover regret}} \]

where $\mathcal{D}_{\pi_n^t,t}$ and $\mathcal{D}_{\pi_n^*,t}$ represent the latency for UE $n$ by routing path $\pi_n^t$ and the optimal routing path $\pi_n^*$, respectively. Furthermore, $\mathbb{E}[h_{n,\pi_n}]$ indicates the expected relocation regret for UE $n$. We provide an upper bound on the learning regret of Q-CSCD in the following theorem.

Theorem 1. The upper bound of learning regret for UE $n$ is as follows:

\[ R_{n,T} \le \left[ 8 \sum_{m \ne \pi_n^*} \frac{\ln T}{\Delta_{n,m}} + \left( 1 + \frac{\pi^2}{3} \right) \sum_{m \ne \pi_n^*} \Delta_{n,m} \right] + \bar{C} \left[ 2 \sum_{m \ne \pi_n^*} \left( \frac{8 \ln T}{(\Delta_{n,m})^2} + 1 + \frac{\pi^2}{3} \right) + 1 \right], \]

where $\bar{C}$ represents the maximum handover latency that can be experienced, and $\Delta_{n,m}$ is defined as $\mathcal{D}_{n,m} - \mathcal{D}_n^*$ such that $\mathcal{D}_{n,m}$ is the expected latency when routing path $\mathcal{Z}_m$ is selected by UE $n$, and $\mathcal{D}_n^*$ is the minimum latency when an optimal routing path is selected by UE $n$. The proof is provided in the appendix.

5 EXPERIMENTAL RESULTS
In this section, we conduct and analyze a set of experiments that are designed to evaluate the effectiveness of our proposed approach, Q-CSCD.

5.1 Experimental Setup
We compare the performance of Q-CSCD with the following approaches:

• IP: We implement our IP model using the IBM ILOG Concert Technology API for C++ [13]. We use this IP, presented in equations (1)-(15), as a benchmark to compare our results with the optimal solutions when possible.
• Epsilon Greedy ($\epsilon$-Greedy): This approach is a multi-armed bandit strategy that balances exploration and exploitation by choosing between them randomly. The value of $\epsilon$ is a threshold that determines the probability of choosing to explore or to exploit the already best routing path.
• Thompson Sampling: This approach is a multi-armed bandit strategy that uses a Beta distribution to prioritize the choice between different routing paths. The shape of the Beta distribution is controlled by two positive shape parameters, denoted by $\alpha$ and $\beta$, such that if the selected routing path leads to less experienced latency than the current average

Table 2: Experiment Scenarios
Exp.   # UEs   # gNB   # PSA UPF   # EAS
1      10      2       2           2
2      12      4       2           2
3      50      7       2           2
4      100     8       3           3
5      150     8       4           4
6      200     10      5           5
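
As a quick illustration of how the scenarios in Table 2 translate into the size of the learning problem, the sketch below counts the (gNB, PSA UPF, EAS) triples available in each experiment. It assumes every combination is a valid routing path, which gives an upper bound only, since Q-CSCD restricts itself to routes near the UE. The `Scenario` class and `routing_paths` method are hypothetical names; the numbers are the rows of Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One row of Table 2."""
    ues: int
    gnbs: int
    psa_upfs: int
    eass: int

    def routing_paths(self):
        """Upper bound on the (gNB, PSA UPF, EAS) triples a single UE can choose from."""
        return self.gnbs * self.psa_upfs * self.eass


SCENARIOS = {
    1: Scenario(10, 2, 2, 2),
    2: Scenario(12, 4, 2, 2),
    3: Scenario(50, 7, 2, 2),
    4: Scenario(100, 8, 3, 3),
    5: Scenario(150, 8, 4, 4),
    6: Scenario(200, 10, 5, 5),
}

# Number of candidate arms per UE in each experiment.
arms = {exp: s.routing_paths() for exp, s in SCENARIOS.items()}
```

Under this assumption the number of arms per UE grows from 8 in Exp. 1 to 250 in Exp. 6.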
Figure 3: Performance Analysis - Latency Components. Panels: (a) Latency, (b) Handover Time, (c) Content Delivery Time; each panel plots the average value in milliseconds for Exp. 1 to Exp. 6 under IP, Q-CSCD, Epsilon Greedy, and Thompson Sampling. (*IP was unable to determine any solution for Exp. 3-6 in feasible time, and thus, there are no bars in the plots for those cases.)
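
The two bandit baselines compared against Q-CSCD in Figure 3 can be sketched as follows. This is a minimal sketch under stated assumptions: the Beta parameters start at 1 (a uniform prior), Thompson Sampling picks the route with the largest sampled value, and a "success" means the observed latency per unit of content is below the UE's current average. The function and class names are hypothetical and do not reflect the authors' C++ implementation.

```python
import random


def epsilon_greedy_select(mean_latency, epsilon, rng):
    """Explore a random route with probability epsilon, else exploit the lowest mean."""
    routes = list(mean_latency)
    if rng.random() < epsilon:
        return rng.choice(routes)
    return min(routes, key=mean_latency.get)


class ThompsonRoute:
    """Beta(alpha, beta) belief that a route beats the UE's current average latency."""

    def __init__(self):
        self.alpha = 1  # assumed prior: one pseudo-success
        self.beta = 1   # assumed prior: one pseudo-failure

    def sample(self, rng):
        """Draw a plausible success probability from the current belief."""
        return rng.betavariate(self.alpha, self.beta)

    def update(self, latency_per_unit, current_average):
        if latency_per_unit < current_average:
            self.alpha += 1  # better than average: route becomes more likely to be picked
        else:
            self.beta += 1   # not better: route becomes less likely to be picked


def thompson_select(beliefs, rng):
    """Pick the route whose sampled success probability is largest."""
    return max(beliefs, key=lambda r: beliefs[r].sample(rng))


rng = random.Random(0)
beliefs = {"A": ThompsonRoute(), "B": ThompsonRoute()}
beliefs["B"].update(latency_per_unit=2.0, current_average=3.0)
beliefs["A"].update(latency_per_unit=4.0, current_average=3.0)
thompson_pick = thompson_select(beliefs, rng)
greedy_pick = epsilon_greedy_select({"A": 5.0, "B": 3.0}, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the first function never explores and always returns the route with the lowest mean, while Thompson Sampling stays randomized and only shifts probability toward routes that have recently beaten the average.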

latency per unit of content for a UE, $\alpha$ is incremented by 1; otherwise, $\beta$ is incremented by 1. Incrementing $\alpha$ increases the route's likelihood of selection in the next time slot. In contrast, incrementing $\beta$ reduces its likelihood of selection in the next time slot.

The algorithms are implemented in C++, and the experiments are conducted on a desktop PC with a 2.80 GHz 11th Gen Intel(R) Core(TM) i7-1165G7 processor and 16 GB RAM.

For our experiments, we consider a 5G-enabled MEC for content delivery in an environment where UEs are scattered uniformly, changing locations between time slots with various rates drawn from the distribution U(1, 10) to model different mobility patterns. We assume all components of 5G networks, including gNBs, PSA UPFs, and EASs, are randomly scattered over the geographical area according to a uniform distribution. In this network, the sizes of the contents required by the UEs at each time slot are randomly drawn from U(100, 200) Mbits.

ThousandEyes [2], a network intelligence company acquired by Cisco, uses an active monitoring technique to collect network metrics such as loss, latency, jitter, and comprehensive path metrics with detailed layer-3 hops. We use latency between hops obtained by ThousandEyes to model transmission speeds between different links in a 5G network. In general, 5G links are decomposed to

transmission speed for connecting UEs with gNB RUs at each time slot. Finally, we set the relocation latency between PSA UPFs and the migration latency of content between EASs both to 10 ms.

We consider six scenarios for the experiments. Table 2 summarizes these scenarios. The values of $\beta$ for our approach and $\epsilon$ for Epsilon Greedy are set to $10^{-4}$ and 0.3, respectively. We perform a sensitivity analysis on the value of $\beta$ in the range from $10^{-5}$ to $10^{-2}$ to find the $\beta$ value that minimizes latency (see Fig. 5b).

5.2 Comparative Analysis
We use several metrics such as latency, handover time, content delivery time, handover ratio, regret, and runtime to compare our proposed approach Q-CSCD with IP, Epsilon Greedy, and Thompson Sampling. Due to the NP-hardness of our problem, solving the IP to get optimal results is intractable. We set a 60-minute time limit for the solver to return the IP results. However, the solver was unable to obtain optimal solutions within 60 minutes for most scenarios. It should be noted that we set the number of time slots $T$ to 3000 for Fig. 3a-3c and to 1000 for Fig. 4a-4b.

5.2.1 Comparative Analysis on Latency. Our optimization criterion is to minimize the experienced latency time of UEs, composed
Fronthaul, Midhaul, and Backhaul. Fronthaul represents the con- of content delivery time and handover time (relocation time and
nectivity between gNB RUs and gNB DUs; Midhaul connectivity migration time). The average latency time of UEs per time slot
describes the communication link between gNB DUs and gNB CUs, is presented in Fig. 3a, which is measured in milliseconds. The
and Backhaul comes into play to connect gNB CUs with PSA UPFs, results show that IP was unable to find a solution for Exp. 3-6
and EASs. To model transmission speeds in these links, we first due to the intractability of our NP-hard problem. Q-CSCD obtains
set 3 hops between source and target nodes; therefore, there are close to optimal results in Exp. 1-2 and much lower latency time
five nodes in the path from the source node to the target node. We in all experiments than Epsilon Greedy and Thompson Sampling
model the source node as gNB RU, second node as gNB DU, third approaches. Epsilon Greedy utilizes a simple technique to trade-
node as gNB CU, fourth node as PSA UPF, and fifth node as EAS to off between exploration and exploitation randomly. At the same
represent the 5G framework shown in Fig. 1. time, Q-CSCD deliberately established this tradeoff by consider-
Considering the latency and data transmission size between ing a repetition of the least observed routing paths and uses a
nodes, we calculate the respective data transmission speeds in more in-depth formula to minimize the latency. On the other hand,
links and scale them up to represent speeds in a 5G network. We Thompson sampling immediately increments parameters in each
use these paths as a sample and then obtain the minimum and repetition only by comparing the currently experienced latency
maximum speeds in each hop and utilize a uniform distribution to with the up-to-now average latency. This will determine the shape
simulate more paths for our experiments whenever it is necessary. of the distribution immediately. Then, those that can be better in
Similarly, measuring distance, we consider 1 to 10000 Mbps wireless the future but obtained larger beta parameters in the initial time
slots are less likely to be selected in the future, which results in worse latency.

Figure 4: Performance Analysis (*IP was unable to determine any solution for Exp. 4-6 in feasible time for Fig. 4a). Panels: (a) Handover ratio over Exp. 1-6, (b) Runtime in milliseconds (logarithmic scale) over Exp. 1-6, (c) Regret for T = 100 to T = 500. Each panel compares IP, Q-CSCD, Epsilon Greedy, and Thompson Sampling.

5.2.2 Comparative Analysis on Handover Time. Fig. 3b shows the average handover time of UEs per time slot. Q-CSCD has comparable results to IP in Exp. 1-2 (close to zero) and significantly less handover time than the Epsilon Greedy and Thompson Sampling approaches in all experiments. This is due to choosing the best 𝛽 value for Q-CSCD, which leads to an appropriate balance between exploration and exploitation, resulting in minimum handover time and latency. Epsilon Greedy incurs a higher handover time due to its random nature determined by the value of epsilon. Since Thompson Sampling does not get adequate samples for the average latency and instantly starts incrementing parameters, it requires more handovers whenever a better beta distribution value is obtained.

5.2.3 Comparative Analysis on Content Delivery Time. We compare the content delivery time of the UEs per time slot obtained by each approach. The results are presented in Fig. 3c. UEs with Q-CSCD experience close to optimal results in Exp. 1-2 and better content delivery time than with the other approaches. Because Q-CSCD strikes a more reasoned balance between choosing the best routing paths found so far and exploring less selected routing paths, both Epsilon Greedy and Thompson Sampling result in worse content delivery time, as expected.

5.2.4 Comparative Analysis on Handover ratio. We present the handover ratio (the percentage of handover for each UE at each time slot) obtained by each approach in Fig. 4a. IP can only solve Exp. 1-3 (with 𝑇 = 1000) and cannot obtain a solution in a feasible time for the other experiments. As expected, IP has a lower handover ratio even though minimization of the handover ratio is not our objective. However, we believe approaches with lower handover ratios are expected to have a better total latency. As the size of the experiments increases, Q-CSCD leads to a higher handover ratio because there are more routing paths to choose from. Epsilon Greedy and Thompson Sampling have much higher handover ratios in all experiments.

5.2.5 Comparative Analysis on Runtime. Fig. 4b shows the runtime of the different approaches per UE on a logarithmic scale. IP is the worst, as expected, and uses the maximum feasible time to obtain a solution. Thompson Sampling is second-worst because it has to calculate a large number of beta distributions, which is time-consuming. The runtimes of Q-CSCD and Epsilon Greedy are very close. Q-CSCD is faster than Epsilon Greedy for small-case scenarios because fewer routing paths exist to learn from. However, as the size of the experiments increases, Epsilon Greedy is faster because Q-CSCD has to learn among more routing paths.

5.2.6 Comparative Analysis on Regret. Regret is the deviation percentage of the obtained result from the optimal result. The regret for IP is 0 for all time slots. We choose Exp. 1 to assess the regret of our approach since the optimal results obtained by the IP are available for all time slots. As the number of time slots increases, Q-CSCD leads to less regret because it has more samples to learn from. This is because Q-CSCD can learn and obtain closer-to-optimal paths, even though a higher number of handovers and far-from-optimal solutions are inevitable in the beginning. However, this pattern is not linear in Epsilon Greedy and Thompson Sampling due to their randomness and their poorly informed use of past observations.

5.3 Sensitivity Analysis
This section presents a sensitivity analysis concerning the 𝛽 value and the number of time slots 𝑇 to show their impact on the results. In each analysis, we fix the other parameters to study sensitivity over a single parameter.

5.3.1 Sensitivity Analysis on Beta. In Q-CSCD, 𝛽 is a parameter that controls the weight of exploration versus exploitation. We consider Exp. 6 and 𝑇 = 3000. We conduct two sensitivity analyses on 𝛽 to study its impact on handover ratio and latency. We present the results in Figs. 5a and 5b, respectively. Note that only Q-CSCD is sensitive to the value of 𝛽, and we present the results of the other approaches only for reference. The results show that as 𝛽 increases, the handover ratio increases as well. This is because a higher value of 𝛽 leads to a higher chance of exploration, which results in more handovers. On the other hand, the best 𝛽 for latency is the one that sets the best tradeoff between exploration and exploitation; in our case, 𝛽 = 0.0001 obtains the best latency compared to the other values.

5.3.2 Sensitivity Analysis on Time Slot. We consider Exp. 3 and perform a sensitivity analysis of the number of time slots (𝑇) on handover ratio and runtime, and we present the results in Figs. 5c and 5d, respectively. As the number of iterations (time slots) increases, our
approach can make better decisions and learn which routing path has the best latency, thus leading to a lower handover ratio over time. Our approach obtains results close to those of IP for the first two time-slot settings, for which IP results are available. As expected, Epsilon Greedy and Thompson Sampling obtain the worst results as the duration of the experiments increases. For the runtime, as the number of time slots increases, the runtime increases with the value of 𝑇 for all approaches except IP, whose runtime is capped at 60 minutes and which uses this entire time limit due to its intractability.

Figure 5: Sensitivity Analysis. Panels: (a) 𝛽 on Handover ratio, (b) 𝛽 on Latency (average, in milliseconds), (c) Time Slot on Handover ratio, (d) Time Slot on Runtime (milliseconds). Panels (a) and (b) use Beta = 0.00001, 0.0001, 0.001, and 0.01; panels (c) and (d) use T = 500, 1000, 2000, and 3000. Each panel compares IP, Q-CSCD, Epsilon Greedy, and Thompson Sampling.

To sum up, the results show that our proposed approach, Q-CSCD, efficiently and quickly finds the best routing path along with the representative 5G components for UEs, with low latency, handover time, and content delivery time. It also obtains a lower handover ratio and less regret than the Epsilon Greedy and Thompson Sampling approaches.

6 CONCLUSION AND FUTURE WORK
5G-enabled Multi-access Edge Computing alleviates limited backhaul capacity and enhances QoS for UEs. This paper tackled the 5G component selection problem for content delivery, which is a critical problem in 5G-enabled MEC. We proposed a multi-armed bandit-based approach, called Q-CSCD, to learn the optimal 5G components, leading to minimum experienced latency for UEs. Q-CSCD learns the optimal routing paths for UEs over time. The results indicate that Q-CSCD is highly scalable, achieves low latency and handover ratio compared with other strategies, and significantly reduces regret over time. For future work, we plan to enable and incentivize cooperation among UEs to share their content with nearby UEs, and therefore, reduce the routing path length to enhance QoS. We further plan to conduct more intensive experiments to compare Q-CSCD with different learning-based and prediction-based approaches.

7 ACKNOWLEDGMENTS
We gratefully acknowledge the support of Cisco grant CG#1935382.

REFERENCES
[1] 2009. Latency Is Everywhere And It Costs You Sales - How To Crush It. Available: http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it.
[2] 2021. Thousandeyes. Available: https://www.thousandeyes.com/.
[3] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2 (2002), 235–256.
[4] Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. 2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32, 1 (2002), 48–77.
[5] Ejder Bastug, Mehdi Bennis, and Mérouane Debbah. 2014. Living on the edge: The role of proactive caching in 5G wireless networks. IEEE Communications Magazine 52, 8 (2014), 82–89.
[6] Pol Blasco and Deniz Gündüz. 2014. Learning-based optimization of cache content in a small cell base station. In Proc. of the IEEE International Conference on Communications (ICC). 1897–1903.
[7] Cisco. 2016. Internet of Things. Available: https://www.cisco.com/c/en/us/products/collateral/se/internet-of-things/at-a-glance-c45-731471.pdf.
[8] GMDT Forecast. 2019. Cisco visual networking index: global mobile data traffic forecast update, 2017–2022. Update 2017 (2019), 2022.
[9] Shaoyong Guo, Sujie Shao, Yao Wang, and Hui Yang. 2017. Cross stratum resources protection in fog-computing-based radio over fiber networks for 5G services. Optical Fiber Technology 37 (2017), 61–68.
[10] Najmul Hassan, Kok-Lim Alvin Yau, and Celimuge Wu. 2019. Edge computing in 5G: A review. IEEE Access 7 (2019), 127276–127289.
[11] Tingting Hou, Gang Feng, Shuang Qin, and Wei Jiang. 2018. Proactive content caching by exploiting transfer learning for mobile edge computing. International Journal of Communication Systems 31, 11 (2018), e3706.
[12] Yun Chao Hu, Milan Patel, Dario Sabella, Nurit Sprecher, and Valerie Young. 2015. Mobile edge computing: A key technology towards 5G. ETSI white paper 11, 11
(2015), 1–16.
[13] IBM. 2009. Concert Technology version 12.1 C++ API Reference Manual. Available: ftp://public.dhe.ibm.com/software/websphere/ilog/docs/optimization/cplex/refcppcplex.pdf. Accessed: 2019-05-25.
[14] Michael N Katehakis and Arthur F Veinott Jr. 1987. The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research 12, 2 (1987), 262–268.
[15] Abbas Kiani and Nirwan Ansari. 2018. Edge computing aware NOMA for 5G networks. IEEE Internet of Things Journal 5, 2 (2018), 1299–1306.
[16] Stojan Kitanov and Toni Janevski. 2017. Energy efficiency of Fog Computing and Networking services in 5G networks. In Proc. of the IEEE EUROCON 17th International Conference on Smart Technologies. 491–494.
[17] Dong Liu and Chenyang Yang. 2017. Optimizing caching policy at base stations by exploiting user preference and spatial locality. arXiv preprint arXiv:1710.09983 (2017).
[18] Erfan Farhangi Maleki, Lena Mashayekhy, and Seyed Morteza Nabavinejad. 2021. Mobility-Aware Computation Offloading in Edge Computing using Machine Learning. IEEE Transactions on Mobile Computing (2021).
[19] Evangelos K Markakis, Kimon Karras, Anargyros Sideris, George Alexiou, and Evangelos Pallis. 2017. Computing, caching, and communication at the edge: The cornerstone for building a versatile 5G ecosystem. IEEE Communications Magazine 55, 11 (2017), 152–157.
[20] Milan Patel, Brian Naughton, Caroline Chan, Nurit Sprecher, Sadayuki Abeta, Adrian Neal, et al. 2014. Mobile-edge computing introductory technical white paper. White paper, mobile-edge computing (MEC) industry initiative 29 (2014), 854–864.
[21] Jian Qiao, Yejun He, and Xuemin Sherman Shen. 2016. Proactive caching for mobile video streaming in millimeter wave 5G networks. IEEE Transactions on Wireless Communications 15, 10 (2016), 7187–7198.
[22] Bhaskar Prasad Rimal, Dung Pham Van, and Martin Maier. 2017. Mobile edge computing empowered fiber-wireless access networks in the 5G era. IEEE Communications Magazine 55, 2 (2017), 192–200.
[23] Ruben Solozabal, Aitor Sanchoyerto, Eneko Atxutegi, Bego Blanco, Jose Oscar Fajardo, and Fidel Liberal. 2018. Exploitation of mobile edge computing in 5G distributed mission-critical push-to-talk service deployment. IEEE Access 6 (2018), 37665–37675.
[24] Yuxuan Sun, Sheng Zhou, and Jie Xu. 2017. EMM: Energy-aware mobility management for mobile edge computing in ultra dense networks. IEEE Journal on Selected Areas in Communications 35, 11 (2017), 2637–2646.
[25] Shiyu Tang, Ali Alnoman, Alagan Anpalagan, and Isaac Woungang. 2018. A user-centric cooperative edge caching scheme for minimizing delay in 5G content delivery networks. Transactions on Emerging Telecommunications Technologies 29, 8 (2018), e3461.
[26] Takahito Yoshizawa, Sheeba Backia Mary Baskaran, and Andreas Kunz. 2019. Overview of 5G URLLC system and security aspects in 3GPP. In 2019 IEEE Conference on Standards for Communications and Networking (CSCN). IEEE, 1–5.
[27] Ke Zhang, Supeng Leng, Yejun He, Sabita Maharjan, and Yan Zhang. 2018. Cooperative content caching in 5G networks with mobile edge computing. IEEE Wireless Communications 25, 3 (2018), 80–87.

8 APPENDIX
Proof of Theorem 1. The learning regret for every UE is the sum of the sampling regret and the handover regret. Here, we calculate the upper bound of the regret for each part independently. The proof uses an idea similar to that of [3] and [24], where $\chi_{m,T}$ is the number of times path $Z_m$ is selected by time $T$, $\ell$ is a positive integer that denotes the number of times routing path $Z_m$ is selected until time $|Z|$, and $c_{t,a} = \sqrt{2 \ln t / a}$. Since we analyze regret for one UE, for simplicity, we eliminate $n$.

The upper bound for $\chi_{m,T}$ is calculated as follows:

\[
\begin{aligned}
\chi_{m,T} &\le \ell + \sum_{t=|Z|+1}^{T} I\{\pi^t = m\} \\
&\le \ell + \sum_{t=|Z|+1}^{T} I\{\pi^t = m,\ \chi_{m,t-1} \ge \ell\} \\
&\le \ell + \sum_{t=|Z|+1}^{T} I\Bigl\{\max_{0<a<t} \bigl(\bar{D}_{\pi^*,a} - c_{t-1,a}\bigr) \ge \min_{\ell<a_m<t} \bigl(\bar{D}_{m,a_m} - c_{t-1,a_m}\bigr)\Bigr\} \\
&\le \ell + \sum_{t=1}^{\infty} \sum_{a=1}^{t-1} \sum_{a_m=\ell}^{t-1} I\{\bar{D}_{\pi^*,a} - c_{t,a} \ge \bar{D}_{m,a_m} - c_{t,a_m}\}
\end{aligned}
\tag{16}
\]

where $I\{\bar{D}_{\pi^*,a} - c_{t,a} \ge \bar{D}_{m,a_m} - c_{t,a_m}\}$ implies at least one of the following equations:

\[ \bar{D}_{\pi^*,a} \ge D^* + c_{t,a} \tag{17} \]

\[ \bar{D}_{m,a_m} \le D_m - c_{t,a_m} \tag{18} \]

\[ D^* \ge D_m - 2c_{t,a_m} \tag{19} \]

By the Chernoff-Hoeffding bound, the probabilities that equations (17) and (18) hold are bounded as follows:

\[ P\{\bar{D}_{\pi^*,a} \ge D^* + c_{t,a}\} \le e^{-4 \ln t} = t^{-4} \tag{20} \]

\[ P\{\bar{D}_{m,a_m} \le D_m - c_{t,a_m}\} \le e^{-4 \ln t} = t^{-4} \tag{21} \]

When $\ell \ge \left\lceil 8 \frac{\ln T}{(\Delta_m)^2} \right\rceil$, equation (19) is false because:

\[ D^* - D_m + 2c_{t,a_m} = D^* - D_m + 2\sqrt{\frac{2 \ln t}{a_m}} \le D^* - D_m + \Delta_m = 0 \]

Therefore, we have:

\[
\begin{aligned}
\mathbb{E}[\chi_{m,T}] &\le \left\lceil 8 \frac{\ln T}{(\Delta_m)^2} \right\rceil + \sum_{t=1}^{\infty} \sum_{a=1}^{t-1} \sum_{a_m=\left\lceil 8 \frac{\ln T}{(\Delta_m)^2} \right\rceil}^{t-1} \bigl(P\{\bar{D}_{\pi^*,a} \ge D^* + c_{t,a}\} + P\{\bar{D}_{m,a_m} \le D_m - c_{t,a_m}\}\bigr) \\
&\le \left\lceil 8 \frac{\ln T}{(\Delta_m)^2} \right\rceil + \sum_{t=1}^{\infty} \sum_{a=1}^{t-1} \sum_{a_m=\left\lceil 8 \frac{\ln T}{(\Delta_m)^2} \right\rceil}^{t-1} 2t^{-4} \\
&\le 8 \frac{\ln T}{(\Delta_m)^2} + 1 + \frac{\pi^2}{3}
\end{aligned}
\tag{22}
\]

As a result, the sample regret is as follows:

\[
\mathbb{E}\Bigl[\sum_{t=1}^{T} \bigl(D_{\pi^t,t} - D_{\pi^*,t}\bigr)\Bigr] = \sum_{m \ne \pi^*} \Delta_m \mathbb{E}[\chi_{m,T}] \le 8 \sum_{m \ne \pi^*} \frac{\ln T}{\Delta_m} + \Bigl(1 + \frac{\pi^2}{3}\Bigr) \sum_{m \ne \pi^*} \Delta_m
\tag{23}
\]

We now find an upper bound for the handover regret as follows:

\[
\mathbb{E}[h_{n,\pi_n}] \propto \mathbb{E}\Bigl[\sum_{t=2}^{T} I\{\pi^t \ne \pi^{t-1}\}\Bigr] = \sum_{m} \mathbb{E}\Bigl[\sum_{t=2}^{T} I\{\pi^t = m,\ \pi^{t-1} \ne m\}\Bigr]
\]

Let $A_m = \sum_{t=2}^{T} I\{\pi^t = m,\ \pi^{t-1} \ne m\}$ count the number of handovers to routing path $Z_m$ from other routing paths. Then:

\[
\begin{aligned}
RR &\propto \Bigl(\sum_{m \ne \pi^*} \mathbb{E}[A_m] + \mathbb{E}[A_{\pi^*}]\Bigr) \le \Bigl(2 \sum_{m \ne \pi^*} \mathbb{E}[A_m] + 1\Bigr) \\
&\le \Bigl(2 \sum_{m \ne \pi^*} \mathbb{E}[\chi_{m,T}] + 1\Bigr) \le \Bigl(2 \sum_{m \ne \pi^*} \Bigl[8 \frac{\ln T}{(\Delta_m)^2} + 1 + \frac{\pi^2}{3}\Bigr] + 1\Bigr)
\end{aligned}
\]

Therefore, the aggregation of the sample regret and the handover regret proves Theorem 1, which shows that the obtained regret (deviation from optimal result) is bounded.
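To give a feel for how the bounds in equations (22) and (23) behave, the following minimal sketch evaluates them numerically. It is an illustration only, not part of the proof or of the C++ implementation used in the experiments. The function names and the latency gaps in `gaps` are hypothetical values chosen for the example, and the handover expression is the count bound from the last inequality of the proof, that is, the handover regret only up to its proportionality constant.

```python
import math


def selection_bound(delta_m: float, horizon: int) -> float:
    """Bound on E[chi_{m,T}] from Eq. (22): 8 ln T / delta_m^2 + 1 + pi^2 / 3."""
    if delta_m <= 0:
        raise ValueError("the gap of a suboptimal path must be positive")
    return 8.0 * math.log(horizon) / delta_m ** 2 + 1.0 + math.pi ** 2 / 3.0


def sample_regret_bound(gaps, horizon: int) -> float:
    """Bound on the sample regret from Eq. (23): sum of delta_m times the selection bound."""
    return sum(d * selection_bound(d, horizon) for d in gaps)


def handover_count_bound(gaps, horizon: int) -> float:
    """Bound on the expected number of handovers: 2 * sum of selection bounds + 1."""
    return 2.0 * sum(selection_bound(d, horizon) for d in gaps) + 1.0


# Hypothetical latency gaps (delta_m) of three suboptimal routing paths.
gaps = [0.5, 1.0, 2.0]

for horizon in (100, 1000, 3000):
    regret = sample_regret_bound(gaps, horizon)
    handovers = handover_count_bound(gaps, horizon)
    # The per-slot value shrinks as T grows because the bound is logarithmic in T.
    print(f"T={horizon}: sample regret <= {regret:.1f}, "
          f"handovers <= {handovers:.1f}, "
          f"per-slot regret <= {regret / horizon:.4f}")
```

Because both bounds grow with ln 𝑇, the per-slot values printed by the loop decrease as 𝑇 increases, which matches the statement that the regret is bounded relative to the horizon.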
