
Journal of Systems Architecture

Edge Intelligence in Motion: Mobility-Aware Dynamic DNN Inference Service Migration with Downtime in Mobile Edge Computing
--Manuscript Draft--

Manuscript Number: JSA-D-22-00243


Article Type: VSI: ECforIoV [MGE - Shaohua Wan]
Keywords: mobile edge computing; DNN service migration; multi-exit DNN; model predictive control

Abstract: Edge intelligence (EI) becomes a trend to push the deep learning frontiers to the network edge. In this paper, we consider a user-centric management for DNN inference service migration and exit point selection, aiming at maximizing the overall user utility (e.g., DNN model inference accuracy) under various service downtime. We first leverage dynamic programming to propose an optimal offline migration and exit point selection strategy (OMEPS) algorithm when complete future information of user behaviors is available. Amenable to a more practical application domain without complete future information, we incorporate the OMEPS algorithm into a model predictive control (MPC) framework, and then construct a mobility-aware service migration and DNN exit point selection (MOMEPS) algorithm, which improves the long-term service utility within limited predictive future information. However, the heavy computation overheads of the MOMEPS algorithm impose burdens on mobile devices, so we further advocate a cost-efficient algorithm, named smart-MOMEPS, which introduces a smart migration judgement based on Neural Networks to control the implementation of the MOMEPS algorithm by wisely estimating whether the DNN service should be migrated or not. Extensive trace-driven simulation results demonstrate the superior performance of our algorithms for achieving significant overall utility improvements with low computation overheads compared with other online algorithms.


Edge Intelligence in Motion: Mobility-Aware Dynamic DNN Inference Service Migration with Downtime in Mobile Edge Computing
Pu Wang1,2, Tao Ouyang1, Guocheng Liao3, Jie Gong1, Shuai Yu1, Xu Chen1
School of Computer Science and Engineering1, School of Biomedical Engineering2, Sun Yat-sen University,
Guangzhou, China
School of Software Engineering3, Sun Yat-sen University, Zhuhai, China

Abstract—Edge intelligence (EI) becomes a trend to push the deep learning frontiers to the network edge, so that deep neural network (DNN) applications can be well leveraged at resource-constrained mobile devices with the benefits of edge computing. Due to the high user mobility among scattered edge servers in many scenarios, such as internet of vehicles applications, dynamic service migration is desired to maintain a reliable and efficient quality of service (QoS). However, the inevitable service downtime incurred by service migration would largely degrade the real-time performance of delay-sensitive DNN inference services. Fortunately, based on the characteristics of container-based DNN services, exit point selection and the layer sharing feature of the container technique can alleviate such performance degradation. Thus, we consider a user-centric management for DNN inference service migration and exit point selection, aiming at maximizing the overall user utility (e.g., DNN model inference accuracy) under various service downtime. We first leverage dynamic programming to propose an optimal offline migration and exit point selection strategy (OMEPS) algorithm when complete future information of user behaviors is available. Amenable to a more practical application domain without complete future information, we incorporate the OMEPS algorithm into a model predictive control (MPC) framework, and then construct a mobility-aware service migration and DNN exit point selection (MOMEPS) algorithm, which improves the long-term service utility within limited predictive future information. However, the heavy computation overheads of the MOMEPS algorithm impose burdens on mobile devices, so we further advocate a cost-efficient algorithm, named smart-MOMEPS, which introduces a smart migration judgement based on Neural Networks to control the implementation of the MOMEPS algorithm by wisely estimating whether the DNN service should be migrated or not. Extensive trace-driven simulation results demonstrate the superior performance of our smart-MOMEPS algorithm for achieving significant overall utility improvements with low computation overheads compared with other online algorithms.

Index Terms—DNN service migration, multi-exit DNN, service downtime, mobile edge computing, model predictive control.

I. INTRODUCTION

With the explosive development of Internet of Things (IoT) devices, an increasing number of Deep Neural Networks (DNNs)-driven IoT applications, including face recognition, autonomous driving and augmented reality, are emerging. Generally, such applications require massive computing resources to guarantee low latency and high inference accuracy, which are often unavailable due to mobile devices' constrained computing capability and energy. For example, real-time video analytics applications require processing tens of frames per second by computation-intensive DNN models (e.g., YOLO, ResNet), which are unsuitable for resource-constrained mobile devices [1]. To tackle this challenge, mobile edge computing (MEC) is proposed to push these services from the local IoT devices to the network edges, which are servers in proximity to mobile devices (e.g., at base stations or WiFi hotspots). Mobile devices can collaborate with network edges to achieve a higher quality of service and a low latency via task offloading [2].

However, maintaining a reliable and efficient service performance of DNN applications at the network edge is nontrivial due to the mobility of users. To reduce the network latency, DNN services are generally deployed at a nearby edge server, where users can access their services through wireless communications to a local base station (BS), such as a WiFi hotspot. When a user moves away from the service scope of the nearby edge server, the connection will switch to a new BS for remote access to the service, and the QoS will inevitably degrade due to the longer transmission delay caused by increasing network distance. Although methods such as service replica [3] can alleviate the service performance degradation caused by user mobility via backing up services on different edge servers, they bring issues such as user privacy and storage redundancy. Therefore, we focus on another widely adopted approach, dynamic service migration [4]–[7], to solve this problem.

Designing a suitable dynamic service migration scheme faces a few challenges. Specifically, frequent service migration would cause extensive service downtime [8], and thus is unacceptable for latency-sensitive DNN services, especially in internet of vehicles applications where the users have high mobility. Besides, blindly allowing the service to follow user mobility (such as Ping-pong loops, i.e., the user goes back and forth between two BSs' coverages) incurs unnecessary migrations and heavy overheads. Therefore, it is critical to dynamically migrate a DNN service in a moderated way to follow the user mobility.

Existing work on edge service migration has regarded the container as a promising way for service provision [9]–[12]. Compared to the traditional virtual machine (VM), the container is a more lightweight virtualization technique and takes up less storage space than VMs.
Nevertheless, as DNN applications are generally built on machine learning frameworks such as TensorFlow and PyTorch, the container image sizes of these applications can reach hundreds or thousands of megabytes. Service downtime caused by migration comes primarily from fetching the image from the remote registry, which is relatively long for latency-sensitive applications. Therefore, frequent migration would lead to significant traffic overhead and a terrible user experience. In our work, we leverage the layer sharing feature of the container to reduce service downtime. A container image packages the necessary files, such as runtime tools, system libraries, and data files, in different independent layers, which can be shared by different images [13]. For example, the official container image of TensorFlow on Docker Hub is 400 MB [14], which takes 640 seconds to migrate over a 5 Mbps wide area network (WAN). When a TensorFlow-based DNN service is migrated to an edge node where the TensorFlow image already exists, the service can share this base image at the target edge node without downloading the full service image. In this case, only a part of the service data, such as state-dependent files, the user's private data or personalized model parameters, needs to be migrated, and hence the service downtime can be significantly reduced.
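To make the saving from layer sharing concrete, the following minimal Python sketch estimates the image-fetch downtime with and without a shared base layer. The layer names and sizes are hypothetical illustrations, not measurements from this paper.

# Back-of-the-envelope estimate of migration downtime under layer sharing.
# All layer names and sizes below are hypothetical illustrations.

def migration_downtime_s(image_layers_mb, shared_layers, bandwidth_mbps):
    """Seconds to fetch only the layers absent at the target edge node."""
    missing_mb = sum(size for name, size in image_layers_mb.items()
                     if name not in shared_layers)
    return missing_mb * 8 / bandwidth_mbps  # MB -> Mbit, then divide by Mbps

# Hypothetical TensorFlow-based service image: a 400 MB base plus small
# service-specific layers (state files, personalized model parameters).
image = {"tensorflow_base": 400, "service_state": 15, "model_params": 25}

print(migration_downtime_s(image, shared_layers=set(), bandwidth_mbps=5))
# 704.0 s: the full image must be fetched over the 5 Mbps WAN.
print(migration_downtime_s(image, {"tensorflow_base"}, bandwidth_mbps=5))
# 64.0 s: the target already holds the base image, so downtime drops ~11x.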
In this work, we focus on the dynamic service migration problem in a distributed and time-varying MEC scenario to optimize the long-term user utility, e.g., the inference accuracy of the DNN model. We assume that a request will bring full utility if it completes before the deadline; otherwise, the utility will be zero. As mentioned above, frequent migration will cause numerous service interruptions, and during the migration, all the requests will miss the deadline and bring no utility. Since the increase in network distance is the main cause of migration, in order to reduce the frequency of migration, we introduce an early exit mechanism [15] to acquire less inference latency to counteract the increased transmission delay. To combat the service outage, we could run the DNN service on a mobile device with a proper exit point during migration. However, exiting inference at an early DNN layer results in a loss of accuracy. As the network distance continues to increase, the benefits of early exit will gradually decrease, as the inference needs to exit at an earlier exit point.

[Fig. 1. Dynamic DNN service migration in mobile edge computing. A DNN service container image (layers 1 to n) is fetched from a container image registry to edge servers EDGE1–EDGE3 at BS1–BS3 covering Regions 1–3; a moving vehicle running a multi-exit DNN repeatedly decides whether to migrate the service and whether to exit inference early.]

As illustrated in Fig. 1, a mobile vehicle travels around three regions, each containing a BS endowed with an edge server. For better QoS, the vehicle needs to constantly decide whether to migrate the service or choose early exit points while driving. When the vehicle is going back and forth between regions 2 and 3, it is wiser to exit DNN inference early than to keep the service following the vehicle's mobility, considering the service downtime. In addition to the user trajectory, service request frequency is another important factor in migration decisions. The vehicle will generate a higher frequency of service requests in region 2 because of the more complicated road condition. In this case, migrating the service to BS2 in advance may yield higher utility. Therefore, future information of user activities could help us make a better decision on service migration and exit point selection. Fortunately, a variety of studies have been devoted to predicting user behavior, which can be utilized to predict future information for decision-making.

In this paper, we consider a time-slotted system and formulate the long-term optimization problem. We propose an online algorithm based on the Model Predictive Control (MPC) [16] framework to make adaptive DNN service migration and exit point selection decisions. Then we devise a migration judgement approach based on Neural Networks to make an efficient policy for higher user utility and lower computation overhead. The main contributions of this paper are listed as below:

1) We propose a general framework of mobility-aware dynamic service migration and DNN model exit point selection in the MEC scenario. Such a framework enables efficient DNN service that maximizes the long-term user utility.

2) We formulate a long-term user utility maximization problem considering downtime in MEC. We solve the problem in an offline setting, where the information of the user's activities is known, via a polynomial-time dynamic programming algorithm. In an online setting where the information is unknown, we leverage an MPC framework to construct an online proactive service migration and DNN exit point selection algorithm named MOMEPS to solve this long-term utility problem. Furthermore, to cope with the heavy computation overhead of MPC, we devise a smart migration judgement approach based on Neural Networks, called smart-MOMEPS, which efficiently navigates the performance and computation overhead trade-off.

3) We conduct extensive experiments to evaluate our online algorithm based on real-world data traces. We demonstrate the effectiveness of our approach by comparisons with benchmark algorithms. The trace-driven simulation results show that our approach can significantly improve the overall utility and reduce the computation overhead.
The rest of this paper is organized as follows. Section II briefly reviews related works of service migration. In Section III, we describe the system model and problem formulation. In Section IV, we propose an optimal offline service migration and exit point selection algorithm to maximize the overall user utility. We explain the MPC-based online algorithm and furthermore propose a more lightweight algorithm in Section V. In Section VI, we evaluate our online algorithm to demonstrate its effectiveness. Finally, we conclude this paper in Section VII.

II. RELATED WORK

With the rapid development of MEC, which allows low transmission delay and fast response speed, significant challenges are gradually emerging. The inherent characteristics of MEC services, which are distributed in different geographic regions, and the mobility of users bring dominating difficulties. Hence, the edge service migration solution, which can help to maintain QoS in a dynamic MEC environment, has become one of the significant research topics.

Service Migration without Future Prediction: Plenty of literature has focused on service migration in MEC to cope with the key challenge of user dynamics. A branch of such works accommodates arbitrary user mobilities without prediction. Zhao et al. tackled a virtual machine migration strategy based on multiple attribute decision-making, aiming at minimizing a comprehensive cost given the current network situation and user location [17]. In [18], Ouyang et al. researched an adaptive service placement mechanism at MEC and formulated it as a contextual multi-armed bandit learning problem to optimize the user's perceived latency and service migration cost. Besides, many studies utilized Markov decision process (MDP) based methods to solve dynamic service placement under the assumption that user mobility follows or can be approximated by a Markov chain mobility model. Specifically, [19] worked on balancing a trade-off between migration cost and quality by modeling the service migration procedure using an MDP. In [20], the optimal policy of edge service migration formulated as an MDP was proved to be a threshold policy when user mobility follows a one-dimensional (1-D) asymmetric random walk mobility model.

Service Migration with Future Prediction: Accordingly, user activity prediction in the service migration scenario is also widely studied to improve user QoS. Ma et al. incorporated a two-timescale Lyapunov optimization method and limited user mobility prediction to find the optimal service placement decisions [21]. Zhang et al. in [22] overcame the challenges of an underlying dynamic rendering-module placement problem by leveraging model predictive control (MPC) to tackle user trajectory prediction at the edge. Both [23] and [24] took advantage of MPC to work out the dynamic placement of virtual network functions (VNF) and achieved efficient resource scheduling. However, service downtime, which is widespread at the edge and has great influence on migration policies, was not considered in these works. In this paper, we take service downtime fully into account and propose an MPC-based algorithm, which integrates NN-based migration judgement for mitigating the heavy computation cost of MPC.

Container-based Service Migration: In addition, container-based service migration is specifically considered in our work, which is more lightweight and saves storage space compared to the traditional VM. To reduce the migration overhead, [11] designed an efficient live migration system which ensures the integrity of components by leveraging the layered structure of containers. In [9], the authors proposed an edge computing platform architecture which supports seamless Docker container migration of offloading services while also keeping the moving mobile user with its nearest edge server. [12] presented Voyager, a just-in-time live container migration service which combines service memory state migration with local filesystem migration to minimize service downtime. However, these works mainly focus on leveraging container features to reduce service downtime, while our work incorporates the impact of various service downtime due to the layer sharing feature of containers into migration and improves the long-term user utility by optimizing migration decisions.

DNN Inference Service Migration: Since DNN has been extensively applied for intelligent applications at the edge, ongoing service downtime becomes extremely unaffordable for latency-sensitive DNN inference services. To figure out the above difficulty, [25] developed a mobility-included DNN partition offloading algorithm to adapt to user movement. Wang et al. adopted DNN inference exit at earlier layers during service outage to shorten the inference delay by sacrificing an acceptable level of accuracy [26]. Different from [25] and [26], we solve DNN inference service outage collaboratively via the early exit mechanism and dynamic service migration to maintain the user QoS.

III. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we present the system model and problem formulation for DNN service migration with downtime and exit point selection in mobile edge computing.

A. Overview of Dynamic Migration for DNN Service

Fig. 1 illustrates a typical edge intelligence scenario, i.e., edge-assisted on-device object detection for video streaming analytics. More explicitly, a smart vehicle traveling around an urban city uses a DNN service to process surrounding information gathered by its camera. Based on existing virtualization technologies, the service profile and environment for a multi-exit DNN model (e.g., AlexNet) can run on a dedicated container. Thus, the real-time video stream can be collaboratively processed at both the local vehicle and a nearby edge server via task offloading [1]. To guarantee a reliable QoS for a moving vehicle, dynamic DNN service migration and exit point selection are adopted to accommodate the high dynamics within the required completion time.

With the presented edge-assisted architecture, we consider a set of base stations (BSs) B = {1, ..., B}, each of which is equipped with an edge server to provide the DNN service to the user, and the user accesses the service through the nearest base station. In line with the recent work [27] on edge computing, we adopt a discrete time-slotted model to fully characterize the system dynamics (e.g., user mobility and request frequency).
TABLE I
SUMMARY OF NOTATIONS

B         The set of all the base stations
M         All early exit points of the DNN model
x_i(t)    Whether the user decides to offload the service to BS i at time t
y_j(t)    Whether j is the selected early exit point at time t
T_max     The maximum tolerated delay of each request
r_t       The request rate at time t
d^f_j     Inference delay of the j-th exit point on the edge server
d^e_γ     Inference delay of the fixed γ-th exit point on the local device
l^f(t)    Total delay of a request while offloading the service at t
l^c(t)    Communication delay while offloading the service at t
l^e(t)    Total delay of a request while executing locally at t
s_i       Time cost for BS i to fetch container images
λ(t)      Downtime caused by migrating the service at t
S(t)      Whether the user is in the offloading state at t
U_o(t)    User utility while offloading at t
U_m(t)    User utility while migrating at t

We denote the time horizon by T = {0, ..., T}; each time slot matches the time scale at which service migration and exit point selection decisions are updated. For ease of exposition, Table I summarizes the introduced notations.

B. Dynamic Service Migration

Due to the uncertain user behavior, such as user mobility and request frequency, the user needs to dynamically migrate the service among these BSs to acquire higher QoS. Here we use x_i(t) ∈ {0, 1} to indicate the decision-making for service migration at time slot t. x_i(t) = 1 (i ∈ B) indicates that the user decides to offload computation to the i-th BS at t. Note that the user can only offload to one BS at each time slot. Thus, such a constraint on the service migration decision-making can be expressed as:

    Σ_{i∈B} x_i(t) ≤ 1,  ∀t ∈ T,    (1)
    x_i(t) ∈ {0, 1},  ∀t ∈ T, ∀i ∈ B.    (2)

Considering the delay-sensitive nature of DNN services, we capture the large service downtime, which is caused by fetching the container image from the remote image registry during the migration procedure, when optimizing the long-term service performance. Once the user determines a migration, the DNN service can only be executed locally on the user's device, and successive migration decisions should be consistent with the original one until the whole migration procedure terminates. Due to the different container layers deployed on edge servers, different amounts of data should be downloaded from the central image registry to fetch the necessary container layers at the target BS, which leads to various service downtime based on the current migration decision. In this regard, we denote the time cost for fetching the container image at BS i by s_i, and use λ(t) to denote the service downtime at time t. Then, the constraints of DNN service execution should be satisfied as

    λ(t) = Σ_{i∈B} s_i · 1{x_i(t) > x_i(t−1)},  ∀t ∈ T,    (3)
    x_i(τ) ≤ 1 − 1{x_i(t) > x_i(t−1)},  t + 1 ≤ τ ≤ t + λ(t) − 1,  ∀t ∈ T, ∀i ∈ B,    (4)

where 1{·} is the indicator function. Constraint (3) indicates the current service downtime for migrating the service to BS i when x_i(t) > x_i(t−1) (i.e., x_i(t) = 1 and x_i(t−1) = 0). Constraint (4) denotes the local execution during the migration procedure, i.e., the user cannot make a migration decision from t + 1 to t + λ(t) − 1 after deciding to migrate to BS i at time t, where the service is being migrated from t to t + λ(t) − 1.

To clearly describe the current state of the user, we use S(t) to indicate whether the user is migrating or offloading. S(t) = 1 represents that the user is offloading computation to a BS at t. Accordingly, S(t) = 0 when the user is migrating its service (executing requests locally). Obviously, there is only one state for a user at a given time slot. The expressions are listed below:

    S(t) = Σ_{i∈B} x_i(t) − 1{x_i(t) > x_i(t−1)},  ∀t ∈ T,    (5)
    S(t) ∈ {0, 1},  ∀t ∈ T.    (6)

C. DNN Early Exit Point Selection

To improve the service performance during the task offloading procedure, we incorporate the early exit point selection of the DNN model to accommodate the dynamic user behavior and edge environment. Without loss of generality, we consider a DNN model with a set of early exit points, denoted as M = {1, ..., M}. The exit point selection would directly affect the service performance (i.e., delay and accuracy). For example, an earlier exit of DNN inference causes less computation delay with a lower accuracy [28]. Therefore, the user can choose to exit DNN inference early on the BS to avoid performance degradation due to service downtime. Let y_j(t) denote whether j ∈ M is the selected early exit point at t. y_j(t) = 1 means that the DNN model will exit the inference at the j-th exit point at t. Like service migration, the user can only make the exit point selection decision while offloading. In addition, only one exit point can be selected at each time slot. We thus have the following constraints:

    y_j(t) ∈ {0, 1},  ∀t ∈ T, ∀j ∈ M,    (7)
    Σ_{j∈M} y_j(t) = Σ_{i∈B} x_i(t),  ∀t ∈ T.    (8)
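The per-slot bookkeeping implied by constraints (3)–(6) can be sketched in a few lines of Python. This is our illustration rather than the authors' code, and it assumes the hosting-decision sequence already respects constraint (4):

# Derive the downtime slots and the state indicator S(t) from the chosen
# BS per slot and the per-BS image fetch cost s_i (in slots).

def offload_states(chosen_bs, fetch_cost):
    """chosen_bs[t]: BS hosting the service at slot t; returns S(t)."""
    S, busy_until = [], 0
    for t in range(1, len(chosen_bs)):
        migrating = chosen_bs[t] != chosen_bs[t - 1]   # x_i(t) > x_i(t-1)
        if migrating:
            lam = fetch_cost[chosen_bs[t]]             # lambda(t) = s_i, eq. (3)
            busy_until = t + lam - 1                   # local execution span, eq. (4)
        S.append(0 if (migrating or t <= busy_until) else 1)
    return S

# Hypothetical example: 3 BSs with fetch costs of 2, 3 and 1 slots.
print(offload_states([0, 0, 1, 1, 1, 1, 2], {0: 2, 1: 3, 2: 1}))
# [1, 0, 0, 0, 1, 0]: slots 2-4 are downtime for the move to BS 1.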
D. User Request Latency

We assume that a request will bring the full user utility if it is completed before the deadline; otherwise, the user utility will be zero. Therefore, to maximize the overall user utility, the latency of each request should meet its deadline [26]. We define T_max as the maximum tolerated delay of each request. The user request latency differs between the offloading state and the migration state.

Latency of offloading: When the user offloads its request to an edge server, the request delay l^f(t) is jointly determined by the DNN inference delay and the communication delay. We set the delay for the edge server to perform model inference at the j-th early exit point as d^f_j. l^c(t) denotes the communication delay at t. Apparently, l^c(t) includes the delay for transmitting the request from the user's device to the BS that the user is connected to at time t, and the forwarding delay, which majorly depends on the hop distance along the shortest communication path among these BSs. The constraints of l^f(t) can be expressed as:

    l^f(t) = S(t)(l^c(t) + d^f_j),
    l^f(t) ≤ T_max,  ∀t ∈ T.    (9)

Latency of migration: During migration, DNN inference will be processed at the local device with a fixed early exit point γ ∈ M. Hence the user request latency l^e(t) is mainly determined by the computation delay. We set the DNN inference delay while running at the local device as d^e_γ, and its constraints are shown as follows:

    l^e(t) = (1 − S(t)) d^e_γ,
    l^e(t) ≤ T_max,  ∀t ∈ T.    (10)

E. Problem Formulation

After introducing the user's decision variables and request latency, we are ready to present the user utility maximization problem in a given time horizon. Here we define the utility as the inference accuracy of the DNN model. The accumulated user utility contains the utility in migrating and offloading. We define the utility at the j-th early exit point as u_j. The user request rate at time t is r_t. When offloading, the user utility U_o(t) at time t is the sum of the utility of all requests at that time, which can be expressed as:

    U_o(t) = S(t) r_t u_j,  ∀t ∈ T.

During migration, we set the utility at the fixed exit point γ as u_γ, and the migration user utility U_m(t) at time t can be expressed as:

    U_m(t) = (1 − S(t)) r_t u_γ,  ∀t ∈ T.

We formulate an accumulated user utility maximization problem in a given finite time horizon T as follows:

    max_{x_i(t), y_j(t)}  Σ_{t=1}^{T} (U_o(t) + U_m(t))    (11)
    s.t. (1) − (10).

Solving the problem in (11) has the following two challenges. Firstly, as the objective is the accumulative utility over a time horizon, it is difficult to gain the complete future information (e.g., user trajectory and request frequency). Secondly, the user's decisions are coupled across different time slots. That is, the decision at the current slot would affect decisions in the future. To solve these challenges, we begin with an ideal case where future information is available, and develop an optimal offline solution. This helps provide some insights to solve the problem. Next, we focus on the challenging case without future information. We leverage advanced machine learning methods to predict limited information, and devise an online algorithm.
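As a worked illustration of (9)–(11), the following Python sketch computes the utility collected in one slot. The exit profiles loosely echo Table II in Section VI; the remaining numbers are hypothetical, and this is our illustration rather than the authors' code.

def slot_utility(S_t, r_t, j, l_c, u, d_f, gamma, d_e, T_max):
    """Utility collected in one slot; zero if the deadline is missed."""
    if S_t == 1:                      # offloading: eq. (9)
        latency, value = l_c + d_f[j], u[j]
    else:                             # migrating, local execution: eq. (10)
        latency, value = d_e[gamma], u[gamma]
    return r_t * value if latency <= T_max else 0.0

# Hypothetical two-exit profile: the early exit is faster but less accurate.
u, d_f, d_e = [0.70, 0.78], [9.4, 30.2], [40.0, 95.0]
print(slot_utility(1, 5, 1, 60.0, u, d_f, 0, d_e, T_max=100.0))  # 3.9
print(slot_utility(1, 5, 1, 80.0, u, d_f, 0, d_e, T_max=100.0))  # 0.0 -> exit earlier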
IV. OFFLINE OPTIMAL SERVICE MIGRATION AND EXIT POINT SELECTION ALGORITHM

In this section, we present the offline optimal solution of the service migration and DNN exit point selection problem through dynamic programming, when the complete future information of the user activities is assumed to be exactly known.

First, we show that this offline optimal problem possesses the property of optimal substructure. We define (i, q, n) as the state of the user, in which i ∈ T represents time, q ∈ B represents the BS that hosts the user service, and n ∈ M represents the selected DNN model exit point. Accordingly, (i−, q−, n−) is the previous state of (i, q, n). We set p ∈ B and m ∈ M as the user migration decision and exit point selection, respectively. Let C((i, q, n), p, m) represent the number of time slots needed to complete p and m. It can be expressed as

    C((i, q, n), p, m) = 1,       if p = q,
                         s_p,     if p ≠ q and s_p + i ≤ T,
                         T − i,   if p ≠ q and s_p + i > T.    (12)

Here, p = q indicates that the user decides to re-select the model exit point in the current state, and it costs one time slot to complete. When the user chooses to migrate the service (p ≠ q), the decision execution time is the service downtime (s_p), and the portion beyond T is discarded.

Let U((i, q, n), p, m) represent the sum of the utilities during the completion of decisions p and m. Note that DNN inference will be processed at the local device with a fixed early exit point γ during the service downtime (p ≠ q). f(i, q, n) represents the accumulated utility at state (i, q, n). Then, given (i, q, n) and the user's decisions p and m, the state is transited to (i+, q+, n+), where

    i+ = i + C((i, q, n), p, m),    (13)
    q+ = p,    (14)
    n+ = m.    (15)

And the user obtains the following utility:

    U((i, q, n), p, m) = u_m r_i,                        if p = q,
                         u_γ r_i C((i, q, n), p, m),     otherwise.    (16)

Thus the accumulated utility at (i+, q+, n+) is

    f(i+, q+, n+) = f(i, q, n) + U((i, q, n), p, m).    (17)

Let f*(i, q, n) denote the optimal accumulated utility at (i, q, n), and then we have

    f*(i+, q+, n+) = max_{p∈B, m∈M} { U((i, q, n), p, m) + f*(i, q, n) }    (18)
    s.t. (12) − (17).

Since the problem can be regarded as a Bellman equation, a dynamic programming approach can be adopted to derive an offline optimal solution of our service migration and DNN exit point selection problem.

For ease of exposition, we transform the offline optimal problem into a longest-path problem by constructing a directed acyclic graph (DAG).

[Fig. 2. Longest-path problem transformation of the offline service migration and DNN exit point selection problem over T = 5 time slots with three BSs B1, B2 and B3.]

As shown in Fig. 2, we construct a graph G = (V, E) to represent all possible service migration and DNN exit point selection decisions within T time slots. Each vertex presents a state (i, q, n) that the user can reach. Since the future information (user trajectory and request frequency) is known, the exit point n can be determined when i and q are given at each state. Note that the source vertex S represents the initial state (we set it as (0, 1, n)). Each state (except the initial state) is transited from the previous state by performing the corresponding decision. The destination vertex E is an auxiliary vertex to ensure that a single longest path can be found. Each edge weight on the DAG between two states represents the sum of the request utilities of executing the decisions, and the edges connecting to E have zero weight. It is worth noting that, if the user decision can be completed before T, we can draw a directed edge between two states. However, if the decision completes at a time beyond T, for example, when a user in state B1 at T4 performs the decision of transferring to state B2, we draw a directed edge from B1 at T4 to the corresponding yellow auxiliary vertex B2. Accordingly, the weight of that edge represents the sum of the utilities that the user can obtain from T4 to the end. The weight of the edge connecting each vertex at time T to the yellow auxiliary vertices is zero. We have now completed the construction of the DAG.

We can derive the user's optimal strategy by finding the longest path from S to E. Specifically, given all the information of user activities over T time slots, the weights of all edges can be calculated, and the total weight of a path from the source vertex S to the destination vertex E can hence present the whole utility over the time horizon. Consequently, the optimal service migration and DNN model exit point selection strategy can be found by taking the longest path from S to E. As shown in Fig. 2, we give the longest path for T = 5 with 3 BSs. Each red vertex represents the state of the user at the corresponding time slot, and the vertex pointed to by the solid black edge is the user state after performing the decision. Obviously, since this longest-path problem has an optimal substructure property, it can be solved by the classical dynamic programming approach.

Algorithm 1 Offline Optimal Migration and Exit Point Selection Strategy (OMEPS) Algorithm
1: Parameter Notation:
2: Vector O(i,q,n) is the optimal migration strategy; it contains a series of optimal states from the initial state to state (i, q, n).
3: Initialization: Initialize the initial state (i−, q−, n−) = (0, 1, 1), optimal strategy O(i,q,n) = ∅, O(0,1,1) = {(0, 1, 1)}, optimal accumulated utility of the initial state f*(0,1,1) = 0.
4: for each time slot i = 1, ..., T do
5:   for all q such that q ∈ B do
6:     Determine the optimal previous state according to decision p by using (12)−(17), i.e., (i−, q−, n−)_opt = arg max_{(i−,q−,n−)} {U((i−, q−, n−), p, m) + f*(i−, q−, n−)}.
7:     if the optimal previous state (i−, q−, n−)_opt is found then
8:       Update the optimal decisions to the current state: O(i,q,n) = {(i, q, n)} ∪ O(i−,q−,n−)_opt.
9:       Update the optimal accumulated utility at the current state: f*(i,q,n) = U((i−, q−, n−)_opt, p, m) + f*((i−, q−, n−)_opt).
10:    else
11:      Update the optimal accumulated utility at the current state: f*(i,q,n) = 0.
12:    end if
13:  end for
14: end for
15: Pick the maximum accumulated utility state (i, q, n)_opt = arg max_{(i,q,n)} f*(i,q,n) and set its corresponding policy O(i,q,n) as the offline optimal service migration and DNN model exit point selection strategy O_off.

Algorithm 1 shows the pseudocode of our optimization algorithm, which uses dynamic programming with memoization to find out the optimal strategies O of each time slot for a given finite time horizon. In the algorithm, we can obtain the longest path (i.e., the optimal service migration and DNN model exit point selection strategy) for each state by solving the Bellman equation (i.e., line 6). Then we can pick the path that contains the state with the highest accumulated utility at T as the longest path (i.e., line 15), which is the optimal solution to the problem. For searching the longest path, the algorithm needs to enumerate at most B² possible states at each time slot. Thus, for the T time slots, the time complexity of Algorithm 1 is O(B²T).
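For concreteness, the sketch below implements the recurrence (12)–(18) in the spirit of Algorithm 1. It is our simplified illustration, not the authors' implementation: states are (i, q) pairs, since the exit point n is determined by i and q once the future information is known, and best_exit[i][q] is assumed to be precomputed from the deadline checks in (9).

# Minimal OMEPS-style dynamic program; r, u, gamma, s follow the notation above.

def omeps(T, B, s, u, gamma, r, best_exit):
    f = {(0, 0): 0.0}                  # f*(i, q); the service starts on BS 0
    via = {}                           # back-pointers to recover the strategy
    for i in range(T):                 # DAG order: increasing time
        for q in range(B):
            if (i, q) not in f:
                continue
            for p in range(B):         # stay (p == q) or migrate (p != q)
                cost = 1 if p == q else min(s[p], T - i)        # eq. (12)
                gain = (r[i] * u[best_exit[i][q]] if p == q     # eq. (16)
                        else u[gamma] * r[i] * cost)
                nxt = (i + cost, p)                             # eqs. (13)-(14)
                if f.get(nxt, float("-inf")) < f[(i, q)] + gain:
                    f[nxt] = f[(i, q)] + gain                   # eqs. (17)-(18)
                    via[nxt] = (i, q)
    return f, via  # argmax of f over slot-T states plus `via` yields O_off

# Toy run: 3 BSs, horizon T = 5, every migration needs 2 slots to fetch.
f, _ = omeps(T=5, B=3, s=[2, 2, 2], u=[0.70, 0.78], gamma=0,
             r=[3, 5, 5, 2, 4], best_exit=[[1] * 3 for _ in range(5)])
print(max(v for (i, q), v in f.items() if i == 5))  # best total utility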
V. ONLINE SERVICE MIGRATION AND EXIT POINT SELECTION ALGORITHM

So far we have presented the solution of the problem in (11) under the complete-future-information scenario as a baseline. In practice, it is challenging to obtain complete information. This motivates an online algorithm without complete information. To this end, in this section, we combine some popular machine learning techniques (e.g., LSTM) to predict future information (e.g., mobility traces) to improve the long-term service performance with informed decision making. However, frequent prediction would incur large running costs (e.g., prediction latency), which are not affordable for resource-constrained user equipment. To well balance the performance-cost trade-off, we propose a proactive adaptive algorithm, named smart-MOMEPS, to jointly optimize service migration and the early exit point of DNN services, which integrates an MPC-based online migration and exit point selection strategy (MOMEPS) and an NN-based migration judgement for achieving higher averaged utilities at low costs.

A. MPC-Based Online Migration and Exit Point Selection Strategy

A wide variety of machine learning techniques (e.g., [29]–[32]) have been studied to well estimate the user behavior by leveraging collected data information. Among them, long short-term memory (LSTM) [33] is considered particularly efficient for time series prediction (user activities in our problem) due to its ability to keep a memory of previous inputs [34]. In addition, compared with other machine learning methods, such as DNNs, LSTM has a simpler network structure and less computation overhead, making it more suitable for resource-constrained mobile devices. Therefore, in our work, we exploit LSTM to predict future information, including the user trajectory and request demand. By incorporating these predictive results to assist decision optimization, we can get rid of extensive unnecessary migrations caused by uncertain user mobility (e.g., Ping-pong loops).
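As one possible realization of such a predictor, the following PyTorch sketch rolls an LSTM forward for W slots. The feature layout (2-D cell coordinates plus request rate) and the layer sizes are our assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

class TracePredictor(nn.Module):
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, seq):            # seq: (batch, history_len, n_features)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])   # next-slot estimate from the last state

def predict_window(model, history, W):
    """Rolling W-step forecast: feed each prediction back as the newest input."""
    seq, preds = history.clone(), []
    with torch.no_grad():
        for _ in range(W):
            nxt = model(seq)                                    # (1, n_features)
            preds.append(nxt)
            seq = torch.cat([seq[:, 1:], nxt.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)   # (1, W, n_features)

model = TracePredictor()
print(predict_window(model, torch.randn(1, 10, 3), W=5).shape)  # torch.Size([1, 5, 3])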
The MPC-based online migration and exit point selection strategy works as follows. As illustrated in Fig. 3, at the beginning of each slot, the device uses its prediction mechanism to estimate the future information of the mobility trace and request demands within W time slots before decision-making. Here, W is actually the size of the prediction window. Subsequently, based on the predictive results, the device can derive optimal decisions of service migration and early exit point selection during the prediction window via the OMEPS algorithm.

[Fig. 3. Overview of the MPC-based online migration and exit point selection strategy: at each slot t0, the predicted information of slots t0+1 to t0+W is fed to OMEPS, and only the first decision of the resulting plan is executed.]

The prediction mechanism in the online setting can hardly provide perfect predictions. Moreover, as the window size W increases, the prediction error would accumulate, which eventually leads to severe performance degradation. To enhance the robustness of informed decisions, we adopt a standard MPC-based approach for decision execution, i.e., only the first decision is implemented in each prediction window. In this way, the negative impact of accumulated prediction errors can be considerably alleviated, since the potentially terrible decisions in the prediction window (especially the last few time slots) are rejected.

Algorithm 2 MPC-Based Online Migration and Exit Point Selection Strategy (MOMEPS) Algorithm
1: Parameter Notation:
2: Vector π(i,q,n) is the optimal migration and exit point selection strategy of a prediction window; it contains a series of optimal states from the current time slot t to t + W.
3: Initialization: τ = 1, t = 1.
4: while current time slot t ≤ T do
5:   τ = t
6:   Predict the user mobility and request frequency of the future W time slots [τ + 1, ..., τ + W].
7:   Determine the optimal strategy π(i,q,n) of the prediction window [τ, ..., τ + W] by using OMEPS.
8:   Select the first step (i, q, n) of π(i,q,n) to execute.
9:   t = i + 1
10: end while

We summarize the MPC-based online migration and exit point selection strategy in Algorithm 2. In line 7, the user needs to perform the OMEPS algorithm to obtain the decisions of service migration and early exit point selection. Thus, for the T time slots, the time complexity of Algorithm 2 is O(B²WT).
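The control loop of Algorithm 2 can be summarized in a few lines of Python. The predict, omeps_window and execute callbacks stand in for the components described above; this is a sketch of the control flow, not the authors' code.

def momeps(T, W, predict, omeps_window, execute):
    t = 0
    while t <= T:
        forecast = predict(t, W)             # line 6: W-slot look-ahead
        plan = omeps_window(t, forecast)     # line 7: OMEPS on [t, t+W]
        first = plan[0]                      # line 8: keep only the first step
        t = execute(first)                   # line 9: returns the next slot
                                             # (t + downtime if migrating,
                                             #  t + 1 otherwise)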
B. Smart MOMEPS

However, the MPC-based online migration and exit point selection strategy (MOMEPS) brings not only substantial improvements of service performance, but also heavy running costs. More explicitly, one-slot decision-making depends on a whole optimization over the prediction window, which contains the running cost (i.e., latency) of the prediction mechanism and the OMEPS algorithm. Such computational costs in MOMEPS largely increase the burden of resource-constrained mobile devices for decision optimization. To well balance the performance-cost trade-off at run time, we advocate a proactive adaptive algorithm, Smart-MOMEPS, for enabling more moderated decision optimization on the fly.

We design the Smart-MOMEPS algorithm based on the observation of the MOMEPS execution results. Intuitively, MOMEPS would avoid frequent service migrations to reduce the user utility degradation due to extensive service downtime. In this regard, we apply the MOMEPS algorithm over the whole process to demonstrate the proportion of service migration and edge offloading (i.e., no migration) in the long-term policy optimization. As shown in Fig. 4, offloading decisions occupy a dominant position (i.e., 81.82%) to reduce service downtime, so that DNN services can be processed as much as possible at the resource-rich edge server. Inspired by such vital observations, a core idea in our proactive adaptive algorithm is to design a light-weight approach to assist the user to quickly determine whether a DNN service migration is needed or not. As illustrated in Fig. 5, once DNN service migration is determined at the current slot, the MPC-based predictive policy would be adopted to derive an efficient migration decision. Otherwise, the device can offload its task with an optimal exit point selection to achieve high service performance. Clearly, the times of executing the MPC-based predictive policy will be significantly reduced, thus leading to lower computational overheads.

[Fig. 4. The percentage of migration and offloading throughout the MPC execution.]

[Fig. 5. Overview of the smart-MOMEPS online migration strategy: an NN-based migration judgement gates between plain offloading with exit point selection and the MOMEPS policy.]

Inspired by [35], we adopt a light-weight Neural Network (NN) model to dynamically determine whether to migrate the DNN service or not based on the current user state, due to its powerful representation capacity. Particularly, the device can gather sufficient sample traces beforehand and train the NN model offline, and then only run the lightweight model inference online¹. To well characterize the underlying mapping of the NN model, we consider two key items observed by the device at the current time slot, i.e., the frequency of requests and the hop distance between the service and the user, as the input for the NN model. The rationales behind this are as follows: on the one side, a long transmission delay with large hop distances would stimulate the DNN service migration, aiming at higher utilities over successive time slots; on the other side, the request demand is an essential indicator of service performance. Particularly, a high frequency of user requests (e.g., the number of video frames in object detection) implies high potential costs for DNN service migration, since the DNN model must be inferred at the local device during the migration procedure. Finally, we summarize this Smart-MOMEPS algorithm in Algorithm 3.

¹Note that the Neural Network can also be trained in an online manner. Specifically, we collect the results of the MOMEPS execution in the migrated case as training data, whose volume would gradually increase as time goes by. Considering the introduced deployment costs on the resource-constrained device, we adopt a sufficiently light-weight Neural Network with two fully-connected layers, where only few data samples are needed for training.

Algorithm 3 Smart-MOMEPS Algorithm
1: Initialization: t = 1.
2: while current time slot t ≤ T do
3:   Determine whether the current service needs to be migrated with the trained NN model.
4:   if the current service needs to be migrated then
5:     Determine the current migration decision (i, q, n) with MOMEPS and execute it.
6:     t = i + 1
7:   else
8:     t = t + 1
9:   end if
10: end while
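A minimal sketch of this judgement model and the gating loop of Algorithm 3 might look as follows; the hidden width, feature scaling and 0.5 threshold are our assumptions, not tuned values from the paper.

import torch
import torch.nn as nn

judge = nn.Sequential(              # inputs: request frequency, hop distance
    nn.Linear(2, 16), nn.ReLU(),    # fully-connected layer 1
    nn.Linear(16, 1), nn.Sigmoid()  # fully-connected layer 2 -> migration prob.
)

def smart_momeps_step(req_freq, hop_dist, run_momeps, offload_with_best_exit):
    """One slot of Algorithm 3: run the costly MPC policy only when the
    lightweight judge predicts that a migration is worthwhile."""
    state = torch.tensor([[req_freq, hop_dist]], dtype=torch.float32)
    with torch.no_grad():
        if judge(state).item() > 0.5:   # lines 3-4: migration predicted
            return run_momeps()         # line 5: full MPC decision
    return offload_with_best_exit()     # else: stay put and pick an exit point

# Toy call with placeholder callbacks standing in for the real policies.
print(smart_momeps_step(8.0, 3.0,
                        lambda: "migrate via MOMEPS",
                        lambda: "offload with best exit"))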
VI. EVALUATION

In this section, we carry out trace-driven simulations to evaluate the performance of our proposed algorithms and demonstrate their effectiveness compared with benchmark schemes.

A. Simulation Setup

In this experiment, we take AlexNet [36] as the DNN service, and we employ the BranchyNet framework to train this AlexNet model with five exit points. We test the delay and accuracy of inference at each exit point using a desktop PC (Intel i7-6700 CPU and 8 GB memory). The outcome is shown in Table II. For user mobility, we utilize the ONE simulator [37] to generate user movement traces. ONE simulates the scenario where mobile devices move along the roads and streets in an urban city. We chose the movement data of cars, and the speed of each car is from 10 km/h to 50 km/h. For simplicity, we divide the whole area into 169 square parts, and each part has a base station that can provide service to the user and occupies a 200 m × 200 m area. In our simulation, we adopt a discrete time-slotted model to characterize the system dynamics. We set the interval of one time slot to be 20 seconds, and the wireless connection and system dynamics remain unchanged during each time slot. For user trajectory and request frequency prediction, we utilize LSTM to obtain the results.

TABLE II
LATENCY AND ACCURACY AT EACH EXIT POINT OF ALEXNET.

Exit point     1     2     3     4     5
Latency (ms)   9.4   14.0  18.5  24.4  30.2
Accuracy (%)   70.0  71.2  76.0  77.7  78.0

B. Benchmark Schemes

We compare our proposed algorithms with the aforementioned offline optimal solution and the following five benchmarks.

1) Always Migration (AM): the user always migrates the service to the base station where it is currently located.
2) Lazy Migration (LM): the service will not be migrated until the distance between the base station where it is hosted and the user exceeds a threshold. Once the migration is triggered, the service image is migrated to the base station where the user is currently located. (A minimal sketch of AM and LM follows this list.)
3) Predictive Lazy Migration (PLM): a prediction-based Lazy Migration algorithm proposed in [38]. It leverages one-shot prediction to improve the LM algorithm.
4) Lazy MPC (L-MPC): this algorithm is proposed to reduce the computation overhead of MPC. The basic idea of L-MPC is that we use MPC only when the migration condition in LM is met.
5) Fixed Horizon Control (FHC): unlike the standard MPC-based algorithm, FHC performs the whole set of decisions within a prediction window instead of only the first step of these decisions [39].
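For reference, here is a minimal sketch of how the two simplest baselines could be realized, under our reading of the descriptions above; the threshold value is an arbitrary illustration.

def always_migration(user_bs, service_bs):
    return user_bs                         # AM: follow every user move

def lazy_migration(user_bs, service_bs, hop_dist, threshold=2):
    """LM: migrate only once the user-to-service hop distance exceeds
    the threshold; otherwise keep the service where it is."""
    return user_bs if hop_dist(service_bs, user_bs) > threshold else service_bs

# Toy check on a 1-D line of BSs, so the hop distance is just |i - j|.
hops = lambda a, b: abs(a - b)
print(always_migration(user_bs=4, service_bs=1))  # 4: migrate
print(lazy_migration(4, 1, hops))                 # 4: 3 hops > 2, migrate
print(lazy_migration(2, 1, hops))                 # 1: stay put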
C. Algorithm Performance Comparisons

We first evaluate our proposed algorithms and the five benchmarks. The prediction window of the MPC-based algorithms is set to 5 time slots. We define an efficiency metric as the ratio of the overall user utility obtained by these algorithms to the utility obtained by the offline optimal algorithm. The numerical results are shown in Fig. 6: our proposed algorithms are stable across different time slots and always achieve more notable effects compared with the benchmarks. The efficiency of Smart-MOMEPS is 96%, which is almost the same as MOMEPS (97%). This result demonstrates that our proposed smart migration judgement approach can effectively help the user determine whether migration is needed based on the current user state.

[Fig. 6. Algorithm efficiency at different time slots (W = 5 for MPC-based algorithms).]

Besides, we can observe that the FHC algorithm has the worst performance, with only 58% of the offline optimal. This is because the decisions in the last few time slots within the prediction window deviate far from the optimal decisions due to the accumulation of prediction errors, which can severely reduce the algorithm efficiency. Compared to FHC, Smart-MOMEPS has 1.6 times the efficiency and better robustness. Except for FHC, the algorithms that leverage future information work better than those that do not. The reason is that we have considered service downtime in this work. If the service blindly follows the user's trajectory (AM and LM), the service cannot be migrated to a suitable BS in most cases due to user mobility. In contrast, future information can help the user migrate the service as appropriately as possible. Compared to LM, PLM can help the user avoid unnecessary migrations by using the information of the next slot. However, when the service needs to be migrated, PLM cannot guide the user where to migrate the service, since the downtime spans several time slots. Accordingly, the far-sighted Smart-MOMEPS can be more effective than PLM.

D. Algorithms Execution Cost

We also evaluate the computation overhead reduction achieved by Smart-MOMEPS under different time slots. We define the computation overhead of these algorithms as the number of executions of prediction and OMEPS. As shown in Fig. 7, Smart-MOMEPS can fulfill an average computation overhead reduction ratio of 3.1 times compared with MOMEPS. Although the computation overhead of Smart-MOMEPS is higher than that of Lazy-MPC, it is still worthwhile considering the better algorithm efficiency. To further verify the performance of our algorithm on resource-constrained devices, we evaluate the suitability of our algorithm on mobile devices by measuring the execution latency on different devices. The results are shown in Table III. Compared to the interval of a time slot (20 s), our algorithm is lightweight enough to help the user make migration decisions quickly.

[Fig. 7. Computation overhead of the MPC-based algorithms at different time slots.]
TABLE III
LATENCY OF LSTM AND OMEPS ON DIFFERENT DEVICES.

Method   Raspberry Pi   Jetson NANO   Jetson TX2
OMEPS    102.4 ms       39.1 ms       22.3 ms
LSTM     44.0 ms        24.5 ms       24.2 ms

E. Impact of Prediction Error and Look-ahead Window Size

Furthermore, we investigate both the impact of prediction error and of how far to look into the future on algorithm efficiency. We use two prediction methods to obtain the future information: one is long short-term memory (LSTM) and the other is the autoregressive integrated moving average model (ARIMA) [40]. The prediction accuracies of these two methods are 93.3% and 82.5%, respectively. Intuitively, the more accurate the MPC model's predictions are, the better the algorithm's performance will be, and the experimental data confirm this intuition. As shown in Fig. 8, the efficiency of the LSTM-based algorithm is 5.3% higher than that of the ARIMA-based algorithm on average.

[Fig. 8. Algorithm efficiency at different model accuracies and prediction window sizes.]

As for the influence of different prediction window sizes W, Fig. 8 shows that the efficiency of the algorithm improves with the increase of W at the beginning. This is because of the existence of service downtime. When the service downtime for migration to a base station is more than the size of the prediction window, the algorithm will not select this base station as the target of migration. Moreover, the long-term effect of migration is not considered when W is small, which will also cause the solution to underperform. Therefore, when the size of the prediction window is smaller than the maximum service downtime, the performance of the algorithm improves as the prediction window grows. Nevertheless, this improvement does not last forever. Fig. 8 shows that the performances of these algorithms begin to decline when W is greater than 6. This can be explained by the fact that if there were no errors in the future prediction, W could be set as large as possible to obtain the best long-term performance. However, we cannot predict the future perfectly in practice. The farther ahead we look into the future, the less accurate the prediction becomes. Therefore, when the size of the prediction window continues to increase, the performance of the algorithm gradually declines.

F. Impact of Different DNN Model Exit Points

To evaluate how the number of early exit points of the DNN model influences the algorithm performance, we employ an AlexNet model as the DNN service and measure the performance of our algorithms with 2 to 5 exit points. To concisely show the performance difference, we characterize the performance of each algorithm as its ratio to the offline optimal with 5 exit points.

[Fig. 9. Algorithm efficiency and migration times at different numbers of early exit points.]

The algorithm efficiency and migration times results are shown in Fig. 9. As the number of exit points increases, the efficiency of the algorithm improves, and fewer service migrations are performed. This result demonstrates that model exit points play a positive role in DNN service migration. Assuming there is no exit point in a DNN model, when the user-perceived delay exceeds the maximum tolerated delay of the service, the user can only migrate the service, since the utility will be zero if the deadline is missed. During the service downtime caused by migration, service requests will be processed locally with poor utility. Instead, the exit point provides the user with an alternative to migration. The user can choose to exit DNN inference early to reduce the computation delay, which can alleviate the increased transmission delay caused by user movement. This requires very little utility sacrifice compared to migration. In a nutshell, more exit points lead to more user utility and fewer migrations.

G. Impact of the Base Station Density

Finally, we show the impact of different base station densities and observe the changes of our algorithms. It is worth noting that the range of services provided by each base station is constant in our setup, and we control the density of base stations by changing their properties (some only forward user requests, while the others are capable of both forwarding and executing user requests). For simplicity, we characterize the efficiency of each algorithm under different BS densities as its ratio to the offline optimal when the BS density is 12.57 per km².
[Fig. 10. Algorithm efficiency at different base station densities.]

Fig. 10 presents that there exists a growth in algorithm efficiency as the density of BSs increases from 4.16 to 12.57 per km². This is because more base stations bring more migration choices when the user migrates services. It is also easier for the user to migrate services to a base station closer to it for higher utility. To summarize, the denser the base stations are, the better our algorithms perform.

VII. CONCLUSION

In this paper, we investigate a user-centric DNN service migration and exit point selection problem with various service downtime in the mobile edge computing environment. We leverage the exit point selection and layer sharing feature of the container technique to alleviate the performance degradation caused by inevitable service downtime. To maximize the long-term user utility, we first propose an offline optimal migration and exit point selection strategy (OMEPS) algorithm by leveraging dynamic programming when complete future information is available. To deal with the uncertain user behavior, we incorporate a Model Predictive Control framework into the OMEPS algorithm, and then construct a proactive service migration and DNN exit point selection (MOMEPS) algorithm. To cope with the heavy computation overheads of MOMEPS, we propose a cost-efficient algorithm, smart-MOMEPS, which introduces a neural-network-based smart migration judgement to navigate the performance and computation overhead trade-off. Finally, we conduct extensive trace-driven experiments to evaluate our online algorithms. We also explore the performance of our algorithms under a variety of system settings and give corresponding analysis.

REFERENCES

[1] M. Hanyao, Y. Jin, Z. Qian, S. Zhang, and S. Lu, "Edge-assisted online on-device object detection for real-time video analytics," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[2] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge intelligence: Paving the last mile of artificial intelligence with edge computing," Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
[3] C. Li, M. Song, M. Zhang, and Y. Luo, "Effective replica management for improving reliability and availability in edge-cloud computing environment," Journal of Parallel and Distributed Computing, vol. 143, pp. 107–128, 2020.
[4] Q. Zhang, Q. Zhu, M. F. Zhani, R. Boutaba, and J. L. Hellerstein, "Dynamic service placement in geographically distributed clouds," IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 762–772, 2013.
[5] T. Ouyang, Z. Zhou, and X. Chen, "Follow me at the edge: Mobility-aware dynamic service placement for mobile edge computing," IEEE Journal on Selected Areas in Communications, vol. 36, no. 10, pp. 2333–2345, 2018.
[6] V. Farhadi, F. Mehmeti, T. He, T. F. L. Porta, H. Khamfroush, S. Wang, K. S. Chan, and K. Poularakis, "Service placement and request scheduling for data-intensive applications in edge clouds," IEEE/ACM Transactions on Networking, vol. 29, no. 2, pp. 779–792, 2021.
[7] R. Urgaonkar, S. Wang, T. He, M. Zafer, K. Chan, and K. K. Leung, "Dynamic service migration and workload scheduling in edge-clouds," Performance Evaluation, vol. 91, pp. 205–228, 2015.
[8] A. Machen, S. Wang, K. K. Leung, B. J. Ko, and T. Salonidis, "Live service migration in mobile edge clouds," IEEE Wireless Communications, vol. 25, no. 1, pp. 140–147, 2018.
[9] L. Ma, S. Yi, N. Carter, and Q. Li, "Efficient live migration of edge services leveraging container layered storage," IEEE Transactions on Mobile Computing, vol. 18, no. 9, pp. 2020–2033, 2019.
[10] L. Gu, D. Zeng, J. Hu, B. Li, and H. Jin, "Layer aware microservice placement and request scheduling at the edge," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–9.
[11] B. Xu, S. Wu, J. Xiao, H. Jin, Y. Zhang, G. Shi, T. Lin, J. Rao, L. Yi, and J. Jiang, "Sledge: Towards efficient live migration of docker containers," in 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 2020, pp. 321–328.
[12] S. Nadgowda, S. Suneja, N. Bila, and C. Isci, "Voyager: Complete container state migration," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 2137–2142.
[13] S. Fu, R. Mittal, L. Zhang, and S. Ratnasamy, "Fast and efficient container startup at the edge via dependency scheduling," in 3rd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 20). USENIX Association, Jun. 2020. [Online]. Available: https://www.usenix.org/conference/hotedge20/presentation/fu
[14] TensorFlow, "TensorFlow docker images," https://hub.docker.com/r/tensorflow/tensorflow, accessed March 7, 2022.
[15] S. Teerapittayanon, B. McDanel, and H. Kung, "Branchynet: Fast inference via early exiting from deep neural networks," in 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 2464–2469.
[16] E. F. Camacho and C. Bordons, Model Predictive Control. Springer, 2007.
[17] D. Zhao, T. Yang, Y. Jin, and Y. Xu, "A service migration strategy based on multiple attribute decision in mobile edge computing," in 2017 IEEE 17th International Conference on Communication Technology (ICCT), 2017, pp. 986–990.
[18] T. Ouyang, R. Li, X. Chen, Z. Zhou, and X. Tang, "Adaptive user-managed service placement for mobile edge computing: An online learning approach," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 1468–1476.
[19] A. Ksentini, T. Taleb, and M. Chen, "A markov decision process-based service migration procedure for follow me cloud," in 2014 IEEE International Conference on Communications (ICC), 2014, pp. 1350–1354.
[20] S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, and K. K. Leung, "Mobility-induced service migration in mobile micro-clouds," in 2014 IEEE Military Communications Conference, 2014, pp. 835–840.
[21] H. Ma, Z. Zhou, and X. Chen, "Predictive service placement in mobile edge computing," in 2019 IEEE/CIC International Conference on Communications in China (ICCC), 2019, pp. 792–797.
[22] Y. Zhang, L. Jiao, J. Yan, and X. Lin, "Dynamic service placement for virtual reality group gaming on mobile edge cloudlets," IEEE Journal on Selected Areas in Communications, vol. 37, no. 8, pp. 1881–1897, 2019.
[23] K. Kawashima, T. Otoshi, Y. Ohsita, and M. Murata, "Dynamic placement of virtual network functions based on model predictive control," in NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, 2016, pp. 1037–1042.
[24] M. Kumazaki and T. Tachibana, "Optimal vnf placement and route selection with model predictive control for multiple service chains," in 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), 2020, pp. 1–2.
[25] X. Tian, J. Zhu, T. Xu, and Y. Li, "Mobility-included dnn partition offloading from mobile devices to edge clouds," Sensors, vol. 21, no. 1, 2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/1/229
[26] Z. Wang, W. Bao, D. Yuan, L. Ge, N. H. Tran, and A. Y. Zomaya, "Accelerating on-device dnn inference during service outage through scheduling early exit," Computer Communications, vol. 162, pp. 69–82, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366420318818
[27] Z. Zhou, S. Yu, W. Chen, and X. Chen, "Ce-iot: Cost-effective cloud-edge resource provisioning for heterogeneous iot applications," IEEE Internet of Things Journal, vol. 7, no. 9, pp. 8600–8614, 2020.
[28] E. Li, L. Zeng, Z. Zhou, and X. Chen, "Edge ai: On-demand accelerating deep neural network inference via edge computing," IEEE Transactions on Wireless Communications, vol. 19, no. 1, pp. 447–457, 2020.
[29] C. Wang, L. Ma, R. Li, T. S. Durrani, and H. Zhang, "Exploring trajectory prediction through machine learning methods," IEEE Access, vol. 7, pp. 101441–101452, 2019.
[30] H. Gebrie, H. Farooq, and A. Imran, "What machine learning predictor performs best for mobility prediction in cellular networks?" in 2019 IEEE International Conference on Communications Workshops (ICC Workshops), 2019, pp. 1–6.
[31] C. Yang, X. Shi, L. Jie, and J. Han, "I know you'll be back: Interpretable new user clustering and churn prediction on a mobile social application," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 914–922.
[32] J. Feng, Y. Li, C. Zhang, F. Sun, F. Meng, A. Guo, and D. Jin, "Deepmove: Predicting human mobility with attentional recurrent networks," in Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1459–1468.
[33] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[34] F. Altché and A. de La Fortelle, "An lstm network for highway trajectory prediction," in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2017, pp. 353–359.
[35] T. Ouyang, X. Chen, L. Zeng, and Z. Zhou, "Cost-aware edge resource probing for infrastructure-free edge computing: From optimal stopping to layered learning," in 2019 IEEE Real-Time Systems Symposium (RTSS), 2019, pp. 380–391.
[36] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, 2012.
[37] A. Keränen, J. Ott, and T. Kärkkäinen, "The ONE simulator for DTN protocol evaluation," in SIMUTools '09: Proceedings of the 2nd International Conference on Simulation Tools and Techniques. New York, NY, USA: ICST, 2009.
[38] Q. Wu, X. Chen, Z. Zhou, and L. Chen, "Mobile social data learning for user-centric location prediction with application in mobile edge service migration," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7737–7747, 2019.
[39] M. Lin, Z. Liu, A. Wierman, and L. L. H. Andrew, "Online algorithms for geographical load balancing," in 2012 International Green Computing Conference (IGCC), 2012, pp. 1–10.
[40] K.-L. Li, C.-J. Zhai, and J.-M. Xu, "Short-term traffic flow prediction using a methodology based on arima and rbf-ann," in 2017 Chinese Automation Congress (CAC), 2017, pp. 2804–2807.
