
Edge Intelligence in Motion: Mobility-Aware Dynamic DNN Inference Service Migration with Downtime in Mobile Edge Computing

Pu Wang (1,2), Tao Ouyang (1), Guocheng Liao (3), Jie Gong (1), Shuai Yu (1), Xu Chen (1)
(1) School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
(2) School of Biomedical Engineering, Sun Yat-sen University, Guangzhou, China
(3) School of Software Engineering, Sun Yat-sen University, Zhuhai, China

Abstract—Edge intelligence (EI) becomes a trend to push the deep learning frontiers to the network edge, so that deep neural network (DNN) applications can be well leveraged at resource-constrained mobile devices with the benefits of edge computing. Due to the high user mobility among scattered edge servers in many scenarios, such as internet of vehicles applications, dynamic service migration is desired to maintain a reliable and efficient quality of service (QoS). However, the inevitable service downtime incurred by service migration would largely degrade the real-time performance of delay-sensitive DNN inference services. Fortunately, based on the characteristics of container-based DNN services, exit point selection and the layer sharing feature of the container technique can alleviate such performance degradation. Thus, we consider a user-centric management of DNN inference service migration and exit point selection, aiming at maximizing the overall user utility (e.g., DNN model inference accuracy) under various service downtime. We first leverage dynamic programming to propose an optimal offline migration and exit point selection strategy (OMEPS) algorithm for the case when complete future information of user behaviors is available. Amenable to a more practical application domain without complete future information, we incorporate the OMEPS algorithm into a model predictive control (MPC) framework, and then construct a mobility-aware service migration and DNN exit point selection (MOMEPS) algorithm, which improves the long-term service utility within limited predictive future information. However, the heavy computation overheads of MOMEPS impose burdens on mobile devices, so we further advocate a cost-efficient algorithm, named smart-MOMEPS, which introduces a smart migration judgement based on neural networks to control the invocation of MOMEPS by wisely estimating whether the DNN service should be migrated or not. Extensive trace-driven simulation results demonstrate the superior performance of our smart-MOMEPS algorithm, which achieves significant overall utility improvements with low computation overheads compared with other online algorithms.

Index Terms—DNN service migration, multi-exit DNN, service downtime, mobile edge computing, model predictive control.

I. INTRODUCTION

WITH the explosive development of Internet of Things (IoT) devices, an increasing number of Deep Neural Network (DNN)-driven IoT applications, including face recognition, autonomous driving and augmented reality, are emerging. Generally, such applications require massive computing resources to guarantee low latency and high inference accuracy, which are often unavailable due to mobile devices' constrained computing capability and energy. For example, real-time video analytics applications require processing tens of frames per second with computation-intensive DNN models (e.g., YOLO, ResNet), which are unsuitable for resource-constrained mobile devices [1]. To tackle this challenge, mobile edge computing (MEC) is proposed to push these services from the local IoT devices to the network edges, which are servers in proximity to mobile devices (e.g., at base stations or WiFi hotspots). Mobile devices can collaborate with network edges to achieve a higher quality of service and a low latency via task offloading [2].

However, maintaining a reliable and efficient service performance of DNN applications at the network edge is nontrivial due to the mobility of users. To reduce the network latency, DNN services are generally deployed at a nearby edge server, where users can access their services through wireless communications to a local base station (BS), such as a WiFi hotspot. When a user moves away from the service scope of the nearby edge server, the connection will switch to a new BS for remote access to the service, and the QoS will inevitably degrade due to the longer transmission delay caused by the increasing network distance. Although methods such as service replicas [3] can alleviate the service performance degradation caused by user mobility by backing up services on different edge servers, they bring issues such as user privacy and storage redundancy. Therefore, we focus on another widely adopted approach, dynamic service migration [4]–[7], to solve this problem.

Designing a suitable dynamic service migration scheme faces a few challenges. Specifically, frequent service migration would cause extensive service downtime [8], and thus is unacceptable for latency-sensitive DNN services, especially in internet of vehicles applications where the users have high mobility. Besides, blindly letting the service follow user mobility (e.g., under Ping-pong loops, where the user goes back and forth between two BSs' coverages) incurs unnecessary migrations and heavy overheads. Therefore, it is critical to dynamically migrate a DNN service in a moderated way to follow the user mobility.

Existing work on edge service migration has regarded the container as a promising way for service provision [9]–[12]. Compared to a traditional virtual machine (VM), a container is a more lightweight virtualization technique and takes up less
storage space than VMs. Nevertheless, as DNN applications are generally built on machine learning frameworks such as TensorFlow and PyTorch, the container images of these applications can reach hundreds or thousands of megabytes. Service downtime caused by migration comes primarily from fetching the image from the remote registry, which is relatively long for latency-sensitive applications. Therefore, frequent migration would lead to significant traffic overhead and a terrible user experience. In our work, we leverage the layer sharing feature of the container to reduce service downtime. A container image packages the necessary files, such as runtime tools, system libraries, and data files, in different independent layers, which can be shared by different images [13]. For example, the official container image of TensorFlow on Docker Hub is 400 MB [14], and it will cost 80 seconds to migrate over a 5 Mbps wide area network (WAN). When a TensorFlow-based DNN service is migrated to an edge node where the TensorFlow image already exists, the service can share this base image at the target edge node without downloading the full service image. In this case, only a part of the service data, such as state-dependent files, the user's private data or personalized model parameters, needs to be migrated, and hence the service downtime can be significantly reduced.
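To make the layer-sharing effect concrete, the following minimal sketch estimates the image-fetch downtime from the layers missing at a target edge node. The layer names, sizes, and the bandwidth figure (taken as 5 megabytes per second so that the 400 MB base image costs 80 seconds, matching the example above) are illustrative assumptions, not measurements from this paper.

```python
# Minimal sketch: migration downtime under container layer sharing.
# Layer names, sizes (MB) and bandwidth (MB/s) are illustrative assumptions.

def downtime_seconds(image_layers, cached_layers, bandwidth_mb_per_s):
    """Time to fetch only the image layers missing at the target edge node."""
    missing_mb = sum(size for layer, size in image_layers.items()
                     if layer not in cached_layers)
    return missing_mb / bandwidth_mb_per_s

# A TensorFlow-based DNN service: one large shared base layer plus small
# service-specific layers (state files, personalized model parameters).
image = {"tensorflow-base": 400.0, "service-state": 30.0, "user-model": 20.0}

# Cold target: the full 450 MB image must be fetched.
print(downtime_seconds(image, set(), bandwidth_mb_per_s=5.0))        # 90.0
# Target already caches the TensorFlow base image: only ~50 MB move,
# so the downtime shrinks accordingly.
print(downtime_seconds(image, {"tensorflow-base"}, 5.0))             # 10.0
```

On this accounting, the per-BS downtime depends on which layers the target node already caches, which is exactly why the model below treats the downtime s_i as a per-BS quantity rather than a constant.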
Fig. 1. Dynamic DNN service migration in mobile edge computing.

In this work, we focus on the dynamic service migration problem in a distributed and time-varying MEC scenario to optimize the long-term user utility, e.g., the inference accuracy of the DNN model. We assume that a request brings full utility if it completes before the deadline; otherwise, the utility is zero. As mentioned above, frequent migration will cause numerous service interruptions, and during the migration all the requests will miss the deadline and bring no utility. Since the increase in network distance is the main cause of migration, in order to reduce the frequency of migration, we introduce an early exit mechanism [15] that reduces the inference latency to counteract the increased transmission delay. To combat the service outage, we could run the DNN service on the mobile device with a proper exit point during migration. However, exiting inference at an early DNN layer results in a loss of accuracy. As the network distance continues to increase, the benefits of early exit will gradually decrease, since the inference needs to exit at an ever earlier exit point.

As illustrated in Fig. 1, a mobile vehicle travels around three regions, each containing a BS endowed with an edge server. For better QoS, the vehicle needs to constantly decide whether to migrate the service or choose early exit points while driving. When the vehicle is going back and forth between regions 2 and 3, it is wiser to exit DNN inference early than to keep the service following the vehicle's mobility, considering the service downtime. In addition to the user trajectory, the service request frequency is another important factor in migration decisions. The vehicle will generate a higher frequency of service requests in region 2 because of the more complicated road conditions. In this case, migrating the service to BS2 in advance may yield higher utility. Therefore, future information of user activities could help us make better decisions on service migration and exit point selection. Fortunately, a variety of studies have addressed the prediction of user behavior, which can be utilized to predict future information for decision-making.

In this paper, we consider a time-slotted system and formulate the long-term optimization problem. We propose an online algorithm based on the Model Predictive Control (MPC) [16] framework to make adaptive DNN service migration and exit point selection decisions. Then we devise a migration judgement approach based on neural networks to obtain an efficient policy with higher user utility and lower computation overhead. The main contributions of this paper are as follows:

1) We propose a general framework of mobility-aware dynamic service migration and DNN model exit point selection in the MEC scenario. Such a framework enables efficient DNN services that maximize the long-term user utility.

2) We formulate a long-term user utility maximization problem considering downtime in MEC. We solve the problem in an offline setting, where the information of the user's activities is known, via a polynomial-time dynamic programming algorithm. In an online setting, where the information is unknown, we leverage an MPC framework to construct an online proactive service migration and DNN exit point selection algorithm named MOMEPS to solve this long-term utility problem. Furthermore, to cope with the heavy computation overhead of MPC, we devise a smart migration judgement approach based on a neural network, called smart-MOMEPS, which efficiently navigates the performance and computation overhead trade-off.

3) We conduct extensive experiments to evaluate our online algorithm based on real-world data traces. We demonstrate the effectiveness of our approach by comparisons with benchmark algorithms. The trace-driven simulation results show that our approach can significantly improve the overall utility and reduce the computation overhead.
The rest of this paper is organized as follows. Section II briefly reviews related work on service migration. In Section III, we describe the system model and problem formulation. In Section IV, we propose an optimal offline service migration and exit point selection algorithm to maximize the overall user utility. We explain the MPC-based online algorithm and furthermore propose a more lightweight algorithm in Section V. In Section VI, we evaluate our online algorithms to demonstrate their effectiveness. Finally, we conclude this paper in Section VII.

II. RELATED WORK

With the rapid development of MEC, which allows low transmission delay and fast response speed, significant challenges are gradually emerging. The inherent characteristics of MEC services, which are distributed in different geographic regions, together with the mobility of users, bring the dominating difficulties. Hence, edge service migration, which can help to maintain QoS in a dynamic MEC environment, has become one of the significant research topics.

Service Migration without Future Prediction: Plenty of literature has focused on service migration in MEC to cope with the key challenge of user dynamics. A branch of such works accommodates arbitrary user mobility without prediction. Zhao et al. tackled a virtual machine migration strategy based on multiple attribute decision-making, aiming at minimizing a comprehensive cost given the current network situation and user location [17]. In [18], Ouyang et al. researched an adaptive service placement mechanism in MEC and formulated it as a contextual multi-armed bandit learning problem to optimize the user's perceived latency and service migration cost. Besides, many studies utilized Markov decision process (MDP) based methods to solve dynamic service placement under the assumption that user mobility follows or can be approximated by a Markov-chain mobility model. Specifically, [19] worked on balancing the trade-off between migration cost and quality by modeling the service migration procedure as an MDP. In [20], the optimal policy of edge service migration formulated as an MDP was proved to be a threshold policy when user mobility follows a one-dimensional (1-D) asymmetric random walk mobility model.

Service Migration with Future Prediction: Accordingly, user activity prediction in the service migration scenario is also widely studied to improve user QoS. Ma et al. incorporated a two-timescale Lyapunov optimization method and limited user mobility prediction to find the optimal service placement decisions [21]. Zhang et al. in [22] overcame the challenges of an underlying dynamic rendering-module placement problem by leveraging model predictive control (MPC) to exploit user trajectory prediction at the edge. Both [23] and [24] took advantage of MPC to work out the dynamic placement of virtual network functions (VNFs) and achieved efficient resource scheduling. However, service downtime, which is widespread at the edge and has a great influence on migration policies, was not considered in these works. In this paper, we take service downtime fully into account and propose an MPC-based algorithm which integrates an NN-based migration judgement for mitigating the heavy computation cost of MPC.

Container-based Service Migration: In addition, container-based service migration, which is more lightweight and saves storage space compared to traditional VMs, is specifically considered in our work. To reduce the migration overhead, [11] designed an efficient live migration system which ensures the integrity of components by leveraging the layered structure of containers. In [9], the authors proposed an edge computing platform architecture which supports seamless Docker container migration of offloading services while also keeping the moving mobile user with its nearest edge server. [12] presented Voyager, a just-in-time live container migration service which combines service memory state migration with local filesystem migration to minimize service downtime. However, these works mainly focus on leveraging container features to reduce service downtime, while our work incorporates the various service downtime induced by the layer sharing feature of containers into migration decisions and improves the long-term user utility by optimizing those decisions.

DNN Inference Service Migration: Since DNNs have been extensively applied for intelligent applications at the edge, ongoing service downtime becomes extremely unaffordable for latency-sensitive DNN inference services. To address the above difficulty, [25] developed a mobility-included DNN partition offloading algorithm to adapt to user movement. Wang et al. adopted DNN inference exit at earlier layers during service outages to shorten the inference delay by sacrificing an acceptable level of accuracy [26]. Different from [25] and [26], we resolve DNN inference service outages collaboratively via the early exit mechanism and dynamic service migration to maintain the user QoS.

III. SYSTEM MODEL AND PROBLEM FORMULATION

In this section, we present the system model and problem formulation for DNN service migration with downtime and exit point selection in mobile edge computing.

A. Overview of Dynamic Migration for DNN Service

Fig. 1 illustrates a typical edge intelligence scenario, i.e., edge-assisted on-device object detection for video streaming analytics. More explicitly, a smart vehicle traveling around an urban city uses a DNN service to process surrounding information gathered by its camera. Based on existing virtualization technologies, the service profile and environment for a multi-exit DNN model (e.g., AlexNet) can run in a dedicated container. Thus, the real-time video stream can be collaboratively processed at both the local vehicle and a nearby edge server via task offloading [1]. To guarantee a reliable QoS for a moving vehicle, dynamic DNN service migration and exit point selection are adopted to accommodate the high dynamics within the required completion time.

With the presented edge-assisted architecture, we consider a set of base stations (BSs): $\mathcal{B} = \{1, ..., B\}$, each of which is equipped with an edge server to provide the DNN service to the user, and the user accesses the service through the nearest base station. In line with the recent work [27] on edge computing, we adopt a discrete time-slotted model to fully characterize the
TABLE I
SUMMARY OF NOTATIONS

Notation   Definition
B          The set of all the base stations
M          All early exit points of the DNN model
x_i(t)     Whether the user decides to offload the service to BS i at time t
y_j(t)     Whether j is the selected early exit point at time t
T_max      The maximum tolerated delay of each request
r_t        The request rate at time t
d_f^j      Inference delay of the j-th exit point on the edge server
d_e^γ      Inference delay of the fixed γ-th exit point on the local device
l_f(t)     Total delay of a request while offloading at t
l_c(t)     Communication delay while offloading at t
l_e(t)     Total delay of a request while executing locally at t
s_i        Time cost for BS i to fetch container images
λ(t)       Downtime caused by migrating the service at t
S(t)       Whether the user is in the offloading state at t
U_o(t)     User utility while offloading at t
U_m(t)     User utility while migrating at t

system dynamics (e.g., user mobility and request frequency). Each time slot in $\mathcal{T} = \{0, ..., T\}$ matches the time scale at which service migration and exit point selection decisions are updated. For ease of exposition, Table I summarizes the introduced notations.
B. Dynamic Service Migration

Due to the uncertain user behavior, such as user mobility and request frequency, the user needs to dynamically migrate the service among these BSs to acquire a higher QoS. Here we use $x_i(t) \in \{0, 1\}$ to indicate the decision-making for service migration at time slot $t$: $x_i(t) = 1$ ($i \in \mathcal{B}$) indicates that the user decides to offload computation to the $i$-th BS at $t$. Note that the user can only offload to one BS at each time slot. Thus, such a constraint on the service migration decision can be expressed as:

$$\sum_{i \in \mathcal{B}} x_i(t) \le 1, \quad \forall t \in \mathcal{T}, \tag{1}$$
$$x_i(t) \in \{0, 1\}, \quad \forall t \in \mathcal{T}, \ \forall i \in \mathcal{B}. \tag{2}$$

Considering the delay-sensitive nature of DNN services, we capture the large service downtime, which is caused by fetching the container image from the remote image registry during the migration procedure, when optimizing the long-term service performance. Once the user determines a migration, the DNN service can only be executed locally on the user's device, and successive migration decisions should be consistent with the original one until the whole migration procedure terminates. Since different container layers are deployed on different edge servers, different amounts of data should be downloaded from the central image registry to fetch the necessary container layers at the target BS, which leads to various service downtime depending on the current migration decision. In this regard, we denote the time cost for fetching the container image at BS $i$ by $s_i$, and use $\lambda(t)$ to denote the service downtime at time $t$. Then, the constraints of DNN service execution should be satisfied as

$$\lambda(t) = s_i \mathbb{1}_{\{x_i(t) > x_i(t-1)\}}, \quad \forall t \in \mathcal{T}, \ \forall i \in \mathcal{B}, \tag{3}$$
$$\sum_{i \in \mathcal{B}} x_i(\tau) \le 1 - \mathbb{1}_{\{x_i(t) > x_i(t-1)\}}, \quad t + 1 \le \tau \le t + \lambda(t) - 1, \ \forall t \in \mathcal{T}, \tag{4}$$

where $\mathbb{1}_{\{\cdot\}}$ is the indicator function. Constraint (3) gives the current service downtime for migrating the service to BS $i$ when $x_i(t) > x_i(t-1)$ (i.e., $x_i(t) = 1$ and $x_i(t-1) = 0$). Constraint (4) enforces local execution during the migration procedure, i.e., the user cannot make a migration decision from $t + 1$ to $t + \lambda(t) - 1$ after deciding to migrate to BS $i$ at time $t$, where the service is being migrated from $t$ to $t + \lambda(t) - 1$.

To clearly describe the current state of the user, we use $S(t)$ to indicate whether the user is migrating or offloading: $S(t) = 1$ represents that the user is offloading computation to a BS at $t$; accordingly, $S(t) = 0$ when the user is migrating its service (executing requests locally). Obviously, there is only one state for a user at a given time slot:

$$S(t) = \sum_{i \in \mathcal{B}} x_i(t) - \mathbb{1}_{\{x_i(t) > x_i(t-1)\}}, \quad \forall t \in \mathcal{T}, \tag{5}$$
$$S(t) \in \{0, 1\}, \quad \forall t \in \mathcal{T}. \tag{6}$$
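These downtime dynamics can be summarized in a few lines. Below is a minimal sketch, simplifying the 0/1 variables x_i(t) into a single per-slot serving-BS index and using illustrative s_i values; it derives λ(t) and S(t) and checks the migration-freeze behavior of constraint (4).

```python
# Minimal sketch of the downtime dynamics in constraints (3)-(6).
# host[t] is the BS serving the user at slot t (a simplification of x_i(t));
# s maps each BS to its image-fetch time s_i in slots (illustrative values).

def downtime_and_state(host, s):
    T = len(host)
    lam, S = [0] * T, [1] * T
    busy_until = -1                         # last slot of the current migration
    for t in range(1, T):
        if host[t] != host[t - 1]:          # x_i(t) > x_i(t-1) for the new BS i
            assert t > busy_until, "no new decision during downtime (4)"
            lam[t] = s[host[t]]             # downtime lambda(t) = s_i      (3)
            busy_until = t + lam[t] - 1
        S[t] = 0 if t <= busy_until else 1  # migrating -> local execution (5)-(6)
    return lam, S

lam, S = downtime_and_state(host=[0, 0, 1, 1, 1, 1], s={0: 2, 1: 3})
print(lam)  # [0, 0, 3, 0, 0, 0] -> migrating to BS 1 freezes 3 slots
print(S)    # [1, 1, 0, 0, 0, 1] -> offloading resumes after the downtime
```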
C. DNN Early Exit Point Selection

To improve the service performance during the task offloading procedure, we incorporate the early exit point selection of the DNN model to accommodate the dynamic user behavior and edge environment. Without loss of generality, we consider a DNN model with a set of early exit points, denoted as $\mathcal{M} = \{1, ..., M\}$. The exit point selection directly affects the service performance (i.e., delay and accuracy). For example, an earlier exit of DNN inference incurs less computation delay at a lower accuracy [28]. Therefore, the user can choose to exit DNN inference early on the BS to avoid performance degradation due to service downtime. Let $y_j(t)$ denote whether $j \in \mathcal{M}$ is the selected early exit point at $t$: $y_j(t) = 1$ means that the DNN model will exit the inference at the $j$-th exit point at $t$. As with service migration, the user can only make the exit point selection decision while offloading. In addition, only one exit point can be selected at each time slot. We thus have the following constraints:

$$y_j(t) \in \{0, 1\}, \quad \forall t \in \mathcal{T}, \ \forall j \in \mathcal{M}, \tag{7}$$
$$\sum_{j \in \mathcal{M}} y_j(t) = \sum_{i \in \mathcal{B}} x_i(t), \quad \forall t \in \mathcal{T}. \tag{8}$$
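For concreteness, the following PyTorch sketch shows what a multi-exit model with a controller-selected exit looks like, in the spirit of BranchyNet [15]. The architecture (layer widths, three exits) is an illustrative assumption, not the paper's trained AlexNet; the exit_point argument plays the role of the decision y_j(t) = 1.

```python
# Sketch of a multi-exit CNN: inference stops at the exit chosen by the
# controller. The architecture below is illustrative, not the paper's model.
import torch
import torch.nn as nn

class MultiExitCNN(nn.Module):
    def __init__(self, num_classes=10, num_exits=3):
        super().__init__()
        chans = [3, 32, 64, 128]
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(chans[k], chans[k + 1], 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
            for k in range(num_exits)
        ])
        # One classifier head per exit; earlier exits are cheaper but weaker.
        self.heads = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(chans[k + 1], num_classes))
            for k in range(num_exits)
        ])

    def forward(self, x, exit_point):
        """Run stages 1..exit_point and classify at that exit (1-indexed)."""
        for k in range(exit_point):
            x = self.stages[k](x)
        return self.heads[exit_point - 1](x)

model = MultiExitCNN()
frame = torch.randn(1, 3, 64, 64)
early = model(frame, exit_point=1)   # least delay, lowest accuracy
full = model(frame, exit_point=3)    # full-depth inference
```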
D. User Request Latency

We assume that a request brings the full user utility if it is completed before the deadline; otherwise, the user utility is zero. Therefore, to maximize the overall user utility, the latency of each request should meet its deadline [26]. We define $T_{max}$ as the maximum tolerated delay of each request. The user request latency differs between the offloading state and the migration state.

Latency of offloading: When the user offloads its request to an edge server, the request delay $l_f(t)$ is jointly determined by the DNN inference delay and the communication delay. We denote the delay for the edge server to perform model inference at the $j$-th early exit point by $d_f^j$, and $l_c(t)$ denotes the communication delay at $t$. Apparently, $l_c(t)$ includes the delay for transmitting the request from the user's device to the BS that the user is connected to at time $t$, and the forwarding delay, which mainly depends on the hop distance along the shortest communication path among these BSs. The constraints on $l_f(t)$ can be expressed as:

$$l_f(t) = S(t)\left(l_c(t) + d_f^j\right), \quad l_f(t) \le T_{max}, \quad \forall t \in \mathcal{T}. \tag{9}$$

Latency of migration: During migration, DNN inference is processed at the local device with a fixed early exit point $\gamma \in \mathcal{M}$. Hence the user request latency $l_e(t)$ is mainly determined by the computation delay. We denote the DNN inference delay while running at the local device by $d_e^\gamma$, and its constraints are as follows:

$$l_e(t) = (1 - S(t))\, d_e^\gamma, \quad l_e(t) \le T_{max}, \quad \forall t \in \mathcal{T}. \tag{10}$$
E. Problem Formulation

After introducing the user's decision variables and the request latency, we are ready to present the user utility maximization problem over a given time horizon. Here we define the utility as the inference accuracy of the DNN model. The accumulated user utility contains the utility while migrating and while offloading. We define the utility at the $j$-th early exit point as $u_j$, and the user request rate at time $t$ as $r_t$. When requests are offloaded, the user utility $U_o(t)$ at time $t$ is the sum of the utilities of all requests at that time, which can be expressed as:

$$U_o(t) = S(t)\, r_t\, u_j, \quad \forall t \in \mathcal{T}.$$

During migration, the utility at the fixed exit point $\gamma$ is $u_\gamma$, and the migration user utility $U_m(t)$ at time $t$ can be expressed as:

$$U_m(t) = (1 - S(t))\, r_t\, u_\gamma, \quad \forall t \in \mathcal{T}.$$

We formulate an accumulated user utility maximization problem over a given finite time horizon $T$ as follows:

$$\max_{x_i(t),\, y_j(t)} \ \sum_{t=1}^{T} \left(U_o(t) + U_m(t)\right) \tag{11}$$
$$\text{s.t.} \quad (1) - (10).$$
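A minimal sketch of the per-slot utility, combining the definitions of U_o(t) and U_m(t) with the deadline constraints (9) and (10); all numeric values below are illustrative (the accuracies loosely follow Table II in the evaluation, while the delays and rates are assumptions).

```python
# Sketch: per-slot utility. A request earns the chosen exit's accuracy u_j
# only if its delay meets the deadline T_max; while migrating, requests run
# locally at the fixed exit gamma. All numbers are illustrative.

def slot_utility(S_t, r_t, j, u, gamma, l_c, d_f, d_e, T_max):
    if S_t == 1:                        # offloading: U_o(t) = S(t) r_t u_j
        l_f = l_c + d_f[j]              # communication + edge inference (9)
        return r_t * u[j] if l_f <= T_max else 0.0
    l_e = d_e[gamma]                    # local inference at fixed exit    (10)
    return r_t * u[gamma] if l_e <= T_max else 0.0   # U_m(t)

u = {1: 0.700, 2: 0.712, 3: 0.760}      # accuracy per exit point
print(slot_utility(S_t=1, r_t=12, j=3, u=u, gamma=1,
                   l_c=0.020, d_f={3: 0.0185}, d_e={1: 0.030}, T_max=0.05))
# 9.12: twelve requests, each served at exit 3 within the deadline
```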
Solving the problem in (11) faces the following two challenges. Firstly, as the objective is the accumulated utility over a time horizon, it is difficult to obtain the complete future information (e.g., user trajectory and request frequency). Secondly, the user's decisions are coupled across different time slots; that is, the decision at the current slot affects decisions in the future. To handle these challenges, we begin with an ideal case where future information is available and develop an optimal offline solution. This helps provide some insights for solving the problem. Next, we focus on the challenging case without future information: we leverage advanced machine learning methods to predict limited information, and devise an online algorithm.

IV. OFFLINE OPTIMAL SERVICE MIGRATION AND EXIT POINT SELECTION ALGORITHM

In this section, we present the offline optimal solution of the service migration and DNN exit point selection problem through dynamic programming, when the complete future information of the user activities is assumed to be exactly known.

First, we show that this offline optimal problem possesses the property of optimal substructure. We define $(i, q, n)$ as the state of the user, in which $i \in \mathcal{T}$ represents time, $q \in \mathcal{B}$ represents the BS that hosts the user's service, and $n \in \mathcal{M}$ represents the selected DNN model exit point. Accordingly, $(i^-, q^-, n^-)$ is the previous state of $(i, q, n)$. We set $p \in \mathcal{B}$ and $m \in \mathcal{M}$ as the user's migration decision and exit point selection, respectively. Let $C((i, q, n), p, m)$ represent the number of time slots needed to complete $p$ and $m$. It can be expressed as

$$C((i, q, n), p, m) = \begin{cases} 1, & p = q, \\ s_p, & p \ne q, \ s_p + i \le T, \\ T - i, & p \ne q, \ s_p + i > T. \end{cases} \tag{12}$$

Here, $p = q$ indicates that the user decides to re-select the model exit point in the current state, which costs one time slot to complete. When the user chooses to migrate the service ($p \ne q$), the decision execution time is the service downtime ($s_p$), and the portion beyond $T$ is discarded.

Let $U((i, q, n), p, m)$ represent the sum of the utilities obtained during the completion of decisions $p$ and $m$. Note that DNN inference is processed at the local device with the fixed early exit point $\gamma$ during service downtime ($p \ne q$), and $f(i, q, n)$ represents the accumulated utility at state $(i, q, n)$. Then, given $(i, q, n)$ and the user's decisions $p$ and $m$, the state is transited to $(i^+, q^+, n^+)$, where

$$i^+ = i + C((i, q, n), p, m), \tag{13}$$
$$q^+ = p, \tag{14}$$
$$n^+ = m. \tag{15}$$

And the user obtains the following utility:

$$U((i, q, n), p, m) = \begin{cases} u_m r_i, & \text{if } p = q, \\ u_\gamma r_i \, C((i, q, n), p, m), & \text{otherwise.} \end{cases} \tag{16}$$

Thus the accumulated utility at $(i^+, q^+, n^+)$ is

$$f(i^+, q^+, n^+) = f(i, q, n) + U((i, q, n), p, m). \tag{17}$$

Let $f^*(i, q, n)$ denote the optimal accumulated utility at $(i, q, n)$; then we have

$$f^*(i^+, q^+, n^+) = \max_{p \in \mathcal{B},\, m \in \mathcal{M}} \left( U((i, q, n), p, m) + f^*(i, q, n) \right), \quad \text{s.t.} \ (12) - (17). \tag{18}$$

Since the problem can be regarded as a Bellman equation, a dynamic programming approach can be adopted to derive an offline optimal solution of our service migration and DNN exit
point selection problem. For ease of exposition, we transform the offline optimal problem into a longest-path problem by constructing a directed acyclic graph (DAG).

Fig. 2. Longest-path problem transformation of the offline service migration and DNN exit point selection problem over T = 5 time slots with three BSs B1, B2 and B3.

As shown in Fig. 2, we construct a graph $G = (V, E)$ to represent all possible service migration and DNN exit point selection decisions within $T$ time slots. Each vertex represents a state $(i, q, n)$ that the user can reach. Since the future information (user trajectory and request frequency) is known, the exit point $n$ can be determined once $i$ and $q$ are given at each state. Note that the source vertex $S$ represents the initial state (we set it as $(0, 1, n)$). Each state (except the initial state) is transited from its previous state by performing the corresponding decision. The destination vertex $E$ is an auxiliary vertex that ensures a single longest path can be found. Each edge weight on the DAG between two states represents the sum of the request utilities of executing the decisions, and the edges connecting to $E$ have zero weight. It is worth noting that if a user decision can be completed before $T$, we draw a directed edge between the two states. However, if the decision completes at a time beyond $T$, for example when a user in state B1 at T4 performs the decision of transferring to state B2, we instead draw a directed edge from B1 at T4 to the corresponding yellow auxiliary vertex B2; accordingly, the weight of that edge represents the sum of the utilities that the user can obtain from T4 to the end. The weight of the edge connecting each vertex of time $T$ to the yellow auxiliary vertices is zero. This completes the construction of the DAG.

We can derive the user's optimal strategy by finding the longest path from $S$ to $E$. Specifically, given all the information of user activities over $T$ time slots, the weights of all edges can be calculated, and the total weight of a path from source vertex $S$ to destination vertex $E$ hence represents the whole utility over the time horizon. Consequently, the optimal service migration and DNN model exit point selection strategy can be found by taking the longest path from $S$ to $E$. As shown in Fig. 2, we give the longest path for $T = 5$ with 3 BSs: each red vertex represents the state of the user at the corresponding time slot, and the vertex pointed to by the solid black edge is the user state after performing the decision. Obviously, since this longest-path problem has the optimal substructure property, it can be solved by the classical dynamic programming approach.

Algorithm 1 Offline Optimal Migration and Exit Point Selection Strategy (OMEPS) Algorithm
1: Parameter Notation
2: Vector $O_{(i,q,n)}$ is the optimal migration strategy; it contains a series of optimal states from the initial state to state $(i, q, n)$.
3: Initialization: initialize the initial state $(i^-, q^-, n^-) = (0, 1, 1)$, optimal strategy $O_{(i,q,n)} = \emptyset$, $O_{(0,1,1)} = \{(0, 1, 1)\}$, and the optimal accumulated utility of the initial state $f^*_{(0,1,1)} = 0$.
4: for each time slot $i = 1, ..., T$ do
5:   for all $q \in \mathcal{B}$ do
6:     Determine the optimal previous state according to decision $p$ by using (12)-(17), i.e., $(i^-, q^-, n^-)_{opt} = \arg\max_{(i^-,q^-,n^-)} \{U((i^-, q^-, n^-), p, m) + f^*_{(i^-,q^-,n^-)}\}$.
7:     if the optimal previous state $(i^-, q^-, n^-)_{opt}$ is found then
8:       Update the optimal decisions to the current state: $O_{(i,q,n)} = \{(i, q, n)\} \cup O_{(i^-,q^-,n^-)_{opt}}$.
9:       Update the optimal accumulated utility at the current state: $f^*_{(i,q,n)} = U((i^-, q^-, n^-)_{opt}, p, m) + f^*_{(i^-,q^-,n^-)_{opt}}$.
10:    else
11:      Update the optimal accumulated utility at the current state: $f^*_{(i,q,n)} = 0$.
12:    end if
13:  end for
14: end for
15: Pick the maximum accumulated utility state $(i, q, n)_{opt} = \arg\max_{(i,q,n)} f^*_{(i,q,n)}$ and set its corresponding policy $O_{(i,q,n)}$ as the offline optimal service migration and DNN model exit point selection strategy $O_{off}$.
Algorithm 1 shows the pseudocode of our optimization algorithm, which uses dynamic programming with memoization to find the optimal strategy $O$ for each time slot over a given finite time horizon. In the algorithm, we obtain the longest path (i.e., the optimal service migration and DNN model exit point selection strategy) for each state by solving the Bellman equation (i.e., line 6). Then we pick the path that contains the state with the highest accumulated utility at $T$ as the longest path (i.e., line 15), which is the optimal solution to the problem. For searching the longest path, the algorithm needs to enumerate at most $B^2$ possible states at each time slot. Thus, over the $T$ time slots, the time complexity of Algorithm 1 is $O(B^2 T)$.
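A compact sketch of this dynamic program is given below. It tracks only the value f over states (i, q); the strategy vector O of Algorithm 1 would additionally record the argmax choices. The exit selection is folded into a helper best_exit_utility(i, q), which is assumed to return the utility u_m of the best deadline-feasible exit while hosted at BS q (computable offline since the trace is known); the toy inputs at the bottom are illustrative.

```python
# Sketch of the OMEPS dynamic program (Algorithm 1) over states (i, q).
# r[i]: request rate at slot i; s[p]: downtime of migrating to BS p;
# u_gamma: local-exit utility; best_exit_utility: assumed helper (see above).

def omeps(T, B, s, r, u_gamma, best_exit_utility):
    NEG = float("-inf")
    f = [[NEG] * B for _ in range(T + 1)]     # f[i][q]: best utility at (i, q)
    f[0][0] = 0.0                              # initial state: hosted at BS 0
    for i in range(T):
        for q in range(B):
            if f[i][q] == NEG:
                continue
            for p in range(B):                 # decision p: stay or migrate
                if p == q:                     # re-select exit: C = 1     (12)
                    step, gain = 1, r[i] * best_exit_utility(i, q)
                else:                          # migrate: C = s_p, capped at T
                    step = min(s[p], T - i)
                    gain = u_gamma * r[i] * step           # local exits (16)
                f[i + step][p] = max(f[i + step][p],
                                     f[i][q] + gain)       # (17)-(18)
    return max(f[T])                           # longest-path value at time T

# Toy instance: 2 BSs, downtime of 2 slots each, constant request rate.
best = omeps(T=5, B=2, s=[2, 2], r=[3] * 5, u_gamma=0.70,
             best_exit_utility=lambda i, q: 0.78 if q == 0 else 0.76)
print(best)   # enumerates O(B^2) transitions per slot, matching the text
```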
V. ONLINE SERVICE MIGRATION AND EXIT POINT SELECTION ALGORITHM

So far we have presented the solution of the problem in (11) under the complete-future-information scenario as a baseline. In practice, it is challenging to obtain complete information, which motivates an online algorithm that does not require it. To this end, in this section, we combine some popular machine learning techniques (e.g., LSTM) to predict future information (e.g., mobility traces) to improve the long-term service performance with informed decision making. However, frequent prediction would incur large running costs (e.g., prediction latency), which are not affordable for resource-constrained user equipment. To well balance the performance-cost trade-off, we propose a proactive adaptive algorithm, named smart-MOMEPS, to jointly optimize the service migration and early exit point of DNN services; it integrates the MPC-based online migration and exit point selection strategy (MOMEPS) with an NN-based migration judgement to achieve higher average utilities at low costs.

A. MPC-Based Online Migration and Exit Point Selection Strategy

A wide variety of machine learning techniques (e.g., [29]–[32]) have been studied to estimate user behavior well by leveraging collected data. Among them, long short-term memory (LSTM) [33] is considered particularly efficient for time series prediction (user activities in our problem) due to its ability to keep a memory of previous inputs [34]. In addition, compared with other machine learning methods, such as plain DNNs, LSTM has a simpler network structure and less computation overhead, making it more suitable for resource-constrained mobile devices. Therefore, in our work, we exploit LSTM to predict future information, including the user trajectory and request demands. By incorporating these predictive results to assist decision optimization, we can get rid of extensive unnecessary migrations caused by uncertain user mobility (e.g., Ping-pong loops).

Fig. 3. Overview of the MPC-based online migration and exit point selection strategy.

The MPC-based online migration and exit point selection strategy works as follows. As illustrated in Fig. 3, at the beginning of each slot, the device uses its prediction mechanism to estimate the future information of the mobility trace and request demands within $W$ time slots before decision-making. Here, $W$ is the size of the prediction window. Subsequently, based on the predictive results, the device derives the optimal decisions of service migration and early exit point selection during the prediction window via the OMEPS algorithm.

The prediction mechanism in the online setting can hardly provide perfect predictions. Moreover, as the window size $W$ increases, the prediction error accumulates, which eventually leads to severe performance degradation. To enhance the robustness of informed decisions, we adopt a standard MPC-based approach for decision execution, i.e., only the first decision is implemented in each prediction window. In this way, the negative impact of accumulated prediction errors can be considerably alleviated, since the potentially terrible decisions in the prediction window (especially in the last few time slots) are rejected.

Algorithm 2 MPC-Based Online Migration and Exit Point Selection Strategy (MOMEPS) Algorithm
1: Parameter Notation
2: Vector $\pi_{(i,q,n)}$ is the optimal migration and exit point selection strategy of a prediction window; it contains a series of optimal states from the current time slot $t$ to $t + W$.
3: Initialization: $\tau = 1$, $t = 1$.
4: while current time slot $t \le T$ do
5:   $\tau = t$
6:   Predict the user mobility and request frequency of the future $W$ time slots $[\tau + 1, ..., \tau + W]$.
7:   Determine the optimal strategy $\pi_{(i,q,n)}$ of the prediction window $[\tau, ..., \tau + W]$ by using OMEPS.
8:   Select the first step $(i, q, n)$ of $\pi_{(i,q,n)}$ to execute.
9:   $t = i + 1$
10: end while

We summarize the MPC-based online migration and exit point selection strategy in Algorithm 2. In line 7, the user needs to perform the OMEPS algorithm to obtain the decisions of service migration and early exit point selection. Thus, over the $T$ time slots, the time complexity of Algorithm 2 is $O(B^2 W T)$.
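A minimal sketch of this receding-horizon loop follows. The helpers are assumed rather than defined here: predict stands for the LSTM forecaster of mobility and request rates, omeps_window for Algorithm 1 run on the W-slot window, and execute for applying one decision and returning the slot at which it completes (the i of Algorithm 2).

```python
# Sketch of the MOMEPS receding-horizon loop (Algorithm 2); the three helper
# callables are assumptions standing in for the components described above.

def momeps(T, W, predict, omeps_window, execute):
    t = 1
    while t <= T:
        forecast = predict(t, W)            # next W slots of user behavior
        plan = omeps_window(t, forecast)    # optimal plan over [t, t + W]
        i = execute(plan[0])                # standard MPC: apply only the
        t = i + 1                           # first step, discard the rest
```

Executing only plan[0] is what rejects the error-prone tail of each window; the cost is that prediction and OMEPS run at (almost) every slot, which is the overhead smart-MOMEPS targets next.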
B. Smart-MOMEPS

However, the MPC-based online migration and exit point selection strategy (MOMEPS) brings not only substantial improvements in service performance, but also heavy running costs. More explicitly, one slot's decision-making depends on a whole optimization over the prediction window, which includes the running cost (i.e., latency) of the prediction mechanism and the OMEPS algorithm. Such computational costs in MOMEPS largely increase the burden on resource-constrained mobile devices for decision optimization. To well balance the performance-cost trade-off at run time, we advocate a proactive adaptive algorithm, smart-MOMEPS, for enabling more moderated decision optimization on the fly.

We design the smart-MOMEPS algorithm based on observations of the MOMEPS execution results. Intuitively, MOMEPS avoids frequent service migrations to reduce the user utility degradation due to extensive service downtime. In this regard, we apply the MOMEPS algorithm over the whole process to measure the proportions of service migration and edge offloading (i.e., no migration) in the long-term policy optimization. As shown in Fig. 4, offloading decisions occupy a dominant position (i.e., 81.82%) to reduce service downtime, so that DNN services can be processed as much as possible at the resource-rich edge server. Inspired by this vital observation, a core idea of our proactive adaptive algorithm is to design a light-weight approach to help the user quickly determine whether a DNN service migration is needed or not. As illustrated in Fig. 5, once a DNN service migration is deemed necessary at the current slot, the MPC-based predictive policy is adopted to derive an efficient migration decision; otherwise, the device offloads its task with an optimal exit point selection to achieve high service performance. Clearly, the number of executions of the MPC-based predictive policy will be significantly reduced, leading to lower computational overheads.

Fig. 4. The percentage of migration and offloading throughout the MPC execution.

Fig. 5. Overview of the smart-MOMEPS online migration strategy.

Inspired by [35], we adopt a light-weight neural network (NN) model, due to its powerful representation capacity, to dynamically determine whether to migrate the DNN service or not based on the current user state. Particularly, the device can gather sufficient sample traces beforehand and train the NN model offline, and then only run the lightweight model inference online.¹ To well characterize the underlying mapping of the NN model, we consider two key items observed by the device at the current time slot, i.e., the frequency of requests and the hop distance between the service and the user, as the input of the NN model. The rationales are as follows: on the one side, a long transmission delay with a large hop distance would stimulate DNN service migration, aiming at higher utilities over successive time slots; on the other side, the request demand is an essential indicator of service performance. Particularly, a high frequency of user requests (e.g., the number of video frames in object detection) implies high potential costs for DNN service migration, since the DNN model must be inferred at the local device during the migration procedure. Finally, we summarize the smart-MOMEPS algorithm in Algorithm 3.

¹ Note that the neural network can also be trained in an online manner. Specifically, we collect the results of the MOMEPS execution in the migrated case as training data, whose volume gradually increases as time goes by. Considering the deployment costs introduced on the resource-constrained device, we adopt a sufficiently light-weight neural network with two fully-connected layers, where only a few data samples are needed for good training.

Algorithm 3 Smart-MOMEPS Algorithm
1: Initialization: $t = 1$.
2: while current time slot $t \le T$ do
3:   Determine whether the current service needs to be migrated with the trained NN model.
4:   if the current service needs to be migrated then
5:     Determine the current migration decision $(i, q, n)$ with MOMEPS and execute it.
6:     $t = i + 1$
7:   else
8:     $t = t + 1$
9:   end if
10: end while
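The gating step in Algorithm 3 can be sketched as follows: a two-layer network, as in footnote 1, maps (request frequency, hop distance) to a migrate / don't-migrate decision that controls the expensive MOMEPS call. The layer sizes and the 0.5 threshold are illustrative assumptions.

```python
# Sketch of the smart migration judgement gating MOMEPS (Algorithm 3).
# Layer widths and the decision threshold are illustrative assumptions.
import torch
import torch.nn as nn

judge = nn.Sequential(                 # two fully-connected layers, cf. fn. 1
    nn.Linear(2, 16), nn.ReLU(),
    nn.Linear(16, 1), nn.Sigmoid(),
)

def smart_momeps_step(t, req_rate, hop_dist, run_momeps):
    feats = torch.tensor([[req_rate, hop_dist]], dtype=torch.float32)
    if judge(feats).item() > 0.5:      # NN says the service should move
        i = run_momeps(t)              # full prediction + OMEPS, rarely taken
        return i + 1                   # resume after the migration completes
    return t + 1                       # offload in place with the best exit
```

Because the common case (no migration) costs only one tiny forward pass, the prediction-plus-OMEPS pipeline runs only on the minority of slots flagged by the judge, which is the source of the overhead reduction reported in the evaluation.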
set the interval of one time slot to be 20 seconds, and the
VI. EVALUATION

In this section, we carry out trace-driven simulations to evaluate the performance of our proposed algorithms and demonstrate their effectiveness compared with benchmark schemes.

A. Simulation Setup

In this experiment, we take AlexNet [36] as the DNN service, and we employ the BranchyNet framework to train this AlexNet model with five exit points. We test the delay and accuracy of inference at each exit point using a desktop PC (Intel i7-6700 CPU and 8 GB memory). The outcome is shown in Table II. For user mobility, we utilize the ONE simulator [37] to generate user movement traces. ONE simulates a scenario where mobile devices move along the roads and streets of an urban city. We chose the movement data of cars, and the speed of each car ranges from 10 km/h to 50 km/h. For simplicity, we divide the whole area into 169 square parts, and each part has a base station that can provide service to the user and covers a 200 m × 200 m area. In our simulation, we adopt a discrete time-slotted model to characterize the system dynamics. We set the interval of one time slot to 20 seconds, and the wireless connection and system dynamics remain unchanged during each time slot. For user trajectory and request frequency prediction, we utilize LSTM to obtain the results.

TABLE II
LATENCY AND ACCURACY AT EACH EXIT POINT OF ALEXNET.

Exit point      1      2      3      4      5
Latency (ms)    9.4    14.0   18.5   24.4   30.2
Accuracy (%)    70.0   71.2   76.0   77.7   78.0

B. Benchmark Schemes

We compare our proposed algorithms with the aforementioned offline optimal solution and the following five benchmarks:

1) Always Migration (AM): the user always migrates the service to the base station where it is currently located.
2) Lazy Migration (LM): the service is not migrated until the distance between the base station hosting it and the user exceeds a threshold. Once the migration is triggered, the service image is migrated to the base station where the user is currently located.

3) Predictive Lazy Migration (PLM): a prediction-based lazy migration algorithm proposed in [38]. It leverages one-shot prediction to improve the LM algorithm.

4) Lazy MPC (L-MPC): this algorithm is proposed to reduce the computation overhead of MPC. The basic idea of L-MPC is to use MPC only when the migration condition of LM is met.

5) Fixed Horizon Control (FHC): unlike the standard MPC-based algorithm, FHC executes all the decisions within a prediction window instead of only the first one [39].
C. Algorithm Performance Comparisons

We first evaluate our proposed algorithms and the five benchmarks. The prediction window of the MPC-based algorithms is set to 5 time slots. We define an efficiency metric as the ratio of the overall user utility obtained by an algorithm to the utility obtained by the offline optimal algorithm. The numerical results are shown in Fig. 6: our proposed algorithms are stable across different time slots and always achieve more notable effects than the benchmarks. The efficiency of smart-MOMEPS is 96%, which is almost the same as that of MOMEPS (97%). This result demonstrates that our proposed smart migration judgement approach can effectively help the user determine whether migration is needed based on the current user state.

Fig. 6. Algorithm efficiency at different time slots (W = 5 for MPC-based algorithms).

Besides, we can observe that the FHC algorithm has the worst performance, with only 58% of the offline optimal. This is because the decisions in the last few time slots within the prediction window deviate far from the optimal decisions due to the accumulation of prediction errors, which can severely reduce the algorithm efficiency. Compared to FHC, smart-MOMEPS achieves 1.6 times the efficiency and better robustness. Except for FHC, the algorithms that leverage future information work better than those that do not. The reason is that we consider service downtime in this work: if the service blindly follows the user's trajectory (AM and LM), the service cannot be migrated to a suitable BS in most cases due to user mobility. In contrast, future information can help the user migrate the service as appropriately as possible. Compared to LM, PLM helps the user avoid unnecessary migrations by using the information of the next slot. However, when the service needs to be migrated, PLM cannot guide the user on where to migrate the service, since the downtime spans several time slots. Accordingly, the far-sighted smart-MOMEPS can be more effective than PLM.

D. Algorithm Execution Cost

We also evaluate the computation overhead reduction achieved by smart-MOMEPS under different time slots. We define the computation overhead of these algorithms as the number of executions of prediction and OMEPS. From the results in Fig. 7, smart-MOMEPS reduces the computation overhead by an average factor of 3.1 compared with MOMEPS. Although the computation overhead of smart-MOMEPS is higher than that of Lazy-MPC, it is still worthwhile considering the better algorithm efficiency. To further verify the performance of our algorithm on resource-constrained devices, we evaluate the suitability of our algorithm on mobile devices by measuring the execution latency on different devices. The results are shown in Table III. Compared to the interval of a time slot (20 s), our algorithm is lightweight enough to help the user make migration decisions quickly.

Fig. 7. Computation overhead of the MPC-based algorithms at different time slots.

TABLE III
LATENCY OF LSTM AND OMEPS ON DIFFERENT DEVICES.

Method    Raspberry Pi    Jetson NANO    Jetson TX2
OMEPS     102.4 ms        39.1 ms        22.3 ms
LSTM      44.0 ms         24.5 ms        24.2 ms
E. Impact of Prediction Error and Look-ahead Window Size

Furthermore, we investigate both the impact of the prediction error and of how far we look into the future on the algorithm efficiency. We use two prediction methods to obtain the future information: one is long short-term memory (LSTM) and the other is the autoregressive integrated moving average model (ARIMA) [40]. The prediction accuracies of these two methods are 93.3% and 82.5%, respectively. Intuitively, the more accurate the MPC model's predictions are, the better the algorithm's performance will be, and the experimental data confirm this intuition. As shown in Fig. 8, the efficiency of the LSTM-based algorithms is 5.3% higher than that of the ARIMA-based algorithms on average.

Fig. 8. Algorithm efficiency at different model accuracies and prediction window sizes.

As for the influence of the prediction window size W, Fig. 8 shows that the efficiency of the algorithms first improves as W increases. This is because of the existence of service downtime: when the service downtime for migrating to a base station exceeds the size of the prediction window, the algorithm will not select this base station as the target of migration. Moreover, the long-term effect of migration is not considered when W is small, which also causes the solution to underperform. Therefore, when the size of the prediction window is smaller than the maximum service downtime, the performance of the algorithm improves as the prediction window grows. Nevertheless, this improvement does not last forever: Fig. 8 shows that the performance of these algorithms begins to decline when W is greater than 6. This can be explained by the fact that if there were no errors in the future prediction, W could be set as large as possible to obtain the best long-term performance. However, we cannot predict the future perfectly in practice, and the farther ahead we look into the future, the less accurate the prediction becomes. Therefore, when the size of the prediction window continues to increase, the performance of the algorithm gradually declines.

F. Impact of Different DNN Model Exit Points

To evaluate how the number of early exit points of the DNN model influences the algorithm performance, we employ an AlexNet model as the DNN service and measure the performance of our algorithms with 2 to 5 exit points. To concisely show the performance difference, we characterize the performance of each algorithm as its ratio to the offline optimal with 5 exit points.

Fig. 9. Algorithm efficiency and migration times at different numbers of early exit points.

The algorithm efficiency and migration time results are shown in Fig. 9. As the number of exit points increases, the efficiency of the algorithms improves and fewer service migrations are performed. This result demonstrates that model exit points play a positive role in DNN service migration. Assuming there is no exit point in a DNN model, when the user-perceived delay exceeds the maximum tolerated delay of the service, the user can only migrate the service, since the utility will be zero if the deadline is missed. During the service downtime caused by migration, service requests will be processed locally with poor utility. Instead, an exit point provides the user with an alternative to migration: the user can choose to exit DNN inference early to reduce the computation delay, which can offset the increased transmission delay caused by user movement. This requires very little utility sacrifice compared to migration. In a nutshell, more exit points lead to more user utility and fewer migrations.

G. Impact of the Base Station Density

Finally, we show the impact of different base station densities and observe the changes in our algorithms. It is worth noting that the service range of each base station is constant in our setup, and we control the density of base stations by changing their properties (some only forward user requests, while the others are capable of both forwarding and executing user requests). For simplicity, we characterize the
efficiency of each algorithm under different BS densities as its ratio to the offline optimal when the BS density is 12.57. Fig. 10 shows that the efficiency of the algorithms grows as the density of BSs increases from 4.16 to 12.57 per km². This is because more base stations bring more migration choices when the user migrates services, and it is also easier for the user to migrate services to a closer base station for higher utility. To summarize, the denser the base stations are, the better our algorithms perform.

Fig. 10. Algorithm efficiency at different base station densities.
VII. C ONCLUSTION efficient container startup at the edge via dependency scheduling,”
in 3rd USENIX Workshop on Hot Topics in Edge Computing
In this paper, we investigate a user-centric DNN service (HotEdge 20). USENIX Association, Jun. 2020. [Online]. Available:
migration and exit point selection problem with various service https://www.usenix.org/conference/hotedge20/presentation/fu
downtime in the mobile edge computing environment. We [14] tensorflow, “Tensorflow docker images,” [EB/OL], https://hub.docker.
com/r/tensorflow/tensorflow Accessed March 7, 2022.
leverage the exit point selection and layer sharing feature of [15] S. Teerapittayanon, B. McDanel, and H. Kung, “Branchynet: Fast
the container technique to alleviate performance degradation inference via early exiting from deep neural networks,” in 2016 23rd
caused by inevitable service downtime. To maximize long- International Conference on Pattern Recognition (ICPR), 2016, pp.
2464–2469.
term user utility, we first propose an offline optimal migra- [16] E. F. Camacho and C. Bordons, Model Predictive Control. Model
tion and exit point selection strategy (OMEPS) algorithm Predictive control, 2007.
by leveraging dynamic programming when complete future [17] D. Zhao, T. Yang, Y. Jin, and Y. Xu, “A service migration strategy based
on multiple attribute decision in mobile edge computing,” in 2017 IEEE
information is available. To deal with the uncertain user be- 17th International Conference on Communication Technology (ICCT),
havior, we incorporate a Model Predictive Control framework 2017, pp. 986–990.
to the OMEPS algorithm. And then construct a proactive [18] T. Ouyang, R. Li, X. Chen, Z. Zhou, and X. Tang, “Adaptive user-
managed service placement for mobile edge computing: An online
service migration and DNN exit point selection (MOMEPS) learning approach,” in IEEE INFOCOM 2019 - IEEE Conference on
algorithm. To cope with the heavy computation overheads Computer Communications, 2019, pp. 1468–1476.
of MOMEPS, we propose a cost-efficient algorithm, smart- [19] A. Ksentini, T. Taleb, and M. Chen, “A markov decision process-
based service migration procedure for follow me cloud,” in 2014 IEEE
MOMEPS, which introduces a neural network based smart International Conference on Communications (ICC), 2014, pp. 1350–
migration judgement to navigate the performance and compu- 1354.
tation overhead trade-off. Finally, we conduct extensive trace- [20] S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, and K. K. Leung,
driven experiments to evaluate our online algorithm. We also “Mobility-induced service migration in mobile micro-clouds,” in 2014
IEEE Military Communications Conference, 2014, pp. 835–840.
explore the performance of our algorithms under a variety of [21] H. Ma, Z. Zhou, and X. Chen, “Predictive service placement in mo-
system settings and give corresponding analysis. bile edge computing,” in 2019 IEEE/CIC International Conference on
Communications in China (ICCC), 2019, pp. 792–797.
[22] Y. Zhang, L. Jiao, J. Yan, and X. Lin, “Dynamic service placement for
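To complement this summary, the following minimal Python sketch illustrates the control flow of the smart migration judgement described above: a lightweight neural-network predictor gates the expensive MPC-based MOMEPS planning at each time slot. All names here (smart_momeps_step, judge, momeps_plan, current_bs, current_exit) are hypothetical placeholders, not our actual implementation.

    # Conceptual sketch of the smart-MOMEPS decision loop (placeholder names).
    from typing import Callable, Tuple

    def smart_momeps_step(
        state: dict,                                     # user/system state: location, current placement, etc.
        judge: Callable[[dict], float],                  # NN judgement: estimated probability that migration pays off
        momeps_plan: Callable[[dict], Tuple[int, int]],  # MPC-based planner: returns (target BS, DNN exit point)
        threshold: float = 0.5,
    ) -> Tuple[int, int]:
        """Decide the (base station, DNN exit point) for the current time slot."""
        if judge(state) >= threshold:
            # Migration is predicted to be worthwhile, so pay the computation
            # cost of solving the MPC problem over the predicted horizon.
            return momeps_plan(state)
        # Otherwise skip the expensive planner: keep the service on its
        # current base station and reuse the current exit point.
        return state["current_bs"], state["current_exit"]

The intent of this design is that the inexpensive judgement runs in every time slot, while the costly horizon optimization is invoked only when a migration is likely to improve long-term utility.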
REFERENCES
[1] M. Hanyao, Y. Jin, Z. Qian, S. Zhang, and S. Lu, “Edge-assisted online on-device object detection for real-time video analytics,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[2] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
[3] C. Li, M. Song, M. Zhang, and Y. Luo, “Effective replica management for improving reliability and availability in edge-cloud computing environment,” Journal of Parallel and Distributed Computing, vol. 143, pp. 107–128, 2020.
[4] Q. Zhang, Q. Zhu, M. F. Zhani, R. Boutaba, and J. L. Hellerstein, “Dynamic service placement in geographically distributed clouds,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 762–772, 2013.
[5] T. Ouyang, Z. Zhou, and X. Chen, “Follow me at the edge: Mobility-aware dynamic service placement for mobile edge computing,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 10, pp. 2333–2345, 2018.
[6] V. Farhadi, F. Mehmeti, T. He, T. F. L. Porta, H. Khamfroush, S. Wang, K. S. Chan, and K. Poularakis, “Service placement and request scheduling for data-intensive applications in edge clouds,” IEEE/ACM Transactions on Networking, vol. 29, no. 2, pp. 779–792, 2021.
[7] R. Urgaonkar, S. Wang, T. He, M. Zafer, K. Chan, and K. K. Leung, “Dynamic service migration and workload scheduling in edge-clouds,” Performance Evaluation, vol. 91, pp. 205–228, 2015.
[8] A. Machen, S. Wang, K. K. Leung, B. J. Ko, and T. Salonidis, “Live service migration in mobile edge clouds,” IEEE Wireless Communications, vol. 25, no. 1, pp. 140–147, 2018.
[9] L. Ma, S. Yi, N. Carter, and Q. Li, “Efficient live migration of edge services leveraging container layered storage,” IEEE Transactions on Mobile Computing, vol. 18, no. 9, pp. 2020–2033, 2019.
[10] L. Gu, D. Zeng, J. Hu, B. Li, and H. Jin, “Layer aware microservice placement and request scheduling at the edge,” in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–9.
[11] B. Xu, S. Wu, J. Xiao, H. Jin, Y. Zhang, G. Shi, T. Lin, J. Rao, L. Yi, and J. Jiang, “Sledge: Towards efficient live migration of docker containers,” in 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 2020, pp. 321–328.
[12] S. Nadgowda, S. Suneja, N. Bila, and C. Isci, “Voyager: Complete container state migration,” in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 2137–2142.
[13] S. Fu, R. Mittal, L. Zhang, and S. Ratnasamy, “Fast and efficient container startup at the edge via dependency scheduling,” in 3rd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 20). USENIX Association, Jun. 2020. [Online]. Available: https://www.usenix.org/conference/hotedge20/presentation/fu
[14] TensorFlow, “TensorFlow Docker images,” [Online]. Available: https://hub.docker.com/r/tensorflow/tensorflow, accessed March 7, 2022.
[15] S. Teerapittayanon, B. McDanel, and H. Kung, “Branchynet: Fast inference via early exiting from deep neural networks,” in 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 2464–2469.
[16] E. F. Camacho and C. Bordons, Model Predictive Control. Springer, 2007.
[17] D. Zhao, T. Yang, Y. Jin, and Y. Xu, “A service migration strategy based on multiple attribute decision in mobile edge computing,” in 2017 IEEE 17th International Conference on Communication Technology (ICCT), 2017, pp. 986–990.
[18] T. Ouyang, R. Li, X. Chen, Z. Zhou, and X. Tang, “Adaptive user-managed service placement for mobile edge computing: An online learning approach,” in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 1468–1476.
[19] A. Ksentini, T. Taleb, and M. Chen, “A markov decision process-based service migration procedure for follow me cloud,” in 2014 IEEE International Conference on Communications (ICC), 2014, pp. 1350–1354.
[20] S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, and K. K. Leung, “Mobility-induced service migration in mobile micro-clouds,” in 2014 IEEE Military Communications Conference, 2014, pp. 835–840.
[21] H. Ma, Z. Zhou, and X. Chen, “Predictive service placement in mobile edge computing,” in 2019 IEEE/CIC International Conference on Communications in China (ICCC), 2019, pp. 792–797.
[22] Y. Zhang, L. Jiao, J. Yan, and X. Lin, “Dynamic service placement for virtual reality group gaming on mobile edge cloudlets,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 8, pp. 1881–1897, 2019.
[23] K. Kawashima, T. Otoshi, Y. Ohsita, and M. Murata, “Dynamic placement of virtual network functions based on model predictive control,” in NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, 2016, pp. 1037–1042.
[24] M. Kumazaki and T. Tachibana, “Optimal vnf placement and route selection with model predictive control for multiple service chains,” in 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), 2020, pp. 1–2.
[25] X. Tian, J. Zhu, T. Xu, and Y. Li, “Mobility-included dnn partition
offloading from mobile devices to edge clouds,” Sensors, vol. 21, no. 1,
2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/1/229
[26] Z. Wang, W. Bao, D. Yuan, L. Ge, N. H. Tran, and A. Y.
Zomaya, “Accelerating on-device dnn inference during service outage
through scheduling early exit,” Computer Communications, vol. 162,
pp. 69–82, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0140366420318818
[27] Z. Zhou, S. Yu, W. Chen, and X. Chen, “Ce-iot: Cost-effective cloud-
edge resource provisioning for heterogeneous iot applications,” IEEE
Internet of Things Journal, vol. 7, no. 9, pp. 8600–8614, 2020.
[28] E. Li, L. Zeng, Z. Zhou, and X. Chen, “Edge ai: On-demand accelerating
deep neural network inference via edge computing,” IEEE Transactions
on Wireless Communications, vol. 19, no. 1, pp. 447–457, 2020.
[29] C. Wang, L. Ma, R. Li, T. S. Durrani, and H. Zhang, “Exploring
trajectory prediction through machine learning methods,” IEEE Access,
vol. 7, pp. 101441–101452, 2019.
[30] H. Gebrie, H. Farooq, and A. Imran, “What machine learning predictor
performs best for mobility prediction in cellular networks?” in 2019
IEEE International Conference on Communications Workshops (ICC
Workshops), 2019, pp. 1–6.
[31] C. Yang, X. Shi, L. Jie, and J. Han, “I know you’ll be back: Interpretable
new user clustering and churn prediction on a mobile social application,”
in Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining, 2018, pp. 914–922.
[32] J. Feng, Y. Li, C. Zhang, F. Sun, F. Meng, A. Guo, and D. Jin, “Deep-
move: Predicting human mobility with attentional recurrent networks,”
in Proceedings of the 2018 world wide web conference, 2018, pp. 1459–
1468.
[33] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[34] F. Altché and A. de La Fortelle, “An lstm network for highway trajectory
prediction,” in 2017 IEEE 20th international conference on intelligent
transportation systems (ITSC). IEEE, 2017, pp. 353–359.
[35] T. Ouyang, X. Chen, L. Zeng, and Z. Zhou, “Cost-aware edge resource
probing for infrastructure-free edge computing: From optimal stopping
to layered learning,” in 2019 IEEE Real-Time Systems Symposium
(RTSS), 2019, pp. 380–391.
[36] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
with deep convolutional neural networks,” Advances in neural informa-
tion processing systems, vol. 25, 2012.
[37] A. Keränen, J. Ott, and T. Kärkkäinen, “The ONE Simulator for
DTN Protocol Evaluation,” in SIMUTools ’09: Proceedings of the 2nd
International Conference on Simulation Tools and Techniques. New
York, NY, USA: ICST, 2009.
[38] Q. Wu, X. Chen, Z. Zhou, and L. Chen, “Mobile social data learning for
user-centric location prediction with application in mobile edge service
migration,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 7737–
7747, 2019.
[39] M. Lin, Z. Liu, A. Wierman, and L. L. H. Andrew, “Online algorithms
for geographical load balancing,” in 2012 International Green Comput-
ing Conference (IGCC), 2012, pp. 1–10.
[40] K.-L. Li, C.-J. Zhai, and J.-M. Xu, “Short-term traffic flow prediction
using a methodology based on arima and rbf-ann,” in 2017 Chinese
Automation Congress (CAC), 2017, pp. 2804–2807.