JSA D 22 00243 Reviewer
Abstract: Edge intelligence (EI) has become a trend that pushes the deep learning frontier to the network edge.
In this paper, we consider user-centric management of DNN inference service migration and
exit point selection, aiming to maximize overall user utility (e.g., DNN model inference
accuracy) under various service downtimes. We first leverage dynamic programming to propose
an optimal offline migration and exit point selection strategy (OMEPS) algorithm for the case
when complete future information on user behavior is available. To suit the more practical
setting without complete future information, we incorporate the OMEPS algorithm
into a model predictive control (MPC) framework and construct a mobility-aware
service migration and DNN exit point selection (MOMEPS) algorithm,
which improves long-term service utility with only limited predicted future information.
However, the heavy computation overhead of MOMEPS burdens mobile
devices, so we further advocate a cost-efficient algorithm, named Smart-MOMEPS, which
introduces a smart migration judgement based on neural networks to control the
invocation of the MOMEPS algorithm by wisely estimating whether the DNN service should
be migrated or not.
WITH the explosive development of Internet of Things (IoT) devices, an increasing number of Deep Neural Network (DNN)-driven IoT applications, including face recognition, autonomous driving and augmented reality, are emerging. Generally, such applications require massive computing resources to guarantee low latency and high inference accuracy, which are often unavailable due to mobile devices' limited capabilities.

… the user goes back and forth between two BSs' coverage) incurs unnecessary migrations and heavy overheads. Therefore, it is critical to dynamically migrate a DNN service in a moderated way to follow the user mobility.

Existing work on edge service migration has regarded the container as a promising way for service provision [9]–[12]. Compared to a traditional virtual machine (VM), a container is a more lightweight virtualization technique and takes up less storage.
… delay. To combat the service outage, we could run the DNN service on a mobile device with a proper exit point during migration. However, exiting inference at an early DNN layer results in a loss of accuracy. As the network distance continues to increase, the benefits of early exit gradually decrease, since the inference needs to exit at an ever earlier exit point.

As illustrated in Fig. 1, a mobile vehicle travels around three regions, each containing a BS endowed with an edge server.

[Fig. 1 residue: a DNN service container image (Layer 1 … Layer n; conv/FC layers with Exit 1 and Exit 2) pulled from a Container Image Registry to edge servers EDGE1–EDGE3.]
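To make the accuracy/latency trade-off of early exits concrete, the following sketch picks the deepest exit point whose on-device delay still meets the request deadline, as described above. The exit profile values and names here are hypothetical, not measurements from the paper.

```python
# Hypothetical per-exit profile: (exit point, inference delay in ms, top-1 accuracy).
# Deeper exits are slower but more accurate, mirroring the trade-off in the text.
EXIT_PROFILE = [(1, 12.0, 0.68), (2, 19.0, 0.72), (3, 28.0, 0.76)]

def pick_exit(t_max_ms):
    """Return the deepest (most accurate) exit whose delay fits the deadline."""
    feasible = [e for e in EXIT_PROFILE if e[1] <= t_max_ms]
    if not feasible:
        return EXIT_PROFILE[0]  # degrade to the earliest exit if nothing fits
    return max(feasible, key=lambda e: e[2])

print(pick_exit(25.0))  # -> (2, 19.0, 0.72): exit 3 misses the 25 ms deadline
```

As the tolerable delay shrinks (e.g., as network distance grows during migration), the selected exit moves earlier and accuracy drops, which is exactly the degradation the paper aims to limit.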
… system dynamics (e.g., user mobility and request frequency). The time horizon is divided into time slots T = {0, ..., T}, matching the time scale at which service migration and exit point selection decisions are updated. For ease of exposition, Table I summarizes the introduced notations.

TABLE I
SUMMARY OF NOTATIONS

Notation   Definition
B          The set of all the base stations
M          All early exit points of the DNN model
x_i(t)     Whether the user decides to offload the service to BS i at time t
y_j(t)     Whether j is the selected early exit point at time t
T_max      The maximum tolerated delay of each request
r_t        The request rate at time t
d_f^j      Inference delay of the j-th exit point on the edge server
d_e^γ      Inference delay of the fixed γ-th exit point on the local device
l_f(t)     Total delay of a request while offloading the service at t
l_c(t)     Communication delay while offloading the service at t
l_e(t)     Total delay of a request while executing locally at t
s_i        Time cost for BS i to fetch container images
λ(t)       Downtime caused by migrating the service at t
S(t)       Whether the user is in the offloading state at t
U_o(t)     User utility while offloading at t
U_m(t)     User utility while migrating at t

… constraints of DNN service execution should be satisfied:

    λ(t) = Σ_{i∈B} s_i · 1{x_i(t) > x_i(t−1)},  ∀t ∈ T,    (3)

    x_i(τ) ≤ 1 − 1{x_i(t) > x_i(t−1)},  t + 1 ≤ τ ≤ t + λ(t) − 1,  ∀t ∈ T, ∀i ∈ B,    (4)

where 1{·} is the indicator function. Constraint (3) gives the current service downtime for migrating the service to BS i when x_i(t) > x_i(t−1) (i.e., x_i(t) = 1 and x_i(t−1) = 0). Constraint (4) enforces local execution during the migration procedure, i.e., the user cannot make a migration decision from t + 1 to t + λ(t) − 1 after deciding to migrate to BS i at time t, since the service is being migrated from t to t + λ(t) − 1.

To clearly describe the current state of the user, we use S(t) to indicate whether the user is migrating or offloading. S(t) = 1 represents that the user is offloading computation to a BS at t. Accordingly, S(t) = 0 when the user is migrating its service (executing requests locally). Obviously, there is only one state for a user at a given time slot. The expressions are listed below:

    S(t) = Σ_{i∈B} x_i(t) − 1{x_i(t) > x_i(t−1)},  ∀t ∈ T,    (5)

    S(t) ∈ {0, 1},  ∀t ∈ T.    (6)
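The downtime and state definitions in (3) and (5) can be computed directly from the binary decision vectors. The sketch below is an illustration, not the paper's code; the function names and the example values of s_i are assumptions.

```python
def downtime(x_prev, x_curr, s):
    """Eq. (3): lambda(t) = sum_i s_i * 1{x_i(t) > x_i(t-1)}."""
    return sum(s[i] for i in range(len(s)) if x_curr[i] > x_prev[i])

def offloading_state(x_prev, x_curr):
    """Eq. (5): S(t) = sum_i x_i(t) - 1{x_i(t) > x_i(t-1)}."""
    offload = sum(x_curr)
    migrating = any(x_curr[i] > x_prev[i] for i in range(len(x_curr)))
    return offload - int(migrating)  # 1 = offloading, 0 = migrating

# Example with three BSs: the user triggers a migration to BS 1 at time t.
s = [2, 3, 2]        # hypothetical image-fetch cost s_i per BS, in time slots
x_prev = [1, 0, 0]   # served by BS 0 at t-1
x_curr = [0, 1, 0]   # decides to migrate to BS 1 at t
print(downtime(x_prev, x_curr, s))       # 3 slots of downtime
print(offloading_state(x_prev, x_curr))  # 0: executing requests locally
```

When no migration is triggered (x_curr equals x_prev), the indicator is zero, so λ(t) = 0 and S(t) = 1, matching the offloading state in the text.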
point selection problem. For ease of exposition, we transform the offline optimal problem into a longest-path problem by constructing a directed acyclic graph (DAG).

[Fig. 2 residue: vertices B1, B2, B3 laid out over time slots T1–T5 between source vertex S and destination vertex E.]

Fig. 2. Longest-path problem transformation of the offline service migration and DNN exit point selection problem over T = 5 time slots with three BSs B1, B2 and B3.

As shown in Fig. 2, we construct a graph G = (V, E) to represent all possible service migration and DNN exit point selection decisions within T time slots. Each vertex represents a state (i, q, n) that the user can reach. Since the future information (user trajectory and request frequency) is known, the exit point n can be determined once i and q are given at each state. Note that the source vertex S represents the initial state (we set it as (0, 1, n)). Each state (except the initial state) is transited from the previous state by performing the corresponding decision. The destination vertex E is an auxiliary vertex ensuring that a single longest path can be found. Each edge weight on the DAG between two states represents the sum of the request utilities of executing the decisions, and the edges connecting to E have zero weight. It is worth noting that, if the user's decision can be completed before T, we can draw a directed edge between two states. However, if the decision completes at a time beyond T, for example when a user in state B1 at T4 performs the decision of transferring to state B2, we draw a directed edge from B1 at T4 to the corresponding yellow auxiliary vertex B2. Accordingly, the weight of this edge represents the sum of the utilities that the user can obtain from T4 to the end. The weight of the edge connecting each vertex of time T to the yellow auxiliary vertices is zero. This completes the construction of the DAG.

We can derive the user's optimal strategy by finding the longest path from S to E. Specifically, given all the information on user activities over T time slots, the weights of all edges can be calculated, and the total weight of a path from source vertex S to destination vertex E hence represents the whole utility over the time horizon. Consequently, the optimal service migration and DNN model exit point selection strategy can be found by taking the longest path from S to E. As shown in Fig. 2, we give the longest path for T = 5 with 3 BSs. Each red vertex represents the state of the user at the corresponding time slot, and the vertex pointed to by the solid black edge is the user state after performing the decision. Obviously, since this longest-path problem has an optimal-substructure property, it can be solved by the classical dynamic programming approach.

Algorithm 1 Offline Optimal Migration and Exit Point Selection Strategy (OMEPS) Algorithm
1: Parameter Notation:
2: Vector O(i,q,n) is the optimal migration strategy; it contains a series of optimal states from the initial state to state (i, q, n).
3: Initialization: initialize the initial state (i⁻, q⁻, n⁻) = (0, 1, 1), optimal strategy O(i,q,n) = ∅, O(0,1,1) = {(0, 1, 1)}, and optimal accumulated utility of the initial state f*(0,1,1) = 0.
4: for each time slot i = 1, ..., T do
5:     for all q such that q ∈ B do
6:         Determine the optimal previous state according to decision p by using (12)–(17), i.e., (i⁻, q⁻, n⁻)_opt = arg max_{(i⁻,q⁻,n⁻)} {U((i⁻, q⁻, n⁻), p, m) + f*(i⁻,q⁻,n⁻)}.
7:         if the optimal previous state (i⁻, q⁻, n⁻)_opt is found then
8:             Update the optimal decisions to the current state: O(i,q,n) = {(i, q, n)} ∪ O(i⁻,q⁻,n⁻)_opt.
9:             Update the optimal accumulated utility at the current state: f*(i,q,n) = U((i⁻, q⁻, n⁻)_opt, p, m) + f*(i⁻,q⁻,n⁻)_opt.
10:        else
11:            Update the optimal accumulated utility at the current state: f*(i,q,n) = 0.
12:        end if
13:    end for
14: end for
15: Pick the maximum accumulated-utility state (i, q, n)_opt = arg max_{(i,q,n)} f*(i,q,n) and set its corresponding policy O(i,q,n) as the offline optimal service migration and DNN model exit point selection strategy O_off.

Algorithm 1 shows the pseudocode of our optimization algorithm, which uses dynamic programming with memoization to find the optimal strategies O of each time slot for a given finite time horizon. In the algorithm, we obtain the longest path (i.e., the optimal service migration and DNN model exit point selection strategy) for each state by solving the Bellman equation (i.e., line 6). Then we pick the path that contains the state with the highest accumulated utility at T as the longest path (i.e., line 15), which is the optimal solution to the problem. For searching the longest path, the algorithm needs to enumerate at most B² possible states at each time slot. Thus, over the T time slots, the time complexity of Algorithm 1 is O(B²T).

V. ONLINE SERVICE MIGRATION AND EXIT POINT SELECTION ALGORITHM

So far we have presented the solution of the problem in (11) under the complete-future-information scenario as a baseline. In practice, it is challenging to obtain complete information. This motivates an online algorithm without complete information. To this end, in this section, we combine popular machine learning techniques (e.g., LSTM) to predict future information (e.g., mobility traces) and improve the long-term service performance with informed decision making. However, frequent prediction would incur large running costs …
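The dynamic-programming idea behind Algorithm 1 can be sketched compactly. The code below is a simplified illustration, not the paper's implementation: the state is reduced to the hosting BS per slot, and `utility` is a hypothetical stand-in for the paper's U((i, q, n), p, m); the Bellman update and the O(B²T) enumeration structure are what it demonstrates.

```python
def omeps_longest_path(T, num_bs, utility):
    """utility(t, prev_bs, bs) -> reward of hosting the service on `bs`
    in slot t after being on `prev_bs`. Returns (best total utility,
    BS sequence of length T+1 including the starting state)."""
    # f[b] = best accumulated utility of any path ending on BS b (memoized)
    f = {b: 0.0 for b in range(num_bs)}
    parent = {}
    for t in range(1, T + 1):
        new_f, choice = {}, {}
        for b in range(num_bs):  # Bellman update: at most B^2 transitions/slot
            prev = max(range(num_bs), key=lambda p: f[p] + utility(t, p, b))
            new_f[b] = f[prev] + utility(t, prev, b)
            choice[b] = prev
        parent[t] = choice
        f = new_f
    # pick the maximum accumulated-utility terminal state (cf. line 15)
    best = max(f, key=f.get)
    path = [best]
    for t in range(T, 0, -1):  # backtrack through the memoized choices
        path.append(parent[t][path[-1]])
    return f[best], path[::-1]

# Toy run: 3 BSs over T = 5 slots; BS 2 yields utility 2 per slot, the
# others 1, and switching BSs costs 1. Staying on BS 2 is optimal.
score, path = omeps_longest_path(
    5, 3, lambda t, p, b: (2.0 if b == 2 else 1.0) - (0.0 if p == b else 1.0))
print(score, path)
```

Each slot relaxes at most B² transitions, so the loop structure makes the O(B²T) complexity stated above directly visible.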
1: Parameter Notation:
2: Vector π(i,q,n) is the optimal migration and exit point selection strategy of a prediction window; it contains a series of optimal states from the current time slot t to t + W.
3: Initialization: τ = 1, t = 1.
Repeat …

[Figure residue: the Smart-MOMEPS workflow, in which a migration judgement NN model (CONV/FC layers) controls MOMEPS migration decision-making, offloading and exit point selection over a DNN with Exit Point 1 and Exit Point 2.]

Fig. 4. The percentage of migration and offloading throughout the MPC …
TABLE II
LATENCY AND ACCURACY AT EACH EXIT POINT OF ALEXNET

Exit point     1     2     3     4     5
Latency (ms)   9.4   14.0  18.5  24.4  30.2
Accuracy (%)   70.0  71.2  76.0  77.7  78.0

[Figure residue: two plots over time slots 500–2000, one of computation overhead (Lazy-MPC, Smart-MOMEPS, MOMEPS) and one of algorithm efficiency (%) (FHC, Lazy-MPC, LM, AM, PLM, MOMEPS, Smart-MOMEPS).]

Fig. 6. Algorithm efficiency at different time slots (W = 5 for MPC-based algorithms).

Besides, we can observe that the FHC algorithm has the worst performance, with only 58% of the offline optimal. This is because the decisions in the last few time slots within the prediction window deviate far from the optimal decisions due to the accumulation of prediction errors, which can severely reduce the algorithm efficiency. Compared to FHC, Smart-MOMEPS achieves 1.6 times the efficiency and better robustness.
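The gap between FHC and the MPC-based algorithms comes down to how much of each predicted plan is actually executed. The sketch below contrasts the two control loops; `plan` is a hypothetical stand-in for the window optimization, and the names are illustrative, not from the paper.

```python
def run_mpc(T, W, plan):
    """Receding horizon: re-plan every slot and apply only the first
    decision of the W-step plan, so late-window errors are discarded."""
    executed = []
    for t in range(T):
        executed.append(plan(t, W)[0])
    return executed

def run_fhc(T, W, plan):
    """Fixed horizon: plan once per window and commit to all W decisions,
    so the error-prone tail of each predicted window is executed too."""
    executed = []
    for t in range(0, T, W):
        executed.extend(plan(t, W))
    return executed[:T]

# Dummy planner labeling each decision by (planning slot, offset in window).
demo = lambda t, W: [f"d{t}+{k}" for k in range(W)]
print(run_mpc(4, 2, demo))  # ['d0+0', 'd1+0', 'd2+0', 'd3+0']
print(run_fhc(4, 2, demo))  # ['d0+0', 'd0+1', 'd2+0', 'd2+1']
```

Under MPC every executed decision is the freshest prediction (`+0` offsets only), while FHC executes stale late-window decisions (`+1` offsets), which is consistent with the accumulated-error explanation for FHC's 58% efficiency above.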
2) Lazy Migration (LM): the service will not be migrated until the distance between the base station where it is hosted and the user exceeds a threshold. Once the migration is triggered, the service image is migrated to the base station where the user is currently located.
3) Predictive Lazy Migration (PLM): a prediction-based Lazy Migration algorithm proposed in [38]. It leverages one-shot prediction to improve the LM algorithm.
4) Lazy MPC (L-MPC): this algorithm is proposed to reduce the computation overhead of MPC. The basic idea of L-MPC is to use MPC only when the migration condition in LM is met.
5) Fixed Horizon Control (FHC): unlike standard MPC-based algorithms, FHC performs all the decisions within a prediction window instead of only the first step of these decisions [39].

Except for FHC, algorithms that leverage future information work better than those that do not. The reason is that we have considered service downtime in this work. If the service blindly follows the user's trajectory (AM and LM), the service cannot be migrated to a suitable BS in most cases due to user mobility. In contrast, future information can help the user migrate the service as appropriately as possible. Compared to LM, PLM helps the user avoid unnecessary migrations by using the information of the next slot. However, when the service needs to be migrated, PLM cannot guide the user on where to migrate the service, since the downtime spans several time slots. Accordingly, the far-sighted Smart-MOMEPS can be more effective than PLM.
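The LM trigger in baseline 2) and the one-shot-prediction refinement of PLM in baseline 3) can be sketched as simple decision rules. The distance function, threshold, and `predict_next_bs` below are illustrative assumptions, not the paper's parameters.

```python
def lm_decision(host_bs, user_bs, hop_dist, threshold=2):
    """Lazy Migration: migrate to the user's current BS only once the
    distance to the hosting BS exceeds the threshold; otherwise stay put."""
    if hop_dist(host_bs, user_bs) > threshold:
        return user_bs        # migration triggered
    return host_bs            # lazy: no migration

def plm_decision(host_bs, user_bs, hop_dist, predict_next_bs, threshold=2):
    """Predictive LM: same trigger, but target the BS predicted for the
    next slot, avoiding a migration toward a position the user will leave."""
    if hop_dist(host_bs, user_bs) > threshold:
        return predict_next_bs()
    return host_bs

# Example on a 1-D road where BS ids double as positions.
dist = lambda a, b: abs(a - b)
print(lm_decision(0, 1, dist))              # 0: within threshold, stays put
print(lm_decision(0, 3, dist))              # 3: migration triggered to user's BS
print(plm_decision(0, 3, dist, lambda: 4))  # 4: targets the predicted next BS
```

Neither rule accounts for the multi-slot downtime of the move itself, which is why the text argues the far-sighted Smart-MOMEPS outperforms them.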
D. Algorithms Execution Cost

TABLE III
LATENCY OF LSTM AND OMEPS ON DIFFERENT DEVICES
[table body lost in extraction; surviving header: Prediction window size]

… model (ARIMA) [40]. The prediction accuracy of these two …

… the size of the prediction window, the algorithm will not select this base station as the target of migration. Moreover, the …
… efficiency of each algorithm under different BS densities as its ratio to the offline optimal when the BS density is 12.57. Fig. 10 shows that algorithm efficiency grows as the density of BSs increases from 4.16 to 12.57. This is because more base stations bring more migration choices when the user migrates services. It is also easier for the user to migrate services to a closer base station for higher utility. To summarize, the denser the base stations are, the better our algorithms perform.

[Figure residue: algorithm efficiency (%) from 90 to 96 against base station density from 4.16 to 12.57 (1/km²).]

Fig. 10. Algorithm efficiency at different base station densities.

VII. CONCLUSION

In this paper, we investigate a user-centric DNN service migration and exit point selection problem with various service downtimes in the mobile edge computing environment. We leverage the exit point selection and layer-sharing feature of the container technique to alleviate the performance degradation caused by inevitable service downtime. To maximize long-term user utility, we first propose an offline optimal migration and exit point selection strategy (OMEPS) algorithm by leveraging dynamic programming when complete future information is available. To deal with uncertain user behavior, we incorporate a Model Predictive Control framework into the OMEPS algorithm and then construct a proactive service migration and DNN exit point selection (MOMEPS) algorithm. To cope with the heavy computation overheads of MOMEPS, we propose a cost-efficient algorithm, Smart-MOMEPS, which introduces a neural-network-based smart migration judgement to navigate the performance and computation overhead trade-off. Finally, we conduct extensive trace-driven experiments to evaluate our online algorithms. We also explore the performance of our algorithms under a variety of system settings and give corresponding analysis.

REFERENCES

[1] M. Hanyao, Y. Jin, Z. Qian, S. Zhang, and S. Lu, "Edge-assisted online on-device object detection for real-time video analytics," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[2] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, "Edge intelligence: Paving the last mile of artificial intelligence with edge computing," Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, 2019.
[4] …, IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 762–772, 2013.
[5] T. Ouyang, Z. Zhou, and X. Chen, "Follow me at the edge: Mobility-aware dynamic service placement for mobile edge computing," IEEE Journal on Selected Areas in Communications, vol. 36, no. 10, pp. 2333–2345, 2018.
[6] V. Farhadi, F. Mehmeti, T. He, T. F. L. Porta, H. Khamfroush, S. Wang, K. S. Chan, and K. Poularakis, "Service placement and request scheduling for data-intensive applications in edge clouds," IEEE/ACM Transactions on Networking, vol. 29, no. 2, pp. 779–792, 2021.
[7] R. Urgaonkar, S. Wang, T. He, M. Zafer, K. Chan, and K. K. Leung, "Dynamic service migration and workload scheduling in edge-clouds," Performance Evaluation, vol. 91, pp. 205–228, 2015.
[8] A. Machen, S. Wang, K. K. Leung, B. J. Ko, and T. Salonidis, "Live service migration in mobile edge clouds," IEEE Wireless Communications, vol. 25, no. 1, pp. 140–147, 2018.
[9] L. Ma, S. Yi, N. Carter, and Q. Li, "Efficient live migration of edge services leveraging container layered storage," IEEE Transactions on Mobile Computing, vol. 18, no. 9, pp. 2020–2033, 2019.
[10] L. Gu, D. Zeng, J. Hu, B. Li, and H. Jin, "Layer aware microservice placement and request scheduling at the edge," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–9.
[11] B. Xu, S. Wu, J. Xiao, H. Jin, Y. Zhang, G. Shi, T. Lin, J. Rao, L. Yi, and J. Jiang, "Sledge: Towards efficient live migration of docker containers," in 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 2020, pp. 321–328.
[12] S. Nadgowda, S. Suneja, N. Bila, and C. Isci, "Voyager: Complete container state migration," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017, pp. 2137–2142.
[13] S. Fu, R. Mittal, L. Zhang, and S. Ratnasamy, "Fast and efficient container startup at the edge via dependency scheduling," in 3rd USENIX Workshop on Hot Topics in Edge Computing (HotEdge 20). USENIX Association, Jun. 2020. [Online]. Available: https://www.usenix.org/conference/hotedge20/presentation/fu
[14] TensorFlow, "Tensorflow docker images," [EB/OL], https://hub.docker.com/r/tensorflow/tensorflow, accessed March 7, 2022.
[15] S. Teerapittayanon, B. McDanel, and H. Kung, "Branchynet: Fast inference via early exiting from deep neural networks," in 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 2464–2469.
[16] E. F. Camacho and C. Bordons, Model Predictive Control. Springer, 2007.
[17] D. Zhao, T. Yang, Y. Jin, and Y. Xu, "A service migration strategy based on multiple attribute decision in mobile edge computing," in 2017 IEEE 17th International Conference on Communication Technology (ICCT), 2017, pp. 986–990.
[18] T. Ouyang, R. Li, X. Chen, Z. Zhou, and X. Tang, "Adaptive user-managed service placement for mobile edge computing: An online learning approach," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 1468–1476.
[19] A. Ksentini, T. Taleb, and M. Chen, "A markov decision process-based service migration procedure for follow me cloud," in 2014 IEEE International Conference on Communications (ICC), 2014, pp. 1350–1354.
[20] S. Wang, R. Urgaonkar, T. He, M. Zafer, K. Chan, and K. K. Leung, "Mobility-induced service migration in mobile micro-clouds," in 2014 IEEE Military Communications Conference, 2014, pp. 835–840.
[21] H. Ma, Z. Zhou, and X. Chen, "Predictive service placement in mobile edge computing," in 2019 IEEE/CIC International Conference on Communications in China (ICCC), 2019, pp. 792–797.
[22] Y. Zhang, L. Jiao, J. Yan, and X. Lin, "Dynamic service placement for virtual reality group gaming on mobile edge cloudlets," IEEE Journal on Selected Areas in Communications, vol. 37, no. 8, pp. 1881–1897, 2019.
[23] K. Kawashima, T. Otoshi, Y. Ohsita, and M. Murata, "Dynamic placement of virtual network functions based on model predictive control," in NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium, 2016, pp. 1037–1042.
[24] M. Kumazaki and T. Tachibana, "Optimal vnf placement and route selection with model predictive control for multiple service chains," in …