When Edge Computing Meets Microgrid: A Deep Reinforcement Learning Approach

IEEE INTERNET OF THINGS JOURNAL, VOL. 6, NO. 5, OCTOBER 2019
Md. Shirajum Munir, Sarder Fakhrul Abedin, Student Member, IEEE,
Nguyen H. Tran, Senior Member, IEEE, and Choong Seon Hong, Senior Member, IEEE
Abstract—The computational tasks at multiaccess edge computing (MEC) are unpredictable in nature, which raises uneven energy demand for MEC networks. To handle this problem, a microgrid has the potential to provide a seamless energy supply from its energy sources (i.e., renewable, nonrenewable, and storage). However, supplying energy from the microgrid faces challenges due to the high uncertainty and irregularity of renewable energy generation over the time horizon. Therefore, in this paper, we study the energy supply plan of microgrid-enabled MEC networks. First, we formulate an optimization problem whose objective is to minimize the energy consumption of microgrid-enabled MEC networks. The problem is a mixed integer nonlinear optimization with computational and latency constraints for task fulfillment, and it is also coupled with the uncertainty of both energy consumption and generation. We show that the problem is NP-hard. Second, we decompose the formulated problem into two subproblems: 1) an energy-efficient task assignment problem for MEC, cast as a community discovery problem and 2) an energy supply plan problem, cast as a Markov decision process. Third, we apply a low-complexity density-based spatial clustering of applications with noise to solve the first subproblem for each base station in a distributed manner. Subsequently, we use the output of the first subproblem as an input for solving the second subproblem, where we apply model-based deep reinforcement learning. Finally, the simulation results demonstrate the significant performance gain of the proposed model with a high-accuracy energy supply plan.

Index Terms—Computational tasks, deep reinforcement learning (RL), demand response (DR), energy management, Internet of Things (IoT), microgrid, multiaccess edge computing (MEC), unsupervised learning.

Manuscript received December 14, 2018; accepted February 4, 2019. Date of publication February 15, 2019; date of current version October 8, 2019. This work was supported by the National Research Foundation of Korea Grant funded by the Korea Government (MSIT) under Grant NRF-2017R1A2A2A05000995. (Corresponding author: Choong Seon Hong.)
M. S. Munir, S. F. Abedin, and C. S. Hong are with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 17104, South Korea (e-mail: munir@khu.ac.kr; saab0015@khu.ac.kr; cshong@khu.ac.kr).
N. H. Tran is with the School of Computer Science, University of Sydney, Sydney, NSW 2006, Australia (e-mail: nguyen.tran@sydney.edu.au).
Digital Object Identifier 10.1109/JIOT.2019.2899673
2327-4662 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION
IN THE era of fifth-generation networks with the sustainable expansion of smart services via Internet of Things (IoT) applications, multiaccess edge computing (MEC) is a revolutionary technology. MEC is a key technology that brings application-oriented computational capabilities to the edge of carrier networks, where MEC hosts are deployed at the edge, at the central data network, or between them [1]. The potential of MEC includes low-latency, high-bandwidth, and real-time computation with decision-making feedback for heterogeneous IoT applications and services [e.g., smart home, smart city, emergency service, virtual reality, augmented reality (AR), autonomous vehicles, smart energy, industrial IoT, and so on] [2]. Moreover, by 2021, these applications will produce 49 exabytes of computational data per month, of which 78% is video data, and MEC is the key enabler for computing 63% of the total computation [3]. However, these applications are energy hungry, because computation-intensive background services are deployed at MEC to fulfill the application requirements [4]. In addition, computing the huge amount of computational tasks at MEC requires 30%–50% additional energy consumption [5]. In fact, the energy consumption of MEC is uneven since computational task requests are random over time. However, the physical deployment of MEC can be flexible for network operators when the supported applications, available site facilities, and operational, performance, and/or security parameters are considered along with the technical and business perspectives [6].
MEC is already included as an essential component in smart infrastructures, such as the smart city and smart factory [2]. Meanwhile, the microgrid is also considered to be prominent in those MEC infrastructures [7]. As a result, the microgrid can be a useful energy supplement to MEC, and the necessity of operating base stations (BSs) powered by renewable energy was established more than a decade ago by Ericsson (a telecommunications company) [8]. Furthermore, the joint operation of wireless networks and renewable energy supply has achieved energy savings, where renewable energy aware grid-enabled cellular networks address energy harvesting challenges for energy load balancing [9]–[11], task scheduling [12], and demand-side management for intranetworks infrastructure, which saves up to 18% of the energy usage [13]. In addition, a suitable combination of renewable, nonrenewable, and storage energy generation and distribution can save up to 30% of the energy usage for radio access networks [14]. However, MEC energy consumption has been overlooked in these activities. Therefore, a joint problem can be a possible way to tackle the uncertainty of both the energy consumption of MEC networks and microgrid energy generation toward energy saving.
MEC confronts the sustainability issue of managing computation with respect to energy consumption [15], [16]
and the competence of MEC operation depends on an efficient energy supply for microgrid-empowered MEC networks. However, MEC receives tasks with uncertainty, and the energy consumption depends on the computational payload size. Furthermore, the reliability and stability of the microgrid energy supply are contingent on the energy generation of the renewable (e.g., solar, wind, biofuels, etc.) and nonrenewable (e.g., diesel generator, coal power, and so on) energy sources [17]. To solve this, strong coordination between the energy consumption of MEC networks and microgrid energy generation is required over the time horizon. In this case, we devise a microgrid-enabled MEC networks' energy supply plan problem, which not only guarantees the computational and delay requirements but also ensures a sustainable energy supply to MEC networks. To achieve this, we face several challenges.
1) The energy consumption of MEC networks varies with the nondeterministic flow of computational task requests. Furthermore, to execute these computations with a high degree of reliability, the computational capacity of the MEC server and the tolerable delay of the tasks need to be considered.
2) The coordination between MEC operation and microgrid energy supply is another critical challenge, as both face uncertainties over different time periods. As a result, the long-term estimation of energy consumption and generation is more important than the immediate estimation.
3) A centralized solution can no longer cope with the massive overhead in terms of the computation and signaling required for dense MEC networks. Consequently, a distributed solution can be a possible way to overcome this overhead.
To address these challenges, we focus on approaches that not only coordinate both MEC network energy consumption and microgrid energy generation, but also adjust the task assignment to MEC such that energy consumption is minimized. We summarize our key contributions as follows.
1) We first formulate the multiaccess edge server energy supply plan problem, which is a mixed integer nonlinear optimization problem. The problem's objective is to minimize the energy consumption of the MEC servers such that the computational and latency requirements are satisfied. In addition, the formulated problem coordinates between the MEC networks' energy consumption and the microgrid energy generation to determine the microgrid energy supply plan for fulfilling the energy demand of MEC networks.
2) To solve the formulated problem with a data-driven and distributed approach, we decompose it into two subproblems. The first subproblem, energy-efficient task assignment, is formulated as a community discovery problem and is solved in a distributed manner for each BS. Subsequently, the second subproblem, pertaining to the energy supply plan, is modeled as a Markov decision process, where the input of the second subproblem depends on the output (energy demand) of the first subproblem and the microgrid energy generation. Since the decision of the energy supply plan is strongly related to this time-variant information, a long-term estimation is performed using historical information along with the current observation. As a result, a strong connection between MEC energy demand and supply is established, which captures the relationship between the uncertainties of MEC task loads and microgrid energy generation.
3) To derive the solution, we use a density-based spatial clustering of applications with noise (DBSCAN)-based energy-efficient task assignment algorithm to solve the first subproblem in a distributed manner. This algorithm effectively manages nondeterministic and heterogeneous computational tasks for MEC operation while considering the computational and tolerable delay requirements. Subsequently, we obtain the solution for the microgrid energy supply plan using an algorithm based on model-based deep reinforcement learning (MDRL). The proposed solution achieves fast convergence with high forecasting accuracy, which mitigates the uncertainties of MEC operation along with energy consumption and microgrid energy generation.
4) Finally, we perform an extensive simulation analysis for the proposed microgrid-enabled multiaccess energy supply plan. The simulation results show that the proposed approach outperforms the greedy and the random-greedy approaches with a 97% task execution success rate, saves up to 7.3% of the total energy consumption, and provides a high-accuracy energy supply plan.
The remainder of this paper is organized as follows. A review of related works is given in Section II. Section III contains the system model and formulation of the multiaccess edge server energy supply plan problem. Section IV presents the proposed problem decomposition. Section V presents a data-driven energy supply plan, which describes the solution procedure for the proposed model. Experimental results are discussed in Section VI. Finally, the conclusions are given in Section VII.

II. RELATED WORK
Energy management for power grid-enabled wireless networks has received much attention in recent years. In the aspects of network deployment and network operation for small cell networks, a feasibility analysis has been performed with renewable energy generation [9]. Nevertheless, that work did not study task computation or demand response (DR) management. The broadcasting nature of wireless communications and the wired energy transfer facility established by the grid architecture among multiple BSs have influenced the offloading decision between the BSs [10]. However, that work observed an energy loss for the offloaded traffic since it always requires more energy compared to the originally associated nearest BS. A smart grid powered wireless Heterogeneous Network (HetNet) scenario was considered, where a joint optimization problem was proposed with the objective of minimizing both the transmit power among the networks and the power loss for power distribution. The proposed model achieved energy-efficient wireless data transmission with less power loss for a grid-supported wireless HetNet
as compared to a standalone HetNet [11]. These studies focus on energy management for BS operation equipped with renewable energy generation, while their major concern is the establishment of the infrastructure of power grid supported wireless networks.
Similarly, a proposal to jointly optimize BS operation and power distribution for mobile networks powered by a smart grid has been made [13]. That work considered the coupling of BS operation and the power distribution challenges, and then provided an approximate solution. However, it did not consider the MEC infrastructure. To operate HetNets with a hybrid energy supply, a user scheduling and resource allocation scheme has been proposed based on the renewable energy sources for each of the small cells. Deep reinforcement learning (RL) has been used to solve this problem by introducing the actor/critic algorithm [12]. However, the authors did not mention the relationship between the mobile networks' task requests and renewable energy generation over the time horizon.
To an extent, research is focusing on solving the energy efficiency of task offloading problems using nonlinear problem formulations, which in some cases sacrifice the optimal solution to reduce the computational complexity of the problem, or propose an optimal solution with high computational complexity. With respect to low-latency communication decision making, a high-complexity solution is not appropriate [18], [19]. However, in some cases, grid-enabled wireless networks are not considered, so MEC faces sustainability issues. The objective of MEC is to accomplish a smooth transition for service fulfillment for video analytics, location services, IoT, AR, optimized local content distribution, data caching, and so on, near the end device with high reliability.
From the perspective of energy demand forecasting and demand-side management, most studies have considered the residential and industrial energy management points of view. To solve the unpredictable energy demand of home appliances, a three-layer hierarchical architecture has been used to solve the multiobjective optimization problem for demand-side management [20]. Furthermore, an online cloud-based nonintrusive load monitoring methodology was proposed to detect household appliance energy load in a nonintrusive way. This method separates residential building energy consumption in near real time for the proper demand-side management of household appliances [21]. Additionally, DR management has already proved its potential for the effective energy supply of residential, commercial, and cloud data centers [21]–[23].
In our problem, the energy-efficient task assignment problem is solved using a DBSCAN-based method, which is unsupervised learning in nature [24], [25]. DBSCAN is robust to noise and is also able to handle clusters of various shapes and sizes. The number of clusters is determined by the data itself, which makes the method more appropriate for solving our problem.
On the other hand, various kinds of RL have been used for solving problems regarding robotics applications, user task offloading, network management, and energy market strategic planning [26]–[29]. Multiagent RL has been used to solve the spatio-temporal resource requirements of heterogeneous mobile devices in fog environments [27]. Smart city services are managed by an ε-greedy-based RL method for software-defined networks, where the model balances exploration and exploitation [28]. A single agent-based Q-learning algorithm has been deployed for analyzing the energy market and providing a market strategy [29]. However, these methods did not use deep RL for policy determination and also take a long time to converge. In this paper, we use MDRL for the energy supply plan, which can overcome those challenges.
In summary, our proposed approach considers the microgrid energy supply plan in the domain of microgrid-enabled MEC networks, which is an evolving area for the research community. To the best of our knowledge, there is no such study relating MEC energy consumption to renewable energy generation, which resolves the sustainability issue of MEC. Thus, a data-driven approach is capable of providing a holistic energy supply plan for microgrid-enabled MEC networks. The proposed method considers an MEC server energy consumption model based on the computational requirements and tolerable computational delay requirements of the heterogeneous computational tasks of IoT services. The proposed approach and solution are new contributions with respect to this field.

Fig. 1. System model.

III. SYSTEM MODEL AND PROBLEM FORMULATION
A. System Model
We consider a microgrid-enabled wireless network that consists of a set $\mathcal{B} = \{1, 2, \ldots, B\}$ of BSs, and each BS $i$ has a set of active multiaccess edge servers $\mathcal{C}_i = \{1, 2, \ldots, C_i\}$ with homogeneous computational capacity, as seen in Fig. 1. We also consider a time slot $t$ in an infinite time horizon $\mathcal{T} = \{1, 2, \ldots, \infty\}$, and the duration of each time slot $t$ is 15 min [30]. $u_i(t)$ represents the computational capacity of an active edge server at time slot $t$. There is a set of heterogeneous user tasks $\mathcal{K} = \{1, 2, \ldots, K\}$ with a user task association indicator $\omega_{ik}(t) = 1$ if task $k$ is already assigned to BS $i$ at time $t$, and 0 otherwise. Each task $k \in \mathcal{K}$ is served by a multiaccess edge server $j \in \mathcal{C}_i$ at BS $i$. The properties of task $k \in \mathcal{K}$ are represented by a tuple $(\beta_k, \gamma_k)$, where $\beta_k$ and $\gamma_k$
denote the number of expected computational units (CPU cycles) and the maximum tolerable computational delay, respectively. Additionally, in this model the microgrid controller is able to control the energy supply based on the energy demands from the networks. Table I presents the summary of notation.

TABLE I
SUMMARY OF NOTATION

1) Communication and Computation Model: We consider that the task arrival rate at time slot $t$ under BS $i \in \mathcal{B}$ is $\lambda_i(t)$, which follows a Poisson process with average traffic size $S_i(t)$, and the average traffic load is denoted by $\lambda_i(t) S_i(t)$. The transmission data rate for a BS $i \in \mathcal{B}$ is as follows:
$$r_i(t) = w_i \log_2\left(1 + \frac{p_i G_i(t)}{\sigma^2 + \sum_{i' \in \mathcal{B},\, i' \neq i} I_{i'}(t)}\right) \tag{1}$$
where $w_i$ is the channel bandwidth, the transmission power is denoted by $p_i$, $G_i(t)$ is the channel gain, the channel noise is $\sigma^2$, and the interference of the channel is $I_{i'}(t)$.
Thus, the user task execution service rate under BS $i$ at time slot $t$ is as follows:
$$\mu_i(t) = \frac{r_i(t)}{\lambda_i(t) S_i(t)}. \tag{2}$$
Hence, the service time under BS $i$ at time slot $t$ is
$$T_i(t) = \frac{1}{\mu_i(t)}. \tag{3}$$
We assume that the set of tasks $\forall k \in \mathcal{K}$ is uniformly distributed over the duration of time slot $t$ under BS $i$, and the overall utilization rate is as follows [31]:
$$\rho_i(t) = \int_{t \in \mathcal{T}} \omega_{ik}(t) \frac{\lambda_i(t)}{\mu_i(t)} \, dt \tag{4}$$
where $\int_{t \in \mathcal{T}} \omega_{ik}(t) \lambda_i(t) \, dt$ is the total amount of tasks served by BS $i$ at time slot $t$.
The total time, which includes the execution (service) time and the waiting time in the queue for BS $i$, is as follows [32]:
$$\tau_i(t) = \frac{1}{\mu_i(t)\,(1 - \rho_i(t))}. \tag{5}$$
The latency ratio at BS $i \in \mathcal{B}$ is
$$\psi_i(t) = \frac{\tau_i(t) - T_i(t)}{T_i(t)} = \frac{\rho_i(t)}{1 - \rho_i(t)}. \tag{6}$$
A smaller value of $\psi_i(t)$ indicates that BS $i$ provides lower latency for its associated tasks.
In this scenario (see footnote 1), user task requests arrive according to a Poisson process, which is stochastic (random over time) in nature. On the other hand, the service time is exponentially distributed, as the traffic size is already known and the data rate is considered constant. Therefore, M/M/1 queuing can be considered an appropriate choice [31], [33], [34].
Footnote 1: Additionally, the proposed system model is capable of adopting a more general queuing model, such as M/G/1. Unlike the M/M/1 model, the service time in M/G/1 follows an arbitrary (general) distribution [13], [38]. In that case, the total time $\tau_i(t)$ can be defined by the Pollaczek–Khinchin formula [39], $\tau_i(t) = \frac{\lambda_i(t) E[T_i(t)^2]}{2(1 - \rho_i(t))} + E[T_i(t)]$, where $E[T_i(t)]$ is the expectation of the service time and $E[T_i(t)^2]$ is the expectation of the square of the service time.
The amount of CPU resources required for the execution of a single task $k$ at MEC is
$$\kappa_i(t) = \sum_{n \in N} \sum_{m \in M} \delta_{mn}(t) \tag{7}$$
where $N$ is the number of cores, $M$ is the number of CPU components (CPU cycles), a weight is associated with each single core, and $\delta_{mn}(t)$ is the active ratio [35].
2) Energy Consumption Model:
a) Base station operation energy consumption: In BS $i$, the energy consumption for network communication and data transfer through the networks with transmission power $p_i$ and transmission rate $r_i(t)$ is as follows:
$$E_i^{\mathrm{net}}(t) = \eta^{\mathrm{net}}(t)\, \frac{p_i S_i(t)}{r_i(t)} + \phi_{\mathrm{st}}^{\mathrm{net}}(t) \tag{8}$$
where $\eta^{\mathrm{net}}$ is the energy coefficient for transferring data with energy consumption $p_i S_i(t) / r_i(t)$ through the networks; the value of $\eta^{\mathrm{net}}$ depends on the type of network devices [36]. $\phi_{\mathrm{st}}^{\mathrm{net}}(t)$ is the static energy consumption for the BS operations [37].
b) CPU of MEC server energy consumption: The energy consumption of the MEC server CPU depends on the processor architecture, the number of CPU cores, and the number of CPU cycles used. The total energy consumption is defined as follows [35]:
$$E_i^{\mathrm{cpu}}(t) = \eta^{\mathrm{cpu}} \sum_{n=1}^{N} \sum_{m=1}^{M} \delta_{mn}(t) + \phi_{\mathrm{st}}^{\mathrm{cpu}}(t) \tag{9}$$
where $\delta_{mn}(t)$ represents the dynamic power consumption for core $n$ with component $m$ of the CPU, and $\eta^{\mathrm{cpu}}$ is the energy coefficient for CPU usage. The static energy consumption is represented by $\phi_{\mathrm{st}}^{\mathrm{cpu}}(t)$.
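The communication, queuing, and energy expressions in (1)-(3), (5), (6), (8), and (9) can be checked numerically with a short sketch. The code below is only an illustration: the function names, the unit-free toy values, and the choice to pass the utilization $\rho_i(t)$ in directly (rather than evaluating the integral in (4)) are assumptions made here and are not taken from the paper.

```python
import math


def data_rate(w, p, gain, noise_power, interference):
    """Eq. (1): r = w * log2(1 + p*G / (sigma^2 + sum of interference terms))."""
    return w * math.log2(1.0 + p * gain / (noise_power + sum(interference)))


def service_rate(r, lam, size):
    """Eq. (2): mu = r / (lambda * S)."""
    return r / (lam * size)


def total_time(mu, rho):
    """Eq. (5): tau = 1 / (mu * (1 - rho)); only meaningful for 0 <= rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization rho must satisfy 0 <= rho < 1")
    return 1.0 / (mu * (1.0 - rho))


def latency_ratio(mu, rho):
    """Eq. (6): psi = (tau - T) / T, with service time T = 1/mu from Eq. (3)."""
    service = 1.0 / mu
    return (total_time(mu, rho) - service) / service


def network_energy(eta_net, p, size, r, phi_st_net):
    """Eq. (8): E_net = eta_net * p*S/r + static network energy."""
    return eta_net * p * size / r + phi_st_net


def cpu_energy(eta_cpu, delta, phi_st_cpu):
    """Eq. (9): E_cpu = eta_cpu * sum over cores n and components m of delta + static."""
    return eta_cpu * sum(sum(row) for row in delta) + phi_st_cpu


# Toy, unit-free inputs (hypothetical values chosen for easy arithmetic).
r = data_rate(w=10.0, p=3.0, gain=1.0, noise_power=0.5, interference=[0.25, 0.25])
mu = service_rate(r, lam=2.0, size=5.0)
rho = 0.5  # assumed utilization, supplied directly instead of evaluating Eq. (4)
tau = total_time(mu, rho)
psi = latency_ratio(mu, rho)
e_net = network_energy(eta_net=2.0, p=3.0, size=5.0, r=r, phi_st_net=0.25)
e_cpu = cpu_energy(eta_cpu=0.5, delta=[[0.2, 0.3], [0.1, 0.4]], phi_st_cpu=1.0)
print(r, mu, tau, psi, e_net, e_cpu)
```

With these inputs the latency ratio computed from $(\tau_i - T_i)/T_i$ agrees with the closed form $\rho_i/(1-\rho_i)$ in (6), which is a quick consistency check on (3), (5), and (6).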
c) Total energy consumption: The multiaccess edge server-enabled network system requires two types of energy consumption: 1) the static energy $E_i^{\mathrm{st}}(t)$ and 2) the dynamic energy $E_i^{\mathrm{dyn}}(t)$ consumption [37]. The static energy consumption is a fixed amount of energy that does not depend on the computational tasks; this type of energy is needed for idle state operations. The dynamic energy depends on the computational task load, which includes BS operation (network data transfer) and the amount of CPU usage. Equations (8) and (9) represent the amount of energy consumption for data transfer through the networks and CPU usage, respectively [40]. The dynamic energy consumption at time slot $t$ is
$$E_i^{\mathrm{dyn}}(t) = E_i^{\mathrm{cpu}}(t) + E_i^{\mathrm{net}}(t). \tag{10}$$
The total energy consumption under BS $i \in \mathcal{B}$ is as follows [37]:
$$E_i^{\mathrm{tot}}(t) = E_i^{\mathrm{dyn}}(t)\, \eta^{\mathrm{dyn}}(t) + E_i^{\mathrm{st}}(t) \tag{11}$$
where $\eta^{\mathrm{dyn}}(t)$ is the energy coefficient for dynamic energy consumption at time slot $t$; the values of the parameter $\eta^{\mathrm{dyn}}(t)$ are known and depend on the type of BS [37]. Therefore, the total energy consumption under the microgrid for all BSs is defined by
$$E^{\mathrm{tot}}(t) = \sum_{\forall i \in \mathcal{B}} \left( E_i^{\mathrm{dyn}}(t)\, \eta^{\mathrm{dyn}}(t) + E_i^{\mathrm{st}}(t) \right). \tag{12}$$
3) Microgrid Energy Generation: In the microgrid, there are two types of energy sources. The first type is renewable energy sources (e.g., solar, wind, biofuels, etc.), where the amount of renewable energy generation at time slot $t$ is denoted by $g^{\mathrm{ren}}(t)$. The second type is nonrenewable energy sources (e.g., diesel generator, coal power, and so on), where the amount of nonrenewable energy at time slot $t$ is defined as $g^{\mathrm{non}}(t)$. Additionally, the nonrenewable energy sources are connected to the main grid, so the microgrid can buy additional energy from the main grid if it is unable to fulfill the energy demand using its own energy sources [30]. The total energy generation $g^{\mathrm{tot}}(t)$ at time slot $t$ is as follows:
$$g^{\mathrm{tot}}(t) = g^{\mathrm{ren}}(t) + g^{\mathrm{non}}(t). \tag{13}$$
Therefore, the additional amount of energy $g^{\mathrm{buy}}(t)$ bought from the main grid is defined by
$$g^{\mathrm{buy}}(t) = E^{\mathrm{tot}}(t) - g^{\mathrm{tot}}(t). \tag{14}$$
Additionally, the microgrid is capable of storing the surplus amount of energy $g^{\mathrm{sto}}(t)$ in a storage medium for future usage, and the amount of stored energy at time slot $t$ is as follows [38]:
$$g^{\mathrm{sto}}(t) = g^{\mathrm{tot}}(t) - E^{\mathrm{tot}}(t). \tag{15}$$

B. Problem Formulation
The energy consumption $\xi_{ijk}$ for a single user task $k \in \mathcal{K}$ at server $j \in \mathcal{C}_i$ is as follows:
$$\xi_{ijk} = E_{ijk}^{\mathrm{dyn}}(t)\, \eta^{\mathrm{dyn}}(t) + E_{ijk}^{\mathrm{st}}(t). \tag{16}$$
The first part of (16) is the dynamic energy consumption to fulfill task $k \in \mathcal{K}$ at edge server $j \in \mathcal{C}_i$ from (10). The dynamic energy consumption depends on (8) and (9) regarding the BS operation and CPU usage, respectively, for the computational task. The second part of (16) represents the static energy consumption, i.e., the idle state operation energy consumption, which does not depend on the computational task $k \in \mathcal{K}$. Furthermore, the amount of energy required for the server $j \in \mathcal{C}_i$ to complete its assigned tasks is $\sum_{k \in \mathcal{K}} \xi_{ijk} x_{ijk}$, where $x_{ijk}$ is a binary decision variable with $x_{ijk} = 1$ if the task $k \in \mathcal{K}$ is assigned to server $j \in \mathcal{C}_i$, and 0 otherwise
$$x_{ijk} = \begin{cases} 1, & \text{if } k \in \mathcal{K} \text{ assigned to } j \in \mathcal{C}_i \\ 0, & \text{otherwise.} \end{cases} \tag{17}$$
The objective is to minimize the total amount of energy consumption $\xi_{ijk}$ needed for executing the heterogeneous tasks under the constraints of computational capacity and maximum tolerable delay. Additionally, based on the energy demand, the microgrid makes the necessary decision about the energy supply plan using a binary decision variable $\zeta_t \in \{0, 1\}$
$$\zeta_t = \begin{cases} 1, & \text{if } g^{\mathrm{sto}}(t) \geq 0,\ t \in \mathcal{T} \\ 0, & \text{otherwise} \end{cases} \tag{18}$$
where $\zeta_t = 1$ if the microgrid is able to fulfill the energy demand from its own energy generation sources at time slot $t$, and 0 otherwise.
The problem formulation is as follows:
$$\min_{x, \zeta} \sum_{\forall t \in \mathcal{T}} \sum_{\forall i \in \mathcal{B}} \sum_{\forall j \in \mathcal{C}_i} \sum_{\forall k \in \mathcal{K}} \omega_{ik}(t)\, x_{ijk}\, \zeta_t\, \xi_{ijk} \tag{19}$$
$$\text{s.t.} \quad \sum_{j \in \mathcal{C}_i} x_{ijk} = 1 \quad \forall k \in \mathcal{K} \tag{19a}$$
$$\sum_{k \in \mathcal{K}} x_{ijk} \leq K \quad \forall j \in \mathcal{C}_i \tag{19b}$$
$$\zeta_t\, g^{\mathrm{sto}}(t) + (1 - \zeta_t)\, g^{\mathrm{buy}}(t) \geq 0, \quad t \in \mathcal{T} \tag{19c}$$
$$0 \leq g^{\mathrm{tot}}(t) \leq g^{\mathrm{ren}}(t) + g^{\mathrm{non}}(t) + g^{\mathrm{sto}}(t), \quad t \in \mathcal{T} \tag{19d}$$
$$H_t(\cdot) \in \{H_t(\cdot), \ldots, H_T(\cdot)\} \quad \forall t \in \mathcal{T} \tag{19e}$$
$$0 \leq \beta_k \leq \kappa_i(t), \quad k \in \mathcal{K},\ j \in \mathcal{C}_i \tag{19f}$$
$$0 \leq \gamma_k \leq \tau_i(t), \quad k \in \mathcal{K},\ j \in \mathcal{C}_i \tag{19g}$$
$$\zeta_t \in \{0, 1\}, \quad t \in \mathcal{T} \tag{19h}$$
$$x_{ijk} \in \{0, 1\} \quad \forall j \in \mathcal{C}_i,\ k \in \mathcal{K}. \tag{19i}$$
In problem (19), constraint (19a) ensures that each task is assigned to only one multiaccess edge server, and constraint (19b) ensures that the number of tasks assigned to a server is not greater than the total number of tasks $K$ associated with the MEC servers $\mathcal{C}_i$ in BS $i$. Constraint (19c) represents the coupling between the energy demand of MEC networks and the energy supply plan on the microgrid side. Here, the variable $\zeta_t$ decides whether energy is stored, $g^{\mathrm{sto}}(t)$, or bought, $g^{\mathrm{buy}}(t)$, at time $t$. Moreover, this decision depends on the total energy consumption $E^{\mathrm{tot}}(t)$ of the MEC networks at time $t$, which relies completely on the variable $x_{ijk}$ of constraint (19a). Constraint (19d) ensures that the total energy generation is not greater than the sum of the renewable, nonrenewable, and storage energies at time slot $t$. The constraint (19e) is the coupling between the time
horizon and the historical data. The set of functions $H_t(\cdot)$ preserves all the energy consumption and generation information for $\forall t \in \mathcal{T}$. Constraint (19f) fulfills the computational requirement of each task with the computational capacity of the MEC server (7). Constraint (19g) ensures that the maximum tolerable latency of a task is satisfied by the total (service and waiting) delay (5) of the MEC networks. Finally, (19h) and (19i) define the decision variables as binary variables.
The optimization problem (19) is a mixed integer nonlinear programming problem, where the combinatorial properties make this problem NP-hard. Hence, it is extremely hard, though not impossible, to find the global optimal result. Therefore, to obtain a low-complexity solution, we decompose the formulated problem into two subproblems: 1) energy-efficient task assignment for the MEC servers under the BS and 2) the energy supply plan for the microgrid controller.

IV. PROBLEM DECOMPOSITION
We decompose our original problem (19) into two subproblems. First, we reformulate the energy-efficient task assignment problem for each BS as a constraint programming model, which yields a community discovery problem [41] and is similar to the label propagation method [42]. In our reformulation, the constraints are able to generate interesting variants, which are very convenient for solving data-driven problems such as our first subproblem. Second, we model the microgrid energy supply plan using MDRL, since the second subproblem is stochastic in nature. Furthermore, MDRL provides a guarantee of faster convergence to the optimal solution by using the prior knowledge of transitions from the model [26].

A. Energy-Efficient Tasks Assignment
We define our first subproblem as a constraint programming problem in order to solve it with a data-driven approach. We recall the set of multiaccess edge servers $\mathcal{C}_i$ at BS $i \in \mathcal{B}$, where these servers execute a set of computational tasks $\mathcal{K}$ and the tasks are already associated with BS $i \in \mathcal{B}$ using the task association indicator $\omega_{ik}(t) = 1$, $\forall k \in \mathcal{K}$. Moreover, the task $k \in \mathcal{K}$ is executed by server $j \in \mathcal{C}_i$, and this single task is performed by only one server. Hence, from the task execution point of view, the set of multiaccess edge servers $\mathcal{C}_i$ creates a partition of the set of tasks $\mathcal{K}$ induced by the optimization objective, and mathematically we can represent this as $C_{ij} \subseteq \mathcal{K}$, $\forall C_{ij} \in \mathcal{C}_i$. In addition, we have two domains: the first domain is the set of tasks $\mathcal{K}$ and the second is the set of multiaccess edge servers $\mathcal{C}_i$.
We consider a 2-D space with the computational requirement $\beta_k$ and the maximum tolerable delay $\gamma_k$ for every task $k \in \mathcal{K}$. The decision variable $x_{kj} = 1$ denotes that server $j$ is assigned to task $k$, and it is 0 otherwise. Therefore, we rewrite the decision variable equation (17) as follows:
$$x_{kj} = \begin{cases} 1, & \text{if server } j \in \mathcal{C}_i \text{ assigned for task } k \in \mathcal{K} \\ 0, & \text{otherwise.} \end{cases} \tag{20}$$
Additionally, we add two more binary decision variables for fulfilling the computational capacity and tolerable latency. $y_{kj} = 1$ if server $j \in \mathcal{C}_i$ satisfies the computational capacity to execute task $k \in \mathcal{K}$, and 0 otherwise
$$y_{kj} = \begin{cases} 1, & 0 \leq \beta_k \leq \kappa_i(t),\ k \in \mathcal{K} \\ 0, & \text{otherwise.} \end{cases} \tag{21}$$
Similarly, $z_{kj} = 1$ if server $j \in \mathcal{C}_i$ has the capacity to execute task $k \in \mathcal{K}$ within the tolerable delay, and 0 otherwise
$$z_{kj} = \begin{cases} 1, & 0 \leq \gamma_k \leq \tau_i(t),\ k \in \mathcal{K} \\ 0, & \text{otherwise.} \end{cases} \tag{22}$$
Moreover, the variable $\Upsilon$ determines the length of the interclass distance, which determines well-separated and dense task clusters. The following is the constraint programming model for multiaccess edge server energy consumption:
$$\max_{\Upsilon} \left( \min_{x} \sum_{j \in \mathcal{C}_i} \sum_{k \in \mathcal{K}} \xi_{kj}\, x_{kj} \right) \tag{23}$$
$$\text{s.t.} \quad \sum_{j \in \mathcal{C}_i} x_{kj} = 1 \quad \forall k \in \mathcal{K} \tag{23a}$$
$$\sum_{k \in \mathcal{K}} x_{kj} \leq K \quad \forall j \in \mathcal{C}_i \tag{23b}$$
$$0 \leq \beta_k \leq \kappa_i(t), \quad k \in \mathcal{K} \tag{23c}$$
$$0 \leq \gamma_k \leq \tau_i(t), \quad k \in \mathcal{K} \tag{23d}$$
$$0 \leq y_{kj} + z_{kj} \leq 1, \quad j \in \mathcal{C}_i \tag{23e}$$
$$\sum_{\forall k \in \mathcal{K}_j} \beta_{jk} \geq u_i(t), \quad j + 1 \in \mathcal{C}_i \tag{23f}$$
$$x_{kj} \in \{0, 1\}, \quad k \in \mathcal{K},\ j \in \mathcal{C}_i \tag{23g}$$
$$y_{kj} \in \{0, 1\}, \quad k \in \mathcal{K},\ j \in \mathcal{C}_i \tag{23h}$$
$$z_{kj} \in \{0, 1\}, \quad k \in \mathcal{K},\ j \in \mathcal{C}_i. \tag{23i}$$
The objective of problem (23) is to serve all tasks $\forall k \in \mathcal{K}$ energy efficiently while maximizing the number of servers used under BS $i \in \mathcal{B}$. Constraints (23a)–(23d) and (23g) are similar to constraints (19a), (19b), (19f), (19g), and (19i) in problem (19). Constraint (23e) is used to handle the unpredictable task assignment at multiaccess edge server $C_{ij} \in \mathcal{C}_i$. If any server $j \in \mathcal{C}_i$ exceeds the maximum computational capacity or the tolerable delay, then the task is assigned to another server $j + 1 \in \mathcal{C}_i$, as determined by constraint (23f), which ensures server load balancing. Finally, constraints (23h) and (23i) restrict variables $y$ and $z$ to be binary variables.

B. Microgrid Energy Supply Plan
To solve the microgrid energy supply plan problem, we use MDRL for the second subproblem of the proposed problem (19). The goal of this subproblem is to provide a high-accuracy energy supply plan for the MEC networks by dealing with the uncertainties of both energy consumption and generation, where the energy consumption is the aggregated output of the first subproblem (23). On the other hand, the energy generation depends on the various types of energy sources of the microgrid, such as renewable (e.g., solar, wind, biofuels, etc.), nonrenewable (e.g., diesel generator, coal power, and so on), and stored energy.
In the MDRL settings, we consider a set of states S = {1, 2, . . . , S}, where a state st ∈ S consists of a four-element tuple ⟨tdem, tren, tnon, tsto⟩ for a single state space at time slot t. Here, tdem, tren, tnon, and tsto denote the amount of energy demand, renewable energy generation, nonrenewable energy generation, and stored energy of the microgrid, respectively. We also consider a set of actions A = {1, 2, . . . , A}, where an action at ∈ A consists of a two-action tuple ⟨ζt1, ζt0⟩, in which ζt1 represents the energy storing decision and ζt0 represents the energy buying decision for time slot t. Finally, we consider a set of observations O = {1, 2, . . . , O}, where ot ∈ O represents a single observation for time slot t and consists of a three-element tuple ⟨st, at, s′t⟩ with current state st, current action at, and next state s′t.
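As an illustration of the state, action, and observation tuples defined above, the following is a minimal Python sketch. The class and field names (State, Action, Observation, demand, store, and so on) are hypothetical and chosen only for readability, and encoding the two decisions as 0/1 integers is an assumption rather than something stated here.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Four-element state tuple <t_dem, t_ren, t_non, t_sto> for one time slot."""
    demand: float        # t_dem: energy demand
    renewable: float     # t_ren: renewable energy generation
    nonrenewable: float  # t_non: nonrenewable energy generation
    stored: float        # t_sto: stored energy


class Action(NamedTuple):
    """Two-action tuple <zeta_t^1, zeta_t^0> (0/1 encoding is an assumption)."""
    store: int  # zeta_t^1: energy storing decision
    buy: int    # zeta_t^0: energy buying decision


class Observation(NamedTuple):
    """Three-element observation tuple <s_t, a_t, s'_t>."""
    state: State
    action: Action
    next_state: State


# Example values are made up purely for illustration.
s_t = State(demand=50.0, renewable=20.0, nonrenewable=25.0, stored=5.0)
a_t = Action(store=1, buy=0)
o_t = Observation(state=s_t, action=a_t, next_state=s_t._replace(stored=6.0))
```

Keeping the three tuples as separate immutable types mirrors the notation of the text and makes it explicit which quantities belong to the current state and which to the next state.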
For a given action at ∈ A, we consider a parameter θ, which determines a stochastic policy πθ with a state transition probability Pθ(s′t|st, at). In the MDRL model, for a given state-action pair (st, at), the selection decision of an action at for the next state s′t is determined by a reward function R(st, at), where the probability distribution of that reward function depends on the state transition probability distribution Pθ(ot) = Pθ(s1) ∏i∈T Pθ(at|st)Pθ(s′t|at, st) from the observation ot at time slot t. Therefore, both the state transition and the reward function work in a sequential manner. Hence, the objective is to choose a probabilistic policy πθ over the actions from the observation ot in order to maximize the value function V πθ(st) [26]
\[ V^{\pi_\theta}(s_t) = \max_{\theta} \, \mathbb{E}_{o_t \sim P_\theta(o_t)} \left[ \sum_{l=0}^{\infty} \alpha^{l} R_{t+l}(s_t, a_t) \,\middle|\, s_t \right] \tag{24} \]
where the value function V πθ(st) determines the expected cumulative discounted rewards with a discount factor α. Therefore, we redefine the value function (24) as a sum of the immediate reward R(st, at) with current policy πθ(at|st), where the state transition probability Pθ(s′t|st, at) is determined from policy πθ(at|st)
\[ V^{\pi_\theta}(s_t) = \sum_{a_t \in A} \pi_\theta(a_t|s_t) \left[ R(s_t, a_t) + \alpha \sum_{s'_t \in S} P_\theta(s'_t|s_t, a_t) V^{\pi_\theta}(s'_t) \right]. \tag{25} \]
Hence, the value function with optimal policy πθ∗(at|st) is as follows:
\[ V^{\pi_\theta^{*}}(s_t) = \max_{a \in A} \, \mathbb{E} \left[ \sum_{l=0}^{\infty} \alpha^{l} R_{t+l}(s_t, a_t) \,\middle|\, s_t, a_t \right]. \tag{26} \]
Now, consider a new state space s′t with optimal value V πθ∗(s′t); the optimal value function at time t is determined by
\[ V^{\pi_\theta^{*}}(s_t) = \max_{a_t \in A} \, \mathbb{E} \left[ R(s_t, a_t) + \alpha \sum_{s'_t \in S} P_\theta(s'_t|s_t, a_t) V^{\pi_\theta^{*}}(s'_t) \right]. \tag{27} \]
However, a state-action value function Qπθ(st, at) can help to choose the action at using the policy πθ(at|st). The state-action value function is defined by
\[ Q^{\pi_\theta}(s_t, a_t) = \mathbb{E}_{\pi_\theta} \left[ \sum_{l=0}^{\infty} \alpha^{l} R_{t+l}(s_t, a_t) \,\middle|\, s_t, a_t \right]. \tag{28} \]
Additionally, (28) satisfies the coupling constraint (19e) of the problem (19) between the time slots, where the decision is made based on the history of previous state information. Moreover, the action selection decision fully depends on the state transition probability Pθ(s′t|st, at) of the state space, which reflects the Markovian dynamics. Therefore, the state-action function is redefined by
\[ Q^{\pi_\theta}(s_t, a_t) = R(s_t, a_t) + \alpha \sum_{s'_t} P_\theta(s'_t|s_t, a_t) V^{\pi_\theta}(s'_t). \tag{29} \]
Equation (29) defines the maximum value of the state-action value function Qπθ(st, at) for any observation (state) that chooses a specific action. Equation (27) provides the optimal policy for the state-action value function, and the state-action value function Qπθ(st, at) is used for the reconstruction of the optimal policy πθ∗(at|st).
Fig. 2 depicts the working procedure of MDRL, where the model calculates rewards R(st, at) from the observation (state information) ⟨tdem, tren, tnon, tsto⟩ and determines the optimal policy πθ∗(at|st) using the current policy πθ(at|st). A supervised learning technique is used, where the backpropagation method updates the model for deep Q-networks (DQNs) [43]. The loss function of MDRL is defined by
\[ L(t) = \min_{\theta} \frac{1}{|O|} \sum_{o_t \in O} \frac{1}{2} \left\| Q^{\pi_\theta}(s_{o_t}, a_{o_t}) - s'_{o_t} \right\|^{2} \tag{30} \]
where ot ∈ O and o′t ∈ O represent the current and next observation, respectively. The backpropagation for loss function (30) is shown in Fig. 3, where the objective is to maximize the reward R(st, at)
\[ \max_{a_t} \sum_{t \in T} \mathbb{E}_{\pi_\theta}\left( R(s_t, a_t) \right). \tag{31} \]
Fig. 2. MDRL.
In the proposed MDRL model, a Softmax function determines the probability distribution for choosing an action at with
a maximum reward R(st, at), and the probability P(at) for action at is defined by [44]
\[ P(a_t) = \frac{\exp\left( R(s_t, a_t)/\varrho \right)}{\sum_{i=1}^{n} \exp\left( R(s_t, a_t^{i})/\varrho \right)} \tag{32} \]
where R(st, at) is the reward for an action at, the sum runs over the n candidate actions, and the temperature parameter is written here as ϱ. If ϱ → ∞, then P(at) is almost the same for every action. Thus, a lower value of ϱ favors the action with the highest expected reward, whose probability P(at) tends to 1.
Fig. 3. MDRL backpropagation.
In this model, we have two actions ζt1 and ζt0, where action ζt1 determines the energy storing decision and action ζt0 determines the energy buying decision. Here, a rectified linear unit (ReLU) activation function is used to activate the fully connected hidden layers of the DQNs for these two actions [45]. The ReLU activation function is defined by
\[ f(a_t) = \max(a_t, 0). \tag{33} \]
A sigmoid activation function returns monotonically increasing values in the range of 0 to 1 [46]. The sigmoid activation function for the output layer is as follows:
\[ g(a_t) = \frac{e^{a_t}}{e^{a_t} + 1}. \tag{34} \]
In order to train the proposed MDRL model, we use the adaptive moment estimation stochastic gradient descent optimizer for approximations [47]. This approximation algorithm estimates the first and second moments of the gradient and also computes individual adaptive learning rates for different batches of data with different parameters. Writing the first and second moments as μw and νw, the moments of the loss function (30) are defined by
\[ \mu^{w}_{t+1} = \vartheta_1 \mu^{w}_{t} + (1 - \vartheta_1) \nabla_{w} L(t) \tag{35} \]
\[ \nu^{w}_{t+1} = \vartheta_2 \nu^{w}_{t} + (1 - \vartheta_2) \left( \nabla_{w} L(t) \right)^{2} \tag{36} \]
where ϑ1 and ϑ2 are the decay rates. Since this approximation algorithm uses adaptive learning rates, the step size is important for the first few iterations of the training process. Therefore, this algorithm performs a bias correction before the weight update estimation, and the bias correction functions for the first and second moments are
\[ \hat{\mu}^{w} = \frac{\mu^{w}_{t+1}}{1 - (\vartheta_1)^{t+1}} \tag{37} \]
\[ \hat{\nu}^{w} = \frac{\nu^{w}_{t+1}}{1 - (\vartheta_2)^{t+1}}. \tag{38} \]
Hence, the function of weight updates Δwt with corrected bias is defined by
\[ \Delta w_t = -r \frac{\hat{\mu}^{w}}{\sqrt{\hat{\nu}^{w}} + \varepsilon} \tag{39} \]
where r is a learning rate and ε is a very small value, which protects from division by zero. Therefore, the updated weight for next time slot t + 1 is as follows:
\[ w_{t+1} = w_t + \Delta w_t. \tag{40} \]

V. DATA-DRIVEN ENERGY SUPPLY PLAN SOLUTION
We solve the multiaccess edge server energy supply plan problem (19) in a distributed approach. The data-driven energy supply plan procedure is presented in Fig. 4. In the first part, the objective is to find the energy-efficient heterogeneous task execution for all BSs, and each BS solves this problem independently. The second part provides the energy supply plan, and the microgrid controller is responsible for solving this challenge.
Fig. 4. Data-driven energy supply plan procedure.
1) We solve the energy consumption problem (23) for each BS i ∈ B individually. To solve this problem, we use unsupervised learning to determine the energy-efficiency while satisfying the computational and the maximum tolerable latency requirements for each task. We have chosen the DBSCAN technique for the unsupervised learning [25]. The reason for this is that heterogeneous tasks have different computational requirements and data features (maximum tolerable delay) that are nondeterministic, and further the tasks requested are not predictable. Moreover, to handle the most unpredictable tasks, we can use the outlier properties. Additionally, the worst case complexity of DBSCAN is O(n²) and the average case is O(n log n), which are convenient considering the computational capacities of the MEC servers. Finally, we have combined this with a control flow algorithm for the server task load balancing to achieve the energy-efficiency.
2) An MDRL technique is used to solve the energy supply plan for the microgrid. The reasons behind using MDRL are as follows. First, the model learns the dynamic behavior of the environment in a supervised learning manner from the historical data using deep neural networks. Second, this model finds the optimal policy by observing the current environment and the optimal policy used before updating the model. Finally, MDRL provides faster convergence to the optimal solution using a small number of interactions with the real-time observation from the environment [26].
From the implementation point of view, the energy-efficient tasks assignment method for BS i ∈ B is deployed under BS i. This method runs every 15 min (time slot t) to serve the tasks associated with BS i ∈ B. Also, after each time slot t, the DBSCAN model is updated and it sends the amount of energy consumption ξi to the microgrid controller. A similar strategy is performed by the BSs ∀i ∈ B under the microgrid coverage area. The microgrid controller is responsible for execution of the MDRL method.

A. Energy-Efficient Tasks Assignment for BS Based on DBSCAN
To solve the problem (23), we have used the unsupervised learning (DBSCAN) method for task categorization and server assignment based on the computational demand and maximum tolerable delay features. The DBSCAN-based method provides the global optimal result with respect to task categorization [48]. The flow control algorithm concept is used for server load balancing so as to minimize the energy consumption of the multiaccess edge servers to fulfill the task requests.
The proposed DBSCAN-based energy-efficient tasks assignment Algorithm 1 verifies the computational capacity βk and the maximum tolerable delay γk for task k ∈ K in lines 7 through 17, and calculates the similarity of the tasks using variable ϒ. Lines 7–11 label the decision variable ykj to ensure the computational capacity satisfies constraints (23c) and (23h). Lines 12–17 are responsible for the delay tolerance using the decision variable zkj, which fulfills constraints (23d) and (23i). The server assignment with the load balancing decision is made via lines 18–27, which reflects constraints (23a), (23b), and (23e)–(23g) of the problem (23). Algorithm 1 aggregates the results of energy consumption during the 15 min of time slot t and sends the result ξt to the microgrid controller before starting the time slot t + 1.

Algorithm 1 Energy-Efficient Tasks Assignment for BS i ∈ B Based on DBSCAN
Input: Dt = ⟨βk, γk⟩, ∀k ∈ K, t ∈ T
Output: ξt, t ∈ T
Initialization: ϒ
1: for ∀k ∈ K, i ∈ B do
2:   if (k is visited) then
3:     Continue to k + 1 ∈ K
4:   else
5:     for ∀j ∈ Ci, i ∈ C do
6:       Constraints: (23c), (23h)
7:       if (0 ≤ βk ≤ κi(t), k ∈ K && ϒ < βk) then
8:         ykj = 1, k ∈ K, j ∈ Ci
9:       else
10:         ykj = 0, k ∈ K, j ∈ Ci
11:       end if
12:       Constraints: (23d), (23i)
13:       if (0 ≤ γk ≤ τi(t), k ∈ K && ϒ < γk) then
14:         zkj = 1, k ∈ K, j ∈ Ci
15:       else
16:         zkj = 0, k ∈ K, j ∈ Ci
17:       end if
18:       Constraints: (23a), (23b), (23e), (23f), (23g)
19:       if (Σk∈K xkj ≤ K, ∀j ∈ Ci) then
20:         xkj = 1, Ci ∈ Ci
21:       else if (0 ≤ ykj + zkj ≤ 1) then
22:         xkCi = 1, Ci ∈ Ci
23:       else if (Σ∀k∈Kj βjk ≥ ui(t)) then
24:         xk(j+1) = 1, j + 1 ∈ Ci
25:       else
26:         xkj = 0, Ci ∈ Ci
27:       end if
28:     end for
29:   end if
30:   Calculate: ξkj, ∀k ∈ K, ∀j ∈ Ci, ξkj using eq. (16)
31: end for
32: Update the clustering model using ξt
33: return ξt

B. Microgrid Energy Supply Plan Based on MDRL
Our proposed MDRL-based microgrid energy supply plan in Algorithm 2 runs at the microgrid controller. This model initializes the base policy πθ(st, at), which is generated from the historical data to collect the observation output ot = ⟨st, at, s′t⟩ for each time slot t. The microgrid controller collects the energy demands from all associated BSs ∀i ∈ B under the microgrid at each time slot t. This algorithm learns the model Qπθ(st, at) to minimize the residual minθ (1/|O|) Σot∈O (1/2)‖Qπθ(sot, aot) − s′ot‖². Therefore, the action selection process executes the backpropagation algorithm through the model Qπθ(st, at) with fully connected neural networks and accomplishes the first action at for observing the new state (observation) s′t. Then, this algorithm appends the observation output ot = ⟨st, at, s′t⟩ and updates the model. Finally, the energy supply plan is executed based on the prediction results.

VI. EXPERIMENTAL RESULT AND DISCUSSION
The proposed multiaccess edge server energy supply plan model is implemented on the Python platform. To evaluate this model, we have used three well-known datasets [49]–[51], where the first two datasets are for computational task requests and the third one is for solar energy generation data. Moreover, in our energy consumption model, we have used the energy consumption parameters according to Raspberry Pi 3 Model B as a baseline for each multiaccess edge server [52], and also considered edge servers. As a result, we do not need to
Fig. 5. Computational task execution energy consumption after 24 h (15 min time slots).

Algorithm 2 Microgrid Energy Supply Plan Based on MDRL
Input: Observation st = ⟨tdem, tren, tnon, tsto⟩
Output: Observation output ot = ⟨st, at, s′t⟩
Initialization: Base policy πθ(st, at)
1: for ∀t ∈ T do
2:   Constraints: (19c), (19d), (19e), & (19h)
3:   for Until: minθ (1/|O|) Σot∈O (1/2)‖Qπθ(sot, aot) − s′ot‖² do
4:     Learn model: Qπθ(st, at)
5:     for Until: max Σt R(st, at) do
6:       Choose model: Qπθ∗(st, at)
7:       Using: Equations (32), (33), (34), and (40)
8:       Execute: Action at ∈ A at t ∈ T
9:       Observe: State s′t = ⟨tdem, tren, tnon, tsto⟩
10:       Update: State Action (st, at, s′t) at t ∈ T
11:       Append: Observation output ot at t ∈ T
12:     end for
13:   end for
14: end for
15: return ot ∈ O, ∀t ∈ T
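Line 7 of Algorithm 2 relies on the Softmax action probability (32), the ReLU (33) and sigmoid (34) activations, and the bias-corrected weight update (35)–(40). The following plain-Python sketch illustrates these building blocks for scalar values only. The function names are hypothetical, the default decay rates and learning rate are the commonly used values for adaptive moment estimation rather than the values of Table III, and the Softmax sums over a list of candidate action rewards, which is an assumption about how (32) is indexed.

```python
import math


def softmax_policy(rewards, temperature):
    """Action probabilities as in (32); `rewards` holds R(s_t, a) per candidate action."""
    peak = max(rewards)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp((r - peak) / temperature) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def relu(x):
    """ReLU activation as in (33)."""
    return max(x, 0.0)


def sigmoid(x):
    """Sigmoid activation as in (34), e^x / (e^x + 1), with values in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (e + 1.0)


def adam_step(w, grad, mu, nu, t, r=1e-3, theta1=0.9, theta2=0.999, eps=1e-8):
    """One scalar weight update following (35)-(40); returns (w_next, mu_next, nu_next)."""
    mu = theta1 * mu + (1.0 - theta1) * grad          # first moment, (35)
    nu = theta2 * nu + (1.0 - theta2) * grad * grad   # second moment, (36)
    mu_hat = mu / (1.0 - theta1 ** (t + 1))           # bias correction, (37)
    nu_hat = nu / (1.0 - theta2 ** (t + 1))           # bias correction, (38)
    delta_w = -r * mu_hat / (math.sqrt(nu_hat) + eps)  # weight update, (39)
    return w + delta_w, mu, nu                         # next weight, (40)


# A high temperature flattens the distribution; a low one concentrates it.
flat = softmax_policy([1.0, 2.0, 3.0], temperature=1e6)
sharp = softmax_policy([1.0, 2.0, 3.0], temperature=0.01)
w1, mu1, nu1 = adam_step(w=0.0, grad=1.0, mu=0.0, nu=0.0, t=0)
```

The two calls to softmax_policy reproduce the limiting behavior described for the temperature parameter: the probabilities become nearly equal for a very large temperature and concentrate on the highest-reward action for a small one.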
TABLE II
EXPERIMENTAL ENVIRONMENT SUMMARY
Fig. 6. Energy consumption ECDF for computational task execution.

include a cooling system in our experiment. Table II represents the overall experimental environment.

A. MEC Servers Energy Consumption Numerical Analysis
The energy consumption model has been implemented according to Algorithm 1 and compared with the random greedy and greedy methods for the task assignment problem [53], [54]. We preprocessed and combined the CRAWDAD nyupoly/video [49] and CRAWDAD due/packet-delivery [50] datasets regarding heterogeneous task requests, considering the payload size, computational demand, and maximum tolerable delay. Our proposed Algorithm 1 is executed in real time for cluster-based tasks assignment for a single BS, and after one time slot (15 min), the total energy consumption results are sent to the microgrid controller to execute the microgrid controller energy supply plan method. Further, all BSs under the microgrid execute this procedure independently.
Fig. 5 describes the energy consumption result for all three methods. The dashed line with crossed marks (red) shows the DBSCAN-based task execution energy consumption results for 24 h with 15 min time slots, whereas the lines with solid circle marks (green) and dotted diamond marks (blue) represent the random greedy and greedy methods of energy consumption, respectively. We observe more clearly from the empirical cumulative distribution function (ECDF) in Fig. 6 that the proposed solution for energy consumption is almost the same as for the greedy method (the probability of having less than 60 kWh energy consumption is around 0.8). The proposed DBSCAN-based unsupervised learning (Algorithm 1) is more robust, less complex, and more reliable for handling the unpredictable task requests. On the other hand, the random greedy method with a uniform policy for energy consumption has a probability of 0.82 of having an energy consumption of less than 60 kWh.
The tradeoff between the random greedy and proposed methods is clearly demonstrated in Fig. 7, where the proposed model and greedy methods provide around 97% successful task execution. However, the random greedy task execution success rate is not more than 78%. Additionally, the proposed method on average saves 5.7% and 7.3% energy compared with the random greedy and greedy methods, respectively.
Fig. 8 presents the average server utilization for task execution. Our proposed method assigned tasks to each multiaccess edge server according to the task category, and this method
handles unpredictable tasks using the load balancing policy while satisfying the computational and maximum tolerable delay constraints. The proposed method's average percentage of server utilization for five multiaccess edge servers in one BS is indicated by the bars with star marks (red) in Fig. 8. Additionally, the utilization of the first three servers is higher than that of the last two servers. However, the proposed method is applicable to any number of task categories because this method uses the concept of DBSCAN. The server utilization percentage of the uniformly random policy-based, random greedy method is almost the same for all the servers (green bars with crossed marks), but this method is not capable of fulfilling the task requests due to the computational capacity constraint, which is demonstrated in Fig. 7. Moreover, the dotted yellow bar in Fig. 8 shows that most of the tasks are assigned to the first server, and the assignment then proceeds in a sequential manner. Additionally, if the server utilization is higher, then the energy consumption is increased. Fig. 9 provides a clear view of how the proposed method has less server utilization compared with the other two methods, and how our proposed method performance is better than that of the greedy method.
Fig. 7. ECDF for unassigned computational task.
Fig. 8. Average multiaccess server utilization for task fulfillment.
Fig. 9. ECDF for average server utilization.
Fig. 10. Silhouette scores for the task cluster performance analysis (96 time slots).
We have verified our proposed DBSCAN (unsupervised learning) method using the silhouette coefficient score and Calinski–Harabasz index metrics [55], [56]. For unsupervised learning, it is very difficult to evaluate the performance regarding the cluster since the ground truth labels are unknown. However, the evaluation can be performed using only the developed model, and the silhouette coefficient score and Calinski–Harabasz index metrics are appropriate.
The silhouette coefficient score is one of the most widely used techniques for such evaluation [55]. The silhouette coefficient score ranges from −1 to 1, and a higher score indicates that the clusters are dense and well separated, thus satisfying the standard concept of a cluster formation. The silhouette coefficient for a single sample d is
\[ d = \frac{b - a}{\max(a, b)} \tag{41} \]
where a denotes the mean distance between a sample and all other points in the same class, and the mean distance between a sample and all other points in the next nearest cluster is denoted by b. In addition, the following is the silhouette coefficient score for all data in a time slot t:
\[ d_t = \sum_{\forall d \in D_t} \frac{b - a}{\max(a, b)}. \tag{42} \]
Fig. 10 shows the ECDF (for 96 time slots) of the silhouette coefficient scores for task clustering, where the average score is 0.6963, which demonstrates that the task clustering of the proposed method performs well. Even though we have used DBSCAN, this metric indicates that our proposed method performs well for task categorization.
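To make the per-sample silhouette coefficient (41) concrete, the following is a minimal pure-Python sketch. The function names are hypothetical, the points are 2-D tuples such as (computational requirement, tolerable delay), Euclidean distance is assumed, and the scores are averaged over samples, which is the usual convention and an assumption with respect to (42). The sketch also assumes at least two clusters and that noise points have already been removed from the labels.

```python
import math


def mean_distance(point, others):
    """Mean Euclidean distance from `point` to every point in `others`."""
    return sum(math.dist(point, q) for q in others) / len(others)


def silhouette_scores(points, labels):
    """Per-sample silhouette coefficients (b - a) / max(a, b), as in (41)."""
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton cluster: score set to 0 by convention
            continue
        a = mean_distance(p, same)  # mean intra-cluster distance
        b = min(                    # mean distance to the nearest other cluster
            mean_distance(p, [q for q, lab in zip(points, labels) if lab == other])
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two well-separated toy clusters (values are made up for illustration).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
```

For the toy data the clusters are compact and far apart, so every per-sample score is close to 1, which is the regime described above as dense and well separated.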
Fig. 11. Calinski–Harabasz score for task cluster performance analysis (96 time slots).
The Calinski–Harabasz index performance metric is the second metric used for further verifying our task clustering performance [56]. This procedure also does not require ground truth labels, so it is appropriate for evaluating DBSCAN (unsupervised learning). The following is the Calinski–Harabasz index score I(p) for the p clusters:
\[ I(p) = \frac{\operatorname{tr}(B_p)}{\operatorname{tr}(W_p)} \times \frac{K - p}{p - 1} \tag{43} \]
where K is the total number of tasks, and Bp and Wp are the between-group dispersion matrix and the within-cluster dispersion matrix, respectively. The values of Wp and Bp are measured using the following:
\[ W_p = \sum_{q=1}^{p} \sum_{a \in V_q} (a - v_q)(a - v_q)^{T} \tag{44} \]
and
\[ B_p = \sum_{q} d_q (v_q - v)(v_q - v)^{T} \tag{45} \]
where Vq is the set of points in cluster q with center vq, v is the center of all points, and dq is the number of points in cluster q.
Fig. 11 presents the ECDF for the Calinski–Harabasz score. For around 80% of the 96 time slots (for 1 day with a 15-min duration for each time slot), the scores are more than 715, which demonstrates that our proposed method performs well for task categorization.
Fig. 12. Energy supply plan for 24 h using training set.
TABLE III
MDRL PARAMETERS
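The index (43) can be computed directly from the traces of the dispersion matrices (44) and (45), since the trace of each outer product is a squared Euclidean distance. The following pure-Python sketch shows this under stated assumptions: the function names are hypothetical, points are numeric tuples, there are at least two clusters, and the within-cluster dispersion is nonzero.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(p, q):
    """Squared Euclidean distance, i.e., the trace of (p - q)(p - q)^T."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def calinski_harabasz(points, labels):
    """Index score I(p) = [tr(B_p) / tr(W_p)] * (K - p) / (p - 1), as in (43)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k_total = len(points)     # K: total number of tasks
    p_clusters = len(clusters)  # p: number of clusters
    overall = centroid(points)
    within = 0.0   # tr(W_p), from (44)
    between = 0.0  # tr(B_p), from (45)
    for members in clusters.values():
        center = centroid(members)
        within += sum(squared_distance(a, center) for a in members)
        between += len(members) * squared_distance(center, overall)
    return (between / within) * (k_total - p_clusters) / (p_clusters - 1)


# Toy data, made up for illustration: two compact clusters far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
score = calinski_harabasz(points, labels)
```

A larger score corresponds to clusters that are compact relative to their separation, which is why a high value is read as evidence of good task categorization.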
B. Microgrid Energy Supply Plan Numerical Analysis
MDRL for the microgrid energy supply plan has also been implemented on the Python platform, along with TensorFlow APIs [57]. We used the energy consumption output from Algorithm 1 as the input for energy demand for each time slot t. We used the UMass solar panel dataset for estimating the renewable energy generation for each day with 96 time slots of 15-min duration each [51]. Based on the number of solar panel units for each day, we have divided this dataset into 70% and 30% as training and testing datasets, respectively [24]. Table III describes the parameter and coefficient values for the proposed microgrid energy supply model. This model uses the first 100 episodes for learning the model from the historical data, and the policy and model are updated using real observations and the learned model.
Figs. 12 and 13 show the energy supply plan for 24 h with 15-min durations using the training set and test set, respectively. In Fig. 12, the top subfigure describes the overall energy demand using red diamond marks with a solid line, and the green cross marks with a dashed line represent the energy supply plan using the training model. Similarly, the second, third,
and fourth subfigures present the renewable energy generation, nonrenewable energy demand, and the storage amount of the energy forecast, respectively. On the other hand, Fig. 13 shows the energy supply plan for 24 h using the test set, where the mean absolute percentage errors for energy demand, renewable, nonrenewable, and storage energy are 8.78%, 7.23%, 6.26%, and 3.75%, respectively. Furthermore, Fig. 14 represents the mean square error (MSE), root MSE (RMSE), mean absolute error (MAE), and root mean absolute value (RMAE) for the MDRL learning model using training and test sets. In the case of the training set, the RMAE is 1.41%. The reason behind this small error is that the proposed MDRL learns from the historical data first, and the MDRL model is then updated on real environmental observations. However, the RMAE for the test set is around 6.76%, which indicates that the proposed model works well under the uncertainties of both energy demand and generation.
Fig. 13. Energy supply plan for 24 h using test set (January 11, 2015).
Fig. 14. Training and testing MSE, RMSE, MAE, and RMAE for the MDRL learning model.
Fig. 15. MDRL convergence and reward value. (a) (b)
Fig. 16. Precision recall curve and ROC curve for the MDRL training model. (a) (b)
The convergence and reward values are given in Fig. 15, where the first subplot shows the convergence of the proposed model using the loss function and the second subplot describes the reward value. The reward function is a monotonically
increasing function and reaches convergence after around 160 episodes. We have verified the performance of our model using precision recall curves and receiver operating characteristic (ROC) curves, which are commonly used performance metrics for supervised learning model verification [58]. In Fig. 16, the top subplot presents the precision recall curve and the second subplot shows the ROC curve, which verify the training result of the proposed MDRL method.

VII. CONCLUSION
In this paper, we investigated the MEC network energy consumption while considering computational task loads for the MEC servers, the energy generation characteristics from microgrid energy sources (i.e., renewable, nonrenewable, and storages), and the energy supply plan for microgrid-enabled MEC networks. In order to solve the proposed multiaccess edge server energy supply plan problem in a distributed manner, we first decomposed the proposed problem into two subproblems. Second, we proposed a DBSCAN-based approach for measuring the energy consumption of the network while performing the task assignment at MEC. Consequently, we applied the MDRL-based mechanism to derive the solution for the microgrid energy supply plan, according to the energy demand from the MEC network and the energy generation through the microgrid. The simulation results establish that the proposed approaches significantly reduce the energy consumption compared with the greedy and random greedy methods without degrading the performance of the tasks computation at MEC. Additionally, the overall energy supply plan exhibits a high accuracy prediction; as a result, the proposed approaches mitigate uncertainty for both MEC task loads and microgrid energy generation. Finally, the proposed approaches ensure the sustainability of the MEC networks by significantly reducing the risk of energy outage due to the nondeterministic task loads.

REFERENCES
[1] F. Giust et al., “MEC deployments in 4G and evolution towards 5G,” Sophia Antipolis, France, ETSI, White Paper, Feb. 2018. Accessed: Dec. 3, 2018. [Online]. Available: https://www.etsi.org/images/files/ETSIWhitePapers/etsi_wp24_MEC_deployment_in_4G_5G_FINAL.pdf
[2] P. Porambage, J. Okwuibe, M. Liyanage, M. Ylianttila, and T. Taleb, “Survey on multi-access edge computing for Internet of Things realization,” IEEE Commun. Surveys Tuts., vol. 20, no. 4, pp. 2961–2991, 4th Quart., 2018.
[3] “Cisco visual networking index: Global mobile data traffic forecast update, 2016–2021,” San Jose, CA, USA, Cisco, White Paper, Feb. 2017. Accessed: Jul. 3, 2018. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html
[4] K. Zhang et al., “Energy-efficient offloading for mobile edge computing in 5G heterogeneous networks,” IEEE Access, vol. 4, pp. 5896–5907, 2016.
[5] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surveys Tuts., vol. 19, no. 4, pp. 2322–2358, 4th Quart., 2017.
[6] S. Kekki et al., “MEC in 5G networks,” Sophia Antipolis, France, ETSI, White Paper, Jun. 2018. Accessed: Dec. 4, 2018. [Online]. Available: https://www.etsi.org/images/files/ETSIWhitePapers/etsi_wp28_mec_in_5G_FINAL.pdf
[7] C. F. Calvillo, A. Sánchez-Miralles, and J. Villar, “Energy management and planning in smart cities,” Renew. Sustain. Energy Rev., vol. 55, pp. 273–287, Mar. 2016.
[8] “Sustainable energy use in mobile communications,” Stockholm, Sweden, Ericsson, White Paper, Aug. 2007.
[9] Y. Mao, Y. Luo, J. Zhang, and K. B. Letaief, “Energy harvesting small cell networks: Feasibility, deployment, and operation,” IEEE Commun. Mag., vol. 53, no. 6, pp. 94–101, Jun. 2015.
[10] X. Huang and N. Ansari, “Energy sharing within EH-enabled wireless communication networks,” IEEE Wireless Commun., vol. 22, no. 3, pp. 144–149, Jun. 2015.
[11] M. Hong and H. Zhu, “Power-efficient operation of wireless heterogeneous networks using smart grids,” in Proc. IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Venice, Italy, 2014, pp. 236–241.
[12] Y. Wei, F. R. Yu, M. Song, and Z. Han, “User scheduling and resource allocation in HetNets with hybrid energy supply: An actor-critic reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 17, no. 1, pp. 680–692, Jan. 2018.
[13] X. Huang, T. Han, and N. Ansari, “Smart grid enabled mobile networks: Jointly optimizing BS operation and power distribution,” IEEE/ACM Trans. Netw., vol. 25, no. 3, pp. 1832–1845, Jun. 2017.
[14] T. Han and N. Ansari, “A traffic load balancing framework for software-defined radio access networks powered by hybrid energy sources,” IEEE/ACM Trans. Netw., vol. 24, no. 2, pp. 1038–1051, Apr. 2016.
[15] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision and challenges,” IEEE Internet Things J., vol. 3, no. 5, pp. 637–646, Oct. 2016.
[16] S. Shahzadi, M. Iqbal, T. Dagiuklas, and Z. Ul Qayyum, “Multi-access edge computing: Open issues, challenges and future perspectives,” J. Cloud Comput., vol. 6, p. 30, Dec. 2017.
[17] C. Li et al., “Towards sustainable in-situ server systems in the big data era,” in Proc. ACM/IEEE 42nd Annu. Int. Symp. Comput. Archit. (ISCA), Portland, OR, USA, 2015, pp. 14–26.
[18] D. Huang, P. Wang, and D. Niyato, “A dynamic offloading algorithm for mobile computing,” IEEE Trans. Wireless Commun., vol. 11, no. 6, pp. 1991–1995, Jun. 2012.
[19] L. Chen, S. Zhou, and J. Xu, “Energy efficient mobile edge computing in dense cellular networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, 2017, pp. 1–6.
[20] D. Li, W.-Y. Chiu, H. Sun, and H. V. Poor, “Multiobjective optimization for demand side management program in smart grid,” IEEE Trans. Ind. Informat., vol. 14, no. 4, pp. 1482–1490, Apr. 2018.
[21] M. A. Mengistu, A. A. Girmay, C. Camarda, A. Acquaviva, and E. Patti, “A cloud-based on-line disaggregation algorithm for home appliance loads,” IEEE Trans. Smart Grid, to be published. doi: 10.1109/TSG.2018.2826844.
[22] N. H. Tran, C. Pham, M. N. H. Nguyen, S. Ren, and C. S. Hong, “Incentivizing energy reduction for emergency demand response in multi-tenant mixed-use buildings,” IEEE Trans. Smart Grid, vol. 9, no. 4, pp. 3701–3715, Jul. 2018.
[23] C. Pham et al., “A distributed approach to emergency demand response in geo-distributed mixed-use,” J. Build. Eng., vol. 19, pp. 506–518, Sep. 2018.
[24] W. Kong et al., “Short-term residential load forecasting based on LSTM recurrent neural network,” IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 841–851, Jan. 2019.
[25] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. 2nd Int. Conf. Knowl. Disc. Data Min., Portland, Oregon, Aug. 1996, pp. 226–231.
[26] A. S. Polydoros and L. Nalpantidis, “Survey of model-based reinforcement learning: Applications on robotics,” J. Intell. Robot. Syst., vol. 86, no. 2, pp. 153–173, May 2017.
[27] M. G. R. Alam, Y. K. Tun, and C. S. Hong, “Multi-agent and reinforcement learning based code offloading in mobile fog,” in Proc. Int. Conf. Inf. Netw. (ICOIN), Kota Kinabalu, Malaysia, 2016, pp. 285–290.
[28] M. S. Munir, S. F. Abedin, M. G. R. Alam, N. H. Tran, and C. S. Hong, “Intelligent service fulfillment for software defined networks in smart city,” in Proc. Int. Conf. Inf. Netw. (ICOIN), 2018, pp. 516–521.
[29] M. Rahimiyan and H. R. Mashhadi, “An adaptive Q-learning algorithm developed for agent-based computational modeling of electricity market,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 5, pp. 547–556, Sep. 2010.
[30] Y. Zhang, M. H. Hajiesmaili, S. Cai, M. Chen, and Q. Zhu, “Peak-aware online economic dispatching for microgrids,” IEEE Trans. Smart Grid, vol. 9, no. 1, pp. 323–335, Jan. 2018.
[31] T. Han and N. Ansari, “Network utility aware traffic load balancing in backhaul-constrained cache-enabled small cell networks with hybrid power supplies,” IEEE Trans. Mobile Comput., vol. 16, no. 10, pp. 2819–2832, Oct. 2017.
[32] L. Kleinrock, Queueing Systems, vol. 1. New York, NY, USA: Wiley, 1975.
[33] L. Chen, S. Zhou, and J. Xu, “Computation peer offloading for energy-constrained mobile edge computing in small-cell networks,” IEEE/ACM Trans. Netw., vol. 26, no. 4, pp. 1619–1632, Aug. 2018.
[34] S. F. Abedin et al., “Resource allocation for ultra-reliable and enhanced mobile broadband IoT applications in fog network,” IEEE Trans. Commun., vol. 67, no. 1, pp. 489–502, Jan. 2019.
[35] R. Bertran, M. Gonzalez, X. Martorell, N. Navarro, and E. Ayguade, “A systematic methodology to generate decomposable and responsive power models for CMPs,” IEEE Trans. Comput., vol. 62, no. 7, pp. 1289–1302, Jul. 2013.
[36] Y. Sun, S. Zhou, and J. Xu, “EMM: Energy-aware mobility management for mobile edge computing in ultra dense networks,” IEEE J. Sel. Areas Commun., vol. 35, no. 11, pp. 2637–2646, Nov. 2017.
[37] G. Auer et al., “How much energy is needed to run a wireless network?” IEEE Wireless Commun., vol. 18, no. 5, pp. 40–49, Oct. 2011.
[38] J. Xu, L. Chen, and S. Ren, “Online learning for offloading and autoscaling in energy harvesting mobile edge computing,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 3, pp. 361–373, Sep. 2017.
[39] W. C. Chan, T. C. Lu, and R. J. Chen, “Pollaczek–Khinchin formula for the M/G/1 queue in discrete time with vacations,” IEE Proc. Comput. Digit. Techn., vol. 144, no. 4, pp. 222–226, Jul. 1997.
[40] S. L. Song, K. Barker, and D. Kerbyson, “Unified performance and power modeling of scientific workloads,” in Proc. 1st Int. Workshop E2SC, 2013, pp. 56–75.
[41] M. Sozio and A. Gionis, “The community-search problem and how to plan a successful cocktail party,” in Proc. 16th ACM SIGKDD Int. Conf. Knowl. Disc. Data Min. (KDD), Washington, DC, USA, 2010, pp. 939–948.
[42] C. Ruiz, M. Spiliopoulou, and E. Menasalvas, “C-DBSCAN: Density-based clustering with constraints,” in Proc. 11th Int. Conf. Rough Sets Fuzzy Sets Data Min. Granular Comput., 2007, pp. 216–223.
[43] J. Suh and D. F. Hougen, “The context-aware learning model: Reward-based and experience-based logistic regression backpropagation,” in Proc. IEEE Symp. Comput. Intell. (SSCI), Honolulu, HI, USA, 2017, pp. 1–8.
[44] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[45] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., Haifa, Israel, Jun. 2010, pp. 807–814.
[46] M. N. Gibbs and D. J. C. Mackay, “Variational Gaussian process classifiers,” IEEE Trans. Neural Netw., vol. 11, no. 6, pp. 1458–1464, Nov. 2000.
[47] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), Dec. 2014, pp. 1–5.
[48] R. J. G. B. Campello, D. Moulavi, and J. Sander, “Density-based clustering based on hierarchical density estimates,” in Proc. Adv. Knowl. Disc. Data Min., 2013, pp. 160–172.
[49] F. Fund et al. (May 2014). CRAWDAD Dataset Nyupoly/Video (V. 2014-05-09). Accessed: Jul. 3, 2018. [Online]. Available: https://crawdad.org/nyupoly/video/20140509
[50] S. Fu and Y. Zhang. (Apr. 2015). CRAWDAD Dataset Due/Packet-Delivery (V. 2015-04-01). Accessed: Jul. 3, 2018. [Online]. Available: https://crawdad.org/due/packet-delivery/20150401
[51] Solar Panel Dataset. UMassTraceRepository. Accessed: Jul. 3, 2018. [Online]. Available: http://traces.cs.umass.edu/index.php/Smart/Smart
[52] Power Consumption Benchmarks. Raspberry Pi Dramble. Accessed: Jun. 3, 2018. [Online]. Available: https://www.pidramble.com/wiki/benchmarks/power-consumption
[53] T. A. Feo and M. G. C. Resende, “Greedy randomized adaptive search procedures,” J. Glob. Optim., vol. 6, no. 2, pp. 109–134, 1995.
[54] R. Cohen, L. Katzir, and D. Raze, “An efficient approximation for the generalized assignment problem,” Inf. Process. Lett., vol. 100, no. 4, pp. 162–166, Nov. 2006.
[55] P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” J. Comput. Appl. Math., vol. 20, pp. 53–65, Nov. 1987.
[56] T. Calinski and J. Harabasz, “A dendrite method for cluster analysis,” Commun. Stat., vol. 3, no. 1, pp. 1–27, Jun. 2017.
[57] All Symbols in TensorFlow. TensorFlow. Accessed: Jul. 3, 2018. [Online]. Available: https://www.tensorflow.org/api_docs/python/
[58] Model Evaluation: Quantifying the Quality of Predictions, Scikit-Learn. Aug. 3, 2018. [Online]. Available: http://scikit-learn.org/stable/modules/model_evaluation.html

Md. Shirajum Munir received the B.S. degree in computer science and engineering from Khulna University, Khulna, Bangladesh, in 2010. He is currently pursuing the Ph.D. degree in computer science and engineering at Kyung Hee University, Seoul, South Korea.
He served as a Lead Engineer with the Solution Laboratory, Samsung Research and Development Institute, Dhaka, Bangladesh, from 2010 to 2016. His current research interests include IoT network management, fog computing, mobile edge computing, software-defined networking, smart grid, and machine learning.

Sarder Fakhrul Abedin (S’18) received the B.S. degree in computer science from Kristianstad University, Kristianstad, Sweden, in 2013. He is currently pursuing the Ph.D. degree in computer science and engineering at Kyung Hee University, Seoul, South Korea.
His research interests include Internet of Things network management, cloud computing, fog computing, and wireless sensor networks.
Mr. Abedin was a recipient of the scholarship for his graduate study at Kyung Hee University in 2014. He is a member of KIISE.

Nguyen H. Tran (S’10–M’11–SM’18) received the B.S. degree from the Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam, in 2005, and the Ph.D. degree in electrical and computer engineering from Kyung Hee University, Seoul, South Korea, in 2011.
Since 2018, he has been with the School of Computer Science, University of Sydney, Sydney, NSW, Australia, where he is currently a Senior Lecturer. He was an Assistant Professor with the Department of Computer Science and Engineering, Kyung Hee University, from 2012 to 2017. His current research interests include applying analytic techniques of optimization, game theory, and stochastic modeling to cutting-edge applications, such as cloud and mobile edge computing, data centers, heterogeneous wireless networks, and big data for networks.
Dr. Tran was a recipient of the Best KHU Thesis Award in Engineering in 2011 and the Best Paper Award of IEEE ICC 2016. He has been an Editor of the IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING since 2016 and served as the Editor of the 2017 Newsletter of Technical Committee on Cognitive Networks on Internet of Things.

Choong Seon Hong (S’95–M’97–SM’11) received the B.S. and M.S. degrees in electronic engineering from Kyung Hee University, Seoul, South Korea, in 1983 and 1985, respectively, and the Ph.D. degree from Keio University, Minato, Japan, in 1997.
In 1988, he joined Korea Telecom, Seongnam, South Korea, where he performed research on broadband networks as a Technical Staff Member. In 1993, he joined Keio University, Tokyo, Japan. He was with the Telecommunications Network Laboratory, Korea Telecom, as a Senior Member of the Technical Staff and the Director of the Networking Research Team until 1999. Since 1999, he has been a Professor with the Department of Computer Science and Engineering, Kyung Hee University. His current research interests include future Internet, ad hoc networks, network management, and network security.
Dr. Hong has served as the General Chair, a TPC Chair/member, and an Organizing Committee member for international conferences such as NOMS, IM, APNOMS, E2EMON, CCNC, ADSN, ICPP, DIM, WISA, BcN, TINA, SAINT, and ICOIN. In addition, he is currently an Associate Editor of the IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, the International Journal of Network Management, and the Journal of Communications and Networks, as well as an Associate Technical Editor of IEEE Communications Magazine. He is a member of the ACM, IEICE, IPSJ, KIISE, KICS, KIPS, and OSIA.
