Abstract— In this paper, we present a blockchain-based mobile edge computing (B-MEC) framework for adaptive resource allocation and computation offloading in future wireless networks, where the blockchain works as an overlaid system to provide management and control functions. In this framework, reaching a consensus among the nodes while simultaneously guaranteeing the performance of both the MEC and blockchain systems is a major challenge. Meanwhile, the resource allocation, the block size, and the number of consecutive blocks produced by each producer are critical to the performance of B-MEC. Therefore, an adaptive resource allocation and block generation scheme is proposed. To improve the throughput of the overlaid blockchain system and the quality of service (QoS) of the users in the underlaid MEC system, the spectrum allocation, the size of the blocks, and the number of blocks produced by each producer are formulated as a joint optimization problem, where the time-varying wireless links and the computation capacity of the MEC servers are considered. Since this problem is intractable using traditional methods, we resort to a deep reinforcement learning approach. Simulation results show the effectiveness of the proposed approach by comparison with other baseline methods.

Index Terms— Mobile edge computing, computation offloading, blockchain, deep reinforcement learning.

I. INTRODUCTION

THE progressive miniaturization of hardware is enabling the massive deployment of smart mobile devices [1]. Meanwhile, new applications are developing in the directions of the Internet of Things (IoT), the Internet of Vehicles (IoV), e-healthcare, the tactile Internet, and so on. However, the deployment of these applications is restricted by the energy, memory size, and computation resources of mobile devices [2]. These emerging applications, with their requirements for intensive computational capacity and sensitive latency, can rely on advanced wireless technologies and computation offloading. Future wireless networks are required not only to support massive wireless access but also to offer the provisioning of computation offloading for mobile users.

To meet the demands of mobile users, future wireless networks will become more heterogeneous and dense. Throughout the growth of more capable wireless networks, the scarcity of spectrum has always been an impediment along the evolution of cellular networks from the first generation (1G) to the upcoming fifth generation (5G) [3]. One reason is the binary quality of the current spectrum access approach, i.e., licensed and unlicensed, which is an intentional set of policy choices. To improve spectrum efficiency, dynamic spectrum access becomes the norm. However, with an unprecedented level of network densification in the future, spectrum management is of high complexity. Thus, smarter and more decentralized dynamic spectrum access techniques are preferred.

In future communication networks, edge clouds will be deployed in the heterogeneous network and will be able to provide computation offloading services to users [4]. One of the promising paradigms is mobile edge computing (MEC) [5]. Many outstanding works have been done on computation offloading [6]–[10], in which resource allocation, collaboration, offloading strategies, and pricing algorithms are investigated. However, some problems in this distributed and distrusted environment have not been considered. First, it is impractical to deploy or coordinate all the system resources, e.g., caching, computing, and networking, due to the self-deployment nature and coexistence of multiple radio access service providers (SPs) and edge cloud vendors, whereas precisely this is required by the logical system of traditional MEC. Second, there is no trusted entity in the system to audit the computation offloading process or ensure proper and reliable payments to the SPs and edge cloud vendors. Third, privacy is often cited as one of the key concerns in cloud adoption, especially when sensitive or personal information is outsourced to the edge cloud vendors. Few cloud SPs can be fully trusted

Manuscript received January 6, 2019; revised May 27, 2019, September 4, 2019, and November 11, 2019; accepted November 12, 2019. Date of publication December 9, 2019; date of current version March 10, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61671088 and Grant 61771070, in part by the Beijing University of Posts and Telecommunications (BUPT) Excellent Ph.D. Students Foundation under Grant CX2018201, and in part by the Canadian Natural Sciences and Engineering Research Council under Grant RGPIN-2019-06348. The associate editor coordinating the review of this article and approving it for publication was L. Le. (Corresponding author: Hong Ji.)

F. Guo, H. Zhang, and H. Ji are with the Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: fengxianguo@bupt.edu.cn; zhangheli@bupt.edu.cn; jihong@bupt.edu.cn).

F. R. Yu is with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada (e-mail: richard.yu@carleton.ca).

M. Liu is with the Beijing Key Laboratory of Space-ground Interconnection and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: liumengting@bupt.edu.cn).

V. C. M. Leung is with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China, and also with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada (e-mail: vleung@ieee.org).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TWC.2019.2956519
1536-1276 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
1690 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 19, NO. 3, MARCH 2020
contracts [22], which can provide an incentive to ensure the interests of different parties in a trust-less computing market. The transactions record the computation offloading requests from the mobile users. These transaction records are jointly approved by the consensus nodes selected from the blockchain nodes, and then digitally stored in all nodes' local blockchain replicas. Since mobile devices usually have limited storage, the full ledger is stored only in the blockchain nodes, i.e., the edge servers in this paper. Fortunately, the blockchain provides the users with publicly accessible records of the transactions in this network, which are encrypted to provide a privacy guarantee.

In this network, the offloading requests are received by all the BSs. When its turn to produce blocks arrives, the primary node validates and processes the transactions with smart contracts. Then the computing results are sent back to the users and the transactions are packaged into a new block. After that comes a consensus procedure, which will be described in detail in Section IV. After consensus is reached among the nodes, the block is appended to the blockchain, which means the block reaches finality.

B. Key Challenges of Computation Offloading in This Framework

users located around the BSs, each of which has a number of computational tasks (e.g., online games, navigation, VR, health monitoring, and so on) to be completed. The users are denoted by U = {U_1, ..., U_m, ..., U_M}. To complete the tasks, the users choose to offload them to the BSs. In this paper, we do not consider local execution on the mobile devices.

In the blockchain system, the consensus protocol adopts the ideas of both DPoS and PBFT. As noted, the selection of block producers is important in DPoS, which has been well studied in existing works [23]; thus, we do not consider it in this paper. PBFT provides safety and liveness while there are fewer than (N − 1)/3 faulty nodes. In the B-MEC system, assume that the N block producers take turns producing K blocks within an interval Ṫ (in seconds), where each block has size S_B (in bits). K varies across different time periods to account for the time-varying characteristics of wireless networks.

As noted, computation offloading in this system includes four phases: 1) submitting the offloading requests to the blockchain system; 2) executing smart contracts by the block producers; 3) sending back the results to the users; 4) reaching consensus among the block producers. It involves a communication model and a computation model, which will be presented next.
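As a quick sanity check on the PBFT fault-tolerance bound used above, the following sketch (illustrative only, not part of the paper's scheme) computes the maximum number of tolerable faulty producers and the 2f quorum threshold for a committee of N replicas:

```python
def max_faulty(n_producers: int) -> int:
    """Maximum number of faulty nodes a PBFT committee of
    n_producers replicas tolerates: f = floor((N - 1) / 3)."""
    return (n_producers - 1) // 3

def quorum(n_producers: int) -> int:
    """Matching messages needed before a replica advances a phase
    (the '2f' threshold used in the prepare/commit steps)."""
    return 2 * max_faulty(n_producers)

# e.g., with N = 10 block producers, up to 3 faults are tolerated,
# and each replica waits for 2f = 6 matching messages.
```

For example, a committee of 4 replicas tolerates exactly one fault, which is the smallest non-trivial PBFT deployment.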
Each user and BS has the chance to be a transmitter of one multicast group. Assume that user U_m is allocated W_{U_m} sub-channels, and BS B_n is assigned W_{B_n} sub-channels. Due to the limited wireless spectrum, the following constraint should be met:

  \sum_{U_m \in \mathcal{U}} W_{U_m} W_0 + \sum_{B_n \in \mathcal{B}} W_{B_n} W_0 \leq W,    (1)

where we have q_{x,y} = Pr(Ψ_s(t + 1) = y | Ψ_s(t) = x) and x, y ∈ C. Based on the above model, the computation resources assigned from BS B_n to message s are defined as Ψ_{B_n,s}(t). The execution time for completing task I_s at BS B_n can be calculated by

  T_{B_n,s} = f_s / Ψ_{B_n,s}(t).    (5)
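To make the two models above concrete, the following sketch checks the spectrum constraint (1) and evaluates the execution time (5). All numerical values (W_0, W, the workload, and the assigned cycles) are invented for illustration; the symbols follow the definitions above:

```python
# Hypothetical parameters, for illustration only.
W0 = 1.0e6          # bandwidth of one sub-channel (Hz), assumed
W_TOTAL = 20.0e6    # total system bandwidth W (Hz), assumed

def spectrum_ok(user_subch, bs_subch):
    """Constraint (1): total allocated bandwidth must not exceed W."""
    used = sum(user_subch) * W0 + sum(bs_subch) * W0
    return used <= W_TOTAL

def execution_time(f_s, psi):
    """Execution time (5): task workload f_s (CPU cycles)
    divided by the assigned computation resources psi (cycles/s)."""
    return f_s / psi

# Three users with 2 sub-channels each, two BSs with 4 each:
print(spectrum_ok([2, 2, 2], [4, 4]))   # 14 MHz <= 20 MHz -> True
print(execution_time(1.0e9, 2.0e9))     # 0.5 s
```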
packaged into the new block. This procedure occurs within the block interval Ṫ/K.

Assume that the average size of one transaction is denoted by ς. In this phase, the transmission latency T^{tr}_{req} can be expressed by

  T^{tr}_{req} = \max_{U_m \in \mathcal{U}} \{ S_B / R_{U_m,B_p} \},    (6)

where R_{U_m,B_p} denotes the transmission rate from user U_m to the primary node B_p.

Considering the size of one block, the maximum number of transactions that can be included in a block is S_B/ς. Under uncivil execution, it is assumed that a fraction g of the transactions submitted by the clients are correct [25]. In this phase, the primary node needs to verify the signatures and MACs for S_B/(ςg) transactions, and it also needs to execute smart contracts for S_B/ς transactions. Hence, the computation cost at the primary node is Δ_{req,B_p} = S_B(β + θ)/(ςg) + S_B α/ς. Thus, the computation delay is

  T^c_{req} = Δ_{req,B_p} / F_{B_p,p}.    (7)

As noted, there is no computation cost at the backup nodes.

2) Pre-Prepare: After producing the new block, the primary node multicasts the signed block along with a pre-prepare message to all the backup nodes for validation, where the pre-prepare message contains the ID and signature of the primary node and the hashed result of the new block. Since the smart contract is in charge of the execution of the offloaded computation tasks, the backup nodes need to make sure that the offloading tasks are actually executed by the primary node, in addition to validating the identities and the economic parts. In this case, the intuitive method is to execute the smart contracts and compare the computation results.

Hence, after receiving the pre-prepare message and the new block, the backup nodes first verify the signature and MAC of the block, then the signatures and MACs of the transactions. Different from the work in [20], the smart contracts are then executed by the backup nodes to validate the transactions. If the pre-prepare message is accepted by a backup node, it enters the next step.

In this phase, the transmission latency T^{tr}_{prep} can be calculated by

  T^{tr}_{prep} = \max_{B_n \in \mathcal{B}/\{B_p\}} \{ S_B / R_{B_p,B_n} \}.    (8)

As noted, the primary node needs to generate one signature and N − 1 MACs in this phase, which is given by Δ_{prep,B_p} = β + (N − 1)θ. The computation cost at the backup nodes is Δ_{prep,B_n} = β + θ + S_B(α + β + θ)/ς, where B_n ≠ B_p. Hence, the computation latency in this phase is

  T^c_{prep} = \max_{B_n \in \mathcal{B}/\{B_p\}} \{ Δ_{prep,B_n} / F_{B_n,v} \}.    (9)

3) Prepare: After verifying the new block, each backup node sends a prepare message to all the other replicas, in which the replica ID and the signature are contained. Each replica will check the prepare message to make sure that it is consistent with the pre-prepare message. Upon receipt of 2f matching prepare messages from the other replicas, it enters the next step.

In this phase, the transmission cost is caused by sending the prepare message to all other replicas, which can be calculated by

  T^{tr}_{pre} = \max_{B_n, B_{n'} \in \mathcal{B}/\{B_p\}, B_{n'} \neq B_n} \{ S_B / R_{B_n,B_{n'}} \}.    (10)

For the computation cost, the primary node needs to verify 2f signatures and MACs from the other replicas, which can be expressed by Δ_{pre,B_p} = 2f(β + θ). For the other backup nodes, each needs to generate a signature and N − 1 MACs for the prepare message; then 2f signatures and MACs are required to be validated. Hence, the computation cost at the backup nodes B_n (≠ B_p) can be given by Δ_{pre,B_n} = β + (N − 1)θ + 2f(β + θ), and the computation latency in this phase is

  T^c_{pre} = \max_{B_n \in \mathcal{B}} \{ Δ_{pre,B_n} / F_{B_n,v} \}.    (11)

4) Commit: Following receipt of 2f matching prepare messages from the other replicas that are consistent with the pre-prepare message, each replica sends a commit message to all the others, which includes the ID and signature of the replica. Upon receipt of 2f matching commit messages, it enters the next step.

To deliver the commit messages, the transmission latency can be expressed by

  T^{tr}_{c} = \max_{B_n, B_{n'} \in \mathcal{B}, B_{n'} \neq B_n} \{ S_B / R_{B_n,B_{n'}} \}.    (12)

In this phase, each replica needs to generate one signature and N − 1 MACs to form the commit messages. After receiving the commit messages, each replica needs to verify 2f signatures and MACs. Hence, the computation cost at each replica is Δ_{c,B_n} = β + (N − 1)θ + 2f(β + θ), and the computation latency in this phase is

  T^c_{c} = \max_{B_n \in \mathcal{B}} \{ Δ_{c,B_n} / F_{B_n,v} \}.    (13)

5) Reply: After collecting 2f matching commit messages, the new block becomes valid and will be appended to the blockchain. A reply message will be delivered, in which the signature, the ID, and the computation result for the offloading task are included. Different from the original PBFT protocol [25], the reply message is delivered to the primary node, instead of the clients, due to the mobile devices' limited memory size. In this phase, the transmission cost is

  T^{tr}_{r} = \max_{B_n \in \mathcal{B}/\{B_p\}} \{ S_B / R_{B_n,B_p} \}.    (14)

For the computation cost, each backup node needs to generate S_B/ς signatures and S_B/ς MACs for the primary node, which can be given by Δ_{r,B_n} = S_B(β + θ)/ς. For the primary node, it needs to verify 2f signatures and MACs, the computation cost of which is given by Δ_{r,B_p} = 2f(β + θ). Hence, the computation latency in this phase is

  T^c_{r} = \max_{B_n \in \mathcal{B}} \{ Δ_{r,B_n} / F_{B_n,v} \}.    (15)
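The per-phase computation costs above are simple linear functions of the crypto-operation costs α (smart-contract execution), β (signature), and θ (MAC). As an illustrative sketch of the prepare-phase cost and latency at a backup node (all numerical values are assumptions, not taken from the paper):

```python
# Assumed per-operation costs in CPU cycles; illustrative values only.
ALPHA, BETA, THETA = 5.0e6, 1.0e6, 0.1e6

def prepare_cost(n_replicas: int) -> float:
    """Backup-node cost in the prepare phase, eq. (11) numerator:
    Delta = beta + (N - 1)*theta + 2f*(beta + theta), f = (N-1)//3."""
    f = (n_replicas - 1) // 3
    return BETA + (n_replicas - 1) * THETA + 2 * f * (BETA + THETA)

def phase_latency(cost: float, cpu_hz: float) -> float:
    """Computation latency: cost (cycles) / assigned resources F (Hz)."""
    return cost / cpu_hz

delta = prepare_cost(10)                # N = 10 replicas, so f = 3
print(phase_latency(delta, 2.0e9))      # seconds at an assumed 2 GHz
```

The same two-step pattern (cost in cycles, then division by the assigned cycles per second) applies to every phase in (7), (9), (11), (13), and (15).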
V. PERFORMANCE ANALYSIS

In this section, we give details of the performance of the MEC system and the blockchain system. For the MEC system, the QoS of the users in terms of delay is given, which is the time from submitting the requests to receiving the results. In the blockchain system, the most important criteria to measure the system performance are throughput, time to finality, decentralization, and security. To address this four-way trade-off, the four properties will be presented in this section.

A. Performance of MEC

To measure the QoS of the users, the delay experienced by the users is introduced, which consists of three parts: submitting the requests to the BS, executing the offloading tasks (executing smart contracts), and sending back the results to the users. As analyzed in Section IV.B, the transmission latency T^{tr}_{req} for submitting the requests is given by expression (6).

As analyzed in Section IV.B, the primary node first verifies the signature and MAC of the requests, then executes the offloading requests. The computation cost for one offloading request is α + β + θ. The processing latency consists of two parts, queuing delay and executing delay. For each transaction, the executing delay is

  T^e = (α + β + θ) / F_{B_p,p}.    (16)

Hence, the average queuing delay can be expressed by

  T^q = (1/2)(S_B/ς − 1) T^e.    (17)

In this work, we do not consider the sending-back procedure, as in [9], since the size of the output may be much smaller than that of the input data, which corresponds to many practical scenarios, such as virus detection, face recognition, and video analysis. Hence, the average delay experienced by the users is

  T_U = T^{tr}_{req} + T^e + T^q.    (18)

B. Performance of the Blockchain System

1) Throughput: The throughput of the blockchain system can be measured by the number of transactions that can be processed successfully in unit time, which is related to two procedures, i.e., block generation and consensus reaching. When producing a block, it is limited by the block size and the processing capacity of the primary node. Considering the block size, the number of transactions that can be included into blocks per unit time is

  Ξ(S_B, K) = S_B K / (ς Ṫ).    (19)

The computation cost of producing one block is shown in Section IV.B. We assume that the computation resources of the primary node that are assigned to produce blocks are F_{p,p} Hz. Considering the limited computation resources, the following constraint should be met:

  S_B ((β + θ)/g + α) / (ς F_{p,p}) ≤ Ṫ/K,    (20)

the left side of which denotes the processing time to produce a block.

Since the primary node produces K blocks continuously, the last several blocks may be ignored due to the propagation delay to the next primary node. We assume the transmission data rate from the current primary node to the next one is R_{p,p+1}. Hence, the number of ignored blocks can be calculated by

  IB(S_B, K) = (S_B / R_{p,p+1}) / (Ṫ/K) − 1.    (21)

As noted, IB ≤ K, since the system obviously cannot miss more blocks than are produced.

The throughput of the consensus protocol can be expressed by

  Υ(S_B, K, W) = ((K − IB)/K) Ξ = (S_B / (ς Ṫ)) (K − S_B K / (R_{p,p+1} Ṫ) + 1),    (22)

where W = {W_{U_m}, W_{B_n}, U_m ∈ U, B_n ∈ B} denotes the spectrum allocation profile. Υ denotes the number of transactions that can be included into the blocks and transmitted to the next primary node successfully.

2) Time to Finality/Confirmation Latency: To guarantee the security of the transactions, it is essential to prevent the transactions from being arbitrarily changed or reversed. Time to finality is the time after which the transactions cannot be revoked once committed to the blockchain, which is important for some real-time applications. A longer delay frustrates users and makes applications built on a blockchain less competitive with existing non-blockchain alternatives.

Time to finality T^f includes two parts, the time for propagation T^p and the time for computation T^c:

  T^f = T^p + T^c.    (23)

Assume that each transmission procedure should be completed within a timeout τ_tr. As discussed in Section IV.B, the propagation time can be calculated by

  T^p = t^{tr}_{req} + t^{tr}_{prep} + t^{tr}_{pre} + t^{tr}_{c} + t^{tr}_{r}
      = min{T^{tr}_{req}, τ_tr} + min{T^{tr}_{prep}, τ_tr} + min{T^{tr}_{pre}, τ_tr} + min{T^{tr}_{c}, τ_tr} + min{T^{tr}_{r}, τ_tr}.    (24)

This consensus protocol involves five procedures; the computation cost and computation latency for each procedure are shown in Section IV.B. We assume that each message should be processed within a timeout τ_c. Thus, for the computation latency T^c, we have

  T^c = t^c_{req} + t^c_{prep} + t^c_{pre} + t^c_{c} + t^c_{r}
      = min{T^c_{req}, τ_c} + min{T^c_{prep}, τ_c} + min{T^c_{pre}, τ_c} + min{T^c_{c}, τ_c} + min{T^c_{r}, τ_c}.    (25)

3) Decentralization: To characterize the decentralization of blockchain systems, we resort to the Gini coefficient, which is often used as a gauge of economic inequality, measuring income or wealth distribution among a population [26]. The definition of this inequality measure is based on the Lorenz curve [27]. Focusing on the decentralization of the block producers, we consider the number of blocks that each
GUO et al.: ADAPTIVE RESOURCE ALLOCATION IN FUTURE WIRELESS NETWORKS WITH BLOCKCHAIN AND MEC 1695
replica produces over time, the set of which is denoted by K = {K(1), K(2), ..., K(T)}. Hence, the Gini coefficient of the distribution among K is expressed by

  G(K) = \sum_{t \in \mathcal{T}} \sum_{t' \in \mathcal{T}} |K(t) − K(t')| / (2N \sum_{t \in \mathcal{T}} K(t)).    (26)

Note that G(K) ∈ [0, 1]. The smaller the value of the Gini coefficient is, the more decentralized the blockchain system is. A Gini coefficient of zero expresses perfect equality, where all values in K are the same; it means every replica produces the same number of blocks in a round. A Gini coefficient of 1 expresses maximal inequality among values: for example, only one replica produces several blocks, while the other replicas do not get a chance or cannot produce a block. In this case, the blockchain system becomes totally centralized, which violates the idea of the blockchain as a distributed ledger. To ensure the decentralization of the blockchain system, we have the following constraint:

  G(K) ≤ η,    (27)

where η ∈ [0, 1] denotes the threshold of decentralization in terms of K.

4) Security: To guarantee the security of the transactions, it is essential to prevent the transactions from being arbitrarily changed or reversed. As such, finality is vital when designing a blockchain consensus protocol. In a PBFT-based consensus protocol, absolute finality can be provided when a 2/3 fraction of the nodes are honest. So the number of loyal nodes is essential to guarantee the security of the consensus protocol. To guarantee the security of the system, the following constraint should be met:

  f ≤ (N − 1)/3.    (28)

In other words, to prevent a transaction from being revoked or modified, the number of malicious nodes should not exceed (N − 1)/3. In this paper, we do not consider the security problem of the consensus protocol; in other words, the above condition is assumed to be satisfied already.

VI. PROBLEM FORMULATION

In order to improve the throughput of this system, we need to jointly optimize the spectrum allocation, the block size, and the number of blocks produced by each replica. Since it is intractable to solve this problem with traditional methods, we resort to DRL, which will be introduced in the next section. To implement the approach, we formulate the joint optimization problem as an MDP, where the state, action, and reward function are defined as follows.

A. State

Let S = {s(t), t ∈ T} be the system state space, where s(t) denotes the state at time period t. Here s(t) evolves across T. The network state consists of the SNR between the users and the BSs, the SNR between different BSs, the computing resources assigned by the BSs to different messages, and the primary node ID. Hence, the network state s(t) at time period t is expressed by

  s(t) = {Γ_{U_1}(t), ..., Γ_{U_m}(t), ..., Γ_{U_M}(t);
          Γ_{B_1}(t), ..., Γ_{B_n}(t), ..., Γ_{B_N}(t);
          Ψ_{B_1}(t), ..., Ψ_{B_n}(t), ..., Ψ_{B_N}(t); B_p(t)},    (29)

where Γ_{U_m}(t) = {Γ_{U_m,B_n}(t), B_n ∈ B}, Γ_{B_n}(t) = {Γ_{B_n,B_{n'}}(t), B_{n'} ∈ B, B_{n'} ≠ B_n}, and Ψ_{B_n}(t) = {Ψ_{B_n,s}(t)}. In this paper, the primary node in each time period is known a priori.

B. Action

In this paper, we focus on the spectrum allocation, the block size, and the number of successive blocks produced by one block producer. Let A = {A(t), t ∈ T} be the system action space. Here A(t) denotes the action at time period t, which can be expressed by

  A(t) = {W_{U_1}(t), ..., W_{U_m}(t), ..., W_{U_M}(t);
          W_{B_1}(t), ..., W_{B_n}(t), ..., W_{B_N}(t);
          S_B(t); K(t)},    (30)

where the first two rows denote the spectrum allocation indicators for the users and BSs, respectively. Particularly, we have W_{U_m} ∈ {1, ..., E} and W_{B_n} ∈ {1, ..., E}. Considering the limited wireless resources, the capacity constraint shown in expression (1) should be met. Here, S_B(t) denotes the block size at time period t, and K(t) denotes the number of successive blocks at time period t. Since the replicas take turns producing blocks, there is only one primary node in a certain time period; the primary node B_p(t) at time period t produces K(t) blocks. To simplify this problem, we discretize the action space, where S_B and K are selected from the sets S_B and K.

C. Reward Function

In this paper, we aim to maximize the performance of the joint MEC and blockchain system by making decisions on the action space. The reward function is designed to be

  max_{S_B, K, W} R(S_B, K, W)
  s.t. C1: T^f_k(t) ≤ T_max, ∀k ∈ K(t),
       C2: G(K) ≤ η,
       C3: \sum_{U_m \in \mathcal{U}} W_{U_m} + \sum_{B_n \in \mathcal{B}} W_{B_n} ≤ E,
       C4: \sum_{U_m \in \mathcal{U}} R_{U_m,B_n} + \sum_{n' \in \mathcal{B}/\{n\}} R_{B_{n'},B_n} ≥ R_{B_n,B_{n'}},    (31)

where R(S_B, K, W) = \sum_{t'=t}^{T} γ^{t'−t} r(t') denotes the long-term reward over the time periods T. Here γ ∈ [0, 1) is the discount rate, which indicates the weight of the future reward. For fixed t, the bigger γ is, the more influence the future reward r(t') has. As noted, for fixed γ, γ^{t'−t} approaches zero when t' − t is large enough, which means that the future reward has less impact on the long-term reward as time goes on.
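The Gini-based decentralization metric (26) is straightforward to compute. A minimal sketch, with the normalization following the formula above (N taken as the number of values in K):

```python
def gini(blocks):
    """Gini coefficient (26) of the per-replica block counts.

    0 means every replica produced the same number of blocks
    (perfect decentralization); values approaching 1 mean block
    production is concentrated in a single replica."""
    n = len(blocks)
    total = sum(blocks)
    if total == 0:
        return 0.0
    diff_sum = sum(abs(x - y) for x in blocks for y in blocks)
    return diff_sum / (2 * n * total)

print(gini([5, 5, 5, 5]))   # 0.0, perfectly equal
print(gini([20, 0, 0, 0]))  # 0.75, highly centralized
```

Note that for n producers the maximum attainable value is (n − 1)/n, which approaches 1 as the committee grows, matching the interpretation given after (26).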
In the proposed problem, T^f_k(t) in constraint C1 denotes the time to finality of the k-th block produced in time period t. Constraints C1 and C2 represent the limitations on time to finality and decentralization, respectively. C3 denotes that the allocated sub-channels should not exceed the total system wireless resources. C4 denotes the backhaul capacity constraint. As noted, constraints C1∼C4 may not be met, in which case the whole system may have a low performance. Adopting the idea of a penalty function, we define the immediate reward r(t) as

  r(t) = { ϑ_B Υ(t) + ϑ_M / T_U,  when C1∼C4 are satisfied,
         { 0,                     otherwise,    (32)

where ϑ_B and ϑ_M ∈ [0, 1] are the weights corresponding to the blockchain system and the MEC system, and ϑ_B + ϑ_M = 1. Note that the weights can be dynamic, which indicates a dynamic preference between these two systems. For ease of modeling, assume that the weights remain stationary within one time period, while they can be changed over different periods.

VII. PROPOSED LEARNING APPROACH

In this section, we first introduce the necessary background related to DRL, then present the approach to solve the considered problem.

A. DRL Background

1) RL: RL is a branch of machine learning in which the agent learns the optimal policy by interacting with an unknown environment to maximize the expected long-term reward [28]. An RL agent can be modeled as an MDP. The way an agent acts in the MDP framework is as follows. Given the state s(t) ∈ S in environment X, the agent takes an action a(t) from the legal set A at each time step. After taking the action from the given state, it enters the next state s(t + 1) according to the state transition probability P(s(t + 1)|s(t), a(t)). At the same time, it receives an instant reward r(t). In RL, the objective is defined as the expected long-term reward, which is

  R(t) = \sum_{t'=t}^{T} γ^{t'−t} r(t'),    (33)

where γ ∈ [0, 1] is the discount factor on the future rewards.

2) DRL: Recently, much research has shown that deep learning can be combined with RL to solve problems with high-dimensional raw data input, which is referred to as DRL [29]. In the training process of DRL, a deep neural network (DNN) called a DQN is utilized to derive the relationship between the action-state pair and the Q function Q(s, a; θ), in which θ represents the weights of the neural network. The DQN is trained by updating θ in each iteration to approximate the real Q values. Stable training is not achieved until two improvement techniques, experience replay and the target network, are applied in DQN. Furthermore, the main DQN is trained towards the target DQN by minimizing the loss function, which is defined as

  Loss(θ(t)) = E[(y(t) − Q(s(t), a(t); θ(t)))²],    (34)

where y(t) is the target Q value, which can be estimated by

  y(t) = r(t) + γ \max_{a(t+1)} Q(s(t + 1), a(t + 1); θ−(t + 1)).    (35)

Here, the target DQN is updated every G steps, i.e., θ−(t) = θ_{t−G}.

3) Beyond DRL: To improve the performance of DRL, two important techniques, double DQN and dueling DQN, are applied in this work, which will be described next.

a) Double DQN: To handle the problem of overestimation of Q values, double DQN was proposed by Hado van Hasselt [30], the idea of which is to decouple the selection of actions from their estimation. Mathematically, it can be expressed by

  y^{DoubleDQN} = r + γ Q(s', argmax_a Q(s', a; θ); θ−),    (36)

which selects the actions according to the online weights θ, while the estimation is based on the target network. This simple trick can help yield more accurate estimations, thus improving the performance of DRL.

b) Dueling DQN: Motivated by the fact that, in some states, not every action affects the state, dueling DQN was proposed [31], the idea of which is to decompose the Q value into two parts, the value of being in that state, V(s), and the advantage of taking that action at that state, A(s, a). The idea can be expressed by

  Q(s, a) = A(s, a) + V(s).    (37)

In this case, dueling DQN can intuitively learn which state is more valuable without learning the effect of each action at that state. By doing so, it can help find more reliable Q values for each action and accelerate the training process.

B. Proposed Algorithm

To solve the proposed problem, an offline DRL-based approach is proposed. In this approach, a double-dueling DQN model is first trained to learn the optimal policies in an offline way. After the model is trained, it can be used by the B-MEC system to jointly allocate the wireless resources and decide the size of the blocks and the number of consecutive blocks produced by each replica in an online way. In this way, it avoids the long training time of an online learning approach.

In each training step, the state information is sent to the Q network, and the Q network sends back the optimal action a∗(t) at each time step. Action selection follows the ε-greedy policy. The transitions, i.e., experiences, from all the training runs are accumulated in the experience replay buffer in parallel. A mini-batch of samples is selected from the experience replay buffer to train the Q network parameters θ. The target Q network parameters θ− are updated every G steps, i.e., copied from the main Q network. The training process is shown in Algorithm 1, in which there are two points to be specified. First, the Q values are divided into two parts, the value of being in the state, V(s), and the advantage of taking that action in that state, A(s, a); in the last layer of the DQN, these two parts are combined into one Q value. Second, when updating the target Q network, a learning rate α is introduced, where α ∈ [0, 1] is
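The two tricks above can be combined in a few lines. The following sketch uses toy NumPy Q-value vectors standing in for the networks (shapes and numbers are invented); it computes the dueling aggregation (37) and the double-DQN target (36):

```python
import numpy as np

def dueling_q(v, adv):
    """(37): combine state value V(s) with advantages A(s, a).
    (The original dueling-DQN paper also subtracts mean(adv) for
    identifiability; the plain sum follows the text above.)"""
    return v + adv

def double_dqn_target(r, gamma, q_online_next, q_target_next):
    """(36): select argmax with the online net, evaluate it
    with the target net."""
    a_star = int(np.argmax(q_online_next))
    return r + gamma * q_target_next[a_star]

# Toy example with 3 actions in the next state:
q_online = np.array([1.0, 3.0, 2.0])   # online net picks action 1
q_target = np.array([0.5, 2.0, 4.0])   # ... which the target net values
print(double_dqn_target(0.1, 0.9, q_online, q_target))  # ~1.9
```

Note that a plain DQN target would take max(q_target) = 4.0 here; the double-DQN target uses the target value of the online argmax (2.0) instead, which is exactly the overestimation-damping effect described above.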
the weight to adjust the preference between the current and previous learning values. In each training step, the system state transits into a new state.

Algorithm 1 Offline DRL-Based Performance Optimization for B-MEC
1: Input: Maximum training episode E_max, maximum steps H_max in each episode, mini-batch size U, initial learning rate α, exploration probability ε, discount rate γ.
2: Initialization
3: Initialize the state of the B-MEC system s_1, set ε = 1.
4: Initialize the experience replay buffer.
5: Initialize the main DQN with random weights θ.
6: Initialize the target DQN with weights θ− = θ.
7: for episode = 1, ..., E_max do
8:   for t = 1, ..., H_max do
9:     Choose a random probability p.
10:    if p > ε then
11:      a(t) = a∗(t) = arg max_a Q(s(t), a; θ),
12:    else
13:      randomly select an action a(t).
14:    end if
15:    Decrease the exploration probability ε.
16:    Execute action a(t) in the system, and observe the reward r(t) and the next state s(t + 1).
17:    Store the experience (s(t), a(t), r(t), s(t + 1)) into the experience replay buffer.
18:    Sample a mini-batch of size U.
19:    Calculate the target Q-value through expression (36).
20:    Update the main DQN by minimizing the loss in expression (34), performing a gradient descent step on Loss(θ) with respect to θ.
21:    Every G steps, update the target DQN parameters with learning rate α: θ− = αθ + (1 − α)θ−.
22:    Update the learning rate according to the optimizer (e.g., Adam, Adagrad).
23:  end for
24: end for

Theorem 1: In practical scenarios, the computational complexity of the proposed DRL training algorithm is O(E^{N+M}) or O((N + M)^{E−(N+M)}).

Proof of Theorem 1: To prove the above theorem and thus analyze the complexity of the proposed algorithm, one must consider the size of the state function of the system as well as the action space at each state vector [32]. As such, based on the action space definition, the system needs to update each user's and BS's spectrum allocation indicator, the block size, and the number of blocks produced by each block producer; thus, its actions are a function of the channel association vector, the block size level, and the block number level.

For each state, the action of the system is a function of the channel association vector, the block size level, and the block number level. Nevertheless, the number of possible channel associations of the users and BSs in the system is much larger than the number of possible block size levels and block number levels. Therefore, one can focus on the number of possible channel associations of the users and BSs only for analyzing the convergence complexity of the proposed training algorithm, by the law of large numbers. Consequently, the computational complexity of the proposed algorithm is O(E^{N+M}) when the system updates the channel allocation indicators of the N block producers and M users with E sub-channels. In this paper, we assume that the number of sub-channels is larger than the total number of users and BSs, i.e., E > N + M. Thus, from another perspective, the computational complexity can also be expressed by O((N + M)^{E−(N+M)}). This completes the proof.

From Theorem 1, we can conclude that the convergence speed of the proposed training algorithm is strongly related to the state space dimension. It is of significant importance to note here that there exists a tradeoff between the computational complexity of the proposed DRL training algorithm and the resulting network performance [32]. It is worth noting that the complexity of the proposed algorithm can be neglected in this paper, since the training process, which is the most costly part, is conducted in an offline
state according to the system transition probability after an way.
action is performed. And the reward can be observed based Theorems 2: The space complexity of the proposed
on the reward function. After the states, actions, the reward DRL-based algorithm is O(SAHmax ), where S is the number
function, transition probabilities, and constraints of B-MEC of states, A is the number of actions, and Hmax is the number
system are identified, the optimal policy can be learned off- of steps in one episode.
line. In order to obtain the optimal solution, the states, actions, Proof of Theorem 2: According to [33], space complexity
reward function, and constraints are identified in Section VI. is measured by the amount of memory required to implement
As noted, the transition probabilities and reward need to be the algorithm. Inferred from the work in [34], the space
identified when conducting the simulation, while both of them complexity is related to the number of states, the number
are not needed when carrying out the Q networks in a real of actions, and the number of steps per episode. In this
B-MEC system. paper, the number of states can be expressed by S = (N +
M )(E−(N +M)) × Y N × N . The number of actions can be
calculated by A = (N + M ) × |SB | × |K|. In this paper,
the number of maximum steps in each episode is Hmax ,
C. Complexity Analysis
as defined in Algorithm 1.
Next, we analyze the computational complexity and space In this case, the space complexity to implement the proposed
complexity of the proposed DRL-based algorithm for practical algorithm can be expressed by O(SAHmax ) = O((N +
scenarios where there are tens and even hundreds of users and M )(E−(N +M)+1) × Y N × N × |SB | × |K| × Hmax ). This
BSs in a small area. completes the proof.
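As an illustration, the offline training loop of Algorithm 1 can be sketched in a few dozen lines. The sketch below is a minimal stand-in, not the paper's implementation: the B-MEC environment is replaced by a toy transition function, the DQN is reduced to a tabular Q approximator, and all sizes and hyperparameter values are assumptions chosen only to make the loop run.

```python
import random
from collections import deque

import numpy as np

# Toy stand-in for the offline training loop of Algorithm 1.
# All names, sizes, and hyperparameters here are illustrative assumptions.
N_STATES, N_ACTIONS = 8, 4
E_MAX, H_MAX = 20, 25          # maximum episodes / steps per episode
U = 16                         # mini-batch size
ALPHA = 0.1                    # target-network learning rate (line 21)
GAMMA = 0.9                    # discount rate (set to 0.9 in the simulations)
G = 10                         # target-update period
LR = 0.05                      # Q-update step size

rng = np.random.default_rng(0)
theta = rng.normal(size=(N_STATES, N_ACTIONS))   # main Q weights
theta_target = theta.copy()                       # target Q, theta- = theta
replay = deque(maxlen=1000)                       # experience replay buffer
eps = 1.0                                         # exploration probability

def toy_step(s, a):
    """Stand-in for the B-MEC dynamics: returns (reward, next state)."""
    return float(rng.normal(loc=a * 0.1)), int(rng.integers(N_STATES))

step_count = 0
for episode in range(E_MAX):
    s = 0                                         # initial system state s1
    for t in range(H_MAX):
        # epsilon-greedy action selection (lines 9-14)
        if rng.random() > eps:
            a = int(np.argmax(theta[s]))          # a* = argmax_a Q(s, a)
        else:
            a = int(rng.integers(N_ACTIONS))      # random exploration
        eps = max(0.05, eps * 0.995)              # decrease exploration (15)

        r, s_next = toy_step(s, a)                # execute action (line 16)
        replay.append((s, a, r, s_next))          # store experience (line 17)

        if len(replay) >= U:
            batch = random.sample(list(replay), U)  # mini-batch (line 18)
            for (bs, ba, br, bs2) in batch:
                # target Q-value and gradient step on the squared loss (34)
                y = br + GAMMA * theta_target[bs2].max()
                theta[bs, ba] -= LR * (theta[bs, ba] - y)

        step_count += 1
        if step_count % G == 0:                   # soft target update (21)
            theta_target = ALPHA * theta + (1 - ALPHA) * theta_target
        s = s_next
```

In a real B-MEC deployment, `toy_step` would be replaced by the observed system dynamics and the Q-table by the main and target deep Q networks described above; the loop structure itself is unchanged.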
1698 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 19, NO. 3, MARCH 2020
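For concreteness, the state and action space sizes in Theorem 2 can be checked numerically. The parameter values below are hypothetical placeholders chosen only for illustration, not the simulation settings of Table I:

```python
# Illustrative sizes for the complexity bounds in Theorems 1 and 2.
# N block producers, M users, E sub-channels (E > N + M), Y SNR levels,
# |S_B| block-size levels, |K| block-number levels; all values hypothetical.
N, M, E, Y = 3, 4, 10, 2
SB_LEVELS, K_LEVELS = 5, 4

S = (N + M) ** (E - (N + M)) * (Y ** N) * N     # number of states (Theorem 2)
A = (N + M) * SB_LEVELS * K_LEVELS              # number of actions (Theorem 2)
H_MAX = 100                                     # steps per episode

space_complexity = S * A * H_MAX                # O(S * A * Hmax)
print(S, A, space_complexity)
```

Even for these small values the product S · A · Hmax already exceeds 10^8, which illustrates why the costly training is performed offline.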
TABLE I
THE SIMULATION PARAMETERS

Fig. 3. Reward under different discount rates.

The effects of different discount rates on the performance of the proposed approach are shown in Fig. 3. In DRL, the actions are chosen by optimizing the long-term reward, where future rewards are discounted by multiplying them by the discount rate, as defined in equation (33). With a small discount rate, the learning agent chooses the action maximizing the current reward, and vice versa. Since the current action influences future rewards in this paper, the long-term reward increases as the discount rate grows. However, it is meaningless to put too much weight on the future in an unstable system, and doing so also incurs high computational complexity. To strike a tradeoff between performance and computational complexity, an appropriate discount rate should be chosen; it is set to 0.9 in the rest of the simulations.

Fig. 4 shows the effect of the mini-batch size on the convergence performance of the proposed approach. The x-axis denotes the training steps and the y-axis represents the value of the loss function. The mini-batch size indicates how many experience samples are used to train the Q network in each training step. We can observe from Fig. 4 that convergence becomes faster as the mini-batch size grows, because more experiences are used to train the Q network with a bigger mini-batch size. Similar to the other parameters, an appropriate mini-batch size should be chosen; it is set to 64 in the rest of the simulations.

Fig. 5. Comparison of DQN, double DQN, dueling DQN, and double-dueling DQN.

Different DRL methods are compared in Fig. 5. For fairness, all methods adopt the same simulation parameters. The y-axis represents the value of the loss function, i.e., the gap in approximating the Q function, while the x-axis represents the training steps. First, we can see that double-dueling DQN converges first. Second, double-dueling DQN achieves a more accurate approximation of the Q function than the other three DRL methods. That is because dueling DQN divides the Q function into the state value function and the action advantage function, which allows a better approximation of the Q values and enables faster convergence. Furthermore, double DQN selects the actions according to the online weights, which mitigates the overestimation compared with traditional DQN and also results in a more accurate approximation.

Fig. 6 shows the convergence of different schemes, where the y-axis denotes the long-term reward. First, the existing static scheme converges first, but obtains the worst performance in terms of reward. That is because its decisions are made according to the current reward only, which needs fewer training steps; as a result, it does not consider the effects of the current action on future rewards and thus obtains the lowest reward. Second, the proposed scheme maintains a higher long-term reward than the other three schemes. With the adaptive spectrum allocation policy, the latency can be reduced. With an adaptive block size and a properly chosen number of producing blocks, the throughput of the blockchain
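The role of the discount rate described above can be illustrated with a short calculation: with a small rate, the long-term reward is dominated by the current reward, while a rate near 0.9 also weights future rewards heavily. The reward trace below is hypothetical.

```python
# Discounted long-term reward of a fixed reward trace under different
# discount rates (illustrative values; the paper fixes the rate at 0.9).
rewards = [1.0, 2.0, 4.0, 8.0]          # hypothetical per-step rewards

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r(t), as in the long-term reward of equation (33)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

for gamma in (0.1, 0.5, 0.9):
    print(gamma, round(discounted_return(rewards, gamma), 3))
```

At gamma = 0.1 the return is close to the first reward alone, whereas at gamma = 0.9 the later rewards dominate, which is why a larger discount rate yields a larger long-term reward here.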
Fig. 10. Rewards under different SNR state settings.

Fig. 11. Throughput and latency (seconds) versus the number of users.

on the convergence speeds of these algorithms. That is because the number of states, actions, and steps in one episode is rather larger than the state set's dimension, which can be ignored by the law of large numbers. Fourth, the proposed scheme obtains the best performance, due to the superiority of DRL, which also shows the ability of the proposed scheme to adapt to different network dynamics.

Finally, we analyze the throughput and latency of the proposed algorithm, as shown in Fig. 11, where the number of BSs is fixed to 10 and the number of users ranges from 5 to 25. Similar to Fig. 9, the results in Fig. 11 are also obtained online using the offline-trained algorithm, which again shows that the proposed approach of training offline and operating online can work well in practice. As expected, the throughput decreases and the latency increases as the number of users grows. First, the resources are fixed, so the average amount of resources per user decreases as the number of users goes up. Second, the queuing delay grows when the number of users rises, which induces larger latency.

IX. CONCLUSION AND FUTURE WORK

In this paper, we developed a novel blockchain-based framework for resource allocation in future wireless networks with MEC. With blockchain, the data delivery and computation execution on the edge servers are self-organized by smart contracts. A consensus protocol for this distributed wireless network was proposed, along with its details and theoretical analysis. The performance of the MEC and blockchain system was

REFERENCES

[1] V. Sharma, I. You, F. Palmieri, D. N. K. Jayakody, and J. Li, “Secure and energy-efficient handover in fog networks using blockchain-based DMM,” IEEE Commun. Mag., vol. 56, no. 5, pp. 22–31, May 2018.
[2] X. Tao, K. Ota, M. Dong, H. Qi, and K. Li, “Performance guaranteed computation offloading for mobile-edge cloud computing,” IEEE Wireless Commun. Lett., vol. 6, no. 6, pp. 774–777, Dec. 2017.
[3] D. M. Kalathil and R. Jain, “Spectrum sharing through contracts for cognitive radios,” IEEE Trans. Mobile Comput., vol. 12, no. 10, pp. 1999–2011, Oct. 2013.
[4] J. Zheng, Y. Cai, Y. Wu, and X. Shen, “Dynamic computation offloading for mobile cloud computing: A stochastic game-theoretic approach,” IEEE Trans. Mobile Comput., vol. 18, no. 4, pp. 771–786, Apr. 2019.
[5] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing—A key technology towards 5G,” ETSI White Paper, vol. 11, no. 11, pp. 1–16, 2015.
[6] J. Feng, Q. Pei, F. R. Yu, X. Chu, and B. Shang, “Computation offloading and resource allocation for wireless powered mobile edge computing with latency constraint,” IEEE Wireless Commun. Lett., vol. 8, no. 5, pp. 1320–1323, Oct. 2019.
[7] Y. Liu, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung, “Distributed resource allocation and computation offloading in fog and cloud networks with non-orthogonal multiple access,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12137–12151, Dec. 2018.
[8] H. Guo, J. Liu, and H. Qin, “Collaborative mobile edge computation offloading for IoT over fiber-wireless networks,” IEEE Netw., vol. 32, no. 1, pp. 66–71, Jan. 2018.
[9] C. Wang, F. R. Yu, C. Liang, Q. Chen, and L. Tang, “Joint computation offloading and interference management in wireless cellular networks with mobile edge computing,” IEEE Trans. Veh. Technol., vol. 66, no. 8, pp. 7432–7445, Aug. 2017.
[10] M. Liu and Y. Liu, “Price-based distributed offloading for mobile-edge computing with computation capacity constraints,” IEEE Wireless Commun. Lett., vol. 7, no. 3, pp. 420–423, Jun. 2018.
[11] K. Yang, X. Jia, and K. Ren, “Secure and verifiable policy update outsourcing for big data access control in the cloud,” IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 12, pp. 3461–3470, Dec. 2015.
[12] F. Tschorsch and B. Scheuermann, “Bitcoin and beyond: A technical survey on decentralized digital currencies,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2084–2123, 3rd Quart., 2016.
[13] T. Salman, M. Zolanvari, A. Erbad, R. Jain, and M. Samaka, “Security services using blockchains: A state of the art survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 858–880, 1st Quart., 2018.
[14] F. R. Yu, J. Liu, Y. He, P. Si, and Y. Zhang, “Virtualization for distributed ledger technology (vDLT),” IEEE Access, vol. 6, pp. 25019–25028, 2018.
[15] T. N. Dinh and M. T. Thai, “AI and blockchain: A disruptive integration,” Computer, vol. 51, no. 9, pp. 48–53, Sep. 2018.
[16] Y. Liu, F. R. Yu, X. Li, H. Ji, and V. C. M. Leung, “Decentralized resource allocation for video transcoding and delivery in blockchain-based system with mobile edge computing,” IEEE Trans. Veh. Technol., vol. 68, no. 11, pp. 11169–11185, Nov. 2019.
[17] M. Liu, F. R. Yu, Y. Teng, V. C. M. Leung, and M. Song, “Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 695–708, Jan. 2019.
[18] J. Kang et al., “Blockchain for secure and efficient data sharing in vehicular edge computing and networks,” IEEE Internet Things J., vol. 6, no. 3, pp. 4660–4670, Jun. 2019.
[19] K. Fan, S. Wang, Y. Ren, K. Yang, Z. Yan, H. Li, and Y. Yang, “Blockchain-based secure time protection scheme in IoT,” IEEE Internet Things J., vol. 6, no. 3, pp. 4671–4679, Jun. 2019.
[20] C. Qiu, F. R. Yu, H. Yao, C. Jiang, F. Xu, and C. Zhao, “Blockchain-based software-defined industrial Internet of Things: A dueling deep Q-learning approach,” IEEE Internet Things J., vol. 6, no. 3, pp. 4627–4639, Jun. 2019.
[21] H. Liu et al., “Blockchain-enabled security in electric vehicles cloud and edge computing,” IEEE Netw., vol. 32, no. 3, pp. 78–83, May 2018.
[22] G. Wood et al., “Ethereum: A secure decentralised generalised transaction ledger (eip-150 revision),” Ethereum Project Yellow Paper, vol. 151, no. 2017, pp. 1–32, 2017.
[23] M. Liu, F. R. Yu, Y. Teng, V. C. M. Leung, and M. Song, “Performance optimization for blockchain-enabled industrial Internet of Things (IIoT) systems: A deep reinforcement learning approach,” IEEE Trans. Ind. Informat., vol. 15, no. 6, pp. 3559–3570, Jun. 2019.
[24] V. D. Papoutsis and S. A. Kotsopoulos, “Chunk-based resource allocation in multicast OFDMA systems with average BER constraint,” IEEE Commun. Lett., vol. 15, no. 5, pp. 551–553, May 2011.
[25] A. Clement, E. Wong, L. Alvisi, M. Dahlin, and M. Marchetti, “Making byzantine fault tolerant systems tolerate byzantine faults,” in Proc. 6th NSDI, 2009, pp. 153–168.
[26] F. Cowell, Measuring Inequality. Oxford, U.K.: Oxford Univ. Press, 2011.
[27] F. Wenli, H. Ping, and L. Zhigang, “Multi-attribute node importance evaluation method based on Gini-coefficient in complex power grids,” IET Gener., Transmiss. Distrib., vol. 10, no. 9, pp. 2027–2034, Jun. 2016.
[28] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artif. Intell. Res., vol. 4, no. 1, pp. 237–285, Jan. 1996.
[29] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
[30] H. V. Hasselt, “Double Q-learning,” in Proc. Adv. Neural Inf. Process. Syst. 23, 2010, pp. 2613–2621. [Online]. Available: http://papers.nips.cc/paper/3964-double-q-learning.pdf
[31] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. de Freitas, “Dueling network architectures for deep reinforcement learning,” 2015, arXiv:1511.06581. [Online]. Available: https://arxiv.org/abs/1511.06581
[32] U. Challita, W. Saad, and C. Bettstetter, “Interference management for cellular-connected UAVs: A deep reinforcement learning approach,” IEEE Trans. Wireless Commun., vol. 18, no. 4, pp. 2125–2140, Apr. 2019.
[33] A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman, “PAC model-free reinforcement learning,” in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 881–888.
[34] C. Jin, Z. A. Zhu, S. Bubeck, and M. I. Jordan, “Is Q-learning provably efficient?” in Proc. NIPS, 2018, pp. 4863–4873.
[35] Y. He et al., “Deep-reinforcement-learning-based optimization for cache-enabled opportunistic interference alignment wireless networks,” IEEE Trans. Veh. Technol., vol. 66, no. 11, pp. 10433–10445, Nov. 2017.
[36] U. Challita, L. Dong, and W. Saad, “Proactive resource management for LTE in unlicensed spectrum: A deep learning perspective,” IEEE Trans. Wireless Commun., vol. 17, no. 7, pp. 4674–4689, Jul. 2018.
[37] J. Zhu, Y. Song, D. Jiang, and H. Song, “A new deep-Q-learning-based transmission scheduling mechanism for the cognitive Internet of Things,” IEEE Internet Things J., vol. 5, no. 4, pp. 2375–2385, Aug. 2018.

Fengxian Guo received the B.E. degree in communications from Zhengzhou University (ZZU), China, in 2015. She is currently pursuing the Ph.D. degree with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT), Beijing, China. She was with The University of British Columbia, Vancouver, Canada, and Carleton University, Ottawa, Canada, as a Visiting Ph.D. Student from September 2018 to September 2019. Her current research interests include future wireless networks, mobile edge computing, blockchain, and machine learning.

F. Richard Yu (S’00–M’04–SM’08–F’18) received the Ph.D. degree in electrical engineering from The University of British Columbia (UBC) in 2003. From 2002 to 2006, he was with Ericsson, Lund, Sweden, and a start-up in California, USA. He joined Carleton University in 2007, where he is currently a Professor. His research interests include connected/autonomous vehicles, security, artificial intelligence, distributed ledger technology, and wireless cyber-physical systems.

Dr. Yu is a registered Professional Engineer in the province of Ontario, Canada, and a fellow of the Institution of Engineering and Technology (IET). He is an elected member of the Board of Governors of the IEEE VTS. He received the IEEE TCGCC Best Journal Paper Award in 2019, the Distinguished Service Awards in 2019 and 2016, the Outstanding Leadership Award in 2013, the Carleton Research Achievement Award in 2012, the Ontario Early Researcher Award (formerly Premiers Research Excellence Award) in 2011, the Excellent Contribution Award at IEEE/IFIP TrustCom 2010, the Leadership Opportunity Fund Award from Canada Foundation of Innovation in 2009, and the Best Paper Awards at IEEE ICNC 2018, VTC 2017 Spring, ICC 2014, Globecom 2012, IEEE/IFIP TrustCom 2009, and International Conference on Networking 2005. He has served as the Technical Program Committee (TPC) Co-Chair of numerous conferences. He serves on the editorial boards of several journals, including as a Co-Editor-in-Chief for Ad Hoc and Sensor Wireless Networks, an Area Editor for the IEEE Communications Surveys and Tutorials, and a Lead Series Editor for the IEEE Transactions on Vehicular Technology and the IEEE Transactions on Green Communications and Networking. He is an IEEE Distinguished Lecturer of the Vehicular Technology Society (VTS) and the Communications Society.

Heli Zhang received the B.S. degree in communication engineering from Central South University in 2009 and the Ph.D. degree in communication and information system from the Beijing University of Posts and Telecommunications (BUPT) in 2014. From 2014 to 2018, she was a Lecturer with the School of Information and Communication Engineering, BUPT, where she has been an Associate Professor since 2018. Her research interests include heterogeneous networks, long-term evolution/fifth generation, and the Internet of Things.

Dr. Zhang participated in many national projects funded by the National Science and Technology Major Project, the National 863 High-tech Program, and the National Natural Science Foundation of China, and has cooperated with many corporations in research. She has been a reviewer for IEEE Wireless Communications, IEEE Communications Magazine, the IEEE Transactions on Vehicular Technology, the IEEE Communications Letters, and the IEEE Transactions on Networking.

Hong Ji (SM’09) received the B.S. degree in communications engineering and the M.S. and Ph.D. degrees in information and communications engineering from the Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 1989, 1992, and 2002, respectively. In 2006, she was a Visiting Scholar with The University of British Columbia, Vancouver, BC, Canada. She is currently a Professor with BUPT. She has authored more than 300 journal/conference papers, several of which have been selected as best papers. Her research interests include wireless networks and mobile systems, including cloud computing, machine learning, intelligent networks, green communications, radio access, ICT applications, system architectures, management algorithms, and performance evaluations.

Dr. Ji is serving on the Editorial Boards of the IEEE Transactions on Green Communications and Networking and the International Journal of Communication Systems (Wiley). She has served as the Co-Chair for Chinacom’11 and a member of the Technical Program Committee of WCNC’19/15/14/12, Globecom’17/16/15/14/13/12/11/10, ISCIT’17, CITS’16/15/12, WCSP’15, ICC’20/13/12/11, ICCC’13/12, PIMRC’12/11, IEEE VTC’12S, and Mobi-World’11. She was a Guest Editor of the International Journal of Communication Systems (Wiley) Special Issue on Mobile Internet: Content, Security and Terminal.
Mengting Liu received the Ph.D. degree from the Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2019. From 2017 to 2018, she was a Visiting Ph.D. Student with The University of British Columbia, Vancouver, BC, Canada. Her current research interests include blockchain technology, deep reinforcement learning, resource allocation, mobile edge computing systems, and stochastic geometry theory.

Victor C. M. Leung (S’75–M’89–SM’97–F’03) is currently a Distinguished Professor of computer science and software engineering with Shenzhen University, Shenzhen, China, and a Professor Emeritus with The University of British Columbia (UBC), Vancouver, BC, Canada. Before he retired from UBC in 2018, he was a Professor of electrical and computer engineering and the holder of the TELUS Mobility Research Chair there. He has coauthored more than 1300 journal/conference papers and book chapters. His research is in the broad areas of wireless networks and mobile systems. He is serving on the Editorial Boards of the IEEE Transactions on Green Communications and Networking, the IEEE Transactions on Cloud Computing, IEEE Access, IEEE Network, and several other journals. He is a fellow of the Royal Society of Canada, the Canadian Academy of Engineering, and the Engineering Institute of Canada. He received the IEEE Vancouver Section Centennial Award, the 2011 UBC Killam Research Prize, the 2017 Canadian Award for Telecommunications Research, and the 2018 IEEE TCGCC Distinguished Technical Achievement Recognition Award. He has coauthored articles that received the 2017 IEEE ComSoc Fred W. Ellersick Prize, the 2017 IEEE Systems Journal Best Paper Award, the 2018 IEEE CSIM Best Journal Paper Award, and the 2019 IEEE TCGCC Best Journal Paper Award. He is named in the current Clarivate Analytics list of Highly Cited Researchers.