This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2020.3004232, IEEE
Transactions on Industrial Informatics
A Reinforcement Learning-based Network Traffic Prediction Mechanism in Intelligent Internet of Things
Laisen Nie, Zhaolong Ning, Mohammad S. Obaidat, Life Fellow, IEEE, Balqies Sadoun, Huizhi Wang,
Shengtao Li, Lei Guo, and Guoyin Wang
Abstract—The Intelligent Internet of Things (IIoT) comprises various wireless and wired networks for industrial applications, which makes it complex and heterogeneous. The openness of the IIoT has led to intractable problems of network security and management. Many network security and management functions, such as anomaly detection and predictive network planning, rely on network traffic prediction techniques. Predicting IIoT network traffic is significantly difficult because its frequently updated topology and diversified services lead to irregular network traffic fluctuations. Motivated by these observations, we proposed a reinforcement learning-based mechanism. We modelled the network traffic prediction problem as a Markov decision process, and then predicted network traffic by Monte-Carlo Q-learning. Furthermore, we addressed the real-time requirement of the proposed mechanism, and we proposed a residual-based dictionary learning algorithm to reduce the computational complexity of Monte-Carlo Q-learning. Finally, the effectiveness of our mechanism was evaluated using real network traffic.

Index Terms—Intelligent Internet of Things, network traffic prediction, reinforcement learning, industrial applications.

L. Nie is with the School of Electronics and Information, Northwestern Polytechnical University, Xi’an, 710072, China, also with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China, and also with the Qingdao Research Institute of Northwestern Polytechnical University, Qingdao, China. Email: nielaisen@nwpu.edu.cn.
Z. Ning (Corresponding author) is with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China, also with the Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China, and also with the School of Software, Dalian University of Technology, Dalian 116024, China. Email: zhaolongning@dlut.edu.cn.
M. Obaidat is with the College of Computing and Informatics, The University of Sharjah, Sharjah 27272, UAE; The King Abdullah II School of Information Technology, The University of Jordan, Amman 11942, Jordan; and University of Science and Technology Beijing, Beijing 100083, China. Email: msobaidat@gmail.com.
B. Sadoun is with the College of Engineering, University of Sharjah, Sharjah 27272, UAE and with the College of Engineering, Al-Balqa Applied University, Al-Salt 19117, Jordan. Email: sadounbalqies@gmail.com.
H. Wang is with the School of Electronics and Information, Northwestern Polytechnical University, Xi’an, 710072, China. Email: wanghuizhi@mail.nwpu.edu.cn.
S. Li is with the School of Information Science and Engineering, Shandong Normal University, Jinan, China. Email: saintaolee@sdnu.edu.cn.
L. Guo is with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400000, China. Email: guolei@cqupt.edu.cn.
G. Wang is with the Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. Email: wanggy@cqupt.edu.cn.
1551-3203 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

To support the Intelligent Internet of Things (IIoT) with adequate throughput for industrial applications, the 5G-enabled communication network involved in IIoTs is expected to be complex and heterogeneous with a dense deployment of infrastructure. Aiming at distinct kinds of industrial applications, various access techniques have been used in 5G-enabled networks to guarantee the quality-of-service and quality-of-experience of IIoTs [1]–[4]. In this case, network security and management are receiving unprecedented attention. Nevertheless, the openness of 5G heterogeneous networks (5G HetNets) raises the main challenges in network security. Meanwhile, the dense deployment of infrastructure using smaller cells requires more efficient management mechanisms [5]–[7].

Network traffic prediction is one of the most important bases for network security and management functions, such as intrusion detection and anomaly detection [8]–[10]. Network traffic information can be used to detect anomalies resulting from network attacks or malfunctions. In a wireless network, network traffic information is adopted for both safety-related and non-safety-related applications, such as predictive network planning, energy-efficient routing, and aggregation traffic detection [11], [12].

Many researchers have explored network traffic prediction mechanisms for traditional networks, such as IP backbone and cellular networks [13], [14]. Statistical model-based mechanisms, where a linear (or non-linear) model is built to extract the features of traffic flows as a prior, have been proposed to predict network traffic in various scenarios. Unfortunately, these prediction mechanisms are not suitable for IIoTs to empower industrial systems. The primary reason is that the statistical characteristics of network traffic in IIoTs are very different from those of other networks. For instance, IP backbone network traffic often obeys a regular and periodic profile. On the contrary, the traffic flows of IIoTs have many irregular fluctuations, increasing the difficulty of prediction.

Machine learning has been acknowledged as a viable solution for the problem of network traffic prediction in traditional networks. Nevertheless, there are several challenges in machine learning-based network traffic prediction when these mechanisms are utilized in the same way for IIoT traffic prediction [15], [16]. The main challenges can be summarized as follows:

• In an IIoT, infrastructure is deployed much more densely than in other networks. The number of end-to-end network traffic flows increases with the number of nodes. Hence, the computational complexity of predicting IIoT traffic by way of machine learning is greater, which also makes real-time prediction more difficult.
• Machine learning-based network traffic prediction consumes many computing and memory resources for network management. This is because these algorithms must sample a prior of network traffic as a training data set.
• Collecting such prior network traffic also consumes the resources of the network edge. The IIoT is built on various access techniques, but not all of these can provide sufficient resources for network traffic sampling. For instance, the energy of nodes in a wireless sensor network cannot provide sufficient power to deploy network traffic sampling.

Thereby, although network traffic prediction is crucial from various perspectives and many prediction mechanisms have been proposed, it is still an intractable challenge to predict the end-to-end network traffic in IIoTs. Existing network traffic prediction mechanisms mainly make a trade-off between the cost of the algorithm and its accuracy.

Motivated by the above challenges, we focus on the problem of end-to-end network traffic prediction in IIoTs, and propose a mechanism based on Reinforcement Learning (RL) for real-time traffic prediction. Addressing the issues of high consumption caused by learning a large-scale training data set and sampling a great deal of network traffic, we propose a Monte-Carlo Q-learning algorithm. In detail, we first model the end-to-end network traffic as a Markov Decision Process (MDP) via the prior of network traffic. Then, we predict the traffic flow using the proposed Monte-Carlo Q-learning algorithm. Nevertheless, the volume of each traffic flow is tremendous, making it impossible to define a series of actions directly. Therefore, to reduce the computational complexity of the proposed algorithm, we propose a residual-based adaptive dictionary learning algorithm to project the network traffic to another space. In this way, network traffic can be denoted by a few coefficients, which decreases the number of actions and, as a result, the computational complexity. The main contributions of this paper can be summarized as follows:

• We consider the irregular fluctuations of network traffic in IIoTs, and propose an efficient algorithm based on RL to extract short-term time-varying features of network traffic. We combine Monte-Carlo learning, Q-learning, and Kullback-Leibler (KL) divergence [17] for prediction.
• With the aim of running the prediction mechanism in real time, a greedy adaptive dictionary learning algorithm is proposed, where a residual term is defined via the $\ell_0$- and $\ell_2$-norms to make the coefficients fewer and smaller so that the number of actions and the computational complexity are reduced.
• The proposed mechanism is evaluated using real network traffic data sets from our testbed and the GÉANT network. Numerical results indicate that the Monte-Carlo Q-learning-based mechanism can better capture the short-term features of traffic flows in IIoTs.

The remainder of this paper is organized as follows. Section II reviews related work. Section III presents the system model of network traffic prediction. In Section IV, we present the RL-based network traffic prediction mechanism for IIoTs. Section V describes our evaluation of the proposed mechanism. Section VI concludes our work and indicates future work.

II. RELATED WORK

A. Statistical Model-based Network Traffic Prediction

Statistical model-based mechanisms have been widely developed to predict network traffic [18]–[21]. These models track the linear or non-linear features of end-to-end network traffic for prediction. Typically, linear time series methods, e.g., the autoregressive model and the autoregressive moving average model, are used to model end-to-end network traffic for prediction. Besides, the GARCH model in [19] was utilized to extract non-linear features of traffic flows. In a wireless network, traffic reveals a large number of irregular fluctuations. Meanwhile, due to random access, researchers usually focus on short-term prediction of traffic flows. Therefore, modeling short-range dependence is crucial for traffic prediction. However, a single statistical model cannot match the complex statistical features of traffic flows. As a result, hybrid model-based mechanisms have appeared in the literature for feature extraction. In [22], the authors took advantage of the stationary wavelet transform, quantum genetic algorithm, and backpropagation neural network to predict traffic flow in a wireless network. The Markov model is a prevalent method to extract the statistical features of end-to-end network traffic. The authors in [23] utilized Markov chains with tensors to deal with the problem of network traffic prediction, and a multivariate multi-order Markov transition was proposed for accurate multi-modal prediction. They first defined the tensor unified product, and then proposed the general multivariate multi-order Markov model. After that, a modified multi-step transition tensor was put forward to predict network traffic.

B. Machine Learning-enabled Network Traffic Prediction

Currently, machine learning techniques have been widely used in network traffic prediction to capture various features of network traffic. To manage various networks jointly, the software defined network has gradually become a common networking technique. The authors in [24] studied network traffic prediction for a mobile metro-core network based on a software defined network. They constructed a feedforward neural network with 3 layers to predict network traffic. The short-range dependence was also considered. The authors in [25] focused on the spatio-temporal features of wireless network traffic, and a recurrent neural network was designed for network traffic prediction. The authors considered not only the short-range dependence but also the long-range dependence of traffic flows. The Long Short-Term Memory (LSTM)-based approach was deployed in optical data center networks to predict the traces of traffic flows [26]. This LSTM-based deep architecture contained 4 LSTM blocks as the hidden layer. Similarly, the authors in [27] designed a deep architecture based on the LSTM recurrent neural network with 2 hidden layers to capture the spatio-temporal features of network
traffic for prediction in large-scale backbone networks. Each hidden layer consisted of two LSTM blocks. Considering the contradiction between real-time online prediction and the mass of training data required to train the deep architecture, the authors in [28] took advantage of the transfer learning technique to reduce the training time for network traffic prediction in cellular networks. LSTM was also adopted to construct the deep architecture. Consequently, an effective inter-cluster transfer learning approach was proposed, which can reuse extracted features efficiently. The authors in [29] studied the problem of network traffic prediction in software defined network-enabled optical networks. They modeled the problem of network traffic prediction using a Gaussian process regression model; the proposed method took into account the short-range and long-range dependencies of traffic flows at the same time. The authors in [30] dealt with network traffic prediction using an integer linear programming model. Afterwards, a heuristic algorithm was designed to solve the NP-complete problem. The network traffic can be denoted by a matrix, and it can be viewed as an image. Thereby, many methods based on the Convolutional Neural Network (CNN) have emerged to deal with the problem of network traffic prediction. The authors in [31] built a CNN to predict end-to-end network traffic in a citywide cellular network. Different from previous methods based on CNN, a densely connected CNN was proposed, where more matrices were concurrently added as input data to extract the spatial and temporal features of the traffic matrix. However, due to the dense deployment of infrastructure and the larger energy consumption of sampling prior information, these methods are not suitable for IIoTs.

III. SYSTEM MODEL

End-to-end network traffic can be denoted by a matrix [32]. This is a crucial input parameter for several kinds of network management and security functions. In a traffic matrix, each row vector describes the volume of traffic flow between possible Origin-Destination (OD) node pairs, where the nodes can be routers, switches, or points of presence. We denote this matrix by $X$ in this paper, and we denote each element of the traffic matrix by $X(n, t)$, where $n$ is the index of an OD flow and $t$ denotes the time slot. If the IIoT contains $N$ nodes, then $n = 1, 2, 3, ..., N^2$. This holds when we allow an OD flow to enter and exit our network at the same node; otherwise, $n = 1, 2, 3, ..., N(N-1)$. Without loss of generality, in this paper, we set $n = 1, 2, 3, ..., N^2$. Meanwhile, we assume that the traffic matrix $X$ records the volume of network traffic over $T$ time slots, namely $t = 1, 2, 3, ..., T$.

The network traffic yields various statistical features. These features can be grouped into three dimensions: temporal, spatial, and spatio-temporal. The temporal feature represents the time variation of an OD flow, and the spatial feature defines the relationship between two OD flows influenced by the locations of users and servers in the logical topology. The spatio-temporal feature indicates how close adjacent elements in the traffic matrix are to each other in size. In our network traffic prediction mechanism, we mainly take into account the time-varying feature of the traffic matrix. Hence, to present our network traffic prediction mechanism legibly and simply, we define an OD flow as a row vector $\mathbf{X}_t$ and its elements as $X_t$. Here, we use the time slot as a unit, which means that $X(n, t)$ is not the volume of network traffic at a point in time; rather, it is the average value of network traffic during a time interval, typically 10 or 15 min [33].

Network traffic prediction can be formulated as the problem of calculating the traffic matrix $X$ according to a prior of the traffic matrix. If we denote a traffic element of any OD flow by $X_t$, then the problem of network traffic prediction can be regarded as calculating $X_t$ from the previous network traffic data $(X_{t-1}, X_{t-2}, ..., X_{t-W})$. Generally, the statistical model-based network traffic prediction mechanism deals with the following conditional probability:

$$p(X_t \mid X_{t-1}, X_{t-2}, ..., X_{t-W}), \tag{1}$$

where $W$ is the length of the prior network traffic. The statistical model-based network traffic prediction mechanism often assumes that Eq. (1) obeys certain distributions, such as the multifractal wavelet model and the generalized autoregressive conditional heteroskedasticity model. Indeed, these mechanisms pursue a traffic flow via the prior of network traffic in the sense that a function is fitted, which can be denoted by:

$$X_t = f(X_{t-1}, X_{t-2}, ..., X_{t-W}). \tag{2}$$

Unfortunately, this function represents the time-varying feature of traffic flows, and it changes depending on the services supported by our network. Therefore, we explore RL to deal with the problem of network traffic prediction defined by Eq. (2).

IV. REINFORCEMENT LEARNING-BASED NETWORK TRAFFIC PREDICTION

In this section, we introduce the designed network traffic prediction mechanism based on RL and dictionary learning. This mechanism contains two parts. First, we propose a network traffic prediction approach that combines Monte-Carlo learning and Q-learning. In our mechanism, we predict each OD flow independently. Thereby, we mainly present the prediction process of one OD flow as an example for convenience. In view of the computational complexity of the proposed mechanism, we further propose a modified approach based on dictionary learning to improve its real-time performance.

A. Monte-Carlo Q-learning-based Network Traffic Prediction

RL [34] is one of the most prevalent machine learning methods. It calculates a policy by having an agent implement iterative actions to search for the best policy with the maximum reward. In detail, RL takes advantage of a sample under the environment to guide the next action. Meanwhile, the relative reward can be achieved during this process, and then the environment can be updated according to the reward. RL can be defined as an MDP denoted by $\langle S, A, P, R, \gamma \rangle$, where $S$ is the set of states and $A$ is the set of actions. $R$ is the immediate reward from the current state to the next according
to the given policy with environment. $P$ is the transition probability matrix, and $\gamma$ is the discount factor [35].

Q-learning has been employed and widely developed in RL. For an agent with state $s$ ($s \in S$), when it moves to the next state $\tilde{s}$ ($\tilde{s} \in S$) after implementing an action $a$ ($a \in A$) according to policy $\pi(s, a)$, an immediate reward $R(s, a)$ can be achieved. The reward with respect to the current state $s$ by taking action $a$, denoted by $Q(s, a)$, is a weighted sum involving the immediate reward $R(s, a)$ of moving to the next state $\tilde{s}$, which can be calculated by:

$$Q(s, a) = R(s, a) + \gamma \max_{\tilde{a} \in \tilde{A}} Q(\tilde{s}, \tilde{a}), \tag{3}$$

where $\tilde{A}$ is the action set under the next state $\tilde{s}$. In fact, Q-learning can be regarded as an optimization problem that finds an optimal policy to maximize the reward $Q(s, a)$. During the iterative process of finding the maximum reward, the reward is updated by a weighted sum:

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( R(s, a) + \gamma \max_{\tilde{a} \in \tilde{A}} Q(\tilde{s}, \tilde{a}) \right), \tag{4}$$

where $\alpha \in [0, 1]$. During each iteration, the sampling of state can be chosen via exploration-only and exploitation-only approaches. The former is suitable for sampling over a large range and selects a sampling of state according to the uniform distribution. By contrast, the latter chooses the sampling with the current maximum cumulative reward.

We first assume that $X_t \in [X_{\min}, X_{\max}]$, where $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of traffic flow, respectively. $X_{\min}$ and $X_{\max}$ are known to us because they can be determined from the prior of network traffic. This assumption holds because we predict the trends of traffic flows. Namely, predicting an OD flow can be transformed into predicting its variation profile instead of predicting each numerical value exactly. This is because the network security and management functions using predicted network traffic always tolerate some prediction error. In practice, the variation trends of OD flows are sufficient, and do not have a negative impact on deploying any network security and management functions.

For network traffic prediction, we define the sampling of OD flows with two elements as $(X_{t-1}, X_t)$, where $X_{t-1}$ and $X_t$ are the prior of network traffic and the relative predictor, respectively. The problem of network traffic prediction is computing $X_t$ when $X_{t-1}$ is known. In this case, we define the problem of network traffic prediction as the following optimization model:

$$\begin{aligned} \min\ & KL\big(p(X_t \mid X_{t-1}) \,\|\, p'(X'_t \mid X'_{t-1})\big) \\ \text{s.t.}\ & X_t \in [X_{\min}, X_{\max}], \\ & X'_t \in [X'_{\min}, X'_{\max}], \\ & p, p' \overset{\text{i.i.d.}}{\sim} P_X. \end{aligned} \tag{5}$$

In Eq. (5), $p$ and $p'$, which are independent and identically distributed, denote the distributions of the predicted and prior network traffic data sets, respectively. The distribution $P_X$ is unknown to us. Meanwhile, because the problem of network traffic prediction can be viewed as calculating a conditional probability, we use the KL divergence, denoted by $KL(\cdot)$, as the metric between two distributions [17]. Then, the problem of network traffic prediction is to search for the optimal probability distribution $p^*(X_t \mid X_{t-1})$ with the minimized KL divergence. Furthermore, this optimization model is convex, and the trace of the objective function is a parabola with respect to $p$. To solve this convex optimization model, we propose our RL-based mechanism. The framework of the proposed mechanism is illustrated in Fig. 1. Calculating the predictor $X_t$ from the known network traffic $X_{t-1}$ is regarded as an action carried out according to a specified policy $\pi(X_{t-1}, a)$. Consequently, the next issue is how to determine an optimal policy $\pi(X_{t-1}, a^*)$ with the maximum reward and implement the action $a^*$ to achieve the predictor $X_t$. To deal with this problem, the Monte-Carlo Q-learning algorithm based on KL divergence is proposed to calculate the optimal policy $\pi(X_{t-1}, a^*)$.

Fig. 1. Network traffic prediction based on RL.

The KL divergence-based Monte-Carlo Q-learning algorithm is employed to determine the optimal policy through a training data set denoted by $X'$. Each element of $X'$ is the network traffic with $W + 1$ time slots denoted by $(X'_{T-W}, X'_{T-W+1}, ..., X'_{T-1}, X'_T)$. We consider that the network traffic moves from State $X'_{j-1}$ to State $X'_j$ by implementing Action $a'_{j-1}$ according to a policy and receives an immediate reward $r'_j$. Then, we are able to generate a sequence:

$$(X'_{T-W}, a'_{T-W}, r'_{T-W+1}, X'_{T-W+1}, ..., X'_{T-1}, a'_{T-1}, r'_T, X'_T). \tag{6}$$

The generated sequence depends on an initialized reward $R$. In our mechanism, the reward $R$ is a $(X'_{\max} - X'_{\min} + 1)$-order square matrix whose rows and columns are indexed by the values of traffic flows, i.e.,

$$\begin{bmatrix} r'_{X'_{\min},X'_{\min}} & r'_{X'_{\min},X'_{\min}+1} & \cdots & r'_{X'_{\min},X'_{\max}} \\ r'_{X'_{\min}+1,X'_{\min}} & r'_{X'_{\min}+1,X'_{\min}+1} & \cdots & r'_{X'_{\min}+1,X'_{\max}} \\ \vdots & \vdots & \ddots & \vdots \\ r'_{X'_{\max},X'_{\min}} & r'_{X'_{\max},X'_{\min}+1} & \cdots & r'_{X'_{\max},X'_{\max}} \end{bmatrix}. \tag{7}$$

The element $r'_{s_1,s_2}$ ($s_1, s_2 \in [X'_{\min}, X'_{\max}]$) represents the immediate reward when the agent moves from State $s_1$ to State $s_2$ by implementing action $a'_{s_1,s_2}$. Eq. (6) declares the
state and reward sequence of MDP with the state space $X'_s = [X'_{\min}, X'_{\max}]$. The reward of this sequence is defined as the mean of all immediate rewards:

$$R_c = \frac{1}{W} \sum_{j=T-W+1}^{T} r'_j, \tag{8}$$

where $r'_j$ denotes the reward when the agent moves from State $X'_{j-1}$ to State $X'_j$. After that, we can update the policy $\pi(X', a')$ using Algorithm 1. At this time, we can predict $X_t$ according to the optimal policy $\pi(X_{t-1}, a^*)$. In this algorithm, the reward $R$ is initialized according to the training data set, and each element $r'_{s_1,s_2}$ records the ratio of actions from State $s_1$ to State $s_2$ in the training data set. To ensure that all the elements in the initialized reward $R$ are non-zero, we first set $r'_{s_1,s_2} = 1$. After that, we set $r'_{s_1,s_2} = r'_{s_1,s_2} / (|X'| - 1)$, where $|X'|$ is the number of elements in the training data set. To obtain the optimal policy, $Q(X'_i, a'_i)$ is updated by:

$$\begin{aligned} Q(X'_i, a'_i) \leftarrow{} & (1 - \alpha)\, Q(X'_i, a'_i) \\ & + \alpha \big( R(X'_i, a'_i) + \gamma \max Q(X'_{i+1}, a'_{i+1}) \big) \\ & + KL\big(p(X_{i+1} \mid X_i) \,\|\, p'(X'_{i+1} \mid X'_i)\big). \end{aligned} \tag{9}$$

Under this update rule, convergence to the optimal solution is guaranteed with probability 1. We denote the optimal policy by $Q^*(X_{t-1}, a_{t-1})$, and then we define $\Delta Q = Q^*(X_{t-1}, a_{t-1}) - Q(X_{t-1}, a_{t-1})$. According to Eq. (9), we have the following:

$$\Delta Q = (1 - \alpha)\, \Delta Q + \alpha \big( R + \gamma \max Q(X_t, a_t) - Q^*(X_{t-1}, a_{t-1}) + \alpha^{-1} KL(p \,\|\, p') \big), \tag{10}$$

where $R = R(X_{t-1}, a_{t-1})$ and $KL(p^* \,\|\, p') = 0$. We define $F(X_{t-1}, a_{t-1})$ as:

$$F(X_{t-1}, a_{t-1}) = R + \gamma \max Q(X_t, a_t) - Q^*(X_{t-1}, a_{t-1}) + \alpha^{-1} KL(p \,\|\, p'). \tag{11}$$

Algorithm 1 Network Traffic Prediction Based on Monte-Carlo Q-learning
Require: training data set $X'$ with $P$ time slots; state space $X'_s = [X'_{\min}, X'_{\max}]$; number of iterations $Z$.
Ensure: $\pi(X', a')$
 1: $Q \leftarrow [\,]_{(X'_{\max} - X'_{\min} + 1) \times (X'_{\max} - X'_{\min} + 1)}$
 2: $r'_{s_1,s_2} \leftarrow 1$, $s_1, s_2 \in X'_s$
 3: for $s_1 \in X'$ do
 4:   $r'_{s_1,s_2} \leftarrow r'_{s_1,s_2} + 1$
 5: end for
 6: $r'_{s_1,s_2} \leftarrow r'_{s_1,s_2} / (|X'| - 1)$
 7: for $z = 1, 2, ..., Z$ do
 8:   Generate a series shown by Eq. (6) from training data set $X'$
 9:   for $i = T - W, T - W + 1, ..., T - 1$ do
10:     $R_c = \frac{1}{T - i} \sum_{j=i}^{T} r'_j$
11:     Update $Q(X'_i, a'_i)$ according to Eq. (9)
12:   end for
13:   if $\varepsilon < E$ then
14:     $\pi(X', a') \leftarrow \arg\max_{\tilde{a}'} Q(X', \tilde{a}')$
15:   else
16:     $\pi(X', a') \leftarrow \pi(X', \tilde{a}')$, $\tilde{a}' \sim U(X'_{\max}, X'_{\min})$
17:   end if
18: end for
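As a rough illustration of the reward initialization (lines 1 to 6 of Algorithm 1) and the update of Eq. (9), the following Python sketch may help. It is not the authors' implementation. It assumes that traffic values are already discretized to integers in $[X'_{\min}, X'_{\max}]$, that an action is identified with the index of the next state, and that the KL term is supplied for two discrete distributions. The function names `init_reward`, `kl_divergence`, and `q_update` and the default values of `alpha` and `gamma` are hypothetical.

```python
import math


def init_reward(train, x_min, x_max):
    """Reward matrix as we read lines 1-6 of Algorithm 1 (an interpretation):
    start every entry at 1, add 1 per observed transition s1 -> s2,
    then divide by (len(train) - 1)."""
    n = x_max - x_min + 1
    r = [[1.0] * n for _ in range(n)]
    for prev, nxt in zip(train, train[1:]):
        r[prev - x_min][nxt - x_min] += 1.0
    scale = len(train) - 1
    return [[v / scale for v in row] for row in r]


def kl_divergence(p, q):
    """Discrete KL(p || q); entries with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def q_update(q, s, a, reward, s_next, kl, alpha=0.1, gamma=0.9):
    """One step of Eq. (9): weighted Q-learning update plus the KL term."""
    target = reward + gamma * max(q[s_next])
    q[s][a] = (1 - alpha) * q[s][a] + alpha * target + kl
    return q[s][a]
```

With `alpha = 0.5`, a zero-initialized table, an immediate reward of 1, and a zero KL term, a single call moves the entry halfway toward the target, which matches the weighted-sum form of Eq. (4).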
From Theorem 1 in [36], the random iterative process $\Delta Q$ converges to 0 with probability 1 under the following two assumptions:

$$\begin{cases} \left\| E\{F(X_{t-1}, a_{t-1})\} \right\|_W \le \gamma \left\| \Delta Q \right\|_W, \\ \operatorname{var}\{F(X_{t-1}, a_{t-1})\} \le C \left( 1 + \left\| \Delta Q \right\|_W \right)^2, \end{cases} \tag{12}$$

where $\gamma < 1$ and the constant $C > 0$. $\|\cdot\|_W$ refers to some weighted maximum norm [36]. After adding the KL divergence shown in Eq. (9), these two assumptions still hold for the infinity norm because the KL divergence is nonnegative.

Fig. 2. Network traffic prediction based on RL and adaptive dictionary learning.

Meanwhile, both the exploration-only and exploitation-only approaches are used to update the policy. The variable $\varepsilon$, which obeys the uniform distribution $\varepsilon \sim U(0, 1)$, is employed to determine the optimal tradeoff between these two approaches.
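The switch between exploitation and exploration (lines 13 to 17 of Algorithm 1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the threshold $E$ appears in Algorithm 1 but its value is not given in this section, so it is passed in as the hypothetical parameter `threshold`; actions are represented by column indices of one row of the $Q$ table; and the function name `select_action` is illustrative only.

```python
import random


def select_action(q_row, threshold, rng=random):
    """Draw eps ~ U(0, 1). If eps < threshold, exploit by taking the action
    with the largest Q value; otherwise explore with a uniformly random action."""
    eps = rng.random()
    if eps < threshold:
        return max(range(len(q_row)), key=q_row.__getitem__)
    return rng.randrange(len(q_row))
```

Note that, as written in Algorithm 1, a larger threshold means more exploitation, which is the reverse of the common convention where a small $\varepsilon$ is the exploration probability. For example, `select_action([0.1, 0.9, 0.3], 1.0)` always exploits and returns index 1.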
B. Dictionary Learning for Traffic Matrix

Nowadays, an IIoT supports many network applications and services for industrial applications. Consequently, the volume of network traffic has increased exponentially. Furthermore, the state space of the KL divergence-based Monte-Carlo Q-learning algorithm is massive, resulting in poor real-time performance. In other words, if the volume of an OD flow is $[X'_{\min}, X'_{\max}]$, then the cumulative reward $Q(X', a')$ is a $(X'_{\max} - X'_{\min} + 1)$-order square matrix, which will occupy a significant amount of memory for deploying the proposed network traffic prediction mechanism.

To address this issue, we propose a dictionary learning
algorithm to project the network traffic to a transform domain, where the coefficients are fewer and smaller than those of the time domain. This mapping process can decrease the implementation time of the proposed network traffic prediction mechanism. The framework of the modified RL-based network traffic prediction mechanism is illustrated in Fig. 2. In detail, the modified mechanism first conducts a dictionary learning algorithm over the traffic matrix to acquire the coefficient matrix. Each row of the coefficient matrix represents the trace of an OD flow in the transform domain. After that, we use the KL divergence-based Monte-Carlo Q-learning algorithm to predict these coefficients instead of the network traffic in the time domain.

At this moment, the problem of dictionary learning is to find an orthogonal matrix $D$ such that the following equation holds:

$$X' = D\alpha, \tag{13}$$

where $\alpha$ is the coefficient with respect to dictionary $D$. Researchers have proposed many methods to create a dictionary based on given data. These methods consist of dictionary selection and dictionary learning. Typical dictionary selection uses a wavelet basis or a Fourier basis. Dictionary learning constructs a dictionary by an alternating optimization strategy. Existing dictionary learning strategies always look for a dictionary that makes the data sparse, but the size of the coefficients is usually neglected. Both signal representation and image processing generally focus on the sparsity of the coefficients after decomposition. Different from the computationally expensive algorithms for sparse representation, we investigate a sparse decomposition strategy that takes into account the sparse representation and the size of the coefficients jointly.

In our dictionary learning algorithm, we define a minimum index to match the sparse representation and minimize coefficients.

Definition 1: The minimum index is defined as:

$$\eta = \|c_n\|_0 + \|c_n\|_2, \tag{14}$$

where $\|\cdot\|_0$ and $\|\cdot\|_2$ are the $\ell_0$- and $\ell_2$-norms, respectively. The vector $c_n$ is the $n$th column of the training data set $X'$, and $n = 1, 2, ..., N^2$.

The minimum index measures two features of coefficients. The $\ell_0$-norm in Eq. (14) captures the sparsity of the coefficients, and the $\ell_2$-norm is used to keep the coefficients as small as possible. Note that we build a novel prior of the traffic matrix, denoted by $X''$, collected from $X'$ to construct the orthogonal matrix $D$. This novel $X''$ consists of $N^2$ time slots, that is, the matrix $X''$ is $N^2 \times N^2$.

Our adaptive dictionary learning algorithm gains an orthogonal matrix by multiple iterations. During each iteration $l$, a residual term $\Phi^l = [\phi_1^l, \phi_2^l, ..., \phi_{N^2}^l]$ is defined in our algorithm, where $\phi_n^l \in \mathbb{R}^{N^2}$ is the $n$th column of the residual, and $n_l \leftarrow \arg\min_{n \notin \mathcal{N}^l} \eta = \|\phi_n^l\|_0 + \|\phi_n^l\|_2$. During each iteration, the column of the residual term with the minimum index is determined and normalized to gain an orthogonal basis. From Eq. (15), we observe that the proposed dictionary learning-based solution in fact calculates an orthogonal basis by way of the Schmidt orthogonalization.

$$\phi_n^{l+1} \leftarrow \phi_n^l - d^l \langle d^l, \phi_n^l \rangle. \tag{15}$$

Algorithm 2 Residual-based Dictionary Learning
Require: $\Phi^0 = X''$; $D^0 = [\,]$; $l = 0$; $\mathcal{N}^0 = \emptyset$.
Ensure: $D^{l+1}$
1: for $l = 0, 1, 2, ..., N^2 - 1$ do
2:   $n_l \leftarrow \arg\min_{n \notin \mathcal{N}^l} \eta = \|\phi_n^l\|_0 + \|\phi_n^l\|_2$
3:   $d^l \leftarrow \phi_{n_l}^l / \|\phi_{n_l}^l\|_2$
4:   $D^{l+1} \leftarrow [D^l, d^l]$
5:   $\mathcal{N}^{l+1} \leftarrow \mathcal{N}^l \cup \{n_l\}$
6:   for $n = 1, 2, ..., N^2$ do
7:     $\phi_n^{l+1} \leftarrow \phi_n^l - d^l \langle d^l, \phi_n^l \rangle$
8:   end for
9: end for

Hence, the computational complexity is much lower than that of other state-of-the-art methods, such as Principal Component Analysis (PCA) [32] and K-SVD. Besides, during each iteration, one column of the training data is selected with the minimum index independently. Thereby, we can obtain the optimal convergence by way of this greedy algorithm. Meanwhile, according to the definition of the minimum index and the greedy adaptive dictionary learning algorithm, we have the following proposition.

Proposition 1: The minimum index can make the coefficients small and sparse, which will decrease the state space in the Monte-Carlo Q-learning algorithm.

Proof: The function of the minimum index can be achieved by means of the definitions of the $\ell_0$-norm, the $\ell_2$-norm, and the greedy dictionary learning algorithm based on the Schmidt orthogonalization. The $\ell_0$-norm is defined as the number of non-zero elements in a vector, and the $\ell_2$-norm reflects its energy. In this case, when we define the residual term as the prior of the traffic matrix, the equation $\phi_n^l - d^l \langle d^l, \phi_n^l \rangle$ will implement the Schmidt orthogonalization in the sense that all remaining residual terms are orthogonal to $d^l$. At the same time, during the next iteration, the optimal orthogonal vector with the minimum $\ell_0$-norm and $\ell_2$-norm can be obtained by:

$$n_l \leftarrow \arg\min_{n \notin \mathcal{N}^l} \eta = \|\phi_n^l\|_0 + \|\phi_n^l\|_2. \tag{16}$$

Hence, by several greedy iterations, the proposed adaptive dictionary learning algorithm is able to find the optimal dictionary in the sense that the coefficients obey the optimal
term. The dictionary learning algorithm is shown in Algorithm sum of the `0 -norm and `2 -norm. That is, the coefficients
2. The notation nl denotes the nth column of the residual term are sufficiently sparse and small. From Algorithm 1, we see
at the lth iteration. In the dictionary learning algorithm, the that the computation complexity is O (Z). The size of Z
residual term is initialized as the prior of the traffic matrix belongs to the state space [X 0 min, X 0 max ]. Namely, because the
X 00. The notation N l , which is initialized as a null set, records network traffic is nonnegative integers, the number of states is
the indexes of the columns of the residual term selected by (X 0 max − X 0 min + 1). In fact, the network traffic can be equal
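To make the greedy selection of Algorithm 2 concrete, the following is a minimal Python sketch, not the authors' MATLAB implementation. It reads the constraint under the arg min as "columns not yet selected", treats entries below a small tolerance `tol` as zero when counting the $\ell_0$-norm, and stops early if every remaining residual column is numerically zero. These three choices, along with all function and variable names, are assumptions made for illustration.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def minimum_index(vec, tol=1e-12):
    """Eq. (14): number of non-zero entries plus the l2-norm.

    `tol` is an assumption used to treat round-off as zero.
    """
    l0 = sum(1 for v in vec if abs(v) > tol)
    return l0 + l2_norm(vec)


def residual_dictionary(prior, tol=1e-12):
    """Greedy residual-based dictionary learning on a list of columns.

    `prior` plays the role of X'' (one inner list per column).
    Returns the orthonormal atoms d^l and the selected column indexes.
    """
    residual = [list(col) for col in prior]   # Phi^0 = X''
    atoms, selected = [], []                  # D^0 = [], N^0 = empty set
    for _ in range(len(residual)):
        # Candidates: not yet selected and not numerically zero.
        candidates = [n for n in range(len(residual))
                      if n not in selected and l2_norm(residual[n]) > tol]
        if not candidates:                    # assumed early stop
            break
        # Line 2: pick the column with the smallest minimum index.
        n_l = min(candidates, key=lambda n: minimum_index(residual[n], tol))
        # Line 3: normalise it to obtain the new atom.
        norm = l2_norm(residual[n_l])
        d = [v / norm for v in residual[n_l]]
        atoms.append(d)                       # line 4
        selected.append(n_l)                  # line 5
        # Lines 6-8: remove the component along d from every column.
        updated = []
        for col in residual:
            inner = sum(a * c for a, c in zip(d, col))
            updated.append([c - inner * a for a, c in zip(d, col)])
        residual = updated
    return atoms, selected


# Toy prior with three columns of length three (hypothetical data).
prior = [[3.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
atoms, order = residual_dictionary(prior)
```

On this toy prior the sparsest column is selected first, and each later atom is orthogonal to the earlier ones because every residual column has already had those directions removed.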
In fact, the network traffic can be equal to 0 occasionally. Then, there are $X'_{\max}$ states in the Monte-Carlo Q-learning. We must set $Z$ with respect to $X'_{\max}$ for the algorithm to always be valid. In addition, the cumulative reward $Q(X', a')$ is a $(X'_{\max} - X'_{\min} + 1) \times (X'_{\max} - X'_{\min} + 1)$ matrix. Above all, it is significantly difficult to use the Monte-Carlo Q-learning algorithm without making $X'_{\max}$ sufficiently small. By contrast, after projecting the original network traffic into a transform domain, the real-time performance can be improved markedly.

V. NUMERICAL RESULTS

A. Network Traffic Data Set

To evaluate the proposed RL-based network traffic prediction mechanism, we built a testbed including 12 nodes. The constructed wireless network supports various services consisting of video, voice, etc. The topology of the testbed is generated by the Open Shortest Path First (OSPF) algorithm, and the weights of the OSPF are defined by the general urban path loss model, as shown in [37]. We sample the end-to-end network traffic data over 10100 time slots. Namely, the traffic matrix is 144 × 10100. We employ the first 10000 time slots as the training data set for the KL divergence-based Monte-Carlo Q-learning algorithm. In the training data set with 10000 time slots, the first 144 time slots are used to implement the proposed dictionary learning algorithm. Besides, the GÉANT backbone network traffic was employed in our simulations. It is made up of 23 nodes and 120 links, and records 3360 samples of network traffic. The first 3000 time slots are used as the training data set, and the first 800 samples of the training data set are for the proposed residual-based adaptive dictionary learning algorithm. We train the proposed framework shown in Algorithm 1 in batches with a batch size of 10. Moreover, we set $Z = 10$, $E = 0.4$, $\alpha = 0.5$, and $\gamma = 0.01$ in Algorithm 1. These parameters are set empirically to obtain the lowest prediction error. We used MATLAB R2018a to conduct all evaluations. The evaluations were conducted on a 64-bit Windows 7 machine with an Intel Xeon W-2102 (2.9 GHz) and 32 GB RAM.

In traditional large-scale backbone networks, the PCA method and the Sparsity Regularized Matrix Factorization (SRMF) [38] method are state-of-the-art techniques for end-to-end network traffic prediction. The former takes into account the power laws of a traffic matrix, and predicts the network traffic in the principal component domain. The latter utilizes the spatio-temporal features of a traffic matrix, that is, the neighboring elements are close to each other in value. It can implement both network traffic estimation and prediction by specifying a certain configuration. Hence, in this section, we compare our network traffic prediction mechanism with PCA and SRMF. Moreover, we also take into account the LSTM method mentioned in [26] to evaluate the proposed mechanism.

B. Prediction Error Analysis

To subsequently assess the overall performance of the proposed prediction mechanism, we define the Spatial Relative Error (SRE) and Temporal Relative Error (TRE) as:

$$\begin{cases} \mathrm{SRE}(n) = \dfrac{\|\hat{X}(n,t) - X(n,t)\|_2}{\|X(n,t)\|_2}, \\[2ex] \mathrm{TRE}(t) = \dfrac{\|\hat{X}(n,t) - X(n,t)\|_2}{\|X(n,t)\|_2}, \end{cases} \tag{17}$$

where $\hat{X}(n,t)$ is the predicted network traffic with respect to $X(n,t)$; the norm is taken over the time slots $t$ for $\mathrm{SRE}(n)$ and over the OD flows $n$ for $\mathrm{TRE}(t)$. Figs. 3(a) and 3(b) show the SREs and TREs, respectively, of the four network traffic prediction mechanisms for the data set from our testbed. In Fig. 3(a), the x- and y-axes represent the flows ordered from smallest to largest according to their mean and the SREs, respectively. In our testbed, the volumes of several OD flows are zero, and we delete these OD flows so that $\|X(n,t)\|_2 \neq 0$. The SREs of the four mechanisms decrease as the size of the flow increases. The RL mechanism has high SREs for OD 5, 11, and 15. By contrast, SRMF and LSTM have larger SREs for partial OD flows compared to the other two approaches. The means of SREs for RL, PCA, SRMF, and LSTM are 3.20, 8.74, 8.16, and 9.31, respectively. Fig. 3(b) shows the TREs of RL, PCA, SRMF, and LSTM. RL has the lowest TRE among the four approaches, and PCA has the highest TRE. The means of TREs for RL, PCA, SRMF, and LSTM are 0.55, 1.28, 1.09, and 1.20, respectively. We observe that the TREs of RL show some fluctuations in size. Namely, some TREs are close to zero, while others are much higher. The profile of the flow in our testbed yields much more fluctuations, meaning it requires many more iterations to update the optimal policy. In other words, if we make $Z$ larger, the TREs will be smoother. However, the means of the TREs do not decrease as $Z$ increases. As a result, we mainly consider the computational complexity rather than the fluctuations of TREs. Figs. 3(c) and 3(d) represent the evaluations in the GÉANT data set. We observe that the SRE of SRMF is enormous for several OD flows, and the SREs of RL, PCA, and LSTM are relatively low in comparison. Some traffic flows in the GÉANT backbone network have sharp fluctuations. The SRMF method assumes that the values of two adjacent elements are similar, which can lead to large SREs for predicting sharp fluctuations. Thereby, from Fig. 3(c), we observe SREs of SRMF close to order $10^5$. The LSTM method achieves the best TRE for the GÉANT data set, as shown by Fig. 3(d). LSTM has an outstanding capability of learning long-range dependence, which is suitable for predicting long-term traffic, such as in the GÉANT backbone network. The means of TREs for RL, PCA, SRMF, and LSTM are 0.58, 0.86, 0.78, and 0.34, respectively.

To observe the SRE and TRE intuitively, we present the Cumulative Distribution Functions (CDF) of SREs and TREs in Fig. 4. In Fig. 4(a), for RL, PCA, SRMF and LSTM in our testbed data set, the SREs are less than 2.3, 19.0, 16.0, and 1.28, respectively, with respect to 80% of all the OD flows. In Fig. 4(b), the TREs are less than 0.7, 1.4, 1.3, and 1.4, respectively, for 80% of the time slots. From Figs. 4(c) and 4(d), it can be seen that SRMF is better in terms of TRE compared to PCA, but worse in terms of SRE. LSTM and RL mainly focus on the temporal features of network traffic. Hence, their improvements are large in TRE but small in SRE. Compared with LSTM, RL prefers to predict short-term traffic over long-term traffic.
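As a concrete reading of Eq. (17), the sketch below computes the SRE per OD flow and the TRE per time slot in plain Python. It assumes the traffic matrix is stored as a list of rows, one per OD flow, that the norm is taken over time slots for the SRE and over OD flows for the TRE, and that all-zero flows have already been removed. The function names and the toy data are illustrative only.

```python
import math


def sre(pred, true):
    """Spatial relative error for each OD flow (row); norms run over time."""
    out = []
    for p_row, x_row in zip(pred, true):
        num = math.sqrt(sum((p - x) ** 2 for p, x in zip(p_row, x_row)))
        den = math.sqrt(sum(x * x for x in x_row))
        out.append(num / den)  # assumes den != 0 (zero flows removed)
    return out


def tre(pred, true):
    """Temporal relative error for each time slot; norms run over flows."""
    # Transposing turns time slots into rows, so sre() can be reused.
    pred_t = [list(col) for col in zip(*pred)]
    true_t = [list(col) for col in zip(*true)]
    return sre(pred_t, true_t)


# Toy traffic matrix: 2 OD flows x 2 time slots (hypothetical values).
X = [[3.0, 4.0], [6.0, 8.0]]
X_hat = [[3.0, 1.0], [6.0, 8.0]]
sre_values = sre(X_hat, X)
tre_values = tre(X_hat, X)
```

Reusing one routine on the transposed matrix keeps the two metrics consistent: they differ only in the axis along which the norm is taken.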
Hence, the TRE of LSTM is lower than that of RL in the GÉANT data set but higher for the testbed data set.

Fig. 3. SREs and TREs in our testbed and GÉANT. (a) SREs in testbed. (b) TREs in testbed. (c) SREs in GÉANT. (d) TREs in GÉANT. [Each panel contains one subplot per method: RL, PCA, LSTM, and SRMF.]

Fig. 4. CDFs of SREs and TREs in our testbed and GÉANT. (a) CDF of SRE in testbed. (b) CDF of TRE in testbed. (c) CDF of SRE in GÉANT. (d) CDF of TRE in GÉANT. [Curves for RL, PCA, SRMF, and LSTM; x-axis: SRE or TRE; y-axis: CDF.]

It is well known that biased estimators sometimes can have lower variance, and then estimates are closer to the true value than those of unbiased estimators. Consequently, we evaluate the performance of RL for short-range and long-range dependence separately. For this purpose, the prediction bias and relative sample Standard Deviation (SD) [32] are involved in our experiments, which are defined as follows:

$$\begin{cases} \mathrm{bias}(n) = \dfrac{1}{T} \sum\limits_{t=1}^{T} \left( \hat{X}(n,t) - X(n,t) \right), \\[2ex] \mathrm{SD}(n) = \sqrt{\dfrac{1}{T-1} \sum\limits_{t=1}^{T} \left( \mathrm{error}(n) - \mathrm{bias}(n) \right)^2}, \end{cases} \tag{18}$$

where $\mathrm{error}(n) = \hat{X}(n,t) - X(n,t)$. Fig. 5(a) shows the bias of each OD flow for the testbed data set. The x-axis denotes the identities of OD flows arranged in descending order with respect to their averages. We observe that the bias of PCA is much larger than that of the others when predicting small OD flows. The reason for this is that some components with respect to small singular values are neglected in the sense that just a small number of principal components with respect to larger singular values are retained to approximate the traffic matrix. SRMF extracts the features between two neighboring elements in a traffic matrix, making its bias consistent. LSTM shows an overestimation for the largest flow. The RL-based network traffic prediction mechanism mainly takes advantage of the temporal features of the traffic matrix. In each iteration, the policy is updated according to a generated sequence from the prior of an OD flow. Hence, the RL-based mechanism independently over-estimates and under-estimates diverse OD flows. In addition, RL has the lowest bias among these four approaches. Fig. 5(b) displays the bias for each flow against its sample standard deviation (or variance) for the testbed data set.
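The two quantities in Eq. (18) can be sketched for a single OD flow as follows. The snippet assumes that error(n) denotes the per-time-slot prediction error of flow n and that at least two time slots are available; the function name and the toy data are hypothetical.

```python
import math


def bias_and_sd(pred_row, true_row):
    """Bias and relative sample standard deviation of one OD flow."""
    errors = [p - x for p, x in zip(pred_row, true_row)]
    T = len(errors)                      # number of time slots, T >= 2
    bias = sum(errors) / T               # mean prediction error
    # Sample standard deviation of the errors around the bias.
    sd = math.sqrt(sum((e - bias) ** 2 for e in errors) / (T - 1))
    return bias, sd


# Toy flow over four time slots (hypothetical values).
true_flow = [10.0, 12.0, 11.0, 13.0]
pred_flow = [11.0, 12.0, 13.0, 14.0]
bias, sd = bias_and_sd(pred_flow, true_flow)
```

A positive bias indicates over-estimation on average, while the SD measures how much the individual errors scatter around that average.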
The variance of LSTM is lowest among the four approaches. RL and SRMF have a few larger sample standard deviations. Therefore, they tend to capture individual flow elements. In theory, as mentioned before, both RL and SRMF predict network traffic via adjacent elements. On the contrary, PCA and LSTM obtain a predictor from a series of previous flow elements. We can obtain similar conclusions via the evaluation of the GÉANT backbone network, as shown in Figs. 5(c) and 5(d). The volume of network traffic in GÉANT is much larger than that in our testbed. Consequently, compared with the testbed data set, the absolute biases of all four algorithms are much larger in GÉANT. The SRMF method shows an overestimation for the largest OD flows, with the bias increasing with flow size. The PCA method is quite different from the other three methods; underestimation appears for small flows. From Fig. 5(d), we observe that the RL and PCA methods' error variances are relatively high, while the SRMF and LSTM methods maintain relatively low variances. PCA shows entirely different error variances for the two data sets. Although the profiles of traffic flows in GÉANT are smoother, there are still several enormous low-high transitions, which cause higher variance.

Fig. 5. Biases and SD in our testbed and GÉANT. (a) Bias of predicted results in testbed. (b) Bias versus SD in testbed. (c) Bias of predicted results in GÉANT. (d) Bias versus SD in GÉANT. [Curves for RL, PCA, SRMF, and LSTM; y-axis: Bias; x-axis: Flow ID in (a) and (c), SD in Error in (b) and (d).]

Finally, we evaluate the performance of the four approaches for two data sets by referring to the performance improvement ratio, which is defined as:

$$R = \frac{\sum\limits_{n=1}^{N^2} \sum\limits_{t=1}^{T} \left( \hat{X}_a(n,t) - X(n,t) \right) - \sum\limits_{n=1}^{N^2} \sum\limits_{t=1}^{T} \left( \hat{X}_b(n,t) - X(n,t) \right)}{\sum\limits_{n=1}^{N^2} \sum\limits_{t=1}^{T} \left( \hat{X}_a(n,t) - X(n,t) \right)}, \tag{19}$$

where $\hat{X}_a(n,t)$ and $\hat{X}_b(n,t)$ are the respective predictors of two algorithms. The performance ratios of RL versus PCA, SRMF, and LSTM in our testbed data set are 73.40%, 58.59%, and 63.18%, respectively. For the GÉANT data set, the performance ratios of RL versus PCA and SRMF are 15.90% and 51.81%, respectively. The RL method has no performance improvement over LSTM. This is because RL and LSTM are designed for short-term and long-term traffic, respectively. As a result, RL does not perform well compared with LSTM for traffic prediction of any backbone network.

VI. CONCLUSIONS AND FUTURE WORKS

This paper investigates the problem of end-to-end network traffic prediction in IIoTs for industrial applications. Although machine learning-based methods are widely used to capture the statistical features of traffic flows for network traffic prediction, there are some issues when they are applied to IIoTs directly. These issues mainly consist of high computational complexity and large consumption caused by the direct measurement of large-scale training data sets. Motivated by these observations, we proposed an RL-based network traffic prediction mechanism with lower computational complexity and a smaller-scale training data set. Aiming at a small training data set, we took advantage of RL to predict network traffic, and proposed a KL divergence-based Monte-Carlo Q-learning algorithm. Furthermore, to reduce the computational complexity of the proposed algorithm, a dictionary learning mechanism was also proposed. The proposed RL-based network traffic prediction mechanism was evaluated on real network traffic from our testbed and the GÉANT network. According to the evaluation, the proposed mechanism can predict short-term traffic effectively. The RL-based network traffic prediction mechanism can improve prediction accuracy efficiently. However, some work is still necessary to improve the performance of the proposed algorithm adequately, which will be our main work in the future.

For our future work, the proposed mechanism can first be expanded for long-term traffic prediction by designing a novel RL algorithm. Besides, the spatial and spatio-temporal features can be considered as additional constraints to reduce the prediction error. For instance, we can define a novel immediate reward by a deep learning architecture, e.g., deep neural
network and CNN for spatial and spatio-temporal feature extraction, respectively.

VII. ACKNOWLEDGEMENTS

This work was supported in part by the National Key R&D Program of China under Grant 2018YFE0206800, in part by the National Natural Science Foundation of China under Grant 61701406 and Grant 61971084, in part by the National Natural Science Foundation of Chongqing under Grant cstc2019jcyj-msxmX0208, in part by the open research fund of National Mobile Communications Research Laboratory, Southeast University, under Grant 2020D05, and in part by the Applied Basic Research Programs of Qingdao City under Grant 18-2-2-36-jch.

REFERENCES

[1] Z. Ning, X. Wang, J. J. P. C. Rodrigues, and F. Xia, “Joint computation offloading, power allocation, and channel assignment for 5G-enabled traffic management systems,” IEEE Transactions on Industrial Informatics, vol. 15, no. 5, pp. 3058–3067, May 2019.
[2] Q. He, X. Wang, Z. Lei, M. Huang, Y. Cai, and L. Ma, “TIFIM: A two-stage iterative framework for influence maximization in social networks,” Applied Mathematics and Computation, vol. 354, pp. 338–352, 2019.
[3] D. Zhang, L. Tan, J. Ren, M. K. Awad, S. Zhang, Y. Zhang, and P. Wan, “Near-optimal and truthful online auction for computation offloading in green edge-computing systems,” IEEE Transactions on Mobile Computing, vol. 19, no. 4, pp. 880–893, 2020.
[4] H. Yang, X. Xie, and M. Kadoch, “Intelligent resource management based on reinforcement learning for ultra-reliable and low-latency IoV communication networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 5, pp. 4157–4169, May 2019.
[5] X. Wang, Z. Ning, and L. Wang, “Offloading in Internet of Vehicles: A fog-enabled real-time traffic management system,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4568–4578, Oct 2018.
[6] T. Otoshi, Y. Ohsita, M. Murata, Y. Takahashi, K. Ishibashi, K. Shiomoto, and T. Hashimoto, “Hierarchical model predictive traffic engineering,” IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1754–1767, Aug 2018.
[7] H. Yang, A. Alphones, W. Zhong, C. Chen, and X. Xie, “Learning-based energy-efficient resource management by heterogeneous RF/VLC for ultra-reliable low-latency industrial IoT networks,” IEEE Transactions on Industrial Informatics, pp. 1–1, 2019.
[8] Y. Zhang and H. Chiang, “Enhanced ELITE-load: A novel CMPSOATT methodology constructing short-term load forecasting model for industrial applications,” IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2325–2334, April 2020.
[9] J. Ren, D. Zhang, S. He, Y. Zhang, and T. Li, “A survey on end-edge-cloud orchestrated network computing paradigms: Transparent computing, mobile edge computing, fog computing, and cloudlet,” ACM Computing Surveys, vol. 52, no. 6, pp. 125:1–125:36, 2020.
[10] Z. Ning, X. Hu, Z. Chen, M. Zhou, B. Hu, J. Cheng, and M. S. Obaidat, “A cooperative quality-aware service access system for Social Internet of Vehicles,” IEEE Internet of Things Journal, vol. 5, no. 4, pp. 2506–2517, Aug 2018.
[11] Z. Ning, P. Dong, X. Wang, M. S. Obaidat, X. Hu, L. Guo, Y. Guo, J. Huang, B. Hu, and Y. Li, “When deep reinforcement learning meets 5G-enabled vehicular networks: A distributed offloading framework for traffic big data,” IEEE Transactions on Industrial Informatics, vol. 16, no. 2, pp. 1352–1361, 2020.
[12] X. Wang, Z. Zhou, F. Xiao, K. Xing, Z. Yang, Y. Liu, and C. Peng, “Spatio-temporal analysis and prediction of cellular traffic in metropolis,” IEEE Transactions on Mobile Computing, vol. 18, no. 9, pp. 2190–2202, Sep. 2019.
[13] C. Backfrieder, G. Ostermayer, and C. F. Mecklenbräuker, “Increased traffic flow through node-based bottleneck prediction and V2X communication,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 2, pp. 349–363, Feb 2017.
[14] D. Chen, “Research on traffic flow prediction in the big data environment based on the improved RBF neural network,” IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 2000–2008, Aug 2017.
[15] Z. Ning, P. Dong, X. Wang, X. Hu, L. Guo, B. Hu, Y. Guo, T. Qiu, and R. Y. K. Kwok, “Mobile edge computing enabled 5G health monitoring for Internet of Medical Things: A decentralized game theoretic approach,” IEEE Journal on Selected Areas in Communications, 2020.
[16] J. Zhang, X. Hu, Z. Ning, E. Ngai, L. Zhou, J. Wei, J. Cheng, and B. Hu, “Energy-latency tradeoff for energy-aware offloading in mobile edge computing networks,” IEEE Internet of Things Journal, vol. 5, no. 4, pp. 2633–2645, Aug 2018.
[17] P. Yang and B. Chen, “Robust Kullback-Leibler divergence and universal hypothesis testing for continuous distributions,” IEEE Transactions on Information Theory, vol. 65, no. 4, pp. 2360–2373, April 2019.
[18] Y. Xu, F. Yin, W. Xu, J. Lin, and S. Cui, “Wireless traffic prediction with scalable gaussian process: Framework, algorithms, and verification,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1291–1306, June 2019.
[19] M. Akintunde, P. Kgosi, and D. Shangodoyin, “Evaluation of GARCH model adequacy in forecasting non-linear economic time series data,” Journal of Computations and Modelling, vol. 3, no. 2, pp. 1–20, 2013.
[20] C. Xiang, P. Qu, and X. Qu, “Network traffic prediction based on MK-SVR,” Journal of Information and Computational Science, vol. 12, no. 8, pp. 3185–3197, 2015.
[21] M. Joshi and T. Hadi, “A review of network traffic analysis and prediction techniques,” Computer Science, 2015.
[22] Y. Liu, B. Li, X. Sun, and Z. Zhou, “A fusion model of SWT, QGA and BP neural network for wireless network traffic prediction,” in 2013 15th IEEE International Conference on Communication Technology, Nov 2013, pp. 769–774.
[23] H. Liu, L. T. Yang, J. Chen, M. Ye, J. Ding, and L. Kuang, “Multivariate multi-order markov multi-modal prediction with its applications in network traffic management,” IEEE Transactions on Network and Service Management, vol. 16, no. 3, pp. 828–841, Sep. 2019.
[24] R. Alvizu, S. Troia, G. Maier, and A. Pattavina, “Matheuristic with machine-learning-based prediction for software-defined mobile metro-core networks,” IEEE/OSA Journal of Optical Communications and Networking, vol. 9, no. 9, pp. D19–D30, Sep. 2017.
[25] C. Qiu, Y. Zhang, Z. Feng, P. Zhang, and S. Cui, “Spatio-temporal wireless traffic prediction with recurrent neural network,” IEEE Wireless Communications Letters, vol. 7, no. 4, pp. 554–557, Aug 2018.
[26] S. K. Singh and A. Jukan, “Machine-learning-based prediction for resource (re)allocation in optical data center networks,” IEEE/OSA Journal of Optical Communications and Networking, vol. 10, no. 10, pp. D12–D28, Oct 2018.
[27] J. Zhao, H. Qu, J. Zhao, and D. Jiang, “Towards traffic matrix prediction with LSTM recurrent neural networks,” Electronics Letters, vol. 54, no. 9, pp. 566–568, 2018.
[28] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, “Deep transfer learning for intelligent cellular traffic prediction based on cross-domain big data,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1389–1401, June 2019.
[29] G. Choudhury, D. Lynch, G. Thakur, and S. Tse, “Two use cases of machine learning for SDN-enabled IP/optical networks: traffic matrix prediction and optical path performance prediction [invited],” IEEE/OSA Journal of Optical Communications and Networking, vol. 10, no. 10, pp. D52–D62, Oct 2018.
[30] F. Morales, M. Ruiz, L. Gifre, L. M. Contreras, V. Lopez, and L. Velasco, “Virtual network topology adaptability based on data analytics for traffic prediction,” IEEE/OSA Journal of Optical Communications and Networking, vol. 9, no. 1, pp. A35–A45, Jan 2017.
[31] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, “Citywide cellular traffic prediction based on densely connected convolutional neural networks,” IEEE Communications Letters, vol. 22, no. 8, pp. 1656–1659, Aug 2018.
[32] A. Soule, A. Lakhina, N. Taft, K. Papagiannaki, K. Salamatian, A. Nucci, M. Crovella, and C. Diot, “Traffic matrices: balancing measurements, inference and modeling,” in Proceedings of SIGMETRICS 2005, 2005, pp. 362–373.
[33] F. Xu, Y. Lin, J. Huang, D. Wu, H. Shi, J. Song, and Y. Li, “Big data driven mobile traffic understanding and forecasting: A time series approach,” IEEE Transactions on Services Computing, vol. 9, no. 5, pp. 796–805, 2016.
[34] S. Gelly and D. Silver, “Monte-carlo tree search and rapid action value estimation in computer go,” Artificial Intelligence, vol. 175, no. 11, pp. 1856–1875, 2011.
[35] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of monte carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, March 2012.
[36] T. Jaakkola, M. I. Jordan, and S. P. Singh, “On the convergence of stochastic iterative dynamic programming algorithms,” Neural Computation, vol. 6, no. 6, pp. 1185–1201, 1994.
[37] J. Andrusenko, R. L. Miller, J. A. Abrahamson, N. M. M. Emanuelli, R. S. Pattay, and R. M. Shuford, “VHF general urban path loss model for short range ground-to-ground communications,” IEEE Transactions on Antennas and Propagation, vol. 56, no. 10, pp. 3302–3310, 2008.
[38] M. Roughan, Y. Zhang, W. Willinger, and L. Qiu, “Spatio-temporal compressive sensing and Internet traffic matrices (extended version),” IEEE/ACM Transactions on Networking, vol. 20, no. 3, pp. 662–676, 2012.

Laisen Nie received his Ph.D. degree in communication and information system from Northeastern University, Shenyang, China, in 2016. He is now an Associate Professor in the School of Electronics and Information, Northwestern Polytechnical University, Xi’an, China. His research interests include network measurement, network security, and cognitive networks.

Zhaolong Ning (M’14-SM’18) received the M.S. and Ph.D. degrees from Northeastern University, Shenyang, China, in 2011 and 2014, respectively. From 2013 to 2014, he was a Research Fellow at Kyushu University, Japan. He is an Associate Professor in Dalian University of Technology, China, and a Distinguished Professor in Chongqing University of Posts and Telecommunications. His research interests include Internet of Things, edge computing, and vehicular networks. Dr. Ning has published over 120 scientific papers in international journals and conferences. He serves as an associate editor or guest editor of several journals, such as IEEE Transactions on Industrial Informatics and The Computer Journal.

Mohammad S. Obaidat (S’85-M’86-SM’91-F’05) received his Ph.D. degree in Computer Engineering with a minor in Computer Science from The Ohio State University, Columbus, USA. To date (2019), he has published about 1,000 refereed technical articles (about half of them journal articles), over 70 books, and about 70 book chapters. He is Editor-in-Chief of 3 scholarly journals and an editor of many other international journals. He is the founding Editor-in-Chief of Wiley Security and Privacy Journal. He is now the Founding Dean of the College of Computing and Informatics at The University of Sharjah, UAE. Among his previous positions are Advisor to the President of Philadelphia University for Research, Development and Information Technology; President and Chair of the Board of Directors of the Society for Modeling and Simulation International (SCS); Senior Vice President of SCS; Dean of the College of Engineering at Prince Sultan University; Chair and tenured Professor at the Department of Computer and Information Science and Director of the MS Graduate Program in Data Analytics at Fordham University; and Chair and tenured Professor of the Department of Computer Science and Director of the Graduate Program at Monmouth University. He is a tenured Full Professor at King Abdullah II School of Information Technology, University of Jordan, the PR of China Ministry of Education Distinguished Overseas Professor at the University of Science and Technology Beijing, China, and an Honorary Distinguished Professor at Amity University, A Global University. He has chaired numerous (over 160) international conferences and has given numerous (over 160) keynote speeches worldwide. He founded or co-founded four international conferences. He has served as an ABET/CSAB evaluator and on the IEEE CS Fellow Evaluation Committee. He has served as an IEEE CS Distinguished Speaker/Lecturer and an ACM Distinguished Lecturer. Since 2004, he has been serving as an SCS Distinguished Lecturer. He received many best paper awards for his papers, including ones from the IEEE ICC, IEEE Globecom, AICSA, CITS, SPECTS, and DCNET international conferences. He also received Best Paper awards from IEEE Systems Journal in 2018 and in 2019 (2 Best Paper Awards). In 2020, he received 4 best paper awards from IEEE Systems Journal. He also received many other worldwide awards for his technical contributions, as well as the SCS Outstanding Service Award. He was awarded the IEEE CITS Hall of Fame Distinguished and Eminent Award. He is a Life Fellow of IEEE and a Fellow of SCS.

Balqies Sadoun received her M.S. in Civil Engineering from the École Nationale des Travaux Publics de l’État, Lyon, France. She received another M.S. and her Ph.D. degree from The Ohio State University, Columbus, Ohio, USA, where she was on a scholarship from JUST. Currently, she is a Professor at the College of Engineering, University of Sharjah, UAE, and a Professor in the Department of Surveying Engineering and Geomatics and Civil Engineering, Faculty of Engineering, at the Al-Balqa Applied University, Jordan. She has published a good number of refereed journal and conference papers. She served as a Department Chair more than once. She worked at City University of New York, USA, and Jordan University of Science and Technology in Jordan. Her research interests include modeling and simulation, Geographical Information Systems (GIS), Wireless Navigation Systems, Data Analytics, Transportation Systems, and Smart Homes and Cities, among others.

Huizhi Wang is now studying communication engineering at Northwestern Polytechnical University for her undergraduate degree. Her research interests include machine learning and network measurement.

Shengtao Li received his M.S. degree in operational research and cybernetics from Ludong University, China, in 2010, and his Ph.D. degree in control theory and control engineering from Northeastern University, China, in 2013. He is currently an associate professor in the School of Information Science and Engineering, Shandong Normal University. His research interests include nonlinear systems theory, optimal switch-time control theory of switched stochastic systems, and robust control of systems with time-delay.

Lei Guo (M’06) received the Ph.D. degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2006. He is a Full Professor with the School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China. He has authored or co-authored more than 200 technical papers in international journals and conferences. His current research interests include communication networks, optical communications, and wireless communications. Dr. Guo is currently serving as an Editor for several international journals.

Guoyin Wang (SM’03) received the B.S., M.S., and Ph.D. degrees in computer science and technology from Xi’an Jiaotong University, Xi’an, China, in 1992, 1994, and 1996, respectively. During 1998–1999, he was a Visiting Scholar with the University of North Texas, Denton, TX, USA, and the University of Regina, Regina, SK, Canada. Since 1996, he has been with the Chongqing University of Posts and Telecommunications, Chongqing, China, where he is currently a Professor, the Director of the Chongqing Key Laboratory of Computational Intelligence, and the Dean of the School of Graduate. He was appointed as the Director of the Institute of Electronic Information Technology, Chongqing Institute of Green and Intelligent Technology, CAS, China, in 2011. He is the author of ten books, the Editor of dozens of proceedings of international and national conferences, and has more than 200 reviewed research publications. His research interests include rough set, granular computing, knowledge technology, data mining, neural network, and cognitive computing.
