
1656 IEEE COMMUNICATIONS LETTERS, VOL. 22, NO. 8, AUGUST 2018

Citywide Cellular Traffic Prediction Based on Densely Connected Convolutional Neural Networks
Chuanting Zhang, Student Member, IEEE, Haixia Zhang, Senior Member, IEEE,
Dongfeng Yuan, Senior Member, IEEE, and Minggao Zhang

Abstract: With accurate traffic prediction, future cellular networks can manage themselves and embrace intelligent and efficient automation. This letter addresses citywide cellular traffic prediction and proposes a deep learning approach to model the nonlinear dynamics of wireless traffic. By treating traffic data as images, both the spatial and temporal dependence of cell traffic are captured using densely connected convolutional neural networks. A parametric matrix based fusion scheme is further put forward to learn the influence degrees of the spatial and temporal dependence. Experimental results show that the prediction performance in terms of root mean square error can be significantly improved compared with existing algorithms. The prediction accuracy is also validated using the data sets of Telecom Italia.

Index Terms: Cellular traffic prediction, big data, deep learning, intelligent traffic management.

Manuscript received March 22, 2018; revised May 3, 2018; accepted May 16, 2018. Date of publication May 29, 2018; date of current version August 10, 2018. This work was supported in part by the National Science Foundation of China for Excellent Young Scholars under Grant 61622111 and in part by the National Natural Science Foundation of China under Grant 61671278. The associate editor coordinating the review of this paper and approving it for publication was M. Caleffi. (Corresponding author: Haixia Zhang.)
The authors are with the Shandong Provincial Key Laboratory of Wireless Communication Technologies, Shandong University, Jinan 250100, China (e-mail: chuanting.zhang@gmail.com; haixia.zhang@sdu.edu.cn; dfyuan@sdu.edu.cn).
Digital Object Identifier 10.1109/LCOMM.2018.2841832
1558-2558 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

It is well known that adopting traffic prediction can facilitate resource allocation, enhance energy efficiency, and finally enable intelligent cellular networks [1], [2]. Recently, a lot of work has been done to investigate the dynamic characteristics of wireless traffic, e.g., non-stationarity and seasonality [3], and thus to make accurate predictions [4], [5]. Cellular traffic prediction in these works is treated as a time series analysis problem, the performance of which depends on linear statistical models such as AutoRegressive Integrated Moving Average (ARIMA) and alpha-stable models. The pattern of cellular traffic is actually very complex due to various factors, e.g., user mobility, arrival pattern and diverse user requirements. It has become increasingly clear that those linear models are not suitable for such applications [6].

To capture the complex and nonlinear dependence hidden in wireless traffic data, recent advances in machine learning models [7] have established themselves as strong competitors to classical statistical models in traffic prediction [8], [9]. The authors of [8] proposed a deep belief network based prediction method to model the long-term dependence of cellular traffic. Aiming at exploiting the spatial dependence of different cells, [9] established a strategy combining an auto-encoder and a Long Short Term Memory (LSTM) network [10]. However, the features learned through the auto-encoder are a lossy representation of the original data [11], which may fail to fully characterize the spatial dependence of neighboring cells. Besides, the above methods mainly concentrate on predicting traffic for a single cell. They are computationally expensive if applied to citywide networks, because hundreds or even thousands of models need to be trained simultaneously.

Motivated by the aforementioned problems, this letter proposes a new method for citywide traffic prediction by exploiting the powerful capability of deep convolutional neural networks (CNNs). More specifically, the densely connected CNN [12], one of the most advanced deep learning architectures, is utilized to collectively model the spatial and temporal dependence of traffic in different cells. The spatial dependence is naturally captured by the convolution operation. Two temporal dependencies, i.e., closeness and period, are modeled using two CNNs, and the results are further fused by a parametric-matrix-based scheme. Experimental results and comparisons demonstrate that the proposed method is effective and outperforms existing traffic prediction methods. The source code is available through https://github.com/zctzzy/traffic_prediction.

II. DATASET AND SOME KEY OBSERVATIONS

A. Wireless Big Traffic Dataset

The wireless traffic data analyzed in this letter comes from a large telephony service provider in Europe, Telecom Italia, as part of the “Big Data Challenge” [13]. The dataset consists of time series of aggregated cell phone traffic, i.e., short message service (SMS) and call service, sent or received by users within a specific area over the city of Milan. The city is divided into a grid of size H × W and each square of the grid is referred to as a “cell” (see footnote 1). H and W refer to the number of rows and columns of the grid. In the dataset, H = W = 100, which means the whole city area is divided into 100 × 100 cells. The traffic is recorded during the period from 00:00 11/01/2013 to 00:00 01/01/2014 with a temporal interval of 10 minutes. At the tth time slot, the in and out traffic in all cells can be denoted as a tensor $\mathbf{X}_t \in \mathbb{R}^{2 \times H \times W}$, where $(\mathbf{X}_t)_{0,i,j} = x_t^{\mathrm{in},i,j}$ and $(\mathbf{X}_t)_{1,i,j} = x_t^{\mathrm{out},i,j}$. Furthermore, $x^{i,j} = \{x_t^{i,j}\}, \forall t$, denotes the in/out traffic of cell (i, j) without distinguishing them unless otherwise specified.

Footnote 1: This is the best approximation of a cell tower from the publicly available dataset.
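The data layout described in Section II-A can be made concrete with a small sketch. It is illustrative only and is not taken from the authors' released code: the helper names `num_slots`, `slot_index` and `make_slot_tensor` are hypothetical, and the recording dates are read as month/day/year (1 November 2013 to 1 January 2014), which is an assumption about the notation used in the letter.

```python
from datetime import datetime, timedelta

import numpy as np

H, W = 100, 100                       # grid size stated in Section II-A
START = datetime(2013, 11, 1, 0, 0)   # assumed reading of "00:00 11/01/2013"
END = datetime(2014, 1, 1, 0, 0)      # assumed reading of "00:00 01/01/2014"
SLOT = timedelta(minutes=10)          # temporal interval of the raw data


def num_slots(start=START, end=END, slot=SLOT):
    """Number of 10-minute slots in the half-open interval [start, end)."""
    return int((end - start) / slot)


def slot_index(ts, start=START, slot=SLOT):
    """Zero-based index of the slot that contains timestamp ts."""
    return int((ts - start) / slot)


def make_slot_tensor(x_in, x_out):
    """Stack two H x W matrices into X_t with shape (2, H, W).

    Channel 0 holds the in traffic and channel 1 the out traffic,
    matching (X_t)_{0,i,j} and (X_t)_{1,i,j} in the text.
    """
    return np.stack([x_in, x_out], axis=0)


# Example with placeholder values (all-zero in traffic, all-one out traffic).
x_t = make_slot_tensor(np.zeros((H, W)), np.ones((H, W)))
total_slots = num_slots()             # 61 days * 144 slots per day
```

Under the assumed date reading, the recording period spans 61 days, so the raw series has 61 × 144 = 8784 slots per cell before any hourly aggregation.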
ZHANG et al.: CITYWIDE CELLULAR TRAFFIC PREDICTION BASED ON DENSELY CONNECTED CNNs 1657

B. Key Observations

After carefully exploring the dataset, we obtain the spatial and temporal characteristics of the wireless traffic. Details are demonstrated in Fig. 1.

Fig. 1. Spatial and temporal distribution and correlation of the considered cellular traffic.

1) Temporal Domain: Fig. 1a shows the traffic dynamics of different services, i.e., SMS and Call service, of a given cell. It can be seen from Fig. 1a that the traffic follows strong daily patterns, with a slight dissimilarity between SMS and phone Calls. For example, the traffic volume drops at weekends compared with that on working days. The difference between in and out traffic volume of SMS is much more obvious than that of phone Calls. Taking SMS traffic as an example, Fig. 1b displays the average traffic volume ratio at a time slot gap τ. The ratio is defined as

$$\frac{1}{(T-\tau) \times H \times W} \sum_{t=1+\tau}^{T} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{x_t^{i,j}}{x_{t-\tau}^{i,j}}, \tag{1}$$

where $x_t^{i,j}$ represents the traffic volume of cell (i, j) at the tth time slot and T is the number of time slots of the dataset. The value of this ratio reflects the temporal correlation of wireless traffic in the time series, namely, traffic in adjacent slots is more closely related than traffic in slots far apart in time.

2) Spatial Domain: Again taking SMS as an example, the spatial distribution of the traffic at a given time is demonstrated in Fig. 1c. It can be seen that the traffic is distributed unevenly across different cells. This is reasonable because of the large population in the city center, where there is much more traffic than in the rural areas. The spatial correlation of the traffic data is measured using a widely used metric [9], i.e., the Pearson correlation coefficient ρ, between a target cell (i, j) and its neighboring cells (i′, j′):

$$\rho = \frac{\operatorname{cov}(x^{i,j}, x^{i',j'})}{\sigma_{x^{i,j}} \, \sigma_{x^{i',j'}}}, \tag{2}$$

where cov(·) represents the covariance operator and σ is the standard deviation. For demonstration purposes, a region with 9 × 9 cells is selected to show the spatial correlations. The obtained ρ values between the target cell (5, 5) and its neighboring cells are shown in Fig. 1d. It can be observed that spatial correlation indeed exists among cells. The degree of the correlation depends somewhat on the distance between cells. For example, though cell (5, 4) and cell (5, 6) are at the same distance from target cell (5, 5), their correlation values, which are 0.48 and 0.87, respectively, still differ a lot.

Based on the above observations, it is clear that an effective method is needed to collectively capture the spatial and temporal dependencies of the wireless traffic.

III. PREDICTION MODEL

In this section, a deep learning approach based on convolutional neural networks is proposed to model the spatial-temporal dependence of the traffic data among different cells. The framework is illustrated in Fig. 2 and mainly consists of three components: training data construction, convolutional learning and parametric matrix based fusion.

Fig. 2. Prediction framework.

A. Training Data Construction

The in and out traffic of each time slot are represented as an image-like two-channel tensor, as shown in the top part of Fig. 2. A sliding window scheme is used to generate training and test datasets. Suppose that the traffic volume of the tth slot is the target to be predicted. The intervals before t are divided into two fragments, i.e., recent time and daily history. With p defined as the length of the closeness dependence, the traffic of the recent-time fragment modeling the temporal closeness dependence can be written as $[\mathbf{X}_{t-p}, \mathbf{X}_{t-(p-1)}, \cdots, \mathbf{X}_{t-1}]$. Similarly, with q defined as the length of the period dependence, the traffic sampled from the daily history modeling the temporal period dependence can be expressed as $[\mathbf{X}_{t-q*24}, \mathbf{X}_{t-(q-1)*24}, \cdots, \mathbf{X}_{t-24}]$ (the stride of 24 slots corresponds to one day because the traffic is aggregated per hour, as described in Section IV-A). Then the traffic in each fragment is concatenated into a new tensor along the first axis. For notational simplicity, these two sampled traffic tensors are denoted as $\mathbf{X}_c \in \mathbb{R}^{2p \times H \times W}$ and $\mathbf{X}_d \in \mathbb{R}^{2q \times H \times W}$, respectively.

B. Densely Connected Convolutional Neural Networks

As discussed in section II-B.2, traffic of neighboring cells may be affected by each other. As CNN has shown its
remarkable ability in hierarchically capturing spatial structural information, it is introduced to model the spatial dependence among cells. The convolution with a kernel of size $(k_1, k_2)$ can effectively fuse the information of $k_1 k_2$ cells into a high-level representation. Employing CNN, after a series of convolutions in each layer, the local and global spatial dependencies of the citywide traffic can be captured. Considering the fact that the in and out traffic of different cells depend on each other no matter whether the cells are adjacent or located far apart, the densely connected pattern [12] is adopted in the CNN, which can alleviate the vanishing-gradient problem and strengthen the feature propagation, and finally can enhance the prediction efficiency and accuracy. The detailed network architecture is displayed in Fig. 2, where two networks with a shared structure are designed: one models the temporal closeness dependence and the other models the temporal period dependence. In each network, there are L layers and each layer implements a non-linear transformation $f_l(\cdot)$, $l = 1, 2, \cdots, L$, which is a composite function with three consecutive operations, i.e., Convolution (Conv), Batch Normalization (BN) and rectified linear units (ReLU). Applied to the weighted sum of a unit's inputs, ReLU decides whether the unit is activated or not. For modeling the closeness dependence, with the initial input $\mathbf{X}_c^0$, the output at the lth layer is denoted as

$$\mathbf{X}_c^l = f_l(\mathbf{X}_c^0 \oplus \mathbf{X}_c^1 \oplus \cdots \oplus \mathbf{X}_c^{l-1}), \tag{3}$$

where ⊕ refers to the concatenation of the feature maps produced in all preceding layers. Similarly, for modeling the temporal period dependence, the output at the lth layer can be expressed as

$$\mathbf{X}_d^l = f_l(\mathbf{X}_d^0 \oplus \mathbf{X}_d^1 \oplus \cdots \oplus \mathbf{X}_d^{l-1}). \tag{4}$$

C. Parametric Matrix Based Fusion

From the above analysis, it is known that the traffic of different cells is related to both closeness and period, but the degree of relevance differs between the two. To capture this relationship, a parametric matrix based scheme is proposed to fuse the features of closeness and period, as shown in the bottom part of Fig. 2. A convolution layer is separately added on top of the Lth layer of each network to get the features for the parametric matrix based fusion scheme. Suppose that the outputs of the two convolution layers are $\mathbf{X}_c^{L+1}$ and $\mathbf{X}_d^{L+1}$, respectively. After fusion, we have

$$\mathbf{X}_o = \mathbf{W}_c \circ \mathbf{X}_c^{L+1} + \mathbf{W}_d \circ \mathbf{X}_d^{L+1}, \tag{5}$$

where ∘ is the Hadamard product, and $\mathbf{W}_c$ and $\mathbf{W}_d$ are learnable parameters characterizing the relationship between closeness, period and the wireless traffic. The final output after an activation is

$$\hat{\mathbf{X}}_t = \sigma(\mathbf{X}_o), \tag{6}$$

where σ(·) denotes the sigmoid function. The proposed deep learning model can be easily trained by minimizing the Frobenius norm between the predicted value and the ground truth value of the tth slot

$$\mathcal{L}(\theta) = \arg\min_{\theta} \|\hat{\mathbf{X}}_t - \mathbf{X}_t\|_2^2, \tag{7}$$

where θ denotes the parameter set of the proposed prediction model.

Fig. 3. Overall performance on two kinds of wireless traffic.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, the data is first preprocessed for model training, and then the overall performance of the proposed prediction scheme is evaluated and the predicted results are given.

A. Preprocessing and Parameter Settings

At the 10-minute granularity of the original dataset, a large proportion of the cell traffic is zero, which makes the data very sparse. Besides, resource planning at the 10-minute level is a non-trivial task and may result in unstable networks or excessive overhead. Thus in this letter, the traffic is aggregated per hour. As the sigmoid function is used to activate the outputs of the proposed model, the traffic is scaled into [0, 1] using Min-Max normalization. In evaluation, the predicted values are rescaled back to the original range and compared with the ground truth. In addition, the traffic from the last 7 days is selected as the test data, and all earlier traffic is used as training data.

Both deep networks are trained by the widely used optimization technique, Adam, with a mini-batch size of 32 for 100 epochs. The initial learning rate is set to 0.01, and is divided by 10 at 50% and 75% of the total number of training epochs. All convolution layers have 32 filters of size 3 × 3 except for the last layer, which has 2 filters of size 3 × 3. For the lengths of the dependent sequences, p and q are both set to 3. The root mean square error (RMSE) is adopted as the evaluation metric and defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{H \times W} \sum_{i=1}^{H \times W} (\hat{y}_i - y_i)^2}, \tag{8}$$

where $\hat{y}_i$ is the predicted value of the ith cell and $y_i$ is the ground truth.

B. Overall Performance

To validate the effectiveness of the proposed deep learning based cellular traffic prediction method, experiments are carried out with two types of datasets, i.e., SMS and Call. As a performance measurement, RMSE of the prediction, denoted as Proposed-F, is calculated and displayed in Fig. 3. RMSE without parametric matrix based
fusion scheme ($\mathbf{X}_c^{L+1} + \mathbf{X}_d^{L+1}$), denoted as Proposed, is also included in Fig. 3. It can be observed that the fusion scheme can enhance the performance in terms of RMSE, which means that learning the weights of closeness and period can better describe their function in traffic prediction. For comparison, three existing algorithms, Historical Average value (HA), ARIMA and LSTM, are adopted as baselines, as they have been widely used for traffic prediction in current research work [4], [9]. The obtained RMSE shows that the proposed method achieves the most accurate prediction. This is because the spatial and temporal dependencies of cellular traffic among different cells are collectively modeled by the CNNs. Moreover, the RMSE values on the Call dataset are lower than those on the SMS dataset, as the pattern hidden in the Call dataset is more “regular” than that in the SMS dataset, as seen from Fig. 1a, which in turn makes it easier for the deep model to learn. It should be noted that for real communication systems re-training is needed to cope with newly arriving data.

To show the convergence speed of the proposed method, the training loss after each epoch is given in Fig. 4. It is shown that the loss decreases quickly during the first 20 epochs and then gradually converges to a stable status after 40 epochs, indicating that the training process of the proposed method is time efficient.

Fig. 4. The change of training loss with each epoch.

C. Prediction Results

Finally, both the predicted in and out traffic of a randomly selected cell are given and compared with the ground truth values. The comparisons are shown in Fig. 5. It can be seen that the prediction results match the trend of the ground truth well, even when the traffic becomes unstable during the time slots from the 130th to the 160th. The peaks of both in and out traffic can be effectively captured and predicted.

Fig. 5. Prediction results of a randomly selected cell.

V. CONCLUSION

This letter investigates the spatial and temporal dependence of traffic among different cells and proposes a deep learning approach to collectively model these two kinds of dependencies for traffic prediction. To model the influence degree of these two dependencies, a parametric matrix based fusion scheme has been introduced. It has been shown that by treating traffic data as images, the citywide traffic can be efficiently predicted. Experimental results have demonstrated that the proposed CNN based approach obtains better performance in terms of RMSE when compared with some existing methods. It should be noted that a large amount of training data is needed to achieve the best performance with the proposed approach. The code is available at https://github.com/zctzzy/traffic_prediction.

REFERENCES
[1] R. Li et al., “Intelligent 5G: When cellular networks meet artificial intelligence,” IEEE Wireless Commun., vol. 24, no. 5, pp. 175–183, Oct. 2017.
[2] N. Saxena, B. J. R. Sahu, and Y. S. Han, “Traffic-aware energy optimization in green LTE cellular systems,” IEEE Commun. Lett., vol. 18, no. 1, pp. 38–41, Jan. 2014.
[3] A. D’Alconzo, A. Coluccia, F. Ricciato, and P. Romirer-Maierhofer, “A distribution-based approach to anomaly detection and application to 3G mobile traffic,” in Proc. IEEE Global Telecommun. Conf., Nov./Dec. 2009, pp. 1–8.
[4] B. Zhou, D. He, and Z. Sun, Traffic Modeling and Prediction Using ARIMA/GARCH Model. Boston, MA, USA: Springer, 2006, ch. 5, pp. 101–121.
[5] R. Li, Z. Zhao, J. Zheng, C. Mei, Y. Cai, and H. Zhang, “The learning and prediction of application-level traffic data in cellular networks,” IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3899–3912, Jun. 2017.
[6] J. G. D. Gooijer and R. J. Hyndman, “25 years of time series forecasting,” Int. J. Forecasting, vol. 22, no. 3, pp. 443–473, 2006.
[7] H. Zhang, L. Cao, and S. Gao, “Locality correlation preserving based one-class support vector machine,” Pattern Recognit., vol. 47, no. 9, pp. 3168–3178, 2014.
[8] L. Nie, D. Jiang, S. Yu, and H. Song, “Network traffic prediction based on deep belief network in wireless mesh backbone networks,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), San Francisco, CA, USA, Mar. 2017, pp. 1–5.
[9] J. Wang et al., “Spatiotemporal modeling and prediction in cellular networks: A big data enabled deep learning approach,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), Atlanta, GA, USA, May 2017, pp. 1–9.
[10] Y. LeCun, Y. Bengio, and G. E. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, May 2015.
[11] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[12] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, p. 3.
[13] G. Barlacchi et al., “A multi-source dataset of urban life in the city of Milan and the province of Trentino,” Sci. Data, vol. 2, Oct. 2015, Art. no. 150055.
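
As a supplementary illustration of the sliding-window construction in Section III-A, the following sketch builds the closeness and period inputs from an hourly traffic array. It is a minimal NumPy example under stated assumptions, not the authors' implementation: the function name `build_samples` and the toy grid size are hypothetical, while p = q = 3 and the 24-slot daily stride follow the settings given in the letter.

```python
import numpy as np


def build_samples(traffic, p=3, q=3, period=24):
    """Build (Xc, Xd, Y) from hourly traffic of shape (T, 2, H, W).

    Xc stacks [X_{t-p}, ..., X_{t-1}] along the channel axis -> (N, 2p, H, W).
    Xd stacks [X_{t-q*period}, ..., X_{t-period}] similarly   -> (N, 2q, H, W).
    Y holds the target slot X_t                                -> (N, 2, H, W).
    """
    num_steps = traffic.shape[0]
    first = max(p, q * period)  # earliest t for which both histories exist
    xc, xd, y = [], [], []
    for t in range(first, num_steps):
        closeness = [traffic[t - k] for k in range(p, 0, -1)]
        daily = [traffic[t - k * period] for k in range(q, 0, -1)]
        xc.append(np.concatenate(closeness, axis=0))
        xd.append(np.concatenate(daily, axis=0))
        y.append(traffic[t])
    return np.stack(xc), np.stack(xd), np.stack(y)


# Toy example: 100 hourly slots on a hypothetical 4 x 4 grid.
rng = np.random.default_rng(0)
toy_traffic = rng.random((100, 2, 4, 4))
Xc, Xd, Y = build_samples(toy_traffic)
# With p = q = 3 the first usable target is t = 72, leaving 28 samples,
# so Xc and Xd both have shape (28, 6, 4, 4) and Y has shape (28, 2, 4, 4).
```

Holding out the last 7 days (168 hourly slots) of targets as test data, as described in Section IV-A, would then amount to slicing the sample axis of these arrays.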

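The fusion step of Eqs. (5) and (6) and the evaluation procedure of Section IV-A (Min-Max scaling, rescaling, and the RMSE of Eq. (8)) can likewise be sketched in a few lines. This is only an illustration: the function names are hypothetical, the weight matrices are fixed placeholders rather than learned parameters, and the traffic range used for scaling is an arbitrary example value.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function used as the output activation."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse(xc, xd, wc, wd):
    """Eqs. (5)-(6): sigmoid(Wc * Xc + Wd * Xd) with Hadamard products."""
    return sigmoid(wc * xc + wd * xd)


def minmax_scale(x, lo, hi):
    """Map values from [lo, hi] into [0, 1]."""
    return (x - lo) / (hi - lo)


def minmax_inverse(x_scaled, lo, hi):
    """Map values from [0, 1] back to the original range [lo, hi]."""
    return x_scaled * (hi - lo) + lo


def rmse(pred, truth):
    """Eq. (8): root mean square error over all cells."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


# Placeholder feature maps and weights on a hypothetical 4 x 4 grid.
rng = np.random.default_rng(0)
xc_feat = rng.standard_normal((2, 4, 4))
xd_feat = rng.standard_normal((2, 4, 4))
wc = np.ones((2, 4, 4))           # in the model these would be learned
wd = np.ones((2, 4, 4))
pred_scaled = fuse(xc_feat, xd_feat, wc, wd)        # values in (0, 1)

lo, hi = 0.0, 500.0               # example traffic range, not from the letter
pred = minmax_inverse(pred_scaled, lo, hi)          # back to traffic units
truth = minmax_inverse(rng.random((2, 4, 4)), lo, hi)
error = rmse(pred, truth)
```

Setting both weight matrices to all ones reduces Eq. (5) to the unweighted sum used by the "Proposed" variant in Section IV-B, which is why the learned weights are the only difference between that variant and "Proposed-F".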