
QoE Assessment Model Based on Continuous Deep Learning for Video in Wireless Networks

Xuewen Liu, Gang Chuai, Xin Wang, Member, IEEE, Zhiwei Xu, Member, IEEE, and Weidong Gao

Abstract—Quality of experience (QoE) is a vital metric that indicates how well the wireless network provides transmission services to users, while quality of service (QoS) helps better configure the network parameters for higher performance. The evaluation time of QoE is usually several orders of magnitude larger than that of QoS, because QoE is the perception of users over a period of time, whereas QoS can be collected every millisecond. Therefore, a QoE/QoS mapping model can help us obtain QoE by collecting the QoS measurements, and perform QoE-based network configuration with a smaller time granularity. Many studies have been made to obtain the QoS-to-QoE mapping, including the use of machine learning (ML) methods. However, traditional ML-based regression methods for QoE/QoS mapping face the challenges of high regression error and catastrophic forgetting when dealing with continuously arriving data. In this paper, we propose a novel QoE model based on continual deep learning in wireless networks. The model is formed with two concatenated deep neural networks (DNNs). The first DNN classifies data into different subsets, which are then fed into the second DNN for regression. The second DNN dynamically forms the corresponding subnets, each with nodes and connections adaptively selected in each new time period with newly arriving data. We solve the catastrophic forgetting problem with the use of node splitting and hidden state augmentation. Our proposed learning framework greatly reduces the regression error, to as low as 0.9314%. The experimental results demonstrate that our proposed model reduces the root mean square error (RMSE) by 21-86 times compared with several existing approaches; in particular, the testing error of our proposed model is more than 80 times lower than that of a traditional DNN. Compared with other DNN-based cascade models, our proposed method provides good performance in both training time and RMSE.

Index Terms—Data-driven QoE assessment, continual deep learning, QoE/QoS mapping, wireless network, cascaded DNNs

Xuewen Liu, Gang Chuai, and Weidong Gao are with the Department of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China. E-mail: {xuewen_liu1990, chuai, gaoweidong}@bupt.edu.cn.
Xin Wang is with the Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794 USA. E-mail: x.wang@stonybrook.edu.
Zhiwei Xu is with the Institute of Computing, Chinese Academy of Sciences, Beijing 100080, China, and also with the Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY 11794 USA. E-mail: xuzhiwei2001@ict.ac.cn.
Manuscript received 24 July 2021; revised 15 November 2021; accepted 6 December 2021. Date of publication 9 December 2021; date of current version 5 May 2023.
This work was supported in part by the National Key Research and Development Project of China under Grant 2020YFB1806703 and in part by the National Science Foundation under Grant ECCS 2030063. Xuewen Liu would like to acknowledge financial support from the China Scholarship Council.
(Corresponding author: Gang Chuai.)
Digital Object Identifier no. 10.1109/TMC.2021.3133949

1 INTRODUCTION

QoE, an estimated metric of user satisfaction with a service, will become much more important in 5G mobile wireless networks. As services with different requirements coexist [1], this requires not only an accurate method to measure QoE, but also efficient schemes to improve it. However, QoE assessment is a difficult task, because QoE is not only affected by objective factors such as service types, client devices and network conditions, but is also related to the subjective feelings of users, such as preferences and tolerance. A subjective test is the simplest way to judge perceived video quality: in a lab environment, specialized client software measures the service performance indicators and determines the impact of each metric on user experience [2]. However, a subjective test is extremely costly, time-consuming and not suitable for large-scale assessments and real-time applications [3]. To overcome the limitations of subjective evaluations, researchers investigate objective methods which do not rely on viewer feedback but use mathematical statistics and machine learning. According to the requirement for the original video, objective assessments are classified as [4]: full reference (FR), reduced reference (RR) and no reference (NR). Both FR and RR need information about the original video that is only available to service providers and the client side [5]. As a result, Mobile Network Operators (MNOs) are not able to perform measurements in real time and obtain adequate metrics to predict QoE and optimize the network configurations on time. The purpose of QoE/QoS mapping is to provide guidance for the future real-time optimal operation of the radio access network (RAN). An NR model does not need the original (reference) video data or a reduced feature data set, and it is more suitable for real-time scenarios.

NR-based QoE models mainly depend on two factors. One is the characteristics of the video streams (e.g., bit rate, resolution and video coding), and the other is the network QoS indicators, such as packet delay, packet loss rate, jitter and throughput. To guide the optimal operation of the radio access network (RAN), this article focuses on establishing the relationship between video QoE and QoS. Many mathematical models (exponential hypothesis [6], [7], logarithmic formula [8], linear equation [9]) and ITU standards (P.NAMS and P.NBAMS [10]) have been developed to obtain the QoE/QoS relationship.

Since machine learning has advantages in inferring the complex relationship between multiple parameters, many machine learning techniques have been proposed to associate QoE with QoS in recent years, including random forest regression (RFR) [11], linear regression (LR) [12], decision tree regression (DTR) [12], support vector regression (SVR) [13], gradient boosting decision tree (GBDT) [14], boosting SVR [15] and DNN [16].

However, there are two drawbacks in the above studies.

• Poor regression performance. From the analysis of state-of-the-art learning-based models for QoE and QoS mapping, we found that the regression error (RMSE) of all models is greater than 10%, which is undoubtedly unacceptable for MNOs that need accurate knowledge of the user QoE at any time. It is also difficult to optimize QoE through such an inaccurate model, as the inaccuracy may offset the performance gains brought by network optimization and introduce larger network configuration errors.

• Not considering the model update. Almost all literature studies did not consider how to update the QoE model as users and the wireless network environment change, which greatly reduces the usability and accuracy of the model.

In wireless networks, QoE is often captured through mean opinion scores (MOS) with a range between 5 (excellent) and 1 (bad). Through the analysis of the QoE distribution, we find that MOS values tend to form clusters in every score range of length 1 and fluctuate around some central points. Clustering similar data into subsets can reduce the fluctuation of MOS and make the data distribution of each subset more centralized, with smaller variance. Obviously, a model trained for regression on a data set with a large variance is more likely to have a larger training loss than one trained on a data set with a small variance. This motivates us to design a cascaded model that first divides the data set into several new data sets with small variance in the first-level model, and then trains the second-level model on each new subset through regression.

This paper focuses on the mapping between QoS and QoE to obtain QoE by collecting QoS and performing future QoE-based network configurations with a smaller time granularity. In wireless networks, many metrics are used to evaluate QoS, which makes the input of the model high-dimensional. DNN is good at handling high-dimensional input to achieve good classification and regression results. Therefore, DNN is employed in both parts of the model. However, in the second-level model, if each subset is trained separately, we have to train multiple models to ensure the training accuracy, at the cost of complexity. On the other hand, if we use one model and train it with all subsets of data together, the complexity can be reduced but the model accuracy cannot be guaranteed. Instead, in order to keep the model accurate while not incurring a high training complexity, we propose to divide a DNN (which generally has many connected neurons at each layer) into a number of sub-networks, and train each using a corresponding data subset. In addition, taking advantage of the unique network structure of DNN, composed of abundant connected neurons, we further propose to enable efficient model update through hidden node splitting to prevent catastrophic forgetting in the continuous learning.

Based on the above analysis, we propose solutions from two perspectives to address the challenges.

First, based on our analysis of the distribution of data on QoE, we develop a new deep learning framework composed of two cascading DNNs to achieve low regression errors. The first-level DNN, called the classification learning network (CLN), is used to properly classify the input data into m categories. Then, the classification results are processed in the second-level DNN, the regression learning network (RLN), to perform regression. Different from cascaded DNNs used in other contexts, the first-level network of the proposed model is applied to divide the data set into m subsets (classes) rather than performing the feature extraction in [17] and [18]. In the second-level network, we introduce selective training to form m different subnets from a single DNN by dynamically selecting neural network nodes and connections for each, making the architecture more compact and achieving the regression of the m data sets. In contrast, in [19], [20], [21], [22], [23] and [24], m DNNs or transfer learning are needed to perform this function, which increases the computational complexity.

Second, we exploit the principle of continuous learning for model updating to alleviate the catastrophic forgetting problem. Continuous/incremental learning has the ability to quickly learn new data on the basis of existing knowledge while minimizing forgetting. Different neural network approaches [25] for lifelong learning have been developed, such as Regularization Approaches and Dynamic Architectures. The regularization methods [26], [27] alleviate catastrophic forgetting by imposing constraints on the update of neural weights, which may result in a trade-off between the performance of old and new tasks because a limited number of neural resources may not be able to capture the information well. Dynamic Architectures prevent forgetting by increasing the number of neurons [28], [29] or network layers [30], [31] to represent new information. However, the main disadvantage of this strategy is that the number of parameters increases greatly with the progress of learning. [29] proposed a dynamically expanding network (DEN) that splits hidden nodes to incrementally learn new tasks. Inspired by [29], in this paper we develop a novel model updating method based on the principle of the Regularization Approach, in which the severe performance degradation caused by limited neural resources is alleviated by hidden-node splitting to preserve features of the previous data. To overcome the disadvantage of the large increase of parameters in Dynamic Architectures, in our proposed RLN model, selective training is adopted to form a more compact structure, which further reduces the excessive neural resources introduced by learning new knowledge. Different from the two training processes included in [29], our method only needs to be trained once, evaluating whether the nodes in each layer need to be copied, which can significantly reduce the computational overhead.

The contributions of this paper are summarized as follows:

1) We propose a novel deep network architecture, i.e., CLN-RLN, for the QoE/QoS mapping model, which

consists of two cascaded DNNs. The first-level DNN is employed to classify the original dataset into several subsets, with the data in each having similar characteristics. At the second level, for each subset, a DNN subnet is flexibly and dynamically constructed with nodes and links selected to reduce the training errors. We demonstrate that the regression error can be greatly reduced by reasonable data partitioning and by selecting appropriate sub-networks for the integrated regression. To the best of our knowledge, we are the first to propose this framework and use it for QoE/QoS mapping.

2) We provide a new method to form neural subnetworks with nodes and links dynamically selected, which greatly reduces the regression error and the computational complexity caused by the introduction of multiple DNNs at the second level.

3) We develop a model updating method based on the principle of the Regularization Approach, in which the severe performance degradation caused by limited neural resources is compensated by hidden-node splitting to preserve features of the previous data. The compact architecture achieved by selective training curbs the disadvantage of a substantial increase of parameters when learning new knowledge.

To the best of our knowledge, this is the first work that exploits the use of cascading DNNs to achieve precise regression. Our proposed model reduces the RMSE by 21-86 times compared with several reference algorithms. In particular, the RMSE of our CLN-RLN model is nearly 86 times lower than that of directly using a DNN for regression, without increasing the training time too much. The regression error of the CLN-RLN model is as low as 0.9314%.

2 RELATED WORK

Recently, QoE/QoS mapping models based on machine learning methods in RAN have been developed. Previous research on QoE assessment was mainly based on QoS indicators such as packet loss rate, jitter, delay and bandwidth [32]. However, the QoS metrics alone cannot effectively reflect the fault problems in RAN and the root causes of poor user experience quality, such as poor coverage, frequent handover, high interference and resource overload. Many other KPIs describing the radio conditions are introduced [11], [13], [14], [15], [16], [33] to complement the network quality metrics, such as the channel quality indicator, reference signal received quality, received signal strength indicator, throughput, signal to interference noise ratio and load.

In [13], MOS is captured by comparing the received signal with the original one. Feature pre-selection is first conducted using Pearson correlation, and then SVR is used to map the QoS metrics and several Radio Frequency (RF) channel measurements of real Drive Test (DT) data into MOS with an RMSE of 11%, but real-time QoE estimation was not studied. In order to improve the prediction accuracy of classical individual learners, ensemble learning algorithms are applied in [11], [12], [14], [15]. Work [11] uses several real-live radio metrics collected by smartphone measurements to train an RFR model against video quality metrics calculated from the reference videos, and achieves an RMSE of 19.2%. However, post-processing analysis of the original video against the degraded reference file is typically needed to compare the corresponding frames of the two video files and produce objective quality metrics such as MOS, Peak Signal-to-Noise Ratio (PSNR), frame delay, frame skips, and blurriness. In [12], the authors explore the method of obtaining key quality indicators (KQIs) from KPIs with DTR in a real cellular network, and the evaluation indicates good fitness levels and modeling accuracy. In [14], the authors provide a new QoE prediction model for video streaming services over 4G networks with drive test data, using layer 1 (i.e., Physical Layer) key performance indicators (KPIs). Among the several considered ML algorithms, GBDT shows the best performance, achieving 78.9% Pearson correlation and 11.4% MSE. Based on SVR, the work in [15] creates a powerful predictive model, i.e., boosting SVR, which shows its superiority over relevant ensemble learning methods and obtains an RMSE of 47%. In [16], four types of subjective scores and 89 network parameters are collected by a mobile phone application to achieve a data-driven objective QoE prediction approach. As a DNN can learn more potential features through a large number of connected neurons, a DNN structure shows strong ability in representing the complex relationship between the network performance metrics and the user scores. The results show that DNN outperforms SVM, DTR and GBDT.

Despite the potential of applying DNNs to predicting QoE scores, the performance of the current QoE/QoS mapping shows that the machine learning methods used in the current research cannot meet the practical requirement of low regression error. Feature selection almost always needs to be carried out first, and the performance highly relies on its accuracy. Also, reference videos are essential for all the schemes without real-time QoE estimation. Moreover, how to update the model effectively after finishing the QoE/QoS mapping is not mentioned in the above work, which is the key to judging whether the model is usable or not, as wireless network states change rapidly.

Cascaded DNNs provide a new DNN-based structure. They possess strong feature extraction ability and can achieve high accuracy, but suffer from high complexity and overhead. Various cascaded DNNs composed of Fully Connected Networks (FCN) or Convolutional Neural Networks (CNN) have been developed in many other contexts. Paired CNNs (termed "CNN+CNN") are utilized in image processing [17] and efficient workload allocation [18] for feature extraction. In [19], a novel DNN architecture with two cascaded parts (termed "FCN+CNN") is proposed to reconstruct the high-dimensional channel from low-dimensional measurements. The first stage utilizes an FCN to obtain an initial coarse channel estimate, which is delivered to CNNs in the second stage. In [20], another kind of cascaded DNN (termed "CNN+FCN") is proposed to achieve vehicle positioning, with a fully convolutional autoencoder network for extracting equivalent positioning features and two FCNs for efficiently estimating lateral position and yaw angle. With the same cascaded structure ("CNN+FCN") as [20], in [21] a convolutional autoencoder network is applied to abstract latent features and four FCNs to reconstruct the image.

The combination of "FCN+FCN" is employed for the calibration of Time Division Duplexing (TDD) reciprocity [22] and for joint optical-signal-to-noise ratio (OSNR) monitoring and modulation format identification [23], in which one FCN in the first-level network is used for classification and several FCNs in the second-level network are used for regression. [24] achieves the same purpose as [23] with only one FCN in the second-level network, but transfer learning (TL) and a reconstructed data set are needed to guarantee the universality and accuracy of the model, which actually does not reduce the computational overhead. To accurately learn the mapping between QoE and QoS while avoiding a high complexity, we develop a novel cascade of DNNs consisting of two FCNs, where the second FCN incorporates selective training to form m different subnets, each formed by dynamically selecting neural network nodes and connections. This scalable and flexible architecture can effectively reduce the complexity and overhead introduced by the cascaded structure.

Fig. 1. The framework of the CLN-RLN model.

3 METHOD

Although DNN is shown to be promising in handling the complex relationship between QoE and QoS, a fixed DNN structure and an inflexible network construction mode make it hard to achieve high regression accuracy and to update the model in a timely manner as new data continuously arrive. To address these issues, we propose a novel DNN framework and a new model update scheme.

3.1 Problem Formulation

QoE model assessment is made at time t with a series {x_i, y_i}_{i=1}^{N_t} of N_t samples. Given an input x_i, the goal of QoE model building with a deep learning method is to learn a function f_W that can produce the predicted QoE value y'_i with minimal loss from the target value y_i, with the loss defined as \ell(y'_i, y_i). We need to minimize the loss expressed as

L(W; D) = E[\ell(f_W(X), Y)] = \int \ell(f_W(x), y) p(x, y) dx dy,   (1)

where W represents the set of parameters to update in the process of learning, and D is the dataset.

To achieve this goal, a fixed model structure or fixed parameters can hardly meet the performance expectation with data continuously arriving, and we need to constantly adjust the model based on new data. In the lifelong learning setting, previous data are not available, and only the model parameters trained from the previous data are known. As a result, the model parameters should not be changed so dramatically that the model characteristics of the previous data are forgotten. The most popular way to implement continual learning is to learn the model parameters W^t and use \ell_2-regularization to restrict the parameter drifting. Therefore, the objective function is changed from Eq. (1) to

minimize_{W^t}  L(W^t; D^t) + \lambda ||W^t - W^{t-1}||_2^2,   (2)

where W^t and W^{t-1} are respectively the network parameters at time t and t-1, with the dataset D^t = {(x_1, y_1), (x_2, y_2), ..., (x_i, y_i)}_{i=1}^{N_t}. \lambda is the regularization parameter used to restrict the impact of the weight shift on the objective function. If we increase \lambda, the objective function becomes more susceptible to weight drifting, which makes the deep learning network less likely to forget the previous network parameters. On the other hand, with a small \lambda, the objective function focuses more on finding the minimum loss under the current dataset.
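As a concrete illustration of the update objective in Eq. (2), the following sketch shows how the \ell_2 drift penalty toward the previous parameters W^{t-1} can be added to the task loss. It is not the authors' code: the PyTorch framework, the MSE task loss and the variable names are our own assumptions.

```python
import torch
import torch.nn.functional as F

def continual_loss(model, x, y, prev_params, lam):
    """Eq. (2): loss on the current data plus an L2 penalty that keeps the
    current weights W^t close to the previous weights W^{t-1}."""
    task_loss = F.mse_loss(model(x), y)
    drift = sum(((p - p_old) ** 2).sum()
                for p, p_old in zip(model.parameters(), prev_params))
    return task_loss + lam * drift

# prev_params is a detached snapshot of the parameters saved at time t-1, e.g.:
# prev_params = [p.detach().clone() for p in model.parameters()]
```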
3.2 The Proposed Method

On the basis of the above analysis, our goal is to design a learning model that can not only properly group the data, but also realize the regression with a low generalization error. Accordingly, our deep learning framework is formed with two DNNs cascaded. As shown in Fig. 1, we first classify the input data into m categories with a deep learning function \varphi_\theta(x), realized through a classification learning network (CLN) parameterized by \theta. Then, the classification results are processed by another deep learning function f_W(\varphi_\theta(x)) through a regression learning network (RLN) parameterized by W to get the final predicted value. In RLN, each classified group dynamically includes different neural network nodes and the corresponding links (represented by nodes and solid lines with different colors in Fig. 1) to discover the data features. Back propagation is performed to minimize the total loss (RMSE) of RLN. The colorless nodes are the ones that are not selected by any sub-network during the selection process, so their weights (dotted lines) are set to 0 and are not updated during backward propagation.

The flowchart of the CLN-RLN model is illustrated in Fig. 2. At time t, we first initialize the network parameters \theta^t and W^t with \theta^{t-1} and W^{t-1} (Step 1). Then, the dataset D^t is divided into n batches to feed into the model one by one (Step 2).

Fig. 2. Flowchart of the CLN-RLN model.

In CLN, input data are classified into several groups, and the cross-entropy loss is applied to start the back propagation (Steps 3 and 4). With CLN placed in front of RLN, data with similar characteristics are categorized into the same subset and used for the subsequent fitting regression. Obviously, the fitting result obtained in this way has a smaller loss than the one obtained directly without data sub-division, and the network is easier to train.

In RLN, partial network nodes and connections are selected to train RLN for each category. Then, the RMSE is calculated for backward propagation (Steps 3 and 4). The trained weight vector W^t on dataset D^t is obtained when the RMSE stays stable (Step 5).
If the parameter drift between W^t and W^{t-1} is greater than the threshold, we split the hidden units (parameterized by W_N^{t-1}) at time t-1 and add part of them to the neural network at time t to increase the generalization performance of the model at timestamp t (Step 6). The details of CLN and RLN are discussed in the following sections.

3.2.1 The CLN Model

In order to classify input data into m classes, we apply a deep neural network (DNN) that is composed of an input layer, one or several hidden layers and an output layer, each of which has many neurons represented by mathematical symbols. The input is fed into the network layer by layer to get the predicted output from R^n to R^m, where n and m are respectively the sizes of the input and output vectors. Each hidden layer h_i is calculated as

h_i(x) = g(w_i^T x + b_i),   (3)

where h_i: R^{d_{i-1}} -> R^{d_i}, g: R^n -> R^m, w_i \in R^{d_i x d_{i-1}}, b_i \in R^{d_i}, d_i is the size of the output vector, and g is the non-linear activation function. There are many non-linear activation functions, such as the sigmoid, tangent, softmax and Rectified Linear Unit (ReLU) functions. We choose the ReLU function due to its capability of reducing the vanishing and exploding gradient issues. Generally, the CLN function \varphi_\theta(x) can be expressed as

\varphi_\theta(x) = H_L(H_{L-1}( ... (H_1(x)))),   (4)

where L is the total number of hidden layers, and \theta = (W'_1, W'_2, ..., W'_l, ..., W'_L) is the set of network parameters of CLN.

In Eq. (4), forward propagation is used to get the output layer by layer, where the input feature vector is passed from one node to another along the fully connected edges in the forward direction, and finally to the output layer used to represent the category of the instance. Back propagation is the opposite process. After obtaining the forward propagation output, the loss is calculated. In the multi-class classification problem, the most commonly utilized loss function is the cross entropy, which describes the distance between the actual output and the expected output: the smaller the value of the cross entropy, the closer the actual output is to the expected output. The cross entropy cost function can be expressed as

L_{CE}(\theta) = - \sum_{k=1, k != c}^{C} P(c_c|x) log(P(c_k|x)),   (5)

where C is the total number of classes, and P(c_c|x) is the probability that x is predicted as class c_c, which is the ground truth. \varphi_\theta(x) = P(c_k|x) is the probability for x to be predicted as class k, where k \in [1, C]. If k != c, P(c_k|x) is the probability of classifying x into a wrong class.

To minimize the loss function L_{CE}(\theta), in the backward propagation we introduce stochastic gradient descent (SGD), one of the most popular gradient descent methods, which reduces the error by adjusting the network parameters \theta. Instead of using the entire set of training instances, several mini batches are randomly sampled to repeatedly update the network parameters \theta by

\theta_{new} = \theta_{old} - \alpha \nabla_\theta L_{CE}(\theta; x_k^n, c_k^n),   (6)

where x_k^n, c_k^n \in batch_n = {(x_1^n, c_1^n, y_1^n), ..., (x_k^n, c_k^n, y_k^n)}. batch_n (1 <= n <= N_batches) represents the n-th training mini batch. x_k^n, c_k^n and y_k^n respectively represent the input vector, the label, and the target value of the k-th sample in batch_n. \alpha = \alpha_0 * decay_rate^(global_step / decay_steps) is the adaptive learning rate, where \alpha_0 is the initial learning rate, decay_rate is the learning rate decay, decay_steps is the decay step, and global_step is the current learning epoch.
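To make the CLN update of Eqs. (5) and (6) concrete, the sketch below trains a small fully connected classifier with the cross-entropy loss and the exponentially decaying learning rate \alpha = \alpha_0 * decay_rate^(global_step/decay_steps). It is only an illustration under assumptions: PyTorch is our choice of framework, and the layer sizes (42 input KPIs, two hidden layers of 100 units, 6 classes) are borrowed from the experiment setting in Section 4.2.

```python
import torch
import torch.nn as nn

class CLN(nn.Module):
    """Fully connected classifier mapping KPI vectors to m QoE classes."""
    def __init__(self, n_in=42, n_hidden=100, m_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, m_classes))

    def forward(self, x):
        return self.net(x)

def train_cln(model, batches, alpha0=0.01, decay_rate=0.5, decay_steps=10, epochs=100):
    opt = torch.optim.SGD(model.parameters(), lr=alpha0)
    loss_fn = nn.CrossEntropyLoss()                      # Eq. (5)
    for epoch in range(epochs):
        # adaptive learning rate used by the update of Eq. (6)
        for group in opt.param_groups:
            group["lr"] = alpha0 * decay_rate ** (epoch / decay_steps)
        for x, c in batches:                             # mini batches
            opt.zero_grad()
            loss_fn(model(x), c).backward()
            opt.step()
```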

3.2.2 The RLN Model

In traditional DNNs, once the model is established, samples with different characteristics are mixed together for training. The structure and node connections are fixed and remain static during the training process. A large variance between the features of the samples could not only result in a large training loss for the model, but also make training difficult because of the large number of neural units and connections in the DNN. To solve these problems, we adopt a dynamic training method which is embodied in two aspects. One is Selective Training, where our framework dynamically selects a subnetwork for each category to minimize the total loss. The other is Hidden Node Splitting, which splits hidden nodes of the subnetwork at time t-1 and incorporates the selected nodes into the subnetwork at the current time t. Selective training and hidden node splitting together constitute the main functions of RLN.

1) Selective Training. Integrated with the data pre-grouping, our RLN is first trained with the selected partial network, where different sub-networks are chosen for different categories to minimize the overall loss function. As a first step, a sparse neural network without redundant connections is constructed at time stamp t = 1. The output is calculated layer by layer as

f_W(x) = H_L(H_{L-1}( ... (H_1(x)))).   (7)

Different from CLN, the RMSE is calculated to get the loss for backward propagation. The RMSE can be expressed as

L_{RMSE}(W) = sqrt( (1/N) \sum_{N} (f_W(x) - y)^2 ),   (8)

where y is the expected output and f_W(x) is the actual output. W = (W_1, W_2, ..., W_l, ..., W_L) is the set of network parameters of RLN, where l represents the l-th layer of the network, and N is the total number of samples.

\ell_1-regularization is employed to train the DNN to obtain the sparse connectivity below:

min_{W^{t=1}}  L_{RMSE}(W^{t=1}; D^t) + \mu_1 \sum_{l=1}^{L} ||W_l^{t=1}||_1,   (9)

where W_l^t denotes the l-th layer network parameters at time t, and W^t = {W_l^t}_{l=1}^{L}. \mu_1 is the \ell_1-regularization parameter.

As a second step, we obtain the selective network at time point t = 1 and update the network parameters W. Once we have the sparse network, we can determine the different units and weights that are affected by the different categories in the network, and a neural network is selected for each category. More specifically, as shown in Fig. 1, for the input node c_m, a breadth-first search on the network is performed to identify all units that have paths (blue arrows in Fig. 1) from it, and these selected units constitute one sub-network. The construction of the other subnetworks is similar to that of c_m, and we obtain the final selective networks S^{t=1} on dataset D^{t=1}. Although each subnetwork is used for the regression of its corresponding dataset, the objective of parameter updating is to minimize the total training error of all subnetworks. In order to minimize L_{RMSE}(W), SGD is utilized to repeatedly update the network parameters W as

W_{new} = W_{old} - \alpha \nabla_W L_{RMSE}(W; x_k^n, y_k^n),   (10)

where x_k^n, y_k^n \in batch_n = {(x_1^n, y_1^n), (x_2^n, y_2^n), ..., (x_k^n, y_k^n)}. batch_n (1 <= n <= N_batches) is the n-th mini batch of data for training.

Finally, to form continual learning, we solve the catastrophic forgetting problem for the other time stamps t != 1 and update the network parameters again with SGD, using the same objective function as Eq. (2). With the continuous arrival of datasets D^t (t != 1), we use \ell_2-regularization to prevent large parameter drifting from time t-1 to t. We leave the unselected weights unchanged and train the weights of the selective network W_{S_t}^t as

min_{W_{S_t}^t}  L_{RMSE}(W_{S_t}^t; D^t) + \mu_2 ||W_{S_t}^t - W_{S_{t-1}}^{t-1}||_2^2,   (11)

where W_{S_t}^t denotes the weights of the selected subnetwork on dataset D^t, and \mu_2 is the \ell_2-regularization parameter.
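The subnetwork selection described above can be viewed as a reachability search over the connections that survive the \ell_1-sparsified training of Eq. (9). The snippet below is only a sketch of that idea (NumPy, a list of per-layer weight matrices, and the pruning tolerance are all assumptions), not the authors' implementation.

```python
import numpy as np

def select_subnetwork(weights, class_idx, tol=1e-6):
    """Breadth-first search from one RLN input node c_m through connections
    whose magnitude is above `tol` after L1-sparsified training (Eq. (9)).
    `weights` is a list of layer matrices of shape (n_out, n_in); the result
    lists, per layer, the units belonging to this category's sub-network."""
    reachable = [class_idx]
    selected = []
    for w in weights:
        if not reachable:
            break
        nxt = [j for j in range(w.shape[0])
               if np.max(np.abs(w[j, reachable]), initial=0.0) > tol]
        selected.append(nxt)
        reachable = nxt
    return selected
```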
2) Hidden Node Splitting. In Eq. (11), we use the \ell_2-regularization to prevent W_{S_t}^t from deviating too much from W_{S_{t-1}}^{t-1}. If the upcoming dataset D^t is highly relevant to D^{t-1}, the weight difference between them will be less than the threshold. However, if the parameters drift too much, it means the new features we learn from the current dataset D^t fail to prevent the catastrophic forgetting, and we should find a good way to capture the features of both datasets D^t and D^{t-1} at the same time. To address this issue, we split hidden nodes of subnetwork S_{t-1} and add selected ones to S_t, so that we can get features that are suitable for both datasets. For any hidden unit i at the l-th layer, u_l^i, if the \ell_2-distance d_{l2} = ||w_{u_l^i}^t - w_{u_l^i}^{t-1}||_2 between the weights of u_l^i at time t and t-1 is larger than the threshold \tau, we duplicate node i and add it from subnetwork S_{t-1} to S_t. After all hidden nodes have been checked, the new subnetwork S'_t is formed and it will be used for further testing.

Algorithm 1. CLN-RLN Model
Input: Dataset D = (D^1, D^2, ..., D^t, ..., D^T). Learning rate \alpha. \ell_1-regularization parameter \mu_1. \ell_2-regularization parameter \mu_2. The maximum number of epochs epoch_max.
Output: The final network parameters \theta^T and W^T.
1:  Initialize: Randomly initialize \theta and W.
2:  Training:
3:  for t = 1 -> T do
4:    Label each instance of dataset D^t and divide it into N mini batches.
5:    Each mini batch can be expressed as: batch_n (1 <= n <= N) = {(x_1^n, c_1^n, y_1^n), ..., (x_k^n, c_k^n, y_k^n)}.
6:    for epoch = 1 -> epoch_max do
7:      for batch = 1 -> batch_max do
8:        Get the CLN output layer by layer using Eq. (4).
9:        Measure the CLN loss according to Eq. (5).
10:       Update the CLN parameters \theta^t with Eq. (6).
11:       Calculate the RLN output layer by layer using Eq. (7).
12:       Get the new batch (batch'_n) to minimize the objective function of RLN, where batch'_n = {(x_1^n, y_1^n), ..., (x_k^n, y_k^n) | x_k^n, y_k^n \in batch_n}.
13:       if t = 1 then
14:         The objective function of RLN is calculated according to Eq. (9) and we obtain a sparse and selective network (S_t).
15:       else
16:         The objective function of RLN is calculated according to Eq. (11).
17:       end if
18:       Update the RLN parameters W^t with Eq. (10).
19:     end for
20:   end for
21:   if t != 1 then
22:     for l = 1 -> L do
23:       for i = 1 -> M do
24:         if d_{l2} = ||w_{u_l^i}^t - w_{u_l^i}^{t-1}||_2 > \tau then
25:           Duplicate the hidden unit u_l^i, add it from subnetwork S_{t-1} to S_t, and form a new sub-network S'_t.
26:         end if
27:       end for
28:     end for
29:     Get the new subnetwork S'_t and reselect the network by the method of breadth-first search.
30:   end if
31:   Deep copy the model parameters \theta^t and W^t. Initialize the network parameters \theta^{t+1} and W^{t+1} with \theta^t and W^t.
32: end for
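The splitting test in lines 21-29 of Algorithm 1 boils down to a per-unit comparison of incoming weight vectors at t and t-1. A minimal sketch of that check is given below (PyTorch-style; treating each row of a layer matrix as one hidden unit's incoming weights is our assumption, and the actual duplication of the unit into S'_t is omitted).

```python
import torch

def units_to_split(w_t, w_prev, tau=1e-4):
    """For every hidden unit, measure the L2 drift of its incoming weights
    between time t and t-1 (Alg. 1, lines 22-28) and report the units whose
    drift exceeds tau, so their t-1 copies can be added to subnetwork S'_t."""
    result = []
    for wt, wp in zip(w_t, w_prev):                  # one matrix per layer
        drift = torch.linalg.norm(wt - wp, dim=1)    # one distance per unit
        result.append(torch.nonzero(drift > tau).flatten().tolist())
    return result
```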

3.3 Integrated Training of the CLN-RLN Model

Algorithm 1 describes the details of how the CLN-RLN model is trained. First, data pre-processing is performed. Samples with missing values are deleted. According to the distribution characteristics of y_k, we use K-means to label each instance as c_k and partition the dataset D^t into N mini batches (Lines 4-5). Second, we calculate the CLN output and loss value, and update the CLN parameters \theta^t (Lines 8-10). The results of CLN are fed into RLN as its input. In the same way as for CLN, the output, loss value and parameter updating of RLN are performed (Lines 11-18), but the loss function that needs to be minimized differs between t = 1 and t != 1. The reason for using different loss functions is that we have different expectations for the structure of the network at different time points. When the first dataset (t = 1) is sampled, our goal is to get a network with sparse connections, so the \ell_1-regularization is applied. After the subsequent data arrive, we can train on a sparse and selective network (Line 14), which is obtained through breadth-first search, while at the time points t != 1 (Line 16), to alleviate the catastrophic forgetting problem, we introduce the \ell_2-regularization to prevent too high a parameter drift. Third, all the hidden nodes of RLN are checked for whether they need to be split (Lines 22-28). If needed, i.e., d_{l2} = ||w_{u_l^i}^t - w_{u_l^i}^{t-1}||_2 > \tau, the hidden unit u_l^i is duplicated and added from the subnetwork S_{t-1} to S_t. We get the new subnetwork S'_t once all hidden nodes are checked. Then, we need to re-select the subnetwork for future testing (Line 29). Finally, the network parameters \theta^{t+1} and W^{t+1} are initialized with \theta^t and W^t for the next time stamp (Line 31). Different from other cascaded DNNs whose stages are trained individually, in this paper the two stages of our cascaded DNNs are trained in an integrated manner (Lines 8-18).

3.4 Computational Cost

We measure the computational cost of the cascaded DNN models considered using floating-point operations (FLOPs), in the number of multiply-adds. For a fully connected (FC) feed-forward layer, the number of FLOPs is F_i = (2 I_i - 1) O_i, where I_i and O_i are respectively the input and output dimensions of layer i. The FLOPs of CLN can be calculated by

FLOPs_CLN = N_epo [ (2 I_in - 1) N_1 + \sum_{i=2}^{n} (2 N_{i-1} - 1) N_i + (2 N_n - 1) c_m ],   (12)

where c_m is the number of classes, I_in is the dimension of the input data and N_i is the number of nodes in hidden layer i. Similarly, the FLOPs of RLN are estimated by

FLOPs_RLN = N_epo [ (2 c_m - 1) N_1 + \sum_{i=2}^{n} (2 N_{i-1} - 1) N_i + (2 N_n - 1) c_m ].   (13)

In the process of hidden-node splitting, suppose H_i nodes are added to hidden layer i; the FLOP increment of RLN is expressed as

\Delta FLOPs = N_epo [ (2 c_m - 1) H_1 + \sum_{i=2}^{n} (2 H_{i-1} - 1)(N_i + H_i) + (2 H_n - 1) c_m ].   (14)

The FLOPs of a CNN can be calculated by

FLOPs_CNN = N_epo \sum_{l=1}^{D} (2 C_l^in K_l^2 - 1) S_l^out C_l^out,   (15)

where C_l^in is the input channel of the l-th layer, K_l is the kernel size of the l-th layer, S_l^out is the size of the output feature map of the l-th layer, D is the number of convolution layers of the CNN, and C_l^out is the output channel of the l-th layer.
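The per-layer count F_i = (2 I_i - 1) O_i behind Eqs. (12)-(14) is straightforward to evaluate; the helper below is a small illustration (the layer sizes in the example are taken from the experiment setting in Section 4.2, and the epoch multiplier N_epo is passed in as a parameter).

```python
def fc_flops(layer_sizes, n_epochs=100):
    """Multiply-add count of a fully connected network, as in Eqs. (12)-(13):
    the sum over consecutive layers of (2*I_i - 1)*O_i, scaled by the epochs."""
    per_pass = sum((2 * n_in - 1) * n_out
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
    return n_epochs * per_pass

# Example: a CLN with 42 input KPIs, two hidden layers of 100 nodes, 6 classes.
print(fc_flops([42, 100, 100, 6]))   # 100 * ((2*42-1)*100 + (2*100-1)*100 + (2*100-1)*6)
```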

4 EXPERIMENT

To evaluate the performance of users, we use the Vienna LTE-A Downlink System Level Simulator [34], an open-resource platform written in MATLAB. The platform is composed of a link performance model and a link quality model. With the channel modeling based on the network deployment, the link quality model measures the post-equalization SINR for resource assignment and link adaptation, which is then fed into the link performance model to determine the block error ratio (BLER) and throughput. The key simulation parameters are summarized in Table 1.

TABLE 1
Simulation Parameters
Cellular layout: Hexagonal grid, 7 eNodeBs, 21 cells
Carrier frequency: 2.14 GHz
Inter-site distance: 500 m
Frequency bandwidth: 20 MHz
Transmit power: 5 W
BS antenna height: 32 m
Antenna gain pattern: Directional
Electrical tilt angle: 3°
Mechanical tilt angle: 5°
Azimuth: (0°, 120°, 240°)
Transmission mode: CLSM 2x2
Path loss model: TR 36.942 [35]
Shadow fading: TR 36.942 [35]
Small scale fading: TR 36.942 [35]
Channel model: ITU-R Ped-A
UE position, UE velocity: Random distribution, 5 km/h
Number of UEs per cell: 30
UE height: 1.5 m
Traffic model: Video
Scheduler: Proportional Fairness (PF)
Feedback: AMC: CQI; MIMO: PMI and RI
Feedback delay: 3 TTI
Number of TTIs simulated: 1000
Epochs: 100

4.1 Dataset

The proposed model is evaluated on the datasets provided by the simulator. We train the model with data from different eNodeBs separately, which makes it easy to realize distributed training and execution. To simplify the description, we only discuss the data set of one model. The dataset D^t (t = 1, 2, 3, 4, 5) of one eNodeB contains 9000 records of 90 users over 100 epochs. In this paper, MOS is measured over a period of one second, and KPIs are collected every millisecond. In order to obtain a single feature for each interval and eliminate the fluctuation of the channel, some statistical calculations are carried out to find the mean, the maximum and the minimum. Therefore, each sample includes a user's 42 KPIs, 1 estimated QoE value and 1 QoE class label. QoE is captured through mean opinion scores (MOS) and has a range of [0,5]. According to the value of QoE, the dataset is classified by K-means into 6 classes. We use integers in the range of [0,5] to represent the label of each class. The dataset is a set of samples following the format described in Fig. 3. The detailed description and calculation method of each field of an instance are as follows.

Fig. 3. The format of the samples.

(1) RSRP_i: The reference signal received power (RSRP) measures the linear average of the received power over the measurement bandwidth. The RSRP_i of user i received from the serving cell can be estimated by RSRP_i = P_RS / (Loss_i N_PRB), where P_RS is the power of the reference signals, Loss_i is the propagation loss between user i and the serving cell, and N_PRB is the number of physical resource blocks (PRBs).

(2) RSSI_i: The received signal strength indicator (RSSI) is the total receiving power over the channel bandwidth. The RSSI_i of user i includes the power received from the serving cell, non-serving cells, and thermal noise, i.e., RSSI_i = P_serving + P_non-serving + P_N.

(3) RSRQ_i: The reference signal received quality (RSRQ) combines reference signal strength and interference. The RSRQ of user i is defined as RSRQ_i = N_PRB * RSRP_i / RSSI_i.

(4) SINR_i: The signal to interference plus noise ratio (SINR) refers to the ratio of the received useful signal strength P_D to the received interference signal strength (noise P_N and interference I), i.e., SINR_i = P_D / (P_N + I).

(5) SNR_i: The signal to noise ratio (SNR) refers to the ratio of the received useful signal strength P_D to the received noise P_N, i.e., SNR_i = P_D / P_N.

(6) BLER_i: The block error rate (BLER) is the percentage of error blocks among all sent blocks.

(7) T_i: The user throughput T_i is determined by the user's SINR_i and BLER_i through the formula T_i = (1 - BLER(SINR_i)) D_i / TTI, where D_i is the data block in bits and TTI is the minimum transmission interval.

(8) HOSR_i: The handover success rate (HOSR) is an indicator that reflects the quality of user-network connections. HOSR can be calculated as the ratio of the number of successful handovers N_SH to the total number of handovers N_TH.

(9) RB_i: The allocated RB (i.e., resource block) of user i at each TTI.

(10) RI_i: The rank indicator (RI) is control information that the UE reports to the eNodeB based on uplink scheduling, which indicates the number of code words the user can currently support.

(11) PD_i: The packet delay PD_i of user i can be estimated by PD_i = \sum_{j=1}^{p_i} (T_arrival^{ij} - T_departure^{ij}) / p_i, where p_i is the total number of packets of user i generated during the transmissions, and T_departure^{ij} and T_arrival^{ij} are respectively the departure and arrival time of the j-th packet.

(12) PLR_i: The packet loss ratio PLR_i of user i can be described as the ratio of unsuccessfully delivered packets to the total number of packets.

(13) PJ_i: The delay jitter PJ_i of user i is defined as PJ_i = \sum_{k=1}^{p_i} |PD_i^k - PD_i^{k+1}| / p_i, where p_i is the total number of packets of user i generated during the transmissions.

(14) CQI_i: The channel quality indicator (CQI) is used to reflect the channel quality of the downlink. The UE evaluates the downlink characteristics according to the RS-SINR (reference signal SINR), uses an internal algorithm to determine the BLER value, and reports the corresponding CQI value according to the BLER < 10% limit. The value range of CQI is 1-15, and different CQI values correspond to different modulations, code rates and efficiencies.

(15) QoE_i^video: The video streaming service, the most commonly used service with large traffic, is considered in this work to measure the user's QoE_i^video. The video streams are transmitted from the wireless buffer of the eNodeB to the playout buffer of the users. The video traffic model parameters are shown in Table 2. The video stream data are composed of several frames arriving at a regular interval T, with the rate determined by the number of frames per second (N_frames). Each video frame consists of N_packets packets. The packet size P_size and packet arrival interval T_interval follow a truncated Pareto distribution parameterized by a and K [36].

TABLE 2
The Video Traffic Model Parameters
T: Deterministic; 40 ms (based on 25 frames per second, 512 kbps)
N_packets: Deterministic; 8
P_size: Truncated Pareto; a = 1.2, K = 400 (Minimum = 130 Bytes, Mean = 266 Bytes, Maximum = 400 Bytes)
T_interval: Truncated Pareto; a = 1.2, K = 400 (Minimum = 1 ms, Mean = 2 ms, Maximum = 4 ms)

QoE is usually estimated by a MOS value ranging from 1 to 5. For the video streaming service, MOS is approximately estimated as [37]

QoE_i^video = 4.23 - 0.0672 T_initial - 0.742 F_reb - 0.106 T_reb,   (16)

where T_initial is the initial buffering time (in seconds), F_reb is the re-buffer frequency (in second^-1), and T_reb is the re-buffer duration (in seconds). The buffer threshold (Th_buf) is set to obtain T_initial, F_reb and T_reb. At the beginning of traffic buffering, T_initial is increased by 1 ms at each TTI until the current buffer size is greater than Th_buf. The buffer size decreases with the size of the resource block allocated to the user at each TTI. When the current buffer is empty, T_reb is increased by 1 ms per TTI until new packages are generated and the current buffer size exceeds Th_buf. We call the process of the current buffer size going from zero to larger than the buffer threshold one pause. F_reb is counted as the reciprocal of pauses.
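Eq. (16) and the buffer threshold above can be checked with a few lines; the sketch below is an illustration only (the numerical inputs in the example call are made up, and the simulation itself runs in the MATLAB-based Vienna simulator, not in Python).

```python
def video_mos(t_initial, f_reb, t_reb):
    """Eq. (16) [37]: approximate MOS for the video streaming service.
    t_initial: initial buffering time (s), f_reb: re-buffer frequency (1/s),
    t_reb: re-buffer duration (s)."""
    return 4.23 - 0.0672 * t_initial - 0.742 * f_reb - 0.106 * t_reb

# Buffer threshold used in Section 5.1: bitrate / frame rate = 20480 bits.
th_buf_bits = 512_000 / 25
print(th_buf_bits, video_mos(t_initial=0.5, f_reb=0.05, t_reb=1.0))  # 20480.0, ~4.05
```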

(16) QoE_i^video class labelling: According to the value of QoE_i^video, the dataset is classified by K-means into m classes. The data in each category constitute a new data set for subsequent supervised classification and regression. The value m is selected by considering the trade-off between training/testing error and computing complexity.

From the collected dataset, we observe that the distribution of the QoE data is uneven. There are much fewer data with QoE values in the ranges of 0-1 and 4-5, and most data are in the middle ranges. To alleviate the impact of unbalanced data on model training, we create a more balanced dataset with ADASYN [38], one of the over-sampling algorithms for handling data balancing. ADASYN uses a systematic method to adaptively create different amounts of synthetic data according to their distributions. The main idea of the ADASYN algorithm is to use the density distribution as a criterion to automatically determine the number of synthesized samples that need to be generated for each minority example. At time t, the dataset D^t is divided into three sets, of which 60% is used for training, 20% is used for validation and the remaining part is used for testing.
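The labelling, balancing and splitting steps above map directly onto standard scikit-learn/imbalanced-learn utilities; the snippet below is a rough sketch under that assumption (the library choice, the variable names and the exact split code are ours, not the paper's).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import ADASYN

def prepare_dataset(kpis, qoe, m=6, seed=0):
    """Label each sample by K-means on its MOS value, balance the classes
    with ADASYN [38], and split 60/20/20 into train/validation/test sets."""
    labels = KMeans(n_clusters=m, random_state=seed).fit_predict(
        np.asarray(qoe).reshape(-1, 1))
    x_bal, y_bal = ADASYN(random_state=seed).fit_resample(kpis, labels)
    x_tr, x_rest, y_tr, y_rest = train_test_split(
        x_bal, y_bal, train_size=0.6, stratify=y_bal, random_state=seed)
    x_val, x_te, y_val, y_te = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_tr, y_tr), (x_val, y_val), (x_te, y_te)
```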
4.2 Experiment Setting

To compare the performance of our proposed CLN-RLN with the state-of-the-art solutions, we take regression algorithms including the traditional DNN, SVR, DTR, GBDT, RFR, and boosting SVR. We also test the CLN-RLN model without Splitting to verify the effectiveness of splitting nodes in alleviating the problem of catastrophic forgetting. All the models are separately trained with data from different eNodeBs. The DNN is a three-layer fully connected neural network which consists of 100, 100 and 6 neurons accordingly. The rectified linear unit (ReLU) is utilized as the activation function and the Adam optimizer is used to update the network weights with an initial learning rate of \alpha_0 = 0.01. We set decay_rate = 0.5, decay_steps = 10, epoch_max = 100, and N_batches = 100. Our proposed CLN-RLN model, the traditional DNN and the CLN-RLN model without Splitting are tested with the same DNN configuration and parameters. The parameters of the other regression algorithms are optimized with the Grid Search method.

5 RESULTS AND COMPARISON

In this section, we evaluate the performance of CLN-RLN with and without the node-splitting part, and compare its performance with that of existing approaches.

5.1 Sensitivity Analysis Results

Before evaluating the proposed model, we need to determine the optimal parameters first.

For the video streaming service, MOS is approximately estimated by Eq. (16). The buffer threshold Th_buf is set to obtain T_initial, F_reb and T_reb. It can be seen that F_reb has the greatest impact on QoE_i^video. The smaller the value of Th_buf, the larger the value of F_reb, and the smaller the value of QoE_i^video. On the contrary, a large Th_buf reduces the frequency of pause events and gives a larger QoE_i^video. In this paper, we set the bitrate to 512 kbps and the inter-arrival time between the beginnings of the frames to 40 ms, so 25 frames are generated per second. We set the average number of bits generated in one frame as the buffer threshold, Th_buf = 40 ms x 512 kbps = 20480 bits. In order to justify this Th_buf, we test the effect of the Th_buf value on the QoE distribution and the model accuracy in Table 3.

TABLE 3
Effect of Th_buf on QoE Distribution and Model Accuracy
Th_buf: 0.5 frame (10240 bit) | 1 frame (20480 bit) | 1.5 frames (30720 bit) | 2 frames (40960 bit)
QoE distribution (-inf, 0): 64.4% | 0% | 0% | 0%
[0, 1): 1.1% | 6.6% | 0% | 0%
[1, 2): 4.4% | 36.6% | 46.6% | 0%
[2, 3): 8.8% | 22.8% | 44.4% | 78.8%
[3, 4): 15.5% | 18.5% | 3.5% | 14.4%
[4, 5]: 5.8% | 15.5% | 5.5% | 6.8%
Classification Accuracy: 0.7164 | 0.9811 | 0.9276 | 0.8994
RMSE: 0.351428 | 0.010129 | 0.084554 | 0.239717

From Table 3, we find that with the increase of Th_buf, the QoE distribution gradually moves to the intervals with large values, which even makes the intervals with small values lack data. Conversely, a small Th_buf value moves the QoE distribution to the small value range, and even makes most of the data fall out of the range [0,5]. In addition, the classification accuracy and RMSE of the model also degrade with an inappropriate buffer size threshold. This is because too large or too small a value changes the QoE distribution greatly, which affects the accuracy of classification and regression. Therefore, we set Th_buf to 20480 bits. Th_buf is determined by the bit rate and frame rate of the video, which can be expressed as Th_buf = bitrate / framerate.

The number of classes into which the data are subdivided has an impact on the accuracy of the model. As shown in Fig. 4, the loss decreases greatly when the number of classes m is in [2, 6]. When m >= 6, the loss only reduces slightly and stays stable, but the computational complexity increases linearly with the rise of m. Therefore, we take m = 6 to ensure that the model has a relatively low loss value and low computational complexity.

Fig. 4. The variation of losses with the number of classes.

Next, we evaluate the impact of the dataset (D^t) size on the average model precision in Table 4.

As the volume of data increases, we can see that the classification accuracy and testing loss improve greatly when the dataset size is less than or equal to 9000. The trend of improvement gradually levels off, and more classes are needed to guarantee good quality when the dataset size is greater than 10800 in order to reduce the impact of data variance, which leads to high computational complexity. Therefore, we set the data set size to 9000. At the same time, we also found that the higher the classification accuracy of CLN, the smaller the loss value of RLN.

TABLE 4
The Impact of D^t Size on Average Model Precision
Original Dataset Size | Number of classes | Classification Accuracy of CLN | Testing loss of RLN
1800  | 6  | 0.85697 | 0.267421
3600  | 6  | 0.91064 | 0.113745
5400  | 6  | 0.92911 | 0.085374
7200  | 6  | 0.96612 | 0.028164
9000  | 6  | 0.98447 | 0.009757
10800 | 7  | 0.98596 | 0.009704
12600 | 7  | 0.99111 | 0.009433
14400 | 8  | 0.99437 | 0.009267
18000 | 10 | 0.99533 | 0.009022

The best \ell_1-regularization parameter \mu_1 and \ell_2-regularization parameter \mu_2 of the model are determined with a search over grid points, as shown in Fig. 5. As the two parameters are respectively applied in the process of obtaining the sparse network and of restricting the weight shift against catastrophic forgetting, in order to get the optimal pair, for the model M_t trained on D^t we should not only minimize the test error E_{D_t/M_t} on the current data set D^t but also consider the test error E_{D_{t-1}/M_t} on the previous data set D^{t-1}. In Fig. 5, we show the variation of the average test error (E_mean) of E_{D_t/M_t} and E_{D_{t-1}/M_t} with \mu_1 and \mu_2. It can be seen that the minimum E_mean is achieved when \mu_1 = 1.00E-06 and \mu_2 = 1.00E-04.

Fig. 5. The variation of E_mean with \mu_1 and \mu_2.

In the process of hidden node splitting, we employ the threshold \tau to determine whether a hidden node needs to be split. The sensitivity analysis of the RMSE with respect to the splitting threshold \tau is provided in Table 5. We denote E_{D_t/M_t} as the test loss of the model M_t on D^t and E_{D_t/M_{t-1}} as the test loss on D^t of the model trained on D^{t-1}. With the gradual decrease of \tau, the average E_{D_t/M_t} and E_{D_t/M_{t-1}} decrease first and then tend to be stable. When \tau >= 1.00E-03, the splitting is not triggered, and the average E_{D_t/M_t} and E_{D_t/M_{t-1}} are relatively large. When \tau <= 1.00E-04, the splitting is invoked. As \tau becomes smaller, the number of hidden nodes copied increases, which increases the training and testing time. We set \tau to 1.00E-04 in our studies to get the lowest training and testing time on the premise of a low average E_{D_t/M_t} and E_{D_t/M_{t-1}}.

TABLE 5
The Sensitivity Analysis of RMSE With the Splitting Threshold \tau
\tau: 1E-2 | 1E-3 | 1E-4 | 1E-5 | 1E-6
Average E_{D_t/M_t}: 0.021312 | 0.018624 | 0.009757 | 0.014763 | 0.017813
Average E_{D_t/M_{t-1}}: 0.023274 | 0.021239 | 0.009507 | 0.012557 | 0.016571
Hidden layer k of RLN, total split nodes: layer 1: 0, layer 2: 0 | layer 1: 0, layer 2: 0 | layer 1: 0, layer 2: 27 | layer 1: 29, layer 2: 375 | layer 1: 69, layer 2: 375
Hidden layer k of RLN, selected nodes/full nodes: layer 1: 95/100, layer 2: 44/100 | layer 1: 98/100, layer 2: 47/100 | layer 1: 98/100, layer 2: 74/127 | layer 1: 113/129, layer 2: 418/475 | layer 1: 153/169, layer 2: 418/475
Training/test time (s): 178.3665/0.44274 | 181.3432/0.449987 | 196.3375/0.458774 | 384.8829/1.56874 | 408.7432/1.70178

5.2 The Results of the Proposed CLN-RLN Model

In this subsection, the performance of our proposed CLN-RLN model is evaluated on the time-dependent datasets D^1-D^5. Each mini-batch selected from dataset D^t is first passed into CLN for classification and for calculating the cross-entropy loss function. Then the multiple data subsets output by CLN are input into the RLN model, and diverse DNN subnets are dynamically constructed for each data set to minimize the overall regression error (RMSE).

With the optimal regularization parameters, a selective network with sparse connectivity is first generated and trained on D^1 by solving Eq. (9). The convergence process of the proposed model on one of the eNodeBs' training
The convergence process of the proposed model on one of the eNodeBs' training and validation dataset D1 is shown in Fig. 6. Fig. 6a gives the classification accuracy and cross-entropy cost curves of CLN over the training epochs; the classification accuracy settles at around 98.4% ± 0.15%. Fig. 6b is the convergence curve of RLN: the loss decreases sharply as the number of epochs increases and gradually converges to around 0.009674 ± 0.0002 once the epoch count exceeds 50.

Fig. 6. The convergence process of CLN (a) and RLN (b) on the training and validation dataset D1.

When a new dataset Dt (t ≠ 1) arrives, the CLN-RLN without Splitting is first trained by Eq. (11) to minimize the loss function and restrict the weight shift. The convergence processes of the CLN-RLN without splitting are shown in Fig. 7. The RMSE of the CLN-RLN without Splitting reaches a small value in the range 0.014968-0.023593. The splitting nodes are then introduced to compensate for catastrophic forgetting and to reduce the regression error caused by large weight drift. Table 6 shows the testing error of CLN-RLN with and without the Splitting part. The introduction of node splitting further reduces the testing error from 0.014968-0.023593 to 0.009314-0.009661.

Fig. 7. The convergence process of the CLN-RLN without splitting on dataset D2 (a), D3 (b), D4 (c), D5 (d).

5.3 Performance Comparison
We first tested the effect of data balancing on the regression error of the model. As Table 6 shows, the data processed by the balancing method ADASYN yields a smaller RMSE during model establishment, which demonstrates that ADASYN plays an effective role in improving the performance of the model. Splitting nodes are applied in the training process to solve the catastrophic forgetting problem when dealing with a new dataset; if we remove them from the DNN, then, as the learning continues, the model trained on Dt may forget features learned from Dt-1, which results in a larger testing error on Dt-1.
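The ADASYN balancing step [38] can be reproduced with the imbalanced-learn package; the snippet below is a generic usage sketch in which synthetic data stands in for the QoS feature matrix and the class labels.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import ADASYN

# Synthetic stand-in for the 42-dimensional QoS features and their class labels.
X, y = make_classification(n_samples=3000, n_features=42, n_informative=10,
                           n_classes=6,
                           weights=[0.4, 0.25, 0.15, 0.1, 0.06, 0.04],
                           random_state=0)
print("before:", Counter(y))

# Oversample the minority classes adaptively, then train on the balanced set.
X_bal, y_bal = ADASYN(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_bal))
```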
In Table 6, at time stamps t = 2-5, the E_{Dt-1/Mt} of CLN-RLN without splitting is about 1.58-2.53 times that with splitting. Similarly, the error reduction in the case of E_{Dt/Mt} is 1.54-2.53 times. The results demonstrate that using node splitting to augment the hidden structure is effective for learning new features in the following time slot and alleviates the problem of catastrophic forgetting. We also find that, for datasets D1-D4, almost all E_{Dt-1/Mt} values of the proposed CLN-RLN are smaller than the corresponding E_{Dt/Mt}. For the CLN-RLN without splitting, however, the situation is reversed, which further testifies that new features are better learned when we replicate the neural nodes of time t-1 and add them to the DNN at time t. Compared with our proposed CLN-RLN model, the RMSE of the CLN-RLN without Splitting is only slightly higher, while the RMSE of the traditional DNN is more than 86 times that of CLN-RLN. Therefore, the low RMSE of our CLN-RLN model is mainly contributed by our Selective Training method, in which DNN connections are dynamically constructed around each RLN input node (cm in Fig. 2) and updated as the training proceeds. Furthermore, the hidden-node Split facilitates the learning of new features to further reduce the learning loss.
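These factors follow directly from the entries of Table 6; a short sanity check:

```python
# Ratios of "without splitting" to "with splitting" errors, values from Table 6 (t = 2..5).
with_split    = {"E_Dt/Mt":   [0.009661, 0.009553, 0.009425, 0.009314],
                 "E_Dt-1/Mt": [0.009652, 0.009567, 0.009434, 0.009377]}
without_split = {"E_Dt/Mt":   [0.014968, 0.017443, 0.020741, 0.023593],
                 "E_Dt-1/Mt": [0.015274, 0.018127, 0.021097, 0.023763]}

for key in ("E_Dt/Mt", "E_Dt-1/Mt"):
    ratios = [w / p for w, p in zip(without_split[key], with_split[key])]
    print(key, [round(r, 2) for r in ratios])
# E_Dt/Mt   [1.55, 1.83, 2.2, 2.53]
# E_Dt-1/Mt [1.58, 1.89, 2.24, 2.53]
# consistent with the reported 1.54-2.53 and 1.58-2.53 ranges up to rounding
```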
Then, several state-of-the-art solutions are used for performance comparison. Table 7 shows that the proposed algorithm reduces the loss error by 21-86 times compared with other commonly used regression algorithms. Specifically, the RMSE of our proposed model is nearly 86 times lower than that of directly using a DNN for regression.
TABLE 6
RMSE of CLN-RLN With and Without the Part of Splitting on Current Dataset Dt and Previous Dataset Dt-1

Time stamp | E_{Dt/Mt}, CLN-RLN without ADASYN | E_{Dt/Mt}, proposed CLN-RLN | E_{Dt-1/Mt}, proposed CLN-RLN | E_{Dt/Mt}, CLN-RLN without Splitting | E_{Dt-1/Mt}, CLN-RLN without Splitting | E_{Dt/Mt}, traditional DNN [16]
t = 1 | 0.01854  | 0.010833 | 0.010833 | -        | -        | 0.8371
t = 2 | 0.01655  | 0.009661 | 0.009652 | 0.014968 | 0.015274 | 0.8438
t = 3 | 0.015486 | 0.009553 | 0.009567 | 0.017443 | 0.018127 | 0.8482
t = 4 | 0.016345 | 0.009425 | 0.009434 | 0.020741 | 0.021097 | 0.8367
t = 5 | 0.017941 | 0.009314 | 0.009377 | 0.023593 | 0.023763 | 0.8345
TABLE 7
Performance Comparison

Model type | Computational cost | Description | Values | Training time (s) | Testing time (s) | Average error
RFR [11] | O(M · n · log2(n) · d) | M trees, n instances, d attributes | n = 9000, d = 42, M = 100 | 83.9745 | 0.2867 | 0.207624
DTR [12] | O(n · log2(n) · d) | n instances, d attributes | n = 9000, d = 42 | 0.7962 | 0.0259 | 0.319076
SVR [13] | O(d · n^2) to O(d · n^3) | n instances, d attributes | n = 9000, d = 42 | 35.8874 | 4.0566 | 0.288063
GBDT [14] | O(sum_{i=1..M} n · log2(n) · d · h_i) | M trees, n instances, d attributes, depth h_i of the ith tree | n = 9000, d = 42, M = 100, maximum depth h_max = 5 | 79.4175 | 0.2833 | 0.227413
Boosting SVR [15] | O(T · d · n^2) to O(T · d · n^3) | n instances, d attributes, T support vector regressors | n = 9000, d = 42, T = 5 | 182.2497 | 18.5893 | 0.263376
DNN [16] | See Eq. (12) | cm classes, Ni nodes in layer i | N1 = 100, N2 = 100, cm = 1 | 105.8776 | 0.29971 | 0.840046
CLN-RLN without Split | See Eqs. (12) and (13) | cm classes, Ni nodes in layer i | N1 = 100, N2 = 100, cm = 6 | 179.8173 | 0.4401 | 0.017515
Proposed CLN-RLN | See Eqs. (12), (13) and (14) | cm classes, Ni nodes in layer i | N1 = 100, N2 = 100, cm = 6 | 196.3375 | 0.4587 | 0.009757
Compared with the other commonly used regression algorithms, the ensemble algorithms are more effective than the single base algorithms on this task; among the three kinds of ensemble learning methods, RFR has the best fitting effect on this subject. We also provide the training and testing time analysis of the different methods in Table 7. The training time of CLN-RLN is about 1.85 times that of the traditional DNN, while the testing error of CLN-RLN is more than 86 times lower than that of the DNN, which indicates that the cascaded DNN can greatly reduce the test error without increasing the training time too much. The introduction of splitting nodes can further reduce the test error of the model, while the training time increases by about 16 s in comparison with the model without split nodes. We also find that the testing time of the cascaded DNN increases slightly compared with that of the DNN. Except for the models with a DNN structure, the training time of the other models is relatively short, but they have very large testing errors.
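The non-DNN baselines of Table 7 are standard regressors; the sketch below shows how such baselines can be fitted and scored with scikit-learn. Random placeholder data stands in for the QoS features, and only the hyperparameters stated in Table 7 (M = 100 trees, maximum depth 5 for GBDT) are set explicitly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(9000, 42))            # stand-in for the 42 QoS features
y = rng.random(9000)                       # stand-in for the QoE targets

baselines = {
    "RFR":  RandomForestRegressor(n_estimators=100, random_state=0),
    "DTR":  DecisionTreeRegressor(random_state=0),
    "SVR":  SVR(),
    "GBDT": GradientBoostingRegressor(n_estimators=100, max_depth=5, random_state=0),
}
for name, model in baselines.items():
    model.fit(X[:8000], y[:8000])                              # train split
    rmse = np.sqrt(mean_squared_error(y[8000:], model.predict(X[8000:])))
    print(f"{name}: RMSE = {rmse:.4f}")                        # test split
```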
Finally, we collect datasets from different eNodeBs and conduct training and testing separately. Seven proposed models are trained, and each model has a very low RMSE (in the range of 0.00953 to 0.01278), which shows that our proposed model is very effective in regression.

5.4 Computing Overhead
In the parameter settings of the proposed model, we set Iin = 42 and cm = 6. Nepo is selected as the minimum number of epochs needed for the training process to converge. According to the results of the split nodes in the hidden layers, we get H1 = 0 and H2 = 27, and achieve ΔFLOPs = 5.611K. To highlight the advantages of our proposed algorithm, Table 8 compares the complexity of the DNN-based models in terms of the computations required to complete a single forward pass in the training phase. The expression "FCN (1) + CNN (6)" denotes the composition of a cascaded network, in which the Arabic numeral in parentheses gives the number of specific networks of that type; more specifically, there is 1 FCN in the first-level network and there are 6 CNNs in the second-level network. N^k_output is the number of nodes in the output layer of the kth specific network. It can be seen that the complexity depends on the type, structure and number of the DNNs cascaded in each level. In the combinations "FCN (1) + FCN (1)" (proposed CLN-RLN), "FCN (1) + FCN (6)" [22], [23], "FCN (1) + FCN (1)" [24], "FCN (1) + CNN (6)" [19] and "CNN (2) + FCN (6)" [20], [21], the first-level network, FCN (1) or CNN (2), is used to classify the original dataset into several subsets, with the data in each subset having similar characteristics. Six FCNs or CNNs are employed in the second-level network of "FCN (1) + FCN (6)" [22], [23], "FCN (1) + CNN (6)" [19] and "CNN (2) + FCN (6)" [20], [21] to regress each subset output by the first-level network, which increases the computational complexity compared with our proposed "FCN (1) + FCN (1)" (CLN-RLN); in CLN-RLN, only one FCN (1) is needed to realize the regression of the six subsets. The model "FCN (1) + FCN (1)" [24] also needs only one FCN (1) in the second-level network, but transfer learning is required to handle the six subsets, which does not reduce the computational overhead because the FCN (1) of the second-level network has to be trained six times. Different from the other models, in which the first-level network is used for classification, in the combination "CNN (2) + CNN (1)" [17], [18] the first-level network with 2 CNNs performs feature extraction, and one CNN (1) in the second-level network performs the regression. The results of "CNN (2) + CNN (1)" [17], [18] and "FCN (1)" [16] show that the models without an embedded classification module have an extremely high RMSE, which further demonstrates that the overall regression error can be greatly reduced by an appropriate division of the dataset into subsets with small variance. Compared with the other models in Table 8, our proposed method provides good performance in both training time and RMSE.
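To make the forward-pass accounting concrete, the sketch below uses a generic multiply-accumulate count of 2 · fan_in · fan_out per fully connected layer (an illustration only, not Eqs. (12)-(14)) to compare one shared second-level FCN with six separate second-level FCNs of the same size.

```python
def fcn_flops(layer_sizes):
    """Approximate FLOPs of one forward pass through a fully connected network,
    counting 2 * fan_in * fan_out per layer (multiply-accumulate)."""
    return sum(2 * a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

# Illustrative sizes: 42 inputs, two hidden layers of 100 nodes, 1 output.
second_level = [42, 100, 100, 1]

one_shared_fcn = fcn_flops(second_level)          # "FCN (1) + FCN (1)" second level
six_separate_fcns = 6 * fcn_flops(second_level)   # "FCN (1) + FCN (6)" second level
print(one_shared_fcn, six_separate_fcns)          # 28600 vs 171600 FLOPs per sample
```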
TABLE 8
The Complexity Analysis of DNN-Based Models
5.5 Generalization Evaluation
In order to evaluate the generalization capability of the proposed solution in other scenarios, we change the simulation scenario and train and test the proposed model on a dataset generated by a new setup. Compared with the setup used earlier in the paper, we change simulation parameters such as the eNodeB distribution, inter-site distance, bandwidth, transmit power, number of UEs, UE velocity, electrical tilt and mechanical tilt. 12 cells and 600 UEs moving at 10 km/h are generated and randomly distributed in a square area of 2 km in length and width. The frequency bandwidth and transmit power are reduced by half. The electrical tilt and mechanical tilt are randomly selected from [0°, 8°] and [0°, 10°], respectively.
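For readability, the changed settings can be collected into a single parameter summary; the field names below are illustrative only and do not correspond to the configuration interface of the simulator used in this work.

```python
# Generalization-test setup (values from the text; field names are illustrative).
new_setup = {
    "num_cells": 12,
    "num_ues": 600,
    "ue_speed_kmh": 10,
    "area_km": (2, 2),              # square deployment area, 2 km x 2 km
    "bandwidth_scale": 0.5,         # halved relative to the original setup
    "tx_power_scale": 0.5,          # halved relative to the original setup
    "electrical_tilt_deg": (0, 8),  # sampled uniformly at random
    "mechanical_tilt_deg": (0, 10), # sampled uniformly at random
}
```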
Similar to the method used earlier in the paper, we train the model with data from different eNodeBs separately, so there are 4 models to train for the 4 eNodeBs. To keep the description concise, we only discuss the dataset of one model; its test error is shown in Table 9. Compared with the results in Section 5.3, we can draw a similar conclusion: our proposed learning framework greatly reduces the regression error, to as low as 0.00951.

TABLE 9
RMSE of the Proposed Model on Dataset Produced by New Simulation Parameters

6 CONCLUSION AND FUTURE WORK
In this paper, we propose a novel data-driven QoE model, CLN-RLN, based on continual deep learning, for use in wireless networks. The CLN-RLN model consists of two DNNs connected in series. The first one (CLN) is used to classify the input data into m categories with high classification accuracy, and the other one (RLN) is applied to obtain the final predicted value. The classification results of CLN are fed into RLN as inputs to the corresponding subnetworks, where each subnet is formed dynamically in a new time period with nodes and links adaptively selected. The testing results demonstrate that the dynamically connected network architecture formed by selective training can largely reduce the RMSE. To overcome the catastrophic forgetting problem in continual learning, we introduce the Hidden-node Split to prevent the model parameters trained on the previous dataset from deviating too much, and our results show that this procedure facilitates the learning of new features and further reduces the learning loss. Our extensive performance evaluations, which compare the proposed model against 7 reference algorithms, demonstrate that CLN-RLN can reduce the regression error by 21-86 times.

In the next phase, based on the proposed QoE/QoS mapping model, we will focus on improving the QoE by adjusting network parameters such as the azimuth angle, downtilt, transmit power, handover hysteresis and offset.
REFERENCES
[1] R. El Hattachi and J. Erfanian, “Next generation mobile networks
(NGMN) alliance: 5G white paper,” Frankfurt, Germany, NGMN,
White Paper, 2015.
[2] M. Seufert, N. Wehner, F. Wamser, P. Casas, A. D’Alconzo, and
P. Tran-Gia, “Unsupervised QoE field study for mobile youtube
video streaming with YoMoApp,” in Proc. 9th Int. Conf. Qual.
Multimedia Experience, 2017, pp. 1–6.
[3] X. Liu, G. Chuai, W. Gao, K. Zhang, and X. Chen, "KQIs-driven QoE anomaly detection and root cause analysis in cellular networks," in Proc. IEEE Globecom Workshops, 2019, pp. 1–6.
[4] S. Chikkerur, V. Sundaram, M. Reisslein, and L. J. Karam, "Objective video quality assessment methods: A classification, review, and performance comparison," IEEE Trans. Broadcasting, vol. 57, no. 2, pp. 165–182, Jun. 2011.
[5] Y. Liu, S. Dey, F. Ulupinar, M. Luby, and Y. Mao, "Deriving and validating user experience model for DASH video streaming," IEEE Trans. Broadcasting, vol. 61, no. 4, pp. 651–665, Dec. 2015.
[6] M. Fiedler, T. Hossfeld, and P. Tran-Gia, "A generic quantitative relationship between quality of experience and quality of service," IEEE Netw., vol. 24, no. 2, pp. 36–41, Mar./Apr. 2010.
[7] M. Vaser and S. Forconi, "QoS KPI and QoE KQI relationship for LTE video streaming and VoLTE services," in Proc. 9th Int. Conf. Next Gener. Mobile Appl., Serv. Technol., 2015, pp. 318–323.
[8] A. Khan, L. Sun, and E. Ifeachor, "Content clustering based video quality prediction model for MPEG4 video streaming over wireless networks," in Proc. IEEE Int. Conf. Commun., 2009, pp. 1–5.
[9] M. Venkataraman and M. Chatterjee, "Inferring video QoE in real time," IEEE Netw., vol. 25, no. 1, pp. 4–13, Jan./Feb. 2011.
[10] ITU, "SG12: Performance, QoS and QoE," 2021. [Online]. Available: goo.gl/wnudy8
[11] D. Minovski, C. Ahlund, K. Mitra, and P. Johansson, "Analysis and estimation of video QoE in wireless cellular networks using machine learning," in Proc. 11th Int. Conf. Qual. Multimedia Experience, 2019, pp. 1–6.
[12] A. Herrera-Garcia, S. Fortes, E. Baena, J. Mendoza, C. Baena, and R. Barco, "Modeling of key quality indicators for end-to-end network management: Preparing for 5G," IEEE Veh. Technol. Mag., vol. 14, no. 4, pp. 76–84, Dec. 2019.
[13] V. Pedras, M. Sousa, P. Vieira, M. P. Queluz, and A. Rodrigues, "A no-reference user centric QoE model for voice and web browsing based on 3G/4G radio measurements," in Proc. IEEE Wireless Commun. Netw. Conf., 2018, pp. 1–6.
[14] D. Moura, M. Sousa, P. Vieira, A. Rodrigues, and M. P. Queluz, "A no-reference video streaming QoE estimator based on physical layer 4G radio measurements," in Proc. IEEE Wireless Commun. Netw. Conf., 2020, pp. 1–6.
[15] Y. Ben Youssef, M. Afif, R. Ksantini, and S. Tabbane, "A novel QoE model based on boosting support vector regression," in Proc. IEEE Wireless Commun. Netw. Conf., 2018, pp. 1–6.
[16] X. Tao, Y. Duan, M. Xu, Z. Meng, and J. Lu, "Learning QoE of mobile video transmission with deep neural network: A data-driven approach," IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1337–1348, Jun. 2019.
[17] L. Li, L. G. Wang, F. L. Teixeira, C. Liu, A. Nehorai, and T. J. Cui, "DeepNIS: Deep neural network for nonlinear electromagnetic inverse scattering," IEEE Trans. Antennas Propag., vol. 67, no. 3, pp. 1819–1825, Mar. 2019.
[18] C. Lo, Y.-Y. Su, C.-Y. Lee, and S.-C. Chang, "A dynamic deep neural network design for efficient workload allocation in edge computing," in Proc. IEEE Int. Conf. Comput. Des., 2017, pp. 273–280.
[19] X. Ma and Z. Gao, "Data-driven deep learning to design pilot and channel estimator for massive MIMO," IEEE Trans. Veh. Technol., vol. 69, no. 5, pp. 5677–5682, May 2020.
[20] Z. Zheng, X. Li, Z. Sun, and X. Song, "A novel visual measurement framework for land vehicle positioning based on multimodule cascaded deep neural network," IEEE Trans. Ind. Inf., vol. 17, no. 4, pp. 2347–2356, Apr. 2021.
[21] J. Yu and J. Liu, "Two-dimensional principal component analysis-based convolutional autoencoder for wafer map defect detection," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8789–8797, Sep. 2021.
[22] N. Athreya, V. Raj, and S. Kalyani, "Beyond 5G: Leveraging cell free TDD massive MIMO using cascaded deep learning," IEEE Wireless Commun. Lett., vol. 9, no. 9, pp. 1533–1537, Sep. 2020.
[23] F. N. Khan et al., "Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks," Opt. Express, vol. 25, pp. 17767–17776, Jul. 2017.
[24] J. Zhang, Y. Li, S. Hu, W. Zhang, Z. Wan, Z. Yu, and K. Qiu, "Joint modulation format identification and OSNR monitoring using cascaded neural network with transfer learning," IEEE Photon. J., vol. 13, no. 1, pp. 1–10, Feb. 2021.
[25] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, "Continual lifelong learning with neural networks: A review," Neural Netw., vol. 113, pp. 54–71, 2019.
[26] Z. Li and D. Hoiem, "Learning without forgetting," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 614–629.
[27] J. Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks," Proc. Nat. Acad. Sci. USA, vol. 114, no. 13, pp. 3521–3526, 2017.
[28] A. A. Rusu et al., "Progressive neural networks," 2016, arXiv:1606.04671.
[29] J. Yoon, E. Yang, J. Lee, and S. J. Hwang, "Lifelong learning with dynamically expandable networks," in Proc. 6th Int. Conf. Learn. Representations, 2018, pp. 1–12.
[30] N. Houlsby et al., "Parameter-efficient transfer learning for NLP," in Proc. 36th Int. Conf. Mach. Learn., 2019, vol. 97, pp. 2790–2799.
[31] J. Pfeiffer et al., "AdapterHub: A framework for adapting transformers," in Proc. Conf. Empirical Methods Natural Lang. Process.: Syst. Demonstrations, 2020, pp. 46–54.
[32] W. Hsu and C. Lo, "QoS/QoE mapping and adjustment model in the cloud-based multimedia infrastructure," IEEE Syst. J., vol. 8, no. 1, pp. 247–255, Mar. 2014.
[33] E. Danish, A. Fernando, M. Alreshoodi, and J. Woods, "A hybrid prediction model for video quality by QoS/QoE mapping in wireless streaming," in Proc. IEEE Int. Conf. Commun. Workshop, 2015, pp. 1723–1728.
[34] M. Rupp, S. Schwarz, and M. Taranetz, The Vienna LTE-Advanced Simulators (Signals & Communication Technology). Berlin, Germany: Springer, 2016.
[35] 3GPP, "Evolved universal terrestrial radio access (E-UTRA); Radio frequency (RF) system scenarios," ETSI, Sophia Antipolis, France, Tech. Rep. TR 36.942, 2020.
[36] 3GPP, "LTE physical layer framework for performance verification," Orange, China Mobile, KPN, NTT DoCoMo, Sprint, T-Mobile, Vodafone, Telecom Italia, China, Tech. Rep. R1-070674, 2007.
[37] R. K. P. Mok, E. W. W. Chan, and R. K. C. Chang, "Measuring the quality of experience of HTTP video streaming," in Proc. 12th IFIP/IEEE Int. Symp. Integr. Netw. Manage. Workshops, 2011, pp. 485–492.
[38] H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," in Proc. IEEE Int. Joint Conf. Neural Netw. (World Congr. Comput. Intell.), 2008, pp. 1322–1328.

Xuewen Liu received the MSc degree in optical engineering from the Tianjin University of Technology, in 2016. He is currently working toward the PhD degree in information and communications engineering with the Beijing University of Posts and Telecommunications, Beijing, China. From 2019 to 2020, he was a visiting PhD student with the State University of New York at Stony Brook, Stony Brook, NY, USA. His research interests include wireless network fault detection, fault diagnosis, and intelligent network optimization based on multi-agent deep reinforcement learning.

Gang Chuai received the MSc and PhD degrees in information and communications engineering from the Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 1999 and 2010, respectively. He is currently a full professor with the School of Information and Communication Engineering, BUPT. From 2006 to 2007, he was a senior visiting scholar with Drexel University, Philadelphia, PA, USA. His research interests include wireless communications and networking technology, intelligent network optimization, and wireless network positioning theory.
Xin Wang (Member, IEEE) received the BS and MS degrees in telecommunications engineering and wireless communications engineering, respectively, from the Beijing University of Posts and Telecommunications, Beijing, China, and the PhD degree in electrical and computer engineering from Columbia University, New York, NY, USA. She is currently an associate professor with the Department of Electrical and Computer Engineering of the State University of New York at Stony Brook, Stony Brook, NY, USA. Her research interests include algorithm and protocol design in wireless networks and communications, mobile and distributed computing, and networked sensing and detection. She has served on the executive and technical committees of numerous conferences and on funding review panels. She is an associate editor for the IEEE Transactions on Mobile Computing. She was the recipient of the NSF CAREER Award in 2005 and the ONR Challenge Award in 2010.

Zhiwei Xu (Member, IEEE) received the BS degree from the University of Electronic Science and Technology of China, Chengdu, China, in 2002, and the PhD degree from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, in 2018. He is currently an associate professor and MS supervisor with the Inner Mongolia University of Technology, while also serving as an adjunct professor with the Institute of Computing Technology, Chinese Academy of Sciences. He is currently a postdoctoral researcher with the Department of Electrical and Computer Engineering, State University of New York at Stony Brook, Stony Brook, NY, USA. His research interests include network performance analysis and the related mathematical problems.

Weidong Gao received the PhD degree in 2009 from the Beijing University of Posts and Telecommunications (BUPT), China. From 2009 to 2015, he was a senior engineer with Potevio Company Ltd., where he was engaged in the research of LTE standards and algorithms. He is currently an associate professor with the School of Information and Communication Engineering, BUPT. His research interests include next-generation wireless communication, signal and information processing, and the Internet of Things.