Abstract—Quality of experience (QoE) is a vital metric that indicates how well the wireless network provides transmission services to users, while quality of service (QoS) helps better configure the network parameters for higher performance. The evaluation time of QoE is usually several orders of magnitude larger than that of QoS, because QoE is the perception of users over a period of time, whereas QoS can be collected every millisecond. Therefore, the implementation of a QoE/QoS mapping model can help us obtain QoE by collecting QoS measurements, and perform QoE-based network configuration at a smaller time granularity. Many studies have been made to obtain the QoS-to-QoE mapping, including with machine learning (ML) methods. However, traditional ML-based regression methods for QoE/QoS mapping face the challenges of high regression error and catastrophic forgetting when dealing with continuously arriving data. In this paper, we propose a novel QoE model based on continual deep learning in wireless networks. The model is formed by concatenating two deep neural networks (DNNs). The first DNN classifies data into different subsets, which are then fed into the second DNN for regression. The second DNN dynamically forms the corresponding subnets, each with nodes and connections adaptively selected in each new time period with newly arriving data. We solve the catastrophic forgetting problem with the use of node splitting and hidden state augmentation. Our proposed learning framework greatly reduces the regression error, to as low as 0.9314%. The experimental results demonstrate that our proposed model reduces the root mean square error (RMSE) by 21 to 86 times compared with several existing approaches; specifically, the testing error of our proposed model is more than 80 times lower than that of a traditional DNN. Compared with other DNN-based cascaded models, our proposed method provides good performance in both training time and RMSE.
Index Terms—Data-driven QoE assessment, continual deep learning, QoE/QoS mapping, wireless network, cascaded DNNs
QoE/QoS relationship. Since machine learning has advantages in inferring the complex relationship between multiple parameters, many machine learning techniques have been proposed to associate QoE with QoS in recent years, including random forest regression (RFR) [11], linear regression (LR) [12], decision tree regression (DTR) [12], support vector regression (SVR) [13], gradient boosting decision tree (GBDT) [14], boosting SVR [15] and DNN [16].

However, there are two drawbacks in the above studies.

Poor regression performance. From the analysis of state-of-the-art learning-based models for QoE and QoS mapping, we found that the regression error (RMSE) of all models is greater than 10%, which is unacceptable for MNOs that need accurate knowledge of the user QoE at any time. It is also difficult to optimize QoE through such an inaccurate model, as the inaccuracy may offset the performance gains brought by network optimization and introduce larger network configuration errors.

Not considering the model update. Almost all literature studies did not consider how to update the QoE model as users and the wireless network environment change, which greatly reduces the usability and accuracy of the model.

In wireless networks, QoE is often captured through mean opinion scores (MOS), ranging from 1 (bad) to 5 (excellent). Through the analysis of the QoE distribution, we find that MOS values tend to form clusters in every score range of length 1 and fluctuate around some central points. Clustering similar data into subsets can reduce the fluctuation of MOS and make the data distribution of each subset more centralized, with smaller variance. A regression model trained on a data set with large variance is more likely to have a larger training loss than one trained on a data set with small variance. This motivates us to design a cascaded model that first divides the data set into several new data sets with small variance in the first-level model, and then trains the second-level model on each new subset through regression.

This paper focuses on the mapping between QoS and QoE, so that QoE can be obtained by collecting QoS and future QoE-based network configurations can be performed at a smaller time granularity. In wireless networks, many metrics are used to evaluate QoS, which makes the input of the model high-dimensional. DNNs are good at handling high-dimensional input to achieve good classification and regression results; therefore, a DNN is employed in both parts of the model. However, in the second-level model, if each subset is trained separately, we have to train multiple models to ensure training accuracy at the cost of complexity. On the other hand, if we use one model and train it with all subsets of data together, the complexity is reduced but the model accuracy cannot be guaranteed. Instead, to keep the model accurate without incurring a high training complexity, we propose to divide a DNN (which generally has many connected neurons at each layer) into a number of sub-networks, and train each using a corresponding data subset. In addition, taking advantage of the unique structure of a DNN composed of abundant connected neurons, we further propose to enable efficient model update through hidden node splitting to prevent catastrophic forgetting in continual learning.

Based on the above analysis, we propose solutions from two perspectives to address these challenges.

First, based on our analysis of the distribution of QoE data, we develop a new deep learning framework composed of two cascaded DNNs to achieve low regression errors. The first-level DNN, called the classification learning network (CLN), classifies the input data into m categories. Then, the classification results are processed in the second-level DNN, the regression learning network (RLN), to perform regression. Different from cascaded DNNs used in other contexts, the first-level network of the proposed model divides the data set into m subsets (classes) rather than performing feature extraction as in [17] and [18]. In the second-level network, we introduce selective training to form m different subnets from a single DNN by dynamically selecting neural network nodes and connections for each, to make the architecture more compact and achieve the regression of the m data sets. In contrast, in [19], [20], [21], [22], [23] and [24], m DNNs or transfer learning are needed to perform this function, which increases the computational complexity.

Second, we exploit the principle of continual learning for model updating to alleviate the catastrophic forgetting problem. Continual/incremental learning has the ability to quickly learn new data on the basis of existing knowledge while minimizing forgetting. Different neural network approaches [25] for lifelong learning have been developed, such as regularization approaches and dynamic architectures. The regularization methods [26], [27] alleviate catastrophic forgetting by imposing constraints on the update of neural weights, which may result in a trade-off between the performance of old and new tasks because a limited number of neural resources may not be able to capture the information well. Dynamic architectures prevent forgetting by increasing the number of neurons [28], [29] or network layers [30], [31] to represent new information; however, the main disadvantage of this strategy is that the number of parameters grows greatly as learning progresses. The authors of [29] proposed a dynamically expanding network (DEN) that splits hidden nodes to incrementally learn new tasks. Inspired by [29], in this paper we develop a novel model updating method based on the principle of the regularization approach, whose severe performance degradation caused by limited neural resources is alleviated by using hidden-node splitting to preserve features of the previous data. To overcome the disadvantage of the large increase of parameters in dynamic architectures, selective training is adopted in our proposed RLN model to form a more compact structure, which further reduces the excessive neural resources added by learning new knowledge. Different from the two training processes included in [29], our method only needs to be trained once while evaluating whether the nodes in each layer need to be copied, which can significantly reduce the computational overhead.

The contributions of this paper are summarized as follows:

1) We propose a novel deep network architecture, i.e., CLN-RLN, for the QoE/QoS mapping model, which
Authorized licensed use limited to: University of Electronic Science and Tech of China. Downloaded on September 17,2023 at 08:46:27 UTC from IEEE Xplore. Restrictions apply.
LIU ET AL.: QOE ASSESSMENT MODEL BASED ON CONTINUOUS DEEP LEARNING FOR VIDEO IN WIRELESS NETWORKS 3621
consists of two cascaded DNNs. The first-level DNN is employed to classify the original dataset into several subsets, with the data in each having similar characteristics. At the second level, for each subset, a DNN subnet is flexibly and dynamically constructed, with nodes and links selected to reduce the training errors. We demonstrate that the regression error can be greatly reduced by reasonable data partitioning and by selecting appropriate sub-networks for the integrated regression. To the best of our knowledge, we are the first to propose this framework and use it for QoE/QoS mapping.

2) We provide a new method to form neural subnetworks with nodes and links dynamically selected, to greatly reduce the regression error and the computational complexity caused by introducing multiple DNNs at the second level.

3) We develop a model updating method based on the principle of the regularization approach, whose severe performance degradation caused by limited neural resources is compensated by using hidden-node splitting to preserve the features of the previous data. The compact architecture achieved by selective training curbs the disadvantage of a substantial increase of parameters in learning new knowledge.

To the best of our knowledge, this is the first work that exploits the use of cascaded DNNs to achieve precise regression. Our proposed model reduces the RMSE by 21 to 86 times compared with several reference algorithms. Specifically, the RMSE of our CLN-RLN model is nearly 86 times lower than that of directly using a DNN for regression, without increasing the training time too much. The regression error of the CLN-RLN model is as low as 0.9314%.

2 RELATED WORK

Recently, QoE/QoS mapping models based on machine learning methods in the RAN have been developed. Previous research on QoE assessment was mainly based on QoS indicators such as packet loss rate, jitter, delay and bandwidth [32]. However, the QoS metrics alone cannot effectively reflect the fault problems in the RAN and the root causes of poor user experience quality, such as poor coverage, frequent handover, high interference and resource overload. Many other KPIs describing the radio conditions have been introduced [11], [13], [14], [15], [16], [33] to complement the network quality metrics, such as channel quality indicator, reference signal received quality, received signal strength indicator, throughput, signal to interference noise ratio and load.

In [13], MOS is captured by comparing the received signal with the original one. Feature pre-selection is first conducted using Pearson correlation, and then SVR is used to map the QoS metrics and several Radio Frequency (RF) channel measurements of real Drive Test (DT) data into MOS with an RMSE of 11%, but real-time QoE estimation was not studied. In order to improve the prediction accuracy of classical individual learners, ensemble learning algorithms are applied in [11], [12], [14], [15]. Work [11] uses several real-life radio metrics collected by smartphone measurements to train an RFR model against calculated video quality metrics from the reference videos, and achieves an RMSE of 19.2%. However, post-processing analysis on the original video with the degraded reference file is typically needed to compare the corresponding frames of the two video files and produce objective quality metrics such as MOS, Peak Signal-to-Noise Ratio (PSNR), frame delay, frame skips, and blurriness. In [12], the authors explore the method of getting key quality indicators (KQIs) from KPIs with DTR in a real cellular network, and the evaluation indicates good fitness levels and modeling accuracy. In [14], the authors provide a new QoE prediction model for video streaming services over 4G networks with drive test data, using layer 1 (i.e., Physical Layer) key performance indicators (KPIs). Of the several considered ML algorithms, GBDT shows the best performance, achieving 78.9% Pearson correlation and 11.4% MSE. Based on SVR, the work in [15] creates a powerful predictive model, i.e., boosting SVR, which shows its superiority over relevant ensemble learning methods and gets an RMSE of 47%. In [16], four types of subjective scores and 89 network parameters are collected by a mobile phone application to achieve a data-driven objective QoE prediction approach. As a DNN can learn more potential features through a large number of connected neurons, a DNN structure shows strong ability in representing the complex relationship between the network performance metrics and the user scores. The results show that DNN outperforms SVM, DTR and GBDT.

Despite the potential of applying DNNs to predicting QoE scores, the performance of the current QoE/QoS mapping shows that the machine learning methods used in the current research cannot meet the practical requirements of low regression error. […] Also, reference videos are essential for all the schemes without real-time QoE estimation. Moreover, how to update the model effectively after finishing the QoE/QoS mapping is not mentioned in the above work, which is the key to judging whether the model is usable or not as wireless network states change rapidly.

Cascaded DNNs provide a new DNN-based structure. They possess strong feature extraction ability and can achieve high accuracy, but suffer from high complexity and overhead. Various cascaded DNNs composed of Fully Connected Networks (FCN) or Convolutional Neural Networks (CNN) have been developed in many other contexts. Paired CNNs (termed "CNN+CNN") are utilized in image processing [17] and efficient workload allocation [18] for feature extraction. In [19], a novel DNN architecture with two cascaded parts (termed "FCN+CNN") is proposed to reconstruct the high-dimensional channel from low-dimensional measurements. The first stage utilizes an FCN to obtain an initial coarse channel estimate, which is delivered to CNNs in the second stage. In [20], another kind of cascaded DNN (termed "CNN+FCN") is proposed to achieve vehicle positioning, with a fully convolutional autoencoder network for extracting equivalent positioning features and two FCNs for efficiently estimating lateral position and yaw angle. With the same cascaded structure ("CNN+FCN") as [20], in [21] a convolutional autoencoder network is applied to extract latent features and four FCNs to reconstruct the image.
3622 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 22, NO. 6, JUNE 2023
$$L_{CE}(\theta) = -\sum_{k=1,\,k\neq c}^{C} P(c_c \mid x)\,\log\big(P(c_k \mid x)\big), \qquad (5)$$

parameters of the RLN, and l represents the lth layer of the network. N is the total number of samples.

ℓ1-regularization is employed to train the DNN to obtain the sparse connectivity below:

$$\min_{W^{t=1}} \; L_{RMSE}(W^{t=1}, D^{t=1}) + \mu_1 \sum_{l=1}^{L} \big\lVert W_l^{t=1} \big\rVert_1, \qquad (9)$$

where $W_l^t$ denotes the lth-layer network parameters at time t, $W^t = \{W_l^t\}_{l=1}^{L}$, and $\mu_1$ is the ℓ1-regularization parameter.

As a second step, we obtain the selective network at time point t = 1 and update the network parameters W. Once we have the sparse network, we can determine the different units and weights that are affected by the different categories in the network, and a neural subnetwork is selected for each category. More specifically, as shown in Fig. 1, for the input node $c_m$, a breadth-first search on the network is performed to identify all units that have paths (blue arrows in Fig. 1) from it, and these selected units constitute one subnetwork. The construction of the other subnetworks is similar to that of $c_m$, and we obtain the final selective networks $S^{t=1}$ on dataset $D^{t=1}$. Although each subnetwork is used for regression on its corresponding dataset, the objective of parameter updating is to minimize the total training error of all subnetworks. In order to minimize $L_{RMSE}(W)$, SGD is utilized to repeatedly update the network parameters W as

$$W_{new} = W_{old} - \alpha \nabla_W L_{RMSE}(W; x_{nk}, y_{nk}), \qquad (10)$$

where $(x_{nk}, y_{nk}) \in batch_n = \{(x_{n1}, y_{n1}), (x_{n2}, y_{n2}), \ldots, (x_{nk}, y_{nk})\}$, and $batch_{n\,(1 \le n \le N_{batches})}$ is the nth mini batch of data for training.

Finally, to form continual learning, we solve the catastrophic forgetting problem for the other time stamps t ≠ 1 and update the network parameters again with SGD, using the same objective function as Eq. (2). With the continuous arrival of datasets $D^t$ (t ≠ 1), we use ℓ2-regularization to prevent large parameter drift from time t − 1 to t. We leave the unselected weights unchanged and train the weights of the selective network $W_{S^t}^t$ as

$$\min_{W_{S^t}^t} \; L_{RMSE}(W_{S^t}^t, D^t) + \mu_2 \big\lVert W_{S^t}^t - W_{S^{t-1}}^{t-1} \big\rVert_2^2, \qquad (11)$$

where $W_{S^t}^t$ denotes the weights of the selected subnetwork on dataset $D^t$, and $\mu_2$ is the ℓ2-regularization parameter.

2) Hidden Node Splitting

In Eq. (11), we use the ℓ2-regularization to prevent $W_{S^t}^t$ from deviating too much from $W_{S^{t-1}}^{t-1}$. If the upcoming dataset $D^t$ is highly relevant to $D^{t-1}$, the weight difference between them will be less than the threshold. However, if the parameters drift too much, it means the new features we learn from the current dataset $D^t$ fail to prevent catastrophic forgetting, and we should find a good way to capture the features of both datasets $D^t$ and $D^{t-1}$ at the same time. To address this issue, we split hidden nodes of subnetwork $S^{t-1}$ and add selected ones to $S^t$, so that we obtain features that suit both datasets. For any hidden unit i at the lth layer, $u_l^i$, if the ℓ2-distance $d_{\ell_2} = \lVert w_{u_l^i}^{t} - w_{u_l^i}^{t-1} \rVert_2$ between the weights of $u_l^i$ at times t and t − 1 is larger than the threshold τ, we duplicate node i and add it from subnetwork $S^{t-1}$ to $S^t$. After all hidden nodes have been checked, the new subnetwork $S^{t'}$ is formed and will be used for further testing.

Algorithm 1. CLN-RLN Model

Input: Dataset $D = (D^1, D^2, \ldots, D^t, \ldots, D^T)$; learning rate α; ℓ1-regularization parameter $\mu_1$; ℓ2-regularization parameter $\mu_2$; the maximum number of epochs $epoch_{max}$.
Output: The final network parameters $\theta^T$ and $W^T$.
1: Initialize: randomly initialize θ and W.
2: Training:
3: for t = 1 → T do
4:   Label each instance of dataset $D^t$ and divide it into N mini batches.
5:   Each mini batch can be expressed as: $batch_{n\,(1 \le n \le N)} = \{(x_{n1}, c_{n1}, y_{n1}), \ldots, (x_{nk}, c_{nk}, y_{nk})\}$.
6:   for epoch = 1 → $epoch_{max}$ do
7:     for batch = 1 → $batch_{max}$ do
8:       Get the CLN output layer by layer using Eq. (4).
9:       Measure the CLN loss according to Eq. (5).
10:      Update the CLN parameters $\theta^t$ with Eq. (6).
11:      Calculate the RLN output layer by layer using Eq. (7).
12:      Get the new batch ($batch'_n$) to minimize the objective function of the RLN, where $batch'_n = \{(x_{n1}, y_{n1}), \ldots, (x_{nk}, y_{nk}) \mid x_{nk}, y_{nk} \in batch_n\}$.
13:      if t = 1 then
14:        The objective function of the RLN is calculated according to Eq. (9) and we get a sparse, selective network ($S^t$).
15:      else
16:        The objective function of the RLN is calculated according to Eq. (11).
17:      end if
18:      Update the RLN parameters $W^t$ with Eq. (10).
19:    end for
20:  end for
21:  if t ≠ 1 then
22:    for l = 1 → L do
23:      for i = 1 → M do
24:        if $d_{\ell_2} = \lVert w_{u_l^i}^{t} - w_{u_l^i}^{t-1} \rVert_2 > \tau$ then
25:          Duplicate the hidden unit $u_l^i$, add it from subnetwork $S^{t-1}$ to $S^t$, and form a new subnetwork $S^{t'}$.
26:        end if
27:      end for
28:    end for
29:    Get the new subnetwork $S^{t'}$ and reselect the network by the method of breadth-first search.
30:  end if
31:  Deep copy the model parameters $\theta^t$ and $W^t$. Initialize the network parameters $\theta^{t+1}$ and $W^{t+1}$ with $\theta^t$ and $W^t$.
32: end for

3.3 Integrated Training of the CLN-RLN Model

Algorithm 1 describes the details of how the CLN-RLN model is trained. First, data pre-processing is performed; samples with missing values are deleted. According to the distribution characteristics of $y_k$, we use K-means to label each instance as $c_k$ and partition the dataset $D^t$ into N mini batches (Lines 4-5). Second, we calculate the CLN output and loss value, and update the CLN parameters $\theta^t$
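As a rough illustration of the node-splitting check described above, the following NumPy sketch duplicates any hidden unit whose incoming weights drift more than τ between two time stamps. The helper names, the toy layer shapes, and the convention that each column holds one unit's incoming weights are our own assumptions, not the paper's implementation; the paper applies the test per layer inside Algorithm 1 (Lines 22-28).

```python
import numpy as np

def units_to_split(W_old, W_new, tau):
    """Indices of hidden units whose incoming-weight l2 drift
    between time t-1 and t exceeds the threshold tau."""
    # each column of W is the incoming weight vector of one hidden unit
    drift = np.linalg.norm(W_new - W_old, axis=0)
    return np.where(drift > tau)[0]

def split_nodes(W_old, W_new, tau):
    """Duplicate the drifted units: append a copy carrying their
    pre-drift weights so features of the previous dataset survive."""
    idx = units_to_split(W_old, W_new, tau)
    if idx.size == 0:
        return W_new, idx
    W_split = np.concatenate([W_new, W_old[:, idx]], axis=1)
    return W_split, idx

W_old = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2 inputs, 2 hidden units
W_new = np.array([[1.0, 2.0], [0.0, 1.0]])   # unit 1 drifted by 2.0
W_split, idx = split_nodes(W_old, W_new, tau=1.0)
print(idx)            # drifted unit indices
print(W_split.shape)  # one duplicated column appended
```

After the split, the enlarged layer would be re-selected by breadth-first search, as in Line 29 of Algorithm 1.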
(Lines 8-10). The results of the CLN will be fed into the RLN as its input. As with the CLN, the output, loss value and parameter updating of the RLN are performed (Lines 11-18), but the loss function to be minimized differs between t = 1 and t ≠ 1. The reason for using different loss functions is that we have different expectations for the structure of the network at different time points. When the first dataset (t = 1) is sampled, our goal is to get a network with sparse connections, so the ℓ1-regularization is applied. After the subsequent data arrive, we can train on a sparse and selective network (Line 14), which is obtained through breadth-first search; at the time points t ≠ 1 (Line 16), to alleviate the catastrophic forgetting problem, we introduce the ℓ2-regularization to prevent too large a parameter drift. Third, all hidden nodes of the RLN are checked for whether they need to be split (Lines 22-28). If needed, i.e., $d_{\ell_2} = \lVert w_{u_l^i}^{t} - w_{u_l^i}^{t-1} \rVert_2 > \tau$, the hidden unit $u_l^i$ is duplicated and added from subnetwork $S^{t-1}$ to $S^t$. We get a new subnetwork $S^{t'}$ once all hidden nodes have been checked. Then, we need to re-select the subnetwork for future testing (Line 29). Finally, the network parameters $\theta^{t+1}$ and $W^{t+1}$ are initialized with $\theta^t$ and $W^t$ for the next time stamp (Line 31). Different from other cascaded DNNs whose stages are trained individually, in this paper the two stages of our cascaded DNNs are trained jointly (Lines 8-18).

TABLE 1
Simulation Parameters

Parameter                   Value
Cellular layout             Hexagonal grid, 7 eNodeBs, 21 cells
Carrier frequency           2.14 GHz
Inter-site distance         500 m
Frequency bandwidth         20 MHz
Transmit power              5 W
BS antenna height           32 m
Antenna gain pattern        Directional
Electrical tilt angle       3°
Mechanical tilt angle       5°
Azimuth                     0°, 120°, 240°
Transmission mode           CLSM 2x2
Path loss model             TR 36.942 [35]
Shadow fading               TR 36.942 [35]
Small scale fading          TR 36.942 [35]
Channel model               ITU-R Ped-A
UE position, UE velocity    Random distribution, 5 km/h
Number of UEs per cell      30
UEs height                  1.5 m
Traffic model               Video
Scheduler                   Proportional Fairness (PF)
Feedback                    AMC: CQI; MIMO: PMI and RI
Feedback delay              3 TTI
Number of TTIs simulated    1000
Epochs                      100
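The switch between the ℓ1 objective at t = 1 and the ℓ2 drift penalty at t ≠ 1 (Lines 13-17 of Algorithm 1) can be sketched as below. This is a simplified, single-matrix version of the regularizers in Eqs. (9) and (11); the paper applies them per layer and only to the selected subnetwork weights, and the function name is ours.

```python
import numpy as np

def rln_penalty(W, W_prev, selected, t, mu1, mu2):
    """Regularization term of the RLN objective.
    t == 1 : l1 penalty for sparse connectivity (Eq. (9) style).
    t != 1 : l2 penalty keeping the selected weights close to the
             previous selective network (Eq. (11) style)."""
    if t == 1:
        return mu1 * np.abs(W).sum()
    diff = (W - W_prev)[selected]      # unselected weights stay unchanged
    return mu2 * np.square(diff).sum()

W      = np.array([[1.0, -2.0], [0.0, 3.0]])
W_prev = np.zeros_like(W)
sel    = np.array([[True, True], [True, False]])
print(rln_penalty(W, W_prev, sel, t=1, mu1=0.1, mu2=0.5))  # 0.6
print(rln_penalty(W, W_prev, sel, t=2, mu1=0.1, mu2=0.5))  # 2.5
```

In training, this penalty would simply be added to the RMSE loss before the SGD step of Eq. (10).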
TABLE 2
The Video Traffic Model Parameters

TABLE 3
Effect of Thbuf on QoE Distribution and Model Accuracy

Thbuf                      0.5 frame (10240 bit)   1 frame (20480 bit)   1.5 frame (30720 bit)   2 frames (40960 bit)
QoE distribution
  (−∞, 0)                  64.4%                   0%                    0%                      0%
  [0, 1)                   1.1%                    6.6%                  0%                      0%
  [1, 2)                   4.4%                    36.6%                 46.6%                   0%
  [2, 3)                   8.8%                    22.8%                 44.4%                   78.8%
  [3, 4)                   15.5%                   18.5%                 3.5%                    14.4%
  [4, 5]                   5.8%                    15.5%                 5.5%                    6.8%
Classification Accuracy    0.7164                  0.9811                0.9276                  0.8994
RMSE                       0.351428                0.010129              0.084554                0.239717
generated and the current buffer size exceeds Thbuf. We call the process of the current buffer size growing from zero to beyond the buffer threshold one pause. Freb is counted as the reciprocal of the number of pauses.

(16) QoE^i_video class labelling: According to the value of QoE^i_video, the dataset is classified by K-means into m classes. The data in each category constitute a new data set for subsequent supervised classification and regression. The value m is selected by considering the trade-off between training/testing error and computational complexity.

From the collected dataset, we observe that the distribution of the QoE data is uneven. There are much fewer data with QoE values in the ranges 0-1 and 4-5, and most data are in the middle ranges. To alleviate the impact of unbalanced data on model training, we create a more balanced dataset with ADASYN [38], one of the over-sampling algorithms for handling data balancing. ADASYN uses a systematic method to adaptively create different amounts of synthetic data according to the data distributions. The main idea of the ADASYN algorithm is to use the density distribution as a criterion to automatically determine the number of synthetic samples that need to be generated for each minority example. At time t, the dataset D^t is divided into three sets, of which 60% is used for training, 20% for validation and the rest for testing.

4.2 Experiment Setting

To compare the performance of our proposed CLN-RLN with state-of-the-art solutions, we take regression algorithms including the traditional DNN, SVR, DTR, GBDT, RFR, and boosting SVR. We also test the CLN-RLN model without splitting to verify the effectiveness of splitting nodes in alleviating the catastrophic forgetting problem. All the models are separately trained with data from different eNodeBs. The DNN is a three-layer fully connected neural network that consists of 100, 100 and 6 neurons, respectively. The rectified linear unit (ReLU) is utilized as the activation function, and the Adam optimizer is used to update the network weights with an initial learning rate of α0 = 0.01. We set decay rate = 0.5, decay steps = 10, epoch_max = 100, and N_batches = 100. Our proposed CLN-RLN model, the traditional DNN and the CLN-RLN model without splitting are tested with the same DNN configuration and parameters. The parameters of the other regression algorithms are optimized with the grid search method.

5 RESULTS AND COMPARISON

In this section, we evaluate the performance of CLN-RLN with and without the node-splitting part, and compare its performance with that of existing approaches.

5.1 Sensitivity Analysis Results

Before evaluating the proposed model, we need to determine the optimal parameters first.

For the video streaming service, MOS is approximately estimated by Eq. (16). The buffer threshold Thbuf is set to get Tinitial, Freb and Treb. It can be seen that Freb has the greatest impact on QoE^i_video. The smaller the value of Thbuf, the larger the value of Freb, and the smaller the value of QoE^i_video. On the contrary, a large Thbuf will reduce the frequency of pause events and yield a larger QoE^i_video. In this paper, we set the bitrate to 512 kbps and the inter-arrival time between the beginnings of consecutive frames to 40 ms, so 25 frames are generated per second. We set the average number of bits generated in one frame as the buffer threshold: Thbuf = 40 ms × 512 kbps = 20480 bit. To justify this Thbuf, we test the effect of the Thbuf value on the QoE distribution and model accuracy in Table 3. From Table 3, we find that as Thbuf increases, the QoE distribution gradually moves to the intervals with large values, which even leaves the intervals with small values lacking data. Conversely, a small Thbuf value moves the QoE distribution to the small value range, and even pushes most of the data out of the range [0, 5]. In addition, the classification accuracy degrades and the RMSE grows with an inappropriate buffer size threshold. This is because too large or too small a value changes the QoE distribution greatly, which affects the accuracy of classification and regression. Therefore, we set Thbuf to 20480 bits. Thbuf is determined by the bit rate and frame rate of the video, and can be expressed as Thbuf = bitrate / framerate.

The number of classes into which the data are subdivided has an impact on the accuracy of the model. As shown in Fig. 4, the loss decreases greatly when the number of classes m ∈ [2, 6]. When m ≥ 6, the loss only reduces slightly and remains stable, but the computational complexity increases linearly with the rise of m. Therefore, we take m = 6 to ensure that the model has a relatively low loss value and low computational complexity.

Next, we evaluate the impact of the dataset (D^t) size on average model precision in Table 4. As the volume of data increases, we can see that the classification accuracy and
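The buffer-threshold arithmetic above is a one-liner to verify (the helper name is ours):

```python
def buffer_threshold_bits(bitrate_bps: float, framerate_fps: float) -> float:
    """Th_buf = bitrate / framerate, i.e., the average number of bits per frame."""
    return bitrate_bps / framerate_fps

# 512 kbps with a 40 ms inter-frame time (25 frames per second)
print(buffer_threshold_bits(512_000, 25))  # 20480.0 bits, exactly one frame
```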
Fig. 7. The convergence process of the CLN-RLN without splitting on datasets D^2 (a), D^3 (b), D^4 (c), D^5 (d).

TABLE 6
RMSE of CLN-RLN With and Without the Part of Splitting on Current Dataset D^t and Previous Dataset D^{t-1}

Time stamp | E_{D^t}/M_t of CLN-RLN without ADASYN | E_{D^t}/M_t of proposed CLN-RLN | E_{D^{t-1}}/M_t of proposed CLN-RLN | E_{D^t}/M_t of CLN-RLN without Splitting | E_{D^{t-1}}/M_t of CLN-RLN without Splitting | E_{D^t}/M_t of traditional DNN [16]
t = 1 | 0.01854  | 0.010833 | —        | 0.010833 | —        | 0.8371
t = 2 | 0.01655  | 0.009661 | 0.009652 | 0.014968 | 0.015274 | 0.8438
t = 3 | 0.015486 | 0.009553 | 0.009567 | 0.017443 | 0.018127 | 0.8482
t = 4 | 0.016345 | 0.009425 | 0.009434 | 0.020741 | 0.021097 | 0.8367
t = 5 | 0.017941 | 0.009314 | 0.009377 | 0.023593 | 0.023763 | 0.8345
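The gap between the traditional DNN and the proposed model can be recomputed directly from the E_{D^t}/M_t columns of Table 6:

```python
# E_Dt/M_t of the traditional DNN [16] vs the proposed CLN-RLN, per time stamp (Table 6)
dnn  = [0.8371, 0.8438, 0.8482, 0.8367, 0.8345]
ours = [0.010833, 0.009661, 0.009553, 0.009425, 0.009314]

ratios = [d / o for d, o in zip(dnn, ours)]
print([round(r, 1) for r in ratios])  # roughly 77x to 90x lower error
```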
TABLE 7
Performance Comparison
ensemble algorithm is more effective than the general base algorithm in this article. Among the three kinds of ensemble learning methods, RFR has the best fitting effect on this subject. We also provide the training and testing time analysis of the different methods in Table 7. The training time of CLN-RLN is about 1.85 times that of the traditional DNN, while the testing error of CLN-RLN is more than 86 times lower than that of the DNN, which indicates that the cascaded DNN can greatly reduce the test error without increasing the training time too much. The introduction of splitting nodes can further reduce the test error of the model, while the training time increases by about 16 s in comparison with the model without split nodes. We also find that the testing time of the cascaded DNN increases slightly compared with that of the DNN. Except for the model with the DNN structure, the training time of the other models is relatively short, but they have huge testing errors.

Finally, we collect datasets from different eNodeBs and conduct training and testing. Seven proposed models are trained, and each model has a very low RMSE (in the range of 0.00953 to 0.01278), which shows that our proposed model is very effective in regression.

5.4 Computing Overhead
In the parameter settings of the proposed model, we set I_in = 42 and c_m = 6. N_epo is selected based on the minimum number of epochs needed for the training process to converge. According to the results of the split nodes in the hidden layer, we get H_1 = 0 and H_2 = 27, and achieve ΔFLOPs = 5.611K. To highlight the advantages of our proposed algorithm, Table 8 compares the complexity of the DNN-based models in terms of the computations required to complete a single forward pass in the training phase. The expression "FCN (1) + CNN (6)" represents the composition of a cascaded network, in which the Arabic numeral in parentheses denotes the number of specific networks; more specifically, there is 1 FCN in the first-level network and there are 6 CNNs in the second-level network. N^k_output is the number of nodes in the output layer of the kth specific network. It can be seen that the complexity depends on the type, structure and number of DNNs cascaded in each level. In the combinations "FCN (1) + FCN (1) (proposed CLN-RLN)", "FCN (1) + FCN (6) [22], [23]", "FCN (1) + FCN (1) [24]", "FCN (1) + CNN (6) [19]" and "CNN (2) + FCN (6) [20], [21]", the first-level network with FCN (1) or CNN (2) is used to classify the original dataset into several subsets, with the data in each having similar characteristics. Six FCNs or CNNs are employed in the second-level network of "FCN (1) + FCN (6) [22], [23]", "FCN (1) + CNN (6) [19]" and "CNN (2) + FCN (6) [20], [21]" to regress each subset output by the first-level network, which increases the computational complexity compared with our proposed "FCN (1) + FCN (1) (CLN-RLN)". In CLN-RLN, only one FCN (1) is needed to realize the regression of the six subsets. In the model "FCN (1) + FCN (1) [24]", although only one FCN (1) is likewise needed in the second-level network, transfer learning is required to handle the six subsets, which does not reduce the computational overhead because the FCN (1) of the second-level network must be trained six times. Different from the other models, where the first-level network is used for classification, in the combination "CNN (2) + CNN (1) [17], [18]" the first-level network with 2 CNNs performs feature extraction, and one CNN (1) in the second-level network performs regression. The results of "CNN (2) + CNN (1) [17], [18]" and "FCN (1) [16]" show that models without an embedded classification module have an extremely high RMSE, which further demonstrates that the overall regression error can be greatly reduced through appropriate dataset division that yields subsets with small variance. Compared with the other models in Table 8, our proposed method provides good performance in both training time and RMSE.

5.5 Generalization Evaluation
In order to evaluate the generalization capability of the proposed solution in other scenarios, we change the simulation
LIU ET AL.: QOE ASSESSMENT MODEL BASED ON CONTINUOUS DEEP LEARNING FOR VIDEO IN WIRELESS NETWORKS 3631
TABLE 8
The Complexity Analysis of DNN-Based Models
scenarios, and train and test the proposed model on the dataset generated by a new setup. Compared with the previous setup used in the paper, we change simulation parameters such as the eNodeB distribution, inter-site distance, bandwidth, transmit power, number of UEs, UE velocity, electrical tilt and mechanical tilt. 12 cells and 600 UEs with 10 km/h speed are generated and randomly distributed in a square area of 2 km in length and width. Frequency bandwidth and transmit power are reduced by half. Electrical tilt and mechanical tilt are randomly selected from [0°, 8°] and [0°, 10°], respectively.

Similar to the method in the paper, we train the model with data from different eNodeBs separately, and there are 4 models to train for 4 eNodeBs. To simplify the description, we only discuss the dataset of one model. The test error of this model is shown in Table 9. Compared with the results in Section 5.3, we can draw a similar conclusion that our proposed learning framework can greatly reduce the regression error, to as low as 0.00951.

TABLE 9
RMSE of the Proposed Model on Dataset Produced by New Simulation Parameters

6 CONCLUSION AND FUTURE WORK
In this paper, we propose a novel data-driven QoE model, CLN-RLN, based on continual deep learning and designed for use in wireless networks. The CLN-RLN model consists of two DNNs connected in series. The first one (CLN) is used to classify the input data into m categories with high classification accuracy, and the other one (RLN) is applied to obtain the final predicted value. The classification results of the CLN are fed into the RLN as inputs to the corresponding subnetworks, where each subnet is formed dynamically in a new time period with nodes and links adaptively selected. The testing results demonstrate that the dynamically connected network architecture formed for selective training can largely reduce the RMSE. To overcome the catastrophic forgetting problem in continual learning, we introduce Hidden-node Split to prevent model parameters trained on previous datasets from deviating too much, and our results show that this procedure can facilitate the learning of new features to further reduce the learning loss. Our extensive performance evaluations against 7 reference algorithms demonstrate that CLN-RLN can reduce the regression error by 21–86 times.

In the next phase, building on the proposed QoE/QoS mapping model, we will focus on improving the QoE by adjusting network parameters such as azimuth angle, downtilt, transmit power, handover hysteresis and offset.
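The two-stage pipeline summarized above can be sketched as follows. This is a minimal illustration only: `cln_classify`, `RLN_SUBNETS`, and all thresholds and coefficients are hypothetical placeholders standing in for the paper's trained CLN and RLN networks.

```python
# Hypothetical thresholds and coefficients throughout; the real CLN and
# RLN are trained fully connected networks, not these placeholders.

def cln_classify(qos, thresholds=(5, 10, 20, 40, 80)):
    """Stand-in CLN: bucket a QoS feature vector into one of six
    categories by its mean value."""
    score = sum(qos) / len(qos)
    return sum(score > t for t in thresholds)  # category index 0..5

# One regression "subnet" per category, here a linear model (intercept,
# slope) standing in for the adaptively formed RLN subnets.
RLN_SUBNETS = {k: (0.8 - 0.1 * k, 0.05) for k in range(6)}

def predict_qoe(qos):
    k = cln_classify(qos)            # first DNN: select the data subset
    base, slope = RLN_SUBNETS[k]     # second DNN: subset-specific subnet
    return base + slope * (sum(qos) / len(qos))
```

For example, a QoS vector with mean 4 falls into category 0, so the category-0 subnet produces 0.8 + 0.05 × 4 = 1.0; routing each sample to a subset-specific regressor is what keeps the per-subnet variance small.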
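As a back-of-the-envelope aid to the forward-pass complexity comparison of Section 5.4, the multiply-accumulate count of a fully connected cascade can be estimated as below. The layer widths are illustrative assumptions, not the actual CLN-RLN dimensions.

```python
def fcn_forward_flops(layer_sizes):
    """Estimate FLOPs for one forward pass through a fully connected
    network, counting one multiply and one add per weight."""
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical widths: a classifier FCN (42 QoS inputs, 6 categories)
# cascaded with a regressor FCN (42 inputs, 1 QoE value).
cln_cost = fcn_forward_flops([42, 64, 6])   # 2*42*64 + 2*64*6 = 6144
rln_cost = fcn_forward_flops([42, 64, 1])   # 2*42*64 + 2*64*1 = 5504
print(cln_cost + rln_cost)                  # cost of the two-level cascade
```

A "FCN (1) + FCN (6)" cascade would multiply the second term by six, which is why the single-regressor design costs less per forward pass.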
REFERENCES
[1] R. El Hattachi and J. Erfanian, "Next generation mobile networks (NGMN) alliance: 5G white paper," NGMN, Frankfurt, Germany, White Paper, 2015.
[2] M. Seufert, N. Wehner, F. Wamser, P. Casas, A. D'Alconzo, and P. Tran-Gia, "Unsupervised QoE field study for mobile YouTube video streaming with YoMoApp," in Proc. 9th Int. Conf. Qual. Multimedia Experience, 2017, pp. 1–6.
[3] X. Liu, G. Chuai, W. Gao, K. Zhang, and X. Chen, "KQIs-driven QoE anomaly detection and root cause analysis in cellular networks," in Proc. IEEE Globecom Workshops, 2019, pp. 1–6.
[4] S. Chikkerur, V. Sundaram, M. Reisslein, and L. J. Karam, "Objective video quality assessment methods: A classification, review, and performance comparison," IEEE Trans. Broadcasting, vol. 57, no. 2, pp. 165–182, Jun. 2011.
[5] Y. Liu, S. Dey, F. Ulupinar, M. Luby, and Y. Mao, "Deriving and validating user experience model for DASH video streaming," IEEE Trans. Broadcasting, vol. 61, no. 4, pp. 651–665, Dec. 2015.
[6] M. Fiedler, T. Hossfeld, and P. Tran-Gia, "A generic quantitative relationship between quality of experience and quality of service," IEEE Netw., vol. 24, no. 2, pp. 36–41, Mar./Apr. 2010.
[7] M. Vaser and S. Forconi, "QoS KPI and QoE KQI relationship for LTE video streaming and VoLTE services," in Proc. 9th Int. Conf. Next Gener. Mobile Appl., Serv. Technol., 2015, pp. 318–323.
[8] A. Khan, L. Sun, and E. Ifeachor, "Content clustering based video quality prediction model for MPEG4 video streaming over wireless networks," in Proc. IEEE Int. Conf. Commun., 2009, pp. 1–5.
[9] M. Venkataraman and M. Chatterjee, "Inferring video QoE in real time," IEEE Netw., vol. 25, no. 1, pp. 4–13, Jan./Feb. 2011.
[10] ITU, "SG12: Performance QoS and QoE," 2021. [Online]. Available: goo.gl/wnudy8
[11] D. Minovski, C. Ahlund, K. Mitra, and P. Johansson, "Analysis and estimation of video QoE in wireless cellular networks using machine learning," in Proc. 11th Int. Conf. Qual. Multimedia Experience, 2019, pp. 1–6.
[12] A. Herrera-Garcia, S. Fortes, E. Baena, J. Mendoza, C. Baena, and R. Barco, "Modeling of key quality indicators for end-to-end network management: Preparing for 5G," IEEE Veh. Technol. Mag., vol. 14, no. 4, pp. 76–84, Dec. 2019.
[13] V. Pedras, M. Sousa, P. Vieira, M. P. Queluz, and A. Rodrigues, "A no-reference user centric QoE model for voice and web browsing based on 3G/4G radio measurements," in Proc. IEEE Wireless Commun. Netw. Conf., 2018, pp. 1–6.
[14] D. Moura, M. Sousa, P. Vieira, A. Rodrigues, and M. P. Queluz, "A no-reference video streaming QoE estimator based on physical layer 4G radio measurements," in Proc. IEEE Wireless Commun. Netw. Conf., 2020, pp. 1–6.
[15] Y. Ben Youssef, M. Afif, R. Ksantini, and S. Tabbane, "A novel QoE model based on boosting support vector regression," in Proc. IEEE Wireless Commun. Netw. Conf., 2018, pp. 1–6.
[16] X. Tao, Y. Duan, M. Xu, Z. Meng, and J. Lu, "Learning QoE of mobile video transmission with deep neural network: A data-driven approach," IEEE J. Sel. Areas Commun., vol. 37, no. 6, pp. 1337–1348, Jun. 2019.
[17] L. Li, L. G. Wang, F. L. Teixeira, C. Liu, A. Nehorai, and T. J. Cui, "DeepNIS: Deep neural network for nonlinear electromagnetic inverse scattering," IEEE Trans. Antennas Propag., vol. 67, no. 3, pp. 1819–1825, Mar. 2019.
[18] C. Lo, Y.-Y. Su, C.-Y. Lee, and S.-C. Chang, "A dynamic deep neural network design for efficient workload allocation in edge computing," in Proc. IEEE Int. Conf. Comput. Des., 2017, pp. 273–280.
[19] X. Ma and Z. Gao, "Data-driven deep learning to design pilot and channel estimator for massive MIMO," IEEE Trans. Veh. Technol., vol. 69, no. 5, pp. 5677–5682, May 2020.
[20] Z. Zheng, X. Li, Z. Sun, and X. Song, "A novel visual measurement framework for land vehicle positioning based on multimodule cascaded deep neural network," IEEE Trans. Ind. Inf., vol. 17, no. 4, pp. 2347–2356, Apr. 2021.
[21] J. Yu and J. Liu, "Two-dimensional principal component analysis-based convolutional autoencoder for wafer map defect detection," IEEE Trans. Ind. Electron., vol. 68, no. 9, pp. 8789–8797, Sep. 2021.
[22] N. Athreya, V. Raj, and S. Kalyani, "Beyond 5G: Leveraging cell free TDD massive MIMO using cascaded deep learning," IEEE Wireless Commun. Lett., vol. 9, no. 9, pp. 1533–1537, Sep. 2020.
[23] F. N. Khan et al., "Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks," Opt. Express, vol. 25, pp. 17767–17776, Jul. 2017.
[24] J. Zhang, Y. Li, S. Hu, W. Zhang, Z. Wan, Z. Yu, and K. Qiu, "Joint modulation format identification and OSNR monitoring using cascaded neural network with transfer learning," IEEE Photon. J., vol. 13, no. 1, pp. 1–10, Feb. 2021.
[25] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, "Continual lifelong learning with neural networks: A review," Neural Netw., vol. 113, pp. 54–71, 2019.
[26] Z. Li and D. Hoiem, "Learning without forgetting," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 614–629.
[27] J. Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks," Proc. Natl. Acad. Sci. USA, vol. 114, no. 13, pp. 3521–3526, 2017.
[28] A. A. Rusu et al., "Progressive neural networks," 2016, arXiv:1606.04671.
[29] J. Yoon, E. Yang, J. Lee, and S. J. Hwang, "Lifelong learning with dynamically expandable networks," in Proc. 6th Int. Conf. Learn. Representations, 2018, pp. 1–12.
[30] N. Houlsby et al., "Parameter-efficient transfer learning for NLP," in Proc. 36th Int. Conf. Mach. Learn., 2019, vol. 97, pp. 2790–2799.
[31] J. Pfeiffer et al., "AdapterHub: A framework for adapting transformers," in Proc. Conf. Empirical Methods Natural Lang. Process. Syst. Demonstrations, 2020, pp. 46–54.
[32] W. Hsu and C. Lo, "QoS/QoE mapping and adjustment model in the cloud-based multimedia infrastructure," IEEE Syst. J., vol. 8, no. 1, pp. 247–255, Mar. 2014.
[33] E. Danish, A. Fernando, M. Alreshoodi, and J. Woods, "A hybrid prediction model for video quality by QoS/QoE mapping in wireless streaming," in Proc. IEEE Int. Conf. Commun. Workshop, 2015, pp. 1723–1728.
[34] M. Rupp, S. Schwarz, and M. Taranetz, The Vienna LTE-Advanced Simulators (Signals & Communication Technology). Berlin, Germany: Springer, 2016.
[35] 3GPP, "Evolved universal terrestrial radio access (E-UTRA); Radio frequency (RF) system scenarios," ETSI, Sophia Antipolis, France, Tech. Rep. TR 36.942, 2020.
[36] 3GPP, "LTE physical layer framework for performance verification," Orange, China Mobile, KPN, NTT DoCoMo, Sprint, T-Mobile, Vodafone, Telecom Italia, Tech. Rep. R1-070674, 2007.
[37] R. K. P. Mok, E. W. W. Chan, and R. K. C. Chang, "Measuring the quality of experience of HTTP video streaming," in Proc. 12th IFIP/IEEE Int. Symp. Integr. Netw. Manage. Workshops, 2011, pp. 485–492.
[38] H. He, Y. Bai, E. A. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," in Proc. IEEE Int. Joint Conf. Neural Netw. (IEEE World Congr. Comput. Intell.), 2008, pp. 1322–1328.

Xuewen Liu received the MSc degree in optical engineering from the Tianjin University of Technology in 2016. He is currently working toward the PhD degree in information and communications engineering with the Beijing University of Posts and Telecommunications, Beijing, China. From 2019 to 2020, he was a visiting PhD student with the State University of New York at Stony Brook, Stony Brook, NY, USA. His research interests include wireless network fault detection, fault diagnosis, and intelligent network optimization based on multi-agent deep reinforcement learning.

Gang Chuai received the MSc and PhD degrees in information and communications engineering from the Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 1999 and 2010, respectively. He is currently a full professor with the School of Information and Communication Engineering, BUPT. From 2006 to 2007, he was a senior visiting scholar with Drexel University, Philadelphia, PA, USA. His research interests include wireless communications networking technology, intelligent network optimization, and wireless network positioning theory.
Xin Wang (Member, IEEE) received the BS and MS degrees in telecommunications engineering and wireless communications engineering, respectively, from the Beijing University of Posts and Telecommunications, Beijing, China, and the PhD degree in electrical and computer engineering from Columbia University, New York, NY, USA. She is currently an associate professor with the Department of Electrical and Computer Engineering of the State University of New York at Stony Brook, Stony Brook, NY, USA. Her research interests include algorithm and protocol design in wireless networks and communications, mobile and distributed computing, and networked sensing and detection. She has served on the executive and technical committees of numerous conferences and on funding review panels. She is an associate editor for the IEEE Transactions on Mobile Computing. She was the recipient of the NSF CAREER Award in 2005 and the ONR Challenge Award in 2010.

Weidong Gao received the PhD degree in 2009 from the Beijing University of Posts and Telecommunications (BUPT), China. From 2009 to 2015, he was a senior engineer with Potevio Company Ltd., engaged in the research of LTE standards and algorithms. He is currently an associate professor with the School of Information and Communication Engineering, BUPT. His research interests include next-generation wireless communication, signal and information processing, and the Internet of Things.