
QoE-Driven Integrated Heterogeneous Traffic Resource Allocation Based on Cooperative Learning for 5G Cognitive Radio Networks
Fatemeh Shah Mohammadi and Andres Kwasinski
Rochester Institute of Technology, Rochester, New York 14623, USA
{fs2213, axkeec}@rit.edu

Abstract: Since end-user quality measurement plays an ever-increasing role in the development of wireless communications toward the 5G era, the mean opinion score (MOS) has become a widely used metric, not only because it reflects the subjective quality experienced by end users but also because it provides a common quality assessment metric for traffic of different types. This paper presents a distributed underlay dynamic spectrum access (DSA) scheme based on MOS that performs integrated traffic management and resource allocation across traffic of dissimilar characteristics (real-time video and data traffic). The presented scheme uses reinforcement learning to maximize the overall MOS in a system where primary users coexist with secondary users accessing the same frequency band, while satisfying a total interference constraint to the primary users. The use of MOS as a common metric allows teaching between nodes carrying different traffic without reducing performance. Accordingly, the docitive paradigm is applied to the presented scheme to investigate the impact of different docition scenarios on the overall MOS, in which a newcomer node is taught by experienced peers with similar and dissimilar traffic. Simulation results show that docition reduces the number of iterations required for convergence by approximately 65% while keeping the overall MOS above the acceptable level (MOS > 3) for different secondary network loads. Regarding docition between nodes with similar and dissimilar traffic, simulation results show that all docition scenarios achieve the same performance in terms of MOS.

I. INTRODUCTION

The evolution of wireless communications towards the 5G era involves a transformation in network design and evaluation that aims at placing the end user at the center of every decision. As a result, resource management techniques for 5G networks need to be based on Quality of Experience (QoE) performance assessment [1]. The shift in performance assessment from objective Quality-of-Service (QoS) metrics to subjective end-user QoE metrics has been aided by a number of studies in multimedia quality assessment, which have contributed techniques that estimate QoE from objective measurements while retaining high correlation with the subjective perception of quality. Among these techniques, the mean opinion score (MOS), a rating from 1 (bad) to 5 (excellent), is the most widely used QoE metric [2]. Importantly, by providing a single common measuring scale for different types of traffic, MOS provides the means to perform integrated traffic management and resource allocation across traffic of dissimilar characteristics (e.g., real-time video and data) [3].

The design of new resource management techniques based on QoE performance metrics still faces the challenge posed by the ever-present constraints on radio spectrum resources. The cognitive radio paradigm is a powerful technology for effective and adaptive spectrum use that addresses this challenge. With a view to future 5G scenarios in which heterogeneous networks share a common spectrum band, this work studies cognitive QoE-based resource management for the seamless integration of dissimilar traffic in an underlay DSA setting, where primary users (PUs) and secondary users (SUs) are allowed to transmit simultaneously over the same frequency band as long as the interference from the SUs to the PUs remains below a limit [4].

Following the transformation towards end-user-centric network design for future 5G scenarios, a number of works have studied cognitive radio (CR) techniques based on QoE performance metrics. The works [5] and [6] focused on QoE provisioning in CR systems with multiple antennas. While [6] derived closed-form expressions for three QoE indicators, [5] focused on SU satisfaction only from the delay perspective. The authors in [1] proposed a QoE-driven spectrum handoff scheme and applied reinforcement learning (RL) to maximize the quality of video transmissions in the long term. In this paper, we consider the scenario of multiple SUs accessing a single spectrum band to transmit real-time video or regular data traffic. Our proposed underlay DSA technique adapts the transmit rate (and accordingly the modulation scheme) and the allocated transmit powers of all SUs so as to maximize the average QoE across all active real-time video or regular data transmissions in the secondary network. Our main contribution in this paper is the application of MOS as the common measurement scale of end-user QoE for all types of traffic, allowing seamless integration of dissimilar traffic (real-time video and data).

Our proposed underlay resource allocation technique needs to meet the two conflicting goals of maximizing the average QoE while satisfying the interference constraint at the PU. This resource allocation problem is solved through discrete-time Markov decision process (DTMDP) modeling and the use of reinforcement learning (RL). RL [7] has been shown to be an effective solution for resource allocation in communication networks. The RL agent can generate near-optimal solutions guided by the immediate rewards obtained from its interactions with the environment. Through these rewards, the RL agent pursues a long-term optimization goal, which is important for dynamic systems such as wireless networks. The use of a common QoE metric (MOS) for dissimilar traffic also raises important new questions. One such question is how the learned environment-action adaptation experience differs between cognitive radios carrying different types of traffic. A novel contribution of this
work is to study this question by examining the performance differences between teaching nodes that carry the same and different types of traffic, using the idea of a docitive radio [8], which allows newcomer SUs to learn from their more expert peers to improve the learning process. For this, we examine different docition scenarios in which a newly joined SU is taught by different groups of SUs with similar and dissimilar traffic, and we investigate the impact of these scenarios on the overall QoE. It will be shown that the docitive approach reduces the number of iterations by approximately 65%, with no performance difference whether learning is from teachers carrying the same or a different type of traffic.

II. SYSTEM MODEL

We assume two wireless networks, a primary network (PN) and a secondary network (SN). The PN, consisting of a single primary link, shares a single channel with the SN at a given time instant under the typical underlay DSA technique. The SN serves $N$ SUs randomly located around a secondary base station (SBS). All secondary and primary links transmit using an adaptive modulation and coding (AMC) scheme. Every SU adapts its transmit parameters in order to satisfy the interference requirements from the PU and the other SUs. The traffic carried over the SN links is real-time/streaming video and regular data. The transmission channel is assumed to be quasi-static with additive white Gaussian noise. The PU adopts AMC, while its transmit power is assumed to be constant. Under this assumption, the SUs can infer the channel state information (CSI) through active learning [9] and then estimate the channel gains. We design the underlay DSA technique based on Signal-to-Interference-plus-Noise Ratio (SINR) requirements, which are the translation of the interference requirements. The SINR at the primary base station (PBS), $\mathrm{SINR}^{(p)}$, and the SINR of the $i$th SU at its corresponding SBS, $\mathrm{SINR}_i^{(s)}$, are expressed as:

$$\mathrm{SINR}^{(p)} = \frac{G_0^{(p)} P_0}{\sigma^2 + \sum_{j=1}^{N} G_j^{(p)} P_j}, \tag{1}$$

$$\mathrm{SINR}_i^{(s)} = \frac{G_i^{(s)} P_i}{\sigma^2 + G_0^{(s)} P_0 + \sum_{j \neq i} G_j^{(s)} P_j}, \tag{2}$$

where $P_0$ is the (constant) PU transmit power, $P_j$ is the $j$th SU's transmit power, $G_0^{(p)}$ is the channel gain between the PU and the PBS, $G_j^{(p)}$ is the $j$th SU's channel gain to the PBS, $G_0^{(s)}$ is the channel gain between the PU and the SBS, $r_i^{(s)}$ is the $i$th SU's transmit rate, $P_i$ is the $i$th SU's transmit power, $G_i^{(s)}$ is the $i$th SU's channel gain to the SBS, and $\sigma^2$ is the noise power. To meet the underlay DSA and QoE goals, we impose the following constraints on the primary and secondary SINRs:

$$\begin{cases} \mathrm{SINR}^{(p)} \ge \beta_0, & \\ \mathrm{SINR}_i^{(s)} \ge \beta_i, & i = 1, \dots, N, \end{cases} \tag{3}$$

where $\beta_0$ and $\beta_i$ are the primary and secondary SINR thresholds, respectively. When the secondary SINR constraints in (3) hold with equality, the transmit power allocated to each SU can be derived as [10]

$$P_i = \frac{\Psi_i \left(\sigma^2 + G_0^{(s)} P_0\right)}{G_i^{(s)} \left(1 - \sum_{j=1}^{N} \Psi_j\right)}, \quad \Psi_i = \left(1 + \frac{1}{\beta_i}\right)^{-1}, \quad i = 1, \dots, N. \tag{4}$$

A valid power allocation requires $1 - \sum_{i=1}^{N} \Psi_i > 0$. Substituting the SU powers from (4) into (1), the primary constraint in (3) can be rewritten as

$$\sum_{j=1}^{N} \alpha_j \Psi_j \le 1, \tag{5}$$

where

$$\alpha_j = \frac{G_j^{(p)} \left(\sigma^2 + G_0^{(s)} P_0\right)}{G_j^{(s)} \left(G_0^{(p)} P_0 / \beta_0 - \sigma^2\right)} + 1. \tag{6}$$

Since $\beta_0$ is assumed to be constant, $\beta_i$ needs to be adjusted at each SU in order to satisfy the validity condition of (4) and the constraint (5). This adjustment can be done by adapting the transmit bit rate. Based on [11], for the considered system setup the relation between the transmit bit rate and the SU threshold SINR can be written as

$$r_i^{(s)} = W \log_2\left(1 + k \beta_i\right), \tag{7}$$

where $W$ is the channel bandwidth, $M(\beta_i) = 1 + k\beta_i$ is the modulation constellation size, which in practice takes only a small number of integer values (so that $\log_2 M(\beta_i)$ is the number of bits per modulation symbol), and $k = -1.5/\ln(5\,\mathrm{BER})$ is a constant related to the target maximum transmit bit error rate (BER). In the proposed underlay DSA scheme, each SU selects its target SINR $\beta_i$, and accordingly $r_i^{(s)}$, so that all SUs cooperatively meet the SINR constraints in (3) and (5). Adjusting $\beta_i$, and consequently $r_i^{(s)}$, adjusts the modulation scheme.

Our optimization task is to maximize the network performance metric while satisfying a total interference constraint to the primary user. As the representative of end-user-centric quality assessment, QoE is gaining significant attention as we move toward the 5G era. Consequently, we adopt QoE as the network performance metric to assess the quality of the delivered traffic. Among the metrics used to model QoE, MOS was chosen because it is the most widely used. Thus, the network is optimized based on the average QoE of all video and data sessions transmitted by SUs. In the following, we present the MOS formulas used in this paper to quantify the quality of delivered data and video traffic.

A. Data MOS Model

Based on [3], the MOS for data traffic is calculated as follows:

$$Q_D = a \log_{10}\left(b\, r_i^{(s)} \left(1 - p_{e2e}\right)\right), \tag{8}$$

where $Q_D$, $p_{e2e}$ and $r_i^{(s)}$ are the data traffic MOS, the end-to-end packet loss probability and the data transmit bit rate, respectively. The parameters $a$ and $b$ are calculated from the maximum and minimum data quality perceived by the end user: if a user transmits at rate $R$ and its effective receive rate is also $R$, the packet loss rate is zero and the quality perceived by the end user should be the maximum MOS, that is, 5, while a MOS of 1 is assigned to a minimum transmission rate. In this work, $a = 1.3619$ and $b = 0.6780$.
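The closed-form allocation (4) and the primary-side condition (5)-(6) lend themselves to a quick numerical check. The Python sketch below is ours, not the authors' implementation: it computes the SU powers for given linear SINR targets, evaluates (1) and (2) directly, and compares the condition (5) with a direct test of the primary SINR. The helper names (`su_powers`, `alpha_coeffs`, `primary_ok`, and so on) are hypothetical, and the channel gains in the demo are made-up values; only the noise power, PU power and $\beta_0$ follow Section V.

```python
def psi(beta):
    """Psi_i = (1 + 1/beta_i)^(-1), written as beta_i / (1 + beta_i)."""
    return beta / (1.0 + beta)


def su_powers(betas, g_s, g0_s, p0, noise):
    """SU powers from (4); None when 1 - sum(Psi) <= 0 (no valid allocation)."""
    psis = [psi(b) for b in betas]
    slack = 1.0 - sum(psis)
    if slack <= 0.0:
        return None
    base = noise + g0_s * p0  # sigma^2 + G_0^(s) P_0
    return [ps * base / (g * slack) for ps, g in zip(psis, g_s)]


def alpha_coeffs(g_p, g_s, g0_p, g0_s, p0, noise, beta0):
    """alpha_j from (6); requires G_0^(p) P_0 / beta_0 > sigma^2."""
    margin = g0_p * p0 / beta0 - noise
    if margin <= 0.0:
        raise ValueError("PU cannot reach beta_0 even without SU interference")
    base = noise + g0_s * p0
    return [gp * base / (gs * margin) + 1.0 for gp, gs in zip(g_p, g_s)]


def primary_ok(betas, alphas):
    """Primary constraint in the form (5): sum_j alpha_j Psi_j <= 1."""
    return sum(a * psi(b) for a, b in zip(alphas, betas)) <= 1.0


def sinr_primary(powers, g_p, g0_p, p0, noise):
    """Direct evaluation of (1)."""
    return g0_p * p0 / (noise + sum(g * p for g, p in zip(g_p, powers)))


def sinr_secondary(i, powers, g_s, g0_s, p0, noise):
    """Direct evaluation of (2) for SU i."""
    others = sum(g * p for j, (g, p) in enumerate(zip(g_s, powers)) if j != i)
    return g_s[i] * powers[i] / (noise + g0_s * p0 + others)


if __name__ == "__main__":
    noise, p0, beta0 = 1e-9, 10e-3, 10.0  # 1 nW, 10 mW, 10 dB (Sec. V)
    g0_p, g0_s = 1e-5, 1e-8               # hypothetical PU gains
    g_s = [1e-6, 5e-7, 2e-7]              # hypothetical SU-to-SBS gains
    g_p = [1e-9, 2e-9, 5e-10]             # hypothetical SU-to-PBS gains
    betas = [0.5, 0.3, 0.2]               # linear SINR targets
    powers = su_powers(betas, g_s, g0_s, p0, noise)
    alphas = alpha_coeffs(g_p, g_s, g0_p, g0_s, p0, noise, beta0)
    print([sinr_secondary(i, powers, g_s, g0_s, p0, noise) for i in range(3)])
    print(primary_ok(betas, alphas),
          sinr_primary(powers, g_p, g0_p, p0, noise) >= beta0)
```

Because (5) is (1) rewritten under the allocation (4), `primary_ok` and the direct test on (1) should always agree, and each secondary SINR printed should equal its target up to floating-point rounding.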

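The rate-SINR relation (7) and the data MOS (8) can be evaluated over the candidate SINR set of Section V with a few lines of Python. This is an illustrative sketch with hypothetical function names: the target BER of $10^{-3}$ is our assumption (the text does not give one), and the code passes the rate to (8) unchanged because the text does not state the unit in which $r_i^{(s)}$ enters (8).

```python
import math

ACTIONS_DB = [-5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15]  # SINR target set of Sec. V, in dB


def db_to_linear(x_db):
    return 10.0 ** (x_db / 10.0)


def k_from_ber(ber):
    """k = -1.5 / ln(5 BER); positive only for BER < 0.2."""
    return -1.5 / math.log(5.0 * ber)


def constellation_size(beta, ber):
    """M(beta_i) = 1 + k beta_i; log2 of it is the number of bits per symbol."""
    return 1.0 + k_from_ber(ber) * beta


def su_rate(beta, bandwidth, ber):
    """Transmit rate (7): r_i^(s) = W log2(1 + k beta_i), beta_i in linear scale."""
    return bandwidth * math.log2(constellation_size(beta, ber))


def data_mos(rate, p_e2e, a=1.3619, b=0.6780):
    """Data MOS (8): Q_D = a log10(b r (1 - p_e2e))."""
    return a * math.log10(b * rate * (1.0 - p_e2e))


if __name__ == "__main__":
    W, BER = 10e6, 1e-3  # 10 MHz from Sec. V; the BER target is assumed
    for x in ACTIONS_DB:
        bits = math.log2(constellation_size(db_to_linear(x), BER))
        rate = su_rate(db_to_linear(x), W, BER)
        print(f"{x:>4} dB: {bits:4.2f} bit/symbol, {rate / 1e6:6.2f} Mb/s")
```

With $W$ = 10 MHz the rates from (7) come out in bit/s; whether they must be rescaled (for instance to kb/s) before entering (8) depends on how $a$ and $b$ were calibrated, which the text does not specify.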
B. Video MOS Model

Peak signal-to-noise ratio (PSNR) is commonly accepted as an objective measure of video coding performance. However, it is known that PSNR does not accurately reflect subjective human perception of video quality [12]. A wide variety of techniques have been proposed to estimate user satisfaction for video applications (a survey of video quality assessment can be found in [2]), among which [13] proposed a simple linear mapping between PSNR and MOS, shown in Fig. 1, which assigns a MOS of 4.5 to a PSNR of 40 dB and a MOS of 1 to a PSNR of 20 dB. These limits arise from the fact that received video sequences with a PSNR of 40 dB are almost indistinguishable from the transmitted ones, while those below 20 dB exhibit very severe quality degradation [13]. The work in [14] presented the heuristic mapping from PSNR to MOS shown in Table I, while, according to Recommendation ITU-R BT.500-13 [15], the relationship between MOS and an objective measure of picture distortion has a sigmoid shape. Consequently, [16] argued that if picture distortion is measured with an objective metric, e.g., PSNR (dB), a logistic function can be used to characterize the relation between MOS and PSNR, as follows:

$$Q_V = \frac{c}{1 + \exp\left(d\,(\mathrm{PSNR} - f)\right)}, \tag{9}$$

where $Q_V$ denotes the video MOS and $c$, $d$ and $f$ are the parameters of the logistic function. In this paper, we use the logistic function to evaluate the quality of video traffic. To compute the parameters of (9), an array of PSNR values and corresponding MOS values is needed.

[Fig. 1. MOS versus PSNR (MOS from 1 to 5 versus PSNR from 15 to 45 dB).]

TABLE I
POSSIBLE PSNR TO MOS CONVERSION [14]
PSNR [dB]   MOS
> 37        5 (Excellent)
31-37       4 (Good)
25-31       3 (Fair)
20-25       2 (Poor)
< 20        1 (Bad)

Since the PSNR of reconstructed video varies with bit rate according to the characteristics of the video sequence itself, we averaged the PSNR-bit rate functions of multiple MPEG-4 coded video sequences at resolutions 240p, 360p and 480p to obtain one average PSNR-bit rate curve. The video sequences were combined in the respective proportions 39%, 32% and 28%, the same proportions used in [17]. At 480p, the sequences "Flowervase" and "Race Horses" at 30 frames per second (fps) were used, while at 360p "Tennis" and "Park Scene" at 24 fps were selected. It was observed that a function of the form $\mathrm{PSNR} = k \log r_i^{(s)} + p$ very closely approximates the average PSNR-bit rate curve, where $r_i^{(s)}$ is the video transmission bit rate and $k$ and $p$ are constants (this $k$ is distinct from the constant $k$ in (7)). In this work, $k = 10.4$ and $p = -28.7221$. To obtain the parameters of the logistic function (9), we first computed PSNR values for an array of video bit rates (themselves computed through (7) for all candidate actions) using the combined PSNR-bit rate curve. To obtain MOS values corresponding to the computed PSNR values, we used the linear mapping from PSNR to MOS. We also used the table-based conversion to obtain another array of MOS values for the same PSNR values. We then averaged the two MOS arrays to obtain the final MOS array. From the PSNR values and their corresponding MOS values, the parameters of the logistic function were obtained: in this work, $c = 6.6431$, $d = -0.1344$ and $f = 30.4264$.

Since MOS, as a common quality assessment metric for all types of traffic, allows resource allocation for all traffic types to be performed in an integrated way, the MOS values of all video and data sessions can be combined into the network average

$$\frac{1}{N}\left(\sum_{i=1}^{U} Q_D^{(i)} + \sum_{i=U+1}^{N} Q_V^{(i)}\right),$$

where $U$ is the number of SUs transmitting data, while the remaining $N - U$ SUs transmit streaming video. By adjusting $\beta_i$, and consequently $r_i^{(s)}$, each SU seeks a power assignment that not only satisfies the interference thresholds in (3) but also maximizes the network performance metric (MOS).

III. PROBLEM FORMULATION FOR INDIVIDUAL SU Q-LEARNING

The reinforcement learning (RL) approach requires the definition of a set of states $S$, a set of actions $A$ and a reward function representing the effect of the selected action on the environment, which guides each learning agent in choosing its next action from $A$. The same sets of actions and states are assumed for all SUs. Each SU searches the finite discrete space of candidate target SINRs, denoted by $A^{(i)} = \{\beta_1^{(i)}, \dots, \beta_n^{(i)}\}$, for the optimal solution, which not only meets the constraints in (3) but also results in a better MOS, through a CR-based reinforcement learning approach. As the RL method, a Q-learning algorithm [18] is adopted in this paper to solve the resource allocation problem. The Q-learning algorithm considers the environment as a finite-state, discrete-time stochastic dynamical system. The learning agent observes its current state $s \in S$ and accordingly takes an action $\pi(s) \in A$ under a certain policy $\pi$, which yields a scalar immediate reward $R_t^{(i)}$. The problem is then to find a policy that maximizes the received discounted reward $V$ with a discount factor $\gamma$ ($0 < \gamma < 1$) [18]. While choosing one strategy from $A^{(i)}$, each SU adapts its transmit power and other related transmission parameters, then observes the changes in the system as well as in its own transmission. Each SU seeks an optimal policy that maximizes its own MOS while the SINR constraints (3) are satisfied.
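The video MOS pipeline of Section II-B (the PSNR-rate fit, the linear map of [13], Table I of [14], their average as the fitting target, and the logistic model (9)) can be sketched as below, together with the network-average MOS over $U$ data and $N - U$ video sessions. The function names are hypothetical. The base-10 logarithm and bit/s units in the PSNR fit are our assumptions (they give PSNR values of roughly 23 to 44 dB for 100 kb/s to 10 Mb/s), as are the clamping of the linear map outside 20-40 dB and the handling of Table I boundary values.

```python
import math

K_FIT, P_FIT = 10.4, -28.7221        # constants k and p of the PSNR-rate fit
C, D, F = 6.6431, -0.1344, 30.4264   # logistic parameters c, d, f of (9)


def psnr_from_rate(rate_bps):
    """PSNR = k log r + p (base-10 log and bit/s are assumptions)."""
    return K_FIT * math.log10(rate_bps) + P_FIT


def mos_linear(psnr):
    """Linear map of [13]: MOS 1 at 20 dB, 4.5 at 40 dB, clamped outside."""
    psnr = min(max(psnr, 20.0), 40.0)
    return 1.0 + 3.5 * (psnr - 20.0) / 20.0


def mos_table(psnr):
    """Table I of [14]; 31, 25 and 20 dB go to the higher class (our choice)."""
    if psnr > 37.0:
        return 5.0
    if psnr >= 31.0:
        return 4.0
    if psnr >= 25.0:
        return 3.0
    if psnr >= 20.0:
        return 2.0
    return 1.0


def target_mos(psnr):
    """Average of the two maps, used as the fitting target for (9)."""
    return 0.5 * (mos_linear(psnr) + mos_table(psnr))


def video_mos(psnr, c=C, d=D, f=F):
    """Logistic video MOS (9): Q_V = c / (1 + exp(d (PSNR - f)))."""
    return c / (1.0 + math.exp(d * (psnr - f)))


def network_mos(data_scores, video_scores):
    """Average MOS over U data sessions and N - U video sessions."""
    n = len(data_scores) + len(video_scores)
    return (sum(data_scores) + sum(video_scores)) / n


if __name__ == "__main__":
    for rate in (2e5, 1e6, 5e6):
        p = psnr_from_rate(rate)
        print(f"{rate / 1e6:4.1f} Mb/s  PSNR {p:5.2f} dB  "
              f"target {target_mos(p):4.2f}  Q_V {video_mos(p):4.2f}")
```

The least-squares fit of $c$, $d$ and $f$ to the (PSNR, target MOS) pairs is omitted here; any nonlinear curve-fitting routine could be used for that step.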

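The individual learning of Algorithm 1 (states (10)-(11), reward (12), update (15)) and the docitive initialization (17) of Section IV can be sketched as follows. This is a simplified single-process illustration resting on several assumptions of ours: SUs act one at a time within each step, Q-table entries still at their initial value 0 are tried before acting greedily (our reading of the exploration phase attributed to $Q_0 = 0$), and the penalty $M = -1$, the $\alpha_j$ values and the MOS-versus-SINR curve in the demo are hypothetical. The learning rate and discount factor follow Section V; the learning rate is called `lr` to keep it separate from the coefficients $\alpha_j$ of (6).

```python
ACTIONS_DB = [-5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15]  # candidate SINR targets in dB (Sec. V)
BETAS = [10 ** (x / 10) for x in ACTIONS_DB]          # the same targets in linear scale
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]             # (I_t, L_t) pairs of (10)-(11)


def psi(beta):
    return beta / (1.0 + beta)


def new_q_table():
    """Q_0 = 0 for every (state, action) entry."""
    return {s: [0.0] * len(BETAS) for s in STATES}


def observe_state(betas, alphas):
    """State (I_t, L_t) from (10)-(11) for the current targets of all SUs."""
    i_t = 0 if sum(psi(b) for b in betas) < 1 else 1
    l_t = 0 if sum(a * psi(b) for a, b in zip(alphas, betas)) <= 1 else 1
    return (i_t, l_t)


def reward(state, mos, penalty):
    """Reward (12): the constant M (penalty) on any violation, else the SU's MOS."""
    return penalty if sum(state) > 0 else mos


def choose_action(row):
    """Try entries still at the initial value 0 first, then act greedily."""
    untried = [a for a, q in enumerate(row) if q == 0.0]
    if untried:
        return untried[0]
    return max(range(len(row)), key=row.__getitem__)


def individual_learning(q_tables, alphas, mos_fns, t_max=300,
                        lr=0.1, gamma=0.4, penalty=-1.0):
    """Sketch of Algorithm 1; returns each SU's last chosen action index."""
    n = len(q_tables)
    actions = [0] * n
    states = [observe_state([BETAS[0]] * n, alphas)] * n
    for _ in range(t_max):
        for i in range(n):
            row = q_tables[i][states[i]]
            a = choose_action(row)
            actions[i] = a
            s_next = observe_state([BETAS[k] for k in actions], alphas)
            r = reward(s_next, mos_fns[i](BETAS[a]), penalty)
            # Update (15): (1 - lr) * old value + lr * (reward + gamma * best next value)
            row[a] = (1 - lr) * row[a] + lr * (r + gamma * max(q_tables[i][s_next]))
            states[i] = s_next
    return actions


def docition_table(q_tables, teachers):
    """Newcomer Q-table as the entry-wise mean of the teachers' tables, as in (17)."""
    teachers = list(teachers)
    return {s: [sum(q_tables[j][s][a] for j in teachers) / len(teachers)
                for a in range(len(BETAS))]
            for s in STATES}


if __name__ == "__main__":
    n = 4
    alphas = [1.05] * n  # hypothetical alpha_j values from (6)

    def toy_mos(beta):
        # Hypothetical MOS curve that increases with the SINR target
        return 1.0 + 4.0 * psi(beta)

    tables = [new_q_table() for _ in range(n)]
    print("before newcomer:", individual_learning(tables, alphas, [toy_mos] * n))
    # Algorithm 2: the newcomer is taught by all existing SUs, then everyone restarts
    tables.append(docition_table(tables, teachers=range(n)))
    print("after newcomer:", individual_learning(tables, alphas + [1.05], [toy_mos] * (n + 1)))
```

Passing a subset of SU indices as `teachers` corresponds to the similar-traffic, dissimilar-traffic, nearest-neighbour and random-neighbour variants compared in Section V; passing all existing SUs gives (17).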
The states, $S_t = (I_t, L_t)$, are defined to reflect the interference caused by the SUs, where

$$I_t = \begin{cases} 0, & \text{if } \sum_{i=1}^{N} \Psi_i\big(\beta_t^{(i)}\big) < 1, \\ 1, & \text{otherwise}, \end{cases} \tag{10}$$

$$L_t = \begin{cases} 0, & \text{if } \sum_{i=1}^{N} \alpha_i \Psi_i\big(\beta_t^{(i)}\big) \le 1, \\ 1, & \text{otherwise}. \end{cases} \tag{11}$$

The reward function is also defined as a function of the state and the local action,

$$R_t^{(i)}(a_t, s_t) = \begin{cases} M, & \text{if } I_{t+1} + L_{t+1} > 0, \\ Q^{(i)}_{D\,\mathrm{or}\,V}, & \text{otherwise}, \end{cases} \tag{12}$$

where $M$ is a constant, smaller than the reward of any other strategy, received when an unsuccessful action results in a violation of the interference constraints (3); this reward constant is unrelated to the constellation size $M(\beta_i)$ in (7). When the interference constraints are satisfied, the MOS of the received traffic, either video ($Q_V$) or data ($Q_D$), is taken as the immediate reward. We also assume that each SU knows neither the other SUs' actions nor the effect of joint actions on the states, and considers the other SUs as part of the environment. The SUs then repeatedly make their decisions and finally obtain their optimal policies, which maximize the expected sum of discounted rewards:

$$V_i(s, \pi) = \sum_{t=0}^{\infty} \gamma^t \, E\big(R_t^{(i)} \mid \pi, s_0 = s\big), \quad i = 1, 2, \dots, N, \tag{13}$$

where $\pi$ is the local strategy and $s_0 = s$ is the initial state. According to Bellman's principle of optimality [19], the solution to (13) can be obtained by taking the optimal action, provided that all strategies thereafter are optimal:

$$V_i^*(s, \pi^*) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} p(s' \mid s, a)\, V_i(s', \pi^*) \Big]. \tag{14}$$

Further, $V_i^*(s, \pi^*)$ in (14) can be approached by the Q-function, which is updated as follows:

$$Q_{t+1}^{i}(s, a_t^{i}) = (1 - \alpha_t)\, Q_t^{i}(s, a_t^{i}) + \alpha_t \big[ R_t^{i}(s, a_t^{i}) + \gamma\, Q_t^{i*}(s') \big], \tag{15}$$

where $\alpha_t$ is the learning rate, $0 < \alpha_t(s, a) < 1$ (unrelated to the coefficients $\alpha_j$ in (6)), and $Q_t^{i*}(s') = \max_{b} Q_t^{i}(s', b)$ is the $i$th SU's maximum Q-value in the new state $s'$ reached after $a_t^{i}$ is taken.

Consequently, the problem of finding a transmit power and the corresponding bit rate becomes that of finding the optimal $\hat{\beta}_i$, and the optimal secondary network spectrum use problem is formulated as:

$$\{\hat{\beta}_i\} = \arg\max_{\beta_i} \sum_{i=1}^{N} Q^{(i)}_{D\,\mathrm{or}\,V}(\beta_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} \Psi_i(\beta_i) \le 1 - \epsilon, \quad \sum_{j=1}^{N} \alpha_j \Psi_j(\beta_j) \le 1, \tag{16}$$

with a small $\epsilon > 0$ enforcing the strict validity condition of (4). In order to maximize the average quality of all received traffic, the quality of the data or video received at the SU receivers needs to be summed. As mentioned earlier, and as the contribution of this paper, MOS is used as the QoE metric because it provides a common quality assessment metric for all types of traffic. Algorithm 1 summarizes the steps to be taken by each SU to implement the individual learning mechanism. It should be noted that, because all SUs are initialized with $Q_0^i = 0$, the algorithm first performs an initial exploration phase, in which each Q-table entry is visited once, and then goes through an exploitation phase.

Algorithm 1 Individual learning for resource allocation
Initialization: $Q_0 = 0$ for all SUs
for time $t < t_{max}$ do
    for all $\mathrm{SU}_i$, $i = 1, \dots, N$ do
        Select the action $a_t^{(i)} = \arg\max_{a} Q^{(i)}(s_t^{(i)}, a)$
        Update the state $s_{t+1}^{(i)}$, (10), (11), and the reward $R_t^{(i)}$, (12)
        Update the Q-value $Q_t^{(i)}(s_t, a_t)$, (15)
    end for
end for

IV. SU DOCITION-BASED COOPERATIVE LEARNING

As mentioned in the previous section, the Q-learning algorithm iteratively develops (learns) a Q-table that stores the expected discounted reward of each action. Each cognitive SU first learns about its surrounding environment, then chooses the action associated with the largest value, obtains the reward of the selected action by running the Q-learning algorithm, and finally updates its Q-table based on the received immediate reward. Therefore, the Q-table reflects the effects of the actions on the wireless environment. Because part of the wireless environment involves the interference created by each SU to the rest of the system, the Q-table reflects both the individual local wireless environment of each SU and the collective interrelation between the system components [20]. When an SU joins an already trained system, the environment changes only slightly, so it is inefficient to rerun the cognitive cycle and disregard the awareness of the environment captured by the SUs already in the system. Hence, this awareness of the environment, which is reflected in the Q-tables, can be taught to the newly joined SU in order to decrease the learning time and also improve the learning performance. This paradigm is called docitive radio. While the emphasis in CR is on learning, docitive radio focuses on teaching. Under the docitive paradigm, nodes with more "experience" in solving a specific system problem teach less able nodes so as to decrease the learning time and improve the learning performance [21]. As a result, we introduce a docitive approach in which the SUs already in the network initialize their cognitive cycle with their own Q-tables, already learned through Algorithm 1, and the newcomer SU initializes its Q-table by averaging the Q-tables of the existing users:

$$Q^{c} = \frac{1}{N} \sum_{i=1}^{N} Q^{(i)}. \tag{17}$$

In the next section, we examine the effect on the learning performance of initializing the Q-table of the joining SU by averaging the Q-tables of SUs with similar and dissimilar traffic types.

The docitive mechanism is shown in Algorithm 2. Moreover, a low-bit-rate control channel is assumed to indicate when a newcomer (less experienced node) joins the network.

V. SIMULATION RESULTS

The performance of the presented CR resource allocation algorithm was studied through Monte Carlo simulation. A primary network consisting of one PU accessing a single channel with a bandwidth of 10 MHz was assumed. The target SINR
for the PU is set to 10 dB. The Gaussian noise power and the PU transmit power are set to 1 nW and 10 mW, respectively. SUs and PUs are distributed randomly around their respective base stations within circles of radius 200 m and 1000 m, respectively. Channel gains follow a log-distance path loss model with a path loss exponent of 2.8. Each SU chooses its SINR target from the finite set $\{-5, -3, -1, 1, 3, 5, 7, 9, 11, 13, 15\}$ dB. The SUs transmit using BPSK or QAM modulation (depending on the SINR to which the algorithm converges). For the learning algorithm, the same learning rate $\alpha = 0.1$ and discount factor $\gamma = 0.4$ are assumed for all SUs.

Algorithm 2 Cooperative Learning (Docition)
(Run when a new SU joins the network)
Add one new SU as $\mathrm{SU}_{N+1}$ and initialize $Q_0^{(N+1)}$ with $Q^c$ following (17)
for all $\mathrm{SU}_i$, $i = 1, \dots, N+1$ do
    Restart individual learning (Algorithm 1) with the existing $N+1$ Q-tables
end for

Performance is evaluated, as a function of the number of SUs in the network, in terms of the average MOS of the SUs at the convergence point, the congestion rate in the SN, and the average total number of iterations needed for Algorithms 1 and 2 to converge. In the evaluation, the congestion rate is the percentage of cases in which one or both of the SINR constraints would be violated if all SUs were to remain within an acceptable level of distortion. The maximum number of SUs is set to 25 because, as can be seen in Fig. 2, this number of SUs still results in an average MOS above 3, which is considered an acceptable MOS level in terms of end-user quality perception.

The performance of six different systems was compared in the simulations, all of them performing a physical-layer CR adaptation learning technique: a system called "Newcomer-Individual Learning", in which all SUs perform the individual learning of Algorithm 1, and five other systems, called "Newcomer-Docition", "Newcomer-Docition similar traffic", "Newcomer-Docition dissimilar traffic", "Newcomer-Docition nearest neighbour" and "Newcomer-Docition random neighbour", all of which consider one SU joining the already trained network. While the first system performs individual learning for the joining SU and reruns Algorithm 1, disregarding the knowledge acquired by the SUs already in the network, the others teach the joining SU through the docitive approach of Algorithm 2. The "Newcomer-Docition" system uses all SUs already in the system as teachers and initializes the Q-values of the joining SU using (17). As the names indicate, "Newcomer-Docition similar traffic" uses teachers with the same traffic type and initializes the joining SU's Q-values by averaging the Q-values of the users transmitting the same traffic type as the joining SU; "Newcomer-Docition dissimilar traffic" uses teachers with a different traffic type; "Newcomer-Docition nearest neighbour" selects the teacher based on proximity; and "Newcomer-Docition random neighbour" selects the teacher at random.

[Fig. 2. Average MOS in SN: average MOS (3.5 to 4.2) versus number of secondary users (5 to 25) for Newcomer-Individual Learning, Newcomer-Docition, Newcomer-Docition similar traffic, Newcomer-Docition dissimilar traffic, Newcomer-Docition nearest neighbour and Newcomer-Docition random neighbour.]

Figs. 2 to 4 show, respectively, the results for average MOS, congestion rate and average number of iterations. The results show that the docition algorithm (Algorithm 2) reduces the average number of iterations to convergence by roughly 2/3 compared with the individual learning algorithm. In all figures, the last five systems, which perform cooperative learning, have almost the same performance, since all of them use docition and differ only in the partners selected to learn from.

[Fig. 3. Congestion rate: congestion rate (0 to 0.6) versus number of secondary users (5 to 25) for the same six systems.]

Fig. 2 shows the average MOS of the secondary network at the convergence point as a function of the total number of SUs in the SN. The average network MOS decreases as the number of SUs increases because, to meet the interference constraints as the number of users grows, each SU tends to converge to a lower SINR value, which results in a lower overall average MOS. This result also shows that our QoE-driven resource allocation algorithm achieves high MOS values (always above the acceptable level, MOS > 3, even with 25 SUs in the network). It can also be seen that all systems have the same performance in terms of MOS, because MOS also enables docition between individual CR nodes that carry different types of traffic. While the actual MOS values may differ between traffic types, the relation between the rewards obtained for different wireless states and actions is maintained (that is, rewards that are relatively higher than others keep the same ordering, because MOS is always calculated through monotonically increasing functions of objective quality measures or QoS
values).

Fig. 3 shows the congestion rate as a function of the number of SUs in the network. Fig. 3 can also be used to determine the range of the number of SUs that can be supported for a given congestion rate of the system. Based on this figure, if the secondary network is required to operate at a predefined congestion rate, the cooperative learning solution always admits a larger number of users.

Fig. 4 shows the efficiency of the docitive paradigm in accurately transferring awareness of the surrounding environment from experienced peers to the newcomer and in reducing the number of iterations needed to achieve convergence. The number of iterations needed to achieve convergence is reduced by as much as 65% compared to the individual learning algorithm.

[Fig. 4. Average number of cycles at the convergence point: average iteration number of convergence (5 to 35) versus number of secondary users (5 to 25) for the same six systems.]

VI. CONCLUSION

In accordance with the evolution of resource management techniques for 5G networks, in this paper we presented an underlay DSA technique that adapts the transmit power, modulation scheme and, accordingly, the transmit rate of all SUs to maximize the average QoE across traffic of dissimilar characteristics (real-time video and regular data traffic) in the secondary network while satisfying the interference constraints of the PU transmission. MOS is used as the metric to model subjective QoE because it not only meets the end-user-centric quality assessment requirements of 5G networks but also enables the seamless integration of dissimilar traffic by providing a single common measuring scale for different types of traffic. In addition, to improve the convergence time of the RL algorithm, we applied the idea of a docitive radio, which allows newcomer SUs to learn from their more expert peers. The use of MOS as a performance metric to integrate dissimilar traffic allows us to study, for the first time, teaching between nodes that carry different types of traffic. For this, we examined different docition scenarios in which a newly joined SU is taught by different groups of SUs with similar and dissimilar traffic, and investigated their impact on the overall QoE. Simulation results show that all systems have the same performance in terms of MOS, because MOS also enables docition between individual CR nodes that carry different types of traffic. Simulation results also show that the presented cooperative CR algorithm based on docition reduces the number of iterations by approximately 65% while keeping the average MOS above the acceptable level.

REFERENCES

[1] Y. Wu, F. Hu, S. Kumar, Y. Zhu, A. Talari, N. Rahnavard, and J. D. Matyjas, "A learning-based QoE-driven spectrum handoff scheme for multimedia transmissions over cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 32, no. 11, pp. 2134-2148, 2014.
[2] Y. Chen, K. Wu, and Q. Zhang, "From QoS to QoE: A tutorial on video quality assessment," IEEE Communications Surveys & Tutorials, vol. 17, no. 2, pp. 1126-1165, 2015.
[3] O. Dobrijevic, A. J. Kassler, L. Skorin-Kapov, and M. Matijasevic, "Q-point: QoE-driven path optimization model for multimedia services," in International Conference on Wired/Wireless Internet Communications. Springer, 2014, pp. 134-147.
[4] A. Goldsmith, S. A. Jafar, I. Maric, and S. Srinivasa, "Breaking spectrum gridlock with cognitive radios: An information theoretic perspective," Proceedings of the IEEE, vol. 97, no. 5, pp. 894-914, 2009.
[5] R. Imran, M. Odeh, N. Zorba, and C. Verikoukis, "Quality of experience for spatial cognitive systems within multiple antenna scenarios," IEEE Transactions on Wireless Communications, vol. 12, no. 8, pp. 4153-4161, 2013.
[6] T. Jiang, H. Wang, and A. V. Vasilakos, "QoE-driven channel allocation schemes for multimedia transmission of priority-based secondary users over cognitive radio networks," IEEE Journal on Selected Areas in Communications, vol. 30, no. 7, pp. 1215-1224, 2012.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[8] L. Giupponi, A. Galindo-Serrano, P. Blasco, and M. Dohler, "Docitive networks: An emerging paradigm for dynamic spectrum management," IEEE Wireless Communications, vol. 17, no. 4, pp. 47-54, 2010.
[9] R. Zhang, "On active learning and supervised transmission of spectrum sharing based cognitive radios by exploiting hidden primary radio feedback," IEEE Transactions on Communications, vol. 58, no. 10, pp. 2960-2970, Oct. 2010.
[10] S. Pietrzyk and G. J. Janssen, "Radio resource allocation for cellular networks based on OFDMA with QoS guarantees," in Proc. IEEE Global Telecommunications Conference (GLOBECOM'04), vol. 4, 2004, pp. 2694-2699.
[11] X. Qiu and K. Chawla, "On the performance of adaptive modulation in cellular systems," IEEE Transactions on Communications, vol. 47, no. 6, pp. 884-895, 1999.
[12] H. R. Sheikh, M. F. Sabir, and A. C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms," IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, 2006.
[13] S. Khan, S. Duhovnikov, E. Steinbach, and W. Kellerer, "MOS-based multiuser multiapplication cross-layer optimization for mobile multimedia communication," Advances in Multimedia, vol. 2007, 2007.
[14] K. Piamrat, C. Viho, J.-M. Bonnin, and A. Ksentini, "Quality of experience measurements for video streaming over wireless networks," in Proc. Sixth International Conference on Information Technology: New Generations (ITNG'09), IEEE, 2009, pp. 1184-1189.
[15] I. R. Assembly, Methodology for the Subjective Assessment of the Quality of Television Pictures. International Telecommunication Union, 2003.
[16] P. Hanhart and T. Ebrahimi, "Calculation of average coding efficiency based on subjective quality scores," Journal of Visual Communication and Image Representation, vol. 25, no. 3, pp. 555-564, 2014.
[17] B. Mobile, "Mobile analytica report."
[18] C. Watkins and P. Dayan, "Technical note: Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[19] R. Bellman, Dynamic Programming. Courier Corporation, 2013.
[20] W. Wang and A. Kwasinski, "Experience cooperative sharing in cross-layer cognitive radio for real-time multimedia communication," in Proceedings of the 4th International Conference on Cognitive Radio and Advanced Spectrum Management. ACM, 2011, p. 55.
[21] Q. Zhao, D. Grace, and T. Clarke, "Transfer learning and cooperation management: Balancing the quality of service and information exchange overhead in cognitive radio networks," Transactions on Emerging Telecommunications Technologies, vol. 26, no. 2, pp. 290-301, 2015.