
INTELLIGENT RADIO: WHEN ARTIFICIAL INTELLIGENCE MEETS THE RADIO NETWORK

Machine-Learning-Based Opportunistic
Spectrum Access in Cognitive Radio Networks
Pengcheng Zhu, Jiamin Li, Dongming Wang, and Xiaohu You

Abstract

The explosive growth of wireless devices and data rate demands makes spectrum scarcity a serious problem. A promising solution is to employ OSA, which enables SUs to seek and opportunistically exploit the underutilized spectrum without interrupting the data transmission of PUs. However, the real-world implementation of OSA still faces several critical challenges including lack of global information, the dilemma of exploration and exploitation, and channel access competition. In this article, we propose a machine-learning-based OSA framework by integrating MAB and matching theory. First, we start from the single-SU scenario without global information while considering the volatility of channel availability. We propose an occurrence-aware OSA (OA-OSA) framework based on the UCB algorithm, which can achieve long-term optimal network throughput performance and a well-balanced trade-off between exploration and exploitation based on only local information. Then we extend OA-OSA to the multi-SU scenario with channel access competitions, and derive an OCA-OSA framework by integrating OA-OSA and the Gale-Shapley algorithm. Simulation results demonstrate that the proposed frameworks achieve superior performance in network throughput and less deviation from optimal performance with global information.

Introduction

With the explosive growth of wireless data traffic and mobile service demands, spectrum scarcity has become a serious problem. On the other hand, several research works reveal that the licensed spectrum allocated to users has not been fully utilized in either space or time, which results in spectrum waste named the spectrum hole [1]. To address the paradox between spectrum scarcity and spectrum hole, opportunistic spectrum access (OSA) has been proposed and intensively researched [2]. OSA is one of the core technologies of cognitive radio, which emphasizes the intelligence of the network and adapts spectrum utilization with the external wireless environment through learning or other means. The principle of OSA is to enable secondary users (SUs) to seek and opportunistically use the spectrum hole for data transmission without violating the quality of service (QoS) requirements of primary users (PUs) [2]. OSA is compatible with existing cellular technologies and can be implemented in both centralized and distributed manners [3, 4]. Specifically, we focus on distributed OSA and study its implementation in real-world cognitive radio networks.

However, due to the existence of multiple channels and users, several technical challenges arise from the opportunistic utilization of spectrum, which are summarized below.

Performance Guarantees under Information Uncertainty

Channel availability and channel quality are often unknown to SUs, and are highly dynamic and unpredictable due to the random behavior of PUs and channel fading. As a result, such channel dynamics lead to uncertain throughput degradation. Considering the limited sensing capability of SUs, the communication overheads to collect the channel state information (CSI) of all the channels are tremendous. Therefore, conventional OSA methods which presume that perfect CSI is available will be infeasible. How to develop a distributed OSA algorithm with bounded performance guarantees in the absence of prior knowledge is an inevitable challenge.

Dilemma of Exploration-Exploitation

Without prior knowledge or even statistical information about channel states, each SU has to estimate channel availability and quality. Channel estimation can be performed either offline or online. Compared to offline estimation, the online approach is more efficient since transmission can be performed during estimation. However, it faces the dilemma of exploration-exploitation. Specifically, in each round of decision making, an SU must choose whether to increase the estimation accuracy by exploring alternative choices, or to take full advantage of existing estimation by exploiting the currently optimal choice. In order to optimize the long-term throughput, a good balance between exploration and exploitation must be achieved.

Competition of Multiple SUs

An SU will experience channel selection collision when other SUs compete for the same channel. Thus, the decisions of SUs are coupled. Such couplings cannot be easily analyzed by traditional game-theoretical approaches [5], because the perfect information of other game players as well as CSI are unknown. How to resolve the competition of multiple SUs under information uncertainty remains nontrivial.
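To give a feel for how quickly uncoordinated SUs collide, the following Python sketch computes the probability that at least two SUs pick the same channel. It rests on an assumption that is not part of the article: every SU picks one channel independently and uniformly at random. The function name `collision_probability` is hypothetical. Under that assumption, three SUs sharing four channels collide with probability 1 − (4·3·2)/4³ = 0.625.

```python
def collision_probability(num_sus: int, num_channels: int) -> float:
    """Probability that at least two SUs pick the same channel when each SU
    independently picks one of `num_channels` channels uniformly at random
    (an illustrative assumption, not the article's selection rule)."""
    if num_sus > num_channels:
        return 1.0  # pigeonhole: some channel must be shared
    p_all_distinct = 1.0
    for k in range(num_sus):
        # The (k+1)-th SU must avoid the k channels already taken.
        p_all_distinct *= (num_channels - k) / num_channels
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for sus in (1, 2, 3, 4):
        print(sus, collision_probability(sus, 4))
```

Even in this small setting, random access collides more often than not, which is why the decisions of SUs cannot be treated independently.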
Digital Object Identifier: 10.1109/MWC.001.1900234
The authors are with Southeast University, Nanjing; Jiamin Li is the corresponding author.

38 1536-1284/20/$25.00 © 2020 IEEE IEEE Wireless Communications • February 2020

Authorized licensed use limited to: UNIVERSIDAD DE MONTERREY (UDEM). Downloaded on November 28,2022 at 18:37:36 UTC from IEEE Xplore. Restrictions apply.
The real-world implementation of OSA becomes more complicated when the above three challenges are jointly considered. To provide a tractable solution, we propose a machine-learning-based OSA framework that combines multi-armed bandit (MAB) with matching theory.

MAB is a powerful reinforcement learning tool to address the sequential decision making problem. It models how a player decides which single arm to play in each round with the purpose of maximizing the cumulative reward based on only historical observations [6]. In particular, the channel selection problem in OSA is analogous to the classic arm selection problem in the MAB framework, where each channel is formulated as an arm, and the SU who selects the channel is formulated as the player. The objective is to maximize the cumulative reward by selecting while learning. Among several MAB algorithms, we choose upper confidence bound (UCB) [7] to achieve a well-balanced trade-off between exploitation and exploration. It employs the sample-mean estimations for exploitation and simultaneously utilizes the confidence interval of estimations for exploration. Furthermore, the Gale-Shapley (GS) [8] algorithm is combined with UCB to resolve the competition of multiple SUs under information uncertainty.

In this article, we start from the single-SU scenario and develop an occurrence-aware OSA (OA-OSA) framework under information uncertainty. Next, we extend OA-OSA into the multi-SU scenario, and develop an occurrence-aware and collision-aware OSA (OCA-OSA) framework. A case study is provided to demonstrate that both OA-OSA and OCA-OSA can achieve bounded performance guarantees without any prior information of CSI. Finally, we provide a summary and present future research directions.

Related Works

Several researchers have already investigated the implementation of distributed OSA. In [9], Chen et al. developed a spatial congestion framework to model the channel selection game with fixed user locations and spatial reuse, and proposed a distributed learning algorithm based on the Nash equilibrium. A two-phase stochastic multiple channel sensing (SMCS) protocol was proposed for distributed OSA [10], in which the aggregated throughput of SUs is maximized subject to the QoS constraints of PUs based on partially observable channel information and sensing overhead. However, in both [9, 10], the prior knowledge of the channel is assumed to be available. For instance, the game-theoretical approach developed in [9] requires statistical knowledge of the environment, which may not be always available in practical applications.

MAB has been widely applied in OSA to solve channel selection problems [11, 12]. In [11], Lai et al. considered the scenario where the probability of channel availability is unknown in advance, and proposed a UCB-based channel selection algorithm to maximize the expected throughput of SUs. Anandkumar et al. considered the single-SU scenario and proposed a distributed channel selection scheme based on the ε-greedy algorithm [12]. However, classic MAB algorithms such as UCB and ε-greedy were originally developed for the scenario of a single player, which will lead to severe collisions when multiple SUs compete for the same channel [6]. Therefore, the interactions among multiple SUs should be considered in decision making, which is actually a combinatorial problem with high complexity. Moreover, the preferences of SUs and channels are not aligned. Particularly, SUs prefer channels with better channel availability and quality, while channels prefer SUs who cause less co-channel interference. Although some works have studied the multi-SU scenario of OSA [7, 13], channel volatility as well as mutual preferences of SUs and channels are not considered.

Matching theory has been widely applied to address the combinatorial problem in various application scenarios including peer discovery, task offloading, user association, and so on [8, 14, 15]. In [8], Gu et al. provided a comprehensive survey of the applications of matching theory in wireless communications. In [14], Chiti et al. considered the scenario of the Internet of Things (IoT) and developed a distributed many-to-one matching strategy to provide an efficient task offloading mechanism based on the deferred acceptance algorithm. A matching-based social network-aware algorithm was proposed to optimize the overall network performance in wireless small-cell networks [15]. The drawback of conventional matching algorithms such as the GS algorithm [8] is that perfect information on both sides is required in order to precisely construct the mutual preferences.

The difference between the works mentioned above and our work is that we consider both information uncertainty and interactions among multiple SUs. Specifically, we incorporate occurrence awareness and collision awareness into the proposed OSA framework, and adapt it to learn channel volatility and matching collision.

System Model and Problem Formulation

In this section, first we introduce the system model of OSA in multi-cell cognitive radio networks. Then we present the formulated optimization problem.

Figure 1 depicts the considered multi-cell cognitive radio networks, where each small cell consists of a small-cell base station (SBS), J PUs, and I SUs. We consider the uplink scenario where both PUs and SUs transmit data to the SBS. OSA is employed to improve spectrum efficiency. The specific role and functionality of each entity are illustrated as follows:
• SBS: The SBS is a fixed infrastructure network component that provides connection service and coordinates channel selection for users within its coverage area.
• PUs: PUs are licensed users who are authorized to use the spectrum with top priority. PUs are interference-sensitive, that is, SUs are prohibited from using the channel when it is occupied by a PU.
• SUs: SUs are not allocated dedicated channels. They can transmit data by utilizing idle channels that are unoccupied by PUs.

A time slot model is adopted, in which the total time period is divided into T equal-length slots. At the tth slot, a channel is unavailable if it is occupied by PUs and available otherwise. We assume

FIGURE 1. The implementation of OSA in a cognitive radio network.

that both channel availability and channel quality remain unchanged within a slot and vary across different slots. The implementation of OSA is generally composed of two stages: sensing and decision making.

Sensing

At the beginning of each slot, an SU selects a channel to sense and observes its occupancy. Due to the hardware limitation, only one channel can be sensed in a slot, and the remaining channels are left unsensed. The channel is reported as “occupied” if the signal of a PU is detected, and is reported as “idle” otherwise.

Decision Making

Based on the sensing results and collected historical information, an SU selects the channel with the highest possible throughput. If more than one SU competes for the same channel, it is randomly allocated to one of the candidate SUs, and the other SUs have to wait for the next slot.

The channel selection strategy of an SU is denoted as a binary variable. For example, we use $a_{i,n,t} = 1$ to represent that SU i selects channel n at slot t, and the corresponding throughput is denoted as $Q_{i,n,t}$. The achievable throughput not only depends on CSI, but also depends on the channel selection strategies of other SUs. The throughput of the selected channel can even be zero if it is unavailable, for example, when it is occupied by a PU.

The objective of OSA is to maximize the cumulative throughput of the entire network over a total period of T slots via the optimization of channel selection, which can be denoted as

$$\max_{\{a_{i,n,t}\}} \sum_{t=1}^{T} \sum_{i=1}^{I} \sum_{n=1}^{N} a_{i,n,t} Q_{i,n,t}.$$

Unavailability of Global Information

The global information related to throughput includes both CSI and strategies of other SUs. If there is a genie that reveals the global information to SUs, the best solution of OSA follows the Nash equilibrium of the noncooperative game-theoretical approach. However, the game-theoretical approach is infeasible if global information is unknown to SUs.

Learning while Selecting

We propose a learning-based OSA framework that enables SUs to learn the environment while making decisions. For each SU, the CSI of a channel can be learned from historical observations including empirical throughput performance, number of selections, and occurrence time. The objective is to maximize the expectation of cumulative throughput of the entire network.

Occurrence-Aware and Collision-Aware Opportunistic Spectrum Access

In this section, first, we develop an occurrence-aware OSA (OA-OSA) framework for the single-SU scenario. Then we consider the more practical multi-SU scenario, and develop an occurrence-aware and collision-aware OSA (OCA-OSA) framework by combining the GS algorithm and OA-OSA. The proposed OA-OSA and OCA-OSA frameworks are shown in Fig. 2.

OA-OSA for the Single-SU Scenario

The channel selection problem is a typical online sequential decision making problem that can be solved by employing the MAB framework. However, the variation of channel availability has an adverse impact on existing MAB algorithms such as UCB and ε-greedy, which are only applicable

FIGURE 2. The proposed OA-OSA and OCA-OSA frameworks for the single-SU and multi-SU scenarios, respectively.

to the MAB problem with constantly available arms. To address this challenge, we incorporate classical UCB with occurrence awareness and propose an occurrence-aware OSA framework named OA-OSA. Its implementation consists of four stages: estimation, sensing, decision making, and learning. Details are given as follows.

Stage 1. Estimation: At the beginning of each slot, every SU estimates its preference toward each channel based on historical observations. The preference of SU i toward channel n is estimated as

$$\tilde{Q}_{i,n,t} = \bar{Q}_{i,n,t-1} + \sqrt{\frac{\beta \log\left(t - t^{\mathrm{occ}}_{i,n}\right)}{A_{i,n,t-1}}}. \quad (1)$$

Here, the first term of Eq. 1 denotes the historical throughput of channel n up to the current slot, which is the sample-mean estimation of the channel throughput and belongs to local information. The second term denotes the confidence interval representing the uncertainty of sample-mean estimation. $\beta$ is a positive weight of exploration. $t^{\mathrm{occ}}_{i,n}$ represents the channel occurrence time. $A_{i,n,t-1}$ denotes the number of times channel n has been selected up to the current slot, which also belongs to local information.

Stage 2. Sensing: Denote the ultimately selected channel as $\psi_i$. Then SU i senses the occupancy of this channel.

Stage 3. Decision Making: Based on the sensing results, the preference toward channel $\psi_i$ is updated. Specifically, if it is occupied, this channel is specified as unavailable. Both the preference and the channel throughput are set as zero. Then update $\psi_i$ as the channel with the second highest preference. Otherwise, if $\psi_i$ is reported as idle, the corresponding preference and the channel throughput remain unchanged. Next, it selects channel $\psi_i$, that is, $a_{i,\psi_i,t} = 1$, and sends a scheduling request to the SBS.

Stage 4. Learning: After data transmission, the SU observes the achieved throughput associated with channel $\psi_i$. At the end of the tth slot, for any channel n, update $\bar{Q}_{i,n,t} = (\bar{Q}_{i,n,t-1} A_{i,n,t-1} + Q_{i,n,t} a_{i,n,t}) / (A_{i,n,t-1} + 1)$ and $A_{i,n,t} = A_{i,n,t-1} + a_{i,n,t}$.

The above four stages are repeated iteratively until the maximum number of slots is reached. The OA-OSA framework adapts to the random variations of channel availability and channel quality due to the following properties.

Occurrence Awareness: Occurrence awareness is incorporated in the confidence interval. For any newly available channel, its small number of selections $A_{i,n,t-1}$ results in a large confidence interval, which encourages OA-OSA to explore and learn its performance. Meanwhile, OA-OSA tends to exploit the existing channels with smaller occurrence time since more selections result in less estimation deviation.

Throughput Awareness: Throughput awareness is considered in both the decision making stage and the sensing stage. Compared to conventional OSA, where the SU randomly senses a channel, OA-OSA adopts a targeted sensing strategy in which only the channel with the highest preference is sensed, thereby avoiding throughput performance degradation.

OCA-OSA for the Multi-SU Scenario

In the multi-SU scenario, OA-OSA will suffer from significant performance degradation due to competition among SUs. To overcome this challenge, we augment OA-OSA with the GS algorithm [8], which transforms the combinatorial problem into a two-sided matching problem. We develop OCA-OSA, which is shown in Fig. 2. It is implemented in four stages: estimation, sensing, matching, and learning. The first two stages of OCA-OSA are exactly the same as those of OA-OSA. Therefore, we emphasize the stages of matching and learning.

Stage 3. Matching: In the third stage, the combinatorial channel selection problem can be transformed into a two-sided one-to-one matching game, which can easily be solved by the classic GS algorithm, where SUs from one side are matched with channels on the other side based on their mutual preferences in a stable way. That is, any SU is matched with its best available choice. The implementation of the GS algorithm consists of two phases: preference list construction and iterative matching.

Preference List Construction: As shown in Fig. 2, if channel n − 1 is sensed to be occupied, it is placed at the end of the preference list of SU i. For any other available channel (e.g., channel n), if it is a newly available channel, the preference is set as positive infinity so that the SU has a greater chance to be matched with it to learn its throughput performance. Otherwise, if it has been available before, the preference is determined as Eq. 1, which takes empirical throughput performance and occurrence awareness into consideration. Then the preference list of SU i is constructed by re-sorting

the preferences of every channel in descending order.

Meanwhile, the preference of a channel toward an SU is defined as positively related to the distance between the SU and the SBS, which guarantees fairness among SUs since cell-edge SUs will have high priority. Furthermore, it reduces the overall interference levels of the whole network since cell-edge SUs are allocated high-quality channels under the same condition. The preference list of a channel is constructed similarly to that of an SU.

Iterative Matching: In this phase, the SBS collects the preference list of each SU and performs iterative matching. In each iteration, each SU proposes to its most preferred channel in its preference list. If a channel receives only one proposal from SUs, it will hold this SU as a potential candidate. Note that this candidate can be rejected later if a better candidate appears. Otherwise, if it receives more than one proposal, it will select the most preferred SU based on its preference list and reject the other SUs. At the end of this iteration, each SU updates its preference list by removing the channels that have rejected it.

In the next iteration, any SU who has been rejected in the previous iteration proposes to its currently most preferred channel. If the channel has not held any candidate and has received only one proposal from an SU, they are matched. Otherwise, the channel will select the most preferred SU among all the newly received proposals as well as the previously held candidate. The iteration will continue until every SU has been matched with a channel or rejected by all the channels in its preference list, and a stable matching has been derived.

Stage 4. Learning: Each SU accesses a channel based on the matching result. After data transmission, it observes the corresponding throughput of the accessed channel. Next, for any channel n, $\bar{Q}_{i,n,t}$ and $A_{i,n,t}$ are updated in the same way as in OA-OSA.

As for OCA-OSA, the complexity of other stages is similar to that of OA-OSA, while the complexity of matching is on the order of O(N log(N) + I log(I)), which is much lower than that of exhaustive searching. Compared to OA-OSA, OCA-OSA adapts to the multi-SU scenario due to the following property.

Collision Awareness: By observing matching results, an SU can continuously learn the relationship among its preferences toward channels, the strategies of other SUs, and the preferences of channels. Thus, the impacts of matching collision and mutual preferences have been incorporated into the learning process.

Performance Evaluation

In this section, preliminary simulation results are provided to validate the performance of the proposed scheme. Simulations are carried out in MATLAB for both the single-SU and multi-SU scenarios. We evaluate the performance in terms of average network throughput and learning regret. Learning regret is defined as the cumulative performance loss of the proposed scheme compared to the optimal channel selection policy with global information.

FIGURE 3. The assumptions of channel availability.

Performance under the Single-SU Scenario

We consider one SU and four channels. The dynamics of channel availability are shown in Fig. 3, where channels marked A are available channels, and channels marked U are unavailable channels. The achievable throughput of channel n at each slot follows a uniform distribution within the range $[0.8\bar{Q}_n, 1.2\bar{Q}_n]$, where $\bar{Q}_n$ represents the average throughput. We have $\bar{Q}_n$ = 0.5, 0.3, 0.2, 0.1 when n = 1, 2, 3, 4. The weight of exploration $\beta$ is set to 2. Two baseline algorithms are taken for comparison. Baseline 1 is the channel selection algorithm proposed in [6], which does not consider either occurrence awareness or collision awareness. Baseline 2 is obtained by enhancing baseline 1 with occurrence awareness. However, collision awareness is still not considered.

Figures 4a and 4b show the normalized average network throughput and learning regret under the single-SU scenario, respectively. When t = 500, the proposed OA-OSA scheme outperforms baseline 1 and baseline 2 by 29.5 and 6.5 percent, respectively. Besides, its performance also converges faster than the other two baseline algorithms. This huge performance gain is achieved because it takes both occurrence awareness and collision awareness into consideration. Among all the three algorithms, baseline 1 performs the worst since neither occurrence awareness nor collision awareness is considered. The achieved throughput is even zero during some slots due to the fact that an occupied channel has been mistakenly selected. From Fig. 4b, it is clear that OA-OSA always achieves the lowest learning regret. Its learning regret is 38 and 6.2 percent lower than those of baseline 1 and baseline 2, respectively.

Performance under the Multi-SU Scenario

We consider the multi-SU scenario with three SUs and four channels. The settings of channel occupancy and throughput remain the same as in the single-SU scenario. In both baseline 1 and baseline 2 algorithms, if multiple SUs select the same channel, the channel is randomly allocated to one of the candidate SUs, and the other SUs have to wait for the next slot.

Figures 5a and 5b show the normalized average network throughput and learning regret for the multi-SU scenario, respectively. Numerical results of Fig. 5a indicate that the total network throughput of OCA-OSA over T = 2000 slots is 47.7 percent larger than that of baseline 1. Compared to baseline 2, OCA-OSA can improve the total network throughput by 8.2 percent over T = 2000 slots. Numerical results of Fig. 5b also demonstrate that OCA-OSA can achieve superior performance in terms of learning regret. Compared to the baseline 1 and baseline 2 algorithms, OCA-OSA can reduce learning regret by 93.2 and 76.1 percent, respectively, when t = 2000. Therefore, the joint consideration of occurrence awareness and collision awareness can significantly improve the performance.
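The iterative matching phase of OCA-OSA is described in prose only, so a small Python sketch may help make the proposal and rejection rounds concrete for the three-SU, four-channel case. It implements SU-proposing deferred acceptance as that phase describes it. Everything numeric is an assumption made up for illustration: the SU preference lists, the distances to the SBS, and the names `deferred_acceptance`, `su_prefs`, and `channel_rank` are hypothetical and do not come from the article.

```python
def deferred_acceptance(su_prefs, channel_rank):
    """SU-proposing deferred acceptance (Gale-Shapley style).

    su_prefs[i]        : channels ordered from most to least preferred by SU i.
    channel_rank[n][i] : rank of SU i at channel n (lower means more preferred).
    Returns a dict mapping each matched SU to its channel.
    """
    remaining = {i: list(prefs) for i, prefs in su_prefs.items()}
    held = {}                # channel -> SU currently held as candidate
    free = list(su_prefs)    # SUs that still need to propose
    while free:
        # Each free SU proposes to its currently most preferred channel.
        proposals = {}
        for i in free:
            if remaining[i]:
                proposals.setdefault(remaining[i][0], []).append(i)
        if not proposals:
            break
        free = []
        for n, proposers in proposals.items():
            # The channel compares new proposers with the candidate it holds.
            candidates = proposers + ([held[n]] if n in held else [])
            best = min(candidates, key=lambda i: channel_rank[n][i])
            held[n] = best
            for i in candidates:
                if i != best:
                    remaining[i].remove(n)  # a rejected SU drops this channel
                    free.append(i)
    return {i: n for n, i in held.items()}


# Hypothetical inputs: 3 SUs, 4 channels, every SU likes channel 0 best.
su_prefs = {0: [0, 1, 2, 3], 1: [0, 2, 1, 3], 2: [0, 1, 3, 2]}
# Channels prefer SUs that are farther from the SBS (made-up distances).
distance_to_sbs = {0: 50.0, 1: 120.0, 2: 80.0}
by_distance = sorted(distance_to_sbs, key=distance_to_sbs.get, reverse=True)
rank = {su: r for r, su in enumerate(by_distance)}
channel_rank = {n: rank for n in range(4)}

matching = deferred_acceptance(su_prefs, channel_rank)
print(matching)
```

With these made-up inputs, all three SUs first propose to channel 0, which keeps the farthest SU (SU 1). SU 0 and SU 2 then compete for channel 1, which keeps SU 2, and SU 0 finally settles on channel 2. In OCA-OSA the SU-side lists would come from Eq. 1 and the sensing results rather than from fixed lists.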

[Figure 4 plots not reproduced. Panel (a): normalized average network throughput versus slot. Panel (b): learning regret versus slot. Both panels cover slots 0 to 2000 and compare Baseline 1, Baseline 2, and OA-OSA.]

FIGURE 4. The throughput performance under the single-SU scenario: a) the normalized average network throughput; b) the learning regret.

[Figure 5 plots not reproduced. Panel (a): normalized average network throughput versus slot. Panel (b): learning regret versus slot. Both panels cover slots 0 to 2000 and compare Baseline 1, Baseline 2, and OCA-OSA.]

FIGURE 5. The throughput performance under the multi-SU scenario: a) the normalized average network throughput; b) the learning regret.
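For readers who want to experiment with the single-SU setup, the following Python sketch walks through the four OA-OSA stages using the case-study parameters (average throughputs 0.5, 0.3, 0.2, 0.1 and an exploration weight of 2). It is a rough illustration under stated assumptions, not the authors' MATLAB code. The availability pattern in `available` is invented because the pattern of Fig. 3 is not recoverable here; the square root in the confidence term follows the standard UCB form; the occurrence time is taken to be the slot just before a never-selected channel is first sensed idle; and regret is accumulated against a genie that always picks the best available channel in expectation. All function names are hypothetical.

```python
import math
import random

MEAN_Q = [0.5, 0.3, 0.2, 0.1]  # average channel throughput (case-study values)
BETA = 2.0                      # weight of exploration (case-study value)


def available(n, t):
    """Made-up availability pattern: channel 0 is occupied by a PU until
    slot 299; the other channels are always idle."""
    return n != 0 or t >= 300


def preference(mean_q, count, t, t_occ, beta=BETA):
    """Eq. 1-style index: sample mean plus occurrence-aware confidence term."""
    if count == 0:
        return math.inf  # a channel never selected so far is explored first
    return mean_q + math.sqrt(beta * math.log(t - t_occ) / count)


def run_oa_osa(num_slots=2000, seed=0):
    rng = random.Random(seed)
    n_ch = len(MEAN_Q)
    mean_q = [0.0] * n_ch  # sample-mean throughput per channel
    count = [0] * n_ch     # number of selections per channel
    t_occ = [0] * n_ch     # assumed occurrence time per channel
    regret, trace = 0.0, []
    for t in range(1, num_slots + 1):
        # Stage 1: estimate the preference toward every channel.
        prefs = [preference(mean_q[n], count[n], t, t_occ[n]) for n in range(n_ch)]
        order = sorted(range(n_ch), key=lambda n: prefs[n], reverse=True)
        # Stage 2: sense only the most preferred channel.
        choice = order[0]
        if available(choice, t):
            if count[choice] == 0:
                t_occ[choice] = t - 1  # first time this channel is seen idle
        else:
            # Stage 3: sensed occupied, fall back to the second-best channel.
            choice = order[1]
        idle = available(choice, t)
        q = rng.uniform(0.8 * MEAN_Q[choice], 1.2 * MEAN_Q[choice]) if idle else 0.0
        # Stage 4: update the selection count and the running sample mean.
        count[choice] += 1
        mean_q[choice] += (q - mean_q[choice]) / count[choice]
        # Expected loss relative to the best channel available in this slot.
        best = max(MEAN_Q[n] for n in range(n_ch) if available(n, t))
        regret += best - (MEAN_Q[choice] if idle else 0.0)
        trace.append(regret)
    return trace


if __name__ == "__main__":
    trace = run_oa_osa()
    print("cumulative regret after", len(trace), "slots:", round(trace[-1], 2))
```

Plotting `trace` against the slot index gives a regret curve of the same kind as panel (b) of Fig. 4, although the numbers will differ from the article because the availability pattern and the regret bookkeeping are assumptions.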

Conclusion and Open Issues

In this article, we propose a machine-learning-based OSA framework that incorporates both occurrence awareness and collision awareness. We start from the single-SU scenario and develop a distributed, low-complexity, online OSA framework based on only local information. Then we extend it to the more complicated multi-SU scenario and develop an OCA-OSA framework that enables SUs to learn the interactions among SUs from the matching results. Numerical results demonstrate that OCA-OSA can improve the throughput performance by 8.2 percent if collision awareness is considered, and by 47.7 percent if both occurrence awareness and collision awareness are considered. Compared to the conventional OSA scheme, the learning regret can also be reduced by 38.0 and 93.2 percent under the single-SU and multi-SU scenarios, respectively.

Finally, we outline some open issues that require further research efforts.

Context Awareness: In addition to the occurrence awareness and collision awareness considered in this work, there are still numerous context factors that influence the learning performance, including energy status, service priority, location, QoS requirements, and so on. How to reasonably incorporate these factors to design a context-aware framework for more accurate estimation is an open issue.

Personalized Service Demand of SUs: This work assumes that all the SUs are identical, and only considers the QoS performance indicator in terms of throughput. However, this simplified optimization model does not consider other performance metrics such as latency and fairness, and may not capture the personalized service demands of SUs well. Therefore, an important research direction is to study OSA from the perspective of quality of experience (QoE), and design a machine-learning-based OSA framework to support differentiated service provisioning.

Channel Switching Cost: Due to dynamic channel availability, SUs have to frequently switch channels and readjust relevant transmission parameters, resulting in extra signaling overhead and additional switching cost. This cost should also be taken into consideration in the optimization process, and its impact on the learning regret should be analyzed.

Acknowledgments

This work was supported by the Natural Science Foundation of Jiangsu Province under Grant BK20180011, the National Natural Science Foundation of China under Grant 61971127, and the National Science and Technology Major Project of China under Grant 2018ZX03001008-002.

References

[1] S. Haykin, “Cognitive Radio: Brain-Empowered Wireless Communications,” IEEE JSAC, vol. 23, no. 2, Feb. 2005, pp. 201–20.
[2] S. Stotas et al., “On the Throughput and Spectrum Sensing Enhancement of Opportunistic Spectrum Access Cognitive Radio Networks,” IEEE Trans. Wireless Commun., vol. 11, no. 1, Jan. 2012, pp. 97–107.
[3] M. Rashid et al., “Opportunistic Spectrum Scheduling for Multiuser Cognitive Radio: A Queueing Analysis,” IEEE Trans. Vehic. Tech., vol. 8, no. 10, Oct. 2009, pp. 5259–69.
[4] Y. Liang et al., “Cognitive Radio Networking and Communications: An Overview,” IEEE Trans. Vehic. Tech., vol. 60, no. 7, Sept. 2011, pp. 3386–3407.
[5] X. Kang et al., “Incentive Mechanism Design for Heterogeneous Peer-to-Peer Networks: A Stackelberg Game Approach,” IEEE Trans. Mobile Comp., vol. 14, no. 5, May 2015, pp. 1018–30.
[6] Y. Xu et al., “Decision-Theoretic Distributed Channel Selection for Opportunistic Spectrum Access: Strategies, Challenges and Solutions,” IEEE Commun. Surveys & Tutorials, vol. 15, no. 4, Apr. 2013, pp. 1689–1713.
[7] N. Modi et al., “QoS Driven Channel Selection Algorithm for Cognitive Radio Network: Multi-User Multi-Armed Bandit Approach,” IEEE Trans. Cognitive Commun. Net., vol. 3, no. 1, Mar. 2017, pp. 49–66.
[8] Y. Gu et al., “Dynamic Path to Stability in LTE-Unlicensed With User Mobility: A Matching Framework,” IEEE Trans. Wireless Commun., vol. 16, no. 7, Dec. 2017, pp. 4547–61.
[9] X. Chen et al., “Distributed Spectrum Access with Spatial Reuse,” IEEE JSAC, vol. 31, no. 3, Mar. 2013, pp. 593–603.
[10] K. Feng et al., “Novel Design on Multiple Channel Sensing for Partially Observable Cognitive Radio Networks,” IEEE Trans. Mobile Comp., vol. 16, no. 8, Aug. 2017, pp. 2260–75.
[11] L. Lai et al., “Cognitive Medium Access: Exploration, Exploitation, and Competition,” IEEE Trans. Mobile Comp., vol. 10, no. 2, Feb. 2011, pp. 239–53.
[12] A. Anandkumar et al., “Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret,” IEEE JSAC, vol. 29, no. 4, Apr. 2011, pp. 731–45.
[13] J. Chen et al., “Interference-Aware Online Distributed Channel Selection for Multicluster FANET: A Potential Game Approach,” IEEE Trans. Vehic. Tech., vol. 68, no. 4, Apr. 2019, pp. 3792–3804.
[14] F. Chiti et al., “A Matching Theory Framework for Tasks Offloading in Fog Computing for IoT Systems,” IEEE Trans. Commun., vol. 66, no. 11, Nov. 2018, pp. 5526–38.
[15] M. I. Ashraf et al., “Dynamic Clustering and User Association in Wireless Small-Cell Networks with Social Considerations,” IEEE Trans. Vehic. Tech., vol. 66, no. 7, July 2017, pp. 6553–68.

Biographies

Pengcheng Zhu [M’09] received his B.S. and M.S. degrees in electrical engineering from Shandong University, Jinan, China, in 2001 and 2004, respectively, and his Ph.D. degree in communication and information science from Southeast University, Nanjing, China, in 2009. He is an associate professor with the National Mobile Communications Research Laboratory, Southeast University. His research interests lie in the areas of wireless communications and mobile networks.

Jiamin Li received his B.S. and M.S. degrees in communication and information systems from Hohai University, Nanjing, China, in 2006 and 2009, respectively, and his Ph.D. degree in information and communication engineering from Southeast University in 2014. He has been a lecturer with the National Mobile Communications Research Laboratory, Southeast University, since 2014. His research interests include massive MIMO, distributed antenna systems, and cooperative communications.

Dongming Wang [M’06] received his B.S. degree from Chongqing University of Posts and Telecommunications, China, in 1999, his M.S. degree from Nanjing University of Posts and Telecommunications, China, in 2002, and his Ph.D. degree from Southeast University in 2006. He is a professor with the National Mobile Communications Research Laboratory, Southeast University. His research interests include distributed antenna systems and large-scale MIMO systems.

Xiaohu You [F’11] received his B.S., M.S., and Ph.D. degrees in electrical engineering from Nanjing Institute of Technology in 1982, 1985, and 1989, respectively. He is a professor with the National Mobile Communications Research Laboratory, Southeast University. His research interests include mobile communications, adaptive signal processing, and artificial neural networks with applications to communications and biomedical engineering.
