
Future Generation Computer Systems ( ) –

Contents lists available at ScienceDirect

Future Generation Computer Systems


journal homepage: www.elsevier.com/locate/fgcs

Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control

Anil Carie a, Mingchu Li a, Chang Liu b, Prakasha Reddy c, Waseef Jamal d,*
a School of Software, Dalian University of Technology, Dalian, China
b School of Information and Communication Engineering, Dalian University of Technology, Dalian, China
c Department of Informatics, College of Engineering and Technology, Wollega University T.R, Ethiopia
d Center of Business Administration, Institute of Management Sciences, Hayatabad, Peshawar, Pakistan

highlights

• A cognitive radio concept is used to utilize channels more accurately.
• A directional hybrid control channel with GPS is used to exchange cognitive control information.
• Experimental results for the proposed method are presented in comparison with other existing schemes.

Article info

Article history:
Received 8 August 2017
Received in revised form 20 October 2017
Accepted 7 November 2017
Available online xxxx

Keywords:
Software defined cognitive radio network
MAC protocol
Directional antennas
Software defined radio
Q-learning
Power control

Abstract

In this paper, we investigate the Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control in cognitive radio (CR) systems. In CR systems, nodes can opportunistically switch among heterogeneous non-overlapping channels, which offers higher achievable throughput. However, the random channel selection policy in existing CR-MAC protocols suffers from problems such as delay, packet collisions, and degraded quality of service. The proposed channel selection scheme, which differs markedly from the traditional scheme, is adopted by nodes to achieve context awareness and intelligence for adaptive channel selection. The nodes select a channel based on the results learned through interactions with the other nodes and channels. The directional transmission power control scheme allows the nodes to reuse the channels subject to interference constraints. The simulation results show that nodes using the proposed algorithm can adaptively select channels and the optimal transmission power, which helps to achieve high throughput and minimized power consumption.
© 2017 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail addresses: carieanil@gmail.com (A. Carie), mingchul@dlut.edu.cn (M. Li), cliu.wcom@hotmail.com (C. Liu), reddysinfo@gmail.com (P. Reddy), waseef.jamal@imsciences.edu.pk (W. Jamal).

https://doi.org/10.1016/j.future.2017.11.014
0167-739X/© 2017 Elsevier B.V. All rights reserved.

Please cite this article in press as: A. Carie, et al., Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control, Future Generation Computer Systems (2017), https://doi.org/10.1016/j.future.2017.11.014.

1. Introduction

Fixed spectrum policies have led to underutilization of spectrum resources, and cognitive radio (CR) is one of the answers to the spectrum scarcity problem in wireless networks. Cognitive radio technology enables nodes to access unused or underutilized licensed spectrum dynamically and opportunistically [1]. Spectrum utilization therefore increases significantly in a spectrum-aware CR network. Secondary users (SUs) exploit the space, frequency and time domains of the radio spectrum using models such as opportunistic access [2–4] and concurrent access [5–12]. CR devices can maximize throughput [2–4], minimize energy consumption [10,11] and reduce interference [5–8] by adjusting their radio constraints. Spectrum sharing is important in CR networks to answer the scarcity problem. CR nodes can use Time Division Multiple Access, Frequency Division Multiple Access, Code Division Multiple Access, or their combinations. However, there remains the fundamental question of competent channel selection and optimal transmission power in a dynamic, time-varying environment, i.e., changing channel characteristics, the mobility of the wireless sources, and the unpredictability of wireless nodes joining or leaving the network. The MAC protocols in the literature [13] can work as guidelines for channel switching according to environmental changes in cognitive radio networks. However, most of them do not consider the cost of frequency changes in currently manufactured wireless devices. On average, channel switching alone can cause a packet loss ratio of up to 3%, which can be even worse in cognitive radio networks due to the dynamic nature of primary users (PUs). Thus, an optimal channel selection strategy that fluctuates less frequently is favorable. In [14], power control policies are devised for cellular networks with QoS as the primary criterion, where transmitters increase power to deal with interference and

channel impairments. However, this is not suitable for the primary users' QoS if secondary users transmit with arbitrarily high power. Thus, it is natural that power control should rely on the interference levels. In this paper, we propose the "Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control", where a learning algorithm is used for channel selection and a new directional transmission power control (DTPC) scheme is used to enhance the throughput and energy efficiency of the directional hybrid CR-MAC protocol. The main contributions of our work are twofold. First, using the channel selection algorithm, we try to select the best channel based on the SU's observation of PU traffic and of channel characteristics such as throughput achieved and packets lost. Second, we investigate the problem of channel reuse in directional communication, where the CR user has power control capability. That is, a CR user can transmit at any power in the allowable power range so as to achieve maximum concurrent transmissions. The rest of the paper is organized as follows. In Section 2, we present related work. In Section 3, we present the system model. An overview of the Q-learning algorithm is presented in Section 4. The proposed Hybrid Directional CR-MAC based on Q-Learning with Directional Power Control scheme is presented in Section 5. Simulation results for different network topologies are shown in Section 6 to establish the substantial throughput and energy gains that can be attained under the investigated scheme. Finally, we present our conclusions and future work.

2. Related work

Game theory based CR
In CR networks, game theory (GT) has recently been the most popular method for attaining context-awareness and intelligence. In GT, SUs interact to optimize their individual objectives such as delay and throughput. However, game theory has several limitations, which are addressed by the reinforcement learning (RL) approach. Firstly, GT based CR requires a complete set of information to compute the Nash equilibrium; hence it is more suitable for centralized CR networks [15,16]. Secondly, GT assumes a single type of objective function throughout the CR network, and hence a homogeneous learning mechanism in all the SUs. Thirdly, SUs might converge to a suboptimal action due to mis-coordination even when an optimal action exists. Although GT has been successfully applied in CR networks [17–26], the RL approach is a good alternative which addresses the above issues associated with GT. For instance, RL supports a heterogeneous learning mechanism in each agent because each agent can represent distinctive performance metrics as local rewards.

Omni directional Power control
SUs vary their transmit power depending on the interference at the primary receiver and the maximum secondary transmit power constraint [27]. The concurrent transmission region is maximized in [6] using optimal power control. The number of concurrent transmissions is maximized in [7] using dynamic spectrum sharing. Power control with the objectives of maximizing sum-rate, achieving rate fairness, and minimizing power consumption is studied in [28–32]. The necessary and sufficient condition for the feasible region using a power controlled MAC consisting of only two transmission links is derived in [33]. A channel hopping sequence is used to allocate the control channel to one-hop neighbor nodes [25]. The basic drawback of sequential CCC based CR-MAC is longer channel rendezvous delays [26–31]. Channel rendezvous becomes more challenging as the availability of PU channels increases.

Directional power control
For reusing spectrum in the macro cell, the underlying microcell uses antenna beamforming and power allocation schemes to maximize the multiuser sum rate [34]. The capacity and power consumption of wireless networks using directional antennas are studied in [35]. In [36], the achievable throughput of a mobile ad hoc network with directional antennas is addressed. Directional Medium Access Control is studied in [37]; it suffers from deafness and hidden terminal problems. Power control with a directional antenna over a packet radio network was first considered in [38]. Power control built on D-MAC with directional RTS, omnidirectional CTS and optimal power for the data packet is studied in [39], which reports increased network capacity and reduced power consumption. An optimization problem for selecting the range of channels for transmission with the control channel and an aggregated data channel was employed in statistical channel allocation MAC (SCA-MAC) [40], which outperforms the random scheme. DSA-MAC [41], DCRMAC [42], HC-MAC [43], DDMAC [44] and SMA [45] are protocols similar to the IEEE 802.11 DCF standard for reserving a channel.

3. System model

In this study, users with CR capabilities, referred to as secondary users, can communicate with other CR nodes utilizing the primary network's available spectrum spatially and/or temporally. When the secondary network does not have enough resources, CR nodes form an ad-hoc network without a central controller or dedicated control channels. Due to the highly dynamic and heterogeneous networking environment, a dedicated control channel is not pre-defined for exchanging control messages. We assume that the nodes are close enough to consider an interference-limited spectrum sharing scenario in which a CRAHN operates. The system model is composed of M licensed channels, which are accessed opportunistically by K CR nodes (each acting as both transmitter and receiver), and D primary users. The primary transmitters, primary receivers, and the mobile CR devices are distributed randomly within the coverage area. Similar to [46–48], a two-state continuous-time Markov process is used to model the traffic of each channel: the channel is either occupied by the PU (busy state) or not occupied by the PU (idle state). These two states are referred to as ON and OFF, respectively. Each SU transmitter and its corresponding SU receiver, but also on the time-varying activities of the PUs. We consider the situation in which several SUs may compete for the same channel, and one SU may have more than one channel to select from.

Antenna model
To predict the received power, as in [49,50], we consider a general power propagation model

$$P_r = \frac{P_t \, C \, G_{rt} \, G_{tr}}{d_{tr}^{\alpha}}$$

where $P_t$ is the transmit power, $d_{tr}$ is the distance between transmitter and receiver, $\alpha$ is the path-loss exponent, $G_{tr}$ and $G_{rt}$ are the directive gains of the transmitting and receiving antennas toward the direction of the receiving and transmitting antennas, respectively, while $C$ is a constant determined by other factors such as antenna heights and wavelength.

Time slot structure
The system model has a slotted transmission structure as shown in Fig. 3 and described as follows. Each secondary user executes the following stages synchronously during each time slot.

- Channel Sensing: SUs sense the PU channels to detect the activity of PUs.
- PCL-EXCHANGE: After sensing, SUs broadcast their Primary user free Channel List (PCL) to their neighbors. After receiving PCL information from neighbors, SUs update their PCL table.
- CHANNEL RESERVATION: This stage is divided into N slots, and every slot is divided into two sub-slots: sub-slot S1 for a node to send RTS directionally and sub-slot S2 for the destination node to reply with CTS or DNAV.
- DATA TRANSFER: SUs which successfully reserved a channel in the CHANNEL RESERVATION phase start data transmission.


4. Overview of Q-learning algorithm

We use Q-learning [12], which is a recent form of reinforcement learning algorithm that does not need a model of its environment and works by estimating the values of state–action pairs. The estimate of the future reward value in the Q-learning algorithm is given by Q(S, A) when an agent takes a particular action A in a particular state S. Learning intervals are denoted by $t \in T = \{1, 2, \ldots\}$, a constant interval by $t_D$, actions by $a \in A$, and rewards by $r_{t+1}(a_t)$. Every agent records the learnt values from the environment in a Q-table for all its possible actions, with $|A|$ entries. The local reward for an action is reflected in its Q-value; hence agents change their actions when there is a change in Q-value. At each interval t, agent i chooses an action $a_t$ and receives the local reward $r_{t+1}(a_t)$ at time t+1. Agent i updates the Q-value of $a_t$ at time t+1 as follows:

$$Q^{i}_{t+1}(a^{i}_{t}) \leftarrow (1-\alpha)\,Q^{i}_{t}(a^{i}_{t}) + \alpha\, r^{i}_{t+1}(a^{i}_{t}) \qquad (1)$$

where $0 \le \alpha \le 1$ is the learning rate. The value of α decides the dependence on the reward: a higher value of α gives more importance to the local reward than to past knowledge.

Agents search for the action that maximizes the value function $V^{\pi}$ as shown below:

$$V^{\pi} = \max_{a \in A} Q^{i}_{t}(a) \qquad (2)$$

By exploring the environment, the agents build a table of Q-values for each environment state and each possible action. Exploitation chooses the best known action, or the greedy action, at all times for performance enhancement. Exploration chooses the other, non-optimal actions once in a while to improve the estimates of all the Q-values in order to discover better actions. The learning rate and the discount factor are important parameters of the Q-learning algorithm. The learning rate limits how quickly learning can occur. The discount factor controls the value placed on future rewards.

5. Hybrid directional CR-MAC based on Q-learning with directional power control

5.1. Channel selection using Q-learning

In this section, we present our directional antenna based hybrid CCC based CR-MAC with dynamic channel selection. Each SU node is equipped with one transceiver, which is used for both control and data. The 902 MHz band is used by nodes to exchange their free channel lists, which are used to find a common control channel (CCC) among nodes. Nodes decide on a PU-free (available) data channel in the licensed band over the CCC. Two-way handshaking is performed by nodes to transmit control and data information. An illustration of the cognitive MAC protocol is shown in Fig. 4. The channel switching decision is made at the beginning of each time slot and depends on the channel state information.

In Directional Hybrid (DH) CR-MAC, the DCS applies QL to select a channel. Each agent divides time into fixed intervals of 't' and keeps track of the number of packets transmitted successfully, 'ND'. At the beginning of each 't', every node updates its Q-value using (1), chooses the channel with the highest Q-value, and broadcasts it to its neighbors along with the PCL. After receiving the broadcast information, nodes update their PCL table and calculate the population state of each channel, i.e., the number of users selecting a given channel. Nodes select the channel with the least population state for transmission. Every node maintains a Q-table that consists of Q-values in the range of 0 to 1. We use a dynamic Q-table, whose size at each node is determined by the number of available channels. The Q-table and the learning task are distributed among the different nodes (states). Therefore, when choosing the next channel, we let the agent act greedily, taking, in each situation, the action with the highest Q-value. If the node can transmit data successfully using the channel, then the reward R is increased by the number of packets transmitted; otherwise R is 0. The discount factor is an important parameter of the Q-learning algorithm. We use a variable discount factor determined by the PU activity rate, the number of SUs competing for the channel, and the bandwidth.

CHANNEL SELECTION USING DH-CRMAC
Set α ∈ [0, 1] // learning rate
For initial time slot
    Select random channel
    Broadcast PCL with chosen channel IDs at the top of the list to other users.
    Receive the information of other users' channel selection.
    Calculate population state of each channel (count number of users selecting a given channel).
    Choose the channel with least count.
    Sense and contend for the chosen channel and
    transmit data packets if successfully grabbing the channel.
    If (receive ACK for the DATA packet sent) then
        ND = ND + 1
    End if
End of initial time slot.
For remaining time slots
    r_{t+1}(a_t) = (ND/TD) + population state
    Q_{t+1}(a_t) = (1 − α) Q_t(a_t) + α r_{t+1}(a_t)
    R = uniform(0, 1) // generate random number
    If R <= ε then
        a_temp = uniform(1, k)
    Else
        a_temp = Argmax_{a ∈ A} (Q(a))
        If |Q_{t+1}(a_temp) − Q_{t+1}(a_t)| <= β then
            a_{t+1} = a_t
        Else
            a_{t+1} = a_temp
        End if
    End if
    Return a_{t+1}
    Broadcast PCL with chosen channel IDs at the top of the list to other users.
    Receive the information of other users' channel selection.
    Calculate population state of each channel (count number of users selecting a given channel).
    Choose the channel with least count.
    Sense and contend for the chosen channel and
    transmit data packets if successfully grabbing the channel.
End of time slot

5.2. Optimal directional power control

In this section, the proposed optimal directional power control algorithm for channel spatial reuse is presented. We first study the feasibility of channel reuse with the proposed optimal power control algorithm. Fig. 2 illustrates a classical spectrum access or spectrum sharing scenario with D randomly distributed primary users (PU in Fig. 2) and K secondary users (SU in Fig. 1). In this scenario, we assume that each of the PUs is equipped with an omni-antenna, while each of the SUs is equipped with a multi-antenna array, which is available for beamforming. In this case, the small cell consists of PU broadcasting channels and SU beamforming channels. In addition, considering the mobility of both the primary users and secondary users, we assume that both PUs and SUs follow a homogeneous Poisson point process (HPPP). Let N(A) denote the number of users in the area A, such as the cell in Fig. 2. If N(A) follows an HPPP with intensity λ > 0, that is, N(A) ∼ Poisson(λ|A|), then the probability of N(A) = k can be expressed as:

$$P(N(A) = k) = \frac{e^{-\lambda |A|}\,(\lambda |A|)^{k}}{k!} \qquad (3)$$
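
As a quick numerical illustration of Eq. (3), the sketch below computes the probability that an area contains exactly k users under the HPPP assumption. The function name and the intensity and area values are hypothetical; the computation is done in log space only so that large means do not overflow.

```python
import math

def hppp_count_probability(k, intensity, area):
    """P(N(A) = k) for a homogeneous Poisson point process with mean intensity * area."""
    if k < 0:
        return 0.0
    mean = intensity * area  # lambda * |A|, must be positive
    # exp(-mean) * mean**k / k!, evaluated via logarithms for numerical safety
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

# Hypothetical example: 1e-4 users per square metre over a 1000 m x 1000 m cell,
# i.e. 100 users on average.
p_exactly_100 = hppp_count_probability(100, 1e-4, 1000.0 * 1000.0)
total = sum(hppp_count_probability(k, 1e-4, 1000.0 * 1000.0) for k in range(400))
print(p_exactly_100, total)
```

Summing the probabilities over a wide enough range of k should come out very close to 1, which is a simple sanity check on the formula.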


Fig. 3. Antenna, channel and node reservation tables in directional CR-MAC.

Fig. 1. Cognitive radio network with learning and power control.

Fig. 4. Node communication using hybrid control channel using learning and power control.

Fig. 2. Illustration of distributed spectrum access with spatial reuse.
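
Returning to the channel selection procedure of Section 5.1 (illustrated in Fig. 4), the following minimal sketch shows one way the Q-value update of Eq. (1), the ε-greedy choice with the β threshold, and the least-populated channel rule could fit together. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based Q-table, the tie-breaking toward the lowest channel ID, and the choice to let an exploration step switch directly to the random candidate are all assumptions. The reward is passed in as a number, since the paper computes it from ND, TD and the population state.

```python
import random

def update_q(q, action, reward, alpha):
    """Eq. (1): blend the old estimate with the newest local reward."""
    q[action] = (1.0 - alpha) * q[action] + alpha * reward
    return q[action]

def select_channel(q, current, epsilon, beta, rng):
    """Pick the next channel from Q-table `q` (dict: channel id -> Q-value)."""
    if rng.random() <= epsilon:
        # Exploration (assumed behaviour): jump to a uniformly drawn channel.
        return rng.choice(sorted(q))
    best = max(sorted(q), key=lambda ch: q[ch])  # ties resolve to the lowest id
    # Exploitation with threshold beta: stay put unless the gain is large enough.
    if abs(q[best] - q[current]) <= beta:
        return current
    return best

def least_populated(announced, channels):
    """Count neighbours' announced channels and return the emptiest one."""
    counts = {ch: 0 for ch in channels}
    for ch in announced:
        if ch in counts:
            counts[ch] += 1
    return min(sorted(counts), key=lambda ch: counts[ch])

rng = random.Random(0)          # fixed seed so the example is repeatable
q = {1: 0.2, 2: 0.2, 3: 0.2}    # hypothetical initial Q-values
update_q(q, 2, reward=1.0, alpha=0.5)
switched = select_channel(q, current=1, epsilon=0.0, beta=0.1, rng=rng)
stayed = select_channel(q, current=1, epsilon=0.0, beta=0.5, rng=rng)
print(switched, stayed, least_populated([1, 1, 3], [1, 2, 3]))
```

The β threshold acts as hysteresis: a node only leaves its current channel when the best alternative is better by more than β, which is consistent with the paper's stated preference for a selection strategy that fluctuates less frequently.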

Fig. 5. Interference of node SU to neighboring PUs.

Based on the discussion above, we can then analyze the access scenario. As for the PUs in the cell, although directional beams are adopted for communication among the SUs, the PU receivers may still suffer interference from the SUs' inaccurate beamforming, as the broadcasting channels can receive signals from all directions in space. As for the SUs in the cell, by adjusting the beamforming direction, the SU system can nullify the interference to PU systems, but the SU system needs to know the interference channel information to control the interference by beamforming. We assume that cooperation exists between the PU system and the SU system, and that this cooperation can be achieved by using a direct feedback wireless channel link from the PUs to the SUs. We skip the details of the feedback protocols between the PU receiver and SU transmitter, as this is outside the focus of this paper. According to the discussion above, we can then make a mathematical formulation for the channel and signal model, which is generalized as the application model in Fig. 5. Note that in this model, a single SU receiver coexists with D PU receivers, while sharing the same radio resources. The SU transmitter has M antennas, while both the SU receivers and PU receivers are equipped with a single antenna. Thus, the signal received at the SU receiver, $y \in \mathbb{C}$, and the interference to the ith PU receiver from the SU transmitter, $I_i$, are, respectively, given by

$$y = h^{H} x + n \qquad (4)$$

$$I_i = |g_i^{H} x|^{2} \qquad (5)$$

where n is zero-mean independent complex Gaussian noise with unit variance; $h \in \mathbb{C}^{M \times 1}$ is the channel vector between the SU receiver and transmitter, and $g_i \in \mathbb{C}^{M \times 1}$ is the interference channel vector from the SU transmitter to the ith PU receiver. The transmitted signal is given by $x = v \cdot s$, meaning that a single dimensional symbol, $s \in \mathbb{C}$, is weighted by a complex beamforming vector $v \in \mathbb{C}^{M \times 1}$. The transmit signal must satisfy the transmit power constraint $P_M$, i.e., $E[\|x\|^{2}] \le P_M$. Then, by restricting our attention to the normalized beamforming vector v, where $\|v\|^{2} = 1$, we can limit the signal power P by $E[|s|^{2}] = P \le P_M$. Our problem is to determine the optimal beamforming vector and transmit power such that the transmission gain in the SU link is maximized while the interference to the PU receiver is constrained to a predefined threshold. Mathematically, this problem can be described as

$$\max_{v,\,P} \; \gamma_s = P\,|h^{H} v|^{2} \quad \text{s.t.} \quad I_i = |g_i^{H} v|^{2} \le I_{th}, \;\; \|v\|^{2} = 1, \;\; \text{and} \;\; P \le P_M \qquad (6)$$

for i = 1, 2, ..., m, where $\gamma_s$ is the channel gain of the SU link. Therefore, by solving the above optimization, we can finally obtain


Table 1
Simulation parameters.

Parameter name                         Description
Topology                               1000 × 1000 flat grid
No. of CR nodes                        100
No. of PU channels                     10 (8 MHz) channels
No. of PU transmitters                 10
Unlicensed channel                     ISM-902 MHz
PU active probability                  10, 15 and 20 msec
Mobility model                         Random waypoint
Input CR transmit power                10 mW
Receiver threshold                     −95 dBm
Carrier sense threshold                −115 dBm
CR Tx range                            200 m (licensed channel)
PU Tx range                            500 m (licensed channel)
Data rate                              2 Mbps
Antenna type (channel reservation)     Directional (4-element)
Antenna type (data transmission)       Directional (4-element)
Beamwidth                              90°
Interface queue length                 50
Simulation time                        100 s
Traffic type                           CBR/UDP
Packet size                            512 and 1024 bytes

the optimal beam shape to achieve directional transmission among SUs with tolerable interference to PUs.
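
The text notes that the SU system can nullify its interference toward a PU by adjusting the beam direction. For a single PU receiver, one simple feasible point of problem (6) is a zero-forcing beam: project h onto the directions orthogonal to g, normalize, and transmit at the maximum power. The sketch below illustrates this under stated assumptions. It is a named substitute technique, not the authors' solver, and it does not claim to be the optimum of (6) when some interference is tolerated. The helper names and channel values are hypothetical, and the interference is reported as P|g^H v|^2, which follows from Eq. (5) with x = v·s.

```python
import math

def inner(a, b):
    """Hermitian inner product a^H b for equal-length lists of complex numbers."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def zero_forcing_beam(h, g):
    """Unit-norm v obtained by removing from h its component along g."""
    scale = inner(g, h) / inner(g, g).real
    v = [hi - scale * gi for hi, gi in zip(h, g)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    if norm == 0.0:
        raise ValueError("h is parallel to g; no interference-free direction")
    return [x / norm for x in v]

def link_metrics(h, g, v, power):
    """Return (gamma_s, interference) = (P |h^H v|^2, P |g^H v|^2)."""
    return power * abs(inner(h, v)) ** 2, power * abs(inner(g, v)) ** 2

# Hypothetical 4-antenna channels (values chosen only for illustration).
h = [1 + 1j, 2 + 0j, 0.5j, 1 + 0j]
g = [1 + 0j, 1j, 0j, 0j]
p_max = 10.0
v = zero_forcing_beam(h, g)
gamma_s, interference = link_metrics(h, g, v, p_max)
print(gamma_s, interference)
```

Because the interference term vanishes for any transmit power, the power constraint P ≤ P_M is the only active limit in this special case, so the sketch simply uses the maximum power.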

6. Simulation results

Cognitive radio network simulator (NS-2.31) is used to implement the proposed reinforcement learning based CR-MAC protocol. Our primary goal is to show that learning based channel selection and optimal power control improve throughput, reduce channel switching and improve overall performance. The proposed Cognitive MAC protocol is evaluated in terms of aggregate network throughput, average packet delay and per-hop delay with respect to data rate, PU spectrum occupancy rate and PU channel availability. Furthermore, the proposed Cognitive MAC protocol with reinforcement learning is compared with in-band Cognitive MAC, out-of-band Cognitive MAC and hybrid CCC MAC protocols. The simulation parameters used in this paper are shown in Table 1. A flat-grid 1000 × 1000 topology is used to run the network simulations, with 100 CR nodes, 10 PU channels and 10 PU base stations.

Fig. 6(a) describes energy consumption for application data with respect to the number of CR nodes. Whenever the number of cognitive nodes within the network increases, the collision probability increases due to more traffic contention. This is more common in opportunistic cognitive radio access networks, due to contention among CR nodes and evacuation when a PU becomes active in the current CR communication channel. From Fig. 6(a), it is clear that the proposed Cognitive MAC protocol has lower energy consumption than in-band and out-of-band CCC based CR networks.

In Fig. 6(b) we show the relation of aggregate throughput to the number of CR nodes and observe that CR nodes with learning and without learning have similar throughput initially. CR nodes with learning at first have no knowledge of PU activity. However, over time, as CR nodes determine the usage pattern of the primary users, they select PU-free channels that yield higher throughput. CR nodes stop choosing channels which are frequently used by PUs. So from Fig. 6(b), it is evident that learning helps to achieve higher throughput.

Fig. 6. (a) Per-hop control energy consumption with respect to No. of CR nodes. (b) Per-hop data energy consumption with respect to No. of CR nodes. (c) Aggregate Network Throughput with respect to No. of available PU channels. (d) Aggregate Network Throughput with respect to Spectrum Occupancy rate.


Fig. 6(c) shows the curves of aggregate network throughput against the number of available PU channels. For this, we have considered negligible sensing time. CR nodes learn the channel usage patterns of PUs and nearby CR nodes and select channels that give the maximum period to transmit without interruption. Learning helps to avoid competition with PUs and other CR nodes, which promotes utilization of all the available channels. Finally, Fig. 6(d) shows an increase in throughput as the number of available PU-free channels increases.

Fig. 7(a) shows aggregate throughput as a function of MAC data rate. We have taken the number of PU channels as 5 and the data size as 1024 bytes. With the increase in data rate, CR nodes compete to access the channel with the higher data rate, but they also learn that other nodes want to access the high data rate channel. Thus, they tend to choose a channel which satisfies their requirement, hence avoiding collisions with other CR nodes. In Fig. 7(c) the number of PU channels is increased to 10, which increases the transmission probability. CR nodes now shift to a channel which satisfies their throughput requirements.

The average packet delay of the proposed scheme is presented in Fig. 7(b). Here we have analyzed performance by setting the number of PU channels to 5. Packet delay increases as collisions increase. By comparison, the proposed CR-MAC, utilizing the learning scheme together with optimal power, decreases the number of collisions, which in turn decreases the packet delay. We have compared the packet delay performance with state-of-the-art CR-MAC protocols. In Fig. 7(d) the number of PU channels is increased to 10, which has a direct impact on packet delay. CR nodes which are waiting in the queue or in the back-off phase can switch to new transmission opportunities. Unlike CR nodes without learning, CR nodes with learning are more adaptive to dynamic channel opportunities.

7. Conclusion and future work

Enhancing the network throughput with minimal node energy consumption is significant in cognitive radio ad-hoc networks. This paper proposes a Hybrid Directional CR-MAC based on Learning with Power Control for ad-hoc cognitive radio networks, which uses a Q-learning scheme for channel selection based on the observation of PU traffic and other SUs' channel selection. The protocol learns the neighbor nodes' channel selection and adapts to the channel with less chance of collision. In addition, each node calculates the optimal power for transmission, which increases spatial reuse and thus improves channel utilization and system throughput. The proposed scheme is shown to perform better than the non-learning CR-MAC protocols. In future work, real-time application based channel scheduling will be considered in accordance with the proposed scheme.

Acknowledgment

This work was supported by the Nature Science Foundation of China under grant number 61572095. The authors are also grateful to the anonymous reviewers for their valuable comments and suggestions, which enhanced the quality of the paper.

Fig. 7. (a) Aggregate Throughput with respect to data rate (PU channels: 5). (b) Average packet delay with respect to data rate (PU channels: 5). (c) Aggregate Network Throughput with respect to data rate (PU channels: 10). (d) Average packet delay with respect to MAC data rate (PU channels: 10).

References

[1] J. Mitola, Cognitive radio: An integrated agent architecture for software defined radio (Ph.D. dissertation), KTH Royal Institute of Technology, 2000.
[2] S. Srinivasa, S.A. Jafar, Soft sensing and optimal power control for cognitive radio, IEEE Trans. Wireless Commun. 9 (12) (2010) 3638–3649.
[3] V. Asghari, S. Aissa, Adaptive rate and power transmission in spectrum-sharing systems, IEEE Trans. Wireless Commun. 9 (10) (2010) 3272–3280.


[4] E.C.Y. Peh, Y.-C. Liang, Y. Zeng, Sensing and power control in cognitive radio with location information, in ICCS, 2012, pp. 255–259.
[5] L.-C. Wang, A. Chen, Effects of location awareness on concurrent transmissions for cognitive ad hoc networks overlaying infrastructure-based systems, IEEE Trans. Mobile Comput. 8 (5) (2009) 577–589.
[6] Y. Song, J. Xie, Optimal power control for concurrent transmissions of location-aware mobile cognitive radio ad hoc networks, in GLOBECOM, no. July 2009, pp. 1–6.
[7] M.R. Hassan, G. Karmakar, J. Kamruzzaman, Maximizing the concurrent transmissions in cognitive radio ad hoc networks, in IWCMC, no. July 2011, pp. 466–471.
[8] S.M. Sánchez, R.D. Souza, E.M.G. Fernandez, V.A. Reguera, W. Godoy, Effect of location accuracy and shadowing on the probability of non-interfering concurrent transmissions in cognitive ad hoc networks, Radioengineering 22 (4) (2013) 1138–1149.
[9] S.M. Sánchez, R.D. Souza, E.M.G. Fernandez, V.A. Reguera, Impact of rate control on the performance of a cognitive radio ad hoc network, IEEE Commun. Lett. 16 (9) (2012) 1424–1427.
[10] S.M. Sanchez, R.D. Souza, E.M.G. Fernandez, V.A. Reguera, Rate and energy efficient power control in a cognitive radio ad hoc network, IEEE Signal Process. Lett. 20 (2013) 451–454.
[11] S. Buzzi, D. Saturnino, A game-theoretic approach to energy-efficient power control and receiver design in cognitive CDMA wireless networks, IEEE J. Sel. Topics Signal Process. 5 (1) (2011) 137–150.
[12] M. Ku, L. Wang, Y.T. Su, Toward optimal multiuser antenna beamforming for hierarchical cognitive radio systems, IEEE Trans. Commun. 60 (10) (2012) 2872–2885.
[13] Proceedings of the first IEEE symposium on New Frontiers in Dynamic Spectrum Access Networks, November 2005.
[14] M. Chiang, P. Hande, T. Lan, C.W. Tan, Power control in wireless cellular networks, Found. Trends Netw. 2 (4) (2008) 381–533.
[15] Z. Ji, K.J.R. Liu, Dynamic spectrum sharing: A game theoretical overview, IEEE Comm. Mg. 45 (5) (2007) 88–94.
[16] D. Niyato, E. Hossain, Competitive spectrum sharing in cognitive radio networks: A dynamic game approach, IEEE T. Wls. Comm. 7 (7) (2008) 2651–2660.
[17] N. Nie, C. Comaniciu, Adaptive channel allocation spectrum etiquette for cognitive radio networks, Symp. on New Frntr. in Dynmc. Spctrm. Acs. Nwk. (DySPAN), IEEE, Baltimore, MD, 2005, pp. 269–278.
[18] M.R. Musku, A.T. Chronopoulos, S. Penmatsa, D.C. Popescu, A game theoretic approach for medium access of open spectrum in cognitive radios, 2nd Intl. Conf. on Cgntve. Rd. Orntd. Wls. Nwk. and Comm. (CROWNCOM), IEEE, Orlando, FL, July 2007, pp. 336–341.
[19] Z. Han, C. Pandana, K.J.R. Liu, Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning, Wls. Comm. Nwk. Conf. (WCNC), IEEE, Hong Kong, March 2007, pp. 11–15.
[20] Z. Ji, K.J.R. Liu, Dynamic spectrum sharing: A game theoretical overview, IEEE Comm. Mg. 45 (5) (2007) 88–94.
[21] S. Subramani, T. Basar, S. Armour, D. Kaleshi, Z. Fan, Noncooperative equilibrium solutions for spectrum access in distributed cognitive radio networks, Symp. On New Frntr. in Dynmc. Spctrm. Acs. Nwk. (DySPAN), IEEE, Chicago, IL, October 2008.
[22] H.N. Pham, J. Xiang, Y. Zhang, T. Skeie, QoS-aware channel selection in cognitive radio networks: A game-theoretic approach, Glbl. Telecomm. Conf. (GLOBECOM), IEEE, New Orleans, LA, December 2008.
[23] H. Qin, H. Wang, H. Zhou, A selfish game-theoretic approach for cognitive radio networks with dynamic spectrum sharing, Intl. Conf. on Comp. Sc. Sftwr. Eng. (CSSE), China, December 2008, pp. 1105–1109.
[24] X. Gong, W. Yuan, W. Liu, W. Cheng, S. Wang, A cooperative relay scheme for secondary communication in cognitive radio networks, Glbl. Telecomm. Conf. (GLOBECOM), IEEE, New Orleans, LA, December 2008.
[25] V. Maskery, V. Krishnamurthy, Q. Zhao, Decentralized dynamic spectrum access for cognitive radios: Cooperative design of a noncooperative game, IEEE T. Comm. 57 (2) (2009) 459–469.
[26] H.-P. Shiang, M.V.D. Schaar, Queuing-based dynamic channel selection for heterogeneous multimedia applications over cognitive radio networks, IEEE T. Mm 10 (5) (2008) 896–909.
[27] S. Srinivasa, S.A. Jafar, Soft sensing and optimal power control for cognitive radio, IEEE Trans. Wireless Commun. 9 (12) (2010) 3638–3649.
[28] A. Hoang, Y. Liang, M. Islam, Power control and channel allocation in cognitive radio networks with primary users’ cooperation, IEEE Trans. Mob. Comput. 9 (3) (2010) 348–360.
[29] B. Maham, P. Popovski, X. Zhou, A. Hjorungnes, Cognitive multiple access networks with outage margin in the primary system, IEEE Trans. Wirel. Commun. 10 (10) (2011) 3343–3353.
[30] E. Dall’Anese, S.-J. Kim, G.B. Giannakis, S. Pupolin, Power control for cognitive radio networks under channel uncertainty, IEEE Trans. Wirel. Commun. 10 (10) (2011) 3541–3551.
[31] X. Gong, S. Vorobyov, C. Tellambura, Optimal bandwidth and power allocation for sum ergodic capacity under fading channels in cognitive radio networks, IEEE Trans. Signal Process. 59 (4) (2011) 1814–1826.
[32] C.C. Zarakovitis, N. Qiang, D.E. Skordoulis, M.G. Hadjinicolaou, Power-efficient cross-layer design for OFDMA systems with heterogeneous QoS, imperfect CSI, and outage considerations, IEEE Trans. Veh. Technol. 61 (2) (2012) 781–798.
[33] A. Behzad, Z. Rubin, Multiple access protocol for power-controlled wireless access nets, IEEE Trans. Mob. Comput. 3 (4) (2004) 307–316.
[34] M. Ku, L. Wang, Y.T. Su, Toward optimal multiuser antenna beamforming for hierarchical cognitive radio systems, IEEE Trans. Commun. 60 (10) (2012) 2872–2885.
[35] P. Li, C. Zhang, Y. Fang, The capacity of wireless ad hoc networks using directional antennas, IEEE Trans. Mobile Comput. 10 (10) (2011) 1374–1387.
[36] Y. Chen, J. Liu, X. Jiang, O. Takahashi, Throughput analysis in mobile ad hoc networks with directional antennas, Ad Hoc Networks 11 (3) (2013) 1122–1135.
[37] R. Choudhury, N.H. Vaidya, Impact of Directional Antennas on ad hoc Routing. In Conference on Personal and Wireless Communication, September 2003.
[38] J. Zander, Slotted Aloha multihop packet radio networks with directional antennas, Electron. Lett. 26 (25) (1990).
[39] Y.B. Ko, V. Shankarkumar, N.H. Shankarkumar, Medium access control protocols using directional antennas in ad hoc networks. In Annual Joint Conference of the IEEE Computer and Communications Societies, March 2000.
[40] A. Chia-Chun Hsu, David S.L. Wei, C.-C. Jay Kuo, A cognitive radio MAC protocol using statistical channel allocation for wireless ad hoc networks, In Proc. IEEE WCNC, March 2007, pp. 105–110.
[41] S.L. Wu, C.Y. Lin, Y.C. Tseng, J.P. Sheu, A new multi-channel MAC protocol with on-demand channel assignment for multi-hop mobile ad hoc networks, IEEE DySPAN, Maryland, USA, 2005, pp. 203–213.
[42] S.J. Yoo, H. Nan, T.I. Hyon, DCR-MAC: Distributed cognitive radio MAC protocol for wireless ad hoc networks, Wirel. Commun. Mobile Comput. 9 (5) (2009) 631–653.
[43] J. Jia, Q. Zhang, X. Shen, HC-MAC: A hardware-constrained cognitive MAC for efficient spectrum management, IEEE J. Sel. Areas Commun. 26 (1) (2008) 106–117.
[44] H.A.B. Salameh, M.M. Krunz, O. Younis, Cooperative adaptive spectrum sharing in cognitive radio networks, IEEE/ACM Trans. Netw. 18 (4) (2010) 1181–1194.
[45] X. Wang, A. Wong, P.H. Ho, Stochastic medium access for cognitive radio ad hoc networks, IEEE J. Sel. Areas Commun. 29 (4) (2011) 770–783.
[46] Y. Yilmaz, Z. Guo, X. Wang, Sequential joint spectrum sensing and channel estimation for dynamic spectrum access, IEEE J. Sel. Areas Commun. 32 (11) (2014) 2000–2012.
[47] N. Khambekar, C.M. Spooner, V. Chaudhary, On improving serviceability with quantified dynamic spectrum access, in Proceedings of the IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN ’14), pp. 553–564, McLean, Va, USA, April 2014.
[48] T.M.C. Chu, H. Phan, H.J. Zepernick, Hybrid interweave underlay spectrum access for cognitive cooperative radio networks, IEEE Trans. Commun. 62 (7) (2014) 2183–2197.
[49] P. Li, C. Zhang, Y. Fang, The capacity of wireless ad hoc networks using directional antennas, IEEE Trans. Mobile Comput. 10 (10) (2011) 1374–1387.
[50] Y. Chen, J. Liu, X. Jiang, O. Takahashi, Throughput analysis in mobile ad hoc networks with directional antennas, Ad Hoc Networks 11 (3) (2013) 1122–1135.

Anil Carie received the B.Tech. degree in Computer Science and Engineering from Jawaharlal Nehru Technological University, Hyderabad, India, and the M.Tech. degree in Software Engineering from Karunya University, Coimbatore, India. He worked as a faculty member at Vidya Barathi Institute of Technology, Warangal, India, from June 2010 to June 2014. He is currently a Ph.D. candidate in the School of Software Technology of Dalian University of Technology, China. His research interests include common control channel design for MAC and routing protocols in cognitive radio ad hoc networks.


Mingchu Li received the B.S. degree in mathematics from Jiangxi Normal University and the M.S. degree in applied science from University of Science and Technology Beijing in 1983 and 1989, respectively. He worked for University of Science and Technology Beijing as an associate professor from 1989 to 1994. He received his doctorate in Mathematics from the University of Toronto in 1998. He worked for the School of Software of Tianjin University as a full professor from 2002 to 2004 and, since 2004, has worked for the School of Software Technology of Dalian University of Technology as a full Professor and Vice Dean. His main research interests include theoretical computer science, information security, trust models, and cooperative game theory.

Chang Liu received the B.S. degree in electronic information engineering from Dalian Maritime University, Dalian, China, in 2012, and the Ph.D. degree in signal and information processing from Dalian University of Technology, China. From 2015 to 2016, he was a visiting scholar in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville, USA. He is currently a postdoctoral research fellow with the National Key Laboratory of Science and Technology on Communications, University of Electronic Science and Technology of China, Chengdu, China, 611731. His research interests include spectrum sensing in cognitive radio, statistical signal processing, random matrix theory, and array signal processing.

CH.R. Prakasha Reddy is currently working as a Lecturer in the Department of Informatics, College of Engineering and Technology, Wollega University T.R, Ethiopia. He received his Bachelor's degree in Information Technology (2007) from the Department of Information Technology, St.Theressa Institute of Engineering and Technology, Jawaharlal Nehru Technological University, Andhra Pradesh, India. He joined Karunya University, Coimbatore, India, for a Master's in Network and Internet Engineering (2010). He worked as an Assistant Professor in the Department of Information Technology, SRK Institute of Technology, Vijayawada, Andhra Pradesh, India (2010–2013). His main research interests include storage area networks, network security, energy-efficient routing in wireless networks, and artificial intelligence.

Waseef Jamal is an Assistant Professor at the Center of Business Administration, Institute of Management Sciences, Hayatabad, Peshawar, Pakistan. His research interest is in the area of organizational behavior and investment in human capital. Dr. Waseef has a number of national and international journal publications and has presented papers at both local and international conferences. Additionally, he is a trainer, consultant, and philanthropist, and a member of the environmental committee at the Institute of Management Sciences.

