You are on page 1of 5

Stochastic Channel Selection in Cognitive Radio

Yang Song and Yuguang Fang

Yanchao Zhang

Department of Electrical and Computer Engineering

University of Florida
Gainesville, Florida 32611
Email: {yangsong@, fang@ece.}

Department of Electrical and Computer Engineering

New Jersey Institute of Technology
University Heights, Newark, NJ 07102

Abstract In this paper, we investigate the channel selection

strategy for secondary users in cognitive radio networks. We
claim that in order to avoid the costly channel switchings, a
secondary user may desire an optimal channel which maximizes
the probability of successful transmissions, rather than consistently adapting channels to the random environment. We propose
a stochastic channel selection algorithm based on the learning
automata techniques. This algorithm adjusts the probability of
selecting each available channel and converges to the -optimal
solution asymptotically.

In wireless communication, the most valuable resource is
the available radio frequencies. The dramatic increase in wireless services aggravates the scarcity of the frequency spectrum.
In Unite States, the Federal Communications Commission
(FCC) regulates the frequency spectrum and allocates it in
an exclusive-usage fashion. Currently, the spectrum is overcrowded and there is few space available for the new emerging
wireless services. Ironically, as reported in a set of real
experiments [1], the majority of the current allocated spectrum
is severely under-utilized. For example, in [1], it is observed
that, on average, there is only about 5.2% of the spectrum
below 3GHz is actually in use. The ubiquitous spectrum
holes and the growth of new wireless services motivate the
DAPRA to start the XG program [2] in order to investigate
the policies and regulation rules which enable the spectrum
holes to be utilized by other users, a.k.a., secondary user,
more efficiently. The new scheme is termed as opportunistic
spectrum access (OSA) or dynamic spectrum access (DSA) in
In order to dynamically sense the frequency spectrum and
adjust the operating frequency, the cognitive radios are proposed as the solution to fully utilize the valuable spectrum.
Cognitive radios, usually based on the software defined radio
(SDR) techniques, are able to detect the current spectrum,
sense the spectrum holes and adjust the parameters, such as
frequency, power and transmission rate, in order to use the
spectrum more efficiently and economically. For example, a
novel IEEE 802.22 standard is introduced in [3], which enables
the cognitive radios to utilize the unused analog TV broadcast
This work was supported in part by the U.S. National Science Foundation
under Grant DBI-0529012 and under Grant CNS-0626881.

band as secondary users. They are allowed to operate at the TV

broadcast band whenever the primary user, i.e., TV broadcast
signal, is absent. Moreover, if there are multiple primary users
exist, i.e., there are multiple frequency opportunities available,
the cognitive radios can dynamically switch among all available frequencies following the unpredictable returns of the primary users. There are several MAC protocols proposed in the
literature [4] for cognitive radio networks, which serve as the
guidelines of switching channels according the environmental
changes. However, few of them take account of the drastic
cost of changing frequencies in current manufactured wireless
devices. The aggravated delay and the deteriorated packet-lossratio induced by the channel switchings have caught more and
more attention in the community [5] [6] [7] [8] [9] [10] [11].
For example, a recent experimental study [12] shows that, on
average, channel switchings can cause up to a packet loss ratio
of 3%, exclusively. In cognitive radio networks, the situation
is even worse. Due to the random characteristics of primary
users, the secondary user with frequency-agile MAC protocols
needs to switch the operating frequency adaptively and consistently. Therefore, the long-term performance, in terms of
QoS support and aggregated throughput, will be dramatically
degraded due to the overwhelming cost in channel switchings.
Thus, from the secondary users point of view, it may be more
favorable to implement a channel selection strategy which
reluctates to switch the channels, unless necessary. Moreover,
note that the essential rationale of adapting channels is to
minimize the conflicts with the primary users. Hence, we
claim that the prudent strategy for a secondary user is to
insist on the statistically optimal channel which maximizes the
probability of successful transmissions, rather than adaptively
switching channels as the random and unpredictable behaviors
of primary users fluctuate.
However, the problem of acquiring the optimal channel is by
no means straightforward. Were the distributions of returning
time and operating duration of primary users known as a
prior, the problem is easy to solve, yet this knowledge is
not attainable in practical cognitive radio networks. In this
paper, we propose a novel stochastic channel selection algorithm based on learning automata (LA), which dynamically
adjusts the probability of choosing one channel on the fly and
asymptotically converges to the optimal channel, in the sense

that the probability of successful transmissions is maximized.

The rest of this paper is organized as follows. Section II
briefly overviews the background and the motivation of this
paper. The stochastic channel selection algorithm is proposed
in Section III. An illustrative example of the algorithm is
provided in Section IV. Finally, Section V concludes this
A. Cognitive Radio Networks
In the literature, the dynamic spectrum access schemes are
categorized into three classes [13]: exclusive use model, open
sharing model and hierarchical access model. The exclusive
use model represents the scenarios where the primary users
can sublease part of the private spectrum to other secondary
users for profit. Each user, either primary or secondary, does
not share the spectrum with other users. However, in the open
sharing model, all the users obey a certain spectrum etiquette
and share the spectrum with each other. The unlicensed ISM
band is a well-known example. In the hierarchical access
model, primary users and secondary users coexist and the secondary users can transmit only when the summed interference
measured at the primary users receiver is below a certain
threshold. For detailed overview of the dynamic spectrum
access schemes for cognitive radio networks, one may refer to
[14] and a new comprehensive survey in [13].
In our work, we consider a hierarchical access model of
cognitive radio networks with minimum interference tolerance
where the secondary users are required to terminate the current
transmission immediately after the corresponding primary user
returns. In the network, there are multiple primary users with
different frequencies and each secondary user is a pair of
transmitter and receiver which are both cognitive radios. We
assume that the secondary users have the capability of sensing
the spectrum and the willingness of terminating the current
transmission whenever the primary user returns. The traffic
generated on each secondary user is transmitted over consequent time slots. At the beginning of each time slot, the loaded
secondary user selects one of the available frequencies as the
transmission frequency, and transmits. If the corresponding
primary user, i.e., the one who owns the selected channel,
does not return within this time slot, the transmission is
successful. On the contrary, if the primary user returns, the
current transmission will be terminated and the secondary
user needs to retransmit all the packets in the next available
time slot. We also consider the influence of other secondary
users on the channel selection strategy of this particular
secondary user. Apparently, whether the transmission of this
particular secondary user, say k, is successful or not depends
not only the primary users but also the neighboring users of
k as well. The definition of neighboring users depends on
the specific spectrum etiquette. For example, if we use the
traditional carrier sensing mechanism, the neighboring users
of k is defined as the secondary users (1) whose transmitter
is within the carrier sensing range of the k-th receiver, or
(2) whose receiver is within the carrier sensing range of the

k-th transmitter. Thus, we define that for the k-th secondary

user, in a time slot, the transmission is successful if (1) at the
beginning of the slot, there is no neighboring users1 operating
at the same frequency channel as k has selected, and (2) the
primary user who owns this channel does not return within
the current time slot, and define as failure otherwise.
From the k-th secondary users perspective, the strategy
of dynamically adapting channels according to the random
environment, which consists of all primary users and coexisting secondary users, is likely unfavorable, owing to the
dominant and burdensome cost in channel switchings. Instead,
one prudent secondary user may want to find the optimal
channel in terms of maximizing the probability of successful
transmissions. Hence, the expected throughput is maximized
by insisting on the statistically optimal channel, if the transmission rate is assumed fixed. Besides, there is no more costly
channel switchings once the optimal channel is obtained. However, as mentioned above, the problem of acquiring the optimal
channel from a zero-knowledge on the statistics of the random
environment, which consists of nondeterministic primary users
and unpredictable coexisting secondary users, is challenging.
Fortunately, the learning automata (LA) provides us a family
of algorithmic tools which converge to the optimum strategy
asymptotically in arbitrary stationary random environments
with no prior distribution knowledge required. We will briefly
overview the principles of the learning automata in the next
B. Learning Automata
In classical control theory, the mathematical model for a
system is usually assumed to be known. The expected output
of the current input can be predicted in a deterministic fashion.
Although the later developed stochastic control theory deals
with the issues of uncertain parameters, they both assume
an unchanging system. Stochastic learning approaches are
introduced to solve the problems of random systems. The scenarios where learning automata techniques can be applied are
summarized as follows [15]. In a random environment, there
are finite actions available for the decision maker to choose
from. Each deterministic action induces a random output from
the random environment, which could be either favorable or
unfavorable. Based on the observations, an LA algorithm is
expected to determine a strategy of selecting actions at a
stage given the past actions and corresponding outputs. We
can view the random environment as a black box, depicted in
Figure 1. At each decision instance, the decision maker picks
an action according a certain rule as the input to the random
environment and waits for the random output. Apparently, if
we are patient enough, we could execute voluminous trials
for each action, say a million times, to estimate the optimal
action in terms of maximum probability of getting rewards.
The beauty of the LA approaches lies in that they can converge
to the -optimal solution with a reasonable fast pace for
1 In general, the neighboring users of k is defined as the secondary users
whose actions may potentially cause a transmission failure of user k.

Random Environment

One of the actions

Random Output

Decision Maker
Fig. 1.
The interaction between the decision maker and the random

arbitrary stationary random environments. In addition, the LA

approaches can also track the optimal action for non-stationary
environments as long as the statistic characteristics of the
environment does not change very fast. Recently, the learning
automata techniques have been extensively applied in various
aspects in the communication and networking communities
[16] [17] [18] [19] [20]. For an overview of learning automata
techniques, refer to [21].
Let us revisit the paradigm of cognitive radio networks.
From the k-th secondary users perspective, the outer world
can be viewed as a random environment. The randomness in
the response from this outer world is induced by both the
unpredictable returns of primary users and the uncontrollable
behaviors of neighboring users. We consider this complex
outer world for the secondary user k as a black box. At the
beginning of each transmission slot, the secondary user selects
one of the available channels. The output of this black box is
a binary number where = 0 denotes the transmission
is successful and = 1 otherwise. In the next section,
we will propose an LA-based stochastic channel selection
algorithm for secondary users in cognitive radio networks. The
convergence property of the algorithm will also be discussed.
We consider a stationary2 cognitive radio network where
M primary users and C secondary users coexist, illustrated
in Figure 2. Each secondary user is a cognitive transmitter......

M primary users/channels

C secondary users


Fig. 2.


Topology of the cognitive radio network.

2 We assume the random environment is stationary, however, the proposed

stochastic channel selection algorithm can track the optimal solution in slow
statistic-varying environment as well.

receiver pair which can adjust the transmission frequency

among 1, , M indexed channels. The transmission of the
primary users are protected by enforcing the secondary users to
exit immediately whenever the legitimate primary user comes
back. For simplicity, we assume that each secondary user has a
fixed transmission rate. The traffic generated at the transmitter
of the k-th secondary user is transmitted to the receiver in
consequent time slots. We denote the starting time of each
slot as Ti where i is the slot number. Each secondary user,
say k, keeps a probability vector Pk = [p1 , , pM ] and
an estimation vector Dk = [d1 , , dM ], where pi is the
probability of choosing channel i and di is the estimated
possibility of successful transmission by selecting channel i.
At the beginning of a slot, say m-th slot, the transmitter
picks a channel according to the probability vector Pk . If the
selected channel is busy, i.e., the primary user is present, the
same process repeats until an available channel is selected and
transmission begins. If the transmission is successful, user k
will set = 0 and = 1 otherwise. Based on the received
value of , the user k updates the probability vector Pk and
the estimation vector Dk . Then enters the next time slot Tm+1 .
The algorithm executed in the k-th secondary users is
provided in detail as follows.
R: The resolution parameter (tunable).
W : The initialization parameter (tunable).
M : The number of primary users, i.e., the number of
channel opportunities.
C: The number of secondary users.
H(n): The number of channels which have higher values
in the estimation vector D than the current selected
: The step size of adjusting the probability vector and
= R1 .
Si (n): The number of slots where the transmissions with
channel i are successful, up to Tn .
Ci (n): The number of slots where the channel i are
chosen to transmit packets, up to Tn .
- The user k sets P (0) = [p1 , , pM ] where pi = M
all 1 i M .
- Chooses an available channel according to P (0) then
transmits and records the values of s, until each channel
is selected W times.
- Depending on the values of s, i.e., whether the corresponding transmissions were successful, sets Si (0) and
Ci (0) for each channel i.
Si (0)
- Initializes D(0) = [d1 , , dM ] where di = C
i (0)
3 We

omit the subscript k in the following algorithm description.

- Selects an available transmission channel i according to

the probability vector P (n).
- Update the probability vector P (n) according to the
following equations.
pj (n + 1)

pj (n + 1)

pi (n + 1)

min pj (n) +

j if dj (n) > di (n).

max pj (n)
M H(n)
j if dj (n) < di (n)
pj (n + 1)





- After Tn , the secondary user adjusts the running estimation of the successful transmission possibility, i.e., D, by
Si (n + 1)
Ci (n + 1)


di (n + 1)

Si (n) + (1 )
Ci (n) + 1
Si (n + 1)
Ci (n + 1)


where = 0 if the transmission is successful and = 1


the uncertainty of the random environment comes solely from

the unpredictable returns of the primary users. We assume
that there are five primary users with distinct channels. The
traffic on the secondary user is transmitted over successive
time slots. The returning probability vector consists of
five probabilities, i.e., = [p1 , , p5 ] where pi denotes the
probability of returning of the i-th primary user in each slot.
Without loss of generality, we assume that the primary users
have short jobs and hence the durations are within one time
slot. The simulation parameters are summarized as follows.
R: The resolution parameter is set to R = 50.
W: The initialization parameter is set to W = 10.
M: There are five primary users, i.e., M = 5.
C: There is a single secondary user, i.e., C = 1.
: The step size is set to = R = 0.02.
B: The convergence threshold is set to B = 0.9999.
We assume that the returning probability vector is =
[0.2, 0.1, 0.3, 0.4, 0.3]. By inspection, we observe that the second primary user has the least possibility of returning in each
slot. Therefore, channel 2 is the optimal strategy in the sense
that the probability of successful transmissions is maximized.
The stochastic channel selection algorithm implemented in
the secondary user is expected to find this optimal strategy
asymptotically, which is verified in Figure 3.
M = 5, C = 1

- max(P (n)) > B where B is a predefined convergence




P r{|1 pz (n)| < } > 1 n > n0


where z is the index of the optimal channel in terms of

probability of successful transmissions.
In this section, we use a simple example to illustrate the
effectiveness of the stochastic channel selection algorithm
introduced in the previous section.
For simplicity, we consider a single-secondary-user cognitive radio network with consistent traffic load. In other words,



The channel selection algorithm introduced above falls into

the family of the Discrete Generalized Pursuit Algorithms
(DGPA) defined in [22] where the convergence speeds of various learning automata approaches are compared numerically.
It is shown that DGPA converges remarkably faster than other
LA-based algorithms. Moreover, the -optimality of DGPA is
proved in [22], based on the earlier results in [23]. Thus,
we provide the following theorem for the stochastic channel
selection algorithm without proof due to the page limit.
Theorem 1: The stochastic channel selection algorithm is
-optimal for any stationary cognitive radio networks. In other
words, for any arbitrarily small > 0 and > 0, there exists
a n0 satisfying

Channel 1
Channel 2
Channel 3
Channel 4
Channel 5




Fig. 3. The probability updating history of the stochastic channel selection

Figure 3 depicts the trajectories of the updating processes

of channel selection probabilities. At the initialization step,
each channel is assigned with an equal probability. We notice
that channel 3, 4, 5 are eliminated by the stochastic channel
selection algorithm soon. Channel 1 has been preserved by the
algorithm since p1 = 0.2 and it is the second-best candidate
to the optimal strategy, i.e., channel 2 where p2 = 0.1.
Interestingly, in a certain period of time, channel 1 outperforms
channel 2 dramatically, e.g., at around the 70-th iteration,
the probabilities of selecting channel 1 and 2 are 0.7 v.s.


0.3, as denoted in Figure 3. However, the stochastic channel

selection algorithm leans to the optimal strategy gradually and
the probability of selecting the optimal channel approaches 1
asymptotically, as expected.
We consider the design issues of optimal channel selection
for secondary users in cognitive radio networks. We claim
that the overwhelming cost in frequent channel-switchings,
in terms of aggravated delay and packet loss ratio, makes
the schemes with less consistent switchings more appealing
and favorable. We propose a stochastic channel selection
algorithm based on the learning automata techniques. This
algorithm asymptotically converges to the -optimal strategy,
in the sense that the probability of successful transmissions
is maximized. Our analysis shows the effectiveness of the
proposed algorithm.
[1] [Online]. Available:
[2] [Online]. Available:
[3] C. Cordeiro, K. Challapali, D. Birru, and S. S. N, Ieee 802.22: the
first worldwide wireless standard based on cognitive radios, First IEEE
International Symposium on New Frontiers in Dynamic Spectrum Access
Networks, DySPAN05, pp. 328 337, Nov. 2005.
[4] Proceedings of the first IEEE Symposium on New Frontiers in Dynamic
Spectrum Access Networks, Nov.2005.
[5] P. Kyasanur, J. So, C. Chereddi, and N. H. Vaidya, Multi-channel mesh
networks: Challenges and protocols, IEEE Wireless Communications,
Apr.2006, invited paper.
[6] J. Shi, T. Salonidis, and E. W.Knightly, Starvation mitigation through
multi-channel coordination in csma multi-hop wireless networks, Proceedings of the seventh ACM international symposium on Mobile ad hoc
networking and computing, MobiHoc06, pp. 214225, 2006.
[7] P. Kyasanur and N. H. Vaidya, Capacity of multichannel wireless
networks: Impact of channels, interfaces, and interface switching delay,
University of Illinois at Urbana-Champaign, Tech. Rep., Oct.2006.
[8] V. Bhandari and N. H.Vaidya, Connectivity and capacity of multichannel wireless networks with channel switching constraints, Proceedings of IEEE INFOCOM07.
[9] J. A. Patel, H. Luo, and I. Gupta, A cross-layer architecture to exploit
multi-channel diversity with a single transceiver, Proceedings of IEEE
[10] P. Kyasanur and N. H.Vaidya, Routing in multi-channel multi-interface
ad hoc wireless networks, University of Illinois at Urbana-Champaign,
Tech. Rep., 2004.
[11] C. Chereddi, System architecture for multichannel multi-interface
wireless networks, Masters thesis, University of Illinois at UrbanaChampaign, 2006.
[12] P. Li, N. Scalabrino, and Y. Fang, Channel switching cost in multichannel wireless mesh networks, 2007, to appear in Milcom07.
[13] Q. Zhao and B. M. Sadler, A survey of dynamic spectrum access: Signal
processing, networking, and regulatory policy, IEEE Signal Processing
Magazine, May 2007.
[14] I. F. Akyildiz, L. W.Y., V. M.C., and M. S., Next generation/dynamic
spectrum access/cognitive radio wireless networks: A survey, Computer
Networks Journal, (Elsevier), Sep. 2006.
[15] C. Unsal, Intelligent navigation of autonomous vehicles in an automated
highway system: Learning methods and interacting vehicles approach,
Ph.D. dissertation, Virginia Polytechnic Institute and State University,
1998, chapter 3 : Stochastic Learning Automata.
[16] Y. Xing and R. Chandramouli, Price dynamics in competitive agile
spectrum access markets, IEEE Journal on Selected Areas in Communications, Mar.2007.
[17] M. A. Haleem and R. Chandramouli, Adaptive downlink scheduling
and rate selection: A cross layer design, IEEE Journal on Selected
Areas in Communications, vol. 23, Jun. 2005.

[18] , Adaptive transmission rate assignment for fading wireless channels with pursuit learning algorithm, Proceedings of CISS, Princeton,
[19] S. Kiran and R. Chandramouli, An adaptive energy-efficient link
layer protocol using stochastic learning control, IEEE International
Conference on Communications (ICC), 2003.
[20] Y. Xing and R.Chandramouli, Distributed discrete power control for
bursty transmissions over wireless data networks, IEEE International
Conference on Communications (ICC), 2004.
[21] M. A. L. Thathachar and P. S. Sastry, Varieties of learning automata:
An overview, IEEE Transactions on systems, man, and cybernetics,
vol. 32, Dec.2002.
[22] M. Agache and B. Oommen, Generalized pursuit learning schemes:
New families of continuous and discretized learning automata, IEEE
Transactions on systems, man, and cybernetics, vol. 32, Dec.2002.
[23] B. J. Oommen and J. K. Lanctot, Discretized pursuit learning automata, IEEE Transactions on systems, man, and cybernetics, vol. 20,