Abstract—The problem of distributed learning and channel access is considered in a cognitive network with multiple secondary users. The availability statistics of the channels are initially unknown to the secondary users and are estimated using sensing decisions. There is no explicit information exchange or prior agreement among the secondary users, and sensing and access decisions are undertaken by them in a completely distributed manner. We propose policies for distributed learning and access which achieve order-optimal cognitive system throughput (number of successful secondary transmissions) under self play, i.e., when implemented at all the secondary users. Equivalently, our policies minimize the sum regret in distributed learning and access, which is the loss in secondary throughput due to learning and distributed access. For the scenario when the number of secondary users is known to the policy, we prove that the total regret is logarithmic in the number of transmission slots. This policy achieves order-optimal regret based on a logarithmic lower bound for regret under any uniformly-good learning and access policy. We then consider the case when the number of secondary users is fixed but unknown, and is estimated at each user through feedback. We propose a policy whose sum regret grows only slightly faster than logarithmic in the number of transmission slots.

Index Terms—Cognitive medium access control, multi-armed bandits, distributed algorithms, logarithmic regret.

I. INTRODUCTION

THERE has been extensive research on cognitive radio networks in the past decade to resolve many challenges not encountered previously in traditional communication networks (see, e.g., [2]). One of the main challenges is to achieve coexistence of heterogeneous users accessing the same part of the spectrum. In a hierarchical cognitive network, there are two classes of transmitting users, viz., the primary users, who have priority in accessing the spectrum, and the secondary users, who opportunistically transmit when the primary user is idle. The secondary users are cognitive and can sense the spectrum to detect the presence of a primary transmission. However, due to resource and hardware constraints, they can sense only a part of the spectrum at any given time.

We consider a slotted cognitive system where each secondary user can sense and access only one orthogonal channel in each transmission slot (see Fig. 1). Under sensing constraints, it is thus beneficial for the secondary users to select channels with higher mean availability, i.e., channels which are less likely to be occupied by the primary users. However, in practice, the channel availability statistics are unknown to the secondary users at the start of the transmissions.

Since the secondary users are required to sense the medium before transmission, can these sensing decisions be used to learn the channel availability statistics? If so, using these estimated channel availabilities, can we design channel access rules which maximize the transmission throughput? Designing provably efficient algorithms to accomplish the above goals forms the focus of our paper. Such algorithms need to be efficient, both in terms of learning and channel access.

For any learning algorithm, there are two important performance criteria: consistency and regret bounds [3]. A learning algorithm is said to be consistent if the learnt estimates converge to the true values as the number of samples goes to infinity. The regret¹ of a learning algorithm is a measure of the speed of convergence, and is thus meaningful only for consistent algorithms. In our context, the users need to learn the channel availability statistics consistently in a distributed manner. The regret in this case is defined as the loss in secondary throughput due to learning when compared to the ideal scenario where the channel statistics are known perfectly (see (2)). It is thus desirable to design distributed learning algorithms with small regret.

Additionally, we consider a distributed framework where there is no information exchange or prior agreement among the secondary users. This introduces additional challenges: it results in loss of throughput due to collisions among the secondary users, and there is now competition among the secondary users, since they all tend to access channels with higher availabilities. It is imperative that the channel access policies overcome the above challenges. Hence, a distributed learning and access policy experiences regret both due to learning the unknown channel availabilities, as well as due to collisions under distributed access.
Manuscript received 1 December 2009; revised 4 June 2010. During the course of this work, the first author was supported by MURI through AFOSR Grant FA9550-06-1-0324. The second and the third authors are supported in part through NSF grant CCF-0835706. Parts of this paper were presented at [1].

A. Anandkumar is with the Center for Pervasive Communications and Computing at the Department of Electrical Engineering and Computer Science, University of California, Irvine, CA, 92697, USA (e-mail: a.anandkumar@uci.edu).

N. Michael and A. K. Tang are with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA (e-mail: nm373@,atang@ece.cornell.edu).

A. Swami is with the Army Research Laboratory, Adelphi, MD 20783, USA (e-mail: a.swami@ieee.org).

Digital Object Identifier 10.1109/JSAC.2011.110406.

¹ Note that the sum regret is a finer measure of performance of a distributed algorithm than the time-averaged total throughput of the users, since any sub-linear regret (with respect to time) implies optimal average throughput. On the other hand, regret is equivalent to total throughput and hence, optimal regret is equivalent to achieving optimal total throughput.

0733-8716/11/$25.00 © 2011 IEEE
732 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 29, NO. 4, APRIL 2011
Recently, the work in [21] considers combinatorial bandits, where a more general model of different (unknown) channel availabilities is assumed for different secondary users, and a matching algorithm is proposed for jointly allocating users to channels. The algorithm is guaranteed to have logarithmic regret with respect to the number of transmission slots and polynomial storage requirements. A decentralized implementation of the proposed algorithm is also proposed, but it still requires information exchange and coordination among the users. In contrast, we propose algorithms which remove this requirement, albeit in a more restrictive setting of identical channel availabilities for all users.

Recently, Liu and Zhao [4] proposed a family of distributed learning and access policies known as time-division fair share (TDFS), and proved logarithmic regret for these policies. They established a lower bound on the growth rate of system regret for a general class of uniformly-good decentralized policies. The TDFS policies in [4] can incorporate any order-optimal single-player policy, while our work here is based on the single-user policy proposed in [11]. Another difference is that in [4], the users orthogonalize via settling at different offsets in their time-sharing schedule, while in our work here, users orthogonalize into different channels. Moreover, the TDFS policies ensure that each player achieves the same time-average reward, while our policies here achieve probabilistic fairness, in the sense that the policies do not discriminate between different users. In [22], the TDFS policies are extended to incorporate imperfect sensing.

In our prior work [1], we first formulated the problem of decentralized learning and access for multiple secondary users. We considered two scenarios: one where there is initial common information among the secondary users in the form of pre-allocated ranks, and the other where no such information is available. In this paper, we analyze the distributed policy in detail and prove that it has logarithmic regret. In addition, we also consider the case when the number of secondary users is unknown, and provide bounds on regret in this scenario.

Organization: Section II deals with the system model. Section III deals with the special case of a single secondary user and of multiple users with centralized access, which can be directly solved using the classical results on multi-armed bandits. In Section IV, we propose a distributed learning and access policy with provably logarithmic regret when the number of secondary users is known. Section V considers the scenario when the number of secondary users is unknown. Section VI provides a lower bound for distributed learning. Section VII includes some simulation results for the proposed schemes, and Section VIII concludes the paper. Most of the proofs are found in the Appendix.

Since Section III mostly deals with a recap of the classical results on multi-armed bandits, those familiar with the literature (e.g., [8]–[11]) may skip this section without loss of continuity.

II. SYSTEM MODEL & FORMULATION

Notation: For any two functions f(n), g(n): f(n) = O(g(n)) if there exists a constant c such that f(n) ≤ c g(n) for all n ≥ n₀ for a fixed n₀ ∈ ℕ. Similarly, f(n) = Ω(g(n)) if there exists a constant c′ such that f(n) ≥ c′ g(n) for all n ≥ n₀ for a fixed n₀ ∈ ℕ, and f(n) = Θ(g(n)) if f(n) = Ω(g(n)) and f(n) = O(g(n)). Also, f(n) = o(g(n)) when f(n)/g(n) → 0, and f(n) = ω(g(n)) when f(n)/g(n) → ∞, as n → ∞.

For a vector µ, let |µ| denote its cardinality, and let µ = [μ_1, μ_2, . . . , μ_{|µ|}]^T. We refer to the U highest entries in a vector µ as the U-best channels and the rest as the U-worst channels, where 1 ≤ U ≤ |µ|. Let σ(a; µ) denote the index of the a-th highest entry in µ. Alternatively, we abbreviate a∗ := σ(a; µ) for ease of notation. Let B(μ) denote the Bernoulli distribution with mean μ. With abuse of notation, let D(μ_1, μ_2) := D(B(μ_1); B(μ_2)) be the Kullback-Leibler distance between the Bernoulli distributions B(μ_1) and B(μ_2) [23], and let Δ(1, 2) := μ_1 − μ_2.

A. Sensing & Channel Models

Let U ≥ 1 be the number of secondary users³ and C ≥ U be the number⁴ of orthogonal channels available for slotted transmissions with a fixed slot width. In each channel i and slot k, the primary user transmits i.i.d. with probability 1 − μ_i > 0. In other words, let W_i(k) denote the indicator variable of whether the channel is free:

W_i(k) = 0 if channel i is occupied in slot k, and W_i(k) = 1 otherwise,

and we assume that W_i(k) ∼ B(μ_i) i.i.d.

The mean availability vector µ consists of the mean availabilities μ_i of all channels, i.e., µ := [μ_1, . . . , μ_C], where all μ_i ∈ (0, 1) and are distinct. µ is initially unknown to all the secondary users and is learnt independently over time using the past sensing decisions, without any information exchange among the users. We assume that channel sensing is perfect at all the users.

Let T_{i,j}(k) denote the number of slots where channel i is sensed in k slots by user j (not necessarily being the sole user to sense that channel). Note that Σ_{i=1}^{C} T_{i,j}(k) = k for all users j, since each user senses exactly one channel in every time slot. The sensing variables are obtained as follows: at the beginning of the k-th slot, each secondary user j ∈ U selects exactly one channel i ∈ C for sensing, and hence obtains the value of W_i(k), indicating if the channel is free. User j then records all the sensing decisions of each channel i in a vector X^k_{i,j} := [X_{i,j}(1), . . . , X_{i,j}(T_{i,j}(k))]^T. Hence, ∪_{i=1}^{C} X^k_{i,j} is the collection of sensed decisions for user j in k slots for all the C channels.

We assume the collision model, under which if two or more users transmit in the same channel, then none of the transmissions go through. At the end of each slot k, each user j receives an acknowledgement Z_j(k) on whether its transmission in the k-th slot was received. Hence, in general, any policy employed by user j in the (k + 1)-th slot, denoted by ρ(∪_{i=1}^{C} X^k_{i,j}, Z^k_j), is based on all the previous sensing and feedback results.

³ A user refers to a secondary user unless otherwise mentioned.

⁴ When U ≥ C, learning the availability statistics is less crucial, since all channels need to be accessed to avoid collisions. In this case, the design of medium access is more crucial.
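As a concrete, purely illustrative rendering of the sensing and collision model above, the short Python sketch below draws W_i(k) ∼ B(μ_i) for each channel and applies the collision rule; the function `simulate_slot` and its argument names are our own, not from the paper.

```python
import random

def simulate_slot(mu, choices, rng=random):
    """One slot of the model in Section II-A: W_i(k) ~ B(mu_i) per channel;
    under the collision model a transmission succeeds only when its channel
    is free and it is the sole transmission on that channel.
    choices[j] is the channel sensed (and, if free, accessed) by user j."""
    free = [rng.random() < m for m in mu]          # channel availabilities
    successes = []
    for i in choices:
        sole = sum(c == i for c in choices) == 1   # sole user on channel i?
        successes.append(1 if (free[i] and sole) else 0)
    return successes
```

For instance, two users sensing the same free channel both fail, while a sole user on a free channel succeeds.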
Algorithm 1 Single User Policy ρ_1(g(n)) in [10].
Input: {X̄_i(n)}_{i=1,...,C}: sample-mean availabilities after n rounds, g(i; n): statistic based on X̄_i(n),
σ(a; g(n)): index of the a-th highest entry in g(n).
Init: Sense in each channel once, n ← C, Curr_Sel ← C.
Loop: n ← n + 1
Curr_Sel ← channel corresponding to the highest entry in g(n). Sense channel Curr_Sel. If free, transmit.

B. Regret of a Policy

Under the above model, we are interested in designing policies ρ which maximize the expected number of successful transmissions of the secondary users, subject to the non-interference constraint for the primary users. Let S(n; µ, U, ρ) be the expected total number of successful transmissions after n slots under U secondary users and policy ρ. In the ideal scenario where the availability statistics µ are known a priori and a central agent orthogonally allocates the secondary users to the U-best channels, the expected number of successful transmissions after n slots is given by

S∗(n; µ, U) := n Σ_{j=1}^{U} μ(j∗),   (1)

where j∗ = σ(j; µ), so that μ(j∗) is the j-th highest entry in µ.

It is clear that S∗(n; µ, U) > S(n; µ, U, ρ) for any policy ρ and any finite n ∈ ℕ. We are interested in minimizing the regret in learning and access, given by

R(n; µ, U, ρ) := S∗(n; µ, U) − S(n; µ, U, ρ) > 0.   (2)

We are interested in minimizing regret under any given µ ∈ (0, 1)^C with distinct elements.

By incorporating the collision channel model assumption with no avoidance mechanisms⁵, the expected throughput under the policy ρ is given by

S(n; µ, U, ρ) = Σ_{i=1}^{C} Σ_{j=1}^{U} μ(i) E[V_{i,j}(n)],

where V_{i,j}(n) is the number of times in n slots where user j is the sole user to sense channel i. Hence, the regret in (2) simplifies to

R(n; ρ) = n Σ_{k=1}^{U} μ(k∗) − Σ_{i=1}^{C} Σ_{j=1}^{U} μ(i) E[V_{i,j}(n)].   (3)

III. SPECIAL CASES FROM KNOWN RESULTS

We recap the bounds for the regret under the special cases of a single secondary user (U = 1) and of multiple users with centralized learning and access, by appealing to the classical results on the multi-armed bandit process [8]–[10].

A. Single Secondary User (U = 1)

When there is only one secondary user, the problem of finding policies with minimum regret reduces to that of a multi-armed bandit process. Lai and Robbins [8] first analyzed schemes for multi-armed bandits with asymptotic logarithmic regret based on upper confidence bounds on the unknown channel availabilities. Since then, simpler schemes have been proposed in [10], [11], which compute a statistic or an index for each arm (channel), henceforth referred to as the g-statistic, based only on its sample mean and the number of slots where the particular arm is sensed. The arm with the highest index is selected in each slot in these works. We summarize the policy in Algorithm 1 and denote it ρ_1(g(n)), where g(n) is the vector of scores assigned to the channels after n transmission slots.

The sample-mean based policy in [11, Thm. 1] proposes an index for each channel i and user j at time n, given by

g_j^MEAN(i; n) := X̄_{i,j}(T_{i,j}(n)) + √( 2 log n / T_{i,j}(n) ),   (4)

where T_{i,j}(n) is the number of slots where user j selects channel i for sensing, and

X̄_{i,j}(T_{i,j}(n)) := (1 / T_{i,j}(n)) Σ_{k=1}^{T_{i,j}(n)} X_{i,j}(k)

is the sample-mean availability of channel i, as sensed by user j.

The statistic in (4) captures the exploration-exploitation tradeoff between sensing the channel with the best predicted availability to maximize immediate throughput, and sensing different channels to obtain improved estimates of their availabilities. The sample-mean term in (4) corresponds to exploitation, while the other term involving T_{i,j}(n) corresponds to exploration, since it penalizes channels which are not sensed often. The normalization of the exploration term with log n in (4) implies that the term is significant when T_{i,j}(n) is much smaller than log n. On the other hand, if all the channels are sensed Θ(log n) number of times, the exploration terms become unimportant in the g-statistics of the channels and the exploitation term dominates, thereby favoring sensing of the channel with the highest sample mean.

The regret based on the above statistic in (4) is logarithmic for any finite number of slots n, but does not have the optimal scaling constant. The sample-mean based statistic in [10, Example 5.7] leads to the optimal scaling constant for regret and is given by

g_j^OPT(i; n) := X̄_{i,j}(T_{i,j}(n)) + min( √( log n / (2 T_{i,j}(n)) ), 1 ).   (5)

In this paper, we design policies based on the g^MEAN statistic, since it is simpler to analyze than the g^OPT statistic. This is because g^MEAN is a continuous function of T_{i,j}(n), while g^OPT has a threshold on T_{i,j}(n).

We now recap the results which show logarithmic regret in learning the best channel. In this context, we define uniformly good policies ρ [8] as those with regret

R(n; µ, U, ρ) = o(n^α), ∀ α > 0, µ ∈ (0, 1)^C.   (6)

⁵ If the users employ CSMA-CA to avoid collisions, then the regret is reduced. The bounds derived in this paper are applicable to this case as well.
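The single-user policy ρ_1(g^MEAN) recapped above admits a compact simulation. The sketch below is our own illustration (the name `rho1_gmean` is hypothetical, and the i.i.d. Bernoulli channel is simulated internally): each channel is sensed once, and thereafter the channel maximizing the index in (4) is sensed.

```python
import math
import random

def rho1_gmean(mu, n, rng=random):
    """Illustrative simulation of the single-user policy rho_1(g^MEAN):
    sense the channel maximizing  X_bar_i + sqrt(2 log n / T_i)  (eq. (4))."""
    C = len(mu)
    T = [0] * C      # T_i(n): number of slots channel i was sensed
    S = [0] * C      # number of slots channel i was observed free
    successes = 0
    for t in range(1, n + 1):
        if t <= C:
            i = t - 1        # Init: sense each channel once
        else:
            i = max(range(C),
                    key=lambda c: S[c] / T[c] + math.sqrt(2 * math.log(t) / T[c]))
        w = rng.random() < mu[i]     # W_i(t) ~ B(mu_i)
        T[i] += 1
        S[i] += int(w)
        successes += int(w)          # sole user: every free sensed slot succeeds
    return successes, T
```

Consistent with the discussion of (4), the exploration term forces each suboptimal channel to be sensed on the order of log n times, so the sensing counts concentrate on the channel with the highest mean availability.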
ANANDKUMAR et al.: DISTRIBUTED ALGORITHMS FOR LEARNING AND COGNITIVE MEDIUM ACCESS WITH LOGARITHMIC REGRET 735
Theorem 1 (Logarithmic Regret for U = 1 [10], [11]): For any uniformly good policy ρ satisfying (6), the expected time spent in any suboptimal channel i ≠ 1∗ satisfies, for any ε > 0,

lim_{n→∞} P[ T_{i,1}(n) ≥ (1 − ε) log n / D(μ_i, μ_{1∗}) ; µ ] = 1,   (7)

where 1∗ is the channel with the best availability. Hence, the regret satisfies

lim inf_{n→∞} R(n; µ, 1, ρ) / log n ≥ Σ_{i∈1-worst} Δ(1∗, i) / D(μ_i, μ_{1∗}).   (8)

The regret under the g^OPT statistic in (5) achieves the above bound:

lim_{n→∞} R(n; µ, 1, ρ_1(g_j^OPT)) / log n = Σ_{i∈1-worst} Δ(1∗, i) / D(μ_i, μ_{1∗}).   (9)

The regret under the g^MEAN statistic in (4) satisfies

R(n; µ, 1, ρ_1(g_j^MEAN)) ≤ Σ_{i∈1-worst} Δ(1∗, i) [ 8 log n / Δ(1∗, i)² + 1 + π²/3 ].

Algorithm 2 Centralized Learning Policy ρ^CENT in [9].
Input: X^n := ∪_{j=1}^{U} ∪_{i=1}^{C} X^n_{i,j}: channel availability observations after n slots, g(n): statistic based on X^n,
σ(a; g(n)): index of the a-th highest entry in g(n).
Init: Sense in each channel once, n ← C.
Loop: n ← n + 1
Curr_Sel ← channels with the U-best entries in g(n). If free, transmit.

Notice that for both the single-user case U = 1 (Theorem 1) and the centralized multi-user case (Theorem 2), the number of time slots spent in the U-worst channels is Θ(log n), and hence, the regret is also Θ(log n).

IV. MAIN RESULTS

Armed with the classical results on multi-armed bandits, we now design distributed learning and allocation policies.
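As a quick numerical illustration of the constants in the classical bounds recapped above, the Kullback-Leibler distance D(μ_1, μ_2) between Bernoulli distributions and the lower-bound constant Σ_{i∈1-worst} Δ(1∗, i)/D(μ_i, μ_{1∗}) from (8) can be computed directly. This is a sketch under the paper's assumption μ_i ∈ (0, 1); the function names are ours.

```python
import math

def kl_bernoulli(p, q):
    """D(B(p); B(q)) for p, q in (0, 1): the Kullback-Leibler distance
    between Bernoulli distributions used in the bounds (7)-(9)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lower_bound_constant(mu):
    """Constant multiplying log n in the single-user lower bound (8):
    sum over the 1-worst channels of Delta(1*, i) / D(mu_i, mu_1*)."""
    best = max(mu)
    return sum((best - m) / kl_bernoulli(m, best) for m in mu if m != best)
```

For the hypothetical availability vector µ = [0.9, 0.5, 0.2], the constant evaluates to roughly 1.3, so any uniformly good single-user policy must asymptotically incur regret of at least about 1.3 log n.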
Algorithm 3 Policy ρ^RAND(U, C, g_j(n)) for each user j under U users, C channels and statistic g_j(n).
Input: {X̄_{i,j}(n)}_{i=1,...,C}: sample-mean availabilities at user j after n rounds, g_j(i; n): statistic based on X̄_{i,j}(n),
σ(a; g_j(n)): index of the a-th highest entry in g_j(n),
ζ_j(i; n): indicator of collision at the n-th slot at channel i.
Init: Sense in each channel once, n ← C, Curr_Rank ← 1, ζ_j(i; n) ← 0.
Loop: n ← n + 1
if ζ_j(Curr_Sel; n − 1) = 1 then
Draw a new Curr_Rank ∼ Unif(U)
end if
Curr_Sel ← σ(Curr_Rank; g_j(n)). Sense channel Curr_Sel. If free, transmit.
If collision, ζ_j(Curr_Sel; n) ← 1, else 0.

B. ρ^RAND: Distributed Learning and Access

We present the ρ^RAND policy in Algorithm 3. Before describing this policy, we make some simple observations. If each user implemented the single-user policy in Algorithm 1, then it would result in collisions, since all the users target the best channel. When there are multiple users and there is no direct communication among them, the users need to randomize channel access in order to avoid collisions. At the same time, accessing the U-worst channels needs to be avoided, since they contribute to regret. Hence, users can avoid collisions by randomizing access over the U-best channels, based on their estimates of the channel ranks. However, if the users randomize in every slot, there is a finite probability of collisions in every slot, and this results in a linear growth of regret with the number of time slots. Hence, the users need to converge to a collision-free configuration to ensure that the regret is logarithmic.

In Algorithm 3, there is adaptive randomization based on feedback regarding the previous transmission. Each user randomizes only if there is a collision in the previous slot; otherwise, the previously generated random rank for the user is retained. The estimation of the channel ranks is through the g-statistic, along the lines of the single-user case.

C. Regret Bounds under ρ^RAND

It is easy to see that the ρ^RAND policy ensures that the users are allocated orthogonally to the U-best channels as the number of transmission slots goes to infinity. The regret bounds on ρ^RAND are, however, not immediately clear, and we provide guarantees below.

We first provide a logarithmic upper bound⁶ on the number of slots spent by each user in any U-worst channel. Hence, the first term in the bound on regret in (15) is also logarithmic.

Lemma 1 (Time Spent in U-worst Channels): Under the ρ^RAND scheme in Algorithm 3, the total time spent by any user j = 1, . . . , U, in any i ∈ U-worst channel satisfies

E[T_{i,j}(n)] ≤ Σ_{k=1}^{U} [ 8 log n / Δ(i, k∗)² + 1 + π²/3 ].   (16)

Proof: The proof is along the lines of the proof of Theorem 2, given in Appendix A. The only difference is that here, we need to consider the probability of transmission of each user in each U-worst channel, while in Appendix A, the probability of transmission by all users is considered. □

We now focus on analyzing the number of collisions M(n) in the U-best channels. We first give a result on the expected number of collisions in the ideal scenario where each user has perfect knowledge of the channel availability statistics µ. In this case, the users attempt to reach an orthogonal (collision-free) configuration by uniformly randomizing over the U-best channels.

The stochastic process in this case is a finite-state Markov chain. A state in this Markov chain corresponds to a configuration of U number of (identical) users in U number of channels. The number of states in the Markov chain is the number of compositions of U, given by (2U−1 choose U) [24, Thm. 5.1]. The orthogonal configuration corresponds to the absorbing state. There are two other classes of states in the Markov chain. One class consists of states in which each channel either has more than one user or no user at all. The second class consists of the remaining states, where some channels may have just one user. For the first class, the transition probability to any state of the Markov chain (including self-transition and absorption probabilities) is uniform. For a state in the second class, where certain channels have exactly one user, there are only transitions to states which have identical configurations of the single users, and the transition probabilities are uniform.

Let Υ(U, U) denote the maximum time to absorption in the above Markov chain, starting from any initial distribution. We have the following result.

Lemma 2 (No. of Collisions Under Perfect Knowledge): The expected number of collisions under the ρ^RAND scheme in Algorithm 3, assuming that each user has perfect knowledge of the mean channel availabilities µ, satisfies

E[M(n); ρ^RAND(U, C, µ)] ≤ U E[Υ(U, U)] ≤ U [ (2U−1 choose U) − 1 ].   (17)

Proof: See Appendix C. □

The above result states that there is at most a finite number of expected collisions, bounded by U E[Υ(U, U)], under perfect knowledge of µ. In contrast, recall from the previous section that there are no collisions under perfect knowledge of µ in the presence of pre-allocated ranks. Hence, U E[Υ(U, U)] represents a bound on the additional regret due to the lack of direct communication among the users to negotiate their ranks.

We use the result of Lemma 2 for analyzing the number of collisions under distributed learning of the unknown availabilities µ as follows: if we show that the users are able to learn the correct order of the different channels with only logarithmic regret, then only an additional finite expected number of collisions occurs before reaching an orthogonal configuration.

⁶ Note that the bound on E[T_{i,j}(n)] in (16) holds for user j even if the other users are using a policy other than ρ^RAND. On the other hand, to analyze the number of collisions E[M(n)] in (19), we need every user to implement ρ^RAND.
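The rank-randomization mechanism of ρ^RAND (Algorithm 3) can be sketched in a few lines of Python. This is our own illustrative simulation, not the authors' code: each user maintains its own g^MEAN indices from (4), targets the channel at its current random rank, and redraws the rank only after a collision.

```python
import math
import random

def rho_rand(mu, U, n, rng=random):
    """Illustrative simulation of rho^RAND (Algorithm 3): each user keeps
    its own g^MEAN indices, senses the channel ranked at its current random
    rank in {1, ..., U}, and redraws the rank only after a collision."""
    C = len(mu)
    T = [[0] * C for _ in range(U)]   # T_{i,j}: per-user sensing counts
    S = [[0] * C for _ in range(U)]   # per-user counts of "channel free"
    rank = [1] * U                    # Curr_Rank for each user
    collisions = 0
    for t in range(1, n + 1):
        if t <= C:
            choices = [t - 1] * U     # Init: sense each channel once
        else:
            choices = []
            for j in range(U):
                g = [S[j][c] / T[j][c] + math.sqrt(2 * math.log(t) / T[j][c])
                     for c in range(C)]
                order = sorted(range(C), key=lambda c: -g[c])  # estimated ranks
                choices.append(order[rank[j] - 1])  # sigma(Curr_Rank; g_j)
        free = [rng.random() < mu[c] for c in range(C)]   # W_c(t) ~ B(mu_c)
        for j, c in enumerate(choices):
            T[j][c] += 1
            S[j][c] += int(free[c])
            if free[c] and sum(x == c for x in choices) > 1:
                collisions += 1                 # transmission collided
                rank[j] = rng.randint(1, U)     # redraw Curr_Rank ~ Unif(U)
    return collisions
```

With U = 1 the policy degenerates to ρ_1(g^MEAN) and no collisions ever occur; with several users, collisions taper off as the users settle into distinct ranks, in line with the logarithmic collision bound proved below in this section.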
Define T(n; ρ^RAND) as the number of slots where any one of the top-U estimated ranks of the channels at some user is wrong under the ρ^RAND policy. Below, we prove that its expected value is logarithmic in the number of transmissions.

Lemma 3 (Wrong Order of g-statistics): Under the ρ^RAND scheme in Algorithm 3,

E[T(n; ρ^RAND)] ≤ U Σ_{a=1}^{U} Σ_{b=a+1}^{C} [ 8 log n / Δ(a∗, b∗)² + 1 + π²/3 ].   (18)

Proof: See Appendix D. □

We now provide an upper bound on the number of collisions M(n) in the U-best channels by incorporating the above result on E[T(n)], the result on the average number of slots E[T_{i,j}(n)] spent in the i ∈ U-worst channels in Lemma 1, and the average number of collisions U E[Υ(U, U)] under perfect knowledge of µ in Lemma 2.

Theorem 3 (Logarithmic Number of Collisions Under ρ^RAND): The expected number of collisions in the U-best channels under the ρ^RAND(U, C, g^MEAN) scheme satisfies

E[M(n)] ≤ U (E[Υ(U, U)] + 1) E[T(n)].   (19)

Hence, from (16), (17) and (18), M(n) = O(log n).

Proof: See Appendix E. □

Hence, there are only a logarithmic number of expected collisions before the users settle in the orthogonal channels. Combining this result with Lemma 1, that the number of slots spent in the U-worst channels is also logarithmic, we immediately have one of the main results of this paper: the sum regret under distributed learning and access is logarithmic.

Theorem 4 (Logarithmic Regret Under ρ^RAND): The policy ρ^RAND(U, C, g^MEAN) in Algorithm 3 has Θ(log n) regret.

Proof: Substituting (19) and (16) in (15). □

Hence, we prove that distributed learning and channel access among multiple secondary users is possible with logarithmic regret, without any explicit communication among the users. This implies that the number of lost opportunities for successful transmissions at all secondary users is only logarithmic in the number of transmissions, which is negligible when there are a large number of transmissions.

We have so far focused on designing schemes that maximize the system or social throughput. We now briefly discuss fairness for an individual user under ρ^RAND. ρ^RAND does not discriminate between the users: each user has an equal probability of "settling" down in one of the U-best channels, while experiencing only logarithmic regret in doing so. Simulations in Section VII (in Fig. 4) demonstrate this phenomenon.

We also note that the bound on regret under ρ^RAND grows rapidly with the number of users U, due to the bound in (17). This is due to uniform channel access by the users without any coordination or information exchange. It is of interest to explore better distributed access schemes which, when combined with learning algorithms, yield low regret in the number of users.

Algorithm 4 Policy ρ^EST(n, C, g_j(m), ξ) for each user j under n transmission slots (horizon length), C channels, statistic g_j(m) and threshold function ξ.
1) Input: {X̄_{i,j}(n)}_{i=1,...,C}: sample-mean availabilities at user j, g_j(i; n): statistic based on X̄_{i,j}(n),
σ(a; g_j(n)): index of the a-th highest entry in g_j(n),
ζ_j(i; n): indicator of collision at the n-th slot at channel i,
Û: current estimate of the number of users,
n: horizon (total number of slots for transmission),
ξ(n; k): threshold functions for k = 1, . . . , C.
2) Init: Sense each channel once, m ← C, Curr_Sel ← C, Curr_Rank ← 1, Û ← 1, ζ_j(i; m) ← 0 for all i = 1, . . . , C.
3) Loop 4 to 6: m ← m + 1, stop when m = n or Û = C.
4) If ζ_j(Curr_Sel; m − 1) = 1 then draw a new Curr_Rank ∼ Unif(Û). end if
Curr_Sel ← σ(Curr_Rank; g_j(m)). Sense channel Curr_Sel. If free, transmit.
5) ζ_j(Curr_Sel; m) ← 1 if collision, 0 otherwise.
6) If Σ_{a=1}^{m} Σ_{k=1}^{Û} ζ_j(σ(k; g_j(m)); a) > ξ(n; Û) then
Û ← Û + 1, ζ_j(i; a) ← 0 for i = 1, . . . , C, a = 1, . . . , m.
end if

V. DISTRIBUTED LEARNING AND ACCESS UNDER UNKNOWN NUMBER OF USERS

We have so far assumed that the number of secondary users is known, and is required for the implementation of the ρ^RAND policy. In practice, this entails an initial announcement from each of the secondary users to indicate their presence in the cognitive network. However, in a truly distributed setting without any information exchange among the users, such an announcement may not be possible.

In this section, we consider the scenario where the number of users U is unknown (but fixed throughout the duration of transmissions, and U ≤ C, the number of channels). In this case, the policy needs to estimate the number of secondary users in the system, in addition to learning the channel availability statistics and designing channel access rules based on collision feedback. Note that if the policy assumed the worst-case scenario that U = C, then the regret grows linearly, since there is a positive probability that the U-worst channels are selected for sensing in any time slot.

A. Description of the ρ^EST Policy

We now propose a policy ρ^EST in Algorithm 4. This policy incorporates two broad aspects in each transmission slot, viz., execution of the ρ^RAND policy in Algorithm 3, based on the current estimate Û of the number of users, and updating of the estimate Û based on the number of collisions experienced by the user.

The updating is based on the idea that if there is under-estimation of U at all the users (Û_j < U at all the users j), collisions necessarily build up, and the collision count serves as a criterion for incrementing Û. This is because, after a long learning period, the users learn the true ranks of the channels, and target the same set of channels. However, when there is
under-estimation, the number of users exceeds the number of thresholds ξ(n) to test against the collision counts obtained
channels targeted by the users. Hence, collisions among the through feedback.
users accumulate, and can be used as a test for incrementing Conditional Regret: We now give the result for the first
.
U part. Define the “good event” C(n; U ) that none of the users
Denote the collision count used by the ρEST policy as over-estimates U under ρEST as
m
k
U
Φk,j (m) := ζj (σ(b; gj (m)); a). (20) C(n; U ):={ EST (n) ≤ U }.
U (22)
j
a=1 b=1 j=1
which is the total number of collisions experienced by user j
so far (till the mth transmission slot) in the top Uj -channels, The regret conditioned on C(n; U ), denoted by
where the ranks of the channels are estimated using the g- R(n; µ, U, ρEST )|C(n; U ), is given by
statistics. The collision count is tested against a threshold
j ), which is a function of the horizon length7 and
U
C
U
ξ(n; U n ∗
μ(k ) − μ(i)E[Vi,j (n)|C(n; U )],
current estimate U j . When the threshold is exceeded, U j is k=1 i=1 j=1
incremented, and the collision samples collected so far are
discarded (by setting them to zero) (line 6 in Algorithm 4). where Vi,j (n) is the number of times that user j is the sole
The choice of the threshold for incrementing U j is critical; if user of channel i. Similarly, we have conditional expectations
it is too small it can result in over-estimation of the number of E[Ti,j (n)|C(n; U )] and of the number of collisions in U -
of users. On the other hand, if it is too large, it can result in best channels, given by E[M (n)|C(n; U )]. We now show that
slow learning and large regret. Threshold selection is studied the regret conditioned on C(n; U ) is O(max(ξ ∗ (n; U ), log n)).
in detail in the subsequent section. Lemma 4: (Conditional Regret): When all the U secondary
users implement the ρEST policy, we have for all i ∈ U -worst
channel and each user j = 1, . . . , U ,
B. Regret Bounds under ρEST
We analyze regret bounds under the ρEST policy, where the U
8 log n π2
regret is defined in (3). Let the maximum threshold function E[Ti,j (n)|C(n)] ≤ +1+ . (23)
Δ(i, k ∗ )2 3
for the number of consecutive collisions under the ρEST policy k=1
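The increment rule described above can be sketched in a few lines. This is an illustrative sketch rather than the paper's Algorithm 4: the threshold form in `xi` and the function names are our own; the analysis in this section only requires a threshold that grows as ω(log n) in the horizon n.

```python
import math

def xi(n, u_hat):
    # Illustrative threshold xi(n; u_hat): any choice that is
    # omega(log n) in the horizon n suffices for the analysis; the
    # dependence on the current estimate u_hat is a design choice.
    return (u_hat + 1) * math.log(n) ** 2

def update_estimate(phi, u_hat, n):
    # One test of the (hypothetical) rho_EST increment rule: if the
    # collision count phi in the top-u_hat channels exceeds the
    # threshold, increment the estimate u_hat of the number of users
    # and discard the collision samples collected so far.
    if phi > xi(n, u_hat):
        return 0, u_hat + 1
    return phi, u_hat
```

A user would apply `update_estimate` to its running collision count each slot; too small a threshold over-estimates the number of users, too large a threshold slows learning, exactly the trade-off noted above.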
Lemma 5: Let T(n) denote the total time for which the estimate of the ranks of the channels at any user is wrong under the ρEST policy. Then T(n) satisfies

    E[T(n)] ≤ U Σ_{a=1}^{U} Σ_{b=a+1}^{C} [ 8 log n / Δ(a*, b*)² + 1 + π²/3 ].    (26)

Proof: The proof is on the lines of Lemma 3. □

Hence, the regret under ρEST is O(f(n) log n) for any function f(n) = ω(1), while it is O(log n) under the ρRAND policy. We are thus able to quantify the degradation of performance when the number of users is unknown.

VI. LOWER BOUND & EFFECT OF NUMBER OF USERS
Recall the definition of Υ(U, U) in the previous section as the maximum time to absorption, starting from any initial distribution, of the finite-state Markov chain whose states correspond to the different user configurations and whose absorbing state corresponds to the collision-free configuration. We now generalize this definition to Υ(U, k), the time to absorption in a new Markov chain whose state space is the set of configurations of U users in k channels and whose transition probabilities are defined on similar lines. Note that Υ(U, k) is almost-surely finite when k ≥ U and infinite otherwise (since there is no absorbing state in the latter case).

We now bound the maximum value of the collision count Φ_{k,j}(m) under the ρEST policy in (20) using T(m), the total time spent with wrong channel estimates, and Υ(U, k), the time to absorption in the Markov chain. Let ≤_st denote the stochastic order between two random variables [25].

Proposition 2: The maximum collision count in (20) over all users under the ρEST policy satisfies

    max_{j=1,…,U} Φ_{k,j}(m) ≤_st (T(m) + 1) Υ(U, k),  ∀m ∈ ℕ.    (27)

Proof: The proof is on the lines of Theorem 3. See Appendix G. □

We now prove that the probability of over-estimation goes to zero asymptotically.

Lemma 6 (No Over-estimation Under ρEST): For a threshold function ξ satisfying (25), the event C(n; U) in (22) satisfies

    lim_{n→∞} P[C(n; U)] = 1.    (28)

A. Lower Bound for Distributed Learning & Access

We have so far designed distributed learning and access policies with provable bounds on regret. We now discuss the relative performance of these policies compared to the optimal learning and access policies. This is accomplished by noting a lower bound on regret for any uniformly-good policy, first derived in [4] for a general class of uniformly-good time-division policies. We restate the result below.

Theorem 6 (Lower Bound [4]): For any uniformly-good distributed learning and access policy ρ, the sum regret in (2) satisfies

    lim inf_{n→∞} R(n; µ, U, ρ) / log n ≥ Σ_{j=1}^{U} Σ_{i∈U-worst} Δ(U*, i) / D(μ_i, μ_{j*}).    (30)

The lower bound derived in [9] for centralized learning and access also holds for the distributed learning and access considered here, but a better lower bound is obtained above by considering the distributed nature of learning. The lower bound for distributed policies is worse than the bound for the centralized policies in (11). This is because each user independently learns the channel availabilities µ in a distributed policy, whereas sensing decisions from all the users are used for learning in a centralized policy.

Our distributed learning and access policy ρRAND matches the lower bound on regret in (15) in the order (log n), but the scaling factors are different. It is not clear whether the regret lower bound in (30) can be achieved by any policy under no explicit information exchange; this is a topic for future investigation.
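As a numerical illustration of (30) (as reconstructed above), the constant multiplying log n can be evaluated for Bernoulli channel models; D(·, ·) below is the Kullback-Leibler divergence between Bernoulli distributions, and the function names are ours:

```python
import math

def kl_bernoulli(p, q):
    # Kullback-Leibler divergence D(p || q) between Bernoulli
    # distributions with parameters 0 < p, q < 1.
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lower_bound_constant(mu, U):
    # Constant multiplying log n on the right-hand side of (30):
    # sum over users j = 1..U and U-worst channels i of
    # Delta(U*, i) / D(mu_i, mu_{j*}).
    ranked = sorted(mu, reverse=True)       # mu(1*) >= mu(2*) >= ...
    best, worst = ranked[:U], ranked[U:]
    return sum((best[U - 1] - mu_i) / kl_bernoulli(mu_i, mu_j)
               for mu_j in best for mu_i in worst)
```

For µ = [0.1, 0.2, …, 0.9] and U = 4 this evaluates to a finite positive constant, which can be compared against the centralized bound in (11) to quantify the price of distributed learning.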
… decreases with U. This is true since its derivative with respect to U is negative.

For the upper bound on regret under ρRAND in (15), when U is increased, the number of U-worst channels decreases and hence the first term in (15) decreases. However, the second term, consisting of the collisions M(n), increases to a far greater extent. □

Note that the above results are for the upper bound on regret under the ρRAND policy and not for the regret itself. Simulations in Section VII reveal that the actual regret also increases with U. Under the centralized scheme ρCENT, as U increases, the number of U-worst channels decreases. Hence, the regret decreases, since there are fewer possibilities of making bad decisions. For distributed schemes, although this effect exists, it is far outweighed by the increase in regret due to the increase in collisions among the U users.

In contrast, the distributed lower bound in (30) displays anomalous behavior with U, since it fails to account for collisions among the users. Here, as U increases, there are two competing effects: a decrease in regret due to the decrease in the number of U-worst channels, and an increase in regret due to the increase in the number of users visiting these U-worst channels; a distributed lower bound that accounts for collisions is a topic for future study.

Fig. 3(a): Normalized regret R(n)/log n vs. n slots. U = 4 users, C = 9 channels. (Curves: random allocation scheme, central allocation scheme, distributed lower bound, centralized lower bound.)

Fig. 3(b): Normalized regret R(n)/log n vs. C channels. U = 2 users, n = 2500 slots.

Fig. 3b evaluates the performance of the different algorithms as the number of channels C is varied while fixing the number of users U. …

Collisions and Learning: Fig. 2c verifies the logarithmic nature of the number of collisions under the random allocation scheme ρRAND. Additionally, we also plot the number of collisions under ρRAND in the ideal scenario when the channel availability statistics µ are known, to see the effect of learning on the number of collisions. The low number of collisions obtained under known channel parameters in the simulations is in agreement with the theoretical prediction U E[Υ(U, U)] analyzed in Lemma 2. As the number of slots n increases, the gap between the number of collisions under known and unknown parameters increases, since the former converges to a finite constant while the latter grows as O(log n). The logarithmic behavior of the cumulative number of collisions can be inferred from Fig. 2a. However, the curve in Fig. 2c for the unknown-parameter case appears linear in n due to the small value of n.
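The qualitative behavior described above can be reproduced with a toy simulation. The sketch below is ours, not the authors' simulation code: it assumes a gMEAN-style index (sample mean plus √(2 log t / T)) and the ρRAND rule of drawing a new target rank uniformly from {1, …, U} after a collision; all names are illustrative.

```python
import math
import random

def simulate_rho_rand(mu, U, n, seed=1):
    # Toy rho_RAND: each user ranks channels by a UCB-style index,
    # senses the channel at its targeted rank, and redraws the rank
    # uniformly from {1, ..., U} whenever it collides.
    rng = random.Random(seed)
    C = len(mu)
    counts = [[0] * C for _ in range(U)]   # sensing counts per user/channel
    frees = [[0] * C for _ in range(U)]    # observed idle counts
    rank = [rng.randrange(U) for _ in range(U)]
    collisions = 0
    for t in range(1, n + 1):
        free = [rng.random() < mu[i] for i in range(C)]  # i.i.d. occupancy
        choices = []
        for j in range(U):
            idx = [frees[j][i] / counts[j][i]
                   + math.sqrt(2 * math.log(t) / counts[j][i])
                   if counts[j][i] else float("inf") for i in range(C)]
            order = sorted(range(C), key=lambda i: -idx[i])
            choices.append(order[rank[j]])   # sense the targeted rank
        for j, ch in enumerate(choices):
            counts[j][ch] += 1
            frees[j][ch] += free[ch]
            if choices.count(ch) > 1:        # collision in this slot
                collisions += 1
                rank[j] = rng.randrange(U)   # redraw the target rank
    return collisions
```

Running this for increasing n should show the cumulative collision count growing slowly, consistent with the O(log n) behavior discussed above; with the true µ supplied to the index instead, the count settles to a constant.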
Difference between gOPT and gMEAN: Since the statistic gMEAN used in the schemes in this paper differs from the optimal statistic gOPT in (5), a simulation is done to compare the performance of the schemes under the two statistics. As expected, in Fig. 2b, the optimal scheme has better performance. However, the use of gMEAN enables us to provide finite-time bounds, as described earlier.

Fairness: One of the important features of ρRAND is that it does not favor any one user over another: each user has an equal chance of settling down in any one of the U-best channels. Fig. 4 evaluates the fairness characteristics of ρRAND. The simulation assumes U = 4 cognitive users vying for access to C = 9 channels. The graph depicts which user asymptotically gets the best channel over 1000 runs of the random allocation scheme. As can be seen, each user has approximately the same frequency of being allotted the best channel, indicating that the random allocation scheme is indeed fair.

Fig. 3(c): Normalized regret R(n)/log n vs. U users. User-channel ratio U/C = 0.5, n = 2500 slots.

Fig. 3. Simulation Results. Probability of Availability µ = [0.1, 0.2, . . . , 0.9].
… worst channels decreases, resulting in lower regret. Also, the lower bound for the distributed case in (30) initially increases and then decreases with U. This is because, as U increases, there are two competing effects: a decrease in regret due to the decrease in the number of U-worst channels, and an increase in regret due to the increase in the number of users visiting these U-worst channels.

VIII. CONCLUSION

In this paper, we proposed novel policies for distributed learning of channel availability statistics and channel access of multiple secondary users in a cognitive network. The first
policy assumed that the number of secondary users in the network is known, while the second policy removed this requirement. We provide provable guarantees for our policies in terms of the sum regret. By noting the lower bound on regret for any uniformly-good learning and access policy, we find that our first policy achieves order-optimal regret, while our second policy is nearly order-optimal. Our analysis in this paper provides insights on incorporating learning and distributed medium access control in a practical cognitive network.

The results of this paper open up an interesting array of problems for future investigation. Simulations suggest that our lower and upper bounds are not tight in terms of the scaling constant, and better bounds are needed. Our assumptions of an i.i.d. model for primary user transmissions and of perfect sensing at the secondary users need to be relaxed. Our policy allows for an unknown but fixed number of secondary users, and it is of interest to incorporate users dynamically entering and leaving the system. Moreover, our model ignores dynamic traffic at the secondary nodes, and an extension to a queueing-theoretic formulation is desirable. We consider the worst-case scenario in which there is no information exchange among the secondary users; extension to the case with limited information exchange is of interest. The proposed schemes are designed to be simple and tractable for mathematical analysis, and it is desirable to obtain more robust schemes with additional design considerations for real-world deployment.

Acknowledgement

The authors thank the guest editors and anonymous reviewers for valuable comments that vastly improved this paper, and for pointing out an error in Proposition 1. The authors thank Keqin Liu and Prof. Qing Zhao for extensive discussions, feedback on the proofs in an earlier version of the manuscript, and for sharing their simulation code. The authors also thank Prof. Lang Tong and Prof. Robert Kleinberg at Cornell, Prof. Bhaskar Krishnamachari at USC, and Prof. John Tsitsiklis and Dr. Ishai Menache at MIT for helpful comments.

Fig. 4. Simulation Results. Probability of Availability µ = [0.1, 0.2, . . . , 0.9]. No. of slots where user has best channel vs. user. U = 4, C = 9, n = 2500 slots, 1000 runs, ρRAND.

APPENDIX

A. Proof of Theorem 2

The result in (13) involves extending the results of [11, Thm. 1]. Define T_i(n) := Σ_{j=1}^{U} T_{i,j}(n) as the number of times channel i is sensed in n rounds by all users. We will show that

    E[T_i(n)] ≤ Σ_{k∈U-best} [ 8 log n / Δ(k*, i)² + 1 + π²/3 ],  ∀i ∈ U-worst.    (31)

We have

    P[Tx. in i in nth slot] = P[g(U*; n) ≤ g(i; n)]
        = P[A(i; n) ∩ (g(U*; n) ≤ g(i; n))] + P[A^c(i; n) ∩ (g(U*; n) ≤ g(i; n))],

where

    A(i; n) := ∪_{k∈U-best} (g(k; n) ≤ g(i; n))

is the event that at least one of the U-best channels has a g-statistic less than that of i. Hence, from the union bound, we have

    P[A(i; n)] ≤ Σ_{k∈U-best} P[g(k; n) ≤ g(i; n)].

For C > U, we have

    P[A^c(i; n) ∩ (g(U*; n) ≤ g(i; n))] = 0.

Hence,

    P[Tx. in i in nth round] ≤ Σ_{k∈U-best} P[g(k; n) ≤ g(i; n)].

On the lines of [11, Thm. 1], we have, for all k, i such that k is U-best and i is U-worst,

    E[ Σ_{l=1}^{n} I[g(k; l) ≤ g(i; l)] ] ≤ 8 log n / Δ(k*, i)² + 1 + π²/3.

Hence, we have (31). For the bound on regret, we can break R in (2) into two terms:

    R(n; µ, U, ρCENT) = (1/U) Σ_{i∈U-worst} Σ_{l=1}^{U} Δ(l*, i) E[T_i(n)] + (1/U) Σ_{i∈U-best} Σ_{l=1}^{U} Δ(l*, i) E[T_i(n)].

For the second term, we have

    (1/U) Σ_{i∈U-best} Σ_{l=1}^{U} Δ(l*, i) E[T_i(n)] ≤ (1/U) E[T*(n)] Σ_{i∈U-best} Σ_{l=1}^{U} Δ(l*, i) = 0,

where T*(n) := max_{i∈U-best} T_i(n). Hence, we have the bound.
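For concreteness, the g-statistics compared throughout this proof can be sketched as UCB-style indices. We assume here that gMEAN adds the confidence width c_{n,m} = √(2 log n / m), the quantity defined in the proof of Lemma 3, to the sample mean; the helper names are ours:

```python
import math

def c_width(n, m):
    # Confidence width c_{n,m} = sqrt(2 log n / m): n is the slot index
    # and m the number of times the channel has been sensed so far.
    return math.sqrt(2 * math.log(n) / m)

def g_mean(sample_mean, m, n):
    # UCB-style gMEAN index of a channel (assumed form): empirical
    # availability plus the confidence width. Ranking channels by this
    # index yields the order statistics used in the proofs.
    return sample_mean + c_width(n, m)
```

The width shrinks as a channel is sensed more often, so a U-worst channel can outrank a U-best channel only rarely, which is what drives the 8 log n / Δ² terms above.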
ANANDKUMAR et al.: DISTRIBUTED ALGORITHMS FOR LEARNING AND COGNITIVE MEDIUM ACCESS WITH LOGARITHMIC REGRET 743
B. Proof of Proposition 1

For convenience, let T_i(n) := Σ_{j=1}^{U} T_{i,j}(n) and V_i(n) := Σ_{j=1}^{U} V_{i,j}(n). Note that Σ_{i=1}^{C} T_i(n) = nU, since each user selects one channel for sensing in each slot and there are U users. From (3),

    R(n) = n Σ_{i=1}^{U} μ(i*) − Σ_{i=1}^{C} μ(i) E[V_i(n)]
         ≤ Σ_{i∈U-best} μ(i) (n − E[V_i(n)])
         ≤ μ(1*) (nU − Σ_{i∈U-best} E[V_i(n)]),    (32)

…

where p is the probability of having an orthogonal configuration in a slot. This is the reciprocal of the number of compositions of U [24, Thm. 5.1], given by

    p = ( (2U − 1) choose U )^{−1}.    (35)

The above expression is the reciprocal of the number of ways U identical balls (users) can be placed in U different bins (channels): there are 2U − 1 possible positions to form U partitions of the balls.

Now, for the random allocation scheme without the genie, any user not experiencing a collision does not draw a new variable from Unif(U). Hence, the number of possible configurations in any slot is lower than under the genie-aided scheme. Since there is only one configuration satisfying orthogonality^9, the probability of orthogonality increases in the absence of the genie and is at least (35). Hence, the number of slots needed to reach orthogonality without the genie is at most (34). Since at most U collisions occur in any slot, (17) holds.

^9 Since all users are identical for this analysis.

D. Proof of Lemma 3

Let c_{n,m} := √(2 log n / m). Then

    T(n) ≤ U Σ_{a=1}^{U} Σ_{b=a+1}^{C} Σ_{m=1}^{n} I(g_j^MEAN(a*; m) < g_j^MEAN(b*; m)),

where a* and b* denote the channels with the a-th and b-th highest availabilities. On the lines of the result for U = C = 2, we can show that

    E[ Σ_{m=1}^{n} I[g_j^MEAN(a*; m) < g_j^MEAN(b*; m)] ] ≤ 8 log n / Δ(a*, b*)² + 1 + π²/3.

Hence, (18) holds.
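The balls-and-bins count behind (35) is easy to verify by brute force: the number of configurations of U identical users in U channels is (2U − 1 choose U), which matches direct enumeration of multisets. A quick sanity check (helper names are ours):

```python
from math import comb
from itertools import combinations_with_replacement

def num_configurations(U):
    # Ways to place U identical balls (users) into U bins (channels):
    # the compositions count (2U - 1 choose U); its reciprocal is the
    # orthogonality probability p in (35).
    return comb(2 * U - 1, U)

def num_configurations_brute(U):
    # Enumerate the multisets of size U over U channels directly.
    return sum(1 for _ in combinations_with_replacement(range(U), U))
```

Exactly one of these configurations is collision-free, which is why p is the reciprocal of this count.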
The number of slots under the bad event is

    Σ_{m=1}^{n} I[G^c(m)] = T(n),

by the definition of T(n). In each slot, either a good or a bad event occurs. Let γ be the total number of collisions in the U-best channels between two bad events, i.e., under a run of good events. In this case, all the users have the correct top-U ranks of the channels and hence

    E[γ | G(n)] ≤ U E[Υ(U, U)] < ∞,

where E[Υ(U, U)] is given by (17). Hence, each transition from the bad to the good state results in at most U E[Υ(U, U)] expected collisions in the U-best channels. The expected number of collisions under the bad event is at most U E[T(n)]. Hence, (19) holds.

F. Proof of Lemma 4

Under C(n; U), a U-worst channel is sensed only if it is mistaken to be a U-best channel. Hence, on the lines of Lemma 1,

    E[T_{i,j}(n) | C(n; U)] = O(log n),  ∀i ∈ U-worst, j = 1, . . . , U.

…

    = P[ max_{j=1,…,U} Φ_{U,j}(n) > ξ(n; U) ],

where Φ is given by (20). For U = 1, we have P[C^c(n; U)] = 0, since no collisions occur. Using (27) in Proposition 2,

    P[ max_j Φ_{k,j}(n) > ξ(n; k) ]
      ≤ P[ kΥ(U, k)(T(n) + 1) > ξ(n; k) ]
      ≤ P[ k(T(n) + 1) > ξ(n; k)/α_n ] + P[ Υ(U, k) > α_n ]
      ≤ k α_n (E[T(n)] + 1) / ξ(n; k) + P[ Υ(U, k) > α_n ],    (36)

using the Markov inequality. By choosing α_n = ω(1), the second term in (36), viz. P[Υ(U, k) > α_n], decays to zero as n → ∞ for k ≥ U. For the first term, from (26) in Lemma 5, E[T(n)] = O(log n). Hence, by choosing α_n = o(ξ*(n; k)/log n), the first term also decays to zero. Since ξ*(n; U) = ω(log n), we can choose α_n satisfying both conditions. By letting k = U in (36), we have P[C^c(n; U)] → 0 as n → ∞, and (28) holds.

REFERENCES

[1] A. Anandkumar, N. Michael, and A. Tang, "Opportunistic Spectrum Access with Multiple Users: Learning under Competition," in Proc. …
[14] H. Liu, B. Krishnamachari, and Q. Zhao, "Cooperation and Learning in Multiuser Opportunistic Spectrum Access," in Proc. IEEE Intl. Conf. on Comm. (ICC), Beijing, China, May 2008.
[15] F. Fu and M. van der Schaar, "Learning to Compete for Resources in Wireless Stochastic Games," IEEE Trans. Veh. Technol., vol. 58, no. 4, pp. 1904-1919, May 2009.
[16] H. Gang, Z. Qian, and X. Ming, "Contention-Aware Spectrum Sensing and Access Algorithm of Cognitive Network," in Proc. Intl. Conf. on Cognitive Radio Oriented Wireless Networks and Comm., Singapore, May 2008.
[17] H. Liu, L. Huang, B. Krishnamachari, and Q. Zhao, "A Negotiation Game for Multichannel Access in Cognitive Radio Networks," in Proc. Intl. Conf. on Wireless Internet, Las Vegas, NV, Nov. 2008.
[18] H. Li, "Multi-agent Q-Learning of Channel Selection in Multi-user Cognitive Radio Systems: A Two by Two Case," in Proc. IEEE Conf. on Systems, Man and Cybernetics, Istanbul, Turkey, 2009.
[19] M. Maskery, V. Krishnamurthy, and Q. Zhao, "Game Theoretic Learning and Pricing for Dynamic Spectrum Access in Cognitive Radio," in Cognitive Wireless Comm. Networks. Springer, 2007.
[20] R. Kleinberg, G. Piliouras, and E. Tardos, "Multiplicative Updates Outperform Generic No-regret Learning in Congestion Games," in Proc. ACM Symp. on Theory of Computing (STOC), Bethesda, MD, May-June 2009.
[21] Y. Gai, B. Krishnamachari, and R. Jain, "Learning Multiuser Channel Allocations in Cognitive Radio Networks: A Combinatorial Multi-Armed Bandit Formulation," in Proc. IEEE Symp. on Dynamic Spectrum Access Networks (DySPAN), Singapore, April 2010.
[22] K. Liu, Q. Zhao, and B. Krishnamachari, "Distributed Learning under Imperfect Sensing in Cognitive Radio Networks," submitted to Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Monterey, CA, Oct. 2010.
[23] T. Cover and J. Thomas, Elements of Information Theory. John Wiley & Sons, Inc., 1991.
[24] M. Bona, A Walk Through Combinatorics: An Introduction to Enumeration and Graph Theory. World Scientific Pub. Co. Inc., 2006.
[25] M. Shaked and J. Shanthikumar, Stochastic Orders. Springer, 2007.

Animashree Anandkumar (S '02, M '09) received her B.Tech. in Electrical Engineering from the Indian Institute of Technology Madras in 2004 and her Ph.D. in Electrical and Computer Engineering with a minor in Applied Math from Cornell University in 2009. She was a post-doctoral researcher at the Stochastic Systems Group at the Massachusetts Institute of Technology between 2009 and 2010. She is currently an assistant professor in the Electrical Engineering and Computer Science Department at the University of California, Irvine, and a member of the Center for Pervasive Communications and Computing at UCI. She is the recipient of the 2008 IEEE Signal Processing Society (SPS) Young Author award for her paper co-authored with Lang Tong, which appeared in the IEEE Transactions on Signal Processing, and of the Fran Allen IBM Ph.D. fellowship 2008-09, presented in conjunction with the IBM Ph.D. Fellowship Award. She received the 2009 Best Thesis Award from the ACM Sigmetrics society. Her research interests are in the areas of statistical signal processing, information theory, and networking, with a focus on distributed inference and learning of graphical models. She serves on the Technical Program Committees of IEEE INFOCOM 2011, ACM MOBIHOC 2011, IEEE ICC 2010, MILCOM 2010, and IEEE PIMRC 2010, and has served as a reviewer for the IEEE Transactions on Signal Processing, IEEE Journal on Selected Areas in Communications, IEEE Transactions on Information Theory, IEEE Transactions on Wireless Communications, and IEEE Signal Processing Letters.

Nithin Michael (S '06) received the B.S. degree summa cum laude from Drexel University, Philadelphia, PA, in 2008 in electrical engineering. He is currently pursuing the joint M.S./Ph.D. degree program in electrical engineering at Cornell University. Mr. Michael was the recipient of First Honors in ECE and the Highest Academic Achievement Award at Drexel University in 2008. At Cornell University, he was awarded the Jacobs Fellowship in 2008 and 2010.

Ao Tang (S '01, M '07) received the B.E. in Electronics Engineering from Tsinghua University, Beijing, China, and the Ph.D. in Electrical Engineering with a minor in applied and computational mathematics from the California Institute of Technology, Pasadena, CA, in 1999 and 2006, respectively. He is an Assistant Professor in the School of Electrical and Computer Engineering at Cornell University, where his current research is on control and optimization of networks, including communication networks, power networks, and on-chip networks. Dr. Tang was the recipient of the 2006 George B. Dantzig Best Dissertation Award, the 2007 Charles Wilts Best Dissertation Prize, and a 2009 IBM Faculty Award.

Ananthram Swami (F '08) received the B.Tech. degree from the Indian Institute of Technology (IIT), Bombay; the M.S. degree from Rice University, Houston, TX; and the Ph.D. degree from the University of Southern California (USC), Los Angeles, all in electrical engineering. He has held positions with Unocal Corporation, USC, CS-3, and Malgudi Systems. He was a Statistical Consultant to the California Lottery, developed HOSAT, a MATLAB-based toolbox for non-Gaussian signal processing, and has held visiting faculty positions at INP, Toulouse, France. He is the ST for Network Science at the U.S. Army Research Laboratory. His work is in the broad areas of signal processing, wireless communications, and sensor and mobile ad hoc networks. He is the co-editor of the book Wireless Sensor Networks: Signal Processing & Communications Perspectives (New York: Wiley, 2007). Dr. Swami is a member of the IEEE Signal Processing Society's (SPS) Technical Committee on Sensor Array and Multichannel systems and serves on the Senior Editorial Board of the IEEE Journal of Selected Topics in Signal Processing. He was a tutorial speaker on Networking Cognitive Radios for Dynamic Spectrum Access at IEEE ICC 2010 and co-chair of IEEE SPAWC 2010.