I. PROOF OF LEMMA 1
Note that L̃x (t) = a∗x , ãx (t) = a and µ(x, a∗x ) ≤ ũx,a∗x (t) together imply that ũx,a (t) ≥ ũx,a∗x (t) ≥ µ(x, a∗x ).
Thus, we have
{ũx,a (t) ≥ µ(x, a∗x )} = {min {ũx,a (t), w̃x,a (t)} ≥ µ(x, a∗x )}
\[
\tilde{N}_{x,a}(t)\, d_a^{+}\big(\tilde{\mu}_{x,a}(t), \mu(x, a_x^*)\big) \le \tilde{N}_{x,a}(t)\, d_a\big(\tilde{\mu}_{x,a}(t), \tilde{u}_{x,a}(t)\big) \le \log(T) + 3\log(\log(T)) = f(T).
\]
\[
\tilde{M}_{x,a}(t)\, d_a^{+}\big(\tilde{\eta}_{x,a}(t), \mu(x, a_x^*)\big) \le \tilde{M}_{x,a}(t)\, d_a\big(\tilde{\eta}_{x,a}(t), \tilde{w}_{x,a}(t)\big) \le f(T).
\]
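As a numerical illustration of the two inequalities above, the following minimal sketch computes a KL-UCB-style index by bisection and checks that N d^+(mean, mu_star) ≤ N d(mean, u) ≤ f(T) whenever mu_star ≤ u. It assumes Bernoulli rewards, so that the divergence d_a is the Bernoulli KL divergence; this is an assumption made only for illustration and is not stated in the proof. The function names (bernoulli_kl, kl_plus, f, kl_ucb) are hypothetical.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence d(p, q) between Bernoulli(p) and Bernoulli(q).

    Assumption: rewards are Bernoulli. Arguments are clipped away from
    0 and 1 so that the logarithms stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_plus(p, q):
    """One-sided divergence d^+(p, q): d(p, q) if p < q, and 0 otherwise."""
    return bernoulli_kl(p, q) if p < q else 0.0


def f(T):
    """Exploration budget f(T) = log(T) + 3 log(log(T)); requires T > 1."""
    return math.log(T) + 3.0 * math.log(math.log(T))


def kl_ucb(mean, n, T, iters=60):
    """Approximate the largest q in [mean, 1] with n * d(mean, q) <= f(T).

    d(mean, q) is increasing in q on [mean, 1], so bisection applies.
    The returned value always satisfies the constraint, because the lower
    end of the bracket only moves to points that satisfy it.
    """
    if n == 0:
        return 1.0  # no samples yet: the index is the trivial upper bound
    budget = f(T) / n
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

For example, with a sample mean of 0.3 after n = 50 pulls and T = 1000, calling `kl_ucb(0.3, 50, 1000)` returns an index u; for any mu_star with 0.3 < mu_star ≤ u, the quantity `50 * kl_plus(0.3, mu_star)` is at most `50 * bernoulli_kl(0.3, u)`, which in turn is at most `f(1000)`.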
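This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant 116E229.
The authors are with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800 Turkey (e-mail: qureshi@ee.bilkent.edu.tr; cemtekin@ee.bilkent.edu.tr).
\[
\begin{aligned}
&\le \mathbb{E}\Bigg[ \sum_{t=1}^{N_x(T+1)} \sum_{s=1}^{t} \mathbb{1}\Big\{ \tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a,\ (s-1)\, d_a^{+}\big(\tilde{\mu}^{s}_{x,a}, \mu(x, a_x^*)\big) \le f(T),\ \tilde{M}^{s}_{x,a}\, d_a^{+}\big(\tilde{\eta}^{s}_{x,a}, \mu(x, a_x^*)\big) \le f(T) \Big\} \Bigg] \\
&\le \mathbb{E}\Bigg[ \sum_{t=1}^{T} \sum_{s=1}^{t} \mathbb{1}\Big\{ \tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a \Big\}\, \mathbb{1}\Big\{ (s-1)\, d_a^{+}\big(\tilde{\mu}^{s}_{x,a}, \mu(x, a_x^*)\big) \le f(T),\ \tilde{M}^{s}_{x,a}\, d_a^{+}\big(\tilde{\eta}^{s}_{x,a}, \mu(x, a_x^*)\big) \le f(T) \Big\} \Bigg].
\end{aligned}
\]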
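To give a feel for the kind of event counted inside the expectation, the sketch below simulates one arm and counts the indices s for which (s − 1) d^+(mean of the first s − 1 samples, mu_star) ≤ f(T). It covers only the first of the two conditions in the indicator and assumes i.i.d. Bernoulli rewards with the Bernoulli KL divergence; both are simplifying assumptions made for illustration, and the function names and parameter values are hypothetical.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def f(T):
    """f(T) = log(T) + 3 log(log(T)); requires T > 1."""
    return math.log(T) + 3.0 * math.log(math.log(T))


def count_small_divergence_rounds(mu, mu_star, T, seed=0):
    """Count s in {1, ..., T} with (s - 1) * d^+(mean_{s-1}, mu_star) <= f(T).

    mean_{s-1} is the empirical mean of the first s - 1 simulated rewards
    of a Bernoulli(mu) arm (assumed i.i.d.); mu_star plays the role of the
    mean of the optimal arm.
    """
    rng = random.Random(seed)
    budget = f(T)
    reward_sum = 0  # sum of the first s - 1 rewards
    count = 0
    for s in range(1, T + 1):
        n = s - 1
        mean = reward_sum / n if n > 0 else 0.0
        d_plus = bernoulli_kl(mean, mu_star) if mean < mu_star else 0.0
        if n * d_plus <= budget:
            count += 1
        # Draw the s-th reward before moving on to the next index.
        reward_sum += 1 if rng.random() < mu else 0
    return count
```

With a clear gap between mu and mu_star (for instance mu = 0.3 and mu_star = 0.7), the condition can hold only while s − 1 is on the order of f(T) / d(mu, mu_star), so the count stays small relative to T. This is the intuition behind a logarithmic bound on such sums.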
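\[
\begin{aligned}
P\big(\mu(x,a) > \tilde{u}_{x,a}(t)\big) &= P\big(\mu(x,a) > \min\{\tilde{u}_{x,a}(t), \tilde{w}_{x,a}(t)\}\big) \\
&\le P\big(\mu(x,a) > \tilde{u}_{x,a}(t)\big) + P\big(\mu(x,a) > \tilde{w}_{x,a}(t) \mid A\big)\, P(A) + P\big(\mu(x,a) > \tilde{w}_{x,a}(t) \mid A^c\big)\, P(A^c) \\
&\le P\big(\mu(x,a) > \tilde{u}_{x,a}(t)\big) + P\big(\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t)\big) + P(A^c)
\end{aligned}
\]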
where the last inequality follows from Theorem 10 in [2]. Next, we bound the probability of event Ac as
Since the KL-UCB index is an upper bound on the true mean with high probability and the KL-LCB index is a lower bound on the true mean with high probability, Ac implies that there exists at least one context for arm a whose KL-UCB index underestimates its mean or whose KL-LCB index overestimates its mean, i.e., given
and hence,
REFERENCES
[1] M. A. Qureshi and C. Tekin, “Fast learning for dynamic resource allocation in AI-enabled radio networks,” to appear in IEEE Trans. Cogn. Commun. Netw., 2020.
[2] A. Garivier and O. Cappé, “The KL-UCB algorithm for bounded stochastic bandits and beyond,” in Proc. JMLR Workshop Conf., pp. 359–376, 2011.