

Online Appendix for: Fast Learning for Dynamic Resource Allocation in AI-enabled Radio Networks
Muhammad Anjum Qureshi, Student Member, IEEE, Cem Tekin, Member, IEEE

This online appendix includes the proofs of Lemmas 1 and 2 in [1].

I. PROOF OF LEMMA 1

Note that $\tilde{L}_x(t) = a^*_x$, $\tilde{a}_x(t) = a$ and $\mu(x, a^*_x) \le \tilde{u}_{x,a^*_x}(t)$ together imply that $\tilde{u}_{x,a}(t) \ge \tilde{u}_{x,a^*_x}(t) \ge \mu(x, a^*_x)$.
Thus, we have

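$$
\begin{aligned}
\{\tilde{u}_{x,a}(t) \ge \mu(x, a^*_x)\} &= \{\min\{\tilde{u}_{x,a}(t), \tilde{w}_{x,a}(t)\} \ge \mu(x, a^*_x)\} \\
&= \{\tilde{u}_{x,a}(t) \ge \mu(x, a^*_x),\ \tilde{w}_{x,a}(t) \ge \mu(x, a^*_x)\}.
\end{aligned}
$$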

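The condition $\tilde{u}_{x,a}(t) \ge \mu(x, a^*_x)$ implies that

$$
\begin{aligned}
\tilde{N}_{x,a}(t)\, d_a^+\big(\tilde{\mu}_{x,a}(t), \mu(x, a^*_x)\big) &\le \tilde{N}_{x,a}(t)\, d_a\big(\tilde{\mu}_{x,a}(t), \tilde{u}_{x,a}(t)\big) \\
&\le \log(T) + 3\log(\log(T)) = f(T).
\end{aligned}
$$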

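Similarly, the condition $\tilde{w}_{x,a}(t) \ge \mu(x, a^*_x)$ implies that

$$
\tilde{M}_{x,a}(t)\, d_a^+\big(\tilde{\eta}_{x,a}(t), \mu(x, a^*_x)\big) \le \tilde{M}_{x,a}(t)\, d_a\big(\tilde{\eta}_{x,a}(t), \tilde{w}_{x,a}(t)\big) \le f(T).
$$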
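Both implications rest only on the monotonicity of the divergence in its second argument. The Python sketch below checks the first one numerically under explicit assumptions that are not spelled out in this appendix: $d$ is the Bernoulli KL divergence, $d_a(p, q) = d(p/K_a, q/K_a)$ for a scale $K_a$ (suggested by the normalization used in the proof of Lemma 2), $d_a^+(p, q) = d_a(p, q)\,\mathbf{1}\{p < q\}$, and the index is the largest $q$ with $N\, d_a(p, q) \le f(T)$, found by bisection. All function names are hypothetical.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """Bernoulli KL divergence d(p, q); arguments are clipped into (0, 1)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def d_a(p: float, q: float, k_a: float) -> float:
    # ASSUMPTION: d_a(p, q) = d(p / K_a, q / K_a) for a hypothetical scale K_a.
    return kl_bernoulli(p / k_a, q / k_a)


def d_a_plus(p: float, q: float, k_a: float) -> float:
    # Truncated divergence d_a^+(p, q) = d_a(p, q) * 1{p < q}.
    return d_a(p, q, k_a) if p < q else 0.0


def f(T: int) -> float:
    # Threshold f(T) = log(T) + 3 log(log(T)) from the proof.
    return math.log(T) + 3.0 * math.log(math.log(T))


def kl_ucb_index(mean: float, n: int, k_a: float, threshold: float) -> float:
    """Largest q in [mean, K_a] with n * d_a(mean, q) <= threshold, by bisection."""
    lo, hi = mean, k_a  # lo always satisfies the constraint since d_a(mean, mean) = 0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if n * d_a(mean, mid, k_a) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def implication_holds(mean: float, n: int, k_a: float, T: int, grid: int = 200) -> bool:
    """Check on a grid of mu: index >= mu implies n * d_a^+(mean, mu) <= f(T)."""
    u = kl_ucb_index(mean, n, k_a, f(T))
    for i in range(grid + 1):
        mu = k_a * i / grid
        # small tolerance absorbs floating-point rounding in the logarithms
        if u >= mu and n * d_a_plus(mean, mu, k_a) > f(T) + 1e-9:
            return False
    return True


print(implication_holds(mean=0.3, n=50, k_a=1.0, T=1000))  # True
```

The same check would apply to the second implication with $(\tilde{M}_{x,a}(t), \tilde{\eta}_{x,a}(t), \tilde{w}_{x,a}(t))$ in place of $(\tilde{N}_{x,a}(t), \tilde{\mu}_{x,a}(t), \tilde{u}_{x,a}(t))$, assuming $\tilde{w}_{x,a}(t)$ is built with the same threshold $f(T)$.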

Using the inequalities above, we get


 
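$$
\begin{aligned}
&\mathbb{E}\Bigg[\sum_{t=1}^{N_x(T+1)} \mathbf{1}\{\tilde{L}_x(t) = a^*_x,\ \mu(x, a^*_x) \le \tilde{u}_{x,a^*_x}(t),\ \tilde{a}_x(t) = a\}\Bigg] \\
&\le \mathbb{E}\Bigg[\sum_{t=1}^{N_x(T+1)} \mathbf{1}\big\{\tilde{N}_{x,a}(t)\, d_a^+(\tilde{\mu}_{x,a}(t), \mu(x, a^*_x)) \le f(T),\ \tilde{M}_{x,a}(t)\, d_a^+(\tilde{\eta}_{x,a}(t), \mu(x, a^*_x)) \le f(T),\ \tilde{a}_x(t) = a\big\}\Bigg] \\
&\le \mathbb{E}\Bigg[\sum_{t=1}^{N_x(T+1)} \sum_{s=1}^{t} \mathbf{1}\big\{\tilde{N}_{x,a}(t) = s-1,\ \tilde{a}_x(t) = a,\ (s-1)\, d_a^+(\tilde{\mu}^s_{x,a}, \mu(x, a^*_x)) \le f(T),\ \tilde{M}^s_{x,a}\, d_a^+(\tilde{\eta}^s_{x,a}, \mu(x, a^*_x)) \le f(T)\big\}\Bigg] \\
&\le \mathbb{E}\Bigg[\sum_{t=1}^{T} \sum_{s=1}^{t} \mathbf{1}\{\tilde{N}_{x,a}(t) = s-1,\ \tilde{a}_x(t) = a\}\, \mathbf{1}\big\{(s-1)\, d_a^+(\tilde{\mu}^s_{x,a}, \mu(x, a^*_x)) \le f(T),\ \tilde{M}^s_{x,a}\, d_a^+(\tilde{\eta}^s_{x,a}, \mu(x, a^*_x)) \le f(T)\big\}\Bigg].
\end{aligned}
$$

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant 116E229.
The authors are with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800 Turkey (e-mail: qureshi@ee.bilkent.edu.tr; cemtekin@ee.bilkent.edu.tr).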

Interchanging the order of summation over $s$ and $t$, we can bound the expression above as

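$$
\begin{aligned}
&\le \mathbb{E}\Bigg[\sum_{s=1}^{T} \sum_{t=s}^{T} \mathbf{1}\{\tilde{N}_{x,a}(t) = s-1,\ \tilde{a}_x(t) = a\}\, \mathbf{1}\big\{(s-1)\, d_a^+(\tilde{\mu}^s_{x,a}, \mu(x, a^*_x)) \le f(T),\ \tilde{M}^s_{x,a}\, d_a^+(\tilde{\eta}^s_{x,a}, \mu(x, a^*_x)) \le f(T)\big\}\Bigg] \\
&\le \mathbb{E}\Bigg[\sum_{s=1}^{T} \mathbf{1}\big\{(s-1)\, d_a^+(\tilde{\mu}^s_{x,a}, \mu(x, a^*_x)) \le f(T),\ \tilde{M}^s_{x,a}\, d_a^+(\tilde{\eta}^s_{x,a}, \mu(x, a^*_x)) \le f(T)\big\} \sum_{t=s}^{T} \mathbf{1}\{\tilde{N}_{x,a}(t) = s-1,\ \tilde{a}_x(t) = a\}\Bigg] \\
&\le \sum_{s=1}^{T} \mathbb{P}\big((s-1)\, d_a^+(\tilde{\mu}^s_{x,a}, \mu(x, a^*_x)) \le f(T),\ \tilde{M}^s_{x,a}\, d_a^+(\tilde{\eta}^s_{x,a}, \mu(x, a^*_x)) \le f(T)\big),
\end{aligned}
$$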
where the last inequality holds since, for every $s \in \{1, \ldots, T\}$, $\sum_{t=s}^{T} \mathbf{1}\{\tilde{N}_{x,a}(t) = s-1,\ \tilde{a}_x(t) = a\} \le 1$.
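The two steps that close this proof, interchanging the order of summation and bounding the inner sum by 1, can be checked on a toy selection sequence. The Python sketch below assumes (this is not stated in the appendix) that $\tilde{N}_{x,a}(t)$ counts the earlier rounds in which $\tilde{a}_x(\cdot) = a$; under that assumption the event $\{\tilde{N}_{x,a}(t) = s-1, \tilde{a}_x(t) = a\}$ marks the $s$-th selection of $a$ and can occur at most once. The helper names are hypothetical.

```python
def counts_before(choices, arm):
    """Hypothetical counter: N(t) = #{t' < t : choices[t'] == arm}, for t = 1..T."""
    n, counts = 0, []
    for c in choices:
        counts.append(n)  # value of the counter at the start of round t
        if c == arm:
            n += 1        # the counter grows only when the arm is selected
    return counts


def max_hits_per_s(choices, arm):
    """max over s of sum_{t=s}^{T} 1{N(t) = s - 1, choice(t) = arm}."""
    T = len(choices)
    N = counts_before(choices, arm)
    worst = 0
    for s in range(1, T + 1):
        hits = sum(1 for t in range(s, T + 1)
                   if N[t - 1] == s - 1 and choices[t - 1] == arm)
        worst = max(worst, hits)
    return worst


def swap_order_equal(g, T):
    """Compare sum_{t=1}^{T} sum_{s=1}^{t} g(s, t) with sum_{s=1}^{T} sum_{t=s}^{T} g(s, t)."""
    left = sum(g(s, t) for t in range(1, T + 1) for s in range(1, t + 1))
    right = sum(g(s, t) for s in range(1, T + 1) for t in range(s, T + 1))
    return left == right


choices = list("abacabbcaa" * 20)                   # a fixed selection sequence, T = 200
print(max_hits_per_s(choices, "a"))                  # 1
print(swap_order_equal(lambda s, t: s * t + t, 30))  # True
```

Both index sets describe the same triangle $\{(s, t) : 1 \le s \le t \le T\}$, which is why the interchange is exact.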

II. PROOF OF LEMMA 2

Let $A = \{\mu(x, a) \le \mu(\tilde{y}_{x,a}(t), a)\}$. We have

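$$
\begin{aligned}
\mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) &= \mathbb{P}\big(\mu(x, a) > \min\{\tilde{u}_{x,a}(t), \tilde{w}_{x,a}(t)\}\big) \\
&\le \mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) + \mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t)) \\
&= \mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) + \mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A)\, \mathbb{P}(A) + \mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A^c)\, \mathbb{P}(A^c) \\
&\le \mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) + \mathbb{P}(\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t)) + \mathbb{P}(A^c) \\
&\le 2e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta) + \mathbb{P}(A^c),
\end{aligned}
$$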

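where the second inequality holds because, on $A$, the event $\{\mu(x, a) > \tilde{w}_{x,a}(t)\}$ implies $\{\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t)\}$, so that $\mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A)\, \mathbb{P}(A) \le \mathbb{P}(\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t))$, while the conditional probability given $A^c$ is at most 1; the last inequality follows from Theorem 10 in [2]. Next, we bound the probability of the event $A^c$ as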

$$
\mathbb{P}(A^c) \le 2X e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta).
$$

Since the KL-UCB index is an upper bound on the true mean with high probability and the KL-LCB index is a lower bound on the true mean with high probability, $A^c$ implies that there exists at least one context for arm $a$ whose KL-UCB index underestimates its mean value or whose KL-LCB index overestimates its mean value; i.e., given

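$$
A^c_1 = \bigcup_{x' \in \mathcal{X}} \{\tilde{u}^{x}_{x',a}(t) < \mu(x', a)\}, \qquad
A^c_2 = \bigcup_{x' \in \mathcal{X}} \{\tilde{l}^{x}_{x',a}(t) > \mu(x', a)\},
$$

we have $A^c \subset A^c_1 \cup A^c_2$.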


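By the definition of the KL-UCB index in (8) of [1], and since $q \mapsto d(p, q)$ is an increasing function for $q \ge p$, if $\tilde{u}^{x}_{x',a}(t) < \mu(x', a)$, then

$$
\tilde{N}^{x}_{x',a}(t)\, d\!\left(\frac{\tilde{\mu}^{x}_{x',a}(t)}{K_a}, \frac{\mu(x', a)}{K_a}\right) > \delta.
$$

Similarly, by the definition of the KL-LCB index in (10) of [1], and since $q \mapsto d(p, q)$ is a decreasing function for $q \le p$, if $\tilde{l}^{x}_{x',a}(t) > \mu(x', a)$, then

$$
\tilde{N}^{x}_{x',a}(t)\, d\!\left(\frac{\tilde{\mu}^{x}_{x',a}(t)}{K_a}, \frac{\mu(x', a)}{K_a}\right) > \delta.
$$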
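Therefore,

$$
A^c_1 \cup A^c_2 \subseteq \bigcup_{x' \in \mathcal{X}} \left\{ \tilde{N}^{x}_{x',a}(t)\, d\!\left(\frac{\tilde{\mu}^{x}_{x',a}(t)}{K_a}, \frac{\mu(x', a)}{K_a}\right) > \delta \right\}.
$$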
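The two implications above can be illustrated numerically. The Python sketch below assumes, since (8) and (10) of [1] are not reproduced here, that the KL-UCB and KL-LCB indices are the largest and smallest $q \in [0, K_a]$ with $N\, d(\tilde{\mu}/K_a, q/K_a) \le \delta$, where $d$ is the Bernoulli KL divergence. It then checks on a grid of means $\mu$ that whenever the UCB falls below $\mu$ or the LCB rises above $\mu$, the normalized divergence exceeds $\delta$. The function names are hypothetical.

```python
import math


def kl_bernoulli(p: float, q: float) -> float:
    """Bernoulli KL divergence d(p, q); arguments are clipped into (0, 1)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_index(mean, n, k_a, delta, upper=True, iters=100):
    """KL-UCB (upper=True) or KL-LCB (upper=False) index on [0, K_a] by bisection,
    using the ASSUMED scaled constraint n * d(mean / K_a, q / K_a) <= delta."""
    inside, outside = mean, (k_a if upper else 0.0)  # `inside` always satisfies the constraint
    for _ in range(iters):
        mid = 0.5 * (inside + outside)
        if n * kl_bernoulli(mean / k_a, mid / k_a) <= delta:
            inside = mid
        else:
            outside = mid
    return inside


def implications_hold(mean, n, k_a, delta, grid=400):
    """If the UCB is below mu or the LCB is above mu, then n * d(mean/K_a, mu/K_a) > delta."""
    ucb = kl_index(mean, n, k_a, delta, upper=True)
    lcb = kl_index(mean, n, k_a, delta, upper=False)
    for i in range(grid + 1):
        mu = k_a * i / grid
        # small tolerance absorbs floating-point rounding near the index boundary
        if (ucb < mu or lcb > mu) and n * kl_bernoulli(mean / k_a, mu / k_a) <= delta - 1e-9:
            return False
    return True


print(implications_hold(mean=0.4, n=30, k_a=1.0, delta=math.log(100)))  # True
```

The monotonicity of $q \mapsto d(p, q)$ on each side of $p$ is exactly what makes the bisection valid, so the same property drives both the index computation and the implications.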

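Applying Theorem 10 in [2] again, once for the upper and once for the lower deviation of each of the $X$ contexts, and taking a union bound, we get

$$
\mathbb{P}(A^c) \le 2X e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta)
$$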

and hence,

$$
\mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) \le 2(X + 1) e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta).
$$
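As a sanity check on the constants, the Python sketch below assembles the final bound from its two parts: $2e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta)$ from the first chain of inequalities and $2X e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta)$ for $\mathbb{P}(A^c)$. The function names are hypothetical, and the per-index term is taken as quoted above from Theorem 10 in [2].

```python
import math


def theorem10_term(delta: float, tau: float) -> float:
    """Per-index deviation term e * ceil(delta * log(tau)) * exp(-delta), as quoted from [2]."""
    return math.e * math.ceil(delta * math.log(tau)) * math.exp(-delta)


def lemma2_bound(num_contexts: int, delta: float, tau: float) -> float:
    """Assemble 2 (X + 1) e ceil(delta log tau) exp(-delta) from its two parts."""
    direct = 2 * theorem10_term(delta, tau)               # P(mu > u~) + P(mu(y~, a) > w~)
    p_ac = 2 * num_contexts * theorem10_term(delta, tau)  # P(A^c): one UCB and one LCB failure per context
    return direct + p_ac


print(lemma2_bound(num_contexts=4, delta=10.0, tau=1000))
```

Because the per-index term exceeds 1 when $\delta$ is small, the bound is informative only once $\exp(-\delta)$ dominates the $\lceil \delta \log(\tau_x(t)) \rceil$ factor.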

REFERENCES

[1] M. A. Qureshi and C. Tekin, “Fast learning for dynamic resource allocation in AI-enabled radio networks,” to appear in IEEE Trans. Cogn. Commun. Netw., 2020.
[2] A. Garivier and O. Cappé, “The KL-UCB algorithm for bounded stochastic bandits and beyond,” in Proc. JMLR Workshop Conf., pp. 359–376, 2011.
