Online Appendix For: Fast Learning For Dynamic Resource Allocation in AI-enabled Radio Networks

Muhammad Anjum Qureshi, Student Member, IEEE, Cem Tekin, Member, IEEE
This online appendix includes the proofs of Lemmas 1 and 2 in [1].
I. PROOF OF LEMMA 1
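Note that $\tilde{L}_x(t) = a_x^*$, $\tilde{a}_x(t) = a$ and $\mu(x, a_x^*) \le \tilde{u}_{x,a_x^*}(t)$ together imply that $\tilde{u}_{x,a}(t) \ge \tilde{u}_{x,a_x^*}(t) \ge \mu(x, a_x^*)$. Thus, we have
\begin{align*}
\{\tilde{u}_{x,a}(t) \ge \mu(x, a_x^*)\} &= \{\min\{\tilde{u}_{x,a}(t), \tilde{w}_{x,a}(t)\} \ge \mu(x, a_x^*)\} \\
&= \{\tilde{u}_{x,a}(t) \ge \mu(x, a_x^*),\ \tilde{w}_{x,a}(t) \ge \mu(x, a_x^*)\}.
\end{align*}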
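Condition $\tilde{u}_{x,a}(t) \ge \mu(x, a_x^*)$ implies that
\begin{align*}
\tilde{N}_{x,a}(t)\, d_a^+(\tilde{\mu}_{x,a}(t), \mu(x, a_x^*)) &\le \tilde{N}_{x,a}(t)\, d_a(\tilde{\mu}_{x,a}(t), \tilde{u}_{x,a}(t)) \\
&\le \log(T) + 3\log(\log(T)) = f(T).
\end{align*}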
Similarly, condition $\tilde{w}_{x,a}(t) \ge \mu(x, a_x^*)$ implies that
\[
\tilde{M}_{x,a}(t)\, d_a^+(\tilde{\eta}_{x,a}(t), \mu(x, a_x^*)) \le \tilde{M}_{x,a}(t)\, d_a(\tilde{\eta}_{x,a}(t), \tilde{w}_{x,a}(t)) \le f(T).
\]
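The two implications above can be checked numerically. The following is a minimal Python sketch under assumptions that are not stated in this appendix: the divergence $d_a$ is taken to be the Bernoulli KL divergence, $d_a^+(p, q)$ is taken to mean $d_a(p, q)\mathbf{1}\{p < q\}$, and the index is computed by bisection. The function names (`f`, `kl_bernoulli`, `d_plus`, `kl_ucb_index`) and the numbers are hypothetical and chosen only for illustration.

```python
import math


def f(T):
    """Exploration budget f(T) = log(T) + 3*log(log(T)); requires T > 1."""
    return math.log(T) + 3.0 * math.log(math.log(T))


def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence d(p, q), clipped away from 0 and 1 (assumed form of d_a)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def d_plus(p, q):
    """Assumed convention: d^+(p, q) = d(p, q) if p < q, else 0."""
    return kl_bernoulli(p, q) if p < q else 0.0


def kl_ucb_index(mean, n, budget, iters=60):
    """Approximate the largest q in [mean, 1] with n * d(mean, q) <= budget by bisection."""
    if n == 0:
        return 1.0
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if n * kl_bernoulli(mean, mid) <= budget:
            lo = mid  # mid is still feasible, move the lower end up
        else:
            hi = mid
    return lo  # lo always satisfies the constraint


# Illustrative numbers (hypothetical): sample mean 0.3 after 50 pulls, optimal mean 0.5.
T, n, mean, mu_star = 1000, 50, 0.3, 0.5
u = kl_ucb_index(mean, n, f(T))

# If the index is at least mu_star, the chain n*d^+(mean, mu_star) <= n*d(mean, u) <= f(T) holds.
if u >= mu_star:
    lhs = n * d_plus(mean, mu_star)
    mid_term = n * kl_bernoulli(mean, u)
    print(lhs <= mid_term <= f(T))
```

The first inequality relies on $q \mapsto d(p, q)$ being increasing for $q \ge p$, and the second on the index satisfying the budget constraint by construction.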
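Using the inequalities above, we get
\begin{align*}
&\mathbb{E}\left[ \sum_{t=1}^{N_x(T+1)} \mathbf{1}\left\{ \tilde{L}_x(t) = a_x^*,\ \mu(x, a_x^*) \le \tilde{u}_{x,a_x^*}(t),\ \tilde{a}_x(t) = a \right\} \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{N_x(T+1)} \mathbf{1}\left\{ \tilde{N}_{x,a}(t)\, d_a^+(\tilde{\mu}_{x,a}(t), \mu(x, a_x^*)) \le f(T),\ \tilde{M}_{x,a}(t)\, d_a^+(\tilde{\eta}_{x,a}(t), \mu(x, a_x^*)) \le f(T),\ \tilde{a}_x(t) = a \right\} \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{N_x(T+1)} \sum_{s=1}^{t} \mathbf{1}\left\{ \tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a,\ (s-1)\, d_a^+(\tilde{\mu}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T),\ \tilde{M}_{x,a}^{s}\, d_a^+(\tilde{\eta}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T) \right\} \right] \\
&\le \mathbb{E}\left[ \sum_{t=1}^{T} \sum_{s=1}^{t} \mathbf{1}\left\{ \tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a \right\} \mathbf{1}\left\{ (s-1)\, d_a^+(\tilde{\mu}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T),\ \tilde{M}_{x,a}^{s}\, d_a^+(\tilde{\eta}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T) \right\} \right].
\end{align*}
By the change of variables, the sum above can be rewritten and bounded as
\begin{align*}
&\mathbb{E}\left[ \sum_{s=1}^{T} \sum_{t=s}^{T} \mathbf{1}\left\{ \tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a \right\} \mathbf{1}\left\{ (s-1)\, d_a^+(\tilde{\mu}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T),\ \tilde{M}_{x,a}^{s}\, d_a^+(\tilde{\eta}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T) \right\} \right] \\
&\le \mathbb{E}\left[ \sum_{s=1}^{T} \mathbf{1}\left\{ (s-1)\, d_a^+(\tilde{\mu}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T),\ \tilde{M}_{x,a}^{s}\, d_a^+(\tilde{\eta}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T) \right\} \sum_{t=s}^{T} \mathbf{1}\left\{ \tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a \right\} \right] \\
&\le \sum_{s=1}^{T} \mathbb{P}\left( (s-1)\, d_a^+(\tilde{\mu}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T),\ \tilde{M}_{x,a}^{s}\, d_a^+(\tilde{\eta}_{x,a}^{s}, \mu(x, a_x^*)) \le f(T) \right),
\end{align*}
where the last inequality holds since, for $s \in \{1, \ldots, T\}$, $\sum_{t=s}^{T} \mathbf{1}\{\tilde{N}_{x,a}(t) = (s-1),\ \tilde{a}_x(t) = a\} \le 1$.

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant 116E229.
The authors are with the Department of Electrical and Electronics Engineering, Bilkent University, Ankara 06800 Turkey (e-mail: qureshi@ee.bilkent.edu.tr; cemtekin@ee.bilkent.edu.tr).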
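The final step of the proof of Lemma 1 rests on two elementary facts: swapping the order of a triangular double sum, and the observation that the event "arm $a$ has been selected exactly $s-1$ times before round $t$ and is selected in round $t$" can occur in at most one round. The sketch below checks both by brute force on a random selection sequence. It assumes that the counter is incremented each time arm $a$ is selected, which is our reading of $\tilde{N}_{x,a}(t)$; the helper names are hypothetical.

```python
import random


def pulls_before(selections, arm):
    """N[t-1] = number of times `arm` was selected strictly before round t (t = 1..T)."""
    counts, n = [], 0
    for chosen in selections:
        counts.append(n)
        n += int(chosen == arm)
    return counts


def inner_sum(selections, arm, s):
    """sum_{t=s}^{T} 1{N(t) = s-1, a(t) = arm}, evaluated by brute force."""
    N = pulls_before(selections, arm)
    T = len(selections)
    return sum(
        1
        for t in range(s, T + 1)
        if N[t - 1] == s - 1 and selections[t - 1] == arm
    )


def sum_t_outer(g, T):
    """sum_{t=1}^{T} sum_{s=1}^{t} g(s, t)."""
    return sum(g(s, t) for t in range(1, T + 1) for s in range(1, t + 1))


def sum_s_outer(g, T):
    """sum_{s=1}^{T} sum_{t=s}^{T} g(s, t)."""
    return sum(g(s, t) for s in range(1, T + 1) for t in range(s, T + 1))


rng = random.Random(0)
T = 200
selections = [rng.randrange(3) for _ in range(T)]  # three hypothetical arms: 0, 1, 2
arm = 1

inner = [inner_sum(selections, arm, s) for s in range(1, T + 1)]
print(max(inner) <= 1)  # each inner sum is 0 or 1

g = lambda s, t: s * t + t  # arbitrary integer-valued summand
print(sum_t_outer(g, T) == sum_s_outer(g, T))  # both orders cover the pairs 1 <= s <= t <= T
```

Both sums range over the same index set $\{(s, t) : 1 \le s \le t \le T\}$, which is why the change of variables preserves the value.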
II. PROOF OF LEMMA 2
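Let $A = \{\mu(x, a) \le \mu(\tilde{y}_{x,a}(t), a)\}$. We have
\begin{align*}
\mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) &= \mathbb{P}(\mu(x, a) > \min\{\tilde{u}_{x,a}(t), \tilde{w}_{x,a}(t)\}) \\
&\le \mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) + \mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t)) \\
&= \mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) \\
&\quad + \mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A)\, \mathbb{P}(A) + \mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A^c)\, \mathbb{P}(A^c) \\
&\le \mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) + \mathbb{P}(\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t)) + \mathbb{P}(A^c) \\
&\le 2e \lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta) + \mathbb{P}(A^c),
\end{align*}
where the last inequality follows from Theorem 10 in [2]. Next, we bound the probability of event $A^c$ as
\[
\mathbb{P}(A^c) \le 2Xe \lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta).
\]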
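The passage from the conditional decomposition to the line that follows it in the chain above is not spelled out. One way to justify it, offered here as our reading rather than as the authors' stated argument, is that on the event $A$ we have $\mu(x, a) \le \mu(\tilde{y}_{x,a}(t), a)$, so that
\begin{align*}
\mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A)\, \mathbb{P}(A)
&= \mathbb{P}(\{\mu(x, a) > \tilde{w}_{x,a}(t)\} \cap A) \\
&\le \mathbb{P}(\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t)), \\
\mathbb{P}(\mu(x, a) > \tilde{w}_{x,a}(t) \mid A^c)\, \mathbb{P}(A^c)
&\le \mathbb{P}(A^c),
\end{align*}
where the first bound uses $\{\mu(x, a) > \tilde{w}_{x,a}(t)\} \cap A \subseteq \{\mu(\tilde{y}_{x,a}(t), a) > \tilde{w}_{x,a}(t)\}$ and the second uses the fact that a conditional probability is at most one.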
Since the KL-UCB index is an upper bound on the true mean with high probability and the KL-LCB index is a lower bound on the true mean with high probability, $A^c$ implies that there exists at least one context for arm $a$ whose KL-UCB index underestimates its mean value or whose KL-LCB index overestimates its mean value, i.e., given
\begin{align*}
A_1^c &= \bigcup_{x' \in \mathcal{X}} \{\tilde{u}^{x}_{x',a}(t) < \mu(x', a)\}, \\
A_2^c &= \bigcup_{x' \in \mathcal{X}} \{\tilde{l}^{x}_{x',a}(t) > \mu(x', a)\},
\end{align*}
we have $A^c \subset A_1^c \cup A_2^c$.

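Since $q \mapsto d(p, q)$ is an increasing function for $q \ge p$, by the definition of KL-UCB in (8), if $\tilde{u}^{x}_{x',a}(t) < \mu(x', a)$, then
\[
\tilde{N}^{x}_{x',a}(t)\, d\!\left( \frac{\tilde{\mu}^{x}_{x',a}(t)}{K_a}, \frac{\mu(x', a)}{K_a} \right) > \delta.
\]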
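Similarly, since $q \mapsto d(p, q)$ is a decreasing function for $q \le p$, by the definition of KL-LCB in (10), if $\tilde{l}^{x}_{x',a}(t) > \mu(x', a)$, then
\[
\tilde{N}^{x}_{x',a}(t)\, d\!\left( \frac{\tilde{\mu}^{x}_{x',a}(t)}{K_a}, \frac{\mu(x', a)}{K_a} \right) > \delta.
\]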
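Therefore,
\[
A_1^c \cup A_2^c \subseteq \bigcup_{x' \in \mathcal{X}} \left\{ \tilde{N}^{x}_{x',a}(t)\, d\!\left( \frac{\tilde{\mu}^{x}_{x',a}(t)}{K_a}, \frac{\mu(x', a)}{K_a} \right) > \delta \right\}.
\]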
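Again by using Theorem 10 in [2], we get
\[
\mathbb{P}(A^c) \le 2Xe \lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta)
\]
and hence,
\[
\mathbb{P}(\mu(x, a) > \tilde{u}_{x,a}(t)) \le 2(X + 1)e \lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta).
\]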
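To get a feel for the size of the final bound, the sketch below evaluates $2(X+1)e\lceil \delta \log(\tau_x(t)) \rceil \exp(-\delta)$ for a few values. The choice $\delta = \log(\tau) + 3\log(\log(\tau))$ mirrors $f(T)$ from the proof of Lemma 1 and is used here only as an illustrative assumption; the function names and the numbers ($\tau = 1000$, $X = 5$) are hypothetical.

```python
import math


def theorem10_term(delta, tau):
    """One-sided deviation term e * ceil(delta * log(tau)) * exp(-delta)."""
    return math.e * math.ceil(delta * math.log(tau)) * math.exp(-delta)


def lemma2_bound(delta, tau, num_contexts):
    """Right-hand side of the final inequality: 2 * (X + 1) * e * ceil(delta * log(tau)) * exp(-delta)."""
    return 2 * (num_contexts + 1) * theorem10_term(delta, tau)


tau, X = 1000, 5  # hypothetical values
delta = math.log(tau) + 3.0 * math.log(math.log(tau))  # assumed choice, mirroring f(T)

bound = lemma2_bound(delta, tau, X)
print(f"delta = {delta:.3f}, bound = {bound:.3e}")

# The bound shrinks quickly as delta grows, since exp(-delta) dominates the ceiling term.
for d in (5.0, 10.0, 15.0):
    print(d, lemma2_bound(d, tau, X))
```

The factor $X + 1$ collects one two-sided term for the context $x$ itself and $X$ further two-sided terms from the union over contexts used to bound the probability of $A^c$.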
REFERENCES
[1] M. A. Qureshi and C. Tekin, “Fast learning for dynamic resource allocation in AI-enabled radio networks,” to appear in IEEE Trans. Cogn. Commun. Netw., 2020.
[2] A. Garivier and O. Cappé, “The KL-UCB algorithm for bounded stochastic bandits and beyond,” in Proc. JMLR Workshop Conf., pp. 359–376, 2011.
