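Proof of Thm. 6. Fix $x_0 \in [m]$. By Asm. 2 we have $\mathbb{E}[Y(x) - Y(x_0) \mid \chi(z) = x] = \mu_x - \mu_{x_0}$.
Therefore:
\begin{align*}
\nu_z = \mathbb{E}[Y(\chi(z))] &= \sum_{x=1}^m P_{zx}\, \mathbb{E}[Y(x) \mid \chi(z) = x] \\
&= \sum_{x=1}^m P_{zx} \left( \mu_x - \mu_{x_0} + \mathbb{E}[Y(x_0) \mid \chi(z) = x] \right) \\
&= \sum_{x=1}^m P_{zx}\, \mu_x - \mu_{x_0} + \mu_{x_0} = \sum_{x=1}^m P_{zx}\, \mu_x,
\end{align*}
where the last line uses $\sum_{x=1}^m P_{zx} = 1$ and the law of total expectation, $\sum_{x=1}^m P_{zx}\, \mathbb{E}[Y(x_0) \mid \chi(z) = x] = \mu_{x_0}$.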
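The identity just derived can be checked numerically on a toy instance. The following is a minimal sketch, assuming a hypothetical 3-arm problem: the compliance matrix `P` and the treatment means `mu` are made-up values, not taken from the paper. It computes the instrument-level means as the `P`-weighted averages of the treatment means, then recovers the treatment means by solving the linear system, which is possible here because this particular `P` is invertible.

```python
def mat_vec(P, v):
    """Matrix-vector product: entry z is sum_x P[z][x] * v[x]."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Hypothetical example values (assumptions for illustration only).
P = [[0.8, 0.1, 0.1],   # row z: probabilities of each treatment x when z is pulled
     [0.2, 0.7, 0.1],
     [0.1, 0.2, 0.7]]
mu = [1.0, 0.5, 0.2]    # mean reward of each treatment

nu = mat_vec(P, mu)            # mean reward of each instrument
mu_recovered = solve(P, nu)    # inverts the relation when P is invertible
```

The round trip only works when `P` is invertible; `solve` raises an error otherwise.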
For a fixed $s$, $\hat\nu_z^{(t_z(s))}$ is simply an empirical average of $s$ iid draws of the reward when $z$ is pulled.
Similarly, for fixed $s$, $\hat P_{zx}^{(t_z(s))}$ is simply an empirical average of $s$ iid draws of the indicator of whether $x$ is applied when $z$ is pulled.
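As a concrete illustration of these two empirical averages, here is a minimal sketch. The function name `empirical_estimates` and the `(x, y)` record format are assumptions made for this example only; the paper does not prescribe a data layout.

```python
def empirical_estimates(pulls, m):
    """Return (nu_hat, p_hat) computed from the first s pulls of one fixed arm z.

    pulls: list of (x, y) pairs, where x in range(m) is the treatment that was
    actually applied and y is the observed reward (hypothetical record format).
    """
    s = len(pulls)
    if s == 0:
        raise ValueError("need at least one pull")
    # Average reward over the s pulls of z.
    nu_hat = sum(y for _, y in pulls) / s
    # Fraction of the s pulls in which each treatment k was applied.
    p_hat = [sum(1 for x, _ in pulls if x == k) / s for k in range(m)]
    return nu_hat, p_hat


# Four made-up pulls of the same arm z with m = 3 treatments.
pulls = [(0, 1.0), (1, 0.0), (0, 0.5), (2, 1.5)]
nu_hat, p_hat = empirical_estimates(pulls, m=3)
```

By construction the entries of `p_hat` sum to one, so it is a point on the probability simplex just like the row of the compliance matrix it estimates.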
Lemma 1. Fix $s_1, \dots, s_m$. Then $\mathbb{P}\big(\|\hat P^{((s_1,\dots,s_m))} - P\|_\infty > \epsilon\big) \le \sum_{z=1}^m e^{m - \frac{1}{8}\epsilon^2 s_z}$.
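Proof. Define $e_z \in \mathbb{R}^m$ as the one-hot unit vector with one in coordinate $z$. Let us define the
empirical Rademacher complexities
\[
\hat R_z^{((s_z))} = \frac{1}{2^{s_z}} \sum_{\sigma \in \{-1,+1\}^{T_z^{((s_z))}}} \; \sup_{\|v\|_\infty \le 1} \; \frac{1}{s_z} \sum_{u \in T_z^{((s_z))}} \sigma_u\, v^T e_{X_u}.
\]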
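Note that
\[
\|\hat P_z^{((s_z))} - P_z\|_1 = \sup_{\|v\|_\infty \le 1} \left( \frac{1}{s_z} \sum_{u \in T_z^{((s_z))}} v^T e_{X_u} - v^T P_z \right).
\]
Then by Bartlett and Mendelson [6], we have that with probability at least $1 - \delta$,
\[
\|\hat P_z^{((s_z))} - P_z\|_1 \le 2\,\mathbb{E}\big[\hat R_z^{((s_z))}\big] + \sqrt{\log(1/\delta)/(2 s_z)}.
\]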
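By linearity and duality of norms, we have
\begin{align*}
\hat R_z^{((s_z))} &= \frac{1}{2^{s_z}} \sum_{\sigma \in \{-1,+1\}^{s_z}} \left\| \frac{1}{s_z} \sum_{u=1}^{s_z} \sigma_u\, e_{X_u} \right\|_1 \\
&= \frac{1}{s_z} \sum_{x=1}^m \frac{1}{2^{s_z}} \sum_{\sigma \in \{-1,+1\}^{s_z}} \left| \sum_{u=1}^{s_z} \mathbb{I}[X_u = x]\, \sigma_u \right| \\
&= \frac{1}{s_z} \sum_{x=1}^m \frac{1}{2^{n_{zx}^{((s_z))}}} \sum_{\sigma \in \{-1,+1\}^{n_{zx}^{((s_z))}}} \left| \sum_{u=1}^{n_{zx}^{((s_z))}} \sigma_u \right| \\
&= \frac{1}{s_z} \sum_{x=1}^m \frac{1}{2^{n_{zx}^{((s_z))} - 1}} \left\lceil \frac{n_{zx}^{((s_z))}}{2} \right\rceil \binom{n_{zx}^{((s_z))}}{\left\lceil n_{zx}^{((s_z))}/2 \right\rceil} \\
&\le \frac{1}{s_z} \sum_{x=1}^m \sqrt{n_{zx}^{((s_z))}} = \frac{1}{\sqrt{s_z}} \sum_{x=1}^m \sqrt{\hat P_{zx}^{((s_z))}} \le \sqrt{\frac{m}{s_z}},
\end{align*}
where $n_{zx}^{((s_z))} = s_z \hat P_{zx}^{((s_z))}$ is the number of the $s_z$ pulls of $z$ in which $x$ is applied.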
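Therefore, by $2 \ge 1/\sqrt{2}$, Jensen's inequality, and the concavity of the square root, with probability at
least $1 - \delta$,
\[
\|\hat P_z^{((s_z))} - P_z\|_1 \le 4\sqrt{(m - \log\delta)/(2 s_z)}.
\]
Equivalently, setting $\delta = e^{m - \epsilon^2 s_z/8}$ makes the right-hand side equal to $\epsilon$, so the deviation exceeds $\epsilon$ with probability at most $e^{m - \epsilon^2 s_z/8}$.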
Finally, $\mathbb{P}\big(\|\hat P^{((s_1,\dots,s_m))} - P\|_\infty > \epsilon\big) = \mathbb{P}\big(\exists z : \|\hat P_z^{((s_z))} - P_z\|_1 > \epsilon\big)$ and the union bound complete
the proof.
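Two ingredients of this proof lend themselves to a quick numerical sanity check. The sketch below is illustrative only; the function names are hypothetical. It first verifies by brute-force enumeration that the sum of $|\sigma_1 + \dots + \sigma_n|$ over all sign vectors matches the closed form $2\lceil n/2\rceil\binom{n}{\lceil n/2\rceil}$ and that the corresponding average is at most $\sqrt{n}$. It then evaluates the right-hand side of Lemma 1, read here as $\sum_z e^{m - \epsilon^2 s_z/8}$, for given sample sizes.

```python
import itertools
import math


def abs_sum_total(n):
    """Sum of |sigma_1 + ... + sigma_n| over all 2**n sign vectors (brute force)."""
    return sum(abs(sum(signs)) for signs in itertools.product((-1, 1), repeat=n))


def closed_form_total(n):
    """The same sum in closed form: 2 * ceil(n/2) * C(n, ceil(n/2))."""
    k = (n + 1) // 2  # ceil(n / 2) for positive integers
    return 2 * k * math.comb(n, k)


def lemma1_bound(eps, sample_sizes):
    """Evaluate sum_z exp(m - eps**2 * s_z / 8), with m = len(sample_sizes)."""
    m = len(sample_sizes)
    return sum(math.exp(m - eps ** 2 * s / 8.0) for s in sample_sizes)


# Check the closed form and the sqrt(n) bound for small n.
for n in range(1, 11):
    total = abs_sum_total(n)
    assert total == closed_form_total(n)
    assert total / 2 ** n <= math.sqrt(n) + 1e-12

small_sample_bound = lemma1_bound(0.5, [10, 10, 10])        # exceeds 1, so vacuous
large_sample_bound = lemma1_bound(0.5, [2000, 2000, 2000])  # far below 1
```

The bound is only informative once every $s_z$ is large relative to $8m/\epsilon^2$; for small sample sizes the right-hand side exceeds one and says nothing.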
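Proof. We have
\[
\big\|(\hat P^{((s_1,\dots,s_m))})^{-1}\big\|_\infty \le \sqrt{m}\, \big\|(\hat P^{((s_1,\dots,s_m))})^{-1}\big\|_2 = \frac{\sqrt{m}}{\sigma_{\min}(\hat P^{((s_1,\dots,s_m))})} \le \frac{\sqrt{m}}{\big(\sigma_{\min}(P) - \|\hat P^{((s_1,\dots,s_m))} - P\|_2\big)_+} \le \frac{1}{\big(\sigma_{\min}(P)/\sqrt{m} - \|\hat P^{((s_1,\dots,s_m))} - P\|_\infty\big)_+}.
\]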
p
Applying Lemma 1 with = / m yields the result.
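The chain of inequalities in this proof can be illustrated on a small example. The sketch below is a hedged check, not part of the paper: it assumes the matrix norm is the induced infinity norm (maximum absolute row sum), restricts to $m = 2$ so that the smallest singular value has a closed form, and uses made-up matrices `P` and `P_hat`.

```python
import math


def inf_norm(A):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def sigma_min_2x2(A):
    """Smallest singular value of a 2x2 matrix via the eigenvalues of A^T A."""
    (a, b), (c, d) = A
    trace = a * a + b * b + c * c + d * d   # trace of A^T A
    det_sq = (a * d - b * c) ** 2           # determinant of A^T A
    disc = max(trace * trace - 4.0 * det_sq, 0.0)
    return math.sqrt(max((trace - math.sqrt(disc)) / 2.0, 0.0))


def inverse_2x2(A):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical true and estimated compliance matrices (illustration only).
P = [[0.9, 0.1], [0.2, 0.8]]
P_hat = [[0.85, 0.15], [0.25, 0.75]]
m = 2

diff = [[ph - p for ph, p in zip(row_hat, row)] for row_hat, row in zip(P_hat, P)]
margin = sigma_min_2x2(P) / math.sqrt(m) - inf_norm(diff)
lhs = inf_norm(inverse_2x2(P_hat))                # norm of the estimated inverse
rhs = 1.0 / margin if margin > 0 else math.inf    # upper bound from the chain above
```

When the estimation error is large enough that `margin` is not positive, the positive-part denominator is zero and the bound is vacuous, which is why the argument needs the error to be small relative to the smallest singular value scaled by $1/\sqrt{m}$.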
Lemma 3. Fix s and let Asm. 4 hold. Then,
((s)) s2 /(4 )
P(| z Pz((s)) | > ) 2e .
p
Proof. Applying Lemma 2 with = 2/(2 + 2 ), applying Lemma 3 with (1 f) 1
m 1/2
for each z, the union bound twice, and 2 em yield the result.
Pt
Proof of Thm. 7. Let E (t) = s=1 s , B = maxx=1,...,m (
x ), and = minx62X (
(t)
x ). Let t be one if at time step t we explored (random pull) and zero otherwise. Let nz =
P (t)
s2Tz
(t) s nz . Then,
Pm P1 (t 1) (t 1)
E CRegret(T ) p x=1 ( x )E (T ) /m + B t=1 P(k((n1 ,...,nm )) k1 /2).
Note
Pm that E
(t 1)
m log(t)/ and E (t) m(log(t) + 1)/pso that the first regret term is
x=1 ( x )(log(T ) + 1)/. Let = min2
(P ) 2 /(8m( 2 )2 ) > 0 and = 1 e > 0.
By Lemma 4, because t is independent of the data, and by the law of total probability, we have that
(t) (t) Pm (t)
P(k((n1 ,...,nm )) k1 /2) 2em z=1 E[e nz ]. Moreover,
(t) Pt
log E[e nz ] = s=1 log(e t /m + (1 t /m))
Pt
s=1 t /m = E (t) /m.
P1 (t 1) P 1
Finally, t=1 e E /m
t=1 t / and by assumption 0 < min( , 1)/2 < so that
the second regret term is a finite constant.
(t 1)
Proof of Thm. 8. Let B = maxx=1,...,m ( x ), = minx62X ( x ), tz = I[n (t) <
z0
(t) (t) Pt Pm
log(t)/, z = z0 ], nz = s=1 tz , t = z=1 tz , and s(t) = dlog(t)/e. We then have
Pm (T ) P1 (t) (t)
E CRegret(T ) p x=1 ( x )E nx + B t=1 P(k((s ,...,s )) k1 /2).
(T )
For the first regret term, we have nx 1 + log(T )/ under by construction and so the same is
(T )
true for E nx . For the second term, we invoke Lemma 4 and use s(t) log t/ in order to get
2 ) 2
(t) min (P
,...,s(t) )) p
P(k((s k1 /2) 2em mt 8m( + 2 )2 .
2
min (P ) 2
By assumption, we have that 8m( + 2 )2
p > 1 so that the second regret term is a finite constant.