
Supplementary Material for Instrument-Armed Bandits

Proofs of IAB Regret Characterizations


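Proof of Thm. 1. Set
\[
p_{\min} = \min_{j=1,\dots,\ell} p_{\chi_j},
\qquad
\Delta = \min_{j=1,\dots,\ell}\ \min_{z \notin \chi_j^{-1}(X_j^*)} \mathbb{E}\big[Y(x_j^*) - Y(\chi_j(z)) \mid \chi = \chi_j\big].
\]
Then, by independence of $Z_t$, the union bound, and De Morgan's law,
\begin{align*}
\mathbb{E}\,\mathrm{LCTRegret}
&\ge \Delta\, \mathbb{E} \sum_{t=1}^{T} \sum_{j=1}^{\ell} \mathbb{I}\big[\chi_t = \chi_j,\ Z_t \notin \chi_j^{-1}(X_j^*)\big] \\
&= \Delta \sum_{t=1}^{T} \sum_{j=1}^{\ell} p_{\chi_j}\, \mathbb{P}\big(Z_t \notin \chi_j^{-1}(X_j^*)\big) \\
&\ge \Delta\, p_{\min} \sum_{t=1}^{T} \sum_{j=1}^{\ell} \mathbb{P}\big(Z_t \notin \chi_j^{-1}(X_j^*)\big) \\
&\ge \Delta\, p_{\min} \sum_{t=1}^{T} \mathbb{P}\Big(\bigvee_{j=1}^{\ell} Z_t \notin \chi_j^{-1}(X_j^*)\Big) \\
&= \Delta\, p_{\min} \sum_{t=1}^{T} \big(1 - \mathbb{P}(Z_t \in \varnothing)\big) \\
&= \Delta\, p_{\min}\, T = \Omega(T).
\end{align*}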

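Proof of Thm. 3. Define $\Delta' = \min_{z \in X^*} \mathbb{E}[Y(\chi(z^*)) - Y(\chi(z))]$, $\Delta'' = \min_{z \notin X^*} \mathbb{E}[Y(x^*) - Y(z) \mid \chi = \iota]$, $\Delta = \min\{\Delta', \Delta''\}$, and $T_{X^*} = \sum_{t=1}^{T} \mathbb{I}[Z_t \in X^*]$. By assumption, we have $\Delta > 0$. By independence of $Z_t$, we have
\begin{align*}
\mathbb{E}\,\mathrm{ITTRegret}
&\ge \mathbb{E} \sum_{t=1}^{T} \mathbb{I}[Z_t \in X^*]\,\big(Y_t(\chi_t(z^*)) - Y_t\big) \\
&\ge \Delta'\, \mathbb{E}\,T_{X^*} \ge \Delta\, \mathbb{E}\,T_{X^*}.
\end{align*}
By independence of $Z_t$, we also have
\begin{align*}
\mathbb{E}\,\mathrm{CRegret}
&\ge \mathbb{E} \sum_{t=1}^{T} \mathbb{I}[\chi_t = \iota,\ Z_t \notin X^*]\,\big(Y_t(x^*) - Y_t\big) \\
&\ge \Delta''\, \mathbb{E} \sum_{t=1}^{T} \mathbb{I}[\chi_t = \iota,\ Z_t \notin X^*] \\
&\ge \Delta\, \mathbb{E} \sum_{t=1}^{T} \big(\mathbb{I}[\chi_t = \iota] - \mathbb{I}[Z_t \in X^*]\big) \\
&= \Delta\,\big(p_\iota T - \mathbb{E}\,T_{X^*}\big).
\end{align*}
Together, $\mathbb{E}\,\mathrm{ITTRegret} + \mathbb{E}\,\mathrm{CRegret} \ge \Delta\, p_\iota T = \Omega(T)$.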

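Proof. By the assumption $\chi(2) \ge \chi(1)$, we must have $\chi \in \{\iota, \chi^{(1)}, \chi^{(2)}\}$. Therefore,
\begin{align*}
\mathbb{E}[Y(\chi(z))] &= \mathbb{E}\big[\mathbb{E}[Y(\chi(z)) \mid \chi]\big] \\
&= p_\iota\, \mathbb{E}[Y(z) \mid \chi = \iota] + p_{\chi^{(1)}}\, \mathbb{E}[Y(1) \mid \chi = \chi^{(1)}] + p_{\chi^{(2)}}\, \mathbb{E}[Y(2) \mid \chi = \chi^{(2)}].
\end{align*}
Only the first term depends on $z$, so maximizing both sides over $z \in [m]$, we see that $Z^* = X^*$. Therefore,
\begin{align*}
\mathrm{ITTRegret}
&= \sum_{t=1}^{T} \mathbb{I}[\chi_t = \iota]\,\big(Y_t(\chi_t(z^*)) - Y_t\big) \\
&\quad + \sum_{t=1}^{T} \mathbb{I}[\chi_t = \chi^{(1)}]\,\big(Y_t(\chi_t(z^*)) - Y_t\big) \\
&\quad + \sum_{t=1}^{T} \mathbb{I}[\chi_t = \chi^{(2)}]\,\big(Y_t(\chi_t(z^*)) - Y_t\big) \\
&= \mathrm{CRegret}
 + \sum_{t=1}^{T} \mathbb{I}[\chi_t = \chi^{(1)}]\,\big(Y_t(1) - Y_t(1)\big)
 + \sum_{t=1}^{T} \mathbb{I}[\chi_t = \chi^{(2)}]\,\big(Y_t(2) - Y_t(2)\big) \\
&= \mathrm{CRegret}.
\end{align*}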

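Proof of Thm. 6. Fix $x_0 \in [m]$. By Asm. 2 we have $\mathbb{E}[Y(x) - Y(x_0) \mid \chi(z) = x] = \mu_x - \mu_{x_0}$. Therefore:
\begin{align*}
\nu_z = \mathbb{E}[Y(\chi(z))]
&= \sum_{x=1}^{m} P_{zx}\, \mathbb{E}[Y(x) \mid \chi(z) = x] \\
&= \sum_{x=1}^{m} P_{zx}\,\big(\mu_x - \mu_{x_0} + \mathbb{E}[Y(x_0) \mid \chi(z) = x]\big) \\
&= \sum_{x=1}^{m} P_{zx}\, \mu_x - \mu_{x_0} + \mu_{x_0}
 = \sum_{x=1}^{m} P_{zx}\, \mu_x.
\end{align*}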
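The identity just derived says that the vector of intent-to-treat means is the compliance matrix applied to the vector of treatment means, so the treatment means can be recovered by inverting that matrix whenever it is nonsingular. The following is a minimal sketch of this for $m = 2$. The matrix `P`, the vector `mu`, and the helper names `itt_means` and `solve_2x2` are hypothetical values and names chosen for illustration only; they do not come from the paper.

```python
from fractions import Fraction as F


def itt_means(P, mu):
    """Return nu with nu[z] = sum_x P[z][x] * mu[x]."""
    return [sum(p * m for p, m in zip(row, mu)) for row in P]


def solve_2x2(P, nu):
    """Solve P mu = nu for a 2x2 matrix P using Cramer's rule."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("P is singular, so mu is not identified")
    return [(nu[0] * d - b * nu[1]) / det, (a * nu[1] - nu[0] * c) / det]


# Hypothetical compliance matrix: row z holds P(chi(z) = x) for x = 1, 2.
P = [[F(9, 10), F(1, 10)], [F(3, 10), F(7, 10)]]
mu = [F(1), F(2)]                 # hypothetical treatment means
nu = itt_means(P, mu)             # intent-to-treat means
mu_recovered = solve_2x2(P, nu)   # inverts the linear relation
print(nu, mu_recovered)
```

Exact rational arithmetic is used so that the round trip from `mu` to `nu` and back can be compared with equality rather than a tolerance.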

Proofs of Regret Bounds


Let us define $t_z^{(s)} = \inf\{t : n_z^{(t)} \ge s\}$ as the time when $z$ was pulled for the $s$th time. Define $n_{zx}^{((s))} = n_{zx}^{(t_z^{(s)})}$, $T_z^{((s))} = T_z^{(t_z^{(s)})}$, and $\hat P_z^{((s))} = \hat P_z^{(t_z^{(s)})}$. Then define $\hat P^{((s_1,\dots,s_m))}$ as the matrix whose $z$th row is $\hat P_z^{((s_z))}$, i.e., its $(z, x)$ entry is $\hat P_{zx}^{(t_z^{(s_z)})}$. Similarly, define $\hat\nu^{((s_1,\dots,s_m))}$ as the vector whose $z$th entry is $\hat\nu_z^{((s_z))} = \hat\nu_z^{(t_z^{(s_z)})}$. Finally, define $\hat\mu^{((s_1,\dots,s_m))} = (\hat P^{((s_1,\dots,s_m))})^{-1}\, \hat\nu^{((s_1,\dots,s_m))}$.

Note that, for a fixed $s$, $\hat\nu_z^{(t_z^{(s)})}$ is simply an empirical average of $s$ iid draws of the reward when $z$ is pulled. Similarly, for fixed $s$, $\hat P_{zx}^{(t_z^{(s)})}$ is simply an empirical average of $s$ iid draws of the indicator whether $x$ is applied when $z$ is pulled.
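Lemma 1. Fix $s_1, \dots, s_m$. Then
\[
\mathbb{P}\big(\|\hat P^{((s_1,\dots,s_m))} - P\|_\infty > \epsilon\big) \le \sum_{z=1}^{m} e^{\,m - \frac{1}{8} s_z \epsilon^2}.
\]

Proof. Define $e_z \in \mathbb{R}^m$ as the one-hot unit vector with one in coordinate $z$. Let us define the empirical Rademacher complexities
\[
\hat R_z^{((s_z))} = \frac{1}{2^{s_z}} \sum_{\xi \in \{-1,+1\}^{T_z^{((s_z))}}}\ \sup_{\|v\|_\infty \le 1}\ \frac{1}{s_z} \sum_{u \in T_z^{((s_z))}} \xi_u\, v^T e_{X_u}.
\]
Note that
\[
\|\hat P_z^{((s_z))} - P_z\|_1 = \sup_{\|v\|_\infty \le 1} \Big( \frac{1}{s_z} \sum_{u \in T_z^{((s_z))}} v^T e_{X_u} - v^T P_z \Big).
\]
Then by Bartlett and Mendelson [6], we have that with probability at least $1 - \delta$,
\[
\|\hat P_z^{((s_z))} - P_z\|_1 \le 2\,\mathbb{E}\big[\hat R_z^{((s_z))}\big] + \sqrt{\log(1/\delta)/(2 s_z)}.
\]
By linearity and duality of norms, writing $n_{zx} = n_{zx}^{((s_z))}$ for brevity, we have
\begin{align*}
\hat R_z^{((s_z))}
&= \frac{1}{2^{s_z}} \sum_{\xi \in \{-1,+1\}^{s_z}} \Big\| \frac{1}{s_z} \sum_{u=1}^{s_z} \xi_u\, e_{X_u} \Big\|_1 \\
&= \frac{1}{s_z} \sum_{x=1}^{m} \frac{1}{2^{s_z}} \sum_{\xi \in \{-1,+1\}^{s_z}} \Big| \sum_{u=1}^{s_z} \mathbb{I}[X_u = x]\, \xi_u \Big| \\
&= \frac{1}{s_z} \sum_{x=1}^{m} \frac{1}{2^{n_{zx}}} \sum_{\xi \in \{-1,+1\}^{n_{zx}}} \Big| \sum_{u=1}^{n_{zx}} \xi_u \Big| \\
&= \frac{1}{s_z} \sum_{x=1}^{m} \frac{1}{2^{n_{zx} - 1}} \Big\lceil \frac{n_{zx}}{2} \Big\rceil \binom{n_{zx}}{\lceil n_{zx}/2 \rceil} \\
&\le \frac{1}{s_z} \sum_{x=1}^{m} \sqrt{n_{zx}}
 = \frac{1}{\sqrt{s_z}} \sum_{x=1}^{m} \sqrt{\hat P_{zx}^{((s_z))}}
 \le \sqrt{\frac{m}{s_z}}.
\end{align*}
Therefore, by $2 \ge 1/\sqrt{2}$, Jensen's inequality, and the concavity of the square root, with probability at least $1 - \delta$,
\[
\|\hat P_z^{((s_z))} - P_z\|_1 \le 4 \sqrt{(m - \log\delta)/(2 s_z)}.
\]
Finally, $\mathbb{P}(\|\hat P^{((s_1,\dots,s_m))} - P\|_\infty > \epsilon) = \mathbb{P}(\exists z : \|\hat P_z^{((s_z))} - P_z\|_1 > \epsilon)$ and the union bound complete the proof.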
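The Rademacher computation in the proof of Lemma 1 relies on a closed form for the average of $|\xi_1 + \dots + \xi_n|$ over all sign vectors and on the fact that this average is at most $\sqrt{n}$. The following is a small brute-force check of both facts for small $n$. The function names are hypothetical, and the closed form tested is the one stated in the code, $\lceil n/2 \rceil \binom{n}{\lceil n/2 \rceil} / 2^{n-1}$.

```python
from fractions import Fraction
from itertools import product
from math import comb, sqrt


def mean_abs_sum_bruteforce(n):
    """Average of |xi_1 + ... + xi_n| over all 2^n sign vectors."""
    total = sum(abs(sum(signs)) for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)


def mean_abs_sum_closed_form(n):
    """ceil(n/2) * C(n, ceil(n/2)) / 2^(n-1), in exact arithmetic."""
    k = (n + 1) // 2  # integer ceiling of n/2
    return Fraction(2 * k * comb(n, k), 2 ** n)


# Compare both expressions, and the sqrt(n) upper bound, for n = 0..12.
checks = [
    (n, mean_abs_sum_bruteforce(n), mean_abs_sum_closed_form(n))
    for n in range(13)
]
for n, brute, closed in checks:
    print(n, brute, closed, float(brute) <= sqrt(n))
```

The enumeration is exponential in `n`, so this is only a sanity check on small cases and not a substitute for the argument in the proof.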

The Rademacher complexity argument (compared to an argument based on Hoeffding's inequality and the union bound) is critical to achieving the correct rate in $s_z/m$.
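Lemma 2. Fix $s_1, \dots, s_m$ and let $\eta \in (0, 1)$ and $0 < \sigma \le \sigma_{\min}(P)$. Then
\[
\mathbb{P}\big(\|(\hat P^{((s_1,\dots,s_m))})^{-1}\|_\infty > \sqrt{m}\,\sigma^{-1}(1 - \eta)^{-1}\big) \le \sum_{z=1}^{m} e^{\,m - \frac{1}{8} s_z \eta^2 \sigma^2 / m}.
\]

Proof. We have
\begin{align*}
\|(\hat P^{((s_1,\dots,s_m))})^{-1}\|_\infty
&\le \sqrt{m}\,\|(\hat P^{((s_1,\dots,s_m))})^{-1}\|_2
 = \frac{\sqrt{m}}{\sigma_{\min}(\hat P^{((s_1,\dots,s_m))})} \\
&\le \frac{\sqrt{m}}{\big(\sigma - \|\hat P^{((s_1,\dots,s_m))} - P\|_2\big)_+}
 \le \frac{1}{\big(\sigma/\sqrt{m} - \|\hat P^{((s_1,\dots,s_m))} - P\|_\infty\big)_+}.
\end{align*}
Applying Lemma 1 with $\epsilon = \eta\sigma/\sqrt{m}$ yields the result.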
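Lemma 3. Fix $s$ and let Asm. 4 hold. Then,
\[
\mathbb{P}\big(|\hat\nu_z^{((s))} - \hat P_z^{((s))}\mu| > \epsilon\big) \le 2 e^{-s\epsilon^2/(4\psi)}.
\]

Proof. Note that we can rewrite
\[
\hat\nu_z^{((s))} - \hat P_z^{((s))}\mu = \frac{1}{s} \sum_{u \in T_z^{((s))}} \big(Y_u(X_u) - \mu_{X_u}\big).
\]
By Markov's inequality,
\begin{align*}
\mathbb{P}\big(\hat\nu_z^{((s))} - \hat P_z^{((s))}\mu > \epsilon\big)
&\le \inf_{\lambda \ge 0}\ e^{-\lambda s \epsilon} \prod_{u \in T_z^{((s))}} \mathbb{E}\big[e^{\lambda (Y_u(X_u) - \mu_{X_u})}\big] \\
&\le \inf_{\lambda \ge 0}\ e^{s(\psi\lambda^2 - \lambda\epsilon)}
 = e^{-s\epsilon^2/(4\psi)}.
\end{align*}
A symmetric argument and the union bound yield the result.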
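The last step of the proof of Lemma 3 minimizes a quadratic exponent over the Chernoff parameter. The following is a minimal numerical sketch of that step, assuming the exponent has the form $s(\psi\lambda^2 - \lambda\epsilon)$, where $\psi$ stands for the constant supplied by Asm. 4. The values of `s`, `eps`, and `psi` and the function names are hypothetical and chosen only for illustration.

```python
def chernoff_exponent(lam, s, eps, psi):
    """Exponent s * (psi * lam^2 - lam * eps) of the Chernoff bound."""
    return s * (psi * lam ** 2 - lam * eps)


def optimal_lambda(eps, psi):
    """Minimizer of the quadratic exponent over lam >= 0."""
    return eps / (2 * psi)


# Hypothetical values for the number of pulls, threshold, and constant.
s, eps, psi = 50, 0.3, 1.5

best = chernoff_exponent(optimal_lambda(eps, psi), s, eps, psi)
closed_form = -s * eps ** 2 / (4 * psi)

# A coarse grid search over lam in [0, 2] should not beat the minimizer.
grid_min = min(chernoff_exponent(k / 1000, s, eps, psi) for k in range(2001))
print(best, closed_form, grid_min)
```

The grid search is only a cross-check; the minimizer itself follows from setting the derivative of the quadratic to zero.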
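Lemma 4. Fix $s_1, \dots, s_m$. Let $0 < \sigma \le \sigma_{\min}(P)$ and Asm. 4 hold. Then,
\[
\mathbb{P}\big(\|\hat\mu^{((s_1,\dots,s_m))} - \mu\|_\infty > \epsilon\big) \le 2 \sum_{z=1}^{m} e^{\,m - \frac{s_z \sigma^2 \epsilon^2}{2m(2\epsilon + \sqrt{2\psi})^2}}.
\]

Proof. Applying Lemma 2 with $\eta = 2\epsilon/(2\epsilon + \sqrt{2\psi})$, applying Lemma 3 with threshold $\epsilon(1 - \eta)\sigma m^{-1/2}$ for each $z$, the union bound twice, and $2 \le e^m$ yield the result.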
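The choice of the Lemma 2 parameter in the proof of Lemma 4 can be read as balancing two exponents: the rate from Lemma 2 and the rate from Lemma 3. The following sketch checks that balance numerically, assuming the rates per pull are $\eta^2\sigma^2/(8m)$ and $(1-\eta)^2\sigma^2\epsilon^2/(4\psi m)$ with $\eta = 2\epsilon/(2\epsilon + \sqrt{2\psi})$. All numerical values and function names are hypothetical.

```python
from math import sqrt


def balanced_eta(eps, psi):
    """Value eta = 2*eps / (2*eps + sqrt(2*psi)) assumed in this sketch."""
    return 2 * eps / (2 * eps + sqrt(2 * psi))


def lemma2_rate(eta, sigma, m):
    """Per-pull exponent eta^2 sigma^2 / (8 m)."""
    return eta ** 2 * sigma ** 2 / (8 * m)


def lemma3_rate(eta, sigma, m, eps, psi):
    """Per-pull exponent from threshold eps * (1 - eta) * sigma / sqrt(m)."""
    threshold = eps * (1 - eta) * sigma / sqrt(m)
    return threshold ** 2 / (4 * psi)


def lemma4_rate(sigma, m, eps, psi):
    """Per-pull exponent sigma^2 eps^2 / (2 m (2 eps + sqrt(2 psi))^2)."""
    return sigma ** 2 * eps ** 2 / (2 * m * (2 * eps + sqrt(2 * psi)) ** 2)


# Hypothetical problem constants.
sigma, m, eps, psi = 0.4, 3, 0.25, 1.2
eta = balanced_eta(eps, psi)
rates = (
    lemma2_rate(eta, sigma, m),
    lemma3_rate(eta, sigma, m, eps, psi),
    lemma4_rate(sigma, m, eps, psi),
)
print(eta, rates)
```

All three printed rates should agree up to floating-point rounding, which is what makes the two tail bounds combine into a single exponential term.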
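Proof of Thm. 7. Let $\mathcal{E}^{(t)} = \sum_{s=1}^{t} \epsilon_s$, $B = \max_{x=1,\dots,m} (\mu^* - \mu_x)$, and $\Delta = \min_{x \notin X^*} (\mu^* - \mu_x)$. Let $\zeta_t$ be one if at time step $t$ we explored (random pull) and zero otherwise. Let $\tilde n_z^{(t)} = \sum_{s \in T_z^{(t)}} \zeta_s \le n_z^{(t)}$. Then,
\[
\mathbb{E}\,\mathrm{CRegret}(T) \le p_\iota \sum_{x=1}^{m} (\mu^* - \mu_x)\, \mathcal{E}^{(T)}/m + B \sum_{t=1}^{\infty} \mathbb{P}\big(\|\hat\mu^{((n_1^{(t-1)},\dots,n_m^{(t-1)}))} - \mu\|_\infty \ge \Delta/2\big).
\]
Note that $\mathcal{E}^{(t-1)} \ge m\log(t)/\kappa$ and $\mathcal{E}^{(t)} \le m(\log(t) + 1)/\kappa$ so that the first regret term is at most $p_\iota \sum_{x=1}^{m} (\mu^* - \mu_x)(\log(T) + 1)/\kappa$. Let $\gamma = \sigma_{\min}^2(P)\,\Delta^2 / (8m(\Delta + \sqrt{2\psi})^2) > 0$ and $\theta = 1 - e^{-\gamma} > 0$. By Lemma 4, because $\zeta_t$ is independent of the data, and by the law of total probability, we have that
\[
\mathbb{P}\big(\|\hat\mu^{((n_1^{(t)},\dots,n_m^{(t)}))} - \mu\|_\infty \ge \Delta/2\big) \le 2 e^m \sum_{z=1}^{m} \mathbb{E}\big[e^{-\gamma \tilde n_z^{(t)}}\big].
\]
Moreover,
\begin{align*}
\log \mathbb{E}\big[e^{-\gamma \tilde n_z^{(t)}}\big]
&= \sum_{s=1}^{t} \log\big(e^{-\gamma}\epsilon_s/m + (1 - \epsilon_s/m)\big) \\
&\le -\theta \sum_{s=1}^{t} \epsilon_s/m = -\theta\, \mathcal{E}^{(t)}/m.
\end{align*}
Finally, $\sum_{t=1}^{\infty} e^{-\theta \mathcal{E}^{(t-1)}/m} \le \sum_{t=1}^{\infty} t^{-\theta/\kappa}$ and by assumption $0 < \kappa \le \min(\gamma, 1)/2 < \theta$ so that the second regret term is a finite constant.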
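Two elementary facts underlie the last steps of the proof of Thm. 7: the quantity $1 - e^{-\gamma}$ strictly exceeds $\min(\gamma, 1)/2$ for every $\gamma > 0$, and partial sums of the harmonic series are sandwiched between $\log t$ and $\log t + 1$. The following sketch checks both on a grid. It is only an illustration: the harmonic check is relevant under the assumption, not stated in this section, that the exploration probabilities decay proportionally to $1/s$, and all names are hypothetical.

```python
from math import exp, log


def theta(gamma):
    """theta = 1 - exp(-gamma)."""
    return 1.0 - exp(-gamma)


def harmonic(t):
    """Partial sum 1 + 1/2 + ... + 1/t, with harmonic(0) = 0."""
    return sum(1.0 / s for s in range(1, t + 1))


# Check theta(gamma) > min(gamma, 1) / 2 on a grid of gamma in (0, 5].
gammas = [0.01 * k for k in range(1, 501)]
theta_ok = all(theta(g) > min(g, 1.0) / 2 for g in gammas)

# Check harmonic(t - 1) >= log(t) and harmonic(t) <= log(t) + 1.
harmonic_ok = all(
    harmonic(t - 1) >= log(t) and harmonic(t) <= log(t) + 1
    for t in range(1, 200)
)

# Check log(1 - theta * x) <= -theta * x for x in [0, 1].
log_ok = all(
    log(1 - theta(g) * (j / 10)) <= -theta(g) * (j / 10) + 1e-15
    for g in gammas
    for j in range(11)
)
print(theta_ok, harmonic_ok, log_ok)
```

The first check is what guarantees that the exponent of $t$ in the final series is strictly below $-1$, so that the series converges.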

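Proof of Thm. 8. Let $B = \max_{x=1,\dots,m} (\mu^* - \mu_x)$, $\Delta = \min_{x \notin X^*} (\mu^* - \mu_x)$, $\zeta_{tz} = \mathbb{I}[n^{(t-1)}_{z'^{(t)}} < \log(t)/\kappa,\ z = z'^{(t)}]$, $\tilde n_z^{(t)} = \sum_{s=1}^{t} \zeta_{sz}$, $\zeta_t = \sum_{z=1}^{m} \zeta_{tz}$, and $s^{(t)} = \lceil \log(t)/\kappa \rceil$. We then have
\[
\mathbb{E}\,\mathrm{CRegret}(T) \le p_\iota \sum_{x=1}^{m} (\mu^* - \mu_x)\, \mathbb{E}\,\tilde n_x^{(T)} + B \sum_{t=1}^{\infty} \mathbb{P}\big(\|\hat\mu^{((s^{(t)},\dots,s^{(t)}))} - \mu\|_\infty \ge \Delta/2\big).
\]
For the first regret term, we have $\tilde n_x^{(T)} \le 1 + \log(T)/\kappa$ by construction and so the same is true for $\mathbb{E}\,\tilde n_x^{(T)}$. For the second term, we invoke Lemma 4 and use $s^{(t)} \ge \log t/\kappa$ in order to get
\[
\mathbb{P}\big(\|\hat\mu^{((s^{(t)},\dots,s^{(t)}))} - \mu\|_\infty \ge \Delta/2\big) \le 2 e^m\, m\, t^{-\frac{\sigma_{\min}^2(P)\,\Delta^2}{8m\kappa(\Delta + \sqrt{2\psi})^2}}.
\]
By assumption, we have that $\frac{\sigma_{\min}^2(P)\,\Delta^2}{8m\kappa(\Delta + \sqrt{2\psi})^2} > 1$ so that the second regret term is a finite constant.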
