Courant Institute
Learning with a Large Number of Experts
Regret of RWM is $O(\sqrt{T \ln N})$.
Informative even for a very large number of experts.
What if there is overlap between experts?
RWM with path experts
FPL with path experts
Can we do better?
[Littlestone and Warmuth, 1989; Kalai and Vempala, 2004]
Better Bounds in Structured Case?
Outline
Learning Scenario
Component Hedge Algorithm
Regret Bounds
Applications & Lower Bounds
Conclusion & Open Problems
Learning Scenario
Assumptions:
Structured concept class $\mathcal{C} \subseteq \{0,1\}^d$, composed of components: $C^t \in \mathcal{C}$ indicates which components are used at trial $t$.
Losses are linear over components: at trial $t$, a loss vector $\ell^t \in [0,1]^d$ is revealed and concept $C^t$ incurs loss $C^t \cdot \ell^t$.
Goal:
minimize expected regret after $T$ trials:
$$R_T = \sum_{t=1}^{T} \mathbb{E}[C^t] \cdot \ell^t \;-\; \min_{C \in \mathcal{C}} \sum_{t=1}^{T} C \cdot \ell^t$$
Component Hedge Algorithm
[Koolen, Warmuth, and Kivinen, 2010]
CH maintains weights $w^t \in \mathrm{conv}(\mathcal{C}) \subseteq [0,1]^d$ over the components at each round $t$.
Update:
1. weights: $\hat{w}_i^t = w_i^{t-1}\, e^{-\eta \ell_i^t}$
2. relative entropy projection: $w^t := \operatorname{argmin}_{w \in \mathrm{conv}(\mathcal{C})} \Delta(w \,\|\, \hat{w}^t)$
where $\Delta(w \,\|\, v) = \sum_{i=1}^{d} \left( w_i \ln \frac{w_i}{v_i} + v_i - w_i \right)$
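As an illustration (my own sketch, not code from the slides), the multiplicative step and the unnormalized relative entropy $\Delta$ can be written directly; the projection step is omitted here because it depends on the structure of $\mathrm{conv}(\mathcal{C})$. The function names `hedge_step` and `relative_entropy` are my own:

```python
import math

def hedge_step(w, losses, eta):
    """Multiplicative update: w-hat_i = w_i * exp(-eta * loss_i)."""
    return [wi * math.exp(-eta * li) for wi, li in zip(w, losses)]

def relative_entropy(w, v):
    """Unnormalized relative entropy:
    Delta(w || v) = sum_i ( w_i ln(w_i / v_i) + v_i - w_i )."""
    total = 0.0
    for wi, vi in zip(w, v):
        if wi > 0:
            total += wi * math.log(wi / vi)
        total += vi - wi
    return total
```

Note that $\Delta(w \,\|\, w) = 0$ and the update only shrinks the weights of components that incurred loss.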
Component Hedge Algorithm
Prediction:
1. Decomposition of weights into concepts: $w^t = \sum_{C \in \mathcal{C}} p_C\, C$ with $p_C \ge 0$ and $\sum_{C} p_C = 1$.
2. Predict with concept $C$ drawn with probability $p_C$, so that the expected prediction equals $w^t$.
Efficiency
Regret Bounds
Since each concept uses at most $M$ components and $\ell^t \in [0,1]^d$, the best concept's cumulative loss is at most $MT$, and the regret satisfies $R_T \le O(M\sqrt{T \ln d})$.
Matching lower bounds in applications.
Comparison of CH, RWM and FPL
CH has significantly better regret bounds:
CH: $R_T \le O(M\sqrt{T \ln d})$
RWM: $R_T \le O(M\sqrt{MT \ln d})$
FPL: $R_T \le O(M\sqrt{dT \ln d})$
Applications
On-line Shortest Path Problem (SPP)
G = (V , E ) is a directed graph.
s is the source and t is the destination.
Each $s$-$t$ path is an expert.
The loss is additive over edges.
Unit Flow Polytope
$w_{u,v} \ge 0, \quad \forall (u,v) \in E$
$\sum_{v \in V} w_{s,v} = 1$
$\sum_{v \in V} w_{v,u} = \sum_{v \in V} w_{u,v}, \quad \forall u \in V \setminus \{s, t\}$
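A small membership checker for these constraints can be sketched as follows (my own helper, with an assumed edge-dictionary representation of the graph):

```python
def is_unit_flow(edges, s, t, tol=1e-9):
    """Check membership in the unit flow polytope: nonnegative edge
    weights, unit outflow at the source s, and flow conservation at
    every node other than s and t. `edges` maps (u, v) to w_{u,v}."""
    if any(w < -tol for w in edges.values()):
        return False
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    inflow = {u: 0.0 for u in nodes}
    outflow = {u: 0.0 for u in nodes}
    for (u, v), w in edges.items():
        outflow[u] += w
        inflow[v] += w
    if abs(outflow[s] - 1.0) > tol:
        return False
    return all(abs(inflow[u] - outflow[u]) <= tol
               for u in nodes if u not in (s, t))
```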
Example of Unit Flow Polytope
[Figure: an example unit flow on a small graph with vertices 1-7; the edge weights (values such as 0, 0.25, and 0.5) satisfy the constraints above.]
Entropy Projection on Unit Flow Polytope
$$\min_{w} \sum_{(u,v) \in E} \left( w_{u,v} \ln \frac{w_{u,v}}{\hat{w}_{u,v}} + \hat{w}_{u,v} - w_{u,v} \right)$$
subject to:
$w_{u,v} \ge 0, \quad \forall (u,v) \in E$
$\sum_{v \in V} w_{s,v} = 1$
$\sum_{v \in V} w_{v,u} = \sum_{v \in V} w_{u,v}, \quad \forall u \in V \setminus \{s, t\}$
Dual problem
$$\max_{\lambda \in \mathbb{R}^V} \; \Big\{ \lambda_s - \lambda_t - \sum_{(u,v) \in E} \hat{w}_{u,v}\, e^{\lambda_u - \lambda_v} \Big\}$$
No constraints.
Only $|V|$ variables.
Primal solution: $w_{u,v} = \hat{w}_{u,v}\, e^{\lambda_u - \lambda_v}$
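One way to solve this dual, sketched here as an assumption rather than as the slides' prescribed method, is exact coordinate ascent over $x_u = e^{\lambda_u}$: each coordinate update has a closed form (balance inflow and outflow at the node), and the primal is recovered as $w_{u,v} = \hat{w}_{u,v}\, x_u / x_v$:

```python
import math

def entropy_project_flow(w_hat, s, t, n_iters=200):
    """Approximate relative-entropy projection of w_hat onto the unit
    flow polytope by coordinate ascent on the dual. Uses x_u =
    exp(lambda_u); the primal is w_uv = w_hat_uv * x_u / x_v.
    Assumes strictly positive w_hat on the edges of a DAG."""
    nodes = {u for u, _ in w_hat} | {v for _, v in w_hat}
    x = {u: 1.0 for u in nodes}
    for _ in range(n_iters):
        # Source constraint: sum_v w_hat_sv * x_s / x_v = 1.
        x[s] = 1.0 / sum(wh / x[v] for (u, v), wh in w_hat.items() if u == s)
        # Flow conservation at internal nodes:
        # x_u^2 = (sum_a w_hat_au * x_a) / (sum_b w_hat_ub / x_b).
        for u in nodes:
            if u in (s, t):
                continue
            inflow = sum(wh * x[a] for (a, b), wh in w_hat.items() if b == u)
            outflow = sum(wh / x[b] for (a, b), wh in w_hat.items() if a == u)
            x[u] = math.sqrt(inflow / outflow)
    return {(u, v): wh * x[u] / x[v] for (u, v), wh in w_hat.items()}
```

On a small graph, a few hundred sweeps suffice; the flow constraints then hold up to numerical tolerance.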
Convex Decomposition
Example of Convex Decomposition
[Figure sequence over five slides: the unit flow from the earlier example is decomposed step by step. At each step a source-to-sink path carrying positive flow is selected, its minimum edge weight (e.g. 0.25) is subtracted from every edge on the path and recorded as the path's coefficient, until no flow remains.]
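The greedy procedure illustrated above can be sketched as follows (a minimal implementation of my own, assuming the flow lives on a DAG given as an edge dictionary):

```python
def decompose_flow(edges, s, t, tol=1e-9):
    """Greedily decompose an s-t flow into a convex combination of
    paths: trace a path through edges with positive weight, subtract
    its bottleneck weight from every edge on it, and record that
    weight as the path's coefficient. Returns (coefficient, path)
    pairs; each peel zeroes at least one edge, so there are at most
    |E| paths."""
    edges = dict(edges)  # work on a copy
    out = []
    while True:
        # Trace a path from s to t through edges carrying positive flow.
        path, u = [s], s
        while u != t:
            nxt = next((v for (a, v), w in edges.items()
                        if a == u and w > tol), None)
            if nxt is None:
                return out  # no remaining flow out of s
            path.append(nxt)
            u = nxt
        # Peel off the bottleneck weight along this path.
        c = min(edges[(path[i], path[i + 1])] for i in range(len(path) - 1))
        for i in range(len(path) - 1):
            edges[(path[i], path[i + 1])] -= c
        out.append((c, path))
```

For a unit flow, the coefficients returned sum to 1, giving exactly the convex decomposition CH needs for prediction.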
Regret Bounds for SPP
Lower Bounds
Conclusions
References
Wouter Koolen, Manfred K. Warmuth, and Jyrki Kivinen. Hedging structured concepts. In COLT, 2010.
Nick Littlestone and Manfred K. Warmuth. The weighted majority algorithm. In FOCS, 1989, pp. 256-261.
Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291-307, 2005.
Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. JMLR, 4:773-818, 2003.
T. van Erven, W. Kotlowski, and M. K. Warmuth. Follow the leader with dropout perturbations. In COLT, 2014.
Nicolò Cesa-Bianchi and Gábor Lugosi. Combinatorial bandits. In COLT, 2009.
Regret Bounds
Proof of CH Regret Bound
1. Bound:
$(1 - e^{-\eta})\, w^{t-1} \cdot \ell^t \le \Delta(C \,\|\, w^{t-1}) - \Delta(C \,\|\, w^t) + \eta\, C \cdot \ell^t$
using $1 - e^{-\eta x} \ge (1 - e^{-\eta})\, x$ for $x \in [0,1]$ and the Generalized Pythagorean Theorem.
2. Sum over trials $t$:
$(1 - e^{-\eta}) \sum_{t=1}^{T} w^{t-1} \cdot \ell^t \le \Delta(C \,\|\, w^0) - \Delta(C \,\|\, w^T) + \eta\, C \cdot \ell^{\le T}$
where $\ell^{\le T} = \ell^1 + \cdots + \ell^T$.
3. Use $w^{t-1} \cdot \ell^t = \mathbb{E}[C^t] \cdot \ell^t$:
$$\sum_{t=1}^{T} \mathbb{E}[C^t] \cdot \ell^t \le \frac{\Delta(C \,\|\, w^0) - \Delta(C \,\|\, w^T) + \eta\, C \cdot \ell^{\le T}}{1 - e^{-\eta}}$$
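As a quick numerical sanity check (my own, not part of the slides), the inequality $1 - e^{-\eta x} \ge (1 - e^{-\eta})\, x$ used in step 1 can be verified on a grid over $[0,1]$; it holds because the left side is concave in $x$ and the two sides agree at $x = 0$ and $x = 1$:

```python
import math

def check_exp_inequality(eta, n=1000):
    """Verify 1 - exp(-eta*x) >= (1 - exp(-eta)) * x on a grid of
    x in [0, 1]. The left side is concave in x and matches the
    (linear) right side at both endpoints, so it dominates between."""
    return all(
        1 - math.exp(-eta * (k / n)) >= (1 - math.exp(-eta)) * (k / n) - 1e-12
        for k in range(n + 1)
    )
```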
Proof of CH Regret Bound