
Learning with a Large Number of Experts:

Component Hedge Algorithm

Giulia DeSalvo and Vitaly Kuznetsov

Courant Institute

March 24th, 2015

1 / 30
Learning with a Large Number of Experts


Regret of RWM is $O(\sqrt{T \ln N})$.
Informative even for a very large number of experts.
What if there is overlap between the experts?
RWM with path experts
FPL with path experts
Can we do better?
[Littlestone and Warmuth, 1989; Kalai and Vempala, 2004]
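Not on the original slide: a minimal Python sketch of the standard Hedge/RWM update over $N$ flat experts, for reference; the function name, learning rate, and toy data are our own, and losses are assumed to lie in $[0, 1]$.

    import numpy as np

    def hedge_update(weights, losses, eta):
        """One round of Hedge/RWM: exponentially downweight lossy experts."""
        w = weights * np.exp(-eta * losses)   # multiplicative update
        return w / w.sum()                    # renormalize to a distribution

    # Toy usage: N = 4 experts, uniform start, one loss vector in [0, 1].
    w = np.full(4, 0.25)
    w = hedge_update(w, np.array([0.0, 1.0, 0.5, 0.2]), eta=0.5)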

2 / 30
Better Bounds in Structured Case?

Can overlap between experts lead to better regret guarantees?
What are the lower bounds in the structured setting?
Are there computationally efficient solutions that realize these bounds?

3 / 30
Outline

Learning Scenario
Component Hedge Algorithm
Regret Bounds
Applications & Lower Bounds
Conclusion & Open Problems

4 / 30
Learning Scenario
Assumptions:
Structured concept class $\mathcal{C} \subseteq \{0, 1\}^d$,
composed of components: $C^t \in \mathcal{C}$ indicates which
components are used at trial $t$.
Additive loss: a loss vector $\ell^t$ is incurred at each trial $t$;
the loss of a concept $C$ is $C \cdot \ell^t$. Let $M := \max_{C \in \mathcal{C}} |C|$.

Goal:
minimize the expected regret after $T$ trials,
$$R_T = \sum_{t=1}^{T} E[C^t] \cdot \ell^t - \min_{C \in \mathcal{C}} \sum_{t=1}^{T} C \cdot \ell^t$$
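To make the definitions concrete, a small sketch (ours, with made-up toy data) that evaluates this regret for a played sequence against the best concept in hindsight:

    import numpy as np

    # Toy concept class C in {0,1}^d with d = 3 components.
    concepts = np.array([[1, 1, 0],
                         [0, 1, 1]])
    losses = np.array([[0.2, 0.5, 0.1],      # loss vector ell^t per trial
                       [0.4, 0.1, 0.9]])
    played = np.array([[1, 1, 0],            # concept C^t chosen at each trial
                       [0, 1, 1]])

    alg_loss = np.sum(played * losses)                   # sum_t C^t . ell^t
    best_loss = np.min(concepts @ losses.sum(axis=0))    # min_C C . (ell^1 + ... + ell^T)
    regret = alg_loss - best_loss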
5 / 30
Component Hedge Algorithm
[Koolen, Warmuth, and Kivinen, 2010]
CH maintains a weight vector $w^t \in \mathrm{conv}(\mathcal{C}) \subseteq [0, 1]^d$ over the
components at each round $t$.
Update:
1. Weights: $\hat{w}_i^t = w_i^{t-1} e^{-\eta \ell_i^t}$
2. Relative entropy projection:
$$w^t := \mathrm{argmin}_{w \in \mathrm{conv}(\mathcal{C})} \Delta(w \,\|\, \hat{w}^t)$$
where $\Delta(w \,\|\, v) = \sum_{i=1}^{d} \left( w_i \ln \frac{w_i}{v_i} + v_i - w_i \right)$
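A minimal sketch of the two-step update (ours, not from the slides); the projection is problem-specific since it depends on how $\mathrm{conv}(\mathcal{C})$ is described, so it is passed in as a stub here.

    import numpy as np

    def ch_update(w, losses, eta, project):
        """One CH round: multiplicative update, then relative entropy projection.
        `project` must implement argmin_{w in conv(C)} Delta(w || w_hat)."""
        w_hat = w * np.exp(-eta * losses)  # hat(w)_i^t = w_i^{t-1} * exp(-eta * ell_i^t)
        return project(w_hat)

    def relative_entropy(w, v):
        """Unnormalized relative entropy Delta(w || v); assumes positive entries."""
        return float(np.sum(w * np.log(w / v) + v - w))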

6 / 30
Component Hedge Algorithm

Prediction:
1. Decomposition of weights:
$$w^t = \sum_{C \in \mathcal{C}} \alpha_C \, C$$
where $\alpha$ is a distribution over $\mathcal{C}$
2. Sample $C^{t+1}$ according to $\alpha$
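Assuming the decomposition step returns a list of $(\alpha_C, C)$ pairs, sampling the next concept is a one-liner; this sketch and its names are our own.

    import numpy as np

    def sample_concept(decomposition, rng=None):
        """Sample C^{t+1} from a convex decomposition [(alpha_C, C), ...]."""
        rng = rng or np.random.default_rng()
        alphas = np.array([a for a, _ in decomposition])
        idx = rng.choice(len(decomposition), p=alphas / alphas.sum())
        return decomposition[idx][1]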

7 / 30
Efficiency

Need an efficient implementation of:
the decomposition (not unique) of the weights over the concepts,
the entropy projection step (a convex problem).
Sufficient condition: $\mathrm{conv}(\mathcal{C})$ is described by polynomially many (in $d$) constraints.

8 / 30
Regret Bounds

Theorem: Regret Bound for CH

Let $\ell^* = \min_{C \in \mathcal{C}} C \cdot (\ell^1 + \ldots + \ell^T)$ be the loss of the
best concept in hindsight. Then
$$R_T \le \sqrt{2 \ell^* M \ln(d/M)} + M \ln(d/M)$$
by choosing $\eta = \sqrt{\frac{2 M \ln(d/M)}{\ell^*}}$.

Since $\ell^* \le MT$, the regret satisfies $R_T \le O(M \sqrt{T \ln d})$
(worked substitution below).
Matching lower bounds in applications.
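A worked version of the substitution (our expansion, using $\ell^* \le MT$):
$$R_T \le \sqrt{2 \ell^* M \ln(d/M)} + M \ln(d/M) \le \sqrt{2 M^2 T \ln(d/M)} + M \ln(d/M) = M \sqrt{2 T \ln(d/M)} + M \ln(d/M) \in O\big(M \sqrt{T \ln d}\big)$$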

9 / 30
Comparison of CH, RWM and FPL

1. CH has significantly better regret bounds:
CH: $R_T \le O(M \sqrt{T \ln d})$
RWM: $R_T \le O(M \sqrt{MT \ln d})$
FPL: $R_T \le O(M \sqrt{dT \ln d})$
2. CH is optimal with respect to these regret bounds, while RWM and FPL are not.
3. In the standard expert setting (no structure),
CH, RWM, and FPL reduce to the same algorithm.

10 / 30
Applications

On-line shortest path problems.
On-line PCA (k-sets).
On-line ranking (k-permutations).
Spanning trees.

11 / 30
On-line Shortest Path Problem (SPP)

$G = (V, E)$ is a directed graph.
$s$ is the source and $t$ is the destination.
Each $s$-$t$ path is an expert.
The loss is additive over the edges.

12 / 30
Unit Flow Polytope

The convex hull of paths cannot be captured by a small set of linear
constraints, so the unit flow polytope relaxation is used:

$w_{u,v} \ge 0, \quad \forall (u, v) \in E$
$\sum_{v \in V} w_{s,v} = 1$
$\sum_{v \in V} w_{v,u} = \sum_{v \in V} w_{u,v}, \quad \forall u \in V \setminus \{s, t\}$

The relaxation does not hurt the regret bounds.
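A hedged sketch (ours) that checks membership in the unit flow polytope for a small graph; the edge encoding as a dict of $(u, v)$ pairs is our own choice.

    def is_unit_flow(w, s, t, tol=1e-9):
        """Check the unit-flow constraints; w maps edges (u, v) to weights."""
        if any(wt < -tol for wt in w.values()):
            return False                                    # w_{u,v} >= 0
        nodes = {u for edge in w for u in edge}
        outflow = {u: sum(wt for (a, b), wt in w.items() if a == u) for u in nodes}
        inflow = {u: sum(wt for (a, b), wt in w.items() if b == u) for u in nodes}
        if abs(outflow[s] - 1.0) > tol:
            return False                                    # unit outflow at s
        return all(abs(inflow[u] - outflow[u]) <= tol       # conservation at
                   for u in nodes if u not in (s, t))       # internal nodes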

13 / 30
Example of Unit Flow Polytope

[Figure: a directed graph on nodes 1-7 with edge weights in {0, 0.25, 0.5},
an example point of the unit flow polytope.]

14 / 30
Entropy Projection on Unit Flow Polytope

$$\min_{w} \sum_{(u,v) \in E} \left( w_{u,v} \ln \frac{w_{u,v}}{\hat{w}_{u,v}} + \hat{w}_{u,v} - w_{u,v} \right)$$
subject to:
$w_{u,v} \ge 0, \quad \forall (u, v) \in E$
$\sum_{v \in V} w_{s,v} = 1$
$\sum_{v \in V} w_{v,u} = \sum_{v \in V} w_{u,v}, \quad \forall u \in V \setminus \{s, t\}$

15 / 30
Dual problem

$$\max_{\lambda} \left\{ \lambda_s - \sum_{(u,v) \in E} \hat{w}_{u,v} \, e^{\lambda_u - \lambda_v} \right\}$$

No constraints.
Only $|V|$ variables.
Primal solution: $w_{u,v} = \hat{w}_{u,v} \, e^{\lambda_u - \lambda_v}$
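A sketch of the projection via this dual (ours, not from the slides), using scipy.optimize.minimize on the low-dimensional unconstrained problem; pinning $\lambda_t = 0$ and the choice of solver are our conventions.

    import numpy as np
    from scipy.optimize import minimize

    def entropy_project(w_hat, s, t):
        """Entropy projection onto the unit flow polytope via the dual.
        w_hat maps edges (u, v) to weights; lambda_t is pinned to 0."""
        nodes = sorted({u for edge in w_hat for u in edge})
        free = [u for u in nodes if u != t]       # one dual variable per node but t
        idx = {u: i for i, u in enumerate(free)}
        lam_of = lambda lam, u: 0.0 if u == t else lam[idx[u]]

        def neg_dual(lam):
            # Negation of:  lambda_s - sum w_hat_{u,v} e^{lambda_u - lambda_v}
            return sum(wt * np.exp(lam_of(lam, u) - lam_of(lam, v))
                       for (u, v), wt in w_hat.items()) - lam[idx[s]]

        lam = minimize(neg_dual, np.zeros(len(free))).x
        # Recover the primal solution: w_{u,v} = w_hat_{u,v} e^{lambda_u - lambda_v}
        return {(u, v): wt * np.exp(lam_of(lam, u) - lam_of(lam, v))
                for (u, v), wt in w_hat.items()}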

16 / 30
Convex Decomposition

1. Find any $s$-$t$ path along edges with non-zero weight.
2. Subtract the smallest weight along the path from each of its edges.
3. Repeat until no such path remains (a sketch of this procedure follows).
$\Rightarrow$ At most $|E|$ iterations are needed.
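A sketch of the stripping procedure (ours); the path search is an arbitrary DFS over positive-weight edges, which is one valid choice.

    def decompose(w, s, t, tol=1e-9):
        """Greedy convex decomposition of a unit flow into s-t paths.
        Returns a list of (alpha, path) pairs; at most |E| iterations."""
        w = dict(w)
        result = []
        while True:
            path = _find_path(w, s, t, tol)      # any path over positive edges
            if path is None:
                return result
            alpha = min(w[e] for e in path)      # smallest weight on the path
            for e in path:                       # subtract it from every edge
                w[e] -= alpha
            result.append((alpha, path))

    def _find_path(w, s, t, tol, seen=None):
        """DFS for an s-t path using only edges with weight > tol."""
        seen = set() if seen is None else seen
        if s == t:
            return []
        seen.add(s)
        for (u, v), wt in w.items():
            if u == s and wt > tol and v not in seen:
                rest = _find_path(w, v, t, tol, seen)
                if rest is not None:
                    return [(u, v)] + rest
        return None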

17 / 30
Example of Convex Decomposition

[Figure: the initial unit flow on the example graph of nodes 1-7, with the
same edge weights as on slide 14.]

18 / 30
Example of Convex Decomposition

[Figure: remaining flow after stripping the first path; its weight (0.25)
has been subtracted from every edge on the path.]

19 / 30
Example of Convex Decomposition

[Figure: remaining flow after stripping the second path; more edges now
carry weight 0.]

20 / 30
Example of Convex Decomposition

[Figure: remaining flow after stripping the third path; only a few edges
keep weight 0.25.]

21 / 30
Example of Convex Decomposition

[Figure: all edge weights are zero; the decomposition is complete.]

22 / 30
Regret Bounds for SPP

The expected regret is bounded by
$$2\sqrt{\ell^* \, k \ln |V|} + 2k \ln |V| \le O(M \sqrt{T \ln |V|})$$
where $k$ denotes the maximal path length.
The bound holds for arbitrary graphs.
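These constants follow from the general CH bound under the assumptions $d = |E| \le |V|^2$ and $M = k$ (our expansion):
$$\ln(d/M) \le \ln |E| \le 2 \ln |V| \;\Rightarrow\; \sqrt{2 \ell^* M \ln(d/M)} + M \ln(d/M) \le 2\sqrt{\ell^* \, k \ln |V|} + 2k \ln |V|$$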

23 / 30
Lower Bounds

Any algorithm can be forced to have expected regret at least
$$\sqrt{\ell^* \, k \ln \frac{|V|}{k}}$$
Idea of the proof:
Minimize the overlap:
create $|V|/k$ disjoint paths of length $k$,
then apply the lower bounds for the standard expert setting.

24 / 30
Conclusions

The regret of CH is often better than that of RWM or FPL in the
structured setting.
The regret of CH often matches the lower bounds in applications.
Efficient solutions exist for a wide range of applications: on-line
shortest path, on-line PCA, on-line ranking, spanning trees.

25 / 30
References
Wouter Koolen, Manfred K. Warmuth, and Jyrki Kivinen. Hedging
structured concepts. In COLT, 2010.
Nick Littlestone and Manfred K. Warmuth. The weighted majority
algorithm. In FOCS, 1989, pp. 256-261.
Adam Kalai and Santosh Vempala. Efficient algorithms for online
decision problems. J. Comput. Syst. Sci., 2003.
Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative
updates. JMLR, 4:773-818, 2003.
Tim van Erven, Wojciech Kotlowski, and Manfred K. Warmuth. Follow the
leader with dropout perturbations. In COLT, 2014.
Nicolò Cesa-Bianchi and Gábor Lugosi. Combinatorial bandits. In COLT,
2009.
26 / 30
Regret Bounds

Theorem: Regret Bound for CH

Let $\ell^* = \min_{C \in \mathcal{C}} C \cdot (\ell^1 + \ldots + \ell^T)$ be the loss of the
best concept in hindsight. Then
$$R_T \le \sqrt{2 \ell^* M \ln(d/M)} + M \ln(d/M)$$
by choosing $\eta = \sqrt{\frac{2 M \ln(d/M)}{\ell^*}}$.

27 / 30
Proof of CH Regret Bound

1. Bound:
$(1 - e^{-\eta})\, w^{t-1} \cdot \ell^t \le \Delta(C \,\|\, w^{t-1}) - \Delta(C \,\|\, w^t) + \eta\, C \cdot \ell^t$,
using $1 - e^{-\eta x} \ge (1 - e^{-\eta}) x$ and the Generalized Pythagorean Theorem.

2. Sum over trials $t$:
$(1 - e^{-\eta}) \sum_{t=1}^{T} w^{t-1} \cdot \ell^t \le \Delta(C \,\|\, w^0) - \Delta(C \,\|\, w^T) + \eta\, C \cdot \ell^{\le T}$
where $\ell^{\le T} = \ell^1 + \cdots + \ell^T$.

3. Use $w^{t-1} \cdot \ell^t = E[C^t] \cdot \ell^t$:
$$\sum_{t=1}^{T} E[C^t] \cdot \ell^t \le \frac{\Delta(C \,\|\, w^0) - \Delta(C \,\|\, w^T) + \eta\, C \cdot \ell^{\le T}}{1 - e^{-\eta}}$$

28 / 30
Proof of CH Regret Bound

4. Take $w^0$ uniform over the components:
$w_i^0 = \frac{M}{d} \;\Rightarrow\; \Delta(C \,\|\, w^0) = M \ln\left(\frac{d}{M}\right)$
(a worked expansion of this step follows below).
5. Let $\ell^*$ be the loss of the best concept in hindsight and choose
$\eta = \sqrt{\frac{2 M \ln(d/M)}{\ell^*}}$ $\Rightarrow$ the regret bound $R_T$.
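A worked expansion of step 4 (ours), using $C_i \in \{0, 1\}$, the convention $0 \ln 0 = 0$, and the worst case $|C| = M$:
$$\Delta(C \,\|\, w^0) = \sum_{i=1}^{d} \left( C_i \ln \frac{C_i}{M/d} + \frac{M}{d} - C_i \right) = |C| \ln\frac{d}{M} + M - |C| = M \ln\frac{d}{M} \quad \text{when } |C| = M.$$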

29 / 30
