Boosting
Mehryar Mohri
Courant Institute and Google Research
mohri@cims.nyu.edu
Weak Learning
(Kearns and Valiant, 1994)
Definition: a concept class $C$ is weakly PAC-learnable if there exists a (weak) learning algorithm $L$ and $\gamma > 0$ such that:
• for all $\delta > 0$, for all $c \in C$ and all distributions $D$,
$$\Pr_{S \sim D}\Big[ R(h_S) \le \frac{1}{2} - \gamma \Big] \ge 1 - \delta.$$
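To make the definition concrete, here is a minimal numeric sketch of a weak learner: a fixed decision stump that thresholds one feature. The dataset, the stump, and the threshold are all illustrative assumptions, not part of the lecture; the point is only that the stump's error under the uniform distribution stays below $1/2$ by some edge $\gamma > 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
X = rng.normal(size=(m, 2))
y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1, -1)  # labels correlated with feature 0

def weak_h(X):
    # decision stump: threshold feature 0 at zero (an assumed, fixed weak hypothesis)
    return np.where(X[:, 0] > 0, 1, -1)

err = np.mean(weak_h(X) != y)  # error under the uniform distribution over the sample
gamma = 0.5 - err              # the weak learner's edge over random guessing
print(err, gamma)
```

The stump ignores feature 1 entirely, so it is far from a perfect classifier, yet its edge over random guessing is exactly what weak learnability asks for.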
[Figure: illustration of the first rounds of AdaBoost, $t = 1, 2, 3, \ldots$]
Mehryar Mohri - Foundations of Machine Learning page 6
Bound on the empirical error (assuming the edge condition $\gamma \le \frac{1}{2} - \epsilon_t$ for all $t$):
$$\hat{R}(h) \le \exp(-2\gamma^2 T).$$
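The exponential decrease of the training error can be checked empirically. The sketch below runs a compact AdaBoost with decision stumps on an assumed XOR-like toy dataset (data, seed, and round count are all illustrative) and compares the training error of the combined classifier against $\exp(-2\sum_t (\frac{1}{2}-\epsilon_t)^2)$, using the realized edges in place of a uniform $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 60
X = rng.normal(size=(m, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)  # XOR-like labels: no single stump fits them

def stump_predict(X, j, thr, s):
    # threshold classifier on feature j with sign s
    return np.where(X[:, j] <= thr, s, -s)

def best_stump(X, y, D):
    # weak learner: exhaustive search for the stump with smallest weighted error under D
    best_err, best_params = np.inf, None
    for j in range(X.shape[1]):
        for thr in X[:, j]:
            for s in (1.0, -1.0):
                err = D[stump_predict(X, j, thr, s) != y].sum()
                if err < best_err:
                    best_err, best_params = err, (j, thr, s)
    return best_err, best_params

T = 30
D = np.full(m, 1.0 / m)
alphas, stumps, edges = [], [], []
for t in range(T):
    eps, (j, thr, s) = best_stump(X, y, D)
    eps = np.clip(eps, 1e-10, 1 - 1e-10)          # numerical safety only
    alpha = 0.5 * np.log((1 - eps) / eps)
    D = D * np.exp(-alpha * y * stump_predict(X, j, thr, s))
    D = D / D.sum()                                # renormalize: divide by Z_t
    alphas.append(alpha)
    stumps.append((j, thr, s))
    edges.append(0.5 - eps)

F = sum(a * stump_predict(X, j, thr, s) for a, (j, thr, s) in zip(alphas, stumps))
train_err = float(np.mean(np.sign(F) != y))
bound = float(np.exp(-2 * np.sum(np.square(edges))))
print(train_err, bound)
```

The observed training error stays below the bound, as the theorem guarantees for the realized edges.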
$$\begin{aligned}
Z_t &= \sum_{i=1}^m D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} \\
&= \sum_{i \colon y_i h_t(x_i) = +1} D_t(i)\, e^{-\alpha_t} \;+\; \sum_{i \colon y_i h_t(x_i) = -1} D_t(i)\, e^{\alpha_t} \\
&= (1 - \epsilon_t) \sqrt{\frac{\epsilon_t}{1 - \epsilon_t}} \;+\; \epsilon_t \sqrt{\frac{1 - \epsilon_t}{\epsilon_t}} \\
&= 2 \sqrt{\epsilon_t (1 - \epsilon_t)}.
\end{aligned}$$
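The identity $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$ can be verified numerically: take any distribution $D_t$, any split into correctly and incorrectly classified points, set $\alpha_t = \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}$, and compute the normalizer both directly and in closed form. The distribution and the correct/incorrect split below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 12
D = rng.random(m)
D = D / D.sum()                             # an arbitrary distribution D_t over the sample
margins = np.array([1.0] * 8 + [-1.0] * 4)  # assumed values of y_i h_t(x_i): 8 right, 4 wrong
eps = D[margins < 0].sum()                  # weighted error eps_t of h_t under D_t
alpha = 0.5 * np.log((1 - eps) / eps)       # AdaBoost's step size alpha_t
Z_direct = np.sum(D * np.exp(-alpha * margins))  # sum_i D_t(i) exp(-alpha_t y_i h_t(x_i))
Z_closed = 2 * np.sqrt(eps * (1 - eps))
print(Z_direct, Z_closed)
```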
• Notes:
• $\alpha_t$ is the minimizer of $\alpha \mapsto (1 - \epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha}$.
• since $(1 - \epsilon_t)\, e^{-\alpha_t} = \epsilon_t\, e^{\alpha_t}$, at each round, AdaBoost assigns the same probability mass to correctly classified and misclassified instances.
• the exponential loss $x \mapsto e^{-x}$ upper-bounds the 0-1 loss.
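Both notes can be checked directly: a dense grid search recovers the closed-form minimizer $\alpha_t = \frac{1}{2}\log\frac{1-\epsilon_t}{\epsilon_t}$, and at that point the two exponential terms carry equal mass. The value $\epsilon_t = 0.3$ and the grid are illustrative choices.

```python
import numpy as np

eps = 0.3                                    # an assumed weighted error eps_t < 1/2
alpha_star = 0.5 * np.log((1 - eps) / eps)   # AdaBoost's closed-form step
grid = np.linspace(-3, 3, 200001)
phi = (1 - eps) * np.exp(-grid) + eps * np.exp(grid)  # the function being minimized
alpha_grid = grid[np.argmin(phi)]            # grid-search minimizer
# equal-mass property at the minimizer: (1-eps) e^{-alpha} = eps e^{alpha}
lhs = (1 - eps) * np.exp(-alpha_star)
rhs = eps * np.exp(alpha_star)
print(alpha_star, alpha_grid, lhs, rhs)
```

Both sides of the equal-mass identity evaluate to $\sqrt{\epsilon_t(1-\epsilon_t)}$, which is also why $Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$.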
• Since
$$F(\bar{\alpha}_{t-1} + \eta e_k) = \frac{1}{m} \sum_{i=1}^m e^{-y_i \sum_{j=1}^N \bar{\alpha}_{t-1,j} h_j(x_i) \,-\, \eta y_i h_k(x_i)},$$
$$\begin{aligned}
F'(\bar{\alpha}_{t-1}, e_k) &= -\frac{1}{m} \sum_{i=1}^m y_i h_k(x_i)\, e^{-y_i \sum_{j=1}^N \bar{\alpha}_{t-1,j} h_j(x_i)} \\
&= -\frac{1}{m} \sum_{i=1}^m y_i h_k(x_i)\, \bar{D}_t(i)\, \bar{Z}_t \\
&= -\left[ \sum_{i=1}^m \bar{D}_t(i)\, 1_{y_i h_k(x_i) = +1} \;-\; \sum_{i=1}^m \bar{D}_t(i)\, 1_{y_i h_k(x_i) = -1} \right] \frac{\bar{Z}_t}{m} \\
&= -\left[ (1 - \bar{\epsilon}_{t,k}) - \bar{\epsilon}_{t,k} \right] \frac{\bar{Z}_t}{m} \;=\; \left[ 2 \bar{\epsilon}_{t,k} - 1 \right] \frac{\bar{Z}_t}{m}.
\end{aligned}$$
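The closed form $F'(\bar{\alpha}_{t-1}, e_k) = (2\bar{\epsilon}_{t,k} - 1)\,\bar{Z}_t/m$ can be verified against a numerical directional derivative. The base hypotheses, labels, and current iterate below are random illustrative stand-ins, with $\bar{Z}_t = \sum_i e^{-y_i \sum_j \bar{\alpha}_j h_j(x_i)}$ and $\bar{D}_t$ the induced distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
m, N = 20, 5
H = np.where(rng.random((m, N)) < 0.5, 1.0, -1.0)  # column j holds h_j(x_i) in {-1,+1}
y = np.where(rng.random(m) < 0.5, 1.0, -1.0)
abar = rng.random(N)                               # assumed coordinate-descent iterate

def F(a):
    # objective: (1/m) sum_i exp(-y_i sum_j a_j h_j(x_i))
    return np.mean(np.exp(-y * (H @ a)))

k = 2                                              # direction e_k, arbitrary choice
losses = np.exp(-y * (H @ abar))
Zbar = losses.sum()                                # normalizer \bar{Z}_t
Dbar = losses / Zbar                               # induced distribution \bar{D}_t
eps_k = Dbar[y * H[:, k] == -1].sum()              # weighted error of h_k under Dbar
closed = (2 * eps_k - 1) * Zbar / m                # the closed-form derivative

eta = 1e-6                                         # central finite difference along e_k
e_k = np.zeros(N); e_k[k] = 1.0
numeric = (F(abar + eta * e_k) - F(abar - eta * e_k)) / (2 * eta)
print(closed, numeric)
```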
logistic loss: $x \mapsto \log_2(1 + e^{-x})$
hinge loss: $x \mapsto \max(1 - x, 0)$
zero-one loss: $x \mapsto 1_{x < 0}$
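All three convex surrogates dominate the zero-one loss, which is what makes them usable upper bounds for minimization; a quick check over a grid of margin values (the grid is an illustrative choice):

```python
import numpy as np

x = np.linspace(-3, 3, 601)           # x stands for the signed margin y f(x)
zero_one = (x < 0).astype(float)      # 1_{x<0}
boosting = np.exp(-x)                 # exponential loss used by AdaBoost
logistic = np.log2(1 + np.exp(-x))    # logistic loss
hinge = np.maximum(1 - x, 0)          # hinge loss
print(np.all(boosting >= zero_one),
      np.all(logistic >= zero_one),
      np.all(hinge >= zero_one))
```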
[Figure: error curves and the margin distribution graph for the letter dataset as reported by Schapire et al. (1998). Left: training and test error curves of the combined classifier vs. number of boosting rounds, with C4.5 decision trees. Right: cumulative distribution of margins.]
Rademacher Complexity of Convex Hulls
Theorem: Let $H$ be a set of functions mapping from $X$ to $\mathbb{R}$. Let the convex hull of $H$ be defined as
$$\mathrm{conv}(H) = \Big\{ \sum_{k=1}^p \mu_k h_k \;\colon\; p \ge 1,\; \mu_k \ge 0,\; \sum_{k=1}^p \mu_k \le 1,\; h_k \in H \Big\}.$$
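The reason the convex hull is no richer for Rademacher averages is that a linear functional over $\mathrm{conv}(H)$ attains its supremum at a single $h_k$ (or at the zero function, since the weights may sum to less than 1). A numeric sanity check of that fact for an assumed finite $H$ and one fixed draw of Rademacher signs:

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 10, 4
S = np.where(rng.random((p, m)) < 0.5, 1.0, -1.0)  # row k: values h_k(x_1),...,h_k(x_m)
sigma = np.where(rng.random(m) < 0.5, 1.0, -1.0)   # one draw of Rademacher variables
vertex_best = max(0.0, (S @ sigma).max())          # sup over H together with the zero function

hull_best = 0.0                                    # sup over random convex combinations
for _ in range(20000):
    mu = rng.random(p)
    mu = mu / mu.sum() * rng.random()               # random weights with sum <= 1
    hull_best = max(hull_best, sigma @ (mu @ S))
print(hull_best, vertex_best)
```

No sampled combination exceeds the best vertex, illustrating why taking convex hulls does not increase the Rademacher complexity.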
geometric margin: $\dfrac{|w \cdot \Phi(x)|}{\|w\|_2} = d_2(\Phi(x), \text{hyperpl.})$
$L_1$-margin: $\dfrac{|\alpha \cdot h(x)|}{\|\alpha\|_1} = d_\infty(h(x), \text{hyperpl.})$
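A small numeric sketch of the two margin formulas, with an assumed weight vector, feature point, and vector of base-hypothesis values (all values are illustrative):

```python
import numpy as np

w = np.array([3.0, -4.0])           # assumed weights defining a hyperplane w . z = 0
x = np.array([1.0, 2.0])            # an assumed (feature-mapped) point Phi(x)
geom_margin = abs(w @ x) / np.linalg.norm(w, 2)       # = d_2(Phi(x), hyperplane)

alpha = np.array([0.5, 1.5, 1.0])   # assumed base-hypothesis weights
h = np.array([1.0, 1.0, -1.0])      # assumed h(x) = (h_1(x), ..., h_N(x))
l1_margin = abs(alpha @ h) / np.linalg.norm(alpha, 1)  # = d_inf(h(x), hyperplane)
print(geom_margin, l1_margin)
```

The two margins use dual norm pairs: normalizing by $\|w\|_2$ measures $\ell_2$ distance, normalizing by $\|\alpha\|_1$ measures $\ell_\infty$ distance.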
• equivalent form:
$$\max_{p} \min_{j \in [1, n]} p^\top M e_j \;=\; \min_{q} \max_{i \in [1, m]} e_i^\top M q.$$
John von Neumann (1903 - 1957)
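The minimax equality can be checked by solving both players' linear programs for a small zero-sum game. This sketch assumes SciPy is available and uses matching pennies as the illustrative payoff matrix; both LPs should return the same game value.

```python
import numpy as np
from scipy.optimize import linprog

M = np.array([[1.0, -1.0],   # matching pennies: payoff to the row player
              [-1.0, 1.0]])
m, n = M.shape

# row player: maximize v subject to (p^T M)_j >= v for all j, p in the simplex
# variables z = (p_1, ..., p_m, v); minimize -v
A_ub = np.hstack([-M.T, np.ones((n, 1))])      # v - (p^T M)_j <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum_i p_i = 1
row = linprog(c=np.append(np.zeros(m), -1.0), A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=np.ones(1),
              bounds=[(0, None)] * m + [(None, None)], method="highs")

# column player: minimize u subject to (M q)_i <= u for all i, q in the simplex
A_ub2 = np.hstack([M, -np.ones((m, 1))])       # (M q)_i - u <= 0
A_eq2 = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
col = linprog(c=np.append(np.zeros(n), 1.0), A_ub=A_ub2, b_ub=np.zeros(m),
              A_eq=A_eq2, b_eq=np.ones(1),
              bounds=[(0, None)] * n + [(None, None)], method="highs")

maxmin = -row.fun
minmax = col.fun
print(maxmin, minmax)  # equal by von Neumann's minimax theorem
```

For matching pennies the common value is 0, attained by the uniform mixed strategy for both players.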
• G. Lebanon and J. Lafferty. Boosting and maximum likelihood for exponential models. In NIPS, pages 447–454, 2001.
• Ron Meir and Gunnar Rätsch. An introduction to boosting and leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003.
• Gunnar Rätsch and Manfred K. Warmuth. Maximizing the margin with boosting. In Proceedings of COLT, pages 334–350, 2002.
• Lev Reyzin and Robert E. Schapire. How boosting the margin can also boost classifier complexity. In ICML, pages 753–760, 2006.
• Robert E. Schapire and Yoav Freund. Boosting: Foundations and Algorithms. The MIT Press, 2012.
• Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.