Chapter 8
Ensemble Learning
Introduction
Bagging
Boosting
AdaBoost
Training Data
Example
What is the average house price?
From the population F, draw a sample L = (x1, x2, …, xn) and compute its average u.
Question: How reliable is u? What is the standard error of u? What is its confidence interval?
Solution: the bootstrap.
An example:
Original sample X = (3.12, 0, 1.57, 19.67, 0.22, 2.20), mean = 4.46
Bootstrap sample X1 = (1.57, 0.22, 19.67, 0, 0, 2.20, 3.12), mean = 4.13
Bootstrap sample X2 = (0, 2.20, 2.20, 2.20, 19.67, 1.57), mean = 4.64
Bootstrap sample X3 = (0.22, 3.12, 1.57, 3.12, 2.20, 0.22), mean = 1.74
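To make the bootstrap recipe concrete, here is a minimal Python sketch (my own illustration; the number of resamples B and the random seed are arbitrary choices, and the data are the six values from the example):

# Estimate the standard error of the sample mean u by resampling with replacement.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([3.12, 0.0, 1.57, 19.67, 0.22, 2.20])   # original sample, mean ~ 4.46

B = 1000                                              # number of bootstrap resamples
boot_means = np.array([rng.choice(X, size=len(X), replace=True).mean()
                       for _ in range(B)])

print("sample mean u:", X.mean())
print("bootstrap estimate of standard error:", boot_means.std(ddof=1))
print("95% percentile confidence interval:", np.percentile(boot_means, [2.5, 97.5]))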
Bagging: each training set is generated by sampling from the original data (indices 1–8) with replacement.

Original:        1 2 3 4 5 6 7 8
Training set 1:  2 7 8 3 7 6 3 1
Training set 2:  7 8 5 6 4 2 7 1
Training set 3:  3 6 2 7 5 6 2 2
Training set 4:  4 5 1 4 6 4 3 8
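The same resampling step is the heart of bagging. Below is a sketch of a bagged ensemble: bootstrap training sets are drawn exactly as in the table, one decision tree is fit per set, and predictions are combined by majority vote. The dataset, base learner, and ensemble size are placeholder choices, not part of the original slides:

# Bagging sketch: bootstrap training sets + majority vote over decision trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, random_state=0)

models = []
for b in range(25):                                   # 25 bootstrap training sets
    idx = rng.integers(0, len(X), size=len(X))        # sample indices with replacement
    models.append(DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx]))

votes = np.stack([m.predict(X) for m in models])      # one row of predictions per model
majority = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote for 0/1 labels
print("bagged training accuracy:", (majority == y).mean())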
Boosting: a family of methods.
Classifiers are produced sequentially.
Each classifier depends on the previous one and focuses on the previous one's errors.
Examples that were incorrectly predicted by earlier classifiers are chosen more often or weighted more heavily (see the sketch below).
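To make the reweighting idea concrete, here is a minimal, generic boosting-style loop; it is not yet AdaBoost, and the "double the weight" rule, base learner, and data are placeholder assumptions for illustration only:

# Generic boosting-style loop: each round fits a weak learner on weighted data
# and increases the weights of the examples it gets wrong.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
w = np.full(len(y), 1.0 / len(y))          # start with uniform sample weights

learners = []
for t in range(5):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    wrong = stump.predict(X) != y
    w[wrong] *= 2.0                         # focus the next round on the errors
    w /= w.sum()                            # renormalize to a distribution
    learners.append(stump)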
What is AdaBoost?
AdaBoost = Adaptive Boosting.
A learning algorithm that builds a strong classifier from many weak ones.
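For a quick practical feel, the snippet below uses scikit-learn's AdaBoostClassifier, which boosts shallow decision trees by default; the dataset and the n_estimators value are arbitrary choices for this demo:

# Off-the-shelf AdaBoost: many weak stumps combined into a strong classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))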
Weak classifiers, each only slightly better than random:
$h_1(x) \in \{-1, +1\},\; h_2(x) \in \{-1, +1\},\; \ldots,\; h_T(x) \in \{-1, +1\}$
Strong classifier:
$H_T(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
How to
– define features?
– select beneficial features?
– train the weak classifiers?
– manage (weight) the training samples?
– associate a weight with each weak classifier?
Schematic: train Weak Classifier 1; increase the weights of the misclassified examples; train Weak Classifier 2; increase the weights again; train Weak Classifier 3; and so on. The final classifier is a combination of the weak classifiers.
The AdaBoost algorithm:
Given: $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i \in X$ and $y_i \in \{-1, +1\}$
Initialization: $D_1(i) = \frac{1}{m}$, $i = 1, \ldots, m$
For $t = 1, \ldots, T$:
• Find the classifier $h_t : X \to \{-1, +1\}$ that minimizes the error with respect to $D_t$, i.e., $h_t = \arg\min_{h_j} \epsilon_j$, where $\epsilon_j = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)]$
• Weight the classifier: $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$
• Update the distribution: $D_{t+1}(i) = \frac{D_t(i)\,\exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is a normalization factor
Output the final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
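A from-scratch sketch of this loop, using exhaustive decision stumps as the weak classifiers h_t. Only the three update formulas come from the algorithm above; the helper names, the stump search, and the toy data are invented for illustration:

# Minimal AdaBoost with decision stumps as weak classifiers.
import numpy as np

def stump_predict(X, j, thr, polarity):
    """Decision stump: predict +1 on one side of a threshold, -1 on the other."""
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def best_stump(X, y, D):
    """Exhaustively pick the stump with the lowest weighted error under D."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = stump_predict(X, j, thr, polarity)
                err = np.sum(D * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, T=10):
    m = len(y)
    D = np.full(m, 1.0 / m)                        # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        err, j, thr, pol = best_stump(X, y, D)     # h_t minimizes the weighted error
        err = np.clip(err, 1e-12, 1 - 1e-12)       # keep the log well defined
        alpha = 0.5 * np.log((1 - err) / err)      # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        pred = stump_predict(X, j, thr, pol)
        D = D * np.exp(-alpha * y * pred)          # upweight mistakes, downweight hits
        D = D / D.sum()                            # divide by Z_t (normalization)
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in ensemble)
    return np.where(score >= 0, 1, -1)             # H(x) = sign(sum_t alpha_t h_t(x))

# Toy usage on a 1-D pattern that no single stump can fit.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1, 1])
print(predict(adaboost(X, y, T=10), X))            # combined prediction of the stumps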
How to determine $\alpha_t$?
Final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
Minimize the exponential loss: $\ell_{\exp}(H) = E_{x,y}\!\left[e^{-yH(x)}\right]$
Define $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ with $H_0(x) = 0$; then $H(x) = H_T(x)$.
$E_{x,y}\!\left[e^{-yH_t(x)}\right] = E_x\!\left[E_y\!\left[e^{-y[H_{t-1}(x) + \alpha_t h_t(x)]} \mid x\right]\right] = e^{-\alpha_t}\,P(y = h_t(x)) + e^{\alpha_t}\,P(y \neq h_t(x))$
Setting $\frac{\partial E_{x,y}[e^{-yH_t(x)}]}{\partial \alpha_t} = 0$ gives
$\alpha_t = \frac{1}{2}\ln\frac{P(y = h_t(x))}{P(y \neq h_t(x))} = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, where $P(x_i, y_i) = D_t(i)$ and $\epsilon_t = P(\text{error}) = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_t(x_i)]$.
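For completeness, the optimization step the slide skips can be written out (a standard calculation, added here for clarity):

\[
\frac{\partial}{\partial \alpha_t}\Big[e^{-\alpha_t} P(y = h_t(x)) + e^{\alpha_t} P(y \neq h_t(x))\Big]
= -e^{-\alpha_t}(1 - \epsilon_t) + e^{\alpha_t}\,\epsilon_t = 0
\;\Longrightarrow\;
e^{2\alpha_t} = \frac{1 - \epsilon_t}{\epsilon_t}
\;\Longrightarrow\;
\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}.
\]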
How to determine $h_t$ and the distribution $D_{t+1}$?
Final classifier: $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$; minimize $\ell_{\exp}(H) = E_{x,y}\!\left[e^{-yH(x)}\right]$, with $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$, $H_0(x) = 0$, and $H(x) = H_T(x)$.
$E_{x,y}\!\left[e^{-yH_t(x)}\right] = E_{x,y}\!\left[e^{-yH_{t-1}(x)}\, e^{-y\alpha_t h_t(x)}\right] \approx E_{x,y}\!\left[e^{-yH_{t-1}(x)}\left(1 - y\alpha_t h_t(x) + \tfrac{1}{2}\alpha_t^2 y^2 h_t^2(x)\right)\right]$
$h_t = \arg\min_h E_{x,y}\!\left[e^{-yH_{t-1}(x)}\left(1 - y\alpha_t h(x) + \tfrac{1}{2}\alpha_t^2\right)\right]$, using $y^2 h^2(x) = 1$
$h_t = \arg\min_h E_x\!\left[E_y\!\left[e^{-yH_{t-1}(x)}\left(1 - y\alpha_t h(x) + \tfrac{1}{2}\alpha_t^2\right) \,\middle|\, x\right]\right]$
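A bridging step the slides leave implicit (added for clarity, following the standard derivation): the terms that do not involve $h$ are constant in the minimization, so

\[
h_t = \arg\max_h\, E_x\!\left[E_y\!\left[e^{-yH_{t-1}(x)}\, y\, h(x) \,\middle|\, x\right]\right],
\]

and since $h(x) \in \{-1, +1\}$, the maximum is attained pointwise by choosing $h(x)$ with the same sign as the inner conditional expectation, which gives the expression below.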
The minimizer is
$h_t(x) = \operatorname{sign}\!\left(E_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}\!\left[\,y \mid x\,\right]\right) = \operatorname{sign}\!\left(P_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = 1 \mid x) - P_{x,y \sim e^{-yH_{t-1}(x)} P(y|x)}(y = -1 \mid x)\right)$
In other words, at time $t$ the pair $(x, y)$ is effectively drawn from the reweighted distribution $e^{-yH_{t-1}(x)}\, P(y|x)$.
At time $t$: $x, y \sim e^{-yH_{t-1}(x)}\, P(y|x)$
At time $1$: $x, y \sim P(y|x)$, with $P(y_i|x_i) = \frac{1}{m} = D_1(i)$
At time $t+1$: $x, y \sim e^{-yH_t(x)}\, P(y|x) \propto D_t(i)\, e^{-\alpha_t y h_t(x)}$
Hence the distribution update: $D_{t+1}(i) = \frac{D_t(i)\,\exp[-\alpha_t y_i h_t(x_i)]}{Z_t}$, where $Z_t$ is a normalization factor.
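As a quick numeric sanity check of this update (my own illustration, not from the slides): under $D_{t+1}$, the classifier $h_t$ has weighted error exactly 1/2, so the next round is forced to find a different weak classifier. Toy numbers are invented:

# With alpha_t = 1/2 ln((1-eps)/eps), the updated distribution D_{t+1}
# gives h_t a weighted error of exactly 0.5.
import numpy as np

rng = np.random.default_rng(0)
m = 10
y = rng.choice([-1, 1], size=m)              # true labels
h = y.copy(); h[:3] *= -1                    # h_t misclassifies the first 3 examples
D = np.full(m, 1.0 / m)                      # D_t (uniform here for simplicity)

eps = np.sum(D * (h != y))                   # eps_t = 0.3
alpha = 0.5 * np.log((1 - eps) / eps)
D_next = D * np.exp(-alpha * y * h)
D_next /= D_next.sum()                       # Z_t normalization

print(np.sum(D_next * (h != y)))             # prints 0.5 (up to floating point)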
Successful applications of ensembles: the Netflix Prize.
Any questions?