
AdaBoost

Jiri Matas and Jan Šochman

Centre for Machine Perception


Czech Technical University, Prague
http://cmp.felk.cvut.cz
Presentation Outline

  AdaBoost algorithm
  • Why is it of interest?
  • How does it work?
  • Why does it work?

  AdaBoost variants

  AdaBoost with a Totally Corrective Step (TCS)

  Experiments with a Totally Corrective Step
Introduction

  1990 – Boost-by-majority algorithm (Freund)
  1995 – AdaBoost (Freund & Schapire)
  1997 – Generalized version of AdaBoost (Schapire & Singer)
  2001 – AdaBoost in Face Detection (Viola & Jones)

Interesting properties:

  AB is a linear classifier with all its desirable properties.
  AB output converges to the logarithm of the likelihood ratio.
  AB has good generalization properties.
  AB is a feature selector with a principled strategy (minimisation of an upper
  bound on the empirical error).
  AB is close to sequential decision making (it produces a sequence of gradually
  more complex classifiers).
What is AdaBoost?

AdaBoost is an algorithm for constructing a "strong" classifier as a linear
combination

    f(x) = Σ_{t=1}^{T} α_t h_t(x)

of "simple" "weak" classifiers h_t(x).

Terminology

  h_t(x) ... "weak" or basis classifier, hypothesis, "feature"
  H(x) = sign(f(x)) ... "strong" or final classifier/hypothesis

Comments

  The h_t(x)'s can be thought of as features.
  Often (typically) the set H = {h(x)} is infinite.
(Discrete) AdaBoost Algorithm – Schapire & Singer (1997)

Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, 1}

Initialize weights D_1(i) = 1/m

For t = 1, ..., T:
  1. Call WeakLearn, which returns the weak classifier h_t : X → {−1, 1} with
     minimum error w.r.t. the distribution D_t;
  2. Choose α_t ∈ R,
  3. Update
         D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t
     where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.

Output the strong classifier:

    H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

Comments

  The computational complexity of selecting h_t is independent of t.
  All information about previously selected "features" is captured in D_t!
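The loop above translates almost line for line into code. Below is a minimal NumPy sketch of Discrete AdaBoost using an exhaustive decision-stump learner as WeakLearn; the stump representation and helper names are illustrative choices, not part of the original slides.

```python
import numpy as np

def train_stump(X, y, D):
    """Illustrative WeakLearn: exhaustive search for the decision stump
    (feature, threshold, polarity) with the smallest weighted error on D."""
    best = None
    for j in range(X.shape[1]):                   # feature index
        for thr in np.unique(X[:, j]):            # candidate thresholds
            for s in (+1, -1):                    # polarity
                pred = s * np.where(X[:, j] >= thr, 1, -1)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best                                    # (epsilon, feature, threshold, polarity)

def stump_predict(params, X):
    j, thr, s = params
    return s * np.where(X[:, j] >= thr, 1, -1)

def adaboost(X, y, T=40):
    """Discrete AdaBoost: y in {-1, +1}; returns [(alpha_t, stump_t), ...]."""
    m = len(y)
    D = np.full(m, 1.0 / m)                        # D_1(i) = 1/m
    ensemble = []
    for _ in range(T):
        eps, j, thr, s = train_stump(X, y, D)
        if eps >= 0.5:                             # prerequisite epsilon_t < 1/2
            break
        h = stump_predict((j, thr, s), X)
        r = np.sum(D * h * y)                      # r_t = sum_i D_t(i) h_t(x_i) y_i
        alpha = 0.5 * np.log((1 + r) / max(1 - r, 1e-12))
        D *= np.exp(-alpha * y * h)                # reweighting step
        D /= D.sum()                               # normalisation by Z_t
        ensemble.append((alpha, (j, thr, s)))
    return ensemble

def predict(ensemble, X):
    """Strong classifier H(x) = sign(sum_t alpha_t h_t(x))."""
    f = sum(alpha * stump_predict(p, X) for alpha, p in ensemble)
    return np.sign(f)
```

Stumps are used here only to keep the sketch self-contained; any learner that returns some h : X → {−1, 1} with ε_t < 1/2 fits the same loop.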



WeakLearn

Loop step: Call WeakLearn, given distribution D_t;
returns weak classifier h_t : X → {−1, 1} from H = {h(x)}

  Select the weak classifier with the smallest weighted error
      h_t = arg min_{h_j ∈ H} ε_j,   where ε_j = Σ_{i=1}^{m} D_t(i) [y_i ≠ h_j(x_i)]

  Prerequisite: ε_t < 1/2 (otherwise stop)

WeakLearn examples:
  • Decision tree builder, perceptron learning rule – H infinite
  • Selecting the best one from a given finite set H

Demonstration example

  Training set: • ~ N(0, 1),   • with radius r ~ (1/√(8π)) e^{−(r−4)²/2}
  Weak classifier = perceptron

[Figure: the two-class 2-D training set used in the demonstration.]
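A minimal sketch of the second WeakLearn option above (picking the best classifier from a given finite set H by weighted error); representing H as a list of callables is an assumption made only for illustration.

```python
import numpy as np

def weaklearn(H, X, y, D):
    """Pick from a finite set H the classifier with the smallest weighted error
    sum_i D(i)[y_i != h(x_i)]. H is a list of callables mapping X to {-1, +1}."""
    errors = np.array([D[h(X) != y].sum() for h in H])
    best = int(np.argmin(errors))
    return H[best], errors[best]   # the caller stops boosting if this error >= 1/2
```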
AdaBoost as a Minimiser of an Upper Bound on the Empirical Error

  The main objective is to minimize ε_tr = (1/m) |{i : H(x_i) ≠ y_i}|

  It can be upper bounded by ε_tr(H) ≤ Π_{t=1}^{T} Z_t

How to set α_t?

  Select α_t to greedily minimize Z_t(α) in each step
  Z_t(α) is a convex differentiable function with one extremum
  ⇒ for h_t(x) ∈ {−1, 1} the optimal α_t = (1/2) log((1 + r_t)/(1 − r_t)),
    where r_t = Σ_{i=1}^{m} D_t(i) h_t(x_i) y_i
  Z_t = 2 √(ε_t (1 − ε_t)) ≤ 1 for the optimal α_t
  ⇒ justification of selecting h_t according to ε_t

Comments

  The process of selecting α_t and h_t(x) can be interpreted as a single
  optimization step minimising the upper bound on the empirical error.
  Improvement of the bound is guaranteed, provided that ε_t < 1/2.
  The process can be interpreted as a component-wise local optimization
  (Gauss-Southwell iteration) in the (possibly infinite-dimensional!) space of
  ᾱ = (α_1, α_2, ...), starting from ᾱ_0 = (0, 0, ...).
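For completeness, the closed-form α_t follows from a two-line calculation (a supplementary derivation, not shown on the original slide): with ε_t the weighted error of h_t,

```latex
\begin{aligned}
Z_t(\alpha) &= \sum_{i=1}^{m} D_t(i)\, e^{-\alpha y_i h_t(x_i)}
            = (1-\varepsilon_t)\, e^{-\alpha} + \varepsilon_t\, e^{\alpha},\\
\frac{\mathrm{d}Z_t}{\mathrm{d}\alpha}\bigg|_{\alpha_t} = 0
  \;&\Rightarrow\; \alpha_t = \tfrac{1}{2}\log\frac{1-\varepsilon_t}{\varepsilon_t}
   = \tfrac{1}{2}\log\frac{1+r_t}{1-r_t}
   \qquad (r_t = 1 - 2\varepsilon_t \ \text{for}\ h_t \in \{-1,1\}),\\
Z_t(\alpha_t) &= 2\sqrt{\varepsilon_t(1-\varepsilon_t)} \;\le\; 1 .
\end{aligned}
```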
Reweighting

Effect on the training set

Reweighting formula:

    D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t
               = exp(−y_i Σ_{q=1}^{t} α_q h_q(x_i)) / (m Π_{q=1}^{t} Z_q)

    exp(−α_t y_i h_t(x_i))  is  < 1 if y_i = h_t(x_i),   > 1 if y_i ≠ h_t(x_i)

⇒ Increase (decrease) the weight of wrongly (correctly) classified examples.
  The weight is the upper bound on the error of a given example!

[Figure: err as a function of y f(x) – the exponential weight e^{−y f(x)} upper-bounds the 0/1 error.]

Effect on h_t

  α_t minimizes Z_t ⇒
      Σ_{i: h_t(x_i)=y_i} D_{t+1}(i) = Σ_{i: h_t(x_i)≠y_i} D_{t+1}(i)
  The error of h_t on D_{t+1} is 1/2
  The next weak classifier is the most "independent" one

[Figure: the weighted error e of h_t evaluated on D_{t+1}, equal to 0.5, plotted for t = 1, 2, 3, 4.]
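A tiny numerical check of the last two claims (an illustrative sketch, not from the slides): after the reweighting step with the optimal α_t, the mistakes and the correct predictions of h_t carry equal total weight, so its error on D_{t+1} is exactly 1/2.

```python
import numpy as np

y = np.array([ 1,  1,  1, -1, -1, -1])   # labels
h = np.array([ 1,  1, -1, -1, -1,  1])   # weak classifier outputs (2 mistakes)
D = np.full(6, 1/6)                       # current distribution D_t

eps   = D[h != y].sum()                   # epsilon_t = 1/3
alpha = 0.5 * np.log((1 - eps) / eps)     # optimal alpha_t
D_new = D * np.exp(-alpha * y * h)        # reweighting
D_new /= D_new.sum()                      # normalise by Z_t

print(D_new[h != y].sum())                # -> 0.5: error of h_t on D_{t+1}
```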
Summary of the Algorithm

Initialization: D_1(i) = 1/m

For t = 1, ..., T:

  Find h_t = arg min_{h_j ∈ H} ε_j = Σ_{i=1}^{m} D_t(i) [y_i ≠ h_j(x_i)]
  If ε_t ≥ 1/2 then stop
  Set α_t = (1/2) log((1 + r_t)/(1 − r_t))
  Update
      D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t

Output the final classifier:

    H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )

[Figures: snapshots of the demonstration example for t = 1, 2, ..., 7 and t = 40, together with a plot over boosting iterations 0–40 (y-axis 0–0.35).]
Does AdaBoost generalize?

Margins in SVM

    max min_{(x,y)∈S}  y(ᾱ · h̄(x)) / ||ᾱ||_2

Margins in AdaBoost

    max min_{(x,y)∈S}  y(ᾱ · h̄(x)) / ||ᾱ||_1

Maximizing margins in AdaBoost

    P_S[y f(x) ≤ θ] ≤ 2^T Π_{t=1}^{T} √(ε_t^{1−θ} (1 − ε_t)^{1+θ})    where f(x) = ᾱ · h̄(x) / ||ᾱ||_1

Upper bounds based on margin

    P_D[y f(x) ≤ 0] ≤ P_S[y f(x) ≤ θ] + O( (1/√m) ( d log²(m/d)/θ² + log(1/δ) )^{1/2} )
AdaBoost variants

Freund & Schapire 1995

  Discrete (h : X → {0, 1})
  Multiclass AdaBoost.M1 (h : X → {0, 1, ..., k})
  Multiclass AdaBoost.M2 (h : X → [0, 1]^k)
  Real-valued AdaBoost.R (Y = [0, 1], h : X → [0, 1])

Schapire & Singer 1997

  Confidence-rated prediction (h : X → R, two-class)
  Multilabel AdaBoost.MR, AdaBoost.MH (different formulation of the minimized loss)

... Many other modifications since then (Totally Corrective AB, Cascaded AB)
Pros and cons of AdaBoost

Advantages

  Very simple to implement
  Feature selection on very large sets of features
  Fairly good generalization

Disadvantages

  Suboptimal solution for ᾱ
  Can overfit in the presence of noise
AdaBoost with a Totally Corrective Step (TCA)

Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, 1}

Initialize weights D_1(i) = 1/m

For t = 1, ..., T:
  1. Call WeakLearn, which returns the weak classifier h_t : X → {−1, 1} with
     minimum error w.r.t. the distribution D_t;
  2. Choose α_t ∈ R,
  3. Update D_{t+1}
  4. Totally corrective step: call WeakLearn on the set of already selected h's
     with non-zero α's, update ᾱ and update D_{t+1};
     repeat until |ε_q − 1/2| < δ for all selected classifiers q.

Comments

  All weak classifiers have ε_q ≈ 1/2, therefore the classifier selected at t + 1 is
  "independent" of all classifiers selected so far.
  It can easily be shown that the totally corrective step reduces the upper
  bound on the empirical error without increasing classifier complexity.
  The TCA was first proposed by Kivinen and Warmuth, but their α_t is set as in
  standard AdaBoost.
  Generalization of the TCA is an open question.
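A minimal sketch of how such a totally corrective step could be implemented, assuming each selected classifier's α is re-tuned with the same closed-form rule as in standard AdaBoost; the greedy cyclic update and all helper names are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def totally_corrective_step(H_sel, alphas, X, y, delta=0.01, max_iter=1000):
    """Illustrative sketch of a totally corrective step: re-tune the alphas of the
    already selected weak classifiers until each has weighted error within
    delta of 1/2 on the distribution induced by the current ensemble.
    H_sel is a list of callables h(X) -> {-1, +1}."""
    preds = np.array([h(X) for h in H_sel])            # h_q(x_i), shape (Q, m)
    alphas = np.asarray(alphas, dtype=float).copy()

    for _ in range(max_iter):
        D = np.exp(-y * (alphas @ preds))              # unnormalised weights
        D /= D.sum()
        errs = np.array([D[p != y].sum() for p in preds])
        if np.all(np.abs(errs - 0.5) < delta):         # stop: |eps_q - 1/2| < delta for all q
            break
        q = int(np.argmax(np.abs(errs - 0.5)))         # most "correctable" classifier
        r = np.clip(np.sum(D * preds[q] * y), -0.999, 0.999)
        alphas[q] += 0.5 * np.log((1 + r) / (1 - r))   # same closed form as plain AdaBoost
    return alphas
```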
Experiments with TCA on the IDA Database

  Discrete AdaBoost, Real AdaBoost, and Discrete and Real TCA evaluated
  Weak learner: stumps.
  Data from the IDA repository (Ratsch:2000):

                     Input      Training   Testing    Number of
                     dimension  patterns   patterns   realizations
    Banana               2         400       4900        100
    Breast cancer        9         200         77        100
    Diabetes             8         468        300        100
    German              20         700        300        100
    Heart               13         170        100        100
    Image segment       18        1300       1010         20
    Ringnorm            20         400       7000        100
    Flare solar          9         666        400        100
    Splice              60        1000       2175         20
    Thyroid              5         140         75        100
    Titanic              3         150       2051        100
    Twonorm             20         400       7000        100
    Waveform            21         400       4600        100

  Note that the training sets are fairly small.
Results with TCA on the IDA Database

  Training error (dashed line), test error (solid line)
  Discrete AdaBoost (blue), Real AdaBoost (green),
  Discrete AdaBoost with TCA (red), Real AdaBoost with TCA (cyan)
  The black horizontal line: the error of AdaBoost with RBF-network weak classifiers from (Ratsch-ML:2000)

[Figures: one plot per dataset (Image, Flare solar, German, Ringnorm, Splice, Thyroid, Titanic, Banana, Breast cancer, Diabetes, Heart) – error versus the length of the strong classifier, 10^0 to 10^3 weak classifiers.]
Conclusions

  The AdaBoost algorithm was presented and analysed.
  A modification of the Totally Corrective AdaBoost was introduced.
  Initial tests show that the TCA outperforms AB on some standard data sets.