RESAMPLING TECHNIQUES
Bootstrap algorithm
1. Generate B bootstrap samples B1 , . . . , BB . Each bootstrap sample is obtained
by sampling n times with replacement from the sample data. (Note: Data points
can appear multiple times in any Bb .)
2. Evaluate the estimator on each bootstrap sample:
Ŝb := Ŝ(Bb )
Bootstrap algorithm
1. For b = 1, . . . , B, generate a bootstrap sample Bb . In detail:
For i = 1, . . . , n:
I Sample an index j ∈ {1, . . . , n}.
I Set x̃i^(b) := x̃j and add it to Bb.
2. For each b, compute mean and variance estimates:
µ̂b := (1/n) ∑_{i=1}^n x̃i^(b)        σ̂b² := (1/n) ∑_{i=1}^n (x̃i^(b) − µ̂b)²
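A minimal sketch of this procedure in Python/NumPy; the function name bootstrap_mean_var and its arguments are illustrative, not part of the slides:

```python
import numpy as np

def bootstrap_mean_var(x, B=1000, seed=None):
    """Bootstrap estimates of the mean and variance of a 1-d sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    mu_hat = np.empty(B)
    sigma2_hat = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # sample n indices with replacement
        xb = x[idx]                        # bootstrap sample B_b
        mu_hat[b] = xb.mean()              # mu_hat_b
        sigma2_hat[b] = xb.var()           # sigma_hat_b^2 = (1/n) sum (x - mean)^2
    return mu_hat, sigma2_hat
```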
Variance reduction
I Averaging over the individual bootstrap samples can reduce the variance in Ŝ.
I In other words: the averaged estimator ŜBS := (1/B) ∑_{b=1}^B Ŝb typically has lower variance than Ŝ.
I This is the property we will use for classification in the following.
Idea
I Recall Boosting: Weak learners are deterministic, but selected to exhibit high
variance.
I Strategy now: Randomly distort data set by resampling.
I Train weak learners on resampled training sets.
I Resulting algorithm: Bagging (= Bootstrap aggregation)
If each weak learner casts its vote yi as a unit vector in {0, 1}^K (a 1 at the position of the chosen class and 0 elsewhere), then averaging n such votes gives the proportion of votes for each class:

(1/n) (y1 + . . . + yn) = (p1 , . . . , pk , . . . , pK)ᵀ
Classification
For a data point x:
I Compute the average over the weak learners,
favg(x) := (1/B) ∑_{b=1}^B fb(x) .
I For any x, this is a vector of the form favg(x) = (p1(x), . . . , pK(x)).
I The Bagging classifier is given by
fBagging(x) := arg maxk {p1(x), . . . , pK(x)} ,
i.e. we predict the class label which most weak learners have voted for.
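A minimal sketch of the voting step, under the assumption that each weak learner returns its vote fb(x) as a one-hot vector of length K (the function bagging_predict and its arguments are illustrative):

```python
import numpy as np

def bagging_predict(weak_learners, x, K):
    """Average the one-hot votes f_b(x) of B weak learners and take the majority."""
    votes = np.zeros(K)
    for f in weak_learners:
        votes += f(x)                        # f(x): one-hot vector of length K
    p_avg = votes / len(weak_learners)       # f_avg(x) = (p_1(x), ..., p_K(x))
    return int(np.argmax(p_avg))             # class most weak learners voted for
```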
I Two classes, each with a Gaussian distribution in R⁵.
[Figure: Trees trained on individual bootstrap samples; the panels b = 9, b = 10, b = 11 show first splits x.1 < 0.395, x.1 < 0.555 and x.1 < 0.555, i.e. the split varies across bootstrap samples. Source: Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap. 8.]
EXAMPLE: BAGGING TREES
[Figure: ESL Figure 8.10, error curves for the bagging example of Figure 8.9. Test error (roughly 0.20 to 0.50) of the original tree and the bagged trees (consensus vote and averaged probabilities), with the Bayes error as reference.]
I "Original tree" = single tree trained on the original data.
I The orange dots correspond to the bagging classifier.
RANDOM FORESTS
Random Forests
Modification of bagging with trees, designed to further reduce the correlation between the individual trees.
I Tree training optimizes each split over all dimensions.
I Random forests choose a different subset of dimensions at each split.
I Optimal split is chosen within the subset.
I The subset is chosen at random out of all dimensions {1, . . . , d}.
Training
Input: Training sample (x̃1 , ỹ1 ), . . . , (x̃n , ỹn ).
Input parameter: m (positive integer with m < d)
For b = 1, . . . , B:
1. Draw a bootstrap sample Bb of size n from training data.
(That is: Sample n times with replacement from training data.)
2. Train a tree classifier fb on Bb , where each split is computed as follows:
I Select m axes in Rd at random.
I Find the best split (j∗ , t∗ ) on this subset of dimensions.
I Split current node along axis j∗ at t∗ .
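A sketch of the split search at a single node under these rules; the helper best_threshold, which scores candidate thresholds on one axis, and all other names are illustrative assumptions:

```python
import numpy as np

def random_subset_split(X, y, m, rng, best_threshold):
    """Best axis-parallel split at one node, restricted to m randomly chosen axes."""
    d = X.shape[1]
    axes = rng.choice(d, size=m, replace=False)    # random subset of the d dimensions
    j_star, t_star, best_score = None, None, -np.inf
    for j in axes:
        t, score = best_threshold(X[:, j], y)      # best split point on axis j
        if score > best_score:
            j_star, t_star, best_score = j, t, score
    return j_star, t_star                          # split current node along axis j* at t*
```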
Classification
Exactly as for bagging: Classify by majority vote among the B trees. More precisely:
I Compute favg(x) := (p1(x), . . . , pK(x)) := (1/B) ∑_{b=1}^B fb(x).
I The Random Forest classification rule is the same as for bagging:
fRF(x) := arg maxk {p1(x), . . . , pK(x)} .
Remarks
I Recommended value for m is m = ⌊√d⌋ or smaller.
I RF typically achieves results similar to boosting. It is implemented in most packages, often as a standard classifier.
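For example, in scikit-learn the recommended choice m = ⌊√d⌋ corresponds to max_features="sqrt" in RandomForestClassifier; the data arrays below are placeholders:

```python
from sklearn.ensemble import RandomForestClassifier

# B = 100 trees; each split is optimized over a random subset of sqrt(d) features.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
rf.fit(X_train, y_train)        # X_train: n x d array, y_train: n class labels
y_pred = rf.predict(X_test)     # majority vote over the trees
```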
Example: Synthetic Data
I This is the RF classification boundary on the synthetic data we have already seen a few times.
I Note the bias towards axis-parallel alignment.
[Figure: "Random Forest Classifier" decision boundary on the two-class synthetic data. Training Error: 0.000, Test Error: 0.238, Bayes Error: 0.210.]
Linear Classifiers
Approaches to multiclass classification with linear classifiers:
I One-versus-one classification.
I One-versus-all (more precisely: one-versus-the-rest) classification.
I Multiclass discriminants.
As discussed below, the SVM is particularly problematic in this respect.
One-versus-the-rest:
I Positive class = Ck , negative class = ∪_{j≠k} Cj .
I Problem: Ambiguous regions (green in figure).
One-versus-one:
I Classify by majority vote.
I Problem again: Ambiguous regions.
[Figure: Decision regions R1, R2, R3 for classes C1, C2, C3 under one-versus-the-rest (regions "C1 / not C1", "C2 / not C2") and one-versus-one; the region marked "?" is ambiguous.]
MULTICLASS DISCRIMINANTS
Linear classifier
I Recall: The decision rule is f(x) = sgn(⟨x, vH⟩ − c).
I Idea: Combine the K classifiers before computing the sign. Define
gk(x) := ⟨x, vk⟩ − ck
and classify x as arg maxk gk(x).
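A minimal sketch of this rule, assuming the K vectors vk are stacked as rows of a matrix V and the offsets ck form a vector c (all names are illustrative):

```python
import numpy as np

def classify_multiclass(x, V, c):
    """Multiclass linear discriminant: pick the class with the largest g_k(x)."""
    g = V @ x - c          # g_k(x) = <x, v_k> - c_k;  V is K x d, c has length K
    return int(np.argmax(g))
```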
Problem
I Multiclass discriminant idea: Compare the distances to the hyperplanes.
I This works if the vectors vk orthogonal to the hyperplanes are normalized.
I SVM: The K classifiers in the multiclass discriminant approach are trained on separate problems, so the lengths of the vk computed by the max-margin algorithm are not comparable.
Workarounds
I Often: One-against-all approaches.
I It is possible to define a single optimization problem for all classes, but training time then scales quadratically in the number of classes.
[Figure: Classification boundary on the synthetic two-class data. Training Error: 0.160, Test Error: 0.218, Bayes Error: 0.210.]