RESAMPLING TECHNIQUES
Bootstrap algorithm
1. Generate B bootstrap samples B1 , . . . , BB . Each bootstrap sample is obtained
by sampling n times with replacement from the sample data. (Note: Data points
can appear multiple times in any Bb .)
2. Evaluate the estimator on each bootstrap sample:
Ŝb := Ŝ(Bb )
Bootstrap algorithm
1. For b = 1, . . . , B, generate a bootstrap sample Bb . In detail:
For i = 1, . . . , n:
I Sample an index j ∈ {1, . . . , n}.
I Set x̃i^(b) := x̃j and add it to Bb.
2. For each b, compute mean and variance estimates:
µ̂b := (1/n) ∑_{i=1}^n x̃i^(b)        σ̂b² := (1/n) ∑_{i=1}^n (x̃i^(b) − µ̂b)²
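A minimal sketch of this procedure in Python/NumPy; the function name bootstrap_mean_var and its arguments are illustrative, not part of the slides:

```python
import numpy as np

def bootstrap_mean_var(x, B=1000, seed=None):
    """Bootstrap estimates of the mean and variance of a 1-d sample x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    mu_hat = np.empty(B)
    sigma2_hat = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # sample n indices with replacement
        xb = x[idx]                        # bootstrap sample B_b
        mu_hat[b] = xb.mean()              # mu_hat_b
        sigma2_hat[b] = xb.var()           # sigma_hat_b^2 = (1/n) sum (x - mean)^2
    return mu_hat, sigma2_hat
```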
Variance reduction
I Averaging over the individual bootstrap samples can reduce the variance in Ŝ.
I In other words: the averaged estimator ŜBS := (1/B) ∑_{b=1}^B Ŝb typically has lower variance than Ŝ.
I This is the property we will use for classification in the following.
Idea
I Recall Boosting: Weak learners are deterministic, but selected to exhibit high
variance.
I Strategy now: Randomly distort data set by resampling.
I Train weak learners on resampled training sets.
I Resulting algorithm: Bagging (= Bootstrap aggregation)
If each weak learner casts its vote yi as a unit vector in {0, 1}^K (a 1 at the position of the chosen class and 0 elsewhere), then averaging n such votes gives the proportion of votes for each class:

(1/n) (y1 + . . . + yn) = (p1 , . . . , pk , . . . , pK)ᵀ
Classification
For a data point x:
I Compute the average over the weak learners,
favg(x) := (1/B) ∑_{b=1}^B fb(x) .
I For any x, this is a vector of the form favg(x) = (p1(x), . . . , pK(x)).
I The Bagging classifier is given by
fBagging(x) := arg maxk {p1(x), . . . , pK(x)} ,
i.e. we predict the class label which most weak learners have voted for.
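A minimal sketch of the voting step, under the assumption that each weak learner returns its vote fb(x) as a one-hot vector of length K (the function bagging_predict and its arguments are illustrative):

```python
import numpy as np

def bagging_predict(weak_learners, x, K):
    """Average the one-hot votes f_b(x) of B weak learners and take the majority."""
    votes = np.zeros(K)
    for f in weak_learners:
        votes += f(x)                        # f(x): one-hot vector of length K
    p_avg = votes / len(weak_learners)       # f_avg(x) = (p_1(x), ..., p_K(x))
    return int(np.argmax(p_avg))             # class most weak learners voted for
```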
I Two classes, each with a Gaussian distribution in R⁵.
[Figure: Trees trained on individual bootstrap samples; the panels b = 9, b = 10, b = 11 show first splits x.1 < 0.395, x.1 < 0.555 and x.1 < 0.555, i.e. the split varies across bootstrap samples. Source: Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap. 8.]
EXAMPLE: BAGGING TREES
[Figure: ESL Figure 8.10, error curves for the bagging example of Figure 8.9. Test error (roughly 0.20 to 0.50) of the original tree and the bagged trees (consensus vote and averaged probabilities), with the Bayes error as reference.]
I "Original tree" = single tree trained on the original data.
I The orange dots correspond to the bagging classifier.
RANDOM FORESTS
Random Forests
Modification of bagging with trees, designed to further reduce the correlation between the individual trees.
I Tree training optimizes each split over all dimensions.
I Random forests choose a different subset of dimensions at each split.
I Optimal split is chosen within the subset.
I The subset is chosen at random out of all dimensions {1, . . . , d}.
Training
Input: Training sample (x̃1 , ỹ1 ), . . . , (x̃n , ỹn ).
Input parameter: m (positive integer with m < d)
For b = 1, . . . , B:
1. Draw a bootstrap sample Bb of size n from training data.
(That is: Sample n times with replacement from training data.)
2. Train a tree classifier fb on Bb , where each split is computed as follows:
I Select m axes in Rd at random.
I Find the best split (j∗ , t∗ ) on this subset of dimensions.
I Split current node along axis j∗ at t∗ .
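A sketch of the split search at a single node under these rules; the helper best_threshold, which scores candidate thresholds on one axis, and all other names are illustrative assumptions:

```python
import numpy as np

def random_subset_split(X, y, m, rng, best_threshold):
    """Best axis-parallel split at one node, restricted to m randomly chosen axes."""
    d = X.shape[1]
    axes = rng.choice(d, size=m, replace=False)    # random subset of the d dimensions
    j_star, t_star, best_score = None, None, -np.inf
    for j in axes:
        t, score = best_threshold(X[:, j], y)      # best split point on axis j
        if score > best_score:
            j_star, t_star, best_score = j, t, score
    return j_star, t_star                          # split current node along axis j* at t*
```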
Classification
Exactly as for bagging: Classify by majority vote among the B trees. More precisely:
I Compute favg(x) := (p1(x), . . . , pK(x)) := (1/B) ∑_{b=1}^B fb(x).
I The Random Forest classification rule is the same as for bagging:
fRF(x) := arg maxk {p1(x), . . . , pK(x)} .
Remarks
I Recommended value for m is m = ⌊√d⌋ or smaller.
I RF typically achieves results similar to boosting. It is implemented in most packages, often as a standard classifier.
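For example, in scikit-learn the recommended choice m = ⌊√d⌋ corresponds to max_features="sqrt" in RandomForestClassifier; the data arrays below are placeholders:

```python
from sklearn.ensemble import RandomForestClassifier

# B = 100 trees; each split is optimized over a random subset of sqrt(d) features.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
rf.fit(X_train, y_train)        # X_train: n x d array, y_train: n class labels
y_pred = rf.predict(X_test)     # majority vote over the trees
```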
Example: Synthetic Data
I This is the RF classification boundary on the synthetic data we have already seen a few times.
I Note the bias towards axis-parallel alignment.
[Figure: "Random Forest Classifier" decision boundary on the two-class synthetic data. Training Error: 0.000, Test Error: 0.238, Bayes Error: 0.210.]
Linear Classifiers
Approaches to multiclass classification with linear classifiers:
I One-versus-one classification.
I One-versus-all (more precisely: one-versus-the-rest) classification.
I Multiclass discriminants.
As discussed below, the SVM is particularly problematic in this respect.
One-versus-the-rest:
I Positive class = Ck , negative class = ∪_{j≠k} Cj .
I Problem: Ambiguous regions (green in figure).
One-versus-one:
I Classify by majority vote.
I Problem again: Ambiguous regions.
[Figure: Decision regions R1, R2, R3 for classes C1, C2, C3 under one-versus-the-rest (regions "C1 / not C1", "C2 / not C2") and one-versus-one; the region marked "?" is ambiguous.]
MULTICLASS DISCRIMINANTS
Linear classifier
I Recall: The decision rule is f(x) = sgn(⟨x, vH⟩ − c).
I Idea: Combine the K classifiers before computing the sign. Define
gk(x) := ⟨x, vk⟩ − ck
and classify x as arg maxk gk(x).
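A minimal sketch of this rule, assuming the K vectors vk are stacked as rows of a matrix V and the offsets ck form a vector c (all names are illustrative):

```python
import numpy as np

def classify_multiclass(x, V, c):
    """Multiclass linear discriminant: pick the class with the largest g_k(x)."""
    g = V @ x - c          # g_k(x) = <x, v_k> - c_k;  V is K x d, c has length K
    return int(np.argmax(g))
```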
Problem
I Multiclass discriminant idea: Compare the distances to the hyperplanes.
I This works if the vectors vk orthogonal to the hyperplanes are normalized.
I SVM: The K classifiers in the multiclass discriminant approach are trained on separate problems, so the lengths of the vk computed by the max-margin algorithm are not comparable.
Workarounds
I Often: One-against-all approaches.
I It is possible to define a single optimization problem for all classes, but training time then scales quadratically in the number of classes.
[Figure: Classification boundary on the synthetic two-class data. Training Error: 0.160, Test Error: 0.218, Bayes Error: 0.210.]