0% found this document useful (0 votes)

123 views68 pages

Tree-Based Methods Explained

Bagging is a technique to reduce variance in decision tree models by averaging multiple trees built on bootstrap samples of the original data. It works by 1) generating B bootstrap samples from the training data, 2) building a decision tree on each sample, and 3) averaging the predictions of the B trees. This reduces variance compared to a single tree. For regression, the average prediction is used, while for classification a majority vote of the tree predictions is taken. The out-of-bag error provides an estimate of test error without cross-validation. While bagging improves accuracy, it reduces interpretability compared to a single tree. Variable importance can still be assessed by how much each variable decreases error across the B trees.

Uploaded by

Nosheen Ramzan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

123 views68 pages

Tree-Based Methods Explained

Uploaded by

Nosheen Ramzan

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Tree Based Methods: Bagging, Boosting, and

Regression Trees

Rebecca C. Steorts, Duke University

STA 325, Chapter 8 ISL

1 / 68
Agenda

I Bagging
I Random Forests
I Boosting

2 / 68
Bagging

Recall that we introduced the boostrap (Chapter 5) which is useful

in many situations in which it is hard or even impossible to directly
compute the standard deviation of a quantity of interest.
Here, we will see that we can use the boostrap to improve statistical
learning through decision trees.

3 / 68
High variance in decision trees

Decision trees such as regression or classification trees suffer from

high variance.
This means that if we split the training data into two parts at
random and fit a decision tree to both halves, the results can be
quite different.
In contrast, a procedure with low variance will yield similar results if
applied repeatedly to distinct data sets; linear regression tends to
have low variance, if the ratio of n to p is moderately large.

4 / 68
Bagging

Bootstrap aggregation or bagging is a general-purpose procedure for

reducing the variance of a statistical learning method.

5 / 68
Bagging

Recall that given a set of independent observations

Z1 , . . . , Zn

each with variance σ 2 the variance of Z̄ is σ 2 /n.

In other words, averaging a set of observations reduces variance.
One natural way to reduce the variance and hence increase the
prediction accuracy of a statistical learning method is to take many
training sets from the population and build a separate prediction
model using each training set. Then we average the resulting
predictions.

6 / 68
Bagging

1. Calculate
fˆ1 (x ), fˆ2 (x ), . . . , fˆB (x )
using B seperate training sets.
2. Average them in order to obtain a single low-variance statistical
learning model:

B
1
fˆave (x ) fˆb (x ).
X
B b=1

7 / 68
Bagging

1. Calculate
fˆ1 (x ), fˆ2 (x ), . . . , fˆB (x )
using B seperate training sets.
2. Average them in order to obtain a single low-variance statistical
learning model:

B
1
fˆave (x ) fˆb (x ).
X
B b=1

This is not practical! Why?

8 / 68
Bagging

1. Calculate
fˆ1 (x ), fˆ2 (x ), . . . , fˆB (x )
using B seperate training sets.
2. Average them in order to obtain a single low-variance statistical
learning model:

B
1
fˆave (x ) fˆb (x ).
X
B b=1

This is not practical! Why?

We don’t usually have access to B training data sets.

9 / 68
Bagging

But we can bootstrap!

We can take repeated samples from the (single) training data set.
We will generate B different bootstrapped training data sets.
We then train our method on the bth bootstrapped training set in
order to get fˆ∗b (x ).
Finally, we average all the predictions, to obtain
B
1 X
ˆ
fbag (x ) = fˆ∗b (x ).
B b=1

This is called bagging.

10 / 68
Bagging for Regression Trees

To apply bagging to regression trees do the following:

1. construct B regression trees using B bootstrapped training sets

2. average the resulting predictions.

11 / 68
Bagging for Regression Trees

Regression trees are grown deep, and are not pruned.

Hence each individual tree has high variance, but low bias. Thus, a
veraging these B trees reduces the variance.
Bagging has been demonstrated to give impressive improvements in
accuracy by combining together hundreds or even thousands of trees
into a single procedure.

12 / 68
Bagging for Classification Trees

There are many ways to apply bagging for classification trees.

We explain the simplest way.

13 / 68
Bagging for Classification Trees

For a given test observation, we can record the class predicted by

each of the B trees, and take a majority vote.
A majority vote is simply the overall prediction is the most
commonly occurring class among the B predictions.

14 / 68
Out-of-Bag Error Estimation

It turns out that there is a very straightforward way to estimate the

test error of a bagged model, without the need to perform
cross-validation or the validation set approach.

15 / 68
Out-of-Bag Error Estimation

Recall that the key to bagging is that trees are repeatedly fit to
bootstrapped subsets of the observations.
One can show that on average, each bagged tree makes use of
around two-thirds of the observations. (See Exercise 2, Chapter 5).
The remaining one-third of the observations not used to fit a given
bagged tree are referred to as the out-of-bag (OOB) observations.
We can predict the response for the ith observation using each of
the trees in which that observation was OOB.
This will yield around B/3 predictions for the ith observation.

16 / 68
Out-of-Bag Error Estimation

In order to obtain a single prediction for the ith observation, we can

average these predicted responses (if regression is the goal) or can
take a majority vote (if classification is the goal).
This leads to a single OOB prediction for the ith observation.
An OOB prediction can be obtained in this way for each of the n
observations, from which the overall OOB MSE (for a regression
problem) or classification error (for a classification problem) can be
computed.
The resulting OOB error is a valid estimate of the test error for the
bagged model, since the response for each observation is predicted
using only the trees that were not fit using that observation.

17 / 68
OOB and CV

It can be shown that with B sufficiently large, OOB error is virtually

equivalent to leave-one-out cross-validation error.
The OOB approach for estimating the test error is particularly
convenient when performing bagging on large data sets for which
cross-validation would be computationally onerous.

18 / 68
Interpretability of Bagged Trees

Recall that bagging typically results in improved accuracy over

prediction using a single tree.
Unfortunately, however, it can be difficult to interpret the resulting
model.
But one main attraction of decision trees is their intrepretability!
However, when we bag a large number of trees, it is no longer
possible to represent the resulting statistical learning procedure
using a single tree, and it is no longer clear which variables are most
important to the procedure.
Thus, bagging improves prediction accuracy at the expense of
interpretability.

19 / 68
Variable Importance Measures

Although the collection of bagged trees is much more difficult to

interpret than a single tree, one can obtain an overall summary of
the importance of each predictor using

I the RSS (for bagging regression trees) or

I the Gini index (for bagging classification trees).

20 / 68
Variable Importance Measures

In the case of bagging regression trees, we can record the total

amount that the RSS is decreased due to splits over a given
predictor, averaged over all B trees.
A large value indicates an important predictor.
Similarly, in the context of bagging classification trees, we can add
up the total amount that the Gini index is decreased by splits over a
given predictor, averaged over all B trees.

21 / 68
Random Forests

Random forests provide an improvement over bagged trees by way

of a small tweak that decorrelates the trees.

22 / 68
Random Forests

As in bagging, we build a number of decision trees on bootstrapped

training samples.
But when building these decision trees, each time a split in a tree is
considered, a random sample of m predictors is chosen as split
candidates from the full set of p predictors.
The split is allowed to use only one of those m predictors.
A fresh sample of m predictors is taken at each split, and typically
√
we choose m ≈ p.

23 / 68
Random Forests

In building a random forest, at each split in the tree, the algorithm

is not even allowed to consider a majority of the available predictors.
What’s the rationale behind this?

24 / 68
Random Forests

Suppose that there is one very strong predictor in the data set,
along with a number of other moderately strong predictors.
Then in the collection of bagged trees, most or all of the trees will
use this strong predictor in the top split.
All of the bagged trees will look quite similar to each other and the
predictions from the bagged trees will be highly correlated.
Averaging many highly correlated quantities does not lead to as
large of a reduction in variance as averaging many uncorrelated
quantities.
In particular, this means that bagging will not lead to a substantial
reduction in variance over a single tree in this setting.

25 / 68
Random forests

Random forests overcome this problem by forcing each split to

consider only a subset of the predictors.
Therefore, on average (p − m)/p of the splits will not even consider
the strong predictor, and so other predictors will have more of a
chance.
We can think of this process as decorrelating the trees, thereby
making the average of the resulting trees less variable and hence
more reliable.

26 / 68
Random Forest versus Bagging

What’s the difference between random forests and bagging.

The choice of the predictor subset size m.
If a random forest is built where m = p, then bagging is a special
case of random forests.

27 / 68
Boosting

Boosting is another approach for improving the predictions resulting

from a decision tree.
Like bagging, boosting is a general approach that can be applied to
many statistical learning methods for regression or classification
trees.

28 / 68
Bagging

Recall that bagging works via the following procedure:

1. Create multiple copies of the original training data set using

the bootstrap.
2. Fit a separate decision tree to each bootstraped copy.
3. Then combine all of the decision trees in order to create a
single predictive model.

Each tree is built on a bootstrap data set, which is independent of

the other trees.

29 / 68
Boosting

Boosting works in a similar way, except that the trees are grown
sequentially:

1. Each tree is grown using information from previously grown

trees.
2. Boosting does not involve bootstrap sampling; instead each
tree is fit on a modified version of the original data set.

Let’s see how this works in the context of regression trees.

30 / 68
Boosting for Regresion Trees

What is the idea behind this procedure?

Unlike fitting a single large decision tree to the data, which amounts
to fitting the data hard and potentially overfitting, the boosting
approach instead learns slowly.
In general, statistical learning approaches that learn slowly tend to
perform well. Note that in boosting, unlike in bagging, the
construction of each tree depends strongly on the trees that have
already been grown.

31 / 68
Boosting for Regresion Trees

Given the current model, we fit a decision tree to the residuals from
the model.
That is, we fit a tree using the current residuals r rather than the
outcome Y , as the response.
We then add this new decision tree into the fitted function in order
to update the residuals.

32 / 68
Boosting for Regresion Trees

Each of these trees can be rather small, with just a few terminal
nodes, determined by the parameter d in the algorithm.
By fitting small trees to the residuals, we slowly improve fˆ in areas
where it does not perform well.
The shrinkage parameter λ slows the process down even further,
allowing more and different shaped trees to attack the residuals.

33 / 68
Boosting Algorithm for Regression Trees
1. Set fˆ(x ) = 0 and ri = yi for all i in the training data set.
2. For b = 1, 2, . . . , B, repeat:

1. Fit a tree fˆb with d splits ((d + 1) terminal nodes) to the

training data (X , r )
2. Update fˆ(x ) by adding in a shrunken version of the new tree:

fˆ(x ) ← fˆ(x ) + λfˆb (x ). (1)

3. Update the residuals

ri ← ri − λfˆb (xi ) (2)

3. Output the boosted model

ˆ ˆb
X
B 34 / 68
Tuning parameters

There are three tuning parameters that we must carefully consider.

1. The number of trees B.

2. The shrinkage parameter λ.
3. The number d splits in each tree.

35 / 68
Tuning parameters

1. The number of trees B.

Unlike bagging and random forests, boosting can overfit if B is too

large, although this overfitting tends to occur slowly if at all.
We use cross-validation to select B.

36 / 68
Tuning parameters

2. The shrinkage parameter λ. (This is a small positive number).

This controls the rate at which boosting learns.

Typical values are 0.01 or 0.001, and the right choice can depend on
the application.
Very small λ can require using a very large value of B in order to
achieve good performance.

37 / 68
Tuning parameters

3. The number d splits in each tree.

This controls the complexity of the boosted ensemble.

Often d = 1 works well, in which case each tree is a stump,
consisting of a single split.
In this case, the boosted ensemble is fitting an additive model, since
each term involves only a single variable.
More generally d is the interaction depth, and controls the
interaction order of the boosted model, since d splits can involve at
most d variables.

38 / 68
Boosting for Classification Trees

The details of boosting for classification trees are omitted from

ISLR, but they are in ESL if you wish to go over them on your own.

39 / 68
Bagging and Random Forests on Boston data set

Here we apply bagging and random forests to the Boston data,

using the randomForest package in R.
The exact results obtained in this section may depend on the version
of R and the version of the randomForest package installed on your
computer.
Recall that bagging is just a special case of random forests with
m = p.

40 / 68
Task 1

Setup a training and test set for the Boston data set.

41 / 68
Solution to Task 1

library(MASS)
library(randomForest)

## randomForest 4.6-12

## Type rfNews() to see new features/changes/bug fixes.

set.seed (1)
train = sample(1:nrow(Boston), nrow(Boston)/2)
boston.test=Boston[-train ,"medv"]

42 / 68
Task 2

Apply bagging using the randomForest package in R. What does the

argument mtry do?

43 / 68
Solution to Task 2

bag.boston=randomForest(medv~.,data=Boston,subset=train, mt

The argument mtry = 13 means that we should use all 13 predictors

for each split of the tree, hence, do bagging.

44 / 68
Task 3

How well does the bagged model perform on the test set?

45 / 68
Solution to Task 3
yhat.bag = predict(bag.boston,newdata=Boston[-train,])
plot(yhat.bag, boston.test)
abline(0,1)
50
40
boston.test

30
20
10

10 20 30 40 50

yhat.bag
46 / 68
Solution to Task 3

mean((yhat.bag-boston.test)^2)

## [1] 13.33831

The test set MSE associated with the bagged regression tree is
13.16, almost half that obtained using an optimally-pruned single
tree (investigate this on your own).

47 / 68
Task 4

Change the number of trees grown by randomForest() using the

ntree argument. For example, what happens to the MSE when we
grow the tree from 13 to 25?

48 / 68
Moving to random forests

Growing a random forest proceeds in exactly the same way, except

that we use a smaller value of the mtry argument.
By default, randomForest() uses p/3 variables when building a
√
random forest of regression trees, and p variables when building a
random forest of classification trees.

49 / 68
Task 5

Build a random forest on the same data set using mtry = 6.

Comment on the difference from the test MSE from using the
random forest compared to bagging.

50 / 68
Solution to Task 5

set.seed(1)
rf.boston=randomForest(medv~.,data=Boston,subset=train, mtr
yhat.rf = predict(rf.boston ,newdata=Boston[-train ,])
mean((yhat.rf-boston.test)^2)

## [1] 11.48022

We see that the test MSE for a random forest is 11.48; this
indicates that random forests yielded an improvement over bagging
in this case (versus 13.34).

51 / 68
Task 6

Using the importance() function, comment on the importance of

each variable.

52 / 68
Solution to Task 6
importance(rf.boston)

## %IncMSE IncNodePurity
## crim 12.547772 1094.65382
## zn 1.375489 64.40060
## indus 9.304258 1086.09103
## chas 2.518766 76.36804
## nox 12.835614 1008.73703
## rm 31.646147 6705.02638
## age 9.970243 575.13702
## dis 12.774430 1351.01978
## rad 3.911852 93.78200
## tax 7.624043 453.19472
## ptratio 12.008194 919.06760
## black 7.376024 358.96935
## lstat 27.666896 6927.98475
53 / 68
Solution to Task 6

Two measures of variable importance are reported. The former is

based upon the mean decrease of accuracy in predictions on the out
of bag samples when a given variable is excluded from the model.
The latter is a measure of the total decrease in node impurity that
results from splits over that variable, averaged over all trees.

54 / 68
Solution to Task 6
varImpPlot (rf.boston)

rf.boston

rm lstat

lstat rm

nox dis

dis crim

crim indus

ptratio nox

age ptratio

indus age

tax tax

black black

rad rad

chas chas

zn zn

5 10 15 20 25 30 0 1000 2000 3000 4000 5000 6000 7000

%IncMSE IncNodePurity

The results indicate that across all of the trees considered in the random forest, the wealth level of the community
(lstat) and the house size (rm) are by far the two most important variables. 55 / 68
Boosting on the Boston data set

We use the gbmpackage, and within it the gbm() function, to fit

boosted regression trees to the Boston data set.
We run gbm() with the option distribution=“gaussian” since this is
a regression problem.
If it were a binary classification problem, we would use
distribution=“bernoulli”.
The argument n.trees=5000 indicates that we want 5000 trees, and
the option interaction.depth=4 limits the depth of each tree.
The summary() function produces a relative influence plot and also
outputs the relative influence statistics.

56 / 68
Task 1

Perform boosting on the training data set, treating this as a

regression problem.

57 / 68
Solution to Task 1

library(gbm)

## Loading required package: survival

## Loading required package: lattice

## Loading required package: splines

## Loading required package: parallel

## Loaded gbm 2.1.3

58 / 68
Solution to Task 1

set.seed (1)
boost.boston=gbm(medv~.,data=Boston[train,],distribution="g

“interaction.depth” refers to the maximum depth of variable

interactions. 1 implies an additive model, 2 implies a model with up
to 2-way interactions, etc.

59 / 68
Task 2

What does summary of the boosted regression tree tell us?

60 / 68
Solution to Task 2
summary(boost.boston)
rm
crim
ptratio
age
indus
zn rad

0 10 20 30 40

Relative influence

61 / 68
Solution to Task 2

We see that lstat and rm are by far the most important variables.

62 / 68
Task 3

Produce partial dependence plots for these two variables (lstat and
rm).

63 / 68
Solution to Task 3
par(mfrow=c(1,2))
plot(boost.boston ,i="rm")
plot(boost.boston ,i="lstat")
32

30
30
28

25
f(lstat)
f(rm)

20
24
22

4 5 6 7 8 5 10 15 20 25 30 35

rm lstat

These plots illustrate the marginal effect of the selected variables on the response after integrating out the other
variables. In this case, as we might expect, median house prices are increasing with rm and decreasing with lstat. 64 / 68
Task 4

Now use the boosted model to predict medv on the test set. Report
the MSE.

65 / 68
Solution to Task 4

yhat.boost=predict(boost.boston,newdata=Boston[-train,], n.trees
mean((yhat.boost -boston.test)^2)

## [1] 11.84434

The test MSE obtained is 11.8; similar to the test MSE for random forests
and superior to that for bagging.

66 / 68
Task 5

What happens if we vary the shrinkage parameter from its default of

0.001 to 0.02? Report the test MSE.

67 / 68
Solution to Task 5

boost.boston=gbm(medv~.,data=Boston[train,],distribution="gaussi
yhat.boost=predict(boost.boston,newdata=Boston[-train,], n.trees
mean((yhat.boost -boston.test)^2)

## [1] 11.51109

In this case, using λ = 0.2 leads to a slightly lower test MSE than
λ = 0.001.

68 / 68

Lecture 5
No ratings yet
Lecture 5
53 pages
Bagging, Random Forests, and Boosting
No ratings yet
Bagging, Random Forests, and Boosting
48 pages
Tree-Based Ensemble Methods Explained
No ratings yet
Tree-Based Ensemble Methods Explained
13 pages
Lecture 05 Random Forest 07112022 124639pm
No ratings yet
Lecture 05 Random Forest 07112022 124639pm
25 pages
Regression Trees and Random Forests Explained
No ratings yet
Regression Trees and Random Forests Explained
34 pages
Ensemble Learning Explained
No ratings yet
Ensemble Learning Explained
32 pages
Bagging Trees & Random Forests Guide
No ratings yet
Bagging Trees & Random Forests Guide
50 pages
Understanding Bagging in Random Forests
No ratings yet
Understanding Bagging in Random Forests
53 pages
Boosted and Random Forest Techniques
No ratings yet
Boosted and Random Forest Techniques
34 pages
WINSEM2024-25 BCSE334L TH VL2024250502042 2025-02-18 Reference-Material-II
No ratings yet
WINSEM2024-25 BCSE334L TH VL2024250502042 2025-02-18 Reference-Material-II
39 pages
Outlines: Statements of Problems Objectives Bagging Random Forest Boosting Adaboost
100% (1)
Outlines: Statements of Problems Objectives Bagging Random Forest Boosting Adaboost
14 pages
Random Forest and Ensemble Learning Guide
No ratings yet
Random Forest and Ensemble Learning Guide
68 pages
Bagging Vs Boosting - Javatpoint
No ratings yet
Bagging Vs Boosting - Javatpoint
8 pages
Bagging and Random Forest Overview
No ratings yet
Bagging and Random Forest Overview
35 pages
Understanding Regression Trees in ML
No ratings yet
Understanding Regression Trees in ML
11 pages
Decision Tree
No ratings yet
Decision Tree
7 pages
Bagging and Random Forest Explained
No ratings yet
Bagging and Random Forest Explained
11 pages
Decision Trees and Bagging Explained
No ratings yet
Decision Trees and Bagging Explained
2 pages
Eda - M4
No ratings yet
Eda - M4
7 pages
Bagging, Boosting, and Random Forests Explained
No ratings yet
Bagging, Boosting, and Random Forests Explained
27 pages
Random Forests 2
No ratings yet
Random Forests 2
43 pages
Understanding Random Forests in Machine Learning
No ratings yet
Understanding Random Forests in Machine Learning
12 pages
Decision Trees, Bagging, and Boosting Guide
No ratings yet
Decision Trees, Bagging, and Boosting Guide
2 pages
Unit 3
No ratings yet
Unit 3
63 pages
Ensemble Learning in Machine Learning
No ratings yet
Ensemble Learning in Machine Learning
32 pages
Data Science - Decision Tree - Random Forest
No ratings yet
Data Science - Decision Tree - Random Forest
15 pages
Understanding Random Forests and Bagging
No ratings yet
Understanding Random Forests and Bagging
8 pages
Bagging 17122018
No ratings yet
Bagging 17122018
12 pages
Machine Learning Ensembling Guide
No ratings yet
Machine Learning Ensembling Guide
7 pages
Unit 3
No ratings yet
Unit 3
59 pages
Random Forest Lecture
No ratings yet
Random Forest Lecture
5 pages
Understanding Random Forests in Machine Learning
100% (1)
Understanding Random Forests in Machine Learning
28 pages
Understanding Random Forest in ML
No ratings yet
Understanding Random Forest in ML
29 pages
Enseble LEarning
100% (1)
Enseble LEarning
57 pages
Ensemble Methods
No ratings yet
Ensemble Methods
19 pages
Random Forests
No ratings yet
Random Forests
43 pages
Ensemble Modeling Techniques Explained
No ratings yet
Ensemble Modeling Techniques Explained
80 pages
Bagging and Random Forest Presentation1
100% (4)
Bagging and Random Forest Presentation1
23 pages
Random Forest
No ratings yet
Random Forest
5 pages
Week8 Summary Detail
No ratings yet
Week8 Summary Detail
9 pages
Random Forests: Overview & Techniques
No ratings yet
Random Forests: Overview & Techniques
5 pages
Understanding Decision Trees and Random Forests
No ratings yet
Understanding Decision Trees and Random Forests
5 pages
Random Forest
No ratings yet
Random Forest
25 pages
Ensemble Learning and Random Forests Guide
No ratings yet
Ensemble Learning and Random Forests Guide
17 pages
Handbook Cs Rev2010
No ratings yet
Handbook Cs Rev2010
39 pages
05 - Ensemble Learning
No ratings yet
05 - Ensemble Learning
39 pages
Ensemble Algorithms
No ratings yet
Ensemble Algorithms
6 pages
Bagging vs Boosting in Ensemble Learning
No ratings yet
Bagging vs Boosting in Ensemble Learning
40 pages
Bagging
No ratings yet
Bagging
7 pages
Bagging and Random Forest 10 Marks Notes
No ratings yet
Bagging and Random Forest 10 Marks Notes
2 pages
DT Instability
No ratings yet
DT Instability
4 pages
Understanding Bagging and Boosting in ML
No ratings yet
Understanding Bagging and Boosting in ML
6 pages
Understanding Random Forest Techniques
100% (1)
Understanding Random Forest Techniques
83 pages
Machine Learning Ensemble Techniques Explained
No ratings yet
Machine Learning Ensemble Techniques Explained
22 pages
Assessing Predictive Models
No ratings yet
Assessing Predictive Models
25 pages
Unit 2
No ratings yet
Unit 2
13 pages
lecture19-FromTreesToForests RandomForests
No ratings yet
lecture19-FromTreesToForests RandomForests
50 pages
DSA5102 Lecture3
No ratings yet
DSA5102 Lecture3
34 pages
D3 IT Random Forest Apr 2023
No ratings yet
D3 IT Random Forest Apr 2023
32 pages
11.2 Box and Whisker
No ratings yet
11.2 Box and Whisker
3 pages
Statistical Inference Assignment
No ratings yet
Statistical Inference Assignment
16 pages
Split-Plot Designs for Mixture Experiments
No ratings yet
Split-Plot Designs for Mixture Experiments
9 pages
Association Rule Learning
No ratings yet
Association Rule Learning
10 pages
Frequency Distribution Guide
No ratings yet
Frequency Distribution Guide
5 pages
ML Notes - 2025
No ratings yet
ML Notes - 2025
145 pages
Major Issues in Data Mining
No ratings yet
Major Issues in Data Mining
9 pages
ID3 Complete Solution
No ratings yet
ID3 Complete Solution
3 pages
Project Report 43
No ratings yet
Project Report 43
56 pages
Machine Learning in Big Data
No ratings yet
Machine Learning in Big Data
10 pages
Decision Trees: A Recent Overview: S. B. Kotsiantis
No ratings yet
Decision Trees: A Recent Overview: S. B. Kotsiantis
23 pages
Data Mining for IT Students
No ratings yet
Data Mining for IT Students
2 pages
Digital Visualization
No ratings yet
Digital Visualization
65 pages
Unit - IV - ML SPPU IT
No ratings yet
Unit - IV - ML SPPU IT
67 pages
Unit 5
No ratings yet
Unit 5
7 pages
Machine Learning for Hydrothermal Mineral Detection
No ratings yet
Machine Learning for Hydrothermal Mineral Detection
13 pages
A Case Study of Exploiting Decision Tree
No ratings yet
A Case Study of Exploiting Decision Tree
8 pages
DMDT53 002
No ratings yet
DMDT53 002
159 pages
Decision Trees and Regression Techniques
No ratings yet
Decision Trees and Regression Techniques
27 pages
R20 DMT Unit-Iii
No ratings yet
R20 DMT Unit-Iii
21 pages
ML Interview Questions
No ratings yet
ML Interview Questions
60 pages
Decision Tree Classification Techniques
No ratings yet
Decision Tree Classification Techniques
41 pages
Data Mining Unit 4
No ratings yet
Data Mining Unit 4
12 pages
Project Report
No ratings yet
Project Report
33 pages
An Introduction To Statistical Learning PDF
No ratings yet
An Introduction To Statistical Learning PDF
35 pages
Traffic Accident Prediction in UAE
0% (1)
Traffic Accident Prediction in UAE
40 pages
Machine Learning Algorithm Guide
No ratings yet
Machine Learning Algorithm Guide
4 pages
Intro to Machine Learning Concepts
No ratings yet
Intro to Machine Learning Concepts
15 pages
Proposal New
No ratings yet
Proposal New
17 pages
Unit 3,4,5 MCQ
No ratings yet
Unit 3,4,5 MCQ
7 pages
Data Mining MCQ Multiple Choice Questions With Answers: Eguardian
No ratings yet
Data Mining MCQ Multiple Choice Questions With Answers: Eguardian
15 pages
AI & ML Question Bank: Decision Trees, Bayesian Learning, Clustering
No ratings yet
AI & ML Question Bank: Decision Trees, Bayesian Learning, Clustering
7 pages
Introduction To AI and ML
No ratings yet
Introduction To AI and ML
22 pages
Advanced Databases Group Assessment 2
No ratings yet
Advanced Databases Group Assessment 2
34 pages
Regression Trees for Predicting Salaries
No ratings yet
Regression Trees for Predicting Salaries
49 pages

Tree-Based Methods Explained

Uploaded by

Tree-Based Methods Explained

Uploaded by

Tree Based Methods: Bagging, Boosting, and

Rebecca C. Steorts, Duke University

STA 325, Chapter 8 ISL

Recall that we introduced the boostrap (Chapter 5) which is useful

Decision trees such as regression or classification trees suffer from

Bootstrap aggregation or bagging is a general-purpose procedure for

Recall that given a set of independent observations

each with variance σ 2 the variance of Z̄ is σ 2 /n.

This is not practical! Why?

This is not practical! Why?

But we can bootstrap!

This is called bagging.

To apply bagging to regression trees do the following:

1. construct B regression trees using B bootstrapped training sets

Regression trees are grown deep, and are not pruned.

There are many ways to apply bagging for classification trees.

For a given test observation, we can record the class predicted by

It turns out that there is a very straightforward way to estimate the

In order to obtain a single prediction for the ith observation, we can

It can be shown that with B sufficiently large, OOB error is virtually

Recall that bagging typically results in improved accuracy over

Although the collection of bagged trees is much more difficult to

I the RSS (for bagging regression trees) or

In the case of bagging regression trees, we can record the total

Random forests provide an improvement over bagged trees by way

As in bagging, we build a number of decision trees on bootstrapped

In building a random forest, at each split in the tree, the algorithm

Random forests overcome this problem by forcing each split to

What’s the difference between random forests and bagging.

Boosting is another approach for improving the predictions resulting

Recall that bagging works via the following procedure:

1. Create multiple copies of the original training data set using

Each tree is built on a bootstrap data set, which is independent of

1. Each tree is grown using information from previously grown

Let’s see how this works in the context of regression trees.

What is the idea behind this procedure?

1. Fit a tree fˆb with d splits ((d + 1) terminal nodes) to the

fˆ(x ) ← fˆ(x ) + λfˆb (x ). (1)

3. Update the residuals

ri ← ri − λfˆb (xi ) (2)

3. Output the boosted model

There are three tuning parameters that we must carefully consider.

1. The number of trees B.

1. The number of trees B.

Unlike bagging and random forests, boosting can overfit if B is too

2. The shrinkage parameter λ. (This is a small positive number).

This controls the rate at which boosting learns.

3. The number d splits in each tree.

This controls the complexity of the boosted ensemble.

The details of boosting for classification trees are omitted from

Here we apply bagging and random forests to the Boston data,

## Type rfNews() to see new features/changes/bug fixes.

Apply bagging using the randomForest package in R. What does the

The argument mtry = 13 means that we should use all 13 predictors

Change the number of trees grown by randomForest() using the

Growing a random forest proceeds in exactly the same way, except

Build a random forest on the same data set using mtry = 6.

Using the importance() function, comment on the importance of

Two measures of variable importance are reported. The former is

5 10 15 20 25 30 0 1000 2000 3000 4000 5000 6000 7000

We use the gbmpackage, and within it the gbm() function, to fit

Perform boosting on the training data set, treating this as a

## Loading required package: survival

## Loading required package: lattice

## Loading required package: splines

## Loading required package: parallel

## Loaded gbm 2.1.3

“interaction.depth” refers to the maximum depth of variable

What does summary of the boosted regression tree tell us?

What happens if we vary the shrinkage parameter from its default of

You might also like