7.3 Random Forest

A serious problem that prevents bagging from effectively reducing the variance is that the bagged trees are highly correlated when there are a few strong predictors in the set of all predictors: the trees built from the bootstrapped training sets then end up very similar to each other. Random Forest overcomes this problem by decorrelating the bagged trees with a clever splitting technique: at each split, m predictors are sampled from all p predictors (empirically m ≈ √p gives good performance), and the split is allowed to use only one of these m predictors. Procedurally, RF is otherwise very similar to bagging, with a moderate increase in variance reduction.
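As a minimal sketch of this idea (assuming the Carseats classification task from the lab code in 7.5 below, where High has p = 10 available predictors), the randomForest package exposes m directly through its mtry argument; setting mtry equal to p instead recovers plain bagging:

library(randomForest)
library(ISLR)
High = as.factor(ifelse(Carseats$Sales <= 8, "No", "Yes"))
# m ~ sqrt(p): sample about 3 of the 10 predictors at each split
rf.carseats = randomForest(High ~ . - Sales,
                           data = data.frame(Carseats, High),
                           mtry = floor(sqrt(10)))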
7.4 Boosting

There are two main differences between bagging and boosting:

• Fitting target: boosting fits to the residual rather than to the response per se.
• Fitting procedure: boosting builds trees sequentially rather than by simultaneous sampling.

The following algorithm describes the boosting method:

• Set $\hat{f}(x) = 0$ and $r_i = y_i$ for all $i$ in the training set.
• For $b = 1, 2, \ldots, B$, repeat:
  – Fit a tree $\hat{f}^b$ with $d$ splits ($d + 1$ terminal nodes) to the training data $(X, r)$.
  – Update $\hat{f}$ by adding a shrunken version of the new tree:
    $\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)$   (7.9)
  – Update the residuals:
    $r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)$   (7.10)
• Output the boosted model:
  $\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)$   (7.11)

The idea behind boosting is to learn slowly, improving $\hat{f}$ in the areas where it performs poorly, which are in a way highlighted by the residuals. The shrinkage parameter $\lambda$ (typically $0.001 \le \lambda \le 0.01$) tunes the process by slowing it down further, which allows a fine-tuned attack on the residuals with differently shaped trees. Here $d$ controls the depth of the subtree added at each iteration (step 2 in the boosting algorithm); usually $d = 1$ works well, where each subtree is a stump.
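The following is a minimal sketch of this loop in R, under stated assumptions: a numeric response vector y and a predictor data frame X (hypothetical names), with the tree package standing in for the base learner and pruning used to approximate the d-split subtrees; edge cases such as trees that stop splitting are ignored.

library(tree)

boost.trees = function(X, y, B = 1000, lambda = 0.01, d = 1) {
  r = y                                   # set f(x) = 0, so r_i = y_i
  trees = vector("list", B)
  for (b in 1:B) {
    dat = data.frame(X, r = r)
    fit = tree(r ~ ., data = dat)         # fit f^b to the residuals (X, r)
    fit = prune.tree(fit, best = d + 1)   # keep d splits (d + 1 terminal nodes)
    trees[[b]] = fit
    r = r - lambda * predict(fit, dat)    # r_i <- r_i - lambda * f^b(x_i)  (7.10)
  }
  trees
}

# f(x) = sum over b of lambda * f^b(x)  (7.11)
boost.predict = function(trees, X, lambda = 0.01) {
  rowSums(sapply(trees, function(t) lambda * predict(t, data.frame(X))))
}

In practice the gbm package implements this procedure directly; its n.trees, shrinkage, and interaction.depth arguments correspond to B, λ, and d.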
7.5 Lab Code

# Decision Tree
library(tree)    # R version 3.2.3
library(ISLR)
attach(Carseats)
High = as.factor(ifelse(Sales <= 8, "No", "Yes"))
# discretize Sales at a threshold of 8; a factor response makes tree() grow a classification tree
Carseats = data.frame(Carseats, High)
tree.carseats = tree(High ~ . - Sales, Carseats)
# build a decision tree for High using all predictors (leave Sales out)
summary(tree.carseats)
# reports # of terminal nodes, residual mean deviance,
# misclassification error rate, and the predictors actually used
plot(tree.carseats)
text(tree.carseats, pretty = 0)      # plot tree and print node labels
set.seed(2)
train = sample(1:nrow(Carseats), 200)
Carseats.test = Carseats[-train, ]   # half-half train-test split
High.test = High[-train]
tree.carseats = tree(High ~ . - Sales, Carseats, subset = train)
tree.pred = predict(tree.carseats, Carseats.test, type = "class")
table(tree.pred, High.test)
(86 + 57) / 200   # = 0.715, (No~No + Yes~Yes) / total
set.seed(3)
cv.carseats = cv.tree(tree.carseats, FUN = prune.misclass)
# 10-fold CV by default; reports trees of different sizes and the
# corresponding misclassification counts ($dev); the 9-node tree does best
par(mfrow = c(1, 2))
plot(cv.carseats$size, cv.carseats$dev, type = "b")   # tree size vs CV error
plot(cv.carseats$k, cv.carseats$dev, type = "b")      # cost-complexity parameter k vs CV error
prune.carseats = prune.misclass(tree.carseats, best = 9)
plot(prune.carseats)
text(prune.carseats, pretty = 0)     # plot the best tree
tree.pred = predict(prune.carseats, Carseats.test, type = "class")
table(tree.pred, High.test)
(94 + 60) / 200   # = 0.77, the pruned tree performs better!

# Regression Tree
library(MASS)
set.seed(1)
train = sample(1:nrow(Boston), nrow(Boston) / 2)
tree.boston = tree(medv ~ ., Boston, subset = train)
summary(tree.boston)
# outputs # of terminal nodes, residual mean deviance, distribution of residuals
plot(tree.boston)
text(tree.boston, pretty = 0)        # plot tree
cv.boston = cv.tree(tree.boston)
plot(cv.boston$size, cv.boston$dev, type = "b")
prune.boston = prune.tree(tree.boston, best = 5)
plot(prune.boston)
text(prune.boston, pretty = 0)       # prune to the best # of terminal nodes
yhat = predict(tree.boston, newdata = Boston[-train, ])
boston.test = Boston[-train, "medv"]
plot(yhat, boston.test)
abline(0, 1)
mean((yhat - boston.test)^2)         # test MSE

# Bagging, Random Forest
library(randomForest)
set.seed(1)
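A plausible continuation of the bagging and random-forest fits this setup leads into, following the corresponding ISLR lab and assuming the Boston split from the regression-tree code above (mtry = 13 uses all predictors at every split, i.e. bagging; a smaller mtry such as 6 gives a random forest):

bag.boston = randomForest(medv ~ ., data = Boston, subset = train,
                          mtry = 13, importance = TRUE)
yhat.bag = predict(bag.boston, newdata = Boston[-train, ])
mean((yhat.bag - boston.test)^2)   # test MSE of the bagged model

rf.boston = randomForest(medv ~ ., data = Boston, subset = train,
                         mtry = 6, importance = TRUE)
yhat.rf = predict(rf.boston, newdata = Boston[-train, ])
mean((yhat.rf - boston.test)^2)    # test MSE of the random forest
importance(rf.boston)              # variable importance measures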