
Bagging and Random Forest; Boosting and AdaBoost

Prepared By:
1. Endale Daba SGS/0005/2011A
2. Birhanu Mesfin SGS/0002/2011A
3. Senait T/markos SGS/0034/2011A

Submitted To: Dr. Michael Melese (PhD)

June, 2019
Outline
Introduction

Bagging

Bagging Algorithm

Random Forest

Random Forest Algorithm

Boosting

AdaBoost

References
 Machine learning studies automatic techniques for learning to make accurate
predictions based on past observations.
 Machine learning was described by Arthur Samuel in 1959 as the field of study
that gives computers the ability to learn without being explicitly programmed.
 Supervised machine learning is about learning from labeled data so that a certain
pattern or function can be deduced from that data. Tree-based methods do this by
partitioning the feature space into regions (see the sketch below):
◦ Classification: majority vote of the training data within the region.
◦ Regression: mean of the training data within the region.
◦ CART: classification and regression trees.
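
To make the region-based prediction rules above concrete, here is a minimal scikit-learn sketch (the library choice and toy data are illustrative assumptions, not part of the original slides): a classification tree predicts the majority class of the training points that fall in a leaf region, while a regression tree predicts their mean.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy 1-D data (illustrative only).
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y_class = np.array([0, 1, 0, 1, 1, 1])               # class labels
y_reg = np.array([1.2, 1.0, 1.1, 9.8, 10.2, 10.0])   # regression targets

# CART classifier: each leaf region predicts its majority class.
clf = DecisionTreeClassifier(max_depth=1).fit(X, y_class)
print(clf.predict([[2.5]]))   # majority vote in the left region -> class 0

# CART regressor: each leaf region predicts the mean of its training targets.
reg = DecisionTreeRegressor(max_depth=1).fit(X, y_reg)
print(reg.predict([[11.5]]))  # mean of targets in the right region -> about 10.0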
 Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance
of a statistical learning method. It is frequently used in the context of decision trees.
 Bagging (bootstrap + aggregating) is simply bootstrap aggregating.
 Bootstrapping is a statistical resampling technique that involves randomly sampling a
dataset with replacement. It is a means of quantifying the uncertainty in a machine learning
model (a bootstrap sample is sketched below).
 The idea is to repeatedly sample data with replacement from the original training set in order
to produce multiple separate training sets.
 Bagging seems to work especially well for high-variance, low-bias procedures such as trees.
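
A bootstrap sample is nothing more than a random sample of the same size as the original dataset, drawn with replacement. A minimal NumPy sketch (the variable names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10)                       # stand-in for a training set of 10 rows

# Draw indices with replacement: some rows appear more than once, others not at all.
boot_idx = rng.integers(0, len(X), size=len(X))
bootstrap_sample = X[boot_idx]
print(bootstrap_sample)                 # same size as X, sampled with replacement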

 Bagging decreases the variance of predictions by generating additional training data from
the original dataset: sampling with replacement produces multiple training sets of the same
size (cardinality) as the original data.
 Bagging fits many large trees to bootstrap-resampled versions of the training data and classifies
by majority vote.
 The addition of a small number of extra training observations can dramatically alter the prediction
performance of a learned tree, even though the training data have not changed to any great extent;
this high variance is exactly what bagging targets.
 Bagging can be characterized as follows:
 Parallel ensemble: each model is built independently.
 Aims to decrease variance, not bias.
 Suitable for high-variance, low-bias (complex) models.
 Method
 Train multiple (k) models on different samples (data splits) and average their predictions.
 Predict (at test time) by averaging the results of the k models.

 Goal
 Improve the accuracy of one model by using multiple copies of it.
 Build a separate prediction model on each bootstrapped training set and average
the resulting predictions.
 While bagging can improve predictions for many regression methods, it is particularly useful for decision trees.
 Here is how to apply bagging to regression trees (a sketch follows Algorithm 1 below):
 Construct B regression trees using B bootstrapped training sets.
 Average their predictions.
 These trees are grown deep and are not pruned.
 Each tree has high variance and low bias; averaging the B trees brings down the
variance.
 Bagging combines hundreds or even thousands of trees in a single procedure.
 Algorithm 1: Bagging
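
A minimal sketch of the bagging procedure above for regression trees, using scikit-learn decision trees as the base learner (the function names and the choice of library are assumptions for illustration, not the exact algorithm from the slides):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_trees(X, y, B=100, seed=0):
    """Fit B deep, unpruned regression trees on bootstrap samples of the NumPy arrays X, y."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)    # bootstrap sample (with replacement)
        tree = DecisionTreeRegressor()      # grown deep and not pruned (default settings)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def bagged_predict(trees, X_test):
    """Average the B trees' predictions; averaging brings down the variance."""
    return np.stack([t.predict(X_test) for t in trees]).mean(axis=0)

For classification, the averaging step is replaced by a majority vote over the B trees' predicted classes.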
 Random Forest is a supervised learning algorithm.
 It provides an improvement over bagged trees by way of a small tweak that
decorrelates the trees: as in bagging, we build a number of decision trees on
bootstrapped training samples, but each split considers only a random subset of the features.
 Random forest is therefore a substantial modification of bagging that builds a large
collection of de-correlated trees and then averages them.
 Random Forest is a very flexible and easy-to-use machine learning algorithm.
 Because of its simplicity and the fact that it can be used for both regression
and classification tasks, the RF algorithm is widely used.
The pseudo code for the random forest algorithm can be split into two stages:
• Random forest creation pseudo code.
• Pseudo code to perform prediction with the created random forest classifier.
Random forest creation pseudo code (a sketch of steps 1–2 follows the list):
◦ 1. Randomly select “k” features from the total “m” features, where k < m.
◦ 2. Among the “k” features, calculate the node “d” using the best split point.
◦ 3. Split the node into daughter nodes using the best split.
◦ 4. Repeat steps 1 to 3 until “l” nodes have been reached.
◦ 5. Build the forest by repeating steps 1 to 4 “n” times to create “n” trees.
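
A minimal sketch of steps 1–2: pick k of the m features at random, then find the best split point among them. Here the split is scored by Gini impurity for a classification setting; the scoring choice and the function names are our own illustration.

import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_on_random_features(X, y, k, rng):
    """Step 1: pick k of the m features at random; step 2: best split among them."""
    n, m = X.shape
    features = rng.choice(m, size=k, replace=False)    # k < m random features
    best_feature, best_threshold, best_score = None, None, np.inf
    for f in features:
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # Weighted Gini impurity of the candidate daughter nodes (step 3 splits here).
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best_feature, best_threshold, best_score = f, t, score
    return best_feature, best_threshold

Repeating this split recursively until “l” nodes are reached gives one randomized tree (step 4), and repeating that “n” times gives the forest of “n” trees (step 5).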
 To perform prediction, the trained random forest algorithm uses the pseudo code below:
1. Take the test features, use the rules of each randomly created decision tree to
predict the outcome, and store the predicted outcome (target).
2. Calculate the votes for each predicted target.
3. Take the highest-voted predicted target as the final prediction from the
random forest algorithm.
 To perform prediction with the trained random forest algorithm, we pass the test
features through the rules of each randomly created tree. Suppose we built 100 random
decision trees to form the random forest.
 Each decision tree will predict a target (outcome) for the same test feature, and the
votes for each predicted target are then counted. Suppose the 100 random decision trees
predict 3 unique targets x, y, z; then the vote count for x is simply the number of trees,
out of 100, whose prediction is x.

 Likewise for the other two targets (y, z). Suppose x gets the most votes, say 60 out of
the 100 random decision trees predict the target x; then the random forest returns x as
the predicted target.
 This concept of voting is known as majority voting, sketched below.
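
A minimal sketch of this majority-voting step, assuming a list of already-fitted trees (for example the output of the forest-building sketch above):

from collections import Counter

def forest_predict(trees, x):
    """Each tree votes; the most frequent predicted target wins (majority voting)."""
    votes = [tree.predict([x])[0] for tree in trees]   # one prediction per tree
    return Counter(votes).most_common(1)[0][0]         # highest-voted target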
 Banking: for finding loyal customers and detecting fraudulent customers.
 Medicine: used to identify the correct combination of components to validate a
medicine. The random forest algorithm is also helpful for identifying a disease by
analyzing the patient’s medical records.
 Stock market: used to identify stock behavior as well as the expected loss or
profit from purchasing a particular stock.
 E-commerce: used in a small segment of the recommendation engine for
identifying the likelihood of a customer liking the recommended products, based
on similar kinds of customers.
Algorithm 2. Random Forest Algorithm
 Random Forest has tremendous potential to become a popular technique for
future classifiers because its performance has been found to be comparable with
the ensemble techniques bagging and boosting.

 It is a collection of unpruned CARTs that uses a rule (majority vote for
classification, averaging for regression) to combine the individual tree decisions.

 It is used for the purpose of improving prediction accuracy.
 Hyperparameters are arguments that can be set before training and that define
how the training is done.
 The main hyperparameters in Random Forests are:
◦ The number of decision trees to be combined.
◦ The maximum depth of the trees.
◦ The maximum number of features considered at each split.
◦ Whether bagging/bootstrapping is performed, with or without replacement.
 Random Forest implementations are available in many machine learning libraries
for R and Python, such as caret (R; imports randomForest and other RF packages),
scikit-learn (Python), and H2O (R and Python).
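
As one concrete illustration, the four hyperparameters above map onto scikit-learn's RandomForestClassifier arguments roughly as follows (the specific values are arbitrary examples, not recommendations from the slides):

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,     # number of decision trees to be combined
    max_depth=10,         # maximum depth of the trees
    max_features="sqrt",  # maximum number of features considered at each split
    bootstrap=True,       # whether bootstrapping (sampling with replacement) is performed
    random_state=0,
)
# model.fit(X_train, y_train) and model.predict(X_test) follow the usual scikit-learn API.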
 The pros of Random Forests are that they are a relatively fast and
powerful algorithm for classification and regression learning.
 Calculations can be parallelized, the method performs well on many
problems (even with small datasets), and the output includes
prediction probabilities.
 The same random forest algorithm can be used for both
classification and regression tasks.
 The random forest algorithm can also be used for feature engineering,
which means identifying the most important features among the
available features in the training dataset.
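
The feature-engineering use mentioned above is typically done through the forest's feature-importance scores. A minimal scikit-learn sketch (the iris dataset is just a convenient stand-in):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

# Rank the features by their impurity-based importance scores.
ranking = sorted(zip(data.feature_names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")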
 Like bagging, boosting is a general approach that can be applied to many statistical learning
methods for regression or classification.
 Boosting works in a similar way, except that the trees are grown sequentially: each tree is
grown using information from previously grown trees.
 Boosting is a two-step approach: first use subsets of the original data to produce a series of
averagely performing models, then boost their performance by combining them together
using a particular cost function (e.g. majority vote).
 Unlike bagging, in classical boosting the subset creation is not random but depends on the
performance of the previous models: every new subset contains the elements that were (or
were likely to be) misclassified by previous models.
 Boosting can reduce variance (the same as bagging), but it can also eliminate the effect
of the high bias of the weak learner (unlike bagging).
 Boosting works by primarily reducing bias in the early stages and primarily
reducing variance in later stages.
 Boosting can be characterized as follows (a sketch for the regression case follows this list):
 Sequential ensemble: tries to add new models that do well where previous models fall short.
 Aims to decrease bias, not variance.
 Suitable for low-variance, high-bias models.
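
To make the sequential idea concrete, here is a minimal sketch of boosting for regression: small, high-bias trees are fitted one after another, each to the residuals left by the trees before it. This simplified least-squares scheme is our own illustration of the principle, not the exact procedure referenced on the slides.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Sequentially fit small trees; each one corrects the current ensemble's errors."""
    trees = []
    residual = np.asarray(y, dtype=float).copy()            # start from the raw targets
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=max_depth)   # weak (high-bias) learner
        tree.fit(X, residual)                               # fit what is still unexplained
        residual -= learning_rate * tree.predict(X)         # use this tree's information
        trees.append(tree)
    return trees

def boost_predict(trees, X_test, learning_rate=0.1):
    """Sum the scaled contributions of all sequentially grown trees."""
    return learning_rate * sum(tree.predict(X_test) for tree in trees)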

 Boosting is an ensemble technique that attempts to create a strong classifier from
a number of weak classifiers.
 AdaBoost is the best starting point for understanding boosting. Modern boosting methods
build on AdaBoost, most notably stochastic gradient boosting machines.

 Boosting is often robust to overfitting. Boosting is all about combining weak
classifiers to obtain a very strong classifier: a weak classifier is only slightly better than
random on the training data, yet the resulting strong classifier can eventually reach
zero training error.
 AdaBoost can be implemented by sampling, but instead of resampling it typically reweights
misclassified training instances; the algorithm solved many of the practical difficulties of
the earlier boosting algorithms.
 AdaBoost:
 generates a sequence of base learners, each focusing on the previous one’s errors;
 decreases the weight (probability) of a correctly classified instance and increases the
weight of a misclassified instance, so that the next classifier focuses more on instances
misclassified by the previous classifier.

 AdaBoost is used with short decision trees. After the first tree is created, the
performance of the tree on each training instance is used to weight how much
attention the next tree should pay to each training instance (a sketch follows below).
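
A minimal from-scratch sketch of this reweighting scheme, for binary labels coded as -1/+1 and short trees (decision stumps) as the weak learners. The discrete-AdaBoost formulation and the variable names below are our illustration of the idea, not code taken from the slides.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost with stumps; X, y are NumPy arrays and y contains -1/+1 labels."""
    n = len(X)
    w = np.full(n, 1.0 / n)                           # start with equal instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # short decision tree (stump)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)         # this stump's say in the final vote
        # Decrease the weight of correctly classified instances and increase the weight
        # of misclassified ones, so the next stump focuses on the previous one's errors.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X_test):
    """Weighted vote of all stumps."""
    return np.sign(sum(a * s.predict(X_test) for a, s in zip(alphas, stumps)))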
 Bagging, random forests and boosting are good methods for improving the
prediction accuracy of trees.
 They work by growing many trees on the training data and then combining the
predictions of the resulting ensemble of trees.
 The latter two methods, random forests and boosting, are among the state-of-
the-art methods for supervised learning.
 Combining multiple learners has been a popular topic in machine learning
since the early 1990s and research has been going on ever since.
[1] Allen, E., Horvath, S., Kraft, P., Tong, F., Spiteri, E., Riggs, A., and Marahrens, Y. (2003), “High Concentrations of LINE Sequence Distinguish Monoallelically-Expressed Genes,” Proceedings of the National Academy of Sciences, 100(17), 9940–9945.
[2] Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984), Classification and Regression Trees, New York: Chapman and Hall.
[3] Cox, T. F., and Cox, M. A. A. (2001), Multidimensional Scaling, Boca Raton: Chapman and Hall/CRC.
[4] Hastie, T., Tibshirani, R., and Friedman, J. H. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York: Springer.
[5] Hubert, L., and Arabie, P. (1985), “Comparing Partitions,” Journal of Classification, 2, 193–218.
[6] Kaplan, E. L., and Meier, P. (1958), “Nonparametric Estimation from Incomplete Observations,” Journal of the American Statistical Association, 53, 457–481.
[7] Kaufman, L., and Rousseeuw, P. J. (1990), Finding Groups in Data: An Introduction to Cluster Analysis, New York: Wiley.
[8] Kruskal, J. B., and Wish, M. (1978), Multidimensional Scaling, Beverly Hills, CA: Sage Publications.
[9] Liaw, A., and Wiener, M. (2002), “Classification and Regression by randomForest,” R News: The Newsletter of the R Project, 2(3), 18–22. Available online at http://cran.r-project.org/doc/Rnews/.
Thank You!!
