ENSEMBLE METHODS
MACHINE LEARNING
 Vaishnavi Garg
 Sundram Goyal
Introduction
 Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model.

 Ensemble methods take many models into account and average those models to produce one final model.

 Ensemble modeling is a powerful way to improve model performance. It usually pays off to apply ensemble learning on top of the various individual models you might be building.
[Diagram: labeled/unlabeled data → Model A, Model B, Model C → Prediction]
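In code, "averaging several models" can be as simple as taking the mean of their outputs. A minimal sketch for a regression task, where the three base models are hypothetical stand-ins:

```python
# Three simple "base models" for the same regression task; the ensemble
# averages their outputs. All three models here are hypothetical stand-ins.
def model_a(x):
    return 2.0 * x          # one model's guess

def model_b(x):
    return 1.8 * x + 0.5    # a second model with a different bias

def model_c(x):
    return 2.1 * x - 0.3    # a third model

def ensemble(x):
    # One final model: the simple average of the base models' predictions.
    preds = [m(x) for m in (model_a, model_b, model_c)]
    return sum(preds) / len(preds)

print(ensemble(10))  # average of 20.0, 18.5 and 20.7
```

The individual errors partially cancel in the average, which is the core idea behind the methods that follow.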
Example

I want to invest in a company XYZ, but I am not sure about its performance. So I look for advice on whether the stock price will increase by more than 6% per annum or not, and decide to approach various experts with diverse domain experience:

1. Employee of Company XYZ: in the past, he has been right 70% of the time.

2. Financial Advisor of Company XYZ: in the past, he has been right 75% of the time.

3. Stock Market Trader: in the past, he has been right 70% of the time.

4. Employee of a competitor: in the past, he has been right 60% of the time.

5. Market research team in the same segment: in the past, they have been right 75% of the time.

6. Social Media Expert: in the past, he has been right 65% of the time.
 Given the broad spectrum of expertise we have access to, we can combine all the information and make an informed decision.

 In a scenario where all 6 experts/teams verify that it is a good decision (assuming all the predictions are independent of each other), we will get a better combined accuracy rate.
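The combined accuracy claim can be checked directly. This sketch computes the probability that a strict majority (at least 4 of the 6 independent experts above) is right, by enumerating all possible right/wrong outcomes:

```python
from itertools import product

# Historical accuracy of each of the six experts from the example above.
accuracies = [0.70, 0.75, 0.70, 0.60, 0.75, 0.65]

# Probability that a strict majority (>= 4 of 6) is correct,
# assuming the experts' predictions are independent.
p_majority = 0.0
for outcome in product([True, False], repeat=len(accuracies)):
    p = 1.0
    for correct, acc in zip(outcome, accuracies):
        p *= acc if correct else 1.0 - acc
    if sum(outcome) >= 4:
        p_majority += p

print(round(p_majority, 4))
```

With these numbers the unweighted majority vote comes out around 0.73. Weighting the more reliable experts more heavily, as boosting later does with its learners, can raise the combined accuracy further.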
Homogeneous vs Heterogeneous Ensemble Learning
 A homogeneous ensemble consists of members that share a single type of base learning algorithm. Popular methods like bagging and boosting generate diversity by sampling from or assigning weights to training examples, but generally use a single type of base classifier to build the ensemble.

 A heterogeneous ensemble, on the other hand, consists of members with different base learning algorithms, such as Support Vector Machines, Artificial Neural Networks, and Decision Trees.
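A heterogeneous ensemble can be sketched without any ML libraries by voting across different model families. Here the SVM/ANN/decision-tree trio is replaced by three simple stand-ins (a decision stump, a 1-nearest-neighbour rule, and a class-centroid rule) on a hypothetical 1-D dataset:

```python
# Toy 1-D training set (hypothetical): label 1 when x > 4, else 0.
train = [(float(x), 1 if x > 4 else 0) for x in range(10)]

def stump_fit(data):
    # Base learner 1: decision stump (best single threshold on the data).
    best_t, best_err = 0, float("inf")
    for t in range(11):
        err = sum((1 if x >= t else 0) != label for x, label in data)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x >= best_t else 0

def knn_fit(data):
    # Base learner 2: 1-nearest-neighbour classifier.
    return lambda x: min(data, key=lambda d: abs(d[0] - x))[1]

def centroid_fit(data):
    # Base learner 3: assign to the class with the nearer mean.
    m0 = sum(x for x, label in data if label == 0) / sum(1 for _, l in data if l == 0)
    m1 = sum(x for x, label in data if label == 1) / sum(1 for _, l in data if l == 1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0

# Heterogeneous ensemble: three different base learning algorithms.
models = [stump_fit(train), knn_fit(train), centroid_fit(train)]

def vote(x):
    # Majority vote across the three model types.
    return 1 if sum(m(x) for m in models) >= 2 else 0

print([vote(x) for x in (1.0, 4.2, 7.0)])
```

The same voting scheme works unchanged if the stand-ins are swapped for real SVM, neural-network, and decision-tree implementations.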
[Diagram: Ensemble Learning branches into Sequential Ensemble Learning (Boosting), Parallel Ensemble Learning (Bagging), and Stacking]
Sequential Ensemble Learning (Boosting)
 Sequential ensemble methods are those in which the base learners are generated sequentially.

 Boosting is a machine learning ensemble algorithm used principally to increase accuracy in supervised learning by converting weak learners into strong ones.

 Examples: AdaBoost, Stochastic Gradient Boosting

 Boosting Process Steps:

1. First, generate a random sample from the training dataset.

2. Train a classifier model 1 on this generated sample and test it on the whole training dataset.

3. Calculate the error for each instance's prediction. If an instance is classified wrongly, increase its weight and create another sample.

4. Repeat this process until you get high accuracy from the system.
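The steps above can be sketched as a minimal AdaBoost with decision stumps on a hypothetical toy dataset. The re-weighting in step 3 follows the standard AdaBoost update; this variant weights instances directly instead of redrawing samples:

```python
import math

# Toy 1-D dataset (hypothetical): an interval pattern that no single
# threshold split can classify perfectly.
X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
y = [1, 1, 1, -1, -1, -1, 1, 1]

def stump(threshold, polarity):
    # Weak learner: predict +polarity below the threshold, -polarity above.
    return lambda x: polarity if x < threshold else -polarity

def best_stump(weights):
    # Choose the threshold/polarity with the lowest weighted error.
    best_h, best_err = None, float("inf")
    for t in [i + 0.5 for i in range(9)]:
        for p in (1, -1):
            h = stump(t, p)
            err = sum(w for x_i, y_i, w in zip(X, y, weights) if h(x_i) != y_i)
            if err < best_err:
                best_h, best_err = h, err
    return best_h, best_err

weights = [1.0 / len(X)] * len(X)   # step 1: uniform instance weighting
learners = []
for _ in range(5):                   # boosting rounds
    h, err = best_stump(weights)     # step 2: train a weak model, test on all data
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)   # this learner's vote weight
    learners.append((alpha, h))
    # Step 3: increase the weight of misclassified instances, decrease the rest.
    weights = [w * math.exp(-alpha * y_i * h(x_i))
               for x_i, y_i, w in zip(X, y, weights)]
    total = sum(weights)
    weights = [w / total for w in weights]

def ensemble_predict(x):
    # Final model: sign of the weighted vote of all weak learners.
    return 1 if sum(a * h(x) for a, h in learners) >= 0 else -1

preds = [ensemble_predict(x_i) for x_i in X]
accuracy = sum(p == y_i for p, y_i in zip(preds, y)) / len(y)
print(accuracy)
```

On this toy set the boosted ensemble separates the interval pattern that no single stump can, illustrating how weak learners combine into a strong one.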
[Diagram: data → sample 1 … sample n → model 1 … model n; after each model the misclassified instances are re-weighted (E1 … En); predictions 1 … n on the testing data are combined into the final prediction]
Parallel Ensemble Learning (Bagging)
 Parallel ensemble methods are those in which the base learners are generated in parallel.
 Bagging, which stands for Bootstrap Aggregating, is an approach where you take random samples of the data, build a learning algorithm on each sample, and take simple means of the predictions.
 Algorithms: Random Forest, Bagged Decision Trees, Extra Trees
 Bagging Process:

1. Select "m" random observations out of a population of "n" observations (where m < n), with replacement.

2. After the bootstrapped samples are formed, train a separate model on each sample.

3. Test the models using the test set.

4. Combine the final output prediction across the predictions of all the models.
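The bagging steps can be sketched with bootstrap sampling and majority voting; decision stumps stand in for the base learner, and the dataset is hypothetical:

```python
import random

random.seed(0)

# Toy dataset (hypothetical): label 1 when x > 5, else 0.
data = [(x, 1 if x > 5 else 0) for x in range(10)]

def train_stump(sample):
    # Base learner: threshold with the fewest errors on its bootstrap sample.
    best_t, best_err = 0, float("inf")
    for t in range(11):
        err = sum((1 if x >= t else 0) != label for x, label in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Steps 1-2: draw bootstrap samples (random draws WITH replacement) and
# train one model per sample, independently of the others.
n_models = 25
thresholds = [train_stump(random.choices(data, k=len(data)))
              for _ in range(n_models)]

def bagged_predict(x):
    # Step 4: combine the models' outputs by majority vote.
    votes = sum(1 if x >= t else 0 for t in thresholds)
    return 1 if votes > n_models / 2 else 0

print([bagged_predict(x) for x in range(10)])
```

Because each model sees a slightly different bootstrap sample, their thresholds differ, and the vote smooths out individual models' mistakes.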
[Diagram: training set → bootstrap samples D1, D2, …, Dn → classification models C1, C2, …, Cn → predictions P1, P2, …, Pn → voting → final prediction Pf]
Bagging vs Boosting

 Partitioning of data — Bagging: random. Boosting: higher vote to misclassified samples.
 Nature — Bagging: parallel. Boosting: sequential.
 Goal to achieve — Bagging: increase consistency. Boosting: increase accuracy.
 Final decision — Bagging: average of the n learners. Boosting: weighted average of the n learners.
 Models built — Bagging: independently. Boosting: new models are influenced by the performance of previously built models.
 Training dataset — Bagging: randomly drawn with replacement from the entire training dataset. Boosting: every new subset contains the elements that were misclassified by previous models.
BAGGING vs BOOSTING
 Bagging and Boosting are two types of Ensemble Learning. Both decrease the variance of a single estimate because they combine several estimates from different models, so the result may be a model with higher stability.

 If the problem with the single model is high variance (it overfits), Bagging is the best option.

 If the problem is that the single model has very low predictive power or is too simple, Boosting can generate a combined model with lower errors.
Advantages of Ensemble Methods
 Ensemble methods are used in almost all ML hackathons to enhance the predictive abilities of the models.

 In most cases, an ensemble of models gives better performance on unseen (test) data than the individual models do.

 The aggregate result of multiple models is usually less noisy than the individual models' results. This leads to model stability and robustness.

 Ensemble models can capture both the linear and the non-linear relationships in the data. This can be accomplished by using two different models and forming an ensemble of the two.
Disadvantages of Ensemble Methods
 Computation and design time is high, which is not good for real-time applications.
 Ensembles usually produce output that is very hard to interpret.
 Selecting the models for an ensemble is an art that is really hard to master.
Applications
Remote sensing
 Land cover mapping
 Change detection

Computer security
 Distributed denial of service
 Malware Detection
 Intrusion detection

Face recognition

Emotion recognition

Fraud detection

Financial decision-making

Medicine

THE END
