ENSEMBLE METHODS
MACHINE LEARNING
 Vaishnavi Garg
 Sundram Goyal
Introduction
 Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model.

 Ensemble methods take many models into account and average those models to produce one final model.

 Ensemble modeling is a powerful way to improve model performance. It usually pays off to apply ensemble learning on top of the various individual models you might be building.
[Diagram: labeled/unlabeled data → Model A, Model B, Model C → Prediction]
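In code, "averaging several models" can be as simple as taking the mean of their outputs. A minimal sketch for a regression task, where the three base models are hypothetical stand-ins:

```python
# Three simple "base models" for the same regression task; the ensemble
# averages their outputs. All three models here are hypothetical stand-ins.
def model_a(x):
    return 2.0 * x          # one model's guess

def model_b(x):
    return 1.8 * x + 0.5    # a second model with a different bias

def model_c(x):
    return 2.1 * x - 0.3    # a third model

def ensemble(x):
    # One final model: the simple average of the base models' predictions.
    preds = [m(x) for m in (model_a, model_b, model_c)]
    return sum(preds) / len(preds)

print(ensemble(10))  # average of 20.0, 18.5 and 20.7
```

The individual errors partially cancel in the average, which is the core idea behind the methods that follow.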
Example

I want to invest in a company XYZ, but I am not sure about its performance. So I look for advice on whether the stock price will increase by more than 6% per annum or not, and decide to approach various experts with diverse domain experience:

1. Employee of Company XYZ: in the past, he has been right 70% of the time.

2. Financial Advisor of Company XYZ: in the past, he has been right 75% of the time.

3. Stock Market Trader: in the past, he has been right 70% of the time.

4. Employee of a competitor: in the past, he has been right 60% of the time.

5. Market research team in the same segment: in the past, they have been right 75% of the time.

6. Social Media Expert: in the past, he has been right 65% of the time.
 Given the broad spectrum of expertise we have access to, we can combine all the information and make an informed decision.

 In a scenario where all 6 experts/teams verify that it is a good decision (assuming all the predictions are independent of each other), we will get a better combined accuracy rate.
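The combined accuracy claim can be checked directly. This sketch computes the probability that a strict majority (at least 4 of the 6 independent experts above) is right, by enumerating all possible right/wrong outcomes:

```python
from itertools import product

# Historical accuracy of each of the six experts from the example above.
accuracies = [0.70, 0.75, 0.70, 0.60, 0.75, 0.65]

# Probability that a strict majority (>= 4 of 6) is correct,
# assuming the experts' predictions are independent.
p_majority = 0.0
for outcome in product([True, False], repeat=len(accuracies)):
    p = 1.0
    for correct, acc in zip(outcome, accuracies):
        p *= acc if correct else 1.0 - acc
    if sum(outcome) >= 4:
        p_majority += p

print(round(p_majority, 4))
```

With these numbers the unweighted majority vote comes out around 0.73. Weighting the more reliable experts more heavily, as boosting later does with its learners, can raise the combined accuracy further.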
Homogeneous vs Heterogeneous Ensemble Learning
 A homogeneous ensemble consists of members that share a single type of base learning algorithm. Popular methods like bagging and boosting generate diversity by sampling from or assigning weights to training examples, but generally use a single type of base classifier to build the ensemble.

 A heterogeneous ensemble, on the other hand, consists of members with different base learning algorithms, such as Support Vector Machines, Artificial Neural Networks, and Decision Trees.
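A heterogeneous ensemble can be sketched without any ML libraries by voting across different model families. Here the SVM/ANN/decision-tree trio is replaced by three simple stand-ins (a decision stump, a 1-nearest-neighbour rule, and a class-centroid rule) on a hypothetical 1-D dataset:

```python
# Toy 1-D training set (hypothetical): label 1 when x > 4, else 0.
train = [(float(x), 1 if x > 4 else 0) for x in range(10)]

def stump_fit(data):
    # Base learner 1: decision stump (best single threshold on the data).
    best_t, best_err = 0, float("inf")
    for t in range(11):
        err = sum((1 if x >= t else 0) != label for x, label in data)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x >= best_t else 0

def knn_fit(data):
    # Base learner 2: 1-nearest-neighbour classifier.
    return lambda x: min(data, key=lambda d: abs(d[0] - x))[1]

def centroid_fit(data):
    # Base learner 3: assign to the class with the nearer mean.
    m0 = sum(x for x, label in data if label == 0) / sum(1 for _, l in data if l == 0)
    m1 = sum(x for x, label in data if label == 1) / sum(1 for _, l in data if l == 1)
    return lambda x: 1 if abs(x - m1) < abs(x - m0) else 0

# Heterogeneous ensemble: three different base learning algorithms.
models = [stump_fit(train), knn_fit(train), centroid_fit(train)]

def vote(x):
    # Majority vote across the three model types.
    return 1 if sum(m(x) for m in models) >= 2 else 0

print([vote(x) for x in (1.0, 4.2, 7.0)])
```

The same voting scheme works unchanged if the stand-ins are swapped for real SVM, neural-network, and decision-tree implementations.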
[Diagram: Ensemble Learning branches into Sequential Ensemble Learning (Boosting), Parallel Ensemble Learning (Bagging), and Stacking]
Sequential Ensemble Learning (Boosting)
 Sequential ensemble methods are those in which the base learners are generated sequentially.

 Boosting is a machine learning ensemble algorithm used principally to increase accuracy in supervised learning by converting weak learners into strong ones.

 Examples: AdaBoost, Stochastic Gradient Boosting

 Boosting Process Steps:

1. First, generate a random sample from the training dataset.

2. Train a classifier model 1 on this generated sample and test it on the whole training dataset.

3. Calculate the error for each instance's prediction. If an instance is classified wrongly, increase its weight and create another sample.

4. Repeat this process until you get high accuracy from the system.
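The steps above can be sketched as a minimal AdaBoost with decision stumps on a hypothetical toy dataset. The re-weighting in step 3 follows the standard AdaBoost update; this variant weights instances directly instead of redrawing samples:

```python
import math

# Toy 1-D dataset (hypothetical): an interval pattern that no single
# threshold split can classify perfectly.
X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
y = [1, 1, 1, -1, -1, -1, 1, 1]

def stump(threshold, polarity):
    # Weak learner: predict +polarity below the threshold, -polarity above.
    return lambda x: polarity if x < threshold else -polarity

def best_stump(weights):
    # Choose the threshold/polarity with the lowest weighted error.
    best_h, best_err = None, float("inf")
    for t in [i + 0.5 for i in range(9)]:
        for p in (1, -1):
            h = stump(t, p)
            err = sum(w for x_i, y_i, w in zip(X, y, weights) if h(x_i) != y_i)
            if err < best_err:
                best_h, best_err = h, err
    return best_h, best_err

weights = [1.0 / len(X)] * len(X)   # step 1: uniform instance weighting
learners = []
for _ in range(5):                   # boosting rounds
    h, err = best_stump(weights)     # step 2: train a weak model, test on all data
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)   # this learner's vote weight
    learners.append((alpha, h))
    # Step 3: increase the weight of misclassified instances, decrease the rest.
    weights = [w * math.exp(-alpha * y_i * h(x_i))
               for x_i, y_i, w in zip(X, y, weights)]
    total = sum(weights)
    weights = [w / total for w in weights]

def ensemble_predict(x):
    # Final model: sign of the weighted vote of all weak learners.
    return 1 if sum(a * h(x) for a, h in learners) >= 0 else -1

preds = [ensemble_predict(x_i) for x_i in X]
accuracy = sum(p == y_i for p, y_i in zip(preds, y)) / len(y)
print(accuracy)
```

On this toy set the boosted ensemble separates the interval pattern that no single stump can, illustrating how weak learners combine into a strong one.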
[Diagram: data → sample 1 … sample n → model 1 … model n; after each model the misclassified instances are re-weighted (E1 … En); predictions 1 … n on the testing data are combined into the final prediction]
Parallel Ensemble Learning (Bagging)
 Parallel ensemble methods are those in which the base learners are generated in parallel.
 Bagging, which stands for Bootstrap Aggregating, is an approach where you take random samples of the data, build a learning algorithm on each sample, and take simple means of the predictions.
 Algorithms: Random Forest, Bagged Decision Trees, Extra Trees
 Bagging Process:

1. Select "m" random observations out of a population of "n" observations (where m < n), with replacement.

2. After the bootstrapped samples are formed, train a separate model on each sample.

3. Test the models using the test set.

4. Combine the final output prediction across the predictions of all the models.
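The bagging steps can be sketched with bootstrap sampling and majority voting; decision stumps stand in for the base learner, and the dataset is hypothetical:

```python
import random

random.seed(0)

# Toy dataset (hypothetical): label 1 when x > 5, else 0.
data = [(x, 1 if x > 5 else 0) for x in range(10)]

def train_stump(sample):
    # Base learner: threshold with the fewest errors on its bootstrap sample.
    best_t, best_err = 0, float("inf")
    for t in range(11):
        err = sum((1 if x >= t else 0) != label for x, label in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Steps 1-2: draw bootstrap samples (random draws WITH replacement) and
# train one model per sample, independently of the others.
n_models = 25
thresholds = [train_stump(random.choices(data, k=len(data)))
              for _ in range(n_models)]

def bagged_predict(x):
    # Step 4: combine the models' outputs by majority vote.
    votes = sum(1 if x >= t else 0 for t in thresholds)
    return 1 if votes > n_models / 2 else 0

print([bagged_predict(x) for x in range(10)])
```

Because each model sees a slightly different bootstrap sample, their thresholds differ, and the vote smooths out individual models' mistakes.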
[Diagram: training set → bootstrap samples D1, D2, …, Dn → classification models C1, C2, …, Cn → predictions P1, P2, …, Pn → voting → final prediction Pf]
Bagging vs Boosting

 Partitioning of data — Bagging: random. Boosting: higher vote to misclassified samples.
 Nature — Bagging: parallel. Boosting: sequential.
 Goal to achieve — Bagging: increase consistency. Boosting: increase accuracy.
 Final decision — Bagging: average of the n learners. Boosting: weighted average of the n learners.
 Models built — Bagging: independently. Boosting: new models are influenced by the performance of previously built models.
 Training dataset — Bagging: randomly drawn with replacement from the entire training dataset. Boosting: every new subset contains the elements that were misclassified by previous models.
BAGGING vs BOOSTING
 Bagging and Boosting are two types of Ensemble Learning. Both decrease the variance of a single estimate because they combine several estimates from different models, so the result may be a model with higher stability.

 If the problem with the single model is high variance (it overfits), Bagging is the best option.

 If the problem is that the single model has very low predictive power or is too simple, Boosting can generate a combined model with lower errors.
Advantages of Ensemble Methods
 Ensemble methods are used in almost all ML hackathons to enhance the predictive abilities of the models.

 In most cases, an ensemble of models gives better performance on unseen (test) data than the individual models do.

 The aggregate result of multiple models is usually less noisy than the individual models' results. This leads to model stability and robustness.

 Ensemble models can capture both the linear and the non-linear relationships in the data. This can be accomplished by using two different models and forming an ensemble of the two.
Disadvantages of Ensemble Methods
 Computation and design time is high, which is not good for real-time applications.
 Ensembles usually produce output that is very hard to interpret.
 Selecting the models for an ensemble is an art that is really hard to master.
Applications
Remote sensing
 Land cover mapping
 Change detection

Computer security
 Distributed denial of service
 Malware Detection
 Intrusion detection

Face recognition

Emotion recognition

Fraud detection

Financial decision-making

Medicine

THE END
