Ensemble Learning
What is the meaning of Ensemble Learning?
Ensemble learning is a machine learning technique that involves combining multiple models (or
"learners") to improve the overall performance of a predictive model. The idea is that by combining
several different models, each with its own strengths and weaknesses, the ensemble model will be more
accurate and robust than any individual model.
Bagging
Bagging, short for Bootstrap Aggregating, is an ensemble learning technique that involves building
multiple models using subsets of the training data and combining their predictions through a
process of aggregation.
In bagging, several subsets of the training data are created by randomly sampling the original data
with replacement. Each subset is used to train a separate model, which is then combined with the other
models to form the final ensemble model.
The aggregation process in bagging can be done by taking the average (for regression) or the majority
vote (for classification) of the predictions made by the individual models. The resulting ensemble
model is usually more robust and accurate than any individual model.
One of the main advantages of bagging is that it can help to reduce overfitting, which occurs when a
model is too complex and captures noise in the training data rather than the underlying pattern. By
training multiple models on different subsets of the data, bagging can help to reduce the impact of noise
and increase the generalization performance of the ensemble model.
Commonly used bagging-based models include:
1. Random Forest: A decision tree-based model that uses bagging to create an ensemble of decision trees. Random Forest is a popular model for classification and regression problems.
2. Bagging meta-estimator: A general-purpose bagging model that can be used with any base
estimator, such as decision trees, SVMs, or neural networks.
3. Extra Trees: An extension of Random Forest that introduces more randomness into the tree-
building process by using random thresholds for each feature.
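As a minimal sketch of how this works in practice (assuming scikit-learn is available; the synthetic dataset and parameter values below are illustrative, not part of the original notes), the bagging meta-estimator can be used directly:

```python
# Minimal bagging sketch: BaggingClassifier's default base estimator is a
# decision tree, and each tree is fit on a bootstrap sample of the training data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bagging = BaggingClassifier(
    n_estimators=100,  # number of base models, one per bootstrap sample
    bootstrap=True,    # sample the training data with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)

# Predictions are aggregated by majority vote across the base models.
print("Bagging accuracy:", bagging.score(X_test, y_test))
```

Swapping in a different base estimator (for example, an SVM) only requires passing it to BaggingClassifier in place of the default decision tree.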
Random Forest
Random Forest Classifier is a popular ensemble learning algorithm that combines multiple decision
trees to make predictions for classification problems. It is a type of bagging method that creates
multiple decision trees using a random subset of the features and training data and then combines
the results of these trees to make a prediction.
In a random forest, each decision tree is built on a different bootstrap sample of the training data,
which is created by randomly sampling the training data with replacement. Additionally, at each split
of the tree, only a random subset of the features is considered for splitting, rather than all the features.
This helps to reduce the correlation between the trees and increases the diversity of the ensemble.
To make a prediction using a random forest classifier, the prediction of each decision tree is combined
through a majority vote. That is, the output class with the highest number of votes among all the decision
trees is considered the final output.
The Random Forest Classifier has several advantages over individual decision trees, including better generalization performance, reduced overfitting, and improved accuracy. It is also relatively easy to use and can handle large amounts of data and many features.
Random Forest Classifiers are commonly used in a variety of applications, such as medical diagnosis, sentiment analysis, and image classification. They are particularly effective when the data has many features or a high degree of noise or missing data.
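A minimal sketch with scikit-learn's RandomForestClassifier follows; the synthetic data and hyperparameter values are illustrative assumptions rather than part of the original notes:

```python
# Random Forest sketch: many decision trees, each trained on a bootstrap sample
# and restricted to a random subset of features at every split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,     # number of decision trees in the forest
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
)
rf.fit(X_train, y_train)

# The final class is the majority vote over all trees in the forest.
print("Random Forest accuracy:", rf.score(X_test, y_test))
```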
What are the main differences between Random Forest and Extra Trees Classifiers?
The main difference between the Random Forest and Extra Trees bagging classifiers lies in the way the individual trees are constructed.
In Random Forest, each decision tree is built on a bootstrap sample, and at each split only a random subset of the features is considered; the best splitting threshold is then searched for among those candidate features. This adds randomness to the tree-building process, which helps to reduce overfitting and increases the diversity of the trees in the ensemble.
In contrast, Extra Trees classifiers use an even more random approach to the tree-building process.
Instead of searching for the optimal splitting point for each feature, Extra Trees classifiers select the
splitting point randomly for each feature, without any optimization. This results in a higher level of
randomness in the tree-building process, which can further reduce overfitting and increase the diversity of
the trees.
The increased randomness in Extra Trees classifiers can make them more effective in situations
where the data has many noisy features, and where the optimal splitting points for the features may
be hard to identify.
However, this increased randomness can also make each individual tree a weaker learner. On the other hand, because Extra Trees classifiers skip the search for an optimal splitting point at each node, they are typically faster to train than Random Forest classifiers.
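A short comparison sketch (scikit-learn assumed; the synthetic data and settings are illustrative) makes the difference concrete: the two classifiers are used identically, and only the split-selection strategy inside the trees differs.

```python
# Random Forest searches for the best threshold among candidate features;
# Extra Trees draws split thresholds at random for the candidate features.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=6, random_state=1)

rf = RandomForestClassifier(n_estimators=200, random_state=1)
et = ExtraTreesClassifier(n_estimators=200, random_state=1)

print("Random Forest CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
print("Extra Trees CV accuracy:  ", cross_val_score(et, X, y, cv=5).mean())
```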
Boosting
Boosting is an ensemble learning technique that combines multiple weak learners to create a strong learner. In contrast to bagging methods like Random Forest and Extra Trees, which create multiple models independently and combine their results, boosting methods build the models sequentially, with each new model depending on the ones trained before it.
The basic idea of boosting is to assign higher weights to the samples that are misclassified by the previous model, so that the subsequent model focuses more on these samples. The process is repeated iteratively, with each subsequent model trying to correct the mistakes of the previous models. The final prediction is a weighted combination of the predictions of all the models in the sequence.
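The weighting scheme described above is the one AdaBoost uses, so a minimal sketch with scikit-learn's AdaBoostClassifier (the synthetic data and settings are illustrative assumptions) shows the idea in code:

```python
# Boosting sketch: AdaBoost reweights misclassified samples so each new weak
# learner (a depth-1 decision stump by default) focuses on them; the final
# prediction is a weighted vote of all the learners in the sequence.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(
    n_estimators=100,   # number of weak learners trained sequentially
    learning_rate=0.5,  # shrinks each learner's contribution to the weighted vote
    random_state=0,
)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", boost.score(X_test, y_test))
```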
Boosting Models
There are several types of boosting methods, including:
1. AdaBoost Classifier: AdaBoost (Adaptive Boosting) trains a sequence of weak learners on reweighted training data, so that each new learner focuses on the samples the previous learners misclassified.
2. Gradient Boosting Classifier: Gradient Boosting iteratively trains each new model on the errors of the current ensemble and adds it to the ensemble.
3. XGBoost Classifier: XGBoost (Extreme Gradient Boosting) is an optimized gradient-boosting library known for its speed, its performance on large datasets, and its support for parallel processing.
4. LGBM Classifier: LightGBM (LGBM) is a popular gradient-boosting framework that was developed by Microsoft. It is designed to be fast, scalable, and highly efficient for large-scale data and is known for its ability to handle large datasets.
Parallel Processing in XGBOOST: XGBoost can perform parallel processing in two ways: data
parallelism and task parallelism. Data parallelism involves partitioning the dataset into smaller
subsets and training multiple models on each subset in parallel. Task parallelism involves
parallelizing the computations within a single model, such as parallelizing the gradient computation.
To enable parallel processing in XGBoost, you can set the n_jobs parameter to specify the number of
CPU cores to use. For example, setting n_jobs=-1 will use all available CPU cores for parallel
processing.
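A minimal sketch of enabling parallel processing this way (assuming the xgboost package is installed; the synthetic data and settings are illustrative):

```python
# XGBoost parallel-processing sketch: n_jobs=-1 uses all available CPU cores.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

model = XGBClassifier(
    n_estimators=300,  # number of boosted trees
    n_jobs=-1,         # use all available CPU cores for training
)
model.fit(X, y)
```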
What are the main differences between XGBOOST, Gradient Boosting, LGBM, and Adaboost?
XGBoost, Gradient Boosting, LightGBM (LGBM), and AdaBoost are all ensemble methods that use boosting to improve the accuracy of machine learning models. However, there are some key differences between them.
1. XGBoost (Extreme Gradient Boosting): XGBoost is an optimized gradient-boosting library that is known for its speed and performance on large datasets, in part because it supports parallel processing across CPU cores.
2. Gradient Boosting: Gradient Boosting is a general technique for improving the accuracy of machine learning models by combining multiple weak models. It works by iteratively training new models on the errors of the previous models and adding them to the ensemble. Gradient Boosting is a powerful technique, but it can be slow and prone to overfitting.
3. LightGBM (LGBM): LightGBM is a gradient-boosting framework developed by Microsoft that is designed to be fast, scalable, and memory-efficient, making it well-suited for large datasets. LightGBM also includes features such as GPU acceleration and distributed training.
4. AdaBoost (Adaptive Boosting): AdaBoost is a boosting algorithm that works by weighting the
training data and focusing on the misclassified samples. It trains weak models on the weighted
data and combines them into an ensemble. AdaBoost is a simple and effective algorithm, but it
can be sensitive to noisy data.
In summary, while all these algorithms use boosting to improve the accuracy of machine learning models,
they differ in their implementation details and optimization techniques. XGBoost and LightGBM are
known for their speed and performance on large datasets, while Gradient Boosting and AdaBoost are
more widely used and provide a good balance between accuracy and simplicity.
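As a rough side-by-side sketch of the four models (assuming scikit-learn, xgboost, and lightgbm are installed; the synthetic data and settings are illustrative, and relative results will vary by dataset):

```python
# The four boosting models discussed above, trained and scored the same way.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

models = {
    "XGBoost": XGBClassifier(n_estimators=200, n_jobs=-1),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=200),
    "LightGBM": LGBMClassifier(n_estimators=200, n_jobs=-1),
    "AdaBoost": AdaBoostClassifier(n_estimators=200),
}

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```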
Stacking
Stacking is an ensemble learning technique that involves combining multiple machine learning models to
improve the accuracy of predictions. It is a meta-learning approach that works by training multiple
base models on the same data and then using a meta-model to combine their predictions.
The meta-model is typically trained on the predictions the base models produce for held-out (for example, cross-validated) data. Once the meta-model is trained, the ensemble can make predictions on the test set: the base models generate their predictions for the new data, and the meta-model combines them into the final output.
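A minimal stacking sketch with scikit-learn's StackingClassifier follows; the choice of base models and meta-model here is illustrative, not prescribed by the notes:

```python
# Stacking sketch: two base models plus a logistic-regression meta-model that
# learns how to combine their cross-validated predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=5,  # base-model predictions for the meta-model come from cross-validation
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```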
The main advantage of stacking is that it can often produce more accurate predictions than any single model: by combining the predictions from multiple models, it can reduce the bias and variance of the overall model and improve overall performance.
However, stacking can also be more computationally expensive and require more data than
other ensemble methods like bagging and boosting. It also requires careful tuning of the hyperparameters
of both the base models and the meta-model.
Overall, stacking is a powerful technique for improving the accuracy of machine learning models, but
it requires careful implementation and tuning to achieve the best results.