
A Visual and Overly Simplified Guide

To Bagging and Boosting


Many folks struggle to understand the core essence of Bagging and Boosting. Here’s a simplified visual guide depicting what goes on under the hood.

In a gist, an ensemble combines multiple models to build a more powerful model.

They are fundamentally built on the idea that aggregating the predictions of multiple models mitigates the weaknesses of any individual model, so the combination is expected to provide better overall performance.
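As a quick back-of-the-envelope illustration, suppose we have five independent classifiers, each correct 70% of the time; a simple majority vote over them is then right noticeably more often than any single one:

```python
# Probability that a majority vote of n independent classifiers is correct,
# assuming each classifier is right with probability p (illustrative numbers).
from math import comb

p, n = 0.70, 5
majority_acc = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))
print(f"single model: {p:.2f}, majority vote of {n}: {majority_acc:.2f}")  # ~0.84
```

In practice the ensemble members are correlated, so the gain is smaller than this idealized case, but the direction of the effect is the same.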

Whenever I wish to intuitively illustrate their immense power, I use the following image:

Ensembles are primarily built using two different strategies:

1. Bagging
2. Boosting

1) Bagging (short for Bootstrapped Aggregation):

• creates different subsets of data by sampling with replacement (this is called bootstrapping)
• trains one model per subset
• aggregates all predictions to get the final prediction (see the sketch after the model list below)

Some common models that leverage Bagging are:

• Random Forests
• Extra Trees
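
To make the three steps concrete, here is a minimal from-scratch sketch of bagging with decision trees. The toy dataset, the choice of decision trees, and the ensemble size of 25 are illustrative assumptions, not a prescription:

```python
# A minimal bagging sketch: bootstrap -> train one model per subset -> aggregate.
# The toy dataset, decision trees, and n_estimators=25 are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)

n_estimators = 25
models = []
for _ in range(n_estimators):
    # 1) Bootstrap: sample the training set with replacement
    idx = rng.integers(0, len(X), size=len(X))
    X_boot, y_boot = X[idx], y[idx]

    # 2) Train one model per bootstrapped subset
    models.append(DecisionTreeClassifier(random_state=0).fit(X_boot, y_boot))

# 3) Aggregate: majority vote across all models (labels are 0/1 here)
all_preds = np.stack([m.predict(X) for m in models])
final_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
```

Random Forests follow this same recipe and additionally randomize the features considered at each split, which further decorrelates the individual trees.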

2) Boosting:

• is an iterative training process
• the subsequent model puts more focus on the samples misclassified by the previous model
• the final prediction is a weighted combination of all predictions (see the sketch after the model list below)

Some common models that leverage Boosting are:

• XGBoost
• AdaBoost, etc.
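
To make the boosting loop concrete, here is a simplified AdaBoost-style sketch. Decision stumps, 25 rounds, and the toy dataset are illustrative assumptions; libraries like XGBoost follow the same iterative idea but fit each new model to gradients/residuals instead of reweighting samples:

```python
# A simplified AdaBoost-style sketch for binary labels relabeled to {-1, +1}.
# Decision stumps, n_rounds=25, and the toy dataset are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, random_state=0)
y = np.where(y01 == 1, 1, -1)

n_rounds = 25
weights = np.full(len(X), 1 / len(X))   # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Train a weak learner on the current sample weights
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error -> model weight (alpha): accurate rounds get a larger say
    err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Upweight misclassified samples so the next model focuses on them
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: weighted (alpha) combination of all weak learners
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
final_pred = np.sign(scores)
```

The key contrast with bagging: the models are trained sequentially, and the sample weights carry information from one round to the next, so each new model concentrates on what the previous ones got wrong.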

Overall, ensemble models significantly boost predictive performance compared to using a single model. They tend to be more robust, generalize better to unseen data, and are less prone to overfitting.
