
Fall 2023

9/29/2023

Random Forest
Introduction
Random Forest is an ensemble model constructed from multiple decision trees; the final prediction is obtained by a majority vote of the trees (for classification) or by averaging their outputs (for regression). Its main advantages are: low variance, because it combines the results of many decision trees, each trained on its own bootstrapped subset of the data; reduced overfitting, thanks to this bootstrap aggregation (bagging), in which the tree results are combined by majority vote for classification or by the mean for regression; no need for normalization, since the trees use a rule-based splitting approach; good accuracy, running efficiently even on large databases; and the ability to estimate missing data.

Working of Random Forest Algorithm


Firstly, it performs bootstrapping (random sampling with replacement): it creates multiple subsets of the training data by randomly selecting samples from the original dataset with replacement. Secondly, it builds decision trees: for each subset of the data, a decision tree is trained independently, and at each node of the tree only a random subset of features is considered for splitting. This introduces diversity among the individual trees and prevents overfitting. Thirdly, it combines the trees by voting (classification) or averaging (regression): after growing a forest of decision trees, Random Forest aggregates the predictions of all individual trees when making predictions. For classification tasks, each tree "votes" for a class, and the class with the most votes becomes the final prediction (majority voting). For regression tasks, the predictions of all trees are averaged to produce the final prediction. Fourthly, it reduces overfitting: the random selection of data points and features at each step introduces randomness and reduces the risk of overfitting. Finally, it also provides an error estimate: since each tree is trained on a different subset of data, the samples not included in a particular tree's bootstrap sample can be used to estimate its prediction accuracy.
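To make the voting and averaging steps concrete, here is a minimal sketch (using NumPy, an assumption for this example) of how the predictions of individual trees could be aggregated:

import numpy as np

def aggregate_classification(tree_predictions):
    # tree_predictions: shape (n_trees, n_samples), integer class labels per tree.
    tree_predictions = np.asarray(tree_predictions)
    # For each sample, pick the class predicted by the most trees (majority vote).
    return np.array([np.bincount(tree_predictions[:, i]).argmax()
                     for i in range(tree_predictions.shape[1])])

def aggregate_regression(tree_predictions):
    # For regression, the forest prediction is the mean of the tree outputs.
    return np.asarray(tree_predictions).mean(axis=0)

# Three trees voting on four samples: the majority class wins for each sample.
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
print(aggregate_classification(votes))                  # [0 1 1 0]
print(aggregate_regression([[1.0, 2.0], [3.0, 4.0]]))   # [2. 3.]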
However, Random Forest also has some disadvantages: it requires more training time because many decision trees must be built, and it needs more memory and is computationally more expensive than a single decision tree.
Bootstrapping (Random Sampling with Replacement):
Random Forest starts by creating multiple subsets of the training data through a process called bootstrapping. It randomly selects samples from the original dataset with replacement, which means that some data points may be repeated in a subset, while others may be omitted.
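
The following short sketch (again assuming NumPy) shows what one bootstrap draw looks like, including the out-of-bag samples that can later be used for the error estimate mentioned above:

import numpy as np

rng = np.random.default_rng(0)
n_samples = 10
indices = np.arange(n_samples)

# Draw a bootstrap sample of the same size as the original dataset, with replacement,
# so some indices appear more than once while others are left out.
bootstrap_idx = rng.choice(indices, size=n_samples, replace=True)

# Indices that were never drawn are the out-of-bag (OOB) samples for this tree.
oob_idx = np.setdiff1d(indices, bootstrap_idx)

print("Bootstrap sample:", bootstrap_idx)
print("Out-of-bag samples:", oob_idx)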
