
Random Forest

Mandeep Kaur
Mehaknoor
Key Terms

Ensemble Learning
Bagging
Sampling
Decision Trees

Ensemble Learning
In Machine Learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

"Wisdom of the Crowd"
Combination of Machine Learning Models

Training of Base Models
• Models: the models are different but the training data is the same (experts in different domains).
• Data: the models are the same but each is trained on different data.
• Models + Data: both the models and the training data are different.

• Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model.
• In one sense, ensemble learning may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.
• On the other hand, the alternative is to do a lot more learning on one non-ensemble system.
• For the same increase in compute, storage, or communication resources, an ensemble system can improve overall accuracy more by spreading that increase across two or more methods than a single method would improve by using all of the extra resources itself.
• Fast algorithms such as Decision Trees are commonly used in ensemble methods (for example, Random Forest).
Types of Ensemble Learning

1. Voting
2. Bagging
3. Boosting
4. Stacking

Voting Ensemble

Voting Ensemble
What is Ensemble Voting?
Ensemble voting is a popular technique in ensemble learning where the final prediction is made by combining the predictions of multiple individual models. It is commonly used for classification tasks, although it can also be applied to regression problems.

Ensemble voting offers several benefits:
• Improved Accuracy: It leverages the collective knowledge and diversity of multiple models to make more accurate predictions than any individual model alone. It can reduce bias and variance and achieve better overall performance.
• Robustness: This technique can be more robust to noisy or incorrect predictions from individual models. It reduces the impact of individual model errors or biases by considering multiple viewpoints.
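To make the idea concrete, here is a minimal sketch of a hard-voting ensemble using scikit-learn's VotingClassifier; the dataset, base models, and settings are illustrative assumptions rather than part of the original slides.

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data set; any labelled classification data would do
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different base models; "hard" voting takes the majority class
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
print("Voting ensemble accuracy:", voter.score(X_test, y_test))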
Bagging, or Bootstrap Aggregating
INTRODUCTION TO BAGGING
Bagging, also known as Bootstrap Aggregating, is an ensemble learning technique that helps to improve the performance and accuracy of machine learning algorithms.
Working of Bagging
Bagging (bootstrap aggregating) is an ensemble method that involves training multiple models independently on random subsets of the data, and aggregating their predictions through voting or averaging.

In detail, each model is trained on a random subset of the data sampled with replacement, meaning that individual data points can be chosen more than once. This random subset is known as a bootstrap sample. By training models on different bootstrap samples, bagging reduces the variance of the individual models.

The predictions from all the sampled models are then combined, through simple averaging or voting, to make the overall prediction. In this way the aggregated model incorporates the strengths of the individual models and cancels out their errors.
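As an illustrative sketch of this procedure (not taken from the slides), scikit-learn's BaggingClassifier trains each base tree on its own bootstrap sample and aggregates the votes; the dataset and the number of estimators below are assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each fit on a bootstrap sample (rows drawn with replacement);
# predictions are combined by majority vote
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # base model (its keyword name differs across sklearn versions)
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))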
Introduction to Random Forest
• A Random Forest is like a group decision-making team in machine learning.
• It combines the opinions of many "trees" (individual models) to make better predictions, creating a more robust and accurate overall model.
• One of the most important features of the Random Forest algorithm is that it can handle data sets containing continuous variables (as in regression) and categorical variables (as in classification).
• It performs well on both classification and regression tasks. In this tutorial, we will understand the working of random forest and implement random forest on a classification task.

E.g. Let's dive into a real-life analogy to understand this concept further. A student named X wants to choose a course after his 10+2, and he is confused about the choice of course based on his skill set. So he decides to consult various people, like his cousins, teachers, parents, degree students, and working people. He asks them varied questions, like why he should choose it, the job opportunities with that course, the course fee, etc. Finally, after consulting various people about the course, he decides to take the course suggested by most people.

Random Forest

Bagging Technique
• Bagging
• Sampling
[Diagram: multiple trees (a group of trees) can be trained; the Data is sampled and fed to models M1, M2, …, Mn.]

Key Features of Random Forest


• Can be used for classification and regression
• A Random Forest can deal with many different features
• A Random Forest is a collection of Decision Trees whose results are averaged or majority voted
How is a Random Forest created?
• A random forest consists of decision trees. A decision tree consists of:
  • decision nodes (the top decision node is called the root node)
  • terminal nodes, or leaf nodes
• A selection of data and features is used for each tree. For every decision tree:
  • a sample of the training data is used
  • a sample of the features (√n_features, up to 30 – 40%) is used
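A rough, hypothetical sketch of the per-tree sampling described above: each tree gets a bootstrap sample of the rows, and max_features="sqrt" restricts each split to roughly √n_features candidate features. The dataset and the tree count are assumptions for illustration.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

trees = []
for i in range(25):
    # Bootstrap sample: row indices drawn with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" considers about sqrt(n_features) features at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote over the 25 trees for every sample
votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("Agreement with the labels:", (majority == y).mean())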
Steps Involved in Random Forest
• Step 1: In the Random Forest model, a subset of data points and a subset of features is selected for constructing each decision tree. Simply put, n random records and m features are taken from a data set having k records.
• Step 2: An individual decision tree is constructed for each sample.
• Step 3: Each decision tree generates an output.
• Step 4: The final output is based on Majority Voting or Averaging for classification and regression, respectively.

Consider the fruit basket as the data, as shown in the figure below. Now n samples are taken from the fruit basket, and an individual decision tree is constructed for each sample. Each decision tree generates an output, as shown in the figure. The final output is based on majority voting. In the figure, you can see that the majority of the decision trees give the output as an apple rather than a banana, so the final output is taken as an apple.
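These four steps are what a library implementation performs internally; a minimal sketch with scikit-learn's RandomForestClassifier is shown below. The synthetic dataset and parameter values are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fruit-basket data
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Steps 1-3: each of the 100 trees is grown on its own sample of records and features
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

# Step 4: predict() returns the majority vote of the individual trees
print("Random forest accuracy:", rf.score(X_test, y_test))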
Processing the Ensemble of Trees, Called the Random Forest
• Take a set of variables
• Run them through every decision tree
• Determine a predicted target variable for each of the trees
• Average the results of all trees
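For a regression forest, this averaging can be reproduced by hand from the fitted trees; the snippet below is a sketch using an assumed synthetic dataset.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Run one set of variables (one row) through every decision tree and average;
# this matches what rf.predict does internally for regression
row = X[:1]
per_tree = np.array([tree.predict(row)[0] for tree in rf.estimators_])
print("Mean of per-tree predictions:", per_tree.mean())
print("rf.predict result:", rf.predict(row)[0])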

Working of Random Forest

• The random forest algorithm is made up of a collection of decision trees, and each tree in the ensemble is built from a data sample drawn from the training set with replacement, called the bootstrap sample.
• Of that training sample, about one-third is set aside as test data, known as the out-of-bag (OOB) sample, which we'll come back to later.
• Another instance of randomness is then injected through feature bagging (sampling), adding more diversity to the dataset and reducing the correlation among decision trees.
• Depending on the type of problem, the determination of the prediction will vary. For a regression task, the outputs of the individual decision trees are averaged, and for a classification task, a majority vote (i.e. the most frequent categorical variable) yields the predicted class.
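The out-of-bag rows mentioned above can serve as a built-in validation set; a minimal sketch (with an assumed synthetic dataset) follows.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores every tree on the rows left out of its bootstrap sample
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("Out-of-bag accuracy estimate:", rf.oob_score_)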

Key Benefits

• Reduced risk of overfitting: Decision trees run the risk of overfitting as they tend to tightly fit all the samples within the training data. However, when there is a robust number of decision trees in a random forest, the classifier won't overfit the model, since the averaging of uncorrelated trees lowers the overall variance and prediction error.
• Provides flexibility: Since random forest can handle both regression and classification tasks with a high degree of accuracy, it is a popular method among data scientists. Feature bagging also makes the random forest classifier an effective tool for estimating missing values, as it maintains accuracy when a portion of the data is missing.
• Easy to determine feature importance: Random forest makes it easy to evaluate variable importance, or contribution, to the model. There are a few ways to evaluate feature importance. Gini importance and mean decrease in impurity (MDI) are usually used to measure how much the model's accuracy decreases when a given variable is excluded. Permutation importance, also known as mean decrease in accuracy (MDA), is another importance measure; MDA identifies the average decrease in accuracy from randomly permuting the feature values in OOB samples.
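Both importance measures are available in common libraries; the sketch below uses scikit-learn's feature_importances_ (MDI) and permutation_importance, with an assumed dataset and a held-out split rather than OOB samples.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Mean decrease in impurity (Gini importance), computed from the trees themselves
mdi = rf.feature_importances_

# Permutation importance (mean decrease in accuracy) on held-out data
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)

top = mdi.argmax()
print("Most important feature (MDI):", data.feature_names[top])
print("Its permutation importance:", perm.importances_mean[top])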
Random Forest Applications

• Finance: It is a preferred algorithm over others as it reduces time spent on data management and pre-processing tasks. It can be used to evaluate customers with high credit risk, to detect fraud, and for option pricing problems.

• Healthcare: The random forest algorithm has applications within computational biology, allowing doctors to tackle problems such as gene expression classification, biomarker discovery, and sequence annotation. As a result, doctors can make estimates around drug responses to specific medications.

• E-commerce: It can be used in recommendation engines for cross-sell purposes.

Random Forest Hyperparameters

• Num Estimators
• Max Features
• Bootstrap
• Max Samples
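In scikit-learn these four knobs correspond to n_estimators, max_features, bootstrap, and max_samples; the example below is a sketch, and the values shown are arbitrary illustrations rather than recommendations.

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=300,     # Num Estimators: number of trees in the forest
    max_features="sqrt",  # Max Features: features considered at each split
    bootstrap=True,       # Bootstrap: sample rows with replacement for each tree
    max_samples=0.8,      # Max Samples: fraction of rows drawn per tree (requires bootstrap=True)
    random_state=0,
)
# rf.fit(X, y) would then train it on any labelled dataset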

Thank you
