
Random Forest

Mandeep Kaur
Mehaknoor
Key Terms

Ensemble Learning
Bagging
Sampling
Decision Trees

Ensemble Learning
In Machine Learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

"Wisdom of the Crowd"
Combination of Machine Learning Models

Training of Base Models
• Models: the models are different but the training data is the same (experts in different domains).
• Data: the models are the same but each is trained on different data.
• Models + Data: both the models and the training data are different.

• Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model.
• In one sense, ensemble learning may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.
• On the other hand, the alternative is to do a lot more learning on one non-ensemble system.
• For the same increase in compute, storage, or communication resources, an ensemble system can improve overall accuracy more by spreading that increase across two or more methods than a single method would improve by using all of the extra resources itself.
• Fast algorithms such as Decision Trees are commonly used in ensemble methods (for example, Random Forest).
Types of Ensemble Learning

1. Voting
2. Bagging
3. Boosting
4. Stacking

Voting Ensemble

Voting Ensemble
What is Ensemble Voting?
Ensemble voting is a popular technique in ensemble learning where the final prediction is made by combining the predictions of multiple individual models. It is commonly used for classification tasks, although it can also be applied to regression problems.

Ensemble voting offers several benefits:
• Improved Accuracy: It leverages the collective knowledge and diversity of multiple models to make more accurate predictions than any individual model alone. It can reduce bias and variance and achieve better overall performance.
• Robustness: This technique can be more robust to noisy or incorrect predictions from individual models. It reduces the impact of individual model errors or biases by considering multiple viewpoints.
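To make the idea concrete, here is a minimal sketch of a hard-voting ensemble using scikit-learn's VotingClassifier; the dataset, base models, and settings are illustrative assumptions rather than part of the original slides.

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data set; any labelled classification data would do
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different base models; "hard" voting takes the majority class
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)
print("Voting ensemble accuracy:", voter.score(X_test, y_test))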
Bagging, or Bootstrap Aggregating
INTRODUCTION TO BAGGING
Bagging, also known as Bootstrap Aggregating, is an ensemble learning technique that helps to improve the performance and accuracy of machine learning algorithms.
Working of Bagging
Bagging (bootstrap aggregating) is an ensemble method that involves training multiple models independently on random subsets of the data, and aggregating their predictions through voting or averaging.

In detail, each model is trained on a random subset of the data sampled with replacement, meaning that individual data points can be chosen more than once. This random subset is known as a bootstrap sample. By training models on different bootstrap samples, bagging reduces the variance of the individual models.

The predictions from all the sampled models are then combined, through simple averaging or voting, to make the overall prediction. In this way the aggregated model incorporates the strengths of the individual models and cancels out their errors.
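As an illustrative sketch of this procedure (not taken from the slides), scikit-learn's BaggingClassifier trains each base tree on its own bootstrap sample and aggregates the votes; the dataset and the number of estimators below are assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each fit on a bootstrap sample (rows drawn with replacement);
# predictions are combined by majority vote
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # base model (its keyword name differs across sklearn versions)
    n_estimators=50,
    bootstrap=True,
    random_state=0,
)
bag.fit(X_train, y_train)
print("Bagging accuracy:", bag.score(X_test, y_test))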
Introduction to Random Forest
• A Random Forest is like a group decision-making team in machine learning.
• It combines the opinions of many "trees" (individual models) to make better predictions, creating a more robust and accurate overall model.
• One of the most important features of the Random Forest algorithm is that it can handle data sets containing continuous variables (as in regression) and categorical variables (as in classification).
• It performs well on both classification and regression tasks. In this tutorial, we will understand the working of random forest and implement random forest on a classification task.

E.g. Let's dive into a real-life analogy to understand this concept further. A student named X wants to choose a course after his 10+2, and he is confused about the choice of course based on his skill set. So he decides to consult various people, like his cousins, teachers, parents, degree students, and working people. He asks them varied questions, like why he should choose it, the job opportunities with that course, the course fee, etc. Finally, after consulting various people about the course, he decides to take the course suggested by most people.

Random Forest

Bagging Technique
• Bagging
• Sampling
[Diagram: multiple trees (a group of trees) can be trained; the Data is sampled and fed to models M1, M2, …, Mn.]

Key Features of Random Forest


• Can be used for classification and regression
• A Random Forest can deal with many different features
• A Random Forest is a collection of Decision Trees whose results are averaged or majority voted
How is a Random Forest created?
• A random forest consists of decision trees. A decision tree consists of:
  • decision nodes (the top decision node is called the root node)
  • terminal nodes, or leaf nodes
• A selection of data and features is used for each tree. For every decision tree:
  • a sample of the training data is used
  • a sample of the features (√n_features, up to 30 – 40%) is used
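A rough, hypothetical sketch of the per-tree sampling described above: each tree gets a bootstrap sample of the rows, and max_features="sqrt" restricts each split to roughly √n_features candidate features. The dataset and the tree count are assumptions for illustration.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
rng = np.random.default_rng(0)

trees = []
for i in range(25):
    # Bootstrap sample: row indices drawn with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt" considers about sqrt(n_features) features at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Majority vote over the 25 trees for every sample
votes = np.stack([t.predict(X) for t in trees])  # shape: (n_trees, n_samples)
majority = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
print("Agreement with the labels:", (majority == y).mean())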
Steps Involved in Random Forest
• Step 1: In the Random Forest model, a subset of data points and a subset of features is selected for constructing each decision tree. Simply put, n random records and m features are taken from a data set having k records.
• Step 2: An individual decision tree is constructed for each sample.
• Step 3: Each decision tree generates an output.
• Step 4: The final output is based on Majority Voting or Averaging for classification and regression, respectively.

Consider the fruit basket as the data, as shown in the figure below. Now n samples are taken from the fruit basket, and an individual decision tree is constructed for each sample. Each decision tree generates an output, as shown in the figure. The final output is based on majority voting. In the figure, you can see that the majority of the decision trees give the output as an apple rather than a banana, so the final output is taken as an apple.
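These four steps are what a library implementation performs internally; a minimal sketch with scikit-learn's RandomForestClassifier is shown below. The synthetic dataset and parameter values are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fruit-basket data
X, y = make_classification(n_samples=400, n_features=12, n_informative=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Steps 1-3: each of the 100 trees is grown on its own sample of records and features
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_train, y_train)

# Step 4: predict() returns the majority vote of the individual trees
print("Random forest accuracy:", rf.score(X_test, y_test))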
Processing the Ensemble of Trees, Called the Random Forest
• Take a set of variables
• Run them through every decision tree
• Determine a predicted target variable for each of the trees
• Average the results of all trees
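For a regression forest, this averaging can be reproduced by hand from the fitted trees; the snippet below is a sketch using an assumed synthetic dataset.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Run one set of variables (one row) through every decision tree and average;
# this matches what rf.predict does internally for regression
row = X[:1]
per_tree = np.array([tree.predict(row)[0] for tree in rf.estimators_])
print("Mean of per-tree predictions:", per_tree.mean())
print("rf.predict result:", rf.predict(row)[0])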

Working of Random Forest

• The random forest algorithm is made up of a collection of decision trees, and each tree in the ensemble is built from a data sample drawn from the training set with replacement, called the bootstrap sample.
• Of that training sample, about one-third is set aside as test data, known as the out-of-bag (OOB) sample, which we'll come back to later.
• Another instance of randomness is then injected through feature bagging (sampling), adding more diversity to the dataset and reducing the correlation among decision trees.
• Depending on the type of problem, the determination of the prediction will vary. For a regression task, the outputs of the individual decision trees are averaged, and for a classification task, a majority vote (i.e. the most frequent categorical variable) yields the predicted class.
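The out-of-bag rows mentioned above can serve as a built-in validation set; a minimal sketch (with an assumed synthetic dataset) follows.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True scores every tree on the rows left out of its bootstrap sample
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print("Out-of-bag accuracy estimate:", rf.oob_score_)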

Key Benefits

• Reduced risk of overfitting: Decision trees run the risk of overfitting as they tend to tightly fit all the samples within the training data. However, when there is a robust number of decision trees in a random forest, the classifier won't overfit the model, since the averaging of uncorrelated trees lowers the overall variance and prediction error.
• Provides flexibility: Since random forest can handle both regression and classification tasks with a high degree of accuracy, it is a popular method among data scientists. Feature bagging also makes the random forest classifier an effective tool for estimating missing values, as it maintains accuracy when a portion of the data is missing.
• Easy to determine feature importance: Random forest makes it easy to evaluate variable importance, or contribution, to the model. There are a few ways to evaluate feature importance. Gini importance and mean decrease in impurity (MDI) are usually used to measure how much the model's accuracy decreases when a given variable is excluded. Permutation importance, also known as mean decrease in accuracy (MDA), is another importance measure; MDA identifies the average decrease in accuracy from randomly permuting the feature values in OOB samples.
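Both importance measures are available in common libraries; the sketch below uses scikit-learn's feature_importances_ (MDI) and permutation_importance, with an assumed dataset and a held-out split rather than OOB samples.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Mean decrease in impurity (Gini importance), computed from the trees themselves
mdi = rf.feature_importances_

# Permutation importance (mean decrease in accuracy) on held-out data
perm = permutation_importance(rf, X_test, y_test, n_repeats=5, random_state=0)

top = mdi.argmax()
print("Most important feature (MDI):", data.feature_names[top])
print("Its permutation importance:", perm.importances_mean[top])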
Random Forest Applications

• Finance: It is a preferred algorithm over others as it reduces time spent on data management and pre-processing tasks. It can be used to evaluate customers with high credit risk, to detect fraud, and for option pricing problems.

• Healthcare: The random forest algorithm has applications within computational biology, allowing doctors to tackle problems such as gene expression classification, biomarker discovery, and sequence annotation. As a result, doctors can make estimates around drug responses to specific medications.

• E-commerce: It can be used in recommendation engines for cross-sell purposes.

Random Forest Hyperparameters

• Num Estimators
• Max Features
• Bootstrap
• Max Samples
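In scikit-learn these four knobs correspond to n_estimators, max_features, bootstrap, and max_samples; the example below is a sketch, and the values shown are arbitrary illustrations rather than recommendations.

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=300,     # Num Estimators: number of trees in the forest
    max_features="sqrt",  # Max Features: features considered at each split
    bootstrap=True,       # Bootstrap: sample rows with replacement for each tree
    max_samples=0.8,      # Max Samples: fraction of rows drawn per tree (requires bootstrap=True)
    random_state=0,
)
# rf.fit(X, y) would then train it on any labelled dataset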

Thank you
