This document discusses key concepts for building decision trees such as entropy, information gain, and selecting attributes. It provides examples of calculating the entropy of nodes to measure impurity before and after splitting on attributes. The document explains that the attribute with the highest information gain, or the greatest decrease in entropy, should be used at each split in the tree to reach the purest leaves.
Instructor: Abinta Mehmood

Outline
• Entropy
• Information Gain
• Selecting the attributes
• Building DT

Impurity and Entropy

Entropy is used to measure the homogeneity of the samples in a node. If the samples are completely homogeneous, the entropy is zero; if the samples are equally divided between the two classes, the entropy is one. You can easily calculate the entropy of a node from the frequency table of the attribute using the entropy formula

Entropy = -p(A) log2 p(A) - p(B) log2 p(B)

where p is the proportion of a category, such as drug A or drug B.

First, consider the entropy of the dataset before splitting. Plugging the class frequencies of the target attribute into the entropy formula gives the impurity of the target before splitting; in this case it is 0.940.

Now consider splitting on the cholesterol attribute. When cholesterol is normal, we have six samples for drug B and two for drug A; calculating the entropy of this node from the distribution of drug A and drug B gives about 0.8. When cholesterol is high, the data is split into three for drug B and three for drug A, and calculating its entropy, we can see it would be 1.0.

Information Gain

Now, the question is: between the cholesterol and sex attributes, which one is the better choice? The answer is the one that produces the tree with the higher information gain after splitting. We can think of information gain and entropy as opposites: as entropy, the amount of randomness, decreases, the information gain, the amount of certainty, increases, and vice versa. So, constructing a decision tree is all about finding the attributes that return the highest information gain.

Now, what is the next attribute after branching by the sex attribute? Well, as you can guess, we should repeat the process for each branch and test each of the remaining attributes to continue toward the purest leaves. This is the way you build a decision tree.
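The calculations above can be reproduced with a few lines of Python. Below is a minimal sketch of the entropy formula, assuming a binary drug A / drug B target; note that the 9-versus-5 class counts used for the pre-split node are an assumption inferred from the 0.940 figure, since the slide's frequency table is not reproduced here.

```python
import math

def entropy(counts):
    """Entropy (base 2) of a node, given the class counts at that node."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # a class with zero samples contributes nothing
        p = count / total  # proportion of a category, e.g. drug A or drug B
        result -= p * math.log2(p)
    return result

# Pre-split node: the 9 vs. 5 split of 14 samples is an assumption,
# chosen because it reproduces the 0.940 stated above.
print(entropy([9, 5]))   # ~0.940

print(entropy([6, 2]))   # cholesterol = normal: ~0.811 (about 0.8)
print(entropy([3, 3]))   # cholesterol = high: 1.0
print(entropy([8, 0]))   # completely homogeneous node: 0.0
```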
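Information gain for a candidate split can then be computed by subtracting the weighted average entropy of the child nodes from the entropy of the parent, reusing the entropy() function from the sketch above. With the assumed 9/5 parent counts, the gain for the cholesterol split works out to about 0.048.

```python
def information_gain(parent_counts, branch_counts):
    """Gain of a split: parent entropy minus the weighted average
    entropy of the child nodes (weights = relative branch sizes)."""
    total = sum(parent_counts)
    weighted = sum(sum(branch) / total * entropy(branch)
                   for branch in branch_counts)
    return entropy(parent_counts) - weighted

# Cholesterol split: normal -> [6 B, 2 A], high -> [3 B, 3 A]
gain_cholesterol = information_gain([9, 5], [[6, 2], [3, 3]])
print(round(gain_cholesterol, 3))  # ~0.048 = 0.940 - (8/14)*0.811 - (6/14)*1.0
```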
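Finally, selecting the attribute at each node is just a comparison of gains. The sketch below reuses information_gain() from above; the per-branch counts for the sex attribute are hypothetical, chosen only for illustration so that sex wins, matching the text's choice to branch on sex first.

```python
def best_attribute(parent_counts, candidate_splits):
    """Return the attribute whose split gives the highest information gain.

    candidate_splits maps an attribute name to its per-branch class counts.
    """
    return max(candidate_splits,
               key=lambda attr: information_gain(parent_counts,
                                                 candidate_splits[attr]))

candidates = {
    "cholesterol": [[6, 2], [3, 3]],  # from the example above, gain ~0.048
    "sex":         [[3, 4], [6, 1]],  # hypothetical counts, gain ~0.151
}
print(best_attribute([9, 5], candidates))  # -> "sex"
```

Building the full tree then means applying best_attribute recursively: split on the winning attribute, and repeat on each branch with the remaining attributes until the leaves are pure (entropy zero) or no attributes are left.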