
Decision Tree:

Can work on both classification and regression problems.

Supervised Learning algorithm

ID3 / C4.5 -> Information Gain (entropy-based)

CART -> Gini Impurity.

Information Gain:

Entropy: E = -Σ p_i * log2(p_i)

For a Yes/No target: E = -p(Yes)*log2(p(Yes)) - p(No)*log2(p(No))  => log is of base 2


Information Gain = Entropy(total dataset) - Σ (weight of subset) * Entropy(subset), where each subset is created by splitting on the candidate feature and its weight is the subset size divided by the total number of rows.
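
Not from the notes: a minimal Python sketch of the two formulas above, assuming the class labels are passed as plain lists of values.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E = -Σ p_i * log2(p_i) over the classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    """IG = E(parent) - Σ (subset size / parent size) * E(subset)."""
    n = len(parent_labels)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted
```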

Gini Impurity: Gini = 1 - Σ p_i^2
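
A matching sketch for Gini impurity, using the same list-of-labels convention as the entropy sketch above.

```python
from collections import Counter

def gini(labels):
    """Gini = 1 - Σ p_i^2 over the classes present in `labels`."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())
```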


Worked example: the weather / Play Tennis dataset (14 rows; target Play = 9 Yes, 5 No).

E(S) = -9/14*log2(9/14) - 5/14*log2(5/14) = 0.940

E(Outlook=Sunny) = -2/5*log2(2/5) - 3/5*log2(3/5) = 0.971

E(Outlook=Overcast) = -4/4*log2(4/4) - 0*log2(0) = 0

E(Outlook=Rainy) = -3/5*log2(3/5) - 2/5*log2(2/5) = 0.971

Weighted entropy after the split: E(S|Outlook) = 5/14*E(Outlook=Sunny) + 4/14*E(Outlook=Overcast) + 5/14*E(Outlook=Rainy)

= 5/14*0.971 + 4/14*0 + 5/14*0.971 = 0.693

Information Gain if we split on the Outlook column: IG(Outlook) = E(S) - E(S|Outlook) = 0.94 - 0.693 = 0.247
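
The Outlook calculation can be reproduced with the entropy/information_gain sketch above; the class counts per Outlook value are read off the fractions in the calculation (Sunny 2 Yes / 3 No, Overcast 4 Yes, Rainy 3 Yes / 2 No).

```python
sunny    = ["Yes"] * 2 + ["No"] * 3    # E = 0.971
overcast = ["Yes"] * 4                 # E = 0
rainy    = ["Yes"] * 3 + ["No"] * 2    # E = 0.971
parent   = sunny + overcast + rainy    # 14 rows: 9 Yes, 5 No -> E(S) = 0.940

print(round(information_gain(parent, [sunny, overcast, rainy]), 3))   # 0.247
```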

Similarly, we calculate the Information Gain for splits on the Temp, Humidity and Windy columns.

We split on the column with the highest Information Gain. Within that split, the branch with the lowest Entropy is the purest; here Outlook == Overcast has Entropy 0, so its rows can be decided immediately (all Yes).
Entropy means impurity: 0 is a pure node, higher values mean more mixed classes.
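
A tiny sketch of that selection step. Only the Outlook value comes from the calculation above; the gains for the other columns are illustrative placeholders, since the notes do not compute them.

```python
gains = {"Outlook": 0.247, "Temp": 0.029, "Humidity": 0.152, "Windy": 0.048}
best_column = max(gains, key=gains.get)
print(best_column)   # Outlook -> the first split is on Outlook
```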

For a continuous variable we create buckets (bins), e.g. an Age column (a small binning sketch follows this list):

Small Age: <12

Teenage: 13-19

Jr Adult: 20-30

Middle Aged: 30-60

Sr. Citizen: 60-80

Super Sr. Citizen: >80
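
As mentioned above, a minimal binning sketch with pandas; the exact bin edges are an assumption (pd.cut bins are right-inclusive by default).

```python
import pandas as pd

ages = pd.Series([5, 16, 24, 45, 67, 90])
buckets = pd.cut(
    ages,
    bins=[0, 12, 19, 30, 60, 80, 120],
    labels=["Small Age", "Teenage", "Jr Adult", "Middle Aged",
            "Sr. Citizen", "Super Sr. Citizen"],
)
print(buckets.tolist())
# ['Small Age', 'Teenage', 'Jr Adult', 'Middle Aged', 'Sr. Citizen', 'Super Sr. Citizen']
```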

Gini for a pure node (e.g. 50 samples, all of one class): Gini = 1 - Σ p_i^2 = 1 - (50/50)^2 - (0/50)^2 - (0/50)^2 = 1 - 1 = 0
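
Checking the pure-node example with the gini() sketch above, plus a 50/50 split for contrast.

```python
pure_node  = ["Yes"] * 50
mixed_node = ["Yes"] * 25 + ["No"] * 25

print(gini(pure_node))    # 0.0 -> completely pure node
print(gini(mixed_node))   # 0.5 -> maximally mixed for two classes
```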

Overfitting: training accuracy is high but test accuracy is low.

 The model is very complex and unable to generalize.

Underfitting: both training and test accuracy are low.

 The model is too simple.
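
Not from the notes: a small scikit-learn sketch (synthetic data with some label noise) that illustrates both failure modes by varying max_depth.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, None):   # None lets the tree grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 2), round(tree.score(X_te, y_te), 2))

# Typically: the depth-1 stump scores low on both sets (underfitting), while the
# unlimited-depth tree scores ~1.0 on train but noticeably lower on test (overfitting).
```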
