
Decision Trees

Arun Kumar

IIT Ropar

Outline

1 Elements of Information Theory

2 Decision Tree Classification for Categorical Data

3 Decision Tree Regression

History

• Information theory was introduced in 1948 by Shannon.

• The theory came into existence in connection with the problem of transmitting information along communication channels.

• “Information” in itself is a very general, qualitative, subjective and not very precise concept.

• However, information theory has developed into a quantitative, precise, objective and very useful theory.

Shannon’s Information

• Let the PMF (p1, . . . , pn) of the random variable X be given.

• The question posed by Shannon is the following: “Can we find a measure of how uncertain we are of the outcome?”

• Shannon then assumed that if such a function, denoted H(p1, · · · , pn), exists, it is reasonable to expect that it will have the following properties.
  a H should be continuous in all the pi.
  b If all the pi are equal, i.e., pi = 1/n, then H should have a maximum value, and this maximum value should be a monotonically increasing function of n.
  c If a choice is broken down into successive choices, the quantity H should be the weighted sum of the individual values of H.

• The entropy function is defined by

$$H(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log(p_i).$$

The sketch below checks property (b) numerically.
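A minimal sketch in Python (the helper name `entropy` is illustrative, not from the slides): for the uniform PMF pi = 1/n, H equals log2(n), which grows monotonically with n, exactly as property (b) requires.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p1, ..., pn) = -sum_i p_i log2(p_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # by convention, 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

# Property (b): for the uniform PMF p_i = 1/n, H = log2(n),
# which increases monotonically with n.
for n in (2, 4, 8):
    print(n, entropy(np.full(n, 1.0 / n)))   # 1.0, 2.0, 3.0
```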

Measure of Impurity

Entropy
Entropy for a set S is given by

$$H(S) = -\sum_{c \in C} p(c)\,\log_2 p(c),$$

where C is the set of classes in S and p(c) is the proportion of class c in S.

Gini
Gini impurity for a set S, where the target variable takes N different labels, is given by

$$\mathrm{Gini}(S) = \sum_{i \neq j} p(i)\,p(j) = 1 - \sum_{i=1}^{N} p(i)^2,$$

where p(i) is the proportion of label i in the set S.
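Both measures can be computed directly from class proportions. A minimal sketch, assuming the labels are given as a plain Python list (the 9-versus-5 toy split below is illustrative):

```python
from collections import Counter
import math

def impurities(labels):
    """Return (entropy, Gini impurity) of a list of class labels."""
    n = len(labels)
    props = [count / n for count in Counter(labels).values()]
    entropy = -sum(p * math.log2(p) for p in props)
    gini = 1.0 - sum(p * p for p in props)
    return entropy, gini

# Toy set: 9 positive and 5 negative examples
labels = ["yes"] * 9 + ["no"] * 5
print(impurities(labels))   # approx. (0.940, 0.459)
```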

Decision Tree Introduction

• The decision tree algorithm belongs to the family of supervised learning algorithms.

• Decision trees can be used for solving classification as well as regression problems.

• Decision trees classify instances by sorting them down the tree from the root node to some leaf node; the leaf node assigns the class label (see the sketch after this list).

• Each internal node of the tree specifies a test of some attribute of the instance, and each branch emanating from that node corresponds to one of the possible values of that attribute.
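A minimal sketch of this sorting-down process, assuming a nested-dict tree representation (the tree and attribute names below are hypothetical, loosely following the weather example referenced later):

```python
# Internal nodes test one attribute; branches are attribute values;
# leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "branches": {"weak": "yes", "strong": "no"}},
    },
}

def classify(node, instance):
    """Sort an instance down the tree until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```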

Sample Decision Tree

Algorithms to build decision trees

• ID3 (Iterative Dichotomiser 3): entropy and information gain as metrics

• CART (Classification and Regression Trees): Gini index as metric

• Decision tree regression: recursive binary splitting with RSS as the metric

• Others

ID3 Algorithm based on Weather Data

Based on “Machine Learning” by T. Mitchell, Ch. 3.
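The slide's worked table is not reproduced here. As a minimal sketch of the quantity ID3 optimizes, assuming the weather data is encoded as a list of dicts with a "play" target (hypothetical encoding; the full recursive tree construction is omitted):

```python
from collections import Counter
import math

def entropy_of(rows, target="play"):
    """Entropy of the target-label distribution over the given rows."""
    n = len(rows)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(r[target] for r in rows).values())

def information_gain(rows, attribute, target="play"):
    """Gain(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    n = len(rows)
    gain = entropy_of(rows, target)
    for v in set(r[attribute] for r in rows):
        subset = [r for r in rows if r[attribute] == v]
        gain -= (len(subset) / n) * entropy_of(subset, target)
    return gain

# ID3 greedily splits on the attribute with the highest information gain,
# then recurses on each branch until the subsets are pure.
rows = [{"outlook": "sunny", "play": "no"},
        {"outlook": "sunny", "play": "no"},
        {"outlook": "overcast", "play": "yes"},
        {"outlook": "rain", "play": "yes"}]
print(information_gain(rows, "outlook"))   # 1.0 for this toy split
```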
Final Decision Tree

Recursive Binary Tree Splitting Algorithm

• Suppose there are p predictors.

• We find the predictor Xj and the cutpoint s such that splitting the predictor space into the regions {X | Xj < s} and {X | Xj ≥ s} leads to the greatest reduction in the Residual Sum of Squares (RSS).

• In detail, for any j and s, define the pair of half-planes R1(j, s) = {X | Xj < s} and R2(j, s) = {X | Xj ≥ s}, and search for the pair (j, s) that minimizes the RSS given by

$$\sum_{i:\, x_i \in R_1(j,s)} (y_i - \hat{y}_{R_1})^2 + \sum_{i:\, x_i \in R_2(j,s)} (y_i - \hat{y}_{R_2})^2,$$

where ŷR1 is the mean response for the training observations in R1(j, s), and ŷR2 is the mean response for the training observations in R2(j, s).
Based on “An Introduction to Statistical Learning with Applications in R”, Chapter 8, Page 306.
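A minimal sketch of this exhaustive search, assuming numeric features in a NumPy array (the helper name `best_split` and the synthetic data are illustrative):

```python
import numpy as np

def best_split(X, y):
    """Search every (predictor j, cutpoint s) pair and return the one that
    minimizes RSS over the half-planes R1 = {Xj < s} and R2 = {Xj >= s}."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

# Synthetic data with a step in predictor 0 at 0.5
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + rng.normal(scale=0.1, size=50)
print(best_split(X, y))   # recovers predictor 0 with a cutpoint near 0.5
```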
Data

Final Decision Tree Based On Python Sklearn
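The fitted tree from the original slide is not reproduced here. A minimal sketch of fitting and printing such a tree with scikit-learn, on hypothetical synthetic data (the slide's actual dataset is not shown):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical one-dimensional regression data
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=100)

# Each split is chosen by recursive binary splitting to minimize RSS
# (criterion="squared_error" is the default); max_depth limits the tree.
model = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(model, feature_names=["x0"]))
```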

References

• Beazley, D. M. (2009). Python: Essential Reference (4th ed.). Pearson Education, Inc.

• James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer New York.

• Mitchell, T. M. (2017). Machine Learning. McGraw Hill Education.

• https://www.superdatascience.com/

