Module 12: Machine Learning
Version 1 CSE IIT, Kharagpur
Lesson 36: Rule Induction and Decision Tree - II
Splitting Functions
Which attribute is the best one to split the data on? Let us recall some definitions from
information theory.
[Figure: candidate split points t1, t2, t3 on a continuous attribute partition the sample S into subsets.]

Example: the attribute humidity divides the sample into three subsets:

S1 = {2P, 2NP}
S2 = {5P, 0NP}
S3 = {2P, 3NP}

H(S1) = 1
H(S2) = 0
H(S3) = -(2/5)log2(2/5) - (3/5)log2(3/5)
The quality of a split is measured by the information gain:

Gain(S, A) = H(S) - Σ_v (|Sv|/|S|) H(Sv)

where H(S) is the entropy of all examples, H(Sv) is the entropy of the subsample obtained by
partitioning S on value v of attribute A, and the sum runs over all possible values of A.
[Figure: the split at t1 partitions the sample S (9P, 5NP) into S1 and S2.]

H(S) = -(9/14)log2(9/14) - (5/14)log2(5/14)
H(S1) = 0
H(S2) = -(3/8)log2(3/8) - (5/8)log2(5/8)
|S1|/|S| = 6/14
|S2|/|S| = 8/14
Order the splits (attribute and value of the attribute) in decreasing order of
information gain.
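
The gain values above can be checked with a short program. The following is a minimal Python
sketch (illustrative code, not part of the original notes; the names entropy and
information_gain are our own):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits, of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    # Gain(S, A) = H(S) - sum over v of (|Sv|/|S|) * H(Sv)
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# Humidity example from above: S1 = {2P, 2NP}, S2 = {5P, 0NP}, S3 = {2P, 3NP}.
S1 = ["P"] * 2 + ["NP"] * 2
S2 = ["P"] * 5
S3 = ["P"] * 2 + ["NP"] * 3
print(information_gain(S1 + S2 + S3, [S1, S2, S3]))  # ~0.307

# Split at t1: since H(S1) = 0, |S1| = 6 and only 5 NP exist in all of S,
# S1 must be pure positive (6P) and S2 = {3P, 5NP}.
T1 = ["P"] * 6
T2 = ["P"] * 3 + ["NP"] * 5
print(information_gain(T1 + T2, [T1, T2]))  # ~0.395

The split at t1 therefore has a higher information gain than the humidity split and would be
ordered before it.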
The depth of the tree is related to its generalization capability. If the depth is not chosen
carefully, the tree may overfit the training data.
A tree overfits the data if we let it grow deep enough so that it begins to capture
“aberrations” in the data that harm the predictive power on unseen examples:
[Figure: a few examples, possibly just noise, cause the tree to be grown larger in order to
capture them.]
There are two standard ways to address this:

1) Stop growing the tree early, before it begins to overfit the data.

2) Grow the tree until the algorithm stops, even if the overfitting problem shows up, and then
prune the tree as a post-processing step.

The second method has found great popularity in the machine learning community.
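
One widely used post-processing technique is reduced-error pruning. Below is a minimal Python
sketch under assumed conventions (the dict-based tree representation and all function names are
illustrative, not the notes' own code): a leaf is a class label, and each internal node stores
its test attribute, one child per attribute value, and the majority class of the training
examples that reached it.

def classify(tree, x):
    # A leaf is a class label string; an internal node is a dict.
    while isinstance(tree, dict):
        # Unseen values (and collapsed nodes) fall back to the majority class.
        tree = tree["children"].get(x[tree["attr"]], tree["majority"])
    return tree

def accuracy(tree, examples):
    # Fraction of examples (dicts with a "label" key) classified correctly.
    return sum(classify(tree, x) == x["label"] for x in examples) / len(examples)

def prune(node, root, validation):
    # Bottom-up: prune the children first, then try collapsing this node
    # to its majority-class leaf, keeping the change only if accuracy on
    # the held-out validation set does not decrease.
    if not isinstance(node, dict):
        return
    for child in node["children"].values():
        prune(child, root, validation)
    before = accuracy(root, validation)
    saved, node["children"] = node["children"], {}  # node now acts as a leaf
    if accuracy(root, validation) < before:         # pruning hurt: undo it
        node["children"] = saved

# Usage: prune(tree, tree, validation_examples)

Because the check uses a validation set that was held out of training, subtrees that only
capture noise in the training data tend to be collapsed, while genuinely predictive subtrees
are kept.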