
Session_2a

Introduction to Decision Tree Algorithms for Classification

Introduction

Classification involves assigning unseen data records to pre-known classes or groups.

It is an important activity from an analytical point of view.

Various algorithms have been developed for this purpose since the 1980s.

The fundamental idea of information gain, from information theory, is utilized in one way or another by all established algorithms.

Our focus is to understand information gain conceptually and then utilize it.

Decision Tree Induction

Involves two stages:

Stage 1: Constructing a classification model.

Stage 2: Applying the model to classify data records whose classes are unknown.

Varied terminology among practitioners has led to confusion; however, the terms training set and testing set are fairly standard across the literature. Some authors also use the term validation set.

There is no consensus on how to divide the dataset, but 2/3 and 1/3 is the classical approach; a more rigorous approach is 1/2 and 1/2.
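As a minimal illustration of such a split (a sketch assuming the records sit in a plain Python list; the shuffling, ratio, and function name are my own choices, not prescribed by the slides):

```python
import random

def split_dataset(records, train_fraction=2/3, seed=42):
    """Shuffle the records and split them into a training set and a testing set."""
    shuffled = records[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example: 14 records split into 9 for training and 5 for testing.
train, test = split_dataset(list(range(1, 15)))
print(len(train), len(test))  # 9 5
```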

Components of a Decision Tree

Leaf nodes: represent a class label.

Internal nodes: carry the name of an attribute.

Links: the link from a parent node to a child node represents a value of the parent node's attribute.

[Figure: a small decision tree showing a root node, an internal node below it, and two leaf nodes.]
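These components can be sketched as a small Python data structure (the class and field names are my own, for illustration only):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A decision-tree node: either a leaf carrying a class label, or an
    internal node carrying an attribute name with one child per attribute
    value (the labeled links)."""
    label: str | None = None       # class label; set only on leaf nodes
    attribute: str | None = None   # attribute name; set only on internal nodes
    children: dict = field(default_factory=dict)  # attribute value -> child Node

    def is_leaf(self) -> bool:
        return self.label is not None
```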

Characteristics of Decision Trees

More than one decision tree can be constructed from the same data.

The structure of the decision tree impacts its performance: classification involves moving from the root to a leaf, and this usually requires a test at every internal node along the way, so the breadth and depth of the tree matter.

Accuracy of classification is an important parameter, and the accuracy required usually depends on the final application.

Constructing a Decision Tree

A lot of work has been done in this area over the last 2-3 decades.

Most methods use the following process:

If the training set is empty, create a leaf node and label it NULL; there is nothing from which to determine the class outcome, so the class is unknown.

If all examples in the training set are of the same class, create a leaf node and label it with that class label.

If the examples in the training set are from different classes, the following operations need to be performed:

Select an attribute to be the root of the current tree.

Partition the current training set into subsets according to the values of the chosen attribute (see the sketch after this list).

Construct a subtree for each subset.

Create a link from the root of the current tree to the root of each subtree, and label the link with the value of the root attribute that separates that subset from the others.
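A minimal sketch of the partition step, assuming each record is a Python dict keyed by attribute name (the function name is my own):

```python
from collections import defaultdict

def partition(records, attribute):
    """Group records into subsets by their value for the chosen attribute."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[attribute]].append(record)
    return dict(subsets)

# Example: partitioning by Outlook groups records into Sunny/Rain subsets.
rows = [{"Outlook": "Sunny", "Class": "N"}, {"Outlook": "Rain", "Class": "P"}]
print(partition(rows, "Outlook"))
```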

Information Gain

The idea originates from information theory and probability theory: to quantify the amount of information conveyed when random events occur.

An information system $S$ is a system around a sample space comprising a set of events $E_1, E_2, \ldots, E_n$ with associated probabilities of occurrence $P(E_1), P(E_2), \ldots, P(E_n)$.

If $M$ is the size of the sample space and $N_k$ is the number of outcomes that convey the event $E_k$, then the probability that $E_k$ occurs is $P(E_k) = N_k / M$.

Each attribute can be considered as an information system.

For a system $S$, the self-information of an event $E_k$ of $S$ is defined as

$I(E_k) = \log_q(1 / P(E_k)) = -\log_q P(E_k)$
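A minimal Python sketch of self-information (using base 2, so the unit is bits; the function name is my own):

```python
import math

def self_information(p, base=2):
    """I(E) = -log_q P(E); by convention an impossible event contributes 0."""
    return 0.0 if p == 0 else -math.log(p, base)

# A certain event conveys no information; a rare one conveys a lot.
for p in (1.0, 0.9, 0.5, 0.01):
    print(p, round(self_information(p), 3))  # 0.0, 0.152, 1.0, 6.644
```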

Some points to ponder..

When $P(E_k) = 0$, $I(E_k)$ is set to 0.

The base $q$ of the logarithm defines the unit of measurement for the amount of information: if the base is 2, the unit is bits; if the base is 10, it is digits.

If $E_k$ always happens, $P(E_k) = 1$ and $I(E_k) = 0$: a fact that always holds conveys no information.

If $E_k$ occurs frequently, $P(E_k) \approx 1$ and $I(E_k)$ is close to 0, so it conveys little information.

If $E_k$ is rare, $P(E_k) \approx 0$ and $I(E_k)$ is very large, conveying a large amount of information.

If $E_k$ never occurs, $I(E_k)$ would be infinite, which is why it is forced to 0.

Shannon's Entropy

Based on the self-information of individual events, the average information of the whole information system $S$ is defined as the weighted sum of the self-information of all events in $S$:

$H(S) = \sum_{k} P(E_k) \, I(E_k) = -\sum_{k} P(E_k) \log_q P(E_k)$

Given two information systems $S_1$ and $S_2$, the conditional self-information of event $E_k$ of $S_1$, given that event $F_j$ of $S_2$ has occurred, is defined as

$I(E_k \mid F_j) = -\log_q P(E_k \mid F_j) = -\log_q \dfrac{P(E_k \cap F_j)}{P(F_j)}$

The average conditional information (expected information) of a system $S_1$ of $n$ events in the presence of a system $S_2$ of $m$ events is the weighted sum of the conditional self-information over all pairs of events in $S_1$ and $S_2$:

$H(S_1 \mid S_2) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(E_i \cap F_j) \, I(E_i \mid F_j)$
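A minimal Python sketch of both quantities, assuming the joint distribution is supplied as a dict mapping (event, condition) pairs to probabilities (all names here are my own):

```python
import math
from collections import defaultdict

def entropy(probs, base=2):
    """H(S) = -sum_k P(Ek) * log_q P(Ek), skipping zero-probability events."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def conditional_entropy(joint, base=2):
    """H(S1|S2), where joint[(e, f)] = P(E and F)."""
    p_f = defaultdict(float)               # marginal distribution of S2
    for (e, f), p in joint.items():
        p_f[f] += p
    return -sum(p * math.log(p / p_f[f], base)
                for (e, f), p in joint.items() if p > 0)

print(entropy([0.5, 0.5]))  # a fair coin carries exactly 1.0 bit
```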

When a decision tree is being constructed, two information systems are present: the attribute $A$ and the Class.

$H(\text{Class})$ represents the average information of the class system before an attribute is chosen for the root.

$H(\text{Class} \mid A)$ represents the expected information of the class system after attribute $A$ is chosen as the root.

The information gain over attribute $A$ is given by $G(A) = H(\text{Class}) - H(\text{Class} \mid A)$.

$G(A)$ represents the reduction in uncertainty.

Representative Data Set

Sr No  Outlook   Temperature  Humidity  Windy  Class
 1     Sunny     Hot          High      False  N
 2     Sunny     Hot          High      True   N
 3     Overcast  Hot          High      False  P
 4     Rain      Mild         High      False  P
 5     Rain      Cool         Normal    False  P
 6     Rain      Cool         Normal    True   N
 7     Overcast  Cool         Normal    True   P
 8     Sunny     Mild         High      False  N
 9     Sunny     Cool         Normal    False  P
10     Rain      Mild         Normal    False  P
11     Sunny     Mild         Normal    True   P
12     Overcast  Mild         High      True   P
13     Overcast  Hot          Normal    False  P
14     Rain      Mild         High      True   N

Calculate!!!

$I(\text{Class}=P) = -\log_2 P(\text{Class}=P) = -\log_2(9/14) = 0.637$ bits

$H(\text{Class}) = -P(\text{Class}=P)\log_2 P(\text{Class}=P) - P(\text{Class}=N)\log_2 P(\text{Class}=N)$
$= -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.94$ bits

Calculate

$H(\text{Class} \mid A) = \sum_{i=1}^{v} \dfrac{p_i + n_i}{p + n} \, H(\text{Class} \mid A = a_i)$

where $(p_i + n_i)/(p + n)$ is the probability that attribute $A = a_i$.

$H(\text{Class} \mid \text{Outlook}) = \frac{5}{14} H(\text{Class} \mid \text{Outlook}=\text{Sunny}) + \frac{4}{14} H(\text{Class} \mid \text{Outlook}=\text{Overcast}) + \frac{5}{14} H(\text{Class} \mid \text{Outlook}=\text{Rain}) = 0.694$ bits

$G(\text{Outlook}) = H(\text{Class}) - H(\text{Class} \mid \text{Outlook}) = 0.94 - 0.694 = 0.246$ bits

The corresponding values for the other attributes are $G(\text{Temperature}) = 0.029$, $G(\text{Humidity}) = 0.151$, and $G(\text{Windy}) = 0.048$.
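These numbers can be checked with a short Python script over the dataset above (a sketch; the helper names are my own, and small differences in the last digit come from rounding):

```python
import math
from collections import Counter

# The 14-record weather dataset from the table above:
# (Outlook, Temperature, Humidity, Windy, Class)
DATA = [
    ("Sunny","Hot","High","False","N"),      ("Sunny","Hot","High","True","N"),
    ("Overcast","Hot","High","False","P"),   ("Rain","Mild","High","False","P"),
    ("Rain","Cool","Normal","False","P"),    ("Rain","Cool","Normal","True","N"),
    ("Overcast","Cool","Normal","True","P"), ("Sunny","Mild","High","False","N"),
    ("Sunny","Cool","Normal","False","P"),   ("Rain","Mild","Normal","False","P"),
    ("Sunny","Mild","Normal","True","P"),    ("Overcast","Mild","High","True","P"),
    ("Overcast","Hot","Normal","False","P"), ("Rain","Mild","High","True","N"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Windy": 3}

def class_entropy(rows):
    """H(Class) over the given rows (the class label is the last field)."""
    counts = Counter(r[-1] for r in rows)
    total = sum(counts.values())
    return -sum(c/total * math.log2(c/total) for c in counts.values())

def gain(rows, attr):
    """G(A) = H(Class) - H(Class|A)."""
    idx = ATTRS[attr]
    h_cond = 0.0
    for v in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == v]
        h_cond += len(subset)/len(rows) * class_entropy(subset)
    return class_entropy(rows) - h_cond

print(round(class_entropy(DATA), 2))   # 0.94
for a in ATTRS:
    print(a, round(gain(DATA, a), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```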

Algorithm-ID3

Algorithm constructTreeID3 (C: training set): decision tree;
begin
  Tree := empty tree initially;
  if C is empty then
    Tree := a leaf node labeled NULL;
    return (Tree)
  else
    if C contains examples of one class only then
      Tree := a leaf node labeled with that class tag
    else
      for every attribute Ai (1 <= i <= p) do
        calculate the information gain Gain(Ai)
      endfor;
      select the attribute A where Gain(A) = max(Gain(A1), Gain(A2), ..., Gain(Ap));
      Tree := an internal node labeled A;
      partition C into subsets C1, C2, ..., Cw by the values of A;
      for each Ci (1 <= i <= w) do
        ti := constructTreeID3(Ci);
        attach ti to Tree, labeling the link from A to the root of ti
        with the value of A that produced Ci
      endfor;
    endif;
  endif;
  return (Tree);
end;
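A runnable Python sketch of the same procedure (a minimal implementation under my own names, not the textbook's code; records are dicts with a "Class" key, and the tree is returned as nested dicts):

```python
import math
from collections import Counter

def class_entropy(rows):
    counts = Counter(r["Class"] for r in rows)
    total = sum(counts.values())
    return -sum(c/total * math.log2(c/total) for c in counts.values())

def gain(rows, attr):
    h_cond = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        h_cond += len(subset)/len(rows) * class_entropy(subset)
    return class_entropy(rows) - h_cond

def construct_tree_id3(rows, attributes):
    if not rows:                               # empty training set -> NULL leaf
        return "NULL"
    if len({r["Class"] for r in rows}) == 1 or not attributes:
        # pure subset (or no attributes left): leaf labeled with majority class
        return Counter(r["Class"] for r in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a))  # attribute with max gain
    remaining = [a for a in attributes if a != best]
    return {best: {v: construct_tree_id3([r for r in rows if r[best] == v], remaining)
                   for v in {r[best] for r in rows}}}    # one labeled link per value
```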

Subsets partitioned from the original training set

Outlook = Sunny

Temperature  Humidity  Windy  Class
Hot          High      False  N
Hot          High      True   N
Mild         High      False  N
Cool         Normal    False  P
Mild         Normal    True   P

Outlook = Overcast

Temperature  Humidity  Windy  Class
Hot          High      False  P
Cool         Normal    True   P
Mild         High      True   P
Hot          Normal    False  P

Outlook = Rain

Temperature  Humidity  Windy  Class
Mild         High      False  P
Cool         Normal    False  P
Cool         Normal    True   N
Mild         Normal    False  P
Mild         High      True   N

Outlook has the highest information gain and so becomes the root of the tree.

The process is repeated for the remaining subsets. For the first subset, Humidity is identified as having the maximum information gain and forms the root of that subtree.

Its links are labeled High and Normal, the subset is partitioned further, and two leaf nodes are formed.

The second subset contains only one class, so a single leaf is formed.

For the third subset, Windy becomes the root of the subtree, and again two leaves are formed.
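In the nested-dict representation of the sketch above, the resulting tree described on these slides would be:

```python
{"Outlook": {"Sunny":    {"Humidity": {"High": "N", "Normal": "P"}},
             "Overcast": "P",
             "Rain":     {"Windy": {"True": "N", "False": "P"}}}}
```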

