
CART was developed by Breiman, Friedman, Olshen and Stone: Classification and Regression Trees. C4.5: A Machine Learning Approach, by Quinlan. An engineering approach by Sethi and Sarvarayudu.

Example

University of California: a study of patients after admission for a heart attack. 19 variables were collected during the first 24 hours for the 215 patients who survived those 24 hours. Question: can the high-risk patients (those who will not survive 30 days) be identified?

Answer

Is the minimum systolic blood pressure over the first 24 hours > 91?

Is age>62.5?

Features of CART

Binary splits
Splits based on only one variable

Selection of the splits
Deciding when a node is a terminal node (i.e. not to split it any further)
Assigning a class to each terminal node

Impurity of a Node

We need a measure of the impurity of a node to help decide how to split a node, or which node to split. The measure should be at a maximum when a node is equally divided amongst all classes, and the impurity should be zero if the node is all one class.

Measures of Impurity

Misclassification rate
Information, or entropy
Gini index
In practice the first is not used, for the following reasons:
- Situations can occur where no split improves the misclassification rate
- The misclassification rate can be equal for two splits when one option is clearly better for the next step

[Diagram: a node and two possible splits, with counts of classes A and B in each child. Neither split alone improves the misclassification rate, but together they give perfect classification!]

[Diagram: a node with 400 of A and 400 of B, and two candidate splits. One gives children (300 of A, 100 of B) and (100 of A, 300 of B); the other gives (200 of A, 400 of B) and (200 of A, 0 of B). Both have the same misclassification rate, but the second produces a pure node.]

[Plot: misclassification rate against p1, maximised at p1 = 1/2.]

Information

If a node has a proportion pj of each of the classes, then the information, or entropy, is:

i(p) = − Σj pj log pj

Gini Index

This is the most widely used measure of impurity (at least by CART). The Gini index is:

i(p) = Σ(i ≠ j) pi pj = 1 − Σj pj²

[Plot: misclassification rate, Gini index and information plotted against p1; all three are maximised at p1 = 0.5 and are zero for a pure node.]
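The three impurity measures can be sketched in a few lines of Python (an illustrative sketch, not part of the lecture's R material), for a two-class node with class-A proportion p:

```python
# Sketch: the three impurity measures for a two-class node with
# class-A proportion p (class-B proportion 1 - p).
import math

def misclassification(p):
    # proportion misclassified if we predict the majority class
    return min(p, 1 - p)

def entropy(p):
    # i(p) = -sum_j p_j log p_j, taking 0 log 0 as 0
    total = 0.0
    for q in (p, 1 - p):
        if q > 0:
            total -= q * math.log(q)
    return total

def gini(p):
    # i(p) = 1 - sum_j p_j^2
    return 1 - p**2 - (1 - p)**2

# All three are maximised at an even split and zero for a pure node.
print(misclassification(0.5), gini(0.5))  # 0.5 0.5
print(gini(0.0), entropy(0.0))            # 0.0 0.0
```

All three curves have the shape plotted above; the differences only matter in how they rank competing splits.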

Tree Impurity

We define the impurity of a tree to be the sum, over all terminal nodes, of the impurity of each node multiplied by the proportion of cases that reach that node of the tree. Example i): the impurity of a tree with one single node, with both A and B having 400 cases, using the Gini index: the proportions of the two classes are 0.5 each, therefore the Gini index = 1 − (0.5)² − (0.5)² = 0.5.

Class  Cases  Proportion  p²
A      400    0.5         0.25
B      400    0.5         0.25

Gini index = 1 − p²A − p²B = 0.5

Example ii): the candidate splits of this node. Splitting into (300 of A, 100 of B) and (100 of A, 300 of B) gives each child a Gini index of 1 − 0.75² − 0.25² = 0.375; each child receives half the cases, so the tree impurity is 0.375. Splitting instead into (200 of A, 400 of B) and (200 of A, 0 of B):

Node  A    B    pA   pB   Gini index                      Contrib. to tree
1     200  400  1/3  2/3  1 − 0.1111 − 0.4444 = 0.4444    0.4444 × 600/800 = 0.3333
2     200  0    1    0    0                               0

Tree impurity = 0.3333, down from 0.5.
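The tree-impurity calculation for the 400-A / 400-B example, and the comparison of the two candidate splits, can be checked with a short sketch (illustrative Python, with helper names of my own choosing):

```python
# Sketch: tree impurity (weighted Gini over terminal nodes) for the
# 400 A / 400 B example and its two candidate splits.

def gini(counts):
    # Gini index of a node given its class counts
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def tree_impurity(nodes):
    # sum over terminal nodes of impurity * proportion of cases reaching it
    total = sum(sum(node) for node in nodes)
    return sum(gini(node) * sum(node) / total for node in nodes)

root   = [(400, 400)]
split1 = [(300, 100), (100, 300)]
split2 = [(200, 400), (200, 0)]

print(tree_impurity(root))              # 0.5
print(tree_impurity(split1))            # 0.375
print(round(tree_impurity(split2), 4))  # 0.3333
```

Both splits have the same misclassification rate, but the second gives the lower tree impurity under the Gini index, which is why it is preferred.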

Selection of Splits

We select the split that most decreases the Gini index. This is done over all possible places for a split and all possible variables to split on. We keep splitting until the terminal nodes have very few cases or are all pure. This is an unsatisfactory answer to when to stop growing the tree, but it was realized that the best approach is to grow a larger tree than required and then to prune it!
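The greedy procedure just described can be sketched in Python (an illustrative toy, not the lecture's R code; function names and the min_size stopping rule are my own choices):

```python
# Sketch: grow a binary tree greedily, choosing at each node the
# (variable, threshold) pair that most decreases the Gini index,
# and stopping when a node is pure or has few cases.

def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    # try every variable and every observed value as a threshold c
    best, base = None, gini(labels)
    for var in range(len(rows[0])):
        for c in sorted({r[var] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[var] < c]
            right = [l for r, l in zip(rows, labels) if r[var] >= c]
            if not left or not right:
                continue
            after = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or base - after > best[0]:
                best = (base - after, var, c)
    return best

def grow(rows, labels, min_size=2):
    if len(labels) <= min_size or len(set(labels)) == 1:
        return max(set(labels), key=labels.count)   # terminal: majority class
    split = best_split(rows, labels)
    if split is None:
        return max(set(labels), key=labels.count)
    _, var, c = split
    left = [(r, l) for r, l in zip(rows, labels) if r[var] < c]
    right = [(r, l) for r, l in zip(rows, labels) if r[var] >= c]
    return (var, c,
            grow([r for r, _ in left], [l for _, l in left], min_size),
            grow([r for r, _ in right], [l for _, l in right], min_size))
```

On four one-variable cases with classes A, A, B, B, grow() finds the single split at x >= 3 and returns a tree with two pure leaves.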

Classifying A or B

[Scatter plot: the training cases, labelled A and B, plotted against x and y.]

Possible Splits

There are two possible variables to split on, and each of those can split for a range of values of c, i.e.: x < c or x ≥ c, and: y < c or y ≥ c.

[Spreadsheet: the (x, y) data with class labels (e.g. x = 2.61, y = 2.02, class A) and 0/1 indicator columns for A and B on each side of a candidate split value, etc.]

Top node: 50 of A, 50 of B (100 cases); pA = pB = 0.5, Gini index = 0.5.

After the split:

Node   A   B   pA    pB    Gini index  Contrib. to tree
Left   44  6   0.88  0.12  0.21        0.11
Right  7   43  0.14  0.86  0.24        0.12

Sum = 0.23, so the change in Gini index = 0.5 − 0.23 = 0.27.

Then use a Data Table to find the best value for a split.

[Plot: change in Gini index against split value on x; the best split gives a change of 0.27.]
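The spreadsheet calculation above can be reproduced as a short Python sketch (illustrative only; the lecture does this in Excel and R):

```python
# Sketch: change in Gini index for splitting the 50 A / 50 B node
# into children (44 A, 6 B) and (7 A, 43 B).

def gini(a, b):
    n = a + b
    return 1 - (a / n) ** 2 - (b / n) ** 2

top = gini(50, 50)                                   # 0.5
after = (50 / 100) * gini(44, 6) + (50 / 100) * gini(7, 43)

print(round(gini(44, 6), 2), round(gini(7, 43), 2))  # 0.21 0.24
print(round(top - after, 2))                         # 0.27
```

Repeating this for every candidate threshold c gives the curve of Gini-index change against split value.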

You'd now need to develop a series of spreadsheets to work out the next best split. This is easier in R!

[Tree plot: the fitted tree, splitting first on x < 2.808, then on y >= 2.343 and y >= 3.442.]

We need to load the package rpart, which contains the set of functions for CART. The call looks like: NNB.tree <- rpart(Type ~ ., NNB[, 1:2], cp = 1e-3). This takes the data in Type (which contains the classes for the data, i.e. A or B), and builds a model on all the variables indicated by "~ .". The data are in NNB[, 1:2], and cp is the complexity parameter (more to come about this).

[Scatter plot: the A and B cases plotted against x and y again.]

This is based on my own research. The aim is to tell automatically, from the data, which method of exponential smoothing is best to use. The variables used are the differences of the fits for three different methods (SES, Holt's and Damped Holt's methods), and the alpha, beta and phi estimated for the Damped Holt method.

[Tree plot: the full unpruned tree, with splits on Diff1, Diff2, alpha, beta and phi, and leaves labelled SES, Holt or DHolt.]

As I said earlier, it has been found that the best method of arriving at a suitable size for the tree is to grow an overly complex one and then prune it back. The pruning is based on the misclassification rate. However, the error rate will always drop (or at least not increase) with every split. This does not mean, however, that the error rate on test data will improve.

Misclassification Rates

[Plot: error rates against the size of the tree.]

The solution to this problem is cross-validation. One version of the method carries out a 10-fold cross-validation, where the data are divided at random into 10 subsets of equal size. The tree is then grown leaving out one of the subsets, and its performance is assessed on the subset left out. This is done for each of the 10 subsets, and the average performance is assessed.
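The 10-fold scheme can be sketched in Python (illustrative only; rpart does this internally, and the fit/error functions here are placeholders to be supplied by the caller):

```python
# Sketch: 10-fold cross-validation. The data are divided at random
# into 10 subsets; each in turn is held out while a model is fitted
# on the other 9, and the performance on the held-out subset is
# recorded and averaged.
import random

def ten_fold_indices(n, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # divide at random
    return [idx[k::10] for k in range(10)]  # 10 near-equal subsets

def cross_validate(cases, fit, error):
    folds = ten_fold_indices(len(cases))
    scores = []
    for k, held_out in enumerate(folds):
        train = [cases[i] for f, fold in enumerate(folds) if f != k for i in fold]
        test = [cases[i] for i in held_out]
        scores.append(error(fit(train), test))
    return sum(scores) / len(scores)        # average performance
```

Here fit() would grow the tree and error() would measure its misclassification rate on the held-out subset.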

This is all done by the command rpart, and the results can be accessed using printcp and plotcp. We can then use this information to decide how complex (determined by the size of cp) the tree needs to be. The possible rules are to minimise the cross-validation relative error (xerror), or to use the 1-SE rule, which takes the largest value of cp with an xerror within one standard deviation of the minimum. The latter is preferred by Breiman et al. and by B. D. Ripley, who has included it as a dashed line in the plotcp function.

> printcp(expsmooth.tree)
Classification tree:
rpart(formula = Model ~ Diff1 + Diff2 + alpha + beta + phi, data = expsmooth, cp = 0.001)
Variables actually used in tree construction:
[1] alpha beta Diff1 Diff2 phi
Root node error: 2000/3000 = 0.66667
n= 3000

     CP        nsplit rel error xerror  xstd
1    0.4790000  0     1.0000    1.0365  0.012655
2    0.2090000  1     0.5210    0.5245  0.013059
3    0.0080000  2     0.3120    0.3250  0.011282
4    0.0040000  4     0.2960    0.3050  0.011022
5    0.0035000  5     0.2920    0.3115  0.011109
6    0.0025000  8     0.2810    0.3120  0.011115
7    0.0022500  9     0.2785    0.3085  0.011069
8    0.0020000 13     0.2675    0.3105  0.011096
9    0.0017500 16     0.2615    0.3075  0.011056
10   0.0016667 20     0.2545    0.3105  0.011096
11   0.0012500 23     0.2495    0.3175  0.011187
12   0.0010000 25     0.2470    0.3195  0.011213
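Both selection rules can be applied to the printcp table above by hand; a Python sketch (illustrative only, since rpart and plotcp do this for you):

```python
# Sketch: applying the two pruning rules to the printcp table above.
cp     = [0.479, 0.209, 0.008, 0.004, 0.0035, 0.0025,
          0.00225, 0.002, 0.00175, 0.0016667, 0.00125, 0.001]
xerror = [1.0365, 0.5245, 0.3250, 0.3050, 0.3115, 0.3120,
          0.3085, 0.3105, 0.3075, 0.3105, 0.3175, 0.3195]
xstd   = [0.012655, 0.013059, 0.011282, 0.011022, 0.011109, 0.011115,
          0.011069, 0.011096, 0.011056, 0.011096, 0.011187, 0.011213]

# Rule 1: minimise the cross-validation relative error (xerror).
best = min(range(len(cp)), key=lambda i: xerror[i])

# Rule 2 (1-SE): largest cp whose xerror is within one standard
# deviation of the minimum.
threshold = xerror[best] + xstd[best]
one_se = max(c for c, xe in zip(cp, xerror) if xe <= threshold)

print(cp[best])  # 0.004
print(one_se)    # 0.004
```

On this particular table both rules land on the same row of the cp table; the xerror profile is so flat that many rows fall within one SE of the minimum, which is exactly the situation the 1-SE rule is designed for. (plotcp labels its axis with geometric means of adjacent cp values, which is where a quoted value like 0.003 comes from.)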

[Plot: relative error rate against the number of splits.]

This relative CV error tends to be very flat, which is why the 1-SE rule is preferred.

[plotcp output: cross-validated relative error plotted against cp (0.32 down to 0.0011) and size of tree (1 to 26), with the 1-SE threshold shown as a dashed line.]

[Pruned tree plot: splits on Diff2 >= 5.229, phi < 0.9732 and Diff2 >= 1.557, with leaves labelled DHolt and SES.]

This suggests that a cp of 0.003 is about right for this tree, giving the tree shown.

Cost complexity

Whilst we did not use the misclassification rate to decide where to split the tree, we do use it in the pruning. The key term is the relative error (which is normalised to one at the top of the tree). The standard approach is to choose a value of α, and then to choose the tree that minimises RE = R + α × size, where R is the number of misclassified points and the size of the tree is the number of end points. cp is α/R(root tree).
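A worked sketch of the cost-complexity trade-off, using made-up subtree sizes and misclassification counts (hypothetical numbers, purely for illustration):

```python
# Sketch: the cost-complexity criterion RE = R + alpha * size.
# Larger alpha penalises tree size more heavily, so it selects
# smaller subtrees. The (R, size) pairs below are hypothetical.

def relative_cost(R, size, alpha):
    # R misclassified points, 'size' end points, complexity weight alpha
    return R + alpha * size

# candidate subtrees: (misclassified points R, number of end points)
subtrees = [(200, 2), (120, 5), (100, 11), (95, 25)]

for alpha in (0, 10, 30):
    best = min(subtrees, key=lambda t: relative_cost(*t, alpha))
    print(alpha, best)
# 0 (95, 25)
# 10 (120, 5)
# 30 (200, 2)
```

With α = 0 the biggest tree wins (lowest raw error); as α grows, the chosen subtree shrinks, which is the pruning path that the cp table indexes.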

Regression trees

Trees can be used to model functions, though each end point will result in the same predicted value: a constant for that end point. Thus regression trees are like classification trees, except that each end point gives a predicted function value rather than a predicted classification.

Instead of the Gini index, the impurity criterion is the sum of squares, so the splits which cause the biggest reduction in the sum of squares will be selected. In pruning the tree, the measure used is the mean square error of the predictions made by the tree.
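Choosing a regression-tree split by sum of squares can be sketched as follows (toy data of my own, not from the lecture):

```python
# Sketch: pick the threshold c on one variable that minimises the
# total within-child sum of squares (equivalently, maximises the
# reduction in sum of squares).

def sum_of_squares(ys):
    # sum of squared deviations from the node mean
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    best = None
    for c in sorted(set(xs))[1:]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < c]
        right = [y for x, y in zip(xs, ys) if x >= c]
        ss = sum_of_squares(left) + sum_of_squares(right)
        if best is None or ss < best[0]:
            best = (ss, c)
    return best

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
print(best_split(xs, ys))   # splits at x >= 4, leaving two near-constant leaves
```

Each leaf then predicts the mean of its cases, which is the constant end-point value described above.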

Regression Example

In an effort to understand how computer performance is related to a number of variables describing the features of a PC, the following data were collected: the size of the cache, the cycle time of the computer, the memory size and the number of channels (the last two were not measured directly; minimum and maximum values were obtained).

[Regression tree plot: splits on cach, mmax and syct (e.g. cach < 27, mmax < 6100, mmax < 2.8e+04, mmax < 1750, syct >= 360, cach < 96.5, cach < 56), with predicted values such as 1.28, 1.41, 1.54, 1.76 and 1.87 at the leaves.]

We can see that we need a cp value of about 0.008, to give a tree with 11 leaves or terminal nodes.

[plotcp output: cross-validated relative error plotted against cp (0.088 down to 0.0018) and size of tree.]

[Pruned regression tree plot: splits on cach < 27, mmax < 6100, mmax < 2.8e+04, mmax < 1750, syct >= 360, cach < 96.5 and cach < 56, with predicted values from 1.09 to 2.67 at the leaves.]

This enables us to see that, at the top end, it is the size of the cache and the amount of memory that determine performance.

Advantages of CART

- Can cope with any data structure or type
- Classification has a simple form
- Uses conditional information effectively
- Invariant under transformations of the variables
- Is robust with respect to outliers
- Gives an estimate of the misclassification rate

Disadvantages of CART

- CART does not use combinations of variables
- The tree can be deceptive: if a variable is not included, it may be because it was masked by another
- Tree structures may be unstable: a change in the sample may give different trees
- The tree is optimal at each split, but may not be globally optimal

Exercises

- Implement the Gini index on a spreadsheet
- Have a go at the lecture examples using R and the script available on the web
- Try classifying the Iris data using CART
