
CLASSIFICATION: Business Analytics

DECISION TREES Lecture 7/8


LEARNING OBJECTIVES
• Explain what classification is
• Define a decision tree
• Compare the advantages and disadvantages of decision trees
• Build a decision tree
• Evaluate a decision tree
A. Explain What is Classification

WHAT IS CLASSIFICATION
• Classification is a data mining function that assigns items into
categories or classes.

• The goal of classification is to accurately predict the target class for
each case in the data.

EXAMPLES OF
CLASSIFICATION TASKS
• Identifying loan applicants as low, medium, or high credit risks.

• Predicting tumour cells as benign or malignant

• Classifying credit card transactions as legitimate or fraudulent



CLASSIFICATION RULES
• Classification rules help assign new objects to classes.

E.g., given a new automobile insurance applicant, should he or she be
classified as low risk, medium risk or high risk?

• Classification rules for the above example could use a variety of data, such
as educational level, salary, age, etc.
Person P, P.degree = master and P.income > 75,000 ⇒ P.credit = excellent
Person P, P.degree = bachelors and (P.income > 25,000 and P.income < 75,000) ⇒ P.credit = good

• Rules are not necessarily exact - there may be some misclassifications

• Classification rules can be represented by a decision tree.
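
The two example rules above can be written directly as conditionals. This is only a minimal sketch: the function name `classify_credit` and the `None` result for applicants that neither rule covers are illustrative choices, not part of the slides.

```python
def classify_credit(degree, income):
    """Apply the two example rules; return None when no rule fires."""
    if degree == "master" and income > 75_000:
        return "excellent"
    if degree == "bachelors" and 25_000 < income < 75_000:
        return "good"
    return None  # not covered by either rule


print(classify_credit("master", 90_000))     # excellent
print(classify_credit("bachelors", 40_000))  # good
print(classify_credit("bachelors", 80_000))  # None
```

The third call shows why rules are not necessarily exact: some cases fall outside every rule and need a default.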


B. Define Decision Tree

WHAT IS A DECISION TREE?


• A decision tree is a hierarchical collection of rules
that describes how to divide a collection of records
into successively smaller groups of records.
• The aim of the division is for the records within each
resulting segment to become more and more similar (pure)
with respect to the target.
• It is a predictive model based on a branching series
of tests
• Can be used for binary or multiple outcomes
• Allows us to understand which variables are
important
• Helps spot unexpected patterns

STRUCTURE OF A DECISION TREE


• Consists of a root, nodes, leaves, and splits
• At each node, a decision is made on which
variable to split
• The variables chosen for splitting are the most important
• All records landing at the same leaf get the
same prediction
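
One way to picture this structure is a small node type plus a function that walks from the root to a leaf. The sketch below assumes binary tests of the form `value < threshold`; the `Node` fields, the `predict` function, and the income example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal nodes set attribute/threshold/left/right; leaves set prediction only.
    attribute: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # records with value < threshold
    right: Optional["Node"] = None   # records with value >= threshold
    prediction: Optional[str] = None


def predict(node, record):
    """Follow the tests from the root until a leaf is reached."""
    while node.prediction is None:
        if record[node.attribute] < node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical tree: one split on income, two leaves.
tree = Node(attribute="income", threshold=50_000,
            left=Node(prediction="high risk"),
            right=Node(prediction="low risk"))

print(predict(tree, {"income": 30_000}))  # high risk
print(predict(tree, {"income": 80_000}))  # low risk
```

Every record that ends at the same leaf receives that leaf's single prediction.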
C. Advantages and Disadvantages of Decision Tree

PROS AND CONS OF DECISION TREES

Pros
+ Reasonable training time
+ Fast application
+ Easy to interpret
+ Easy to implement
+ Can handle a large number of attributes

Cons
- Cannot handle complicated relationships between attributes
- Problems arise when there is a lot of missing data
D. Building a Decision Tree

PURPOSE OF A DECISION TREE


• Given a collection of records (training set)
‒ Each record contains a set of attributes.
• One of the attributes is the class.
• The aim is to find a model for the class attribute as a function of the values
of other attributes.
• Goal: previously unseen records should be assigned a class as accurately as
possible.
‒ A test set is used to determine the accuracy of the model.
‒ Usually, the given data set is divided into training and test sets, with
training set used to build the model and test set used to validate it.
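
The division into training and test sets can be sketched with the standard library alone. The 30% test fraction and the fixed seed are assumptions made for illustration; the slides do not prescribe a ratio.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))  # stand-in for 10 labelled records
train, test = train_test_split(records)
print(len(train), len(test))  # 7 3
```

Shuffling before the cut keeps the test set from depending on the order in which the records were collected.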

DECISION TREE
CLASSIFICATION TASK

DECISION TREE USING HUNT’S ALGORITHM
Hunt's algorithm grows a decision tree in a recursive fashion by partitioning
the training records into successively purer subsets.
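
The recursive shape of this procedure can be sketched as follows. It is a simplification, not a full statement of Hunt's algorithm: attributes are taken in the order given rather than chosen by purity, and the `grow` function, the nested-dict tree format, and the four records are all made up for illustration.

```python
from collections import Counter


def grow(records, attributes):
    """records: list of (features_dict, label). Returns a nested dict tree."""
    labels = [label for _, label in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the subset is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return {"leaf": majority}
    # Simplification: take attributes in the given order instead of
    # searching for the purest split.
    attr, rest = attributes[0], attributes[1:]
    children = {}
    for value in sorted({features[attr] for features, _ in records}):
        subset = [(f, l) for f, l in records if f[attr] == value]
        children[value] = grow(subset, rest)  # recurse on the smaller subset
    return {"split": attr, "children": children}


# Hypothetical training records.
records = [
    ({"refund": "yes", "married": "no"}, "no"),
    ({"refund": "no", "married": "yes"}, "no"),
    ({"refund": "no", "married": "no"}, "yes"),
    ({"refund": "yes", "married": "yes"}, "no"),
]
tree = grow(records, ["refund", "married"])
print(tree["children"]["yes"])  # {'leaf': 'no'}
```

The `refund = yes` subset is already pure, so it becomes a leaf at once, while the `refund = no` subset is split again.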



BUILDING THE DECISION TREE


• We start at the root node with all records in the training set

• The tree is drawn from left to right

• Consider every split on every variable

• Choose the split that maximizes a measure of purity

• For each child of the root node, we again search for the best split

• Eventually, the process stops when no good split is available or leaves are
pure
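
The search over "every split on every variable" can be sketched as an exhaustive loop. The slides do not name a purity measure, so Gini impurity is assumed here; the helper names `gini` and `best_split`, the equality-test form of the split, and the weather-style records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity (assumed measure): 0 for a pure node, larger when mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(records, attributes):
    """Try every (attribute == value) test; return (impurity, attribute, value)."""
    best = None
    n = len(records)
    for attr in attributes:
        for value in sorted({f[attr] for f, _ in records}):
            inside = [l for f, l in records if f[attr] == value]
            outside = [l for f, l in records if f[attr] != value]
            if not inside or not outside:
                continue  # the test does not separate anything
            # Weighted average impurity of the two children (lower is purer).
            score = (len(inside) * gini(inside) + len(outside) * gini(outside)) / n
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


# Hypothetical records: (features, class label).
records = [
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "no"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
]
print(best_split(records, ["outlook", "windy"]))  # (0.0, 'outlook', 'rainy')
```

Splitting on `outlook` yields two pure children, so it beats any split on `windy`, whose children stay evenly mixed.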

BUILDING THE DECISION TREE
Step 1: Don’t Cheat (4), Cheat (2)
Step 2: Don’t Cheat (1), Cheat (3)
Step 3
[Figure: complete tree showing Steps 1, 2 and 3]
E. Evaluating a Decision Tree

APPLY MODEL TO TEST DATA


HOW TO SPLIT DATA FOR TEST CONDITION
• Depends on attribute types
‒ Nominal (Categorical)
‒ Ordinal (categorical but ordered, for example education level)
‒ Continuous (can take any value within a range)
• There are two types of splits
‒ Multi-way split
‒ 2-way split (binary)
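
For a nominal attribute, the two types of split can be enumerated explicitly. The sketch below is illustrative only; `multiway_split`, `binary_splits`, and the car-type values are assumed names and data.

```python
from itertools import combinations


def multiway_split(values):
    """Multi-way split: one branch per distinct value."""
    return [{v} for v in sorted(set(values))]


def binary_splits(values):
    """2-way splits: every division of the values into two non-empty groups."""
    distinct = sorted(set(values))
    first, rest = distinct[0], distinct[1:]
    splits = []
    # Keep `first` in the left group so each grouping is listed only once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(distinct) - left
            splits.append((left, right))
    return splits


car_types = ["family", "sports", "luxury", "sports"]  # hypothetical values
print(len(multiway_split(car_types)))  # 3
print(len(binary_splits(car_types)))   # 3
```

With k distinct values there is one multi-way split but 2^(k-1) - 1 candidate binary splits, so the binary search space grows quickly as k increases.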

SPLITTING BASED ON
NOMINAL ATTRIBUTES

SPLITTING BASED ON
CONTINUOUS ATTRIBUTES
Different ways of handling continuous attributes:
• Discretize the values to form an ordinal categorical attribute
• Binary decision: (A < v) or (A >= v)
‒ Consider all possible splits and find the best cut
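
The binary decision (A < v) or (A >= v) can be searched by trying a cut between each pair of neighbouring values. To keep the sketch short, each cut is scored by the number of records that disagree with the majority class on their side, which is an assumed measure; the function names and the income data are made up.

```python
from collections import Counter


def errors(labels):
    """Records that disagree with the majority class of their group."""
    return len(labels) - max(Counter(labels).values()) if labels else 0


def best_cut(values, labels):
    """Try a cut v midway between neighbouring values; the test is A < v."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        v = (lo + hi) / 2
        left = [l for a, l in zip(values, labels) if a < v]
        right = [l for a, l in zip(values, labels) if a >= v]
        cost = errors(left) + errors(right)
        if best is None or cost < best[1]:
            best = (v, cost)
    return best  # (cut value, misclassified records)


incomes = [60, 70, 75, 85, 90, 95]            # hypothetical, in thousands
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_cut(incomes, labels))  # (80.0, 0)
```

Only midpoints between distinct sorted values need to be tried, because any other cut produces the same two groups as one of them.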

FINDING A GOOD SPLIT AT A DECISION TREE NODE
• There are many ways to find a good split
• But they have two things in common: splits are preferred where
‒ The children are similar in size
‒ Each child is as pure as possible
• Most algorithms seek to maximize the purity of each of the children

HOW TO DETERMINE
THE BEST SPLIT
• Nodes with homogeneous class distribution are preferred
• Before Splitting: 10 records of class 0, 10 records of class 1
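
The slides do not say how homogeneity is measured. Assuming Gini impurity as the measure (an assumption, not taken from the source), a node $t$ with class proportions $p_k$ has the impurity below, and the parent node with 10 records of each class sits at the maximum value for two classes:

```latex
\[
\mathrm{Gini}(t) = 1 - \sum_{k} p_k^{2},
\qquad
\mathrm{Gini}(\text{parent}) = 1 - \left(\tfrac{10}{20}\right)^{2} - \left(\tfrac{10}{20}\right)^{2} = 0.5
\]
\[
\mathrm{Gini}_{\text{split}} = \sum_{i} \frac{n_i}{n}\,\mathrm{Gini}(\text{child}_i)
\]
```

Here $n_i$ is the number of records in child $i$ and $n$ the number in the parent. Under this assumption, a candidate split is better the further its weighted child impurity falls below 0.5, and a split that produces only pure children scores 0.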

PERFORMANCE MEASURES FOR DECISION TREES
• After a decision tree is constructed, each leaf node has a score
• A leaf's score is the proportion of its records that belong to its most common class
• A decision tree also has an accuracy score, which is calculated as
follows:
Accuracy = (number of correctly classified records) / (total number of records)
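
Both measures reduce to simple counting. The sketch below reads the leaf score as the share of a leaf's records in its most common class, which is one interpretation of the slide; the function names and the example labels are hypothetical.

```python
from collections import Counter


def leaf_score(labels_at_leaf):
    """Share of the leaf's records that belong to its most common class."""
    return max(Counter(labels_at_leaf).values()) / len(labels_at_leaf)


def accuracy(actual, predicted):
    """Correctly classified records divided by all records."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


print(leaf_score(["yes", "yes", "yes", "no"]))  # 0.75

actual = ["yes", "no", "yes", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes"]
print(accuracy(actual, predicted))  # 0.8
```

Accuracy is normally computed on the test set, since records used to build the tree give an optimistic estimate.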

CALCULATING THE ACCURACY OF A DECISION TREE

EVALUATION OF
CLASSIFICATION MODELS
• Evaluation is based on the counts of test records that are correctly (or incorrectly) predicted by
the classification model
• These counts are tabulated in a confusion matrix
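
A confusion matrix is a table of (actual class, predicted class) counts over the test records. The sketch below builds one with a `Counter`; the function name and the six test records are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs over the test records."""
    return Counter(zip(actual, predicted))


actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
matrix = confusion_matrix(actual, predicted)

classes = sorted(set(actual) | set(predicted))
print("actual \\ predicted", *classes)
for a in classes:
    # One row per actual class, one column per predicted class.
    print(a, *(matrix[(a, p)] for p in classes))

# Correct predictions lie on the diagonal of the matrix.
correct = sum(matrix[(c, c)] for c in classes)
print(correct, "of", len(actual), "correct")  # 4 of 6 correct
```

The diagonal cells hold the correct predictions and the off-diagonal cells show which classes are confused with each other, so accuracy can be read directly off the table.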

EXERCISE FOR DECISION TREE - SHOULD WE GO SAILING?

yes (2)
no (3)

yes (1)
no (1)

QUESTIONS?
