Document 6
Introduction
• Predictive modeling involves building regression (multiple regression or logistic regression), neural network, and decision tree models.
Generally,
• Logistic regression is considered a statistical method
• A neural network is an artificial intelligence model
• A decision tree is a machine learning technique
Decision Tree
• Decision trees are useful for classification and prediction.
• A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous groups with respect to a particular target.
• The target variable is usually categorical, and the decision tree is used either to:
• (1) calculate the probability that a given record belongs to each of the categories, or
• (2) classify the record by assigning it to the most likely class (or category).
• The algorithm used to construct decision trees is referred to as recursive partitioning.
• Note: Decision trees can also be used to estimate the value of a continuous target variable. However, multiple regression and neural network models are generally more appropriate when the target variable is continuous.
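As a minimal sketch of recursive partitioning (not the slides' own implementation): repeatedly pick the binary split that makes the child nodes as pure as possible, then recurse on each child. The helper names `gini` and `grow_tree` are illustrative, and purity is measured with the Gini score (sum of squared class proportions) defined later in the slides.

```python
from collections import Counter

def gini(labels):
    # Gini score = sum of squared class proportions (higher = purer node)
    n = len(labels)
    return sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, depth=0, max_depth=2):
    # Stop when the node is pure or the depth limit is reached:
    # the node becomes a leaf predicting its majority class.
    if len(set(labels)) == 1 or depth == max_depth:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):            # every input variable...
        for v in set(r[f] for r in rows):    # ...and every observed value
            mask = [r[f] == v for r in rows]
            left = [l for l, m in zip(labels, mask) if m]
            right = [l for l, m in zip(labels, mask) if not m]
            if not left or not right:
                continue
            # weighted average purity of the two child nodes
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score > best[0]:
                best = (score, f, v, mask)
    if best is None:
        return Counter(labels).most_common(1)[0][0]
    _, f, v, mask = best
    return {"feature": f, "value": v,
            "left": grow_tree([r for r, m in zip(rows, mask) if m],
                              [l for l, m in zip(labels, mask) if m],
                              depth + 1, max_depth),
            "right": grow_tree([r for r, m in zip(rows, mask) if not m],
                               [l for l, m in zip(labels, mask) if not m],
                               depth + 1, max_depth)}
```

Production algorithms such as CHAID and CART refine this basic loop with statistical split criteria, pruning, and efficient handling of continuous inputs.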
Decision Tree Structure
• Root node = top node
• Child node = descendant node
• Leaf node (or terminal node) = final node, i.e. no more splitting
• Rule = unique path (set of conditions) from the root to each leaf
[Figure: a tree with a root node, intermediate child nodes, and leaf nodes]
A Simple Decision Tree
Target: Status = Buyer or Non-buyer (categorical variable)
• Node 0 (root): Buyer 600 (40%), Non-buyer 900 (60%); split on Income.
• Income < $100,000 → Node 1: Buyer 350 (36.84%), Non-buyer 600 (63.16%); split on Age (<25 vs 25 and above).
• Income $100,000 and above → Node 2: Buyer 250 (45.45%), Non-buyer 300 (54.55%); split on Gender, with males further split on Race:
• Chinese → Node 7: Buyer 30 (15%), Non-buyer 170 (85%)
• Malay & Indian → Node 8: Buyer 170 (85%), Non-buyer 30 (15%)
• Example: a customer with income less than $100,000 and age less than 25 is predicted as a non-buyer.
• Note: Input variables that are higher up in the decision tree can be deemed the more important variables in predicting the target variable.
Decision Rules
1. If Income < $100,000 and Age < 25, then Status = Non-Buyer
2. If Income < $100,000 and Age >= 25, then Status = Buyer
3. If Income >= $100,000, Gender = Male and Race = Chinese, then Status = Non-Buyer
4. If Income >= $100,000, Gender = Male and Race = Malay/Indian, then Status = Buyer
5. If Income >= $100,000 and Gender = Female, then Status = Non-Buyer
Profile of Buyers
• Those earning less than $100,000 and aged 25 and above.
• Malay or Indian males earning $100,000 and above.
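Because a decision tree maps directly to business rules, the five rules above translate one-for-one into code. A minimal sketch (the function name is illustrative):

```python
def predict_status(income, age, gender, race):
    """Apply the five decision rules from the buyer tree."""
    if income < 100_000:
        # Rules 1 and 2: only age matters in the low-income branch
        return "Non-Buyer" if age < 25 else "Buyer"
    # Income >= $100,000
    if gender == "Female":
        return "Non-Buyer"          # Rule 5
    # Males are split by race (Rules 3 and 4)
    return "Non-Buyer" if race == "Chinese" else "Buyer"
```

Note how the nesting of the `if` statements mirrors the order of splits in the tree: income first, then age or gender/race.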
Regression Tree
Target: Salary (continuous variable)
[Figure: regression tree splitting on employment category and gender]
Decision Rules
• If Employment = Clerical or Custodial and Gender = Female, average salary = $25,003.69.
• If Employment = Clerical and Gender = Male, average salary = $31,558.15.
• If Employment = Custodial and Gender = Male, average salary = $30,938.89.
• If Employment = Manager and Gender = Female, average salary = $47,213.50.
• If Employment = Manager and Gender = Male, average salary = $66,243.24.
Example: Good & Poor Splits

Splitting Criteria

Gini (measures population diversity)
Evaluating the Split Using Gini
Which of these two proposed splits (Gender: Male vs Female, or Income <= 2000 vs Income > 2000) increases purity the most?
• Gini score of the root node = 0.5² + 0.5² = 0.5 (an equal mix of buyers and non-buyers)
• p_i is the probability of the i-th category of the target variable occurring in a particular node.
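In the slides' convention the Gini score is the sum of squared class proportions (0.5 for a 50/50 node, 1.0 for a pure node), and a split is evaluated by the weighted average score of its child nodes. A sketch, with illustrative function names:

```python
def gini_score(counts):
    """counts: class counts in one node, e.g. [buyers, non_buyers].
    Returns sum(p_i**2): 0.5 for a 50/50 binary node, 1.0 when pure."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def split_score(children):
    """Weighted average Gini score over child nodes; higher = purer split."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini_score(c) for c in children)
```

For example, the income split from the buyer tree (Node 1: 350 buyers / 600 non-buyers; Node 2: 250 / 300) scores slightly above the root node's 0.4² + 0.6² = 0.52, so it increases purity.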
Evaluating the Split Using Entropy
Entropy = -1 * (P(dark) * log2 P(dark) + P(light) * log2 P(light))
Which of these two proposed splits (Gender: Male vs Female, or Income <= 2000 vs Income > 2000) increases information gain the most?
• Entropy of the root node = 1 (an equal mix of the two classes)
• log2(a) = log10(a) / log10(2)
• For a pure left child: Entropy_left = -1 * (1 * log2(1) + 0) = 0
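The entropy formula above, and the information gain used to compare splits (parent entropy minus the weighted average entropy of the children), can be sketched as follows; the function names are illustrative:

```python
from math import log2

def entropy(counts):
    """Entropy of a node: -sum(p_i * log2(p_i)).
    1 bit for a 50/50 binary node, 0 for a pure node
    (the 0 * log2(0) term is taken as 0, hence the c > 0 filter)."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the weighted average entropy of the children."""
    total = sum(sum(c) for c in children)
    return entropy(parent) - sum(sum(c) / total * entropy(c) for c in children)
```

A perfect split of a 50/50 node into two pure children gains the full 1 bit; a split that leaves both children at 50/50 gains nothing.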
Algorithms for Constructing Decision Trees
Common ones:
• CHAID (chi-square automatic interaction detection)
• C5.0
• CART (classification and regression tree)
• Decision tree algorithms are computationally intensive (i.e. a lot of computations are performed to construct the tree).
CART (classification and regression tree)
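As an illustrative sketch (not part of the slides, which use IBM SPSS Modeler): scikit-learn's `DecisionTreeClassifier` implements a CART-style algorithm, with binary splits and Gini impurity by default. The tiny dataset below is hypothetical, loosely echoing the buyer example with income and age as inputs:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [income, age] per customer
X = [[50_000, 20], [50_000, 40], [120_000, 30], [120_000, 50]]
y = ["non-buyer", "buyer", "buyer", "buyer"]

# CART-style tree: binary splits chosen to minimize Gini impurity
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

# Here the best split is on age (<= 25), which separates the classes perfectly
print(tree.predict([[60_000, 22]]))  # young customer
print(tree.predict([[60_000, 40]]))  # older customer
```

Because the age threshold alone separates this toy sample perfectly, CART chooses it over any income split; with real data the chosen splits depend on the impurity reduction each candidate achieves.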
CHAID
• A commonly used algorithm for constructing decision trees for a qualitative (or categorical) target variable.
• Its split algorithm (the chi-square test) is designed for categorical inputs, so continuous inputs must be discretized.
• Example:
• The research objective of a data mining application is to predict the buying status (i.e. buyer vs non-buyer) of a product based on demographic variables such as gender, race, age and income.
• Assume that the sample consists of 600 buyers and 900 non-buyers.
CHAID (cont'd)
• Step 1: Each input variable is evaluated on its potential to split the data into two or more subsets so that the target variable is as differentiated (in a statistical sense) between the subsets as possible.
• Example split by Gender:
• Node 0: Buyer 600 (40%), Non-buyer 900 (60%)
• Male → Node 1: Buyer 450 (39.13%), Non-buyer 700 (60.87%)
• Female → Node 2: Buyer 150 (42.86%), Non-buyer 200 (57.14%)
• The chi-square test of independence is used to test the null hypothesis H0: buying status and gender are independent.
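The chi-square test of independence for the gender split can be run directly on the 2x2 contingency table from the slide, for example with SciPy (an illustrative tool choice, not the one used in the course):

```python
from scipy.stats import chi2_contingency

# Contingency table from the gender split:
#              Male   Female
# Buyer         450      150
# Non-buyer     700      200
table = [[450, 150], [700, 200]]

chi2, p_value, dof, expected = chi2_contingency(table)
# A large p-value means we cannot reject H0 (buying status is
# independent of gender), so gender is a weak splitting variable here.
```

The buyer percentages for males (39.13%) and females (42.86%) are close to the overall 40%, so the test is unlikely to reject H0, and CHAID would prefer a different input variable for this split.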
CHAID (cont'd)
The significance of race in differentiating between buyers and non-buyers can also be evaluated.
Decision Tree – Split by Race (Binary Split)
• Node 0: Buyer 600 (40%), Non-buyer 900 (60%)
• Chinese → Node 1: Buyer 300 (37.5%), Non-buyer 500 (62.5%)
• Malay & Indian → Node 2: Buyer 300 (42.86%), Non-buyer 400 (57.14%)
• Determining the best split for a quantitative input variable is more difficult.
• Example: 5 values of a quantitative variable in ascending order, A < B < C < D < E, give four possible splitting points:
• the average of A and B
• the average of B and C
• the average of C and D
• the average of D and E
• As quantitative input variables can take many values, decision tree algorithms perform very intensive computations to determine the splitting point.
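The candidate splitting points described above are simply the midpoints between consecutive distinct sorted values, which can be sketched in a few lines (the function name is illustrative):

```python
def candidate_split_points(values):
    """Midpoints between consecutive distinct sorted values.
    For n distinct values there are n - 1 candidates, matching the
    A < B < C < D < E example (four possible splitting points)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
```

The algorithm must then evaluate the split quality at every candidate point for every quantitative input, which is what makes tree construction computationally intensive.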
Decision Tree – Determining the Split for a Quantitative Input Variable
• Node 0: Buyer 600 (40%), Non-buyer 900 (60%).
• One candidate splitting point on income gives Node 1: Buyer 230 (36.51%), Non-buyer 400 (63.49%) and Node 2: Buyer 370 (42.53%), Non-buyer 500 (57.47%), with chi-square = 5.52 and p-value = 0.019 < 0.05.
• The splitting point finally chosen is Income = $100,000, giving Node 1 (Income < $100,000): Buyer 350 (36.84%), Non-buyer 600 (63.16%) and Node 2 ($100,000 and above): Buyer 250 (45.45%), Non-buyer 300 (54.55%).
• Growing the tree further, Node 1 is split on Age, Node 2 on Gender, and males in Node 2 on Race: Chinese → Node 7: Buyer 30 (15%), Non-buyer 170 (85%); Malay & Indian → Node 8: Buyer 170 (85%), Non-buyer 30 (15%).
• A customer with income less than $100,000 and age less than 25 is predicted as a non-buyer.
Assessing the Predictive Performance of a Decision Tree
Predictive performance is summarized in a classification (confusion) matrix that cross-tabulates predicted status (buyer vs non-buyer) against actual status, with row and column totals.
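From such a matrix, the overall accuracy is the proportion of records on the diagonal (predicted class equals actual class). A sketch, using hypothetical counts since the slide's figures are not reproduced here:

```python
def classification_accuracy(matrix):
    """matrix[actual][predicted] counts; returns the fraction of records
    whose predicted class equals their actual class (the diagonal)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical example: rows = actual (buyer, non-buyer), cols = predicted
cm = [[520, 80],    # actual buyers: 520 predicted buyer, 80 non-buyer
      [200, 700]]   # actual non-buyers: 200 predicted buyer, 700 non-buyer
```

Here 1,220 of 1,500 records are correctly classified. Accuracy alone can mislead when the classes are imbalanced (60% of this sample are non-buyers), so the per-class rates in the matrix are worth reporting as well.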
Comparison of Decision Tree Models
[Table comparing C5.0, CHAID, QUEST and C&RT by model criterion]

Comparison of Decision Tree Algorithms
[Table comparing CHAID, C&RT, QUEST and C5.0 by algorithm details and model criteria]
Decision Tree Advantages
1. Easy to understand and interpret.
2. Map nicely to a set of business rules.
3. Can be applied to real problems.
4. Make no prior assumptions about the data.
5. Able to process both numerical and categorical data.
6. Can handle missing values as a separate category (CART).
Using IBM SPSS Modeler 18
• CART
• CHAID
• C5.0
EXERCISES
• Use the jobapply.sav data and compare the CART, CHAID and C5.0 models.
• Use the bankcredit SAS data and compare the decision tree models.