Data Mining
Session 07-08
SUPERVISED
UNSUPERVISED
• MULTIPLE LINEAR REGRESSION
• Explanatory vs. Predictive Modelling
Explanatory Modeling
Goal: Explain relationship between predictors
(explanatory variables) and target
Predictive Modeling
Goal: predict target values in other data where we have predictor
values, but not target values
• Classic data mining context
• Model Goal: Optimize predictive accuracy
• Train model on training data
• Assess performance on validation (hold-out) data
• Explaining role of predictors is not primary purpose (but useful)
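The train/assess workflow above can be sketched end to end. This is a minimal illustration using numpy only; the dataset, coefficients, and noise level are synthetic, invented for the example:

```python
# Sketch of the train/validation workflow for a predictive linear model
# (numpy only; the data here is synthetic, invented for illustration).
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Hold out 25% of the records as validation (hold-out) data.
split = int(0.75 * n)
X_tr, X_va = X[:split], X[split:]
y_tr, y_va = y[:split], y[split:]

# Fit ordinary least squares on the training set (with an intercept column).
A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)

# Assess predictive accuracy on the held-out validation set via RMSE.
A_va = np.column_stack([np.ones(len(X_va)), X_va])
rmse = np.sqrt(np.mean((y_va - A_va @ beta) ** 2))
print(f"validation RMSE: {rmse:.3f}")
```

The key point the slide makes is that accuracy is judged on records the model never saw during fitting.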
• HOW TO ASSESS THE PERFORMANCE OF A MODEL?
Prediction Accuracy Measure
Selecting Subsets of Predictors
Goal: Find parsimonious model (the simplest model that performs
sufficiently well)
• More robust
• Higher predictive accuracy
Exhaustive Search
Exhaustive Search
• All possible subsets of predictors assessed (single,
pairs, triplets, etc.)
• Computationally intensive
• Judge by adjusted R², which penalizes model size:

R²_adj = 1 − (1 − R²) · (n − 1) / (n − p − 1)

where n is the number of records and p the number of predictors
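Exhaustive search scored by adjusted R² can be sketched in a few lines. This is a minimal illustration, assuming a small synthetic dataset; the `adjusted_r2` helper and the data are my own, not from the slides:

```python
# Sketch of exhaustive subset search judged by adjusted R-squared
# (numpy only; the data is synthetic and the helper name is my own).
from itertools import combinations
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(1)
n = 120
X = rng.normal(size=(n, 4))
# Only predictors 0 and 2 truly matter in this synthetic example.
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Assess every non-empty subset of predictors (single, pairs, triplets, ...).
best = max(
    (subset for k in range(1, 5) for subset in combinations(range(4), k)),
    key=lambda s: adjusted_r2(X[:, list(s)], y),
)
print("best subset by adjusted R^2:", best)
```

With 4 predictors there are 15 subsets to assess; with 20 predictors there are over a million, which is why exhaustive search is computationally intensive.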
Backward Elimination
• Start with all predictors
• Successively eliminate the least useful predictor, one at a time
• Stop when every remaining predictor makes a statistically
significant contribution
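Backward elimination can be sketched as follows. Note one simplification: the slide stops when all remaining predictors are statistically significant, whereas to stay numpy-only this sketch drops a predictor whenever doing so improves adjusted R² (a common alternative criterion); the data is synthetic:

```python
# Backward elimination sketch (numpy only, synthetic data). The slide's
# stopping rule uses statistical significance; as a simplification this
# version drops a predictor whenever that improves adjusted R-squared.
import numpy as np

def adjusted_r2(X, y):
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 5))
# Only predictors 1 and 3 truly matter in this synthetic example.
y = 4 * X[:, 1] + 2 * X[:, 3] + rng.normal(scale=0.4, size=n)

kept = list(range(5))          # start with all predictors
while len(kept) > 1:
    current = adjusted_r2(X[:, kept], y)
    # Score each candidate model with one predictor removed.
    scores = {j: adjusted_r2(X[:, [k for k in kept if k != j]], y)
              for j in kept}
    j, best = max(scores.items(), key=lambda kv: kv[1])
    if best <= current:        # stop: every remaining predictor helps
        break
    kept.remove(j)             # eliminate the least useful predictor

print("retained predictors:", kept)
```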
Stepwise
• Like forward selection (start with no predictors and add the most
useful one at each step)
• Except at each step, also consider dropping predictors that are no
longer significant
Summary
• Linear regression models are very popular tools, not only for
explanatory modeling, but also for prediction
• A good predictive model has high predictive accuracy (to a useful
practical level)
• Predictive models are built using a training data set, and evaluated
on a separate validation data set
• Removing redundant predictors is key to achieving predictive
accuracy and robustness
• Subset selection methods help find “good” candidate models.
These should then be run and assessed.
Supervised Learning – Possible Outcomes
Predicted numerical value: when the outcome variable is numerical
(e.g., house price)
Predicted class membership, or a propensity (probability of class
membership): when the outcome variable is categorical (e.g.,
owner/non-owner)
Cutoff Table
(24 validation records, sorted descending by Prob. of "1")

Actual Class   Prob. of "1"
1              0.996
1              0.988
1              0.984
1              0.980
1              0.948
1              0.889
1              0.848
0              0.762
1              0.707
1              0.681
1              0.656
0              0.622
1              0.506
0              0.471
0              0.337
1              0.218
0              0.199
0              0.149
0              0.048
0              0.038
0              0.025
0              0.022
0              0.016
0              0.004
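Applying a cutoff to the table's propensities turns them into classifications. A minimal sketch, using a cutoff of 0.5 and exactly the 24 actual-class / Prob("1") pairs from the table:

```python
# Apply a cutoff to the table's propensities and tally the results.
# The 24 records below are the actual-class / Prob("1") pairs from the table.
actual = [1,1,1,1,1,1,1,0,1,1,1,0, 1,0,0,1,0,0,0,0,0,0,0,0]
prob   = [0.996,0.988,0.984,0.980,0.948,0.889,0.848,0.762,0.707,0.681,0.656,0.622,
          0.506,0.471,0.337,0.218,0.199,0.149,0.048,0.038,0.025,0.022,0.016,0.004]

cutoff = 0.5
pred = [1 if p >= cutoff else 0 for p in prob]

# Confusion-matrix counts at this cutoff.
tp = sum(1 for a, q in zip(actual, pred) if a == 1 and q == 1)
tn = sum(1 for a, q in zip(actual, pred) if a == 0 and q == 0)
fp = sum(1 for a, q in zip(actual, pred) if a == 0 and q == 1)
fn = sum(1 for a, q in zip(actual, pred) if a == 1 and q == 0)
accuracy = (tp + tn) / len(actual)
print(f"cutoff={cutoff}: TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.3f}")
```

Raising the cutoff trades false positives for false negatives, which matters when one class is more important than the other.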
Lift
When One Class is More Important
In many cases it is more important to identify
members of one class
• Tax fraud
• Credit default
• Response to promotional offer
• Detecting electronic network intrusion
• Predicting delayed flights
Lift and Decile Charts – Cont.
Compare performance of DM model to “no model,
pick randomly”
Lift and Decile Charts: How to Use
Lift Chart – cumulative performance
(Same 24 records as the cutoff table above, sorted descending by Prob. of "1".)
After examining, e.g., the 10 highest-ranked cases (x-axis), 9 owners
(y-axis) have been correctly identified
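The cumulative curve can be computed directly from the sorted records. A minimal sketch using the 24 actual-class values from the table, already sorted descending by Prob("1"):

```python
# Cumulative gains from the sorted table: after the k highest-propensity
# records, how many actual "1"s have been found? (24 records from the slide,
# already sorted descending by Prob of "1".)
actual_sorted = [1,1,1,1,1,1,1,0,1,1,1,0, 1,0,0,1,0,0,0,0,0,0,0,0]

cumulative = []
found = 0
for a in actual_sorted:
    found += a
    cumulative.append(found)

# After the top 10 cases, compare against random selection, which would be
# expected to find 10 * (12/24) = 5 owners.
lift_at_10 = cumulative[9] / (10 * sum(actual_sorted) / len(actual_sorted))
print("owners found in top 10 records:", cumulative[9])
print("lift at 10 records:", lift_at_10)
```

Plotting `cumulative` against record count, alongside the diagonal random-selection baseline, gives the lift chart.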
Decile Chart
[Decile-wise lift chart (training dataset): x-axis = deciles 1–10,
y-axis = decile mean / global mean; the top decile reaches a lift of
about 2]
In the “most probable” (top) decile, the model is twice as likely to
identify the important class as random selection would be (i.e.,
compared to the average prevalence)
The y-axis is the ratio of the decile mean to the global mean:
The numerator is the proportion of records in that decile (10% of the
records) that actually belong to the class of interest.
The denominator is the overall proportion of records that belong to the
class of interest.
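The decile-wise ratios can be computed from the same sorted records. A minimal sketch using numpy's `array_split` to form 10 roughly equal groups (with 24 records the groups have 3 or 2 members):

```python
# Decile-wise lift from the sorted records: split into 10 roughly equal
# groups and divide each group's mean of actual "1"s by the global mean.
import numpy as np

# Same 24 records as the table, sorted descending by Prob of "1".
actual_sorted = np.array([1,1,1,1,1,1,1,0,1,1,1,0, 1,0,0,1,0,0,0,0,0,0,0,0])

global_mean = actual_sorted.mean()            # 12/24 = 0.5 prevalence
deciles = np.array_split(actual_sorted, 10)   # groups of 3,3,3,3,2,2,2,2,2,2
lift = [grp.mean() / global_mean for grp in deciles]
print("decile lifts:", [round(v, 2) for v in lift])
```

The first entry reproduces the slide's reading: the top decile is all owners, so its mean is twice the global prevalence.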
Lift Charts: How to Compute
• Using the model’s predicted probabilities (propensities), sort the
records from most likely to least likely member of the important class
• Moving down this sorted list, plot the cumulative number of actual
members of the important class found (y-axis) against the number of
records examined (x-axis)
Lift vs. Decile Charts
Both embody the concept of “moving down” through the records,
starting with the most probable