Professional Documents
Culture Documents
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and
grateful acknowledgement to the many others who made their course materials freely available online. Feel
free to reuse or adapt these slides for your own academic purposes, provided that you include proper
attribution.
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
Stages of (Batch) Machine Learning
Given: labeled training data X, Y = {hxi , yi i}ni=1
• Assumes each xi ⇠ D(X ) with yi = ftarget (xi )
X, Y
Train the model:
learner
model ß classifier.train(X, Y )
x model yprediction
# correct predictions
accuracy =
# test instances
# incorrect predictions
error = 1 accuracy =
# test instances
3
Confusion Matrix
• Given a dataset of P positive instances and N negative instances:
Predicted Class
Yes No
Actual Class
Yes TP FN TP + TN
accuracy =
P +N
No FP TN
5
Simple Decision Boundary
TWO-CLASS DATA IN A TWO-DIMENSIONAL FEATURE SPACE
6
Decision
Region 1 Decision
5 Region 2
3
Feature 2
0
Decision
Boundary
-1
2 3 4 5 6 7 8 9 10
Feature 1
6
Slide by Padhraic Smyth, UCIrvine
More Complex Decision Boundary
TWO-CLASS DATA IN A TWO-DIMENSIONAL FEATURE SPACE
6
Decision
Region 1 Decision
5 Region 2
3
Feature 2
0
Decision
Boundary
-1
2 3 4 5 6 7 8 9 10
Feature 1
7
Slide by Padhraic Smyth, UCIrvine
Example: The Overfitting Phenomenon
8
Slide by Padhraic Smyth, UCIrvine
A Complex Model
Y = high-order polynomial in X
9
Slide by Padhraic Smyth, UCIrvine
The True (simpler) Model
Y = a X + b + noise
10
Slide by Padhraic Smyth, UCIrvine
Example: The Overfitting Phenomenon
11
Slide by Padhraic Smyth, UCIrvine
A Complex Model
Y = high-order polynomial in X
12
Slide by Padhraic Smyth, UCIrvine
The True (simpler) Model
Y = a X + b + noise
13
Slide by Padhraic Smyth, UCIrvine
How Overfitting Affects Prediction
Predictive
Error
Model Complexity
14
Slide by Padhraic Smyth, UCIrvine
How Overfitting Affects Prediction
Predictive
Error
Model Complexity
15
Slide by Padhraic Smyth, UCIrvine
How Overfitting Affects Prediction
Model Complexity
Ideal Range
for Model Complexity
16
Slide by Padhraic Smyth, UCIrvine
Comparing Classifiers
Say we have two classifiers, C1 and C2, and want to
choose the best one to use for future predictions
17
Based on Slide by Padhraic Smyth, UCIrvine
Training and Test Data
Training
Test Data ... Data
Data Training
Data Test Data
Summary statistics
over k test
performances
20
More on Cross-Validation
• Cross-validation generates an approximate estimate
of how well the classifier will do on “unseen” data
– As k à n, the model becomes more accurate
(more training data)
– ...but, CV becomes more computationally expensive
– Choosing k < n is a compromise
k-fold CV Training
Data Training
Test Data
Data
k-fold CV Training
Data Training
Test Data
Data
24
Building Learning Curves
1.) Loop for t trials:
Full Data Set
a.) Randomize Compute learning curve over each
training/testing split
Data Set Shuffle
k-fold CV Training
Data Training
Test Data
Data