
Announcements

05/10/16

CSL465/603 - Machine Learning

Comments and Responses (1)


A large number of students have requested individual
labs
Lab 4 and 5 will be individual labs (no team work)

Quizzes and Labs have not been graded consistently by


the different TAs

Labs will be graded only by Sanatan Sukhija


Quizzes 5 and 7 will be graded only by Ritu Kapur
Quizzes 6 and 8 will be graded only by Rahul Pal Singh
TAs are expected to provide appropriate feedback for the
quizzes and labs.
You can provide your feedback/concerns about the TAs to me.

Reduce the mathematical content of the course


Unfortunately, this is not possible L

Comments and Responses (2)


- Provide some tutorials/questions for practicing the math/probability/derivations
  - I will soon upload a set of questions for you to practice
- Provide more interesting labs
  - Yes! Lab 3 is about music genre classification
  - Labs 4 and 5 will also have interesting applications
- Slides are not descriptive; notation is inconsistent across reference material
  - Will upload a one-pager describing the notation used in the class/slides


Comments and Responses (3)


- Number of lectures taken
  - Missed one class before the mid-sem
  - Two classes missed in total (including yesterday's)
  - These will be compensated in the near future
- If you have doubts/questions on the lecture topics, please feel free to approach me


Lab 3
- Music genre classification
- High-dimensional dataset (>3000 features)
- Competition aspect of the lab
  - Predict the labels
  - Save the model that predicts the labels
  - We will run the model and check whether its output matches what you have provided
- The team with the highest accuracy will receive 10 points
  - The points for the rest will be scaled accordingly


Lab 6
- One pager
  - Problem statement
  - Source of data
  - Any reference material
- Email me by Friday, September 30th, 5:00 pm
- This cannot be something that is part of your B.Tech Project


Design and Analysis of Experiments
CSL465/603 - Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in

Outline
- Measures for evaluation
- Experimental design
- Estimating the generalization performance
- Hypothesis testing
- Interval estimation
- Confidence intervals


Confusion Matrix (1)

2-Class Scenario

                     predicted positive     predicted negative     total
  actual positive    true positive (tp)     false negative (fn)    p
  actual negative    false positive (fp)    true negative (tn)     n
  total              p'                     n'                     N

Confusion Matrix (2)

K-Class Scenario
- A K x K matrix whose entry (i, j) counts the instances of class i that are predicted as class j

Performance Measures
- Error: (fp + fn)/N
- Accuracy: (tp + tn)/N
- tp-rate: tp/p
- fp-rate: fp/n
- Precision: tp/p' = tp/(tp + fp)
- Recall: tp/p = tp/(tp + fn)
- Sensitivity: tp/p
- Specificity: tn/n
- F Measure: (2 x precision x recall)/(precision + recall)
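As a sanity check, these measures follow directly from the four cells of the confusion matrix. A minimal Python sketch (the function name and the example counts are mine, purely for illustration):

```python
def performance_measures(tp, fn, fp, tn):
    """Compute the standard 2-class measures from confusion-matrix counts."""
    n_total = tp + fn + fp + tn          # N
    p = tp + fn                          # actual positives
    n = fp + tn                          # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p                      # same as tp-rate and sensitivity
    return {
        "error": (fp + fn) / n_total,
        "accuracy": (tp + tn) / n_total,
        "tp-rate": tp / p,
        "fp-rate": fp / n,
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "f-measure": 2 * precision * recall / (precision + recall),
    }

m = performance_measures(tp=40, fn=10, fp=20, tn=30)
```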

Receiver Operating Characteristic
- Classification error is an insufficient measure when
  - Costs are associated with false positive and false negative errors
  - Class distributions are skewed
- ROC: assess predictive behavior independently of error costs or class distributions
  - Originates from signal detection theory
- Assume a classifier that uses a threshold $\theta$ to determine the class label
  - Classify $x$ as positive if $P(+|x) > \theta$
  - The number of true and false positives depends on $\theta$


Example (1)

[Figure: instances with classifier scores 0.9, 0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, classified at threshold 0.5]


Example (2)

[Figure: the same setup with classifier scores 0.9, 0.9, 0.7, 0.6, 0.2, 0.6, 0.3, 0.2, 0.1 and threshold 0.5]


Receiver Operating Characteristic Curve

[Figure: ROC curve, tp-rate plotted against fp-rate]


Domination in ROC Space
- Learner L1 dominates L2 if L1's ROC curve is always above L2's curve
- If L1 dominates L2, then L1 is better than L2 for all possible error costs and class distributions
- If neither dominates (e.g., L2 and L3), then different classifiers are better under different conditions


Quantitative Measure from ROC Curve
- Area Under the (ROC) Curve (AUC)


Generating ROC Curve (1)
- Assume the classifier outputs $P(+|x)$ instead of just $\hat{y}$ (the predicted class for instance $x$)
- Let $\theta$ be a threshold such that if $P(+|x) > \theta$, then $x$ is classified as positive, else not
- Compute the fp-rate and tp-rate for different values of $\theta$ from 0 to 1
- Plot each (fp-rate, tp-rate) point and interpolate (or take the convex hull)
- If multiple points have the same fp-rate, then average the tp-rates (e.g., across k-fold cross-validation runs)
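The threshold sweep above can be sketched in a few lines of Python, assuming the scores $P(+|x)$ are already available (the scores and labels below are made up for illustration):

```python
def roc_points(scores, labels):
    """Sweep a threshold over the scores P(+|x) and collect (fp-rate, tp-rate) pairs."""
    p = sum(labels)                # number of actual positives (labels are 0/1)
    n = len(labels) - p            # number of actual negatives
    points = []
    # Use each distinct score as a threshold, plus sentinels to reach (0,0) and (1,1).
    for theta in sorted(set(scores) | {0.0, 1.0}, reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
        points.append((fp / n, tp / p))
    return points

pts = roc_points([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0])
```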

Generating ROC Curve (2)
- What if the classifier does not provide $P(+|x)$, but just $\hat{y}$?
  - E.g., decision tree, rule
- Generally, even these discrete classifiers maintain statistics for classification
  - E.g., decision tree leaf nodes store the proportion of examples of each class
  - E.g., rules have the number of examples covered by the rule
- These statistics can be compared against a varying threshold ($\theta$)

Other Performance Measures
- Training time and space complexity
- Testing time and space complexity
- Interpretability of the model


Evaluating the Hypothesis (1)
- Can we make any conclusion about the generalization performance of the classifier based on the training set?
- How about the validation set?
  - Could be biased if the validation set is used for
    - Choosing the classifier (over another)
    - Parameter tuning
- Need another test set that is truly unseen during training/tuning
- Options are limited with a small amount of training data

Evaluating the Hypothesis (2)
- Two main difficulties
  - Bias in the estimate: the performance of the learned hypothesis on the training set is optimistically biased
  - Variance in the estimate:
    - Performance estimated on an unseen test set is unbiased
    - However, the estimate can still vary from the true performance depending on the make-up of the test set
- Interested in the minimum-variance unbiased estimate of the generalization performance


Experimental Design (1)
- Train/Test Split
  - Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
  - Perform $K$ random trials, where for each trial
    - Randomly split $D$ into a training set (2/3rd) and a testing set (1/3rd)
    - Learn a classifier on the training set
    - Compute the performance (error) on the test set
  - Compute the average performance (error) over the $K$ trials
- Problem?

Experimental Design (2)
- Train/Test Split
  - Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
  - Perform $K$ random trials, where for each trial
    - Randomly split $D$ into a training set (2/3rd) and a testing set (1/3rd)
    - Learn a classifier on the training set
    - Compute the performance (error) on the test set
  - Compute the average performance (error) over the $K$ trials
- Problem?
  - Train and test sets overlap between trials, biasing the result

Experimental Design (3)
- $K$-Fold Cross-Validation
  - Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
  - Partition $D$ into $K$ disjoint subsets $D_1, D_2, \ldots, D_K$
  - For $i = 1{:}K$ trials
    - Learn the classifier on the training set $D \setminus D_i$
    - Compute the performance on the test set $D_i$
  - Compute the average performance over the $K$ trials
- A better estimate of generalization performance
  - Test sets do not overlap
- Stratification
  - The distribution of classes in the training and testing sets should be the same as in the original dataset
- When the size of $D$ is very small
  - Leave-one-out cross-validation: $K = N$


Experimental Design (4)
- 5x2 Cross-Validation (Dietterich 1998)
  - For each of 5 trials (shuffling $D$ each time)
    - Divide $D$ into two halves $D_1$ and $D_2$
    - Compute the error using $D_1$ as training and $D_2$ as testing
    - Compute the error using $D_2$ as training and $D_1$ as testing
  - Compute the average error over all 10 results
  - 5 trials is the best number to minimize overlap among training and testing sets
  - Train and test sets are of similar sizes


Experimental Design (5)
- Bootstrapping
  - If there is not enough data for $K$-fold cross-validation
  - Generate multiple sets of size $N$ from $D$ by sampling with replacement
  - Each set contains approximately 63% of the distinct examples in $D$
  - Compute the average error over all bootstrap samples
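Sampling with replacement can be sketched as below; the coverage computation illustrates the ~63% figure (the expected fraction of distinct examples is $1 - 1/e \approx 0.632$):

```python
import random

def bootstrap_samples(data, n_sets, seed=0):
    """Draw n_sets bootstrap samples of size N from data, sampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_sets)]

data = list(range(1000))
samples = bootstrap_samples(data, n_sets=50)

# Average fraction of distinct original examples appearing in each sample;
# should be close to 1 - 1/e ~ 0.632.
coverage = sum(len(set(s)) / len(data) for s in samples) / len(samples)
```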


Interval Estimation (1)
- Estimate the mean $\mu$ of a normal distribution $\mathcal{N}(\mu, \sigma^2)$
- Given a sample $X = \{x_i\}_{i=1}^{N}$
- Estimate
  $$m = \frac{1}{N}\sum_{i=1}^{N} x_i, \quad \text{where } m \sim \mathcal{N}(\mu, \sigma^2/N)$$
- Define a statistic with a unit normal distribution $\mathcal{N}(0, 1)$:
  $$\frac{\sqrt{N}(m - \mu)}{\sigma} \sim Z$$

Unit Normal Distribution
- 95% of $Z$ lies in $(-1.96, 1.96)$
- 99% of $Z$ lies in $(-2.58, 2.58)$
- Therefore,
  $$P\left(-1.96 < \frac{\sqrt{N}(m - \mu)}{\sigma} < 1.96\right) = 0.95$$
- Two-sided confidence interval

Interval Estimation (2)
$$P\left(-1.96 < \frac{\sqrt{N}(m - \mu)}{\sigma} < 1.96\right) = 0.95$$
$$P\left(m - 1.96\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\frac{\sigma}{\sqrt{N}}\right) = 0.95$$
In general,
$$P\left(m - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\frac{\sigma}{\sqrt{N}}\right) = 1 - \alpha$$

  1 - alpha    z_{alpha/2}
  0.99         2.58
  0.98         2.33
  0.95         1.96
  0.90         1.64
  0.80         1.28

Two-Sided Vs One-Sided Confidence Interval
- One-sided confidence interval:
  $$P\left(m - z_{\alpha}\frac{\sigma}{\sqrt{N}} < \mu\right) = 1 - \alpha$$
  e.g., $P\left(m - 1.64\frac{\sigma}{\sqrt{N}} < \mu\right) = 0.95$

  1 - alpha    z_alpha
  0.99         2.33
  0.95         1.64
  0.90         1.28

Interval Estimation (3)
- The previous analysis required us to know $\sigma^2$
  - But typically this is unknown
- Instead, we can use the sample variance $S^2$:
  $$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - m)^2$$
- When $x_i \sim \mathcal{N}(\mu, \sigma^2)$, then $(N-1)S^2/\sigma^2$ is chi-squared with $N-1$ degrees of freedom
- $\sqrt{N}(m - \mu)/S$ is t-distributed with $N-1$ degrees of freedom

Student's t-Distribution
- Similar to the normal distribution, but with a larger spread (heavier tails)
- It includes the additional uncertainty from using the sample variance
- As $N \to \infty$, it becomes a normal distribution


Interval Estimation (4)
- When the population variance $\sigma^2$ is unknown, we can use the Student's t distribution to obtain the interval
  $$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - m)^2, \quad \frac{\sqrt{N}(m - \mu)}{S} \sim t_{N-1}$$
- So a two-sided confidence interval estimate would be of the form
  $$P\left(m - t_{\alpha/2, N-1}\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2, N-1}\frac{S}{\sqrt{N}}\right) = 1 - \alpha$$


Interval Estimation (5)
- Data ($N = 10$): 3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0
- $m = 3$, $S^2 = 0.022$, $S = 0.149$
- $\alpha = 0.05$, $N - 1 = 9$, $t_{0.025, 9} = 2.262$
- $P(3 - 0.107 < \mu < 3 + 0.107) = 0.95$
- $P(2.893 < \mu < 3.107) = 0.95$
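As a cross-check, this interval can be recomputed in Python; the critical value $t_{0.025, 9} = 2.262$ is hard-coded from a standard t table, since the standard library has no inverse-t function:

```python
import math
import statistics

data = [3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0]
n = len(data)
m = statistics.mean(data)                  # sample mean
s2 = statistics.variance(data)             # sample variance (divides by N-1)
t_crit = 2.262                             # t_{0.025, 9} from a t table
half_width = t_crit * math.sqrt(s2 / n)
ci = (m - half_width, m + half_width)      # 95% two-sided interval for the mean
```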


Hypothesis Testing (1)
- Want to claim a hypothesis $H_1$
  - E.g., $H_1$: error $< 0.10$
- Define the opposite of $H_1$ to be the null hypothesis $H_0$
  - E.g., $H_0$: error $\geq 0.10$
- Perform an experiment collecting data about the error
- With what probability can we reject $H_0$?


Hypothesis Testing (2)
- Sample $X = \{x_i\}_{i=1}^{N} \sim \mathcal{N}(\mu, \sigma^2)$
- Estimate the sample mean $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Want to test whether $\mu$ is equal to some constant $\mu_0$
  - Null hypothesis $H_0$: $\mu = \mu_0$
  - Alternative hypothesis $H_1$: $\mu \neq \mu_0$
  - Reject $H_0$ if $m$ is too far from $\mu_0$
- We fail to reject $H_0$ with level of significance $\alpha$ if $\mu_0$ lies in the $1 - \alpha$ confidence interval, i.e., if
  $$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-z_{\alpha/2},\ z_{\alpha/2})$$
- We reject $H_0$ if the statistic falls outside this interval on either side (two-sided test)


Hypothesis Testing (3)
- Sample $X = \{x_i\}_{i=1}^{N} \sim \mathcal{N}(\mu, \sigma^2)$
- Estimate the sample mean $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Null hypothesis $H_0$: $\mu \leq \mu_0$
- Alternative hypothesis $H_1$: $\mu > \mu_0$
- We fail to reject $H_0$ with level of significance $\alpha$ if
  $$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-\infty,\ z_{\alpha})$$
- We reject $H_0$ if the statistic falls outside this interval (one-sided test)


Hypothesis Testing (4)
- Sample $X = \{x_i\}_{i=1}^{N} \sim \mathcal{N}(\mu, \sigma^2)$
- Estimate the sample mean $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- The variance $\sigma^2$ is unknown, so use the sample variance $S^2$
- Null hypothesis $H_0$: $\mu \leq \mu_0$
- Alternative hypothesis $H_1$: $\mu > \mu_0$
- We fail to reject $H_0$ with level of significance $\alpha$ if
  $$\frac{\sqrt{N}(m - \mu_0)}{S} \in (-\infty,\ t_{\alpha, N-1})$$
- We reject $H_0$ if the statistic falls outside this interval (one-sided test)


Hypothesis Testing (5)
- Data ($N = 10$): 3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0
- $m = 3$, $S^2 = 0.022$, $S = 0.149$
- $\mu_0 = 2.9$
- $H_1$: $\mu > 2.9$, $H_0$: $\mu \leq 2.9$
- $\alpha = 0.05$, $N - 1 = 9$, $t_{0.05, 9} = 1.833$
- $\frac{\sqrt{N}(m - \mu_0)}{S} = 2.121 \notin (-\infty, 1.833)$
- Therefore, reject the null hypothesis
  - Accept the alternate hypothesis
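The same data run through the one-sided test in Python, with $t_{0.05, 9} = 1.833$ hard-coded from a t table:

```python
import math
import statistics

data = [3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0]
n = len(data)
m = statistics.mean(data)
s = math.sqrt(statistics.variance(data))   # sample standard deviation
mu0 = 2.9
t_stat = math.sqrt(n) * (m - mu0) / s      # one-sided test statistic
t_crit = 1.833                             # t_{0.05, 9} from a t table
reject_null = t_stat > t_crit
```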


Estimating Classifier Error
- Learn the classifier on a training set
- Test the classifier on a test set of size $N$
- Assume the classifier makes an error with probability $p$
- $X$ = number of errors made by the classifier on the test set, described by a binomial distribution:
  $$P(X = e) = \binom{N}{e} p^{e} (1 - p)^{N - e}$$


Binomial Distribution
- Coin toss experiment
- Probability of a head: $p$
- The probability of observing $h$ heads in $N$ coin tosses is
  $$P(X = h) = \binom{N}{h} p^{h} (1 - p)^{N - h}$$
- Mean: $Np$
- Standard deviation: $\sqrt{Np(1 - p)}$


Coin Toss Vs Classification
- Coin toss
  - Toss results in a head
  - Probability of a head: $p$
  - $h$ heads observed in $N$ coin tosses
  - Probability of $h$ heads in $N$ coin tosses: $P(X = h)$
  - Estimating $p$
- Classification of an instance
  - Classifier misclassifies an instance
  - Probability of misclassification: $p$
  - $e$ misclassified instances in a sample of size $N$
  - Probability of $e$ misclassified instances in the sample: $P(X = e)$
  - Estimating $p$


Binomial Test
- Test whether the error probability $p$ is less than or equal to some value $p_0$
- Null hypothesis $H_0$: $p \leq p_0$
- Alternative hypothesis $H_1$: $p > p_0$
- Reject $H_0$ with significance $\alpha$ if
  $$P(X \geq e) = \sum_{x=e}^{N} \binom{N}{x} p_0^{x} (1 - p_0)^{N - x} < \alpha$$
  where $e$ is the observed number of errors and $X$ is binomial with parameters $N$ and $p_0$
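The exact tail probability can be computed directly with `math.comb`; the counts below reuse the $N = 40$, $e = 12$ setting of the example that follows, purely as an illustration:

```python
from math import comb

def binomial_tail(n, e, p0):
    """P(X >= e) for X ~ Binomial(n, p0): probability of e or more errors."""
    return sum(comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(e, n + 1))

# 12 errors on 40 test instances, testing H0: p <= 0.2 at alpha = 0.05.
p_value = binomial_tail(40, 12, 0.2)
reject = p_value < 0.05
```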


Approximate Normal Test
- Approximating with a normal distribution
  - $X$ is a sum of independent random variables from the same distribution
  - $X/N$ is approximately normal for large $N$, with mean $p_0$ and variance $p_0(1 - p_0)/N$ (central limit theorem), so
    $$\frac{\sqrt{N}(X/N - p_0)}{\sqrt{p_0(1 - p_0)}} \sim Z$$
- Fail to reject $H_0$: $p \leq p_0$ with significance $\alpha$ if
  $$\frac{\sqrt{N}(X/N - p_0)}{\sqrt{p_0(1 - p_0)}} \in (-\infty,\ z_{\alpha})$$
  Reject $H_0$ if the statistic falls outside this interval
- Works well when $N$ is not too small and $p$ is not very close to 0 or 1

Example (1)
- Let $N = 40$, $e = 12$, so $\hat{p} = e/N = 0.3$
- Set $p_0 = 0.2$, $\alpha = 0.05$
- Alternative hypothesis $H_1$: $p > p_0$; null hypothesis $H_0$: $p \leq p_0$
- Compute
  $$\frac{\sqrt{N}(\hat{p} - p_0)}{\sqrt{p_0(1 - p_0)}} = 1.58, \quad z_{0.05} = 1.64$$
- $1.58 \in (-\infty, 1.64)$; therefore, fail to reject $H_0$
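A quick check of the arithmetic above:

```python
import math

n, e = 40, 12
p_hat = e / n                      # observed error rate 0.3
p0, alpha = 0.2, 0.05
z_stat = math.sqrt(n) * (p_hat - p0) / math.sqrt(p0 * (1 - p0))
z_crit = 1.64                      # z_{0.05} from a normal table
fail_to_reject = z_stat <= z_crit  # one-sided test
```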

Example (2)
- What is the 95% confidence interval around the error $p$, given $\hat{p} = 0.3$?
- 95% confidence interval: $\alpha = 0.05$
  $$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{N}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}\right) = 1 - \alpha$$
- $P(0.158 < p < 0.442) = 0.95$
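The same interval computed in Python, with $z_{0.025} = 1.96$ hard-coded from a normal table:

```python
import math

n, p_hat = 40, 0.3                 # observed error rate from the previous example
z = 1.96                           # z_{0.025} for a 95% two-sided interval
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
```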


t-Test
- So far we have looked at a single validation set
- Suppose we do a k-fold cross-validation
  - Error percentages $p_i$, $i = 1, \ldots, K$
  $$m = \frac{1}{K}\sum_{i=1}^{K} p_i, \quad S^2 = \frac{1}{K-1}\sum_{i=1}^{K}(p_i - m)^2$$
- Hence
  $$\frac{\sqrt{K}(m - p_0)}{S} \sim t_{K-1}$$
- Reject the null hypothesis with significance $\alpha$ if this value is greater than $t_{\alpha, K-1}$
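A sketch of this test in Python; the fold error rates and the hard-coded $t_{0.05, 9}$ are hypothetical, just to show the computation:

```python
import math
import statistics

def k_fold_t_statistic(fold_errors, p0):
    """t statistic for testing H0: error <= p0 from K cross-validation fold errors."""
    k = len(fold_errors)
    m = statistics.mean(fold_errors)
    s = statistics.stdev(fold_errors)        # divides by K-1
    return math.sqrt(k) * (m - p0) / s

# Hypothetical fold error rates from a 10-fold run.
errors = [0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.14, 0.13]
t = k_fold_t_statistic(errors, p0=0.10)
t_crit = 1.833                               # t_{0.05, 9} from a t table
reject = t > t_crit
```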


Comparing Two Learners
- K-fold cross-validated paired t test
  - Paired test: both learners get the same train/test sets
  - Use k-fold cross-validation to get the training/test sets
  - Errors of learners 1 and 2 on fold $i$: $p_i^1$, $p_i^2$
  - Paired difference on fold $i$: $p_i = p_i^1 - p_i^2$
  - Null hypothesis: $p_i$ has mean 0
  $$m = \frac{1}{K}\sum_{i=1}^{K} p_i, \quad S^2 = \frac{1}{K-1}\sum_{i=1}^{K}(p_i - m)^2$$
- Hence
  $$\frac{\sqrt{K}(m - 0)}{S} \sim t_{K-1}$$
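A sketch of the paired statistic, with hypothetical fold errors for the two learners (the numbers are made up for illustration):

```python
import math
import statistics

def paired_t_statistic(errors1, errors2):
    """Paired t statistic from fold-wise error differences of two learners."""
    diffs = [a - b for a, b in zip(errors1, errors2)]
    k = len(diffs)
    m = statistics.mean(diffs)
    s = statistics.stdev(diffs)              # divides by K-1
    return math.sqrt(k) * m / s              # compare against t_{alpha, K-1}

# Hypothetical fold errors for two learners on the same 5 folds.
l1 = [0.20, 0.22, 0.19, 0.21, 0.23]
l2 = [0.18, 0.21, 0.17, 0.18, 0.20]
t = paired_t_statistic(l1, l2)
```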

Summary
- Measures for evaluation
- Experimental design
- Estimating the generalization performance
- Hypothesis testing
- Interval estimation
- Confidence intervals
