
Announcements

05/10/16

CSL465/603 - Machine Learning

Comments and Responses (1)


A large number of students have requested individual
labs
Lab 4 and 5 will be individual labs (no team work)

Quizzes and Labs have not been graded consistently by


the different TAs

Labs will be graded only by Sanatan Sukhija


Quizzes 5 and 7 will be graded only by Ritu Kapur
Quizzes 6 and 8 will be graded only by Rahul Pal Singh
TAs are expected to provide appropriate feedback for the
quizzes and labs.
You can provide your feedback/concerns about the TAs to me.

Reduce the mathematical content of the course


Unfortunately, this is not possible L

Comments and Responses (2)


- Provide some tutorials/questions for practicing the math/probability/derivations
  - I will soon upload a set of questions for you to practice
- Provide more interesting labs
  - Yes! Lab 3 is about music genre classification
  - Labs 4 and 5 will also have interesting applications
- Slides are not descriptive; notation is inconsistent across reference material
  - Will upload a one-pager describing the notation used in the class/slides


Comments and Responses (3)


- Number of lectures taken
  - Missed one class before the mid-sem
  - Two classes missed in total (including yesterday's)
  - These will be compensated in the near future
- If you have doubts/questions on the lecture topics, please feel free to approach me


Lab 3
- Music genre classification
- High-dimensional dataset (>3000 features)
- Competition aspect of the lab
  - Predict the labels
  - Save the model that predicts the labels
  - We will run the model and check whether its output matches what you have provided
- The team with the highest accuracy will receive 10 points
  - The points for the rest will be scaled accordingly


Lab 6
- One pager
  - Problem statement
  - Source of data
  - Any reference material
- Email me by Friday, September 30th, 5:00 pm
- This cannot be something that is part of your B.Tech Project


Design and Analysis of Experiments
CSL465/603 - Fall 2016
Narayanan C Krishnan
ckn@iitrpr.ac.in

Outline
- Measures for evaluation
- Experimental design
- Estimating the generalization performance
- Hypothesis testing
- Interval estimation
- Confidence intervals


Confusion Matrix (1)

2-Class Scenario

                     predicted positive     predicted negative     total
  actual positive    true positive (tp)     false negative (fn)    p
  actual negative    false positive (fp)    true negative (tn)     n
  total              p'                     n'                     N

Confusion Matrix (2)

K-Class Scenario
- A K x K matrix whose entry (i, j) counts the instances of class i that are predicted as class j

Performance Measures
- Error: (fp + fn)/N
- Accuracy: (tp + tn)/N
- tp-rate: tp/p
- fp-rate: fp/n
- Precision: tp/p' = tp/(tp + fp)
- Recall: tp/p = tp/(tp + fn)
- Sensitivity: tp/p
- Specificity: tn/n
- F Measure: (2 x precision x recall)/(precision + recall)
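As a sanity check, these measures follow directly from the four cells of the confusion matrix. A minimal Python sketch (the function name and the example counts are mine, purely for illustration):

```python
def performance_measures(tp, fn, fp, tn):
    """Compute the standard 2-class measures from confusion-matrix counts."""
    n_total = tp + fn + fp + tn          # N
    p = tp + fn                          # actual positives
    n = fp + tn                          # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p                      # same as tp-rate and sensitivity
    return {
        "error": (fp + fn) / n_total,
        "accuracy": (tp + tn) / n_total,
        "tp-rate": tp / p,
        "fp-rate": fp / n,
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "f-measure": 2 * precision * recall / (precision + recall),
    }

m = performance_measures(tp=40, fn=10, fp=20, tn=30)
```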

Receiver Operating Characteristic
- Classification error is an insufficient measure when
  - Costs are associated with false positive and false negative errors
  - Class distributions are skewed
- ROC: assess predictive behavior independently of error costs or class distributions
  - Originates from signal detection theory
- Assume a classifier that uses a threshold $\theta$ to determine the class label
  - Classify $x$ as positive if $P(+|x) > \theta$
  - The number of true and false positives depends on $\theta$


Example (1)

[Figure: instances with classifier scores 0.9, 0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, classified at threshold 0.5]


Example (2)

[Figure: the same setup with classifier scores 0.9, 0.9, 0.7, 0.6, 0.2, 0.6, 0.3, 0.2, 0.1 and threshold 0.5]


Receiver Operating Characteristic Curve

[Figure: ROC curve, tp-rate plotted against fp-rate]


Domination in ROC Space
- Learner L1 dominates L2 if L1's ROC curve is always above L2's curve
- If L1 dominates L2, then L1 is better than L2 for all possible error costs and class distributions
- If neither dominates (e.g., L2 and L3), then different classifiers are better under different conditions


Quantitative Measure from ROC Curve
- Area Under the (ROC) Curve (AUC)


Generating ROC Curve (1)
- Assume the classifier outputs $P(+|x)$ instead of just $\hat{y}$ (the predicted class for instance $x$)
- Let $\theta$ be a threshold such that if $P(+|x) > \theta$, then $x$ is classified as positive, else not
- Compute the fp-rate and tp-rate for different values of $\theta$ from 0 to 1
- Plot each (fp-rate, tp-rate) point and interpolate (or take the convex hull)
- If multiple points have the same fp-rate, then average the tp-rates (e.g., across k-fold cross-validation runs)
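The threshold sweep above can be sketched in a few lines of Python, assuming the scores $P(+|x)$ are already available (the scores and labels below are made up for illustration):

```python
def roc_points(scores, labels):
    """Sweep a threshold over the scores P(+|x) and collect (fp-rate, tp-rate) pairs."""
    p = sum(labels)                # number of actual positives (labels are 0/1)
    n = len(labels) - p            # number of actual negatives
    points = []
    # Use each distinct score as a threshold, plus sentinels to reach (0,0) and (1,1).
    for theta in sorted(set(scores) | {0.0, 1.0}, reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > theta and y == 0)
        points.append((fp / n, tp / p))
    return points

pts = roc_points([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0])
```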

Generating ROC Curve (2)
- What if the classifier does not provide $P(+|x)$, but just $\hat{y}$?
  - E.g., decision tree, rule
- Generally, even these discrete classifiers maintain statistics for classification
  - E.g., decision tree leaf nodes store the proportion of examples of each class
  - E.g., rules have the number of examples covered by the rule
- These statistics can be compared against a varying threshold ($\theta$)

Other Performance Measures
- Training time and space complexity
- Testing time and space complexity
- Interpretability of the model


Evaluating the Hypothesis (1)
- Can we make any conclusion about the generalization performance of the classifier based on the training set?
- How about the validation set?
  - Could be biased if the validation set is used for
    - Choosing the classifier (over another)
    - Parameter tuning
- Need another test set that is truly unseen during training/tuning
- Options are limited with a small amount of training data

Evaluating the Hypothesis (2)
- Two main difficulties
  - Bias in the estimate: the performance of the learned hypothesis on the training set is optimistically biased
  - Variance in the estimate:
    - Performance estimated on an unseen test set is unbiased
    - However, the estimate can still vary from the true performance depending on the make-up of the test set
- Interested in the minimum-variance unbiased estimate of the generalization performance


Experimental Design (1)
- Train/Test Split
  - Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
  - Perform $K$ random trials, where for each trial
    - Randomly split $D$ into a training set (2/3rd) and a testing set (1/3rd)
    - Learn a classifier on the training set
    - Compute the performance (error) on the test set
  - Compute the average performance (error) over the $K$ trials
- Problem?

Experimental Design (2)
- Train/Test Split
  - Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
  - Perform $K$ random trials, where for each trial
    - Randomly split $D$ into a training set (2/3rd) and a testing set (1/3rd)
    - Learn a classifier on the training set
    - Compute the performance (error) on the test set
  - Compute the average performance (error) over the $K$ trials
- Problem?
  - Train and test sets overlap between trials, biasing the result

Experimental Design (3)
- $K$-Fold Cross-Validation
  - Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$
  - Partition $D$ into $K$ disjoint subsets $D_1, D_2, \ldots, D_K$
  - For $i = 1{:}K$ trials
    - Learn the classifier on the training set $D \setminus D_i$
    - Compute the performance on the test set $D_i$
  - Compute the average performance over the $K$ trials
- A better estimate of generalization performance
  - Test sets do not overlap
- Stratification
  - The distribution of classes in the training and testing sets should be the same as in the original dataset
- When the size of $D$ is very small
  - Leave-one-out cross-validation: $K = N$


Experimental Design (4)
- 5x2 Cross-Validation (Dietterich 1998)
  - For each of 5 trials (shuffling $D$ each time)
    - Divide $D$ into two halves $D_1$ and $D_2$
    - Compute the error using $D_1$ as training and $D_2$ as testing
    - Compute the error using $D_2$ as training and $D_1$ as testing
  - Compute the average error over all 10 results
  - 5 trials is the best number to minimize overlap among training and testing sets
  - Train and test sets are of similar sizes


Experimental Design (5)
- Bootstrapping
  - If there is not enough data for $K$-fold cross-validation
  - Generate multiple sets of size $N$ from $D$ by sampling with replacement
  - Each set contains approximately 63% of the distinct examples in $D$
  - Compute the average error over all bootstrap samples
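Sampling with replacement can be sketched as below; the coverage computation illustrates the ~63% figure (the expected fraction of distinct examples is $1 - 1/e \approx 0.632$):

```python
import random

def bootstrap_samples(data, n_sets, seed=0):
    """Draw n_sets bootstrap samples of size N from data, sampling with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(n_sets)]

data = list(range(1000))
samples = bootstrap_samples(data, n_sets=50)

# Average fraction of distinct original examples appearing in each sample;
# should be close to 1 - 1/e ~ 0.632.
coverage = sum(len(set(s)) / len(data) for s in samples) / len(samples)
```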


Interval Estimation (1)
- Estimate the mean $\mu$ of a normal distribution $\mathcal{N}(\mu, \sigma^2)$
- Given a sample $X = \{x_i\}_{i=1}^{N}$
- Estimate
  $$m = \frac{1}{N}\sum_{i=1}^{N} x_i, \quad \text{where } m \sim \mathcal{N}(\mu, \sigma^2/N)$$
- Define a statistic with a unit normal distribution $\mathcal{N}(0, 1)$:
  $$\frac{\sqrt{N}(m - \mu)}{\sigma} \sim Z$$

Unit Normal Distribution
- 95% of $Z$ lies in $(-1.96, 1.96)$
- 99% of $Z$ lies in $(-2.58, 2.58)$
- Therefore,
  $$P\left(-1.96 < \frac{\sqrt{N}(m - \mu)}{\sigma} < 1.96\right) = 0.95$$
- Two-sided confidence interval

Interval Estimation (2)
$$P\left(-1.96 < \frac{\sqrt{N}(m - \mu)}{\sigma} < 1.96\right) = 0.95$$
$$P\left(m - 1.96\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\frac{\sigma}{\sqrt{N}}\right) = 0.95$$
In general,
$$P\left(m - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\frac{\sigma}{\sqrt{N}}\right) = 1 - \alpha$$

  1 - alpha    z_{alpha/2}
  0.99         2.58
  0.98         2.33
  0.95         1.96
  0.90         1.64
  0.80         1.28

Two-Sided Vs One-Sided Confidence Interval
- One-sided confidence interval:
  $$P\left(m - z_{\alpha}\frac{\sigma}{\sqrt{N}} < \mu\right) = 1 - \alpha$$
  e.g., $P\left(m - 1.64\frac{\sigma}{\sqrt{N}} < \mu\right) = 0.95$

  1 - alpha    z_alpha
  0.99         2.33
  0.95         1.64
  0.90         1.28

Interval Estimation (3)
- The previous analysis required us to know $\sigma^2$
  - But typically this is unknown
- Instead, we can use the sample variance $S^2$:
  $$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - m)^2$$
- When $x_i \sim \mathcal{N}(\mu, \sigma^2)$, then $(N-1)S^2/\sigma^2$ is chi-squared with $N-1$ degrees of freedom
- $\sqrt{N}(m - \mu)/S$ is t-distributed with $N-1$ degrees of freedom

Student's t-Distribution
- Similar to the normal distribution, but with a larger spread (heavier tails)
- It includes the additional uncertainty from using the sample variance
- As $N \to \infty$, it becomes a normal distribution


Interval Estimation (4)
- When the population variance $\sigma^2$ is unknown, we can use the Student's t distribution to obtain the interval
  $$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - m)^2, \quad \frac{\sqrt{N}(m - \mu)}{S} \sim t_{N-1}$$
- So a two-sided confidence interval estimate would be of the form
  $$P\left(m - t_{\alpha/2, N-1}\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2, N-1}\frac{S}{\sqrt{N}}\right) = 1 - \alpha$$


Interval Estimation (5)
- Data ($N = 10$): 3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0
- $m = 3$, $S^2 = 0.022$, $S = 0.149$
- $\alpha = 0.05$, $N - 1 = 9$, $t_{0.025, 9} = 2.262$
- $P(3 - 0.107 < \mu < 3 + 0.107) = 0.95$
- $P(2.893 < \mu < 3.107) = 0.95$
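As a cross-check, this interval can be recomputed in Python; the critical value $t_{0.025, 9} = 2.262$ is hard-coded from a standard t table, since the standard library has no inverse-t function:

```python
import math
import statistics

data = [3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0]
n = len(data)
m = statistics.mean(data)                  # sample mean
s2 = statistics.variance(data)             # sample variance (divides by N-1)
t_crit = 2.262                             # t_{0.025, 9} from a t table
half_width = t_crit * math.sqrt(s2 / n)
ci = (m - half_width, m + half_width)      # 95% two-sided interval for the mean
```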


Hypothesis Testing (1)
- Want to claim a hypothesis $H_1$
  - E.g., $H_1$: error $< 0.10$
- Define the opposite of $H_1$ to be the null hypothesis $H_0$
  - E.g., $H_0$: error $\geq 0.10$
- Perform an experiment collecting data about the error
- With what probability can we reject $H_0$?


Hypothesis Testing (2)
- Sample $X = \{x_i\}_{i=1}^{N} \sim \mathcal{N}(\mu, \sigma^2)$
- Estimate the sample mean $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Want to test whether $\mu$ is equal to some constant $\mu_0$
  - Null hypothesis $H_0$: $\mu = \mu_0$
  - Alternative hypothesis $H_1$: $\mu \neq \mu_0$
  - Reject $H_0$ if $m$ is too far from $\mu_0$
- We fail to reject $H_0$ with level of significance $\alpha$ if $\mu_0$ lies in the $1 - \alpha$ confidence interval, i.e., if
  $$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-z_{\alpha/2},\ z_{\alpha/2})$$
- We reject $H_0$ if the statistic falls outside this interval on either side (two-sided test)


Hypothesis Testing (3)
- Sample $X = \{x_i\}_{i=1}^{N} \sim \mathcal{N}(\mu, \sigma^2)$
- Estimate the sample mean $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- Null hypothesis $H_0$: $\mu \leq \mu_0$
- Alternative hypothesis $H_1$: $\mu > \mu_0$
- We fail to reject $H_0$ with level of significance $\alpha$ if
  $$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-\infty,\ z_{\alpha})$$
- We reject $H_0$ if the statistic falls outside this interval (one-sided test)


Hypothesis Testing (4)
- Sample $X = \{x_i\}_{i=1}^{N} \sim \mathcal{N}(\mu, \sigma^2)$
- Estimate the sample mean $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- The variance $\sigma^2$ is unknown, so use the sample variance $S^2$
- Null hypothesis $H_0$: $\mu \leq \mu_0$
- Alternative hypothesis $H_1$: $\mu > \mu_0$
- We fail to reject $H_0$ with level of significance $\alpha$ if
  $$\frac{\sqrt{N}(m - \mu_0)}{S} \in (-\infty,\ t_{\alpha, N-1})$$
- We reject $H_0$ if the statistic falls outside this interval (one-sided test)


Hypothesis Testing (5)
- Data ($N = 10$): 3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0
- $m = 3$, $S^2 = 0.022$, $S = 0.149$
- $\mu_0 = 2.9$
- $H_1$: $\mu > 2.9$, $H_0$: $\mu \leq 2.9$
- $\alpha = 0.05$, $N - 1 = 9$, $t_{0.05, 9} = 1.833$
- $\frac{\sqrt{N}(m - \mu_0)}{S} = 2.121 \notin (-\infty, 1.833)$
- Therefore, reject the null hypothesis
  - Accept the alternate hypothesis
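The same data run through the one-sided test in Python, with $t_{0.05, 9} = 1.833$ hard-coded from a t table:

```python
import math
import statistics

data = [3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0]
n = len(data)
m = statistics.mean(data)
s = math.sqrt(statistics.variance(data))   # sample standard deviation
mu0 = 2.9
t_stat = math.sqrt(n) * (m - mu0) / s      # one-sided test statistic
t_crit = 1.833                             # t_{0.05, 9} from a t table
reject_null = t_stat > t_crit
```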


Estimating Classifier Error
- Learn the classifier on a training set
- Test the classifier on a test set of size $N$
- Assume the classifier makes an error with probability $p$
- $X$ = number of errors made by the classifier on the test set, described by a binomial distribution:
  $$P(X = e) = \binom{N}{e} p^{e} (1 - p)^{N - e}$$


Binomial Distribution
- Coin toss experiment
- Probability of a head: $p$
- The probability of observing $h$ heads in $N$ coin tosses is
  $$P(X = h) = \binom{N}{h} p^{h} (1 - p)^{N - h}$$
- Mean: $Np$
- Standard deviation: $\sqrt{Np(1 - p)}$


Coin Toss Vs Classification
- Coin toss
  - Toss results in a head
  - Probability of a head: $p$
  - $h$ heads observed in $N$ coin tosses
  - Probability of $h$ heads in $N$ coin tosses: $P(X = h)$
  - Estimating $p$
- Classification of an instance
  - Classifier misclassifies an instance
  - Probability of misclassification: $p$
  - $e$ misclassified instances in a sample of size $N$
  - Probability of $e$ misclassified instances in the sample: $P(X = e)$
  - Estimating $p$


Binomial Test
- Test whether the error probability $p$ is less than or equal to some value $p_0$
- Null hypothesis $H_0$: $p \leq p_0$
- Alternative hypothesis $H_1$: $p > p_0$
- Reject $H_0$ with significance $\alpha$ if
  $$P(X \geq e) = \sum_{x=e}^{N} \binom{N}{x} p_0^{x} (1 - p_0)^{N - x} < \alpha$$
  where $e$ is the observed number of errors and $X$ is binomial with parameters $N$ and $p_0$
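The exact tail probability can be computed directly with `math.comb`; the counts below reuse the $N = 40$, $e = 12$ setting of the example that follows, purely as an illustration:

```python
from math import comb

def binomial_tail(n, e, p0):
    """P(X >= e) for X ~ Binomial(n, p0): probability of e or more errors."""
    return sum(comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(e, n + 1))

# 12 errors on 40 test instances, testing H0: p <= 0.2 at alpha = 0.05.
p_value = binomial_tail(40, 12, 0.2)
reject = p_value < 0.05
```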


Approximate Normal Test
- Approximating with a normal distribution
  - $X$ is a sum of independent random variables from the same distribution
  - $X/N$ is approximately normal for large $N$, with mean $p_0$ and variance $p_0(1 - p_0)/N$ (central limit theorem), so
    $$\frac{\sqrt{N}(X/N - p_0)}{\sqrt{p_0(1 - p_0)}} \sim Z$$
- Fail to reject $H_0$: $p \leq p_0$ with significance $\alpha$ if
  $$\frac{\sqrt{N}(X/N - p_0)}{\sqrt{p_0(1 - p_0)}} \in (-\infty,\ z_{\alpha})$$
  Reject $H_0$ if the statistic falls outside this interval
- Works well when $N$ is not too small and $p$ is not very close to 0 or 1

Example (1)
- Let $N = 40$, $e = 12$, so $\hat{p} = e/N = 0.3$
- Set $p_0 = 0.2$, $\alpha = 0.05$
- Alternative hypothesis $H_1$: $p > p_0$; null hypothesis $H_0$: $p \leq p_0$
- Compute
  $$\frac{\sqrt{N}(\hat{p} - p_0)}{\sqrt{p_0(1 - p_0)}} = 1.58, \quad z_{0.05} = 1.64$$
- $1.58 \in (-\infty, 1.64)$; therefore, fail to reject $H_0$
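A quick check of the arithmetic above:

```python
import math

n, e = 40, 12
p_hat = e / n                      # observed error rate 0.3
p0, alpha = 0.2, 0.05
z_stat = math.sqrt(n) * (p_hat - p0) / math.sqrt(p0 * (1 - p0))
z_crit = 1.64                      # z_{0.05} from a normal table
fail_to_reject = z_stat <= z_crit  # one-sided test
```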

Example (2)
- What is the 95% confidence interval around the error $p$, given $\hat{p} = 0.3$?
- 95% confidence interval: $\alpha = 0.05$
  $$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{N}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}\right) = 1 - \alpha$$
- $P(0.158 < p < 0.442) = 0.95$
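The same interval computed in Python, with $z_{0.025} = 1.96$ hard-coded from a normal table:

```python
import math

n, p_hat = 40, 0.3                 # observed error rate from the previous example
z = 1.96                           # z_{0.025} for a 95% two-sided interval
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
```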


t-Test
- So far we have looked at a single validation set
- Suppose we do a k-fold cross-validation
  - Error percentages $p_i$, $i = 1, \ldots, K$
  $$m = \frac{1}{K}\sum_{i=1}^{K} p_i, \quad S^2 = \frac{1}{K-1}\sum_{i=1}^{K}(p_i - m)^2$$
- Hence
  $$\frac{\sqrt{K}(m - p_0)}{S} \sim t_{K-1}$$
- Reject the null hypothesis with significance $\alpha$ if this value is greater than $t_{\alpha, K-1}$
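A sketch of this test in Python; the fold error rates and the hard-coded $t_{0.05, 9}$ are hypothetical, just to show the computation:

```python
import math
import statistics

def k_fold_t_statistic(fold_errors, p0):
    """t statistic for testing H0: error <= p0 from K cross-validation fold errors."""
    k = len(fold_errors)
    m = statistics.mean(fold_errors)
    s = statistics.stdev(fold_errors)        # divides by K-1
    return math.sqrt(k) * (m - p0) / s

# Hypothetical fold error rates from a 10-fold run.
errors = [0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.14, 0.13]
t = k_fold_t_statistic(errors, p0=0.10)
t_crit = 1.833                               # t_{0.05, 9} from a t table
reject = t > t_crit
```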


Comparing Two Learners
- K-fold cross-validated paired t test
  - Paired test: both learners get the same train/test sets
  - Use k-fold cross-validation to get the training/test sets
  - Errors of learners 1 and 2 on fold $i$: $p_i^1$, $p_i^2$
  - Paired difference on fold $i$: $p_i = p_i^1 - p_i^2$
  - Null hypothesis: $p_i$ has mean 0
  $$m = \frac{1}{K}\sum_{i=1}^{K} p_i, \quad S^2 = \frac{1}{K-1}\sum_{i=1}^{K}(p_i - m)^2$$
- Hence
  $$\frac{\sqrt{K}(m - 0)}{S} \sim t_{K-1}$$
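A sketch of the paired statistic, with hypothetical fold errors for the two learners (the numbers are made up for illustration):

```python
import math
import statistics

def paired_t_statistic(errors1, errors2):
    """Paired t statistic from fold-wise error differences of two learners."""
    diffs = [a - b for a, b in zip(errors1, errors2)]
    k = len(diffs)
    m = statistics.mean(diffs)
    s = statistics.stdev(diffs)              # divides by K-1
    return math.sqrt(k) * m / s              # compare against t_{alpha, K-1}

# Hypothetical fold errors for two learners on the same 5 folds.
l1 = [0.20, 0.22, 0.19, 0.21, 0.23]
l2 = [0.18, 0.21, 0.17, 0.18, 0.20]
t = paired_t_statistic(l1, l2)
```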

Summary
- Measures for evaluation
- Experimental design
- Estimating the generalization performance
- Hypothesis testing
- Interval estimation
- Confidence intervals
