Machine learning
INFO-F-422
Gianluca Bontempi
Département d'Informatique
Boulevard de Triomphe - CP 212
http://www.ulb.ac.be/di
Testing hypothesis
Hypothesis testing is the second major area of statistical inference.
Consider two protein coding genes and their expression levels in a cell.
Are the two genes differentially expressed?
A statistical test is a procedure that aims to answer such questions.
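Types of hypothesis
We start by declaring the working (basic, null) hypothesis $H$ to be tested, in the form $\theta = \theta_0$ or $\theta \in \omega$, where $\theta_0$ or $\omega$ are given.
The hypothesis can be
Simple: it fully specifies the distribution of the data.
Composite: it only partially specifies the distribution of the data.
Example: if $D_N$ is a random sample of size $N$ from $\mathcal N(\mu, \sigma^2)$, the hypothesis $H: \mu = \mu_0, \sigma = \sigma_0$ (with $\mu_0$ and $\sigma_0$ known) is simple, while $H: \mu = \mu_0$ is composite, since it leaves the value of $\sigma$ open.
Types of statistical test
Pure significance test: the data $D_N$ are used to assess the inferential evidence against $H$.
Significance test: the inferential evidence against $H$ is used to judge whether $H$ is inappropriate. In other words, it is a rule for rejecting $H$.
Hypothesis test: the data $D_N$ are used to assess $H$ against a specific alternative hypothesis $\bar H$. In other words, it is a rule for rejecting $H$ in favour of $\bar H$.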
Let $t(D_N)$ be a statistic such that the larger its value, the more it casts doubt on $H$.
The quantity $t(D_N)$ is called the test statistic or discrepancy measure.
Let $t_N = t(D_N)$ denote the value of $t$ calculated on the basis of the sample data $D_N$.
Let us consider the p-value quantity
$$p = \text{Prob}\{t(D_N) > t_N \mid H\}$$
If $p$ is small, the sample data $D_N$ are highly inconsistent with $H$, and $p$ (significance probability or significance level) is the measure of such inconsistency.
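The following is a minimal sketch of how a p-value can be obtained. It assumes a hypothetical setting: under $H$ the data are i.i.d. $\mathcal N(\mu_0, \sigma^2)$ with known $\sigma$, and the discrepancy measure is $t(D_N) = |\hat\mu - \mu_0|$. The sketch compares the closed-form tail probability with a Monte Carlo estimate that simulates many datasets under $H$. The data values and function names are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def discrepancy(sample, mu0):
    """Hypothetical test statistic t(D_N) = |sample mean - mu0|."""
    return abs(fmean(sample) - mu0)


def p_value_exact(t_obs, n, sigma):
    """Under H the sample mean is N(mu0, sigma^2/n), hence
    Prob{|mean - mu0| > t_obs | H} = 2 * (1 - Phi(t_obs * sqrt(n) / sigma))."""
    return 2.0 * (1.0 - NormalDist().cdf(t_obs * math.sqrt(n) / sigma))


def p_value_monte_carlo(t_obs, n, mu0, sigma, reps=20000, seed=0):
    """Fraction of datasets simulated under H whose statistic exceeds t_obs."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if discrepancy(sample, mu0) > t_obs:
            exceed += 1
    return exceed / reps


data = [0.8, -0.3, 1.1, 0.4, 0.9]  # hypothetical observations D_N
mu0, sigma = 0.0, 1.0               # hypothetical H: mu = 0, known sigma = 1
t_n = discrepancy(data, mu0)
p_exact = p_value_exact(t_n, len(data), sigma)
p_mc = p_value_monte_carlo(t_n, len(data), mu0, sigma)
print(f"t_N = {t_n:.3f}, exact p = {p_exact:.3f}, Monte Carlo p = {p_mc:.3f}")
```

The Monte Carlo estimate mirrors the definition of $p$ directly: it counts how often data generated under $H$ look at least as discrepant as the observed sample.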
Some considerations
$p$ is the proportion of situations under the hypothesis $H$ where we would observe a degree of inconsistency at least to the extent represented by $t_N$.
$t_N$ is the observed value of the statistic for a given $D_N$. Different $D_N$ yield different values of $p \in (0, 1)$.
It is essential that the distribution of $t(D_N)$ under $H$ is known.
We cannot say that $p$ is the probability that $H$ is true. Rather, $p$ is the probability, computed assuming that $H$ is true, of observing a value of the test statistic at least as large as $t_N$.
Open issues
1. What if $H$ is composite?
2. How to choose $t(D_N)$?
Tests of significance
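Suppose that the value $p$ is known. If $p$ is small, either a rare event has occurred or perhaps $H$ is not true.
Idea: if $p$ is less than some stated value $\alpha$, we reject $H$. Equivalently, we choose a critical value $t_\alpha$ and reject $H$ if $t_N > t_\alpha$. This defines two regions in the space of sample data:
critical region $S_0$: if $D_N \in S_0$ we reject $H$;
non-critical region $S_1$: if $D_N \in S_1$ we accept $H$.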
Some considerations
The principle is that we will accept H unless we witness some event that
has sufficiently small probability of arising when H is true.
If $H$ were true, we could still obtain data in $S_0$ and consequently wrongly reject $H$ with probability
$$\text{Prob}\{D_N \in S_0 \mid H\} = \text{Prob}\{t(D_N) > t_\alpha \mid H\} = \alpha$$
The significance level $\alpha$ is therefore an upper bound on the probability of incorrectly rejecting $H$.
The p-value is the probability that the test statistic is more extreme than its observed value. The p-value changes with the observed data (i.e. it is a random variable), while $\alpha$ is a level fixed by the user.
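The contrast between a fixed $\alpha$ and a random p-value can be checked by simulation. The sketch below generates many datasets under a hypothetical $H: \mu = \mu_0$ with known $\sigma$, rejects whenever $p < \alpha$, and measures how often this happens. That rate is the Type I error, and it should come out close to $\alpha$. The sample size, parameters and function names are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sided_p_value(sample, mu0, sigma):
    """p-value of |mean - mu0| under H: mu = mu0 with known sigma."""
    z = (fmean(sample) - mu0) * math.sqrt(len(sample)) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rejection_rate_under_h(alpha, n=10, mu0=0.0, sigma=1.0, reps=20000, seed=1):
    """Generate many datasets with H true and count how often p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if two_sided_p_value(sample, mu0, sigma) < alpha:
            rejections += 1  # a Type I error: H is true but rejected
    return rejections / reps


for alpha in (0.1, 0.05):
    print(alpha, rejection_rate_under_h(alpha))
```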
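TP example
Let $D_N$ consist of $N$ independent observations of $x \sim \mathcal N(\mu, \sigma^2)$, with known variance $\sigma^2$.
We want to test the hypothesis $H: \mu = \mu_0$ with $\mu_0$ known, using the test statistic $t(D_N) = |\hat\mu - \mu_0|$, where $\hat\mu$ is the sample average.
Let us put a significance level $\alpha = 10\% = 0.1$. This means that $t_\alpha$ should satisfy
$$\text{Prob}\{t(D_N) > t_\alpha \mid H\} = \text{Prob}\{|\hat\mu - \mu_0| > t_\alpha \mid H\} = 0.1$$
Since under $H$ we have $\hat\mu \sim \mathcal N(\mu_0, \sigma^2/N)$, this is equivalent to
$$\text{Prob}\left\{\frac{|\hat\mu - \mu_0|}{\sigma/\sqrt N} > \frac{t_\alpha}{\sigma/\sqrt N} \,\Big|\, H\right\} = 0.1$$
and, since $\text{Prob}\{|z| > 1.645\} = 0.1$ for $z \sim \mathcal N(0, 1)$, we obtain
$$t_\alpha = 1.645\,\sigma/\sqrt N$$
We then have
$$\text{Prob}\{|\hat\mu - \mu_0| > 1.645\,\sigma/\sqrt N \mid H\} = 0.1$$
and the critical region is
$$S_0 = \left\{D_N : |\hat\mu - \mu_0| > 1.645\,\sigma/\sqrt N\right\}$$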
Machine learning p. 12/45
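Consider for instance the sample $D_N = \{10, 11, 12, 13, 14, 15\}$ ($N = 6$). Then
$$\hat\mu = \frac{10 + 11 + 12 + 13 + 14 + 15}{6} = 12.5$$
and
$$t(D_N) = |\hat\mu - \mu_0| = 2.5$$
$H$ is rejected at level $\alpha = 0.1$ if this value exceeds $t_\alpha = 1.645\,\sigma/\sqrt 6$.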
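The arithmetic of this example can be reproduced in a few lines. Two assumptions are needed because the example does not state them. First, $\mu_0 = 10$, which is the value consistent with $t(D_N) = 2.5$. Second, a hypothetical known $\sigma = 1$. The critical value is taken as the $1 - \alpha/2$ quantile of $\mathcal N(0,1)$, which gives approximately 1.645 for $\alpha = 0.1$.

```python
import math
from statistics import NormalDist, fmean

data = [10, 11, 12, 13, 14, 15]   # the sample D_N of the example
mu0 = 10.0    # assumption: the value of mu0 consistent with t(D_N) = 2.5
sigma = 1.0   # hypothetical known standard deviation (not given in the example)
alpha = 0.1

mu_hat = fmean(data)                      # sample average
t_n = abs(mu_hat - mu0)                   # observed discrepancy t(D_N)
z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided quantile, about 1.645
t_alpha = z * sigma / math.sqrt(len(data))
print(mu_hat, t_n, round(t_alpha, 4), "reject H" if t_n > t_alpha else "accept H")
```

With a different assumed $\sigma$, only `t_alpha` changes. The decision flips to "accept H" once $1.645\,\sigma/\sqrt 6$ exceeds 2.5.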
Type I error: rejecting $H$ when $H$ is true.
Type II error: accepting $H$ when $H$ is false.
An analogy
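Consider the analogy with a murder trial, where the suspect is Mr. Bean.
The null hypothesis $H$ is "Mr. Bean is innocent".
A Type I error corresponds to convicting an innocent Mr. Bean; a Type II error corresponds to acquitting a guilty Mr. Bean.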
Hypothesis testing
Suppose we have some data $\{z_1, \dots, z_N\} \sim F$ drawn from a distribution $F$.
$H$ and $\bar H$ represent two hypotheses about $F$.
On the basis of the data, one is accepted and one is rejected.
Confusion matrix
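Then we have

                                  Not refused      Refused
$H$: Not guilty student (−)       TN               FP               $N_N$
$\bar H$: Guilty student (+)      FN               TP               $N_P$
                                  $\hat N_N$       $\hat N_P$       $N$

where $N_N = TN + FP$ and $N_P = FN + TP$ are the numbers of negative and positive cases, $\hat N_N = TN + FN$ and $\hat N_P = FP + TP$ are the numbers of not refused and refused cases, and $N$ is the total number of cases.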
FP is the number of False Positives and the ratio $FP/N_N$ represents the type I error.
FN is the number of False Negatives and the ratio $FN/N_P$ represents the type II error.
Specificity
$$SP = \frac{TN}{FP + TN} = \frac{TN}{N_N} = \frac{N_N - FP}{N_N} = 1 - \frac{FP}{N_N}, \qquad 0 \le SP \le 1$$
Sensitivity
$$SE = \frac{TP}{TP + FN} = \frac{TP}{N_P} = \frac{N_P - FN}{N_P} = 1 - \frac{FN}{N_P}, \qquad 0 \le SE \le 1$$
False Positive Rate
$$FPR = 1 - SP = 1 - \frac{TN}{FP + TN} = \frac{FP}{FP + TN} = \frac{FP}{N_N}, \qquad 0 \le FPR \le 1$$
It decreases as the number of false positives is reduced, and it estimates the Type I error.
False Negative Rate
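$$FNR = 1 - SE = 1 - \frac{TP}{TP + FN} = \frac{FN}{TP + FN} = \frac{FN}{N_P}, \qquad 0 \le FNR \le 1$$
It decreases as the number of false negatives is reduced, and it estimates the Type II error.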
Predictive value
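Positive Predictive Value:
$$PPV = \frac{TP}{TP + FP} = \frac{TP}{\hat N_P}, \qquad 0 \le PPV \le 1$$
Negative Predictive Value:
$$PNV = \frac{TN}{TN + FN} = \frac{TN}{\hat N_N}, \qquad 0 \le PNV \le 1$$
False Discovery Rate:
$$FDR = \frac{FP}{TP + FP} = \frac{FP}{\hat N_P} = 1 - PPV, \qquad 0 \le FDR \le 1$$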
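The quantities defined above can all be computed from the four counts of the confusion matrix. The sketch below uses a hypothetical function name `confusion_rates` and made-up counts. Each complementary pair ($SP$/$FPR$, $SE$/$FNR$, $PPV$/$FDR$) sums to one, as the formulas state.

```python
def confusion_rates(tp, fp, tn, fn):
    """Rates derived from the counts of a confusion matrix."""
    n_neg = tn + fp          # N_N: actual negatives
    n_pos = tp + fn          # N_P: actual positives
    n_pos_hat = tp + fp      # N_P hat: refused (predicted positive)
    n_neg_hat = tn + fn      # N_N hat: not refused (predicted negative)
    return {
        "SP": tn / n_neg,
        "SE": tp / n_pos,
        "FPR": fp / n_neg,
        "FNR": fn / n_pos,
        "PPV": tp / n_pos_hat,
        "PNV": tn / n_neg_hat,
        "FDR": fp / n_pos_hat,
    }


rates = confusion_rates(tp=40, fp=10, tn=45, fn=5)  # hypothetical counts
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```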
ROC curve
[Figure: ROC curve, sensitivity SE (vertical axis, 0 to 1) plotted against FPR (horizontal axis, 0 to 1).]
R script roc.R
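The script roc.R is not reproduced here. The following Python function is a hypothetical stand-in, not a transcription of that script. It traces an empirical ROC curve by sweeping a decision threshold over classifier scores and recording one $(FPR, SE)$ pair per threshold. The scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Empirical ROC curve: one (FPR, SE) pair per threshold.

    labels: 1 for positives (guilty), 0 for negatives (not guilty).
    A case is refused (predicted positive) when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is refused
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]  # hypothetical classifier scores
labels = [1, 1, 0, 1, 0, 0]
for fpr, se in roc_points(scores, labels):
    print(f"FPR = {fpr:.2f}, SE = {se:.2f}")
```

Lowering the threshold can only add refused cases, so both coordinates are non-decreasing along the curve. The lowest threshold refuses everything and ends at $(1, 1)$.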
Choice of test
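The choice of test, and consequently the choice of the partition $\{S_0, S_1\}$, is based on two steps:
1. Define a significance level $\alpha$, that is, the probability of type I error
$$\text{Prob}\{\text{reject } H \mid H\} = \text{Prob}\{D_N \in S_0 \mid H\} \le \alpha$$
that is, the probability of incorrectly rejecting $H$.
2. Among the set of tests $\{S_0, S_1\}$ of level $\alpha$, choose the test that minimizes the probability of type II error
$$\text{Prob}\{\text{accept } H \mid \bar H\} = \text{Prob}\{D_N \in S_1 \mid \bar H\}$$
that is, the probability of incorrectly accepting $H$.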
TP example
Consider a r.v. $z \sim \mathcal N(\mu, \sigma^2)$, where $\sigma$ is known, and a set of $N$ iid observations.
We want to test the null hypothesis $H: \mu = \mu_0 = 0$, with $\alpha = 0.1$.
Consider the 3 critical regions $S_0$:
1. $|\hat\mu - \mu_0| > 1.645\,\sigma/\sqrt N$
2. $\hat\mu - \mu_0 > 1.282\,\sigma/\sqrt N$
3. $|\hat\mu - \mu_0| < 0.126\,\sigma/\sqrt N$
For all these tests $\text{Prob}\{D_N \in S_0 \mid H\} \approx \alpha$, hence the significance level is the same.
However, if $\bar H: \mu = 10$, the type II errors of the three tests are significantly different.
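The claim can be made concrete by computing both error probabilities in closed form. The sketch assumes hypothetical values $\sigma = 10$ and $N = 4$, so that $\sigma/\sqrt N = 5$; the example itself leaves these unspecified. It works with the standardized mean $Z = (\hat\mu - \mu_0)/(\sigma/\sqrt N)$, which is $\mathcal N(0,1)$ under $H$ and $\mathcal N(d, 1)$ under $\bar H$, with $d = (10 - \mu_0)/(\sigma/\sqrt N)$.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def size_and_type2(mu0, mu1, sigma, n):
    """Significance level and Type II error of the three critical regions."""
    d = (mu1 - mu0) / (sigma / math.sqrt(n))  # shift of the standardized mean under H-bar
    size = {
        1: 2 * (1 - PHI(1.645)),       # reject when |Z| > 1.645
        2: 1 - PHI(1.282),             # reject when Z > 1.282
        3: PHI(0.126) - PHI(-0.126),   # reject when |Z| < 0.126
    }
    # Under H-bar, Z ~ N(d, 1); the Type II error is Prob{accept H | H-bar}.
    type2 = {
        1: PHI(1.645 - d) - PHI(-1.645 - d),
        2: PHI(1.282 - d),
        3: 1 - (PHI(0.126 - d) - PHI(-0.126 - d)),
    }
    return size, type2


size, type2 = size_and_type2(mu0=0.0, mu1=10.0, sigma=10.0, n=4)
for k in (1, 2, 3):
    print(f"test {k}: size = {size[k]:.3f}, Type II error = {type2[k]:.3f}")
```

Under these assumed values all three sizes are about 0.1. The one-sided region 2 has the smallest Type II error, which fits an alternative lying to the right of $\mu_0$. Region 3 accepts $H$ almost always when $\bar H$ is true.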
TP example (II)
[Figure: density of $\hat\mu$ under $H$ (centred at $\mu_0 = 0$) and under $\bar H$ (centred at 10); the non-critical region $S_1$ lies on the left and the critical region $S_0$ on the right.]
The interval marked by $S_1$ denotes the set of observed values of $\hat\mu$ for which $H$ is accepted (non-critical region). The interval marked by $S_0$ denotes the set of observed values of $\hat\mu$ for which $H$ is rejected (critical region). The area of the black pattern region on the right equals $\text{Prob}\{D_N \in S_0 \mid H\}$, i.e. the probability of rejecting $H$ when $H$ is true (Type I error). The area of the grey shaded region on the left equals the probability of accepting $H$ when $H$ is false (Type II error).
TP example (III)
[Figure: density of $\hat\mu$ under $H$ (centred at $\mu_0 = 0$) and under $\bar H$ (centred at 10); the critical region $S_0$ is an interval around $\mu_0$, with the non-critical region $S_1$ on both sides.]
The intervals marked by $S_1$ denote the set of observed values of $\hat\mu$ for which $H$ is accepted (non-critical region). The interval marked by $S_0$ denotes the set of observed values of $\hat\mu$ for which $H$ is rejected (critical region). The area of the pattern region equals $\text{Prob}\{D_N \in S_0 \mid H\}$, i.e. the probability of rejecting $H$ when $H$ is true (Type I error).
Which area corresponds to the probability of the Type II error?
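Simple vs composite
Consider the null hypothesis and the alternative (composite and one-sided)
$$H: \mu = \mu_0; \qquad \bar H: \mu > \mu_0$$
STEP 2: under $H$ the test statistic is distributed as
$$z = \frac{(\hat\mu - \mu_0)\sqrt N}{\sigma} \sim \mathcal N(0, 1)$$
STEP 5: compute the observed value of
$$z = \frac{(\hat\mu - \mu_0)\sqrt N}{\sigma}$$
Since this is less than $z_\alpha = 1.645$, we do not reject the null hypothesis.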
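A minimal sketch of this one-sided z-test follows. It assumes $\alpha = 0.05$, because $z_\alpha = 1.645$ is the upper 5% point of $\mathcal N(0,1)$. The sample, $\mu_0$ and $\sigma$ are hypothetical, since the observed value used on the slide is not available.

```python
import math
from statistics import NormalDist, fmean


def one_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H: mu = mu0 against H-bar: mu > mu0 with known sigma."""
    z = (fmean(sample) - mu0) * math.sqrt(len(sample)) / sigma  # N(0,1) under H
    z_alpha = NormalDist().inv_cdf(1 - alpha)                    # upper alpha point
    p_value = 1 - NormalDist().cdf(z)                            # one-sided p-value
    return z, z_alpha, p_value, z > z_alpha


sample = [5.1, 4.9, 5.6, 5.3]   # hypothetical observations
z, z_alpha, p, reject = one_sided_z_test(sample, mu0=5.0, sigma=1.0)
print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, p = {p:.3f}, reject H: {reject}")
```

Only the right tail counts as evidence against $H$ here, because the alternative is $\mu > \mu_0$.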
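Name            one/two sample   known                        $H$                          $\bar H$
z-test          one              $\sigma^2$                   $\mu = \mu_0$                $\mu \neq \mu_0$
z-test          two              $\sigma_1^2 = \sigma_2^2$    $\mu_1 = \mu_2$              $\mu_1 \neq \mu_2$
t-test          one                                           $\mu = \mu_0$                $\mu \neq \mu_0$
t-test          two                                           $\mu_1 = \mu_2$              $\mu_1 \neq \mu_2$
$\chi^2$-test   one              $\mu$                        $\sigma^2 = \sigma_0^2$      $\sigma^2 \neq \sigma_0^2$
$\chi^2$-test   one                                           $\sigma^2 = \sigma_0^2$      $\sigma^2 \neq \sigma_0^2$
F-test          two                                           $\sigma_1^2 = \sigma_2^2$    $\sigma_1^2 \neq \sigma_2^2$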
Student's t-distribution
If $x \sim \mathcal N(0, 1)$ and $y \sim \chi^2_N$ are independent, then the Student's t-distribution with $N$ degrees of freedom is the distribution of the r.v.
$$z = \frac{x}{\sqrt{y/N}}$$
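The definition can be checked by sampling $x/\sqrt{y/N}$ directly. The sketch draws $y \sim \chi^2_N$ as a Gamma variable with shape $N/2$ and scale 2, which is a standard identity. It then compares the sample variance with the known t-distribution variance $N/(N-2)$, valid for $N > 2$. The choice $N = 10$ and the seed are arbitrary assumptions.

```python
import math
import random
from statistics import fmean


def sample_student_t(df, rng):
    """One draw of x / sqrt(y / df) with x ~ N(0,1) and y ~ chi^2_df independent."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return x / math.sqrt(y / df)


rng = random.Random(42)
N = 10
draws = [sample_student_t(N, rng) for _ in range(100000)]
m = fmean(draws)
var = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)
# For N > 2 the variance of the t-distribution is N / (N - 2).
print(var, N / (N - 2))
```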
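If $z_1, \dots, z_N$ are iid $\mathcal N(\mu, \sigma^2)$ with $\sigma^2$ unknown, $\hat\mu$ is their average and $\widehat{SS} = \sum_{i=1}^N (z_i - \hat\mu)^2$, then
$$\frac{\sqrt N(\hat\mu - \mu)}{\sqrt{\widehat{SS}/(N-1)}} \sim t_{N-1}$$
Consider $H: \mu = \mu_0$ against $\bar H: \mu \neq \mu_0$ and the test statistic
$$t(D_N) = T = \frac{\sqrt N(\hat\mu - \mu_0)}{\sqrt{\frac{1}{N-1}\sum_{i=1}^N (z_i - \hat\mu)^2}} = \frac{\hat\mu - \mu_0}{\sqrt{\hat\sigma^2/N}}$$
where $\hat\sigma^2 = \widehat{SS}/(N-1)$. The test of size $\alpha$ rejects $H$ if
$$|T| > k = t_{\alpha/2, N-1}$$
where $t_{\alpha/2,N-1}$ is the upper $\alpha/2$ point of a T-distribution on $N-1$ degrees of freedom, i.e.
$$\text{Prob}\{t_{N-1} > t_{\alpha/2,N-1}\} = \alpha/2$$
where $t_{N-1} \sim T_{N-1}$.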
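A self-contained sketch of this one-sample t-test follows. The Python standard library has no Student quantile function, so this example makes its own approximation. It integrates the t density on $[0, k]$ with Simpson's rule and solves $\text{Prob}\{t_{N-1} > k\} = \alpha/2$ by bisection, using the symmetry of the density. In practice a statistics library would supply the quantile. The function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def t_statistic(sample, mu0):
    """T = (mean - mu0) / sqrt(sigma_hat^2 / N), sigma_hat^2 with N - 1 denominator."""
    return (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))


def t_upper_point(tail, df, steps=2000):
    """k such that Prob{t_df > k} = tail, for 0 < tail < 0.5.

    Integrates the Student density on [0, k] with Simpson's rule and solves
    0.5 - integral = tail by bisection (the density is symmetric around 0).
    """
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    def upper_tail(k):
        h = k / steps  # steps must be even for Simpson's rule
        s = density(0.0) + density(k)
        for i in range(1, steps):
            s += (4 if i % 2 else 2) * density(i * h)
        return 0.5 - s * h / 3

    lo, hi = 0.0, 1.0
    while upper_tail(hi) > tail:  # enlarge the bracket until it contains k
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_test(sample, mu0, alpha=0.05):
    """Two-sided one-sample t-test: reject H if |T| > t_{alpha/2, N-1}."""
    T = t_statistic(sample, mu0)
    k = t_upper_point(alpha / 2, len(sample) - 1)
    return T, k, abs(T) > k


print(t_upper_point(0.025, 7))  # upper 2.5% point of T_7
```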
TP example
Does jogging lead to a reduction in pulse rate? Eight non-jogging volunteers engaged in a one-month jogging programme. Their pulses were taken before and after the programme:

pulse rate before   74   86   98   102   78   84   79   70
pulse rate after    70   85   90   110   71   80   69   74
decrease             4    1    8    -8    7    4   10   -4
Suppose that the decreases are samples from $\mathcal N(\mu, \sigma^2)$ for some unknown $\sigma^2$.
We want to test $H: \mu = \mu_0 = 0$ against $\bar H: \mu \neq 0$ with a significance level $\alpha = 0.05$.
We have $N = 8$, $\hat\mu = 2.75$, $T = 1.263$, $t_{\alpha/2,N-1} = 2.365$.
Since $|T| \le t_{\alpha/2,N-1}$, the data are not sufficient to reject the hypothesis $H$. In other words, we do not have enough evidence to show that there is a reduction in pulse rate.
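The numbers of the jogging example can be reproduced from the raw pulse rates. This sketch computes the decreases, their average and the statistic $T$. It reuses the critical value 2.365 quoted above instead of computing a t quantile, because the Python standard library provides none.

```python
import math
from statistics import fmean, stdev

before = [74, 86, 98, 102, 78, 84, 79, 70]
after = [70, 85, 90, 110, 71, 80, 69, 74]
decrease = [b - a for b, a in zip(before, after)]

N = len(decrease)
mu_hat = fmean(decrease)
T = (mu_hat - 0.0) / (stdev(decrease) / math.sqrt(N))  # mu0 = 0 under H
t_crit = 2.365  # t_{alpha/2, N-1} for alpha = 0.05 and N - 1 = 7, taken from the text

print(decrease, mu_hat, round(T, 3), "reject H" if abs(T) > t_crit else "do not reject H")
```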
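Consider $H: \sigma^2 = \sigma_0^2$ against $\bar H: \sigma^2 \neq \sigma_0^2$, with $\mu$ known, and let $\widehat{SS} = \sum_i (z_i - \mu)^2$.
It can be shown that if $H$ is true then $\widehat{SS}/\sigma_0^2 \sim \chi^2_N$.
The size-$\alpha$ $\chi^2$-test rejects $H$ if $\widehat{SS}/\sigma_0^2 < a_1$ or $\widehat{SS}/\sigma_0^2 > a_2$, where
$$\text{Prob}\left\{\frac{\widehat{SS}}{\sigma_0^2} < a_1\right\} + \text{Prob}\left\{\frac{\widehat{SS}}{\sigma_0^2} > a_2\right\} = \alpha$$
If $\mu$ is unknown, we must
1. replace $\mu$ with $\hat\mu$ in the quantity $\widehat{SS}$;
2. use a $\chi^2_{N-1}$ distribution.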
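The loss of one degree of freedom when $\mu$ is replaced by $\hat\mu$ can be observed by simulation. The sketch generates data under a hypothetical $H$ with $N = 5$, $\mu = 3$ and $\sigma_0^2 = 4$. It then averages $\widehat{SS}/\sigma_0^2$ computed around the true $\mu$ and around $\hat\mu$. Since a $\chi^2_k$ variable has mean $k$, the two averages should be close to $N$ and $N - 1$.

```python
import random
from statistics import fmean


def scaled_ss(sample, center, sigma0_sq):
    """SS / sigma0^2, with SS the sum of squared deviations from `center`."""
    return sum((z - center) ** 2 for z in sample) / sigma0_sq


rng = random.Random(7)
N, mu, sigma0_sq = 5, 3.0, 4.0  # hypothetical setting in which H is true
known_mu, estimated_mu = [], []
for _ in range(50000):
    z = [rng.gauss(mu, sigma0_sq ** 0.5) for _ in range(N)]
    known_mu.append(scaled_ss(z, mu, sigma0_sq))            # mu known
    estimated_mu.append(scaled_ss(z, fmean(z), sigma0_sq))  # mu replaced by mu_hat

# A chi^2_k variable has mean k: expect about N and N - 1 respectively.
print(fmean(known_mu), fmean(estimated_mu))
```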
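Consider two independent samples $x_1, \dots, x_N$ and $y_1, \dots, y_M$ drawn from normal distributions with means $\mu_1$, $\mu_2$ and the same unknown variance, and the hypotheses $H: \mu_1 = \mu_2$ against $\bar H: \mu_1 \neq \mu_2$. Let
$$\hat\mu_x = \frac{\sum_{i=1}^N x_i}{N}, \qquad \widehat{SS}_x = \sum_{i=1}^N (x_i - \hat\mu_x)^2, \qquad \hat\mu_y = \frac{\sum_{i=1}^M y_i}{M}, \qquad \widehat{SS}_y = \sum_{i=1}^M (y_i - \hat\mu_y)^2$$
If $H$ is true,
$$t(D_N) = \frac{\hat\mu_x - \hat\mu_y}{\sqrt{\left(\frac{1}{M} + \frac{1}{N}\right)\frac{\widehat{SS}_x + \widehat{SS}_y}{M + N - 2}}} \sim T_{M+N-2}$$
and $H$ is rejected at level $\alpha$ if $|t(D_N)| > t_{\alpha/2, M+N-2}$.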
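The pooled statistic above translates directly into code. This is a minimal sketch with a hypothetical function name `two_sample_t`. It returns $t(D_N)$ and its degrees of freedom $M + N - 2$. The decision still needs the quantile $t_{\alpha/2, M+N-2}$, for example from a statistics library or from a table.

```python
import math
from statistics import fmean


def two_sample_t(x, y):
    """Pooled two-sample t statistic (equal variances) and its degrees of freedom."""
    n, m = len(x), len(y)
    mx, my = fmean(x), fmean(y)
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    pooled = (ss_x + ss_y) / (m + n - 2)
    t = (mx - my) / math.sqrt((1 / m + 1 / n) * pooled)
    return t, m + n - 2


t, df = two_sample_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # hypothetical samples
print(t, df)
```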
F-distribution
Let $x \sim \chi^2_M$ and $y \sim \chi^2_N$ be two independent r.v.s. A r.v. $z$ has an F-distribution $F_{M,N}$ with $M$ and $N$ degrees of freedom if
$$z = \frac{x/M}{y/N}$$
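As with the t-distribution, the definition can be checked by sampling. The sketch draws $F_{M,N}$ variables as ratios of scaled chi-square draws, each generated as a Gamma variable with shape $k/2$ and scale 2. It uses $M = 20$ and $N = 10$, matching the figure parameters, and compares the sample mean with the known mean $N/(N-2)$ of $F_{M,N}$, valid for $N > 2$.

```python
import random
from statistics import fmean


def sample_f(m, n, rng):
    """One draw of (x/m) / (y/n) with x ~ chi^2_m and y ~ chi^2_n independent."""
    x = rng.gammavariate(m / 2.0, 2.0)  # chi-square, m degrees of freedom
    y = rng.gammavariate(n / 2.0, 2.0)  # chi-square, n degrees of freedom
    return (x / m) / (y / n)


rng = random.Random(3)
M, N = 20, 10
draws = [sample_f(M, N, rng) for _ in range(100000)]
print(fmean(draws), N / (N - 2))
```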
[Figure: $F_{M,N}$ density for $M = 20$, $N = 10$.]
R script s_f.R.
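Consider $H: \sigma_1^2 = \sigma_2^2$ against $\bar H: \sigma_1^2 \neq \sigma_2^2$, given two independent normal samples of sizes $M$ and $N$ with sums of squared deviations $\widehat{SS}_1$ and $\widehat{SS}_2$ from their sample means. Then
$$f = \frac{\hat\sigma_1^2}{\hat\sigma_2^2} = \frac{\widehat{SS}_1/(M-1)}{\widehat{SS}_2/(N-1)} \sim \frac{\sigma_1^2\,\chi^2_{M-1}/(M-1)}{\sigma_2^2\,\chi^2_{N-1}/(N-1)} = \frac{\sigma_1^2}{\sigma_2^2}\,F_{M-1,N-1}$$
so that under $H$ the statistic $f$ follows an $F_{M-1,N-1}$ distribution.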
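A minimal sketch of the F statistic uses the fact that `statistics.variance` divides by the sample size minus one, i.e. it returns $\widehat{SS}/(M-1)$. The function name and the samples are hypothetical. A two-sided decision would then compare $f$ with the lower and upper $\alpha/2$ points of $F_{M-1,N-1}$, which require a statistics library or tables.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """f = sigma_hat_1^2 / sigma_hat_2^2 and its degrees of freedom under H.

    statistics.variance uses the n - 1 denominator, i.e. SS / (n - 1).
    """
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1


f, df1, df2 = f_statistic([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # hypothetical samples
print(f, df1, df2)
```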