
Big Data Statistics, meeting 9: Multiple testing, part 2

6 March 2024
Weak control
■ Testing our d hypotheses with a testing procedure such that (4) holds is known in
the literature as weak control. The following definition essentially restates what
we have above.
■ Definition (weak control) We say that a testing procedure for testing the family of
hypotheses H_1, . . . , H_d provides weak control at level α if

P_{H_i, 1≤i≤d}(reject at least one true hypothesis) ≤ α.

■ We will reflect on the weak control criterion on the next slides.

FWER
■ The following definition introduces this notation.
■ Definition (FWER) For an arbitrary family of hypotheses H_j, j ∈ J, the
probability

P_{H_j, j∈J}(reject at least one true hypothesis)

is called the family-wise error rate (FWER).
■ With this definition we can write the question on the previous slide as

P_{H_1, H_2}(reject at least one true hypothesis) = ?

■ Note also that with this definition we can write the corresponding questions for
the other examples on the previous slide as

P_{H_2, H_4, H_5}(reject at least one true hypothesis) = ?,

and

P_{H_1, ..., H_{d−1}}(reject at least one true hypothesis) = ?
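■ To get a feel for the FWER, here is a minimal simulation sketch in Python (the
choices d = 10, n = 50, α = 0.05 are hypothetical): each of d true hypotheses is
tested separately at level α with a one-sided z-test and no correction, and the
FWER is estimated as the fraction of runs with at least one (false) rejection. It
comes out near 1 − (1 − α)^d ≈ 0.40, far above α.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
d, n, alpha, reps = 10, 50, 0.05, 10_000   # hypothetical choices

false_reject = 0
for _ in range(reps):
    # d independent samples of size n, all generated under the null mu = 0
    X = rng.normal(loc=0.0, scale=1.0, size=(d, n))
    T = X.sum(axis=1) / np.sqrt(n)      # one z-statistic per hypothesis (sigma = 1)
    p = 1 - norm.cdf(T)                 # one-sided p-values
    if np.any(p <= alpha):              # uncorrected: each test at level alpha
        false_reject += 1

print("estimated FWER:", false_reject / reps)  # close to 1 - 0.95**10 ≈ 0.40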

Strong control
■ Definition (strong control) We say that a testing procedure for the family of
hypotheses H_1, . . . , H_d provides strong control at level α if for all I ⊂ {1, . . . , d}

P_{H_i, i∈I}(reject at least one true hypothesis) ≤ α.

■ Remarks (weak control vs strong control)
◆ Strong control implies weak control, as we can take I = {1, . . . , d} in the
definition of strong control.
◆ Weak control does not imply strong control, as Hayter (1986) showed.
◆ A testing procedure that achieves strong control is preferred to one that provides
only weak control. This is because strong control ensures that we reject at
least one true hypothesis with probability at most α, no matter which of the
d hypotheses H_1, . . . , H_d are true and which are false (remember that we do
not know the truth).
◆ Phrased differently: no matter what the truth is, under strong control the
probability of rejecting at least one true hypothesis is at most α.

p-value
■ We denote the p-value, viewed as a random variable, for the test statistic T by p̂(T).
■ Assume we test, for a normal distribution with known standard deviation σ, the
(single) hypothesis H0 : µ ≤ 0 against H1 : µ > 0.
■ As test statistic T based on n observations X_1, . . . , X_n we use

T = (Σ_{i=1}^n X_i) / (σ √n).

■ If µ = 0 then T is standard normally distributed.


■ Formally, the random variable p̂(T) is defined as the

smallest α ∈ (0, 1) such that T > q_{1−α},   (5)

where q_{1−α} is the (1 − α)-quantile of the standard normal.


■ We can rewrite the inequality as

T > Φ^{−1}(1 − α),

where Φ is the cdf of the standard normal and Φ^{−1} its inverse.


p-value (cont’d)
■ Solving this inequality for α, we see that it holds for all α with

α > 1 − Φ(T).

■ The smallest α for which this holds² is the p-value, denoted here by p̂(T) and
taken here to be

p̂(T) = 1 − Φ(T).
■ Remark: This equation confirms once more that p-values are random as the test
statistic T is random.
■ On the next slide we look at the distribution of p̂(T ) under the (null) hypothesis.

² For those interested in mathematical details: it is not entirely correct to define α in (5) as the smallest
α. What we are actually looking for is the infimum.
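■ As a quick illustration, a minimal sketch in Python computing T and
p̂(T) = 1 − Φ(T) for made-up data (sample size and true mean are arbitrary choices):

import numpy as np
from scipy.stats import norm

sigma, n = 1.0, 25                            # known sd and sample size (hypothetical)
rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=sigma, size=n)  # made-up sample with true mu = 0.3

T = x.sum() / (sigma * np.sqrt(n))            # test statistic from the earlier slide
p_hat = 1 - norm.cdf(T)                       # p-value: 1 - Phi(T)
print("T =", round(T, 3), " p-value =", round(p_hat, 4))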

p-value (cont’d)
■ We note that p̂(T ) takes only values between 0 and 1 (as all p-values do).
■ Under the (null) hypothesis with µ = 0 the test statistic T has a standard normal
distribution.
■ Then we find for any u ∈ (0, 1)

P_{µ=0}(p̂(T) ≤ u) = P_{µ=0}(1 − Φ(T) ≤ u)
                 = P_{µ=0}(1 − u ≤ Φ(T))
                 = P_{µ=0}(Φ^{−1}(1 − u) ≤ T)
                 = 1 − Φ(Φ^{−1}(1 − u)) = u.

■ Hence, under the null hypothesis, p̂(T) has a uniform distribution on (0, 1).
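■ A quick Monte Carlo check of this fact (a sketch in Python; the repetition count
is an arbitrary choice):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
T = rng.standard_normal(100_000)   # test statistic under the null: T ~ N(0, 1)
p = 1 - norm.cdf(T)                # p-hat(T) = 1 - Phi(T)

for u in (0.05, 0.25, 0.5, 0.9):
    print(u, np.mean(p <= u))      # each frequency should be close to u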
■ This result holds in general.

p-value (cont’d)
The general result is as follows:
■ Theorem: Let T be a test statistic for testing a single hypothesis and p̂(T ) be the
corresponding p-value.
a) For θ ∈ Θ_{H0} we have

P_θ(p̂(T) ≤ u) ≤ u,  ∀u ∈ (0, 1);

b) If, for θ ∈ Θ_{H0}, T has a continuous distribution, we have

P_θ(p̂(T) ≤ u) = u,  ∀u ∈ (0, 1).

■ Remark: (i) Part b) just says that, for test statistics with a continuous distribution,
the p-value has a uniform distribution under the (null) hypothesis.
(ii) For a test statistic that does not have a continuous distribution, part a) says that
under the (null) hypothesis the probability that the p-value is less than or equal to
u is at most the probability of this event under a uniform distribution.

Bonferroni
■ The following testing procedure is known as the Bonferroni procedure; it gives
strong control for multiple testing, subject to the conditions in the Theorem
(Bonferroni procedure) below.
■ Bonferroni procedure
◆ Before taking the sample, decide on the level α.
◆ For the test statistic T_i of hypothesis H_i, 1 ≤ i ≤ d, calculate the p-value
p̂_i = p̂_i(T_i).
◆ Now, given the data (t_1, . . . , t_d): for each i = 1, . . . , d, reject H_i if the
observed p-value satisfies p̂_i(t_i) ≤ α/d. (A code sketch follows the remark below.)
■ Remark (p-value notation) It is common in the literature, by a slight abuse of
notation (recall that we use X for a random variable and x for a particular value of X),
to denote both the p-value, which is a random variable, and its realization by p̂_i.
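■ A minimal sketch of the Bonferroni procedure in Python (the p-values are made
up for illustration):

import numpy as np

alpha = 0.05
p = np.array([0.001, 0.012, 0.021, 0.18, 0.74])  # hypothetical observed p-values
d = len(p)

reject = p <= alpha / d                      # Bonferroni: compare every p-value to alpha/d
print("cutoff alpha/d:", alpha / d)          # 0.01
print("rejected:", np.where(reject)[0] + 1)  # hypothesis numbers; here only H1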

Bonferroni (cont’d)
■ Theorem (Bonferroni procedure): For i = 1, . . . , d assume that for any u ∈ (0, 1)
and any J ⊆ {1, . . . , d} we have

P_{H_j, j∈J}(p̂_i(T_i) ≤ u) ≤ u;

then the Bonferroni procedure strongly controls the FWER.


■ Proof: We have

P_{H_j, j∈J}(reject any H_j, j ∈ J) ≤ Σ_{i∈J} P_{H_j, j∈J}(reject H_i)
                                   = Σ_{i∈J} P_{H_j, j∈J}(p̂_i(T_i) ≤ α/d)
                                   ≤ |J| · (α/d) ≤ α,

as |J| ≤ d, where |J| denotes the number of elements of J.
■ Remark: The implicit understanding in the assumption is that i ∈ J (because we
want to control rejection probabilities for true hypotheses).

Holm
Another method that controls the FWER strongly and that is based on p-values is
Holm's procedure.
Holm procedure
■ Denote the increasingly ordered observed p-values (committing the above
described abuse of notation) by p̂_(1), . . . , p̂_(d) and the associated hypotheses by
H_(1), . . . , H_(d).
■ Step 1: If p̂_(1) ≥ α/d, accept H_(1), . . . , H_(d) and stop. If p̂_(1) < α/d, reject H_(1)
and test the remaining d − 1 hypotheses at level α/(d − 1).
■ Step 2: If p̂_(1) < α/d and p̂_(2) ≥ α/(d − 1), accept the remaining hypotheses
H_(2), . . . , H_(d). If p̂_(2) < α/(d − 1), reject H_(2) and test the remaining d − 2
hypotheses at level α/(d − 2).
■ Continue like this (a code sketch follows below).
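■ A minimal sketch of Holm's procedure in Python (same hypothetical p-values as
in the Bonferroni sketch, so the two can be compared):

import numpy as np

alpha = 0.05
p = np.array([0.001, 0.012, 0.021, 0.18, 0.74])  # hypothetical observed p-values
d = len(p)

order = np.argsort(p)                 # most significant p-value first
reject = np.zeros(d, dtype=bool)
for step, i in enumerate(order):
    if p[i] < alpha / (d - step):     # levels alpha/d, alpha/(d-1), ... as on this slide
        reject[i] = True              # reject H_(step+1) and step down
    else:
        break                         # accept all remaining hypotheses and stop

print("rejected:", np.where(reject)[0] + 1)  # here H1 and H2; Bonferroni rejected only H1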

Holm (cont’d)
■ Theorem: Assume the assumptions of the Theorem (Bonferroni procedure) are
satisfied. Then the Holm procedure strongly controls the FWER.
■ Proof: Let the true hypotheses be H_k, k ∈ K. Define p̂^min = min{p̂_k, k ∈ K}.
Let R be the rank of p̂^min when ranking all p-values.³ Then, Holm's procedure
rejects at least one true hypothesis only if

p̂_(1) < α/d, . . . , p̂_(R−1) < α/(d − R + 2), p̂^min < α/(d − R + 1).

We note that R is at most d − |K| + 1, cf. Exercise sheet 5. For the probability of
at least one false rejection we find

P_{H_k, k∈K}(reject any H_k) ≤ P_{H_k, k∈K}(p̂^min < α/(d − R + 1)).

This is a bit unpleasant, because both p̂^min and α/(d − R + 1) are random, as R is
random. Yet, R is at most the non-random d − |K| + 1.
³ Exercise sheet 5 illustrates p̂^min and R.

Single & step down, p-values
Remarks:
■ The two procedures above can equivalently be written by adjusting the p-values
and comparing them to α (see the sketch below):
◆ For Bonferroni the adjusted p-values are d · p̂_i(t_i);
◆ For Holm the adjusted p-values are (d − i + 1) · p̂_(i).
■ The Bonferroni procedure is a single-step method in the sense that there is only one
cutoff point, and any hypothesis with p-value less than this cutoff is rejected.
■ Holm's procedure is an example of a step-down procedure. One starts with the
'most significant' p-value, then proceeds to the second 'most significant'
p-value, and so on. Step-down, because from the most significant p-value we move
in the direction of the least significant one.
■ Holm's procedure rejects every hypothesis that Bonferroni's procedure rejects, and
possibly more; hence its power is at least that of Bonferroni's procedure.
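■ A short sketch of the adjusted-p-value view in Python (hypothetical p-values,
already sorted); comparing the adjusted values to α reproduces the two procedures:

import numpy as np

alpha = 0.05
p = np.array([0.001, 0.012, 0.021, 0.18, 0.74])  # hypothetical ordered p-values
d = len(p)

adj_bonf = d * p                         # Bonferroni adjustment: d * p_i
adj_holm = (d - np.arange(d)) * p        # Holm adjustment: (d - i + 1) * p_(i)

rej_bonf = adj_bonf <= alpha
rej_holm = np.cumprod(adj_holm < alpha).astype(bool)  # step down, stop at first failure

print("Bonferroni:", adj_bonf, rej_bonf)  # rejects H1
print("Holm:      ", adj_holm, rej_holm)  # rejects H1 and H2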
