


Publisher: Taylor & Francis

Journal of the American Statistical Association



False Discovery Rates for Spatial Signals


Yoav Benjamini & Ruth Heller

Yoav Benjamini is Professor, Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv 29978, Israel. Ruth Heller is Mark O. Winkelman Distinguished Scholar in Residence, Department of Statistics, University of Pennsylvania. The research was done as part of Ruth Heller's PhD thesis while she was at Tel Aviv University. The authors thank Nava Rubin for engaging in useful discussions of the fMRI example that motivated this study and for supplying the fMRI data, and Felix Abramovich for valuable comments. This study was supported by a grant from the Adams Super Center for Brain Studies, Tel Aviv University.
Published online: 01 Jan 2012.

To cite this article: Yoav Benjamini & Ruth Heller (2007) False Discovery Rates for Spatial Signals, Journal of the
American Statistical Association, 102:480, 1272-1281, DOI: 10.1198/016214507000000941

To link to this article: http://dx.doi.org/10.1198/016214507000000941



The problem of multiple testing for the presence of signal in spatial data can involve numerous locations. Traditionally, each location is
tested separately for signal presence, but then the findings are reported in terms of clusters of nearby locations. This is an indication that the
units of interest for testing are clusters rather than individual locations. The investigator may know a priori these more natural units or an
approximation to them. We suggest testing these cluster units rather than individual locations, thus increasing the signal-to-noise ratio within
the unit tested as well as reducing the number of hypothesis tests conducted. Because the signal may be absent from part of each cluster,
we define a cluster as containing a signal if the signal is present somewhere within the cluster. We suggest controlling the false discovery
rate (FDR) on clusters (i.e., the expected proportion of clusters rejected erroneously out of all clusters rejected) or its extension to general
weights (WFDR). We introduce a powerful two-stage testing procedure and show that it controls the WFDR. Once the cluster discoveries
have been made, we suggest “cleaning” locations in which the signal is absent. For this purpose, we develop a hierarchical testing procedure
that first tests clusters, then tests locations within rejected clusters. We show formally that this procedure controls the desired location error
rate asymptotically, and conjecture that this is also so for realistic settings by extensive simulations. We discuss an application to functional
neuroimaging that motivated this research and demonstrate the advantages of the proposed methodology on an example.
KEY WORDS: Functional magnetic resonance imaging; Hierarchical testing; Multiple testing; Signal detection; Weighted testing procedures.

© 2007 American Statistical Association. Journal of the American Statistical Association, December 2007, Vol. 102, No. 480, Theory and Methods. DOI 10.1198/016214507000000941

E-mail: ybenja@post.tau.ac.il (Yoav Benjamini); ruheller@wharton.upenn.edu (Ruth Heller).

1. INTRODUCTION

Consider a spatial signal $\{\mu(s) : s \in D \subset \mathbb{R}^d\}$ that is deterministic and has spatial structure in the loose sense that the signal is present over regions rather than in singular points in the spatial domain. Let $D_0 = \{s \in D : \mu(s) = 0\}$ and $D_1 = \{s \in D : \mu(s) > 0\}$ denote the subsets of $D$ with no signal and positive signals. We wish to infer on the subset $D_1$ of $D$ with positive mean signal (the restriction to positive is for ease of notation) from the $N$ realized processes $\{X_i(s) : i = 1, \ldots, N,\ s \in D \subset \mathbb{R}^d\}$, where $X_i(s) = \mu(s) + \epsilon_i(s)$ and $\epsilon_i(\cdot)$ is a Gaussian mean-0 process. Whereas a large part of the spatial literature is devoted to the difficult case where $N = 1$, requiring extra assumptions on $\mu(\cdot)$ that capture the spatial structure of the signal, in recent important applications, such as functional magnetic resonance imaging (fMRI) and satellite measurements, we encounter the more general setting $N > 1$.

The common approach to the problem is to test for the presence of the signal at each location separately,

\[
H_0(s) : \mu(s) = 0 \quad \text{versus} \quad H_1(s) : \mu(s) > 0, \tag{1}
\]

and adjust the level of the test to the multiplicity of locations to control the familywise error (FWE), using, for example, random field theory or resampling methods, or the false discovery rate (FDR).

We argue, for two main reasons, that the data should be aggregated into clusters of locations before testing. First, in many applications the fundamental units of interest are contiguous aggregates of measured locations that describe the regions of interest. For example, in fMRI, the signal is recorded over time for a series of brain slices, yielding a measured signal at volumetric pixels (called voxels). The spatial resolution is set by the capacity of the MRI machine used, but the fundamental units of interest are the contiguous brain regions that participate in cognitive tasks and thus are activated together (see, e.g., Penny and Friston 2003), rather than the individual voxels. Second, spatial clusters of measured locations can have an increased signal-to-noise ratio (SNR), because if the signal is present in one location, then there tends to be signal in neighboring locations as well. For example, even if the SNR of every monitoring site in a region is too low to allow detection of a pollutant in water or in the air, the SNR of the pooled information may be high enough to do so. Another example is the across-country mortality surveillance system. Currently each municipality is tested separately for a suspicious increase in mortality (see, e.g., Sartorius, Jacobsen, Torner, and Giesecke 2006). These systems can benefit from our approach by testing clusters of municipalities with similar death rates first, possibly weighing the importance of a cluster by its population, and then testing only those municipalities within detected clusters.

Let $C_1, C_2, \ldots, C_m$ be a partition of $D$ into $m$ contiguous components that we call clusters. We say that a cluster $C_i$ does not contain signal (or is a null cluster) if the signal is absent in all subsets of $C_i$ (i.e., $C_i \subset D_0$). This is the only definition that guarantees that the p values based on the spatially averaged signal are uniform or stochastically larger than the uniform under the null. Thus the collection of $m$ hypotheses tested is

\[
H_{0i} : \mu(s) = 0 \ \ \forall s \in C_i \quad \text{versus} \quad H_{1i} : \mu(s) > 0 \ \text{for at least one } s \in C_i, \tag{2}
\]

for all $i = 1, \ldots, m$.

How to aggregate the data into spatial clusters is problem-specific and depends on the information at hand. In the fMRI example, Heller, Stanley, Yekutieli, Rubin, and Benjamini (2006) used the data from a preparatory fMRI experiment to approximate the appropriate brain units (subunits) by aggregating neighboring voxels that are highly correlated; in the pollutant example, the clusters can be defined by considering the geographic positions of the monitoring sites; in the mortality surveillance system example, the clusters can be defined by aggregating municipalities with similar past death rates. Two
points are important regarding construction of the partition. First, it should be based on information outside the data that we set out to analyze. This guarantees that if the partitioning has a stochastic component, it does not affect the interpretation or the validity of the results. In practice, such information is often available for spatial data, as in the foregoing examples. Second, the quality of the partition does affect the potential gain from using clusters of locations rather than individual locations. Intuitively, if the data are partitioned into the smallest possible number of homogeneous clusters (in the sense that all of its locations contain signal or the signal is absent from all), then the gain will be largest. From extensive simulations and analytical examination of simple cases, we show that we can gain power by testing clusters rather than individual locations even if the partitioning is far from ideal. This is so, for example, for independent data if on average the proportion of locations with (same) signal in a cluster multiplied by the square root of the cluster size is >1.

Starting with a given partition of the data into clusters, testing clusters of locations rather than individual locations raises the question of which error criterion we would like to control. Benjamini and Hochberg (1995) introduced the FDR, which in our context is the expected proportion of erroneously rejected clusters out of all clusters rejected, say FDR on clusters ($\mathrm{FDR}_{\mathrm{CLUST}}$). Benjamini and Hochberg (1997) also introduced the weighted FDR (WFDR), which may be especially appropriate here. For example, we may want to control a size-weighted FDR on clusters ($\mathrm{WFDR}_{\mathrm{CLUST}}$), where the weights are proportional to the size of the clusters, which means that on the one hand it is important to reject a large cluster that contains signal, because it considerably increases the weight of the total discoveries, but on the other hand it also increases the weight of the errors if in fact it is an error. We give exact definitions and their relationships in Section 2. In Section 3 we give procedures for controlling the WFDR on clusters. In particular, we prove that the two-stage procedure of Benjamini, Krieger, and Yekutieli (2006) can be applied using weights and still control the WFDR, a property of interest by itself.

We have already identified the advantage of using clusters as building blocks for inference. But a researcher may want to detect not only the clusters containing the signal, but also the locations within the clusters in which the signal is present. For this purpose, we introduce a cluster testing and trimming procedure for spatial data that first tests clusters and then trims individual locations from the cluster discoveries to control the FDR on locations as well (possibly at a more lenient level). We describe its asymptotic properties in Section 4 and examine more realistic models using simulations in Section 5.

We demonstrate our analysis approach on an fMRI example in Section 6. This approach is especially useful for detecting activated brain regions, because the data are very noisy and the number of tests (in the original units) is very large.

The first attempt to give FDR control over clusters was made by Pacifico, Genovese, Verdinelli, and Wasserman (2004), but their approach is very different from ours in principle as well as in detail. Moreover, their suggested procedure relies on power of single location intensities and thus does not make use of the power gain made possible by averaging over locations within clusters. (We return to this point in Sec. 7.) A procedure for estimating the spatial signal using the FDR was given by Shen, Huang, and Cressie (2002).

2. FALSE DISCOVERY RATES FOR SPATIAL SIGNALS

We first propose useful error rates to control when drawing inference about the set of locations that contain a signal. For a rejected set $A \subseteq D$ and a measure $\lambda$ on the domain $D$, define $\lambda_1(A) \equiv \lambda(A \cap D_1)$, $\lambda_0(A) \equiv \lambda(A \cap D_0)$, $Q_\lambda = \frac{\lambda_0(A)}{\lambda(A)} I_{\{\lambda(A)>0\}}$, where $I_{\{\lambda(A)>0\}}$ defines $Q_\lambda$ as 0 when $\lambda(A) = 0$. For example, with the two-dimensional Lebesgue measure, $Q_\lambda$ is the proportion of the area in which the signal is absent out of the area rejected, unless the rejected set has measure 0, in which case $Q_\lambda$ is 0. A natural generalization of the FDR for spatial applications is given in the following definition.

Definition 1. Using the measure $\lambda$ for rejected sets $A \subseteq D$, the false discovery rate of a testing procedure is

\[
\mathrm{FDR}_\lambda = E(Q_\lambda) = E\left[ \frac{\lambda_0(A)}{\lambda(A)} I_{\{\lambda(A)>0\}} \right].
\]

$\mathrm{FDR}_\lambda$ has been suggested by Pacifico et al. (2004) for a threshold procedure with data-dependent threshold $T(X)$: $A(T) = \{s \in D : X(s) \geq T(X)\}$. In practice, the signal is usually measured in a known, finite number of locations. When the locations are equally spaced and represent equal area, the regular FDR on locations approximates the FDR in terms of area.

When the fundamental unit of interest is a cluster, a useful error rate (in the spirit of Pacifico et al. 2004) for a testing procedure that rejects clusters is the expected proportion of falsely rejected clusters out of all clusters rejected.

Definition 2. Let $C_1, \ldots, C_m$ be a partition of $D$ into clusters, let $I_0$ be the subset of indices corresponding to null clusters, and let $R_i = 1$ if cluster $i$ is rejected and 0 otherwise. The FDR on clusters is

\[
\mathrm{FDR}_{\mathrm{CLUST}} = E\left[ \frac{\sum_{i \in I_0} R_i}{\sum_{i=1}^m R_i} I_{\{\sum_{i=1}^m R_i > 0\}} \right].
\]

When clusters are completely homogeneous, weighing each cluster by its size leads to an error rate that is identical to $\mathrm{FDR}_\lambda$.

Definition 3. The size-weighted FDR on clusters is

\[
\mathrm{WFDR}_{\mathrm{CLUST}} = E\left[ \frac{\sum_{i \in I_0} \lambda(C_i) R_i}{\sum_{i=1}^m \lambda(C_i) R_i} I_{\{\sum_{i=1}^m R_i > 0\}} \right].
\]

The two foregoing error rates are special cases of the WFDR defined by Benjamini and Hochberg (1997) for general hypotheses,

\[
\mathrm{WFDR} = E\left[ \frac{\sum_{i \in I_0} w_i R_i}{\sum_{i=1}^m w_i R_i} I_{\{\sum_{i=1}^m R_i > 0\}} \right],
\]

where $w_i$ is the weight of cluster $i$ and $\sum_{i=1}^m w_i = m$. Specifically, the weight of cluster $i$ is $w_i = 1$ for $\mathrm{FDR}_{\mathrm{CLUST}}$ and $w_i = m\lambda(C_i) / \sum_{i=1}^m \lambda(C_i)$ for $\mathrm{WFDR}_{\mathrm{CLUST}}$.

Control of $\mathrm{FDR}_{\mathrm{CLUST}}$ or $\mathrm{WFDR}_{\mathrm{CLUST}}$ does not guarantee control of $\mathrm{FDR}_\lambda$, but the following simple relation with $\mathrm{WFDR}_{\mathrm{CLUST}}$ can be used to get an upper bound on $\mathrm{FDR}_\lambda$:

\[
\mathrm{FDR}_\lambda = E\left[ \frac{\sum_{i=1}^m R_i \lambda_0(C_i)}{\sum_{i=1}^m R_i \lambda(C_i)} I_{\{\sum_{i=1}^m R_i > 0\}} \right]
= \mathrm{WFDR}_{\mathrm{CLUST}} + E\left[ \frac{\sum_{i \in I_1} R_i \lambda_0(C_i)}{\sum_{i=1}^m R_i \lambda(C_i)} \right],
\]
where $I_1$ is the subset of indices corresponding to nonnull clusters.

Recall that our motivation for using clusters rather than locations is that the units of interest are clusters, which may have a higher SNR than locations. So the primary interest is to achieve control over WFDR on clusters, where the choice of weights is goal-oriented. We discuss this in Section 3. Once the cluster discoveries have been made, the investigator may be interested in “cleaning” the rejected clusters from null locations, thus reducing the $\mathrm{FDR}_\lambda$; we cover this in Section 4.

3. CONTROLLING THE WEIGHTED FALSE DISCOVERY RATE ON CLUSTERS

Consider any test for testing $H_{0i}$ versus $H_{1i}$ in (2) and let $P_i$ be the corresponding p value and $p_i$ be its realized value. Note that any test statistic is appropriate as long as the p value is valid, that is, $P_i \overset{H_{0i}}{\sim} U(0, 1)$ or $P_i \overset{H_{0i}}{\geq_{\mathrm{st}}} U(0, 1)$.

Benjamini and Hochberg (1997) extended the FDR controlling linear step-up procedure in Benjamini and Hochberg (1995), also called the BH procedure, to generally weighted procedure.

Procedure 1 (The weighted BH procedure). Order the cluster p values $p_{(1)} \leq \cdots \leq p_{(j)} \leq \cdots \leq p_{(m)}$. Let $w_{(j)}$ be the weight associated with $p_{(j)}$ and $\sum_{j=1}^m w_{(j)} = m$. Let $k = \max\{j : p_{(j)} \leq (\sum_{i=1}^j w_{(i)}/m)\,q\}$ and reject the clusters corresponding to the smallest $k$ p values.

This procedure was shown by Benjamini and Hochberg (1997) to guarantee that for independent test statistics $\mathrm{WFDR} \leq (\sum_{i \in I_0} w_i/m)\,q$. Benjamini and Yekutieli (2001) further proved that $\mathrm{FDR}_{\mathrm{CLUST}} \leq q$ if the joint distribution of the test statistics is positive regression-dependent (PRDS) on the subset of true nulls. For example, the PRDS property is satisfied if the cluster test statistics are Gaussian and nonnegatively correlated and the testing hypotheses are one-sided. Kling (2005) proved that the bound on WFDR is valid under the same conditions.

In line with many others (e.g., Storey, Taylor, and Siegmund 2004; see Benjamini et al. 2006 for a review), Benjamini et al. (2006) suggested a two-stage procedure that first estimates the number of null hypotheses and then uses it to enhance the power when $w_i = 1$, $i = 1, \ldots, m$. They proved that it controls the FDR at level $q$ for independent test statistics. The generalization of their procedure, which is especially useful when $\sum_{i \in I_0} w_i/m$ is believed to be small, is as follows.

Procedure 2 (The weighted two-stage procedure).

1. Use Procedure 1 at level $q' = q/(q + 1)$.
2. Estimate the sum of weights of null clusters by $\hat{m}_0(w) = m - \sum_{i=1}^k w_{(i)}$. Let $k_2 = \max\{j : p_{(j)} \leq (\sum_{i=1}^j w_{(i)}/\hat{m}_0(w))\,q'\}$. The clusters corresponding to the smallest $k_2$ p values are rejected.

Theorem 1. For independent test statistics, the weighted two-stage Procedure 2 controls the WFDR at level $q$.

See Section A.1 for a proof. Note that although the development of the weighted two stage procedure was motivated by the problem of testing clusters, it is more general and can be applied to any multiple testing problem with weights.

Benjamini et al. (2006) argued (by a simulation study) that the two-stage procedure for unit weights controls the FDR at level $q$ under the PRDS assumption. The argument can be extended under weighting as follows: Approximate the set of weights by integer positive weights $\{W_i : i = 1, \ldots, m\}$, then produce a vector of $m_{\mathrm{new}} = \sum_{i=1}^m W_i$ p values by repeating $W_i$ times the p value $p_i$ for every $i = 1, \ldots, m$, thus producing an extreme case of positive dependency between the $m_{\mathrm{new}}$ p values.

A more rigorous result under more general dependency structures can be given in an asymptotic framework. In this framework, we assume that the locations are measured on a grid. Let the $n$th set of spatial observations consist of a grid of locations $D_n$. As the number of locations goes to infinity, we consider the following two asymptotic settings: increasing domain asymptotics, where the distance between locations remains fixed but the domain goes to infinity $D_1 \subset D_2 \subset \cdots \subset \mathbb{R}^d$, and infill asymptotics, where the domain remains fixed but the number of points increases to infinity $D_n = D \cap (Z/\lambda_n)^d$, where $\{\lambda_n\}$ is an increasing sequence of positive integers and $(Z/\lambda_n)^d = \{k/\lambda_n : k \in Z^d\}$ is the $1/\lambda_n$ lattice. The first setting is common in spatial statistics. The second setting makes sense if we think of an improved measuring device (e.g., an MRI machine), so that as the resolution of the device increases, the device is also able to differentiate better between nearby locations.

Let $S$ and $S_0$ be the number of locations measured and the number of null locations. Consider the following asymptotic conditions:

\[
\lim_{S \to \infty} \frac{S_0}{S} = A_0 \ \text{exists and} \ A_0 < 1, \tag{3}
\]

\[
F_S = \frac{1}{S} \sum_{s=1}^{S} \mathbf{1}[p_s < t \mid H_{0s}] \xrightarrow[S \to \infty]{\text{a.s.}} A_0 F(t), \qquad F(t) \leq t \ \ \forall t \in (0, 1], \tag{4}
\]

and

\[
G_S = \frac{1}{S} \sum_{s=1}^{S} \mathbf{1}[p_s < t \mid H_{1s}] \xrightarrow[S \to \infty]{\text{a.s.}} (1 - A_0) G(t) \quad \forall t \in (0, 1]. \tag{5}
\]

These conditions can be satisfied if, for example, we assume that the spatial dependence of the noise between locations is local ($m$ dependence). The two asymptotic settings are equivalent under this assumption of local dependence. Because the dependence among the location test statistics is finite, if condition (3) is satisfied, then the empirical processes (4) and (5) converge by the Glivenko–Cantelli lemma (which is valid under local dependence).

Let $F_m = \frac{1}{m} \sum_{i=1}^m \mathbf{1}[p_i < t \mid H_{0i}]$ and $G_m = \frac{1}{m} \sum_{i=1}^m \mathbf{1}[p_i < t \mid H_{1i}]$ be the empirical distribution functions of the p values coming from the cluster null and alternative hypotheses.

Theorem 2. Assume that conditions (3)–(5) hold, that the number of clusters $m(S)$ satisfies $\lim_{S \to \infty} m(S) = \infty$, and that $\lim m_0/m$ exists. Also suppose that $\delta \equiv \sup\{t : \lim_{m \to \infty} \frac{t}{F_m(t) + G_m(t)} \leq q\} \in (0, 1]$. Then, as $S \to \infty$, Procedure 2 with unit weights controls the FDR asymptotically at level $q$.
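
As an illustration of Procedures 1 and 2, the following Python sketch implements the weighted BH step-up rule and its weighted two-stage variant on a list of cluster p values. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the weights are renormalized to sum to m inside the code, all weights are assumed positive, and the edge cases (nothing rejected in the first stage, or everything rejected) follow the usual convention for the unit-weight two-stage procedure, which Procedure 2 does not spell out.

```python
from itertools import accumulate


def _sorted_cumulative_weights(p_values, weights):
    """Sort indices by p value and accumulate weights rescaled to sum to m."""
    m = len(p_values)
    total = float(sum(weights))
    order = sorted(range(m), key=lambda i: p_values[i])  # stable sort
    cum_w = list(accumulate(weights[i] * m / total for i in order))
    return order, cum_w


def _step_up(p_values, order, cum_w, denom, level):
    """Largest k with p_(k) <= (sum_{i<=k} w_(i) / denom) * level, else 0."""
    k = 0
    for j, i in enumerate(order):
        if p_values[i] <= cum_w[j] / denom * level:
            k = j + 1
    return k


def _mask(m, order, k):
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def weighted_bh(p_values, weights, q):
    """Procedure 1: weighted BH at level q; returns a rejection mask."""
    m = len(p_values)
    order, cum_w = _sorted_cumulative_weights(p_values, weights)
    return _mask(m, order, _step_up(p_values, order, cum_w, m, q))


def weighted_two_stage(p_values, weights, q):
    """Procedure 2: weighted two-stage procedure at level q."""
    m = len(p_values)
    order, cum_w = _sorted_cumulative_weights(p_values, weights)
    q_prime = q / (1.0 + q)
    k1 = _step_up(p_values, order, cum_w, m, q_prime)  # stage 1
    if k1 == 0 or k1 == m:
        # Assumed edge-case convention: reject nothing or everything.
        return _mask(m, order, k1)
    m0_hat = m - cum_w[k1 - 1]  # estimated total weight of null clusters
    k2 = _step_up(p_values, order, cum_w, m0_hat, q_prime)  # stage 2
    return _mask(m, order, k2)
```

With unit weights and q = .05, the p values (.01, .02, .03, .15) give three rejections under `weighted_bh` and four under `weighted_two_stage`, because the second stage divides by the estimated null weight rather than by m. Giving a cluster a larger weight raises its own step-up threshold, which is how a large cluster can be rejected at a p value that would fail under unit weights.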
Proof. The threshold in Procedure 2 is

\[
t_m^* = \sup\left\{ t : \frac{(\hat{m}_0/m)(1+q)\,t}{(F_m(t) + G_m(t)) \vee \frac{1}{m}} \leq q \right\}.
\]

Therefore,

\[
\begin{aligned}
\mathrm{FDR} &= E\left[ \frac{F_m(t_m^*)}{(F_m(t_m^*) + G_m(t_m^*)) \vee \frac{1}{m}} \right] \\
&= E\left[ \frac{\frac{\hat{m}_0}{m}(1+q)\,t_m^*}{(F_m(t_m^*) + G_m(t_m^*)) \vee \frac{1}{m}} + \frac{F_m(t_m^*) - \frac{\hat{m}_0}{m}(1+q)\,t_m^*}{(F_m(t_m^*) + G_m(t_m^*)) \vee \frac{1}{m}} \right] \\
&\leq q + \sup_{t \geq \delta} \frac{F_m(t) - \lim \frac{m_0}{m}\,t}{(F_m(t) + G_m(t)) \vee \frac{1}{m}} + \sup_{t \geq \delta} \frac{\left(\lim \frac{m_0}{m} - \frac{\hat{m}_0}{m}(1+q)\right) t}{(F_m(t) + G_m(t)) \vee \frac{1}{m}} + I\{t_m^* < \delta\}.
\end{aligned}
\]

From (3)–(5) and the definition of $\delta$, it follows that (a) the second term is asymptotically negative because these conditions guarantee that the variance of $F_m(t)$ is asymptotically 0, so $\lim F_m(t) \leq \lim(\frac{m_0}{m})\,t$; (b) the third term is asymptotically negative because our estimate $\frac{\hat{m}_0}{m} = \frac{m - R_1}{m} \geq (1 - \frac{V_1}{m_0}) \frac{m_0}{m}$, where $R_1$ and $V_1$ are the number of hypotheses rejected and rejected falsely in the first stage, has a lower bound, because $E(V_1) \leq m_0 q'$: $\lim_{m \to \infty} \frac{\hat{m}_0}{m}(1 + q) \geq \lim_{m \to \infty} (1 - q') \frac{m_0}{m}(1 + q) = \lim \frac{m_0}{m}$; and (c) the fourth term is 0. Therefore, the asymptotic upper bound for the FDR is $q$.

Genovese and Wasserman (2004), Storey et al. (2004), and Ferreira and Zwinderman (2006) have published similar results under slightly different conditions and for different estimates of the proportion of null hypothesis.

4. HIERARCHICAL TESTING PROCEDURE

As explained earlier, even a researcher interested in clusters prefers to avoid errors in specific locations, so after applying an FDR controlling procedure on clusters, we look within the rejected clusters and eliminate locations that contain no signal.

For any chosen locationwise test statistic, let the locationwise p value map be $\{\tilde{p}_{li} \mid l = 1, \ldots, c_i,\ i = 1, \ldots, m\}$, where the number of locations in cluster $i$ is $c_i$ and $z_{li} = \Phi^{-1}(1 - \tilde{p}_{li})$ is the corresponding z score for location $l$ in cluster $i$. The regular FDR on locations takes the following form:

\[
\mathrm{FDR}_{\mathrm{LOC}} = E\left[ \frac{\sum_{i=1}^m \sum_{l=1}^{c_i} V_{li}}{\sum_{i=1}^m \sum_{l=1}^{c_i} R_{li}} I_{\{\sum_{i=1}^m \sum_{l=1}^{c_i} R_{li} > 0\}} \right],
\]

where $R_{li} = 1$ if location $l$ in cluster $i$ is rejected and 0 otherwise, and $V_{li} = 1$ if location $l$ in cluster $i$ is rejected erroneously and 0 otherwise.

If the test statistic used at the location level is independent of that used at the cluster level, then the BH procedure (or its two-stage variant) can be directly applied at the location level (for locations within rejected clusters) to control $\mathrm{FDR}_{\mathrm{LOC}}$. This is also the case asymptotically if the number of locations within each cluster grows to infinity in such a way that the correlation between the location test statistic and that of its cluster goes to 0. But what if the location test statistic is correlated with the test statistic at the cluster level? To answer this, we first test the cluster hypothesis (2) using the standardized z score average of its locations as the cluster test statistic, then test the location hypothesis (1) for every location included in a rejected cluster. The relevant location p value is no longer the marginal one, but is conditional on it being in a rejected cluster. The following notations will be convenient in the derivation of the p value: $\rho_{li} = \mathrm{corr}(Z_{li}, \bar{Z}_i)$ (e.g., $\rho_{li} = 1/\sqrt{c_i}$ if the measured signal between locations is independent), $\sigma_{\bar{Z}_i}$ is the standard deviation of $\bar{Z}_i$, and $\mu_i = E_{H_{1i}}(\bar{Z}_i)/\sigma_{\bar{Z}_i}$ is the standardized expectation of the cluster test statistic under the alternative hypothesis $H_{1i}$. The conditional p value, $p_{li}(u_1, m_0/m, \mu_i, \rho_{li})$, of a location within a cluster that passed an initial fixed cutoff $u_1$ is

\[
\begin{aligned}
p_{li} = {} & \int_{z_{li}}^{\infty} \left[ \frac{m_0}{m}\, \bar{\Phi}\!\left( \frac{\bar{\Phi}^{-1}(u_1) - \rho_{li} u}{\sqrt{1 - \rho_{li}^2}} \right) + \left(1 - \frac{m_0}{m}\right) \bar{\Phi}\!\left( \frac{\bar{\Phi}^{-1}(u_1) - \rho_{li} u - \mu_i}{\sqrt{1 - \rho_{li}^2}} \right) \right] \phi(u)\, du \\
& \times \left[ \frac{m_0}{m}\, u_1 + \left(1 - \frac{m_0}{m}\right) \bar{\Phi}\!\left( \bar{\Phi}^{-1}(u_1) - \mu_i \right) \right]^{-1},
\end{aligned} \tag{6}
\]

where $\Phi$, $\bar{\Phi}$, and $\phi$ are the cumulative distribution, the right tail probability, and the density of a standard normal distribution.

Lemma 1. Assume that the nonnull clusters’ standardized z scores are normally distributed. Then $p_{li}$ given in (6) is a valid p value for testing whether location $l$ in rejected cluster $i$ contains signal.

Proof. We must show that $p_{li} \sim U(0, 1)$ [or stochastically larger than $U(0, 1)$] if location $l$ in cluster $i$ contains no signal. Assume, as in Genovese and Wasserman (2002), that the cluster hypothesis comes from a Bernoulli distribution with marginal probability $m_0/m$ of being a null hypothesis. Then

\[
\begin{aligned}
\text{p value} &= P_0\!\left( Z_{li} \geq z_{li} \mid \bar{Z}_i/\sigma_{\bar{Z}_i} \geq \bar{\Phi}^{-1}(u_1) \right) \\
&= \frac{P_0\!\left( Z_{li} \geq z_{li},\ \bar{Z}_i/\sigma_{\bar{Z}_i} \geq \bar{\Phi}^{-1}(u_1) \right)}{P_0\!\left( \bar{Z}_i/\sigma_{\bar{Z}_i} \geq \bar{\Phi}^{-1}(u_1) \right)} \\
&= \left[ \frac{m_0}{m} P_0\!\left( Z_{li} \geq z_{li},\ \bar{Z}_i/\sigma_{\bar{Z}_i} \geq \bar{\Phi}^{-1}(u_1) \mid H_{0i} \right) + \left(1 - \frac{m_0}{m}\right) P_0\!\left( Z_{li} \geq z_{li},\ \bar{Z}_i/\sigma_{\bar{Z}_i} \geq \bar{\Phi}^{-1}(u_1) \mid H_{1i} \right) \right] \\
&\quad \times \left[ \frac{m_0}{m}\, u_1 + \left(1 - \frac{m_0}{m}\right) P\!\left( \bar{Z}_i/\sigma_{\bar{Z}_i} \geq \bar{\Phi}^{-1}(u_1) \mid H_{1i} \right) \right]^{-1},
\end{aligned}
\]

where the subscript “0” indicates that the probabilities are calculated under the null hypothesis. Because $\bar{Z}_i$ is normally distributed under $H_{1i}$, the p value is exactly $p_{li}$, because the joint distribution of a location z score with its standardized cluster average z score is bivariate normal with unit variance, correlation $\rho_{li}$, location mean 0, and mean 0 or $\mu_i$ for the standardized cluster average, depending on whether the cluster null hypothesis is true or false.

Note that the average of $c_i$ independent (or $m$-dependent) random variables will be approximately Gaussian for $c_i$ sufficiently large. Nonetheless, some caution should be exercised
in using (6) because it is sensitive to the Gaussian distribution assumption of the average z score in the extreme right tail of the distribution for nonnull clusters. Thus it is possible to take the more conservative approach of approximating the p value by an upper bound that will be valid regardless of whether or not the Gaussian distribution assumption on nonnull clusters holds. This upper bound occurs when $\mu_i \to 0$:

\[
\bar{\Phi}(z_{li}) + \frac{1}{u_1} \left\{ \int_{z_{li}}^{\infty} \bar{\Phi}\!\left( \frac{\bar{\Phi}^{-1}(u_1) - \rho_{li} u}{\sqrt{1 - \rho_{li}^2}} \right) \phi(u)\, du - \bar{\Phi}(z_{li})\, u_1 \right\}.
\]

Note that the smallest p value is achieved when $\mu_i = \infty$:

\[
\bar{\Phi}(z_{li}) + \frac{1}{u_1 + (m - m_0)/m_0} \left\{ \int_{z_{li}}^{\infty} \bar{\Phi}\!\left( \frac{\bar{\Phi}^{-1}(u_1) - \rho_{li} u}{\sqrt{1 - \rho_{li}^2}} \right) \phi(u)\, du - \bar{\Phi}(z_{li})\, u_1 \right\},
\]

so this upper bound is a tight upper bound as the fraction of null clusters goes to 1.

In practice, the parameters necessary for the p value calculation are unknown and must be estimated from the data. Moreover, unless the locations have independent statistics, the standard error of $\bar{Z}_i$, $\sigma_{\bar{Z}_i}$, is unknown and must be estimated from the data. The first step toward estimation of $\rho_{li}$ and $\sigma_{\bar{Z}_i}$ is estimating the correlations between locations that belong to the same clusters. We suggest estimating the correlations between locations by first estimating the variogram, $\mathrm{var}(Z_{li} - Z_{ki}) = 2\gamma(Z_{li}, Z_{ki})$. If the Z score process satisfies the following “intrinsic stationarity on clusters” assumptions (which are satisfied if the Z map is intrinsically stationary): $\mathrm{var}(Z_{li} - Z_{ki}) = 2\gamma(\|s_{li} - s_{ki}\|_2)$, where $s_{li}$ and $s_{ki}$ are the locations in space of $Z_{li}$ and $Z_{ki}$ and $E(Z_{li}) = E(Z_{ki})$, then the variogram $2\gamma(h)$ can be empirically estimated. But because $E(Z_{li}) = E(Z_{ki})$ holds in most pairs of locations, the variogram is estimated robustly by (see Cressie 1991, p. 75)

\[
2\hat{\gamma}(h) = \left[ \mathrm{med}\{ (Z_{ki} - Z_{li})^2 : (s_{ki}, s_{li}) \in N(h) \} \right] / .455. \tag{7}
\]

On the estimated $p_{li}$’s, we propose using the two-stage FDR procedure of Benjamini et al. (2006). The gain in power over the BH procedure may be high, because the potential number of true location discoveries within discovered clusters is expected to be quite large. The following procedure summarizes the suggested analysis steps so far.

Procedure 3 (The cluster testing and trimming procedure).

1. Testing stage:
a. For every location, calculate its original p value and let $z_{li}$ be the corresponding z score for location $l$ in cluster $i$.
b. Use the robust variogram estimator (7) to estimate the correlation between the z scores of locations within clusters.
c. Let the correlation between two locations be $\hat{\rho}^{\,i}_{l,k} = 1 - \hat{\gamma}(\|s_{li} - s_{ki}\|)$. Then $\hat{\sigma}_{\bar{Z}_i} = \frac{1}{c_i} \sqrt{c_i + 2 \sum_{l=1}^{c_i} \sum_{k=1}^{l-1} \hat{\rho}^{\,i}_{l,k}}$.
d. Apply Procedure 1 at level $q$ on $\{(\bar{Z}_i/\hat{\sigma}_{\bar{Z}_i}),\ i = 1, \ldots, m\}$. Let $k$ be the number of clusters rejected, and let $u_1$ be the cutoff point of the largest p value rejected (e.g., $u_1 = kq/m$ for a BH procedure).

2. Trimming stage:
a. Let $\hat{m}_0 = \frac{m - k}{1 - q}$, $\hat{\mu}_i = \left( \sum_{i=1}^m \sum_{l=1}^{c_i} z_{li} \big/ \sum_{i=1}^m c_i \right) / \hat{\sigma}_{\bar{Z}_i}$, $\hat{\rho}_{li} = \left( 1 + \sum_{k \neq l,\, k=1}^{c_i} \hat{\rho}^{\,i}_{l,k} \right) / (c_i \hat{\sigma}_{\bar{Z}_i})$.
b. Estimate the p value by plugging in the estimated parameters in eq. (6) to get $\hat{p}_{li}$.
c. Apply the two-stage procedure at level $q_2$:
(1) Sort the resulting $m_2$ p values $\hat{p}_{li}$: $\hat{p}_{(1)} \leq \cdots \leq \hat{p}_{(m_2)}$. Let $q_2' = q_2/(1 + q_2)$ and $k_1 = \max\{j : \hat{p}_{(j)} \leq (j/m_2)\,q_2'\}$.
(2) Let $\hat{m}_{02} = m_2 - k_1$ and $k_2 = \max\{j : \hat{p}_{(j)} \leq (j/\hat{m}_{02})\,q_2'\}$. Keep all locations with $\hat{p}_{li} \leq (k_2/\hat{m}_{02})\,q_2'$ in the clusters, trimming the others.

Consider the condition $\lim_{m \to \infty} \left( \sum_{i=1}^m c_i \bar{z}_{\cdot i} \right) / \sum_{i=1}^m c_i \leq \min_{i:\, i \in I_1,\ i\ \mathrm{rejected}} E(\bar{Z}_i)$. This condition is satisfied if the expected cluster average of every nonnull cluster that is rejected in the testing stage is larger than the expected average of the entire z-score map. This is a fairly weak assumption, especially when $m_0/m$ is close to 1.

Lemma 2. Assume that the conditions of Theorem 2 and Lemma 1 are satisfied. Moreover, assume that the correlations between the locations within the clusters are consistently estimated and that the foregoing additional condition holds. Then $\hat{p}_{li}$ converges to a random variable that is stochastically larger than $p_{li}$.

See Section A.2 for a proof.

Lemma 2 guarantees that the p values are asymptotically valid, in the sense that $\hat{p}^{\infty}_{li} \equiv \lim \hat{p}_{li}$ is $U(0, 1)$ or stochastically larger than the uniform under the null hypothesis. The conditions of Theorem 2 guarantee that the dependency between the p values in the set $\{\hat{p}^{\infty}_{li}\}$ is local, so that the two-stage procedure in Procedure 3 controls the $\mathrm{FDR}_{\mathrm{LOC}}$ asymptotically at level $q_2$ when the asymptotic threshold is >0.

5. A SIMULATION STUDY

We conducted a simulation study to validate that the cluster testing and trimming procedure controls the $\mathrm{FDR}_{\mathrm{LOC}}$ at level $q_2$ under realistic settings, and also to compare the performance of Procedures 1 and 3 with that of a location-wise analysis on the entire data. For 1,024 points, we chose 13 different cluster configurations: equal size clusters (of size 2, 4, 8, 16, and 32); uniform ∩-shaped or ∪-shaped symmetric and right- and left-skewed distributions of sizes. For each cluster configuration, the percentage of active clusters considered was $p$ = 0, .01, .05, .10, .15. The proportion of active locations within each active cluster was $h$ = 1, .75, .5, .25. The proportion $h$ was either fixed for a data set or varied among clusters with the foregoing set average. The signal at each location was either 0 or positive. Active locations had either identical signal intensity or variable signal intensity; in the latter, nonzero signals were drawn independently from a truncated normal distribution with fixed mean and variance (with the coefficient of variation of the location means within a cluster ranging from .2 to 2). White noise was added to each location so that the SNR ranged from 0 to 5. The same noise was added for all signal and cluster configurations. We used 150 simulation repetitions. The simulations were performed in Matlab, version 6.5.

For simulations with spatial dependence, a map of 32 × 32 points was considered with various cluster and signal configurations. White noise convolved with a Gaussian filter created spatially correlated noise. The filter width varied in the simulations.

Procedure 3 was applied to each simulated data set. The analysis of the testing stage was restricted to the BH procedure
Benjamini and Heller: False Discovery Rates for Spatial Signals 1277

with a .05 threshold. In the trimming stage, q2 was either .05 or .25. The FDR was estimated by averaging over the simulations the proportion of false units rejected among all rejected units. For FDRCLUST and FDRLOC, the units were clusters and locations. Note that after the trimming stage, a cluster was a discovery if at least one location within the cluster was rejected. If no clusters were rejected in the testing stage, then the realized FDRCLUST and FDRLOC were set to 0. The power was estimated by averaging over the simulations the proportion of true discoveries out of the number of potential discoveries at the appropriate units, either locations or clusters. If no discoveries were made in the testing stage, then the power was set to 0. Four power measures were calculated: the power of clusters after the testing stage, the power of locations after the trimming stage with q2 = .05 and q2 = .25, and the power of a locationwise analysis on the entire data.

False Discovery Rate Control. For all signal configurations, even with variable cluster size, variable signal intensity μ for nonnull locations, and a variable percent of active locations within an active cluster h, control over FDRCLUST and FDRLOC was achieved. Figure 1 shows the results graphically for a small representative subset of signal configuration. In this subset, 15% of the clusters were active, the cluster sizes were equal, and the data satisfied the fixed alternative model assumptions. The figure shows that the FDRCLUST after the testing stage was around .05. After the trimming stage, the FDRCLUST was much lower than .05 for small μ. Note that although Procedure 3 with fixed q2 does not guarantee preservation of FDRCLUST, it was preserved in all of our simulations. The reason why it is low for small μ is that only very few locations were rejected, and these belonged mostly to true cluster rejections. The estimated FDRCLUST was most variable for small μ and large c (with standard errors at most .021, .010, and .013 after the testing stage and the trimming stage with either q2 = .05 or .25). FDRLOC was not preserved after the testing stage, yet for all signal and cluster configurations, the FDRLOC was below q2 after the trimming stage. This held even for fairly large deviations from the fixed alternative model (with standard errors at most .03 after the testing stage and .02 after the trimming stage).

Power. Figure 2 shows the power improvements achieved by Procedures 1 and 3 over a single location analysis of the entire data at the .05 level as a function of μ (with standard errors ≤ .02). Note that the power advantage was largest when μ was not too small and not too large. In Figure 2 the single location analysis had more power only for percent signal within cluster h = .25 and cluster size c = 8. This is so because $\sqrt{c}\,h < 1$ in this case.

For independent test statistics, we observed through simulations that the power was higher for testing clusters than for testing a single location analysis if $h > 1/\sqrt{c}$, where h is the

Figure 1. FDRCLUST (left) and FDRLOC (right) as functions of μ for clusterwise analysis (− · −·), Procedure 3 with q2 = .05 (− − −−),
and Procedure 3 with q2 = .25 (—). The FDRCLUST after the testing stage is around .05. After the trimming stage, the FDRCLUST is much
lower than .05 for small μ. The FDRLOC is below q2 after the trimming stage.
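The FDR estimates plotted in Figure 1 follow the rule described in the simulation study: average, over repetitions, the proportion of false units among all rejected units, with the proportion set to 0 when nothing is rejected. A minimal Python sketch of that bookkeeping is given below. It is an illustration only, not the authors' Matlab code; the function names and the boolean-array representation of rejections are assumptions made here.

```python
import numpy as np

def false_discovery_proportion(rejected, is_null):
    """Realized proportion of rejected units that are true nulls.

    Returns 0.0 when no unit is rejected, matching the convention
    described in the simulation study.
    """
    rejected = np.asarray(rejected, dtype=bool)
    is_null = np.asarray(is_null, dtype=bool)
    n_rejected = int(rejected.sum())
    if n_rejected == 0:
        return 0.0
    return float((rejected & is_null).sum() / n_rejected)

def estimate_fdr(rejections, nulls):
    """Average the realized proportion over simulation repetitions."""
    props = [false_discovery_proportion(r, n) for r, n in zip(rejections, nulls)]
    return float(np.mean(props))
```

The same functions apply to either unit of analysis: pass cluster-level indicators to estimate FDRCLUST and location-level indicators to estimate FDRLOC.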
1278 Journal of the American Statistical Association, December 2007

Figure 2. Power difference from locationwise analysis as a function of μ for (1) clusterwise analysis (− · −·), Procedure 3 with q2 = .05 (− − −−), and Procedure 3 with q2 = .25 (—). The power advantage is largest when μ is not too small and not too large. As $\sqrt{c}\,h$ increases, the advantage becomes larger. A disadvantage occurs only when $\sqrt{c}\,h < 1$.
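The threshold on the product of the square root of c and h in Figure 2 can be checked by direct arithmetic. The short Python sketch below computes that product, which is the ratio of the expected cluster z score to the expected location z score when the noise is independent with unit variance. The function name is an assumption made here for illustration.

```python
import math

def cluster_snr_gain(c, h):
    """Ratio sqrt(c) * h of the cluster-level to the location-level
    expected z score, for cluster size c and a fraction h of active
    locations, assuming independent unit-variance noise."""
    return math.sqrt(c) * h

# The one configuration in Figure 2 where locationwise analysis wins:
gain = cluster_snr_gain(8, 0.25)   # about 0.71, below 1
```

For c = 8 and h = .25 the ratio is below 1, so pooling dilutes the signal more than it reduces the noise; for c = 16 and h = .5 the ratio is 2.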

average percent of signal within clusters containing signal and c is the average cluster size. The gain in power was due to the reduced number of hypotheses tested, as well as to the higher SNR per cluster than per location [$E(\bar Z_i/\sigma_{\bar Z_i}) \overset{H_{1i}}{=} \sqrt{c}\,h\mu/\sigma$ and $E(Z_i(s)) \overset{H_1(s)}{=} \mu/\sigma$]. Note that although a cluster is considered a true discovery if at least one location within it contains a signal (i.e., ch > 1), the power is guaranteed to increase only if $\sqrt{c}\,h > 1$. As $\sqrt{c}\,h$ increased, the advantage in power of our procedures over single location analysis also increased. Of course, the gain in power was even larger for q2 > .05.

6. A FUNCTIONAL MAGNETIC RESONANCE IMAGING EXAMPLE

A functional magnetic resonance imaging (fMRI) signal is recorded over time for a series of brain slices while the subject performs a sequence of cognitive tasks. The fMRI researcher looks for brain regions that are correlated with the experimental paradigm in the form of clusters of voxels showing task-related activity, whereas the unit of a “voxel” is arbitrarily determined by the measurement technique and does not represent a primary neural entity. We used the correlations between neighboring voxels to identify clusters during a preparatory scan (see Heller et al. 2006 for details). From the preparatory scan, we also defined a limited region of interest (ROI; see Heller et al. 2006 for details) that contained 232 voxels grouped into 20 clusters.

In this fMRI example, an observer viewed stimuli that contained perceptually completed (“illusory”) contours versus control stimuli that shared local features but did not contain illusory contours in a block-design experiment. We calculated the p value of each voxel from a generalized linear model. Next, we applied Procedure 3 within the ROI with the modification that the variogram was estimated from brain regions outside the ROI. Procedure 3 is valid in this setting because (a) intrinsic stationarity is a good approximation for brain imaging data (see, e.g., Worsley and Friston 2000), (b) we need this assumption only for distances within clusters, and (c) according to Genovese, Lazar, and Nichols (2002), the correlations between locations are local and tend to be positive, so moving to clusters the correlations also will be positive and local.

Testing clusters at an FDR level of .05 within the ROI, we found 14 clusters consisting of 195 voxels. Trimming further at an FDR level of .25 and .05 revealed 118 voxels within the 14 clusters and 34 voxels within 12 clusters. In comparison, single-voxel testing at an FDR level of .05 (using the two-stage FDR procedure) found 36 voxels. Figure 3 shows the estimated location p values within the detected clusters in slices 9 and 10. From the estimated location p values within rejected clusters, we can see that, for example, for the light-blue cluster, most of the cluster contained small p values indicating signal except at the edges in slice 10. The largest p values rejected when trimming at FDR levels of .25 and .05 were .13 and .01 respectively.
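The variogram used in this analysis is the robust estimator (7): the median of squared differences between z scores at a given separation, divided by .455. A minimal Python sketch follows. It assumes that N(h) is the set of location pairs whose Euclidean distance equals h up to a small tolerance; the function name, the `tol` parameter, and the array layout are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def robust_variogram(z, coords, h, tol=1e-9):
    """Robust estimate of 2*gamma(h): median squared difference of
    z scores over pairs of locations at distance h, divided by .455.

    z      : 1-D array of z scores
    coords : (n,) or (n, d) array of locations
    h      : separation distance defining the pair set N(h) (assumed here
             to be all pairs within `tol` of distance h)
    """
    z = np.asarray(z, dtype=float)
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    k, l = np.nonzero(np.triu(np.abs(dist - h) <= tol, k=1))
    if k.size == 0:
        raise ValueError("no pairs of locations at distance h")
    return float(np.median((z[k] - z[l]) ** 2) / 0.455)
```

Restricting the pairs to a particular region, such as voxels outside the ROI, is left to the caller, who would pass only those locations.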


Figure 3. Results of the cluster testing and trimming procedure in brain slices 9 and 10. (a) Clusters detected in the testing stage (every cluster
in a different color). (b) Estimated location p values within detected clusters. The largest p values rejected when trimming at an FDR level of
.25 and .05 were .13 and .01.
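The testing stage that produced the clusters in panel (a) applies a step-up procedure to the cluster p values and records the cutoff u1 = kq/m used later in the trimming stage. The following Python sketch shows an unweighted BH step-up that returns both the rejections and that cutoff. It is a simplified illustration: the function name and return layout are assumptions, and the weighted variant discussed in the article is not implemented here.

```python
import numpy as np

def bh_step_up(p_values, q):
    """Unweighted BH step-up procedure at level q.

    Returns (rejected, k, u1): a boolean mask of rejected hypotheses,
    the number of rejections k, and the cutoff u1 = k*q/m.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p value with i*q/m.
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = int(np.nonzero(below)[0].max() + 1) if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected, k, k * q / m
```

With five p values .6, .001, .041, .008, and .039 at q = .05, the two smallest are rejected and the cutoff is 2(.05)/5 = .02.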

If the investigator wants control over FDR on locations at the .05 level, then Procedure 3 discovers almost the same voxels as single voxel testing. But if the investigator is willing to control the FDRLOC at a higher level (say .25), then, as long as he or she already has control over FDRCLUST at the .05 level, many more voxels are discovered.

Remark 1. Typically, the experiment is run on more than one subject, and a group analysis is performed per voxel. We can apply our clustering algorithm on the averaged or concatenated multisubject time series to find clusters common to all subjects. Once the cluster units are defined, our suggested testing procedures can be easily applied. For example, we can apply Procedure 3 on the p-value map produced from a standard fMRI group analysis per voxel. The difficult question is how to define common cluster units for the group that take into account the variability between the subjects.

7. DISCUSSION

We argued in favor of testing clusters of locations rather than individual locations in situations where (a) the investigator’s main interest is in regions of activity rather than activity in individual locations, (b) the SNR in individual locations is low but increases when pooling information from neighboring locations, and (c) the number of locations tested is high.

We suggested controlling the FDR or the WFDR on clusters. We generalized the two-stage FDR procedure of Benjamini et al. (2006) to arbitrary weights and showed that it controlled the WFDR under independence of test statistics. When the test statistics are not independent, we know that Procedure 1 controls the WFDR if the PRDS property is satisfied. We believe that, as argued by Benjamini et al. (2006), the two-stage weighted procedure (Procedure 2) controls the WFDR under dependency, but this remains to be proved. We showed that the procedure is valid for local dependencies asymptotically.

To the best of our knowledge, the only previous attempt to control the FDR on clusters was made by Pacifico et al. (2004). They defined the FDR criterion on clusters as E τ(T) where τ(T) = #{1 ≤ k ≤ mT : λ0(Ci)/λ(Ci) ≥ τ}/mT, mT is the number of clusters rejected, and τ is the maximal proportion of null signal allowed within a true cluster discovery. At first glance, our definition of FDR on clusters seems similar to theirs, except that we take τ = 1; however, the mT clusters rejected by Pacifico et al. (2004) were found by considering a rejection threshold T(X) and the rejection set RT = {s ∈ S : X(s) ≥ T(X)}. Then the decomposition of RT into connected components C1, . . . , CmT defines the set of clusters rejected. Thus their procedure is based on each location’s individual test statistic, and no advantage is taken of the SNR increase that an aggregation of locations within a cluster can offer. In this article we have shown the advantage of such averaging, in terms of power, over single location analysis. The partition of Pacifico et al. (2004) is obtained from the data used for testing,
whereas our method relies on the assumption that the investigator can obtain a partition from other data or other information, but this is quite often feasible in spatial applications. An advantage of the approach of Pacifico et al. (2004) is that the clusters detected are separate regions, whereas our method may find a region comprised of several detected clusters. An interesting point for further research would be to combine the two approaches, so as to gain power from cluster testing as well as control the FDR on separate regions.

The gain in power resulting from testing clusters rather than individual locations depends on the quality of the partition. For independent test statistics, we showed through simulations that the power is higher for testing clusters if $h > 1/\sqrt{c}$, where h is the average percentage of the signal within clusters containing a signal and c is the average cluster size. A small simulation study that examined the damage in applying the cluster-based analysis when in fact there was no cluster structure in the data showed hardly any damage (though no advantage as well) when the fraction of locations containing signal in the data was at least $1/\sqrt{c}$.

For each cluster detected, we can conclude only that there is signal somewhere within the cluster. We developed the cluster testing and trimming procedure to determine the location of the signal within the detected cluster. This procedure extends the theory on hierarchical FDR controlling procedures of Yekutieli et al. (2006) to control the FDR at the desired level, even though the test statistics between the two levels of testing are dependent.

The degree of confidence with which we report our discoveries depends on the FDR levels used. We can achieve the same degree of confidence as when testing individual locations only, but this is not necessary. For example, the investigator may be satisfied knowing that the detected clusters with no signal at all make up no more than 5% of the clusters (on average), but that 25% of the detected locations within these clusters may be false. The willingness to allow a large FDR level for individual locations follows from the fact that the necessary precision is often that of a general region, rather than the exact spatial locations where the signal is present. We allow the investigator to choose the degrees of confidence that he feels are necessary for his application.

Although the p value calculation for the trimming stage is exact, taking into account that the cluster average has passed an initial cutoff point, the estimated trimming stage p values are conservative due to the conservative estimation of the unknown parameters. If these parameters were known, then the FDR controlling procedure would have been even more powerful. This is a point for further research.

APPENDIX: PROOFS OF THEOREMS

A.1 Proof of Theorem 1

The proof makes use of the ideas in Benjamini et al. (2006). Let a be the sum of weights rejected. Note that a need not be an integer, but that its range is finite, because we have a finite number of weights. The WFDR is

$$\mathrm{WFDR} = E\left[\frac{\sum_{i\in I_0} w_i R_i}{\sum_{i=1}^{m} w_i R_i}\, I_{\{\sum_{i=1}^{m} R_i > 0\}}\right]$$

$$= \sum_{a>0}\frac{1}{a}\, E\left[\sum_{i\in I_0} w_i R_i\, I_{\{\sum_{i=1}^{m} w_i R_i = a\}}\right]$$

$$= \sum_{i\in I_0} w_i \sum_{a>0}\frac{1}{a}\, E\left[R_i\, I_{\{\sum_{i=1}^{m} w_i R_i = a\}}\right]$$

$$= \sum_{i\in I_0} w_i \sum_{a>0}\frac{1}{a}\, P\left(R_i = 1 \cap \sum_{i=1}^{m} w_i R_i = a\right)$$

$$= \sum_{i\in I_0} w_i \sum_{a>0}\frac{1}{a}\, P\left(R_i = 1 \cap \sum_{j=1,\,j\neq i}^{m} w_j R_j = a - w_i\right)$$

$$= \sum_{i\in I_0} w_i\, E_{P^{(i)}}\left[\sum_{a>0}\frac{1}{a}\, P\left(R_i = 1 \cap \sum_{j=1,\,j\neq i}^{m} w_j R_j = a - w_i \,\Big|\, P^{(i)}\right)\right]$$

$$= \sum_{i\in I_0} w_i\, E_{P^{(i)}}\, Q\left(P^{(i)}\right),$$

where $P^{(i)}$ is a vector of p values of the m − 1 hypotheses excluding $H_{0i}$ and $Q(P^{(i)})$ corresponds to the expression in the square bracket above it.

Let $P_{0i}$ be the p value associated with $H_{0i}$, and for each value of $P^{(i)}$, let $r(P_{0i})$ be the sum of weights rejected and $l(P_{0i})$ be the indicator that $H_{0i}$ has been rejected as a function of $P_{0i}$. Because the p values are independently distributed (specifically, $P_{0i}$ is independent of $P^{(i)}$), for every $i \in I_0$, we can express $Q(P^{(i)})$ as

$$Q\left(P^{(i)}\right) = E_{P_{0i}}\left[\frac{l(P_{0i})}{r(P_{0i})}\,\Big|\, P^{(i)}\right] = \int \frac{l(p)}{r(p)}\, dF_{01}(p), \tag{A.1}$$

where $F_{01}$ is the cumulative distribution function of $P_{01}$.

Note that $\hat m_0 = \hat m_0(P_{0i}, P^{(i)})$ is nondecreasing in $P_{0i}$ for fixed $P^{(i)}$ and can have two values depending on whether $P_{0i}$ is rejected in the first stage: $m_{01}(i) = m - \sum_{l=1}^{j_1} w_{(l)} - w_i$ if $P_{0i} \le \left(\sum_{l=1}^{j_1} w_{(l)} + w_i\right) q'/m$ and $m_{02}(i) = m - \sum_{k=1}^{j_2} w_{(k)}$ otherwise, where $j_1 = \arg\max_{1\le k\le m-1}\left\{P^{(i)}_{(k)} \le \left(\sum_{l=1}^{k} w_{(l)} + w_i\right)\frac{q'}{m}\right\}$ and $j_2 = \arg\max_{1\le k\le m-1}\left\{P^{(i)}_{(k)} \le \left(\sum_{l=1}^{k} w_{(l)}\right)\frac{q'}{m}\right\}$.

Let the number of hypotheses rejected in the second stage along with $H_{0i}$ be $r_h + 1$, where $r_h = \arg\max_{1\le k\le m-1}\left\{P^{(i)}_{(k)} \le \left(\sum_{l=1}^{k} w_{(l)} + w_i\right) q'/m_{0h}(i)\right\}$, h = 1, 2. Using (A.1), we get

$$Q\left(P^{(i)}\right) = \frac{1}{\sum_{l=1}^{r_1} w_{(l)} + w_i}\cdot\frac{\sum_{l=1}^{j_1} w_{(l)} + w_i}{m}\, q' + \frac{1}{\sum_{l=1}^{r_2} w_{(l)} + w_i}\times\left[\frac{\sum_{l=1}^{r_2} w_{(l)} + w_i}{m_{02}(i)}\, q' - \frac{\sum_{l=1}^{j_1} w_{(l)} + w_i}{m}\, q'\right]$$

$$\le \frac{1}{\sum_{l=1}^{r_2} w_{(l)} + w_i}\cdot\frac{\sum_{l=1}^{r_2} w_{(l)} + w_i}{m_{02}(i)}\, q' = \frac{q'}{m_{02}(i)}, \tag{A.2}$$

where the inequality follows because $r_2 \le r_1$. A lower bound on $m_{02}(i)$ is $w_i + \sum_{j\in I_0,\, j\neq i} w_j Y_j$, where $Y_j \sim B(1, 1 - q')$.

Lemma A.1. If $Y_j \sim B(1, 1 - q')$, $j = 1, \ldots, m_0$, are independent, then

$$\sum_{i=1}^{m_0} w_i\, E\left(\frac{1}{w_i + \sum_{j=1,\,j\neq i}^{m_0} w_j Y_j}\right) = \frac{1}{1-q'}\left(1 - (q')^{m_0}\right).$$
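Lemma A.1 can be checked numerically for small m0 by enumerating every outcome of the Bernoulli variables. The Python sketch below does this by brute force; it is a sanity check under the lemma's independence assumption, not part of the proof, and the function names and the parameter name `q_prime` (for the first-stage level in the lemma) are choices made here for illustration.

```python
from itertools import product

def lemma_a1_lhs(weights, q_prime):
    """Exact value of sum_i w_i * E[1 / (w_i + sum_{j != i} w_j * Y_j)]
    for independent Y_j ~ Bernoulli(1 - q_prime), by full enumeration."""
    m0 = len(weights)
    total = 0.0
    for i, w_i in enumerate(weights):
        others = [w for j, w in enumerate(weights) if j != i]
        for y in product((0, 1), repeat=m0 - 1):
            k = sum(y)  # number of Y_j equal to 1
            prob = (1 - q_prime) ** k * q_prime ** (m0 - 1 - k)
            denom = w_i + sum(w * y_j for w, y_j in zip(others, y))
            total += prob * w_i / denom
    return total

def lemma_a1_rhs(m0, q_prime):
    """Closed form stated in the lemma: (1 - q'^m0) / (1 - q')."""
    return (1 - q_prime ** m0) / (1 - q_prime)
```

For weights (1, 3) and a first-stage level of .2, both sides equal 1.2, and the agreement holds for unequal positive weights in general because each subset contributes exactly 1 after summing over its members.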
Proof. For convenience we index the null p values and their associated weights by {1, . . . , m0}. Let S(k, i) and S(k) denote all possible subsets of size k from {1, . . . , i − 1, i + 1, . . . , m0} and {1, . . . , m0}, respectively.

$$\sum_{i=1}^{m_0} w_i\, E\left(\frac{1}{w_i + \sum_{j=1,\,j\neq i}^{m_0} w_j Y_j}\right) = \sum_{i=1}^{m_0}\sum_{k=0}^{m_0-1}\sum_{s\in S(k,i)} (1-q')^{k}(q')^{m_0-1-k}\,\frac{w_i}{w_i + \sum_{j\in s} w_j}$$

$$= \sum_{k=0}^{m_0-1} (1-q')^{k}(q')^{m_0-1-k} \sum_{i=1}^{m_0}\sum_{s\in S(k,i)} \frac{w_i}{w_i + \sum_{j\in s} w_j}$$

$$= \sum_{k=0}^{m_0-1} (1-q')^{k}(q')^{m_0-1-k} \sum_{s\in S(k+1)}\sum_{j\in s} \frac{w_j}{\sum_{j\in s} w_j}$$

$$= \sum_{k=0}^{m_0-1} (1-q')^{k}(q')^{m_0-1-k} \binom{m_0}{k+1}.$$

The result is immediate:

$$\mathrm{WFDR} \le q' \sum_{i\in I_0} w_i\, E_{P^{(i)}}\left(\frac{1}{m_{02}(i)}\right) \le q' \sum_{i\in I_0} w_i\, E\left(\frac{1}{w_i + \sum_{j=1,\,j\neq i}^{m_0} w_j Y_j}\right) \le \frac{q'}{1-q'} = q.$$

A.2 Proof of Lemma 2

Let $A_0 = m_0/m$ and $A_1 = 1 - A_0$. Let us first show, under the conditions of Lemma 2, the asymptotic properties of our estimators.

Property 1. Let $A_0^{\infty} = \lim_{m\to\infty} \hat m_0/m$. Then $A_0^{\infty} \ge A_0$ with probability 1,

$$\frac{\hat m_0}{m} = \frac{m - R_1}{m(1-q)} \ge \frac{m_0 - V_1}{m(1-q)} = A_0\,\frac{1 - V_1/m_0}{1-q},$$

where $V_1$ and $R_1$ are the number of falsely rejected and rejected null hypotheses at the testing stage. The result follows because at the testing stage for a rejection, the p value must be <q, so $\lim_{m\to\infty} V_1/m_0 < q$.

Property 2. Let $\mu_c^{\infty} = \lim_{m\to\infty} \sum_{i=1}^{m}\sum_{l=1}^{c_i} z_{li} \big/ \sum_{i=1}^{m} c_i$. Then $\mu_c^{\infty} \le E(\bar Z_i)$ with probability 1; this follows from the condition on the cluster means in the theorem.

Property 3. The cluster testing procedure corresponds to a positive asymptotic threshold procedure with threshold δ (defined in Thm. 2).

The asymptotic p value of a location in the trimming stage is, using the notation in (6), $p_{li}^{\infty} = p_{li}(\delta, A_0, \mu_i, \rho_{li})$. The estimated p value is $p_{li}(u_1, \hat A_0, \hat\mu_i, \hat\rho_{li})$. By Slutsky’s theorem, $\hat p_{li}^{\infty} = \lim_{m\to\infty} p(u_1, \hat A_0, \hat\mu_i, \rho_{li}) \overset{d}{=} p(\delta, A_0^{\infty}, \mu_c^{\infty}, \rho_{li})$. Property 1 guarantees that $p(\delta, A_0^{\infty}, \mu_c^{\infty}, \rho_{li}) \ge p(\delta, A_0, \mu_c^{\infty}, \rho_{li})$. Property 2, together with the fact that $P_0(Z_{li} \ge z_{li} \mid \bar Z_i/\sigma_{\bar Z_i} \ge {}^{-1}(u_1), H_{1i})$ is decreasing in $\mu_i$ [because the joint density of $(Z_{li}, \bar Z_i/\sigma_{\bar Z_i} - \mu_i)$ is totally positive of order 2 and thus $P_0(Z_{li} \ge t \mid \bar Z_i/\sigma_{\bar Z_i} - \mu_i \ge x, H_{1i})$ is increasing in x for every fixed t], guarantees that $p(\delta, A_0, \mu_c^{\infty}, \rho_{li}) \ge p(\delta, A_0, \mu_i, \rho_{li})$. Therefore $\hat p_{li}^{\infty} \ge p_{li}^{\infty}$.

[Received February 2006. Revised June 2007.]

REFERENCES

Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical Society, Ser. B, 57, 289–300.
Benjamini, Y., and Hochberg, Y. (1997), “Multiple Hypotheses Testing With Weights,” Scandinavian Journal of Statistics, 24, 407–418.
Benjamini, Y., and Yekutieli, D. (2001), “The Control of the False Discovery Rate in Multiple Testing Under Dependency,” The Annals of Statistics, 29, 1165–1188.
Benjamini, Y., Krieger, A. M., and Yekutieli, D. (2006), “Adaptive Linear Step-up False Discovery Rate–Controlling Procedures,” Biometrika, 93, 491–507.
Cressie, N. (1991), Statistics for Spatial Data, New York: Wiley.
Ferreira, J., and Zwinderman, A. (2006), “On the Benjamini–Hochberg Method,” The Annals of Statistics, 34, 1827–1849.
Genovese, C., and Wasserman, L. (2002), “Operating Characteristics and Extensions of the False Discovery Rate Procedure,” Journal of the Royal Statistical Society, Ser. B, 64, 499–517.
Genovese, C., and Wasserman, L. (2004), “A Stochastic Process Approach to False Discovery Control,” The Annals of Statistics, 32, 1035–1061.
Genovese, C., Lazar, N., and Nichols, T. (2002), “Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate,” NeuroImage, 15, 870–878.
Heller, R., Stanley, D., Yekutieli, D., Rubin, N., and Benjamini, Y. (2006), “Cluster Based Analysis of FMRI Data,” NeuroImage, 33, 599–608.
Kling, Y. (2005), “Issues of Multiple Hypothesis Testing in Statistical Process Control,” thesis, The Neiman Library of Exact Sciences & Engineering, Tel-Aviv University.
Pacifico, M., Genovese, C., Verdinelli, I., and Wasserman, L. (2004), “False Discovery Control for Random Fields,” Journal of the American Statistical Association, 99, 1002–1014.
Penny, W., and Friston, K. (2003), “Mixtures of General Linear Models for Functional Neuroimaging,” IEEE Transactions on Medical Imaging, 22, 504–514.
Sartorius, B., Jacobsen, H., Torner, A., and Giesecke, J. (2006), “Description of a New All-Cause Mortality Surveillance System in Sweden as a Warning System Using Threshold Detection Algorithms,” European Journal of Epidemiology, 21, 181–189.
Shen, X., Huang, H.-C., and Cressie, N. (2002), “Nonparametric Hypothesis Testing for a Spatial Signal,” Journal of the American Statistical Association, 97, 1122–1140.
Storey, J., Taylor, J., and Siegmund, D. (2004), “Strong Control, Conservative Point Estimation, and Simultaneous Conservative Consistency of False Discovery Rates: A Unified Approach,” Journal of the Royal Statistical Society, Ser. B, 66, 187–205.
Worsley, K., and Friston, K. (2000), “A Test for Conjunction,” Statistics and Probability Letters, 47, 135–140.
Yekutieli, D., Reiner, A., Elmer, G., Kafkafi, N., Letwin, N., Lee, N., and Benjamini, Y. (2006), “Approaches to Multiplicity Issues in Complex Research in Microarray Analysis,” Statistica Neerlandica, 60, 414–437.
