December 2007
Summary. A challenging and crucial issue in clinical studies in cancer involving gene microarray experiments is the discovery, among a large number of genes, of a relatively small panel of genes whose elements are associated with a relevant clinical outcome variable such as time-to-death or time-to-recurrence of disease. A semiparametric approach, using dependence functions known as copulas, is considered to quantify and estimate the pairwise association between the outcome and each gene expression. These time-to-event type endpoints are typically subject to censoring as not all events are realized at the time of the analysis. Furthermore, given that the total number of genes is typically large, it is imperative to control a relevant error rate in any gene discovery procedure. The proposed method addresses the two aforementioned issues by direct incorporation of the censoring mechanism and by appropriate statistical adjustment for multiplicity. The performance of the proposed method is studied through simulation and illustrated with an application using a case study in lung cancer.

Key words: Copula; Dependence; Multiple testing; Multivariate distribution; Permutation resampling; Semiparametric model; Survival analysis.
1. Introduction

Given the emergence and promise of high throughput of genomic technology, modern clinical studies in cancer are complemented by bioinformatics correlative science companions involving the analysis of large-scale gene microarray expression data. A challenging and crucial objective is to investigate potential associations among the clinical endpoints of the study, such as time to recurrence or time to death, with these genes. More specifically, what is of interest is to find, among a large panel of genes, a relatively small subpanel of genes whose expressions are associated with these clinical outcomes. In many cases, the clinical outcome is subject to censoring, as the corresponding event for a patient may not have occurred at the time of the final analysis or for example if that patient were to be lost to follow-up. As such, the appropriate modeling of the dependence structure and incorporation of the censoring mechanism are imperative. Furthermore, given that relative to the number of samples, the number of genes is rather large, one must appropriately control a relevant error rate (e.g., the family-wise error rate [FWER] or the false-discovery rate [FDR]) in the gene discovery procedure.

Our proposed semiparametric approach assumes a pairwise parametric dependence structure between each gene expression and the time-to-event outcome using dependence functions known as parametric copulas. Parametric copulas, for which key concepts and standard examples, albeit outlined briefly, are provided in the next section, render any joint distribution function expressible as a function of its marginals and a scalar, which stipulates the degree of dependence. The model is semiparametric in the sense that although the pairwise dependence structure is parametric, by virtue of the chosen copula, no parametric assumptions are imposed on the marginal distributions of the time-to-event variable and the gene expressions. The estimation will be carried out by optimizing a pseudolikelihood, which allows for direct incorporation of the censoring mechanism. For illustrative purposes, we will address the issue of multiplicity by controlling a family-wise type error rate. The proposed method, to be discussed at a later point, is flexible in allowing other adjustment methods for multiplicity.

In the forthcoming discussions, it is assumed that, following various preprocessing steps (e.g., normalization, filtering, and aggregation), what is made available, in addition to the clinical outcome data, for the final analysis, is a matrix of observed expression data of dimension n × K, where n and K denote the number of patients and genes, respectively. The effects and ramifications of these processing steps may be profound and there is potential for missing data. We point out in advance that the proposed method, as presented here, may not adequately account for these effects.
© 2007, The International Biometric Society
Biometrics, December 2007
There is a vast amount of literature concerning the problem of associating gene expressions to a time-to-event endpoint. A limited overview of some of the methods discussed in this literature is provided next. The pairwise relationships between each gene expression and the time-to-event outcome can be estimated within the framework of a Cox model (e.g., Dhanasekaran et al., 2001; Wigle et al., 2002) or by using a rank-covariance estimator (Jung, Owzar, and George, 2005). Another frequently used approach (see, e.g., André et al., 2002; Shannon et al., 2002) circumvents the censoring issue by considering an event-type, rather than a time-to-event-type, endpoint obtained by dichotomizing the outcome variable at a given clinically relevant time point (e.g., 2-year survival). The dichotomization renders the problem amenable to classical statistical methods for two-sample problems (e.g., t-test or Wilcoxon test) as well as classification and statistical learning methods (e.g., neural nets, support vector machines, k-nearest neighbors, classification and regression trees). The dichotomization approach may be biased as the length of follow-up of patients in a clinical study varies and may not be fully efficient as it fails to incorporate all available follow-up data. Rather than the response, the gene expressions may be dichotomized (e.g., under- versus overexpressed) and the discrepancy in the time-to-event profiles may then be assessed using, for example, the log-rank statistic (e.g., Jenssen et al., 2002). The results based on this approach may be sensitive to the cutoff. Finally, we mention the method proposed by Li and Luan (2003), who employ the support vector machine approach by using the negative of the log-partial-likelihood of the Cox model as the loss function. Shoemaker and Lin (2005) provide, in addition to a review article with an extensive bibliography (see chapter 2), a number of articles employing a diverse collection of methods for this problem.

2. Copulas: An Overview

The following section will provide a brief overview of some of the key concepts behind copulas in relation to the study of bivariate distributions and dependence. To this end, we have often found the presentation of the following example useful. It is a matter of simple algebraic manipulations to show that a standard bivariate (mean zero and unit variance) normal distribution function, say $F$ with parameter $\theta$, is expressible as

$$F[x, y] = C_N[F_1[x], F_2[y], \theta], \tag{1}$$

for $(x, y)^T \in \mathbb{R}^2$, where

$$C_N[u, v; \theta] = \int_{-\infty}^{\xi_u} \int_{-\infty}^{\xi_v} \frac{1}{2\pi\sqrt{1 - \theta^2}} \exp\left\{-\frac{a^2 - 2\theta ab + b^2}{2(1 - \theta^2)}\right\} da\, db, \tag{2}$$

for $(u, v)^T \in [0, 1]^2$, and where $\xi_p$ denotes the quantile function for a univariate standard normal distribution evaluated at $p \in [0, 1]$. Here $F_1$ and $F_2$ are both standard univariate normal distribution functions. What one should take away from this example is that, as shown in (1), the joint distribution function $F$ has, by virtue of the function $C_N$, been rendered expressible as a function of its marginals $F_1$ and $F_2$ and a finite-dimensional parameter $\theta$ which, loosely speaking, stipulates the type (i.e., negative or positive) and the amount of dependence. Moreover, it is noted that the marginals do not depend on $\theta$. It turns out that there exists a very rich class of functions, namely parametric copulas, which yield bivariate distributions expressible as a function of their marginals and a finite-dimensional dependence parameter.

A copula, from a statistical point of view, can be defined as a bivariate distribution function with uniform marginals. An important result, generally attributed to Sklar (1959), stipulates that for any bivariate distribution function $F$, with marginals say $F_1$ and $F_2$, there exists a copula $C$ such that $F$ is, as in (1), expressible as

$$F[x, y] = C[F_1[x], F_2[y]], \tag{3}$$

for $(x, y)^T$ in the support of $F$. Furthermore, if $F_1$ and $F_2$ are continuous, then the given representation is unique. It should be noted that these $C$ functions have aptly been named copulas as they couple a joint distribution to its marginals. A trivial but rather useful converse to this result states that for any given copula function $C$ and pair of continuous marginals $F_1$ and $F_2$ the function defined as $F[x, y] := C[F_1[x], F_2[y]]$ yields a bona fide bivariate distribution function. Sklar's result and its converse also extend to joint survival functions (i.e., a bivariate survival function is expressible as a function of its marginal survival functions via a copula). We will limit our attention to the case where both marginals are continuous, to ensure identifiability of $C$ in (3), and shall refer to $C$ as the copula generating the distribution $F$.

Sklar's result relates copulas to the study of bivariate distribution functions. Next, we will attempt to illustrate how copulas relate more specifically to the study of statistical dependence. Most standard measures of association, such as Pearson's correlation and Spearman's rank correlation measure, as thoroughly discussed in Schweizer and Wolff (1981), can be expressed in terms of the copula generating the corresponding distribution function. For example, Spearman's rank correlation coefficient can be expressed, by using a simple integral transformation, as

$$\rho[F] = 12 \int_{\mathbb{R}} \int_{\mathbb{R}} \{F[x, y] - F_1[x] F_2[y]\}\, dF_1[x]\, dF_2[y] = 12 \int_0^1 \int_0^1 \{C[u, v] - uv\}\, du\, dv. \tag{4}$$

The last expression can also be deduced from the fact that $\rho[F]$ is the Pearson correlation of $(F_1[X], F_2[Y])^T$. More precisely, $(X, Y)^T$ is distributed according to $C[F_1[x], F_2[y]]$ if and only if $(F_1[X], F_2[Y])^T$, a random pair with uniform marginals with variances of 1/12, is distributed according to $C[u, v]$. Note that, as expected, the copula representation of Spearman's coefficient of $F$ is free of the marginals and completely specified by virtue of its generating copula.

The functions defined as $C_0[u, v] = uv$, $C_-[u, v] = \max\{u + v - 1, 0\}$ and $C_+[u, v] = \min\{u, v\}$, for every $(u, v) \in [0, 1]^2$, are all copulas. Any copula is bounded from below and from above by $C_-$ and $C_+$, which are generally referred to as the lower and upper Fréchet–Hoeffding bounds, respectively. Given that $(X, Y)^T$ is a random pair, whose joint distribution is generated by a copula $C$, then trivially $X$ and $Y$ are independent if and only if $C$ attains $C_0$. Moreover, $X$ is an almost
A Copula Approach for Detecting Prognostic Genes 1091
surely increasing (decreasing) function of $Y$ if $C$ attains the upper (lower) Fréchet–Hoeffding bound. Note that, from a practical point of view, the independence and Fréchet–Hoeffding bounds may be of limited use for the empirical study of dependence, as this would require the assessment of the empirical discrepancy between the estimated copula surface and the three bounds. Fortunately, there exists a rich subfamily of …

The normal copula $C_N$ (2) has already been introduced. For this copula, $C_N \uparrow (\downarrow)\, C_0$ as $\theta \uparrow (\downarrow)\, 0$, $C_N \downarrow C_-$ as $\theta \downarrow -1$ and $C_N \uparrow C_+$ as $\theta \uparrow +1$, where $\uparrow$ ($\downarrow$) denotes monotone increasing (decreasing) pointwise convergence. Frank's copula (Frank, 1979) is defined as

$$C_F[u, v, \theta] = -\frac{1}{\theta} \log\left\{1 - \frac{(1 - \exp[-\theta u])(1 - \exp[-\theta v])}{1 - \exp[-\theta]}\right\}, \tag{5}$$

where $\theta \in \mathbb{R} - \{0\}$. For this copula $C_F \uparrow (\downarrow)\, C_0$ as $\theta \uparrow (\downarrow)\, 0$, $C_F \downarrow C_-$ as $\theta \downarrow -\infty$ and $C_F \uparrow C_+$ as $\theta \uparrow +\infty$. Finally, Clayton's copula (Clayton, 1978) is defined as

$$C_C[u, v, \theta] = \{u^{1-\theta} + v^{1-\theta} - 1\}^{\frac{1}{1-\theta}}, \tag{6}$$

where $\theta \in (1, \infty)$. It can be shown that $C_C \downarrow C_0$ as $\theta \downarrow 1$ and $C_C \uparrow C_+$ as $\theta \uparrow +\infty$. We note that the normal copula and Frank's copula admit negative as well as positive dependence structures, while Clayton's copula only admits a positive dependence structure. Furthermore, for all these copulas the convergence to $C_0$, $C_-$, or $C_+$ is monotone in $\theta$. As such $\theta$ provides insight about the direction (i.e., positive or negative) as well as degree of dependence.

2.2 Further Reading

As the provision of a comprehensive account on copulas and their relation to the study of bivariate distributions and dependence is not within the purview of this article, what has been presented is a selection of topics in an attempt to motivate their utility. Most, but not all, of the properties discussed naturally extend to higher dimensions. The monographs by Joe (1997) and Nelsen (1998) provide comprehensive accounts on copulas. In particular, both monographs provide an extensive list of parametric copulas and their related properties. Both of these references also discuss the application of copulas in the context of survival functions and provide comprehensive lists of references on topics related to copulas.

3. Copula Modelings in Microarrays

3.1 General Hypothesis

With the preliminary notion of copulas outlined in Section 2, our main objective is to incorporate the copula models in microarray studies. We start with a general hypothesis formulation. Let $T$ denote a realization of the clinical time-to-event variable of interest and let $X_1, \ldots, X_K$ denote generic expressions for genes 1 through $K$. Gene $k \in \{1, \ldots, K\}$ is considered to be prognostic if its corresponding expression, namely $X_k$, is associated with the time-to-event variable $T$. The corresponding hypotheses can be canonically presented as testing $H_{0k} : T \perp X_k$ versus $H_{1k} : T \not\perp X_k$, where the symbols $\perp$ and $\not\perp$ are used to denote stochastic independence and dependence, respectively. The main hypotheses of interest are canonically presented as testing

$$H_0 : \bigcap_{k=1}^{K} H_{0k} \quad \text{versus} \quad H_1 : \bigcup_{k=1}^{K} H_{1k}.$$

The null and alternative hypotheses formulated above can be identified as a (finite) union-intersection hypothesis testing problem (see Roy, 1953). It is assumed that for each $k \in \{1, \ldots, K\}$, the joint distribution of $(T, X_k)^T$ is generated by a parametric copula $C[u, v, \theta_k]$ such that

$$F_k[t, x] = C[F_0[t], F_k[x], \theta_k], \tag{9}$$

where $F_0$ and $F_k$ are the marginal distributions of $T$ and $X_k$, respectively. Note that under this assumption, as $C$ is not subscripted by $k$, the pairs $(T, X_k)^T$ all share the same dependence structure. What potentially differs among these pairs is the marginal $F_k$ or the dependence parameter $\theta_k$. Although one can weaken the assumption of a common copula, it is not deemed to be practical for the problem at hand. We note that to associate $F_k$ and $\theta_k$ in (9) with gene $k$, we have subscripted them by $k$. This should not give the erroneous impression that the marginal depends on the parameter $\theta_k$. We may reformulate the hypotheses of interest in terms of testing

$$H_0 : \bigcap_{k=1}^{K} \{C[u, v, \theta_k] = uv \text{ for all } (u, v)^T \in [0, 1]^2\}, \tag{10}$$

versus

$$H_1 : \bigcup_{k=1}^{K} \{C[u, v, \theta_k] \neq uv \text{ for some } (u, v)^T \in [0, 1]^2\}. \tag{11}$$

We will assume that the variable $T$ is subject to some independent right censoring mechanism. As such, what is observed, at the time of the analysis, are not $T_n = (T_1, \ldots, T_n)^T$, the actual event times, but rather $Y_n = (Y_1, \ldots, Y_n)^T$, where $Y_i = \min\{T_i, C_i\}$ and $C_n = (C_1, \ldots, C_n)^T$ are the censoring times. The corresponding event variables are defined as $\Delta_i = I[T_i \le C_i]$. It may be helpful to visualize the observed data as the $n \times (2 + K)$ matrix

$$\Big[\underbrace{(Y_n, \Delta_n)}_{n \times 2} \;\; \underbrace{X_{n,1}, \ldots, X_{n,K}}_{n \times K}\Big], \tag{12}$$

where $X_{n,k} = (X_1^k, \ldots, X_n^k)^T$ denotes the vector of the $n$ expressions for gene $k \in \{1, \ldots, K\}$ and $\Delta_n = (\Delta_1, \ldots, \Delta_n)^T$ denotes the vector of event variables.

3.2 Single Gene Case

In this section, we will outline the estimation procedure to be employed for estimating the dependence between the time-to-event response and a single gene. Following the notation of the preceding section, the joint distribution of $(T, X_k)^T$ is given by

$$F_k[t, x] = C[F_0[t], F_k[x], \theta_k], \tag{13}$$

… obtain an estimator by optimizing the corresponding pseudolikelihood as:

$$\hat\theta_k^n = \theta_k^n\big[\hat F_0^n, \hat F_k^n\big] = \operatorname*{argmax}_{\theta \in \Theta}\, \ell_n\big[\theta \mid \hat F_0^n, \hat F_k^n\big]. \tag{21}$$
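As a rough illustration of the estimation step in (21), the following Python sketch maximizes a normal-copula pseudo-likelihood for a single gene. The likelihood (17) is not reproduced in this excerpt, so the sketch assumes the usual right-censored form: the copula density for an observed event and the conditional tail probability $1 - \partial C / \partial v$ for a censored one. The marginal estimators (a Kaplan-Meier estimate for $F_0$ and a rescaled empirical distribution function for $F_k$), the clamping constant, the grid search, the toy data, and all function names are assumptions made for illustration; none of them is taken from the authors' R code.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def km_cdf(times, events):
    """Kaplan-Meier estimate of F0 at each observed time (assumes no ties),
    rescaled by n / (n + 1) so that it never reaches 1."""
    n = len(times)
    surv, at_risk = 1.0, n
    cdf = [0.0] * n
    for i in sorted(range(n), key=lambda j: times[j]):
        if events[i]:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
        cdf[i] = (1.0 - surv) * n / (n + 1)
    return cdf


def ecdf(values):
    """Empirical CDF of the expressions, rescaled as rank / (n + 1)."""
    n = len(values)
    cdf = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: values[j]), start=1):
        cdf[i] = rank / (n + 1)
    return cdf


def normal_scores(probs, eps=1e-6):
    """Map probabilities to normal quantiles, clamped away from 0 and 1."""
    return [PHI.inv_cdf(min(max(p, eps), 1.0 - eps)) for p in probs]


def pseudo_loglik(theta, a, b, events):
    """Log pseudo-likelihood of the normal copula under right censoring."""
    s2 = 1.0 - theta * theta
    total = 0.0
    for ai, bi, di in zip(a, b, events):
        if di:  # event observed: log copula density at (u, v)
            total += -0.5 * math.log(s2) - (
                theta * theta * (ai * ai + bi * bi) - 2.0 * theta * ai * bi
            ) / (2.0 * s2)
        else:   # censored: log P(U > u | V = v), computed via erfc for accuracy
            x = (ai - theta * bi) / math.sqrt(s2)
            tail = 0.5 * math.erfc(x / math.sqrt(2.0))
            total += math.log(max(tail, 1e-300))
    return total


def estimate_theta(times, events, expr):
    """Grid-search maximizer of the pseudo-likelihood over theta in (-1, 1)."""
    a = normal_scores(km_cdf(times, events))
    b = normal_scores(ecdf(expr))
    grid = [k / 100.0 for k in range(-95, 96)]
    return max(grid, key=lambda th: pseudo_loglik(th, a, b, events))


# Toy data (assumed): normal scores with correlation 0.6, uniform censoring.
random.seed(1)
times, events, expr = [], [], []
for _ in range(200):
    z1 = random.gauss(0.0, 1.0)
    z2 = 0.6 * z1 + 0.8 * random.gauss(0.0, 1.0)
    t, c = math.exp(z1), random.uniform(0.0, 6.0)
    times.append(min(t, c))
    events.append(t <= c)
    expr.append(z2)

theta_hat = estimate_theta(times, events, expr)
print(f"estimated theta: {theta_hat:.2f}")
```

The grid search is used only to keep the sketch free of external dependencies; a one-dimensional numerical optimizer would normally take its place.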
Table 3
This table illustrates the empirical power, TRR and FRR as defined in Table 1, under $H_1^{K:D}$ (30) at the nominal two-sided FWER level of α = 0.05. Each example is based on N = 1000 simulations where the sampling distribution of (28) based on the normal copula is approximated with B = 1000 random permutations. In this table K denotes the number of genes while D denotes the number of prognostic genes. The data were simulated from a block-compound-symmetric normal distribution given by (35), (36), and (37). Here θ denotes the correlation between the prognostic genes and the time-to-event variable, ρ denotes the correlation between the gene expressions and q denotes the expected proportion of censored observations. The censoring variables were drawn from a uniform distribution.
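The permutation approximation mentioned in the caption can be sketched as follows. Equations (26), (28), and (29) are not reproduced in this excerpt, so the sketch assumes a single-step adjustment in which each observed statistic is compared with the permutation distribution of the maximum absolute statistic over all genes. Spearman's rank correlation on uncensored toy data stands in for the copula estimate $\hat\theta_k^n - \theta_0$, and the function names are hypothetical.

```python
import random


def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    out = [0] * len(values)
    order = sorted(range(len(values)), key=lambda j: values[j])
    for r, i in enumerate(order, start=1):
        out[i] = r
    return out


def rank_corr(x, y):
    """Spearman correlation; a stand-in for the centered copula estimate."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def maxstat_adjusted_pvalues(outcome, genes, B=500, seed=0):
    """Single-step permutation adjusted P-values based on max_k |statistic_k|."""
    rng = random.Random(seed)
    observed = [abs(rank_corr(outcome, g)) for g in genes]
    exceed = [0] * len(genes)
    shuffled = list(outcome)
    for _ in range(B):
        # Permute the outcomes only, so each patient's expression vector
        # stays intact and the dependence among genes is preserved.
        rng.shuffle(shuffled)
        max_perm = max(abs(rank_corr(shuffled, g)) for g in genes)
        for k, obs in enumerate(observed):
            if max_perm >= obs:
                exceed[k] += 1
    return [e / B for e in exceed]


# Toy data (assumed): 40 patients, 20 genes, gene 0 tracks the outcome.
rng = random.Random(42)
n, K = 40, 20
outcome = [rng.random() for _ in range(n)]
genes = [[rng.random() for _ in range(n)] for _ in range(K)]
genes[0] = [t + 0.05 * rng.random() for t in outcome]

adj_p = maxstat_adjusted_pvalues(outcome, genes)
print("adjusted P-value for gene 0:", adj_p[0])
```

With a censored outcome, the pairs $(Y_i, \Delta_i)$ would presumably be permuted together in the same way, with the correlation replaced by the estimate from (21).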
The 30 (out of K = 4966) genes with the smallest adjusted P-values, as described in (29), are displayed in Table 4. It is noted that none of the genes were found to be significant at the two-sided 0.05 FWER level. The gene with the smallest adjusted P-value was FUCA1 with an adjusted P-value of 0.0995. The BCR gene has an adjusted P-value of 0.1597. The next smallest P-value is 0.3077 for gene LCK. The intersection between the top thirty genes in Table 4 with that of the genes in the top 100 list of Beer et al. (2002), and with that of the genes listed in Table III of Jung et al. (2003), have been indicated in Table 4. These data were also analyzed by Jung et al. (2005), who have reported KIAA0084 and NP, with adjusted two-sided P-values of 0.0227 and 0.0769, respectively. Both these genes are included in Table 4 but with larger adjusted P-values of 0.3815 and 0.4722, respectively. It should be pointed out that the analysis carried out by Beer et al. (2002) seemingly does not adjust for multiplicity. As such, a direct comparison to the results presented by our method and those reported in Jung et al. (2005) may not be appropriate. This data set has also been analyzed in several other papers (see Shoemaker and Lin, 2005).

5. Final Remarks

Although the proposed methodology does not make parametric assumptions on the marginal distributions, it does make, by virtue of the chosen parametric copula, parametric assumptions on the dependence structure. The estimation of the dependence parameter for many copulas is rather tricky around the value $\theta_0$ at which the copula attains $C_0$. Many copulas, such as Frank's copula (5), attain $C_0$ only in the limit. The normal copula was employed for the simulation and case studies presented in this paper as it enjoys computational robustness, which is an attractive feature from a practical point of view. The normal copula, however, does not admit so-called tail dependence (see, e.g., Joe, 1997; Nelsen, 1998). The latter refers to the dependence property which stipulates that extreme realizations of one variable are associated with extreme realizations of the other dependent variable. Furthermore, by virtue of the assumption in (13), we have implicitly assumed, as the copula $C$ does not depend on $k$, that the pairwise dependence structures between $T$ and all $K$ genes are identical up to the parameter $\theta_k$. If a small panel of interesting target genes were to be identified, for example from findings of a previously conducted pilot study, then one may consider limiting the analysis to the genes in the panel and model the dependence structure for each of the target genes using copulas. This may, needless to say, result in modeling the genes using different copulas. The copula model selection methods discussed in Wang and Wells (2000b) may be helpful to this end. It is important to note that under independence any copula reduces to the independence copula ($C[u, v, \theta_0] = uv$). As such, any misspecification due to a wrong choice of the copula will not have an effect on FWER and FRR. Such misspecification may degrade the power and TRR.

In addition to the copula method proposed in this paper, as we have already pointed out in the literature review, one may consider using Cox regression or the rank-covariance estimator discussed in Jung et al. (2005) to model the pairwise associations. The latter can be thought of as a nonparametric counterpart to Cox regression. More specifically, one may think of the latter as Cox regression on the ranks of the covariate. Additional details are provided in Jung et al. (2005) on page 3079. For notational brevity, we will refer to this as the nonparametric Cox method. What all of these three methods have in common is that they enable the direct incorporation of the censoring mechanism and the covariate (i.e., gene
did not notice any considerable changes in the results. The code used for the numerical work presented in this paper is available, for use with R (R Development Core Team, 2006), from the first author. Currently, the code only incorporates the normal copula and offers some rudimentary facilities for parallelization via Snow and Rmpi (Rossini, Tierney, and Li, 2003).

The multiple testing method employed in this paper aims to control a FWER to ensure that the probability of erroneously classifying any gene as prognostic given that none of the genes are prognostic is adequately controlled. We reiterate that the main thrust of this paper is intended to be the illustration of the applicability of copulas in applied problems and not the advocacy of any specific method for multiple testing. A common criticism leveled against the employment of FWER is that it is too conservative. Alternatively, one may consider controlling a FDR as for example described in Benjamini and Hochberg (1995) or Storey (2002). To this end, the unadjusted P-value for gene $k$ can be easily approximated either by recalling that the distribution of $\sqrt{n}(\hat\theta_k^n - \theta_0)$ under $H_{0k}$ is asymptotically normal or by using

$$\tilde p_k^{B,*} = \frac{1}{B} \sum_{j=1}^{B} I\left[\hat\theta_k^n > \tilde\theta_j^{K,n}\right], \tag{40}$$

based on $B$ permutation resampling replicates instead of the adjusted P-value (29). Many FDR type methods assume that the $K$ tests are mutually independent. It is noted that the aforementioned unadjusted P-values are not necessarily mutually independent. The permutation FWER adjusted P-values based on the proposed method were presented in Table 4, where we have additionally provided the permutation unadjusted P-values, using (40), along with the corresponding q-values as discussed in Storey (2002). All of the q-values for the top thirty gene list are less than 0.1.

As pointed out in the previous section, the top thirty list obtained by the method proposed in this paper does not coincide with that obtained by the nonparametric Cox method. Given that the methods used in Beer et al. (2002) are based on asymptotics and do not adjust for multiplicity, we decided to repeat our permutation based analysis using parametric Cox regression using B = 10,000 replicates, so as to enable a direct comparison among the three methods when applied to this data set. The resulting list is provided in Table 5. At a two-sided FWER level of 0.15, only two genes, TMF1 and ADM, with adjusted P-values of 0.1096 and 0.1155, are significant. Overall this list of genes differs considerably from those of Table 4 (copula method) and Table II (Jung et al., 2005; nonparametric Cox method). We also observe that this list contains two housekeeping genes (probeset ids start with the string AFFX), one of which has an adjusted P-value of less than 0.5. The comparison of the lists based on the three methods, all of which theoretically control FWER, suggests that the results are greatly sensitive to the method employed.

Table 5
This table lists the top thirty (out of K = 4966) adjusted P-values ($\tilde P_k^B$) for the data presented in Beer et al. (2002) by modeling the pairwise associations using a log-linear Cox model. The P-values are calculated based on (29) using B = 10,000 permutation replicates.

Symbol      Probeset ID                   Adjusted P-value
HPCAL1      D16227_at                     0.9131
KIAA0263    D87452_at                     0.9113
FUT3        U27326_s_at                   0.8822
NULL        AFFX-HUMGAPDH/M33197_5_at     0.8747
SUI1        L26247_at                     0.8224
RPS3        X55715_at                     0.8118
KIAA0020    D13645_at                     0.8029
UGP2        U27460_at                     0.7798
S100P       X65614_at                     0.7111
FLJ20493    HG613-HT613_at                0.6838
BM-002      Z70222_at                     0.6809
ARHE        S82240_at                     0.6787
GAPD        X01677_f_at                   0.6778
RPC         Y11651_at                     0.6520
P63         X69910_at                     0.6282
H2AFZ       M37583_at                     0.6110
INHA        X04445_rna1_s_at              0.5628
KIAA0005    D13630_at                     0.4733
KIAA0084    D42043_at                     0.4718
NULL        AFFX-HUMGAPDH/M33197_M_at     0.4492
5T4         Z29083_at                     0.4466
GARS        U09510_s_at                   0.4090
HRB         L42025_rna1_at                0.3719
SLC2A1      K03195_at                     0.3618
CASP4       U28014_at                     0.3520
CDC6        U77949_at                     0.3279
STC1        U25997_at                     0.2442
PEX7        U88871_at                     0.2303
ADM         D14874_at                     0.1155
TMF1        L01042_at                     0.1096

We have considered the so-called two-sided problem in our discussions. The one-sided variants are easily obtained by using the following modified versions of (26)

$$\Lambda_-\big[\tilde\theta^{K,n}; \theta_0\big] = \min\big\{\tilde\theta_1^n - \theta_0, \ldots, \tilde\theta_K^n - \theta_0\big\}, \tag{41}$$

and

$$\Lambda_+\big[\tilde\theta^{K,n}; \theta_0\big] = \max\big\{\tilde\theta_1^n - \theta_0, \ldots, \tilde\theta_K^n - \theta_0\big\}, \tag{42}$$

and by using the α and 1 − α quantiles, respectively.

The proposed method, as presented, is useful for studies where the objective is to identify individual genes which are associated with the time-to-event outcome. In some cases, what may be of interest is to identify collections or clusters of genes, sometimes referred to as metagenes or gene signatures, which are associated with the outcome variable. The proposed method may be easily modified to accommodate this multivariate approach. One can, for example, create metagenes as linear combinations of the gene-expression vectors, using an appropriate dimension-reduction technique, and then identify those vectors which are associated with the time-to-event variable using the method proposed in this paper. One may also consider using the proposed method to filter out genes whose unadjusted P-values are below a certain threshold and then attempt to find clusters among the remaining genes using unsupervised learning methods.

We have limited our discussion to the case of independent right censoring. The assumption of independence can be
weakened to that of conditional independence on the gene expression. Other censoring mechanisms (e.g., interval censoring) may be considered by expressing the likelihood function in (17) mutatis mutandis in terms of the copula function and its partial derivatives. The plug-in estimation method of the dependence parameter of bivariate copulas in the framework of bivariate survival subject to interval censoring has been presented in Sun, Wang, and Sun (2006).

6. Supplementary Material

Supplementary materials for this paper may be accessed at the Biometrics website http://www.tibs.org/biometrics.

Acknowledgements

The authors would like to thank the editor, an associate editor, and two referees for insightful comments and suggestions that have led to a substantially improved paper.

References

André, A., Karn, T., Solbach, C., Seiter, T., Strebhardt, K., Holtrich, U., and Kaufmann, M. (2002). Identification of high risk breast-cancer patients by gene expression profiling. Lancet 359, 131–132.

Beer, D. G., Kardia, S. L., Huang, C. C., et al. (2002). Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine 8, 816–824.

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B: Methodological 57, 289–300.

Bickel, P. J., Klaasen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. New York: Springer-Verlag.

Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65, 141–152.

Dhanasekaran, S. M., Barrette, T. R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K. J., Rubin, M. A., and Chinnaiyan, A. M. (2001). Delineation of prognostic biomarkers in prostate cancer. Nature 412, 822–826.

Fleming, T. R. and Harrington, D. P. (1984). Nonparametric estimation of the survival distribution in censored data. Communications in Statistics, Part A - Theory and Methods 13, 2469–2486.

Frank, M. J. (1979). On the simultaneous associativity of f(x,y) and x + y − f(x,y). Aequationes Mathematicae 14, 194–226.

Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82, 543–552.

Jenssen, T. K., Kuo, W. P., Stokke, T., and Hovig, E. (2002). Associations between gene expressions in breast cancer and patient survival. Hum Genet 111, 411–420.

Joe, H. (1997). Multivariate Models and Dependence Concepts. Boca Raton, Florida: Chapman & Hall/CRC.

Jung, S.-H., Owzar, K., and George, S. L. (2005). A multiple testing procedure to associate gene expression levels with survival. Statistics in Medicine 24, 3077–3088.

Kepner, J. L., Harper, J. D., and Keith, S. Z. (1989). A note on evaluating a certain orthant probability (C/R: P291-292; Com: 91V45 p256). The American Statistician 43, 48–49.

Li, H. and Luan, Y. (2003). Kernel cox model for relating gene expression profiles to censored survival data. Pacific Symposium on Biocomputing 8, 65–76.

Nelsen, R. (1998). An Introduction to Copulas. New York: Springer-Verlag.

Oakes, D. (1982). A concordance test for independence in the presence of censoring. Biometrics 38, 451–455.

R Development Core Team (2006). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Rossini, A., Tierney, L., and Li, N. (2003). Simple parallel statistical computing in R. Technical Report 193, UW Biostatistics Working Paper Series.

Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Annals of Mathematical Statistics 24, 23–30.

Schweizer, B. and Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. The Annals of Statistics 9, 879–885.

Shannon, W. D., Watson, M. A., Perry, A., and Rich, K. (2002). Mantel statistics to correlate gene expression levels from microarrays with clinical covariates. Genetic Epidemiology 23, 87–96.

Shih, J. H. and Louis, T. A. (1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics 51, 1384–1399.

Shoemaker, J. and Lin, S. (eds). (2005). Methods of Microarray Data Analysis IV. New York: Springer-Verlag.

Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 8, 229–231.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B: Statistical Methodology 64, 479–498.

Sun, L., Wang, L., and Sun, J. (2006). Estimation of the association for bivariate interval-censored failure time data. Scandinavian Journal of Statistics 33, 637–649.

Wang, W. and Wells, M. T. (2000a). Estimation of Kendall's tau under censoring. Statistica Sinica 10, 1199–1215.

Wang, W. and Wells, M. T. (2000b). Model selection and semiparametric inference for bivariate failure-time data (C/R: p73-76). Journal of the American Statistical Association 95, 62–72.

Wigle, D. A., Jurisica, I., Radulovich, N., et al. (2002). Molecular profiling of nonsmall cell lung cancer and correlation with disease-free survival. Cancer Research 62, 3005–3008.

Received September 2005. Revised December 2006. Accepted January 2007.