
Evolution, 54(5), 2000, pp. 1480–1492

TAXON SAMPLING, CORRELATED EVOLUTION, AND INDEPENDENT CONTRASTS


DAVID D. ACKERLY
Department of Biological Sciences, Stanford University, Stanford, California 94305
E-mail: dackerly@stanford.edu

Abstract. Independent contrasts are widely used to incorporate phylogenetic information into studies of continuous traits, particularly analyses of evolutionary trait correlations, but the effects of taxon sampling on these analyses have received little attention. In this paper, simulations were used to investigate the effects of taxon sampling patterns and alternative branch length assignments on the statistical performance of correlation coefficients and sign tests; “full-tree” analyses based on contrasts at all nodes and “paired-comparisons” based only on contrasts of terminal taxon pairs were also compared. The simulations showed that random samples, with respect to the traits under consideration, provide statistically robust estimates of trait correlations. However, exact significance tests are highly dependent on appropriate branch length information; equal branch lengths maintain lower Type I error than alternative topological approaches, and adjusted critical values of the independent contrast correlation coefficient are provided for use with equal branch lengths. Nonrandom samples, with respect to univariate or bivariate trait distributions, introduce discrepancies between interspecific and phylogenetically structured analyses and bias estimates of underlying evolutionary correlations. Examples of nonrandom sampling processes may include community assembly processes, convergent evolution under local adaptive pressures, selection of a nonrandom sample of species from a habitat or life-history group, or investigator bias. Correlation analyses based on species pairs comparisons, while ignoring deeper relationships, entail significant loss of statistical power and as a result provide a conservative test of trait associations. Paired comparisons in which species differ by a large amount in one trait, a method introduced in comparative plant ecology, have appropriate Type I error rates and high statistical power, but do not correctly estimate the magnitude of trait correlations. Sign tests, based on full-tree or paired-comparison approaches, are highly reliable across a wide range of sampling scenarios, in terms of Type I error rates, but have very low power. These results provide guidance for selecting species and applying comparative methods to optimize the performance of statistical tests of trait associations.

Key words. Comparative methods, correlated evolution, independent contrasts, simulations, taxon sampling.

Received November 5, 1999. Accepted April 24, 2000.

In recent years, numerous analytical methods have been introduced to incorporate phylogenetic information into tests of evolutionary hypotheses such as rates of phenotypic evolution, lineage diversification, the timing and pattern of coevolution among host-parasite or plant-herbivore lineages, and correlations between phenotypic traits (Harvey and Pagel 1991; Sanderson and Donoghue 1994; Huelsenbeck and Rannala 1997; Pagel 1999). These methods focus on how to analyze interspecific data given a phylogenetic framework, but little attention has been paid to the problem of which taxa to select for analysis and how taxon sampling may influence the results. Taxon sampling has recently received increased attention in relation to phylogeny reconstruction (e.g., Hillis 1996; Graybeal 1998; Poe 1998), with particular focus on the relative value of more taxa versus more characters for improving phylogenetic accuracy. In studies of character evolution, it has been shown for discrete traits that the relative frequency of different character states will influence ancestral state reconstructions and the analysis of correlated trait changes (Frohlich 1987; Maddison 1990; Sillén-Tullberg 1993). For continuous traits, the sampling of taxa and the resulting distribution of character values will also have significant impacts on ancestral reconstructions (Cunningham et al. 1998). However, the effects of taxon sampling on the study of correlated evolution in continuous characters have not been explicitly considered.

In comparative studies, sampling of taxa within lineages of interest occurs for several reasons. First, all studies face methodological limitations of time and effort and it is often difficult or impossible to obtain data on all taxa in a clade. The resulting samples will frequently reflect geography (e.g., taxa that live in the researcher’s immediate vicinity), abundance (e.g., common taxa due to ease of study), research history (taxa that have been the focus of prior studies), and other practical considerations. These limitations will act more strongly when the data are expensive and/or time-consuming to gather (e.g., population genetic parameters, behavior). Second, many questions in evolutionary ecology address only a limited set of taxa that share certain life-history or other characteristics (Westoby 1999). For example, a study of the significance of seed size in relation to gap regeneration in tropical forests (Foster and Janson 1985) will necessarily be limited to tropical tree species, and as a group these species do not form a clade. Rather, they represent a very diffuse sample of taxa drawn from the entire seed plant lineage (>275,000 species). Any particular study would be limited to only a small sample, frequently a sample drawn from one geographic region, and analysis of these data in a phylogenetic context will necessarily use a heavily pruned phylogeny (e.g., Kelly and Purvis 1993; Kelly 1995). The consequences of this focus on particular life-history groups or communities and of the associated pruning of phylogenetic trees have received little attention in relation to phylogenetic comparative methods. Finally, in a historical context speciation and extinction also represent sampling processes, in the sense that trait-dependent changes in speciation or extinction probabilities will influence the relative frequency of character states in extant taxa. The resulting patterns may be the focus of attention for some questions (e.g., patterns of species sorting during mass extinction events). However, in most cases, studies are limited to extant taxa because the traits can only be measured in living organisms or populations, so the potential effects of nonrandom speciation/extinction processes merit consideration.
© 2000 The Society for the Study of Evolution. All rights reserved.

FIG. 1. (A) Example of a 256-taxon phylogeny based on a ‘‘time-only’’ Markovian speciation model (Martins 1996). (B) Phylogeny
of a random sample of 32 species drawn from the phylogeny in A, with true branch lengths based on the corresponding segments of the full
tree. (C) Phylogeny for the same set of species, with equal branch lengths. Circles show the pairs of sister taxa that would be selected for
species pair analyses of independent contrasts and sign tests. Tree graphics generated using TREEVIEW (Page 1996).
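
Fig. 1 shows the two ways a subsample is handled: the pruned tree either keeps the summed "true" branch lengths of the full tree (B) or has every branch reset to equal length (C), and sister pairs are read off the pruned tree. A minimal Python sketch of these operations follows, assuming a simple nested-node tree representation; the `Node` class and the functions `prune`, `equalize`, `sister_pairs`, and `tip_depths` are hypothetical illustrations, not the ACAP or COMPARE code used in this study.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    name: Optional[str] = None   # tip label; None for internal nodes
    length: float = 0.0          # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Copy of the subtree restricted to tips in `keep` (None if none remain).

    An internal node left with a single surviving child is collapsed and its
    branch length is added to the child's, so summed 'true' lengths survive."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kids = [k for k in (prune(c, keep) for c in node.children) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        only = kids[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(None, node.length, kids)


def equalize(node: Node, length: float = 0.0) -> Node:
    """Copy of the tree with every branch below the root set to length 1."""
    return Node(node.name, length, [equalize(c, 1.0) for c in node.children])


def sister_pairs(node: Node) -> List[Tuple[str, str]]:
    """Tips that are each other's closest relatives (terminal cherries)."""
    kids = node.children
    if len(kids) == 2 and not kids[0].children and not kids[1].children:
        return [(kids[0].name, kids[1].name)]
    return [p for c in kids for p in sister_pairs(c)]


def tip_depths(node: Node, depth: float = 0.0) -> Dict[str, float]:
    """Root-to-tip distance for every tip below `node`."""
    depth += node.length
    if not node.children:
        return {node.name: depth}
    out: Dict[str, float] = {}
    for c in node.children:
        out.update(tip_depths(c, depth))
    return out


# Hypothetical five-taxon tree, pruned to a sample of three species.
full = Node(children=[
    Node(length=2, children=[Node("A", 1), Node("B", 1)]),
    Node(length=1, children=[Node("C", 1),
                             Node(length=1, children=[Node("D", 1), Node("E", 1)])]),
])
pruned = prune(full, {"A", "C", "D"})
print(sister_pairs(full))    # [('A', 'B'), ('D', 'E')]
print(sister_pairs(pruned))  # [('C', 'D')]
print(tip_depths(pruned))    # {'A': 3.0, 'C': 2.0, 'D': 3.0}
```

Because a collapsed node passes its branch length down to the surviving descendant, root-to-tip distances in the pruned tree match those in the full tree. The example also shows why sister pairs change with sampling: A and B form a pair in the full tree, but A has no partner once B is dropped.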

Westoby et al. (1998; Westoby 1999) have recently advanced several specific proposals for taxon sampling in comparative ecology, specifically considering the role of phylogeny. The process of species selection is analogized to the problem of random sampling in ecological research, where it has long been recognized as a central component of robust hypothesis testing. For estimates of properties such as the mean and variance of a trait in different clades, random samples of taxa (with respect to the traits of interest) are appropriate. In contrast, to detect associations between traits Westoby et al. recommend a nonrandom approach based on sampling pairs of closely related taxa in which the species of each pair differ by some large amount in the “independent trait.” For example, in studies of seed size and its correlates, they sampled phylogenetic species pairs in which seed size differed by at least an order of magnitude (Armstrong and Westoby 1993; Saverimuttu and Westoby 1996; Swanborough and Westoby 1996). Comparisons of species pairs, which invoke only a minimal level of phylogenetic information, have been widely used in plant ecology dating back to Salisbury (1942). In a recent survey of the plant functional ecology literature, Ackerly (1999) found 28 studies that used species pairs, compared to 19 papers using independent contrasts calculated over a full phylogeny for the species of interest. In many cases, each pair is chosen to represent contrasting states of a discrete life-history trait (e.g., comparisons of physiology or allocation in annuals vs. perennials; Garnier 1992; Silvertown and Dodd 1996). If the contrasting states of the discrete trait represent extreme values of an underlying continuous character, then such comparisons may be similar to Westoby’s approach of selecting species that exhibit a large difference in the independent trait. However, the potential advantages and disadvantages of species-pair comparisons and of Westoby’s recommendations for statistical hypothesis testing have not been explicitly investigated (see Maddison 2000).

FIG. 2. Type I error rates under the null model (A) and statistical power under the correlated-changes model (B) for random samples of different size, based on nominal significance levels at α = 0.05. The last point on each line is based on analysis of the full 256-taxon phylogenies. Sample size is the number of species or the number of contrasts: N for Rspec; N − 1 for Rpic and Spic; and N/2 for Rpair and Spair. The ordinate in (A) is log transformed to show differences at low levels. The number of pairs is less than half the number of species in the tree at each sample size due to tree topology (see Fig. 1).

In this study, simulations of trait evolution on a phylogeny were used to address the problem of taxon sampling, focusing on the method of independent contrasts and its application to the study of correlated evolution between continuous phenotypic traits. Simulations were conducted on a set of randomly generated 256-taxon phylogenies. Subsamples of the terminal taxa were selected, following various criteria outlined below, to address the following questions: For random species samples, how does sample size influence Type I error rates (when the true correlation ρ = 0) and statistical power (for ρ ≠ 0) of trait correlations based on either independent contrast or interspecific (nonphylogenetic) analyses? Do nonrandom species samples, with respect to one or both characters, lead to biased estimates of trait correlations? How does nonrandom sampling affect Type I error rates when ρ = 0? Similarly, if species are chosen on the basis of an independent third trait (which is not considered in the correlation analyses), are there effects on parameter estimates or Type I error rates? How do results based only on sister taxa pairs compare to full-tree analyses conducted over all nodes of the phylogeny? Do samples of species pairs that differ by a predetermined amount in one character (following Westoby 1999) lead to bias in estimating trait correlations or influence Type I error or statistical power? Under the different scenarios outlined above, do nonparametric sign tests of associations between independent contrasts provide reliable tests (i.e., appropriate Type I error when ρ = 0), and how does their power compare to parametric statistics when ρ ≠ 0?

SIMULATIONS

The simulations presented here are based on a two-step model of trait evolution followed by taxon sampling and analysis of character correlations. In the first step, the evolution of two traits was simulated on a large phylogeny, based either on a null model with no correlation between changes in the traits (ρ = 0) or a correlated-changes model sampling changes from a distribution with a true correlation of ρ = 0.5. In the second step, a subsample of species was selected from the terminal taxa on this phylogeny, based on various sampling schemes described below, and the phylogeny was pruned to show relationships among this subset (“species” and “taxa” are used interchangeably in this paper; Fig. 1). For each sample, correlations between the two traits were estimated based on species values and independent contrasts (including species pairs analyses, sign tests, and alternative branch lengths) to estimate: (1) the sample correlation between the two traits to determine if the taxon sampling led to bias in estimates of the true correlation (ρ = 0 or 0.5); (2) for the null model, Type I error rates at the α = 0.05 and 0.01 levels, based on standard significance testing; and (3) for the correlated-changes model, statistical power at the α = 0.05 and 0.01 levels. All simulations and analyses of independent contrasts were conducted using ACAP, Version 5, written by the author (see Ackerly 1997; Ackerly and Donoghue 1998).

TABLE 1. Summary of correlation statistics (see abbreviations in text) for simulated evolution of two traits on a set of 100 random 256-taxon phylogenies (full tree) and for subsamples of taxa chosen based on six different sampling algorithms; for NR2 the sampling bias coefficient = 0.75, and for NR3 the probability of change in Z per time step = 0.001 (see text). N, number of taxa sampled for Rspec, number of contrasts for Rpic and Spic, and number of species pairs for Rpair and Spair (the mean and range of values are shown for sampling schemes in which sample size varied); mean and SD of correlation coefficients (R) or proportion of same-sign contrasts (S) are based on 10,000 replicates. Type I error and statistical power are the proportion of nominally significant results at α = 0.05 or 0.01. Results for ρ = 0 are for the null model of no true correlation; results for ρ = 0.5 are for an alternative model with a true correlation between evolutionary trait changes.

                                       Null model: ρ = 0                  Correlated-changes model: ρ = 0.5
                                                       Type I error                       Statistical power
Sample     Statistic   N                 R or S      SD  α=0.05  α=0.01     R or S      SD  α=0.05  α=0.01
Full tree  Rspec       256                0.003   0.213   0.568   0.447      0.488   0.171   0.971   0.960
           Rpic.true   255                0.002   0.070   0.058   0.014      0.497   0.060   0.995   0.994
           Rpic.equal  255                0.002   0.084   0.122   0.045      0.497   0.071   0.995   0.994
           Rpair       84.9 (78–94)       0.003   0.152   0.155   0.067      0.493   0.121   0.981   0.954
           Spic        255                0.501   0.033   0.053   0.014      0.666   0.033   0.994   0.992
           Spair       84.9 (78–94)       0.501   0.056   0.055   0.014      0.666   0.054   0.877   0.700
Random     Rspec       32                 0.003   0.263   0.195   0.089      0.486   0.209   0.768   0.623
           Rpic.true   31                 0.000   0.180   0.051   0.010      0.494   0.139   0.850   0.657
           Rpic.equal  31                −0.001   0.202   0.083   0.023      0.493   0.157   0.825   0.651
           Rpair       10.7 (7–15)       −0.005   0.352   0.095   0.026      0.478   0.283   0.418   0.213
           Spic        31                 0.499   0.091   0.073   0.013      0.666   0.085   0.529   0.243
           Spair       10.7 (7–15)        0.497   0.155   0.025   0.010      0.665   0.145   0.111   0.047
NR1        Rspec       32                −0.001   0.246   0.165   0.067      0.318   0.229   0.481   0.312
           Rpic.true   31                 0.000   0.179   0.049   0.009      0.385   0.158   0.611   0.364
           Rpic.equal  31                 0.000   0.198   0.075   0.019      0.359   0.177   0.554   0.332
           Rpair       10.6 (6–15)       −0.004   0.346   0.084   0.024      0.388   0.308   0.303   0.131
           Spic        31                 0.501   0.091   0.074   0.011      0.624   0.089   0.340   0.126
           Spair       10.6 (6–15)        0.500   0.154   0.022   0.009      0.639   0.150   0.084   0.035
NR2        Rspec       32                 0.573   0.117   0.958   0.855      0.651   0.102   0.994   0.959
           Rpic.true   31                 0.362   0.147   0.557   0.292      0.588   0.108   0.975   0.891
           Rpic.equal  31                 0.392   0.161   0.622   0.390      0.591   0.123   0.959   0.872
           Rpair       10.7 (6–15)        0.253   0.336   0.182   0.065      0.540   0.258   0.510   0.276
           Spic        31                 0.621   0.086   0.329   0.109      0.703   0.081   0.702   0.398
           Spair       10.7 (6–15)        0.566   0.153   0.037   0.014      0.689   0.142   0.144   0.063
NR3        Rspec       32                 0.002   0.295   0.252   0.130      0.482   0.231   0.744   0.610
           Rpic.true   31                 0.000   0.179   0.050   0.009      0.494   0.137   0.850   0.659
           Rpic.equal  31                 0.000   0.218   0.106   0.035      0.491   0.168   0.810   0.639
           Rpair       10.5 (6–15)        0.002   0.390   0.138   0.050      0.473   0.317   0.437   0.243
           Spic        31                 0.501   0.090   0.073   0.013      0.668   0.086   0.540   0.255
           Spair       10.5 (6–15)        0.502   0.155   0.021   0.009      0.669   0.146   0.111   0.054
DP         Rspec       66.9 (42–94)       0.003   0.210   0.258   0.139      0.525   0.157   0.952   0.900
           Rpic.true   65.9 (41–93)      −0.001   0.122   0.048   0.009      0.635   0.069   1.000   1.000
           Rpic.equal  65.9 (41–93)       0.001   0.133   0.066   0.015      0.617   0.078   1.000   0.999
           Rpair       33.4 (21–47)      −0.001   0.184   0.061   0.015      0.688   0.091   0.998   0.990
           Spic        65.9 (41–93)       0.499   0.062   0.048   0.010      0.745   0.053   0.987   0.943
           Spair       33.4 (21–47)       0.498   0.087   0.049   0.008      0.824   0.066   0.985   0.924
RP         Rspec       66.9 (42–94)      −0.005   0.232   0.309   0.188      0.490   0.183   0.901   0.834
           Rpic.true   65.9 (41–93)       0.000   0.124   0.051   0.012      0.497   0.094   0.992   0.962
           Rpic.equal  65.9 (41–93)       0.000   0.143   0.094   0.028      0.496   0.111   0.982   0.939
           Rpair       33.4 (21–47)       0.003   0.213   0.113   0.038      0.490   0.166   0.829   0.671
           Spic        65.9 (41–93)       0.500   0.062   0.048   0.009      0.666   0.059   0.775   0.548
           Spair       33.4 (21–47)       0.501   0.087   0.051   0.010      0.666   0.081   0.491   0.245

Simulations for this study were conducted on a set of 100 random phylogenies of 256 taxa, generated using a “time-only” speciation model in COMPARE 4.1 (see Martins 1996). The trees differed from each other in topology and branch lengths. Branch lengths generated by COMPARE were multiplied by 1000 and rounded to the nearest integer to allow stepwise simulation of trait evolution along each branch, as described below. Across all trees, mean branch length was 50.2 (range = 1–571), and the total height of the trees (sum of branch lengths from the root to each terminal taxon) averaged 623 (range = 422–960 across trees). Within each tree all taxa were at the same height from the root, so the trees can be interpreted as a set of contemporaneous taxa in which branch lengths are proportional to time and the rate of character evolution is constant over the tree.

Evolution of two continuous traits (X, Y) was simulated by Brownian motion with initial character states (0, 0) at the root of the tree. In each time step (one time step equals one unit of branch length), evolutionary changes in the two characters were drawn from a bivariate normal distribution with µ = 0, σX² = σY² = 1, and correlation coefficient (the input
correlation, following Martins and Garland 1991) ρ = 0 for the null model and ρ = 0.5 for the correlated-changes model. For a particular sampling algorithm or sample size, each run of the model was based on a total of 10,000 replicates (100 character simulations on each of the 100 trees). From each simulation one taxon sample was selected for analysis, guaranteeing that each replicate was independent both in terms of the distribution of character states in the full tree and in the sampling of taxa for analysis.

Taxon Sampling Algorithms

Six different taxon sampling algorithms were evaluated to address the questions presented in the introduction.

Random (R). A specified number of species was selected at random. A series of N = 16, 32, 64, and 128 were used to examine the effects of sample size. For comparison of this method with the next three nonrandom algorithms, the results for N = 32 were used.

Nonrandom on character 1 (NR1). A specified number of species (N = 32) was selected at random from those in which the value of character X was greater than the mean value for all taxa. This represents a case in which the species chosen for study exhibit a larger (or smaller) than average trait value for one of the traits of interest. For example, an analysis of seed size versus height in trees would represent a nonrandom sample of species with respect to height, compared to the distribution of height in seed plants as a whole. Because there are numerous evolutionary transitions between “tree” and “nontree” in the seed plants, any particular sample of trees would be broadly distributed across the underlying seed plant phylogeny.

Nonrandom on both characters (NR2). A specified number of species (N = 32) was chosen with an enforced correlation (the “sampling bias” coefficient) between character values. This was accomplished by randomly selecting coordinates from a bivariate distribution with a specified correlation coefficient (0, 0.25, 0.5, 0.75, or 0.9), and mean and standard deviations equal to those observed in the simulated data, and then selecting the species in the data that were closest to the randomly chosen points in the X-Y space. Biological or investigator-driven scenarios that might result in bivariate nonrandom sampling are discussed in detail below (see Discussion).

Nonrandom with respect to a third character (NR3). Evolution of a binary character (Z) was simulated along the tree, starting with state 0 and changing from 0 to 1 or 1 to 0 with probability 0.001 at each time step (simulations resulting in fewer than 33 taxa with Z = 1 were discarded). This low rate of change was selected such that less than half the terminal taxa would end up in state 1 (on average) and they would be clustered at the tips of the tree. Over the 10,000 simulations, the number of taxa with Z = 1 averaged 84.6 (range = 33–231). A random sample of 32 taxa was then selected from those in which Z = 1 for analysis of correlations between continuous traits X and Y. This scheme represents a situation in which sampling is concentrated in a certain habitat or life-history group (e.g., grassland species or annual plants), but the distribution and bivariate relationship between the phenotypic traits of interest is not biased by membership in that group. It differs from random sampling because selected taxa are significantly clustered in groups at the tips of the tree.

Species pairs with a large difference (DP). Phylogenetic species pairs were selected in which the value of character X differed by at least a specified amount (following the approach of Westoby 1999; see introduction). This was accomplished by starting with the first terminal taxon (at the left side of the tree) and searching down the tree until a node was reached in which the descendant taxa with the minimum and maximum values for character X differed by more than the specified amount; these two taxa were then chosen for inclusion in the study, all other taxa in the clade excluded, and the process was repeated in the next clade in the tree. Only one species pair was drawn from each monophyletic clade (paraphyletic pairs, or pairs nested within other pairs, were not considered; cf. Burt 1989; Purvis and Rambaut 1995). The minimum allowable difference was set arbitrarily (at a value of 14.5) such that approximately 64 taxa (= 32 pairs) would be chosen, thus facilitating comparison of results with the methods above.

Random pairs (RP). Species pairs were chosen at random. This algorithm was run in tandem with the DP selection process above, and the random pairs were chosen from the same set of clades in each simulation run. This approach guaranteed that the number of pairs was the same for each replicate, so that comparisons of the DP and RP methods were not affected by the distribution of sample sizes.

Branch Lengths

Relative branch lengths, in terms of expected variance in character evolution, are critical for correct application of independent contrasts (Felsenstein 1985; Martins and Garland 1991; Purvis et al. 1994). Incorrect branch lengths inevitably raise Type I error rates, although they do not cause biased estimates of the true correlations (Martins and Garland 1991; Purvis et al. 1994). In empirical studies, fossil or molecular clock data may be available to estimate branch lengths in units of time, which would be appropriate if the characters under study evolve at a constant rate. Alternatively, if the rate of evolution of phenotypic characters along each branch is assumed to be proportional to the corresponding rates for characters used in phylogeny reconstruction (molecular or morphological), branch lengths may be estimated based on the number of character changes in the phylogenetic dataset. In either case, when a subsample of taxa is included in a study of character evolution, the “correct” branch lengths can be calculated for the pruned tree based on sums of branch lengths in the original phylogeny (Fig. 1A, B).

In many (maybe most) cases, there is no basis to assume that phenotypic traits evolve at constant or proportional rates (Martins and Garland 1991). In addition, some methods of phylogeny reconstruction do not provide biologically meaningful branch lengths (e.g., supertree analyses; Sanderson et al. 1998). In these situations, all branches may be assigned equal lengths (because only relative branch lengths are important, lengths would normally be set to one; Fig. 1C). Equal branch lengths are sometimes interpreted in terms of a punctuational model of evolutionary change because there is one “bout” of evolution associated with each speciation event. However, the punctuational interpretation is weaker when considering taxon sampling because each branch represents an unknown number of speciation events in the full phylogeny (even if all extant taxa in a clade are sampled, this is true due to the unknown distribution of speciation and extinction events in the past). Because only relative, and not absolute, branch lengths influence independent contrast analyses, equal branch lengths may also be effective if the true branch lengths are randomly distributed with respect to depth in the tree. This is true for trees generated by time-only speciation models (although the resulting branch length distributions are right-skewed; Martins 1996), and for such trees equal branch lengths are fairly effective (Purvis et al. 1994). Purvis and Webster (1999) have also argued that equal branch lengths may be appropriate when there is significant error in the estimation of species trait values, to avoid disproportionate weighting of contrasts between closely related species, which are particularly sensitive to these errors.

Alternatively, branch lengths may be constructed using topological approaches that lengthen branches selectively, placing all terminal taxa at the same total height from the root (e.g., Grafen 1989; or a “minimum-extension” method, as in the graphical output from PAUP, Swofford 1993). If all taxa under study are extant and rates of evolution are assumed to be constant, then the placement of all terminal taxa at the same height is intuitively appealing. However, there is no a priori basis to expect that the relative branch lengths constructed from these topological rules are biologically appropriate. Simulations have found that Type I error rates are higher using Grafen’s topological algorithm in comparison with equal branch lengths when the true branch lengths are derived from a random speciation model (Purvis et al. 1994).

For the simulations reported here, branch lengths in the pruned trees were calculated from the true lengths in the full phylogeny or were set to equal length. All analyses were also conducted using the two topological approaches mentioned above (the Grafen and minimum extension algorithms), but in almost all cases these resulted in higher Type I error rates compared to analysis on equal branch lengths, so the results are not shown (full results are available from the author on request).

Statistical Analyses of Correlations

For each simulation run, the following correlations were calculated for all taxa over the full phylogeny and for the subset of sampled taxa over the pruned tree: (1) species correlations (Rspec), based on trait values among terminal taxa (i.e., TIPS; Martins and Garland 1991); (2) correlations of phylogenetically independent contrasts (Rpic), which are statistically equivalent to correlations of evolutionary changes (Pagel 1993); Rpic.true is based on contrasts calculated from true branch lengths, and Rpic.equal is based on equal branch lengths (equivalent to FLG and FLP, respectively, in Martins and Garland 1991); (3) correlations of contrasts between pairs of sister taxa at terminal nodes (Rpair), thereby excluding contrasts based on internal nodes (Fig. 1C); these contrasts were unstandardized for branch lengths (equivalent to assignment of equal branch lengths between all pairs) because the species pairs approach is employed to minimize necessary phylogenetic information; (4) sign tests based on independent contrasts over the full phylogeny (Spic), calculated as the proportion of nodes with contrasts of same vs. different sign; and (5) sign tests based on sister taxa pairs only (Spair), analogous to Rpair above. The Pearson correlation coefficient was used for Rspec and the correlation coefficient calculated through the origin was used for Rpic and Rpair (for details, see Garland et al. 1992). Degrees of freedom were N − 2 for Rspec and Rpic and NP − 1 for Rpair, where NP is the number of sister taxa pairs (note that the total number of species in paired tests = 2NP). Significance at the α = 0.05 and 0.01 levels was determined by a t-test for df ≤ 30, and by the z*-transformation for df ≥ 31 (Sokal and Rohlf 1995, pp. 575–579). For sign tests, significance at the α = 0.05 level was determined from critical values of the binomial distribution for N or NP ≤ 22 (for Spic and Spair, respectively) and by a t-test approximation for N or NP ≥ 23 (Sokal and Rohlf 1995, p. 444). The mean and standard deviation of the correlation coefficients were calculated from each distribution of 10,000 replicates to test for bias in estimating the true evolutionary correlation (= 0 or 0.5 in the null and correlated-changes models, respectively). Type I error rates and statistical power were calculated as the proportion of significant results (at P ≤ 0.05 and P ≤ 0.01) under the null model and the correlated-changes model, respectively.

RESULTS

Full Trees

For the full 256-taxon trees, the mean values of Rspec and Rpic (analyzed using true or equal branch lengths) were indistinguishable from 0.0 under the null model and 0.5 under the correlated-changes model, demonstrating that both measures provide unbiased estimates of the evolutionary correlation (Table 1). However, the standard error of the correlations was very high for Rspec and somewhat elevated for Rpic.equal, compared to Rpic.true. As a result, the Type I error rate for Rspec was 0.57 (all values for Type I error and statistical power discussed in the text are based on α = 0.05; values for α = 0.01 are also reported in Table 1). Type I error rates for Rpic.true were close to nominal levels, whereas rates for Rpic.equal were somewhat elevated at 0.12. Statistical power for Rpic when ρ = 0.5 was close to or equal to one under both branch length algorithms, but slightly lower for Rspec (Table 1). Spic, the sign test based on contrasts at all nodes, also had an appropriate Type I error rate and high power at this large sample size (Table 1).

The set of 100 random phylogenies used in this study had an average of 84.9 pairs of sister taxa (range = 78–94; the remaining taxa were allied with clades of two or more species, so paired comparisons were not possible). Contrast correlations based on these sister taxa pairs (Rpair) provided unbiased estimates of true correlations, but elevated Type I error rates of 0.15 (Table 1). Spair, the sign test based on sister taxa pairs only, had appropriate Type I error rates but relatively low power (0.88) under the correlated-changes model, presumably due to the reduced sample size and the conservative nature of sign tests.
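
The contrast-based statistics defined under Statistical Analyses of Correlations can be sketched in Python as below. This is a minimal illustration under stated assumptions (a fully bifurcating tree, positive branch lengths, Python 3.8+ for `math.comb`), not the ACAP implementation; the `Node` class and every function name are hypothetical. `pic` follows Felsenstein's (1985) algorithm and returns N − 1 standardized contrasts; `pair_contrasts` keeps only the unstandardized differences between terminal sister taxa; both kinds of contrast are correlated through the origin; and S is the fraction of contrasts whose X and Y changes share a sign, tested here with an exact two-sided binomial probability (the small-sample case, N or NP ≤ 22).

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pair = Tuple[float, float]   # (contrast in X, contrast in Y)


@dataclass
class Node:
    name: Optional[str] = None   # tip label; None for internal nodes
    length: float = 1.0          # length of the branch above this node
    children: List["Node"] = field(default_factory=list)


def pic(node: Node, traits: Dict[str, Pair]):
    """Felsenstein's (1985) contrasts for two traits on a bifurcating tree.

    Returns (x, y, extra, contrasts): the node's estimated trait values, the
    extra length to add to the branch above it, and the standardized contrasts
    of every internal node in the subtree (N - 1 of them for N tips)."""
    if not node.children:
        x, y = traits[node.name]
        return x, y, 0.0, []
    if len(node.children) != 2:
        raise ValueError("this sketch assumes a fully bifurcating tree")
    out: List[Pair] = []
    vals = []
    for child in node.children:
        x, y, extra, sub = pic(child, traits)
        out.extend(sub)
        vals.append((x, y, child.length + extra))
    (x1, y1, v1), (x2, y2, v2) = vals
    sd = math.sqrt(v1 + v2)              # expected SD of the raw difference
    out.append(((x1 - x2) / sd, (y1 - y2) / sd))
    w1, w2 = 1.0 / v1, 1.0 / v2          # weight each child by 1 / variance
    x = (w1 * x1 + w2 * x2) / (w1 + w2)
    y = (w1 * y1 + w2 * y2) / (w1 + w2)
    return x, y, v1 * v2 / (v1 + v2), out


def pair_contrasts(node: Node, traits: Dict[str, Pair]) -> List[Pair]:
    """Unstandardized differences for terminal sister pairs only (Rpair, Spair)."""
    kids = node.children
    if len(kids) == 2 and not kids[0].children and not kids[1].children:
        (x1, y1), (x2, y2) = traits[kids[0].name], traits[kids[1].name]
        return [(x1 - x2, y1 - y2)]
    return [p for c in kids for p in pair_contrasts(c, traits)]


def r_through_origin(pairs: List[Pair]) -> float:
    """Correlation computed through the origin (used for Rpic and Rpair)."""
    sxy = sum(cx * cy for cx, cy in pairs)
    sxx = sum(cx * cx for cx, _ in pairs)
    syy = sum(cy * cy for _, cy in pairs)
    return sxy / math.sqrt(sxx * syy)


def same_sign_fraction(pairs: List[Pair]) -> float:
    """S: share of contrasts in which X and Y change in the same direction."""
    return sum(1 for cx, cy in pairs if cx * cy > 0) / len(pairs)


def sign_test_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k same-sign contrasts out of n (p = 0.5)."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical four-taxon example with all branch lengths equal to one.
tree = Node(children=[
    Node(children=[Node("A"), Node("B")]),
    Node(children=[Node("C"), Node("D")]),
])
traits = {"A": (1.0, 2.0), "B": (3.0, 5.0), "C": (0.0, 0.0), "D": (2.0, 1.0)}
_, _, _, pics = pic(tree, traits)
pairs = pair_contrasts(tree, traits)
print(len(pics))                         # 3
print(round(r_through_origin(pics), 3))  # 0.849
print(round(r_through_origin(pairs), 3)) # 0.894
print(same_sign_fraction(pics))          # 1.0
print(sign_test_p(3, 3))                 # 0.25
```

The example makes the loss of information in paired analyses concrete: four species yield three contrasts over the full tree but only two sister-pair comparisons, and with so few comparisons even a perfect sign agreement (3 of 3) is far from significant.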
Random Samples
For random samples ranging from N = 16 to 128, Rpic.true was unbiased with Type I error rates close to nominal levels in all cases (Fig. 2A). Under a true correlation of ρ = 0.5, statistical power ranged from 0.54 at N = 16 to 1.0 at N = 128 (Fig. 2B). Rspec was also unbiased at all sample sizes, but Type I error rates increased from 0.12 at N = 16 to 0.44 at N = 128, whereas statistical power increased from approximately 0.5 to 0.95 (Fig. 2A). Thus, the probability of incorrectly rejecting the null hypothesis using species correlations increases with larger sample sizes. Sign tests based on all comparisons (Spic) maintained appropriate Type I error rates, but had substantially reduced statistical power at low sample sizes (Fig. 2A, B).
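Why the Type I error of Rspec grows with N can be seen from how quickly the conventional critical value of r shrinks as N increases. The Discussion reports that the standard deviation of Rspec falls only from 0.35 (N = 12) to 0.21 (N = 256).

The sketch below is an illustration, not part of the original analysis. It computes two-tailed critical values of Pearson's r from the exact null density for N independent bivariate-normal observations, f(r) ∝ (1 − r²)^((N − 4)/2), using Simpson's rule and bisection. Treating species as independent observations is precisely the assumption that phylogenetic structure violates. The function names are hypothetical.

```python
def null_tail(r_c: float, n: int, steps: int = 2000) -> float:
    """Two-tailed P(|r| >= r_c) for Pearson's r from n independent
    bivariate-normal pairs with rho = 0; density ∝ (1 - r^2)^((n - 4) / 2)."""
    if n < 4 or steps % 2:
        raise ValueError("need n >= 4 and an even number of Simpson steps")
    k = (n - 4) / 2

    def density(r: float) -> float:
        return (1 - r * r) ** k

    def simpson(a: float, b: float) -> float:
        h = (b - a) / steps
        s = density(a) + density(b)
        s += sum((4 if i % 2 else 2) * density(a + i * h) for i in range(1, steps))
        return s * h / 3

    # The density is symmetric, so the two-tailed probability equals the
    # upper-tail area divided by the area over [0, 1].
    return simpson(r_c, 1.0) / simpson(0.0, 1.0)

def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| significant at two-tailed level alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if null_tail(mid, n) > alpha:
            lo = mid   # not yet significant: the critical value lies higher
        else:
            hi = mid
    return hi

for n in (16, 32, 64, 128):
    print(n, round(critical_r(n), 3))
```

The printed critical values fall roughly in proportion to 1/√N. A spread of Rspec that barely narrows therefore clears the threshold more and more often as N grows.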
The use of equal branch lengths (Rpic.equal) resulted in slightly elevated Type I error rates, increasing from 0.074 at N = 16 to 0.1 at N = 128, with slightly lower power relative to Rpic.true (Table 1, Fig. 2B). For random samples, Type I error rates were lower using equal branch lengths compared to results based on topological alternatives (results not shown; see Simulations: Branch Lengths). In all cases, estimates of the true correlation were unbiased.
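The topological alternatives mentioned above differ from equal branch lengths in whether the tips end up equidistant from the root. The sketch below contrasts the two rules on a hypothetical five-taxon topology.

The Grafen-style rule follows the common description of Grafen's (1989) method: each node's height is the number of tips it subtends minus one, and each branch spans the height difference between parent and child. Grafen's optional power transformation of node heights is omitted. The nested-tuple tree encoding and the function names are assumptions for illustration.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# A tree is a tip label or a (left, right) pair of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]

def n_tips(t: Tree) -> int:
    return 1 if isinstance(t, str) else sum(n_tips(c) for c in t)

def equal_length(parent: Tree, child: Tree) -> float:
    return 1.0  # every branch gets one unit

def grafen_length(parent: Tree, child: Tree) -> float:
    # Grafen-style heights: (tips below the node) - 1, so every tip is at height 0
    return float((n_tips(parent) - 1) - (n_tips(child) - 1))

def tip_depths(t: Tree, length: Callable[[Tree, Tree], float],
               depth: float = 0.0,
               out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Root-to-tip path length of every tip under a branch-length rule."""
    if out is None:
        out = {}
    if isinstance(t, str):
        out[t] = depth
    else:
        for child in t:
            tip_depths(child, length, depth + length(t, child), out)
    return out

tree = (("A", "B"), ("C", ("D", "E")))
print(tip_depths(tree, equal_length))   # {'A': 2.0, 'B': 2.0, 'C': 2.0, 'D': 3.0, 'E': 3.0}
print(tip_depths(tree, grafen_length))  # every tip at depth 4.0: ultrametric
```

Under equal lengths, the two tips in the nested pair sit one unit deeper than the others. The Grafen-style lengths instead place every tip four units from the root, which is the "contemporaneous tips" property that gives topological methods their intuitive appeal.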
Analyses of sister taxa pairs (Rpair and Spair) maintained appropriate Type I error rates at all sample sizes. However, these samples had greatly reduced sample size because only a portion of the species selected in each sample were paired with another species (see Fig. 1C). Based on the random tree topologies in this study, the number of pairs in random samples averaged approximately two-thirds of the maximum possible number (e.g., for N = 32, the average number of pairs was 10.7 of a maximum of 16). In addition, each pair of species provides only one comparison. Thus, although the statistical power of Rpair and Spair was similar to the corresponding values for Rpic and Spic for the same degrees of freedom, it was much lower when compared on the basis of the number of species studied (Table 1, Fig. 2B).
Nonrandom Samples
NR1. (See sampling methods under Simulations: Taxon Sampling Algorithms.) Under the null model, sampling of 32 taxa in which X > X̄ resulted in unbiased estimates of the true correlation for Rpic and Rpair, and approximately correct Type I error rates for Rpic.true, Spic, and Spair. However, under the correlated-changes model, the true correlation of 0.5 was underestimated by Rpic and Rpair (both averaged approximately 0.38), and statistical power was diminished relative to random samples of the same size (Table 1).
NR2. Nonrandom sampling relative to both characters imposed a correlation between the two traits in the sampled taxa relative to all taxa in the clade (e.g., Fig. 3). This was the only sampling method, among those tested here, that resulted in biased estimates of the true correlation under the null model; for ρ = 0 and a sampling bias coefficient of 0.75, mean Rspec was 0.59 and mean Rpic.true was 0.36 (Table 1). Type I error rates were elevated correspondingly for all correlation statistics, except for Spair (which also had very low statistical power). For ρ = 0.5, mean correlations for a sampling bias coefficient of 0.75 were slightly elevated using Rspec and Rpic, although the effect was small because the sampling bias was similar in magnitude to the underlying evolutionary trait correlation.
For sampling bias coefficients ranging from 0.0 to 0.9, mean values of Rspec increased rapidly under the null model, whereas Rpic and Rpair increased less markedly (Fig. 4A). Type I error rates also increased somewhat for Rspec, Rpic, and Spic, reaching 0.65 for Rpic.true under the strongest sampling bias coefficient of 0.9. Rpair had a stable but elevated Type I error rate (< 0.2), and only Spair maintained appropriate Type I error rates across all values of the sampling bias coefficient (results not shown). Under a true correlation of ρ = 0.5, Rspec increased from 0.14 to 0.69 as the sampling bias coefficient increased from 0.0 to 0.9 (Fig. 4B), whereas Rpic and Rpair were less sensitive across this range. All parameters converged at a sampling bias of 0.5, where the sampling pattern matched the underlying evolutionary correlation (Fig. 4B). Thus, if sampling processes impose a weaker or stronger correlation of trait values in the sample, relative to the correlation among all species in the lineage, then interspecific and independent contrast correlations will underestimate or overestimate, respectively, the evolutionary correlation. Independent contrast correlations are less sensitive to these biases, but the use of a phylogenetically structured analysis does not eliminate these effects.
FIG. 3. An example of the NR2 sampling algorithm. Character evolution on a 256-taxon phylogeny was simulated with ρ = 0 (open circles), and then 32 taxa were chosen that fell closest to points randomly chosen from a bivariate distribution with a sampling bias coefficient of 0.5 (filled squares). For this example, Rspec for the full tree = −0.24, and in the sample Rspec = 0.59 and Rpic.true = 0.39.
FIG. 4. Mean correlations under the NR2 sampling regime as a function of the sampling bias coefficient. (A) Null model, ρ = 0. (B) Correlated-changes model, ρ = 0.5. N = 32 for all samples.
NR3. When taxa were sampled based on the character state of an independent binary trait, all statistical indicators were similar to random samples of the same size, except that Type I error rates of Rpic.equal and Rpair were very slightly higher and power was correspondingly lower (Table 1).
Sampling of Species Pairs
For the DP sampling algorithm, the minimum difference between taxa in character X was set at 14.5 units, equivalent to approximately 0.73 standard deviations of the trait distribution over the 256 species. Mean sample sizes using this criterion were 66.8 species, or 33.4 pairs, and the mean difference in character X between species in each pair was 21.4 units. Under the null model, Type I error rates were approximately 0.05 for Rpic, Rpair, Spic, and Spair, but, as in other cases, they were elevated for Rspec (Table 1). However, for ρ = 0.5, mean values of Rpic and Rpair were substantially elevated (0.64 and 0.69, respectively). Thus, this sampling regime maintained appropriate Type I error, but resulted in biased estimates of evolutionary trait correlations when the true correlation was not equal to zero.
Random species pairs (RP), drawn from the same clades as the DP samples (and thus with the same sample sizes), exhibited a mean difference in character X of 7.9 units. These samples had appropriate Type I error rates for Rpic.true, Spic, and Spair, and slightly elevated values for Rpic.equal and Rpair; all estimates of the evolutionary correlation were unbiased under the null and correlated-changes models. However, the statistical power of Rpair and Spair was much lower than the corresponding values using DP sampling (0.83 vs. 0.99 for Rpair; 0.49 vs. 0.99 for Spair; Table 1).

DISCUSSION

Recent reviews of empirical studies using comparative methods have noted that correlation coefficients (and regression slopes) based on interspecific analyses and on independent contrast analyses are usually quite similar (Ricklefs and Starck 1996; Price 1997; Ackerly and Donoghue 1998; Ackerly 1999). This pattern is consistent with theoretical predictions based on the Brownian motion model, because the expected values of the interspecific and the contrast correlations are both equal to the true evolutionary correlation describing the underlying patterns of change (Pagel 1993). Thus, as expected, estimates of trait correlations based on random samples of different sizes are also unbiased with respect to the true correlation, using either interspecific or contrast correlations. However, as has been noted in all simulation studies of this problem, the variance in the outcome of interspecific (i.e., nonphylogenetically adjusted) correlations is much greater than for independent contrast correlations, so Type I error rates for interspecific correlations are greatly inflated (Martins and Garland 1991; Garland et al. 1993; Purvis et al. 1994; Diaz-Uriarte and Garland 1996). Another aspect of this result, which has not been previously noted, is that Type I error rates using the interspecific correlation increase with sample size (Fig. 2A). This occurs because the standard deviation of Rspec declines only slightly with increasing N (SD = 0.35 for N = 12; SD = 0.21 for N = 256), while the critical values for significance testing decline rapidly, so more of the outcomes are judged significant using standard criteria. For independent contrast analyses, the maintenance of appropriate Type I error rates for random samples (using true branch lengths) is also consistent with theoretical expectations. The method of independent contrasts is specifically designed to estimate the independent events along each branch of the phylogeny, and pruning some branches will not alter these estimates for the remaining taxa and their underlying divergences, as long as the correct branch length information is available.
Nonrandom Sampling
Three approaches to nonrandom sampling were examined here: (1) NR1 sampled species in which the value of character X was greater than the mean value across all species; (2) NR2 sampled species from a nonrandom bivariate distribution (Fig. 3); and (3) NR3 sampled species with state 1 for a discrete character Z, which evolved independently along the phylogeny with a low probability of change. NR3 resulted in a clustering of species in groups at the tips of the phylogeny, but there was no bias in the distribution of character values for the traits under study. The results for NR3 were essentially the same as random sampling for an equivalent sample size, except for a very slight elevation of Type I error rates and reduction in power when using equal branch lengths. Thus, clustering of taxa does not affect the use of independent contrast analyses to assess trait correlations. In practice, clustering of taxa will occur if a study focuses on species that share a particular life-history trait or habitat affinity (e.g., tropical trees) and if this trait evolves slowly, so that closely related species tend to exhibit the same state. If the trait evolves more quickly, the distribution of taxa with a particular state will be random with respect to the phylogeny. In either case, this has no effect on correlations of continuous traits, provided that the traits under study evolve independently of the sampling character. It is not clear whether the same conclusion would hold for associations of discrete trait changes based on Maddison’s (1990) concentrated changes test.
In contrast, the NR1 and NR2 schemes represent nonrandom sampling with respect to the distributions of the traits
of interest. NR1 resulted in unbiased estimates and appropriate Type I error rates under the null model. However, for an evolutionary correlation of 0.5, the sample correlation coefficients underestimated the true value, ranging from 0.32 for Rspec to 0.39 for Rpic.true, and statistical power was reduced compared to a random sample of the same size. Thus, for example, a study of relationships between body size and metabolism based only on the large species in a particular clade would slightly underestimate the correlation between these two traits. The underestimate probably results from the reduced range of variation in the data, because contrasts with large differences in X and Y would be represented less often than expected by chance. The fact that the bias appears in both Rspec and Rpic demonstrates that it is a general result of biased sampling, although the use of independent contrasts ameliorates the effect slightly.
Species samples that are nonrandom with respect to both characters (NR2, Fig. 3) influence estimates of trait correlations based on species values or independent contrasts, and this was the only sampling scheme in which estimates were biased under the null model (Fig. 4). Under this scenario, the interspecific correlation provided the best estimate of the coefficient underlying the sampling process, but the worst estimate of the underlying correlation of evolutionary changes in the traits. Trait correlations based on contrasts of species pairs (Rpair) were the most robust for estimation of the evolutionary correlation, and only the sign test based on species pairs provided appropriate significance testing.
In light of these results, it is important to consider the biological processes that may result in nonrandom trait distributions or correlations in a set of species relative to the evolutionary correlation in the history of the clade. In an ecological context, community assembly processes may act as filters, resulting in taxon sampling patterns analogous to the nonrandom schemes examined here. As mentioned in the introduction, many comparative studies in plant ecology focus on species sampled from particular communities or geographic areas (e.g., the Indiana Dunes, Mazer 1989; the Sheffield, U.K., flora, Rees 1996). The distributions and associations of traits in these communities reflect both the evolutionary history of the traits in the larger, encompassing lineages and the sorting and assembly of the resulting species into local communities. For example, Tofts and Silvertown (2000) recently demonstrated that “environmental structuring” led to greater similarity of coexisting species on a local scale than expected by chance (as opposed to the prediction of greater dissimilarity resulting from competitive exclusion). Thus, the species in a community may exhibit a narrower range of trait variation than would a random sample from the local species pool or the encompassing lineage. Based on the NR1 results, this reduction in variability would result in underestimates of trait correlations in analyses based on this sample of species.
FIG. 5. Hypothetical scenario of a “grade shift” in which two traits are strongly correlated within communities, but the elevations of the relationships are displaced, leading to a much weaker overall correlation for all species combined. See discussion in text.
It is also conceivable that the bivariate trait combinations that are viable in a particular community will fall into a narrow envelope, leading to tighter associations between traits within communities than in the set of all species across communities (Fig. 5). Associations among leaf functional traits in species of contrasting environments illustrate this scenario. Reich et al. (1997, 1999) found that several traits (e.g., leaf life span, leaf mass per area, assimilation rate) are strongly correlated across species from a broad range of habitats. However, in comparisons of different communities, the elevation of these relationships shifts with climate; for example, at any particular value of leaf life span, leaves from drier habitats are thicker. Consequently, the overall relationship observed across multiple communities arises from a series of tighter relationships within communities with grade shifts across communities. In the case of leaf life span and leaf mass per area, the correlation across 110 species from six communities was 0.75, whereas the correlations within communities ranged from 0.85 to 0.92, and the elevations of the regression lines were significantly displaced (Reich et al. 1999). When this relationship was analyzed using phylogenetic independent contrasts (with equal branch lengths), Rpic was 0.64 for the full sample, but ranged from 0.69 to 0.93 within communities (Ackerly and Reich 1999; unpubl. analyses). While all of these correlations are large and biologically significant, it is important to note that both the species correlations and the contrast correlations were higher within communities. These values presumably reflect constraints on “allowable” trait combinations, or fine-tuning by convergent evolution, due to environmental conditions within each community (for a model of trait evolution with analogous results, see Price 1997).
If data are available from multiple communities in contrasting environments, patterns of grade shifts can be easily detected. A phylogenetically structured multiple regression or analysis of covariance could be used to test for shifts in patterns of correlated evolution within versus among communities (for a related example, see Garland et al. 1993). However, the situation that arises more commonly in plant ecology is that data from species in a single community are used to test a general evolutionary hypothesis, and comparable data are not available to test for differences among communities. In a literature review of comparative plant ecology, Ackerly (1999) found 22 studies (of 92 employing phylogenetic comparative methods) in which species were sampled from a geographic region or ecological community without regard to phylogenetic affiliation. Thus, if trait correlations are stronger within communities, as in the case described above, these studies may overestimate the strength of correlated evolution in the traits under consideration. Furthermore, the interspecific correlation may provide a more accurate measure of the biological process responsible for patterns within communities (e.g., community assembly or convergent evolution).
Paired Comparisons
In comparative analyses, the method of paired comparisons has several advantages (see Maddison 2000). It requires minimal phylogenetic information, and it may be applied simply by comparing congeners (e.g., Salisbury 1942). Paired comparisons are also used in testing relationships between a discrete, independent trait and a continuous, dependent trait (e.g., growth analysis of congeneric annuals vs. perennials; Garnier 1992; also see Grafen 1989; Purvis and Rambaut 1995; Martins and Hansen 1997). Another circumstance in which the paired comparison method may be valuable is when each pair represents a recent divergence that has occurred in a particular habitat or under certain environmental conditions. For example, if several wind-dispersed plant lineages colonize a newly formed island, correlated changes in the size of the seed and the dispersal structure would demonstrate possible adaptations of dispersal to island conditions. Independent contrasts for these two traits deeper in the phylogeny would reflect evolution under other circumstances that would not be relevant to the question at hand. This scenario highlights the critical assumption of homogeneity in the evolutionary process when independent contrasts are calculated over the full phylogeny for species of interest. Methods for detecting heterogeneity in trait correlations, using a priori or a posteriori tests, merit further consideration (for related examples, see Garland et al. 1993; Hansen 1997).
For analyses of continuous traits, the primary disadvantage of the paired comparisons method is the loss of statistical power, due to the elimination of the deeper nodes in the phylogeny and the resulting reduction in sample size from N − 1 to N/2 comparisons. For example, the power to detect true correlations of 0.5 (at α = 0.05) based on data for 32 species is 0.85 using 31 contrasts over a full phylogeny (Rpic.true), compared to approximately 0.53 if the 32 species were arranged as 16 pairs (Rpair). For sign tests, the corresponding values were 0.53 for Spic compared to 0.21 for Spair. In most cases, there is further loss of power because some species that might be available for study are not directly allied with a sister taxon for comparison (this limitation can be circumvented by using paraphyletic or nested pairs; see Burt 1989; Purvis and Rambaut 1995; for algorithms for optimizing the selection of species pairs, see Maddison 2000). Westoby’s method, based on the selection of pairs of species that differ by a large amount in the independent trait (e.g., order-of-magnitude differences in seed size; Armstrong and Westoby 1993), increases the power of the paired comparisons method to levels comparable to full-tree analyses (Table 1), and Type I error rates are correct under the null model. However, the method has a significant drawback in that estimates of nonzero correlations are inflated. Thus, the method can only be used to determine the presence or absence of a significant association between traits, but not to estimate the magnitude of a correlation (or presumably the slope of a regression, although this was not explicitly examined here). On balance, if phylogenetic relationships among species are known and the assumption of homogeneity in evolutionary processes over the full tree is acceptable, it appears that there is little to be gained from the paired comparisons method if one is interested in both the significance and the magnitude of trait associations.
Branch lengths. Previous studies have clearly demonstrated that appropriate branch lengths are critical for appropriate Type I error when using independent contrasts (Martins and Garland 1991; Garland et al. 1992; Purvis et al. 1994). However, there are many situations in which there is no meaningful source of branch length information, e.g., when only the phylogenetic topology is known or when there is no correspondence between rates of evolution in continuous traits and in the systematic traits used to construct the phylogeny. As discussed in the introduction, topological algorithms may be applied to generate branch lengths that raise all terminal taxa to the same height (e.g., Grafen 1989); these methods have some intuitive appeal in the study of extant taxa because the terminal taxa appear contemporaneous. Alternatively, all branch lengths may be set equal in the absence of other information. For traits evolving on a tree generated by a Markovian branching process, both this study and Purvis et al. (1994) found that equal branch lengths performed better than the topological lengths based on Grafen’s algorithm, although Type I error rates were inflated in both cases compared to true branch lengths.
Garland et al. (1992) have recommended that the appropriateness of a branch length distribution (e.g., from molecular clock or fossil data) should be tested by the regression
of standardized contrasts on the square root of the summed branch lengths at each divergence (the slope should not be different from zero). If this test fails (and branch lengths cannot be adequately transformed), or if no information is available at the outset, then it appears that the use of equal branch lengths may be a reasonable alternative. However, equal branch lengths do inflate Type I error rates, and this effect has not been taken into account in empirical studies. To correct this tendency, I have calculated critical values of the independent contrast correlation coefficient at α = 0.1, 0.05, 0.02, and 0.01. These are based on the assumption that branch lengths on the true tree resemble those generated by random speciation models (Martins 1996) and that the analysis was then conducted on a random species sample using equal branch lengths (see Ackerly 2000). The use of these adjusted critical values for the random samples in this study reduced statistical power (under the correlated-changes model of ρ = 0.5) from 0.53 to 0.46 at N = 16, whereas there was virtually no effect on statistical power at N ≥ 64.
Sign Tests
A further conclusion of these studies is that sign tests, which do not use branch length information at all, provide a very robust and conservative approach to significance testing for independent contrasts. Across all sampling regimes examined here (except NR2), sign tests based on all contrasts or on terminal species pairs alone (Spic and Spair) maintained Type I error rates ≤ 0.05. As with all nonparametric tests, this robust performance is gained at the price of greatly reduced statistical power for detecting nonzero correlations. Sign tests are therefore very conservative: a significant result can be confidently interpreted as indicating a true relationship, but a nonsignificant result does not provide a strong basis for accepting the null hypothesis (cf. Kelly and Purvis 1993). Furthermore, sign tests do not provide estimates of the magnitude of a correlation or the sign of the slope in regression analyses. As in all statistical analysis, the problems of significance testing are most critical for results close to P = 0.05 (or other relevant cutoffs). Highly significant results will usually be significant under any of the alternatives, and any result that depends on a particular branch length distribution, or that is significant using parametric tests but not a sign test, should be accepted with caution.
Conclusions
The problem of taxon sampling focuses attention on the nature of inference in comparative biology (see Westoby 1999). Scientific inference has several components: the specification of a statistical universe to which a hypothesis applies, the selection of an appropriate sample from that universe, and the choice of a statistical model for analysis. The introduction of independent contrasts and other quantitative methods incorporating phylogenies (Harvey and Pagel 1991; Huelsenbeck and Rannala 1997) has led to much more explicit consideration of the statistical models underlying comparative analyses. However, much less attention has been paid to the problem of taxon sampling and the explicit specification of the “universe” of taxa to which inferences apply. This universe may be all taxa in a particular clade, the species of a particular ecological community or geographic region, or taxa that share a morphological or ecological characteristic. In each case, the appropriate sample of species will differ, and it is necessary to be explicit about the nature of the hypothesis and the scope of inference to determine the appropriate sampling protocol. Westoby et al. (1998) have provided very useful guidance in this direction by suggesting several different forms of questions that arise in comparative ecology and the appropriate species selection designs in each case. The following conclusions and recommendations regarding taxon sampling and the implementation of independent contrast analyses emerge from the results of this simulation study.
For studies of associations between evolutionary changes in continuous traits, random samples of species within clades provide robust estimates of correlation coefficients (and presumably regression slopes as well, although regression analyses were not examined in this study). Therefore, there is no a priori reason why comparative studies of correlated evolution should attempt to include all known taxa in a clade (thereby restricting the scope of inference to that lineage), rather than subsamples drawn from larger clades.
Biological or methodological factors that lead to nonrandom taxon samples with respect to univariate or bivariate trait distributions may introduce systematic discrepancies between interspecific and phylogenetically structured analyses and bias estimates of evolutionary correlations. Examples may include community assembly processes, convergent evolution under adaptive constraints in a particular environment, or a focus by investigators on species that exhibit the trait combinations of interest (for an example with discrete traits, see Maddison 1990; Sillén-Tullberg 1993). If the bias is introduced by biological causes, then it is important to recognize that two or more processes may be at work (e.g., correlated character evolution and nonrandom community assembly processes), and comparative analyses must be carefully designed to isolate the effects of each process. Large discrepancies between the results of interspecific and independent contrast analyses may be one indication of heterogeneous processes influencing trait distributions (see Price 1997; Ackerly and Donoghue 1998). Methodological bias, in contrast, should be corrected by an effort to sample taxa randomly with respect to the traits or trait associations of interest.
Analyses based on related species pairs entail a significant loss of statistical power relative to the investment in data collection, because N species provide only N/2 contrasts. Species pairs sampling may be appropriate if patterns of evolutionary divergence or functional relationships are expected to differ in recent divergences compared to deeper nodes of the phylogeny, but this question remains to be explored. Barring specific rationales to the contrary, or the absence of phylogenetic information at deeper nodes, independent contrast analyses should be conducted over all available and appropriate species for a particular question and not restricted to samples of related species pairs.
If species pairs are employed, Westoby’s (1999) method of choosing species pairs that differ by a large amount in the independent character may be adequate for determining
whether a relationship exists between two traits. This approach maintains appropriate Type I error rates (using correlation or sign tests), and its statistical power is comparable to full-tree analyses based on similar total numbers of species. However, a major drawback of this method is that the magnitude of the correlation (and possibly of regression slopes as well) is not estimated accurately, so only the presence or absence of a relationship can be established.
Exact significance tests are highly dependent on appropriate branch length information (see Garland et al. 1992). In the absence of other information, equal branch lengths apparently maintain the lowest Type I error rates under a variety of conditions. A table of adjusted critical values is provided to obtain correct Type I error rates on the assumption that the true branch lengths correspond to a time-only speciation model but equal branch lengths are used for analysis (Ackerly 2000).
Alternatively, sign tests provide a very conservative approach to significance testing. The nonparametric sign test maintains appropriate Type I error rates under a wide range of circumstances, but the cost is a marked reduction in statistical power. Thus, significant results based on sign tests may be presented with great confidence, but nonsignificant results do not provide a strong basis for accepting the null hypothesis.

ACKNOWLEDGMENTS

I thank M. Donoghue, J. Gittleman, J. Losos, E. Martins, A. Purvis, T. Price, D. Schwilk, and M. Westoby for comments and discussions that greatly improved this paper. I also thank A. Purvis for suggesting the NR3 sampling scheme. W. Maddison provided critical support in development of the ACAP simulation and analysis software. Financial support from the National Science Foundation (DEB 94-03252) and a Terman Fellowship from Stanford University is gratefully acknowledged.

LITERATURE CITED

Ackerly, D. D. 1997. ACAP2: another comparative analysis program. Available at http://www.stanford.edu/~dackerly/ACAP.html.
Ackerly, D. D. 1999. Phylogeny and the comparative method in plant functional ecology. Pp. 391–413 in M. Press, J. D. Scholes, and M. G. Barker, eds. Physiological plant ecology. Blackwell Scientific, Oxford, U.K.
correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Syst. Biol. 45:27–47.
Felsenstein, J. 1985. Phylogenies and the comparative method. Am. Nat. 125:1–15.
Foster, S. A., and C. H. Janson. 1985. The relationship between seed size and establishment conditions in tropical woody plants. Ecology 66:773–780.
Frohlich, M. W. 1987. Common-is-primitive: a partial validation by tree-counting. Syst. Bot. 12:217–237.
Garland, T., Jr., P. H. Harvey, and A. R. Ives. 1992. Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst. Biol. 41:18–32.
Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Syst. Biol. 42:265–292.
Garnier, E. 1992. Growth analysis of congeneric annual and perennial grass species. J. Ecol. 80:665–675.
Grafen, A. 1989. The phylogenetic regression. Phil. Trans. R. Soc. Lond. Ser. B 326:119–157.
Graybeal, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem. Syst. Biol. 47:9–17.
Hansen, T. F. 1997. Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351.
Harvey, P. H., and M. Pagel. 1991. The comparative method in evolutionary biology. Oxford Univ. Press, Oxford, U.K.
Hillis, D. M. 1996. Inferring complex phylogenies. Nature 383:130.
Huelsenbeck, J. P., and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276:227–232.
Kelly, C. K. 1995. Seed size in tropical trees: a comparative study of factors affecting seed size in Peruvian angiosperms. Oecologia 102:377–388.
Kelly, C. K., and A. Purvis. 1993. Seed size and establishment conditions in tropical trees: on the use of taxonomic relatedness in determining ecological patterns. Oecologia 94:356–360.
Maddison, W. P. 1990. A method for testing the correlated evolution of two binary characters: are gains or losses concentrated on certain branches of a phylogenetic tree? Evolution 44:539–557.
Maddison, W. P. 2000. Testing character correlations using pairwise comparisons on a phylogeny. J. Theor. Biol. 202:195–204.
Martins, E. P. 1996. Conducting phylogenetic comparative studies when the phylogeny is not known. Evolution 50:12–22.
Martins, E. P., and T. Garland Jr. 1991. Phylogenetic analyses of the correlated evolution of continuous characters: a simulation study. Evolution 45:534–557.
Martins, E. P., and T. F. Hansen. 1997. Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am. Nat. 149:646–667.
Mazer, S. J. 1989. Ecological, taxonomic, and life history correlates of seed mass among Indiana dune angiosperms. Ecol. Monogr. 59:153–175.
Page, R. D. M. 1996. TREEVIEW: an application to display phylogenetic trees on personal computers. Comp. Appl. Biosci. 12:
———. 2000. Table of critical values of independent contrast cor- 357–358.
relations based on equal branch lengths. Available at http:// Pagel, M. D. 1993. Seeking the evolutionary regression coefficient: an
www/stanford.edu/~dackerly/archive/Rcritvals.html. analysis of what comparative methods measure. J. Theor. Biol.
Ackerly, D. D., and M. J. Donoghue. 1998. Leaf size, sapling al- 164:191–205.
lometry, and Corner’s rules: a phylogenetic study of correlated ———. 1999. Inferring the historical patterns of biological evo-
evolution in maples (Acer). Am. Nat. 152:767–791. lution. Nature 401:877–884.
Ackerly, D. D., and P. B. Reich. 1999. Convergence and correlations Poe, S. 1998. Sensitivity of phylogeny estimation to taxonomic
among leaf size and function in seed plants: a comparative test sampling. Syst. Biol. 47:18–31.
using independent contrasts. Am. J. Bot. 86:1272–1281. Price, T. 1997. Correlated evolution and independent contrasts. Phil.
Armstrong, D. P., and M. Westoby. 1993. Seedlings from large seeds Trans. R. Soc. Lond. Ser. B 352:519–529.
tolerate defoliation better: a test using phylogenetically Purvis, A., and A. Rambaut. 1995. Comparative analysis by inde-
independent contrasts. Ecology 74:1092–1100. pendent contrasts (CAIC): an Apple Macintosh application for
Burt, A. 1989. Comparative methods using phylogenetically in- analysing comparative data. Comp. Appl. Bios. 11:247–251.
dependent contrasts. Evol. Biol. 6:33–53. Purvis, A., and A. J. Webster. 1999. Phylogenetically independent
Cunningham, C. W., K. E. Omland, and T. H. Oakley. 1998. Re- comparisons and primate phylogeny. Pp. 44–70 in P. C. Lee,
constructing ancestral character states: a critical reappraisal. ed. Comparative primate socioecology. Cambridge Univ. Press,
Trends Ecol. Evol. 13:361–366. Cambridge, U.K.
Diaz-Uriarte, R., and T. Garland Jr. 1996. Testing hypotheses of Purvis, A., J. L. Gittleman, and L. Hang-Kwang. 1994. Truth or
consequences: effects of phylogenetic accuracy on two com-
parative methods. J. Theor. Biol. 167:293–300. Silvertown, J., and M. Dodd. 1996. Comparing plants and con-
Rees, M. 1996. Evolutionary ecology of seed dormancy and seed size. necting traits. Phil. Trans. R. Soc. Lond. Ser. B 351:1233–1239.
Phil. Trans. R. Soc. Lond. Ser. B 351:1299–1308. Sokal, R. R., and F. J. Rohlf. 1995. Biometry. 3rd ed. W. H. Free-
Reich, P. B., M. B. Walters, and D. S. Ellsworth. 1997. From tropics man, New York.
to tundra: global convergence in plant functioning. Proc. Natl. Swanborough, P., and M. Westoby. 1996. Seedling relative growth
Acad. Sci. 94:13730–13734. rate and its components in relation to seed size: phylogenetically
Reich, P. B., D. S. Ellsworth, M. B. Walters, J. M. Vose, C. Gresh- independent contrasts. Funct. Ecol. 10:176–184.
am, J. C. Volin, and W. D. Bowman. 1999. Generality of leaf trait Swofford, D. 1993. PAUP: phylogenetic analysis using parsimony.
relationships: a test across six biomes. Ecology 80: 1955–1969. Smithsonian Institution, Washington, DC.
Ricklefs, R. E., and J. M. Starck. 1996. Applications of phyloge- Tofts, R., and J. Silvertown. 2000. A phylogenetic approach to
netically independent contrasts: a mixed progress report. Oikos community assembly from a local species pool. Proc. R. Soc.
77:167–172. Lond. Ser B 267:363–369.
Salisbury, E. J. 1942. The reproductive capacity of plants. Bell, Westoby, M. 1999. Generalization in functional plant ecology: the
London. species-sampling problem, plant ecology strategy schemes, and
Sanderson, M. J., and M. J. Donoghue. 1994. Shifts in diversifi- cation phylogeny. Pp. 847–872 in F. I. Pugnaire and F. Valladares, eds.
rate with the origin of angiosperms. Science 264: 1590–1593. Handbook of functional plant ecology. M. Dekker, New York.
Sanderson, M. J., A. Purvis, and C. Henze. 1998. Phylogenetic Westoby, M., S. A. Cunningham, C. M. Fonseca, J. M. Overton, and
supertrees: assembling the trees of life. Trends Ecol. Evol. 13: I. J. Wright. 1998. Phylogeny and variation in light capture area
105–109. deployed per unit investment in leaves: designs for selecting study
Saverimuttu, T., and M. Westoby. 1996. Seedling longevity under species with a view to generalizing. Pp. 539–566 in H. Lambers, H.
deep shade in relation to seed size. J. Ecol. 84:681–689. Poorter, and M. M. I. Van Vuuren, eds. Inherent variation in plant
Sille´n-Tullberg, B. 1993. The effect of biased inclusion of taxa on growth: physiological mechanisms and eco- logical consequences.
the correlation between discrete characters in phylogenetic trees. Backhuys Publishers, Leiden, the Neth-
Evolution 47:1182–1191. erlands.

Corresponding Editor: J. Losos
