1480–1492
Abstract. Independent contrasts are widely used to incorporate phylogenetic information into studies of continuous traits, particularly analyses of evolutionary trait correlations, but the effects of taxon sampling on these analyses have received little attention. In this paper, simulations were used to investigate the effects of taxon sampling patterns and alternative branch length assignments on the statistical performance of correlation coefficients and sign tests; “full-tree” analyses based on contrasts at all nodes and “paired-comparisons” based only on contrasts of terminal taxon pairs were also compared. The simulations showed that random samples, with respect to the traits under consideration, provide statistically robust estimates of trait correlations. However, exact significance tests are highly dependent on appropriate branch length information; equal branch lengths maintain lower Type I error than alternative topological approaches, and adjusted critical values of the independent contrast correlation coefficient are provided for use with equal branch lengths. Nonrandom samples, with respect to univariate or bivariate trait distributions, introduce discrepancies between interspecific and phylogenetically structured analyses and bias estimates of underlying evolutionary correlations. Examples of nonrandom sampling processes may include community assembly processes, convergent evolution under local adaptive pressures, selection of a nonrandom sample of species from a habitat or life-history group, or investigator bias. Correlation analyses based on species pairs comparisons, while ignoring deeper relationships, entail significant loss of statistical power and as a result provide a conservative test of trait associations. Paired comparisons in which species differ by a large amount in one trait, a method introduced in comparative plant ecology, have appropriate Type I error rates and high statistical power, but do not correctly estimate the magnitude of trait correlations. Sign tests, based on full-tree or paired-comparison approaches, are highly reliable across a wide range of sampling scenarios, in terms of Type I error rates, but have very low power. These results provide guidance for selecting species and applying comparative methods to optimize the performance of statistical tests of trait associations.
Key words. Comparative methods, correlated evolution, independent contrasts, simulations, taxon sampling.
In recent years, numerous analytical methods have been introduced to incorporate phylogenetic information into tests of evolutionary hypotheses such as rates of phenotypic evolution, lineage diversification, the timing and pattern of coevolution among host-parasite or plant-herbivore lineages, and correlations between phenotypic traits (Harvey and Pagel 1991; Sanderson and Donoghue 1994; Huelsenbeck and Rannala 1997; Pagel 1999). These methods focus on how to analyze interspecific data given a phylogenetic framework, but little attention has been paid to the problem of which taxa to select for analysis and how taxon sampling may influence the results. Taxon sampling has recently received increased attention in relation to phylogeny reconstruction (e.g., Hillis 1996; Graybeal 1998; Poe 1998), with particular focus on the relative value of more taxa versus more characters for improving phylogenetic accuracy. In studies of character evolution, it has been shown for discrete traits that the relative frequency of different character states will influence ancestral state reconstructions and the analysis of correlated trait changes (Frohlich 1987; Maddison 1990; Sillén-Tullberg 1993). For continuous traits, the sampling of taxa and the resulting distribution of character values will also have significant impacts on ancestral reconstructions (Cunningham et al. 1998). However, the effects of taxon sampling on the study of correlated evolution in continuous characters have not been explicitly considered.
In comparative studies, sampling of taxa within lineages of interest occurs for several reasons. First, all studies face methodological limitations of time and effort, and it is often difficult or impossible to obtain data on all taxa in a clade. The resulting samples will frequently reflect geography (e.g., taxa that live in the researcher’s immediate vicinity), abundance (e.g., common taxa due to ease of study), research history (taxa that have been the focus of prior studies), and other practical considerations. These limitations will act more strongly when the data are expensive and/or time-consuming to gather (e.g., population genetic parameters, behavior). Second, many questions in evolutionary ecology address only a limited set of taxa that share certain life-history or other characteristics (Westoby 1999). For example, a study of the significance of seed size in relation to gap regeneration in tropical forests (Foster and Janson 1985) will necessarily be limited to tropical tree species, and as a group these species do not form a clade. Rather, they represent a very diffuse sample of taxa drawn from the entire seed plant lineage (>275,000 species). Any particular study would be limited to only a small sample, frequently a sample drawn from one geographic region, and analysis of these data in a phylogenetic context will necessarily use a heavily pruned phylogeny (e.g., Kelly and Purvis 1993; Kelly 1995). The consequences of this focus on particular life-history groups or communities and of the associated pruning of phylogenetic trees have received little attention in relation to phylogenetic comparative methods. Finally, in a historical context speciation and extinction also represent sampling processes, in the sense that trait-dependent changes in speciation or extinction probabilities will influence the relative frequency of character states in extant taxa. The resulting patterns may be the focus of attention for some questions (e.g., patterns of species sorting during mass extinction events). However, in most cases, studies are limited to extant taxa because the traits can only be measured in living organisms or populations, so the potential effects of nonrandom speciation/extinction processes merit consideration.
© 2000 The Society for the Study of Evolution. All rights reserved.
TAXON SAMPLING AND COMPARATIVE METHODS 1481
FIG. 1. (A) Example of a 256-taxon phylogeny based on a “time-only” Markovian speciation model (Martins 1996). (B) Phylogeny of a random sample of 32 species drawn from the phylogeny in A, with true branch lengths based on the corresponding segments of the full tree. (C) Phylogeny for the same set of species, with equal branch lengths. Circles show the pairs of sister taxa that would be selected for species pair analyses of independent contrasts and sign tests. Tree graphics generated using TREEVIEW (Page 1996).
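Panels B and C of Fig. 1 differ only in how branch lengths are assigned to the pruned tree. The Python sketch below illustrates both treatments on a toy three-taxon tree. It assumes a hypothetical `Node` class and hypothetical helper names (`prune`, `equal_lengths`, `newick`); none of these are part of ACAP, COMPARE, or TREEVIEW. Pruning drops unsampled tips and collapses any node left with a single descendant by summing branch lengths, so every sampled tip keeps its root-to-tip path length.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Node:
    """Hypothetical tree node; `length` is the branch leading to this node."""
    length: float
    name: str = ""
    children: list = field(default_factory=list)


def prune(node, keep):
    """Copy of the subtree restricted to tips named in `keep` (None if empty).

    A node left with a single descendant is collapsed and its branch length is
    added to the descendant's, so each retained tip keeps its root-to-tip path
    length: the "true" branch lengths of the pruned tree (Fig. 1B).
    """
    if not node.children:
        return replace(node) if node.name in keep else None
    kids = [k for k in (prune(c, keep) for c in node.children) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return replace(kids[0], length=kids[0].length + node.length)
    return replace(node, children=kids)


def equal_lengths(node, value=1.0):
    """Copy of the tree with every branch set to `value` (as in Fig. 1C)."""
    return replace(node, length=value,
                   children=[equal_lengths(c, value) for c in node.children])


def newick(node):
    """Newick-style string (without the trailing ';'), for inspection only."""
    if not node.children:
        return f"{node.name}:{node.length:g}"
    return "(" + ",".join(newick(c) for c in node.children) + f"):{node.length:g}"


# Toy tree ((A:1,B:2):3,C:4); sampling A and C collapses the (A,B) node.
full = Node(0, children=[Node(3, children=[Node(1, "A"), Node(2, "B")]), Node(4, "C")])
pruned = prune(full, {"A", "C"})
print(newick(pruned))                 # (A:4,C:4):0
print(newick(equal_lengths(pruned)))  # (A:1,C:1):1
```

Applying the same `prune` step to a 256-taxon tree and a 32-species sample would give a tree like the one in Fig. 1B, and `equal_lengths` would then give the Fig. 1C version.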
Westoby et al. (1998; Westoby 1999) have recently advanced several specific proposals for taxon sampling in comparative ecology, specifically considering the role of phylogeny. The process of species selection is analogized to the problem of random sampling in ecological research, where it has long been recognized as a central component of robust hypothesis testing. For estimates of properties such as the mean and variance of a trait in different clades, random samples of taxa (with respect to the traits of interest) are appropriate. In contrast, to detect associations between traits Westoby et al. recommend a nonrandom approach based on sampling pairs of closely related taxa in which the species of each pair differ by some large amount in the “independent trait.” For example, in studies of seed size and its correlates, they sampled phylogenetic species pairs in which seed size differed by at least an order of magnitude (Armstrong and Westoby 1993; Saverimuttu and Westoby 1996; Swanborough and Westoby 1996). Comparisons of species pairs, which invoke only a minimal level of phylogenetic information, have been widely used in plant ecology dating back to Salisbury (1942). In a recent survey of the plant functional ecology literature, Ackerly (1999) found 28 studies that used species pairs, compared to 19 papers using independent contrasts calculated over a full phylogeny for the species of interest. In many cases, each pair is chosen to represent contrasting states of a discrete life-history trait (e.g., comparisons of physiology or allocation in annuals vs. perennials; Garnier 1992; Silvertown and Dodd 1996). If the contrasting states of the discrete trait represent extreme values of an underlying continuous character, then such comparisons may be similar to Westoby’s approach of selecting species that exhibit a large difference in the independent trait. However, the potential advantages and disadvantages of species pairs comparisons and of Westoby’s recommendations for statistical hypothesis testing have not been explicitly investigated (see Maddison 2000).
In this study, simulations of trait evolution on a phylogeny were used to address the problem of taxon sampling, focusing on the method of independent contrasts and its application to the study of correlated evolution between continuous phenotypic traits. Simulations were conducted on a set of randomly generated 256-taxon phylogenies. Subsamples of the terminal taxa were selected, following various criteria outlined below, to address the following questions. For random species samples, how does sample size influence Type I error rates (when the true correlation ρ = 0) and statistical power (for ρ ≠ 0) of trait correlations based on either independent contrast or interspecific (nonphylogenetic) analyses? Do nonrandom species samples, with respect to one or both characters, lead to biased estimates of trait correlations? How does nonrandom sampling affect Type I error rates when ρ = 0? Similarly, if species are chosen on the basis of an independent third trait (which is not considered in the correlation analyses), are there effects on parameter estimates or Type I error rates? How do results based only on sister taxa pairs compare to full-tree analyses conducted over all nodes of the phylogeny? Do samples of species pairs that differ by a predetermined amount in one character (following Westoby 1999) lead to bias in estimating trait correlations or influence Type I error or statistical power? Under the different scenarios outlined above, do nonparametric sign tests of associations between independent contrasts provide reliable tests (i.e., appropriate Type I error when ρ = 0), and how does their power compare to parametric statistics when ρ ≠ 0?
FIG. 2. Type I error rates under the null model (A) and statistical power under the correlated-changes model (B) for random samples of different size, based on nominal significance levels at α = 0.05. The last point on each line is based on analysis of the full 256-taxon phylogenies. Sample size is the number of species or the number of contrasts: N for Rspec; N − 1 for Rpic and Spic; and N/2 for Rpair and Spair. The ordinate in (A) is log transformed to show differences at low levels. The number of pairs is less than half the number of species in the tree at each sample size due to tree topology (see Fig. 1).
SIMULATIONS
The simulations presented here are based on a two-step model of trait evolution followed by taxon sampling and analysis of character correlations. In the first step, the evolution of two traits was simulated on a large phylogeny, based either on a null model with no correlation between changes in the traits (ρ = 0) or on a correlated-changes model sampling changes from a distribution with a true correlation of ρ = 0.5. In the second step, a subsample of species was selected from the terminal taxa on this phylogeny, based on various sampling schemes described below, and the phylogeny was pruned to show relationships among this subset (“species” and “taxa” are used interchangeably in this paper; Fig. 1). For each sample, correlations between the two traits were calculated from species values and from independent contrasts (including species pairs analyses, sign tests, and alternative branch lengths) to estimate: (1) the sample correlation between the two traits, to determine whether the taxon sampling led to bias in estimates of the true correlation (ρ = 0 or 0.5); (2) for the null model, Type I error rates at the α = 0.05 and 0.01 levels, based on standard significance testing; and (3) for the correlated-changes model, statistical power at the α = 0.05 and 0.01 levels. All simulations and analyses of independent contrasts were conducted using ACAP, Version 5, written by the author (see Ackerly 1997; Ackerly and Donoghue 1998).
TABLE 1. Summary of correlation statistics (see abbreviations in text) for simulated evolution of two traits on a set of random 256-taxon phylogenies (full tree) and for subsamples of taxa chosen based on six different sampling algorithms; for NR2 the sampling bias coefficient = 0.75, and for NR3 the probability of change in Z per time step = 0.001 (see text). N, number of taxa sampled for Rspec, number of contrasts for Rpic and Spic, and number of species pairs for Rpair and Spair (the mean and range of values are shown for sampling schemes in which sample size varied); mean and SD of correlation coefficients (R) or proportion of same-sign contrasts (S) are based on 10,000 replicates. Type I error and statistical power are the proportion of nominally significant results at P ≤ 0.05 or 0.01. Results for ρ = 0 are for the null model of no true correlation; results for ρ = 0.5 are for an alternative model with a true correlation between evolutionary trait changes.
Simulations for this study were conducted on a set of 100 random phylogenies of 256 taxa, generated using a “time-only” speciation model in COMPARE 4.1 (see Martins 1996). The trees differed from each other in topology and branch lengths. Branch lengths generated by COMPARE were multiplied by 1000 and rounded to the nearest integer to allow stepwise simulation of trait evolution along each branch, as described below. Across all trees, mean branch length was 50.2 (range = 1–571), and the total height of the trees (sum of branch lengths from the root to each terminal taxon) averaged 623 (range = 422–960 across trees). Within each tree all taxa were at the same height from the root, so the trees can be interpreted as a set of contemporaneous taxa in which branch lengths are proportional to time and the rate of character evolution is constant over the tree.
Evolution of two continuous traits (X, Y) was simulated by Brownian motion with initial character states = (0, 0) at the root of the tree. In each time step (one time step equals one unit of branch length), evolutionary changes in the two characters were drawn from a bivariate normal distribution with μ = 0, σX² = σY² = 1, and correlation coefficient (the input correlation, following Martins and Garland 1991) ρ = 0 for the null model and ρ = 0.5 for the correlated-changes model. For a particular sampling algorithm or sample size, each run of the model was based on a total of 10,000 replicates (100 character simulations on each of the 100 trees). From each simulation one taxon sample was selected for analysis, guaranteeing that each replicate was independent both in terms of the distribution of character states in the full tree and in the sampling of taxa for analysis.
Taxon Sampling Algorithms
Six different taxon sampling algorithms were evaluated to address the questions presented in the introduction.
Random (R). A specified number of species was selected at random. A series of N = 16, 32, 64, and 128 was used to examine the effects of sample size. For comparison of this method with the next three nonrandom algorithms, the results for N = 32 were used.
Nonrandom on character 1 (NR1). A specified number of species (N = 32) was selected at random from those in which the value of character X was greater than the mean value for all taxa. This represents a case in which the species chosen for study exhibit a larger (or smaller) than average trait value for one of the traits of interest. For example, an analysis of seed size versus height in trees would represent a nonrandom sample of species with respect to height, compared to the distribution of height in seed plants as a whole. Because there are numerous evolutionary transitions between “tree” and “nontree” in the seed plants, any particular sample of trees would be broadly distributed across the underlying seed plant phylogeny.
Nonrandom on both characters (NR2). A specified number of species (N = 32) was chosen with an enforced correlation (the “sampling bias” coefficient) between character values. This was accomplished by randomly selecting coordinates from a bivariate distribution with a specified correlation coefficient (0, 0.25, 0.5, 0.75, or 0.9) and with means and standard deviations equal to those observed in the simulated data, and then selecting the species in the data that were closest to the randomly chosen points in the X-Y space. Biological or investigator-driven scenarios that might result in bivariate nonrandom sampling are discussed in detail below (see Discussion).
Nonrandom with respect to a third character (NR3). Evolution of a binary character (Z) was simulated along the tree, starting with state 0 and changing from 0 to 1 or 1 to 0 with probability 0.001 at each time step (simulations resulting in fewer than 33 taxa with Z = 1 were discarded). This low rate of change was selected such that less than half the terminal taxa would end up in state 1 (on average) and they would be clustered at the tips of the tree. Over the 10,000 simulations, the number of taxa with Z = 1 averaged 84.6 (range = 33–231). A random sample of 32 taxa was then selected from those in which Z = 1 for analysis of correlations between continuous traits X and Y. This scheme represents a situation in which sampling is concentrated in a certain habitat or life-history group (e.g., grassland species or annual plants), but the distribution and bivariate relationship between the phenotypic traits of interest are not biased by membership in that group. It differs from random sampling because selected taxa are significantly clustered in groups at the tips of the tree.
Species pairs with a large difference (DP). Phylogenetic species pairs were selected in which the value of character X differed by at least a specified amount (following the approach of Westoby 1999; see introduction). This was accomplished by starting with the first terminal taxon (at the left side of the tree) and searching down the tree until a node was reached in which the descendant taxa with the minimum and maximum values for character X differed by more than the specified amount; these two taxa were then chosen for inclusion in the study, all other taxa in the clade were excluded, and the process was repeated in the next clade in the tree. Only one species pair was drawn from each monophyletic clade (paraphyletic pairs, or pairs nested within other pairs, were not considered; cf. Burt 1989; Purvis and Rambaut 1995). The minimum allowable difference was set arbitrarily (at a value of 14.5) such that approximately 64 taxa (= 32 pairs) would be chosen, thus facilitating comparison of results with the methods above.
Random pairs (RP). Species pairs were chosen at random. This algorithm was run in tandem with the DP selection process above, and the random pairs were chosen from the same set of clades in each simulation run. This approach guaranteed that the number of pairs was the same for each replicate, so that comparisons of the DP and RP methods were not affected by the distribution of sample sizes.
Branch Lengths
Relative branch lengths, in terms of expected variance in character evolution, are critical for correct application of independent contrasts (Felsenstein 1985; Martins and Garland 1991; Purvis et al. 1994). Incorrect branch lengths inevitably raise Type I error rates, although they do not cause biased estimates of the true correlations (Martins and Garland 1991; Purvis et al. 1994). In empirical studies, fossil or molecular clock data may be available to estimate branch lengths in units of time, which would be appropriate if the characters under study evolve at a constant rate. Alternatively, if the rate of evolution of phenotypic characters along each branch is assumed to be proportional to the corresponding rates for characters used in phylogeny reconstruction (molecular or morphological), branch lengths may be estimated from the number of character changes in the phylogenetic dataset. In either case, when a subsample of taxa is included in a study of character evolution, the “correct” branch lengths can be calculated for the pruned tree from sums of branch lengths in the original phylogeny (Fig. 1A, B).
In many (maybe most) cases, there is no basis to assume that phenotypic traits evolve at constant or proportional rates (Martins and Garland 1991). In addition, some methods of phylogeny reconstruction do not provide biologically meaningful branch lengths (e.g., supertree analyses; Sanderson et al. 1998). In these situations, all branches may be assigned equal lengths (because only relative branch lengths are important, lengths would normally be set to one; Fig. 1C). Equal branch lengths are sometimes interpreted in terms of a punctuational model of evolutionary change because there is one “bout” of evolution associated with each speciation event. However, the punctuational interpretation is weaker when considering taxon sampling because each branch represents an unknown number of speciation events in the full phylogeny (even if all extant taxa in a clade are sampled, this is true due to the unknown distribution of speciation and extinction events in the past). Because only relative, and not absolute, branch lengths influence independent contrast analyses, equal branch lengths may also be effective if the true branch lengths are randomly distributed with respect to depth in the tree. This is true for trees generated by time-only speciation models (although the resulting branch length distributions are right-skewed; Martins 1996), and for such trees equal branch lengths are fairly effective (Purvis et al. 1994). Purvis and Webster (1999) have also argued that equal branch lengths may be appropriate when there is significant error in the estimation of species trait values, to avoid disproportionate weighting of contrasts between closely related species, which are particularly sensitive to these errors.
Alternatively, branch lengths may be constructed using topological approaches that lengthen branches selectively, placing all terminal taxa at the same total height from the root (e.g., Grafen 1989; or a “minimum-extension” method, as in the graphical output from PAUP, Swofford 1993). If all taxa under study are extant and rates of evolution are assumed to be constant, then the placement of all terminal taxa at the same height is intuitively appealing. However, there is no a priori basis to expect that the relative branch lengths constructed from these topological rules are biologically appropriate. Simulations have found that Type I error rates are higher using Grafen’s topological algorithm than with equal branch lengths when the true branch lengths are derived from a random speciation model (Purvis et al. 1994).
For the simulations reported here, branch lengths in the pruned trees were calculated from the true lengths in the full phylogeny or were set to equal length. All analyses were also conducted using the two topological approaches mentioned above (the Grafen and minimum-extension algorithms), but in almost all cases these resulted in higher Type I error rates than analysis on equal branch lengths, so the results are not shown (full results are available from the author on request).
Statistical Analyses of Correlations
For each simulation run, the following correlations were calculated for all taxa over the full phylogeny and for the subset of sampled taxa over the pruned tree: (1) species correlations (Rspec), based on trait values among terminal taxa (i.e., TIPS; Martins and Garland 1991); (2) correlations of phylogenetically independent contrasts (Rpic), which are statistically equivalent to correlations of evolutionary changes (Pagel 1993); Rpic.true is based on contrasts calculated from true branch lengths, and Rpic.equal is based on equal branch lengths (equivalent to FLG and FLP, respectively, in Martins and Garland 1991); (3) correlations of contrasts between pairs of sister taxa at terminal nodes (Rpair), thereby excluding contrasts based on internal nodes (Fig. 1C); these contrasts were unstandardized for branch lengths (equivalent to assignment of equal branch lengths between all pairs) because the species pairs approach is employed to minimize necessary phylogenetic information; (4) sign tests based on independent contrasts over the full phylogeny (Spic), calculated as the proportion of nodes with contrasts of same vs. different sign; and (5) sign tests based on sister taxa pairs only (Spair), analogous to Rpair above. The Pearson correlation coefficient was used for Rspec, and the correlation coefficient calculated through the origin was used for Rpic and Rpair (for details, see Garland et al. 1992). Degrees of freedom were N − 2 for Rspec and Rpic and NP − 1 for Rpair, where NP is the number of sister taxa pairs (note that the total number of species in paired tests = 2NP). Significance at the α = 0.05 and 0.01 levels was determined by a t-test for df ≤ 30 and by the z*-transformation for df ≥ 31 (Sokal and Rohlf 1995, pp. 575–579). For sign tests, significance at the α = 0.05 level was determined from critical values of the binomial distribution for N or NP ≤ 22 (for Spic and Spair, respectively) and by a t-test approximation for N or NP ≥ 23 (Sokal and Rohlf 1995, p. 444). The mean and standard deviation of the correlation coefficients were calculated from each distribution of 10,000 replicates to test for bias in estimating the true evolutionary correlation (= 0 or 0.5 in the null and correlated-changes models, respectively). Type I error rates and statistical power were calculated as the proportion of significant results (at P ≤ 0.05 and ≤ 0.01) under the null model and the correlated-changes model, respectively.
RESULTS
Full Trees
For the full 256-taxon trees, the mean values of Rspec and Rpic (analyzed using true or equal branch lengths) were indistinguishable from 0.0 under the null model and from 0.5 under the correlated-changes model, demonstrating that both measures provide unbiased estimates of the evolutionary correlation (Table 1). However, the standard error of the correlations was very high for Rspec and somewhat elevated for Rpic.equal, compared to Rpic.true. As a result, the Type I error rate for Rspec was 0.57 (all values for Type I error and statistical power discussed in the text are based on α = 0.05; values for α = 0.01 are also reported in Table 1). Type I error rates for Rpic.true were close to nominal levels, whereas rates for Rpic.equal were somewhat elevated at 0.12. Statistical power for Rpic when ρ = 0.5 was close or equal to one under both branch length algorithms, but slightly lower for Rspec (Table 1). Spic, the sign test based on contrasts at all nodes, also had an appropriate Type I error rate and high power at this large sample size (Table 1).
The set of 100 random phylogenies used in this study had an average of 84.9 pairs of sister taxa (range = 78–94; the remaining taxa were allied with clades of two or more species, so paired comparisons were not possible). Contrast correlations based on these sister taxa pairs (Rpair) provided unbiased estimates of true correlations, but elevated Type I error rates of 0.15 (Table 1). Spair, the sign test based on sister taxa pairs only, had appropriate Type I error rates but relatively low power (0.88) under the correlated-changes model, presumably due to the reduced sample size and the conservative nature of sign tests.
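The contrast-based statistics compared above (Rpic, Spic, and their sister-pair analogues) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ACAP code used in the study. The `Node` class and function names are hypothetical, and the tree is a toy example. The exact two-sided binomial test also stands in for the mix of binomial critical values and large-sample approximation described under Statistical Analyses of Correlations.

```python
from dataclasses import dataclass, field
from math import comb, sqrt


@dataclass
class Node:
    """Hypothetical node: `length` is the branch above it; tips carry (x, y)."""
    length: float
    x: float = 0.0
    y: float = 0.0
    children: list = field(default_factory=list)  # exactly two, or none for a tip


def contrasts(node, out):
    """Felsenstein (1985) contrasts by post-order traversal.

    Appends one standardized (dx, dy) pair per internal node to `out` and
    returns (x, y, effective branch length) for use by the parent node.
    """
    if not node.children:
        return node.x, node.y, node.length
    left, right = node.children
    x1, y1, v1 = contrasts(left, out)
    x2, y2, v2 = contrasts(right, out)
    sd = sqrt(v1 + v2)  # SD of the raw difference expected under Brownian motion
    out.append(((x1 - x2) / sd, (y1 - y2) / sd))
    # Ancestral values: inverse-variance weighted means of the two descendants;
    # the branch above is lengthened by v1*v2/(v1+v2) for estimation error.
    w1, w2 = 1.0 / v1, 1.0 / v2
    xa = (w1 * x1 + w2 * x2) / (w1 + w2)
    ya = (w1 * y1 + w2 * y2) / (w1 + w2)
    return xa, ya, node.length + v1 * v2 / (v1 + v2)


def r_origin(pairs):
    """Correlation computed through the origin (contrasts are not mean-centred)."""
    sxy = sum(a * b for a, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def sign_test(pairs):
    """Share of contrasts with same-sign X and Y changes, plus an exact
    two-sided binomial p-value (success probability 0.5; zeros dropped)."""
    same_sign = [a * b > 0 for a, b in pairs if a != 0 and b != 0]
    n, s = len(same_sign), sum(same_sign)
    tail = sum(comb(n, i) for i in range(min(s, n - s) + 1)) / 2 ** n
    return s / n, min(1.0, 2 * tail)


# Toy tree ((A:1, B:1):1, C:2) with tip values (x, y).
tree = Node(0, children=[
    Node(1, children=[Node(1, 1.0, 2.0), Node(1, 0.0, 0.0)]),
    Node(2, 3.0, 3.0),
])
pics = []
contrasts(tree, pics)
print(len(pics), r_origin(pics))  # 2 contrasts; r = 17 / sqrt(352), about 0.906
print(sign_test(pics))            # (1.0, 0.5)
```

Setting every branch length to 1 before calling `contrasts` gives the equal-branch-length variant (Rpic.equal). Passing raw, unstandardized differences between sister tips to `r_origin` and `sign_test` corresponds to Rpair and Spair.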
Random Samples
For random samples ranging from N = 16 to 128, Rpic.true was unbiased, with Type I error rates close to nominal levels in all cases (Fig. 2A). Under a true correlation of ρ = 0.5, statistical power ranged from 0.54 at N = 16 to 1.0 at N = 128 (Fig. 2B). Rspec was also unbiased at all sample sizes, but Type I error rates increased from 0.12 at N = 16 to 0.44 at N = 128, whereas statistical power increased from approximately 0.5 to 0.95 (Fig. 2A, B). Thus, the probability of incorrectly rejecting the null hypothesis using species correlations increases with larger sample sizes. Sign tests based on all comparisons (Spic) maintained appropriate Type I error rates but had substantially reduced statistical power at low sample sizes (Fig. 2A, B).
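A small Monte Carlo sketch shows why Rspec becomes anticonservative. Traits evolve by the stepwise bivariate Brownian motion described under Simulations, with ρ = 0. The Type I error rate is then the fraction of replicates in which the species correlation is nominally significant. The balanced 16-tip tree, the 10-step branch length, the seed, and all function names are hypothetical stand-ins for the 256-taxon COMPARE trees. Significance here uses Fisher's z normal approximation rather than the t-test/z* rules given in the text.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node; `length` = number of unit time steps on the branch."""
    length: int
    children: list = field(default_factory=list)


def balanced_tree(depth, step=10):
    """Hypothetical ultrametric tree with 2**depth tips and equal branches."""
    kids = [balanced_tree(depth - 1, step) for _ in range(2)] if depth else []
    return Node(step, kids)


def evolve(node, rho, rng, x=0.0, y=0.0, tips=None):
    """Stepwise bivariate Brownian motion: each time step adds (dX, dY) drawn
    from a bivariate normal with means 0, variances 1 and correlation rho."""
    tips = [] if tips is None else tips
    for _ in range(node.length):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x += z1
        y += rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    if not node.children:
        tips.append((x, y))
    for child in node.children:
        evolve(child, rho, rng, x, y, tips)
    return tips


def pearson_r(pts):
    """Ordinary (mean-centred) correlation of tip values, i.e. Rspec."""
    n = len(pts)
    mx = sum(a for a, _ in pts) / n
    my = sum(b for _, b in pts) / n
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    syy = sum((b - my) ** 2 for _, b in pts)
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n):
    """Two-sided p for r from n points via Fisher's z (normal approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def type1_rate(tree, reps=1000, alpha=0.05, seed=1):
    """Share of null (rho = 0) replicates in which Rspec is nominally significant."""
    rng = random.Random(seed)
    hits = sum(
        fisher_p(pearson_r(tips), len(tips)) <= alpha
        for tips in (evolve(tree, 0.0, rng) for _ in range(reps))
    )
    return hits / reps


tree = balanced_tree(4)  # 16 tips
tree.length = 0          # no change above the root, so the root state is (0, 0)
print(type1_rate(tree))  # compare with the nominal 0.05
```

Tips that share long stretches of history are not independent data points, so the printed rate should sit well above the nominal 0.05 on this tree. The exact value depends on the tree shape, the number of replicates, and the seed.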
The use of equal branch lengths (Rpic.equal) resulted in slightly elevated Type I error rates, increasing from 0.074 at N = 16 to 0.1 at N = 128, with slightly lower power relative to Rpic.true (Table 1, Fig. 2B). For random samples, Type I error rates were lower using equal branch lengths than with the topological alternatives (results not shown; see Simulations: Branch Lengths). In all cases, estimates of the true correlation were unbiased.
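Each Type I error rate above is the share of replicates judged significant by the rule given under Statistical Analyses of Correlations: a t-test for df ≤ 30 and a z-transformation for df ≥ 31. A stdlib-only Python sketch of that decision rule follows. It integrates the t density numerically instead of using tabled critical values. It also uses the plain Fisher z-transformation with an assumed standard error of 1/sqrt(df − 1). The z* variant cited from Sokal and Rohlf (1995) may differ in its small-sample correction, so treat that branch as an approximation.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_p_two_sided(t, df, steps=2000):
    """1 - 2 * (integral of the density from 0 to |t|), by Simpson's rule."""
    h = abs(t) / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def r_p_value(r, df):
    """Two-sided p for a correlation with `df` degrees of freedom:
    t-test for df <= 30, Fisher's z approximation for df >= 31."""
    if df <= 30:
        return t_p_two_sided(r * math.sqrt(df / (1.0 - r * r)), df)
    # Assumption: SE(z) = 1/sqrt(df - 1), i.e. 1/sqrt(n - 3) when df = n - 2.
    z = math.atanh(r) * math.sqrt(df - 1)
    return math.erfc(abs(z) / math.sqrt(2.0))


# A contrast correlation from N = 32 species has df = N - 2 = 30 (see text).
print(r_p_value(0.4, 30) <= 0.05)  # True
```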
Analyses of sister taxa pairs (R pair and Spair) maintained
appropriate Type I error rates at all samples sizes. However,
these samples had greatly reduced sample size because only
a portion of the species selected in each sample were paired FIG. 3. An example of the NR2 sampling algorithm. Character
evolution on a 256-taxon phylogeny was simulated with p = 0
with another species (see Fig. 1C). Based on the random topologies in this study, the number of pairs in random samples averaged approximately two-thirds of the maximum possible number (e.g., for N = 32, the average number of pairs was 10.7 of a maximum of 16). In addition, each pair of species provides only one comparison. Thus, although the statistical power of Rpair and Spair was similar to that of Rpic and Spic for the same degrees of freedom, it was much lower when compared on the basis of the number of species studied (Table 1, Fig. 2B).

tree (open circles) and then 32 taxa were chosen that fell closest to points randomly chosen from a bivariate distribution with a sampling bias coefficient of 0.5 (filled squares). For this example, Rspec for the full tree = −0.24, and in the sample Rspec = 0.59 and Rpic.true = 0.39.

Nonrandom Samples

NR1. (See sampling methods under Simulations: Taxon Sampling Algorithms.) Under the null model, sampling of 32 taxa in which X > X̄ resulted in unbiased estimates of the true correlation for Rpic and Rpair, and approximately correct Type I error rates for Rpic.true, Spic, and Spair. However, under the correlated-changes model, the true correlation of 0.5 was underestimated by Rpic and Rpair (both averaged approximately 0.38), and statistical power was diminished relative to random samples of the same size (Table 1).

NR2. Nonrandom sampling relative to both characters imposed a correlation between the two traits in the sampled taxa relative to all taxa in the clade (e.g., Fig. 3). This was the only sampling method, among those tested here, that resulted in biased estimates of the true correlation under the null model; for ρ = 0 and a sampling bias coefficient of 0.75, mean Rspec was 0.59 and mean Rpic.true was 0.36 (Table 1). Type I error rates were elevated correspondingly for all correlation statistics except Spair (which also had very low statistical power). For ρ = 0.5, mean correlations for a sampling bias coefficient of 0.75 were slightly elevated using Rspec and Rpic, although the effect was small because the sampling bias was similar in magnitude to the underlying evolutionary trait correlation.

For a range of sampling bias coefficients from 0.0 to 0.9, mean values of Rspec increased rapidly under the null model, whereas Rpic and Rpair increased less markedly (Fig. 4A). Type I error rates also increased for Rspec, Rpic, and Spic, reaching 0.65 for Rpic.true under the strongest sampling bias coefficient of 0.9. Rpair had a stable but elevated Type I error rate (< 0.2), and only Spair maintained appropriate Type I error rates across all values of the sampling bias coefficient (results not shown). Under a true correlation of ρ = 0.5, Rspec increased from 0.14 to 0.69 as the sampling bias coefficient increased from 0.0 to 0.9 (Fig. 4B). Rpic and Rpair were less sensitive across this range. All parameters converged at a sampling bias of 0.5, where the sampling pattern matched the underlying evolutionary correlation (Fig. 4B). Thus, if sampling processes impose a weaker or stronger correlation of trait values in the sample, relative to the correlation among all species in the lineage, then interspecific and independent contrast correlations will underestimate or overestimate the evolutionary correlation, respectively. Independent contrast correlations are less sensitive to these biases, but the use of a phylogenetically structured analysis does not eliminate these effects.

NR3. When taxa were sampled based on the character state of an independent binary trait, all statistical indicators were similar to those of random samples of the same size, except that
Type I error rates of Rpic.equal and Rpair were very slightly higher and power was correspondingly lower (Table 1).

FIG. 4. Mean correlations under the NR2 sampling regime as a function of the sampling bias coefficient. (A) Null model, ρ = 0. (B) Correlated-changes model, ρ = 0.5. N = 32 for all samples.

Sampling of Species Pairs

For the DP sampling algorithm, the minimum difference between taxa in character X was set at 14.5 units, equivalent to approximately 0.73 standard deviations of the trait distribution across all 256 species. Mean sample sizes using this criterion were 66.8 species (33.4 pairs), and the mean difference in character X between the species in each pair was 21.4 units. Under the null model, Type I error rates were approximately 0.05 for Rpic, Rpair, Spic, and Spair but, as in other cases, were elevated for Rspec (Table 1). However, for ρ = 0.5, mean values of Rpic and Rpair were substantially elevated (0.64 and 0.69, respectively). Thus, this sampling regime maintained appropriate Type I error rates but resulted in biased estimates of evolutionary trait correlations when the true correlation was not equal to zero.

Random species pairs (RP), drawn from the same clades as the DP samples (and thus with the same sample sizes), exhibited a mean difference in character X of 7.9 units. These samples had appropriate Type I error rates for Rpic.true, Spic, and Spair and slightly elevated values for Rpic.equal and Rpair; all estimates of the evolutionary correlation were unbiased under both the null and correlated-changes models. However, the statistical power of Rpair and Spair was much lower than the corresponding values using DP sampling (0.83 vs. 0.99 for Rpair; 0.49 vs. 0.99 for Spair; Table 1).

correlations are both equal to the true evolutionary correlation describing the underlying patterns of change (Pagel 1993). Thus, as expected, estimates of trait correlations based on random samples of different sizes are also unbiased with respect to the true correlation, using either interspecific or contrast correlations. However, as has been noted in all simulation studies on this problem, the variance of interspecific (i.e., nonphylogenetically adjusted) correlations is much greater than that of independent contrast correlations, so Type I error rates for interspecific correlations are greatly inflated (Martins and Garland 1991; Garland et al. 1993; Purvis et al. 1994; Diaz-Uriarte and Garland 1996). Another aspect of this result, which has not been noted previously, is that Type I error rates for the interspecific correlation increase with sample size (Fig. 2A). This occurs because the standard deviation of Rspec declined only slightly with increasing N (SD = 0.35 for N = 12; SD = 0.21 for N = 256), whereas the critical values for significance testing decline rapidly, so more of the outcomes are judged significant using standard criteria. For independent contrast analyses, the maintenance of appropriate Type I error rates for random samples (using true branch lengths) is also consistent with theoretical expectations. The method of independent contrasts is specifically designed to estimate the independent events along each branch of the phylogeny, and pruning some branches will not alter these estimates for the remaining taxa and their underlying divergences, as long as the correct branch length information is available.
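The argument that Type I error rates for Rspec rise with sample size can be made concrete with a rough calculation. The sketch below is only an illustration and rests on two assumptions that are not part of the original analysis. First, critical values come from the Fisher z approximation rather than exact t-based tables. Second, Rspec under the null model is treated as normally distributed with the standard deviations reported for N = 12 and N = 256. The function names `r_crit` and `rejection_rate` are hypothetical.

```python
import math
from statistics import NormalDist


def r_crit(n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed critical |r| for n points using Fisher's z:
    atanh(r) is roughly Normal(0, 1/sqrt(n - 3)) when the true r is 0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z / math.sqrt(n - 3))


def rejection_rate(n: int, sd_r: float, alpha: float = 0.05) -> float:
    """Share of null outcomes judged significant, ASSUMING R_spec is
    Normal(0, sd_r); the real distribution is bounded and need not be normal."""
    return 2 * (1 - NormalDist(0.0, sd_r).cdf(r_crit(n, alpha)))


# Standard deviations of R_spec under the null model, as reported in the text.
for n, sd in [(12, 0.35), (256, 0.21)]:
    print(f"N = {n:3d}: critical r ~ {r_crit(n):.3f}, "
          f"Type I error ~ {rejection_rate(n, sd):.2f}")
```

Under these assumptions, the critical value falls from about 0.57 to about 0.12, while the spread of Rspec shrinks only from 0.35 to 0.21. As a result, a nominal 5% test rejects roughly 10% of null outcomes at N = 12 and more than half at N = 256.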
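The statement that independent contrasts estimate the independent changes along each branch can be illustrated with a minimal sketch of Felsenstein's contrast algorithm. This is not the simulation code used in this study. It assumes a fully bifurcating tree with known branch lengths. The `Node` class, the helper functions, and the four-taxon trait values are hypothetical. Each internal node contributes one standardized contrast, and the contrast correlation is computed through the origin.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Pair = Tuple[float, float]


@dataclass
class Node:
    """Hypothetical tree node; tips carry (x, y) trait values."""
    length: float                              # branch length above this node
    children: List["Node"] = field(default_factory=list)
    traits: Optional[Pair] = None              # set only on tips


def contrasts(node: Node, out: List[Pair]) -> Tuple[Pair, float]:
    """Post-order pass returning (estimated traits, adjusted branch length)
    and appending one standardized (cx, cy) contrast per internal node."""
    if not node.children:
        return node.traits, node.length
    left, right = node.children                # assumes a fully bifurcating tree
    (xa, ya), va = contrasts(left, out)
    (xb, yb), vb = contrasts(right, out)
    sd = math.sqrt(va + vb)                    # expected SD of the raw difference
    out.append(((xa - xb) / sd, (ya - yb) / sd))
    wa, wb = 1.0 / va, 1.0 / vb                # shorter branches get more weight
    est = ((wa * xa + wb * xb) / (wa + wb), (wa * ya + wb * yb) / (wa + wb))
    # Lengthen this node's branch to reflect uncertainty in its estimated value.
    return est, node.length + va * vb / (va + vb)


def r_pic(cs: List[Pair]) -> float:
    """Contrast correlation computed through the origin."""
    sxy = sum(cx * cy for cx, cy in cs)
    sxx = sum(cx * cx for cx, _ in cs)
    syy = sum(cy * cy for _, cy in cs)
    return sxy / math.sqrt(sxx * syy)


def tip(x: float, y: float, length: float = 1.0) -> Node:
    return Node(length, traits=(x, y))


# ((A:1,B:1):1,(C:1,D:1):1) with made-up trait values.
tree = Node(0.0, [Node(1.0, [tip(1, 2), tip(3, 5)]),
                  Node(1.0, [tip(4, 4), tip(8, 9)])])
cs: List[Pair] = []
contrasts(tree, cs)
print(len(cs), round(r_pic(cs), 3))
```

Four tips yield three contrasts. Because the correlation is forced through the origin, the arbitrary sign of each contrast does not affect the result. If every `length` is set to 1.0, the same code gives an analysis with equal branch lengths, analogous to the equal-branch-length variants discussed in this paper.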