
2002 POINTS OF VIEW 625


Syst. Biol. 51(4):625–637, 2002


DOI: 10.1080/10635150290102302

The Utility of the Incongruence Length Difference Test

F. KEITH BARKER¹ AND FRANÇOIS M. LUTZONI²

¹Department of Ornithology, American Museum of Natural History, Central Park West at 79th Street, New York, New York 10024, USA; E-mail: fbarker@amnh.org
²Department of Biology, Duke University, Box 90338, Durham, North Carolina 27708, USA

Conditional combination of phylogenetic data requires definition of explicit criteria for combinability (Bull et al., 1993). In this context, combinability refers to the methodological validity of combining multiple sources of phylogenetic data, given the underlying assumptions (explicit or otherwise) of the analysis. Combinability has been evaluated by the effect of data set combination on phylogenetic accuracy: Combinable data sets increase accuracy (Bull et al., 1993; Cunningham, 1997b). When inferential methods are statistically consistent, this convergent property is guaranteed by statistical homogeneity of the data sets to be combined: Increasing sample size increases precision. In a phylogenetic context, data homogeneity can be defined as the sharing of a single history (topological pattern of ancestor–descendant relationships among terminals) and uniform probabilities of change among character states (e.g., branch lengths and relative frequencies of character state transformation). Data sets sampling the same phylogenetic history, but with drastically different evolutionary dynamics, could yield biased estimates when combined and analyzed using a model and parameters with a poor fit to at least one of the partitions. For molecular data, these requirements are explicit in the calculation of conditional probabilities based on the maximum-likelihood criterion, where the overall likelihood is the product of individual site likelihoods, under the assumption that site patterns are independent and identically distributed (Felsenstein, 1981). However, likelihood methods allow this
626 SYSTEMATIC BIOLOGY VOL. 51

requirement to be relaxed in various ways, such as by allowing sites to vary in rate (Yang, 1993) or relative probabilities of character-state transformations (Yang, 1996). Given that homogeneity of these transformation probability parameters can be relaxed, the most basic requirement of combinability is topological congruence (e.g., Mason-Gamer and Kellogg, 1996).

Whereas tests of congruence (Huelsenbeck and Bull, 1996) and process homogeneity (e.g., Sullivan, 1996; Yang, 1996) are self-evident (if computationally demanding) within a maximum-likelihood framework, the same has not been true for parsimony. This lack, in conjunction with debate regarding the effects of combining data with differing evolutionary rates in parsimony analysis, has contributed to significant controversy over how data homogeneity, topological congruence, and combinability should be assessed and interpreted (Bull et al., 1993; Kluge and Wolf, 1993; Chippindale and Wiens, 1994; Lutzoni and Vilgalys, 1995; Brower et al., 1996; Mason-Gamer and Kellogg, 1996; Nixon and Carpenter, 1996; Cunningham, 1997a, 1997b; DeSalle and Brower, 1997; Lutzoni, 1997). One of the procedures that has been developed in a parsimony context is the incongruence length difference (ILD) test (Farris et al., 1995a, 1995b). The test is based on the ILD index of Mickevich and Johnson (1976), which measures the proportion of inferred homoplasy attributable to the combination of individual data sets or partitions, which may each require conflicting minimal-length topologies. The index can be defined as $(i_T - i_W)/i_T$, where $i_T$ is the total number of homoplastic character changes required under parsimony on the shortest tree for two or more data sets analyzed simultaneously, and $i_W$ is the sum of homoplastic changes required for each data set on its own minimum length tree (or trees). The ILD test compares the value of this index with a null distribution generated by random permutation of characters among partitions (in practice, only the sum of the tree lengths from separate analyses is calculated and compared with its permuted null distribution). The ILD test was intended to detect the presence of strongly supported character conflict (“hard” incongruence) among individual data sets within a combined analysis (Farris et al., 1995a, 1995b). However, the test has gained wide usage in parsimony analyses both as a test of topological congruence (e.g., testing the clonality of fungi; Koufopanou et al., 1997; Geiser et al., 1998; Carbone et al., 1999) and more generally as a test of combinability (Cunningham, 1997a, 1997b; Swofford, 1998).

Although the test is widely used, a number of authors have noted peculiarities in its behavior that have called into question its validity as a criterion for congruence and combinability. In one of the few studies of the effect of combining data sets with varying significance with the ILD test, Cunningham (1997a) concluded that the ILD test performed best in predicting when data should be combined, compared with the tests of Templeton (1983) and Rodrigo et al. (1993). This conclusion was based on an analysis of the effects of adding together individual partitions in estimation of a phylogenetic hypothesis strongly supported by all the available data (proxy for a “known” phylogeny). However, Cunningham (1997a) suggested that a critical α value of somewhere between 0.01 and 0.001 was a more appropriate criterion for rejection of combinability than the generally accepted 0.05 level, suggesting an excessive type I error rate for the ILD test as a measure of combinability (see also Sullivan, 1996). Graham et al. (1998) obtained significant ILD values when testing for incongruence between sequence data from the chloroplast genome and morphological data in the angiosperm family Pontederiaceae but interpreted this conflict as the result of high levels of homoplasy in the morphological data. To support this contention, they performed the ILD test using their molecular data and random data sets with four equiprobable states, generated using the “Fill random” option in MacClade (Maddison and Maddison, 1999). Despite low structure retention in the 50% bootstrap majority rule consensus for these “random” data sets, all 20 of their replicates were incongruent with the molecular data at α ≤ 0.01. These results suggested that the ILD test might have an excessively high type I error rate as a test of congruence. Specifically, they indicated that this effect might be caused by disparity in levels of homoplasy among data sets. These results and others (e.g., Cunningham, 1997b; Stanger-Hall and

Cunningham, 1998; Yoder et al., 2001) have suggested that the test might be biased or inaccurate as a measure of both combinability and congruence.

Recently, Dolphin et al. (2000) performed a series of data set manipulations that showed conclusively that significance values of the ILD test are related to disparity in levels of homoplasy between two or more data sets. In their investigation, they permuted character states among taxa for increasingly large proportions of perfectly consistent binary data sets. These perturbed data sets were evaluated for congruence with unmanipulated data using the ILD procedure. As the proportion of permuted characters increased, the significance of the ILD likewise increased, although no well-supported structure should have been retained in the permuted data. Dolphin et al. concluded that differing levels of homoplasy between the two data sets per se caused the significant result. Their figure 3 shows the underestimation of homoplasy characteristic of the parsimony method (Archie and Felsenstein, 1993), which underlies the significant ILD values. Darlu and Lecointre (2002) generalized this result to molecular data simulated under a variety of evolutionary conditions. They found significant conflict between data sets evolved on a single tree but with contrasting lineage-specific rates of evolution and patterns of among-site rate variation (simulated using a Γ distribution of rates; Yang, 1993).

These results suggest that the ILD test is not a valid measure of “hard” (well supported) incongruence. However, it remains to be seen whether the ILD test is an appropriate measure of data set combinability. The ILD test could be a good measure of combinability without being an appropriate test of congruence (a necessary but not sufficient condition of combinability). Specifically, the test may combine phylogenetic congruence and uniformity of character transformation probabilities inextricably, as in a test of homogeneity (regarding which, nota bene the current naming of the ILD implementation in PAUP* is the partition homogeneity test; Swofford, 1998). To evaluate the utility of the ILD test, it must be examined not only with regard to the evolutionary conditions that yield significance but with an exploration of the consequences of data set combination as a function of the test’s significance.

Here, we explore the interrelated properties of congruence, homogeneity, and combinability in the context of the ILD test. We explain briefly the underlying statistical difficulty with the ILD test as a measure of congruence and discuss its potential utility as a test of homogeneity among data partitions. The ILD test appears to be an inappropriate measure of congruence and homogeneity under reasonable simulated conditions of molecular evolution. We also assessed the utility of the ILD test as a criterion for data set combinability, as estimated by its predictive value with regard to the effect of data set combination on phylogenetic accuracy.

METHODS

DNA Sequence Data Simulations

Pairs of identical-size data sets for evaluation via the ILD test were generated stochastically according to established models of DNA sequence evolution. Contrasts between data sets in a number of factors (e.g., sample size, average substitution rates, substitution models, and lineage-specific and site-specific rate heterogeneity) are common in molecular data (e.g., Reed and Sperling, 1999; Wilgenbusch and de Quieroz, 2000), and some of these factors are known to affect significance of the ILD (Darlu and Lecointre, 2002). In this study, the only difference between pairs of data sets tested was in evolutionary rate. However, each rate comparison was repeated under a number of evolutionary models (Table 1). All data sets were generated using Seq-Gen 1.1 (Rambaut and Grassly, 1997), which allows a multiplier to be applied to all branches of an input phylogeny (option -s). All unique pairwise comparisons were made between data sets with multipliers of 1 (the base model tree and branch lengths), 5, 10, and 50 (a total of 10 rate comparisons, i.e., 1:1, 1:5, ..., 50:50). These 10 unique rate comparisons were repeated under a variety of simulated conditions of DNA sequence evolution, determined by three main axes: (1) tree shape, (2) base frequency skewness, and (3) transition/transversion bias (see Table 1). Two symmetric (perfectly balanced) base model trees were chosen for testing, one with equal branch lengths set at 0.077 (in expected number of changes per site) and the second with the five internal branches set at 0.012

TABLE 1. Conditions of DNA sequence simulations. Conditions indicated by each row were implemented with the base rates (unscaled branch lengths indicated under Tree shape) and with branch lengths scaled by multipliers of 5, 10, and 50. Within each model, 100 replicates (with 1,000 characters evolved at each rate) of all unique pairwise rate comparisons (1:1, 1:5, ..., 50:50) were evaluated via the ILD procedure (Farris et al., 1995a, 1995b), yielding “error” estimates for a total of 80 simulated conditions.

Tree shape^a      Base frequencies                         Transition/transversion ratio   Model^b
Even              A = C = G = T (even)                     0.5 (unbiased)                  JC69
                  A = C = G = T                            5.0 (biased)                    K2P
                  A = T = 0.125, C = G = 0.375 (skewed)    0.5                             F81
                  A = T = 0.125, C = G = 0.375             5.0                             HKY85
Short internal    A = C = G = T                            0.5                             JC69
                  A = C = G = T                            5.0                             K2P
                  A = T = 0.125, C = G = 0.375             0.5                             F81
                  A = T = 0.125, C = G = 0.375             5.0                             HKY85

a Even = (1:0, (2:0.076923, ((3:0.076923, 4:0.076923):0.076923, ((5:0.076923, 6:0.076923):0.076923, (7:0.076923, 8:0.076923):0.076923):0.076923):0.076923):0.076923); short internal = (1:0, (2:0.117647, ((3:0.117647, 4:0.117647):0.011765, ((5:0.117647, 6:0.117647):0.011765, (7:0.117647, 8:0.117647):0.011765):0.011765):0.011765):0.117647).
b JC69 = Jukes and Cantor, 1969; K2P = Kimura, 1980; F81 = Felsenstein, 1981; HKY85 = Hasegawa et al., 1985.
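
As a consistency check on Table 1, the branch lengths of each base tree can be summed from the Newick strings in footnote a, and the simulation design can be enumerated. The sketch below is illustrative only: the helper name `total_tree_length` is hypothetical, and the regular expression assumes the simple Newick form used in the footnote (a number after every colon).

```python
import re
from itertools import combinations_with_replacement, product

# Newick strings transcribed from footnote a of Table 1.
EVEN = ("(1:0,(2:0.076923,((3:0.076923,4:0.076923):0.076923,"
        "((5:0.076923,6:0.076923):0.076923,(7:0.076923,8:0.076923):0.076923)"
        ":0.076923):0.076923):0.076923);")
SHORT_INTERNAL = ("(1:0,(2:0.117647,((3:0.117647,4:0.117647):0.011765,"
                  "((5:0.117647,6:0.117647):0.011765,(7:0.117647,8:0.117647):0.011765)"
                  ":0.011765):0.011765):0.117647);")


def total_tree_length(newick: str) -> float:
    """Sum every branch length, i.e. the number following each colon."""
    return sum(float(x) for x in re.findall(r":([0-9.]+)", newick))


# Design axes named in the text and in Table 1.
RATES = (1, 5, 10, 50)
MODELS = ("JC69", "K2P", "F81", "HKY85")
TREES = ("even", "short internal")

# Unordered pairs with repetition: 1:1, 1:5, ..., 50:50.
rate_pairs = list(combinations_with_replacement(RATES, 2))
conditions = list(product(TREES, MODELS, rate_pairs))

print(round(total_tree_length(EVEN), 3), round(total_tree_length(SHORT_INTERNAL), 3))
print(len(rate_pairs), len(conditions))
```

Both trees sum to approximately 1 substitution per site, and the design gives 10 rate comparisons and 80 conditions, matching the counts stated in the text.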

and the eight external branches set at 0.118 (see Table 1). The total tree length in both cases was 1.000 (values rounded). Branch lengths generated with the base rate and with multipliers of 5 and 10 represent reasonable levels of comparison frequently encountered in problems of phylogeny estimation using DNA sequence data. The multiplier of 10 yielded data with phylogenetic signal significantly degraded by multiple substitutions (pers. obs.), and a multiplier of 50 yielded data sets that were nearly randomized. The base model of sequence evolution used was that of Jukes and Cantor (1969; JC69), which has a single rate for all nucleotide substitutions and equal representation of all four bases. Two evolutionary parameters were varied from this base model to assess their impact on significance values of the ILD test. The first of these was base frequency skewness, which was imposed by setting base frequencies for G and C to 37.5% and those for A and T to 12.5% (corresponding to the model of Felsenstein, 1981; F81). The second factor was the proportion of transitions to transversions, which was set to 5 to mimic the observed skewness in some data sets (e.g., mitochondrial DNA; corresponding to the model of Kimura, 1980; K2P). These departures were also imposed simultaneously (the model of Hasegawa et al., 1985; HKY85). All 10 rate comparisons were made for each of the four substitution models (JC69, K2P, F81, and HKY85) using both of the model trees (even and short internal), yielding a total of 80 comparisons.

Statistical Evaluation of the ILD

For each of the 80 simulated comparisons, 100 replicate data sets of 1,000 characters per partition were generated (one partition for each of the two rates being compared under a given substitution model and tree). Each of these replicate data sets was analyzed using the ILD procedure as implemented in PAUP* 4.0b8 (Swofford, 1998) using the branch-and-bound search algorithm with 100 permutation replicates to generate the null distribution. The fraction of ILD null replicates greater than the initial value (the “significance” value of the ILD) was recorded for each simulation replicate.

A more extensive analysis of the data sets simulated under the HKY85 model of sequence evolution was conducted. In addition to the ILD significance values, the inferred most-parsimonious tree (or trees, found via the branch-and-bound algorithm) for each of the two partitions separately and the two partitions analyzed simultaneously were recorded. Congruence between these trees and the generating tree was quantified by the normalized consensus fork index (nCFI; Colless, 1980), which has its maximum at complete congruence with the generating tree and its minimum when the inferred tree shares no nodes with the generating tree. Changes in the accuracy of phylogenetic estimation with data combination (ΔnCFI) were estimated by subtracting the nCFI of the low-rate data set (which invariably provided a better estimate of phylogeny under the simulation conditions used here) from the nCFI of

the combined analysis (for single-rate comparisons, the choice of single data set nCFI was arbitrary). Thus, if data combination increased accuracy over the best single data set in terms of the number of correctly inferred nodes, ΔnCFI was positive, and vice versa. Significance of the ILD (using ln-transformed P values; Cunningham, 1997b) was evaluated for its value as a predictor of ΔnCFI via least-squares regression (StatView 5.0.1, SAS Institute). The effect of combination was also quantified discretely as negative (ΔnCFI < 0) versus neutral/positive (ΔnCFI ≥ 0), and the significance of the ILD for these cases (α ≤ 0.05) was noted. These data were subjected to a χ² contingency table analysis to determine whether significance of the ILD successfully predicted negative effects of data set combination.

RESULTS

Significance Values of the ILD as a Function of Model Conditions and Rate Comparison

The results of ILD evaluations of simulated DNA sequence data are summarized in Figure 1. Comparisons of data sets evolved at identical rates yielded very few significant values, even for extremely high rates (data sets evolved with a multiplier of 50 were essentially randomized; uncorrected p distances among sequences were ≈0.75, the random expectation with even base frequencies). Comparisons of data sets with contrasting rates of nucleotide substitution, especially the 1:10, 1:50, 5:10, 5:50, and 10:50 comparisons, demonstrated significance values for the ILD test markedly in excess of 0.05. Rate comparisons based on the model tree with equal branch lengths showed rather abrupt transitions between failure to detect significant differences and complete rejection of the null hypothesis (e.g., rate proportions of 1:10 versus 1:50), especially for comparisons on this tree that included a transition/transversion bias (K2P and HKY85 models).

Increases in external branch lengths at the cost of decreasing internal branch lengths yielded reduced significance levels of the ILD test, in cases where the test yielded significant values with equal branch lengths (Figs. 1A, 1B). However, where significance values were low with equal branch lengths (1:1, 1:5, 5:5, 10:10), the percentage of significant comparisons tended to increase on the unequal branch length tree. Imposition of a transition/transversion bias generally reduced the significance levels of the ILD. When combined with skewed branch lengths, imposition of a transition bias had a reduced effect, although in general it still decreased the proportion of significant values. Overall, skewed base frequency had little effect on observed significance values, the most noticeable being the comparison for the unequal branch length tree with transition/transversion bias, where addition of base frequency skewness (K2P versus HKY85) appeared to have a slight reducing effect on significance levels.

ILD as a Predictor of Phylogenetic Accuracy with Data Combination

Somewhat surprisingly given the highly significant ILD values found for many of these simulated data sets (Fig. 1), phylogenetic accuracy of individual data sets and data sets in combination was extremely high (generally ≥90% of replicates recovered the generating tree, except rate 50 data and extreme rate comparisons under the HKY85 model). Levels of accuracy for most models of sequence evolution (JC69, F81, and K2P) were high enough that there was little variation available for analysis of the ILD test as a predictor of accuracy. For this reason, we focused on analysis of the HKY85 model data (Fig. 2). Even for these data, levels of accuracy were very high for the even-branch-length trees, except for data sets including only characters evolving at the base rates with a multiplier of 50 (Fig. 2A). With 2,000 characters, even the rate 50 data yielded the correct tree in 2 of 100 replicates, indicating that not quite all phylogenetic signal was eliminated.

Phylogenetic accuracy of individual and combined data sets was severely compromised for characters evolved on the short-internal-branches tree (Fig. 2B). Even the base rate data failed to recover the correct tree in 14.5% of the replicates with 1,000 characters, although doubling the data set size increased accuracy to 100% (see 1:1 combined data set, Fig. 2B). The generating tree was never recovered from rate 50 data with this tree shape. In general, increasing substitution rates decreased accuracy for individual data sets and for

FIGURE 1. Percentage of replicate simulated DNA sequence data set comparisons for which the ILD test returned P ≤ 0.05 (significance). (A) Results of simulations on trees with equal branch lengths. (B) Results of simulations on trees with short internal branches. See Table 1 for parameters used in each simulation. The ratios under each bar indicate the rate comparisons being reported (e.g., 5:10 indicates data sets simulated on the base tree with branch length multipliers of 5 and 10, respectively, were being compared).
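
Each bar in Figure 1 is the percentage of replicates whose ILD P value fell at or below 0.05. Because both partitions in every comparison were simulated on the same tree, this percentage estimates the type I error rate of the test as a test of congruence. The sketch below shows the tally, assuming a hypothetical helper `percent_significant`; the P values are made up for illustration and are not data from the study.

```python
def percent_significant(p_values, alpha=0.05):
    """Percentage of replicates whose ILD P value is <= alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    hits = sum(1 for p in p_values if p <= alpha)
    return 100.0 * hits / len(p_values)


# Hypothetical ILD P values for ten replicates of one rate comparison
# (illustrative only; NOT results from the simulations).
example_p_values = [0.01, 0.03, 0.20, 0.50, 0.05, 0.74, 0.01, 0.33, 0.02, 0.90]

print(percent_significant(example_p_values))
```

With a well-calibrated test and congruent data, this percentage should be close to 100 times alpha; values far above that indicate an inflated type I error rate.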

FIGURE 2. Phylogenetic accuracy of data sets evolved under the HKY85 model of sequence evolution with (A)
equal branch lengths or (B) short internal branches (see Table 1). Boxplots indicate the distribution of accuracy of
parsimony trees compared with the tree used to generate the data sets (as measured by the normalized consensus
fork index, nCFI; Colless, 1980). The center horizontal line of each box indicates the median accuracy of a given
data set composition, the lower and upper boundaries of each box indicate the 25th and 75th percentiles of data
set accuracy, the whiskers outside each box extend to the 10th and 90th percentiles, and individual values in the
lowest and highest 10% of the distribution are plotted as circles (one or more of these features may coincide if the
corresponding percentiles overlap; e.g., data sets with 100% accuracy are represented by a single horizontal line).
Accuracy of single data sets (1,000 bases, left) was estimated from 1,000 replicate data sets, and that of combined
data sets (2,000 bases, right) was estimated from 100 replicates. The values at the top of the box plots indicate the
percentage of these replicates that recovered the tree used to generate the data sets.
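
The accuracy measure plotted in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are represented as sets of clades (each clade a frozenset of taxon labels, rooted on taxon 1), and the index is normalized by the number of nontrivial clades in the generating tree. The function names and the example "combined" tree are hypothetical.

```python
def clades(*groups):
    """Build a set of clades from tuples of taxon labels."""
    return {frozenset(g) for g in groups}


def ncfi(inferred_clades, generating_clades):
    """Fraction of the generating tree's clades recovered by the inferred tree.

    Equals 1.0 at complete congruence and 0.0 when no clades are shared.
    """
    generating = set(generating_clades)
    if not generating:
        raise ValueError("generating tree has no nontrivial clades")
    return len(generating & set(inferred_clades)) / len(generating)


def delta_ncfi(ncfi_combined, ncfi_low_rate):
    """Change in accuracy on combination: positive means combining helped."""
    return ncfi_combined - ncfi_low_rate


# The five internal clades of the eight-taxon model tree in Table 1.
GENERATING = clades((3, 4), (5, 6), (7, 8), (5, 6, 7, 8), (3, 4, 5, 6, 7, 8))

# Hypothetical inferred trees: the low-rate partition recovers every clade,
# while the combined analysis misplaces one (groups 3-6 instead of 5-8).
low_rate = GENERATING
combined = clades((3, 4), (5, 6), (7, 8), (3, 4, 5, 6), (3, 4, 5, 6, 7, 8))

print(ncfi(low_rate, GENERATING), ncfi(combined, GENERATING))
print(delta_ncfi(ncfi(combined, GENERATING), ncfi(low_rate, GENERATING)))
```

In this toy case the combined analysis loses one of five clades, so ΔnCFI is negative, which corresponds to the "decreased accuracy" category used in the contingency analysis.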

combined single-rate data sets. Combining high-rate data sets with low-rate data sets generally decreased phylogenetic accuracy relative to averages for the lower rate data alone. The only exceptions to this trend were the combination of rate 1 and rate 5 data, which only slightly decreased accuracy of inference over rate 1 data alone, and the combination of rate 5 and rate 10 data, which significantly improved accuracy over rate 5 data (Fig. 2B).

Variance around this general pattern of reduced accuracy with combination of high- and low-rate data was examined for evidence of the predictive value of the ILD.

TABLE 2. The qualitative relationship between ILD significance and the effect of data set combination on phylogenetic accuracy. For the HKY85 model of evolution on the short-internal-branches tree (see Table 1), the number of replicates with significant (S) and nonsignificant (NS) ILD values (at three significance levels) is split into replicates that show decreased and increased accuracy upon data set combination. All values for the χ² contingency test are significant at the P < 0.01 level (df = 1).

                         ILD significance
              α = 0.01       α = 0.05       α = 0.10
Accuracy      S     NS       S     NS       S     NS
Decreased     29    275      75    229      114   190
Increased     13    683      63    633      106   590
χ²            30.9           43.4           61.1
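
The χ² values in Table 2 can be checked directly from the cell counts. The sketch below assumes the statistic is the ordinary Pearson χ² for a 2 × 2 table without continuity correction; the source does not say which variant was used, and the function name is hypothetical.

```python
def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Cell counts from Table 2: (decreased S, decreased NS, increased S, increased NS).
TABLE_2 = {
    0.01: (29, 275, 13, 683),
    0.05: (75, 229, 63, 633),
    0.10: (114, 190, 106, 590),
}

# Values as reported in the bottom row of Table 2.
REPORTED = {0.01: 30.9, 0.05: 43.4, 0.10: 61.1}

for alpha, cells in TABLE_2.items():
    print(alpha, round(pearson_chi2_2x2(*cells), 2), REPORTED[alpha])
```

Under this assumption the computed statistics fall within about 0.1 of the reported values, and all three far exceed 6.63, the critical value for P = 0.01 with one degree of freedom.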
Specifically, we asked whether significance of the ILD was a good predictor of the effect of combining two data sets on the accuracy of the combined estimate. We measured this effect relative to the better of the two separate estimates (invariably the lower rate data set under the conditions simulated here). For the HKY85 model of evolution, on the short-internal-branch tree, the ILD P value was a significant predictor of the effect of data combination on relative accuracy, as evaluated by simple regression (Fig. 3). Thus, decreasing significance for the ILD (higher ILD P values) is related to overall increases in phylogenetic accuracy of the combined data estimate. We also examined this trend qualitatively using contingency table analysis. This analysis indicated that data set combination for replicates with significant ILD values resulted in reduced accuracy at a much higher frequency than did combination for replicates with nonsignificant values across a range of significance levels (Table 2). Although ILD significance was a significant predictor of the effect of data set combination on accuracy, the amount of variation in this effect explained by the ILD P value was extremely small (coefficient of determination, r² = 0.11).
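
To make the regression result concrete, the fitted line reported in the caption of Figure 3, ΔnCFI = 0.081 + 0.082 ln(ILD P value), can be evaluated at a few P values. The sketch below only applies that published equation; the function names are hypothetical, and with r² = 0.11 any single prediction is very imprecise.

```python
import math

INTERCEPT = 0.081  # from the regression equation in the Figure 3 caption
SLOPE = 0.082      # coefficient on ln(ILD P value), same source


def predicted_delta_ncfi(ild_p):
    """Predicted change in accuracy on combination for a given ILD P value."""
    if not 0.0 < ild_p <= 1.0:
        raise ValueError("ILD P value must be in (0, 1]")
    return INTERCEPT + SLOPE * math.log(ild_p)


def break_even_p():
    """ILD P value at which the fitted line predicts no change in accuracy."""
    return math.exp(-INTERCEPT / SLOPE)


for p in (0.01, 0.05, 0.5, 1.0):
    print(p, round(predicted_delta_ncfi(p), 3))
print(round(break_even_p(), 2))
```

The fitted line predicts a loss of accuracy for ILD P values below roughly 0.37, well above the conventional 0.05 threshold, but the low r² means this should be read as a weak trend rather than a usable decision rule.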

FIGURE 3. Least-squares regression of the effect of data set combination on phylogenetic accuracy (ΔnCFI) as predicted by significance levels of the ILD (ln-transformed ILD P values; includes data generated using the HKY85 model with the short-internal-branch-length tree). The equation of the regression line is ΔnCFI = 0.081 + 0.082[ln(ILD P value)]; r² = 0.11, P(regression) < 0.01.

DISCUSSION

ILD Significance as an Indicator of Topological Congruence

In agreement with previous results (Cunningham, 1997a; Graham et al., 1998; Dolphin et al., 2000; Yoder et al., 2001; Darlu and Lecointre, 2002), the simulations presented here strongly support the contention that the ILD procedure is, under certain conditions, biased as a test of congruence, that is, in terms of shared phylogenetic history. The proportion of individual ILD significance values ≤ α (α = 0.05) in our simulations indicates the type I error rate of the ILD as a test of topological congruence at that value of α (the probability of rejecting congruence given that congruence is true, which is the case because the data sets were generated from the same tree). In most cases simulated here, this proportion far exceeded the target value of

0.05. All of the factors varied, including transition/transversion ratio bias, base composition, and internal:external branch length proportions, appear to have their main effects by influencing the amount of structure in the more structured data set. The higher the contrast in degree of structure between the two data sets, the stronger the effect on significance levels of the ILD. Imposing a transition bias or base composition skewness or reducing the length of the internal branches on the model tree has the effect of reducing the level of contrast in strength of phylogenetic signal (as quantified by the consistency and retention indices) between individual data sets, especially in the most extreme cases (e.g., rate < 50 versus rate 50 comparisons).

The results presented here expand upon previous conclusions regarding the behavior of the ILD drawn from studies of randomized data. Dolphin et al. (2000) previously argued that significance of the ILD in comparisons of randomized and structured data was due to the nonlinear relationship between increasing homoplasy levels and parsimony estimates of tree length. This consistent underestimation of character change inherent to parsimony procedures is well documented and has fueled continuing debates over the appropriateness of various measures of homoplasy and phylogenetic signal (reviewed by Archie, 1996). Specifically, Archie and Felsenstein (1993) noted that the length of shortest trees for random data is usually substantially lower than that of random trees. In the context of the ILD, combined analysis of random and structured data will result in higher estimates of character change for the randomized characters than would be obtained on minimum-length trees generated using those characters alone. If the structured characters dominate in producing trees for the null replicates of the test, the mode of the null distribution will be shifted up a number of steps depending on the degree to which parsimony underestimates amounts of character change for the randomized characters alone. Consequently, comparison of the initial summed tree lengths (with changes in the randomized data significantly underestimated in the separate analysis) with the null will yield a conclusion of significance. The current results (as well as other simulation data; Darlu and Lecointre, 2002) indicate that this effect can be significant for data that contain phylogenetic information (nonrandomized data) but that exhibit varying levels of homoplasy and structure because of varying rates and patterns of evolution. Data that share a single history can, because of differences in evolutionary dynamics, exhibit significant incongruence as measured by the ILD test.

This difficulty with the application of the ILD test as a criterion of topological congruence (a measure of shared phylogenetic history) could prompt at least two responses, that is, retention of the ILD as a criterion under certain conditions or in some modified form, or rejection of the ILD as a criterion (and possibly its resurrection in some other role). Regarding the former option, delineation of conditions under which the test might be biased offers one potential remedy. A full definition of the parameter space within which the ILD test might be a statistically valid test of congruence is beyond the scope of this study. A number of factors other than evolutionary rates and patterns may affect the performance of the test, such as resolving power, sample size, and the number of character states available (Lutzoni, 1997). We have performed preliminary tests of the effects of resolving power (testing data sets against jackknifed subsets) and sample size (using independent data sets of differing sizes). Neither factor per se appears to be significant; however, both should have an impact to the degree that they affect the probability of recovering the generating tree in null model replicates when disparity in levels of homoplasy exists (predicting increased bias of the test in comparisons of large, relatively structured data sets with small, relatively unstructured data sets). Additionally, disparity in the number of available character states between data sets evolving at similar rates will result in different levels of homoplasy at sufficiently high rates of change.

Regarding the conditions tested in our study, examination of homoplasy indices for the data sets used in these simulations provides some useful information. Figure 4 summarizes consistency index (CI) and retention index (RI) values for the simulated DNA data sets. CI values are conspicuously high, even for essentially randomized data (rate 50), because of the small number of taxa in each data set. Graphically, the RI values appear more useful in discriminating among
634 SYSTEMATIC BIOLOGY VOL. 51

FIGURE 4. (A) Ensemble consistency index (CI) and (B) retention index (RI) for the simulated DNA sequence
data sets. Error bars indicate the 95% confidence interval of each value. Reported values are for individual simulated
data sets from associated shortest trees as found by the branch-and-bound search algorithm of PAUP* (Swofford,
1998).
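
As a hedged illustration of the two homoplasy indices summarized in Figure 4, the sketch below computes ensemble CI and RI from per-character step counts using the standard definitions, CI = M/S and RI = (G - S)/(G - M), where M, S, and G are the summed minimum, observed, and maximum possible numbers of steps. The function names, the toy alignment, and the observed step counts are assumptions made for illustration only; they are not the authors' code or data, and in practice the observed steps would come from a parsimony program such as PAUP*.

```python
from collections import Counter


def min_max_steps(column):
    """Minimum and maximum possible steps for one unordered character.

    column: sequence of states, one per taxon (e.g. "AACG").
    """
    counts = Counter(column)
    m = len(counts) - 1                      # one change per additional state
    g = len(column) - max(counts.values())   # every non-majority taxon changes
    return m, g


def ensemble_indices(columns, observed_steps):
    """Return (CI, RI) summed over all characters.

    observed_steps: steps required by each character on the tree of interest.
    """
    bounds = [min_max_steps(col) for col in columns]
    M = sum(m for m, _ in bounds)
    G = sum(g for _, g in bounds)
    S = sum(observed_steps)
    ci = M / S if S else float("nan")
    ri = (G - S) / (G - M) if G != M else float("nan")
    return ci, ri


# Toy data (hypothetical): three characters scored for four taxa,
# with made-up observed step counts on some tree.
columns = ["AACC", "AAAC", "ACGT"]
ci, ri = ensemble_indices(columns, observed_steps=[2, 1, 3])
print(f"CI = {ci:.3f}, RI = {ri:.3f}")
```

With only a handful of taxa, M and G lie close together, so CI has little room to fall; this may be one way to see why CI stays high for small data sets while RI, which rescales by G - M, separates them more clearly.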

data sets, although the overall pattern is essentially the same as that displayed by the CI. For those cases where the ILD test absolutely rejects the null hypothesis (all comparisons with rate 50), the RI of 0.25 indicates the essentially random nature of the high-rate data set. Basically, comparison of RI = 0.25 DNA data with any more structured data (RI > 0.25) yielded significant ILD values.
In contrast, when the model tree is less easily estimated (i.e., short internal branch

lengths), RI values for the lower rate data (specifically 5 and 10) drop to levels similar to those for the rate 50 data (RI = 0.25, Fig. 4B), and the ILD test yields less significant values. An intermediate case is that of the 5:10 rate comparison under a JC69 model (Fig. 4B). In this case, RI values are approximately 0.45 for the rate 5 data and 0.30 for the rate 10 data; comparison of these data sets indicates significant (40% type I error rate, Fig. 1A) bias in the ILD on the null hypothesis of congruence. Although there is a general trend in contrasting RI values being associated with significant values of the ILD, even very small contrasts in RI may still be associated with significance. For example, the 5:50 rate comparison under the K2P model with unequal branch lengths yields significant values of the test in 30% (Fig. 1B) of the simulated replicates, but examination of data set RI values indicates that they are essentially identical (RI ≈ 0.25, Fig. 4B). For this reason, it may be more appropriate to seek alternatives to the ILD as a criterion of congruence or to create modifications of the ILD that would make it a valid criterion of congruence.

ILD as an Indicator of Homogeneity

Although one option for dealing with the behavior of the ILD test would be to abandon it as a criterion of topological congruence (i.e., shared phylogenetic history), this begs the question of what exactly the ILD is measuring. One candidate interpretation of the ILD test is as a measure of homogeneity among data partitions. With this interpretation, the proportion of ILD significance values ≤ α in our simulations is an indication of the test’s statistical power to detect heterogeneity (β; probability of rejecting homogeneity given that homogeneity is false). Under this interpretation, the ILD test seems to fare poorly. In Figure 1B, for the HKY85 model of evolution with the short-internal-branches tree and the most extreme rate comparison (1:50), only ~50% of ILD replicates reject homogeneity at the α = 0.05 level.
To place this value in an appropriate context, we used a maximum-likelihood approach (Yang, 1996) to detect among-partition rate heterogeneity in a combined data set generated under the same model of evolution (HKY85 with the short-internal-branch tree) but with the smallest relative difference in rates (5:10) and the highest ILD P value of all replicates generated under this model (P = 1.00). A likelihood ratio test comparing the fit of a single-rate model with that of the two-rate model to these data under the HKY85 model of evolution (using the generating tree) was highly significant (−2 ln ℓ = 268.96, df = 1, P < 0.001; calculated using PAML 3.0c; Yang, 2000). Even under the simplest model of DNA substitution (JC69; a poor fit to these data), this rate difference was easily detectable (−2 ln ℓ = 107.97, df = 1, P < 0.001). Thus, in a case with the smallest contrast in rates between partitions simulated here and the largest number of factors that might obscure this contrast (base composition skewness and transition bias), the maximum-likelihood method was easily able to detect the difference. The ILD test consistently indicated heterogeneity in only the cases of greatest contrast (e.g., comparisons with rate 50 data). Thus, if the ILD test were a measure of rate homogeneity, it is an extremely inefficient one relative to other methods currently available for analysis of molecular data. In addition, the results of Darlu and Lecointre (2002) indicate that the test has little power to detect other types of heterogeneity, such as differences in lineage-specific and site-specific rate heterogeneity.

ILD as a Criterion of Combinability

Although data set homogeneity guarantees increasing phylogenetic accuracy with data set combination when analytical methods are statistically consistent, combining heterogeneous data can also increase accuracy, even if the analysis does not explicitly incorporate that heterogeneity. For example, Figure 2B indicates that the combination of rate 5 and rate 10 data significantly increased the average accuracy of phylogenetic estimation using parsimony. Others have argued that varying levels of homoplasy in different data sets might contribute to an overall robust signal (Barrett et al., 1991; Nixon and Carpenter, 1996; Vidal and Lecointre, 1998; Wenzel and Siddall, 1999). Although it may be advantageous to combine heterogeneous data, ideally some criterion should be used to indicate whether or not data combination is desirable in individual cases.
To evaluate the ILD as a criterion for combinability, we analyzed changes in

phylogenetic accuracy accompanying data set combination as a function of ILD P values. Increasing ILD P values (i.e., decreasing significance levels) were correlated with improvements in phylogenetic accuracy with data set combination (Table 2). However, the relationship was extremely weak (Fig. 3), and the amount of variance in improvement explained was generally small (~10%). We also examined this question from a less stringent point of view, asking whether or not significant ILD values were more likely to be associated with decreases in accuracy (and conversely whether nonsignificant ILD values were generally associated with, at worst, neutral effects of data set combination). Our χ² analysis of the HKY85 data on the short-internal-branches tree indicates that this trend exists and is significant (Table 2). However, nearly 30% of combined data sets with nonsignificant ILD values still had reduced accuracy relative to the better of the two separate analyses, and nearly half of the combined data sets with significant ILD values showed increased accuracy. In sum, the ILD appears to be a relatively poor indicator of data set combinability with the criterion of phylogenetic accuracy and should not be used for this purpose even when using low critical α values between 0.01 and 0.001 (see Sullivan, 1996; Cunningham, 1997a).

CONCLUSIONS

We have briefly reviewed the three related concepts of topological congruence, homogeneity among data partitions, and combinability specifically with regard to the utility of the ILD test in decisions regarding phylogenetic data analysis. Our simulation study supports previous studies in rejecting the ILD test as an unbiased measure of phylogenetic congruence (Graham et al., 1998; Dolphin et al., 2000; Darlu and Lecointre, 2002). The observed bias occurs under a biologically realistic range of parameters and cannot be easily predicted from observed levels of homoplasy. Our results further indicate that the ILD test has relatively little statistical power to detect substitution rate heterogeneity, especially relative to available alternative methods. Although significance values of the ILD broadly predict the effect of data set combination on phylogenetic accuracy, a great deal of variation in this effect is left unexplained and decisions regarding data combination based on the ILD would be misleading in a large proportion of cases. Beyond the realm of combinability testing per se, the ILD has been used as a criterion for model choice in combined data analysis (e.g., Giribet et al., 2001), but recent results suggest that even this use may be problematic (Dowton and Austin, 2002). The precise utility and appropriate uses of the ILD test remain to be established.

ACKNOWLEDGMENTS

For comments on various versions of the manuscript, we thank Cliff Cunningham, Chris Simon, and two anonymous reviewers. This research was partially supported by a grant from the National Science Foundation, USA, Systematic Biology (DEB-9615542) to F.M.L.

REFERENCES

ARCHIE, J. W. 1996. Measures of homoplasy. Pages 153–188 in Homoplasy: The recurrence of similarity in evolution (M. J. Sanderson and L. Hufford, eds.). Academic Press, San Diego, CA.
ARCHIE, J. W., AND J. FELSENSTEIN. 1993. The number of evolutionary steps on random and minimum length trees for random evolutionary data. Theor. Popul. Biol. 43:52–79.
BARRETT, M., M. J. DONOGHUE, AND E. SOBER. 1991. Against consensus. Syst. Zool. 40:486–493.
BROWER, A. V. Z., R. DESALLE, AND A. VOGLER. 1996. Gene trees, species trees, and systematics: A cladistic perspective. Annu. Rev. Ecol. Syst. 27:423–450.
BULL, J. J., J. P. HUELSENBECK, C. W. CUNNINGHAM, D. L. SWOFFORD, AND P. J. WADDELL. 1993. Partitioning and combining data in phylogenetic analysis. Syst. Biol. 42:384–397.
CARBONE, I., J. B. ANDERSON, AND L. M. KOHN. 1999. Patterns of descent in clonal lineages and their multilocus fingerprints are resolved with combined gene genealogies. Evolution 53:11–21.
CHIPPINDALE, P. T., AND J. J. WIENS. 1994. Weighting, partitioning, and combining characters in phylogenetic analysis. Syst. Biol. 43:278–287.
COLLESS, D. H. 1980. Congruence between morphometric and allozyme data for Menidia species: A reappraisal. Syst. Zool. 29:288–299.
CUNNINGHAM, C. W. 1997a. Can three incongruence tests predict when data should be combined? Mol. Biol. Evol. 14:733–740.
CUNNINGHAM, C. W. 1997b. Is congruence between data partitions a reliable predictor of phylogenetic accuracy? Empirically testing an iterative procedure for choosing among phylogenetic methods. Syst. Biol. 46:464–478.
DARLU, P., AND G. LECOINTRE. 2002. When does the incongruence length difference test fail? Mol. Biol. Evol. 19:432–437.

DE SALLE, R., AND A. V. Z. BROWER. 1997. Process partitions, congruence, and the independence of characters: Inferring relationships among closely related Hawaiian Drosophila from multiple gene regions. Syst. Biol. 46:751–764.
DOLPHIN, K., R. BELSHAW, C. D. L. ORME, AND D. L. J. QUICKE. 2000. Noise and incongruence: Interpreting results of the incongruence length difference test. Mol. Phylogenet. Evol. 17:401–406.
DOWTON, M., AND A. D. AUSTIN. 2002. Increased incongruence does not necessarily indicate increased phylogenetic accuracy—The behavior of the ILD test in mixed-model analyses. Syst. Biol. 51:19–31.
FARRIS, J. S., M. KÄLLERSJÖ, A. G. KLUGE, AND C. BULT. 1995a. Constructing a significance test for incongruence. Syst. Biol. 44:570–572.
FARRIS, J. S., M. KÄLLERSJÖ, A. G. KLUGE, AND C. BULT. 1995b. Testing significance of incongruence. Cladistics 10:315–319.
FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17:368–376.
GEISER, D. M., J. I. PITT, AND J. W. TAYLOR. 1998. Cryptic speciation and recombination in the aflatoxin-producing fungus Aspergillus flavus. Proc. Natl. Acad. Sci. USA 95:388–393.
GIRIBET, G., G. D. EDGECOMBE, AND W. C. WHEELER. 2001. Arthropod phylogeny based on eight molecular loci and morphology. Nature 413:157–161.
GRAHAM, S. W., J. R. KOHN, B. R. MORTON, J. E. ECKENWALDER, AND S. C. H. BARRETT. 1998. Phylogenetic congruence and discordance among one morphological and three molecular data sets from Pontederiaceae. Syst. Biol. 47:545–567.
HASEGAWA, M., H. KISHINO, AND T. YANO. 1985. Dating the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
HUELSENBECK, J. P., AND J. J. BULL. 1996. A likelihood ratio test to detect conflicting phylogenetic signal. Syst. Biol. 45:92–98.
JUKES, T. H., AND C. R. CANTOR. 1969. Evolution of protein molecules. Pages 21–132 in Mammalian protein metabolism (H. N. Munro, ed.). Academic Press, New York.
KIMURA, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
KLUGE, A. G., AND A. J. WOLF. 1993. Cladistics: What’s in a word? Cladistics 9:183–199.
KOUFOPANOU, V., A. BURT, AND J. W. TAYLOR. 1997. Concordance of gene genealogies reveals reproductive isolation in the pathogenic fungus Coccidioides immitis. Proc. Natl. Acad. Sci. USA 94:5478–5482.
LUTZONI, F. M. 1997. Phylogeny of lichen- and non-lichen-forming omphalinoid mushrooms and the utility of testing for combinability among multiple data sets. Syst. Biol. 46:373–406.
LUTZONI, F. M., AND R. VILGALYS. 1995. Integration of morphological and molecular data sets in estimating fungal phylogenies. Can. J. Bot. 73:S649–S659.
MADDISON, W. P., AND D. R. MADDISON. 1999. MacClade: Analysis of phylogeny and character evolution, version 3.08. Sinauer, Sunderland, Massachusetts.
MASON-GAMER, R. J., AND E. A. KELLOGG. 1996. Testing for phylogenetic conflict among molecular data sets in the tribe Triticeae (Gramineae). Syst. Biol. 45:524–545.
MICKEVICH, M. F., AND M. S. JOHNSON. 1976. Congruence between morphological and allozyme data in evolutionary inference and character evolution. Syst. Zool. 25:260–270.
NIXON, K. C., AND J. M. CARPENTER. 1996. On simultaneous analysis. Cladistics 12:221–241.
RAMBAUT, A., AND N. C. GRASSLY. 1997. Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:235–238.
REED, R. D., AND F. A. H. SPERLING. 1999. Interaction of process partitions in phylogenetic analysis: An example from the swallowtail butterfly genus Papilio. Mol. Biol. Evol. 16:286–297.
RODRIGO, A. G., M. KELLY-BORGES, P. R. BERGQUIST, AND P. L. BERGQUIST. 1993. A randomisation test of the null hypothesis that two cladograms are sample estimates of a parametric phylogenetic tree. N.Z. J. Bot. 31:257–268.
STANGER-HALL, I. K., AND C. W. CUNNINGHAM. 1998. Support for a monophyletic Lemuriformes: Overcoming incongruence between data partitions. Mol. Biol. Evol. 15:1572–1577.
SULLIVAN, J. 1996. Combining data with different distributions of among-site variation. Syst. Biol. 45:375–380.
SWOFFORD, D. L. 1998. PAUP*: Phylogenetic analysis using parsimony (* and other methods), version 4.0. Sinauer, Sunderland, Massachusetts.
TEMPLETON, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the humans and apes. Evolution 37:221–244.
VIDAL, N., AND G. LECOINTRE. 1998. Weighting and congruence: A case study based on three mitochondrial genes in vipers. Mol. Phylogenet. Evol. 9:366–374.
WENZEL, J. W., AND M. E. SIDDALL. 1999. Noise. Cladistics 15:51–64.
WILGENBUSCH, J., AND K. DE QUIEROZ. 2000. Phylogenetic relationships among the phrynosomatid sand lizards inferred from mitochondrial DNA sequences generated by heterogeneous evolutionary processes. Syst. Biol. 49:592–612.
YANG, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol. Biol. Evol. 10:1396–1401.
YANG, Z. 1996. Maximum-likelihood models for combined analyses of multiple sequence data. J. Mol. Evol. 42:587–596.
YANG, Z. 2000. Phylogenetic analysis by maximum likelihood (PAML), version 3.0c. Univ. College, London.
YODER, A. D., J. A. IRWIN, AND B. A. PAYSEUR. 2001. Failure of the ILD to determine data combinability for slow loris phylogeny. Syst. Biol. 50:408–424.

First submitted 25 September 2000; reviews returned 23 January 2001; final acceptance 9 April 2002
Associate Editor: Richard Olmstead
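
The likelihood ratio tests reported in the section on the ILD as an indicator of homogeneity give a statistic with 1 degree of freedom and P < 0.001. As a rough check on how such a statistic maps to a P value, the sketch below assumes the usual asymptotic chi-square approximation and uses the identity that, for 1 degree of freedom, the upper-tail probability equals erfc(sqrt(x/2)). The function names and the log-likelihood values in the usage line are hypothetical and chosen only for illustration; the two statistics (268.96 and 107.97) are the values quoted in the text. This is not the PAML calculation itself.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """-2 ln(likelihood ratio) for two nested models, from their log-likelihoods."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Statistics quoted in the text (single-rate vs. two-rate model, df = 1).
for label, stat in [("HKY85", 268.96), ("JC69", 107.97)]:
    p = chi2_sf_df1(stat)
    print(f"{label}: statistic = {stat}, df = 1, P = {p:.3g}")

# Hypothetical log-likelihoods, picked only so that their difference
# reproduces the first statistic; they are not values from the study.
example = lrt_statistic(lnl_null=-5134.48, lnl_alt=-5000.00)
```

Both statistics fall far beyond the 1-df critical value of about 3.84 at the 0.05 level, which agrees with the reported P < 0.001. The erfc shortcut applies only to 1 degree of freedom; other cases would need a general chi-square routine.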
