Journal of Applied Ecology 1998, 35, 523-531
© 1998 British Ecological Society

A taxonomic distinctness index and its statistical properties

K.R. CLARKE and R.M. WARWICK
Centre for Coastal and Marine Sciences, Plymouth Marine Laboratory, Prospect Place, West Hoe, Plymouth PL1 3DH, UK

Summary

1. For biological community data (species-by-sample abundance matrices), Warwick & Clarke (1995) defined two biodiversity indices, capturing the structure not only of the distribution of abundances amongst species but also the taxonomic relatedness of the species in each sample. The first index, taxonomic diversity ($\Delta$), can be thought of as the average taxonomic 'distance' between any two organisms, chosen at random from the sample: this distance can be visualized simply as the length of the path connecting these two organisms, traced through (say) a Linnean or phylogenetic classification of the full set of species involved. The second index, taxonomic distinctness ($\Delta^*$), is the average path length between any two randomly chosen individuals, conditional on them being from different species. This is equivalent to dividing taxonomic diversity, $\Delta$, by the value it would take were there to be no taxonomic hierarchy (all species belonging to the same genus). $\Delta^*$ can therefore be seen as a measure of pure taxonomic relatedness, whereas $\Delta$ mixes taxonomic relatedness with the evenness properties of the abundance distribution.

2. This paper explores the statistical sampling properties of $\Delta$ and $\Delta^*$. Taxonomic diversity is seen to be a natural extension of a form of Simpson's index, incorporating taxonomic (or phylogenetic) information. Importantly for practical comparisons, both $\Delta$ and $\Delta^*$ are shown not to be dependent, on average, on the degree of sampling effort involved in the data collection: this is in sharp contrast with those diversity measures that are strongly influenced by the number of observed species.

3.
The special case where the data consist only of presence/absence information is dealt with in detail: $\Delta$ and $\Delta^*$ converge to the same statistic ($\Delta^+$), which is now defined as the average taxonomic path length between any two randomly chosen species. Its lack of dependence, in mean value, on sampling effort implies that $\Delta^+$ can be compared across studies with differing and uncontrolled degrees of sampling effort (subject to assumptions concerning comparable taxonomic accuracy). This may be of particular significance for historic (diffusely collected) species lists from different localities or regions, which at first sight may seem unamenable to valid diversity comparison of any sort.

4. Furthermore, a randomization test is possible, to detect a difference in the taxonomic distinctness, for any observed set of species, from the 'expected' $\Delta^+$ value derived from a 'master' species list for the relevant group of organisms. The exact randomization procedure requires heavy computation, and an approximation is developed, by deriving an appropriate variance formula. This leads to a 'confidence funnel' against which distinctness values for any specific area, pollution condition, habitat type, etc., can be checked, and formally addresses the question of whether a putatively impacted locality has a 'lower than expected' taxonomic spread. The procedure is illustrated for the UK species list of free-living marine nematodes and sets of samples from intertidal sites in two localities, the Exe estuary and the Firth of Clyde.

Key-words: biodiversity, randomization test, sampling effort, unbiasedness, variance estimate.

Journal of Applied Ecology (1998) 35, 523-531

Correspondence: Dr K. R. Clarke (fax: 01752 633101; e-mail: k.clarke@pml.ac.uk).

Introduction

It is increasingly recognized (e.g.
Harper & Hawksworth 1994) that adequate measures of biodiversity within a particular taxonomic group should not be merely functions of the number of species present and their relative abundances, but should also include information on the 'relatedness' of these species. There is now a substantial literature (Faith 1994; Humphries, Williams & Vane-Wright 1995 and references therein) on measures incorporating, principally, phylogenetic relationships amongst species and their possible use in selecting species or reserves of greatest conservation priority. Vane-Wright, Humphries & Williams (1991), Williams, Humphries & Vane-Wright (1991) and May (1990) introduced measures of distinctness based only on the topology of a phylogenetic tree, appropriate when branch lengths are entirely unknown, and Faith (1992, 1994) defined and justified a phylogenetic diversity (PD) measure based on known branch lengths: PD is simply the cumulative branch length of the full tree.

This literature does not appear, to date, to have carried over into the area of environmental monitoring and assessment, where the emphasis is not on choosing species to conserve but monitoring for environmental degradation or the benefits of remediation. The considerations here are rather different: the raw material is often a set of community samples with recorded abundances for each observed species, rather than a single species list, thought of as a complete inventory. The outcome required is not a preferential selection of species from the inventory for conservation status, but an assessment of whether sampled assemblages display some pattern in biodiversity through time or in space. Natural variation, and thus sampling properties of the resulting abundance matrices and derived indices, are of paramount importance. Also, the basic information on species relatedness is often just a Linnean taxonomy (Fig.
1), a crude approximation to a phylogeny but one that does impose an ordering of branch lengths which is interpretable and should be used. For example, even allowing that the only data available are in the form of presence/absence, measures that rely solely on topology and/or species richness would not distinguish between Fig. 2a and Fig. 2b, yet Fig. 2a clearly exhibits greater biodiversity in the sense of richness in higher taxa. Similarly, PD, applied to a Linnean classification (Faith 1994), has a focus exclusively on 'character richness' rather than 'character combinations' (in the terminology of Humphries, Williams & Vane-Wright 1995), so that PD concentrates on higher taxon richness and ignores the evenness component in diversity. Thus, PD would not distinguish between Fig. 2c and Fig. 2d, yet Fig. 2d clearly represents a less taxonomically diverse assemblage than Fig. 2c, both in the sense of possessing greater vulnerability to species loss and in potential functional inefficiency.

Fig. 1. Part of a taxonomic classification (families, genera, species, individuals) showing examples of path length weights $\{\omega_{ij}\}$ used to define taxonomic diversity/distinctness measures; conventional diversity indices utilize only the species abundances $\{x_i\}$.

Fig. 2. Some simple, contrasting taxonomic trees for presence/absence data (i.e. ignoring species abundance information). Diversity measures based only on topology of the trees would not distinguish (a) from (b), and measures based on total branch length would not distinguish (c) from (d), but taxonomic distinctness $\Delta^+$, based on the average of pair-wise path lengths (equation 4 of the text), does draw these distinctions. Using simple (1, 2, 3, ...) weighting of path lengths, $\Delta^+$ values are (a) 3.0, (b) 1.0, (c) 1.86 and (d) 1.2, placing the four configurations in the intuitively expected distinctness order.

An over-riding consideration in a comparative biodiversity study is the extent to which a putative statistic is sensitive to differing sampling effort at different sites or times. It is well-known, and demonstrated starkly in Fig. 3a-c, that standard diversity estimates can be strongly dependent on sampling effort, particularly in so far as they are influenced by the number of species in the sample. Species richness is crucially dependent on sampling effort and it must be expected that only carefully controlled and equitable sampling studies can provide comparative data. Warwick & Clarke (1995), however, define taxonomic diversity/distinctness measures that satisfy the above requirements of incorporating higher taxon richness and evenness concepts (see the $\Delta^+$ values in the legend to Fig. 2) but also have an apparent insensitivity to sampling effort (Fig. 3d-f).

Fig. 3. Simulation study of the effect of sample size for six diversity indices, using a single, composite sample of abundances of 111 nematode species (1000 individuals) from six sites in the Firth of Clyde (Lambshead 1986). Subsamples of individuals were drawn at random for 10 (logarithmically increasing) subset sizes, with 10 replicate simulations at each size, and the following indices computed: (a) Shannon diversity, (b) Margalef's d (a species richness index that attempts to account for sample size), (c) Pielou's J (reflecting evenness of abundances across species), (d) $\Delta$ (equation 1), (e) $\Delta^*$ (equation 3), (f) $\Delta^+$ (equation 4). The simulations for the final plot ignored the species abundances and selected fixed numbers of species (from the 111) for computation of $\Delta^+$. The conventional diversity indices are seen to be dependent on subsample size, unlike the taxonomic diversity/distinctness measures.

In the Methods that follow, the construction of $\Delta$ and $\Delta^*$ is described and the link to the Simpson diversity drawn. It is demonstrated theoretically that, if $\Delta_m$ and $\Delta_m^*$ are defined as the values of $\Delta$ and $\Delta^*$ from a subset of m organisms, randomly selected from a total of n individuals, then they are either exactly ($\Delta_m$) or approximately ($\Delta_m^*$) unbiased estimates of the respective true $\Delta$ and $\Delta^*$ for the whole sample, whatever the subset size m. The unbiasedness is also shown to be exact in the particular case where the data only records the presence or absence of species, not their abundances.

Methods

DEFINITION OF INDICES

'Taxonomic diversity' (Warwick & Clarke 1995) is defined, using the notation of Fig. 1, as:

$$\Delta = \frac{\sum\sum_{i<j} \omega_{ij} x_i x_j + \sum_i 0 \cdot x_i (x_i - 1)/2}{\sum\sum_{i<j} x_i x_j + \sum_i x_i (x_i - 1)/2} = \frac{\sum\sum_{i<j} \omega_{ij} x_i x_j}{n(n-1)/2} \qquad \text{eqn 1}$$

[...] 1) is also a random pair from the full set of s species. By definition, $\Delta^+$ is the expected path length for a randomly selected pair of species from the full set of s species, so it must follow that $E(\Delta_m^+) = \Delta^+$. Similar reasoning yields the exact unbiasedness result for $\Delta_m$, but not for $\Delta_m^*$, because of the conditionality clause in its definition; recourse needs to be made to the Taylor series approximation of the Appendix.

The Appendix then goes on to show that the variance of the subsample estimate $\Delta_m^+$ has the following form:

$$\mathrm{var}(\Delta_m^+) = \frac{2(s-m)}{m(m-1)(s-2)(s-3)} \left[ (s-m-1)\,\sigma_\omega^2 + 2(s-1)(m-2)\,\sigma_{\bar\omega}^2 \right] \qquad \text{eqn 5}$$

where

$$\sigma_\omega^2 = \frac{\sum\sum_{i \ne j} \omega_{ij}^2}{s(s-1)} - \bar\omega^2 \qquad \text{eqn 6}$$

$$\sigma_{\bar\omega}^2 = \frac{\sum_i \bar\omega_i^2}{s} - \bar\omega^2 \qquad \text{eqn 7}$$

$$\bar\omega_i = \frac{\sum_{j (\ne i)} \omega_{ij}}{s-1} \qquad \text{eqn 8}$$

$$\bar\omega = \frac{\sum_i \bar\omega_i}{s} = \frac{\sum\sum_{i \ne j} \omega_{ij}}{s(s-1)} \equiv \Delta^+ \qquad \text{eqn 9}$$

These two $\sigma^2$ terms are straightforward properties of the taxonomic tree for the full species set, with $\sigma_\omega^2$ corresponding to the variance of all path lengths $\{\omega_{ij}\}$ between different species, and $\sigma_{\bar\omega}^2$ the variance of the mean path lengths $\{\bar\omega_i\}$ from each species to all others.
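As an illustration of the index definitions (not part of the original analysis), the following Python sketch computes $\Delta$, $\Delta^*$ and $\Delta^+$ for a single sample. It assumes $\Delta$ in the pair-count form of equation 1, $\Delta^*$ as the same numerator divided by the sum of cross-species products $x_i x_j$ (the form of the Appendix estimator), and $\Delta^+$ as the mean path length over pairs of species present. The function names, the tuple encoding of the classification and the toy data are hypothetical, and path lengths use the simple (1, 2, 3, ...) weighting mentioned in the legend to Fig. 2.

```python
from itertools import combinations


def path_weight(taxon_a, taxon_b):
    """Path length between two species under simple (1, 2, 3, ...) weighting.

    Each taxon is a tuple ordered from highest to lowest rank, e.g.
    (family, genus, species). The weight is the number of ranks below the
    lowest rank the two species share: 0 for the same species, 1 for the
    same genus, 2 for the same family only, and so on.
    """
    shared = 0
    for rank_a, rank_b in zip(taxon_a, taxon_b):
        if rank_a != rank_b:
            break
        shared += 1
    return len(taxon_a) - shared


def taxonomic_indices(abundances, taxa):
    """Return (Delta, Delta*, Delta+); needs at least two species present."""
    n = sum(abundances)
    weighted = 0   # sum over i<j of w_ij * x_i * x_j
    cross = 0      # sum over i<j of x_i * x_j
    path_sum = 0   # sum over i<j of w_ij, species present only
    n_pairs = 0    # number of pairs of species present
    for i, j in combinations(range(len(taxa)), 2):
        if abundances[i] == 0 or abundances[j] == 0:
            continue
        w = path_weight(taxa[i], taxa[j])
        weighted += w * abundances[i] * abundances[j]
        cross += abundances[i] * abundances[j]
        path_sum += w
        n_pairs += 1
    delta = weighted / (n * (n - 1) / 2)   # average over all pairs of individuals
    delta_star = weighted / cross          # pairs from different species only
    delta_plus = path_sum / n_pairs        # presence/absence form
    return delta, delta_star, delta_plus


# Hypothetical four-species sample (names and counts are illustrative only).
taxa = [("F1", "G1", "a"), ("F1", "G1", "b"), ("F1", "G2", "c"), ("F2", "G3", "d")]
print(taxonomic_indices([2, 1, 1, 1], taxa))
print(taxonomic_indices([1, 1, 1, 1], taxa))
```

For the toy counts (2, 1, 1, 1) this gives $\Delta = 2.0$, $\Delta^* = 20/9$ and $\Delta^+ = 14/6$; with every count set to 1 the three indices coincide, as stated for presence/absence data.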
Note that equation 5 is an exact result, not a Taylor series approximation.

These sampling properties now motivate a statistical test for increase or decrease in observed taxonomic distinctness, based either on direct simulation or approximate confidence intervals (of the usual mean ± 2 SD form), constructed from the variance expression of equation 5.

Results

A PRACTICAL TEST FOR CHANGE IN TAXONOMIC DISTINCTNESS

The fact that, for presence/absence data, the distinctness estimate ($\Delta_m^+$) from a subset of m species unbiasedly estimates the distinctness ($\Delta^+$) of the full set, suggests the following test scenario for situations in which, at first sight, no valid diversity comparisons seem possible. The starting assumption is that there exists a reasonably comprehensive species list (inventory) for a region, within which certain localities are postulated to have reduced diversity. If the only data available at these localities are local species lists from one-off studies, and there is no control of the sampling effort expended in each location (or in constructing the regional inventory), then the only conventional diversity measure calculable, the number of species found at each locality, is uninterpretable. However, the above results show that one can unbiasedly compare taxonomic distinctness at a locality with that for the global list. For the null hypothesis of no difference, a randomization test can be performed by repeatedly subsampling species sets of size m, drawn at random from the global list, and constructing the histogram of the resulting $\Delta_m^+$ estimates. These will centre around the global distinctness of $\Delta^+$ and the spread of the simulated values can be used to determine if the observed $\Delta_m^+$ for that locality is at variance with the null hypothesis.

Fig. 4. Histogram of $\Delta^+$ values for 1000 random subsamples of a fixed number m of species, from the full list of free-living marine nematodes of the UK (s = 395 species): (a) m = 122, (b) m = 111, corresponding to the sublist sizes for combined samples at intertidal sandy sites in the Exe and Clyde estuaries, respectively. The true $\Delta^+$ values for both localities are also indicated: for the Clyde, the null hypothesis that the average distinctness equates with that for the UK as a whole is clearly rejected (P < 0.1%).

Figure 4 is based on a UK species list for free-living marine nematodes (s = 395; see the companion paper Warwick & Clarke 1998), a nematode species list (m = 122) from combined core samples taken over the course of a year at eight sandy sites in the Exe estuary, England, UK (Warwick 1971), and a further nematode species list (m = 111) from six sandy sites in the Clyde estuary, Scotland, UK (Lambshead 1986). For a total of 1000 random samples of size m = 122 (for Fig. 4a), and a further 1000 random samples with m = 111 (for Fig. 4b), drawn from the global list, the $\Delta_m^+$ estimates give the histograms of Fig. 4a,b, showing the typically rather narrow range of distinctness values commensurate with the null hypothesis for these subsample sizes. The true $\Delta_m^+$ for the Exe estuary sands, of 4.75, lies centrally to the distribution of Fig. 4a and therefore provides no evidence of a different average distinctness at this locality than in the UK region as a whole.
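The randomization procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the path lengths are taken as a precomputed symmetric matrix with zero diagonal, the six-species matrix `TOY_WEIGHTS` is invented for the example, and the lower-tail proportion uses the common (count + 1)/(simulations + 1) convention, which is a choice made here rather than one stated in the text.

```python
import random
from itertools import combinations

# Hypothetical master list of six species in two families:
# path length 1 = same genus, 2 = same family only, 3 = different families.
TOY_WEIGHTS = [
    [0, 1, 2, 3, 3, 3],
    [1, 0, 2, 3, 3, 3],
    [2, 2, 0, 3, 3, 3],
    [3, 3, 3, 0, 1, 2],
    [3, 3, 3, 1, 0, 2],
    [3, 3, 3, 2, 2, 0],
]


def delta_plus(weights, species):
    """Average path length over all pairs of the given species indices."""
    pairs = list(combinations(species, 2))
    return sum(weights[i][j] for i, j in pairs) / len(pairs)


def randomization_test(weights, observed_species, n_sim=1000, seed=0):
    """Compare an observed sublist with random sublists of the same size.

    Returns the observed distinctness, the sorted simulated values and the
    lower-tail proportion (an assumed convention, see the lead-in).
    """
    rng = random.Random(seed)
    s, m = len(weights), len(observed_species)
    observed = delta_plus(weights, observed_species)
    # Draw m species without replacement from the master list, n_sim times.
    sims = sorted(
        delta_plus(weights, rng.sample(range(s), m)) for _ in range(n_sim)
    )
    n_low = sum(1 for value in sims if value <= observed)
    p_lower = (n_low + 1) / (n_sim + 1)
    return observed, sims, p_lower


observed, sims, p_lower = randomization_test(TOY_WEIGHTS, [0, 1, 2], n_sim=200)
print(observed, sims[0], sims[-1], p_lower)
```

Because the simulated values are returned sorted, the empirical limits for any chosen level can be read directly from the list. With real data the matrix would be built once from the classification of the master list and reused for every locality.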
To reject the null hypothesis, at approximately the 5% level, the true $\Delta_m^+$ would need to fall below the 25th lowest (of 1000) simulated $\Delta_m^+$ values in the histogram, or above the 25th highest. In contrast, the true $\Delta_m^+$ for the Clyde sands (4.46) is below this lower limit in Fig. 4b and in fact is smaller than any of the 1000 simulated values, so there is significant evidence of a lower taxonomic distinctness here than for the UK as a whole (P < 0.1%).

The computational burden of this large number of simulations, which needs to be repeated for every locality under test (with a different species subset size), can be heavy, although not usually prohibitive. A much faster, approximate procedure is provided by the variance formula of equation 5. The constants $\sigma_\omega^2$ and $\sigma_{\bar\omega}^2$ in this expression are a function only of the tree structure of the global list (of s species) and need to be calculated only once (for all marine nematodes of the UK, for example). The variance expression is then a rather simple function of subsample size m and these constants, so that an approximate 95% confidence 'funnel' (mean ± 2 SD) can easily be constructed over the full range of m-values. Here the mean is equal to $\Delta^+$ for the global list (= 4.72 for UK marine nematodes) and the SD is the square root of the variance expression in equation 5. Figure 5 displays this funnel (the smooth, darker lines) and contrasts it with the results of extensive simulation runs (the circles, joined by lighter lines) for subset lists of m = 10, 15, 20, 25, ..., 380 species. At each point there are 1000 random selections and the circles denote the 25th lowest and 25th highest distinctness values (simulated 95% confidence limits).
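The approximate funnel can be sketched as follows. The code assumes the variance expression of equation 5 in the form $\mathrm{var}(\Delta_m^+) = \frac{2(s-m)}{m(m-1)(s-2)(s-3)}[(s-m-1)\sigma_\omega^2 + 2(s-1)(m-2)\sigma_{\bar\omega}^2]$, with $\sigma_\omega^2$ the variance of all between-species path lengths and $\sigma_{\bar\omega}^2$ the variance of the per-species mean path lengths. The function names and the six-species matrix are hypothetical, and the sketch requires s ≥ 4 and 2 ≤ m ≤ s.

```python
from math import sqrt

# Hypothetical master list of six species in two families:
# path length 1 = same genus, 2 = same family only, 3 = different families.
TOY_WEIGHTS = [
    [0, 1, 2, 3, 3, 3],
    [1, 0, 2, 3, 3, 3],
    [2, 2, 0, 3, 3, 3],
    [3, 3, 3, 0, 1, 2],
    [3, 3, 3, 1, 0, 2],
    [3, 3, 3, 2, 2, 0],
]


def funnel_constants(weights):
    """Tree constants, computed once for the master list.

    Returns (mean path length, variance of all path lengths,
    variance of the per-species mean path lengths).
    """
    s = len(weights)
    row_means = [
        sum(row[j] for j in range(s) if j != i) / (s - 1)
        for i, row in enumerate(weights)
    ]
    w_bar = sum(row_means) / s  # equals Delta+ of the full list
    mean_sq = sum(
        weights[i][j] ** 2 for i in range(s) for j in range(s) if i != j
    ) / (s * (s - 1))
    var_w = mean_sq - w_bar ** 2
    var_wbar = sum(r * r for r in row_means) / s - w_bar ** 2
    return w_bar, var_w, var_wbar


def variance_delta_plus(s, m, var_w, var_wbar):
    """Variance of Delta+ for a random sublist of m of the s species."""
    factor = 2 * (s - m) / (m * (m - 1) * (s - 2) * (s - 3))
    return factor * ((s - m - 1) * var_w + 2 * (s - 1) * (m - 2) * var_wbar)


def confidence_funnel(weights, sizes):
    """Approximate 95% limits (mean - 2 SD, mean + 2 SD) for each size m."""
    s = len(weights)
    w_bar, var_w, var_wbar = funnel_constants(weights)
    funnel = []
    for m in sizes:
        # Guard against a tiny negative value caused by rounding.
        sd = sqrt(max(variance_delta_plus(s, m, var_w, var_wbar), 0.0))
        funnel.append((m, w_bar - 2 * sd, w_bar + 2 * sd))
    return funnel


for m, lower, upper in confidence_funnel(TOY_WEIGHTS, [2, 3, 4, 5, 6]):
    print(m, round(lower, 3), round(upper, 3))
```

Separating `funnel_constants` from `variance_delta_plus` mirrors the point made above: the tree constants are computed once per master list, after which the limits for any sublist size follow from a closed-form expression.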
There is clear evidence of a left-skewed distribution for $\Delta_m^+$ in this case (as also for the Chilean nematode data of the companion paper; Warwick & Clarke 1998) but the normality approximation to the lower confidence limit (the important limit in practice) is good enough to suggest that this may be a useful shortcut to the full randomization procedure, in non-borderline cases, when computing power is limiting. An improved empirical fit could doubtless be constructed from an expression for the third moment of $\Delta_m^+$.

Discussion

As shown in Fig. 5, distinctness values for any specific locality, habitat type, pollution condition, etc., can be plotted on the confidence funnel created from a regional species list, to test for significant departures from the null hypothesis (that a particular subsample behaves, in terms of its pair-wise average distinctness, as if it were a random sample from the larger list). The companion paper, Warwick & Clarke (1998), applies and interprets this method in a range of situations.

It is perhaps surprising that a diversity test of any sort should be possible in a case where sampling effort is uncontrolled and the only data consist of presence or absence of species. Indeed, the test could not be expected to have the same sensitivity as that obtainable from a wider range of diversity measures (or multivariate analysis) calculable from abundance data in carefully standardized sampling plans. The key point to recognize here is that certain diversity features, most obviously the number of species recorded in a sample, are highly dependent on the sampling regime, and can only be straightforwardly compared under conditions of comparable sampling effort. The same caveats will apply to other diversity totals, such as PD, the total phylogenetic or taxonomic branch length in a subtree for a particular
locality/condition. They will not apply in general to average properties, such as the pair-wise taxonomic distinctness indices discussed here or, possibly, an average phylogenetic diversity, defined as PD/s. (Note though that, as pointed out earlier, the latter would have certain interpretational drawbacks: average PD takes the same value for Fig. 2c,d. It is also true that average PD calculated from a randomly selected sublist of m species does not unbiasedly estimate average PD for the total list of s species, a fact that can be seen as further limiting the usefulness of this possible alternative formulation.)

Fig. 5. Confidence funnels for the $\Delta^+$ randomization test from the all-UK list of marine nematode species, plotted as taxonomic distinctness $\Delta^+$ against subset size (m). Circles correspond to direct randomization results for each sublist size (simulated 95% confidence limits), and smooth (thick) lines to approximate limits using the variance formula of equation 5 (theoretical 95% confidence limits). The dashed line gives the mean $\Delta^+$ over each simulation, confirming the theoretical unbiasedness result ($\Delta^+$ = 4.72 for the full set of 395 species).

Thus, for historic data and/or meta-analyses in which results from different workers are contrasted, there may be little choice but to recognize that only certain aspects of diversity, such as average taxonomic distinctness, may be validly compared. This raises a final question, on the extent to which the comparability of $\Delta^+$ is compromised by the differing taxonomic identification skills of different workers. In fact, for $\Delta_m^+$ to remain unbiased for $\Delta^+$ it is not necessary to assume that all workers are equally efficient, only that taxonomic accuracy is independent of the taxonomic relatedness of the species involved. To put it another way, certain workers may miss (or misidentify) species but, provided they do so at random across the species pool, in effect the test remains unchanged. (Whether low numbers of species are found because of low sampling effort or a low identification rate is then irrelevant to the construction of $\Delta^+$.) Whether such an independence scenario is reasonable in practice is discussed further in the companion paper (Warwick & Clarke 1998).

Acknowledgements

This work forms part of the Marine Biodiversity project of the CCMS Plymouth Marine Laboratory, Natural Environment Research Council, UK, and is part-funded by the UK Ministry of Agriculture, Fisheries and Food (project no. AE1113). We thank Paul Harvey (University of Oxford) for useful contextual comments, and also an anonymous referee who contributed helpful insights into the derivation of results.

References

Clarke, K.R. & Green, R.H. (1988) Statistical design and analysis for a 'biological effects' study. Marine Ecology Progress Series, 46, 213-226.
Faith, D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation, 61, 1-10.
Faith, D.P. (1994) Phylogenetic pattern and the quantification of organismal biodiversity. Philosophical Transactions of the Royal Society of London, Series B, 345, 45-58.
Harper, J.L. & Hawksworth, D.L. (1994) Biodiversity: measurement and estimation. Preface. Philosophical Transactions of the Royal Society of London, Series B, 345, 5-12.
Humphries, C.J., Williams, P.H. & Vane-Wright, R.I. (1995) Measuring biodiversity value for conservation. Annual Review of Ecology and Systematics, 26, 93-111.
Lambshead, P.J.D. (1986) Subcatastrophic sewage and industrial waste contamination as revealed by marine nematode faunal analysis. Marine Ecology Progress Series, 29, 247-260.
May, R.M. (1990) Taxonomy as destiny. Nature, 347, 129-130.
Pielou, E.C. (1975) Ecological Diversity. Wiley, New York.
Simpson, E.H. (1949) Measurement of diversity. Nature, 163, 688.
Vane-Wright, R.I., Humphries, C.J. & Williams, P.H. (1991) What to protect? Systematics and the agony of choice. Biological Conservation, 55, 235-254.
Warwick, R.M. (1971) Nematode associations in the Exe estuary. Journal of the Marine Biological Association of the United Kingdom, 51, 439-454.
Warwick, R.M. & Clarke, K.R. (1995) New 'biodiversity' measures reveal a decrease in taxonomic distinctness with increasing stress. Marine Ecology Progress Series, 129, 301-305.
Warwick, R.M. & Clarke, K.R. (1998) Taxonomic distinctness and environmental assessment. Journal of Applied Ecology, 35, 532-543.
Williams, P.H., Humphries, C.J. & Vane-Wright, R.I. (1991) Measuring biodiversity: taxonomic relatedness for conservation priorities. Australian Systematic Botany, 4, 665-679.

Received 31 August 1997; revision received 2 April 1998

Appendix

CASE 1: RANDOM SUBSAMPLES OF INDIVIDUALS

For the situation represented by Fig. 3d, the exact unbiasedness of $\Delta_m$ and asymptotic unbiasedness of $\Delta_m^*$, as estimators for $\Delta$ and $\Delta^*$, respectively, is demonstrated under random subsampling (without replacement) of organisms from the full set of n. The species abundances $\{x_i;\ i = 1, \ldots, s;\ \sum x_i = n\}$ and the total number of species s are thought of as fixed, with the taxonomic diversity $\Delta$ and distinctness $\Delta^*$ for the full data set given by equations 1 and 3 of the main text. For a fixed-size (m) subset of individuals, denote the abundances of each of the s species by $\{Y_i;\ i = 1, \ldots, s;\ \sum Y_i = m\}$, capital letters reflecting the fact that these are the 'random variables'.
The estimators of $\Delta$ and $\Delta^*$ from a sample of size m are:

$$\Delta_m = \frac{\sum\sum_{i<j} \omega_{ij} Y_i Y_j}{m(m-1)/2} \qquad \text{eqn A1}$$

$$\Delta_m^* = \frac{\sum\sum_{i<j} \omega_{ij} Y_i Y_j}{\sum\sum_{i<j} Y_i Y_j} \qquad \text{eqn A2}$$

The $\{Y_i\}$ are jointly hypergeometric, with probability distribution:

$$\Pr(Y_1 = y_1, \ldots, Y_s = y_s) = \binom{x_1}{y_1}\binom{x_2}{y_2}\cdots\binom{x_s}{y_s} \Big/ \binom{n}{m} \qquad \text{eqn A3}$$

and mean values

$$E(Y_i) = m x_i / n \quad (\forall i). \qquad \text{eqn A4}$$

Using the fact that the expectation of a sum of random variables is the sum of the expectations, even when non-independent, the expectation of $\Delta_m$ is:

$$E(\Delta_m) = \frac{\sum\sum_{i<j} \omega_{ij} E(Y_i Y_j)}{m(m-1)/2} \qquad \text{eqn A5}$$

It can be shown from equation A3 that

$$E(Y_i Y_j) = \frac{m(m-1)\, x_i x_j}{n(n-1)} \quad (i \ne j) \qquad \text{eqn A6}$$

so that

$$E(\Delta_m) = \frac{\sum\sum_{i<j} \omega_{ij} x_i x_j}{n(n-1)/2} = \Delta. \qquad \text{eqn A7}$$

In a similar way, but this time utilizing an asymptotic Taylor series expansion to express the mean of a ratio as approximately the ratio of the means, and again using equation A6:

$$E(\Delta_m^*) \approx \frac{\sum\sum_{i<j} \omega_{ij} E(Y_i Y_j)}{\sum\sum_{i<j} E(Y_i Y_j)} = \frac{\sum\sum_{i<j} \omega_{ij} x_i x_j}{\sum\sum_{i<j} x_i x_j} = \Delta^*. \qquad \text{eqn A8}$$

CASE 2: RANDOM SUBLISTS OF SPECIES

The exact unbiasedness of the $\Delta_m^+$ estimator for $\Delta^+$ is demonstrated for random sublists of species drawn (without replacement) from the full list of s species. In addition, the exact variance of $\Delta_m^+$ is derived, as a basis for confidence funnels, such as that in Fig. 5. This is a special case of the formulation in case 1, with abundances taking the values $x_i = 1$ for all $i = 1, \ldots, s$ species present in the full set; the taxonomic distinctness $\Delta^+$ of the full tree (395 species in the UK nematode example of Fig. 5) is given by equation 4 of the main text. For a fixed-size (m) sublist of species,
