CHANGE IN MARINE COMMUNITIES:
An Approach to Statistical Analysis and Interpretation
2nd Edition

K. R. Clarke & R. M. Warwick
Plymouth Marine Laboratory, UK

PRIMER-E Ltd

Published 2001 by PRIMER-E Ltd, Plymouth Marine Laboratory, Prospect Place, West Hoe, Plymouth PL1 3DH, United Kingdom
Business Office: 6 Hedingham Gardens, Roborough, Plymouth PL6 7DX, United Kingdom
Directors: K Robert Clarke MSc PhD, Raymond Gorley MA

Reprinted 1997. Second edition 2001.

Clarke, K.R., Warwick, R.M., 2001. Change in marine communities: an approach to statistical analysis and interpretation, 2nd edition. PRIMER-E: Plymouth.

Copyright 2001 PRIMER-E Ltd, all rights reserved.

Cover photos: Steve Smith (University of New England, Armidale, NSW, Australia), Paul Somerfield (Plymouth Marine Laboratory, UK), and [name illegible] (New Zealand).

CONTENTS

INTRODUCTION
CHAPTER 1   A framework for studying changes in community structure
CHAPTER 2   Measures of similarity of species abundance/biomass between samples
CHAPTER 3   Hierarchical clustering
CHAPTER 4   Ordination of samples by Principal Components Analysis (PCA)
CHAPTER 5   Ordination of samples by Multi-Dimensional Scaling (MDS)
CHAPTER 6   Testing for differences between groups of samples
CHAPTER 7   Species analyses
CHAPTER 8   Diversity measures, dominance curves and other graphical analyses
CHAPTER 9   Transformations
CHAPTER 10  Species removal and aggregation
CHAPTER 11  Linking community analyses to environmental variables
CHAPTER 12  Causality: community experiments in the field and laboratory
CHAPTER 13  Data requirements for biological effects studies: which components and attributes of the marine biota to examine?
CHAPTER 14  Relative sensitivities and merits of univariate, graphical/distributional and multivariate techniques
CHAPTER 15  Multivariate measures of community stress
CHAPTER 16  Further comparison of multivariate patterns
CHAPTER 17  Biodiversity measures based on relatedness of species
APPENDIX 1  Index of example data
APPENDIX 2  Principal literature sources and further reading
APPENDIX 3  References cited

INTRODUCTION

Purpose

This manual accompanies the computer software package PRIMER (Plymouth Routines In Multivariate Ecological Research), obtainable from PRIMER-E Ltd, Plymouth (see www.primer-e.com). Its scope is the analysis of data arising in community ecology and environmental science which is multivariate in character (many species, multiple environmental variables), and it is intended for use by ecologists with no more than a minimal background in statistics. As such, this methods manual complements the PRIMER user manual, by giving the background to the statistical techniques employed by the analysis programs (Table 0.1), at a level of detail which should allow the ecologist to understand the output from the programs, be able to describe the results in a non-technical way to others and have confidence that the right methods are being used for the right problem.

This may seem a tall order, in an area of statistics (primarily multivariate analysis) which has a reputation as esoteric and mathematically complex!
However, whilst it is true that the computational details of some of the core techniques described here (for example, non-metric multidimensional scaling) are decidedly non-trivial, we maintain that all of the methods that have been adopted or developed within PRIMER are so conceptually straightforward as to be amenable to simple explanation and transparent interpretation. In fact, the adoption of non-parametric and permutation approaches for display and testing of multivariate data requires, paradoxically, a lower level of statistical sophistication on the part of the user than does a satisfactory exposition of standard (parametric) hypothesis testing in the univariate case.

Table 0.1. Chapters in which the methods underlying specific PRIMER routines are described.

Routines                                         Chapter
Similarity                                       2
CLUSTER                                          3
PCA                                              4
MDS                                              5
ANOSIM                                           6
SIMPER                                           7
DIVERSE, CASWELL, Geometric & Dominance Plots    8
Transform                                        9
Aggregate                                        10
BIO-ENV, Draftsman Plot                          11
MVDISP, RELATE                                   15
BVSTEP, 2STAGE                                   16

The principal aim of this manual is therefore to describe a coherent strategy for the interpretation of data on community structure, namely values of abundance, biomass, % cover, presence/absence etc., for a set of species and one or more replicate samples taken:

a) at a number of sites at one time (spatial analysis);
b) at the same site at a number of times (temporal analysis);
c) for a community subject to different uncontrolled or manipulative 'treatments';

or some combination of these.

These species-by-samples arrays are typically large, and patterns in community structure are often not readily apparent from simple inspection of the data. Statistical analysis therefore centres around reducing the complexity of these matrices, usually by some graphical representation of the biological relationships between the samples. This is followed by statistical
testing to identify and characterise changes in community structure in time or space, and to relate these to changing environmental or experimental conditions.

Material covered

It should be made clear at the outset that the title 'Change in Marine Communities' does not in any way reflect a restriction in the scope of the techniques in the PRIMER package to the marine environment. The first edition of this manual was intended primarily for a marine audience and, given that its examples are still drawn entirely from marine contexts, it would be disingenuous to change the title now to a more general one. However, it will be self-evident to the reader that there is very little in the following pages that is exclusively marine. Indeed, the PRIMER package is now not only used world-wide for all types of marine community surveys and experiments, of benthic fauna, algae, corals, plankton, fish, diet studies etc., but is increasingly found in freshwater, terrestrial and palaeontology contexts, and sometimes solely in multivariate studies of physico-chemical characteristics.

As a result of the authors' own research interests and the widespread use of community data in pollution monitoring, a major thrust of the manual is the biological effects of contaminants but, again, most of the methods are much more generally applicable. This is reflected in a range of more fundamental ecological studies among the real data sets exemplified here.

The literature contains a large array of sophisticated statistical techniques for handling species-by-samples matrices, ranging from their reduction to simple diversity indices, through curvilinear or distributional representations of richness, dominance, evenness etc., to a plethora of multivariate approaches involving clustering or ordination methods. This manual does not attempt to give an overview of all the options, or even the majority of them.
Instead it presents a strategy which has evolved over several years within the Community Ecology/Biodiversity group at Plymouth Marine Laboratory (PML), and which has a proven track record in interpretation of a wide range of marine community data; see, for example, papers listed under Clarke or Warwick in Appendix 3 (which have attained four-figure total citations in SCI journals). The analyses and displays in these papers, and in this manual, draw almost exclusively upon the wide range of routines available in the PRIMER package (though in many cases annotations to the plots have been further edited by simple pasting into graphics programs such as Microsoft Powerpoint). Note also that, whilst other software packages will not encompass this specific combination of routines, several of the individual techniques can be found elsewhere. For example, the core clustering and ordination methods described here are available in several mainstream statistical packages (SAS, S-Plus, Systat, Statgraphics etc.), and more specialised statistical programs (CANOCO, PATN, PCORD, the Cornell Ecology programs etc.) tackle essentially similar problems, though usually employing different techniques and a different strategy.

[...] training workshops funded jointly by FAO, UNEP and UNESCO/IOC, and a series of commercially-run PRIMER courses at Plymouth and venues outside the UK. The advocacy of these techniques thus springs not only from regular use and development within PML's Community Ecology/Biodiversity group but also from valuable feedback from a series of workshops in which practical data analyses were central.

Throughout the manual, extensive use is made of data sets from the published literature to illustrate the techniques. Appendix 1 gives the original literature source for each of these 25 or so data sets and an index to all the pages on which they are analysed. Each data set is allocated a single letter designation and, to avoid confusion, referred to in the text of the manual by that letter, placed in curly brackets (e.g. {A} = Amoco-Cadiz oil spill, macrofauna; {B} = Bristol Channel, zooplankton; {C} = Celtic Sea, zooplankton etc.).
Literature citation

This 2nd edition of the manual follows the 1st edition closely in respect of the first 15 chapters, though minor revisions have been made throughout. Chapters 16 and 17 are entirely new. Appendix 2 lists some background papers appropriate to each chapter, including the source of specific analyses, and a full listing of references cited is in Appendix 3.

Whilst the manual is genuinely collaboratively authored, for the purposes of directing queries on specific topics it is broadly true that the first author (KRC) bears the responsibility for the chapters on statistical methods (1-7, 9, 11) and the second author (RMW) is mainly responsible for the chapters on interpretation (10, 12-14), the responsibility for Chapters 8 and 15 being shared more or less equally. Chapters 16 and 17 were written by KRC, drawing on the results of joint papers in various authorship combinations by KRC, RMW and Paul Somerfield (also of the Plymouth Marine Laboratory).

Since this manual is not accessible within the published literature, referral to the methods it describes would properly be by citing the primary papers on which it is based: these are indicated in the text and Appendix 2. Alternatively, comprehensive discussion of the philosophy (and many of the details) of the multivariate and univariate approaches advocated can be found in Clarke (1993, 1999) and Warwick (1993), respectively, with the newer methods in this edition best summarised in Clarke and Warwick (1998a), Somerfield and Clarke (1995) and Warwick and Clarke (2001).

Acknowledgements

We are grateful to a large number of individuals and institutions for their help and support; please see the detailed list at the end of the manual.

K R Clarke
R M Warwick
2001

CHAPTER 1: A FRAMEWORK FOR STUDYING CHANGES IN COMMUNITY STRUCTURE

The purpose of this opening chapter is twofold:

a) to introduce some of the data sets which are used extensively, as illustrations of techniques, throughout the manual;
b) to outline a framework for the various possible stages in a community analysis.
Examples are given of some core elements of the recommended approaches, foreshadowing the analyses explained in detail later and referring forward to the relevant chapters. Though, at this stage, the details are likely to remain mystifying, the intention is that this opening chapter should give the reader some feel for where the various techniques are leading and how they slot together. As such, it is intended to serve both as an introduction and a summary.

Stages

It is convenient to categorise possible analyses broadly into four main stages.

1) Representing communities by graphical description of the relationships between the biota in the various samples. This is thought of as pure description, rather than explanation or testing, and the emphasis is on reducing the complexity of the multivariate information in typical species/samples matrices, to obtain some form of low-dimensional picture of how the biological samples interrelate. (Footnote: the term 'community' is used here somewhat loosely, for an assemblage of species; it does not necessarily imply internal structuring of the species composition.)

2) Discriminating sites/conditions on the basis of their biotic composition. The paradigm here is that of the hypothesis test, examining whether there are 'proven' community differences between groups of samples identified a priori, for example demonstrating differences between control and putatively impacted sites, establishing before/after impact differences at a single site, etc.

3) Determining levels of 'stress' or disturbance, by attempting to construct biological measures from the community data which are indicative of disturbed conditions. These may be absolute measures ('this observed structural feature is indicative of pollution') or relative criteria ('under impact, this coefficient is expected to decrease in comparison with control levels'). Note the contrast with the previous stage, however, which is restricted to demonstrating differences between groups of samples, not ascribing directionality to the change (e.g. deleterious consequence).

4) Linking to environmental variables and examining issues of causality of any changes. Having allowed the biological
information to 'tell its own story', any associated physical or chemical variables matched to the same set of samples can be examined for their own structure and its relation to the biotic pattern (its 'explanatory power'). The extent to which identified environmental differences are actually causal to observed community changes can only really be determined by manipulative experiments, either in the field or through laboratory/mesocosm studies.

Techniques

The spread of methods for extracting workable representations and summaries of the biological data can be grouped into three categories.

1) Univariate methods collapse the full set of species counts for a sample into a single coefficient, for example a species diversity index. This might be some measure of the numbers of different species for a fixed number of individuals (species richness), or the extent to which the community counts are dominated by a small number of species (dominance/evenness index), or some combination of these. Also included are biodiversity indices which measure the degree to which species or organisms in a sample are taxonomically or phylogenetically related to each other. Clearly, the a priori selection of a single taxon as an indicator species, amenable to specific inferences about its response to a particular environmental gradient, also gives rise to a univariate analysis.

2) Distributional techniques, also termed graphical or curvilinear plots (when they are not strictly distributional), are a class of methods which summarise the set of species counts for a single sample by a curve or histogram.
One example is k-dominance curves (Lambshead et al, 1983), which rank the species in decreasing order of abundance, convert the values to percentage abundance relative to the total number of individuals in the sample, and plot the cumulated percentages against the species rank. This, and the analogous plot based on species biomass, are superimposed to define ABC (abundance-biomass comparison) curves (Warwick, 1986), which have proved a useful construct in investigating disturbance effects. Another example is the species abundance distribution (sometimes termed the distribution of individuals amongst species), in which the species are categorised into geometrically-scaled abundance classes and a histogram plotted of the number of species falling in each abundance range (e.g. Gray and Pearson, 1983). It is then argued, again from empirical evidence, that there are certain characteristic changes in this distribution associated with community disturbance.

Such distributional techniques relax the constraint in the previous category that the summary from each sample should be a single variable; here the emphasis is more on diversity curves than single diversity indices, but note that both these categories share the property that comparisons between samples are not based on particular species identities: two samples can have exactly the same diversity or distributional structure without possessing a single species in common.

3) Multivariate methods are characterised by the fact that they base their comparisons of two (or more) samples on the extent to which these samples share particular species, at comparable levels of abundance.
Either explicitly or implicitly, all multivariate techniques are founded on such similarity coefficients, calculated between every pair of samples. These then facilitate a classification or clustering (these terms are interchangeable) of samples into groups which are mutually similar, or an ordination plot in which, for example, the samples are 'mapped' (usually in two or three dimensions) in such a way that the distances between pairs of samples reflect their relative dissimilarity of species composition. Techniques described in detail in this manual are a method of hierarchical agglomerative clustering (e.g. Everitt, 1980), in which samples are successively fused into larger groups, as the criterion for the similarity level defining group membership is relaxed, and two ordination techniques: principal components analysis (PCA, e.g. Chatfield and Collins, 1980) and non-metric multi-dimensional scaling (NMDS, usually shortened to MDS, Kruskal and Wish, 1978).

For each broad category of analysis, the techniques appropriate to each stage are now discussed, and pointers given to the relevant chapters.

UNIVARIATE TECHNIQUES

For diversity indices and other single-variable extractions from the data matrix, standard statistical methods are usually applicable and the reader is referred to one of the many excellent general statistics texts (e.g. Sokal and Rohlf, 1981). The requisite techniques for each stage are summarised in Table 1.1. For example, when samples have the structure of a number of replicates taken at each of a number of sites (or times, or conditions), computing the means and 95% confidence intervals gives an appropriate representation of the Shannon diversity (say) at each site, with discrimination between sites being demonstrated by one-way analysis of variance (ANOVA), which is a test of the null hypothesis that there are no differences in mean diversity between
sites. Linking to the environment is then also relatively straightforward, particularly if the environmental variables can be condensed into one (or a small number of) key summary statistics. Simple or multiple regression of Shannon diversity as the dependent variable, against the environmental descriptors as independent variables, is then technically feasible, though rarely very informative in practice, given the over-condensed nature of the information utilised.

Table 1.1. Univariate techniques. Summary of analyses for the four stages.

Univariate examples: diversity indices (Ch 8), indicator taxa, biodiversity indices (Ch 17).

Stages                                 Analyses
1) Representing communities            Means and 95% confidence intervals for each site/condition (Ch 8, 9, 17)
2) Discriminating sites/conditions     One-way analysis of variance, ANOVA (Ch 6)
3) Determining stress levels           By reference to historical data for sites (Ch 14, 15) and regional 'species pool' (Ch 17)
4) Linking to environment              Regression techniques (Ch 11); for causality issues see Ch 12

For impact studies, much has been written about the effect of pollution or disturbance on diversity measures: whilst the response is not necessarily unidirectional (under the hypothesis of Huston, 1979, diversity is expected to rise at intermediate disturbance levels before its strong decline with gross disturbance), there is a sense in which determining stress levels is possible, through relation to historical diversity patterns for particular environmental gradients. Similarly, empirical evidence may exist that particular indicator taxa (e.g. Capitellids) change in abundance along specific pollution gradients (e.g. of organic enrichment).

Note though that, unlike the diversity measures constructed from abundances across species, averaged in some way, indicator species levels or the number of species in a sample (S) may not initially satisfy the assumptions necessary for classical statistical analysis. For S, the normality and constant variance conditions can usually be produced by transformation of the variable (e.g. log S). However, for most individual species, abundance across the set of samples is likely to be a very poorly-behaved variable, statistically speaking.
Typically, a species will be absent from many of the samples and, when it is present, the counts are often highly variable, with an abundance probability distribution which is heavily right-skewed. Thus, for all but the most common individual species, transformation is no real help and parametric statistical analyses cannot be applied to the counts, in any form. In any case, it is not valid to 'snoop' in a large data matrix, of typically 100-250 taxa, for one or more 'interesting' species to analyse by univariate techniques (any indicator or keystone species selection must be done a priori).

[Two footnotes at this point are illegible in the source.]

Such arguments lead to the tenets underlying this manual:

a) community data are usually highly multivariate (large numbers of species, each subject to high statistical noise) and need to be analysed en masse in order to elicit the important biological signal and its relation to the environment;
b) standard parametric modelling is totally invalid.

Thus, throughout, little emphasis is given to representing communities by univariate measures, though some definitions of indices can be found at the start of Chapter 8, some brief remarks on hypothesis testing (ANOVA) at the start of Chapter 6, a discussion of transformations (to approximate normality and constant variance) at the start of Chapter 9, an example given of a univariate regression between biota and environment in Chapter 11, and a more extensive discussion of sampling properties of diversity indices, and biodiversity measures based on taxonomic relatedness, makes up Chapter 17.
Finally, Chapter 14 gives a series of detailed comparisons of univariate with distributional and multivariate techniques in order to gauge their relative sensitivities and merits in a range of practical studies.

EXAMPLE: Frierfjord macrofauna

The first example is from the IOC/GEEP practical workshop on biological effects of pollutants (Bayne et al, 1988), held at the University of Oslo, August 1986. This attempted to contrast a range of biochemical, [...]

[...] and the prevalence of zeros. Here, as elsewhere, even an undesirable reduction to the 30 'most important' species (see Chapter 2) leaves more than 50% of the matrix consisting of zeros. Standard multivariate normal analyses (e.g. Mardia et al, 1979) of these counts are clearly ruled out; they require both that the number of species (variables) be small in relation to the number of samples, and that the abundance or biomass values are transformable to approximate normality: neither is possible.

[Fig. 1.1. Frierfjord, Norway {F}. Benthic community sampling sites.]

[Fig. 1.2. Frierfjord macrofauna {F}. Means and 95% confidence intervals for Shannon diversity, from the replicates at each site.]

[Table 1.2. Frierfjord macrofauna {F}. Abundance and biomass matrices (part only) for the species in replicate samples A1-A4, B1-B4, ...; rows include Halicryptus sp., Onchnesoma, Phascolion strombi, Holothuroidea, Nemertina indet., Amaeana trilobata, Amphicteis gunneri, Ampharetidae and Anaitides spp. The tabulated values are not recoverable from the source.]

As discussed above, one easy route to simplification of this 'high-dimensional' complexity is to reduce each column of the matrix (each sample) to a single,
«=«0:«0:«0 FT «HO OO JAnawdse 0 OD OOOO univariate descriptin. Fig. 12 shows the results of computing the Shanton diversity (H, see Chaper 8) of each sample’, an ploting for each site the mean diversity and its 95% confidence interval, based ona pooled estate oftariance acrost al sites from the ‘ANOVA table, Chapter (An analysis ofthe type ‘outlined in Chapter shows that prior ransformation (of HT is not requir it already has approximately ‘constant variance acres the ites, a necesary pees ie for standard ANOVA). The most obvious feture of Fig. 12 isthe wlatively higher diveesity atthe conto” lation, 4 1 ng the PRIMER DIERSE rate Stazes ‘ABC dominance) ares (Ch) 1 Representing Cares for och 2) Dissininating ANOVA ona Sends ANOSIM tex (Ch6) on “dsc erween eer pa of cues 3) Determining Biomass ere shops Blow stessleves mbt curve wer dtbance 4) Liking to Dia exp for ai DISTRIBUTIONAL TECHNIQUES ‘A les condensed frm of summary of each sample is ‘ered bythe dsetional graphical methods utined Forte four stages in Table 13. Representation is by curves or histograms (Chapter 8) either ploted foreach replicate sample spaately or fer pooled data witha sites or conditions. The former Permits visu judgement ofthe sampling raciation Inthe curves and, as with diversity indices, plication require to dscriminae sie, ic ts thew yp ‘ss tat two or more sites (conditions et) have the ‘ame curvilinear structure. The easiest aproech to ‘esting then to summarise each eeplizate eve by 8 single statistic and apply ANOVA as befoe for the ‘ABC mathod, mentioned eat, te I staisiChapter 8) isa convenient measre ofthe extent to which the biomass curve “dominates” the abundance carve, or ce-vesa. 
This is effective in practice though, in theory, it simply amounts to computing another diversity index and is therefore just a univariate approach. A more general test, which honours the curvilinear structure, could be constructed by the ANOSIM procedure (described later under multivariate techniques), computed between every pair of replicate ABC curves.

[A footnote at this point, citing earlier work by Clarke on testing differences between curves, is illegible in the source.]

The distributional/graphical techniques have been proposed specifically as a way of determining stress levels. For the ABC method, the strongly polluted (disturbed) state is indicated if the abundance k-dominance curve falls above the biomass curve throughout its length (e.g. see the later plots in Fig. 1.4): the phenomenon is linked to the loss of large-bodied 'climax' species and the rise of small-bodied opportunists. Note that the ABC procedure claims to give an absolute measure, in the sense that disturbance status is attributable on the basis of samples from a single site; in practice, however, it is always wise to design collection from (matched) impacted and control sites, to confirm that the control condition exhibits the undisturbed ABC pattern (biomass curve above the abundance curve, throughout). Similarly, the species abundance distribution has features characteristic of disturbed status (e.g. see the middle plots in Fig.
1.6), namely a move to a less 'J-shaped' distribution by a reduction in the first one or two abundance classes (loss of rarer species), combined with the gain of some higher abundance classes (very numerous opportunist species).

The distributional/graphical methods may thus have particular merits in allowing recognition of stressed states (Chapter 14), though they have the disadvantage of being more difficult to work with statistically, for example in linking to environmental variables, where the only viable course again seems to be reduction of the curve(s) for each sample to a summary statistic (such as W), which can be regressed on particular abiotic variables.

[Table 1.4. Loch Linnhe macrofauna {L}. Abundance and biomass matrices (part only) for the 115 species and 11 years (1963-1973). The tabulated values are not recoverable from the source.]

[Fig. 1.3. Loch Linnhe and Loch Eil, Scotland {L}. Map of the sampling site and the effluent discharge point; samples taken annually over 1963-1973.]

EXAMPLE: Loch Linnhe macrofauna

Pearson (1975) describes a time series of macrobenthic community samples, taken over the period 1963-1973 inclusive, at two sites in a sea loch system on the west coast of Scotland ({L}, Fig. 1.3). Pooling to a single sample for each of the 11 years resulted in abundance and biomass matrices of 115 rows (species) and 11 columns (samples), a small part of which is shown in Table 1.4. Starting in 1966, pulp-mill effluent was discharged to the sea lochs (Fig. 1.3), with the rate increasing in 1970 and a significant reduction taking place in 1972 (Pearson, 1975). The
The {op leftchand plot of Fig 1 shows the Shannon divers i of the macrobenthie samples over this prod, and the remaining plas the ABC curves foreach year” “There appears to be a consistent change of stucture from one in sshich the Biomass curve dominates the lndane ease inthe erly yeas fo the curves cross ing, reversing altogeter and then finally reverting {heir original fom, EXAMPLE: Garroch Head macrofauna Pearson and Blackstock (1984) describe th sampling fof a transect of 12 sites acrose the sewage-siadge Si and Se > Si now), showing that transformations ca have © significant effect on he inal ordination or lustring Table 22, Look Linke maroon (Lf set) ra fared bande fs anon ape of Tae (Rang tos Cos ty mae over @ @ Bw sample: 1 23 8) Sample 1 23 4 Species r Eckinoce 17 0 «0 0 2 26 ~ Mrioe 21-0 0133 0 68 — Labidepl 17 25 018 4 2 6 Amaeana 0 19:35:17 Capiela 0 34 4312 ee) 5 proscar te min efi calcite bythe PRIER Sry rou, hs tow ange of oir Yow au Chapter eet In fat, for ver arial data, choice of transformation ‘an somtimes be more cies! han choice of imirty ‘officent or ondination technique, and the subject ‘erefore mori a chapter to itself (Chapter 9}. (Canberra coefcient An atemative transformation fo selec similarity coeticient tha auomatialy lances the weighting igen to each species when compused on rial counts (siorassicova)- One such possiiliy given by Lance ‘and Wiliams (1967), and refered toa the Canberra oeticient, defines similarity Between sample j and ample Fas sei (Clearly, this tas a strong tikeness to the Bray-Cortis ‘oeticint but the absolte difference in coun for ‘och species separately sealed, Le. the denominator Sealing term is inside not outside the summation over species. 
For example, from Table 2.1a, the Canberra similarity between samples 1 and 4 is:

\[ S_{14} = 100\left[1 - \frac{1}{5}\left(\frac{9}{9} + \frac{16}{22} + \frac{1}{19} + \frac{9}{9} + \frac{2}{2}\right)\right] = 24.4 \]

Note that joint absences have no effect here because they are deliberately excluded (since 0/0 is undefined) and p is reset to be the number of species that are present in at least one of the two samples under consideration.

The separate scaling constrains each species to make an equal contribution (potentially) to the similarity between two samples. However abundant a species is, its contribution to S can never be more than 100/p, and a rare species with a single individual in each of the two samples contributes the same as a common species with 1000 individuals in each. Whilst there may be circumstances in which this is desirable, more often it leads to overdomination of the pattern by a large number of rare species, of no real significance. (Often the sampling strategy is incapable of adequately quantifying the rare species, so that they are distributed arbitrarily, to some degree, across the samples.)

Correlation coefficient

A common statistical means of assessing the relationship between two columns of data (samples j and k here) is the standard product moment, or Pearson, correlation coefficient:

\[ r_{jk} = \frac{\sum_{i}(y_{ij} - \bar{y}_{\cdot j})(y_{ik} - \bar{y}_{\cdot k})}{\sqrt{\sum_{i}(y_{ij} - \bar{y}_{\cdot j})^{2}\,\sum_{i}(y_{ik} - \bar{y}_{\cdot k})^{2}}} \qquad (2.3) \]

where \bar{y}_{\cdot j} is defined as the mean value over all species for the jth sample. In this form it is not a similarity coefficient, since it takes values in the range (-1, 1), not (0, 100), with positive correlation (r near +1) if high counts in one sample match high counts in the other, and negative correlation (r < 0) if high counts match absences. There are a number of ways of converting r to a similarity coefficient, the most obvious for community data being S = 50(1 + r).

Whilst correlation is sometimes used as a similarity coefficient explicitly in this form, and more often implicitly as the similarity measure underlying certain ordination techniques (e.g.
Principal Components Analysis, Chapter 4), it is not particularly suitable for much biological community data, with its plethora of zero values. For example, it violates the criterion that S should not depend on joint absences; here two columns are more highly positively correlated (and give S nearer 100) if species are added which have zero counts for both samples. If correlation is to be used as a measure of similarity, it makes good sense to transform the data initially, exactly as for the Bray-Curtis computation, so that large counts or biomass do not totally dominate the coefficient.

Why does the Bray-Curtis coefficient have such a dominant role in ecological studies? The answer is simple: it is one of the very few measures that satisfies all of the following practically desirable criteria:

a) it takes the value 100 when two samples are identical (as do most coefficients);
b) it takes the value 0 when two samples have no species in common (this is a much tougher condition and most coefficients fail it);
c) a change of measurement unit does not affect its value (most coefficients pass this one);
d) its value is unchanged by inclusion or exclusion of a species which is jointly absent from the two samples (another difficult condition to satisfy, and many coefficients fail it);
e) inclusion (or exclusion) of a third sample, C, in the data array makes no difference to the similarity between samples A and B (surprisingly, many coefficients fail this, because they depend on some form of standardisation carried out for each species, either by the species total or maximum value across all samples);
f) it has the flexibility to register differences in total abundance for two samples as a less than perfect similarity when the relative abundances for all species are identical (some coefficients standardise automatically by sample totals, so cannot reflect this component of similarity/difference).

In addition, Faith et al (1987) use a simulation study to look at the robustness of various similarity coefficients in reconstructing a (non-linear) ecological response gradient.
They find that Bray-Curtis and a very closely-related modification, the Kulczynski coefficient (Kulczynski 1928):

$$ S_{jk} = 100 \, \frac{\sum_i \min(y_{ij}, y_{ik})}{2 \Big/ \left[ \left(\sum_i y_{ij}\right)^{-1} + \left(\sum_i y_{ik}\right)^{-1} \right]} \tag{2.4} $$

perform most satisfactorily.

Coefficients other than Bray-Curtis, which satisfy all of the above conditions, either have counterbalancing drawbacks, or are so closely related to Bray-Curtis as to be virtually indistinguishable in most practical applications. An example of the former is the Canberra coefficient, with its forced equal weighting of rare and common species. The latter is exemplified by Kulczynski, which clearly reverts exactly to Bray-Curtis for standardised samples (when the column totals are all 100). Comparing equations (2.1) and (2.4), it is seen only to differ from Bray-Curtis, for non-standardised data, in respect of the form of average used in the denominator term, employing a harmonic rather than arithmetic mean of the column totals. This can only have a substantial influence on the outcome in cases where total abundance (biomass/cover) is very variable and close to zero for one or more samples, which will also usually be restricted to analyses on transformed data.

PRESENCE/ABSENCE DATA

As discussed at the beginning of this chapter, quantitative uncertainty may make it desirable to reduce the data simply to presence or absence of each species in each sample, or this may be the only feasible or cost-effective option for data collection in the first place. Alternatively, reduction to presence/absence may be thought of as the ultimate in severe transformation of counts; the data matrix (e.g. in Table 2.1) is replaced by 1 (presence) or 0 (absence) and Bray-Curtis similarity (say) computed.
This will have the effect of giving potentially equal weight to all species, whether rare or abundant (and will thus have a somewhat similar effect to the Canberra coefficient).

Many similarity coefficients have been proposed based on (0, 1) data arrays; see, for example, Sneath and Sokal (1973) or Legendre and Legendre (1998). When computing similarity between samples j and k, the two columns of data can be reduced to the following four summary statistics without any loss of relevant information:

a = the number of species which are present in both samples;
b = the number of species present in sample j but absent from sample k;
c = the number of species present in sample k but absent from sample j;
d = the number of species absent from both samples.

For example, when comparing samples 1 and 4 from Table 2.1a, these frequencies are a = 2, b = 1, c = 2 and d = 1.

In fact, because of the symmetry, coefficients must be a symmetric function of b and c, otherwise $S_{jk}$ will not be equal to $S_{kj}$. Similarly, similarity measures that are not affected by joint absences will not contain d. The following are some of the more commonly advocated coefficients.

The simple matching similarity between samples j and k is defined as:

$$ S_{jk} = 100 \left[ \frac{a + d}{a + b + c + d} \right] \tag{2.5} $$

so called because it represents the probability (×100) of a single species picked at random (from the full species list) being present in both samples or absent in both samples. Note that S is a function of d here, and thus depends on joint absences.

If the "simple matching" coefficient is adjusted, by first removing all species which are jointly absent from samples j and k, one obtains the Jaccard coefficient:

$$ S_{jk} = 100 \left[ \frac{a}{a + b + c} \right] \tag{2.6} $$

i.e. S is the probability (×100) that a single species picked at random (from the reduced species list) will be present in both samples.
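The reduction of two presence/absence columns to the four frequencies, and the two coefficients defined so far, can be sketched in a few lines of Python. This is a minimal illustration rather than the PRIMER implementation: the function names are hypothetical, and the two (0, 1) vectors are invented solely so that they yield a = 2, b = 1, c = 2 and d = 1.

```python
def presence_absence_counts(sample_j, sample_k):
    """Return (a, b, c, d) for two equal-length lists of 0/1 values."""
    a = b = c = d = 0
    for in_j, in_k in zip(sample_j, sample_k):
        if in_j and in_k:
            a += 1          # present in both samples
        elif in_j:
            b += 1          # present in j only
        elif in_k:
            c += 1          # present in k only
        else:
            d += 1          # jointly absent
    return a, b, c, d


def simple_matching(a, b, c, d):
    """Equation (2.5): depends on the joint absences d."""
    return 100.0 * (a + d) / (a + b + c + d)


def jaccard(a, b, c):
    """Equation (2.6): joint absences play no part."""
    return 100.0 * a / (a + b + c)


# Hypothetical presence/absence data for six species in two samples.
sample_j = [1, 1, 1, 0, 0, 0]
sample_k = [0, 1, 1, 1, 1, 0]

a, b, c, d = presence_absence_counts(sample_j, sample_k)
s_match = simple_matching(a, b, c, d)
s_jaccard = jaccard(a, b, c)

# Appending three jointly absent species raises simple matching
# but leaves Jaccard unchanged.
a2, b2, c2, d2 = presence_absence_counts(sample_j + [0, 0, 0],
                                         sample_k + [0, 0, 0])
s_match_padded = simple_matching(a2, b2, c2, d2)
s_jaccard_padded = jaccard(a2, b2, c2)
```

The padded comparison shows the practical consequence of a coefficient containing d: adding species that occur in neither sample changes the simple matching value, whereas the Jaccard value stays the same. Neither function guards against an empty species list, which a real routine would need to handle.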
A popular coefficient found under several names, commonly Sorensen or Dice, is:

$$ S_{jk} = 100 \left[ \frac{2a}{2a + b + c} \right] \tag{2.7} $$

Note that this is identical to the Bray-Curtis coefficient when the latter is calculated on (0, 1) presence/absence data, as can be seen most clearly from the second form of equation (2.1). For example, reducing Table 2.1a to (0, 1) data, and comparing samples 1 and 4 as previously, equation (2.1) gives:

$$ S_{14} = 100 \left[ \frac{2(0+1+1+0+0+0)}{1+2+2+1+1+0} \right] = 57.1 $$

This is clearly the same construction as substituting a = 2, b = 1, c = 2 into equation (2.7).

Several other coefficients have been proposed; Legendre and Legendre (1998) list at least 15, but only one other is given here. In the light of the earlier discussion on coefficients satisfying desirable, biologically-motivated criteria, note that there is a presence/absence form of the Kulczynski coefficient (2.4), a close relative of Bray-Curtis, namely:

$$ S_{jk} = 50 \left[ \frac{a}{a + b} + \frac{a}{a + c} \right] \tag{2.8} $$

RECOMMENDATIONS

1) In most ecological studies, some intuitive axioms for desirable practical behaviour of a similarity coefficient lead inexorably to the use of the Bray-Curtis measure (or a closely-related coefficient such as that of Kulczynski).

2) Similarities calculated on original abundance (or biomass) values can often be over-dominated by a small number of highly abundant (or large-bodied) species, so that they fail to reflect similarity of overall community composition.

[Footnote: the Sorensen coefficient can be obtained in the PRIMER similarity routine by transforming to presence/absence and then using Bray-Curtis.]

3) Some coefficients (such as the Canberra), which separately scale the contribution of each species to adjust for this, have a tendency to over-compensate, i.e. rare species, which may be arbitrarily distributed across the samples, are then given equal weight to common ones.
The same criticism applies to reduction of the original matrix to simple presence/absence of each species. In addition, the latter loses potentially valuable information about the abundance level of a species (absent, rare, present in modest numbers, common, very abundant).

4) A balanced compromise is often to apply the Bray-Curtis similarity to counts (biomass/cover values) which have been moderately, or fairly severely, transformed, e.g. log(1 + y) or √√y. All species then contribute something to the definition of similarity, whilst the retention of some information on the prevalence of a species ensures that the commoner species are generally given greater weight than the rare ones.

5) Initial standardisation is occasionally desirable, dividing each count by the total abundance of all species in that sample; this is essential when non-comparable, unknown sample volumes have been taken. Without this column standardisation, the Bray-Curtis coefficient will reflect differences between two samples due both to differing community composition and differing total abundance. The standardisation removes any effect of the latter; whether this is desirable is a biological rather than statistical question. (Experience with benthic communities suggests that standardisation should usually be avoided, valuable biological information being contained in the abundance, biomass or cover totals.) Note, however, that column standardisation does not remove the need subsequently to transform the data matrix, if the similarities are to take account of more than just the few commonest species.

SPECIES SIMILARITIES

Starting with the original data matrix of abundances (or biomass, % cover etc), the similarity between any pair of species can be defined in an analogous way to that for samples, but this time involving comparison of the ith and lth rows (species) across all j = 1, ..., n columns (samples).
[Footnote on the PRIMER similarity routine: text not legible in the source.]

Bray-Curtis coefficient

The Bray-Curtis similarity between species i and l is:

$$ S'_{il} = 100 \left[ 1 - \frac{\sum_{j=1}^{n} |y_{ij} - y_{lj}|}{\sum_{j=1}^{n} (y_{ij} + y_{lj})} \right] \tag{2.9} $$

The extreme values are (0, 100) as previously: S' = 0 if two species have no samples in common (i.e. are never found at the same sites); S' = 100 if the y values for two species are the same at all sites.

However, different initial treatment of the data is required, in two respects.

1) Similarities between rare species have little meaning; very often such species have single occurrences, distributed more or less arbitrarily across the sites, so that S' is usually zero (or occasionally 100). If these values are left in the similarity matrix they will tend to confuse and disrupt the patterns in any subsequent clustering or ordination analysis; the rarer species should thus be omitted from the data matrix before computing species similarities.

2) A different form of standardisation of the data matrix is appropriate and (in contrast to the samples analysis) it usually makes sense to carry this out routinely in place of a transformation. Two species could have quite different mean levels of abundance yet be "perfectly similar" in the sense that their counts are in strict ratio to each other across the samples. One species might be of much larger body size, and thus tend to have smaller counts, for example; or there might be a direct host-parasite relationship between the two species. It is therefore appropriate to standardise the original data by dividing each entry by its row (species) total, and multiplying by 100:

$$ y'_{ij} = 100 \, y_{ij} \Big/ \sum_{k=1}^{n} y_{ik} \tag{2.10} $$

before computing the similarities (S'). The effect of this can be seen from the artificial example in the following table, for three species and five samples. For the original matrix, the Bray-Curtis similarity between species 1 and 2, for example, is only S' = 33%, but the two species are found in strict proportion to each other across the samples so that, after row standardisation, they have a more realistic similarity of 100%. Note that it is not clear that a transformation now serves any useful purpose. Its role
in the samples analysis was to reduce (though not totally remove) the large disparities in counts between species; the standardisation by row total has here removed such differences.

[Table: artificial example of counts for three species in five samples, with the resulting between-species Bray-Curtis similarities, before and after row standardisation.]

Correlation coefficient

The standard product moment correlation coefficient defined in equation (2.3), and subsequently modified to a similarity, is perhaps more appropriate for defining species similarities than it was for samples, in that it automatically incorporates a type of row standardisation. In fact, this is a full normalisation (subtracting the row mean from each count and dividing by the row standard deviation) and is less appropriate than the simple row standardisation above. In addition, the previous argument about the effect of joint absences is equally appropriate to species similarities: an inter-tidal species is no more similar to a deep-sea species because neither is found in shelf samples. A correlation will again be a function of joint absences; the Bray-Curtis coefficient will not.

RECOMMENDATION

For species similarities, a coefficient such as Bray-Curtis calculated on row-standardised and untransformed data seems most appropriate. The rarer species (usually at least half of the species set) should first be removed from the matrix, to have any chance of an interpretable clustering or ordination analysis. There are several ways of doing this, all of them arbitrary to some degree. Field et al (1982) suggest removal of all species that never constitute more than p% of the total abundance (biomass/cover) of any sample, where p is chosen to retain around 50 or 60 species (typically p = 3%, or so, for soft-sediment benthic data).
This is preferable to simply retaining the 50 or 60 species with the highest total abundance across all samples, since the latter strategy may result in omitting several species which are key constituents of a site which is characterised by a low total number of individuals.

It is important to note, however, that this inevitably arbitrary process of omitting species is not necessary for the more usual between-sample similarity calculations. There the computation of the Bray-Curtis coefficient downweights the contributions of the less common species in an entirely natural and continuous fashion (the rarer the species the less it contributes, on average), and all species should be retained in these calculations.

DISSIMILARITY COEFFICIENTS

The converse concept to similarity is that of dissimilarity, the degree to which two samples are unlike each other. Though similarity and dissimilarity are just opposite sides of the same coin, the latter is a more natural starting point in constructing ordinations, in which dissimilarities (δ) between pairs of samples are turned into distances (d) between sample locations on a "map". Thus large dissimilarity implies that samples should be located a large distance from each other, and dissimilarities near 0 imply nearby location; δ must therefore always be non-negative, of course. Similarities can easily be turned into dissimilarities, by:

$$ \delta = 100 - S \tag{2.11} $$

For example, for the Bray-Curtis coefficient this gives:

$$ \delta_{jk} = 100 \, \frac{\sum_{i=1}^{p} |y_{ij} - y_{ik}|}{\sum_{i=1}^{p} (y_{ij} + y_{ik})} \tag{2.12} $$

which has limits δ = 0 (no dissimilarity) and δ = 100 (total dissimilarity).

However, rather than conversion from similarities, other important dissimilarity measures arise in the first place as distances. Their role as implicit dissimilarity matrices underlying particular ordination techniques will be seen more clearly later (e.g. in Principal Components Analysis, Chapter 4).
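The conversion between similarity and dissimilarity can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the PRIMER routine: the function names and the small count vectors are hypothetical, and the Bray-Curtis dissimilarity is written directly as 100 times the summed absolute differences divided by the summed totals.

```python
def bray_curtis_dissimilarity(y_j, y_k):
    """Bray-Curtis dissimilarity (0 to 100) between two samples.

    y_j and y_k are equal-length lists of non-negative counts
    (or biomass/cover values), one entry per species.
    """
    numerator = sum(abs(u - v) for u, v in zip(y_j, y_k))
    denominator = sum(u + v for u, v in zip(y_j, y_k))
    if denominator == 0:
        # Two empty samples: the coefficient is undefined (0/0).
        raise ValueError("both samples have zero total abundance")
    return 100.0 * numerator / denominator


def similarity_from_dissimilarity(delta):
    """Similarity and dissimilarity are complements on a 0-100 scale."""
    return 100.0 - delta


# Hypothetical counts for three species in two samples.
delta_mixed = bray_curtis_dissimilarity([4, 0, 6], [2, 2, 6])
s_mixed = similarity_from_dissimilarity(delta_mixed)

# The two limits: identical samples and samples sharing no species.
delta_identical = bray_curtis_dissimilarity([4, 0, 6], [4, 0, 6])
delta_disjoint = bray_curtis_dissimilarity([4, 0, 0], [0, 3, 7])
```

Identical samples give a dissimilarity of 0 and samples with no species in common give 100, matching the stated limits. How a real routine should treat two empty samples is a design choice; raising an error is simply the most conservative option for a sketch.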
[Footnote on the PRIMER similarity routine: text not legible in the source.]

Euclidean distance

The natural distance between any two points in space is referred to as Euclidean distance (from classical or Euclidean geometry). In the context of a species abundance matrix, the Euclidean distance between samples j and k is defined algebraically as:

$$ d_{jk} = \sqrt{\sum_{i=1}^{p} (y_{ij} - y_{ik})^2} \tag{2.13} $$

This can best be understood, geometrically, by taking the special case where there are only two species so that samples can be represented by points in 2-dimensional space, namely their position on the two axes of Species 1 and Species 2 counts. This is illustrated below for a simple two samples by two species abundance matrix. The co-ordinate points (2, 3) and (5, 1) on the (Sp. 1, Sp. 2) axes are the two samples j and k. The direct distance $d_{jk}$ between them of $\sqrt{(2-5)^2 + (3-1)^2}$ (from Pythagoras) clearly corresponds to equation (2.13).

[Figure: samples j and k plotted as the points (2, 3) and (5, 1) on the Sp. 1 and Sp. 2 axes, with the distances between them marked.]

It is easy to envisage the extension of this to a matrix with three species; the two points are now simply located on 3-dimensional species axes and their straight line distance apart is a natural geometric concept. Algebraically, it is the root of the sums of squared distances apart along the three axes, equation (2.13). Extension to four and higher numbers of species (dimensions) is harder to envisage geometrically (in our 3-dimensional world) but the concept remains unchanged and the algebra is no more difficult to understand in higher dimensions than three: additional squared distances apart on each new species axis are added to the summation under the square root in (2.13). In fact, this concept of representing a species-by-samples matrix as points in high-dimensional species space is a very fundamental and important one and will be met again in Chapter 4, where it is crucial to an understanding of Principal Components Analysis.
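The way each new species axis simply adds one more squared term under the square root can be seen in a short Python sketch. The function name is hypothetical; the two-species points (2, 3) and (5, 1) are those of the worked example, while the third-species counts are invented for illustration.

```python
import math


def euclidean_distance(y_j, y_k):
    """Euclidean distance between two samples in species space.

    y_j and y_k are equal-length lists of abundances, one per species;
    the number of species (dimensions) can be anything from 1 upwards.
    """
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(y_j, y_k)))


# Two species: the points (2, 3) and (5, 1), giving sqrt(9 + 4).
d_two_species = euclidean_distance([2, 3], [5, 1])

# A hypothetical third species with equal counts in both samples
# adds a zero term, so the distance is unchanged.
d_third_equal = euclidean_distance([2, 3, 4], [5, 1, 4])

# A hypothetical third species differing by 6 adds 36 under the root:
# sqrt(9 + 4 + 36) = 7.
d_third_differs = euclidean_distance([2, 3, 0], [5, 1, 6])
```

Because the function iterates over however many species are supplied, the same code covers the 2-, 3- and higher-dimensional cases without change, which mirrors the algebraic point made above.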
Manhattan distance

Euclidean distance is not the only way of defining distance apart of two samples in species space; an alternative is to sum the distances along each species axis:

$$ d_{jk} = \sum_{i=1}^{p} |y_{ij} - y_{ik}| \tag{2.14} $$

This is often referred to as Manhattan (or city-block) distance because in two dimensions it corresponds to the distance one would have to travel to get between any two locations in a city whose streets are laid out in a rectangular grid. It is illustrated in the simple example above by the dashed lines. Manhattan distance is of interest here because of its obvious close affinity to Bray-Curtis dissimilarity, equation (2.12). In fact, when a data matrix has initially been sample standardised (but not transformed), Bray-Curtis dissimilarity is just (half) the Manhattan distance, since the summation in the bottom line of (2.12) then always takes the same value (200).

It is worth noting a point of terminology in passing, though not one of any great practical consequence. Euclidean and Manhattan measures, (2.13) and (2.14), are called distances or metrics because they obey the triangle inequality, i.e. for any three samples j, k, l:

$$ d_{jk} + d_{kl} \ge d_{jl} \tag{2.15} $$

Bray-Curtis dissimilarity does not, in general, satisfy the triangle inequality, so should not be called a metric. However, many other useful dissimilarity coefficients are also not metrics. For example, the square of Euclidean distance (i.e. equation (2.13) without the √ sign) is another natural definition of "distance" which is not a metric, yet dissimilarities from this would have the same rank order as those from Euclidean distance and therefore give rise, for example, to identical MDS ordinations (see Chapter 5). It follows that whether a dissimilarity coefficient is or is not a metric is likely to be of limited practical significance for the strategy this manual advocates.

RECOMMENDATION

There are thus a variety of means of generating a similarity or dissimilarity (distance) matrix to input to the next stage of a multivariate analysis, which might be either a clustering or ordination of samples, Fig. 2.1.
For comparative purposes it may sometimes be of interest to use Euclidean distance in the species space as input to a cluster analysis (an example is given later in Fig. 5.5) but, in general, the recommendation remains unchanged: Bray-Curtis similarity/dissimilarity, computed after suitable transformation, will often be a satisfactory coefficient for biological data on community structure. Background physical or chemical data is a different matter since it is usually of a rather different type, and Chapter 11 shows the usefulness of the concept of Euclidean distance in the (normalised) environmental variable space. Initially though, concentration is on analysing the biological data in isolation, and the next stage will often be to perform a cluster analysis (Fig. 2.1).

[Fig. 2.1: schematic of the stages in a multivariate analysis, from the data matrix through the similarity matrix to clustering or ordination of samples.]

[Footnote on the PRIMER similarity routine: text not legible in the source.]

CHAPTER 3: HIERARCHICAL CLUSTERING

CLUSTER ANALYSIS

The previous chapter has shown how to replace the original data matrix with pairwise similarities, chosen to reflect the particular aspect of similarity in community structure (similarity in counts of abundant species, similarity in general disposition of rare species etc) which the biologist requires to emphasise for the study in question. Typically, the number of pairwise similarities is large, n(n-1)/2 for n samples, and it can often be no easier to detect a pattern in the resulting lower triangular similarity matrix than it is in the original data. Table 3.1 illustrates this for just a portion (roughly a quarter) of the similarity matrix for the Frierfjord macrofauna data {F}. Close examination shows that the four replicates within site A generally have higher within-site similarities than do pairs of replicates within sites B and C, or replicates between sites, but the pattern is far from clear. What is needed is a graphical display linking samples that have mutually high levels of similarity.

Table 3.1. Frierfjord macrofauna counts {F}.
Bray-Curtis similarities for every pair of replicate samples from sites A, B and C only. [Table body not recoverable from the source.]

Cluster analysis (or classification) aims to find "natural groupings" of samples such that samples within a group are more similar to each other, generally, than samples in different groups. Cluster analysis is used in the present context in the following ways.

a) Different sites (or different times at the same site) can be seen to have differing community compositions by noting that replicate samples within a site form a cluster that is distinct from replicates within other sites. This can be an important hurdle to overcome in any analysis; if replicates for a site are clustered more or less randomly with replicates from every other site then further interpretation is likely to be dangerous. (A more formal statistical test for distinguishing sites is the subject of Chapter 6.)

b) When it is established that sites can be distinguished from one another (or, when replicates are not taken, it is assumed that a single sample is representative of that site or time), sites or times can be partitioned into groups with similar community structure.

c) Cluster analysis of the species similarity matrix can be used to define species assemblages, i.e. groups of species that tend to co-occur in a parallel manner.

Range of methods

Literally hundreds of clustering methods exist, some of them operating on similarity/dissimilarity matrices whilst others are based on the original data. Everitt (1980) and Cormack (1971) give excellent and readable reviews. Clifford and Stephenson (1975) is another well-established text on classification methods, from an ecological viewpoint.

Five classes of clustering methods can be distinguished, following the categories of Cormack (1971).

1) Hierarchical methods. Samples are grouped and the groups themselves form clusters at lower levels of similarity.

2) Optimising techniques. A single set of mutually exclusive groups (usually a pre-specified number) is formed by optimising some clustering criterion, for example minimising a within-cluster distance measure in the species space.
3) Mode-seeking methods. These are based on considerations of density of samples in the neighbourhood of other samples, again in the species space.

4) Clumping techniques. The term "clumping" is reserved for methods in which samples can be placed in more than one cluster.

5) Miscellaneous techniques.

Cormack (1971) also warned against the indiscriminate use of cluster analysis: "availability of ... classification techniques has led to the waste of more valuable scientific time than any other 'statistical' innovation". The ever larger number of techniques and their increasing accessibility on modern computer systems makes this warning no less pertinent today. The policy adopted here is to concentrate on a single technique that has been found to be of widespread utility in ecological studies, whilst emphasising the potential arbitrariness in all classification methods and stressing the need to perform a cluster analysis in conjunction with a range of other techniques (e.g. ordination, statistical testing) to obtain balanced and reliable conclusions.

HIERARCHICAL AGGLOMERATIVE CLUSTERING

The most commonly used clustering techniques are the hierarchical agglomerative methods. These usually take a similarity matrix as their starting point and successively fuse the samples into groups and the groups into larger clusters, starting with the highest mutual similarities then gradually lowering the similarity level at which groups are formed. The process ends with a single cluster containing all samples. Hierarchical divisive methods perform the opposite sequence, starting with a single cluster and splitting it to form successively smaller groups.

The result of a hierarchical clustering is represented by a tree diagram or dendrogram, with the x axis representing the full set of samples and the y axis defining a similarity level at which two samples or groups are considered to have fused. Note that there is no firm convention for which way up the dendrogram should be portrayed (increasing or decreasing y axis values) or even whether the tree can be placed on its side; all three possibilities can be found in this manual. Fig. 3.1
shows a dendrogram for the similarity matrix from the Frierfjord macrofaunal abundances, a subset of which is shown in Table 3.1. It can be seen that all four replicates from sites A, D, E and G fuse with each other to form distinct site groups before they amalgamate with samples from any other site; that, conversely, site B and C replicates are not distinguished, and that A, E and G do not link to B, C and D until quite low levels of between-group similarities are reached.

[Footnote: the PRIMER CLUSTER routine displays the dendrogram for a hierarchical agglomerative clustering, using one of the three linkage options described below.]

[Fig. 3.1. Frierfjord macrofauna counts {F}. Dendrogram for hierarchical clustering, using group-average linking, of four replicate samples from each site.]

The mechanism by which Fig. 3.1 is extracted from the similarity matrix, including the various options for defining what is meant by the similarity of two groups of samples, is best described for a simpler example.

Construction of dendrogram

Table 3.2 shows the steps in the successive fusing of samples, for the subset of Loch Linnhe macrofaunal abundances used as an example in the previous chapter. The data matrix has been √√-transformed, and the first triangular array is the Bray-Curtis similarity of Table 2.2. Samples 2 and 4 are seen to have the highest similarity (underlined) so they are combined, at similarity level 68.1%. (Above this level there are considered to be four clusters, simply the four separate samples.) A new similarity matrix is then computed, now containing three clusters: "1", "2&4" and "3". The similarity between cluster "1" and cluster "3" is unchanged at 0.0 of course, but what is an appropriate definition of similarity S(1, 2&4) between clusters "1" and "2&4", for example? This will be some function of the similarities S(1, 2), between samples 1 and 2, and S(1, 4), between 1 and 4; there are three main possibilities here.

Table 3.2. Loch Linnhe macrofauna subset. Abundance array after √√-transformation, the resulting Bray-Curtis similarity matrix, and the successively fused similarity matrices from a hierarchical clustering using group-average linking. [Table body not recoverable from the source.]
a) Single linkage. S(1, 2&4) is the maximum of S(1, 2) and S(1, 4), i.e. 52.2%.

b) Complete linkage. S(1, 2&4) is the minimum of S(1, 2) and S(1, 4), i.e. 25.6%.

c) Group-average link. S(1, 2&4) is the average of S(1, 2) and S(1, 4), i.e. 38.9%.

Table 3.2 adopts group-average linking, hence

S(2&4, 3) = [S(2, 3) + S(4, 3)]/2 = 55.0

The new matrix is again examined for the highest similarity, defining the next fusing; here this is between "2&4" and "3", at similarity level 55.0%. The matrix is again reformed for the two new clusters "1" and "2&3&4" and there is only a single similarity, S(1, 2&3&4), to define. For group-average linking, this is the mean of S(1, 2&4) and S(1, 3) but it must be a weighted mean, allowing for the fact that there are twice as many samples in cluster "2&4" as in cluster "3". Here:

S(1, 2&3&4) = [2 × S(1, 2&4) + 1 × S(1, 3)]/3 = (2 × 38.9 + 1 × 0.0)/3 = 25.9

Though it is computationally efficient to form each successive similarity matrix by taking weighted averages of the similarities in the previous matrix, an alternative which is entirely equivalent (and perhaps conceptually simpler) is to define the similarity between two groups as the simple (unweighted) average of all between-group similarities in the initial triangular matrix. Thus:

S(1, 2&3&4) = [S(1, 2) + S(1, 3) + S(1, 4)]/3 = (25.6 + 0.0 + 52.2)/3 = 25.9,

the same answer as above.

The final merge of all samples into a single group therefore takes place at similarity level 25.9%, and the clustering process for the group-average linking shown in Table 3.2 can be displayed in the following dendrogram.

[Figure: dendrogram for the four Loch Linnhe samples, group-average linking.]

This example raises a number of more general points about the use and appearance of dendrograms.

1) Samples need to be re-ordered along the x axis, for clear presentation of the dendrogram; it is always possible to arrange samples in such an order that none of the dendrogram branches cross each other.

2) The resulting order of samples on the x axis is not unique. A simple analogy would be with a child's "mobile"; the vertical lines are strings and the horizontal lines rigid bars. When the structure is suspended by the top string, the bars can rotate freely,
generating many possible re-arrangements of samples on the x axis. For example, in the above figure, samples 2 and 4 could switch places (new sequence 4, 2, 3, 1) or sample 1 move to the opposite side of the diagram (new sequence 1, 2, 4, 3), but a sequence such as 1, 2, 3, 4 is not possible. It follows that to use the x axis sequence as an ordering of samples is misleading.

3) Cluster analysis attempts to group samples into discrete clusters, not display their inter-relations on a continuous scale; the latter is the province of ordination and this would be preferable for the simple example above. Clustering imposes a rather arbitrary grouping on what appears to be a continuum of change from an unpolluted year (1968), through steadily increasing impact (loss of some species, increase in abundance of "opportunists" like Capitella), to the start of a reversion to an improved condition in 1973. Of course it is unwise and unnecessary to attempt serious interpretation of such a small subset of data but, even so, the equivalent MDS ordination for this subset (met in Chapter 5) contrasts well with the relatively unhelpful information in the above dendrogram. (A PCA ordination of the full data set can be seen in Fig. 4.1.)

4) The hierarchical nature of this clustering procedure dictates that, once a sample is grouped with others, it will never be separated from them in a later stage of the process. Thus, early borderline decisions which may be somewhat arbitrary are perpetuated through the analysis and may sometimes have a significant effect on the shape of the final dendrogram. For example, similarities S(2, 3) and S(2, 4) above are very nearly equal. Had S(2, 3) been just greater than S(2, 4), rather than the other way round, the final picture would have been a little different. In fact, the reader can verify that had S(1, 4) been around 56% (say), the same marginal shift in the values of S(2, 4) and S(2, 3) would have had radical consequences, the final dendrogram now grouping 2 with 3 and 1 with 4 before these two groups come together in a single cluster.
From being the first to be joined, samples 2 and 4 now only link up at the final step. Such situations are certain to arise if, as here, one is trying to force what is essentially a steadily changing pattern into discrete clusters.

Dissimilarities

Exactly the converse operations are needed when clustering from a dissimilarity rather than a similarity matrix. The two samples or groups with the lowest dissimilarity at each stage are fused. The single linkage definition of dissimilarity of two groups is the minimum dissimilarity over all pairs of samples between groups; complete linkage selects the maximum dissimilarity and group-average linking involves just an unweighted mean dissimilarity.

Linkage options

The differing consequences of the three linkage options are most easily seen for the special case used in Chapter 2, where there are only two species (rows) in the original data matrix. Samples are then points in the species space, with the (x, y) axes representing abundances of (Sp. 1, Sp. 2) respectively. Consider also the case where dissimilarity between two samples is defined simply as their (Euclidean) distance apart in this plot.

[Figure: samples of Group 1 and Group 2 plotted in the two-species space (Sp. 1, Sp. 2), showing the single link, complete link and group-average distances between the groups.]

In the above diagram, the single link dissimilarity between Groups 1 and 2 is then simply the minimum distance apart of the two groups, giving rise to an alternative name for the single linkage, namely nearest neighbour clustering. Complete linkage dissimilarity is clearly the maximum distance apart of any two samples in the different groups, namely furthest neighbour clustering. Group-average dissimilarity is the mean distance apart of the two groups, averaging over all between-group pairs.

Single and complete linkage have some attractive theoretical properties. For example, they are effectively non-metric. Suppose that the Bray-Curtis (say) similarities in the original triangular matrix are replaced by their ranks, i.e. the highest similarity is given the value 1, the next highest 2, and so on down to the lowest similarity with rank n(n-1)/2 for n samples.
Then a single (or complete) link clustering of the ranked matrix will have exactly the same structure as that based on the original similarities (though the y axis similarity scale in the dendrogram will be transformed in some non-linear way). This is a desirable feature since the precise similarity values will not often have any direct significance; what matters is their relationship to each other, and any non-linear (monotonic) rescaling of the similarities would ideally not affect the analysis. This is also the stance taken for the preferred ordination technique in this manual's strategy, the method of non-metric multi-dimensional scaling (MDS, see Chapter 5).

However, in practice, single link clustering has a tendency to produce chains of linked samples, with each successive stage just adding another single sample onto a large group. Complete linkage will tend to have the opposite effect, with an emphasis on small clusters at the early stages. (These characteristics can be reproduced by experimenting with the special case above, generating nearest and furthest neighbours in a 2-dimensional species space.) Group-averaging, on the other hand, is often found empirically to strike a balance in which a moderate number of medium-sized clusters are produced, and only grouped together at a later stage.

[Fig. 3.2. Bristol Channel zooplankton {B}. Sampling sites.]

EXAMPLE: Bristol Channel zooplankton

Collins and Williams (1982) perform hierarchical cluster analyses of zooplankton samples, collected by double oblique net hauls at 57 sites in the Bristol Channel UK, for three different seasons in 1974. This was not a pollution study but a baseline survey carried out by the Plymouth laboratory, as part of a major programme to understand and model the ecosystem of the estuary. Fig. 3.2 is a map of the sample locations, sites 1-58 (site 30 not sampled). Fig. 3.3 shows the results of a hierarchical clustering using group-average linking on data sampled during April 1974.
The raw data were expressed as numbers per cubic metre for each of 24 holozooplankton species, and Bray-Curtis similarities calculated on transformed abundances. From the resulting dendrogram, Collins and Williams select the four groups determined at a 55% similarity level and characterise these as true estuarine (sites 1-8, 10, 12), estuarine and marine (9, 11, 13-27, 29), euryhaline marine (28, 31, 33-35, 42-44, 47-50, 53-55) and stenohaline marine (32, 36-41, 45, 46, 51, 52, 56-58). A corresponding clustering of species and a re-ordering of the rows and columns of the original data matrix allows the identification of a number of species groups characterising these main site clusters, as is seen later (Chapter 7).

The dendrogram provides a sequence of fairly convincing groups; once each of the four main groups has formed it remains separate from other groups over a relatively large drop in similarity. Even so, a cluster analysis gives an incomplete and disjointed picture of the sample pattern. Remembering the analogy of the "mobile", it is not clear from the dendrogram alone whether there is any natural sequence of community change across the four main clusters (implicit in the designations true estuarine, estuarine and marine, euryhaline marine, stenohaline marine). For example, the stenohaline marine group could just as correctly have been rotated to lie between the estuarine and marine and euryhaline marine groups. In fact, there is a strong (and more-or-less continuous) gradient of community change across the region, associated with the changing salinity levels. This is best seen in an ordination of the 57 samples on which are superimposed the salinity levels at each site; this example is therefore returned to in Chapter 11.

RECOMMENDATIONS

1) Hierarchical clustering with group-average linking, based on sample similarities or dissimilarities such as Bray-Curtis, has proved a useful technique in a number of ecological studies of the last three decades.
It is appropriate for delineating groups of sites with distinct community structure (this is not to imply that groups have no species in common, of course, but that different characteristic patterns of abundance are found consistently in different groups).

2) Clustering is less useful (and could sometimes be misleading) where there is a steady gradation in community structure across sites, perhaps in response to strong environmental forcing (e.g. large range of salinity, sediment grain size, depth of water column, etc). Ordination is preferable in these situations.

3) Even for samples which are strongly grouped, cluster analysis is often best used in conjunction with ordination. Superimposition of the clusters (at various levels of similarity) on an ordination plot will allow any relationship between the groups to be more informatively displayed, and it will be seen later (Chapter 5) that agreement between the two representations strengthens belief in the adequacy of both.

Fig. 3.3. Bristol Channel zooplankton {B}. Dendrogram for hierarchical clustering of the 57 sites, using group-average linking of Bray-Curtis similarities calculated on transformed abundance data.

CHAPTER 4: ORDINATION OF SAMPLES BY PRINCIPAL COMPONENTS ANALYSIS (PCA)

ORDINATIONS

An ordination is a map of the samples, usually in two or three dimensions, in which the placement of samples, rather than representing their simple geographical location, reflects the similarity of their biological communities. To be more precise, distances between samples on the ordination attempt to match the corresponding dissimilarities in community structure: nearby points have very similar communities, samples which are far apart have few species in common or the same species at very different levels of abundance (or biomass). The word "attempt" is important here since there is no uniquely defined way in which this can be achieved.
(Indeed, when a large number of species fluctuate in abundance in response to a wide variety of environmental variables, each species being affected in a different way, the community structure is essentially high-dimensional and it may be impossible to obtain a useful two or three-dimensional representation.)

So, as with cluster analysis, several methods have been proposed, each using different forms of the original data and varying in their technique for approximating high-dimensional information in low-dimensional plots. They include:

a) Principal Components Analysis, PCA (see, for example, Chatfield and Collins, 1980);

b) Principal Co-ordinates Analysis, PCoA (Gower, 1966);

c) Correspondence Analysis and Detrended Correspondence Analysis, DECORANA (Hill and Gauch, 1980);

d) Multi-Dimensional Scaling, MDS; in particular non-metric MDS (see, for example, Kruskal and Wish, 1978).

A comprehensive survey of ordination methods is outside the scope of this volume. As with clustering methods, detailed explanation is given only of the techniques required for the analysis strategy adopted throughout the manual. This is not to deny the validity of other methods but simply to affirm the importance of applying, with understanding, one or two techniques of proven utility. The two ordination methods selected are therefore (arguably) the simplest of the various options, at least in concept.

a) PCA is the longest-established method, though the relative inflexibility of its definition limits its practical usefulness more to multivariate analysis of environmental data rather than species abundances or biomass; nonetheless it is still widely encountered and is of fundamental importance.

b) Non-metric MDS is a more recent development, whose complex algorithm could only have been contemplated in an era of advanced computational power; however, its rationale can be very simply described and understood, and many people would argue that the need to make few (if any) assumptions about the data make it the most widely applicable and effective method available.

PRINCIPAL COMPONENTS ANALYSIS
The starting point for PCA is the original data matrix rather than a derived similarity matrix (though there is an implicit dissimilarity matrix underlying PCA, that of Euclidean distance). The data array is thought of as defining the positions of samples in relation to axes representing the full set of species, one axis for each species. This is the very important concept introduced in Chapter 2, following equation (2.13). Typically, there are many species so the samples are points in a very high-dimensional space.

A simple 2-dimensional example

It helps to visualise the process by again considering an (artificial) example in which there are only two species (and nine samples).

[Table: counts of Sp.1 and Sp.2 for samples 1 to 9; values not legible in the source.]

The nine samples are therefore points in two dimensions, and labelling these points with the sample number gives the following plot.

[Plot: the nine samples, labelled by sample number, against Sp.1 and Sp.2 axes.]

This is an ordination already, of 2-dimensional data on a 2-dimensional map, and it summarises pictorially all the relationships between the samples, without needing to discard any information at all. However, suppose for the sake of example that a 1-dimensional ordination is required, in which the original data is reduced to a genuine ordering of samples along a line. How do we best place the samples in order? One possibility (though a rather poor one!) is simply to ignore altogether the counts for one of the species, say Species 2. The Species 1 axis then automatically gives the 1-dimensional ordination (Sp.1 counts are again labelled by sample number).

[Plot: sample numbers placed along the Sp.1 axis.]

(Think of this as projecting the points in the 2-dimensional space down onto the Sp.1 axis.) Not surprisingly, this is a rather inaccurate 1-dimensional summary of the sample relationships in the full 2-dimensional data, e.g. samples 7 and 9 are rather too close together, certain samples seem to be in the "wrong order" (9 should be closer to 8 than 7 is, 1 should be closer to 2 than 3 is), etc.
More intuitively obvious would be to choose the 1-dimensional picture as the (perpendicular) projection of points onto the line of "best fit" in the 2-dimensional plot.

[Plot: the nine samples against Sp.1 and Sp.2 axes, with perpendicular projections onto the line of best fit.]

The 1-dimensional ordination, called the first principal component axis (PC1), is then:

[Plot: sample numbers placed along the PC1 axis.]

and this picture is a much more realistic approximation to the 2-dimensional sample relationships (e.g. 1 is now closer to 2 than 3 is, 7, 9 and 8 are more equally spaced and in the "right" sequence, etc).

The second principal component axis (PC2) is defined as the axis perpendicular to PC1, and a full principal component analysis then consists simply of a rotation of the original 2-dimensional plot.

[Plot: the original Sp.1, Sp.2 plot rotated to PC1, PC2 axes.]

Obviously the (PC1, PC2) plot contains exactly the same information as the original (Sp.1, Sp.2) graph. The whole point of the procedure though is that, as in the current example, we may be able to dispense with the second principal component (PC2): the points in the (PC1, PC2) space are projected onto the PC1 axis and relatively little information about the sample relationships is lost in this reduction of dimensionality.

Definition of PC1 axis

Up to now we have been rather vague about what is meant by the "best fitting" line through the sample points in 2-dimensional species space. There are two natural definitions. The first chooses the PC1 axis as the line which minimises the sum of squared perpendicular distances of the points from the line. The second approach comes from noting in the above example that the biggest differences between samples take place along the PC1 axis, with relatively small changes in the PC2 direction. The PC1 axis is therefore defined as that direction in which the variance of sample points projected perpendicularly onto the axis is maximised. In fact, these two separate definitions of the PC1 axis turn out to be totally equivalent and one can use whichever concept is easier to visualise.

Extensions to 3-dimensional data

Suppose that the simple example above is extended to the following matrix of counts for three species.
[Table: counts of Sp.1, Sp.2 and Sp.3 for samples 1 to 9; values not legible in the source.]

Samples are now points in three dimensions (Sp.1, Sp.2 and Sp.3 axes) and there are therefore three principal component axes, again simply a rotation of the three species axes. The definition of the (PC1, PC2, PC3) axes generalises the 2-dimensional case in a natural way:

PC1 is the axis which maximises the variance of points projected perpendicularly onto it;

PC2 is constrained to be perpendicular to PC1, but is then again chosen as the direction in which the variance of points projected perpendicularly onto it is maximised;

PC3 is the axis perpendicular to both PC1 and PC2 (there is no choice remaining here).

[Plot: sample points in (Sp.1, Sp.2, Sp.3) space with the PC1, PC2 and PC3 axes; the plane defined by PC1 and PC2 is stippled.]

An equivalent way of visualising this is again in terms of "best fit": PC1 is the "best fitting" line to the sample points and, together, the PC1 and PC2 axes define a plane (stippled in the above diagram) which is the "best fitting" plane.

Algebraic definition

The above geometric formulation can be expressed algebraically. The three new variables (PCs) are just linear combinations of the old variables (species), such that PC1, PC2 and PC3 are uncorrelated. In the above example:

\[
\begin{aligned}
PC1 &= \phantom{-}0.62 \times Sp.1 + 0.52 \times Sp.2 + 0.58 \times Sp.3 \\
PC2 &= -0.73 \times Sp.1 + 0.65 \times Sp.2 + 0.20 \times Sp.3 \qquad (4.1) \\
PC3 &= \phantom{-}0.28 \times Sp.1 + 0.55 \times Sp.2 - 0.79 \times Sp.3
\end{aligned}
\]

The principal components are therefore interpretable (in theory) in terms of the counts for each original species axis. Thus PC1 is a sum of roughly equal (and positive) contributions from each of the species; it is essentially ordering the samples from low to high total abundance. At a more subtle level, for samples with the same total abundance, PC2 then mainly distinguishes relatively high counts of Sp.2 (and low Sp.1) from low Sp.2 (and high Sp.1); Sp.3 values do not feature strongly in PC2 because the corresponding coefficient is small.

[...]

[...] polynomial). These are parametric models, giving rise to the term metric MDS for this approach.

b) Perform a non-parametric regression of d on δ, giving rise to non-metric MDS. Fig. 5.2 illustrates the non-parametric (monotone) regression line.
This is a "best-fitting" line which moulds itself to the shape of the scatter plot, but is always constrained to increase (and therefore consists of a series of steps). The relative success of non-metric MDS, in preserving the sample relationships in the distances of the ordination plot, comes from the flexibility in shape of this non-parametric regression line. A perfect MDS was defined previously as one in which the rank order of dissimilarities was totally preserved in the rank order of distances. Individual points on the Shepard plot must then all be (monotonic) increasing: the larger a dissimilarity, the larger (or equal) the corresponding distance, and the non-parametric regression line is a perfect fit. It follows that the extent to which the scatter points deviate from the line measures the failure to match the rank order dissimilarities, motivating the following definition of stress.

4) Measure goodness-of-fit of the regression by calculating the stress value:

\[
\text{Stress} = \sqrt{\frac{\sum_j \sum_k (d_{jk} - \hat{d}_{jk})^2}{\sum_j \sum_k d_{jk}^2}} \qquad (5.1)
\]

where $\hat{d}_{jk}$ is the distance predicted from the fitted regression line corresponding to dissimilarity $\delta_{jk}$. If $d_{jk} = \hat{d}_{jk}$ for all the $n(n-1)/2$ distances in this summation, the stress is zero. Large scatter clearly leads to large stress and this can be thought of as measuring the difficulty involved in compressing the sample relationships into two (or a small number) of dimensions. Note that the denominator is simply a scaling term: distances in the final plot have only relative not absolute meaning and the squared distance term in the denominator makes sure that stress is a dimensionless quantity.

5) Perturb the current configuration in a direction of decreasing stress. This is perhaps the most difficult part of the algorithm to visualise and will not be detailed; it is based on established techniques of numerical optimisation, in particular the method of steepest descent.
The essential idea is that the regression relation is used to evaluate stress for (small) changes in the position of points on the ordination plot, and points are then moved to new positions in directions which look like they will decrease the stress most rapidly.

6) Repeat steps 3 to 5 until convergence is achieved. The iteration now cycles around the two stages of a new regression of distance on dissimilarity for the new ordination positions, then further perturbation of the positions in directions of decreasing stress. In most cases, the cycle will stop when further adjustment of the points leads to no improvement.

Features of the algorithm

Like all iterative procedures, especially ones this complex, things can go wrong! By a series of minor adjustments to the parameters at its disposal (the co-ordinate positions in the configuration), the method gradually finds its way down to a minimum of the stress function. This is most easily envisaged in three dimensions, with just a 2-dimensional parameter space (the x, y plane) and the vertical axis (z) denoting the stress at each (x, y) point. In reality, the stress surface is a function of more parameters than this of course, but we have seen before how useful it can be to visualise high-dimensional algebraic operations in terms of 3-dimensional geometry. An appropriate analogy is to imagine a rambler walking across a range of hills (in a thick fog!), attempting to find the lowest point within an encircling range of high peaks. A good strategy is always to walk in the direction in which the ground slopes away most steeply (the method of steepest descent, in fact) but there is no guarantee that this strategy will necessarily find the lowest point overall, i.e. the global minimum of the stress function. The rambler may reach a low point from which the ground rises in all directions (and thus the steepest descent algorithm converges) but there may be an even lower point on the other side of an adjacent hill. He is then trapped in a local minimum of the stress function.
Whether he finds the global or a local minimum depends very much on where he starts the walk, i.e. the starting configuration of points in the ordination plot.

Such local minima do occur in many MDS analyses, usually corresponding to configurations of sample points which are only slightly different from one another. Often this may be because there are one or two points which bear little relation to any of the other samples and there are several choices as to where they may be placed, or perhaps they have a more complex relationship with other samples and may be difficult to fit into (say) a 2-dimensional picture.

There is no guaranteed method of ensuring that a global minimum of the stress function has been reached; the practical solution is therefore to repeat the MDS analysis several times starting with different random positions of samples in the initial configuration (step 2 above). If the same (lowest stress) solution re-appears from a number of different starts then there is a strong assurance, though never a total guarantee, that this is indeed the best solution. Note that the easiest way to determine whether the same solution has been reached as in a previous attempt is simply to check for equality of the stress values; remember that the configurations themselves could be arbitrarily rotated or reflected with respect to each other.* In genuine applications, converged stress values are rarely precisely the same if configurations differ materially.

Degenerate solutions can also occur, in which groups of samples collapse to the same point (even though they are not 100% similar), or to the vertices of a triangle, or are strung out round a circle. In these cases the stress may go to zero. (This is akin to our rambler starting his walk outside the encircling hills, so that he sets off in totally the wrong direction and ends up at the sea!)
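The idea of comparing converged stress values between restarts can be illustrated with a minimal sketch of the stress calculation itself (the monotone regression and goodness-of-fit steps of the algorithm). This is not the PRIMER code: the function names `monotone_fit` and `stress` are hypothetical, the monotone regression is done by a simple pool-adjacent-violators pass, tied dissimilarities are ordered by their distances, and stress is assumed to be Kruskal's formula 1, i.e. sqrt(sum (d - d_hat)^2 / sum d^2). The small data set is invented for illustration.

```python
import math
from itertools import combinations


def monotone_fit(values):
    """Least-squares non-decreasing fit to a sequence (pool-adjacent-violators)."""
    blocks = []  # each block is [sum, count]; its fitted value is sum / count
    for v in values:
        blocks.append([float(v), 1])
        # Pool neighbouring blocks while the fitted values would decrease.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def stress(diss, coords):
    """Stress of configuration `coords` against square dissimilarity matrix `diss`."""
    pairs = list(combinations(range(len(coords)), 2))
    delta = [diss[j][k] for j, k in pairs]
    d = [math.dist(coords[j], coords[k]) for j, k in pairs]
    # Order the pairs by dissimilarity (ties broken by distance), then fit
    # the monotone regression of distance on dissimilarity.
    order = sorted(range(len(pairs)), key=lambda i: (delta[i], d[i]))
    d_sorted = [d[i] for i in order]
    d_hat = monotone_fit(d_sorted)
    numerator = sum((a - b) ** 2 for a, b in zip(d_sorted, d_hat))
    denominator = sum(a ** 2 for a in d_sorted)
    return math.sqrt(numerator / denominator)


# Hypothetical example: four samples lying on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
# Dissimilarities that are a monotone (squared) function of the distances.
diss = [[math.dist(p, q) ** 2 for q in coords] for p in coords]
rotated = [(-y, x) for x, y in coords]        # same map turned through 90 degrees
shuffled = [coords[i] for i in (2, 0, 3, 1)]  # samples put in the wrong places

print(stress(diss, coords))    # rank order preserved, so stress is zero
print(stress(diss, rotated))   # unchanged by rotation
print(stress(diss, shuffled))  # rank order violated, so stress is positive
```

Because stress depends only on the inter-point distances, the rotated configuration gives the same value as the original, which is why equality of stress is a convenient check on whether two restarts have reached the same solution.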
Artefactual solutions of this degenerate kind are relatively rare and easily detected: repetition from different random starts will find many solutions which are more sensible. (In fact, a more likely cause of an ordination in which points tend to be placed around the circumference of a circle is that the input matrix is of similarities when the program has been told to expect dissimilarities, or vice-versa; in such cases the stress will also be very high.) A much more common form of degenerate solution is repeatable and is a genuine result of a disjunction in the data. For example, if the data divide into two groups which have no species in common, or for which all dissimilarities within the groups are smaller than any dissimilarity between groups, then there is clearly no yardstick within our non-parametric approach for determining how far apart the groups should be placed in the MDS plot. It is then not surprising to find that the samples in each group collapse to a point (a commonly met special case is when one of the two groups consists of a single outlying point). The solution is to split the data and carry out an ordination separately on the two groups or, in the latter case, re-run the MDS omitting the outlier.

Another feature of MDS mentioned earlier is that, unlike PCA, there is not any direct relationship between ordinations in different numbers of dimensions. In PCA, the 2-dimensional picture is just a projection of the 3-dimensional one, and all PC axes can be generated in a single analysis. With MDS, the minimisation of stress is clearly a quite different optimisation problem for each ordination of different dimensionality; indeed, this explains the greater success of MDS in distance-preservation.
Samples that are in the same position with respect to (PC1, PC2) axes, though far apart on the PC3 axis, will be projected on top of each other in a 2-dimensional PCA but they will remain separate, to some degree, in a 2-dimensional as well as a 3-dimensional MDS.

* The arbitrariness of orientation can be a practical nuisance when comparing different ordinations [...] applying PCA to the MDS co-ordinates [...]

If the ultimate aim is a 2-dimensional ordination, it may still be useful to carry out a 3-dimensional MDS initially. Its first two dimensions will often provide a reasonable starting point to the iterative computations for the 2-dimensional configuration.† In fact, this strategy will tend to reduce the risk of finding local minima or degenerate solutions. The samples are likely to fit more easily into three dimensions, reducing the risk of finding a local minimum: the 2-dimensional iteration will then be constrained to start much nearer a global minimum than it would for a purely random initial configuration. Another reason for obtaining higher-dimensional solutions is to compare their stress with that from two dimensions: this is one of several ways in which the accuracy of a 2-dimensional MDS can be assessed.

ADEQUACY OF MDS REPRESENTATION

1) Is the stress value small? By definition, stress increases with reducing dimensionality of the ordination (or in rare cases where a low-dimensional ordination is a perfect representation, stress remains constant). It has therefore been suggested that stress values in 2, 3, 4 etc dimensions should be compared: if there is a particularly large drop in stress passing from two to three dimensions (say) and only a modest, steady decrease thereafter, this would imply that a 3-dimensional ordination is likely to be a more satisfactory representation than a 2-dimensional one.
However, experience with ecological data suggests that clear-cut "shoulders" such as this, in the plot of minimum stress against dimensionality, are rarely seen. It is also undeniable that a 2-dimensional picture will usually be a more useful and accessible summary, so the question is often turned around: not "what is the true dimensionality of the data?" but "is a 2-dimensional plot a usable summary of the sample relationships, or is it likely to be sufficiently misleading to force its abandonment in favour of a 3- or higher-dimensional plot?" One answer to this is through empirical evidence and simulation studies of stress values. Stress increases not only with reducing dimensionality but also with increasing quantity of data, but a rough rule-of-thumb for 2-dimensional ordinations, using the stress formula (5.1),‡ is as follows.

† This is the practice adopted by the PRIMER MDS routine [...]

Stress < 0.05 gives an excellent representation with no prospect of misinterpretation (a perfect representation would probably be one with stress < 0.01 since numerical iteration procedures often terminate when stress reduces below this value§).

Stress < 0.1 corresponds to a good ordination with no real prospect of a misleading interpretation; 3- or higher-dimensional solutions will not add any additional information about the overall structure (though the fine structure of any compact groups may bear closer examination).

Stress < 0.2 still gives a potentially useful 2-dimensional picture, though for values at the upper end of this range too much reliance should not be placed on the detail of the plot. A cross-check of any conclusions should be made against those from an alternative technique (e.g. the superimposition of cluster groups suggested in point 5 below).

Stress > 0.3 indicates that the points are close to being arbitrarily placed in the 2-dimensional ordination space.
In fact, the totally random positions used as a starting configuration for the iteration usually give a stress around 0.35-0.45. Values of stress in the range 0.2-0.3 should therefore be treated with a great deal of scepticism and certainly discarded in the upper half of this range, especially for a small to moderate number of points (<50 say). Other techniques will be certain to highlight inconsistencies and higher-dimensional ordinations should be examined.

‡ There are alternative definitions of stress, for example the stress formula 2 option provided in the MDSCAL and KYST programs. This differs only in the denominator scaling term [...]

§ This is true of the MDS routine in PRIMER, for example.

2) Does the Shepard diagram appear satisfactory? The stress value totals the scatter around the regression line in a Shepard diagram, for example the low stress of 0.05 for Fig. 5.1 is reflected in the low scatter in Fig. 5.2. Outlying points in the plot could be identified with the samples involved; often there are a range of outliers all involving dissimilarities with a particular sample and this can indicate a point which really needs a higher-dimensional representation for accurate placement, or simply corresponds to a major error in the data matrix.

Fig. 5.3. Exe estuary nematodes {X}. Dendrogram from group-average clustering of Bray-Curtis similarities.

3) Is there distortion when similar samples are connected in the ordination plot? One simple check on the success of the ordination in dissimilarity-preservation is to identify the top 10% or 20% (say) of values in the similarity matrix and draw a line between the corresponding points on the MDS configuration. An inaccurate representation is indicated if several connections are made between points which are further apart on the plot than other unconnected pairs of points.

4) Is the "minimum spanning tree" consistent with the ordination picture? A similar idea to the above is to construct the minimum spanning tree (MST, Gower and Ross, 1969).
All samples are "connected" by a single line which is allowed to branch but does not form a closed loop, such that one minimises the sum along this line of dissimilarities (taken from the original dissimilarity matrix not the distance matrix from the ordination, note). This line is then plotted on the 2-dimensional ordination and inadequacy is again indicated by connections which look unnatural in the context of placement of samples in the MDS configuration.

5) Do superimposed groups from a cluster analysis distort the ordination plot? The combination of clustering and ordination analyses can be a very effective way of checking the adequacy and mutual consistency of both representations. Fig. 5.3 shows the dendrogram from a cluster analysis of the Exe estuary nematode data {X} of Fig. 5.1. Two or more (arbitrary) similarity values are chosen at a spread of hierarchical levels, each determining a particular grouping of samples. In Fig. 5.3, four groups are formed at around a 13% similarity level and eight groups would be determined for any similarity threshold between 30 and 45%.

Fig. 5.4. Exe estuary nematodes {X}. 2-dimensional MDS configuration, as in Fig. 5.1, with superimposed clusters from Fig. 5.3.

Fig. 5.5. Dosing experiment, Solbergstrand {D}. Nematode abundances: a) group-average clustering from Bray-Curtis similarities; b) MDS from the same similarities; c) group-average clustering from Euclidean distances; d) PCA, as in Fig. 4.2.

These two sets of groupings are superimposed on the MDS ordination, Fig. 5.4, and it is clear that the agreement between the two techniques is excellent: the clusters are sharply defined and would be determined in much the same way if one were to select clusters by eye from the 2-dimensional ordination alone. The stress for Fig. 5.4 is also low, at 0.05, giving confidence that the 2-dimensional plot is an accurate representation of the sample relationships.
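The minimum spanning tree check (point 4) can be made concrete with a short sketch. It assumes a square dissimilarity matrix held as nested lists and a set of 2-dimensional ordination co-ordinates; both data sets below are invented for illustration, and `minimum_spanning_tree` and `plot_lengths` are hypothetical helper names. Prim's algorithm is used here as one standard way of building the tree; the source does not say which construction is intended.

```python
import math


def minimum_spanning_tree(diss):
    """Prim's algorithm on a square dissimilarity matrix; returns (j, k) edges."""
    n = len(diss)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest link from a sample already in the tree to one outside it.
        j, k = min(((j, k) for j in in_tree for k in range(n) if k not in in_tree),
                   key=lambda e: diss[e[0]][e[1]])
        edges.append((j, k))
        in_tree.add(k)
    return edges


def plot_lengths(edges, coords):
    """Length of each tree edge as drawn on the ordination plot."""
    return {(j, k): math.dist(coords[j], coords[k]) for j, k in edges}


# Hypothetical dissimilarities among four samples (original matrix, not plot distances).
diss = [
    [0, 10, 50, 60],
    [10, 0, 20, 70],
    [50, 20, 0, 30],
    [60, 70, 30, 0],
]
# Hypothetical 2-dimensional ordination in which sample 3 is badly placed.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.2, 0.5)]

edges = minimum_spanning_tree(diss)
lengths = plot_lengths(edges, coords)
for edge in edges:
    print(edge, round(lengths[edge], 2))
```

In this made-up configuration the tree links samples 2 and 3, yet on the plot sample 3 sits much closer to the unconnected sample 0; that is the kind of "unnatural" connection the check is meant to expose.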
One is not always as fortunate as this, and a more revealing example of the benefits of viewing clustering and ordination in combination is provided by the data of Fig. 4.2.*

EXAMPLE: Dosing experiment, Solbergstrand

The nematode abundance data from the dosing experiment {D} at the GEEP Oslo Workshop was previously analysed by PCA, see Fig. 4.2 and accompanying text. The analysis was likely to be unsatisfactory, since the % of variance explained by the first two principal components was very low, at 37%. Fig. 5.5b shows the MDS ordination from the same data, and in order to make a fair comparison with the PCA the data matrix was treated in exactly the same way prior to analysis.† The stress for the 2-dimensional MDS configuration is moderately high (at 0.16), indicating some difficulty in displaying the relationships between these 16 samples in two dimensions. However, the PCA was positively misleading in its apparent separation of the four high dose (H) replicates in the 2-dimensional space; by contrast the MDS does provide a usable summary which is not likely to lead to serious misinterpretation. This can be seen by superimposing the corresponding cluster analysis results, Fig. 5.5a, onto the MDS. Two similarity thresholds have been chosen in Fig. 5.5a such that they (arbitrarily) divide the samples into 5 and 10 groups, the corresponding hierarchy of clusters being indicated in Fig. 5.5b by thin and thick lines respectively. Whilst it is clear that there are no natural groupings of the samples in the MDS plot, and the groupings provided by the cluster analysis must therefore be regarded with some caution, the two analyses are not markedly inconsistent.

* One option within PRIMER is to run CLUSTER on the ranks of the similarities rather than the similarities themselves. Whilst not of any real merit in itself [...], the information used by the cluster analysis and the MDS plot is then even more comparable.
† The same 26 species were retained and a log transformation applied before computation of Bray-Curtis similarities [...]

In contrast, the parallel operation for the PCA ordination clearly illustrates the poorer distance-preserving properties of this method. Fig. 5.5d repeats the 2-dimensional PCA of Fig. 4.2 but with superimposed groups from a cluster analysis of the Euclidean distance matrix between the 16 samples (Fig. 5.5c). With the same division into five clusters (thin lines) and ten clusters (thick lines), a much more distorted picture results, with samples that are virtually coincident in the PCA plot being placed in separate groups and samples appearing distant from each other forming a common group.

The outcome that would be expected on theoretical grounds is therefore apparent in practice here: MDS can provide a more realistic picture in situations where PCA gives a distorted representation of the true "distances" between samples. In fact, the biological conclusions from this particular study are entirely negative: the test described in Chapter 6 shows that there are no statistically significant differences in community structure between any of the four dosing levels in this experiment.

EXAMPLE: Celtic Sea zooplankton

In situations where the samples are strongly grouped, as in Figs. 5.3 and 5.4, both clustering and ordination analyses will demonstrate this, usually in equally adequate fashion. The strength of ordination is in displaying a gradation of community composition across a set of samples. An example is provided by Fig. 5.6, of zooplankton data from the Celtic Sea {C}. Samples were collected from 14 depths, separately for day and night time studies at a single site. The changing community composition with depth can be traced on the resulting MDS (from Bray-Curtis similarities). There is a greater degree of variability in community structure of the near-surface samples, with a marked change in composition at about 20-25 m; deeper than this the changes are steady but less pronounced and they step in parallel for day and night time samples.
Another obvious feature is the strong difference in community composition between day and night near-surface samples, contrasted with their relatively higher similarity at greater depth. Cluster analysis of the same data would clearly not permit the accuracy and subtlety of interpretation that is possible from ordination of such a gradually changing community pattern.

Fig. 5.6. Celtic Sea zooplankton {C}. MDS plot for day and night time samples from 14 depths.

MDS STRENGTHS

1) MDS is simple in concept. The numerical algorithm is undeniably complex, but it is always clear what MDS is trying to achieve: the construction of a sample map whose inter-point distances have the same rank order as the corresponding dissimilarities between samples.

2) It is based on the relevant sample information. MDS works on the sample dissimilarity matrix not on the original data array, so there is complete freedom of choice to define similarity of community composition in whatever terms are biologically most meaningful.

3) Species deletions are unnecessary. Another advantage of starting from the sample dissimilarity matrix is that the number of species on which it was based is largely irrelevant to the amount of calculation required. Of course, if the original matrix contained a large number of species whose patterns of abundance across the samples varied widely, and prior transformation (or choice of similarity coefficient)

[...]

Fig. 6.3. Frierfjord macrofauna {F}. MDS ordination for sites B, C and D.

[...] in the underlying triangular similarity matrix.
If $\bar{r}_W$ is defined as the average of all rank similarities among replicates within sites, and $\bar{r}_B$ is the average of rank similarities arising from all pairs of replicates between different sites, then a suitable test statistic is

\[
R = \frac{\bar{r}_B - \bar{r}_W}{\tfrac{1}{2} M} \qquad (6.1)
\]

where $M = n(n-1)/2$ and $n$ is the total number of samples under consideration. Note that the highest similarity corresponds to a rank of 1 (the lowest value), following the usual mathematical convention for assigning ranks.

The denominator constant in equation (6.1) has been chosen so that:

a) R can never technically lie outside the range (-1, 1);

b) R = 1 only if all replicates within sites are more similar to each other than any replicates from different sites;

c) R is approximately zero if the null hypothesis is true, so that similarities between and within sites will be the same on average.

R will usually fall between 0 and 1, indicating some degree of discrimination between the sites. R substantially less than zero is unlikely since it would correspond to similarities across different sites being higher than those within sites; such an occurrence is more likely to indicate an incorrect labelling of samples.* The R statistic itself is a useful comparative measure of the degree of separation of sites, and its value is at least as important as its statistical significance (arguably more so). As with standard univariate tests, it is perfectly possible for R to be significantly different from zero yet inconsequentially small, if there are many replicates at each site.

* Chapman and Underwood (1999) point out some situations in which negative R values (though not necessarily significantly negative) do occur in practice, when the community is species-poor and individuals have a heavily clustered spatial distribution [...]

2) Recompute the statistic under permutations of the sample labels. Under the null hypothesis $H_0$: "no difference between sites", there will be little effect on average to the value of R if the labels identifying which replicates belong to which sites are arbitrarily rearranged: the 12 samples of Fig. 6.3 are just replicates from a single site if $H_0$ is true. This is the rationale for a permutation test of $H_0$: all possible allocations of four B, four C and four D labels to the 12 samples are examined and the R statistic recalculated for each. In general there are

\[
\frac{(kn)!}{(n!)^k \, k!} \qquad (6.2)
\]

distinct ways of permuting the labels for $n$ replicates at each of $k$ sites, giving 5775 permutations here. It is computationally possible to examine this number of re-labellings but the scale of calculation can quickly get out of hand with modest increases in replication, so the full set of permutations is randomly sampled (usually with replacement) to give the null distribution of R. In other words, the labels in Fig. 6.3 are randomly reshuffled, R recalculated and the process repeated a large number of times (T).

3) Calculate the significance level by referring the observed value of R to its permutation distribution. If $H_0$ is true, the likely spread of values of R is given by the random rearrangements, so that if the true value of R looks unlikely to have come from this distribution there is evidence to reject the null hypothesis. Formally, if only $t$ of the $T$ simulated values of R are as large as (or larger than) the observed R then $H_0$ can be rejected at a significance level of $(t+1)/(T+1)$, or in percentage terms, $100(t+1)/(T+1)\%$.

EXAMPLE: Frierfjord macrofauna

The rank similarities underlying Fig. 6.3 are shown in Table 6.2 (note that these are the similarities involving only sites B, C and D, extracted from the matrix for all sites and re-ranked). Averaging across the 3 diagonal sub-matrices (within groups B, C and D) gives $\bar{r}_W = 22.7$, and across the remaining (off-diagonal) entries gives $\bar{r}_B = 37.5$. Also $n = 12$ and $M = 66$, so that R = 0.45. In contrast, the spread of R values possible from random re-labelling of the 12 samples can be seen in the histogram of Fig. 6.4: the largest of T = 999 simulations is less than 0.45 (t = 0).
An observed value of R = 0.45 is seen to be a most unlikely event, with a probability of less than 1 in 1000 if $H_0$ is true, and we can therefore reject $H_0$ at a significance level of p<0.1% (at least, because R = 0.45 may still have been the most extreme outcome observed had we chosen an even larger number of simulations).

Table 6.2. Frierfjord macrofauna {F}. Rank similarity matrix for the replicates from sites B, C and D.

The above is a global test, indicating that there are site differences somewhere that may be worth examining further. Specific pairs of sites can then be compared: for example, the similarities involving only sites B and C are extracted, re-ranked and the test procedure repeated, giving an R value of 0.23. This time there are only 35 distinct re-labellings so, under the null hypothesis $H_0$ that sites B and C do not differ, the full permutation distribution of possible values of R can be computed; 12% of these values are equal to or larger than 0.23, so $H_0$ cannot be rejected. By contrast, R = 0.54 for the comparison of B against D, which is the most extreme value possible under the 35 permutations. B and D are therefore inferred to differ significantly at the p<3% level. For C against D, R = 0.57 similarly leads to rejection of the null hypothesis (p<3%).

Fig. 6.4. Frierfjord macrofauna {F}. Simulated distribution of the test statistic R (equation 6.1) under the null hypothesis of "no site differences"; this contrasts with an observed value for R of 0.45.

There is a danger in such repeated significance tests which should be noted (although rather little can be done to ameliorate it here). To reject the null hypothesis at a significance level of 3% implies that a 3% risk is being run of drawing an incorrect conclusion (a Type I error, in statistical terminology). If many such tests are performed, that risk will accumulate. For example, all pairwise comparisons between 10 sites, each with 4 replicates (allowing 3% level tests at best), would involve 45 tests, and the overall risk of drawing at least one false conclusion is high.
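The permutation counts quoted above follow directly from equation (6.2), and the count also fixes the best significance level a test can reach. A short sketch, assuming hypothetical function names and extending (6.2) to unequal group sizes by treating only groups of equal size as interchangeable:

```python
from collections import Counter
from math import comb, factorial


def distinct_relabellings(group_sizes):
    """Number of distinct allocations of group labels to samples.
    For n replicates at each of k sites this is (kn)! / ((n!)^k k!), equation (6.2)."""
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)          # order within a group is irrelevant
    for multiplicity in Counter(group_sizes).values():
        total //= factorial(multiplicity)  # equal-sized groups can swap names
    return total


def best_significance_level(group_sizes):
    """Smallest attainable significance level (%), reached when the observed
    labelling gives the single most extreme value of R."""
    return 100.0 / distinct_relabellings(group_sizes)


global_count = distinct_relabellings([4, 4, 4])   # sites B, C and D: 5775
pairwise_count = distinct_relabellings([4, 4])    # one pair of sites: 35
pairwise_tests = comb(10, 2)                      # all pairs among 10 sites: 45
```

For a pairwise comparison with 4 replicates per site, `best_significance_level([4, 4])` is just under 3%, which is why no pairwise result in the Frierfjord example can be reported at better than the p<3% level.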
For the analogous pairwise comparisons following the global F test in a univariate ANOVA, there exist multiple comparison tests which attempt to adjust for this repetition of risk. One straightforward possibility, which could be carried over to the present multivariate test, is a Bonferroni correction. In its simplest form, this demands that, if there are n pairwise comparisons in total, each test uses a significance level of 0.05/n. The so-called experiment-wise Type I error, the overall probability of rejecting the null hypothesis at least once in the series of pairwise tests when there are no genuine differences, is then kept to 0.05.

However, the difficulty with such a Bonferroni correction is clear from the above example: with only 4 replicates in each group, and thus only 35 possible permutations, a significance level of 0.05/3 (= 1.7%) can never be achieved! It may be possible to plan for a modest improvement in the number of replicates: 5 replicates from each site would allow a 1% level test for a pairwise comparison, equation (6.2) showing that there are then 126 permutations, and two groups of 6 replicates would give close to a 0.2% level test. However, this may not be realistic in some practical contexts, or it may be inefficient to concentrate effort on too many replicates at one site rather than (say) increase the spatial coverage of sites. Also, for a fixed number of replicates, a too demandingly low Type I error (significance level) will be at the expense of a greater risk of Type II error, the probability of not detecting a difference when one genuinely exists.

Strategy for interpretation

The solution, as with all significance tests, is to treat them in a more pragmatic way, exercising due caution in interpretation certainly, but not allowing the formality of a test procedure for pairwise comparisons to interfere with the natural explanation of the group differences. Herein lies the real strength of defining a test statistic, such as R, which has an absolute interpretation of its value.
This is in contrast to a standard Z-type statistic, which typically divides an appropriate measure (taking the value zero under the null hypothesis) by its standard deviation, so that interpretation is limited purely to statistical significance of the departure from zero.

The recommended course of action, for a case such as the above Frierfjord data, is therefore always to carry out, and take totally seriously, the global ANOSIM test for overall differences between groups. Usually the total number of replicates, and thus of possible permutations, is relatively large, and the test will be reliable and informative. If it is not significant, then generally no further interpretation is permissible. If it is significant, it is legitimate to ask where the main between-group differences have arisen. The best tool for this is an examination of the R value for each pairwise comparison: large values (close to unity) are indicative of complete separation of the groups, small values (close to zero) imply little or no separation. If the MDS is of sufficiently low stress to give a reliable picture, then the relative group separations will also be evident from this.¶ The R value itself is not unduly affected by the number of replicates in the two groups being compared; this is in stark contrast to its statistical significance, which is dominated by the group sizes (for large numbers of replicates, R values near zero could still be deemed "significant" and, conversely, few replicates could lead to R values close to unity being classed as "non-significant").

¶ The computation of the ANOSIM R values is the more generally reliable guide, however, since it does not depend on whether a low-dimensional MDS gives a satisfactory approximation to the underlying high-dimensional similarity matrix.

The analogue of this approach in the univariate case (say, in the comparison of species richness between sites) would be firstly to compute the global F test for the ANOVA.
If this establishes that there are significant overall differences between sites, the size of the effects would be ascertained by examining the differences in mean values between each pair of sites or, equivalently, by simply looking at a plot of how the mean richness varies across sites (perhaps with the replicates also shown). It is then immediately apparent where the main differences lie, and the interpretation is a natural one, emphasising the important biological features (e.g. absolute loss in richness is 5, 10, 20 species, or relative loss is 5%, 10%, 20% of the species pool, or whatever) rather than putting the emphasis solely on significance levels in pairwise comparisons of means, which runs the risk of missing the main message altogether.

So, returning to the multivariate data of the above Frierfjord example, interpretation of the ANOSIM tests is seen to be straightforward: a significant level (p<0.1%) and a mid-range value of R (= 0.45) for the global test of sites B, C and D establishes that there are statistically significant differences between these sites. Similarly mid-range values of R (slightly higher, at 0.54 and 0.57) for the B v D and C v D comparisons, contrasted with a much lower value (of 0.23) for B v C, imply that the explanation for the global test result is that D differs from both B and C but the latter sites are not distinguishable.

The above discussion has raised the issue of Type I error for an ANOSIM permutation test, and the complementary concept to that, of the power of the test, namely the probability of detecting a difference between groups when one genuinely exists. Such power is not easily examined for non-parametric procedures of this type, which make no distributional assumptions and for which it is difficult to specify a precise non-null hypothesis. All that can obviously be said in general is that power will improve with increasing replication, and some low levels of replication should be avoided altogether.
For example, if comparing only two groups with a 1-way ANOSIM test, based on only 3 replicates for each group, then there are only 10 distinct permutations and a significance level better than 10% could never be attained. A test demanding a significance level of 5% would then have no power to detect a difference between the groups, however large that difference is!

Generality of application

It is evident that few, if any, assumptions are made about the data in constructing the 1-way ANOSIM test, and it is therefore very generally applicable. It is not restricted to Bray-Curtis similarities or even to similarities computed from species abundance data: it could provide a non-parametric alternative to Wilks' Λ test for data which are more nearly multivariate-normally distributed, e.g. for testing whether groups (sites or times) can be distinguished on the basis of their environmental data (see Chapter 11). The latter would involve computing a Euclidean distance matrix between samples (after suitable transformation of the environmental variables) and entering this as a dissimilarity matrix to the ANOSIM procedure. Clearly, if multivariate normality assumptions are genuinely justified then the ANOSIM test must lack sensitivity in comparison with standard MANOVA, but this would seem to be more than compensated for by its greater generality.

Note also that there is no restriction to a balanced number of replicates. Some groups could even have only one replicate, provided enough replication exists in other groups to generate sufficient permutations for the global test (though there will be a sense in which the power of the test is compromised by a markedly unbalanced design, here as elsewhere). More usefully, note that no assumptions have been made about the variability of within-group replication needing to be similar for all groups. This is seen in the following example, for which the groups in the 1-way layout are not sites but samples from different years at a single site.

Warwick et al (1990b) examined data from 10 replicate transects across a single coral-reef site in S. Tikus Island, Thousand Islands, Indonesia, for each of the six years 1981, 1983, 1984, 1985, 1987 and 1988. The community data are in the form of % cover of a transect by each of the 58 coral species identified, and the analysis used Bray-Curtis similarities on untransformed data to obtain the MDS of Fig. 6.5. There appears to be a strong change in community pattern between 1981 and 1983 (putatively linked to the 1982/3 El Niño) and this is confirmed by a 1-way ANOSIM test for these two years alone: R = 0.43 (p<0.1%). Note that, though not really designed for this situation, the test is perfectly valid in the face of greater "variability" in 1983 than 1981; in fact it is mainly a change in variability rather than location in the MDS plot that distinguishes the 1981 and 1983 groups (a point returned to in Chapter 15). This is in contrast with the standard univariate ANOVA (or multivariate MANOVA) test, which will have no power to detect a variability change; indeed it is invalid without an assumption of approximately equal variances (or variance-covariance matrices) across the groups.

Fig. 6.5. Indonesian reef-corals, S. Tikus Island. MDS of the replicate transects.

The basic 1-way ANOSIM test can also be extended to cater (to some degree) for more complex sample designs, as follows.

ANOSIM FOR TWO-WAY LAYOUTS

Three types of field and laboratory designs are considered here:
a) the 2-way nested case can arise where two levels of spatial replication are involved, e.g. sites are grouped a priori to be representative of two "treatment" categories (control and polluted) but there are also replicate samples taken within sites;
b) the 2-way crossed case can arise from studying a fixed set of sites at several times (with replicates at each site/time combination), or from an experimental study in which the same set of "treatments" (e.g. control and impact) are applied at a number of locations ("blocks"), for example in the different mesocosm basins of a laboratory experiment;
c) a 2-way crossed case with no replication of each treatment/block combination can also be catered for, to a limited extent, by a different style of permutation test.

The following examples of cases a) and b) are drawn from Clarke (1993) and the two examples of case c) are from Clarke and Warwick (1994).

EXAMPLE: Clyde nematodes (2-way nested case)

Lambshead (1986) analysed meiobenthic communities from three putatively polluted (P) areas of the Firth of Clyde and three control (C) sites, taking three replicate samples at each site (with one exception). The resulting MDS, based on fourth-root transformed abundances of the 113 species in the 16 samples, is given in Fig. 6.6a. The sites are numbered 1 to 3 for both conditions but the numbering is arbitrary: there is nothing in common between P1 and C1 (say). This is what is meant by sites being "nested" within conditions. Two hypotheses are then appropriate:

H1: there are no differences among sites within each "treatment" (control or polluted conditions);
H2: there are no differences between control and polluted conditions.

The approach to H2 might depend on the outcome of testing H1.

H1 can be examined by extending the 1-way ANOSIM test to a constrained randomisation procedure. The presumption under H1 is that there may be a difference between the general location of C and P samples in the MDS plots but within each condition there cannot be any pattern in allocation of replicates to the three sites. Treating the two conditions entirely separately, one therefore has two separate 1-way permutation analyses of exactly the same type as for the Frierfjord macrofauna data (Fig. 6.3).
These generate test statistics $R_C$ and $R_P$, computed from equation (6.1), which can be combined to produce an average statistic $\bar{R}$. This can be tested by comparing it with $\bar{R}$ values from all possible permutations of sample labels permitted under the null hypothesis. This does not mean that all 16 sample labels may be arbitrarily permuted; the randomisation is constrained to take place only within the separate conditions: P and C labels may not be switched. Even so, the number of possible permutations is large (around 20,000).

Notice again that the test is not restricted to balanced designs, i.e. those with equal numbers of replicate samples within sites and/or equal numbers of sites within treatments (although lack of balance causes a minor complication in the efficient averaging of $R_C$ and $R_P$, see Clarke, 1988, 1993). Fig. 6.6b displays the results of 999 simulations (constrained relabellings) from the permutation distribution for $\bar{R}$ under the null hypothesis H1. Possible values range from -0.3 to 0.6, though 95% of the values are seen to be <0.27 and 99% are <0.46. The observed $\bar{R}$ of 0.75 therefore provides strong evidence for rejection of hypothesis H1.

Fig. 6.6. Clyde nematodes. a) MDS of species abundances from three "polluted" (P1-P3) and three "control" sites (C1-C3), with replicate samples at each site. b) Simulated distribution of the test statistic $\bar{R}$ under the hypothesis H1 of "no site differences" within each condition.

H2, which will usually be the more interesting of the two hypotheses, can now be examined. The test of H1 demonstrated that there are, in effect, only three genuine "replicates" (the sites 1-3) at each of the two conditions (C and P). This is a 1-way layout, and H2 can be tested by 1-way ANOSIM, but one first needs to combine the information from the three original replicates at each site, to define a similarity matrix for the 6 "new" replicates. Consistent with the overall strategy that tests should only be dependent on the rank similarities in the original triangular matrix, one first averages over the appropriate ranks to obtain a reduced matrix.
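The averaging and re-ranking step just described can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the original ranks are held in a dictionary keyed by sample pairs, and tied averages are assumed to share their mean rank when re-ranked.

```python
def average_ranks(values):
    """Ranks 1..len(values), smallest value first, with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in order[i:j + 1]:
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def reduced_rank_matrix(ranks, site_of):
    """Average the original rank similarities over all pairs of replicates drawn
    from two different sites, then re-rank the site-by-site averages.

    ranks   : {(i, j): rank} for sample pairs with i < j
    site_of : site label of each sample, e.g. ["P1", "P1", "P1", "P2", ...]
    returns : {(site_a, site_b): new rank} with site_a < site_b
    """
    sums, counts = {}, {}
    for (i, j), r in ranks.items():
        a, b = site_of[i], site_of[j]
        if a == b:
            continue  # within-site ranks play no part in the reduced matrix
        key = (a, b) if a < b else (b, a)
        sums[key] = sums.get(key, 0.0) + r
        counts[key] = counts.get(key, 0) + 1
    keys = sorted(sums)
    means = [sums[k] / counts[k] for k in keys]
    return dict(zip(keys, average_ranks(means)))
```

The returned dictionary plays the role of the reduced triangular matrix, with one re-ranked entry per pair of sites, and could be passed to a 1-way test in which the sites are the replicates.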
For example, the similarity between the three P1 and three P2 replicates is defined as the average of the nine inter-group rank similarities; this is placed into the new similarity matrix along with the 14 other averages (C1 with C2, P1 with C1 etc) and all 15 values are then re-ranked; the 1-way ANOSIM then gives R = 0.74. There are only 10 distinct permutations so that, although this is actually the most extreme R value possible, H2 is only able to be rejected at a p<10% significance level.

The other scenario to consider is that the first test fails to reject H1. There are then two possibilities for examining H2:
a) Proceed with the average ranking and re-ranking exactly as above, on the assumption that even if it cannot be proved that there are no differences between sites it would be unwise to assume that this is so; the test may have had rather little power to detect such a difference.
b) Infer from the test of H1 that there are no differences between sites, and treat all replicates as if they were separate sites, e.g. there would be 7 replicates for control and 9 replicates for polluted conditions in a 1-way ANOSIM test applied to the 16 samples in Fig. 6.6.

Which of these two courses to take is a matter for debate, and the argument here is exactly that of whether "to pool" or "not to pool" in forming the residual for the analogous univariate 2-way ANOVA. Option b) will certainly have greater power but runs a real risk of being invalid; option a) is the conservative test and it is certainly unwise to design a study with anything other than option a) in mind.¶

¶ The ANOSIM program in the PRIMER package always takes the first option.

EXAMPLE: Eaglehawk Neck meiofauna (2-way crossed case)

An example of a two-way crossed design is given in Warwick et al (1990) and is introduced more fully here in Chapter 12. This is a so-called natural experiment, studying disturbance effects on meiobenthic communities by the continual reworking of sediment by soldier crabs.
Two replicate samples were taken from each of four disturbed patches of sediment, and from adjacent undisturbed areas, on a sand flat at Eaglehawk Neck, Tasmania; Fig. 6.7a is a schematic representation of the 16 sample locations. There are two factors: the presence or absence of disturbance by the crabs and the "block effect" of the four different disturbance patches. It might be anticipated that the community will change naturally across the sand flat, from block to block, and it is important to be able to separate this effect from any changes associated with the disturbance itself. There are parallels here with impact studies in which pollutants affect sections of several bays, so that matched control and polluted conditions can be compared against a background of changing community pattern across a wide spatial scale. There are presumed to be replicate samples from each treatment/block combination (the meaning of the term crossed), though balanced numbers are not essential.

For the Eaglehawk Neck data, Fig. 6.7b displays the MDS for the 16 samples (2 treatments × 4 blocks × 2 replicates), based on Bray-Curtis similarities from root-transformed abundances of 59 meiofaunal species. The pattern is remarkably clear and a classic analogue of what, in univariate two-way ANOVA, would be called an additive model. The meiobenthic community is seen to change from area to area across the sand flat but also appears to differ consistently between disturbed and undisturbed conditions. A test for the latter sets up a null hypothesis that there are no disturbance effects, allowing for the fact that there may be block effects, and the procedure is then exactly that of the 2-way ANOSIM test for hypothesis H1 of the nested case. For each separate block an R statistic is calculated from equation (6.1), as if for a simple one-way test for a disturbance effect, and the resulting values averaged to give $\bar{R}$. Its permutation distribution under the null hypothesis is generated by examining all simultaneous re-orderings of the four labels (two disturbed, two undisturbed) within each block.
There are only three distinct permutations in each block, giving a total of $3^4$ (= 81) combinations overall, and the observed value of $\bar{R}$ (= 0.94) is the highest value attained in the 81 permutations. The null hypothesis is therefore rejected at a significance level of just over 1%.

Fig. 6.7. Tasmania, Eaglehawk Neck. a) Schematic of the 2-way crossed sampling design for meiofaunal cores, with two disturbed and two undisturbed samples from each of four blocks. b) MDS of species abundances for the 16 samples, showing separation of the blocks and of disturbed from undisturbed samples.

Fig. 6.8. Westerschelde nematodes experiment. MDS of species abundances.

The procedure departs from the nested case because of the symmetry in the crossed design. One can now test the null hypothesis that there are no block effects, allowing for the fact that there are treatment (disturbance) differences, by simply reversing the roles of treatments and blocks. $\bar{R}$ is now an average of two R statistics, separately calculated for disturbed and undisturbed samples, and there are $8!/[(2!)^4 \, 4!] = 105$ permutations of the 8 labels for each treatment. A random selection from the $105^2 = 11{,}025$ possible combinations must therefore be made. In 1000 trials the true value of $\bar{R}$ (= 0.85) is again the most extreme and is almost certainly the largest in the full set; the null hypothesis is decisively rejected.
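The crossed-design test for a treatment effect can be sketched as below. The names are hypothetical and two points are assumptions rather than statements from the text: similarities are re-ranked within each block before R is computed, and the sketch enumerates every labelled arrangement within each block (6 per block for two disturbed and two undisturbed samples) rather than the 3 distinct groupings. Each grouping is counted equally often, so the resulting proportion is the same as over the 81 combinations.

```python
from itertools import combinations, permutations, product
from math import hypot


def r_statistic(dissim, samples, labels):
    """R of equation (6.1) for one block, ranking only that block's own pairs."""
    pairs = list(combinations(range(len(samples)), 2))
    values = [dissim[samples[i]][samples[j]] for i, j in pairs]
    ordered = sorted(values)

    def rank_of(v):  # mean rank for ties; rank 1 = smallest dissimilarity
        return ordered.index(v) + (ordered.count(v) + 1) / 2

    within, between = [], []
    for (i, j), v in zip(pairs, values):
        (within if labels[i] == labels[j] else between).append(rank_of(v))
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)


def crossed_anosim(dissim, treatment, block):
    """Average R across blocks, tested by relabelling treatments within blocks only."""
    blocks = sorted(set(block))
    members = {b: [s for s in range(len(block)) if block[s] == b] for b in blocks}

    def mean_r(labelling):
        return sum(r_statistic(dissim, members[b], labelling[b]) for b in blocks) / len(blocks)

    true_labels = {b: tuple(treatment[s] for s in members[b]) for b in blocks}
    observed = mean_r(true_labels)
    arrangements = [sorted(set(permutations(true_labels[b]))) for b in blocks]
    total = hits = 0
    for combo in product(*arrangements):
        total += 1
        if mean_r(dict(zip(blocks, combo))) >= observed - 1e-12:
            hits += 1
    return observed, hits / total


# Toy layout (not the Eaglehawk Neck data): 4 blocks, 2 disturbed + 2 undisturbed each.
coords, toy_treatment, toy_block = [], [], []
for b in range(4):
    for trt, y in (("D", 0.0), ("U", 5.0)):
        for rep in range(2):
            coords.append((10.0 * b + rep, y))
            toy_treatment.append(trt)
            toy_block.append(b)
toy_dissim = [[hypot(p[0] - q[0], p[1] - q[1]) for q in coords] for p in coords]
toy_r, toy_p = crossed_anosim(toy_dissim, toy_treatment, toy_block)
```

In this toy layout the treatments separate perfectly within every block, so the average R is 1 and the significance level is 1/81, the smallest that this design can deliver.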
In this case the test is inherently uninteresting but in other situations (e.g. a sites × times study) tests for both factors could be of practical importance.

EXAMPLE: Mesocosm experiment 2-way 1..} being from independent variables: this is obviously not true of similarity coefficients from all possible pairs of a set of (independent) samples. This does not make $\rho_{av}$ any the less appropriate as a measure of agreement whose departure from zero (rejection of $H_0$) is testable by permutation.

For the nutrient enrichment experiment, Fig. 6.10 shows the separate MDS plots for the 4 mesocosm basins. Although the stress values are rather high (and the plots therefore slightly unreliable as a summary of the among-treatment relationships), there appears to be no commonality of pattern, and this is borne out by a near-zero value for $\rho_{av}$ of -0.03. This is central to the range of simulated values for $\rho_{av}$ under $H_0$ (obtained by permuting treatment labels separately for each block and recomputing $\rho_{av}$), so the test provides no evidence of any treatment differences. Note that the symmetry of the 2-way layout also allows a test of the (less interesting) hypothesis that there are no block effects, by looking for any consistency in the among-basin relationships across separate analyses for each of the 16 treatments. The test is again non-significant, with $\rho_{av}$ = 0.02. The overall negative conclusion to the tests should bar any further attempts at interpretation of these data.

EXAMPLE: Exe nematodes (no replication and missing data)

A final example demonstrates a positive outcome to such a test, in a common case of a 2-way layout of sites and times, with the additional feature that samples are missing altogether from a small number of cells. Fig. 6.11 shows again the MDS, from Chapter 5, of the Exe estuary nematode sites.

Fig. 6.11. Exe estuary nematodes. MDS of the sites, based on data averaged over the sampling times.

In fact, this is based on an average of data over six successive bi-monthly sampling occasions. For the individual times, the samples remain strongly clustered into the 4 or 5 main groups apparent from Fig. 6.11. Less clear, however, is whether any structure exists within the largest group (sites 12 to 19) or whether the scatter in Fig. 6.11 is simply the consequence of sampling variation.

Rejection of the null hypothesis of "no site differences" would be suggested by a common site pattern in the separate MDS plots for the 6 times (Fig. 6.12). At some of the times, however, one of the site samples is missing (site 19 at times 1 and 2, site 15 at time 4 and site 18 at time 6). Instead of removing these sites from all plots, in order to achieve matching sets of similarities, one can remove for each pair of times only those sites missing for either of that pair, and compute the Spearman correlation ρ between the remaining rank similarities. The ρ values for all pairs of times are then averaged to give $\rho_{av}$, i.e. the left-hand route is taken in the lower half of Fig. 6.9. This is usually referred to as pairwise removal of missing data, in contrast to the listwise removal that would be needed for the right-hand route. Though increasing the computation time, pairwise removal clearly utilises more of the available information.

Fig. 6.12 shows evidence of a consistent site pattern, for example in the proximity of sites 12 to 14 and the tendency of site 15 to be placed on its own; the fact that site 15 is missing on one occasion does not undermine this perceived structure. Pairwise computation gives $\rho_{av}$ = 0.36 and its significance can be determined by a Monte Carlo test, as before. The (non-missing) site labels are permuted amongst the available samples, separately for each time, and these designations fixed whilst all the paired ρ values are computed (using pairwise removal) and averaged. Here, the largest such value in 999 simulations was 0.30, so the null hypothesis is rejected at the 0.1% level.

In the same way, one can also carry out a test of the hypothesis that there are no differences across time for sites 12 to 19. The component plots, of the 4 to 6 times for each site, display no obvious features and $\rho_{av}$ = 0.08 (p<18%).
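The pairwise-removal calculation of the average Spearman correlation can be sketched as follows. The function names and the data layout (one dictionary of between-site dissimilarities per time, containing only the sites actually sampled at that time) are assumptions made for illustration. Working with dissimilarities rather than similarities does not change ρ, since both sets of values are reversed together.

```python
from itertools import combinations


def average_ranks(values):
    """Ranks 1..len(values), smallest value first, with ties sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in order[i:j + 1]:
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rho_av(dissim_by_time):
    """Average Spearman correlation over all pairs of times, with pairwise removal.

    dissim_by_time : list of dicts, one per time, mapping a site pair (a, b) with
                     a < b to its dissimilarity; sites missing at a time simply
                     contribute no pairs to that time's dict.
    """
    rhos = []
    for d1, d2 in combinations(dissim_by_time, 2):
        shared = sorted(set(d1) & set(d2))  # keep only pairs present at both times
        rhos.append(spearman([d1[p] for p in shared], [d2[p] for p in shared]))
    return sum(rhos) / len(rhos)
```

A significance level would then come from recomputing `rho_av` after permuting the site labels separately within each time, exactly as the text describes; that loop is omitted here to keep the sketch short.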
The failure to reject this null hypothesis justifies, to some extent, the use of averaged data across the 6 times in the earlier analyses.

Tests of this form, searching for agreement between two or more similarity matrices, occur also in Chapter 11 (in the context of matching species to environmental data) and Chapter 15 (where they link biotic patterns to some model structure). The discussion there includes use of measures other than a simple Spearman coefficient, for example a weighted Spearman coefficient (suggested for reasons explained in Chapter 11), and these adjustments could certainly be implemented here also if desired, using the left-hand route in the lower half of Fig. 6.9. In the present context, this type of "matching" test is clearly an inferior one to that possible where genuine replication exists within the 2-way layout. It cannot cope with follow-up tests for differences between specific pairs of treatments and it can have little sensitivity if the numbers of treatments and blocks are both small. A test for two treatments is impossible, note, since the treatment pattern in all blocks would be identical.

Fig. 6.12. Exe estuary nematodes. MDS for sites 12 to 19 only, performed separately for each of the 6 sampling times.

RECOMMENDATIONS

1) For typical species abundance matrices, it is much better to use an ANOSIM-type permutation procedure rather than classical MANOVA tests; the latter will almost always be totally invalid.
2) Choice of the level and type of replication should be carefully considered. Though it is difficult to define power for any of the ANOSIM tests, it is clearly important to take sufficient replicates to generate a large enough set of permutations for meaningful significance levels. Equally important is that replicates should genuinely represent the condition being sampled: pseudo-replication (see Hurlbert, 1984) is commonplace, e.g. analyses of sub-cores of single cores, or sets of spatially contiguous samples which are unrepresentative of the extent of a site.
For pseudo-replicates in a 2-way layout, the only valid course is to average them and carry out the above global test for the case of "no replication".
3) A point that cannot be overstressed is that ANOSIM tests only apply to groups of samples specified prior to seeing (or collecting) the data. A dangerous misconception is that one can use a cluster analysis of the species abundance data to define sample groupings, whose statistical validity can be established by performing an ANOSIM test for differences between these groups. This is entirely erroneous, the argument being completely circular. The only safe course here is to use this first set of data to define the groups of interest, i.e. to specify the hypothesis, and then to collect a further set of data to test that hypothesis.

CHAPTER 7: SPECIES ANALYSES

SPECIES CLUSTERING AND MDS

Chapter 2 (page 2-6) describes how the original data matrix can be used to define similarities between every pair of species; two species are thought of as "similar" if their numbers (or biomass) tend to fluctuate in parallel across sites. The resulting species similarity matrix can be input to a cluster analysis or ordination in exactly the same way as for sample similarities.

Fig. 7.1 displays the results of a cluster analysis on the Exe estuary nematode data {X} first seen in Chapter 5. The dendrogram is based on Bray-Curtis similarities computed on standardised abundances, as given in equations (2.9) and (2.10). Following the recommendations on page 2-6, the number of species was first reduced, retaining only those that accounted for more than 4% of the total abundance at any one site. Cluster analysis with a greater number of species is possible but the "hit-and-miss" occurrence of the rarer species across the sites tends to confuse the picture. In fact, at a similarity of around 10%, the dendrogram divides fairly neatly into 5 clusters of species, and these groups can be identified with the 5 clusters that emerge from the sample dendrogram, Fig. 5.3. (This identification comes simply from categorising the species by the site groups in which they have the greatest abundance; the correspondence between site and species groupings on this basis is seen to be very close.)

Fig. 7.2 shows the 2-dimensional MDS plot of the same species similarities. The groups determined from the cluster analysis are superimposed and indicate a good measure of agreement. However, both clustering and MDS have worked well here because the sites are strongly grouped, with many species characteristic of only one site group. Typically, species cluster analyses are less clearly delineated than this and the corresponding MDS ordinations have high stress. A more informative approach is often to concentrate on the sample similarities and highlight the species principally responsible for determining the sample groupings in the cluster or ordination analyses.

Fig. 7.1. Exe estuary nematodes {X}. Dendrogram from standardised abundances of the 57 most important species (from an original list of 182). The 5 groups defined at an arbitrary similarity level of 10% are indicated.

Fig. 7.2. Exe estuary nematodes {X}. MDS of the species similarities of Fig. 7.1, with the 5 groups from the cluster analysis indicated.

DETERMINING DISCRIMINATING SPECIES

With a wide range of sophisticated multivariate techniques at one's disposal, it is all too easy to lose sight of the original data; full understanding requires the data matrix to be re-examined in the light of the multivariate results. In its original form, it can be difficult to trace patterns in the data matrix (indeed, this is the rationale for multivariate analysis in the first place) but a simple re-ordering of columns (samples) and rows (species) can be an effective way of displaying groupings or gradual changes in species composition.

This is illustrated by the data on Bristol Channel zooplankton {B} met in Chapter 3. The 57 sites are re-ordered according to the four groupings in the dendrogram (Fig. 3.3), and the 24 species re-ordered according to their approximate placement across the species MDS. The abundance values are coded on a log scale, with single-character codes for counts up to 10,000 and for counts above 10,000, and with dots denoting absences. The resulting matrix (Fig. 7.3) is a highly succinct presentation of the raw data, with virtually no loss of information content. The log scale, which ensures narrower interval widths for smaller counts, reflects typical sampling variability for abundance data and matches the likely multivariate analysis. (Ordination of the decoded data, using interval means, is indistinguishable from that based on original abundances.)

Fig. 7.3. Bristol Channel zooplankton {B}. Re-ordered data matrix for the 24 species and 57 sites, with abundances coded on a log scale.

The effect of the re-ordering here is to concentrate the higher abundances in the diagonal region of the matrix, and it is then relatively easy to identify species which have characteristically different abundance levels between (say) sample groups 1 and 2 (e.g. species 6, 14, 23, 18, 3).
However, for a matrix with larger numbers of species and a less satisfactory species ordination, a more automatic, analytical procedure for identifying influential species is preferable, as follows.

Similarity breakdown

The fundamental information on the multivariate structure of an abundance matrix is summarised in the Bray-Curtis similarities between samples, and it is by disaggregating these that one most precisely identifies the species responsible for particular aspects of the multivariate picture. So, first compute the average ...

Limitations of the method

The SIMPER procedure has two main constraints which, to some extent, limit its usefulness:
a) It applies only to Bray-Curtis dissimilarities, whereas one might legitimately want to examine the influence of particular variables in a more general case, e.g. when the variables are not species abundances but environmental measures, and the dissimilarity coefficient is not Bray-Curtis but Euclidean distance.
b) It compares two groups of samples at a time, identifying the influential species only for each specific comparison. Some multivariate patterns, however, are not so readily categorised but represent a continuum of community change in response to one or more underlying gradients.
What is needed here is a more holistic technique, identifying the set of influential species which, between them, capture the full multivariate pattern (whether clustered or forming a gradation), and which operates with any appropriately-defined similarity coefficient. A possible method is suggested in a later chapter (16) on comparing multivariate patterns.

RECOMMENDATIONS

A multivariate display of the samples, either by an ordination or a cluster analysis, is not the end point of a community analysis; it should be seen as a framework within which the patterns of individual species abundances can be interpreted.

1) This may be by simple re-examination of the data matrix, re-ordered and re-presented (perhaps averaged within groups) in the light of the information from the multivariate analysis.
2) In the case of a convincing clustering of samples, individual species contributions to the separation of the groups can be examined with the SIMPER procedure. Note that this is not a statistical testing framework, just an exploratory analysis. It indicates which species are principally responsible either for an observed clustering pattern or for differences between sets of samples that have been defined a priori and are confirmed to differ in community structure by the tests of Chapter 6.
3) Species identified in this manner (or by the more general pattern-matching procedures discussed in Chapter 16) are sometimes viewed most effectively in conjunction with the ordination. One at a time, they can be superimposed on an MDS (or PCA) plot, as circles whose varying diameters reflect the abundance changes for that species across samples (see, for example, Fig 183).¶

¶ The PRIMER MDS and PCA routines allow for straightforward superimposition of individual variables, either from the same matrix as that used to create the ordination or from an independent matrix (e.g. of environmental variables).

CHAPTER 8: DIVERSITY MEASURES, DOMINANCE CURVES AND OTHER GRAPHICAL ANALYSES

UNIVARIATE MEASURES

A variety of different statistics (single numbers) can be used as measures of some attribute of community structure in a sample.
These include the total number of individuals (N), total number of species (S), the total biomass (B), and also ratios such as B/N (the average size of an organism in the sample) and N/S (the average number of individuals per species). Abundance or biomass totals (or averages) are not dimensionless quantities so tend to be less informative than diversity indices, such as: richness of the sample, in terms of the number of species for a given number of individuals; and dominance, in the way in which the total number of individuals in the sample is divided up among the different species (technically referred to as the species abundance distribution).

Diversity indices

The main aim is to reduce the multivariate (multi-species) complexity of assemblage data into a single index (or small number of indices) evaluated for each sample, which can then be handled statistically by univariate analyses. It will often be possible to apply standard normal-theory tests (t-tests and ANOVA) to such derived indices, possibly after transformation (see page 61).

A bewildering variety of diversity indices has been used, in a vast literature on the subject, and some of the most frequently used candidates are listed below. [Footnote, partly illegible in the source: the PRIMER DIVERSE routine permits selection of a subset from ...] More detail can be found in two (of several) overviews aimed specifically at the biological reader, Heip et al (1988) and Magurran (1991). It should be noted, however, that diversity indices of this type tend to exploit some combination of just two features of the sample information:

1) Species richness. This measure is either simply the total number of species present or some adjusted form which attempts to allow for differing numbers of individuals.
Obviously for samples which are strictly comparable, we would consider a sample containing more species than another to be the more diverse.

2) Equitability. This expresses how evenly the individuals are distributed among the different species, and is often termed evenness. For example, if two samples each comprising 100 individuals and four species had species abundances of 25, 25, 25, 25 and 97, 1, 1, 1, we would intuitively consider the former to be more diverse although the species richness is the same. The former has high evenness, but low dominance (essentially the reverse of evenness), while the latter has low evenness and high dominance (the sample being highly dominated by one species).

Different diversity indices emphasise the species richness or equitability components of diversity to varying degrees. The most commonly used diversity measure is the Shannon (or Shannon-Wiener) diversity index:

$$H' = -\sum_i p_i \log(p_i) \qquad (8.1)$$

where $p_i$ is the proportion of the total count (or biomass etc.) arising from the $i$th species. Note that logarithms to the base 2 are sometimes used in the calculation, reflecting the index's genesis in information theory. There is, however, no natural biological interpretation here, so the more usual natural logarithm (to the base e) is probably preferable, and commonly used. Clearly, when comparing published indices it is important to check that the same logarithm base has been used in each case. If not, it is simple to convert between results since $\log_2 x = (\log_e x)/(\log_e 2)$, i.e. all indices just need to be multiplied or divided by a constant factor. Whether it is sensible to compare $H'$ across different studies is another matter, since Chapter 17 shows that, like many of the indices given here (Simpson being a notable exception, Fig. 17.1), it can be sensitive to the degree of sampling effort. Hence $H'$ should only be compared across equivalent sampling designs.

Species richness

Species richness is often given simply as the total number of species (S), which is obviously very dependent on sample size (the bigger the sample, the more species there are likely to be). Alternatively, Margalef's index (d) is used, which also incorporates the total number of individuals (N) and is a measure of the number of species present for a given number of individuals:

$$d = (S - 1)/\log N \qquad (8.2)$$

Equitability

This is often expressed as Pielou's evenness index:

$$J' = H'/H'_{max} = H'/\log S \qquad (8.3)$$

where $H'_{max}$ is the maximum possible value of Shannon diversity, i.e. that which would be achieved if all species were equally abundant (namely, $\log S$).

Simpson

Another commonly used measure is the Simpson index, which has a number of forms:

$$\lambda = \sum_i p_i^2$$
$$1 - \lambda = 1 - \sum_i p_i^2$$
$$\lambda' = \frac{\sum_i N_i (N_i - 1)}{N(N - 1)} \qquad (8.4)$$
$$1 - \lambda' = 1 - \frac{\sum_i N_i (N_i - 1)}{N(N - 1)}$$

where $N_i$ is the number of individuals of species $i$. The index $\lambda$ has a natural interpretation as the probability that any two individuals from the sample, chosen at random, are from the same species ($\lambda$ is always $\le 1$). It is a dominance index, in the sense that its largest values correspond to assemblages whose total abundance is dominated by one, or a very few, of the species present. Its complement, $1 - \lambda$, is thus an equitability or evenness index, taking its largest value (of $1 - S^{-1}$) when all species have the same abundance. The slightly revised forms $\lambda'$ and $1 - \lambda'$ are appropriate when total sample size (N) is small (they correspond to choosing the two individuals at random without replacement rather than with replacement). As with Shannon, Simpson diversity can be employed when the $\{p_i\}$ come from proportions of biomass, standardised abundance or other data which are not strictly integral counts but, in that case, the $\lambda'$ and $1 - \lambda'$ forms are not appropriate.

Other count-based measures

Further well-established indices include that of Brillouin (see Pielou, 1975):

$$H = N^{-1} \log_e \{ N! / (N_1! \, N_2! \cdots N_S!) \} \qquad (8.5)$$

and a further model-based description, Fisher's $\alpha$ (Fisher et al, 1943), which is the shape parameter, fitted by maximum likelihood, under the assumption that the species abundance distribution follows a log series.
This has certainly been shown to be the case for some ecological data sets but can by no means be universally assumed, and (as with Brillouin) its use is clearly restricted to genuine (integral) counts.

The final option in this category is the rarefaction method of Sanders (1968) and Hurlbert (1971), which, under the strict assumption that individuals arrive in the sample independently of each other, can be used to project back from the counts of total species (S) and individuals (N), how many species ($ES_n$) would have been "expected" had we observed a smaller number ($n$) of individuals:

$$ES_n = \sum_{i=1}^{S} \left[ 1 - \frac{(N - N_i)! \, (N - n)!}{(N - N_i - n)! \, N!} \right] \qquad (8.6)$$

The idea is thereby to generate an absolute measure of species richness, say $ES_{100}$ (the number of different species "expected" in a sample of 100 individuals), which can be compared across samples of very differing sizes. It must be admitted, however, that the independence assumption is practically unrealistic. It corresponds to individuals from each species being spatially randomly distributed, giving rise to independent Poisson counts in replicate samples. This is rarely observed in practice, with most species exhibiting some form of spatial clustering, which can often be extreme. Rarefaction will then be strongly biased, consistently overestimating the expected number of species for smaller sample sizes.

Hill numbers

Finally, Hill (1973b) proposed a unification of several diversity measures in a single statistic, which includes as special cases:

$$N_1 = \exp(H'), \qquad N_2 = 1 / \sum_i p_i^2, \qquad N_\infty = 1 / \max_i \{p_i\} \qquad (8.7)$$

$N_1$ is thus a transform of Shannon diversity, $N_2$ the reciprocal of Simpson's $\lambda$, and $N_\infty$ is clearly another possible evenness index, taking larger values if no species dominates the total abundance. Other variations on these Hill numbers are given by Heip et al (1988).

Units of measurement

The numbers of individuals belonging to each species are the most common units used in the calculation of the above indices. For internal comparative purposes other units can sometimes be used, e.g. biomass or total cover of each species along a transect or in quadrats (e.g. for hard-bottom epifauna), but obviously diversity measures using different units are not comparable. Often, on hard bottoms where colonial encrusting organisms are difficult to enumerate, percent cover will be much more realistic to determine than species abundances.

Representing communities

Changes in univariate indices between sites or over time are usually presented graphically, simply as plots of means and confidence intervals for each site or time. For example, Fig. 8.1 graphs the differences in diversity of the macrobenthos and meiobenthic nematodes at six stations in Hamilton Harbour, Bermuda, showing that there are clear differences in diversity between sites for the former but much less obvious differences for the latter. Fig. 8.2 graphs the temporal changes in three univariate indices for reef corals at South Tikus Island, Indonesia, spanning the period of the 1982-3 El Niño (an abnormally long period of high water temperatures which caused extensive coral bleaching in many areas throughout the Pacific). Note the dramatic decline between 1981 and 1983 and subsequent partial recovery in both the number of species (S) and the Shannon diversity ($H'$), but no obvious changes in evenness ($J'$).

[Fig. 8.1. Hamilton Harbour, Bermuda. Diversity and 95% confidence intervals for macrofauna and nematodes.]

Discriminating sites or times

The significance of differences in univariate indices between sampling sites or times can simply be tested by one-way analysis of variance (ANOVA), followed by t-tests or (preferably) multiple comparison tests for individual pairs of sites; see the discussion at the start of Chapter 6.

Determining stress levels

Increasing levels of environmental stress have historically been considered to decrease diversity (e.g. $H'$), decrease species richness (e.g. d) and decrease evenness (e.g. $J'$), i.e. increase dominance.
This interpretation may, however, be an oversimplification of the situation. Subsequent theories on the influence of disturbance or stress on diversity have suggested that in situations where disturbance is minimal, species diversity is reduced because of competitive exclusion between species; with a slightly increased level or frequency of disturbance, competition is relaxed, resulting in an increased diversity; and then at still higher or more frequent levels of disturbance, species start to become eliminated by stress, so that diversity falls again. Thus it is at intermediate levels of disturbance that diversity is highest (Connell, 1978; Huston, 1979). Therefore, depending on the starting point of the community in relation to existing stress levels, increasing levels of stress (e.g. induced by pollution) may result in either an increase or a decrease in diversity. It is difficult, if not impossible, to say at what point on this continuum the community under investigation exists, or what value of diversity one might expect at that site if the community were not subjected to any anthropogenic stress. Thus, changes in diversity can only be assessed by comparisons between stations along a spatial contamination gradient (e.g. Fig. 8.1) or with historical data (Fig. 8.2).

[Footnote, partly illegible in the source: the DIVERSE routine can only compute the chosen set of indices for each sample ...]
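The univariate indices defined in this chapter (equations 8.1 to 8.7) can be sketched in a few lines of code. The following is a minimal Python illustration only, not the PRIMER DIVERSE routine: the function names are invented for this example, natural logarithms are assumed throughout, and the two test samples are the 25, 25, 25, 25 and 97, 1, 1, 1 abundance sets from the equitability example earlier in the chapter.

```python
import math

def shannon(counts, base=math.e):
    """Shannon diversity H' = -sum p_i log(p_i), eq. (8.1)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

def margalef(counts):
    """Margalef richness d = (S - 1) / log N, eq. (8.2)."""
    s = sum(1 for c in counts if c > 0)
    return (s - 1) / math.log(sum(counts))

def pielou(counts):
    """Pielou evenness J' = H' / log S, eq. (8.3)."""
    s = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(s)

def simpson(counts, without_replacement=False):
    """Simpson lambda, or lambda' if without_replacement=True, eq. (8.4).
    The lambda' form is only meaningful for genuine integer counts."""
    n = sum(counts)
    if without_replacement:
        return sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return sum((c / n) ** 2 for c in counts)

def brillouin(counts):
    """Brillouin H = (1/N) log(N! / prod N_i!), eq. (8.5), via log-gamma."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)) / n

def rarefaction(counts, n):
    """Expected species ES_n in a subsample of n individuals, eq. (8.6).
    The factorial ratio equals C(N - N_i, n) / C(N, n)."""
    total = sum(counts)
    return sum(1 - math.comb(total - c, n) / math.comb(total, n)
               for c in counts if c > 0)

def hill_numbers(counts):
    """Hill numbers N1, N2 and N_infinity, eq. (8.7)."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    return math.exp(shannon(counts)), 1 / sum(x * x for x in p), 1 / max(p)

even = [25, 25, 25, 25]    # high evenness, low dominance
uneven = [97, 1, 1, 1]     # low evenness, high dominance

for name, sample in (("even", even), ("uneven", uneven)):
    print(name, round(shannon(sample), 3), round(pielou(sample), 3),
          round(simpson(sample), 4), round(margalef(sample), 3))
```

Both samples have the same S and N, so Margalef's d is identical for the two, while Shannon, Pielou and Simpson separate them; this is the distinction between the richness and equitability components described above.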
