MINIREVIEW

A guide to statistical analysis in microbial ecology: a
community-focused, living review of multivariate data analyses
Pier Luigi Buttigieg1,2,3 & Alban Ramette1,4
1HGF-MPG Group for Deep Sea Ecology and Technology, Bremerhaven, Germany; 2Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany; 3MARUM Center for Marine Sciences, Bremen, Germany; and 4Max Planck Institute for Marine Microbiology, Bremen, Germany

Abstract

The application of multivariate statistical analyses has become a consistent feature in microbial ecology. However, many microbial ecologists are still in the process of developing a deep understanding of these methods and appreciating their limitations. As a consequence, staying abreast of progress and debate in this arena poses an additional challenge to many microbial ecologists. To address these issues, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME): a dynamic, web-based resource providing accessible descriptions of numerous multivariate techniques relevant to microbial ecologists. A combination of interactive elements allows users to discover and navigate between methods relevant to their needs and examine how they have been used by others in the field. We have designed GUSTA ME to become a community-led and -curated service, which we hope will provide a common reference and forum to discuss and disseminate analytical techniques relevant to the microbial ecology community.

Correspondence: Pier Luigi Buttigieg, HGF-MPG Group for Deep Sea Ecology and Technology, Max Planck Institute for Marine Microbiology, Celsiusstrasse 1, 28359 Bremen, Germany. Tel.: +49 421 2028 984; fax: +49 421 2028 690; e-mail: pbuttigi@mpi-bremen.de

Present address: Alban Ramette, Institute of Social and Preventive Medicine (ISPM), University of Bern, Finkenhubelweg 11, 3012 Bern, Switzerland

Received 16 June 2014; revised 30 September 2014; accepted 6 October 2014. Final version published online 5 November 2014.

DOI: 10.1111/1574-6941.12437

Editor: Gerard Muyzer

Keywords
multivariate statistics; online resource;
interactive guide; complex data.

Introduction

Multivariate statistical analyses are typically used to summarise high-dimensional data, test hypotheses involving multiple response variables, and examine relationships between large sets of variables (Legendre & Legendre, 1998; Härdle & Simar, 2007). The use of multivariate analyses is supplanting ‘simple’ descriptive analyses across ecology (see James & McCulloch, 1990 and Økland, 2007 for comment) and has become common in microbial ecology, where complex, multidimensional data sets abound (e.g. Ramette, 2007; Bertics & Ziebis, 2009; Frossard et al., 2012; Thioulouse et al., 2012; Hartmann et al., 2013; Rivers et al., 2013). Indeed, numerous software tools used by microbial ecologists implement multivariate analysis techniques and have been recommended as standard components of, for example, microbiome analysis (Kuczynski et al., 2012) and environmental studies (e.g. Zinger et al., 2012). Notable examples include the mothur software (Schloss et al., 2009), the Quantitative Insights Into Microbial Ecology (QIIME) platform (Caporaso et al., 2010), the phyloseq package (McMurdie & Holmes, 2013) and the Biodiversity Virtual e-Laboratory (BioVeL; http://www.biovel.eu/) project. While such developments may lead one to conclude that standard statistical recipes and ‘workflows’ now exist for microbial ecology data, it is vital to recognise that gauging the appropriateness of a given technique to the data and phenomena under investigation is not necessarily a ‘cut and dried’ affair.

Firstly, it is essential to recognise that the application of statistical techniques to ecological data is the focus of a living field of study: numerical ecologists and statisticians routinely re-evaluate the properties and limitations of even well-known techniques in relation to ecological

FEMS Microbiol Ecol 90 (2014) 543–550 © 2014 The Authors. FEMS Microbiology Ecology
published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use,
distribution and reproduction in any medium, provided the original work is properly cited.

needs. For example, Legendre (2005b) recently re-examined the value of the Kendall coefficient of concordance (W) in determining species associations in field survey data. Treating species as the ‘judges’ native to W’s conceptual formulation allows the identification of species groups with similar ‘opinions’ (gauged by their variable values) which may be used as indicators of a given ecological phenomenon. However, Legendre describes several important caveats to the statistic’s use in ecology, as not all variables are suited to its assumptions. Similarly, Warton & Hudson (2004) compared the effectiveness of the well-known multivariate analysis of variance (MANOVA) to approaches that rely on the calculation of dissimilarities between sampling units rather than analyse abundance data directly. These authors present a developed case suggesting that the use of dissimilarity-based approaches should be questioned and that alternatives may bring several advantages in generalisation and extensibility.

Aside from re-evaluation, proposals of new techniques and adaptations of existing techniques are steadily encountered. One example features the work of Borcard & Legendre (2002), who proposed a variant of the well-known principal coordinates analysis to detect and characterise spatial structures in ecological data across all scales. In response to these authors’ call for more thorough mathematical appraisal of their technique, Dray et al. (2006) developed supporting theory and connected the original method to a broader set of autocorrelation functions. For example, Anderson (2001) developed a nonparametric multivariate analysis of variance approach which is argued to be better-suited to ecological data while Zou et al. (2006) proposed a form of principal components analysis suited to the sparse data sets generated by, for example, genomic sequencing technologies. Approaches to meaningfully transform ecological data sets for ordination (Legendre & Gallagher, 2001), new ordination approaches (e.g. Pavoine et al., 2004) and methods that systematically assess the impact of rare phylotypes on analytical results (Gobet et al., 2010) provide other examples of relatively recent developments in ecologically oriented multivariate analysis. As they emerge, new techniques which show promise in an empirical setting often require review from expert statisticians to be fully understood.

Secondly, users must be aware of the key debates that emerge in the multivariate analysis of ecological data. For example, a multi-year discussion concerning the analysis of beta diversity using distance-based and ‘raw data’ approaches recently unfolded in the journal Ecology (Legendre, 2005a; Tuomisto & Ruokolainen, 2006; Laliberte, 2008; Legendre et al., 2008; Pelissier et al., 2008). Distance and dissimilarity measures, such as the well-known Bray–Curtis dissimilarity or Jaccard index, are conceptually appealing as they can address issues such as the handling of the double zero problem: accounting for the fact that observed absences (or zero abundances) of several ecological entities across the same sampling units are not necessarily indicators of similarity between those entities. However, the use of these measures introduces dependencies between objects (e.g. sites, samples, or experimental units) which may violate key assumptions of regression-type analyses and may not deliver as much power as an examination of ‘raw’ presence–absence or abundance data. For example, Warton et al. (2012) demonstrated that (dis)similarity-based methods confound the mean–variance relationships characteristic of abundance (or other count-based) data. These authors call for greater emphasis to be given to model-based approaches, citing methods based on generalised estimating equations (Warton, 2011) and an original method named constrained additive ordination (Yee, 2006) as examples.

Similar debate also surrounds aspects of experimental and sampling design, such as the issue (or, as some contend, nonissue) of pseudoreplication in ecological investigations (Hurlbert, 1984, 2004, 2009; Oksanen, 2001; Cottenie & De Meester, 2003; Coss, 2009; Koehnle & Schank, 2009; Schank & Koehnle, 2009; Prosser, 2010). While some insist replication of treatments (or environmental contexts) across ‘truly’ independent sampling or experimental units must occur to draw valid conclusions, others argue that this may not be an achievable, or even necessary, goal in new ecological investigations. Zinger et al. (2012) underscored this issue as well as its connection to the use of new statistical techniques in the field of aquatic microbial ecology. The contemporary and faceted nature of such debates presents another challenge to the effective and duly cautious application of powerful analytical methods in microbial ecology.

Lastly, the harmonisation of canonical ecological theory with microbial ecology is ongoing (e.g. Prosser et al., 2007) and faces the challenge of keeping pace with new molecular techniques, sequencing technologies and ecological sampling strategies both on global (Rusch et al., 2007; Karsenti et al., 2011) and on local scales (e.g. Böer et al., 2009). From the above examples, it is clear that users of multivariate statistical techniques in microbial ecology must stay abreast of a steadily developing body of work involving a wide range of expertise, to make informed methodological choices.

The popularity of multivariate analyses is continuing to increase and their application to microbial ecological data has become technically simplified; however, a deep and up-to-date understanding of their properties and limitations is still not widespread in the community. As a result, many microbial ecologists who are not equipped with deep numerical training face a ‘black box’ approach

to multivariate analysis and the associated risks of misapplying techniques or misinterpreting results. Reviewers, too, often face uncertainty in evaluating whether researchers have performed appropriate analyses and produced fair interpretations of their results.

To support and promote the constantly developing understanding of multivariate analyses in microbial ecology, we present the GUide to STatistical Analysis in Microbial Ecology (GUSTA ME, http://mb3is.megx.net/gustame) – an online, dynamically updated resource with content tailored to the needs of the microbial ecology community. We believe this resource will assist microbial ecologists in navigating the initially daunting field of multivariate analysis by directing them to techniques relevant to their investigations and interests through interactive interfaces.

GUSTA ME: a living reference for multivariate statistics

Periodic reviews of multivariate statistics targeting the uninitiated life scientist (e.g. Ramette, 2007; Jombart et al., 2009) are helpful primers for microbial ecologists; however, they must be limited in depth to achieve sufficient breadth. In contrast, seminal textbooks (e.g. Legendre & Legendre, 1998; Borcard et al., 2011; Legendre & Legendre, 2012) offer great depth and breadth, but emerge with low frequency and are rarely targeted to microbial ecologists, who are often confronted with data sets which require specific statistical treatment. We designed GUSTA ME as a compromise: a ‘living’, web-based, and community-reviewed resource containing descriptions of both established and novel multivariate techniques, specifically curated for their relevance to microbial ecology. GUSTA ME comprises a collection of high-level summaries of multivariate methods (henceforth, ‘end points’) which users may access (1) directly, (2) by following a series of questions presented by a ‘wizard’, (3) by following a ‘walkthrough’ which reflects the analytical procedures used in an existing study or (4) by browsing GUSTA ME’s visualisation library (Fig. 1). Below, these components of GUSTA ME as well as its community-led development model are described.

Fig. 1. A user may access or discover GUSTA ME’s high-level descriptions of multivariate methods (end points) in a number of ways: (a) should a user know of the method, they may directly navigate to the relevant end point through direct links or via a search function, (b) should the user require guidance, a wizard may be used to guide the user to an end point appropriate to their needs (see text for details), (c) users may browse through walkthroughs (see text) to observe how others have used multivariate methods and navigate to methods that interest them, or (d) users may browse GUSTA ME’s data visualisation library to select methods based on their visual output. Finally, (e) from selected end points, users have the option to launch a MASAME application to perform the featured method (as well as supplementary methods such as data transformations) on their own or preloaded data through an interactive, user-friendly web-page.

Reference pages as end points

GUSTA ME’s core comprises high-level descriptions of a range of multivariate techniques. As these reference pages are arrived at through user interaction, we refer to them as ‘end points’. End points avoid technical and formalised mathematical descriptions as far as possible, aiming instead to clarify the conceptual basis of each method, its limitations, and the meaning of its results. Classical techniques, such as canonical correspondence analysis (CCA) and nonmetric multidimensional scaling (NMDS), are included as well as techniques which are relatively new or show significant potential in the field. Examples of the latter category include distance-based redundancy analysis (db-RDA, Legendre & Anderson, 1999) and principal coordinates of neighbour matrices (PCNM, Borcard & Legendre, 2002). Each end point describes the main principles of these methods as well as their key assumptions and output. Commentary on the statistical and ecological interpretation of each endpoint’s results is also included as well as warnings emphasising common pitfalls associated with each method. General warnings which refer to common risks (e.g. multicollinearity, multiple testing, data dredging) are explained at greater length in dedicated pages which are linked to end points and intervene, as appropriate, during the course of ‘wizards’ (see below). Where appropriate, GUSTA ME also discusses the debates noted above – such as that surrounding the issue of pseudoreplication – at greater length. Links to references pertinent to each method are provided on their respective description page. Figure 2 illustrates how a user may navigate to the various

components of the guide from a given endpoint.

Fig. 2. End points are linked to relevant material across GUSTA ME. As an example, the end point for principal components analysis (PCA, a) is shown linked to: (b) end points of related techniques such as those for principal coordinates analysis (PCoA) and redundancy analysis (RDA), (c) pages describing warnings that are associated with the technique, (d) walkthroughs (see text for description) that feature the method described, and (e) links to relevant literature. Further, each page linked to a given end point is also linked to other material such as (f) relatively new or advanced techniques and (g) potential approaches to contend with warnings. Online applications in the MASAME suite (see text), which allow users to apply the method to their own data, are launchable from selected pages across GUSTA ME (here, indicated by an asterisk).

Wizards

The interactivity of GUSTA ME is primarily offered through ‘wizards’: user-interface agents that partition difficult or complex tasks into a linear series of comparatively simple steps. User input determines the succession of these steps and the outcome of the overall task (Dryer, 1997). Dryer (1997) noted that wizards are best suited to tasks whose outcome can be determined by following a predetermined prescription or recipe. Consequently, GUSTA ME’s wizards comprise a hierarchical succession of simple questions which approximate the decision-making process of a data analyst. Dependent on their answers, users will be directed to consider a technique or set of techniques which would best match their needs. GUSTA ME’s wizards are only able to suggest a single end point when there is a (relatively) clear prescription for an analytical problem. When this is not the case or when the answers required of the user are too technical in nature (i.e. they presuppose knowledge which the target users of the guide are not expected to have), a GUSTA ME wizard will present a brief description of methods that may suit the user’s needs and will link to their end points. It is then left to the user to familiarise themselves with the techniques suggested and make an informed choice or to interact with other users via GUSTA ME’s community forum (see below). Wizards will be adapted as new end points are added to GUSTA ME or in response to community input.

Example-based learning through ‘walkthroughs’

Peer-reviewed studies in microbial ecology which have employed multivariate techniques serve as important exemplars for the community. A section of GUSTA ME is dedicated to the capture of such exemplars and the visualisation of their analytical methodology as approachable, interactive flowcharts dubbed ‘walkthroughs’. Key steps or methods included in the walkthrough are linked to the relevant end point(s), connecting users to GUSTA ME’s content and curated reference material. This collection of methodological summaries provides an opportunity for microbial ecologists to examine, in an example-based manner, under what circumstances and with what forms of data multivariate analyses have been used by the community. Sections of GUSTA ME’s community forums are dedicated to the discussion of these walkthroughs, and users may contribute their own walkthroughs to the guide.

Visualisation libraries

Data visualisation is a major outcome of many multivariate analyses and is instrumental in rendering high-dimensional data into a form that humans can grasp. Graphs, charts and plots native to many multivariate techniques are designed to be readily interpretable by analysts and nonanalysts alike. The recognition of an effective visualisation may occur without deep knowledge of the underlying mathematical basis of a technique. GUSTA ME features a library of visualisations which may be browsed to guide users to a technique or family of techniques which may deliver a useful representation of their data. Visualisations link to an appropriate end point or, when user input is required, to a wizard, allowing users to deepen and broaden their understanding.

Analysis applications – the MASAME suite

Selected pages across GUSTA ME include links to interactive analysis applications which allow users to perform the technique or procedure discussed on that page, either on their own data sets (which may be uploaded as comma-separated-value files) or on preloaded

example data. Collectively, we refer to these applications as the Multivariate AnalysiS Applications for Microbial Ecology (MASAME) suite. MASAME applications are accessed through user-friendly web-pages, rendered by the shiny package (RStudio Inc., 2014), which call upon numerous functions from well-known packages belonging to the statistical programming environment and language, R (R Development Core Team, 2014). For example, (partial) RDA, (partial) CCA, NMDS, and PCNM methods from the vegan package (Oksanen et al., 2013) are combined with supporting functions which allow data transformations using standard and ecologically meaningful methods (after Legendre & Gallagher, 2001), plotting, and download functionality on a single webpage, among other features. Users need not know the R language, however; point-and-click interfaces are common to all MASAME applications. Such tools add a practical complement to GUSTA ME’s review of multivariate analysis techniques and are easily enhanced to address new needs as they arise. This will be particularly useful in popularising less well-known techniques that are better-suited to specific scenarios in microbial ecology relative to more classical methods.

Community involvement and development

As described above, statistical methods relevant to microbial ecologists are constantly emerging, either through novel development or through adaptation from other domains. A single working group is likely to overlook or only partly represent developments which may be of great use or importance. Thus, GUSTA ME and MASAME are linked to on-line forums where users may comment on their content, post critiques, suggest revisions, discuss the methods featured, and note alternative views. User input will allow these resources to grow based on the needs of the microbial ecology community and will present a gateway for new contributors, moderators, and editors with additional analytical expertise to join and enhance the guide. By providing such a service, we hope to foster both a community-curated reference and a forum for microbial ecologists to share their evaluations of multivariate techniques. We hope that as consensuses are reached, alternatives put forth, and gaps in the domain’s analytical and theoretical repertoire highlighted, GUSTA ME will serve to encourage analytical consistency and transparency across microbial ecology.

Usage examples

Below, we describe three usage scenarios of GUSTA ME from the perspectives of a doctoral student in search of a method to explore their multivariate data, a principal investigator formulating a project proposal, and a reviewer harbouring concerns about a manuscript’s analytical methods.

The student

A doctoral student wishes to explore a priori groupings in a data set containing sampling sites as objects and OTU relative abundances as variables. However, the student is unsure where to begin. Using the ‘Explore data’ starting point on GUSTA ME’s home page, the student enters a wizard and is prompted to consider screening their data for the presence of outliers, multicollinearity, and (multivariate) normality (see e.g. Zuur et al., 2010). The student navigates to description pages for these tasks and is able to use applications from the MASAME suite to evaluate and preprocess their data. With their data screened and appropriately preprocessed, the student then proceeds to the data exploration wizard. The wizard presents a warning page describing the unfortunately all-too-common mistake of ‘data dredging’ or ‘P-hacking’ (Nuzzo, 2014). Suitably warned, the student then proceeds and, after determining that the differences between their samples are of prime interest, chooses the path of analysis based on (dis)similarity matrices. A page is presented which lists and briefly describes the aims of several (dis)similarity-based ordination, clustering, and hypothesis testing methods. The student decides to attempt an NMDS ordination complemented with an Analysis of Similarity (ANOSIM, Clarke, 1993) hypothesis test (avoiding data dredging) and navigates to these methods’ end points. Somewhat uncertain about the nature of (dis)similarity measures, the student follows a link on both the end points and the exploration wizard to an end point and another wizard dealing with (dis)similarity measures. Through this excurse, the student is able to make an informed choice regarding the most appropriate measure to use with their (dis)similarity-based method of choice. Returning to the NMDS and ANOSIM end points and using them much like short review articles, the student becomes familiar with the requirements and limitations of the candidate methods. Satisfied that the candidate methods are appropriate, the student launches the methods’ MASAME applications via links on each end point. They upload their data and, following the advice in each end point, interactively adjust the methods’ parameters to suitable values. As the results look promising, the student explores some of the relevant literature listed on each methods’ end point to deepen their understanding. The student also browses GUSTA ME’s community forum to familiarise themselves with frequently asked questions concerning how their

method of choice is best applied to microbial ecology data. Ultimately, the student publishes a study skilfully employing multivariate analyses and contributes a walkthrough based on their study to GUSTA ME.

The principal investigator

A principal investigator, who has an introductory familiarity with multivariate statistical methods, is designing an investigation to assess whether energy availability drives microbial community change in a little-studied environment or whether changes are simply a function of spatial distance. Curious as to which multivariate methods others have used to approach similar questions, the PI browses GUSTA ME’s walkthroughs and interacts with their components to quickly learn more. Encountering walkthroughs based on Bienhold et al. (2012) and Kopp et al. (2012) which feature the use of RDA, variation partitioning (Legendre, 2005a, 2007; Peres-Neto et al., 2006), path analysis (Wright, 1934), and PCNM, the PI familiarises themselves with the interplay of the central and ancillary methods involved in these studies and, supported by GUSTA ME’s end points and warnings, augments their initial project proposal. In particular, the PI realises the central importance of aligning their sampling design and replication strategy to their proposed analytical approaches in order to arrive at valid conclusions. The PI also feels that several important methods are not noted in GUSTA ME, and uses GUSTA ME’s community forum to post requests for enhancement and adds contact details of relevant experts. The GUSTA ME editors contact these experts and invite them to create and manage GUSTA ME end points and wizards aligned with their expertise.

The reviewer

A reviewer is unsure about the appropriateness of a parametric multivariate hypothesis test, multivariate analysis of variance (MANOVA), in a manuscript. The reviewer uses GUSTA ME’s search function to locate the relevant end point and ensures that the manuscript adequately reports if the method’s key assumptions have been met. The reviewer finds that the authors have not reported if their data have been appropriately transformed to meet the assumptions of near-normality, linearity, and homogeneity of covariances. Further, the authors have not reported if they have screened their data for outliers. Fortunately, the study’s authors have made their data available for review. The reviewer downloads the data, uploads it to the MASAME data screening applications, and concludes that it is very unlikely that this parametric hypothesis test can be applied in its basic form. Further, while reading the MANOVA end point, the reviewer is directed to an end point describing permutational, nonparametric MANOVA (NPMANOVA or PERMANOVA, Anderson, 2001). The reviewer suggests that the authors explore this method or rigorously justify their use of MANOVA.

Conclusion & outlook

GUSTA ME is an interactive ‘living’ review of multivariate analyses with specific relevance to the microbial ecology community. Its content offers an accessible resource for teaching and reference, while its implementation allows users to quickly locate and focus their efforts on analytical approaches pertinent to their investigations. We recognise that the current state of GUSTA ME is but a starting point for a more comprehensive solution; however, we are confident that, even in its current form, it will provide a useful resource for microbial ecologists. As it further develops, GUSTA ME has the potential to become a focal repository for accessible analytical knowledge and debate in microbial ecology, wherein methods that have become central to ecology, as well as their criticisms (e.g. Warton et al., 2012), may be easily explored. In the near future, GUSTA ME and MASAME will be integrated into the MicroB3 Information System (MicroB3 IS, www.microb3.eu) for further development. The MicroB3 IS, based on the megx.net web platform (Kottmann et al., 2010), will serve as a multicomponent information system for European marine microbial genomics, integrating genomic and environmental data with an array of tools and services for the global research community. GUSTA ME and MASAME will complement the system’s data management, integration, and processing modules by providing support in the analysis of integrated data; however, both resources may also be used independently. Guided by user input, we will implement the user-feedback mechanisms and editorial policies required to allow community-led development of this resource. We hope these efforts will promote the usage, discussion, and development of multivariate statistical approaches in microbial ecology and look forward to the involvement of the community in this endeavour.

Acknowledgements

This work is a component of the Micro B3 project and is funded by the European Union’s Seventh Framework Programme (Joint Call OCEAN.2011-2: Marine microbial diversity – new insights into marine ecosystems functioning and its biotechnological potential) under grant agreement no 287589. AR is funded by the Max Planck Society. The authors declare no conflict of interests.

References

Anderson MJ (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol 26: 32–46.
Bertics VJ & Ziebis W (2009) Biodiversity of benthic microbial communities in bioturbated coastal sediments is controlled by geochemical microniches. ISME J 3: 1269–1285.
Bienhold C, Boetius A & Ramette A (2012) The energy-diversity relationship of complex bacterial communities in Arctic deep-sea sediments. ISME J 6: 724–732.
Böer SI, Hedtkamp SIC, van Beusekom JEE, Fuhrman JA, Boetius A & Ramette A (2009) Time- and sediment depth-related variations in bacterial diversity and community structure in subtidal sands. ISME J 3: 780–791.
Borcard D & Legendre P (2002) All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecol Model 153: 51–68.
Borcard D, Legendre P & Gillet F (2011) Numerical Ecology with R (Gentleman R, Hornik K & Parmigiani GG, eds). Springer, New York.
Caporaso JG, Kuczynski J, Stombaugh J et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335–336.
Clarke KR (1993) Non-parametric multivariate analyses of changes in community structure. Austral Ecol 18: 117–143.
Coss RG (2009) Pseudoreplication conventions are testable hypotheses. J Comp Psychol 123: 444–446.
Cottenie K & De Meester L (2003) Comment to Oksanen (2001): reconciling Oksanen (2001) and Hurlbert (1984). Oikos 100: 394–396.
Dray S, Legendre P & Peres-Neto PR (2006) Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecol Model 196: 483–493.
Dryer DC (1997) Wizards, guides, and beyond. Proceedings of the 2nd International Conference on Intelligent User Interfaces – IUI ‘97 (Moore J, Edmonds E & Puerta A, eds), pp. 265–268. ACM Press, New York.
Frossard A, Gerull L, Mutz M & Gessner MO (2012) Disconnect of microbial structure and function: enzyme activities and bacterial communities in nascent stream corridors. ISME J 6: 680–691.
Gobet A, Quince C & Ramette A (2010) Multivariate Cutoff Level Analysis (MultiCoLA) of large community data sets. Nucleic Acids Res 38: e155.
Härdle W & Simar L (2007) Applied Multivariate Statistical Analysis, 2nd edn. Springer, Berlin, Heidelberg.
Hartmann M, Niklaus PA, Zimmermann S, Schmutz S, Kremer J, Abarenkov K, Lüscher P, Widmer F & Frey B (2013) Resistance and resilience of the forest soil microbiome to logging-associated compaction. ISME J 8: 226–244.
Hurlbert SH (1984) Pseudoreplication and the design of ecological field experiments. Ecol Monogr 54: 187–211.
Hurlbert SH (2004) On misinterpretations of pseudoreplication and related matters: a reply to Oksanen. Oikos 104: 591–597.
Hurlbert SH (2009) The ancient black art and transdisciplinary extent of pseudoreplication. J Comp Psychol 123: 434–443.
James F & McCulloch C (1990) Multivariate analysis in ecology and systematics: panacea or Pandora’s box? Annu Rev Ecol Syst 21: 129–166.
Jombart T, Pontier D & Dufour A-B (2009) Genetic markers in the playground of multivariate analysis. Heredity 102: 330–341.
Karsenti E, Acinas SG, Bork P et al. (2011) A holistic approach to marine eco-systems biology. PLoS Biol 9: e1001177.
Koehnle TJ & Schank JC (2009) An ancient black art. J Comp Psychol 123: 452–458.
Kopp D, Bouchon-Navaro Y, Louis M, Legendre P & Bouchon C (2012) Spatial and temporal variation in a Caribbean herbivorous fish assemblage. J Coast Res 278: 63–72.
Kottmann R, Kostadinov I, Duhaime MB, Buttigieg PL, Yilmaz P, Hankeln W, Waldmann J & Glöckner FO (2010) Megx.net: integrated database resource for marine ecological genomics. Nucleic Acids Res 38: D391–D395.
Kuczynski J, Lauber CL, Walters WA, Parfrey LW, Clemente JC, Gevers D & Knight R (2012) Experimental and analytical tools for studying the human microbiome. Nat Rev Genet 13: 47–58.
Laliberte E (2008) Analyzing or explaining beta diversity? Comment. Ecology 89: 3232–3237.
Legendre P (2005a) Analyzing beta diversity: partitioning the spatial variation of community composition data. Ecol Monogr 75: 435–450.
Legendre P (2005b) Species associations: the Kendall coefficient of concordance revisited. J Agric Biol Environ Stat 10: 226–245.
Legendre P (2007) Studying beta diversity: ecological variation partitioning by multiple regression and canonical analysis. J Plant Ecol 1: 3–8.
Legendre P & Anderson MJ (1999) Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol Monogr 69: 1–24.
Legendre P & Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271–280.
Legendre P & Legendre L (1998) Numerical Ecology, 2nd edn. Elsevier, Amsterdam.
Legendre P & Legendre L (2012) Numerical Ecology, 3rd edn. Elsevier, Amsterdam.
Legendre P, Borcard D & Peres-Neto P (2008) Analyzing or explaining beta diversity? Comment. Ecology 89: 3238–3244.
McMurdie PJ & Holmes S (2013) phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8: e61217.
Nuzzo R (2014) Scientific method: statistical errors. Nature 506: 150–152.

Økland RHR (2007) Wise use of statistical tools in ecological field studies. Folia Geobot 42: 123–140.
Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH & Wagner H (2013) vegan: Community Ecology Package. R package version 2.0-7. http://CRAN.R-project.org/package=vegan.
Oksanen L (2001) Logic of experiments in ecology: is pseudoreplication a pseudoissue? Oikos 94: 27–38.
Oksanen L (2004) The devil lies in details: reply to Stuart Hurlbert. Oikos 104: 598–605.
Pavoine S, Dufour A-B & Chessel D (2004) From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis. J Theor Biol 228: 523–537.
Pelissier R, Couteron P & Dray S (2008) Analyzing or explaining beta diversity? Ecology 89: 3227–3232.
Peres-Neto P, Legendre P, Dray S & Borcard D (2006) Variation partitioning of species data matrices: estimation and comparison of fractions. Ecology 87: 2614–2625.
Prosser JI (2010) Replicate or lie. Environ Microbiol 12: 1806–1810.
Prosser JI, Bohannan B & Curtis T (2007) The role of ecological theory in microbial ecology. Nat Rev Microbiol 5: 384–392.
R Development Core Team (2014) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org/
Ramette A (2007) Multivariate analyses in microbial ecology. FEMS Microbiol Ecol 62: 142–160.
Rivers AR, Sharma S, Tringe SG, Martin J, Joye SB & Moran MA (2013) Transcriptional response of bathypelagic marine bacterioplankton to the Deepwater Horizon oil spill. ISME J 7: 2315–2329.
RStudio Inc. (2014) shiny: Web Application Framework for R. R package version 0.10.1. http://CRAN.R-project.org/package=shiny.
Rusch DB, Halpern AL, Sutton G et al. (2007) The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 5: e77.
Schank JC & Koehnle TJ (2009) Pseudoreplication is a pseudoproblem. J Comp Psychol 123: 421–433.
Schloss PD, Westcott SL, Ryabin T et al. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537–7541.
Thioulouse J, Prin Y & Duponnois R (2012) Multivariate analyses in soil microbial ecology: a new paradigm. Environ Ecol Stat 19: 499–520.
Tuomisto H & Ruokolainen K (2006) Analyzing or explaining beta diversity? Understanding the targets of different methods of analysis. Ecology 87: 2697–2708.
Tuomisto H & Ruokolainen K (2008) Analyzing or explaining beta diversity? Reply. Ecology 89: 3244–3256.
Warton DI (2011) Regularized sandwich estimators for analysis of high-dimensional data using generalized estimating equations. Biometrics 67: 116–123.
Warton DI & Hudson HM (2004) A MANOVA statistic is just as powerful as distance-based statistics for multivariate abundances. Ecology 85: 858–874.
Warton DI, Wright ST & Wang Y (2012) Distance-based multivariate analyses confound location and dispersion effects. Methods Ecol Evol 3: 89–101.
Wright S (1934) The method of path coefficients. Ann Math Stat 5: 161–215.
Yee TW (2006) Constrained additive ordination. Ecology 87: 203–213.
Zhou J, Jiang Y-H, Deng Y, Shi Z, Zhou BY, Xue K, Wu L, He Z & Yang Y (2013) Random sampling process leads to overestimation of β-diversity of microbial communities. mBio 4: e00324-13.
Zinger L, Amaral-Zettler LA, Fuhrman JA, Horner-Devine MC, Huse SM, Welch DBM, Martiny JBH, Sogin M, Boetius A & Ramette A (2011) Global patterns of bacterial beta-diversity in seafloor and seawater ecosystems. PLoS ONE 6: e24570.
Zinger L, Gobet A & Pommier T (2012) Two decades of describing the unseen majority of aquatic microbial diversity. Mol Ecol 21: 1878–1896.
Zou H, Hastie T & Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15: 265–286.
Zuur AF, Ieno EN & Elphick CS (2010) A protocol for data exploration to avoid common statistical problems. Methods Ecol Evol 1: 3–14.

FEMS Microbiol Ecol 90 (2014) 543–550. © 2014 The Authors. FEMS Microbiology Ecology published by John Wiley & Sons Ltd on behalf of Federation of European Microbiological Societies.