Statistics for Ecologists Using R and Excel
Data collection, exploration, analysis and presentation
Mark Gardener
DATA IN THE WILD SERIES
Pelagic Publishing  www.pelagicpublishing.com
Published by Pelagic Publishing www.pelagicpublishing.com PO Box 725, Exeter, EX1 9QU
Statistics for Ecologists Using R and Excel® Data collection, exploration, analysis and presentation
ISBN 9781907807121 (Pbk) ISBN 9781907807138 (Hbk)
Copyright © 2012 Mark Gardener
All rights reserved. No part of this document may be produced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior permission from the publisher.
of the information presented, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Pelagic Publishing, its agents and distributors will be held liable for any damage or loss caused or alleged to be caused directly or indirectly by this book.
more information visit www.apple.com.
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library.
Cover image © istockphoto.com/dulezidar
About the author
Mark began his career as an optician but returned to science and trained as an ecologist. His research is in the area of pollination ecology. He has worked extensively in the UK as well as Australia and the United States. Currently he works as an associate lecturer for the Open University and also runs courses in data analysis for ecology and environmental science.
Acknowledgements
I am especially grateful to Nigel Massen at Pelagic Publishing for his help and perseverance throughout the production of this book.
Thanks go to Anne Goodenough for patiently and thoroughly reviewing the manuscript; your comments and views were most helpful.
With a book of this nature data examples are always useful. Some of the data illustrated for allowing me to use these data as examples.
process.
Software used
® spreadsheet were used in the preparation of this ® although other versions may also be illustrated (including Excel X for Apple Macintosh®).
Several versions of the R program were used and illustrated including 2.8.1. for Windows
Downloading free code examples
statistics can be found at:
Reader feedback
We welcome feedback from readers – please email us at info@pelagicpublishing.com and of your email.
Publish with Pelagic Publishing
with a particular focus on ecology, conservation and environment. Pelagic Publishing produces books that set new benchmarks, share advances in research methods and encourage and inform wildlife investigation for all.
If you are interested in publishing with Pelagic please contact editor@pelagicpublishing. statement describing the impact you would like your book to have on readers.
Contents
Introduction
1. Planning
2. Data recording
3. Beginning data exploration – using software tools
4. Exploring data – looking at numbers
5. Exploring data – which test is right?
6. Exploring data – using graphs
7. Tests for differences
   t-test
   U-test
8. Tests for linking data – correlations
9. Tests for linking data – associations
10. Differences between more than two samples
11. Tests for linking several factors
12. Reporting results
13. Summary
Glossary
Index
Introduction
here, this book is about the processes involved in looking at data. These processes involve planning what you want to do, writing down what you found and writing up what your analyses showed. The statistics part is also in there of course but this is not a course in statistics. By the end I hope that you will have learnt some statistics but in a practical way, i.e. what statistics can do for you as well) and a computer program called R. The spreadsheet will allow you to collect your data in a sensible layout and also do some basic analyses (as well as a few less basic ones). The R program will do much of the detailed statistical work (although we will also use get the job done.
and may be summarised by four main headings:
Planning
Data recording
Data exploration
Reporting results
The book is arranged into these four broad categories. The sections are rather uneven in size and tend to focus on the analysis. The section on reporting also covers presentation of analyses (e.g. graphs).
Although the emphasis is on ecological work and many of the data examples are of that sort, I hope that other scientists and students of other disciplines will see relevance to what they do.
Mark Gardener 2011
6. Exploring data – using graphs
Graphs are useful for several reasons. They can help us to visualise the data and decide 
tackle the data, and secondly to present results. We will look at details of graphs and how
to produce them in Excel and R in Section 12.4 where we examine ways to present
analytical methods to examine our data. Indeed we have already seen some examples in Chapter 4. In this short chapter we will summarise the graphs we might use to help us explore our data.
6.1 Exploratory graphs
One of the most common analyses of a sample of data is to determine if they are normally distributed. There are several ways we can illustrate the distribution of a data sample. We may use a simple tally plot or a stem–leaf plot; we can even do this right from our notebook in
1 | 679
2 | 112334
2 | 5666678899
3 | 01124
3 | 6
In this example, the data are sorted in numerical order in each row but we can still gain insights into the data distribution if the numbers are not sorted.
1 | 967
2 | 143123
2 | 9568667869
3 | 40121
3 | 6
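As a minimal sketch of how a stem–leaf plot might be produced in R, we can type in the values read off the plot above (taking the stems as tens and the leaves as units, which is an assumption about how the plot was built) and pass them to the base R function stem(). The object name x is purely illustrative.

```r
# Values read off the stem-leaf plot above (stems = tens, leaves = units).
x <- c(16, 17, 19,
       21, 21, 22, 23, 23, 24,
       25, 26, 26, 26, 26, 27, 28, 28, 29, 29,
       30, 31, 31, 32, 34,
       36)

# stem() prints a text stem-leaf plot to the console.
# The scale argument controls the plot length; a larger value
# spreads the values over more stems.
stem(x, scale = 2)
```

The values do not need to be sorted before they are passed to stem(), so data typed in straight from a notebook will work equally well.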
A simpler version of a stem–leaf plot is the tally plot, and in this case we enter the data as
a simple tally mark. In Table 28, we see a tally plot of the same data as our stem–leaf plot.
Table 28. A tally plot to show data distribution
Tally    Bin
x        16
x        18
x        20
xxx      22
xxx      24
xxxxx    26
xxx      28
xxx      30
xxx      32
x        34
x        36
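One way we might build a tally like Table 28 in R is sketched below. It assumes the values are those read off the stem–leaf plot and that each bin label is the upper limit of its class (so bin 22 holds values above 20 and up to 22); under that assumption the counts agree with the table. The object names are illustrative only.

```r
# Values read off the stem-leaf plot earlier in this section.
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Bin labels, treated here as upper class limits (an assumption).
bins <- seq(16, 36, by = 2)

# cut() assigns each value to a class (14,16], (16,18], ... (34,36];
# table() then counts the values in each class.
counts <- table(cut(x, breaks = c(14, bins), labels = bins))

# Turn each count into a string of tally marks and print tally then bin.
tally <- strrep("x", as.vector(counts))
cat(sprintf("%-6s %d\n", tally, as.integer(bins)), sep = "")
```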
These are simple plots but nevertheless can be extremely helpful. When we return from
Figure 78. A histogram to illustrate the distribution of a data sample
dataset that lie within each size class, represented on the x-axis. We may decide to use a
Figure 79. A density plot to illustrate the distribution of a data sample
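A histogram and a density plot of this kind could be drawn in base R roughly as follows. This is a sketch only: it reuses the values read off the stem–leaf plot, the class width of 2 mirrors the tally plot, and the axis labels are illustrative rather than those used in the figures.

```r
# Values read off the stem-leaf plot earlier in this section.
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Histogram: bar heights are the number of values in each size class.
h <- hist(x, breaks = seq(14, 36, by = 2),
          main = "", xlab = "Size class", ylab = "Frequency")

# Density plot: a smoothed estimate of the same distribution.
d <- density(x)
plot(d, main = "", xlab = "Size class")
```

Changing the breaks argument alters the width of the size classes, which can change the apparent shape of the histogram, so it is worth trying more than one setting.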
Some types of graph are useful because they show a lot of information in a compact man
Figure 80. A box–whisker plot can be used to illustrate data distribution as well as providing other information, e.g. median, interquartiles and max/min
symmetrical about the median stripe. We can use the box–whisker plot to look at several
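A box–whisker plot of a single sample might be sketched in base R as shown here, again assuming the values read off the stem–leaf plot. Setting range = 0 makes the whiskers run to the maximum and minimum, which matches the description in the caption of Figure 80; R's default instead draws points beyond 1.5 times the box length as outliers.

```r
# Values read off the stem-leaf plot earlier in this section.
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
print(five)

# range = 0 extends the whiskers to the extreme values.
boxplot(x, range = 0, ylab = "Value")
```

If the median stripe sits near the middle of the box and the whiskers are of similar length, the sample is roughly symmetrical.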
Another way we can visualise our data is by using a line graph to show the running average (mean or median). We met this earlier in Section 4.7 where we used the idea to help deter
Figure 81. A line graph illustrating the running mean
decision.
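A running average can be computed in a line or two of R. The sketch below assumes that "running mean" means the mean of all observations collected so far, and it shuffles the values from the stem–leaf plot to stand in for the unknown order of collection, so the shape of the line is illustrative only.

```r
# Values read off the stem-leaf plot earlier in this section.
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# The true order of collection is not known; shuffle as a stand-in.
set.seed(1)
obs <- sample(x)

# Running mean: mean of the first 1, 2, 3, ... observations.
running_mean <- cumsum(obs) / seq_along(obs)

# Running median, computed the same way.
running_median <- sapply(seq_along(obs),
                         function(i) median(obs[seq_len(i)]))

plot(running_mean, type = "l",
     ylim = range(running_mean, running_median),
     xlab = "Number of observations", ylab = "Running average")
lines(running_median, lty = 2)  # dashed line for the median
```

When the line flattens out, adding further observations changes the average very little.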
6.2 Graphs to illustrate differences
illustrate the situation using bar charts or box–whisker plots. We met the box–whisker plot
Figure 82. A box–whisker plot illustrating differences between three samples
gain some insight into the distribution. A common alternative to the box–whisker plot is within each sample.
Figure 83. A bar chart illustrating differences between three samples
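Both kinds of graph can be sketched in base R. The three samples below are invented purely for illustration (they are not the data behind Figures 82 and 83), and the bar chart shows sample means only, with no indication of variability.

```r
# Three hypothetical samples, invented for illustration.
samples <- list(A = c(4, 6, 5, 7, 8),
                B = c(9, 11, 10, 12, 13),
                C = c(6, 8, 7, 9, 10))

# Box-whisker plot: one box per sample, drawn side by side.
boxplot(samples, xlab = "Sample", ylab = "Value")

# Bar chart: one bar per sample, with height equal to the sample mean.
means <- sapply(samples, mean)
barplot(means, xlab = "Sample", ylab = "Mean value")
```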
6.3 Graphs to illustrate links
When we think of ways to link data together there are two main approaches. In one approach, we have two sets of values, both are numeric and one represents a dependent variable and the other an independent variable. We are looking for a correlation. In the other kind of approach, we have categories of items and we are looking to associate one set of categories with the other.
6.3.1 Graphs to illustrate correlations
speed of the water in which it lives.
Figure 84. A scatter plot illustrating a correlation
In this case, it appears as though as the water speed increases so does the abundance of the
Figure 85. Multiple scatter plots showing one dependent variable plotted against several independent variables
than the others; one shows a positive correlation and the other a negative one (although at
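Scatter plots like these might be drawn in base R along the following lines. All of the values and the variable names (abund, speed, depth, temp) are hypothetical, chosen only to show the mechanics of plotting one dependent variable against one or several independent variables.

```r
# Hypothetical data: abundance and three possible explanatory variables.
dat <- data.frame(
  abund = c(6, 3, 5, 23, 16, 12, 48, 43),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  depth = c(40, 35, 38, 22, 27, 30, 12, 15),
  temp  = c(11, 14, 12, 13, 11, 14, 12, 13)
)

# Single scatter plot: independent variable on the x-axis,
# dependent variable on the y-axis.
plot(dat$speed, dat$abund, xlab = "Water speed", ylab = "Abundance")

# Multiple scatter plots: one panel per independent variable.
predictors <- setdiff(names(dat), "abund")
old_par <- par(mfrow = c(1, length(predictors)))
for (p in predictors) {
  plot(dat[[p]], dat$abund, xlab = p, ylab = "Abundance")
}
par(old_par)  # restore the previous panel layout
```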
6.3.2 Graphs to illustrate associations
When we have categorical variables, we have various choices. We can display the data for pie charts to be produced (one for each row or column category, depending on how we want to look at the data). The pie chart shows the data proportionally, each slice of pie shows the contribution as a proportion of the total.
Figure 86. A pie chart illustrating categorical data. The proportions of common bird species in a garden habitat
When we have this kind of data we can always represent it in the form of a bar chart instead. The advantage of the bar chart is that we can show several categories at one time
Figure 87. A bar chart illustrating categorical data. The number of common garden birds in various habitats
included a legend on the graph so the reader can identify the various bars more easily.
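A pie chart for one habitat and a bar chart covering several habitats could be sketched in base R as follows. The species, habitats and counts are invented for illustration and are not the data shown in Figures 86 and 87.

```r
# Hypothetical counts: rows are species, columns are habitats.
birds <- matrix(c(20, 12,  8,
                  10, 15,  5,
                  10,  6, 14),
                nrow = 3, byrow = TRUE,
                dimnames = list(
                  Species = c("Blackbird", "Robin", "Great tit"),
                  Habitat = c("Garden", "Hedgerow", "Parkland")))

# Pie chart: each slice is a species' share of the garden total.
garden <- birds[, "Garden"]
props <- prop.table(garden)
pie(garden, main = "")

# Bar chart: bars grouped by habitat, one bar per species.
# legend.text = TRUE builds the legend from the row names.
barplot(birds, beside = TRUE, legend.text = TRUE,
        xlab = "Habitat", ylab = "Count")
```

A pie chart shows one column (or row) at a time, so comparing habitats would need three separate pies, whereas the grouped bar chart shows all of them at once.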
6.4 Graphs – a summary
and make important decisions about the analytical approach (Table 29). We should also use graphs to illustrate our data, which can make them more comprehensible to readers. When we present graphs we should ensure they are fully labelled and as clear as possible. Even when we use graphs for our own use it is good practice to label and title them fully.
Label axes and include the units.
 sary produce two graphs rather than one.
Give a main title explaining what the graph shows. Usually this is done as a caption in a word processor. The caption should enable a reader to understand what the graph shows sure you describe the graph so that someone else can understand it.
Table 29. Summary of graph types to use for different purposes
Purpose                                    Types of graph
Illustrating distribution                  Stem–leaf plot, tally plot, histogram, density chart, box–whisker plot
Illustrating differences between samples   Bar chart, box–whisker plot
Illustrating correlations                  Scatter plot
Illustrating associations                  Pie charts, bar charts
Illustrating sample sizes                  Line plot of running average (mean or median)
We will examine graphs in more detail in Chapter 12, which will also cover the presentation of results. Sections 12.4.1 and 12.5 will deal with producing graphs in R and Section 12.4.3 will cover producing graphs in Excel. We will also make some references to graphs in each of the sections dealing with the details of the various analytical methods. It is important to remember that our graphical analysis should go alongside the mathematical one.