
Statistics for Ecologists Using R and Excel

Data collection, exploration, analysis and presentation

Mark Gardener

DATA IN THE WILD SERIES

Pelagic Publishing | www.pelagicpublishing.com

Published by Pelagic Publishing www.pelagicpublishing.com PO Box 725, Exeter, EX1 9QU

Statistics for Ecologists Using R and Excel ® Data collection, exploration, analysis and presentation

ISBN 978-1-907807-12-1 (Pbk) ISBN 978-1-907807-13-8 (Hbk)

Copyright © 2012 Mark Gardener

All rights reserved. No part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior permission from the publisher.

of the information presented, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Pelagic Publishing, its agents and distributors will be held liable for any damage or loss caused or alleged to be caused directly or indirectly by this book.

more information visit www.apple.com.

British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library.

Cover image © istockphoto.com/dulezidar

About the author

Mark began his career as an optician but returned to science and trained as an ecologist. His research is in the area of pollination ecology. He has worked extensively in the UK as well as Australia and the United States. Currently he works as an associate lecturer for the Open University and also runs courses in data analysis for ecology and environmental science.

Acknowledgements

I am especially grateful to Nigel Massen at Pelagic Publishing for his help and perseverance throughout the production of this book.

Thanks go to Anne Goodenough for patiently and thoroughly reviewing the manuscript; your comments and views were most helpful.

With a book of this nature data examples are always useful. Some of the data illustrated for allowing me to use these data as examples.

process.

Software used

® spreadsheet were used in the preparation of this ® although other versions may also be illustrated (including Excel X for Apple Macintosh ® ).

Several versions of the R program were used and illustrated including 2.8.1. for Windows

Downloading free code examples

statistics can be found at:

Reader feedback

We welcome feedback from readers – please email us at info@pelagicpublishing.com and of your email.

Publish with Pelagic Publishing

with a particular focus on ecology, conservation and environment. Pelagic Publishing produces books that set new benchmarks, share advances in research methods and encourage and inform wildlife investigation for all.

If you are interested in publishing with Pelagic please contact editor@pelagicpublishing. statement describing the impact you would like your book to have on readers.

Contents

Introduction
1. Planning
2. Data recording
3. Beginning data exploration – using software tools
4. Exploring data – looking at numbers
5. Exploring data – which test is right?
6. Exploring data – using graphs
7. Tests for differences
   t-test
   U-test
8. Tests for linking data – correlations
9. Tests for linking data – associations
10. Differences between more than two samples
11. Tests for linking several factors
12. Reporting results
13. Summary
Glossary
Index

Introduction

here, this book is about the processes involved in looking at data. These processes involve planning what you want to do, writing down what you found and writing up what your analyses showed. The statistics part is also in there of course but this is not a course in statistics. By the end I hope that you will have learnt some statistics but in a practical way, i.e. what statistics can do for you as well) and a computer program called R. The spreadsheet will allow you to collect your data in a sensible layout and also do some basic analyses (as well as a few less basic ones). The R program will do much of the detailed statistical work (although we will also use get the job done.

and may be summarised by four main headings:

Planning

Data recording

Data exploration

Reporting results

The book is arranged into these four broad categories. The sections are rather uneven in size and tend to focus on the analysis. The section on reporting also covers presentation of analyses (e.g. graphs).

Although the emphasis is on ecological work and many of the data examples are of that sort, I hope that other scientists and students of other disciplines will see relevance to what they do.

Mark Gardener 2011

6. Exploring data – using graphs

Graphs are useful for several reasons. They can help us to visualise the data and decide -

tackle the data, and secondly to present results. We will look at details of graphs and how to produce them in Excel and R in Section 12.4 where we examine ways to present

analytical methods to examine our data. Indeed we have already seen some examples in Chapter 4. In this short chapter we will summarise the graphs we might use to help us explore our data.

6.1 Exploratory graphs

One of the most common analyses of a sample of data is to determine whether they are normally distributed. There are several ways we can illustrate the distribution of a data sample. We may use a simple tally plot or a stem–leaf plot; we can even do this right from our notebook.

1 | 679
2 | 112334
2 | 5666678899
3 | 01124
3 | 6

In this example, the data are sorted in numerical order in each row but we can still gain insights into the data distribution if the numbers are not sorted.

1 | 967
2 | 143123
2 | 9568667869
3 | 40121
3 | 6
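In R we can produce a stem–leaf plot with the base function stem(). The following is a minimal sketch, assuming the 25 values read back from the plot above are stored in a vector that we have called x (the name is our own choice). R picks its own row splits, so the result may not match the hand-drawn version exactly; the scale argument adjusts the number of rows.

```r
# Values read back from the stem-leaf plot above, in the unsorted order shown
x <- c(19, 16, 17, 21, 24, 23, 21, 22, 23, 29, 25, 26, 28, 26, 26,
       27, 28, 26, 29, 34, 30, 31, 32, 31, 36)

# Draw the stem-leaf plot in the console; scale controls how many rows are used
stem(x, scale = 2)

# The same idea by hand: the tens digit is the stem, the units digit is the leaf
leaves <- split(x %% 10, x %/% 10)
leaves
```

The hand-built list keeps the leaves in the order they were entered, which mirrors the unsorted plot.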

A simpler version of a stem–leaf plot is the tally plot, and in this case we enter the data as a simple tally mark. In Table 28, we see a tally plot of the same data as our stem–leaf plot.


Table 28. A tally plot to show data distribution

Tally    Bin
x        16
x        18
x        20
xxx      22
xxx      24
xxxxx    26
xxx      28
xxx      30
xxx      32
x        34
x        36
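A tally like the one in Table 28 can be rebuilt in R with cut() and table(). This is only a sketch: the vector name x is ours, and we assume each bin label is the upper limit of its class, which is the reading that reproduces the tallies in the table.

```r
# The 25 values from the stem-leaf plot, sorted
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Assumption: each bin label is the upper limit of its class,
# giving the classes (14,16], (16,18], ... (34,36]
bins <- seq(16, 36, by = 2)
classes <- cut(x, breaks = c(14, bins), labels = as.character(bins))
counts <- table(classes)

# Print one row per bin with a tally mark for every value in that bin
tally <- data.frame(Tally = strrep("x", as.vector(counts)), Bin = bins)
print(tally, row.names = FALSE)
```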

These are simple plots but nevertheless can be extremely helpful. When we return from


Figure 78. A histogram to illustrate the distribution of a data sample

dataset that lie within each size class, represented on the x-axis. We may decide to use a


Figure 79. A density plot to illustrate the distribution of a data sample
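Plots of the kind shown in Figures 78 and 79 can be drawn in R with hist() and density(). The sketch below reuses the 25 values from the stem–leaf plot; the vector name x and the choice of size classes two units wide are our assumptions, not necessarily the settings used for the figures.

```r
# The 25 values from the stem-leaf plot
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Histogram: the height of each bar is the number of values in that size class
h <- hist(x, breaks = seq(14, 36, by = 2),
          xlab = "Size class", main = "Histogram of x")

# Density plot: a smoothed outline of the same distribution
d <- density(x)
plot(d, main = "Density plot of x")
```

Changing the breaks argument changes the size classes, which can alter the apparent shape of the histogram, so it is worth trying more than one setting.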

Some types of graph are useful because they show a lot of information in a compact manner.

Figure 80. A box–whisker plot can be used to illustrate data distribution as well as providing other information, e.g. median, inter-quartiles and max/min
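In R, boxplot() draws this kind of graph and fivenum() returns the values it is built from. A minimal sketch, again assuming the 25 stem–leaf values are held in a vector we have named x:

```r
# The 25 values from the stem-leaf plot
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26, 27,
       28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
five

# Draw the box-whisker plot; the stripe inside the box marks the median
boxplot(x, ylab = "Value")
```

If the box and whiskers look roughly symmetrical about the median stripe, that is one informal hint that the data may be normally distributed.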

symmetrical about the median stripe. We can use the box–whisker plot to look at several


Another way we can visualise our data is by using a line graph to show the running average (mean or median). We met this earlier in Section 4.7 where we used the idea to help deter-


Figure 81. A line graph illustrating the running mean
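A running average is simple to calculate in R. The sketch below treats the unsorted stem–leaf values as if they were listed in the order they were collected, which is purely an assumption for illustration; the names x, running_mean and running_median are ours.

```r
# Values assumed (for illustration only) to be in the order they were recorded
x <- c(19, 16, 17, 21, 24, 23, 21, 22, 23, 29, 25, 26, 28, 26, 26,
       27, 28, 26, 29, 34, 30, 31, 32, 31, 36)

# Running mean: the mean of the first 1, 2, 3, ... observations
running_mean <- cumsum(x) / seq_along(x)

# Running median: the median of the first 1, 2, 3, ... observations
running_median <- sapply(seq_along(x), function(i) median(x[1:i]))

# Line graph of the running mean against the number of observations
plot(running_mean, type = "l",
     xlab = "Number of observations", ylab = "Running mean")
```

Where the line flattens out, adding further observations is no longer changing the average very much.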

decision.

6.2 Graphs to illustrate differences

illustrate the situation using bar charts or box–whisker plots. We met the box–whisker plot

Figure 82. A box–whisker plot illustrating differences between three samples

gain some insight into the distribution. A common alternative to the box–whisker plot is within each sample.

Figure 83. A bar chart illustrating differences between three samples
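Both kinds of graph can be drawn in R from the same data. The following sketch uses made-up values for three samples (the data frame dat and its column names are hypothetical). The error bars on the bar chart show one standard error either side of the mean, which is our choice for illustration and may not be what Figure 83 displays.

```r
# Hypothetical data: one measurement per row, labelled with its sample
dat <- data.frame(
  value  = c(12, 15, 11, 14, 13,  18, 21, 17, 20, 19,  9, 8, 11, 10, 7),
  sample = rep(c("A", "B", "C"), each = 5)
)

# Box-whisker plot: one box per sample
boxplot(value ~ sample, data = dat, xlab = "Sample", ylab = "Value")

# Bar chart of the sample means
means <- tapply(dat$value, dat$sample, mean)
ses   <- tapply(dat$value, dat$sample, function(v) sd(v) / sqrt(length(v)))
mids  <- barplot(means, ylim = c(0, max(means + ses) * 1.1),
                 xlab = "Sample", ylab = "Mean value")

# Error bars (assumed here to be +/- one standard error) give an idea of
# the variability within each sample
arrows(mids, means - ses, mids, means + ses,
       angle = 90, code = 3, length = 0.1)
```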

6.3 Graphs to illustrate links

When we think of ways to link data together there are two main approaches. In one approach, we have two sets of values, both are numeric and one represents a dependent variable and the other an independent variable. We are looking for a correlation. In the other kind of approach, we have categories of items and we are looking to associate one set of categories with the other.

6.3.1 Graphs to illustrate correlations

speed of the water in which it lives.


Figure 84. A scatter plot illustrating a correlation
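A scatter plot like this can be drawn in R with plot(). The values below are invented for illustration and the variable names speed and abundance are our own; by convention the independent variable goes on the x-axis and the dependent variable on the y-axis.

```r
# Hypothetical data: water speed and abundance at eight sites
speed     <- c(5, 10, 15, 20, 25, 30, 35, 40)
abundance <- c(4, 7, 6, 12, 15, 14, 21, 25)

# Scatter plot: independent variable on the x-axis, dependent on the y-axis
plot(speed, abundance, xlab = "Water speed", ylab = "Abundance")

# Correlation coefficient as a numerical companion to the graph
r <- cor(speed, abundance)
r
```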

In this case, it appears as though as the water speed increases so does the abundance of the


Figure 85. Multiple scatter plots showing one dependent variable plotted against several independent variables
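When there are several independent variables, R can draw all the scatter plots in one go. This sketch uses an invented data frame (dat and all its columns are hypothetical); pairs() plots every variable against every other, and the loop plots only the dependent variable against each independent variable in turn.

```r
# Hypothetical data: one dependent variable and three independent variables
dat <- data.frame(
  abundance = c(4, 7, 6, 12, 15, 14, 21, 25),
  speed     = c(5, 10, 15, 20, 25, 30, 35, 40),
  depth     = c(40, 35, 38, 30, 24, 27, 18, 12),
  algae     = c(12, 30, 18, 25, 14, 33, 20, 27)
)

# Every variable plotted against every other variable
pairs(dat)

# Only the dependent variable against each independent variable, side by side
op <- par(mfrow = c(1, 3))
for (v in c("speed", "depth", "algae")) {
  plot(dat[[v]], dat$abundance, xlab = v, ylab = "abundance")
}
par(op)

# Correlation of each independent variable with the dependent variable
cors <- sapply(dat[-1], cor, y = dat$abundance)
cors
```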

than the others; one shows a positive correlation and the other a negative one (although at


6.3.2 Graphs to illustrate associations

When we have categorical variables, we have various choices. We can display the data for pie charts to be produced (one for each row or column category, depending on how we want to look at the data). The pie chart shows the data proportionally, each slice of pie shows the contribution as a proportion of the total.


Figure 86. A pie chart illustrating categorical data. The proportions of common bird species in a garden habitat

When we have this kind of data we can always represent it in the form of a bar chart instead. The advantage of the bar chart is that we can show several categories at one time


Figure 87. A bar chart illustrating categorical data. The number of common garden birds in various habitats


included a legend on the graph so the reader can identify the various bars more easily.
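In R, categorical counts are conveniently held in a matrix and passed to pie() or barplot(). The counts, species labels and habitat names below are all invented for illustration; they are not the data behind Figures 86 and 87.

```r
# Hypothetical counts: bird species in rows, habitats in columns
birds <- matrix(c(47, 10, 40,
                  19,  3,  5,
                  50,  0, 10),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Species A", "Species B", "Species C"),
                                c("Garden", "Hedgerow", "Parkland")))

# Pie chart for one habitat: each slice is a proportion of that column total
pie(birds[, "Garden"], main = "Garden")

# Grouped bar chart: all habitats at once, with a legend to identify the bars
barplot(birds, beside = TRUE, legend.text = TRUE,
        xlab = "Habitat", ylab = "Number of birds")

# The proportions that the pie slices represent, column by column
prop <- prop.table(birds, margin = 2)
round(prop, 2)
```

A pie chart handles one column (or row) at a time, whereas the bar chart shows the whole table in a single graph.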

6.4 Graphs – a summary

and make important decisions about the analytical approach (Table 29). We should also use graphs to illustrate our data, which can make them more comprehensible to readers. When we present graphs we should ensure they are fully labelled and as clear as possible. Even when we use graphs for our own use it is good practice to label and title them fully.

Label axes and include the units.

- sary produce two graphs rather than one.

Give a main title explaining what the graph shows. Usually this is done as a caption in a word processor. The caption should enable a reader to understand what the graph shows sure you describe the graph so that someone else can understand it.
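The labelling advice above maps directly onto arguments of the R plot() function. A minimal sketch, using invented data and units (the variable names, the units and the title text are all assumptions for illustration):

```r
# Hypothetical data for illustration
speed     <- c(5, 10, 15, 20, 25, 30, 35, 40)
abundance <- c(4, 7, 6, 12, 15, 14, 21, 25)

# Axis labels that include the units, stored first so they are easy to edit
x_label <- "Water speed (cm per second)"
y_label <- "Abundance (individuals per sample)"

# xlab and ylab label the axes; main gives a title for our own reference
plot(speed, abundance, xlab = x_label, ylab = y_label,
     main = "Abundance against water speed")
```

For a report we would usually leave out main and write the title as a caption in the word processor instead.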

Table 29. Summary of graph types to use for different purposes

Purpose                                    Types of graph
Illustrating distribution                  Stem–leaf plot, tally plot, histogram, density chart, box–whisker plot
Illustrating differences between samples   Bar chart, box–whisker plot
Illustrating correlations                  Scatter plot
Illustrating associations                  Pie charts, bar charts
Illustrating sample sizes                  Line plot of running average (mean or median)

We will examine graphs in more detail in Chapter 12, which will also cover the presentation of results. Sections 12.4.1 and 12.5 will deal with producing graphs in R and Section 12.4.3 will cover producing graphs in Excel. We will also make some references to graphs in each of the sections dealing with the details of the various analytical methods. It is important to remember that our graphical analysis should go alongside the mathematical one.