Statistics for Ecologists

Using R and Excel
Data collection, exploration,
analysis and presentation
Mark Gardener
DATA IN THE WILD SERIES
Pelagic Publishing | www.pelagicpublishing.com
Published by Pelagic Publishing
www.pelagicpublishing.com
PO Box 725, Exeter, EX1 9QU
Statistics for Ecologists Using R and Excel®
Data collection, exploration, analysis and presentation
ISBN 978-1-907807-12-1 (Pbk)
ISBN 978-1-907807-13-8 (Hbk)
Copyright © 2012 Mark Gardener

All rights reserved. No part of this document may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise without prior permission from the publisher.
While every effort has been made in the preparation of this book to ensure the accuracy
of the information presented, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Pelagic Publishing, its agents
and distributors will be held liable for any damage or loss caused or alleged to be caused
directly or indirectly by this book.
Windows, Excel and Word are trademarks of the Microsoft Corporation. For more
information visit www.microsoft.com. OpenOffice.org is a trademark of Oracle. For more
information visit www.openoffice.org. Apple Macintosh is a trademark of Apple Inc. For
more information visit www.apple.com.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
Cover image © istockphoto.com/dulezidar
About the author
Mark began his career as an optician but returned to science and trained as an ecologist.
His research is in the area of pollination ecology. He has worked extensively in the UK as
well as Australia and the United States. Currently he works as an associate lecturer for the
Open University and also runs courses in data analysis for ecology and environmental
science.
Acknowledgements
I am especially grateful to Nigel Massen at Pelagic Publishing for his help and perseverance
throughout the production of this book.
Thanks go to Anne Goodenough for patiently and thoroughly reviewing the manuscript,
your comments and views were most helpful.
With a book of this nature data examples are always useful. Some of the data illustrated
here were collected by students and I gratefully acknowledge their efforts and send thanks
for allowing me to use these data as examples.
Finally my heartfelt thanks go to Christine, for putting up with me throughout the entire
process.
Software used
Various versions of Microsoft's Excel® spreadsheet were used in the preparation of this
manuscript. Most of the examples presented show version 2007 for Microsoft Windows®
although other versions may also be illustrated (including Excel X for Apple Macintosh®).
Several versions of the R program were used and illustrated including 2.8.1 for Windows
and 2.11.1 for Macintosh: R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0, URL http://www.R-project.org/.
Downloading free code examples
Free code examples and further information from the author on using R and Excel for
statistics can be found at:
http://www.pelagicpublishing.com/statistics-for-ecologists-resources.html
Reader feedback
We welcome feedback from readers – please email us at info@pelagicpublishing.com and
tell us what you thought about this book. Please include the book title in the subject line
of your email.
Publish with Pelagic Publishing
We publish scientific books to the highest editorial standards in all life science disciplines,
with a particular focus on ecology, conservation and environment. Pelagic Publishing produces
books that set new benchmarks, share advances in research methods and encourage
and inform wildlife investigation for all.
If you are interested in publishing with Pelagic please contact editor@pelagicpublishing.com
with a synopsis of your book, a brief history of your previous written work and a
statement describing the impact you would like your book to have on readers.
Contents
Introduction
1. Planning
1.1 The scientific method
1.2 Types of experiment/project
1.3 Getting data – using a spreadsheet
1.4 Hypothesis testing
1.5 Data types
1.6 Sampling effort
1.7 Tools of the trade
1.8 The R program
1.9 Excel
2. Data recording
2.1 Collecting data – who, what, where, when
2.2 How to arrange data
3. Beginning data exploration – using software tools
3.1 Beginning to use R
3.2 Manipulating data in a spreadsheet
3.3 Getting data from Excel into R
4. Exploring data – looking at numbers
4.1 Summarising data
4.2 Distribution
4.3 A numerical value for the distribution
4.4 Statistical tests for normal distribution
4.5 Distribution type
4.6 Transforming data
4.7 When to stop collecting data? The running average
4.8 Statistical symbols
5. Exploring data – which test is right?
5.1 Hypothesis testing
5.2 Choosing the correct test
6. Exploring data – using graphs
6.1 Exploratory graphs
6.2 Graphs to illustrate differences
6.3 Graphs to illustrate links
6.4 Graphs – a summary
7. Tests for differences
7.1 Differences: t-test
7.2 Differences: U-test
7.3 Paired tests
8. Tests for linking data – correlations
8.1 Correlation: Spearman's rank test
8.2 Pearson's product moment
8.3 Correlation tests using Excel
8.4 Correlation tests using R
8.5 Curved linear correlation
9. Tests for linking data – associations
9.1 Association: Chi-squared test
9.2 Goodness of fit test
9.3 Using R for Chi-squared tests
9.4 Using Excel for Chi-squared tests
10. Differences between more than two samples
10.1 Using R for more complex statistical analyses
10.2 Analysis of variance
10.3 Kruskal–Wallis test
11. Tests for linking several factors
11.1 Multiple regression
11.2 Curved-linear regression
12. Reporting results
12.1 Presenting findings
12.2 Publishing
12.3 Reporting results of statistical analyses
12.4 Graphs
12.5 More about graphs in R
12.6 Worked example graph data in R
12.7 Graphs: a summary
12.8 Writing papers
12.9 Plagiarism
12.10 References
12.11 Poster presentations
12.12 Giving a talk (PowerPoint)
13. Summary
Glossary
Index
Introduction
This is not just a statistics textbook! Although there are plenty of statistical analyses
here, this book is about the processes involved in looking at data. These processes involve
planning what you want to do, writing down what you found and writing up what your
analyses showed. The statistics part is also in there of course but this is not a course
in statistics. By the end I hope that you will have learnt some statistics but in a practical
way, i.e. what statistics can do for you. In order to learn about the methods of analysis, we'll
use two main tools: a Microsoft Excel spreadsheet (although Open Office will work just
as well) and a computer program called R. The spreadsheet will allow you to collect your
data in a sensible layout and also do some basic analyses (as well as a few less basic ones).
The R program will do much of the detailed statistical work (although we will also use
Excel quite a bit). Both programs will be used to produce graphs. This book is not a course
in computer programming, we'll learn just enough about the programs to get the job done.
It is important to recognise that there is a process involved. This is the scientific process
and may be summarised by four main headings:
• Planning
• Data recording
• Data exploration
• Reporting results
The book is arranged into these four broad categories. The sections are rather uneven in
size and tend to focus on the analysis. The section on reporting also covers presentation
of analyses (e.g. graphs).
Although the emphasis is on ecological work and many of the data examples are of that
sort, I hope that other scientists and students of other disciplines will see relevance to what
they do.
Mark Gardener 2011
6. Exploring data – using graphs
Graphs are useful for several reasons. They can help us to visualise the data and decide
which statistical test is the best. We may spot patterns in the data and gain a better
understanding of what we are dealing with. Graphs are also useful for summarising our final
results, especially when we present our findings to other people.
We can think of graphs as being useful for two purposes: firstly to help us decide how to
tackle the data, and secondly to present results. We will look at details of graphs and how
to produce them in Excel and R in Section 12.4 where we examine ways to present
our findings. We will also mention graphs throughout the text as we look at the various
analytical methods to examine our data. Indeed we have already seen some examples in
Chapter 4. In this short chapter we will summarise the graphs we might use to help us
explore our data.
6.1 Exploratory graphs
One of the most common analyses of a sample of data is to determine whether they are
normally distributed. This affects the kind of statistical analysis we are able to perform on the
data. There are several ways we can illustrate the distribution of a data sample. We may
use a simple tally plot or a stem–leaf plot; we can even do this right from our notebook in
the field. The following example shows a stem–leaf plot.
1 | 679
2 | 112334
2 | 5666678899
3 | 01124
3 | 6
In this example, the data are sorted in numerical order in each row but we can still gain
insights into the data distribution if the numbers are not sorted.
1 | 967
2 | 143123
2 | 9568667869
3 | 40121
3 | 6
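In R, a plot of this kind can be produced with the built-in stem() function. The following is a minimal sketch rather than the author's own code: the vector name x is an assumption, and the 25 values are read back from the stem–leaf plot above on the assumption that the stems are tens and the leaves are units.

```r
# Values read back from the stem-leaf plot above
# (assumption: stems are tens, leaves are units).
# The name 'x' is hypothetical, chosen for this sketch.
x <- c(16, 17, 19,
       21, 21, 22, 23, 23, 24,
       25, 26, 26, 26, 26, 27, 28, 28, 29, 29,
       30, 31, 31, 32, 34,
       36)

# Print a text-based stem-leaf plot to the console.
stem(x)
```

The values do not need to be sorted before calling stem().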
A simpler version of a stem–leaf plot is the tally plot, and in this case we enter the data as
a simple tally mark. In Table 28, we see a tally plot of the same data as our stem–leaf plot.
Table 28. A tally plot to show data distribution
Tally Bin
x 16
x 18
x 20
xxx 22
xxx 24
xxxxx 26
xxx 28
xxx 30
xxx 32
x 34
x 36
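A tally like the one in Table 28 could be rebuilt in R along the following lines. This is a sketch under stated assumptions: the values in x are those read from the stem–leaf plot earlier (stems as tens), each bin is taken to be two units wide and labelled by its upper limit, and the object names (x, breaks, counts, tally) are hypothetical.

```r
# Values read from the stem-leaf plot (assumption: stems are tens).
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
       27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Bin edges every 2 units; each bin is labelled by its upper limit,
# e.g. the bin labelled 22 holds values greater than 20 and up to 22.
breaks <- seq(14, 36, by = 2)
bins <- cut(x, breaks = breaks, labels = breaks[-1])

# Count how many values fall into each bin.
counts <- table(bins)

# Turn each count into a string of tally marks.
tally <- data.frame(Tally = strrep("x", as.vector(counts)),
                    Bin = names(counts),
                    stringsAsFactors = FALSE)
print(tally, row.names = FALSE)
```

With these assumptions the counts per bin match the number of marks shown in Table 28.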
These are simple plots but nevertheless can be extremely helpful. When we return from
the field we may decide to use a more formal histogram to illustrate the distribution
(Figure 78).
Figure 78. A histogram to illustrate the distribution of a data sample
The size of the bars in our histogram shows us the number of items (the frequency) of our
dataset that lie within each size class, represented on the x-axis. We may decide to use a
line instead of bars and the result is a density plot (Figure 79).
Figure 79. A density plot to illustrate the distribution of a data sample
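As a rough illustration of how a histogram and a density plot might be drawn in R, the sketch below uses the base functions hist() and density(). It is not the code behind Figures 78 and 79: the data are the values read from the earlier stem–leaf plot, and the bin width, titles and axis labels are assumptions.

```r
# Values read from the stem-leaf plot (assumption: stems are tens).
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
       27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Histogram: the height of each bar is the number of values
# (the frequency) in that size class.
h <- hist(x, breaks = seq(14, 36, by = 2),
          main = "Histogram of a data sample",
          xlab = "Size class", ylab = "Frequency")

# Density plot: a smoothed line in place of the bars.
d <- density(x)
plot(d, main = "Density plot of a data sample", xlab = "Size class")
```

Storing the result of hist() in h keeps the bin counts available (as h$counts) for checking against a hand-drawn tally.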
Some types of graph are useful because they show a lot of information in a compact manner
such as the box–whisker plot. A box–whisker plot shows us five pieces of information:
median, maximum, minimum and both quartiles (Figure 80).
Figure 80. A box–whisker plot can be used to illustrate data distribution as well as providing
other information, e.g. median, inter-quartiles and max/min
In Figure 80, we can see that the data appear normally distributed as the box–whiskers are
symmetrical about the median stripe. We can use the box–whisker plot to look at several
samples and illustrate not only differences between samples but their distribution as well
(Figure 82).
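The five values a box–whisker plot displays can be inspected directly in R before drawing anything. The sketch below is illustrative only: it reuses the values read from the earlier stem–leaf plot, the names x and five are hypothetical, and range = 0 is chosen here so that the whiskers reach the minimum and maximum as described above (by default boxplot() may draw extreme points separately).

```r
# Values read from the stem-leaf plot (assumption: stems are tens).
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
       27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Five-number summary: minimum, lower quartile (hinge), median,
# upper quartile (hinge) and maximum.
five <- fivenum(x)
print(five)

# Box-whisker plot; range = 0 extends the whiskers to the extremes.
boxplot(x, range = 0, ylab = "Value",
        main = "Box-whisker plot of a data sample")
```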
Another way we can visualise our data is by using a line graph to show the running average
(mean or median). We met this earlier in Section 4.7 where we used the idea to help determine
if we had collected enough data. In Figure 81, we see an example of a running mean.
Figure 81. A line graph illustrating the running mean
This is another example of a graph we can sketch whilst out in the field. We do not have to
be quite so exact when we are out in the field, the graph is simply a tool to help us make a
decision.
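A running mean is straightforward to compute in R with cumsum(). The sketch below is one possible approach and not necessarily the one used for Figure 81. The order in which the values were collected is not known, so the order used here (a reproducible shuffle stored in the hypothetical vector obs) is purely an assumption.

```r
# Values read from the stem-leaf plot (assumption: stems are tens).
x <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
       27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Hypothetical field order: shuffle the values reproducibly.
set.seed(1)
obs <- sample(x)

# Running mean: cumulative sum divided by the number of observations so far.
running_mean <- cumsum(obs) / seq_along(obs)

# Running median: median of the first i observations, for each i.
running_median <- sapply(seq_along(obs), function(i) median(obs[1:i]))

# Line graph of the running mean against sample size.
plot(running_mean, type = "b",
     xlab = "Number of observations", ylab = "Running mean")
```

When the line flattens out, adding further observations is changing the average very little.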
6.2 Graphs to illustrate differences
When we have a project that is centred on looking at differences between samples we can
illustrate the situation using bar charts or box–whisker plots. We met the box–whisker plot
previously (Figure 80) when we used it to view a sample and check its distribution. In
Figure 82 we look at three samples.
Figure 82. A box–whisker plot illustrating differences between three samples
We can see the differences between the three samples fairly easily and in addition we can
gain some insight into the distribution. A common alternative to the box–whisker plot is
the bar chart. This is useful to show differences between items in different categories and
is therefore suitable to illustrate differences in samples. In Figure 83 we see the same data
as in Figure 82 but here we use a bar chart with standard error bars to show the variability
within each sample.
Figure 83. A bar chart illustrating differences between three samples
Both figures show that there are differences between the three samples, but only the
box–whisker plot (Figure 82) tells us anything about the distribution; the bar chart
(Figure 83) does not.
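To compare the two kinds of graph side by side, the sketch below draws a box–whisker plot and a bar chart with standard error bars from the same three samples. The data are simulated for illustration and are not those behind Figures 82 and 83; the names samples, site, means and se are assumptions. Base R has no dedicated error-bar function, so arrows() with flat heads is used here as one common workaround.

```r
# Simulated data for illustration only: three samples of 10 values each.
set.seed(1)
samples <- data.frame(
  value = c(rnorm(10, mean = 20, sd = 3),
            rnorm(10, mean = 25, sd = 3),
            rnorm(10, mean = 30, sd = 3)),
  site = factor(rep(c("A", "B", "C"), each = 10))
)

# Box-whisker plot: shows differences and the spread of each sample.
boxplot(value ~ site, data = samples, range = 0,
        xlab = "Sample", ylab = "Value")

# Mean and standard error (sd / sqrt(n)) for each sample.
means <- tapply(samples$value, samples$site, mean)
se <- tapply(samples$value, samples$site,
             function(v) sd(v) / sqrt(length(v)))

# Bar chart of the means; barplot() returns the bar midpoints.
mids <- barplot(means, ylim = c(0, max(means + se) * 1.1),
                xlab = "Sample", ylab = "Mean value")

# Error bars: double-headed arrows with flat (90 degree) heads.
arrows(mids, means - se, mids, means + se,
       angle = 90, code = 3, length = 0.05)
```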
6.3 Graphs to illustrate links
When we think of ways to link data together there are two main approaches. In one
approach, we have two sets of values, both are numeric and one represents a dependent
variable and the other an independent variable. We are looking for a correlation. In the
other kind of approach, we have categories of items and we are looking to associate one
set of categories with the other.
6.3.1 Graphs to illustrate correlations
When we are looking for correlations, we can best illustrate the situation using a scatter
plot, this allows us to see how one variable is related to the other. In Figure 84 we see a
scatter plot showing how the abundance of a freshwater invertebrate is related to the
speed of the water in which it lives.
Figure 84. A scatter plot illustrating a correlation
In this case, it appears that as the water speed increases so does the abundance of the
invertebrate. We do not know if this relationship is statistically significant but it gives us
an impression. When we have several independent variables we can plot several scatter
plots, this may help us decide which is the most important factor to consider (Figure 85).
Figure 85. Multiple scatter plots showing one dependent variable plotted against several
independent variables
In Figure 85 we can see that two of the independent variables show a more definite trend
than the others, one shows a positive correlation and the other a negative one (although at
this point we do not know if either is statistically significant).
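A single scatter plot, and a row of scatter plots against several independent variables, can be sketched in R as follows. The data frame stream and its columns (speed, depth, temp, abundance) are simulated and hypothetical; they are not the data shown in Figures 84 and 85, and the relationships built into them are arbitrary.

```r
# Simulated data for illustration only.
set.seed(42)
n <- 20
stream <- data.frame(
  speed = runif(n, 5, 30),   # hypothetical water speed
  depth = runif(n, 10, 50),  # hypothetical water depth
  temp  = runif(n, 8, 16)    # hypothetical water temperature
)

# Abundance rises with speed and falls with depth (arbitrary choices).
stream$abundance <- pmax(0, round(20 + 0.8 * stream$speed -
                                  0.3 * stream$depth + rnorm(n, sd = 3)))

# One scatter plot: dependent variable on the y-axis.
plot(abundance ~ speed, data = stream,
     xlab = "Water speed", ylab = "Abundance")

# Several scatter plots side by side, one per independent variable.
old_par <- par(mfrow = c(1, 3))
for (v in c("speed", "depth", "temp")) {
  plot(stream[[v]], stream$abundance, xlab = v, ylab = "Abundance")
}
par(old_par)  # restore the previous plot layout
```

Plots like these give only a visual impression; they say nothing about statistical significance.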
6.3.2 Graphs to illustrate associations
When we have categorical variables, we have various choices. We can display the data for
each row or column category as a pie chart (e.g. Figure 86), this will usually require several
pie charts to be produced (one for each row or column category, depending on how we
want to look at the data). The pie chart shows the data proportionally, each slice of pie
shows the contribution as a proportion of the total.
Figure 86. A pie chart illustrating categorical data. The proportions of common bird species
in a garden habitat
When we have this kind of data we can always represent it in the form of a bar chart
instead. The advantage of the bar chart is that we can show several categories at one time
(Figure 87).
Figure 87. A bar chart illustrating categorical data. The number of common garden birds in
various habitats
In Figure 87 we can see various bird species and various habitats, in this case we have also
included a legend on the graph so the reader can identify the various bars more easily.
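The difference between the two approaches can be tried out in R with pie() and barplot(). The counts below are invented for illustration and are not the data in Figures 86 and 87; the species and habitat names and the object names birds and garden_prop are likewise assumptions.

```r
# Hypothetical counts: rows are bird species, columns are habitats.
birds <- matrix(c(47, 10,
                  19,  3,
                  50,  8,
                  46, 16),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("Blackbird", "Chaffinch",
                                  "Great Tit", "Robin"),
                                c("Garden", "Hedgerow")))

# Proportion of each species within one habitat.
garden_prop <- prop.table(birds[, "Garden"])
print(round(garden_prop, 2))

# A pie chart shows one category (here one habitat) at a time.
pie(birds[, "Garden"], main = "Garden")

# A grouped bar chart shows all habitats at once; the legend
# identifies which bar belongs to which species.
barplot(birds, beside = TRUE, legend.text = TRUE,
        xlab = "Habitat", ylab = "Number of birds")
```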
6.4 Graphs – a summary
There are quite a few different sorts of graph that we can utilise to help visualise our data
and make important decisions about the analytical approach (Table 29). We should also
use graphs to illustrate our data, which can make them more comprehensible to readers.
When we present graphs we should ensure they are fully labelled and as clear as possible.
Even when we use graphs for our own use it is good practice to label and title them fully.
Label axes and include the units.
Do not include too many different elements on a single graph – avoid clutter and if necessary
produce two graphs rather than one.
Give a main title explaining what the graph shows. Usually this is done as a caption in a
word processor. The caption should enable a reader to understand what the graph shows
without having to read the main text. If your graph is in your field notebook then make
sure you describe the graph so that someone else can understand it.
Table 29. Summary of graph types to use for different purposes
Purpose                                     Types of graph
Illustrating distribution                   Stem–leaf plot, tally plot, histogram, density chart, box–whisker plot
Illustrating differences between samples    Bar chart, box–whisker plot
Illustrating correlations                   Scatter plot
Illustrating associations                   Pie charts, bar charts
Illustrating sample sizes                   Line plot of running average (mean or median)
We will examine graphs in more detail in Chapter 12, which will also cover the presentation
of results. Sections 12.4.1 and 12.5 will deal with producing graphs in R and Section 12.4.3
will cover producing graphs in Excel. We will also make some references to graphs in each
of the sections dealing with the details of the various analytical methods. It is important to
remember that our graphical analysis should go alongside the mathematical one.