
Estimating Species Richness and Diversity and Assessing Stream Quality:
A Survey of the Macroinvertebrates in Boulder Creek

Part II

LABORATORY # 5

OVERVIEW
This lab is the data analysis component of the stream diversity project. During this
lab, we will talk about how researchers measure diversity and the GTA will help you
process the data from the upstream site. You will then be placed into a group with several
other students and work to process the data from the downstream site. After the groups
process their data and interpret their results, each group will present their findings to the
section as a whole.

INTRODUCTION
In this lab, we will use the data we collected last week to test two hypotheses. The first hypothesis was that human inputs and disturbances would negatively impact the diversity of macroinvertebrates found in Boulder Creek. The second hypothesis was that human inputs and disturbances affect the relative health of Boulder Creek. If the second hypothesis is correct, the taxa most affected by changes in stream quality should be less common at the potentially impacted downstream site. Before we put our hypotheses to the test, we need to address the different ways to quantify diversity and how to use bioassessment protocols to assess stream quality.

SPECIES RICHNESS AND DIVERSITY INDICES


Most studies that examine species diversity focus on quantifying species richness.
Species richness is simply the total number of species within a habitat or community.
Species richness is the most commonly used measure of diversity because it is straightforward and intuitive. The main problem with species richness is that it provides no information on how well each species is represented in the sampled area. Species diversity is a measure of both the number of species (species richness) and the relative contribution of each species to the total number of individuals in a community (evenness).
Consider two communities of 100 individuals each and composed of 10 different
species (Table 1). One community has 10 individuals of each species; the other has 1
individual of each of 9 species, and 91 individuals of the tenth species. Which community
is generally more diverse? Clearly the first one is more diverse because, although both
communities have the same species richness (i.e., 10 species), the first community has
greater evenness (i.e., species are more equally represented).

Table 1. An example of two communities that have the same species richness but differ in the relative abundance (commonness) of each species.

              Number of individuals of species A to J
Community     A    B    C    D    E    F    G    H    I    J
1             10   10   10   10   10   10   10   10   10   10
2             1    1    1    1    1    1    1    1    1    91

THE SHANNON-WIENER INDEX AND EVENNESS


Diversity indices take into account both species richness and the relative abundance of each species to quantify how well species are represented within a community. One of the most commonly used diversity indices is the Shannon-Wiener Index (H’). The Shannon-Wiener Index uses species richness and the relative abundance of each species to measure how uncertain we are about the species identity of an individual picked at random (Equation 1). In a species aggregation where only a few species are abundant, we can be relatively certain of the identity of an individual chosen at random. In a species aggregation where each species is fairly well represented, it is difficult to predict the identity of a randomly sampled individual. For example, in Table 1 above, if we picked an individual at random from community 1, we would be more uncertain about which species we had sampled than if we had randomly picked an individual from community 2.
Biologically realistic H’ values range from 0 (only one species present, so there is no uncertainty about the species of each individual) to about 4.5 (high uncertainty because species are relatively evenly represented). In theory, H’ can be much higher than 4.5, although most real-world estimates of H’ range from 1.5 to 3.5. In general, more disturbed and less stable environments are expected to have lower H’ values.
The Shannon-Wiener Index can, however, be difficult to compare between sites because its value can differ not only because of changes in the relative abundance of different species, but also because of increases in species richness (Table 1, compare H’ values among communities 2 & 3).

Equation 1.
$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$
where p_i = n_i / N is the proportion of individuals in the community that belong to species i, n_i is the number of individuals of species i, N is the total number of individuals in the sample, and S is the number of species.

Table 1. Calculating species richness (S) and the Shannon-Wiener (H’) and Evenness (E) Indices for three communities. All communities have roughly one hundred individuals distributed among two or three different species.

                Species 1   Species 2   Species 3
Community 1     99          1           ------
Community 2     50          50          ------
Community 3     33          33          33

Community 1:
S=2
H’ = -(0.99(ln0.99) +0.01(ln0.01)) = 0.056
E = 0.056/ln2 = 0.08

Community 2:
S=2
H’ = -(0.5(ln0.5) + 0.5(ln0.5)) = 0.693
E = 0.693/ln2 = 1.0

Community 3:
S=3
H’ = -(0.33(ln0.33) + 0.33(ln0.33) + 0.33(ln0.33)) = 1.098
E = 1.098/ln3 = 1.0
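The hand calculations above can be checked with a short script. This is a minimal Python sketch (not one of the lab's required tools; the function names are hypothetical) that applies Equation 1 and E = H’ / H’max to the counts in the table above.

```python
import math

def shannon_wiener(counts):
    """Return H' = -sum(p_i * ln p_i) for a list of per-species counts."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:  # absent species contribute nothing (0 * ln 0 is taken as 0)
            p = n / total
            h -= p * math.log(p)
    return h

def evenness(counts):
    """Return E = H' / H'max, where H'max = ln S and S counts species with n > 0."""
    s = sum(1 for n in counts if n > 0)
    if s < 2:
        raise ValueError("evenness needs at least two species (ln 1 = 0)")
    return shannon_wiener(counts) / math.log(s)

communities = {
    "Community 1": [99, 1],
    "Community 2": [50, 50],
    "Community 3": [33, 33, 33],
}
for name, counts in communities.items():
    print(f"{name}: H'={shannon_wiener(counts):.3f}, E={evenness(counts):.2f}")
```

Running it gives H’ of 0.056, 0.693 and 1.099 for communities 1 to 3. Community 3 comes out as 1.099 rather than 1.098 because the script uses p = 33/99 = 1/3 exactly instead of the rounded 0.33. For the two 10-species communities introduced at the start of this section (10 individuals of each species versus nine singletons plus one species with 91 individuals), the same function gives H’ of about 2.303 and 0.500, confirming that the more even community is more diverse.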

Evenness (E) is an index that makes H’ values comparable between communities by controlling for the number of species found within them. (Note that the Evenness index is slightly different from the more general use of the term “evenness” above.) Given a certain number of species in a community, what is the highest H’ possible (H’max), that is, the value reached when all species are equally represented? The answer is:
H’max = ln S (where S = total number of species)
So, if we take H’ (the actual estimate for the community) and divide it by H’max, we are asking a proportions question: if H’max represents the best we could do given S species, how well did we do?
E = H’ / H’max

E can range from close to 0, where most species are rare and just a few are abundant, to 1, where the observed diversity (H’) equals the maximum diversity possible for that number of species (H’max).

JACCARD’S SIMILARITY INDEX


Although the Shannon-Wiener and Evenness Indices quantify how well the species in a community are represented, Jaccard’s Index quantifies the degree of overlap in species between two communities. Jaccard’s Index is a valuable tool because it lets us determine whether two communities are composed of similar species. It is calculated as

Equation 2.
Jaccard’s Index = A / (A + B + C)

A = number of species present in both communities (shared species)
B = number of species present in community 1 but not community 2
C = number of species present in community 2 but not community 1

If Jaccard’s Index equals 1 (B = 0 and C = 0), all species are shared between the two communities. If Jaccard’s Index is near 0, few if any species are shared. If there are 14 or more species (A) in our samples, we can test whether the two samples are significantly dissimilar from each other. The probability of the two communities being this dissimilar follows a binomial distribution (similar to the chi-square), and fortunately the critical values have already been calculated: if the observed index is equal to or lower than the tabulated value, the two communities are considered significantly dissimilar at P ≤ 0.05 (Table 2).
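Equation 2 is easy to compute from two lists of taxa. The sketch below is a minimal Python version (the function name and the family lists are hypothetical examples, not Boulder Creek data) that counts A, B and C with set operations.

```python
def jaccard_index(community1, community2):
    """Return A / (A + B + C) for two collections of taxon names."""
    taxa1, taxa2 = set(community1), set(community2)
    a = len(taxa1 & taxa2)  # A: taxa present in both communities
    b = len(taxa1 - taxa2)  # B: taxa in community 1 but not community 2
    c = len(taxa2 - taxa1)  # C: taxa in community 2 but not community 1
    if a + b + c == 0:
        raise ValueError("both communities are empty")
    return a / (a + b + c)

# Hypothetical family lists, for illustration only
upstream = ["Baetidae", "Heptageniidae", "Perlidae", "Hydropsychidae"]
downstream = ["Baetidae", "Hydropsychidae", "Chironomidae"]
print(jaccard_index(upstream, downstream))  # A = 2, B = 2, C = 1 -> 0.4
```

Because sets ignore duplicates, a family listed twice for the same site is counted once, which matches the presence/absence logic of the index.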

Table 2. Given a certain number of species (A), if the Jaccard’s Index of similarity is equal to or lower than the listed value, the two communities are considered to be significantly dissimilar (P < 0.05).

Total Number of Species (A)    Jaccard’s Index Value
14                             0.080
16                             0.083
19                             0.133
22                             0.167
26                             0.190
30                             0.214
35                             0.200
40                             0.212
46                             0.219
52                             0.222
60                             0.223
70                             0.224
80                             0.230
90                             0.240
100                            0.243

ESTIMATION OF SPECIES DIVERSITY


One problem with diversity indices is that they rely on the assumption that the
habitats of interest have been fully sampled. However, if your diversity index is based on
the relative abundance of 10 species in two habitats, but there really are 20 species in one habitat and 60 in the other, how much confidence can you have in the reliability of your index?
EstimateS is a program designed to address this sampling issue. It allows researchers both to determine how well they have sampled a habitat and, given the collecting data, to estimate the number of species that should be present in the habitat. This diversity estimation program, created by Rob Colwell, has been used to study diversity in everything from seedlings to insects, fishes, and even mammals in the fossil record. For a free copy of the program and a manual, see http://viceroy.eeb.uconn.edu/EstimateS.

How does EstimateS work?


To use EstimateS, one has to repeatedly sample a habitat or environment and determine which species are found in each sample. Abundance information is also helpful and allows the program to calculate diversity indices, but for simplicity’s sake we will stick to presence/absence data. Given the species collected in each sample, EstimateS will calculate the following information for us.

1. A bootstrap analysis of our observed species.
2. A variety of estimator calculations (ACE, ICE, Chao2, Jack2, etc.).
3. A look at the number of unique and duplicate species in our survey.
4. Calculations of a variety of other diversity indices that we will not get the chance to examine.

Let us use the following figure to understand the calculations provided by the EstimateS program. All you need to know is that there were 15 sampling events and that 26 species were actually found in total.

[Figure 1: species richness (Y axis, 0 to 70) plotted against the number of samples, i.e., collecting bouts (X axis, 1 to 15), with lines for Sobs, Uniques, Duplicates, ACE, ICE, Chao2 and Jack2.]

Figure 1. An example output using EstimateS to assess the diversity of a habitat.

First, let us look at the axes (Figure 1). On the X axis is the number of collecting bouts, that is, the number of times a given habitat or community was sampled. Each collecting bout is also referred to as a unit of “effort”; the figure could represent 15 bouts of 5-minute stream-net sampling or 15 one-hour bird mist-netting events. The important thing is that each unit of effort MUST BE similar throughout (i.e., you cannot do one 15-minute sample and one 2-hour sample). The Y axis is species richness, or simply the number of species observed or estimated.
Now let’s look at the bootstrap analysis of our observed species. Bootstrapping is simply a statistical method based on repeated random sampling of an original set of samples. Find Sobs (the number of species observed) in the key and figure above. To generate this line, the program’s default setting was used, which sub-samples each number of collecting bouts 50 times. So, at 1 on the X axis, EstimateS randomly picked one of the original 15 sampling bouts 50 different times and took the average of these 50 sub-samples (in this case, 4.3 species). In other words, if we randomly picked one of the 15 original collecting bouts, we would, on average, find 4.3 species. For X = 2, the program randomly picked 2 sampling events, pooled them, and took the average number of species found across 50 such random picks. Finally, at 15 the program picked all 15 samples (50 times), pooled them, and took the average, which in this case was 26 (the total number of species we found) because every sample was included.
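The sub-sampling procedure described above can be sketched in a few lines of Python. This illustrates the logic as the text describes it (random subsets of k samples drawn without replacement, repeated 50 times, then averaged); it is not claimed to reproduce EstimateS’s internal computation, and the function name and small presence/absence matrix are hypothetical.

```python
import random

def accumulation_curve(matrix, runs=50, seed=1):
    """Average number of species observed when k samples are pooled, for k = 1..m.

    matrix: one row per species, one column per sample (1 = present, 0 = absent).
    For each k, draw k distinct samples at random `runs` times, count the species
    present in at least one drawn sample, and average those counts.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    m = len(matrix[0])          # number of samples (columns)
    curve = []
    for k in range(1, m + 1):
        total = 0
        for _ in range(runs):
            chosen = rng.sample(range(m), k)
            total += sum(1 for row in matrix if any(row[j] for j in chosen))
        curve.append(total / runs)
    return curve

# Hypothetical presence/absence data: 5 species (rows) x 4 samples (columns)
matrix = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
]
print(accumulation_curve(matrix))
```

Whatever the random draws, the last value always equals the total number of species in the matrix (5 here), just as the Sobs curve in Figure 1 ends at 26, because pooling every sample includes every species.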
Looking at Sobs, notice that the number of species is just beginning to asymptote, that is, to level off. If the same species are common throughout our sampling events, we would expect the curve to level off. For example, if there were 26 species and all of them were commonly found in each sample, Sobs could have reached 26 after, say, 8 samples. In our case, the numbers keep increasing with each additional sampling event, telling us that each time we sampled, we found new species. But how many more species should we expect to see?
The second set of calculations that EstimateS offers is a variety of estimators (ACE, ICE, Chao2, Jack2, etc.). In the figure above, you will notice the ACE, ICE, Chao2 and Jack2 estimates of the number of species expected in the habitat given our 15 samples, or units of effort. The take-home message is that although we observed 26 species, the estimators predict (after 15 samples) that there should really be 30 species in the habitat (although Jack2 predicts 36).
How do the estimators work? They base their predictions largely on the total number of species found in a given number of pooled samples (in our case, 1 to 15 pooled samples) and on the ratio of uniques to duplicates within the pooled sample (see Equation 3). Comparing uniques to duplicates means comparing the number of species that occurred in only one of the pooled samples with the number that occurred in exactly two. Think of it this way: if we keep getting uniques as we sample, we can expect many more new species, but if duplicates start to increase, we can be more confident that we are finding more of the same species instead of new ones.
Below is the equation for the number of species predicted by the Chao2 estimator. The number of species predicted (SChao2) is equal to the number of species observed (Sobs) plus an estimate of how many more species should be out there, based on the number of samples collected (m) and the ratio of uniques (Q1) to duplicates (Q2).

Equation 3.
$$S_{Chao2} = S_{obs} + \frac{m-1}{m}\,\frac{Q_1^2}{2Q_2}$$
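The uniques (Q1), duplicates (Q2) and a Chao2 estimate can be computed directly from a presence/absence matrix. The sketch below assumes the classic Chao2 form given in the EstimateS user’s guide, S_Chao2 = S_obs + ((m − 1)/m) · Q1² / (2 · Q2); check it against the guide for the version you use. The function names and the example matrix are hypothetical.

```python
def uniques_duplicates(matrix):
    """Q1 = species found in exactly one sample; Q2 = species found in exactly two."""
    incidences = [sum(row) for row in matrix]  # number of samples containing each species
    return incidences.count(1), incidences.count(2)

def chao2_classic(matrix):
    """Assumed classic Chao2: S_obs + ((m - 1) / m) * Q1**2 / (2 * Q2)."""
    m = len(matrix[0])                            # number of samples (columns)
    s_obs = sum(1 for row in matrix if any(row))  # species seen in at least one sample
    q1, q2 = uniques_duplicates(matrix)
    if q2 == 0:
        # The classic formula divides by Q2, so it cannot be used without duplicates.
        raise ValueError("classic Chao2 is undefined when Q2 = 0")
    return s_obs + ((m - 1) / m) * q1 ** 2 / (2 * q2)

# Hypothetical presence/absence data: 5 species (rows) x 4 samples (columns)
matrix = [
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
]
q1, q2 = uniques_duplicates(matrix)
print(f"Q1 = {q1}, Q2 = {q2}, Chao2 = {chao2_classic(matrix)}")
```

With two uniques and only one duplicate, the estimator adds 1.5 species to the 5 observed, which shows why a high ratio of uniques to duplicates pushes the estimate upward.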

It is important to note that in our example above, the number of observed species and the numbers of uniques and duplicates are calculated 15 different times, based on sub-sampling and pooling 1 through 15 samples 50 times each. Notice that the estimated number of species climbed quickly when we pooled 2 to 3 samples, but then dropped rapidly and leveled off at about 30 species after 7 samples. Why did this happen? To see why, look at the number of unique and duplicate species in our survey (Figure 1). Early on, there are many more uniques than duplicates. This produces a large estimate of potential diversity: because so many species occur only once, it is assumed that if we keep sampling, we will continue to find more new species (see the equation above). As the figure shows, later on the number of duplicates begins to increase and the number of uniques begins to decrease. As duplicates increase relative to uniques, the number of additional species predicted decreases. Researchers usually feel most confident about their estimates once the uniques and duplicates lines cross (they do not yet cross in our figure), because that is where the species estimators asymptote (level off).

So the number of species we actually found was 26 and our estimators predict
there are potentially 30 species present. What does it mean to estimate that there should
be 30 species when only 26 were found? In this case, the estimates are telling us that we
probably found most of the species that are out there. What if we found 26 and our
estimators predicted 50 or 80 species? How might that affect our interpretation of
diversity within a given area? When the estimators are much higher than the number of species actually found, researchers take this as a sign that they should keep sampling. Estimator programs such as EstimateS give us a clearer understanding of the world around us because they tell us how well we actually sampled the diversity of a given habitat or area of interest and help us estimate how many species are really out there.

HOW TO USE ESTIMATES


To use EstimateS, we’ll need to put our data into an Excel file first. We will use a
dataset of species collected in the upstream and downstream sites by Dr. James
McCutchan, a stream ecologist who has repeatedly sampled the areas we sampled last
week. The files should be titled “Upstream Insect Species” and “Downstream Insect
Species”. You can download these files from the lab website and open them with Excel.

The files should look as follows when opened:

Title Record: The first line of the input file contains our site title.
Parameter Record: The second line contains two obligatory control parameters: the total number of species present and the number of times the habitat was sampled.

Species are in rows and each sampling event is represented by a column. If a species is present in a sample, it is coded as 1; if it is absent, it is coded as 0.

FROM EXCEL TO ESTIMATES AND BACK


1. Go to the “File” menu of Excel and choose the “save as” option. You will need to
convert this file into a “Text (tab delimited)” format. You should rename the file so you
are aware that it is the tab delimited version of your data file. Click “yes” to each option
you are given until actual saving begins. Close your new tab delimited file.

2. Open EstimateS and click the “ok” option to get to the main screen. Go under “file”;
choose “Load input file” and open the Text (tab delimited) file you previously made (not
the Excel file). Next, choose the following options – “format 1” (this is how our data are
arranged) and under the “Skip” option you will choose to skip 2 rows and 2 columns.
This just tells the EstimateS program where the data begin, following the obligatory first (title) and second (#species & #samples) lines.

3. Because we are using simple absence and presence data, you will need to go to the
“diversity” option and select “diversity settings”. Now choose the “Estimators” option.
Within the “Estimators” options, choose “Use Classic Formula for Chao 1 and Chao 2”
and hit ok.

4. Now go back to the “diversity” menu and choose “compute diversity statistics”.
If asked “are you sure because stats have already been calculated”, choose “yes”.

5. You will be shown a table of your results. Click the “export” option on the bottom of
the screen and create a new “ASCII” text file. Now go back to Excel and import the new
text file you made. You will be given a few options but all you need to do is click on the
“yes” option to each question- no need to read all the details here.

6. Go back to Excel and open the exported EstimateS file so that you can graph the results. For our purposes, keep only the columns that read:

Samples Sobs (Mao Tau) Uniques Duplicates ACE ICE Chao2 Jack2

7. Graphing our data. First, select all of the columns you saved in step 6 above. Next
go to the “Insert” menu and choose chart. Under the chart options, choose “line”. Now
you will need to answer each of the questions you are posed but the only one you have to
worry about is the label axes option. You will want the X axis to be “Number of samples”
and the Y to be “Number of Species” or “Diversity”.

ESTIMATING STREAM QUALITY


Bioassessment is the process of using the presence or absence of species to
determine the health of a given habitat. In bioassessment, an assemblage of organisms is
used to calculate a biotic index, which, in our case, is a rating of water quality. In stream
ecology, the basic premise is that some species are more tolerant of pollution or
environmental stress than others. If an ecosystem has nothing but pollution-tolerant species, then it is most likely polluted. If an ecosystem has many intolerant species, then it is most likely an unpolluted or unstressed ecosystem.
Stream ecologists classify different species according to their tolerance of pollution. Often these ecologists use Ephemeroptera (mayflies), Plecoptera (stoneflies) and Trichoptera (caddisflies) (EPTs for short) as indicator taxa, because these groups are commonly found in streams and are sensitive to pollutants and to changes in stream quality (such as pH, turbidity, temperature, etc.). It is the presence of intolerant taxa, for example, that would lead us to believe that a stream is healthy, whereas the lack of these taxa would lead us to question the health of the stream. When using bioassessment protocols, it is important to have some sort of reference against which to compare the site of interest. In our study, we will use the “upstream site” as our reference for comparison with the “downstream” site.
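As a simple illustration of how indicator taxa feed into a bioassessment, the sketch below counts EPT families at a site and their share of all families recorded. This is a hypothetical example metric, not the biotic index used in this lab, and the taxon lists are made up for illustration.

```python
EPT_ORDERS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}

def ept_summary(taxa):
    """taxa: iterable of (family, order) pairs recorded at one site.
    Returns (number of EPT families, proportion of all families that are EPT)."""
    families = dict(taxa)  # one entry per family, mapping family -> order
    if not families:
        raise ValueError("no taxa recorded")
    ept = [fam for fam, order in families.items() if order in EPT_ORDERS]
    return len(ept), len(ept) / len(families)

# Hypothetical site lists, for illustration only
upstream = [("Baetidae", "Ephemeroptera"), ("Perlidae", "Plecoptera"),
            ("Hydropsychidae", "Trichoptera"), ("Chironomidae", "Diptera")]
downstream = [("Baetidae", "Ephemeroptera"), ("Chironomidae", "Diptera"),
              ("Simuliidae", "Diptera")]
print(ept_summary(upstream))  # (3, 0.75)
print(ept_summary(downstream))
```

A lower EPT count or share at the downstream site than at the upstream reference would be consistent with reduced stream quality, which is the kind of comparison prediction 5 asks for.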

ANALYZING DIVERSITY
H1: Human uses and inputs negatively impact the diversity of macroinvertebrates
in Boulder Creek.
Hnull: Human uses and inputs do not impact the diversity of macroinvertebrates
in Boulder Creek.

Predictions:
If human uses and inputs negatively impact the diversity of stream macroinvertebrates, then
1. There should be more total families at the upstream site than at the downstream site.

2. There should be a difference in the taxa found at the two sites.

3. Families at the upstream site should be more evenly represented than those at the downstream site.

4. There should be more observed and estimated species at the upstream site than at the downstream site.

5. Our bioassessment protocol should indicate that pollution-sensitive taxa are more common at the upstream site than at the downstream site.

In this lab, you and your fellow students will be placed into one of 4 groups to test the predictions above. You only have to turn in the page you and your group worked on (7 pts), but be sure that by next week you also turn in your homework (3 pts, last page).

GROUP 1. You will test predictions 1 & 2. Go to the website and download the Excel file
titled “Boulder Creek Diversity.” You will use sheet 1 (on the lower left hand corner) for
your analysis. After calculating the values below, talk within your group to decide what
your findings mean. Are the results consistent with the above hypothesis?

1. Provide the descriptive statistics for the number of families found at the upstream and
downstream sites (mean, standard deviation, sample size). Draw a bar graph to illustrate
your data.

2. Conduct a t-test to determine whether the average number of families at the two sites
differs. Provide the t value, degrees of freedom (1, total sample size-2) and the P-value.

3. Using the Jaccard’s index, determine the degree of similarity between the families
found at both sites. Are the sites significantly dissimilar (use the table in the Jaccard’s
section to determine this if possible)?
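Items 1 and 2 above (and the matching items for Groups 2 and 4) can be checked outside Excel. The sketch below is a minimal Python version, assuming a two-sample Student’s t-test with pooled (equal) variances, which gives df = total sample size − 2; the function names and example values are hypothetical.

```python
import math
import statistics

def describe(values):
    """Mean, sample standard deviation, and sample size."""
    return statistics.mean(values), statistics.stdev(values), len(values)

def pooled_t_test(x, y):
    """Student's two-sample t-test with pooled variance; returns (t, df)."""
    mean_x, sd_x, n_x = describe(x)
    mean_y, sd_y, n_y = describe(y)
    df = n_x + n_y - 2
    pooled_var = ((n_x - 1) * sd_x ** 2 + (n_y - 1) * sd_y ** 2) / df
    t = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, df

# Hypothetical numbers of families per sample, for illustration only
upstream = [12, 14, 11, 13]
downstream = [9, 10, 8, 11]
print(describe(upstream), describe(downstream))
print(pooled_t_test(upstream, downstream))
```

To get the P value, compare |t| with a t table at the given df or, if SciPy is installed, use `scipy.stats.ttest_ind(x, y)`, whose default (`equal_var=True`) is the same pooled test and also returns the P value.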

GROUP 2. You will test prediction 3. Go to the web site and download the Excel file
“Boulder Creek Diversity.” You will use sheet 2 (on the lower left hand corner) for your
analysis. After calculating the values below, talk within your group to decide what your
findings mean. Are the results consistent with the above hypothesis?

1. Calculate the Shannon-Wiener Index coefficient (H’) for both sites.

H’ Upstream=
H’ Downstream site=

2. Calculate the Evenness coefficient for both sites.

E Upstream (H’/H’max) =
E Downstream (H’/H’max) =

3. On the Excel file titled “Boulder Creek Diversity”, go to sheet 3. Provide the
descriptive statistics for the Evenness estimated by the sections that surveyed the
upstream and downstream sites (mean, standard deviation, sample size). Draw a bar
graph to illustrate your data (i.e., the average E score and SD for each site).

4. Using the data in #3 above, conduct a t-test to determine whether the average Evenness
values at the two sites differ. Provide the t value, degrees of freedom (1, total sample
size-2) and the P-value.

GROUP 3. You will test prediction 4. Go to the web site and download the Excel file
“EstimateS Downstream.” After calculating the values below, talk within your group to
decide what your findings mean. Are the results consistent with the above hypothesis?

1. Use EstimateS to determine the number of species that are estimated to be in the
downstream site. Compare the estimated number of species to that which was actually
found at the site. Has the site been well sampled?

2. Create a graph in Excel to illustrate your findings. Be sure to include Sobs, uniques,
duplicates and your estimators (ACE, ICE, Chao2 & Jack2).

3. Compare the estimated and actual diversity of the downstream and the upstream sites.
Do your findings support the above hypothesis?

In your presentation, be sure to interpret what the number of uniques and duplicates tell
you about the estimated species values. Is the number of actual and estimated species
beginning to asymptote?

GROUP 4. You will test prediction 5. Go to the web site and download the Excel file
“Boulder Creek Diversity”. You will use sheet 4 (on the lower left hand corner) for your
analysis. After calculating the values below, talk within your group to decide what your
findings mean. Are the results consistent with the above hypothesis?

1. Provide the descriptive statistics for the stream quality assessment values calculated by
the different lab sections for the upstream and downstream sites (mean, standard
deviation, sample size). Draw a bar graph to illustrate your data.

2. Conduct a t-test to determine whether the average quality assessment values differed
between the two sites. Provide the t value, degrees of freedom (1, total sample size-2) and
the P-value. Were your findings consistent with the hypothesis above?

3. How else might you measure stream health?

Homework (3pts). Use a Chi-square analysis to determine whether the orders of aquatic insects are similarly represented at the two sites. Go to the web site and download the Excel file “Boulder Creek Diversity”. You will use sheet 5 for your analysis. See the section titled “The Chi-Square Analysis (χ²): Comparing frequency distributions” in lab number 3 for instructions.

1. Provide the χ² and df for your analysis. Using Table 9 in Lab 3, what was your estimated “P value”?

2. In a few sentences, interpret your findings.
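The sketch below computes a Pearson chi-square test of independence for a table of counts with insect orders as rows and the two sites as columns. This layout is an assumption about how the sheet 5 data are arranged; follow the Lab 3 instructions if they set up the test differently. The counts shown are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square for an r x c table of observed counts.
    Returns (chi2, df) with df = (r - 1) * (c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = insect orders, columns = (upstream, downstream)
counts = [
    [30, 12],  # e.g., Ephemeroptera
    [18, 6],   # e.g., Plecoptera
    [20, 25],  # e.g., Diptera
]
chi2, df = chi_square_independence(counts)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Compare the statistic with the critical value in Table 9 of Lab 3 at the reported df. As with any chi-square test, very small expected counts (a common rule of thumb is below 5) make the result unreliable.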

