Diversity Vegan
Jari Oksanen
processed with vegan 2.4-6 in R version 3.4.3 (2017-11-30) on January 24, 2018
default is to use natural logarithms. Shannon index is calculated with:

> H <- diversity(BCI)

> J <- H/log(specnumber(BCI))

where specnumber is a simple vegan function to
Figure 1: Rényi diversities in six randomly selected plots. The plot uses Trellis graphics with a separate panel for each site. The dots show the values for sites, and the lines the extremes and median in the data set.

$$H_a = \frac{1}{1-a} \log \sum_{i=1}^{S} p_i^a \,, \tag{4}$$

and the corresponding Hill number is $N_a = \exp(H_a)$. Many common diversity indices are special cases of Hill numbers: $N_0 = S$, $N_1 = \exp(H')$, $N_2 = D_2$, and $N_\infty = 1/(\max p_i)$. The corresponding Rényi diversities are $H_0 = \log(S)$, $H_1 = H'$, $H_2 = -\log(\sum p_i^2)$, and $H_\infty = -\log(\max p_i)$. Tsallis diversity of order $q$ is (Tóthmérész, 1995):
$$H_q = \frac{1}{q-1} \left( 1 - \sum_{i=1}^{S} p_i^q \right) . \tag{5}$$

These correspond to common diversity indices: $H_0 = S - 1$, $H_1 = H'$, and $H_2 = D_1$, and can be converted to Hill numbers:

$$N_q = \left(1 - (q-1) H_q\right)^{\frac{1}{1-q}} . \tag{6}$$

We select a random subset of six sites for Rényi diversities:

> k <- sample(nrow(BCI), 6)
> R <- renyi(BCI[k,])

2 Rarefaction

Species richness increases with sample size, and differences in richness actually may be caused by differences in sample size. To solve this problem, we may try to rarefy species richness to the same number of individuals. Expected number of species in a community rarefied from $N$ to $n$ individuals is (Hurlbert, 1971):

$$\hat S_n = \sum_{i=1}^{S} (1 - q_i) \,, \quad \text{where } q_i = \frac{\binom{N - x_i}{n}}{\binom{N}{n}} \,. \tag{7}$$
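Equation 7 can be evaluated directly in base R. The following is only a minimal sketch under stated assumptions: the function name `rarefy_hurlbert` and the toy count vector are hypothetical and are not part of vegan, and $x_i$ is taken to be the number of individuals of species $i$.

```r
# Expected species number in a random subsample of n individuals (eq. 7).
# x: vector of individual counts per species (hypothetical toy data below)
rarefy_hurlbert <- function(x, n) {
  N <- sum(x)
  stopifnot(n <= N)
  # q_i: probability that species i is absent from the subsample
  q <- choose(N - x, n) / choose(N, n)
  sum(1 - q)
}

x <- c(5, 3, 2)                        # three species, ten individuals
S_two  <- rarefy_hurlbert(x, 2)        # rarefied to two individuals
S_full <- rarefy_hurlbert(x, sum(x))   # no subsampling: all three species
```

With very large counts the binomial coefficients may overflow, and differences of `lchoose()` values are a common remedy in that case.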
et al., 1975):

$$s^2 = \sum_{i=1}^{S} q_i (1 - q_i) + 2 \sum_{i=1}^{S} \sum_{j>i} \left[ \frac{\binom{N - x_i - x_j}{n}}{\binom{N}{n}} - q_i q_j \right] . \tag{8}$$

Equation 8 actually is of the same form as the variance of sum of correlated variables:

$$\mathrm{VAR}\left(\sum x_i\right) = \sum \mathrm{VAR}(x_i) + 2 \sum_{i=1}^{S} \sum_{j>i} \mathrm{COV}(x_i, x_j) \,. \tag{9}$$

The number of stems per hectare varies in our data set:

> quantile(rowSums(BCI))
   0%   25%   50%   75%  100%
340.0 409.0 428.0 443.5 601.0

To express richness for the same number of individuals, we can use:

> Srar <- rarefy(BCI, min(rowSums(BCI)))

Rarefaction curves often are seen as an objective solution for comparing species richness with different sample sizes. However, rank orders typically differ among different rarefaction sample sizes: rarefaction curves can cross.

As an extreme case we may rarefy sample size to two individuals:

> S2 <- rarefy(BCI, 2)

This will not give equal rank order with the previous rarefaction richness:

> all(rank(Srar) == rank(S2))
[1] FALSE

Moreover, the rarefied richness for two individuals is a finite sample variant of Simpson's diversity index (Hurlbert, 1971) – or more precisely of $D_1 + 1$, and these two are almost identical in BCI:

> range(diversity(BCI, "simp") - (S2 -1))
[1] -0.002868298 -0.001330663

Rarefaction is sometimes presented as an ecologically meaningful alternative to dubious diversity indices (Hurlbert, 1971), but the differences really seem to be small.

3 Taxonomic and functional diversity

Simple diversity indices only consider species identity: all different species are equally different. In contrast, taxonomic and functional diversity indices judge the differences of species. Taxonomic and functional diversities are used in different fields of science, but they really have very similar reasoning, and either could be used with either taxonomic or functional traits of species.

3.1 Taxonomic diversity: average distance of traits

The two basic indices are called taxonomic diversity $\Delta$ and taxonomic distinctness $\Delta^*$ (Clarke and Warwick, 1998):

$$\Delta = \frac{\sum\sum_{i<j} \omega_{ij} x_i x_j}{n(n-1)/2} \tag{10}$$

$$\Delta^* = \frac{\sum\sum_{i<j} \omega_{ij} x_i x_j}{\sum\sum_{i<j} x_i x_j} \,. \tag{11}$$

These equations give the index values for a single site, and summation goes over species $i$ and $j$, and $\omega$ are the taxonomic distances among taxa, $x$ are species abundances, and $n$ is the total abundance for a site. With presence–absence data, both indices reduce to the same index called $\Delta^+$, and for this it is possible to estimate standard deviation. There are two indices derived from $\Delta^+$: it can be multiplied with species richness¹ to give $s\Delta^+$, or it can be used to estimate an index of variation in taxonomic distinctness $\Lambda^+$ (Clarke and Warwick, 2001):

$$\Lambda^+ = \frac{\sum\sum_{i<j} \omega_{ij}^2}{n(n-1)/2} - (\Delta^+)^2 \,. \tag{12}$$

We still need the taxonomic differences among species ($\omega$) to calculate the indices. These can be any distance structure among species, but usually it is found from established hierarchic taxonomy. Typical coding is that differences among species in the same genus is 1, among the same family it is 2 etc. However, the taxonomic differences are scaled

¹ This text normally uses upper case letter S for species richness, but lower case s is used here in accordance with the original papers on taxonomic diversity.
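Equations 10 to 12 can be checked by hand on a very small example. The sketch below uses base R only; the three-species abundance vector and the distance matrix `omega` are hypothetical values invented for illustration, and the code is not how vegan computes these indices.

```r
# Hypothetical site with three species
x <- c(2, 1, 1)                          # abundances x_i
omega <- matrix(c(0, 1, 2,
                  1, 0, 2,
                  2, 2, 0), nrow = 3)    # taxonomic distances omega_ij

pairs <- upper.tri(omega)                # selects each pair i < j once
xx <- outer(x, x)                        # products x_i * x_j
n <- sum(x)                              # total abundance of the site

delta      <- sum(omega[pairs] * xx[pairs]) / (n * (n - 1) / 2)   # eq. 10
delta_star <- sum(omega[pairs] * xx[pairs]) / sum(xx[pairs])      # eq. 11

# Presence-absence version: every x_i is 1, so n equals the species number
s <- length(x)
delta_plus  <- sum(omega[pairs]) / (s * (s - 1) / 2)
lambda_plus <- sum(omega[pairs]^2) / (s * (s - 1) / 2) - delta_plus^2  # eq. 12
```

In this toy case the abundance-weighted indices differ (`delta` is 8/6 and `delta_star` is 8/5), whereas both reduce to the same value 5/3 when all abundances are set to one.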
Figure 2: Taxonomic diversity $\Delta^+$ for the dune meadow data, plotted against the number of species. The points are diversity values of single sites, and the funnel is their approximate confidence intervals (2× standard error).

3.2 Functional diversity: the height of trait tree

In taxonomic diversity the primary data were taxonomic trees which were transformed to pairwise distances among species. In functional diversity the primary data are species traits which are translated to pairwise distances among species and then to clustering trees of species traits. The argument

2006).

Function treedive implements functional diversity defined as the total branch length in a trait dendrogram connecting all species, but excluding the unnecessary root segments of the tree (Petchey and Gaston, 2002, 2006). The example uses the taxonomic distances of the previous chapter. These are first converted to a hierarchic clustering (which actually were their original form before taxa2dist converted them into distances):

> tr <- hclust(taxdis, "aver")
> mod <- treedive(dune, tr)
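The idea of total branch length can be illustrated with base R alone. This is a rough sketch and not the treedive implementation: the helper names `branch_length` and `site_diversity` and the three-species distance matrix are hypothetical, and reclustering only the species present at a site is one assumed way of leaving out root segments that the site does not need.

```r
# Total branch length of an hclust dendrogram: each merge adds, for both
# of its children, the height difference between the merge and the child
# (leaves sit at height 0).
branch_length <- function(tree) {
  total <- 0
  for (i in seq_len(nrow(tree$merge))) {
    for (child in tree$merge[i, ]) {
      child_height <- if (child < 0) 0 else tree$height[child]
      total <- total + tree$height[i] - child_height
    }
  }
  total
}

# Hypothetical trait distances among three species
d <- matrix(c(0, 2, 6,
              2, 0, 6,
              6, 6, 0), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Diversity of a site: recluster only the species present there
site_diversity <- function(present) {
  tree <- hclust(as.dist(d[present, present]), method = "average")
  branch_length(tree)
}

fd_all <- site_diversity(c("A", "B", "C"))   # all three species
fd_ab  <- site_diversity(c("A", "B"))        # only the two close species
```

With average linkage, A and B join at height 2 and C joins at height 6, so the full tree has branch length 2 + 2 + 4 + 6 = 14, while a site with only A and B has branch length 4.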
models usually fit poorly to the BCI data, but here our random plot (number 2):

> prestondistr(BCI[k,])

mode width S0

[Figure: species frequencies by octave; x-axis: Frequencies by Octave, y-axis: Species]

[Figure: rank–abundance models fitted to the plot, with panels Null (AIC = 296.73), Preemption (AIC = 281.46), Lognormal (AIC = 272.80), Zipf (AIC = 302.38) and Mandelbrot (AIC = 260.71); x-axis: Rank, y-axis: Abundance on a log scale from 2^0 to 2^5]

Function radfit compares the models using alternatively Akaike's or Schwarz's Bayesian information criteria. These are based on log-likelihood, but penalized by the number of estimated parameters. The penalty per parameter is 2 in AIC, and log S in BIC. Brokenstick is regarded as a null model and has no estimated parameters in vegan. Preemption model has one estimated parameter ($\alpha$), log-normal and Zipf models two ($\mu, \sigma$, or $\hat p_1, \gamma$, resp.), and Zipf–Mandelbrot model has three ($c, \beta, \gamma$).

Function radfit also works with data frames, and fits models for each site. It is curious that log-normal model rarely is the choice, although it generally is regarded as the canonical model, in particular in data sets like Barro Colorado tropical forests.

5 Species accumulation and beta diversity

Species accumulation models and species pool models study collections of sites, and their species richness, or try to estimate the number of unseen species.

where $r_{ij}$ is the correlation coefficient between species $i$ and $j$. Both of these are unpublished: eq. 19 was developed by Roeland Kindt, and eq. 20 by Jari Oksanen. The third analytic method was suggested by Coleman et al. (1982):

$$S_n = \sum_{i=1}^{S} (1 - p_i) \,, \quad \text{where } p_i = \left(1 - \frac{1}{n}\right)^{f_i} , \tag{21}$$
$\bar\alpha$ (Tuomisto, 2010a):

$$\beta = S/\bar\alpha - 1 \,. \tag{22}$$

> ncol(BCI)/mean(specnumber(BCI)) - 1
[1] 1.478519
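The same calculation can be followed step by step on a small table. This is a minimal base R sketch; the presence–absence matrix `comm` is hypothetical toy data, and $S$ is counted here as the number of species that occur at least once, which matches `ncol()` only when the table has no empty columns.

```r
# Hypothetical presence-absence table: 3 sites (rows) x 4 species (columns)
comm <- matrix(c(1, 1, 0, 0,
                 0, 1, 1, 0,
                 0, 0, 1, 1), nrow = 3, byrow = TRUE)

S     <- sum(colSums(comm) > 0)   # total species number in the collection
alpha <- rowSums(comm > 0)        # species number of each site
beta  <- S / mean(alpha) - 1      # eq. 22
```

Here four species are spread over sites that hold two species each, so `beta` is 4/2 - 1 = 1.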
[Figure: y-axis: Distance to centroid]

14 "co" = (a*c+a*b+2*b*c)/(2*(a+b)*(a+c))
15 "cc" = (b+c)/(a+b+c)
16 "g" = (b+c)/(a+b+c)
17 "-3" = pmin(b,c)/(a+b+c)
18 "l" = (b+c)/2
19 "19" = 2*(b*c+1)/(a+b+c)/(a+b+c-1)
20 "hk" = (b+c)/(2*a+b+c)
21 "rlb" = a/(a+c)
22 "sim" = pmin(b,c)/(pmin(b,c)+a)
23 "gl" = 2*abs(b-c)/(2*a+b+c)
24 "z" = (log(2)-log(2*a+b+c)+log(a+b+c))/log(2)
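The formulas in this listing are plain R expressions in three counts. The sketch below evaluates three of them for a pair of sites, assuming the usual convention that `a` is the number of species shared by the two sites and `b` and `c` are the numbers of species found only in the first and only in the second site; the species lists and the function name `beta_indices` are hypothetical.

```r
# Hypothetical species lists of two sites
site1 <- c("sp1", "sp2", "sp3", "sp4")
site2 <- c("sp3", "sp4", "sp5")

# Three of the listed indices, written exactly as in the listing
beta_indices <- function(a, b, c) {
  list(cc  = (b + c) / (a + b + c),
       hk  = (b + c) / (2 * a + b + c),
       sim = pmin(b, c) / (pmin(b, c) + a))
}

res <- beta_indices(a = length(intersect(site1, site2)),  # shared species
                    b = length(setdiff(site1, site2)),    # only in site 1
                    c = length(setdiff(site2, site1)))    # only in site 2
```

For these lists a = 2, b = 2 and c = 1, which gives `cc` = 3/5, `hk` = 3/7 and `sim` = 1/3.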
them are well known dissimilarity indices. One of the more interesting indices is based on the Arrhenius species–area model

$$\hat S = c X^z \,, \tag{24}$$
where $f_i$ is the number of species occurring in exactly $i$ sites in the data: $f_N$ is the number of species occurring on all $N$ sites, $f_1$ the number of species occurring once, and $f_0$ the number of species in the species pool but not found in the sample. The total number of species in the pool $S_p$ is

$$S_p = \sum_{i=0}^{N} f_i = f_0 + S_o \,, \tag{25}$$

where $S_o = \sum_{i>0} f_i$ is the observed number of species. The sampling proportion $i/N$ is an estimate for the commonness of the species in the community. When species is present in the community but not in the sample, $i = 0$ is an obvious under-estimate, and consequently, for values $i > 0$ the species commonness is over-estimated (Good, 1953). The models for the pool size estimate the number of species missing in the sample $f_0$.

Function specpool implements the following models to estimate the number of missing species $f_0$. Chao estimator is (Chao, 1987; Chiu et al., 2014):

$$\hat f_0 = \begin{cases} \dfrac{f_1^2}{2 f_2} \dfrac{N-1}{N} & \text{if } f_2 > 0 \\[1ex] \dfrac{f_1 (f_1 - 1)}{2} \dfrac{N-1}{N} & \text{if } f_2 = 0 \,. \end{cases} \tag{26}$$

The latter case for $f_2 = 0$ is known as the bias-corrected form. Chiu et al. (2014) introduced the small-sample correction term $\frac{N-1}{N}$, but it was not originally used (Chao, 1987).

The first and second order jackknife estimators are (Smith and van Belle, 1984):

$$\hat f_0 = f_1 \frac{N-1}{N} \tag{27}$$

$$\hat f_0 = f_1 \frac{2N-3}{N} - f_2 \frac{(N-2)^2}{N(N-1)} \,. \tag{28}$$

The bootstrap estimator is (Smith and van Belle, 1984):

$$\hat f_0 = \sum_{i=1}^{S_o} (1 - p_i)^N \,. \tag{29}$$

The idea in jackknife seems to be that we missed about as many species as we saw only once, and the idea in bootstrap that if we repeat sampling (with replacement) from the same data, we miss as many species as we missed originally.

The variance estimators only concern the estimated number of missing species $\hat f_0$, although they are often expressed as they would apply to the pool size $S_p$; this is only true if we assume that $\mathrm{VAR}(S_o) = 0$. The variance of the Chao estimate is (Chiu et al., 2014):

$$\mathrm{VAR}(\hat f_0) = f_1 \left( A^2 \frac{G^3}{4} + A^2 G^2 + A \frac{G}{2} \right) , \quad \text{where } A = \frac{N-1}{N} \text{ and } G = \frac{f_1}{f_2} \,. \tag{30}$$

For the bias-corrected form of eq. 26 (case $f_2 = 0$), the variance is (Chiu et al., 2014, who omit small-sample correction in some terms):

$$\mathrm{VAR}(\hat f_0) = \tfrac{1}{4} A^2 f_1 (2 f_1 - 1)^2 + \tfrac{1}{2} A f_1 (f_1 - 1) - \tfrac{1}{4} A^2 \frac{f_1^4}{S_p} \,. \tag{31}$$

The variance of the first-order jackknife is based on the number of "singletons" $r$ (species occurring only once in the data) in sample plots (Smith and van Belle, 1984):

$$\mathrm{VAR}(\hat f_0) = \left( \sum_{i=1}^{N} r_i^2 - \frac{f_1}{N} \right) \frac{N-1}{N} \,. \tag{32}$$

Variance of the second-order jackknife is not evaluated in specpool (but contributions are welcome).

The variance of bootstrap estimator is (Smith and van Belle, 1984):

$$\mathrm{VAR}(\hat f_0) = \sum_{i=1}^{S_o} q_i (1 - q_i) + 2 \sum_{i \neq j}^{S_o} \left[ (Z_{ij}/N)^N - q_i q_j \right] , \quad \text{where } q_i = (1 - p_i)^N \,, \tag{33}$$

and $Z_{ij}$ is the number of sites where both species are absent.

The extrapolated richness values for the whole BCI data are:

> specpool(BCI)
    Species     chao chao.se  jack1 jack1.se    jack2
All     225 236.3732 6.54361 245.58 5.650522 247.8722
        boot  boot.se  n
All 235.6862 3.468888 50

If the estimation of pool size really works, we should get the same values of estimated richness if we take a random subset of a half of the plots (but this is rarely true).
> s <- sample(nrow(BCI), 25)
> specpool(BCI[s,])
    Species     chao  chao.se  jack1 jack1.se    jack2
All     213 240.1341 12.86254 242.76  8.96714 256.2917
        boot  boot.se  n
All 226.8904 4.862695 25

6.2 Pool size from a single site

The specpool function needs a collection of sites, but there are some methods that estimate the number of unseen species for each single site. These functions need counts of individuals, and species seen only once or twice, or other rare species, take the place of species with low frequencies. Function estimateR implements two of these methods:

> estimateR(BCI[k,])
                  2
S.obs     84.000000
S.chao1  117.214286
se.chao1  15.918953
S.ACE    117.317307
se.ACE     5.571998

In abundance based models $a_i$ denotes the number of species with $i$ individuals, and takes the place of $f_i$ of previous models. Chao's method is similar to the bias-corrected model eq. 26 (Chao, 1987; Chiu et al., 2014):

$$S_p = S_o + \frac{a_1 (a_1 - 1)}{2 (a_2 + 1)} \,. \tag{34}$$

When $a_2 = 0$, eq. 34 reduces to the bias-corrected form of eq. 26, but quantitative estimators are based on abundances and do not use small-sample correction. This is not usually needed because sample sizes are total numbers of individuals, and these are usually high, unlike in frequency based models, where the sample size is the number of sites (Chiu et al., 2014).

A commonly used approximate variance estimator of eq. 34 is:

$$s^2 = \frac{a_1 (a_1 - 1)}{2} + \frac{a_1 (2 a_1 + 1)^2}{(a_2 + 1)^2} + \frac{a_1^2 a_2 (a_1 - 1)^2}{4 (a_2 + 1)^4} \,. \tag{35}$$

However, vegan does not use this, but instead the following more exact form which was directly derived from eq. 34 following Chiu et al. (2014, web appendix):

$$s^2 = \frac{1}{4} \frac{1}{(a_2 + 1)^4 S_p} \big[ a_1 ( S_p a_1^3 a_2 + 4 S_p a_1^2 a_2^2 + 2 S_p a_1 a_2^3 + 6 S_p a_1^2 a_2 + 2 S_p a_1 a_2^2 - 2 S_p a_2^3 + 4 S_p a_1^2 + S_p a_1 a_2 - 5 S_p a_2^2 - a_1^3 - 2 a_1^2 a_2 - a_1 a_2^2 - 2 S_p a_1 - 4 S_p a_2 - S_p ) \big] \,. \tag{36}$$

The variance estimators only concern the number of unseen species like previously.

The ACE estimator is defined as (O'Hara, 2005):

$$S_p = S_\mathrm{abund} + \frac{S_\mathrm{rare}}{C_\mathrm{ACE}} + \frac{a_1}{C_\mathrm{ACE}} \gamma^2 \,, \quad \text{where}$$

$$C_\mathrm{ACE} = 1 - \frac{a_1}{N_\mathrm{rare}}$$

$$\gamma^2 = \frac{S_\mathrm{rare}}{C_\mathrm{ACE}} \sum_{i=1}^{10} i (i - 1) a_1 \frac{N_\mathrm{rare} - 1}{N_\mathrm{rare}} \,. \tag{37}$$

Now $a_1$ takes the place of $f_1$ above, and means the number of species with only one individual. Here $S_\mathrm{abund}$ and $S_\mathrm{rare}$ are the numbers of species of abundant and rare species, with an arbitrary upper limit of 10 individuals for a rare species, and $N_\mathrm{rare}$ is the total number of individuals in rare species. The variance estimator uses iterative solution, and it is best interpreted from the source code or following O'Hara (2005).

The pool size is estimated separately for each site, but if input is a data frame, each site will be analysed.

If log-normal abundance model is appropriate, it can be used to estimate the pool size. Log-normal model has a finite number of species which can be found integrating the log-normal:

$$S_p = S_\mu \sigma \sqrt{2\pi} \,, \tag{38}$$

where $S_\mu$ is the modal height or the expected number of species at maximum (at $\mu$), and $\sigma$ is the width. Function veiledspec estimates this integral from a model fitted either with prestondistr or prestonfit, and fits the latter if raw site data are given. Log-normal model may fit poorly, but we can try:

> veiledspec(prestondistr(BCI[k,]))
Extrapolated     Observed       Veiled
    96.20084     84.00000     12.20084
> veiledspec(BCI[k,])
Extrapolated     Observed       Veiled
   103.14426     84.00000     19.14426

6.3 Probability of pool membership

Beals smoothing was originally suggested as a tool of regularizing data for ordination. It regularizes data too strongly, but it has been suggested as a method of estimating which of the missing species could occur in a site, or which sites are suitable for a species. The probability for each species at each site is assessed from other species occurring on the site.

Function beals implements Beals smoothing (McCune, 1987; De Cáceres and Legendre, 2008):

> smo <- beals(BCI)

We may see how the estimated probability of occurrence and observed numbers of stems relate in one of the more familiar species. We study only one species, and to avoid circular reasoning we do not include the target species in the smoothing (Fig. 7):

> j <- which(colnames(BCI) == "Ceiba.pentandra")
> plot(beals(BCI, species=j, include=FALSE), BCI[,j],
       ylab="Occurrence", main="Ceiba pentandra",
       xlab="Probability of occurrence")

Figure 7: Beals smoothing for Ceiba pentandra.

References

Anderson MJ, Ellingsen KE, McArdle BH (2006). “Multivariate dispersion as a measure of beta di-

capture-recapture data with unequal catchability.” Biometrics, 43, 783–791.

species richness via a modified Good-Turing frequency formula.” Biometrics, 70, 671–682.

Clarke KR, Warwick RM (1998). “A taxonomic distinctness index and its statistical properties.” Journal of Applied Ecology, 35, 523–531.

Clarke KR, Warwick RM (1999). “The taxonomic distinctness measure of biodiversity: weighting of step lengths between hierarchical levels.” Marine Ecology Progress Series, 184, 21–29.

Clarke KR, Warwick RM (2001). “A further biodiversity index applicable to species lists: variation in taxonomic distinctness.” Marine Ecology Progress Series, 216, 265–278.

Coleman BD, Mares MA, Willis MR, Hsieh Y (1982). “Randomness, area and species richness.” Ecology, 63, 1121–1133.

De Cáceres M, Legendre P (2008). “Beals smoothing revisited.” Oecologia, 156, 657–669.

Fisher RA, Corbet AS, Williams CB (1943). “The relation between the number of species and the number of individuals in a random sample of animal population.” Journal of Animal Ecology, 12, 42–58.

Good IJ (1953). “The population frequencies of species and the estimation of population parameters.” Biometrika, 40, 237–264.

Heck KL, van Belle G, Simberloff D (1975). “Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size.” Ecology, 56, 1459–1461.
Hill MO (1973). “Diversity and evenness: a unifying notation and its consequences.” Ecology, 54, 427–473.

Hurlbert SH (1971). “The nonconcept of species diversity: a critique and alternative parameters.” Ecology, 52, 577–586.

Koleff P, Gaston KJ, Lennon JJ (2003). “Measuring beta diversity for presence-absence data.” Journal of Animal Ecology, 72, 367–382.

McCune B (1987). “Improving community ordination with the Beals smoothing function.” Ecoscience, 1, 82–86.

Whittaker RH (1960). “Vegetation of Siskiyou mountains, Oregon and California.” Ecological Monographs, 30, 279–338.

Whittaker RH (1965). “Dominance and diversity in plant communities.” Science, 147, 250–260.

Williamson M, Gaston KJ (2005). “The lognormal distribution is not an appropriate null hypothesis for the species-abundance distribution.” Journal of Animal Ecology, 74, 409–422.

Wilson JB (1991). “Methods of fitting dominance/diversity curves.” Journal of Vegetation Science, 2, 35–46.