
Environmental Modelling and Software 125 (2020) 104615

Contents lists available at ScienceDirect

Environmental Modelling and Software


journal homepage: http://www.elsevier.com/locate/envsoft

ENMTML: An R package for a straightforward construction of complex ecological niche models

André Felipe Alves de Andrade a,*, Santiago José Elías Velazco b, Paulo De Marco Júnior a

a Theory, Metacommunity and Landscape Ecology Lab, ICB V, Universidade Federal de Goiás, CP 131, 74.001-970, Goiânia, GO, Brazil
b Instituto de Biología Subtropical, Universidad Nacional de Misiones-CONICET, Bertoni 85, CP 3370, Puerto Iguazú, Misiones, Argentina

ARTICLE INFO

Keywords: Species distribution model; Open-source software; Niche modeling; Model evaluation

ABSTRACT

Ecological niche models (ENMs) are a popular tool in ecology, mostly due to their broad applicability and the fact that the required data are simple and easily accessible from digital databases. Nevertheless, there is an underlying methodological complexity, often overlooked by many scientists who rely on ENMs to achieve other objectives. We present here ENMTML, an open-source R package. The main purpose of this package is to assemble all this methodological complexity, spread over several papers, and bring it into the spotlight in a simple way for people not used to the details of ENMs. The package contains several alternatives for different methodological steps, e.g., pseudo-absence allocation and accessible area delimitation, formulated within a single function, to make it accessible for people not used to the programming environment.

Software and data availability

Availability of Software
Name of software: ENMTML
Type of software: Add-on package for R (https://cran.r-project.org)
First available: 2019
Program language: R
Requires: R version 3.6.0 or later
License: GPL
Code repository: https://github.com/andrefaa/ENMTML
Installation in R: install_github("andrefaa/ENMTML")

1. Introduction

Ecological Niche and Species Distribution Models (ENMs and SDMs, respectively) are widely applied in ecology, providing important basic information for the most diverse fields, such as conservation (e.g. Keppel et al., 2012; Razgour et al., 2018), biological invasions (Peterson, 2003; Campos et al., 2014; Lins et al., 2018), phylogenetic/evolutionary studies (e.g. Carstens and Richards, 2007; Chifflet et al., 2016) and disease management (Peterson and Shaw, 2003). While there are theoretical differences between ENMs and SDMs (see Peterson and Soberón, 2012), we will adopt the nomenclature ENM from now on, as most studies are closer to estimating species' niches. Such broad applicability is related to two significant properties of ENMs: (i) a simple underlying model that requires only occurrence data and environmental variables, and (ii) a huge effort employed by researchers to develop robust methods and software. There has been a significant change in methods from the first ENM studies, which used one or a few algorithms and did not explore other steps that could influence the result (Peterson and Holt, 2003), to current studies, which use several algorithms and diverse steps to fit models, such as pseudo-absence allocation and accessible area definition (e.g., Velazco et al., 2019).

One of the major assets of ENMs is its community, with several researchers dedicated to delving into specific methodological aspects of the modeling process. Some noteworthy aspects involve the control of collinearity among environmental variables (De Marco and Nóbrega, 2018); different strategies for the allocation of pseudo-absences (Engler et al., 2004; Barbet-Massin et al., 2012; Senay et al., 2013); careful definition of the accessible area (Peterson et al., 2001; Soberón, 2010; Barve et al., 2011; Cooper and Soberón, 2018); ensembles of different algorithms (Marmion et al., 2009; Thuiller et al., 2009; Hao et al., 2019); different evaluation metrics (Allouche et al., 2006; Leroy et al., 2018); and diverse methods to partition the occurrence data for fitting and evaluating the model (Muscarella et al., 2014; Roberts et al., 2017). Given the wide variety of methods for each one of the several steps of fitting ENMs and the possible interactions that may arise, the number of models produced for a single species may easily surpass a thousand.

The great diversity of choices creates a duality in ENMs: while

* Corresponding author.
E-mail addresses: andrefaandrade@gmail.com, andrefaa@discente.ufg.br (A.F.A. Andrade).

https://doi.org/10.1016/j.envsoft.2019.104615
Received 29 October 2019; Received in revised form 19 December 2019; Accepted 26 December 2019
Available online 6 January 2020
1364-8152/© 2020 Elsevier Ltd. All rights reserved.

models are simple to fit and the required data is easily available, several decisions should be made regarding methodological steps that must be done judiciously and are not as readily available as the data. As a result, studies that rely on ENMs usually do not have the same methodological rigor as studies that focus on developing ENMs. For instance, several studies still apply the Area Under the Curve (AUC) as an evaluation metric, even though it has been demonstrated for over 10 years that the metric is deeply affected by prevalence (Lobo et al., 2008) or the extent of the accessible area (Peterson et al., 2008; Barve et al., 2011). On the other hand, there has been a great effort to develop alternatives for the AUC and several other methodological aspects, which have been implemented in several R packages and ENM software (Thuiller et al., 2009; Guo and Liu, 2010; Naimi and Araújo, 2016; Hijmans et al., 2017; Golding et al., 2018; Kass et al., 2018; Sánchez-Tapia et al., 2018; Cobos et al., 2019).

Ideally, ENMs should be fit-for-purpose, which means that fitting ENMs is a process that must be thought through carefully, as there is not a single correct way to fit models (Guillera-Arroita et al., 2015; Qiao et al., 2015). Due to the great variety of methodological choices and the speed at which new alternatives arise, it may be hard to keep up with novelties within the ENM field. As a result, people who are not involved in the methodological developments within the field or do not have connections to developers have a small share of the published papers (Ahmed et al., 2015). We introduce here ENMTML, a new R package to fit ENMs. The main objective of this package is to put together all this methodological diversity developed within the ENM field and present it to users simply and transparently. Despite being an R package, we also made it friendly for non-programmers and summarized the whole fitting process into a single function with several arguments that correspond to the methodological alternatives.

2. Methods description

2.1. Arguments and settings

The ENMTML package and its processes can be divided into three major stages: pre-processing, processing, and post-processing. This division into three stages is familiar to most ENM routines. Identifying the stage in which each methodological step will be performed may help users to understand the connections among the different methodological steps and provides an overview that assists the decision-making process (Fig. 1).

In the pre-processing stage, the data is input (species occurrences and predictor variables), and a series of steps can be performed before fitting the model. Occurrence data is input as a tab-separated text file (TXT). The program automatically uses unique occurrences per cell. In addition, the user can control two steps acting over the occurrence dataset: i) the minimum number of occurrences valid for model fitting and ii) a thinning process to reduce sampling bias. Regarding predictors, there are three methods to control for collinearity and the possibility to include predictors for other time or geographic windows. As for pre-processing steps, there are five different strategies for pseudo-absence allocation with the option to control for presence-absence ratio; four methods to partition the data into subsets, with the possibility to provide a specific dataset for independent evaluation (a useful asset when studying biotic invasions); two methods to create species-specific accessible areas; and it is also possible to identify extrapolation areas based on a Mobility-Oriented Parity analysis (Owens et al., 2013).

The processing stage is when algorithms fit the models and the suitability maps are generated. For starters, the user can choose whether both partial and final suitability maps will be generated. There are thirteen algorithms available for model fitting: Bioclim (Nix, 1986), Mahalanobis Distance (Farber and Kadmon, 2003), Domain (Carpenter et al., 1993), Ecological Niche Factor Analysis (Hirzel et al., 2002), Generalized Linear Models (McCullagh and Nelder, 1989), Generalized Additive Models (Hastie and Tibshirani, 1990), Boosted Regression Tree (Friedman, 2001), Random Forests (Prasad et al., 2006), Support Vector Machine (Guo et al., 2005), Maximum Entropy with quadratic and linear (Anderson and Gonzalez, 2011) and default features (Phillips et al., 2006; Phillips, 2017), Maximum Likelihood (Royle et al., 2012) and Gaussian Process (Golding and Purse, 2016).

Finally, in the post-processing stage, the suitability maps generated from the different algorithms are evaluated using seven different metrics (AUC, True Skill Statistics (TSS), Kappa, Jaccard, Sorensen, Boyce, and Fpb). When multiple models are fitted for the same species (i.e., several replicates or geographical partitions), the evaluation output is the mean and standard deviation of the partial models. Other post-processing options include the creation of binary maps based on five different thresholds; six different ways to generate ensemble models; and the application of spatial restrictions to reduce model commission and bring the result closer to an estimation of the species' realized distribution (MSDM).

All features are organized in a single R function with multiple arguments the user needs to fill according to the specific purpose. We chose not to establish default arguments, so users must think carefully about the choices. To provide support, we briefly explain the methodological steps and indicate relevant studies for each alternative.
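
As an illustration of the pre-processing stage described in Section 2.1, the step in which only unique occurrences per cell are kept can be sketched as below. The sketch is written in Python purely for illustration and is not the package's R code: the function name `unique_per_cell`, the grid origin, and the rule of keeping the first record found in each cell are all assumptions.

```python
import math

def unique_per_cell(records, cell_size, x_min=-180.0, y_min=-90.0):
    """Keep one occurrence per grid cell (hypothetical helper).

    records      : iterable of (longitude, latitude) pairs in decimal degrees
    cell_size    : grid resolution in degrees (square cells are assumed)
    x_min, y_min : assumed grid origin (lower-left corner)
    """
    seen = set()
    kept = []
    for lon, lat in records:
        # Column/row index of the cell that contains this record
        cell = (math.floor((lon - x_min) / cell_size),
                math.floor((lat - y_min) / cell_size))
        if cell not in seen:
            seen.add(cell)
            kept.append((lon, lat))
    return kept

# The first two records fall in the same 0.5-degree cell, so one is dropped
occ = [(-49.25, -16.68), (-49.26, -16.69), (-54.57, -25.60)]
print(unique_per_cell(occ, cell_size=0.5))
```

In practice the grid origin and resolution would come from the predictor rasters rather than being passed by hand.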

Fig. 1. General workflow of the ENMTML package and all steps that can be taken at each stage of the modeling routine. In the pre-processing stage occurrence and predictors are imported and the user may take six different steps before fitting the models. The processing stage involves fitting the models and producing the suitability maps, which can be made using thirteen different algorithms. In the post-processing stage, the results are evaluated and the user may perform analysis upon the different suitability maps produced.


2.2. Occurrence data processing

Arguments involved: (occ_file/Sp/x/y/min_occ/thin_occ)

Occurrence data is imported as a tab-separated TXT file whose path must be specified by the user in the argument occ_file. This file must contain information about species name, longitude, and latitude (in decimal degrees), and the names of those columns must be provided in the arguments Sp, x, y.

The user must also provide the minimum number of unique occurrences valid for model fitting in the argument min_occ; species below this number will be excluded from the analysis. There is no rule for the definition of a minimum number of occurrences, but several studies indicate that model accuracy is directly related to sample size (Wisz et al., 2008). Several factors affect the viable minimum number of occurrences (Mateo et al., 2010), but a good framework for exploring this subject is the one developed by van Proosdij et al. (2015).

Finally, users might opt to reduce autocorrelation in occurrence data and possible sampling bias by a thinning technique (argument thin_occ), performed using the package spThin (Aiello-Lammens et al., 2015). There are three alternatives for defining the thinning distance: i) based on the distance of a Moran's I Variogram that minimizes the spatial autocorrelation; ii) retaining unique cells that fall within a grid two times greater than the original cell size; and iii) based on a minimum distance defined by the user (Table 1). For a better comprehension of the topic see Aiello-Lammens et al. (2015).

Table 1
Thinning alternatives in the ENMTML package (references for each method indicate its original development or an example of its best practice use).

| Occurrence thinning method | Acronym used in the thin_occ argument | Method description | Additional arguments | References |
|---|---|---|---|---|
| None | NULL | Use original unique occurrences | – | – |
| Moran Variogram | MORAN | Choose from pairs of occurrences which are within a distance defined by a Moran Variogram | | Veloz (2009) |
| 2x cell-size | CELLSIZE | Choose from pairs of occurrences which are within a distance defined by 2x cellsize | – | Velazco et al. (2019) |
| User-defined | USER-DEFINED | Choose from pairs of occurrences which are within a distance defined by the user (in km) | Distance | Aiello-Lammens et al. (2015) |

2.3. Predictors input and collinearity reduction

Arguments involved: (pred_dir/proj_dir/colin_var)

Predictors are imported in the argument pred_dir, which specifies the folder path of the predictors, and should be in any of the given formats: BIL, TIF, ASC, TXT. Predictors for projection accept the same formats and should be included in nested folders, with a major folder including all the projection datasets, each with its respective sub-folder (Fig. 2).

Collinearity in predictors can be controlled using three different strategies (Table 2): i) Pearson correlation with a threshold defined by the user; ii) Variance Inflation Factor (VIF; Marquardt, 1970); and iii) Principal Component Analysis (PCA), using the axes that account for 95% of the total variance in the predictors as the new predictors (Heikkinen et al., 2006; De Marco and Nóbrega, 2018). Predictors eliminated by the Pearson and VIF methods will also be eliminated from the projection datasets. When users choose to perform a PCA and have datasets for projection, the linear relationship between the predictors and the principal components is projected onto the new datasets to create the principal components for the projection datasets (see De Marco and Nóbrega, 2018).

2.4. Pseudo-absences and background points allocation

Arguments involved: (pseudoabs_method/pres_abs_ratio)

The program allocates pseudo-absences and background points within the area used to calibrate the models (Table 3). Such allocation is handled differently for the geographical partitioning methods (such as block and band cross-validation), in which pseudo-absences and background points are created after the partition is performed, in order to maintain a homogeneous distribution of background points between partitions, as well as a constant prevalence (conceived here as the relationship between presences and pseudo-absences). Since an algorithm's performance may be sensitive to the way pseudo-absences are distributed throughout the calibration area (Wisz and Guisan, 2009; Barbet-Massin et al., 2012), the program offers five pseudo-absence allocation methods: i) 'single random' distribution (Zaniewski et al., 2002); ii) 'geographically constrained method', i.e., pseudo-absences are allocated outside a buffer around presences (Barbet-Massin et al., 2012); iii) 'environmental constrained method', based on the lowest suitable region predicted by a Bioclim model (Engler et al., 2004); iv) 'geographical and environmental constrained method' (Lobo et al., 2010); and v) a three-step method which combines the environmental and geographical approaches plus a k-mean non-agglomerative cluster process to distribute pseudo-absences homogeneously in environmental space (Senay et al., 2013).

The program also allows the user to define the ratio between presences and absences (argument pres_abs_ratio), a methodological step that has received considerable focus from researchers and affects algorithm performance (Barbet-Massin et al., 2012).

2.5. Methods to define the accessible area

Arguments involved: (sp_accessible_area)

A crucial decision when constructing ENMs is the hypothesized accessible area, i.e., the geographical region used by a species throughout a relevant period of time (Barve et al., 2011), also known as the movement component of the BAM diagram (Soberón and Peterson, 2005). Such an accessible area can be delimited based on the knowledge of species ecology, dispersal ability, geographical barriers, and ancient regions that the species inhabited (Soberón, 2010; Peterson et al., 2011). Nonetheless, this information is often missing for most species; therefore, different techniques act as an approximation of the accessible area. ENMTML offers four options to define accessible areas: i) no restriction, i.e., the entire predictors extent will be used as accessible area; ii) an accessible area based on a buffer around occurrence data; iii) an accessible area based on a mask, e.g., using a shapefile for biogeographical ecoregions; or iv) an accessible area defined by the user (supported formats: SHP/TIF/BIL/ASC/TXT; Table 4).

2.6. Data partition

Arguments involved: (eval_occ/part)

An ideal model evaluation requires a dataset in which occurrences are independent of the ones used to fit the model; this independent dataset can be supplied as the path to a TXT file in the argument eval_occ.

Nevertheless, the most common evaluation method is to partition occurrence data into two subsets, one to fit the model and another for evaluation. For this option (argument part), the package offers four methods for data partitioning, two based on random partitions and two


Fig. 2. All folders and subfolders involved in a single run of the ENMTML package. Yellow folders (occurrence and predictors) are mandatory to run the main function. Green folders (projection and accessible area) are optional and will be required according to the modeling objective. Blue folders are produced by the script; most outputs are within the main Results folder, which contains a set of TXT files with model evaluation and information and sub-folders with the models produced by each algorithm and ensemble method. Folders related to the accessible area, pseudo-absence allocation, and geographical partition are also created to avoid repeating those analyses in the future. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)


on geographical partitions (Table 5). Among random partition methods the user can choose: i) bootstrap, in which users specify the number of replicates and the proportion of the dataset used for fitting the model, e.g., 10 replicates each with 70% for training models and the remaining 30% used for validation; and ii) k-fold, in which the dataset is split into a chosen number of folds, and on each run the model is fit using k-1 folds and evaluated on the fold left out. As alternatives for geographical partitions, the dataset can be split based on bands (latitudinal/longitudinal) or based on a checkerboard (blocks), with occurrence data being split into two subsets, alternately used for fitting and evaluating the model. The optimal band or checkerboard is found based on the size which presents (i) the lowest spatial autocorrelation, based on Moran's I, (ii) the maximum environmental similarity, based on the Multivariate Environmental Similarity Surface metric (MESS), and (iii) the minimum difference in the number of records between subsets (Velazco et al., 2019). The importance of carefully delimiting blocks for fitting and evaluating the models is discussed by Roberts et al. (2017).

2.7. Measure of models' extrapolation

Arguments involved: (extrapolation)

ENMs are fitted based on conditions found in occurrences and absence/pseudo-absence/background data. When making predictions, it is not uncommon for models to predict onto new conditions (non-analog climates), especially when performing projections to other time periods or geographical regions. In those situations, models will perform extrapolations, which means that there is some uncertainty, as models were not fitted on those environmental conditions (Fitzpatrick and Hargrove, 2009). To identify geographical locations in which models are performing extrapolations, we included a Mobility-Oriented Parity analysis (MOP; Owens et al., 2013), which is based on the defined accessible area for each species. If there is no accessible area, the program calculates MOP based on all conditions within the geographical extent of predictors. The main issues caused by model extrapolation are discussed by Elith et al. (2010) and Owens et al. (2013).

2.8. Modeling algorithms

Arguments involved: (algorithm)

As one of the primary sources of ENM/SDM uncertainty is the method used to construct them (Watling et al., 2015; Thuiller et al., 2019), and assuming that no single method can deal with all modeling situations (Qiao et al., 2015), our ENMTML package fits 13 algorithms that span different statistical techniques and types of data used to fit the models (Table 6).

2.9. Model evaluation

Model evaluation is performed using seven different metrics: Area Under the Curve (AUC; Fielding and Bell, 1997), Kappa (Cohen, 1960), True Skill Statistic (Allouche et al., 2006), Jaccard (Leroy et al., 2018), Sorensen (Leroy et al., 2018), Fpb (Li and Guo, 2013), and Boyce (Boyce et al., 2002), complemented by partial ROC and its respective p-value (Peterson et al., 2008), omission rate (OR; Fielding and Bell, 1997) and the proportion of the total area in which the species is considered to be present (Peterson, 2001). The values in the table are an average of the several replicates (if the bootstrap partition was chosen), folds (if random k-folds were chosen), or geographical subsets (if bands or block partition was chosen), accompanied by the respective standard deviation. Metrics are given for each algorithm used to fit models for each species, and each threshold chosen to create binary maps. The type of partition used to create occurrence subsets is also indicated (Table 7).

Table 2
Methods to reduce predictor collinearity available in the ENMTML package.

| Variable collinearity reduction method | Acronym used in the colin_var argument | Method description | Additional arguments | References |
|---|---|---|---|---|
| None | NULL | Use original variables provided by the user | – | – |
| Pearson Correlation | PEARSON | Eliminates correlated variables according to a chosen threshold | Threshold | Dormann et al. (2013) |
| Variance Inflation Factor | VIF | Eliminates correlated variables based on VIF | – | Marquardt (1970) |
| Principal Components Analysis | PCA | Performs a PCA on variables and uses the principal components as variables | – | De Marco and Nóbrega (2018) |

Table 3
Pseudo-absence allocation methods available in the ENMTML package.

| Pseudo-absence allocation method | Acronym used in the pseudoabs_method argument | Description of restriction | Reference |
|---|---|---|---|
| Random | RND | None | Zaniewski et al. (2002) |
| Geographical Constrain | GEO_CONST | Outside a distance buffer | Barbet-Massin et al. (2012) |
| Environmental Constrain | ENV_CONST | Within lowest suitability areas predicted by a Bioclim | Engler et al. (2004) |
| Environmental and Geographical Constrain | GEO_ENV_CONST | Combination of Geographical and Environmental | Lobo et al. (2010) |
| Three-Step Constrain | GEO_ENV_KM_CONST | Combination of Environmental, Geographical and k-mean cluster | Senay et al. (2013) |

Table 4
Methods to delimit species accessible area available in the ENMTML package.

| Accessible area definition method | Acronym used in the sp_accessible_area argument | Type of data required | Reference |
|---|---|---|---|
| Whole predictors extent | NULL | no data | – |
| Buffer | BUFFER | Type 1 = buffer radius based on occurrence data; Type 2 = buffer radius defined by the user | Barve et al. (2011) |
| Mask | MASK | Single shapefile or raster (BIL/ASC/TIF/SHP/TXT) mask from which boundaries will be extracted | Peterson et al. (2001) |
| User-delimited | USER-DELIMITED | Folder with multiple shapefile or raster (BIL/ASC/TIF/SHP/TXT) masks, one for each species | – |
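
As an illustration of the threshold-dependent metrics listed in Section 2.9, the sketch below computes TSS, Kappa, Jaccard, Sorensen and the omission rate from a 2x2 confusion matrix using their standard definitions. It is a minimal Python sketch, not the package's R implementation; the function name `threshold_metrics` is hypothetical and the code assumes that no denominator is zero.

```python
def threshold_metrics(tp, fp, fn, tn):
    """Threshold-dependent metrics from a 2x2 confusion matrix.

    tp/fn : presences predicted as present / absent
    fp/tn : (pseudo-)absences predicted as present / absent
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement and agreement expected by chance (Cohen's Kappa)
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "TSS": sensitivity + specificity - 1,
        "Kappa": (p_obs - p_exp) / (1 - p_exp),
        "Jaccard": tp / (tp + fp + fn),
        "Sorensen": 2 * tp / (2 * tp + fp + fn),
        "OR": fn / (tp + fn),  # omission rate = 1 - sensitivity
    }

m = threshold_metrics(tp=40, fp=10, fn=10, tn=40)
print(m)
```

Jaccard and Sorensen ignore true negatives, which is why Leroy et al. (2018) propose them as alternatives that are less dependent on prevalence.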


Table 5
Data partition methods available in the ENMTML package.

| Data partition method | Type of partition | Acronym used in the part argument | Method description | Additional arguments | References |
|---|---|---|---|---|---|
| Bootstrap | Random | BOOT | Random partition between training and test subsets | replicates and proportion | Fielding and Bell (1997) |
| K-Fold | Random | KFOLD | Random partition of occurrences in folds | folds | Fielding and Bell (1997) |
| Bands | Geographical | BAND | Geographical partition in one-dimension bands | type = 1 (Latitude); type = 2 (Longitude) | Bahn and McGill (2013) |
| Block (checkerboard) | Geographical | BLOCK | Geographical partition in two-dimensions | – | Roberts et al. (2017) |
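
The two random partition methods in Table 5 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's code: the function names and the `seed` parameter are hypothetical, and the "bootstrap" method is implemented here as a repeated random split without replacement, following the 70%/30% example given in the text.

```python
import random

def bootstrap_partitions(n_records, replicates, proportion, seed=0):
    """Yield (train, test) index lists; `proportion` goes to training."""
    rng = random.Random(seed)
    n_train = round(n_records * proportion)
    for _ in range(replicates):
        idx = list(range(n_records))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])

def kfold_partitions(n_records, folds, seed=0):
    """Yield (train, test) index lists; each fold is held out exactly once."""
    rng = random.Random(seed)
    idx = list(range(n_records))
    rng.shuffle(idx)
    # Deal the shuffled records into `folds` groups of near-equal size
    groups = [idx[i::folds] for i in range(folds)]
    for k in range(folds):
        train = [i for j, g in enumerate(groups) if j != k for i in g]
        yield sorted(train), sorted(groups[k])

# 10 replicates with 70% of 20 records for training, as in the text's example
for train, test in bootstrap_partitions(20, replicates=10, proportion=0.7):
    assert len(train) == 14 and len(test) == 6
```

The geographical methods (bands and blocks) additionally need the spatial coordinates and the optimization criteria described in Section 2.6, so they are not covered by this sketch.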

Table 6
Algorithms used by the ENMTML package to construct ecological niche and species distribution models.

| Algorithm | Acronym used in the algorithms argument | Package | Data used to create models | Reference package |
|---|---|---|---|---|
| Bioclim (Envelope Score) | BIO | dismo | Presences | Hijmans et al. (2017) |
| Mahalanobis | MAH | dismo | Presences | Hijmans et al. (2017) |
| Domain | DOM | dismo | Presences | Hijmans et al. (2017) |
| Generalized Linear Models | GLM | stats | Presences and pseudo-absences | R Core Team (2018) |
| Generalized Additive Models | GAM | gam | Presences and pseudo-absences | Hastie (2018) |
| Support Vector Machine | SVM | kernlab | Presences and pseudo-absences | Karatzoglou et al. (2004) |
| Boosted Regression Trees | BRT | dismo | Presences and pseudo-absences | Hijmans et al. (2017) |
| Random Forest | RDF | randomForest | Presences and pseudo-absences | Liaw and Wiener (2002) |
| Maximum Likelihood | MLK | maxlike | Presences and background points | Royle et al. (2012) |
| Bayesian Gaussian Process | GAU | GRaF | Presences and pseudo-absences | Golding (2014) |
| Maximum Entropy simple (only linear and quadratic features) | MXS | maxnet | Presences and background points | Phillips (2017) |
| Maximum Entropy default (all features) | MXD | maxnet | Presences and background points | Phillips (2017) |
| Ecological Niche Factor Analysis | ENF | adehabitatHS | Presences and background points | Calenge (2006) |

2.10. Threshold for binary maps

Arguments involved: (thr)

The different thresholds are used to create binary maps. More than one option can be chosen, which results in different sets of binary maps created within a single script run (Table 8). The thresholds are chosen based on the suitability value that maximizes a given metric. For instance, the MAX_TSS threshold uses the suitability value that gives the highest TSS value to create binary maps. This is the common threshold at which the sum of Specificity and Sensitivity is maximum. The same logic stands for all the other alternatives, except for Lowest Presence Threshold (LPT; Pearson, 2007) and Sensitivity. The LPT threshold is the lowest suitability value among all occurrence data. Sensitivity requires users to specify a desired sensitivity value for the resulting binary map (Table 8).

2.11. Ensemble methods

Arguments involved: (ensemble)

The major source of model uncertainty is the different algorithms used to fit ENMs (Diniz-Filho et al., 2009; Thuiller et al., 2019). A commonly used method to deal with this is to create an ensemble model of different algorithms (Araújo and New, 2007; Marmion et al., 2009). ENMTML offers six ensemble methods, three based on different ways to calculate the average of the models and three based on a PCA derived from the models. Average-based ensembles can be created using: i) a simple average of all models; ii) a weighted average, in which each model's suitability is weighted by how well that algorithm performed; and iii) a superior average, in which a simple average is calculated only for those algorithms that performed better than the average of all algorithms. The PCA-based ensemble performs a principal components analysis on suitability maps and uses the first component as the final map. This can be performed: i) using all models; ii) using only the superior models, selected similarly to the superior average; and iii) using only suitability values above the threshold for each algorithm, with values below the threshold set to zero (Table 9).

2.12. Methods to constrain ENMs

Arguments involved: (msdm)

There is an underlying difference between ecological niche models (ENMs) and species distribution models (SDMs), and the niche and the distribution are each more suitable for answering different questions (Peterson and Soberón, 2012). Usually, model output represents the niche (ENMs), and methods that bring ENMs closer to SDMs, called here MSDM, are a topic lightly treated in species distribution modeling (Mendes et al., in prep; Table 10). MSDM procedures are grouped in two approaches, a priori and a posteriori methods. The first set of techniques creates geographic variables that are incorporated as predictors for ENM fitting (Allouche et al., 2008). The second set of methods constrains generated species suitability patterns using estimates of site accessibility, which are not included as predictors while fitting models (Mendes et al., in prep).

2.13. Parallel processing

Arguments involved: (cores)

The ENMTML package has the option to fit models using parallel processing, which accelerates the process. However, as this is computation-intensive, we chose to leave it open for users to decide the number of computer cores allocated for fitting ENMs. If the users do not


Table 7
Example of an evaluation table output for models created using the algorithm Maxent for two different species and two different thresholds evaluated by a random Bootstrap partition.

| Sp | Alg | Part | Thr | AUC | Kappa | TSS | Jaccard | Sorensen | Fpb | pROC | OR | %Area | Boyce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sp_18 | MXS | BOOT | MAX_TSS | 0.995 | 0.950 | 0.950 | 0.954 | 0.976 | 1.909 | 1.754 | 0.240 | 65.765% | 1.000 |
| Sp_18 | MXS | BOOT | LPT | 0.990 | 0.929 | 0.928 | 0.938 | 0.966 | 1.875 | 1.675 | 0.000 | 78.345% | 0.831 |
| Sp_34 | MXS | BOOT | MAX_TSS | 0.998 | 0.966 | 0.966 | 0.969 | 0.984 | 1.938 | 1.876 | 0.120 | 72.972% | 0.807 |
| Sp_34 | MXS | BOOT | LPT | 0.990 | 0.929 | 0.928 | 0.938 | 0.966 | 1.875 | 1.290 | 0.000 | 87.029% | 0.831 |

| Sp | AUC_SD | Thr | Kap_SD | TSS_SD | Jacc_SD | Sor_SD | Fpb_SD | pROC_SD | OR_SD | %Area_SD | Boyce_SD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sp_18 | 0.007 | MAX_TSS | 0.071 | 0.071 | 0.064 | 0.034 | 0.129 | 0.023 | 0.120 | 2.875% | 0.002 |
| Sp_18 | 0.014 | LPT | 0.101 | 0.101 | 0.088 | 0.047 | 0.177 | 0.042 | 0.014 | 5.897% | 0.015 |
| Sp_34 | 0.003 | MAX_TSS | 0.047 | 0.047 | 0.044 | 0.023 | 0.088 | 0.054 | 0.028 | 3.471% | 0.023 |
| Sp_34 | 0.003 | LPT | 0.047 | 0.047 | 0.044 | 0.023 | 0.088 | 0.076 | 0.270 | 9.743% | 0.042 |
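
The MAX_TSS and LPT entries in the Thr column of Table 7 correspond to two of the threshold rules described in Section 2.10. A minimal Python sketch of both rules is given below, purely as an illustration and not as the package's implementation. The function names are hypothetical, cells with suitability greater than or equal to the threshold are assumed to be predicted presences, and ties in TSS are assumed to resolve to the lowest candidate threshold.

```python
def lpt_threshold(presence_suit):
    """Lowest Presence Threshold: lowest suitability among occurrences."""
    return min(presence_suit)

def max_tss_threshold(presence_suit, absence_suit):
    """Suitability value that maximizes TSS = sensitivity + specificity - 1."""
    best_thr, best_tss = None, float("-inf")
    # Every observed suitability value is tried as a candidate threshold
    for thr in sorted(set(presence_suit) | set(absence_suit)):
        sens = sum(s >= thr for s in presence_suit) / len(presence_suit)
        spec = sum(s < thr for s in absence_suit) / len(absence_suit)
        tss = sens + spec - 1
        if tss > best_tss:
            best_thr, best_tss = thr, tss
    return best_thr, best_tss

pres = [0.9, 0.8, 0.7, 0.4]   # suitability at presence records
absn = [0.6, 0.3, 0.2, 0.1]   # suitability at pseudo-absence records
print(lpt_threshold(pres))            # 0.4
print(max_tss_threshold(pres, absn))  # (0.4, 0.75)
```

By construction LPT gives an omission rate of zero on the records used to define it, which is consistent with the OR values of 0.000 in the LPT rows of Table 7.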

Table 8
Threshold for binary maps available in the ENMTML package.

| Chosen metric for threshold definition | Acronym used in the thr argument | Method description | Additional arguments | References |
|---|---|---|---|---|
| Least Presence Threshold | LPT | Lowest suitability value among occurrence data | – | Pearson (2007) |
| True Skill Statistics | MAX_TSS | Suitability value that maximizes the TSS | – | Allouche et al. (2006) |
| Kappa | MAX_KAPPA | Suitability value that maximizes the Kappa | – | Allouche et al. (2006) |
| Sensitivity | SENSITIVITY | Suitability value that results in the specified sensitivity value | sens | – |
| Jaccard | JACCARD | Suitability value that maximizes the Jaccard Index | – | Leroy et al. (2018) |
| Sorensen | SORENSEN | Suitability value that maximizes the Sorensen Index | – | Leroy et al. (2018) |

Table 9
Ensemble methods available in the ENMTML package.

| Ensemble method | Acronym used in the ensemble argument | Method description | Reference |
|---|---|---|---|
| None | NULL | No ensemble is performed | – |
| Mean | MEAN | Simple average of suitability predicted by different algorithms | Thuiller et al. (2009) |
| Weighted mean | W_MEAN | Average of suitability values weighted by the performance of the algorithms (TSS) | Thuiller et al. (2009) |
| Mean of the best models | SUP | Average of the best algorithms, i.e., those with TSS over the average for a single species | Velazco et al. (2019) |
| Principal Component Analysis (PCA) | PCA | Performs a PCA with algorithms suitability and returns the eigenvalues of the first principal component | Thuiller (2004) |
| Principal Component Analysis with the best models | PCA_SUP | Performs a PCA with the suitability of the best algorithms, i.e., those with TSS over the average for a single species, and returns the eigenvalues of the first principal component | – |
| Principal Component Analysis with threshold | PCA_THR | Performs a PCA with suitability values above thresholds used to binarize each algorithm | – |

specify the number of cores, only a single core will be used.

2.14. Output & folders

Arguments involved: (save_part/save_final)


There are several possible outputs for a single run of the ENMTML package. All the outputs produced by the fitting process are within a Result folder, which is created at the same level as the Predictors folder (Fig. 2). Within the Result folder, there is a sub-folder named Algorithm that contains the suitability and binary maps produced for each algorithm for each species. If the user chose to create ensemble models, there is another sub-folder named Ensemble, with the combined maps created for each ensemble type chosen by the user. If the user chose to perform projections to different geographical regions or time periods, there will also be a sub-folder named Projection, within which are the sub-folders for each projection scenario, which contain suitability maps generated for all the algorithms and for the ensemble of those algorithms, if the user specified an ensemble method. Users can control whether partial and final models are saved by altering the arguments save_part and save_final (TRUE/FALSE).

Files generated at the pre-processing stage are also within the Results folder. Accessible area masks for each species are found within the Extent_Masks sub-folder. Masks used to constrain pseudo-absence allocation are also saved within Results, i.e., if the user chose to restrict pseudo-absence allocation using an environmental constraint, there will be a sub-folder named Env_Constrain which indicates valid areas for pseudo-absence allocation. Finally, if the user chose to perform a geographical partition of the occurrence dataset, there will be a corresponding sub-folder named BLOCK or BANDS, with the areas used to delimit each occurrence subset.

Other than the folders, there is also a series of TXT (tab-delimited) files within the Results folder. The main ones are Evaluation_Table, which contains the results for model evaluation; Thresholds, which contains the suitability values used to create the binary maps; and InfoModelling, which provides a summary of the arguments used to fit the model. Other useful files are Number_Unique_Occurrences, which specifies the number of unique occurrences for each species; Occurrences_Cleaned and Occurrences_Filtered, which return the datasets produced after occurrences went through the unique occurrences and thinning steps; Occurrences_Fitting and Occurrences_Evaluation, which return the datasets used for fitting and evaluating the models; and the Moran_and_Mess files, which have information about the Moran's I and the environmental similarity (MESS) calculated between subsets, available for both random and geographical partitions.

Table 10
Spatial restriction (MSDM) methods available in the ENMTML package.
Method type | Method names | Acronym used in the msdm argument | Characteristic | Reference
None | None | NULL | Does not constrain ENMs | –
A priori | Latlong | XY | Create two layers with latitude and longitude values | Allouche et al. (2008)
A priori | Minimum distance | MIN | Create a layer with the distance of each cell to the closest occurrence | Allouche et al. (2008)
A priori | Cumulative distance | CML | Create a layer with information of the summed distance from each cell to all occurrences | Allouche et al. (2008)
A priori | Kernel | KER | Create a layer with a Gaussian-Kernel on the occurrence data | Allouche et al. (2008)
A posteriori | Occurrences Based Restriction | OBR | Uses the distance between points to exclude far suitable patches | Mendes et al. (in prep)
A posteriori | Lower Quantile | LR | Select 25% of suitability patches without presences that are nearest suitability patches with presences | Mendes et al. (in prep)
A posteriori | Presence | PRES | Select only the patches with confirmed occurrence data | Mendes et al. (in prep)
A posteriori | Minimum Convex Polygon | MCP | Excludes suitable cells outside the minimum convex polygon of the occurrence data | Kremen et al. (2008)
A posteriori | Buffered Minimum Convex Polygon | MCP-B | Creates a buffer around the MCP | Kremen et al. (2008)

3. Comparison with other packages and innovations

There are several R packages to fit ENMs. We performed a literature
search and found seven alternatives: biomod (Thuiller et al., 2009), ModEco (Guo and Liu, 2010), sdm (Naimi and Araújo, 2016), Model-R (Sánchez-Tapia et al., 2018), Wallace (Kass et al., 2018), ZOON (Golding et al., 2018), and kuenm (Cobos et al., 2019). We summarize those packages in a table, highlighting each package's features and contrasting them with the features available in ENMTML (Table 11). Most packages focus on the development of a specific aspect of the modeling process, e.g., the package biomod was proposed as a platform for creating ensemble models, while the package kuenm is heavily focused on accurately developing Maxent models; therefore, a crucial aspect of software/package selection lies in the study objective.

We introduce the package ENMTML, which proposes to integrate complex methodological developments in the ENM field, published in several different sources, into a single package and make them visible to users who are not accustomed to the methodological details of ENMs. Our secondary objective was to make the package user-friendly, even for people not comfortable with the programming environment; therefore, we summarized the whole process into one single function with arguments that must be filled in by the user according to the study objectives. We covered the majority of the ENM process, from pre-processing occurrences and predictors to post-processing suitability models into ensembles or MSDM, and provided several methodological alternatives for the different modeling steps.

4. Example

We used the ENMTML package to fit current and future distributions for five virtual species (Fig. 3). We only present here the results produced for a single species (full models' outputs can be found in Appendix A). For this example, we used five bioclimatic variables (bio1, bio3, bio4, bio12, and bio15) from the WorldClim database v2.0 (https://www.worldclim.org). We projected the models to 2080 climatic conditions with a Representative Concentration Pathway (RCP) of 8.5. We used the MOHC HadGEM2-ES model and the same bioclimatic variables used in current conditions, sourced from the GCM Downscaled Data Portal (http://ccafs-climate.org). Current and future variables had a resolution of ten arc-minutes. We performed a Principal Component Analysis (PCA) on the environmental data in order to reduce predictor collinearity (see the details of this procedure in the Methods sub-section "Predictors input and collinearity reduction"). We employed Support Vector Machine (SVM), Random Forests (RDF), and Maximum Entropy with default tuning (MXD) as algorithms. We used an equal number of absences and presences (i.e., presences/absences ratio equal to 1), which were randomly allocated within a calibration area (i.e., species accessible area) delimited by a buffer of 500 km around the presences. Models were validated by spatial block cross-validation. For the current condition we constrained the models using the method MCP-B (see Methods sub-section "Methods to constrain ENMs") with a buffer of 200 km around the MCP. Final models were constructed by ensembling all the algorithms with a PCA (see details in Methods sub-section "Ensemble methods"). We calculated the models' extrapolation for current and future conditions based on the Mobility-Oriented Parity (MOP) metric. The total time used for fitting and processing the models of five species employing four cores was 2.545 min.

All these procedures are expressed in the R command below:


Table 11
Comparison of features included in seven R packages used for fitting ENMs and the ENMTML package.

Fig. 3. Some output layers generated by the ENMTML package. a) The calibration area used to construct the models based on a 200 km buffer around presences (white dots). The black and yellow checkerboard shows the best geographic block partition found for this species' occurrences. b) and c) depict continuous and binary suitability patterns without restriction, respectively. d) and g) represent the models' extrapolation for current and 2080 (RCP 8.5) environmental conditions, respectively. Extrapolation is based on the Mobility-Oriented Parity metric: the closer to zero, the higher the extrapolation. e) and f) depict continuous and binary suitability patterns constrained by a Minimum Convex Polygon plus a buffer of 200 km. h) and i) represent continuous and binary suitability patterns for 2080 environmental conditions (RCP 8.5). Current and 2080 suitability patterns are ensemble models performed by Principal Component Analysis. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
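The maps in Fig. 3 combine two ideas listed in Tables 8 and 9: a PCA ensemble of the algorithms' suitability values and a threshold that turns continuous suitability into a binary map. The base R sketch below shows one way these steps might be computed on synthetic data. It is an illustration under stated assumptions, not the package's internal code: the ensemble is taken here as the scores on the first principal component rescaled to 0-1, and all function names (pca_ensemble, max_tss_threshold, lpt_threshold) are hypothetical.

```r
# Synthetic suitability values for three algorithms over 100 cells.
set.seed(42)
signal <- runif(100)
clamp01 <- function(v) pmin(pmax(v, 0), 1)
suit <- cbind(
  SVM = clamp01(signal + rnorm(100, sd = 0.10)),
  RDF = clamp01(signal + rnorm(100, sd = 0.10)),
  MXD = clamp01(signal + rnorm(100, sd = 0.10))
)

# PCA ensemble (assumed form): first principal component scores, 0-1 scaled.
pca_ensemble <- function(suit) {
  pc1 <- stats::prcomp(suit, center = TRUE, scale. = TRUE)$x[, 1]
  # The sign of a principal component is arbitrary; orient it so that
  # higher values correspond to higher average suitability.
  if (stats::cor(pc1, rowMeans(suit)) < 0) pc1 <- -pc1
  (pc1 - min(pc1)) / (max(pc1) - min(pc1))
}

# MAX_TSS: suitability value that maximizes sensitivity + specificity - 1.
max_tss_threshold <- function(pres, absn) {
  cand <- sort(unique(c(pres, absn)))
  tss <- vapply(cand, function(t) {
    sens <- mean(pres >= t)  # presences predicted as present
    spec <- mean(absn < t)   # absences predicted as absent
    sens + spec - 1
  }, numeric(1))
  cand[which.max(tss)]
}

# LPT: lowest suitability value among the occurrence records.
lpt_threshold <- function(pres) min(pres)

ens <- pca_ensemble(suit)

# Toy suitability values at presence and (pseudo-)absence points.
pres <- c(0.9, 0.8, 0.7, 0.2)
absn <- c(0.1, 0.15, 0.3, 0.5)
thr_tss <- max_tss_threshold(pres, absn)
thr_lpt <- lpt_threshold(pres)

# Binary map: cells at or above the chosen threshold are predicted present.
binary <- as.integer(ens >= thr_tss)
```

With the toy values above, the LPT threshold is much lower than the MAX_TSS threshold because a single low-suitability presence drags it down, which is consistent with the larger predicted areas reported for LPT in Table 7.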


5. Future prospects

We present the release of the ENMTML package, but we already have in mind ideas for future implementations. As the main objective of the package is to bring complex methodological developments closer to people who rely on ENMs but do not focus on the development of new methods and are not comfortable using R, in the next update we expect to launch a web platform using Shiny. On the other hand, we also believe that the ENMTML package might be of great use for the whole ENM community, as it brings together, in one single location, methodological developments that are scattered around the literature and not always implemented in R. With that in mind, we also look forward to providing further options for people who are interested in the fine-tuning of models. One of the first additions already planned is the possibility for users to change algorithm parameters. In addition, we also plan to explore the ensemble field in depth and include more ensemble alternatives and uncertainty maps. Finally, we believe an important aspect of ENMs is to be clear about model uncertainty; therefore, in the upcoming update, we will implement metrics to calculate sources of uncertainty for each species in a way similar to Watling et al. (2015). Other than the already planned improvements, users can expect novel methodological approaches published in the literature to be implemented in future versions of the package, and they are welcome to contribute to the development of the package and suggest new features.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We would like to take this opportunity to thank the many people who helped us by testing the package throughout its development. There is not enough space to name them all, but we are greatly indebted to a large number of people. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brazil (CAPES) - Finance Code 001. A.F.A.A thanks CAPES for his doctoral research grant. S.J.E.V. thanks the Consejo Nacional de Investigaciones Científicas y Técnicas - Argentina (CONICET) for the postdoctoral fellowship. P.D.M is continuously supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brazil (CNPq) productivity grants (308694/2015-5). A.F.A.A and S.J.E.V were responsible for the code; A.F.A.A, S.J.E.V, and P.D.M discussed the methodological steps and the respective implementation; S.J.E.V created the package structure and the help file; A.F.A.A led the writing of the manuscript. All authors contributed to the draft and the improvement of the ideas, and gave final approval for publication.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.envsoft.2019.104615.

References

Ahmed, S.E., Mcinerny, G., O'Hara, K., Harper, R., Salido, L., Emmott, S., Joppa, L.N., 2015. Scientists and software - surveying the species distribution modelling community. Divers. Distrib. 21, 258–267.
Aiello-Lammens, M.E., Boria, R.A., Radosavljevic, A., Vilela, B., Anderson, R.P., 2015. spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography 38, 541–545.
Allouche, O., Steinitz, O., Rotem, D., Rosenfeld, A., Kadmon, R., 2008. Incorporating distance constraints into species distribution models. J. Appl. Ecol. 45, 599–609.
Allouche, O., Tsoar, A., Kadmon, R., 2006. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 43, 1223–1232.
Anderson, R.P., Gonzalez, I., 2011. Species-specific tuning increases robustness to sampling bias in models of species distributions: an implementation with Maxent. Ecol. Model. 222, 2796–2811.
Araújo, M.B., New, M., 2007. Ensemble forecasting of species distributions. Trends Ecol. Evol. 22, 42–47.
Bahn, V., McGill, B.J., 2013. Testing the predictive performance of distribution models. Oikos 122, 321–331.
Barbet-Massin, M., Jiguet, F., Albert, C.H., Thuiller, W., 2012. Selecting pseudo-absences for species distribution models: how, where and how many? Methods Ecol. Evol. 3, 327–338.
Barve, N., Barve, V., Jiménez-Valverde, A., Lira-Noriega, A., Maher, S.P., Peterson, A.T., Soberón, J., Villalobos, F., 2011. The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecol. Model. 222, 1810–1819.
Boyce, M.S., Vernier, P.R., Nielsen, S.E., Schmiegelow, F.K., 2002. Evaluating resource selection functions. Ecol. Model. 157, 281–300.
Calenge, C., 2006. The package adehabitat for the R software: a tool for the analysis of space and habitat use by animals. Ecol. Model. 197, 516–519.
Campos, M., de Andrade, A.F.A., Kunzmann, B., 2014. Modelling of the potential distribution of Limnoperna fortunei (Dunker, 1857) on a global scale. Aquat. Invasions 9, 253–265.
Carpenter, G., Gillison, A.N., Winter, J., 1993. DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals. Biodivers. Conserv. 2, 667–680.
Carstens, B.C., Richards, C.L., 2007. Integrating coalescent and ecological niche modeling in comparative phylogeography. Evolution 61, 1439–1454.
Chifflet, L., Rodriguero, M.S., Calcaterra, L.A., Rey, O., Dinghi, P.A., Baccaro, F.B., Souza, J.L.P., Follett, P., Confalonieri, V.A., 2016. Evolutionary history of the little fire ant Wasmannia auropunctata before global invasion: inferring dispersal patterns, niche requirements and past and present distribution within its native range. J. Evol. Biol. 29, 790–809.
Cobos, M.E., Peterson, A.T., Barve, N., Osorio-Olvera, L., 2019. kuenm: an R package for detailed development of ecological niche models using Maxent. PeerJ 7, e6281.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. XX, 37–46.
Cooper, J.C., Soberón, J., 2018. Creating individual accessible area hypotheses improves stacked species distribution model performance. Glob. Ecol. Biogeogr. 27, 156–165.
Diniz-Filho, J.A.F., Mauricio Bini, L., Fernando Rangel, T., Loyola, R.D., Hof, C., Nogués-Bravo, D., Araújo, M.B., 2009. Partitioning and mapping uncertainties in ensembles of forecasts of species turnover under climate change. Ecography 32, 897–906.
Dormann, C.F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J.R.G., Gruber, B., Lafourcade, B., Leitão, P.J., Münkemüller, T., McClean, C., Osborne, P.E., Reineking, B., Schröder, B., Skidmore, A.K., Zurell, D., Lautenbach, S., 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46.
Elith, J., Kearney, M., Phillips, S., 2010. The art of modelling range-shifting species. Methods Ecol. Evol. 1, 330–342.
Engler, R., Guisan, A., Rechsteiner, L., 2004. An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data. J. Appl. Ecol. 41, 263–274.
Farber, O., Kadmon, R., 2003. Assessment of alternative approaches for bioclimatic modeling with special emphasis on the Mahalanobis distance. Ecol. Model. 160, 115–130.
Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. 24, 38–49.
Fitzpatrick, M.C., Hargrove, W.W., 2009. The projection of species distribution models and the problem of non-analog climate. Biodivers. Conserv. 18, 2255–2261.
Friedman, J., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232.
Golding, N., 2014. GRaF: Species Distribution Modelling Using Latent Gaussian Random Fields.
Golding, N., August, T.A., Lucas, T.C.D., Gavaghan, D.J., van Loon, E.E., McInerny, G., 2018. The ZOON R package for reproducible and shareable species distribution modelling. Methods Ecol. Evol. 9, 260–268.
Golding, N., Purse, B.V., 2016. Fast and flexible Bayesian species distribution modelling using Gaussian processes. Methods Ecol. Evol. 7, 598–608.
Guillera-Arroita, G., Lahoz-Monfort, J.J., Elith, J., Gordon, A., Kujala, H., Lentini, P.E., Mccarthy, M.A., Tingley, R., Wintle, B.A., 2015. Is my species distribution model fit for purpose? Matching data and models to applications. Glob. Ecol. Biogeogr. 24, 276–292.
Guo, Q., Kelly, M., Graham, C.H., 2005. Support vector machines for predicting distribution of Sudden Oak Death in California. Ecol. Model. 182, 75–90.
Guo, Q., Liu, Y., 2010. ModEco: an integrated software package for ecological niche modeling. Ecography 33, 637–642.
Hao, T., Elith, J., Guillera-Arroita, G., Lahoz-Monfort, J.J., 2019. A review of evidence about use and performance of species distribution modelling ensembles like BIOMOD. Divers. Distrib. 1–14.
Hastie, T., 2018. Gam: Generalized Additive Models.
Hastie, T., Tibshirani, R., 1990. Generalised Additive Models. Chapman and Hall.
Heikkinen, R.K., Luoto, M., Araújo, M.B., Virkkala, R., Thuiller, W., Sykes, M.T., 2006. Methods and uncertainties in bioclimatic envelope modelling under climate change. Prog. Phys. Geogr. 30, 751–777.
Hijmans, R.J., Phillips, S., Leathwick, J., Elith, J., 2017. Dismo: species distribution modeling. http://Cran.R-Project.Org/Web/Packages/Dismo.
Hirzel, A.H., Hausser, J., Chessel, D., Perrin, N., 2002. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology 83, 2027–2036.


Karatzoglou, A., Smola, A., Hornik, K., Zeileis, A., 2004. kernlab-an S4 package for kernel methods in R. J. Stat. Softw. 11, 1–20.
Kass, J.M., Vilela, B., Aiello-Lammens, M.E., Muscarella, R., Merow, C., Anderson, R.P., 2018. Wallace: a flexible platform for reproducible modeling of species niches and distributions built for community expansion. Methods Ecol. Evol. 9, 1151–1156.
Keppel, G., Van Niel, K.P., Wardell-Johnson, G.W., Yates, C.J., Byrne, M., Mucina, L., Schut, A.G.T., Hopper, S.D., Franklin, S.E., 2012. Refugia: identifying and understanding safe havens for biodiversity under climate change. Glob. Ecol. Biogeogr. 21, 393–404.
Kremen, C., Cameron, A., Moilanen, A., Phillips, S.J., Thomas, C.D., Beentje, H., Dransfield, J., Fisher, B.L., Glaw, F., Good, T.C., Harper, G.J., Hijmans, R.J., Lees, D.C., Louis, E., Nussbaum, R.A., Raxworthy, C.J., Razafimpahanana, A., Schatz, G.E., Vences, M., Vieites, D.R., Wright, P.C., Zjhra, M.L., 2008. Aligning conservation priorities across Taxa in Madagascar with high-resolution planning tools. Science 320, 222–226.
Leroy, B., Delsol, R., Hugueny, B., Meynard, C.N., Barhoumi, C., Barbet-Massin, M., Bellard, C., 2018. Without quality presence-absence data, discrimination metrics such as TSS can be misleading measures of model performance. J. Biogeogr. 45, 1994–2002.
Li, W., Guo, Q., 2013. How to assess the prediction accuracy of species presence-absence models without absence data? Ecography 36, 788–799.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R. News 2, 18–22.
Lins, D.M., de Marco, P., Andrade, A.F.A., Rocha, R.M., 2018. Predicting global ascidian invasions. Divers. Distrib. 24, 692–704.
Lobo, J.M., Jiménez-Valverde, A., Hortal, J., 2010. The uncertain nature of absences and their importance in species distribution modelling. Ecography 33, 103–114.
Lobo, J.M., Jiménez-Valverde, A., Real, R., 2008. AUC: a misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. 17, 145–151.
De Marco, P., Nóbrega, C.C., 2018. Evaluating collinearity effects on species distribution models: an approach based on virtual species simulation. PLoS One 13, e0202403.
Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K., Thuiller, W., 2009. Evaluation of consensus methods in predictive species distribution modelling. Divers. Distrib. 15, 59–69.
Marquardt, D.W., 1970. Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation. Technometrics 12, 591–612.
Mateo, R.G., Felicísimo, Á.M., Muñoz, J., 2010. Effects of the number of presences on reliability and stability of MARS species distribution models: the importance of regional niche variation and ecological heterogeneity. J. Veg. Sci. 21, 908–922.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models. Chapman and Hall.
Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J.M., Uriarte, M., Anderson, R.P., 2014. ENMeval: an R package for conducting spatially independent evaluations and estimating optimal model complexity for MAXENT ecological niche models. Methods Ecol. Evol. 5, 1198–1205.
Naimi, B., Araújo, M.B., 2016. sdm: a reproducible and extensible R platform for species distribution modelling. Ecography 39, 368–375.
Nix, H.A., 1986. A biogeographic analysis of Australian elapid snakes. Aust. Flora Fauna Ser. 7, 4–15 (Australian Government Publishing Service).
Owens, H.L., Campbell, L.P., Dornak, L.L., Saupe, E.E., Barve, N., Soberón, J., Ingenloff, K., Lira-Noriega, A., Hensz, C.M., Myers, C.E., Peterson, A.T., 2013. Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecol. Model. 263, 10–18.
Pearson, R.G., 2007. Species' distribution modeling for conservation educators and practitioners. Lessons Conserv. 3, 54–89.
Peterson, A.T., 2003. Predicting the geography of species' invasions via ecological niche modeling. Q. Rev. Biol. 78, 419–433.
Peterson, A.T., Holt, R.D., 2003. Niche differentiation in Mexican birds: using point occurrences to detect ecological innovation. Ecol. Lett. 6, 774–782.
Peterson, A.T., Papeş, M., Soberón, J., 2008. Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecol. Model. 213, 63–72.
Peterson, A.T., Sánchez-Cordero, V., Soberón, J., Bartley, J., Buddemeier, R.W., Navarro-Sigüenza, A.G., 2001. Effects of global climate change on geographic distributions of Mexican Cracidae. Ecol. Model. 144, 21–30.
Peterson, A.T., Shaw, J., 2003. Lutzomyia vectors for cutaneous leishmaniasis in Southern Brazil: ecological niche models, predicted geographic distributions, and climate change effects. Int. J. Parasitol. 33, 919–931.
Peterson, A.T., Soberón, J., 2012. Species distribution modeling and ecological niche modeling: getting the Concepts Right. Nat. Conserv. 10, 102–107.
Peterson, A.T., Soberón, J., Pearson, R.G., Anderson, R.P., Martínez-Meyer, E., Nakamura, M., Bastos Araujo, M., 2011. Ecological Niches and Geographic Distributions. Princeton University Press.
Peterson, T.A., 2001. Predicting species' geographic distributions based on ecological niche modeling. Condor 103, 599–605.
Phillips, S., 2017. Maxnet: fitting "maxent" species distribution models with "glmnet".
Phillips, S., Anderson, R., Schapire, R., 2006. Maximum entropy modeling of species geographic distributions. Ecol. Model. 190, 231–259.
Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199.
van Proosdij, A.S.J., Sosef, M.S.M., Wieringa, J.J., Raes, N., 2015. Minimum required number of specimen records to develop accurate species distribution models. Ecography 39 (6), 542–552. https://doi.org/10.1111/ecog.01509.
Qiao, H., Soberón, J., Peterson, A.T., 2015. No silver bullets in correlative ecological niche modelling: insights from testing among many potential algorithms for niche estimation. Methods Ecol. Evol. 6, 1126–1136.
R Core Team, 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
Razgour, O., Taggart, J.B., Manel, S., Juste, J., Ibáñez, C., Rebelo, H., Alberdi, A., Jones, G., Park, K., 2018. An integrated framework to identify wildlife populations under threat from climate change. Mol. Ecol. Resour. 18, 18–31.
Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J.J., Schröder, B., Thuiller, W., Warton, D.I., Wintle, B.A., Hartig, F., Dormann, C.F., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 1–17.
Royle, J.A., Chandler, R.B., Yackulic, C., Nichols, J.D., 2012. Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Methods Ecol. Evol. 3, 545–554.
Sánchez-Tapia, A., de Siqueira, M.F., Lima, R.O., Barros, F.S.M., Gall, G.M., Gadelha, L.M.R., da Silva, L.A.E., Osthoff, C., 2018. Model-R: a framework for scalable and reproducible ecological niche modeling. Commun. Comput. Inf. Sci. 796, 218–232.
Senay, S.D., Worner, S.P., Ikeda, T., 2013. Novel three-step pseudo-absence selection technique for improved species distribution modelling. PLoS One 8, e71218.
Soberon, J., Peterson, A.T., 2005. Interpretation of models of fundamental ecological niches and species' distributional areas. Biodivers. Inf. 2, 1–10.
Soberón, J.M., 2010. Niche and area of distribution modeling: a population ecology perspective. Ecography 33, 159–167.
Thuiller, W., 2004. Patterns and uncertainties of species' range shifts under climate change. Glob. Chang. Biol. 10, 2020–2027.
Thuiller, W., Guéguen, M., Renaud, J., Karger, D.N., Zimmermann, N.E., 2019. Uncertainty in ensembles of global biodiversity scenarios. Nat. Commun. 10, 1446.
Thuiller, W., Lafourcade, B., Engler, R., Araújo, M.B., 2009. BIOMOD - a platform for ensemble forecasting of species distributions. Ecography 32, 369–373.
Velazco, S.J.E., Villalobos, F., Galvão, F., De Marco Júnior, P., 2019. A dark scenario for Cerrado plant species: effects of future climate, land use and protected areas ineffectiveness. Divers. Distrib. 25, 660–673.
Veloz, S.D., 2009. Spatially autocorrelated sampling falsely inflates measures of accuracy for presence-only niche models. J. Biogeogr. 36, 2290–2299.
Watling, J.I., Brandt, L.A., Bucklin, D.N., Fujisaki, I., Mazzotti, F.J., Romañach, S.S., Speroterra, C., 2015. Performance metrics and variance partitioning reveal sources of uncertainty in species distribution models. Ecol. Model. 309–310, 48–59.
Wisz, M.S., Guisan, A., 2009. Do pseudo-absence selection strategies influence species distribution models and their predictions? An information-theoretic approach based on simulated data. BMC Ecol. 9, 8.
Wisz, M.S., Hijmans, R.J., Li, J., Peterson, A.T., Graham, C.H., Guisan, A., Elith, J., Dudík, M., Ferrier, S., Huettmann, F., Leathwick, J.R., Lehmann, A., Lohmann, L., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M.C., Phillips, S.J., Richardson, K.S., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S.E., Zimmermann, N.E., 2008. Effects of sample size on the performance of species distribution models. Divers. Distrib. 14, 763–773.
Zaniewski, A.E., Lehmann, A., Overton, J.M., 2002. Predicting species spatial distributions using presence-only data: a case study of native New Zealand ferns. Ecol. Model. 157, 261–280.
