
A Brief Tutorial on Maxent

By Steven Phillips, AT&T Research



This tutorial gives a basic introduction to use of the MaxEnt program for maximum entropy modelling of
species' geographic distributions, written by Steven Phillips, Miro Dudik and Rob Schapire, with support
from AT&T Labs-Research, Princeton University, and the Center for Biodiversity and Conservation,
American Museum of Natural History. For more details on the theory behind maximum entropy modeling, as well
as a description of the data used and the main types of statistical analysis used here, see:
Steven J. Phillips, Robert P. Anderson and Robert E. Schapire, Maximum entropy modeling of species
geographic distributions. Ecological Modelling, Vol 190/3-4 pp 231-259, 2006.
A second paper describing more recently-added features of the Maxent software is:
Steven J. Phillips and Miroslav Dudik, Modeling of species distributions with Maxent: new
extensions and a comprehensive evaluation. Ecography, Vol 31, pp 161-175, 2008.
The environmental data we will use consist of climatic and elevational data for South America, together
with a potential vegetation layer. Our sample species will be Bradypus variegatus, the brown-throated
three-toed sloth. This tutorial will assume that all the data files are located in the same directory as the
maxent program files; otherwise you will need to use the path (e.g., c:\data\maxent\tutorial) in front of the
file names used here.
Getting started
Downloading
The software consists of a jar file, maxent.jar, which can be used on any computer running Java version
1.4 or later. Maxent can be downloaded, along with associated literature, from
www.cs.princeton.edu/~schapire/maxent; the Java runtime environment can be obtained from
java.sun.com/javase/downloads. If you are using Microsoft Windows (as we assume here), you should
also download the file maxent.bat, and save it in the same directory as maxent.jar. The website has a file
called "readme.txt", which contains instructions for installing the program on your computer.
Firing up
If you are using Microsoft Windows, simply click on the file maxent.bat. Otherwise, enter "java -mx512m
-jar maxent.jar" in a command shell (where "512" can be replaced by the megabytes of memory you want
made available to the program). The following screen will appear:
To perform a run, you need to supply a file containing presence localities ("samples"), a directory
containing environmental variables, and an output directory. In our case, the presence localities are in the
file "samples\bradypus.csv", the environmental layers are in the directory "layers", and the outputs are
going to go in the directory "outputs". You can enter these locations by hand, or browse for them. While
browsing for the environmental variables, remember that you are looking for the directory that contains
them; you don't need to browse down to the files in the directory. After entering or browsing for the files
for Bradypus, the program looks like this:
The file "samples\bradypus.csv" contains the presence localities in .csv format. The first few lines are as
follows:
species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
bradypus_variegatus,-63.85,-17.4
There can be multiple species in the same samples file, in which case more species would appear in the
panel, along with Bradypus. Coordinate systems other than latitude and longitude can be used provided
that the samples file and environmental layers use the same coordinate system. The "x" coordinate
(longitude, in our case) should come before the "y" coordinate (latitude) in the samples file. If the
presence data has duplicate records (multiple records for the same species in the same grid cell), the
duplicates are removed by default; this can be changed by clicking on the "Settings" button and
deselecting "Remove duplicate presence records".
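The samples file layout and the idea of removing duplicates per grid cell can be sketched in Python. This is only an illustration under stated assumptions: the grid origin (xll, yll) and the cell size used below are hypothetical values chosen for the example, and the functions are not Maxent's own implementation.

```python
import csv
import io
import math

# First lines of samples\bradypus.csv, as listed in this tutorial.
SAMPLE_TEXT = """species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
bradypus_variegatus,-63.85,-17.4
"""

def read_samples(text):
    """Return (species, x, y) tuples; x is the second column, y the third."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    return [(row[0], float(row[1]), float(row[2])) for row in reader if row]

def remove_duplicates(samples, xll, yll, cellsize):
    """Keep the first record per (species, grid cell).

    xll, yll and cellsize describe a hypothetical grid; in a real run they
    would come from the headers of the environmental layers.
    """
    seen = set()
    kept = []
    for species, x, y in samples:
        cell = (math.floor((x - xll) / cellsize), math.floor((y - yll) / cellsize))
        if (species, cell) not in seen:
            seen.add((species, cell))
            kept.append((species, x, y))
    return kept

samples = read_samples(SAMPLE_TEXT)
kept = remove_duplicates(samples, xll=-125.0, yll=-56.0, cellsize=0.5)
print(len(samples), "records,", len(kept), "after duplicate removal")
```

With the coarse hypothetical 0.5-degree cells used here, some of the five records share a cell; with a finer grid they would all be kept.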
The directory "layers" contains a number of ascii raster grids (in ESRI's .asc format), each of which
describes an environmental variable. The grids must all have the same geographic bounds and cell size
(i.e. all the ascii file headings must match each other perfectly). One of our variables, "ecoreg", is a
categorical variable describing potential vegetation classes. The categories must be indicated by numbers,
rather than letters or words. You must tell the program which variables are categorical, as has been done
in the picture above.
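One way to check in advance that the grid headings match is to compare the header lines of each .asc file. The sketch below assumes the usual ESRI ASCII grid header keywords (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value); the grid contents and the second file name are made up for illustration.

```python
def parse_asc_header(text):
    """Read the leading 'keyword value' lines of an ESRI ASCII grid."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        # Header lines look like "ncols 3"; data rows start with a number.
        if len(parts) != 2 or not parts[0][0].isalpha():
            break
        header[parts[0].lower()] = float(parts[1])
    return header

def mismatched_layers(headers):
    """Return the names of layers whose header differs from the first one."""
    names = list(headers)
    reference = headers[names[0]]
    return [name for name in names[1:] if headers[name] != reference]

# Hypothetical grid contents, for illustration only.
GRID_A = """ncols 3
nrows 2
xllcorner -125.0
yllcorner -56.0
cellsize 0.5
NODATA_value -9999
1 2 3
4 5 -9999
"""
GRID_B = GRID_A.replace("cellsize 0.5", "cellsize 0.25")

headers = {
    "ecoreg": parse_asc_header(GRID_A),
    "example_layer": parse_asc_header(GRID_B),  # hypothetical layer name
}
print(mismatched_layers(headers))
```

In practice the text would be read from each file in the "layers" directory rather than from string constants.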
Doing a run
Simply press the "Run" button. A progress monitor describes the steps being taken. After the
environmental layers are loaded and some initialization is done, progress towards training of the maxent
model is shown like this:
The gain is closely related to deviance, a measure of goodness of fit used in generalized additive and
generalized linear models. It starts at 0 and increases towards an asymptote during the run. During this
process, Maxent is generating a probability distribution over pixels in the grid, starting from the uniform
distribution and repeatedly improving the fit to the data. The gain is defined as the average log probability
of the presence samples, minus a constant that makes the uniform distribution have zero gain. At the end
of the run, the gain indicates how closely the model is concentrated around the presence samples; for
example, if the gain is 2, it means that the average likelihood of the presence samples is exp(2) ≈ 7.4 times
higher than that of a random background pixel. Note that Maxent isn't directly calculating "probability of
occurrence". The probability it assigns to each pixel is typically very small, as the values must sum to 1
over all the pixels in the grid (though we return to this point when we compare output formats).
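The definition of gain given above can be written out as a short calculation. The following is a minimal sketch on a made-up four-pixel grid, taking the constant to be the log probability of a pixel under the uniform distribution; it is not the program's training code.

```python
import math

def gain(probabilities, presence_indices):
    """Average log probability of the presence pixels, minus the constant
    log(1/N) that gives the uniform distribution a gain of zero."""
    n_pixels = len(probabilities)
    average_log = sum(math.log(probabilities[i]) for i in presence_indices)
    average_log /= len(presence_indices)
    return average_log - math.log(1.0 / n_pixels)

uniform = [0.25, 0.25, 0.25, 0.25]
fitted = [0.7, 0.1, 0.1, 0.1]   # toy distribution, sums to 1
presence = [0]                  # the species was recorded in pixel 0

print(gain(uniform, presence))            # zero gain for the uniform start
print(math.exp(gain(fitted, presence)))   # likelihood ratio vs. a random pixel
print(round(math.exp(2), 1))              # the exp(2) figure quoted in the text
```

In the toy example the presence pixel has probability 0.7 against 0.25 under the uniform distribution, so exp(gain) is their ratio.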
The run produces multiple output files, of which the most important for analyzing your model is an html
file called "bradypus.html". The end of this file gives pointers to the other outputs, like this:
Looking at a prediction
By default, the html output file contains a picture of the model applied to the given environmental data:
The image uses colors to indicate predicted probability that conditions are suitable, with red indicating
high probability of suitable conditions for the species, green indicating conditions typical of those where
the species is found, and lighter shades of blue indicating low predicted probability of suitable conditions.
For Bradypus, we see that suitable conditions are predicted to be highly probable through most of lowland
Central America, wet lowland areas of northwestern South America, the Amazon basin, Caribbean islands,
and much of the Atlantic forests in south-eastern Brazil. The file pointed to is an image file (.png) that you
can just click on (in Windows) or open in most image processing software. If you want to copy these
images, or want to open them with other software, you will find the .png files in the directory called
"plots" that has been created as an output during the run.
The test points are a random sample taken from the species presence localities. The same random sample
is used each time you run Maxent on the same data set, unless you select the "random seed" option on the
settings panel. Alternatively, test data for one or more species can be provided in a separate file, by giving
the name of a "Test sample file" in the Settings panel.
Output formats
Maxent supports three output formats for model values: raw, cumulative and logistic. First, the raw output
is just the Maxent exponential model itself. Second, the cumulative value corresponding to a raw value of
r is the percentage of the Maxent distribution with raw value at most r. Cumulative output is best
interpreted in terms of predicted omission rate: if we set a cumulative threshold of c, the resulting binary
prediction would have omission rate c% on samples drawn from the Maxent distribution itself, and we can
predict a similar omission rate for samples drawn from the species distribution. Third, if c is the
exponential of the entropy of the maxent distribution, then the logistic value corresponding to a raw value
of r is c·r/(1+c·r). This is a logistic function, because the raw value is an exponential function of the
environmental variables. The three output formats are all monotonically related, but they are scaled
differently, and have different interpretations. The default output is logistic, which is the easiest to
conceptualize: it gives an estimate between 0 and 1 of probability of presence. Note that probability of
presence depends on details of the sampling design, such as the plot size and (for vagile organisms)
observation time; logistic output estimates probability of presence assuming that the sampling design is
such that typical presence localities have probability of presence of about 0.5. This value of 0.5 is fairly
arbitrary, and can be adjusted (using the "default prevalence" parameter) if information is available on the
probability of presence at typical presence localities. The picture of the Bradypus model above uses the
logistic format. In comparison, using the raw format gives the following picture:
Note that we have used a logarithmic scale for the colors. A linear scale would be mostly blue, with a few
red pixels (you can verify this by deselecting "Logscale pictures" on the Settings panel) since the raw
format typically gives a small number of sites relatively large values; this can be thought of as an artifact
of the raw output being given by an exponential distribution.
Using the cumulative output format gives the following picture:
As with the raw output, we have used a logarithmic scale for coloring the picture in order to emphasize
differences between smaller values. Cumulative output can be interpreted as predicting suitable conditions
for the species above a threshold in the approximate range of 1-20 (or yellow through orange, in this
picture), depending on the level of predicted omission that is acceptable for the application.
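The relationship between the raw, cumulative and logistic formats described in this section can be sketched as follows. The four raw values are a toy distribution, and the sketch ignores the "default prevalence" adjustment; it only follows the definitions given in the text.

```python
import math

def cumulative(raw):
    """Percentage of the distribution with raw value at most r, per pixel."""
    return [100.0 * sum(v for v in raw if v <= r) for r in raw]

def logistic(raw):
    """c*r / (1 + c*r), where c is the exponential of the entropy of raw."""
    entropy = -sum(r * math.log(r) for r in raw if r > 0)
    c = math.exp(entropy)
    return [c * r / (1.0 + c * r) for r in raw]

raw = [0.5, 0.3, 0.15, 0.05]      # toy raw output, sums to 1
print(cumulative(raw))
print(logistic(raw))

# For a uniform distribution c equals the number of pixels, so every
# pixel gets a logistic value of 0.5.
print(logistic([0.25, 0.25, 0.25, 0.25]))
```

All three lists rank the pixels in the same order, which is what "monotonically related" means here; only the scaling differs.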
Statistical analysis
The "25" we entered for "random test percentage" told the program to randomly set aside 25% of the
sample records for testing. This allows the program to do some simple statistical analysis. Much of the
analysis relies on the use of a threshold to make a binary prediction, with suitable conditions predicted above
the threshold and unsuitable below. The first plot shows how testing and training omission and predicted
area vary with the choice of cumulative threshold, as in the following graph:
Here we see that the omission on test samples is a very good match to the predicted omission rate, the
omission rate for test data drawn from the Maxent distribution itself. The predicted omission rate is a
straight line, by definition of the cumulative output format. In some situations, the test omission line lies
well below the predicted omission line: a common reason is that the test and training data are not
independent, for example if they derive from the same spatially autocorrelated presence data.
The next plot gives the receiver operating characteristic (ROC) curve for both training and test data, shown
below. The area under the ROC curve (AUC) is also given here; if test data are available, the standard error
of the AUC on the test data is given later on in the web page.
If you use the same data for training and for testing then the red and blue lines will be identical. If you
split your data into two partitions, one for training and one for testing, it is normal for the red (training) line
to show a higher AUC than the blue (testing) line. The red (training) line shows the "fit" of the model to
the training data. The blue (testing) line indicates the fit of the model to the testing data, and is the real test
of the model's predictive power. The turquoise line shows the line that you would expect if your model was
no better than random. If the blue line (the test line) falls below the turquoise line then this indicates that
your model performs worse than a random model would. The further towards the top left of the graph that
the blue line is, the better the model is at predicting the presences contained in the test sample of the data.
For more detailed information on the AUC statistic a good starting reference is: Fielding, A.H. & Bell, J.F.
(1997) A review of methods for the assessment of prediction errors in conservation presence/absence
models. Environmental Conservation 24(1): 38-49. Because we have only occurrence data and no
absence data, "fractional predicted area" (the fraction of the total study area predicted present) is used
instead of the more standard commission rate (fraction of absences predicted present). For more
discussion of this choice, see the paper in Ecological Modelling mentioned at the start of this tutorial. It is
important to note that AUC values tend to be higher for species with narrow ranges, relative to the study
area described by the environmental data. This does not necessarily mean that the models are better;
instead this behavior is an artifact of the AUC statistic.
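The quantities discussed in this section (omission rate, fractional predicted area and AUC) can be illustrated with a small calculation. This is a sketch on invented scores, using background pixels in place of absences as the text describes; the function names are chosen for the example and are not part of Maxent.

```python
def omission_rate(presence_scores, threshold):
    """Fraction of presence records predicted unsuitable (below threshold)."""
    missed = sum(1 for s in presence_scores if s < threshold)
    return missed / len(presence_scores)

def fractional_predicted_area(pixel_scores, threshold):
    """Fraction of all pixels in the study area predicted suitable."""
    predicted = sum(1 for s in pixel_scores if s >= threshold)
    return predicted / len(pixel_scores)

def auc(presence_scores, background_scores):
    """Probability that a random presence record scores higher than a
    random background pixel, counting ties as one half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

test_presences = [0.9, 0.2, 0.6, 0.4]     # invented model outputs
background = [0.1, 0.2, 0.6, 0.9, 0.3]    # invented model outputs

print(omission_rate(test_presences, 0.5))
print(fractional_predicted_area(background, 0.5))
print(auc(test_presences, background))
```

An AUC of 0.5 corresponds to the "no better than random" line, and values below 0.5 to a model that performs worse than random.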
If test data are available, the program automatically calculates the statistical significance of the prediction,
using a binomial test of omission. For Bradypus, this gives:
For more detailed information on the binomial statistic, see the Ecological Modelling paper mentioned
above.
Which variables matter most?
A natural application of species distribution modeling is to answer the question, which variables matter
most for the species being modeled? There is more than one way to answer this question; here we outline
the possible ways in which Maxent can be used to address it.
While the Maxent model is being trained, it keeps track of which environmental variables are contributing
to fitting the model. Each step of the Maxent algorithm increases the gain of the model by modifying the
coefficient for a single feature; the program assigns the increase in the gain to the environmental
variable(s) that the feature depends on. Converting to percentages at the end of the training process, we
get the middle column in the following table:
These percent contribution values are only heuristically defined: they depend on the particular path that
the Maxent code uses to get to the optimal solution, and a different algorithm could get to the same
solution via a different path, resulting in different percent contribution values. In addition, when there are
highly correlated environmental variables, the percent contributions should be interpreted with caution. In
our Bradypus example, annual precipitation is highly correlated with October and July precipitation.
Although the above table shows that Maxent used the October precipitation variable more than any other,
and hardly used annual precipitation at all, this does not necessarily imply that October precipitation is far
more important to the species than annual precipitation.
The right-hand column in the table shows a second measure of variable contributions, called permutation
importance. This measure depends only on the final Maxent model, not the path used to obtain it. The
contribution for each variable is determined by randomly permuting the values of that variable among the
training points (both presence and background) and measuring the resulting decrease in training AUC. A
large decrease indicates that the model depends heavily on that variable. Values are normalized to give
percentages.
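The permutation importance procedure described above can be sketched in a few lines. This is a hedged illustration, not Maxent's code: the "model" is a stand-in function, the data are invented, and averaging over several shuffles and clipping negative drops to zero are choices made for this sketch.

```python
import random

def auc(presence_scores, background_scores):
    """Rank-based AUC with ties counted as one half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            wins += 1.0 if p > b else 0.5 if p == b else 0.0
    return wins / (len(presence_scores) * len(background_scores))

def permutation_importance(model, presence, background, n_vars, repeats=10, seed=0):
    """Percent drop in training AUC when each variable is shuffled in turn."""
    rng = random.Random(seed)
    points = presence + background
    n_presence = len(presence)

    def training_auc(rows):
        scores = [model(row) for row in rows]
        return auc(scores[:n_presence], scores[n_presence:])

    base = training_auc(points)
    drops = []
    for j in range(n_vars):
        total_drop = 0.0
        for _ in range(repeats):
            column = [row[j] for row in points]
            rng.shuffle(column)  # permute variable j among all training points
            permuted = [row[:j] + (v,) + row[j + 1:] for row, v in zip(points, column)]
            total_drop += base - training_auc(permuted)
        drops.append(max(0.0, total_drop / repeats))
    total = sum(drops)
    return [100.0 * d / total if total > 0 else 0.0 for d in drops]

# Invented data: each point is (variable 0, variable 1).
presence = [(0.9, 5.0), (0.8, 1.0), (0.7, 3.0)]
background = [(0.1, 2.0), (0.2, 4.0), (0.3, 6.0), (0.4, 0.0)]

def toy_model(row):
    return row[0]  # stand-in model that depends only on variable 0

print(permutation_importance(toy_model, presence, background, n_vars=2))
```

Because the stand-in model ignores variable 1, shuffling it leaves the AUC unchanged and all of the importance is assigned to variable 0.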
To get alternate estimates of variable importance, we can also run a jackknife test by selecting the "Do
jackknife to measure variable importance" checkbox. When we press the "Run" button again, a number of
models are created. Each variable is excluded in turn, and a model created with the remaining variables.
Then a model is created using each variable in isolation. In addition, a model is created using all
variables, as before. The results of the jackknife appear in the "bradypus.html" file in three bar charts,
and the first of these is shown below.
We see that if Maxent uses only pre6190_l1 (average January rainfall) it achieves almost no gain, so that
variable is not (by itself) useful for estimating the distribution of Bradypus. On the other hand, October
rainfall (pre6190_l10) allows a reasonably good fit to the training data. Turning to the lighter blue bars, it
appears that no variable contains a substantial amount of useful information that is not already contained
in the other variables, because omitting each variable in turn did not decrease the training gain
considerably.
The bradypus.html file has two more jackknife plots, which use either test gain or AUC in place of
training gain, shown below.
Comparing the three jackknife plots can be very informative. The AUC plot shows that annual
precipitation (pre6190_ann) is the most effective single variable for predicting the distribution of the
occurrence data that was set aside for testing, when predictive performance is measured using AUC, even
though it was hardly used by the model built using all variables. The relative importance of annual
precipitation also increases in the test gain plot, when compared against the training gain plot. In addition,
in the test gain and AUC plots, some of the light blue bars (especially for the monthly precipitation
variables) are longer than the red bar, showing that predictive performance improves when the
corresponding variables are not used.
This tells us that monthly precipitation variables are helping Maxent to obtain a good fit to the training
data, but the annual precipitation variable generalizes better, giving comparatively better results on the
set-aside test data. Phrased differently, models made with the monthly precipitation variables appear to be
less transferable. This is important if our goal is to transfer the model, for example by applying the model
to future climate variables in order to estimate its future distribution under climate change. It makes sense
that monthly precipitation values are less transferable: likely suitable conditions for Bradypus will depend
not on precise rainfall values in selected months, but on the aggregate average rainfall, and perhaps on
rainfall consistency or lack of extended dry periods. When we are modeling on a continental scale, there
will probably be shifts in the precise timing of seasonal rainfall patterns, affecting the monthly
precipitation but not suitable conditions for Bradypus.
In general, it would be better to use variables that are more likely to be directly relevant to the species
being modeled. For example, the Worldclim website (www.worldclim.org) provides "BIOCLIM"
variables, including derived variables such as "rainfall in the wettest quarter", rather than monthly values.
A last note on the jackknife outputs: the test gain plot shows that a model made only with January
precipitation (pre6190_l1) results in a negative test gain. This means that the model is slightly worse than
a null model (i.e., a uniform distribution) for predicting the distribution of occurrences set aside for
testing. This can be regarded as more evidence that the monthly precipitation values are not the best
choice for predictor variables.
How does the prediction depend on the variables?
Now select the "Create response curves" option, deselect the jackknife option, and rerun the model. This
results in the following section being added to the "bradypus.html" file:
Each of the thumbnail images can be selected (by clicking on them) to obtain a more detailed plot, and if
you would like to copy or open these plots with other software, the .png files can be found in the "plots"
directory. Looking at frs6190_ann, we see that the response is high for the smallest values of frs6190_ann
(close to 0), and quickly drops toward 0 as frs6190_ann increases. The value shown on the y-axis is
predicted probability of suitable conditions, as given by the logistic output format, with all other variables
set to their average value over the set of presence localities.
Note that if the environmental variables are correlated, as they are here, the marginal response curves can
be misleading. For example, if two closely correlated variables have response curves that are near
opposites of each other, then for most pixels, the combined effect of the two variables may be small. As
another example, we see that predicted suitability is negatively correlated with annual precipitation
(pre6190_ann), if all other variables are held fixed. In other words, once the effect of all the other
variables has already been accounted for, the marginal effect of increasing annual precipitation is to
decrease predicted suitability. However, annual precipitation is highly correlated with the monthly
precipitation variables, so in reality we cannot easily hold the monthly values fixed while varying the
annual value. The program therefore produces a second set of response curves, in which each curve is
made by generating a model using only the corresponding variable, disregarding all other variables:
In contrast to the marginal response to annual precipitation in the first set of response curves, we now see
that predicted suitability generally increases with increasing annual precipitation.
Feature types and response curves
Response curves allow us to see the difference among different feature types. Deselect the "auto
features", select "Threshold features", and press the "Run" button again. Take a look at the resulting
feature profiles; you'll notice that they are all step functions, like this one for pre6190_l10:
If the same run is done using only hinge features, the resulting feature profile looks like this:
The outlines of the two profiles are similar, but they differ because different feature types allow different
possible shapes of response curves. The exponent in a Maxent model is a sum of features, and a sum of
threshold features is always a step function, so the logistic output is also a step function (as are the raw
and cumulative outputs). In comparison, a sum of hinge features is always a piece-wise linear function, so
if only hinge features are used, the Maxent exponent is piece-wise linear. This explains the sequence of
connected line segments in the second response curve above. (Note that the lines are slightly curved,
especially towards the extreme values of the variable; this is because the logistic output applies a sigmoid
function to the Maxent exponent.) Using all classes together (the default, given enough samples) allows
many complex responses to be accurately modeled. A deeper explanation of the various feature types can
be found by clicking on the help button.
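The difference between the two feature types can be sketched numerically. The knots, weights and feature definitions below are assumptions made for illustration (a threshold feature that switches from 0 to 1 at its knot, and a hinge feature that is 0 up to its knot and then rises linearly to 1 at the variable's maximum); a plain sigmoid stands in for the logistic transformation.

```python
import math

KNOTS = [2.0, 5.0, 8.0]       # hypothetical knot positions
WEIGHTS = [1.0, -0.5, 2.0]    # hypothetical feature coefficients
X_MAX = 10.0                  # hypothetical maximum of the variable

def threshold_feature(x, knot):
    return 1.0 if x >= knot else 0.0

def hinge_feature(x, knot, x_max=X_MAX):
    return 0.0 if x <= knot else (x - knot) / (x_max - knot)

def threshold_exponent(x):
    """Sum of weighted threshold features: a step function of x."""
    return sum(w * threshold_feature(x, k) for w, k in zip(WEIGHTS, KNOTS))

def hinge_exponent(x):
    """Sum of weighted hinge features: a piece-wise linear function of x."""
    return sum(w * hinge_feature(x, k) for w, k in zip(WEIGHTS, KNOTS))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for x in [0.0, 2.0, 3.5, 5.0, 8.0, 10.0]:
    print(x, threshold_exponent(x), round(sigmoid(hinge_exponent(x)), 3))
```

The threshold exponent takes only a handful of distinct values, while the hinge exponent is linear between knots; applying the sigmoid bends those straight segments slightly, as noted in the text.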
Interactive exploration of predictions: the Explain tool
This interactive tool allows you to investigate how Maxent's prediction is determined by the predictor
variables across a study area. Clicking on a point on the map shows its location in each response curve.
The top right graph shows how much each variable contributes to the logit of the prediction (pointing at a
bar on the graph gives the variable name and numerical contribution). By observing the contributions to
the logit, you will see how the Maxent prediction is driven by different variables in different parts of the
region.
The tool requires the model to be additive (without interactions between variables), so it should only be
run on the output of runs without product features. Your computer needs enough memory to hold all
predictor variables at once. If you do a run without product features, a clickable link to the Explain tool is
included after the main picture of the model.
SWD Format
Another input format can be very useful, especially when your environmental grids are very large. For
lack of a better name, it's called "samples with data", or just SWD. The SWD version of our Bradypus
file, called "bradypus_swd.csv", starts like this:
species,longitude,latitude,cld6190_ann,dtr6190_ann,ecoreg,frs6190_ann,h_dem,pre6190_ann,pre6190_l10,pre6190_l1,pre6190_l4,pre6190_l7,tmn6190_ann,tmp6190_ann,tmx6190_ann,vap6190_ann
bradypus_variegatus,-65.4,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,41.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.3833,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,40.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.1333,-16.8,57.0,114.0,10.0,1.0,211.0,65.0,56.0,129.0,58.0,34.0,140.0,244.0,321.0,221.0
bradypus_variegatus,-63.6667,-17.45,57.0,112.0,10.0,3.0,363.0,36.0,33.0,71.0,27.0,13.0,135.0,229.0,307.0,202.0
bradypus_variegatus,-63.85,-17.4,57.0,113.0,10.0,3.0,303.0,39.0,35.0,77.0,29.0,15.0,134.0,229.0,306.0,202.0
It can be used in place of an ordinary samples file. The difference is only that the program doesn't need to
look in the environmental layers (the ascii files) to obtain values for the variables at the sample points;
instead it reads the values for the environmental variables directly from the table. The environmental
layers are thus only used to read the environmental data for the "background" pixels, that is, pixels where
the species hasn't necessarily been detected. In fact, the background pixels can also be specified in a SWD
format file. The file "background.csv" contains 10,000 background data points. The first few look like
this:
background,-61.775,6.175,60.0,100.0,10.0,0.0,747.0,55.0,24.0,57.0,45.0,81.0,182.0,239.0,300.0,232.0
background,-66.075,5.325,67.0,116.0,10.0,3.0,1038.0,75.0,16.0,68.0,64.0,145.0,181.0,246.0,331.0,234.0
background,-59.875,-26.325,47.0,129.0,9.0,1.0,73.0,31.0,43.0,32.0,43.0,10.0,97.0,218.0,339.0,189.0
background,-68.375,-15.375,58.0,112.0,10.0,44.0,2039.0,33.0,67.0,31.0,30.0,6.0,101.0,181.0,251.0,133.0
background,-68.525,4.775,72.0,95.0,10.0,0.0,65.0,72.0,16.0,65.0,69.0,133.0,218.0,271.0,346.0,289.0
We can run Maxent with "bradypus_swd.csv" as the samples file and "background.csv" (both located in
the "swd" directory) as the environmental layers file. Try running it; you'll notice that it runs much
faster, because it doesn't have to load the large environmental grids. Another advantage is that you can
associate different records with environmental conditions from different time periods. For example, two
occurrences recorded 100 years apart from the same grid cell probably reflect considerable variation in
environmental conditions, but unless you use SWD format, both records would be given the same
environmental variable values. The downside is that it can't make pictures or output grids, because it
doesn't have all the environmental data. The way to get around this is to use a "projection", described
below.
Batch running
Sometimes you need to generate multiple models, perhaps with slight variations in the modeling
parameters or the inputs. Generation of models can be automated with command-line arguments,
obviating the need to click and type repetitively at the program interface. The command line arguments
can either be given from a command window (a.k.a. shell), or they can be defined in a batch file. Take a
look at the file "batchExample.bat" (for example, right click on the .bat file in Windows Explorer and open
it using Notepad). It contains the following line:
java -mx512m -jar maxent.jar environmentallayers=layers togglelayertype=ecoreg
samplesfile=samples\bradypus.csv outputdirectory=outputs redoifexists autorun
The effect is to tell the program where to find the environmental layers and samples file and where to put
outputs, and to indicate that the ecoreg variable is categorical. The "autorun" flag tells the program to start
running immediately, without waiting for the "Run" button to be pushed. Now try double clicking on the
file to see what it does.
Many aspects of the Maxent program can be controlled by command-line arguments; press the "Help"
button to see all the possibilities. Multiple runs can appear in the same file, and they will simply be run
one after the other. You can change the default values of parameters by adding command-line arguments
to the "maxent.bat" file. Many of the command-line arguments also have abbreviations, so the run
described in batchExample.bat could also be initiated using this command:
java -mx512m -jar maxent.jar -e layers -t eco -s samples\bradypus.csv -o outputs -r -a
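When many runs are needed, the lines of a batch file can be generated by a short script. The sketch below only assembles command lines from the arguments shown in batchExample.bat and prints them; the second samples file and output directory are hypothetical names used for illustration.

```python
def build_command(samples_file, output_dir, memory_mb=512):
    """Assemble one Maxent command line from the arguments used in
    batchExample.bat; only the samples file, output directory and
    memory size are varied."""
    return [
        "java", f"-mx{memory_mb}m", "-jar", "maxent.jar",
        "environmentallayers=layers",
        "togglelayertype=ecoreg",
        f"samplesfile={samples_file}",
        f"outputdirectory={output_dir}",
        "redoifexists",
        "autorun",
    ]

runs = [
    ("samples\\bradypus.csv", "outputs"),
    ("samples\\other_species.csv", "outputs_other"),  # hypothetical names
]

commands = [" ".join(build_command(samples, out)) for samples, out in runs]
for line in commands:
    print(line)  # each printed line could be pasted into a .bat file
```

The list form returned by build_command could also be handed to a process launcher instead of being written to a batch file.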
Replication
The "replicates" option can be used to do multiple runs for the same species. The most common uses for
this flag are for repeated subsampling and for cross-validation. Replication can be controlled either from
the Settings panel, or using command line arguments. By default, the form of replication used is
cross-validation, where the occurrence data is randomly split into a number of equal-size groups called
"folds", and models are created leaving out each fold in turn. The left-out folds are then used for evaluation.
Cross-validation has one big advantage over using a single training/test split: it uses all of the data for
validation, thus making better use of small data sets. As an example, doing a run with the number of
replicates set to 10 creates 10 html pages, plus a page that summarizes statistical information for the
cross-validation. For example, we get ROC curves with error bars and average AUC across models, and
summary response curves with one standard deviation error bars. For Bradypus, the cross-validated ROC
curve shows some variability between models:
The single-variable response of Bradypus to annual precipitation shows little variation (on the left, below),
while the marginal response to annual precipitation is more variable (below, right).

Two alternative forms of replication are supported: repeated subsampling, in which the presence points are
repeatedly split into random training and testing subsets, and bootstrapping, where the training data is
selected by sampling with replacement from the presence points, with the number of samples equaling the
total number of presence points. With bootstrapping, the number of presence points in each set equals the
total number of presence points, so the training data sets will have duplicate records.
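The three forms of replication differ only in how the presence points are divided up, which can be sketched as follows. This is an illustration on a list of placeholder points, not the program's own partitioning code; in particular, how the test set size is rounded in the subsampling function is an assumption.

```python
import random

def cross_validation_splits(points, k, seed=0):
    """Split points into k folds; each fold is left out once for testing."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        splits.append((train, test))
    return splits

def subsample_split(points, test_percentage, seed=0):
    """One random training/testing split (repeat with new seeds)."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = int(len(points) * test_percentage / 100)  # rounding is assumed
    return shuffled[n_test:], shuffled[:n_test]

def bootstrap_sample(points, seed=0):
    """Sample with replacement, as many times as there are points."""
    rng = random.Random(seed)
    return [rng.choice(points) for _ in points]

points = list(range(23))  # placeholder identifiers for 23 presence records
for train, test in cross_validation_splits(points, k=10):
    print(len(train), len(test))
print(sorted(bootstrap_sample(points)))
```

In the cross-validation output every record appears in exactly one test fold, while a bootstrap sample typically contains some records more than once and omits others.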
With all three forms of replication, you may want to avoid eating up disk space by turning off the "write
output grids" option, which will suppress writing of output grids for the replicate runs, so that you only get
the summary statistics grids (avg, stderr etc.).
Regularization
The "regularization multiplier" parameter on the settings panel affects how focused or closely-fitted the
output distribution is: a smaller value than the default of 1.0 will result in a more localized output
distribution that is a closer fit to the given presence records, but can result in overfitting (fitting so close
to the training data that the model doesn't generalize well to independent test data). A larger
regularization multiplier will give a more spread out, less localized prediction. Try changing the
multiplier, and examine the pictures produced and changes in the AUC. As an example, setting the
multiplier to 3 makes the following picture, showing a much more diffuse distribution than before:
The potential for overfitting increases as the model complexity increases. First try setting the multiplier
very small (e.g. 0.01) with the default set of features to see a highly overfit model. Then try the same
regularization multiplier with only linear and quadratic features.
Projecting
A model trained on one set of environmental layers (or SWD file) can be "projected" by applying it to
another set of environmental layers (or SWD file). Situations where projections are needed include
modeling species distributions under changing climate conditions, applying a model of the native
distribution of an invasive species to assess invasive risk in a different geographic area, or simply
evaluating the model at a set of test locations in order to do further statistical analysis. Here we're going
to use projection for a simplistic climate change prediction, and to give a taste of the difficulties involved
in making reliable predictions of distributions under climate change.
The directory “hotlayers” has the same environmental data as the “layers” directory, with two changes: the
annual average temperature variable (tmp6190_ann.asc) has all values increased by 30, representing a
uniform 3 degree Celsius increase, while the maximum temperature variable (tmx6190_ann.asc) has all
values increased by 40, representing a 4 degree Celsius increase (the grid values are in tenths of a
degree). These changes represent a very simplified estimate of future climate, with higher average
temperature and higher temperature variability, but with no change in precipitation. To apply a model of
Bradypus to this new climate, enter the samples file and current environmental data as before, using either
grids or SWD format, and enter the “hotlayers” directory in the “Projection Layers Directory”, as pictured
below.
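As a rough illustration of how a grid like those in hotlayers could be derived, the sketch below adds a constant to every data cell of a tiny ESRI ASCII grid while leaving no-data cells alone. The grid contents, the helper name shift_grid and the no-data value of -9999 are assumptions made for illustration; the tutorial does not say how hotlayers was actually built.

```r
# Hypothetical 2x3 grid in ESRI ASCII format; temperatures in tenths of a degree.
asc <- c("ncols 3", "nrows 2", "xllcorner 0", "yllcorner 0",
         "cellsize 0.5", "NODATA_value -9999",
         "250 260 -9999",
         "310 -9999 330")

# shift_grid is an illustrative helper, not part of Maxent.
shift_grid <- function(lines, delta, header_rows = 6) {
  header <- lines[seq_len(header_rows)]
  nodata_line <- grep("^NODATA_value", header, ignore.case = TRUE, value = TRUE)
  nodata <- as.numeric(strsplit(trimws(nodata_line), "\\s+")[[1]][2])
  body <- lines[-seq_len(header_rows)]
  shifted <- vapply(body, function(row) {
    v <- as.numeric(strsplit(trimws(row), "\\s+")[[1]])
    v[v != nodata] <- v[v != nodata] + delta   # leave no-data cells untouched
    paste(v, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  c(header, shifted)
}

hot <- shift_grid(asc, 30)   # +30 tenths of a degree = +3 degrees Celsius
print(hot[7:8])
```

With real data the lines would come from readLines() on the original .asc file and be written back with writeLines() into the projection directory under the same file name.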
The projection layers directory (or SWD file) must contain variables with the same names as the variables
used for training the model, but describing different conditions (e.g., a different geographic region or
different climatic model). For both the training and projection data, each variable name is either the
column title (if using an SWD format file) or the filename without the .asc file ending (if using a directory
of grids).
When you press “Run”, a model is trained on the environmental variables corresponding to current
climate conditions, and then projected onto the ascii grids in the “hotlayers” directory. The output ascii
grid is called “bradypus_variegatus_hotlayers.asc”, and in general, the projection directory name is
appended to the species name, in order to distinguish it from the standard (un-projected) output. If “make
pictures of predictions” is selected, a picture of the projected model will appear in the “bradypus.html”
file. In our case, this produces the following picture:
We see that the predicted probability of presence is drastically lower under the warmer climate. The
prediction is of course dependent on the parameters of the model we're projecting. If we use only hinge
and categorical features, rather than the default set of features, the projected distribution is more
substantial:
Two different models that look very similar in the area used for training may look very different when
projected to a new geographic area or new climate conditions. This is especially true if there are
correlated variables that allow a variety of ways to fit similar-looking models, since the correlations
between the variables may change in the area you're projecting to.
Is the predicted range reduction of Bradypus under climate change reasonable? If we look at the marginal
response curves for the model made with default features, we see that the maximum temperature is
exerting a much stronger influence on the prediction:
Looking at a histogram of maximum temperature values at the known occurrences for Bradypus, we see
that most occurrences (about 80%) have maximum temperature between 30 and 34 degrees Celsius. Only
a single occurrence is above 34 degrees, even though a significant fraction of the background is between
34 and 35 degrees.
Under our climate change prediction, all 80% of the Bradypus locations currently above 30 degrees will
have the maximum temperature increase to above 34 degrees. Therefore it may indeed be reasonable to
predict that such locations will not be suitable for Bradypus, so Bradypus might not survive in most of its
current range. Note that it is difficult to draw any conclusions about why such conditions are not suitable:
it may be that Bradypus is intolerant to heat, or it may be that higher maximum temperatures would allow
fire to cause widespread replacement of rainforest by fire-tolerant tree species, eliminating most suitable
habitat for Bradypus. To further investigate the prospects of Bradypus under climate change, we could do
physiological studies to investigate the species' tolerance for heat, or study the fire ecology of rainforest
boundaries in the region.
Note: histograms like the two above are useful tools for investigating your data. They were made in R
using the following commands:
swdPresence <- read.csv("swd/bradypus_swd.csv")
hist(swdPresence$tmx6190_ann, probability=TRUE, breaks=c(5:37*10),
     xlab="Annual maximum temp * 10", main="Bradypus presence points")
swdBackground <- read.csv("swd/background.csv")
hist(swdBackground$tmx6190_ann, probability=TRUE, breaks=c(5:37*10),
     xlab="Annual maximum temp * 10", main="Background points")
We can see from the histograms that Bradypus can occasionally tolerate high temperatures, as evidenced
by the single record with maximum temperature of 35 degrees. On the other hand, there are extremely few
points in the background with temperatures of 36 or above, so we have no evidence of whether or not
Bradypus can tolerate even higher temperatures, which will be widespread under the future climate
prediction. This is known as the problem of novel climate conditions: when projecting, the predictor
variables may take on values outside the range seen during model training. The primary way Maxent
deals with this problem is “clamping”, which treats variables outside the training range as if they were at
the limit of the training range. This effect can be seen in the response curves described above, as the
response is held constant outside the training range. Whenever a model is projected, Maxent makes a
picture that shows where clamping has had a large effect. Projecting the Bradypus model made with all
features gives this clamping picture, where the values depicted are the absolute difference between
predictions with and without clamping.
Clamping has clearly had little effect in this case: the response curve for maximum temperature above
shows that the prediction had already leveled off near zero at the hot end of the scale, so holding the
variable at the limit of the training range barely changes the prediction.
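The idea of clamping can be sketched in a few lines of R: each projected value is pulled back to the nearest limit of the training range. The helper name clamp and the numbers below are assumptions for illustration, not Maxent's internal code or the actual Bradypus training range.

```r
# clamp is an illustrative helper, not Maxent's internal code.
clamp <- function(x, train_min, train_max) {
  pmin(pmax(x, train_min), train_max)
}

# Assumed training range for maximum temperature, in tenths of a degree.
train_min <- 90
train_max <- 350

projected <- c(80, 200, 350, 390)          # values seen in the projection layers
clamped <- clamp(projected, train_min, train_max)
was_clamped <- clamped != projected        # TRUE where the value lay outside the training range

print(data.frame(projected, clamped, was_clamped))
```

The clamping picture described above goes one step further and compares the model's predictions with and without this adjustment, rather than the variable values themselves.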
We also compare the environmental variables used for projection to those used for training the model.
After the clamping map, we see the following two pictures:

The leftmost picture is a multivariate environmental similarity surface (MESS), as described in Elith et al.,
Methods in Ecology and Evolution, 2010. It shows how similar each point in hotlayers is to conditions seen
during model training. Negative values (shown in red) indicate novel climate, i.e., hotlayers values outside
the range in layers. The value shown is the minimum over the predictors of how far out of range the point is,
expressed as a fraction of the range of that predictor's values in layers. Positive values (shown in blue)
are similar to BIOCLIM values, with a score of 100 meaning that a point is not at all novel, in the sense
that its hotlayers values are all exactly equal to the median value in layers. The picture on the right shows
the most dissimilar variable (MoD), and as we would expect, it shows that novel climate conditions in
hotlayers are due to average temperature (mauve, mostly north of the Amazon River) or maximum
temperature (teal blue, mostly south of the Amazon) being outside of the training range.
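To make the MESS score more concrete, here is a minimal sketch of the per-variable similarity and the minimum over variables, following the definition in Elith et al. (2010) as understood here and expressing out-of-range distance as a percentage of the training range. The function name similarity, the variable names and all values are assumptions for illustration; Maxent's own implementation may differ in details.

```r
# Similarity of one projected value p to the reference (training) values ref.
similarity <- function(p, ref) {
  f <- 100 * mean(ref < p)          # percent of reference values below p
  lo <- min(ref)
  hi <- max(ref)
  if (f == 0) {
    100 * (p - lo) / (hi - lo)      # below the training range: negative
  } else if (f <= 50) {
    2 * f
  } else if (f < 100) {
    2 * (100 - f)
  } else {
    100 * (hi - p) / (hi - lo)      # above the training range: negative
  }
}

# Hypothetical reference values for two predictors and one projected point.
ref <- list(temp = c(200, 250, 300, 350), precip = c(10, 20, 30, 40))
point <- c(temp = 390, precip = 25)

sims <- mapply(similarity, point, ref)   # one score per predictor
mess <- min(sims)                        # MESS value for this point
most_dissimilar <- names(which.min(sims))
print(sims)
```

In this toy case precipitation sits at the median of its reference values and temperature lies above its training range, so temperature is the most dissimilar variable and determines the negative MESS value.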
Masks
A mask variable is useful if you want to train a model using only a subset of the region. For example, we
may want to train a model for Bradypus based on occurrences in Central America, and then project the
model onto South America. To do this, we create a new “predictor” variable (called mask.asc, for
example) with the same dimensions, cell size and projection as the environmental variables, containing a
constant value (1, for example) throughout Central America and no-data values everywhere else. The
mask variable is placed in the same directory as the environmental variables, and is treated the same way
as the other environmental variables. Because it is constant, it is never used in the model, but the no-data
values serve to restrict model training to Central America.
To project the resulting model onto South America, we would create a new directory containing copies of
all the environmental variables, together with a new mask variable (also called mask.asc), that is equal to
1 throughout South America, and has no-data values elsewhere. This new directory is given as a
“projection layers” argument to Maxent.
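One way to think about building such a mask is sketched below: start from the cells of an existing environmental grid and a region-membership indicator, and write 1 inside the region and the no-data value elsewhere. The tiny grid, the in_region matrix and the no-data value of -9999 are assumptions for illustration; in practice the region would usually come from a GIS layer.

```r
nodata <- -9999

# Hypothetical 2x3 environmental grid (cell values only, header omitted).
env <- matrix(c(250, 260, -9999,
                310, 305,  330), nrow = 2, byrow = TRUE)

# Hypothetical indicator of which cells fall inside the training region.
in_region <- matrix(c(TRUE,  TRUE,  TRUE,
                      FALSE, FALSE, FALSE), nrow = 2, byrow = TRUE)

# Constant 1 inside the region where data exist, no-data everywhere else.
mask <- ifelse(in_region & env != nodata, 1, nodata)

# Rows as they would appear in mask.asc, below the same header as env.
mask_rows <- apply(mask, 1, paste, collapse = " ")
print(mask_rows)
```

The header lines of mask.asc would be copied unchanged from the environmental grid so that dimensions, cell size and corner coordinates agree.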
Bias grids
By default, when using Maxent we make the assumption that species occurrence data are unbiased,
independent samples from the distribution of the species. The assumption of lack of bias is easily
violated, for example if sample collection effort is biased towards more easily accessed areas such as areas
close to roads or population centers. If you believe that your species occurrence data constitute a biased
sample, and you have a good understanding of the spatial pattern of sample collection effort that produced
your occurrence data, you can provide Maxent with a “bias grid” which is then used to correct for the bias.
The bias grid should have the same dimensions, cell size and projection as the environmental variables,
and should be positive (or no-data) everywhere. The values should indicate relative sampling effort, so if
two cells have values 1 and 2, that means the probability of having visited the second cell is twice as high
as the first. Note that the bias grid gives a priori relative sampling probabilities; it does not indicate where
sampling actually happened.
Additional command-line tools
The Maxent jar file contains a number of tools that can be accessed from the command line. For
Microsoft users: the features described here can be used in a batch file, like maxent.bat. As an alternative,
Start->run->cmd gets you a shell for running commands interactively; cygwin (available free online) is a
good alternative with a much more powerful shell that offers many unix utilities.
Quick visualization of grid file
Grid files in .asc, .grd and .mxe format, and some files in .bil format, can be viewed using the following
command:
java -mx512m -cp maxent.jar density.Show filename
As with all the commands described below, you may need to add the path to the maxent.jar file and/or the
file you want to view. For example, you might use:
java -mx1000m -cp C:\maxentfiles\maxent.jar density.Show C:\mydata\var1.asc
Show can take some optional arguments (immediately after density.Show):
-s sampleFile gives a file with presences to be shown in white dots
-S speciesname says which species in the sampleFile to show with dots
-r radius controls the size of the white and purple dots for occurrence records
-L removes the legend
-o writes the picture to a file in .png format
With a little Windows wizardry, you can make Show be invoked just by clicking on .asc, .grd or .mxe
files. Make a batch file, say called showFile.bat, with the following single line in it:
java -mx512m -cp "c:\maxentfiles\maxent.jar" density.Show %1
then associate files of type .asc, .grd or .mxe with the batch file: from a Windows Explorer window (a.k.a.
"My Computer"), Tools->Folder Options->File Types... You may need to make the batch file executable: right
click on it and follow directions.
Making an SWD file
To make an SWD-format file from a non-SWD file:
java -cp maxent.jar density.Getval samplesfile grid1 grid2 ...
where samplesfile is a .csv file of occurrence data and grid1, grid2, etc. are grids in .asc, .mxe, .grd or .bil
format. The output is written to "standard output", which means it appears in the command window. To
write the output to a file, use a "redirect":
java -cp maxent.jar density.Getval samplesfile grid1 grid2 ... > outfile
If all the grids are in a directory you can avoid having to list them all by name by using a "wildcard":
java -cp maxent.jar density.Getval samplesfile directory/*.asc ... > outfile
because the wildcard (*) gets expanded to a list of all files that match.
Making an SWD background file
To pick a collection of background points uniformly at random from your study area:
java -cp maxent.jar density.tools.RandomSample num grid1 grid2 ...
where "num" is the number of background points desired.
Calculating AUC
The following command:
java -cp maxent.jar density.AUC testpointfile predictionfile
will calculate a presence-background AUC, where the presence points are given in the testpointfile and
background points are drawn randomly from the predictionfile. The testpointfile is a .csv file (which may
optionally be swd format), while the predictionfile is a grid file, typically representing the output of a
species distribution model.
Projection
This tool allows you to apply a previously-calculated Maxent model to a new set of environmental data:
java -cp maxent.jar density.Project lambdaFile gridDir outFile [args]
Here lambdaFile is a .lambdas file describing a Maxent model, and gridDir is a directory containing grids
for all the predictor variables described in the .lambdas file. As an alternative, gridDir could be an swd
format file. The optional args can contain any flags understood by Maxent; for example, a "grd" flag
would make the output grid of density.Project be in .grd format.
File conversion
To convert a directory full of grids in one format to another:
java -cp maxent.jar density.Convert indir insuffix outdir outsuffix
where indir and outdir are directories and insuffix and outsuffix are one of asc, mxe, grd or bil.
Analyzing Maxent output in R
Maxent produces a number of output files for each run. Some of these files can be imported into other
programs if you want to do your own analysis of the predictions. Here we demonstrate the use of the free
statistical package R on Maxent outputs: this section is intended for users who have experience with R.
We will use the following two files produced by Maxent:
bradypus_variegatus_backgroundPredictions.csv
bradypus_variegatus_samplePredictions.csv
The first file is only produced when the “writebackgroundpredictions” option is turned on, either by using
a command-line flag or by selecting it from Maxent's settings panel. The second file is always produced.
Make sure you have test data (for example, by setting the random test percentage to 25); we will be
evaluating the Maxent outputs using the same test data Maxent used. First we start R, and install some
packages (assuming this is the first time we're using them) and then load them by typing (or pasting):
install.packages("ROCR", dependencies=TRUE)
install.packages("vcd", dependencies=TRUE)
library(ROCR)
library(vcd)
library(boot)
Throughout this section we will use blue text to show R code and commands and green to show R outputs.
Next we change directory to where the Maxent outputs are, for example:
setwd("c:/maxent/tutorial/outputs")
and then read in the Maxent predictions at the presence and background points, and extract the columns
we need:
presence <- read.csv("bradypus_variegatus_samplePredictions.csv")
background <- read.csv("bradypus_variegatus_backgroundPredictions.csv")
pp <- presence$Logistic.prediction                # get the column of predictions
testpp <- pp[presence$Test.or.train=="test"]      # select only test points
trainpp <- pp[presence$Test.or.train=="train"]    # select only training points
bb <- background$logistic                         # predictions at the background points
Now we can put the prediction values into the format required by ROCR, the package we will use to do
some ROC analysis, and generate the ROC curve:
combined <- c(testpp, bb)                              # combine into a single vector
label <- c(rep(1,length(testpp)),rep(0,length(bb)))    # labels: 1=present, 0=random
pred <- prediction(combined, label)                    # labeled predictions
perf <- performance(pred, "tpr", "fpr")                # True / false positives, for ROC curve
plot(perf, colorize=TRUE)                              # Show the ROC curve
performance(pred, "auc")@y.values[[1]]                 # Calculate the AUC
The plot command gives the following result:
while the “performance” command gives an AUC value of 0.8677759, consistent with the AUC reported
by Maxent. Next, as an example of a test available in R but not in Maxent, we will make a bootstrap
estimate of the standard deviation of the AUC.
AUC <- function(p,ind) {
  pres <- p[ind]
  combined <- c(pres, bb)
  label <- c(rep(1,length(pres)),rep(0,length(bb)))
  predic <- prediction(combined, label)
  return(performance(predic, "auc")@y.values[[1]])
}
b1 <- boot(testpp, AUC, 100)    # do 100 bootstrap AUC calculations
b1                              # gives estimates of standard error and bias
This gives the following output:
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = testpp, statistic = AUC, R = 100)
Bootstrap Statistics :
     original          bias    std. error
t1* 0.8677759  -0.0003724138   0.02972513
and we see that the bootstrap estimate of standard error (0.02972513) is close to the standard error
computed by Maxent (0.028). The bootstrap results can also be used to determine confidence intervals for
the AUC:
boot.ci(b1)
gives the following four estimates; see the resources section at the end of this tutorial for references that
define and compare these estimates.
Intervals :
Level       Normal                Basic
95%   ( 0.8099,  0.9264 )   ( 0.8104,  0.9291 )
Level     Percentile              BCa
95%   ( 0.8064,  0.9252 )   ( 0.7786,  0.9191 )

Those familiar with use of the bootstrap will notice that we are bootstrapping only the presence values
here. We could also bootstrap the background values, but the results would not change much, given the
very large number of background values (10000).
As a final example, we will investigate the calculation of binomial and Cohen's Kappa statistics for some
example threshold rules. First, the following R code calculates Kappa for the threshold given by the
minimum presence prediction:
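The AUC being bootstrapped above is a presence-background AUC: the probability that a randomly chosen presence point receives a higher prediction than a randomly chosen background point, with ties counting one half. As a cross-check that does not depend on ROCR, it can be sketched in base R from ranks. The helper name auc_rank and the toy prediction values are assumptions for illustration; with real data one would pass testpp and bb.

```r
# Rank-based (Mann-Whitney) AUC; auc_rank is an illustrative helper.
auc_rank <- function(pres, bg) {
  r <- rank(c(pres, bg))            # average ranks handle ties
  n1 <- length(pres)
  n0 <- length(bg)
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical predictions at presence and background points.
pres_toy <- c(0.9, 0.7, 0.4)
bg_toy <- c(0.8, 0.3, 0.2, 0.1)

auc_value <- auc_rank(pres_toy, bg_toy)

# Direct pairwise definition, for comparison.
auc_pairs <- mean(outer(pres_toy, bg_toy, ">") + 0.5 * outer(pres_toy, bg_toy, "=="))
print(c(rank_based = auc_value, pairwise = auc_pairs))
```

Both computations count the same thing, so they should agree; the rank form is simply cheaper when there are many background points.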
confusion <- function(thresh) {
  return(cbind(c(length(testpp[testpp>=thresh]), length(testpp[testpp<thresh])),
               c(length(bb[bb>=thresh]), length(bb[bb<thresh]))))
}
mykappa <- function(thresh) {
  return(Kappa(confusion(thresh)))
}
mykappa(min(trainpp))
which gives a value of 0.0072. If we want to use the threshold that maximizes the sum of sensitivity and
specificity on the test data, we can do the following, using the true positive rate and false positive rate
values from the “performance” object used above to plot the ROC curve:
fpr = perf@x.values[[1]]
tpr = perf@y.values[[1]]
sum = tpr + (1-fpr)
index = which.max(sum)
cutoff = perf@alpha.values[[1]][[index]]
mykappa(cutoff)
This gives a kappa value of 0.0144. To determine binomial probabilities for these two threshold values,
we can do:
mybinomial <- function(thresh) {
  conf <- confusion(thresh)
  trials <- length(testpp)
  return(binom.test(conf[[1]][[1]], trials, conf[[1,2]] / length(bb), "greater"))
}
mybinomial(min(trainpp))
mybinomial(cutoff)
This gives p-values of 5.979e-09 and 2.397e-11 respectively, which are both slightly larger than the
p-values given by Maxent. The reason for the difference is that the number of test samples is greater than
25, the threshold above which Maxent uses a normal approximation to calculate binomial p-values.
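To see why an exact test and a normal approximation can give different p-values, the sketch below computes both for one made-up case. The counts and the null probability are hypothetical, and the approximation shown is the textbook one without continuity correction; this is an assumption, since the formula Maxent uses is not given here.

```r
# Hypothetical numbers: 30 test points, 28 of them predicted present, and
# 40% of background points at or above the threshold.
trials <- 30
successes <- 28
p_null <- 0.4

# Exact one-sided tail probability P(X >= successes), as binom.test computes it.
exact_p <- pbinom(successes - 1, trials, p_null, lower.tail = FALSE)

# Textbook normal approximation to the same tail (assumed form, see above).
z <- (successes - trials * p_null) / sqrt(trials * p_null * (1 - p_null))
normal_p <- pnorm(z, lower.tail = FALSE)

print(c(exact = exact_p, normal_approximation = normal_p))
```

Far out in the tail, as with the very small p-values reported above, the two methods can differ noticeably in relative terms even though both indicate a highly significant result.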
R Resources
Some good introductory material on using R can be found at:
http://spider.stat.umn.edu/R/doc/manual/R-intro.html, and other pages at the same
site.
http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html
