A Brief Tutorial on Maxent

By Steven Phillips, AT&T Research

This tutorial gives a basic introduction to use of the MaxEnt program for maximum entropy modelling of species' geographic distributions, written by Steven Phillips, Miro Dudik and Rob Schapire, with support from AT&T Labs-Research, Princeton University, and the Center for Biodiversity and Conservation, American Museum of Natural History. For more details on the theory of maximum entropy modeling, as well as a description of the data used and the main types of statistical analysis used here, see:

Steven J. Phillips, Robert P. Anderson and Robert E. Schapire, Maximum entropy modeling of species geographic distributions. Ecological Modelling, Vol 190/3-4 pp 231-259, 2006.

A second paper describing more recently-added features of the Maxent software is:

Steven J. Phillips and Miroslav Dudik, Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, Vol 31, pp 161-175, 2008.

The environmental data we will use consist of climatic and elevational data for South America, together with a potential vegetation layer. Our sample species will be Bradypus variegatus, the brown-throated three-toed sloth. This tutorial will assume that all the data files are located in the same directory as the maxent program files; otherwise you will need to use the path (e.g., c:\data\maxent\tutorial) in front of the file names used here.

Getting started

Downloading

The software consists of a jar file, maxent.jar, which can be used on any computer running Java version 1.4 or later. Maxent can be downloaded, along with associated literature, from www.cs.princeton.edu/~schapire/maxent; the Java runtime environment can be obtained from java.sun.com/javase/downloads. If you are using Microsoft Windows (as we assume here), you should also download the file maxent.bat, and save it in the same directory as maxent.jar. The website has a file called "readme.txt", which contains instructions for installing the program on your computer.

Firing up

If you are using Microsoft Windows, simply click on the file maxent.bat. Otherwise, enter "java -mx512m -jar maxent.jar" in a command shell (where "512" can be replaced by the megabytes of memory you want made available to the program). The following screen will appear:

To perform a run, you need to supply a file containing presence localities ("samples"), a directory containing environmental variables, and an output directory. In our case, the presence localities are in the file "samples\bradypus.csv", the environmental layers are in the directory "layers", and the outputs are going to go in the directory "outputs". You can enter these locations by hand, or browse for them. While browsing for the environmental variables, remember that you are looking for the directory that contains them; you don't need to browse down to the files in the directory. After entering or browsing for the files for Bradypus, the program looks like this:

The file "samples\bradypus.csv" contains the presence localities in .csv format. The first few lines are as follows:

species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
bradypus_variegatus,-63.85,-17.4
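As a rough illustration of this layout, the following Python sketch reads a samples file and groups the coordinates by species. It is not part of Maxent; the function name read_samples is hypothetical, and the embedded text simply repeats the first lines of the file listed above.

```python
import csv
import io
from collections import defaultdict

# First lines of samples\bradypus.csv, as listed in the tutorial.
SAMPLE_TEXT = """species,longitude,latitude
bradypus_variegatus,-65.4,-10.3833
bradypus_variegatus,-65.3833,-10.3833
bradypus_variegatus,-65.1333,-16.8
bradypus_variegatus,-63.6667,-17.45
bradypus_variegatus,-63.85,-17.4
"""


def read_samples(handle):
    """Group (x, y) coordinates by species name.

    The "x" coordinate (longitude) is read before the "y" coordinate
    (latitude), matching the column order of the samples file.
    """
    by_species = defaultdict(list)
    for row in csv.DictReader(handle):
        point = (float(row["longitude"]), float(row["latitude"]))
        by_species[row["species"]].append(point)
    return dict(by_species)


samples = read_samples(io.StringIO(SAMPLE_TEXT))
for species, points in samples.items():
    print(species, len(points), "records")
```

To read a real file instead of the embedded text, pass an open file handle (for example open("samples/bradypus.csv", newline="")) to read_samples.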

There can be multiple species in the same samples file, in which case more species would appear in the panel, along with Bradypus. Coordinate systems other than latitude and longitude can be used provided that the samples file and environmental layers use the same coordinate system. The "x" coordinate (longitude, in our case) should come before the "y" coordinate (latitude) in the samples file. If the presence data has duplicate records (multiple records for the same species in the same grid cell), the duplicates are removed by default; this can be changed by clicking on the "Settings" button and deselecting "Remove duplicate presence records".

The directory "layers" contains a number of ascii raster grids (in ESRI's .asc format), each of which describes an environmental variable. The grids must all have the same geographic bounds and cell size (i.e. all the ascii file headings must match each other perfectly). One of our variables, "ecoreg", is a categorical variable describing potential vegetation classes. The categories must be indicated by numbers, rather than letters or words. You must tell the program which variables are categorical, as has been done in the picture above.

Doing a run

Simply press the "Run" button. A progress monitor describes the steps being taken. After the environmental layers are loaded and some initialization is done, progress towards training of the Maxent model is shown like this:

The gain is closely related to deviance, a measure of goodness of fit used in generalized additive and generalized linear models. It starts at 0 and increases towards an asymptote during the run. During this process, Maxent is generating a probability distribution over pixels in the grid, starting from the uniform distribution and repeatedly improving the fit to the data. The gain is defined as the average log probability of the presence samples, minus a constant that makes the uniform distribution have zero gain. At the end of the run, the gain indicates how closely the model is concentrated around the presence samples; for example, if the gain is 2, it means that the average likelihood of the presence samples is exp(2) ≈ 7.4 times higher than that of a random background pixel. Note that Maxent isn't directly calculating "probability of occurrence". The probability it assigns to each pixel is typically very small, as the values must sum to 1 over all the pixels in the grid (though we return to this point when we compare output formats).

The run produces multiple output files, of which the most important for analyzing your model is an html file called "bradypus.html". The end of this file gives pointers to the other outputs, like this:

Looking at a prediction

By default, the html output file contains a picture of the model applied to the given environmental data:

The image uses colors to indicate predicted probability that conditions are suitable, with red indicating high probability of suitable conditions for the species, green indicating conditions typical of those where the species is found, and lighter shades of blue indicating low predicted probability of suitable conditions. For Bradypus, we see that suitable conditions are predicted to be highly probable through most of lowland Central America, wet lowland areas of northwestern South America, the Amazon basin, Caribbean islands, and much of the Atlantic forests in south-eastern Brazil.

The file pointed to is an image file (.png) that you can just click on (in Windows) or open in most image processing software. If you want to copy these images, or want to open them with other software, you will find the .png files in the directory called "plots" that has been created as an output during the run.

The test points are a random sample taken from the species presence localities. The same random sample is used each time you run Maxent on the same data set, unless you select the "random seed" option on the settings panel. Alternatively, test data for one or more species can be provided in a separate file, by giving the name of a "Test sample file" in the Settings panel.

Output formats

Maxent supports three output formats for model values: raw, cumulative and logistic. First, the raw output is just the Maxent exponential model itself. Second, the cumulative value corresponding to a raw value of r is the percentage of the Maxent distribution with raw value at most r. Cumulative output is best interpreted in terms of predicted omission rate: if we set a cumulative threshold of c, the resulting binary prediction would have omission rate c% on samples drawn from the Maxent distribution itself, and we can predict a similar omission rate for samples drawn from the species distribution. Third, if c is the exponential of the entropy of the Maxent distribution, then the logistic value corresponding to a raw value of r is c·r/(1+c·r). This is a logistic function, because the raw value is an exponential function of the environmental variables.

The three output formats are all monotonically related, but they are scaled differently, and have different interpretations. The default output is logistic, which is the easiest to conceptualize: it gives an estimate between 0 and 1 of probability of presence. Note that probability of presence depends on details of the sampling design, such as the plot size and (for vagile organisms) observation time; logistic output estimates probability of presence assuming that the sampling design is such that typical presence localities have probability of presence of about 0.5. This value of 0.5 is fairly arbitrary, and can be adjusted (using the "default prevalence" parameter) if information is available on the probability of presence at typical presence localities.

The picture of the Bradypus model above uses the logistic format. In comparison, using the raw format gives the following picture:

"ith a fe" red pixels =you can verify this by deselecting D%ogscale picturesE on the Settings panel? since the ra" for at typically gives a s all nu ber of sites relatively large values H this can be thought of as an artifact of the ra" output being given by an exponential distribution+ .)ote that "e have used a logarith ic scale for the colors+ A linear scale "ould be ostly blue.

Using the cumulative output format gives the following picture:

As with the raw output, we have used a logarithmic scale for coloring the picture in order to emphasize differences between smaller values. Cumulative output can be interpreted as predicting suitable conditions for the species above a threshold in the approximate range of 1-20 (or yellow through orange, in this picture), depending on the level of predicted omission that is acceptable for the application.
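The relationship between the three output formats can be sketched in a few lines of Python. This is only an illustration of the definitions given in the "Output formats" section, not Maxent's own code: the function names are hypothetical, the raw values are a made-up four-pixel grid, and the quadratic-time cumulative calculation is chosen for clarity rather than speed.

```python
import math


def cumulative(raw):
    """Cumulative output: percentage of the distribution's total mass
    lying on pixels whose raw value is at most the pixel's own value."""
    total = sum(raw)
    return [100.0 * sum(q for q in raw if q <= r) / total for r in raw]


def logistic(raw):
    """Logistic output c*r/(1+c*r), where c is the exponential of the
    entropy of the (normalized) raw distribution."""
    total = sum(raw)
    probs = [r / total for r in raw]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    c = math.exp(entropy)
    return [c * p / (1.0 + c * p) for p in probs]


# Toy raw output over four pixels (made-up values that sum to 1).
raw = [0.1, 0.2, 0.3, 0.4]
for r, cum, logit in zip(raw, cumulative(raw), logistic(raw)):
    print(f"raw={r:.2f}  cumulative={cum:6.2f}  logistic={logit:.3f}")
```

One consequence worth noting: for a perfectly uniform raw distribution over N pixels, the entropy is ln N, so c = N and every pixel gets a logistic value of 0.5. All three outputs rank the pixels in the same order, which is what "monotonically related" means here.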

"ith suitable conditions predicted above the threshold and unsuitable belo"+ The first plot sho"s ho" testing and training o ission and predicted area vary "ith the choice of cu ulative threshold.a co on reason is that the test and training data are not independent. the test o ission line lies "ell belo" the predicted o ission line. the o ission rate for test data dra"n fro the Maxent distribution itself+ The predicted o ission rate is a straight line. sho"n belo"+ The area under the R. the standard error of the A'( on the test data is given later on in the "eb page+ .( curve =A'(? is also given here< if test data are available. as in the follo"ing graph- *ere "e see that the o ission on test sa ples is a very good atch to the predicted o ission rate.Statistical analysis The D67E "e entered for Drando test percentageE told the progra to rando ly set aside 67K of the sa ple records for testing+ This allo"s the progra to do so e si ple statistical analysis+ Much of the analysis used the use of a threshold to a$e a binary prediction. by definition of the cu ulative output for at+ Bn so e situations. for exa ple if they derive fro the sa e spatially autocorrelated presence data+ The next plot gives the receiver operating curve for both training and test data.

If you use the same data for training and for testing then the red and blue lines will be identical. If you split your data into two partitions, one for training and one for testing, it is normal for the red (training) line to show a higher AUC than the blue (testing) line. The red (training) line shows the "fit" of the model to the training data. The blue (testing) line indicates the fit of the model to the testing data, and is the real test of the model's predictive power. The turquoise line shows the line that you would expect if your model was no better than random. If the blue line (the test line) falls below the turquoise line then this indicates that your model performs worse than a random model would. The further towards the top left of the graph that the blue line is, the better the model is at predicting the presences contained in the test sample of the data. For more detailed information on the AUC statistic a good starting reference is:

Fielding, A.H. & Bell, J.F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24(1): 38-49.

Because we have only occurrence data and no absence data, "fractional predicted area" (the fraction of the total study area predicted present) is used instead of the more standard commission rate (fraction of absences predicted present). For more discussion of this choice, see the paper in Ecological Modelling mentioned at the start of this tutorial. It is important to note that AUC values tend to be higher for species with narrow ranges, relative to the study area described by the environmental data. This does not necessarily mean that the models are better; instead this behavior is an artifact of the AUC statistic.

If test data are available, the program automatically calculates the statistical significance of the prediction, using a binomial test of omission. For Bradypus, this gives:

For more detailed information on the binomial statistic, see the Ecological Modelling paper mentioned above.

Which variables matter most?

A natural application of species distribution modeling is to answer the question, which variables matter most for the species being modeled? There is more than one way to answer this question; here we outline the possible ways in which Maxent can be used to address it.

While the Maxent model is being trained, it keeps track of which environmental variables are contributing to fitting the model. Each step of the Maxent algorithm increases the gain of the model by modifying the coefficient for a single feature; the program assigns the increase in the gain to the environmental variable(s) that the feature depends on. Converting to percentages at the end of the training process, we get the middle column in the following table:

These percent contribution values are only heuristically defined: they depend on the particular path that the Maxent code uses to get to the optimal solution, and a different algorithm could get to the same solution via a different path, resulting in different percent contribution values. In addition, when there are highly correlated environmental variables, the percent contributions should be interpreted with caution. In our Bradypus example, annual precipitation is highly correlated with October and July precipitation. Although the above table shows that Maxent used the October precipitation variable more than any other, and hardly used annual precipitation at all, this does not necessarily imply that October precipitation is far more important to the species than annual precipitation.

The right-hand column in the table shows a second measure of variable contributions, called permutation importance. This measure depends only on the final Maxent model, not the path used to obtain it. The contribution for each variable is determined by randomly permuting the values of that variable among the training points (both presence and background) and measuring the resulting decrease in training AUC. A large decrease indicates that the model depends heavily on that variable. Values are normalized to give percentages.

To get alternate estimates of variable importance, we can also run a jackknife test by selecting the "Do jackknife to measure variable importance" checkbox. When we press the "Run" button again, a number of models are created. Each variable is excluded in turn, and a model created with the remaining variables. Then a model is created using each variable in isolation. In addition, a model is created using all variables, as before. The results of the jackknife appear in the "bradypus.html" file as three bar charts, and the first of these is shown below.

We see that if Maxent uses only pre6190_l1 (average January rainfall) it achieves almost no gain, so that variable is not (by itself) useful for estimating the distribution of Bradypus. On the other hand, October rainfall (pre6190_l10) allows a reasonably good fit to the training data. Turning to the lighter blue bars, it appears that no variable contains a substantial amount of useful information that is not already contained in the other variables, because omitting each variable in turn did not decrease the training gain considerably.

The bradypus.html file has two more jackknife plots, which use either test gain or AUC in place of training gain, shown below.

Comparing the three jackknife plots can be very informative. The AUC plot shows that annual precipitation (pre6190_ann) is the most effective single variable for predicting the distribution of the occurrence data that was set aside for testing, when predictive performance is measured using AUC, even though it was hardly used by the model built using all variables. The relative importance of annual precipitation also increases in the test gain plot, when compared against the training gain plot. In addition, in the test gain and AUC plots, some of the light blue bars (especially for the monthly precipitation variables) are longer than the red bar, showing that predictive performance improves when the corresponding variables are not used.

This tells us that monthly precipitation variables are helping Maxent to obtain a good fit to the training data, but the annual precipitation variable generalizes better, giving comparatively better results on the set-aside test data. Phrased differently, models made with the monthly precipitation variables appear to be less transferable. This is important if our goal is to transfer the model, for example by applying the model to future climate variables in order to estimate its future distribution under climate change. It makes sense that monthly precipitation values are less transferable: likely suitable conditions for Bradypus will depend not on precise rainfall values in selected months, but on the aggregate average rainfall, and perhaps on rainfall consistency or lack of extended dry periods. When we are modeling on a continental scale, there will probably be shifts in the precise timing of seasonal rainfall patterns, affecting the monthly precipitation but not suitable conditions for Bradypus.

In general, it would be better to use variables that are more likely to be directly relevant to the species being modeled. For example, the Worldclim website (www.worldclim.org) provides "BIOCLIM" variables, including derived variables such as "rainfall in the wettest quarter", rather than monthly values.

A last note on the jackknife outputs: the test gain plot shows that a model made only with January precipitation (pre6190_l1) results in a negative test gain. This means that the model is slightly worse than a null model (i.e., a uniform distribution) for predicting the distribution of occurrences set aside for testing. This can be regarded as more evidence that the monthly precipitation values are not the best choice for predictor variables.
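The permutation importance measure described earlier in this section (permute one variable among the training points, measure the drop in training AUC, normalize to percentages) can be sketched as follows. This is a simplified illustration, not Maxent's implementation: the model is any scoring function supplied by the caller, the function names are hypothetical, and averaging over several shuffles and clipping negative drops to zero are assumptions made for the sketch.

```python
import random


def auc(pos, neg):
    """Rank-based AUC: chance that a presence outranks a background point."""
    wins = sum(1.0 if p > b else 0.5 if p == b else 0.0 for p in pos for b in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(score, presence, background, variables,
                           repeats=10, seed=0):
    """Percent importance of each variable, from the drop in training AUC
    when that variable is shuffled among presence + background points."""
    rng = random.Random(seed)
    points = presence + background
    n_pres = len(presence)

    def model_auc(rows):
        scores = [score(r) for r in rows]
        return auc(scores[:n_pres], scores[n_pres:])

    base = model_auc(points)
    drops = {}
    for var in variables:
        total_drop = 0.0
        for _ in range(repeats):
            values = [r[var] for r in points]
            rng.shuffle(values)
            permuted = [dict(r, **{var: v}) for r, v in zip(points, values)]
            total_drop += base - model_auc(permuted)
        drops[var] = max(total_drop / repeats, 0.0)  # assumption: clip at zero
    total = sum(drops.values())
    return {v: (100.0 * d / total if total else 0.0) for v, d in drops.items()}


# Toy data: the made-up "model" looks only at variable "a" and ignores "b".
presence = [{"a": float(v), "b": 1.0} for v in range(10, 16)]
background = [{"a": float(v), "b": 1.0} for v in range(0, 6)]
importance = permutation_importance(lambda r: r["a"], presence, background, ["a", "b"])
print(importance)
```

Because the toy model ignores "b", shuffling it cannot change any score, so all of the importance is assigned to "a". With correlated real-world variables the picture is less clear-cut, for the reasons discussed above.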

How does the prediction depend on the variables?

Now select the "Create response curves" option, deselect the jackknife option, and rerun the model. This results in the following section being added to the "bradypus.html" file:

Each of the thumbnail images can be selected (by clicking on them) to obtain a more detailed plot, and if you would like to copy or open these plots with other software, the .png files can be found in the "plots" directory. Looking at frs6190_ann, we see that the response is high for the smallest values of frs6190_ann (close to 0), and quickly drops toward 0. The value shown on the y-axis is predicted probability of suitable conditions, as given by the logistic output format, with all other variables set to their average value over the set of presence localities.

Note that if the environmental variables are correlated, as they are here, the marginal response curves can be misleading. For example, if two closely correlated variables have response curves that are near opposites of each other, then for most pixels, the combined effect of the two variables may be small. As another example, we see that predicted suitability is negatively correlated with annual precipitation (pre6190_ann), if all other variables are held fixed. In other words, once the effect of all the other variables has already been accounted for, the marginal effect of increasing annual precipitation is to decrease predicted suitability. However, annual precipitation is highly correlated with the monthly precipitation variables, so in reality we cannot easily hold the monthly values fixed while varying the annual value. The program therefore produces a second set of response curves, in which each curve is made by generating a model using only the corresponding variable, disregarding all other variables:

In contrast to the marginal response to annual precipitation in the first set of response curves, we now see that predicted suitability generally increases with increasing annual precipitation.

Feature types and response curves

Response curves allow us to see the difference among different feature types. Deselect the "auto features", select "Threshold features", and press the "Run" button again. Take a look at the resulting feature profiles; you'll notice that they are all step functions, like this one for pre6190_l10:

If the same run is done using only hinge features, the resulting feature profile looks like this:

The outlines of the two profiles are similar, but they differ because different feature types allow different possible shapes of response curves. The exponent in a Maxent model is a sum of features, and a sum of threshold features is always a step function, so the logistic output is also a step function (as are the raw and cumulative outputs). In comparison, a sum of hinge features is always a piece-wise linear function, so if only hinge features are used, the Maxent exponent is piece-wise linear. This explains the sequence of connected line segments in the second response curve above. (Note that the lines are slightly curved, especially towards the extreme values of the variable; this is because the logistic output applies a sigmoid function to the Maxent exponent.) Using all classes together (the default, given enough samples) allows many complex responses to be accurately modeled. A deeper explanation of the various feature types can be found by clicking on the help button.
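The difference between the two feature types can be seen in a small Python sketch. The feature definitions, weights and knots below are made up for illustration and are not taken from a trained Maxent model; the exact parameterization Maxent uses for hinge features is an assumption here, and the constant offset inside the logistic transform is omitted.

```python
import math


def threshold_feature(x, t):
    """Step feature: 0 up to the threshold t, 1 above it."""
    return 1.0 if x > t else 0.0


def hinge_feature(x, knot, upper):
    """Hinge feature: 0 up to the knot, then rising linearly to 1 at upper."""
    if x <= knot:
        return 0.0
    return (x - knot) / (upper - knot)


def exponent(x, weighted_features):
    """The model exponent: a weighted sum of feature values."""
    return sum(w * f(x) for w, f in weighted_features)


def sigmoid(z):
    """Logistic transform applied to the exponent."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and knots for a single variable with range 0..100.
threshold_model = [(0.8, lambda x: threshold_feature(x, 20.0)),
                   (-0.5, lambda x: threshold_feature(x, 60.0))]
hinge_model = [(2.0, lambda x: hinge_feature(x, 0.0, 100.0)),
               (-3.0, lambda x: hinge_feature(x, 50.0, 100.0))]

for x in range(0, 101, 10):
    step = sigmoid(exponent(x, threshold_model))
    piecewise = sigmoid(exponent(x, hinge_model))
    print(f"x={x:3d}  threshold-only={step:.3f}  hinge-only={piecewise:.3f}")
```

The threshold-only column changes value only where x crosses 20 or 60, while the hinge-only column changes smoothly, with a change of slope at the knot at 50. The slight curvature of the hinge-only values comes from the sigmoid, as noted above.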

Interactive exploration of predictions: the Explain tool

This interactive tool allows you to investigate how Maxent's prediction is determined by the predictor variables across a study area. Clicking on a point on the map shows its location in each response curve. The top right graph shows how much each variable contributes to the logit of the prediction (pointing at a bar on the graph gives the variable name and numerical contribution). By observing the contributions to the logit, you will see how the Maxent prediction is driven by different variables in different parts of the region. The tool requires the model to be additive (without interactions between variables), so it should only be run on the output of a run without product features. Your computer needs enough memory to hold all predictor variables at once. If you do a run without product features, a clickable link to the Explain tool is included after the main picture of the model.

SWD Format

Another input format can be very useful, especially when your environmental grids are very large. For lack of a better name, it's called "samples with data", or just SWD. The SWD version of our Bradypus file, called "bradypus_swd.csv", starts like this:

species,longitude,latitude,cld6190_ann,dtr6190_ann,ecoreg,frs6190_ann,h_dem,pre6190_ann,pre6190_l10,pre6190_l1,pre6190_l4,pre6190_l7,tmn6190_ann,tmp6190_ann,tmx6190_ann,vap6190_ann
bradypus_variegatus,-65.4,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,41.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.3833,-10.3833,76.0,104.0,10.0,2.0,121.0,46.0,40.0,84.0,54.0,3.0,192.0,266.0,337.0,279.0
bradypus_variegatus,-65.1333,-16.8,57.0,114.0,10.0,1.0,211.0,65.0,56.0,129.0,58.0,34.0,140.0,244.0,321.0,221.0
bradypus_variegatus,-63.6667,-17.45,57.0,112.0,10.0,3.0,363.0,36.0,33.0,71.0,27.0,13.0,135.0,229.0,307.0,202.0
bradypus_variegatus,-63.85,-17.4,57.0,113.0,10.0,3.0,303.0,39.0,35.0,77.0,29.0,15.0,134.0,229.0,306.0,202.0

It can be used in place of an ordinary samples file. The difference is only that the program doesn't need to look in the environmental layers (the ascii files) to obtain values for the variables at the sample points; instead it reads the values for the environmental variables directly from the table. The environmental layers are thus only used to read the environmental data for the "background" pixels, that is, pixels where the species hasn't necessarily been detected. In fact, the background pixels can also be specified in a SWD format file. The file "background.csv" contains 10,000 background data points. The first few look like this:

background,-61.775,6.175,60.0,100.0,10.0,0.0,747.0,55.0,24.0,57.0,45.0,81.0,182.0,239.0,300.0,232.0
background,-66.075,5.325,67.0,116.0,10.0,3.0,1038.0,75.0,16.0,68.0,64.0,145.0,181.0,246.0,331.0,234.0
background,-59.875,-26.325,47.0,129.0,9.0,1.0,73.0,31.0,43.0,32.0,43.0,10.0,97.0,218.0,339.0,189.0
background,-68.375,-15.375,58.0,112.0,10.0,44.0,2039.0,33.0,67.0,31.0,30.0,6.0,101.0,181.0,251.0,133.0
background,-68.525,4.775,72.0,95.0,10.0,0.0,65.0,72.0,16.0,65.0,69.0,133.0,218.0,271.0,346.0,289.0

We can run Maxent with "bradypus_swd.csv" as the samples file and "background.csv" (both located in the "swd" directory) as the environmental layers file. Try running it; you'll notice that it runs much faster, because it doesn't have to load the large environmental grids. Another advantage is that you can associate different records with environmental conditions from different time periods. For example, two occurrences recorded 100 years apart from the same grid cell probably reflect considerable variation in environmental conditions, but unless you use SWD format, both records would be given the same environmental variable values. The downside is that it can't make pictures or output grids, because it doesn't have all the environmental data. The way to get around this is to use a "projection", described below.

Batch running

Sometimes you need to generate multiple models, perhaps with slight variations in the modeling parameters or the inputs. Generation of models can be automated with command-line arguments, obviating the need to click and type repetitively at the program interface. The command-line arguments can either be given from a command window (a.k.a. shell), or they can be defined in a batch file. Take a look at the file "batchExample.bat" (for example, right click on the .bat file in Windows Explorer and open it using Notepad). It contains the following line:

java -mx512m -jar maxent.jar environmentallayers=layers togglelayertype=ecoreg samplesfile=samples\bradypus.csv outputdirectory=outputs redoifexists autorun

The effect is to tell the program where to find the environmental layers and samples file and where to put outputs, and to indicate that the ecoreg variable is categorical. The "autorun" flag tells the program to start running immediately, without waiting for the "Run" button to be pushed. Now try double clicking on the file to see what it does.

Many aspects of the Maxent program can be controlled by command-line arguments; press the "Help" button to see all the possibilities. Multiple runs can appear in the same file, and they will simply be run one after the other. You can change the default values of parameters by adding command-line arguments to the "maxent.bat" file. Many of the command-line arguments also have abbreviations, so the run described in batchExample.bat could also be initiated using this command:

java -mx512m -jar maxent.jar -e layers -t eco -s samples\bradypus.csv -o outputs -r -a
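When many runs are needed, the command line can also be assembled by a script. The Python sketch below only builds the argument list for the command shown in batchExample.bat; it uses just the flags that appear in this tutorial. The helper name maxent_command is hypothetical, and passing one togglelayertype argument per categorical layer is an assumption of the sketch.

```python
def maxent_command(layers, samples, output, categorical=(),
                   memory_mb=512, jar="maxent.jar"):
    """Build the argument list for a non-interactive Maxent run."""
    args = ["java", f"-mx{memory_mb}m", "-jar", jar,
            f"environmentallayers={layers}"]
    # Assumption: one togglelayertype argument per categorical layer.
    args += [f"togglelayertype={name}" for name in categorical]
    args += [f"samplesfile={samples}",
             f"outputdirectory={output}",
             "redoifexists",   # rerun even if outputs already exist
             "autorun"]        # start without waiting for the "Run" button
    return args


command = maxent_command("layers", r"samples\bradypus.csv", "outputs",
                         categorical=["ecoreg"])
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command) from Python's standard library to launch the run, provided Java and maxent.jar are available in the working directory; that step is left out here so the sketch runs without Maxent installed.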

Replication

The "replicates" option can be used to do multiple runs for the same species. The most common uses for this flag are for repeated subsampling and for cross-validation. Replication can be controlled either from the Settings panel, or using command-line arguments. By default, the form of replication used is cross-validation, where the occurrence data is randomly split into a number of equal-size groups called "folds", and models are created leaving out each fold in turn. The left-out folds are then used for evaluation. Cross-validation has one big advantage over using a single training/test split: it uses all of the data for validation, thus making better use of small data sets. As an example, doing a run with the number of replicates set to 10 creates 10 html pages, plus a page that summarizes statistical information for the cross-validation. For example, we get ROC curves with error bars and average AUC across models, and summary response curves with one standard deviation error bars. For Bradypus, the cross-validated ROC curve shows some variability between models:

The single-variable response of Bradypus to annual precipitation shows little variation (on the left, below), while the marginal response to annual precipitation is more variable (below, right).

Two alternative forms of replication are supported: repeated subsampling, in which the presence points are repeatedly split into random training and testing subsets, and bootstrapping, where the training data is selected by sampling with replacement from the presence points, with the number of samples equaling the total number of presence points. With bootstrapping, the number of presence points in each set equals the total number of presence points, so the training data sets will have duplicate records.

With all three forms of replication, you may want to avoid eating up disk space by turning off the "write output grids" option, which will suppress writing of output grids for the replicate runs, so that you only get the summary statistics grids (avg, stderr etc.).
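The three forms of replication differ only in how the presence points are divided between training and testing. The following Python sketch shows one way each split could be produced; it is illustrative only, with hypothetical function names, and makes no claim about how Maxent draws its own random samples.

```python
import random


def cross_validation_folds(points, k, seed=0):
    """Split points into k near-equal folds; yield (train, test) pairs,
    leaving out each fold in turn."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        splits.append((train, test))
    return splits


def random_subsample(points, test_fraction, seed=0):
    """One repeated-subsampling replicate: a random train/test split."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def bootstrap_sample(points, seed=0):
    """One bootstrap replicate: sample with replacement, same size as input,
    so duplicate records are expected."""
    rng = random.Random(seed)
    return [rng.choice(points) for _ in points]


records = list(range(20))  # stand-ins for 20 presence records
for train, test in cross_validation_folds(records, k=10):
    print(len(train), "training,", len(test), "left out")
```

In cross-validation every record is left out exactly once, which is why all of the data ends up being used for validation.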

Regularization

The "regularization multiplier" parameter on the settings panel affects how focused or closely-fitted the output distribution is. A smaller value than the default of 1.0 will result in a more localized output distribution that is a closer fit to the given presence records, but can result in overfitting (fitting so close to the training data that the model doesn't generalize well to independent test data). A larger regularization multiplier will give a more spread out, less localized prediction. Try changing the multiplier, and examine the pictures produced and changes in the AUC. As an example, setting the multiplier to 3 makes the following picture, showing a much more diffuse distribution than before:

The potential for overfitting increases as the model complexity increases. First try setting the multiplier very small (e.g. 0.01) with the default set of features to see a highly overfit model. Then try the same regularization multiplier with only linear and quadratic features.

Projecting

A model trained on one set of environmental layers (or SWD file) can be "projected" by applying it to another set of environmental layers (or SWD file). Situations where projections are needed include modeling species distributions under changing climate conditions, applying a model of the native distribution of an invasive species to assess invasive risk in a different geographic area, or simply evaluating the model at a set of test locations in order to do further statistical analysis. Here we're going to use projection for a simplistic climate change prediction, and to give a taste of the difficulties involved in making reliable predictions of distributions under climate change.

The directory "hotlayers" has the same environmental data as the "layers" directory, with two changes: the annual average temperature variable (tmp6190_ann.asc) has all values increased by 30, representing a uniform 3 degree Celsius increase, while the maximum temperature variable (tmx6190_ann.asc) has all values increased by 40, representing a 4 degree Celsius increase. These changes represent a very simplified estimate of future climate, with higher average temperature and higher temperature variability, but with no change in precipitation. To apply a model of Bradypus to this new climate, enter the samples file and current environmental data as before, using either grids or SWD format, and enter the "hotlayers" directory in the "Projection Layers Directory", as pictured below.
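A layer like the ones in "hotlayers" could be produced from the original grid with a short script. The Python sketch below adds a constant to every data cell of an ESRI ASCII grid while leaving the header and NODATA cells untouched. It is a guess at how such files might be made, not the procedure actually used for the tutorial data: the function name is hypothetical, cell values are assumed to be integers, and the tiny grid (including its header values) is made up.

```python
def shift_asc_grid(text, delta):
    """Return the grid text with delta added to every non-NODATA cell."""
    nodata = None
    out = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0][0].isalpha():
            # Header line such as "ncols 2" or "NODATA_value -9999".
            if tokens[0].lower() == "nodata_value":
                nodata = tokens[1]
            out.append(line)
        else:
            # Data row; assumption: cells are integers.
            shifted = [t if t == nodata else str(int(t) + delta) for t in tokens]
            out.append(" ".join(shifted))
    return "\n".join(out) + "\n"


# Made-up 2x2 grid standing in for tmp6190_ann.asc.
TOY_GRID = """ncols 2
nrows 2
xllcorner -125
yllcorner -56
cellsize 0.5
NODATA_value -9999
266 -9999
229 244
"""

print(shift_asc_grid(TOY_GRID, 30))
```

Calling the function with 30 mimics the change described for the annual average temperature layer, and 40 the change for the maximum temperature layer. Because the header is copied unchanged, the shifted grid keeps the same geographic bounds and cell size as the other layers.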

The projection layers directory (or SWD file) must contain variables with the same names as the variables used for training the model, but describing different conditions (e.g., a different geographic region or different climatic model). For both the training and projection data, each variable name is either the column title (if using an SWD format file) or the filename without the .asc file ending (if using a directory of grids).

When you press “Run”, a model is trained on the environmental variables corresponding to current climate conditions, and then projected onto the ascii grids in the “hotlayers” directory. The output ascii grid is called “bradypus_variegatus_hotlayers.asc”, and in general, the projection directory name is appended to the species name, in order to distinguish it from the standard (un-projected) output. If “make pictures of predictions” is selected, a picture of the projected model will appear in the “bradypus.html” file. In our case, this produces the following picture:

We see that the predicted probability of presence is drastically lower under the warmer climate. The prediction is of course dependent on the parameters of the model we’re projecting. If we use only hinge and categorical features, rather than the default set of features, the projected distribution is more substantial:

Two different models that look very similar in the area used for training may look very different when projected to a new geographic area or new climate conditions. This is especially true if there are correlated variables that allow a variety of ways to fit similar-looking models, since the correlations between the variables may change in the area you’re projecting to.
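
One simple way to see whether correlations between predictors differ between the training and projection data is to compare their correlation matrices. The sketch below is only an illustration: correlation_shift is a hypothetical helper name, and the two small data frames are synthetic, not taken from the tutorial’s layers.

```r
# Hypothetical helper: difference between the predictor correlation matrix
# in the projection data and in the training data (shared columns only).
correlation_shift <- function(train, proj) {
  vars <- intersect(names(train), names(proj))
  cor(proj[vars]) - cor(train[vars])
}

# Synthetic data: temp and precip rise together in the training area,
# but move in opposite directions in the projection area.
train <- data.frame(temp   = c(10, 12, 14, 16, 18),
                    precip = c(100, 118, 141, 160, 181))
proj  <- data.frame(temp   = c(10, 12, 14, 16, 18),
                    precip = c(180, 161, 139, 122, 100))

delta <- correlation_shift(train, proj)
print(round(delta, 2))   # large off-diagonal entries flag a changed relationship
```

With SWD-format files, the same function could be applied to the predictor columns read in with read.csv.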

Is the predicted range reduction of Bradypus under climate change reasonable? If we look at the marginal response curves for the model made with default features, we see that the maximum temperature is exerting a much stronger influence on the prediction:

Looking at a histogram of maximum temperature values at the known occurrences for Bradypus, we see that most occurrences (about 80%) have maximum temperature between 30 and 34 degrees Celsius. Only a single occurrence is above 34 degrees, even though a significant fraction of the background is between 34 and 35 degrees.

Under our climate change prediction, all 80% of the Bradypus locations currently above 30 degrees will have the maximum temperature increase to above 34 degrees. Therefore it may indeed be reasonable to predict that such locations will not be suitable for Bradypus, so Bradypus might not survive in most of its current range. Note that it is difficult to make any conclusions about why such conditions are not suitable: it may be that Bradypus is intolerant to heat, or it may be that higher maximum temperatures would allow fire to cause widespread replacement of rainforest by fire-tolerant tree species, eliminating most suitable habitat for Bradypus. To further investigate the prospects of Bradypus under climate change, we could do physiological studies to investigate the species’ tolerance for heat, or study the fire ecology of rainforest boundaries in the region.

Note: histograms like the two above are useful tools for investigating your data. They were made in R using the following commands:

swdPresence <- read.csv("swd/bradypus_swd.csv")
hist(swdPresence$tmx6190_ann, probability=TRUE, breaks=c(5:37*10), xlab="Annual maximum temp * 10", main="Bradypus presence points")
swdBackground <- read.csv("swd/background.csv")
hist(swdBackground$tmx6190_ann, probability=TRUE, breaks=c(5:37*10), xlab="Annual maximum temp * 10", main="Background points")

We can see from the histograms that Bradypus can occasionally tolerate high temperatures, as evidenced by the single record with maximum temperature of 35 degrees. On the other hand, there are extremely few points in the background with temperatures of 36 or above, so we have no evidence of whether or not Bradypus can tolerate even higher temperatures, which will be widespread under the future climate prediction. This is known as the problem of novel climate conditions: when projecting, the predictor variables may take on values outside the range seen during model training. The primary way Maxent deals with this problem is “clamping”, which treats variables outside the training range as if they were at the limit of the training range. This effect can be seen in the response curves described above, as the response is held constant outside the training range. Whenever a model is projected, Maxent makes a picture that shows where clamping has had a large effect. Projecting the Bradypus model made with all features gives this clamping picture, where the values depicted are the absolute difference between predictions with and without clamping.

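The idea behind clamping can be sketched in a couple of lines of base R. This is only an illustration of the treatment of a single variable as described above, not Maxent’s own code; the function name clamp_to_training and the temperature values are made up.

```r
# Hypothetical helper: pull projection values back to the limits of the
# range seen in training, as the description of clamping above suggests.
clamp_to_training <- function(x, train) {
  pmin(pmax(x, min(train, na.rm = TRUE)), max(train, na.rm = TRUE))
}

train_tmx <- c(210, 250, 300, 320, 350)   # made-up training values (degrees C * 10)
proj_tmx  <- c(190, 260, 340, 390)        # made-up projection values

clamped <- clamp_to_training(proj_tmx, train_tmx)
print(clamped)               # 210 260 340 350
print(proj_tmx != clamped)   # TRUE where clamping changed the value
```

Note that Maxent’s clamping picture shows the effect on the prediction, whereas this sketch only shows which input values would be altered.
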
Clamping has clearly had little effect in this case. In particular, the response curve for maximum temperature above shows that the prediction had already leveled off near zero at the hot end of the scale, so clamping has little effect.

We also compare the environmental variables used for projection to those used for training the model. After the clamping map, we see the following two pictures:

The leftmost picture is a multivariate similarity surface (MESS), as described in Elith et al., Methods in Ecology and Evolution, 2010. It shows how similar each point is in hotlayers to conditions seen during model training. Negative values (shown in red) indicate novel climate, i.e., hotlayers values outside the range in layers. The value shown is the minimum over the predictors of how far out of range the point is, expressed as a fraction of the range of that predictor’s values in layers. Positive values (shown in blue) are similar to BIOCLIM values, with a score of 100 meaning that a point is not at all novel, in the sense that its hotlayers values are all exactly equal to the median value in layers. The picture on the right shows the most dissimilar variable (MoD), and as we would expect, it shows that novel climate conditions in hotlayers are due to average temperature (mauve, mostly north of the Amazon River) or maximum temperature (teal blue, mostly south of the Amazon) being outside of the training range.
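
To make the MESS and MoD calculations more concrete, here is a minimal base R sketch for a single projection point. It follows one reading of the definition in Elith et al. (2010), scaling scores to percentages, and may differ in detail from what Maxent actually computes. The function names and the reference values are hypothetical.

```r
# Similarity of value p to the reference values ref for one variable,
# following one reading of Elith et al. (2010). Negative means out of range.
mess_one_variable <- function(p, ref) {
  lo <- min(ref)
  hi <- max(ref)
  f  <- 100 * mean(ref < p)   # percent of reference values below p
  if (f == 0) {
    100 * (p - lo) / (hi - lo)
  } else if (f <= 50) {
    2 * f
  } else if (f < 100) {
    2 * (100 - f)
  } else {
    100 * (hi - p) / (hi - lo)
  }
}

# MESS is the minimum similarity over the variables; MoD is the variable
# that attains that minimum.
mess_point <- function(point, reference) {
  sims <- vapply(names(reference),
                 function(v) mess_one_variable(point[[v]], reference[[v]]),
                 numeric(1))
  list(mess = min(sims), mod = names(sims)[which.min(sims)])
}

# Made-up reference ("layers") values and one made-up "hotlayers" point
reference <- data.frame(tmp = c(200, 220, 240, 260, 280),
                        tmx = c(300, 310, 320, 330, 340))
hot_point <- list(tmp = 240, tmx = 380)

result <- mess_point(hot_point, reference)
print(result)
```

In this toy case the maximum temperature lies above everything in the reference data, so the score is negative and tmx is reported as the most dissimilar variable.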

Masks

A mask variable is useful if you want to train a model using only a subset of the region. For example, we may want to train a model for Bradypus based on occurrences in Central America, and then project the model onto South America. To do this, we create a new “predictor” variable (called mask.asc, for example) with the same dimensions, cell size and projection as the environmental variables, containing a constant value (1, for example) throughout Central America and no-data values everywhere else. The mask variable is placed in the same directory as the environmental variables, and is treated the same way as the other environmental variables. Because it is constant, it is never used in the model, but the no-data values serve to restrict model training to Central America. To project the resulting model onto South America, we would create a new directory containing copies of all the environmental variables, together with a new mask variable (also called mask.asc), that is equal to 1 throughout South America, and has no-data values elsewhere. This new directory is given as a “projection layers” argument to Maxent.

Bias grids

By default, when using Maxent we make the assumption that species occurrence data are unbiased, independent samples from the distribution of the species. The assumption of lack of bias is easily violated, for example if sample collection effort is biased towards more easily accessed areas such as areas close to roads or population centers. If you believe that your species occurrence data constitute a biased sample, and you have a good understanding of the spatial pattern of sample collection effort that produced your occurrence data, you can provide Maxent with a “bias grid” which is then used to correct for the bias. The bias grid should have the same dimensions, cell size and projection as the environmental variables, and should be positive (or no-data) everywhere. The values should indicate relative sampling effort, so if two cells have values 1 and 2, that means the probability of having visited the second cell is twice as high as the first. Note that the bias grid gives a priori relative sampling probabilities; it does not indicate where sampling actually happened.

Additional command-line tools

The Maxent jar file contains a number of tools that can be accessed from the command line. As an alternative, the features described here can be used in a batch file, like maxent.bat. For Microsoft users, Start->run->cmd gets you a shell for running commands interactively; cygwin (available free online) is a good alternative with a much more powerful shell that offers many unix utilities.

Quick visualization of grid file

Grid files in .asc, .grd and .mxe format, and some files in .bil format, can be viewed using the following command:

java -mx512m -cp maxent.jar density.Show filename

As with all the commands described below, you may need to add the path to the maxent.jar file and/or the file you want to view. For example, you might use:

java -mx1000m -cp C:\maxentfiles\maxent.jar density.Show C:\mydata\var1.asc

Show can take some optional arguments (immediately after density.Show):

-s samplefile gives a file with presences to be shown in white dots
-S speciesname says which species in the samplefile to show with dots
-r radius controls the size of the white and purple dots for occurrence records
-L removes the legend
-o writes the picture to a file in .png format

With a little Windows wizardry, you can make Show be invoked just by clicking on .asc, .grd or .mxe files. Make a batch file, say called showFile.bat, with the following single line in it:

java -mx512m -cp "c:\maxentfiles\maxent.jar" density.Show %1

then associate files of type .asc, .grd or .mxe with the batch file, from a windows explorer (a.k.a. "My Computer"), Tools->Folder Options->File Types... You may need to make the batch file executable; right click on it and follow directions.

Making an SWD file

To make an SWD-format file from a non-SWD file:

java -cp maxent.jar density.Getval samplesfile grid1 grid2 ...

where samplesfile is a .csv file of occurrence data and grid1, grid2, etc. are grids in .asc, .mxe, .grd or .bil format. The output is written to "standard output", which means it appears in the command window. To write the output to a file, use a "redirect":

java -cp maxent.jar density.Getval samplesfile grid1 grid2 ... > outfile

If all the grids are in a directory you can avoid having to list them all by name by using a "wildcard":

java -cp maxent.jar density.Getval samplesfile directory/*.asc ... > outfile

because the wildcard (*) gets expanded to a list of all files that match.

Making an SWD background file

To pick a collection of background points uniformly at random from your study area:

java -cp maxent.jar density.tools.RandomSample num grid1 grid2 ...

where "num" is the number of background points desired.

Calculating AUC

The following command:

java -cp maxent.jar density.AUC testpointfile predictionfile

will calculate a presence-background AUC, where the presence points are given in the testpointfile and background points are drawn randomly from the predictionfile. The testpointfile is a .csv file (which may optionally be swd format), while the predictionfile is a grid file, typically representing the output of a species distribution model.

Projection

This tool allows you to apply a previously-calculated Maxent model to a new set of environmental data:

java -cp maxent.jar density.Project lambdaFile gridDir outFile [args]

Here lambdaFile is a .lambdas file describing a Maxent model, and gridDir is a directory containing grids for all the predictor variables described in the .lambdas file. As an alternative, gridDir could be an swd format file. The optional args can contain any flags understood by Maxent; for example, a "grd" flag would make the output grid of density.Project be in .grd format.

File conversion

To convert a directory full of grids in one format to another:

java -cp maxent.jar density.Convert indir insuffix outdir outsuffix

where indir and outdir are directories and insuffix and outsuffix are one of asc, mxe, grd or bil.

Analyzing Maxent output in R

Maxent produces a number of output files for each run. Some of these files can be imported into other programs if you want to do your own analysis of the predictions. Here we demonstrate the use of the free statistical package R on Maxent outputs: this section is intended for users who have experience with R. We will use the following two files produced by Maxent:

bradypus_variegatus_backgroundPredictions.csv
bradypus_variegatus_samplePredictions.csv

The first file is only produced when the “writebackgroundpredictions” option is turned on, either by using a command-line flag or by selecting it from Maxent’s settings panel. The second file is always produced. Make sure you have test data (for example, by setting the random test percentage to 25); we will be evaluating the Maxent outputs using the same test data Maxent used. First we start R, and install some packages (assuming this is the first time we’re using them) and then load them by typing (or pasting):

install.packages("ROCR", dependencies=TRUE)
install.packages("vcd", dependencies=TRUE)
library(ROCR)
library(vcd)
library(boot)

Throughout this section, R code and commands appear on their own lines, followed where relevant by the R output they produce. Next we change directory to where the Maxent outputs are, for example:

setwd("c:/maxent/tutorial/outputs")

and then read in the Maxent predictions at the presence and background points, and extract the columns we need:

presence <- read.csv("bradypus_variegatus_samplePredictions.csv")
background <- read.csv("bradypus_variegatus_backgroundPredictions.csv")
pp <- presence$Logistic.prediction                # get the column of predictions
testpp <- pp[presence$Test.or.train=="test"]      # select only test points
trainpp <- pp[presence$Test.or.train=="train"]    # select only training points
bb <- background$logistic

Now we can put the prediction values into the format required by ROCR, the package we will use to do some ROC analysis, and generate the ROC curve:

combined <- c(testpp, bb)                              # combine into a single vector
label <- c(rep(1,length(testpp)),rep(0,length(bb)))    # labels: 1=present, 0=random
pred <- prediction(combined, label)                    # labeled predictions
perf <- performance(pred, "tpr", "fpr")                # True / false positives, for ROC curve
plot(perf, colorize=TRUE)                              # Show the ROC curve
performance(pred, "auc")@y.values[[1]]                 # Calculate the AUC
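
As a cross-check that does not depend on ROCR, the AUC can also be computed in base R from ranks, since it equals the proportion of presence/background pairs in which the presence point gets the higher prediction (ties counting one half). The sketch below uses a hypothetical function name, auc_by_rank, and made-up scores rather than the tutorial’s data.

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) statistic.
auc_by_rank <- function(presence_scores, background_scores) {
  n1 <- length(presence_scores)
  n0 <- length(background_scores)
  r  <- rank(c(presence_scores, background_scores))   # ties get average ranks
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up predictions: 11 of the 12 presence/background pairs are ordered correctly
toy_presence   <- c(0.9, 0.8, 0.4)
toy_background <- c(0.5, 0.3, 0.2, 0.1)

toy_auc <- auc_by_rank(toy_presence, toy_background)
print(toy_auc)
```

With the vectors defined above, auc_by_rank(testpp, bb) should agree closely with the value returned by the “performance” command.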

The plot command gives the following result:

while the “performance” command gives an AUC value of 0.8677759, consistent with the AUC reported by Maxent. Next, as an example of a test available in R but not in Maxent, we will make a bootstrap estimate of the standard deviation of the AUC.

AUC <- function(p,ind) {
  pres <- p[ind]
  combined <- c(pres, bb)
  label <- c(rep(1,length(pres)),rep(0,length(bb)))
  predic <- prediction(combined, label)
  return(performance(predic, "auc")@y.values[[1]])
}
b1 <- boot(testpp, AUC, 100)   # do 100 bootstrap AUC calculations
b1                             # gives estimates of standard error and bias

This gives the following output:

ORDINARY NONPARAMETRIC BOOTSTRAP
Call:

boot(data = testpp, statistic = AUC, R = 100)
Bootstrap Statistics
     original          bias    std. error
t1* 0.8677759 -0.0003724138  0.02972513

and we see that the bootstrap estimate of standard error (0.02972513) is close to the standard error computed by Maxent (0.028). The bootstrap results can also be used to determine confidence intervals for the AUC:

boot.ci(b1)

gives the following four estimates (see the resources section at the end of this tutorial for references that define and compare these estimates).

Intervals
Level      Normal              Basic
95%   ( 0.8099, 0.9264 )   ( 0.8104, 0.9291 )
Level     Percentile            BCa
95%   ( 0.8064, 0.9252 )   ( 0.7786, 0.9191 )

Those familiar with use of the bootstrap will notice that we are bootstrapping only the presence values here. We could also bootstrap the background values, but the results would not change much, given the very large number of background values (10000).

As a final example, we will investigate the calculation of binomial and Cohen’s Kappa statistics for some example threshold rules. First, the following R code calculates Kappa for the threshold given by the minimum presence prediction:

confusion <- function(thresh) {
  return(cbind(c(length(testpp[testpp >= thresh]), length(testpp[testpp < thresh])),
               c(length(bb[bb >= thresh]), length(bb[bb < thresh]))))
}
mykappa <- function(thresh) {
  return(Kappa(confusion(thresh)))
}
mykappa(min(trainpp))

which gives a value of 0.0072. If we want to use the threshold that maximizes the sum of sensitivity and specificity on the test data, we can do the following, using the true positive rate and false positive rate values from the “performance” object used above to plot the ROC curve:

fpr = perf@x.values[[1]]
tpr = perf@y.values[[1]]
sum = tpr + (1-fpr)
index = which.max(sum)
cutoff = perf@alpha.values[[1]][[index]]

mykappa(cutoff)

This gives a kappa value of 0.0144. To determine binomial probabilities for these two threshold values, we can do:

mybinomial <- function(thresh) {
  conf <- confusion(thresh)
  trials <- length(testpp)
  return(binom.test(conf[[1]][[1]], trials, conf[[1,2]] / length(bb), "greater"))
}
mybinomial(min(trainpp))
mybinomial(cutoff)

This gives p-values of 5.979e-09 and 2.397e-11 respectively, which are both slightly larger than the p-values given by Maxent. The reason for the difference is that the number of test samples is greater than 25, the threshold above which Maxent uses a normal approximation to calculate binomial p-values.

R Resources

Some good introductory material on using R can be found at:

http://spider.stat.umn.edu/R/doc/manual/R-intro.html, and other pages at the same site.
http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html