Professional Documents
Culture Documents
7423 CH 7
7423 CH 7
7.1 INTRODUCTION
GIS can be a useful tool for spatial or land-use planning, but only if several
conditions are fulfilled. The key conditions are related to 1) the quality of basic
spatial information, and 2) the statistical methods applied to the spatial nature of the
data. Appropriate information and methods allow the generation of robust models
that guarantee objective and methodologically sound decisions.
In this study we apply several multivariate statistical methods and test their
usefulness to provide robust solutions in forestry planning using GIS. We must
emphasize that in our Iberian study area, where forests have progressively
decreased in extent over centuries, the main aims of forestry planning are the
reduction of forest fragmentation, biodiversity conservation, and restoration of
degraded biotopes.
The research develops a set of likelihood or suitability models for the presence
of tree species that are widely distributed over a study area of 41,000 km2. The
utility of suitability models has been demonstrated in some previous studies1, but
they are still not as widely employed as might be expected.
A suitability model is a raster map in which each pixel is assigned a value
reflecting suitability for a given use (e.g., presence of a tree species). Suitability
models can be generated through diverse techniques, such as logistic regression or
non-parametric CART (classification and regression trees) and MARS (multiple
adaptive regression splines)2-4. All of these techniques require a vegetation map
(dependent variable) and a set of environmental variables (climate, topography,
geology, etc.) which potentially influence the vegetation distribution. The
foundation of the method is to establish relationships between the environmental
variables and the spatial distribution of the vegetation. Typically, each vegetation
type will respond in a different way as a consequence of its contrasting
environmental requirements.
Suitability is commonly expressed on a 0-1 scale (incompatible-ideal). The
precise value depends on a set of physical and biological factors that favor or limit
the growth of each type of vegetation. Once the distribution of suitability values
across a region is known, decisions on land use and management can be made on
the basis of objective criteria.
117
The set of suitability values for a region can be considered as the potential
distribution model if presented as a map: the area defined as ‘suitable’ in a model
should reflect the potential area for the vegetation type under consideration. Such
a model also represents the relationships between presence/absence of each forest
type and the values of the potentially influential environmental variables in a given
region. Usually, current forest distributions are significantly smaller than the
potential spatial extents because they have been systematically logged. Potential
distribution models allow the recognition and delineation of such former
distribution areas in order to direct current and future management plans, provide
valuable data for restoration initiatives and highlight areas where such actions
should be considered a priority.
7.2 OBJECTIVES
The main objectives of the study were to 1) use several different statistical
methods to generate maps of potential distributions and suitability for each of three
species of Quercus (oak) in the study area, and 2) identify the most appropriate
method and assess its advantages and limitations. In order to fulfill these
objectives, we developed a workflow that included sampling strategies, GIS
implementation of statistical models and validation of results.
Figure 7.2 Dehesas are artificial ecotypes comparable to savannas: a Mediterranean (seasonal) grassland
containing scattered trees of the genus Quercus.
7.4 DATA
Current Quercus species distribution maps were taken from the Forestry Map of
Spain (scale 1:50,000), produced by the Spanish General Directorate for Nature
Conservation during the period 1986-96. We used the digital version of the map to
identify the main vegetation classes and the current spatial distributions (Figure
7.3).
Figure 7.3 Current distribution of Quercus species in the study area (black represents Pyrenean oak, Q.
pyrenaica; dark gray, cork oak, Q. suber; and pale gray, holm oak, Q. rotundifolia).
7.5 METHODS
The methods used in predictive modelling are usually of two main types: global
parametric and local non-parametric. Global parametric models adopt an approach
where each entered predictor has a universal relationship with the response
variable. An advantage of global parametric models, such as linear and logistic
regression, is that they are easy and quick to compute, and their integration with a
GIS is straightforward. As an example of such a model we used logistic multiple
regression (LMR). This is widely employed in predictive modelling10, but has
several important limitations. For instance, ecologists frequently assume a
response function which is unimodal and symmetric, yet this is often not
justified11,12.
An alternative hypothesis when modelling organism or community distributions
is to assume that the response is related to predictor variables in a non-linear and
local manner. Local non-parametric models are appropriate for such an approach
since they use a strategy of local variable selection and reduction, and are flexible
enough to allow non-linear relationships. Two examples of this type of model are
CART (classification and regression trees) and MARS (multiple adaptive
regression splines).
All three types of model used in this study were calculated from stratified
random samples of pixels with an approximately even representation of points
where each Quercus species was present or absent. Each random sample covered
about 10-20% of the total area for each species. One sample was used to generate
the models, and a second to test the reliability of the predictions.
where P(i) is the probability of presence (e.g., for a tree species), x1 ...xn represent
the values of the independent variables, and b1...bn the coefficients. The predicted
values from the regression are probabilities which range from 0 to 1 and can be
interpreted as measures of potential suitability13. Several studies have combined
LMR with GIS tools to present such probabilities in cartographic form. For
instance, Guisan et al.14 used LMR in the ArcInfo GIS to generate a distribution
model for the plant Carex curvula in the Swiss Alps. A similar study on aquatic
vegetation was conducted by Van de Rijt et al.15 using the GRASS GIS. In this
study LMR was performed using a forward conditional stepwise method in SPSS®
11.516 and the results were then imported back into the ArcInfo® GIS17 for
mapping.
on a node purity criterion and subsequently this is usually ‘pruned back’ to avoid
over-fitting via cross-validation.
The main drawback of CART models when used to predict organism
distributions is that the generated models can be extremely complex and difficult to
interpret. For example, work on Australian forests by Moore et al.18 produced a
tree with 510 nodes from just 10 predictors. In this study, the optimal tree
generated from the Quercus rotundifolia data set had 4889 terminal nodes.
Although the complexity of such a tree does not diminish its predictive power, it
makes it almost impossible to interpret, which in many studies is a key
requirement. Moreover, implementation of such an analysis within a GIS is
difficult. Nevertheless, as part of this study we developed a method to translate the
large CART reports (text files) to AML (Arc Macro Language) files that could be
run with the ArcInfo GIS. Such files can be large (e.g., the text file containing the
CART decision rules for constructing the Q. rotundifolia suitability map was 1.8
Mb in size) and execution times may be long (about 55 hours for the Q.
rotundifolia model).
where PT4 is the mean rainfall for the period October-December (l/m2 * 10) and
MDE50 is elevation (m).
Changes in the slope of these basis functions occur at points called ‘knots’ (the
values 3431 or 1181 in the above examples). Regression lines are allowed to bend
at the knots, which mark the end of one region of data and the beginning of another
with different functional behavior. Like the subdivisions in CART, knots are
established in a forward/backward stepwise way. A model which clearly overfits is
produced first and then those knots that contribute least to efficiency are discarded
in a backwards-pruning step to avoid overfitting. The best model is selected via
cross-validation, a process that applies a penalty to each term (i.e., a knot) added to
the model in order to keep complexity as low as possible.
As in the CART analysis, we transformed the MARS text report files into AML
and then generated the suitability models using the ArcInfo GIS.
7.6 RESULTS
Terminal Confidence
Quercus Species Method Nodes AUC Interval (95%)
Q. pyrenaica, Pyrenean oak LMR Not Applicable 0.924 Not Available
Another feature of the CART model output became apparent when the results
were converted into suitability maps. As is illustrated in Figure 7.4a the CART
maps show abrupt transitions between areas of high and low suitability (darker and
lighter shading respectively) which reflects the reliance on binary rules. In
addition, due to the influence of climate variables, the suitability models frequently
replicate the shapes of isopleths, which makes them visually less convincing.
Although the backward pruning process in CART reduces the number of terminal
nodes and makes the final model less complex, it does not eliminate such effects.
These features are not present in the MARS-based maps (Figures 7.4b-7.4d) which
show more smoothed and continuous distributions of suitability values. For this
reason, we decided to use the MARS model output to generate a potential
vegetation distribution.
Figure 7.4 Suitability models: a) CART model for Q. rotundifolia, b) MARS model for Q. pyrenaica, c)
MARS model for Q. suber, d) MARS model for Q. rotundifolia. Darker shading indicates higher
suitability.
colonization likelihood. The result of these calculations was a model showing, for
each cell, the type of forest with the highest potential value after considering
colonization processes. Figure 7.5 shows the result, highlighting relatively
clustered regions for Q. pyrenaica amidst more dispersed distributions for the other
two Quercus species.
Figure 7.5 Potential distribution model of Quercus species in Extremadura; Q. pyrenaica (black), Q.
suber (dark gray), Q. rotundifolia (pale gray).
information. There is an explicit procedure to produce the final result and the
entire workflow of information is transparent and repeatable.
The models used are based on real data (data driven) and in our experience
these methods give good results in mountainous areas because the limiting factors
are mainly physical: elevation, potential insolation, slope, etc. Data on such
variables are generally available (e.g., elevation) or can be derived with sufficient
accuracy (e.g., potential insolation). However, the choice of statistical methods to
employ can be very important.
Our results (see also Muñoz and Felicísimo4) show that spatial distributions can
be better defined if we accept that they may follow non-linear patterns. LMR has
been widely used in predictive modelling to successfully predict organism/
community distributions despite drawbacks such as an inability to deal with skewed
or multi-modal responses, but we have also provided evidence that CART and
MARS are very effective methods in the most difficult cases.
The analysis presented in this chapter has used various procedures to link
statistical tools and GIS, but it is clear that there is still a need for better integration
of such capabilities in most common commercial GIS. Transferring the potential
vegetation model into practical forestry action would also require further
information, especially on soil properties and economic factors. Dealing with such
implementation issues is beyond the scope of this chapter, but it is evident that the
generated maps and statistics represent data of obvious utility. Combining such
model-based maps with current land-use information and management data could
help provide decision support tools that would be extremely useful in many aspects
of spatial or environmental planning.
7.8 ACKNOWLEDGMENTS
This study was conducted as part of Project 2PR01C023 co-funded by the Junta
de Extremadura and FEDER (Fondo Europeo de Desarrollo Regional).
7.9 REFERENCES
1.
Guisan, A. and Zimmermann, N.E., Predictive habitat distribution models in ecology, Ecological
Modelling, 135, 147-186, 2000.
2.
Breiman, L., Friedman, F., Olshen, R., and Stone, C., Classification and Regression Trees,
Wadsworth, Monterey, 1984.
3.
Friedman, J.H., Multivariate adaptive regression splines, Annals of Statistics, 19, 1-141, 1991.
4.
Muñoz, J. and Felicísimo, A.M., Comparison of statistical methods commonly used in predictive
modeling, Journal of Vegetation Science, 15, 285-292, 2004.
5.
James, M., Pattern Recognition, John Wiley, New York, 1988.