
Article
Forest-Fire-Risk Prediction Based on Random Forest and Backpropagation Neural Network of Heihe Area in Heilongjiang Province, China
Chao Gao 1,2, *, Honglei Lin 3 and Haiqing Hu 1
1 College of Forestry, Northeast Forestry University, Harbin 150040, China

2 Heilongjiang Shengshan National Nature Reserve Service Center, Heihe 164300, China
3 School of Electrical Engineering, Heilongjiang University, Harbin 150080, China
* Correspondence: gaochaozfb@163.com; Tel.: +86-151-4662-4498
Abstract: Forest fires are important factors that influence and restrict the development of forest
ecosystems. In this paper, forest-fire-risk prediction was studied based on random forest (RF) and
backpropagation neural network (BPNN) algorithms. The Heihe area of Heilongjiang Province is one
of the key forest areas and forest-fire-prone areas in China. Based on daily historical forest-fire data
from 1995 to 2015, daily meteorological data, topographic data and basic geographic information
data, the main forest-fire driving factors were first analyzed by using RF importance characteristic
evaluation and logistic stepwise regression. Then, the prediction models were established by using the
two machine learning methods. Furthermore, the goodness of fit of the models was tested using the
receiver operating characteristic test method. Finally, the fire-risk grades were divided by applying
the kriging method. The results showed that 11 driving factors were significantly correlated with
forest-fire occurrence, and days after the last rain, daily average relative humidity, daily maximum
temperature, daily average water vapor pressure, daily minimum relative humidity and distance to
settlement had a high correlation with the risk of forest-fire occurrence. The prediction accuracy of the
two algorithms in regard to fire points was higher than that for nonfire points. The overall prediction
accuracy and goodness of fit of the RF and BPNN algorithms were similar. The two methods were
both suitable for forest-fire occurrence prediction. The high-fire-risk zones were mainly concentrated
in the northwestern and central parts of the Heihe area.
Keywords: random forest; backpropagation neural network; forest-fire occurrence prediction; forest-fire driving factor
Citation: Gao, C.; Lin, H.; Hu, H. Forest-Fire-Risk Prediction Based on Random Forest and Backpropagation Neural Network of Heihe Area in Heilongjiang Province, China. Forests 2023, 14, 170. https://doi.org/10.3390/f14020170
Academic Editor: Paul-Antoine Santoni
Received: 2 December 2022
Revised: 13 January 2023
Accepted: 15 January 2023
Published: 17 January 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Forests 2023, 14, 170. https://doi.org/10.3390/f14020170 https://www.mdpi.com/journal/forests
1. Introduction
Forest ecosystems are related to the global carbon cycle and biochemical cycle, and
the damage caused by forest fires to forest ecosystems is particularly serious [1]. Forest-fire
forecasting is necessary for the management and control of forest fires [2,3]. The factors
causing forest fires include meteorological factors, topography, the source of fires, human
activities, and so on. Determining the main driving factors is the key to establishing an
effective prediction model of forest-fire occurrence [4–6]. Meanwhile, with the global climate
crisis, extreme weather occurs frequently, and the influence of climate change on the increase
in fire frequency and intensity has been reported [7–11]. The relationship between these
factors and forest-fire occurrence is complex and possibly nonlinear.
Research on forest fire prediction emerged in the 1920s in countries around the
world. In particular, the United States, Canada, Australia, Russia and other countries attached
great importance to forest fire research [5,12–15]. Many works have been devoted to
forest fire prediction, using probabilistic, deterministic, empirical and other approaches. Mining historical
fire data using mathematical and statistical methods to build a spatio-temporal prediction
model between forest fires and their driving factors was one of the most commonly used
forest fire occurrence prediction methods. The most widely used model for fire behavior prediction
and simulation was the Rothermel model [16], a semi-physical model that had been
applied in many areas with good prediction accuracy. Based on the available meteorological
conditions and environmental observations, a multi-agent forest fire decision support
system integrating prediction, detection, and management was proposed in Ref. [17]. A
point process framework was developed in Ref. [18] for wildfire ignitions observed in
Mediterranean France; the model revealed significant covariate effects in southern
continental France and pointed out the influence of abnormally high temperatures and low
precipitation on the risk of fire occurrence. Applying GIS and remote sensing techniques,
wildland fire risk was mapped in several regions of Spain and an effective fire management
system was proposed in Ref. [19]. A new artificial neural network-based machine
learning method was proposed in Ref. [20] to build a GIS database of tropical forest
fires and a spatial model of forest fire risk in Lam Dong province, Vietnam. By applying
remote sensing techniques, the incidence of fire in the Valmiki Tiger Reserve (Himalayan
foothills) was studied in Ref. [21]. In Ref. [22], the spatio-temporal change of
fire risk and danger potential was studied for northwestern Turkey using
Landsat imagery. In Ref. [23], a fire risk map was generated and evaluated with GIS
technologies by using vegetation, topographic factors, and human factors for the Yeşilova
Forestry Enterprise located in Kahramanmaraş, Turkey. Taking into account
anthropogenic and physical factors, a forest fire risk model for the north-western Anatolia
section of Turkey was analyzed in Ref. [24] based on GIS, remote sensing and the Analytical
Hierarchy Process.
Throughout the fire management decision-making process, it is important to understand
the spatial distribution of fires and to identify the human and environmental factors
that contribute to the occurrence of fires in different regions and at different scales [25]. Additionally,
identifying the main drivers of fire occurrence is essential to understand the spatial
pattern of wildfires and to implement effective fire management [26]. For the Mediterranean
region, two different linear models were proposed in Ref. [25], based on fire
occurrence probability and frequency, respectively, to assess the most important human
and/or biophysical drivers affecting the model. A brief fire history in eastern Kentucky,
USA, was reconstructed in Ref. [27], which found that elevation and slope were significantly
related to fire occurrence. A prediction model for the probability of lightning fire
occurrence was studied in Ref. [28]. An integrated dynamic spatio-temporal prediction
of forest-fire occurrence in the western United States was performed in Ref. [29].
To predict spatial patterns of fire occurrence at regional and national levels in Mexico,
geographically weighted regression was used to predict fire density in Ref. [30]. A
fire danger index model for north Lebanon was developed based on meteorological
indices in Ref. [31].
Traditional linear regression models are usually insufficient to reveal the complex
relationship between forest-fire occurrence and its driving factors. Machine learning methods,
which can handle complex interactions among variables and address nonlinear functions [34–39],
have been widely used [32,33] and have higher explanatory ability than traditional regression methods. The
random forest (RF) algorithm is considered one of the best classification algorithms [40,41],
and the backpropagation neural network (BPNN) is one of the most widely used neural networks.
They have shown impressive diversity in their applications [42–50]. A machine
learning methodology was developed for the spatial prediction of forest fires with a case
study of tropical forest fires in Lao Cai, Vietnam [37]. The Bayesian network model was
used [38] to study the effects of temperature, relative humidity, wind speed, distance from
settlements, tree species, distance from roads, and so on, on the occurrence of
forest fires. Traditional multiple linear regression and RF were applied to the analysis
of fire occurrence at the European scale [39] based on fire density and different physical,
socioeconomic and demographic variables. The RF model showed a higher predictive
ability than multiple linear regression. Applying the maximum entropy algorithm and
considering physical and human variables, the prediction of human-caused fire occurrence
was made [10] for the northeast of Spain. The predictive ability of the logistic regression
model and a neural network algorithm was compared [42] for wasteland fire occurrence
in central Portugal, and the results showed that the neural network algorithm had
higher prediction accuracy. By adopting artificial neural networks and support vector
machines, two forest-fire occurrence prediction algorithms were developed and tested
based only on cumulative precipitation and relative humidity [43]. A hybrid machine
learning algorithm was proposed in Ref. [44] to map forest fire susceptibility in the
north of Morocco. The study of forest fire susceptibility zones based on different machine
learning algorithms has been a hot topic [45,46].
In recent years, machine learning methods have also been widely studied for forest
fire prediction in China. By adopting the BPNN algorithm, the daily, monthly and seasonal
changes in forest-fire occurrence in Guangdong Province in China were predicted based
on meteorological factors in Ref. [47]. The main forest-fire driving factors in Shanxi
Province in China were analyzed based on the RF algorithm [48]. Based on meteorological
factors, forest-fire prediction models were established by using the RF algorithm [51]. These
studies determined that the RF algorithm was superior to the logistic regression model in
forest-fire prediction ability [48,49]. By using the RF algorithm, forest-fire occurrence
in Fujian Province in China was predicted based on meteorological factors [50].
The risk prediction of forest fires in the southwest of China was studied in Ref. [51] based
on the ant-miner algorithm. Convolutional neural networks were applied to the prediction
of forest fire susceptibility for Yunnan Province, China, in Ref. [52].
Although a variety of machine learning methods have been applied and developed in forest
fire occurrence prediction, these methods have not yet been widely adopted in China, and there is a
lack of comparison among different machine learning methods.
The Heihe area is a large forest-fire prevention area in Heilongjiang Province with a
high frequency and intensity of forest fires [53]. At present, research on forest fires in
Heilongjiang Province, China, has mainly been focused on the Daxing’an Mountains, and
little research on forest fires in the Heihe area has been reported. Carrying out research on
forest-fire prediction in this area is of great significance for enhancing the forest-fire defense
capability in the northern part of Heilongjiang Province. Since many fire decisions are
made at the zoning level, the ability to make daily predictions of fires in a given area is
useful for many fire management applications [27]. Motivated by this analysis, and to more
accurately identify forest-fire-prone areas, this study compared the prediction adaptability of
the RF and BPNN algorithms for forest fires in the Heihe area of Heilongjiang Province, China.
The main forest-fire driving factors were analyzed by combining RF importance characteristic
evaluation and logistic stepwise regression; probability prediction models of forest-fire
occurrence were established and validated with the receiver operating characteristic test
method; and the forest-fire-risk grades were divided to provide
important technical support for scientific and effective fire prevention and fire suppression
work in this area.
2. Study Area and Data

2.1. Study Area
The Heihe area is located on the northeastern border of China, between longitudes
124°45′ E and 129°18′ E and latitudes 47°42′ N and 51°03′ N, and spans the Daxing’an Mountains
and Xiaoxing’an Mountains from north to south. In terms of structure, function, nature
and status, the Heihe area is an important part of the natural ecosystem of the Daxing’an
Mountains and Xiaoxing’an Mountains, has complete ecological characteristics of boreal
temperate forests and has a unique ecological location with a forest coverage of 48.2%.
The topography of the study area is shown in Figure 1. The forest types are broadleaf
forest, coniferous forest and conifer-broadleaf forest. From 1995 to 2015, there were more
than 700 forest fires with an average annual burned area of approximately 1200 hm². The
occurrences of forest fires were mainly concentrated in March, April, May, June, September
and October; no forest fires occurred in January, February and December; and only a few
forest fires occurred in July and August. The mean temperature was about 0.8 °C, the mean
wind speed was about 2.5 m s⁻¹, and the mean precipitation was about 1.5 mm.
Figure 1. The topography of the study area.

2.2. Data
The daily historical forest-fire data of the Heihe area were collected from 1995 to 2015.
Among these forest fires, the largest was a lightning fire with an area of 185,698 hm².
The fires had many causes, which are given in Figure 2.
Figure 2. The cause of the fire in the study area.

When establishing the full sample of forest-fire data, ArcGIS 10.2 software was used
to randomly select non-ignition points in time and space at a 1:1 ratio to the
ignition points [54]. Fire points were assigned “1”, and nonfire points were assigned
“0”. There were 1418 data points.
The meteorological data were obtained from the daily meteorological data of four
national meteorological stations in the Heihe area from 1995 to 2015, downloaded from
the China Meteorological Data Sharing Network (http://www.cma.gov.cn/ accessed on
3 March 2016). They include daily average relative humidity (%), daily average wind
speed (m s⁻¹), daily average temperature (°C), daily average water vapor pressure (hPa),
daily average air pressure (hPa), daily maximum temperature (°C), daily maximum wind
speed (m s⁻¹), daily minimum temperature (°C), daily minimum relative humidity (%),
daily sunshine hours (h) and daily precipitation (mm). Moreover, the days after the
last rain were calculated from the meteorological data. In data processing, according to
the geographic coordinates of the full sample data and the geographic coordinates of the
four meteorological stations, MATLAB software was used to calculate the corresponding
distance. The nearest meteorological station data were used as the meteorological data of
fire points or nonfire points, and the corresponding meteorological value was extracted
according to the occurrence time of fire points or nonfire points.
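The station matching and the count of days after the last rain described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB code: the haversine great-circle distance, the function names, the station coordinates and the rule that any day with precipitation above 0 mm resets the count are all assumptions made for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_station(point, stations):
    """Return the name of the station closest to point = (lat, lon)."""
    return min(stations, key=lambda name: haversine_km(*point, *stations[name]))

def days_since_last_rain(daily_precip_mm):
    """Days elapsed since the most recent rainy day, for each day in order.

    Assumption: a rainy day (precipitation > 0 mm) gets the value 0;
    days before the first recorded rain get None.
    """
    out, count = [], None
    for p in daily_precip_mm:
        if p > 0:
            count = 0
        elif count is not None:
            count += 1
        out.append(count)
    return out

# Hypothetical station coordinates (lat, lon), for illustration only.
stations = {"A": (50.25, 127.45), "B": (48.25, 126.60)}
print(nearest_station((50.0, 127.0), stations))
print(days_since_last_rain([0.0, 2.1, 0.0, 0.0, 0.5, 0.0]))
```

Each fire or nonfire point would then take the meteorological record of its nearest station on the date of occurrence.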
The basic geographic information data were collected from the 1:250,000 national
basic geographic databases provided by the National Geographic Information Resources
Directory System website (http://www.webmap.cn/ accessed on 28 February 2022). In
data processing, the analysis tool in ArcGIS 10.2 software was used to calculate the distance
(m) from the fire points or nonfire points to the settlements, roads and railways.
Topographic factors were derived from ASTER GDEM 30 m resolution Digital Elevation
Model (DEM) data provided by the Geospatial Data Cloud of China
(http://www.gscloud.cn/ accessed on 28 February 2022). The 3D analysis tool in ArcGIS 10.2 software
was used to extract and calculate the elevation (m) and slope of fire points and nonfire
points based on DEM data. The considered forest-fire driving factors in the models can be
found in Table 1.
Table 1. The considered forest-fire driving factors in the models.
Category | Factor | Data Sources | Resolution/Scale | Minimum Value | Maximum Value
Meteorological data | Daily average wind speed | China Meteorological Data Sharing Network (http://www.cma.gov.cn/ accessed on 3 March 2016) | 1 m s⁻¹ | 3 | 7.4
 | Daily maximum wind speed | | 1 m s⁻¹ | 2.2 | 12.6
 | Daily sunshine hours | | 1 h | 0 | 15.1
 | Daily average air pressure | | 1 hPa | 967.8 | 1012
 | Daily average temperature | | 1 °C | −30.6 | 24.7
 | Daily maximum temperature | | 1 °C | −24.5 | 32.2
 | Daily minimum temperature | | 1 °C | −37 | −19.4
 | Daily average water vapor pressure | | 1 hPa | 0.4 | 24.1
 | Daily average relative humidity | | 1% | 24 | 95
 | Daily minimum relative humidity | | 1% | 8 | 58
 | Daily precipitation | | 1 mm | 0 | 49.4
 | Days after the last rain | | 1 day | 0 | 62
Basic geographic information data | Distance to settlement | National Geographic Information Resources Directory System website (http://www.webmap.cn/ accessed on 28 February 2022) | 1 m | 51.4 | 14,519.2
 | Distance to road | | 1 m | 1.3 | 8557.5
 | Distance to railway | | 1 m | 12.0 | 163,573.1
Topographic data | Elevation | Geospatial Data Cloud of China (http://www.gscloud.cn/ accessed on 28 February 2022) | 1 m | 85 | 1022
 | Slope | | 1° | 0 | 24.2
To avoid the prediction error caused by differences in order of magnitude, the forest-fire
impact factors were normalized to the same scale. The max–min method was used to
transform the sample data into numbers in the interval [0, 1], and the functional form is:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where $x_i$ is the sample data, $x_i'$ is its normalized value, $x_{\min}$ is the minimum value and $x_{\max}$ is the maximum value.
MATLAB software was used to implement the algorithms, and ArcGIS 10.2 software was used
for mapping.
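As an illustration of Equation (1), a minimal Python sketch of max–min normalization is given below. The function name and the handling of a constant-valued factor (mapped to 0) are assumptions; the example values use the 0 to 62 day range reported for days after the last rain in Table 1.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant factor carries no information and is mapped to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Days after the last rain: 0 (minimum), 31, 62 (maximum)
print(min_max_normalize([0, 31, 62]))  # [0.0, 0.5, 1.0]
```

In practice the minimum and maximum would be taken per factor, so that every driving factor ends up on the same scale.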
3. Methods
3.1. Random Forest Algorithm
The RF algorithm is a highly flexible machine learning algorithm whose basic unit is a
decision tree. By integrating multiple trees into one through the idea of ensemble learning,
the RF algorithm has good anti-noise ability and is not prone to overfitting or
underfitting [49]. Let there be $n$ forest-fire samples and $m$ forest-fire driving factors. In the
RF algorithm, $n_{tree}$ sample sets are drawn from the full sample data by bootstrap
sampling, $n_{tree}$ classification trees are then built, and $m_{try}$ factors are
randomly selected at each node of each tree. The variable with the strongest
classification ability among them is selected for branching, and the classification result of the RF (strong
classifier) is finally obtained by voting among the trees (weak classifiers). Let $m_{try} = \sqrt{m}$,
and let $n_{tree}$ be large enough to make the overall error rate stable [55].
Assume that the training set is $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. The final classification
decision is given as:
$$H(x) = \arg\max_{y \in Y} \sum_{i=1}^{n_{tree}} I(h_i(x) = y) \qquad (2)$$
where $h_i(x)$ is the weak classifier, and $I(\cdot)$ is the indicator function.
For the given classification models $m_1(x), m_2(x), \ldots, m_k(x)$, the training data of each
classifier were sampled from the original data $(x, y)$. The marginal function was
computed by
$$mg(x, y) = av_k\, I(m_k(x) = y) - \max_{j \neq y} av_k\, I(m_k(x) = j) \qquad (3)$$
Then, the error can be defined by
$$PE = P_{x,y}(mg(x, y) < 0) \qquad (4)$$
The simple flowchart of the RF algorithm is shown in Figure 3.
Figure 3. The simple flowchart of the RF algorithm.
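To make the voting rule of Equation (2) and the margin of Equation (3) concrete, a small Python sketch follows. It operates on a hypothetical list of per-tree predictions for one sample rather than on trained trees, and it takes the maximum in the margin over the classes other than the true one, which is the usual definition; the function names are assumptions.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Class receiving the most votes among the trees (Equation (2))."""
    return Counter(tree_predictions).most_common(1)[0][0]

def margin(tree_predictions, true_label):
    """Share of votes for the true class minus the largest share for any other class."""
    n = len(tree_predictions)
    counts = Counter(tree_predictions)
    correct = counts.get(true_label, 0) / n
    best_other = max(
        (c / n for label, c in counts.items() if label != true_label),
        default=0.0,
    )
    return correct - best_other

# Hypothetical votes of five trees for one sample (1 = fire, 0 = nonfire).
votes = [1, 1, 0, 1, 0]
print(majority_vote(votes))  # 1
print(margin(votes, 1))      # positive: the forest classifies the sample correctly
print(margin(votes, 0))      # negative: the forest would be wrong if the truth were 0
```

The error in Equation (4) is then the probability that this margin is negative.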

3.2. Backpropagation Neural Network Algorithm
The BPNN algorithm is a kind of supervised learning algorithm that has a strong
nonlinear mapping ability and anti-interference ability. The number of neuron nodes in
the input layer of the BPNN depends on the number of forest-fire driving factors. The
number of hidden layers can be one or more. Here, one hidden layer is selected. The
number of neuron nodes in the hidden layer was determined to be 11 by experience and
experiment. The number of neuron nodes in the output layer is 1. When the number of
neurons is too large, the training time tends to increase, and overfitting occurs when the
amount of data processed is too large. Furthermore, the transfer function, summation unit
and activation function among layers of the neural network are determined. The amount of
forest-fire data in this paper is suitable for applying the BPNN. The simple flowchart
of the BPNN is given in Figure 4.
Figure 4. The simple flowchart of the BPNN algorithm.
By inputting training samples and passing them layer by layer, the error between the
actual output value and the expected output value is compared in the output layer,
the error is back-propagated, and the weights and thresholds are modified layer by layer until
the error is reduced to the given accuracy range. The BPNN can be optimized by changing
the network topology, the learning rate, the initial weights and the thresholds. The main
calculation formulas of the BPNN are shown below. The transfer function is calculated by
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (5)$$
The error function is calculated by
$$E_p = \frac{1}{2} \sum_{l} (t_l - O_l)^2 \qquad (6)$$
where $t_l$ is the expected output, and $O_l$ is the output calculated by the network.
The outputs of the neuron nodes in the hidden layer are calculated by
$$y_i = f\left(\sum_{j} (w_{ij} x_j - \theta_i)\right) \qquad (7)$$
where $w_{ij}$ is the connection weight between the input neuron node and the hidden neuron
nodes, and $\theta_i$ is its threshold.
The outputs of the neuron nodes in the output layer are calculated by
$$O_l = f\left(\sum_{i} (T_{ij} x_i - \theta_l)\right) \qquad (8)$$
where $T_{ij}$ is the connection weight between the hidden neuron node and the output neuron
nodes, and $\theta_l$ is its threshold. Moreover, $w_{ij}$, $\theta_i$, $T_{ij}$ and $\theta_l$ can be corrected by the
learning rate.
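The forward pass and error function of Equations (5)–(8) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the toy weights are hypothetical, the output layer is fed with the hidden-layer outputs, and each threshold is subtracted once per neuron, which is the conventional form and an assumption about how Equations (7) and (8) are meant to be read.

```python
import math

def sigmoid(x):
    """Transfer function of Equation (5)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, thresholds):
    """Outputs of one layer; weights[i][j] connects input j to neuron i."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) - theta)
        for row, theta in zip(weights, thresholds)
    ]

def forward(x, w_hidden, theta_hidden, w_out, theta_out):
    """Input layer -> one hidden layer -> output layer."""
    hidden = layer_output(x, w_hidden, theta_hidden)
    return layer_output(hidden, w_out, theta_out)

def squared_error(targets, outputs):
    """Error function of Equation (6)."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Toy network: 2 inputs, 2 hidden neurons, 1 output (weights are hypothetical).
out = forward([0.0, 0.0], [[1, 1], [1, 1]], [0, 0], [[1, -1]], [0])
print(out, squared_error([1.0], out))
```

Training would repeatedly adjust the weights and thresholds along the negative gradient of this error, scaled by the learning rate.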
3.3. RF Importance Characteristic Evaluation and Logistic Stepwise Regression
In the RF algorithm, the bootstrap resampling method was adopted, and the samples
that were not drawn each time accounted for approximately 36.8% of the total samples,
which were called out-of-bag (OOB) data. Then, they were taken as the test sample data to
obtain the OOB estimation of the model [56].
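The 36.8% figure follows from the fact that a given sample is missed by all $n$ bootstrap draws with probability $(1 - 1/n)^n$, which tends to $e^{-1} \approx 0.368$. The short Python check below simulates this; the sample size of 1418 is the full-sample size quoted in Section 2.2 and is used here only as an illustrative $n$, and the function name and number of repetitions are assumptions.

```python
import math
import random

def oob_fraction(n, rng):
    """Fraction of n samples never drawn in one bootstrap sample of size n."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

rng = random.Random(0)  # fixed seed so the check is repeatable
n = 1418
fractions = [oob_fraction(n, rng) for _ in range(200)]
mean_fraction = sum(fractions) / len(fractions)

exact = (1 - 1 / n) ** n  # probability that one given sample is out of bag
limit = math.exp(-1)      # large-n limit, about 0.368
print(mean_fraction, exact, limit)
```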
The RF algorithm can evaluate the importance of feature variables according to the
OOB error rate obtained from the OOB data. The principle is as follows: the OOB error rate is calculated
according to the OOB data of each classification tree $t$ ($\mathrm{errOOB}_t$). When evaluating the
importance of the feature variable $X_i$, other variables are kept unchanged, the sequence of
$X_i$ is randomly permuted, the OOB error rate of the OOB data for each permutation
($\mathrm{errOOB}_t^i$) is calculated, and the importance of the feature variable is evaluated by analyzing
the increase in the OOB error rate when the sequence changes [49].
The calculation formula of the importance (VI) of variable $X_i$ is [49]:
$$VI(X_i) = \frac{1}{n_{tree}} \sum_{t} \left(\mathrm{errOOB}_t^i - \mathrm{errOOB}_t\right) \qquad (9)$$
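A minimal Python sketch of the permutation importance in Equation (9) is given below. It assumes each tree is available as a prediction function together with its own OOB rows and labels; this data layout, the function names and the toy stump are hypothetical and serve only to show the calculation.

```python
import random

def error_rate(predict, X, y):
    """Share of rows for which the prediction differs from the label."""
    wrong = sum(1 for row, label in zip(X, y) if predict(row) != label)
    return wrong / len(y)

def permutation_importance(trees, feature, rng):
    """Mean increase in OOB error after shuffling one feature column.

    trees: list of (predict, X_oob, y_oob) triples, one per tree.
    """
    total = 0.0
    for predict, X_oob, y_oob in trees:
        base = error_rate(predict, X_oob, y_oob)
        column = [row[feature] for row in X_oob]
        rng.shuffle(column)
        X_perm = [
            row[:feature] + [v] + row[feature + 1:]
            for row, v in zip(X_oob, column)
        ]
        total += error_rate(predict, X_perm, y_oob) - base
    return total / len(trees)

# Toy "tree": a stump that only looks at feature 0.
def stump(row):
    return int(row[0] > 0.5)

X_oob = [[0.1, 5], [0.9, 3], [0.2, 8], [0.8, 1]]
y_oob = [0, 1, 0, 1]
trees = [(stump, X_oob, y_oob)]
print(permutation_importance(trees, 0, random.Random(0)))
print(permutation_importance(trees, 1, random.Random(0)))
```

Shuffling a feature the tree never uses leaves the OOB error unchanged, so its importance is exactly zero, while shuffling a used feature can only keep or raise the error in this toy case.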
Logistic stepwise regression is a common method used to select independent variables
for linear models. In this paper, the input variable index systems for the forest-fire
prediction model were established by using logistic stepwise regression and RF importance
characteristic evaluation.
3.4. Receiver Operating Characteristic Curve (ROC)

The receiver operating characteristic (ROC) curve is a commonly used accuracy
evaluation tool [57,58]. The curve is plotted with the false-positive rate (the probability
of judging an actual false value as a true value) as the horizontal coordinate and the
true-positive rate (the probability of judging an actual true value as a true value) as the vertical
coordinate. The area under the ROC curve (AUC) was used as a measure
of the predictive capability and the goodness of fit of the models. Usually, AUC > 0.7
is meaningful, and the closer the AUC value is to 1, the better the corresponding model
fit [48]. In this paper, AUC was used as an evaluation index.
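For illustration, the AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen fire point receives a higher predicted score than a randomly chosen nonfire point (ties counted as one half). The Python sketch below uses this pairwise form; the function name and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted fire probabilities for two fire and two nonfire points.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0, perfect ranking
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75, one pair misordered
```

The pairwise form is quadratic in the number of points, which is acceptable for a sample of this size; a rank-based formula would be preferable for much larger data sets.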
4. Results and Analysis

4.1. Forest-Fire Driving Factors
The temporal extent, spatial extent and main drivers considered affect
the accuracy of forest fire occurrence prediction. Once the temporal
and spatial scales of the prediction are determined, the forest fire drivers directly affect the
prediction accuracy of the forest fire occurrence prediction models. Therefore, establishing the
forest fire drivers was the key to building a prediction model of forest fire occurrence
with high prediction accuracy.
To obtain the models with strong generalization performance, the full sample dataset
was randomly divided into training sets (60%) and validation sets (40%) five times. Then,
five subsets of the data were randomly obtained to extract the important feature variables.
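The repeated random division can be sketched as follows. The function name, the seed and the sample count of 100 are assumptions for illustration; the source states only the 60%/40% proportions and the five repetitions.

```python
import random


def random_splits(n_samples, n_splits=5, train_frac=0.6, seed=0):
    """Return n_splits independent (train_indices, validation_indices) pairs."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    cut = int(round(train_frac * n_samples))
    splits = []
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)                 # a fresh random order for every split
        splits.append((idx[:cut], idx[cut:]))
    return splits


splits = random_splits(n_samples=100)
```

Each split is drawn independently from the full sample, so the five training subsets overlap; this differs from five-fold cross-validation, where the validation folds are disjoint.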
Logistic stepwise regression was used to select the forest-fire driving factors
with a significance level of p < 0.05 in the five subsets. The significant characteristic
variables that appeared more than 3 times in the five stepwise regressions were used as
the logistic variable index system and entered into the cross-validation analysis of the
algorithms. Meanwhile, logistic regression was performed on the full sample set
to calculate the significance level of each variable in the logistic variable index system.
The variables in the logistic variable index system and their significance levels were daily
maximum temperature (p < 0.0001), days after the last rain (p < 0.0001), daily average
relative humidity (p < 0.0001), daily average water vapor pressure (p < 0.0001),
distance to settlement (p < 0.0001), daily minimum relative humidity (p < 0.001),
elevation (p < 0.001), and daily average air pressure (p < 0.05). Logistic stepwise
regression was implemented in R.
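The frequency rule can be sketched as a simple count over the five selections. The per-subset selections below are hypothetical, and reading "more than 3 times" as a strict inequality (at least 4 of 5 runs) is an assumption.

```python
from collections import Counter


def stable_variables(selections, threshold=3):
    """Keep variables selected in more than `threshold` of the runs."""
    counts = Counter(v for selected in selections for v in set(selected))
    return sorted(v for v, c in counts.items() if c > threshold)


# Hypothetical outcome of five stepwise regressions (illustrative only).
runs = [
    {"max_temp", "days_after_rain", "elevation"},
    {"max_temp", "days_after_rain"},
    {"max_temp", "days_after_rain", "elevation"},
    {"max_temp", "elevation", "wind_speed"},
    {"max_temp", "days_after_rain", "elevation"},
]
kept = stable_variables(runs)
```

Here `wind_speed` is dropped because it was selected only once, while the other three variables were each selected in at least four runs.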
In the RF importance characteristic evaluation, the higher the scores of variables are,
the greater the impact on forest-fire occurrence and the greater the importance. In the five
subsets of data samples, according to the score of feature importance, the insignificant
variables were removed to reconstruct the RF, and the set of variables with the smallest
OOB error was selected as the variables of this sample set [54]. Combining the screening
results of the five subsets of data samples, the significant feature variables that appeared
more than 3 times together were screened out, as shown in Figure 5. The scores of important
variables that appeared more than 3 times were screened out by importance ranking, and
the variables were screened one by one in the full sample set with the criterion of minimum
OOB error. Then, the RF variable index system (importance score) was obtained, including
days after the last rain (2.95), daily average relative humidity (2.63), daily maximum
temperature (1.61), daily precipitation (1.38), daily minimum relative humidity (1.16),
daily average temperature (1.13), daily minimum temperature (1.09), daily average water
vapor pressure (1.06), and distance to settlement (0.99). The RF importance characteristic
evaluation was implemented by using MATLAB software.
Figure 5. RF variable importance contrast: (a) Data 1; (b) Data 2; (c) Data 3; (d) Data 4; (e) Data 5;
and (f) Full Data. x1 denotes days after the last rain; x2 denotes daily average relative humidity; x3
denotes daily maximum temperature; x4 denotes daily minimum relative humidity; x5 denotes daily
average temperature; x6 denotes daily precipitation; x7 denotes distance to settlement; x8 denotes
daily minimum temperature; x9 denotes daily average water vapor pressure.
The driving factors that were significant at p < 0.001 in logistic stepwise regression and
were also among the top-ranked drivers in the RF importance characteristic evaluation
could be judged to be the most important factors influencing forest-fire occurrence and
were used as an integrated variable index system. Finally, three variable index systems of
forest-fire factors were obtained, as shown in Table 2. The importance of days after the
last rain and the daily average relative humidity in the RF variable index system was
markedly greater than that of the other variables, followed by daily maximum temperature.
In general, meteorological factors had the greatest influence on the occurrence of forest
fires, especially humidity and temperature. Human activity factors, such as distance to the
settlement, distance to the road, and distance to the railway, and topographic factors were
less influential than meteorological factors. No topographic factor was selected in the RF
variable index system, and in the logistic variable index system the significance of
elevation (p < 0.001) was lower than that of distance to settlement (p < 0.0001). In other
words, topographic factors were less influential than distance to settlement.
Table 2. Three variable index systems.
| Forest-Fire Factors | Logistic Variable Index System | RF Variable Index System | Integrated Variable Index System |
|---|---|---|---|
| Daily average wind speed | − | − | − |
| Daily maximum wind speed | − | − | − |
| Daily sunshine hours | − | − | − |
| Daily average air pressure | + | − | − |
| Daily average temperature | − | + | − |
| Daily maximum temperature | + | + | + |
| Daily minimum temperature | − | + | − |
| Daily average water vapor pressure | + | + | + |
| Daily average relative humidity | + | + | + |
| Daily minimum relative humidity | + | + | + |
| Daily precipitation | − | + | − |
| Days after the last rain | + | + | + |
| Distance to settlement | + | + | + |
| Distance to road | − | − | − |
| Distance to railway | − | − | − |
| Elevation | + | − | − |
| Slope | − | − | − |

“+” indicates that the factor appeared in the variable index system, and “−” indicates that the factor did not
appear in the variable index system.
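The integrated column of Table 2 can be reproduced as a set intersection. The sketch below uses the significance bounds and RF importance scores reported in Section 4.1; the function name and the dictionary layout are illustrative assumptions, and only the upper bounds on p are known, not the exact p-values.

```python
# Upper bounds on p for the logistic variable index system (Section 4.1).
logistic_p_bound = {
    "daily maximum temperature": 0.0001,
    "days after the last rain": 0.0001,
    "daily average relative humidity": 0.0001,
    "daily average water vapor pressure": 0.0001,
    "distance to settlement": 0.0001,
    "daily minimum relative humidity": 0.001,
    "elevation": 0.001,
    "daily average air pressure": 0.05,
}

# Importance scores of the RF variable index system (Section 4.1).
rf_importance = {
    "days after the last rain": 2.95,
    "daily average relative humidity": 2.63,
    "daily maximum temperature": 1.61,
    "daily precipitation": 1.38,
    "daily minimum relative humidity": 1.16,
    "daily average temperature": 1.13,
    "daily minimum temperature": 1.09,
    "daily average water vapor pressure": 1.06,
    "distance to settlement": 0.99,
}


def integrated_system(p_bounds, rf_scores, alpha=0.001):
    """Variables with p below alpha in logistic regression that RF also selected."""
    return sorted(v for v, bound in p_bounds.items()
                  if bound <= alpha and v in rf_scores)


integrated = integrated_system(logistic_p_bound, rf_importance)
```

Elevation meets the significance threshold but is absent from the RF variable index system, so it is excluded, which leaves the six factors marked “+” in the last column.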
According to Figure 2, human activities were the main cause of forest fires, accounting
for about 67.4% of fires; burning stubble on agricultural land ranked first, followed by
smoking, with hunting or cooking in the wild in third place. Lightning strikes and power lines
were relevant natural causes of forest fires, accounting for about 3.5%. Furthermore, about
27.5% of forest fires were unexplained. The forest-fire occurrence prediction in this paper
therefore focused mainly on human-caused fires. The prediction accuracy and goodness of fit
of the models are analyzed in the following subsection.
4.2. Prediction Accuracy and Goodness of Fit of the Models

The AUC value calculated by the ROC curve and the prediction accuracy were used
to conduct the cross-validation analysis and the comparison between the RF and BPNN
models under the three variable index systems. The full sample data were randomly
divided into a 60% training set and a 40% validation set, and the models were trained by
the training set and validated by the validation set under the three variable index systems.
To provide a more objective evaluation of the models, the average prediction accuracies
and average AUC values of the validation sets of 100 Monte Carlo experiments were taken
as the prediction accuracy and the goodness of fit of the models, respectively.
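The Monte Carlo procedure can be sketched as a loop of random 60%/40% splits whose validation accuracies are averaged. The threshold classifier below is a toy stand-in, not RF or BPNN, and the data, function names and seed are hypothetical; the same loop would also average the AUC of each run.

```python
import random


def monte_carlo_cv(X, y, fit, predict, runs=100, train_frac=0.6, seed=0):
    """Average validation accuracy over repeated random train/validation splits."""
    rng = random.Random(seed)
    n = len(X)
    cut = int(round(train_frac * n))
    accuracies = []
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        train, val = idx[:cut], idx[cut:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in val)
        accuracies.append(hits / len(val))
    return sum(accuracies) / runs


# Toy stand-in model: a threshold halfway between the two class means.
def fit_threshold(xs, ys):
    pos = [x for x, t in zip(xs, ys) if t == 1]
    neg = [x for x, t in zip(xs, ys) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Hypothetical, perfectly separable one-dimensional data (20 points per class).
X = [float(i) for i in range(20)] + [float(i) for i in range(100, 120)]
y = [0] * 20 + [1] * 20
mean_accuracy = monte_carlo_cv(X, y, fit_threshold, predict_threshold)
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition of the data.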
Table 3 shows the prediction accuracy and AUC values of the two machine learning
methods under the three variable index systems. The prediction accuracy of the RF algo-
rithm was between 87.91% and 88.98% in the three variable index systems. The accuracy
of predicting ignition Point “1” was approximately 3.99%–4.49% higher than that of non-
ignition Point “0”, and the highest accuracy level was 91.24%. The lowest AUC was 0.946,
and the highest AUC was 0.955.
Table 3. Results of the cross-validations.
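| Method | Metric | Class | Logistic Variable Index System | RF Variable Index System | Integrated Variable Index System |
|---|---|---|---|---|---|
| RF algorithm | Accuracy | 0 | 86.44% | 86.75% | 85.86% |
| RF algorithm | Accuracy | 1 | 90.43% | 91.24% | 90.00% |
| RF algorithm | Accuracy | Total | 88.42% | 88.98% | 87.91% |
| RF algorithm | AUC | – | 0.947 | 0.955 | 0.946 |
| BPNN algorithm | Accuracy | 0 | 83.00% | 85.18% | 85.18% |
| BPNN algorithm | Accuracy | 1 | 89.03% | 88.74% | 88.61% |
| BPNN algorithm | Accuracy | Total | 86.01% | 86.94% | 86.88% |
| BPNN algorithm | AUC | – | 0.930 | 0.939 | 0.938 |
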
“0” indicates non-ignition; “1” indicates ignition.
The prediction accuracy of the BPNN algorithm in the three variable index systems
was between 86.01% and 86.94%. The accuracy of predicting ignition Point “1” was
approximately 3.43%–6.03% higher than that of non-ignition Point “0”. The lowest AUC
was 0.930, and the highest AUC was 0.939.
The prediction accuracies and AUC values of the two machine learning methods were
similar under the three variable index systems. Both models had high goodness of fit and
prediction accuracy and were suitable for predicting and analyzing the occurrence
probability of forest fires; in both, the accuracy of predicting ignition points was higher
than that of non-ignition points.
The results showed that the prediction accuracy and AUC values of the algorithms
were the highest in the RF variable index system. In terms of prediction accuracy, the
RF algorithm was 1.03%–2.41% higher than the BP neural network algorithm. The RF
algorithm under the RF variable index system had the highest overall prediction accuracy
level and goodness of fit.
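The accuracy gaps quoted in this subsection can be recomputed from the values in Table 3. The sketch below only restates that arithmetic; the dictionary layout and function names are illustrative assumptions.

```python
# Accuracy (%) from Table 3 as (class 0, class 1, total) per variable index system.
table3 = {
    "RF": {
        "logistic": (86.44, 90.43, 88.42),
        "rf": (86.75, 91.24, 88.98),
        "integrated": (85.86, 90.00, 87.91),
    },
    "BPNN": {
        "logistic": (83.00, 89.03, 86.01),
        "rf": (85.18, 88.74, 86.94),
        "integrated": (85.18, 88.61, 86.88),
    },
}


def class_gaps(model):
    """Ignition (class 1) accuracy minus non-ignition (class 0) accuracy."""
    return [round(acc1 - acc0, 2) for acc0, acc1, _ in table3[model].values()]


def total_gaps():
    """RF total accuracy minus BPNN total accuracy for each index system."""
    return [round(table3["RF"][k][2] - table3["BPNN"][k][2], 2)
            for k in table3["RF"]]


rf_class_gaps = class_gaps("RF")
bpnn_class_gaps = class_gaps("BPNN")
rf_minus_bpnn = total_gaps()
```

The resulting ranges are 3.99 to 4.49 percentage points for RF, 3.43 to 6.03 for BPNN, and 1.03 to 2.41 for the difference in total accuracy between the two algorithms.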
However, the integrated variable index system relying only on the main four forest-fire
drivers could obtain an average prediction accuracy close to that of the RF algorithm under
the RF variable index system, which was determined to be the best variable index system
for forest-fire prediction in the Heihe area, and the corresponding variables were the main
driving factors affecting the occurrence of forest fires. In summary, the RF algorithm based
on the integrated variable index system was determined to be the best forest-fire prediction
model. However, it is noted that the prediction accuracies of the two algorithms differ only
slightly under the three variable index systems. The proposed method does not bring much
improvement across the three variable index systems.
According to the established probability prediction model, the probability of forest-fire
occurrence was divided into five grades [59], as shown in Table 4.
Table 4. The standard of forest-fire-risk grade division.

| Forest-Fire Occurrence Probability | Fire-Risk Grade |
|---|---|
| 0~0.2 | I Basically no fire |
| 0.2~0.4 | II Not prone to fire |
| 0.4~0.6 | III Possible fire |
| 0.6~0.8 | IV Prone to fire |
| 0.8~1 | V Extremely prone to fire |

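The grading in Table 4 can be sketched as a lookup on the predicted probability. The table does not say which grade a boundary value such as 0.2 belongs to; assigning it to the higher grade is an assumption made here, as are the function and constant names.

```python
# (upper bound, grade, description) for each row of Table 4.
GRADES = [
    (0.2, "I", "Basically no fire"),
    (0.4, "II", "Not prone to fire"),
    (0.6, "III", "Possible fire"),
    (0.8, "IV", "Prone to fire"),
    (1.0, "V", "Extremely prone to fire"),
]


def fire_risk_grade(probability):
    """Map a forest-fire occurrence probability in [0, 1] to its risk grade."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    for upper, grade, description in GRADES:
        if probability < upper:          # assumption: boundaries go to the higher grade
            return grade, description
    return GRADES[-1][1], GRADES[-1][2]  # probability == 1.0


example_grade = fire_risk_grade(0.73)
```

For instance, a predicted probability of 0.73 falls in the 0.6~0.8 band and is therefore classed as grade IV.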
The RF algorithm and BP neural network algorithm based on the integrated variable
index system were used to calculate the prediction probability of the full sample data, and
the kriging method in ArcGIS 10.2 was used to classify the fire-risk grade. Figure 6 shows
the ROC curves of the two models.
Figure 6. ROC curve. (a) ROC curve of RF algorithm. (b) ROC curve of BPNN algorithm.
Figure 7 shows the distribution of forest-fire occurrence probability and fire-risk
distribution based on the RF algorithm. Figure 8 shows the distribution of forest-fire
occurrence probability and fire-risk distribution based on the BPNN algorithm. The general
trends of the spatial distribution maps of forest fire occurrence probability corresponding
to the two models obtained in this study are basically consistent, but some local differences
are also apparent, which are related to the specific implementation mechanisms of each of
the two models. The forest-fire-prone areas in the Heihe area were mainly concentrated in
the northwestern and central areas.
Figure 7. Forest-fire occurrence probability and risk distribution based on RF algorithm. (a) Forest-fire
occurrence probability. (b) Forest-fire-risk distribution.
Figure 8. Forest-fire occurrence probability and risk distribution based on BPNN algorithm. (a) Forest-
fire occurrence probability. (b) Forest-fire-risk distribution.
5. Discussion
In the selection of forest-fire factors, it was found that the days after the last rain, the
daily average relative humidity, daily maximum temperature, daily average water vapor
pressure, daily minimum relative humidity and distance to settlement were the most
significant factors in the logistic stepwise regression (significance level of p < 0.0001) and
were also the top significant factors in RF importance characteristic evaluation, which
could be judged as the most important factors affecting the occurrence of forest fires in
the Heihe area. Meanwhile, forest-fire prediction models with high estimation accuracy
could be obtained based on these six significant forest-fire driving factors. Daily minimum
relative humidity, elevation, daily average temperature, daily average air pressure,
daily minimum temperature and daily precipitation appeared in the variable selection process
and had some influence on forest-fire occurrence. Among the meteorological factors, days
after the last rain, daily average relative humidity, and daily maximum temperature had
the greatest influence on forest-fire occurrence, followed by distance to settlements, and
topographic factors had the least influence on forest-fire occurrence. Humidity variation
and temperature variation had significant effects on forest-fire occurrence, which is consis-
tent with previous research results [54]. Distance to settlements had a significant effect on
forest-fire occurrence, which was due to the highly intertwined agriculture and forestry in
the Heihe area. According to Figure 2, the main cause of forest fires was human activities,
which accounted for about 67.4% of forest fires. Lightning strikes and power lines were
relevant natural causes of forest fires: 15 forest fires were caused by lightning and 10 by
power lines. Furthermore, about 27.5% of forest fires were unexplained. Therefore, this
paper mainly focused on human-caused fires.
The cross-validation in the three variable index systems revealed that the prediction
accuracies of the RF algorithm and the BP neural network algorithm differed little, and both
achieved high prediction accuracy and goodness of fit, with average prediction
accuracies between 86.01% and 88.98% and AUC values between 0.930 and 0.955; both
were suitable for forest-fire occurrence prediction. The fire-risk distribution results
showed that forest-fire-prone areas were mainly concentrated in the northwestern and
central parts of the Heihe area. At present, forest-fire-prone areas still lack effective
forest-fire prediction and evaluation systems, and there is no effective technical support
for resource allocation. For these areas, the local emergency management departments can
act in advance, make overall arrangements, strictly manage fire sources, eliminate fire
risks, and strengthen forest-fire early warning. Forest-fire-prone areas should be equipped
with more firefighting resources and fire towers, and fire prevention planning and the
allocation of related materials can be based on the fire probability, which can save human,
material and financial resources and help improve the efficiency of forest-fire monitoring
and prevention work.
6. Conclusions
In this study, using RF and BPNN, the influence of meteorological factors and topo-
graphic factors on forest-fire occurrence in the Heihe area was comprehensively analyzed.
The main cause of fires in this area was human activities, chiefly the burning of stubble
on agricultural lands. The days after the last rain, daily average
relative humidity, daily maximum temperature, and distance to settlement were the main
driving factors affecting forest-fire occurrence. It is noted that days after the last rain is
an important driving factor which is revealed in this study. The relationship between
forest-fire occurrence and meteorological changes was significant. The prediction of forest
fires based on the index systems of different driving factors showed that the prediction
accuracy and the goodness of fit of the RF algorithm were slightly higher than those of
the BPNN algorithm, and both were suitable for the prediction of forest-fire occurrence
probability. The fire-risk distribution results based on the prediction models showed that
forest fires were prone to occur in the northwestern and central areas.
It was found that combining multiple forest-fire influencing factors could improve the
accuracy of the prediction models [60]. This paper relied on meteorological, topographic,
and human activity factors to study the forest-fire occurrence probability prediction model
but did not yet analyze the influence of forest-fuel types, fire sources, and other factors
on forest-fire occurrence, which is a shortcoming and also affects the model accuracy.
Moreover, the internal connection between forest-fire causes and the forest-fire driving
factors was not well analyzed. About 27.5% of fires are unexplained. To give a more
accurate prediction result, these forest-fire causes need to be further clarified. However,
investigating those unexplained forest-fire ignitions is difficult. In future work,
forest-fire causes will be further clarified by investigating and visiting the local emergency
management departments, and then the prediction of the possible forest fire causes in this
area will be further studied.
In addition, the use of an intelligent optimization algorithm to optimize RF and BPNN
is expected to further improve the prediction accuracy of the models [61]. The correct choice
of models for forest-fire occurrence prediction is crucial to reveal the real spatial distribution
pattern of forest fires. Moreover, no single model can solve the problem perfectly due to the
differences in the models themselves and the study areas, and it may be a better choice to
adopt several more suitable models at the same time to compare and synthesize the results
of different models. These issues will also be considered in our future research.
Author Contributions: Methodology, C.G.; software, H.L.; validation, H.L.; resources, H.H.; writing—
original draft preparation, C.G.; and writing—review and editing, C.G., H.L. and H.H. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the “Strategic International Scientific and Technological
Innovation Cooperation Special Fund of National Key Research and Development Program of China,
Grant Number 2018YFE0207800”, “Young Innovative Talents Training Program of Universities in Hei-
longjiang Province, Grant Number UNPYSCT-2020001” and “Heilongjiang University Outstanding
Youth Fund, Grant Number JCL202101”.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Rodrigues, M.; Riva, J.; Fotheringham, S. Modeling the spatial variation of the explanatory factors of human-caused wildfires in
Spain using geographically weighted logistic regression. Appl. Geogr. 2014, 48, 25–63. [CrossRef]
2. Ozbayoglu, A.M.; Bozer, R. Estimation of the burned area in forest fires using computational intelligence techniques. Procedia
Comput. Sci. 2012, 12, 282–287. [CrossRef]
3. Wu, C.; Xu, W.H.; Huang, S.D.; Qin, M.M.; Wang, Q.H. Research progress of remote sensing for forest-fire monitoring. J. Southwest
For. Univ. 2020, 40, 172–179.
4. Somashekar, R.K.; Ravikumar, P.; Mohan-Kumar, C.N.; Prakash, K.L.; Nagaraja, B.C. Burnt area mapping of bandipur national
park, India using IRS 1C/1D LISS III data. J. Indian Soc. Remote Sens. 2009, 37, 37–50. [CrossRef]
5. Cardille, J.A.; Ventura, S.J.; Turner, M.G. Environmental and social factors influencing wildfires in the upper midwest, United
States. Ecol. Appl. 2001, 11, 111–127. [CrossRef]
6. Gao, C.; Lin, H.L.; Hu, H.Q.; Song, H. A review of models of forest fire occurrence prediction in China. Chin. J. Appl. Ecol. 2020,
31, 3227–3240.
7. Li, W.; Jiang, Z.H.; Zhang, X.B.; Li, L.; Sun, Y. Additional risk in extreme precipitation in China from 1.5 °C to 2.0 °C global
warming levels. Sci. Bull. 2018, 63, 228–234. [CrossRef]
8. Turco, M.; Llasat, M.C.; Hardenberg, J.V.; Provenzale, A. Impact of climate variability on summer fires in a Mediterranean
environment (northeastern Iberian Peninsula). Clim. Chang. 2013, 116, 665–678. [CrossRef]
9. Gu, X.L.; Wu, Z.W.; Zhang, Y.J.; Yan, S.J.; Fu, J.J.; Du, L.H. Prediction research of the forest fire in Jiangxi province in the
background of climate change. Acta Ecol. Sin. 2020, 40, 667–677.
10. Martín, Y.; Zúñiga-Antón, M.; Rodrigues Mimbrero, M. Modelling temporal variation of fire-occurrence towards the dynamic
prediction of human wildfire ignition danger in northeast Spain. Geomat. Nat. Hazards Risk 2019, 10, 385–411. [CrossRef]
11. Gao, B.; Shan, Z.H.; Cao, L.L.; Shan, Y.L.; Han, X.Y.; Wang, M.X.; Yin, S.N. Study on monthly dynamic change and occurrence
prediction of forest fires in Daxing’an mountains. J. Cent. South Univ. For. Technol. 2021, 41, 53–62.
12. Hering, A.S.; Bell, C.L.; Genton, M.G. Modeling spatio-temporal wildfire ignition point patterns. Environ. Ecol. Stat. 2009, 16,
225–250. [CrossRef]
13. Alonso-Betanzos, A.; Fontenla-Romero, O.; Guijarro-Berdiñas, B.; Hernández-Pereira, E.; Andrade, M.I.P.; Jiménez, E.; Soto, J.L.L.;
Carballas, T. An intelligent system for forest fire risk prediction and fire fighting management in Galicia. Expert Syst. Appl. 2003,
25, 545–554. [CrossRef]
14. Pew, K.L.; Larsen, C.P.S. GIS analysis of spatial and temporal patterns of human-caused wildfires in the temperate rain forest of
Vancouver island, Canada. For. Ecol. Manag. 2001, 140, 1–18. [CrossRef]
15. Minnich, R.A.; Bahre, C.J. Wildland fire and chaparral succession along the California-Baja California boundary. Int. J. Wildland
Fire 1995, 5, 13–24. [CrossRef]
16. Rothermel, R.C. A Mathematical Model for Predicting Fire Spread in Wild Land Fuels; USDA Forest Service: Ogden, UT, USA, 1972;
p. 115.
17. Elmas, Ç.; Sönmez, Y. A data fusion framework with novel hybrid algorithm for multi-agent Decision Support System for Forest
Fire. Expert Syst. Appl. 2011, 38, 9225–9236. [CrossRef]
18. Opitz, T.; Bonneu, F.; Gabriel, E. Point-process based Bayesian modeling of space-time structures of forest fire occurrences in
Mediterranean France. Spat. Stat. 2020, 40, 100429. [CrossRef]
19. Chuvieco, E.; Aguado, I.; Yebra, M.; Nieto, H.; Salas, J.; Martín, M.P.; Vilar, L.; Martínez, J.; Martín, S.; Ibarra, P.; et al. Development
of a framework for fire risk assessment using remote sensing and geographic information system technologies. Ecol. Model. 2010,
221, 46–58. [CrossRef]
20. Bui, D.T.; Le, H.V.; Hoang, N.D. GIS-based spatial prediction of tropical forest fire danger using a new hybrid machine learning
method. Ecol. Inform. 2018, 48, 104–116.
21. Murthy, K.K.; Sinha, S.K.; Kaul, R.; Vaidyanathan, S. A fine-scale state-space model to understand drivers of forest fires in the
Himalayan foothills. For. Ecol. Manag. 2019, 432, 902–911. [CrossRef]
22. Sağlam, B.; Bilgili, E.; Durmaz, B.D.; Kadıoğulları, A.İ.; Küçük, Ö. Spatio-temporal analysis of forest fire risk and danger using
LANDSAT imagery. Sensors 2008, 8, 3970–3987. [CrossRef]
23. Sivrikaya, F.; Sağlam, B.; Akay, A.E.; Bozali, N. Evaluation of forest fire risk with GIS. Pol. J. Environ. Stud. 2014, 23, 187–194.
24. Akbulak, C.; Tatlı, H.; Aygun, G.; Sağlam, B. Forest fire risk analysis via integration of GIS, RS and AHP: The Case of Canakkale,
Turkey. J. Hum. Sci. 2018, 15, 2127–2143. [CrossRef]
25. Elia, M.; Giannico, V.; Lafortezza, R.; Sanesi, G. Modeling fire ignition patterns in Mediterranean urban interface. Stoch. Environ.
Res. Risk Assess. 2019, 33, 169–181. [CrossRef]
26. Camp, P.E.; Krawchuk, M.A. Spatially varying constraints of human-caused fire occurrence in British Columbia, Canada. Int. J.
Wildland Fire 2017, 26, 219–229. [CrossRef]
27. Maingi, J.K.; Henry, M.C. Factors influencing wildfire occurrence and distribution in eastern Kentucky, USA. Int. J. Wildland Fire
2007, 16, 23–33. [CrossRef]
28. Anderson, K. A model to predict lightning-caused fire occurrences. Int. J. Wildland Fire 2002, 11, 163–172. [CrossRef]
29. Ager, A.A.; Barros, A.M.G.; Day, M.A.; Preisler, H.K.; Spies, T.A.; Bolte, J. Analyzing fine scale spatiotemporal drivers of wildfire
in a forest landscape model. Ecol. Model. 2018, 384, 87–102. [CrossRef]
30. Monjarás-Vega, N.A.; Briones-Herrera, C.I.; Vega-Nieva, D.J.; Calleros-Flores, E.; Corral-Rivas, J.J.; López-Serrano, P.M.; Pompa-
García, P.; Rodríguez-Trejo, D.A.; Carrillo-Parra, A.; González-Cabán, A.; et al. Predicting forest fire kernel density at multiple
scales with geographically weighted regression in Mexico. Sci. Total Environ. 2020, 718, 137313. [CrossRef]
31. Hamadeh, N.; Karouni, A.; Daya, B.; Chauvet, P. Using correlative data analysis to develop weather index that estimates the risk
of forest fires in Lebanon & Mediterranean: Assessment versus prevalent meteorological indices. Case Stud. Fire Saf. 2017, 7, 8–22.
32. Milanović, S.; Kaczmarowski, J.; Ciesielski, M.; Trailović, Z.; Mielcarek, M.; Szczygieł, R.; Kwiatkowski, M.; Bałazy, R.; Zasada, M.;
Milanović, S.D. Modeling and mapping of forest fire occurrence in the Lower Silesian Voivodeship of Poland based on Machine
Learning methods. Forests 2023, 14, 46. [CrossRef]
33. Kalantar, B.; Ueda, N.; Idrees, M.O.; Janizadeh, S.; Ahmadi, K.; Shabani, F. Forest fire susceptibility prediction based on machine
learning models with resampling algorithms on remote sensing data. Remote Sens. 2020, 12, 3682. [CrossRef]
34. Zheng, Z.; Huang, W.; Li, S.N.; Zeng, Y.N. Forest fire spread simulating model using cellular automaton with extreme learning
machine. Ecol. Model. 2017, 348, 33–43. [CrossRef]
35. Iban, M.C.; Sekertekin, A. Machine learning based wildfire susceptibility mapping using remotely sensed fire data and GIS: A
case study of Adana and Mersin provinces, Turkey. Ecol. Inform. 2022, 69, 101647. [CrossRef]
36. Elia, M.; Este, M.D.; Ascoli, D.; Giannico, V.; Spano, G.; Ganga, A.; Colangelo, G.; Lafortezza, R.; Sanesi, G. Estimating the
probability of wildfire occurrence in Mediterranean landscapes using artificial neural networks. Environ. Impact Assess. Rev. 2020,
85, 106474. [CrossRef]
37. Bui, D.T.; Hoang, N.D.; Samui, P. Spatial pattern analysis and prediction of forest fire using new machine learning approach of
multivariate adaptive regression splines and differential flower pollination optimization: A case study at Lao Cai province (Viet
Nam). J. Environ. Manag. 2019, 237, 476–487.
38. Sevinc, V.; Kucuk, O.; Goltas, M. A Bayesian network model for prediction and analysis of possible forest fire causes. For. Ecol.
Manag. 2020, 457, 117723. [CrossRef]
39. Oliveira, S.; Oehler, F.; San-Miguel-Ayanz, J.; Camia, A.; Pereira, J. Modeling spatial patterns of fire occurrence in Mediterranean
Europe using Multiple Regression and Random Forest. For. Ecol. Manag. 2012, 275, 117–129. [CrossRef]
40. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
41. Cutler, D.R.; Edwards, T.J.; Beard, K.H.; Cutler, A.; Hess, H.T.; Gibson, J.; Lawler, J.J. Random forests for classification in ecology.
Ecology 2007, 88, 2783–2792. [CrossRef]
42. Vasconcelos, M.J.; Silva, S.; Tome, M.; Alvim, M.; Pereira, J.C. Spatial prediction of fire ignition probabilities: Comparing Logistic
regression and neural networks. Photogramm. Eng. Remote Sens. 2001, 67, 73–81.
43. Sakr, G.E.; Elhajj, I.H.; Mitri, G. Efficient forest fire occurrence prediction for developing countries using two weather parameters.
Eng. Appl. Artif. Intell. 2011, 24, 888–894. [CrossRef]
44. Mohajane, M.; Costache, R.; Karimi, F.; Pham, Q.B.; Essahlaoui, A.; Nguyen, H.; Laneve, G.; Oudija, F. Application of remote
sensing and machine learning algorithms for forest fire mapping in a Mediterranean area. Ecol. Indic. 2021, 129, 107869. [CrossRef]
45. Saha, S.; Bera, B.; Shit, P.K.; Bhattacharjee, S.; Sengupta, N. Prediction of forest fire susceptibility applying machine and deep
learning algorithms for conservation priorities of forest resources. Remote Sens. Appl. Soc. Environ. 2023, 29, 100917. [CrossRef]
46. Achu, A.L.; Thomas, J.; Aju, C.D. Machine-learning modelling of fire susceptibility in a forest-agriculture mosaic landscape of
southern India. Ecol. Inform. 2021, 64, 101348. [CrossRef]
47. Yang, J.B.; Ma, X.X. On the basis of artificial neural network to forecast the forest fire in Guangdong Province. Sci. Silvae Sin. 2005,
41, 127–132.
48. Ma, W.Y.; Feng, Z.K.; Cheng, Z.X.; Wang, F.G. Study on driving factors and distribution pattern of forest fires in Shanxi Province.
J. Cent. South Univ. For. Technol. 2020, 40, 57–69.
49. Liang, H.L.; Lin, Y.R.; Yang, G.; Su, Z.W.; Wang, W.H.; Guo, F.T. Application of random forest algorithm on the forest fire
prediction in Tahe area based on meteorological factors. Sci. Silvae Sin. 2016, 52, 89–98.
50. Liang, H.L.; Guo, F.T.; Su, Z.W.; Wang, W.H.; Lin, F.F.; Lin, Y.R. Analysis of meteorological factors on forest fire occurrence of
Fujian based on random forest algorithm. Fire Saf. Sci. 2015, 24, 191–200.
51. Zheng, Z.; Gao, Y.H.; Yang, Q.Y.; Zou, B.; Xu, Y.J.; Chen, Y.Y.; Yang, S.Q.; Wang, Y.Q.; Wang, Z.W. Predicting forest fire risk based
on mining rules with ant-miner algorithm in cloud-rich areas. Ecol. Indic. 2020, 118, 106772. [CrossRef]
52. Zhang, G.; Wang, M.; Liu, K. Deep neural networks for global wildfire susceptibility modelling. Ecol. Indic. 2021, 127, 107735. [CrossRef]
53. Cui, Y.; Di, H.T.; Xing, Y.Q.; Chang, X.Q.; Shan, W. Spatial and temporal distributions of forest fires in Heilongjiang Province from
2001 to 2018 based on MODIS data. J. Nanjing For. Univ. (Nat. Sci. Ed.) 2021, 45, 205–211.
54. Guo, F.T.; Su, Z.W.; Ma, X.Q.; Song, Y.H.; Sun, L.; Hu, H.Q.; Yang, T.T. Climatic and non-climatic factors driving lightning-induced
fire in Tahe, Daxing’an Mountains. Acta Ecol. Sin. 2015, 35, 6439–6448.
55. Fang, K.N.; Wu, J.B.; Zhu, J.P.; Xie, B.C. A review of technologies on random forests. Stat. Inf. Forum 2011, 26, 32–37.
56. Liaw, A.; Wiener, M. Classification and regression by random forests. R News 2002, 2, 18–22.
57. Catry, F.X.; Rego, F.C.; Bação, F.L.; Moreira, F. Modeling and mapping wildfire ignition risk in Portugal. Int. J. Wildland Fire 2009,
18, 921–931. [CrossRef]
58. Chang, Y.; Zhu, Z.L.; Bu, R.C.; Chen, H.W.; Feng, Y.T.; Li, Y.H.; Hu, Y.M.; Wang, Z.C. Predicting fire occurrence patterns with
logistic regression in Heilongjiang Province, China. Landsc. Ecol. 2013, 28, 1989–2004. [CrossRef]
59. Deng, O.; Li, Y.Q.; Feng, Z.K.; Zhang, D.Y. Model and zoning of forest fire risk in Heilongjiang province based on spatial Logistic.
Trans. Chin. Soc. Agric. Eng. 2012, 28, 200–205.
60. Zhu, Z.; Zhao, F.; Wang, Q.H.; Gao, Z.L.; Deng, X.F.; Huang, P.G. Driving factors of forest fire and fire risk zoning in Kunming
City. J. Zhejiang A&F Univ. 2022, 39, 380–387.
61. Wang, L.; Hao, R.Y.; Liu, W.; Wen, Z.M. A multi-factor forest fire risk rating prediction model based on particle swarm optimization
algorithm and back-propagation neural network. J. For. Eng. 2019, 4, 137–144.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Forests 2023, 14, 170. https://doi.org/10.3390/f14020170 https://www.mdpi.com/journal/forests
