
11/30/2020 Wildfire Area Prediction: An AI Approach to a GIS Problem | by Kirti Girdhar | The Startup | Nov, 2020 | Medium

Wildfire Area Prediction: An AI Approach to a GIS Problem
Kirti Girdhar
Nov 9 · 15 min read

https://medium.com/swlh/wildfire-area-prediction-an-ai-approach-to-a-gis-problem-2a4e8d97d7e8 1/36

Folium Heatmap of Forest Fires in United States

Over the past few years, forest fires have become a major environmental concern
throughout the world. These fires not only remove trees and vegetation but also
degrade air quality, displace and kill wildlife (pushing some species toward
extinction), alter water cycles and endanger the lives of local communities.
There have been many incidents in which forest fires raged out of control, the
recent Brazil rainforest wildfires among them. All forest fires are risky, and
departments all over the world are looking for ways to minimize these fires and
their effects.

An estimate of the area an ongoing forest fire will burn can be very useful when
preparing mitigation strategies. A timely fire size forecast helps firefighting
units with resource allocation and management.

Prerequisites

This post assumes familiarity with machine learning concepts, beginner-level
Python understanding and some basic GIS knowledge.

Looking at things from an AI perspective

From a Machine Learning point of view, the problem can be seen either as
a Regression problem (predicting the burnt area, which can be any real
number) or as a Multi-Class Classification problem (since the dataset has
predefined classes based on the size of the fire).

Instead of focusing on getting the exact value of the burnt area through
regression, the better approach is to place the fire in the right class with
higher accuracy, because mitigation strategies can be formed according to the
class in which the forest fire lies.

Bird’s-eye view of the Study


This project is an attempt to tackle the spread of forest fires by using
Artificial Intelligence to forecast a fire's potential area coverage as soon as
the fire is discovered.

The entire study is a 9-step process:

Data Overview

Real World/Business objectives and constraints

Exploratory Data Analysis

Feature Engineering

Choosing the right Performance Metrics

Preparing a First Cut Solution

Hyperparameter tuning of selected models

Data Modeling

Obtaining Results

Dataset Overview
Data for this study is obtained from www.kaggle.com. It is wildfire data for
the United States covering the years 1992–2015. There are 39
parameters/features, which may or may not be important in predicting wildfire area.


Going through each feature in the dataset:

1. OBJECTID: This is a unique ID provided for each row

2. FOD_ID: Global unique identifier.

3. FPA_ID: Unique identifier that contains information necessary to track
back to the original record in the source dataset.

4. SOURCE_SYSTEM_TYPE: Type of source database or system that the
record was drawn from (federal, nonfederal, or interagency).

5. SOURCE_SYSTEM: Name or other identifier for source database or
system that the record was drawn from.

6. NWCG_REPORTING_AGENCY: Active National Wildfire Coordinating
Group (NWCG) Unit Identifier for the agency preparing the fire report.

7. NWCG_REPORTING_UNIT_ID: Active NWCG Unit Identifier for the unit
preparing the fire report.

8. NWCG_REPORTING_UNIT_NAME: Active NWCG Unit Name for the unit
preparing the fire report.

9. SOURCE_REPORTING_UNIT: Code for the agency unit preparing the fire
report, based on code/name in the source dataset.


10. SOURCE_REPORTING_UNIT_NAME: Name of reporting agency unit
preparing the fire report, based on code/name in the source dataset.

11. LOCAL_FIRE_REPORT_ID: Number or code that uniquely identifies an
incident report for a particular reporting unit and a particular calendar
year.

12. LOCAL_INCIDENT_ID: Number or code that uniquely identifies an
incident for a particular local fire management organization within a
particular calendar year.

13. FIRE_CODE: Code used within the interagency wildland fire community
to track and compile cost information for emergency fire suppression.

14. FIRE_NAME: Name of the incident, from the fire report (primary) or
ICS-209 report (secondary).

15. ICS_209_INCIDENT_NUMBER: Incident (event) identifier, from the
ICS-209 report.

16. ICS_209_NAME: Name of the incident, from the ICS-209 report.

17. MTBS_ID: Incident identifier, from the MTBS perimeter dataset.

18. MTBS_FIRE_NAME: Name of the incident, from the MTBS perimeter
dataset.


19. COMPLEX_NAME: Name of the complex under which the fire was
ultimately managed, when discernible.

20. FIRE_YEAR: Calendar year in which the fire was discovered or
confirmed to exist.

21. DISCOVERY_DATE: Date on which the fire was discovered or confirmed
to exist.

22. DISCOVERY_DOY: Day of year on which the fire was discovered or
confirmed to exist.

23. DISCOVERY_TIME: Time of day that the fire was discovered or
confirmed to exist.

24. STAT_CAUSE_CODE: Code for the (statistical) cause of the fire.

25. STAT_CAUSE_DESC: Description of the (statistical) cause of the fire.

26. CONT_DATE: Date on which the fire was declared contained or
otherwise controlled (mm/dd/yyyy where mm=month, dd=day, and
yyyy=year).

27. CONT_DOY: Day of year on which the fire was declared contained or
otherwise controlled.

28. CONT_TIME: Time of day that the fire was declared contained or
otherwise controlled (hhmm where hh=hour, mm=minutes).


29. FIRE_SIZE: Estimate of acres within the final perimeter of the fire.

30. FIRE_SIZE_CLASS: Code for fire size based on the number of acres
within the final fire perimeter (A=greater than 0 but less
than or equal to 0.25 acres, B=0.26-9.9 acres, C=10.0-99.9 acres,
D=100-299 acres, E=300 to 999 acres, F=1000 to 4999 acres, and
G=5000+ acres).

31. LATITUDE: Latitude (NAD83) for point location of the fire (decimal
degrees).

32. LONGITUDE: Longitude (NAD83) for point location of the fire (decimal
degrees).

33. OWNER_CODE: Code for the primary owner or entity responsible for
managing the land at the point of origin of the fire at the time of the
incident.

34. OWNER_DESCR: Name of primary owner or entity responsible for
managing the land at the point of origin of the fire at the time of the
incident.

35. STATE: Two-letter alphabetic code for the state in which the fire burned
(or originated), based on the nominal designation in the fire report.

36. COUNTY: County, or equivalent, in which the fire burned (or
originated), based on nominal designation in the fire report.

37. FIPS_CODE: Three-digit code from the Federal Information Processing
Standards (FIPS) publication 6-4 for representation of counties and
equivalent entities.

38. FIPS_NAME: County name from the FIPS publication 6-4 for
representation of counties and equivalent entities.

39. Shape: shape of the forest fire.

Real World/Business objectives and constraints


1. The cost of Misclassification is high in this problem as the resources for
fire mitigation would be assigned as per the predictions. For cases when
the predicted fire size is smaller than the actual spread, the resources
would fall short and therefore damage would be higher.

2. There are moderate latency constraints i.e. the model can take a few
minutes for computing the probability values but cannot take hours.

Performing Exploratory Data analysis



There are 39 features in the dataset, most of which are unique identification
numbers used by different agencies and do not provide any information about
the fire itself. We therefore remove those features right away, and then
analyze every remaining feature individually to assess its importance for our
model.

Analyzing our class label: FIRE_SIZE_CLASS feature

The feature is first encoded manually to ‘int’ data type to facilitate machine
learning model computation. Here different classes of fire size indicate
different fire area sizes (in acres) as given below:


1/A: 0–0.25 acres

2/B: 0.26–9.9 acres

3/C: 10.0–99.9 acres

4/D: 100–299 acres

5/E: 300–999 acres

6/F: 1000–4999 acres

7/G: 5000+ acres
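
The manual encoding listed above could look roughly like the sketch below. It assumes the data is already loaded into a pandas data frame; the helper name `encode_fire_size_class` and the sample rows are made up for illustration and are not taken from the original notebook.

```python
import pandas as pd

# Letter codes from the dataset mapped to the integer classes listed above.
SIZE_CLASS_TO_INT = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7}

def encode_fire_size_class(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with FIRE_SIZE_CLASS converted from letters to ints."""
    out = df.copy()
    out["FIRE_SIZE_CLASS"] = out["FIRE_SIZE_CLASS"].map(SIZE_CLASS_TO_INT).astype("int64")
    return out

# Tiny illustrative frame (made-up rows, not real records).
sample = pd.DataFrame({"FIRE_SIZE_CLASS": ["A", "B", "G", "B"]})
encoded = encode_fire_size_class(sample)

# Counting rows per class is a quick way to see the imbalance before plotting.
class_counts = encoded["FIRE_SIZE_CLASS"].value_counts().sort_index()
```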

As seen in the graph, the dataset is highly imbalanced, with the maximum number
of reported incidents lying in Class 2.

Features such as OBJECTID, FOD_ID and FPA_ID are removed as they do not add
any value to the model.


Analyzing feature: SOURCE_SYSTEM_TYPE

Code snippet for individual feature analysis is as follows:

Feature Analysis Code

The feature is then encoded to a numeric value using ‘Label Encoders’ after
which graphical analysis is performed.
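
As a rough sketch of this kind of per-feature analysis (the original snippet is not reproduced here), the example below label-encodes one categorical feature with scikit-learn and counts fires per encoded value and size class. The helper name `encode_and_count` and the sample rows are assumptions. Note that `LabelEncoder` assigns codes in sorted order of the categories, so the codes it produces may differ from the ones quoted in the observations.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_and_count(df, feature, label="FIRE_SIZE_CLASS"):
    """Label-encode one categorical feature and count fires per (code, class)."""
    encoder = LabelEncoder()
    codes = encoder.fit_transform(df[feature])
    # Category -> integer code, useful for reading the axis of a plot later.
    mapping = {category: code for code, category in enumerate(encoder.classes_)}
    counts = pd.crosstab(
        pd.Series(codes, name=feature),
        df[label].reset_index(drop=True),
    )
    return mapping, counts

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "SOURCE_SYSTEM_TYPE": ["FED", "NONFED", "INTERAGCY", "FED"],
    "FIRE_SIZE_CLASS": [1, 2, 2, 1],
})
mapping, counts = encode_and_count(sample, "SOURCE_SYSTEM_TYPE")
```

The `counts` table can then be passed to a bar or box plot to compare size classes across the encoded values.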


Observation:

For each fire size class, the maximum number of fires is associated with source
system type = 2, which is ‘INTERAGCY’, and the minimum number of fires in each
class is associated with source system type = 1, which is ‘NONFED’.

Therefore, this feature would be useful for our model.

Analyzing feature: SOURCE_SYSTEM

Similar statistical analysis is performed for each feature as shown in the
above code snippet. The graphical analysis of this feature is as follows:


Observation: The X-axis values of this plot are overlapping, but we can
clearly observe that most of the box plots have their 25–75% values lying
between fire classes 1 and 2. This observation may be an effect of the huge data
imbalance, which can be countered by adding a weight to each fire class during
modeling.

Analyzing feature: NWCG_REPORTING_AGENCY


Observation: In most of the box plots, the median line overlaps with
the 25th/75th percentile lines. Most agencies have been reporting fires of
size classes 1, 2 and 3, but Agency 3 has 25–75% of its reported incidents lying
in fire size classes 5 and 6, and Agency 4 between classes 6 and 7.

Features such as ICS_209_INCIDENT_NUMBER, MTBS_ID and
MTBS_FIRE_NAME are again removed as they are not important for the
prediction.

Analyzing feature: FIRE_YEAR


Observation: Year 2006 saw the maximum number of forest fires: not just class 2
fires, but class 1 and class 3 fires are also visibly higher than in any other
year. More recent years, i.e. 2014 and 2015, show similar numbers, which are not
as high as 2006.

Since most of the counts for every year come from fires in classes 1, 2
and 3, fires in the other classes are easier to analyze on a dataset that
excludes classes 1, 2 and 3. Therefore we filter our data, the code for which
is given below:

Code for Filtering out Majority classes from Imbalanced Data
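
A minimal sketch of such a filter, assuming FIRE_SIZE_CLASS has already been encoded to integers 1–7; the helper name `drop_majority_classes` and the sample rows are illustrative, not the original code.

```python
import pandas as pd

# Classes that dominate the counts and hide the larger fires in the plots.
MAJORITY_CLASSES = [1, 2, 3]

def drop_majority_classes(df, label="FIRE_SIZE_CLASS"):
    """Keep only rows whose size class is not one of the majority classes."""
    return df[~df[label].isin(MAJORITY_CLASSES)].reset_index(drop=True)

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "FIRE_YEAR": [2006, 2006, 2007, 2011, 2015],
    "FIRE_SIZE_CLASS": [1, 4, 7, 2, 5],
})
minority = drop_majority_classes(sample)

# Fires per year and class, ready to be plotted as grouped bars.
per_year = minority.groupby(["FIRE_YEAR", "FIRE_SIZE_CLASS"]).size()
```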


Observation: This is a graph of the filtered data, which contains only
fires of classes 4, 5, 6 and 7. After analyzing the filtered data, we observe
that in 2006 even the bigger fires are far more numerous than in other years.
Fires of class 7 (biggest size) are most numerous in the years 2006, 2007, 2011,
2012 and 2015.

Feature DISCOVERY_DATE is stored as a Julian date. It can be
converted into YYYY-MM-DD format for better understanding, but we
already have the discovery year and the discovery day of year, from which we
can get the month of the fire, and the discovery time, from which we can
find out in which interval of the day the fire occurred.
Therefore this feature would not add any value to our dataset and can be
discarded.
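
For readers who still want to inspect the dates, the conversion mentioned above can be sketched with pandas, which accepts Julian day numbers through `origin="julian"`. The helper name and the two sample values are assumptions for illustration.

```python
import pandas as pd

def julian_to_calendar(julian_days: pd.Series) -> pd.Series:
    """Convert Julian day numbers to pandas timestamps."""
    # pandas requires unit="D" when origin="julian".
    return pd.to_datetime(julian_days, unit="D", origin="julian")

# Illustrative Julian day numbers (values ending in .5 fall on midnight).
sample = pd.Series([2451544.5, 2453403.5])
dates = julian_to_calendar(sample)
formatted = dates.dt.strftime("%Y-%m-%d").tolist()
```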

Analyzing feature: STAT_CAUSE_CODE

Observation: When the fire cause code is 1, we have the maximum number of fires
of size class 1. In fact, fires from all 7 size classes are present for this
cause; on looking at the STAT_CAUSE_DESC feature we realized that cause 1 is
actually ‘Miscellaneous’, so this makes sense. For cause 10 (Powerline), very
few fires are recorded, and even those are only of size class 1 or 2, i.e. small
fires. Similarly, cause 12 (Fireworks) has the least number of instances, and
only in classes 1 and 2. For cause 9 (Smoking) we can see fires of classes 3, 4,
5 and 6 along with 1 and 2, i.e. smoking can cause huge fires and is a much
more dangerous cause.

Containment features such as CONT_DATE cannot be known for a fire that is still
burning, so all containment information is removed.

Analyzing features: LATITUDE and LONGITUDE

Geographical analysis of forest fires based on coordinate values is done by
plotting heatmaps using the code below:

Code for creating heatmap of Forest Fires
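
One way to build such a heatmap is to bin the coordinates into a grid and color each cell by its fire count. The sketch below is an assumption about the approach rather than the original code: it uses `numpy.histogram2d` for the binning and, optionally, matplotlib for drawing. The function names, bin count and colormap are illustrative choices.

```python
import numpy as np
import pandas as pd

def fire_density_grid(df, bins=50):
    """Count fires per latitude/longitude cell; rows are latitude bins."""
    counts, lat_edges, lon_edges = np.histogram2d(
        df["LATITUDE"], df["LONGITUDE"], bins=bins
    )
    return counts, lat_edges, lon_edges

def plot_fire_heatmap(df, bins=50):
    """Draw the grid as an image; lighter cells mean more fires."""
    import matplotlib.pyplot as plt  # imported here so binning works without it
    counts, lat_edges, lon_edges = fire_density_grid(df, bins)
    fig, ax = plt.subplots(figsize=(10, 6))
    image = ax.imshow(
        counts,
        origin="lower",  # lowest latitude at the bottom of the picture
        extent=[lon_edges[0], lon_edges[-1], lat_edges[0], lat_edges[-1]],
        aspect="auto",
        cmap="hot",
    )
    fig.colorbar(image, ax=ax, label="Number of fires")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig

# Made-up coordinates for illustration only.
sample = pd.DataFrame({
    "LATITUDE": [30.0, 30.0, 40.0],
    "LONGITUDE": [-120.0, -120.0, -80.0],
})
counts, lat_edges, lon_edges = fire_density_grid(sample, bins=2)
```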


Observation: Lighter color means more wildfires (as given in the legend).
This is a view of all 52 state codes present in the data, covering both the
contiguous (mainland) U.S. and the areas outside it. To get a better
visualization of wildfires in the mainland U.S., we can remove the
non-contiguous areas from our data frame, i.e. Hawaii, Alaska and Puerto Rico
(reference taken from the world map).


Heatmap for data filtered for U.S. mainland

Observation: This visualization shows forest fires in the U.S. mainland only.
Taking reference from the U.S. map, we can see a lot of forest fires in
California (left corner).

Regions like Montana, Nebraska, Colorado, Indiana and Illinois have very few
cases of forest fires; most of them appear in darker shades, i.e. the count of
forest fires there is very low. In the regions at the bottom-right corner,
i.e. Virginia, North and South Carolina, Georgia, Florida and Alabama, there are
many forest fires, as the color is very bright and dense.

We can also represent forest fire counts throughout the years using Folium
for better representation through a map (Folium image present in article
title). Code for Folium representation is as follows:

Folium Data presentation Code
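
A hedged sketch of a Folium heatmap for a single year is shown below; the original code is not reproduced here and may have used a different layer type. The function names, the map center (a rough center of the contiguous U.S.), the zoom level and the radius are all assumptions.

```python
import pandas as pd

def yearly_heat_points(df, year):
    """Return [latitude, longitude] pairs for fires discovered in one year."""
    rows = df.loc[df["FIRE_YEAR"] == year, ["LATITUDE", "LONGITUDE"]].dropna()
    return rows.values.tolist()

def build_fire_map(df, year, center=(39.5, -98.35), zoom=4):
    """Build a Folium map with a heat layer for the chosen year."""
    import folium  # imported here so the data preparation works without it
    from folium.plugins import HeatMap
    fire_map = folium.Map(location=list(center), zoom_start=zoom)
    HeatMap(yearly_heat_points(df, year), radius=8).add_to(fire_map)
    return fire_map

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "FIRE_YEAR": [2006, 2007],
    "LATITUDE": [34.0, 45.0],
    "LONGITUDE": [-118.0, -110.0],
})
points = yearly_heat_points(sample, 2006)
```

Calling `build_fire_map(sample, 2006).save("fires_2006.html")` would write an interactive map to a file; the file name is only an example.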

Similar analysis is performed on features such as COUNTY, OWNER_CODE etc.

Feature Engineering
From domain knowledge, we know that weather conditions play a
significant role in igniting or cooling down forest fires, but the dataset used
in this study does not provide any weather information. Therefore we have
performed feature engineering to add additional features to our dataset.

Similarly, features like DISCOVERY_DOY and DISCOVERY_TIME are not significant
if used as is, but we can modify them into useful features.

Modifying DISCOVERY_DOY to get the feature DISCOVERY_MONTH and
analyzing the feature:

Observation: This feature provides some useful insights into how fire size
classes vary with the month of the year. In the months of January–April (1–4),
fires of area class 2 occur much more than any other class. In May (5), fires
of class 2 drop abruptly and we can see a slight increase in bigger fire classes
(class 5 and 6 values appear). During June–August (6–8), we can see forest
fires of class 7, i.e. fires covering a huge area. After that, in
September–December (9–12), the forest fires reduce. Therefore it is
an extremely important feature to have.
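
The DISCOVERY_MONTH derivation described above can be sketched as follows. The year is needed as well as the day of year, because leap years shift the month boundaries by one day. The helper name and sample rows are made up for illustration.

```python
import pandas as pd

def add_discovery_month(df):
    """Derive DISCOVERY_MONTH (1-12) from FIRE_YEAR and DISCOVERY_DOY."""
    out = df.copy()
    # Build strings such as "2005033" (year followed by zero-padded day of year).
    year_doy = (
        out["FIRE_YEAR"].astype(int).astype(str)
        + out["DISCOVERY_DOY"].astype(int).astype(str).str.zfill(3)
    )
    out["DISCOVERY_MONTH"] = pd.to_datetime(year_doy, format="%Y%j").dt.month
    return out

# Day 60 is Feb 29 in the leap year 2004 but Mar 1 in 2005.
sample = pd.DataFrame({"FIRE_YEAR": [2004, 2005, 2005], "DISCOVERY_DOY": [60, 60, 33]})
with_month = add_discovery_month(sample)
```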

Filtering the data to specifically view minority fire classes.

Observation: This graph only shows the fires of classes 4, 5, 6 and 7. In
the month of March, fires of size class 4 are at their maximum, but in the
months of July and August (7 and 8) fires of all classes, i.e. all sizes of
fires (referring to the above graph and this one), occur the most.

Modifying the DISCOVERY_TIME feature to get DISCOVERY_TOD (Time of
Day), i.e. instead of analyzing fire discovery time minute-wise, getting
information about the interval of the day, like Early Morning, Morning,
Noon, Evening and Night, would be much more beneficial.

Observation: Most of the fires are discovered in the early morning period,
i.e. between 0000–0600 (12 A.M. to 6 A.M.). The least number of fires are
discovered right after that, i.e. in the morning hours (6 A.M. to 12 P.M.).
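
The DISCOVERY_TOD binning can be sketched with `pandas.cut`. The Early Morning (0000–0600), Morning (0600–1200) and Noon (1200–1600) boundaries follow the observations in this article; the Evening/Night split at 2000 is an assumption, as are the helper name and sample values.

```python
import pandas as pd

# hhmm boundaries; each interval includes its left edge and excludes its right edge.
TOD_BINS = [0, 600, 1200, 1600, 2000, 2400]
TOD_LABELS = ["Early Morning", "Morning", "Noon", "Evening", "Night"]

def add_discovery_tod(df):
    """Derive DISCOVERY_TOD from DISCOVERY_TIME given as 'hhmm' text or numbers."""
    out = df.copy()
    hhmm = pd.to_numeric(out["DISCOVERY_TIME"], errors="coerce")  # missing -> NaN
    out["DISCOVERY_TOD"] = pd.cut(hhmm, bins=TOD_BINS, labels=TOD_LABELS, right=False)
    return out

# Made-up times for illustration only; the last one is missing.
sample = pd.DataFrame({"DISCOVERY_TIME": ["0000", "0600", "1300", "1845", "2359", None]})
with_tod = add_discovery_tod(sample)
tod_values = with_tod["DISCOVERY_TOD"].tolist()
```

Rows with a missing discovery time keep a missing DISCOVERY_TOD and would need separate handling before modeling.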


Observation: Fires of size classes 1–4 have their maximum number of
occurrences in the time period 12:00 A.M. to 06:00 A.M. For bigger fire sizes
(classes 5–7), however, the period in which the most fires have occurred in the
past is 12:00–16:00, i.e. 12:00 P.M. to 04:00 P.M. The time period with the
fewest fire incidents remains the same for all classes (morning hours: 6 A.M.
to 12 P.M.).

Percentage forest coverage for each state can be a useful feature in
analyzing the forest fire size. Therefore, using the STATE feature, we
obtain each state's forest coverage and store the information in an Excel file.

Precipitation and average temperature of the state are also obtained
from other sources and combined in an Excel file.


The AVG_TEMP feature is formed by considering two features: STATE and
FIRE_YEAR.
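
Joining such a lookup table onto the fire records can be sketched with a pandas merge on the two keys. The temperature values below are made up, and the helper name is hypothetical; in practice the lookup table would be read from the Excel file mentioned above.

```python
import pandas as pd

def add_avg_temp(fires, temps):
    """Attach AVG_TEMP to each fire by matching STATE and FIRE_YEAR."""
    # validate="many_to_one" raises if the lookup has duplicate (STATE, FIRE_YEAR) rows.
    return fires.merge(temps, on=["STATE", "FIRE_YEAR"], how="left", validate="many_to_one")

# Made-up fire rows and temperatures for illustration only.
fires = pd.DataFrame({"STATE": ["CA", "CA", "GA"], "FIRE_YEAR": [2006, 2007, 2006]})
temps = pd.DataFrame({
    "STATE": ["CA", "CA", "GA"],
    "FIRE_YEAR": [2006, 2007, 2006],
    "AVG_TEMP": [60.1, 59.8, 64.0],
})
merged = add_avg_temp(fires, temps)
```

A left join keeps every fire record; fires without a matching lookup row end up with a missing AVG_TEMP, which makes gaps in the weather table easy to spot.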

After the individual feature analysis is done, a correlation matrix is
used to analyze the effect of the features on each other, and the features
which are highly correlated with each other are then removed from the
dataset.
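
A minimal sketch of correlation-based pruning is given below. The 0.9 threshold and the rule of dropping the later column of each highly correlated pair are assumptions; the article does not state which threshold or rule was used.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Drop the later column of every pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Made-up rows: month is almost a linear function of day of year.
sample = pd.DataFrame({
    "FIRE_YEAR": [2000, 2001, 2002, 2003, 2004],
    "DISCOVERY_DOY": [10, 200, 45, 300, 120],
    "DISCOVERY_MONTH": [1, 7, 2, 10, 4],
})
reduced, dropped = drop_highly_correlated(sample)
```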


Correlation Matrix between features

Once the EDA and Feature Engineering are done, final encoded features are
stored in a data frame which would be now used for Data Modeling.

Performance Metrics used


MAPE (Mean Absolute Percentage Error)

It is one of the most common statistical measures of forecasting accuracy.
It is represented by the formula:

$$M = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$

Here M = MAPE, n = no. of observations, A_t = actual value, F_t = forecast value

MAE (Mean Absolute Error)

It is the simplest method for measuring forecast accuracy. It is represented by
the formula:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$

Here n = no. of observations, x_i = actual value, y_i = forecast value
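
Both metrics are straightforward to compute with NumPy. The sketch below follows the standard definitions and applies them to integer class labels; the function names and the sample labels are made up. Since the classes run from 1 to 7, the division in MAPE never hits zero.

```python
import numpy as np

def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast values."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast)))

def mean_absolute_percentage_error(actual, forecast):
    """Average absolute error relative to the actual value, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Made-up class labels for illustration only.
actual = [1, 2, 4, 2]
forecast = [1, 3, 2, 2]
mae = mean_absolute_error(actual, forecast)              # (0 + 1 + 2 + 0) / 4
mape = mean_absolute_percentage_error(actual, forecast)  # (0 + 0.5 + 0.5 + 0) / 4 * 100
```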

First Cut Solution



Since the features we are dealing with are geospatial, we cannot
simply normalize/scale them (latitude and longitude, for example). This also
rules out linear models, as they generally expect scaled features as input.

In such cases, tree-based/grouping models can be a good solution, as they
work on the original feature values without requiring any transformation of the
input. We therefore start with the simplest model that works on the concept of
neighborhood, i.e. KNN. Without any hyperparameter tuning, KNN provides a good
MAPE value, so it can be used in our final model.

Next, ‘Gaussian Naive Bayes’ is tried, as it does not require feature
normalization either, but the model did not perform well on our data.

After that we move to tree-based models such as Decision Tree and
Random Forest, which show decent performance and can do better with
hyperparameter tuning.
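
A first-cut comparison like the one described above might be sketched as follows, using scikit-learn with default settings and MAPE on a held-out split. The data here is random and only checks that the pipeline runs; the function names and the 80–20 split are assumptions, and the scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def first_cut_scores(X, y, seed=0):
    """Fit untuned baseline models and return their MAPE on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "KNN": KNeighborsClassifier(),
        "Gaussian Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mape(y_test, model.predict(X_test))
    return scores

# Random stand-in data: 4 numeric features, classes 1-7.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.integers(1, 8, size=300)
scores = first_cut_scores(X, y)
```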

Selecting the best approach and performing hyperparameter tuning


Since KNN, Decision Tree and Random Forest perform well on our
data, hyperparameter tuning is done for each of these 3 models and
the MAPE and MAE values obtained from them are analyzed. The results are
as follows:

KNN — MAE: 0.438, MAPE: 23.953%

DECISION TREE — MAE: 0.464, MAPE: 35.902%

RANDOM FOREST — MAE: 0.455, MAPE: 24.996%
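
The article does not say which search procedure was used for tuning. One common option is a cross-validated grid search; the sketch below tunes only `n_neighbors` for KNN and scores candidates by MAE. The grid values, fold count, function name and random stand-in data are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def tune_knn(X, y, neighbor_grid=(3, 5, 11)):
    """Grid-search n_neighbors for KNN, selecting by cross-validated MAE."""
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": list(neighbor_grid)},
        scoring="neg_mean_absolute_error",  # scikit-learn maximizes, hence the sign
        cv=3,
    )
    search.fit(X, y)
    return search.best_params_, -search.best_score_

# Random stand-in data: 4 numeric features, classes 1-7.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = rng.integers(1, 8, size=150)
best_params, best_mae = tune_knn(X, y)
```

The same pattern applies to Decision Tree and Random Forest by swapping the estimator and the parameter grid.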

Data Modeling
Instead of using any one of the 3 models above, we have created an ensemble
model using the following steps:

1. Splitting the whole data into train and test sets (80–20 ratio)

2. Splitting the 80% train data into 2 equal sets D1 and D2 (50–50 ratio)

3. Creating 3 samples from D1 (using sampling with replacement)

4. Training 3 different models (KNN, Decision Tree and Random Forest), one
on each of these 3 samples

5. Passing the D2 set to each of these models and getting predictions for fire
size class. For each row of data, we get 3 predictions from 3
different models (i.e. 1 prediction each). Storing them in a new data
frame along with the actual values of the D2 set.

6. Using this new data frame to get the final prediction by performing a
majority vote (from the 3 predictions) and getting MAE and MAPE
values for Train Data

7. For the remaining 20% test data: passing it through the same 3 models
and then doing a majority vote on the 3 predictions to get the final
predicted value.
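
The seven steps above can be sketched as follows. This is an illustration under assumptions, not the original notebook code: the models use mostly default settings, the data is random, and ties in the vote resolve to the smallest class, which is a choice made here for simplicity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def majority_vote(predictions):
    """Row-wise majority vote over an (n_samples, n_models) array of classes."""
    voted = []
    for row in np.asarray(predictions):
        values, counts = np.unique(row, return_counts=True)
        voted.append(values[np.argmax(counts)])  # ties go to the smallest class
    return np.array(voted)

def fit_and_vote(X, y, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: 80-20 train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Step 2: split the train part into two equal halves D1 and D2.
    X_d1, X_d2, y_d1, y_d2 = train_test_split(
        X_train, y_train, test_size=0.5, random_state=seed
    )
    models = [
        KNeighborsClassifier(),
        DecisionTreeClassifier(random_state=seed),
        RandomForestClassifier(n_estimators=50, random_state=seed),
    ]
    # Steps 3-4: one bootstrap sample of D1 (with replacement) per model.
    for model in models:
        idx = rng.integers(0, len(X_d1), size=len(X_d1))
        model.fit(X_d1[idx], y_d1[idx])
    # Steps 5-6: three predictions per D2 row, then a majority vote.
    d2_pred = majority_vote(np.column_stack([m.predict(X_d2) for m in models]))
    # Step 7: the same procedure on the held-out 20% test data.
    test_pred = majority_vote(np.column_stack([m.predict(X_test) for m in models]))
    return (y_d2, d2_pred), (y_test, test_pred)

# Random stand-in data: 4 numeric features, classes 1-7.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = rng.integers(1, 8, size=400)
(d2_actual, d2_pred), (test_actual, test_pred) = fit_and_vote(X, y)
votes = majority_vote([[1, 1, 2], [3, 4, 4], [5, 6, 7]])
```

Because under-predicting fire size is the costlier mistake in this problem, breaking ties toward the larger class instead could be a reasonable alternative.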

Code for steps 1–4


Predicting MAE and MAPE values for Train Data


Predicting MAE and MAPE for Test Data

Results
On providing a random set of 4500 forest fire data points as input to our
model, the Mean Absolute Error comes out to be 0.50 and the Mean
Absolute Percentage Error, which is our primary performance metric,
comes out to be ~23.4%, i.e. the predicted class is off by about 23.4% of the
actual class value on average. Loosely, this can be read as the model landing
on or near the correct forest fire size class about 76.6% of the time, although
MAPE is a relative error rather than an accuracy score.

Therefore the model can be used for predicting the size class of an existing
fire, provided all required features are available for it.


Future Work
Features such as ‘Average Temperature’ and ‘Average Precipitation’ of the
entire state can be replaced by ‘Current Temperature’ and ‘Current
Precipitation’ in real-world applications. This would likely increase the
model accuracy.

Due to computation limits, not every parameter of the Random Forest algorithm
was fine-tuned. With more computation power, the model can be
improved further, giving better results.

Thanks for reading!

I hope you found the article interesting and worth the read. Feel free to share
your feedback on my study. You can reach out to me on LinkedIn.

For a deeper understanding of this study, you can refer to the notebooks shared
at my Github Repository.

