https://medium.com/swlh/wildfire-area-prediction-an-ai-approach-to-a-gis-problem-2a4e8d97d7e8 1/36
11/30/2020 Wildfire Area Prediction: An AI Approach to a GIS Problem | by Kirti Girdhar | The Startup | Nov, 2020 | Medium
Over the past few years, forest fires have become a major environmental concern
throughout the world. These fires not only remove trees and vegetation but also
degrade air quality, displace and kill wildlife (leading to the extinction of
some species), alter water cycles and endanger the lives of local communities.
There have been many incidents in which forest fires raged out of control, not
to forget the recent Brazil rainforest wildfires. All forest fires are risky, and
departments all over the world are looking for ways to minimize these fires and
their effects.
Getting an area estimate of an ongoing forest fire can be very beneficial in
preparing mitigation strategies. Forecasting fire size at the right time can
be a huge benefit to firefighting units in resource allocation and management.
Prerequisites
Data Overview
Feature Engineering
Data Modeling
Obtaining Results
Dataset Overview
Data for this study is obtained from www.kaggle.com; it is wildfire data for
the United States covering the years 1992–2015. There are 39
parameters/features, which may or may not be important in predicting wildfire
area.
13. FIRE_CODE: Code used within the interagency wildland fire community
to track and compile cost information for emergency fire suppression.
14. FIRE_NAME: Name of the incident, from the fire report (primary) or
ICS-209 report (secondary).
19. COMPLEX_NAME: Name of the complex under which the fire was
ultimately managed, when discernible.
27. CONT_DOY: Day of year on which the fire was declared contained or
otherwise controlled.
28. CONT_TIME: Time of day that the fire was declared contained or
otherwise controlled (hhmm where hh=hour, mm=minutes).
29. FIRE_SIZE: Estimate of acres within the final perimeter of the fire.
30. FIRE_SIZE_CLASS: Code for fire size based on the number of acres
within the final fire perimeter (A=greater than 0 but less than or
equal to 0.25 acres, B=0.26-9.9 acres, C=10.0-99.9 acres,
D=100-299 acres, E=300 to 999 acres, F=1000 to 4999 acres, and
G=5000+ acres).
31. LATITUDE: Latitude (NAD83) for point location of the fire (decimal
degrees).
32. LONGITUDE: Longitude (NAD83) for point location of the fire (decimal
degrees).
33. OWNER_CODE: Code for the primary owner or entity responsible for
managing the land at the point of origin of the fire at the time of the
incident.
35. STATE: Two-letter alphabetic code for the state in which the fire burned
(or originated), based on the nominal designation in the fire report.
38. FIPS_NAME: County name from the FIPS publication 6-4 for
representation of counties and equivalent entities.
2. There are moderate latency constraints i.e. the model can take a few
minutes for computing the probability values but cannot take hours.
There are 39 features in the dataset that we are using, most of which are
unique identification numbers used by different agencies and do not provide
any information about the fire. Therefore, we remove those features right
away, after which feature analysis is performed on every remaining feature to
assess its usefulness in our model.
The fire size class feature is first encoded manually to the ‘int’ data type to
facilitate machine learning model computation. Each class of fire size
corresponds to a different range of fire area (in acres), following the
FIRE_SIZE_CLASS definition, with class 1 the smallest and class 7 the largest.
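A minimal sketch of such a manual encoding is shown here. The letter codes and acre ranges come from the FIRE_SIZE_CLASS definition; mapping A to G onto the integers 1 to 7 and treating the gaps between the stated ranges (for example 9.9 to 10.0 acres) as part of the lower class are assumptions made for illustration, and the function name is hypothetical.

```python
# Letter codes A-G come from the FIRE_SIZE_CLASS definition; mapping them to
# the integers 1-7 (1 = smallest, 7 = largest) is an assumed encoding.
SIZE_CLASS_TO_INT = {letter: number for number, letter in enumerate("ABCDEFG", start=1)}

# Assumed exclusive upper bounds (in acres) for classes 2-6, i.e. B-F.
UPPER_BOUNDS_ACRES = [10, 100, 300, 1000, 5000]


def size_class_from_acres(acres):
    """Return the integer size class (1-7) for a fire area given in acres."""
    if acres <= 0:
        raise ValueError("fire size must be greater than 0 acres")
    if acres <= 0.25:
        return 1  # class A
    for size_class, upper_bound in enumerate(UPPER_BOUNDS_ACRES, start=2):
        if acres < upper_bound:
            return size_class
    return 7  # class G: 5000+ acres


print(SIZE_CLASS_TO_INT["C"], size_class_from_acres(42.0))
```

Keeping the classes as ordered integers, rather than arbitrary labels, is what makes error measures such as MAE meaningful on them later on.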
The feature is then encoded to a numeric value using ‘Label Encoders’ after
which graphical analysis is performed.
Observation:
For each fire size class, the maximum number of fires is associated with
source system type = 2, which is ‘INTERAGCY’, and the minimum number of fires
in each class is associated with source system type = 1, which is ‘NONFED’.
Observation: In most of the box plots, the median line overlaps with the
25th/75th percentile lines. Most agencies have been reporting fires with sizes
in classes 1, 2 and 3, but Agency 3 has its 25th–75th percentile of reported
incidents in fire size classes 5 and 6, and Agency 4 between classes 6 and 7.
Observation: The year 2006 saw the maximum number of forest fires: not just
class 2 fires, but class 1 and class 3 fires are also visibly higher than in
any other year. More recent years, i.e. 2014 and 2015, show similar numbers,
which are not as high as in 2006.
Since for every year most of the counts are of fires in classes 1, 2 and 3, we
can analyze the fires in the other classes by using a dataset that does not
contain fires of classes 1, 2 and 3. Therefore we filter our data accordingly.
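A filter of this kind can be sketched as follows. The original code is not reproduced here, so this is only an illustration: the sample rows are made up, FIRE_YEAR is an assumed column name, and the helper name is hypothetical.

```python
from collections import Counter

# Hypothetical sample rows; the real data has one row per fire.
# FIRE_YEAR is an assumed column name, FIRE_SIZE_CLASS holds the encoded class.
fires = [
    {"FIRE_YEAR": 2006, "FIRE_SIZE_CLASS": 1},
    {"FIRE_YEAR": 2006, "FIRE_SIZE_CLASS": 2},
    {"FIRE_YEAR": 2006, "FIRE_SIZE_CLASS": 7},
    {"FIRE_YEAR": 2011, "FIRE_SIZE_CLASS": 3},
    {"FIRE_YEAR": 2011, "FIRE_SIZE_CLASS": 5},
    {"FIRE_YEAR": 2011, "FIRE_SIZE_CLASS": 5},
]

SMALL_CLASSES = {1, 2, 3}


def drop_small_fires(rows):
    """Keep only the fires whose size class is not 1, 2 or 3."""
    return [row for row in rows if row["FIRE_SIZE_CLASS"] not in SMALL_CLASSES]


large_fires = drop_small_fires(fires)

# Count the remaining fires per (year, size class) pair, ready for plotting.
large_fire_counts = Counter(
    (row["FIRE_YEAR"], row["FIRE_SIZE_CLASS"]) for row in large_fires
)
print(large_fire_counts)
```

On a pandas data frame the same filter is a boolean mask, for example `df[~df["FIRE_SIZE_CLASS"].isin([1, 2, 3])]`.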
Observation: This is a graph of the filtered data, which contains only fires of
classes 4, 5, 6 and 7. After analyzing the filtered data, we observe that in
2006 even the bigger fires are much more numerous than in other years. Fires of
class 7 (biggest size) are most numerous in the years 2006, 2007, 2011, 2012
and 2015.
Therefore, this feature would not add any value to our dataset and can be
discarded.
Observation: When the fire cause code is 1, we have the maximum number of fires
of size class 1. In fact, fires from all 7 size classes are likely to be present
for this cause; on looking at the STAT_CAUSE_DESC feature we realized that
cause 1 is actually ‘Miscellaneous’, so this makes sense. Very few fires are
caused by cause 10 (Powerline), as the count is very small, and even when
fires are caused by it they are only of size class 1 or 2, i.e. small fires.
Similarly, cause 12 (Fireworks) has the least number of instances, and only in
classes 1 and 2. For cause 9 (Smoking) we can see fires of classes 3, 4, 5 and
6 along with 1 and 2, i.e. smoking can cause huge fires and is a much more
dangerous cause.
Observation: A lighter color means more wildfires (as given in the legend).
This is a view of all 52 states and territories in the data, which includes
both the mainland U.S. and the regions outside it. To get a better
visualization of wildfires in the mainland U.S., we can remove the non-mainland
states from our data frame, i.e. Hawaii, Alaska and Puerto Rico (reference
taken from the world map).
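Removing these states before counting fires per state might look like the sketch below. STATE holds the two-letter state code described in the dataset overview; the sample rows and the function name are hypothetical.

```python
from collections import Counter

# Two-letter codes for Alaska, Hawaii and Puerto Rico.
NON_MAINLAND = {"AK", "HI", "PR"}


def mainland_state_counts(rows):
    """Count fires per state, skipping the states outside the mainland U.S."""
    return Counter(
        row["STATE"] for row in rows if row["STATE"] not in NON_MAINLAND
    )


# Hypothetical sample rows standing in for the real data frame.
sample_rows = [{"STATE": code} for code in ["GA", "AK", "FL", "GA", "HI", "PR", "MT"]]
state_counts = mainland_state_counts(sample_rows)
print(state_counts)
```

The resulting counts per state are what a choropleth map of the mainland would be colored by.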
Regions like Montana, Nebraska, Colorado, Indiana and Illinois have very few
cases of forest fires; most of them are darker in shade, i.e. the count of
forest fires there is very low. In contrast, the regions in the bottom-right
corner of the map, i.e. Virginia, North and South Carolina, Georgia, Florida
and Alabama, have a large number of forest fires, as the color there is very
bright and dense.
We can also represent forest fire counts throughout the years on a map using
Folium for better representation (the Folium image is the one shown with the
article title).
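A Folium map of this kind can be sketched roughly as follows. This is not the article's original code: it assumes a heat map is an acceptable representation, aggregates all years together, and uses hypothetical helper names, sample rows and output file name. LATITUDE and LONGITUDE are the decimal-degree columns from the dataset overview.

```python
from collections import Counter


def heat_points(rows, precision=1):
    """Aggregate fires into [lat, lon, count] triples on a rounded grid."""
    counts = Counter(
        (round(row["LATITUDE"], precision), round(row["LONGITUDE"], precision))
        for row in rows
    )
    return [[lat, lon, count] for (lat, lon), count in counts.items()]


def build_map(points, out_file="wildfire_heatmap.html"):
    """Render the points as a Folium heat map and save it as an HTML file."""
    # Imported here so that heat_points can be used without Folium installed.
    import folium
    from folium.plugins import HeatMap

    # Roughly the centre of the mainland U.S.; the zoom level is a guess.
    fire_map = folium.Map(location=[39.8, -98.6], zoom_start=4)
    HeatMap(points, radius=8).add_to(fire_map)
    fire_map.save(out_file)
    return fire_map


# Hypothetical sample rows standing in for the real data frame.
sample_fires = [
    {"LATITUDE": 34.04, "LONGITUDE": -118.24},
    {"LATITUDE": 34.01, "LONGITUDE": -118.21},
    {"LATITUDE": 30.33, "LONGITUDE": -81.66},
]
points = heat_points(sample_fires)
```

Calling `build_map(points)` then writes the map to an HTML file that can be opened in a browser. To show individual years, the rows can be filtered by year before calling `heat_points`.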
Feature Engineering
Through domain knowledge, we know that weather conditions play a significant
role in igniting or cooling down forest fires, but the dataset used in this
study does not provide any weather information. Therefore we have performed
feature engineering to add additional features to our dataset.
other class. Then in the month of May (5), fires of class 2 reduce abruptly
and we can see a slight increase in the bigger fire classes (class 5 and 6
values appear). During the months June-August (6–8), we can see forest fires
of class 7, i.e. fires covering a huge area. After that, in the months
September-December (9–12), forest fires reduce. Therefore the month is an
extremely important feature to have.
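One way to derive such a month feature is from the year and the day of year on which a fire was discovered. The sketch below assumes the data provides both values (for example in columns named FIRE_YEAR and DISCOVERY_DOY, which are assumed names analogous to CONT_DOY); the function name is hypothetical.

```python
from datetime import date, timedelta


def month_from_day_of_year(year, day_of_year):
    """Return the calendar month (1-12) for a given year and day of year."""
    if day_of_year < 1:
        raise ValueError("day_of_year starts at 1")
    # Day 1 is January 1st, so offset by day_of_year - 1 days.
    return (date(year, 1, 1) + timedelta(days=day_of_year - 1)).month


# Day 60 is March 1st in a regular year but February 29th in a leap year.
print(month_from_day_of_year(2006, 60), month_from_day_of_year(2008, 60))
```

Going through a real date rather than dividing the day of year by 30 keeps leap years and uneven month lengths correct.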
Observation: This graph only shows the fires of classes 4, 5, 6 and 7. In the
month of March, fires of size class 4 are at their maximum, but in the months
of July and August (7 and 8) fires of all classes, i.e. all sizes of fires
(referring to the above graph and this one), occur the most.
Observation: Most of the fires are discovered in the early morning period,
i.e. between 0000–0600 (12 A.M. to 6 A.M.). The least number of fires occur
after that, i.e. in the morning hours (6 A.M. to 12 P.M.).
Observation: Fires of size classes 1–4 have their maximum number of
occurrences between 12:00 A.M. and 06:00 A.M., but for bigger fire sizes
(classes 5–7) the time period in which the most fires have occurred in the
past is 12:00–16:00, i.e. 12:00 P.M. to 04:00 P.M. The time period with the
minimum number of fire incidents remains the same for all classes (morning
hours: 6 A.M. to 12 P.M.).
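Grouping an hhmm time value into periods like these can be sketched as below. The first three periods follow the ones named in the observations; the split of the remaining hours into 16:00 to 20:00 and 20:00 to 24:00 is an assumption, and the function name is hypothetical.

```python
# (start_hour, end_hour) per period. The first three follow the periods named
# in the text; the last two are assumed for illustration.
TIME_PERIODS = [(0, 6), (6, 12), (12, 16), (16, 20), (20, 24)]


def time_period(hhmm):
    """Map an hhmm value such as '0545' or 1430 to a label like '0000-0600'."""
    hour, minute = divmod(int(hhmm), 100)
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"not a valid hhmm time: {hhmm!r}")
    for start, end in TIME_PERIODS:
        if start <= hour < end:
            return f"{start:02d}00-{end:02d}00"
    raise AssertionError("unreachable: the periods cover the whole day")


print(time_period("0545"), time_period(1430))
```

The function accepts both strings and integers because hhmm values are often stored either way; leading zeros are lost when the column is read as a number.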
Once the EDA and feature engineering are done, the final encoded features are
stored in a data frame, which is now used for data modeling.
Since we are dealing with geospatial features (such as latitude and
longitude), we cannot simply normalize/scale them. We therefore also avoid
linear models, which generally expect scaled features as input.
After that, we move to tree-based models such as Decision Tree and Random
Forest, which show decent performance and can do better with hyperparameter
tuning.
Since KNN, Decision Tree and Random Forest perform well on our data, we do
hyperparameter tuning for each of these 3 models and analyze the MAPE and MAE
values obtained from them.
Data Modeling
Instead of using any one of the 3 models above, we have created an ensemble
model using the following steps:
2. Splitting the 80% train data into 2 equal sets D1 and D2 (50–50 ratio)
5. Passing the D2 set to each of these models and getting predictions for fire
size class. For each row of data, we get 3 predictions from 3 different
models (i.e. 1 prediction each). Storing them in a new data frame along
with the actual values of the D2 set.
6. Using this new data frame to get the final prediction by performing a
majority vote (from the 3 predictions) and getting MAE and MAPE
values for the train data.
7. For the remaining 20% test data: passing it through the same 3 models
and then doing a majority vote on the 3 predictions to get the final
predicted value.
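The majority vote in steps 5 to 7 can be sketched as follows. The steps do not say what happens when all three models disagree, so falling back to the middle (median) class in that case is an assumption, as are the function names and the sample predictions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class predicted by most models for a single row."""
    label, votes = Counter(predictions).most_common(1)[0]
    if votes > 1:
        return label
    # Assumed tie-break: every model disagrees, so take the middle class.
    return sorted(predictions)[len(predictions) // 2]


def ensemble_predict(knn_preds, tree_preds, forest_preds):
    """Combine the per-row predictions of the three models."""
    return [majority_vote(row) for row in zip(knn_preds, tree_preds, forest_preds)]


# Hypothetical fire size class predictions for four rows.
final_preds = ensemble_predict([2, 1, 7, 3], [2, 4, 6, 3], [5, 7, 6, 3])
print(final_preds)
```

Because the size classes are ordered, the median is a reasonable middle ground in a three-way disagreement; taking the prediction of the best single model would be another option.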
Results
On providing an input of a random set of 4500 forest fire data points to our
model, the Mean Absolute Error comes out to be 0.50 and the Mean Absolute
Percentage Error, which is our primary performance metric, comes out to be
~23.4%, i.e. on average the predicted fire size class deviates from the actual
class by about 23.4% of the actual class value. Since MAPE measures the
relative size of the errors rather than how often they occur, this figure is
not the share of misclassified fires.
Therefore the model can be used for predicting the size class of an existing
forest fire if all required features are available for it.
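How the two metrics are computed on encoded size classes can be sketched as below, using made-up class values rather than the article's data. The function names are hypothetical; the formulas are the standard definitions of MAE and MAPE.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted classes."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error relative to the actual class, in percent.

    Safe here because the encoded classes start at 1, so no actual value is 0.
    """
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def share_correct(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Made-up fire size classes for five fires.
actual = [1, 2, 2, 4, 5]
predicted = [1, 2, 3, 4, 4]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
correct = share_correct(actual, predicted)
print(round(mae, 2), round(mape, 1), round(correct, 2))
```

In this toy example two of the five predictions are off by one class, so the share of correct predictions is 0.6 while the MAPE is about 14%, which shows that the two figures answer different questions.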
Future Work
Features such as ‘Average Temperature’ and ‘Average Precipitation’ of the
entire state can be replaced by ‘Current Temperature’ and ‘Current
Precipitation’ in real-world applications. This would likely increase the
model accuracy.
I hope you found the article interesting and worth the read. Feel free to share
your feedback on my study. You can reach out to me on LinkedIn.
For a deeper understanding of this study, you can refer to the notebooks shared
in my GitHub repository.