Predicting Forest Fire Damage with SVM


Build a Machine Learning Model that can predict the burned area of forest fires using meteorological and

other data.

Support Vector Machines (SVM).


Support Vectors

Support vectors are the data points nearest to the hyperplane, the points of a data set that, if removed, would alter the position of the dividing hyperplane. Because of this, they can be considered the critical elements of a
data set.

Hyperplane

Imagine a hyperplane as a line that linearly separates and classifies a set of data.

The farther away from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the
correct side of it.

The dimensionality of the hyperplane depends on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the hyperplane is a
2-dimensional plane.

We always create the hyperplane that has the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.

So when new test data is added, whichever side of the hyperplane it lands on decides the class we assign to it.

In brief, we can define the hyperplane as follows: there can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data
points. This best boundary is known as the hyperplane of the SVM.

How to find the right hyperplane

To find the right hyperplane, it is important to set the margin appropriately.
The distance between the hyperplane and the nearest data point from either set is known as the margin.
The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
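As a concrete illustration of support vectors, the hyperplane and the margin, here is a minimal sketch (assuming scikit-learn is available, and using a small synthetic two-feature dataset rather than the forest fire data) that fits a linear SVM and reads these quantities back from the fitted model:

# Illustrative sketch on synthetic data (not the forest fire dataset)
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X_toy, y_toy = make_blobs(n_samples=60, centers=2, random_state=0)  # two clusters of points
clf = SVC(kernel="linear", C=1.0).fit(X_toy, y_toy)

w = clf.coef_[0]                       # normal vector of the separating hyperplane w.x + b = 0
b = clf.intercept_[0]                  # hyperplane offset
margin_width = 2 / np.linalg.norm(w)   # width of the maximum margin
print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b, "margin width =", margin_width)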

SVM

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems; it is mostly used for classification.
An SVM model is basically a representation of the different classes separated by a hyperplane in a multidimensional space.
It can solve linear and non-linear problems and works well for many practical problems.
The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes.
The hyperplane is generated in an iterative manner by SVM so that the classification error is minimized.
The goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH).

Two types of SVM

Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by a single straight line, it is termed linearly separable data, and the classifier
used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by a straight line, it is termed non-linear data, and the classifier used is called a
Non-linear SVM classifier (illustrated in the sketch below).
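To make the linear/non-linear distinction concrete, here is a short sketch on synthetic ring-shaped data (an illustrative aside, not the forest fire dataset): a linear SVM cannot separate the two rings, while an RBF SVM can.

# Illustrative sketch: linearly vs non-linearly separable data (synthetic rings)
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X_rings, y_rings = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
linear_acc = SVC(kernel="linear").fit(X_rings, y_rings).score(X_rings, y_rings)  # a straight line cannot separate rings
rbf_acc = SVC(kernel="rbf").fit(X_rings, y_rings).score(X_rings, y_rings)        # the RBF kernel separates them easily
print("linear SVM accuracy:", linear_acc, " RBF SVM accuracy:", rbf_acc)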

For documentation on SVM, see the scikit-learn documentation.

Applications of SVM

SVMs are helpful in text and hypertext categorization, as their application can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings. Some methods for shallow
semantic parsing are based on support vector machines.
Classification of images can also be performed using SVMs. Experimental results show that SVMs achieve significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of
relevance feedback. This is also true for image segmentation systems.
Classification of satellite data such as SAR data can be done using supervised SVM.
Hand-written characters can be recognized using SVM.
The SVM algorithm has been widely applied in the biological and other sciences. SVMs have been used to classify proteins with up to 90% of the compounds classified correctly. Permutation tests based on SVM weights
have been suggested as a mechanism for interpreting SVM models, and support-vector machine weights have also been used to interpret SVM models in the past. Post-hoc interpretation of support-vector machine models,
in order to identify the features used by the model to make predictions, is a relatively new area of research with special significance in the biological sciences.

In [4]: #importing the libraries


# importing numpy for numerical operations
import numpy as np
# importing pandas for data manipulation operations
import pandas as pd
# importing pyplot on matplotlib for visualisation purpose
import matplotlib.pyplot as plt
%matplotlib inline
#importing seaborn for advanced visualization
import seaborn as sns

In [5]: #importing the data set


forest=pd.read_csv("forestfires.csv")

Forest Fire Damage Prediction

The data we have is about forest fires; let's try to predict the damage caused by a given fire.
We will use a classification model to make our prediction.
To download the dataset, click here.

Attribute information:

month - month of the year: "jan" to "dec"

day - day of the week: "mon" to "sun"


FFMC - FFMC index from the FWI system: 18.7 to 96.20
DMC - DMC index from the FWI system: 1.1 to 291.3
DC - DC index from the FWI system: 7.9 to 860.6
ISI - ISI index from the FWI system: 0.0 to 56.10
temp - temperature in Celsius degrees: 2.2 to 33.30
RH - relative humidity in %: 15.0 to 100
wind - wind speed in km/h: 0.40 to 9.40
rain - outside rain in mm/m2 : 0.0 to 6.4
area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, so it may make sense to model it with a logarithm transform; a sketch of this follows below).
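Because the area variable is so skewed, one common option (sketched below as an aside; it is not applied in the cells that follow, which instead bin area into a size category) is a log(1 + x) transform of the target:

# Optional sketch: log-transform the skewed 'area' target (not used later in this notebook)
import numpy as np
log_area = np.log1p(forest['area'])   # log(1 + area) compresses the long right tail towards 0
print(log_area.describe())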
let's go

In [6]: #looking into the dataset


forest

Out[6]: month day FFMC DMC DC ISI temp RH wind rain ... monthfeb monthjan monthjul monthjun monthmar monthmay monthnov monthoct monthsep size_category

0 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 ... 0 0 0 0 1 0 0 0 0 small

1 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 ... 0 0 0 0 0 0 0 1 0 small

2 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 ... 0 0 0 0 0 0 0 1 0 small

3 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 ... 0 0 0 0 1 0 0 0 0 small

4 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 ... 0 0 0 0 1 0 0 0 0 small

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

512 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 ... 0 0 0 0 0 0 0 0 0 large

513 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 ... 0 0 0 0 0 0 0 0 0 large

514 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 ... 0 0 0 0 0 0 0 0 0 large

515 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 ... 0 0 0 0 0 0 0 0 0 small

516 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 ... 0 0 0 0 0 0 1 0 0 small

517 rows × 31 columns

In [7]: # to know the pair wise correlation


forest.corr()

Out[7]: 28 × 28 pairwise correlation matrix over FFMC, DMC, DC, ISI, temp, RH, wind, rain, area and the day/month dummy columns (the rightmost columns of the display are cut off in this export). Among the stronger correlations visible: DMC-DC ≈ 0.68, FFMC-ISI ≈ 0.53, temp-RH ≈ -0.53, and DC-monthmar ≈ -0.65.

28 rows × 28 columns

In [8]: #returns the number of missing values in the data set


forest.isnull().sum()

Out[8]:
month 0
day 0
FFMC 0
DMC 0
DC 0
ISI 0
temp 0
RH 0
wind 0
rain 0
area 0
dayfri 0
daymon 0
daysat 0
daysun 0
daythu 0
daytue 0
daywed 0
monthapr 0
monthaug 0
monthdec 0
monthfeb 0
monthjan 0
monthjul 0
monthjun 0
monthmar 0
monthmay 0
monthnov 0
monthoct 0
monthsep 0
size_category 0
dtype: int64

From the above output we can clearly see that there are no null values.

In [9]: # to find duplicates


forest[forest.duplicated()].shape

Out[9]: (8, 31)

In [38]: # viewing the data with the duplicates removed
# note: the result is not assigned back, so `forest` itself still contains all 517 rows


forest.drop_duplicates()

Out[38]: month day FFMC DMC DC ISI temp RH wind rain ... monthfeb monthjan monthjul monthjun monthmar monthmay monthnov monthoct monthsep size_category

0 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 ... 0 0 0 0 1 0 0 0 0 small

1 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 ... 0 0 0 0 0 0 0 1 0 small

2 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 ... 0 0 0 0 0 0 0 1 0 small

3 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 ... 0 0 0 0 1 0 0 0 0 small

4 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 ... 0 0 0 0 1 0 0 0 0 small

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

512 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 ... 0 0 0 0 0 0 0 0 0 large

513 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 ... 0 0 0 0 0 0 0 0 0 large

514 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 ... 0 0 0 0 0 0 0 0 0 large

515 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 ... 0 0 0 0 0 0 0 0 0 small

516 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 ... 0 0 0 0 0 0 1 0 0 small

509 rows × 31 columns

From the output above we can see that dropping duplicates leaves 509 of the 517 rows; because the result was not assigned back, the full dataframe is still used in the following steps.
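If we actually wanted to discard the 8 duplicated rows, the result would have to be assigned back (or dropped in place), roughly as sketched below; this notebook instead keeps working with the original dataframe.

# Sketch: actually removing the duplicates (not what the notebook does above)
forest = forest.drop_duplicates().reset_index(drop=True)   # keeps the 509 unique rows
print(forest.shape)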

In [11]: forest.dropna() # drops rows containing missing values; since there are none, all 517 rows are returned

Out[11]: month day FFMC DMC DC ISI temp RH wind rain ... monthfeb monthjan monthjul monthjun monthmar monthmay monthnov monthoct monthsep size_category

0 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 ... 0 0 0 0 1 0 0 0 0 small

1 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 ... 0 0 0 0 0 0 0 1 0 small

2 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 ... 0 0 0 0 0 0 0 1 0 small

3 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 ... 0 0 0 0 1 0 0 0 0 small

4 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 ... 0 0 0 0 1 0 0 0 0 small

... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

512 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 ... 0 0 0 0 0 0 0 0 0 large

513 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 ... 0 0 0 0 0 0 0 0 0 large

514 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 ... 0 0 0 0 0 0 0 0 0 large

515 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 ... 0 0 0 0 0 0 0 0 0 small

516 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 ... 0 0 0 0 0 0 1 0 0 small

517 rows × 31 columns

In [12]: #creating the new dataframe 'fr'


fr=pd.DataFrame(forest)

We can observe that we have successfully created a new dataframe from the data held in the forest dataframe.
Creating a new dataframe lets us hold the data in two dataframes, which helps with model building and the other steps.

In [13]: # selecting the weather columns and the target by position: temp, RH, wind, rain, area


FR=fr.iloc[:,6:11]

In [14]: #exploring the new dataframe


FR.head()

Out[14]: temp RH wind rain area

0 8.2 51 6.7 0.0 0.0

1 18.0 33 0.9 0.0 0.0

2 14.6 33 1.3 0.0 0.0

3 8.3 97 4.0 0.2 0.0

4 11.4 99 1.8 0.0 0.0

head() shows the first 5 rows of the dataframe.

In [15]: FR.tail()

Out[15]: temp RH wind rain area

512 27.8 32 2.7 0.0 6.44

513 21.9 71 5.8 0.0 54.29

514 21.2 70 6.7 0.0 11.16

515 25.6 42 4.0 0.0 0.00

516 11.8 31 4.5 0.0 0.00

tail() shows the last 5 rows of the dataframe.

In [16]: # labelling fires by burned area: >= 5 ha as 'large', < 5 ha as 'small'


row_indexes=FR[FR['area']>=5].index

FR.loc[row_indexes,'Area']="large"

row_indexes=FR[FR['area']<5].index

FR.loc[row_indexes,'Area']="small"

FR.head()

Out[16]: temp RH wind rain area Area

0 8.2 51 6.7 0.0 0.0 small

1 18.0 33 0.9 0.0 0.0 small

2 14.6 33 1.3 0.0 0.0 small

3 8.3 97 4.0 0.2 0.0 small

4 11.4 99 1.8 0.0 0.0 small
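As an aside, the same labelling can be written more compactly with numpy; the sketch below produces an identical 'Area' column.

# Equivalent, more compact labelling using numpy
import numpy as np
FR['Area'] = np.where(FR['area'] >= 5, 'large', 'small')   # 'large' if burned area >= 5 ha, else 'small'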

In [17]: # dropping the numeric 'area' column now that the categorical 'Area' label has been created


FR=FR.drop('area', axis=1)

In [18]: FR.head()

Out[18]: temp RH wind rain Area

0 8.2 51 6.7 0.0 small

1 18.0 33 0.9 0.0 small

2 14.6 33 1.3 0.0 small

3 8.3 97 4.0 0.2 small

4 11.4 99 1.8 0.0 small

We can clearly see the changes applied to the dataset as we proceed.

In [19]: FR.corr()

Out[19]: temp RH wind rain

temp 1.000000 -0.527390 -0.227116 0.069491

RH -0.527390 1.000000 0.069410 0.099751

wind -0.227116 0.069410 1.000000 0.061119

rain 0.069491 0.099751 0.061119 1.000000

In [20]: # generate a scatter plot matrix of the selected features


sns.pairplot(FR)

Out[20]: <seaborn.axisgrid.PairGrid at 0x151154a0b20>

From the above plot we can examine how the parameters relate to one another.
As temperature increases, wind speed tends to decrease slightly.
Rainfall is close to zero for almost all observations, regardless of temperature.

In [21]: #using heat map to find correlation


sns.heatmap(FR.corr(), annot=True)

Out[21]: <AxesSubplot:>

In the heatmap, the diagonal elements are all 1 because each variable is perfectly correlated with itself.
The strongest relationship is between temperature and relative humidity: as temperature increases, RH decreases (correlation ≈ -0.53).
Various other insights can be read from the heatmap on closer inspection.

In [22]: #to find the univariate distribution


#dist plots are drawn on top of matplotlib by seaborn
sns.distplot(FR['temp'])

C:\Users\shyam\anaconda3\lib\site-packages\seaborn\distributions.py:2551: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
Out[22]: <AxesSubplot:xlabel='temp', ylabel='Density'>

The density rises with temperature up to a peak and then falls again at higher temperatures, i.e. the temperature distribution is roughly unimodal.

In [23]: #importing sklearn


from sklearn.model_selection import train_test_split
import sklearn.svm as svm #importing support vector machines from sklearn

Test and Train data

Typically, when you separate a data set into a training set and a testing set, most of the data is used for training and a smaller portion is used for testing.
Train data: training data is necessary to teach an ML algorithm.
Test data: test data helps you to validate the progress of the algorithm's training and adjust or optimize it for improved results.

In [24]: X= FR.iloc[:,0:4]
y= FR.iloc[:,4]
X_train, X_test, y_train, y_test= train_test_split(X,y, test_size=0.3)
#splitting the data into test and train

Here test_size=0.3 means that 30% of the data is held out for testing (and 70% is used for training).
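Note that the split above uses no random_state, so re-running the notebook produces a different partition and slightly different results. Below is a sketch of a reproducible, class-stratified alternative (the seed value 42 is an arbitrary choice):

# Sketch: reproducible, stratified split (alternative to the split above)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)   # 30% test, same class ratio in both sets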

In [25]: #looking into the train data


X_train

Out[25]: temp RH wind rain

275 5.1 61 4.9 0.0

322 16.8 28 4.0 0.0

390 7.5 71 6.3 0.0

234 17.7 25 3.1 0.0

282 4.2 51 4.0 0.0

... ... ... ... ...

375 15.4 57 4.9 0.0

423 22.3 48 4.0 0.0

33 17.7 39 3.6 0.0

339 20.4 41 1.8 0.0

304 11.3 94 4.9 0.0

361 rows × 4 columns

Training data is used to fit the model.

Here we are feeding a portion of our dataset to the model so it can learn the relationship between the features and the outcome.

In [26]: #looking into test data


X_test

Out[26]: temp RH wind rain

469 13.7 33 9.4 0.0

108 20.3 45 3.1 0.0

185 17.6 46 3.1 0.0

298 19.6 43 4.9 0.0

397 24.3 33 3.6 0.0

... ... ... ... ...

122 22.5 42 5.4 0.0

111 18.8 18 4.5 0.0

126 9.0 49 2.2 0.0

100 19.8 39 5.4 0.0

71 17.7 37 3.6 0.0

156 rows × 4 columns

In [27]: y_train

Out[27]:
275 large
322 small
390 large
234 large
282 small
...
375 large
423 small
33 small
339 small
304 small
Name: Area, Length: 361, dtype: object

In [28]: y_test

Out[28]:
469 large
108 small
185 large
298 small
397 small
...
122 small
111 small
126 small
100 small
71 small
Name: Area, Length: 156, dtype: object

In [29]: # importing the support vector classifier (SVC) from sklearn for the next operations
# SVC is well suited to small and medium-sized datasets such as this one
from sklearn.svm import SVC

Kernel
- Kernelized algorithms

The kernel helps determine the shape of the hyperplane (hyperplanes are decision boundaries that help classify the data points) and the decision boundary. We can set the value of the kernel parameter in the SVM code.
Kernelized algorithms are used in classification problems.
SVM (Support Vector Machines) and kernel PCA (Principal Component Analysis) support kernel operations.

In the SVM classifier, it is easy to have a linear hyperplane between two classes. But another question arises: do we need to add such features manually to obtain a hyperplane? No, the SVM
algorithm has a technique called the kernel trick. The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable
one. It is mostly useful in non-linear separation problems. Simply put, it performs some fairly complex data transformations and then finds a way to separate the data based on the labels or outputs you have defined.

Support vector machines are a tool which best serves the purpose of separating two classes. They are a kernel-based algorithm.

A kernel is a function that transforms the input data into a high-dimensional space where the problem can be solved.
A kernel function can be either linear or non-linear. Kernel methods are a class of algorithms for pattern analysis.
The primary function of the kernel is to take data as input and transform it into the required form of output.
In statistics, a "kernel" is the mapping function that represents 2-dimensional data in a 3-dimensional space.
A support vector machine uses the kernel trick to transform the data into a higher dimension and then tries to find an optimal hyperplane between the possible outputs.
Using a linear classifier on kernel-transformed data to solve a non-linear problem is what is known as the "kernel trick".
Kernels are used in statistics and mathematics generally, but they are most widely used in support vector machines (a comparison of the kernels available in sklearn's SVC is sketched below).
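Before fitting the individual models below, a compact way to compare kernels is cross-validation. The sketch below is an aside that reuses the X_train and y_train defined earlier and scores each kernel with 5-fold cross-validation:

# Sketch: compare the four kernels with 5-fold cross-validation on the training data
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    scores = cross_val_score(SVC(kernel=kernel), X_train, y_train, cv=5)
    print(kernel, "mean CV accuracy:", scores.mean().round(3))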

In [39]: # A kernel function takes data as input and transforms it into the required form for processing
# Kernel = rbf
model_rbf = SVC(kernel = "rbf")
model_rbf.fit(X_train,y_train)
pred_test_rbf = model_rbf.predict(X_test)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test,pred_test_rbf ))

np.mean(pred_test_rbf==y_test)

[[  0  40]
 [  0 116]]
Out[39]: 0.7435897435897436

The kernel is the function used to perform the mathematical transformation inside the support vector machine; it maps the input data according to the chosen kernel function.

The Radial Basis Function (RBF) kernel is the default kernel used within sklearn's SVM classification algorithm.
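The confusion matrix above also shows that this model predicts 'small' for every test sample, so the 0.74 accuracy simply mirrors the class imbalance. Two common remedies are feature scaling and class weighting; a sketch is below (an aside, and results will vary with the random split).

# Sketch: scale the features and weight the minority class before fitting the RBF SVM
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

balanced_rbf = make_pipeline(StandardScaler(),
                             SVC(kernel="rbf", class_weight="balanced"))
balanced_rbf.fit(X_train, y_train)
print(confusion_matrix(y_test, balanced_rbf.predict(X_test)))   # typically no longer predicts only 'small'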

Classification report

A classification report is a performance evaluation summary in machine learning.

Metrics Definition

Precision - Precision is defined as the ratio of true positives to the sum of true and false positives.

Recall - Recall is defined as the ratio of true positives to the sum of true positives and false negatives.
F1 Score - The F1 score is the weighted harmonic mean of precision and recall. The closer the value of the F1 score is to 1.0, the better the expected performance of the model.
Support - Support is the number of actual occurrences of the class in the dataset. It doesn't vary between models; it simply contextualizes the evaluation (these definitions are checked numerically in the sketch below).
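Using the RBF confusion matrix printed above ([[0, 40], [0, 116]], rows = actual, columns = predicted, label order large/small), these definitions can be checked by hand:

# Sketch: precision/recall/F1 for the 'small' class, computed from the matrix above
tp, fp, fn = 116, 40, 0                               # true positives, false positives, false negatives for 'small'
precision = tp / (tp + fp)                            # 116 / 156 ≈ 0.74
recall = tp / (tp + fn)                               # 116 / 116 = 1.00
f1 = 2 * precision * recall / (precision + recall)    # ≈ 0.85
print(round(precision, 2), round(recall, 2), round(f1, 2))

These values match the 'small' row of the classification report below.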

In [31]: print(classification_report(y_test,pred_test_rbf))

precision recall f1-score support

large 0.00 0.00 0.00 40


small 0.74 1.00 0.85 116

accuracy 0.74 156


macro avg 0.37 0.50 0.43 156
weighted avg 0.55 0.74 0.63 156

C:\Users\shyam\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
(warning repeated three times)

Linear kernel

The linear kernel is used when the data is linearly separable, that is, it can be separated using a single line. It is one of the most common kernels. It is mostly used when there are a large number of features in a
dataset; text classification is a typical example, since each word becomes a new feature, so the linear kernel is commonly used there.

In [32]: # Kernel = linear


# Note: the gamma parameter is not used by the linear kernel; it applies to the rbf, poly and sigmoid kernels
# (for those kernels, increasing gamma makes the model more prone to overfitting)
model_linear = SVC(kernel = "linear")
model_linear.fit(X_train,y_train)
pred_test_linear = model_linear.predict(X_test)
print(confusion_matrix(y_test,pred_test_linear ))

np.mean(pred_test_linear==y_test)

[[  0  40]
 [  0 116]]
Out[32]: 0.7435897435897436

In [33]: print(classification_report(y_test,pred_test_linear))

precision recall f1-score support

large 0.00 0.00 0.00 40


small 0.74 1.00 0.85 116

accuracy 0.74 156


macro avg 0.37 0.50 0.43 156
weighted avg 0.55 0.74 0.63 156

C:\Users\shyam\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
(warning repeated three times)

Sigmoid kernel

This kernel makes the SVM equivalent to a two-layer perceptron neural network; the same tanh function is used as the activation function for artificial neurons.

Sigmoid kernel: K(X, Y) = tanh(γ·XᵀY + r), which is similar to the sigmoid function used in logistic regression.

In [40]: # Kernel = sigmoid


model_sigmoid = SVC(kernel = "sigmoid") # applying the sigmoid kernel
model_sigmoid.fit(X_train,y_train)
pred_test_sigmoid = model_sigmoid.predict(X_test) # predicting on the test data
print(confusion_matrix(y_test,pred_test_sigmoid )) # printing the confusion matrix

np.mean(pred_test_sigmoid==y_test)

[[ 8 32]
 [37 79]]
Out[40]: 0.5576923076923077

The equation used here is K(X, Y) = tanh(γ·XᵀY + r), as given above.

In [35]: print(classification_report(y_test,pred_test_sigmoid))

precision recall f1-score support

large 0.18 0.20 0.19 40


small 0.71 0.68 0.70 116

accuracy 0.56 156


macro avg 0.44 0.44 0.44 156
weighted avg 0.57 0.56 0.57 156

Polynomial kernel

The polynomial kernel looks not only at the given features of the input samples to determine their similarity, but also at combinations of these features.
The feature space of a polynomial kernel is equivalent to that of polynomial regression, but without the combinatorial blow-up in the number of parameters to be learned (see the numerical check below).
When the input features are binary-valued (booleans), the derived features correspond to logical conjunctions of the input features.
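To make the feature-space equivalence concrete, the small sketch below (an illustrative aside with two hand-picked 2-dimensional points) checks numerically that the degree-2 polynomial kernel value (a·b + 1)² equals the dot product of explicitly expanded polynomial features:

# Sketch: (a.b + 1)^2 equals the dot product of explicit degree-2 polynomial features
import numpy as np

def phi(v):
    x1, x2 = v
    # explicit feature map for the degree-2 polynomial kernel with gamma = 1, coef0 = 1
    return np.array([1, np.sqrt(2)*x1, np.sqrt(2)*x2, x1**2, x2**2, np.sqrt(2)*x1*x2])

a, b = np.array([1.0, 2.0]), np.array([3.0, 4.0])
kernel_value = (a @ b + 1) ** 2       # the kernel trick: phi is never formed explicitly
explicit_value = phi(a) @ phi(b)      # the same number, computed in the explicit feature space
print(kernel_value, explicit_value)   # both 144.0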

In [41]: model_poly = SVC(kernel = "poly",C=2) # applying the polynomial kernel with C=2


model_poly.fit(X_train,y_train) # fitting the model on the training data
pred_test_poly = model_poly.predict(X_test)
print(confusion_matrix(y_test,pred_test_poly )) # printing the confusion matrix

np.mean(pred_test_poly==y_test)

[[  0  40]
 [  0 116]]
Out[41]: 0.7435897435897436

Formula for the polynomial kernel (as used by sklearn): K(X, Y) = (γ·XᵀY + r)^d, where d is the degree.

In [37]: print(classification_report(y_test,pred_test_poly))

precision recall f1-score support

large 0.00 0.00 0.00 40


small 0.74 1.00 0.85 116

accuracy 0.74 156


macro avg 0.37 0.50 0.43 156
weighted avg 0.55 0.74 0.63 156

C:\Users\shyam\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
(warning repeated three times)

Pros and Cons of SVM


Pros:

1. It works well in high-dimensional spaces.
2. It performs well even when the number of features exceeds the number of training samples.
3. It works best when the classes are separable (not overlapping).
4. The hyperplane is determined only by the support vectors, so the impact of outliers is limited.
5. SVM is well suited to extreme-case binary classification.

Cons:

1. For larger datasets, it requires a large amount of time to train.
2. It does not perform as well when the classes overlap.
3. Selecting appropriate hyperparameters for the SVM, so that it generalizes well, can be difficult (a grid-search sketch follows below).
4. Selecting the appropriate kernel function can be tricky sometimes.
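Since hyperparameter selection is listed as a drawback, a standard remedy is a cross-validated grid search. The sketch below is an illustrative aside over C, gamma and the kernel (the candidate values are arbitrary choices, not tuned for this dataset):

# Sketch: tune C, gamma and the kernel with a cross-validated grid search
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1], "kernel": ["rbf", "poly", "sigmoid"]}
search = GridSearchCV(SVC(class_weight="balanced"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)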

Thank you
