Predicting Forest Fire Damage with SVM
Support vectors are the data points nearest to the hyperplane: the points of a data set that, if removed, would alter the position of the dividing hyperplane. Because of this, they can be considered the critical elements of a data set.
Hyperplane
Imagine a hyperplane as a line that linearly separates and classifies a set of data.
The further away from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it.
The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features (as shown in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane.
We always create a hyperplane with the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of each class.
So when new testing data is added, whichever side of the hyperplane it lands on decides the class that we assign to it.
In brief, we can define the hyperplane as follows: there can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps classify the data points. This best boundary is known as the hyperplane of the SVM.
How to find the right hyperplane
To find the right hyperplane it is important to set the margin appropriately.
The distance between the hyperplane and the nearest data point from either set is known as the margin
The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
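To make the margin concrete, here is a minimal sketch on toy 2-D data (the data and all names here are illustrative, not from this notebook): the fitted classifier exposes its support vectors, and for a linear kernel the margin width is 2/||w||.

import numpy as np
from sklearn.svm import SVC

# two linearly separable clusters in 2-D (toy data)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # large C approximates a hard margin
clf.fit(X, y)

print(clf.support_vectors_)        # the critical points that pin down the hyperplane
w = clf.coef_[0]
print(2 / np.linalg.norm(w))       # width of the margin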
SVM
Support Vector Machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression problems; it is mostly used in classification problems.
An SVM model is basically a representation of different classes separated by a hyperplane in multidimensional space.
It can solve linear and non-linear problems and work well for many practical problems
The idea of SVM is simple: The algorithm creates a line or a hyperplane which separates the data into classes
The hyperplane will be generated in an iterative manner by SVM so that the error can be minimized
The goal of SVM is to divide the datasets into classes to find a maximum marginal hyperplane (MMH)
Linear SVM: Linear SVM is used for linearly separable data; if a dataset can be classified into two classes using a single straight line, such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data; if a dataset cannot be classified using a straight line, such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier. A quick sketch of the difference follows.
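The sketch below uses scikit-learn's make_circles toy data (illustrative, not this notebook's dataset): a linear kernel cannot separate concentric rings, while an RBF kernel can.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.08, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, clf.score(X_te, y_te))
# the linear kernel scores near chance (~0.5); rbf separates the rings almost perfectly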
Applications of SVM
SVMs are helpful in text and hypertext categorization, as their application can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings. Some methods for shallow
semantic parsing are based on support vector machines.
Classification of images can also be performed using SVMs. Experimental results show that SVMs achieve significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback. This is also true for image segmentation systems.
Classification of satellite data like SAR data using supervised SVM.
Hand-written characters can be recognized using SVM.
The SVM algorithm has been widely applied in the biological and other sciences. SVMs have been used to classify proteins, with up to 90% of compounds classified correctly. Permutation tests based on SVM weights have been suggested as a mechanism for interpreting SVM models. Post-hoc interpretation of support-vector machine models, in order to identify the features the model uses to make predictions, is a relatively new area of research with special significance in the biological sciences.
The data we have is about forest fires; let's try to predict the damage caused by a given fire.
We will use a classification model to make our prediction.
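The cell that loads the data is not shown in this export; presumably it was along these lines (the file name is an assumption):

import pandas as pd
forest = pd.read_csv("forestfires.csv")  # file name/path assumed; adjust to your copy
forest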
Attribute information:
Out[6]: month day FFMC DMC DC ISI temp RH wind rain ... monthfeb monthjan monthjul monthjun monthmar monthmay monthnov monthoct monthsep size_category
0 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 ... 0 0 0 0 1 0 0 0 0 small
1 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 ... 0 0 0 0 0 0 0 1 0 small
2 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 ... 0 0 0 0 0 0 0 1 0 small
3 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 ... 0 0 0 0 1 0 0 0 0 small
4 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 ... 0 0 0 0 1 0 0 0 0 small
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
512 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 ... 0 0 0 0 0 0 0 0 0 large
513 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 ... 0 0 0 0 0 0 0 0 0 large
514 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 ... 0 0 0 0 0 0 0 0 0 large
515 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 ... 0 0 0 0 0 0 0 0 0 small
516 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 ... 0 0 0 0 0 0 1 0 0 small
Out[7]: the 28 × 28 correlation matrix of the numeric columns (presumably forest.corr()); the right-most columns are truncated in this export. Among the stronger relationships: DMC and DC are positively correlated (0.68), DC and monthmar are negatively correlated (-0.65), and temp and RH are negatively correlated (-0.53).
28 rows × 28 columns
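The input of the next cell is not shown in this export; given the output below, it was presumably a missing-value check:

forest.isnull().sum()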
Out[8]:
month 0
day 0
FFMC 0
DMC 0
DC 0
ISI 0
temp 0
RH 0
wind 0
rain 0
area 0
dayfri 0
daymon 0
daysat 0
daysun 0
daythu 0
daytue 0
daywed 0
monthapr 0
monthaug 0
monthdec 0
monthfeb 0
monthjan 0
monthjul 0
monthjun 0
monthmar 0
monthmay 0
monthnov 0
monthoct 0
monthsep 0
size_category 0
dtype: int64
From the output above we can clearly see that there are no null values in the data.
Out[9]: (8, 31)
This shape is consistent with a duplicate check (e.g. forest[forest.duplicated()].shape): there are 8 duplicate rows across the 31 columns.
Out[38]: month day FFMC DMC DC ISI temp RH wind rain ... monthfeb monthjan monthjul monthjun monthmar monthmay monthnov monthoct monthsep size_category
0 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 ... 0 0 0 0 1 0 0 0 0 small
1 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 ... 0 0 0 0 0 0 0 1 0 small
2 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 ... 0 0 0 0 0 0 0 1 0 small
3 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 ... 0 0 0 0 1 0 0 0 0 small
4 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 ... 0 0 0 0 1 0 0 0 0 small
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
512 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 ... 0 0 0 0 0 0 0 0 0 large
513 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 ... 0 0 0 0 0 0 0 0 0 large
514 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 ... 0 0 0 0 0 0 0 0 0 large
515 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 ... 0 0 0 0 0 0 0 0 0 small
516 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 ... 0 0 0 0 0 0 1 0 0 small
From the data above we can see that the code drops the duplicate rows found earlier.
In [11]: forest.dropna()
Out[11]: month day FFMC DMC DC ISI temp RH wind rain ... monthfeb monthjan monthjul monthjun monthmar monthmay monthnov monthoct monthsep size_category
0 mar fri 86.2 26.2 94.3 5.1 8.2 51 6.7 0.0 ... 0 0 0 0 1 0 0 0 0 small
1 oct tue 90.6 35.4 669.1 6.7 18.0 33 0.9 0.0 ... 0 0 0 0 0 0 0 1 0 small
2 oct sat 90.6 43.7 686.9 6.7 14.6 33 1.3 0.0 ... 0 0 0 0 0 0 0 1 0 small
3 mar fri 91.7 33.3 77.5 9.0 8.3 97 4.0 0.2 ... 0 0 0 0 1 0 0 0 0 small
4 mar sun 89.3 51.3 102.2 9.6 11.4 99 1.8 0.0 ... 0 0 0 0 1 0 0 0 0 small
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
512 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 ... 0 0 0 0 0 0 0 0 0 large
513 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 ... 0 0 0 0 0 0 0 0 0 large
514 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 ... 0 0 0 0 0 0 0 0 0 large
515 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 ... 0 0 0 0 0 0 0 0 0 small
516 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 ... 0 0 0 0 0 0 1 0 0 small
We can observe that we have successfully created a new dataframe for the data we had in the forest dataframe.
Creating a new dataframe lets us hold the data in two data frames, which helps us in model building and the later steps.
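The cell that created the new dataframe is not shown in this export; given Out[9] and Out[38], it presumably dropped the duplicates into a fresh dataframe, along these lines:

FR = forest.drop_duplicates()  # new dataframe; the original forest is left untouched
FR.shape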
In [15]: FR.tail()
# label each fire by its burned area; the 'large' index line was missing from the export
# and is reconstructed here as the complement of the 'small' condition below
row_indexes = FR[FR['area'] >= 5].index
FR.loc[row_indexes, 'Area'] = "large"
row_indexes = FR[FR['area'] < 5].index
FR.loc[row_indexes, 'Area'] = "small"
FR.head()
In [18]: FR.head()
We can clearly notice the changes applied to the dataset as we proceed further.
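As a side note, the same labelling can be done in a single line (a sketch, assuming numpy is imported as np):

FR['Area'] = np.where(FR['area'] >= 5, 'large', 'small')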
In [19]: FR.corr()
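The plotting cell is not shown in this export; given the PairGrid object in Out[20], it was presumably a seaborn pair plot, along the lines of:

import seaborn as sns
sns.pairplot(FR)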
Out[20]: <seaborn.axisgrid.PairGrid at 0x151154a0b20>
From the plot above we can clearly see which parameters move as the heat increases:
as temperature increases, wind tends to decrease;
rain is also drastically reduced at higher temperatures.
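The next plotting cell is likewise not shown; given Out[21] and the commentary below, it was presumably a correlation heatmap, along the lines of:

sns.heatmap(FR.corr())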
Out[21]: <AxesSubplot:>
In the heatmap, the diagonal elements are each variable's correlation with itself, which is always 1 and carries no information.
Off the diagonal we can see, for example, that as temperature increases, rain decreases.
Various other insights can be read off the heatmap with careful observation.
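The next plotting cell is also missing from the export; the FutureWarning and the axis labels below indicate it used seaborn's deprecated distplot on the temperature column, presumably:

sns.distplot(FR['temp'])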
C:\Users\shyam\anaconda3\lib\site-packages\seaborn\distributions.py:2551: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
Out[22]: <AxesSubplot:xlabel='temp', ylabel='Density'>
It is clearly observable that the density increases with temperature up to a point, after which the density decreases as temperature rises further.
Typically, when you separate a data set into a training set and a testing set, most of the data is used for training, and a smaller portion of the data is used for testing.
Train data: Training data is necessary to teach an ML algorithm.
Test data: Test data helps you to validate the progress of the algorithm's training and adjust or optimize it for improved results.
In [24]: from sklearn.model_selection import train_test_split  # import was not shown in the export
# features: the first four columns of FR; target: the fifth column
# (the 'Area' label, judging from the y_train/y_test outputs below)
X = FR.iloc[:, 0:4]
y = FR.iloc[:, 4]
# splitting the data into test and train
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
Here the test size of 0.3 means that 30% of the data is held out for testing.
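Because the classes are imbalanced (the test set below contains 116 small vs 40 large fires), a stratified split with a fixed seed is often a safer choice; a sketch:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
# stratify=y keeps the small/large ratio the same in both splits;
# random_state makes the split reproducible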
In [27]: y_train
Out[27]:
275 large
322 small
390 large
234 large
282 small
...
375 large
423 small
33 small
339 small
304 small
Name: Area, Length: 361, dtype: object
In [28]: y_test
Out[28]:
469 large
108 small
185 large
298 small
397 small
...
122 small
111 small
126 small
100 small
71 small
Name: Area, Length: 156, dtype: object
In [29]: # importing the support vector classifier (SVC) from sklearn for the next operations
# SVC works well even when the number of features is large
from sklearn.svm import SVC
Kernel
- Kernelized algorithms
The kernel helps determine the shape of the hyperplane (hyperplanes are decision boundaries that help classify the data points) and the decision boundary. We can set the value of the kernel parameter in the SVM code.
Kernelized algorithms are used in classification problems.
SVM (Support Vector Machines) and PCA (Principal Component Analysis) support kernel operations.
In the SVM classifier it is easy to have a linear hyperplane between two classes. But another burning question arises: do we need to add such features manually to obtain a hyperplane? No; the SVM algorithm has a technique called the kernel trick. The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable problem. It is mostly useful in non-linear separation problems. Simply put, it performs some extremely complex data transformations, then finds the process to separate the data based on the labels or outputs you have defined.
Support vector machines are a tool which best serves the purpose of separating two classes. They are a kernel-based algorithm.
A kernel refers to a function that transforms the input data into a high-dimensional space where the problem can be solved.
A kernel function can be either linear or non-linear. Kernel methods are a class of algorithms for pattern analysis.
The primary function of the kernel is to take data as input and transform it into the required form of output.
In statistics, a "kernel" is the mapping function that calculates and represents values of 2-dimensional data in a 3-dimensional space.
A support vector machine uses the kernel trick, which transforms the data to a higher dimension, and then tries to find an optimal hyperplane between the possible outputs.
This method of analysing data in support vector machine algorithms, using a linear classifier to solve non-linear problems, is known as the "kernel trick".
Kernels are used in statistics and math generally, but they are most widely and most commonly used in support vector machines.
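The sections below try each kernel one at a time; equivalently, one could loop over the kernel parameter of SVC (a sketch reusing the train/test split defined above):

for k in ("linear", "rbf", "poly", "sigmoid"):
    model = SVC(kernel=k)
    model.fit(X_train, y_train)
    print(k, model.score(X_test, y_test))  # accuracy on the held-out test data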
In [39]: # a kernel function is a method that takes data as input and transforms it into the required form
# Kernel = rbf
model_rbf = SVC(kernel="rbf")
model_rbf.fit(X_train, y_train)
pred_test_rbf = model_rbf.predict(X_test)
import numpy as np  # needed for np.mean below; this import was not shown in the export
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, pred_test_rbf))
np.mean(pred_test_rbf == y_test)
[[ 0 40]
[ 0 116]]
Out[39]: 0.7435897435897436
The kernel is the function used to perform certain mathematical operations in the support vector machine; it transforms the data according to the chosen function.
The Radial Basis Function kernel, or RBF kernel, is the default kernel used within sklearn's SVM classification algorithm.
Note that the confusion matrix above shows the model predicting "small" for every test sample: the accuracy of 0.74 is simply the share of the majority class, which also explains the UndefinedMetricWarning messages further below.
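SVMs are sensitive to feature scale, and unscaled inputs are a common cause of this kind of degenerate one-class prediction. A standard remedy (a sketch, not part of the original notebook) is to standardize the features inside a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# scale each feature to zero mean / unit variance before the SVC sees it
model_scaled = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model_scaled.fit(X_train, y_train)
print(model_scaled.score(X_test, y_test))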
Classification report
A classification report is a performance evaluation metric in machine learning.
Metrics Definition
Precision - Precision is defined as the ratio of true positives to the sum of true and false positives.
Recall - Recall is defined as the ratio of true positives to the sum of true positives and false negatives.
F1 Score - The F1 score is the weighted harmonic mean of precision and recall. The closer the value of the F1 score is to 1.0, the better the expected performance of the model.
Support - Support is the number of actual occurrences of the class in the dataset. It doesn't vary between models; it simply contextualizes the performance evaluation.
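As a worked illustration (toy labels, not this notebook's data): with 8 true positives, 2 false positives and 4 false negatives, precision = 8/(8+2) = 0.8, recall = 8/(8+4) ≈ 0.67, and F1 = 2·(0.8·0.67)/(0.8+0.67) ≈ 0.73. The same numbers can be checked in code:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1]*12 + [0]*8                   # 12 positives, 8 negatives
y_pred = [1]*8 + [0]*4 + [1]*2 + [0]*6    # 8 TP, 4 FN, 2 FP, 6 TN
print(precision_score(y_true, y_pred))    # 0.8
print(recall_score(y_true, y_pred))       # 0.666...
print(f1_score(y_true, y_pred))           # ~0.727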
In [31]: print(classification_report(y_test,pred_test_rbf))
C:\Users\shyam\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
(the same warning is emitted twice more, once per averaging mode)
Linear kernel
The linear kernel is used when the data is linearly separable, that is, when it can be separated using a single line. It is one of the most common kernels. It is mostly used when there are a large number of features in a particular data set. One example with a lot of features is text classification, as each letter (or token) is a new feature; so we mostly use the linear kernel in text classification.
# the model-fitting lines were missing from this export; reconstructed to mirror the RBF cell
model_linear = SVC(kernel="linear")
model_linear.fit(X_train, y_train)
pred_test_linear = model_linear.predict(X_test)
print(confusion_matrix(y_test, pred_test_linear))
np.mean(pred_test_linear == y_test)
[[ 0 40]
[ 0 116]]
Out[32]: 0.7435897435897436
In [33]: print(classification_report(y_test,pred_test_linear))
C:\Users\shyam\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
(the same warning is emitted twice more, once per averaging mode)
Sigmoid kernel
This function is equivalent to a two-layer perceptron model of a neural network, where it is used as the activation function for artificial neurons. The equation is
K(X, Y) = tanh(γ · XᵀY + r)
which is similar to the sigmoid function in logistic regression.
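The formula can be checked directly against sklearn's pairwise implementation (toy numbers, illustrative only):

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[0.5, 1.0]])
gamma, r = 0.1, 0.0

manual = np.tanh(gamma * X @ Y.T + r)  # K(X, Y) = tanh(γ·XᵀY + r)
print(np.allclose(manual, sigmoid_kernel(X, Y, gamma=gamma, coef0=r)))  # True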
# the model-fitting lines were missing from this export; reconstructed to mirror the RBF cell
model_sigmoid = SVC(kernel="sigmoid")
model_sigmoid.fit(X_train, y_train)
pred_test_sigmoid = model_sigmoid.predict(X_test)
print(confusion_matrix(y_test, pred_test_sigmoid))
np.mean(pred_test_sigmoid == y_test)
[[ 8 32]
[37 79]]
Out[40]: 0.5576923076923077
In [35]: print(classification_report(y_test,pred_test_sigmoid))
Polynomial kernel
The polynomial kernel looks not only at the given features of the input samples to determine their similarity, but also at combinations of these features.
The feature space of a polynomial kernel is equivalent to that of polynomial regression, but without the combinatorial blow-up in the number of parameters to be learned.
When the input features are binary-valued (booleans), the features correspond to logical conjunctions of input features.
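For reference, the polynomial kernel is K(x, y) = (γ · xᵀy + r)^d, where d is the degree; a quick check against sklearn's pairwise implementation (toy numbers, illustrative only):

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[1.0, 2.0], [3.0, 4.0]])
gamma, r, d = 1.0, 1.0, 3

manual = (gamma * X @ X.T + r) ** d
print(np.allclose(manual, polynomial_kernel(X, gamma=gamma, coef0=r, degree=d)))  # True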
# the model-fitting lines were missing from this export; reconstructed to mirror the RBF cell
model_poly = SVC(kernel="poly")
model_poly.fit(X_train, y_train)
pred_test_poly = model_poly.predict(X_test)
print(confusion_matrix(y_test, pred_test_poly))
np.mean(pred_test_poly == y_test)
[[ 0 40]
[ 0 116]]
Out[41]: 0.7435897435897436
In [37]: print(classification_report(y_test,pred_test_poly))
C:\Users\shyam\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1248: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
(the same warning is emitted twice more, once per averaging mode)
Cons:
SVMs do not scale well to very large datasets (training is slow), they are sensitive to feature scaling and to the choice of kernel and its parameters, and they do not directly provide probability estimates.
Thank you