
SVM (Support Vector Machine) for Classification
SVM for Classification and Regression

Aditya Kumar · Published in Towards Data Science · 11 min read · Jul 8, 2020

SVM: Support Vector Machine is a supervised learning algorithm for classification. It finds the hyperplane (a line, in two dimensions) that separates two categories with the largest possible margin. SVM is also known as a support vector network.

Consider an example where we have cats and dogs together.

Dogs and Cats (Image by Author)

We want our model to differentiate between cats and dogs.


SVM Terminology (Image by Author)

There are many cases where the classes cannot be separated as simply as shown above. In those cases, the data needs to be mapped from its original space into a higher-dimensional space where a separating hyperplane exists. The function that computes this mapping implicitly, through the similarity between pairs of observations, is called a kernel. In effect, it adds dimensions to the data so that the classes become easier to separate.

Three commonly used kernels are listed below; a short sketch comparing them follows the list.

1. Linear Kernels

2. Polynomial Kernels

3. Radial Basis Function Kernel
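As a quick illustration of how the kernel choice changes the model, here is a minimal sketch using scikit-learn's SVC on a toy two-class dataset (make_moons and the default parameters here are illustrative assumptions, not this article's data):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy data that a straight line cannot separate well.
X, y = make_moons(noise=0.2, random_state=0)

# Fit one SVC per kernel type and compare training accuracy.
for kernel in ('linear', 'poly', 'rbf'):
    clf = SVC(kernel=kernel)
    clf.fit(X, y)
    print(kernel, clf.score(X, y))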

In practice, it is rare to find a perfectly straight separating hyperplane. Consider the image below, where the points are mixed together: you cannot separate them with a straight line in two dimensions.
Image by Author

Some use cases of SVM:

1. Face detection

2. Handwriting recognition

3. Image Classifications

4. Text and Hypertext Categorization

Let's first check how SVM works for regression.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/automobileEDA.csv'
df = pd.read_csv(path)
df.head()


Image by Author
df.head() shows the first 5 rows of every column. We can use df.tail() to get the last 5 rows and, similarly, df.head(10) to get the first 10 rows.

The data is about cars, and we need to predict the price of a car from the other columns.

We will use Support Vector Regression (SVR) to get the price of the car.

df.dtypes
symboling int64
normalized-losses int64
make object
aspiration object
num-of-doors object
body-style object
drive-wheels object
engine-location object
wheel-base float64
length float64
width float64
height float64
curb-weight int64
engine-type object
num-of-cylinders object
engine-size int64
fuel-system object
bore float64
stroke float64
compression-ratio float64
horsepower float64
peak-rpm float64
city-mpg int64
highway-mpg int64
price float64
city-L/100km float64
horsepower-binned object
diesel int64
gas int64
dtype: object

df.dtypes gives the data type of each column.

df.describe()
Image by Author

In the above data frame, some of the columns are not numeric. We will keep only the numeric columns and convert all of them to float.

df.dtypes
for x in df:
    if df[x].dtypes == "int64":
        df[x] = df[x].astype(float)
        print(df[x].dtypes)

Out:

float64
float64
float64
float64
float64
float64
float64
float64

Preparing the data: as with a classification task, in this section we divide our data into attributes and labels, and then into training and test sets. We create two data sets: y, containing the price, and X, containing everything except the price. Since our data frame has several columns in object format, for this analysis we remove all columns of object type, and we fill the remaining NaN values with the column mean.
df = df.select_dtypes(exclude=['object'])
df=df.fillna(df.mean())
X = df.drop('price',axis=1)
y = df['price']

Here the X variable contains all the columns from the dataset except the ‘price’ column, which is the label. The y variable contains the values of the ‘price’ column, so X holds the attribute set and y holds the corresponding labels.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Training SVM

from sklearn.svm import SVR

We create an object svr using the SVR class, with a linear kernel.

svr = SVR(kernel = 'linear',C = 1000)

In order for SVM to work efficiently, we standardize our data. SVM works with distances between points, so all features need to be on the same scale.

from sklearn.preprocessing import StandardScaler

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
X_test_std
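As an aside, a common way to keep the fit-then-transform order from slipping out of sync is to chain the scaler and the model in a scikit-learn Pipeline. This is a minimal sketch, not part of the original notebook; svr_pipe and pipe_pred are illustrative names:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# The pipeline fits the scaler on the training data and reapplies the
# same transformation automatically whenever it predicts.
svr_pipe = make_pipeline(StandardScaler(), SVR(kernel='linear', C=1000))
svr_pipe.fit(X_train, y_train)
pipe_pred = svr_pipe.predict(X_test)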

Out:
array([[ 0.17453157, -0.7473421 , -0.70428107, ..., -1.50033307,
        -0.29738086,  0.29738086],
       [-1.42118568, -1.74885637,  0.63398001, ..., -0.82735207,
         3.36269123, -3.36269123],
       ...,
       [ 0.9723902 , -1.20257586, -0.83980118, ..., -0.93047013,
        -0.29738086,  0.29738086]])

Our data has been standardized.

svr.fit(X_train_std,y_train)
y_test_pred = svr.predict(X_test_std)
y_train_pred = svr.predict(X_train_std)

Check our predicted values

y_test_pred

Out:

array([ 5957.14966842, 14468.92070095, 20448.68298715, 21478.92571603,
       20124.68107731,  9079.70352739, 15827.33391626,  6005.66841863,
       16069.42347072,  7254.92359917,  9776.48212662, 20670.10877046,
       12433.67581456, 11143.00652033, 13539.9142024 , 19226.71716277,
        6548.30452181, 20166.24495046,  8818.8454487 ,  6470.62142339,
       12170.16325257, 10461.81392719, 30525.17977575,  6766.09198268,
       13432.63305394, 20616.74081099,  8883.9869784 ,  8581.12259715,
       14982.54291298, 30430.16956057, 28911.77849742, 13980.97058004,
        7824.82285653, 20515.39293649,  8003.99866007, 11357.89548277,
       13718.79721617, 16467.89155357,  9304.60919156, 18705.27852977,
        6421.02399024])

Time to check model performance.

from sklearn.metrics import r2_score

r2_score(y_train, y_train_pred)

Out:

0.8510467833352241

r2_score(y_test, y_test_pred)

Out:

0.720783662521954

Our R² score is 0.72 on the test data and 0.85 on the training data, which are reasonable values.
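R² alone does not tell us how large the errors are in price units. As a quick sketch (not in the original notebook), we can also report the mean absolute error and the root mean squared error:

from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

print(mean_absolute_error(y_test, y_test_pred))           # average error in price units
print(np.sqrt(mean_squared_error(y_test, y_test_pred)))   # RMSE penalizes large errors more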

import seaborn as sns

plt.figure(figsize=(5, 7))
# Note: seaborn's distplot is deprecated in recent versions; kdeplot is the modern equivalent.
ax = sns.distplot(y, hist=False, color="r", label="Actual Value")
sns.distplot(y_test_pred, hist=False, color="b", label="Fitted Values", ax=ax)
plt.title('Actual vs Fitted Values for Price')
plt.show()
plt.close()
Actual vs Fitted Value

The above graph compares the actual and the fitted (predicted) price values.

SVM for classification

Here we will use the diabetes data that I used in my earlier story on KNN: https://towardsdatascience.com/knn-algorithm-what-when-why-how-41405c16c36f

Let's predict on the same dataset using SVM for classification.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data = pd.read_csv("../input/diabetes.csv")
data.head()
Data from the dataset.

We read the CSV file with pd.read_csv, and head() shows the first 5 rows. There are some features whose values cannot be zero: for example, a Glucose value cannot be 0 for a living person. Similarly, blood pressure, skin thickness, insulin, and BMI cannot be zero for a human.

non_zero = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
for column in non_zero:
    data[column] = data[column].replace(0, np.NaN)
    mean = int(data[column].mean(skipna=True))
    data[column] = data[column].replace(np.NaN, mean)
    print(data[column])
from sklearn.model_selection import train_test_split

X = data.iloc[:, 0:8]
y = data.iloc[:, 8]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
X.head()

Image by Author

For X we take all the rows of columns 0 to 7 (the features). For y we take all the rows of column 8 (the Outcome label).
train_test_split, imported at the start of the program, is given test_size=0.2, which means 20% of the data is kept aside to test the model at a later stage; stratify=y keeps the class proportions the same in both splits.

from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

from sklearn import svm

# Train a linear SVM classifier on the standardized training data.
svm1 = svm.SVC(kernel='linear', C=0.01)
svm1.fit(X_train, y_train)
y_train_pred = svm1.predict(X_train)
y_test_pred = svm1.predict(X_test)
y_test_pred

Out:

array([0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1,
       0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0])

We have an array of predictions, but we need to evaluate the model to check its accuracy. Let's start with the confusion matrix.

from sklearn.metrics import accuracy_score, confusion_matrix

confusion_matrix(y_test, y_test_pred)

Out:

array([[92,  8],
       [26, 28]])

In the confusion matrix, the diagonal entries 92 and 28 are the correct predictions, while the off-diagonal entries 8 and 26 are the predictions we missed. That corresponds to an accuracy of (92 + 28) / 154 ≈ 0.78.

We will check the accuracy score

accuracy_score(y_test,y_test_pred)

Out:

0.7792207792207793

We have an accuracy score of 0.78.
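Accuracy alone can be misleading when the classes are imbalanced (here roughly 100 non-diabetic versus 54 diabetic test samples), so a per-class report is worth a look. This is a quick sketch, not part of the original notebook:

from sklearn.metrics import classification_report

# Precision, recall, and F1-score for each class.
print(classification_report(y_test, y_test_pred))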

Let’s figure out the difference between the actual and predicted values.

df=pd.DataFrame({'Actual':y_test, 'Predicted':y_test_pred})
df
Actual vs Predicted

We created our linear model with C = 0.01. But how do we know it is the best value? One option is to change it manually: assign different values and run the code one at a time. But that process is lengthy and time-consuming.

Instead, we will use a grid search: we supply a dictionary of candidate values for C, and the search tells us which value performs best for the model. To do so we need to import GridSearchCV.

from sklearn.model_selection import GridSearchCV


param = {'C':(0,0.01,0.5,0.1,1,2,5,10,50,100,500,1000)}

Here we have defined twelve candidate values for C. (Note that scikit-learn requires C > 0, so the 0 entry will fail its fits and simply be scored as NaN during the search.)

svm1 = svm.SVC(kernel='linear')

svm_grid = GridSearchCV(svm1, param, n_jobs=1, cv=10, verbose=1, scoring='accuracy')

cv=10 performs 10-fold cross-validation, and verbose=1 makes the search print progress messages.

svm_grid.fit(X_train, y_train)
[Parallel(n_jobs=1)]: Done 120 out of 120 | elapsed: 43.8s finished

Out:

GridSearchCV(cv=10, estimator=SVC(kernel='linear'), n_jobs=1,
             param_grid={'C': (0, 0.01, 0.5, 0.1, 1, 2, 5, 10, 50, 100, 500, 1000)},
             scoring='accuracy', verbose=1)

svm_grid.best_params_

Out:

{'C': 0.1}

This gives us the best C value for the model.
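To see how every candidate C fared, not just the winner, we can inspect cv_results_, a standard GridSearchCV attribute (a quick sketch, not in the original notebook):

import pandas as pd

results = pd.DataFrame(svm_grid.cv_results_)
print(results[['param_C', 'mean_test_score']])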

linsvm_clf = svm_grid.best_estimator_
accuracy_score(y_test, linsvm_clf.predict(X_test))

Out:

0.7597402597402597

This is the best test accuracy we can get from the above C values.

In a similar way we can try kernel='poly'. For 'rbf' we also need to define gamma values, for example:

param = {'C': (0, 0.01, 0.5, 0.1, 1, 2, 5, 10, 50, 100, 500, 1000),
         'gamma': (0, 0.1, 0.2, 2, 10)}

or with a single fixed value of gamma and C:

from sklearn import svm
svm1 = svm.SVC(kernel='rbf', gamma=0.5, C=0.01)
svm1.fit(X_train, y_train)
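Putting that together as a runnable sketch (assuming the same X_train and y_train as above; the 0 entries are dropped here because scikit-learn requires C > 0 and gamma > 0):

param_rbf = {'C': (0.01, 0.5, 0.1, 1, 2, 5, 10, 50, 100, 500, 1000),
             'gamma': (0.1, 0.2, 2, 10)}
svm_grid_rbf = GridSearchCV(svm.SVC(kernel='rbf'), param_rbf, cv=10, scoring='accuracy')
svm_grid_rbf.fit(X_train, y_train)
print(svm_grid_rbf.best_params_)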

The above code can be checked at https://www.kaggle.com/adityakumar529/svm-claasifier.

You can check my other code on Kaggle (Aditya Kumar | Kaggle, www.kaggle.com) and on GitHub (adityakumar529/Coursera_Capstone, github.com).
