Experiment No. 3
A.1 Aim:
To implement a Support Vector Machine.
A.2 Prerequisite:
Python Basic Concepts
A.3 Outcome:
Students will be able to implement a Support Vector Machine.
A.4 Theory:
Machine Learning, being a subset of Artificial Intelligence (AI), has been playing a
dominant role in our daily lives. Data science engineers and developers working in
various domains are widely using machine learning algorithms to make their tasks
simpler and life easier.
Types of SVM
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes by a single straight line, the data is termed linearly
separable, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a
dataset cannot be classified by a straight line, the data is termed non-linear,
and the classifier used is called a Non-linear SVM classifier.
The SVM algorithm is implemented with a kernel that transforms the input data space
into the required form. SVM uses a technique called the kernel trick, in which the
kernel takes a low-dimensional input space and transforms it into a higher-dimensional
space. In simple words, the kernel converts a non-separable problem into a separable
one by adding more dimensions to it. This makes SVM more powerful, flexible and
accurate. The following are some of the kernels used by SVM.
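The effect of the kernel trick can be seen on a small synthetic dataset. The sketch below (not part of the original manual) uses scikit-learn's make_circles to build two concentric rings, which no straight line can separate, and compares a linear-kernel SVM against an RBF-kernel SVM:

```python
# Sketch: comparing a linear and an RBF-kernel SVM on data that is
# not linearly separable, to illustrate the kernel trick.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: no single straight line separates the two classes.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

On this data the linear kernel performs near chance level, while the RBF kernel, by implicitly mapping the points into a higher-dimensional space, separates the rings almost perfectly.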
Linear Kernel
The linear kernel is simply the dot product between any two observations. The formula
of the linear kernel is as below −
K(x, xi) = sum(x * xi)
From the above formula, we can see that the kernel value for two vectors x and xi is
the sum of the products of each pair of input values.
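A quick numerical check (a sketch, not from the manual) confirms that the formula above is just the ordinary dot product, and that scikit-learn's linear_kernel agrees:

```python
# Sketch: the linear kernel K(x, xi) = sum(x * xi) is the dot product.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel

x = np.array([1.0, 2.0, 3.0])
xi = np.array([4.0, 5.0, 6.0])

# Sum of element-wise products: 1*4 + 2*5 + 3*6 = 32
manual = np.sum(x * xi)
from_sklearn = linear_kernel(x.reshape(1, -1), xi.reshape(1, -1))[0, 0]

print(manual, from_sklearn)  # both 32.0
```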
Polynomial Kernel
It is a more generalized form of the linear kernel and can distinguish curved or
nonlinear input spaces. Following is the formula for the polynomial kernel −
K(X, Xi) = (1 + sum(X * Xi))^d
Here d is the degree of the polynomial, which we need to specify manually in the
learning algorithm.
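The polynomial kernel can likewise be checked numerically. The sketch below (an illustration, not part of the manual) evaluates the formula by hand for d = 2 and compares it with scikit-learn's polynomial_kernel, whose general form is (gamma * <X, Xi> + coef0)^degree; setting gamma=1 and coef0=1 matches the formula above:

```python
# Sketch: polynomial kernel K(X, Xi) = (1 + sum(X * Xi))^d, with d = 2.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([1.0, 2.0])
Xi = np.array([3.0, 4.0])
d = 2

# (1 + (1*3 + 2*4))^2 = (1 + 11)^2 = 144
manual = (1 + np.sum(X * Xi)) ** d
from_sklearn = polynomial_kernel(X.reshape(1, -1), Xi.reshape(1, -1),
                                 degree=d, gamma=1, coef0=1)[0, 0]

print(manual, from_sklearn)  # both 144.0
```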
PART B
(PART B : TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the
practical. The soft copy must be uploaded on Blackboard or emailed to the concerned lab
in-charge faculty at the end of the practical in case there is no Blackboard access available.)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from scipy.stats import randint

# Load the lung-cancer dataset (replace the path with the actual CSV file,
# e.g. after mounting Google Drive in Colab with google.colab.drive).
df = pd.read_csv('lung_cancer.csv')
print(df.head())

# Column flagged for outlier/consistency checks
columns_to_check = ['LUNG_CANCER']

# Split the data into features and labels
# (one-hot encode any categorical feature columns so the models get numbers)
X = pd.get_dummies(df.drop('LUNG_CANCER', axis=1))
y = df['LUNG_CANCER']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Baseline: logistic regression
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)
print("Logistic regression accuracy:", accuracy_score(y_test, y_pred))

# Support Vector Machine (RBF kernel by default)
svm_clf = SVC()
svm_clf.fit(X_train, y_train)
y_pred = svm_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("SVM accuracy:", accuracy)

# Random forest with randomized hyperparameter search
param_dist = {'n_estimators': randint(50, 300), 'max_depth': randint(2, 20)}
search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_dist, n_iter=10, cv=5, random_state=42)
search.fit(X_train, y_train)
print("Random forest accuracy:",
      accuracy_score(y_test, search.predict(X_test)))
B.4 Conclusion:
In this experiment, we successfully implemented a Support Vector Machine (SVM) classifier with an
RBF kernel on a given dataset.
Ans: A support vector machine (SVM) is a type of supervised learning algorithm used in
machine learning to solve classification and regression tasks; SVMs are particularly good at
solving binary classification problems, which require classifying the elements of a data set into
two groups.
The aim of a support vector machine algorithm is to find the best possible line, or decision
boundary, that separates the data points of different data classes. This boundary is called a
hyperplane when working in high-dimensional feature spaces. The idea is to maximize the
margin, which is the distance between the hyperplane and the closest data points of each
category, thus making it easy to distinguish data classes.
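The margin and the closest points described above are directly inspectable on a fitted model. The sketch below (an illustration, with names and data of my choosing) fits a linear-kernel SVC on two well-separated clusters; the closest training points are exposed as support_vectors_, and for a linear kernel the margin width can be recovered from the learned weight vector as 2 / ||w||:

```python
# Sketch: inspecting the support vectors and margin of a linear SVC.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, so a maximum-margin line exists.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin

w = clf.coef_[0]                 # normal vector of the hyperplane
margin = 2 / np.linalg.norm(w)   # distance between the two margin lines

print("number of support vectors:", clf.support_vectors_.shape[0])
print("margin width:", margin)
```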