
PART A

(PART A: TO BE REFERRED BY STUDENTS)

Experiment No. 3
A.1 Aim:
To implement Support Vector Machine.

A.2 Prerequisite:
Python Basic Concepts

A.3 Outcome:
Students will be able to implement a Support Vector Machine.

A.4 Theory:

Machine Learning, a subset of Artificial Intelligence (AI), plays a dominant role in our daily
lives. Data science engineers and developers working in various domains widely use machine
learning algorithms to make their tasks simpler and life easier.

The objective of the support vector machine algorithm is to find a hyperplane in an
N-dimensional space (N being the number of features) that distinctly classifies the data points.
To separate two classes of data points, there are many possible hyperplanes that could be
chosen. Our objective is to find the plane with the maximum margin, i.e., the maximum
distance between data points of both classes. Maximizing the margin provides some
reinforcement so that future data points can be classified with more confidence.

Hyperplanes are decision boundaries that help classify the data points. Data points falling
on either side of the hyperplane can be attributed to different classes. The dimension of the
hyperplane depends on the number of features: if the number of input features is 2, the
hyperplane is just a line; if the number of input features is 3, the hyperplane becomes a
two-dimensional plane. It becomes difficult to visualize when the number of features exceeds 3.
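
As a quick illustration of the separating hyperplane described above, the short sketch below
(a minimal example; the synthetic make_blobs data and all variable names are assumptions, not
part of this experiment) fits a linear SVM with scikit-learn and prints the learned hyperplane
parameters and the number of support vectors.

# Minimal sketch: a linear SVM on synthetic two-feature data (illustrative only)
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, one per class
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# A linear kernel finds the maximum-margin hyperplane w . x + b = 0
clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print("Hyperplane coefficients (w):", clf.coef_)
print("Intercept (b):", clf.intercept_)
print("Support vectors per class:", clf.n_support_)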

Types of SVM

SVM can be of two types:

o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified
into two classes using a single straight line, the data is termed linearly separable, and the
classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot
be classified using a straight line, the data is termed non-linear, and the classifier used is
called a Non-linear SVM classifier.

The SVM algorithm is implemented with a kernel that transforms the input data space into the
required form. SVM uses a technique called the kernel trick, in which the kernel takes a
low-dimensional input space and transforms it into a higher-dimensional space. In simple
words, the kernel converts a non-separable problem into a separable problem by adding more
dimensions to it. This makes SVM more powerful, flexible and accurate. The following are
some of the types of kernels used by SVM.
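
To make the effect of the kernel trick concrete, the short sketch below (illustrative only;
the make_circles dataset and the chosen parameter values are assumptions) compares a
linear-kernel SVM with an RBF-kernel SVM on data that no straight line can separate.

# Illustrative sketch: linear vs. RBF kernel on non-linearly separable data
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Concentric circles cannot be separated by a straight line
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

for kernel in ['linear', 'rbf']:
    model = SVC(kernel=kernel)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(kernel, "kernel accuracy:", round(acc, 3))

On such data the RBF kernel typically reaches near-perfect accuracy while the linear kernel
stays close to chance, which is exactly the gap the kernel trick is meant to close.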
Linear Kernel
It can be used as a dot product between any two observations. The formula of the linear
kernel is as below −
K(x, xi) = sum(x * xi)

From the above formula, we can see that the kernel between two vectors x and xi is the sum
of the products of their corresponding input values.
Polynomial Kernel
It is a more generalized form of the linear kernel and can distinguish curved or nonlinear
input spaces. Following is the formula for the polynomial kernel −
K(X, Xi) = (1 + sum(X * Xi))^d
Here d is the degree of the polynomial, which we need to specify manually in the learning
algorithm.
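
The sketch below (the example vectors are made up purely for illustration) evaluates both
kernel formulas directly with NumPy so they can be compared against the definitions above.

# Illustrative sketch: evaluating the linear and polynomial kernels by hand
import numpy as np

x = np.array([1.0, 2.0, 3.0])
xi = np.array([0.5, 1.0, 1.5])

linear_kernel = np.sum(x * xi)                  # K(x, xi) = sum(x * xi)
d = 2
polynomial_kernel = (1 + np.sum(x * xi)) ** d   # K(X, Xi) = (1 + sum(X * Xi))^d

print("Linear kernel:", linear_kernel)
print("Polynomial kernel (d = 2):", polynomial_kernel)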

Pros and Cons of SVM Classifiers

Pros of SVM classifiers


SVM classifiers offer great accuracy and work well in high-dimensional spaces. They use only
a subset of the training points (the support vectors) and therefore require very little memory.
Cons of SVM classifiers
They have a high training time and hence, in practice, are not suitable for large datasets.
Another disadvantage is that SVM classifiers do not work well with overlapping classes.

PART B
(PART B : TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical. The
soft copy must be uploaded on Blackboard or emailed to the concerned lab in-charge faculty at
the end of the practical in case there is no Blackboard access available.)

Roll No.: B24    Name: Sakshi Bhaskar Tupsundar


Class: BE-Comps    Batch: B2
Date of Experiment: 10-10-2023    Date of Submission: 12-10-2023
Grade:
B.1 Software Code written by student:

import numpy as np
from google.colab import drive
import csv
import pandas as pd
import seaborn as sns

df = pd.read_csv('/content/survey lung cancer.csv')
data = df  # alias used by the later cells that refer to 'data'

df.shape
df.isnull().sum()
df.head()

from sklearn import preprocessing

# The LabelEncoder object converts categorical word labels into integers.
label_encoder = preprocessing.LabelEncoder()

# Encode labels in the 'GENDER' and 'LUNG_CANCER' columns.
df['GENDER'] = label_encoder.fit_transform(df['GENDER'])
df['GENDER'].unique()
df['LUNG_CANCER'] = label_encoder.fit_transform(df['LUNG_CANCER'])
df['LUNG_CANCER'].unique()

df.head()

import matplotlib.pyplot as plt

plt.figure(figsize=(14, 8))
plt.suptitle("Lung Disease Prediction")
ax = plt.gca()
df.boxplot(ax=ax)  # boxplots of each column, used to spot outliers

# Removing outliers using the IQR rule
columns_to_check = ['LUNG_CANCER']

# Step 1: Calculate the first quartile (Q1), third quartile (Q3),
# and IQR for each column
Q1 = data[columns_to_check].quantile(0.25)
Q3 = data[columns_to_check].quantile(0.75)
IQR = Q3 - Q1

# Step 2: Define the outlier boundaries
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Step 3: Identify outliers for each column
outliers = {}
for column_name in columns_to_check:
    outliers[column_name] = data[(data[column_name] < lower_bound[column_name]) |
                                 (data[column_name] > upper_bound[column_name])]

# Step 4: Remove the outliers
data_cleaned = data.copy()
for column_name in columns_to_check:
    data_cleaned = data_cleaned[
        (data_cleaned[column_name] >= lower_bound[column_name]) &
        (data_cleaned[column_name] <= upper_bound[column_name])]

Applying SVM model before outlier removal

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X = data.drop('LUNG_CANCER', axis=1)
y = data['LUNG_CANCER']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

from sklearn.preprocessing import StandardScaler

# Standardize the features before fitting the SVM
st_x = StandardScaler()
X_train = st_x.fit_transform(X_train)
X_test = st_x.transform(X_test)

svm_clf = SVC()
svm_clf.fit(X_train, y_train)

y_pred = svm_clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Applying SVM model after outlier removal

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Split the data into features and labels
X = data.drop('LUNG_CANCER', axis=1)  # Adjust as needed
y = data['LUNG_CANCER']

# Initialize an empty list to store selected features
selected_features = []
best_accuracy = 0.0

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

while len(selected_features) < X.shape[1]:  # Repeat until all features are selected
    # Find the feature that improves the model the most
    best_feature = None
    best_feature_accuracy = 0.0

    for feature in X.columns:
        if feature not in selected_features:
            # Create a new feature set by adding the current feature
            current_features = selected_features + [feature]

            # Train an SVM classifier on the current feature set
            svm = SVC()
            svm.fit(X_train[current_features], y_train)

            # Make predictions on the test set
            y_pred = svm.predict(X_test[current_features])

            # Calculate accuracy
            accuracy = accuracy_score(y_test, y_pred)

            # Check if this feature improves accuracy
            if accuracy > best_feature_accuracy:
                best_feature_accuracy = accuracy
                best_feature = feature

    # Add the best feature to the selected features list
    selected_features.append(best_feature)
    best_accuracy = best_feature_accuracy

    # Print the selected feature and its accuracy
    print(f"Selected Feature: {best_feature}, Accuracy: {best_accuracy:.4f}")

print("Forward selection complete.")
print("Selected Features:", selected_features)

Applying SVM model after feature selection process

from sklearn import svm
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

# Build the train/test split using only the selected features
X_train_after, X_test_after, y_train_after, y_test_after = train_test_split(
    X[selected_features], y, test_size=0.2, random_state=42)

# Create an SVM model with the 'rbf' kernel
clf = svm.SVC(kernel='rbf')

# Fit the SVM model to the training data
clf.fit(X_train_after, y_train_after)

# Make predictions on the test data
y_pred = clf.predict(X_test_after)

# Calculate accuracy on the test set
accuracy = accuracy_score(y_test_after, y_pred)
print("Testing Accuracy:", accuracy)

# Perform cross-validation and print the cross-validation scores
cv_scores = cross_val_score(clf, X_train_after, y_train_after, cv=5)  # change the number of folds (cv) as needed
print("Cross-Validation Scores:", cv_scores)
print("Mean CV Score:", cv_scores.mean())
print(classification_report(y_test_after, y_pred))
Hyperparameter tuning for SVM

import pandas as pd
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from scipy.stats import loguniform

# Use the encoded features and the LUNG_CANCER target
X = data.drop('LUNG_CANCER', axis=1)
y = data['LUNG_CANCER']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Define the model
svm_model = SVC()

# Define the hyperparameter distributions to sample from
param_dist = {
    'C': loguniform(1e-2, 1e2),
    'gamma': loguniform(1e-4, 1e0),
    'kernel': ['linear', 'rbf']
}

# Perform randomized search with cross-validation
random_search = RandomizedSearchCV(estimator=svm_model,
                                   param_distributions=param_dist,
                                   n_iter=10, cv=5, scoring='accuracy',
                                   random_state=42)
random_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = random_search.best_params_
print("Best Hyperparameters:", best_params)

# Evaluate the model on the test set using the best hyperparameters
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on Test Set:", accuracy)

B.2 Input and Output:

SVM Model Scores
Training Accuracy score: 0.89475
Testing Accuracy score: 0.86
ROC_AUC score: 0.951477
CV score: 0.756

SVM Model (Feature Selection) Scores
Training Accuracy score: 0.84963
Testing Accuracy score: 0.9575
ROC_AUC score: 0.9153
CV score: 0.9425

Hyperparameter Tuning Scores (SVM Model)
Accuracy score: 0.91935483870
ROC_AUC score: 0.55
CV score: 0.95967741
B.3 Observations and learning:
o The SVM classifier with an RBF kernel demonstrated strong predictive capabilities, achieving a
high accuracy rate and effectively classifying data points into their respective classes.
o Support Vector Machines are powerful classifiers that can be applied to a wide range of
classification problems.
o Evaluating the performance of an SVM model through metrics like accuracy, precision, recall,
and the confusion matrix helps in understanding its strengths and weaknesses.
o SVMs with RBF kernels are suitable for complex datasets with non-linear relationships, but
hyperparameter tuning and feature selection are crucial for optimizing their performance.

B.4 Conclusion:
In this experiment, we successfully implemented a Support Vector Machine (SVM) classifier with an
RBF kernel on a given dataset.

B.5 Question of Curiosity


Q1. What is a support vector machine (SVM)?

Ans: A support vector machine (SVM) is a type of supervised learning algorithm used in
machine learning to solve classification and regression tasks; SVMs are particularly good at
solving binary classification problems, which require classifying the elements of a data set into
two groups.

The aim of a support vector machine algorithm is to find the best possible line, or decision
boundary, that separates the data points of different data classes. This boundary is called a
hyperplane when working in high-dimensional feature spaces. The idea is to maximize the
margin, which is the distance between the hyperplane and the closest data points of each
category, thus making it easy to distinguish data classes.
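
Stated a little more formally (a standard formulation added here for reference, not part of
the original answer): for a hyperplane w . x + b = 0 and labels yi in {+1, -1}, the margin
equals 2 / ||w||, so maximizing the margin amounts to minimizing (1/2) * ||w||^2 subject to
yi * (w . xi + b) >= 1 for every training point (xi, yi).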
