
Session No: CO3-6

Session Topic: Support Vector Machine (SVM)

DATA WAREHOUSING and MINING


(Course code: 20CS3052R)
Session Objective

An ability to understand the SVM algorithm and its applications.
Topic to be Covered
• Support Vector Machine
• Types of SVM
• Terminology of SVM
• Working of SVM
• Examples
• Applications

• Conclusion

• References
Support Vector Machine- Classification

• Support vector machines (SVMs) are supervised machine learning algorithms that can be used for both classification and regression.

• In practice, however, they are mostly used for classification problems.

• SVMs were first introduced in the 1960s and later refined in the 1990s.

• The support vector machine is a fundamental algorithm that every machine learning practitioner should know, and it is widely used for classification objectives.
Cont.

• SVMs are implemented somewhat differently from other machine learning algorithms.

• SVMs later became extremely popular because of their ability to handle multiple continuous and categorical variables.

• The main task of the SVM algorithm is to find the best line or
decision boundary that segregates n-dimensional space into
classes, so that we can easily put new data points in the correct
category in the future. This best decision boundary is called a
hyperplane.
Types of SVM

SVMs can be categorized as Linear and Non-Linear:

•Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes by a single straight line, such data is termed
linearly separable data, and the classifier used is called a Linear SVM classifier.

•Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a
dataset cannot be classified by a straight line, such data is termed
non-linear data, and the classifier used is called a Non-linear SVM classifier.
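The two types can be illustrated with a short sketch. The slides do not prescribe a library, so scikit-learn is assumed here: a linear SVM handles linearly separable data, while an RBF-kernel SVM handles data a straight line cannot split, such as concentric circles.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: a classic non-linearly separable dataset.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM: no single straight line can separate the two rings.
linear_clf = SVC(kernel="linear").fit(X, y)

# Non-linear SVM: the RBF kernel maps the data to a space where a
# linear boundary works.
rbf_clf = SVC(kernel="rbf").fit(X, y)

print("linear accuracy:", linear_clf.score(X, y))
print("rbf accuracy:", rbf_clf.score(X, y))
```

On this dataset the RBF classifier fits the rings almost perfectly, while the linear one cannot do better than roughly chance.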
Algorithm

1. Define an optimal hyperplane: maximize margin

2. Extend the above definition for non-linearly separable problems: have a penalty
term for misclassifications.

3. Map data to high dimensional space where it is easier to classify with linear
decision surfaces: reformulate problem so that data is mapped implicitly to this
space.
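These three steps map directly onto the parameters of a typical SVM implementation. As a sketch (scikit-learn assumed, not prescribed by the slides), step 2's misclassification penalty is the `C` parameter and step 3's implicit mapping is chosen via `kernel`:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping blobs: not perfectly separable, so the penalty term matters.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=42)

# Small C: wide (soft) margin, tolerant of misclassifications.
soft = SVC(kernel="linear", C=0.01).fit(X, y)

# Large C: narrow margin, heavy penalty for each misclassification.
hard = SVC(kernel="linear", C=100.0).fit(X, y)

# A softer margin typically keeps more points inside the margin,
# so it retains more support vectors.
print("support vectors (C=0.01):", len(soft.support_vectors_))
print("support vectors (C=100):", len(hard.support_vectors_))
```

Lowering `C` trades training accuracy for a wider margin, which often generalizes better on noisy data.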
Terminology used in SVM

• SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes.

Support Vectors

 Support vectors are the data points nearest to the hyperplane.

 They are the points of a data set that, if removed, would alter the position of the dividing hyperplane.

 Because of this, they can be considered the critical elements of a data set.
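The "critical elements" claim can be checked directly. In the sketch below (scikit-learn and a hand-made toy dataset, both our own assumptions), refitting on the support vectors alone reproduces essentially the same hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable dataset: two clusters of three points each.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.0], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# The support vectors are the points nearest the hyperplane.
print("support vectors:\n", clf.support_vectors_)

# Refit using only the support vectors: the hyperplane is unchanged,
# so the other points were not critical.
sv_clf = SVC(kernel="linear", C=1e6).fit(clf.support_vectors_, y[clf.support_])
print("same coefficients:", np.allclose(clf.coef_, sv_clf.coef_, atol=1e-2))
```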
Cont.

Hyperplane

• For a classification task with only two features (consider the previous slide's example), you can
think of a hyperplane as a line that linearly separates and classifies a set of data.

• The further from the hyperplane our data points lie, the more confident we are that they have been
correctly classified.

• We therefore want our data points to be as far away from the hyperplane as possible, while
still being on the correct side of it.

• So when new testing data is added, whichever side of the hyperplane it lands on will decide the
class that we assign to it.
Cont.

• To find the right hyperplane, i.e., to best segregate the two classes within the data, we use the following ideas:

 The distance between the hyperplane and the nearest data point from either set is known as the margin.

 The goal is to choose the hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
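For a linear SVM the margin can be read off the fitted model: if the hyperplane is w·x + b = 0, the margin width is 2/||w||. A short sketch (scikit-learn and the toy points are our own illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Two separable clusters, a gap of 4 units apart along the x-axis.
X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries
print("margin width:", margin)
```

Here the classes are 4 units apart, so the maximum-margin hyperplane sits midway and the reported margin width is 4.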
Cont.

But what happens when there is no clear hyperplane?

• Data is rarely as clean as our simple example above.

• A dataset will often look more like a set of jumbled balls, representing a linearly
non-separable dataset.
Working of Support Vector Machine (SVM)

• SVM is a supervised machine learning algorithm.

• In the SVM algorithm, we plot each data item as a point in n-dimensional space
(where n is number of features you have) with the value of each feature being
the value of a particular coordinate.

• Later we perform classification by finding the hyperplane that differentiates the two classes very well.
Working of SVM

• Suppose that you are working on a text classification problem and refining the training data.

• The Support Vector Machine (SVM) is a fast and dependable classification algorithm that performs very well with a limited amount of data to analyze.
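That text-classification setting can be sketched as below. The pipeline and the toy sentences are our own illustration (scikit-learn assumed): TF-IDF turns each text into a point in n-dimensional feature space, and a linear SVM then finds the separating hyperplane.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hand-made training set: two classes of short texts.
texts = [
    "great match and a brilliant goal",
    "the team won the championship",
    "stock prices fell sharply today",
    "the market rallied after earnings",
]
labels = ["sports", "sports", "finance", "finance"]

# Vectorize the texts, then fit a linear SVM on the resulting points.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["the goalkeeper made a great save"]))
print(model.predict(["prices fell in the market"]))
```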

Example (Data is linear)

Let's imagine we have two tags, red and blue, and our data has two features, x and y. We want a classifier that, given a pair of (x, y) coordinates, outputs whether it is red or blue. We plot our already-labeled training data on a plane:
Cont.

• A support vector machine takes these data points and outputs the hyperplane (which in two dimensions is simply a line) that best separates the tags.

• This line is the decision boundary: anything that falls on one side of it we will classify as blue, and anything that falls on the other side as red.
Cont.

Now we select the best hyperplane: it is the one that maximizes the margins from both tags, i.e., the hyperplane (a line in this case) whose distance to the nearest element of each tag is largest.
Example 2

• Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
• We first train our model with many images of cats and dogs so that it learns the different features of cats and dogs, and then we test it with this strange creature.
• The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (the support vectors), so it considers the extreme cases of cat and dog. On the basis of the support vectors, it classifies the creature as a cat. Consider the given diagram.
Example 3

Speech Recognition
Pros & Cons of Support Vector Machines

Pros:

•Accuracy

•Works well on smaller cleaner datasets

•It can be more efficient because it uses a subset of training points

Cons:

•Isn’t suited to larger datasets as the training time with SVMs can be high

•Less effective on noisier datasets with overlapping classes


SVM Application
• SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

• SVM is used for text classification tasks such as category assignment, detecting
spam and sentiment analysis. 

• It is also commonly used for image recognition challenges, performing particularly well in aspect-based recognition and color-based classification.

• SVM also plays a vital role in many areas of handwritten digit recognition, such
as postal automation services.
Active learning

1. Implement the SVM algorithm using Python (Lab).

2. (Case Study/Problem Solving) In this section we use a breast cancer diagnostic dataset and apply SVM to it. The SVM model will be able to discriminate between benign and malignant tumors.
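A minimal sketch of that case study, assuming scikit-learn and its bundled Wisconsin breast cancer diagnostic dataset (the slides do not specify which dataset or library to use):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Breast cancer diagnostic data: discriminate malignant vs benign tumors.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Feature scaling matters for SVMs; then fit an RBF-kernel classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
```

With scaling, the RBF-kernel SVM typically reaches well over 90% accuracy on the held-out split.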
Conclusion

• The support vector machine is highly preferred by many as it produces significant accuracy with less computation power.

• The support vector machine is an elegant and powerful algorithm.


Thank You

Queries?
References
Han J & Kamber M, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2011. 
https://www.analyticsvidhya.com/blog/2017/09/understaing-support-vector-machine-example-code/
https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
https://www.kdnuggets.com/2016/07/support-vector-machines-simple-explanation.html
https://www.saedsayad.com/support_vector_machine.htm
https://towardsdatascience.com/https-medium-com-pupalerushikesh-svm-f4b42800e989
Next Class Topic

In the next class I will cover the following topic:

 Performance Analysis
