
Session 16A: Support Vector Machines

Course: Data Science Algorithms and Applications (19ECE436A)

Session Duration: 100 minutes

Course Leader:
Prof. Raghavendra V. Kulkarni, PhD
Department of Electronics and Communication Engineering
Faculty of Engineering and Technology, MSRUAS, Bengaluru
Email: raghavendra.ec.et@msruas.ac.in
Intended Learning Outcomes
At the end of this session, the student should be able to:

• Explain maximum margin classifiers
• Discuss the bias-variance trade-off in machine learning algorithms
• Discuss the effect of outliers on the performance of a maximum margin classifier
• Discuss support vectors and support vector classifiers
• Determine the classifier boundary based on the given training data

Binary Classification
• Consider that the weights of babies are recorded and plotted as shown below:
• Assume that the red circles represent the weights of underweight babies, and the green circles represent the weights of babies of normal weight.
• Based on these observations, we can decide on a threshold weight.

Binary Classification
• If the weight of a newborn baby is lower than the threshold weight, then we classify the baby as underweight (black circle in the figure).
• If the weight of a newborn baby is higher than the threshold weight, then we classify the baby as normal (black circle in the figure).

Binary Classification
• What if the weight of a newborn baby is as shown by the black circle in the following figure?
• Since this weight is just above the threshold weight, the baby is classified as normal.
• However, that turns out to be a wrong classification, because the weight of this baby is closer to the weights of the underweight babies.

Binary Classification
• One way to address this issue is to consider the weights at the boundaries of the two clusters (the weights of the heaviest underweight baby and the lightest normal-weight baby).
• The new threshold is decided as the mid-point between these two weights, as shown below:
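A minimal sketch of this mid-point threshold rule in Python (the weight values here are invented purely for illustration):

```python
import numpy as np

# Hypothetical baby weights in kg; the values are made up for illustration.
underweight = np.array([1.9, 2.1, 2.2, 2.4])
normal = np.array([3.1, 3.3, 3.5, 3.8])

# The threshold is the mid-point between the heaviest underweight baby
# and the lightest normal-weight baby.
threshold = (underweight.max() + normal.min()) / 2   # 2.75

def classify(weight):
    """Classify a newborn's weight against the mid-point threshold."""
    return "underweight" if weight < threshold else "normal"

print(classify(2.6))   # underweight
```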

Binary Classification
• With this, the weight of the newborn baby considered earlier is now closer to the underweight babies than it is to the normal-weight babies.
• This results in its correct classification as an underweight baby.

Classification Margin
• The shortest distance between the observations and the threshold is referred to as the classification margin.
• Since the threshold is the mid-point between the observations closest to it, both margins (shown as red and green dotted lines) are equal.
• This also ensures that the margin is as large as possible.
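The margin can be computed explicitly, continuing with the same illustrative data as in the earlier sketch:

```python
import numpy as np

underweight = np.array([1.9, 2.1, 2.2, 2.4])   # illustrative data, as before
normal = np.array([3.1, 3.3, 3.5, 3.8])
threshold = (underweight.max() + normal.min()) / 2

# Distance from the threshold to the closest observation on each side.
left_margin = threshold - underweight.max()    # 0.35
right_margin = normal.min() - threshold        # 0.35

# The mid-point choice makes both margins equal, which maximizes the
# smaller of the two distances: a maximum margin threshold in 1-D.
assert abs(left_margin - right_margin) < 1e-9
```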
Classification Margin
• If the threshold is moved slightly to the left, then the distance between the threshold and the closest underweight observation becomes smaller.

Classification Margin
• Similarly, if the threshold is moved to the right, then the distance between the threshold and the closest normal-weight observation becomes smaller.

Classification Margin
• It therefore makes sense to use maximum margin classifiers.
• Maximum margin classifiers work well if the data points (also called observations, patterns, exemplars, or stimuli) in the two classes are well separated. However, they have problems when there are outlier data points.
• An outlier data point is far away from the centroid of the data points in its class.
Margin in Data with Outliers
• Consider the case where the given training data looks as shown in the figure below:
• There is an outlier data point which is far away from the rest of the data points in the underweight class, and closer to the data points that represent normal-weight babies.

Margin in Data with Outliers
• This forces the threshold to move very close to the normal-weight observations, and far away from the underweight observations.

Margin in Data with Outliers
• Due to this, the newborn baby represented by the black circle in the following figure will be classified as underweight.
• This may not be the right classification, since this data point is far away from the others in the same class and close to the data points in the other class.

Need for Margin Violation
• This shows that maximum margin classifiers are very sensitive to outliers in the data points.
• In order to make the classifier less sensitive to outliers in the data, some misclassifications must be allowed!
• If the threshold is decided based on the two data points highlighted in the following figure, we would get better classification performance.

Need for Margin Violation
• Let’s say we decide on a threshold that allows some misclassification.
• Now, the new observation represented by the black circle would be classified correctly as a normal-weight baby.
• Choosing a threshold that allows misclassification is an example of the bias-variance trade-off.

Bias-Variance Trade-off
• Bias is the difference between the values predicted by the ML model and the correct values.
• A model that is high in bias gives a large error on the training data as well as the test data.
• The variability of the model's prediction for a given data point, which tells us the spread of the predictions, is called the variance of the model.
• A model with high variance fits the training data very closely and is therefore unable to predict accurately on data it has not seen before.
• If the model is too simple, then it may be high in bias and low in variance; it underfits and is error-prone.
• If the model is too complex, then it may be high in variance and low in bias.
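The trade-off can be seen in a minimal polynomial-fitting illustration (the data here are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic function.
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, x.size)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)      # fit a polynomial of this degree
    y_hat = np.polyval(coeffs, x)
    mse = np.mean((y - y_hat) ** 2)        # error on the training data only
    print(f"degree {degree}: training MSE = {mse:.4f}")

# degree 1 underfits (high bias); degree 9 drives the training error
# down but wiggles between samples and generalizes poorly (high variance).
```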
Bias-Variance Trade-off
• Before the threshold that allows misclassifications was picked, the classifier was very sensitive to the training data (low bias).
• But it performed poorly when the new data point arrived (high variance).

Soft Margin Classification
• On the contrary, with the threshold that permits misclassification, the classifier has higher bias, but it is less sensitive to outliers in the data (lower variance).
• When misclassifications are permitted, the distance between the observations and the threshold is called the soft margin.

Soft Margin Classification
• The number of observations permitted to be misclassified, and permitted to be inside the soft margin, is determined using cross-validation.
• The observations on the edge of and within the soft margin are called support vectors.
• Soft margin classifiers are also known as support vector classifiers. The support vectors are highlighted in dark red and green colours in the figure below:
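A hedged sketch of tuning the amount of margin violation by cross-validation with scikit-learn (the data values are invented; C is the regularization parameter, where a small C gives a wide, soft margin and a large C approaches a hard margin):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical baby weights (kg) with one outlier; 0 = underweight, 1 = normal.
X = np.array([1.9, 2.1, 2.2, 2.4, 3.4, 3.1, 3.3, 3.5, 3.8]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])   # the 3.4 kg point is the outlier

# Cross-validation picks how soft the margin should be.
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```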

Higher Dimensional Data
• If we measure the heights of the babies in addition to their weights, then the data points representing the babies' measurements would have two components.
• Each such data point would represent an ordered pair, or a point in two-dimensional space.
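For instance, with hypothetical (weight, height) measurements, each observation becomes a two-component feature vector:

```python
import numpy as np

# Hypothetical (weight in kg, height in cm) pairs; values are illustrative.
X = np.array([[2.1, 45.0],
              [2.4, 47.5],
              [3.3, 50.0],
              [3.6, 52.0]])
y = np.array([0, 0, 1, 1])   # 0 = underweight, 1 = normal
print(X.shape)               # (4, 2): four observations, two features each
```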

Higher Dimensional Data
• In two dimensions, the classifier threshold is a straight line, and so are the margin lines.

Higher Dimensional Data
• Data points other than the highlighted support vectors lie outside of the margin.
• The number of data points permitted to be on or within the margins is determined using cross-validation.

Higher Dimensional Data
• In the higher-dimensional space, the decision hyperplane is represented by 𝒘𝒙 + 𝑏 = 0.
• 𝒘 represents the weight vector, 𝒙 represents the feature vector, and 𝑏 represents the bias.
• These have been studied thoroughly in the topic of Rosenblatt's perceptron.
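The decision rule induced by this hyperplane is simply the sign of 𝒘𝒙 + 𝑏. A small sketch, using the 𝒘 and 𝑏 that the worked example below arrives at:

```python
import numpy as np

# Decision rule of a linear classifier: sign(w . x + b).
# These w and b values are the ones derived in the example that follows.
w = np.array([1.0, 0.0])
b = -2.0

def predict(x):
    """Return +1 for the positive class, -1 for the negative class."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([3.0, 1.0])))   # +1
print(predict(np.array([1.0, 0.0])))   # -1
```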

Example: 2-D Linear Binary Classifier
• Consider the following:
• Points (3, 1), (3, −1), (6, 1) and (6, −1) are positively
labeled (they belong to one class).
• Points (1, 0), (0, 1), (0, −1) and (−1, 0) are negatively
labeled (they belong to the other class).
• The task is to determine the straight line that represents the maximum-margin classifier threshold.
• This involves determining the slope and the 𝑦-intercept of the threshold line that classifies these data points correctly and delivers the maximum margin.

Example: 2-D Linear Binary Classifier
• We can see that the threshold line must pass through the
point (2, 0).
• The support vectors (the data points closest to the threshold line) are (1, 0), (3, 1) and (3, −1).
• Let’s call them 𝑆1, 𝑆2 and 𝑆3, respectively:
• 𝑆1 = (1, 0)ᵀ, 𝑆2 = (3, 1)ᵀ and 𝑆3 = (3, −1)ᵀ

Example: 2-D Linear Binary Classifier
• In the next step, we augment each support vector with a bias component equal to 1. We denote each augmented support vector with a tilde (~) on its head:
• 𝑆̃1 = (1, 0, 1)ᵀ, 𝑆̃2 = (3, 1, 1)ᵀ and 𝑆̃3 = (3, −1, 1)ᵀ
• Then we must determine three multipliers 𝛼1, 𝛼2 and 𝛼3 such that:
𝛼1 (𝑆̃1 · 𝑆̃1) + 𝛼2 (𝑆̃2 · 𝑆̃1) + 𝛼3 (𝑆̃3 · 𝑆̃1) = −1 (since 𝑆1 is negatively labeled)
𝛼1 (𝑆̃1 · 𝑆̃2) + 𝛼2 (𝑆̃2 · 𝑆̃2) + 𝛼3 (𝑆̃3 · 𝑆̃2) = +1 (since 𝑆2 is positively labeled)
𝛼1 (𝑆̃1 · 𝑆̃3) + 𝛼2 (𝑆̃2 · 𝑆̃3) + 𝛼3 (𝑆̃3 · 𝑆̃3) = +1 (since 𝑆3 is positively labeled)
Example: 2-D Linear Binary Classifier
• Substituting the dot products, this amounts to the following system of linear equations:
2𝛼1 + 4𝛼2 + 4𝛼3 = −1
4𝛼1 + 11𝛼2 + 9𝛼3 = +1
4𝛼1 + 9𝛼2 + 11𝛼3 = +1

Example: 2-D Linear Binary Classifier
• Solving this system gives us:
𝛼1 = −3.5, 𝛼2 = 0.75 and 𝛼3 = 0.75
• The weight vector is obtained as the weighted sum of the augmented support vectors:
𝒘̃ = 𝛼1 𝑆̃1 + 𝛼2 𝑆̃2 + 𝛼3 𝑆̃3 = −3.5 (1, 0, 1)ᵀ + 0.75 (3, 1, 1)ᵀ + 0.75 (3, −1, 1)ᵀ = (1, 0, −2)ᵀ
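A quick numerical check of these values (a sketch using NumPy; the matrix G collects the dot products between the augmented support vectors):

```python
import numpy as np

# Augmented support vectors as rows, and their class labels.
S = np.array([[1, 0, 1], [3, 1, 1], [3, -1, 1]], dtype=float)
t = np.array([-1.0, 1.0, 1.0])

G = S @ S.T                     # Gram matrix of dot products
alpha = np.linalg.solve(G, t)   # [-3.5, 0.75, 0.75]
w_tilde = alpha @ S             # augmented weight vector: [1, 0, -2]
print(alpha, w_tilde)
```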
Example: 2-D Linear Binary Classifier
• The last component of 𝒘̃ is the bias: 𝑏 = −2, and 𝒘 = (1, 0)ᵀ.
• The decision boundary is of the form 𝒘𝒙 + 𝑏 = 0, which here reduces to 𝑥 − 2 = 0.
• This represents a line parallel to the 𝑦-axis with an offset of 2 units; that is, a vertical line that passes through 𝑥 = 2.
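The same answer can be checked end to end with scikit-learn's linear SVC on the eight training points (a large C approximates the hard margin assumed in this example):

```python
import numpy as np
from sklearn.svm import SVC

# The eight labeled points from the example.
X = np.array([[3, 1], [3, -1], [6, 1], [6, -1],
              [1, 0], [0, 1], [0, -1], [-1, 0]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)   # close to [[1. 0.]] and [-2.]
print(clf.support_vectors_)        # (1, 0), (3, 1) and (3, -1)
```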

More Solved Problems
• More details on the solution of this problem, and many more solved problems on SVMs, are available in this document.

Summary
• Binary classifiers such as SVMs involve a bias-variance trade-off.
• Maximum margin linear classifiers are sensitive to outliers in
training data.
• Some misclassification is permitted in order to make the
classifier less sensitive to training data.
• SVMs that allow such misclassification are called soft margin
SVMs.
• Support vectors are the data points closest to the classification hyperplane.
• A problem on linear binary classifiers has been solved in this
session.