
Session 16A: Support Vector Machines

Course: Data Science Algorithms and Applications (19ECE436A)

Session Duration: 100 minutes

Course Leader:
Prof. Raghavendra V. Kulkarni, PhD
Department of Electronics and Communication Engineering
Faculty of Engineering and Technology, MSRUAS, Bengaluru
Email: raghavendra.ec.et@msruas.ac.in
Intended Learning Outcomes
At the end of this session, the student should be able to:

• Explain maximum margin classifiers
• Discuss the bias-variance trade-off in machine learning algorithms
• Discuss the effect of outliers on the performance of a maximum margin classifier
• Discuss support vectors and support vector classifiers
• Determine the classifier boundary based on the given training data

Binary Classification
• Consider that the weights of babies are recorded and plotted as shown below:
• Assume that the red circles represent the weights of underweight babies, and the green circles represent the weights of babies of normal weight.
• Based on these observations, we can decide on a threshold weight.

Binary Classification
• If the weight of a newborn baby is lower than the threshold weight, then we classify the baby as underweight (black circle in the figure).
• If the weight of a newborn baby is higher than the threshold weight, then we classify the baby as normal (black circle in the figure).

Binary Classification
• What if the weight of a newborn baby is as shown by the black circle in the following figure?
• Since this weight is just above the threshold weight, the baby is classified as normal.
• However, that turns out to be a wrong classification, because the weight of this baby is closer to the weights of the underweight babies.

Binary Classification
• One way to address this issue is to consider the weights at the boundaries of the two clusters (the weights of the heaviest underweight baby and the lightest normal-weight baby).
• The new threshold is decided as the mid-point between these two weights, as shown below:
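A minimal sketch of this mid-point threshold rule in Python (the weight values here are invented purely for illustration):

```python
import numpy as np

# Hypothetical baby weights in kg; the values are made up for illustration.
underweight = np.array([1.9, 2.1, 2.2, 2.4])
normal = np.array([3.1, 3.3, 3.5, 3.8])

# The threshold is the mid-point between the heaviest underweight baby
# and the lightest normal-weight baby.
threshold = (underweight.max() + normal.min()) / 2   # 2.75

def classify(weight):
    """Classify a newborn's weight against the mid-point threshold."""
    return "underweight" if weight < threshold else "normal"

print(classify(2.6))   # underweight
```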

Binary Classification
• With this, the weight of the newborn baby considered earlier is now closer to the underweight babies than it is to the normal-weight babies.
• This results in its correct classification as an underweight baby.

Classification Margin
• The shortest distance between the observations and the threshold is referred to as the classification margin.
• Since the threshold is the mid-point between the observations closest to it, both margins (shown as red and green dotted lines) are equal.
• This also ensures that the margin is as large as possible.
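The margin can be computed explicitly, continuing with the same illustrative data as in the earlier sketch:

```python
import numpy as np

underweight = np.array([1.9, 2.1, 2.2, 2.4])   # illustrative data, as before
normal = np.array([3.1, 3.3, 3.5, 3.8])
threshold = (underweight.max() + normal.min()) / 2

# Distance from the threshold to the closest observation on each side.
left_margin = threshold - underweight.max()    # 0.35
right_margin = normal.min() - threshold        # 0.35

# The mid-point choice makes both margins equal, which maximizes the
# smaller of the two distances: a maximum margin threshold in 1-D.
assert abs(left_margin - right_margin) < 1e-9
```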
Classification Margin
• If the threshold is moved slightly to the left, then the distance between the threshold and the closest underweight observation becomes smaller.

Classification Margin
• Similarly, if the threshold is moved to the right, then the distance between the threshold and the closest normal-weight observation becomes smaller.

Classification Margin
• It therefore makes sense to use maximum margin classifiers.
• Maximum margin classifiers work well if the data points (also called observations, patterns, exemplars, or stimuli) in the two classes are well separated. However, they have problems when there are outlier data points.
• An outlier data point is far away from the centroid of the data points in its class.
Margin in Data with Outliers
• Consider the case where the given training data looks as shown in the figure below:
• There is an outlier data point which is far away from the rest of the data points in the underweight class, and closer to the data points that represent normal-weight babies.

Margin in Data with Outliers
• This forces the threshold to move very close to the normal-weight observations, and far away from the underweight observations.

Margin in Data with Outliers
• Due to this, the newborn baby represented by the black circle in the following figure will be classified as underweight.
• This may not be the right classification, since this data point is far away from the others in the same class and close to the data points in the other class.

Need for Margin Violation
• This shows that maximum margin classifiers are very sensitive to outliers in the data points.
• In order to make the classifier less sensitive to outliers in the data, some misclassifications must be allowed!
• If the threshold is decided based on the two data points highlighted in the following figure, we would get better classification performance.

Need for Margin Violation
• Let’s say we decide on a threshold that allows some misclassification.
• Now, the new observation represented by the black circle would be classified correctly as a normal-weight baby.
• Choosing a threshold that allows misclassification is an example of the bias-variance trade-off.

Bias-Variance Trade-off
• Bias is the difference between the values predicted by the ML model and the correct values.
• A model that is high in bias gives a large error on the training data as well as the test data.
• The variability of the model's prediction for a given data point, which tells us the spread of the predictions, is called the variance of the model.
• A model with high variance fits the training data very closely and is therefore unable to predict accurately on data it has not seen before.
• If the model is too simple, then it may be high in bias and low in variance; it underfits and is error-prone.
• If the model is too complex, then it may be high in variance and low in bias.
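The trade-off can be seen in a minimal polynomial-fitting illustration (the data here are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic function.
x = np.linspace(0, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, x.size)

for degree in (1, 2, 9):
    coeffs = np.polyfit(x, y, degree)      # fit a polynomial of this degree
    y_hat = np.polyval(coeffs, x)
    mse = np.mean((y - y_hat) ** 2)        # error on the training data only
    print(f"degree {degree}: training MSE = {mse:.4f}")

# degree 1 underfits (high bias); degree 9 drives the training error
# down but wiggles between samples and generalizes poorly (high variance).
```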
Bias-Variance Trade-off
• Before the threshold that allows misclassifications was picked, the classifier was very sensitive to the training data (low bias).
• But it performed poorly when the new data point arrived (high variance).

Soft Margin Classification
• On the contrary, with the threshold that permits misclassification, the classifier has higher bias, but it is less sensitive to outliers in the data (lower variance).
• When misclassifications are permitted, the distance between the observations and the threshold is called the soft margin.

Soft Margin Classification
• The number of observations permitted to be misclassified, and permitted to be inside the soft margin, is determined using cross-validation.
• The observations on the edge of and within the soft margin are called support vectors.
• Soft margin classifiers are also known as support vector classifiers. The support vectors are highlighted in dark red and green colours in the figure below:
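A hedged sketch of tuning the amount of margin violation by cross-validation with scikit-learn (the data values are invented; C is the regularization parameter, where a small C gives a wide, soft margin and a large C approaches a hard margin):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical baby weights (kg) with one outlier; 0 = underweight, 1 = normal.
X = np.array([1.9, 2.1, 2.2, 2.4, 3.4, 3.1, 3.3, 3.5, 3.8]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])   # the 3.4 kg point is the outlier

# Cross-validation picks how soft the margin should be.
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```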

Higher Dimensional Data
• If we measure the heights of the babies in addition to their weights, then the data points representing the babies' measurements would have two components.
• Each such data point would represent an ordered pair, or a point in two-dimensional space.
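For instance, with hypothetical (weight, height) measurements, each observation becomes a two-component feature vector:

```python
import numpy as np

# Hypothetical (weight in kg, height in cm) pairs; values are illustrative.
X = np.array([[2.1, 45.0],
              [2.4, 47.5],
              [3.3, 50.0],
              [3.6, 52.0]])
y = np.array([0, 0, 1, 1])   # 0 = underweight, 1 = normal
print(X.shape)               # (4, 2): four observations, two features each
```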

Higher Dimensional Data
• In two dimensions, the classifier threshold is a straight line, and so are the margin lines.

Higher Dimensional Data
• Data points other than the highlighted support vectors lie outside of the margin.
• The number of data points permitted to be on or within the margins is determined using cross-validation.

Higher Dimensional Data
• In the higher-dimensional space, the decision hyperplane is represented by 𝒘𝒙 + 𝑏 = 0.
• 𝒘 represents the weight vector, 𝒙 represents the feature vector, and 𝑏 represents the bias.
• These have been studied thoroughly in the topic of Rosenblatt's perceptron.
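The decision rule induced by this hyperplane is simply the sign of 𝒘𝒙 + 𝑏. A small sketch, using the 𝒘 and 𝑏 that the worked example below arrives at:

```python
import numpy as np

# Decision rule of a linear classifier: sign(w . x + b).
# These w and b values are the ones derived in the example that follows.
w = np.array([1.0, 0.0])
b = -2.0

def predict(x):
    """Return +1 for the positive class, -1 for the negative class."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([3.0, 1.0])))   # +1
print(predict(np.array([1.0, 0.0])))   # -1
```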

Example: 2-D Linear Binary Classifier
• Consider the following:
• Points (3, 1), (3, −1), (6, 1) and (6, −1) are positively
labeled (they belong to one class).
• Points (1, 0), (0, 1), (0, −1) and (−1, 0) are negatively
labeled (they belong to the other class).
• The task is to determine the straight line that represents the maximum-margin classifier threshold.
• This involves determining the slope and the 𝑦-intercept of the threshold line that classifies these data points correctly and delivers the maximum margin.

Example: 2-D Linear Binary Classifier
• We can see that the threshold line must pass through the
point (2, 0).
• The support vectors (the data points closest to the threshold line) are (1, 0), (3, 1) and (3, −1).
• Let’s call them 𝑆1, 𝑆2 and 𝑆3, respectively:
• 𝑆1 = (1, 0)ᵀ, 𝑆2 = (3, 1)ᵀ and 𝑆3 = (3, −1)ᵀ

Example: 2-D Linear Binary Classifier
• In the next step, we augment each support vector with a bias component equal to 1. We denote each augmented support vector with a tilde (~) on its head:
• 𝑆̃1 = (1, 0, 1)ᵀ, 𝑆̃2 = (3, 1, 1)ᵀ and 𝑆̃3 = (3, −1, 1)ᵀ
• Then we must determine three multipliers 𝛼1, 𝛼2 and 𝛼3 such that:
𝛼1 (𝑆̃1 · 𝑆̃1) + 𝛼2 (𝑆̃2 · 𝑆̃1) + 𝛼3 (𝑆̃3 · 𝑆̃1) = −1 (since 𝑆1 is negatively labeled)
𝛼1 (𝑆̃1 · 𝑆̃2) + 𝛼2 (𝑆̃2 · 𝑆̃2) + 𝛼3 (𝑆̃3 · 𝑆̃2) = +1 (since 𝑆2 is positively labeled)
𝛼1 (𝑆̃1 · 𝑆̃3) + 𝛼2 (𝑆̃2 · 𝑆̃3) + 𝛼3 (𝑆̃3 · 𝑆̃3) = +1 (since 𝑆3 is positively labeled)
Example: 2-D Linear Binary Classifier
• Substituting the dot products, this amounts to the following system of linear equations:
2𝛼1 + 4𝛼2 + 4𝛼3 = −1
4𝛼1 + 11𝛼2 + 9𝛼3 = +1
4𝛼1 + 9𝛼2 + 11𝛼3 = +1

Example: 2-D Linear Binary Classifier
• Solving this system gives us:
𝛼1 = −3.5, 𝛼2 = 0.75 and 𝛼3 = 0.75
• The weight vector is obtained as the weighted sum of the augmented support vectors:
𝒘̃ = 𝛼1 𝑆̃1 + 𝛼2 𝑆̃2 + 𝛼3 𝑆̃3 = −3.5 (1, 0, 1)ᵀ + 0.75 (3, 1, 1)ᵀ + 0.75 (3, −1, 1)ᵀ = (1, 0, −2)ᵀ
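A quick numerical check of these values (a sketch using NumPy; the matrix G collects the dot products between the augmented support vectors):

```python
import numpy as np

# Augmented support vectors as rows, and their class labels.
S = np.array([[1, 0, 1], [3, 1, 1], [3, -1, 1]], dtype=float)
t = np.array([-1.0, 1.0, 1.0])

G = S @ S.T                     # Gram matrix of dot products
alpha = np.linalg.solve(G, t)   # [-3.5, 0.75, 0.75]
w_tilde = alpha @ S             # augmented weight vector: [1, 0, -2]
print(alpha, w_tilde)
```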
Example: 2-D Linear Binary Classifier
• The last component of 𝒘̃ is the bias: 𝑏 = −2, and 𝒘 = (1, 0)ᵀ.
• The decision boundary is of the form 𝒘𝒙 + 𝑏 = 0, which here reduces to 𝑥 − 2 = 0.
• This represents a line parallel to the 𝑦-axis with an offset of 2 units; that is, a vertical line that passes through 𝑥 = 2.
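The same answer can be checked end to end with scikit-learn's linear SVC on the eight training points (a large C approximates the hard margin assumed in this example):

```python
import numpy as np
from sklearn.svm import SVC

# The eight labeled points from the example.
X = np.array([[3, 1], [3, -1], [6, 1], [6, -1],
              [1, 0], [0, 1], [0, -1], [-1, 0]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)   # close to [[1. 0.]] and [-2.]
print(clf.support_vectors_)        # (1, 0), (3, 1) and (3, -1)
```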

More Solved Problems
• More details on the solution of this problem, and many more solved problems on SVMs, are available in this document.

Summary
• Binary classifiers such as SVMs involve a bias-variance trade-off.
• Maximum margin linear classifiers are sensitive to outliers in
training data.
• Some misclassification is permitted in order to make the
classifier less sensitive to training data.
• SVMs that allow such misclassification are called soft margin
SVMs.
• Support vectors are the data points closest to the classification hyperplane.
• A problem on linear binary classifiers has been solved in this
session.