
Learning a Linear Classifier
Quiz 1
Jan 24 (Wed), 6:30PM, L18, L19, L20
Only for registered students (regular + audit)
Assigned seating – will be announced soon
Open notes (handwritten only)
No mobile phones, tablets, etc.
Bring your institute ID card
If you don’t bring it, you will have to spend precious time waiting to get verified
Syllabus:
All videos, slides, and code linked on the course discussion page (link below) up to 22 Jan, 2024
https://www.cse.iitk.ac.in/users/purushot/courses/ml/2023-24-w/discussion.html
See GitHub for practice questions
Authentication by Secret Questions
SERVER: “Give me your device ID and answer the following questions”
DEVICE: “TS271828182845”
Challenges (from server) and responses (from device):
1. 10111100 → 1
2. 00110010 → 0
3. 10001110 → 1
4. 00010100 → 0
5. … → …
Physically Unclonable Functions
[Figure: the same signal takes slightly different times, e.g. 0.50 ms vs 0.55 ms, on nominally identical hardware]
These tiny differences are difficult to predict or clone
Then these could act as the fingerprints for the devices!
Arbiter PUFs
If the top signal reaches the finish line first, the “answer” to this question is 0, else if the bottom signal reaches first, the “answer” is 1
Question: 1011
[Figure: two signals race through a chain of switches controlled by the question bits 1 0 1 1; the answer at the finish line is shown as “1?”]
Arbiter PUFs
If the top signal reaches the finish line first, the “answer” to this question is 0, else if the bottom signal reaches first, the “answer” is 1
Question: 0110
[Figure: the same race with question bits 0 1 1 0; the answer at the finish line is shown as “0?”]
Linear Models
We have $\Delta = t_{\text{upper}} - t_{\text{lower}} = \mathbf{w}^\top\mathbf{x} + b$, where $\mathbf{x}$ is a vector derived from the challenge bits and $\mathbf{w}, b$ encode the tiny delay differences of this particular device
If $\mathbf{w}^\top\mathbf{x} + b < 0$, upper signal wins and answer is 0
If $\mathbf{w}^\top\mathbf{x} + b > 0$, lower signal wins and answer is 1
Thus, answer is simply $\frac{1 + \text{sign}(\mathbf{w}^\top\mathbf{x} + b)}{2}$
This is nothing but a linear classifier!
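A tiny simulation of this view, in Python. The feature map below (suffix products of ±1-mapped challenge bits) is a common choice for arbiter PUFs but is not specified on the slide, and the delay parameters w, b are random stand-ins for one particular device:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the delay parameters of one particular device
n_bits = 4
w = rng.normal(size=n_bits)
b = rng.normal()

def features(challenge):
    """One common feature map for arbiter PUFs: x_i = prod_{j >= i} (1 - 2 c_j).
    The slide only says x is derived from the challenge bits, so this is an assumption."""
    d = 1 - 2 * np.asarray(challenge)     # map bit 0/1 -> +1/-1
    return np.cumprod(d[::-1])[::-1]      # suffix products

def answer(challenge):
    """Answer is 0 if the upper signal wins (w^T x + b < 0), else 1 - a linear classifier."""
    return int(w @ features(challenge) + b > 0)

print(answer([1, 0, 1, 1]), answer([0, 1, 1, 0]))
```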
The “best” Linear Classifier
“It seems infinitely many classifiers perfectly classify the data. Which one should I choose?”
“It is better to not select a model whose decision boundary passes very close to a training data point”
“Indeed! Such models would be very brittle and might misclassify test data (i.e. predict the wrong class), even those test data which look very similar to train data”
Large Margin Classifiers
Fact: distance of the origin from the hyperplane $\{\mathbf{x} : \mathbf{w}^\top\mathbf{x} + b = 0\}$ is $\frac{|b|}{\|\mathbf{w}\|_2}$
Fact: distance of a point $\mathbf{x}$ from this hyperplane is $\frac{|\mathbf{w}^\top\mathbf{x} + b|}{\|\mathbf{w}\|_2}$
Given train data $\{(\mathbf{x}^i, y^i)\}$ for a binary classification problem where $\mathbf{x}^i \in \mathbb{R}^d$ and $y^i \in \{-1, +1\}$, we want two things from a classifier
Demand 1: classify every point correctly – how to ask this politely?
One way: demand that for all $i$, $\text{sign}(\mathbf{w}^\top\mathbf{x}^i + b) = y^i$
Easier way: demand that for all $i$, $y^i(\mathbf{w}^\top\mathbf{x}^i + b) > 0$
Demand 2: not let any data point come close to the boundary
Demand that $\min_i \frac{|\mathbf{w}^\top\mathbf{x}^i + b|}{\|\mathbf{w}\|_2}$ be as large as possible
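A minimal NumPy sketch of the two distance facts and the geometric margin, using made-up values for w, b and the data points:

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane w^T x + b = 0
b = -5.0

def distance_to_hyperplane(x, w, b):
    """Distance of point x from the hyperplane {x : w^T x + b = 0}."""
    return abs(w @ x + b) / np.linalg.norm(w)

print(distance_to_hyperplane(np.zeros(2), w, b))   # distance of origin = |b| / ||w||_2 = 1.0

# Geometric margin of a (perfect) classifier on a toy dataset
X = np.array([[2.0, 1.0], [0.0, 0.0], [3.0, 3.0]])
y = np.array([+1, -1, +1])
margins = y * (X @ w + b) / np.linalg.norm(w)   # signed distances; all > 0 iff perfectly classified
print(margins.min())                            # the geometric margin we want to be large
```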
Support Vector Machines
Just a fancy way of saying “Please find me a linear classifier that perfectly classifies the train data while keeping data points as far away from the hyperplane as possible”
The mathematical way of writing this request is the following:
$$\max_{\mathbf{w}, b}\ \min_i \frac{|\mathbf{w}^\top\mathbf{x}^i + b|}{\|\mathbf{w}\|_2} \quad \text{such that } y^i(\mathbf{w}^\top\mathbf{x}^i + b) > 0 \text{ for all } i$$
Here the Objective is the (smallest) distance of a data point to the hyperplane and the Constraints demand perfect classification
This is known as an optimization problem with an objective and lots of constraints
“This looks so complicated, how will I ever find a solution to this optimization problem?” – “Let us simplify this optimization problem”
Constrained Optimization 101
Constraints are usually specified using math equations. The set of points that satisfy all the constraints is called the feasible set of the optimization problem.
HOW WE SPEAK TO A HUMAN:
“I want to find an unknown $x$ that gives me the best value according to this function $f$. Oh! and btw, not any $x$ would do! $x$ must satisfy these conditions (and etc. etc.)”
“All I am saying is, of the values of $x$ that satisfy my conditions, find me the one that gives the best value according to $f$”
HOW WE MUST SPEAK TO MS M:
$$\min_{x}\ f(x) \quad \text{s.t. } x \geq 3 \text{ and } x \leq 6$$
Objective: $f(x)$; Constraints: $x \geq 3$ and $x \leq 6$; the feasible set is the interval $[3, 6]$
“For your specified constraints, the optimal (least) value of $f$ is … and it is achieved at $x = $ …”
If instead the constraints contradict each other so that the feasible set is empty: “Your optimization problem has no solution since no point satisfies all your constraints. Feasible set is empty!”
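A minimal sketch of handing such a problem to an off-the-shelf solver (scipy). The objective f(x) = (x − 1)² is an arbitrary stand-in since the slide leaves f unspecified; the constraints x ≥ 3 and x ≤ 6 are the ones from above:

```python
from scipy.optimize import minimize_scalar

# Objective f is not specified on the slide; (x - 1)^2 is an arbitrary stand-in.
f = lambda x: (x - 1.0) ** 2

# Feasible set is the interval [3, 6], i.e. the constraints x >= 3 and x <= 6.
result = minimize_scalar(f, bounds=(3.0, 6.0), method="bounded")
print(result.x, result.fun)   # optimal point ~3.0, optimal (least) value ~4.0
```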
Back to SVMs
Assume there do exist params $(\mathbf{w}, b)$ that perfectly classify all train data
Consider one such params $(\mathbf{w}, b)$ which classifies train data perfectly
Now, as $y^i(\mathbf{w}^\top\mathbf{x}^i + b) > 0$ for all $i$, we have $y^i(\mathbf{w}^\top\mathbf{x}^i + b) = |\mathbf{w}^\top\mathbf{x}^i + b|$
Thus, the geometric margin $\min_i \frac{|\mathbf{w}^\top\mathbf{x}^i + b|}{\|\mathbf{w}\|_2}$ is the same as $\min_i \frac{y^i(\mathbf{w}^\top\mathbf{x}^i + b)}{\|\mathbf{w}\|_2}$ since the model has perfect classification!
We will use this useful fact to greatly simplify the optimization problem
We will remove this assumption later: “What if train data is not linearly separable, i.e. no linear classifier can perfectly classify it? For example …”
Support Vector Machines
Let $\mathbf{x}^{i^*}$ be the data point that comes closest to the hyperplane i.e. $i^* = \arg\min_i\ y^i(\mathbf{w}^\top\mathbf{x}^i + b)$
Recall that all this discussion holds only for a perfect classifier
Let $c = y^{i^*}(\mathbf{w}^\top\mathbf{x}^{i^*} + b)$ and consider $\tilde{\mathbf{w}} = \mathbf{w}/c$, $\tilde{b} = b/c$
Note this gives us $y^i(\tilde{\mathbf{w}}^\top\mathbf{x}^i + \tilde{b}) \geq 1$ for all $i$ as well as $y^{i^*}(\tilde{\mathbf{w}}^\top\mathbf{x}^{i^*} + \tilde{b}) = 1$ (as $c > 0$)
Thus, instead of searching for $(\mathbf{w}, b)$, easier to search for $(\tilde{\mathbf{w}}, \tilde{b})$: the geometric margin is now just $\frac{1}{\|\tilde{\mathbf{w}}\|_2}$, so maximizing it is the same as solving
$$\min_{\tilde{\mathbf{w}}, \tilde{b}}\ \|\tilde{\mathbf{w}}\|_2^2 \quad \text{such that } y^i(\tilde{\mathbf{w}}^\top\mathbf{x}^i + \tilde{b}) \geq 1 \text{ for all } i$$
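This is a quadratic program, so a generic solver can handle it directly. A sketch using cvxpy on a made-up linearly separable toy dataset (the data and the solver choice are illustrative, not from the slides):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (made up for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1, +1, -1, -1])
n, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# Objective: minimize ||w||_2^2
objective = cp.Minimize(cp.sum_squares(w))
# Constraints: y^i (w^T x^i + b) >= 1 for all i
constraints = [cp.multiply(y, X @ w + b) >= 1]

cp.Problem(objective, constraints).solve()
print(w.value, b.value)   # the max-margin separating hyperplane
```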
The C-SVM Technique
For linearly separable cases where we suspect a perfect classifier exists:
$$\min_{\mathbf{w}, b}\ \|\mathbf{w}\|_2^2 \quad \text{s.t. } y^i(\mathbf{w}^\top\mathbf{x}^i + b) \geq 1 \text{ for all } i$$
If a linear classifier cannot perfectly classify the data, then find the model using:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}}\ \|\mathbf{w}\|_2^2 + C \sum_i \xi_i \quad \text{s.t. } y^i(\mathbf{w}^\top\mathbf{x}^i + b) \geq 1 - \xi_i \text{ for all } i, \text{ as well as } \xi_i \geq 0 \text{ for all } i$$
The terms $\xi_i$ are called slack variables. They allow some data points to come close to the hyperplane or be misclassified altogether (recall the English phrase “cut me some slack”)
“What prevents me from misusing the slack variables to learn a model that misclassifies every data point?”
“The term $C \sum_i \xi_i$ prevents you from doing so. If we set $C$ to be a large value (it is a hyper-parameter), then it will penalize solutions that misuse slack too much. Having the constraint $\xi_i \geq 0$ prevents us from misusing slack to artificially inflate the margin.”
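A sketch of the same solver-based approach extended with slack variables, on made-up data that is deliberately not linearly separable; C is the hyper-parameter from the slide:

```python
import cvxpy as cp
import numpy as np

# Toy data that is NOT linearly separable (one +1 point sits between the -1 points)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.5, -2.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1, +1, +1, -1, -1])
n, d = X.shape
C = 10.0                      # hyper-parameter: large C penalizes slack heavily

w, b = cp.Variable(d), cp.Variable()
xi = cp.Variable(n)           # one slack variable per data point

objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,   # relaxed margin constraints
               xi >= 0]                               # slack cannot be negative
cp.Problem(objective, constraints).solve()
print(w.value, b.value, xi.value)
```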
From C-SVM to Loss Functions
We can further simplify the previous optimization problem
Note $\xi_i \geq 0$ basically allows us to have $y^i(\mathbf{w}^\top\mathbf{x}^i + b) < 1$ (even $< 0$)
Thus, the amount of slack we want is just $\xi_i = 1 - y^i(\mathbf{w}^\top\mathbf{x}^i + b)$
However, recall that we must also satisfy $\xi_i \geq 0$
Another way of saying this: if you already have $y^i(\mathbf{w}^\top\mathbf{x}^i + b) \geq 1$, then you don’t need any slack i.e. you should have $\xi_i = 0$ in this case
Using the notation $[x]_+ = \max\{x, 0\}$, we need only set $\xi_i = \left[1 - y^i(\mathbf{w}^\top\mathbf{x}^i + b)\right]_+$
The above is nothing but the popular hinge loss function!
Hinge Loss
Captures how well a classifier classified a data point
Suppose on a data point $(\mathbf{x}, y)$, a model gives a prediction score of $\hat{y}$ (for a linear model $(\mathbf{w}, b)$, we have $\hat{y} = \mathbf{w}^\top\mathbf{x} + b$)
We obviously want $y \cdot \hat{y} > 0$ for correct classification but we also want $y \cdot \hat{y} \geq 1$ for a large margin – the hinge loss function $\ell_{\text{hinge}}(y, \hat{y}) = [1 - y \cdot \hat{y}]_+$ captures both
Note that hinge loss not only penalizes misclassification but also correct classification if the data point gets too close to the hyperplane!
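A one-line implementation of the hinge loss, with a few illustrative evaluations:

```python
import numpy as np

def hinge_loss(y, y_hat):
    """Hinge loss [1 - y * y_hat]_+ for a true label y in {-1, +1} and a prediction score y_hat."""
    return np.maximum(1 - y * y_hat, 0)

print(hinge_loss(+1,  2.0))   # 0.0 - correct and far from the boundary
print(hinge_loss(+1,  0.3))   # 0.7 - correct but too close to the boundary, still penalized
print(hinge_loss(+1, -1.0))   # 2.0 - misclassified, penalized heavily
```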
Final Form of C-SVM
Recall that the C-SVM optimization finds a model by solving
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}}\ \|\mathbf{w}\|_2^2 + C \sum_i \xi_i \quad \text{s.t. } y^i(\mathbf{w}^\top\mathbf{x}^i + b) \geq 1 - \xi_i \text{ for all } i, \text{ as well as } \xi_i \geq 0 \text{ for all } i$$
Using the previous discussion, we can rewrite the above very simply as
$$\min_{\mathbf{w}, b}\ \|\mathbf{w}\|_2^2 + C \sum_i \left[1 - y^i(\mathbf{w}^\top\mathbf{x}^i + b)\right]_+$$
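Since the rewritten objective is unconstrained, it can be minimized directly by (sub)gradient descent. A minimal sketch with made-up data, step size, and iteration count (not the course's reference implementation):

```python
import numpy as np

# Made-up toy data, step size, and iteration count - purely illustrative
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
C, eta, steps = 10.0, 0.01, 1000

w, b = np.zeros(X.shape[1]), 0.0
for _ in range(steps):
    margins = y * (X @ w + b)
    active = margins < 1                    # points with non-zero hinge loss
    # Subgradient of ||w||^2 + C * sum_i [1 - y^i (w^T x^i + b)]_+
    grad_w = 2 * w - C * (y[active][:, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= eta * grad_w
    b -= eta * grad_b

print(w, b)   # an approximate C-SVM solution
```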
