
Welcome back.

This is machine learning lecture number four.


Last time, we talked about how to formulate maximum margin
linear classification as an optimization problem.
Today, we're going to try to understand
the solutions to that optimization problem
and how to find those solutions.
If you recall, the objective function
for a learning problem decomposes into two parts.
One is the average loss, which tries to agree with the observations.
And the other part is regularization,
where we try to bias
the solution towards the type of solution
we are interested in.
In this case, maximum margin linear separators.
So, more specifically, in the case of the support vector machine,
our average loss is the average hinge loss
of the linear predictor on the training examples.
When the agreement between the linear function
and the provided label falls below the value one,
we start incurring loss.
And the average of those losses is
what we are trying to minimize.
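To make that concrete, here is one standard way to write the hinge loss just described. The notation, theta and theta_0 for the parameters and (x^(i), y^(i)) for the i-th training example, is an assumption of mine rather than something fixed in the transcript:

\mathrm{Loss}_h(z) = \max\{0,\ 1 - z\}, \qquad z = y^{(i)}\bigl(\theta \cdot x^{(i)} + \theta_0\bigr)

So the loss is zero whenever the agreement z is at least one, and grows linearly as z drops below one.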
On the other hand, we have a regularization term.
Minimizing that regularization term
will try to push those margin boundaries further and further
apart.
And the balance between these two
is controlled by the regularization parameter
lambda.
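Putting the two pieces together, the overall objective being described can be written, under the same assumed notation and with n training examples, as

J(\theta, \theta_0) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Loss}_h\!\bigl(y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\bigr) + \frac{\lambda}{2}\,\|\theta\|^2

where the factor lambda over 2 on the regularizer is one common convention; the exact constant only changes the scale of lambda, not the story.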
And we are going to initially try
to understand how the nature of the solution
varies as a function of that regularization parameter.
So here's the picture that you've seen before.
Here is the objective function for learning as before.
And the margin boundaries, where the linear predictor
takes value 1 or minus 1, are each at distance 1
over the norm of the parameter vector from the decision boundary.
So when we minimize the regularization term here,
we are pushing the margin boundaries
further and further apart.
And as we do that, we start hitting the actual training
examples.
The boundary needs to orient itself,
and we may start incurring losses.
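To see where that 1 over the norm comes from: the signed distance of a point x from the decision boundary theta . x + theta_0 = 0 is its function value divided by the norm of theta, so on a margin boundary, where the function value is plus or minus 1, that distance is

d(x) = \frac{\theta \cdot x + \theta_0}{\|\theta\|}, \qquad \theta \cdot x + \theta_0 = \pm 1 \;\Rightarrow\; |d(x)| = \frac{1}{\|\theta\|}

So shrinking the norm of theta, which is exactly what the regularization term rewards, pushes the two margin boundaries, 2 over the norm of theta apart from each other, away from the decision boundary.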
So as we change the regularization parameter,
lambda, we change the balance between these two terms.
The larger the value of lambda is, the more
we try to push the margin boundaries apart;
the smaller it is, the more emphasis
we put on minimizing the average loss on the training examples.
So we are trying to classify the examples correctly,
keeping them outside the margin boundaries,
and the margin boundaries themselves
will start shrinking towards the actual decision boundary.
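If it helps to see this trade-off in code, here is a minimal sketch of minimizing the objective above by plain subgradient descent for a fixed lambda. This is one simple way to attack the problem, not necessarily how it is solved in the lecture, and the function names, step size, and iteration count are illustrative choices of mine:

```python
import numpy as np

def hinge_objective(theta, theta_0, X, y, lam):
    # Average hinge loss plus (lambda / 2) * ||theta||^2.
    agreements = y * (X @ theta + theta_0)
    avg_loss = np.mean(np.maximum(0.0, 1.0 - agreements))
    return avg_loss + 0.5 * lam * np.dot(theta, theta)

def fit_svm(X, y, lam, step=0.01, iters=5000):
    # Plain (sub)gradient descent; X is (n, d), labels y are +1 or -1.
    n, d = X.shape
    theta, theta_0 = np.zeros(d), 0.0
    for _ in range(iters):
        agreements = y * (X @ theta + theta_0)
        active = agreements < 1.0          # examples currently incurring hinge loss
        grad_theta = -(y[active, None] * X[active]).sum(axis=0) / n + lam * theta
        grad_theta_0 = -y[active].sum() / n
        theta -= step * grad_theta
        theta_0 -= step * grad_theta_0
    return theta, theta_0
```

The larger lam is, the more the lam * theta term dominates the update and shrinks the norm of theta, widening the margin band; a small lam lets the hinge-loss term dominate, pulling the margin boundaries in toward the decision boundary.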
So let's first see, pictorially, how this works.
So here, I have a set of training examples.
This is a linearly separable problem.
And I start with the regularization parameter being
small, 0.1.
So I'm emphasizing correct classification
of these examples.
And the solution that I get is such
that all the examples are on the correct side of the decision
boundary.
For example, this one here is on the correct side
of the positive margin boundary.
As I start increasing the value of lambda,
I put more emphasis on the regularization term, that is,
on trying to push the margin boundaries further and further
apart.
In this case, they did not yet move,
because that additional emphasis on the regularization term
is not enough to counterbalance the losses
that I would incur if the margin boundaries
passed the actual examples.
As I increase the value of lambda further,
I start allowing some of the examples
to fall within the actual margin boundaries.
As I increase that further, the solution changes.
The solution to the optimization problem will be different.
And as you can see, the further the margin
boundaries are pushed, the more the solution
will be guided by the bulk of the points,
rather than just the points that are right close to the decision
boundary.
So the solution changes as we change the regularization
parameter.
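One way to reproduce this picture numerically, using the fit_svm sketch from earlier, is to sweep lambda on some synthetic, linearly separable data and watch the margin half-width 1 over the norm of theta grow; the data below is made up purely for illustration:

```python
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, size=(50, 2)),   # positive cluster
               rng.normal(-2.0, 1.0, size=(50, 2))])  # negative cluster
y = np.hstack([np.ones(50), -np.ones(50)])

for lam in [0.01, 0.1, 1.0, 10.0]:
    theta, theta_0 = fit_svm(X, y, lam)
    print(lam, 1.0 / np.linalg.norm(theta))           # margin half-width 1 / ||theta||
```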
Here is another example where the problem is actually already
not linearly separable.
And I have a small value of the regularization parameter.
But I cannot find a separating solution,
so some of the margin constraints are violated.
So I will incur losses on some of those points already here.
As I start increasing the value of the regularization parameter
lambda, the solution starts changing as before,
being guided more by where the bulk of the points
are, similar to before.
