This document discusses how changing the regularization parameter in support vector machines affects the solution to the optimization problem. With a small lambda value, the solution focuses on correctly classifying all training examples. As lambda increases, the margins are pushed further apart, allowing some examples to fall within the margins in order to prioritize regularization. The solution is increasingly guided by the overall distribution of points rather than just points near the decision boundary.
Last time, we talked about how to formulate maximum margin linear classification as an optimization problem. Today, we're going to try to understand the solutions to that optimization problem and how to find those solutions. If you recall, our objective function for a learning problem decomposes into two parts. One is the average loss, which tries to agree with the observations. The other part is regularization, where we try to bias the solution towards the type of solution we are interested in, in this case, maximum margin linear separators.

More specifically, in the case of the support vector machine, the average loss is the average hinge loss of the linear predictor on the training examples. When the agreement between the linear function and the provided label falls below the value one, we start incurring loss, and the average of those losses is what we are trying to minimize. On the other hand, we have a regularization term. Minimizing that regularization term will try to push the margin boundaries further and further apart. The balance between these two is controlled by the regularization parameter lambda, and we are going to begin by trying to understand how the nature of the solution varies as a function of that regularization parameter.

So here's the picture that you've seen before. Here is the objective function for learning as before. And the margin boundaries, where the linear predictor takes value 1 or minus 1, each lie at distance 1 over the norm of the parameter vector from the decision boundary. So when we minimize the regularization term, we are pushing the margin boundaries further and further apart. As we do that, we start hitting the actual training examples; the boundary needs to reorient itself, and we may start incurring losses. So as we change the regularization parameter lambda, we change the balance between these two terms. The larger the value of lambda is, the more we try to push the margin boundaries apart; the smaller it is, the more emphasis we put on minimizing the average loss on the training examples. In that case we are trying to classify the examples correctly, on the right side of the margin boundaries, and the margin boundaries themselves will start shrinking towards the actual decision boundary.

So let's first see, pictorially, how this works. Here I have a set of training examples. This is a linearly separable problem, and I start with the regularization parameter being small, 0.1, so I'm emphasizing correct classification of these examples. The solution that I get is then that all the examples are on the correct side of the decision boundary. For example, this one here is on the correct side of the positive margin boundary. As I start increasing the value of lambda, I put more emphasis on the regularization term, that is, on trying to push the margin boundaries further and further apart. In this case, they did not yet move, because that additional emphasis on the regularization term is not enough to counterbalance the losses that I would incur if the margin boundaries moved past the actual examples. As I increase the value of lambda further, I start allowing some of the examples to fall within the margin boundaries. As I increase it further still, the solution changes; the solution to the optimization problem will be different.
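To make that objective concrete, here is a minimal sketch in Python with NumPy of the regularized average hinge loss described above. The exact form on the lecture slides is not reproduced here, so the (lambda / 2) scaling of the squared norm, the offset term theta0, and all variable names are illustrative assumptions rather than the lecture's notation.

    import numpy as np

    def svm_objective(theta, theta0, X, y, lam):
        """Average hinge loss plus regularization for a linear classifier.

        X is an (n, d) array of training examples and y an (n,) array of +/-1
        labels. The agreement y_i * (theta . x_i + theta0) incurs hinge loss
        once it falls below one; lam weights the regularization term.
        """
        agreement = y * (X @ theta + theta0)
        hinge = np.maximum(0.0, 1.0 - agreement)
        return hinge.mean() + 0.5 * lam * np.dot(theta, theta)

Minimizing the first term pushes every agreement to at least one, while minimizing the second term shrinks the norm of theta, which is exactly what spreads the margin boundaries apart.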
And as you can see, the further and further the margin boundaries are pushed, the more the solution will be guided by the bulk of the points, rather than just the points that are right close to the decision boundary. So the solution changes as we change the regularization parameter. Here is another example where the problem is actually not linearly separable. I have a small value of the regularization parameter, but I cannot find a separating solution, so some of the margin constraints are violated, and I already incur losses on some of those points here. As I start increasing the value of the regularization parameter lambda, the solution starts changing as before, being guided more by where the bulk of the points are, similar to before.
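As a rough way to reproduce this effect numerically, the sketch below fits linear SVMs for several values of lambda using scikit-learn's LinearSVC on made-up two-class data. LinearSVC is parameterized by C rather than lambda; the mapping C = 1 / (n * lambda) is an assumption chosen to match the average-loss-plus-regularization objective above, and the data, seed, and lambda values are purely illustrative.

    import numpy as np
    from sklearn.svm import LinearSVC

    # Synthetic two-class data; the cluster centers and seed are arbitrary.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
                   rng.normal(2.0, 1.0, size=(50, 2))])
    y = np.hstack([-np.ones(50), np.ones(50)])
    n = len(y)

    for lam in [0.001, 0.01, 0.1, 1.0]:
        # scikit-learn weights the hinge losses by C instead of weighting the
        # regularizer by lambda; C = 1 / (n * lambda) makes the objectives match.
        clf = LinearSVC(C=1.0 / (n * lam), loss="hinge", dual=True, max_iter=100000)
        clf.fit(X, y)
        w, b = clf.coef_[0], clf.intercept_[0]
        width = 2.0 / np.linalg.norm(w)                   # distance between the two margin boundaries
        violations = int(np.sum(y * (X @ w + b) < 1.0))   # examples inside the margin or misclassified
        print(f"lambda={lam:6.3f}  margin width={width:5.2f}  violations={violations}")

If this behaves like the pictures, the printed margin width, which is 2 over the norm of the parameter vector, should grow as lambda grows, and more and more examples should end up violating the margin constraints.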