affects the performance of our model for any general objective function in this form. So let's start with the case where alpha is 0. In that case, our objective function is purely a loss term, which means that our model should do very well at approximating the relationship between the training data inputs and the training data outputs. So in terms of the training objective function, as alpha increases from 0, we should see an increase in our objective function value. In terms of accuracy, which moves in the opposite direction to our objective function value, because we're trying to minimize our objective function, the training accuracy decreases as we increase alpha, so it will look something like this.
Now, let's think about what happens to the testing objective function value and testing accuracy. When alpha is very small, let's say we start off with some testing objective function value that's here. As we increase alpha, we increase the amount of regularization that's placed on our model, which means that our model starts to generalize better to unseen testing data. So the objective function value actually decreases. However, beyond a certain point, we may place too much weight on the regularization term and constrain our model so heavily that it can no longer classify any sort of data set well, so our testing error starts to increase again. On the accuracy plot, this will look something like this.
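As a minimal sketch of the two curves described here, assume a one-weight linear model with a squared-error loss and a regularization term alpha * w^2. The data points, the helper names `fit_ridge_1d` and `squared_loss`, and the list of alpha values are all made up for illustration and are not from the lecture. The sketch reports the loss term on the training data and on the testing data for each alpha.

```python
# Toy 1-D regularized regression:
#   objective(w) = sum((y - w*x)^2) + alpha * w^2
# All numbers below are hand-picked for illustration, not real data.

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form minimizer of the regularized objective for a single weight."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def squared_loss(w, xs, ys):
    """Loss term only (no regularization), summed over the data set."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

# Noisy training data around y = x, and clean held-out testing data.
train_x, train_y = [1.0, 2.0], [1.5, 2.5]
test_x, test_y = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]

alphas = [0.0, 0.5, 1.0, 1.5, 2.0, 4.0, 8.0]
train_losses, test_losses = [], []
for alpha in alphas:
    w = fit_ridge_1d(train_x, train_y, alpha)
    train_losses.append(squared_loss(w, train_x, train_y))
    test_losses.append(squared_loss(w, test_x, test_y))
    print(f"alpha={alpha:4.1f}  w={w:.3f}  "
          f"train={train_losses[-1]:.3f}  test={test_losses[-1]:.3f}")

# The alpha with the lowest testing loss in this sweep.
best_alpha = alphas[test_losses.index(min(test_losses))]
print("alpha with lowest testing loss:", best_alpha)
```

In this toy setup the training loss rises steadily with alpha, while the testing loss first falls and then rises. The sketch looks at the testing set only to draw the curve; it is not a way to choose alpha in practice.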
And we see that there's actually a value, let's call that alpha star, for which we have a minimum in our testing objective function value and a maximum in our testing accuracy. And this is the alpha that we want to use, because it's the optimal alpha that gives us the best performance for our model on unseen testing data. And the question is, how do we actually find this alpha star value with only training set data?
We can't actually use the testing set to find what this alpha star value is because, by definition, the testing set is data we haven't seen before. And to do this, we'll see that the method we use is cross-validation.
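As a rough sketch of the idea, assuming k-fold cross-validation on the same kind of one-weight model with a squared-error loss and an alpha * w^2 regularization term: split the training set into k folds, hold each fold out in turn, fit on the rest, and pick the alpha with the lowest average held-out loss. The function `cross_validate_alpha`, the helpers, and the data are hypothetical names and numbers chosen for illustration, not the lecture's own code.

```python
# k-fold cross-validation over alpha, using training data only.
# Model and data are illustrative assumptions, not from the lecture.

def fit_ridge_1d(xs, ys, alpha):
    """Minimize sum((y - w*x)^2) + alpha * w^2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def squared_loss(w, xs, ys):
    """Loss term only, summed over the given points."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

def cross_validate_alpha(xs, ys, alphas, k=3):
    """Return (best_alpha, scores), where scores maps alpha -> mean held-out loss."""
    n = len(xs)
    scores = {}
    for alpha in alphas:
        total = 0.0
        for fold in range(k):
            # Point i belongs to fold i % k; hold that fold out for validation.
            val = [i for i in range(n) if i % k == fold]
            fit = [i for i in range(n) if i % k != fold]
            w = fit_ridge_1d([xs[i] for i in fit], [ys[i] for i in fit], alpha)
            total += squared_loss(w, [xs[i] for i in val], [ys[i] for i in val])
        scores[alpha] = total / n
    best_alpha = min(scores, key=scores.get)
    return best_alpha, scores

# Hand-picked noisy training data; no testing data is touched here.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.4, 1.8, 3.3, 3.9, 5.2, 5.7]
alphas = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]

best_alpha, scores = cross_validate_alpha(train_x, train_y, alphas, k=3)
for alpha in alphas:
    print(f"alpha={alpha:4.1f}  mean held-out loss={scores[alpha]:.4f}")
print("selected alpha:", best_alpha)
```

The selected alpha is only an estimate of alpha star, since the held-out folds stand in for the unseen testing data.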