Is it learning at all?
It’s usually a good idea to plot the learning process. First, take a look
at the training loss. It should approach zero, or at least a reasonably
low value (see graphic below). If not, your model is probably either not
a good fit for the problem or not flexible enough, or (and this is often
forgotten) the output simply is not a function of the input data, so there
is no relationship to learn. Ask yourself, for example, whether a sentence
really is a function of its words. The assumption may not hold
unconditionally, but it can still be useful in practice.
Training loss should decrease with the number of training iterations (but
not necessarily to 0)
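As a concrete illustration of the plot described above, here is a minimal sketch that records and plots the training loss of a toy linear model fit with gradient descent. The model, data, learning rate, and number of iterations are all illustrative assumptions, not taken from the original text.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # synthetic inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)   # outputs really are a function of the inputs

w = np.zeros(3)
lr = 0.1
losses = []
for step in range(100):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)      # gradient of the mean squared error
    w -= lr * grad
    losses.append(np.mean((pred - y) ** 2))   # record the training loss at each iteration

plt.plot(losses)
plt.xlabel("training iteration")
plt.ylabel("training loss (MSE)")
plt.title("Training loss should decrease, but not necessarily to 0")
plt.show()

If the resulting curve stays flat at a high value, that is the warning sign discussed above: the model may be a poor fit, too inflexible, or the targets may simply not be a function of the inputs.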
What is generalization?
Bias
Variance
Just as high bias is related to underfitting, high variance is related
to overfitting, but the terms are still not fully equivalent in meaning.
See below.
Overfitting
A large gap between training and validation loss is a hint that the
model does not generalize well, and you may want to try to narrow
that gap (graphic below). The simplest remedy for overfitting is
early stopping, that is, stopping the training loop as soon as the
validation loss begins to level off (see the sketch after the graphic
below). Alternatively, regularization may help (see below).
Underfitting, on the other hand, may happen if you stop too early.
Generalization is low if there is a large gap between training and validation
loss.
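Here is a minimal sketch of patience-based early stopping, monitoring the validation loss as described above. The toy data, patience value, and thresholds are illustrative assumptions; in practice the loop would wrap whatever model and training step you are using.

import numpy as np

rng = np.random.default_rng(1)
d, n_train, n_val = 15, 20, 200
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.0, 0.5]                 # only a few informative features
X_train = rng.normal(size=(n_train, d))
y_train = X_train @ true_w + 0.5 * rng.normal(size=n_train)
X_val = rng.normal(size=(n_val, d))
y_val = X_val @ true_w + 0.5 * rng.normal(size=n_val)

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(d)
lr, patience = 0.05, 10
best_val, best_w, bad_epochs = np.inf, w.copy(), 0

for epoch in range(2000):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad
    val_loss = mse(X_val, y_val, w)
    if val_loss < best_val - 1e-6:            # validation loss still improving
        best_val, best_w, bad_epochs = val_loss, w.copy(), 0
    else:                                     # no improvement: count towards patience
        bad_epochs += 1
        if bad_epochs >= patience:            # stop once the curve has levelled off
            print(f"early stop at epoch {epoch}, best val loss {best_val:.3f}")
            break

w = best_w                                    # keep the weights from the best epoch

Keeping the parameters from the best validation epoch, rather than the last one, is what prevents the training loop from drifting back into the overfitting regime after the stopping criterion triggers.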
Regularization
To sum it all up, learning is all well and good, but generalization is
what we really want. To that end, a good model should have both low
bias and low variance, and overfitting and underfitting should both
be avoided. Regularization may be part of the solution to all of that.
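The text mentions regularization only briefly. As one common example, here is a sketch of L2 regularization (weight decay), which adds a penalty on the size of the weights to the training objective; the penalty strength lam and the toy data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))
true_w = np.zeros(10)
true_w[:2] = [1.0, -1.0]
y = X @ true_w + 0.3 * rng.normal(size=50)

lam = 0.1                                        # penalty strength: larger values shrink the weights more
w = np.zeros(10)
lr = 0.05
for _ in range(500):
    data_grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the data-fit term (MSE)
    reg_grad = 2 * lam * w                       # gradient of the penalty lam * ||w||^2
    w -= lr * (data_grad + reg_grad)

print("learned weights:", np.round(w, 2))        # penalised weights stay small

By discouraging large weights, the penalty limits how much the model can contort itself to fit noise in the training data, which is one way of narrowing the gap between training and validation loss.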
Don’t be surprised if real plots look very different from the ones
presented here. They are only meant to visualize the concepts and
help you understand them. Real loss curves will often deviate in one
way or another. Nevertheless, it’s good to know what to look out for
when training a machine learning model.