
Overfitting in ML: How to prevent it

What is Overfitting?
“A model overfits when it fits the training data too well, achieving a minimal sum
of squared errors, but fails to perform well on the test dataset.” Overfitting
happens when a model learns the details and noise in the training data to the
extent that it negatively impacts the model’s performance on unseen data.
Let’s look at it visually.
In Linear Regression, we would like our model to follow a line like the following:

Even though the overall cost is not minimal, the line above fits the trend very
well, making the model reliable. Say we want to infer an output for an input value
that is not present in the data set (i.e. generalize). The line above could give a
very plausible prediction for the new input because, in machine-learning terms,
the outputs are expected to follow the trend seen in the training set.
If our model produces a trend line like the one below, it is overfitting:

If the model above achieves a very small sum of squared errors by fitting a line
through every single point, it has captured all the noise in the data, and it is
surely not going to fit the test data. We call this a model with high variance.
The small sketch below makes the train/test gap concrete.

How to Prevent It:


1) Regularization: In machine learning, regularization is the process of
constraining or shrinking the coefficient estimates towards zero so the model
cannot contort itself to fit noise.
Looking at the green curve, we can clearly say it is overfit. Below are the
equations for both curves:

Curve 1 = -x^4 + 7x^3 - 5x^2 + 31x + 30
Curve 2 = (1/5)x^4 + (7/5)x^3 - x^2 + (31/5)x + 30
Larger coefficients let the curve bend through every point, which leads to
overfitting, so we regularize (shrink) the coefficients to avoid the problem,
as the sketch below shows.
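To make the shrinking visible, here is a sketch using scikit-learn's Ridge (an L2 penalty); the synthetic data and the alpha value are illustrative assumptions, not from the article. The penalized fit keeps the same degree-15 form but with far smaller coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)

# The same degree-15 fit, unpenalized versus with an L2 (Ridge)
# penalty; alpha controls how strongly coefficients shrink to zero.
plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X, y)

print("largest |coef|, unregularized:", np.abs(plain[-1].coef_).max())
print("largest |coef|, ridge:        ", np.abs(ridge[-1].coef_).max())
```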

2) Cross Validation
In this technique we split the training data into multiple mini train-test splits.
These splits are then used to tune the model, so every tuning decision is scored
on data the model did not fit; see the sketch below.
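Here is a minimal sketch of k-fold cross validation with scikit-learn (the diabetes dataset and the alpha grid are stand-ins chosen for illustration). Each fold serves once as the mini test split while the model trains on the rest, so every score reflects unseen data:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold CV: pick the regularization strength that generalizes best
# across all five held-out folds, not the one that fits best once.
for alpha in (0.01, 0.1, 1.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=5, scoring="r2")
    print(f"alpha={alpha}: mean R^2 = {scores.mean():.3f}")
```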

3) Remove features
Removing irrelevant features from the model is a direct way to curb overfitting.
Multicollinearity should also be checked thoroughly as part of this step; one way
to do that is sketched below.
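One common way to check for multicollinearity is the variance inflation factor (VIF); here is a sketch assuming the statsmodels package is available (the diabetes dataset is again just a stand-in):

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_diabetes
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Add an intercept column so each feature's VIF is computed from a
# properly specified regression on the other features.
X = sm.add_constant(load_diabetes(as_frame=True).data)

# VIF measures how well each feature is predicted by the others;
# values well above ~5-10 flag near-redundant features that are
# candidates for removal.
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const").sort_values(ascending=False))
```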
