
Can K-fold cross validation cause overfitting?

Asked 4 years, 2 months ago Modified 2 years, 6 months ago Viewed 14k times

I am learning $k$-fold cross-validation. Since each fold will be used to train the model (in $k$ iterations), won't that cause overfitting?
cross-validation overfitting

asked Jul 9, 2019 at 3:01 by eric2323223, edited Jul 9, 2019 at 4:11 by Frans Rodenburg

4 Please keep in mind that training restarts after every iteration, with the model "forgetting everything it
knew about the data" – David Jul 9, 2019 at 7:19

1 FYI: How does cross-validation overcome the overfitting problem? – Franck Dernoncourt Jul 19, 2020
at 23:05

"No. Each fold is used to train a new model from scratch, predict the accuracy, and then the model is
discarded. You don't use any of the models trained during CV." But when ı used k fold cv, ı see that
accuracy increasing each fold, smallest one is initial fold, is it just a coincidicence? – Zafer Demir Mar
11, 2021 at 11:27

2 Answers

K-fold cross-validation is a standard technique to detect overfitting. It cannot "cause" overfitting in the sense of causality.
However, there is no guarantee that k-fold cross-validation removes overfitting. People are
using it as a magic cure for overfitting, but it isn't. It may not be enough.
The proper way to apply cross-validation is as a method to detect overfitting. If you do CV, and if there is a big difference between the test and the training error, then you know you are overfitting and need to get more diverse data or choose simpler models and stronger regularization. The contrary does not hold: no big difference between test and train error does not mean you haven't been overfitting.

It's not a magic cure, but the best method to detect overfitting we have (when used right).
Some examples of when cross-validation can fail:

data is ordered, and not shuffled prior to splitting

unbalanced data (try stratified cross-validation)

duplicates in different folds


natural groups (e.g., data from the same user) shuffled into multiple folds (see the sketch after this list)
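As a sketch of the fixes for the last two failure modes (assuming scikit-learn; the toy arrays are made up for illustration), stratified and grouped splitters look like this:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, GroupKFold

# Toy data: 12 samples, an unbalanced binary label, and a "user" group id.
X = np.arange(24).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)                    # unbalanced classes
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])  # samples per user

# Shuffling guards against ordered data; stratification keeps class ratios.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print("stratified test classes:", y[test_idx])

# GroupKFold keeps all samples of one user inside a single fold,
# so the model is never tested on a user it was also trained on.
gkf = GroupKFold(n_splits=4)
for train_idx, test_idx in gkf.split(X, y, groups):
    print("test groups:", set(groups[test_idx]))
```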

There are other cases where it cannot detect information leakage and overfitting even when used perfectly right. For example, when analyzing time series (say, stock prices), people like to standardize the data, split it into past and future, and then train a model to predict the future development of these stocks. The subtle information leakage is in the preprocessing: standardizing prior to the temporal split leaks information about the average of the future data into the training data. Similar leaks can occur in other preprocessing. In outlier detection, if you scale the data to $[0, 1]$, a model can learn that values close to 0 and 1 are the most extreme values you can observe, etc.
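To make the preprocessing leak concrete, here is a minimal sketch (assuming scikit-learn; the random data is purely illustrative) contrasting the leaky and the safe order of operations:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

# LEAKY: the scaler sees the whole data set, so every training fold
# has already "seen" the mean/variance of its test fold.
X_leaky = StandardScaler().fit_transform(X)
leaky = cross_val_score(Ridge(), X_leaky, y, cv=KFold(5))

# SAFE: the pipeline refits the scaler inside each training fold only.
safe = cross_val_score(make_pipeline(StandardScaler(), Ridge()), X, y, cv=KFold(5))

print(leaky.mean(), safe.mean())
```

For genuine time series you would additionally want a temporal splitter such as TimeSeriesSplit instead of KFold, so the model never trains on the future.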

Back to your question:

Since each fold will be used to train the model (in $k$ iterations), won't that cause overfitting?

No. Each fold is used to train a new model from scratch, predict the accuracy, and then the
model is discarded. You don't use any of the models trained during CV.

You use validation (such as CV) for two purposes:

1. Estimate how well your model will (hopefully) work in practice when you deploy it, without risking a real A/B test in production yet. You only want to go live with models that are expected to work better than your current approach, or this may cost your company millions.

2. Find the "best" parameters for train your final model (which you want to train on the
entire training data). Tuning hyperparameters is when you have a high risk of overfitting
if you are not careful.

CV is not a way of "training" a model by feeding 10 batches of data.
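As a minimal sketch of that tuning workflow (assuming scikit-learn; the model, data set, and grid values are illustrative choices), CV picks the hyperparameter, and a single final model is then refit on all the training data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 5-fold CV picks the regularization strength; the per-fold models are
# thrown away, and with refit=True (the default) one final model is
# retrained on the entire training set with the winning C.
search = GridSearchCV(LogisticRegression(max_iter=5000),
                      {"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```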

answered Jul 9, 2019 at 5:44 by Has QUIT--Anony-Mousse, edited Jul 9, 2019 at 6:05

On the contrary, cross-validation is a good way to combat overfitting!

Why $k$-fold CV?


Suppose you have a model and you want an estimate of its out-of-sample performance...
You could assess the prediction error on the same data used to fit the model (i.e. the training
error), but this is obviously not a good indicator of out-of-sample performance. If the model
is indeed overfitting, it will perform poorly on new observations, but you will still observe a
low training error.

Alternatively, you could split your data into two parts (train/test) and only use the train set to
fit the model. The rest of the data, never seen by the model in any way, is then used to get an
estimate of the out-of-sample performance. Great! But what if we had used a different split?
As it turns out the variance between results obtained from different splits can be quite large...
so large in fact, that data splitting is only reliable for really large n.
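To see this split-to-split variance concretely, here is a minimal sketch (assuming scikit-learn; the data set and model are arbitrary choices) that scores the same model on many different random splits:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Score the same model on 50 different random train/test splits: the
# spread of these estimates is exactly the problem k-fold CV addresses.
scores = []
for seed in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))

print(f"R^2 over splits: mean={np.mean(scores):.3f}, sd={np.std(scores):.3f}")
```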

This is what $k$-fold CV attempts to tackle, by doing the following repeatedly:

Fit your model on $n - n/k$ observations;

Observe its performance on the remaining $n/k$ observations, which were not used to fit your model.

You repeat this process $k$ times, each time leaving out the next $n/k$ observations for testing, until all observations have been used once as a test set. You then sum the errors on the test set of each fold (or compute a weighted average), and you have an estimate of out-of-sample performance that is less sensitive to the particular splits used, because there are now $k$ of them.†
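Written out by hand, that procedure is just a loop over the $k$ held-out blocks; a minimal sketch, assuming scikit-learn only for the toy data and model:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
n, k = len(y), 5
idx = np.random.default_rng(0).permutation(n)   # shuffle once, then split

errors = []
for fold in np.array_split(idx, k):             # each fold holds n/k indices
    train = np.setdiff1d(idx, fold)             # the other n - n/k observations
    model = LinearRegression().fit(X[train], y[train])   # fresh model each fold
    errors.append(np.mean((model.predict(X[fold]) - y[fold]) ** 2))

# Average the k test errors for the out-of-sample estimate.
print("CV estimate of MSE:", np.mean(errors))
```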

Can this cause overfitting?


Now to answer your question:

Since each fold will be used to train the model (in k iterations), won't that cause
overfitting?

Each fold is indeed used to train the same model... from scratch. So while there is indeed
overlap between training sets, and thus you are indeed fitting models on (partially) the same
data multiple times, you are not reusing the data to update your estimates!

If your model overfits in a particular fold, then the training error of that fold will be lower than the test error of that fold. Hence, when summing/averaging the errors over all folds, a model that overfits will have low cross-validated performance.
†: Even better, if you can afford it computationally, is to repeat $k$-fold CV multiple times.
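scikit-learn ships a splitter for exactly this; a minimal sketch (the data set and model are arbitrary choices):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = load_diabetes(return_X_y=True)

# 5-fold CV repeated 10 times with different shuffles: 50 scores in total,
# whose mean is an even more stable out-of-sample estimate.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
print(cross_val_score(LinearRegression(), X, y, cv=cv).mean())
```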

answered Jul 9, 2019 at 4:26 by Frans Rodenburg, edited Jul 9, 2019 at 5:50
