Asked 4 years, 2 months ago Modified 2 years, 6 months ago Viewed 14k times
I am learning k-fold cross validation. Since each fold will be used to train the model (in k
iterations), won't that cause overfitting?
cross-validation overfitting
asked Jul 9, 2019 at 3:01, edited Jul 9, 2019 at 4:11
Please keep in mind that training restarts after every iteration, with the model "forgetting everything it knew about the data" – David, Jul 9, 2019 at 7:19
FYI: How does cross-validation overcome the overfitting problem? – Franck Dernoncourt, Jul 19, 2020 at 23:05
"No. Each fold is used to train a new model from scratch, predict the accuracy, and then the model is discarded. You don't use any of the models trained during CV." But when I used k-fold CV, I saw the accuracy increasing with each fold; the smallest was the initial fold. Is that just a coincidence? – Zafer Demir, Mar 11, 2021 at 11:27
2 Answers
Since each fold will be used to train the model (in k iterations), won't that cause
overfitting?

No. Each fold is used to train a new model from scratch, predict the accuracy, and then the
model is discarded. You don't use any of the models trained during CV. Cross-validation serves
two purposes:

1. Estimate how well your model will (hopefully) work in practice when you deploy it,
without risking a real A/B test in production yet. You only want to go live with models
that are expected to work better than your current approach, or this may cost your
company millions.
2. Find the "best" hyperparameters for training your final model (which you then train on
the entire training data). Tuning hyperparameters is where you have a high risk of
overfitting if you are not careful.

There are, however, cases where cross-validation cannot detect information leakage and
overfitting even when used perfectly right. For example, when analyzing time series, people
like to standardize the data, split it into past and future data, and then train a model to
predict the future development of these stocks. The subtle information leakage is in the
preprocessing: standardization prior to temporal splitting leaks information about the average
of the remainder. Similar leaks can occur in other preprocessing steps. In outlier detection,
if you scale the data to [0, 1], a model can learn that values close to 0 and 1 are the most
extreme values you can observe, etc.
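A minimal sketch of the standardization leak described above, in plain Python with a made-up series (all names and values here are illustrative, not from the answer):

```python
# Illustrative sketch: standardizing before a temporal split leaks
# information about the "future" portion into the training features.

series = [1.0, 2.0, 3.0, 4.0, 50.0]  # made-up series; the last value is "future"

def mean_std(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var ** 0.5

def standardize(values, mean, std):
    return [(v - mean) / std for v in values]

# Leaky: statistics computed on ALL data (past AND future), then split.
m_all, s_all = mean_std(series)
leaky_train = standardize(series[:4], m_all, s_all)

# Correct: statistics computed on the training (past) portion only.
m_tr, s_tr = mean_std(series[:4])
clean_train = standardize(series[:4], m_tr, s_tr)

# The leaky training features differ: through the global mean and std,
# they already "know" about the extreme future value 50.0.
print(leaky_train != clean_train)  # True
```

Cross-validation on the leaky features would look fine, because the leak happened before any split was made; that is exactly why CV cannot catch it.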
answered Jul 9, 2019 at 5:44, edited Jul 9, 2019 at 6:05 – Has QUIT--Anony-Mousse
Alternatively, you could split your data into two parts (train/test) and only use the train set to
fit the model. The rest of the data, never seen by the model in any way, is then used to get an
estimate of the out-of-sample performance. Great! But what if we had used a different split?
As it turns out, the variance between results obtained from different splits can be quite large;
so large, in fact, that data splitting is only reliable for really large n.
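This split-to-split variance is easy to see in a quick sketch (illustrative data and model, assuming a simple no-intercept least-squares fit; nothing here is from the answer itself):

```python
import random

# Sketch: the out-of-sample error estimate from a SINGLE train/test split
# varies noticeably depending on which split you happened to draw.

random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 1.0)) for x in range(40)]  # noisy line

def split_estimate(data, seed):
    """Fit y = a*x on a random 50/50 split; return mean squared test error."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, test = shuffled[:half], shuffled[half:]
    a = sum(x * y for x, y in train) / sum(x * x for x, y in train)
    return sum((y - a * x) ** 2 for x, y in test) / len(test)

# Same data, same model -- five different splits, five different estimates.
estimates = [split_estimate(data, seed) for seed in range(5)]
print(min(estimates), max(estimates))
```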
In k-fold CV, you instead fit your model on all but n/k of the observations, and then estimate
its performance on the held-out n/k observations, which were not used to fit your model.
You repeat this process k times, each time leaving out the next n/k observations for testing,
until all observations have been used once as a test set. You then sum the errors on the test
set of each fold (or compute a weighted average), and you have an estimate of out-of-sample
performance that is less sensitive to the particular splits used, because there are now k of
them.†
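The procedure above can be sketched in a few lines of plain Python (the "model" here is just the training mean, purely for illustration):

```python
# Minimal sketch of k-fold CV. Note that a FRESH model is fit in every
# fold and discarded afterwards; nothing carries over between folds.

data = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0, 7.0, 9.0, 10.0]
k = 5
fold_size = len(data) // k  # n/k observations per fold

fold_errors = []
for i in range(k):
    test = data[i * fold_size:(i + 1) * fold_size]             # held-out n/k obs
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]  # remaining (k-1)n/k
    model = sum(train) / len(train)  # "train" a fresh model from scratch
    fold_errors.append(sum((y - model) ** 2 for y in test) / len(test))
    # the model is not reused in the next iteration

cv_estimate = sum(fold_errors) / k  # average error over all k folds
print(cv_estimate)
```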
Since each fold will be used to train the model (in k iterations), won't that cause
overfitting?
Each fold is indeed used to train the same model... from scratch. So while there is indeed
overlap between training sets, and thus you are indeed fitting models on (partially) the same
data multiple times, you are not reusing the data to update your estimates!
If your model were to overfit in a particular fold, the training error of that fold would be
lower than the testing error of that fold. Hence, when summing/averaging the errors of all
folds, a model that overfits would have low cross-validated performance.
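To make that concrete, here is a deliberately extreme "memorizer" model (an illustrative construction, not from the answer): it has zero training error, yet cross-validation still exposes it, because each fold is scored only on observations the model never saw.

```python
# Sketch: a model that memorizes its training data looks perfect on the
# training set, but CV reveals the overfitting via the held-out folds.

data = [(0, 1.0), (1, 3.0), (2, 2.0), (3, 4.0), (4, 6.0), (5, 5.0)]

def memorizer(train):
    """Overfit 'model': look up the training label, guess 0.0 otherwise."""
    table = dict(train)
    return lambda x: table.get(x, 0.0)

k = 3
fold_size = len(data) // k
train_errors, test_errors = [], []
for i in range(k):
    test = data[i * fold_size:(i + 1) * fold_size]
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]
    model = memorizer(train)
    train_errors.append(sum((y - model(x)) ** 2 for x, y in train) / len(train))
    test_errors.append(sum((y - model(x)) ** 2 for x, y in test) / len(test))

print(sum(train_errors))     # 0.0 -- perfect on data it has seen
print(sum(test_errors) / k)  # large -- CV exposes the overfitting
```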
†: Even better, if you can afford it computationally, is to repeat k-fold CV multiple times.
answered Jul 9, 2019 at 4:26, edited Jul 9, 2019 at 5:50 – Frans Rodenburg