K-Fold Cross Validation Method
DATA SETS
• In machine learning, a common task is the study and construction of algorithms that
can learn from data and make predictions on it. Such algorithms work by building a
mathematical model from input data and using it to make data-driven predictions or
decisions.
• The data used is called a data set.
• Data sets are classified into two types:
1-Training data set
2-Testing data set
• The training data set is used to learn from the data and build the model (the
algorithm or pattern). Hence it should be the larger share, say 70% of the initial data.
• Also known as the development data set.
• The testing data set is used to validate the model. Hence it should be the smaller
share, at most about 30% of the initial data.
• Also known as the validation data set.
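The 70%/30% split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `train_test_split`, the 500 placeholder records, and the fixed seed are not from the slides.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records, then cut them into a training and a testing part."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

records = list(range(500))                 # 500 hypothetical records (assumption)
train, test = train_test_split(records)
print(len(train), len(test))               # 350 150
```

Shuffling before cutting matters when the records are stored in some order (for example by date or class), since an unshuffled cut would put one kind of record only in the testing part.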
CROSS VALIDATION
• Cross validation is a model validation technique for assessing how the results of a
statistical analysis will generalize to an independent data set.
• Cross validation is used for:
1-Estimating the expected error.
2-Selecting the best-fit model (the model that fits the data set best).
3-Avoiding an over-fit model (e.g. a time-fit model such as one for earthquakes).
Here we discuss only two methods: hold-out sample validation and K-fold cross
validation.
ADVANTAGES/DISADVANTAGES OF HOLD-OUT METHOD
• Advantages
1-Simplest method.
2-Works easily on large data sets.
3-Fast compared to other methods.
• Disadvantage
1-Does not work well for small data sets (this is where K-fold cross validation comes in).
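One way to see the small-data weakness of the hold-out method is to repeat the split with different shuffles and compare the resulting error estimates. The sketch below is an illustration only: the 20 placeholder records, the predict-the-mean model, and the name `hold_out_mse` are assumptions, not part of the slides.

```python
import random

def hold_out_mse(values, seed, test_fraction=0.3):
    """Hold-out estimate of the squared error of a predict-the-mean model."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    prediction = sum(train) / len(train)   # the "model" is just the training mean
    return sum((y - prediction) ** 2 for y in test) / len(test)

small_data = [float(v) for v in range(1, 21)]   # only 20 hypothetical records
estimates = [hold_out_mse(small_data, seed) for seed in range(5)]
print([round(e, 2) for e in estimates])         # one estimate per shuffle
```

With so few records, each estimate rests on only 6 test values, so the numbers printed for different seeds can differ noticeably from one another.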
• K-fold cross validation gives us much more data for validation. With the hold-out
method only 150 records are used for validation, whereas with K-fold cross validation
(5 folds of 100 records each) every fold is used for validation once, giving
100x5=500 records for validation.
• Because more data is used for validation, the error estimate is generally more
reliable than with the hold-out method.
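The 100x5=500 count can be checked with a small sketch of how K-fold cross validation partitions the data. It assumes 500 records and 5 folds, which is inferred from the figures above rather than stated outright; the helper names `k_fold_indices` and `k_fold_splits` are hypothetical.

```python
import random

def k_fold_indices(n_records, k=5, seed=0):
    """Shuffle the record indices and cut them into k nearly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    fold_size, remainder = divmod(n_records, k)
    folds, start = [], 0
    for i in range(k):
        # the first `remainder` folds take one extra record
        stop = start + fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:stop])
        start = stop
    return folds

def k_fold_splits(n_records, k=5, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n_records, k, seed)
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_splits(500, k=5))
validated = sum(len(validation) for _, validation in splits)
print([len(v) for _, v in splits], validated)   # [100, 100, 100, 100, 100] 500
```

Each record lands in exactly one validation fold, so over the 5 rounds all 500 records are validated once, while each round still trains on the other 400.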