K Fold

PRESENTATION
On
K-Fold Cross Validation Method
• Under the Guidance of: Mrs. Divya Gupta (M.Tech), Assistant Professor, Computer Science Department, IERT ALD
• Made By: Shubham Gupta, B.Tech CSE 3rd Year, AKTU Roll No. 1511010041, 39
TABLE OF CONTENTS
• Data Sets
1-Training Data Sets
2-Testing Data Sets
3-Data Set Figure Representation
• Cross Validation
1-Definition
2-Methods of Cross Validation
• Hold-Out Method for Cross Validation
1-Definition
2-Need
3-Advantages
4-Disadvantages
• K-Fold Cross Validation Method
1-Definition
2-Need
3-Advantages
4-Disadvantages
• References
DATA SETS
• In machine learning, a common task is the study and construction of algorithms that can learn from data and make predictions on it. Such algorithms work by making data-driven predictions or decisions through building a mathematical model from input data.
• The data used is called a data set.
• Data sets are classified into two types:
1-Training data set
2-Testing data set
TRAINING DATA SET

• A training data set is a data set for which we know the solution; in other words, we know both the input and the output data. E.g. history (we know its outcome).
• It is used for learning, i.e. for building the algorithm or pattern. Hence it should be large, say 70% of the initial data.
• Also known as the development data set.
TESTING DATA SET

• A testing data set is a data set whose solution is not known to the model; in other words, the outputs for those inputs are not used while learning. E.g. the future (we don't know the outcomes of events that will occur in the future).
• It is used for validating the model. Hence it should be smaller, say at most 30% of the initial data.
• Also known as the validation data set.
DATA SET FIGURE REPRESENTATION

CROSS VALIDATION
• Cross validation is a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set.
• So we can say cross validation is used for:
1-Finding or estimating the expected error.
2-Selecting the best-fit model (the model which fits the data set best).
3-Avoiding an over-fit model (e.g. a time-fit model such as one for earthquakes).
METHODS USED FOR CROSS VALIDATION
• Four commonly used methods for cross validation are:
1-Hold-out sample validation
2-K-fold cross validation
3-Leave-one-out cross validation
4-Bootstrap methods
Here we will discuss only two of them: hold-out sample validation and k-fold cross validation.
HOLD OUT CROSS VALIDATION

• Step by step:
• Step 1: Take all the data.
• Step 2: Randomly divide it into two parts (say 70% and 30%).
• Step 3: Use Part 1 as the development (training) data set and Part 2 as the testing data set.
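The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming the records fit in a list and a 70:30 ratio; `hold_out_split` and its parameters are hypothetical names, not a library API.

```python
import random

def hold_out_split(data, train_fraction=0.7, seed=None):
    """Randomly split data into a training part and a testing part."""
    rows = list(data)              # Step 1: take all the data (copied, so the input is untouched)
    rng = random.Random(seed)      # a seeded generator makes the split reproducible
    rng.shuffle(rows)              # Step 2: randomise the order before cutting
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]  # Step 3: Part 1 = training, Part 2 = testing

train, test = hold_out_split(range(500), seed=42)
print(len(train), len(test))       # 350 150
```

Shuffling before cutting matters: if the data is stored in some order (by date, by class), taking the first 70% without shuffling would give a training set that does not represent the testing set.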
WHY WE DO SO IN THE HOLD OUT METHOD
• To ensure that we learn the general pattern without much error.
• The pattern obtained from the training data must show similar results on the test/validation data.
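One way to check that the learned pattern carries over is to compare the error on the training part with the error on the testing part. The sketch below assumes a straight-line model fitted by least squares and synthetic data (y = 2x + 1 plus noise); `fit_line`, `mse` and the data are illustrative assumptions, not part of the original method.

```python
import random

def fit_line(points):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(model, points):
    """Mean squared error of the line `model` on `points`."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)

rng = random.Random(0)
# Hypothetical data: y = 2x + 1 plus Gaussian noise with standard deviation 1.
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(500)]
rng.shuffle(data)
train, test = data[:350], data[350:]   # 70:30 hold-out split

model = fit_line(train)                # the pattern is learned from training data only
train_err = mse(model, train)
test_err = mse(model, test)
print(f"train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

If the two errors are close, the pattern generalizes; a test error much larger than the training error would suggest the model has over-fitted the training data.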
ADVANTAGES / DISADVANTAGES OF HOLD OUT METHOD
• Advantages
1-Simplest method.
2-Works easily on large data sets.
3-Fast compared to the other methods.
• Disadvantage
1-Does not work well for small data sets (this is where k-fold cross validation comes in).
WHY WE NEED K-FOLD CROSS VALIDATION METHOD
• Suppose we have a small data set, say 500 records.
• We split the data 70:30, as the hold-out method says.
• Hence we get only 150 records for testing, which is too few.
• To increase this, we change the ratio to 50:50.
• But with a 50:50 ratio, the training data becomes too small (only 250 records).
• If we don't have much training data, the model developed will have more error and will not be accurate.
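The trade-off in the 500-record example can be worked out directly. `split_sizes` is a hypothetical helper that assumes integer percentages.

```python
def split_sizes(n_records, train_percent):
    """Return (training, testing) record counts for a hold-out split."""
    n_train = n_records * train_percent // 100
    return n_train, n_records - n_train

for ratio in (70, 50):
    n_train, n_test = split_sizes(500, ratio)
    print(f"{ratio}:{100 - ratio} -> {n_train} training, {n_test} testing")
# 70:30 -> 350 training, 150 testing
# 50:50 -> 250 training, 250 testing
```

Whatever ratio is chosen, every record gained by one part is lost by the other; a single hold-out split cannot enlarge both.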
DILEMMA STATE IN TRAINING AND TESTING DATA
• More training data: a more accurate model is developed, with less error in the model.
• More testing data: more values with which to check the model.
• This is where k-fold CV comes in.
K-FOLD CROSS VALIDATION
• Let us assume k = 5, so it will be 5-fold cross validation.

• First, take the data and divide it into 5 equal parts.
• Each part will hold 20% of the data set.
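Dividing the data into k equal parts can be sketched as below, assuming records are identified by their index and shuffled first (as in the hold-out method); `make_folds` is a hypothetical name. With 500 records and k = 5, each part holds 100 records, i.e. 20%.

```python
import random

def make_folds(n_records, k=5, seed=None):
    """Shuffle record indices and cut them into k parts of (nearly) equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Slicing with step k deals the shuffled indices out like cards, so part
    # sizes differ by at most one when n_records is not divisible by k.
    return [indices[i::k] for i in range(k)]

folds = make_folds(500, k=5, seed=0)
print([len(f) for f in folds])   # [100, 100, 100, 100, 100]
```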

CONTD
• Now use 4 parts for development (training) and 1 part for validation.
See the given figure.

CONTD
• Similarly, we do the same thing for the next four rounds, each time using a different part for validation.
See the figure.

CONTD
• Points to be noted:
• Each part appears exactly once in the validation set.
• Similarly, each part appears 4 times in the training set.
• Hence we have increased the use of both the validation set and the training set.
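The rotation across all five rounds, and the two points noted above, can be checked with a small sketch. It assumes contiguous parts and a record count divisible by k (shuffle the data beforehand if it is ordered); `k_fold_splits` is a hypothetical helper, not a library function.

```python
from collections import Counter

def k_fold_splits(n_records, k=5):
    """Yield (training_indices, validation_indices) for each of the k rounds."""
    # Contiguous, equal-sized parts (assumes n_records is divisible by k).
    size = n_records // k
    parts = [list(range(i * size, (i + 1) * size)) for i in range(k)]
    for held_out in range(k):
        validation = parts[held_out]
        # Training data is every part except the one held out this round.
        training = [i for j, part in enumerate(parts) if j != held_out for i in part]
        yield training, validation

train_count, valid_count = Counter(), Counter()
for training, validation in k_fold_splits(500, k=5):
    train_count.update(training)
    valid_count.update(validation)

# Every record is validated exactly once and used for training k - 1 = 4 times.
print(set(valid_count.values()), set(train_count.values()))  # {1} {4}
```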
ADVANTAGES OF K-FOLD CROSS VALIDATION METHOD
• We have more data for model development. In the hold-out method the model is trained on only 350 records (70% of 500), whereas in 5-fold cross validation each round trains on 400 records, giving 400x5=2000 training records in total across the rounds.
• We also have more data for validation. In the hold-out method we validate on only 150 records, whereas in 5-fold cross validation we validate on 100x5=500 records, i.e. every record once.
• Hence, because more data is used for both training and validation, the error estimate is more accurate than with the hold-out method.
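The counts for the 500-record example work out as follows, assuming a 70:30 hold-out split and k = 5 with 100 records per part:

```latex
\begin{aligned}
\text{Hold-out (70:30):}\quad & n_{\text{train}} = 0.7 \times 500 = 350, && n_{\text{valid}} = 0.3 \times 500 = 150\\
\text{5-fold, per round:}\quad & n_{\text{train}} = 4 \times 100 = 400, && n_{\text{valid}} = 100\\
\text{5-fold, all rounds:}\quad & \textstyle\sum n_{\text{train}} = 5 \times 400 = 2000, && \textstyle\sum n_{\text{valid}} = 5 \times 100 = 500
\end{aligned}
```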
DISADVANTAGES OF K-FOLD CROSS VALIDATION METHOD
• The only disadvantage of the k-fold cross validation method is its computational cost.
• Because the model is trained k times, it requires much more computation; in fact, about k times as much as the hold-out cross validation method.
• Hence it is roughly k times slower.
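The extra cost can be made visible by counting training runs. In this sketch `CountingTrainer` is a hypothetical stand-in for a real training routine (its "model" is just the mean of the training values), and the k-fold loop assumes the record count is divisible by k.

```python
class CountingTrainer:
    """Hypothetical stand-in for a training routine that counts how often it runs."""

    def __init__(self):
        self.fits = 0
        self.rows_seen = 0

    def fit(self, rows):
        self.fits += 1
        self.rows_seen += len(rows)
        return sum(rows) / len(rows)   # toy "model": the mean of the training values

def hold_out_cost(data, trainer, train_fraction=0.7):
    cut = round(len(data) * train_fraction)
    trainer.fit(data[:cut])            # one training run on Part 1

def k_fold_cost(data, trainer, k=5):
    size = len(data) // k              # assumes len(data) is divisible by k
    for i in range(k):
        training = data[:i * size] + data[(i + 1) * size:]   # all parts except part i
        trainer.fit(training)          # one training run per round

data = list(range(500))
ho, kf = CountingTrainer(), CountingTrainer()
hold_out_cost(data, ho)
k_fold_cost(data, kf, k=5)
print(ho.fits, ho.rows_seen)   # 1 350
print(kf.fits, kf.rows_seen)   # 5 2000
```

Five training runs replace one. Measured in rows processed the ratio here is 2000 to 350, so "k times slower" is a rule of thumb; the exact factor depends on how the training cost grows with the size of the training set.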
REFERENCES
• Wikipedia: https://en.wikipedia.org/wiki/Training,_test,_and_validation_sets
• GeeksforGeeks: https://www.geeksforgeeks.org/cross-validation-machine-learning/
• Udacity (YouTube): https://www.youtube.com/watch?v=TIgfjmp-4BA