
Name of the Department: - ISE Semester/Year: - VI and 3rd year

Name of the Student: GANA SHREE G Roll number: - 20191ISE0053


Section: - 6ISE1 Date: - 24/05/2022

Course code: - CSE261 Course Title: - MACHINE LEARNING AND ALGORITHMS

BOOK REVIEW
MACHINE LEARNING: An Algorithmic Perspective
Author: - ETHEM ALPAYDIN
Textbook
ABOUT THE AUTHOR
ETHEM ALPAYDIN is Professor in the Department of Computer Engineering at Özyegin University and a
member of the Science Academy, Istanbul. He is the author of the widely used textbook, Introduction to
Machine Learning (MIT Press), now in its fourth edition.

CROSS VALIDATION AND RE-SAMPLING METHOD


Re-sampling techniques are a set of methods that either repeat sampling from a given sample or population,
or estimate the precision of a sample statistic.

Cross-validation is a re-sampling procedure used to evaluate machine learning models on a limited data
sample.

K-FOLD CROSS-VALIDATION


K-fold cross-validation makes use of a single parameter called k, which refers to the number of groups
that a given data sample is to be split into; this is why the procedure is called k-fold cross-validation. When
a specific value for k is chosen, it may be used in place of k in the name of the procedure, such as k=10
becoming 10-fold cross-validation. Cross-validation is primarily used in applied machine learning to estimate
the skill of a machine learning model on unseen data, that is, to use a limited sample in order to estimate how
the model is expected to perform in general when used to make predictions on data not used during the
training of the model.

It is a popular method because it is simple to understand and it generally results in a less biased or less
optimistic estimate of the model skill than other methods, such as a simple train/test split.

The general procedure is as follows:

1. Shuffle the dataset randomly.


2. Split the dataset into k groups.
3. For each unique group:
 Take the group as a hold-out or test data set.
 Take the remaining groups as a training data set.
 Fit a model on the training set and evaluate it on the test set.
 Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
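The Python sketch below mirrors this procedure using scikit-learn; the synthetic dataset, the logistic regression model, and the choice of k=10 are placeholder assumptions made only for illustration, not an example taken from the book.

    # A minimal sketch of the k-fold procedure; dataset, model and k are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=100, n_features=5, random_state=1)  # placeholder data

    kfold = KFold(n_splits=10, shuffle=True, random_state=1)  # steps 1-2: shuffle and split into k groups
    scores = []
    for train_idx, test_idx in kfold.split(X):                # step 3: each group takes a turn as the test set
        model = LogisticRegression()
        model.fit(X[train_idx], y[train_idx])                 # fit on the training folds
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))  # retain the score, discard the model
    print("mean=%.3f std=%.3f" % (np.mean(scores), np.std(scores)))  # step 4: summarize the scores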

CONFIGURATION OF K

The k value must be chosen carefully for your data sample.


A poorly chosen value for k may result in a misrepresentative idea of the skill of the model, such as a
score with a high variance (one that changes a lot depending on the data used to fit the model) or a high
bias (such as an overestimate of the skill of the model).
Three common tactics for choosing a value for k are as follows:
 Representative: The value for k is chosen such that each train/test group of data samples is large
enough to be statistically representative of the broader dataset.
 k=10: The value for k is fixed to 10, a value that has been found through experimentation to
generally result in a model skill estimate with low bias and a modest variance.
 k=n: The value for k is fixed to n, where n is the size of the dataset, so that each observation is
given an opportunity to be used as the held-out test set. This approach is called leave-one-out cross-
validation.
If a value for k is chosen that does not evenly split the data sample, then one group will contain a
remainder of the examples. It is preferable to split the data sample into k groups with the same number of
samples, so that the model skill scores are all comparable.
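As a small illustration of the last two tactics, the snippet below configures a 10-fold split and a leave-one-out split with scikit-learn; the dataset size n=150 is an assumed value used only for this example.

    # Illustrative only: n is an assumed dataset size, not taken from the book.
    import numpy as np
    from sklearn.model_selection import KFold, LeaveOneOut

    n = 150
    X = np.arange(n).reshape(-1, 1)                  # placeholder data with n samples
    kfold_10 = KFold(n_splits=10, shuffle=True)      # k=10: the common default tactic
    loocv = LeaveOneOut()                            # k=n: leave-one-out cross-validation
    print(kfold_10.get_n_splits(X))                  # 10 folds of 15 samples each
    print(loocv.get_n_splits(X))                     # 150 folds, each holding out a single sample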
To make the cross-validation procedure concrete, let’s look at a worked example.
Imagine we have a data sample with 6 observations:
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

The first step is to pick a value for k in order to determine the number of folds used to split the data. Here,
we will use a value of k=3. That means we will shuffle the data and then split the data into 3 groups.
Because we have 6 observations, each group will have an equal number of observations: 2 each.

For example:

Fold1: [0.5, 0.2]
Fold2: [0.1, 0.3]
Fold3: [0.4, 0.6]

We can then make use of the sample, such as to evaluate the skill of a machine learning algorithm.

Three models are trained and evaluated with each fold given a chance to be the held out test set.

For example:

Model1: Trained on Fold1 + Fold2, Tested on Fold3


Model2: Trained on Fold2 + Fold3, Tested on Fold1
Model3: Trained on Fold1 + Fold3, Tested on Fold2
The models are then discarded after they are evaluated as they have served their purpose.

The skill scores are collected for each model and summarized for use.
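A short sketch of this worked example is given below; because the fold membership depends on the random shuffle, the exact groupings printed will not necessarily match the folds listed above.

    # Sketch of the worked example; the shuffle determines fold membership, so the
    # printed groups may differ from the ones listed above.
    import numpy as np
    from sklearn.model_selection import KFold

    data = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
    kfold = KFold(n_splits=3, shuffle=True, random_state=1)
    for i, (train_idx, test_idx) in enumerate(kfold.split(data), start=1):
        print("Model%d: trained on %s, tested on %s" % (i, data[train_idx], data[test_idx]))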

Signature of faculty
