Example: In a driverless car, training data is fed to the algorithm covering how to drive the car on a highway and on busy and narrow streets, with factors such as speed limits, parking, and stopping at signals. A logical and mathematical model is then created on the basis of that data, and the car subsequently operates according to this model. The more data that is fed, the more efficient the output produced.
Step 1) Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. The data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes that affect the success or failure of the model:
The first attribute is whether the training experience provides direct or indirect feedback regarding the choices made. For example, while playing chess the training data can provide feedback such as: if this move is chosen instead of that one, the chances of success increase.
The second important attribute is the degree to which the learner controls the sequence of training examples. For example, when training data is first fed to the machine, its accuracy is very low, but as it gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
The third important attribute is how well the training experience represents the distribution of examples over which final performance will be measured. A machine learning algorithm gains experience by working through many different cases and examples; the more examples it passes through, the more experience it gains and the better its performance becomes.
Step 2) Choosing the Target Function: The next important step is choosing the target function. Based on the knowledge fed to the algorithm, the machine will learn a NextMove function that describes which legal moves should be taken. For example, while playing chess against an opponent, when the opponent moves, the machine learning algorithm decides which of the possible legal moves to take in order to succeed.
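One common way to make the target function concrete (following the classic linear-evaluation formulation from Mitchell's textbook) is to score each resulting board as a weighted sum of board features. The sketch below is illustrative only: the feature encoding, weights, and helper names are hypothetical, not from the text.

```python
# Sketch of a linear target function V(b) for evaluating board states.
# Feature values and weights are hypothetical, for illustration only.
def v_hat(board_features, weights, bias=0.0):
    """Estimate the value of a board from numeric features,
    e.g. piece counts, mobility, king safety."""
    return bias + sum(w * x for w, x in zip(weights, board_features))

def choose_move(legal_moves, result_features, weights):
    """Pick the legal move whose resulting board scores highest."""
    return max(legal_moves, key=lambda m: v_hat(result_features[m], weights))
```

The weights start out arbitrary; the learning task is then to adjust them from the (direct or indirect) feedback described in Step 1.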
Step 3) Choosing a Representation for the Target Function: Once the machine learning algorithm knows all the possible legal moves, the next step is to choose a representation of the target function with which to pick the optimized move.
Decision Tree
Decision Tree: The decision tree is a powerful and popular tool for classification and prediction. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
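As a sketch of this structure, the following fits a small tree with scikit-learn (an assumption; the tiny weather-style dataset and its integer encodings are invented for illustration). Internal nodes test attributes, and leaves hold class labels:

```python
# Sketch: fitting a small decision tree classifier with scikit-learn.
# The toy weather-style dataset here is illustrative, not from the text.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [outlook, humidity] encoded as integers
# outlook: 0=sunny, 1=overcast, 2=rain; humidity: 0=normal, 1=high
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = [0, 1, 1, 0, 1, 1]  # 0 = don't play, 1 = play

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# Each internal node tests an attribute; each leaf holds a class label.
print(export_text(tree, feature_names=["outlook", "humidity"]))
print(tree.predict([[0, 1]]))  # classify a sunny, high-humidity day
```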
would be sorted down the leftmost branch of this decision tree and would therefore be classified as a negative instance.
In other words, we can say that a decision tree represents a disjunction of conjunctions of constraints on the attribute values of instances.
Entropy
Entropy measures, in bits, the minimum average encoding size needed for the lossless transmission of messages. For message types occurring with probabilities p(x), the formula is

    Entropy = − Σx p(x) log2 p(x)

The formula above gives us the minimum average encoding size, which uses the minimum encoding size for each message type.
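To make the formula concrete, here is a minimal computation of entropy in bits (a sketch; the example distributions are illustrative):

```python
# Shannon entropy: the minimum average encoding size in bits per message,
# H = -sum(p * log2(p)) over the message probabilities p.
from math import log2

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin needs 1 bit per toss on average.
print(entropy([0.5, 0.5]))  # 1.0
# A biased coin is more predictable, so it needs fewer bits on average.
print(entropy([0.9, 0.1]))
```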
MODULE 2
In computational learning theory, probably approximately correct (PAC) learning is a framework for
mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant.[1]
In this framework, the learner receives samples and must select a generalization function (called the
hypothesis) from a certain class of possible functions. The goal is that, with high probability (the
"probably" part), the selected function will have low generalization error (the "approximately
correct" part). The learner must be able to learn the concept given any arbitrary approximation ratio,
probability of success, or distribution of the samples.
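For a finite hypothesis class H, the standard sample-complexity bound for consistent learners (a textbook result, not stated in the text above) makes the "probably" and "approximately" parts quantitative: with probability at least 1 − δ, any hypothesis consistent with m i.i.d. training samples has generalization error at most ε, provided

```latex
m \ge \frac{1}{\varepsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```

Here ε is the approximation ratio ("approximately correct") and 1 − δ is the probability of success ("probably").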
k-Fold Cross-Validation
Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
The procedure has a single parameter called k that refers to the number of groups that a given data
sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a
specific value for k is chosen, it may be used in place of k in the reference to the model, such as k=10
becoming 10-fold cross-validation.
Cross-validation is primarily used in applied machine learning to estimate the skill of a machine
learning model on unseen data. That is, to use a limited sample in order to estimate how the model is
expected to perform in general when used to make predictions on data not used during the training of
the model.
It is a popular method because it is simple to understand and because it generally results in a less
biased or less optimistic estimate of the model skill than other methods, such as a simple train/test
split.
This approach involves randomly dividing the set of observations into k groups, or folds, of
approximately equal size. The first fold is treated as a validation set, and the method is fit on the
remaining k − 1 folds.
The results of a k-fold cross-validation run are often summarized with the mean of the model skill
scores. It is also good practice to include a measure of the variance of the skill scores, such as the
standard deviation or standard error.
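The procedure above can be sketched with scikit-learn (an assumption; the synthetic dataset and choice of model are illustrative, not from the text):

```python
# Sketch: 5-fold cross-validation with scikit-learn on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=5, random_state=1)
model = LogisticRegression()

# Each of the k=5 folds serves once as the validation set;
# the model is fit on the remaining k-1 folds.
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)

# Report the mean skill plus a measure of spread, as the text recommends.
print("accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```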
Configuration of k
The k value must be chosen carefully for your data sample.
A poorly chosen value for k may give a misrepresentative idea of the skill of the model, such as a score with high variance (one that changes a lot depending on the data used to fit the model) or high bias (such as an overestimate of the skill of the model).
Representative: The value for k is chosen such that each train/test group of data samples is large
enough to be statistically representative of the broader dataset.
k=10: The value for k is fixed to 10, a value that has been found through experimentation to generally result in a model skill estimate with low bias and a modest variance.
k=n: The value for k is fixed to n, where n is the size of the dataset to give each test sample an
opportunity to be used in the hold out dataset. This approach is called leave-one-out cross-validation.
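The k = n case is exposed directly in scikit-learn (again an assumption; the dataset is synthetic and illustrative):

```python
# Sketch: leave-one-out cross-validation (k = n) with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=30, n_features=4, random_state=2)
scores = cross_val_score(LogisticRegression(), X, y, cv=LeaveOneOut())

# One score per sample: each observation is held out exactly once.
print(len(scores))  # 30
```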
The choice of k is usually 5 or 10, but there is no formal rule. As k gets larger, the difference in size between the training set and the resampling subsets gets smaller, and as this difference decreases, the bias of the technique becomes smaller.
To summarize, there is a bias-variance trade-off associated with the choice of k in k-fold cross-
validation. Typically, given these considerations, one performs k-fold cross-validation using k = 5 or k
= 10, as these values have been shown empirically to yield test error rate estimates that suffer neither
from excessively high bias nor from very high variance.
Learning curves (LCs) are deemed effective tools for monitoring the performance of workers exposed
to a new task. LCs provide a mathematical representation of the learning process that takes place as
task repetition occurs.
— Learning curve models and applications: Literature review and research directions, 2011.
For example, if you were learning a musical instrument, your skill on the instrument could be
evaluated and assigned a numerical score each week for one year. A plot of the scores over the 52
weeks is a learning curve and would show how your learning of the instrument has changed over
time.
The metric used to evaluate learning could be maximizing, meaning that better scores (larger
numbers) indicate more learning. An example would be classification accuracy.
It is more common to use a score that is minimizing, such as loss or error, whereby better scores (smaller numbers) indicate more learning, and a value of 0.0 indicates that the training dataset was learned perfectly with no mistakes.
During the training of a machine learning model, the current state of the model at each step of the
training algorithm can be evaluated. It can be evaluated on the training dataset to give an idea of how
well the model is “learning.” It can also be evaluated on a hold-out validation dataset that is not part of
the training dataset. Evaluation on the validation dataset gives an idea of how well the model is
“generalizing.”
Train Learning Curve: Learning curve calculated from the training dataset that gives an idea of how
well the model is learning.
Validation Learning Curve: Learning curve calculated from a hold-out validation dataset that gives
an idea of how well the model is generalizing.
It is common to create dual learning curves for a machine learning model during training on both the
training and validation datasets.
In some cases, it is also common to create learning curves for multiple metrics, such as in the case of
classification predictive modeling problems, where the model may be optimized according to cross-
entropy loss and model performance is evaluated using classification accuracy. In this case, two plots
are created, one for the learning curves of each metric, and each plot can show two learning curves,
one for each of the train and validation datasets.
Optimization Learning Curves: Learning curves calculated on the metric by which the parameters of
the model are being optimized, e.g. loss.
Performance Learning Curves: Learning curves calculated on the metric by which the model will be
evaluated and selected, e.g. accuracy.
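The two kinds of curves can be sketched by recording train and validation loss at each training step. The toy model below (linear regression fit by gradient descent with NumPy) is illustrative only, chosen so the whole training loop is visible:

```python
# Sketch: recording train and validation learning curves during training.
# The toy model (linear regression via gradient descent) is illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=80)
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

w = np.zeros(3)
train_curve, val_curve = [], []
for epoch in range(100):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.1 * grad
    # Train curve: loss on the data being fit ("learning").
    train_curve.append(np.mean((X_tr @ w - y_tr) ** 2))
    # Validation curve: loss on held-out data ("generalizing").
    val_curve.append(np.mean((X_val @ w - y_val) ** 2))

print(train_curve[0], train_curve[-1])  # loss decreases over epochs
```

Plotting `train_curve` and `val_curve` against epoch gives the dual learning curves described above; a growing gap between them is the usual sign of overfitting.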
Now that we are familiar with the use of learning curves in machine learning, let’s look at some
common shapes observed in learning curve plots.