ML Unit - I - RSK
Credit: 3
Teaching Scheme: Lecture: 03 Hr/Week
Examination Scheme: In-Sem: 30 Marks; End-Sem: 70 Marks
Course Contents
Course (Theory) Objectives:
•If any corrections are identified, the algorithm can incorporate that
information to improve its future decision making.
Regardless of the definition you choose, at its most basic level, the goal of machine learning is:
•The systems learn, identify patterns, and make decisions with minimal
intervention from humans.
•Base knowledge for which the answer is known that enables (trains)
the system to learn.
Initially, the model is fed parameter data for which the answer is
known. The algorithm is then run, and adjustments are made until the
algorithm’s output (learning) agrees with the known answer. At this
point, increasing amounts of data are input to help the system learn and
process higher computational decisions.
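The feed-parameters, run, and adjust-until-correct cycle described above can be sketched in code. This is a hypothetical minimal example (the model, data, and learning rate are illustrative assumptions, not from the source): a single weight is nudged after each example until the model's output agrees with the known answers.

```python
# Minimal sketch (hypothetical) of the train-and-adjust cycle:
# a single weight w is adjusted until the model y = w*x
# agrees with the known answers in the training data.

def train(examples, lr=0.01, epochs=200):
    """examples: list of (input, known_answer) pairs; model is y = w * x."""
    w = 0.0
    for _ in range(epochs):
        for x, y_known in examples:
            y_pred = w * x
            # adjustment step: move w to reduce the error on this example
            w += lr * (y_known - y_pred) * x
    return w

# Base knowledge for which the answer is known (here generated by y = 2x)
data = [(1, 2), (2, 4), (3, 6)]
w = train(data)   # w converges towards 2.0
```

Once the learned weight agrees with the known answers, new inputs can be fed in and predicted as `w * x`.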
Why Is Machine Learning Important?
• New techniques in the field are evolving rapidly, expanding the application of ML to nearly limitless possibilities.
https://www.youtube.com/watch?v=ukzFI9rgwfU
Supervised learning: develop a predictive model based on both input and output data.
Unsupervised learning: group and interpret data based on input data alone.
• In supervised learning, the machine is taught by example.
• The operator provides the machine learning algorithm with a known dataset that includes desired inputs and outputs, and the algorithm must find a method to determine how to arrive at those outputs from the inputs.
•While the operator knows the correct answers to the problem, the algorithm
identifies patterns in data, learns from observations and makes predictions.
•The algorithm makes predictions and is corrected by the operator – and this process
continues until the algorithm achieves a high level of accuracy/performance.
• Supervised Learning is the most popular paradigm for performing machine learning
operations.
• It is widely used for data where there is a precise mapping between input-output
data.
• The dataset, in this case, is labeled, meaning that the algorithm identifies the
features explicitly and carries out predictions or classification accordingly.
• As the training period progresses, the algorithm is able to identify the relationships
between the two variables such that we can predict a new outcome.
• The resulting supervised learning algorithms are task-oriented.
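The "taught by example" idea above can be illustrated with a deliberately tiny sketch (the data and the threshold rule are made-up assumptions): the operator supplies labeled input-output pairs, and the learner searches for a rule that reproduces the known outputs.

```python
# Hypothetical sketch of supervised "teaching by example": find the
# threshold that best reproduces the operator-supplied labels (0 or 1).

def fit_threshold(samples):
    """samples: list of (value, label) pairs; returns the best threshold."""
    best_t, best_correct = None, -1
    for t, _ in samples:
        # count how many known answers this candidate rule gets right
        correct = sum(1 for x, y in samples if (x >= t) == (y == 1))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

labeled = [(1.0, 0), (1.5, 0), (2.0, 0), (3.5, 1), (4.0, 1)]
t = fit_threshold(labeled)   # inputs >= t are predicted as class 1
```

The process of correcting the rule against known answers continues until accuracy stops improving, mirroring the operator-correction loop described above.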
Classification: predicting a discrete class label for an input.
Forecasting (regression): predicting a continuous value, typically about the future, based on past and present data.
• Linear Regression
• Random Forest
• Gradient Boosting
• Support Vector Machines (SVMs): powerful classifiers that separate a binary dataset into two classes with the help of hyperplanes.
• Logistic Regression
•These models are able to draw features from the image through
various filters. Finally, if there is a high similarity score between the
input image and the image in the database, a positive match is
provided.
Semi-supervised learning:
• The algorithm tries to organise that data in some way to describe its
structure.
Clustering:
Reinforcement learning:
Reinforcement learning (RL) is concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.
• By defining the rules of the environment, the machine learning algorithm then tries, through trial and error, to take actions that maximize its cumulative reward.
Learning a Function:
Y = f(x)
•An algorithm learns this target mapping function from training data.
• Assumptions can greatly simplify the learning process, but can also
limit what can be learned.
Parametric algorithms involve two steps:
1. Select a form for the function.
2. Learn the coefficients for the function from the training data.
Example:
An easy to understand functional form for the mapping function is a
line, as is used in linear regression:
b0 + b1*x1 + b2*x2 = 0
Where b0, b1 and b2 are the coefficients of the line that control the
intercept and slope, and x1 and x2 are two input variables.
Parametric Machine Learning Algorithms:...Example Continued
The problem is, the actual unknown underlying function may not be a
linear function like a line. It could be almost a line and require some
minor transformation of the input data to work right. Or it could be
nothing like a line in which case the assumption is wrong and the
approach will produce poor results.
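The two parametric steps can be sketched for the single-input case (the training data here is an illustrative assumption): the form y = b0 + b1*x is fixed in advance, and only the coefficients are learned, via the standard least-squares formulas.

```python
# Parametric learning sketch: 1) assume the form y = b0 + b1*x,
# 2) learn the coefficients b0, b1 from training data (least squares).

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x,y divided by variance of x
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
         / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx   # intercept
    return b0, b1

# Assumed training data, generated by y = 1 + 2x
b0, b1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

If the true function really is a line, the two learned coefficients summarize everything; if it is not, no choice of b0 and b1 can make the model fit well, which is exactly the limitation described above.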
Parametric Machine Learning Algorithms:... Continued
• Perceptron
• Naive Bayes
Less Data: They do not require as much training data and can work well
even if the fit to the data is not perfect.
Algorithms that do not make strong assumptions about the form of the
mapping function are called nonparametric machine learning algorithms.
By not making assumptions, they are free to learn any functional form
from the training data.
Nonparametric methods are good when you have a lot of data and no
prior knowledge, and when you don’t want to worry too much about
choosing just the right features.
— Artificial Intelligence: A Modern Approach, page 757
Nonparametric Machine Learning Algorithms:
•The method does not assume anything about the form of the mapping
function other than patterns that are close are likely to have a similar
output variable.
More data: Require a lot more training data to estimate the mapping
function.
Slower: A lot slower to train as they often have far more parameters to
train.
Overfitting: More of a risk to overfit the training data and it is harder to
explain why specific predictions are made.
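A nonparametric method matching the description above is k-nearest neighbours: it assumes only that inputs close together tend to share an output. The sketch below uses made-up one-dimensional data and labels.

```python
# k-nearest-neighbours sketch (hypothetical data): predict the majority
# label among the k training points closest to the query. No functional
# form is assumed; the training data itself is the "model".

from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (x, label) pairs; predicts the label for query."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

points = [(0.5, 'a'), (1.0, 'a'), (1.2, 'a'),
          (4.0, 'b'), (4.5, 'b'), (5.0, 'b')]
```

Note how the method keeps all training points around at prediction time, which illustrates why nonparametric methods need more data and are slower than parametric ones.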
Regression Analysis in Machine learning:
• Linear regression is one of the simplest algorithms; it performs a regression task and models the relationship between continuous variables.
•If there is only one input variable (x), then such linear regression is
called simple linear regression. And if there is more than one input
variable, then such linear regression is called multiple linear regression.
Example: Here we are predicting the salary of an employee on the basis
of the year of experience.
It uses the concept of threshold levels: values above the threshold are rounded up to 1, and values below the threshold are rounded down to 0.
The model is still linear, because the coefficients remain linear even when quadratic terms of the inputs are included.
Example: Linear regression ; Identify errors of prediction in a scatter
plot with a regression line
•In simple linear regression, we predict scores on one variable from the
scores on a second variable.
•In simple linear regression, the topic of this section, the predictions of
Y when plotted as a function of X form a straight line.
The example data in Table 1 are plotted in Figure 1. One can see that
there is a positive relationship between X and Y. If you were going to
predict Y from X, the higher the value of X, the higher your prediction of
Y.
Table 1. Example data.
X      Y
1.00   1.00
2.00   2.00
3.00   1.30
4.00   3.75
5.00   2.25
The error of prediction for a point is the value of the point minus the
predicted value (the value on the line).
Let the predicted values be Y′; then the errors of prediction are Y − Y′.
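The errors of prediction for the Table 1 data can be computed directly. Fitting the least-squares line to these five points gives Y′ ≈ 0.425X + 0.785; each error is then the observed Y minus the value on the line.

```python
# Errors of prediction for the Table 1 data: fit the least-squares
# regression line, then subtract each predicted value Y' from Y.

X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

mx, my = sum(X) / len(X), sum(Y) / len(Y)
b1 = sum((x - mx) * (y - my) for x, y in zip(X, Y)) \
     / sum((x - mx) ** 2 for x in X)          # slope  = 0.425
b0 = my - b1 * mx                             # intercept = 0.785
errors = [y - (b0 + b1 * x) for x, y in zip(X, Y)]   # Y - Y'
```

A useful sanity check is that the errors of prediction around the least-squares line always sum to zero.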
VARIANCE:
•Variability can also be defined in terms of how close the scores in the
distribution are to the middle of the distribution.
•Using the mean as the measure of the middle of the distribution, the
variance is defined as the average squared difference of the scores from
the mean.
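The definition above translates directly into code (the scores are an illustrative assumption): square each score's difference from the mean, then average.

```python
# Variance as defined above: the average squared difference of the
# scores from the mean of the distribution.

def variance(scores):
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)

v = variance([1, 2, 3, 4, 5])
# mean is 3; squared deviations are 4 + 1 + 0 + 1 + 4 = 10, so v = 10/5 = 2.0
```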
X      Y      x = X − Mx   x²     y = Y − My   y²      xy
1.00   1.00   −2.00        4.00   −1.06        1.123   2.12
(A second worked example: for observed values Y = 94, 70, 59, 80, 92, 65, 87, 95, 99, 105, the fitted regression line is y′ = 6.79X − 22.)
In general, the regression model takes the form
Y = f(X, β) + ϵ,
where X is a vector of p predictors,
β is a vector of k parameters,
f( ) is some known regression function,
and ϵ is an error term whose distribution may or may not be normal.
(Error term representing random sampling noise or the effect of variables not
included in the model.)
Notice that we no longer necessarily have the dimension of the
parameter vector simply one greater than the number of predictors.
• Given one or more inputs, a classification model will try to predict the value of one or more outcomes.
For example,
•when filtering emails “spam” or “not spam”,
Quick Check:
• Predict the number of copies of a music album that will be sold next month.
•For example, if given a banana, the classifier will see that the fruit is of
yellow color, oblong shaped and long and tapered. All of these features
will contribute independently to the probability of it being a banana and
are not dependent on each other. Naive Bayes is based on Bayes’
theorem, which is given as:
P(A|B) = P(B|A) · P(A) / P(B)
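The banana example can be made concrete with a small sketch. The probabilities below are made-up illustrative numbers: each feature (yellow, oblong, tapered) contributes an independent likelihood, which is multiplied with the class prior, following the naive independence assumption.

```python
# Naive Bayes sketch with assumed (made-up) probabilities: the score for
# a class is proportional to P(class) * product of P(feature | class),
# since the features are treated as independent of each other.

def naive_bayes_score(prior, likelihoods):
    score = prior
    for p in likelihoods:
        score *= p
    return score

# Assumed likelihoods P(yellow|class), P(oblong|class), P(tapered|class)
banana = naive_bayes_score(0.5, [0.9, 0.8, 0.7])
other  = naive_bayes_score(0.5, [0.3, 0.2, 0.1])
is_banana = banana > other   # pick the class with the higher score
```

Dividing each score by their sum (P(B) in Bayes' theorem) would turn them into proper posterior probabilities; for classification, comparing the unnormalized scores is enough.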
•The question is at the node and it places the resulting decisions below
at the leaves.
•Knowing this, we can make a tree which has the features at the nodes
and the resulting classes at the leaves.
❏ Two entities:
• Decision nodes – split the dataset
• Leaves – where decisions are taken
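A tiny hand-built tree makes the two entities concrete (the fruit features and classes here are illustrative assumptions, not from the source): questions sit at the decision nodes, and classes sit at the leaves.

```python
# Hand-built decision tree sketch: each `if` is a decision node that
# splits the data on a feature question; each `return` is a leaf where
# the final class decision is taken.

def classify(fruit):
    if fruit['color'] == 'yellow':        # decision node
        if fruit['shape'] == 'long':      # decision node
            return 'banana'               # leaf
        return 'lemon'                    # leaf
    return 'other'                        # leaf

label = classify({'color': 'yellow', 'shape': 'long'})
```

Tree-learning algorithms automate the choice of which question to ask at each node, but the learned structure is exactly this nodes-and-leaves shape.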
Attribute-based representations
• Examples described by attribute values (Boolean, discrete,
continuous)
• E.g., situations where I will/won't wait for a table:
• Feature selection:
Feature selection tries to select a subset of the original features for
use in the machine learning model. In this way, we could remove
redundant and irrelevant features without incurring much loss of
information.
Dimensionality Reduction:
Feature extraction:
• Feature extraction is also called feature projection. Whereas
feature selection returns a subset of the original features, feature
extraction creates new features by projecting the data in the high-
dimensional space to a space of fewer dimensions. This approach
can also derive informative and non-redundant features.
Feature selection:
In this, we try to find a subset of the original set of variables, or
features, to get a smaller subset which can be used to model the
problem. It usually involves three ways:
•Filter
•Wrapper
•Embedded
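A minimal filter-style selection can be sketched as follows (the data and the scoring criterion are illustrative assumptions): each feature is scored independently of any model, here by its variance, and the top-k features are kept. Wrapper and embedded methods would instead involve a model in the scoring.

```python
# Filter-method sketch: score each feature independently (here by
# variance) and keep the indices of the k highest-scoring features.

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

def filter_select(rows, k):
    """rows: list of feature vectors; returns indices of the k
    highest-variance features, in ascending index order."""
    cols = list(zip(*rows))                      # one tuple per feature
    scores = [(variance(c), i) for i, c in enumerate(cols)]
    return sorted(i for _, i in sorted(scores, reverse=True)[:k])

# Assumed data: feature 0 is constant, so it carries no information
data = [[1.0, 5.0, 0.0], [1.0, 9.0, 0.1], [1.0, 2.0, 0.0]]
kept = filter_select(data, 2)
```

The constant feature is dropped, which is the "remove redundant and irrelevant features without much loss of information" idea from the text.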
Overfitting occurs when the model performs well on training data but
generalizes poorly to unseen data.
Eight simple approaches to alleviate overfitting:
1. Hold-out (data)
• Rather than using all of our data for training, we can simply split our
dataset into two sets: training and testing.
• A common split ratio is 80% for training and 20% for testing.
• We train our model until it performs well not only on the training
set but also for the testing set.
2. Cross-validation (data)
• We split our dataset into k groups (k-fold cross-validation). We let one of the groups be the testing set (please see the hold-out explanation) and the others the training set, and repeat this process until each individual group has been used as the testing set (i.e., k repeats).
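Both data-based approaches above can be sketched at the level of example indices (a simplified illustration; real splits would shuffle the indices first):

```python
# Sketch of the two data-based approaches: an 80/20 hold-out split, and
# k-fold groups where each group serves exactly once as the testing set.
# (Simplified: indices are taken in order; real splits shuffle first.)

def holdout_split(n, train_frac=0.8):
    cut = int(n * train_frac)
    return list(range(cut)), list(range(cut, n))   # train, test indices

def kfold_splits(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

train_idx, test_idx = holdout_split(10)   # 8 training, 2 testing indices
```

With `kfold_splits(10, 5)`, every example appears in exactly one testing set across the five repeats, so all of the data contributes to both training and evaluation.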
5. L1 / L2 regularization (model)
• L2 regularization allows weights to decay towards zero but not to zero, while L1 regularization allows weights to decay all the way to zero.
7. Dropout (model)
•By applying dropout, which is a form of regularization, to our layers, we
ignore a subset of units of our network with a set probability.
•However, with dropout, we would need more epochs for our model to
converge.
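Dropout itself is simple to sketch (a simplified stand-alone illustration, not tied to any particular framework): each unit is ignored with probability p, and in the common "inverted dropout" variant the kept activations are scaled by 1/(1−p) so the expected activation is unchanged.

```python
# Inverted-dropout sketch: each activation is kept with probability 1-p
# (and scaled by 1/(1-p)), or zeroed out, i.e. the unit is ignored.

import random

def dropout(activations, p, rng=random.Random(0)):
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)   # roughly half the units zeroed
```

At test time dropout is switched off (p = 0), and thanks to the 1/(1−p) scaling no further adjustment of the weights is needed.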
8. Early stopping (model)
• Once the validation loss begins to degrade (e.g., stops decreasing and instead begins increasing), we stop the training and save the current model.
2. What do you mean by linear regression? With a suitable example, describe how linear regression is used to predict the output for a test example/input sample. What is non-linear regression? [8]
OR
2. Describe parametric and non-parametric learning with their advantages and limitations. State any four applications where machine learning is used. [8]
3. What is overfitting in machine learning? What are the different methods to overcome the overfitting problem? Describe in brief. [6]