
What is Machine Learning?

Machine learning is a subfield of artificial intelligence. Its goal is to enable computers to learn on
their own. A machine’s learning algorithm enables it to identify patterns in observed data, build
models that explain the world, and predict things without having explicit pre-programmed rules and
models.

Machine learning is one of many subfields of artificial intelligence, concerning the ways that
computers learn from experience to improve their ability to think, plan, decide, and act.

Machine Learning is the field of study that gives computers the ability to learn without being
explicitly programmed.

ML is a branch of AI concerned with the design and development of algorithms that allow computers to
evolve behaviours based on empirical data.

“A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.”

Types of ML:

Supervised Learning: Predictive Model: A training set of examples with the correct
responses (targets) is provided and, based on this training set, the algorithm generalises to respond
correctly to all possible inputs. This is also called learning from examples.

The aim is to learn a mapping from the input to an output whose correct values are provided by the
supervisor.

Supervised learning is where you have input variables (X) and an output variable (Y) and you use an
algorithm to learn the mapping function from the input to the output.

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (X),
you can predict the output variables (Y) for that data.

It is called supervised learning because the process of an algorithm learning from the training
dataset can be thought of as a teacher supervising the learning process. We know the correct
answers, the algorithm iteratively makes predictions on the training data and is corrected by the
teacher. Learning stops when the algorithm achieves an acceptable level of performance.
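
A minimal sketch of this supervised workflow (the toy measurements, the labels, and the choice of scikit-learn's k-nearest-neighbours classifier are illustrative assumptions, not part of the definition above):

```python
# Supervised learning sketch: learn a mapping from inputs X to known targets y.
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: each input is (height_cm, weight_kg); y holds the correct answers.
X_train = [[150, 50], [160, 60], [180, 85], [190, 95]]
y_train = [0, 0, 1, 1]                 # 0 = "small", 1 = "large", supplied by the "teacher"

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)            # generalise from the labelled examples

print(model.predict([[175, 80]]))      # predict the target for a new, unseen input
```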

Unsupervised Learning: Descriptive Model: Correct responses are not provided; instead, the
algorithm tries to identify similarities between the inputs so that inputs that have something in
common are categorised together. The statistical approach to unsupervised learning is known as
density estimation.

Unsupervised learning is where you only have input data (X) and no corresponding output variables.

The goal for unsupervised learning is to model the underlying structure or distribution in the data in
order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning above, there are no correct
answers and there is no teacher. Algorithms are left to their own devices to discover and present the
interesting structure in the data.
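
As an illustration, a short clustering sketch (the two-dimensional toy points and the choice of scikit-learn's k-means are assumptions made for demonstration):

```python
# Unsupervised learning sketch: no labels are given; the algorithm groups similar inputs itself.
from sklearn.cluster import KMeans

# Input data only -- there is no y and no "teacher".
X = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # the cluster discovered for each input
```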

Reinforcement Learning: Reinforcement Learning is a type of Machine Learning, and thereby
also a branch of Artificial Intelligence. It allows machines and software agents to automatically
determine the ideal behaviour within a specific context in order to maximize their performance.
Simple reward feedback is required for the agent to learn its behaviour; this is known as the
reinforcement signal.
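
A very small sketch of that reward-feedback loop, using a two-armed bandit as a stand-in environment (the reward probabilities and the epsilon-greedy rule are assumptions chosen for brevity):

```python
# Reinforcement learning sketch: the agent improves its behaviour from reward signals alone.
import random

reward_prob = [0.3, 0.7]   # hidden reward probability of each action (the environment)
value = [0.0, 0.0]         # the agent's running estimate of each action's value
counts = [0, 0]

for step in range(1000):
    # Epsilon-greedy: mostly exploit the best-looking action, sometimes explore.
    action = random.randrange(2) if random.random() < 0.1 else value.index(max(value))
    reward = 1 if random.random() < reward_prob[action] else 0   # the reinforcement signal
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]   # incremental average

print(value)   # the estimates should approach [0.3, 0.7]
```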

Semi-Supervised Learning: Problems where you have a large amount of input data (X) and only
some of the data is labeled (Y) are called semi-supervised learning problems.

Semi-supervised learning is essentially a supervised method that avoids labeling a large number of
instances. This is done by using some of the labeled data to help the classifier label the unlabeled
data. This automatically labeled data is then also used in the training process.
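
A hedged sketch of that self-training idea (the toy data, the 0.9 confidence threshold, and logistic regression as the base classifier are all assumptions):

```python
# Semi-supervised sketch: train on labeled data, pseudo-label confident unlabeled points, retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.array([[0.0], [1.0], [9.0], [10.0]])
y_labeled = np.array([0, 0, 1, 1])
X_unlabeled = np.array([[0.5], [9.5], [5.0]])

clf = LogisticRegression().fit(X_labeled, y_labeled)

probs = clf.predict_proba(X_unlabeled)
confident = probs.max(axis=1) > 0.9               # keep only high-confidence pseudo-labels
X_pseudo = X_unlabeled[confident]
y_pseudo = probs.argmax(axis=1)[confident]

# Retrain on the labeled data plus the automatically labeled data.
clf = LogisticRegression().fit(np.vstack([X_labeled, X_pseudo]),
                               np.concatenate([y_labeled, y_pseudo]))
```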

Issues in Machine Learning:

• What algorithms exist for learning general target functions from specific training examples?
• Which algorithms perform best for which types of problems and representations?
• How much training data is sufficient?
• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the training problem?
• What is the best way to reduce the learning task to one or more function approximation problems?
• How can a learner automatically alter its representation to improve its ability to represent and learn the target function?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples?

Key Terminology:

• Expert System
• Features/Attributes
• Classification
• Training set
• Test set
• Validation set
• Target variable
• Knowledge Representation
• Cross-validation
• Inductive bias

Applications of Machine Learning:

• Classification (spam mail detection)
• Pattern Recognition
• Medical Diagnosis
• Fraud Detection
• Web Search Results
• Sentiment Analysis
• Predictions
• NIDS (Network Intrusion Detection Systems)
• Image Recognition

How to choose the right algorithm?

1. First of all, decide what your goal is. What are you trying to get out of this? Also, what
data do you have or can you collect?
2. If the goal is to predict or forecast a target value, then go for Supervised
Learning. If there is no target value, you need Unsupervised Learning.
3. If you go for Supervised Learning and your target values are categorical, then look into
classification. For continuous values you need regression.
4. If you are using Unsupervised Learning and you need to fit your data into some discrete
groups, then look for clustering algorithms. If you need a numerical estimate of how
strongly each point fits into each group, you need a density estimation algorithm.
5. With the algorithms narrowed down, there is no single answer to which algorithm is best or
what will give you the best result. You need to try different algorithms and see how they
perform on your data.
6. Optimize the hyperparameters of the algorithms using cross-validation; you can tune each
algorithm to optimize performance if time permits (see the sketch after this list). If not,
manually selected hyperparameters will work well enough for the most part.
7. Finding the best algorithm is an iterative process of trial and error.
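
For step 6, a brief sketch of cross-validated hyperparameter tuning (the k-nearest-neighbours model, the parameter grid, and the built-in iris dataset are illustrative assumptions):

```python
# Tune a hyperparameter with cross-validation, as suggested in step 6.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}    # candidate hyperparameter values

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)                                  # 5-fold cross-validation over the grid
print(search.best_params_, search.best_score_)
```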

• For predicting values (product demand, sales figures, trends) – Regression
• Finding unusual occurrences (detect fraud, predict credit risk) – Anomaly Detection (PCA, SVM)
• Discovering structure (customer segmentation, customer taste) – Clustering
• Predicting categories – Classification

Steps in developing Machine Learning Applications:

1. Collect data.
2. Pre-process/transform and prepare the input data.
3. Analyse the input data.
4. Split the data.
5. Train the algorithm.
6. Test the algorithm.
7. Use it for prediction.
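
A compact sketch of steps 4 through 7 (the built-in iris dataset and the decision tree are placeholders for your own data and chosen algorithm):

```python
# Steps 4-7: split the data, train, test, then use the model for prediction.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 4: data splitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 5: train the algorithm.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 6: test the algorithm.
print("test accuracy:", model.score(X_test, y_test))

# Step 7: use it for prediction on new inputs.
print(model.predict(X_test[:3]))
```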

Linear Regression:
• Linear Regression is a simple approach for supervised learning. It is a useful algorithm for predicting a quantitative response.
• Regression is used to predict continuous values.
• Regression predicts a continuous target variable (quantitative response/dependent variable) Y on the basis of input data (single or multiple predictor/independent variables) X. It allows you to estimate a value, such as housing prices or human lifespan, based on input data (features/attributes) X.
• Here, target variable means the unknown variable we care about predicting, and continuous means there aren’t gaps (discontinuities) in the values that Y can take on.
• It assumes that there is approximately a linear relationship between X and Y. Mathematically, we can write this linear relationship for a single independent variable X as

Y ≈ β0 + β1X

β0 and β1 are two unknown constants that represent the intercept and slope terms in the linear model. Together, β0 and β1 are known as the model coefficients or parameters. Once we have used our training data to produce estimates β’0 and β’1 for the model coefficients, we can predict the quantitative response (target variable) Y using the above formula with those estimates:

y’ = β’0 + β’1x
• Predicting income is a classic regression problem. Your input data X includes all relevant information about individuals in the data set that can be used to predict income, such as years of education, years of work experience, job title, or zip code. These attributes are called features, which can be numerical (e.g. years of work experience) or categorical (e.g. job title or field of study).
• You’ll want as many training observations as possible relating these features to the target output Y, so that your model can learn the relationship f between X and Y.
• The data is split into a training data set and a test data set. The training set has labels, so your model can learn from these labelled examples. The test set does not have labels, i.e. you don’t yet know the value you’re trying to predict. It’s important that your model can generalize to situations it hasn’t encountered before, so that it can perform well on the test data.
• One method for estimating the coefficients is Ordinary Least Squares (OLS), which uses calculus to estimate them.
• We have our data set X and corresponding target values Y. The goal of ordinary least squares (OLS) regression is to learn a linear model that we can use to predict a new y given a previously unseen x, with as little error as possible.
• The estimated coefficients should minimize the error in the model’s predictions as much as possible.
• We need a cost function/loss function that measures how inaccurate our model’s predictions are.
• For the OLS approach, the cost function is the Sum of Squared Errors (SSE):

SSE = (y1 − β’0 − β’1x1)² + (y2 − β’0 − β’1x2)² + … + (yn − β’0 − β’1xn)²

Here xi is the ith observation in the dataset, yi is its actual output, and y’i = β’0 + β’1xi is the output predicted by the estimated model. So the SSE sums the squared differences between the actual and predicted outcomes over every observation.

• The least squares approach chooses β’0 and β’1 to minimize the SSE. Using some calculus, one can show that the minimizers are

β’1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
β’0 = ȳ − β’1x̄

where x̄ and ȳ are the sample means of the xi and yi.
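
A small NumPy sketch of these closed-form estimates (the toy data below is made up purely for illustration):

```python
# Ordinary least squares for simple linear regression, via the closed-form minimizers.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])          # roughly y = 2x

x_bar, y_bar = x.mean(), y.mean()
beta1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
beta0 = y_bar - beta1 * x_bar

y_pred = beta0 + beta1 * x
sse = ((y - y_pred) ** 2).sum()                  # the quantity the estimates minimize
print(beta0, beta1, sse)
```
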
Fit:

A fit refers to how well you approximate a target function.

Supervised machine learning algorithms seek to approximate the unknown underlying mapping
function for the output variables given the input variables. Hence the goodness of fit refers to
measures used to estimate how well the approximation of the function matches the target function.

Overfitting:

Overfitting refers to a model that models the training data too well: it learns a function that
perfectly explains the training data it learned from, but does not generalize well to
unseen test data.

Overfitting happens when a model learns the detail and noise in the training data to the extent that
it negatively impacts the performance of the model on new data. This means that the noise or
random fluctuations in the training data are picked up and learned as concepts by the model. The
problem is that these concepts do not apply to new data and negatively impact the model’s ability to
generalize.

Underfitting:

Underfitting refers to a model that can neither model the training data nor generalize to new data.

Underfitting is a related issue where your model is not complex enough to capture the underlying
trend in the data.

An underfit machine learning model is not a suitable model, and this will be obvious because it will
perform poorly even on the training data.

Underfitting is often not discussed as it is easy to detect given a good performance metric. The
remedy is to move on and try alternate machine learning algorithms. Nevertheless, it does provide a
good contrast to the problem of overfitting.

Perfect Fitting: refers to a model that generalises well to any type of test data, so both bias and
variance are low.

Bias: is the deviation of the values predicted by the estimated target function/model from the actual
values in the dataset.

It is the amount of error introduced by approximating the actual relationship between input and
output variable with a simplified model.

Variance: is the variation in the estimated target function/model with changes in the training
data.

It is how much your model's test error changes based on variation in the training data.
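
A short sketch that makes this contrast concrete (the noisy sine data, the split, and the two polynomial degrees are arbitrary choices for illustration): a degree-1 polynomial underfits (high bias), while a degree-12 polynomial chases the training noise and does worse on held-out data (high variance).

```python
# Underfitting vs. overfitting: compare train/test error of low- and high-degree polynomial fits.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)   # noisy underlying trend

x_train, y_train = x[::2], y[::2]    # every other point for training
x_test, y_test = x[1::2], y[1::2]    # the rest held out for testing

for degree in (1, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```
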
Logistic Regression:

• In linear regression the response variable is continuous and quantitative in nature. But there are situations in which the response variable is qualitative (categorical) in nature.
• Predicting this qualitative response is known as classification in Machine Learning.
• Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class. On the other hand, the methods used for classification often first predict the probability of each of the categories of a qualitative variable as the basis for making the classification. In this sense they also behave like regression methods.
• Classification predicts a discrete target label Y. Classification is the problem of assigning new observations to the class to which they most likely belong, based on a classification model built from labeled training data.
• Logistic regression is a method of classification.
• Here, rather than modelling the dependent variable (the response Y) directly, logistic regression models the probability that Y belongs to a particular category.
• So we use a logit model, which is designed to assign a probability between 0% and 100% that Y belongs to a certain class.
• The logit model is a modification of linear regression that makes sure to output a probability between 0 and 1 by applying the sigmoid function, which, when graphed, looks like the characteristic S-shaped curve.

• Linear regression on its own can produce model outputs less than 0 or greater than 1. To solve this issue, we define a new function F(g(x)) that transforms the linear model g(x) = β0 + β1x by squashing its output to a value in the [0, 1] range. For this we use the sigmoid function:

S(z) = 1 / (1 + e^(−z))

• So we plug g(x) into the sigmoid function, resulting in a function of our original function (yes, things are getting meta) that outputs a probability between 0 and 1:

p = S(g(x)) = 1 / (1 + e^(−(β0 + β1x)))

• In other words, we’re calculating the probability that the training example belongs to a certain class: P(Y = 1).
• Here we’ve isolated p, the probability that Y = 1, on the left side of the equation. If we instead solve for a nice clean β0 + β1x on the right side, so we can straightforwardly interpret the beta coefficients we’re going to learn, we end up with the log-odds ratio, or logit, on the left side, hence the name “logit model”:

log(p / (1 − p)) = β0 + β1x

• The log-odds ratio is simply the natural log of the odds ratio, p/(1 − p).
• The quantity p/(1 − p) is known as the odds.
• The cost function for logistic regression is the negative log-likelihood (cross-entropy) cost:

J(β0, β1) = −(1/n) Σ [ yi log(pi) + (1 − yi) log(1 − pi) ]

where pi is the model’s predicted probability that the ith example belongs to class 1.
• To fit the model, we use a method called maximum likelihood to estimate the parameters.
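
A from-scratch sketch tying these pieces together (the toy one-dimensional data, learning rate, and iteration count are arbitrary assumptions; plain gradient descent on the cross-entropy cost stands in for the maximum-likelihood fitting described above):

```python
# Logistic regression sketch: sigmoid, cross-entropy cost, gradient-descent fitting.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, 1.5, 4.0, 4.5, 5.0])
y = np.array([0, 0, 0, 1, 1, 1])                 # class labels

beta0, beta1, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    p = sigmoid(beta0 + beta1 * x)               # P(Y = 1) for each example
    # Gradients of the average cross-entropy cost with respect to beta0 and beta1.
    beta0 -= lr * np.mean(p - y)
    beta1 -= lr * np.mean((p - y) * x)

p = sigmoid(beta0 + beta1 * x)
cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(beta0, beta1, cost)
print(sigmoid(beta0 + beta1 * 2.5))              # probability a new input belongs to class 1
```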

Inductive Bias:

In machine learning, the term inductive bias refers to a set of (explicit or implicit) assumptions made
by a learning algorithm in order to perform induction, that is, to generalize a finite set of observations
(training data) into a general model of the domain. Without a bias of that kind, induction would not
be possible, since the observations can normally be generalized in many ways. If all these
possibilities were treated equally, i.e., without any bias in the sense of a preference for specific types of
generalization (reflecting background knowledge about the target function to be learned),
predictions for new situations could not be made.
