Reasons To Learn Probability for Machine Learning
usm systems
Sep 23 · 5 min read


Probability is the field of mathematics that measures uncertainty.

Probability is a pillar of machine learning, and many say it is essential to study before getting started. This is misleading advice, because probability makes more sense to a learner once they have the context of the applied machine learning process in which to interpret it.

In this post, you will discover why machine learning practitioners study probability to improve their skills and capabilities.

After reading this post, you will know:

Not everyone should learn probability first; it depends on where you are in your journey of learning machine learning.
Some algorithms are designed using tools and techniques from probability, such as Naive Bayes and probabilistic graphical models.
The maximum likelihood framework that underlies the training of many machine learning algorithms comes from the field of probability.
Let’s start.


Overview

This tutorial is divided into seven sections; they are:

1. Reasons for not learning probability

2. It is necessary to assess the likelihood of class membership

3. Some algorithms are designed using probability

4. Models are trained using a probabilistic framework

5. Models can be tuned with a probabilistic framework

6. Probabilistic measures are used to evaluate model performance

7. One More Reason

Reasons for not learning probability

Before we get into the reasons why you should learn probability, let's take a brief look at the reasons why you shouldn't.

If you are just starting out with applied machine learning, I don't think you should study probability first.


It is not necessary.

You do not need an appreciation of the underlying abstract theory in order to use machine learning algorithms as tools to solve problems.

It is slow.

Taking months or years to study an entire related field before starting machine learning will delay you in achieving your goal of being able to work through predictive modeling problems.

This is a huge field.

Not all of probability is relevant to theoretical machine learning, let alone applied machine learning.

I recommend a breadth-first approach to getting started in applied machine learning.

I call this the results-first approach. You start by learning and practicing the steps for working through a predictive modeling problem end-to-end (i.e., how to get results) with a tool (such as scikit-learn and Pandas in Python).
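As a rough sketch of what working a problem end-to-end can look like, the snippet below loads a dataset, fits a model, and reports a result with scikit-learn and Pandas. The dataset (Iris) and model (logistic regression) are chosen here only for illustration, not as recommendations.

```python
# A minimal end-to-end, results-first sketch: load data, fit a model, get a result.
# Assumes scikit-learn and pandas are installed; dataset and model are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the data as a pandas DataFrame so it can be inspected like any other table.
iris = load_iris(as_frame=True)
X, y = iris.data, iris.target

# Split, fit, and evaluate: the whole workflow in a few lines.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy: %.3f" % accuracy_score(y_test, model.predict(X_test)))
```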

This process then provides the skeleton and context for progressively deepening your knowledge, such as how the algorithms work and, eventually, the mathematics that underlies them.

Once you know how to work through a predictive modeling problem, let's look at why you should deepen your understanding of probability.

1) It is necessary to assess the likelihood of class membership

Classification predictive modeling problems are those where an example is assigned a given label.

An example you may know is the Iris flowers dataset, where we have four dimensions of a flower and the goal is to assign one of three different iris species to each observation.

We can model the problem by assigning the class label directly to each
observation.

Input: Dimensions of a flower.


Output: an iris species.


A more general approach is to frame the problem as probabilistic class membership, where the probability of the observation belonging to each known class is predicted.

Input: Dimensions of a flower.


Output: the probability of membership for each iris species.

Framing the problem as a prediction of class membership simplifies the modeling problem and makes it easier for the model to learn. It allows the model to capture the ambiguity in the data, which in turn allows a downstream process, such as the user, to interpret the probabilities in the context of the domain.

The probabilities can be transformed into a crisp class label by selecting the class with the largest probability. The probabilities can also be scaled or transformed using a probability calibration process.
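As a minimal sketch of both steps, the snippet below predicts class membership probabilities, reduces them to crisp labels with an argmax, and calibrates them with scikit-learn's CalibratedClassifierCV. The choice of a k-nearest neighbors model and the sigmoid calibration method are assumptions for illustration only.

```python
# Sketch: probabilities -> crisp labels, plus optional probability calibration.
# The model (kNN) and calibration method (sigmoid) are illustrative choices.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.calibration import CalibratedClassifierCV

X, y = load_iris(return_X_y=True)

# Probabilistic framing: one probability of membership per class, per example.
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)
probs = model.predict_proba(X[:5])
print(probs)

# Crisp labels: pick the class with the largest probability.
print(np.argmax(probs, axis=1))

# Calibration: rescale probabilities so they better match observed frequencies.
calibrated = CalibratedClassifierCV(KNeighborsClassifier(n_neighbors=5),
                                    method="sigmoid", cv=5).fit(X, y)
print(calibrated.predict_proba(X[:5]))
```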

Choosing a class membership framing of the prediction problem, and interpreting the predicted probabilities, requires a basic understanding of probability.

2) Some algorithms are designed using probability


There are algorithms specifically designed to harness the tools and techniques of probability.

These range from individual algorithms, such as the Naive Bayes algorithm, which is constructed using Bayes' theorem with some simplifying assumptions.

Naive Bayes
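As a small sketch of what this looks like in code, Gaussian Naive Bayes in scikit-learn fits class priors and per-class Gaussian likelihoods, then applies Bayes' theorem to produce posterior class probabilities. The dataset and split below are illustrative.

```python
# Sketch: a probability-based algorithm in practice (Gaussian Naive Bayes).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Fit class priors and per-class likelihoods, then predict via Bayes' rule.
model = GaussianNB().fit(X_train, y_train)
print("Accuracy: %.3f" % model.score(X_test, y_test))
print(model.predict_proba(X_test[:3]))  # posterior probability per class
```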
This also extends to whole fields of study, such as probabilistic graphical models, often called graphical models or PGMs for short, which are designed around Bayes' theorem.

Probabilistic graphical models


A notable graphical model is the Bayesian belief network, or Bayes net, which is capable of capturing the conditional dependencies between variables.

Bayesian belief networks

3) Models are trained using a probabilistic framework

Many machine learning models are trained using an iterative algorithm designed under a probabilistic framework.


Perhaps the most common is the framework of maximum likelihood estimation, sometimes shortened to MLE. It is a framework for estimating model parameters (e.g., weights) given observed data.
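As a sketch, with theta standing for the model parameters and x_1, ..., x_n for the observed data (notation assumed here, not taken from this article), maximum likelihood estimation picks the parameters that make the observed data most probable:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)$$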

This is the framework that underlies the ordinary least squares estimate of a linear regression model.

The expectation-maximization algorithm, or EM for short, is an approach to maximum likelihood estimation that is often used for unsupervised data clustering, e.g., estimating the k means for k clusters, also known as the k-means clustering algorithm.
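As a minimal sketch, the snippet below contrasts the two ideas with scikit-learn: a Gaussian mixture model, which is fit with expectation-maximization and yields soft cluster probabilities, and k-means, which makes hard assignments to k means. The dataset and the choice of three clusters are illustrative assumptions.

```python
# Sketch: EM-based soft clustering (GaussianMixture) vs. hard k-means assignments.
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

# EM: each point gets a probability of belonging to each of the 3 components.
gmm = GaussianMixture(n_components=3, random_state=1).fit(X)
print(gmm.predict_proba(X[:3]))

# k-means: each point is assigned to the nearest of the 3 estimated means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
print(kmeans.labels_[:3])
```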

For models that predict class membership, maximum likelihood estimation provides a framework for minimizing the difference, or divergence, between the observed and predicted probability distributions. This is used in classification algorithms such as logistic regression, as well as deep learning neural networks.

It is common to measure this difference between probability distributions during training using entropy, e.g., via cross-entropy. Entropy, the difference between distributions measured via KL divergence, and cross-entropy all come from the field of information theory, which builds directly on probability theory. For example, entropy is calculated directly as the negative log of the probability.
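As a small illustration, cross-entropy can be computed directly from predicted class probabilities as the negative log of the probability assigned to the observed class; the labels and probabilities below are invented for the example.

```python
# Sketch: cross-entropy as the average negative log probability of the true class.
import numpy as np

# One-hot observed labels and predicted probabilities for three examples (made up).
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.6, 0.1],
                   [0.2, 0.2, 0.6]])

# H(P, Q) = -sum_x P(x) * log(Q(x)), computed per example, then averaged.
per_example = -np.sum(y_true * np.log(y_pred), axis=1)
print(per_example)
print(per_example.mean())  # the quantity minimized during training
```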

4) Models can be tuned with a probabilistic framework

It is common to tune the hyperparameters of a machine learning model, such as k for kNN or the learning rate in a neural network.

Typical approaches include grid searching ranges of hyperparameters or randomly sampling hyperparameter combinations, as in the sketch below.
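A minimal sketch of these two approaches with scikit-learn; the model, parameter range, and cross-validation settings are illustrative choices.

```python
# Sketch: grid search vs. random search over a single hyperparameter.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Grid search: evaluate every value in a fixed grid.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))

# Random search: sample a fixed number of values from a larger range.
rand = RandomizedSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(1, 30))},
                          n_iter=10, cv=5, random_state=1)
rand.fit(X, y)
print(rand.best_params_, round(rand.best_score_, 3))
```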

Bayesian optimization is a more efficient approach to hyperparameter optimization that performs a directed search of the space of possible configurations, guided by those configurations that are most likely to lead to better performance.

As its name suggests, the approach was devised from, and harnesses, Bayes' theorem when sampling the space of possible configurations.
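The sketch below shows one way this can look in practice. It assumes the third-party scikit-optimize package (skopt), which is not part of scikit-learn, and an illustrative model and search range; other tools such as Optuna or Hyperopt implement the same idea.

```python
# Hedged sketch: Bayesian optimization of a hyperparameter with scikit-optimize.
# Assumes `pip install scikit-optimize`; model and search range are illustrative.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from skopt import BayesSearchCV
from skopt.space import Integer

X, y = load_iris(return_X_y=True)

# Each new configuration is chosen based on how promising it looks given
# the configurations evaluated so far (a directed, probability-guided search).
opt = BayesSearchCV(KNeighborsClassifier(), {"n_neighbors": Integer(1, 30)},
                    n_iter=20, cv=5, random_state=1)
opt.fit(X, y)
print(opt.best_params_, round(opt.best_score_, 3))
```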

5) Probabilistic measures are used to evaluate model performance

For algorithms that predict probabilities, evaluation measures are required to summarize the performance of the model.

There are many measures used to summarize the performance of a model based on its predicted probabilities. Common examples include aggregate measures such as log loss and the Brier score.
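A small sketch of both scores with scikit-learn, using labels and probabilities invented for illustration:

```python
# Sketch: aggregate probability-based scores (log loss and Brier score).
from sklearn.metrics import log_loss, brier_score_loss

y_true = [0, 1, 1, 0, 1]
y_prob = [0.1, 0.8, 0.6, 0.3, 0.9]  # predicted probability of class 1 (made up)

print(log_loss(y_true, y_prob))          # heavily penalizes confident wrong predictions
print(brier_score_loss(y_true, y_prob))  # mean squared error of the probabilities
```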

For binary classification tasks where a single probability score is predicted, receiver operating characteristic (ROC) curves can be constructed to explore the different cut-offs that can be used when interpreting the prediction, each resulting in a different trade-off. The area under the ROC curve, or ROC AUC, can also be calculated as an aggregate measure.
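A minimal sketch of both with scikit-learn, again with invented labels and scores:

```python
# Sketch: ROC curve points (one per cut-off) and the ROC AUC aggregate.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
for threshold, f, t in zip(thresholds, fpr, tpr):
    print("cut-off=%.2f  false positive rate=%.2f  true positive rate=%.2f"
          % (threshold, f, t))

print("ROC AUC: %.3f" % roc_auc_score(y_true, y_score))
```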

Choosing and interpreting these scoring methods requires a foundational understanding of probability theory.
