
UNIT-3
BAYESIAN AND COMPUTATIONAL LEARNING

Introduction to Machine Learning:


 Machine learning is a growing technology which enables computers to learn
automatically from past data.
 Machine learning uses various algorithms for building mathematical models and
making predictions using historical data or information.
 Currently, it is being used for various tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.
 This machine learning tutorial gives you an introduction to machine learning along with a wide range of machine learning techniques such as supervised, unsupervised, and reinforcement learning.
 You will learn about regression and classification models, clustering methods, hidden Markov models, and various sequential models.
 A Machine Learning system learns from historical data, builds the prediction
models, and whenever it receives new data, predicts the output for it.
 Suppose we have a complex problem that requires predictions. Instead of writing code for it, we feed the data to generic algorithms; with the help of these algorithms, the machine builds the logic from the data and predicts the output.
 Machine learning has changed the way we approach such problems.
 The block diagram below explains the working of a machine learning algorithm:

Evolution (or) History of Machine Learning:


Some 40-50 years ago, machine learning was science fiction, but today it is part of our daily life. Machine learning makes our day-to-day life easier, from self-driving cars to Amazon's virtual assistant "Alexa". However, the idea behind machine learning is old and has a long history. Some milestones in the history of machine learning are given below:

The early history of Machine Learning (Pre-1940):


o 1834: Charles Babbage, the father of the computer, conceived a device that could be programmed with punch cards. The machine was never built, but all modern computers rely on its logical structure.
The era of stored program computers:
o 1940s: ENIAC, the first electronic general-purpose computer, was completed in 1945; it was programmed manually with switches and cables. After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were built.
Computing machinery and intelligence:
o 1950: Alan Turing published a seminal paper, "Computing Machinery and Intelligence," on the topic of artificial intelligence. In his paper, he asked, "Can machines think?"
Machine intelligence in Games:
o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an IBM computer play checkers. It performed better the more it played.
o 1959: The term "Machine Learning" was first coined by Arthur Samuel.
The first "AI" winter:
o The period from 1974 to 1980 was a tough time for AI and ML researchers, and it is called the AI winter.
Machine learning from theory to reality
o 1959: In 1959, the first neural network was applied to a real-world problem to
remove echoes over phone lines using an adaptive filter.
o 1985: In 1985, Terry Sejnowski and Charles Rosenberg invented a neural
network NETtalk, which was able to teach itself how to correctly pronounce 20,000
words in one week.
o 1997: IBM's Deep Blue computer won a chess match against the world champion Garry Kasparov, becoming the first computer to defeat a reigning world chess champion in a match.
Machine learning in the 21st century
o 2006: Computer scientist Geoffrey Hinton gave neural net research a new name, "deep learning," and it has since become one of the most trending technologies.
o 2012: Google created a deep neural network which learned to recognize images of humans and cats in YouTube videos.
o 2016: AlphaGo beat the world's number two player, Lee Sedol, at the game of Go. In 2017 it beat the number one player of the game, Ke Jie.
o 2017: Alphabet's Jigsaw team built an intelligent system that learned to detect online trolling. It read millions of comments from different websites in order to learn to stop online trolling.

Need for Machine Learning


 The need for machine learning is increasing day by day, because it can perform tasks that are too complex for a person to implement directly.
 As humans, we have limitations: we cannot process huge amounts of data manually.
 We can train machine learning algorithms by providing them with large amounts of data and letting them explore the data, construct models, and predict the required output automatically.
 The performance of a machine learning algorithm depends on the amount of data, and it can be measured with a cost function.
 The importance of machine learning is easily understood through its use cases: it is currently used in self-driving cars, cyber fraud detection, face recognition, friend suggestions by Facebook, and more.

Classification of Machine Learning


At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
 Supervised learning is a type of machine learning method in which we provide sample labeled data to the machine learning system in order to train it, and on that basis, it predicts the output.
 The system creates a model using labeled data to understand the dataset and learn about each data point. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.
 The goal of supervised learning is to map input data to the output data.

 An example of supervised learning is spam filtering.

Supervised learning can be further grouped into two categories of algorithms:


o Classification and Regression
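As a rough illustration of mapping inputs to outputs from labeled data, the sketch below fits a straight line to a few made-up (input, output) pairs with ordinary least squares. Least squares is just one simple regression technique, chosen here for brevity; the names `fit_line` and `predict` and the training data are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Fit y = w * x + b to labeled data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Predict the output for a new input using the fitted line."""
    w, b = model
    return w * x + b


# Hypothetical labeled training data generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)
prediction = predict(model, 4.0)
print(model, prediction)
```

The labels in `train_y` play the role of the "supervision": the model is adjusted until its outputs agree with them, and only then is it asked about an unseen input.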
2) Unsupervised Learning
 Unsupervised learning is a learning method in which a machine learns without any
supervision.
 The machine is trained on a set of data that has not been labeled, classified, or categorized, and the algorithm needs to act on that data without any supervision.
 The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.
 The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:
o Clustering and Association
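To make the idea of grouping unlabeled data concrete, here is a minimal sketch of a basic k-means loop with two clusters on one-dimensional data. K-means is one common clustering algorithm and is used here only as an example; the text does not prescribe it. The function name `two_means_1d`, the initialization rule, and the data are assumptions.

```python
def two_means_1d(points, iterations=10):
    """Group 1-D points into two clusters with a basic k-means loop (k = 2).

    Assumes the points contain at least two distinct values.
    """
    # Assumption: start the two centroids at the smallest and largest point.
    centroids = [min(points), max(points)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to the nearer centroid.
            nearest = 0 if abs(p - centroids[0]) <= abs(p - centroids[1]) else 1
            clusters[nearest].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [sum(c) / len(c) for c in clusters]
    return clusters, centroids


# Hypothetical unlabeled data with two visible groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
clusters, centroids = two_means_1d(data)
print(clusters, centroids)
```

No labels are supplied anywhere: the grouping emerges only from how close the points are to each other.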
3) Reinforcement Learning
 Reinforcement learning is a feedback-based learning method, in which a learning
agent gets a reward for each right action and gets a penalty for each wrong action.
 The agent learns automatically from this feedback and improves its performance.
 In reinforcement learning, the agent interacts with the environment and explores it.
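The reward-and-penalty loop can be sketched in a few lines. This is a deliberately tiny, deterministic toy, not a full reinforcement learning algorithm: the environment `reward_for`, the action names, and the learning rate are all hypothetical.

```python
def reward_for(action):
    """Hypothetical environment: 'right' is the correct action."""
    return 1.0 if action == "right" else -1.0


def train_agent(actions, steps=20, learning_rate=0.5):
    """Learn a value estimate per action from reward/penalty feedback."""
    values = {a: 0.0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            # Exploration: try every action once.
            action = actions[step]
        else:
            # Exploitation: pick the action with the best estimate so far.
            action = max(values, key=values.get)
        reward = reward_for(action)
        # Move the estimate a fraction of the way toward the observed reward.
        values[action] += learning_rate * (reward - values[action])
    return values


values = train_agent(["left", "right"])
best_action = max(values, key=values.get)
print(values, best_action)
```

The agent is never told which action is correct; it only sees rewards and penalties and shifts its estimates accordingly.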

Applications of Machine learning in Industry and Real World


Machine learning is a buzzword in today's technology, and it is growing rapidly day by day. We use machine learning in our daily life, often without knowing it, in services such as Google Maps, Google Assistant, Alexa, etc. Below are some of the most trending real-world applications of machine learning:

1. Image Recognition:
 Image recognition is one of the most common applications of machine learning.
 It is used to identify objects, persons, places, etc. in digital images.
 A popular use case of image recognition and face detection is automatic friend tagging suggestion:
 Whenever we upload a photo with our Facebook friends, we automatically get a tagging suggestion with names; the technology behind this is machine learning's face detection and recognition algorithm.
2. Speech Recognition

 While using Google, we get an option of "Search by voice." It comes under speech recognition, and it is a popular application of machine learning.
 Speech recognition is the process of converting voice instructions into text, and it is also known as "Speech to text" or "Computer speech recognition."
 At present, machine learning algorithms are widely used by various applications of speech recognition.
3. Traffic prediction:

 If we want to visit a new place, we use Google Maps, which shows us the shortest route and predicts the traffic conditions.
 It predicts whether traffic is clear, slow-moving, or heavily congested using two sources:
o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time.
4. Product recommendations:
 Machine learning is widely used by various e-commerce and entertainment
companies such as Amazon, Netflix, etc., for product recommendation to the user.

5. Email Spam and Malware Filtering:

 Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
 Important mail arrives in our inbox marked with the important symbol, and spam emails go to our spam box; the technology behind this is machine learning.
 Machine learning algorithms such as Multi-Layer Perceptron, Decision Tree, and Naïve Bayes classifier are used for email spam filtering and malware detection.
6. Virtual Personal Assistant:
 We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they help us find information using our voice instructions.
 These assistants can help us in various ways just by our voice instructions, such as playing music, calling someone, opening an email, scheduling an appointment, etc.
7. Online Fraud Detection:

 Machine learning makes our online transactions safe and secure by detecting fraudulent transactions.
 Whenever we perform an online transaction, fraud can take place in various ways, such as fake accounts, fake IDs, and theft of money in the middle of a transaction.
8. Automatic Language Translation:
 Nowadays, if we visit a new place and do not know the language, it is not a problem: machine learning helps us by converting the text into a language we know.
Difference between Supervised and Unsupervised Learning:

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data.
The model takes direct feedback to check whether it is predicting the correct output. | The model does not take any feedback.
The model predicts the output. | The model finds hidden patterns in the data.
Input data is provided to the model along with the output. | Only input data is provided to the model.
It can be categorized into Classification and Regression problems. | It can be categorized into Clustering and Association problems.
The model generally produces more accurate results. | The model may give less accurate results than supervised learning.
It includes various algorithms such as Linear Regression, Support Vector Machine, Decision Tree, Bayesian Logic, etc. | It includes various clustering and association algorithms.

Bayes Theorem
 Bayes' theorem is one of the most widely used concepts in machine learning. It helps to calculate the probability of one event, under uncertain knowledge, given that another event has already occurred.
 Bayes' theorem can be derived using the product rule and the conditional probability of event X with known event Y:
o According to the product rule, the joint probability can be expressed using the probability of event X with known event Y as follows:
1. P(X ∩ Y) = P(X|Y) P(Y) {equation 1}
o Further, using the probability of event Y with known event X:
1. P(X ∩ Y) = P(Y|X) P(X) {equation 2}
Mathematically, Bayes' theorem is obtained by equating the right-hand sides of both equations and dividing by P(Y). We get:

$$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$$

Here, X and Y can be any two events with P(Y) > 0; the theorem does not require them to be independent.
The above equation is called Bayes' rule or Bayes' theorem.
o P(X|Y) is called the posterior, which we need to calculate. It is the updated probability of the hypothesis after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
o P(X) is called the prior probability, the probability of the hypothesis before considering the evidence.
o P(Y) is called the marginal probability. It is the probability of the evidence under all possible hypotheses.
Hence, Bayes' theorem can be written as: posterior = likelihood * prior / evidence
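The relation posterior = likelihood * prior / evidence translates directly into code. The sketch below is illustrative only: the function name `posterior` and all the probabilities are made-up assumptions, and the evidence P(Y) is obtained with the law of total probability over X and not-X.

```python
def posterior(likelihood, prior, evidence):
    """Return P(X|Y) = P(Y|X) * P(X) / P(Y)."""
    if evidence <= 0:
        raise ValueError("P(Y) must be positive")
    return likelihood * prior / evidence


# Hypothetical numbers: P(Y|X) = 0.9, P(X) = 0.2, P(Y|not X) = 0.1
p_y_given_x = 0.9
p_x = 0.2
p_y_given_not_x = 0.1

# Evidence via the law of total probability.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)
p_x_given_y = posterior(p_y_given_x, p_x, p_y)
print(round(p_x_given_y, 4))
```

Observing Y raises the probability of X from the prior 0.2 to the posterior, because Y is much more likely when X holds than when it does not.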
Maximum Likelihood
 Maximum Likelihood Estimation (MLE) is a probability-based approach to determining values for the parameters of a model. MLE is a widely used technique in machine learning, time series, panel data, and discrete data.
 The likelihood function measures the extent to which the data provide support for different values of the parameter. It indicates how likely it is that a particular population will produce a sample.
Working of Maximum Likelihood Estimation
 The main objective of MLE is to maximize the likelihood function.
 For example, for a dataset with a salary feature and a binary label, MLE calculates the probability for each data point and then uses those probabilities to calculate the likelihood of classifying the data points as either 0 or 1.
 It repeats this process until the fitted line (the model) best matches the data. This process is known as maximization of the likelihood.

 MLE is the basis of many supervised learning models, one of which is logistic regression.
 Logistic regression uses the maximum likelihood technique to classify the data, as the following points outline.
 Specific MLE procedures have the advantage that they can exploit the properties of the estimation problem to deliver better efficiency and numerical stability.
 These methods can often calculate explicit confidence intervals.
 In logistic regression implementations (for example, scikit-learn's LogisticRegression), the "solver" parameter selects the optimization strategy used to maximize the likelihood.
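A minimal sketch of likelihood maximization, assuming the simplest possible model: 0/1 observations drawn from a Bernoulli distribution with unknown parameter p. The data and the grid of candidate values are made up, and the search is done by brute force for clarity; real solvers use gradient-based optimization instead.

```python
import math


def bernoulli_log_likelihood(p, data):
    """Log-likelihood of 0/1 observations under a Bernoulli(p) model."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)


# Hypothetical observations: five ones and three zeros.
data = [1, 0, 1, 1, 0, 1, 1, 0]

# Candidate parameter values 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]

# The MLE is the candidate that makes the observed data most likely.
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))
sample_mean = sum(data) / len(data)
print(p_hat, sample_mean)
```

For this model the maximizing value coincides with the sample mean, which is the known closed-form MLE for a Bernoulli parameter. Working with the log-likelihood rather than the raw likelihood avoids multiplying many small probabilities together.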
Naïve Bayes Classifier Algorithm
o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes
theorem and used for solving classification problems.
o It is mainly used in text classification, which involves high-dimensional training datasets.
o The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
o The formula for Bayes' theorem is given as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Where,
P(A|B) is the posterior probability: probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: probability of the evidence given that the hypothesis is true.
P(A) is the prior probability: probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: probability of the evidence.

Working of Naïve Bayes' Classifier:


The working of the Naïve Bayes classifier can be understood with the help of the example below:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not to play on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the dataset below (only the first six of its 14 records are listed here):

     Outlook     Play
0    Rainy       Yes
1    Sunny       Yes
2    Overcast    Yes
3    Overcast    Yes
4    Sunny       No
5    Rainy       Yes
Frequency table for the weather conditions:

Weather     Yes    No
Overcast    5      0
Rainy       2      2
Sunny       3      2
Total       10     4

Likelihood table for the weather conditions:

Weather     No             Yes             P(Weather)
Overcast    0              5               5/14 = 0.36
Rainy       2              2               4/14 = 0.29
Sunny       2              3               5/14 = 0.36
All         4/14 = 0.29    10/14 = 0.71



Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.36
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = (3/10) * (10/14) / (5/14) = 3/5 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.36
So P(No|Sunny) = (2/4) * (4/14) / (5/14) = 2/5 = 0.40
As the above calculation shows, P(Yes|Sunny) > P(No|Sunny), and the two posteriors sum to 1.
Hence, on a sunny day, the player can play the game.
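The same calculation can be reproduced from the frequency table. The sketch below uses the counts given above and Python's `fractions.Fraction`, so no intermediate value is rounded; the function name `posterior` and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction

# Frequency table from the text: weather -> counts of each "Play" outcome.
counts = {
    "Overcast": {"Yes": 5, "No": 0},
    "Rainy": {"Yes": 2, "No": 2},
    "Sunny": {"Yes": 3, "No": 2},
}


def posterior(play, weather):
    """Return P(play | weather) via Bayes' theorem, as an exact fraction."""
    total = sum(sum(row.values()) for row in counts.values())
    class_total = sum(row[play] for row in counts.values())
    weather_total = sum(counts[weather].values())

    likelihood = Fraction(counts[weather][play], class_total)  # P(weather | play)
    prior = Fraction(class_total, total)                       # P(play)
    evidence = Fraction(weather_total, total)                  # P(weather)
    return likelihood * prior / evidence


p_yes = posterior("Yes", "Sunny")
p_no = posterior("No", "Sunny")
print(float(p_yes), float(p_no))
```

Working with exact fractions gives posteriors of 3/5 and 2/5, which avoids the small discrepancies that appear when each intermediate probability is rounded to two decimals before multiplying.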
Types of Naïve Bayes Model:
There are three types of Naïve Bayes model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, that is, deciding which category (such as sports, politics, or education) a particular document belongs to.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also well known for document classification tasks.
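For the Gaussian variant, a rough single-feature sketch is shown below. Each class gets a mean and variance estimated from its training values, and a new value is assigned to the class with the largest prior times Gaussian likelihood. The training data, class labels, and function names are hypothetical, and a real implementation would handle several features and zero variance.

```python
import math
from statistics import mean, pvariance

# Hypothetical training data: one continuous feature per class.
training = {
    "A": [0.8, 1.0, 1.2],
    "B": [2.8, 3.0, 3.2],
}


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(x):
    """Pick the class with the largest prior * Gaussian likelihood."""
    total = sum(len(values) for values in training.values())
    scores = {}
    for label, values in training.items():
        prior = len(values) / total
        scores[label] = prior * gaussian_pdf(x, mean(values), pvariance(values))
    return max(scores, key=scores.get)


print(predict(1.1), predict(2.9))
```

The evidence term P(x) is omitted because it is the same for every class and does not change which class scores highest.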
Instance-based learning:
 Machine learning systems categorized as instance-based learning memorize the training examples and then generalize to new instances based on some similarity measure.
 It is called instance-based because it builds its hypotheses from the training instances.
 It is also known as memory-based learning or lazy learning.
 The time complexity of this approach depends on the size of the training data.
 Each time a new query is encountered, the previously stored data is examined and a target function value is assigned to the new instance.
 The worst-case time complexity of answering one query is O(n), where n is the number of training instances.
Some of the instance-based learning algorithms are:
1. K Nearest Neighbor (KNN)
2. Learning Vector Quantization (LVQ)
3. Locally Weighted Learning (LWL)
4. Case-Based Reasoning
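The "store now, compute at query time" behaviour can be sketched as follows. This is a hypothetical one-dimensional learner, not one of the named algorithms in full: the class name `LazyLearner` and its data are assumptions, and it simply returns the target value of the single closest stored instance.

```python
class LazyLearner:
    """Memory-based learner: training only stores the examples."""

    def __init__(self):
        self.examples = []       # list of (input, target) pairs
        self.comparisons = 0     # distance evaluations in the last query

    def fit(self, inputs, targets):
        # No model is built here; the data is simply memorized.
        self.examples = list(zip(inputs, targets))

    def predict(self, query):
        # Every stored example is examined, so a query costs O(n).
        self.comparisons = 0
        best_target, best_distance = None, float("inf")
        for x, target in self.examples:
            self.comparisons += 1
            distance = abs(x - query)
            if distance < best_distance:
                best_target, best_distance = target, distance
        return best_target


learner = LazyLearner()
learner.fit([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
value = learner.predict(2.2)
print(value, learner.comparisons)
```

The `comparisons` counter equals the number of stored instances after each query, which is where the O(n) cost per query comes from.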
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
 K-Nearest Neighbour is one of the simplest machine learning algorithms based on the supervised learning technique.
 The K-NN algorithm assumes that similar cases belong to the same category: it compares the new case with the available cases and puts it into the category it is most similar to.
 The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
 This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
 The K-NN algorithm can be used for regression as well as for classification, but it is mostly used for classification problems.
 K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data distribution.
 It is also called a lazy learner algorithm, because at the training phase it just stores the dataset; when it gets new data, it classifies that data into the category most similar to it.
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories does this data point lie?
To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:

The working of K-NN can be explained with the following algorithm:
 Step-1: Select the number K of neighbors.
 Step-2: Calculate the Euclidean distance from the new data point to each training data point.
 Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
 Step-4: Among these K neighbors, count the number of data points in each category.
 Step-5: Assign the new data point to the category with the largest number of neighbors.
 Step-6: Our model is ready.
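The steps above can be sketched in a few lines of Python. The function name `knn_classify`, the two-dimensional points, and the labels are made up for illustration; `math.dist` (Python 3.8+) computes the Euclidean distance between two points.

```python
import math
from collections import Counter


def knn_classify(training, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors.

    training is a list of ((x, y), label) pairs.
    """
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [(math.dist(point, new_point), label) for point, label in training]
    # Step 3: keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count the labels among those neighbors.
    votes = Counter(label for _, label in nearest)
    # Step 5: assign the most common label.
    return votes.most_common(1)[0][0]


# Hypothetical training data for two categories.
training = [
    ((2.5, 2.5), "A"), ((2.0, 3.0), "A"), ((3.0, 2.0), "A"), ((0.0, 0.0), "A"),
    ((4.0, 4.0), "B"), ((4.5, 3.0), "B"), ((6.0, 6.0), "B"), ((7.0, 5.0), "B"),
]

label = knn_classify(training, (3.0, 3.0), k=5)
print(label)
```

With k = 5, the five nearest neighbors of (3, 3) in this made-up data are three points from category A and two from category B, so the vote goes to A. An odd k is a common choice for two-class problems because it avoids tied votes.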
Suppose we have a new data point and we need to put it in the required category. Consider
the below image:

 First, we choose the number of neighbors; here we choose k = 5.
 Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x_1, y_1) and (x_2, y_2), it can be calculated as:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

 By calculating the Euclidean distances we get the five nearest neighbors: three in category A and two in category B. Since most of the nearest neighbors are in category A, the new data point is assigned to category A.
