
16CS401

Machine Learning
UNIT-II

Syllabus

• BAYESIAN DECISION THEORY: Introduction; Classification; Losses and risks; Discriminant functions; Association rules.
• PARAMETRIC METHODS: Maximum likelihood estimation; Bernoulli density; Multinomial density; Gaussian (normal) density; Evaluating an estimator: Bias and variance.
• THE BAYES' ESTIMATOR: Parametric classification; Regression; Tuning model complexity: Bias/variance dilemma; Model selection procedures.
Bayesian Decision Theory: Introduction

• In classification, Bayes’ rule is used to calculate the probabilities of the classes.


• Bayesian Decision Theory (i.e. the Bayesian Decision Rule) predicts the outcome not
only based on previous observations, but also by taking the current situation into
account. The rule describes the most reasonable action to take based on an
observation.
• It is decision making when all the underlying probability distributions are known.
• It is optimal when the given distributions are known.
Bayesian Decision Theory: Introduction

• Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.

• The formula for Bayes' theorem is given as:

P(c|x) = P(x|c) * P(c) / P(x)

Where,
• P(c|x) is Posterior probability: Probability of hypothesis ‘c’ on the observed event ‘x’.

• P(x|c) is Likelihood probability: Probability of the evidence given that the hypothesis is true.

• P(c) is Prior Probability: Probability of hypothesis before observing the evidence.

• P(x) is Marginal Probability: Probability of Evidence.
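As a quick illustration (not part of the original notes), the following Python sketch plugs hypothetical numbers into Bayes' rule; the function name and the values are made up purely to show how the four quantities combine.

```python
# Bayes' rule: P(c|x) = P(x|c) * P(c) / P(x)
def posterior(likelihood, prior, evidence):
    """Posterior probability of hypothesis c given observation x."""
    return likelihood * prior / evidence

# Hypothetical values, chosen only for illustration.
p_x_given_c = 0.3   # P(x|c): likelihood of the evidence given the hypothesis
p_c = 0.71          # P(c): prior probability of the hypothesis
p_x = 0.35          # P(x): marginal probability of the evidence

print(round(posterior(p_x_given_c, p_c, p_x), 2))  # 0.61
```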


Bayesian Decision Theory: Introduction

• The working of the Bayes classifier can be understood with the help of the example below:

• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate the likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?

Dataset:

    Outlook   Play
 0  Rainy     Yes
 1  Sunny     Yes
 2  Overcast  Yes
 3  Overcast  Yes
 4  Sunny     No
 5  Rainy     Yes
 6  Sunny     Yes
 7  Overcast  Yes
 8  Rainy     No
 9  Sunny     No
10  Sunny     Yes
11  Rainy     No
12  Overcast  Yes
13  Overcast  Yes

Frequency table for the weather conditions:

    Weather    Yes  No
    Overcast   5    0
    Rainy      2    2
    Sunny      3    2
    Total      10   4
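To make step 1 concrete, here is a small Python sketch (not from the notes) that rebuilds the same frequency table from the Outlook/Play records above, using only the standard library.

```python
from collections import Counter

# The 14 (Outlook, Play) records from the dataset above.
data = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"),  ("Rainy", "Yes"), ("Sunny", "Yes"),    ("Overcast", "Yes"),
    ("Rainy", "No"),  ("Sunny", "No"),  ("Sunny", "Yes"),    ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

# Count how often each (weather, play) pair occurs.
freq = Counter(data)
for weather in ("Overcast", "Rainy", "Sunny"):
    print(weather, freq[(weather, "Yes")], freq[(weather, "No")])
# Overcast 5 0
# Rainy 2 2
# Sunny 3 2
```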
Problem: If the weather is sunny, should the player play or not?

Likelihood table for the weather conditions (derived from the frequency table above):

    Weather    Yes   No    Prior of the weather condition
    Overcast   5     0     P(O) => 5/14 = 0.35
    Rainy      2     2     P(R) => 4/14 = 0.29
    Sunny      3     2     P(S) => 5/14 = 0.35
    Total      10    4
               P(Yes) = 10/14 = 0.71   P(No) = 4/14 = 0.29
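Step 2 can be reproduced with a short sketch (again illustrative, not from the notes), starting from the counts in the frequency table; the dictionary below simply hard-codes those counts.

```python
# Counts of (Yes, No) for each weather condition, from the frequency table above.
counts = {"Overcast": (5, 0), "Rainy": (2, 2), "Sunny": (3, 2)}
total = sum(yes + no for yes, no in counts.values())   # 14 days in total

# Prior probability of each weather condition, e.g. P(Sunny) = 5/14 ~ 0.35.
priors_weather = {w: (yes + no) / total for w, (yes, no) in counts.items()}

# Class priors: P(Yes) = 10/14 ~ 0.71 and P(No) = 4/14 ~ 0.29.
p_yes = sum(yes for yes, _ in counts.values()) / total
p_no = sum(no for _, no in counts.values()) / total

print(priors_weather)   # {'Overcast': 0.357..., 'Rainy': 0.285..., 'Sunny': 0.357...}
print(p_yes, p_no)      # 0.714..., 0.285...
```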
Problem: If the weather is sunny, should the player play or not?

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
    P(Sunny|Yes) = 3/10 = 0.3
    P(Yes) = 0.71
    P(Sunny) = 0.35
    So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
    P(Sunny|No) = 2/4 = 0.5
    P(No) = 0.29
    P(Sunny) = 0.35
    So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
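Step 3, the posterior comparison above, can be checked with the following sketch (illustrative only; it repeats the slide's arithmetic using exact fractions).

```python
# Likelihoods and priors taken from the tables above.
p_sunny_given_yes = 3 / 10        # P(Sunny|Yes)
p_sunny_given_no = 2 / 4          # P(Sunny|No)
p_yes, p_no = 10 / 14, 4 / 14     # class priors P(Yes), P(No)
p_sunny = 5 / 14                  # evidence P(Sunny)

# Bayes' theorem for both classes.
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))            # 0.6 0.4
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")   # Play
```

Using exact fractions gives 0.60 and 0.40; the value 0.41 in the worked example comes from rounding the intermediate probabilities before dividing. The decision is the same either way.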
Bayesian Decision Theory: Introduction

Let us consider an example:
• Tossing a coin is a random process because we cannot predict at any toss whether the
outcome will be heads or tails—that is why we toss coins, or buy lottery tickets, or
get insurance.
• We can only talk about the probability that the outcome of the next toss will be
heads or tails.
• If we had access to extra knowledge such as the exact composition of the coin, its initial position, the force and direction applied to it when tossing, and where and how it is caught, then the exact outcome of the toss could be predicted.
Bayesian Decision Theory: Introduction

• The extra pieces of knowledge that we do not have access to are called the unobservable variables.
• In the coin tossing example, the only observable variable is the outcome of the toss.
• Denote the unobservable variables by Z and the observable variable by x:

x = f(Z)

where f(·) is the deterministic function that defines the outcome from the unobservable pieces of knowledge.
• Since we cannot model the process in this way, we define the outcome X as a random variable drawn from a probability distribution P(X = x) that specifies the process.
Bayesian Decision Theory: Introduction

• Let us say X = 1 denotes that the outcome of a toss is heads and X = 0 denotes tails.
• Such an X is Bernoulli-distributed, where the parameter of the distribution, p0, is the probability that the outcome is heads:

P(X = x) = p0^x (1 − p0)^(1−x), x ∈ {0, 1}

• Our prediction will be heads if p0 > 0.5, and tails otherwise.


• This is because if we choose the more probable case, the probability of error, which is 1 minus the probability of our choice, will be minimum.
Bayesian Decision Theory: Introduction

• Suppose we do not know P(X) and want to estimate it from a given sample.
• We have a sample, X, containing examples x^t drawn from the probability distribution of the observables, denoted p(x).
• The aim is to build an approximator to it, p̂(x), using the sample X.
• The sample contains the outcomes of the past N tosses. Then using X, we can estimate p0, which is the parameter that uniquely specifies the distribution.
• Our estimate of p0 is:

p̂0 = (number of tosses with outcome heads) / (number of tosses) = (Σ_t x^t) / N
Bayesian Decision Theory: Introduction

• Numerically, using the random variables, x^t is 1 if the outcome of toss t is heads and 0 otherwise.
• Given the sample {heads, heads, heads, tails, heads, tails, tails, heads, heads},
• we have X = {1, 1, 1, 0, 1, 0, 0, 1, 1} and the estimate is:

p̂0 = 6/9 ≈ 0.67
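A tiny sketch (not from the notes) of the same estimate: count the heads in the sample, divide by the number of tosses, and then predict the more probable outcome.

```python
# Sample of 9 tosses encoded as 1 = heads, 0 = tails.
sample = [1, 1, 1, 0, 1, 0, 0, 1, 1]

# Maximum likelihood estimate of p0 = P(heads): the fraction of heads in the sample.
p0_hat = sum(sample) / len(sample)
print(round(p0_hat, 2))                        # 0.67

# Predict the more probable outcome for the next toss.
print("heads" if p0_hat > 0.5 else "tails")    # heads
```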
