
16CS401

Machine Learning
UNIT-II

Syllabus

• BAYESIAN DECISION THEORY: Introduction; Classification; Losses and risks; Discriminant functions; Association rules.
• PARAMETRIC METHODS: Maximum likelihood estimation; Bernoulli density; Multinomial density; Gaussian (normal) density; Evaluating an estimator: Bias and variance.
• THE BAYES' ESTIMATOR: Parametric classification; Regression; Tuning model complexity: Bias/variance dilemma; Model selection procedures.
Bayesian Decision Theory: Introduction

• In classification, Bayes’ rule is used to calculate the probabilities of the classes.


• Bayesian Decision Theory (i.e. the Bayesian Decision Rule) predicts the outcome not
only based on previous observations, but also by taking the current situation into
account. The rule describes the most reasonable action to take based on an
observation.
• It is decision making when all the underlying probability distributions are known.
• It is optimal when the given distributions are known.
Bayesian Decision Theory: Introduction

• Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.

• The formula for Bayes' theorem is given as:

P(c|x) = P(x|c) * P(c) / P(x)

Where,
• P(c|x) is Posterior probability: Probability of hypothesis ‘c’ on the observed event ‘x’.

• P(x|c) is Likelihood probability: Probability of the evidence given that the hypothesis is true.

• P(c) is Prior Probability: Probability of hypothesis before observing the evidence.

• P(x) is Marginal Probability: Probability of Evidence.
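As a quick illustration (not part of the original notes), the following Python sketch plugs hypothetical numbers into Bayes' rule; the function name and the values are made up purely to show how the four quantities combine.

```python
# Bayes' rule: P(c|x) = P(x|c) * P(c) / P(x)
def posterior(likelihood, prior, evidence):
    """Posterior probability of hypothesis c given observation x."""
    return likelihood * prior / evidence

# Hypothetical values, chosen only for illustration.
p_x_given_c = 0.3   # P(x|c): likelihood of the evidence given the hypothesis
p_c = 0.71          # P(c): prior probability of the hypothesis
p_x = 0.35          # P(x): marginal probability of the evidence

print(round(posterior(p_x_given_c, p_c, p_x), 2))  # 0.61
```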


Bayesian Decision Theory: Introduction

• The working of the Bayes classifier can be understood with the help of the example below:

• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate the likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?

Dataset:

    Outlook   Play
 0  Rainy     Yes
 1  Sunny     Yes
 2  Overcast  Yes
 3  Overcast  Yes
 4  Sunny     No
 5  Rainy     Yes
 6  Sunny     Yes
 7  Overcast  Yes
 8  Rainy     No
 9  Sunny     No
10  Sunny     Yes
11  Rainy     No
12  Overcast  Yes
13  Overcast  Yes

Frequency table for the weather conditions:

    Weather    Yes  No
    Overcast   5    0
    Rainy      2    2
    Sunny      3    2
    Total      10   4
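To make step 1 concrete, here is a small Python sketch (not from the notes) that rebuilds the same frequency table from the Outlook/Play records above, using only the standard library.

```python
from collections import Counter

# The 14 (Outlook, Play) records from the dataset above.
data = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"),  ("Rainy", "Yes"), ("Sunny", "Yes"),    ("Overcast", "Yes"),
    ("Rainy", "No"),  ("Sunny", "No"),  ("Sunny", "Yes"),    ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

# Count how often each (weather, play) pair occurs.
freq = Counter(data)
for weather in ("Overcast", "Rainy", "Sunny"):
    print(weather, freq[(weather, "Yes")], freq[(weather, "No")])
# Overcast 5 0
# Rainy 2 2
# Sunny 3 2
```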
Problem: If the weather is sunny, should the player play or not?

Likelihood table for the weather conditions (derived from the frequency table above):

    Weather    Yes   No    Prior of the weather condition
    Overcast   5     0     P(O) => 5/14 = 0.35
    Rainy      2     2     P(R) => 4/14 = 0.29
    Sunny      3     2     P(S) => 5/14 = 0.35
    Total      10    4
               P(Yes) = 10/14 = 0.71   P(No) = 4/14 = 0.29
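Step 2 can be reproduced with a short sketch (again illustrative, not from the notes), starting from the counts in the frequency table; the dictionary below simply hard-codes those counts.

```python
# Counts of (Yes, No) for each weather condition, from the frequency table above.
counts = {"Overcast": (5, 0), "Rainy": (2, 2), "Sunny": (3, 2)}
total = sum(yes + no for yes, no in counts.values())   # 14 days in total

# Prior probability of each weather condition, e.g. P(Sunny) = 5/14 ~ 0.35.
priors_weather = {w: (yes + no) / total for w, (yes, no) in counts.items()}

# Class priors: P(Yes) = 10/14 ~ 0.71 and P(No) = 4/14 ~ 0.29.
p_yes = sum(yes for yes, _ in counts.values()) / total
p_no = sum(no for _, no in counts.values()) / total

print(priors_weather)   # {'Overcast': 0.357..., 'Rainy': 0.285..., 'Sunny': 0.357...}
print(p_yes, p_no)      # 0.714..., 0.285...
```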
Problem: If the weather is sunny, should the player play or not?

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
    P(Sunny|Yes) = 3/10 = 0.3
    P(Yes) = 0.71
    P(Sunny) = 0.35
    So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
    P(Sunny|No) = 2/4 = 0.5
    P(No) = 0.29
    P(Sunny) = 0.35
    So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day, the player can play the game.
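Step 3, the posterior comparison above, can be checked with the following sketch (illustrative only; it repeats the slide's arithmetic using exact fractions).

```python
# Likelihoods and priors taken from the tables above.
p_sunny_given_yes = 3 / 10        # P(Sunny|Yes)
p_sunny_given_no = 2 / 4          # P(Sunny|No)
p_yes, p_no = 10 / 14, 4 / 14     # class priors P(Yes), P(No)
p_sunny = 5 / 14                  # evidence P(Sunny)

# Bayes' theorem for both classes.
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))            # 0.6 0.4
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")   # Play
```

Using exact fractions gives 0.60 and 0.40; the value 0.41 in the worked example comes from rounding the intermediate probabilities before dividing. The decision is the same either way.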
Bayesian Decision Theory: Introduction

Let us consider an example:
• Tossing a coin is a random process because we cannot predict at any toss whether the
outcome will be heads or tails—that is why we toss coins, or buy lottery tickets, or
get insurance.
• We can only talk about the probability that the outcome of the next toss will be
heads or tails.
• If we had access to extra knowledge such as the exact composition of the coin, its initial position, the force and direction applied to it when tossing, and where and how it is caught, then the exact outcome of the toss could be predicted.
Bayesian Decision Theory: Introduction

• The extra pieces of knowledge that we do not have access to are called the unobservable variables.
• In the coin tossing example, the only observable variable is the outcome of the toss.
• Denote the unobservable variables by Z and the observable variable by x:

x = f(Z)

where f(·) is the deterministic function that defines the outcome from the unobservable pieces of knowledge.
• Since we cannot model the process in this way, we define the outcome X as a random variable drawn from a probability distribution P(X = x) that specifies the process.
Bayesian Decision Theory: Introduction

• Let us say X = 1 denotes that the outcome of a toss is heads and X = 0 denotes tails.
• Such an X is Bernoulli-distributed, where the parameter of the distribution, p0, is the probability that the outcome is heads:

P(X = x) = p0^x (1 − p0)^(1−x), x ∈ {0, 1}

• Our prediction will be heads if p0 > 0.5, and tails otherwise.


• This is because if we choose the more probable case, the probability of error, which is 1 minus the probability of our choice, will be minimum.
Bayesian Decision Theory: Introduction

• Suppose we do not know P(X) and want to estimate it from a given sample.
• We have a sample, X, containing examples x^t drawn from the probability distribution of the observables, denoted p(x).
• The aim is to build an approximator to it, p̂(x), using the sample X.
• The sample contains the outcomes of the past N tosses. Then using X, we can estimate p0, which is the parameter that uniquely specifies the distribution.
• Our estimate of p0 is:

p̂0 = (number of tosses with outcome heads) / (number of tosses) = (Σ_t x^t) / N
Bayesian Decision Theory: Introduction

• Numerically, using the random variables, x^t is 1 if the outcome of toss t is heads and 0 otherwise.
• Given the sample {heads, heads, heads, tails, heads, tails, tails, heads, heads},
• we have X = {1, 1, 1, 0, 1, 0, 0, 1, 1} and the estimate is:

p̂0 = 6/9 ≈ 0.67
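A tiny sketch (not from the notes) of the same estimate: count the heads in the sample, divide by the number of tosses, and then predict the more probable outcome.

```python
# Sample of 9 tosses encoded as 1 = heads, 0 = tails.
sample = [1, 1, 1, 0, 1, 0, 0, 1, 1]

# Maximum likelihood estimate of p0 = P(heads): the fraction of heads in the sample.
p0_hat = sum(sample) / len(sample)
print(round(p0_hat, 2))                        # 0.67

# Predict the more probable outcome for the next toss.
print("heads" if p0_hat > 0.5 else "tails")    # heads
```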
